Developing a seed_seq Alternative

In my previous post, I talked about issues with C++11's std::seed_seq. As a reminder, the goal of std::seed_seq is to generate seed data that can be used to initialize a random number generator. And to allow it to do its job, it needs seed data. Seed data in, seed data out. It might seem like a “do nothing” function would be up to the task, but in many situations the seed data you can provide is of variable quality. Some of it is “good” entropy, which changes often, and some of it is “lower quality” entropy, which is more predictable or changes less frequently (e.g., the machine id). It may also be that the data that changes often (e.g., the time) changes more in the low-order bits than in the high-order bits. So the goal of std::seed_seq is to mix together the seed data we provide so that the high-quality and low-quality data is well mixed.

In essence, the task that std::seed_seq does (and thus anything intending to replace it should do) is compute a hash (i.e., a scrambled version) of its input data. The design of hash functions is often (rightly) considered to be something of a black art, but since the same can be said of designing random number generators and I've already done that, I might as well tackle this problem, too.

Design Space

If we're going to develop an alternative to std::seed_seq that fixes some of its deficiencies, we need to understand what some of those issues are. Specifically, std::seed_seq

  1. Is never a bijection when generating more than one output word—when the amount of output data is equal to (or less than) the amount of input data, it generates some outputs more often than others, and may not generate some outputs at all.
  2. May not be computing a “good” hash (something we'll elaborate on shortly).
  3. Stores all the input seed data verbatim. Because you are allowed to give it an arbitrary amount of seed data, it needs to dynamically allocate memory to store it.

We'll address each of these issues with some design constraints.

Finite Seed Entropy

To address the third point first, we'll adopt a strategy where our std::seed_seq alternative is a fixed (but configurable) size. Usually programmers know either how much entropy they're supposed to provide, or how much they have readily available, so it makes sense to make our entropy store a fixed size. Thus we'll provide two ready-made fixed size types for common use cases,

  • seed_seq_fe128 (synonym for seed_seq_fe<4, uint32_t>)
  • seed_seq_fe256 (synonym for seed_seq_fe<8, uint32_t>)

representing four words or eight words of stored ready-mixed seed data. The fe stands for fixed entropy.

Despite the fixed-size entropy store, we can initialize it with less entropy (although that's somewhat foolish—if we have less entropy, we could use a smaller entropy store) or with more entropy (in which case all the provided entropy is mixed together before being stored in the fixed-size container). This latter property is useful if we wish to fill a small entropy store from several low-quality sources of entropy.
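For example, here's a rough usage sketch (the entropy sources below are just placeholders; randutils.hpp is the header presented at the end of this post):

#include "randutils.hpp"
#include <chrono>
#include <cstdint>
#include <random>

int main()
{
    // Five inputs into a four-word store—everything gets mixed together.
    // The zeros stand in for low-quality sources (e.g., a machine id).
    randutils::seed_seq_fe128 seeder{
        uint32_t(std::chrono::system_clock::now().time_since_epoch().count()),
        uint32_t(std::random_device{}()),
        0u, 0u, 0u};
    std::mt19937 engine(seeder);    // any C++11 engine accepts a seed sequence
}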

Maximal Strict Avalanche

In the parlance of cryptography, good hash functions have good diffusion properties. One property that follows from good diffusion (and one that's often mentioned in the context of hashing) is good avalanche behavior—changing one bit in one of the seed inputs affects all the bits of the seed outputs. When flipping any single input bit flips each output bit with 50% probability, the function is said to have strict avalanche.

Strict avalanche refers to the effect of flipping a single bit, but we can generalize the property to multiple bits (“higher-order avalanche”). One way to express this property is to say that if you pick an arbitrary b bits of the input and iterate them through all possible values (while keeping the other bits fixed), any sample of b bits from the output will appear to come from a uniform distribution. Strict (higher-order) avalanche for all choices of b is known as maximal strict avalanche. This property is desirable because we don't know exactly where the “good” entropy might be—it could be all at one end of the input (or scattered throughout in the low order bits)—we want all of it to flow everywhere.

At a glance, std::seed_seq does appear to have good avalanche properties, but both from looking at the algorithm and looking at the distributions its outputs form, it appears to me that it does not satisfy maximal strict avalanche.

Bias Freedom (i.e., Bijection or K-to-1)

Finally, we'd like to address the first point on our list of std::seed_seq issues (and the one we discussed last time)—we want the output to be unbiased to the maximal extent possible.

Specifically, if the entropy store is \(N\) 32-bit words in size, is initialized with \(I\) input words, and produces \(S\) words of output, then if

  • \(I = N \wedge S = N\), the function should be \(1\) to \(1\), a bijection
  • \(I \ge N \wedge S \le N\), the function should be \(2^{32 (I-S)}\) to \(1\)—in other words, across all possible inputs, all possible outputs repeat the same number of times
  • \(I < S\), some outputs cannot be produced (by definition, because there are fewer inputs than outputs), but if
    • \(I \le N\), no outputs should ever be repeated

Basically, we're just stating “be as much of a bijection as you can”.
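For example, with a four-word store (\(N = 4\)) fed six words of input (\(I = 6\)) and asked for four words of output (\(S = 4\)), the second rule applies: across all \(2^{192}\) possible inputs, each of the \(2^{128}\) possible outputs should occur exactly \(2^{32(6-4)} = 2^{64}\) times.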

Output Params

One final (somewhat dubious) requirement of the C++11 Seed Sequence concept is that you should be able to clone a seed sequence—not the normal way, via a copy constructor (no, that'd be too easy!), but via a param function, which must produce a collection of 32-bit integers that, if passed to the class's constructor as input seeds, will produce a copy that behaves identically to the original.

In other words, all the entropy mixing we do must be undoable to allow us to recover the seeds we were originally given (or if we gave more seeds than we really needed originally, a functionally equivalent set of seeds).

We can do this—and in some ways it's neat to do it just to show that we can do the unmixing—it's just annoying to be forced to provide this never-actually-used feature.

Basic Design

So, with those design requirements, we can sketch out the basic design. In the diagrams below, the yellow boxes represent the entropy store, initially containing the user-provided seed data, but progressively transformed into a mixed-up version.

The pink circles contain hash functions—each hash function is distinct and provides a distinct 32-bit 1-to-1 mapping. The blue circles are a “mix” function that combines two 32-bit values to make a resulting 32-bit value. (Incidentally, cryptographers call these kinds of mixing functions compression functions because they output fewer bits than they take in, not to be confused with data compression.) We could use multiple mix functions in the same way as we use multiple hashes, but we stick to using the same function throughout.

Initialization (example with four zero seeds!)

The diagram below shows what happens if we initialize a four-word entropy store with four zero seeds.

In the diagram, you can see that we began with a horrible seed, four zeros, and ended up with a random-looking final set of values in the entropy store. If there were more than four seed inputs, we'd add additional rows to the diagram; in each additional row we'd mix the hashed values of that input with each of the entropy-store values.
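In code, the flow shown in the diagram looks roughly like this (a simplified sketch of the four-word case, not the actual randutils.hpp implementation; hash and mix are the functions we'll develop below):

#include <cstddef>
#include <cstdint>

uint32_t hash(uint32_t value);            // developed below
uint32_t mix(uint32_t x, uint32_t y);     // developed below

void mix_entropy(uint32_t store[4], const uint32_t* seeds, size_t count)
{
    // Fill the store with hashed seeds, padding with hashes of zero if
    // fewer than four were given (hashing turns even an all-zero seed
    // into random-looking words).
    for (size_t i = 0; i < 4; ++i)
        store[i] = hash(i < count ? seeds[i] : 0);

    // Mix each store word with a hash of every *other* store word, so
    // every word's entropy flows everywhere.
    for (size_t dest = 0; dest < 4; ++dest)
        for (size_t src = 0; src < 4; ++src)
            if (src != dest)
                store[dest] = mix(store[dest], hash(store[src]));

    // Extra seeds beyond the store size—one additional diagram row per
    // seed—get hashed and mixed into every store word.
    for (size_t i = 4; i < count; ++i)
        for (size_t dest = 0; dest < 4; ++dest)
            store[dest] = mix(store[dest], hash(seeds[i]));
}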

Output Seed Generation

Once we've initialized the entropy store, we're poised and ready to generate output seeds. This process entails a final round of hashing (with a new set of hash functions) so that if people ask for more seed values than we have in the entropy store, there won't be an obvious correlation between values generated by the same internal entropy.
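Sketched in code (again simplified—the real generate is templated on output iterators—and the final round uses a fresh set of hash constants, as noted above):

// Each output word is a fresh hash of one of the stored words, cycling
// through the store; because hash uses a new multiplier on every call,
// outputs drawn from the same store word aren't obviously correlated.
void generate(const uint32_t store[4], uint32_t* dest, uint32_t* dest_end)
{
    size_t src = 0;
    for (; dest != dest_end; ++dest)
        *dest = hash(store[src++ % 4]);
}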

Avalanche

This diagram gives a sense of how the avalanche property works. Here we change a single bit and observe how this diagram differs from the all-zeros one we saw earlier. (Data unchanged from the previous diagram is shown in gray; data that differs is shown in brown.)

One thing you might have surmised already is that our algorithm is a \(\mathrm{O}(N \times I)\) algorithm, where \(N\) is the size of the entropy store and \(I\) is the number of words it is initialized with. If we want maximal avalanche, that's alas somewhat unavoidable. Since entropy stores are fixed-size, however, and usually small (four or eight words ought to be ample), this quadratic behavior isn't an issue in practice.

Design Details

Thus far, we have a sense of how information will flow, but some details remain vague: what are these hash functions (of which we have many), and what is the mix function?

Multiply–Xorshift

For both hashing and mixing, I'll use a currently popular technique, multiply–xorshift. This operation is used as a building block for the mix functions of FastHash and MurmurHash2, and the final mix functions of MurmurHash3, CityHash and xxHash (and the output function of the SplitMix RNG). In other words, it's a very fast and very popular hashing technique.

In multiply–xorshift, we multiply by an odd random-looking constant, and then perform an xorshift on the result; for example, (excerpted from MurmurHash3)

h *= 0xc2b2ae35;    // Binary 11000010101100101010111000110101
h ^= h >> 16;

We could just look at this code and decide that it's some kind of black magic and leave it at that, but instead we'll use two perspectives to understand why it's good for scrambling the bits of an integer (and thus creating good avalanche within a single word).

A Bit-Mixing Perspective on Multiply–Xorshift

In the multiply step, high bits in the word depend in complex ways on the lower bits, but lower bits aren't changed nearly as much—for example, the low-order two bits don't change at all in the multiply step. The xorshift step pushes variation from the high-order bits back down to the lower ones. Just one multiply–xorshift step isn't perfect for creating good avalanche (e.g., the high bit cannot have influenced the low bit). That's why the hash functions I mentioned earlier (CityHash, etc.) apply several multiply–xorshifts. In our case, because we have several repetitions of hashing and mixing, only doing one multiply–xorshift in any one stage is fine.
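If you'd like to convince yourself of the claim about the low-order bits, here's a quick check (mine, not from any of the hash functions above):

#include <cassert>
#include <cstdint>

int main()
{
    // 0xc2b2ae35 ≡ 1 (mod 4), so multiplying by it cannot change the low
    // two bits of h; it's the xorshift step that lets high bits reach them.
    for (uint32_t h : {0u, 1u, 2u, 3u, 0x40u, 0xdeadbeefu, 0x80000001u}) {
        uint32_t m = h * 0xc2b2ae35u;
        assert((m & 3u) == (h & 3u));   // low two bits unchanged by multiply
    }
}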

A Graphical Perspective on Multiply–Xorshift

I like visualizations, so let's also look at what multiply–xorshift does graphically. This time, we'll imagine a much simpler 8-bit version. So we'll be providing a permutation of 256 values.

The figures below show random permutations to give us a feel for what a good permutation might look like:

With an idea of what we might be hoping for, we'll now look at what our hash function does, using an appropriately cut down 8-bit version of the operation:

h *= 0x35;    // Binary 00110101
h ^= h >> 4;

First, let's see what the multiplication and xorshift steps each do by themselves:

As you can see, these diagrams don't look very random. But our hashing operation doesn't just perform a multiply or an xorshift, it applies both, so let's apply an xorshift to the lefthand one (turning it into multiply–xorshift), and a multiplication to the righthand one (turning it into xorshift–multiply):

As you look at the plots above, you might be disappointed. They're not as regular as the previous plots, but it still looks like there is more order than chaos. But as we repeat the process things become more chaotic and start to look more random.

So let's continue and apply another multiply on the left and another xorshift on the right.

This is quite interesting because it looks like multiply–xorshift–multiply (on the left) does better than xorshift–multiply–xorshift (on the right). That's true in this example, but in general, it depends on what kind of regular structure the input has.

But if we do one more step, so that now each side has two xorshifts and two multiplies, we end up at a similar point:

So from these graphs, we can see that a single multiply–xorshift isn't really sufficient, but a couple of them do a pretty good job. It's also fairly common when repeating the process to use different constants and different shifts in successive steps (on the basis that doing so only adds to the chaos).
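For reference, this is exactly what MurmurHash3's 32-bit finalizer (the source of the constant we saw earlier) does—two multiplies interleaved with three xorshifts, each with different constants and shift amounts:

// MurmurHash3's fmix32 finalizer (public domain, by Austin Appleby).
uint32_t fmix32(uint32_t h)
{
    h ^= h >> 16;
    h *= 0x85ebca6b;
    h ^= h >> 13;
    h *= 0xc2b2ae35;
    h ^= h >> 16;
    return h;
}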

The Hash Function(s)

Based on these ideas, we could write a hash function like

uint32_t hash(uint32_t value)
{
    value *= 0xc2b2ae35;
    value ^= value >> 16;
    return value;
}

Notice that this code only does one step of multiply–xorshift, but it will be combined with other hashing and mixing steps, so having a single multiply–xorshift is okay here.

But for our task, we don't need just one hash function, we need many. The obvious fix is to use different multipliers for different hashes. One way to do so would be with a table of multipliers, something like this:

// Table of "random-looking" odd multiplier constants (definition omitted).
extern const uint32_t MULTIPLIERS[];

size_t which_multiplier = 0;

uint32_t hash(uint32_t value)
{
    value *= MULTIPLIERS[which_multiplier++];
    value ^= value >> 16;
    return value;
}

but then we need enough multipliers in our table for all the hash functions we might ever apply. Instead, since the hash multipliers are just “random-looking” odd numbers, we can randomly generate each multiplier using an MCG (a.k.a. a Lehmer-style LCG), which has the convenient property that, when seeded with an odd number, its outputs are always odd. This change gives us the following code:

uint32_t hash_multiplier = 0x43b0d7e5;

uint32_t hash(uint32_t value)
{
    hash_multiplier *= 0x931e8875;  // New "random" multiplier

    value *= hash_multiplier;
    value ^= value >> 16;
    return value;
}

Each time we pass through the hash function, we do so with a brand-new multiplier.

There is one final flaw to fix, however. Thus far, our hash function always maps zero to zero. We can fix this issue by xoring our input with a random-looking constant. We have one of those kicking around already, so our final code becomes

uint32_t hash_multiplier = 0x43b0d7e5;

uint32_t hash(uint32_t value)
{
    value ^= hash_multiplier;       // Avoid 0 -> 0

    hash_multiplier *= 0x931e8875;  // New "random" multiplier

    value *= hash_multiplier;
    value ^= value >> 16;
    return value;
}

The Mix Function

The mix function is similarly inspired by these ideas.

uint32_t mix(uint32_t x, uint32_t y)
{
    uint32_t result = 0xca01f9dd * x - 0x4973f715 * y;
    result ^= result >> 16;
    return result;
}

The multiplicative constants are somewhat arbitrary; I used constants that are known to be good when used as the multiplier for an MCG random number generator (according to the spectral test).

Developing functions like this one does feel a bit like a black art. In testing, I found that simpler mix functions (such as xor) didn't work very well, and some other simple functions (such as subtraction) worked slightly better (presumably due to their asymmetry), but still not well enough. But it's still not completely clear to me just what the necessary and sufficient properties are. Was it sensible for me to pick multiplicative constants that do well in the spectral test? They have the maximum possible multiplicative order, but that property, while important for one random-number generation context, is probably of no consequence in this context. (At one point, I considered the constant pair 0x6ca55485 and 0x331e8877 because they gave the mix function slightly better mathematical properties, but I wasn't able to show that these were actually helping in practice, so rather than tinker more, I left the constants as is.)

Final Code

The file randutils.hpp contains the final code for the fixed-entropy seed sequence, as well as code that I'll talk about in later posts in this series.

The code follows the ideas we've outlined above. One annoying thing about C++11's Seed Sequence concept is that we must write a param method that can give the user parameters to recreate the exact same seed sequence, so that needed to be implemented, too. Thankfully, the same design decisions that ensure we have a bijection also make it possible to invert the mixing function, which is exactly what we need to provide param.
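To give a flavor of that unmixing (a sketch of the idea, not the actual randutils.hpp code): the xorshift step undoes itself, and the multiply is undone with a modular multiplicative inverse, computable with a few Newton–Hensel iterations.

#include <cstdint>

// For odd a, computes a⁻¹ mod 2³². Since a·a ≡ 1 (mod 8), x = a starts
// with three correct low bits, and each iteration doubles that count.
uint32_t mult_inverse(uint32_t a)
{
    uint32_t x = a;
    for (int i = 0; i < 4; ++i)     // 3 → 6 → 12 → 24 → 48 correct bits
        x *= 2 - a * x;
    return x;
}

// Inverse of the mix function above: given mix(x, y) and y, recover x.
uint32_t unmix(uint32_t result, uint32_t y)
{
    result ^= result >> 16;         // xorshift by half the width undoes itself
    return (result + 0x4973f715u * y) * mult_inverse(0xca01f9ddu);
}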

Testing

There are several things we can test to make sure our seed sequence concept does its job properly. Since I want to actually get this blog post finished, I've been lazy and not included my testing code, but none of it is difficult to write.

Bijection (and K-to-1) Properties

We've already outlined what these properties are, but it's useful to test them to make sure they actually hold. By changing the word size to 8 bits, we can exhaustively test the 4-word–entropy version with all possible inputs and make sure that it produces all possible outputs.

It's quite easy and fun to write the necessary test code (especially if you have a machine with plenty of memory), so I'll leave it as one of those legendary “exercises for the reader”. But it passes this test.
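That said, here's the shape such a tester might take—a generic exhaustive check that a 32-bit-to-32-bit function is a bijection (for our purposes, f would pack four 8-bit seed words in and four 8-bit outputs back out). The “plenty of memory” is a 512 MB bit table:

#include <cstdint>
#include <vector>

// Exhaustively checks that f is a bijection on 32-bit values; since the
// domain and codomain are the same finite size, injectivity is enough.
template <typename F>
bool is_bijection32(F f)
{
    std::vector<bool> seen(uint64_t(1) << 32, false);
    uint32_t x = 0;
    do {
        uint32_t y = f(x);
        if (seen[y])
            return false;           // two inputs collided
        seen[y] = true;
    } while (++x != 0);             // wraps to zero after 0xFFFFFFFF
    return true;
}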

(In fact, we already needed to invert the mixing operation to provide the param member function, which itself establishes that the mixing is indeed a bijection.)

Avalanche Properties

It should be the case that, on average, a single-bit change anywhere in the input flips each bit of the output with 50/50 probability. To test this property, I wrote a tester that considers every possible single-bit change we can make to a 32-bit input and tries them all (as we take one 32-bit input through all possible values, we keep the other inputs constant).
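Here's a cut-down sketch of such a tester (sampling a subset of counter values rather than all of them; __builtin_popcount is GCC/Clang-specific):

#include <cstdint>
#include <cstdio>
#include "randutils.hpp"

// First output word of the seed sequence for a given counter value.
static uint32_t first_output(uint32_t x)
{
    randutils::seed_seq_fe128 seeder{x, 0u, 0u, 0u};
    uint32_t out;
    seeder.generate(&out, &out + 1);
    return out;
}

int main()
{
    uint64_t flipped = 0, trials = 0;
    for (uint32_t x = 0; x < 65536; ++x) {       // a sample of inputs
        uint32_t base = first_output(x);
        for (int bit = 0; bit < 32; ++bit) {     // every single-bit change
            flipped += __builtin_popcount(base ^ first_output(x ^ (1u << bit)));
            ++trials;
        }
    }
    printf("mean output bits flipped: %.4f (ideal: 16)\n",
           double(flipped) / double(trials));
}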

In testing, the results were that each 32-bit output had 16 bits on average get flipped, with a standard deviation of 2.8284—exactly what we'd expect for a binomial distribution with 32 trials and a 50% chance of success. In fact, not only does the standard deviation match the expected value of \(2 \sqrt{2}\), but the whole distribution also exactly matches a binomial distribution.

In other words, the avalanche properties look excellent.

RNG Properties

The ultimate test for any hash function is to give it very orderly input (e.g., simple increments) and see if the output is random enough to pass the stringent tests of RNG test suites.

In other words, we can turn a seed sequence into a random number generator like this:

// Abuse seed_seq_fe to be a random number generator
uint32_t seed_seq_fe_rng()
{
    static uint32_t counter = 0;
    randutils::seed_seq_fe128 seeder{counter,0u,0u,0u};
    ++counter;
    uint32_t out;
    seeder.generate(&out, &out + 1);
    return out;
}

Before I show you the actual results, let's see what happens if we replace randutils::seed_seq_fe128 in the code above with std::seed_seq and put this RNG through TestU01's SmallCrush suite:

========= Summary results of SmallCrush =========

 Version:          TestU01 1.2.3
 Generator:        std::seed_seq (counter-at:0, result-at:0)
 Number of statistics:  15
 Total CPU time:   00:00:56.06
 The following tests gave p-values outside [0.001, 0.9990]:
 (eps  means a value < 1.0e-300):
 (eps1 means a value < 1.0e-15):

       Test                          p-value
 ----------------------------------------------
  1  BirthdaySpacings                 eps  
  2  Collision                        eps  
  3  Gap                              eps  
  4  SimpPoker                        eps  
  5  CouponCollector                  eps  
  6  MaxOft                         1 -  1.6e-6
  6  MaxOft AD                       0.9997 
  7  WeightDistrib                    eps  
 ----------------------------------------------
 All other tests were passed

These results are not good. We could argue that we're abusing std::seed_seq by trying to use it like a random number generator—that seeds don't need to have the same quality as random numbers. But it is, nevertheless, not good. It's also pretty slow (these timings were with GCC—it's even slower with Clang, taking 68% longer).

In contrast, here are the results for randutils::seed_seq_fe128:

========= Summary results of SmallCrush =========

 Version:          TestU01 1.2.3
 Generator:        randutils::seed_seq_fe128 (counter-at:0, result-at:0)
 Number of statistics:  15
 Total CPU time:   00:00:12.91

 All tests were passed

We can see that seed_seq_fe128 is both faster and statistically much better. In fact, it doesn't just pass SmallCrush, it passes Crush as well.

In fact, if we adjust the code to make counter a 64-bit integer, it passes BigCrush, too (a 32-bit counter isn't enough because the largest BigCrush test uses 12.8 billion numbers, significantly more than \(2^{32}\)).

========= Summary results of BigCrush =========

 Version:          TestU01 1.2.3
 Generator:        randutils::seed_seq_fe128 (counter-at:0, result-at:0)
 Number of statistics:  160
 Total CPU time:   09:06:12.19

 All tests were passed

But what about other positions for the seed data? From the diagrams above, we know that each input position gets mixed in slightly differently. Perhaps later stages are worse. So, here's the code for placing the counter last:

// Abuse seed_seq_fe to be a random number generator
uint32_t seed_seq_fe_rng()
{
    static uint32_t counter = 0;
    randutils::seed_seq_fe128 seeder{0u,0u,0u,counter};  // counter at end
    ++counter;
    uint32_t out;
    seeder.generate(&out, &out + 1);
    return out;
}

On the positive side, this version also passes SmallCrush. But it is actually statistically worse than placing the counter at the beginning; specifically, it fails Crush's MaxOft test. That result might seem disappointing, but it's nevertheless performing vastly better than std::seed_seq or numerous widely used RNGs. More importantly, if any of the other input data changes, even if it changes very rarely, that's enough. In other words, this version has no problem passing BigCrush:

// Abuse seed_seq_fe to be a random number generator
uint32_t seed_seq_fe_rng()
{
    static uint32_t counter = 0;
    static uint32_t bump = 0;
    randutils::seed_seq_fe128 seeder{0u,0u,bump,counter};
    ++counter;
    if ((counter & 0xFFFF) == 0)    // n.b. bitwise and—bump changes once every 65,536 calls
        ++bump;
    uint32_t out;
    seeder.generate(&out, &out + 1);
    return out;
}

In this code, we only update bump 0.0015% of the time, but that's sufficient to make the code pass BigCrush. From that example, we can surmise that if any of the other seed data is worth anything at all, we're fine.

But if, for some reason, you really want the counter-at-the-end version to pass all the statistical tests, there is a third template argument that controls how many rounds of mixing are performed. In fact, for tiny entropy pools (two words or less), an extra round of mixing happens automatically, because in those cases the added overhead is tiny.
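For example (assuming the template signature in randutils.hpp, where the third parameter is the number of mix rounds):

// A four-word entropy store that does two rounds of mixing instead of one.
using seed_seq_fe128_x2 = randutils::seed_seq_fe<4, uint32_t, 2>;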

Conclusions

In this article, we built something that beats std::seed_seq in several respects. It's faster, provably free of bias, empirically statistically better, and it doesn't dynamically allocate memory.

In fact, for the standard C++11 RNGs, our “silly” RNG from abusing seed_seq_fe is actually statistically better than the RNG you're initializing.

But even though we're done with this blog post, there is more to do. We have something that does a good job of mixing together entropy from multiple sources, but you still need to get that entropy from somewhere. Where? That'll be the topic for the next two blog posts, covering what's wrong with std::random_device and how to cope with that.