Too Big to Fail

For many PRNGs, the more state bits you give them, the deeper statistical tests need to go to discover their flaws. We'll explore what this phenomenon means by looking at one of the earliest PRNGs ever made, John von Neumann's “Middle Square” method.

Basic Middle Square

John von Neumann first described his Middle Square method in 1949. The idea is this: take a number with n digits, square it to get a number with 2n digits (padding with leading zeros if necessary), and then take the middle n digits as your next value. Repeat as necessary.

Code for a 64-bit implementation is shown below:

#include <stdint.h>

uint64_t middle_square(void) {
    static uint64_t state = 0x58c7ca4eb6f0e618;
    __uint128_t squared = state;   /* __uint128_t is a GCC/Clang extension */
    squared *= squared;            /* 64-bit state -> 128-bit square */
    state = squared >> 32;         /* keep bits 32..95, the middle 64 bits */
    return state;
}

If we test this with TestU01's Crush battery, we see good results:

========= Summary results of Crush =========

 Version:          TestU01 1.2.3
 Generator:        Middle Square 64-32
 Number of statistics:  144
 Total CPU time:   00:50:47.42

 All tests were passed

Yet this generator is universally recognized as terrible, and it badly fails BigCrush. And sometimes, in fact much of the time, when we pick other seeds, it doesn't do nearly so well with Crush either. What is going on?

Exploring Tiny Middle Square

To understand a PRNG more deeply, it's often useful to look at a smaller version (fewer bits, fewer states). Thus, instead of 64-bit Middle Square, let's look at 6-bit Middle Square and draw out a graph of its state space. We'll use a node for every state, and an edge for the transition from each state to the next.
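
Such graphs are easy to generate mechanically. As a minimal sketch (my own illustration, not necessarily how the figures here were produced), the following C program prints the 6-bit Middle Square transition graph in Graphviz DOT format, one edge per state; rendering it with dot gives a picture like the one below:

#include <stdio.h>

int main(void) {
    puts("digraph middle_square_6bit {");
    for (unsigned x = 0; x < 64; ++x) {
        /* a 6-bit square has 12 bits; the middle 6 are bits 3..8 */
        unsigned next = (x * x >> 3) & 63;
        printf("    %u -> %u;\n", x, next);
    }
    puts("}");
    return 0;
}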

Graph of 6-bit Middle-Square

From this graph, you can see that it has three cycles: it will eventually get stuck continuously repeating 0, 8, or 27. For some starting points we'll head into a pathological cycle quickly; for others it'll take a bit of time.

It's a similar story if we go up to eight bits:

Graph of 8-bit Middle-Square

Once again, we eventually end up in pathological short cycles.

The short cycle at zero is, in hindsight, fairly obvious (zero squared is zero), but it is unfortunate that many other numbers also map to themselves.

Effecting a Small Repair — Meddle Square

It's tempting to see whether we can come up with a simple tweak to Middle Square that avoids some of the problems with short cycles. So let's make a tiny change to the algorithm: after we square, we complement all the bits. Now we clearly cannot get stuck at zero.
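
In code, the change is a one-liner. Here's a minimal 64-bit sketch of the tweaked generator (the seed is as arbitrary as before):

#include <stdint.h>

uint64_t meddle_square(void) {
    static uint64_t state = 0x58c7ca4eb6f0e618;  /* arbitrary seed */
    __uint128_t squared = state;
    squared *= squared;
    squared = ~squared;        /* the tweak: complement all the bits */
    state = squared >> 32;     /* then take the middle 64 bits as before */
    return state;
}

Here are the results for six bits,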

Graph of 6-bit Meddle-Square

and for eight bits,

Graph of 8-bit Meddle-Square

In these examples the tweak does seem to have avoided tiny cycles, so perhaps we should declare this tinkering effort a modest success. But the problem with tinkering is that the new generator has little in the way of principle governing its design. I just hacked at the code a little, somewhat arbitrarily threw in an extra operation, and hoped for the best. Empirically, the tweak does help: for an arbitrarily chosen seed, the 64-bit version is much less likely to fall into a very short cycle and thereby fail TestU01's Crush benchmark. But we need to step back a bit and think about what we have here.

When discussing terrible PRNGs, Donald Knuth wrote (in The Art of Computer Programming, Volume 2: Seminumerical Algorithms (3rd ed.), Chapter 3, Random Numbers, page 6):

The moral of this story is that random numbers should not be generated with a method chosen at random. Some theory should be used.

Even without a theory, just looking at these graphs should tell us two important and worrisome things. First, some states (those lower down the graph) are clearly more likely to be reached than others, so the generator cannot possibly be uniform; it must be biased. Second, we are at risk of immediate pathological behavior if we happen to end up in a small cycle quickly.
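
We can make the first observation concrete with a quick tally. The sketch below (reusing the 6-bit mapping from the graph-drawing example) counts in-degrees: a uniform, unbiased generator would be a bijection on its state space, with every state having exactly one predecessor.

#include <stdio.h>

int main(void) {
    unsigned indegree[64] = {0};
    for (unsigned x = 0; x < 64; ++x)
        ++indegree[(x * x >> 3) & 63];   /* 6-bit Middle Square step */
    unsigned unreachable = 0, collisions = 0;
    for (unsigned x = 0; x < 64; ++x) {
        if (indegree[x] == 0) ++unreachable;
        if (indegree[x] > 1)  ++collisions;
    }
    printf("%u states are unreachable; %u have multiple predecessors\n",
           unreachable, collisions);
    return 0;
}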

Answers from Theory — Random Mappings

There actually is a theory for this kind of generator: the theory of random mappings, which is discussed in the book Analytic Combinatorics by Philippe Flajolet and Robert Sedgewick (Section VII.3.3, pages 462–467). They say:

Random number generators.

Many (pseudo) random number generators operate by iterating a given function φ over a finite domain E; usually, E is a large integer interval [0 .. N − 1]. Such a scheme produces a pseudo-random sequence u₀, u₁, u₂, …, where u₀ is the “seed” and uₙ₊₁ = φ(uₙ).

Particular strategies are known for the choice of φ, which ensure that the “period” (the maximum of ρ = λ + μ, where λ is the distance to cycle and μ is the cycle’s length) is of the order of N: this is for instance granted by linear congruential generators and feedback register algorithms; see Knuth’s authoritative discussion in [379, Ch. 3]. By contrast, a randomly chosen function φ has expected O(√N) cycle time (Figure VII.7, p. 462), so that it is highly likely to give rise to a poor generator. As the popular adage says: “A random random number generator is bad!”. Accordingly, one can make use of the results of Figure VII.7 and Example VII.10 in order to compare statistical properties of a proposed random number generator to properties of a random function, and discard the former if there is a manifest closeness.

Based on the theory of random mappings, these are the expected values for various parameters of the state graph, assuming N states:

Property                              Expected Value
Number of components                  (log N)/2
Component size                        N/3
Number of cyclic nodes                √(π N/2)
Path length to reach cycle (λ)        √(π N/8)
Cycle length (μ)                      √(π N/8)
Rho length (ρ = λ + μ)                √(π N/2)
Number of nodes of in-degree k        N e⁻¹/k!
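
These predictions are easy to check empirically on a small version. The following sketch (my own experiment, with N = 2^16 chosen just to keep the run fast) measures the average rho length of 16-bit Meddle Square over all starting states; since Meddle Square is one fixed function rather than a truly random one, we shouldn't expect exact agreement, just the right order of magnitude:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

/* 16-bit Meddle Square: square to 32 bits, complement, take the middle 16. */
static uint16_t meddle16(uint16_t x) {
    uint32_t sq = (uint32_t)x * x;
    return (uint16_t)~(sq >> 8);
}

int main(void) {
    const uint32_t N = 1u << 16;
    /* stamp[s] records which seed's walk last visited state s (stored as
       seed + 1, so the zero-initialized array means "not yet visited"). */
    uint32_t *stamp = calloc(N, sizeof *stamp);
    double total_rho = 0.0;
    for (uint32_t seed = 0; seed < N; ++seed) {
        uint16_t x = (uint16_t)seed;
        uint32_t steps = 0;
        while (stamp[x] != seed + 1) {   /* walk until we revisit a state */
            stamp[x] = seed + 1;
            x = meddle16(x);
            ++steps;
        }
        total_rho += steps;   /* distinct states seen = lambda + mu = rho */
    }
    const double pi = 3.14159265358979323846;
    printf("average rho: %.1f   random-mapping prediction: %.1f\n",
           total_rho / N, sqrt(pi * N / 2));
    free(stamp);
    return 0;
}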

One thing this theory tells us is that we can expect cycle length and period to improve as the number of states increases. Thus, for our Meddle Square algorithm, we can surmise the following (the short calculation shown after this list reproduces the figures):

  • When N is 2^64, we can expect to produce 2^32.3 numbers before we start to see repeats, and when repetition does occur it should be on a cycle of size 2^31.3. Thus a 64-bit random mapping should expect to pass TestU01's Crush (but not BigCrush).
  • When N is 2^72 (which can also be computed using only 128-bit math), we can expect to produce 2^36.3 numbers before we start to see repeats, and when repetition does occur it should be on a cycle of size 2^35.3. So a 72-bit random mapping should expect to pass TestU01's BigCrush.
  • When N is 2^96 (which requires us to use 192-bit math to do the squaring, but not fully general 192-bit math because the input to the squaring process is only a 96-bit value), we can expect to produce 2^48.3 numbers before we start to see repeats. That's enough for more than 3.75 petabytes of random data (2.5 petabytes if we return an 8-byte 64-bit value rather than a 12-byte 96-bit value). That's two orders of magnitude larger than the standard 32 TB PractRand test.
  • When N is 2^128 (which requires us to use 256-bit math to do the squaring, but once again the input to the squaring process is smaller, only a 128-bit value), we can expect to produce 2^64.3 numbers before we start to see repeats. That's enough for 320 exabytes of random data, vastly more than any PractRand test will ever consume.
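
For reference, here's the tiny calculation behind those figures (a sketch; the only inputs are the √(π N/2) and √(π N/8) formulas from the table above):

#include <stdio.h>
#include <math.h>

int main(void) {
    const double pi = 3.14159265358979323846;
    int bits[] = { 64, 72, 96, 128 };
    for (int i = 0; i < 4; ++i) {
        /* log2 of sqrt(pi*N/2) and sqrt(pi*N/8) for N = 2^bits */
        double log2_rho   = bits[i] / 2.0 + 0.5 * log2(pi / 2.0);
        double log2_cycle = bits[i] / 2.0 + 0.5 * log2(pi / 8.0);
        printf("N = 2^%-3d  ~2^%.1f outputs before repeating, cycle ~2^%.1f\n",
               bits[i], log2_rho, log2_cycle);
    }
    return 0;
}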

Actual tests do indeed support these expectations. In my experiments, I find that Meddle Square (and other similar variations) passes each of the statistical test suites exactly where the theory of random mappings predicts it should.

On the other hand, the same theory can tell us other things, too, like “with non zero asymptotic probability, the tallest tree in a functional graph is not rooted on the longest cycle”. In other words, the risk that the state-space graph will have an ugly or unexpectedly pathological shape is always there. The untweaked Middle Square algorithm, for example, gets more and more picky about good seeds as it gets bigger. It's hard to be fully confident about just how unlikely it is that we might end up on a small cycle.

And we still have the issue that the generator is biased. It may be hard to detect as the generator gets bigger, but we know from the structure of the state space that the bias is there.

Conclusion

Variations of John von Neumann's Middle Square method pass TestU01's BigCrush when enlarged to 72 bits of state, and pass PractRand at 96 bits.

Here's how I interpret these results:

  • If someone shows you a PRNG with a state space of 96 bits or more and tells you that it passes statistical tests, you should not be impressed by its statistical quality. At that size, you should expect PRNGs to be “too big to fail” statistical tests, even if their core algorithm is decidedly mediocre.
  • If someone shows you a large-state-space generator and tells you that it passes most statistical tests, but fails some, you should be even less impressed.
  • The basic mathematical properties of period and uniformity matter. If someone introduces a new PRNG and they don't show that the generator is uniform/unbiased/1-dimensionally equidistributed (all ways of saying essentially the same thing) or discuss its period, you should be worried.
  • If a generator is a random mapping, there is going to be some risk of bad states and short cycles. If someone offers you such a generator, you should ask how well they have characterized the probability of pathological behavior. Do they know the chance is infinitesimal, or are they just crossing their fingers and hoping: “no trouble... yet”?

But you could take a different standpoint. You could say:

  • If you take something crappy, and you make it so big you can't see that it's crappy any more, maybe it's not crappy. Maybe it's just fine.

If you're a user of SplitMix, you ought to think about these issues. The SplitMix paper doesn't adequately characterize the behavior of its split() operation, yet split() seeds the PRNG with its own output, forming a random mapping. The paper never discusses the consequences of split() for period, uniformity, or the shape of SplitMix's state space, but now you're aware that these topics absolutely need to be addressed. With split(), you have a random mapping, with all of the issues that raises.