The vast majority of my posts about random number generation have focused on looking at the properties of different generation schemes. But, perhaps surprisingly, the performance of your randomized algorithm may hinge not on the generation scheme you chose, but on other factors. In this post (inspired by and building on an excellent recent paper by Daniel Lemire), we'll explore a common source of overhead in random number generation that frequently outweighs PRNG engine performance.
Nearly a year ago, I wrote a post suggesting some good alternatives to PCG. As part of that post, I included links to C++ implementations of those PRNGs, following C++-11 conventions (although omitting rarely-used features such as state I/O).
Since that post, I've written my own C++ implementations of a few more generation schemes. Links and brief descriptions below.
In a recent post, I noted that the Vigna and Blackman's new Xoshiro family of PRNGs is prone to some implausible repeats. I hinted then that I'd write a longer article providing a more in-depth discussion of the issues. This is that article.
We'll begin with a slightly deeper exploration of zeroland (which is how I discovered the issue), see how that leads us to find Xoshiro's near-repeats problems, and, finally, determine just how significant the problem is.
My two favorite statistical test suites for PRNGs are PractRand and TestU01, but while they detect statistical flaws of many kinds, detecting those flaws often takes a long time to run, and for various reasons some flaws take a very long time to detect (if they are detected at all). In this post, we'll put together a test based around a very classic example from the world of probability theory, the Birthday Problem.
In doing so, we'll develop a test that can detect deviations from random behavior in some widely used PRNGs. We'll eventually be able to detect flaws in
std::minstd_rand and XorShift 32 in 0.01 seconds, and SplitMix, XorShift 64 and Xoroshiro64* in less than half an hour.
In my previous post, I looked at Bob Jenkins's JSF PRNG which has, at its heart, a random invertible mapping rather than a random (noninvertible) mapping.
I discussed the theory of random (noninvertible) mappings in my post “Too Big To Fail”, which looked at the behavior generators based on this idea. Although random (noninvertible) mappings produce PRNGs that I consider needlessly flawed (for general-purpose use), their behavior is at least well characterized in the academic literature, particularly by Philippe Flajolet (e.g., Analytic Combinatorics, section VII.3.3 pages 462-467).
Random invertible mappings have different properties, being based on a random bijection (which means that the generator can tick backwards as well as forwards if desired), whereas a random (noninvertible) mapping merely requires a random function (which may not be able to go backwards because there is no guarantee that there is only one place we could have come from).
If I were to search the literature, I would not be surprised to find that someone else has done a similar analysis of random invertible mappings. But a quick Google search revealed nothing, and, for me at least, deriving results myself is far more fun than searching for results derived by others, and often a good deal faster if there are no immediate leads. So that's what I've done. Now, hopefully, if someone Googles it, they'll find something. But if you have a good source to cite, please contact me and I'll update this article.
I've been chatting on and off with David Blackman since August, 2017. Over our various conversations, I've gained huge respect for him and his contributions to random number generation over the last 15 years. In the course of a recent conversation I asked him about some of his favorite random number generators, and one of the ones he mentioned was A Small Noncryptographic PRNG by Bob Jenkins. Even though I had previously been aware of some of Bob Jenkins's other work regarding hash functions, somehow his work on noncryptographic random number generation fell through the cracks when I was surveying things that were out there back in 2014. I'll remedy that omission now.
That day, he also submitted a link to his critique to Reddit (archived here). I think it is fair to say that his remarks did not get quite the reception he might have hoped for. Readers mostly seemed to infer a certain animosity in his tone and his criticisms gained little traction with that audience.
Although I'm pleased to see readers of Reddit thinking critically about these things, it is worth taking the time to dive in and see what what lessons we can learn from all of this.
It's now been a week since David Blackman and Sebastiano Vigna
announced new members of the Xoroshiro family. Although I have been busy with a number of other matters, I recognize that interest in these new PRNGs is likely to be high right now, so I have managed to grab a few stolen moments here and there to take a look at their new work. I plan to write a longer post soon, but my preliminary investigations have turned up enough surprising things that I feel like it's worth sharing some of my discoveries. As with my previous post on these matters, I'll focus in on their best PRNG,
On May 4, David Blackman and Sebastiano Vigna announced new members of
the Xoroshiro family and a new test
for random number generators (based on the z9 test from
gjrand) that their previous
work fails, all described in a new
paper. They claim to have now developed an “all-purpose, rock-solid generator”. In this post, having had less than a day to review their work, I'll present a few preliminary thoughts on this news, mostly looking at their best new generator,
Although I know a lot of effort went into Xoroshiro128+, and there are many good things that have come out of its development, I am sad to say that on balance I feel it has too many flaws to be worth recommending—there are many better choices. In this post, I'll dig a little deeper into some of its flaws.
Let's begin with what we already know:
- John D. Cook showed that it fails PractRand with just a couple of seconds of testing, and before that Chris Doty-Humphrey, author of PractRand, showed that it didn't just fail “binary rank” tests, it also failed the DC6 test which is a short-medium range linear test.
- Daniel Lemire showed that it fails TestU01.
- I showed that it is trivially predictable.
- I showed that it is visualizing smaller-scale versions of the scheme shows clear flaws.
But are the flaws superficial and easily ignored, or more troubling than that?
Suppose that you've written (or just seen someone else announce) a brand new PRNG. Cool! It's nice to have new things. But here's the question you should ask yourself; “Is it better than a reasonable ‘minimal standard’? Does it beat methods devised more than sixty years ago?”
When I wrote the PCG paper back in 2014, I failed really badly when talking about prediction difficulty. When people first started reading the paper and discussing it on the Internet, I realized I had missed the mark, and made sure that the page on this website about predictability had a more nuanced and measured tone, but today I think even that page didn't make the points clearly enough. Possibly I'll fail again today, but let's have another go at trying to articulate the issues.
A major point I made in the PCG paper was that we get useful insights by testing small versions of PRNGs. These mini versions may be too small to be useful in practice, but their structure will give us a good sense about the structure of their larger counterparts.
In this post, we'll look at 16-bit versions of several popular PRNGs, and draw some “randomgrams” to get a sense of their structure. In particular, we're going to look at the pattern of occurrences of pairs in the output, in other words which pairs of outputs occur once, which never occur at all, and which occur multiple times.
For many PRNGs, the more state bits you give them, the deeper statistical tests need to go to discover their flaws. We'll explore what this phenomenon means, looking at one of the earliest PRNGs ever made, John von Neumann's “Middle Square” method.
Continuing my recent burst of PRNG testing with PractRand, here are a few more passes, this time for the generators I mentioned in my post on reasonable alternatives to PCG.
- XorShift* 64/32 (i.e., high 32 bits of 64-bit XorShift*)
- XorShift* 128/64 (i.e., high 64 bits of 128-bit XorShift*)
None of these results should be a surprise. In the PCG paper, I looked at truncated XorShift* generators and onserved that they passed TestU01 with plenty of headroom. Nevetheless, it's nice to confirm.
Also, none of these generators are trivially predictable (but obviously the smaller XorShift versions can be fairly quickly brute forced). ChaCha obviously has much stronger prediction difficulty, given that with more rounds it is considered cryptographically secure. When simply trying to protect against algorithmic complexity attacks, I think four rounds is fine.