When I wrote the PCG paper back in 2014, I failed badly when talking about prediction difficulty. When people first started reading the paper and discussing it on the Internet, I realized I had missed the mark, so I made sure that the page on this website about predictability had a more nuanced and measured tone. But today I think even that page didn't make the points clearly enough. Possibly I'll fail again today, but let's have another go at articulating the issues.
A major point I made in the PCG paper was that we get useful insights by testing small versions of PRNGs. These mini versions may be too small to be useful in practice, but their behavior gives us a good sense of the structure of their larger counterparts.
In this post, we'll look at 16-bit versions of several popular PRNGs, and draw some “randomgrams” to get a sense of their structure. In particular, we're going to look at the pattern of occurrences of pairs in the output; in other words, which pairs of outputs occur exactly once, which never occur at all, and which occur multiple times.
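As a concrete sketch of the tallying behind such a randomgram, here's one way to count consecutive-pair occurrences over a full period. The generator below is an illustrative full-period 16-bit LCG with 8-bit truncated output, not one of the generators examined in the post:

```cpp
#include <array>
#include <cstdint>

// Tally pairs of consecutive outputs over one full period. Each cell
// (prev, out) of the 256x256 table is one pixel of the "randomgram":
// 0 occurrences, 1 occurrence, or 2+ occurrences.
std::array<uint32_t, 256 * 256> pair_counts() {
    std::array<uint32_t, 256 * 256> counts{};
    uint16_t state = 1;
    uint8_t prev = state >> 8;
    for (uint32_t i = 0; i < 65536; ++i) {
        state = uint16_t(state * 25173u + 13849u);  // classic 16-bit LCG
        uint8_t out = state >> 8;                   // high 8 bits as output
        counts[prev * 256u + out]++;
        prev = out;
    }
    return counts;
}
```

Because the LCG has full period, the counts sum to exactly 65536, and the shape of the zero/one/many pattern is what the randomgram visualizes.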
For many PRNGs, the more state bits you give them, the deeper statistical tests need to go to discover their flaws. We'll explore what this phenomenon means, looking at one of the earliest PRNGs ever made, John von Neumann's “Middle Square” method.
Continuing my recent burst of PRNG testing with PractRand, here are a few more passes, this time for the generators I mentioned in my post on reasonable alternatives to PCG.
- XorShift* 64/32 (i.e., high 32 bits of 64-bit XorShift*)
- XorShift* 128/64 (i.e., high 64 bits of 128-bit XorShift*)
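For reference, here's a minimal sketch of the first of these, using the widely used shift triple (12, 25, 27) and Vigna's multiplier; the state must be seeded nonzero:

```cpp
#include <cstdint>

// XorShift* 64/32: a 64-bit xorshift step followed by a multiplication,
// returning only the high 32 bits of the product as output.
struct xorshift64star32 {
    uint64_t state;  // must never be zero
    explicit xorshift64star32(uint64_t seed) : state(seed ? seed : 1) {}
    uint32_t next() {
        state ^= state >> 12;
        state ^= state << 25;
        state ^= state >> 27;
        return uint32_t((state * 0x2545F4914F6CDD1DULL) >> 32);
    }
};
```

Truncating to the high half is what discards the weak low-order bits of the underlying xorshift sequence.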
None of these results should be a surprise. In the PCG paper, I looked at truncated XorShift* generators and observed that they passed TestU01 with plenty of headroom. Nevertheless, it's nice to confirm.
Also, none of these generators are trivially predictable (but obviously the smaller XorShift versions can be fairly quickly brute forced). ChaCha obviously has much stronger prediction difficulty, given that with more rounds it is considered cryptographically secure. When simply trying to protect against algorithmic complexity attacks, I think four rounds is fine.
In an upcoming post, I'll talk about the downsides of having a trivially predictable PRNG, but let's have a little whimsical fun first.
I first mentioned and gave an overview of PractRand in this blog post. Since then, I've been having “fun” testing with PractRand and the results have shown up in several subsequent posts. You could be having similar fun, too! Let's see how, with a worked example: testing the Mersenne Twister.
As is probably evident, I've been on a bit of a PractRand binge lately. Some of the news hasn't been good, so I'll try to temper the bad news with some better news.
I'm pleased to report that truncated 128-bit multiplicative linear congruential generators (sometimes known as Lehmer generators) pass PractRand.
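As a sketch of what such a generator looks like (using GCC/Clang's __uint128_t extension; treat the specific multiplier constant below as illustrative rather than a recommendation):

```cpp
#include <cstdint>

// Truncated 128-bit MCG (Lehmer generator): multiply the odd 128-bit
// state by a fixed constant mod 2^128, output the high 64 bits.
struct mcg128 {
    __uint128_t state;
    explicit mcg128(__uint128_t seed) : state(seed | 1) {}  // must be odd
    uint64_t next() {
        const __uint128_t mult =
            (__uint128_t(0x12E15E35B500F16EULL) << 64) | 0x2E714EB2B37916A5ULL;
        state *= mult;
        return uint64_t(state >> 64);
    }
};
```

Because the modulus is a power of two, the low bits of the state have short periods; returning only the high half is what makes the output statistically sound.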
PCG allows its generators to optionally select a stream. I discuss the overall concept of streams in the general section of the site on random number generation here; in essence, streams allow you to choose a generator from a collection of distinct-but-related generators.
Until now I haven't said much about the design decisions behind PCG's streams, but that changes here. At the end I'll also take a brief look at SplitMix's streams.
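To give the flavor of the underlying mechanism (a deliberate simplification, not PCG's actual stream code), in an LCG-based design each stream can be selected by choosing a different odd additive constant, while the multiplier stays fixed:

```cpp
#include <cstdint>

// Streams via the LCG increment: each odd increment selects a
// distinct-but-related generator sharing the same multiplier.
struct lcg32_stream {
    uint64_t state;
    uint64_t inc;  // always odd; identifies the stream
    lcg32_stream(uint64_t seed, uint64_t stream)
        : state(seed), inc((stream << 1) | 1) {}
    uint32_t next() {
        state = state * 6364136223846793005ULL + inc;  // 64-bit LCG step
        return uint32_t(state >> 32);                  // truncated output
    }
};
```

Shifting the stream id left before setting the low bit means every distinct stream id maps to a distinct odd increment.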
In my previous post about testing PCG with the PractRand statistical tests, I also ran PractRand on SplitMix. When I started the testing process, I had a bug in my implementation of SplitMix, and the same bug exists in many SplitMix implementations (e.g., at least four Haskell implementations and an OCaml implementation) because of a small but crucial error in the code for SplitMix given in the SplitMix paper. The specific bug does not exist in Oracle's JDK8 implementation of SplitMix (because that code was written before the paper, and not based on the code in the paper).
I'm not the first person to discover this bug (e.g., it's documented in an implementation here), but it caught me out.
I had hoped that I could just correct the code to the JDK version and we'd be all good, but as I thought a little more deeply about what the code is doing, I discovered that even the code in the JDK version doesn't successfully address the underlying issues. The flaw runs deeper.
Let's take a look...
The table on the front page of this site is a double-edged sword. It helps people realize that generators they like may have problems, but it also leads some people to think that I'm just trying to aggressively promote my own work.
So, let's be clear, it's certainly nice if you like my work, but the success or otherwise of the PCG family isn't in any way essential to my own career or life success. It's an unexpected side project that I enjoy working on because I find the topic interesting. PCG is free; I don't get any royalties if you use it, and I won't mind if you use another good generator. If you don't see quality in the same way I do, you're entitled to your opinion. In other posts, such as the ones on randutils, I didn't even use my own generator.
So what are some other good generators? I'll mention a couple here...
There were various questions that people asked when PCG first started being discussed on the Internet; one of those questions was whether I had also tested PCG with PractRand. I hadn't. I had done my testing using TestU01, which is considered by most to be the “gold standard”, with the broadest range of tests. Back in 2014, PractRand hadn't been on my radar, but looking into it I discovered that it is TestU01's closest competitor. I put testing with PractRand on my to-do list, but it wasn't a very high priority because I knew that all my PCG family members had been designed to pass TestU01's BigCrush test battery with plenty of headroom, which gave me good reason to believe they would do well in any set of statistical tests.
Finally, this summer I was discussing the best practices for testing random number generators with a colleague and I realized that it was high time for me to finally get around to testing PCG (and a few other recent PRNGs, too!). The short version is, it passes just fine. But for the details, read on…
This site has gone more than two years without an update, but today I'm finally making some updates. It seems appropriate to look back at how we got here, and at some of the things that have happened in the last two years.
If you've been wondering what happened with the PCG paper, this blog post will let you know. Eventually.
Academics don't often write about how papers get written and where their ideas come from, since it's not relevant to the end product, but I think there is value in knowing how things get made, so I'll share the full story of the PCG paper. In hindsight, I think its fate was telegraphed by its origins.
To the extent that anyone cares about C++11's random-number facility at all, the C++ community is polarized between two views—one that likes it, and one that hates it.
The two views are basically these:
- It's amazingly elegant, a shining example of separation of concerns. Pseudo-random engines, random distributions, seeding techniques, and entropy sources are all quite different things. C++11 creates a pluggable and extensible architecture that allows for new RNGs, new custom distributions, and so forth. It's comprehensive and flexible.
- It's horrible to use, feeling unnecessarily overengineered. To do even the simplest task requires significant boilerplate. Things that would be one easy-to-understand line in Python are three or four hard-to-follow lines in C++11. It's completely unsuitable for beginners, and even seasoned programmers hate it.
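To make the second complaint concrete, compare rolling a die. In Python it's a single call to random.randint(1, 6); the idiomatic C++11 equivalent takes several steps (folded into one function here to mirror the complaint; real code would keep the engine around rather than reconstructing it per call):

```cpp
#include <random>

// The C++11 incantation for one die roll: obtain entropy, seed an
// engine with it, construct a distribution, then invoke it.
int roll_die() {
    std::random_device rd;                          // entropy source
    std::mt19937 gen(rd());                         // engine, seeded
    std::uniform_int_distribution<int> dist(1, 6);  // distribution object
    return dist(gen);                               // finally, a value
}
```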
Both camps are right. But it turns out that we don't need to do very much to fix the situation and (hopefully) leave both camps feeling that they're getting what they want.
In this post, we'll develop the missing C++11 class that ties all the pieces together, making C++11 random number generators easy to use without abandoning any of their power and flexibility.
In two recent posts, I've looked at seeding random number generators in C++11 (looking at what's wrong with std::seed_seq and developing something that avoids those flaws), but seed_seq exists as a mechanism to “mix up” entropy that you give it. You still need to get that entropy from somewhere. So where?
std::random_device Is Not Your Friend
The obvious source for external randomness we can use in seeding is std::random_device. But as I mentioned in this post,
- It's hard to make std::random_device conform to the requirements of a seed sequence.
- It's unspecified just how costly this “device” is to read from.
- It may not be nondeterministic at all, rendering it unfit for our purposes.
Portable code needs to look to other sources of entropy for RNG seeding. And even if std::random_device works well, mixing in entropy from other sources can have a variety of benefits, including performance.
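One way to do such mixing is to feed several weak-but-independent sources into a std::seed_seq. None of these sources is strong on its own; the hope is that together they are harder to predict than any single one. (The particular sources chosen here are illustrative.)

```cpp
#include <chrono>
#include <cstdint>
#include <functional>
#include <random>
#include <thread>

// Combine several entropy sources (the OS device, a high-resolution
// clock, the thread id, and an ASLR-dependent stack address) into a
// seed_seq, then use it to seed an engine.
std::mt19937 make_seeded_engine() {
    std::random_device rd;
    auto now = std::chrono::high_resolution_clock::now()
                   .time_since_epoch().count();
    auto tid = std::hash<std::thread::id>{}(std::this_thread::get_id());
    int local = 0;
    auto addr = reinterpret_cast<std::uintptr_t>(&local);
    std::seed_seq seq{std::seed_seq::result_type(rd()),
                      std::seed_seq::result_type(now),
                      std::seed_seq::result_type(tid),
                      std::seed_seq::result_type(addr)};
    return std::mt19937(seq);
}
```

Note that this still routes everything through seed_seq's mixing function, so it inherits whatever flaws seed_seq has; the earlier posts in this series discuss alternatives.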
Continuing my series of blog posts about seeding random number generators in C++, it's time to take a look at std::random_device. In this post, we'll see the many ways in which it isn't really your friend.