In an upcoming post, I'll talk about the downsides of having a trivially predictable PRNG, but let's have a little whimsical fun first.
I first mentioned and gave an overview of PractRand in this blog post. Since then, I've been having “fun” testing with PractRand and the results have shown up in several subsequent posts. You could be having similar fun, too! Let's see how, with a worked example: testing the Mersenne Twister.
As is probably evident, I've been on a bit of a PractRand binge lately. Some of the news hasn't been good, so I'll try to temper the bad news with some better news.
I'm pleased to report that truncated 128-bit multiplicative linear congruential generators (sometimes known as a Lehmer generators) pass PractRand.
PCG allows its generators to optionally select a stream. I discuss the overall concept of streams in the general section on of the site on random number generation here; in essence, streams allow you to choose a generator from a collection of distinct-but-related generators.
Until now I haven't said much about the design decisions behind PCG's streams, but that changes here. At the end I'll also take a brief look at SplitMix's streams.
In my previous post about testing PCG with the PractRand statistical tests, I also ran PractRand on SplitMix. When I started the testing process, I had a bug in my implementation of SplitMix, and the same bug exists in many SplitMix implementations (e.g., at least four Haskell implementations and an OCaml implementation) because of a small but crucial error in the code for SplitMix given in the SplitMix paper. The specific bug does not exist in Oracle's JDK8 implementation of SplitMix (because that code was written before the paper, and not based on the code in paper).
I'm not the first person to discover this bug (e.g, it's documented in an implementation here), but it caught me out.
I had hoped that I could just correct the code to the JDK version and we'd be all good, but as I thought a little more deeply about what the code is doing, I discovered that even the code in JDK version doesn't successfully address the underlying issues. The flaw runs deeper.
Let's take a look...
The table on the front page of this site is a double edged sword. It helps people realize that generators they like may have problems, but it also leads some people to think that I'm just trying to aggressively promote my own work.
So, let's be clear, it's certainly nice if you like my work, but the success or otherwise of the PCG family isn't in any way essential to my own career or life success. It's an unexpected side project that I enjoy working on because I find the topic interesting. PCG is free; I don't get any royalties if you use it, and I won't mind if you use another good generator. If you don't see quality in the same way I do, you're entitled to your opinion. In other posts, such as the ones on randutils, I didn't even use my own generator.
So what are some other good generators? I'll mention a couple here...
There were various questions that people asked when PCG first started being discussed on the Internet; one those questions was whether I had also tested PCG with PractRand. I hadn't. I had done my testing using TestU01, which is considered by most to be the “gold standard”, with the broadest range of tests. Back in 2014, PractRand hadn't been on my radar, but looking into it I discovered that it is TestU01's closest competitor. I put testing with PractRand on my to-do list, but it wasn't a very high priority because I knew that all my PCG family members had been designed to pass TestU01's BigCrush test battery with plenty of headroom, which gave me good reason to believe they would do well in any set of statistical tests.
Finally, this summer I was discussing the best practices for testing random number generators with a colleague and I realized that it was high time for me to finally get around to testing PCG (and a few other recent PRNGs, too!). The short version is, it passes just fine. But for the details, read on…
This site has gone more than two years without an update, but today I'm finally making some updates. It seems appropriate to look back at how we got here, and at some of the things that have happened in the last two years.
If you've been wondering what happened with the PCG paper, this blog post will let you know. Eventually.
Academics don't often write about how papers get written and where their ideas come from, since it's not relevant to the end product, but I think there is value in knowing how things get made, so I'll share the full story of the PCG paper. In hindsight, I think its fate was telegraphed by its origins.
To the extent that anyone cares about C++11's random-number facility at all, the C++ community is polarized between two views—one that likes it, and one that hates it.
The two views are basically these:
- It's amazingly elegant, a shining example of separation of concerns. Pseudo-random engines, random distributions, seeding techniques, and entropy sources are all quite different things. C++11 creates a pluggable and extensible architecture that allows for new RNGs, new custom distributions, and so forth. It's comprehensive and flexible.
- It's horrible to use, feeling unnecessarily overengineered. To do even the simplest task requires significant boilerplate. Things that would be one easy-to-understand line in Python are three or four hard-to-follow lines in C++11. It's completely unsuitable for beginners, and even seasoned programmers hate it.
Both camps are right. But it turns out that we don't need to do very much to fix the situation and (hopefully) leave both camps feeling that they're getting what they want.
In this post, we'll develop the missing C++11 class that ties all the pieces together, making C++11 random number generators easy to use without abandoning any of their power and flexibility.
In two recent posts, I've looked at seeding random number generators in in C++11 (looking at what's wrong with
std::seed_seq and developing something that avoids those flaws), but
seed_seq exists as a mechanism to “mix up” entropy that you give it. You still need to get that entropy from somewhere. So where?
std::random_device Is Not Your Friend
The obvious source for external randomness we can use in seeding is
std::random_device. But as I mentioned in this post,
- It's hard to make
std::random_deviceconform to the requirements of a seed sequence.
- It's unspecified just how costly this “device” is to read from.
- It may not be nondeterministic at all, rendering it unfit for our purposes.
Portable code needs to look to other sources of entropy for RNG seeding. And even if
std::random_device works well, mixing in entropy from other sources can have a variety of benefits, including performance.
Continuing my series of blog posts about seeding random number generators in C++, it's time to take a look at
std::random_device. In this post, we'll see the many ways in which it isn't really your friend.
In my previous post, I talked about issues with C++11's
std::seed_seq. As a reminder, the goal of the
std::seed_seq is to generate seed data that can be used to initialize a random number generator. And to allow it to do its job it needs seed data. Seed data in, seed data out. It might seem like a “do nothing” function would be up to the task, but in many situations the seed data you can provide is of variable quality. Some of it is “good” entropy, which changes often, and some of it is “lower quality” entropy which is more predictable or changes less frequently (e.g., the machine id). It also may be that the data that changes often (e.g., the time) changes more in the low-order bits than the high-order bits. So the goal of
std::seed_seq is to mix together the seed data that we provide so that the high-quality and low-quality data is well-mixed.
In essence, the task that
std::seed_seq does (and thus anything intending to replace it should do) is compute a hash (i.e., a scrambled version) of its input data. The design of hash functions is often (rightly) considered to be something of a black art, but since the same can be said of designing random number generators and I've already done that, I might as well tackle this problem, too.
Properly seeding random number generators doesn't always get the attention it deserves. Quite often, people do a terrible job, supplying low-quality seed data (such as the system time or process id) or combining multiple poor sources in a half-baked way. C++11 provides
std::seed_seq as a way to encourage the use of better seeds, but if you haven't thought about what's really going on when you use it, you may be in for a few surprises.
C++11 provides access to operating-system–provided randomness via
std::random_device, but, strangely, it isn't easy to use it directly to initialize C++'s random number generators. C++'s supplied generators only allow seeding with a
std::seed_seq or a single integer, nothing else. This interface is, in many respects, a mistake, because it means that we are forced to use
seed_seq (the “poor seed fixer”) even when it's not actually necessary.
In this post, we'll see two surprising things:
- Low-quality seeding is harder to “fix” than you might think.
std::seed_seqtries to “fix” high-quality seed data, it actually makes it worse.
On February 18, I gave a talk about at Stanford University, as part of EE380, Computer Systems Colloquium.
The talk provides an overview random number generation in general and the PCG family in particular. It was a good audience and a fun experience. They recorded the talk, and actually had pretty good production values, and then posted it to YouTube.
When I started looking for tools to create websites, I found there were many options. Many of those options were extremely complex, designed for complex authoring needs (e.g., dynamic content, web-based story submission, multiple authors, etc.), and required significant infrastructure. All I wanted was a simple static site, so these solutions were overkill.
All I wanted was a simple file-based static site generator, but most of the options in that space were focused on blogging, although they often claimed that you didn't have to use them that way. Nikola seemed like the best/easiest choice, with clear instructions for how to disable its blogging functions.
But as time has passed since creating the back in October of last year, I've come to realize that I need a place for news, announcements and random musings about randomness that don't easily fit elsewhere on the site. In other words, it needs a blog.
So, today I adjusted Nikola's configuration once again, and with just a few lines of changes, ta-da, we have a blog. Well, actually initially it was a horrible-looking blog because the blog page templates were terrible, but with a some changes to the templates (following a strategy of expedience over elegance), I managed to create a more-or-less acceptable layout.
Anyway, watch this space for more…