Everything You Never Wanted to Know about C++'s random_device

M.E. O'Neill

2015-05-03 20:07

Continuing my series of blog posts about seeding random number generators in C++, it's time to take a look at std::random_device. In this post, we'll see the many ways in which it isn't really your friend.

It Isn't a Seed Sequence

C++11 random number generators can be initialized either with a single integer or with an object that matches the C++11 Seed Sequence concept. In an ideal world, std::random_device would be usable as a Seed Sequence, allowing us to write

std::mt19937 engine{std::random_device{}};

C++11 and C++14 don't allow that. It would certainly be nice if that changes in C++17 but we'll have to see—thus far, I don't think many people are demanding it. Actually, you pretty much can write this code today, but only if you're using Boost, where you'd say

boost::random::mt19937 engine{boost::random::random_device{}};

Some people think they're saying this when they say:

std::mt19937 engine{std::random_device{}()};

But those two extra parentheses make a huge difference. Rather than passing an object representing the random device, in this code we're passing a single integer taken from the random device, with all the problems that entails.

You might think that we could write a wrapper class that turns the random device into a Seed-Sequence–compatible object. Alas, thanks to the strange rules Seed Sequences are supposed to follow, that's quite a challenge.

C++11 Seed Sequences Are Strange

Seed sequences are required to have a param member function defined as follows. For some Seed Sequence S,

r.param(ob) — copies to the given destination [i.e., ob] a sequence of 32-bit units that can be provided to the constructor of a second object of type S, and that would reproduce in that second object a state indistinguishable from the state of the first object.

In other words, even though Seed Sequences aren't required to have a copy constructor, we must be able to create an “indistinguishable” clone of them.

Obviously, you can't create a “indistinguishable clone” of a std::random_device device object—the moment they start giving different outputs, the objects aren't indistinguishable, and if its outputs are nondeterministic, that'll happen immediately.

In general, the Seed Sequence requirements are written to make it clear that a seed sequence represents a fixed set of seed data, not a (nondeterministic) source of seed data. If it were otherwise, all the verbiage about indistinguishable objects wouldn't be there.

A Seed-Sequence Wrapper Seems Possible

Even though std::random_device lacks the features needed for a Seed Sequence, perhaps there is some way of providing a wrapper that makes it Seed-Sequence compatible.

Here are some options:

Not providing the dreaded size and param member functions at all, just the generate function. In other words, failing to meet the Seed-Sequence requirements but hoping that no one will actually notice. This approach happens to work in several implementations (because these functions are utterly unnecessary for initializing a typical RNG), but, technically, this choice renders our code nonportable.
Providing size and param but making them _impractical to use. This is a delightfully legalistic option. The standard makes no claim that the size of the state should have any correspondence to the amount of initial seed data originally provided, so we can claim that the size of our internal state is so vast that no one could ever allocate enough memory to copy it. You _can clone our object, we say, provided you do something impossible first.
Create a wrapper that lives a double life. So long as param is never called, the object behaves nondeterministically using the random device, but the moment we call param in an attempt to clone it, we “switch modes” to use something entirely deterministic, like std::seed_seq. One small challenge to this approach is that param most work on const objects, but our mode switch requires us to change the state of the object.

There is one added complication, however. The standard also says that default-constructed Seed-Sequence objects should also be indistinguishable. Ugh! I will show how to cope with that requirement too, but not in this post.

It Only Produces Outputs One At a Time

Despite their flaws, Seed-Sequence classes do have one convenient feature, their generate function, which can be asked to produce several 32-bit integers at once. In contrast, std::random_device can only generate one integer at a time.

Thus asking for 64 integers may require std::random_device to make 64 operating system calls (which tend to be expensive).

It Has Unknown Cost

How costly it is to read a number from this “device”? That is unspecified. It could, for example, be reading from /dev/random on a Linux system, which can block for an extended period waiting for entropy (which is itself problematic for a variety of reasons).

In fact, when using GCC on Linux, std::random_device will try to use Intel's rdrand instruction. When using Clang with libc++, however, the program instead uses /dev/urandom. The table below shows cycle counts for each option:

	`/dev/urandom`	`rdrand`
First Call	16936	24595
Subsequent Calls	4972	544

On a modern system, the first call to any function takes longer than subsequent calls. Pages need to be faulted into memory, on-time setup costs paid, and so forth. At one level, it's interesting to note that the rdrand code takes longer to get started than the /dev/urandom code—possibly this difference is related to the rdrand code triggering 37 more page faults than the /dev/urandom code. But at another level, we don't know nearly enough about what's happening during this first call and how it might vary from program to program to draw any useful conclusions. Stepping back, it's worthwhile to realize that for a 3.4 GHZ CPU, a one-time cost of 24,595 cycles amounts to 7.2 µs—seven millionths of a second. We could investigate more deeply where this extra time is spent, but it's not really worth the bother—if you spend just one second thinking about it (and you've spent more than that already), that's more time than it'll ever waste for you.

For subsequent calls, we're on a firmer footing for drawing conclusions. We can see that rdrand is much cheaper to call, almost an order of magnitude faster, which shouldn't be a huge surprise. But for the typical use cases for std::random_device (one-time seeding) the difference between 0.16 µs and 1.46 µs is negligible.

This is positive news. On the platforms most people care about, std::random_device is going to be fast (and high quality, too). We just can't depend on that.

It Might Actually Be Deterministic

We have saved the worst problem for last. C++11's std::random_device is not required to be nondeterministic! Implementations can and do implement it as a simple RNG with a fixed seed, so it produces the same output for every run of the program.

If you read the C++ specification, you might briefly think there is hope of detecting such horrible implementations, because std::random_device provides an entropy member function that must return zero when the generator is completely deterministic. But popular libraries (both GCC's libstdc++ and LLVM's libc++) always return zero, even when they're using high-quality external randomness.

These aren't just theoretical issues, as this stackoverflow post reveals.

Conclusion

So, there's good news and bad news. On popular platforms std::random_device works well, and the awkward impedance mismatch between std::random_device and Seed Sequences is something we can work around (even if I haven't given all the details yet).

The worst news is that in portable C++ code, std::random_device may be unfit for its intended purpose. With some platform-testing magic, we can discover known-good platforms, but the possibility of broken platforms still exists (and the most off-beat platforms are the ones most likely to be problematic).

In my opinion, the C++ committee should never have allowed std::random_device to be deterministic—basically that's lying. It would have been far better to have platforms that cannot provide such a device not provide this class at all.

In my next post, I'll discuss how we can augment std::random_device with other sources of entropy.