Simple Portable C++ Seed Entropy

In two recent posts, I've looked at seeding random number generators in in C++11 (looking at what's wrong with std::seed_seq and developing something that avoids those flaws), but seed_seq exists as a mechanism to “mix up” entropy that you give it. You still need to get that entropy from somewhere. So where?

Sadly std::random_device Is Not Your Friend

The obvious source for external randomness we can use in seeding is std::random_device. But as I mentioned in this post,

  • It's hard to make std::random_device conform to the requirements of a seed sequence.
  • It's unspecified just how costly this “device” is to read from.
  • It may not be nondeterministic at all, rendering it unfit for our purposes.

Portable code needs to look to other sources of entropy for RNG seeding. And even if std::random_device works well, mixing in entropy from other sources can have a variety of benefits, including performance.

Other (Cheap) Sources of Entropy

If we can't rely on std::random_device, what can we use? We will survey our options, focusing on low-cost and highly portable choices (rather than, say, trying to connect to random.org on the Internet).

Entropy from the Time

The current time has a long history as a source of seed entropy. Vast numbers of C programs use

srand(time(NULL));

to initialize the C RNG. In C++11 we can do better than the venerable C time function though, because we have std::chrono::high_resolution_clock. In fact, if we use the LLVM or GNU C++ libraries, we find that the (claimed) resolution of this clock is one nanosecond. Nanosecond resolution means that no calls that occur one after the other will ever see the same value, and makes it extremely unlikely that calls in programs executing at (almost) the same moment will observe the same value. With the low-order 32 bits cycling through 232 values every 4.3 seconds we can easily claim that a high-resolution clock can give a typical program 32 bits of entropy.

Seeding with Just the Clock

Let's see what happens if we seed with just a nanosecond resolution timer. First, let's try std::seed_seq and see if there seems to be any pattern to its output:

uint32_t seedseq_random_using_clock()
{
    uint64_t seed = std::chrono::high_resolution_clock::
                                        now().time_since_epoch().count();
    std::seed_seq seeder{uint32_t(seed),uint32_t(seed >> 32)};
    ++seed;
    int out;
    seeder.generate(&out,&out+1);
    return out;
}

If we test this generator with TestU01's SmallCrush battery, we get these rather disappointing results:

========= Summary results of SmallCrush =========

 Version:          TestU01 1.2.3
 Generator:        std::seedseq (using std::chrono::high_resolution_clock)
 Number of statistics:  15
 Total CPU time:   00:01:15.25
 The following tests gave p-values outside [0.001, 0.9990]:
 (eps  means a value < 1.0e-300):
 (eps1 means a value < 1.0e-15):

       Test                          p-value
 ----------------------------------------------
  1  BirthdaySpacings                 eps  
  2  Collision                        eps  
  3  Gap                              eps  
  4  SimpPoker                        eps  
  5  CouponCollector                  eps  
  6  MaxOft AD                      1 - 2.1e-11
  7  WeightDistrib                    eps  
 ----------------------------------------------
 All other tests were passed

In other words, if we use std::seed_seq to seed two RNGs at nearby times, there will be a discernible pattern in their seeds.

Now let's try randutils::seed_seq_fe128 instead. not only does it pass SmallCrush, it passes Crush and BigCrush as well:

========= Summary results of BigCrush =========

 Version:          TestU01 1.2.3
 Generator:        randutils::seed_seq_fe128 (using std::chrono::high_resolution_clock)
 Number of statistics:  160
 Total CPU time:   12:02:01.31

 All tests were passed

Suitably scrambled, sampled at high enough resolution, the time alone is good enough to provide workable statistically good seeds. But before we celebrate, we need to look a little closer.

Troubles with the Time

There are some issues with using C++11's high_resolution_clock. One is that libraries lie. The GNU libstdc++ library claims that its high-resolution clock ticks every nanosecond, but on some platforms, including OS X, the underlying implementation uses a POSIX gettimeofday system call which only has microsecond resolution (even though better alternatives exist).

And even if your C++ library tells the truth, there is no requirement that the high-resolution clock be especially high resolution. You could be using a C++ library where the clock only ticks once a second (or even once a year!).

So std::chrono::high_resolution_clock might be a good (if limited) source of entropy (perhaps even able to do all the heavy lifting by itself), but it might be merely mediocre. It can't be our only source.

Entropy from the Memory Configuration

In C and C++ we can find out where in memory our variables and functions are stored. Thus, we can write programs like this one, which prints out where things are in memory:

#include <iostream>
#include <iomanip>
#include <cstddef>
#include <cstdlib>

static void lf()
{
    // Do nothing
}

static int zsv;
static int isv = 1;

int main()
{
    int lv;
    void* mem = malloc(1);
    free(mem);
    std::cout << std::hex;
    std::cout << "Local Variable             " << (void*)&lv    << std::endl;
    std::cout << "Heap Memory                " << mem           << std::endl;
    std::cout << "Static Variable (Zeroed)   " << (void*)&zsv   << std::endl;
    std::cout << "Static Variable (Inited)   " << (void*)&isv   << std::endl;
    std::cout << "Local Function             " << (void*)&lf    << std::endl;
    std::cout << "Library Function           " << (void*)&_Exit << std::endl;
}

If we run this program on OS X, we see the following results for three different runs:

Kind Run 1 Run 2 Run 3
Local Variable 0x7fff5ea636f4 0x7fff5dd726f4 0x7fff50caf6f4
Heap Memory 0x7fa7d1c04bb0 0x7f8eb3404bb0 0x7ffb6ac04bb0
Static Variable (Zeroed) 0x10119e0cc 0x101e8f0cc 0x10ef520cc
Static Variable (Inited) 0x10119e0c8 0x101e8f0c8 0x10ef520c8
Local Function 0x10119d8a0 0x101e8e8a0 0x10ef518a0
Library Function 0x7fff94c4256a 0x7fff94c4256a 0x7fff94c4256a

Notice that almost everything is in a different location each run. A few years ago, the results would have been different—the memory addresses of everything would be the same every time—but today's operating systems improve security with address-space–layout randomization (ASLR). Not everything changes—we can see that in all cases the high-order bits and the low-order bits often stay the same. In addition, the address of the library function stayed constant, but even that would change if we rebooted (or if we set the environment variable DYLD_SHARED_REGION=avoid).

On Linux, by default, the memory map isn't quite as random:

Kind Run 1 Run 2 Run 3
Local Variable 0x7fffa0b322cc 0x7fff8cff3bdc 0x7fffc8e07e4c
Heap Memory 0x2320c20 0x7aac20 0x249cc20
Static Variable (Zeroed) 0x6021b8 0x6021b8 0x6021b8
Static Variable (Inited) 0x602098 0x602098 0x602098
Local Function 0x400db0 0x400db0 0x400db0
Library Function 0x400950 0x400950 0x400950

In these Linux tests, the stack and the heap have moved around but all the code has stayed in the exact same place. Linux can move libraries and application code around as well, but does not do so by default, choosing to do so only for applications where security is considered to be important. The claimed reason for limiting ASLR and keeping code at fixed addresses is one of efficiency—position-independent executables purportedly adds a small amount of additional overhead. Although both OS X and OpenBSD consider the overhead acceptable for the benefit of making attacks difficult, mainstream Linux distributions apparently do not.

But we can override Linux's defaults and compile position-independent code by running by using the compiler arguments -pie -fpie:

Kind Run 1 Run 2 Run 3
Local Variable 0x7ffd6d03e18c 0x7ffec84f456c 0x7fff745b05bc
Heap Memory 0x7f47ef07ac20 0x7f89b0f24c20 0x7fbddf6e4c20
Static Variable (Zeroed) 0x7f47edf2f09c 0x7f89b002009c 0x7fbddf50f09c
Static Variable (Inited) 0x7f47edf2f090 0x7f89b0020090 0x7fbddf50f090
Local Function 0x7f47edd2df70 0x7f89afe1ef70 0x7fbddf30df70
Library Function 0x7f47ecf6b2d0 0x7f89af05c2d0 0x7fbdde54b2d0

Here we can see that everything is moving around, but it doesn't actually seem like the total entropy has changed, since the heap, static data, and functions all seem to be similarly located.

(Looking at startup performance with LD_DEBUG=statistics [and DYLD_PRINT_STATISTICS on OS X], I could find no appreciable performance difference between the position-independent and regular versions, which makes me think that OS X and OpenBSD have it right, and most Linux distributions have it wrong.)

Of course, it's possible that our code won't run on a system that has ASLR. But even on those systems, what is on the stack and on the heap may vary depending on exactly what has previously happened during the program's execution. And different threads will have different stacks, so it provides a mechanism to give different threads different entropy even if other parts of the entropy stay the same (e.g., checking the time at the exact same moment).

Thus the memory configuration will provide some entropy on all systems, and on systems with ASLR, we can hope it's providing about 32 bits worth (even if some of that is baked in at compile time rather than runtime).

Entropy from the CPU

Many CPUs have some kind of cycle counter. Although C++ provides no portable way to read such a counter, there are compiler-specific mechanisms.

  • Clang provides the intrinsic function __builtin_readcyclecounter as a platform-independent way to access such a counter. (Although on platforms that do not provide an accessible cycle counter, such as ARM, it always returns zero.)
  • Intel provides a header file <immintrin.h> that declares several useful X86-specific intrinsics:
    • __rdtsc reads the timestamp counter (essentially equivalent to __builtin_readcyclecounter).
    • _rdrand64_step provides a 64-bit cryptographically secure random number. Unfortunately, this facility only exists in Ivy Bridge (and later) Intel CPUs. Also some (possibly rather paranoid) people have expressed concern that this facility may have been somehow subverted.
    • _rdseed64_step provides a 64-bit “true” random number for use in seeding (generated from thermal noise). Unfortunately, this facility only exists in Broadwell (and later) Intel CPUs.

On the one hand, some of these sources seem to be truly excellent. On the other, the best ones are the least likely to be available. Even if we're only developing for Intel machines, we can't necessarily assume our hardware has Intel's randomness instructions (Sandy Bridge machines are still commonplace, for example, the previous generation Mac Pro).

For portable code, perhaps we can count on 32 bits of entropy from a cycle counter, but not much else. But, like nanosecond-resolution time, the Intel cycle counter alone is sufficient to pass TestU01's BigCrush battery provided we scramble the bits sufficiently (e.g., with seed_seq_fe128).

Entropy from the Operating System

Although std::random_device seems like the obvious source for operating system for entropy, there are some other traditional options. One of the classics for seeding RNGs on a Unix system is to use getpid() to get the process id (PID), a small integer which should vary from run to run. Although getpid is a Unix-ism, Microsoft Windows provides a similar call.

C++11 does, however, provide something remarkably similar for portable code—the thread id—which we can obtain via std::this_thread::get_id(). Unfortunately, for some compilers/platforms, its value doesn't seem to vary from run to run as much as the PID does.

Entropy from Local State

Some of our entropy sources change every time we run the program, but remain the same for the duration of the run. Thus it's useful to also have an entropy source that explicitly changes every time we examine it. A simple counter can do that task.

Design Goals & Resulting Class

Ordinary programmers shouldn't have to think about all these considerations. The whole point of libraries should be to shield everyday programmers from worrying about unnecessary details. Ideally, std::random_device would have been designed to serve as an entropy source (and work as a Seed Sequence), but since that's not the case, we can at least provide the missing functionality.

My way of wrapping up the process of gathering entropy for RNG seeding is to provide a class template auto_seeded that can initialize any Seed Sequence using a combination of the entropy sources I've mentioned in this blog post. For some class S, auto_seeded<S> derives from S but replaces the default constructor with a constructor that initializes S with some appropriately chosen entropy.

You can find the code in randutils.hpp, which includes the code from this post and others in this series.

Thus, to create a Mersenne Twister engine with an arbitrary seed that varies from run to run, we can write

randutils::auto_seeded<std::seed_seq> seeds;
std::mt19937 rng{seeds};

As we've previously learned, std::seed_seq has some issues, so we might want to prefer the alternative, seed_seq_fe, that we developed in the previous blog post. Thus the code also provides auto_seed_128 and auto_seed_256 as handy aliases for auto_seeded<seed_seq_fe128> and auto_seeded<seed_seq_fe256>, allowing us to write:

randutils::auto_seed_128 seeds;
std::mt19937 rng{seeds};

Getting Technical

You might hope that we could compress the two lines of our previous example into a single line, and write it as

std::mt19937 rng{randutils::auto_seed_128{}};

Unfortunately, trying to use a temporary object here doesn't work, due to another minor glitch in the C++11 standard. If we try it, we'll get this error:

error: no matching constructor for initialization of 'std::mt19937'
    std::mt19937 rng{randutils::auto_seed_128{}};
                 ^  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/c++/v1/random:2113:18: note: 
      candidate constructor [with _Sseq = randutils::auto_seed_128] not viable:
      expects an l-value for 1st argument

The glitch in the standard is that Seed Sequences are passed by l-value reference, and you can't pass a temporary object as an l-value reference (or at least you can't without doing some extra work). The C++ committee should have made the seed parameter a universal reference and then we'd have been all good.

But there is actually a bigger issue than this one. The two lines of code I showed you are (technically) wrong. Once again, C++11's strange requirements of Seed Sequences bite us. For any legitimate Seed Sequence S, auto_seeded<S> is technically not a Seed Sequence, because the requirements state that the default constructor for a Seed Sequence

Creates a seed sequence with the same initial state as all other default-constructed seed sequences [of that type]

In other words, all default constructed seed sequences must have the same state! The whole point of our default constructor was to create seed sequences with (nondeterministically) different states.

Thankfully, there is a solution—we can just cast the object back to its base class which does have a dumb always-the-same default constructor. I thus provide a base method to do just that. And, as a bonus, we get something nice for free, the ability to do all the initialization on one line using a temporary object (because the base method does return a l-value reference!).

Thus the code actually becomes

std::mt19937 rng{randutils::auto_seed_128{}.base()};

Not the most beautiful thing, perhaps, but it is technically correct.

Testing and Performance

Here's a quick test program to show off the resulting class:

#include "randutils.hpp"

#include <cstdio>
#include <cstdint>
#include <cstddef>

int main()
{
    uint32_t seeds[8];

    for (int i = 0; i < 16; ++i) {
        randutils::auto_seed_256 seeder;
        seeder.generate(seeds, seeds+8);
        bool first = true;
        for (uint32_t seed : seeds) {
            printf("%s%08x", first ? "" : ", ", seed);
            first = false;
        }
        printf("\n");
    }
}

Here's one run:

670e5a45, 9ac9ea9c, 88853620, c4c836f9, 07ab5640, b20b313e, 7b945051, 37f50e84
51cb6dd9, b2e34e8a, 8cc4f78e, 6530b128, 1a5eab3c, 0adae353, e3c1587b, dec6000e
45f869e9, 9c802b0a, db790af6, 6ced810c, e8a2a513, 09e6de7c, 87a55cea, 3b67b4a6
7d171d66, e53dd7e9, cc09c87e, 32672c79, 87f427a5, 079b721f, 7e844793, 4729f207
a0db8507, 4ed656d4, 4bc9cc88, bd723730, bb11303d, 979ac968, 34bdc61c, 389d75af
b4840033, e7176d4a, c0a01431, 52e58522, 2ff42863, b4bde340, 34177446, 56b100b6
fc555da1, 56888d72, 2a9c5000, a5a1a01a, 07fa8cfe, a949b563, 1c31c4bd, 37d45aef
4aad1d58, 5cfae50e, e37ecc55, efe03ba6, 6dc0f415, 93fa0f9e, 84c9fa62, b340c696
6767ca87, b0e54c11, 89045654, b889850a, 9aabe2bb, 45628c06, 46d666d6, 0eb4ad37
3273cef7, d0469f31, 759e4dd3, 6fecd66b, b7d350a7, b758b04f, 230c3361, 83cc1a0b
eb0a4104, b99d0441, c69d8737, 40fd935a, 695f7321, 0f71b9bd, 7b05e99f, 02e60d01
0ee38480, 5649ed46, b770579c, c627516b, 6510e5f3, 11df41f0, 5d287c54, 771f7a23
af81b6bb, e437c6df, f062227e, 165653f2, 4be238a8, 8f19f0c1, 001aa816, 2fef1cfb
e088cb5f, 126fb9b6, 4e43d64e, 9cd96242, e0dc26ab, a6b95e20, c989ea24, 9f414b5a
0fdd89ce, 4bea7340, 28a2f1b3, 20a92dec, 5729581a, d20a9670, c0deea0f, 6f0120c4
53f35b6d, 96031a00, 8ac98dc3, 96942558, 0376dd98, aa2e418d, b596fe6c, be355589

and here's another:

cad830e2, 7a572603, 67153026, 15e2b642, 9cbe2af4, d0f7acf3, 11fda8ee, b8cde0bd
9e5b8acd, 1b097e56, cbb3630c, 51b11bd5, 41738595, 4cc7fc18, 125bcf04, 932570f1
fd8854f5, 5c66ebbc, 807b3b14, e747bbf0, 0698eba5, 4875823f, 25c40a6b, 74238874
6fd21e7c, 728f027a, 3c8cc9aa, bf89f104, 5388e256, 7e5cc849, fd3718e1, 3c72215f
a18ec556, 0b523c1e, 6570c3e0, 7cbf8f70, 20f390db, f8d52009, 6851ed91, 55e5d56a
b6460b40, 54659aa8, 8a0ecf2b, fb5cffdc, 49143b62, 22c6c0cb, bb9e47db, 2fc20892
a6df5533, ae12f233, da955e9d, 0abc43df, f3e416f8, 6de0f041, 3d73b1ab, 2b6d860e
0466466c, f5284293, 9d642904, 7ef5f2a9, a4afded5, 0d098b32, 9f8669b6, 51b9c355
7f3222ef, 742f80fc, b3182b2c, 8d6b5fba, 4947df9a, f7ad0a7e, 93e27446, 89d9dfc8
c7cf0252, 755b253a, d8e2c647, 43c7272c, dafc95e1, dd65d784, ca1101d4, 74bdb176
eb37b834, 71509f10, ca168b17, 45855a2d, 4a2c9d6f, 29bece0e, 0d92f673, bc927e6a
daac0c7f, 851aa6b9, 84903a8b, df36a790, 92d87bab, 6096da95, bc79a802, 9b01dd37
5ad0b74e, f75188b6, e8ced8a5, 140c31d8, 3b937a7d, 086fb548, b20a7261, b4e31307
4a20ba89, 58827d51, e8d0fbd9, d052ecb4, e818f611, d97adda9, 33cbe5a2, 863cbc4a
39cb4142, 438fd23d, 7e2aa6bf, 58fdc4c1, 37f3e4da, ad449f06, 5aa3cc36, 3cf6a63a
ec75e458, cda85e15, d60eba81, 9b34dc4e, 6e39d9b4, aa742c69, 311f1950, 027bf991

Of course, as satisfying as it is to see some random output, we can't tell how good this output is just by looking, but, as we did earlier, we can abuse our seed sequence to act as a random number generator. Since we could pass TestU01 using only the high-resolution time, we should expect it to pass TestU01's test batteries. It does. Here are the results from running SmallCrush and BigCrush.

========= Summary results of SmallCrush =========

 Version:          TestU01 1.2.3
 Generator:        auto_seed_128
 Number of statistics:  15
 Total CPU time:   00:01:09.76

 All tests were passed

========= Summary results of BigCrush =========

 Version:          TestU01 1.2.3
 Generator:        auto_seed_128
 Number of statistics:  160
 Total CPU time:   27:34:38.55

 All tests were passed

Although its statistical performance is excellent, this time doesn't look stellar compared to actual random number generators. So let's look more closely at time performance. Here I'll use cycle counts rather than seconds (and take the median of 101 samples), and consider four possible ways to provide our seed data,

  1. randutils::auto_seed_256 seeder;
  2. randutils::auto_seed_128 seeder;
  3. randutils::seed_seq_fe128 seeder{rdev(),rdev()};
  4. std::seed_seq seeder{rdev(),rdev()};

where rdev in the last two is an instance of std::random_device.

auto_seed_128 auto_seed_256 seed_seq_fe_128 std::seed_seq
First Call (LLVM 3.6) 114580 115232 20900 113796
First Call (GCC 4.9) 31724 31153 25424 32132
Subsequent Calls (LLVM 3.6) 440 704 8316 9928
Subsequent Calls (GCC 4.9) 552 808 1252 1516

As I mentioned in a previous post the first-call performance numbers can be hard to make sense of. There's a lot going on, such as page faults bringing library code into memory and so forth. For LLVM, much of the overhead somehow relates to the libc++ C++ library—if we use Clang with the GNU C++ library (not shown above), it actually outperforms the GCC version. Although we could spend more time investigating, it's worth remembering that a one-time cost of 115,000 cycles on my test machine works out to be a mere 34 µs.

Looking beyond the strange world of first-call performance, the auto_seed_128 and auto_seed_256 classes outperform everything else, despite the number entropy sources they consult (and all the mixing done by seed_seq_fe, which they're built on). In fact, auto_seed_128 can produce eight words of output in the time it takes to read a single word from GCC's rdrand-based std::random_device.

(These kind of benchmarks also show that between GCC and LLVM, there's no clear performance winner. Each has scenarios where it shines.)

Conclusions

With auto_seed_128 and auto_seed_256 we can generate high-quality nondeterministic seed entropy, and do so quickly. It works around various questionable decisions in the C++11 specification, from the possibility that std::random_device isn't fit for its purpose, to the strange requirements of C++11 Seed Sequences, to the ungainly ways they must be passed to C++11 RNGs.

Of course, it's still a bit annoying that someone who wants to generate a random letter grade 'A' through 'D' needs to write

std::mt19937 rng_engine{randutils::auto_seed_128{}.base()};
std::uniform_int_distribution<char> dist{'A','D'};
char grade = dist(rng_engine);

That's a lot of visible machinery. But finally we have the pieces in place to fix that, and that'll be the topic for my next post.