On Vigna's PCG Critique

On 14 May 2018, Sebastiano Vigna added a page to his website (archived here) entitled “The wrap-up on PCG generators” that attempts to persuade readers to avoid various PCG generators.

That day, he also submitted a link to his critique to Reddit (archived here). I think it is fair to say that his remarks did not get quite the reception he might have hoped for. Readers mostly seemed to infer a certain animosity in his tone and his criticisms gained little traction with that audience.

Although I'm pleased to see readers of Reddit thinking critically about these things, it is worth taking the time to dive in and see what lessons we can learn from all of this.

Background

We have to feel a little sympathy for Vigna. On May 4, he updated his website to announce a new generation scheme, Xoshiro, and an accompanying paper, the product of two years of work. He posted a link to his work on Reddit (archived here and here), and although he got some praise and thanks for his work, he ended up spending quite a lot of time talking not about his new work, but about flaws in his old work and about my work.

Here is an example of the kind of remarks he had to contend with; Reddit user “TomatoCo” wrote:

I liked xoroshiro a lot until I read all of the dire condemnations of it, so I switched to PCG. I'm not a mathematician, I can't understand your papers and PCG's write ups are a lot easier to understand. I'm sure that you've analyzed the shit out of your previous generator and I can see on your site you've come up with new techniques to measure if xoshiro suffers the same flaws. But once bitten, twice shy. Xoroshiro was defended as great with the sole exception of the lowest bit. But then it was "the lowest bit is just a LSFR, so don't use that. Well, actually, the other low bits are also just really long period LSFRs, well, actually," and new flaws were constantly appearing. Respectfully, I think you need to explain more and in simpler terms to earn everyone's trust back.

The reason I picked PCG was because its author could, in plain language, describe its behavior and why some authors witnessed patterns in your RNG.

I think it's quite understandable that Vigna would want to look for ways to take PCG (and me) down a peg or two, and in various comment replies he endeavored to express things he didn't like about PCG (and the PCG website).

Most of the issues he raised were, I thought, adequately addressed and refuted in the Reddit discussion, but having gone to the effort already to try to articulate the things he did not like, even writing code to do so, it makes sense that he would want to circulate these thoughts more broadly.

Reddit Reaction #2

Reddit's reaction to Vigna's new PCG-critique page was perhaps not what he hoped for. From what I can tell, pretty much none of the commenters were persuaded by his claims, and much was made of his tone.

Regarding tone, user “notfancy” said:

Take your feud somewhere else. […] theory and practice definitely belong here. The petty squabbling and the name calling definitely don't. Seeing that Vigna himself is posting links to his own site, this is to me self-promoting spam.

and user “foofel” added:

the style in which he presents his stuff is always full of hate and despise, that's not a good way to represent it and probably why people are fed up.

and user “evand” added:

I would describe a lot of it as written very... condescendingly. There's also a lot that is written to attack her and not PCG

and user “AntiauthoritarianNow” chimed in, saying:

Yeah, it's one thing to tease other researchers a little bit, but this guy has a real problem sticking to arguments on the merits rather than derailing into reddit-esque ad-hom.

But the thread also had plenty of rebuttals. For just about every claim Vigna had made in his critique, there was a comment explaining why the claim was flawed.

My Reaction

I could settle back into my chair here, and say, “Thank you, Reddit, for keeping your wits about you!”, but since (at the time of writing) Vigna's page remains live with the same claims, it seems sensible for me to create my own writeup (this one) to address his claims directly.

Moreover, I believe firmly that although it's never much fun to be on the receiving end of invective or personal attacks, in academia peer critique makes everything stronger. While much of what Vigna says about PCG doesn't hold up to closer scrutiny, it is worth trying to find value of some kind in every criticism. I believe in the approach taken in the world of improvisational comedy, known as “Yes, and…”, which suggests that a participant should accept what another participant has stated (“yes”) and then expand on that line of thinking (“and”).

Thus, in the subsequent sections, I'll look at each of Vigna's critiques, first give a defensive response, and then endeavor to find a way to say “Yes, and…” to each one.

Correlations Due to Contrived Seeding

Vigna's first two claims relate to creating two PCG generators whose outputs are correlated because he has specifically set them up to have internal states that would cause them to be correlated.

PCG ext Variants: Single Bit Change to the Extension Array

In the first claim, he modifies the code for PCG's extended generation scheme so that he can flip a single bit in the extension array that adds k-dimensional equidistribution to a base generator.

Vigna creates two pcg64_k32 generators that are the same in all respects except for a single bit difference in one element of the 32-element extension array, and then observes that 31 of every 32 outputs will remain identical between the generators for some considerable time. Vigna clearly considers this behavior to be problematic and notes multiple LFSR-based PRNGs where such behavior would not occur.

Vigna states:

Said otherwise, the whole sequence of the generator is made by an enormous number of strongly correlated, very short sequences. And this makes the correlation tests fail.

Vigna concludes that no one should use generators like pcg64_k32 as a result.

Defensive Response

Vigna actually created a custom version of the PCG code to effect his single bit change. The pcg64_k32 generator has 2303 bits of state: 127 bits of LCG increment (which stays constant), 128 bits of LCG current state, and 32 64-bit words in the extension array. The odds of seeding two pcg64_k32 generators each with 2303 bits of seed entropy and finding that they differ by only a single bit in the extension array are 1 in 2^2292, a number so vast that it cannot be represented as a floating point double.

If the PRNG were properly initialized (e.g., using std::seed_seq or pcg_extras::seed_seq_from<std::random_device>), Vigna's observed correlation would not have occurred. Likewise, had the single bit change been in the LCG side of the PRNG, it would also not have occurred.

But what of Vigna's other claim, that PRNGs that are slow to diffuse single-bit changes to their internal state are necessarily bad? Vigna is right that for LFSR-based designs, the rate of bit diffusion (a.k.a. “avalanche”) matters a lot.

However, numerous perfectly good designs for PRNGs would fail Vigna's criteria. All counter-based designs (e.g., SplitMix, Random123, ChaCha) will preserve the single bit difference indefinitely if we examine their internal state. In fact, Vigna's collaborator, David Blackman, is the author of gjrand, which also includes a counter whose internal state won't diverge significantly over time. But of these designs, only SplitMix would fail a test that looks for output correlations rather than similar internal states.

The closest design to PCG's extension array is found in George Marsaglia's venerable XorWow PRNG, shown below (code taken from the Wikipedia page):

/* The state array must be initialized to not be all zero in the first four 
   words */
uint32_t xorwow(uint32_t state[static 5])
{
    /* Algorithm "xorwow" from p. 5 of Marsaglia, "Xorshift RNGs" */
    uint32_t s, t = state[3];
    t ^= t >> 2;
    t ^= t << 1;
    state[3] = state[2]; state[2] = state[1]; state[1] = s = state[0];
    t ^= s;
    t ^= s << 4;
    state[0] = t;
    return t + (state[4] += 362437);
}

In Marsaglia's design, state[4] is a counter in much the same way that PCG's extension array is a “funky counter”. Marsaglia calls this counter a Weyl sequence after Hermann Weyl, who proved the equidistribution theorem in 1916.

We can exactly reproduce Vigna's claims about pcg64_k32 producing similar output with XorWow. The program uncxorwow.c is a port of his demonstration program to XorWow. It fails if tested with PractRand, and, if we uncomment the printf statements, after 1 billion iterations we see that the outputs have not become uncorrelated. They continue to differ only in their high bit. And they will continue that way forever:

61b0be0f
e1b0be0f
c5a003d8
45a003d8
20e14479
a0e14479
5a5ebe42
da5ebe42
99ce85af
19ce85af
d2a1aabb
52a1aabb
6bf29670
ebf29670
948587d6
148587d6
e2c0f91c
62c0f91c
536fe7eb
d36fe7eb

Similarly, Vigna's complaint about “strongly correlated very short sequences” could be applied to XorWow. It consists of 2^64 very similar sequences (differing only by a constant). It might seem bad at a glance to concatenate a number of very similar sequences but it is worth realizing that the nearest similar sequence is 2^128-1 steps away. If Vigna would characterize 2^128-1 as “very short”, he must be using a mathematician's sense of scale.

Marsaglia's design of Xorwow quite deliberately uses a very simple and weak generator (a Weyl sequence) for a specific purpose. We could say “a counter isn't a very good random number generator”, but the key idea is that it doesn't need to be. It's not the whole story. It's a piece with a specific role to play, and it doesn't need to be any better than it is.

PCG's extended generation scheme is a similar story. The extension array is a funky counter akin to a Weyl sequence (each array element is like a digit of a counter). It's slightly better than a Weyl sequence (a single bit change will quickly affect all the bits in that array element), but it is essentially the same idea.

The pcg64_k32_oneseq and pcg64_k32_fast generators follow XorWow's scheme of just joining together the similar sequences. pcg64_k32 swaps around chunks of size 2^16 from each similar sequence. In all cases, from any starting point you would need 2^128 outputs before the base linear congruential generator lined up to the same place again, and vastly more for the extension array to line up similarly. In short, for pcg64_k32 the correlated states are quite literally unimaginably far away from each other.

Talking about his contrived seedings, Vigna notes that, “This is all the decorrelation we get after a billion iterations, and it will not improve (not significantly before the thermodynamical death of the universe).” What he seems to have missed is the corollary to his statements—correlation and decorrelation are sides of the same coin. Two currently uncorrelated pcg64_k32 states will not correlate before the heat death of the universe either.

In short, Vigna contrived a seed to show correlation that would never arise in practice with normal seeding, nor could arise by advancing one generator. His critique is not unique to PCG, and should not be a concern for users of PCG.

“Yes, and…” Response

A rather flippant “Yes, and…” response is that I'm perfectly happy for people to avoid pcg64_k32, as I'm not at all sure it is buying you anything meaningful over and above pcg64; it's a fair amount of added code complexity for something of dubious value. In fact, I didn't even bother to implement it in the C version and only a small number of people who have ported PCG have implemented it. As I see it, k-dimensional equidistribution sounds like a cool property, but the only use case I've found for such a property is performing party tricks. But some people do like k-dimensional equidistribution, so let's press on…

First, Vigna went to far too much trouble to create correlated states. He copied the entire C++ source for PCG and hacked it to make a private data member public so he could set a single bit. Had he been more familiar with the features the extended generators provide, he could instead have written:

pcg64_k32 rng0;
pcg64_k32 rng1 = rng0;
rng1.set(rng0() ^ 1);

This code uses pcg64_k32's party-trick functionality to leap unimaginably huge distances across the state space to find exactly the correlated generator you want, one that is the same in every respect except for one differing output.

In other words, what he sees as a deficiency, I've already highlighted as a feature.

But whether it is achieved by the simple method above, or the more convoluted method Vigna used, we have the question of what to do if people are allowed to create very correlated generator states that would not normally arise in practice. One option is to just say “don't do that”, but a more “Yes, and…” perspective would be to allow people to create such states if they choose but provide a means to detect them. More on that in the next section.

It's also worth asking whether the slowness with which a single bit change diffuses across the extension array is something inherent in the design of PCG's extended generation scheme, or mere happenstance. In fact, it is the latter.

The cleverness in the extended generation scheme isn't the idea of combining two generators, a strong one and a weaker-but-k-dimensionally-equidistributed one; it's the fact that we can do so without any extra state to keep track of what we're doing.

I'm thus not wedded to the particular Weyl-sequence inspired method I used. If it's important that unimaginably distant similar generators do not stay correlated for long, that's a very easy feature to provide.

When I designed how the extension array advances, I made a choice to make it “no better than it needs to be”. It doesn't need good avalanche properties, so that wasn't a design concern. But that doesn't mean it couldn't be tweaked to have good avalanche properties, so that a single bit change affects all the bits the next time the extension array advances. In fact, having designed seed_seq_fe for randutils, I'm aware of elegant and amply efficient ways to have better avalanche, so why not?

It may not really be necessary, but I actually like this idea. So thanks, Sebastiano, I'll address this issue in a future update to PCG that provides some alternative schemes for updating the extension array!

PCG Regular Variants: Contrived seeds for Inter-Stream Correlations

In his next concern, Vigna makes correlated generators from two “random looking” seeds. He presents a program, corrpcg.c, that mixes together the two correlated generators and can then be fed into statistical tests (which will fail because of the correlation).

Defensive Response

We can devise bad seed pairs for just about any PRNG. Here are three example programs, corrxoshiro.c, corrsplitmix.c, and corrxorwow.c, which initialize generators with two “random looking” seeds but create correlated streams that will fail statistical tests if mixed.

In all cases, despite being “random looking”, the seeds are carefully contrived. Seeds such as these would be vanishingly unlikely with proper seeding practice.

As before, the concerns Vigna expresses apply to many prior generators. We can view XorWow's state[4] value as being a stream selection constant, but this time let's focus in on SplitMix. For SplitMix, different gamma_ values constitute different streams.

In corrsplitmix.c the implementation is hard-wired to use a single stream (0x9e3779b97f4a7c15), but in corrsplitmix2.c we mix two streams (0x9e3779b97f4a7c15 and 0xdf67d33dd518d407) and observe correlations. Although these gamma values look random, they are not; they are carefully contrived. In particular, here 0xdf67d33dd518d407 * 3 = 0x9e3779b97f4a7c15 (in 64-bit arithmetic), which means that every third output from the second stream will exactly match an output from the first.

Vigna's critique thus applies at least as strongly to SplitMix's streams as it does to PCG's.

I have written at length about PCG's streams (and discussed SplitMix's, too). I freely acknowledge that these streams exist in a space of trade-offs where we are choosing to do the cheap thing, leveraging the properties of the underlying LCG (or Weyl sequence for SplitMix). In that article, I say:

Changing the increment parameter is just barely enough for streams that are actually useful. They aren't statistically independent, far from it, but they are distinct and they do help.

No one should worry that PCG's streams make anything worse.

“Yes, and…” Response

Although it is vanishingly unlikely that two randomly seeded pcg64 generators would be correlated (it would only happen with poor/adversarial seeding), it is reasonable to ask if this kind of correlation due to bad seeding can be detected.

We can even argue that another checklist feature for a general-purpose PRNG is the ability to tell how independent the sequences from two seeds are likely to be. PCG goes some way towards this goal with its - operator that calculates the distance between two generators, but the functionality was originally designed for generators on the same stream. I've now updated that functionality so that for generators on different streams, it will calculate the distance to their point of closest approach (i.e., where the differences between successive values of the underlying LCG align).

So it's now possible with PCG to compare two generators to see whether they have been badly seeded so that they correlate.

Here's a short test program:

#include "pcg_random.hpp"
#include "pcg_extras.hpp"

#include <iostream>
#include <iomanip>
#include <random>

int main() {
    using namespace pcg_extras;

#if USE_VIGNA_CONTRIVED_SEEDS
    pcg64 x(PCG_128BIT_CONSTANT(0x83EED115C9CBCC30, 0x4C55E45838B75647),
            PCG_128BIT_CONSTANT(0x3E0897751B1A19E7, 0xD9D50DD3E3A454DC));
    pcg64 y(PCG_128BIT_CONSTANT(0x7C112EEA363433CF, 0xB3AA1BA7C748A9B9),
            PCG_128BIT_CONSTANT(0x41F7688AE4E5E618, 0x262AF22C1C5BAB23));
#elif USE_PCG_UNIQUE
    pcg64_unique x,y;
#elif USE_SMALL_SEEDS1
    pcg64 x(0), y(1);
#elif USE_SMALL_SEEDS2
    pcg64 x(0,0), y(0,1);
#elif USE_SMALL_SEEDS3
    pcg64 x(0,0), y(1,1);
#elif USE_RANDOM_DEVICE
    pcg64 x(seed_seq_from<std::random_device>{}), 
        y(seed_seq_from<std::random_device>{});
#endif

    std::cout << std::hex;
    for (int i = 0; i < 10; ++i) {
        std::cout << (x - y) << ": ";
        std::cout << x() << ", " << y() << "\n";
    }
}

And here are the results of running it (in each case, each line shows the distance between the streams and a value from each PRNG; the distance stays the same because the PRNGs are advancing together):

unix% c++ -Wall -std=c++11 -o strmdist strmdist.cpp -Iinclude -DUSE_RANDOM_DEVICE && ./strmdist
a571d615b08fea47c84f39f0811f04f: c021049beac5efd0, ceaa3596f168e8b6
a571d615b08fea47c84f39f0811f04f: 573371998db59a67, e5d84a00b37c3556
a571d615b08fea47c84f39f0811f04f: bc4246c671ef9a1f, 1b13ad2f224707c7
a571d615b08fea47c84f39f0811f04f: b1f3e4ffcfef569, 11b50b226a67cdbe
a571d615b08fea47c84f39f0811f04f: 8a378ec693dc1e4, 903ccfd4dc769389
a571d615b08fea47c84f39f0811f04f: 4799de5c580be6ab, 22d13ce52d83c9cb
a571d615b08fea47c84f39f0811f04f: e8fdf041a93626e8, f24c8f49866b7b4e
a571d615b08fea47c84f39f0811f04f: f29e3d08104d7630, b37e5b58ae91d45c
a571d615b08fea47c84f39f0811f04f: 28f524ad8f57bedb, 52d41d39b1186616
a571d615b08fea47c84f39f0811f04f: 9be8cb37ea8952b5, e6812ed8f0613d3

unix% c++ -Wall -std=c++11 -o strmdist strmdist.cpp -Iinclude -DUSE_RANDOM_DEVICE && ./strmdist
25c3990ef6e7766ab543435aa25f4326: 2f76ab68249fd7f5, 4fbfc0ce19119391
25c3990ef6e7766ab543435aa25f4326: 933845d6c7ad9396, 7572dae64b2cc5a
25c3990ef6e7766ab543435aa25f4326: d7d1dc18bae0604a, 5b1f8310e1f0dc8a
25c3990ef6e7766ab543435aa25f4326: 85cd1dcff8830ad5, a1cfea3c01314c8d
25c3990ef6e7766ab543435aa25f4326: 543ba46266a0b6ba, 7217b15c05cba254
25c3990ef6e7766ab543435aa25f4326: 5a3bd5d4d6c49a55, a243af7df5cfe287
25c3990ef6e7766ab543435aa25f4326: 9f2dc30afc3dcead, deaa9d03f7ca1117
25c3990ef6e7766ab543435aa25f4326: 5856b884c1298dc9, 67502e4490b77bae
25c3990ef6e7766ab543435aa25f4326: 9b94ebb084cc6fdd, 2e07957697add77c
25c3990ef6e7766ab543435aa25f4326: efe6b451c262a3fb, 2e94d782daae964d

unix% c++ -Wall -std=c++11 -o strmdist strmdist.cpp -Iinclude -DUSE_RANDOM_DEVICE && ./strmdist
32982840d1ddcb5e7f1ed57a6d496525: 96ed26957ef938db, 568fe0aa7e9e8a26
32982840d1ddcb5e7f1ed57a6d496525: 33270d80d24b0965, 44e42e1afc4db710
32982840d1ddcb5e7f1ed57a6d496525: 6de9ac5272dd1193, 90696d1c4f52e71d
32982840d1ddcb5e7f1ed57a6d496525: 43c5c899c7123e57, 337b9d25e00fb0de
32982840d1ddcb5e7f1ed57a6d496525: 753954b73076704d, f4fce4c33756df7e
32982840d1ddcb5e7f1ed57a6d496525: 3b5dc9402b56584d, fd7ae3c708355dc0
32982840d1ddcb5e7f1ed57a6d496525: 15a9227305a442d8, 78fa04eb7f881590
32982840d1ddcb5e7f1ed57a6d496525: b9e58872c3a299, 381a8f851acbc5f4
32982840d1ddcb5e7f1ed57a6d496525: 1b624879e6cf5128, aa908d3a4f2d8f02
32982840d1ddcb5e7f1ed57a6d496525: 79d4836bb5a56a77, 1650f74b3ef617f9

unix% c++ -Wall -std=c++11 -o strmdist strmdist.cpp -Iinclude -DUSE_SMALL_SEEDS1 && ./strmdist
1c31b969dc65d7b0df636de659042bb1: 1070196e695f8f1, e175e32ed3507bfa
1c31b969dc65d7b0df636de659042bb1: 703ec840c59f4493, c0bf922a0b283109
1c31b969dc65d7b0df636de659042bb1: e54954914b3a44fa, 140bfa21e68785bb
1c31b969dc65d7b0df636de659042bb1: 96130ff204b9285e, c5ec8bcc4fe35830
1c31b969dc65d7b0df636de659042bb1: 7d9fdef535ceb21a, 4dd8ed1ca22869c5
1c31b969dc65d7b0df636de659042bb1: 666feed42e1219a0, c9bffa29c802ef4c
1c31b969dc65d7b0df636de659042bb1: 981f685721c8326f, 3aa09aa4e147478b
1c31b969dc65d7b0df636de659042bb1: ad80710d6eab4dda, 1dfdf6222d06378c
1c31b969dc65d7b0df636de659042bb1: e202c480b037a029, 5a05dacf4df61d4e
1c31b969dc65d7b0df636de659042bb1: 5d3390eaedd907e2, 489650b1eb840a26

unix% c++ -Wall -std=c++11 -o strmdist strmdist.cpp -Iinclude -DUSE_SMALL_SEEDS2 && ./strmdist
151361a7e7368c239a3988178df4d76d: d4feb4e5a4bcfe09, acdbf879b3c73375
151361a7e7368c239a3988178df4d76d: e85a7fe071b026e6, 7ea754d074e8d88f
151361a7e7368c239a3988178df4d76d: 3a5b9037fe928c11, f8fc7aec8ae6245a
151361a7e7368c239a3988178df4d76d: 7b044380d100f216, 7d2ebc3c0b5bedb4
151361a7e7368c239a3988178df4d76d: 1c7850a6b6d83e6a, cbaf666f55051666
151361a7e7368c239a3988178df4d76d: 240b82fcc04f0926, 4eba9f04dfb9903b
151361a7e7368c239a3988178df4d76d: 7e43df85bf9fba26, 4fab6bcf361bd63d
151361a7e7368c239a3988178df4d76d: 43adf3380b1fe129, 257fcac1ed3817df
151361a7e7368c239a3988178df4d76d: 3f0fb307287219c, bf6f5515988a494
151361a7e7368c239a3988178df4d76d: 781f4b84f42a2df, 1081ed38c84c1c9d

unix% c++ -Wall -std=c++11 -o strmdist strmdist.cpp -Iinclude -DUSE_SMALL_SEEDS3 && ./strmdist
edfe668df810de6e58b8e92e878fefa: d4feb4e5a4bcfe09, d4692f845d3a3706
edfe668df810de6e58b8e92e878fefa: e85a7fe071b026e6, bb0f09b0eebab6ff
edfe668df810de6e58b8e92e878fefa: 3a5b9037fe928c11, e26ac904ad283c09
edfe668df810de6e58b8e92e878fefa: 7b044380d100f216, 83860212b5d92197
edfe668df810de6e58b8e92e878fefa: 1c7850a6b6d83e6a, 1c3601ed5afd3f49
edfe668df810de6e58b8e92e878fefa: 240b82fcc04f0926, 5e4fa027be29b47e
edfe668df810de6e58b8e92e878fefa: 7e43df85bf9fba26, b930e28d59383019
edfe668df810de6e58b8e92e878fefa: 43adf3380b1fe129, e0d61e1b074df835
edfe668df810de6e58b8e92e878fefa: 3f0fb307287219c, f42c38b1aca3ac9d
edfe668df810de6e58b8e92e878fefa: 781f4b84f42a2df, 19e9cc4fa58fd0ad

unix% c++ -Wall -std=c++11 -o strmdist strmdist.cpp -Iinclude -DUSE_PCG_UNIQUE && ./strmdist
534a7c98f86b50b72fad6990038ba18: af8a07de4c8d67d1, d649257470c0180d
534a7c98f86b50b72fad6990038ba18: 3789d12fe8e452b1, 1017152e85f732fc
534a7c98f86b50b72fad6990038ba18: c3c4e780fd60901b, 91a9d78551f0c776
534a7c98f86b50b72fad6990038ba18: e7257e02f7fa5b40, 46fb62417ebf2f13
534a7c98f86b50b72fad6990038ba18: 3697948fa9aa8378, 60e44721c6fbc9d0
534a7c98f86b50b72fad6990038ba18: 7bdbcc91de7efbcf, 21de9d1dc03e2ca6
534a7c98f86b50b72fad6990038ba18: 9cf598a61c9ad958, 62e8c3dc421f4e58
534a7c98f86b50b72fad6990038ba18: 5c8a6da6c91b7d35, 3cb08b7e59fd655a
534a7c98f86b50b72fad6990038ba18: f55a8b190a85c9c0, 5a71766fac52ec8a
534a7c98f86b50b72fad6990038ba18: 906b1a30904fe59, f71525dc1d91a06e

unix% c++ -Wall -std=c++11 -o strmdist strmdist.cpp -Iinclude -DUSE_PCG_UNIQUE && ./strmdist
1b7a9a85b5ed2b6a2a92da9e093eba18: a11d6aa92efc9a79, e646943445e368a
1b7a9a85b5ed2b6a2a92da9e093eba18: 35026a6e1a195a29, 906b9bed756e1667
1b7a9a85b5ed2b6a2a92da9e093eba18: af1f1193515d9e7b, fe51967d5d532f70
1b7a9a85b5ed2b6a2a92da9e093eba18: 61baa5620ceeff38, 644345c453ee3b11
1b7a9a85b5ed2b6a2a92da9e093eba18: 71e88c9c27a7abbf, 1b6a254f565f6c70
1b7a9a85b5ed2b6a2a92da9e093eba18: 1125753cd420e3c1, 8be4065858e93c57
1b7a9a85b5ed2b6a2a92da9e093eba18: a53ce57ffaa57eb3, 7f1c546ae9bf7b61
1b7a9a85b5ed2b6a2a92da9e093eba18: 4cf2c7c152326c4, ada2d31650f07ef8
1b7a9a85b5ed2b6a2a92da9e093eba18: b731cbec3bfba773, 92ce80f0c8dc855f
1b7a9a85b5ed2b6a2a92da9e093eba18: b8c449d4872f7971, 44ed4207442550da

unix% c++ -Wall -std=c++11 -o strmdist strmdist.cpp -Iinclude -DUSE_PCG_UNIQUE && ./strmdist
360981a27aee6d34271feaa80270ba18: 5da8c0afa4330059, 67af26ab1d05ed52
360981a27aee6d34271feaa80270ba18: ef0ef074871cc9a0, cda2688372cb72b7
360981a27aee6d34271feaa80270ba18: 6a15c49d4ae8d89d, 3708ddd964f616fe
360981a27aee6d34271feaa80270ba18: dd8f24112bcbf580, 69309c3ffa6cea2e
360981a27aee6d34271feaa80270ba18: e8f252a4132fd0e3, e3ff9751773f6db
360981a27aee6d34271feaa80270ba18: e23a1246ea5980be, 1161fd499cbecafa
360981a27aee6d34271feaa80270ba18: 1d19a64904134065, a9e31a01b4c51a43
360981a27aee6d34271feaa80270ba18: 2c3166d304f9dedf, fdd3f540a6859c19
360981a27aee6d34271feaa80270ba18: 8f73778d1f6133ea, 13a54957b3c65205
360981a27aee6d34271feaa80270ba18: c8d362ba3d62239, 66db0b2ae6908dc8

unix% c++ -Wall -std=c++11 -o strmdist strmdist.cpp -Iinclude -DUSE_PCG_UNIQUE && ./strmdist
1266069359d4404d4fe77f291da43a18: 9994872b3cc3104c, 5582722b3f354f4b
1266069359d4404d4fe77f291da43a18: cec9ae92f2f0a929, 7a2d534e7c3a7281
1266069359d4404d4fe77f291da43a18: ce777879518e6169, c384bb65c1d4364b
1266069359d4404d4fe77f291da43a18: 2cb082454d09aa19, 703c5ad7747a9b42
1266069359d4404d4fe77f291da43a18: a581d3154c60654, b4b9369d997cda6e
1266069359d4404d4fe77f291da43a18: 5ba66e3d99cd33c9, 80aa887fbb5fdef3
1266069359d4404d4fe77f291da43a18: 1038e3281dcae11d, 54c304cf2a66182c
1266069359d4404d4fe77f291da43a18: 9df3df9d27af7148, 7ddd385e114299b9
1266069359d4404d4fe77f291da43a18: bf1656198867bd08, 7aeae9ba84a17dbe
1266069359d4404d4fe77f291da43a18: 60aef1418aa1c6f1, 8a7196feda932f06

unix% c++ -Wall -std=c++11 -o strmdist strmdist.cpp -Iinclude -DUSE_VIGNA_CONTRIVED_SEEDS && ./strmdist
0: e1e4e4b44cca9ade, 43dc3c9c96899953
0: a3ef563648055140, 2b8a051f7ab1b24
0: 7aa3dc341221459a, 1a0960a2cd3d51ee
0: cfa0d055fbe9f476, a0abf5d3e8ed9f41
0: b69403f2c93f3fce, 807e58a7e7f9d6d2
0: a2550ed76e8d9ae, 144aa1daedd1b35e
0: a1f898a64347533b, c532263a99dd0fc4
0: d483377a20c295f0, bbd10614af86a019
0: 5c6469b1053d2ce1, 9c2b8c8d2e20a7a5
0: 5f91b4bd64d5eeb1, 58afc8da4eb26af7

As we can see in the last example, Vigna contrived seeds that had the streams exactly aligned. The values from each stream are distinct in this case, but a statistical test will see that they are correlated.

Interpreting the distance value is easy in this case, but not every user will be able to do so, and some distances (e.g., just a single high bit set) would also be bad, so better detection of contrived seeds probably demands a new function, independence_score(), based on this distance metric.

Beyond these functions, there is also the question of whether it is wise to allow users to seed generators where they can specify the entire internal state. Vigna's generators (and basically all LFSR generators) must avoid the all-zeros state and do not like states with low hamming weight (so { seed, 0, 0, 0 } is also a poor choice). With these issues in mind, perhaps we should deny users the ability to seed the entire state. That might prevent some contrived seedings like the one Vigna used. I'm not fully sold on this idea, but it is a widely-used approach used by other generators (e.g., Blackman's gjrand) and worth considering.

Although Vigna's contrived seeding was a bit silly, his example has helped me improve the PCG distance metric, given us another checklist feature that some people might want (detecting bad seed pairs), got me thinking about future features, and returned me to the topic of good seeding. All in all, we can call this a positive contribution. Thanks, Sebastiano!

Prediction Difficulty

The next two sections relate to predicting PCG.

Predicting pcg32_oneseq

Vigna writes:

To me, it has been always evident that PCG generators are very easy to predict. Of course, no security expert ever tried to to that: it would be like beating 5-year-old kid on a race. It would be embarrassing.

So we had this weird chicken-and-egg situation: nobody who could easily predict a PCG generator would write the code, because they are too easy to predict; but since nobody was predicting a PCG generator, Melissa O'Neill kept on the absurd claim that they were challenging to predict.

Vigna then goes on to show code to predict pcg32_oneseq, a 64-bit PRNG with 32-bit output.

Defensive Response

As one reddit observer wrote:

[Vigna's] program needs to totally brute force half of the state, and then some additional overhead to brute force bits of the rest of the state, so runtime is 2^(n/2), exponential, not polynomial.

Vigna has written an exponential algorithm to brute force 32 bits of state. I hope it was obvious to almost everyone that I never claimed that brute-forcing 32-bits of state was hard. In fact, I have already outlined how to predict pcg32 (more bits to figure out given the unknown stream). I observed that pcg32 is predictable using established techniques (specifically the LLL algorithm), and I have even linked to an implementation of those ideas by Marius Lombard-Platet.

I characterize pcg32_oneseq as easy to brute force, and pcg32 as annoying (as Marius Lombard-Platet discovered). Only when we get to pcg64 do we have something where there is a meaningful challenge.

If Vigna really believes that all members of the PCG family are easy to predict, he should have predicted pcg64 or pcg64_c32.

“Yes, and…” Response

The best part of Vigna's critique is these lines:

Writing the function that performs the prediction, recover(), took maybe half an hour of effort. It's a couple of loops, a couple of if's and a few logical operations. Less than 10 lines of code (of course this can be improved, made faster, etc.).

and the source code comment that reads:

Pass an initial state (in decimal or hexadecimal), see it recovered from the output in a few seconds.

So, here Vigna is essentially endorsing all the practical aspects I've previously noted regarding trivial predictability. Specifically, he's noting that with little time or effort, he can write a simple program that quickly predicts a PRNG and has actually done so. This is very different from taking a purely theoretical perspective (e.g., noting that techniques exist to solve a problem in polynomial time without ever implementing them).

In other words, clearly ease of prediction matters to Vigna. So we both agree pcg32_oneseq is easy to predict.

Now let's keep that characterization of easiness and move on to some of the other generators.

Vigna and I would agree, I think, that I lack the necessary insight to develop fast prediction methods for pcg64 or pcg64_c32 (it's an instance of Schneier's Law). Vigna is also right that, if it is tractable to predict, those who might have the necessary skill lack much incentive to try. For some years I have been intending to have a prediction contest with real prizes and I remain hopeful that I'll find the time to run such a contest this summer. When the contest finally launches, I hope he'll have a go—I'd be delighted to send him a prize.

Predicting pcg64_once_insecure

Vigna also notes that he can invert the bijection that serves as the output function for pcg64_once_insecure, which reveals the underlying LCG with all its statistical flaws.

Defensive Response

I noted this exact issue in 2014 in the PCG paper. It's why pcg64_once_insecure has the name it does. I discourage its use as a general-purpose PRNG precisely because of its invertible output function.
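
To illustrate why an invertible output function exposes the underlying LCG, here is a small sketch. The permutation below is a hypothetical stand-in (a fixed xorshift, an odd multiply, and another xorshift), not the actual output function of pcg64_once_insecure, and the multiplier M is chosen only for illustration. The names permute, unpermute, and inverse_mod_2_64 are likewise assumptions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical odd multiplier, used only for this illustration.
constexpr uint64_t M = 0x9e3779b97f4a7c15ULL;

// Multiplicative inverse of an odd number modulo 2^64 via Newton's iteration.
// Each step doubles the number of correct low bits.
inline uint64_t inverse_mod_2_64(uint64_t a)
{
    assert(a & 1);                  // only odd numbers are invertible mod 2^64
    uint64_t x = a;                 // correct to 3 bits: a*a == 1 (mod 8)
    for (int i = 0; i < 5; ++i)     // 3 -> 6 -> 12 -> 24 -> 48 -> 96 bits
        x *= 2 - a * x;
    return x;
}

// A bijection on 64-bit values: every step is individually reversible.
inline uint64_t permute(uint64_t x)
{
    x ^= x >> 32;
    x *= M;
    x ^= x >> 29;
    return x;
}

// The inverse bijection, undoing the steps in reverse order.
inline uint64_t unpermute(uint64_t y)
{
    y ^= (y >> 29) ^ (y >> 58);     // undo x ^= x >> 29
    y *= inverse_mod_2_64(M);       // undo the multiply
    y ^= y >> 32;                   // undo x ^= x >> 32 (it is its own inverse)
    return y;
}
```

If a generator returns permute(state) and the output is as wide as the state, then anyone can apply unpermute to each output and read off the raw LCG states directly.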

“Yes, and…” Response

Vigna is at least acknowledging that some people might care about this property.

Speed and Comparison against LCGs

Finally, Vigna develops a PCG variant using a traditional integer hash function based on MurmurHash (I would call it PCG XS M XS M XS). He claims it is faster than the PCG variants I recommend and notes that he doesn't consider PCG especially fast.

Defensive Response

I considered this exact idea in the 2014 PCG paper. In my tests, I found that a variant using a very similar general integer hash function was not as fast as the PCG permutations I used.

Testing is a finicky business.

“Yes, and…” Response

I absolutely agree with Vigna's claim that people should run their own speed tests.
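
For readers who want to try this themselves, below is a minimal timing sketch. It is not the hamming-weight benchmark used in this post: it times a tight loop over the 64-bit-multiplier MCG that appears later in this post, and XORs the outputs together so that the compiler cannot discard the loop. It assumes a compiler with the unsigned __int128 extension (GCC or Clang); the names Mcg128, Timing, and time_generator are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

using uint128_t = unsigned __int128;   // GCC/Clang extension

struct Mcg128 {
    uint128_t state = 1;               // can be seeded to any odd number
    uint64_t next()
    {
        return uint64_t((state *= 0xda942042e4dd58b5ULL) >> 64);
    }
};

struct Timing {
    uint64_t checksum;                 // XOR of all outputs, keeps the loop alive
    double   seconds;                  // wall-clock time for the loop
};

// Time `count` calls to rng.next() using a monotonic clock.
template <typename Rng>
Timing time_generator(Rng rng, uint64_t count)
{
    assert(count > 0);
    auto start = std::chrono::steady_clock::now();
    uint64_t checksum = 0;
    for (uint64_t i = 0; i < count; ++i)
        checksum ^= rng.next();
    std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return {checksum, elapsed.count()};
}
```

A call such as time_generator(Mcg128{}, 1000000000) gives a rough figure. Tight loops like this are sensitive to inlining and compiler flags, so a benchmark that actually consumes the random numbers is a better guide.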

I also realized long ago that PCG probably won't have the speed crown, because it can't. Once its state reaches 128 bits, a simple truncated LCG passes all standard statistical tests, and it beats everything on speed, including Vigna's generators. Because pcg64 is built from a 128-bit LCG, it can never beat that LCG in speed.
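
The reason is structural, as the following sketch tries to show: a PCG-style 128-bit generator performs the same state update as the plain LCG and then does extra work to produce its output. The sketch assumes the XSL RR output function and the default 128-bit constants from the PCG reference implementation, and it glosses over details such as stream selection and seeding. The function names are hypothetical, and unsigned __int128 is a GCC/Clang extension.

```cpp
#include <cassert>
#include <cstdint>

using uint128_t = unsigned __int128;   // GCC/Clang extension

// Assumed default 128-bit LCG constants from the PCG reference code.
constexpr uint128_t MULT =
    (uint128_t(2549297995355413924ULL) << 64) | 4865540595714422341ULL;
constexpr uint128_t INC =
    (uint128_t(6364136223846793005ULL) << 64) | 1442695040888963407ULL;

inline uint128_t lcg_step(uint128_t s) { return s * MULT + INC; }

// Truncated LCG: advance, then return the high 64 bits unchanged.
inline uint64_t truncated_lcg_next(uint128_t& s)
{
    s = lcg_step(s);
    return uint64_t(s >> 64);
}

inline uint64_t rotr64(uint64_t v, unsigned r)
{
    assert(r < 64);
    return (v >> r) | (v << ((64 - r) & 63));
}

// XSL RR: xor the two 64-bit halves, then rotate by the top six state bits.
inline uint64_t xsl_rr(uint128_t s)
{
    return rotr64(uint64_t(s >> 64) ^ uint64_t(s), unsigned(s >> 122));
}

// PCG-style generator: the same LCG step, plus the output permutation.
inline uint64_t pcg_style_next(uint128_t& s)
{
    s = lcg_step(s);
    return xsl_rr(s);
}
```

Both functions pay for the same 128-bit multiply and add; pcg_style_next then adds an xor, a shift, and a rotate on top, so it cannot do less work than truncated_lcg_next.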

I should write a blog post on speed testing. But here's a taste. We'll use Vigna's hamming-weight test as our benchmark, because it is a real program that puts randomness to actual use but is coded with execution speed in mind.

First, let's test the Mersenne Twister. Compiling with Clang, we get

processed 1.75e+11 bytes in 130 seconds (1.346 GB/s, 4.847 TB/h). Fri May 25 14:03:25 2018

whereas compiling with GCC, we get

processed 1.75e+11 bytes in 73 seconds (2.397 GB/s, 8.631 TB/h). Fri May 25 14:05:44 2018

With GCC, it runs almost twice as fast.

Now let's contrast that result with this 128-bit MCG:

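#include <cstdint>
typedef unsigned __int128 uint128_t;   // assumed definition (GCC/Clang extension)

static uint128_t state = 1;   // can be seeded to any odd number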

static inline uint64_t next()
{
    constexpr uint128_t MULTIPLIER =
        (uint128_t(0x0fc94e3bf4e9ab32ULL) << 64) |  0x866458cd56f5e605ULL;
            // Spectral test: M8 = 0.71005, M16 = 0.66094, M24 = 0.61455
    state *= MULTIPLIER;
    return state >> 64;
}

Compiling with Clang, we get

processed 1.75e+11 bytes in 39 seconds (4.488 GB/s, 16.16 TB/h). Fri May 25 14:16:25 2018

whereas with GCC we get

processed 1.75e+11 bytes in 58 seconds (3.017 GB/s, 10.86 TB/h). Fri May 25 14:18:14 2018

The GCC code is no slouch, but Clang's code here is much faster. Clang is apparently better at 128-bit math.

If we really care about speed though, this 128-bit MCG (which uses a carefully chosen 64-bit multiplier instead of a more typical 128-bit multiplier) is even faster and still passes statistical tests:

static uint128_t state = 1;   // can be seeded to any odd number

static inline uint64_t next()
{
    return (state *= 0xda942042e4dd58b5ULL) >> 64;
}

Compiling with Clang, we get

processed 1.75e+11 bytes in 37 seconds (4.73 GB/s, 17.03 TB/h). Fri May 25 14:09:26 2018

whereas with GCC we get

processed 1.75e+11 bytes in 44 seconds (3.978 GB/s, 14.32 TB/h). Fri May 25 14:11:40 2018

Again, Clang takes the speed crown; its executable generates and checks 1 TB of randomness about every 3.5 minutes.
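
One plausible reason the 64-bit multiplier helps (an arithmetic observation, not something measured here) is that multiplying a 128-bit state by a 64-bit constant needs one widening 64×64→128 multiply plus one ordinary 64-bit multiply, whereas a full 128-bit constant needs a further 64-bit multiply. The sketch below spells out that arithmetic. The helper mul128x64 is hypothetical, and unsigned __int128 is a GCC/Clang extension.

```cpp
#include <cassert>
#include <cstdint>

using uint128_t = unsigned __int128;   // GCC/Clang extension

// Low 128 bits of (hi * 2^64 + lo) * m, written out as the two
// machine-level multiplies a compiler can use for this case.
inline uint128_t mul128x64(uint64_t hi, uint64_t lo, uint64_t m)
{
    uint128_t low_product = uint128_t(lo) * m;   // widening 64x64 -> 128
    uint64_t  high_term   = hi * m;              // only its low 64 bits survive
    return low_product + (uint128_t(high_term) << 64);
}
```

The result equals state * m computed directly on the 128-bit type, where state is the 128-bit value formed from hi and lo.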

If we test Vigna's latest generator, xoshiro256**, and compile with Clang, it gives us

processed 1.75e+11 bytes in 50 seconds (3.5 GB/s, 12.6 TB/h). Fri May 25 14:30:05 2018

whereas with GCC we get

processed 1.75e+11 bytes in 43 seconds (4.07 GB/s, 14.65 TB/h). Fri May 25 14:31:52 2018

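This result is very fast, but even the best time for xoshiro256** (43 seconds with GCC) is slower than the best time for either 128-bit MCG (39 and 37 seconds, both with Clang).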

Finally, let's look at PCG-style generators. First let's look at Vigna's proposed variant. Compiling with Clang, we get

processed 1.75e+11 bytes in 59 seconds (2.966 GB/s, 10.68 TB/h). Fri May 25 14:44:37 2018

and with GCC we get

processed 1.75e+11 bytes in 62 seconds (2.823 GB/s, 10.16 TB/h). Fri May 25 14:46:42 2018

This is one of the rare occasions where GCC and Clang actually turn in almost equivalent times.

In contrast, with the general-purpose pcg64 generator, compiling with Clang I see:

processed 1.75e+11 bytes in 57 seconds (3.07 GB/s, 11.05 TB/h). Fri May 25 14:57:02 2018

whereas with GCC, I see

processed 1.75e+11 bytes in 64 seconds (2.735 GB/s, 9.844 TB/h). Fri May 25 14:59:07 2018

Thus, depending on which compiler we choose, Vigna's variant is either slightly faster or slightly slower.

Finally, if we look at pcg64_fast, compiling with Clang gives us

processed 1.75e+11 bytes in 49 seconds (3.572 GB/s, 12.86 TB/h). Fri May 25 15:00:45 2018

and with GCC we get

processed 1.75e+11 bytes in 65 seconds (2.693 GB/s, 9.693 TB/h). Fri May 25 15:02:15 2018

Again the performance of GCC is a bit disappointing; this MCG-based generator is actually running slower than the LCG-based one.

From this small amount of testing, we can see that pcg64 is not as fast as xoshiro256**, but a lot depends on the compiler you're using—if you're using Clang (which is the default compiler on OS X), pcg64_fast will beat xoshiro256**.

There's plenty of room for speed improvement in PCG. My original goal was to be faster than the Mersenne Twister, which it is, but knowing that it'll always be beaten by the speed of its underlying LCG, I haven't put a lot of effort into optimizing the code. I could have used the faster multiplier that I used above, and there is actually a completely different way of handling the LCG increment that reduces dependences and enhances speed, but implementing LCGs that way makes the code more opaque. If PCG's speed is an issue, these design decisions are worth revisiting.

But the speed winner is clearly a 128-bit MCG. It's actually what I use when speed is the primary criterion.

Conclusion

None of Vigna's concerns raise any serious worries about PCG. But critique is useful, and helps spur us to do better.

I'm sure Vigna has spent far longer thinking about PCG than he would like, so it is best to say a big thank you to him for all the thought and energy he has expended in these efforts. I'm pleased that I've mostly been able to put the critique to good use—it may be mostly specious for users, but it is certainly helpful for me. Reddit mostly saw vitriol and condescension, but I prefer to see it as a gift of his time and thought.

Thanks, Sebastiano!