On Trivial Predictability

M.E. O'Neill

2017-08-20 21:04

When I wrote the PCG paper back in 2014, I failed really badly when talking about prediction difficulty. When people first started reading the paper and discussing it on the Internet, I realized I had missed the mark, and made sure that the page on this website about predictability had a more nuanced and measured tone, but today I think even that page didn't make the points clearly enough. Possibly I'll fail again today, but let's have another go at trying to articulate the issues.

How the PCG Paper Failed at Discussing Predictability

To mathematicians, the PCG paper failed to make a convincing argument that predictability should be a pressing worry for anyone. This comment from a recent hacker-news thread discussing the PCG paper captures that viewpoint admirably:

But I'll confess to not really understanding what all the fuss is about insecure generators.

I got similar comments from some of the TOMS reviewers when I eventually got their response to my journal submission of the PCG paper.

On the other hand, to many cryptographers, the PCG paper is a travesty. It seems completely half assed, belaboring basics that every cryptographer knows (like the Birthday Problem), with vital cryptographic analysis utterly absent. Nothing you'd expect from a cryptography paper is there. These Hacker News comments capture their concerns:

we addressed [attacks on hash tables] with actual cryptography: SipHash. Since cryptographic random number generators are generated with very similar primitives, why wouldn't SipHash be the answer to those problems as well?

and

Her paper doesn't pass the sniff test for me whatsoever when it comes to security analysis. She spent close to no time analyzing the primitives she introduced (and with no [cryptography-related] proofs or rigor!),

If you're a cryptographer annoyed by the PCG paper, these comments from the same discussion may help you

It's not a security paper. Or at least, I think it isn't?

and

If you stripped out all the (weird, random) cryptographic stuff from this paper, it would read pretty much the same and make pretty much the same points, which tells me: it's not a cryptographic paper.

Finally!

The PCG paper is not a crypto paper. What the PCG paper says is “Hey cryptographers, I hear you, we need to pay some attention to predictability even in non-crypto applications. I agree with the idea that general-purpose PRNGs shouldn't be easily predicted. I've done the best I could under my chosen constraints, but any serious analysis is going to be up to you to do at a later date. I've labelled the ones I know to be trivially predictable explicitly as _insecure and warned people against using them, but I won't label anything as ‘secure’, as that would be far too presumptuous.”

I tried to make some variants harder to predict than others. In trying to articulate that, I didn't do myself any favors, with profoundly ill-chosen phrases like “sensitive applications” and “insecure”, and perhaps worst of all, using the phrase “cryptographic security” to suggest an diverse spectrum of prediction difficulties and levels of security assurance. Mea culpa.

But, on the other hand, do we want to live in a world where only certified cryptographers are allowed to make any efforts at all to try to avoid blatant security weaknesses? I think it depends on your viewpoint; whether some effort is better than none at all; and whether low or unproven security is better than none. If all we ever used PRNGs for was high-security applications, there would be a clear answer, but I would argue that the world is actually more nuanced than that.

Why Not Always Use Certified Cryptographically Secure PRNGs?

Haven't cryptographers already solved the PRNG problem? That's what commenters like the one we saw earlier about using SIPHash would have us believe.

I like cryptographically secure PRNGs. In my (nonexhaustive!) list of recommended alternative PRNGs, I recommend Orson Peters's implementation of a secure PRNG based on ChaCha. If you're doing cryptography, or designing code for a slot machine, anything where security is paramount, you really should be using a proper secure PRNG.

Unfortunately, the “fast” designs I'm aware of for cryptographic hashes and for stream ciphers are designed to hash large chunks of data, or produce large amounts of random output. The requirements for general purpose PRNGs are different.

People can create multiple RNGs, so size matters.
The interface for a general-purpose PRNG is designed to produce one number at a time, so we can't just optimize for filling large blocks of memory with random data.
Speed really matters; users may be choosing between a randomized algorithm (e.g., random pivot quicksort) and a non-randomized version (e.g., median-of-three quicksort). Speed can really make the difference whether a randomized approach is viable, or not.

In the landscape of modern general-purpose PRNGs, a single random number can be produced by just a few machine instructions in a tiny amount of time, often about a nanosecond. That's not an average for a block of numbers, that's for making just one.

If you have an idea for a cryptographically secure PRNG that is competitive on size and speed, please implement it, test it, and tell the world. Shout it from the rooftops! I'll be delighted. Really!

But please, don't just think that some untested idea you have, like, “using SIPHash will be competitive because SIPHash is fast”, ends the discussion. Go try it, and come back with some actual results, please. Over the years many people have said to me, “You should try X” for some X, and I have, and the performance results have always been disappointing.

Why Not Always Use Trivially Predictable PRNGs?

If you're writing a simulation, or performing Monte Carlo integration, or doing procedural content generation, it probably doesn't matter two hoots if the underlying PRNG is trivially predictable.

But not all algorithms execute in such well-controlled circumstances. People use randomized algorithms, and a key assumption for many such algorithms is that randomness can help to avoid pathological behavior. We can construct pathological data so that the non-randomized median-of-three quicksort will turn in quadratic performance, because the data is arranged so that the algorithm always picks a pathological pivot. We can say that such data is unlikely to arise naturally in practice, but there are people out there who enjoy mounting algorithmic-complexity attacks, taking down public facing services by exploiting weaknesses in their algorithms.

In theory, random-pivot quicksort is safer precisely because you can't know in advance which item will be picked as the pivot and constantly picking a bad pivot is vanishingly unlikely. But if an adversary can predict the underlying generator, someone can likewise arrange data in pathological ways.

Thus, a general-purpose PRNG that might be used for any randomized algorithm in any context should probably be trying to not be trivially predictable.

The Space of Trade-Offs

The trade-offs for general-purpose PRNGs are thus different from the trade-offs in the world of cryptography. Algorithmic-complexity attacks aren't a major risk for our algorithms most of the time, so slowing everything down significantly to guard against them is almost always a very literal waste of time. However, paying no attention to these issues and leaving things wide open for exploitation is also looking for trouble.

In trying to find a middle ground in the space of trade-offs between slow-but-iron-clad security and fast-but-utterly-insecure generation, I have reached the following suggestion for a midpoint: if someone has to invest more effort in mounting an algorithmic-complexity attack than the attack itself will cost its victim, that is a significant (but not complete) deterrent, and the would-be attacker is likely to try easier targets instead.

It's also worth realizing that in the context of randomized algorithms, often “a miss is as good as a mile”. Whereas in cryptography a “break” that is off by one bit once in a while won't necessarily hurt in determining at least some of the content of a message, a similar one-bit error in prediction can throw off the careful construction of an algorithmic-complexity attack.

Defining Trivial Predictability

I'm happy to entertain (or learn) alternative definitions of trivial predictability, but here's what I have at this point:

Trivially predictable PRNGs can be predicted in little time, with little skill or insight.

This definition models security in the rest of our world. The locks on our offices, homes, cars, and so forth, present little challenge to experts, although some of the better weak locks may take even an expert a few minutes to open. But a locked car is less of a target than one with the window down and the keys in the ignition, even if the lock would make a true professional laugh.

Many (but not all) of the general purpose PRNGs in widespread use are trivially predictable; they correspond to the car with the window open and the keys in the ignition. In an earlier blog post, I showed that Vigna's Xoroshiro128+ is trivially predictable. James Roper wrote a fantastic series of articles back in 2010 on cracking PRNGs, including LCGs and the Mersenne Twister. The SplitMix paper actually includes the necessary code to discover its parameters from two outputs. And so on.

On the Horizon: A Contest

Here's my plan: I hope to have a page that tells you some output from a known PRNG, provides you the code for the randomized quicksort algorithm it is going to use for sorting, and invites you to suggest an input to sort. Your job will be to provide it with pathological input.

Different PRNGs will be on display, and the seed will change each day and reveal the seed and a possible pathological input for the previous day. You will have to crack that day's PRNG, so there will be a time limit. Some smaller PRNGs may have tighter time limits, such as a minute or two.

Some of the PRNGs on display will be easy, some will be hard. I hope it'll be fun. Of course, I need to actually write the back end, and the fall semester is now looming, so it may not happen as soon as I'd like. The idea is simple though. You could write it, and I could link to it!