Random Invertible Mapping Statistics
In my previous post, I looked at Bob Jenkins's JSF PRNG which has, at its heart, a random invertible mapping rather than a random (noninvertible) mapping.
I discussed the theory of random (noninvertible) mappings in my post “Too Big To Fail”, which looked at the behavior generators based on this idea. Although random (noninvertible) mappings produce PRNGs that I consider needlessly flawed (for general-purpose use), their behavior is at least well characterized in the academic literature, particularly by Philippe Flajolet (e.g., Analytic Combinatorics, section VII.3.3 pages 462-467).
Random invertible mappings have different properties, being based on a random bijection (which means that the generator can tick backwards as well as forwards if desired), whereas a random (noninvertible) mapping merely requires a random function (which may not be able to go backwards because there is no guarantee that there is only one place we could have come from).
If I were to search the literature, I would not be surprised to find that someone else has done a similar analysis of random invertible mappings. But a quick Google search revealed nothing, and, for me at least, deriving results myself is far more fun than searching for results derived by others, and often a good deal faster if there are no immediate leads. So that's what I've done. Now, hopefully, if someone Googles it, they'll find something. But if you have a good source to cite, please contact me and I'll update this article.
Random Invertible Mappings
In a random invertible mapping, we pick random bijection and use that to define a mapping. For example, suppose we use the function
int next(int current) {
current += 13;
current ^= current >> (WORDBITS/2);
return current;
}
This function is invertible (a bijection) because we can write a prev
function to go backwards:
int prev(int current) {
current ^= current >> (WORDBITS/2);
current -= 13;
return current;
}
If we visualize this function in a tiny world where we have 6-bit integers, it looks like
All the nodes are on a cycle, but they are not on a single cycle—we can see that there is one large cycle and some number of smaller ones.
This graph looks quite different from (and much better than!) a random mapping where we aren't able to go backwards. An example of such a mapping is the following, derived from middle square:
Random Mapping Statistics
For random mappings, Flajolet gives us these statistics
Property | Expected Value |
---|---|
Number of components | (log N)/2 |
Component size | N/3 |
Number cyclic nodes | √(π N/2) |
Path length to reach cycle (λ) | √(π N/8) |
Cycle length (μ) | √(π N/8) |
Rho length (ρ = λ + μ) | √(π N/2) |
Number of nodes of in-degree k | N e^{−k}/k! |
For a random invertible mapping, some of these statistics are zero because every node lies on a cycle, but what about the others?
Random Invertible Mapping Statistics
I will admit, I derived these properties the quick and easy way. Where there was simple math, I used it, but where there wasn't I did things numerically using Monte Carlo methods. I'll leave it as an exercise to check the numbers, ether symbolically or numerically.
For a random invertible mapping with N = 2^{B} nodes (a.k.a. states), we have
Property | Expected Value |
---|---|
Number of cycles | (log N) = (log 2) B |
Largest cycle length | 2^{B-log 2} |
Smallest cycle length | O(1) |
Expected cycle length | 2^{B-1} |
Number of nodes on cycles < 2^{k} | 2^{k} |
The last statistic is perhaps the most interesting. We'll apply it in the next section.
Analysing jsr32
and jsr64
Jenkins's generators are examples of generators designed around a random invertible mapping. As such, we know they will have some short cycles, but should we worry about that? One of the ways in which Jenkins reduces the chance of problems is by not allowing arbitrary seeding. Instead, jsr32
has only 2^{32} allowed seed values and jsr64
only has 2^{64} allowed seeds.
Jenkins explicitly ran all 2^{32} (4 billion) allowed seeds of jsr32
out to 2^{20} outputs to make sure they were not on a short cycle (and did not collide). He found no problems. Although his test results were reassuring, from a mathematical standpoint, did Jenkins need to run these tests? What is the chance that jsr32
might have failed? And should he have run a longer tests that demanded more output?
From the table in the previous section, we know to expect to find about 2^{20} states on cycles shorter than 2^{20}. But seed states are drawn from the entire population of 2^{128} states, which means that the chance that the state space of jsr32
would happen to be arranged such that any of its 2^{32} seeds would be one of the 2^{20} short-cycle states is about 1 in 2^{(128 – 20 – 32)}, or 75 thousand billion billion to one. Jenkins would have had to have been ridiculously unlucky in creating jsr32
for it to be an issue.
But 2^{20} outputs isn't very many at all—it's only about a million. It seems plausible that an extreme user might demand a petabyte from this generator, which would be 2^{48} outputs. The odds that jsr32
's state space is structured such that some of its 2^{32} seeds would lead to a less-than-petabyte cycle is 1 in 2^{(128 – 48 – 32)}, or 281 thousand billion to one.
It's worth realizing here that the results we're talking about relate to the structure of the state space of jsr32
itself, not what would happen on any particular run using it. In other words, jsr32
's seeding method will always be utterly safe unless Jenkins happened to be really unlucky. It's more likely that he would have won the lottery and then been immediately struck by lightning than that he happened to make jsr32
's seeding method accidentally capable of hitting a short cycle.
Obviously the result for jsr64
is even better. You can feel very confident that the odds that its state space is structured such that any of its 2^{64} seeds lead to a small cycle is amazingly vanishingly tiny.
Conclusion
Math is fun. A little empiricism is fun. And you can use Bob Jenkins's PRNGs without worrying at all.
I've deliberately been a little short of details on the math so you can have a go at checking these results yourself, if you like.