Random Invertible Mapping Statistics

In my previous post, I looked at Bob Jenkins's JSF PRNG which has, at its heart, a random invertible mapping rather than a random (noninvertible) mapping.

I discussed the theory of random (noninvertible) mappings in my post “Too Big To Fail”, which looked at the behavior of generators based on this idea. Although random (noninvertible) mappings produce PRNGs that I consider needlessly flawed (for general-purpose use), their behavior is at least well characterized in the academic literature, particularly by Philippe Flajolet (e.g., Analytic Combinatorics, section VII.3.3, pages 462–467).

Random invertible mappings have different properties. They are based on a random bijection, which means that the generator can tick backwards as well as forwards if desired, whereas a random (noninvertible) mapping merely requires a random function, which may not be able to go backwards because there is no guarantee that there is only one place we could have come from.

If I were to search the literature, I would not be surprised to find that someone else has done a similar analysis of random invertible mappings. But a quick Google search revealed nothing, and, for me at least, deriving results myself is far more fun than searching for results derived by others, and often a good deal faster if there are no immediate leads. So that's what I've done. Now, hopefully, if someone Googles it, they'll find something. But if you have a good source to cite, please contact me and I'll update this article.

Random Invertible Mappings

In a random invertible mapping, we pick a random bijection and use it to define the mapping. For example, suppose we use the function

// WORDBITS is the width of the integer type in bits (e.g., 32).
unsigned next(unsigned current) {
    current += 13;                          // add a constant (mod 2^WORDBITS)
    current ^= current >> (WORDBITS/2);     // xorshift by half the word width
    return current;
}

This function is invertible (a bijection) because we can write a prev function to go backwards:

unsigned prev(unsigned current) {
    current ^= current >> (WORDBITS/2);     // a shift of >= half the word is its own inverse
    current -= 13;
    return current;
}

If we visualize this function in a tiny world where we have 6-bit integers, it looks like this:

Plus 13, Xorshift 3

All the nodes are on a cycle, but they are not on a single cycle—we can see that there is one large cycle and some number of smaller ones.
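
To see this structure without drawing the graph, we can just enumerate the cycles directly. Here's a minimal sketch (my own illustration; next6 is just a name I'm using for the mapping above confined to a 6-bit world by masking, so the shift is 3 as in the figure) that walks each unvisited state around its cycle and reports the cycle lengths:

#include <stdbool.h>
#include <stdio.h>

#define WORDBITS 6
#define N (1u << WORDBITS)

// The "plus 13, xorshift 3" mapping above, confined to 6-bit integers.
static unsigned next6(unsigned current) {
    current = (current + 13) & (N - 1);
    current ^= current >> (WORDBITS / 2);
    return current;
}

int main(void) {
    bool seen[N] = {false};
    for (unsigned start = 0; start < N; ++start) {
        if (seen[start])
            continue;                 // this state's cycle was already traced
        unsigned length = 0, x = start;
        do {                          // because next6 is a bijection, we must return to start
            seen[x] = true;
            x = next6(x);
            ++length;
        } while (x != start);
        printf("cycle of length %u containing state %u\n", length, start);
    }
    return 0;
}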

This graph looks quite different from (and much better than!) a random mapping where we aren't able to go backwards. An example of such a mapping is the following, derived from middle square:

Graph of 6-bit Middle-Square

Random Mapping Statistics

For random mappings, Flajolet gives us these statistics:

Property                            Expected Value
Number of components                (log N)/2
Component size                      2N/3
Number of cyclic nodes              √(π N/2)
Path length to reach cycle (λ)      √(π N/8)
Cycle length (μ)                    √(π N/8)
Rho length (ρ = λ + μ)              √(π N/2)
Number of nodes of in-degree k      N/(e·k!)

For a random invertible mapping, some of these statistics are trivial: every node lies on a cycle, so the path length to reach a cycle (λ) is zero and all N nodes are cyclic. But what about the others?

Random Invertible Mapping Statistics

I will admit, I derived these properties the quick and easy way. Where there was simple math, I used it, but where there wasn't, I did things numerically using Monte Carlo methods. I'll leave it as an exercise to check the numbers, either symbolically or numerically.

For a random invertible mapping with N = 2^B nodes (a.k.a. states), we have

Property                                      Expected Value
Number of cycles                              log N = (log 2) B
Largest cycle length                          2^(B - log 2)
Smallest cycle length                         O(1)
Expected cycle length                         2^(B-1)
Number of nodes on cycles of length < 2^k     2^k

The last statistic is perhaps the most interesting. We'll apply it in the next section.
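
If you want to check the table numerically, here is a minimal Monte Carlo sketch of the kind of check I have in mind (the parameters are arbitrary choices for illustration: 2^14 states, twenty trials, and a simple rand()-based shuffle). It builds random permutations and averages the number of cycles, the largest cycle, the length of the cycle holding a random state, and the number of states on cycles shorter than 2^8:

#include <stdio.h>
#include <stdlib.h>

#define BITS 14                  /* N = 2^14 states: big enough to be interesting, quick to run */
#define N (1u << BITS)
#define TRIALS 20

static unsigned perm[N];         /* the random permutation (invertible mapping) */
static unsigned cyclen[N];       /* length of the cycle containing each state   */

int main(void) {
    double cycles = 0, largest = 0, holding = 0, shortnodes = 0;
    srand(12345);
    for (int t = 0; t < TRIALS; ++t) {
        /* Build a uniformly random permutation with a Fisher-Yates shuffle
           (rand() % (i+1) is slightly biased, but fine for a rough check). */
        for (unsigned i = 0; i < N; ++i)
            perm[i] = i;
        for (unsigned i = N - 1; i > 0; --i) {
            unsigned j = (unsigned) rand() % (i + 1);
            unsigned tmp = perm[i]; perm[i] = perm[j]; perm[j] = tmp;
        }
        /* Trace each cycle once, recording its length at every state on it. */
        for (unsigned i = 0; i < N; ++i)
            cyclen[i] = 0;
        unsigned ncycles = 0, maxlen = 0;
        for (unsigned i = 0; i < N; ++i) {
            if (cyclen[i] != 0)
                continue;                    /* cycle already traced */
            unsigned len = 0, x = i;
            do { ++len; x = perm[x]; } while (x != i);
            ++ncycles;
            if (len > maxlen)
                maxlen = len;
            x = i;
            do { cyclen[x] = len; x = perm[x]; } while (x != i);
        }
        double sum = 0;
        unsigned shortc = 0;
        for (unsigned i = 0; i < N; ++i) {
            sum += cyclen[i];
            if (cyclen[i] < (1u << 8))
                ++shortc;                    /* states on cycles shorter than 2^8 */
        }
        cycles     += ncycles;
        largest    += maxlen;
        holding    += sum / N;
        shortnodes += shortc;
    }
    printf("avg number of cycles:              %7.2f   (log N = %.2f)\n",
           cycles / TRIALS, BITS * 0.6931);
    printf("avg largest cycle length:          %7.0f   (out of %u states)\n",
           largest / TRIALS, N);
    printf("avg cycle holding a random state:  %7.0f   (N/2 = %u)\n",
           holding / TRIALS, N / 2);
    printf("avg states on cycles < 2^8:        %7.0f   (expect about 2^8 = 256)\n",
           shortnodes / TRIALS);
    return 0;
}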

Analysing jsf32 and jsf64

Jenkins's generators are examples of generators designed around a random invertible mapping. As such, we know they will have some short cycles, but should we worry about that? One of the ways in which Jenkins reduces the chance of problems is by not allowing arbitrary seeding. Instead, jsf32 has only 2^32 allowed seed values and jsf64 has only 2^64 allowed seeds.

Jenkins explicitly ran all 2^32 (4 billion) allowed seeds of jsf32 out to 2^20 outputs to make sure they were not on a short cycle (and did not collide). He found no problems. Although his test results were reassuring, from a mathematical standpoint, did Jenkins need to run these tests? What is the chance that jsf32 might have failed? And should he have run longer tests that demanded more output?

From the table in the previous section, we know to expect to find about 2^20 states on cycles shorter than 2^20. But seed states are drawn from the entire population of 2^128 states, which means that the chance that the state space of jsf32 would happen to be arranged such that any of its 2^32 seeds would be one of the 2^20 short-cycle states is about 1 in 2^(128 – 20 – 32), or 75 thousand billion billion to one.
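
To spell the arithmetic out: each of the 2^32 allowed seeds has about a 2^20 / 2^128 chance of being one of the short-cycle states, so by linearity of expectation the expected number of bad seeds is

    2^32 × 2^20 / 2^128 = 2^(32 + 20 – 128) = 2^(–76),

which is where the 1 in 2^(128 – 20 – 32) figure comes from. (The petabyte calculation in the next paragraph is the same computation with 2^48 in place of 2^20.)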

But 2^20 outputs isn't very many at all—it's only about a million. It seems plausible that an extreme user might demand a petabyte from this generator, which would be 2^48 outputs. The odds that jsf32's state space is structured such that some of its 2^32 seeds would lead to a less-than-petabyte cycle are 1 in 2^(128 – 48 – 32), or 281 thousand billion to one.

It's worth realizing here that the results we're talking about relate to the structure of the state space of jsf32 itself, not what would happen on any particular run using it. In other words, jsf32's seeding method will always be utterly safe unless Jenkins happened to be really unlucky. It's more likely that he would have won the lottery and then been immediately struck by lightning than that he happened to make jsf32's seeding method accidentally capable of hitting a short cycle.

Obviously the result for jsf64 is even better. You can feel very confident that the odds that its state space is structured such that any of its 2^64 seeds lead to a small cycle are vanishingly tiny.

Conclusion

Math is fun. A little empiricism is fun. And you can use Bob Jenkins's PRNGs without worrying at all.

I've deliberately been a little light on the mathematical details so you can have a go at checking these results yourself, if you like.