Xoroshiro128+ Fails PractRand Even When Truncated

M.E. O'Neill

2018-03-25 02:24

Although I know a lot of effort went into Xoroshiro128+, and there are many good things that have come out of its development, I am sad to say that on balance I feel it has too many flaws to be worth recommending—there are many better choices. In this post, I'll dig a little deeper into some of its flaws.

Let's begin with what we already know:

John D. Cook showed that it fails PractRand after just a couple of seconds of testing, and before that Chris Doty-Humphrey, author of PractRand, showed that it didn't just fail “binary rank” tests; it also failed DC6, a short- to medium-range linear test.
Daniel Lemire showed that it fails TestU01.
I showed that it is trivially predictable.
I showed that visualizing smaller-scale versions of the scheme reveals clear flaws.

But are the flaws superficial and easily ignored, or more troubling than that?

Evolving Author Caveats

The authors of Xoroshiro128+ do acknowledge some of these flaws. The source code admits (somewhat obliquely) that it fails PractRand's binary-rank tests, but suggests that the problem is confined to just the lowest few bits. Over time, however, these claims have been progressively weakened. The source for this generator used to say:

with the exception of binary rank tests, which fail due to the lowest bit being an LFSR; all other bits pass all tests.

On 14 October 2017, the comments in the source were revised to say:

with the exception of binary rank tests, as the lowest bit of this generator is an LSFR. The next bit is not an LFSR, but in the long run it will fail binary rank tests, too. The other bits have no LFSR artifacts.

Less than six weeks later, on 29 November 2017, the comments were revised yet again, to their current wording (archive):

with the exception of binary rank tests, the lowest bit of this generator is an LFSR of degree 128. The next bit can be described by an LFSR of degree 8256, but in the long run it will fail linearity tests, too. The other bits needs a much higher degree to be represented as LFSRs.
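Why is the lowest bit an LFSR? Addition never carries into bit 0, so bit 0 of the output `s0 + s1` is just the XOR of the two low state bits, and every operation in the state update (xor, shift, rotate) is linear over GF(2). The sketch below checks that identity directly. It assumes the original 2016 Xoroshiro128+ constants (55, 14, 36); the struct and function names are hypothetical, not taken from the reference source.

```cpp
#include <cstdint>

// Minimal Xoroshiro128+ sketch, assuming the original (2016) constants
// a = 55, b = 14, c = 36. Hypothetical names, not the reference code.
struct Xoroshiro128Plus {
    std::uint64_t s[2];

    static std::uint64_t rotl(std::uint64_t x, int k) {
        return (x << k) | (x >> (64 - k));
    }

    std::uint64_t next() {
        const std::uint64_t s0 = s[0];
        std::uint64_t s1 = s[1];
        const std::uint64_t result = s0 + s1;   // output scrambler: the only nonlinear step
        s1 ^= s0;
        s[0] = rotl(s0, 55) ^ s1 ^ (s1 << 14);  // linear over GF(2)
        s[1] = rotl(s1, 36);                    // linear over GF(2)
        return result;
    }
};

// Bit 0 of (s0 + s1) receives no carry, so it equals (s0 ^ s1) & 1: a linear
// function of a linearly evolving 128-bit state. Returns true if that
// identity holds for `steps` consecutive outputs.
bool low_bit_is_xor_of_state(Xoroshiro128Plus rng, int steps) {
    for (int i = 0; i < steps; ++i) {
        const std::uint64_t predicted = (rng.s[0] ^ rng.s[1]) & 1;
        if ((rng.next() & 1) != predicted) return false;
    }
    return true;
}
```

Because the identity is exact, the low bit satisfies a linear recurrence of degree at most 128, which is exactly what binary rank tests detect.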

This version is quite a climbdown: from saying the other bits have no LFSR artifacts to admitting that, actually, yes, they do. But the implication remains that only the lowest few bits fail statistical tests in practice. If that is the case, using Xoroshiro128+ to generate 48 bits of randomness (throwing away not just the lowest two bits, but the lowest 16) should be fine. But you have to wonder: is that really true?

Small-Scale Tests

Back in July, 2017, before the comments in the source for Xoroshiro128+ were updated, I was wondering if it was really the case that it was only the lowest few bits that were problematic. So I decided to find out using my usual technique: testing a small-scale version of the generator.

Using the theory that underlies Xoroshiro128+ (which is simply Marsaglia's classic work on XorShift generators), I built a small C++ library that supports versions at different sizes. With that in place, I could run tests with Xoroshiro64+, a half-sized version of Xoroshiro128+.
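A minimal sketch of what a size-parameterized generator could look like, assuming the usual xoroshiro structure: two state words, a rotate/shift/xor update, and an output that is the sum of the two words. This is not the author's library. The class and alias names are hypothetical, 55/14/36 are the original 2016 Xoroshiro128+ constants, and the 32-bit constants 26/9/13 are borrowed from Vigna's xoroshiro64* reference code as an assumption, since the post does not say which constants its Xoroshiro64+ uses.

```cpp
#include <cstdint>
#include <limits>

// Sketch of a xoroshiro "+" generator templated on word size and on the
// rotate/shift constants (A, B, C). Output is the sum of the two state words.
template <typename Word, unsigned A, unsigned B, unsigned C>
class XoroshiroPlus {
    static_assert(std::numeric_limits<Word>::is_integer &&
                  !std::numeric_limits<Word>::is_signed,
                  "Word must be an unsigned integer type");
    static constexpr unsigned kBits = std::numeric_limits<Word>::digits;
    static_assert(A > 0 && A < kBits && B < kBits && C > 0 && C < kBits,
                  "constants must fit the word size");

public:
    // The state must not be all zeros: zero is a fixed point of the update.
    XoroshiroPlus(Word s0, Word s1) : s0_(s0), s1_(s1) {}

    Word operator()() {
        const Word s0 = s0_;
        Word s1 = s1_;
        const Word result = static_cast<Word>(s0 + s1);
        s1 ^= s0;
        s0_ = static_cast<Word>(rotl(s0, A) ^ s1 ^ (s1 << B));
        s1_ = rotl(s1, C);
        return result;
    }

private:
    static Word rotl(Word x, unsigned k) {
        return static_cast<Word>((x << k) | (x >> (kBits - k)));
    }
    Word s0_, s1_;
};

// Full-size generator with the original 2016 constants.
using Xoroshiro128Plus = XoroshiroPlus<std::uint64_t, 55, 14, 36>;
// Half-size generator; constants assumed (from Vigna's xoroshiro64* code).
using Xoroshiro64Plus = XoroshiroPlus<std::uint32_t, 26, 9, 13>;
```

Usage is just `Xoroshiro64Plus rng(seed0, seed1); std::uint32_t x = rng();`. Making the word size a template parameter means the same update rule can be exercised at 32 and 64 bits per word, which is the point of small-scale testing.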

In testing, it behaves similarly to its full-sized sibling, failing the Binary Rank test in just a couple of seconds.

unix% ./xoroshiro64 | ./RNG_test.new stdin32
RNG_test using PractRand version 0.93
RNG = RNG_stdin32, seed = 0x80307dbd
test set = normal, folding = standard (32 bit)

rng=RNG_stdin32, seed=0x80307dbd
length= 128 megabytes (2^27 bytes), time= 2.5 seconds
  Test Name                         Raw       Processed     Evaluation
  DC6-9x1Bytes-1                    R=  +6.7  p =  1.0e-3   mildly suspicious
  [Low8/32]BRank(12):768(1)         R=+583.3  p~=  1.2e-176   FAIL !!!!!!    
  [Low8/32]BRank(12):1K(1)          R= +1272  p~=  5.4e-384   FAIL !!!!!!!   
  [Low1/32]BRank(12):128(2)         R= +1799  p~=  1.2e-542   FAIL !!!!!!!   
  [Low1/32]BRank(12):256(2)         R= +5696  p~=  1e-1715    FAIL !!!!!!!!  
  [Low1/32]BRank(12):384(1)         R= +6783  p~=  5e-2043    FAIL !!!!!!!!  
  [Low1/32]BRank(12):512(1)         R= +9539  p~=  1e-2872    FAIL !!!!!!!!  
  ...and 110 test result(s) without anomalies

Even at this point, this result is something of an indictment of Xoroshiro128+. There are numerous PRNGs with 64 bits of state and 32-bit output that easily pass PractRand (examples include SplitMix, pcg32, and XorShift* 64/32).

But this smaller version is also ideal for examining whether it is just a few low-order bits that have problems. I thus created xoroshiro64plus16, which, instead of producing 32 bits of output, throws away the bottom 16 bits to produce only 16 bits of output. 16-bit output is easy to test with PractRand, whereas dropping 16 bits from its larger-sized counterpart would result in an awkwardly sized 48-bit PRNG that would be more annoying to test.
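Discarding low-order bits amounts to keeping only the high bits of each output via a right shift. A minimal sketch, assuming any generator callable that returns an unsigned 32- or 64-bit word; `high_bits` and `write_high16` are hypothetical names, and the output is written as raw bytes in native byte order, which is what piping into `RNG_test stdin16` consumes.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <limits>

// Keep only the top `OutBits` bits of one generator output, discarding the
// low-order bits (e.g. 32 -> 16 as in xoroshiro64plus16, 64 -> 32 as in
// xoroshiro128plus32). Hypothetical helper, not from the author's library.
template <unsigned OutBits, typename Rng>
std::uint64_t high_bits(Rng& rng) {
    using Word = decltype(rng());
    constexpr unsigned kBits = std::numeric_limits<Word>::digits;
    static_assert(OutBits > 0 && OutBits < kBits, "must discard some bits");
    return static_cast<std::uint64_t>(rng() >> (kBits - OutBits));
}

// Write `count` truncated 16-bit outputs to `out` as raw bytes (native byte
// order), buffered in chunks. Returns false if a write fails.
template <typename Rng>
bool write_high16(Rng& rng, std::FILE* out, std::uint64_t count) {
    std::uint16_t buf[1024];
    while (count > 0) {
        const std::size_t n = count < 1024 ? static_cast<std::size_t>(count) : 1024;
        for (std::size_t i = 0; i < n; ++i)
            buf[i] = static_cast<std::uint16_t>(high_bits<16>(rng));
        if (std::fwrite(buf, sizeof buf[0], n, out) != n) return false;
        count -= n;
    }
    return true;
}
```

A `main` that calls `write_high16` repeatedly on `stdout` would produce a stream suitable for `./prog | ./RNG_test stdin16`. Likewise, `high_bits<48>` on a 64-bit generator gives the 48-bit variant implied by the source comments.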

Here are the results of an extensive test of that cut-down generator:

unix% ./xoroshiro64plus16 | ./RNG_test stdin16 -te 1 -tf 2 -tlmax 50 -multithreaded -tlmaxonly
RNG_test using PractRand version 0.93
RNG = RNG_stdin16, seed = 0x3bf1cd82
test set = expanded, folding = extra

rng=RNG_stdin16, seed=0x3bf1cd82
length= 64 megabytes (2^26 bytes), time= 2.3 seconds
  no anomalies in 843 test result(s)

rng=RNG_stdin16, seed=0x3bf1cd82
length= 128 megabytes (2^27 bytes), time= 13.1 seconds
  no anomalies in 891 test result(s)

rng=RNG_stdin16, seed=0x3bf1cd82
length= 256 megabytes (2^28 bytes), time= 26.5 seconds
  no anomalies in 938 test result(s)

rng=RNG_stdin16, seed=0x3bf1cd82
length= 512 megabytes (2^29 bytes), time= 42.0 seconds
  no anomalies in 985 test result(s)

rng=RNG_stdin16, seed=0x3bf1cd82
length= 1 gigabyte (2^30 bytes), time= 64.1 seconds
  Test Name                         Raw       Processed     Evaluation
  [Low8/64]FPF-14+6/4:cross         R=  -2.5  p =1-2.3e-4   unusual
  ...and 1038 test result(s) without anomalies

rng=RNG_stdin16, seed=0x3bf1cd82
length= 2 gigabytes (2^31 bytes), time= 99.0 seconds
  Test Name                         Raw       Processed     Evaluation
  FPF-14+6/32:all                   R=  +5.3  p =  1.7e-4   unusual
  FPF-14+6/16:all                   R=  +5.4  p =  1.4e-4   unusual
  [Low1/16]BCFN_FF(2+0,13-3,T)      R=  -7.2  p =1-5.3e-4   unusual
  ...and 1089 test result(s) without anomalies

rng=RNG_stdin16, seed=0x3bf1cd82
length= 4 gigabytes (2^32 bytes), time= 155 seconds
  no anomalies in 1146 test result(s)

rng=RNG_stdin16, seed=0x3bf1cd82
length= 8 gigabytes (2^33 bytes), time= 270 seconds
  no anomalies in 1224 test result(s)

rng=RNG_stdin16, seed=0x3bf1cd82
length= 16 gigabytes (2^34 bytes), time= 487 seconds
  no anomalies in 1299 test result(s)

rng=RNG_stdin16, seed=0x3bf1cd82
length= 32 gigabytes (2^35 bytes), time= 904 seconds
  no anomalies in 1357 test result(s)

rng=RNG_stdin16, seed=0x3bf1cd82
length= 64 gigabytes (2^36 bytes), time= 1848 seconds
  no anomalies in 1430 test result(s)

rng=RNG_stdin16, seed=0x3bf1cd82
length= 128 gigabytes (2^37 bytes), time= 3448 seconds
  no anomalies in 1506 test result(s)

rng=RNG_stdin16, seed=0x3bf1cd82
length= 256 gigabytes (2^38 bytes), time= 6089 seconds
  no anomalies in 1567 test result(s)

rng=RNG_stdin16, seed=0x3bf1cd82
length= 512 gigabytes (2^39 bytes), time= 10918 seconds
  Test Name                         Raw       Processed     Evaluation
  DC6-9x1Bytes-1                    R= +13.0  p =  2.2e-5   mildly suspicious
  ...and 1622 test result(s) without anomalies

rng=RNG_stdin16, seed=0x3bf1cd82
length= 1 terabyte (2^40 bytes), time= 23189 seconds
  Test Name                         Raw       Processed     Evaluation
  DC6-9x1Bytes-1                    R= +24.1  p =  3.1e-9    VERY SUSPICIOUS
  DC6-6x2Bytes-1                    R=  +9.4  p =  1.0e-4   mildly suspicious
  [Low1/32]DC6-9x1Bytes-1           R=  -5.1  p =1-1.4e-3   unusual
  ...and 1675 test result(s) without anomalies

rng=RNG_stdin16, seed=0x3bf1cd82
length= 2 terabytes (2^41 bytes), time= 50828 seconds
  Test Name                         Raw       Processed     Evaluation
  DC6-9x1Bytes-1                    R= +48.2  p =  1.4e-17    FAIL !
  DC6-6x2Bytes-1                    R= +17.5  p =  2.5e-8   very suspicious
  ...and 1721 test result(s) without anomalies

rng=RNG_stdin16, seed=0x3bf1cd82
length= 4 terabytes (2^42 bytes), time= 91453 seconds
  Test Name                         Raw       Processed     Evaluation
  BCFN_FF(2+1):freq                 R=  +8.2  p~=   1e-7    suspicious
  DC6-9x1Bytes-1                    R= +90.2  p =  4.0e-32    FAIL !!!
  DC6-6x2Bytes-1                    R= +33.2  p =  2.7e-15    FAIL !
  ...and 1767 test result(s) without anomalies

rng=RNG_stdin16, seed=0x3bf1cd82
length= 8 terabytes (2^43 bytes), time= 165029 seconds
  Test Name                         Raw       Processed     Evaluation
  BCFN_FF(2+1):freq                 R= +11.4  p~=   5e-13     FAIL
  DC6-9x1Bytes-1                    R=+180.4  p =  2.3e-63    FAIL !!!!
  DC6-6x2Bytes-1                    R= +65.5  p =  1.4e-29    FAIL !!!
  DC6-5x4Bytes-1                    R=  +7.1  p =  1.5e-4   unusual
  ...and 1812 test result(s) without anomalies

rng=RNG_stdin16, seed=0x3bf1cd82
length= 16 terabytes (2^44 bytes), time= 310715 seconds
  Test Name                         Raw       Processed     Evaluation
  BCFN_FF(2+1):freq                 R= +17.5  p~=   6e-18     FAIL !
  DC6-9x1Bytes-1                    R=+355.7  p =  4.3e-124   FAIL !!!!!
  DC6-6x2Bytes-1                    R=+126.9  p =  9.5e-57    FAIL !!!!
  DC6-5x4Bytes-1                    R= +12.1  p =  1.4e-7   very suspicious
  ...and 1855 test result(s) without anomalies

rng=RNG_stdin16, seed=0x3bf1cd82
length= 32 terabytes (2^45 bytes), time= 606600 seconds
  Test Name                         Raw       Processed     Evaluation
  BCFN_FF(2+1):freq                 R= +37.2  p~=   6e-18     FAIL !
  DC6-9x1Bytes-1                    R=+713.4  p =  5.3e-248   FAIL !!!!!!
  DC6-6x2Bytes-1                    R=+253.1  p =  1.3e-112   FAIL !!!!!
  DC6-5x4Bytes-1                    R= +24.5  p =  5.3e-15    FAIL !
  [Low1/64]FPF-14+6/32:cross        R=  -3.0  p =1-9.2e-6   mildly suspicious
  ...and 1896 test result(s) without anomalies

rng=RNG_stdin16, seed=0x3bf1cd82
length= 64 terabytes (2^46 bytes), time= 1174339 seconds
  Test Name                         Raw       Processed     Evaluation
  BCFN_FF(2+0,13-0,T)               R=  +9.0  p =  2.2e-4   unusual
  BCFN_FF(2+1):freq                 R= +80.9  p~=   6e-18     FAIL !
  DC6-9x1Bytes-1                    R= +1421  p =  4.8e-493   FAIL !!!!!!!
  DC6-6x2Bytes-1                    R=+499.4  p =  1.2e-221   FAIL !!!!!!
  DC6-5x4Bytes-1                    R= +50.6  p =  9.7e-31    FAIL !!!
  FPF-14+6/4:(3,14-0)               R=  +7.8  p =  8.2e-7   unusual
  FPF-14+6/4:all2                   R=  +9.4  p =  4.1e-6   mildly suspicious
  ...and 1934 test result(s) without anomalies

rng=RNG_stdin16, seed=0x3bf1cd82
length= 128 terabytes (2^47 bytes), time= 2297153 seconds
  Test Name                         Raw       Processed     Evaluation
  BCFN_FF(2+0,13-0,T)               R= +21.2  p =  7.4e-11   VERY SUSPICIOUS
  BCFN_FF(2+0):freq                 R= +14.3  p~=   6e-18     FAIL !
  BCFN_FF(2+1):freq                 R=+161.9  p~=   6e-18     FAIL !
  DC6-9x1Bytes-1                    R= +2844  p =  4.0e-986   FAIL !!!!!!!
  DC6-6x2Bytes-1                    R=+982.7  p =  1.3e-435   FAIL !!!!!!!
  DC6-5x4Bytes-1                    R= +97.3  p =  7.8e-59    FAIL !!!!
  FPF-14+6/4:(3,14-0)               R= +18.8  p =  5.3e-17    FAIL !
  FPF-14+6/4:(4,14-0)               R=  +7.4  p =  1.7e-6   unusual
  FPF-14+6/4:(5,14-0)               R=  +7.4  p =  1.6e-6   unusual
  FPF-14+6/4:(6,14-0)               R=  +9.7  p =  1.3e-8   suspicious
  FPF-14+6/4:all                    R=  +8.6  p =  1.5e-7   very suspicious
  FPF-14+6/4:all2                   R= +70.0  p =  5.8e-37    FAIL !!!
  ...and 1968 test result(s) without anomalies

rng=RNG_stdin16, seed=0x3bf1cd82
length= 256 terabytes (2^48 bytes), time= 4538082 seconds
  Test Name                         Raw       Processed     Evaluation
  BCFN_FF(2+0,13-0,T)               R= +42.1  p =  5.1e-22    FAIL !!
  BCFN_FF(2+1,13-0,T)               R= +13.4  p =  1.0e-6   suspicious
  BCFN_FF(2+0):freq                 R= +24.9  p~=   6e-18     FAIL !
  BCFN_FF(2+1):freq                 R=+329.9  p~=   6e-18     FAIL !
  BCFN_FF(2+2):freq                 R=  +7.0  p~=   5e-6    unusual
  DC6-9x1Bytes-1                    R= +5681  p =  7e-1969    FAIL !!!!!!!!
  DC6-6x2Bytes-1                    R= +1940  p =  3.5e-859   FAIL !!!!!!!
  DC6-5x4Bytes-1                    R=+186.6  p =  1.2e-112   FAIL !!!!!
  FPF-14+6/4:(3,14-0)               R= +31.9  p =  4.2e-29    FAIL !!
  FPF-14+6/4:(4,14-0)               R= +16.7  p =  4.1e-15    FAIL
  FPF-14+6/4:(5,14-0)               R= +19.3  p =  1.7e-17    FAIL !
  FPF-14+6/4:(6,14-0)               R= +18.2  p =  2.0e-16    FAIL
  FPF-14+6/4:all                    R= +16.7  p =  3.6e-15    FAIL
  FPF-14+6/4:all2                   R=+277.5  p =  1.2e-152   FAIL !!!!!
  ...and 2003 test result(s) without anomalies

rng=RNG_stdin16, seed=0x3bf1cd82
length= 512 terabytes (2^49 bytes), time= 9012917 seconds
  Test Name                         Raw       Processed     Evaluation
  BCFN_FF(2+0,13-0,T)               R=+103.8  p =  4.6e-55    FAIL !!!!
  BCFN_FF(2+1,13-0,T)               R= +26.9  p =  6.7e-14    FAIL
  BCFN_FF(2+0):freq                 R= +60.2  p~=   6e-18     FAIL !
  BCFN_FF(2+1):freq                 R=+658.0  p~=   6e-18     FAIL !
  BCFN_FF(2+2):freq                 R= +14.2  p~=   6e-18     FAIL !
  BCFN_FF(2+9):freq                 R= +16.6  p~=   6e-18     FAIL !
  DC6-9x1Bytes-1                    R=+11350  p =  1e-3932    FAIL !!!!!!!!
  DC6-6x2Bytes-1                    R= +3834  p =  6e-1698    FAIL !!!!!!!!
  DC6-5x4Bytes-1                    R=+367.1  p =  2.8e-221   FAIL !!!!!!
  FPF-14+6/4:(3,14-0)               R= +64.7  p =  1.6e-59    FAIL !!!!
  FPF-14+6/4:(4,14-0)               R= +36.0  p =  6.0e-33    FAIL !!!
  FPF-14+6/4:(5,14-0)               R= +36.1  p =  5.1e-33    FAIL !!!
  FPF-14+6/4:(6,14-0)               R= +38.1  p =  6.9e-35    FAIL !!!
  FPF-14+6/4:all                    R= +35.0  p =  2.5e-32    FAIL !!!
  FPF-14+6/4:all2                   R= +1208  p =  2.0e-661   FAIL !!!!!!!
  ...and 2036 test result(s) without anomalies

rng=RNG_stdin16, seed=0x3bf1cd82
length= 1 petabyte (2^50 bytes), time= 18156195 seconds
  Test Name                         Raw       Processed     Evaluation
  BCFN_FF(2+0,13-0,T)               R=+194.7  p =  1.2e-103   FAIL !!!!!
  BCFN_FF(2+1,13-0,T)               R= +53.3  p =  5.2e-28    FAIL !!
  BCFN_FF(2+0):freq                 R=+142.3  p~=   6e-18     FAIL !
  BCFN_FF(2+1):freq                 R= +1313  p~=   6e-18     FAIL !
  BCFN_FF(2+2):freq                 R= +29.2  p~=   6e-18     FAIL !
  BCFN_FF(2+9):freq                 R= +22.9  p~=   6e-18     FAIL !
  DC6-9x1Bytes-1                    R=+22651  p =  3e-7847    FAIL !!!!!!!!
  DC6-6x2Bytes-1                    R= +7635  p =  1e-3380    FAIL !!!!!!!!
  DC6-5x4Bytes-1                    R=+711.6  p =  1.0e-428   FAIL !!!!!!!
  FPF-14+6/4:(3,14-0)               R=+129.1  p =  4.4e-119   FAIL !!!!!
  FPF-14+6/4:(4,14-0)               R= +75.7  p =  1.2e-69    FAIL !!!!
  FPF-14+6/4:(5,14-0)               R= +74.9  p =  5.9e-69    FAIL !!!!
  FPF-14+6/4:(6,14-0)               R= +75.1  p =  4.0e-69    FAIL !!!!
  FPF-14+6/4:all                    R= +66.7  p =  4.4e-62    FAIL !!!!
  FPF-14+6/4:all2                   R= +5043  p =  3e-2631    FAIL !!!!!!!!
  ...and 2068 test result(s) without anomalies

There are a few things to unpack from these results. First, it fails multiple tests in three separate test families: BCFN, DC6, and FPF. Second, although I ran tests out to a petabyte, I didn't need to; the first outright FAIL appears at 2 TB, well before 32 TB, PractRand's default stopping point.
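To put the 2 TB figure in perspective, here is a back-of-the-envelope calculation, assuming the generator's constants give it the maximal period of 2^64 − 1 for 64 bits of state (each 2-byte output corresponds to one generator step):

```latex
\begin{aligned}
\text{steps at first FAIL} &= \frac{2^{41}\ \text{bytes}}{2\ \text{bytes/step}} = 2^{40},\\
\text{fraction of period}  &= \frac{2^{40}}{2^{64}-1} \approx 2^{-24} \approx 6 \times 10^{-8}.
\end{aligned}
```

In other words, the truncated generator is detectably flawed after consuming only a tiny sliver of its period.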

Thus, for this version, the problems are not confined to the lowest few bits: throwing away sixteen bits (half of each 32-bit output) still leaves detectable statistical flaws.

Full-Scale Tests

At the same time as I ran the test of xoroshiro64plus16, I also set in motion a test of xoroshiro128plus32, a version of Xoroshiro128+ that discards the lowest 32 bits. We should expect that the more bits we discard, the harder we have to work to detect flaws, so I wasn't sure whether any issues would be detected. In fact, I thought it would probably be fine, as my comments in the original source for my Xoroshiro library show.

I was wrong. With the full-blown Xoroshiro128+ PRNG, even if you discard the entire low 32 bits, there are still detectable statistical flaws. Here's the run:

unix% ./xoroshiro128plus32 | ./RNG_test stdin32 -te 1 -tf 2 -tlmax 50 -multithreaded
RNG_test using PractRand version 0.93
RNG = RNG_stdin32, seed = 0x79972d1f
test set = expanded, folding = extra

rng=RNG_stdin32, seed=0x79972d1f
length= 128 megabytes (2^27 bytes), time= 2.8 seconds
  no anomalies in 891 test result(s)

rng=RNG_stdin32, seed=0x79972d1f
length= 256 megabytes (2^28 bytes), time= 12.7 seconds
  Test Name                         Raw       Processed     Evaluation
  [Low4/32]DC6-9x1Bytes-1           R=  -4.9  p =1-1.7e-3   unusual
  ...and 937 test result(s) without anomalies

rng=RNG_stdin32, seed=0x79972d1f
length= 512 megabytes (2^29 bytes), time= 24.6 seconds
  no anomalies in 985 test result(s)

rng=RNG_stdin32, seed=0x79972d1f
length= 1 gigabyte (2^30 bytes), time= 42.7 seconds
  no anomalies in 1038 test result(s)

rng=RNG_stdin32, seed=0x79972d1f
length= 2 gigabytes (2^31 bytes), time= 66.3 seconds
  no anomalies in 1092 test result(s)

rng=RNG_stdin32, seed=0x79972d1f
length= 4 gigabytes (2^32 bytes), time= 105 seconds
  no anomalies in 1153 test result(s)

rng=RNG_stdin32, seed=0x79972d1f
length= 8 gigabytes (2^33 bytes), time= 183 seconds
  no anomalies in 1226 test result(s)

rng=RNG_stdin32, seed=0x79972d1f
length= 16 gigabytes (2^34 bytes), time= 325 seconds
  no anomalies in 1298 test result(s)

rng=RNG_stdin32, seed=0x79972d1f
length= 32 gigabytes (2^35 bytes), time= 570 seconds
  no anomalies in 1361 test result(s)

rng=RNG_stdin32, seed=0x79972d1f
length= 64 gigabytes (2^36 bytes), time= 1140 seconds
  no anomalies in 1431 test result(s)

rng=RNG_stdin32, seed=0x79972d1f
length= 128 gigabytes (2^37 bytes), time= 2221 seconds
  no anomalies in 1506 test result(s)

rng=RNG_stdin32, seed=0x79972d1f
length= 256 gigabytes (2^38 bytes), time= 4116 seconds
  no anomalies in 1567 test result(s)

rng=RNG_stdin32, seed=0x79972d1f
length= 512 gigabytes (2^39 bytes), time= 8583 seconds
  no anomalies in 1623 test result(s)

rng=RNG_stdin32, seed=0x79972d1f
length= 1 terabyte (2^40 bytes), time= 17148 seconds
  no anomalies in 1678 test result(s)

rng=RNG_stdin32, seed=0x79972d1f
length= 2 terabytes (2^41 bytes), time= 33887 seconds
  no anomalies in 1723 test result(s)

rng=RNG_stdin32, seed=0x79972d1f
length= 4 terabytes (2^42 bytes), time= 69431 seconds
  no anomalies in 1770 test result(s)

rng=RNG_stdin32, seed=0x79972d1f
length= 8 terabytes (2^43 bytes), time= 139636 seconds
  no anomalies in 1816 test result(s)

rng=RNG_stdin32, seed=0x79972d1f
length= 16 terabytes (2^44 bytes), time= 278414 seconds
  no anomalies in 1859 test result(s)

rng=RNG_stdin32, seed=0x79972d1f
length= 32 terabytes (2^45 bytes), time= 572459 seconds
  no anomalies in 1901 test result(s)

rng=RNG_stdin32, seed=0x79972d1f
length= 64 terabytes (2^46 bytes), time= 1160830 seconds
  no anomalies in 1941 test result(s)

rng=RNG_stdin32, seed=0x79972d1f
length= 128 terabytes (2^47 bytes), time= 2268770 seconds
  no anomalies in 1980 test result(s)

rng=RNG_stdin32, seed=0x79972d1f
length= 256 terabytes (2^48 bytes), time= 4445499 seconds
  no anomalies in 2017 test result(s)

rng=RNG_stdin32, seed=0x79972d1f
length= 512 terabytes (2^49 bytes), time= 8789501 seconds
  Test Name                         Raw       Processed     Evaluation
  BCFN_FF(2+9):freq                 R= +13.7  p~=   6e-18     FAIL !
  ...and 2050 test result(s) without anomalies

As we can see, it holds out until the very end, but at 512 TB it resoundingly fails BCFN_FF(2+9):freq. As with its smaller sibling, we can surmise that if we ran the test longer, we'd pick up more failures.

Conclusions

Of course, few people bother to run a test that lasts 15 weeks and consumes half a petabyte of random numbers, but the point remains: Xoroshiro128+ has detectable flaws even if you throw away half of its output (32 bits).
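The "15 weeks" and "half a petabyte" figures come from the final 512 TB result above. Each 4-byte output corresponds to one step of the 128-bit-state generator, and the period of 2^128 − 1 assumes full-period constants:

```latex
\begin{aligned}
8\,789\,501\ \text{s} &\approx 101.7\ \text{days} \approx 14.5\ \text{weeks},\\
\text{generator steps} &= \frac{2^{49}\ \text{bytes}}{4\ \text{bytes/step}} = 2^{47},\\
\text{fraction of period} &= \frac{2^{47}}{2^{128}-1} \approx 2^{-81}.
\end{aligned}
```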

It seems reasonable to surmise that if you throw only 16 bits away, leaving 48 bits, you'll see statistical flaws sooner. When I get around to it, I'll test that too, but even if it takes less than 15 weeks of testing, it may well need more than the typical week of testing to unmask the issues.

It's my understanding that Vigna mostly dismisses linearity-related failures as overly technical, and has even gone so far as to argue with the author of PractRand as to whether BCFN and DC6 are valid tests at all (even though other PRNGs pass them just fine). But I'd rather use a PRNG where you don't have to worry as much. There are plenty of them out there, including some very old ones.