Statistical Bias in the Distribution of Prime Pairs and Isolated Primes
WALDEMAR PUSZKARZ
Abstract.
Computer experiments reveal that twin primes tend to center on nonsquarefree multiples of 6 more often than on squarefree multiples of 6 compared to what should be expected from the ratio of the number of nonsquarefree multiples of 6 to the number of squarefree multiples of 6, equal to $\pi^2/3 - 1$, or ca 2.290. For multiples of 6 surrounded by twin primes, this ratio is 2.427, a relative difference of ca 6.0% measured against the expected value. A deviation from the expected value of this ratio, ca 1.9%, [...] 1.2% of the total number of twins. In the case of isolated primes, this excess for nonsquarefree numbers amounts to 0.4% of the total number of such primes. The above numbers are for the first $10^{10}$ primes, with the bias showing a tendency to grow, at least for isolated primes.

1. Surprising effect
Except for the first two (2 and 3), all primes are of the form $6k - 1$ or $6k + 1$. This implies that twin primes (consecutive primes separated by 2) surround multiples of 6 (3 and 5 are the only exception).

All natural numbers are either squarefree or nonsquarefree. Unlike the former, the latter are divisible by a square greater than 1. All primes are squarefree.

Squarefree numbers have a natural density of $6/\pi^2$, which gives $1 - 6/\pi^2$ for the natural density of nonsquarefree numbers. Thus, the ratio of relative frequencies at which one expects to find these numbers in a sufficiently large sample of natural numbers is $\pi^2/6 - 1 \approx 0.645$. Among multiples of 6, where the density of squarefree numbers is $3/\pi^2$, this ratio is $R_0 = \pi^2/3 - 1 \approx 2.290$ (most numbers in this paper are rounded off to the 3rd decimal digit). What this means is that in a sufficiently large sample, on average for every 100 squarefree multiples of 6 there are about 229 nonsquarefree numbers divisible by 6.

If prime pairs are as likely to center on squarefree multiples of 6 as they are on nonsquarefree multiples of 6 (i.e., their distribution is unbiased in this respect), we should expect the same ratio for the multiples of 6 in their centers if calculations are performed on a large enough sample of prime pairs.

Date: July 3, 2018.
Key words and phrases. Primes, twin primes.
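The value $R_0 = \pi^2/3 - 1$ used in this section can be checked with a short derivation. The sketch below assumes the standard Euler-product form of the squarefree density, $\prod_p (1 - 1/p^2) = 6/\pi^2$; the symbol $d_6$ for the density of squarefree numbers among multiples of 6 is introduced here only for illustration. A number $6m$ is squarefree exactly when $m$ is squarefree and coprime to 6, so:

```latex
\begin{align*}
d_6 &= \Bigl(1-\tfrac12\Bigr)\Bigl(1-\tfrac13\Bigr)\prod_{p\ge 5}\Bigl(1-\frac{1}{p^2}\Bigr)
     = \frac13\cdot\frac{6/\pi^2}{\bigl(1-\tfrac14\bigr)\bigl(1-\tfrac19\bigr)}
     = \frac13\cdot\frac{6}{\pi^2}\cdot\frac32 = \frac{3}{\pi^2},\\
R_0 &= \frac{1-d_6}{d_6} = \frac{\pi^2}{3}-1 \approx 2.2899 .
\end{align*}
```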
However, this is not the case.
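A scaled-down version of the experiment can be sketched in Python (chosen here only so that the sketch runs with the standard library; the paper's own code is PARI/GP). The function names `prime_flags`, `is_squarefree`, and `twin_center_counts` are hypothetical, and the sample is all twin pairs up to a bound rather than the first $n$ primes, so the counts will not coincide with the paper's data.

```python
from math import isqrt, pi


def prime_flags(limit):
    """Sieve of Eratosthenes: flags[n] == 1 iff n is prime, for 0 <= n <= limit."""
    flags = bytearray([1]) * (limit + 1)
    flags[0:2] = b"\x00\x00"
    for p in range(2, isqrt(limit) + 1):
        if flags[p]:
            # Clear every multiple of p starting at p*p.
            flags[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return flags


def is_squarefree(n):
    """True iff no square d*d > 1 divides n (plain trial division)."""
    d = 2
    while d * d <= n:
        if n % (d * d) == 0:
            return False
        d += 1
    return True


def twin_center_counts(limit):
    """Return (a, b) for twin pairs (p, p + 2) with 5 <= p and p + 2 <= limit.

    a counts all such pairs; b counts those whose center p + 1 is squarefree.
    The pair (3, 5) is skipped because its center is not a multiple of 6.
    """
    flags = prime_flags(limit)
    a = b = 0
    for p in range(5, limit - 1):
        if flags[p] and flags[p + 2]:
            a += 1
            if is_squarefree(p + 1):
                b += 1
    return a, b


if __name__ == "__main__":
    a, b = twin_center_counts(10**5)
    print("observed ratio (a - b) / b:", (a - b) / b)
    print("unbiased ratio pi^2/3 - 1: ", pi**2 / 3 - 1)
```

For example, `twin_center_counts(100)` returns `(7, 3)`: the seven centers are 6, 12, 18, 30, 42, 60, and 72, of which only 6, 30, and 42 are squarefree.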
This can be observed already in a sample of the first $10^6$ primes and the discrepancy persists (may even be getting slightly stronger) for larger samples. The largest we used consisted of the first $10^{10}$ primes and for it the ratio is $R_2 = 2.427$. Defining the relative difference as the absolute value of the ratio of the difference between the experimental value and the theoretical one to the theoretical one, we find out that the relative difference in this case is ca 6.0%.

Isolated primes (primes $p$ such that neither $p - 2$ nor $p + 2$ is prime) reveal a similar bias: such primes occur next to nonsquarefree multiples of 6 slightly more often than next to squarefree multiples of 6 compared to what would be expected in the non-biased distribution. In this case, the ratio is $R_1 = 2.333$ (for the sample of $10^{10}$ primes), and the relative difference is ca 1.9%.

[...] multiples of 6 (and virtually unchanged for the sample of $10^9$), noticeably smaller than 2.427 and very close to the unbiased ratio, 2.290.

But if we exclude twin primes from the test pairs, the effect goes away almost completely for the sample of $10^8$ as now we get 2.286, which leads to less than 0.2% in relative difference, a difference small enough to ascribe it to statistical fluctuations. Moreover, for the sample of $10^9$, this ratio is 2.2889, less than 0.05% in relative difference, a very small difference indeed.

For isolated test squarefree numbers next to a multiple of 6 (to the left or right of it), the bias effect pretty much fails to manifest itself already in the sample of the first $10^8$ such multiples as we get 2.2907 (rounded off to the 4th decimal digit), which versus 2.2899 is less than 0.04% in relative difference. If the primes are excluded from these squarefree numbers, we obtain 2.2894, and the relative difference is now ca 0.02%. For the sample of $10^9$ (primes excluded), the ratio is 2.2897, even closer to the unbiased theoretical value.

Hence, to reiterate, the observed effect in both situations is most certainly a property of primes rather than a generic property of squarefree numbers.
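The ratios and relative differences quoted for the test squarefree numbers follow directly from the counts $a$ and $b$ listed in the data section via $R = (a - b)/b$. A minimal Python sketch of that arithmetic is given here; the helper names `ratio` and `relative_difference` are hypothetical, and the counts are copied from Part B of the data section (index 8).

```python
from math import pi

R0 = pi**2 / 3 - 1  # unbiased ratio for multiples of 6, ca 2.290


def ratio(a, b):
    """R = (a - b) / b: objects at nonsquarefree centers per object at squarefree ones."""
    return (a - b) / b


def relative_difference(r, expected=R0):
    """|observed - expected| / expected."""
    return abs(r - expected) / expected


# (a, b) counts for the first 10^8 multiples of 6, copied from Part B of the data section
test_counts = {
    "squarefree twins, primes included": (82962973, 25097397),
    "squarefree twins, primes excluded": (57015536, 17348734),
    "isolated squarefree, primes included": (99415124, 30211331),
    "isolated squarefree, primes excluded": (94037859, 28588317),
}

for label, (a, b) in test_counts.items():
    r = ratio(a, b)
    print(f"{label}: R = {r:.4f}, relative difference = {100 * relative_difference(r):.2f}%")
```

With these counts the primes-included squarefree twins give a ratio of about 2.306, while the primes-excluded cases give about 2.286 and 2.2894, in line with the figures in the text.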
To put it more precisely (if not pedantically), if there is any actual contribution to it from non-prime squarefree numbers, it is negligible compared to the contribution from primes.

Moreover, the effect appears pretty stable over several sample ranges, with the range size growing by a factor of 10 for each data point we collected to determine the effect behavior (see the next section).

2. Excess functions
Let us define the excess functions as $\epsilon_1 = \mathrm{round}(1000 \cdot (R_1 - R_0))$ for isolated primes and $\epsilon_2 = \mathrm{round}(1000 \cdot (R_2 - R_0))$ for prime pairs, where $\mathrm{round}(x)$ is tasked with rounding off $x$ to the nearest integer.
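As an illustration, the definition can be applied to the counts $a$ and $b$ for primes listed in the data section (Part A), with each ratio computed as $(a - b)/b$. The Python sketch below is only a check of the arithmetic; the names `excess`, `ISOLATED`, and `PAIRS` are hypothetical.

```python
from math import pi

R0 = pi**2 / 3 - 1  # analytic ratio for the unbiased case


def excess(a, b):
    """round(1000 * (R - R0)) with R = (a - b) / b."""
    return round(1000 * ((a - b) / b - R0))


# exponent n of the sample size 10^n -> (a, b), copied from Part A of the data section
ISOLATED = {
    6: (827946, 249071),
    7: (8522806, 2560208),
    8: (87005186, 26123609),
    9: (883905640, 265275545),
    10: (8950532978, 2685404943),
}
PAIRS = {
    6: (86027, 25113),
    7: (738597, 215732),
    8: (6497407, 1897137),
    9: (58047180, 16944418),
    10: (524733511, 153121114),
}

eps_isolated = {n: excess(a, b) for n, (a, b) in ISOLATED.items()}
eps_pairs = {n: excess(a, b) for n, (a, b) in PAIRS.items()}
print("isolated primes:", eps_isolated)
print("prime pairs:    ", eps_pairs)
```

Applied to these counts, the sketch yields 34, 39, 41, 42, 43 for isolated primes and 136, 134, 135, 136, 137 for prime pairs.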
The data for $R_1$ and $R_2$ is obtained numerically while $R_0$ can be calculated analytically as done above. We obtained 5 data points for each of these functions (see the data section for more information) and they suggest (albeit quite weakly) that we may be dealing with slowly growing functions, at least in the case of $\epsilon_1$. More data is needed to be positive that the trends we suspect these functions may be showing are not due to statistical fluctuations. It may very well be that the best approximation to these functions is a constant. This appears to be the most likely scenario for $\epsilon_2$ and it may be so asymptotically for $\epsilon_1$.

Below are the values of these functions at arguments that are consecutive powers of 10, starting at $10^6$ (we use the exponent values of $10^n$ to index the arguments).

For isolated primes, $\epsilon_1(6) = 34$, $\epsilon_1(7) = 39$, $\epsilon_1(8) = 41$, $\epsilon_1(9) = 42$, $\epsilon_1(10) = 43$.

For prime pairs, $\epsilon_2(6) = 136$, $\epsilon_2(7) = 134$, $\epsilon_2(8) = 135$, $\epsilon_2(9) = 136$, $\epsilon_2(10) = 137$.

These numbers represent the (average) excess of nonsquarefree numbers compared to the non-biased case for every 1000 squarefree numbers. For instance, $\epsilon_2(10)$ tells us that there are on average 137 more prime pairs centered on nonsquarefree multiples of 6 per 1000 squarefree multiples of 6 surrounded by primes than one would expect in the unbiased situation for the first $10^{10}$ primes.

3. Code and data
The effect discussed was first observed in Mathematica computer experiments performed on the first $10^6$ primes. The data for larger samples was obtained using PARI/GP, an open source software package for number theory.

What follows below is a sample of the PARI/GP code used to obtain the data, together with the data. The data is indexed by exponents of range size, chosen to be powers of 10, from $10^6$ to $10^{10}$. The code for Mathematica can easily be produced from the PARI/GP code.

Our code counts all prime pairs (even though the first of them, {3, 5}, is not centered on a multiple of 6) and excludes 2 as an isolated prime. While 2 is sometimes treated as an isolated prime, it is actually less isolated from other primes than all odd primes save for 3. With 2 excluded, the number of isolated primes plus twice the number of pairs still adds up to the range size ($10^6$ through $10^{10}$), for 5 is counted twice as a member of two consecutive pairs that share it. These choices have no impact on our statistical results. We mention them for the sake of clarity.

In what follows, a represents the number of all target objects (primes or test squarefree numbers), while b represents only the number of such objects next to or centered on squarefree numbers. The ratios discussed above are calculated as $R = (a - b)/b$.

Part A. Prime numbers

Twin primes

    a=0; forprime(n=2, prime(10^8), isprime(n+2)&&a++); print1(a) \\ all twins
    b=0; forprime(n=2, prime(10^8), isprime(n+2)&&issquarefree(n+1)&&b++); print1(b) \\ twins centered on a squarefree number
a: 86027 (6), 738597 (7), 6497407 (8), 58047180 (9), 524733511 (10).
b: 25113 (6), 215732 (7), 1897137 (8), 16944418 (9), 153121114 (10).

Isolated primes

    a=0; forprime(n=3, prime(10^8), !isprime(n+2)&&!isprime(n-2)&&a++); print1(a) \\ all
    b=0; forprime(n=3, prime(10^8), !isprime(n+2)&&!isprime(n-2)&&((n%6==1&&issquarefree(n-1))||(n%6==5&&issquarefree(n+1)))&&b++); print1(b) \\ next to a squarefree number

a: 827946 (6), 8522806 (7), 87005186 (8), 883905640 (9), 8950532978 (10).
b: 249071 (6), 2560208 (7), 26123609 (8), 265275545 (9), 2685404943 (10).

Part B. Test squarefree numbers

Squarefree twins (primes included) centered on a multiple of 6

    a=0; for(n=1, 10^8, issquarefree(6*n-1)&&issquarefree(6*n+1)&&a++); print1(a) \\ all
    b=0; for(n=1, 10^8, issquarefree(6*n)&&issquarefree(6*n-1)&&issquarefree(6*n+1)&&b++); print1(b) \\ centered on a squarefree number

a: 82962973 (8), 829630636 (9).
b: 25097397 (8), 250974031 (9).

Squarefree twins (primes excluded) centered on a multiple of 6

    a=0; for(n=1, 10^8, issquarefree(6*n-1)&&!isprime(6*n-1)&&issquarefree(6*n+1)&&!isprime(6*n+1)&&a++); print1(a) \\ all
    b=0; for(n=1, 10^8, issquarefree(6*n)&&issquarefree(6*n-1)&&!isprime(6*n-1)&&issquarefree(6*n+1)&&!isprime(6*n+1)&&b++); print1(b) \\ centered on squarefree numbers

a: 57015536 (8), 595982891 (9).
b: 17348734 (8), 181210143 (9).

Isolated squarefree numbers (primes included) next to a multiple of 6

    a=0; for(n=1, 10^8, (issquarefree(6*n-1)||issquarefree(6*n+1))&&a++); print1(a) \\ number of cases a squarefree number is next to a multiple of 6
    b=0; for(n=1, 10^8, (issquarefree(6*n))&&(issquarefree(6*n-1)||issquarefree(6*n+1))&&b++); print1(b) \\ number of cases a squarefree number is next to a squarefree multiple of 6
a: 99415124 (8).
b: 30211331 (8).

Isolated squarefree numbers (primes excluded) next to a multiple of 6

    a=0; for(n=1, 10^8, (issquarefree(6*n-1)&&!isprime(6*n-1))||(issquarefree(6*n+1)&&!isprime(6*n+1))&&a++); print1(a) \\ number of cases a non-prime squarefree number is next to a multiple of 6
    b=0; for(n=1, 10^8, issquarefree(6*n)&&((issquarefree(6*n-1)&&!isprime(6*n-1))||(issquarefree(6*n+1)&&!isprime(6*n+1)))&&b++); print1(b) \\ number of cases a non-prime squarefree number is next to a squarefree multiple of 6

a: 94037859 (8), 948253019 (9).
b: 28588317 (8), 288245142 (9).

4. Other remarks
We mentioned that the effect discussed here can easily be observed even among the first million primes. More surprisingly, though, $R_2$ is consistently larger compared to the expected value even for the first 100, 1000, or 10,000 primes all the way up to one million (and beyond). More than 10,000 first primes were known by the end of the 18th century. What this means is that even Gauss or one of his contemporaries could have noticed it some 200 years ago! Yet, we have found no evidence it was known at all.

The bias that we have so far chosen to measure using the $R$ ratio can also be expressed in other ways. For instance, one can inquire how much the number of twins centered on squarefree numbers, or b (see the data section), differs from its expected value for a given sample size (number a in our data section). Or, we can inquire about the same deviation from the expected value for twins centered on nonsquarefree numbers.

As an example, let us calculate this for the first $10^6$ primes. In this case, we have 86026 prime pairs centered on multiples of 6. If these pairs were distributed in an unbiased way, the way non-prime squarefree twins are, the number of them surrounding squarefree multiples of 6 would be ca $\mathrm{round}(86026/(R_0 + 1)) = 26149$ and not 25113 that we actually get. The deficit we observe, 1036, represents 3.96% [...] 1.73% for the relative excess measured against the expected value. For isolated primes, similar calculations give 1.03% and 0.45% [...] $R$ ratio, 5.93% and 1.50% for the first $10^6$ primes for pairs and isolated primes, respectively.

One can measure this bias in yet another way suggested to us by Jon E. Schoenfield. This way measures the redistribution of primes due to the bias. For the first $10^6$ primes, 1036 twins get redistributed compared to the total of 86026. Thus, as a result of the bias, an excess of 1.20% of the total number of twins is captured by nonsquarefree numbers. For isolated primes, the respective number is 0.31%. [...] 0.39% for the first $10^{10}$ primes.

Since the distribution of non-prime squarefree numbers is not affected by the bias discussed here, one may expect a compensatory effect among the nonsquarefree twins (nonsquarefree numbers surrounding multiples of 6). Such is the case, indeed. The $R$ ratio of these twins is lower than expected. For the first [...] multiples of 6, it is 2.[...] $R_0$.

The most common cluster of consecutive squarefree numbers is that of a triple: three squarefree numbers in a row. Doubles and singles are less common, with singles a bit more common than doubles. While this fact may not be widely known, it is not necessarily surprising. But what seems interesting (perhaps even surprising) here is that prime pairs sabotage the formation of squarefree triples by choosing as their middle partners nonsquarefree numbers over squarefree ones more often than is the case among non-prime squarefree twins.

Let us add one more remark that we hope has some pedagogical value. Namely, there is really only one insight invested in this study, and a rather simple one too: examining the value of $R$. Once the importance of this number is realized, everything else becomes child's play and the whole research can be performed even by a resourceful high school student.

5. Conclusion
The results we reported above are quite basic, were obtained in an elementary fashion, and concern fundamental classes of numbers. It is therefore rather surprising that we found no mention of them in the literature of the subject. This leads us to believe that the statistical bias they describe was most likely unknown.

The study of biases in the distribution of prime numbers has recently been reinvigorated by the work of Lemke Oliver and Soundararajan [1] on the phenomenon related to the one observed by Chebyshev already in 1853 and known as the Chebyshev bias (see [2] and [3]). However, the effect under consideration here is of a different nature than those discussed in the papers cited.

Moreover, and more importantly, the Chebyshev bias is significantly smaller than our bias. Using the data for the first million primes from [1], we see that the deviation from the non-biased distribution (which would be half a million primes for either of the two classes of primes that the Chebyshev effect concerns) is only 170.

Let us contrast this with the bias presented here. Using the redistribution metric from the previous section, we get a relative bias of 170/[...] primes is 524733511, which makes it comparable to the size of the sample of non-prime squarefree twins for the first $10^9$ multiples of 6, 595982891. Yet, for the latter sample, the ratio $R$ deviates from the expected $R_0$ by less than 0.05%. [...] expect from the unbiased distribution, and, in particular, more often than is the case for non-prime squarefree numbers.

One can view this effect as a certain statistical property of primes. It is substantial and simple enough to be of general interest. If only because of this, we believe it deserves further study. The main goal of this paper was to lay the empirical groundwork for this study.

More empirical work is still needed, in particular to examine the behavior of the excess functions in a wider range of primes to determine these functions' approximate analytical form. To do this efficiently, more powerful computing resources are required than we had at our disposal. But what is needed first and foremost is a theoretical model explaining this phenomenon. We hope further research will produce such a model.

6. Links
A text file with the PARI/GP code and an Excel spreadsheet with the data and results can be downloaded from the author's site.
Acknowledgements. The author is grateful to the developers of PARI/GP and Wolfram Mathematica, whose software was indispensable to this research, and to Kevin Ford, Krzysztof D. Maślanka, Jon E. Schoenfield, and Marek Wolf for their interest in this work and comments.
References

[1] R. J. Lemke Oliver and K. Soundararajan, Unexpected biases in the distribution of consecutive primes, Proc. Natl. Acad. Sci. USA 113, E4446–E4454 (2016) (also arXiv:1603.03720 [math.NT]).
[2] M. Rubinstein and P. Sarnak, Chebyshev's bias, Experiment. Math. 3(3):173–197 (1994).
[3] A. Granville and G. Martin, Prime number races, Amer. Math. Monthly 113(1):1–33 (2006).
Los Angeles, CA, USA