Statistical Bias in the Distribution of Prime Pairs and Isolated Primes

Abstract

Computer experiments reveal that twin primes tend to center on nonsquarefree multiples of 6 more often than on squarefree multiples of 6 compared to what should be expected from the ratio of the number of nonsquarefree multiples of 6 to the number of squarefree multiples of 6, which equals $\pi^2/3 - 1$, or ca 2.290. For multiples of 6 surrounded by twin primes, this ratio is 2.427, a relative difference of ca 6.0% measured against the expected value. A deviation from the expected value of this ratio, ca 1.9%, exists also for isolated primes. This shows that the distribution of primes is biased towards nonsquarefree numbers, a phenomenon most likely previously unknown. For twins, this leads to nonsquarefree numbers gaining an excess of 1.2% of the total number of twins. In the case of isolated primes, this excess for nonsquarefree numbers amounts to 0.4% of the total number of such primes. The above numbers are for the first $10^{10}$ primes, with the bias showing a tendency to grow, at least for isolated primes.


arXiv [math.NT], July 2018

WALDEMAR PUSZKARZ

1. Surprising effect

Except for the first two (2 and 3), all primes are of the form $6k - 1$ or $6k + 1$. This implies that twin primes (consecutive primes separated by 2) surround multiples of 6 (3 and 5 are the only exception).

All natural numbers are either squarefree or nonsquarefree. Unlike the former, the latter are divisible by a square greater than 1. All primes are squarefree.

Squarefree numbers have natural density of $6/\pi^2$, which gives $1 - 6/\pi^2$ for natural density of nonsquarefree numbers. Thus, the ratio of relative frequencies at which one expects to find these numbers in a sufficiently large sample of natural numbers is $\pi^2/6 - 1 \approx 0.645$. However, for multiples of 6, among which squarefree numbers have density $3/\pi^2$, this ratio is $R_0 = \pi^2/3 - 1 \approx 2.290$ (most numbers in this paper are rounded off to the 3rd decimal digit). What this means is that in a sufficiently large sample, on average for every 100 squarefree multiples of 6 there are about 229 nonsquarefree numbers divisible by 6.

If prime pairs are as likely to center on squarefree multiples of 6 as they are on nonsquarefree multiples of 6 (i.e., their distribution is unbiased in this respect), we should expect the same ratio for the multiples of 6 in their centers if calculations are performed on a large enough sample of prime pairs.

Date: July 3, 2018.
2010 Mathematics Subject Classification.
Key words and phrases. Primes, twin primes.

However, this is not the case.
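The discrepancy can be explored on a small scale before turning to the large samples. The following Python sketch is not the author's code (the paper's own code is PARI/GP); the names `prime_flags`, `squarefree_flags`, `twin_center_counts`, and the parameter `limit` are introduced here purely for illustration. It counts twin pairs $(6k - 1, 6k + 1)$ up to a bound, records how many of their centers $6k$ are squarefree, and compares the observed ratio $(a - b)/b$ with the unbiased value $\pi^2/3 - 1$.

```python
import math

# Unbiased ratio of nonsquarefree to squarefree multiples of 6.
R0 = math.pi ** 2 / 3 - 1


def prime_flags(limit):
    """flags[n] == 1 iff n is prime, for 0 <= n <= limit (assumes limit >= 2)."""
    flags = bytearray([1]) * (limit + 1)
    flags[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if flags[p]:
            # Cross out p*p, p*p + p, ... in one slice assignment.
            flags[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return flags


def squarefree_flags(limit):
    """flags[n] == 1 iff n is squarefree, for 1 <= n <= limit."""
    flags = bytearray([1]) * (limit + 1)
    flags[0] = 0
    for d in range(2, math.isqrt(limit) + 1):
        sq = d * d
        # Every multiple of a square greater than 1 is nonsquarefree.
        flags[sq::sq] = bytearray(len(range(sq, limit + 1, sq)))
    return flags


def twin_center_counts(limit):
    """Return (a, b) for twin pairs (6k-1, 6k+1) with 6k+1 <= limit.

    a counts all such pairs, b those whose center 6k is squarefree.
    The pair {3, 5} is not centered on a multiple of 6 and is skipped here.
    """
    is_prime = prime_flags(limit)
    is_squarefree = squarefree_flags(limit)
    a = b = 0
    for center in range(6, limit, 6):
        if is_prime[center - 1] and is_prime[center + 1]:
            a += 1
            b += is_squarefree[center]
    return a, b


def ratio(a, b):
    """Observed ratio of nonsquarefree to squarefree centers."""
    return (a - b) / b


if __name__ == "__main__":
    a, b = twin_center_counts(10 ** 6)  # illustrative bound, not a paper sample
    print(a, b, ratio(a, b), R0)
```

Small bounds fluctuate strongly (for `limit = 100` the sketch finds 7 pairs, 3 of them on squarefree centers), so the comparison with the unbiased value becomes meaningful only for large samples such as those the paper reports. Choosing `limit` as the millionth prime, 15485863, should correspond to the paper's smallest sample, provided the sketch matches the PARI/GP loops.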

This can be observed already in a sample of the first $10^6$ primes and the discrepancy persists (may even be getting slightly stronger) for larger samples. The largest we used consisted of the first $10^{10}$ primes and for it the ratio is $R_2 = 2.427$. Defining the relative difference as the absolute value of the ratio of the difference between the experimental value and the theoretical one to the theoretical one, we find out that the relative difference in this case is ca 6.0%.

Isolated primes (primes $p$ such that neither $p - 2$ nor $p + 2$ is prime) reveal a similar bias: such primes occur next to nonsquarefree multiples of 6 slightly more often than next to squarefree multiples of 6 compared to what would be expected in the non-biased distribution. In this case, the ratio is $R_1 = 2.333$ (for the sample of $10^{10}$ primes), and the relative difference is ca 1.9%.

To check whether this is a property of primes or of squarefree numbers in general, the same test can be applied to squarefree numbers surrounding multiples of 6. For test squarefree twins (pairs of squarefree numbers $6n - 1$ and $6n + 1$, primes included), the ratio is 2.306 for the first $10^8$ multiples of 6 (and virtually unchanged for the sample of $10^9$), noticeably smaller than 2.427 and very close to the unbiased ratio, 2.290. But if we exclude twin primes from the test pairs, the effect goes away almost completely for the sample of $10^8$ as now we get 2.286, which leads to less than 0.2% in relative difference, a difference small enough to ascribe it to statistical fluctuations. Moreover, for the sample of $10^9$ multiples of 6, this ratio is 2.2889, less than 0.05% in relative difference, a very small difference indeed.

For isolated test squarefree numbers next to a multiple of 6 (to the left or right of it), the bias effect pretty much fails to manifest itself already in the sample of the first $10^8$ such multiples as we get 2.2907 (rounded off to the 4th decimal digit), which versus 2.2899 is less than 0.04% in relative difference. If the primes are excluded from these squarefree numbers, we obtain 2.2894, and the relative difference is now ca 0.02%. For the sample of $10^9$ (primes excluded), the ratio is 2.2897, even closer to the unbiased theoretical value.

Hence, to reiterate, the observed effect in both situations is most certainly a property of primes rather than a generic property of squarefree numbers.
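The ratios quoted for the test squarefree numbers can be recomputed from the counts listed in Part B of the data section. The Python sketch below is only an illustration (the paper's code is PARI/GP); the dictionary keys and the helper names `ratio`, `relative_difference`, and `summarize` are assumptions made here, while the $(a, b)$ pairs are copied from the paper's data.

```python
import math

# Unbiased ratio of nonsquarefree to squarefree multiples of 6.
R0 = math.pi ** 2 / 3 - 1

# (a, b) counts from Part B, keyed by (sample label, exponent n),
# where the range is the first 10^n multiples of 6.
TEST_COUNTS = {
    ("twins, primes included", 8): (82962973, 25097397),
    ("twins, primes included", 9): (829630636, 250974031),
    ("twins, primes excluded", 8): (57015536, 17348734),
    ("twins, primes excluded", 9): (595982891, 181210143),
    ("isolated, primes included", 8): (99415124, 30211331),
    ("isolated, primes excluded", 8): (94037859, 28588317),
    ("isolated, primes excluded", 9): (948253019, 288245142),
}


def ratio(a, b):
    """R = (a - b) / b, as defined in the data section."""
    return (a - b) / b


def relative_difference(r, expected=R0):
    """Absolute deviation of r from the expected value, relative to it."""
    return abs(r - expected) / expected


def summarize(counts=TEST_COUNTS):
    """Map each sample to (ratio, relative difference)."""
    return {
        key: (ratio(a, b), relative_difference(ratio(a, b)))
        for key, (a, b) in counts.items()
    }


if __name__ == "__main__":
    for (label, n), (r, rel) in summarize().items():
        print(f"{label:28s} 10^{n}: R = {r:.4f}, rel. diff. = {100 * rel:.3f}%")
```

Keeping the counts in one table makes it easy to add further ranges if more data points become available.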

To put it more precisely (if not pedantically), if there is any actual contribution to the effect from non-prime squarefree numbers, it is negligible compared to the contribution from primes.

Moreover, the effect appears pretty stable over several sample ranges, with the range size growing by a factor of 10 for each data point we collected to determine the effect behavior (see the next section).

2. Excess functions

Let us define them as $\epsilon_1 = \mathrm{round}(1000 (R_1 - R_0))$ for isolated primes and $\epsilon_2 = \mathrm{round}(1000 (R_2 - R_0))$ for prime pairs, where $\mathrm{round}(x)$ is tasked with rounding off $x$ to the nearest integer. Here $R_1$ and $R_2$ are the observed ratios for isolated primes and prime pairs, respectively, and $R_0 = \pi^2/3 - 1$ is the unbiased value.
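As a check on this definition, the excess functions for isolated primes and for prime pairs can be evaluated directly from the counts $a$ and $b$ reported in Part A of the data section. The Python sketch below is an illustration rather than the author's code; the names `TWIN_COUNTS`, `ISOLATED_COUNTS`, and `excess` are introduced here, and the counts are copied from the paper.

```python
import math

# Unbiased ratio of nonsquarefree to squarefree multiples of 6.
R0 = math.pi ** 2 / 3 - 1

# (a, b) counts from Part A, keyed by the exponent n of the
# sample size (the first 10^n primes).
TWIN_COUNTS = {
    6: (86027, 25113),
    7: (738597, 215732),
    8: (6497407, 1897137),
    9: (58047180, 16944418),
    10: (524733511, 153121114),
}
ISOLATED_COUNTS = {
    6: (827946, 249071),
    7: (8522806, 2560208),
    8: (87005186, 26123609),
    9: (883905640, 265275545),
    10: (8950532978, 2685404943),
}


def excess(a, b):
    """round(1000 * (R - R0)) with the observed ratio R = (a - b) / b."""
    return round(1000 * ((a - b) / b - R0))


if __name__ == "__main__":
    for n in sorted(TWIN_COUNTS):
        print(n, excess(*ISOLATED_COUNTS[n]), excess(*TWIN_COUNTS[n]))
```

Each value is the excess of nonsquarefree cases per 1000 squarefree ones, so the two sequences can be compared across sample sizes at a glance.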

The data for $R_1$ and $R_2$ is obtained numerically while $R_0$ can be calculated analytically as done above. We obtained 5 data points for each of these functions (see the data section for more information) and they suggest (albeit quite weakly) that we may be dealing with slowly growing functions, at least in the case of $\epsilon_1$. More data is needed to be positive that the trends we suspect these functions may be showing are not due to statistical fluctuations. It may very well be that the best approximation to these functions is a constant. This appears to be the most likely scenario for $\epsilon_2$ and it may be so asymptotically for $\epsilon_1$.

Below are the values of these functions at arguments that are consecutive powers of 10, starting at $10^6$ (we use the exponent values of $10^n$ to index the arguments).

For isolated primes, $\epsilon_1(6) = 34$, $\epsilon_1(7) = 39$, $\epsilon_1(8) = 41$, $\epsilon_1(9) = 42$, $\epsilon_1(10) = 43$.

For prime pairs, $\epsilon_2(6) = 136$, $\epsilon_2(7) = 134$, $\epsilon_2(8) = 135$, $\epsilon_2(9) = 136$, $\epsilon_2(10) = 137$.

These numbers represent the (average) excess of nonsquarefree numbers compared to the non-biased case for every 1000 squarefree numbers. For instance, $\epsilon_2(10)$ tells us that there are on average 137 more prime pairs centered on nonsquarefree multiples of 6 per 1000 squarefree multiples of 6 surrounded by primes than one would expect in the unbiased situation for the first $10^{10}$ primes.

3. Code and data

The effect discussed was first observed in Mathematica computer experiments performed on the first $10^6$ primes. The data for larger samples was obtained using PARI/GP, an open source software package for number theory.

What follows below is a sample of PARI/GP code used to obtain the data and the data. The data is indexed by exponents of range size, chosen to be powers of 10, from $10^6$ to $10^{10}$. The code for Mathematica can easily be produced from the PARI/GP code.

Our code counts all prime pairs (even though the first of them, {3, 5}, is not centered on a multiple of 6) and excludes 2 as an isolated prime. While 2 is sometimes treated as an isolated prime, it is actually less isolated from other primes than all odd primes save for 3. With 2 excluded, the number of isolated primes plus twice the number of pairs still add to a range size ($10^6$ through $10^{10}$), for 5 is counted twice as a member of two consecutive pairs that share it. These choices have no impact on our statistical results. We mention them for the sake of clarity.

In what follows, a represents the number of all target objects (primes or test squarefree numbers), while b only the number of such objects next to or centered on squarefree numbers. The ratios discussed above are calculated as R = (a − b)/b.

Part A. Prime numbers

Twin primes

    a=0; forprime(n=2, prime(10^8), isprime(n+2)&&a++); print1(a) \\all twins
    b=0; forprime(n=2, prime(10^8), isprime(n+2)&&issquarefree(n+1)&&b++); print1(b) \\twins centered on a squarefree number

a: 86027 (6), 738597 (7), 6497407 (8), 58047180 (9), 524733511 (10).
b: 25113 (6), 215732 (7), 1897137 (8), 16944418 (9), 153121114 (10).

Isolated primes

    a=0; forprime(n=3, prime(10^8), !isprime(n+2)&&!isprime(n-2)&&a++); print1(a) \\all
    b=0; forprime(n=3, prime(10^8), !isprime(n+2)&&!isprime(n-2)&&((n%6==1&&issquarefree(n-1))||(n%6==5&&issquarefree(n+1)))&&b++); print1(b) \\next to a squarefree number

a: 827946 (6), 8522806 (7), 87005186 (8), 883905640 (9), 8950532978 (10).
b: 249071 (6), 2560208 (7), 26123609 (8), 265275545 (9), 2685404943 (10).

Part B. Test squarefree numbers

Squarefree twins (primes included) centered on a multiple of 6

    a=0; for(n=1, 10^8, issquarefree(6*n-1)&&issquarefree(6*n+1)&&a++); print1(a) \\all
    b=0; for(n=1, 10^8, issquarefree(6*n)&&issquarefree(6*n-1)&&issquarefree(6*n+1)&&b++); print1(b) \\centered on a squarefree number

a: 82962973 (8), 829630636 (9).
b: 25097397 (8), 250974031 (9).

Squarefree twins (primes excluded) centered on a multiple of 6

    a=0; for(n=1, 10^8, issquarefree(6*n-1)&&!isprime(6*n-1)&&issquarefree(6*n+1)&&!isprime(6*n+1)&&a++); print1(a) \\all
    b=0; for(n=1, 10^8, issquarefree(6*n)&&issquarefree(6*n-1)&&!isprime(6*n-1)&&issquarefree(6*n+1)&&!isprime(6*n+1)&&b++); print1(b) \\centered on squarefree numbers

a: 57015536 (8), 595982891 (9).
b: 17348734 (8), 181210143 (9).

Isolated squarefree numbers (primes included) next to a multiple of 6

    a=0; for(n=1, 10^8, (issquarefree(6*n-1)||issquarefree(6*n+1))&&a++); print1(a) \\number of cases a squarefree number is next to a multiple of 6
    b=0; for(n=1, 10^8, (issquarefree(6*n))&&(issquarefree(6*n-1)||issquarefree(6*n+1))&&b++); print1(b) \\number of cases a squarefree number is next to a squarefree multiple of 6

a: 99415124 (8).
b: 30211331 (8).

Isolated squarefree numbers (primes excluded) next to a multiple of 6

    a=0; for(n=1, 10^8, ((issquarefree(6*n-1)&&!isprime(6*n-1))||(issquarefree(6*n+1)&&!isprime(6*n+1)))&&a++); print1(a) \\number of cases a non-prime squarefree number is next to a multiple of 6
    b=0; for(n=1, 10^8, issquarefree(6*n)&&((issquarefree(6*n-1)&&!isprime(6*n-1))||(issquarefree(6*n+1)&&!isprime(6*n+1)))&&b++); print1(b) \\number of cases a non-prime squarefree number is next to a squarefree multiple of 6

a: 94037859 (8), 948253019 (9).
b: 28588317 (8), 288245142 (9).

4. Other remarks

We mentioned that the effect discussed here can easily be observed even among the first million primes. More surprisingly, though, the $R$ ratio is consistently larger compared to the expected value even for the first 100, 1000, or 10,000 primes all the way up to one million (and beyond). More than the first 10,000 primes were known by the end of the 18th century. What this means is that even Gauss or one of his contemporaries could have noticed it some 200 years ago! Yet, we have found no evidence it was known at all.

The bias that we have so far chosen to measure using the $R$ ratio can also be expressed in other ways. For instance, one can inquire how much the number of twins centered on squarefree numbers, or b (see the data section), differs from its expected value for a given sample size (number a in our data section). Or, we can inquire about the same deviation from the expected value for twins centered on nonsquarefree numbers.

As an example, let us calculate this for the first $10^6$ primes. In this case, we have 86026 prime pairs centered on multiples of 6. If these pairs were distributed in an unbiased way, the way non-prime squarefree twins are, the number of them surrounding squarefree multiples of 6 would be ca $\mathrm{round}(86026/(R_0 + 1)) = 26149$ and not 25113 that we actually get. The deficit we observe, 1036, represents 3.96% of the expected value. For twins centered on nonsquarefree numbers, the expected number is $86026 - 26149 = 59877$, and the same 1036 pairs give 1.73% for the relative excess measured against the expected value. For isolated primes, similar calculations give 1.03% and 0.45%. For comparison, the relative differences in the $R$ ratio are 5.93% and 1.50% for the first $10^6$ primes for pairs and isolated primes, respectively.

One can measure this bias in yet another way suggested to us by Jon E. Schoenfield. This way measures the redistribution of primes due to the bias. For the first $10^6$ primes, 1036 twins get redistributed compared to the total of 86026. Thus, as a result of the bias, an excess of 1.20% of the total number of twins is captured by nonsquarefree numbers. For isolated primes, the respective number is 0.31%. These numbers become 1.22% and 0.39% for the first $10^{10}$ primes.

Since the distribution of non-prime squarefree numbers is not affected by the bias discussed here, one may expect a compensatory effect among the nonsquarefree twins (nonsquarefree numbers surrounding multiples of 6). Such is the case, indeed. The $R$ ratio of these twins is lower than expected. For the first 10[…] multiples of 6, it is 2.[…], below $R_0$.

The most common cluster of consecutive squarefree numbers is that of a triple: three squarefree numbers in a row. Doubles and singles are less common, with singles a bit more common than doubles. While this fact may not be widely known, it is not necessarily surprising. But what seems interesting (perhaps even surprising) here is that prime pairs sabotage the formation of squarefree triples by choosing as their middle partners nonsquarefree numbers over squarefree ones more often than is the case among non-prime squarefree twins.

Let us add one more remark that we hope has some pedagogical value. Namely, there is really only one insight invested in this study, and a rather simple one too: examining the value of $R$. Once the importance of this number is realized, everything else becomes child's play and the whole research can be performed even by a resourceful high school student.

5. Conclusion

The results we reported above are quite basic, were obtained in an elementary fashion, and concern fundamental classes of numbers. It is therefore rather surprising that we found no mention of them in the literature of the subject. This leads us to believe that the statistical bias they describe was most likely unknown.

The study of biases in the distribution of prime numbers has recently been reinvigorated by the work of Lemke Oliver and Soundararajan [1] on the phenomenon related to the one observed by Chebyshev already in 1853 and known as the Chebyshev bias (see [2] and [3]). However, the effect under consideration here is of a different nature than those discussed in the papers cited.

Moreover, and more importantly, the Chebyshev bias is significantly smaller than our bias. Using the data for the first million primes from [1], we see that the deviation from the non-biased distribution (to be half a million primes for either of the two classes of primes that the Chebyshev effect concerns) is only 170.

Let us contrast this with the bias presented here. Using the redistribution metric from the previous section, we get a relative bias of $170/10^6$, or 0.017%, for the Chebyshev effect, against 1.20% for prime pairs among the first $10^6$ primes. […] The number of prime pairs among the first $10^{10}$ primes is 524733511, which makes it comparable to the size of the sample of non-prime squarefree twins for the first $10^9$ multiples of 6, 595982891. Yet, for the latter sample, the ratio $R$ deviates from the expected $R_0$ by less than 0.05%. […]

To summarize, primes, both paired and isolated, occur next to nonsquarefree multiples of 6 more often than one would expect from the unbiased distribution, and, in particular, more often than is the case for non-prime squarefree numbers.

One can view this effect as a certain statistical property of primes. It is substantial and simple enough to be of general interest. If only because of this, we believe it deserves further study. The main goal of this paper was to lay the empirical groundwork for this study. More empirical work is still needed, in particular to examine the behavior of the excess functions in a wider range of primes to determine their approximate analytical form. To do this efficiently, more powerful computing resources are required than we had at our disposal. But what is needed first and foremost is a theoretical model explaining this phenomenon. We hope further research will produce such a model.

6. Links

A text file with the PARI/GP code and an Excel spreadsheet with the data and results can be downloaded from the author's site.

Acknowledgements. The author is grateful to the developers of PARI/GP and Wolfram Mathematica, whose software was indispensable to this research, and to Kevin Ford, Krzysztof D. Maślanka, Jon E. Schoenfield, and Marek Wolf for their interest in this work and comments.

References

[1] R. J. Lemke Oliver and K. Soundararajan, Unexpected biases in the distribution of consecutive primes, Proc. Natl. Acad. Sci. USA 113, E4446–E4454, (2016) (also arXiv:1603.03720 [math.NT]).
[2] M. Rubinstein and P. Sarnak, Chebyshev's bias, Experiment. Math., 3(3):173–197, (1994).
[3] A. Granville and G. Martin, Prime number races, Amer. Math. Monthly, 113(1):1–33, (2006).

Los Angeles, CA, USA

E-mail address ::