aa r X i v : . [ m a t h . N T ] D ec IRREGULAR PRIMES TO 163 MILLION
J. P. BUHLER AND D. HARVEY
Abstract.
We compute all irregular primes less than 163 577 356. For all ofthese primes we verify that the Kummer–Vandiver conjecture holds and thatthe λ -invariant is equal to the index of irregularity. Introduction
Bernoulli numbers are deeply intertwined with the the arithmetic of cyclotomicfields. To explain a basic case, consider the cyclotomic field K = Q ( ζ p ) generatedby a p -th root of unity ζ p = e πi/p , p prime. Let G = Gal( K/ Q ) ≃ ( Z /p Z ) × be itsGalois group, and let ω : G → Z × p be the unique p -adic character for which ω ( σ ) ≡ r mod p where σ ( ζ p ) = ζ rp . Then the p -Sylow subgroup A of the class group decomposesinto components A = M 1, then k is said to be an irregularindex for p if p divides the numerator of the k -th Bernoulli number B k . The indexof irregularity of p , denoted i p , is the number of such irregular indices k , and p is irregular if i p is positive. If k is irregular for p then ( p, k ) is said to be an irregularpair . Once the irregular pairs for a given p have been found it is possible to makefurther calculations verifying the Kummer–Vandiver conjecture, and to calculatethe corresponding λ -invariants in Iwasawa theory; these invariants λ p measure therate of growth of the p -part of the class group of p -power cyclotomic fields.This paper describes a calculation of all irregular pairs for p less than 39 · =163 577 356. The computation consumed more than twenty years of CPU time,and the quick summary of the results is that nothing surprising happened. TheKummer–Vandiver conjecture is true for all of these primes, and the λ -invariantis always as small as it can be, i.e., λ p = i p . This implies that the class group of Q ( ζ p n ) is, for all of these p , isomorphic to ( Z /p n Z ) i p . We did find one new prime with i p = 7, namely p = 32 012 327. At the moment that we write this, the datasupporting these calculations can be found at [Har09a].Irregular indices have been computed many times over the last 160 years, startingwith Kummer’s hand calculations for p ≤ h + p is prime to p . This was extended to p ≤ 619 early inthe twentieth century by Vandiver and his colleagues, using desk calculators (andgraduate students). Vandiver and the Lehmers, making one of the first number-theoretic calculations on electronic computers, extended this to p ≤ p < 150 000 by 1987 (see [Joh75], [Wag78], or [TW87] fora summary and further references). In the early 1990’s, Buhler, Crandall, Ernvall,Mets¨ankyl¨a, Sompolski, and Shokrollahi used “asymptotically fast” algorithms togo much further, and by 1999 the computations were extended to all primes lessthan 12 million [BCS92], [BCEM93], [BCE + p by a factor of almost 14 were: (a) theuse of two large clusters at the Texas Advanced Computing Center at the Universityof Texas at Austin, and (b) new algorithms and implementations of them, due tothe second author, for polynomial arithmetic over finite fields.The paper is organized as follows: we first describe the algorithms for calculating B k mod p for 0 < k < p − 1, algorithms for performing the needed polynomial arith-metic, and the algorithms for the Vandiver and cyclotomic invariant calculations.Then we summarize the results, including a description of the hardware that weused and a discussion of the steps we took to ensure correctness.2. Bernoulli numbers modulo p Bernoulli numbers are defined by a formal power series inversion te t − X k ≥ B k t k k ! . We have B = 1 / 2, and all other odd Bernoulli numbers B , B , . . . are zero. Thefirst, and by far and away the most time-consuming phase of our calculations is thecomputation of B k mod p , 0 ≤ k < p − k even, for each p in turn.We used two different algorithms, the Voronoi identity method and the powerseries method . Both depend crucially on asymptotically fast polynomial arithmeticover Z /p Z . The theoretical complexity is O ( p log ε p ) for both methods, but theimplied constants and memory footprints are different — the algorithm using theVoronoi identity seems to be faster but requires more memory. We first describe thetwo methods, and then discuss the algorithms used for the underlying polynomialarithmetic.The method based on the Voronoi identity is essentially identical to the ‘rootfinding method’ of [BCE + p < · = 88 080 384. Let g bea generator of ( Z /p Z ) × . Then for 2 ≤ k < p − k even, we have(1) B k ≡ k − g k ( p − / X i =0 g ( k − i h ( g i ) (mod p ) , RREGULAR PRIMES TO 163 MILLION 3 where h ( x ) = ( x mod p ) − g ( x/g mod p ) p + g − . This is a version of Voronoi’s congruence with the terms arranged in multiplicativeorder; see [Sla87], [Van17] (or [DSS91] for a Bernoulli number bibliography, withan updated online version at [Dil09]). The sum on the right hand side may beregarded as the Fourier transform of i h ( g i ) /g i with respect to the roots of unity { g k } ( p − / k =0 . Using Bluestein’s FFT algorithm [Blu70], we convert the problem ofsimultaneously computing all B k to the problem of multiplying two polynomials oflength ( p − / Z /p Z , plus O ( p ) pre-processing and post-processing operationsin Z /p Z .The power series method is described in [BCS92]. We used the variant based onthe identity ∞ X n =0 n B n (2 n )! x n = A ( x ) + A ( x ) x + A ( x ) x + A ( x ) x D ( x ) , where D and the A i are power series whose coefficients are given by simple recur-rence formulae [BCS92, p. 719]. This reduces to a power series inversion of length p/ O (1), and four series multiplications of length p/ O (1), over Z /p Z . Thismethod uses less memory than the Voronoi identity method, and we used it for allprimes 21 · < p < · .3. Polynomial and power series arithmetic Most of the algorithms for polynomial and power series arithmetic described hereare implemented in the zn poly library of the second author [Har08b].In what follows, we consider arithmetic in R [ x ], including arithmetic on truncatedpower series in R J x K , where R is a coefficient ring in which 2 is not a zero-divisor.For the algorithms described above, we could take R = Z /p Z , where p is theprime of interest. In our implementation, we found it profitable to handle twonearby primes simultaneously, taking instead R = Z /p p Z , and using the ChineseRemainder Theorem. This was possible since the computations were run on 64-bithardware, and the largest prime considered was less than 32 bits long.Our large polynomial multiplication routine involves a combination of algo-rithms. The outermost layer reduces the multiplication of polynomials f, g ∈ R [ x ] of length n to O ( n / ) negacyclic convolutions of length O ( n / ), using anadaptation to polynomials of the Sch¨onhage–Strassen integer multiplication algo-rithm [SS71] suggested by Sch¨onhage [Sch77]. This proceeds by splitting the in-puts into segments of length M/ R [ y, z ] / ( y M + 1 , z K − 1) via the inverse of the substitution y x , z x M/ , where M = 2 ⌈ (1+log n ) / ⌉ = O ( n / ) and K = 2 ⌈ log (4 n/M ) ⌉ = O ( n / ). These conditionsensure that M K ≥ n (the product ‘fits’ into the bivariate quotient ring) and that K | M . The bivariate multiplication may then be effected by using FFTs overthe ring R [ y ] / ( y M + 1) with respect to the primitive K -th root of unity y M/K .These FFTs involve only additions and subtractions in R , plus some book-keepingto keep track of powers of y . To improve memory locality and smoothness of therunning time with respect to n , we use a cache-friendly version [Har09b] of van derHoeven’s truncated Fourier transform (TFT) method [vdH04, vdH05]. J. P. BUHLER AND D. HARVEY For the negacyclic convolutions (recursive multiplications in R [ y ] / ( y M + 1)) weuse Nussbaumer’s algorithm [Nus80] for sufficiently large M ; this follows the samegeneral plan as Sch¨onhage’s algorithm sketched above, but uses a slightly differentsubstitution that can reduce the size of the Fourier coefficients by a factor of twoin some cases. For sufficiently small M , we switch to a variant of the Kroneckersubstitution method [Har09c], which packs the input polynomials into integers andmultiplies the resulting integers.Our power series inversion algorithm is based on a Newton iteration. We fol-lowed the approach suggested in [HQZ04, Algorithm MP-inv, p. 421], which takesadvantage of the middle product to improve the running time constant. A minorimprovement in our algorithm is that we reuse the Fourier transform of α in lines3 and 4 of [HQZ04], reducing the theoretical cost by a further 1/6. To transportthe smoothness benefits of the truncated FFT from the ordinary product to themiddle product, it was necessary to implement a transposed TFT and inverse TFTover R [ y ] / ( y M + 1), making use of the “transposition principle” [BLS03]. Unlikethe usual (non-truncated) FFT, the matrices for the TFT and inverse TFT are notsymmetric, so genuinely new code was required.It is worth pointing out that our polynomial arithmetic routines are based en-tirely on integer arithmetic; we did not use floating point arithmetic at all.Memory constraints became formidable for those p near the top of the rangethat we handled using the Voronoi identity method. Just to store two polynomialsof length p/ p ≈ · , requires 1.4 GB of memory,assuming a 64-bit word for each coefficient. Each thread had only 2 GB RAMavailable (see description of the hardware below), leaving very little auxiliary work-ing storage, certainly much less than is required to store the FFT coefficients. Toovercome this, we implemented a discrete Fourier transform variant that uses anaive quadratic-time DFT algorithm to handle the first few layers of the transform,and then the standard O ( n log n ) algorithm for the remaining layers. The resultingtime-space tradeoff was acceptable up to some limit, but became infeasible as weapproached the 2 GB barrier, prompting us to switch to the less memory-intensivepower series method.4. Kummer–Vandiver and cyclotomic invariants We quickly summarize some well-known facts about cyclotomic class groups,using notation established above; for details see [Was97]. If p is prime and k is even,0 ≤ k < p − 1, then A p − k is nontrivial if and only if k is irregular for p . Moreover,a “reflection theorem” implies that if A k is nontrivial then A p − k is nontrivial, i.e., k is irregular for p . The component A k has the same order as the ω k -component ofthe p -Sylow subgroup of U K / Cyc K , where U K = O × K is the group of units insidethe ring of integers of K , and Cyc K denotes the group of cyclotomic units. Thelatter can be described explicitly, and Vandiver’s conjecture for p , which says that A k is trivial for all even k , can be proved by showing that the cyclotomic unit u k that generates this component does not have a p -th root lying in K .This can be established by finding a prime ideal Q of K that has residue classdegree 1, for which u k is not a p -th power modulo Q . This reduces (again, fordetails see [Was97]) to showing that it is possible to find a rational prime q with RREGULAR PRIMES TO 163 MILLION 5 q ≡ p such that V ( q − /pp,k q , where V p,k = ( p − / Y c =1 ( z c − z − c ) c p − − k (mod q ) . Here z is a p -th root of unity modulo q , and Q is one of the primes lying over q .Thus, Vandiver’s conjecture for p is proved if the above condition is checked foreach irregular index k for p . We performed this computation for every irregularpair in our table, and in each case the above condition held, and in all cases thesmallest prime q ≡ p worked.To verify that the λ -invariant λ p is equal to i p it suffices to check that certaincongruences between Bernoulli numbers do not hold. These reduce to elementarycongruences [EM91]. To describe these, let S ( e ) = ( p − / X a =1 a e . For each irregular index k for a given prime p we check that2 k p,S ( k − S ( p + k − 2) mod p ,kS ( k − ( k − S ( p + k − 2) mod p . These congruences imply that B k p , and if they hold for all irregularindices k for p then λ p = i p . In the case that 2 k ≡ p , this test can bereplaced by another one; for details on this see [EM91] (though note that ournotation is different). 5. Results There are N = 9 163 831 primes up to 39 · = 163 577 356. The indices ofirregularity i p ranged up to 7, and the number of primes N i of index i are tabulatedbelow. i N i N i /N p i N · p i Table 1. Irregular index statistics for p < · Here p i = e − / / (2 i i !) is the probability that a Poisson process with mean 1 / i successes. This is motivated by the heuristic (Lehmer, Siegel) that if theBernoulli numerators are random modulo p , then i p is the number of successes in p/ /p of success. J. P. BUHLER AND D. HARVEY The five primes of index 7 are 3 238 481, 5 216 111, 5 620 861, 9 208 289 and32 012 327.The Vandiver conjecture was found to be true for all primes tested. The λ -invariant was equal to i p for all primes; the alternate test mentioned above wasrequired for 3 primes.For a discussion of heuristics that might apply to Vandiver’s conjecture and tothe value of λ p see [Was97, p. 159] and [Lan90, p. 261]. If ( p, k ) is an irregularpair, Washington examines the consequences of assuming that the probability thatthe cyclotomic unit u k is a p -th power is 1 /p , all independently of each other.Using this and heuristics on the distribution of i p , he guesses that the number ofcounterexamples to Vandiver’s conjecture for p < x should grow like log log x . Asan exercise in futility, we may try to refine Washington’s heuristics using the knownvalues of i p , taking into account for example that the first irregular prime ( p = 37)is unexpectedly large. We find that the expected number of counterexamples up to12 million is about 0 . . 074 counterexamples were expectedbetween 12 million and 163 million (though of course we now know that thereare none in either case). Many people believe that Vandiver’s conjecture is true;it also seems reasonable to believe that the conjecture is false but that the firstcounterexample is so astronomically large that it may never be known. Similarremarks apply to the λ p = i p conjecture.As mentioned earlier, the table of irregular pairs, together with check data forthe Vandiver and cyclotomic invariant verifications, can be found at [Har09a].6. Correctness Any computation on this scale is virtually guaranteed to encounter faults ofvarious kinds, and this computation was no exception. On one occasion duringthe main Bernoulli number computation, a hardware or operating system failure— we could not determine which — resulted in several kilobytes of output beingoverwritten by random data, and it was necessary to repeat the computation for theaffected primes. On another occasion, the Vandiver and cyclotomic runs locatedsubtle memory errors (subsequently confirmed by other tests) on two workstations.In comparing our data for those tests, we found a handful of examples (which wererecalculated) were some sort of write error invalidating a very small number oflines. To combat this sort of occurrence, we took careful precautionary measuresthroughout the project, which we now describe.The main irregular index computation was far too expensive to run more thanonce. We checked the correctness of this phase by several methods. During themain computation itself, for each p we verified the identity p − X k =0 k ( k + 1) B k ≡ − p )(see [BCS92, p. 720]). Already this is a strong consistency check, but it has thedrawback that it leaves no certificate that can be checked after the computation isfinished. To increase our confidence in the output, we made our program recordsomewhat more than just the irregular indices. Let N p = min(2 log p, ( p − B k mod p for a given p , we extracted and stored the N p smallestpairs ( k, B k mod p ), where the pairs are ordered first by B k mod p and then by k . In particular, the irregular pairs are listed first, and for the largest primes RREGULAR PRIMES TO 163 MILLION 7 considered we store about 37 pairs. The resulting compressed file is 1.5 GB, and isavailable for download [Har09a]. We then used a completely independent programand hardware to check these recorded pairs. For each k , one may compute B k mod p in O ( p ) operations using an identity such as (1). We used the highly optimizedimplementation described in [Har08a].This scheme has an additional property we might expect of a ‘certificate’. Askeptic may randomly select and verify a pair ( k, B k mod p ) from our output file.Since presumably the only way to find a k for which B k mod p is ‘small’ is tocompute all the B k mod p , the skeptic may be persuaded that we must have actuallyperformed the full computation for each p . Note that recording all the B k mod p that we computed is impractical, requiring in the order of 10 GB of storage.For the Vandiver tests, we computed V p,k twice for each irregular pair and cross-checked the results. This was done using programs written independently by thetwo authors, working separately and sharing no code. The programs were run ondifferent hardware, using different libraries for the modular arithmetic (zn poly andNTL [Sho09]).The first run used a new implementation of the method described in [BCE + c from 1 up to ( p − / 2, we write c = 2 i g j ,where g is a primitive root modulo p , and iterate over 0 ≤ i < t in an inner loopand 0 ≤ j < ( p − /t in an outer loop, where t is the order of 2 modulo p . Foreach pair ( i, j ) we only include the multiplicand in the product if the proposed c satisfies 1 ≤ c ≤ ( p − / 2. The advantage of this approach is that only a singlemultiplication in Z /p Z is required to update c p − − k on each iteration, rather thanthe O (log p ) operations needed to compute each c p − − k had the c been processedsequentially. Updating z c under this scheme requires only a single operation in theinner loop (since z i +1 g j = ( z i g j ) ), and slightly more in the outer loop. The latterexecutes infrequently for most p , since t is usually large.For the cyclotomic invariants, we again ran everything twice, using indepen-dent programs on different hardware. The first run used the implementation from[BCE + Hardware The aforementioned computations were performed on a number of different ma-chines:‘Lonestar’ is a 1300-node cluster at the Texas Advanced Computing Center(TACC), running a custom Linux operating system. Each node contains two dual-core 2.66 GHz Intel Xeon (Woodcrest) processors and has 8 GB RAM. This ma-chine was used for computing irregular indices (about 119 000 core-hours). We usedvarious versions of GMP (4.2.1–4.2.4) [Gra08] for the large integer multiplication,patched with assembly code written by Jason Martin for the Core 2 architecture. J. P. BUHLER AND D. HARVEY ‘Ranger’ is a 3936-node cluster at TACC, running a custom Linux operatingsystem. Each node contains four quad-core 2.3 GHz AMD Opteron (Barcelona)processors and has 32 GB RAM. At the time the computations were run, thismachine was ranked by top500.org as the fourth most powerful supercomputer inthe world. This machine was used for computing irregular indices (about 68 000core-hours). We used the same versions of GMP mentioned above, together withassembly code written by the second author and Torbj¨orn Granlund. This code hassubsequently been incorporated into public releases of GMP; indeed, the needs ofthe present computation were a large part of the motivation for this work on GMP.The first run of the Vandiver and cyclotomic invariant checks was performed ona network of 24 desktop machines, each with a 3.4 GHz Pentium 4 CPU (about9 000 core-hours). The second run was performed on a 16-core 2.6 GHz Opteronserver (about 10 000 core-hours). This machine was also used for the verification ofthe B k mod p (about 500 core-hours). Acknowledgements Many thanks to Torbj¨orn Granlund for contributing his invaluable assemblyprogramming expertise.The authors acknowledge the Texas Advanced Computing Center (TACC) atThe University of Texas at Austin for providing high-performance computing re-sources that have contributed to the research results reported within this paper.Thanks to Fernando Rodriguez Villegas and the Department of Mathematics atthe University of Texas at Austin for arranging access to the clusters at TACC, andto Salman Butt for his technical assistance with the clusters.Thanks to the Department of Mathematics at Harvard University for supplyingthe Opteron server.The first author thanks Richard Crandall and Amin Shokrollahi for useful con-versations on this paper. References [BCE + 01] Joe Buhler, Richard Crandall, Reijo Ernvall, Tauno Mets¨ankyl¨a, and M. AminShokrollahi, Irregular primes and cyclotomic invariants to 12 million , J. SymbolicComput. (2001), no. 1-2, 89–96, Computational algebra and number theory (Mil-waukee, WI, 1996).[BCEM93] J. Buhler, R. Crandall, R. Ernvall, and T. Mets¨ankyl¨a, Irregular primes and cyclo-tomic invariants to four million , Math. Comp. (1993), no. 203, 151–153.[BCS92] J. P. Buhler, R. E. Crandall, and R. W. Sompolski, Irregular primes to one million ,Math. Comp. (1992), no. 200, 717–722.[BLS03] Alin Bostan, Gr´egoire Lecerf, and ´Eric Schost, Tellegen’s principle into practice , Sym-bolic and Algebraic Computation (J. R. Sendra, ed.), ACM Press, 2003, Proceedingsof ISSAC’03, Philadelphia, August 2003., pp. 37–44.[Blu70] L. Bluestein, A linear filtering approach to the computation of discrete Fourier trans-form , Audio and Electroacoustics, IEEE Transactions on (1970), no. 4, 451–455.[Cor08] Leo Corry, Fermat meets SWAC: Vandiver, the Lehmers, computers, and numbertheory , IEEE Ann. Hist. Comput. Bernoulli numbers , Queen’sPapers in Pure and Applied Mathematics, vol. 87, Queen’s University, Kingston, ON,1991, Bibliography (1713–1990).[EM91] R. Ernvall and T. Mets¨ankyl¨a, Cyclotomic invariants for primes between 125000 and150000 , Math. Comp. (1991), 851–858. RREGULAR PRIMES TO 163 MILLION 9 [EM92] , Cyclotomic invariants for primes up to one million , Math. Comp. (1992),249–250.[Gra08] Torbj¨orn Granlund, The GNU Multiple Precision Arithmetic library , 2008,http://gmplib.org/.[Har08a] David Harvey, A multimodular algorithm for computing Bernoulli numbers , to appearin Mathematics of Computation, preprint at http://arxiv.org/abs/0910.1926, 2008.[Har08b] David Harvey, The zn poly library A cache-friendly truncated FFT , Theoret. Comput. Sci. (2009),no. 27-29, 2649–2658.[Har09c] , Faster polynomial multiplication via multipoint Kronecker substitution , J.Symbolic Comput. (2009), no. 10, 1502–1510.[HQZ04] Guillaume Hanrot, Michel Quercia, and Paul Zimmermann, The middle product algo-rithm, I. , Appl. Algebra Engrg. Comm. Comput. (2004), no. 6, 415–438.[Joh75] Wells Johnson, Irregular primes and cyclotomic invariants , Math. Comp. (1975),113–120.[Lan90] Serge Lang, Cyclotomic fields I and II , Graduate Texts in Mathematics, vol. 21,Springer-Verlag, New York, 1990.[Nus80] Henri J. Nussbaumer, Fast polynomial transform algorithms for digital convolution ,IEEE Trans. Acoust. Speech Signal Process. (1980), no. 2, 205–215.[Sch77] A. Sch¨onhage, Schnelle Multiplikation von Polynomen ¨uber K¨orpern der Charakter-istik 2 , Acta Informat. (1976/77), no. 4, 395–398.[Sho09] Victor Shoup, NTL: A library for doing number theory A remark on the paper of T. Uehara: “On p -adic continuous func-tions determined by the Euler numbers” [Rep. Fac. Sci. Engrg. Saga Univ. Math. No.8 (1980), 1–8; MR0567622 (81e:12020)] , Rep. Fac. Sci. Engrg. Saga Univ. Math. (1987), 1–2.[SS71] Arnold Sch¨onhage and Volker Strassen, Schnelle Multiplikation grosser Zahlen , Com-puting (Arch. Elektron. Rechnen) (1971), 281–292.[TW87] Jonathan Tanner and Jr. Wagstaff, Samuel S., New congruences for the Bernoullinumbers , Math. Comp. (1987), 341–350.[Van17] H. S. Vandiver, Symmetric functions formed by systems of elements of a finite algebraand their connection with fermat’s quotient and Bernoulli numbers , Ann. Math. (1917), 105–114.[vdH04] Joris van der Hoeven, The truncated Fourier transform and applications , ISSAC 2004,ACM, New York, 2004, pp. 290–296.[vdH05] , Notes on the truncated Fourier transform