Faster integer and polynomial multiplication using cyclotomic coefficient rings
DAVID HARVEY AND JORIS VAN DER HOEVEN
Abstract.
We present an algorithm that computes the product of two n-bit integers in O(n log n (4√2)^{log* n}) bit operations. Previously, the best known bound was O(n log n 6^{log* n}). We also prove that for a fixed prime p, polynomials in F_p[X] of degree n may be multiplied in O(n log n 4^{log* n}) bit operations; the previous best bound was O(n log n 8^{log* n}).

1. Introduction
In this paper we present new complexity bounds for multiplying integers and polynomials over finite fields. Our focus is on theoretical bounds rather than practical algorithms. We work in the deterministic multitape Turing model [19], in which time complexity is defined by counting the number of steps, or equivalently, the number of 'bit operations', executed by a Turing machine with a fixed, finite number of tapes. The main results of the paper also hold in the Boolean circuit model.

The following notation is used throughout. For x ∈ R, we denote by log* x the iterated logarithm, that is, the least non-negative integer k such that log^{◦k} x ≤ 1, where log^{◦k} x := log ··· log x (iterated k times). For a positive integer n, we define lg n := max(1, ⌈log_2 n⌉); in particular, expressions like lg lg lg n are defined and take positive values for all n ≥ 1. We denote the n-th cyclotomic polynomial by φ_n(X) ∈ Z[X], and the Euler totient function by ϕ(n).

All absolute constants in this paper are in principle effectively computable. This includes the implied constants in all uses of O(·) notation.

1.1. Integer multiplication.
Let M(n) denote the number of bit operations required to multiply two n-bit integers. For over 35 years, the best known bound for M(n) was that achieved by the Schönhage–Strassen algorithm [23], namely

M(n) = O(n lg n lg lg n).  (1.1)

In 2007, Fürer described an asymptotically faster algorithm that achieves

M(n) = O(n lg n K_Z^{log* n})  (1.2)

for some unspecified constant K_Z > 1. Fürer's algorithm reduces a multiplication of size n to a large collection of multiplications of size exponentially smaller than n; these smaller multiplications are handled recursively. The K_Z^{log* n} term may be understood roughly as follows: the number of recursion levels is log* n + O(1), and the constant K_Z measures the amount of 'data expansion' that occurs at each level, due to phenomena such as zero-padding.

Immediately following Fürer's work, De, Kurur, Saha and Saptharishi described a variant based on modular arithmetic [10], instead of the approximate complex arithmetic used by Fürer. Their algorithm also achieves (1.2), again for some unspecified K_Z > 1.

The first explicit value for K_Z was given by Harvey, van der Hoeven and Lecerf, who described an algorithm that achieves (1.2) with K_Z = 8 [17]. Their algorithm borrows some important ideas from Fürer's work, but also differs in several respects. In particular, their algorithm has no need for the 'fast' roots of unity that were the cornerstone of Fürer's approach (and of the variant of [10]). The main presentation in [17] is based on approximate complex arithmetic, and the paper includes a sketch of a variant based on modular arithmetic that also achieves K_Z = 8.

In a recent preprint, the first author announced that in the complex arithmetic case, the constant may be reduced to K_Z = 6, by taking advantage of new methods for truncated integer multiplication [14]. This improvement does not seem to apply to the modular variants.

The first main result of this paper is the following further improvement.

Theorem 1.1.
There is an integer multiplication algorithm that achieves

M(n) = O(n lg n (4√2)^{log* n}).

In other words, (1.2) holds with K_Z = 4√2 ≈ 5.66. It was shown in [17] that one may achieve K_Z = 4 under various unproved number-theoretic conjectures. Whether K_Z = 4 can be reached unconditionally remains an important open question.

1.2. Polynomial multiplication over finite fields.
For a prime p, let M_p(n) denote the number of bit operations required to multiply two polynomials in F_p[X] of degree less than n. The optimal choice of algorithm for this problem depends very much on the relative size of n and p.

If n is not too large compared to p, say lg n = O(lg p), then a reasonable choice is Kronecker substitution: one lifts the polynomials to Z[X], packs the coefficients of each polynomial into a large integer (i.e., evaluates at X = 2^b for b := 2 lg p + lg n), multiplies these large integers, unpacks the resulting coefficients to obtain the product in Z[X], and finally reduces the output modulo p. This leads to the bound

M_p(n) = O(M(n lg p)) = O(n lg p lg(n lg p) K_Z^{log*(n lg p)}),  (1.3)

where K_Z is any admissible constant in (1.2). To the authors' knowledge, this is the best known asymptotic bound for M_p(n) in the region lg n = O(lg p).

When n is large compared to p, the situation is starkly different. The Kronecker substitution method leads to poor results, due to coefficient growth in the lifted product: for example, when p is fixed, Kronecker substitution yields

M_p(n) = O(M(n lg n)) = O(n (lg n)^2 K_Z^{log* n}).

For many years, the best known bound in this regime was that achieved by the algebraic version of the Schönhage–Strassen algorithm [23, 22], namely

M_p(n) = O(n lg n lg lg n lg p + n lg n M(lg p)).  (1.4)

The first term arises from performing O(n lg n lg lg n) additions in F_p, and the second term from O(n lg n) multiplications in F_p. (In fact, this sort of bound holds for polynomial multiplication over quite general rings [7].) For fixed p, this is faster than the Kronecker substitution method by a factor of almost lg n.
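As an aside, the packing scheme just described is short enough to model in full. The following Python sketch (not from the paper; it uses the choice b := 2 lg p + lg n described above, with lg realised via bit lengths) multiplies in F_p[X] by lifting to Z[X] and evaluating at X = 2^b:

```python
def kronecker_mul(f, g, p):
    # f, g: coefficient lists (low degree first) of polynomials over F_p.
    # Pack at X = 2^b with b = 2*lg(p) + lg(n), so that the base-2^b digits
    # of the integer product F*G are exactly the coefficients of f*g in Z[X].
    n = max(len(f), len(g))
    b = 2 * max(1, (p - 1).bit_length()) + max(1, (n - 1).bit_length())
    F = sum(c << (b * i) for i, c in enumerate(f))
    G = sum(c << (b * i) for i, c in enumerate(g))
    H, mask, out = F * G, (1 << b) - 1, []
    for _ in range(len(f) + len(g) - 1):
        out.append((H & mask) % p)   # unpack one base-2^b digit, reduce mod p
        H >>= b
    return out
```

For instance, kronecker_mul([1, 2, 3], [4, 5], 7) returns [4, 6, 1, 1], i.e. (1 + 2X + 3X^2)(4 + 5X) ≡ 4 + 6X + X^2 + X^3 (mod 7). For fixed p and growing n the lifted coefficients force b to grow like lg n, which is precisely the coefficient growth that makes Kronecker substitution wasteful in that regime.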
The main reason for its superiority is that it exploits the modulo p structure throughout the algorithm, whereas the Kronecker substitution method forgets this structure in the very first step.

After the appearance of Fürer's algorithm, it was natural to ask whether a Fürer-type bound could be proved for M_p(n), in the case that n is large compared to p. This question was answered in the affirmative by Harvey, van der Hoeven and Lecerf, who gave an algorithm that achieves

M_p(n) = O(n lg p lg(n lg p) 8^{log*(n lg p)}),

uniformly for all n and p [18]. This is a very elegant bound; however, written in this way, it obscures the fact that the constant 8 plays two quite different roles in the complexity analysis. One source of the value 8 is the constant K_Z = 8 arising from the integer multiplication algorithm mentioned above, but there is also a separate constant K_F = 8 arising from the polynomial part of the algorithm. There is no particular reason to expect that K_Z = K_F, and it is somewhat of a coincidence that they have the same numerical value in [18].

To clarify the situation, we mention that one may derive a complexity bound for the algorithm of [18] under the assumption that one has available an integer multiplication algorithm achieving (1.2) for some K_Z > 1, where possibly K_Z = 8. Namely, one finds that

M_p(n) = O(n lg p lg(n lg p) K_F^{max(0, log* n − log* p)} K_Z^{log* p}),  (1.5)

where K_F = 8 (we omit the proof). The second main result of this paper, proved in Section 4, is the following improvement in the value of K_F.

Theorem 1.2.
Let K_Z > 1 be any constant for which (1.2) holds (for example, by Theorem 1.1, one may take K_Z = 4√2). Then there is a polynomial multiplication algorithm that achieves

M_p(n) = O(n lg p lg(n lg p) 4^{max(0, log* n − log* p)} K_Z^{log* p}),  (1.6)

uniformly for all n > 1 and all primes p.

In other words, (1.5) holds with K_F = 4. In particular, for fixed p, one can multiply polynomials in F_p[X] of degree n in O(n lg n 4^{log* n}) bit operations.

Theorem 1.2 may be generalised in various ways. We briefly mention a few possibilities along the lines of [18, §8] (no proofs will be given). First, we may obtain analogous bit complexity bounds for multiplication in F_{p^a}[X] and (Z/p^a Z)[X] for a ≥ 1, and in (Z/mZ)[X] for arbitrary m > 1. Second, we may multiply polynomials in A[X] of degree less than n, for any F_p-algebra A, using O(n lg n 4^{log* n}) additions and scalar multiplications and O(n 4^{log* n}) nonscalar multiplications (compare with [18, Thm. 8.4]).

1.3. Overview of the new algorithms.
To explain the new approach, let us first recall the idea behind the polynomial multiplication algorithm of [18]. Consider a polynomial multiplication problem in F_p[X], where the degree n is very large compared to p. By splitting the inputs into chunks, we convert this to a bivariate multiplication problem in F_p[Y, Z]/(f(Y), Z^m − 1), for a suitable integer m and irreducible polynomial f ∈ F_p[Y]. This bivariate product is handled by means of DFTs (discrete Fourier transforms) of length m over F_p[Y]/f. The key innovation of [18] was to choose deg f so that p^{deg f} − 1 is divisible by many small primes whose product exceeds n, even though deg f itself is exponentially smaller. This is possible thanks to a number-theoretic result of Adleman, Pomerance and Rumely [1], building on earlier work of Prachar [21]. Taking m to be a product of many of these primes, we obtain m | p^{deg f} − 1, and hence F_p[Y]/f contains a root of unity of order m. As m is highly composite, each DFT of length m may be converted to a collection of much smaller DFTs via the Cooley–Tukey method. These in turn are converted into multiplication problems using Bluestein's algorithm. These multiplications, corresponding to exponentially smaller values of n, are handled recursively.

The recursion continues until n becomes comparable to p. The number of recursion levels during this phase is log* n − log* p + O(1), and the constant K_F = 8 represents the expansion factor at each recursion level. When n becomes comparable to p, the algorithm switches strategy to Kronecker substitution combined with ordinary integer multiplication. This phase contributes the K_Z^{log* p} term.

It was pointed out in [16, §8] that the value of K_F can be improved to K_F = 4 if one is willing to accept certain unproved number-theoretic conjectures, including Artin's conjecture on primitive roots. More precisely, under these conjectures, one may find an irreducible f of the form f(Y) = Y^{α−1} + ··· + Y + 1, where α is prime, so that F_p[Y, Z]/(f(Y), Z^m − 1) is a direct summand of F_p[Y, Z]/(Y^α − 1, Z^m − 1) ≅ F_p[X]/(X^{αm} − 1). Working in the latter ring requires essentially no zero-padding, and this economy is the source of the improved value of K_F.

To prove Theorem 1.2, we will pursue a variant of this idea. We will take f to be a cyclotomic polynomial φ_α(Y) for a judiciously chosen integer α (not necessarily prime). Since φ_α | Y^α − 1, we may use the above isomorphism to realise the same economy in zero-padding as in the conjectural construction of [16, §8]. The price we pay is that we can no longer require f to be irreducible in F_p[Y]. Thus F_p[Y]/f is no longer in general a field, but a direct sum of fields. The situation is reminiscent of Fürer's algorithm, in which the coefficient ring C[Y]/(Y^r + 1) is not a field, but a direct sum of copies of C. The key technical contribution of this paper is to show that we have enough control over the factorisation of φ_α in F_p[Y] to ensure that F_p[Y]/φ_α contains suitable principal roots of unity. This approach avoids Artin's conjecture and other number-theoretic difficulties, and enables us to reach K_F = 4 unconditionally. The construction of α is the subject of Section 3, and the main polynomial multiplication algorithm is presented in Section 4.

Let us now outline how we go about proving Theorem 1.1 (the integer case). The algorithm is heavily dependent on the polynomial multiplication algorithm just sketched. We take the basic problem to be multiplication in Z/(2^n − 1)Z, for arbitrary positive n. We choose a collection of small primes p, each having around 2 lg lg n bits, and whose product P has (lg n)^{1+o(1)} bits. By cutting the input integers into many small chunks, we convert to a multiplication in (Z/PZ)[X]/(X^N − 1), where N ≈ n/lg P. One technical headache is that n is not necessarily divisible by N; following [17], we deal with this by adapting an idea of Crandall and Fagin [9]. Next, by the Chinese remainder theorem, we reduce to multiplying in F_p[X]/(X^N − 1) for each p separately. This is reminiscent of Pollard's algorithm [20], but instead of using three primes, here the number of primes grows with n. At this stage, the coefficient size lg p is doubly exponentially smaller than N. We perform these multiplications in F_p[X]/(X^N − 1) by applying two recursion levels of the polynomial multiplication algorithm of Theorem 1.2. This reduces the problem to a collection of multiplication problems in F_p[X], each doubly exponentially smaller than the original problem. Using Kronecker substitution, these are converted back to multiplications in Z/(2^{n′} − 1)Z, where n′ is doubly exponentially smaller than n, and the algorithm is applied recursively.

In effect, each recursive call in the new integer multiplication algorithm corresponds to two recursion levels of the existing Fürer-type algorithms. The speedup relative to [17] may be understood as follows. In the algorithm of [17], at each recursion level we incur a factor of two in overhead due to the zero-padding that occurs when we split the inputs into small chunks. In the new algorithm, the passage from Z/PZ to F_p manages the same exponential size reduction without any zero-padding. This roughly corresponds to saving a factor of two at every second recursion level of the algorithm of [17], and explains the factor of (√2)^{log* n} overall speedup.

2. Preliminaries
2.1. Logarithmically slow functions.
Let x_0 ∈ R, and let Φ : (x_0, ∞) → R be a smooth increasing function. We recall from [17, §5] that Φ is said to be logarithmically slow if there exists an integer ℓ ≥ 0 such that

(log^{◦ℓ} ∘ Φ ∘ exp^{◦ℓ})(x) = log x + O(1)

as x → ∞. For example, the functions log(5x), 5 log x, (log x)^2, and 2^{(log log x)^2} are logarithmically slow, with ℓ = 0, 1, 2, 3 respectively.

We will always assume that x_0 is chosen large enough to ensure that Φ(x) ≤ x − 1 for all x > x_0. According to [17, Lemma 2], this is possible for any logarithmically slow function, and it implies that the iterator

Φ*(x) := min{k ≥ 0 : Φ^{◦k}(x) ≤ x_0}

is well-defined on R. It is shown in [17, Lemma 3] that this iterator satisfies

Φ*(x) = log* x + O(1)  (2.1)

as x → ∞. In other words, logarithmically slow functions are more or less indistinguishable from log x, as far as iterators are concerned.

As in [17] and [18], we will use logarithmically slow functions to measure size reduction in multiplication algorithms. The typical situation is that we have a function T(n) measuring the (normalised) cost of a certain multiplication algorithm for inputs of size n; we reduce the problem to a collection of problems of size n_i ≤ Φ^{◦κ}(n) for some κ ≥ 1, leading to a bound for T(n) in terms of the various T(n_i). Applying the reduction recursively, we wish to convert these bounds into an explicit asymptotic estimate for T(n). This is achieved via the following 'master theorem'.

Proposition 2.1.
Let K > 1, B > 0, and let ℓ ≥ 0 and κ ≥ 1 be integers. Let x_0 > exp^{◦ℓ}(1), and let Φ : (x_0, ∞) → R be a logarithmically slow function such that Φ(x) ≤ x − 1 for all x > x_0. Assume that x_1 ≥ x_0 is chosen so that Φ^{◦κ}(x) is defined for all x > x_1. Then there exists a positive constant C (depending on x_0, x_1, Φ, K, B, ℓ and κ) with the following property.

Let σ > x_1 and L > 0. Let S ⊆ R, and let T : S → R_{>0} be any function satisfying the following recurrence. First, T(y) ≤ L for all y ∈ S, y ≤ σ. Second, for all y ∈ S, y > σ, there exist y_1, ..., y_d ∈ S with y_i ≤ Φ^{◦κ}(y), and weights γ_1, ..., γ_d ≥ 0 with Σ_i γ_i = 1, such that

T(y) ≤ K (1 + B/log^{◦ℓ} y) Σ_{i=1}^{d} γ_i T(y_i) + L.

Then for all y ∈ S, y > σ, we have T(y) ≤ C L (K^{1/κ})^{log* y − log* σ}.

Proof.
The special case κ = 1, x_1 = x_0 is exactly [17, Prop. 8]. We indicate briefly how the proof of [17, Prop. 8] must be modified to obtain this more general statement.

The first two paragraphs of the proof of [17, Prop. 8] may be read verbatim. In the third paragraph, the inductive statement is changed to

T(y) ≤ E_1 ··· E_j L (K^{⌈j/κ⌉} + ··· + K + 1),

where j := Φ*_σ(y) denotes the least k ≥ 0 with Φ^{◦k}(y) ≤ σ. The inductive step is modified slightly: for 0 < j ≤ κ we use the fact that y_i ≤ σ, and for j > κ the fact that Φ*_σ(y_i) ≤ Φ*_σ(Φ^{◦κ}(y)) = Φ*_σ(y) − κ. With these changes, the proof given in [17] goes through without difficulty. □

2.2. Discrete Fourier transforms.
Let n ≥ 1, and let R be a commutative ring in which n is invertible. A principal n-th root of unity is an element ω ∈ R such that ω^n = 1 and such that Σ_{j=0}^{n−1} ω^{ij} = 0 for i = 1, 2, ..., n − 1. If m is a divisor of n, then ω^{n/m} is easily seen to be a principal m-th root of unity.

Fix a principal n-th root of unity ω. The discrete Fourier transform (DFT) of the sequence (a_0, ..., a_{n−1}) ∈ R^n with respect to ω is the sequence (â_0, ..., â_{n−1}) ∈ R^n defined by â_j := Σ_{i=0}^{n−1} ω^{ij} a_i. Equivalently, â_j = A(ω^j), where A := Σ_{i=0}^{n−1} a_i X^i ∈ R[X]/(X^n − 1).

The inverse DFT recovers (a_0, ..., a_{n−1}) from (â_0, ..., â_{n−1}). Computationally it corresponds to a DFT with respect to ω^{−1}, followed by a division by n, because

(1/n) Σ_{j=0}^{n−1} ω^{−kj} â_j = (1/n) Σ_{i=0}^{n−1} Σ_{j=0}^{n−1} ω^{(i−k)j} a_i = a_k,  k = 0, ..., n − 1.

DFTs may be used to implement cyclic convolutions. Suppose that we wish to compute C := AB where A, B ∈ R[X]/(X^n − 1). We first perform DFTs to compute A(ω^i) and B(ω^i) for i = 0, ..., n − 1. We then compute C(ω^i) = A(ω^i) B(ω^i) for each i, and finally perform an inverse DFT to recover C ∈ R[X]/(X^n − 1).

More generally, we may use multidimensional DFTs to compute C := AB for A, B ∈ R[X_1, ..., X_d]/(X_1^{n_1} − 1, ..., X_d^{n_d} − 1). For this, we require that each n_k be invertible in R, and that R contain a principal n_k-th root of unity ω_k for each k. Let n := n_1 ··· n_d. We first perform multidimensional DFTs to evaluate A and B at the n points {(ω_1^{j_1}, ..., ω_d^{j_d}) : 0 ≤ j_k < n_k}. We then multiply pointwise, and finally recover C via a multidimensional inverse DFT.
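These definitions are easy to exercise at toy scale. The following Python sketch (not from the paper) implements the DFT, the inverse DFT and cyclic multiplication over R = Z/pZ using the naive O(n^2) evaluation, together with Bluestein's reduction of a DFT to a cyclic convolution, which is recalled at the end of this subsection:

```python
def dft(a, omega, p):
    # a_hat[j] = sum_i omega^(i*j) * a[i] over Z/pZ (naive O(n^2) version)
    n = len(a)
    return [sum(pow(omega, i * j, p) * a[i] for i in range(n)) % p
            for j in range(n)]

def inverse_dft(ahat, omega, p):
    # DFT with respect to omega^{-1}, followed by division by n
    n = len(ahat)
    inv_n = pow(n, -1, p)                       # requires Python >= 3.8
    return [x * inv_n % p for x in dft(ahat, pow(omega, -1, p), p)]

def cyclic_mul(a, b, omega, p):
    # product in (Z/pZ)[X]/(X^n - 1): evaluate, multiply pointwise, interpolate
    ah, bh = dft(a, omega, p), dft(b, omega, p)
    return inverse_dft([x * y % p for x, y in zip(ah, bh)], omega, p)

def bluestein_dft(a, omega, p):
    # Bluestein: DFT of odd length n via one cyclic convolution.  With
    # xi := omega^{(n+1)/2} we have xi^2 = omega and xi^n = 1, and
    # a_hat[j] = xi^{j^2} * (f * g)[j] for the f, g defined below.
    n = len(a)
    xi = pow(omega, (n + 1) // 2, p)
    f = [pow(xi, i * i, p) * a[i] % p for i in range(n)]
    g = [pow(xi, -i * i, p) for i in range(n)]
    h = [0] * n                                 # h = f * g in (Z/pZ)[Z]/(Z^n - 1)
    for i in range(n):
        for j in range(n):
            h[(i + j) % n] = (h[(i + j) % n] + f[i] * g[j]) % p
    return [pow(xi, j * j, p) * h[j] % p for j in range(n)]
```

For example, ω = 2 is a principal 5th root of unity modulo p = 31 (since 2^5 = 32 ≡ 1), and bluestein_dft(a, 2, 31) agrees with dft(a, 2, 31) for every length-5 sequence a.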
Each multidimensional DFT may be reduced to a collection of one-dimensional DFTs as follows. We first compute A(X_1, ..., X_{d−1}, ω_d^j) ∈ R[X_1, ..., X_{d−1}] for each j = 0, ..., n_d − 1; this involves n/n_d DFTs of length n_d. We then recursively evaluate each of these polynomials at the n/n_d points (ω_1^{j_1}, ..., ω_{d−1}^{j_{d−1}}). Altogether, this strategy involves computing n/n_k DFTs of length n_k for each k = 1, ..., d.

Finally, we briefly recall Bluestein's method [5] for reducing a (one-dimensional) DFT to a convolution problem (see also [17]). Let n > 1 be odd, and let ω ∈ R be a principal n-th root of unity. Set ξ := ω^{(n+1)/2}, so that ξ^2 = ω and ξ^n = 1. Then computing the DFT of a given sequence (a_0, ..., a_{n−1}) ∈ R^n with respect to ω reduces to computing the product of the polynomials

f(Z) := Σ_{i=0}^{n−1} ξ^{i^2} a_i Z^i,   g(Z) := Σ_{i=0}^{n−1} ξ^{−i^2} Z^i

in R[Z]/(Z^n − 1), plus O(n) auxiliary multiplications in R. Notice that g(Z) is fixed and does not depend on the input sequence.

2.3. The Crandall–Fagin algorithm.
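To fix ideas before the formal description, here is a toy Python model of the reduction recalled in this subsection (not from the paper; the parameters n = 10, N = 3, P = 2063 and θ used in the example are hypothetical, chosen so that θ^N = 2 mod P and P is large enough):

```python
def crandall_fagin_mul(u, v, n, N, P, theta):
    """Multiply u, v in Z/(2^n - 1)Z via a length-N cyclic convolution over
    Z/PZ.  Assumes theta^N = 2 (mod P) and that P is large enough (roughly
    lg P >= 2*ceil(n/N) + lg N + 1); N need not divide n."""
    e = [-(-n * i // N) for i in range(N + 1)]     # e_i = ceil(n*i/N)
    c = [N * e[i] - n * i for i in range(N)]       # c_i = N*e_i - n*i

    def digits(x):
        # variable-base decomposition: x = sum_i 2^{e_i} x_i,
        # with 0 <= x_i < 2^{e_{i+1} - e_i}
        return [(x >> e[i]) & ((1 << (e[i + 1] - e[i])) - 1) for i in range(N)]

    U = [pow(theta, c[i], P) * d % P for i, d in enumerate(digits(u))]
    V = [pow(theta, c[i], P) * d % P for i, d in enumerate(digits(v))]
    W = [0] * N                                    # W = U*V in (Z/PZ)[X]/(X^N - 1)
    for i in range(N):
        for j in range(N):
            W[(i + j) % N] = (W[(i + j) % N] + U[i] * V[j]) % P
    # peel off theta^{c_i} and overlap-add: uv = sum_i 2^{e_i} w_i mod 2^n - 1
    w = [pow(theta, -c[i], P) * W[i] % P for i in range(N)]
    return sum(w[i] << e[i] for i in range(N)) % ((1 << n) - 1)
```

For example, P = 2063 is prime with P ≡ 2 (mod 3), so θ := 2^{1375} mod P satisfies θ^3 = 2 (1375 being the inverse of 3 modulo P − 1); then crandall_fagin_mul(1000, 999, 10, 3, 2063, θ) returns 1000 · 999 mod (2^{10} − 1), even though 3 ∤ 10.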
Consider the problem of computing a 'cyclic' integer product of length n, that is, a product uv where u, v ∈ Z/(2^n − 1)Z. If N and P are positive integers such that N | n and lg P ≥ 2n/N + lg N, then we may reduce the given problem to multiplication in (Z/PZ)[X]/(X^N − 1), by cutting the inputs into chunks of n/N bits. In this section we briefly recall a variant [17] of an algorithm of Crandall and Fagin [9], which works without the assumption that N | n.

Assume that N ≤ n and lg P ≥ 2⌈n/N⌉ + lg N + 1, and that we have available some θ ∈ Z/PZ with θ^N = 2. (This θ plays the same role as the real N-th root of 2 in the original Crandall–Fagin algorithm.) Set e_i := ⌈ni/N⌉ and c_i := N e_i − ni for 0 ≤ i < N. Observe that e_{i+1} − e_i = ⌊n/N⌋ or ⌈n/N⌉ for each i. Decompose the inputs as u = Σ_{i=0}^{N−1} 2^{e_i} u_i and v = Σ_{i=0}^{N−1} 2^{e_i} v_i, where 0 ≤ u_i, v_i < 2^{e_{i+1} − e_i} (i.e., a decomposition with respect to a 'variable base'). Set U(X) := Σ_{i=0}^{N−1} θ^{c_i} u_i X^i and V(X) := Σ_{i=0}^{N−1} θ^{c_i} v_i X^i, regarded as polynomials in (Z/PZ)[X]/(X^N − 1), and let W(X) := U(X) V(X). Then one finds (see [17]) that uv may be recovered by the formula uv = Σ_{i=0}^{N−1} 2^{e_i} w_i (mod 2^n − 1), where the w_i are integers in [0, P) defined by W(X) = Σ_{i=0}^{N−1} θ^{c_i} w_i X^i.

To summarise, the problem of computing uv reduces to computing a product in (Z/PZ)[X]/(X^N − 1), plus O(N) auxiliary multiplications in Z/PZ, and O(N (lg n)^2 + N lg P) bit operations to compute the e_i and to handle the final overlap-add phase (again, see [17]).

2.4. Data layout.
In this section we discuss several issues relating to the layout of data on the Turing machine tapes.

Integers will always be stored in the standard binary representation. If n is a positive integer, then elements of Z/nZ will always be stored as residues in the range 0 ≤ x < n, occupying lg n bits of storage.

If R is a ring and f ∈ R[X] is a polynomial of degree n ≥ 1, then an element of R[X]/f(X) will always be represented as a sequence of n coefficients in the standard monomial order. This convention is applied recursively, so for rings of the type (R[Y]/f(Y))[X]/g(X), the coefficient of X^0 is stored first, as an element of R[Y]/f(Y), followed by the coefficient of X^1, and so on.
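The dominant cost of changing between such layouts is array transposition, treated in the next paragraphs. A Python model of the recursive strategy (not from the paper; for simplicity we always split the row dimension, whereas the Turing-machine algorithm cited below splits along the short dimension in order to obtain the lg min(n, m) factor):

```python
def transpose(a, n, m):
    """Transpose a flat row-major n x m array into a flat row-major m x n
    array by splitting the rows in half and transposing each half recursively."""
    if n <= 1 or m <= 1:
        return list(a)              # a 1 x m row is already an m x 1 column
    h = n // 2
    top = transpose(a[:h * m], h, m)        # m x h, row-major
    bot = transpose(a[h * m:], n - h, m)    # m x (n - h), row-major
    out = []
    for j in range(m):                      # glue row j of both halves
        out.extend(top[j * h:(j + 1) * h])
        out.extend(bot[j * (n - h):(j + 1) * (n - h)])
    return out
```

For instance, transpose([1, 2, 3, 4, 5, 6], 2, 3) returns [1, 4, 2, 5, 3, 6]. With b-bit entries the recursion satisfies T(n, m) = 2 T(n/2, m) + O(bnm), which yields the O(bnm lg min(n, m)) bound quoted below when the split is taken along the short dimension.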
A multidimensional array of size n_d × ··· × n_1, whose entries occupy b bits each, will be stored as a linear array of b n_1 ··· n_d bits. The entries are ordered lexicographically in the order (0, ..., 0, 0), (0, ..., 0, 1), ..., (n_d − 1, ..., n_1 − 1). In particular, an element of (···(R[X_1]/f_1(X_1))···)[X_d]/f_d(X_d) is represented as an n_d × ··· × n_1 array of elements of R, where n_k := deg f_k. We will generally prefer the more compact notation R[X_1, ..., X_d]/(f_1(X_1), ..., f_d(X_d)).

There are many instances where an n × m array must be transposed so that its entries can be accessed efficiently either 'by columns' or 'by rows'. Using the algorithm of [6, Lemma 18], such a transposition may be achieved in O(bnm lg min(n, m)) bit operations, where b is the bit size of each entry. (The idea of the algorithm is to split the array in half along the short dimension, and transpose each half recursively.)

One important application is the following result, which estimates the data rearrangement cost associated to the Agarwal–Cooley method [2] for converting between one-dimensional and multidimensional convolution problems (this is closely related to the Good–Thomas DFT algorithm [13, 26]).

Lemma 2.2.
Let n, m ≥ 1 be relatively prime, and let R be a ring whose elements are represented using b bits. There exists an isomorphism

R[X]/(X^{nm} − 1) ≅ R[Y, Z]/(Y^n − 1, Z^m − 1)

that may be evaluated in either direction in O(bnm lg min(n, m)) bit operations.

Proof. Let c := m^{−1} mod n, and let β : R[X]/(X^{nm} − 1) → R[Y, Z]/(Y^n − 1, Z^m − 1) be the isomorphism that sends X to Y^c Z, and acts as the identity on R. Suppose that we wish to compute β(F) for some input polynomial F = Σ_{k=0}^{nm−1} F_k X^k ∈ R[X]/(X^{nm} − 1). Interpreting the coefficient list (F_0, ..., F_{nm−1}) as an n × m array, the (i, j)-th entry corresponds to F_{im+j}. After transposing the array, which costs O(bnm lg min(n, m)) bit operations, we have an m × n array, whose (j, i)-th entry is F_{im+j}. Now for each j, cyclically permute the j-th row by (jc mod n) slots; altogether this uses only O(bnm) bit operations. The result is an m × n array whose (j, i)-th entry is F_{(i−jc mod n)m+j}, which is exactly the coefficient of Y^{((i−jc)m+j)c} Z^{(i−jc)m+j} = Y^i Z^j in β(F). The inverse map β^{−1} may be computed by reversing this procedure. □

Corollary 2.3.
Let n_1, ..., n_d ≥ 1 be pairwise relatively prime, let n := n_1 ··· n_d, and let R be a ring whose elements are represented using b bits. There exists an isomorphism

R[X]/(X^n − 1) ≅ R[X_1, ..., X_d]/(X_1^{n_1} − 1, ..., X_d^{n_d} − 1)

that may be evaluated in either direction in O(bn lg n) bit operations.

Proof. Using Lemma 2.2, we may construct a sequence of isomorphisms

R[X]/(X^{n_1 ··· n_d} − 1) ≅ R[X_1, W]/(X_1^{n_1} − 1, W^{n_2 ··· n_d} − 1)
≅ R[X_1, X_2, W]/(X_1^{n_1} − 1, X_2^{n_2} − 1, W^{n_3 ··· n_d} − 1)
···
≅ R[X_1, ..., X_d]/(X_1^{n_1} − 1, ..., X_d^{n_d} − 1),

the i-th of which may be computed in O(bn lg n_i) bit operations. The overall cost is O(Σ_i bn lg n_i) = O(bn lg n) bit operations. □

3. Cyclotomic coefficient rings
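Throughout this section we will repeatedly split a cyclic ring F_p[X]/(X^N − 1) into a bivariate ring via Lemma 2.2. A small Python check of the underlying index map X ↦ Y^c Z (not from the paper; integer coefficients for simplicity), verified to be multiplicative on a toy example:

```python
def beta(F, n, m):
    # Lemma 2.2: send F in R[X]/(X^{nm}-1) to G in R[Y,Z]/(Y^n-1, Z^m-1),
    # where X -> Y^c Z with c := m^{-1} mod n; G[i][j] is the coeff of Y^i Z^j
    c = pow(m, -1, n)
    G = [[0] * m for _ in range(n)]
    for k, coeff in enumerate(F):
        G[k * c % n][k % m] = coeff
    return G

def mul_1d(F, G):
    # cyclic convolution in Z[X]/(X^L - 1)
    L = len(F)
    H = [0] * L
    for i in range(L):
        for j in range(L):
            H[(i + j) % L] += F[i] * G[j]
    return H

def mul_2d(A, B, n, m):
    # bivariate cyclic convolution in Z[Y,Z]/(Y^n - 1, Z^m - 1)
    C = [[0] * m for _ in range(n)]
    for i1 in range(n):
        for j1 in range(m):
            for i2 in range(n):
                for j2 in range(m):
                    C[(i1 + i2) % n][(j1 + j2) % m] += A[i1][j1] * B[i2][j2]
    return C
```

For n = 3, m = 4 one checks that beta(mul_1d(F, G), 3, 4) == mul_2d(beta(F, 3, 4), beta(G, 3, 4), 3, 4) for any F, G of length 12, i.e. β is a ring homomorphism; bijectivity of the index map k ↦ (kc mod n, k mod m) follows from the Chinese remainder theorem.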
The aim of this section is to construct certain coefficient rings that play a central role in the multiplication algorithms described later. The basic idea is as follows. Suppose that we want to multiply two polynomials in F_p[X], and that the degree of the product is known to be at most n. If N is an integer with N > n, then by appropriate zero-padding, we may embed the problem in F_p[X]/(X^N − 1). If N = αm, where α and m are relatively prime, then there is an isomorphism F_p[X]/(X^N − 1) ≅ F_p[Y, Z]/(Y^α − 1, Z^m − 1), and the latter ring is closely related to

F_p[Y, Z]/(φ_α(Y), Z^m − 1) ≅ (F_p[Y]/φ_α)[Z]/(Z^m − 1)

(recall that φ_α(Y) denotes the α-th cyclotomic polynomial). In particular, computing the product in (F_p[Y]/φ_α)[Z]/(Z^m − 1) recovers 'most' of the information about the product in F_p[X]/(X^N − 1).

We will choose N, α and m with the following properties:

(1) N is not much larger than n, so that not too much space is 'wasted' in the initial zero-padding step;
(2) ϕ(α) (= deg φ_α) is not much smaller than α, so that we do not lose much information by working modulo φ_α(Y) instead of modulo Y^α − 1;
(3) the ring F_p[Y]/φ_α contains a principal m-th root of unity, so that we can multiply in (F_p[Y]/φ_α)[Z]/(Z^m − 1) efficiently by means of DFTs over F_p[Y]/φ_α;
(4) m is a product of many integers that are exponentially smaller than n, so that the DFTs of length m may be reduced to many small DFTs; and
(5) α is itself exponentially smaller than n.

The last two items ensure that the small DFTs can be converted to multiplication problems of degree exponentially smaller than n, to allow the recursion to proceed.

Definition 3.1. An admissible tuple is a sequence (q_0, q_1, ..., q_e) of distinct primes (e ≥ 1) satisfying the following conditions. First,

(lg N)^2 < q_i < 2^{(lg lg N)^3},   i = 0, ..., e,  (3.1)

where N := q_0 ··· q_e. Second, q_i − 1 is squarefree for i = 1, ..., e, and

λ(q_1, ..., q_e) := LCM(q_1 − 1, ..., q_e − 1) < 2^{(lg lg N)^3}.  (3.2)

(Note that q_0 − 1 does not participate in (3.2).)

An admissible length is a positive integer N of the form N = q_0 ··· q_e where (q_0, ..., q_e) is an admissible tuple.

If N is an admissible length, we treat (q_0, ..., q_e) and λ(N) := λ(q_1, ..., q_e) as auxiliary data attached to N. For example, if an algorithm takes N as input, we implicitly assume that this auxiliary data is also supplied as part of the input.

Example 3.2.
For n = 10^{100000}, there is a nearby admissible length

N = 1000000000000000000156121 ... (99971 digits omitted) ... = q_0 q_1 ··· q_e,

where

q_0 = 206658761261792645783,
q_1 = 36658226833235899 = 1 + 2 · 3 · ⋯,
q_2 = 36658244723486119 = 1 + 2 · 3 · ⋯,
q_3 = 36658319675739343 = 1 + 2 · ⋯,
q_4 = 36658428883190467 = 1 + 2 · ⋯,
  ⋮
q_e = 37076481100386859 = 1 + 2 · ⋯,

and

λ(N) = 2 · 3 · 5 ⋯ 113 = 31610054640417607788145206291543662493274686990.

Definition 3.3.
Let p be a prime. An admissible length N is called p-admissible if N > p and p ∤ N (i.e., p is distinct from q_0, ..., q_e).

The following result explains how to choose a p-admissible length close to any prescribed target.

Proposition 3.4.
There is an absolute constant z_0 > 0 with the following property. Given as input a prime p and an integer n > max(z_0, p), we may compute, in 2^{O((lg lg n)^2)} bit operations, a p-admissible length N in the interval

n < N < (1 + 1/lg n) n.  (3.3)

The key ingredient in the proof is the following number-theoretic result of Adleman, Pomerance and Rumely.

Lemma 3.5 ([1, Prop. 10]). There is an absolute constant C > 0 with the following property. For all sufficiently large x, there exists a positive squarefree integer λ < x such that

Σ_{q prime, q−1 | λ} 1 > exp(C log x / log log x).

Proof of Proposition 3.4.
Let λ_max := ⌈2^{(lg lg n)^2}⌉, and for λ ≥ 1 define f(λ) to be the number of primes q in the interval (lg n)^3 < q ≤ λ_max + 1 such that q − 1 | λ and q ≠ p. We claim that, provided n is large enough, there exists some squarefree λ ∈ {1, ..., λ_max} such that f(λ) > lg n. To see this, apply Lemma 3.5 with x := 2^{(lg lg n)^2}; for large n we then have

C log x / log log x > 5 lg lg n,

so Lemma 3.5 implies that there exists a positive squarefree integer λ < x ≤ λ_max for which

Σ_{q prime, q−1 | λ} 1 > exp(5 lg lg n) > (lg n)^5,

and hence

f(λ) ≥ (lg n)^5 − (lg n)^3 − 1 > lg n

(the term (lg n)^3 discards the primes q ≤ (lg n)^3, and the term 1 discards the possibility q = p).
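Such a λ can be located by the sieve-and-scan procedure described next; at toy scale it looks like this (a Python sketch, not from the paper; the bounds λ_max and lg n are replaced by small hypothetical values):

```python
def find_lambda(lam_max, target):
    """Find a squarefree lambda <= lam_max such that at least `target`
    primes q <= lam_max + 1 satisfy q - 1 | lambda (toy-scale model)."""
    limit = lam_max + 1
    # sieve of Eratosthenes up to lam_max + 1
    is_prime = [False, False] + [True] * (limit - 1)
    for i in range(2, int(limit ** 0.5) + 1):
        if is_prime[i]:
            for j in range(i * i, limit + 1, i):
                is_prime[j] = False
    # squarefree flags up to lam_max
    squarefree = [True] * (lam_max + 1)
    for i in range(2, int(lam_max ** 0.5) + 1):
        for j in range(i * i, lam_max + 1, i * i):
            squarefree[j] = False
    # one pass per prime q: bump every squarefree lambda divisible by q - 1
    count = [0] * (lam_max + 1)
    for q in range(2, limit + 1):
        if not is_prime[q]:
            continue
        for lam in range(q - 1, lam_max + 1, q - 1):
            if squarefree[lam]:
                count[lam] += 1
                if count[lam] >= target:
                    return lam, count[lam]
    return None
```

For instance, find_lambda(30, 5) returns (30, 5): the primes 2, 3, 7, 11, 31 all satisfy q − 1 | 30. In the proof, target is lg n, the counters are the array entries c_λ, and one pass is made per sieved prime.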
We may locate one such λ by means of the following algorithm (adapted from the proof of [18, Lemma 4.5]). First use a sieve to enumerate the primes q in the interval (lg n)^3 < q ≤ λ_max + 1, and to determine which λ = 1, ..., λ_max are squarefree, in (λ_max)^{1+o(1)} bit operations. Now initialise an array of integers c_λ := 0 for λ = 1, ..., λ_max. For each such prime q ≠ p, scan through the array, incrementing those c_λ for which λ is squarefree and divisible by q − 1, and stop as soon as one of the c_λ reaches lg n. We need only allocate O(lg lg n) bits per array entry, so each pass through the array costs O(λ_max lg lg n) bit operations. The number of passes is O(λ_max), so the total cost of finding a suitable λ is O(λ_max^2 lg lg n) = 2^{O((lg lg n)^2)} bit operations. Within the same time bound, we may also easily recover a list of primes q_1, q_2, ..., q_{lg n} for which q_i − 1 | λ.

Next, compute the partial products q_1, q_1 q_2, ..., q_1 q_2 ··· q_{lg n}, and determine the smallest integer e ≥ 1 such that q_1 ··· q_e > n / 2^{3(lg lg n)^2}. Such an e certainly exists, as q_1 ··· q_{lg n} > 2^{lg n} ≥ n. Since each q_i occupies O((lg lg n)^2) bits, this can all be done in (lg n)^{O(1)} bit operations. Also, as q_e ≤ λ + 1 ≤ 2^{(lg lg n)^2} + 1 < 2^{2(lg lg n)^2} and q_1 ··· q_{e−1} ≤ n / 2^{3(lg lg n)^2}, we find that

2^{(lg lg n)^2} < n / (q_1 ··· q_e) < 2^{3(lg lg n)^2}

for large n.

Let q_0 be the least prime that exceeds n/(q_1 ··· q_e) and that is distinct from p. According to [4], the interval [x − x^{0.525}, x] contains at least one prime for all sufficiently large x; therefore

q_0 < n/(q_1 ··· q_e) + 2 (n/(q_1 ··· q_e))^{0.525} < (1 + 1/lg n) · n/(q_1 ··· q_e)

for n sufficiently large, using n/(q_1 ··· q_e) > 2^{(lg lg n)^2}. We may find q_0 in 2^{O((lg lg n)^2)} bit operations, by using trial division to test successive integers for primality.

Set N := q_0 q_1 ··· q_e. Then (3.3) holds, and certainly N > p and p ∤ N. Let us check that (q_0, ..., q_e) is admissible, provided n is large enough. For i = 1, ..., e we have

(lg N)^2 < (lg n)^3 < q_i ≤ λ + 1 ≤ 2^{(lg lg n)^2} + 1 < 2^{(lg lg N)^3},

and also

(lg N)^2 < 2^{(lg lg n)^2} < q_0 < (1 + 1/lg n) · 2^{3(lg lg n)^2} < 2^{(lg lg N)^3};

this establishes (3.1). Also, as q_0 > 2^{(lg lg n)^2} ≥ λ + 1 ≥ q_i for i = 1, ..., e, we see that q_0 is distinct from q_1, ..., q_e. Finally, (3.2) holds because

LCM(q_1 − 1, ..., q_e − 1) | λ ≤ 2^{(lg lg n)^2} < 2^{(lg lg N)^3}.

This also shows that we may compute the auxiliary data λ(q_1, ..., q_e) in 2^{O((lg lg n)^2)} bit operations. □

Remark 3.6. Example 3.2 was constructed by enumerating the smallest primes q_1, q_2, ... exceeding (lg n)^3 for which q_i − 1 divides 2 · 3 · 5 ⋯ 113, and then choosing q_0 to make N as close to n as possible.

The proof of Proposition 3.4 goes a different way: rather than choosing λ first, the proof constructs q_1, ..., q_e and λ simultaneously. In particular, one cannot guarantee that λ will be a product of an initial segment of primes, as occurred in the example. Indeed, the proof of [1, Prop. 10] (and of its predecessor [21]) yields very little information at all about the prime factorisation of λ. For further discussion, see [1, Remark 6.2].

Definition 3.7.
Let p be a prime and let N = q · · · q e be a p -admissible length.A p -admissible divisor of N is a positive divisor α of N , with q | α , such that thering F p [ Y ] /φ α ( Y ) contains a principal ( q · · · q e )-th root of unity, and such thatlg N < α < (lg lg N ) (3.4)and ϕ ( α ) > (cid:18) − N (cid:19) α. (3.5)The next result shows how to construct a p -admissible divisor for any sufficientlylarge p -admissible length N . The idea behind the construction is as follows. Letord n p denote the order of p in the multiplicative group of integers modulo n . Forany α >
1, not divisible by p, the ring F_p[Y]/φ_α(Y) is a direct sum of fields of order p^r, where r = ord_α p [27, Lemma 14.50]. Our goal is to ensure that p^r − 1 is divisible by q_1 · · · q_e, so that F_p[Y]/φ_α(Y) contains the desired principal root of unity. One way to force q_i to divide p^r − 1 is to take α divisible by q_i, as this implies that ord_{q_i} p | r. The difficulty is that we cannot do this for all q_i, because then α would become too large, violating (3.4). Fortunately, we can take advantage of the fact that the q_i − 1 all divide λ = λ(N); this enables us to take α to be a product of a small subset of the q_i, in such a way that still every one of q_1, . . . , q_e divides p^r − 1.

Proposition 3.8.
There is an absolute constant z > with the following property.Given as input a prime p and a p -admissible length N > z , we may compute a p -admissible divisor α of N , together with the cyclotomic polynomial φ α ∈ F p [ Y ] and a principal ( q · · · q e ) -th root of unity in F p [ Y ] /φ α , in O ((lg lg N ) ) p o (1) bitoperations.Proof. We are given as input an admissible tuple ( q , . . . , q e ) with N = q · · · q e ,and the squarefree integer λ := λ ( q , . . . , q e ). Let L be the set of primes dividing λ .By (3.2) we have |L| log λ < (lg lg N ) , and we may compute L in λ O (1) =2 O ((lg lg N ) ) bit operations.We start by computing a table of values of ord q i p for i = 1 , . . . , e ; note that p = q i by hypothesis, so ord q i p is well-defined. We have q i − | λ and henceord q i p | λ for each i . To compute ord q i p , we first compute p mod q i in O (lg q i lg p )bit operations, and then repeatedly multiply by p modulo q i until reaching 1. Sinceord q i p λ , and there are e = O (lg N ) primes q i , the total cost to compute thetable is O (( λ lg q i + lg q i lg p ) lg N ) = (2 (lg lg N ) lg p ) O (1) bit operations. ASTER INTEGER AND POLYNOMIAL MULTIPLICATION 13
Using the above table, we construct a certain vector σ = ( σ , . . . , σ e ) ∈ { , } e as follows. Initialise the vector as σ := (0 , . . . , ℓ ∈ L , search for thesmallest i = 1 , . . . , e such that ℓ | ord q i p . If such an i is found, set σ i := 1; if no i is found, ignore this ℓ . The cost of computing σ is O ( |L| e (lg λ ) ) = (lg N ) O (1) bitoperations.Set α := q Q i : σ i =1 q i . To establish (3.4), note that the number of i for which σ i = 1 is at most |L| , so (3.1) implies thatlg N < q α < (2 (lg lg N ) ) |L| +1 (2 (lg lg N ) ) (lg lg N ) = 2 (lg lg N ) . For (3.5), first observe that ϕ ( α ) α = (cid:18) − q (cid:19) Y i : σ i =1 (cid:18) − q i (cid:19) > (cid:18) − N ) (cid:19) (lg lg N ) . Since − log(1 − ε ) < ε for any ε ∈ (0 , ), we obtain − log ϕ ( α ) α < N ) (lg N ) < N and hence ϕ ( α ) /α > exp( − / lg N ) > − / lg N for sufficiently large N .Now compute the cyclotomic polynomial φ α ∈ F p [ Y ] (i.e., the reduction mod-ulo p of φ α ( Y ) ∈ Z [ Y ]). This can be done in ( α lg p ) O (1) bit operations, using forexample [27, Algorithm 14.48]. We may then determine the factorisation of φ α into irreducibles in F p [ Y ], say φ α = f · · · f k , in α O (1) p / o (1) bit operations [24,Thm. 1]. Since p ∤ α , the f j are distinct, and each f j has degree r := ord α p [27,Lemma 14.50]. In other words, F p [ Y ] /φ α is isomorphic to a direct sum of k copiesof F p r .We claim that q h | p r − h = 1 , . . . , e . For this, it suffices to prove thatord q h p | r for each h . Since λ is squarefree, it suffices in turn to show that everyprime ℓ dividing ord q h p also divides r . But for every such ℓ , the procedure forconstructing σ must have succeeded in finding some i for which ℓ | ord q i p (since atleast one value of i works, namely i = h ). Then σ i = 1 for this i , so q i | α . Thisimplies that ord q i p | ord α p = r , and hence that ℓ | r .We conclude that q · · · q e | p r −
1, so each F_p[Y]/f_j contains a primitive root of unity of order q_1 · · · q_e. As the factorisation of q_1 · · · q_e is known, we may locate one such primitive root in each F_p[Y]/f_j in α^{O(1)} p^{o(1)} bit operations [25] (see also [17, Lemma 3.3]). Combining these primitive roots via the Chinese remainder theorem, we obtain the desired principal (q_1 · · · q_e)-th root of unity in F_p[Y]/φ_α in another (α lg p)^{O(1)} bit operations. □

Remark 3.9. The p^{o(1)} term in Proposition 3.8 arises from the best known deterministic complexity bounds for factoring polynomials and finding primitive roots. If we permit randomised algorithms, then p^{o(1)} may be replaced by (lg p)^{O(1)}. This has no effect on the main results of this paper.

Example 3.10.
Continuing with Example 3.2, let us take p = 3. In the notationof the proof of Proposition 3.8, we have L = { , , , . . . , } . For each ℓ ∈ L , let us write q ( ℓ ) for the smallest q i for which ord q i ℓ . Then we have q (2) = q , q (3) = q , q (5) = q , q (7) = q , q (11) = q ,q (13) = q , q (17) = q , q (19) = q , q (23) = q , q (29) = q ,q (31) = q , q (37) = q , q (41) = q , q (43) = q , q (47) = q ,q (53) = q , q (59) = q , q (61) = q , q (67) = q , q (71) = q ,q (73) = q , q (79) = q , q (83) = q , q (89) = q , q (97) = q ,q (101) = q , q (103) = q , q (107) = q , q (109) = q , q (113) = q . Therefore σ i = 1 for i = 1 , , , , , ,
9, and we have α = q q q q q q q q ≈ . × ,ϕ ( α ) ≈ . × ,r = ord α · · · · · · · · · · . The ring F [ Y ] /φ α is isomorphic to a direct sum of ϕ ( α ) /r copies of F r . Theextraneous factors in r (namely 883, 9041, 327251 and 39551747) arise from theauxiliary prime q . Let m := N/α = q q q q · · · q ≈ . × ;then since m | q · · · q e | r −
1, each copy of F r contains a primitive m -th root ofunity, so F [ Y ] /φ α contains a principal m -th root of unity. Thus it is possible tomultiply in the ring F [ Y, Z ] / ( φ α ( Y ) , Z m −
1) by using DFTs over F_3[Y]/φ_α.

Remark 3.11. In Example 3.10, every ℓ ∈ L divides ord_{q_i} p for some i. It seems likely that this always occurs (at least for large n), but we do not know how to prove this. If it fails for some ℓ, then r may turn out not to be divisible by ℓ, but the proof of Proposition 3.8 shows that we still have q_h | p^r − 1 for h = 1, . . . , e.

4. Faster polynomial multiplication
The goal of this section is to prove Theorem 1.2. We fix a constant K_Z > 1 such that M(n) = O(n lg n K_Z^{log∗ n}).

We will describe a recursive routine PolynomialMultiply, that takes as input integers r, t ≥ 1, a prime p, and polynomials U_1, . . . , U_t, V ∈ F_p[X]/(X^r − 1), and computes the products U_1 V, . . . , U_t V. Its running time is denoted by C_poly(t, r, p). Note that the input polynomials U_1, . . . , U_t, V are expected to be supplied consecutively on the input tape (first U_1, then U_2, and so on), and the outputs U_1 V, . . . , U_t V should also be written consecutively to the output tape.

The role of the parameter t is to allow us to amortise the cost of transforming the fixed operand V across t products. This optimisation (borrowed from [17] and [18]) saves a constant factor in time at each recursion level of the main algorithm. Altogether the algorithm will perform 2t + 1 transforms: t + 1 forward transforms for U_1, . . . , U_t and V, followed by t inverse transforms to recover the products U_1 V, . . . , U_t V.
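The 2t + 1 transform pattern can be illustrated with a toy DFT over a small prime field. This is our own illustrative sketch, not the paper's Turing-machine routine: a naive O(r^2) DFT stands in for the fast transforms, and the names `dft` and `cyclic_products` and the parameters (p = 5, r = 4, ω = 2) are illustrative choices. It assumes a prime p with r | p − 1, so that F_p contains a principal r-th root of unity.

```python
# Illustration of the 2t+1 transform pattern: t+1 forward DFTs (one for the
# fixed operand V, one per U_s) and t inverse DFTs, so V is transformed once.

def dft(a, omega, p):
    r = len(a)
    return [sum(a[j] * pow(omega, i * j, p) for j in range(r)) % p
            for i in range(r)]

def cyclic_products(us, v, omega, p):
    r = len(v)
    v_hat = dft(v, omega, p)                 # one forward transform for V
    inv_omega = pow(omega, -1, p)
    inv_r = pow(r, -1, p)
    out = []
    for u in us:                             # t forward transforms
        u_hat = dft(u, omega, p)
        w_hat = [(x * y) % p for x, y in zip(u_hat, v_hat)]
        w = dft(w_hat, inv_omega, p)         # t inverse transforms
        out.append([(x * inv_r) % p for x in w])
    return out

def direct(u, v, p):
    # reference: schoolbook multiplication modulo X^r - 1
    r = len(u)
    w = [0] * r
    for i in range(r):
        for j in range(r):
            w[(i + j) % r] = (w[(i + j) % r] + u[i] * v[j]) % p
    return w

# p = 5, r = 4, omega = 2 (a principal 4th root of unity mod 5)
us = [[1, 2, 3, 4], [0, 1, 0, 1]]
v = [2, 0, 1, 3]
assert cyclic_products(us, v, 2, 5) == [direct(u, v, 5) for u in us]
```

With t products sharing the fixed operand V, only 2t + 1 transforms are performed instead of 3t, which is the source of the amortisation discussed above.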
To simplify the analysis, it is convenient to introduce the normalisation

C⋆_poly(r, p) := sup_{t ≥ 1} C_poly(t, r, p) / ((2t + 1) r lg p lg(r lg p)).

We certainly have M_p(n) < C_poly(1, n, p) + O(n lg p), so to prove Theorem 1.2 it is enough to show that

C⋆_poly(r, p) = O(4^{max(0, log∗ r − log∗ p)} K_Z^{log∗ p}).    (4.1)

The algorithms presented in this section perform many auxiliary multiplications and divisions involving 'small' integers and polynomials. We assume that all auxiliary divisions are reduced to multiplication via Newton's method [27, Ch. 9], so that the cost of a division (by a monic divisor) is at most a constant multiple of the cost of a multiplication of the same bit size. We also assume that, unless otherwise specified, all auxiliary multiplications are handled using the integer and polynomial variants of the Schönhage–Strassen algorithm, whose complexities are given by (1.1) and (1.4).

We first discuss a subroutine Transform that handles DFTs over rings of the form R_{p,α} := F_p[Y]/φ_α(Y), where p is a prime and α > 1. It takes as input p and α, positive integers t and n such that n is odd and relatively prime to α, a principal n-th root of unity ω ∈ R_{p,α}, and t input sequences (a_{s,0}, . . . , a_{s,n−1}) ∈ R_{p,α}^n for s = 1, . . . , t. Its output is the sequence of transforms (â_{s,0}, . . . , â_{s,n−1}) ∈ R_{p,α}^n with respect to ω, for s = 1, . . . , t. Just like PolynomialMultiply, the input and output sequences are stored consecutively on the tape.

Let T(t, n, α, p) denote the running time of Transform. The following result shows how to reduce the DFT problem to an instance of PolynomialMultiply.

Proposition 4.1.
We have

T(t, n, α, p) < C_poly(t, nα, p) + O(tnα lg α lg lg α lg p lg lg p lg lg lg p).

Proof.
Let R := R_{p,α}. We use Bluestein's method to reduce each DFT to the problem of computing a certain product f_s(Z) g(Z) in R[Z]/(Z^n − 1), plus O(n) multiplications in R, where f_s(Z) and g(Z) are defined as in Section 2.2. By (1.1) and (1.4), each multiplication in R costs

O((α lg α lg lg α)(lg p lg lg p lg lg lg p))    (4.2)

bit operations. To handle the products f_s(Z) g(Z), we first lift the polynomials from F_p[Y, Z]/(φ_α(Y), Z^n − 1) to F_p[Y, Z]/(Y^α − 1, Z^n − 1) (for example, by zero-padding in Y up to degree α). We then compute their images under the isomorphism F_p[Y, Z]/(Y^α − 1, Z^n − 1) ≅ F_p[X]/(X^{nα} − 1), in O(tnα lg α lg p) bit operations. We call PolynomialMultiply to compute the products in F_p[X]/(X^{nα} − 1), in C_poly(t, nα, p) bit operations. We evaluate the inverse of the above isomorphism to bring the products back to F_p[Y, Z]/(Y^α − 1, Z^n − 1), and then reduce modulo φ_α(Y) to obtain the desired products in R[Z]/(Z^n − 1). □

We now return to multiplication in F_p[X]/(X^r − 1). The routine PolynomialMultiply chooses one of two algorithms, depending on the size of r relative to p. For r ≤ p it uses the straightforward Kronecker substitution method described in Section 1. By (1.3) this yields the bound

C_poly(t, r, p) = O(t M(r lg p)) = O(t r lg p lg(r lg p) K_Z^{log∗(r lg p)})

and hence

C⋆_poly(r, p) = O(K_Z^{log∗(p^2 lg p)}) = O(K_Z^{log∗ p}), for r ≤ p.    (4.3)

Therefore (4.1) holds in this case.

For r > p, most of the work will be delegated to a subroutine AdmissibleMultiply, which is defined as follows. It takes as input an integer t ≥ 1, a prime p, a p-admissible length N, and polynomials U_1, . . . , U_t, V ∈ F_p[X]/(X^N − 1), and computes the products U_1 V, . . . , U_t V. In other words, it has the same interface as PolynomialMultiply, but it only works for p-admissible lengths. We denote its running time by C_ad(t, N, p). As above we also define the normalisation

C⋆_ad(N, p) := sup_{t ≥ 1} C_ad(t, N, p) / ((2t + 1) N lg p lg(N lg p)).

The reduction from PolynomialMultiply to AdmissibleMultiply in the case r > p is given by the following proposition.

Proposition 4.2.
There is an absolute constant z > 0 with the following property. For any prime p and any integer r > max(z, p), there exists a p-admissible length N in the interval

2r < N < (1 + O(1)/lg r) · 2r    (4.4)

such that

C⋆_poly(r, p) < (2 + O(1)/lg r) C⋆_ad(N, p) + O(1).    (4.5)

Proof. Given as input U_1, . . . , U_t, V ∈ F_p[X]/(X^r − 1), we wish to compute the products U_1 V, . . . , U_t V. For sufficiently large r we may apply Proposition 3.4 with n := 2r to find a p-admissible length N such that (4.4) holds. Since N > 2r, we may simply zero-pad to reduce each problem to multiplication in F_p[X]/(X^N − 1). Therefore

C_poly(t, r, p) < C_ad(t, N, p) + O(tr lg p) + 2^{O((lg lg r)^2)},

where the tr lg p term arises from the reduction modulo X^r − 1, and the last term from Proposition 3.4. Dividing by (2t + 1) r lg p lg(r lg p) and taking suprema over t ≥ 1, we find that

C⋆_poly(r, p) < (N lg(N lg p)) / (r lg(r lg p)) · C⋆_ad(N, p) + O(1).

Finally, since lg(N lg p) ≤ lg(r lg p) + 2 we obtain

(N lg(N lg p)) / (r lg(r lg p)) < 2 (1 + O(1)/lg r)(1 + 2/lg(r lg p)) < 2 + O(1)/lg r. □

The motivation for defining admissible lengths is the following result, which shows how to implement AdmissibleMultiply in terms of a large collection of exponentially smaller instances of PolynomialMultiply.
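The Kronecker substitution used for the base case r ≤ p converts one polynomial product into one integer product by packing coefficients into bit fields. A minimal illustrative sketch (our own toy code, ignoring the Turing-model data layout; Python's built-in big integers stand in for the fast integer multiplication M(n)):

```python
# Kronecker substitution sketch: multiply f, g in F_p[X] by packing their
# coefficients into an integer at X = 2^b and doing one integer product.
# Each coefficient of the integer product is < r * (p-1)^2, so choosing b
# with 2^b above that bound prevents adjacent fields from overlapping.

def kronecker_multiply(f, g, p):
    r = max(len(f), len(g))
    b = (r * (p - 1) ** 2).bit_length()       # enough bits per packed field
    pack = lambda h: sum(c << (b * i) for i, c in enumerate(h))
    w = pack(f) * pack(g)                      # single large integer product
    out = []
    while w:
        out.append((w & ((1 << b) - 1)) % p)   # unpack field, reduce mod p
        w >>= b
    return out

# degree-2 example over F_7: (1 + 2X + 3X^2)(4 + 5X) = 4 + 13X + 22X^2 + 15X^3
assert kronecker_multiply([1, 2, 3], [4, 5], 7) == [4, 6, 1, 1]
```

The packed integers have O(r lg p) bits, matching the problem size O(r lg p) quoted in the bound (4.3).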
Proposition 4.3.
There is an absolute constant z > with the following property.Let p be a prime and let N > z be a p -admissible length. Then there exist integers r , . . . , r d in the interval (lg lg N ) < r i < (lg lg N ) , (4.6) and weights γ , . . . , γ d > with P i γ i = 1 , such that C ⋆ ad ( N, p ) < (cid:18) O (1)lg lg N (cid:19) d X i =1 γ i C ⋆ poly ( r i , p ) + O (1) . (4.7) Proof.
We are given as input a prime p , a p -admissible length N = q · · · q e andpolynomials U , . . . , U t , V ∈ F p [ X ] / ( X N − U V, . . . , U t V . We will describe a series of reductions that converts this problemto a collection of exponentially smaller multiplication problems, plus overhead of O ( tN lg N lg p ) bit operations incurred during the reductions. Step 1 — reduce to products over cyclotomic coefficient ring.
Invoking Propo-sition 3.8, we compute a p -admissible divisor α of N , the cyclotomic polynomial φ α ∈ F p [ Y ], and a principal ( q · · · q e )-th root of unity ω ∈ F p [ Y ] /φ α . As p < N ,this requires at most 2 O ((lg lg N ) ) p o (1) < N / o (1) bit operations.Set ψ α := ( Y α − /φ α ∈ F p [ Y ]. Since Y α − F p [ Y ], we have ( φ α , ψ α ) = 1. Using the Euclidean algorithm, compute polynomials χ , χ ∈ F p [ Y ] of degree at most α such that χ φ α + χ ψ α = 1; this costs at most( α lg p ) O (1) < N o (1) bit operations.Let m := N/α . As m and α are coprime, Lemma 2.2 provides an isomorphism F p [ X ] / ( X N − ∼ = F p [ Y, Z ] / ( Y α − , Z m − O ( mα lg α lg p ) bit operations. By (3.4)this simplifies to O ( N (lg lg N ) lg p ) = O ( N lg N lg p ) bit operations. Next, since( φ α , ψ α ) = 1, there is an isomorphism F p [ Y ] / ( Y α − ∼ = ( F p [ Y ] /φ α ) ⊕ ( F p [ Y ] /ψ α ) . Using the precomputed polynomials χ and χ , we may evaluate the above isomor-phism in either direction in O (( α lg α lg lg α )(lg p lg lg p lg lg lg p )) = O ( α (lg lg N ) (lg lg lg N ) lg p )= O ( α lg N lg p )bit operations (here we have again used (3.4) and the fact that p < N ). Thisisomorphism induces another isomorphism F p [ Y ] / ( Y α − , Z m − ∼ = ( F p [ Y ] /φ α )[ Z ] / ( Z m − ⊕ ( F p [ Y ] /ψ α )[ Z ] / ( Z m − Z i separately; it may be evaluated in eitherdirection in O ( mα lg N lg p ) = O ( N lg N lg p ) bit operations. Chaining these iso-morphisms together, we obtain an isomorphism F p [ X ] / ( X N − ∼ = ( F p [ Y ] /φ α )[ Z ] / ( Z m − ⊕ ( F p [ Y ] /ψ α )[ Z ] / ( Z m − O ( N lg N lg p ) bit operations. We now use the following algorithm. First, at a cost of O ( tN lg N lg p ) bitoperations, apply the above isomorphism to U , . . . , U t and V to obtain polynomials U ′ , . . . , U ′ t , V ′ ∈ ( F p [ Y ] /φ α )[ Z ] / ( Z m − , ˜ U ′ , . . . , ˜ U ′ t , ˜ V ′ ∈ ( F p [ Y ] /ψ α )[ Z ] / ( Z m − . Second, compute the products ˜ U ′ ˜ V ′ , . . . 
, ˜ U ′ t ˜ V ′ : since deg ψ α < α/ lg N by (3.5),each of these products may be converted, via Kronecker substitution, to a productof univariate polynomials in F p [ X ] of degree O ( mα/ lg N ) = O ( N/ lg N ) (i.e., map Y to X and Z to X ψ α ). The cost of these multiplications is O ( t (( N/ lg N ) lg N lg lg N )(lg p lg lg p lg lg lg p )) = O ( tN lg N lg p )bit operations. Third, compute the products U ′ V ′ , . . . , U ′ t V ′ , using the methodexplained in Step 2 below. Finally, at a cost of O ( tN lg N lg p ) bit operations, applythe inverse isomorphism to the pairs ( U ′ s V ′ , ˜ U ′ s ˜ V ′ ) to obtain the desired products U V, . . . , U t V . Step 2 — convert to multidimensional convolutions.
Let R := F p [ Y ] /φ α . In thisstep our goal is to compute the products U ′ V ′ , . . . , U ′ t V ′ , where U ′ , . . . , U ′ t , V ′ ∈R [ Z ] / ( Z m − m d × · · · × m , for a suitable decomposition m = m · · · m d . Forthe subsequent complexity analysis, it is important that the m i are chosen to besomewhat larger than the coefficient size. To achieve this we proceed as follows.Let m = ℓ · · · ℓ u be the prime factorisation of m . The ℓ j form a subset of { q , . . . , q e } , so by (3.1) we have(lg N ) < ℓ j < (lg lg N ) (4.8)for each j . Let w := ⌊ (lg lg N ) ⌋ . We certainly have u > w for large enough N ,as (4.8) and (3.4) imply that u > log m (lg lg N ) = log N − log α (lg lg N ) > log N − (lg lg N ) (lg lg N ) ≫ (lg lg N ) . Therefore we may take m := ℓ · · · ℓ w ,m := ℓ w +1 · · · ℓ w , · · · m d − := ℓ ( d − w +1 · · · ℓ ( d − w ,m d := ℓ ( d − w +1 · · · ℓ dw ℓ dw +1 · · · ℓ u , where d := ⌊ u/w ⌋ >
1. Each m i is a product of exactly w primes, except possi-bly m d , which is a product of at least w and at most 2 w − N we have m i < (2 (lg lg N ) ) w (lg lg N ) (4.9)and m i > ((lg N ) ) w > (2 lg lg N − ) w > (lg lg N ) (4.10)for all i , and hence d log m (lg lg N ) lg N (lg lg N ) . (4.11) ASTER INTEGER AND POLYNOMIAL MULTIPLICATION 19
Computing the decomposition m = m · · · m d requires no more than (lg N ) O (1) bitoperations.As the m i are pairwise relatively prime, Corollary 2.3 furnishes an isomorphism R [ Z ] / ( Z m − ∼ = R [ Z , . . . , Z d ] / ( Z m − , . . . , Z m d d − O (( m lg m )( α lg p )) = O ( N lg N lg p )bit operations. Therefore we may use the following algorithm. First, at a cost of O ( tN lg N lg p ) bit operations, compute the images U ′′ , . . . , U ′′ t , V ′′ ∈ R [ Z , . . . , Z d ] / ( Z m − , . . . , Z m d d − U ′ , . . . , U ′ t , V ′ under the above isomorphism. Next, as explained in Step 3 below,compute the products U ′′ V ′′ , . . . , U ′′ t V ′′ . Finally, apply the inverse isomorphism torecover the products U ′ V ′ , . . . , U ′ t V ′ ; again this costs O ( tN lg N lg p ) bit operations. Step 3 — reduce to DFTs over R . In this step our goal is to compute the prod-ucts U ′′ V ′′ , . . . , U ′′ t V ′′ , where U ′′ , . . . , U ′′ t and V ′′ are as above. Let ω i := ω q ··· q e /m i for i = 1 , . . . , d , where ω is the principal ( q · · · q e )-th root of unity in R computedin Step 1. According to the discussion in Section 2.2, the desired multidimensionalconvolutions may be computed by performing t +1 multidimensional m -point DFTswith respect to the evaluation points ( ω j , . . . , ω j d d ), followed by tm pointwise multi-plications in R , and then t multidimensional m -point inverse DFTs and tm divisionsby m . The total cost of the pointwise multiplications and divisions is O ( tm ( α lg α lg lg α )(lg p lg lg p lg lg lg p )) = O ( tN lg N lg p )bit operations.Each of the 2 t + 1 multidimensional DFTs may be converted to a collectionof one-dimensional DFTs of lengths m , . . . , m d by the method explained in Sec-tion 2.2. Note that the inputs must be rearranged so that the data to transformalong each dimension may be accessed sequentially. Let 1 i d , and consider thetransforms of length m i . 
Treating each input vector as a sequence of m i +1 · · · m d arrays of size m i × ( m · · · m i − ), we must transpose each array into an array of size( m · · · m i − ) × m i , perform m/m i DFTs of length m i , and then transpose back tothe original ordering. The total cost of all these transpositions is O ( tmα lg p P i lg m i ) = O ( tN lg p lg m ) = O ( tN lg N lg p )bit operations.The one-dimensional DFTs over R are handled by the Transform subroutine.Combining the contributions from Steps 1, 2 and 3 shows that C ad ( t, N, p ) < (2 t + 1) d X i =1 T (cid:16) mm i , m i , α, p (cid:17) + O ( tN lg N lg p ) . This concludes the description of the algorithm; it remains to establish the overallcomplexity claim. First, Proposition 4.1 yields d X i =1 T (cid:16) mm i , m i , α, p (cid:17) < d X i =1 C poly (cid:16) mm i , m i α, p (cid:17) + O ( dmα lg α lg lg α lg p lg lg p lg lg lg p ) . By (4.11), the last term lies in O ( dN (lg lg N ) (lg lg lg N ) lg p ) = O ( N lg N lg p ) . Setting r i := m i α for i = 1 , . . . , d , we obtain C ad ( t, N, p ) < (2 t + 1) d X i =1 C poly (cid:16) Nr i , r i , p (cid:17) + O ( tN lg N lg p ) . Notice that (4.6) follows immediately from (4.9), (4.10) and (3.4) (for large N ).For the normalised quantities, we have C ⋆ ad ( N, p ) < d X i =1 C poly (cid:0) Nr i , r i , p (cid:1) N lg p lg( N lg p ) + O (1) < d X i =1 (cid:16) Nr i + 1 (cid:17) r i lg( r i lg p ) N lg( N lg p ) C ⋆ poly ( r i , p ) + O (1) . Now observe thatlg( r i lg p )lg( N lg p ) < log m i + log α + lg lg p + O (1)log N < log m i + O ((lg lg N ) )log m . Put γ i := log m i / log m , so that P i γ i = 1. Then (4.10) implies thatlg( r i lg p )lg( N lg p ) < (cid:18) O ((lg lg N ) )log m i (cid:19) γ i < (cid:18) O (1)lg lg N (cid:19) γ i . Moreover, from (4.6) we certainly have (cid:16) Nr i + 1 (cid:17) r i N = 2 + r i N < O (1)lg lg N .
The desired bound (4.7) follows immediately. □
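Step 2 of the preceding proof rests on the coprime-factor isomorphism of Corollary 2.3, which converts a cyclic convolution of length m = m_1 · · · m_d, with the m_i pairwise coprime, into a d-dimensional cyclic convolution. The index trick can be sketched for d = 2 with naive convolutions (our own illustrative code, not the paper's in-place Turing-machine implementation):

```python
# CRT index map behind Step 2: for coprime m1, m2, sending k to
# (k mod m1, k mod m2) turns a length m = m1*m2 cyclic convolution into an
# m1 x m2 two-dimensional cyclic convolution (the Agarwal-Cooley technique).

def to_grid(a, m1, m2):
    grid = [[0] * m2 for _ in range(m1)]
    for k in range(m1 * m2):
        grid[k % m1][k % m2] = a[k]           # CRT gives a bijection
    return grid

def from_grid(grid, m1, m2):
    return [grid[k % m1][k % m2] for k in range(m1 * m2)]

def cyclic(a, b, m):
    out = [0] * m
    for i in range(m):
        for j in range(m):
            out[(i + j) % m] += a[i] * b[j]
    return out

def cyclic2d(A, B, m1, m2):
    out = [[0] * m2 for _ in range(m1)]
    for i1 in range(m1):
        for j1 in range(m1):
            for i2 in range(m2):
                for j2 in range(m2):
                    out[(i1 + j1) % m1][(i2 + j2) % m2] += A[i1][i2] * B[j1][j2]
    return out

m1, m2 = 3, 5
a = list(range(1, 16))
b = [k * k % 7 for k in range(15)]
lhs = cyclic(a, b, 15)
rhs = from_grid(cyclic2d(to_grid(a, m1, m2), to_grid(b, m1, m2), m1, m2), m1, m2)
assert lhs == rhs
```

Because index addition commutes with the CRT map, the one-dimensional wraparound modulo m becomes independent wraparounds modulo each m_i, which is what allows the multidimensional DFTs of Step 3.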
Combining Proposition 4.2 and Proposition 4.3, we obtain the following recurrence inequality for C⋆_poly(r, p). (This is identical to Theorem 7.1 of [18], but with the constant 8 replaced by 4.)

Proposition 4.4.
There are absolute constants z, C_1, C_2 > 0 and a logarithmically slow function Φ : (z, ∞) → R with the following property. For any prime p and any integer r > max(z, p), there exist positive integers r_1, . . . , r_d < Φ(r), and weights γ_1, . . . , γ_d > 0 with Σ_i γ_i = 1, such that

C⋆_poly(r, p) < (4 + C_1/lg lg r) Σ_{i=1}^{d} γ_i C⋆_poly(r_i, p) + C_2.    (4.12)

Proof.
We first apply Proposition 4.2 to construct a p -admissible length N suchthat (4.4) and (4.5) both hold; then we apply Proposition 4.3 to construct inte-gers r , . . . , r d and weights γ , . . . , γ d satisfying (4.6) and (4.7). Define Φ( x ) :=2 (log log x ) ; then certainly r i < (lg lg 3 r ) < Φ( r ) for large r . The bound (4.12)follows immediately by substituting (4.7) into (4.5). (cid:3) Now we may prove our main result for multiplication in F p [ X ]. The proof is verysimilar to that of [18, Thm. 1.1]. Proof of Theorem 1.2.
We have already noted that C⋆_poly(r, p) = O(K_Z^{log∗ p}) in the region r ≤ p (see (4.3)). To handle the case r > p, let z, C_1, C_2 and Φ(x) be as in Proposition 4.4. Increasing z if necessary, we may assume that z > exp(exp(1)) and that Φ(x) ≤ x − 1 for all x > z. For each prime p, set σ_p := max(z, p) and

L_p := max(C_2, max_{r ≤ σ_p} C⋆_poly(r, p)) = O(K_Z^{log∗ p}).

Now apply Proposition 2.1 with K = 4, B = C_1/4, S = {1, 2, . . .}, ℓ = 2, κ = 1, x_0 = x_1 = z, σ = σ_p, L = L_p, and T(r) = C⋆_poly(r, p). The first part of the recurrence for T(y) is satisfied due to the definition of L_p, and the second part due to Proposition 4.4. We conclude that C⋆_poly(r, p) = O(L_p 4^{log∗ r − log∗ σ_p}) for r > p. Since log∗ σ_p = log∗ p + O(1) and L_p = O(K_Z^{log∗ p}), we obtain the desired bound

C⋆_poly(r, p) = O(4^{log∗ r − log∗ p} K_Z^{log∗ p}) for r > p. □

5. Faster integer multiplication
The goal of this section is to prove Theorem 1.1. We will describe a recursive routine IntegerMultiply, that takes as input positive integers n and t, and integers u_1, . . . , u_t, v ∈ Z/(2^n − 1)Z, and computes the products u_1 v, . . . , u_t v. We denote its running time by C_int(t, n). As in Section 4, it is convenient to define the normalisation

C⋆_int(n) := sup_{t ≥ 1} C_int(t, n) / ((2t + 1) n lg n).

We certainly have M(n) < C_int(1, n) + O(n), so to prove Theorem 1.1 it is enough to prove that

C⋆_int(n) = O((4√2)^{log∗ n}).    (5.1)

We begin by revisiting the polynomial multiplication algorithm from Section 4. Recall that to handle a multiplication problem in F_p[X]/(X^r − 1) for r ≤ p, we used Kronecker substitution to convert it to an integer multiplication problem of size O(r lg p) (see (4.3)). This approach is suboptimal because it ignores the cyclic structure of F_p[X]/(X^r − 1). To take advantage of this structure, we introduce two new routines RefinedPolynomialMultiply and RefinedAdmissibleMultiply. They have exactly the same interface as PolynomialMultiply and AdmissibleMultiply. Their running times are denoted by C̃_poly(t, r, p) and C̃_ad(t, N, p), with corresponding normalisations C̃⋆_poly(r, p) and C̃⋆_ad(N, p). The implementation of RefinedAdmissibleMultiply is exactly the same as AdmissibleMultiply, except that calls to PolynomialMultiply are replaced by calls to RefinedPolynomialMultiply. Similarly, the implementation of RefinedPolynomialMultiply for r > p is exactly the same as PolynomialMultiply, except that calls to AdmissibleMultiply are replaced by calls to RefinedAdmissibleMultiply. Therefore these routines satisfy analogues of Proposition 4.2 and Proposition 4.3, with C⋆_poly and C⋆_ad replaced by C̃⋆_poly and C̃⋆_ad.

Where the new routines differ is in the implementation of RefinedPolynomialMultiply for the case r ≤ p, which is described in the proof of the following result. The idea is to exploit the cyclic structure by using IntegerMultiply to handle the resulting (cyclic) integer multiplication. This device saves a constant factor at each recursion level of the main algorithm.
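The packing idea can be sketched as follows: unlike plain Kronecker substitution, the relation X^r = 1 corresponds exactly to 2^{rb} = 1 when each coefficient occupies b bits, so the product in F_p[X]/(X^r − 1) maps to a single cyclic integer product modulo 2^{rb} − 1. This is our own toy illustration (small parameters; Python's `%` on big integers stands in for IntegerMultiply):

```python
# Cyclic packing sketch: a product in F_p[X]/(X^r - 1) becomes one integer
# product modulo 2^(rb) - 1, since reduction modulo 2^(rb) - 1 wraps the
# high packed fields around exactly as X^r = 1 wraps high powers of X.

def cyclic_mul_via_integers(u, v, p):
    r = len(u)
    b = (r * (p - 1) ** 2).bit_length()        # room for wrapped coefficient sums
    n = r * b
    pack = lambda h: sum(c << (b * i) for i, c in enumerate(h))
    w = (pack(u) * pack(v)) % (2 ** n - 1)     # cyclic integer multiplication
    return [((w >> (b * i)) & ((1 << b) - 1)) % p for i in range(r)]

def direct(u, v, p):
    # reference: schoolbook multiplication modulo X^r - 1
    r = len(u)
    out = [0] * r
    for i in range(r):
        for j in range(r):
            out[(i + j) % r] = (out[(i + j) % r] + u[i] * v[j]) % p
    return out

u, v, p = [3, 1, 4, 1], [2, 7, 1, 8], 11
assert cyclic_mul_via_integers(u, v, p) == direct(u, v, p)
```

The point of the cyclic packing is that no zero-padding to degree 2r is needed: the integer has only rb bits rather than the roughly 2rb bits a non-cyclic Kronecker product would require, which is where the saved constant factor comes from.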
Proposition 5.1.
For any prime p, and for any positive integer r satisfying (lg lg p)^2 < lg r < (lg p)^{1/2}, there exists an integer n in the interval

2r lg p < n < (1 + O(1)/lg r) · 2r lg p    (5.2)

such that

C̃⋆_poly(r, p) < (2 + O(1)/lg r) C⋆_int(n) + O(1).

(Note that this bound does not hold over the whole range r ≤ p; to obtain the constant 2, we need to restrict to a smaller range of r.)

Proof. We are given as input U_1, . . . , U_t, V ∈ F_p[X]/(X^r − 1), and we wish to compute U_1 V, . . . , U_t V. We use the following algorithm. Lift the inputs to polynomials U′_1, . . . , U′_t, V′ ∈ Z[X]/(X^r − 1), with coefficients in the interval 0 ≤ x < p. Evaluate these polynomials at X = 2^b, where b := 2 lg p + lg r; that is, pack the coefficients together to obtain integers u_s := U′_s(2^b) and v := V′(2^b) in Z/(2^{rb} − 1)Z. Call IntegerMultiply with n := rb to compute the cyclic integer products w_s := u_s v. Then we have w_s = W′_s(2^b), where W′_s := U′_s V′ ∈ Z[X]/(X^r − 1). The coefficients of W′_s lie in the interval 0 ≤ x ≤ r(p − 1)^2 < rp^2, and since 2^b ≥ rp^2, we may unpack w_s to recover the coefficients of W′_s unambiguously. Finally, by reducing the coefficients of W′_s modulo p, we arrive at the desired products W_s ∈ F_p[X]/(X^r − 1).

Since n = 2r lg p + r lg r, the bound (5.2) follows by taking into account the hypothesis that lg r < (lg p)^{1/2}. For the complexity we have

C̃_poly(t, r, p) < C_int(t, n) + O(tr lg p lg lg p lg lg lg p),

where the last term covers the divisions by p at the end of the algorithm (and also the linear-time packing and unpacking steps). Dividing by (2t + 1) r lg p lg(r lg p) and taking suprema over t ≥ 1, we obtain

C̃⋆_poly(r, p) < (n lg n) / (r lg p lg(r lg p)) · C⋆_int(n) + O(lg lg p lg lg lg p / lg(r lg p)).

The last term lies in O(1) thanks to the assumption lg r > (lg lg p)^2. Moreover, (5.2) implies that lg n ≤ lg(r lg p) + 2, so we find that

(n lg n) / (r lg p lg(r lg p)) < 2 (1 + O(1)/lg r)(1 + 2/lg(r lg p)) < 2 + O(1)/lg r. □

Now we describe the implementation of
IntegerMultiply . It chooses one oftwo algorithms, depending on the size of n . For small n , it calls any convenientbasecase multiplication algorithm, such as the Sch¨onhage–Strassen algorithm. Forlarge n , it uses the algorithm described in the proof of Proposition 5.2 below.This algorithm reduces the problem to a collection of instances of RefinedAd-missibleMultiply , one for each prime p ∈ P ( n ), where P ( n ) is defined to bethe set consisting of the smallest lg n primes that exceed (lg n ) (we will see inthe proof below that these primes satisfy lg p = 2 lg lg n + O (1)). For example, P (10 ) = { , , . . . , } (the first 14 primes after 144.5). ASTER INTEGER AND POLYNOMIAL MULTIPLICATION 23
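One ingredient of the proof below is the element θ ∈ Z/PZ with θ^N = 2 required by the Crandall–Fagin reduction; its construction via the Chinese remainder theorem can be sketched with toy parameters. The values N = 7 and primes {5, 7, 11} are our own stand-ins, chosen only so that gcd(N, p − 1) = 1 for each p, as guaranteed at scale by the fact that N is a product of primes exceeding every p ∈ P(n):

```python
# Toy illustration of constructing theta in Z/PZ with theta^N = 2:
# per prime p, set a_p := N^{-1} mod (p - 1) and theta_p := 2^{a_p} mod p,
# so (theta_p)^N = 2 (mod p); then glue the theta_p by the CRT.

from math import prod

def build_theta(N, primes):
    P = prod(primes)
    theta = 0
    for p in primes:
        a_p = pow(N, -1, p - 1)       # exists since gcd(N, p - 1) = 1
        theta_p = pow(2, a_p, p)      # an N-th root of 2 modulo p
        M = P // p
        theta = (theta + theta_p * M * pow(M, -1, p)) % P   # CRT gluing
    return theta

primes = [5, 7, 11]                   # toy stand-in for P(n)
N = 7                                 # coprime to 4, 6 and 10
theta = build_theta(N, primes)
assert pow(theta, N, prod(primes)) == 2
```

This mirrors the computation in Step 2 of the proof, where everything is done in (lg n)^{O(1)} bit operations because N, P and the primes are all tiny compared with n.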
Proposition 5.2.
There is an absolute constant z > with the following property.For all n > z , there exists an admissible length N in the interval n lg n lg lg n < N < (cid:18) n (cid:19) n lg n lg lg n , (5.3) such that N is p -admissible for all p ∈ P ( n ) , and such that C ⋆ int ( n ) < (cid:18) O (1)lg lg n (cid:19) X p ∈P ( n ) n ˜ C ⋆ ad ( N, p ) + O (1) . (5.4) Proof.
We are given as input u , . . . , u t , v ∈ Z / (2 n − Z , and we wish to computethe products u v, . . . , u t v . Step 1 — choose parameters.
In this preliminary step we compute a number ofparameters that depend only on n .The prime number theorem (see for example [3, p. 9]) implies that the numberof primes between (lg n ) and (lg n ) is asymptotically(lg n ) n ) ) ≫ lg n. Therefore, for large n we certainly have (lg n ) < p < (lg n ) (5.5)for all p ∈ P ( n ). Define P := Q p ∈P ( n ) p ; then(2 log lg n −
1) lg n log P (2 log lg n ) lg n, so 2 lg n lg lg n − n lg P n lg lg n. (5.6)Clearly we may compute P ( n ) and P within (lg n ) O (1) bit operations.Let n ′ := (cid:24) n lg P − lg n − (cid:25) ; (5.7)this makes sense for large n , as lg P ≫ lg n . Using Proposition 3.4 (with p = 2),construct an admissible length N in the interval n ′ < N < (cid:18) n ′ (cid:19) n ′ . The invocation of Proposition 3.4 costs 2 O ((lg lg n ′ ) ) = O ( n ) bit operations. Let uscheck that (5.3) holds for this choice of N . In one direction, by (5.6) we have N > n ′ > n lg P − lg n − > n lg P > n lg n lg lg n . (5.8) For the other direction we have
N < (cid:18) n ′ (cid:19) (cid:18) n n lg lg n − n − (cid:19) = (cid:18) n ′ (cid:19) (cid:18) n + 32 lg n lg lg n − n − n lg lg nn (cid:19) n lg n lg lg n< (cid:18) n ′ (cid:19) (cid:18) o (1)lg lg n (cid:19) n lg n lg lg n< (cid:18) n (cid:19) n lg n lg lg n for large n .Finally, let us verify that N is p -admissible for all p ∈ P ( n ) (for large n ). First,by (5.8) and (5.5) we have N > (lg n ) > p . Also, by (3.1) and (5.8), every primedivisor of N = q · · · q e satisfies q i > (lg N ) > (lg n ) > p ; in particular, p ∤ N . Step 2 — convert to polynomial product modulo P . In this step we applythe Crandall–Fagin algorithm from Section 2.3. For this, we require that lg
P > ⌈ n/N ⌉ + lg N + 1; this follows from (5.7) aslg P > nn ′ + lg n + 3 > nN + lg n + 3 > l nN m + lg N + 1 . We also require an element θ ∈ Z /P Z such that θ N = 2. To construct θ , we firstcompute a p := N − (mod p −
1) for each p ∈ P ( n ). This modular inverse existsbecause N is a product of primes that are all greater than p , and hence relativelyprime to p −
1. Then we put θ p := 2 a p (mod p ), so that ( θ p ) N = 2 (mod p ). Usingthe Chinese remainder theorem, we compute θ ∈ Z /P Z such that θ = θ p (mod p )for all p ∈ P ( n ); then θ N = 2 as desired. All of this can be effected in (lg n ) O (1) bitoperations.According to Section 2.3, the problem of computing u v, . . . , u t v reduces to com-puting U V, . . . , U t V for certain polynomials U , . . . , U t , V ∈ ( Z /P Z )[ X ] / ( X N − O ( t ( N lg P + N (lg n ) + N lg P lg lg P lg lg lg P )) = O ( tn lg n/ lg lg n )bit operations. Step 3 — reduce to products modulo small primes.
In this step we convert each multiplication problem in (Z/PZ)[X]/(X^N − 1) into a collection of products in F_p[X]/(X^N − 1), one for each p ∈ P(n). We start with the isomorphism Z/PZ ≅ ⊕_{p∈P(n)} F_p, which may be computed in either direction in O(lg P (lg lg P)^2 lg lg lg P) bit operations using fast simultaneous modular reduction and fast Chinese remaindering algorithms [27, §10.3]. Applying this isomorphism coefficientwise yields an isomorphism

(Z/PZ)[X]/(X^N − 1) ≅ ⊕_{p∈P(n)} F_p[X]/(X^N − 1),

which may be computed in either direction, for all s = 1, . . . , t, in

O(tN lg P (lg lg P)^2 lg lg lg P) = O(tn (lg lg n)^2 lg lg lg n)

bit operations. Note that the isomorphism Z/PZ ≅ ⊕_p F_p must be applied to the coefficient of each X^i independently, but the subroutine for multiplying in F_p[X]/(X^N − 1) needs sequential access to all of the residues for a single prime p. The required data rearrangement corresponds to transposing a tN × |P(n)| array, which costs only O(tN |P(n)| lg |P(n)| max_p lg p) = O(tn lg lg n) bit operations. Finally, for each p ∈ P(n), the products in F_p[X]/(X^N − 1) may be computed by calling
RefinedAdmissibleMultiply , since N is a p -admissible length.Combining the contributions from Steps 1, 2 and 3, we obtain C int ( t, n ) < X p ∈P ( n ) ˜ C ad ( t, N, p ) + O ( tn lg n/ lg lg n ) . Dividing by (2 t + 1) n lg n and taking suprema over t > C ⋆ int ( n ) < X p ∈P ( n ) ˜ C ⋆ ad ( N, p ) N lg p lg( N lg p ) n lg n + O (1) . By (5.5) we have lg p n , so (5.3) implies that N lg p < (cid:18) O (1)lg lg n (cid:19) n lg n , and also lg( N lg p ) lg n , for large n . The bound (5.4) follows immediately. (cid:3) We may now glue together the various pieces to obtain a doubly-exponentialrecurrence for C ⋆ int ( n ). Proposition 5.3.
There are absolute constants z_2 > z_1 > 2 and C_1, C_2 > 0, and a logarithmically slow function Ψ : (z_1, ∞) → R with Ψ(x) > z_1 for x > z_2, with the following property. For any n > z_2, there exist positive integers n_1, …, n_d < Ψ(Ψ(n)), and weights γ_1, …, γ_d > 0 with Σ_i γ_i = 1, such that

    C⋆_int(n) < (32 + C_1/lg lg lg n) Σ_{i=1}^{d} γ_i C⋆_int(n_i) + C_2.    (5.9)

Proof.
Step 1 — top-level call to IntegerMultiply. Applying Proposition 5.2, we obtain an admissible length N in the interval

    n / (2 lg n lg lg n) < N < (1 + O(1)/lg lg n) · n / (lg n lg lg n),    (5.10)

such that N is p-admissible for all p ∈ P(n), and such that

    C⋆_int(n) < (2 + O(1)/lg lg n) Σ_{p∈P(n)} (1/lg n) ˜C⋆_ad(N, p) + O(1).    (5.11)

In what follows, we frequently use the estimates lg N = lg n + O(lg lg n) and lg lg N = lg lg n + O(1), which follow from (5.10). Also, from (5.5) we have lg p = 2 lg lg n + O(1) for all p ∈ P(n).

Step 2 — first call to RefinedAdmissibleMultiply. In this step we use (the refined analogue of) Proposition 4.3 to estimate the ˜C⋆_ad(N, p) term in (5.11), for a fixed p ∈ P(n). We obtain integers r_{p,1}, …, r_{p,d_p} such that

    2^{(lg lg N)^2} < r_{p,i} < 2^{2 (lg lg N)^2},    (5.12)

and weights γ_{p,1}, …, γ_{p,d_p} > 0 with Σ_i γ_{p,i} = 1, such that

    ˜C⋆_ad(N, p) < (2 + O(1)/lg lg N) Σ_{i=1}^{d_p} γ_{p,i} ˜C⋆_poly(r_{p,i}, p) + O(1).

Substituting into (5.11), and using lg lg N = lg lg n + O(1), yields

    C⋆_int(n) < (4 + O(1)/lg lg n) Σ_{p∈P(n)} Σ_{i=1}^{d_p} (γ_{p,i}/lg n) ˜C⋆_poly(r_{p,i}, p) + O(1).    (5.13)

Step 3 — first call to RefinedPolynomialMultiply. In this step we use (the refined analogue of) Proposition 4.2 to estimate the ˜C⋆_poly(r_{p,i}, p) term in (5.13), for a fixed p ∈ P(n) and i ∈ {1, …, d_p}. The precondition r_{p,i} > max(z_0, p^2) (with z_0 the absolute constant from Proposition 4.2) holds for large n, because by (5.12) we have

    lg(p^2) = 4 lg lg n + O(1) < (lg lg N)^2 < lg r_{p,i}.

Thus there exists a p-admissible length N_{p,i} in the interval

    2 r_{p,i} < N_{p,i} < (1 + 1/lg r_{p,i}) · 2 r_{p,i}    (5.14)

such that

    ˜C⋆_poly(r_{p,i}, p) < (2 + O(1)/lg lg n) ˜C⋆_ad(N_{p,i}, p) + O(1).

Substituting into (5.13) yields

    C⋆_int(n) < (8 + O(1)/lg lg n) Σ_{p∈P(n)} Σ_{i=1}^{d_p} (γ_{p,i}/lg n) ˜C⋆_ad(N_{p,i}, p) + O(1).    (5.15)

Step 4 — second call to RefinedAdmissibleMultiply. In this step we use Proposition 4.3 again, to estimate the ˜C⋆_ad(N_{p,i}, p) term in (5.15), for a fixed p ∈ P(n) and i ∈ {1, …, d_p}. We obtain integers r_{p,i,1}, …, r_{p,i,d_{p,i}} such that

    2^{(lg lg N_{p,i})^2} < r_{p,i,j} < 2^{2 (lg lg N_{p,i})^2},    (5.16)

and weights γ_{p,i,1}, …, γ_{p,i,d_{p,i}} > 0 with Σ_j γ_{p,i,j} = 1, such that

    ˜C⋆_ad(N_{p,i}, p) < (2 + O(1)/lg lg N_{p,i}) Σ_{j=1}^{d_{p,i}} γ_{p,i,j} ˜C⋆_poly(r_{p,i,j}, p) + O(1).

We have lg lg N_{p,i} ≥ lg lg r_{p,i} ≥ 2 lg lg lg n + O(1), so substituting into (5.15) yields

    C⋆_int(n) < (16 + O(1)/lg lg lg n) Σ_{p∈P(n)} Σ_{i=1}^{d_p} Σ_{j=1}^{d_{p,i}} (γ_{p,i} γ_{p,i,j}/lg n) ˜C⋆_poly(r_{p,i,j}, p) + O(1).    (5.17)

Step 5 — second call to RefinedPolynomialMultiply. In this step we use Proposition 5.1 to estimate the ˜C⋆_poly(r_{p,i,j}, p) term in (5.17), for a fixed p ∈ P(n), i ∈ {1, …, d_p} and j ∈ {1, …, d_{p,i}}. The precondition

    (lg lg p)^2 < lg r_{p,i,j} < (lg p)/2

holds for large n, as (5.16), (5.14) and (5.12) imply that

    (2 lg lg lg n + O(1))^2 < lg r_{p,i,j} < 2 (2 lg lg lg n + O(1))^2,

whereas (lg lg p)^2 = (lg lg lg n + O(1))^2 and (lg p)/2 = lg lg n + O(1). We thus obtain an integer n_{p,i,j} in the interval

    2 r_{p,i,j} lg p < n_{p,i,j} < (1 + 1/lg r_{p,i,j}) · 2 r_{p,i,j} lg p    (5.18)

such that

    ˜C⋆_poly(r_{p,i,j}, p) < (2 + O(1)/lg lg lg n) C⋆_int(n_{p,i,j}) + O(1).

Substituting into (5.17) produces

    C⋆_int(n) < (32 + O(1)/lg lg lg n) Σ_{p∈P(n)} Σ_{i=1}^{d_p} Σ_{j=1}^{d_{p,i}} (γ_{p,i} γ_{p,i,j}/lg n) C⋆_int(n_{p,i,j}) + O(1).

The weights γ_{p,i} γ_{p,i,j}/lg n sum to 1, so after appropriate reindexing we obtain the desired bound (5.9).

Finally, for the logarithmically slow function Ψ(x) := 2^{(log log x)^3}, let us verify that for large n, we have n_{p,i,j} < Ψ(Ψ(n)) for all p, i and j. First, since lg lg N_{p,i} ≥ 2 lg lg lg n + O(1), we have lg p < 2^{lg lg N_{p,i}} for large n, and hence, by (5.18) and (5.16),

    n_{p,i,j} < 3 r_{p,i,j} lg p < 2^{2 + 2 (lg lg N_{p,i})^2 + lg lg N_{p,i}} < Ψ(N_{p,i}).

Then by (5.14) and (5.12) we have

    N_{p,i} < 3 r_{p,i} < 2^{2 + 2 (lg lg N)^2} < 2^{3 (lg lg n)^2} < Ψ(n).

Since Ψ(x) is increasing, we get the desired inequality n_{p,i,j} < Ψ(Ψ(n)). □

Now we may prove the main theorem for integer multiplication.
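Before giving the formal proof, it may help to see informally how a recurrence of the shape (5.9) leads to the claimed bound; the following unrolling is only a heuristic sketch and not a substitute for the master theorem (Proposition 2.1). Each application of (5.9) costs a multiplicative factor of roughly 32 while shrinking the argument from n to at most Ψ(Ψ(n)), i.e., it descends two levels of the iterated-logarithm hierarchy at once:

```latex
% Heuristic unrolling of (5.9), suppressing the C_1/\lg\lg\lg n and C_2 terms.
% Since \Psi is logarithmically slow, about \tfrac12 \log^* n applications of
% \Psi \circ \Psi reduce n to bounded size, so
\[
   C^{\star}_{\mathrm{int}}(n)
     \;\lesssim\; 32\, C^{\star}_{\mathrm{int}}\bigl(\Psi(\Psi(n))\bigr)
     \;\lesssim\; 32^{2}\, C^{\star}_{\mathrm{int}}\bigl(\Psi^{\circ 4}(n)\bigr)
     \;\lesssim\; \cdots
     \;\lesssim\; 32^{\frac12 \log^* n + O(1)}
     \;=\; O\bigl((4\sqrt{2}\,)^{\log^* n}\bigr),
\]
% using 32^{1/2} = 4\sqrt{2}.
```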
Proof of Theorem 1.1. We have already noted that it suffices to establish that C⋆_int(n) = O((4√2)^{log* n}) (see (5.1)). Let z_1, z_2, C_1, C_2 and Ψ(x) be as in Proposition 5.3. Increasing z_2 if necessary, we may assume that z_2 > exp(exp(exp(1))) and that Ψ(x) < x for all x > z_2. Applying Proposition 2.1 with K = 32, B = C_2, ℓ = 3, κ = 2, thresholds x_1 = z_1 and x_2 = σ = z_2, L = max(C_2, max_{n ≤ z_2} C⋆_int(n)), and T(n) = C⋆_int(n) leads immediately to the desired bound. □

Acknowledgments
The authors thank Grégoire Lecerf for his comments on a draft of this paper. The first author was supported by the Australian Research Council (grants DP150101689 and FT160100219).
References
1. L. M. Adleman, C. Pomerance, and R. S. Rumely, On distinguishing prime numbers from composite numbers, Ann. of Math. (2) (1983), no. 1, 173–206.
2. R. Agarwal and J. Cooley, New algorithms for digital convolution, IEEE Transactions on Acoustics, Speech, and Signal Processing (1977), no. 5, 392–410.
3. T. M. Apostol, Introduction to analytic number theory, Undergraduate Texts in Mathematics, Springer-Verlag, New York-Heidelberg, 1976.
4. R. C. Baker, G. Harman, and J. Pintz, The difference between consecutive primes. II, Proc. London Math. Soc. (3) (2001), no. 3, 532–562.
5. L. I. Bluestein, A linear filtering approach to the computation of discrete Fourier transform, IEEE Transactions on Audio and Electroacoustics (1970), no. 4, 451–455.
6. A. Bostan, P. Gaudry, and É. Schost, Linear recurrences with polynomial coefficients and application to integer factorization and Cartier–Manin operator, SIAM J. Comput. (2007), no. 6, 1777–1806.
7. D. G. Cantor and E. Kaltofen, On fast multiplication of polynomials over arbitrary algebras, Acta Inform. (1991), no. 7, 693–701.
8. S. Covanov and E. Thomé, Fast integer multiplication using generalized Fermat primes, http://arxiv.org/abs/1502.02800, 2016.
9. R. Crandall and B. Fagin, Discrete weighted transforms and large-integer arithmetic, Math. Comp. (1994), no. 205, 305–324.
10. A. De, P. Kurur, C. Saha, and R. Saptharishi, Fast integer multiplication using modular arithmetic, SIAM J. Comput. (2013), no. 2, 685–699.
11. M. Fürer, Faster integer multiplication, STOC'07—Proceedings of the 39th Annual ACM Symposium on Theory of Computing, ACM, New York, 2007, pp. 57–66.
12. ———, Faster integer multiplication, SIAM J. Comput. (2009), no. 3, 979–1005.
13. I. J. Good, The interaction algorithm and practical Fourier analysis, J. Roy. Statist. Soc. Ser. B (1958), 361–372.
14. D. Harvey, Faster truncated integer multiplication, https://arxiv.org/abs/1703.00640, 2017.
15. D. Harvey and J. van der Hoeven, Faster integer multiplication using plain vanilla FFT primes, https://arxiv.org/abs/1611.07144, to appear in Mathematics of Computation, 2016.
16. D. Harvey, J. van der Hoeven, and G. Lecerf, Faster polynomial multiplication over finite fields, technical report, http://arxiv.org/abs/1407.3361, 2014.
17. ———, Even faster integer multiplication, J. Complexity (2016), 1–30.
18. ———, Faster polynomial multiplication over finite fields, J. ACM (2017), no. 6, 52:1–52:23.
19. C. H. Papadimitriou, Computational complexity, Addison-Wesley Publishing Company, Reading, MA, 1994.
20. J. M. Pollard, The fast Fourier transform in a finite field, Math. Comp. (1971), 365–374.
21. K. Prachar, Über die Anzahl der Teiler einer natürlichen Zahl, welche die Form p − 1 haben, Monatsh. Math. (1955), 91–97.
22. A. Schönhage, Schnelle Multiplikation von Polynomen über Körpern der Charakteristik 2, Acta Informat. (1976/77), no. 4, 395–398.
23. A. Schönhage and V. Strassen, Schnelle Multiplikation grosser Zahlen, Computing (Arch. Elektron. Rechnen) (1971), 281–292.
24. V. Shoup, On the deterministic complexity of factoring polynomials over finite fields, Inform. Process. Lett. (1990), no. 5, 261–267.
25. ———, Searching for primitive roots in finite fields, Math. Comp. (1992), no. 197, 369–380.
26. L. H. Thomas, Using computers to solve problems in physics, Applications of digital computers, 1963, pp. 42–57.
27. J. von zur Gathen and J. Gerhard, Modern computer algebra, third ed., Cambridge University Press, Cambridge, 2013.
E-mail address: [email protected]

School of Mathematics and Statistics, University of New South Wales, Sydney NSW 2052, Australia

E-mail address: [email protected]