Fast Computation of the Nth Term of an Algebraic Series over a Finite Prime Field
FFast Computationof the Nth Term of an Algebraic Seriesover a Finite Prime Field ∗ Alin Bostan
Inria (France) [email protected] Gilles Christol
IMJ (France) [email protected] Philippe Dumas
Inria (France) [email protected]
ABSTRACT
We address the question of computing one selected term ofan algebraic power series. In characteristic zero, the bestalgorithm currently known for computing the N th coefficientof an algebraic series uses differential equations and hasarithmetic complexity quasi-linear in √ N . We show thatover a prime field of positive characteristic p , the complexitycan be lowered to O (log N ). The mathematical basis forthis dramatic improvement is a classical theorem statingthat a formal power series with coefficients in a finite fieldis algebraic if and only if the sequence of its coefficients canbe generated by an automaton. We revisit and enhance twoconstructive proofs of this result for finite prime fields. Thefirst proof uses Mahler equations, whose sizes appear to beprohibitively large. The second proof relies on diagonals ofrational functions; we turn it into an efficient algorithm, ofcomplexity linear in log N and quasi-linear in p . CCS Concepts • Computing methodologies → Algebraic algorithms;
Keywords algebraic series; finite fields; Mahler equations; diagonals; p -rational series; section operators; algebraic complexity
1. INTRODUCTION
One of the most difficult questions in modular computationsis the complexity of computations mod p for a large prime p of coefficients in the expansion of an algebraic function. D.V. Chudnovsky & G.V. Chudnovsky, 1990 [11].
Context.
Algebraic functions are ubiquitous in all branchesof pure and applied mathematics, notably in algebraic geome-try, combinatorics and number theory. They also arise at theconfluence of several fields in computer science: functionalequations, automatic sequences, complexity theory. From acomputer algebra perspective, a fundamental question is theefficient computation of power series expansions of algebraic ∗ We warmly thank the referees for their very helpful comments.
ISSAC’16, July 19–22, 2016, Waterloo, ON, Canada
ACM ISBN 978-1-4503-4380-0/16/07.DOI: http://dx.doi.org/10.1145/2930889.2930904 functions. We focus on the particular question of computing one selected term of an algebraic power series f whose coeffi-cients belong to a field of positive characteristic p . Beyond itsrelevance to complexity theory, this problem is important inapplications to integer factorization and point-counting [4].
Setting.
More precisely, we assume in this article that theground field is the prime field F p and that the power series f is (implicitly) given as the unique solution in F p [[ x ]] of E ( x, f ( x )) = 0 , f (0) = 0 , where E is a polynomial in F p [ x, y ] that satisfies E (0 ,
0) = 0 and E y (0 , = 0 . (Here, and hereafter, E y stands for the partial derivative ∂E∂y .)Given as input the polynomial E and an integer N > N thcoefficient f N of f = P i f i x i . The efficiency is measured interms of number of arithmetic operations ( ± , × , ÷ ) in thefield F p , the main parameters being the index N , the prime p and the bidegree ( h, d ) of E with respect to ( x, y ).In the particular case d = 1, the algebraic function f isactually a rational function, and thus the N th coefficient ofits series expansion can be computed in O (log N ) operationsin F p , using standard binary powering techniques [20, 14].Therefore, it will be assumed in all that follows that d > Previous work and contribution.
The most straightfor-ward method for computing the coefficient f N of the alge-braic power series f proceeds by undetermined coefficients.Its arithmetic complexity is O ( N d ). Kung and Traub [19]showed that the formal Newton iteration can accelerate thisto ˜ O ( dN ). (The soft-O notation ˜ O ( · ) indicates that polyloga-rithmic factors are omitted.) Both methods work in arbitrarycharacteristic and compute f N together with all f i , i < N .In characteristic zero, it is possible to compute the coef-ficient f N faster, without computing all the previous ones.This result is due to the Chudnovsky brothers [10] andis based on the classical fact that the coefficient sequence( f n ) n ≥ satisfies a linear recurrence with polynomial coef-ficients [12, 9, 3]. Combined with baby steps/giant stepstechniques, this leads to an algorithm of complexity quasi-linear in √ N . Except for the very particular case of rationalfunctions ( d = 1), no faster method is currently known.Under restrictive assumptions, the baby step/giant stepalgorithm can be adapted to the case of positive charac-teristic [4]. The main obstacle is the fact that the linearrecurrence satisfied by the sequence ( f n ) n ≥ has a leadingcoefficient that may vanish at various indices. In the spiritof [4, §8], p -adic lifting techniques could in principle be used,but we are not aware of any sharp analysis of the sufficient p - a r X i v : . [ c s . S C ] M a y dic precision in the general case. Anyways, the best that canbe expected from this method is a cost quasi-linear in √ N .We attack the problem from a different angle. Our startingpoint is a theorem due to the second author in the late1970s [2, Th. 12.2.5]. It states that a formal power serieswith coefficients in a finite field is algebraic if and only if thesequence ( f n ) n ≥ of its coefficients can be generated by a p -automaton, i.e., f N is the output of a finite-state machinetaking as input the digits of N in base p . This implicitlycontains the roots of a log N -method for computing f N , butthe size of the p -automaton is at least p ( d + h ) , see e.g. [22].In its original version [7] the theorem was stated for F ,but the proof extends mutatis mutandis to any finite field. Adifferent proof was given by Christol, Kamae, Mendès-Franceand Rauzy [8, §7]. Although constructive in essence, theseproofs do not focus on computational complexity aspects.Inspired by [1] and [13], we show that each of them leadsto an algorithm of arithmetic complexity O (log N ) for thecomputation of f N , after a precomputation that may becostly for p large. On the one hand, the proof in [8] relieson the fact that the sequence ( f n ) n ≥ satisfies a divide-and-conquer recurrence. However, we show (Sec. 2) that the sizeof the recurrence is polynomial in p d , making the algorithmuninteresting even for very moderate values of p . On theother hand, the key of the proof in [7] is to represent f asthe diagonal of a bivariate rational function. We turn it(Sections 3–4) into an efficient algorithm that has complexity O (log N ) after a precomputation whose cost is ˜ O ( p ) only.To our knowledge, the only previous explicit occurrence of alog N -type complexity for this problem appears in [11, p. 121],which announces the bound O ( p · log N ) but without proof. Structure of the paper.
In Sec. 2, we propose an algo-rithm that computes the N th coefficient f N using Mahlerequations and divide-and-conquer recurrences. Sec. 3 is de-voted to the study of a different algorithm, based on theconcept of diagonals of rational functions. We conclude inSec. 4 with the design of our main algorithm. Cost measures.
We use standard complexity notation. Thereal number ω > n × n matrices with entries in F p in O ( n ω ) operations in F p .The best bound currently known is ω < . F p [ x ] d , the setof polynomials of degree at most d in F p [ x ], can be performedin ˜ O ( d ) operations: addition, multiplication, division, etc .The key to these results is a divide-and-conquer approachcombined with fast polynomial multiplication [23, 5, 18]. Ageneral reference on fast algebraic algorithms is [17].
2. USING MAHLER EQUATIONS
We revisit, from an algorithmic point of view, the proofin [8, §7] of the fact that the coefficients of an algebraic powerseries f ∈ F p [[ x ]] can be recognized by a p -automaton. Start-ing from an algebraic equation for f = P n f n x n , we firstcompute a Mahler equation satisfied by f , then we derive anappropriate divide-and-conquer recurrence for its coefficientsequence ( f n ) n , and use it to compute f N efficiently. Wewill show that, although the complexity of this method isvery good (i.e., logarithmic) with respect to the index N , thecomputation cost of the recurrence, and actually its meresize, are extremely high (i.e., exponential) with respect tothe algebraicity degree d . Mahler equations are functional equations which use theMahler operator M p , or M for short, acting on formal powerseries by substitution: Mg ( x ) = g ( x p ). Their power seriessolutions are called Mahler series . In characteristic 0, thereis no a priori link between algebraic equations and Mahlerequations; moreover, a series which is at the same timealgebraic and Mahler is a rational function [21, §1.3].By contrast, over F p , a Mahler equation is nothing but analgebraic equation, because of the Frobenius endomorphismthat permits writing g ( x p ) = g ( x ) p for any g ∈ F p [[ x ]]. Hencea Mahler series in F p [[ x ]] is an algebraic series. Conversely,an algebraic series is Mahler, since if f is algebraic of degree d then the ( d + 1) power series f , Mf = f p , . . . , M d f = f p d are linearly dependent over F p ( x ).Concretely, the successive powers f p k (0 ≤ k ≤ d ) are ex-pressed as linear combinations of 1, f , . . . , f d − over F p ( x ),and any dependence relation between them delivers a non-trivial Mahler equation with polynomial coefficients c ( x ) f ( x ) + c ( x ) f ( x p ) + · · · + c K ( x ) f ( x p K ) = 0 , (1)whose order K is at most d . Example 1 (A toy example).
Consider E = x + y − y in F [ x, y ]. Let f = − x − x + 2 x − x + 2 x + · · · be theunique solution in x F [[ x ]] of E ( x, f ( x )) = 0. The expressions of f , f , f , f as linear combinations of 1, f , f with coefficientsin F ( x ) give the columns of the matrix x − x + x + x x − x + · · · x − x + 1 x − x + · · · x − x + x + x x − x + · · · . The first three columns are linearly dependent and lead to theMahler equation x (1 − x − x ) f ( x ) − (1 + x − x ) f ( x ) + f ( x ) = 0 . (2) A Mahler equation instantly translates into a recurrenceof divide-and-conquer type, in short a DAC recurrence. Sucha recurrence links the value of the index- n coefficient f n with some values for the indices n/p , n/p . . . , and someshifted indices. The translation is absolutely simple: a term x s f ( x ) in equation (1) translates into f n − s ; more generally,a term x s f ( x q ) becomes f ( n − s ) /q , with the convention thata coefficient f ν is zero if ν is not a nonnegative integer. Example 2 (A binomial case).
Let p > f = x + x + · · · be the algebraic series in x F p [[ x ]]solution of E ( x, y ) = x + (1 + y ) p − −
1. The rewriting x + (1 + f ) p − − x + 1 + f p f − x − f ) + (1 + f p )1 + f yields f p = − x + (1 − x ) f and f p = − x p + (1 − x p ) f p ; hence f p = (1 − x ) p +1 − − x ) p +1 f . The situation is highly non-generic, since all powers f p k are expressible as linear combinationsof 1 and f only. This delivers the second-order Mahler equation( x p − − x p ) f ( x ) − (1 + x p − − x p ) f ( x p ) + f ( x p ) = 0 , which translates into a recurrence on the coefficients sequence f n − p +1 − f n − p − f np − f n − p +1 p + f n − pp + f np = 0 . (3)It is possible to make the recurrence more explicit by consideringthe p cases according to the value of the residue of n modulo p . .2 Nth coefficient via a DAC recurrence The DAC recurrence relation attached to the Mahler equa-tion (1), together with enough initial conditions, can be usedto compute the N th coefficient f N of the algebraic series f .The complexity of the resulting algorithm is linear with re-spect to N . The reason is that the coefficient c ( x ) in (1)generally has more than one monomial, and as a consequence,all the coefficients f i , i < N are needed to compute f N . Example 3 (A binomial case, cont.).
We specializeEx. 2 by taking p = 7 and compute f N with N = 100. Applying (3)to n = 106, we need to know the value for n = 99, next for n = 98, n = 15, n = 14, and also for n = 97, . . . , n = 91, n = 14, n = 13, n = 2. In the end, it appears that all values of f n for the indices n smaller than N = 100 are needed to compute f N . It is possible to decrease dramatically the cost of thecomputation of f N , from linear in N to logarithmic in N .The idea is to derive from (1) another Mahler equation withthe additional feature that its trailing coefficient c ( x ) is 1.To do so, it suffices to perform a change of unknown series.Putting f ( x ) = c ( x ) g ( x ), we obtain the Mahler equation g ( x ) + c ( x ) c ( x ) p − g ( x p ) + · · · + c K ( x ) c ( x ) p K − g ( x p K ) = 0 . (4)Obviously this approach assumes that c is not zero. But, asproved in [2, Lemma 12.2.3], this is the case if (1) is assumedto be a minimal-order Mahler equation satisfied by f .The series g ( x ) is no longer a power series, but a Laurentseries in F p (( x )). We cut it into two parts, its negativepart g − ( x ) in F p [ x − ] and its nonnegative part h ( x ) in F p [[ x ]]: g − ( x ) = X n< g n x n , h ( x ) = X n ≥ g n x n . Let us rewrite Eq. (4) in the compact form L ( x, M ) g ( x ) =0, where L ( x, M ) is a skew polynomial in the variable x andin the Mahler operator M . Plugging g ( x ) = g − ( x )+ h ( x ) in ityields three terms. The first is the power series L ( x, M ) h ( x ).The second is the nonnegative part − b ( x ) of L ( x, M ) g − ( x ),in F p [ x ]. The third is the negative part of L ( x, M ) g − ( x ),and it is zero because it is the only one which belongsto x − F p [ x − ]. Eq. (4) is rewritten as a new (inhomoge-neous) Mahler equation L ( x, M ) h ( x ) = b ( x ), namely h ( x ) + c ( x ) c ( x ) p − h ( x p ) + · · · + c K ( x ) c ( x ) p K − h ( x p K ) = b ( x ) . (5) Example 4 (A toy example, cont.).
The change ofseries f ( x ) = x (1 − x − x ) h ( x ) in (2) gives the new equation h ( x ) − x (1 − x )(1 + x )(1 + x + 2 x )(1 − x − x ) h ( x )+ x (1 − x − x ) h ( x ) = − x − x + x + 2 x + · · · + x − x − x and a recurrence h n = h n − + 2 h n − + · · · − h n − + · · · , while f n is given by f n = h n − − h n − − h n − for n ≥ . The right-hand side b ( x ) of this new inhomogeneous equation isobtained as the nonnegative part of − ( c c g − + c c g − ), where c = x (1 − x − x ), c = 2 x − x − c = 1 are the coefficientsof Eq. (2), and where g − = − x − − x − , the negative part of f/c , can be determined starting from ( f mod x ). To compute the coefficient f N for N = 1251, this approach onlyrequires the computation of thirteen terms of the sequence h n ,namely for n ∈ { , , , , , , , , , , , , } .This number of terms behaves like 3 log p N , and compares well tothe use of a recurrence like (3), which requires N terms. At this point, it is intuitively plausible that the N th coeffi-cient f N can be computed in arithmetic complexity O (log N ):to obtain f N , we need a few values h N and these ones are es-sentially obtained from a bounded number of h N/p , h N/p . . . . This plausibility can be strengthened and made clearerwith the introduction of the section operators (sometimescalled
Cartier operators , or decimation operators ) and theconcept of p -rational series . Definition The section operators S , . . . , S p − , withrespect to the radix p , are defined by S r X n ≥ f n x n = X k ≥ f pk + r x k , for ≤ r < p. (6)In other words, there is a section operator for each digit ofthe radix- p numeration system and the r th section operatorextracts from a given series its part associated with theindices congruent to the digit r modulo p . These operatorsare linear and well-behaved with respect to the product: S r [ f ( x ) g ( x )] = X s + t ≡ r mod p x b s + tp c S s f ( x ) S t g ( x ) . In particular, for g ( x ) = h ( x p ) the previous formula becomes S r [ f ( x ) h ( x p )] = S r [ f ( x )] × h ( x ) , (7)because of the obvious relationships with the Mahler operator S M = I F p [[ x ]] , S r M = 0 for r >
0. The action of the sectionoperators can be extended to the field F p (( x )) of formalLaurent series and the same properties apply to the extendedoperators, which we denote in the same way.The section operators permit to express the coefficient h N of a formal power series h ( x ) using the radix- p digits of N . Lemma Let h be in F p [[ x ]] and let N = ( N ‘ · · · N N ) p be the radix- p expansion of N . Then h N = ( S N ‘ · · · S N S N h )(0) . (8) Proof.
We first apply the section operator S N to theseries h ( x ) associated with the least significant digit of N ,that is we start from N and we pick every p th coefficientof h ( x ). This gives S N h ( x ) = h N + h N + p x + h N +2 p x + · · · . Iterating this process with each digit produces the series S N ‘ · · · S N S N h ( x ) = h N + h N + p ‘ +1 x + · · · , whose constant coefficient is h N . Let us return to the algebraic series f ( x ) and its rela-tive h ( x ). The Mahler equation (5) can be rewritten h ( x ) = b ( x ) + a ( x ) h ( x p ) + · · · + a K ( x ) h ( x p K ) . (9)The coefficients b ( x ) and a k ( x ), 1 ≤ k ≤ K , are polynomialsof degrees at most D , say. As a matter of fact, the vectorspace W of linear combinations s ( x ) = a ( x ) + b ( x ) h ( x ) + b ( x ) h ( x p ) + · · · + b K ( x ) h ( x p K )ith a ( x ) and all b k ( x ) in F p [ x ] D is stable under the actionof the section operators. Indeed, from s ( x ) = ( a ( x ) + b ( x ) b ( x )) + ( b ( x ) a ( x ) + b ( x )) h ( x p ) + · · · + ( b ( x ) a K ( x ) + b K ( x )) h ( x p K )we deduce S r s ( x ) = S r ( a ( x ) + b ( x ) b ( x ))+ S r ( b ( x ) a ( x ) + b ( x )) h ( x ) + · · · + S r ( b ( x ) a K ( x ) + b K ( x )) h ( x p K − ) , (10)and the polynomials S r ( a + b b ) and S r ( b a k + b k ), 1 ≤ k ≤ K ,have degrees not greater than 2 D/p ≤ D .The important consequence of this observation is that wehave produced a finite dimensional F p -vector space W thatcontains h ( x ) and that is stable under the sections ( S r ) ≤ r
Theorem Let E be a polynomial in F p [ x, y ] h,d suchthat E (0 ,
0) = 0 and E y (0 , = 0 , and let f ∈ F p [[ x ]] be itsunique root with f (0) = 0 . Algorithm 1 computes the N thcoefficient f N of f using ˜ O ( d h p d log N ) operations in F p . To do this, we first need a preliminary result that will alsobe useful in Sec. 4. It provides size and complexity boundsfor the remainder of the Euclidean division of a monomialin y by a polynomial in y with coefficients in F p [ x ]. Lemma Let E = P i e i ( x ) y i be a polynomial in F p [ x ][ y ] of degree d in y and at most h in x . Then, for D ≥ d , y D mod E = 1 e D − d +1 d (cid:0) r ( x ) + · · · + r d − ( x ) y d − (cid:1) , where the r i ’s are in F p [ x ] of degree at most h ( D − d + 1) .One can compute the r i ’s using ˜ O ( hdD ) operations in F p . Algorithm
Nth coefficient via Mahler equations
Input
A polynomial E ( x, y ) ∈ F p [ x, y ] with E (0 ,
0) = 0and E y (0 , = 0, and an integer N ≥ Output
The N th coefficient f N of the unique formal powerseries f ∈ F p [[ x ]] solution of E ( x, f ( x )) = 0, f (0) = 0.1. Use Algorithm 2 to compute a minimal-order Mahlerequation (with c = c ,v x v + · · · + c ,d x d , c ,v = 0): L ( x, M ) f ( x ) = c ( x ) f ( x ) + · · · + c K ( x ) f ( x p K ) = 0.2. Deduce an equation L ( x, M ) g ( x ) = 0 (4), with c = 1,satisfied by g ( x ) = f ( x ) /c ( x ).3. Compute f ( x ) mod x v by Newton iteration, anddeduce the negative part g − ( x ) of f ( x ) /c ( x ).4. Compute the nonnegative part b ( x ) of − L ( x, M ) g − ( x ) . The nonnegative part h ( x ) of f ( x ) /c ( x ) satisfiesthe equation L ( x, M ) h ( x ) = b ( x ) (5).5. Get a linear representation ( L, ( A r ) ≤ r
The degree bounds are proved by induction on D .The computation of y D mod E can be performed by binarypowering. The last step dominates the cost of the wholeprocess. It amounts to first computing the product of twopolynomials in F p [ x, y ] of degree at most d − y and ofdegree O ( hD ) in x , then reducing the result modulo E . Bothsteps have complexity ˜ O ( hdD ): the multiplication is donevia Kronecker’s substitution [17, Cor. 8.28], the reductionvia an adaptation of the Newton iteration used for the fastdivision of univariate polynomials [17, Th. 9.6].The first step of Algorithm 1 is the computation of a Mahlerequation for the algebraic series f . We begin by analyzingthe sizes of this equation, and the cost of its computation. Theorem Let E be a polynomial in F p [ x, y ] h,d suchthat E (0 ,
0) = 0 and E y (0 , = 0 , and let f ∈ F p [[ x ]] be itsunique root with f (0) = 0 . Algorithm 2 computes a Mahlerequation of minimal-order satisfied by f using ˜ O ( d ω hp d ) op-erations in F p . The output equation has order at most d andpolynomial coefficients in F p [ x ] of degree at most dhp d . Proof.
We first prove the correctness part. Since the R i ’s all live in the vector space generated by 1 , f, . . . , f d − over F p ( x ), the algorithm terminates and returns a Mahleroperator of order K at most d .At step s we have R s = y p s mod E , thus R s ( x, f ) = f p s .Since P Kk =0 c k R k = 0, this yields L ( f ) = P Kk =0 c k f p k = 0,so L is a Mahler operator for f . It has minimal order,because otherwise R , . . . , R K − would be linearly dependentover F p ( x ). Therefore, the algorithm is correct.By Lemma 6, e p s − d +1 d R s is in F p [ x, y ] of degree in y atmost d − x at most h ( p s − d + 1). Moreover, R , . . . , R K can be computed in time ˜ O ( hdp K ).By Cramer’s rule, the degrees of the c k ’s are bounded by dhp K ≤ dhp d . Testing the rank condition, and solving forthe c k ’s amounts to polynomial linear algebra in size at most d × ( d + 1) and degree at most hp d . This can be done incomplexity ˜ O ( d ω hp d ), using Storjohann’s algorithm [24].lgorithm AlgeqToMahler
Input E ( x, y ) ∈ F p [ x, y ] with E (0 ,
0) = 0 and E y (0 , = 0. Output
A minimal-order monic Mahler operator for theunique f ∈ F p [[ x ]] with E ( x, f ( x )) = 0 and f (0) = 0. R = y ; for s = 1 , , . . . do R s = R ps − mod E . R s = y p s mod E if rank F p ( x ) ( R , R , . . . , R s ) < s + 1 then Solve P s − k =0 c k c s R k = − R s for c , . . . , c s in F p [ x ] return P sk =0 c k M k Algorithm 2.
From algebraic equations to Mahler equations.
We are now ready to prove Theorem 5.
Proof of Theorem 5.
The correctness of Algorithm 1follows from the discussion in Sec. 2.4. By Theorem 7, theoutput of Step 1 is a Mahler equation of order K ≤ d andcoefficients c j of maximum degree H ≤ dhp K . In particular, d ≤ dhp d . The cost of Step 1 is ˜ O ( d ω hp d ). Step 2 consistsin computing the coefficients c j = c j c p j − for 1 ≤ j ≤ K , ofmaximum degree D ≤ Hp K . It can be performed in time˜ O ( KHp K ) = ˜ O ( d hp d ). Step 3 has negligible complexity˜ O ( dv ) = ˜ O ( d hp d ) using the Kung-Traub algorithm [19].Step 4 requires ˜ O ( KHp K ) = ˜ O ( d hp d ) operations. ByProposition 4 the last steps of the algorithm can be doneusing ˜ O (( DK ) log N ) = ˜ O (( d hp d ) log N ) for each N , thusfor a total cost of ˜ O (( d h p d ) log N ). This dominates thewhole computation.Generically, the size bounds in Theorem 5 are quite tight.This is illustrated by the following example. Example 5 (A cubic equation).
Let us consider the al-gebraic equation E ( x, y ) = x − (1 + x ) y + x y + (1 + x ) y = 0in F [ x, y ]. The assumption E y (0 ,
0) = − = 0 is fulfilled.We find a Mahler equation of order 3, whose coefficients c = x − x + · · · − x + x , c , c , c have respectively heights 45,47, 50, 32. We compute within precision O ( x ) the solution f ( x ) = x − x − x + x + · · · − x + · · · , and the negative part g − ( x ) = x − + x − + x − − x − − x − of g = f/c . We find anew operator L ( x, M ) by computing the product L ( x, M ) c ( x )in F [ x, M ] and next a new equation L ( x, M ) y ( x ) = b ( x ) forthe nonnegative part h ( x ) of g ( x ) by computing the polynomial b ( x ) = L ( x, M ) g − ( x ). It appears that the coefficients c = 1, c , c , c and the right-hand side b ( x ) have respectively heights 0, 92,365, 1157, and 1103. We arrive at a linear representation withsize (1 + 3 + 1) × (1 + 1157) = 5790 for p = 3. If we increasethe prime number p , we successively find sizes 155190 for p = 5,and 1342725 for p = 7. We do not pursue further because it isclear that such sizes make the computation unfeasible.
3. USING DIAGONALS
To improve the arithmetic complexity with respect to theprime number p , we need a different way to represent thealgebraic series f . It is given by Furstenberg’s theorem, andrelies on the concept of diagonal of rational functions . y − (2 − x ) y − y − x y xy − x ) − (1 − x ) y − y − x y xy x y x y x y
5+ 5 x y
6+ 4 x y x y − x y
5+ 2 x y x y
3+ 3 x y
4+ 3 x y − x y
6+ 4 x y x y
3+ 2 x y − x y − x y x y + 3 x y x y − x y − x y
6+ 5 xy x y + 2 x y − x y
5+ 4 xy − y x y + x y − x y − y x y − y x y − xy − y x y − y xy − y y + · · · Figure 1.
A diagonal of a bivariate rational series (see Ex. 6).
Furstenberg’s theorem [15] tells us that every algebraicseries is the diagonal of a bivariate rational function f ( x ) = D a ( x, y ) b ( x, y ) , b (0 , = 0 . (12)This means that f ( x ) = P i c i,i x i , where a/b = P i,j c i,j x i y j .Moreover, under the working assumption E y (0 , = 0, therational function a/b is explicit in terms of E : a ( x, y ) = yE y ( xy, y ) , b ( x, y ) = E ( xy, y ) /y. (Notice that b (0 ,
0) = E y (0 , = 0.) If the polynomial E ( x, y )has degree d and height h , then the partial degrees of a and b are bounded as follows d x = max(deg x a, deg x b ) ≤ h,d y = max(deg y a, deg y b ) ≤ d + h. (13) Example 6 (A quartic equation).
Consider f ( x ) = x + x + x +3 x +5 x +2 x +4 x − x − x − x + · · · , the unique solution in x F [[ x ]] of the algebraic equation E ( x, y ) = − x + (1 + x ) y − (1 + x ) y − y + (1 + x ) y = 0 . Then f is the diagonal of the rational function a/b with a = y − (2 − x ) y − y + (4 − x ) y + 4 xy ,b = (1 − x ) − (1 − x ) y − y + (1 − x ) y + xy . Fig. 1 shows pictorially that f ( x ) is the diagonal of a/b . We will follow the path drawn in [6], and compute the N thcoefficient f N using a/b as intermediate data structure. Bivariate section operators can be defined by mimicking (6).Although they depend on two residues modulo p , we willalways use them with the same residue for both variables;thus we will simply denote them by S r for 0 ≤ r < p . A nicefeature is that they commute with the diagonal operator, S r D = DS r . Together with (12), this property translates the action of thesection operators from the algebraic series f to the rationalfunction a/b . Moreover, property (7) extends to bivariatesection operators and justifies the following computation S r a ( x, y ) b ( x, y ) = S r a ( x, y ) b ( x, y ) p − b ( x, y ) p = S r a ( x, y ) b ( x, y ) p − b ( x p , y p ) = S r a ( x, y ) b ( x, y ) p − b ( x, y ) . lgorithm Nth coefficient via diagonals
Input
A polynomial E ( x, y ) ∈ F p [ x, y ] with E (0 ,
0) = 0and E y (0 , = 0, and an integer N ≥ Output
The N th coefficient f N of the unique formal powerseries f ∈ F p [[ x ]] solution of E ( x, f ( x )) = 0, f (0) = 0.1. Compute a ( x, y ) = yE y ( xy, y ) , b ( x, y ) = E ( xy, y ) /y . . At this point f = D ( a/b ).2. Set d x = max(deg x a, deg x b ), d y = max(deg y a, deg y b ), L = [1 , , · · · , C the column vector expressing a ( x, y ) in the canonical basis of F p [ x, y ] d x ,d y .3. Compute B = b p − .4. for r from 0 to p − T r x i y j = P n,m A r ; i,j ; n,m x n y m and store the result in the matrix A r return f N = LA N ‘ · · · A N Cb (0 , for N = ( N ‘ . . . N ) p . Algorithm 3. N th coefficient, via diagonals. To put it plainly S r ab = S r ab p − b . This yields an action on bivariate polynomials v of what wemay call pseudo-section operators, defined by T r v = S r vb p − , ≤ r < p, hence( S r · · · S r k ) (cid:16) ab (cid:17) = ( T r · · · T r k )( a ) b , ≤ r i < p. At this point, it is not difficult to see that the vector sub-space F p [ x, y ] d x ,d y , where d x and d y are the maximal partialdegrees of a and b defined in (13), is stable under the pseudo-section operators. The supports of b and of polynomials v in F p [ x, y ] d x ,d y belong to a small rectangle [0 , d x ] × [0 , d y ].Raising b to the power p − p − v results in a rectangle p times larger than the small rectangle. The pseudo-sectionoperators, essentially based on a Euclidean division by p ,send the large rectangle back into the small rectangle.As a consequence we have at our disposal a new linearrepresentation (cid:0) L , ( A r ) ≤ r
We return toEx. 5 with p = 7. We find a = − y − xy + 3 y + 3 xy + 2 x y and b = − x − xy + y + xy + x y , hence d x = 2, d y = 4. Next wecompute B = b p − and we build the linear representation. Themain point is the size of the representation, namely (1 + d x )(1 + · · · · · · · · · · ·· · · · · · · · · · · · · · ·· · · · ·· · · · · · · · · ·· · · · ·· · · · ·· · · · ·· · · · · · · · · · · · · · · · · · · · · · · · · · · · ·· · · · · · · · · · · · · · · · · · · · · · · · · · ·· · · · ·· · · · ·· · · · ·· · · · · · · · · ·· · · · ·· · · · ·· · · · ·· · · · · · · · · ·· · · · ·· · · · ·· · · · ·· · · · · Figure 2.
The matrix A in Ex. 7. d y ) = 15. This is ridiculously small as compared to the valueobtained in Ex. 5. Even more importantly, the size remains thesame if we change the prime number.Although it is not necessary that we see the linear representationwith our human eyes, it is quite normal that we watch it if onlyto control the computations. The matrices A r , whose size is(1 + d x )(1 + d y ), have indices which are pairs of pairs. We canview them as square matrices of size 1 + d x whose elements aresquare blocks of size 1 + d y . The coordinates over x i y j of theimage T r x n y m is in the block with indices ( i, n ) at place ( j, m ).We only display the matrix A (the zero coefficients are replacedby dots), in Fig. 2. It gives us for example the value of T xy .Because the exponent of x is 1, we look at the column of blocksnumber 1 (the second one) and because the exponent of y is 2 welook at the column number 2 (the third one) in the matrices of thiscolumn (red column). In the first block, which corresponds to x ,we see a coefficient 1 in row number 1, hence a term x y = y . Inthe second block, we see a coefficient 1 in row number 1 hence aterm x y = xy and a coefficient 6 in row number 2 hence a term6 x y = 6 xy . We conclude T xy = y + xy + 6 xy . All the needed information to build the linear representa-tion is contained in the polynomial B = b p − . Indeed, whenwe compute the image of an element x i y j of the canonicalbasis by the pseudo-section operator T r , we first multiply B by x i y j ; this transformation is merely a translation of theexponents, that requires no arithmetic operation. Next, weextract the monomials whose exponents are congruent to r modulo p , again without any computation in F p . The con-clusion is that, apart the final computation to obtain thevalue f N , the cost comes from the computation of B = b p − .This can be done using binary powering and Kronecker’s sub-stitution. Since B has partial degrees at most pd x in x andat most pd y in y , the arithmetic complexity is ˜ O ( p d x d y ).Gathering the study of the computation and the precom-putation, and using Eq. (13), we obtain the following result. Theorem Let E be in F p [ x, y ] h,d such that E (0 ,
0) = 0 and E y (0 , = 0 , and let f ∈ F p [[ x ]] be its unique root with f (0) = 0 . Algorithm 3 computes the N th coefficient f N of f in O ( h ( d + h ) log N ) + ˜ O ( p h ( d + h )) operations in F p . The striking points are the decreasing of the exponent of p igure 3. The useful coefficients of B = b p − are located inthe diagonal strips. See Ex. 8 for explanations. from 3 d to 2, and the replacement of the multiplicative com-plexity p d × log N of Theorem 5 by the additive complexity p + log N . This is already a tremendous improvement. Inthe next section, we will present a further improvement.
4. A FASTER ALGORITHM
We will improve the algorithm of the previous section inorder to decrease the exponent of the prime p from 2 to 1. As pointed out in Sec. 3.3, all the information needed tobuild the square matrices A , . . . , A p − of the linear repre-sentation is contained in the large power B = b p − . Howeverbuilding the A i ’s does not require the whole polynomial B .To see this, assume 0 ≤ α ≤ ( p − d x and 0 ≤ β ≤ ( p − d y are such that the monomial x α y β of B contributes to one ofthe matrices A r . Since A r ; i,j ; n,m is the coefficient of x n y m in T r ( x i y j ) = S r ( x i y j B ), this means that i + α ≡ j + β mod p for some 0 ≤ i ≤ d x and 0 ≤ j ≤ d y . As a consequence,the difference exponent δ = β − α necessarily belongs to[ − d y , d x ]+ p Z . For large p , this means that only a fraction 1 /p of B is useful. The aim of this section is to take an algorithmicadvantage of this remark. Example 8 (A cubic equation, cont.).
The previousphenomenon is exemplified in Fig. 3 for the equation E ( x, y ) = x − (1 + x ) y + x y + (1 + x ) y = 0, already encountered inEx. 5 and 7. We use p = 11 on the left-hand side, and p = 109on the right-hand side. The polynomial B is represented by itsmonomial support (in black). Besides, the points ( α, β ) definedby α = pk + r − i , β = p‘ + r − j , with 0 ≤ i ≤ d x , 0 ≤ j ≤ d y are colored, with a color which goes from blue to red when r goesfrom 0 to p − p small ( p = 11), the strips cover almostthe whole rectangle, but for p moderately large ( p = 109) theproportion of useful coefficients in B becomes much smaller. To emphasize the difference exponent δ = β − α , we performa change of variables x := x/t , y := t , which producesLaurent polynomials with respect to t , namely b ( x/t, t ) and B ( x/t, t ) = b ( x/t, t ) p − = X δ π δ ( x ) t δ . By construction, the coefficient π δ ( x ) is a polynomial, namelythe δ -diagonal P i B i,i + δ x i of B ( x, y ) = P i,j B i,j x i y j .Let δ − be the opposite of the lowest degree with respectto t of b ( x/t, t ), and δ + be the degree with respect to t of b ( x/t, t ). An immediate bound is δ − ≤ d x and δ + ≤ d y .By the previous discussion, all the information needed tobuild up the matrices of the linear representation is containedin the polynomials π δ ( x ), for δ in∆ := ([ − d y , d x ] + p Z ) ∩ [(1 − p ) δ − , ( p − δ + ] . This set is a union of small intervals of length d x + d y + 1regularly spaced by p and within [(1 − p ) d x , ( p − d y ]. Our aim is to show that the collection of π δ ( x ) for δ ∈ ∆can be computed in complexity quasi-linear in p . The startingpoint is to write B ( x/t, t ) = b ( x/t, t ) p b ( x/t, t ) = b ( x p /t p , t p ) b ( x/t, t ) . Here b ( x p /t p , t p ) is obtained by a mere rewriting of b ( x/t, t ),hence at no cost, while 1 /b ( x/t, t ) is a rational function. Tobe more concrete, we denote b ( x/t, t ) = δ + X v = − δ − b v ( x ) t v , b ( x/t, t ) = X u ≥ δ − c u ( x ) t u , where the coefficients b v ( x ) are polynomials in x and thecoefficients c v ( x ) are rational functions in x .Now, the polynomial π δ ( x ) is expressed by a convolution π δ ( x ) = X u + pv = δ c u ( x ) b v ( x p ) . (14)The constraint u + pv = δ implies u ≡ δ mod p . Therefore, itis sufficient to compute the c δ ( x ) for all δ ∈ ∆ = pδ − + ∆.To do this efficiently, we will exploit the fact that thesequence of rational functions ( c u ( x )) u satisfies a linear re-currence with coefficients in F p [ x ]. The coefficients of therecurrence relation are those of the polynomial t δ − b ( x/t, t ).We are facing the following issue: since we need the valuesof c δ ( x ) for indices of order δ = Θ( p ), we can not unroll therecurrence relation directly, otherwise the complexity (andactually the mere arithmetic size of the rational functions c i ( x ) , i < u , computed along the way) becomes quadratic in p .Fortunately, one can compute the N th term of a recurrentsequence without computing all the previous coefficients.The key is the following lemma. Lemma
9. [14]
Let R be a commutative ring and a ( t ) /b ( t ) be a rational function in R ( t ) with b (0) invertible in R . Let d be the degree of b ( t ) and b ∗ ( t ) = t d b (1 /t ) be its reciprocal. Forany integer N ≥ , the coefficients c N , c N +1 , . . . , c N + d − ofthe formal power series expansion of a ( t ) /b ( t ) = P n ≥ c n t n are the coefficients of t d − , . . . , t d , t d − in the product ( t N mod b ∗ ( x )) × ( c d − + · · · + c t d − ) . Fiduccia’s algorithm [14] relies on Lemma 9 and computes t N mod b ∗ ( t ) by binary powering. As a consequence, the N thterm of a linear recurrent sequence of order d is computed in˜ O ( d log N ) operations in R . In our setting, we use Fiduccia’salgorithm for R = F p ( x ). Theorem
Let b be in F p [ x, y ] d x ,d y and let δ ∈ Z . Al-gorithm 4 computes the δ -diagonal π δ ( x ) = P i B i,i + δ x i of B = b p − in ˜ O ( d x ( d x + d y ) p ) operations in F p . roof. By (14), to compute π δ it is sufficient to have thecoefficients c δ − pv for − d x ≤ v ≤ d y . The coefficient c N hasthe form p ( x ) /q ( x ) N +1 , where deg p = O ( Nd x ), deg q ≤ d x .The most expensive step in Fiduccia’s algorithm for c N iscomputing t N modulo a polynomial in F p [ x, y ] d x ,d x + d y . ByLemma 6, this can be done in time T N = ˜ O ( Nd x ( d x + d y )).Summing the costs T N over all N ’s of the form δ − pv with − d x ≤ v ≤ d y concludes the proof.Algorithm PartialPowering
Input
A polynomial b in F p [ x, y ] and δ ∈ Z . Output
The δ -diagonal of B = b p − , that is P i B i,i + δ x i . for v from val t b ( x/t, t ) to deg t b ( x/t, t ) do Compute c δ − pv ( x ) = [ t δ − pv ] b ( x/t,t ) via Fiduccia’s algo return π δ ( x ) = P u + pv = δ c u ( x ) b v ( x p ) Algorithm 4. δ -diagonal of a bivariate polynomial power. We finally combine the results in Sections 4.1 and 4.2with the ones of Sec. 3, and modify Algorithm 3 accordingly,by replacing its Step 3 with the partial powering routine(Algorithm 4). This yields our main result.
Theorem
Let E be in F p [ x, y ] h,d satisfy E (0 ,
0) = 0 and E y (0 , = 0 , and let f ∈ F p [[ x ]] be its unique rootwith f (0) = 0 . One can compute the coefficient f N of f in O ( h ( d + h ) log N ) + ˜ O ( h ( d + h ) p ) operations in F p . Proof.
Only the cost of the precomputation changes.Instead of computing the whole power B = b p − , we onlycompute its δ -diagonals π δ ( x ), for δ ∈ ∆ . Since | ∆ | = | ∆ | ≤ ( d x + d y + 1) , we conclude using Theorem 10. Example 9 (Catalan).
Let E ( x, y ) = y − x − y be in F p [ x, y ], with root f = P n ≥ C n x n +1 , where C n is n +1 (cid:0) nn (cid:1) .Then f is the diagonal of a/b with a = y (1 − y ) and b = 1 − x − y .Thus d x = 1 , d y = 2, and the linear representation has size 6. Theneeded information to compute the 6 × A , . . . , A p − consists in the δ -diagonals of B = b p − for δ ∈ ∆ := ([ − ,
1] + p Z ) ∩ [1 − p, p −
1] = { − p, − , − , , , p − , p − } . Writing B ( x/t, t ) = (1 − x/t − t ) p − = P p − δ =1 − p π δ ( x ) t δ , the goal is tocompute the polynomials π δ ( x ) for δ ∈ ∆. This is done via theformula B ( x/t, t ) = (1 − x p /t p − t p ) / (1 − x/t − t ) which gives π δ ( x ) = − c δ + p ( x ) x p + c δ ( x ) − c δ − p ( x ). Therefore, it is sufficientto compute the rational functions c δ ( x ) for all δ ∈ ∆ = ∆ + p .This is done using Fiduccia’s algorithm, which exploits the factthat the c δ ’s satisfy the recurrence xc k +2 ( x ) = c k +1 ( x ) − c k ( x ).For instance, π ( x ) = − x p c p ( x ), where c p ( x ) is obtained as thecoefficient of t in the product ( − /x − t/x ) · ( t p mod − xt + t − π ( x ) amounts to checking the amusing identity: (cid:2) t (cid:3) (cid:0) t p mod − xt + t − (cid:1) = p − X ‘ =0 (cid:0) ‘‘ (cid:1) x ‘ +1 − p in F p [ x ] . We have implemented the algorithms described in thissection in
Maple . Our prototype is available at the urlhttp://specfun.inria.fr/dumas/Research/AlgModp/.The implementation permits getting an idea of the feasibility of the approach using diagonals. For instance, in the case ofthe quartic equation in Ex. 6, with p = 9001 instead of p = 11,the algorithm spends 142 seconds on the precomputation,then computes the N -th coefficient for N = 10 , , resp.10 , , resp. 10 , , in: 3, resp. 25, resp. 344 seconds.(Timings on a personal laptop, Intel Core i5, 2.8 GHz, 3MB.)In a forthcoming article, we plan to extend our algorithmsto all finite fields, and to improve the complexity results.
5. REFERENCES [1] J.-P. Allouche and J. Shallit. The ring of k -regular sequences. Theoret. Comput. Sci. , 98(2):163–197, 1992.[2] J.-P. Allouche and J. Shallit.
Automatic sequences . CambridgeUniversity Press, 2003. Theory, applications, generalizations.[3] A. Bostan, F. Chyzak, G. Lecerf, B. Salvy, and É. Schost.Differential equations for algebraic functions. In
ISSAC’07 ,pages 25–32. ACM Press, 2007.[4] A. Bostan, P. Gaudry, and É. Schost. Linear recurrences withpolynomial coefficients and application to integer factorizationand Cartier-Manin operator.
SIAM Journal on Computing ,36(6):1777–1806, 2007.[5] D. G. Cantor and E. Kaltofen. On fast multiplication ofpolynomials over arbitrary algebras.
Acta Inform. ,28(7):693–701, 1991.[6] G. Christol. Éléments algébriques. In
Groupe de travaild’Analyse Ultra-métrique (1ère année : 1973/74) , number 14,10 pp. Secrétariat Mathématique, Paris, 1974.[7] G. Christol. Ensembles presque periodiques k -reconnaissables. Theoret. Comput. Sci. , 9(1):141–145, 1979.[8] G. Christol, T. Kamae, M. Mendès France, and G. Rauzy.Suites algébriques, automates et substitutions.
Bull. Soc. Math.France , 108(4):401–419, 1980.[9] D. V. Chudnovsky and G. V. Chudnovsky. On expansion ofalgebraic functions in power and Puiseux series. I.
J.Complexity , 2(4):271–294, 1986.[10] D. V. Chudnovsky and G. V. Chudnovsky. Approximations andcomplex multiplication according to Ramanujan. In
Ramanujanrevisited (Urbana-Champaign, 1987) , pages 375–472. 1988.[11] D. V. Chudnovsky and G. V. Chudnovsky. Computer algebra inthe service of mathematical physics and number theory. In
Computers in mathematics (Stanford, 1986) , volume 125 of
Lecture Notes in Pure and Appl. Math. , pages 109–232. 1990.[12] L. Comtet. Calcul pratique des coefficients de Taylor d’unefonction algébrique.
Enseignemen Math. , 10:267–270, 1964.[13] P. Dumas.
Récurrences mahlériennes, suites automatiques,études asymptotiques . PhD thesis, Univ. Bordeaux I, 1993.Available at https://tel.archives-ouvertes.fr/tel-00614660.[14] C. M. Fiduccia. An efficient formula for linear recurrences.
SIAM J. Comput. , 14(1):106–112, 1985.[15] H. Furstenberg. Algebraic functions over finite fields.
J.Algebra , 7:271–277, 1967.[16] F. L. Gall. Powers of tensors and fast matrix multiplication. In
ISSAC’14 , pages 296–303, 2014.[17] J. von zur Gathen and J. Gerhard.
Modern computer algebra .Cambridge University Press, third edition, 2013.[18] D. Harvey, J. van der Hoeven, and G. Lecerf. Faster polynomialmultiplication over finite fields. http://arxiv.org/abs/1407.3361,2014.[19] H. T. Kung and J. F. Traub. All algebraic functions can becomputed fast.
J. Assoc. Comput. Mach. , 25(2):245–260, 1978.[20] J. C. P. Miller and D. J. Spencer Brown. An algorithm forevaluation of remote terms in a linear recurrence sequence.
Computer Journal , 9:188–190, 1966.[21] K. Nishioka.
Mahler Functions and Transcendence . Number1631 in Lecture Notes in Mathematics. Springer, Berlin, 1996.[22] E. Rowland and R. Yassawi. Automatic congruences fordiagonals of rational functions.
J. Théor. Nombres Bordeaux ,27(1):245–288, 2015.[23] A. Schönhage. Schnelle Multiplikation von Polynomen überKörpern der Charakteristik 2.
Acta Informat. , 7:395–398, 1977.[24] A. Storjohann. High-order lifting and integrality certification.