Fast Computation of the Nth Term of an Algebraic Series over a Finite Prime Field
Alin Bostan, Inria (France), [email protected]
Gilles Christol, IMJ (France), [email protected]
Philippe Dumas, Inria (France), [email protected]
ABSTRACT
We address the question of computing one selected term of an algebraic power series. In characteristic zero, the best algorithm currently known for computing the Nth coefficient of an algebraic series uses differential equations and has arithmetic complexity quasi-linear in √N. We show that over a prime field of positive characteristic p, the complexity can be lowered to O(log N). The mathematical basis for this dramatic improvement is a classical theorem stating that a formal power series with coefficients in a finite field is algebraic if and only if the sequence of its coefficients can be generated by an automaton. We revisit and enhance two constructive proofs of this result for finite prime fields. The first proof uses Mahler equations, whose sizes appear to be prohibitively large. The second proof relies on diagonals of rational functions; we turn it into an efficient algorithm, of complexity linear in log N and quasi-linear in p.

CCS Concepts
• Computing methodologies → Algebraic algorithms;
Keywords
algebraic series; finite fields; Mahler equations; diagonals; p-rational series; section operators; algebraic complexity
1. INTRODUCTION
One of the most difficult questions in modular computations is the complexity of computations mod p, for a large prime p, of coefficients in the expansion of an algebraic function. D. V. Chudnovsky & G. V. Chudnovsky, 1990 [11].
Context.
Algebraic functions are ubiquitous in all branches of pure and applied mathematics, notably in algebraic geometry, combinatorics and number theory. They also arise at the confluence of several fields in computer science: functional equations, automatic sequences, complexity theory. From a computer algebra perspective, a fundamental question is the efficient computation of power series expansions of algebraic functions. We focus on the particular question of computing one selected term of an algebraic power series f whose coefficients belong to a field of positive characteristic p. Beyond its relevance to complexity theory, this problem is important in applications to integer factorization and point-counting [4].

∗ We warmly thank the referees for their very helpful comments.

ISSAC’16, July 19–22, 2016, Waterloo, ON, Canada
ACM ISBN 978-1-4503-4380-0/16/07. DOI: http://dx.doi.org/10.1145/2930889.2930904
Setting.
More precisely, we assume in this article that the ground field is the prime field F_p and that the power series f is (implicitly) given as the unique solution in F_p[[x]] of

E(x, f(x)) = 0, f(0) = 0,

where E is a polynomial in F_p[x, y] that satisfies E(0, 0) = 0 and E_y(0, 0) ≠ 0. (Here, and hereafter, E_y stands for the partial derivative ∂E/∂y.) Given as input the polynomial E and an integer N ≥ 0, the goal is to compute the Nth coefficient f_N of f = Σ_i f_i x^i. The efficiency is measured in terms of the number of arithmetic operations (±, ×, ÷) in the field F_p, the main parameters being the index N, the prime p and the bidegree (h, d) of E with respect to (x, y).

In the particular case d = 1, the algebraic function f is actually a rational function, and thus the Nth coefficient of its series expansion can be computed in O(log N) operations in F_p, using standard binary powering techniques [20, 14]. Therefore, it will be assumed in all that follows that d > 1.
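For illustration, the rational case d = 1 can be sketched in Python. Everything here is our own illustrative choice (the modulus p = 101, the helper names, and the classical reduction of x^N modulo the characteristic polynomial of the denominator's recurrence); the paper only cites the technique as known [20, 14].

```python
# Sketch: Nth coefficient of a rational series a(x)/b(x) over F_p in
# O(d^2 log N) operations, by reducing x^N modulo the characteristic
# polynomial of the linear recurrence induced by the denominator b.
p = 101  # an assumed prime modulus

def series_coeffs(a, b, n):
    """First n coefficients of a(x)/b(x) in F_p[[x]], assuming b[0] != 0 (naive)."""
    inv_b0 = pow(b[0], p - 2, p)
    f = []
    for k in range(n):
        s = a[k] if k < len(a) else 0
        for j in range(1, min(k, len(b) - 1) + 1):
            s -= b[j] * f[k - j]
        f.append(s * inv_b0 % p)
    return f

def mulmod(f, g, chi, d):
    """Product of two polynomials of degree < d, reduced modulo the monic
    polynomial x^d + chi[d-1] x^(d-1) + ... + chi[0]."""
    res = [0] * (2 * d - 1)
    for i, fi in enumerate(f):
        if fi:
            for j, gj in enumerate(g):
                res[i + j] = (res[i + j] + fi * gj) % p
    for k in range(2 * d - 2, d - 1, -1):
        c = res[k]
        if c:
            res[k] = 0
            for i in range(d):
                res[k - d + i] = (res[k - d + i] - c * chi[i]) % p
    return res[:d]

def nth_coeff(a, b, N):
    """Coefficient of x^N in a(x)/b(x) over F_p, by binary powering."""
    d = len(b) - 1
    t = max(len(a), d)              # the recurrence on coefficients holds from index t on
    if N < t + d:
        return series_coeffs(a, b, N + 1)[N]
    inv_b0 = pow(b[0], p - 2, p)
    chi = [b[d - i] * inv_b0 % p for i in range(d)]   # characteristic polynomial (low part)
    if d == 1:
        r = [pow((-chi[0]) % p, N - t + 1, p)]
    else:
        r, base, e = [1] + [0] * (d - 1), [0, 1] + [0] * (d - 2), N - t + d
        while e:                    # r = x^(N-t+d) mod chi
            if e & 1:
                r = mulmod(r, base, chi, d)
            base = mulmod(base, base, chi, d)
            e >>= 1
    u = series_coeffs(a, b, t)[t - d:]                # f_{t-d}, ..., f_{t-1}
    return sum(ri * ui for ri, ui in zip(r, u)) % p
```

For instance, with b = 1 − x − x^2 encoding the Fibonacci recurrence, `nth_coeff([0, 1], [1, p - 1, p - 1], N)` returns the Nth Fibonacci number mod p, using O(log N) reduction steps.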
Previous work and contribution.

The most straightforward method for computing the coefficient f_N of the algebraic power series f proceeds by undetermined coefficients. Its arithmetic complexity is O(N d). Kung and Traub [19] showed that the formal Newton iteration can accelerate this to Õ(dN). (The soft-O notation Õ(·) indicates that polylogarithmic factors are omitted.) Both methods work in arbitrary characteristic and compute f_N together with all f_i, i < N.

In characteristic zero, it is possible to compute the coefficient f_N faster, without computing all the previous ones. This result is due to the Chudnovsky brothers [10] and is based on the classical fact that the coefficient sequence (f_n)_{n≥0} satisfies a linear recurrence with polynomial coefficients [12, 9, 3]. Combined with baby steps/giant steps techniques, this leads to an algorithm of complexity quasi-linear in √N. Except for the very particular case of rational functions (d = 1), no faster method is currently known.

Under restrictive assumptions, the baby steps/giant steps algorithm can be adapted to the case of positive characteristic [4]. The main obstacle is the fact that the linear recurrence satisfied by the sequence (f_n)_{n≥0} has a leading coefficient that may vanish at various indices. In the spirit of [4, §8], p-adic lifting techniques could in principle be used, but we are not aware of any sharp analysis of the sufficient p-adic precision in the general case. Anyway, the best that can be expected from this method is a cost quasi-linear in √N.

We attack the problem from a different angle. Our starting point is a theorem due to the second author in the late 1970s [2, Th. 12.2.5]. It states that a formal power series with coefficients in a finite field is algebraic if and only if the sequence (f_n)_{n≥0} of its coefficients can be generated by a p-automaton, i.e., f_N is the output of a finite-state machine taking as input the digits of N in base p.
This implicitly contains the roots of a log N-method for computing f_N, but the size of the p-automaton is at least p^{d+h}, see e.g. [22]. In its original version [7] the theorem was stated for F_2, but the proof extends mutatis mutandis to any finite field. A different proof was given by Christol, Kamae, Mendès France and Rauzy [8, §7]. Although constructive in essence, these proofs do not focus on computational complexity aspects. Inspired by [1] and [13], we show that each of them leads to an algorithm of arithmetic complexity O(log N) for the computation of f_N, after a precomputation that may be costly for p large. On the one hand, the proof in [8] relies on the fact that the sequence (f_n)_{n≥0} satisfies a divide-and-conquer recurrence. However, we show (Sec. 2) that the size of the recurrence is polynomial in p^d, making the algorithm uninteresting even for very moderate values of p. On the other hand, the key of the proof in [7] is to represent f as the diagonal of a bivariate rational function. We turn it (Sections 3-4) into an efficient algorithm that has complexity O(log N) after a precomputation whose cost is Õ(p) only. To our knowledge, the only previous explicit occurrence of a log N-type complexity for this problem appears in [11, p. 121], which announces the bound O(p · log N), but without proof.
Structure of the paper.

In Sec. 2, we propose an algorithm that computes the Nth coefficient f_N using Mahler equations and divide-and-conquer recurrences. Sec. 3 is devoted to the study of a different algorithm, based on the concept of diagonals of rational functions. We conclude in Sec. 4 with the design of our main algorithm.

Cost measures.
We use standard complexity notation. The real number ω denotes a feasible exponent for matrix multiplication: two n × n matrices with entries in F_p can be multiplied in O(n^ω) operations in F_p. The best bound currently known is ω < 2.38. Basic operations on F_p[x]_d, the set of polynomials of degree at most d in F_p[x], can be performed in Õ(d) operations: addition, multiplication, division, etc. The key to these results is a divide-and-conquer approach combined with fast polynomial multiplication [23, 5, 18]. A general reference on fast algebraic algorithms is [17].
2. USING MAHLER EQUATIONS
We revisit, from an algorithmic point of view, the proof in [8, §7] of the fact that the coefficients of an algebraic power series f ∈ F_p[[x]] can be recognized by a p-automaton. Starting from an algebraic equation for f = Σ_n f_n x^n, we first compute a Mahler equation satisfied by f, then we derive an appropriate divide-and-conquer recurrence for its coefficient sequence (f_n)_n, and use it to compute f_N efficiently. We will show that, although the complexity of this method is very good (i.e., logarithmic) with respect to the index N, the computation cost of the recurrence, and actually its mere size, are extremely high (i.e., exponential) with respect to the algebraicity degree d.

Mahler equations are functional equations which use the Mahler operator M_p, or M for short, acting on formal power series by substitution: Mg(x) = g(x^p). Their power series solutions are called Mahler series. In characteristic 0, there is no a priori link between algebraic equations and Mahler equations; moreover, a series which is at the same time algebraic and Mahler is a rational function [21, §1.3]. By contrast, over F_p, a Mahler equation is nothing but an algebraic equation, because of the Frobenius endomorphism, which permits writing g(x^p) = g(x)^p for any g ∈ F_p[[x]]. Hence a Mahler series in F_p[[x]] is an algebraic series. Conversely, an algebraic series is Mahler, since if f is algebraic of degree d then the (d + 1) power series f, Mf = f^p, ..., M^d f = f^{p^d} are linearly dependent over F_p(x).

Concretely, the successive powers f^{p^k} (0 ≤ k ≤ d) are expressed as linear combinations of 1, f, ..., f^{d−1} over F_p(x), and any dependence relation between them delivers a nontrivial Mahler equation with polynomial coefficients

c_0(x) f(x) + c_1(x) f(x^p) + ... + c_K(x) f(x^{p^K}) = 0, (1)

whose order K is at most d.

Example 1 (A toy example).
Consider E = x + y − y in F[x, y]. Let f = −x − x + 2x − x + 2x + ... be the unique solution in x F[[x]] of E(x, f(x)) = 0. The expressions of the successive Mahler images of f as linear combinations of 1, f, f² with coefficients in F(x) give the columns of a 3 × 4 matrix over F(x). The first three columns are linearly dependent and lead to the Mahler equation

x(1 − x − x) f(x) − (1 + x − x) f(x^p) + f(x^{p^2}) = 0. (2)

A Mahler equation instantly translates into a recurrence of divide-and-conquer type, in short a DAC recurrence. Such a recurrence links the value of the index-n coefficient f_n with some values for the indices n/p, n/p², ..., and some shifted indices. The translation is absolutely simple: a term x^s f(x) in equation (1) translates into f_{n−s}; more generally, a term x^s f(x^q) becomes f_{(n−s)/q}, with the convention that a coefficient f_ν is zero if ν is not a nonnegative integer.

Example 2 (A binomial case).
Let p > 2 and let f = x + x^2 + ... be the algebraic series in x F_p[[x]] solution of E(x, y) = x + (1 + y)^{p−1} − 1. The rewriting

x + (1 + f)^{p−1} − 1 = ((x − 1)(1 + f) + (1 + f^p)) / (1 + f)

yields f^p = −x + (1 − x)f and f^{p^2} = −x^p + (1 − x^p) f^p; hence f^{p^2} = (1 − x)^{p+1} − 1 + (1 − x)^{p+1} f. The situation is highly non-generic, since all powers f^{p^k} are expressible as linear combinations of 1 and f only. This delivers the second-order Mahler equation

(x^{p−1} − x^p) f(x) − (1 + x^{p−1} − x^p) f(x^p) + f(x^{p^2}) = 0,

which translates into a recurrence on the coefficient sequence:

f_{n−p+1} − f_{n−p} − f_{n/p} − f_{(n−p+1)/p} + f_{(n−p)/p} + f_{n/p^2} = 0. (3)

It is possible to make the recurrence more explicit by considering the cases according to the value of the residue of n modulo p².

2.2 Nth coefficient via a DAC recurrence

The DAC recurrence relation attached to the Mahler equation (1), together with enough initial conditions, can be used to compute the Nth coefficient f_N of the algebraic series f. The complexity of the resulting algorithm is linear with respect to N. The reason is that the coefficient c_0(x) in (1) generally has more than one monomial, and as a consequence, all the coefficients f_i, i < N, are needed to compute f_N.

Example 3 (A binomial case, cont.).
We specialize Ex. 2 by taking p = 7 and compute f_N with N = 100. Applying (3) to n = 106, we need to know the value for n = 99, next for n = 98, n = 15, n = 14, and also for n = 97, ..., n = 91, n = 14, n = 13, n = 2. In the end, it appears that all values of f_n for the indices n smaller than N = 100 are needed to compute f_N.

It is possible to decrease dramatically the cost of the computation of f_N, from linear in N to logarithmic in N. The idea is to derive from (1) another Mahler equation with the additional feature that its trailing coefficient c_0(x) is 1. To do so, it suffices to perform a change of unknown series. Putting f(x) = c_0(x) g(x), we obtain the Mahler equation

g(x) + c_1(x) c_0(x)^{p−2} g(x^p) + ... + c_K(x) c_0(x)^{p^K−2} g(x^{p^K}) = 0. (4)

Obviously, this approach assumes that c_0 is not zero. But, as proved in [2, Lemma 12.2.3], this is the case if (1) is assumed to be a minimal-order Mahler equation satisfied by f. The series g(x) is no longer a power series, but a Laurent series in F_p((x)). We cut it into two parts, its negative part g_−(x) in x^{−1} F_p[x^{−1}] and its nonnegative part h(x) in F_p[[x]]:

g_−(x) = Σ_{n<0} g_n x^n,   h(x) = Σ_{n≥0} g_n x^n.

Let us rewrite Eq. (4) in the compact form L(x, M) g(x) = 0, where L(x, M) is a skew polynomial in the variable x and in the Mahler operator M. Plugging g(x) = g_−(x) + h(x) into it yields three terms. The first is the power series L(x, M) h(x). The second is the nonnegative part −b(x) of L(x, M) g_−(x), in F_p[x]. The third is the negative part of L(x, M) g_−(x), and it is zero because it is the only one which belongs to x^{−1} F_p[x^{−1}]. Eq. (4) is rewritten as a new (inhomogeneous) Mahler equation L(x, M) h(x) = b(x), namely

h(x) + c_1(x) c_0(x)^{p−2} h(x^p) + ... + c_K(x) c_0(x)^{p^K−2} h(x^{p^K}) = b(x). (5)
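The binomial case of Examples 2 and 3 is easy to replay numerically. In the Python sketch below (our own; the choices p = 7 and the truncation order are arbitrary), the coefficients f_n are generated from the relation f^p = −x + (1 − x)f, and the DAC recurrence (3) is then verified for all available indices.

```python
# The binomial case of Example 2 over F_7: generate (f_n) from
# f^p = -x + (1-x) f, then check the DAC recurrence (3).
p, M = 7, 300

f = [0] * (M + 1)
for n in range(1, M + 1):
    # comparing coefficients of x^n in f^p = -x + (1-x) f, using (f^p)_n = f_{n/p}
    f[n] = (f[n - 1] + (f[n // p] if n % p == 0 else 0) + (1 if n == 1 else 0)) % p

def F(num, den=1):
    """f_{num/den}, with the convention that it is 0 unless num/den is a
    nonnegative integer."""
    return f[num // den] if num >= 0 and num % den == 0 else 0

# recurrence (3):
# f_{n-p+1} - f_{n-p} - f_{n/p} - f_{(n-p+1)/p} + f_{(n-p)/p} + f_{n/p^2} = 0
assert all(
    (F(n - p + 1) - F(n - p) - F(n, p) - F(n - p + 1, p) + F(n - p, p) + F(n, p * p)) % p == 0
    for n in range(M + 1)
)
assert f[1] == 1 and f[2] == 1     # f = x + x^2 + ...
```

As Example 3 observes, unrolling (3) down from n = 106 to reach f_100 pulls in essentially all earlier coefficients, which is why this direct use of the recurrence stays linear in N.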
Example 4 (A toy example, cont.). The change of series f(x) = c_0(x) h(x) in (2) gives a new equation of the form (5) with trailing coefficient 1, together with a DAC recurrence for the sequence (h_n), while f_n is then recovered as a fixed linear combination of nearby values h_{n−i}. The right-hand side b(x) of this new inhomogeneous equation is obtained as the nonnegative part of −(c'_1(x) g_−(x^p) + c'_2(x) g_−(x^{p^2})), where c'_k = c_k c_0^{p^k−2} and c_0, c_1, c_2 are the coefficients of Eq. (2), and where g_−, the negative part of f/c_0, can be determined starting from f mod x^v, with v the valuation of c_0. To compute the coefficient f_N for N = 1251, this approach only requires the computation of thirteen terms of the sequence h_n. This number of terms behaves like 3 log_p N, and compares well to the use of a recurrence like (3), which requires N terms.

At this point, it is intuitively plausible that the Nth coefficient f_N can be computed in arithmetic complexity O(log N): to obtain f_N, we need a few values h_{N−i}, and these are essentially obtained from a bounded number of values h_{N/p}, h_{N/p^2}, .... This plausibility can be strengthened and made clearer with the introduction of the section operators (sometimes called Cartier operators, or decimation operators) and the concept of p-rational series.

Definition. The section operators S_0, ..., S_{p−1}, with respect to the radix p, are defined by

S_r (Σ_{n≥0} f_n x^n) = Σ_{k≥0} f_{pk+r} x^k,   for 0 ≤ r < p. (6)

In other words, there is a section operator for each digit of the radix-p numeration system, and the rth section operator extracts from a given series its part associated with the indices congruent to the digit r modulo p. These operators are linear and well-behaved with respect to the product:

S_r [f(x) g(x)] = Σ_{s+t ≡ r mod p} x^{⌊(s+t)/p⌋} S_s f(x) · S_t g(x).

In particular, for g(x) = h(x^p) the previous formula becomes

S_r [f(x) h(x^p)] = S_r [f(x)] × h(x), (7)

because of the obvious relationships with the Mahler operator: S_0 M = I_{F_p[[x]]}, S_r M = 0 for r > 0. The action of the section operators can be extended to the field F_p((x)) of formal Laurent series, and the same properties apply to the extended operators, which we denote in the same way. The section operators permit expressing the coefficient h_N of a formal power series h(x) using the radix-p digits of N.

Lemma. Let h be in F_p[[x]] and let N = (N_ℓ ... N_1 N_0)_p be the radix-p expansion of N. Then

h_N = (S_{N_ℓ} ... S_{N_1} S_{N_0} h)(0). (8)
Proof. We first apply the section operator S_{N_0} associated with the least significant digit of N: starting from index N_0, we pick every pth coefficient of h(x). This gives S_{N_0} h(x) = h_{N_0} + h_{N_0+p} x + h_{N_0+2p} x^2 + .... Iterating this process with each digit produces the series S_{N_ℓ} ... S_{N_1} S_{N_0} h(x) = h_N + h_{N+p^{ℓ+1}} x + ..., whose constant coefficient is h_N.

Let us return to the algebraic series f(x) and its relative h(x). The Mahler equation (5) can be rewritten

h(x) = b(x) + a_1(x) h(x^p) + ... + a_K(x) h(x^{p^K}). (9)

The coefficients b(x) and a_k(x), 1 ≤ k ≤ K, are polynomials of degrees at most D, say. As a matter of fact, the vector space W of linear combinations

s(x) = a(x) + b_0(x) h(x) + b_1(x) h(x^p) + ... + b_K(x) h(x^{p^K})

with a(x) and all b_k(x) in F_p[x]_D is stable under the action of the section operators. Indeed, from

s(x) = (a(x) + b_0(x) b(x)) + (b_0(x) a_1(x) + b_1(x)) h(x^p) + ... + (b_0(x) a_K(x) + b_K(x)) h(x^{p^K})

we deduce

S_r s(x) = S_r(a(x) + b_0(x) b(x)) + S_r(b_0(x) a_1(x) + b_1(x)) h(x) + ... + S_r(b_0(x) a_K(x) + b_K(x)) h(x^{p^{K−1}}), (10)

and the polynomials S_r(a + b_0 b) and S_r(b_0 a_k + b_k), 1 ≤ k ≤ K, have degrees not greater than 2D/p ≤ D. The important consequence of this observation is that we have produced a finite dimensional F_p-vector space W that contains h(x) and that is stable under the sections (S_r)_{0≤r<p}.
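The section operators and the digit-by-digit extraction of Eq. (8) translate directly into code; here is a minimal Python sketch (the prime p = 5 and the random truncated series are our own illustrative choices):

```python
# Section operators S_r on a truncated series (a Python list of coefficients),
# and a check of Eq. (8): h_N is obtained by applying the sections indexed by
# the base-p digits of N, least significant digit first.
import random

p = 5
random.seed(1)
h = [random.randrange(p) for _ in range(p ** 4)]   # h(x) mod x^(p^4)

def S(r, f):
    """Section operator S_r: keep the coefficients of indices ≡ r (mod p)."""
    return f[r::p]

def coeff_via_sections(h, N):
    cur = h
    while True:
        cur = S(N % p, cur)
        if N < p:
            return cur[0]
        N //= p

for N in [0, 3, 17, 123, 624]:
    assert coeff_via_sections(h, N) == h[N]
```

The point of Sec. 2 is that, for the series h attached to f, each section application stays inside the finite-dimensional space W, so only a bounded amount of data is carried along the log_p N digit steps.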
Theorem 5. Let E be a polynomial in F_p[x, y]_{h,d} such that E(0, 0) = 0 and E_y(0, 0) ≠ 0, and let f ∈ F_p[[x]] be its unique root with f(0) = 0. Algorithm 1 computes the Nth coefficient f_N of f using Õ(d^3 h^2 p^{3d} log N) operations in F_p.

To do this, we first need a preliminary result that will also be useful in Sec. 4. It provides size and complexity bounds for the remainder of the Euclidean division of a monomial in y by a polynomial in y with coefficients in F_p[x].

Lemma 6. Let E = Σ_i e_i(x) y^i be a polynomial in F_p[x][y] of degree d in y and at most h in x. Then, for D ≥ d,

y^D mod E = (1 / e_d^{D−d+1}) (r_0(x) + ... + r_{d−1}(x) y^{d−1}),

where the r_i's are in F_p[x] of degree at most h(D − d + 1). One can compute the r_i's using Õ(hdD) operations in F_p.

Algorithm 1: Nth coefficient via Mahler equations

Input: A polynomial E(x, y) ∈ F_p[x, y] with E(0, 0) = 0 and E_y(0, 0) ≠ 0, and an integer N ≥ 0.
Output: The Nth coefficient f_N of the unique formal power series f ∈ F_p[[x]] solution of E(x, f(x)) = 0, f(0) = 0.

1. Use Algorithm 2 to compute a minimal-order Mahler equation L(x, M) f(x) = c_0(x) f(x) + ... + c_K(x) f(x^{p^K}) = 0, writing c_0 = c_{0,v} x^v + ... with c_{0,v} ≠ 0.
2. Deduce an equation L'(x, M) g(x) = 0 as in (4), with trailing coefficient 1, satisfied by g(x) = f(x)/c_0(x).
3. Compute f(x) mod x^v by Newton iteration, and deduce the negative part g_−(x) of f(x)/c_0(x).
4. Compute the nonnegative part b(x) of −L'(x, M) g_−(x). The nonnegative part h(x) of f(x)/c_0(x) satisfies the equation L'(x, M) h(x) = b(x), as in (5).
5. Get a linear representation (L, (A_r)_{0≤r<p}, C) for h(x).
6. Return f_N = Σ_i c_{0,i} h_{N−i}, the sum running over the exponents i of c_0, each h_{N−i} being computed from the linear representation via Eq. (8).
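Lemma 6's binary powering can be exercised with a short Python sketch. Everything below is our own illustration: the toy equation E = x − y + y^3 over F_5 and all helper names are assumptions; its leading coefficient e_d = 1 makes the pseudo-division an exact Euclidean division.

```python
# Sketch of Lemma 6: compute y^D mod E by binary powering, for the assumed
# toy E = x - y + y^3 over F_5. Polynomials in x are lists of coefficients;
# polynomials in y are lists of such lists.
p, prec = 5, 50

def xmul(f, g):
    h = [0] * (len(f) + len(g) - 1)
    for i, c in enumerate(f):
        if c:
            for j, e in enumerate(g):
                h[i + j] = (h[i + j] + c * e) % p
    return h

def xadd(f, g):
    n = max(len(f), len(g))
    return [((f[i] if i < len(f) else 0) + (g[i] if i < len(g) else 0)) % p
            for i in range(n)]

def ymul(F, G):
    H = [[0] for _ in range(len(F) + len(G) - 1)]
    for i, c in enumerate(F):
        for j, e in enumerate(G):
            H[i + j] = xadd(H[i + j], xmul(c, e))
    return H

def yreduce(F, E):
    """Reduce F modulo E in y (multiplying through by lc_y(E), here 1)."""
    d = len(E) - 1
    F = [c[:] for c in F]
    while len(F) > d:
        top = F.pop()
        F = [xmul(c, E[d]) for c in F]
        for i in range(d):
            F[len(F) - d + i] = xadd(F[len(F) - d + i],
                                     xmul(top, [(-c) % p for c in E[i]]))
    return F

def ypow_mod(D, E):
    R, B = [[1]], [[0], [1]]            # the constants 1 and y
    while D:
        if D & 1:
            R = yreduce(ymul(R, B), E)
        B = yreduce(ymul(B, B), E)
        D >>= 1
    return R

E = [[0, 1], [4], [0], [1]]             # x - y + y^3 over F_5 (h = 1, d = 3)
D = 40
R = ypow_mod(D, E)

# degree bound of Lemma 6: deg_x r_i <= h (D - d + 1) = 38 here
xdeg = lambda c: max((i for i, a in enumerate(c) if a), default=-1)
assert len(R) <= 3 and all(xdeg(c) <= 38 for c in R)

# specialization check: R(x, f(x)) = f(x)^D for the root f = x + f^3 of E
def smul(f, g):
    h = [0] * prec
    for i, c in enumerate(f):
        if c:
            for j in range(min(len(g), prec - i)):
                h[i + j] = (h[i + j] + c * g[j]) % p
    return h

fser = [0] * prec
for _ in range(prec):
    cube = smul(smul(fser, fser), fser)
    fser = [(cube[i] + (1 if i == 1 else 0)) % p for i in range(prec)]

val, ypk = [0] * prec, [1] + [0] * (prec - 1)
for c in R:
    term = smul([(c[i] if i < len(c) else 0) for i in range(prec)], ypk)
    val = [(val[i] + term[i]) % p for i in range(prec)]
    ypk = smul(ypk, fser)

rhs = [1] + [0] * (prec - 1)
for _ in range(D):
    rhs = smul(rhs, fser)
assert val == rhs
```

Since E(x, f) = 0, the remainder R of y^D modulo E must specialize to f^D at y = f, which is what the final assertion checks on truncated series.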
Proof. The degree bounds are proved by induction on D. The computation of y^D mod E can be performed by binary powering. The last step dominates the cost of the whole process. It amounts to first computing the product of two polynomials in F_p[x, y] of degree at most d − 1 in y and of degree O(hD) in x, then reducing the result modulo E. Both steps have complexity Õ(hdD): the multiplication is done via Kronecker's substitution [17, Cor. 8.28], the reduction via an adaptation of the Newton iteration used for the fast division of univariate polynomials [17, Th. 9.6].

The first step of Algorithm 1 is the computation of a Mahler equation for the algebraic series f. We begin by analyzing the size of this equation, and the cost of its computation.

Theorem 7. Let E be a polynomial in F_p[x, y]_{h,d} such that E(0, 0) = 0 and E_y(0, 0) ≠ 0, and let f ∈ F_p[[x]] be its unique root with f(0) = 0. Algorithm 2 computes a Mahler equation of minimal order satisfied by f using Õ(d^ω h p^d) operations in F_p. The output equation has order at most d and polynomial coefficients in F_p[x] of degree at most dhp^d.

Proof.
We first prove the correctness part. Since the R_i's all live in the vector space generated by 1, f, ..., f^{d−1} over F_p(x), the algorithm terminates and returns a Mahler operator of order K at most d. At step s we have R_s = y^{p^s} mod E, thus R_s(x, f) = f^{p^s}. Since Σ_{k=0}^{K} c_k R_k = 0, this yields L(f) = Σ_{k=0}^{K} c_k f^{p^k} = 0, so L is a Mahler operator for f. It has minimal order, because otherwise R_0, ..., R_{K−1} would be linearly dependent over F_p(x). Therefore, the algorithm is correct.

By Lemma 6, e_d^{p^s−d+1} R_s is in F_p[x, y] of degree in y at most d − 1 and of degree in x at most h(p^s − d + 1). Moreover, R_0, ..., R_K can be computed in time Õ(hdp^K). By Cramer's rule, the degrees of the c_k's are bounded by dhp^K ≤ dhp^d. Testing the rank condition, and solving for the c_k's, amounts to polynomial linear algebra in size at most d × (d + 1) and degree at most hp^d. This can be done in complexity Õ(d^ω h p^d), using Storjohann's algorithm [24].

Algorithm 2: AlgeqToMahler

Input: E(x, y) ∈ F_p[x, y] with E(0, 0) = 0 and E_y(0, 0) ≠ 0.
Output: A minimal-order Mahler operator for the unique f ∈ F_p[[x]] with E(x, f(x)) = 0 and f(0) = 0.

R_0 = y
for s = 1, 2, ... do
    R_s = R_{s−1}^p mod E    (so that R_s = y^{p^s} mod E)
    if rank_{F_p(x)}(R_0, R_1, ..., R_s) < s + 1 then
        solve Σ_{k=0}^{s−1} c_k R_k = −c_s R_s for c_0, ..., c_s in F_p[x]
        return Σ_{k=0}^{s} c_k M^k

Algorithm 2. From algebraic equations to Mahler equations.
We are now ready to prove Theorem 5.
Proof of Theorem 5.
The correctness of Algorithm 1 follows from the discussion in Sec. 2.4. By Theorem 7, the output of Step 1 is a Mahler equation of order K ≤ d and coefficients c_j of maximum degree H ≤ dhp^K; in particular, H ≤ dhp^d. The cost of Step 1 is Õ(d^ω h p^d). Step 2 consists in computing the coefficients c'_j = c_j c_0^{p^j−2} for 1 ≤ j ≤ K, of maximum degree D ≤ Hp^K. It can be performed in time Õ(KHp^K) = Õ(d^2 h p^{2d}). Step 3 has negligible complexity Õ(dv) = Õ(d^2 h p^d), using the Kung-Traub algorithm [19]. Step 4 requires Õ(KHp^K) = Õ(d^2 h p^{2d}) operations. By Proposition 4, the last steps of the algorithm can be done using Õ(d^3 h^2 p^{3d} log N) operations. This dominates the whole computation.

Generically, the size bounds in Theorem 5 are quite tight. This is illustrated by the following example.

Example 5 (A cubic equation).
Let us consider the algebraic equation E(x, y) = x − (1 + x)y + x^2 y^2 + (1 + x)y^3 = 0 in F_3[x, y]. The assumption E_y(0, 0) = −1 ≠ 0 is fulfilled. We find a Mahler equation of order 3, whose coefficients c_0, c_1, c_2, c_3 have respectively heights 45, 47, 50, 32. We compute the solution f(x), and the negative part g_−(x) of g = f/c_0, to the needed precision. We find a new operator L'(x, M) by computing the product L(x, M) c_0(x) in F_3[x, M], and next a new equation L'(x, M) h(x) = b(x) for the nonnegative part h(x) of g(x), where b(x) is the nonnegative part of −L'(x, M) g_−(x). It appears that the coefficients c'_0 = 1, c'_1, c'_2, c'_3 and the right-hand side b(x) have respectively heights 0, 92, 365, 1157, and 1103. We arrive at a linear representation with size (1 + 3 + 1) × (1 + 1157) = 5790 for p = 3. If we increase the prime number p, we successively find sizes 155190 for p = 5, and 1342725 for p = 7. We do not pursue further, because it is clear that such sizes make the computation unfeasible.
3. USING DIAGONALS
To improve the arithmetic complexity with respect to the prime number p, we need a different way to represent the algebraic series f. It is given by Furstenberg's theorem, and relies on the concept of diagonals of rational functions.

Figure 1. A diagonal of a bivariate rational series (see Ex. 6).
Furstenberg’s theorem [15] tells us that every algebraicseries is the diagonal of a bivariate rational function f ( x ) = D a ( x, y ) b ( x, y ) , b (0 , = 0 . (12)This means that f ( x ) = P i c i,i x i , where a/b = P i,j c i,j x i y j .Moreover, under the working assumption E y (0 , = 0, therational function a/b is explicit in terms of E : a ( x, y ) = yE y ( xy, y ) , b ( x, y ) = E ( xy, y ) /y. (Notice that b (0 ,
0) = E y (0 , = 0.) If the polynomial E ( x, y )has degree d and height h , then the partial degrees of a and b are bounded as follows d x = max(deg x a, deg x b ) ≤ h,d y = max(deg y a, deg y b ) ≤ d + h. (13) Example 6 (A quartic equation).
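Furstenberg's construction is easy to exercise on a tiny example of our own choosing (not one of the paper's): for E = y^2 − y + x over F_7, the root vanishing at the origin is the generating series of the Catalan numbers shifted by one, and the diagonal of a/b reproduces it. A minimal Python sketch:

```python
# Furstenberg's construction for the assumed example E = y^2 - y + x over F_7:
# a = y E_y(xy, y) = 2y^2 - y, b = E(xy, y)/y = x + y - 1, and the diagonal
# of a/b must reproduce the root f of E with f(0) = 0.
p, m = 7, 40

# the root f, from the fixed point f = x + f^2 (equivalent to E(x, f) = 0)
f = [0] * m
for _ in range(m):
    sq = [0] * m
    for i, c in enumerate(f):
        if c:
            for j in range(m - i):
                sq[i + j] = (sq[i + j] + c * f[j]) % p
    f = [(sq[i] + (1 if i == 1 else 0)) % p for i in range(m)]

# expand a/b as a bivariate power series S (possible since b(0,0) = -1 != 0)
a = {(0, 1): p - 1, (0, 2): 2}                    # -y + 2y^2
b = {(0, 0): p - 1, (1, 0): 1, (0, 1): 1}         # -1 + x + y
inv = pow(b[(0, 0)], p - 2, p)
S = [[0] * m for _ in range(m)]
for i in range(m):
    for j in range(m):
        s = a.get((i, j), 0)
        for (k, l), c in b.items():
            if (k, l) != (0, 0) and k <= i and l <= j:
                s -= c * S[i - k][j - l]
        S[i][j] = s * inv % p

assert [S[i][i] for i in range(m)] == f           # the diagonal is f
```

Here f = x + x^2 + 2x^3 + 5x^4 + ..., the Catalan numbers mod 7, shifted by one.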
Example 6 (A quartic equation). Consider f(x) = x + x^2 + x^3 + 3x^4 + 5x^5 + 2x^6 + 4x^7 − x^8 − x^9 − x^10 + ..., the unique solution in x F_7[[x]] of the algebraic equation

E(x, y) = −x + (1 + x)y − (1 + x)y^2 − y^3 + (1 + x)y^4 = 0.

Then f is the diagonal of the rational function a/b with

a = y − (2 − x)y^2 − y^3 + (4 − x)y^4 + 4xy^5,
b = (1 − x) − (1 − x)y − y^2 + (1 − x)y^3 + xy^4.

Fig. 1 shows pictorially that f(x) is the diagonal of a/b.

We will follow the path drawn in [6], and compute the Nth coefficient f_N using a/b as intermediate data structure. Bivariate section operators can be defined by mimicking (6). Although they depend on two residues modulo p, we will always use them with the same residue for both variables; thus we will simply denote them by S_r for 0 ≤ r < p. A nice feature is that they commute with the diagonal operator,

S_r D = D S_r.

Together with (12), this property translates the action of the section operators from the algebraic series f to the rational function a/b. Moreover, property (7) extends to bivariate section operators and justifies the following computation:

S_r(a(x, y)/b(x, y)) = S_r(a(x, y) b(x, y)^{p−1} / b(x, y)^p) = S_r(a(x, y) b(x, y)^{p−1} / b(x^p, y^p)) = S_r(a(x, y) b(x, y)^{p−1}) / b(x, y).

Algorithm 3: Nth coefficient via diagonals

Input: A polynomial E(x, y) ∈ F_p[x, y] with E(0, 0) = 0 and E_y(0, 0) ≠ 0, and an integer N ≥ 0.
Output: The Nth coefficient f_N of the unique formal power series f ∈ F_p[[x]] solution of E(x, f(x)) = 0, f(0) = 0.

1. Compute a(x, y) = y E_y(xy, y) and b(x, y) = E(xy, y)/y, so that f = D(a/b).
2. Set d_x = max(deg_x a, deg_x b), d_y = max(deg_y a, deg_y b), L = [1, 0, ..., 0], and C the column vector expressing a(x, y) in the canonical basis of F_p[x, y]_{d_x,d_y}.
3. Compute B = b^{p−1}.
4. for r from 0 to p − 1 do: compute T_r(x^i y^j) = Σ_{n,m} A_{r;i,j;n,m} x^n y^m and store the result in the matrix A_r.
5. return f_N = L A_{N_ℓ} ... A_{N_0} C / b(0, 0), for N = (N_ℓ ... N_0)_p.

Algorithm 3. Nth coefficient, via diagonals.

To put it plainly, S_r(a/b) = S_r(a b^{p−1}) / b. This yields an action on bivariate polynomials v of what we may call pseudo-section operators, defined by

T_r v = S_r(v b^{p−1}),   0 ≤ r < p,   hence   (S_{r_1} ... S_{r_k})(a/b) = (T_{r_1} ... T_{r_k})(a) / b,   0 ≤ r_i < p.

At this point, it is not difficult to see that the vector subspace F_p[x, y]_{d_x,d_y}, where d_x and d_y are the maximal partial degrees of a and b defined in (13), is stable under the pseudo-section operators. The supports of b and of the polynomials v in F_p[x, y]_{d_x,d_y} belong to a small rectangle [0, d_x] × [0, d_y]. Raising b to the power p − 1 and multiplying by v results in a rectangle p times larger than the small rectangle. The pseudo-section operators, essentially based on a Euclidean division of the exponents by p, send the large rectangle back into the small rectangle. As a consequence, we have at our disposal a new linear representation (L, (A_r)_{0≤r<p}, C).
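The complete pipeline of Algorithm 3 can be sketched in a few lines of Python. We again use the assumed quadratic example E = y^2 − y + x over F_7 (ours, not the paper's running cubic), for which d_x = 1 and d_y = 2, so the linear representation has size 6:

```python
# Sketch of Algorithm 3 for the assumed example E = y^2 - y + x over F_7:
# a = 2y^2 - y, b = x + y - 1, so d_x = 1 and d_y = 2 (6x6 matrices).
p, dx, dy = 7, 1, 2
a = {(0, 1): p - 1, (0, 2): 2}
b = {(0, 0): p - 1, (1, 0): 1, (0, 1): 1}

def bmul(u, v):
    w = {}
    for (i, j), c in u.items():
        for (k, l), e in v.items():
            w[(i + k, j + l)] = (w.get((i + k, j + l), 0) + c * e) % p
    return {k: c for k, c in w.items() if c}

B = {(0, 0): 1}                       # step 3: B = b^(p-1)
for _ in range(p - 1):
    B = bmul(B, b)

def T(r, v):
    """Pseudo-section T_r v = S_r(v B): keep exponents congruent to r mod p
    in both variables, and divide them by p."""
    w = {}
    for (i, j), c in bmul(v, B).items():
        if i % p == r and j % p == r:
            w[(i // p, j // p)] = c
    return w

basis = [(i, j) for i in range(dx + 1) for j in range(dy + 1)]
ix = {mono: k for k, mono in enumerate(basis)}
A = []                                # step 4: the matrices A_0, ..., A_{p-1}
for r in range(p):
    M = [[0] * len(basis) for _ in basis]
    for col, mono in enumerate(basis):
        for mono2, c in T(r, {mono: 1}).items():
            M[ix[mono2]][col] = c
    A.append(M)

def f_N(N):
    """Step 5: f_N = L A_{N_l} ... A_{N_0} C / b(0,0), base-p digits of N."""
    v = [0] * len(basis)
    for mono, c in a.items():
        v[ix[mono]] = c               # C = coordinates of a
    while True:
        Ar = A[N % p]
        v = [sum(Ar[i][k] * v[k] for k in range(len(v))) % p
             for i in range(len(v))]
        if N < p:
            break
        N //= p
    return v[ix[(0, 0)]] * pow(b[(0, 0)], p - 2, p) % p

# cross-check against the plain series expansion of f (f = x + f^2)
m = 60
f = [0] * m
for _ in range(m):
    sq = [0] * m
    for i, c in enumerate(f):
        if c:
            for j in range(m - i):
                sq[i + j] = (sq[i + j] + c * f[j]) % p
    f = [(sq[i] + (1 if i == 1 else 0)) % p for i in range(m)]

assert all(f_N(n) == f[n] for n in range(m))
```

After the precomputation of B and the A_r, each call costs one 6 × 6 matrix-vector product per base-p digit of N, in line with the O(log N) claim.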
Example 7 (A cubic equation, cont.). We return to Ex. 5 with p = 7. We find a = −y − xy^2 + 3y^3 + 3xy^4 + 2x^2 y^4 and b = −1 + x − xy + y^2 + xy^3 + x^2 y^3, hence d_x = 2, d_y = 4. Next we compute B = b^{p−1} and we build the linear representation. The main point is the size of the representation, namely (1 + d_x)(1 + d_y) = 15. This is ridiculously small as compared to the value obtained in Ex. 5. Even more importantly, the size remains the same if we change the prime number.

Although it is not necessary to see the linear representation with our human eyes, it is quite normal to inspect it, if only to control the computations. The matrices A_r, whose size is (1 + d_x)(1 + d_y), have indices which are pairs of pairs. We can view them as square matrices of size 1 + d_x whose elements are square blocks of size 1 + d_y. The coordinate over x^i y^j of the image T_r(x^n y^m) is in the block with indices (i, n), at place (j, m). We display only the matrix A_1 (the zero coefficients are replaced by dots) in Fig. 2. It gives us, for example, the value of T_1(xy^2). Because the exponent of x is 1, we look at the column of blocks number 1 (the second one), and because the exponent of y is 2, we look at the column number 2 (the third one) in the matrices of this column. In the first block, which corresponds to x^0, we see a coefficient 1 in row number 1, hence a term x^0 y = y. In the second block, we see a coefficient 1 in row number 1, hence a term xy, and a coefficient 6 in row number 2, hence a term 6xy^2. We conclude T_1(xy^2) = y + xy + 6xy^2.

Figure 2. The matrix A_1 in Ex. 7.

All the needed information to build the linear representation is contained in the polynomial B = b^{p−1}. Indeed, when we compute the image of an element x^i y^j of the canonical basis by the pseudo-section operator T_r, we first multiply B by x^i y^j; this transformation is merely a translation of the exponents, which requires no arithmetic operation. Next, we extract the monomials whose exponents are congruent to r modulo p, again without any computation in F_p. The conclusion is that, apart from the final computation to obtain the value f_N, the cost comes from the computation of B = b^{p−1}. This can be done using binary powering and Kronecker's substitution.
Since B has partial degrees at most pd_x in x and at most pd_y in y, the arithmetic complexity is Õ(p^2 d_x d_y). Gathering the study of the computation and the precomputation, and using Eq. (13), we obtain the following result.

Theorem. Let E be in F_p[x, y]_{h,d} such that E(0, 0) = 0 and E_y(0, 0) ≠ 0, and let f ∈ F_p[[x]] be its unique root with f(0) = 0. Algorithm 3 computes the Nth coefficient f_N of f in O(h^2 (d + h)^2 log N) + Õ(p^2 h (d + h)) operations in F_p.

Figure 3. The useful coefficients of B = b^{p−1} are located in the diagonal strips. See Ex. 8 for explanations.

The striking points are the decrease of the exponent of p from 3d to 2, and the replacement of the multiplicative complexity p^{3d} × log N of Theorem 5 by the additive complexity p^2 + log N. This is already a tremendous improvement. In the next section, we will present a further improvement.
4. A FASTER ALGORITHM
We will improve the algorithm of the previous section in order to decrease the exponent of the prime p from 2 to 1. As pointed out in Sec. 3.3, all the information needed to build the square matrices A_0, ..., A_{p−1} of the linear representation is contained in the large power B = b^{p−1}. However, building the A_i's does not require the whole polynomial B. To see this, assume that 0 ≤ α ≤ (p−1)d_x and 0 ≤ β ≤ (p−1)d_y are such that the monomial x^α y^β of B contributes to one of the matrices A_r. Since A_{r;i,j;n,m} is the coefficient of x^n y^m in T_r(x^i y^j) = S_r(x^i y^j B), this means that i + α ≡ j + β mod p for some 0 ≤ i ≤ d_x and 0 ≤ j ≤ d_y. As a consequence, the difference of exponents δ = β − α necessarily belongs to [−d_y, d_x] + pZ. For large p, this means that only a fraction about 1/p of B is useful. The aim of this section is to take algorithmic advantage of this remark.

Example 8 (A cubic equation, cont.).
The previous phenomenon is exemplified in Fig. 3 for the equation E(x,y) = x − (1+x)y + x y^2 + (1+x)y^3 = 0, already encountered in Ex. 5 and 7. We use p = 11 on the left-hand side, and p = 109 on the right-hand side. The polynomial B is represented by its monomial support (in black). Besides, the points (α, β) defined by α = pk + r − i, β = pℓ + r − j, with 0 ≤ i ≤ d_x and 0 ≤ j ≤ d_y, are colored, with a color that goes from blue to red as r goes from 0 to p − 1. For p small (p = 11), the strips cover almost the whole rectangle, but for p moderately large (p = 109) the proportion of useful coefficients in B becomes much smaller.

To emphasize the difference of exponents δ = β − α, we perform the change of variables x := x/t, y := t, which produces Laurent polynomials with respect to t, namely b(x/t, t) and

B(x/t, t) = b(x/t, t)^{p−1} = Σ_δ π_δ(x) t^δ.

By construction, the coefficient π_δ(x) is a polynomial, namely the δ-diagonal Σ_i B_{i,i+δ} x^i of B(x,y) = Σ_{i,j} B_{i,j} x^i y^j. Let δ_− be the opposite of the lowest degree with respect to t of b(x/t, t), and δ_+ be the degree with respect to t of b(x/t, t). Immediate bounds are δ_− ≤ d_x and δ_+ ≤ d_y. By the previous discussion, all the information needed to build up the matrices of the linear representation is contained in the polynomials π_δ(x), for δ in

∆ := ([−d_y, d_x] + pZ) ∩ [(1−p)δ_−, (p−1)δ_+].

This set is a union of small intervals of length d_x + d_y + 1, regularly spaced by p, within [(1−p)d_x, (p−1)d_y]. Our aim is to show that the collection of the π_δ(x) for δ ∈ ∆ can be computed in complexity quasi-linear in p. The starting point is to write

B(x/t, t) = b(x/t, t)^p / b(x/t, t) = b(x^p/t^p, t^p) / b(x/t, t).

Here b(x^p/t^p, t^p) is obtained by a mere rewriting of b(x/t, t), hence at no cost, while 1/b(x/t, t) is a rational function.
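The "mere rewriting" step rests on the Frobenius identity b^p = b(x^p, y^p) over F_p. The following small Python check (our illustration, with an assumed example polynomial) verifies it, together with the extraction of a δ-diagonal from B = b^{p−1}.

```python
# Quick numerical check of the Frobenius identity behind the rewriting
# b(x/t,t)^p = b(x^p/t^p, t^p): over F_p, raising a polynomial to the power p
# just multiplies all exponents by p. Example polynomial b is an assumption.

def poly_mul(f, g, p):
    """Multiply bivariate polynomials over F_p (dicts {(i, j): coeff})."""
    h = {}
    for (i1, j1), c1 in f.items():
        for (i2, j2), c2 in g.items():
            k = (i1 + i2, j1 + j2)
            h[k] = (h.get(k, 0) + c1 * c2) % p
    return {k: c for k, c in h.items() if c}

def poly_pow(b, e, p):
    """Binary powering over F_p."""
    r, sq = {(0, 0): 1}, b
    while e:
        if e & 1:
            r = poly_mul(r, sq, p)
        sq = poly_mul(sq, sq, p)
        e >>= 1
    return r

def diagonal(B, delta):
    """The delta-diagonal pi_delta(x) of B, as {i: B[i, i + delta]}."""
    return {i: c for (i, j), c in B.items() if j - i == delta}

p = 7
b = {(0, 0): 1, (1, 0): 6, (0, 1): 6}                      # b = 1 - x - y over F_7
bp = poly_pow(b, p, p)                                     # b^p
frob = {(p * i, p * j): c for (i, j), c in b.items()}      # b(x^p, y^p)
print(bp == frob)                                          # True: Frobenius
```

The diagonals of B = b^{p−1} are then plain slices of the coefficient table, e.g. `diagonal(poly_pow(b, p - 1, p), 0)` for π_0.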
To be more concrete, we write

b(x/t, t) = Σ_{v=−δ_−}^{δ_+} b_v(x) t^v,    1/b(x/t, t) = Σ_{u ≥ δ_−} c_u(x) t^u,

where the coefficients b_v(x) are polynomials in x and the coefficients c_u(x) are rational functions in x. Now the polynomial π_δ(x) is expressed by a convolution:

π_δ(x) = Σ_{u+pv=δ} c_u(x) b_v(x^p).    (14)

The constraint u + pv = δ implies u ≡ δ mod p. Therefore, it is sufficient to compute the c_{δ'}(x) for all δ' ∈ ∆' := pδ_− + ∆. To do this efficiently, we exploit the fact that the sequence of rational functions (c_u(x))_u satisfies a linear recurrence with coefficients in F_p[x]: the coefficients of the recurrence relation are those of the polynomial t^{δ_−} b(x/t, t). We face the following issue: since we need the values c_{δ'}(x) for indices of order δ' = Θ(p), we cannot unroll the recurrence directly, otherwise the complexity (and actually the mere arithmetic size of the rational functions c_i(x), i < u, computed along the way) becomes quadratic in p. Fortunately, one can compute the Nth term of a linearly recurrent sequence without computing all the previous terms. The key is the following lemma.

Lemma 9 ([14]).
Let R be a commutative ring and a(t)/b(t) be a rational function in R(t) with b(0) invertible in R. Let d be the degree of b(t) and b*(t) = t^d b(1/t) be its reciprocal. For any integer N ≥ 0, the coefficients c_N, c_{N+1}, ..., c_{N+d−1} of the formal power series expansion a(t)/b(t) = Σ_{n≥0} c_n t^n are the coefficients of t^{d−1}, t^{d−2}, ..., t^0 in the product

(t^N mod b*(t)) × (c_{d−1} + c_{d−2} t + ··· + c_0 t^{d−1}).

Fiduccia's algorithm [14] relies on Lemma 9 and computes t^N mod b*(t) by binary powering. As a consequence, the Nth term of a linearly recurrent sequence of order d can be computed in Õ(d log N) operations in R. In our setting, we use Fiduccia's algorithm over R = F_p(x).

Theorem 10. Let b be in F_p[x,y]_{d_x,d_y} and let δ ∈ Z. Algorithm 4 computes the δ-diagonal π_δ(x) = Σ_i B_{i,i+δ} x^i of B = b^{p−1} in Õ(d_x(d_x+d_y)p) operations in F_p.

Proof. By (14), to compute π_δ it is sufficient to have the coefficients c_{δ−pv} for −d_x ≤ v ≤ d_y. The coefficient c_N has the form p(x)/q(x)^{N+1}, where deg p = O(N d_x) and deg q ≤ d_x. The most expensive step in Fiduccia's algorithm for c_N is computing t^N modulo a polynomial in F_p[x,t]_{d_x,d_x+d_y}. By Lemma 6, this can be done in time T_N = Õ(N d_x(d_x+d_y)). Summing the costs T_N over all N's of the form δ − pv with −d_x ≤ v ≤ d_y concludes the proof.

Algorithm PartialPowering
Input: A polynomial b in F_p[x,y] and δ ∈ Z.
Output: The δ-diagonal of B = b^{p−1}, that is, Σ_i B_{i,i+δ} x^i.

for v from val_t b(x/t,t) to deg_t b(x/t,t) do
    compute c_{δ−pv}(x) = [t^{δ−pv}] (1/b(x/t,t)) via Fiduccia's algorithm
return π_δ(x) = Σ_{u+pv=δ} c_u(x) b_v(x^p)

Algorithm 4: δ-diagonal of a bivariate polynomial power.

We finally combine the results of Sections 4.1 and 4.2 with those of Sec. 3, and modify Algorithm 3 accordingly, by replacing its Step 3 with the partial powering routine (Algorithm 4). This yields our main result.
Theorem 11. Let E in F_p[x,y]_{h,d} satisfy E(0,0) = 0 and E_y(0,0) ≠ 0, and let f ∈ F_p[[x]] be its unique root with f(0) = 0. One can compute the coefficient f_N of f in O(h(d+h) log N) + Õ(h(d+h)p) operations in F_p.

Proof.
Only the cost of the precomputation changes. Instead of computing the whole power B = b^{p−1}, we only compute its δ-diagonals π_δ(x), for δ ∈ ∆'. Since |∆'| = |∆| ≤ (d_x + d_y + 1)^2, we conclude using Theorem 10.

Example 9 (Catalan).
Let E(x,y) = y − x − y^2 in F_p[x,y], with root f = Σ_{n≥0} C_n x^{n+1}, where C_n = (1/(n+1))·(2n choose n) is the nth Catalan number. Then f is the diagonal of a/b with a = y(1 − 2y) and b = 1 − x − y. Thus d_x = 1, d_y = 2, and the linear representation has size 6. The information needed to compute the 6 × 6 matrices A_0, ..., A_{p−1} consists in the δ-diagonals of B = b^{p−1} for δ in

∆ := ([−2, 1] + pZ) ∩ [1 − p, p − 1] = {1 − p, −2, −1, 0, 1, p − 2, p − 1}.

Writing B(x/t, t) = (1 − x/t − t)^{p−1} = Σ_{δ=1−p}^{p−1} π_δ(x) t^δ, the goal is to compute the polynomials π_δ(x) for δ ∈ ∆. This is done via the formula B(x/t, t) = (1 − x^p/t^p − t^p)/(1 − x/t − t), which gives π_δ(x) = −c_{δ+p}(x) x^p + c_δ(x) − c_{δ−p}(x). Therefore, it is sufficient to compute the rational functions c_{δ'}(x) for all δ' ∈ ∆' = ∆ + p. This is done using Fiduccia's algorithm, which exploits the fact that the c_k's satisfy the recurrence x c_{k+2}(x) = c_{k+1}(x) − c_k(x). For instance, π_0(x) = −x^p c_p(x), where c_p(x) is obtained, by Lemma 9 with c_0 = 0 and c_1 = −1/x, as the coefficient of t in the product (t^p mod (−xt^2 + t − 1)) × (−1/x). Checking the value of π_0(x) amounts to checking the amusing identity

[t] (t^p mod (−xt^2 + t − 1)) = Σ_{ℓ=0}^{p−1} (2ℓ choose ℓ) x^{ℓ+1−p} in F_p[x, 1/x].

We have implemented the algorithms described in this section in Maple. Our prototype is available at the URL http://specfun.inria.fr/dumas/Research/AlgModp/. The implementation permits getting an idea of the feasibility of the approach using diagonals. For instance, in the case of the quartic equation in Ex. 6, with p = 9001 instead of p = 11, the algorithm spends 142 seconds on the precomputation, then computes the Nth coefficient for N = 10^…, resp. 10^…, resp. 10^…, in 3, resp. 25, resp. 344 seconds. (Timings on a personal laptop, Intel Core i5, 2.8 GHz, 3 MB.) In a forthcoming article, we plan to extend our algorithms to all finite fields, and to improve the complexity results.
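As a cross-check of Ex. 9 (our own verification, not part of the paper), the 0-diagonal of B = (1 − x − y)^{p−1} over F_p can be computed directly from the multinomial expansion and compared with the central binomial coefficients mod p, which underlie the identity above.

```python
# Numerical cross-check of Ex. 9: over F_p, the 0-diagonal of
# B = (1 - x - y)^(p-1) has coefficients binom(2l, l) mod p.
from math import comb, factorial

def central_diagonal(p):
    """Coefficients B_{i,i} of B = (1 - x - y)^(p-1) over F_p, from the
    multinomial expansion: B_{i,i} = (p-1)! / (i!^2 (p-1-2i)!) mod p."""
    out = []
    for i in range(p):
        if 2 * i <= p - 1:
            out.append(factorial(p - 1)
                       // (factorial(i) ** 2 * factorial(p - 1 - 2 * i)) % p)
        else:
            out.append(0)   # no monomial x^i y^i in B when 2i > p - 1
    return out

p = 11
diag = central_diagonal(p)
print(diag == [comb(2 * l, l) % p for l in range(p)])   # True
```

Note that the comparison holds for all indices up to p − 1: beyond ℓ = (p−1)/2 both sides vanish mod p (by Lucas' theorem on the right-hand side).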
5. REFERENCES

[1] J.-P. Allouche and J. Shallit. The ring of k-regular sequences. Theoret. Comput. Sci., 98(2):163–197, 1992.
[2] J.-P. Allouche and J. Shallit. Automatic sequences. Cambridge University Press, 2003. Theory, applications, generalizations.
[3] A. Bostan, F. Chyzak, G. Lecerf, B. Salvy, and É. Schost. Differential equations for algebraic functions. In ISSAC'07, pages 25–32. ACM Press, 2007.
[4] A. Bostan, P. Gaudry, and É. Schost. Linear recurrences with polynomial coefficients and application to integer factorization and Cartier-Manin operator. SIAM Journal on Computing, 36(6):1777–1806, 2007.
[5] D. G. Cantor and E. Kaltofen. On fast multiplication of polynomials over arbitrary algebras. Acta Inform., 28(7):693–701, 1991.
[6] G. Christol. Éléments algébriques. In Groupe de travail d'Analyse Ultra-métrique (1ère année : 1973/74), number 14, 10 pp. Secrétariat Mathématique, Paris, 1974.
[7] G. Christol. Ensembles presque periodiques k-reconnaissables. Theoret. Comput. Sci., 9(1):141–145, 1979.
[8] G. Christol, T. Kamae, M. Mendès France, and G. Rauzy. Suites algébriques, automates et substitutions. Bull. Soc. Math. France, 108(4):401–419, 1980.
[9] D. V. Chudnovsky and G. V. Chudnovsky. On expansion of algebraic functions in power and Puiseux series. I. J. Complexity, 2(4):271–294, 1986.
[10] D. V. Chudnovsky and G. V. Chudnovsky. Approximations and complex multiplication according to Ramanujan. In Ramanujan revisited (Urbana-Champaign, 1987), pages 375–472. 1988.
[11] D. V. Chudnovsky and G. V. Chudnovsky. Computer algebra in the service of mathematical physics and number theory. In Computers in mathematics (Stanford, 1986), volume 125 of Lecture Notes in Pure and Appl. Math., pages 109–232. 1990.
[12] L. Comtet. Calcul pratique des coefficients de Taylor d'une fonction algébrique. Enseignement Math., 10:267–270, 1964.
[13] P. Dumas. Récurrences mahlériennes, suites automatiques, études asymptotiques. PhD thesis, Univ. Bordeaux I, 1993. Available at https://tel.archives-ouvertes.fr/tel-00614660.
[14] C. M. Fiduccia. An efficient formula for linear recurrences. SIAM J. Comput., 14(1):106–112, 1985.
[15] H. Furstenberg. Algebraic functions over finite fields. J. Algebra, 7:271–277, 1967.
[16] F. Le Gall. Powers of tensors and fast matrix multiplication. In ISSAC'14, pages 296–303, 2014.
[17] J. von zur Gathen and J. Gerhard. Modern computer algebra. Cambridge University Press, third edition, 2013.
[18] D. Harvey, J. van der Hoeven, and G. Lecerf. Faster polynomial multiplication over finite fields. http://arxiv.org/abs/1407.3361, 2014.
[19] H. T. Kung and J. F. Traub. All algebraic functions can be computed fast. J. Assoc. Comput. Mach., 25(2):245–260, 1978.
[20] J. C. P. Miller and D. J. Spencer Brown. An algorithm for evaluation of remote terms in a linear recurrence sequence. Computer Journal, 9:188–190, 1966.
[21] K. Nishioka. Mahler Functions and Transcendence. Number 1631 in Lecture Notes in Mathematics. Springer, Berlin, 1996.
[22] E. Rowland and R. Yassawi. Automatic congruences for diagonals of rational functions. J. Théor. Nombres Bordeaux, 27(1):245–288, 2015.
[23] A. Schönhage. Schnelle Multiplikation von Polynomen über Körpern der Charakteristik 2. Acta Informat., 7:395–398, 1977.
[24] A. Storjohann. High-order lifting and integrality certification.