Computing syzygies in finite dimension using fast linear algebra
Vincent Neiger
Univ. Limoges, CNRS, XLIM, UMR 7252, F-87000 Limoges, France
Éric Schost
University of Waterloo, Waterloo ON, Canada
Abstract
We consider the computation of syzygies of multivariate polynomials in a finite-dimensional setting: for a K[X_1, ..., X_r]-module M of finite dimension D as a K-vector space, and given elements f_1, ..., f_m in M, the problem is to compute syzygies between the f_i's, that is, polynomials (p_1, ..., p_m) in K[X_1, ..., X_r]^m such that p_1 · f_1 + ··· + p_m · f_m = 0 in M. Assuming that the multiplication matrices of the r variables with respect to some basis of M are known, we give an algorithm which computes the reduced Gröbner basis of the module of these syzygies, for any monomial order, using O(mD^(ω-1) + rD^ω log(D)) operations in the base field K, where ω is the exponent of matrix multiplication. Furthermore, assuming that M is itself given as M = K[X_1, ..., X_r]^n / N, under some assumptions on N we show that these multiplication matrices can be computed from a Gröbner basis of N within the same complexity bound. In particular, taking n = m = 1 and f_1 = 1 in M, this yields a change of monomial order algorithm along the lines of the FGLM algorithm with a complexity bound which is sub-cubic in D.

Keywords:
Gröbner basis, syzygies, complexity, fast linear algebra.
1. Introduction
In what follows, K is a field and we consider the polynomial ring K[X] = K[X_1, ..., X_r]. The set of m × n matrices over a ring R is denoted by R^(m×n); when orientation matters, a vector in R^n is considered as being in R^(1×n) (row vector) or in R^(n×1) (column vector). We are interested in the efficient computation of relations, known as syzygies, between elements of a K[X]-module M. Let us write the K[X]-action on M as (p, f) ∈ K[X] × M ↦ p · f, and let f_1, ..., f_m be in M. Then, for a given monomial order ≺ on K[X]^m, we want to compute the Gröbner basis of the kernel of the homomorphism

K[X]^m → M, (p_1, ..., p_m) ↦ p_1 · f_1 + ··· + p_m · f_m.

This kernel is called the module of syzygies of (f_1, ..., f_m) and written Syz_M(f_1, ..., f_m). In this paper, we focus on the case where M has finite dimension D as a K-vector space; as a result, the quotient K[X]^m / Syz_M(f_1, ..., f_m) has dimension at most D as a K-vector space. Then one may adopt a linear algebra viewpoint detailed in the next paragraph, where the elements of M are seen as row vectors of length D and the multiplication by the variables is represented by so-called multiplication matrices. This representation was used and studied in [2, 37, 1, 27], mainly in the context where M is a quotient K[X]/I for some ideal I (thus zero-dimensional of degree D) and more generally a quotient K[X]^n/N for some submodule N ⊆ K[X]^n with n ∈ N_(>0) (see [1, Sec. 4.4 and 6]). This representation with multiplication matrices allows one to perform computations in such a quotient via linear algebra operations.

Assume we are given a K-vector space basis F of M. For i in {1, ..., r}, the matrix of the structure morphism f ↦ X_i · f with respect to this basis is denoted by M_i; this means that for f in M represented by the vector f ∈ K^(1×D) of its coefficients on F, the coefficients of X_i · f ∈ M on F are f M_i. We call M_1, ..., M_r multiplication matrices; note that they are pairwise commuting. The data formed by these matrices defines the module M up to isomorphism; we use it as a representation of M. For p in K[X] and for f in M represented by the vector f ∈ K^(1×D) of its coefficients on F, the coefficients of p · f ∈ M on F are f p(M_1, ..., M_r); hereafter this vector is written p · f. From this point of view, syzygy modules can be described as follows.

Definition 1.1.
For m and D in N_(>0), let M = (M_1, ..., M_r) be pairwise commuting matrices in K^(D×D), and let F ∈ K^(m×D). Denoting by f_1, ..., f_m the rows of F, for p = (p_1, ..., p_m) ∈ K[X]^m we write

p · F = p_1 · f_1 + ··· + p_m · f_m = f_1 p_1(M) + ··· + f_m p_m(M) ∈ K^(1×D).

The syzygy module Syz_M(F), whose elements are called syzygies for F, is defined as Syz_M(F) = {p ∈ K[X]^m | p · F = 0}; as noted above, K[X]^m / Syz_M(F) has dimension at most D as a K-vector space.

In particular, if in the above context F is the matrix of the coefficients of f_1, ..., f_m ∈ M on the basis F, then Syz_M(F) = Syz_M(f_1, ..., f_m). Our main goal in this paper is to give a fast algorithm to solve the following problem (for the notions of monomial orders and Gröbner bases for modules, we refer to [13] and the overview in Section 2).

Problem 1 – Gröbner basis of syzygies
Input:
• a monomial order ≺ on K[X]^m,
• pairwise commuting matrices M = (M_1, ..., M_r) in K^(D×D),
• a matrix F ∈ K^(m×D).
Output: the reduced ≺-Gröbner basis of Syz_M(F).

Example 1.2. An important class of examples has r = 1; in this case, we are working with univariate polynomials. Restricting further, consider the case M = K[X_1]/⟨X_1^D⟩ endowed with the canonical K[X_1]-module structure; then M_1 is the upper shift matrix, whose entries are all 0 except those on the superdiagonal which are 1. Given f_1, ..., f_m in M, (p_1, ..., p_m) ∈ K[X_1]^m is a syzygy for f_1, ..., f_m if p_1 f_1 + ··· + p_m f_m = 0 mod X_1^D. Such syzygies are known as Hermite-Padé approximants of (f_1, ..., f_m) [22, 40]. Using moduli other than X_1^D leads one to generalizations such as M-Padé approximants or rational interpolants (corresponding to a modulus that splits into linear factors) [33, 4, 49].

For r = 1, Syz_M(F) is known to be free of rank m. Bases of such K[X_1]-modules are often described by means of their so-called Popov form [43, 26]. In commutative algebra terms, this is a term over position Gröbner basis. Another common choice is the Hermite form, which is a position over term Gröbner basis [29].
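Definition 1.1 and Example 1.2 can be made concrete with a small sketch in Python (the instance, helper names, and row-vector conventions below are ours, chosen to mirror the paper's f p(M) formulation): it builds the upper shift matrix M_1 for M = K[X_1]/⟨X_1^4⟩ and checks that (p_1, p_2) = (1, −1 − X_1) is a syzygy of (f_1, f_2) = (1 + X_1, 1).

```python
# Illustrative sketch of Example 1.2 (Hermite-Pade syzygies), with D = 4.
# Row-vector convention of the paper: the coefficients of p.f are f * p(M1).
D = 4

def shift_matrix(D):
    # multiplication matrix of X on K[X]/<X^D>, basis (1, X, ..., X^(D-1))
    return [[1 if j == i + 1 else 0 for j in range(D)] for i in range(D)]

def vecmat(v, M):
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

def apply_poly(p, f, M):
    # coefficient row vector of p.f, i.e. f * p(M); p given low-degree first
    out, power = [0] * len(f), f[:]
    for c in p:
        out = [o + c * w for o, w in zip(out, power)]
        power = vecmat(power, M)
    return out

M1 = shift_matrix(D)
f1 = [1, 1, 0, 0]                      # f1 = 1 + X  mod X^4
f2 = [1, 0, 0, 0]                      # f2 = 1      mod X^4
p1, p2 = [1], [-1, -1]                 # (p1, p2) = (1, -1 - X)
syzygy = [a + b for a, b in zip(apply_poly(p1, f1, M1),
                                apply_poly(p2, f2, M1))]
# syzygy is the zero vector since p1*f1 + p2*f2 = 0 mod X^4
```

The same `apply_poly` routine expresses the general p · F of Definition 1.1 as a sum over the rows of F.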
Example 1.3. For arbitrary r, let I be a zero-dimensional ideal in K[X] and let M = K[X]/I with the canonical K[X]-module structure. Then, taking m = 1 and f_1 = 1 ∈ M, we have Syz_M(f_1) = {p ∈ K[X] | p f_1 = 0} = {p ∈ K[X] | p ∈ I} = I.

Suppose we know a Gröbner basis of I for some monomial order ≺_1, together with the corresponding monomial basis of M, and the multiplication matrices of X_1, ..., X_r in M. Then solving Problem 1 amounts to computing the Gröbner basis of I for the new order ≺_2. More generally for a given f = f_1 ∈ M, the case m = 1 corresponds to computing the annihilator of f in K[X], often denoted by Ann_(K[X])({f}). Indeed, the latter set is defined as {p ∈ K[X] | p f = 0}, which is precisely Syz_M(f).

Example 1.4. For an arbitrary r, let α_1, ..., α_D be pairwise distinct points in K^r, with α_i = (α_(i,1), ..., α_(i,r)) for all i. Let I be the vanishing ideal of {α_1, ..., α_D} and M = K[X]/I. As above, take m = 1 and f_1 = 1, so that Syz_M(1) = I.

The Chinese Remainder Theorem gives an explicit isomorphism M → K^D that amounts to evaluation at α_1, ..., α_D. The multiplication matrices induced by this module structure on K^D are diagonal, with M_j having diagonal (α_(1,j), ..., α_(D,j)) for 1 ≤ j ≤ r. Taking F = [1 ··· 1] ∈ K^(1×D), solving Problem 1 allows us to compute the Gröbner basis of the vanishing ideal I for any given order ≺. This problem was introduced by Möller and Buchberger [36]; it may be extended to cases where vanishing multiplicities are prescribed [35].

Example 1.5. Now we consider an extension of the Möller-Buchberger problem due to Kehrein, Kreuzer and Robbiano [27]. Given r pairwise commuting d × d matrices N_1, ..., N_r, we look for their ideal of syzygies, that is, the ideal I of polynomials p ∈ K[X] such that p(N_1, ..., N_r) = 0. When r =
1, this ideal is generated by the minimal polynomial of N_1.

One may see this problem in our framework by considering M = K^(d×d) endowed with the K[X]-module structure given by X_k · A = A N_k for all 1 ≤ k ≤ r and A ∈ K^(d×d). The ideal I defined above is the module of syzygies Syz_M(f_1) of the identity matrix f_1 = I_d ∈ M, so we have m = 1 and D = d^2 here. To form the input of Problem 1, we choose as a basis of M the list of elementary matrices F = (c_(1,1), ..., c_(1,d), ..., c_(d,1), ..., c_(d,d)) where c_(i,j) is the matrix in K^(d×d) whose only nonzero entry is a 1 at index (i, j). Then, for k ∈ {1, ..., r}, the multiplication matrix M_k is the Kronecker product I_d ⊗ N_k, that is, the block diagonal matrix in K^(d^2×d^2) with d diagonal blocks, each of them equal to N_k. Besides, the input F ∈ K^(1×d^2) is the vector of coordinates of f_1 = I_d on the basis F, so that I = Syz_M(F) where M = (M_1, ..., M_r).

Example 1.6. Our last example is a multivariate extension of Hermite-Padé approximants and involves arbitrary parameters r ≥ m ≥ 2. For a positive integer d, consider the ideal I = ⟨X_1^d, ..., X_r^d⟩, and let M = K[X]/I. Then for given f_2, ..., f_m in M, which may be seen as polynomials truncated in degree d in each variable, the syzygy module Syz_M(−1, f_2, ..., f_m) is

{(p, q_2, ..., q_m) ∈ K[X]^m | p = q_2 f_2 + ··· + q_m f_m mod ⟨X_1^d, ..., X_r^d⟩}.

It was shown in [19, Thm. 3.1] that this module is generated by {f_i c_1 + c_i | 2 ≤ i ≤ m} ∪ {X_k^d c_i | 1 ≤ k ≤ r, 1 ≤ i ≤ m}, where c_i is the coordinate vector (0, ..., 0, 1, 0, ..., 0) ∈ K[X]^m with 1 at index i. In the same reference, two algorithms are given to find the Gröbner basis of this syzygy module for arbitrary monomial orders. One algorithm uses the FGLM change of order (extended to modules), based on the fact that the above generating set is a Gröbner basis for a well-chosen order. The other one proceeds iteratively on the d^r vanishing conditions; this approach is similar to the above-mentioned algorithm of [35], and can be seen as a multivariate generalization of the classical iterative algorithm for univariate Hermite-Padé approximation [49, 5].

For the linear algebra viewpoint used in this paper, consider the K-vector space basis F of M formed by all monomials X_1^(e_1) ··· X_r^(e_r) for 0 ≤ e_1, ..., e_r < d, ordered by the lexicographic order ≺_lex with X_r ≺_lex ··· ≺_lex X_1. Computing bases of the above syzygy module is an instance of Problem 1 with D = d^r, taking for F the matrix of the coefficients of (−1, f_2, ..., f_m) on the basis F, and for M_1, ..., M_r the multiplication matrices of X_1, ..., X_r in M with respect to F. These matrices are types of block upper shift matrices which are nilpotent of order d.
Taking r = 2, M_2 is block-diagonal with all diagonal blocks equal to the d × d upper shift matrix, while M_1 is a matrix formed by d rows and d columns of d × d blocks which are all zero except those on the (blockwise) superdiagonal which are equal to the d × d identity matrix.

Main result. For r variables and an input module of vector space dimension D, we design an algorithm whose cost is essentially that of performing fast linear algebra operations with r scalar matrices of dimensions D × D. In the rest of the paper, ω is a feasible exponent for matrix multiplication over the field K; the best known bound is ω < 2.38 [12, 30].
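Returning to Example 1.6, its block structure for r = 2 can be checked on a small instance. In the sketch below (pure Python, d = 3; the helper names and Kronecker-product formulation are ours), M_1 = S ⊗ I and M_2 = I ⊗ S for S the d × d upper shift matrix, matching the blockwise and block-diagonal shifts just described.

```python
# Sketch of the r = 2 multiplication matrices of Example 1.6 with d = 3:
# basis of monomials X1^e1 X2^e2 (0 <= e1, e2 < d) grouped by e1, so that
# M1 = S (x) I (blockwise shift) and M2 = I (x) S (block-diagonal shift).
d = 3

def kron(A, B):
    p = len(B)
    return [[A[i // p][j // p] * B[i % p][j % p] for j in range(len(A) * p)]
            for i in range(len(A) * p)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

S = [[1 if j == i + 1 else 0 for j in range(d)] for i in range(d)]
I = [[1 if i == j else 0 for j in range(d)] for i in range(d)]
M1, M2 = kron(S, I), kron(I, S)

commute = matmul(M1, M2) == matmul(M2, M1)     # pairwise commuting
M1_cubed = matmul(matmul(M1, M1), M1)          # nilpotent of order d
```

Both properties claimed in the example hold: the matrices commute, and each is nilpotent of order d.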
Theorem 1.7.
Let ≺ be a monomial order on K[X]^m, let M_1, ..., M_r be pairwise commuting matrices in K^(D×D), and let F ∈ K^(m×D). Then there is an algorithm which solves Problem 1 using

O(mD^(ω-1) + D^ω (r + log(d_1 ··· d_r))) ⊂ O(mD^(ω-1) + rD^ω log(D))

operations in K, where d_k ∈ {1, ..., D} is the degree of the minimal polynomial of M_k, for 1 ≤ k ≤ r.

This theorem is proved in Section 3, based on Algorithm 3 which is built upon Algorithm 1 (computing the monomial basis) and Algorithm 2 (computing normal forms). Commonly encountered situations involve m ≤ D (see Examples 1.3 to 1.5), in which case the cost bound can be simplified as O(rD^ω log(D)). The interest in the more precise cost bound involving d_1, ..., d_r comes from situations such as that of Example 1.6, where d_1 ··· d_r = d^r = D since all the matrices M_1, ..., M_r have a minimal polynomial of degree d; in that case, the first cost bound in the theorem becomes O(mD^(ω-1) + rD^ω + D^ω log(D)). Another refinement of the cost bound is given in Remark 3.11 for a particular order ≺, namely the term over position lexicographic order.

Our algorithm deals with the multiplication matrices M_1, ..., M_r one after another, allowing us to rely on an approach inspired by that designed for the univariate case r = 1 in [24, Sec. 7]: we avoid computing many vector-matrix products each involving a different matrix M_k, and instead group these operations into some matrix-matrix products involving M_1, then some others involving M_2, etc. To our knowledge, this is the first time a general answer is given to Problem 1. For problems such as Examples 1.3 to 1.6, ours is the first algorithm that relies on fast linear algebra techniques, with the partial exception of [14], as discussed below.

Our cost bound can be compared to the input and output size of the problem. The input
The inputmatrices are represented using mD + rD field elements, and we will see in Sections 2 and 3 thatone can represent the output Gr¨obner basis using at most mD + rD field elements as well.4 verview of the algorithm . To introduce matrix multiplication in our solution to Problem 1, werely on a linearization into linear algebra problems over K . From M and F , we build a matrixover K whose nullspace corresponds to a set of syzygies in Syz M ( F ). This matrix is called a multi-Krylov matrix , in reference to its structure which exhibits a collection of Krylov subspacesof K D .The multi-Krylov matrix is a generalization to several variables and to an arbitrary monomialorder of the (striped-)Krylov matrices considered for example in [26, 7]; already in [26, Chap. 6],Popov and Hermite bases of modules over the univariate polynomials are obtained by meansof Krylov matrix computations. Its construction is similar to the Sylvester matrix and moregenerally to the Macaulay matrix [47, 31, 32], which are commonly used when adopting a linearalgebra viewpoint while dealing with operations on univariate and multivariate polynomials.In what follows, given e = ( e , . . . , e r ) in N r , we write X e = X e · · · X e r r and M e = M e · · · M e r r ;the i th coordinate vector in K m is written c i . Then the construction of the multi-Krylov matrix isbased on viewing the product X e c i · F , for a monomial X e c i ∈ K [ X ] m and F in K m × D , as f i M e ,where f i is the i th row of F . Since a polynomial in K [ X ] m is a K -linear combination of mono-mials, this identity means that a syzygy in Syz M ( F ) may be interpreted as a K -linear relationbetween row vectors of the form f i M e .Choosing some degree bounds β = ( β , . . . , β r ) ∈ N r > , our multi-Krylov matrix is thenformed by all such rows f i M e , for 1 (cid:54) i (cid:54) m and (cid:54) e < β entrywise, ordered according to themonomial order ≺ . 
For sufficiently large β (taking β_k = D for all k is enough, for instance), we show in Section 3.2 that the row rank profile of this matrix corresponds to the ≺-monomial basis of the quotient K[X]^m / Syz_M(F).

The main task of our algorithm is to compute this row rank profile. Adapting ideas in the algorithms of [16, 35, 19], one would iteratively consider the rows of the matrix, looking for a linear relation with the previous rows by Gaussian elimination. When such a linear relation is found, the corresponding row can be discarded. Now, the multi-Krylov structure further permits to discard all the rows that correspond to monomial multiples of the leading term of the discovered syzygy, even before computing these rows. At some point, the set of rows to be considered is exhausted, and we can deduce the row rank profile.

In this approach, a row of the multi-Krylov matrix is computed by multiplying one of the already computed rows by one of the multiplication matrices. This results in many vector-matrix products, with possibly different matrices each time: this is an obstacle towards incorporating fast matrix multiplication. We circumvent this by introducing the variables one after another, thus seemingly not respecting the order of the rows specified by the monomial order; yet, we will manage to ensure that this order is respected in the end. When dealing with one variable X_k, we process successive powers of M_k in the style of Keller-Gehrig's algorithm [28], using a logarithmic number of iterations.

Finally, from the monomial basis, one can easily find the minimal generating set of the leading module of Syz_M(F). The union of the monomial basis and of these generators is a set of monomials, which thus corresponds to a submatrix of the multi-Krylov matrix; the left nullspace of this submatrix, computed in reduced row echelon form, yields the reduced ≺-Gröbner basis of syzygies.

Previous work.
An immediate remark is that the number of field entries of the multi-Krylov matrix is m β_1 ··· β_r D ∈ O(mD^(r+1)), which significantly exceeds our target cost. Exploiting the structure of this matrix is therefore a common thread in all efficient algorithms. For the univariate case r =
1, first algorithms with cost quadratic in D were given in [45, 42] for Hermite-Padé approximation (Example 1.2); they returned a single syzygy of small degree. Later, work on this case [48, 5] showed how to compute a basis of the module of syzygies in time O(mD^2), and the more general M-Padé approximation was handled in the same complexity in [49, 4, 6, 7], with the algorithm of [7] returning the reduced ≺-Gröbner basis (called the shifted Popov form in that context) of syzygies for an arbitrary monomial order ≺. Then a cost quasi-linear in D was achieved for M-Padé approximation by a divide and conquer approach based on fast polynomial matrix multiplication [5, 21, 24], with the most recent algorithms returning the reduced ≺-Gröbner basis of syzygies for an arbitrary ≺ at such a cost [23, 25]. M-Padé approximation exactly corresponds to instances of Problem 1 with r = 1 and a multiplication matrix in Jordan normal form. To achieve a cost quasi-linear in D, it is necessary that the multiplication matrix exhibits such a structure and that the algorithm takes advantage of it, since in general merely representing the input multiplication matrix already requires D^2 field elements. Still for r =
1, the case of an upper triangular multiplication matrix M_1 was handled in [7, Algo. FFFG] by relying on the kind of linearization discussed above. This algorithm exploits the triangular shape to iterate on the D leading principal submatrices of M_1, which extends the iteration for M-Padé approximation in [49, 4, 6] and costs O(mD^2 + D^3) operations (see [38, Prop. 6.5] for a complexity analysis). To take advantage of fast matrix multiplication, another approach was designed in [24, Sec. 7], considering the same Krylov matrix as in [7] but processing its structure in the style of Keller-Gehrig's algorithm [28]. The algorithm in [24] supports an arbitrary matrix M_1 at the cost O(mD^(ω-1) + D^ω log(d)), where d is the degree of the minimal polynomial of M_1, and the algorithm in this paper can be seen as a generalization of it to the multivariate case since both coincide when r = 1 (up to the identification of shifted Popov forms with reduced ≺-Gröbner bases). This generalization is not straightforward: besides the obvious fact that the output basis usually has more than m elements because most submodules of K[X]^m are not free (unlike in the univariate case), as highlighted above the multivariate case also involves a more complex management of the order in which rows in the multi-Krylov matrix are inserted and processed, in relation with the monomial order ≺ and the successive introductions of the different variables.

On the other hand, previous algorithms dealing with the case r ≥ 2 include Möller and Buchberger's algorithm [36] and Faugère et al.'s FGLM algorithm [16]. The former computes the ideal of a finite set of points (Example 1.4), while the latter is specialized to the change of monomial order for ideals (Example 1.3), with a cost bound in O(rD^3). Note however that the input in the FGLM algorithm is not the same as in Problem 1; this is discussed below.

A first generalization of [36, 16] was presented in [35, Algo. 1], still in the case m = 1 and f_1 =
1: the input describes M = K[X]/I for some zero-dimensional ideal I of degree D, and the algorithm outputs a ≺-Gröbner basis of I. The cost bound for this algorithm involves a term O(rD^3), but also a term related to the description of the input. This input description is different from ours: it consists of a set of linear functionals which defines the ideal I; thus one should be careful when comparing this work to our results. Another related extension of [36] is the Buchberger-Möller algorithm for matrices given in [27, Sec. 4.1.2], which solves Example 1.5; the runtime is not specified in that reference.

Another type of generalizations of [36] was detailed in [35, Algo. 2], [19, Algo. 4.7], and [39, Algo. 3.2], with the last reference tackling syzygy modules with m ≥ 2 and arbitrary f_1, ..., f_m like in this paper, yet with assumptions on M; the cost bounds given in [35] and [19] involve a term in O(rD^3) whereas [39] does not report on complexity bounds. The assumptions on M are specified in the input of [35, Algo. 2], in [19, Eqn. (4.1)], and in [39, Eqn. (5)], and they imply that one can solve the problem by finding iteratively Gröbner bases for a sequence of "approximating" modules of syzygies which decrease towards the actual solution. Such assumptions lead to instances of Problem 1 which generalize to r ≥ 2 the M-Padé approximation discussed above for r =
1; [39, Sec. 5] explicitly mentions the link with Beckermann and Labahn's algorithm for M-Padé approximation [6, 7]. In both the univariate and multivariate settings, it seems that such an iterative approach cannot be applied to the general case of Problem 1, where the input module has no other property than being finite-dimensional, and is described through arbitrary commuting matrices.

For the particular case of the change of order for ideals (Example 1.3), so with m =
1, when the target order is the lexicographic order ≺_lex, and under the assumption that the ideal is in Shape Position, fast matrix multiplication was used for the first time in [15], yielding a sub-cubic complexity. Indeed, if for example X_r is the smallest variable, these assumptions ensure that only M_r is needed; with this matrix as input, [15, Prop. 3] gives a probabilistic algorithm to compute the ≺_lex-Gröbner basis of syzygies within the cost bound O(D^ω log(D) + r M(D) log(D)), where M(·) is a cost bound for univariate polynomial multiplication. Besides ideas from [17, 18], this uses repeated squaring as in [28]. In this paper, we manage to incorporate fast matrix multiplication without assumption on the module, and for an arbitrary order.

Still for the particular case of Example 1.3, Faugère and Mou give in [17, 18] probabilistic algorithms based on sparse linear algebra. These papers do not consider the computation of the multiplication matrices, which are assumed to be known. While we do not make any sparsity assumption on the multiplication matrices, for the sake of comparison we still summarize this approach below. Noticing that the multiplication matrices arising from the context of polynomial system solving are often sparse, [17, 18] tackle Problem 1 from a point of view similar to the Wiedemann algorithm. Evaluating the monomials in K[X]/I at certain linear functionals allows one to build a multi-dimensional recurrent sequence which admits I as its ideal of syzygies (this is only true for some type of ideals I). In terms of the multi-Krylov matrices we are considering in Section 3 concerning Problem 1, this is similar to introducing an additional projection on the right of the multiplication matrices to take advantage of the sparsity by using a black-box point of view. Then recovering a ≺_lex-Gröbner basis of this ideal of syzygies can be done via the Berlekamp-Massey-Sakata algorithm [44], or the recent improvements in [8, 9, 10, 11].
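The black-box viewpoint just summarized can be illustrated by a toy sketch (ours, not the algorithm of [17, 18]): on a three-point instance with diagonal multiplication matrices, only the scalars s(e) = f M^e v are formed for a right projection v, and polynomials of the ideal translate into linear recurrences on this multi-dimensional sequence.

```python
# Toy sketch of the sparse/black-box viewpoint: project Krylov rows on the
# right, keeping only scalars s(e1, e2) = f * M1^e1 * M2^e2 * v.
def vecmat(v, M):
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

# diagonal multiplication matrices of the points (0,1), (1,0), (1,1)
M1 = [[0, 0, 0], [0, 1, 0], [0, 0, 1]]
M2 = [[1, 0, 0], [0, 0, 0], [0, 0, 1]]
f, v = [1, 1, 1], [3, 5, 7]            # v: an arbitrary projection vector

def s(e1, e2):
    w = f[:]
    for _ in range(e1):
        w = vecmat(w, M1)
    for _ in range(e2):
        w = vecmat(w, M2)
    return sum(wi * vi for wi, vi in zip(w, v))

# X1*X2 - X1 - X2 + 1 vanishes on the three points, so the sequence s
# satisfies s(e1+1, e2+1) - s(e1+1, e2) - s(e1, e2+1) + s(e1, e2) = 0
check = s(1, 1) - s(1, 0) - s(0, 1) + s(0, 0)
```

Recovering the generators of such recurrences from the scalars s(e) is exactly the task addressed by the Berlekamp-Massey-Sakata family of algorithms mentioned above.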
The FGLM algorithm [16] solves the change of order problemfor Gr¨obner bases of ideals in O ( rD ) operations for arbitrary orders ≺ and ≺ : starting only from a ≺ -Gr¨obner basis G for the input order ≺ , it computes the ≺ -Gr¨obner basis G of theideal I = (cid:104)G (cid:105) . Following [17, Sec. 2.1], one can view the algorithm as a two-step process: it firstcomputes from G the multiplication matrices of M = K [ X ] / I with respect to the ≺ -monomialbasis, and then finds G as a set of K -linear relations between certain normal forms modulo G .The algorithm extends to the case of submodules of K [ X ] n for n (cid:62) ≺ -Gr¨obner basis in O ( nD ω − + rD ω log( D )) operations. We now discuss how fast linear algebra may be incorporated into thecomputation of the multiplication matrices (Problem 2).Our solution to this problem finds its roots in [15, Sec. 4], which focuses on the case where n = I = (cid:104) f , . . . , f s (cid:105) is an ideal in K [ X ]. In the context studied in this reference onlythe matrix M r of the smallest variable X r is needed, and it is showed that this matrix can besimply read o ff from the input Gr¨obner basis without arithmetic operations, under some structural7 roblem 2 – Computing the multiplication matrices
Input: • a monomial order ≺ on K [ X ] n , • a reduced ≺ -Gr¨obner basis { f , . . . , f s } ⊂ K [ X ] n such that M = K [ X ] n / (cid:104) f , . . . , f s (cid:105) has finite dimension as a K -vectorspace. Output: the multiplication matrices M , . . . , M r of X , . . . , X r in M = K [ X ] n / (cid:104) f , . . . , f s (cid:105) with respect to its ≺ -monomial basis.assumption on the ideal of leading terms of I described in [15, Prop. 7]. Here we considerthe more general case of submodules N = (cid:104) f , . . . , f s (cid:105) of K [ X ] n , and we design an algorithmwhich computes all multiplication matrices M , . . . , M r in M = K [ X ] n / N using O ( rD ω log( D ))operations, under a structural assumption on the module of leading terms of N . Situations wherethis assumption on (cid:104) lt ≺ ( N ) (cid:105) holds typically involve a monomial order such that X r c i ≺ · · · ≺ X c i for 1 (cid:54) i (cid:54) n . The assumption, described below, naturally extends the one from [15] to the caseof submodules. Definition 1.8.
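A toy instance of Problem 2 may help fix ideas (ours, hand-reduced and purely illustrative): for N = ⟨X_1^2 − X_2, X_2^2⟩ in K[X_1, X_2] with monomial basis (1, X_1, X_2, X_1 X_2), each row of M_k holds the normal form of X_k times a basis monomial.

```python
# Toy instance of Problem 2: N = <X1^2 - X2, X2^2>, n = 1, D = 4,
# monomial basis E = (1, X1, X2, X1X2); rows give normal forms of Xk * basis.
M1 = [[0, 1, 0, 0],   # X1 * 1    = X1
      [0, 0, 1, 0],   # X1 * X1   = X1^2     -> X2
      [0, 0, 0, 1],   # X1 * X2   = X1X2
      [0, 0, 0, 0]]   # X1 * X1X2 = X1^2 X2  -> X2^2 -> 0
M2 = [[0, 0, 1, 0],   # X2 * 1    = X2
      [0, 0, 0, 1],   # X2 * X1   = X1X2
      [0, 0, 0, 0],   # X2 * X2   = X2^2     -> 0
      [0, 0, 0, 0]]   # X2 * X1X2 = X1 X2^2  -> 0

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def vecmat(v, M):
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

# sanity checks: the matrices commute, and both generators of N annihilate
# the coordinate vector of the basis monomial 1
one = [1, 0, 0, 0]
g1 = [a - b for a, b in zip(vecmat(vecmat(one, M1), M1), vecmat(one, M2))]
g2 = vecmat(vecmat(one, M2), M2)
```

The algorithms of Section 4 produce exactly such matrices, but via batched linear algebra rather than monomial-by-monomial reduction.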
For a monomial submodule
T ⊆ K[X]^n, the assumption H(T) is: "for all monomials µ ∈ T, for all j ∈ {1, ..., r} such that X_j divides µ, for all i ∈ {1, ..., j − 1}, the monomial (X_i / X_j) µ belongs to T".

In fact, instead of considering all monomials in T, one can observe that H(T) holds if and only if the property holds for each monomial in the minimal generating set of T (see Lemma 2.2).
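By the remark above, H(T) can be tested on the minimal generators alone; a sketch for the ideal case n = 1, with monomials encoded as exponent tuples (our encoding, purely illustrative):

```python
# Check the assumption H(T) for a monomial ideal T given by generating
# exponent vectors: for each generator mu, each j with X_j | mu and each
# i < j, the monomial (X_i / X_j) * mu must again lie in T.
def divides(a, b):
    return all(ai <= bi for ai, bi in zip(a, b))

def in_ideal(e, gens):
    return any(divides(g, e) for g in gens)

def check_H(gens):
    for mu in gens:
        for j, ej in enumerate(mu):
            if ej == 0:
                continue
            for i in range(j):
                nu = list(mu)
                nu[j] -= 1          # divide by X_j
                nu[i] += 1          # multiply by X_i (a smaller variable)
                if not in_ideal(tuple(nu), gens):
                    return False
    return True

# <X1^2, X1*X2, X2^3> satisfies H: trading X2 for X1 stays inside the ideal,
ok = check_H([(2, 0), (1, 1), (0, 3)])
# while <X2> alone fails, since X1 = (X1/X2)*X2 is not in <X2>.
bad = check_H([(0, 1)])
```

The module case is analogous, with the position index carried along unchanged since the exchange of variables acts within a single component.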
For n ≥ 1, let ≺_1 be a monomial order on K[X]^n and let {f_1, ..., f_s} be a reduced ≺_1-Gröbner basis defining a submodule N = ⟨f_1, ..., f_s⟩ of K[X]^n such that K[X]^n / N has dimension D as a K-vector space. Assuming H(⟨lt_(≺_1)(N)⟩),
• Problem 2 can be solved using O(rD^ω log(D)) operations in K;
• the change of order problem, that is, computing the reduced ≺_2-Gröbner basis of N for a monomial order ≺_2, can be solved using O(nD^(ω-1) + rD^ω log(D)) operations in K.

Concerning the first item, an overview of our approach is presented in Section 4.1 and the detailed algorithms and proofs are in Sections 4.2 and 4.3, with a slightly refined cost bound in Proposition 4.7. The second item is proved in Section 4.4, based on Algorithm 7 which essentially calls our algorithms to compute first the multiplication matrices (Problem 2) and then the ≺_2-Gröbner basis of N by considering a specific module of syzygies (Problem 1).

Our structural assumption has been considered before, in particular in the case where n = 1 and I = ⟨f_1, ..., f_s⟩ (and the monomial order is such that X_r ≺ ··· ≺ X_1, which is always true up to renaming the variables). In this case, it holds assuming for instance that the coefficients of f_1, ..., f_s are generic (that is, pairwise distinct indeterminates over a given ground field) and that the Moreno-Socias conjecture holds [15, Sec. 4.1]. Another important situation where the assumption holds is when the leading ideal ⟨lt_≺(I)⟩ is Borel-fixed and the characteristic of K is zero, see [13, Sec. 15.9] and [15, Sec. 4.2]. A theorem first proved by Galligo in power series rings [20], then by Bayer and Stillman [3, Prop. 1] for a homogeneous ideal I in K[X] shows that after a generic change of coordinates, ⟨lt_≺(I)⟩ is Borel-fixed. The most general version of this result we are aware of is due to Pardue [41].
It applies to K[X]-submodules N ⊂ K[X]^n, for certain monomial orders ≺ on K[X]^n; the precise conditions on ≺ are too technical to be stated here, but they hold in particular for the term over position order induced by a monomial order on K[X] which refines the (weighted) total degree. In such cases, Pardue shows that after a generic linear change of variables, ⟨lt_≺(N)⟩ satisfies a Borel-fixedness property on K[X]^n which implies that H(⟨lt_≺(N)⟩) holds, at least in characteristic zero.

For polynomial system solving, with n =
1, an interesting particular case of the change of order problem is that of ≺_1 = ≺_drl being the degree reverse lexicographic order and ≺_2 = ≺_lex being the lexicographic order (so that in characteristic zero, Pardue's result shows that in generic coordinates, our structural assumption holds for such inputs). Fast algorithms for this case have been studied in [15, 14]. The former assumes the ideal I is in Shape Position, whereas this assumption is not needed here. In [14], an algorithm is designed to compute the multiplication matrices from a ≺_drl-Gröbner basis in time O(β r^ω D^ω), where β is the maximum total degree of the elements of the input Gröbner basis. This is obtained by iterating over the total degree: the normal forms of all monomials of the same degree are dealt with using only one call to Gaussian elimination. While this does not require an assumption on the leading ideal, it is unclear to us how to remove the dependency in β in general.

Outline. Section 2 gathers preliminary material used in the rest of the paper: some notation, as well as basic definitions and properties related to monomial orders, Gröbner bases, and monomial staircases. Then Section 3 gives algorithms and proofs concerning the computation of bases of syzygies, leading to the main result of this paper (Theorem 1.7), while Section 4 focuses on the computation of the multiplication matrices (Theorem 1.9); both sections are introduced with a more detailed outline of their content.
2. Notations and definitions
Monomial orders and Gröbner bases for modules. Hereafter, we consider a multivariate polynomial ring K[X] = K[X_1, ..., X_r], for some field K. Recall that the coordinate vectors are denoted by c_1, ..., c_m, that is, c_j = (0, ..., 0, 1, 0, ..., 0) ∈ K[X]^m with 1 at index j. A monomial in K[X] is defined from exponents e = (e_1, ..., e_r) ∈ N^r as X^e = X_1^(e_1) ··· X_r^(e_r), and a monomial in K[X]^m is µ c_j = (0, ..., 0, µ, 0, ..., 0) with 1 ≤ j ≤ m and where µ is any monomial of K[X]. A term in K[X] or in K[X]^m is a monomial multiplied by a nonzero scalar from K. Given terms µ and ν in K[X], we say that the term µ c_j is divisible by the term ν c_k if j = k and µ is divisible by ν in K[X].

A submodule of K[X]^m generated by monomials of K[X]^m is called a monomial submodule. A submodule of K[X]^m generated by homogeneous polynomials of K[X]^m is called a homogeneous submodule.

Following [13, Sec. 15.3], a monomial order on K[X]^m is a total order ≺ on the monomials of K[X]^m such that, for any monomials µ, ν of K[X]^m and any monomial κ of K[X], µ ≺ ν implies µ ≺ κµ ≺ κν. Examples of common monomial orders on K[X]^m are the so-called term over position (top) and position over term (pot) orders. In both cases, we start from a monomial order on K[X] written ≺. Then, given monomials µ c_i and ν c_j, we say that µ c_i ≺_top ν c_j if µ ≺ ν or if µ = ν and i < j. Similarly, we say that µ c_i ≺_pot ν c_j if i < j or if i = j and µ ≺ ν.

For a given monomial order ≺ on K[X]^m and an element f ∈ K[X]^m, the ≺-leading term of f, denoted by lt_≺(f), is the term of f whose monomial is the greatest with respect to ≺. This extends to any collection F ⊆ K[X]^m of polynomials: lt_≺(F) is the set of leading terms {lt_≺(f) | f ∈ F} of the elements of F. In particular, for a module N in K[X]^m, ⟨lt_≺(N)⟩ is a monomial submodule of K[X]^m which is called the ≺-leading module of N.

Definition 2.1 (Gröbner basis). Let ≺ be a monomial order on K[X]^m and let N be a K[X]-submodule of K[X]^m. A subset {f_1, ..., f_s} ⊂ N is said to be a ≺-Gröbner basis of N if the ≺-leading module of N is generated by {lt_≺(f_1), ..., lt_≺(f_s)}, i.e. ⟨lt_≺(N)⟩ = ⟨lt_≺(f_1), ..., lt_≺(f_s)⟩.
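The top and pot orders defined above translate directly into comparison keys; a sketch (our encoding: a monomial µ c_i is a pair (e, i) of an exponent tuple and a position index, and the base order on K[X] is taken to be lexicographic on exponent tuples, one choice among many):

```python
# Sketch: term-over-position (top) and position-over-term (pot) orders on
# K[X]^m, built from a base monomial order on K[X] (here: lexicographic on
# exponent tuples, an illustrative choice).
def lex_key(e):
    return e                      # compare exponent tuples lexicographically

def top_key(mon):
    e, i = mon
    return (lex_key(e), i)        # term first, then position

def pot_key(mon):
    e, i = mon
    return (i, lex_key(e))        # position first, then term

mons = [((0, 1), 2), ((1, 0), 1), ((0, 1), 1)]
by_top = sorted(mons, key=top_key)   # increasing for the top order
by_pot = sorted(mons, key=pot_key)   # increasing for the pot order
```

Sorting the same three monomials under the two keys makes the difference visible: top groups equal terms together across positions, pot groups each position together.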
There is a specific $\prec$-Gröbner basis of $N$, called the reduced $\prec$-Gröbner basis of $N$, which is uniquely defined in terms of the module $N$ and the monomial order $\prec$. Namely, this is the $\prec$-Gröbner basis $\{f_1,\dots,f_s\}$ of $N$ such that for $1 \le i \le s$, $\mathrm{lt}_\prec(f_i)$ is monic and does not divide any term of $f_j$ for $j \ne i$.

Monomial basis and staircase monomials. In what follows, the submodules
$N \subseteq K[X]^m$ we consider are such that the quotient $K[X]^m/N$ has finite dimension as a $K$-vector space. We will often use its basis formed by the monomials not in $\mathrm{lt}_\prec(N)$ [13, Thm. 15.3]; this basis is denoted by $\mathcal{E} = (\varepsilon_1,\dots,\varepsilon_D)$ and called the $\prec$-monomial basis of $K[X]^m/N$. Any polynomial $f \in K[X]^m$ can be uniquely written $f = g + h$, where $g \in N$ and $h \in K[X]^m$ is a $K$-linear combination of the monomials in $\mathcal{E}$; this polynomial $h$ is called the $\prec$-normal form of $f$ (with respect to $N$) and is denoted by $\mathrm{nf}_\prec(f)$. We extend the notation to sets of polynomials $F \subseteq K[X]^m$, that is, $\mathrm{nf}_\prec(F) = \{\mathrm{nf}_\prec(f) \mid f \in F\}$.

As in [35, Sec. 3] (which focuses on the case of ideals), we will use other sets of monomials related to this monomial basis. First, we consider the monomials obtained by multiplying those of $\mathcal{E}$ by a variable:
$$\mathcal{S} = \{X_k \varepsilon_j \mid 1 \le k \le r,\ 1 \le j \le D\} \cup \{c_i \mid 1 \le i \le m \text{ such that } c_i \notin \mathcal{E}\}.$$
This allows us to define the border $\mathcal{B} = \mathcal{S} \setminus \mathcal{E}$, which is a set of monomial generators of $\langle \mathrm{lt}_\prec(N) \rangle$. Then the polynomials $\{\mu - \mathrm{nf}_\prec(\mu) \mid \mu \in \mathcal{B}\}$ form a canonical generating set of $N$, called the $\prec$-border basis of $N$ [34, 35]. Finally, we consider the minimal generating set $\mathcal{L}$ of $\langle \mathrm{lt}_\prec(N) \rangle$: it is a subset of $\mathcal{B}$ such that the reduced $\prec$-Gröbner basis of $N$ is $\mathcal{G} = \{\mu - \mathrm{nf}_\prec(\mu) \mid \mu \in \mathcal{L}\}$. In particular, we have $\mathcal{L} = \mathrm{lt}_\prec(\mathcal{G}) = \{\mathrm{lt}_\prec(f) \mid f \in \mathcal{G}\}$.

By construction, $\mathrm{Card}(\mathcal{L}) = \mathrm{Card}(\mathcal{G}) \le \mathrm{Card}(\mathcal{B}) \le \mathrm{Card}(\mathcal{S})$. Above, $\mathcal{S}$ is defined as the union of a set of cardinality at most $rD$ and a set of cardinality at most $m$, hence $\mathrm{Card}(\mathcal{S}) \le rD + m$. Besides, since the coordinate vectors in the second set are in the minimal generating set $\mathcal{L}$ of $\langle \mathrm{lt}_\prec(N) \rangle$, we have $\mathrm{Card}(\mathcal{B} \setminus \mathcal{L}) \le rD$. Note that if the upper bound on $\mathrm{Card}(\mathcal{S})$ is an equality, then all coordinate vectors $c_1,\dots,c_m$ are in $\mathcal{L}$, which holds only in the case $N = K[X]^m$ (in particular $\mathcal{E} = \emptyset$ and $D = 0$).

Lemma 2.2.
Let $T$ be a monomial submodule of $K[X]^m$ and let $\{\mu_1,\dots,\mu_s\}$ be the minimal generating set of $T$. Then $\mathcal{H}(T)$ holds if and only if for all $k \in \{1,\dots,s\}$, for all $j \in \{1,\dots,r\}$ such that $X_j$ divides $\mu_k$, and for all $i \in \{1,\dots,j-1\}$, we have $\frac{X_i}{X_j}\mu_k \in T$.

Proof. Obviously, $\mathcal{H}(T)$ implies the latter property, since each $\mu_k$ is a monomial in $T$. Conversely, we assume that for all $k \in \{1,\dots,s\}$, for all $j \in \{1,\dots,r\}$ such that $X_j$ divides $\mu_k$, and for all $i \in \{1,\dots,j-1\}$, we have $\frac{X_i}{X_j}\mu_k \in T$, and we want to prove that $\mathcal{H}(T)$ holds. Let $\mu$ be a monomial in $T$, let $j \in \{1,\dots,r\}$ be such that $X_j$ divides $\mu$, and let $i \in \{1,\dots,j-1\}$; we want to prove that $\frac{X_i}{X_j}\mu \in T$. Since $T$ is a monomial module, $\mu = \nu\mu_k$ for some monomial $\nu \in K[X]$ and some $1 \le k \le s$. By assumption, $X_j$ divides $\nu\mu_k$, thus either $X_j$ divides $\nu$ or $X_j$ divides $\mu_k$. In the first case $\frac{X_i}{X_j}\mu = (\frac{X_i}{X_j}\nu)\mu_k \in T$, and in the second case $\frac{X_i}{X_j}\mu_k \in T$ by assumption, and therefore $\frac{X_i}{X_j}\mu = \nu(\frac{X_i}{X_j}\mu_k) \in T$.
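The finite criterion of Lemma 2.2 is straightforward to implement: it suffices to test the divided generators for membership in $T$. A small sketch, assuming module monomials are encoded as pairs (exponent tuple, 1-based coordinate index); all names here are our own:

```python
def divides(nu, mu):
    # nu divides mu in K[X]^m: same coordinate, componentwise <= exponents.
    return nu[1] == mu[1] and all(a <= b for a, b in zip(nu[0], mu[0]))

def in_module(mon, gens):
    # Membership in the monomial submodule generated by gens.
    return any(divides(g, mon) for g in gens)

def satisfies_criterion(gens, r):
    # Lemma 2.2: H(T) holds iff for every minimal generator mu, every
    # variable X_j dividing mu and every i < j, (X_i / X_j) * mu lies in T.
    for e, c in gens:
        for j in range(r):          # 0-based variable indices
            if e[j] == 0:
                continue
            for i in range(j):
                shifted = tuple(e[k] + (k == i) - (k == j) for k in range(r))
                if not in_module((shifted, c), gens):
                    return False
    return True

# In K[X1, X2]^1: T = <X1 c_1, X2 c_1> passes the criterion, while
# T = <X2 c_1> fails, since (X1/X2) * X2 c_1 = X1 c_1 is not in <X2 c_1>.
print(satisfies_criterion([((1, 0), 1), ((0, 1), 1)], 2))  # True
print(satisfies_criterion([((0, 1), 1)], 2))               # False
```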
3. Computing bases of syzygies via linear algebra
In this section, we focus on Problem 1 and we prove Theorem 1.7. Thus, we are given pairwise commuting matrices $\mathbf{M} = (M_1,\dots,M_r)$ in $K^{D \times D}$, a matrix $F \in K^{m \times D}$ and a monomial order $\prec$ on $K[X]^m = K[X_1,\dots,X_r]^m$; we compute the reduced $\prec$-Gröbner basis of $\mathrm{Syz}_{\mathbf{M}}(F)$.

The basic ingredient is a linearization of the problem, meaning that we will interpret all operations on polynomials as operations of $K$-linear algebra. In Section 3.1, we show a correspondence between syzygies of bounded degree and vectors in the nullspace of a matrix over $K$ which exhibits a structure that we call multi-Krylov. The multi-Krylov matrix is formed by multiplications of $F$ by powers of the multiplication matrices; its rows are ordered according to the monomial order $\prec$ given as input of Problem 1.

Then, in Section 3.2, we show that the row rank profile of this multi-Krylov matrix exactly corresponds to the $\prec$-monomial basis of the quotient $K[X]^m/\mathrm{Syz}_{\mathbf{M}}(F)$. To compute this row rank profile efficiently, we use in particular an idea from [28] which we extend to our context, while both exploiting the structure of the multi-Krylov matrix, so as to always work on a small subset of its rows, and ensuring that the rows are considered in the right order. Finally, in Section 3.3, we exploit the knowledge of the $\prec$-monomial basis to compute the reduced $\prec$-Gröbner basis of syzygies.

We first describe expansion and contraction operations, which convert polynomials of bounded degrees into their coefficient vectors and vice versa (the bound is written $\beta$ below). It will be convenient to rely on the following indexing function.

Definition 3.1.
Let $\beta = (\beta_1,\dots,\beta_r) \in \mathbb{N}_{>0}^r$ and let $\prec$ be a monomial order on $K[X]^m$. Then we define the $(\prec,\beta)$-indexing function $\phi_{\prec,\beta}$ as the unique bijection
$$\phi_{\prec,\beta} : \{X^e c_i \mid 0 \le e < \beta,\ 1 \le i \le m\} \to \{1,\dots,m\beta_1\cdots\beta_r\}$$
which is increasing for $\prec$, that is, such that $X^e c_i \prec X^{e'} c_{i'}$ if and only if $\phi_{\prec,\beta}(X^e c_i) < \phi_{\prec,\beta}(X^{e'} c_{i'})$.

In other words, take the sequence of monomials $(X^e c_i,\ 0 \le e < \beta,\ 1 \le i \le m)$ and sort it according to $\prec$; then $\phi_{\prec,\beta}(X^e c_i)$ is the index of $X^e c_i$ in the sorted sequence (assuming indices start at $1$).

Hereafter, $K[X]_{<\beta}$ stands for the set of polynomials $p \in K[X]$ such that $\deg_{X_k}(p) < \beta_k$ for $1 \le k \le r$. This yields a $K$-linear correspondence between bounded-degree polynomials and vectors
$$\mathcal{E}_{\prec,\beta} : K[X]^m_{<\beta} \to K^{1 \times m\beta_1\cdots\beta_r},\qquad p = \sum_{\substack{f = X^e c_i \\ 0 \le e < \beta,\ 1 \le i \le m}} u_f\, f \;\mapsto\; v = \left[u_{\phi_{\prec,\beta}^{-1}(k)} \mid 1 \le k \le m\beta_1\cdots\beta_r\right],$$
called expansion, with inverse $\mathcal{C}_{\prec,\beta}$ called contraction. For a polynomial $p \in K[X]^m_{<\beta}$, $\mathcal{E}_{\prec,\beta}(p)$ is the vector in $K^{1 \times m\beta_1\cdots\beta_r}$ whose entry at index $\phi_{\prec,\beta}(X^e c_i)$ is the coefficient of the term involving $X^e c_i$ in $p$.

Example 3.2. Consider the case with $r = m =$
$2$, using the $\prec_{lex}$-term over position order $\prec_{toplex}$ on $K[X,Y]^2$, with $Y \prec_{lex} X$. Choosing the degree bounds $\beta = (2,3)$, the monomials $\{X^j Y^k c_i \mid (0,0) \le (j,k) < (2,3),\ 1 \le i \le 2\}$ are indexed as follows, according to Definition 3.1:

Monomial $X^j Y^k c_i$: $\langle 1, 0\rangle$, $\langle 0, 1\rangle$, $\langle Y, 0\rangle$, $\langle 0, Y\rangle$, $\langle Y^2, 0\rangle$, $\langle 0, Y^2\rangle$, $\langle X, 0\rangle$, $\langle 0, X\rangle$, $\langle XY, 0\rangle$, $\langle 0, XY\rangle$, $\langle XY^2, 0\rangle$, $\langle 0, XY^2\rangle$
Index $\phi_{\prec_{toplex},(2,3)}(X^j Y^k c_i)$: $1$, $2$, $3$, $4$, $5$, $6$, $7$, $8$, $9$, $10$, $11$, $12$.

Let $p$ be the polynomial in $K[X,Y]^2_{<(2,3)}$ and $v$ be the vector in $K^{1 \times 12}$ defined by
$$p = \langle 46 + 95Y + 75X + 10XY,\ 36 + 18Y + 38Y^2 + 77X + 83XY + 35XY^2 \rangle,$$
$$v = [\,86\ \ 0\ \ 32\ \ 83\ \ 54\ \ 26\ \ 0\ \ 68\ \ 86\ \ 0\ \ 54\ \ 22\,].$$
In this case, the expansion of $p$ and the contraction of $v$ are given by
$$\mathcal{E}_{\prec,\beta}(p) = [\,46\ \ 36\ \ 95\ \ 18\ \ 0\ \ 38\ \ 75\ \ 77\ \ 10\ \ 83\ \ 0\ \ 35\,],$$
$$\mathcal{C}_{\prec,\beta}(v) = \langle 86 + 32Y + 54Y^2 + 86XY + 54XY^2,\ 83Y + 26Y^2 + 68X + 22XY^2 \rangle.$$

Now, we detail the construction of the multi-Krylov matrix. Let $\mathbf{M} = (M_1,\dots,M_r)$ be pairwise commuting matrices in $K^{D \times D}$ that define a $K[X]$-module structure on $K^{1 \times D}$, and let $F$ be in $K^{m \times D}$, with rows $f_1,\dots,f_m$. As mentioned in Definition 1.1, for a polynomial $p = [p_1,\dots,p_m] \in K[X]^m$ we write
$$p \cdot F = p_1 \cdot f_1 + \cdots + p_m \cdot f_m = f_1\, p_1(\mathbf{M}) + \cdots + f_m\, p_m(\mathbf{M}).$$
As a result, a polynomial $p$ is in $\mathrm{Syz}_{\mathbf{M}}(F)$ if and only if the corresponding $K$-linear combination of vectors of the form $f_i \mathbf{M}^e = f_i M_1^{e_1}\cdots M_r^{e_r}$ is zero. If furthermore $p$ is nonzero and has its degrees in each variable bounded by $(\beta_1,\dots,\beta_r)$, then it corresponds to a nontrivial $K$-linear relation between the row vectors $\{f_i \mathbf{M}^e \mid 0 \le e < \beta,\ 1 \le i \le m\}$. This leads us to consider the matrices formed by these vectors, ordered according to $\phi_{\prec,\beta}$.

Definition 3.3.
Let $\prec$ be a monomial order on $K[X]^m$, let $\mathbf{M} = (M_1,\dots,M_r) \in (K^{D\times D})^r$ be pairwise commuting matrices, let $F \in K^{m \times D}$ whose rows are $f_1,\dots,f_m$, and let $\beta = (\beta_1,\dots,\beta_r) \in \mathbb{N}_{>0}^r$. The $(\prec,\beta)$-multi-Krylov matrix for $(\mathbf{M},F)$, denoted by $\mathcal{K}_{\prec,\beta}(\mathbf{M},F)$, is the matrix in $K^{m\beta_1\cdots\beta_r \times D}$ whose row at index $\phi_{\prec,\beta}(X^e c_i)$ is $X^e c_i \cdot F = f_i \mathbf{M}^e$, for $0 \le e < \beta$ and $1 \le i \le m$.

Example 3.4. Following on from Example 3.2, we consider a $K$-vector space dimension $D$, a matrix $F$ in $K^{2 \times D}$, and $\mathbf{M} = (M_X, M_Y)$ in $(K^{D \times D})^2$ such that $M_X M_Y = M_Y M_X$. Then from the indexing function $\phi_{\prec_{toplex},(2,3)}$ described above we obtain
$$\mathcal{K}_{\prec_{toplex},(2,3)}(\mathbf{M},F) = \begin{bmatrix} F \\ F M_Y \\ F M_Y^2 \\ F M_X \\ F M_X M_Y \\ F M_X M_Y^2 \end{bmatrix} \in K^{12 \times D},$$
where each block $F M_X^j M_Y^k$ contributes the two rows $f_1 M_X^j M_Y^k$ and $f_2 M_X^j M_Y^k$.

By construction, we have the following result, which relates the left nullspace of the multi-Krylov matrix with the set of bounded-degree syzygies.
Lemma 3.5. If $v \in K^{1 \times m\beta_1\cdots\beta_r}$ is in the left nullspace of $\mathcal{K}_{\prec,\beta}(\mathbf{M},F)$, then its contraction $\mathcal{C}_{\prec,\beta}(v) \in K[X]^m_{<\beta}$ is in $\mathrm{Syz}_{\mathbf{M}}(F)$. Conversely, if $p \in K[X]^m_{<\beta}$ is in $\mathrm{Syz}_{\mathbf{M}}(F)$, then its expansion $\mathcal{E}_{\prec,\beta}(p) \in K^{1 \times m\beta_1\cdots\beta_r}$ is in the left nullspace of $\mathcal{K}_{\prec,\beta}(\mathbf{M},F)$.

Our first step towards finding a $\prec$-Gröbner basis $\mathcal{G}$ of $\mathrm{Syz}_{\mathbf{M}}(F)$ consists in computing the $\prec$-monomial basis $\mathcal{E}$ of the quotient $K[X]^m/\mathrm{Syz}_{\mathbf{M}}(F)$; we are going to prove that it corresponds to the row rank profile of the multi-Krylov matrix. From the above discussion, we know that considering this matrix only gives us access to syzygies which satisfy degree constraints. In what follows, we will choose $\beta = (\beta_1,\dots,\beta_r)$ with $\beta_i \ge D$ for all $i$. In particular, for this choice, all elements in the monomial basis of $K[X]^m/\mathrm{Syz}_{\mathbf{M}}(F)$ are in $K[X]^m_{<\beta}$.

We recall that, for a matrix $A$ in $K^{\mu \times \nu}$, the row rank profile of $A$ is the lexicographically smallest subtuple $(\rho_1,\dots,\rho_\Delta)$ of $(1,\dots,\mu)$ such that $\Delta$ is the rank of $A$ and the rows $\rho_1,\dots,\rho_\Delta$ of $A$ are linearly independent.

Theorem 3.6.
Let $\prec$ be a monomial order on $K[X]^m$, let $\mathbf{M} = (M_1,\dots,M_r)$ be pairwise commuting matrices in $K^{D \times D}$, let $F \in K^{m \times D}$, and let $\beta = (\beta_1,\dots,\beta_r)$ with $\beta_i \ge D$ for all $i$. Let further $(\rho_1,\dots,\rho_\Delta) \in \mathbb{N}_{>0}^\Delta$ be the row rank profile of $\mathcal{K}_{\prec,\beta}(\mathbf{M},F)$. Then the $\prec$-monomial basis $\mathcal{E}$ of $K[X]^m/\mathrm{Syz}_{\mathbf{M}}(F)$ is equal to $\{\phi_{\prec,\beta}^{-1}(\rho_1),\dots,\phi_{\prec,\beta}^{-1}(\rho_\Delta)\}$.

Proof. Write $\rho_j = \phi_{\prec,\beta}(X^{e_j} c_{i_j})$, so that $\{\phi_{\prec,\beta}^{-1}(\rho_1),\dots,\phi_{\prec,\beta}^{-1}(\rho_\Delta)\} = \{X^{e_1}c_{i_1},\dots,X^{e_\Delta}c_{i_\Delta}\}$. We want to prove that $\mathrm{lt}_\prec(\mathrm{Syz}_{\mathbf{M}}(F))$ is the set of monomials not in $\{X^{e_1}c_{i_1},\dots,X^{e_\Delta}c_{i_\Delta}\}$.

First, consider any monomial $X^e c_i$ for $1 \le i \le m$ and $e \in \mathbb{N}^r$ such that $e \not< \beta$. Such a monomial cannot be in $\{X^{e_1}c_{i_1},\dots,X^{e_\Delta}c_{i_\Delta}\}$, since by construction of $\mathcal{K}_{\prec,\beta}(\mathbf{M},F)$ we have $e_j < \beta$ for all $j$. On the other hand, $X^e c_i$ cannot be in the $\prec$-monomial basis $\mathcal{E}$ either. Indeed, writing $X^e = X_1^{e_1}\cdots X_r^{e_r}$, $e \not< \beta$ means that $e_k \ge \beta_k \ge D$ for some index $k$; if $X^e c_i$ were in $\mathcal{E}$ then $X_k^{e_k} c_i$ would also be in $\mathcal{E}$, and thus $c_i, X_k c_i, \dots, X_k^D c_i$ would all be in $\mathcal{E}$, which is a contradiction since the linear independence of the corresponding rows $f_i, f_i M_k, \dots, f_i M_k^D$ would imply that the minimal polynomial of $M_k$ has degree greater than $D$. Hence $X^e c_i \in \mathrm{lt}_\prec(\mathrm{Syz}_{\mathbf{M}}(F))$.

Now, let $X^e c_i \in \mathrm{lt}_\prec(\mathrm{Syz}_{\mathbf{M}}(F))$ be such that $e < \beta$. Then there is a polynomial $p$ in $\mathrm{Syz}_{\mathbf{M}}(F)$ such that $\mathrm{lt}_\prec(p) = X^e c_i$, and Lemma 3.5 implies that $\mathcal{E}_{\prec,\beta}(p)$ is in the left nullspace of $\mathcal{K}_{\prec,\beta}(\mathbf{M},F)$. Since by construction the rightmost nonzero entry of $\mathcal{E}_{\prec,\beta}(p)$ is $1$ at index $\phi_{\prec,\beta}(X^e c_i)$, the vector $\mathcal{E}_{\prec,\beta}(p)$ expresses the row of $\mathcal{K}_{\prec,\beta}(\mathbf{M},F)$ with index $\phi_{\prec,\beta}(X^e c_i)$ as a $K$-linear combination of the rows with smaller indices. By definition of the row rank profile, this implies $\phi_{\prec,\beta}(X^e c_i) \notin \{\rho_1,\dots,\rho_\Delta\}$, and therefore $X^e c_i \notin \{X^{e_1}c_{i_1},\dots,X^{e_\Delta}c_{i_\Delta}\}$.

Conversely, let $X^e c_i \notin \{X^{e_1}c_{i_1},\dots,X^{e_\Delta}c_{i_\Delta}\}$ be a monomial such that $e < \beta$. Then $\phi_{\prec,\beta}(X^e c_i) \notin \{\rho_1,\dots,\rho_\Delta\}$. Thus, by definition of the row rank profile, there is a nonzero vector $v \in K^{1 \times m\beta_1\cdots\beta_r}$ such that $v$ is in the left nullspace of $\mathcal{K}_{\prec,\beta}(\mathbf{M},F)$ and the rightmost nonzero entry of $v$ is $1$ at index $\phi_{\prec,\beta}(X^e c_i)$. Then $\mathrm{lt}_\prec(\mathcal{C}_{\prec,\beta}(v)) = X^e c_i$, and according to Lemma 3.5, $\mathcal{C}_{\prec,\beta}(v)$ is in $\mathrm{Syz}_{\mathbf{M}}(F)$, hence $X^e c_i \in \mathrm{lt}_\prec(\mathrm{Syz}_{\mathbf{M}}(F))$.

In particular, the dimension $\Delta$ of $K[X]^m/\mathrm{Syz}_{\mathbf{M}}(F)$ as a $K$-vector space is equal to the rank of the multi-Krylov matrix $\mathcal{K}_{\prec,\beta}(\mathbf{M},F)$; this implies $\Delta \le D$, since $\mathcal{K}_{\prec,\beta}(\mathbf{M},F)$ has $D$ columns.

We now show how to exploit the structure of the multi-Krylov matrix so as to efficiently compute its row rank profile, yielding the $\prec$-monomial basis $\mathcal{E}$ of $K[X]^m/\mathrm{Syz}_{\mathbf{M}}(F)$. For $\beta$ as in Theorem 3.6, the dense representation of $\mathcal{K}_{\prec,\beta}(\mathbf{M},F)$ uses at least $mD^{r+1}$ field elements, which is well beyond our target cost $O\tilde{\ }(mD^{\omega-1} + rD^\omega)$. On the other hand, this matrix is succinctly described by the data $(\prec,\mathbf{M},F)$, which requires $O(mD + rD^2)$ field elements. Like previous related algorithms [36, 16, 35], we will never compute the full dense representation of this matrix, but rather always store and use a minimal amount of data that allows the algorithm to progress. The main property behind this is that once some monomial is found not to be in the sought monomial basis, then all multiples of that monomial can be discarded from the rest of the computation.

Our algorithm for computing the monomial basis $\mathcal{E} = (\varepsilon_1,\dots,\varepsilon_\Delta)$ also returns the matrix $B \in K^{\Delta \times D}$ whose rows are $\varepsilon_1 \cdot F, \dots, \varepsilon_\Delta \cdot F$; it will be needed later on.
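As an illustration of Definitions 3.1 and 3.3, the following sketch builds the multi-Krylov matrix row by row over a small prime field, naively recomputing each product $f_i\mathbf{M}^e$; this naive construction is precisely what the algorithm below avoids, but it makes the object concrete. The encoding of monomials as (exponent tuple, index) pairs and the modulus are our own illustrative choices:

```python
from itertools import product

P = 97  # illustrative prime modulus for the base field K = Z/97Z

def vec_mat(v, M):
    # Row vector times matrix over Z/PZ.
    return [sum(v[i] * M[i][j] for i in range(len(v))) % P
            for j in range(len(M[0]))]

def multi_krylov(Ms, F, beta, key):
    # Rows f_i * M_1^{e_1} ... M_r^{e_r} for 0 <= e < beta, 1 <= i <= m,
    # sorted according to the monomial order given as a sort key; the sorted
    # position of (e, i) realizes the indexing function of Definition 3.1.
    m = len(F)
    mons = sorted(((e, i) for e in product(*(range(b) for b in beta))
                          for i in range(1, m + 1)), key=key)
    rows = []
    for e, i in mons:
        v = F[i - 1]
        for k, ek in enumerate(e):
            for _ in range(ek):
                v = vec_mat(v, Ms[k])
        rows.append(v)
    return rows

# r = 1, D = 2, m = 1: a single nilpotent multiplication matrix.
MX = [[0, 1], [0, 0]]
F = [[1, 0]]
pot = lambda mon: (mon[1], mon[0])  # position over term, degrees increasing
print(multi_krylov([MX], F, (3,), pot))  # rows f, f*MX, f*MX^2
```

In this toy run the last row is zero, reflecting the syzygy $X^2 c_1$ of the nilpotent action.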
The algorithm exploits the structure of this matrix, and uses fast arithmetic for matrices over $K$ through

• a procedure RowRankProfile which computes the row rank profile of any matrix in $K^{\mu \times \nu}$ of rank $\rho$ in $O(\rho^{\omega-2}\mu\nu)$ operations in $K$, as described in [46, Sec. 2.2];

• matrix multiplication, which is incorporated by following a strategy in the style of Keller-Gehrig [28].

In short, the latter strategy can be thought of as precomputing powers of the form $M_k^{2^e}$ of the multiplication matrices, which then allows us to group many vector-matrix products into a small number of matrix-matrix products. To achieve this we work iteratively on the variables, thus first focusing on all operations involving $M_1$, then on those involving $M_2$, etc. The order of the rows specified by $\prec$ is not respected in this process, since at a fixed stage of the algorithm we will only have considered a submatrix of the multi-Krylov matrix which does not involve the last variables. To fix this we constantly re-order, according to $\prec$, the rows that have been processed and the ones that we introduce.

Algorithm 1 – MonomialBasis

Input:
• monomial order $\prec$ on $K[X]^m = K[X_1,\dots,X_r]^m$,
• pairwise commuting matrices $\mathbf{M} = (M_1,\dots,M_r)$ in $K^{D \times D}$,
• matrix $F \in K^{m \times D}$.

Output:
• the $\prec$-monomial basis $\mathcal{E} = (\varepsilon_1,\dots,\varepsilon_\Delta)$ of $K[X]^m/\mathrm{Syz}_{\mathbf{M}}(F)$,
• the matrix $B \in K^{\Delta \times D}$ whose rows are $\varepsilon_1 \cdot F, \dots, \varepsilon_\Delta \cdot F$.

1. $\beta \leftarrow 2^{\lceil \log_2(D) \rceil + 1}$; $\beta \leftarrow (\beta,\dots,\beta) \in \mathbb{N}_{>0}^r$
2. $\phi_{\prec,\beta} \leftarrow$ the indexing function in Definition 3.1
3. $\pi \leftarrow$ the permutation matrix in $\{0,1\}^{m \times m}$ such that the tuple $t = \pi\,[\phi_{\prec,\beta}(c_1),\dots,\phi_{\prec,\beta}(c_m)]^{\mathsf T}$ is increasing
4. $B \leftarrow \pi F$
5. $\hat\delta, (i_1,\dots,i_{\hat\delta}) \leftarrow$ RowRankProfile($B$)
6. For $k$ from $1$ to $r$:   // iterate over the variables
   a. $P \leftarrow M_k$; $e \leftarrow 0$
   b. Do
      (i) $\delta \leftarrow \hat\delta$
      (ii) $(\rho_1,\dots,\rho_\delta) \leftarrow$ subtuple of $t$ with entries $i_1,\dots,i_\delta$
      (iii) $B \leftarrow$ the submatrix of $B$ with rows $i_1,\dots,i_\delta$
      (iv) $\hat\rho_j \leftarrow \phi_{\prec,\beta}(X_k^{2^e}\,\phi_{\prec,\beta}^{-1}(\rho_j))$ for $1 \le j \le \delta$
      (v) $\pi \leftarrow$ the permutation matrix in $\{0,1\}^{2\delta \times 2\delta}$ such that the tuple $t = \pi\,[\rho_1,\dots,\rho_\delta,\hat\rho_1,\dots,\hat\rho_\delta]^{\mathsf T}$ is increasing
      (vi) $B \leftarrow \pi \begin{bmatrix} B \\ BP \end{bmatrix}$
      (vii) $\hat\delta, (i_1,\dots,i_{\hat\delta}) \leftarrow$ RowRankProfile($B$)
      (viii) $P \leftarrow P^2$; $e \leftarrow e + 1$
      Until $\hat\delta = \delta$ and $(\rho_1,\dots,\rho_\delta) =$ subtuple of $t$ with entries $i_1,\dots,i_\delta$
7. Return $\mathcal{E} = (\phi_{\prec,\beta}^{-1}(\rho_1),\dots,\phi_{\prec,\beta}^{-1}(\rho_\delta))$ and the submatrix of $B$ with rows $i_1,\dots,i_\delta$

Proposition 3.7.
Algorithm 1 returns the $\prec$-monomial basis $\mathcal{E} = (\varepsilon_1,\dots,\varepsilon_\Delta)$ of $K[X]^m/\mathrm{Syz}_{\mathbf{M}}(F)$ and the matrix $B \in K^{\Delta \times D}$ whose rows are $\varepsilon_1 \cdot F, \dots, \varepsilon_\Delta \cdot F$. It uses
$$O\!\left(mD^{\omega-1} + D^{\omega}\,(r + \log(d_1 \cdots d_r))\right) \subset O\!\left(mD^{\omega-1} + rD^{\omega}\log(D)\right)$$
operations in $K$, where $d_k \in \{1,\dots,D\}$ is the degree of the minimal polynomial of $M_k$, for $1 \le k \le r$.

Proof. Let $\beta = 2^{\lceil \log_2(D) \rceil + 1}$ as in the algorithm; for $1 \le k \le r$ and $0 \le e \le \log_2(\beta)$, let us consider the set of monomials
$$\mathcal{S}_{k,e} = \{X^{e'} c_i \mid 1 \le i \le m,\ 0 \le e' < (\beta,\dots,\beta,2^e,1,\dots,1)\},$$
where $2^e$ is the $k$-th entry of the degree bound tuple. Then we denote by $C_{k,e} \in K^{(m\beta^{k-1}2^e) \times D}$ the submatrix of $\mathcal{K}_{\prec,\beta}(\mathbf{M},F)$ formed by its rows in $\phi_{\prec,\beta}(\mathcal{S}_{k,e})$. For $0 \le e \le e' \le \log_2(\beta)$, $C_{k,e}$ is a submatrix of $C_{k,e'}$, but not necessarily a top submatrix of it. Remark also that for $k < r$, $C_{k,\log_2(\beta)} = C_{k+1,0}$.

The correctness of the algorithm is proved by an induction that involves $k$ and $e$. For $1 \le k \le r$, denote by $\ell_k \ge 0$ the largest value of $e$ for which we enter the body of the Do-Until loop. Then, for $1 \le k \le r$ and $0 \le e \le \ell_k$, define the following assertions, that we consider at the beginning of iteration $(k,e)$ of the Do-Until loop:

$A_1$: the rows of indices $i_1,\dots,i_{\hat\delta}$ in $B$ are the rows defining the row rank profile of $C_{k,e}$;
$A_2$: the entries of indices $i_1,\dots,i_{\hat\delta}$ in $t$ are the indices of these rows in $\mathcal{K}_{\prec,\beta}(\mathbf{M},F)$.

We will prove by induction that these properties hold for all values of $k$ and $e$ considered above. For $k = 1$ and $e = 0$, $C_{1,0}$ is the submatrix of $\mathcal{K}_{\prec,\beta}(\mathbf{M},F)$ with rows in $\{\phi_{\prec,\beta}(c_1),\dots,\phi_{\prec,\beta}(c_m)\}$; therefore, by choice of the permutation $\pi$ at Step 3, we have $C_{1,0} = \pi F = B$ at Step 4. Thus, upon entering the For loop for the first time, with $k = 1$ and $e = 0$, we see that $A_1$ and $A_2$ hold.

Then let $k$ be in $\{1,\dots,r\}$ and $e$ in $\{0,\dots,\ell_k\}$; we assume that $A_1$ and $A_2$ hold at indices $k$ and $e$. Let us denote $\rho = \{\rho_1,\dots,\rho_\delta\}$ as defined in Step (ii), and let $\hat\rho = \{\hat\rho_1,\dots,\hat\rho_\delta\}$, where $\hat\rho_j = \phi_{\prec,\beta}(X_k^{2^e}\,\phi_{\prec,\beta}^{-1}(\rho_j))$ for $1 \le j \le \delta$ are the indices computed at Step (iv). Let also $(\gamma_1,\dots,\gamma_\nu)$ be the indices of the rows of $\mathcal{K}_{\prec,\beta}(\mathbf{M},F)$ corresponding to the row rank profile of its submatrix $C_{k,e+1}$. To complete this proof, we will use three intermediate lemmas; first, we claim that the following holds:

Lemma 3.8. $(\gamma_1,\dots,\gamma_\nu)$ is a subsequence of the tuple $t$.

Proof of Lemma 3.8. Let $j$ be in $\{1,\dots,\nu\}$ and let us prove that $\gamma_j$ is in $\rho \cup \hat\rho$. By assumption, the row of index $\gamma_j$ in $\mathcal{K}_{\prec,\beta}(\mathbf{M},F)$ is not a linear combination of the rows of smaller indices in the submatrix $C_{k,e+1}$ of $\mathcal{K}_{\prec,\beta}(\mathbf{M},F)$.

Suppose first that $\phi_{\prec,\beta}^{-1}(\gamma_j)$ is in $\mathcal{S}_{k,e}$, so that $\gamma_j$ actually corresponds to a row in $C_{k,e}$. Since $C_{k,e}$ is a submatrix of $C_{k,e+1}$, the remark above implies that the row of index $\gamma_j$ in $\mathcal{K}_{\prec,\beta}(\mathbf{M},F)$ is not a linear combination of the rows of smaller indices in $C_{k,e}$. This means that this row belongs to the row rank profile of $C_{k,e}$, and so (by $A_2$) $\gamma_j$ is in $\rho$.

Now, we assume that $\phi_{\prec,\beta}^{-1}(\gamma_j) \in \mathcal{S}_{k,e+1} \setminus \mathcal{S}_{k,e}$, and we prove that $\gamma_j \in \hat\rho$, or in other words, that $\phi_{\prec,\beta}^{-1}(\gamma_j) \in \{X_k^{2^e}\,\phi_{\prec,\beta}^{-1}(\rho_{j'}) \mid 1 \le j' \le \delta\}$. Since $\phi_{\prec,\beta}^{-1}(\gamma_j) \in \mathcal{S}_{k,e+1} \setminus \mathcal{S}_{k,e}$, we can write $\phi_{\prec,\beta}^{-1}(\gamma_j) = X_k^{2^e} X^f c_i$, with $X^f c_i$ in $\mathcal{S}_{k,e}$. Suppose that $X^f c_i$ is not in $\phi_{\prec,\beta}^{-1}(\rho)$, so that by $A_1$, the row of index $\phi_{\prec,\beta}(X^f c_i)$ in $\mathcal{K}_{\prec,\beta}(\mathbf{M},F)$ is a linear combination of the previous rows in $C_{k,e}$. Right-multiply all these rows by $M_k^{2^e}$; this shows that the row indexed by $\gamma_j$ in $\mathcal{K}_{\prec,\beta}(\mathbf{M},F)$ is a linear combination of rows of smaller indices in its submatrix $C_{k,e+1}$, a contradiction. Our claim is proved. $\square$

Now, by $A_1$ and $A_2$, after the update at Step (vi), the rows of $B$ are precisely the rows of $\mathcal{K}_{\prec,\beta}(\mathbf{M},F)$ of indices in $\rho \cup \hat\rho$, sorted in increasing order. Thus, after the row rank profile computation at Step (vii), the rows in $B$ of indices $i_1,\dots,i_{\hat\delta}$ are the rows of $C_{k,e+1}$ corresponding to its row rank profile, and the subtuple of $t$ with entries $i_1,\dots,i_{\hat\delta}$ is precisely $(\gamma_1,\dots,\gamma_\nu)$. If $e < \ell_k$, this implies that $A_1$ and $A_2$ still hold at step $(k,e+1)$. Suppose now that $e = \ell_k$. We claim the following.

Lemma 3.9. We have $\ell_k \le \lceil \log_2(d_k) \rceil$; equivalently, if we exit the Do-Until loop at the end of iteration $e$, then $e \le \lceil \log_2(d_k) \rceil$.

Proof of Lemma 3.9. First, we observe that the indices of the rows in $\mathcal{K}_{\prec,\beta}(\mathbf{M},F)$ corresponding to the row rank profile of $C_{k,e-1}$, and of those corresponding to the row rank profile of $C_{k,e}$, are different. Indeed, $A_1$ and $A_2$ show that the former correspond to the indices $(\rho_1,\dots,\rho_\delta)$ obtained at Step (ii) at iteration $e-1$, the latter to the same indices at iteration $e$, and the fact that we did not exit the loop at the end of iteration $e-1$ means that these tuples are different.

Suppose then, by means of contradiction, that $e \ge \lceil \log_2(d_k) \rceil + 1$, so that $2^{e-1} \ge d_k$. Consider a row $\mathbf{r}$ in $C_{k,e}$ that is not in $C_{k,e-1}$; then, $\mathbf{r} = \mathbf{s}\,M_k^{2^{e-1}}$ for some row $\mathbf{s}$ in $C_{k,e-1}$. By assumption, $2^{e-1}$ is at least equal to the degree $d_k$ of the minimal polynomial of $M_k$. In particular, $M_k^{2^{e-1}}$ is a linear combination of powers of $M_k$ of exponent less than $2^{e-1}$. Now, all rows $\mathbf{s}\,M_k^i$, for $i < 2^{e-1}$, are in $C_{k,e}$, and have lower indices than $\mathbf{r}$. This implies that $\mathbf{r}$ is not in the row rank profile of $C_{k,e}$, and thus $C_{k,e-1}$ and $C_{k,e}$ have the same row rank profile. This contradicts the property in the previous paragraph, hence $e \le \lceil \log_2(d_k) \rceil$. $\square$

Using this property, we prove that $A_1$ and $A_2$ now hold for indices $(k+1, 0)$.

Lemma 3.10. If we exit the Do-Until loop after iteration $e$ (equivalently, if $e = \ell_k$), then the rows in $B$ of indices $i_1,\dots,i_{\hat\delta}$ are the rows of $C_{k,\log_2(\beta)}$ corresponding to its row rank profile, and the subtuple of $t$ with entries $i_1,\dots,i_{\hat\delta}$ gives the indices of these rows in $\mathcal{K}_{\prec,\beta}(\mathbf{M},F)$.

Proof of Lemma 3.10. Our assumption means that $(\gamma_1,\dots,\gamma_\nu) = (\rho_1,\dots,\rho_\delta)$; this is equivalent to saying that the indices in $\mathcal{K}_{\prec,\beta}(\mathbf{M},F)$ of the row rank profiles of its submatrices $C_{k,e}$ and $C_{k,e+1}$ are the same. In particular, any row in $C_{k,e+1}$ is a linear combination of rows of lower indices in $C_{k,e}$.

We will prove the following below: any row in $C_{k,\log_2(\beta)}$ is a linear combination of rows of lower indices in $C_{k,e}$. In particular, this implies that the indices in $\mathcal{K}_{\prec,\beta}(\mathbf{M},F)$ of the row rank profiles of its submatrices $C_{k,e+1}$ and $C_{k,\log_2(\beta)}$ are all the same. Since we saw that after Step (vii), the rows in $B$ of indices $i_1,\dots,i_{\hat\delta}$ are the rows of $C_{k,e+1}$ corresponding to its row rank profile, and the subtuple of $t$ with entries $i_1,\dots,i_{\hat\delta}$ gives the indices of these rows in $\mathcal{K}_{\prec,\beta}(\mathbf{M},F)$, this is enough to prove the lemma.

We prove our claim by induction on the rows of $C_{k,\log_2(\beta)}$. Let thus $\mathbf{r}$ be a row in $C_{k,\log_2(\beta)}$, and assume the claim holds for all previous rows. If $\mathbf{r}$ is from its submatrix $C_{k,e}$, we are done. Else, since $e + 1 \le \log_2(\beta)$ holds (from the previous lemma), $\mathbf{r}$ can be written as $\mathbf{r} = \mathbf{s}\,M_k^c$ for some $c \ge 0$, where $\mathbf{s}$ is a row in $C_{k,e+1}$. We know that $\mathbf{s}$ is a linear combination of rows $\mathbf{s}_1,\dots,\mathbf{s}_\iota$ of lower indices in $C_{k,e}$, so that $\mathbf{r}$ is a linear combination of $\mathbf{s}_1 M_k^c,\dots,\mathbf{s}_\iota M_k^c$. All these rows are in $C_{k,\log_2(\beta)}$ and have lower indices than $\mathbf{r}$. By our induction assumption, they are linear combinations of rows of lower indices in $C_{k,e}$, and thus so is $\mathbf{r}$. $\square$

(Continuing the proof of Proposition 3.7.) If $k < r$ then we can turn to the next variable $X_{k+1}$, since the former lemma, together with the equality $C_{k,\log_2(\beta)} = C_{k+1,0}$, shows that $A_1$ and $A_2$ hold for indices $(k+1, 0)$. If $k = r$, since $C_{r,\log_2(\beta)} = \mathcal{K}_{\prec,\beta}(\mathbf{M},F)$, the lemma shows that the output of the algorithm is indeed the row rank profile of $\mathcal{K}_{\prec,\beta}(\mathbf{M},F)$ and the submatrix of $\mathcal{K}_{\prec,\beta}(\mathbf{M},F)$ formed by the corresponding rows. According to Theorem 3.6, the $\prec$-monomial basis can directly be deduced from the row rank profile of $\mathcal{K}_{\prec,\beta}(\mathbf{M},F)$. This concludes the proof of correctness.

Concerning the cost bound, according to [46, Thm. 2.10], the row rank profile computation at Step 5 can be done in $O(\rho^{\omega-2}mD)$ operations, where $\rho \le D$ is the rank of $F$. In particular, this is $O(mD^{\omega-1})$.

Let us now focus on iteration $(k,e)$, and show that it uses $O(D^\omega)$ operations. First, note that $\delta \le D$ holds throughout, since $\delta$ is the rank of a matrix with $D$ columns. Upon entering Step (vi), $B$ has $\delta \le D$ rows and $D$ columns, so its update, including the product $BP$, and the row rank profile computation at Step (vii) can be done in $O(D^\omega)$ operations. Finally, squaring the $D \times D$ matrix $P$ at Step (viii) is also done in $O(D^\omega)$ operations.

To conclude the proof of the cost bound, we recall from Lemma 3.9 that in iteration $k$ of the For loop, we pass $\ell_k + 1 \le \lceil \log_2(d_k) \rceil + 1$ times through the body of the Do-Until loop; summing over $k$ yields the announced bound.
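The RowRankProfile subroutine is only used through its interface above; a cubic-time stand-in over a small prime field is enough to experiment with the specification (the routine cited in the text achieves $O(\rho^{\omega-2}\mu\nu)$, which this naive sketch does not attempt). Names and the modulus are our own:

```python
P = 97  # illustrative prime modulus for the base field K = Z/97Z

def row_rank_profile(A):
    # Returns (rank, profile): profile is the lexicographically smallest
    # tuple of 1-based row indices whose rows are linearly independent and
    # whose cardinality is rank(A), as defined before Theorem 3.6.
    ncols = len(A[0]) if A else 0
    basis, profile = [], []
    for idx, row in enumerate(A, start=1):
        v = [x % P for x in row]
        for b in basis:
            # Each stored basis row has a pivot column that is zero in all
            # other basis rows, so one forward pass fully reduces v.
            piv = next(j for j in range(ncols) if b[j])
            if v[piv]:
                c = v[piv] * pow(b[piv], P - 2, P) % P
                v = [(x - c * y) % P for x, y in zip(v, b)]
        if any(v):
            basis.append(v)     # keep the reduced row as a new pivot row
            profile.append(idx)
    return len(profile), tuple(profile)

# The second row is twice the first, so the profile skips index 2.
print(row_rank_profile([[1, 2], [2, 4], [0, 1]]))  # (2, (1, 3))
```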
Remark 3.11. We may slightly refine the analysis for some particular monomial orders. Indeed, the order in which the For and Do-Until loops introduce the new monomials to be processed corresponds to the $\prec_{lex}$-term over position order $\prec_{toplex}$ on $K[X]^m$, with $X_1 \prec_{lex} \cdots \prec_{lex} X_r$. As a result, the behaviour and the cost bound of the algorithm can be described with more precision if the input monomial order is $\prec = \prec_{toplex}$.

In this case, we are processing the rows of $\mathcal{K}_{\prec,\beta}(\mathbf{M},F)$ in the order they appear in the matrix. In particular, the permutation $\pi$ at Steps 3 and (v) is always the identity matrix, and the tuple $(\rho_1,\dots,\rho_\delta)$ inside the loops consists of the first $\delta$ entries of the actual row rank profile of $\mathcal{K}_{\prec,\beta}(\mathbf{M},F)$.

Furthermore, the fact that we are processing the rows in their actual order has a small impact on the cost bound, as follows. Let us denote by $\mathcal{G}$ the reduced $\prec$-Gröbner basis of $\mathrm{Syz}_{\mathbf{M}}(F)$, and let $\hat\beta = (\hat\beta_1,\dots,\hat\beta_r)$ be the tuple of maximum degrees in $\mathcal{G}$, that is, $\hat\beta_k = \max_{p \in \mathcal{G}} \deg_{X_k}(p)$ for $1 \le k \le r$.

Then, at iteration $k$ of the For loop, the Do-Until loop does at most $\lceil \log_2(\hat\beta_k + 1) \rceil + 1$ iterations: once the introduced monomials have degrees in $X_k$ all greater than $\hat\beta_k$, the partial row rank profile $(\rho_1,\dots,\rho_\delta)$ is not modified anymore, and the Do-Until loop exits. Now, considering the total number of iterations, we claim that
$$\sum_{1 \le k \le r} \log_2(\hat\beta_k + 1) \;\le\; r \log_2\!\left(\frac{\hat\beta_1 + \cdots + \hat\beta_r}{r} + 1\right) \;\le\; r \log_2\!\left(\frac{D}{r} + 1\right). \quad (1)$$
As a result, when the input order is $\prec = \prec_{toplex}$, Algorithm 1 uses
$$O\!\left(mD^{\omega-1} + rD^{\omega}\log\!\left(\frac{D}{r} + 1\right)\right)$$
operations in $K$.

We now prove our claim. The first inequality in Eq. (1) is a direct application of the arithmetic mean-geometric mean inequality. The second inequality follows from the bound $\hat\beta_1 + \cdots + \hat\beta_r \le D + r - 1$, which holds since the $\prec$-monomial basis, whose cardinality $\Delta$ is at most $D$, contains the $1 + \hat\beta_1 + \cdots + \hat\beta_r - r$ distinct monomials specified hereafter. By definition of $\hat\beta$, for each $k \in \{1,\dots,r\}$ such that $\hat\beta_k > 0$, there is a monomial appearing in some element of $\mathcal{G}$ which is a multiple of $X_k^{\hat\beta_k} c_{i_k}$ for some $1 \le i_k \le m$. Thus $X_k^{\hat\beta_k} c_{i_k}$ is either in the $\prec$-monomial basis or in its border, which implies that the $\prec$-monomial basis contains $\{c_{i_k}, X_k c_{i_k}, \dots, X_k^{\hat\beta_k - 1} c_{i_k}\}$. Considering the union of all such sets for $1 \le k \le r$ (with the empty set if $\hat\beta_k = 0$) yields at least $1 + (\hat\beta_1 - 1) + \cdots + (\hat\beta_r - 1)$ distinct monomials.

3.3. Fast computation of the basis of syzygies

Next, we present our algorithm to compute the reduced Gröbner basis of $\mathrm{Syz}_{\mathbf{M}}(F)$. By definition, it can be described by the minimal generators of the leading module of $\mathrm{Syz}_{\mathbf{M}}(F)$ along with the associated normal forms. We first show how to use the knowledge of the monomial basis to compute such normal forms efficiently. In all this section, we let $\mathbf{M} = (M_1,\dots,M_r)$ be pairwise commuting matrices in $K^{D \times D}$, $F$ be in $K^{m \times D}$, $\prec$ be a monomial order on $K[X]^m$, and $\mathcal{E}$ be the $\prec$-monomial basis of $K[X]^m/\mathrm{Syz}_{\mathbf{M}}(F)$.

First, for some arbitrary monomials $\{X^{e_1}c_{i_1},\dots,X^{e_s}c_{i_s}\}$, we give an algorithm that computes their $\prec$-normal forms with respect to $\mathrm{Syz}_{\mathbf{M}}(F)$. Each of these $\prec$-normal forms is a uniquely defined $K$-linear combination of the monomials in the $\prec$-monomial basis $\mathcal{E}$; the main task of our algorithm is to find the coefficients of these $s$ combinations, which we gather below in an $s \times \Delta$ matrix $N$.

In the linearized viewpoint, we associate the monomials $\{X^{e_1}c_{i_1},\dots,X^{e_s}c_{i_s}\}$ with a matrix $T \in K^{s \times D}$, whose $j$-th row is $X^{e_j} c_{i_j} \cdot F = f_{i_j} \mathbf{M}^{e_j}$, where $f_{i_j}$ is the $i_j$-th row of $F$ (we are using the notation of Definition 1.1). Similarly, we associate the monomial basis $\mathcal{E}$ with a matrix $B \in K^{\Delta \times D}$, where $\Delta = \dim_K(K[X]^m/\mathrm{Syz}_{\mathbf{M}}(F))$ as above; if we write $\mathcal{E} = (\varepsilon_1 \prec \cdots \prec \varepsilon_\Delta)$, then the $j$-th row of $B$ is $\varepsilon_j \cdot F$.

Then the rows of $T$ are $K$-linear combinations of the rows of $B$: there is a matrix $N \in K^{s \times \Delta}$ such that $T = NB$. (The notation $T$ stands for terms, while $B$ stands for basis, and $N$ for normal forms.) To compute this matrix $N$, one can directly use $N = TB^{-1}$ if $B$ is square, and more generally one can use a similar identity $N = \bar T \bar B^{-1}$, where $\bar B$ is a $\Delta \times \Delta$ invertible submatrix of $B$, as detailed in Step 3 of Algorithm 2.

Proposition 3.12.
Algorithm 2 is correct and uses O ( ∆ ω − ( D + s )) operations in K .Proof. For the correctness, we focus on the case s =
1; to prove the general case s (cid:62) ffi cesto apply the following arguments for each j ∈ { , . . . , s } , to the j th row of T and the correspondingoutput element ν j . Thus, we consider T = X e c i · F = f i M e for some e ∈ N r and 1 (cid:54) i (cid:54) m , andour goal is to prove that nf ≺ ( X e c i ) = ν ε + · · · + ν ∆ ε ∆ where N = [ ν · · · ν ∆ ] is the uniquevector in K × ∆ such that T = NB . Choosing large enough exponent bounds β ∈ N r > , such as β = (max( e , D ) + , . . . , max( e , D ) + T is a row of the multi-Krylov matrix K ≺ , β ( M , F ), and from Theorem 3.6 that B is the submatrix of K ≺ , β ( M , F ) formedby the rows corresponding to its row rank profile. This proves the existence and uniqueness of N such that T = NB .We now explain the computation of N in Step . We use the column rank profile of B as aspecific set of column indices ¯ ρ < · · · < ¯ ρ ∆ such that the corresponding ∆ × ∆ submatrix ¯ B of B is invertible. Then, writing ¯ T ∈ K × ∆ for the subvector of T formed by its entries { ¯ ρ , . . . , ¯ ρ ∆ } ,the identity T = NB yields ¯ T = N ¯ B , hence N = ¯ T ¯ B − .Since the j th row of B is ε j · F , we have = T − NB = ( X e c i − ν ε + · · · + ν ∆ ε ∆ ) · F , hence X e c i − ν ε + · · · + ν ∆ ε ∆ is in Syz M ( F ). Thusnf ≺ ( X e c i ) = nf ≺ ( ν ε + · · · + ν ∆ ε ∆ ) = ν ε + · · · + ν ∆ ε ∆ ;indeed ν ε + · · · + ν ∆ ε ∆ is already in ≺ -normal form as it is a combination of the ≺ -monomialbasis E of K [ X ] m / Syz M ( F ). This concludes the proof of correctness.19 lgorithm 2 – N ormal F orm Input: • matrix T ∈ K s × D , • list of monomials E = ( ε , . . . , ε ∆ ) in K [ X ] m , • matrix B ∈ K ∆ × D with full row rank. Output: list ( ν , . . . , ν s ) of elements of K [ X ] m Ensures: assuming the following holds: • M = ( M , . . . , M r ) are pairwise commuting matrices in K D × D , • F is a matrix in K m × D with rows f , . . . 
, f_m,
• the jth row of T is X^{e_j} c_{i_j} · F = f_{i_j} M^{e_j} for some monomial X^{e_j} c_{i_j},
• E is the ≺-monomial basis of K[X]^m / Syz_M(F) for some monomial order ≺ on K[X]^m,
• the jth row of B is ε_j · F,
then ν_j = nf_≺(X^{e_j} c_{i_j}) for 1 ≤ j ≤ s.
1. /* Compute the matrix N ∈ K^{s×∆} such that T = NB */
(ρ̄_1, . . . , ρ̄_∆) ← the column rank profile of B
B̄ and T̄ ← submatrices of B and T formed by the columns {ρ̄_1, . . . , ρ̄_∆}
N = [ν_{i,j}]_{1≤i≤s, 1≤j≤∆} ← T̄ B̄^{−1}
2. /* Deduce normal forms */
For i from 1 to s: ν_i ← ν_{i,1} ε_1 + · · · + ν_{i,∆} ε_∆ ∈ K[X]^m
3. Return (ν_1, . . . , ν_s)

Concerning the cost bound, Steps 2 and 3 do not require operations in K. In Step 1, the column rank profile is obtained in O(∆^{ω−1} D) operations according to [46, Thm. 2.10]; the inversion of B̄ costs O(∆^ω); and the multiplication T̄ B̄^{−1} uses O(s ∆^{ω−1}) operations if s ≥ ∆, and O(∆^ω) otherwise. Since ∆ ≤ D, we obtain the announced bound.

Remark 3.13. One may observe that Algorithm 2 works in the more general case where E is a basis of the K-vector space K[X]^m / Syz_M(F), and thus in particular when E is the monomial basis associated to a border basis of Syz_M(F), which is not necessarily related to a monomial order. In that case, each output element ν_j ∈ K[X]^m is the unique polynomial which is equal to X^{e_j} c_{i_j} modulo Syz_M(F) and whose monomials are in E. Besides, the argument referring to Theorem 3.6 in the proof above can be replaced by the fact that, assuming β large enough, B is a submatrix of K_{≺,β}(M, F) formed by ∆ = rank(K_{≺,β}(M, F)) linearly independent rows. The above cost bound thus also holds in this more general case, yet one should note that using Algorithm 2 requires the knowledge of the input matrix B, which might be expensive to compute from E and F depending on the choice of E. In our specific case, Algorithm 1 outputs both E and B.
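The linear-algebra core of Algorithm 2 is short enough to sketch in code. The following Python/NumPy fragment is only an illustration over floating-point numbers (the algorithm itself works over an arbitrary field K with exact rank computations); the function name `normal_form_coeffs` and the greedy scan for the column rank profile are our own choices, not the paper's optimized routines:

```python
import numpy as np

def normal_form_coeffs(T, B):
    # Core step of Algorithm 2: find N with T = N B, where B has full row rank.
    # We locate the column rank profile of B (leftmost columns forming an
    # invertible Delta x Delta submatrix), then solve on those columns only.
    delta, D = B.shape
    cols = []
    for j in range(D):
        # keep column j if it increases the rank of the selected columns
        if np.linalg.matrix_rank(B[:, cols + [j]]) == len(cols) + 1:
            cols.append(j)
        if len(cols) == delta:
            break
    B_bar = B[:, cols]   # invertible Delta x Delta submatrix of B
    T_bar = T[:, cols]   # matching entries of T
    # N such that T = N B; row i holds the coefficients of the normal form
    # of row i of T on the monomial basis E
    return T_bar @ np.linalg.inv(B_bar)
```

Row i of the returned matrix gives the coefficients ν_{i,1}, . . . , ν_{i,∆} from which Step 2 assembles the normal form ν_i on the basis E.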
To compute the reduced ≺-Gröbner basis G of Syz_M(F), we start by using Algorithm 1 to find the ≺-monomial basis E = (ε_1, . . . , ε_∆), together with the matrix B giving all ε_k · F. From E, we deduce the set L = {lt_≺(p) | p ∈ G} formed by the ≺-leading terms of the polynomials in G, as explained in the next paragraph. Finally, having L, we compute ≺-normal forms modulo Syz_M(F) using Algorithm 2 so as to obtain G = {f − nf_≺(f) | f ∈ L}. We refer to Section 2 for more details concerning the latter identity and the sets of monomials S and B used in the next paragraph.

To find L, we start from E = (ε_1, . . . , ε_∆) and consider the set of multiples

S = {X_k ε_j | 1 ≤ k ≤ r, 1 ≤ j ≤ ∆} ∪ {c_i | 1 ≤ i ≤ m such that c_i ∉ E}.

It gives us the border, as B = S − E. The latter is a set of monomial generators for the monomial submodule ⟨lt_≺(Syz_M(F))⟩, while L is the minimal generating set of the same submodule. Thus L can be found from B by discarding all monomials in B which are divisible by another monomial in B. The number of generators Card(G) is not known in advance; it is at least m, and at most r∆ + m, as explained in Section 2. In particular, the output Gröbner basis is represented using at most rD² + mD field elements, as claimed in the introduction.

The ≺-normal forms of the monomials in L will be computed using Algorithm 2; for this, we need to know f · F, for f in L. The matrix B describes all ε_j · F, for 1 ≤ j ≤ ∆; on the other hand, we know that any f in L which is not among {c_1, . . . , c_m} is a product of the form f = X_k ε_j, for some k in {1, . . . , r} and j in {1, . . . , ∆}. In such a case, f · F can be computed as (ε_j · F) M_k; in the algorithm, we use fast matrix multiplication to compute several f · F at once. Altogether, Algorithm 3 and Proposition 3.14 prove Theorem 1.7.

Proposition 3.14.
Algorithm 3 is correct and uses

O(mD^{ω−1} + D^ω(r + log(d_1 · · · d_r))) ⊆ O(mD^{ω−1} + rD^ω log(D))

operations in K, where d_k ∈ {1, . . . , D} is the degree of the minimal polynomial of M_k, for 1 ≤ k ≤ r.

Proof. Concerning correctness, the construction of B ensures that after Step 5.d, the rows of T_k are the rows X_k^{−1} X^{e_{k,j}} c_{i_{k,j}} · F. Therefore, after Step 5.e the rows of T_k are the rows X^{e_{k,j}} c_{i_{k,j}} · F = f_{i_{k,j}} M^{e_{k,j}}. Then Proposition 3.12 implies that ν_{k,j} computed at Step 7 is the normal form nf_≺(X^{e_{k,j}} c_{i_{k,j}}), for 1 ≤ j ≤ s_k and 0 ≤ k ≤ r. This shows the correctness of the algorithm since, as explained above, the reduced ≺-Gröbner basis of syzygies is {f − nf_≺(f) | f ∈ L}.

Concerning the cost bound, the cost of Step 1 is given in Proposition 3.7 and is precisely the cost bound in the present proposition. Then, at the iteration k of the For loop, the multiplication at Step 5.e involves the s_k × D matrix T_k and the D × D matrix M_k. For k in {1, . . . , r}, since we have by definition s_k = Card(L_k) ≤ Card(E) = ∆ ≤ D, this multiplication is performed in O(D^ω) operations; over the r iterations, this leads to a total of O(rD^ω) operations. For k =
0, we have s_0 ≤ m, so the cost is O(mD^{ω−1} + D^ω). Finally, we have L_0 ∪ · · · ∪ L_r = L and therefore s_0 + · · · + s_r ≤ Card(L) ≤ r∆ + m ≤ rD + m; hence the cost for computing normal forms at Step 7 is in O(∆^{ω−1}(D + s_0 + · · · + s_r)) ⊆ O(mD^{ω−1} + rD^ω) according to Proposition 3.12.

Remark 3.15. By considering B instead of L at the second step, one could slightly modify Algorithm 3 so that, instead of the reduced ≺-Gröbner basis, it returns the border basis with respect to the ≺-monomial basis computed at the first step. One can verify that the computation of that monomial basis remains the most expensive step of the modified algorithm, and thus that the overall cost bound is the same as the one in Proposition 3.14.
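The second step of Algorithm 3 — deducing the minimal generating set L of ⟨lt_≺(Syz_M(F))⟩ from the monomial basis E — only manipulates exponent tuples: build the multiples S, take the border B = S − E, and discard border monomials divisible by another border monomial. A possible sketch in Python (the encoding of a module monomial X^e c_i as a pair `(exponents, i)` and the name `minimal_generators` are our conventions, not the paper's):

```python
def minimal_generators(E, r, m):
    # E: set of module monomials (exponent tuple of length r, coordinate index),
    # assumed to be a monomial basis (staircase) of K[X]^m / Syz_M(F).
    Eset = set(E)
    # multiples of basis elements by each variable X_k ...
    S = {(tuple(e + (l == k) for l, e in enumerate(exp)), i)
         for (exp, i) in Eset for k in range(r)}
    # ... plus the coordinate vectors c_i that are not in E
    S |= {((0,) * r, i) for i in range(m) if ((0,) * r, i) not in Eset}
    border = S - Eset            # B = S − E, generates the leading module

    def divides(f, g):
        # f divides g in the module sense: same coordinate, exponents <=
        return f[1] == g[1] and all(a <= b for a, b in zip(f[0], g[0]))

    # L: keep only the border monomials minimal for divisibility
    return {g for g in border
            if not any(f != g and divides(f, g) for f in border)}
```

For instance, for the staircase E = {1, X, Y} of a bivariate ideal (m = 1), the border is {X², XY, Y²} and all three monomials are minimal, so L is the whole border; for E = {1, X, Y, XY}, the border monomials X²Y and XY² are discarded and L = {X², Y²}.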
4. Computing multiplication matrices from the Gröbner basis
In this section, we tackle Problem 2 and prove Theorem 1.9. In what follows, N is a submodule of K[X]^n such that K[X]^n / N has finite dimension D as a K-vector space, ≺ is a monomial order on K[X]^n, and G is the reduced ≺-Gröbner basis of N. Having as input G, we give an algorithm to compute the multiplication matrices for this quotient with respect to the ≺-monomial basis, under the assumption on the leading module of N described in Definition 1.8. For conciseness, hereafter this assumption is called the structural assumption.

Algorithm 3 – SyzygyModuleBasis
Input:
• monomial order ≺ on K[X_1, . . . , X_r]^{1×m},
• pairwise commuting matrices M = (M_1, . . . , M_r) in K^{D×D},
• matrix F ∈ K^{m×D}.
Output: the reduced ≺-Gröbner basis of Syz_M(F).
1. /* Compute monomial basis */
E = (ε_1, . . . , ε_∆), B ← MonomialBasis(≺, M, F)
2. /* Compute leading monomials and their linearizations */
L ← minimal generating set of ⟨lt_≺(Syz_M(F))⟩, deduced from E
3. L_0 ← L ∩ {c_i | 1 ≤ i ≤ m}; write L_0 as {c_{i_{0,j}} | 1 ≤ j ≤ s_0} for some indices i_{0,1}, . . . , i_{0,s_0}; (e_{0,1}, . . . , e_{0,s_0}) ← (0, . . . , 0)
4. T_0 ← matrix formed by the rows i_{0,1}, . . . , i_{0,s_0} of F
5. For k from 1 to r:
  a. L_k ← {f ∈ L − (L_0 ∪ · · · ∪ L_{k−1}) | X_k divides f and X_k^{−1} f ∈ E}
  b. write L_k as {X^{e_{k,j}} c_{i_{k,j}} | 1 ≤ j ≤ s_k} for some exponents and indices e_{k,j}, i_{k,j}
  c. For j from 1 to s_k: µ_j ← index such that X_k^{−1} X^{e_{k,j}} c_{i_{k,j}} = ε_{µ_j}
  d. T_k ← matrix formed by the rows µ_1, . . . , µ_{s_k} of B, in this order
  e. T_k ← T_k M_k
6. T ← [T_0^T T_1^T · · · T_r^T]^T
7. /* Compute normal forms ν_{k,j} = nf_≺(X^{e_{k,j}} c_{i_{k,j}}) and return */
(ν_{0,1}, . . . , ν_{0,s_0}, . . . , ν_{r,1}, . . . , ν_{r,s_r}) ← NormalForm(T, E, B)
Return {X^{e_{k,j}} c_{i_{k,j}} − ν_{k,j} | 1 ≤ j ≤ s_k, 0 ≤ k ≤ r}

We first discuss the shape of the ≺-monomial basis of K[X]^n / N (see [16] and [35, Sec.
3] for similar observations in the case of ideals), and then we present an overview of our approach for computing the multiplication matrices.

Let E = (ε_1, . . . , ε_D) be the ≺-monomial basis of K[X]^n / N. Below, when discussing multiplication matrices (with respect to E) and elements of K[X]^n / N represented on the basis E, we assume that one has fixed an order on the elements of this basis, for example ε_1 ≺ · · · ≺ ε_D. Then the sought multiplication matrices M_1, . . . , M_r ∈ K^{D×D} are such that the row j of M_k is the coefficient vector of the normal form nf_≺(X_k ε_j) represented in the basis E, for 1 ≤ k ≤ r and 1 ≤ j ≤ D. Thus, we consider the set of these monomials obtained by multiplying those of E by a variable:

S = {X_k ε_j | 1 ≤ k ≤ r, 1 ≤ j ≤ D} ∪ {c_i | 1 ≤ i ≤ n such that c_i ∈ lt_≺(N)}.

Note that we have added the coordinate vectors c_i that are in the leading module lt_≺(N), or equivalently that are not in E. This is because the normal forms of these coordinate vectors will also be computed by our algorithm, for a negligible cost since they will be directly obtained from the ≺-Gröbner basis.

Regarding the computation of the normal forms of the monomials in S, we can divide them into three disjoint categories:

S = (S − B) ∪ L ∪ (B − L),

where B = S − E is the border and L = lt_≺(G) ⊆ B is the minimal generating set of ⟨lt_≺(N)⟩ (see Section 2 for more details, and Fig. 1 for an example).

The first set S − B is contained in E; precisely,

E = (S − B) ∪ {c_i | 1 ≤ i ≤ n such that c_i ∉ lt_≺(N)}.

As a result, each monomial in
S − B is its own ≺-normal form, and the corresponding rows of the multiplication matrices are coordinate vectors of length D which are obtained for free.

The monomials in the second set L are the ≺-leading terms of the elements of G, so that G = {f − nf_≺(f) | f ∈ L}. Thus, from the knowledge of G, one can obtain nf_≺(L) using at most sD computations of opposites in K, where s = Card(G); by opposite, we mean having on input α ∈ K and computing −α. We recall from Section 2 that s = Card(L) ≤ Card(S) < rD + n (the bound is strict here since D > 0).

It remains to deal with the third set B − L. As discussed above, our algorithm works under the structural assumption H(⟨lt_≺(N)⟩) from Definition 1.8. The next lemma summarizes the above discussion about the computation of nf_≺(E ∪ L), and also highlights one example of how one can exploit the structural assumption; note that this result appears in [14, Sec. 7] in the case of ideals, under slightly different assumptions.

Figure 1: Illustration of the sets of exponents in the case of a bivariate monomial ideal generated by a set L of monomials in X and Y. The elements of L are represented by squares, those of B − L by diamonds, and those of
S − B by circles. Here, the monomial basis is E = {1} ∪ (S − B). Monomials in the greyed area are those in ⟨L⟩, or in other words, those not in E.

Lemma 4.1.
Given the reduced ≺-Gröbner basis G of N, one can compute nf_≺(E ∪ L) using at most sD operations in K, where s is the cardinality of G. Assuming H(⟨lt_≺(N)⟩), we have {X_r ε_j | 1 ≤ j ≤ D} ⊂ E ∪ L and thus M_r can be read off from nf_≺(E ∪ L).

Proof. The first claim follows from the above discussion, recalling that s is also the cardinality of L. Indeed, nf_≺(E) is obtained for free, and for each monomial f in L, its normal form nf_≺(f) is computed using at most D computations of opposites of elements of K.

Now, assume H(⟨lt_≺(N)⟩) and suppose by contradiction that X_r ε_j ∉ E ∪ L for some j. Since X_r ε_j is not in E, it is in lt_≺(N), and thus it is a multiple X_r ε_j = X_1^{α_1} · · · X_r^{α_r} f of some f ∈ L, for some exponents α_1, . . . , α_r. Since X_r ε_j ∉ L, we have α_k > 0 for some 1 ≤ k ≤ r. If α_r >
0, then ε_j = X_1^{α_1} · · · X_r^{α_r − 1} f ∈ lt_≺(N), which is absurd since ε_j ∈ E; hence α_r = 0, so that k < r and X_r ε_j = X_1^{α_1} · · · X_{r−1}^{α_{r−1}} f. But then X_k^{−1} X_r ε_j ∈ lt_≺(N), and using H(⟨lt_≺(N)⟩) we arrive at the same contradiction: (X_k X_r^{−1}) · (X_k^{−1} X_r ε_j) = ε_j ∈ lt_≺(N). Therefore X_r ε_j ∈ E ∪ L for all j, which proves the inclusion in the lemma.

The last claim follows, since each row of M_r is the ≺-normal form of a monomial X_r ε_j, for some 1 ≤ j ≤ D.

Computing the remaining multiplication matrices requires computing normal forms of monomials in the third set B − L, which is more involved. Our main algorithmic ingredient to compute those efficiently is a procedure which computes a collection of vector-matrix products of the form vM^e, where M is some D × D matrix; in our context, M is one of the multiplication matrices that are already known at some point of the algorithm. We call this operation Krylov evaluation and we give an algorithm for it in Section 4.2. The next example gives a simple illustration of how Krylov evaluation occurs in the computation of multiplication matrices.
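The Krylov-evaluation primitive (detailed as Algorithm 4 in Section 4.2) can be sketched as follows. This illustrative Python version processes rows one at a time instead of batching them into a single matrix product as the actual algorithm does, but it performs the same O(log max_j e_j) squarings of M; the function name `krylov_eval` and the dictionary-based output are our conventions:

```python
import numpy as np
from math import ceil, log2

def krylov_eval(M, vs, es):
    # Compute v_j M^e for all 1 <= e <= e_j, using repeated squaring of M:
    # at round i we hold N = M^(2^(i-1)), and every row with exponent in
    # (2^(i-1), 2^i] is obtained from an already-known row by one product by N.
    out = {(1, j): v @ M for j, v in enumerate(vs)}   # case e = 1
    mu = max(es)
    rounds = ceil(log2(mu)) if mu > 1 else 0
    N = M
    for i in range(1, rounds + 1):
        if i > 1:
            N = N @ N                                 # N = M^(2^(i-1))
        half = 2 ** (i - 1)
        for j, ej in enumerate(es):
            for e in range(half + 1, min(ej, 2 * half) + 1):
                # v_j M^e = (v_j M^(e - half)) * M^half
                out[(e, j)] = out[(e - half, j)] @ N
    return out
```

With a single vector v and bound e_1 = D, this computes the Krylov sequence vM, vM², . . . , vM^D with only O(log D) matrix-matrix products instead of D vector-matrix passes over M's powers computed one by one.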
Remark 4.2. Assume N is an ideal of K[X] = K[X_1, X_2], that is, r = 2 and n =
1. The above lemma shows how to compute M_2 under the structural assumption. Having M_2, we will now see how Krylov evaluation allows us to compute M_1 using O(D^ω log(D)) operations in K; hence, in this context, both multiplication matrices are obtained in this cost bound.

As explained above, the rows of M_1 that correspond to normal forms in nf_≺(E ∪ L) are found using O(D²) operations, since here s ≤ D. Thus, it remains to compute its rows that are in nf_≺(B_1), where B_1 = {X_1 ε_j | 1 ≤ j ≤ D} − (E ∪ L). Write L = {X_1^{α_j} X_2^{β_j} | 1 ≤ j ≤ s} with (α_{j+1}, β_{j+1}) ≺_lex (α_j, β_j) for 1 ≤ j < s; since L is the minimal generating set of ⟨lt_≺(N)⟩, (α_j) is decreasing with α_s = 0, and (β_j) is increasing with β_1 =
0. Then B_1 = {X_1^{α_j} X_2^{β_j + k} | 1 ≤ k < β_{j+1} − β_j, 1 ≤ j < s}.

Now let v_j ∈ K^{1×D} be the vector which represents nf_≺(X_1^{α_j} X_2^{β_j}) in the basis E; since {v_1, . . . , v_s} represent nf_≺(L), these vectors have already been computed. Then the vectors representing nf_≺(B_1) are

{v_j M_2^k | 1 ≤ k < β_{j+1} − β_j, 1 ≤ j < s}.

Performing this Krylov evaluation using the algorithm in Section 4.2 takes O(D^ω log(D)) operations in K, as stated in Lemma 4.3, which involves parameters that are here σ ≤ β_s ≤ D and µ ≤ D.

More generally, for r variables and n ≥
1, one may similarly obtain M_{r−1} by computing such Krylov evaluations, assuming H(⟨lt_≺(N)⟩) (this is a consequence of the more general Lemmas 4.4 and 4.6). However, when r > 2, the matrix M_{r−2} cannot be obtained simply by Krylov evaluation with the matrix M_{r−1} and the normal forms in nf_≺(E ∪ L) and those given by the rows of M_{r−1}. The reason is that some of the normal forms which constitute the rows of M_{r−1} are actually obtained by Krylov evaluation with the matrix M_r and the normal forms in nf_≺(L).

Thus we change our focus, from the computation of the multiplication matrices to that of the normal forms which we can obtain by Krylov evaluation with the known multiplication matrices and known normal forms. Roughly, our algorithm is as follows. The first iteration is for i = r and considers S_r = E ∪ L, for which we have seen how to efficiently compute nf_≺(S_r). Our structural assumption ensures that these normal forms contain those giving M_r. Then the iteration i = r − 1 considers the monomials S_{r−1} that can be obtained from S_r = E ∪ L by multiplication by X_r, and their normal forms nf_≺(S_{r−1}) are computed using Krylov evaluation with M_r and the vectors representing nf_≺(S_r). Again, our assumption ensures that nf_≺(S_{r−1}) gives M_{r−1}, but it also contains other normal forms which correspond to rows of multiplication matrices M_1, . . . , M_{r−2}, whose computation is not complete yet. Then we continue this process until i =
1: at this stage, we have covered the whole set of monomials S and we thus have all the normal forms in nf_≺(S), from which we read the rows of the multiplication matrices. The algorithm is described in detail in Section 4.3.

Now we give a simple method for the computation of a collection of vector-matrix products of the form vM^e, obtaining efficiency via repeated squaring of the matrix M. This is detailed in Algorithm 4, in which we use the following conventions. When specified, instead of indexing the rows of a matrix K ∈ K^{σ×D} using the integers (1, . . . , σ), we index them by a given totally ordered set (P, ≤) of cardinality σ. Explicitly, if P = {e_1, . . . , e_σ} with e_1 ≤ · · · ≤ e_σ, then the ith row of K has index e_i. Then, for any subset P′ ⊆ P, we write Rows(K, P′) for the submatrix of K formed by its rows with indices in P′. An assignment operation such as Rows(K, P′) ← A for some A ∈ K^{Card(P′)×D} does modify the corresponding entries of K.

Algorithm 4 – KrylovEval
Input:
• a matrix M ∈ K^{D×D} for some D ∈ N_{>0},
• row vectors v_1, . . . , v_t ∈ K^{1×D} for some t ∈ N_{>0},
• bounds e_1, . . . , e_t ∈ N_{>0}.
Output: the matrix K ∈ K^{(e_1+···+e_t)×D} whose row e_1 + · · · + e_{j−1} + e is equal to v_j M^e, for 1 ≤ e ≤ e_j and 1 ≤ j ≤ t.
1. /* Initialize set of indices and output matrix */
P ← {(e, j) | 1 ≤ e ≤ e_j, 1 ≤ j ≤ t}
K ← 0 ∈ K^{(e_1+···+e_t)×D} with its rows indexed by (P, ≺_lex)
2. /* Case e = 1: compute v_j M for 1 ≤ j ≤ t */
P′ ← {(1, j) | 1 ≤ j ≤ t} // P′ ⊆ P
Rows(K, P′) ← [v_1^T · · · v_t^T]^T M
3. /* Repeated squaring: handle e ∈ {2^{i−1} + 1, . . .
, 2^i} for i > 1 */
N ← M
For i from 1 to ⌈log_2(max_j e_j)⌉:
  If i > 1 then N ← N² // N = M^{2^{i−1}}
  P′ ← {(e, j) | 2^{i−1} < e ≤ min(e_j, 2^i), 1 ≤ j ≤ t} // P′ ⊆ P
  P′′ ← {(e − 2^{i−1}, j) | (e, j) ∈ P′} // P′′ ⊆ P
  Rows(K, P′) ← Rows(K, P′′) · N
4. Return K

Lemma 4.3.
Given M ∈ K^{D×D}, v_1, . . . , v_t ∈ K^{1×D}, and e_1, . . . , e_t ∈ N_{>0}, Algorithm 4 computes the row vectors {v_j M^e | 1 ≤ e ≤ e_j, 1 ≤ j ≤ t} using

O(D^ω(1 + log(µ)) + D^{ω−1} σ (1 + log(µt/σ))) ⊆ O(D^{ω−1}(D + σ)(1 + log(µ)))

operations in K, where σ = e_1 + · · · + e_t and µ = max(e_1, . . . , e_t).

Proof. We want to prove that after Step 3, the row e_1 + · · · + e_{j−1} + e of K is v_j M^e, for 1 ≤ e ≤ e_j and 1 ≤ j ≤ t. Indexing the rows of K by (P, ≺_lex) as in the algorithm, this means that the row of K at index (e, j) is v_j M^e, for all (e, j) ∈ P. To prove this, we show that at the end of the ith iteration of the loop at Step 3, the following assertion holds:

A_i: the row of K at index (e, j) is equal to v_j M^e, for all (e, j) ∈ P such that e ≤ 2^i.

This gives the conclusion, since when the algorithm completes the loop, we have i = ⌈log_2(µ)⌉ and 2^i ≥ µ, and all (e, j) ∈ P are such that e ≤ µ by definition of µ.

First, (1, j) ∈ P for 1 ≤ j ≤ t, and after Step 2, the row of K at index (1, j) is equal to v_j M. So A_0 holds before the first iteration i =
1. Now, assume that A_{i−1} holds before the ith iteration: we want to prove that A_i holds at the end of this iteration. After the squaring at the beginning of the iteration, we have N = M^{2^{i−1}}. By construction, P′ is the set of indices such that we have the equality

P′ ∪ (P ∩ {(e, j) | 1 ≤ e ≤ 2^{i−1}, 1 ≤ j ≤ t}) = P ∩ {(e, j) | 1 ≤ e ≤ 2^i, 1 ≤ j ≤ t}.

Thus our goal is to show that for each (e, j) ∈ P′, at the end of the iteration the row of K at index (e, j) is v_j M^e. By assumption, the row of K at index (e − 2^{i−1}, j) is v_j M^{e−2^{i−1}}. Then the last step of the iteration ensures that the row of K at index (e, j) is v_j M^{e−2^{i−1}} N = v_j M^{e−2^{i−1}} M^{2^{i−1}} = v_j M^e. Thus, A_i holds, and this concludes the proof of correctness.

At Step 2, we multiply a t × D matrix and a D × D matrix, using O(D^ω + D^{ω−1} t) operations in K; this is within the bound in the lemma since t ≤ σ. Over all iterations of the loop at Step 3, the squarings of N use a total of O(D^ω(⌈log_2(µ)⌉ − 1)) ⊆ O(D^ω log(µ)) operations. The product Rows(K, P′′) · N is computed using O(D^ω + D^{ω−1} Card(P′′)) operations in K since the matrices have size Card(P′′) × D and D × D, respectively. By definition, Card(P′′) = Card(P′), and since P′ ⊆ P ∩ {(e, j) | 2^{i−1} < e ≤ 2^i, 1 ≤ j ≤ t}, we obtain Card(P′′) ≤ Card(P) = σ and Card(P′′) ≤ 2^{i−1} t. Then the total number of operations in K used for the computation of Rows(K, P′′) · N over all iterations of the loop at Step 3 is

O( Σ_{i=1}^{⌈log_2(µ)⌉} (D^ω + D^{ω−1} min(2^{i−1} t, σ)) ) ⊆ O( D^ω ⌈log_2(µ)⌉ + D^{ω−1} Σ_{i=1}^{⌈log_2(µ)⌉} min(2^{i−1} t, σ) ),

which is within the cost bound in the lemma.
Indeed, ⌈log_2(µ)⌉ ≤ 1 + log_2(µ) and

Σ_{i=1}^{⌈log_2(µ)⌉} min(2^{i−1} t, σ) ≤ Σ_{i=1}^{⌊log_2(σ/t)⌋} 2^{i−1} t + Σ_{i=⌊log_2(σ/t)⌋+1}^{⌈log_2(µ)⌉} σ ≤ σ(1 + ⌈log_2(µ)⌉ − ⌊log_2(σ/t)⌋) ≤ σ(3 + log_2(µt/σ)).

Now, we describe our algorithm for computing the multiplication matrices and give a complexity analysis. We follow on from the notation in Section 4.1. We are going to show how to compute the normal forms of all monomials in
E ∪ B = S ∪ {c_i | 1 ≤ i ≤ n such that c_i ∉ lt_≺(N)};

since this set contains S, this directly yields the multiplication matrices. We design an iteration on the r variables which computes the normal forms of r sets E ∪ L = Ŝ_r ⊆ Ŝ_{r−1} ⊆ · · · ⊆ Ŝ_1 = E ∪ B; at the end, nf_≺(Ŝ_1) gives the sought normal forms.

Thus, we start with the normal forms of the monomials in Ŝ_r = E ∪ L, which are easily found from G (see Lemma 4.1). Then, for 1 ≤ i < r, we consider the monomials in E ∪ B which are obtained from
E ∪ L through multiplication by X_{i+1}, . . . , X_r:

Ŝ_i = {X_{i+1}^{e_{i+1}} · · · X_r^{e_r} f | e_{i+1}, . . . , e_r ≥ 0, f ∈ E ∪ L} ∩ (E ∪ B).

The normal forms nf_≺(Ŝ_i) can be obtained from those in nf_≺(E ∪ L) through multiplication by M_{i+1}, . . . , M_r, if these matrices are known.

From these sets, we define the sets mentioned at the end of Section 4.1: S_r = Ŝ_r = E ∪ L for i = r, and S_i = Ŝ_i − Ŝ_{i+1} for 1 ≤ i < r. Therefore Ŝ_i is the disjoint union S_i ∪ · · · ∪ S_r, and S_i is the set of monomials in B − Ŝ_{i+1} which can be obtained from E ∪ L through multiplication by a monomial in X_{i+1}, . . . , X_r which does involve the variable X_{i+1}. That is,

S_i = {X_{i+1}^{e_{i+1}} · · · X_r^{e_r} f | e_{i+1} > 0, e_{i+2}, . . . , e_r ≥ 0, f ∈ E ∪ L} ∩ (B − Ŝ_{i+1})
    = {X_{i+1}^e f | e > 0, f ∈ Ŝ_{i+1}} ∩ (B − Ŝ_{i+1}).

In particular, if M_{i+1} and nf_≺(Ŝ_{i+1}) are known, then nf_≺(S_i) can be computed via Krylov evaluation with the matrix M_{i+1} and the vectors representing nf_≺(Ŝ_{i+1}). Having nf_≺(S_i) gives us nf_≺(Ŝ_i) = nf_≺(S_i) ∪ nf_≺(Ŝ_{i+1}), and we will prove in Lemma 4.4 that M_i can be read off from nf_≺(Ŝ_i), under the structural assumption. Thus we can proceed iteratively, since then, from M_i and nf_≺(Ŝ_i) we can use Krylov evaluation to find nf_≺(S_{i−1}), from which we deduce M_{i−1}, etc. At the end of this process we have computed nf_≺(Ŝ_1) ⊇ nf_≺(S) and deduced all the multiplication matrices.

Lemma 4.4.
Assuming H(⟨lt_≺(N)⟩), we have {X_i ε_j | 1 ≤ j ≤ D} ⊆ Ŝ_i for all 1 ≤ i ≤ r; in particular, the multiplication matrices M_i, . . . , M_r can be read off from nf_≺(Ŝ_i).

Proof. Note that for i = r, this result was already proved in Lemma 4.1 (and we will use similar arguments in the proof below); besides, for i = 1 we have Ŝ_1 = E ∪ B.

Let i ∈ {1, . . . , r}. First, M_{i+1}, . . . , M_r can be read off from nf_≺(Ŝ_i) since for k ∈ {i+1, . . . , r} we have {X_k ε_j | 1 ≤ j ≤ D} ⊆ Ŝ_k ⊆ Ŝ_i. The fact that M_i can be read off from nf_≺(Ŝ_i) follows from the inclusion {X_i ε_j | 1 ≤ j ≤ D} ⊆ Ŝ_i in the lemma; it remains to prove that inclusion.

Thus, we want to prove X_i ε_j ∈ Ŝ_i for any j ∈ {1, . . . , D}; for this we will use the structural assumption. The particular case X_i ε_j ∈ E ∪ L is obvious since E ∪ L = S_r ⊆ Ŝ_i. Now we consider X_i ε_j ∉ E ∪ L. Then X_i ε_j ∈ lt_≺(N) and there exist exponents α_1, . . . , α_r not all zero and a monomial f ∈ L such that X_i ε_j = X_1^{α_1} · · · X_r^{α_r} f.

Suppose by contradiction that there exists k ∈ {1, . . . , i} such that α_k >
0. If k = i, then α_i > 0 implies that ε_j is a multiple of f, hence ε_j ∈ lt_≺(N), which is not the case. If 1 ≤ k < i, then α_k > 0 implies that X_k^{−1} X_i ε_j ∈ lt_≺(N), and using the structural assumption we obtain the same contradiction: (X_k X_i^{−1}) · (X_k^{−1} X_i ε_j) = ε_j ∈ lt_≺(N). Thus there is no such k, and α_1 = · · · = α_i =
0. As a result, X_i ε_j = X_{i+1}^{α_{i+1}} · · · X_r^{α_r} f, which is in Ŝ_i.

For completeness, in Algorithm 5 we describe a straightforward subroutine for determining the sets of monomials (S_i)_{1≤i≤r}, which is directly based on the description of these sets in Lemma 4.5. Note that this computation does not involve field operations but only comparisons of exponents of monomials, so that here the time for finding these sets is not taken into account in our cost bounds. In an efficient implementation of our algorithm for finding the multiplication matrices, one would rather compute these sets while building B from G, and we believe that finding these sets should indeed be a negligible part of the running time of such an implementation.

Lemma 4.5.
Assume H(⟨lt_≺(N)⟩) and let 1 ≤ i < r. Let {f_1, . . . , f_t} = {f ∈ Ŝ_{i+1} ∩ B | X_{i+1} f ∈ B − Ŝ_{i+1}}, and for 1 ≤ j ≤ t, let e_j ∈ N_{>0} be the largest integer such that X_{i+1}^{e_j} f_j ∈ B − Ŝ_{i+1}. Then S_i = {X_{i+1}^e f_j | 1 ≤ e ≤ e_j, 1 ≤ j ≤ t}.

Proof.
First, we prove that S_i contains the latter set. Let 1 ≤ j ≤ t and 1 ≤ e ≤ e_j. Then f_j is in Ŝ_{i+1} and X_{i+1}^e f_j is in B. Since e > 0, X_{i+1}^e f_j is in Ŝ_i. Furthermore, X_{i+1}^e f_j is not in Ŝ_{i+1}, so that it is in Ŝ_i − Ŝ_{i+1} = S_i.

Now, let g ∈ S_i. By definition, this means that g is in Ŝ_i ⊆ E ∪ B, and g is not in Ŝ_{i+1} ⊇ E ∪ L. In particular, g is in B − Ŝ_{i+1}. Furthermore, we can write g = X_{i+1}^e f for some e > 0 and f ∈ Ŝ_{i+1}. Then let e′ be the smallest exponent such that X_{i+1}^{e′} f ∉ Ŝ_{i+1}; we have 1 ≤ e′ ≤ e since f ∈ Ŝ_{i+1} and g ∉ Ŝ_{i+1}. Let f′ = X_{i+1}^{e′−1} f ∈ Ŝ_{i+1}; then f′ is in B: if this was not the case, then f′ would be in E and X_{i+1} f′ would be in Ŝ_{i+1} according to Lemma 4.4. Furthermore, it is a property of the border that, since the multiple g = X_{i+1}^{e−e′+1} f′ is in B, the monomial X_{i+1} f′ is in E ∪ B; yet X_{i+1} f′ is not in Ŝ_{i+1}, which contains E, hence X_{i+1} f′ ∈ B − Ŝ_{i+1}. It follows that f′ = f_j for some 1 ≤ j ≤ t. Thus we have g = X_{i+1}^{e−e′+1} f_j, with e − e′ + 1 ≤ e_j by definition of e_j, which concludes the proof.

Algorithm 5 – NextMonomials
Input:
• the border B,
• the set of monomials Ŝ_{i+1} = S_{i+1} ∪ · · · ∪ S_r for some 1 ≤ i < r.
Output: the set of monomials S_i, in the form S_i = {X_{i+1}^e f_j | 1 ≤ e ≤ e_j, 1 ≤ j ≤ t} for some f_1, . . . , f_t ∈ Ŝ_{i+1} ∩ B and e_1, . . . , e_t ∈ N_{>0}.
1. S_i ← ∅; t ← 0
2. For each f ∈ Ŝ_{i+1} ∩ B such that X_{i+1} f ∈ B − Ŝ_{i+1}:
  a. e ← 1
  b. While X_{i+1}^{e+1} f ∈ B − Ŝ_{i+1}: e ← e + 1
  c. S_i ← S_i ∪ {X_{i+1} f, . . . , X_{i+1}^e f}
  d. t ← t + 1; e_t ← e; f_t ← f
3. Return S_i = {X_{i+1}^e f_j | 1 ≤ e ≤ e_j, 1 ≤ j ≤ t}

Next, we show how to compute nf_≺(S_i) from nf_≺(Ŝ_{i+1}) and M_{i+1} using Krylov evaluation.

Lemma 4.6.
Let i ∈ {1, . . . , r − 1}. Given (B, Ŝ_{i+1}, nf_≺(Ŝ_{i+1}), M_{i+1}), one can compute S_i and nf_≺(S_i) as follows:
• S_i = {X_{i+1}^e f_j | 1 ≤ e ≤ e_j, 1 ≤ j ≤ t_i} ← NextMonomials(B, Ŝ_{i+1}), for some f_1, . . . , f_{t_i} ∈ Ŝ_{i+1} ∩ B and e_1, . . . , e_{t_i} ∈ N_{>0};
• {v_1, . . . , v_{t_i}} ⊆ K^{1×D} ← nf_≺({f_1, . . . , f_{t_i}}), retrieved from nf_≺(Ŝ_{i+1});
nf_≺(S_i) ← rows of KrylovEval(M_{i+1}, v_1, . . . , v_{t_i}, e_1, . . . , e_{t_i}).

This uses

O(D^ω(1 + log(µ_i)) + D^{ω−1} σ_i (1 + log(µ_i t_i/σ_i))) ⊆ O(D^{ω−1}(D + σ_i)(1 + log(µ_i)))

operations in K, where σ_i = e_1 + · · · + e_{t_i} is the cardinality of S_i and µ_i = max(e_1, . . . , e_{t_i}). We have µ_i ≤ max{e ∈ N | X_{i+1}^e f ∈ B for some f ∈ B}.

Proof. In the first step, S_i is determined from Ŝ_{i+1} without field operations as shown in Algorithm 5, and it is obtained in the form S_i = {X_{i+1}^e f_j | 1 ≤ e ≤ e_j, 1 ≤ j ≤ t_i}, where f_1, . . . , f_{t_i} are elements of Ŝ_{i+1} and e_1, . . . , e_{t_i} are positive integers; in particular, e_1 + · · · + e_{t_i} = Card(S_i) = σ_i. The upper bound on µ_i holds since, by construction of S_i as in Algorithm 5 (see also Lemma 4.5), f_j ∈ B and X_{i+1}^{e_j} f_j ∈ B for 1 ≤ j ≤ t_i.

Going to the normal forms, we get

nf_≺(S_i) = {v_j M_{i+1}^e | 1 ≤ e ≤ e_j, 1 ≤ j ≤ t_i}    (2)

where v_j = nf_≺(f_j) ∈ K^{1×D} for 1 ≤ j ≤ t_i; these normal forms are already known since they are in nf_≺(Ŝ_{i+1}). This shows that the second item correctly computes nf_≺(S_i). The cost bound is a consequence of Lemma 4.3.

Algorithm 6 – MultiplicationMatrices
Input:
• a monomial order ≺ on K[X]^n,
• a reduced ≺-Gröbner basis G ⊆ K[X]^n.
Requirements:
• K[X]^n / N has finite dimension D as a K-vector space, where N = ⟨G⟩,
• H(⟨lt_≺(N)⟩) holds.
Output: • the ≺ -monomial basis E of K [ X ] n / N , • the multiplication matrices of X , . . . , X r in K [ X ] n / N with respect to E . /* Build main sets of exponents */ Read L and E = ( ε , . . . , ε D ) from G , with ε ≺ · · · ≺ ε D S ← { X k ε j | (cid:54) k (cid:54) r , (cid:54) j (cid:54) D } ∪ { c i | (cid:54) i (cid:54) n such that c i ∈ L}B ← S − E /* Below, normal forms are represented in E , as vectors in K × D */ /* Initialize the iteration: ˆ S = ˆ S r = E ∪ L , Q = nf ≺ ( ˆ S i + ) , find M r */ ˆ S ← E ∪ L ; Q ← nf ≺ ( ˆ S ); read M r from Q For i from r − to /* Before iteration i : ˆ S = ˆ S i + , Q = nf ≺ ( ˆ S i + ) , M i + , . . . , M r known *//* After iteration i : ˆ S = ˆ S i , Q = nf ≺ ( ˆ S i ) , M i , . . . , M r known */ a. (cid:101) S = { X ei + f j | (cid:54) e (cid:54) e j , (cid:54) j (cid:54) t } ← N ext M onomials ( B , ˆ S ), for some f , . . . , f t ∈ ˆ S and e , . . . , e t ∈ N > // (cid:101) S = S i b. { v , . . . , v t } ⊆ K × D ← nf ≺ ( { f , . . . , f t } ), retrieved from Q = nf ≺ ( ˆ S ) (cid:101) Q ← rows of K rylov E val ( M i + , v , . . . , v t , e , . . . , e t ) // (cid:101) Q = nf ≺ ( S i ) c. ˆ S ← (cid:101)
S ∪ ˆ S ; Q ← (cid:101)
Q ∪ Q ; read M i from ˆ S Return E , M , . . . , M r The correctness of Algorithm 6 follows from the results and discussions in this section. Thenext proposition implies the first item of Theorem 1.9, and gives a more precise cost bound. Ituses notation from Lemma 4.6. We remark that one could easily verify that the requirements30f Algorithm 6 hold while building the ≺ -monomial basis E at the first step, relying on thecharacterization of H ( (cid:104) lt ≺ ( N ) (cid:105) ) described in Lemma 2.2. Proposition 4.7.
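The core linear-algebra step b of Algorithm 6 — producing the rows v_j M_{i+1}^e of equation (2) — can be sketched as follows. This is a naive illustration over K = ℤ/7ℤ, not the paper's KrylovEval, which attains the cost of Lemma 4.3 through repeated squaring of the matrix in the style of Keller-Gehrig [28]; the function name `krylov_eval_naive` and its interface are ours.

```python
# Naive sketch of step b of Algorithm 6: given a multiplication matrix M
# (playing the role of M_{i+1}) and known normal forms v_1, ..., v_t as rows
# of length D, return the rows v_j * M^e for 1 <= e <= e_j, i.e. the set
# nf(S_i) of equation (2).  One vector-matrix product per produced row; the
# paper's KrylovEval is faster, via repeated squaring of M.
def krylov_eval_naive(M, vectors, exponents, p):
    """Rows v_j M^e mod p, for 1 <= e <= e_j and j = 1, ..., t."""
    D = len(M)

    def row_times_mat(v):
        # row vector times matrix, entrywise reduction modulo p
        return [sum(v[k] * M[k][c] for k in range(D)) % p for c in range(D)]

    rows = []
    for v, e_j in zip(vectors, exponents):
        w = [x % p for x in v]
        for _ in range(e_j):
            w = row_times_mat(w)  # w becomes v_j M^e for the next exponent e
            rows.append(w)
    return rows

# Tiny example over GF(7): D = 2, one starting vector, exponent bound e_1 = 2.
M = [[0, 1],
     [5, 3]]
rows = krylov_eval_naive(M, [[1, 0]], [2], p=7)
# rows == [[0, 1], [5, 3]], i.e. [v*M, v*M^2]
```

In the fast version, one first computes M, M², M⁴, … by log(μ_i) squarings and then batches the vector-matrix products, which is where the D^ω log(μ_i) term in Lemma 4.6 comes from.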
Proposition 4.7. Let ≺ be a monomial order on K[X]^n and let G be a reduced ≺-Gröbner basis such that K[X]^n/N has dimension D, where N = ⟨G⟩. Assume H(⟨lt_≺(N)⟩) and, using the notation above, let μ = max(μ_1, …, μ_{r−1}). Thus μ ≤ max{e ∈ ℕ | X_i^e f ∈ B for some f ∈ B and some 1 ≤ i ≤ r}, and in particular μ ≤ D. Then Algorithm 6 solves Problem 2 using

  O( D^ω (r − 1 + log(μ_1 ⋯ μ_{r−1})) + D^{ω−1} (Card(B − L) + Σ_{1≤i≤r−1} σ_i log(μ_i t_i / σ_i)) )
  ⊆ O( rD^ω (1 + log(μ)) ) ⊆ O( rD^ω log(D) )

operations in K.

Proof. First, Step 2 computes Ŝ_r = S_r = E ∪ L and nf_≺(Ŝ_r) from G, using O(rD²) computations of opposites of field elements (see Lemma 4.1). Then the For loop iteratively applies Lemma 4.6 to obtain the remaining matrices. Using notation from Lemma 4.6, the overall cost bound is

  O( Σ_{1≤i≤r−1} ( D^ω (1 + log(μ_i)) + D^{ω−1} σ_i (1 + log(μ_i t_i / σ_i)) ) )
  ⊆ O( D^ω (r − 1 + log(μ_1 ⋯ μ_{r−1})) + D^{ω−1} (Card(B − L) + Σ_{1≤i≤r−1} σ_i log(μ_i t_i / σ_i)) ).

Indeed, we have σ_1 + ⋯ + σ_{r−1} = Card(B − L), since σ_i = Card(S_i) and S_1 ∪ ⋯ ∪ S_{r−1} = Ŝ_1 − S_r = (E ∪ B) − (E ∪ L) = B − L. Using the bounds Card(B − L) ≤ rD (see Section 2) as well as μ_i t_i / σ_i ≤ μ_i ≤ μ for all i, we obtain the simplified cost bound O(rD^ω(1 + log(μ))).

Combining the above algorithms leads to an efficient change of order procedure, detailed in Algorithm 7. As above concerning the computation of multiplication matrices, one may easily verify from the input of Algorithm 7 whether the requirements hold. For simplicity, here we only use the simplified cost bounds of the above results; better bounds may be obtained in particular cases.
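To make the objects handled by Algorithm 6 and Proposition 4.7 concrete, here is a small hypothetical instance, not taken from the paper: for the ideal N = ⟨Y − X², X³⟩ in K[X, Y], the quotient K[X, Y]/N has monomial basis E = (1, X, X²), so D = 3, and the multiplication matrices of X and Y act on row vectors of normal forms as in the text. The sketch checks that these matrices commute, as they must for any pair of multiplication matrices, and reads off a normal form by evaluating a monomial on them.

```python
# Hypothetical example (not from the paper): for N = <Y - X^2, X^3> in
# K[X,Y], the monomial basis of K[X,Y]/N is E = (1, X, X^2), so D = 3.
# Multiplication matrices with respect to E, acting on row vectors v -> v*M:
# X maps 1 -> X, X -> X^2, X^2 -> X^3 = 0;  Y maps 1 -> X^2, X -> 0, X^2 -> 0.
MX = [[0, 1, 0],
      [0, 0, 1],
      [0, 0, 0]]
MY = [[0, 0, 1],
      [0, 0, 0],
      [0, 0, 0]]

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# The multiplication matrices commute, reflecting that the X_i commute on M.
assert mat_mul(MX, MY) == mat_mul(MY, MX)

# Normal forms are read off by evaluating monomials on the matrices:
# nf(X*Y) is the row of 1 (i.e. [1, 0, 0]) times MX, then times MY.
e1 = [1, 0, 0]
row = [sum(e1[k] * MX[k][j] for k in range(3)) for j in range(3)]   # nf(X)
row = [sum(row[k] * MY[k][j] for k in range(3)) for j in range(3)]  # nf(X*Y)
# row == [0, 0, 0]: indeed X*Y = X^3 lies in N
```

This is exactly the data (E, M_1, …, M_r) returned by Algorithm 6, here written down by hand for a toy case rather than computed from a Gröbner basis.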
Algorithm 7 – ChangeOrder

Input:
• a monomial order ≺_1 on K[X]^n,
• a reduced ≺_1-Gröbner basis G_1 ⊆ K[X]^n,
• a monomial order ≺_2 on K[X]^n.

Requirements:
• K[X]^n/N has finite dimension D as a K-vector space, where N = ⟨G_1⟩,
• H(⟨lt_{≺_1}(N)⟩) holds.

Output:
• the reduced ≺_2-Gröbner basis of N.

1. E = (ε_1, …, ε_D), M = (M_1, …, M_r) ← MultiplicationMatrices(≺_1, G_1)
2. /* Build a matrix F such that Syz_M(F) = N */
   F ← matrix in K^{n×D}
   For 1 ≤ i ≤ n:
     If c_i = ε_j for some 1 ≤ j ≤ D:
       i-th row of F ← [0 ⋯ 0 1 0 ⋯ 0] ∈ K^{1×D} with 1 at index j
     Else: /* in this case c_i − nf_{≺_1}(c_i) ∈ G_1 */
       i-th row of F ← vector in K^{1×D} representing nf_{≺_1}(c_i) on the basis E
3. G_2 ← SyzygyModuleBasis(≺_2, M, F)
4. Return G_2

Proposition 4.8. Algorithm 7 is correct and uses O(nD^{ω−1} + rD^ω log(D)) operations in K.

Proof. According to Proposition 4.7, Step 1 uses O(rD^ω log(D)) operations in K and returns the ≺_1-monomial basis E of K[X]^n/N and the multiplication matrices M with respect to E. To build the matrix F ∈ K^{n×D}, Step 2 uses O(nD) operations; precisely, each normal form of a coordinate vector in the Else statement costs at most D computations of the opposite of an element in K. By Proposition 3.14, Step 3 uses O(nD^{ω−1} + rD^ω log(D)) operations to compute the reduced ≺_2-Gröbner basis G_2 of Syz_M(F). Hence the overall cost bound. Proving correctness amounts to showing that Syz_M(F) = N, which directly follows from the construction of F and the fact that c_i − nf_{≺_1}(c_i) is in N:

  (p_1, …, p_n) ∈ Syz_M(F)
  ⇔ Σ_{1≤i≤n, c_i∈E} p_i c_i + Σ_{1≤i≤n, c_i∉E} p_i nf_{≺_1}(c_i) ∈ N
  ⇔ Σ_{1≤i≤n} p_i c_i = (p_1, …, p_n) ∈ N.

References

[1] Alonso, M.E., Marinari, M.G., Mora, T., 2003. The Big Mother of all Dualities: Möller Algorithm. Communications in Algebra 31, 783–818.
[2] Auzinger, W., Stetter, H.J., 1988. An elimination algorithm for the computation of all zeros of a system of multivariate polynomial equations, in: Proceedings Numerical Mathematics 1988, Birkhäuser Basel, Basel, pp. 11–30.
[3] Bayer, D., Stillman, M., 1987. A theorem on refining division orders by the reverse lexicographic order. Duke Mathematical Journal 55, 321–328.
[4] Beckermann, B., 1992. A reliable method for computing M-Padé approximants on arbitrary staircases. J. Comput. Appl. Math. 40, 19–42.
[5] Beckermann, B., Labahn, G., 1994. A uniform approach for the fast computation of matrix-type Padé approximants. SIAM J. Matrix Anal. Appl. 15, 804–823.
[6] Beckermann, B., Labahn, G., 1997. Recursiveness in matrix rational interpolation problems. J. Comput. Appl. Math. 77, 5–34.
[7] Beckermann, B., Labahn, G., 2000. Fraction-free computation of matrix rational interpolants and matrix gcds. SIAM J. Matrix Anal. Appl. 22, 114–144.
[8] Berthomieu, J., Boyer, B., Faugère, J.C., 2015. Linear algebra for computing Gröbner bases of linear recursive multidimensional sequences, in: ISSAC'15, ACM, New York, NY, USA, pp. 61–68.
[9] Berthomieu, J., Boyer, B., Faugère, J.C., 2017. Linear Algebra for Computing Gröbner Bases of Linear Recursive Multidimensional Sequences. J. Symbolic Comput. 83, 36–67.
[10] Berthomieu, J., Faugère, J.C., 2016. Guessing linear recurrence relations of sequence tuples and P-recursive sequences with linear algebra, in: ISSAC'16, ACM, New York, NY, USA, pp. 95–102.
[11] Berthomieu, J., Faugère, J.C., 2018. A polynomial-division-based algorithm for computing linear recurrence relations, in: ISSAC'18, ACM, New York, NY, USA, pp. 79–86.
[12] Coppersmith, D., Winograd, S., 1990. Matrix multiplication via arithmetic progressions. J. Symbolic Comput. 9, 251–280.
[13] Eisenbud, D., 1995. Commutative Algebra: with a View Toward Algebraic Geometry. Graduate Texts in Mathematics, Springer, New York, Berlin, Heidelberg.
[14] Faugère, J., Gaudry, P., Huot, L., Renault, G., 2013. Polynomial systems solving by fast linear algebra. CoRR abs/1304.6039. URL: http://arxiv.org/abs/1304.6039.
[15] Faugère, J.C., Gaudry, P., Huot, L., Renault, G., 2014. Sub-cubic change of ordering for Gröbner basis: a probabilistic approach, in: ISSAC'14, ACM, New York, NY, USA, pp. 170–177.
[16] Faugère, J.C., Gianni, P., Lazard, D., Mora, T., 1993. Efficient computation of zero-dimensional Gröbner bases by change of ordering. Journal of Symbolic Computation 16, 329–344.
[17] Faugère, J.C., Mou, C., 2011. Fast algorithm for change of ordering of zero-dimensional Gröbner bases with sparse multiplication matrices, in: ISSAC'11, ACM, New York, NY, USA, pp. 115–122.
[18] Faugère, J.C., Mou, C., 2017. Sparse FGLM algorithms. J. Symbolic Comput. 80, Part 3, 538–569.
[19] Fitzpatrick, P., 1997. Solving a Multivariable Congruence by Change of Term Order. J. Symbolic Comput. 24, 575–589.
[20] Galligo, A., 1974. À propos du théorème de préparation de Weierstrass, in: Fonctions de Plusieurs Variables Complexes, Springer, pp. 543–579.
[21] Giorgi, P., Jeannerod, C.P., Villard, G., 2003. On the complexity of polynomial matrix computations, in: ISSAC'03, ACM, pp. 135–142.
[22] Hermite, C., 1893. Sur la généralisation des fractions continues algébriques. Annali di Matematica Pura ed Applicata (1867-1897) 21, 289–308.
[23] Jeannerod, C.P., Neiger, V., Schost, É., Villard, G., 2016. Fast computation of minimal interpolation bases in Popov form for arbitrary shifts, in: ISSAC'16, ACM, pp. 295–302.
[24] Jeannerod, C.P., Neiger, V., Schost, É., Villard, G., 2017. Computing minimal interpolation bases. J. Symbolic Comput. 83, 272–314.
[25] Jeannerod, C.P., Neiger, V., Villard, G., 2019. Fast computation of approximant bases in canonical form. J. Symbolic Comput. In press.
[26] Kailath, T., 1980. Linear Systems. Prentice-Hall.
[27] Kehrein, A., Kreuzer, M., Robbiano, L., 2005. An algebraist's view on border bases, in: Dickenstein, A., Emiris, I.Z. (Eds.), Solving Polynomial Equations: Foundations, Algorithms, and Applications. Springer, pp. 169–202.
[28] Keller-Gehrig, W., 1985. Fast algorithms for the characteristic polynomial. Theoretical Computer Science 36, 309–317.
[29] Kojima, C., Rapisarda, P., Takaba, K., 2007. Canonical forms for polynomial and quadratic differential operators. Systems and Control Letters 56, 678–684.
[30] Le Gall, F., 2014. Powers of tensors and fast matrix multiplication, in: ISSAC'14, ACM, pp. 296–303.
[31] Macaulay, F.S., 1902. Some formulae in elimination. Proceedings of the London Mathematical Society s1-35, 3–27.
[32] Macaulay, F.S., 1916. The Algebraic Theory of Modular Systems. Cambridge Tracts in Mathematics and Mathematical Physics, Cambridge University Press.
[33] Mahler, K., 1968. Perfect systems. Composit. Math. 19, 95–166.
[34] Marinari, M.G., Möller, H.M., Mora, T., 1991. Gröbner bases of ideals given by dual bases, in: ISSAC'91, ACM, New York, NY, USA, pp. 55–63.
[35] Marinari, M.G., Möller, H.M., Mora, T., 1993. Gröbner bases of ideals defined by functionals with an application to ideals of projective points. Appl. Algebra Engrg. Comm. Comput. 4, 103–145.
[36] Möller, H.M., Buchberger, B., 1982. The construction of multivariate polynomials with preassigned zeros, in: EUROCAM'82, Springer, pp. 24–31.
[37] Mourrain, B., 1999. A new criterion for normal form algorithms, in: Applied Algebra, Algebraic Algorithms and Error-Correcting Codes, Springer Berlin Heidelberg, Berlin, Heidelberg, pp. 430–442.
[38] Neiger, V., 2016. Bases of relations in one or several variables: fast algorithms and applications. Ph.D. thesis. École Normale Supérieure de Lyon. URL: https://tel.archives-ouvertes.fr/tel-01431413/.
[39] O'Keeffe, H., Fitzpatrick, P., 2002. Gröbner basis solutions of constrained interpolation problems. Linear Algebra Appl. 351, 533–551.
[40] Padé, H., 1894. Sur la généralisation des fractions continues algébriques. Journal de Mathématiques Pures et Appliquées, 291–330.
[41] Pardue, K., 1994. Nonstandard Borel-Fixed Ideals. Ph.D. thesis. Brandeis University.
[42] Paszkowski, S., 1987. Recurrence relations in Padé-Hermite approximation. J. Comput. Appl. Math. 19, 99–107.
[43] Popov, V.M., 1972. Invariant description of linear, time-invariant controllable systems. SIAM Journal on Control 10, 252–264.
[44] Sakata, S., 1990. Extension of the Berlekamp-Massey algorithm to N dimensions. Inform. and Comput. 84, 207–239.
[45] Sergeyev, A.V., 1987. A recursive algorithm for Padé-Hermite approximations. USSR Comput. Math. Math. Phys. 26, 17–22.
[46] Storjohann, A., 2000. Algorithms for Matrix Canonical Forms. Ph.D. thesis. Swiss Federal Institute of Technology – ETH. URL: https://cs.uwaterloo.ca/~astorjoh/diss2up.pdf.
[47] Sylvester, J.J., 1853. On a Theory of the Syzygetic Relations of Two Rational Integral Functions, Comprising an Application to the Theory of Sturm's Functions, and That of the Greatest Algebraical Common Measure. Philosophical Transactions of the Royal Society of London 143, 407–548.
[48] Van Barel, M., Bultheel, A., 1991. The computation of non-perfect Padé-Hermite approximants. Numer. Algorithms 1, 285–304.
[49] Van Barel, M., Bultheel, A., 1992. A general module theoretic framework for vector M-Padé and matrix rational interpolation. Numer. Algorithms 3, 451–462.