Fast Operations on Linearized Polynomials and their Applications in Coding Theory
Sven Puchinger
Institute of Communications Engineering, Ulm University, Ulm, Germany
Antonia Wachter-Zeh
Institute for Communications Engineering, Technical University of Munich, Germany
Abstract
This paper considers fast algorithms for operations on linearized polynomials. We propose a new multiplication algorithm for skew polynomials (a generalization of linearized polynomials) which has sub-quadratic complexity in the polynomial degree s, independent of the underlying field extension degree m. We show that our multiplication algorithm is faster than all known ones when s ≤ m. Using a result by Caruso and Le Borgne (2017), this immediately implies a sub-quadratic division algorithm for linearized polynomials for arbitrary polynomial degree s. Also, we propose algorithms with sub-quadratic complexity for the q-transform, multi-point evaluation, computing minimal subspace polynomials, and interpolation, whose implementations were at least quadratic before. Using the new fast algorithm for the q-transform, we show how matrix multiplication over a finite field can be implemented by multiplying linearized polynomials of degrees at most s = m if an elliptic normal basis of extension degree m exists, providing a lower bound on the cost of the latter problem. Finally, it is shown how the new fast operations on linearized polynomials lead to the first error and erasure decoding algorithm for Gabidulin codes with sub-quadratic complexity.

Key words: linearized polynomials, skew polynomials, fast multiplication, fast multi-point evaluation, fast minimal subspace polynomial, fast decoding
1. Introduction
Email addresses: [email protected] (Sven Puchinger), [email protected] (Antonia Wachter-Zeh).

Preprint submitted to Journal of Symbolic Computation, August 22, 2018.

Linearized polynomials (Ore, 1933a) are polynomials of the form

  a = Σ_k a_k x^{q^k},  a_k ∈ F_{q^m} (finite field),

which possess a ring structure with ordinary addition and polynomial composition. They are an important class of polynomials which are of theoretical interest (Evans et al., 1992; Wu and Liu, 2013) and have many applications in coding theory (Gabidulin, 1985; Silva et al., 2008), dynamical systems (Cohen and Hachenberger, 2000) and cryptography (Gabidulin et al., 1991). Especially in coding theory, designing fast algorithms for certain operations on these polynomials is crucial since it directly determines the complexity of decoding Gabidulin codes, an important class of rank-metric codes.

The operations that we consider in this paper are multiplication, division, q-transform, computing minimal subspace polynomials, multi-point evaluation, and interpolation of linearized polynomials of degree at most s over F_{q^m}. In this section, we omit log factors using the O~(·) notation. These factors can be found in the respective theorems or references. By ω ≤ 3, we denote the matrix multiplication exponent.
For s ≤ m and m admitting a low-complexity normal basis of F_{q^m} over F_q, Silva and Kschischang (2009) and Wachter-Zeh et al. (2013) presented algorithms for the q-transform with respect to a basis of F_{q^s} ⊆ F_{q^m} (O(ms) over F_q), multi-point evaluation (O(m^2 s) over F_q), and multiplication of linearized polynomials modulo x^{q^m} − x (O(m^2 s) over F_q), where the complexity bottleneck of the latter two methods is the so-called q-transform with respect to a basis of F_{q^m}, with complexity O(m^2 s). For arbitrary s >
0, Wachter-Zeh (2013, Sec. 3.1.2) presented an algorithm for multiplying two linearized polynomials of degree at most s with complexity O(s^{min{(ω+1)/2, 1.635}}) over F_{q^m}, where ω is the matrix multiplication exponent. Finding a minimal subspace polynomial and performing a multi-point evaluation are both known to be implementable with O(s^2) operations in F_{q^m}, see (Li et al., 2014). Similarly, the known implementations of the q-transform require O(s^2) operations over F_{q^m} (Wachter-Zeh, 2013), and the interpolation O(s^2) over F_{q^m} (Silva and Kschischang, 2007).

Recently, Caruso and Le Borgne (2017) proposed algorithms for multiplication and division of skew polynomials (a generalization of linearized polynomials) that have complexity O~(sm). If m ∈ o(s), then these algorithms are sub-quadratic in s. Further, they presented two Karatsuba-based algorithms, the so-called Karatsuba method (applicable if s > m) and the so-called matrix method (applicable if s > m/2), whose cost bounds over F_q are sub-quadratic in s. For m ∈ Ω(s), it has been an open problem if division algorithms of sub-quadratic complexity exist.

Since operations over F_{q^m} can be performed in O~(m) operations over F_q (cf. (Couveignes and Lercier, 2009)), a quadratic complexity O(s^2) over F_{q^m} corresponds to O~(ms^2) over F_q. Hence, all the mentioned results over F_{q^m} are not slower than the ones over F_q from (Silva and Kschischang, 2009) and (Wachter-Zeh et al., 2013), and it suffices to compare our results to the cost bounds over F_{q^m}.

The results of this paper were partly presented at the IEEE International Symposium on Information Theory (Puchinger and Wachter-Zeh, 2016), with an emphasis on the implications for coding theory and omitting many proofs and comparisons.

In this paper, we present algorithms for the above operations that are sub-quadratic in the polynomial degree s of the involved polynomials, independent of the field extension degree m. First, we generalize the multiplication algorithm for linearized polynomials from (Wachter-Zeh, 2013), which is based on a fragmentation of the involved polynomials similar to (Brent et al., 1980a), to the more general class of skew polynomials. We also analyze the resulting cost bounds in more detail than in (Wachter-Zeh, 2013). This algorithm has complexity O(s^{min{(ω+1)/2, 1.635}}) and, together with a result of Caruso and Le Borgne (2017), implies a division algorithm in O~(s^{min{(ω+1)/2, 1.635}}).

We show that computing the q-transform and its inverse can be reduced to a matrix-vector multiplication and solving a system of equations, respectively, where in both cases the involved matrix has Toeplitz form. Thus, it can be computed in O~(s) operations over F_{q^m}.

Our fast algorithms for multi-point evaluation and computing minimal subspace polynomials are divide-&-conquer methods that call each other recursively.
These convoluted calls enable us to circumvent problems that arise from the non-commutativity of the linearized polynomial multiplication. We also propose a divide-&-conquer interpolation algorithm that uses the new multi-point evaluation and minimal subspace polynomial routines. All three methods use ideas from well-known algorithms from (Gathen and Gerhard, 1999, Section 10.1-10.2) and can be implemented in O~(s^{max{log_2(3), min{(ω+1)/2, 1.635}}}) operations over F_{q^m}. Table 1 summarizes the new cost bounds of operations that we prove in this paper.

Since fast multiplication directly determines the cost of the other algorithms in this paper, we present a lower bound on the complexity of multiplying two linearized polynomials of degree s = m by showing that matrix multiplication can be reduced to the latter if elliptic normal bases of degree m exist. The resulting bound implies that there cannot be a quasi-linear algorithm for multiplying linearized polynomials, independent of m, unless a quasi-quadratic matrix multiplication algorithm exists.

Finally, we use the results to derive a new cost bound, O~(n^{max{log_2(3), min{(ω+1)/2, 1.635}}}), for decoding Gabidulin codes of length n. This is the first bound that is sub-quadratic in n.

Table 1.
New and previous cost bounds over F_{q^m} (see Section 1.1 why known cost bounds over F_q are not tighter) for operations with linearized polynomials of degree at most s and coefficients in F_{q^m}. See Section 2.2 for a formal description of the operations.

Operation (Source) | New | Before
Division ((Caruso and Le Borgne, 2017) and Theorem 6) | O~(min{sm, s^{min{(ω+1)/2, 1.635}}}) | O~(sm)
(Inverse) q-Transform (Theorem 12) | O~(s) | O(s^2)
Minimal Subspace Polynomial Computation (Theorem 15) | O~(s^{max{log_2(3), min{(ω+1)/2, 1.635}}}) | O(s^2)
Multi-point Evaluation (Theorem 15) | O~(s^{max{log_2(3), min{(ω+1)/2, 1.635}}}) | O(s^2)
Interpolation (Theorem 17) | O~(s^{max{log_2(3), min{(ω+1)/2, 1.635}}}) | O(s^2)

This paper is structured as follows. In Section 2, we give definitions and formally introduce the operations that are considered in this paper. Section 3 contains the main results of the paper: we present fast algorithms for division, q-transform, calculation of minimal subspace polynomials, multi-point evaluation, and interpolation. Using these new algorithms, we accelerate a known linearized polynomial multiplication algorithm and prove its optimality for the case s = m in Section 4. In Section 5, we show how our fast algorithms for linearized polynomials imply sub-quadratic decoding algorithms for a special class of rank-metric codes, Gabidulin codes, and Section 6 concludes this paper.

2. Preliminaries

Let q be a prime power, F_q be a finite field with q elements and F_{q^m} an extension field of F_q. The field F_{q^m} can be seen as an m-dimensional vector space over F_q. A subspace of F_{q^m} is always meant w.r.t. F_q as the base field. For a given subset A ⊆ F_{q^m}, the subspace ⟨A⟩ is the F_q-span of A.

Normal bases facilitate calculations in finite fields and can therefore be used to reduce the computational complexity.
We briefly summarize important properties of normal bases in the following, cf. (Gao, 1993; Lidl and Niederreiter, 1997; Menezes et al., 1993). A basis B = {β_0, β_1, ..., β_{m−1}} of F_{q^m} over F_q is a normal basis if β_i = β^{[i]} for all i, where β ∈ F_{q^m} is called a normal element. As shown in (Lidl and Niederreiter, 1997, Thm. 2.35), there is a normal basis for any finite extension field F_{q^m} over F_q.

The dual basis B^⊥ of a basis B is needed to switch between a polynomial and its q-transform (cf. Section 3.3). For a given basis B of F_{q^m} over F_q, there is a unique dual basis B^⊥. The dual basis of a normal basis is also a normal basis, cf. (Menezes et al., 1993, Thm. 1.1).

If we represent elements of F_{q^m} in a normal basis over F_q, applying the Frobenius automorphism ·^q to an element can be accomplished in O(1) operations over F_{q^m} as follows. Let [A_0, ..., A_{m−1}]^T ∈ F_q^{m×1} be the vector representation of a ∈ F_{q^m} in a normal basis. Then, for any j, the vector representation of a^{[j]} is given by [A_{m−j}, A_{m−j+1}, ..., A_{m−1}, A_0, ..., A_{m−j−1}]^T, which is just a cyclic shift of the representation of a. The same holds for an arbitrary automorphism σ ∈ Gal(F_{q^m}/F_q) since it is of the form σ(·) = ·^{q^i} for some i < m.

In this paper, we present operations with linearized polynomials, also called q-polynomials and defined as follows. A linearized polynomial (Ore, 1933a) is a polynomial of the form

  a = Σ_{k=0}^t a_k x^{q^k} = Σ_{k=0}^t a_k x^{[k]},  a_k ∈ F_{q^m}, t ∈ N,

where we use the notation [i] := q^i. The set of all linearized polynomials for given q and m is denoted by L_{q^m}. We define the addition + of a, b ∈ L_{q^m} as for ordinary polynomials, a + b = Σ_i (a_i + b_i) x^{[i]}, and the multiplication · as

  a · b = Σ_i ( Σ_{j=0}^i a_j b_{i−j}^{[j]} ) x^{[i]}.   (1)

Note that if L_{q^m} is seen as a subset of F_{q^m}[x], the multiplication · equals the composition a(b(x)). Using these operations, (L_{q^m}, +, ·) is a (non-commutative) ring (Ore, 1933a). The identity element of (L_{q^m}, +, ·) is x^{[0]} = x. In the following, all polynomials are linearized polynomials.

We say that a ∈ L_{q^m} has q-degree deg_q a = max{k ∈ N : a_k ≠ 0}, where max ∅ := −∞. For s ∈ N, we define the set L_{q^m}^{≤s} := {a ∈ L_{q^m} : deg_q a ≤ s}, and L_{q^m}^{<s} analogously. A polynomial a is called monic if a_{deg_q a} = 1. Further, deg_q(a · b) = deg_q a + deg_q b and deg_q(a + b) ≤ max{deg_q a, deg_q b}.

For a ∈ L_{q^m}, the evaluation (Boucher and Ulmer, 2014, Operator Evaluation) is defined by

  a(·): F_{q^m} → F_{q^m},  α ↦ a(α) = Σ_i a_i α^{[i]}.

Since σ(α) = α^q is the Frobenius automorphism, α ↦ α^{[i]} = σ^i(α) is also an automorphism, and it can be shown that a(·) is an F_q-linear map for any a ∈ L_{q^m}. It follows that the root space ker(a) = {α ∈ F_{q^m} : a(α) = 0} is a subspace of F_{q^m}. It is also clear that (a · b)(α) = a(b(α)).

The ring of linearized polynomials is a left and right Euclidean domain and therefore admits a left and right division.
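To make the product rule (1) and the evaluation map concrete, here is a minimal Python sketch over the toy field F_{2^3} (so q = 2, m = 3). The field, the example polynomials and all function names are our own illustrative choices, not from the paper.

```python
M_EXT = 3  # extension degree m = 3: we work in F_{2^3} over F_2, so q = 2

def gf_mul(a, b):
    # multiply in GF(8) = F_2[z]/(z^3 + z + 1); elements are 3-bit integers
    r = 0
    for _ in range(3):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b1000:
            a ^= 0b1011  # reduce modulo z^3 + z + 1
    return r

def frob(a, i=1):
    # sigma^i(a) = a^(q^i); sigma is the Frobenius x -> x^2, and sigma^3 = id
    for _ in range(i % M_EXT):
        a = gf_mul(a, a)
    return a

def lin_mul(a, b):
    # coefficient lists indexed by [k]; c_i = sum_j a_j * b_{i-j}^{[j]}, cf. (1)
    c = [0] * (len(a) + len(b) - 1)
    for j, aj in enumerate(a):
        for k, bk in enumerate(b):
            c[j + k] ^= gf_mul(aj, frob(bk, j))  # characteristic 2: + is XOR
    return c

def lin_eval(a, alpha):
    # a(alpha) = sum_i a_i * alpha^{[i]} -- an F_q-linear map on F_{q^m}
    y = 0
    for i, ai in enumerate(a):
        y ^= gf_mul(ai, frob(alpha, i))
    return y

a = [1, 2]        # a = x^{[0]} + 2 x^{[1]}
b = [3, 0, 5]     # b = 3 x^{[0]} + 5 x^{[2]}
c = lin_mul(a, b)
# the ring product is composition: (a . b)(alpha) = a(b(alpha))
ok = all(lin_eval(c, t) == lin_eval(a, lin_eval(b, t)) for t in range(8))
```

Here `ok` comes out True, and len(c) − 1 = deg_q a + deg_q b, matching the degree rule stated above.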
Lemma 1 (Ore (1933a)). For all a, b ∈ L_{q^m}, b ≠ 0, there are unique χ_R, χ_L ∈ L_{q^m} (quotients) and ϱ_R, ϱ_L ∈ L_{q^m} (remainders) such that deg_q ϱ_R < deg_q b, deg_q ϱ_L < deg_q b, and

  a = χ_R · b + ϱ_R  (right division),
  a = b · χ_L + ϱ_L  (left division).

Lemma 1 allows us to define a (right) modulo operation on L_{q^m} such that a ≡ b mod c if there is a d ∈ L_{q^m} such that a = b + d · c. In the following, we use this definition of "mod".

Subspace polynomials are special linearized polynomials, with the property that their q-degree equals their number of linearly independent roots.

Lemma 2 (Lidl and Niederreiter (1997)). Let U be a subspace of F_{q^m}. Then there exists a unique nonzero monic polynomial M_U ∈ L_{q^m} of minimal degree such that ker(M_U) = U. Its degree is deg_q M_U = dim U.

The polynomial M_U in Lemma 2 is called minimal subspace polynomial (MSP) of U.

Multi-point evaluation (MPE) is the process of evaluating a polynomial a ∈ L_{q^m} at multiple points α_1, ..., α_s ∈ F_{q^m}, i.e., computing the vector [a(α_1), ..., a(α_s)] ∈ F_{q^m}^s. Notice that for linearized polynomials, a(β_1 α_1 + β_2 α_2) = β_1 a(α_1) + β_2 a(α_2) for any β_1, β_2 ∈ F_q and α_1, α_2 ∈ F_{q^m}. Therefore, once we have evaluated a(x) at a few linearly independent points, the evaluation of any F_q-linear combination of these points can be calculated by simple additions at almost no cost.

The dual problem of MPE, called interpolation, is to find a polynomial of bounded degree that evaluates at given distinct points to certain values. It is based on the following lemma.
Lemma 3 (Silva and Kschischang (2007, Sec. III-A)). Let (x_1, y_1), ..., (x_s, y_s) ∈ F_{q^m}^2 such that x_1, ..., x_s are linearly independent over F_q. Then there exists a unique interpolation polynomial I_{{(x_i, y_i)}_{i=1}^s} ∈ L_{q^m}^{<s} such that I_{{(x_i, y_i)}_{i=1}^s}(x_i) = y_i for all i = 1, ..., s.

2.2.5. The q-transform

Let s divide m and let B_N = {β^{[0]}, ..., β^{[s−1]}} be a normal basis of F_{q^s} ⊆ F_{q^m} over F_q.

Definition 4.
The q-transform (w.r.t. s and B_N) is a mapping ˆ· : L_{q^m}^{<s} → L_{q^m}^{<s}, a ↦ â, with

  â_j = a(β^{[j]}) = Σ_{i=0}^{s−1} a_i β^{[i+j]},  for all j = 0, ..., s−1.   (2)

Given a dual normal basis B_N^⊥ = {β^{⊥[0]}, ..., β^{⊥[s−1]}} of B_N, the inverse q-transform can be computed by a_i = â(β^{⊥[i]}) = Σ_{j=0}^{s−1} â_j β^{⊥[j+i]} for all i = 0, ..., s−1, cf. (Silva and Kschischang, 2009). Thus, the q-transform is bijective.

Let K be a field. The ring of skew polynomials K[x; σ, δ] over K with automorphism σ ∈ Gal(F_{q^m}/F_q) and derivation δ, satisfying δ(a + b) = δ(a) + δ(b) and δ(ab) = δ(a)b + σ(a)δ(b) for all a, b ∈ K, is defined as the set of polynomials Σ_i a_i x^i, a_i ∈ K, with the multiplication rule

  x a = σ(a) x + δ(a)  for all a ∈ K,

and ordinary component-wise addition. The degree of a skew polynomial is defined as usual. K[x; σ, δ] is left and right Euclidean, i.e., Lemma 1 also holds for skew polynomials. A comprehensive description of skew polynomial rings and their properties can be found in (Ore, 1933b). In this paper, we only consider the special case δ = 0, in which we abbreviate the ring by K[x; σ]. Also, we restrict ourselves to finite fields K = F_{q^m}. Note that there is a ring isomorphism

  φ: L_{q^m} → F_{q^m}[x; σ],  Σ_i a_i x^{[i]} ↦ Σ_i a_i x^i,

where σ(·) = ·^q is the Frobenius automorphism. Although some of our results might extend to a broader class of skew polynomials, we consider F_{q^m}[x; σ] only as an auxiliary tool for obtaining fast algorithms for linearized polynomials.
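Since F_{q^m}[x; σ] with δ = 0 is right Euclidean (Lemma 1 carries over), right division works by repeatedly cancelling the leading term. A hedged, self-contained Python sketch over the toy field F_{2^3} (q = 2, m = 3, σ the Frobenius; all names are our own choices):

```python
M_EXT = 3  # F_{2^3} over F_2: q = 2, m = 3

def gf_mul(a, b):
    # multiply in GF(8) = F_2[z]/(z^3 + z + 1); elements are 3-bit integers
    r = 0
    for _ in range(3):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b1000:
            a ^= 0b1011
    return r

def frob(a, i=1):
    # sigma^i, with sigma the Frobenius x -> x^2; sigma^3 = id
    for _ in range(i % M_EXT):
        a = gf_mul(a, a)
    return a

def gf_inv(a):
    # a^{-1} = a^6 in GF(8)* (multiplicative group of order 7): a^2 * a^4
    return gf_mul(frob(a, 1), frob(a, 2))

def deg(p):
    return max((i for i, c in enumerate(p) if c), default=-1)

def skew_mul(a, b):
    # from x a = sigma(a) x:  (a_i x^i)(b_j x^j) = a_i sigma^i(b_j) x^{i+j}
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] ^= gf_mul(ai, frob(bj, i))
    return c

def skew_rdiv(a, b):
    # right division: returns (chi, rho) with a = chi * b + rho, deg rho < deg b
    a, db = a[:], deg(b)
    chi = [0] * max(len(a) - db, 1)
    while deg(a) >= db:
        d = deg(a)
        # leading coefficient of (c x^{d-db}) * b is c * sigma^{d-db}(b_{db})
        c = gf_mul(a[d], gf_inv(frob(b[db], d - db)))
        chi[d - db] = c
        for j, bj in enumerate(b):
            a[d - db + j] ^= gf_mul(c, frob(bj, d - db))
    return chi, a

a = [5, 1, 0, 3, 6]   # degree 4
b = [2, 0, 7]         # degree 2
chi, rho = skew_rdiv(a, b)
```

Recombining gives back a: adding skew_mul(chi, b) and rho coefficient-wise (XOR in characteristic 2) returns [5, 1, 0, 3, 6], and deg rho < deg b. Left division is analogous, with the quotient multiplied from the right.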
3. Fast Algorithms
This section presents the main results of this paper: new fast algorithms for division (Section 3.2), q-transform (Section 3.3), calculation of the MSP for a given subspace (Section 3.4), multi-point evaluation (also in Section 3.4), and interpolation (Section 3.5). We count complexities in terms of operations in the field F_{q^m}. For convenience, we use the following notations.

Definition 5.
Let s ∈ N. We define the following (worst-case) complexity measures, i.e., the infimum of the worst-case complexities of algorithms that solve the given problem.

i) Complexity of left- or right-dividing a ∈ L_{q^m}^{≤s} by b ∈ L_{q^m}^{≤s}: D_{q^m}(s).
ii) Complexity of computing the MSP M_⟨U⟩ for a generating set U = {u_1, ..., u_s} of a subspace of F_{q^m}: MSP_{q^m}(s).
iii) Complexity of the multi-point evaluation of a ∈ L_{q^m}^{≤s} at the points α_1, ..., α_s ∈ F_{q^m}: MPE_{q^m}(s).
iv) Complexity of computing the q-transform or its inverse of a polynomial a ∈ L_{q^m}^{<s}, given a normal basis B_N = {β^{[0]}, ..., β^{[s−1]}} of F_{q^s} ⊆ F_{q^m}: QT_{q^m}(s).
v) Complexity of finding the interpolation polynomial of s point tuples: I_{q^m}(s).

Table 2 summarizes the best known cost bounds for these operations with linearized polynomials.
Table 2.
Previously best known cost bounds over F_{q^m} (see Section 1.1 why known cost bounds over F_q are not tighter) for operations with linearized polynomials. For an overview of the results presented in this paper, see Table 1.

Operation | Cost Bound | Source
D_{q^m}(s) | O~(sm) | (Caruso and Le Borgne, 2017)
MSP_{q^m}(s) | O(s^2) | (Silva et al., 2008)
MPE_{q^m}(s) | O(s^2) | "naive" (s ordinary polynomial evaluations)
QT_{q^m}(s) | O(s^2) | (Silva et al., 2008)
I_{q^m}(s) | O(s^2) | "naive" (using linearized Lagrange bases, cf. (Silva and Kschischang, 2007))

In this section, we present a division algorithm that has sub-quadratic complexity in the polynomial degree s for arbitrary s. The currently best-known division algorithm has complexity O~(sm) (Caruso and Le Borgne, 2017), which is quasi-linear for s ≫ m, but quadratic if s ∈ Θ(m). We improve upon this algorithm in the latter case.

Our method is based on the following result by Caruso and Le Borgne (2017), which states that skew polynomial division can be reduced to multiplication in another skew polynomial ring. We denote by M_{q^m}(s) the complexity of multiplying two skew polynomials in F_{q^m}[x; σ]^{≤s}.

Theorem 6 (Caruso and Le Borgne (2017)). D_{q^m}(s) ∈ O(M_{q^m}(s) log s).

We generalize the fast multiplication algorithm for linearized polynomials from (Wachter-Zeh, 2013, Theorem 3.1) to arbitrary skew polynomial rings with derivation δ = 0, which yields a bound on M_{q^m}(s). The algorithm is based on a fragmentation of polynomials, which was used for calculating power series expansions in (Brent and Kung, 1978; Paterson and Stockmeyer, 1973) and is related to the baby-steps giant-steps method.

We say that polynomials a, b ∈ F_{q^m}[x; σ] overlap at k positions if the intersection of their supports supp(a) := {i : a_i ≠ 0} and supp(b) := {i : b_i ≠ 0} has cardinality |supp(a) ∩ supp(b)| = k. If k = 0, we say that a and b are non-overlapping. Obviously, the sum of two polynomials overlapping at k positions can be calculated with k additions in F_{q^m} when the overlapping positions are known.

Theorem 7.
Let a, b ∈ F_{q^m}[x; σ]^{≤s} and s* := ⌈√(s+1)⌉. Then c = a · b can be calculated in O(s^{3/2}) field operations, plus the cost of multiplying an s* × s* with an s* × (s + s*) matrix, using Algorithm 1.

Proof.
We can fragment a into s* non-overlapping polynomials a^{(i)} as

  a = Σ_{i=0}^{s*−1} a^{(i)} = Σ_{i=0}^{s*−1} Σ_{j=0}^{s*−1} a_{is*+j} x^{is*+j},

and the result c of the multiplication c = a · b can also be fragmented as

  c = a · b = Σ_{i=0}^{s*−1} a^{(i)} · b = Σ_{i=0}^{s*−1} (a^{(i)} · b) =: Σ_{i=0}^{s*−1} c^{(i)}

with

  c^{(i)} = Σ_{j=0}^{s*−1} a_{is*+j} x^{is*+j} · Σ_{k=0}^s b_k x^k
         = Σ_{j=0}^{s*−1} Σ_{k=0}^s a_{is*+j} σ^{is*+j}(b_k) x^{is*+j+k}
         = Σ_{h=0}^{s+s*−1} ( Σ_{j=0}^h a_{is*+j} σ^{is*+j}(b_{h−j}) ) x^{is*+h}
         =: Σ_{h=0}^{s+s*−1} c_h^{(i)} x^{is*+h}.

Thus, the c^{(i)}'s pairwise overlap at not more than s positions, which we know. In order to obtain the polynomials c^{(i)}, we can use

  σ^{−is*}(c_h^{(i)}) = σ^{−is*}( Σ_{j=0}^h a_{is*+j} σ^{is*+j}(b_{h−j}) )
                      = Σ_{j=0}^h σ^{−is*}(a_{is*+j}) σ^{−is*+is*+j}(b_{h−j})
                      = Σ_{j=0}^h σ^{−is*}(a_{is*+j}) σ^j(b_{h−j}),

where the second equality holds since σ^{−is*} is an automorphism. We can write this expression as a vector multiplication

  [σ^{−is*}(a_{is*}), ..., σ^{−is*}(a_{is*+s*−1})] · [σ^0(b_h), σ^1(b_{h−1}), ..., σ^h(b_0), 0, ...]^T,

where the left vector does not depend on h and the right side is independent of i. Thus, we can write the computation of the σ^{−is*}(c_h^{(i)}) as a matrix multiplication C = A · B with

  C = [C_{ij}], i = 0, ..., s*−1, j = 0, ..., s+s*−1,  C_{ij} = σ^{−is*}(c_j^{(i)}),
  A = [A_{ij}], i = 0, ..., s*−1, j = 0, ..., s*−1,    A_{ij} = σ^{−is*}(a_{is*+j}),   (3)
  B = [B_{ij}], i = 0, ..., s*−1, j = 0, ..., s+s*−1,  B_{ij} = σ^i(b_{j−i}) if 0 ≤ j−i ≤ s, else 0.

Setting up the matrices A and B costs s*·s + s*·s* ≈ s^{3/2} computations of automorphisms of F_{q^m} elements. Computing the matrix C from A and B requires a multiplication of an s* × s* with an s* × (s + s*) matrix. Extracting c_j^{(i)} = σ^{is*}(C_{ij}) from C costs a computation of an automorphism each, thus ≈ s^{3/2} computations in total. In order to obtain the skew polynomial c, we need to add up the c^{(i)}'s. For any k < s*, the polynomials Σ_{i=0}^{k−1} c^{(i)} and c^{(k)} overlap at not more than s positions. Since we know these overlapping positions, we can compute the sum of all c^{(i)}'s in O(s*·s) = O(s^{3/2}) time. In a finite field, the computation of an automorphism can be done in O(1), so Algorithm 1 costs O(s^{3/2}), plus the matrix multiplication. □

Algorithm 1: Multiplication
Input: a, b ∈ F_{q^m}[x; σ]^{≤s}
Output: c = a · b
  Set up the matrices A and B as in (3)        // ≈ s^{3/2} · O(1)
  C ← A · B                                    // s* · O((s*)^ω) or O((s*)^{3.2699})
  Extract the c^{(i)}'s from C as in (3)       // ≈ s^{3/2} · O(1)
  return c ← Σ_{i=0}^{s*−1} c^{(i)}            // O(s^{3/2})

Corollary 8. Different techniques for the multiplication of the s* × s* with the s* × (s + s*) matrices in Theorem 7 result in the following cost bounds on the multiplication of skew polynomials:

i) Using ≈ s* + 1 multiplications of s* × s* with s* × s* matrices, we obtain M_{q^m}(s) ∈ O(s* · (s*)^ω) ⊆ O(s^{(ω+1)/2}). For instance, we get M_{q^m}(s) ∈ O(s^{1.91}) for ω ≈ 2.81 (Strassen, 1969) and M_{q^m}(s) ∈ O(s^{1.69}) for ω ≈ 2.376 (Coppersmith and Winograd, 1990).

ii) Direct multiplication algorithms for rectangular matrices (cf. (Huang and Pan, 1998; Ke et al., 2008)) result in M_{q^m}(s) ∈ O((s*)^{3.2699}) ⊆ O(s^{1.635}), where the power 3.2699 is the cost exponent of multiplying an s* × s* with an ≈ s* × (s*)^2 matrix.

Remark 9.
Naive skew/linearized polynomial multiplication using the definition from (1) uses approximately 2s^2 field operations. For comparison, if we use case (i) of Corollary 8 with naive matrix multiplication, where each matrix multiplication uses approximately 2(s*)^3 operations, skew polynomial multiplication takes ≈ s* · (2(s*)^3) = 2(s*)^4 ≈ 2s^2 operations in total. Thus, we improve upon the naive case as soon as the algorithm for multiplying two matrices of dimension s* × s* is faster than 2(s*)^3. For instance, the algorithm of Strassen (1969) uses ≈ 4.7(s*)^{log_2(7)} field operations, which is smaller than 2(s*)^3 for s* ≥ 85, or in other words for s roughly ≥ 85^2.

Besides the asymptotic improvement, Algorithm 1 can yield a practical speed-up, compared to a naive implementation that does not use linear-algebraic operations, by relying on efficiently implemented linear algebra libraries that are optimized for the used programming language or hardware.

Using Theorem 6 and Corollary 8, we obtain the following new cost bound on the division of linearized polynomials.

Corollary 10. D_{q^m}(s) ∈ O(s^{min{(ω+1)/2, 1.635}} log s).

As a direct consequence of the result above, we obtain a fast (half) linearized extended Euclidean algorithm (LEEA) (cf. (Wachter-Zeh, 2013, Corollary 3.2)).

Corollary 11. The fast (half) LEEA from (Wachter-Zeh, 2013, Algorithm 3.4) for two input polynomials a, b, where s := deg_q a ≥ deg_q b, can be implemented in

  O(max{D_{q^m}(s), M_{q^m}(s)} log s) ⊆ O(s^{min{(ω+1)/2, 1.635}} log^2(s))

operations over F_{q^m}.

The following theorem states that both the q-transform and its inverse can be obtained in quasi-linear time over F_{q^m}. Recall that s must divide m in order for the q-transform to be well-defined. The idea of the fast q-transform is based on the fact that the q-transform is basically the multiplication of the coefficient vector with a Toeplitz matrix. Since Toeplitz matrix-vector multiplication can be reduced to multiplication of polynomials in F_{q^m}[x] (cf. (Gathen and Gerhard, 1999)), the q-transform can also be implemented in quasi-linear time over F_{q^m}.

Theorem 12. QT_{q^m}(s) ∈ O(s log^2(s) log(log(s))).

Proof.
Let a ∈ L_{q^m}^{<s}. From (2) we know that

  (â_0, ..., â_{s−1}) = (a_{s−1}, ..., a_0) · B,

where

      | β^{[s−1]}  β^{[s]}    β^{[s+1]}  ...  β^{[2s−2]} |
      | β^{[s−2]}  β^{[s−1]}  β^{[s]}    ...  β^{[2s−3]} |
  B = |    ...        ...        ...     ...     ...     |
      | β^{[0]}    β^{[1]}    β^{[2]}    ...  β^{[s−1]}  |

is an s × s Toeplitz matrix over F_{q^m}. At the same time, it is a q-Vandermonde matrix, which is invertible (see (Lidl and Niederreiter, 1997, Lemma 3.5.1)).

As described in (Bini and Pan, 2012, Problems 2.5.1), Toeplitz matrix-vector multiplication can be reduced to multiplication of F_{q^m}[x] polynomials of degree ≤ s, which has complexity O(s log(s) log(log(s))), cf. (Gathen and Gerhard, 1999).

The inverse q-transform consists of solving a Toeplitz linear system, which is reducible to a Padé approximation problem, which in turn can be solved using the extended Euclidean algorithm over F_{q^m}[x], cf. (Brent et al., 1980a). A fast extended Euclidean algorithm with stopping condition was introduced by Aho and Hopcroft in (Aho and Hopcroft, 1974). Its complexity is shown to be O(M~(s) log(s)), where M~(s) is the complexity of multiplying two polynomials of degree s in F_{q^m}[x]. However, for some special cases the algorithm does not work properly, and therefore the improvements from (Gustavson and Yun, 1979) and (Brent et al., 1980b) have to be considered. This fast EEA was summarized and proven in (Blahut, 1985). The resulting complexity of solving the Toeplitz linear system is thus O(s log^2(s) log(log(s))). □

In this subsection, we consider an efficient way to calculate the minimal subspace polynomial and the multi-point evaluation. Fast algorithms for multi-point evaluation at a set S ⊆ F_{q^m} over F_{q^m}[x] typically pre-compute a sub-product tree (consisting of polynomials of the form M_U = Π_{u∈U}(x − u) for U ⊆ S) and then use divide-and-conquer methods for fast MPE. Such a sub-product tree can only be computed fast because, in the commutative case, the polynomial M_U can be written as the product of two such polynomials of a partition U = A ∪ B, A ∩ B = ∅,

  M_U = M_A · M_B.

The equivalent statement for linearized polynomials, using minimal subspace polynomials, is given in Lemma 13 (see below). In contrast to the commutative case, one of the factors depends on a multi-point evaluation of the other factor. Hence, we cannot immediately apply the known methods. The following two lemmas lay the foundation for algorithms that compute MSPs and MPEs by convoluted recursive calls of each other. Thus, we need to analyze their complexities jointly.
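As a preview of how the recursion of Lemma 13 below is used, it can be transcribed directly into code (without the fast MPE, so this sketch is still quadratic). We again use the toy field F_{2^3} with q = 2; all names and the naive product are our own choices, not from the paper.

```python
M_EXT = 3  # F_{2^3} over F_2: q = 2, m = 3

def gf_mul(a, b):
    # GF(8) = F_2[z]/(z^3 + z + 1), elements as 3-bit integers
    r = 0
    for _ in range(3):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b1000:
            a ^= 0b1011
    return r

def frob(a, i=1):
    # Frobenius sigma: x -> x^2; sigma^3 = id
    for _ in range(i % M_EXT):
        a = gf_mul(a, a)
    return a

def lin_mul(a, b):
    # composition product (1): c_i = sum_j a_j * b_{i-j}^{[j]}
    c = [0] * (len(a) + len(b) - 1)
    for j, aj in enumerate(a):
        for k, bk in enumerate(b):
            c[j + k] ^= gf_mul(aj, frob(bk, j))
    return c

def lin_eval(a, alpha):
    y = 0
    for i, ai in enumerate(a):
        y ^= gf_mul(ai, frob(alpha, i))
    return y

def msp(gens):
    # minimal subspace polynomial of <gens> via the recursion of Lemma 13
    if len(gens) == 1:
        u = gens[0]
        # base case (4): x^{[0]} if u = 0, else x^{[1]} - u^{q-1} x^{[0]} (q = 2)
        return [1] if u == 0 else [u, 1]
    A, B = gens[:len(gens) // 2], gens[len(gens) // 2:]
    mA = msp(A)
    mB = msp([lin_eval(mA, v) for v in B])  # MSP of the evaluated set M_<A>(B)
    return lin_mul(mB, mA)                  # M_U = M_<M_<A>(B)> . M_<A>

mU = msp([1, 2, 4])  # {1, z, z^2} spans all of F_8
```

Here mU comes out as [1, 0, 0, 1], i.e., x^{[3]} + x^{[0]} = x^8 − x in characteristic 2, the MSP of the full field, and every element of F_8 is a root. Linearly dependent generators are handled automatically: if v ∈ ⟨A⟩, then M_⟨A⟩(v) = 0 and the base case returns x^{[0]}, which leaves the product unchanged.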
Lemma 13 (Li et al. (2014)). Let U = {u_1, ..., u_s} be a generating set of a subspace ⟨U⟩ ⊆ F_{q^m}, and let A, B ⊆ F_{q^m} be such that U = A ∪ B. Then,

  M_U = M_⟨U⟩ = M_⟨M_⟨A⟩(B)⟩ · M_⟨A⟩,

and

  M_⟨u_i⟩ = x^{[0]}                      if u_i = 0,
  M_⟨u_i⟩ = x^{[1]} − u_i^{q−1} x^{[0]}  else.   (4)

Lemma 14.
Let a ∈ L_{q^m} and let U, A, B ⊆ F_{q^m}, where A and B are disjoint and U = A ∪ B. Let ϱ_A, ϱ_B be the remainders of the right divisions of a by M_⟨A⟩ and M_⟨B⟩, respectively. Then, the multi-point evaluation of a at the set U is

  a(U) = ϱ_A(A) ∪ ϱ_B(B).

If U = {u} and deg_q a ≤ 1, then

  a(U) = {a(u) = a_1 u^{[1]} + a_0 u^{[0]}}.   (5)

Proof.
Let u ∈ U. If u ∈ A, then

  a(u) = (χ_A · M_⟨A⟩ + ϱ_A)(u) = χ_A(M_⟨A⟩(u)) + ϱ_A(u) = χ_A(0) + ϱ_A(u) = ϱ_A(u),

since M_⟨A⟩(u) = 0 and χ_A(0) = 0. Otherwise, u ∈ B and

  a(u) = (χ_B · M_⟨B⟩ + ϱ_B)(u) = χ_B(M_⟨B⟩(u)) + ϱ_B(u) = χ_B(0) + ϱ_B(u) = ϱ_B(u).

Hence, a(U) = ϱ_A(A) ∪ ϱ_B(B). Equation (5) follows directly from the definition of the evaluation map. □

This yields the main statement of this subsection.
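Before stating it, here is a quick numerical check of Lemma 14 in the same toy setting (F_{2^3}, q = 2; all helper names are ours): the remainder of a under right division by M_⟨A⟩ agrees with a on all of ⟨A⟩, although its q-degree is much smaller.

```python
M_EXT = 3  # F_{2^3} over F_2: q = 2, m = 3

def gf_mul(a, b):
    r = 0
    for _ in range(3):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b1000:
            a ^= 0b1011  # reduce modulo z^3 + z + 1
    return r

def frob(a, i=1):
    for _ in range(i % M_EXT):
        a = gf_mul(a, a)
    return a

def gf_inv(a):
    return gf_mul(frob(a, 1), frob(a, 2))  # a^6 = a^{-1} in GF(8)*

def deg(p):
    return max((i for i, c in enumerate(p) if c), default=-1)

def lin_mul(a, b):
    c = [0] * (len(a) + len(b) - 1)
    for j, aj in enumerate(a):
        for k, bk in enumerate(b):
            c[j + k] ^= gf_mul(aj, frob(bk, j))
    return c

def lin_eval(a, alpha):
    y = 0
    for i, ai in enumerate(a):
        y ^= gf_mul(ai, frob(alpha, i))
    return y

def rdiv_rem(a, b):
    # remainder rho of the right division a = chi . b + rho
    a, db = a[:], deg(b)
    while deg(a) >= db:
        d = deg(a)
        c = gf_mul(a[d], gf_inv(frob(b[db], d - db)))
        for j, bj in enumerate(b):
            a[d - db + j] ^= gf_mul(c, frob(bj, d - db))
    return a

def msp(gens):  # MSP via Lemma 13, naive version
    if len(gens) == 1:
        u = gens[0]
        return [1] if u == 0 else [u, 1]
    A, B = gens[:len(gens) // 2], gens[len(gens) // 2:]
    mA = msp(A)
    return lin_mul(msp([lin_eval(mA, v) for v in B]), mA)

a = [3, 1, 4, 1, 5]          # deg_q a = 4
A = [2, 3]                   # <A> = {0, 1, 2, 3}, since 2 XOR 3 = 1
rho = rdiv_rem(a, msp(A))    # deg_q rho < 2
same = all(lin_eval(rho, u) == lin_eval(a, u) for u in [0, 1, 2, 3])
```

Here `same` is True: evaluating the low-degree remainder on ⟨A⟩ gives the same values as a, which is exactly the reduction step that the divide-&-conquer algorithms below exploit recursively.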
Theorem 15.
Finding the MSP of a subspace of F_{q^m} spanned by s elements of F_{q^m} and computing the MPE of a polynomial of q-degree at most s at s points can be implemented in

  MSP_{q^m}(s), MPE_{q^m}(s) ∈ O(max{s^{log_2(3)} log(s), M_{q^m}(s), D_{q^m}(s)})
                            ⊆ O(s^{max{log_2(3), min{(ω+1)/2, 1.635}}} log(s))

operations over F_{q^m}, using Algorithms 2 and 3, respectively.

Proof.
We prove that Algorithm 2 for computing the MSP and Algorithm 3 for MPE are correct and have the desired complexity.
Correctness:
Since the algorithms call each other recursively, we need to prove their correctness jointly by induction. For s = 1, Algorithm 2 returns the base case of Lemma 13, and Algorithm 3 uses Equation (5) of Lemma 14 to compute the evaluation of a polynomial of deg_q a ≤ s − 1 = 0. Now assume that both algorithms are correct for all input sizes smaller than some s ≥ 2. Then, Algorithm 2 works for an input of size s because it uses the recursion formula of Lemma 13 to reduce the problem to two MSP computations and a multi-point evaluation, each of input size ≈ s/2 ≤ s − 1. Algorithm 3 works for an input of size s due to Lemma 14 and a similar argument. Hence, both algorithms are correct.

Complexity:
The lines of Algorithm 2 have the following complexities:
• The complexities of Lines 2 (base case) and 4 (partitioning of $\mathcal{U}$) are negligible.
• Lines 5 and 7 both have complexity $\mathrm{MSP}_{q^m}(\frac{s}{2})$ because $|\mathcal{A}| \approx |\mathcal{B}| \approx \frac{|\mathcal{U}|}{2} = \frac{s}{2}$.
• Line 6 computes the result in $\mathrm{MPE}_{q^m}(\frac{s}{2})$ time because $\deg_q \mathcal{M}_{\langle \mathcal{A} \rangle} \lesssim |\mathcal{B}| \approx \frac{|\mathcal{U}|}{2} = \frac{s}{2}$.
• Line 8 performs a multiplication of two polynomials of $q$-degree $\lesssim \frac{s}{2} \leq s$ and has time complexity $\mathrm{M}_{q^m}(s)$.

In total, we obtain
$$\mathrm{MSP}_{q^m}(s) = 2 \cdot \mathrm{MSP}_{q^m}(\tfrac{s}{2}) + \mathrm{MPE}_{q^m}(\tfrac{s}{2}) + \mathrm{M}_{q^m}(s). \qquad (6)$$

Algorithm 3 consists of the following steps:
• Again, the complexities of Lines 2 (base case) and 4 (partitioning of $\mathcal{U}$) are negligible.
• Lines 5 and 6 compute the MSPs of bases with input size $|\mathcal{A}| \approx |\mathcal{B}| \approx \frac{|\mathcal{U}|}{2} = \frac{s}{2}$, so both have complexity $\mathrm{MSP}_{q^m}(\frac{s}{2})$.
• Lines 7 and 8 divide polynomials from $\mathcal{L}_{q^m}^{\leq s}$ and therefore have complexity $\mathrm{D}_{q^m}(s)$ each.
• Line 9 performs two multi-point evaluations of polynomials with $q$-degree $< |\mathcal{A}| \approx |\mathcal{B}| \approx \frac{s}{2}$ (cf. Lemma 1, $\deg_q$ of the remainder) at $|\mathcal{A}| \approx |\mathcal{B}| \approx \frac{s}{2}$ positions. Thus, the line has complexity $2 \cdot \mathrm{MPE}_{q^m}(\frac{s}{2})$.

Summarized, we get
$$\mathrm{MPE}_{q^m}(s) = 2 \cdot \mathrm{MSP}_{q^m}(\tfrac{s}{2}) + 2 \cdot \mathrm{MPE}_{q^m}(\tfrac{s}{2}) + 2 \cdot \mathrm{D}_{q^m}(s). \qquad (7)$$

In fact, the MSP computed in Line 5 of Algorithm 3 is the same as the MSP which was computed in Line 5 of Algorithm 2 at the same recursion depth before (note that $\mathrm{MSP}(\mathcal{U})$ first calls $\mathrm{MSP}(\mathcal{A})$ and then $\mathrm{MPE}(\mathcal{M}_{\langle \mathcal{A} \rangle}, \mathcal{B})$). This means that we can store this polynomial instead of recomputing it and can reduce (7) to
$$\mathrm{MPE}_{q^m}(s) = \mathrm{MSP}_{q^m}(\tfrac{s}{2}) + 2 \cdot \mathrm{MPE}_{q^m}(\tfrac{s}{2}) + 2 \cdot \mathrm{D}_{q^m}(s). \qquad (8)$$

We define $C(s) := \max\{\mathrm{MPE}_{q^m}(s), \mathrm{MSP}_{q^m}(s)\}$ and derive an upper bound on $C(s)$. Using (6) and (8), we obtain
$$C(s) \leq 3 \cdot C(\tfrac{s}{2}) + \max\{\mathrm{M}_{q^m}(s),\ 2 \cdot \mathrm{D}_{q^m}(s)\}.$$

We distinguish three cases and use the master theorem:
• If $\max\{\mathrm{M}_{q^m}(s), \mathrm{D}_{q^m}(s)\} \in O(s^{\log_2(3)-\varepsilon})$ for some $\varepsilon > 0$, then $C(s) \in O(s^{\log_2(3)})$.
• If $\max\{\mathrm{M}_{q^m}(s), \mathrm{D}_{q^m}(s)\} \in \Theta(s^{\log_2(3)})$, then $C(s) \in O(s^{\log_2(3)} \log(s))$.
• If $\max\{\mathrm{M}_{q^m}(s), \mathrm{D}_{q^m}(s)\} \in \Omega(s^{\log_2(3)+\varepsilon})$ for some $\varepsilon > 0$, then $C(s) \in O(\max\{\mathrm{M}_{q^m}(s), \mathrm{D}_{q^m}(s)\})$.

In summary, we obtain
$$\mathrm{MSP}_{q^m}(s),\ \mathrm{MPE}_{q^m}(s) \in O\!\left(\max\left\{s^{\log_2(3)}\log(s),\ \mathrm{M}_{q^m}(s),\ \mathrm{D}_{q^m}(s)\right\}\right) \subseteq O\!\left(s^{\max\{\log_2(3),\,\min\{\frac{\omega+1}{2},\,1.635\}\}}\log(s)\right). \qquad \square$$

Algorithm 2: $\mathrm{MSP}(\mathcal{U})$
Input: Generating set $\mathcal{U} = \{u_1, \ldots, u_s\}$ of a subspace $\mathcal{U} \subseteq \mathbb{F}_{q^m}$.
Output: MSP $\mathcal{M}_{\langle \mathcal{U} \rangle}$.
1  if $s = 1$ then
2    return $\mathcal{M}_{\langle u_1 \rangle}(x)$ according to (4)  // $O(1)$
3  else
4    $\mathcal{A} \leftarrow \{u_1, \ldots, u_{\lfloor s/2 \rfloor}\}$, $\mathcal{B} \leftarrow \{u_{\lfloor s/2 \rfloor + 1}, \ldots, u_s\}$  // $O(1)$
5    $\mathcal{M}_{\langle \mathcal{A} \rangle} \leftarrow \mathrm{MSP}(\mathcal{A})$  // $\mathrm{MSP}_{q^m}(\frac{s}{2})$
6    $\mathcal{M}_{\langle \mathcal{A} \rangle}(\mathcal{B}) \leftarrow \mathrm{MPE}(\mathcal{M}_{\langle \mathcal{A} \rangle}, \mathcal{B})$  // $\mathrm{MPE}_{q^m}(\frac{s}{2})$
7    $\mathcal{M}_{\langle \mathcal{M}_{\langle \mathcal{A} \rangle}(\mathcal{B}) \rangle} \leftarrow \mathrm{MSP}(\mathcal{M}_{\langle \mathcal{A} \rangle}(\mathcal{B}))$  // $\mathrm{MSP}_{q^m}(\frac{s}{2})$
8    return $\mathcal{M}_{\langle \mathcal{M}_{\langle \mathcal{A} \rangle}(\mathcal{B}) \rangle} \cdot \mathcal{M}_{\langle \mathcal{A} \rangle}$  // $\mathrm{M}_{q^m}(s)$

Algorithm 3: $\mathrm{MPE}(a, \{u_1, \ldots, u_s\})$
Input: $a \in \mathcal{L}_{q^m}^{\leq s}$, $\{u_1, \ldots, u_s\} \in \mathbb{F}_{q^m}^s$
Output: Evaluation of $a$ at all points $u_i$
1  if $s = 1$ then
2    return $\{a_1 u_1^{[1]} + a_0 u_1^{[0]}\}$  // $O(1)$
3  else
4    $\mathcal{A} \leftarrow \{u_1, \ldots, u_{\lfloor s/2 \rfloor}\}$, $\mathcal{B} \leftarrow \{u_{\lfloor s/2 \rfloor + 1}, \ldots, u_s\}$  // $O(1)$
5    $\mathcal{M}_{\langle \mathcal{A} \rangle} \leftarrow \mathrm{MSP}(\mathcal{A})$  // $\mathrm{MSP}_{q^m}(\frac{s}{2})$
6    $\mathcal{M}_{\langle \mathcal{B} \rangle} \leftarrow \mathrm{MSP}(\mathcal{B})$  // $\mathrm{MSP}_{q^m}(\frac{s}{2})$
7    $[\chi_{\mathcal{A}}, \varrho_{\mathcal{A}}] \leftarrow \mathrm{RightDiv}(a, \mathcal{M}_{\langle \mathcal{A} \rangle})$  // $\mathrm{D}_{q^m}(s)$
8    $[\chi_{\mathcal{B}}, \varrho_{\mathcal{B}}] \leftarrow \mathrm{RightDiv}(a, \mathcal{M}_{\langle \mathcal{B} \rangle})$  // $\mathrm{D}_{q^m}(s)$
9    return $\mathrm{MPE}(\varrho_{\mathcal{A}}, \mathcal{A}) \cup \mathrm{MPE}(\varrho_{\mathcal{B}}, \mathcal{B})$  // $2 \cdot \mathrm{MPE}_{q^m}(\frac{s}{2})$

3.5. Interpolation

In this section, we present a fast divide-&-conquer interpolation algorithm for linearized polynomials that relies on fast algorithms for both computing MSPs and MPEs. The idea resembles the fast interpolation algorithm in $\mathbb{F}_{q^m}[x]$ from (Gathen and Gerhard, 1999, Section 10.2) with additional considerations for the non-commutativity, and is based on the following lemma.

Lemma 16.
Let $(x_i, y_i)$ be as in Lemma 3. The interpolation polynomial fulfills
$$\mathcal{I}_{\{(x_i, y_i)\}_{i=1}^{s}} = \mathcal{I}_{\{(\tilde{x}_i, y_i)\}_{i=1}^{\lfloor s/2 \rfloor}} \cdot \mathcal{M}_{\langle x_{\lfloor s/2 \rfloor + 1}, \ldots, x_s \rangle} + \mathcal{I}_{\{(\tilde{x}_i, y_i)\}_{i=\lfloor s/2 \rfloor + 1}^{s}} \cdot \mathcal{M}_{\langle x_1, \ldots, x_{\lfloor s/2 \rfloor} \rangle}$$
with
$$\tilde{x}_i := \begin{cases} \mathcal{M}_{\langle x_{\lfloor s/2 \rfloor + 1}, \ldots, x_s \rangle}(x_i), & \text{if } i = 1, \ldots, \lfloor s/2 \rfloor, \\ \mathcal{M}_{\langle x_1, \ldots, x_{\lfloor s/2 \rfloor} \rangle}(x_i), & \text{otherwise,} \end{cases}$$
and $\mathcal{I}_{\{(x_1, y_1)\}} = \frac{y_1}{x_1} x^{[0]}$ (base case $s = 1$).

Proof.

For $i = 1, \ldots, \lfloor s/2 \rfloor$, the $\tilde{x}_i$ are linearly independent since the $x_i$ are linearly independent and $\mathcal{M}_{\langle x_{\lfloor s/2 \rfloor+1}, \ldots, x_s \rangle}(\cdot)$ is a linear map whose kernel is spanned by $x_{\lfloor s/2 \rfloor+1}, \ldots, x_s$ and therefore does not include any $x_i$ for $i = 1, \ldots, \lfloor s/2 \rfloor$. Furthermore, for $i = 1, \ldots, \lfloor s/2 \rfloor$,
$$\mathcal{I}_{\{(x_i,y_i)\}_{i=1}^{s}}(x_i) = \mathcal{I}_{\{(\tilde{x}_i,y_i)\}_{i=1}^{\lfloor s/2\rfloor}}\big(\underbrace{\mathcal{M}_{\langle x_{\lfloor s/2\rfloor+1},\ldots,x_s\rangle}(x_i)}_{=\tilde{x}_i}\big) + \mathcal{I}_{\{(\tilde{x}_i,y_i)\}_{i=\lfloor s/2\rfloor+1}^{s}}\big(\underbrace{\mathcal{M}_{\langle x_1,\ldots,x_{\lfloor s/2\rfloor}\rangle}(x_i)}_{=0}\big) = \mathcal{I}_{\{(\tilde{x}_i,y_i)\}_{i=1}^{\lfloor s/2\rfloor}}(\tilde{x}_i) + 0 = y_i.$$
By the same argument, also $\tilde{x}_{\lfloor s/2 \rfloor+1}, \ldots, \tilde{x}_s$ are linearly independent and $\mathcal{I}_{\{(x_i,y_i)\}_{i=1}^{s}}(x_i) = y_i$ for all $i = \lfloor s/2 \rfloor + 1, \ldots, s$.

Since, in addition,
$$\deg_q \mathcal{I}_{\{(x_i,y_i)\}_{i=1}^{s}} \leq \max\Big\{ \underbrace{\deg_q \mathcal{I}_{\{(\tilde{x}_i,y_i)\}_{i=1}^{\lfloor s/2\rfloor}}}_{< \lfloor s/2 \rfloor} + \underbrace{\deg_q \mathcal{M}_{\langle x_{\lfloor s/2\rfloor+1},\ldots,x_s\rangle}}_{\leq \lceil s/2 \rceil},\ \underbrace{\deg_q \mathcal{I}_{\{(\tilde{x}_i,y_i)\}_{i=\lfloor s/2\rfloor+1}^{s}}}_{< \lceil s/2 \rceil} + \underbrace{\deg_q \mathcal{M}_{\langle x_1,\ldots,x_{\lfloor s/2\rfloor}\rangle}}_{\leq \lfloor s/2 \rfloor} \Big\} < s,$$
the polynomial $\mathcal{I}_{\{(\tilde{x}_i,y_i)\}_{i=1}^{\lfloor s/2\rfloor}} \cdot \mathcal{M}_{\langle x_{\lfloor s/2\rfloor+1},\ldots,x_s\rangle} + \mathcal{I}_{\{(\tilde{x}_i,y_i)\}_{i=\lfloor s/2\rfloor+1}^{s}} \cdot \mathcal{M}_{\langle x_1,\ldots,x_{\lfloor s/2\rfloor}\rangle}$ is the desired interpolation polynomial $\mathcal{I}_{\{(x_i,y_i)\}_{i=1}^{s}}$ of Lemma 3. The case $s = 1$ is immediate. □

Lemma 16 implies a divide-&-conquer interpolation strategy.
The method is outlined in Algorithm 4, whose complexity is stated in the following theorem.
Theorem 17.
Computing the interpolation polynomial of $s$ point tuples can be implemented in
$$\mathcal{I}_{q^m}(s) \in O(\mathrm{MSP}_{q^m}(s)) \subseteq O\!\left(s^{\max\{\log_2(3),\,\min\{\frac{\omega+1}{2},\,1.635\}\}} \log(s)\right)$$
operations over $\mathbb{F}_{q^m}$ using Algorithm 4.

Proof.
Algorithm 4 computes the correct interpolation polynomial due to Lemma 16. Its lines have the following complexities:
• Lines 2 and 4 are again negligible.
• The complexities of Lines 5 and 6 are $\mathrm{MSP}_{q^m}(\frac{s}{2})$ each.
• Lines 7 and 8 take $\mathrm{MPE}_{q^m}(\frac{s}{2})$ time each.
• The algorithm calls itself recursively with input size $\approx \frac{s}{2}$ in Lines 9 and 10.
• Finally, the result is reassembled in Line 11 using two multiplications in $2 \cdot \mathrm{M}_{q^m}(\frac{s}{2})$ time.

Overall, we have
$$\mathcal{I}_{q^m}(s) = 2 \cdot \mathcal{I}_{q^m}(\tfrac{s}{2}) + 2 \cdot \left(\mathrm{MSP}_{q^m}(\tfrac{s}{2}) + \mathrm{MPE}_{q^m}(\tfrac{s}{2}) + \mathrm{M}_{q^m}(\tfrac{s}{2})\right) = 2 \cdot \mathcal{I}_{q^m}(\tfrac{s}{2}) + O(\mathrm{MSP}_{q^m}(s)).$$
By the master theorem, we obtain the desired complexity $\mathcal{I}_{q^m}(s) \in O(\mathrm{MSP}_{q^m}(s))$. □

Algorithm 4: $\mathrm{IP}(\{(x_i, y_i)\}_{i=1}^s)$
Input: $(x_1, y_1), \ldots, (x_s, y_s) \in \mathbb{F}_{q^m}^2$, $x_i$ linearly independent
Output: Interpolation polynomial $\mathcal{I}_{\{(x_i, y_i)\}_{i=1}^s}$
1  if $s = 1$ then
2    return $\{\frac{y_1}{x_1} x^{[0]}\}$  // $O(1)$
3  else
4    $\mathcal{A} \leftarrow \{x_1, \ldots, x_{\lfloor s/2 \rfloor}\}$, $\mathcal{B} \leftarrow \{x_{\lfloor s/2 \rfloor+1}, \ldots, x_s\}$  // $O(1)$
5    $\mathcal{M}_{\langle \mathcal{A} \rangle} \leftarrow \mathrm{MSP}(\mathcal{A})$  // $\mathrm{MSP}_{q^m}(\frac{s}{2})$
6    $\mathcal{M}_{\langle \mathcal{B} \rangle} \leftarrow \mathrm{MSP}(\mathcal{B})$  // $\mathrm{MSP}_{q^m}(\frac{s}{2})$
7    $\{\tilde{x}_1, \ldots, \tilde{x}_{\lfloor s/2 \rfloor}\} \leftarrow \mathrm{MPE}(\mathcal{M}_{\langle \mathcal{B} \rangle}, \mathcal{A})$  // $\mathrm{MPE}_{q^m}(\frac{s}{2})$
8    $\{\tilde{x}_{\lfloor s/2 \rfloor+1}, \ldots, \tilde{x}_s\} \leftarrow \mathrm{MPE}(\mathcal{M}_{\langle \mathcal{A} \rangle}, \mathcal{B})$  // $\mathrm{MPE}_{q^m}(\frac{s}{2})$
9    $\mathcal{I}_1 \leftarrow \mathrm{IP}(\{(\tilde{x}_i, y_i)\}_{i=1}^{\lfloor s/2 \rfloor})$  // $\mathcal{I}_{q^m}(\frac{s}{2})$
10   $\mathcal{I}_2 \leftarrow \mathrm{IP}(\{(\tilde{x}_i, y_i)\}_{i=\lfloor s/2 \rfloor+1}^{s})$  // $\mathcal{I}_{q^m}(\frac{s}{2})$
11   return $\mathcal{I}_1 \cdot \mathcal{M}_{\langle \mathcal{B} \rangle} + \mathcal{I}_2 \cdot \mathcal{M}_{\langle \mathcal{A} \rangle}$  // $2 \cdot \mathrm{M}_{q^m}(\frac{s}{2})$

3.6. Concluding Remarks

In this section, we have presented fast algorithms for the division, $q$-transform, MSP, MPE, and interpolation with sub-quadratic complexity in $s$ over $\mathbb{F}_{q^m}$, independent of $m$ (cf. Table 1 on page 3). Our fast algorithms are faster than all previously known algorithms when $s \leq m$ (see previous work in Section 1.1).

After the initial submission of this paper, the preprint (Caruso and Le Borgne, 2017) proposed a fast algorithm for multiplication of skew polynomials of complexity $O^{\sim}(s^{\omega-2} m^2)$ over $\mathbb{F}_q$, which improves upon the multiplication algorithm in Section 3.2 for $m^{2/(5-\omega)} \leq s \leq m$. As an immediate consequence, the cost bound for division is also improved to $O^{\sim}(s^{\omega-2} m^2)$ over $\mathbb{F}_q$ in this range. However, the result does not improve our cost bounds for the $q$-transform, MSP, MPE, and interpolation, since the first is already quasi-linear and the other algorithms' complexity would be dominated by the $s^{\log_2(3)}$ factor, cf. Table 1 on page 3.
4. An Optimal Multiplication Algorithm for s = m

Since the algorithms in Section 3 rely on fast multiplication of linearized polynomials, we would like to know a lower bound on its cost. In this section, we therefore show that $m \times m$ matrix multiplication can be reduced to multiplication of linearized polynomials of degree at most $s = m$ if an elliptic normal basis of $\mathbb{F}_{q^m}$ exists. This gives a lower bound on the cost of solving the latter problem.

We also speed up the algorithm for linearized polynomial multiplication modulo $x^{[m]} - x$ from (Wachter-Zeh, 2013, Section 3.1.3) and show that it achieves this optimal complexity for $s = m$ and is faster than the fragmentation-based multiplication algorithm from (Wachter-Zeh, 2013, Section 3.1.2) for $m^{2(\omega-1)/(\omega+1)} \leq s < m/2$. As a by-product, we show that the MPE at a basis of $\mathbb{F}_{q^m}$ from (Wachter-Zeh, 2013, Section 3.1.3) can be implemented in sub-cubic time using our fast $q$-transform algorithm from Section 3.3.

As a first step, we summarize and slightly reformulate the statements of (Wachter-Zeh, 2013, Section 3.1.3), which imply that matrix multiplication and linearized polynomial multiplication are closely connected.
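Before the formal statements, this connection can be made concrete with a small numerical sketch. The Python snippet below (toy parameters $q = 2$, $m = 4$; naive arithmetic; all names are ours, not from any library) checks that the evaluation map of a linearized polynomial is $\mathbb{F}_2$-linear, that the skew product modulo $x^{[4]} - x$ composes evaluation maps, and that the resulting map into $4 \times 4$ matrices over $\mathbb{F}_2$ is multiplicative:

```python
# Toy illustration (q = 2, m = 4): ev_{a.b} = ev_a o ev_b, i.e. the map into
# 4x4 matrices over F_2 is a monoid homomorphism. F_16 is built modulo
# x^4 + x + 1 and the basis is the polynomial basis {1, x, x^2, x^3}, so the
# coordinate vector of an element is just its bit pattern.

MOD = 0b10011

def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b10000:
            a ^= MOD
    return r

def frob(a, j):
    for _ in range(j):
        a = gf_mul(a, a)
    return a

def ev(poly, u):
    """Evaluate sum_k poly[k] * u^(2^k)."""
    r, p = 0, u
    for c in poly:
        r ^= gf_mul(c, p)
        p = gf_mul(p, p)
    return r

def skew_mul(a, b):
    """Skew product: c_i = XOR_j a_j * b_{i-j}^(2^j) (composition of maps)."""
    c = [0] * (len(a) + len(b) - 1)
    for j, aj in enumerate(a):
        for k, bk in enumerate(b):
            c[j + k] ^= gf_mul(aj, frob(bk, j))
    return c

def reduce_mod(c, m=4):
    """Reduce modulo x^[m] - x: as evaluation maps, x^[i] equals x^[i mod m]."""
    r = [0] * m
    for i, ci in enumerate(c):
        r[i % m] ^= ci
    return r

BASIS = [1, 2, 4, 8]  # polynomial basis; coordinates of u are the bits of u

def phi(poly):
    """Matrix of ev_poly: column j = ev(poly, BASIS[j]) as a 4-bit column."""
    return [ev(poly, b) for b in BASIS]

def mat_apply(cols, x):
    r = 0
    for k in range(4):
        if (x >> k) & 1:
            r ^= cols[k]
    return r

def mat_mul(A, B):
    return [mat_apply(A, bj) for bj in B]

a, b = [1, 2, 3, 4], [5, 6, 7, 8]
c = reduce_mod(skew_mul(a, b))          # a . b  mod (x^[4] - x)

assert all(ev(a, u ^ v) == ev(a, u) ^ ev(a, v) for u in range(16) for v in range(16))
assert all(ev(c, u) == ev(a, ev(b, u)) for u in range(16))
assert phi(c) == mat_mul(phi(a), phi(b))
```

Here `phi` plays the role of the isomorphism $\varphi_B$ discussed below, computed naively by evaluation at the basis rather than by a fast $q$-transform.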
Lemma 18.
The evaluation maps $\mathrm{ev}_a, \mathrm{ev}_b : \mathbb{F}_{q^m} \to \mathbb{F}_{q^m}$ of $a, b \in \mathcal{L}_{q^m}$ are the same if and only if $a \equiv b \bmod (x^{[m]} - x)$.

Proof.

Let $B = \{\beta_1, \ldots, \beta_m\}$ be an $\mathbb{F}_q$-basis of $\mathbb{F}_{q^m}$ and suppose $\mathrm{ev}_a = \mathrm{ev}_b$. Then, the remainder of $a - b$ right-divided by $x^{[m]} - x$ must be zero because it vanishes on the basis $B$ and has $q$-degree smaller than $m$. The other direction is clear due to $\mathrm{ev}_{x^{[m]} - x} = 0$. □

Lemma 18 implies that the evaluation map provides a bijection between $\mathcal{L}_{q^m}^{<m}$ and the set $\mathrm{End}_{\mathbb{F}_q}(\mathbb{F}_{q^m})$ of $\mathbb{F}_q$-linear maps $\mathbb{F}_{q^m} \to \mathbb{F}_{q^m}$. Furthermore, the multiplication modulo $x^{[m]} - x$, denoted by $\cdot \bmod (x^{[m]} - x)$, of two polynomials $a, b \in \mathcal{L}_{q^m}^{<m}$ corresponds to the composition $\circ$ of their evaluation maps: $\mathrm{ev}_{a \cdot b} = \mathrm{ev}_a \circ \mathrm{ev}_b$. [Footnote: For $s \geq m/2$, the algorithms' outputs might differ due to the modulo operation, so they cannot be compared.] Using the matrix representation $[\psi]_B^B$ of a linear map $\psi \in \mathrm{End}_{\mathbb{F}_q}(\mathbb{F}_{q^m})$, we obtain a monoid isomorphism
$$\varphi_B : \left(\mathcal{L}_{q^m}^{<m},\ \cdot \bmod (x^{[m]} - x)\right) \to \left(\mathbb{F}_q^{m \times m},\ \cdot\right), \quad a \mapsto [\mathrm{ev}_a]_B^B.$$
Thus, multiplication of matrices in $\mathbb{F}_q^{m \times m}$ is equivalent to multiplication modulo $x^{[m]} - x$ in $\mathcal{L}_{q^m}^{<m}$, and either operation can be efficiently reduced to the other, given that $\varphi_B$ and its inverse can be computed fast.

Note that $\varphi_B(a)$ can be computed by evaluating $a$ at the elements of $B$ and representing the result in the basis $B$. The inverse mapping $\varphi_B^{-1}(A)$ corresponds to finding the polynomial that evaluates to the values represented by the columns of the matrix $A$ (in the basis $B$) at the elements of $B$, i.e., an interpolation. Both maps can be efficiently computed as follows.

Lemma 19.
Let $B$ be a basis of $\mathbb{F}_{q^m}$ over $\mathbb{F}_q$, $a \in \mathcal{L}_{q^m}^{<m}$, and $A \in \mathbb{F}_q^{m \times m}$.
• If $B$ is a normal basis, then $\varphi_B(a)$ (or $\varphi_B^{-1}(A)$) can be computed by a $q$-transform (or an inverse $q$-transform).
• Otherwise, $\varphi_B(a)$ (or $\varphi_B^{-1}(A)$) can be computed by a $q$-transform (or an inverse $q$-transform), plus two matrix multiplications.

Proof.

Recall that $\varphi_B(a)$ can be obtained by a multi-point evaluation at the basis $B$ and by representing the result in the basis $B$. If $B$ is a normal basis, this corresponds to a $q$-transform. In the other cases, we can choose a normal basis $B'$ of $\mathbb{F}_{q^m}$ over $\mathbb{F}_q$ and first compute $\varphi_{B'}(a)$. Then, we use two matrix multiplications by the change-of-basis matrices $T_{B'}^{B}$ (from $B'$ to $B$) and $T_{B}^{B'}$ to obtain
$$\varphi_B(a) = [\mathrm{ev}_a]_B^B = T_{B'}^{B} \cdot [\mathrm{ev}_a]_{B'}^{B'} \cdot T_{B}^{B'} = T_{B'}^{B} \cdot \varphi_{B'}(a) \cdot T_{B}^{B'}.$$
Analogously, we can compute $\varphi_B^{-1}(A)$ by two matrix multiplications and an inverse $q$-transform instead of an interpolation. □

Lemma 19 implies that any MPE and interpolation w.r.t. a basis of $\mathbb{F}_{q^m}$ can be computed in $O^{\sim}(m^\omega)$ operations over $\mathbb{F}_q$. In the following two subsections, we use this observation to speed up the multiplication algorithm modulo $x^{[m]} - x$ from Wachter-Zeh (2013) and show that it has optimal complexity.

The following theorem shows how to speed up the algorithm for linearized polynomial multiplication modulo $x^{[m]} - x$ from (Wachter-Zeh, 2013, Section 3.1.3) using our fast $q$-transform algorithm from Theorem 12. The resulting complexity bottleneck is thus a matrix multiplication instead of a $q$-transform.

Theorem 20.
Using the $q$-transform as described in Theorem 12, multiplication of $a, b \in \mathcal{L}_{q^m}^{<m}$ modulo $x^{[m]} - x$ can be implemented in $O(m^\omega)$ operations over $\mathbb{F}_q$.

Proof.

By the properties of $\varphi_B$, we can compute $c = a \cdot b \bmod (x^{[m]} - x)$ by
$$c = \varphi_B^{-1}\left(\varphi_B(a) \cdot \varphi_B(b)\right).$$
The two evaluations of $\varphi_B$ and one of $\varphi_B^{-1}$ cost six matrix multiplications, two $q$-transforms, and an inverse $q$-transform. In addition, we need to perform a matrix multiplication. Using the algorithm for the $q$-transform described in Theorem 12 together with the bases from (Couveignes and Lercier, 2009), the (inverse) $q$-transforms cost $O^{\sim}(m^2)$ operations over $\mathbb{F}_q$. Hence, the matrix multiplications with complexity $O(m^\omega)$ over $\mathbb{F}_q$ are dominant. □

Remark 21.
For polynomials of $q$-degree $s < m/2$, the algorithm described above is a linearized polynomial multiplication algorithm, since the result has $q$-degree $< m$ and is not affected by the modulo $x^{[m]} - x$ reduction. It is possible to extend the algorithm to polynomials of $q$-degree $s \geq m/2$ by writing $s = \mu \cdot m/2$ for some $\mu \geq 1$. Then, we can fragment $a, b$ into $\mu$ polynomials of $q$-degree $< m/2$, whose pairwise products sum up to $a$ and $b$'s product, respectively (this costs $\mu^2$ many multiplications of polynomials of $q$-degree $< m/2$, i.e., $O(\mu^2 m^\omega)$). Addition of the fragments is negligible since we know the overlapping positions. Hence, we obtain a complexity of
$$O\left(\max\left\{s^2 m^{\omega-2},\ m^\omega\right\}\right)$$
operations over $\mathbb{F}_q$. For $m^{2(\omega-1)/(\omega+1)} < s < m$, this multiplication algorithm is faster than the one of (Wachter-Zeh, 2013, Section 3.1.2) (see Section 3.2), which has complexity $O^{\sim}(m s^{\min\{\frac{\omega+1}{2},\,1.635\}})$ over $\mathbb{F}_q$ when using the bases of Couveignes and Lercier (2009). In addition, the constant hidden by the $O$-notation is smaller, since the matrix multiplication is with respect to $m$, which is much larger than $\sqrt{s}$ in the case of the algorithm in Section 3.2 (cf. Remark 9) for $s \approx m$.

4.3. Optimality of the Multiplication Algorithm for s = m

We prove the optimality of the algorithm described in Theorem 20 by reducing matrix multiplication to polynomial multiplication. Lemma 19 implies that if the basis in which we represent elements of $\mathbb{F}_{q^m}$ is a normal basis, we can reduce matrix multiplication to a $q$-transform, two inverse $q$-transforms, and a multiplication of two linearized polynomials modulo $x^{[m]} - x$ (note that the modulo reduction only requires Frobenius automorphisms and $O(m)$ many additions in $\mathbb{F}_{q^m}$, since $x^{[m]} - x$ has only two non-zero coefficients).

In addition, when the bases admit quasi-linear multiplication, as the so-called elliptic normal bases from (Couveignes and Lercier, 2009) do, the $q$-transform only costs $O^{\sim}(m^2)$ operations over $\mathbb{F}_q$ by Theorem 12, and the complexity bottleneck becomes the multiplication of two linearized polynomials of degree $m$. This can be summarized in the following statement. Let $\mathrm{M}_{q^m}(m)$ denote the worst-case cost of multiplying two polynomials from $\mathcal{L}_{q^m}^{<m}$ (note that the polynomial degrees are smaller than $s = m$).

Lemma 22.
Let $q, m$ be such that there is an elliptic normal basis of $\mathbb{F}_{q^m}$ over $\mathbb{F}_q$. Then, the multiplication of two matrices from $\mathbb{F}_q^{m \times m}$ can be implemented in
$$O^{\sim}(m^2) + O(\mathrm{M}_{q^m}(m)) \subseteq O^{\sim}\!\left(\mathrm{M}_{q^m}(m)\right)$$
operations over $\mathbb{F}_q$.

Lemma 22 states that if an elliptic normal basis exists for $q, m$, then matrix multiplication can be efficiently reduced to linearized polynomial multiplication in $\mathcal{L}_{q^m}^{<m}$. Such bases do not exist for all pairs $(q, m)$. However, we can give the following statement.

Lemma 23. Let $(m_i)_{i \in \mathbb{N}}$ be a sequence of $m_i \in \mathbb{N}$ with $m_i \to \infty$ ($i \to \infty$). Then, there is a sequence $(q_i)_{i \in \mathbb{N}}$, where the $q_i$ are prime powers, such that there is an elliptic normal basis of $\mathbb{F}_{q_i^{m_i}}$ over $\mathbb{F}_{q_i}$.

Proof.

Let $\tilde{q}_i$ be any sequence of prime powers. Due to (Couveignes and Lercier, 2009, Section 5.2), there is a positive integer $f_i \in O(\log^2(m_i)(\log\log(m_i))^2)$ such that $q_i = \tilde{q}_i^{f_i}$ admits an elliptic normal basis as desired. □

Suppose that $\mathrm{M}_{q^m}(m) \in \Theta(m^\gamma)$ for some $\gamma \geq 2$, independent of the ground field $q$. Let $(q_i, m_i)_{i \in \mathbb{N}}$ be a sequence of pairs $q_i$ and $m_i$ as in Lemma 23. Then, by Lemma 22, there must be a constant $C \in \mathbb{R}_{>0}$ and an index $j \in \mathbb{N}$ such that
$$m_i^\omega \leq C \cdot m_i^\gamma \quad \text{for all } i \geq j.$$
Hence, $\omega \leq \gamma$, which provides a lower bound for the linearized polynomial multiplication exponent $\gamma$. Note that the multiplication algorithm in Section 4.2 achieves this bound and therefore has optimal complexity. The fragmentation algorithm described in (Wachter-Zeh, 2013) (see also Section 3.2), in combination with the bases of (Couveignes and Lercier, 2009), achieves $\gamma = \frac{\omega+3}{2}$, so its complexity differs from an optimal solution by a factor of $m^{\frac{3-\omega}{2}}$ (note that this only holds for $s = m$).

Note that our argumentation also implies that the existence of a linearized polynomial multiplication algorithm with quasi-linear complexity in $s$ over $\mathbb{F}_{q^m}$, independent of $m$, would give a quasi-quadratic matrix multiplication algorithm in the cases where an elliptic normal basis of $\mathbb{F}_{q^m}$ exists. Hence, proving that a quasi-linear linearized polynomial multiplication algorithm exists is at least as hard as proving that matrix multiplication can be implemented in quasi-quadratic time.
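As an unoptimized, concrete illustration of the reduction behind Theorem 20 and Lemma 22, the following Python sketch multiplies two $4 \times 4$ matrices over $\mathbb{F}_2$ through linearized polynomials (toy parameters $q = 2$, $m = 4$; naive interpolation by Gaussian elimination over $\mathbb{F}_{16}$ stands in for the fast inverse $q$-transform; all names are ours):

```python
# Toy demonstration: multiply two 4x4 matrices over F_2 by pulling them back to
# linearized polynomials, skew-multiplying modulo x^[4] - x, and reading the
# product matrix off again. Matrices are lists of columns (4-bit ints).

MOD = 0b10011  # F_16 = F_2[x]/(x^4 + x + 1), basis {1, x, x^2, x^3}

def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b10000:
            a ^= MOD
    return r

def gf_inv(a):
    a2 = gf_mul(a, a); a4 = gf_mul(a2, a2); a8 = gf_mul(a4, a4)
    return gf_mul(gf_mul(a2, a4), a8)  # a^14 = a^{-1}

def frob(a, j):
    for _ in range(j):
        a = gf_mul(a, a)
    return a

def ev(poly, u):
    r, p = 0, u
    for c in poly:
        r ^= gf_mul(c, p)
        p = gf_mul(p, p)
    return r

def skew_mul(a, b):
    c = [0] * (len(a) + len(b) - 1)
    for j, aj in enumerate(a):
        for k, bk in enumerate(b):
            c[j + k] ^= gf_mul(aj, frob(bk, j))
    return c

def reduce_mod(c, m=4):
    r = [0] * m
    for i, ci in enumerate(c):
        r[i % m] ^= ci
    return r

BASIS = [1, 2, 4, 8]

def interp(values):
    """Solve the Moore system sum_k c_k * beta_j^(2^k) = values[j] over F_16."""
    n = len(BASIS)
    M = [[frob(b, k) for k in range(n)] + [y] for b, y in zip(BASIS, values)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col])
        M[col], M[piv] = M[piv], M[col]
        inv = gf_inv(M[col][col])
        M[col] = [gf_mul(inv, e) for e in M[col]]
        for r in range(n):
            if r != col and M[r][col]:
                f = M[r][col]
                M[r] = [e ^ gf_mul(f, p) for e, p in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]

def mat_apply(cols, x):
    r = 0
    for k in range(4):
        if (x >> k) & 1:
            r ^= cols[k]
    return r

A, B = [3, 5, 14, 9], [6, 1, 11, 12]    # two arbitrary F_2-matrices (columns)
a, b = interp(A), interp(B)             # inverse map: column j = value at BASIS[j]
c = reduce_mod(skew_mul(a, b))          # polynomial multiplication mod x^[4] - x
C = [ev(c, beta) for beta in BASIS]     # read the product matrix back off

assert C == [mat_apply(A, bj) for bj in B]  # equals the direct product A * B
```

In the fast version, `interp` and the final evaluation are replaced by (inverse) $q$-transforms w.r.t. an elliptic normal basis, which is exactly where the cost bounds of Lemma 22 come from.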
5. Decoding Gabidulin Codes in Sub-quadratic Time
Gabidulin codes are rank-metric codes that can be found in a wide range of applications, including network coding (Silva et al., 2008), code-based cryptosystems (Gabidulin et al., 1991), and distributed storage systems (Silberstein et al., 2012). In this section, we show that two algorithms for decoding Gabidulin codes from (Wachter-Zeh, 2013), one for only errors and one including generalized row and column erasures, can be implemented in $O^{\sim}(n^{\max\{\log_2(3),\,\min\{\frac{\omega+1}{2},\,1.635\}\}})$ operations over $\mathbb{F}_{q^m}$ using the methods presented in this paper. This yields the first decoding algorithms for Gabidulin codes with sub-quadratic complexity.

A rank-metric code
$\mathcal{C} \subseteq \mathbb{F}_q^{m \times n}$ is a set of matrices over a finite field $\mathbb{F}_q$, where the distance of two codewords is measured w.r.t. the rank distance
$$d_R : \mathbb{F}_q^{m \times n} \times \mathbb{F}_q^{m \times n} \to \mathbb{N}, \quad (C_1, C_2) \mapsto \mathrm{rank}(C_1 - C_2).$$
Since, for a fixed $\mathbb{F}_q$-basis of $\mathbb{F}_{q^m}$, elements in $\mathbb{F}_{q^m}^n$ can be expanded into matrices in $\mathbb{F}_q^{m \times n}$, the rank distance is also well-defined over $\mathbb{F}_{q^m}^n$. A linear rank-metric code of length $n$, dimension $k$, and minimum rank distance $d_R$ is a $k$-dimensional $\mathbb{F}_{q^m}$-subspace of $\mathbb{F}_{q^m}^n$ whose elements have pairwise rank distance at least $d_R$. It was shown in (Delsarte, 1978; Gabidulin, 1985; Roth, 1991) that any such code with $n \leq m$ fulfills the rank-metric Singleton bound $d_R \leq n - k + 1$. Codes achieving this bound with equality are called maximum rank distance (MRD) codes.
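For concreteness, the rank distance can be computed by expanding vectors over $\mathbb{F}_{q^m}$ column-wise into matrices over $\mathbb{F}_q$. A minimal Python sketch for $q = 2$, $m = 4$ (our own naive implementation, not from a library):

```python
# Rank distance for q = 2, m = 4: column i of the expanded matrix is the
# bit pattern of the i-th vector entry (an element of F_16 encoded as an int).

def rank_f2(cols):
    """Rank over F_2 of the matrix whose columns are the given 4-bit ints."""
    pivots = {}  # highest set bit -> reduced column
    for c in cols:
        while c:
            h = c.bit_length() - 1
            if h in pivots:
                c ^= pivots[h]   # eliminate the leading bit
            else:
                pivots[h] = c    # new pivot column
                break
    return len(pivots)

def rank_dist(x, y):
    """d_R(x, y) = rank of the expansion of x - y (XOR over F_2^m)."""
    return rank_f2([xi ^ yi for xi, yi in zip(x, y)])

# The difference (1, 2, 3) has Hamming weight 3 but rank weight only 2,
# since 3 = 1 XOR 2 is F_2-linearly dependent on the other two columns:
assert rank_dist([1, 2, 3], [0, 0, 0]) == 2
assert rank_dist([1, 2, 4], [1, 2, 4]) == 0
```

The example shows the general fact that the rank weight of a vector never exceeds its Hamming weight.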
Gabidulin codes (Delsarte, 1978; Gabidulin, 1985; Roth, 1991) are a special class of MRDcodes and are often considered as the analogs of Reed–Solomon codes in rank metric. They canbe defined by the evaluation of degree-restricted linearized polynomials as follows.
Definition 24 (Gabidulin (1985)). A linear Gabidulin code $\mathcal{G}[n, k]$ over $\mathbb{F}_{q^m}$ of length $n \leq m$ and dimension $k \leq n$ is the set
$$\mathcal{G}[n, k] = \left\{ \left[ f(g_1)\ f(g_2)\ \ldots\ f(g_n) \right] : f \in \mathcal{L}_{q^m}^{<k} \right\},$$
where the fixed elements $g_1, g_2, \ldots, g_n \in \mathbb{F}_{q^m}$ are linearly independent over $\mathbb{F}_q$.

Note that the encoding of Gabidulin codes, see Definition 24, is equivalent to the calculation of one MPE and can therefore be accomplished with complexity $O(s^{\max\{\log_2(3),\,\min\{\frac{\omega+1}{2},\,1.635\}\}} \log(s))$. If $\{g_1, \ldots, g_n\}$ is a normal basis, it can be computed as a $q$-transform in $O(s \log^2(s) \log(\log(s)))$.

In this section, we assume that a word $r = c + e$ is received, where $d_R(r, c) = \mathrm{rk}(e)$ denotes the number of rank errors, and the decoder wants to retrieve $c$ from $r$.

Algorithm 5 shows (Wachter-Zeh, 2013, Algorithm 3.6) for decoding Gabidulin codes up to $\lfloor (d-1)/2 \rfloor$ rank errors. This algorithm can be seen as the rank-metric equivalent of the Reed–Solomon decoding algorithms from (Sugiyama et al., 1975; Welch and Berlekamp, 1986; Sudan, 1997; Gao, 2003). Its correctness was proven in (Wachter-Zeh, 2013, Theorem 3.7) and its complexity was shown to be in $O(n^2)$ over $\mathbb{F}_{q^m}$. Since all steps of Algorithm 5 can be performed by algorithms with sub-quadratic complexity from this paper, the following corollary holds.

Corollary 25.
Algorithm 5 can be implemented in $O\!\left(n^{\max\{\log_2(3),\,\min\{\frac{\omega+1}{2},\,1.635\}\}} \log^2(n)\right)$ operations over $\mathbb{F}_{q^m}$.

Algorithm 5:
$\mathrm{DecodeGabidulin}(r, \{g_1, g_2, \ldots, g_n\})$
Input: Received word $r \in \mathbb{F}_{q^m}^n$ and $g_1, g_2, \ldots, g_n \in \mathbb{F}_{q^m}$, linearly independent over $\mathbb{F}_q$.
Output: Estimated evaluation polynomial $f$ with $\deg_q f < k$ and error span polynomial $\Lambda$, or "decoding failure".
1  $\hat{r} \leftarrow \mathcal{I}_{\{(g_i, r_i)\}_{i=1}^n}$  // $\mathcal{I}_{q^m}(n)$
2  $\mathcal{M} \leftarrow \mathcal{M}_{\langle g_1, \ldots, g_n \rangle}$  // $\mathrm{MSP}_{q^m}(n)$
3  $[r_{\mathrm{out}}, u_{\mathrm{out}}, v_{\mathrm{out}}] \leftarrow \mathrm{RightLEEA}(\mathcal{M},\ \hat{r},\ \lfloor (n+k)/2 \rfloor)$  // $\mathrm{M}_{q^m}(n) \log(n)$ (Corollary 11)
4  $[\chi_L, \varrho_L] \leftarrow$ Left-divide $r_{\mathrm{out}}$ by $u_{\mathrm{out}}$  // $\mathrm{D}_{q^m}(n)$
5  if $\varrho_L = 0$ then return $[f, \Lambda] \leftarrow [\chi_L, u_{\mathrm{out}}]$ else return "decoding failure"

5.3. Error-Erasure Decoding

Algorithm 6 shows (Wachter-Zeh, 2013, Algorithm 3.7) for decoding Gabidulin codes with $t$ errors, $\varrho$ generalized row erasures, and $\gamma$ generalized column erasures if
$$2t + \varrho + \gamma \leq d - 1,$$
where $d$ is the minimum rank distance of the Gabidulin code. The correctness of Algorithm 6 was proven in (Wachter-Zeh, 2013, Theorem 3.9).

Theorem 26.
Algorithm 6 can be implemented in $O\!\left(n^{\max\{\log_2(3),\,\min\{\frac{\omega+1}{2},\,1.635\}\}} \log^2(n)\right)$ operations over $\mathbb{F}_{q^m}$.

Proof.
Its lines have the following complexities:
• Line 1: the $d_i^{(C)}$ are elements of $\mathbb{F}_{q^m}$, and if $\mathbb{F}_{q^m}$ elements are represented in the normal basis generated by $\beta$, the $B_{i,j}^{(C)}$ are already the representation of the $d_i^{(C)}$, and thus no computation is needed.
• Lines 2 and 3 calculate MSPs whose cost is in $\mathrm{MSP}_{q^m}(n) \subseteq O(n^{\max\{\log_2(3),\,\min\{\frac{\omega+1}{2},\,1.635\}\}} \log(n))$.
• The cost of Line 4 is negligible.
• Line 5 finds the interpolation polynomial of $n$ point tuples, implying a cost of $\mathcal{I}_{q^m}(n) \subseteq O(n^{\max\{\log_2(3),\,\min\{\frac{\omega+1}{2},\,1.635\}\}} \log(n))$.
• Line 6 requires three multiplications of linearized polynomials of degree $\leq n$ plus the modulo operation, which requires $O(m) \subseteq O(n)$ additions because $x^{[m]} - x$ has only two non-zero coefficients. Hence, its complexity lies in $O(\mathrm{M}_{q^m}(n)) \subseteq O(n^{\min\{\frac{\omega+1}{2},\,1.635\}})$.
• Line 7 has complexity $O(n^{\max\{\log_2(3),\,\min\{\frac{\omega+1}{2},\,1.635\}\}} \log^2(n))$ by Corollary 11.
• Lines 8 and 9 compute a multiplication of polynomials of degree $\leq n$ and a left division, yielding a complexity of $O(\mathrm{D}_{q^m}(n)) \subseteq O(n^{\min\{\frac{\omega+1}{2},\,1.635\}} \log(n))$.

Thus, using the results of Section 3, the overall complexity is as stated. □

Algorithm 6: $\mathrm{DecodeErrorErasureGabidulin}(r, \{g_1, g_2, \ldots, g_n\}, a^{(R)}, B^{(C)})$
Input: Received word $r \in \mathbb{F}_{q^m}^n$; $g_i = \beta^{[i-1]} \in \mathbb{F}_{q^m}$, $i = 1, \ldots, n$, a normal basis of $\mathbb{F}_{q^m}$ over $\mathbb{F}_q$; $a^{(R)} = [a_1^{(R)}\ a_2^{(R)}\ \ldots\ a_\varrho^{(R)}] \in \mathbb{F}_{q^m}^\varrho$; $B^{(C)} = [B_{i,j}^{(C)}]_{i \in [1,\gamma],\, j \in [1,n]} \in \mathbb{F}_q^{\gamma \times n}$
Output: Estimated evaluation polynomial $f$ with $\deg_q f < k$, or "decoding failure".
1  $d_i^{(C)} \leftarrow \sum_{j=1}^{n} B_{i,j}^{(C)} \beta^{[j-1]}$ for all $i = 1, \ldots, \gamma$  // negligible
2  $\Gamma^{(C)} \leftarrow \mathcal{M}_{\langle d_1^{(C)}, d_2^{(C)}, \ldots, d_\gamma^{(C)} \rangle}$  // $\mathrm{MSP}_{q^m}(\gamma) \subseteq \mathrm{MSP}_{q^m}(n)$
3  $\Lambda^{(R)} \leftarrow \mathcal{M}_{\langle a_1^{(R)}, a_2^{(R)}, \ldots, a_\varrho^{(R)} \rangle}$  // $\mathrm{MSP}_{q^m}(\varrho) \subseteq \mathrm{MSP}_{q^m}(n)$
4  $\bar{\Gamma}^{(C)} \leftarrow \sum_{i=0}^{m-1} \bar{\Gamma}_i^{(C)} x^{[i]}$ with $\bar{\Gamma}_i^{(C)} := \Gamma_{-i \bmod m}^{(C)[i]}$ for all $i = 0, \ldots, m-1$  // $O(m) \subseteq O(n)$
5  $\hat{r} \leftarrow \mathcal{I}_{\{(g_i, r_i)\}_{i=1}^n}$  // $\mathcal{I}_{q^m}(n)$
6  $\hat{y} \leftarrow \Lambda^{(R)} \cdot \hat{r} \cdot \bar{\Gamma}^{(C)} \cdot x^{[\gamma]} \bmod (x^{[m]} - x)$  // $\mathrm{M}_{q^m}(n)$
7  $[r_{\mathrm{out}}, u_{\mathrm{out}}, v_{\mathrm{out}}] \leftarrow \mathrm{HalfLEEA}\!\left(x^{[m]} - x,\ \hat{y},\ \left\lfloor \frac{n + k + \varrho + \gamma}{2} \right\rfloor\right)$  // $\mathrm{M}_{q^m}(n) \log(n)$
8  $[\chi_L, \varrho_L] \leftarrow \mathrm{LeftDiv}(r_{\mathrm{out}},\ u_{\mathrm{out}} \cdot \Lambda^{(R)})$  // $\mathrm{D}_{q^m}(n)$
9  $[\chi_R, \varrho_R] \leftarrow \mathrm{RightDiv}(\chi_L,\ \bar{\Gamma}^{(C)} \cdot x^{[\gamma]} \bmod (x^{[m]} - x))$  // $\mathrm{D}_{q^m}(n)$
10 if $\varrho_L = 0$ and $\varrho_R = 0$ then return $f \leftarrow \chi_R$ else return "decoding failure"

Remark 27.
In both Algorithms 5 and 6, the involved polynomials have $q$-degree at most $n \leq m$. Hence, the new algorithms in this paper are asymptotically faster than the ones from (Caruso and Le Borgne, 2017) in this case.
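To make the encoding of Definition 24 concrete, the sketch below instantiates a toy Gabidulin code $\mathcal{G}[3, 2]$ over $\mathbb{F}_{16}$ ($q = 2$, $m = 4$) with naive evaluation in place of the fast MPE, and checks the MRD property $d_R = n - k + 1$ by brute force over all nonzero messages (all names are ours; no decoder is implemented):

```python
# Toy Gabidulin code G[n=3, k=2] over F_16, encoded by naive evaluation of
# f = f0*x + f1*x^[1] at F_2-linearly independent points g_1, g_2, g_3.

MOD = 0b10011  # F_16 = F_2[x]/(x^4 + x + 1)

def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b10000:
            a ^= MOD
    return r

def encode(f0, f1, g):
    """Codeword (f(g_1), ..., f(g_n)) with f(u) = f0*u + f1*u^2 (q-degree < k = 2)."""
    return [gf_mul(f0, u) ^ gf_mul(f1, gf_mul(u, u)) for u in g]

def rank_f2(cols):
    """Rank over F_2 of the expansion of a codeword (columns = 4-bit ints)."""
    pivots = {}
    for c in cols:
        while c:
            h = c.bit_length() - 1
            if h in pivots:
                c ^= pivots[h]
            else:
                pivots[h] = c
                break
    return len(pivots)

g = [1, 2, 4]  # evaluation points, F_2-linearly independent

# Minimum rank weight over all 255 nonzero messages equals n - k + 1 = 2 (MRD):
weights = [rank_f2(encode(f0, f1, g))
           for f0 in range(16) for f1 in range(16) if (f0, f1) != (0, 0)]
assert min(weights) == 3 - 2 + 1
```

For example, $f = x^{[1]} + x$ vanishes at $g_1 = 1$, so its codeword has a zero coordinate and rank weight exactly $2$, matching the bound with equality.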
6. Conclusion