Fast transforms over finite fields of characteristic two
NICHOLAS COXON
Abstract.
An additive fast Fourier transform over a finite field of characteristic two efficiently evaluates polynomials at every element of an $\mathbb{F}_2$-linear subspace of the field. We view these transforms as performing a change of basis from the monomial basis to the associated Lagrange basis, and consider the problem of performing the various conversions between these two bases, the associated Newton basis, and the "novel" basis of Lin, Chung and Han (FOCS 2014). Existing algorithms are divided between two families, those designed for arbitrary subspaces and more efficient algorithms designed for specially constructed subspaces of fields with degree equal to a power of two. We generalise techniques from both families to provide new conversion algorithms that may be applied to arbitrary subspaces, but which benefit equally from the specially constructed subspaces. We then construct subspaces of fields with smooth degree for which our algorithms provide better performance than existing algorithms.

1. Introduction
Let $F$ be a finite field of characteristic two, and let $W = \{\omega_0, \ldots, \omega_{2^n-1}\}$ be an $n$-dimensional $\mathbb{F}_2$-linear subspace of $F$. Define polynomials
\[
L_i = \prod_{\substack{j=0 \\ j \neq i}}^{2^n-1} \frac{x - \omega_j}{\omega_i - \omega_j}, \quad
N_i = \prod_{j=0}^{i-1} \frac{x - \omega_j}{\omega_i - \omega_j}
\quad\text{and}\quad
X_i = \prod_{k=0}^{n-1} \prod_{j=0}^{2^k[i]_k - 1} \frac{x - \omega_j}{\omega_{2^k[i]_k} - \omega_j}
\]
for $i \in \{0, \ldots, 2^n - 1\}$, where $[\,\cdot\,]_k : \mathbb{N} \to \{0, 1\}$ for $k \in \mathbb{N}$ are the maps such that $i = \sum_{k \in \mathbb{N}} 2^k [i]_k$ for $i \in \mathbb{N}$. Let $F[x]_\ell$ denote the space of polynomials with coefficients in $F$ and degree less than $\ell$. Then $\{L_0, \ldots, L_{2^n-1}\}$ is the Lagrange basis of $F[x]_{2^n}$ associated with the enumeration of $W$. Similarly, $\{N_0, \ldots, N_{2^n-1}\}$ is the associated Newton basis, normalised so that $N_i(\omega_i) = 1$. The definition of the functions $[\,\cdot\,]_k$ implies that each of the polynomials $X_i$ has degree equal to $i$. Thus, $\{X_0, \ldots, X_{2^n-1}\}$ is also a basis of $F[x]_{2^n}$. This unusual basis was introduced by Lin, Chung and Han in 2014 [21]. Consequently, we refer to it as the Lin–Chung–Han basis, or simply the LCH basis. In this paper, we describe new fast algorithms for converting between the Lagrange, (normalised) Newton, Lin–Chung–Han and monomial bases for specially constructed subspaces.

Date: July 23, 2018.
2010 Mathematics Subject Classification.
Primary 68W30, 68W40, 12Y05. This work was supported by Nokia in the framework of the common laboratory between Nokia Bell Labs and INRIA.
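As a concrete illustration of the three families of basis polynomials just defined (this example is not part of the paper), the following sketch constructs $L_i$, $N_i$ and $X_i$ for a 2-dimensional subspace and checks their defining properties. The field GF(2^4) with modulus $x^4 + x + 1$ and the subspace basis are arbitrary choices of the example.

```python
# Arithmetic in F = GF(2^4), elements encoded as ints, modulus x^4 + x + 1.
M, MOD = 4, 0b10011

def mul(a, b):
    r = 0
    while b:                      # carry-less multiplication
        if b & 1:
            r ^= a
        a, b = a << 1, b >> 1
    for i in range(2 * M - 2, M - 1, -1):   # reduce modulo MOD
        if (r >> i) & 1:
            r ^= MOD << (i - M)
    return r

def inv(a):                       # a^(2^M - 2) = a^(-1) by square and multiply
    r, e = 1, 2**M - 2
    while e:
        if e & 1:
            r = mul(r, a)
        a, e = mul(a, a), e >> 1
    return r

# Polynomials over F as coefficient lists, low degree first.
def pmul(f, g):
    r = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            r[i + j] ^= mul(a, b)
    return r

def peval(f, x):                  # Horner evaluation
    y = 0
    for c in reversed(f):
        y = mul(y, x) ^ c
    return y

# Subspace basis beta = (beta_0, beta_1); omega_i = sum_k [i]_k beta_k.
beta = [1, 0b0010]
n = len(beta)
omega = [0] * 2**n
for i in range(2**n):
    for k in range(n):
        if (i >> k) & 1:
            omega[i] ^= beta[k]

def normalised_product(indices, point):
    # prod over j in indices of (x - omega_j)/(point - omega_j); minus is XOR.
    f = [1]
    for j in indices:
        c = inv(point ^ omega[j])
        f = pmul(f, [mul(omega[j], c), c])
    return f

L = [normalised_product([j for j in range(2**n) if j != i], omega[i])
     for i in range(2**n)]
N = [normalised_product(range(i), omega[i]) for i in range(2**n)]
X = []
for i in range(2**n):             # X_i = prod over set bits k of N_{2^k}
    f = [1]
    for k in range(n):
        if (i >> k) & 1:
            f = pmul(f, normalised_product(range(2**k), omega[2**k]))
    X.append(f)

# Defining properties: L_i(omega_j) = [i = j], N_i(omega_i) = 1 with
# N_i(omega_j) = 0 for j < i, and deg X_i = i.
assert all(peval(L[i], omega[j]) == (1 if i == j else 0)
           for i in range(2**n) for j in range(2**n))
assert all(peval(N[i], omega[i]) == 1 for i in range(2**n))
assert all(peval(N[i], omega[j]) == 0 for i in range(2**n) for j in range(i))
assert [len(p) - 1 for p in X] == list(range(2**n))
```

The last assertion reflects the triangularity noted above: the degrees of the $X_i$ are $0, 1, \ldots, 2^n - 1$, so they form a basis of $F[x]_{2^n}$.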
Converting to the Lagrange basis from one of the three remaining bases corresponds to evaluating a polynomial at each element in $W$. An algorithm that efficiently performs this evaluation for polynomials written on the monomial basis is referred to as an additive fast Fourier transform (FFT). The designation as "additive" reflects the fact that a fast Fourier transform traditionally evaluates polynomials at each element of a cyclic group. To avoid confusion, we refer to such algorithms as multiplicative FFTs hereafter. Additive FFTs have been investigated as an alternative to multiplicative FFTs for use in fast multiplication algorithms for binary polynomials [28, 6, 23, 8, 9, 18], and have also found applications in coding theory and cryptography [4, 3, 10, 1].

Additive FFTs first appeared in the 1980s with the algorithm of Wang and Zhu [29], which was subsequently rediscovered by Cantor [7]. Applied to characteristic two finite fields, the Wang–Zhu–Cantor algorithm is restricted to extensions of degree equal to a power of two. This restriction is removed by the algorithm of von zur Gathen and Gerhard [28], but at the cost of a higher algebraic complexity. For these and subsequent algorithms [14, 4, 3], the subspace $W$ is described by an ordered basis $\beta = (\beta_0, \ldots, \beta_{n-1})$, which also defines the enumeration of the space by
\[
\omega_i = \sum_{k=0}^{n-1} [i]_k \beta_k \quad\text{for } i \in \{0, \ldots, 2^n - 1\}. \tag{1.1}
\]
For an arbitrary choice of basis, the algorithm of von zur Gathen and Gerhard performs $O(\ell \log^2 \ell)$ additions and multiplications, where $\ell = |W| = 2^n$. The subsequent algorithm of Gao and Mateer [14] achieves the same additive complexity while performing only $O(\ell \log \ell)$ multiplications.

For extensions with degree equal to a power of two, faster algorithms are obtained through a special choice of the subspace and its basis. The defining property of these spaces is the existence of a Cantor basis, i.e., a basis $\beta = (\beta_0, \ldots, \beta_{n-1})$ such that
\[
\beta_0 = 1 \quad\text{and}\quad \beta_i = \beta_{i+1}^2 - \beta_{i+1} \quad\text{for } i \in \{0, \ldots, n-2\}. \tag{1.2}
\]
For subspaces represented by a Cantor basis, the Wang–Zhu–Cantor algorithm performs $O(\ell (\log \ell)^{\log_2 3})$ additions and $O(\ell \log \ell)$ multiplications. Gao and Mateer [14] also contribute to this special case by providing an algorithm that achieves the same multiplicative complexity while performing only $O(\ell \log \ell \log \log \ell)$ additions.

For the same subspace enumeration used in the additive FFTs, Lin, Chung and Han [21] provide algorithms for converting between the Lagrange and LCH bases that perform $O(\ell \log \ell)$ additions and multiplications. Lin, Chung and Han use their basis and conversion algorithms to provide fast encoding and decoding algorithms for Reed–Solomon codes. This application is further explored in the subsequent work of Lin, Al-Naffouri and Han [19] and Lin, Al-Naffouri, Han and Chung [20], while Ben-Sasson et al. [2] utilise the conversion algorithms within their zero-knowledge proof system. Lin, Al-Naffouri, Han and Chung [20] additionally consider the problem of converting between the LCH and monomial bases. For subspaces represented by an arbitrary choice of basis, they provide algorithms for converting between the two bases that perform $O(\ell \log^2 \ell)$ additions and $O(\ell \log \ell)$ multiplications. For subspaces represented by a Cantor basis, they provide algorithms that require only $O(\ell \log \ell \log \log \ell)$ additions and perform no multiplications. The techniques used in both cases, as well as for the algorithms of Lin,
Chung and Han, originate in the work of Gao and Mateer. This relationship becomes apparent when combining the algorithms to obtain additive FFTs, since one essentially obtains the algorithms of Gao and Mateer.

The techniques developed for additive FFTs have yet to be applied to conversions involving the Newton basis. In the realm of multiplicative FFTs, one has the algorithms of van der Hoeven and Schost [26], which convert between the monomial basis and the Newton basis associated with the radix-2 truncated Fourier transform points [25, 24]. Fast conversion between the two bases is a necessary requirement of multivariate evaluation and interpolation algorithms [26, 11] and their application to systematic encoding of Reed–Muller and multiplicity codes [11]. For applications in coding theory, characteristic two finite fields are particularly interesting due to their fast arithmetic. However, the algorithms of van der Hoeven and Schost are not suited to such fields, as they require the existence of roots of unity with order equal to a power of two. It is likely that this problem may be partially overcome by generalising their algorithm in a manner analogous to the generalisation of the radix-2 truncated Fourier transform [25, 24] to mixed radices by Larrieu [17]. Developing an algorithm based on the ideas of additive FFTs provides a second and more widely applicable solution to the problem.

In this paper, we describe new fast conversion algorithms between the Lin–Chung–Han basis and each of the Newton, Lagrange and monomial bases. These algorithms may in turn be combined to obtain fast conversion algorithms between any two of the four bases. We once again represent subspaces by ordered bases, and use (1.1) for their enumeration.

In Section 2, we show that if the defining basis $\beta = (\beta_0, \ldots, \beta_{n-1})$ has dimension greater than one and satisfies
\[
1, \frac{\beta_1}{\beta_0}, \ldots, \frac{\beta_{d-1}}{\beta_0} \in \mathbb{F}_{2^d} \tag{1.3}
\]
for some $d \in \{1, \ldots, n-1\}$, then each of the three conversion problems over the subspace generated by $\beta$ may be efficiently reduced to instances of the problem over the subspaces generated by $\alpha = (\beta_0, \ldots, \beta_{d-1})$ and some vector $\delta \in F^{n-d}$. One may always take $d$ equal to one, allowing the reductions to be applied regardless of the choice of $\beta$. Consequently, fast conversion algorithms are obtained by recursively solving the smaller problems admitted by the reduction, and directly solving the problems for the base case of $n = 1$.

Our basis conversion algorithms are described in Section 3. Over the subspace generated by an $n$-dimensional basis $\beta$, the algorithms take as input the first $\ell$ coefficients on the input basis of a polynomial in $F[x]_\ell \subseteq F[x]_{2^n}$. The algorithms then return the first $\ell$ coefficients of the polynomial on the desired output basis. For conversion between the Lagrange and LCH bases, the first $\ell$ Lagrange basis polynomials do not form a basis of $F[x]_\ell$ for $\ell < 2^n$. Consequently, we embed these cases in the larger case of $\ell = 2^n$, after which we disregard unnecessary parts of the resulting computations so as not to incur a large penalty in complexity. This approach results in what are known as "pruned" or "truncated" algorithms in the literature on fast Fourier transforms. While truncated additive FFTs have been previously investigated [28, 23, 6, 4, 3, 8, 9], our algorithms for converting between the Lagrange and LCH bases are obtained as analogues of Harvey's "cache-friendly" truncated multiplicative FFTs [15].
As a consequence of this approach, the algorithms in fact solve slightly more general problems than those just described, allowing us to in turn provide the slightly generalised additive FFTs required by the fast Hermite interpolation and evaluation algorithms of Coxon [12].

Table 1 provides bounds on the number of additions and multiplications performed by our algorithms for conversion in either direction between the LCH basis and each of the three remaining bases. These bounds omit the cost of a small precomputation requiring $O(n^2)$ field operations. The table also provides bounds on the number of field elements that are required to be stored in auxiliary space by the algorithms. The bound for conversion with the Newton basis is new. For conversion with the Lagrange basis, we only have the algorithm of Lin, Chung and Han [21] to compare with, and only for the case $\ell = 2^n$. Their algorithm performs fewer additions in this case, but only after a much larger precomputation, of unanalysed complexity, that stores $2^{n-1} \log_2 \ell = 2^{n-1} n$ field elements. Our algorithms perform the same number of additions as their algorithms in this case, while performing fewer multiplications.

Basis      Additions                                                      Multiplications                                        Auxiliary space
Newton     $\ell(\lceil \log_2 \ell \rceil - 1) + 1$                      $\lfloor \ell/2 \rfloor \lceil \log_2 \ell \rceil$      $O(n)$
Lagrange   $\lfloor \ell/2 \rfloor (3\lceil \log_2 \ell \rceil + 1)$      $\lfloor \ell/2 \rfloor (\lceil \log_2 \ell \rceil + 1)$   $2^n - \ell + O(n)$
Monomial   $\lfloor \ell/2 \rfloor \binom{\lceil \log_2 \ell \rceil + 1}{2}$   $\lfloor \ell/2 \rfloor \lceil \log_2 \ell \rceil + 1$   $O(n)$

Table 1. Algebraic and space complexities.

The cost of the reductions used in our algorithms decreases as the value of $d$ for which they are applied increases. For an arbitrary choice of the basis $\beta$, the condition (1.3) may only ever be satisfied by $d$ equal to one. This will also be the case if $\mathbb{F}_2$ is the only subfield of $F$ with degree less than $n$. The bounds in Table 1 describe the complexity of our algorithms for this case, and thus represent their worst-case complexities. When the field has degree equal to a power of two and $\beta$ is a Cantor basis, the condition (1.3) is satisfied by $d$ equal to any power of two less than $n$. Moreover, $\delta = (\beta_0, \ldots, \beta_{n-d-1})$ for all such values of $d$, so that the recursive cases are themselves represented by Cantor bases. Consequently, it is possible to always take $d$ to be the largest power of two less than $n$. With this strategy, the algorithms for converting between the LCH and Newton bases perform fewer additions than the worst-case bound of Table 1. Moreover, for conversions between the LCH and monomial bases, all multiplications in the algorithms become multiplications by one, allowing them to be eliminated altogether. In this case, the algorithms reduce to those of Lin et al. [20].

While the benefits of using a Cantor basis are clear, $F$ will only admit a Cantor basis of a given dimension if its degree is divisible by a sufficiently large power of two. In Section 4, we propose new basis constructions that provide benefits similar to those afforded by Cantor bases when the degree of the field contains a sufficiently large smooth factor, i.e., one that factors into a product of small primes.
Such a factor ensures the presence of a tower of subfields, which we use in Section 4.1 to construct bases that reduce the number of field operations performed by the algorithms of Section 3 by allowing their reductions to be applied more frequently with values of $d$ greater than one. For towers containing quadratic extensions, we additionally show in Section 4.2 how to leverage freedom in the construction in order to eliminate some multiplications in the algorithms for conversion between the monomial and LCH bases, echoing the reduction in multiplications obtained for Cantor bases. Finally, in Section 4.3, we show how to take advantage of quadratic extensions in a different manner by generalising the construction of Cantor bases.

2. Preliminaries
For $\beta = (\beta_0, \ldots, \beta_{n-1}) \in F^n$, define
\[
\omega_{\beta,i} = \sum_{k=0}^{n-1} [i]_k \beta_k \quad\text{for } i \in \{0, \ldots, 2^n - 1\}.
\]
If the entries of $\beta$ are linearly independent over $\mathbb{F}_2$, then let $L_{\beta,i}$, $N_{\beta,i}$ and $X_{\beta,i}$ respectively denote the $i$th Lagrange, Newton and LCH basis polynomials associated with the enumeration $\{\omega_{\beta,0}, \ldots, \omega_{\beta,2^n-1}\}$ of the subspace it generates. That is, define
\[
L_{\beta,i} = \prod_{\substack{j=0 \\ j \neq i}}^{2^n-1} \frac{x - \omega_{\beta,j}}{\omega_{\beta,i} - \omega_{\beta,j}}, \quad
N_{\beta,i} = \prod_{j=0}^{i-1} \frac{x - \omega_{\beta,j}}{\omega_{\beta,i} - \omega_{\beta,j}}
\quad\text{and}\quad
X_{\beta,i} = \prod_{k=0}^{n-1} \prod_{j=0}^{2^k[i]_k - 1} \frac{x - \omega_{\beta,j}}{\omega_{\beta,2^k[i]_k} - \omega_{\beta,j}}
\]
for $i \in \{0, \ldots, 2^n - 1\}$.

2.1. Factorisations of basis polynomials.
The following lemma provides factorisations of the basis polynomials associated with a vector $\beta \in F^n$. These factorisations in turn provide the reductions employed in our basis conversion algorithms.

Lemma 2.1.
Let $n \geq 2$ and $\beta = (\beta_0, \ldots, \beta_{n-1}) \in F^n$ have entries that are linearly independent over $\mathbb{F}_2$. For some $d \in \{1, \ldots, n-1\}$ such that $\beta_i/\beta_0 \in \mathbb{F}_{2^d}$ for $i \in \{0, \ldots, d-1\}$, set $\alpha = (\beta_0, \ldots, \beta_{d-1})$, $\gamma = (\beta_d, \ldots, \beta_{n-1})$ and
\[
\delta = \left( \left(\frac{\beta_d}{\beta_0}\right)^{2^d} - \frac{\beta_d}{\beta_0}, \ \ldots, \ \left(\frac{\beta_{n-1}}{\beta_0}\right)^{2^d} - \frac{\beta_{n-1}}{\beta_0} \right).
\]
Then $\delta$ has entries that are linearly independent over $\mathbb{F}_2$, and
\begin{align*}
L_{\beta,2^d i+j} &= L_{\delta,i}\!\left(\left(\frac{x}{\beta_0}\right)^{2^d} - \frac{x}{\beta_0}\right) L_{\alpha,j}(x - \omega_{\gamma,i}), \\
N_{\beta,2^d i+j} &= N_{\delta,i}\!\left(\left(\frac{x}{\beta_0}\right)^{2^d} - \frac{x}{\beta_0}\right) N_{\alpha,j}(x - \omega_{\gamma,i}), \\
X_{\beta,2^d i+j} &= X_{\delta,i}\!\left(\left(\frac{x}{\beta_0}\right)^{2^d} - \frac{x}{\beta_0}\right) X_{\alpha,j}(x)
\end{align*}
for $i \in \{0, \ldots, 2^{n-d} - 1\}$ and $j \in \{0, \ldots, 2^d - 1\}$.
Proof.
Let $n \geq 2$ and $\beta = (\beta_0, \ldots, \beta_{n-1}) \in F^n$ have entries that are linearly independent over $\mathbb{F}_2$. Define $\alpha$, $\gamma$ and $\delta$ as per the lemma for some $d \in \{1, \ldots, n-1\}$ such that $\beta_i/\beta_0 \in \mathbb{F}_{2^d}$ for $i \in \{0, \ldots, d-1\}$. Then
\[
\omega_{\beta,2^d i+j} = \sum_{k=0}^{d-1} [j]_k \beta_k + \sum_{k=0}^{n-d-1} [i]_k \beta_{d+k} = \omega_{\alpha,j} + \omega_{\gamma,i} \tag{2.1}
\]
for $i \in \{0, \ldots, 2^{n-d} - 1\}$ and $j \in \{0, \ldots, 2^d - 1\}$. The choice of $d$ implies that
\[
\left(\frac{\beta_k}{\beta_0}\right)^{2^d} - \frac{\beta_k}{\beta_0} = \prod_{\omega \in \mathbb{F}_{2^d}} \left( \frac{\beta_k}{\beta_0} - \omega \right) = 0
\]
for $k \in \{0, \ldots, d-1\}$. Thus,
\[
\left(\frac{\omega_{\beta,2^d i+j}}{\beta_0}\right)^{2^d} - \frac{\omega_{\beta,2^d i+j}}{\beta_0} = \sum_{k=0}^{n-d-1} [i]_k \left( \left(\frac{\beta_{d+k}}{\beta_0}\right)^{2^d} - \frac{\beta_{d+k}}{\beta_0} \right) = \omega_{\delta,i} \tag{2.2}
\]
for $i \in \{0, \ldots, 2^{n-d} - 1\}$ and $j \in \{0, \ldots, 2^d - 1\}$. It follows that
\[
\prod_{j=0}^{2^d-1} \left( x - \omega_{\beta,2^d i+j} \right) = \beta_0^{2^d} \left( \left(\frac{x}{\beta_0}\right)^{2^d} - \left(\frac{x}{\beta_0}\right) - \omega_{\delta,i} \right) \tag{2.3}
\]
for $i \in \{0, \ldots, 2^{n-d} - 1\}$, since the polynomials on either side of the equation are monic and split over $F$ with identical roots. As the entries of $\beta$ are linearly independent over $\mathbb{F}_2$, comparing roots shows that the polynomials on the left-hand side of (2.3) are distinct for different values of $i \in \{0, \ldots, 2^{n-d} - 1\}$. Consequently, comparing the polynomials on the right-hand side of the equation shows that $\omega_{\delta,i} \neq \omega_{\delta,j}$ for distinct $i, j \in \{0, \ldots, 2^{n-d} - 1\}$.
Thus, the entries of $\delta$ are linearly independent over $\mathbb{F}_2$.

Collecting terms and substituting in (2.1), (2.2) and (2.3) shows that
\begin{align*}
L_{\beta,2^d i+j}
&= \prod_{\substack{s=0 \\ s \neq i}}^{2^{n-d}-1} \prod_{t=0}^{2^d-1} \frac{x - \omega_{\beta,2^d s+t}}{\omega_{\beta,2^d i+j} - \omega_{\beta,2^d s+t}} \prod_{\substack{t=0 \\ t \neq j}}^{2^d-1} \frac{x - \omega_{\beta,2^d i+t}}{\omega_{\beta,2^d i+j} - \omega_{\beta,2^d i+t}} \\
&= \prod_{\substack{s=0 \\ s \neq i}}^{2^{n-d}-1} \frac{(x/\beta_0)^{2^d} - (x/\beta_0) - \omega_{\delta,s}}{\omega_{\delta,i} - \omega_{\delta,s}} \prod_{\substack{t=0 \\ t \neq j}}^{2^d-1} \frac{x - \omega_{\gamma,i} - \omega_{\alpha,t}}{\omega_{\gamma,i} + \omega_{\alpha,j} - \omega_{\gamma,i} - \omega_{\alpha,t}} \\
&= \prod_{\substack{s=0 \\ s \neq i}}^{2^{n-d}-1} \frac{(x/\beta_0)^{2^d} - (x/\beta_0) - \omega_{\delta,s}}{\omega_{\delta,i} - \omega_{\delta,s}} \prod_{\substack{t=0 \\ t \neq j}}^{2^d-1} \frac{x - \omega_{\gamma,i} - \omega_{\alpha,t}}{\omega_{\alpha,j} - \omega_{\alpha,t}} \\
&= L_{\delta,i}\!\left(\left(\frac{x}{\beta_0}\right)^{2^d} - \frac{x}{\beta_0}\right) L_{\alpha,j}(x - \omega_{\gamma,i})
\end{align*}
for $i \in \{0, \ldots, 2^{n-d} - 1\}$ and $j \in \{0, \ldots, 2^d - 1\}$.
Similarly,
\begin{align*}
N_{\beta,2^d i+j}
&= \prod_{s=0}^{i-1} \prod_{t=0}^{2^d-1} \frac{x - \omega_{\beta,2^d s+t}}{\omega_{\beta,2^d i+j} - \omega_{\beta,2^d s+t}} \prod_{t=0}^{j-1} \frac{x - \omega_{\beta,2^d i+t}}{\omega_{\beta,2^d i+j} - \omega_{\beta,2^d i+t}} \\
&= \left( \prod_{s=0}^{i-1} \frac{(x/\beta_0)^{2^d} - (x/\beta_0) - \omega_{\delta,s}}{\omega_{\delta,i} - \omega_{\delta,s}} \right) \left( \prod_{t=0}^{j-1} \frac{x - \omega_{\gamma,i} - \omega_{\alpha,t}}{\omega_{\alpha,j} - \omega_{\alpha,t}} \right) \\
&= N_{\delta,i}\!\left(\left(\frac{x}{\beta_0}\right)^{2^d} - \frac{x}{\beta_0}\right) N_{\alpha,j}(x - \omega_{\gamma,i})
\end{align*}
for $i \in \{0, \ldots, 2^{n-d} - 1\}$ and $j \in \{0, \ldots, 2^d - 1\}$. It follows immediately from the definition of the Newton and LCH bases that
\[
X_{\beta,i} = \prod_{k=0}^{n-1} N_{\beta,2^k}^{[i]_k} \quad\text{for } i \in \{0, \ldots, 2^n - 1\}.
\]
Hence,
\begin{align*}
X_{\beta,2^d i+j}
&= \left( \prod_{k=0}^{n-d-1} N_{\beta,2^{k+d}}^{[i]_k} \right) \left( \prod_{k=0}^{d-1} N_{\beta,2^k}^{[j]_k} \right) \\
&= \left( \prod_{k=0}^{n-d-1} N_{\delta,2^k}^{[i]_k}\!\left(\left(\frac{x}{\beta_0}\right)^{2^d} - \frac{x}{\beta_0}\right) \right) \left( \prod_{k=0}^{d-1} N_{\alpha,2^k}^{[j]_k}(x - \omega_{\gamma,0}) \right) \\
&= X_{\delta,i}\!\left(\left(\frac{x}{\beta_0}\right)^{2^d} - \frac{x}{\beta_0}\right) X_{\alpha,j}(x)
\end{align*}
for $i \in \{0, \ldots, 2^{n-d} - 1\}$ and $j \in \{0, \ldots, 2^d - 1\}$, since $\omega_{\gamma,0} = 0$. $\square$

To illustrate how we can utilise Lemma 2.1, let us consider the problem of converting polynomials from the Lagrange basis to the LCH basis. The factorisation of the Lagrange basis polynomials provided by the lemma includes a shift of variables for one factor. Consequently, a recursive approach is facilitated by considering the more general problem of converting from a basis of shifted Lagrange polynomials to the LCH basis. An instance of this new problem is defined by a vector $\beta = (\beta_0, \ldots, \beta_{n-1}) \in F^n$ with entries that are linearly independent over $\mathbb{F}_2$, a shift parameter $\lambda \in F$, and coefficients $f_0, \ldots, f_{2^n-1} \in F$. The goal is then to compute $h_0, \ldots, h_{2^n-1} \in F$ such that
\[
\sum_{i=0}^{2^n-1} h_i X_{\beta,i} = \sum_{i=0}^{2^n-1} f_i L_{\beta,i}(x - \lambda). \tag{2.4}
\]
If $n = 1$, then
\[
L_{\beta,0} = \frac{x}{\beta_0} + 1, \quad L_{\beta,1} = \frac{x}{\beta_0}, \quad X_{\beta,0} = 1 \quad\text{and}\quad X_{\beta,1} = \frac{x}{\beta_0}.
\]
Thus, one can simply compute $h_0 = f_0 - (\lambda/\beta_0)(f_0 + f_1)$ and $h_1 = f_0 + f_1$. If $n \geq 2$, then the following consequence of Lemma 2.1 decomposes the length-$2^n$ problem into $2^{n-d}$ problems of length $2^d$, and $2^d$ problems of length $2^{n-d}$, for $d \in \{1, \ldots, n-1\}$ such that $\beta_i/\beta_0 \in \mathbb{F}_{2^d}$ for $i \in \{0, \ldots, d-1\}$. After choosing such a value of $d$, for which one always has the possibility of taking $d = 1$, the smaller instances of the problem admitted by the decomposition can be solved recursively.

Lemma 2.2.
Suppose that $\beta = (\beta_0, \ldots, \beta_{n-1}) \in F^n$ has $n \geq 2$ linearly independent entries over $\mathbb{F}_2$, and let $\alpha$, $\gamma$ and $\delta$ be defined as per Lemma 2.1 for some $d \in \{1, \ldots, n-1\}$ such that $\beta_i/\beta_0 \in \mathbb{F}_{2^d}$ for $i \in \{0, \ldots, d-1\}$. Suppose that $f_0, \ldots, f_{2^n-1}, \lambda, g_0, \ldots, g_{2^n-1}, h_0, \ldots, h_{2^n-1} \in F$ satisfy
\[
\sum_{j=0}^{2^d-1} g_{2^d i+j} X_{\alpha,j} = \sum_{j=0}^{2^d-1} f_{2^d i+j} L_{\alpha,j}(x - \lambda - \omega_{\gamma,i}) \quad\text{for } i \in \{0, \ldots, 2^{n-d} - 1\}, \tag{2.5}
\]
and
\[
\sum_{i=0}^{2^{n-d}-1} h_{2^d i+j} X_{\delta,i} = \sum_{i=0}^{2^{n-d}-1} g_{2^d i+j} L_{\delta,i}(x - \eta) \quad\text{for } j \in \{0, \ldots, 2^d - 1\}, \tag{2.6}
\]
where $\eta = (\lambda/\beta_0)^{2^d} - \lambda/\beta_0$. Then (2.4) holds.

Proof. Suppose that $\beta = (\beta_0, \ldots, \beta_{n-1}) \in F^n$ has $n \geq 2$ linearly independent entries over $\mathbb{F}_2$, and let $\alpha$, $\gamma$ and $\delta$ be defined as per Lemma 2.1 for some $d \in \{1, \ldots, n-1\}$ such that $\beta_i/\beta_0 \in \mathbb{F}_{2^d}$ for $i \in \{0, \ldots, d-1\}$. Suppose that $f_0, \ldots, f_{2^n-1}, \lambda, g_0, \ldots, g_{2^n-1}, h_0, \ldots, h_{2^n-1} \in F$ satisfy (2.5) and (2.6). Then Lemma 2.1 implies that
\[
\sum_{i=0}^{2^n-1} f_i L_{\beta,i}(x - \lambda)
= \sum_{i=0}^{2^{n-d}-1} \left( \sum_{j=0}^{2^d-1} f_{2^d i+j} L_{\alpha,j}(x - \lambda - \omega_{\gamma,i}) \right) L_{\delta,i}\!\left(\left(\frac{x}{\beta_0}\right)^{2^d} - \frac{x}{\beta_0} - \eta\right),
\]
where $\eta = (\lambda/\beta_0)^{2^d} - \lambda/\beta_0$. By substituting in (2.5) and (2.6), it follows that
\begin{align*}
\sum_{i=0}^{2^n-1} f_i L_{\beta,i}(x - \lambda)
&= \sum_{i=0}^{2^{n-d}-1} \sum_{j=0}^{2^d-1} g_{2^d i+j} X_{\alpha,j}(x) \, L_{\delta,i}\!\left(\left(\frac{x}{\beta_0}\right)^{2^d} - \frac{x}{\beta_0} - \eta\right) \\
&= \sum_{j=0}^{2^d-1} \left( \sum_{i=0}^{2^{n-d}-1} g_{2^d i+j} L_{\delta,i}\!\left(\left(\frac{x}{\beta_0}\right)^{2^d} - \frac{x}{\beta_0} - \eta\right) \right) X_{\alpha,j}(x) \\
&= \sum_{j=0}^{2^d-1} \left( \sum_{i=0}^{2^{n-d}-1} h_{2^d i+j} X_{\delta,i}\!\left(\left(\frac{x}{\beta_0}\right)^{2^d} - \frac{x}{\beta_0}\right) \right) X_{\alpha,j}(x).
\end{align*}
Hence, Lemma 2.1 implies that (2.4) holds. $\square$
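The $n = 1$ base case described before Lemma 2.2 admits a direct numerical check. The following sketch (illustrative, not from the paper; the field GF(2^4) with modulus $x^4 + x + 1$ and the test values are arbitrary choices) verifies identity (2.4) for length-2 instances by evaluating both sides at every element of the field.

```python
# Arithmetic in GF(2^4), elements encoded as ints, modulus x^4 + x + 1.
M, MOD = 4, 0b10011

def mul(a, b):
    r = 0
    while b:                      # carry-less multiplication
        if b & 1:
            r ^= a
        a, b = a << 1, b >> 1
    for i in range(2 * M - 2, M - 1, -1):   # reduce modulo MOD
        if (r >> i) & 1:
            r ^= MOD << (i - M)
    return r

def inv(a):                       # a^(2^M - 2) = a^(-1)
    r, e = 1, 2**M - 2
    while e:
        if e & 1:
            r = mul(r, a)
        a, e = mul(a, a), e >> 1
    return r

def convert_base_case(f0, f1, lam, b0):
    """n = 1 case of (2.4): h0 = f0 - (lam/b0)(f0 + f1) and h1 = f0 + f1
    (subtraction is addition, i.e. XOR, in characteristic two)."""
    s = f0 ^ f1
    return f0 ^ mul(mul(lam, inv(b0)), s), s

# Check f0*L_0(x - lam) + f1*L_1(x - lam) = h0*X_0 + h1*X_1 pointwise,
# where L_0 = x/b0 + 1, L_1 = x/b0, X_0 = 1 and X_1 = x/b0.
for f0, f1, lam, b0 in [(3, 7, 5, 9), (1, 0, 12, 2), (8, 8, 0, 6)]:
    h0, h1 = convert_base_case(f0, f1, lam, b0)
    ib0 = inv(b0)
    for x in range(2**M):
        t = mul(x ^ lam, ib0)               # (x - lam)/b0
        lhs = mul(f0, t ^ 1) ^ mul(f1, t)
        rhs = h0 ^ mul(h1, mul(x, ib0))
        assert lhs == rhs
```

Since both sides of (2.4) have degree less than 2, agreement at all sixteen field points establishes equality of the polynomials.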
2.2. Reduction trees.
We use full binary trees to encode the values of $d$ for which the reductions provided by Lemma 2.1 are applied when converting between two of the bases.

Definition 2.3.
A full binary tree is a tree that contains a unique vertex of degree zero or two, while all other vertices have degree one or three. Given a full binary tree $(V, E)$ with unique vertex $r \in V$ of degree zero or two, we use the following nomenclature:
• the vertex $r$ is called the root of the tree,
• a vertex of degree at most one is called a leaf,
• a vertex of degree greater than one is called an internal vertex,
• if $v \in V \setminus \{r\}$ and $v = v_0, v_1, \ldots, v_n = r$ is a path, then $v_0$ is called a child of $v_1$, and a descendant of $v_i$ for $i \in \{1, \ldots, n\}$,
• the set of leaves that are descended from or equal to $v \in V$ is denoted $L_v$,
• the subtree of $(V, E)$ rooted on $v \in V$ is the full binary tree $(V', E')$ such that $V'$ consists of $v$ and all its descendants, and $E'$ consists of all edges $\{u, v\} \in E$ such that $u, v \in V'$.

Example 2.4.
The graph shown in Figure 1 is a full binary tree with root vertex $v_0$, internal vertices $v_0$, $v_1$, $v_2$ and $v_3$, and leaves $v_4$, $v_5$, $v_6$, $v_7$ and $v_8$. The vertex $v_1$ has children $v_3$ and $v_4$, while its descendants are $v_3$, $v_4$, $v_7$ and $v_8$. Consequently, the subtree rooted on $v_1$ consists of those vertices and edges contained in the dotted box. Finally, $L_{v_0} = \{v_4, v_5, v_6, v_7, v_8\}$, $L_{v_1} = \{v_4, v_7, v_8\}$, $L_{v_2} = \{v_5, v_6\}$, $L_{v_3} = \{v_7, v_8\}$ and $L_{v_i} = \{v_i\}$ for $i \in \{4, 5, 6, 7, 8\}$.

[Figure 1 shows a full binary tree on the vertices $v_0, \ldots, v_8$: the root $v_0$ has children $v_1$ and $v_2$, $v_1$ has children $v_3$ and $v_4$, $v_2$ has children $v_5$ and $v_6$, and $v_3$ has children $v_7$ and $v_8$; the subtree rooted on $v_1$ is enclosed in a dotted box.]

Figure 1. A full binary tree.

Each internal vertex of a full binary tree has exactly two children. As it is necessary for us to distinguish between the children of internal vertices in our algorithms, we assume that each full binary tree $(V, E)$ comes equipped with a partition $\{E_\alpha, E_\delta\}$ of $E$ such that if $v \in V$ has children $v_0$ and $v_1$, then $\{v, v_i\} \in E_\alpha$ if and only if $\{v, v_{1-i}\} \in E_\delta$. Then for each internal vertex $v \in V$, we denote by $v_\alpha$ the child of $v$ such that $\{v, v_\alpha\} \in E_\alpha$, and by $v_\delta$ the child of $v$ such that $\{v, v_\delta\} \in E_\delta$. The partition additionally induces a vertex labelling $d : V \to \mathbb{N}$ given by
\[
d(v) = \begin{cases} 0 & \text{if } v \text{ is a leaf vertex}, \\ |L_{v_\alpha}| & \text{if } v \text{ is an internal vertex}. \end{cases}
\]

Example 2.5.
We obtain one such partition $\{E_\alpha, E_\delta\}$ for the tree shown in Figure 1 by letting $E_\alpha$ contain the leftmost (as shown in the figure), and $E_\delta$ the rightmost, of the two edges that connect each internal vertex with its children. Then $d(v_0) = |L_{v_1}| = 3$, $d(v_1) = |L_{v_3}| = 2$, $d(v_2) = |L_{v_5}| = 1$, $d(v_3) = |L_{v_7}| = 1$ and $d(v_i) = 0$ for $i \in \{4, 5, 6, 7, 8\}$.

We use the vertex labelling to encode the values of $d$ for which the reductions provided by Lemma 2.1 are applied. A reduction tree is a full binary tree that fulfils this role for some subspace basis $\beta \in F^n$. To allow us to formally define this notion, we now introduce maps that send $\beta$ to each of the vectors $\alpha$ and $\delta$ defined in Lemma 2.1. For $\beta = (\beta_0, \ldots, \beta_{n-1}) \in F^n$ with $n \geq 2$ entries that are linearly independent over $\mathbb{F}_2$, and $d \in \{1, \ldots, n-1\}$, define $\alpha(\beta, d) = (\beta_0, \ldots, \beta_{d-1})$ and
\[
\delta(\beta, d) = \left( \left(\frac{\beta_d}{\beta_0}\right)^{2^d} - \frac{\beta_d}{\beta_0}, \ \ldots, \ \left(\frac{\beta_{n-1}}{\beta_0}\right)^{2^d} - \frac{\beta_{n-1}}{\beta_0} \right).
\]
When $d$ has the additional property that $\beta_i/\beta_0 \in \mathbb{F}_{2^d}$ for $i \in \{0, \ldots, d-1\}$, Lemma 2.1 implies that the vector $\delta(\beta, d)$ has linearly independent entries over $\mathbb{F}_2$.

Definition 2.6.
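The labelling of Example 2.5 is easily reproduced in code. A minimal sketch (illustrative, not from the paper), assuming the vertex numbering of Figure 1 and taking each internal vertex's $\alpha$-child to be its first-listed (left) child:

```python
# children[v] = (v_alpha, v_delta) for internal vertices; leaves are absent.
children = {0: (1, 2), 1: (3, 4), 2: (5, 6), 3: (7, 8)}

def leaves(v):
    """L_v: the set of leaves descended from or equal to v."""
    if v not in children:
        return {v}
    a, d = children[v]
    return leaves(a) | leaves(d)

def label(v):
    """d(v) = 0 for a leaf, |L_{v_alpha}| for an internal vertex."""
    return len(leaves(children[v][0])) if v in children else 0

assert leaves(0) == {4, 5, 6, 7, 8}
assert leaves(1) == {4, 7, 8}
assert [label(v) for v in range(9)] == [3, 2, 1, 1, 0, 0, 0, 0, 0]
```

The final assertion reproduces exactly the values $d(v_0) = 3$, $d(v_1) = 2$, $d(v_2) = d(v_3) = 1$ and $d(v_i) = 0$ for the leaves.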
Let $\beta \in F^n$ have entries that are linearly independent over $\mathbb{F}_2$, and let $(V, E)$ be a full binary tree with root vertex $r \in V$. Then $(V, E)$ is a reduction tree for $\beta$ if it has $n$ leaves, and the following conditions hold if $n > 1$:
(1) $\beta_i/\beta_0 \in \mathbb{F}_{2^{d(r)}}$ for $i \in \{0, \ldots, d(r) - 1\}$,
(2) the subtree of $(V, E)$ rooted on $r_\alpha$ is a reduction tree for $\alpha(\beta, d(r))$, and
(3) the subtree of $(V, E)$ rooted on $r_\delta$ is a reduction tree for $\delta(\beta, d(r))$.

If $v$ is an internal vertex of a full binary tree, then
\[
|L_{v_\alpha}| = d(v) \quad\text{and}\quad |L_{v_\delta}| = |L_v| - |L_{v_\alpha}| = |L_v| - d(v),
\]
since $\{L_{v_\alpha}, L_{v_\delta}\}$ is a partition of $L_v$. Therefore, given a basis vector $\beta$ and one of its reduction trees $(V, E)$, there exist vectors $\beta_v = (\beta_{v,0}, \ldots, \beta_{v,|L_v|-1}) \in F^{|L_v|}$ for $v \in V$, each with linearly independent entries over $\mathbb{F}_2$, such that
\[
1, \frac{\beta_{v,1}}{\beta_{v,0}}, \ldots, \frac{\beta_{v,d(v)-1}}{\beta_{v,0}} \in \mathbb{F}_{2^{d(v)}}, \quad \alpha(\beta_v, d(v)) = \beta_{v_\alpha} \quad\text{and}\quad \delta(\beta_v, d(v)) = \beta_{v_\delta}
\]
for all internal $v \in V$, and $\beta_v = \beta$ if $v$ is the root of the tree. These vectors define instances of each of the basis conversion problems over the subspaces they generate. If $v$ is an internal vertex, then Lemma 2.1 allows instances of the conversion problems over the subspace generated by $\beta_v$ to be reduced to instances over the subspaces generated by $\beta_{v_\alpha}$ and $\beta_{v_\delta}$. If $v$ is a leaf, then $\beta_v$ is 1-dimensional and the corresponding conversion problems may be solved directly. Consequently, the existence of a reduction tree allows the basis conversion problems to be recursively solved with recursion depth equal to that of the tree, i.e., equal to the length of the longest path between its root vertex and one of its leaves.

As the first condition of Definition 2.6 is trivially satisfied if $d(r) = 1$, the existence of a reduction tree for an arbitrarily chosen subspace basis is guaranteed.

Proposition 2.7.
Let $\beta \in F^n$ have entries that are linearly independent over $\mathbb{F}_2$. Then every full binary tree with $n$ leaves and $\operatorname{Im}(d) \subseteq \{0, 1\}$ is a reduction tree for $\beta$.

It is straightforward to prove Proposition 2.7 by induction on $n$, or to obtain the proposition as a consequence of Theorem 4.3 in Section 4.1. Consequently, we omit its proof. The choice of reduction trees provided by the proposition captures the strategy used in recent algorithms [14, 21, 20] for an arbitrary choice of subspace basis. Accordingly, we use such trees as our baseline for comparison. The reduction strategy used by recent algorithms specific to Cantor bases [14, 20] is captured by reduction trees such that $d(v) = 2^{\lceil \log_2 |L_v| \rceil - 1}$ for all internal vertices $v$. We characterise the reduction trees of Cantor bases in the following proposition.

Proposition 2.8.
Suppose that $\beta \in F^n$ is a Cantor basis. Then a full binary tree is a reduction tree for $\beta$ if and only if it has $n$ leaves and $\operatorname{Im}(d) \subseteq \{0, 1, 2, 4, 8, \ldots\}$.
We use the following properties of Cantor bases to prove Proposition 2.8.
Lemma 2.9 (Properties of Cantor bases). Suppose that $\beta = (\beta_0, \ldots, \beta_{n-1}) \in F^n$ is a Cantor basis. Then
(1) $\beta_0, \ldots, \beta_{n-1}$ are linearly independent over $\mathbb{F}_2$,
(2) $\beta_0, \ldots, \beta_{i-1} \in \mathbb{F}_{2^{2^k}}$ if $i \leq 2^k$ for some $k \in \mathbb{N}$, and
(3) $\beta_i^{2^{2^k}} - \beta_i = \beta_{i-2^k}$ if $i \geq 2^k$ for some $k \in \mathbb{N}$.

Lemma 2.9 is proved by Gao and Mateer [14, Appendix A], and is also obtained as a special case of Lemma 4.13 in Section 4.3.
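Properties (1)–(3) can be observed on a concrete Cantor basis. The sketch below is illustrative rather than taken from the paper: the field GF(2^4) with modulus $x^4 + x + 1$ and the backtracking search are choices of the example. It finds a dimension-4 basis satisfying (1.2) and checks the three properties for $k = 1$.

```python
# Arithmetic in GF(2^4), elements encoded as ints, modulus x^4 + x + 1.
M, MOD = 4, 0b10011

def mul(a, b):
    r = 0
    while b:                      # carry-less multiplication
        if b & 1:
            r ^= a
        a, b = a << 1, b >> 1
    for i in range(2 * M - 2, M - 1, -1):   # reduce modulo MOD
        if (r >> i) & 1:
            r ^= MOD << (i - M)
    return r

def power(a, e):                  # square-and-multiply exponentiation
    r = 1
    while e:
        if e & 1:
            r = mul(r, a)
        a, e = mul(a, a), e >> 1
    return r

def cantor_basis(n):
    """Backtracking search for beta_0, ..., beta_{n-1} satisfying (1.2):
    beta_0 = 1 and beta_i = beta_{i+1}^2 - beta_{i+1} (minus is XOR here)."""
    def extend(basis):
        if len(basis) == n:
            return basis
        for b in range(2**M):     # try each field element as the next entry
            if mul(b, b) ^ b == basis[-1]:
                out = extend(basis + [b])
                if out is not None:
                    return out
        return None
    return extend([1])

beta = cantor_basis(4)            # GF(2^4) admits a Cantor basis of dimension 4
assert beta is not None and beta[0] == 1
assert all(mul(beta[i + 1], beta[i + 1]) ^ beta[i + 1] == beta[i] for i in range(3))

# Property (1): the 16 subset sums are distinct, so the entries are
# linearly independent over F_2 (and here span the whole field).
sums = {0}
for b in beta:
    sums |= {s ^ b for s in sums}
assert len(sums) == 16

# Property (2) with k = 1: beta_0, beta_1 lie in F_4 = {x : x^4 = x}.
assert power(beta[0], 4) == beta[0] and power(beta[1], 4) == beta[1]

# Property (3) with k = 1: beta_i^(2^2) - beta_i = beta_{i-2} for i >= 2.
assert power(beta[2], 4) ^ beta[2] == beta[0]
assert power(beta[3], 4) ^ beta[3] == beta[1]
```

The search always terminates here because the Artin–Schreier equation $x^2 - x = c$ has at most two roots, and a chain of length 4 is guaranteed to exist in GF(2^4) since its degree is the power of two $2^2$.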
Proof of Proposition 2.8.
A full binary tree with one leaf has $\operatorname{Im}(d) = \{0\}$. Thus, the proposition holds for $n = 1$, since a full binary tree is a reduction tree for a 1-dimensional Cantor basis if and only if it has one leaf. Therefore, let $n \geq 2$ and suppose that the proposition holds for all dimensions less than $n$. Suppose that $\beta = (\beta_0, \ldots, \beta_{n-1}) \in F^n$ is a Cantor basis, and let $(V, E)$ be a full binary tree with root vertex $r \in V$. Then $(V, E)$ is a reduction tree for $\beta$ if and only if it has $n$ leaves, $\beta_1, \ldots, \beta_{d(r)-1} \in \mathbb{F}_{2^{d(r)}}$ (recall that $\beta_0 = 1$ for a Cantor basis), the subtree rooted on $r_\alpha$ is a reduction tree for $\alpha(\beta, d(r))$, and the subtree rooted on $r_\delta$ is a reduction tree for $\delta(\beta, d(r))$. If the tree has $n$ leaves, then $1 \leq d(r) < n$ and properties (1) and (2) of Lemma 2.9 imply that $\beta_1, \ldots, \beta_{d(r)-1} \in \mathbb{F}_{2^{d(r)}}$ if and only if $d(r)$ is a power of two. In this case, property (3) of the lemma implies that $\alpha(\beta, d(r)) = (\beta_0, \ldots, \beta_{d(r)-1})$ and $\delta(\beta, d(r)) = (\beta_0, \ldots, \beta_{n-d(r)-1})$. Finally, if the tree has $n$ leaves, then the subtree rooted on $r_\alpha$ has $d(r)$ leaves, and the subtree rooted on $r_\delta$ has $|L_r| - d(r) = n - d(r)$ leaves. Therefore, the induction hypothesis implies that $(V, E)$ is a reduction tree for $\beta$ if and only if it has $n$ leaves, $d(r)$ is a power of two, and the subtrees rooted on $r_\alpha$ and $r_\delta$ both satisfy $\operatorname{Im}(d) \subseteq \{0, 1, 2, 4, 8, \ldots\}$. That is, if and only if the tree has $n$ leaves and $\operatorname{Im}(d) \subseteq \{0, 1, 2, 4, 8, \ldots\}$. Hence, the proposition follows by induction. $\square$

3. Conversion algorithms
In this section, we describe how to convert between the polynomial bases once $\beta$ and a suitable reduction tree have been chosen. Accordingly, we now fix a vector $\beta = (\beta_0, \ldots, \beta_{n-1}) \in F^n$ that has linearly independent entries over $\mathbb{F}_2$, and a reduction tree $(V, E)$ for $\beta$. The algorithms of this section are then presented for this arbitrary selection. Recall that we only provide algorithms for converting between the LCH basis and each of the Newton, Lagrange and monomial bases. The algorithms that we propose for each of these problems require the precomputation of a small number of constants associated with the basis and the reduction tree. Consequently, we begin the section by discussing these precomputations and their cost.

3.1. Precomputations.
For the remainder of the section, we use the shorthand notations $d_v = d(v)$ and $n_v = |L_v|$ for $v \in V$. Define $\beta_v = (\beta_{v,0}, \ldots, \beta_{v,n_v-1}) \in F^{n_v}$ for $v \in V$ recursively as follows: if $v$ is the root of the tree, then $\beta_v = \beta$; if $v$ is an internal vertex, then $\beta_{v_\alpha} = \alpha(\beta_v, d_v)$ and $\beta_{v_\delta} = \delta(\beta_v, d_v)$. Then Definition 2.6 and Lemma 2.1 imply that the vectors $\beta_v$ for $v \in V$ each have entries that are linearly independent over $\mathbb{F}_2$. Finally, let $\gamma_v = (\beta_{v,d_v}, \ldots, \beta_{v,n_v-1})$ for all internal $v \in V$.

Given $\beta_v$ for some internal $v \in V$, repeated squaring allows $\delta(\beta_v, d_v)$ to be computed with $(n_v - d_v)(d_v + 1)$ multiplications and $n_v - d_v$ additions. If $r \in V$ is the root vertex of the tree, then $V \setminus L_r$ is the set of internal vertices in $V$, and
\[
\sum_{v \in V \setminus L_r} d_v(n_v - d_v) = \sum_{v \in V \setminus L_r} |L_{v_\alpha}||L_{v_\delta}| = \binom{|L_r|}{2} = \binom{n}{2}. \tag{3.1}
\]
Thus, the vectors $\beta_v$ for $v \in V$ can be computed with $O(n^2)$ field operations. The algorithms we propose for converting between the monomial and LCH bases only require the precomputation and storage of $\beta_{v_\delta,0}$, or $1/\beta_{v_\delta,0}$, for all internal $v \in V$. As a full binary tree with $n$ leaves has $n - 1$ internal vertices, these algorithms require the storage of only $O(n)$ field elements, which can be computed with $O(n^2)$ field operations.

For conversions involving the Lagrange or Newton bases, we generalise the problems to include a shift parameter, as we did at the end of Section 2.1 for Lagrange to LCH conversion. As a result, we require additional machinery to allow computations related to the shift parameter to be handled in a time- and space-friendly manner. Define maps $\varphi_v : L_v \times F \to F$ for $v \in V$ recursively as follows: for $u \in L_v$ and $\lambda \in F$,
\[
\varphi_v(u, \lambda) = \begin{cases}
\lambda/\beta_{v,0} & \text{if } u = v, \\
\varphi_{v_\alpha}(u, \lambda) & \text{if } u \neq v \text{ and } u \in L_{v_\alpha}, \\
\varphi_{v_\delta}\!\left(u, (\lambda/\beta_{v,0})^{2^{d_v}} - \lambda/\beta_{v,0}\right) & \text{if } u \neq v \text{ and } u \in L_{v_\delta}.
\end{cases}
\]
For internal $v \in V$, define
\[
\sigma_{v,i} = \sum_{j=0}^{i} \beta_{v,d_v+j} \quad\text{for } i \in \{0, \ldots, n_v - d_v - 1\}.
\]
Then the algorithms we propose for conversions involving the Newton or Lagrange bases require the precomputation and storage of ϕ_v(u, σ_{v,0}), …, ϕ_v(u, σ_{v,n_v−d_v−1}) for all internal v ∈ V and u ∈ L_{v_α}. The identity (3.1) implies that this amounts to n(n − 1)/2 field elements, which can be computed with O(n²) field operations.

Remark 3.1. If β is a Cantor basis, then Proposition 2.8 and property (3) of Lemma 2.9 imply that β_v = (β_0, …, β_{n_v−1}) for v ∈ V. It follows that β_{v_δ,0} = β_0 = 1 for all internal v ∈ V. Thus, no precomputations are required for converting between the LCH and monomial bases. For v ∈ V and u ∈ L_v, the map ϕ_v(u, ·) : F → F is F_2-linear, while Proposition 2.8 and properties (2) and (3) of Lemma 2.9 imply that

ϕ_v(u, β_i) =
  β_i                     if u = v,
  ϕ_{v_α}(u, β_i)         if u ≠ v and u ∈ L_{v_α},
  ϕ_{v_δ}(u, β_{i−d_v})   if u ≠ v, u ∈ L_{v_δ} and i ≥ d_v,
  0                       if u ≠ v, u ∈ L_{v_δ} and i < d_v,

for i ∈ {0, …, n − 1}. Consequently, the precomputations for conversions involving the Newton or Lagrange bases require fewer operations.
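The last equality in (3.1) has a direct combinatorial reading: in a full binary tree, every unordered pair of leaves has a unique lowest common ancestor, so summing |L_{v_α}||L_{v_δ}| over the internal vertices counts each pair of leaves exactly once. A minimal Python sketch of this check (the tuple-based tree representation and function names are ours, not the paper's):

```python
import random

def random_full_binary_tree(n):
    # A full binary tree with n leaves, as nested pairs (left, right);
    # a single leaf is represented by None.
    if n == 1:
        return None
    k = random.randint(1, n - 1)  # number of leaves in the left subtree
    return (random_full_binary_tree(k), random_full_binary_tree(n - k))

def leaves(t):
    return 1 if t is None else leaves(t[0]) + leaves(t[1])

def pair_sum(t):
    # Sum of |L_{v_alpha}| * |L_{v_delta}| over the internal vertices of t.
    if t is None:
        return 0
    left, right = t
    return leaves(left) * leaves(right) + pair_sum(left) + pair_sum(right)

# The sum is C(n, 2) for every full binary tree with n leaves.
for n in range(1, 25):
    assert pair_sum(random_full_binary_tree(n)) == n * (n - 1) // 2
```

The assertion passes for every shape of reduction tree, which is why the precomputation cost in (3.1) does not depend on how the tree is chosen.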
3.2. Conversion between the Newton and Lin–Chung–Han bases.
The factorisations of the Newton basis polynomials provided by Lemma 2.1 include a shift of variables for one factor. Consequently, as we did for Lagrange basis to LCH basis conversion, we consider the more general problem of converting between a basis of shifted Newton polynomials and the LCH basis. Our conversion algorithms are then based on the following analogue of Lemma 2.2.
Lemma 3.2.
Let v ∈ V be an internal vertex and ℓ ∈ {1, …, 2^{n_v}}. Suppose that f_0, …, f_{ℓ−1}, λ, g_0, …, g_{ℓ−1}, h_0, …, h_{ℓ−1} ∈ F satisfy

(3.2)  Σ_{j=0}^{min(ℓ−2^{d_v} i, 2^{d_v})−1} g_{2^{d_v} i+j} X_{β_{v_α},j} = Σ_{j=0}^{min(ℓ−2^{d_v} i, 2^{d_v})−1} f_{2^{d_v} i+j} N_{β_{v_α},j}(x − λ − ω_{γ_v,i})

for i ∈ {0, …, ⌈ℓ/2^{d_v}⌉ − 1}, and

(3.3)  Σ_{i=0}^{⌈(ℓ−j)/2^{d_v}⌉−1} h_{2^{d_v} i+j} X_{β_{v_δ},i} = Σ_{i=0}^{⌈(ℓ−j)/2^{d_v}⌉−1} g_{2^{d_v} i+j} N_{β_{v_δ},i}(x − η)

for j ∈ {0, …, min(2^{d_v}, ℓ) − 1}, where η = (λ/β_{v,0})^{2^{d_v}} − λ/β_{v,0}. Then

(3.4)  Σ_{i=0}^{ℓ−1} h_i X_{β_v,i} = Σ_{i=0}^{ℓ−1} f_i N_{β_v,i}(x − λ).

The proof of Lemma 3.2 is omitted, since it follows along similar lines to that of Lemma 2.2. Using the lemma, we obtain Algorithms 1 and 2 for conversion between the Newton and LCH bases. The algorithms each operate on a vector (a_0, …, a_{ℓ−1}) of field elements that initially contains the coefficients of a polynomial on the input basis, and overwrite its entries with the coefficients of the polynomial on the output basis. The subvectors of this vector that appear in the algorithms would be represented in practice by auxiliary variables, e.g., by a pointer to their first element and a stride parameter. The map ∆ : N → N that appears in the algorithms is defined by i ↦ min{k ∈ N | [i]_k = 0}. By noting that ∆(0), ∆(1), …, ∆(2^k − 2) is the transition sequence of the k-bit binary reflected Gray code, it is possible to successively compute the terms of the sequence at the cost of a small constant number of operations per element by the algorithm of Bitner, Ehrlich and Reingold [5] (see also [16]).

Theorem 3.3.
Algorithms 1 and 2 are correct.

Proof.
We only prove correctness for Algorithm 1, since the proof for Algorithm 2 is almost identical. Suppose that the input vertex v is a leaf. Then ℓ ∈ {1, 2} since n_v = 1. Moreover, L_v = {v}, ϕ_v(v, λ) = λ/β_{v,0}, X_{β_v,0} = N_{β_v,0} = 1 and X_{β_v,1} = N_{β_v,1} = x/β_{v,0}. It follows that Algorithm 1 produces the correct output whenever the input vertex is a leaf. Therefore, as (V, E) is a full binary tree, it is sufficient to show that for internal v ∈ V, if the algorithm produces the correct output whenever v_α or v_δ is given as an input, then the algorithm produces the correct output whenever v is given as an input.

Let v ∈ V be an internal vertex and suppose that the algorithm produces the correct output whenever v_α or v_δ is given as an input. Let λ ∈ F, ℓ ∈ {1, …, 2^{n_v}} and f_0, …, f_{ℓ−1} ∈ F. Suppose that Algorithm 1 is called on v, (ϕ_v(u, λ))_{u∈L_v} and ℓ,

Algorithm 1
N2X(v, (ϕ_v(u, λ))_{u∈L_v}, ℓ, (a_0, …, a_{ℓ−1}))

Input: a vertex v ∈ V, the vector (ϕ_v(u, λ))_{u∈L_v} ∈ F^{n_v} for some λ ∈ F, ℓ ∈ {1, …, 2^{n_v}}, and a_i = f_i ∈ F for i ∈ {0, …, ℓ − 1}.
Output: a_i = h_i ∈ F for i ∈ {0, …, ℓ − 1} such that (3.4) holds.

1: if v is a leaf then
2:   if ℓ = 2 then a_0 ← a_0 + ϕ_v(v, λ) a_1
3:   return
4: ℓ_1 ← ⌈ℓ/2^{d_v}⌉ − 1, ℓ_0 ← ℓ − 2^{d_v} ℓ_1, ℓ' ← min(2^{d_v}, ℓ)
5: μ ← (ϕ_v(u, λ))_{u∈L_{v_α}}, ν ← (ϕ_v(u, λ))_{u∈L_{v_δ}}
6: for i = 0, …, ℓ_1 − 1 do
7:   N2X(v_α, μ, 2^{d_v}, (a_{2^{d_v} i}, a_{2^{d_v} i+1}, …, a_{2^{d_v}(i+1)−1}))
8:   μ ← μ + (ϕ_v(u, σ_{v,∆(i)}))_{u∈L_{v_α}}
9: N2X(v_α, μ, ℓ_0, (a_{2^{d_v} ℓ_1}, a_{2^{d_v} ℓ_1+1}, …, a_{ℓ−1}))
10: for j = 0, …, ℓ_0 − 1 do
11:   N2X(v_δ, ν, ℓ_1 + 1, (a_j, a_{2^{d_v}+j}, …, a_{2^{d_v} ℓ_1+j}))
12: for j = ℓ_0, …, ℓ' − 1 do
13:   N2X(v_δ, ν, ℓ_1, (a_j, a_{2^{d_v}+j}, …, a_{2^{d_v}(ℓ_1−1)+j}))

Algorithm 2
X2N(v, (ϕ_v(u, λ))_{u∈L_v}, ℓ, (a_0, …, a_{ℓ−1}))

Input: a vertex v ∈ V, the vector (ϕ_v(u, λ))_{u∈L_v} ∈ F^{n_v} for some λ ∈ F, ℓ ∈ {1, …, 2^{n_v}}, and a_i = h_i ∈ F for i ∈ {0, …, ℓ − 1}.
Output: a_i = f_i ∈ F for i ∈ {0, …, ℓ − 1} such that (3.4) holds.

1: if v is a leaf then
2:   if ℓ = 2 then a_0 ← a_0 + ϕ_v(v, λ) a_1
3:   return
4: ℓ_1 ← ⌈ℓ/2^{d_v}⌉ − 1, ℓ_0 ← ℓ − 2^{d_v} ℓ_1, ℓ' ← min(2^{d_v}, ℓ)
5: μ ← (ϕ_v(u, λ))_{u∈L_{v_α}}, ν ← (ϕ_v(u, λ))_{u∈L_{v_δ}}
6: for j = 0, …, ℓ_0 − 1 do
7:   X2N(v_δ, ν, ℓ_1 + 1, (a_j, a_{2^{d_v}+j}, …, a_{2^{d_v} ℓ_1+j}))
8: for j = ℓ_0, …, ℓ' − 1 do
9:   X2N(v_δ, ν, ℓ_1, (a_j, a_{2^{d_v}+j}, …, a_{2^{d_v}(ℓ_1−1)+j}))
10: for i = 0, …, ℓ_1 − 1 do
11:   X2N(v_α, μ, 2^{d_v}, (a_{2^{d_v} i}, a_{2^{d_v} i+1}, …, a_{2^{d_v}(i+1)−1}))
12:   μ ← μ + (ϕ_v(u, σ_{v,∆(i)}))_{u∈L_{v_α}}
13: X2N(v_α, μ, ℓ_0, (a_{2^{d_v} ℓ_1}, a_{2^{d_v} ℓ_1+1}, …, a_{ℓ−1}))

with a_i = f_i for i ∈ {0, …, ℓ − 1}. Then Lines 5 and 8 of the algorithm and the F_2-linearity of the maps ϕ_v(u, ·) : F → F, for u ∈ L_v, ensure that

μ = (ϕ_v(u, λ) + Σ_{j=0}^{i−1} ϕ_v(u, σ_{v,∆(j)}))_{u∈L_{v_α}} = (ϕ_v(u, λ + Σ_{j=0}^{i−1} σ_{v,∆(j)}))_{u∈L_{v_α}}

each time the recursive call of Line 7 is performed. As γ_v = (β_{v,d_v}, …, β_{v,n_v−1}), we have ω_{γ_v,0} = 0 and

ω_{γ_v,i} = ω_{γ_v,i−1} + Σ_{j=0}^{∆(i−1)} β_{v,d_v+j} = ω_{γ_v,i−1} + σ_{v,∆(i−1)} = Σ_{j=0}^{i−1} σ_{v,∆(j)}

for i ∈ {1, …, 2^{n_v−d_v} − 1}. Thus,

μ = (ϕ_v(u, λ + ω_{γ_v,i}))_{u∈L_{v_α}} = (ϕ_{v_α}(u, λ + ω_{γ_v,i}))_{u∈L_{v_α}}

each time the recursive call of Line 7 is performed. Similarly, μ = (ϕ_{v_α}(u, λ + ω_{γ_v,⌈ℓ/2^{d_v}⌉−1}))_{u∈L_{v_α}} when the recursive call of Line 9 is performed.
Therefore, the assumption that Algorithm 1 produces the correct output whenever v_α is given as an input implies that Lines 4–9 set a_i = g_i for i ∈ {0, …, ℓ − 1}, where g_0, …, g_{ℓ−1} are the unique elements in F such that (3.2) holds.

The recursive calls of Lines 11 and 13 are made with ν = (ϕ_v(u, λ))_{u∈L_{v_δ}} = (ϕ_{v_δ}(u, η))_{u∈L_{v_δ}}, where η = (λ/β_{v,0})^{2^{d_v}} − λ/β_{v,0}. Therefore, the assumption that Algorithm 1 produces the correct output whenever v_δ is given as an input implies that Lines 10–13 set a_i = h_i for i ∈ {0, …, ℓ − 1}, where h_0, …, h_{ℓ−1} are the unique elements in F such that (3.3) holds. Hence, Lemma 3.2 implies that (3.4) holds at the end of the algorithm. □

For conversions between the Newton and LCH bases, the shift parameter λ is equal to zero for the initial calls to Algorithms 1 and 2. In this case, the vector (ϕ_v(u, λ))_{u∈L_v} contains all zeros. If an application arises where it is necessary for the initial call to be made with λ ≠ 0, then the input vector can be computed with O(n_v²) field operations. Storing the input vector and the vectors μ and ν that appear in the algorithms requires storing O(n_v) field elements. Including the computation and storage of the precomputed elements ϕ_v(u, σ_{v,i}), it follows that Algorithms 1 and 2 require auxiliary storage for O(n²) field elements, while all precomputations can be performed with O(n²) field operations.

The number of multiplications performed by Algorithms 1 and 2 is independent of the choice of reduction tree. This is not true of the number of additions performed by the algorithms, due to the updates made to the vector μ in the recursive case. Line 8 of Algorithm 1 and Line 12 of Algorithm 2 each perform (⌈ℓ/2^{d_v}⌉ − 1) d_v additions over all iterations of their containing loops. Consequently, we should aim to avoid small values of d_v when choosing a reduction tree.
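The incremental updates to μ rest on the identity ω_{γ_v,i} = ω_{γ_v,i−1} + σ_{v,∆(i−1)} used in the proof above: incrementing i − 1 flips bits 0, …, ∆(i − 1) of its binary representation, and σ collects exactly that prefix of the basis entries. A small Python sketch, with integers standing in for elements of a binary field so that field addition is XOR (function names and the example basis are ours):

```python
def delta(i):
    # Delta(i) = min{ k : bit k of i is zero }, the lowest unset bit of i.
    k = 0
    while (i >> k) & 1:
        k += 1
    return k

def enumerate_subspace(gamma):
    # Enumerate omega_0, omega_1, ..., where omega_i is the sum of gamma[k]
    # over the set bits k of i. Consecutive elements differ by the prefix
    # sum sigma[delta(i)], since incrementing i flips bits 0, ..., delta(i).
    sigma, acc = [], 0
    for g in gamma:
        acc ^= g
        sigma.append(acc)  # sigma[k] = gamma[0] + ... + gamma[k]
    size = 1 << len(gamma)
    omegas, omega = [], 0
    for i in range(size):
        omegas.append(omega)
        if i + 1 < size:
            omega ^= sigma[delta(i)]
    return omegas

gamma = [0b0001, 0b0110, 0b1010]
assert enumerate_subspace(gamma) == [0, 1, 6, 7, 10, 11, 12, 13]
```

Each step performs one table lookup and one addition, which is why the shifts λ + ω_{γ_v,i} can be maintained with d_v additions per loop iteration rather than recomputed from scratch.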
Moreover, we expect the number of additions performed by the algorithms to be maximised when the subtree rooted on the initial input vertex satisfies Im(d) ⊆ {1, 2}.

Example 3.4. If β is a Cantor basis, then Proposition 2.8 implies that d_v ≤ 2^{⌈log₂ n_v⌉−1} for all internal v ∈ V, and guarantees the existence of a reduction tree for β such that equality always holds. For n up to fifteen, we confirmed experimentally that this choice of reduction tree always minimises the number of additions performed by Algorithms 1 and 2 over all possible reduction trees, while those with Im(d) ⊆ {1, 2} were found to always maximise the number of additions performed by the algorithms. Figure 2 shows the maximum and minimum number of additions performed over all reduction trees for n = 15, as well as the number of multiplications performed for all trees.

Figure 2. Maximum and minimum number of operations performed by Algorithms 1 and 2 for Cantor bases of dimension 15. [Plot of the number of additions or multiplications against the polynomial length ℓ, with curves for the maximum additions, minimum additions, and multiplications.]

The difference between the maximum and minimum number of additions shown in Figure 2 is reflected in the bounds that we obtain on the complexities of the algorithms.

Theorem 3.5.
Algorithms 1 and 2 perform at most ⌊ℓ/2⌋⌈log₂ ℓ⌉ multiplications and ℓ(⌈log₂ ℓ⌉ − 1) + 1 additions in F. If d_v = 2^{⌈log₂ n_v⌉−1} for all internal v ∈ V, then the algorithms perform at most (3ℓ − 2)⌈log₂ ℓ⌉/4 additions in F.

We split the proof of Theorem 3.5 into three lemmas, one for each of the three bounds. It is clear that Algorithms 1 and 2 perform the same number of multiplications when given identical inputs. Consequently, we only prove the bounds for Algorithm 1. The three bounds are equal to zero if ℓ = 1, and one if ℓ = 2. Thus, the bounds hold if the input vertex is a leaf. Therefore, for each of the three bounds it is sufficient to show that if v ∈ V is an internal vertex such that the bound holds whenever the input vertex is v_α or v_δ, then the bound holds whenever v is the input vertex.

Lemma 3.6.
Algorithms 1 and 2 perform at most ⌊ℓ/2⌋⌈log₂ ℓ⌉ multiplications in F.

Proof. Suppose that the input vertex v ∈ V to Algorithm 1 is an internal vertex such that the algorithm performs at most ⌊ℓ/2⌋⌈log₂ ℓ⌉ multiplications in F whenever v_α or v_δ is given as the input vertex. If ℓ_1 = 0, then ℓ_0 = ℓ' = ℓ and the algorithm performs at most

⌊ℓ/2⌋⌈log₂ ℓ⌉ + ℓ ⌊1/2⌋⌈log₂ 1⌉ = ⌊ℓ/2⌋⌈log₂ ℓ⌉

multiplications. Therefore, suppose that ℓ_1 ≠ 0. Then, as ℓ_0 ≤ 2^{d_v}, Lines 6–9 of the algorithm perform at most

(3.5)  2^{d_v−1} ℓ_1 d_v + ⌊ℓ_0/2⌋⌈log₂ ℓ_0⌉ ≤ (2^{d_v−1} ℓ_1 + ⌊ℓ_0/2⌋) d_v = ⌊ℓ/2⌋ d_v

multiplications. Let k ∈ N such that 2^{k−1} < ℓ_1 + 1 ≤ 2^k. Then ⌈log₂(ℓ_1 + 1)⌉ = k and 2^{d_v+k−1} < 2^{d_v}(ℓ_1 + ℓ_0/2^{d_v}) = ℓ ≤ 2^{d_v+k}, since k ≥ 1 and 0 < ℓ_0/2^{d_v} ≤ 1. It follows that ⌈log₂(ℓ_1 + 1)⌉ = ⌈log₂ ℓ⌉ − d_v. Consequently, Lines 10–13 of the algorithm perform at most

ℓ_0 ⌊(ℓ_1 + 1)/2⌋⌈log₂(ℓ_1 + 1)⌉ + (2^{d_v} − ℓ_0)⌊ℓ_1/2⌋⌈log₂ ℓ_1⌉ ≤ ⌊(ℓ_0(ℓ_1 + 1) + (2^{d_v} − ℓ_0)ℓ_1)/2⌋⌈log₂(ℓ_1 + 1)⌉ = ⌊ℓ/2⌋(⌈log₂ ℓ⌉ − d_v)

multiplications. By combining this bound with (3.5), it follows that Algorithm 1 performs at most ⌊ℓ/2⌋⌈log₂ ℓ⌉ multiplications if ℓ_1 ≠ 0. □

Lemma 3.7.
Algorithms 1 and 2 perform at most ℓ(⌈log₂ ℓ⌉ − 1) + 1 additions in F.

Proof. Suppose that the input vertex v ∈ V to Algorithm 1 is an internal vertex such that the algorithm performs at most ℓ(⌈log₂ ℓ⌉ − 1) + 1 additions in F whenever v_α or v_δ is given as the input vertex. If ℓ_1 = 0, then ℓ_0 = ℓ' = ℓ and the algorithm performs at most

ℓ(⌈log₂ ℓ⌉ − 1) + 1 + ℓ(1(⌈log₂ 1⌉ − 1) + 1) = ℓ(⌈log₂ ℓ⌉ − 1) + 1

additions. Therefore, suppose that ℓ_1 ≠ 0. Then Lines 6–9 of the algorithm perform at most

ℓ_1(2^{d_v}(d_v − 1) + 1 + d_v) + ℓ_0(⌈log₂ ℓ_0⌉ − 1) + 1 = ℓ(d_v − 1) + 1 + ℓ_1(d_v + 1) + ℓ_0(⌈log₂ ℓ_0⌉ − d_v)

additions, while Lines 10–13 perform at most

ℓ_0(ℓ_1 + 1)(⌈log₂(ℓ_1 + 1)⌉ − 1) + (2^{d_v} − ℓ_0)ℓ_1(⌈log₂ ℓ_1⌉ − 1) + 2^{d_v} ≤ ℓ(⌈log₂(ℓ_1 + 1)⌉ − 1) + 2^{d_v} = ℓ(⌈log₂ ℓ⌉ − d_v) − ℓ + 2^{d_v}

additions. As 1 ≤ ℓ_0 ≤ 2^{d_v}, it follows that Algorithm 1 performs at most

ℓ(d_v − 1) + 1 + ℓ_1(d_v + 1) + ℓ_0(⌈log₂ ℓ_0⌉ − d_v) + ℓ(⌈log₂ ℓ⌉ − d_v) − ℓ + 2^{d_v}
= ℓ(⌈log₂ ℓ⌉ − 1) + 1 − (2^{d_v} − d_v − 1)(ℓ_1 − 1) + d_v + 1 + ℓ_0⌈log₂(ℓ_0/2^{d_v+1})⌉
≤ ℓ(⌈log₂ ℓ⌉ − 1) + 1 + d_v + 1 − ℓ_0⌊log₂(2^{d_v+1}/ℓ_0)⌋
≤ ℓ(⌈log₂ ℓ⌉ − 1) + 1

additions if ℓ_1 ≠ 0, where the final inequality holds since ℓ_0⌊log₂(2^{d_v+1}/ℓ_0)⌋ ≥ d_v + 1 for 1 ≤ ℓ_0 ≤ 2^{d_v}. □

Lemma 3.8.
Suppose that d_v = 2^{⌈log₂ n_v⌉−1} for all internal v ∈ V. Then Algorithms 1 and 2 perform at most (3ℓ − 2)⌈log₂ ℓ⌉/4 additions in F.

Proof. Suppose that d_v = 2^{⌈log₂ n_v⌉−1} for all internal v ∈ V. Furthermore, suppose that the input vertex v ∈ V to Algorithm 1 is an internal vertex such that the algorithm performs at most (3ℓ − 2)⌈log₂ ℓ⌉/4 additions in F whenever v_α or v_δ is given as the input vertex. If ℓ_1 = 0, then ℓ_0 = ℓ' = ℓ and the algorithm performs at most

(3ℓ − 2)⌈log₂ ℓ⌉/4 + ℓ((3 − 2)⌈log₂ 1⌉/4) = (3ℓ − 2)⌈log₂ ℓ⌉/4

additions. Therefore, suppose that ℓ_1 ≠ 0. Then, as ℓ_0 ≤ 2^{d_v}, Lines 6–9 of the algorithm perform at most

ℓ_1((3 · 2^{d_v} − 2)d_v/4 + d_v) + (3ℓ_0 − 2)⌈log₂ ℓ_0⌉/4 ≤ (3ℓ − 2)d_v/4 + ℓ_1 d_v/2

additions, while Lines 10–13 perform at most

ℓ_0(3(ℓ_1 + 1) − 2)⌈log₂(ℓ_1 + 1)⌉/4 + (2^{d_v} − ℓ_0)(3ℓ_1 − 2)⌈log₂ ℓ_1⌉/4 ≤ (3ℓ − 2^{d_v+1})⌈log₂(ℓ_1 + 1)⌉/4 ≤ (3ℓ − 2)(⌈log₂ ℓ⌉ − d_v)/4 − (2^{d_v} − 1) log₂(ℓ_1 + 1)/2

additions. It follows that Algorithm 1 performs at most

(3ℓ − 2)⌈log₂ ℓ⌉/4 + (ℓ_1 d_v − (2^{d_v} − 1) log₂(ℓ_1 + 1))/2 ≤ (3ℓ − 2)⌈log₂ ℓ⌉/4

additions if ℓ_1 ≠ 0, since ℓ_1 + 1 ≤ 2^{n_v−d_v} ≤ 2^{d_v} and the function x/log₂(x + 1) is increasing for x ≥ 1. □

3.3. Conversion from the Lagrange basis to the Lin–Chung–Han basis.
Our algorithms for converting between the Lagrange and LCH bases are direct analogues of Harvey's cache-friendly truncated FFT and inverse truncated FFT algorithms [15]. We align our presentation of the algorithms and their proofs of correctness with their counterparts given by Harvey, and direct the reader to Harvey's paper for further motivation behind the algorithms. In this section, we focus on the problem of converting from the Lagrange basis to the LCH basis. The inverse problem of converting from the LCH basis to the Lagrange basis is considered separately in the next section.

We propose Algorithm 3 for converting from the Lagrange basis to the LCH basis. Like the algorithms of Section 3.2, Algorithm 3 operates on a vector of field elements whose initial entries are overwritten with the output. However, the length of the vector is determined by the input vertex rather than the parameter ℓ which bounds the polynomial length. Consequently, the vector may have coordinates that are never used to store input or output values, but which are still used for intermediate computations. If the parameter c is less than ℓ, then the algorithm differs substantially from the other basis conversion algorithms in this paper by requiring that the vector initially contain a combination of coefficients from a polynomial's representations on the input and output bases. If the parameter b is set equal to one, then the algorithm has the additional unique feature of being required to compute a coefficient from the input basis representation. These two parameters are part of the internal mechanics of the recursive approach used by the algorithm, and in most practical applications, such as the multiplication of binary polynomials, the initial call to the algorithm is made with c = ℓ and b = 0.
However, one may be required to initially call the algorithm with c < ℓ if it is used within the Hermite interpolation algorithm of Coxon [12].

The reduction used by Algorithm 3 is provided by the following generalisation of Lemma 2.2, which is rephrased to reflect the fact that the algorithm takes in a mixture of coefficients from representations on the Lagrange and LCH bases.

Lemma 3.9.
Let v ∈ V be an internal vertex and ℓ ∈ {1, …, 2^{n_v}}. Suppose that f_0, …, f_{2^{n_v}−1}, λ, h_0, …, h_{ℓ−1} ∈ F satisfy

(3.6)  Σ_{i=0}^{ℓ−1} h_i X_{β_v,i} = Σ_{i=0}^{2^{n_v}−1} f_i L_{β_v,i}(x − λ).

Then there exist unique elements g_0, …, g_{2^{n_v}−1} ∈ F such that

(3.7)  Σ_{j=0}^{min(2^{d_v}, ℓ)−1} g_{2^{d_v} i+j} X_{β_{v_α},j} = Σ_{j=0}^{2^{d_v}−1} f_{2^{d_v} i+j} L_{β_{v_α},j}(x − λ − ω_{γ_v,i})

for i ∈ {0, …, 2^{n_v−d_v} − 1}, and

(3.8)  Σ_{i=0}^{⌊(ℓ−1−j)/2^{d_v}⌋} h_{2^{d_v} i+j} X_{β_{v_δ},i} = Σ_{i=0}^{2^{n_v−d_v}−1} g_{2^{d_v} i+j} L_{β_{v_δ},i}(x − η)

for j ∈ {0, …, 2^{d_v} − 1}, where η = (λ/β_{v,0})^{2^{d_v}} − λ/β_{v,0}.

Proof. Let v ∈ V, ℓ ∈ {1, …, 2^{n_v}} and f_0, …, f_{2^{n_v}−1}, λ, h_0, …, h_{ℓ−1} ∈ F satisfy the conditions of the lemma. Then there exist unique elements g_0, …, g_{2^{n_v}−1} ∈ F such that (3.8) holds for j ∈ {0, …, 2^{d_v} − 1}. We show that they also satisfy (3.7) for i ∈ {0, …, 2^{n_v−d_v} − 1}. If j ∈ {0, …, 2^{d_v} − 1} satisfies j ≥ ℓ, then (3.8) implies that g_{2^{d_v} i+j} = 0 for i ∈ {0, …, 2^{n_v−d_v} − 1}. Therefore, if we let h_ℓ = · · · = h_{2^{n_v}−1} = 0, then the left-hand sides of (3.6), (3.7) and (3.8) remain unchanged if we replace ℓ by 2^{n_v} in the summation bounds. Consequently, (3.7) must hold for i ∈ {0, …, 2^{n_v−d_v} − 1}, since otherwise Lemma 2.2 allows us to contradict the uniqueness of the coefficients f_0, …, f_{2^{n_v}−1} in (3.6) by writing the polynomial on the left-hand side of (3.7) on the basis

{L_{β_{v_α},0}(x − λ − ω_{γ_v,i}), …, L_{β_{v_α},2^{d_v}−1}(x − λ − ω_{γ_v,i})}

for i ∈ {0, …, 2^{n_v−d_v} − 1}. □

Theorem 3.10.
Algorithm 3 is correct.

Proof.
Suppose that v ∈ V is a leaf. Then Table 2 displays the input and output requirements of Algorithm 3 on the vector (a_0, …, a_{2^{n_v}−1}) = (a_0, a_1) for each possible input that includes v. The table also shows the output of the algorithm as computed by Lines 1–7. The elements f_i and h_i that appear in a row of the table are the coefficients of (3.6) for the specified value of ℓ. Elements denoted by asterisks are unspecified by the algorithm. As v is a leaf, we have X_{β_v,0} = 1, X_{β_v,1} = x/β_{v,0}, L_{β_v,0} = x/β_{v,0} + 1 and L_{β_v,1} = x/β_{v,0}.

Algorithm 3
L2X(v, (ϕ_v(u, λ))_{u∈L_v}, c, ℓ, b, (a_0, …, a_{2^{n_v}−1}))

Input: a vertex v ∈ V, the vector (ϕ_v(u, λ))_{u∈L_v} ∈ F^{n_v} for some λ ∈ F, c, ℓ ∈ N such that c ≤ ℓ and 1 ≤ ℓ ≤ 2^{n_v}, b ∈ {0, 1} such that 1 ≤ b + c ≤ 2^{n_v}, a_i = f_i ∈ F for i ∈ {0, …, c − 1}, and a_i = h_i ∈ F for i ∈ {c, …, ℓ − 1}.
Output: a_i = h_i ∈ F for i ∈ {0, …, c − 1} such that (3.6) holds for some f_c, …, f_{2^{n_v}−1} ∈ F, and a_c = f_c if b = 1.

1: if v is a leaf then
2:   if c = 2 then a_1 ← a_0 + a_1, a_0 ← a_0 + ϕ_v(v, λ) a_1
3:   if c = 1, ℓ = 2 and b = 1 then w ← ϕ_v(v, λ) a_1, a_1 ← a_0 + a_1, a_0 ← a_0 + w
4:   if c = 1, ℓ = 2 and b = 0 then a_0 ← a_0 + ϕ_v(v, λ) a_1
5:   if c = 0 and ℓ = 2 then a_0 ← a_0 + ϕ_v(v, λ) a_1
6:   if c = 1, ℓ = 1 and b = 1 then a_1 ← a_0
7:   return
8: c_1 ← ⌊c/2^{d_v}⌋, c_0 ← c − 2^{d_v} c_1
9: ℓ_1 ← ⌊ℓ/2^{d_v}⌋, ℓ_0 ← ℓ − 2^{d_v} ℓ_1
10: ℓ' ← min(2^{d_v}, ℓ), b' ← min(b + c_0, 1)
11: s ← min(c_0, ℓ_0), t ← max(c_0, ℓ_0)
12: μ ← (ϕ_v(u, λ))_{u∈L_{v_α}}, ν ← (ϕ_v(u, λ))_{u∈L_{v_δ}}
13: for i = 0, …, c_1 + b' − 2 do
14:   L2X(v_α, μ, 2^{d_v}, 2^{d_v}, 0, (a_{2^{d_v} i}, a_{2^{d_v} i+1}, …, a_{2^{d_v}(i+1)−1}))
15:   μ ← μ + (ϕ_v(u, σ_{v,∆(i)}))_{u∈L_{v_α}}
16: if b' = 0 then
17:   L2X(v_α, μ, 2^{d_v}, 2^{d_v}, 0, (a_{2^{d_v}(c_1−1)}, a_{2^{d_v}(c_1−1)+1}, …, a_{2^{d_v} c_1−1}))
18: for j = c_0, …, t − 1 do
19:   L2X(v_δ, ν, c_1, ℓ_1 + 1, b', (a_j, a_{2^{d_v}+j}, …, a_{2^{d_v}(2^{n_v−d_v}−1)+j}))
20: for j = t, …, ℓ' − 1 do
21:   L2X(v_δ, ν, c_1, ℓ_1, b', (a_j, a_{2^{d_v}+j}, …, a_{2^{d_v}(2^{n_v−d_v}−1)+j}))
22: if b' = 1 then
23:   L2X(v_α, μ, c_0, ℓ', b, (a_{2^{d_v} c_1}, a_{2^{d_v} c_1+1}, …, a_{2^{d_v}(c_1+1)−1}))
24: for j = 0, …, s − 1 do
25:   L2X(v_δ, ν, c_1 + 1, ℓ_1 + 1, 0, (a_j, a_{2^{d_v}+j}, …, a_{2^{d_v}(2^{n_v−d_v}−1)+j}))
26: for j = s, …, c_0 − 1 do
27:   L2X(v_δ, ν, c_1 + 1, ℓ_1, 0, (a_j, a_{2^{d_v}+j}, …, a_{2^{d_v}(2^{n_v−d_v}−1)+j}))

Moreover, ϕ_v(v, λ) = λ/β_{v,0} for λ ∈ F. Thus, the coefficients of (3.6) satisfy h_0 = f_0 + ϕ_v(v, λ)(f_0 + f_1) and h_1 = f_0 + f_1 if ℓ = 2, and h_0 = f_0 = f_1 if ℓ = 1. Using these equations, one can readily verify that the computed output agrees with the required output for all inputs. Consequently, Algorithm 3 produces the correct output whenever the input vertex is a leaf. Therefore, as (V, E) is a full binary tree, it is sufficient to show that for all internal v ∈ V, if the algorithm produces the correct output whenever v_α or v_δ is given as an input, then it produces the correct output whenever v is given as an input.

Let v ∈ V be an internal vertex and suppose that Algorithm 3 produces the correct output whenever v_α or v_δ is given as an input. Let λ ∈ F, c, ℓ ∈ N such that c ≤ ℓ and 1 ≤ ℓ ≤ 2^{n_v}, and b ∈ {0, 1} such that 1 ≤ b + c ≤ 2^{n_v}. Suppose that Algorithm 3 is called on v, (ϕ_v(u, λ))_{u∈L_v}, c, ℓ, b and (a_0, …, a_{2^{n_v}−1}), with a_i = f_i ∈ F for i ∈ {0, …, c − 1}, and a_i = h_i ∈ F for i ∈ {c, …, ℓ − 1}. Then
c  ℓ  b | Input a_0, a_1 | Required output a_0, a_1 | Computed output a_0, a_1
2  2  0 | f_0, f_1 | h_0, h_1 | f_0 + ϕ_v(v, λ)(f_0 + f_1), f_0 + f_1
1  2  1 | f_0, h_1 | h_0, f_1 | f_0 + ϕ_v(v, λ)h_1, f_0 + h_1
1  2  0 | f_0, h_1 | h_0, ∗  | f_0 + ϕ_v(v, λ)h_1, ∗
0  2  1 | h_0, h_1 | f_0, ∗  | h_0 + ϕ_v(v, λ)h_1, ∗
1  1  1 | f_0, ∗  | h_0, f_1 | f_0, f_0
1  1  0 | f_0, ∗  | h_0, ∗  | f_0, ∗
0  1  1 | h_0, ∗  | f_0, ∗  | h_0, ∗

Table 2.
Required and computed outputs of Algorithm 3 when v is a leaf.

there exist unique elements h_0, …, h_{c−1}, f_c, …, f_{2^{n_v}−1} ∈ F such that (3.6) holds. In turn, Lemma 3.9 implies that there exist unique elements g_0, …, g_{2^{n_v}−1} ∈ F such that (3.7) and (3.8) hold.

Repeating arguments from the proof of Theorem 3.3 shows that the vector μ is equal to (ϕ_{v_α}(u, λ + ω_{γ_v,i}))_{u∈L_{v_α}} each time the recursive call of Line 14 is made, equal to (ϕ_{v_α}(u, λ + ω_{γ_v,c_1−1}))_{u∈L_{v_α}} whenever the recursive call of Line 17 is performed, and equal to (ϕ_{v_α}(u, λ + ω_{γ_v,c_1}))_{u∈L_{v_α}} whenever the recursive call of Line 23 is performed. Similarly, the vector ν is equal to (ϕ_{v_δ}(u, η))_{u∈L_{v_δ}} for the recursive calls of Lines 19, 21, 25 and 27. It follows that if the recursive call of Line 14 is made with a_{2^{d_v} i+j} = f_{2^{d_v} i+j} for j ∈ {0, …, 2^{d_v} − 1}, then (3.7) and the assumption that the algorithm produces the correct output whenever v_α is given as an input imply that a_{2^{d_v} i+j} = g_{2^{d_v} i+j} for j ∈ {0, …, 2^{d_v} − 1} afterwards. Similarly, if the recursive call of Line 19 is made with a_{2^{d_v} i+j} = g_{2^{d_v} i+j} for i ∈ {0, …, c_1 − 1} and a_{2^{d_v} i+j} = h_{2^{d_v} i+j} for i ∈ {c_1, …, ℓ_1}, then (3.8) and the assumption that the algorithm produces the correct output whenever v_δ is given as an input imply that a_{2^{d_v} i+j} = h_{2^{d_v} i+j} for i ∈ {0, …, c_1 − 1}, and a_{2^{d_v} c_1+j} = g_{2^{d_v} c_1+j} if b' = 1, afterwards. Similar statements hold for the remaining recursive calls made by the algorithm.

The remainder of the proof is split into four cases. For each case, we provide an example in either Figure 3 or 4 of how the vector (a_0, …, a_{2^{n_v}−1}) evolves during the algorithm. In the figures, the vector is represented by the 2^{n_v−d_v} × 2^{d_v} matrix (a_{2^{d_v} i+j})_{0≤i<2^{n_v−d_v}, 0≤j<2^{d_v}}. Under this representation, the subvectors that are subjected to recursive calls by the algorithm correspond to row and column vectors of the matrix.
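This matrix representation is purely a matter of indexing: row i of the matrix is the contiguous block a_{2^{d_v} i}, …, a_{2^{d_v}(i+1)−1}, while column j is the strided subvector a_j, a_{2^{d_v}+j}, …. A sketch in Python of the pointer-plus-stride representation mentioned in Section 3.2, under which the recursive calls operate on rows and columns without copying (the class and helper names are ours):

```python
class StridedView:
    # A view (offset, stride, length) into a shared list, so that row and
    # column subvectors alias the same underlying storage.
    def __init__(self, data, offset, stride, length):
        self.data, self.offset, self.stride, self.length = data, offset, stride, length

    def __getitem__(self, k):
        return self.data[self.offset + k * self.stride]

    def __setitem__(self, k, value):
        self.data[self.offset + k * self.stride] = value

d = 3      # d_v: rows have length 2**d
rows = 4   # 2**(n_v - d_v): number of rows
a = list(range(rows * 2 ** d))

row = lambda i: StridedView(a, 2 ** d * i, 1, 2 ** d)   # a_{2^d i}, ..., a_{2^d (i+1) - 1}
col = lambda j: StridedView(a, j, 2 ** d, rows)         # a_j, a_{2^d + j}, ...

assert [row(1)[k] for k in range(2 ** d)] == list(range(8, 16))
assert [col(2)[k] for k in range(rows)] == [2, 10, 18, 26]
```

Writes through either view mutate the shared vector, which is what lets the in-place algorithms alternate between row and column passes.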
Asterisks in the figures represent unspecified entries, while entries surrounded by parentheses are computed only if b = 1, and are unspecified otherwise.

Case a:
Suppose that ℓ_1 = 0. Then Lines 8–11 of the algorithm set c_1 = 0, c_0 = s = c, ℓ_0 = ℓ' = t = ℓ and b' = 1. Thus, Lines 13–17 have no effect. Equation (3.8) implies that Lines 18–19 set a_j = g_j for j ∈ {c, …, ℓ − 1}. Lines 20–21 have no effect since ℓ' = t. Equation (3.7) implies that Lines 22–23 set a_i = g_i for i ∈ {0, …, c − 1}, and a_c = f_c if b = 1. Equation (3.8) implies that Lines 24–25 set a_i = h_i for i ∈ {0, …, c − 1}. Finally, Lines 26–27 have no effect since s = c_0.

Case b:
Suppose that ℓ_1 ≠ 0 and c_0 = 0. Then Lines 8–11 set c_1 = c/2^{d_v}, ℓ' = 2^{d_v}, b' = b, s = 0 and t = ℓ_0. Equation (3.7) implies that Lines 13–17 set

[Figure 3 omitted: matrices showing the contents of the vector in Cases a and b at five stages: (3a) initial contents; (3b) after Lines 13–17 make recursive calls on the highlighted rows; (3c) after Lines 18–21 make recursive calls on the highlighted columns; (3d) after Line 23 makes a recursive call on the highlighted row; (3e) after Lines 24–27 make recursive calls on the highlighted columns.]

Figure 3.
Evolution of the vector (a_0, …, a_{2^{n_v}−1}) during Algorithm 3 for n_v = 5, d_v = 3, c = 3, ℓ = 6 (Case a), and n_v = 5, d_v = 3, c = 16, ℓ = 21 (Case b).

a_i = g_i for i ∈ {0, …, c − 1}. Thus, (3.8) implies that Lines 18–21 set a_i = h_i for i ∈ {0, …, c − 1}, and a_{c+j} = g_{c+j} for j ∈ {0, …, 2^{d_v} − 1} if b = 1. Equation (3.7) implies that Lines 22–23 set a_c = f_c if b = 1, and have no effect otherwise. Lines 24–27 have no effect since s = c_0 = 0.

Case c:
Suppose that ℓ_1 ≠ 0 and 0 < c_0 ≤ ℓ_0. Then Lines 8–11 set ℓ' = 2^{d_v}, b' = 1, s = c_0 and t = ℓ_0. Equation (3.7) implies that Lines 13–17 set a_i = g_i for i ∈ {0, …, 2^{d_v} c_1 − 1}. Thus, (3.8) implies that Lines 18–21 set a_{2^{d_v} i+j} = h_{2^{d_v} i+j} and a_{2^{d_v} c_1+j} = g_{2^{d_v} c_1+j} for i ∈ {0, …, c_1 − 1} and j ∈ {c_0, …, 2^{d_v} − 1}. Equation (3.7) implies that Lines 22–23 set a_{2^{d_v} c_1+j} = g_{2^{d_v} c_1+j} for j ∈ {0, …, c_0 − 1}, and a_c = f_c if b = 1. Equation (3.8) implies that Lines 24–25 set a_{2^{d_v} i+j} = h_{2^{d_v} i+j} for i ∈ {0, …, c_1} and j ∈ {0, …, c_0 − 1}. Lines 26–27 have no effect since s = c_0.

Case d:
Suppose that ℓ_1 ≠ 0 and ℓ_0 < c_0. Then Lines 8–11 set ℓ' = 2^{d_v}, b' = 1, s = ℓ_0 and t = c_0. Equation (3.7) implies that Lines 13–17 set a_i = g_i for i ∈ {0, …, 2^{d_v} c_1 − 1}. Lines 18–19 have no effect since t = c_0. Equation (3.8) implies that Lines 20–21 set a_{2^{d_v} i+j} = h_{2^{d_v} i+j} and a_{2^{d_v} c_1+j} = g_{2^{d_v} c_1+j} for i ∈ {0, …, c_1 − 1} and j ∈ {c_0, …, 2^{d_v} − 1}. Equation (3.7) implies that Lines 22–23 set a_{2^{d_v} c_1+j} = g_{2^{d_v} c_1+j} for j ∈ {0, …, c_0 − 1}, and a_c = f_c if b = 1. Equation (3.8) implies that Lines 24–27 set a_{2^{d_v} i+j} = h_{2^{d_v} i+j} for i ∈ {0, …, c_1} and j ∈ {0, …, c_0 − 1}.

In each of the four cases, Algorithm 3 terminates with a_i = h_i for i ∈ {0, …, c − 1}, and a_c = f_c if b = 1, as required. Hence, for internal v ∈ V, if the algorithm produces the correct output whenever v_α or v_δ is given as an input, then it produces the correct output whenever v is given as an input. □

Algorithm 3 requires the same precomputations as the algorithms of Section 3.2. However, as the length of the vector on which the algorithm operates is no longer tied to the polynomial length ℓ, the auxiliary space requirements grow to 2^{n_v} − ℓ + O(n²) field elements, where the last term accounts for the storage of the vectors μ and ν, and the precomputed elements ϕ_v(u, σ_{v,i}). The update to the vector μ in Line 15 of the algorithm performs (c_1 + b' − 1) d_v = (⌊c/2^{d_v}⌋ + b' − 1) d_v additions over all iterations of its containing loop. Therefore, as for the algorithms of Section 3.2, small values of d_v should be avoided when choosing a reduction tree.

Example 3.11.
For β equal to a Cantor basis of dimension 15, and inputs ℓ ∈ {1, …, 2^{15}}, c = ℓ and b = 0, Figure 5 shows the maximum and minimum number of additions performed by Algorithm 3 over all possible reduction trees for the basis. The number of multiplications performed by the algorithm for each value of ℓ, which is independent of the choice of reduction tree, is also shown in the figure. As for Example 3.4, the number of additions performed for each value of ℓ is maximised by the tree with Im(d) ⊆ {1, 2}, and minimised by the tree such that d_v = 2^{⌈log₂ n_v⌉−1} for all internal v ∈ V.

Theorem 3.12.
Algorithm 3 performs at most
$$\min\left(\frac{c + b - 1}{2}\left(\lceil\log_2(c + b)\rceil - 1\right) + \ell - 1,\; 2^{n_v - 1} n_v\right)$$
multiplications in $F$, and at most
$$\min\left(\frac{c + b - 1}{2}\left(3\lceil\log_2(c + b)\rceil - 1\right) + \ell - 1,\; 2^{n_v - 1}(3 n_v - 2) + 1\right)$$
additions in $F$.

We split the proof of Theorem 3.12 into four lemmas, one for each bound. It is readily verified that the bounds hold if the input vertex is a leaf. Therefore, as $(V, E)$ is a full binary tree, it is sufficient to show for each bound that if $v \in V$ is an internal vertex such that the bound holds whenever the input vertex is $v_\alpha$ or $v_\delta$, then the bound holds whenever $v$ is the input vertex.

Lemma 3.13.
Algorithm 3 performs at most $2^{n_v - 1} n_v$ multiplications in $F$.

[Figure 4 shows, for Cases c and d, the contents of the vector $(a_0, \ldots, a_{2^{n_v}-1})$ at five stages: (4a) the initial contents; (4b) after Lines 13–17 make recursive calls on the highlighted rows; (4c) after Lines 18–21 make recursive calls on the highlighted columns; (4d) after Line 23 makes a recursive call on the highlighted row; (4e) after Lines 24–27 make recursive calls on the highlighted columns.]

Figure 4.
Evolution of the vector $(a_0, \ldots, a_{2^{n_v}-1})$ during Algorithm 3 for $n_v = 5$, $d_v = 3$, $c = 18$, $\ell = 28$ (Case c), and $n_v = 5$, $d_v = 3$, $c = 22$, $\ell = 28$ (Case d).

Proof.
Suppose that Algorithm 3 is called on an internal vertex $v \in V$ such that the bound of the lemma holds whenever the input vertex is $v_\alpha$ or $v_\delta$. Then Lines 13–17 of the algorithm perform at most $c_1 2^{d_v - 1} d_v \le \left(2^{n_v - d_v} - b'\right)2^{d_v - 1} d_v$ multiplications, since $c_1 \le 2^{n_v - d_v}$ with equality implying $c_0 = b = b' = 0$. If $\ell \ge 2^{d_v}$, then $\ell' = 2^{d_v} \ge t \ge c_0$. If $\ell < 2^{d_v}$, then $\ell = \ell_0 = \ell' \ge c_0 = c$. Thus, the inequalities $c_0 \le t \le \ell' \le 2^{d_v}$ hold in either case. It follows that Lines 18–21 perform at most $\left(2^{d_v} - c_0\right)2^{n_v - d_v - 1}(n_v - d_v)$ multiplications. Lines 22–23 perform at most $b' 2^{d_v - 1} d_v$ multiplications, while Lines 24–27 perform at most $c_0 2^{n_v - d_v - 1}(n_v - d_v)$. Summing these contributions, it follows that Algorithm 3 performs at most $2^{n_v - 1} n_v$ multiplications. $\square$

Figure 5. Maximum and minimum number of operations performed by Algorithm 3 (Algorithm 4) for Cantor bases of dimension 15, with parameters $c = \ell$ and $b = 0$ (respectively, $c = \ell$). [Plot omitted: number of additions or multiplications against the polynomial length $\ell$, with curves for the maximum additions, the minimum additions and the multiplications.]
Algorithm 3 performs at most
$$(3.9)\qquad \frac{c + b - 1}{2}\left(\lceil\log_2(c + b)\rceil - 1\right) + \ell - 1$$
multiplications in $F$.

Proof. Suppose that Algorithm 3 is called on an internal vertex $v \in V$ such that the bound of the lemma holds whenever the input vertex is $v_\alpha$ or $v_\delta$. Then Lemma 3.13 implies that Lines 13–17 perform at most
$$(3.10)\qquad c_1 2^{d_v - 1} d_v = \frac{c - c_0}{2}\, d_v$$
multiplications. As $c_0 \le t \le \ell' \le 2^{d_v}$, Lines 18–21 perform at most
$$(3.11)\qquad \left(2^{d_v} - c_0\right)\frac{c_1 + b' - 1}{2}\left(\lceil\log_2(c_1 + b')\rceil - 1\right) + (\ell' - c_0)(\ell_1 - 1) + t - c_0$$
multiplications. Lines 22–23 perform at most
$$(3.12)\qquad b'\left(\frac{c_0 + b - 1}{2}\left(\lceil\log_2\max(c_0 + b, 2)\rceil - 1\right) + \ell' - 1\right)$$
multiplications, while Lines 24–27 perform at most
$$(3.13)\qquad c_0\left(\frac{c_1}{2}\left(\lceil\log_2(c_1 + 1)\rceil - 1\right) + \ell_1 - 1\right) + s$$
multiplications. We show that the sum of these contributions is bounded by (3.9).

Suppose that $b' = 1$. Then $1 \le c_1 + b' \le c + b$. Thus, the sum of (3.10)–(3.13) in this case is at most
$$(3.14)\qquad \frac{c - c_0}{2}\left(\lceil\log_2(c_1 + 1)\rceil + d_v - 1\right) + \frac{c_0 + b - 1}{2}\left(\lceil\log_2(c_0 + b)\rceil - 1\right) + \ell - 1,$$
since $\ell'(\ell_1 - 1) + s + t - c_0 = \ell'(\ell_1 - 1) + \ell_0 \le 2^{d_v}\ell_1 + \ell_0 - \ell' = \ell - \ell'$. If $c_1 = 0$, then $c = c_0$. If $c_1 \neq 0$ and $c_0 \neq 0$, then $\lceil\log_2(c_1 + 1)\rceil = \left\lceil\log_2\left\lceil c/2^{d_v}\right\rceil\right\rceil = \lceil\log_2 c\rceil - d_v \le \lceil\log_2(c + b)\rceil - d_v$. If $c_1 \neq 0$ and $c_0 = 0$, then $b = 1$ since $c_0 + b \ge 1$, and $\lceil\log_2(c_1 + 1)\rceil = \left\lceil\log_2\left(c/2^{d_v} + 1\right)\right\rceil = \lceil\log_2(c + 1)\rceil - d_v = \lceil\log_2(c + b)\rceil - d_v$. In all three of these cases, substituting into (3.14) yields (3.9).

Suppose now that $b' = 0$. Then $c_0 = 0$ and $b = 0$. As $c + b \ge 1$, it follows that $c_1 = c/2^{d_v} \neq 0$. Thus, $\ell' = 2^{d_v}$, since $\ell \ge c \ge 2^{d_v}$. Therefore, the sum of (3.10)–(3.13) in this case is at most
$$\frac{c}{2}\, d_v + \frac{c - 2^{d_v}}{2}\left(\lceil\log_2 c\rceil - d_v - 1\right) + \ell - 2^{d_v} \le \frac{c + b - 1}{2}\left(\lceil\log_2(c + b)\rceil - 1\right) + \ell - 1,$$
as required, since $2^{d_v} \ge d_v + 1$. $\square$

Lemma 3.15.
Algorithm 3 performs at most $2^{n_v - 1}(3 n_v - 2) + 1$ additions in $F$.

Proof. Suppose that Algorithm 3 is called on an internal vertex $v \in V$ such that the bound of the lemma holds whenever the input vertex is $v_\alpha$ or $v_\delta$. Then Lines 13–17 of the algorithm perform at most
$$c_1\left(2^{d_v - 1}(3 d_v - 2) + 1 + d_v\right) - (1 - b')d_v \le \left(2^{n_v - d_v} - b'\right)\left(2^{d_v - 1}(3 d_v - 2) + 1\right) + \left(2^{n_v - d_v} - 1\right)d_v$$
additions, since $c_1 \le 2^{n_v - d_v}$ with equality implying that $c_0 = b = b' = 0$. As $c_0 \le t \le \ell' \le 2^{d_v}$, Lines 18–21 perform at most $\left(2^{d_v} - c_0\right)\left(2^{n_v - d_v - 1}(3(n_v - d_v) - 2) + 1\right)$ additions. Lines 22–23 perform at most $b'\left(2^{d_v - 1}(3 d_v - 2) + 1\right)$ additions, while Lines 24–27 perform at most $c_0\left(2^{n_v - d_v - 1}(3(n_v - d_v) - 2) + 1\right)$ additions. Summing these contributions, it follows that Algorithm 3 performs at most
$$2^{n_v - 1}(3 n_v - 2) + 1 - \left(2^{d_v} - d_v - 1\right)\left(2^{n_v - d_v} - 1\right) \le 2^{n_v - 1}(3 n_v - 2) + 1$$
additions. $\square$
Lemma 3.16.
Algorithm 3 performs at most
$$(3.15)\qquad \frac{c + b - 1}{2}\left(3\lceil\log_2(c + b)\rceil - 1\right) + \ell - 1$$
additions in $F$.

Proof. Suppose that Algorithm 3 is called on an internal vertex $v \in V$ such that the bound of the lemma holds whenever the input vertex is $v_\alpha$ or $v_\delta$. Then Lemma 3.15 implies that Lines 13–17 perform at most
$$(3.16)\qquad c_1\left(2^{d_v - 1}(3 d_v - 2) + 1 + d_v\right) - (1 - b')d_v \le \frac{c - c_0}{2}\,3 d_v - (1 - b')d_v$$
additions. As $c_0 \le t \le \ell' \le 2^{d_v}$, Lines 18–21 perform at most
$$(3.17)\qquad \left(2^{d_v} - c_0\right)\frac{c_1 + b' - 1}{2}\left(3\lceil\log_2(c_1 + b')\rceil - 1\right) + (\ell' - c_0)(\ell_1 - 1) + t - c_0$$
additions. Lines 22–23 perform at most
$$(3.18)\qquad b'\left(\frac{c_0 + b - 1}{2}\left(3\lceil\log_2\max(c_0 + b, 2)\rceil - 1\right) + \ell' - 1\right)$$
additions, while Lines 24–27 perform at most
$$(3.19)\qquad c_0\left(\frac{c_1}{2}\left(3\lceil\log_2(c_1 + 1)\rceil - 1\right) + \ell_1 - 1\right) + s$$
additions. We show that the sum of these contributions is bounded by (3.15).

Suppose that $b' = 1$. Then the sum of (3.16)–(3.19) is at most
$$(3.20)\qquad \frac{c - c_0}{2}\left(3\lceil\log_2(c_1 + 1)\rceil + 3 d_v - 1\right) + \frac{c_0 + b - 1}{2}\left(3\lceil\log_2(c_0 + b)\rceil - 1\right) + \ell - 1.$$
If $c_1 = 0$, then $c = c_0$. If not, then $\lceil\log_2(c_1 + 1)\rceil \le \lceil\log_2(c + b)\rceil - d_v$. In either case, substituting into (3.20) yields (3.15).

Suppose now that $b' = 0$. Then $c_0 = b = 0$, $c_1 \neq 0$ and $\ell' = 2^{d_v}$. It follows that the sum of (3.16)–(3.19) in this case is at most
$$\frac{c}{2}\,3 d_v - d_v + \frac{c - 2^{d_v}}{2}\left(3\lceil\log_2 c\rceil - 3 d_v - 1\right) + \ell - 2^{d_v} \le \frac{c + b - 1}{2}\left(3\lceil\log_2(c + b)\rceil - 1\right) + \ell - 1,$$
as required, since $2^{d_v} \ge d_v + 1$. $\square$

Conversion from the Lin–Chung–Han basis to the Lagrange basis.
We propose Algorithm 4 for converting from the LCH basis to the Lagrange basis. The parameter $c$ plays a different role than in Algorithm 3: its function is to specify the number of Lagrange basis coefficients returned by the algorithm, rather than to specify a mixture of coefficients. For conversion from the LCH basis to the Lagrange basis, Algorithm 4 is initially called with $c = 2^{n_v}$. Smaller initial values of $c$ are relevant, for example, when using the algorithm within the Hermite evaluation algorithm of Coxon [12].

Theorem 3.17.
Algorithm 4 is correct.
Algorithm 4
X2L$(v, (\varphi_v(u, \lambda))_{u \in L_v}, c, \ell, (a_0, \ldots, a_{2^{n_v}-1}))$

Input: a vertex $v \in V$, the vector $(\varphi_v(u, \lambda))_{u \in L_v} \in F^{n_v}$ for some $\lambda \in F$, $c, \ell \in \{1, 2, \ldots, 2^{n_v}\}$, and $a_i = h_i \in F$ for $i \in \{0, \ldots, \ell - 1\}$.
Output: $a_i = f_i \in F$ for $i \in \{0, \ldots, c - 1\}$ such that (3.6) holds for some $f_c, \ldots, f_{2^{n_v}-1} \in F$.

1: if $v$ is a leaf then
2:   if $c = 2$ and $\ell = 2$ then $a_0 \leftarrow a_0 + \varphi_v(v, \lambda)\,a_1$, $a_1 \leftarrow a_1 + a_0$
3:   if $c = 1$ and $\ell = 2$ then $a_0 \leftarrow a_0 + \varphi_v(v, \lambda)\,a_1$
4:   if $c = 2$ and $\ell = 1$ then $a_1 \leftarrow a_0$
5:   return
6: $c_1 \leftarrow \lceil c/2^{d_v}\rceil - 1$, $c_0 \leftarrow c - 2^{d_v} c_1$
7: $\ell_1 \leftarrow \lfloor\ell/2^{d_v}\rfloor$, $\ell_0 \leftarrow \ell - 2^{d_v}\ell_1$, $\ell' \leftarrow \min(2^{d_v}, \ell)$
8: $\mu \leftarrow (\varphi_v(u, \lambda))_{u \in L_{v_\alpha}}$, $\nu \leftarrow (\varphi_v(u, \lambda))_{u \in L_{v_\delta}}$
9: for $j = 0, \ldots, \ell_0 - 1$ do
10:   X2L$(v_\delta, \nu, c_1 + 1, \ell_1 + 1, (a_j, a_{2^{d_v}+j}, \ldots, a_{2^{d_v}(2^{n_v-d_v}-1)+j}))$
11: for $j = \ell_0, \ldots, \ell' - 1$ do
12:   X2L$(v_\delta, \nu, c_1 + 1, \ell_1, (a_j, a_{2^{d_v}+j}, \ldots, a_{2^{d_v}(2^{n_v-d_v}-1)+j}))$
13: for $i = 0, \ldots, c_1 - 1$ do
14:   X2L$(v_\alpha, \mu, 2^{d_v}, \ell', (a_{2^{d_v} i}, a_{2^{d_v} i+1}, \ldots, a_{2^{d_v}(i+1)-1}))$
15:   $\mu \leftarrow \mu + (\varphi_v(u, \sigma_{v, \Delta(i)}))_{u \in L_{v_\alpha}}$
16: X2L$(v_\alpha, \mu, c_0, \ell', (a_{2^{d_v} c_1}, a_{2^{d_v} c_1+1}, \ldots, a_{2^{d_v}(c_1+1)-1}))$

Proof.
Table 3 displays the input and output requirements of Algorithm 4 when the input vertex $v$ is a leaf, as well as showing the output of the algorithm as computed by Lines 1–5. The elements $f_i$ and $h_i$ that appear in a row of the table are the coefficients of (3.6) for the specified value of $\ell$. Elements denoted by asterisks are unspecified by the algorithm. As $v$ is a leaf, the coefficients of (3.6) satisfy $f_0 = h_0 + \varphi_v(v, \lambda)h_1$ and $f_1 = h_1 + (h_0 + \varphi_v(v, \lambda)h_1)$ if $\ell = 2$, and $f_0 = f_1 = h_0$ if $\ell = 1$. Using these equations, one can readily verify that the computed output agrees with the required output for all inputs. Consequently, Algorithm 4 produces the correct output whenever the input vertex is a leaf. Therefore, as $(V, E)$ is a full binary tree, it is sufficient to show that for all internal $v \in V$, if the algorithm produces the correct output whenever $v_\alpha$ or $v_\delta$ is given as an input, then it produces the correct output whenever $v$ is given as an input.

$c$ | $\ell$ | Input $(a_0, a_1)$ | Required output $(a_0, a_1)$ | Computed output $(a_0, a_1)$
2 | 2 | $h_0,\ h_1$ | $f_0,\ f_1$ | $h_0 + \varphi_v(v,\lambda)h_1,\ h_1 + (h_0 + \varphi_v(v,\lambda)h_1)$
1 | 2 | $h_0,\ h_1$ | $f_0,\ *$ | $h_0 + \varphi_v(v,\lambda)h_1,\ *$
2 | 1 | $h_0,\ *$ | $f_0,\ f_1$ | $h_0,\ h_0$
1 | 1 | $h_0,\ *$ | $f_0,\ *$ | $h_0,\ *$

Table 3.
Required and computed outputs of Algorithm 4 when $v$ is a leaf.

Let $v \in V$ be an internal vertex and suppose that Algorithm 4 produces the correct output whenever $v_\alpha$ or $v_\delta$ is given as an input. Suppose that the algorithm is called on $v$, the vector $(\varphi_v(u, \lambda))_{u \in L_v}$ for some $\lambda \in F$, integers $c, \ell \in \{1, 2, \ldots, 2^{n_v}\}$ and $(a_0, \ldots, a_{2^{n_v}-1})$, with $a_i = h_i \in F$ for $i \in \{0, \ldots, \ell - 1\}$. Then there exist unique elements $f_0, \ldots, f_{2^{n_v}-1} \in F$ such that (3.6) holds. In turn, Lemma 3.9 implies that there exist unique elements $g_0, \ldots, g_{2^{n_v}-1} \in F$ such that (3.7) and (3.8) hold.

Once again repeating arguments from the proof of Theorem 3.3 shows that $\nu = (\varphi_{v_\delta}(u, \eta))_{u \in L_{v_\delta}}$ for the recursive calls of Lines 9–12, that $\mu = (\varphi_{v_\alpha}(u, \lambda + \omega_{\gamma_v, i}))_{u \in L_{v_\alpha}}$ each time the recursive call of Line 14 is performed, and finally that $\mu = (\varphi_{v_\alpha}(u, \lambda + \omega_{\gamma_v, c_1}))_{u \in L_{v_\alpha}}$ for the recursive call of Line 16. Thus, (3.8) and the assumption that the algorithm produces the correct output whenever $v_\delta$ is given as an input imply that Lines 9–12 set $a_{2^{d_v} i + j} = g_{2^{d_v} i + j}$ for $i \in \{0, \ldots, c_1\}$ and $j \in \{0, \ldots, \min(2^{d_v}, \ell) - 1\}$. Consequently, (3.7) and the assumption that the algorithm produces the correct output whenever $v_\alpha$ is given as an input imply that Lines 13–15 set $a_i = f_i$ for $i \in \{0, \ldots, 2^{d_v} c_1 - 1\}$, and that Line 16 sets $a_i = f_i$ for $i \in \{2^{d_v} c_1, \ldots, c - 1\}$. Therefore, the algorithm terminates with $a_i = f_i$ for $i \in \{0, \ldots, c - 1\}$, as required. Hence, for internal $v \in V$, if the algorithm produces the correct output whenever $v_\alpha$ or $v_\delta$ is given as an input, then it produces the correct output whenever $v$ is given as an input. $\square$

Algorithm 4 requires the same precomputations as the algorithms of Sections 3.2 and 3.3, while the algorithm requires auxiliary space for $2^{n_v} - \max(c, \ell) + O(n)$ field elements.
Small values of $d_v$ should once again be avoided when choosing a reduction tree for the algorithm, in order to help reduce the number of additions performed by the updates to the vector $\mu$ in Line 15.

Example 3.18.
For $\beta$ equal to a Cantor basis of dimension 15, and inputs $\ell \in \{1, \ldots, 2^{15}\}$ and $c = \ell$, Figure 5 also shows the maximum and minimum number of additions performed by Algorithm 4 over all possible reduction trees for the basis, as well as the number of multiplications performed by the algorithm for all such trees. Thus, Algorithm 4 performs the same number of operations as Algorithm 3 with $c = \ell$ and $b = 0$ for both extremes (see Example 3.11). As for Examples 3.4 and 3.11, the maximum and minimum number of additions performed for each $\ell$ are given respectively by the trees with $d_v = 1$ and $d_v = 2^{\lceil\log_2 n_v\rceil - 1}$ for all internal $v \in V$.

Theorem 3.19.
Algorithm 4 performs at most
$$\min\left(\frac{c - 1}{2}\left(\lceil\log_2 c\rceil - 1\right) + \ell - 1,\; 2^{n_v - 1} n_v\right)$$
multiplications in $F$, and at most
$$\min\left(\frac{c - 1}{2}\left(3\lceil\log_2 c\rceil - 1\right) + \ell - 1,\; 2^{n_v - 1}(3 n_v - 2) + 1\right)$$
additions in $F$.

We split the proof of Theorem 3.19 into four lemmas, one for each bound. It is readily verified that the bounds hold if the input vertex is a leaf. Therefore, as $(V, E)$ is a full binary tree, it is sufficient to show for each bound that if $v \in V$ is an internal vertex such that the bound holds whenever the input vertex is $v_\alpha$ or $v_\delta$, then the bound holds whenever $v$ is the input vertex.

Lemma 3.20.
Algorithm 4 performs at most $2^{n_v - 1} n_v$ multiplications in $F$.

Proof.
Suppose that Algorithm 4 is called on an internal vertex $v \in V$ such that the bound of the lemma holds whenever the input vertex is $v_\alpha$ or $v_\delta$. Then Lines 9–12 of the algorithm perform at most $\ell'\,2^{n_v - d_v - 1}(n_v - d_v) \le 2^{n_v - 1}(n_v - d_v)$ multiplications, while Lines 13–16 perform at most $(c_1 + 1)2^{d_v - 1} d_v \le 2^{n_v - 1} d_v$ multiplications. Summing these contributions, it follows that Algorithm 4 performs at most $2^{n_v - 1} n_v$ multiplications. $\square$

Lemma 3.21.
Algorithm 4 performs at most $(c - 1)\left(\lceil\log_2 c\rceil - 1\right)/2 + \ell - 1$ multiplications in $F$.

Proof. Suppose that Algorithm 4 is called on an internal vertex $v \in V$ such that the bound of the lemma holds whenever the input vertex is $v_\alpha$ or $v_\delta$. Then Lines 9–12 perform at most
$$2^{d_v}\,\frac{c_1}{2}\left(\lceil\log_2(c_1 + 1)\rceil - 1\right) + \ell'(\ell_1 - 1) + \ell_0 \le \frac{c - c_0}{2}\left(\lceil\log_2(c_1 + 1)\rceil - 1\right) + \ell - \ell'$$
multiplications. Lemma 3.20 implies that Lines 13–15 perform at most $c_1 2^{d_v - 1} d_v = \frac{c - c_0}{2}\,d_v$ multiplications, while Line 16 performs at most $\frac{c_0 - 1}{2}\left(\lceil\log_2 c_0\rceil - 1\right) + \ell' - 1$ multiplications. As $c_0 \le c$, it follows that Algorithm 4 performs at most
$$\frac{c - c_0}{2}\left(\lceil\log_2(c_1 + 1)\rceil + d_v - 1\right) + \frac{c_0 - 1}{2}\left(\lceil\log_2 c_0\rceil - 1\right) + \ell - 1 \le \frac{c - 1}{2}\left(\lceil\log_2 c\rceil - 1\right) + \ell - 1$$
multiplications, since $c = c_0$ if $c_1 = 0$, and $\lceil\log_2(c_1 + 1)\rceil = \lceil\log_2 c\rceil - d_v$ otherwise. $\square$

Lemma 3.22.
Algorithm 4 performs at most $2^{n_v - 1}(3 n_v - 2) + 1$ additions in $F$.

Proof. Suppose that Algorithm 4 is called on an internal vertex $v \in V$ such that the bound of the lemma holds whenever the input vertex is $v_\alpha$ or $v_\delta$. Then Lines 9–12 of the algorithm perform at most
$$\ell'\left(2^{n_v - d_v - 1}(3(n_v - d_v) - 2) + 1\right) \le 2^{n_v - 1}\,3(n_v - d_v) - 2^{d_v}\left(2^{n_v - d_v} - 1\right)$$
additions. Lines 13–16 perform at most
$$(c_1 + 1)\left(2^{d_v - 1}(3 d_v - 2) + d_v + 1\right) - d_v \le 2^{n_v - 1}(3 d_v - 2) + 1 + (d_v + 1)\left(2^{n_v - d_v} - 1\right)$$
additions. It follows that Algorithm 4 performs at most
$$2^{n_v - 1}(3 n_v - 2) + 1 - \left(2^{d_v} - d_v - 1\right)\left(2^{n_v - d_v} - 1\right) \le 2^{n_v - 1}(3 n_v - 2) + 1$$
additions. $\square$
Lemma 3.23.
Algorithm 4 performs at most $(c - 1)\left(3\lceil\log_2 c\rceil - 1\right)/2 + \ell - 1$ additions in $F$.

Proof. Suppose that Algorithm 4 is called on an internal vertex $v \in V$ such that the bound of the lemma holds whenever the input vertex is $v_\alpha$ or $v_\delta$. Then Lines 9–12 perform at most
$$2^{d_v}\,\frac{c_1}{2}\left(3\lceil\log_2(c_1 + 1)\rceil - 1\right) + \ell'(\ell_1 - 1) + \ell_0 \le \frac{c - c_0}{2}\left(3\lceil\log_2(c_1 + 1)\rceil - 1\right) + \ell - \ell'$$
additions. Lemma 3.22 implies that Lines 13–15 perform at most
$$c_1\left(2^{d_v - 1}(3 d_v - 2) + 1 + d_v\right) = \frac{c - c_0}{2}\,3 d_v - c_1\left(2^{d_v} - d_v - 1\right) \le \frac{c - c_0}{2}\,3 d_v$$
additions. Line 16 performs at most $\frac{c_0 - 1}{2}\left(3\lceil\log_2 c_0\rceil - 1\right) + \ell' - 1$ additions. As $c_0 \le c$, it follows that Algorithm 4 performs at most
$$\frac{c - c_0}{2}\left(3\lceil\log_2(c_1 + 1)\rceil + 3 d_v - 1\right) + \frac{c_0 - 1}{2}\left(3\lceil\log_2 c_0\rceil - 1\right) + \ell - 1 \le \frac{c - 1}{2}\left(3\lceil\log_2 c\rceil - 1\right) + \ell - 1$$
additions, since $c = c_0$ if $c_1 = 0$, and $\lceil\log_2(c_1 + 1)\rceil = \lceil\log_2 c\rceil - d_v$ otherwise. $\square$

Interlude: generalised Taylor expansion.
The generalised Taylor expansion of a polynomial $F \in F[x]$ at a polynomial $T \in F[x]$ of degree $t \ge 1$, also called its $T$-adic expansion, is the series expansion
$$F = F_0 + F_1 T + F_2 T^2 + \cdots$$
such that $F_i \in F[x]_t$ for $i \in \mathbb{N}$. Gao and Mateer [14, Section II] provide a fast algorithm for computing the coefficients of the Taylor expansion when $T = x^t - x$ with $t \ge 2$. The algorithm is then utilised as part of their additive FFT algorithms. Our algorithm for converting from the monomial basis to the LCH basis similarly relies on their generalised Taylor expansion algorithm. Consequently, we make a brief aside to recall their algorithm.

The algorithm of Gao and Mateer can be viewed as a specialisation of the recursive algorithm of von zur Gathen [27] that takes advantage of easy division by $(x^t - x)^{2^k} = x^{2^k t} - x^{2^k}$ in characteristic two. We present a nonrecursive version of their algorithm modelled on the basis conversion algorithms of van der Hoeven and Schost [26, Section 2.2]. We also present the inverse algorithm, which recovers a polynomial from the coefficients of its Taylor expansion at $x^t - x$, as it is required by our algorithm for converting from the LCH basis to the monomial basis. Finally, we derive a bound on the complexity of both algorithms that is tighter than the one provided by Gao and Mateer.

Let $F \in F[x]_\ell$ and $t \ge 2$. For $k \in \mathbb{N}$, define $F_{k,0}, F_{k,1}, \ldots \in F[x]_{2^k t}$ by the equation
$$(3.21)\qquad F = \sum_{i \in \mathbb{N}} F_{k,i}\left(x^t - x\right)^{2^k i}.$$
Then $F_{0,0}, F_{0,1}, \ldots$ are the coefficients of the Taylor expansion at $x^t - x$, while $F_{k,0} = F$ for $k \ge \lceil\log_2\lceil\ell/t\rceil\rceil$. By grouping terms of indices $2i$ and $2i + 1$ in (3.21), it follows that
$$F = \sum_{i \in \mathbb{N}}\left(F_{k,2i} + F_{k,2i+1}\left(x^{2^k t} - x^{2^k}\right)\right)\left(x^t - x\right)^{2^{k+1} i}$$
for $k \in \mathbb{N}$. Thus, we obtain the recursive formula
$$F_{k+1,i} = F_{k,2i} + x^{2^k} F_{k,2i+1} + x^{2^k t} F_{k,2i+1}$$
for $k, i \in \mathbb{N}$. Given $F_{k,2i}$ and $F_{k,2i+1}$ on the monomial basis, the recursive formula allows $F_{k+1,i}$ to be readily computed on the monomial basis. The formula also allows this computation to be easily inverted. Therefore, given the Taylor coefficients $F_{0,0}, F_{0,1}, \ldots$ on the monomial basis, we can efficiently compute $F = F_{\lceil\log_2\lceil\ell/t\rceil\rceil, 0}$ on the monomial basis by means of the recursive formula, and vice versa. Using this observation, we obtain Algorithms 5 and 6.

Algorithm 5
TaylorExpansion$(t, \ell, (a_0, \ldots, a_{\ell-1}))$

Input: integers $t \ge 2$ and $\ell \ge 1$, and $a_i = f_i \in F$ for $i \in \{0, \ldots, \ell - 1\}$.
Output: $a_i = c_i \in F$ for $i \in \{0, \ldots, \ell - 1\}$ such that
$$(3.22)\qquad \sum_{i=0}^{\lceil\ell/t\rceil - 1}\ \sum_{j=0}^{\min(\ell - ti,\, t) - 1} c_{ti+j}\, x^j\left(x^t - x\right)^i = \sum_{i=0}^{\ell - 1} f_i\, x^i.$$

1: for $k = \lceil\log_2\lceil\ell/t\rceil\rceil - 1, \ldots, 0$ do
2:   $\ell_1 \leftarrow \lfloor\ell/(2^{k+1} t)\rfloor$, $\ell_0 \leftarrow \ell - 2^{k+1} t\,\ell_1$
3:   for $i = 0, \ldots, \ell_1 - 1$ do
4:     for $j = 2^k t - 1, \ldots, 0$ do
5:       $a_{2^k t(2i) + 2^k + j} \leftarrow a_{2^k t(2i) + 2^k + j} + a_{2^k t(2i+1) + j}$
6:   for $j = \ell_0 - 2^k t - 1, \ldots, 0$ do
7:     $a_{2^k t(2\ell_1) + 2^k + j} \leftarrow a_{2^k t(2\ell_1) + 2^k + j} + a_{2^k t(2\ell_1 + 1) + j}$

Algorithm 6
InverseTaylorExpansion$(t, \ell, (a_0, \ldots, a_{\ell-1}))$

Input: integers $t \ge 2$ and $\ell \ge 1$, and $a_i = c_i \in F$ for $i \in \{0, \ldots, \ell - 1\}$.
Output: $a_i = f_i \in F$ for $i \in \{0, \ldots, \ell - 1\}$ such that (3.22) holds.

1: for $k = 0, \ldots, \lceil\log_2\lceil\ell/t\rceil\rceil - 1$ do
2:   $\ell_1 \leftarrow \lfloor\ell/(2^{k+1} t)\rfloor$, $\ell_0 \leftarrow \ell - 2^{k+1} t\,\ell_1$
3:   for $i = 0, \ldots, \ell_1 - 1$ do
4:     for $j = 0, \ldots, 2^k t - 1$ do
5:       $a_{2^k t(2i) + 2^k + j} \leftarrow a_{2^k t(2i) + 2^k + j} + a_{2^k t(2i+1) + j}$
6:   for $j = 0, \ldots, \ell_0 - 2^k t - 1$ do
7:     $a_{2^k t(2\ell_1) + 2^k + j} \leftarrow a_{2^k t(2\ell_1) + 2^k + j} + a_{2^k t(2\ell_1 + 1) + j}$

Lemma 3.24.
Algorithms 5 and 6 perform at most $\lfloor\ell/2\rfloor\lceil\log_2\lceil\ell/t\rceil\rceil$ additions in $F$.

Proof. For each $k \in \{0, \ldots, \lceil\log_2\lceil\ell/t\rceil\rceil - 1\}$, Lines 2–7 of either algorithm perform at most
$$2^k t\,\ell_1 + \max\left(\ell_0 - 2^k t, 0\right) \le 2^k t\,\ell_1 + \max\left(\ell_0 - \lceil\ell_0/2\rceil, \lfloor\ell_0/2\rfloor\right) = 2^k t\,\ell_1 + \lfloor\ell_0/2\rfloor \le \lfloor\ell/2\rfloor$$
additions in $F$. $\square$

Conversion between the Lin–Chung–Han and monomial bases.
We use Lemma 2.1 to provide algorithms for converting between the monomial basis and the "twisted" LCH basis $\{X_{\beta_v,0}(\beta_{v,0}\,x), \ldots, X_{\beta_v,\ell-1}(\beta_{v,0}\,x)\}$ of $F[x]_\ell$, for $v \in V$ and $\ell \in \{1, \ldots, 2^{n_v}\}$. Conversions between the LCH and monomial bases then require at most an additional $\max(2\ell - 2, 0)$ multiplications for performing the substitution $x \mapsto x/\beta_{v,0}$ or $x \mapsto \beta_{v,0}\,x$. In particular, no additional multiplications are required if $\beta$ is a Cantor basis, since $\beta_{v,0} = 1$ for all $v \in V$ (see Remark 3.1). We base the conversion algorithms on the following analogue of Lemma 2.2.

Lemma 3.25.
Let $v \in V$ be an internal vertex and $\ell \in \{1, \ldots, 2^{n_v}\}$. Suppose that $h_0, \ldots, h_{\ell-1}, g_0, \ldots, g_{\ell-1}, c_0, \ldots, c_{\ell-1} \in F$ satisfy
$$(3.23)\qquad \sum_{j=0}^{\min(\ell - 2^{d_v} i,\, 2^{d_v}) - 1} g_{2^{d_v} i + j}\, x^j = \sum_{j=0}^{\min(\ell - 2^{d_v} i,\, 2^{d_v}) - 1} h_{2^{d_v} i + j}\, X_{\beta_{v_\alpha}, j}(\beta_{v_\alpha,0}\, x)$$
for $i \in \{0, \ldots, \lceil\ell/2^{d_v}\rceil - 1\}$, and
$$(3.24)\qquad \sum_{i=0}^{\lceil(\ell - j)/2^{d_v}\rceil - 1} c_{2^{d_v} i + j}\, x^i = \sum_{i=0}^{\lceil(\ell - j)/2^{d_v}\rceil - 1} g_{2^{d_v} i + j}\, X_{\beta_{v_\delta}, i}(\beta_{v_\delta,0}\, x)$$
for $j \in \{0, \ldots, \min(2^{d_v}, \ell) - 1\}$. Then
$$(3.25)\qquad \sum_{i=0}^{\lceil\ell/2^{d_v}\rceil - 1}\ \sum_{j=0}^{\min(\ell - 2^{d_v} i,\, 2^{d_v}) - 1} c_{2^{d_v} i + j}\, x^j \left(\frac{x^{2^{d_v}} - x}{\beta_{v_\delta,0}}\right)^i = \sum_{i=0}^{\ell - 1} h_i\, X_{\beta_v, i}(\beta_{v,0}\, x).$$

Proof.
Let $v \in V$ be an internal vertex and $\ell \in \{1, \ldots, 2^{n_v}\}$. Suppose that $h_0, \ldots, h_{\ell-1}, g_0, \ldots, g_{\ell-1}, c_0, \ldots, c_{\ell-1} \in F$ satisfy equations (3.23) and (3.24). Then $\beta_{v,i}/\beta_{v,0} \in F_{2^{d_v}}$ for $i \in \{0, \ldots, d_v - 1\}$, since $(V, E)$ is a reduction tree for $\beta$. Thus, Lemma 2.1 implies that
$$\sum_{i=0}^{\ell-1} h_i\, X_{\beta_v,i}(\beta_{v,0}\, x) = \sum_{i=0}^{\lceil\ell/2^{d_v}\rceil - 1}\left(\sum_{j=0}^{\min(\ell - 2^{d_v} i,\, 2^{d_v}) - 1} h_{2^{d_v} i + j}\, X_{\beta_{v_\alpha},j}(\beta_{v,0}\, x)\right) X_{\beta_{v_\delta},i}\left(x^{2^{d_v}} - x\right).$$
Substituting in $\beta_{v,0} = \beta_{v_\alpha,0}$, (3.23) and (3.24), it follows that
$$\sum_{i=0}^{\ell-1} h_i\, X_{\beta_v,i}(\beta_{v,0}\, x) = \sum_{i=0}^{\lceil\ell/2^{d_v}\rceil - 1}\ \sum_{j=0}^{\min(\ell - 2^{d_v} i,\, 2^{d_v}) - 1} g_{2^{d_v} i + j}\, x^j\, X_{\beta_{v_\delta},i}\left(x^{2^{d_v}} - x\right)$$
$$= \sum_{j=0}^{\min(2^{d_v}, \ell) - 1}\ \sum_{i=0}^{\lceil(\ell - j)/2^{d_v}\rceil - 1} g_{2^{d_v} i + j}\, X_{\beta_{v_\delta},i}\left(\beta_{v_\delta,0}\,\frac{x^{2^{d_v}} - x}{\beta_{v_\delta,0}}\right) x^j = \sum_{j=0}^{\min(2^{d_v}, \ell) - 1}\ \sum_{i=0}^{\lceil(\ell - j)/2^{d_v}\rceil - 1} c_{2^{d_v} i + j}\left(\frac{x^{2^{d_v}} - x}{\beta_{v_\delta,0}}\right)^i x^j.$$
Hence, (3.25) holds. $\square$
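The conversion algorithms below combine this factorisation with the generalised Taylor expansion of Section 3.5 at $T = x^{2^{d_v}} - x$, and its inverse. As a concrete illustration of Algorithms 5 and 6, the following sketch implements both routines over $\mathbb{F}_2$, with field addition realised as XOR; the code and its helper names are our own illustration, not part of the original presentation. For example, over $\mathbb{F}_2$ one has $1 + x + x^2 + x^3 = (1 + x) + x\,(x^2 - x)$, so the Taylor coefficients of $(1, 1, 1, 1)$ at $x^2 - x$ are $(1, 1, 0, 1)$.

```python
def taylor_expansion(t, a):
    # In-place generalised Taylor expansion at x^t - x (sketch of Algorithm 5).
    # a holds the coefficients f_0, ..., f_{l-1} of F over a field of
    # characteristic two (encoded here as ints, with addition realised as XOR);
    # on return it holds the coefficients c_i of the expansion (3.22).
    l = len(a)
    levels = ((l + t - 1) // t - 1).bit_length()  # ceil(log2(ceil(l/t)))
    for k in range(levels - 1, -1, -1):
        block, half = (1 << (k + 1)) * t, (1 << k) * t
        full, rem = l // block, l % block
        for i in range(full):                     # full blocks of length 2^(k+1) t
            base = block * i
            for j in range(half - 1, -1, -1):
                a[base + (1 << k) + j] ^= a[base + half + j]
        base = block * full                       # partial final block, if any
        for j in range(rem - half - 1, -1, -1):
            a[base + (1 << k) + j] ^= a[base + half + j]
    return a


def inverse_taylor_expansion(t, a):
    # Inverse transformation (sketch of Algorithm 6): the same XORs as
    # taylor_expansion, applied in the opposite order.
    l = len(a)
    levels = ((l + t - 1) // t - 1).bit_length()
    for k in range(levels):
        block, half = (1 << (k + 1)) * t, (1 << k) * t
        full, rem = l // block, l % block
        for i in range(full):
            base = block * i
            for j in range(half):
                a[base + (1 << k) + j] ^= a[base + half + j]
        base = block * full
        for j in range(max(rem - half, 0)):
            a[base + (1 << k) + j] ^= a[base + half + j]
    return a
```

Each level $k$ performs the in-place update $a_{2^k t(2i) + 2^k + j} \leftarrow a_{2^k t(2i) + 2^k + j} + a_{2^k t(2i+1) + j}$, so the addition count of Lemma 3.24 applies unchanged.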
Using Lemma 3.25, we obtain Algorithms 7 and 8 for converting between the monomial basis and the twisted basis $\{X_{\beta_v,0}(\beta_{v,0}\,x), \ldots, X_{\beta_v,\ell-1}(\beta_{v,0}\,x)\}$ of $F[x]_\ell$. Each algorithm makes what is now a familiar pattern of recursive calls, but with the addition of the computation of either a generalised Taylor expansion or the inverse transformation, for which the algorithms of Section 3.5 are used.

Theorem 3.26.
Algorithms 7 and 8 are correct.

Proof.
We prove correctness for Algorithm 7 by induction on $\ell$. The proof of correctness for Algorithm 8 is omitted since it is almost identical. For $v \in V$, we have $X_{\beta_v,0}(\beta_{v,0}\,x) = 1$ and $X_{\beta_v,1}(\beta_{v,0}\,x) = x$. Thus, Algorithm 7 produces the correct output for all inputs with $\ell \le 2$. In particular, it follows that the algorithm produces the correct output whenever the input vertex is a leaf. Therefore, it is sufficient to show that for internal $v \in V$, if the algorithm produces the correct

Algorithm 7
X2M$(v, \ell, (a_0, \ldots, a_{\ell-1}))$

Input: a vertex $v \in V$, $\ell \in \{1, \ldots, 2^{n_v}\}$, and $a_i = h_i \in F$ for $i \in \{0, \ldots, \ell - 1\}$.
Output: $a_i = f_i \in F$ for $i \in \{0, \ldots, \ell - 1\}$ such that
$$(3.26)\qquad \sum_{i=0}^{\ell - 1} f_i\, x^i = \sum_{i=0}^{\ell - 1} h_i\, X_{\beta_v, i}(\beta_{v,0}\, x).$$

1: if $\ell \le 2$ then return
2: $\ell_1 \leftarrow \lceil\ell/2^{d_v}\rceil - 1$, $\ell_0 \leftarrow \ell - 2^{d_v}\ell_1$, $\ell' \leftarrow \min(2^{d_v}, \ell)$
3: for $i = 0, \ldots, \ell_1 - 1$ do
4:   X2M$(v_\alpha, 2^{d_v}, (a_{2^{d_v} i}, a_{2^{d_v} i+1}, \ldots, a_{2^{d_v}(i+1)-1}))$
5: X2M$(v_\alpha, \ell_0, (a_{2^{d_v}\ell_1}, a_{2^{d_v}\ell_1+1}, \ldots, a_{\ell-1}))$
6: for $j = 0, \ldots, \ell_0 - 1$ do
7:   X2M$(v_\delta, \ell_1 + 1, (a_j, a_{2^{d_v}+j}, \ldots, a_{2^{d_v}\ell_1+j}))$
8: for $j = \ell_0, \ldots, \ell' - 1$ do
9:   X2M$(v_\delta, \ell_1, (a_j, a_{2^{d_v}+j}, \ldots, a_{2^{d_v}(\ell_1-1)+j}))$
10: if $\ell_1 \neq 0$ and $1/\beta_{v_\delta,0} \neq 1$ then
11:   $w \leftarrow 1/\beta_{v_\delta,0}$
12:   for $i = 1, \ldots, \ell_1 - 1$ do
13:     for $j = 0, \ldots, 2^{d_v} - 1$ do
14:       $a_{2^{d_v} i + j} \leftarrow w\, a_{2^{d_v} i + j}$
15:     $w \leftarrow w/\beta_{v_\delta,0}$
16:   for $j = 0, \ldots, \ell_0 - 1$ do
17:     $a_{2^{d_v}\ell_1 + j} \leftarrow w\, a_{2^{d_v}\ell_1 + j}$
18: InverseTaylorExpansion$(2^{d_v}, \ell, (a_0, a_1, \ldots, a_{\ell-1}))$

output whenever $v_\alpha$ or $v_\delta$ is given as an input, then it produces the correct output whenever $v$ and $\ell \in \{1, \ldots, 2^{n_v}\}$ are given as inputs.

Let $v \in V$ be an internal vertex and suppose that Algorithm 7 produces the correct output whenever $v_\alpha$ or $v_\delta$ is given as an input. Suppose that the algorithm is called on $v$ and $\ell \in \{1, \ldots, 2^{n_v}\}$, with $a_i = h_i \in F$ for $i \in \{0, \ldots, \ell - 1\}$. Then the assumption that Algorithm 7 produces the correct output whenever $v_\alpha$ is given as an input implies that Lines 2–5 of the algorithm set $a_i = g_i$ for $i \in \{0, \ldots, \ell - 1\}$, where $g_0, \ldots, g_{\ell-1}$ are the unique elements in $F$ such that (3.23) holds. Similarly, the assumption implies that Lines 6–9 then set $a_i = c_i$ for $i \in \{0, \ldots, \ell - 1\}$, where $c_0, \ldots, c_{\ell-1}$ are the unique elements in $F$ such that (3.24) holds. As $v$ is an internal vertex, Lemma 3.25 implies that $c_0, \ldots, c_{\ell-1}$ also satisfy (3.25).

Let $f_0, \ldots, f_{\ell-1}$ be the unique elements in $F$ such that (3.26) holds. If $\ell \le 2^{d_v}$, then (3.25) and (3.26) imply that $f_i = c_i$ for $i \in \{0, \ldots, \ell - 1\}$. Moreover, Lines 10–18 have no effect in this case. Therefore, the algorithm produces the correct output if $\ell \le 2^{d_v}$. If $\ell > 2^{d_v}$, then Lines 10–17 set $a_{2^{d_v} i + j} = c_{2^{d_v} i + j}/\beta_{v_\delta,0}^i$ for $i \in \{0, \ldots, \lceil\ell/2^{d_v}\rceil - 1\}$ and $j \in \{0, \ldots, \min(\ell - 2^{d_v} i, 2^{d_v}) - 1\}$. Substituting into (3.25), it follows that
$$\sum_{i=0}^{\lceil\ell/2^{d_v}\rceil - 1}\ \sum_{j=0}^{\min(\ell - 2^{d_v} i,\, 2^{d_v}) - 1} a_{2^{d_v} i + j}\, x^j\left(x^{2^{d_v}} - x\right)^i = \sum_{i=0}^{\ell - 1} h_i\, X_{\beta_v,i}(\beta_{v,0}\, x)$$
Algorithm 8
M2X$(v, \ell, (a_0, \ldots, a_{\ell-1}))$

Input: a vertex $v \in V$, $\ell \in \{1, \ldots, 2^{n_v}\}$, and $a_i = f_i \in F$ for $i \in \{0, \ldots, \ell - 1\}$.
Output: $a_i = h_i \in F$ for $i \in \{0, \ldots, \ell - 1\}$ such that (3.26) holds.

1: if $\ell \le 2$ then return
2: $\ell_1 \leftarrow \lceil\ell/2^{d_v}\rceil - 1$, $\ell_0 \leftarrow \ell - 2^{d_v}\ell_1$, $\ell' \leftarrow \min(2^{d_v}, \ell)$
3: TaylorExpansion$(2^{d_v}, \ell, (a_0, a_1, \ldots, a_{\ell-1}))$
4: if $\ell_1 \neq 0$ and $\beta_{v_\delta,0} \neq 1$ then
5:   $w \leftarrow \beta_{v_\delta,0}$
6:   for $i = 1, \ldots, \ell_1 - 1$ do
7:     for $j = 0, \ldots, 2^{d_v} - 1$ do
8:       $a_{2^{d_v} i + j} \leftarrow w\, a_{2^{d_v} i + j}$
9:     $w \leftarrow \beta_{v_\delta,0}\, w$
10:   for $j = 0, \ldots, \ell_0 - 1$ do
11:     $a_{2^{d_v}\ell_1 + j} \leftarrow w\, a_{2^{d_v}\ell_1 + j}$
12: for $j = 0, \ldots, \ell_0 - 1$ do
13:   M2X$(v_\delta, \ell_1 + 1, (a_j, a_{2^{d_v}+j}, \ldots, a_{2^{d_v}\ell_1+j}))$
14: for $j = \ell_0, \ldots, \ell' - 1$ do
15:   M2X$(v_\delta, \ell_1, (a_j, a_{2^{d_v}+j}, \ldots, a_{2^{d_v}(\ell_1-1)+j}))$
16: for $i = 0, \ldots, \ell_1 - 1$ do
17:   M2X$(v_\alpha, 2^{d_v}, (a_{2^{d_v} i}, a_{2^{d_v} i+1}, \ldots, a_{2^{d_v}(i+1)-1}))$
18: M2X$(v_\alpha, \ell_0, (a_{2^{d_v}\ell_1}, a_{2^{d_v}\ell_1+1}, \ldots, a_{\ell-1}))$

when InverseTaylorExpansion is called in Line 18. Thus, the algorithm produces the correct output if $\ell > 2^{d_v}$. Hence, for internal $v \in V$, if the algorithm produces the correct output whenever $v_\alpha$ or $v_\delta$ is given as an input, then it produces the correct output whenever $v$ and $\ell \in \{1, \ldots, 2^{n_v}\}$ are given as inputs. $\square$

Algorithm 8 requires the precomputation and storage of the elements $\beta_{v_\delta,0}$, while their inverses are required for Algorithm 7. Consequently, the algorithms require auxiliary storage for $O(n)$ field elements, while all precomputations can be performed with $O(n)$ field operations.
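When $\beta$ is a Cantor basis, the twisting constants are all equal to one and Algorithms 7 and 8 reduce to recursive calls combined with (inverse) generalised Taylor expansions. The following self-contained Python sketch mirrors this multiplication-free case over $\mathbb{F}_2$ for full length $\ell = 2^n$, using the reduction tree with $d_v = 2^{\lceil\log_2 n_v\rceil - 1}$. The function names are ours, and the sketch is an illustration of the structure of the algorithms under these assumptions, not a drop-in implementation for general bases.

```python
def _taylor(t, a, forward):
    # Generalised Taylor expansion at x^t - x over GF(2) (forward=True),
    # or its inverse (forward=False), in place on the bit list a.
    l = len(a)
    levels = ((l + t - 1) // t - 1).bit_length()
    for k in (range(levels - 1, -1, -1) if forward else range(levels)):
        block, half = (1 << (k + 1)) * t, (1 << k) * t
        spans = [(block * i, half) for i in range(l // block)]
        spans.append((block * (l // block), max(l % block - half, 0)))
        for base, width in spans:
            for j in (range(width - 1, -1, -1) if forward else range(width)):
                a[base + (1 << k) + j] ^= a[base + half + j]
    return a


def m2x(a, n):
    # Monomial -> LCH coefficients (sketch of Algorithm 8, Cantor-basis case)
    # for a vector of length 2**n of GF(2) coefficients.
    if n <= 1:
        return a                                # X_0 = 1 and X_1 = x
    d = 1 << ((n - 1).bit_length() - 1)         # d_v = 2^(ceil(log2 n) - 1)
    t = 1 << d
    _taylor(t, a, forward=True)                 # Taylor expansion at x^(2^d) - x
    for j in range(t):                          # recurse on strided columns (v_delta)
        a[j::t] = m2x(a[j::t], n - d)
    for i in range(1 << (n - d)):               # recurse on contiguous rows (v_alpha)
        a[t * i:t * (i + 1)] = m2x(a[t * i:t * (i + 1)], d)
    return a


def x2m(a, n):
    # LCH -> monomial coefficients (sketch of Algorithm 7): inverts m2x by
    # applying the inverse steps in the opposite order.
    if n <= 1:
        return a
    d = 1 << ((n - 1).bit_length() - 1)
    t = 1 << d
    for i in range(1 << (n - d)):
        a[t * i:t * (i + 1)] = x2m(a[t * i:t * (i + 1)], d)
    for j in range(t):
        a[j::t] = x2m(a[j::t], n - d)
    _taylor(t, a, forward=False)                # inverse Taylor expansion
    return a
```

Under this all-ones normalisation one finds, for $n = 2$, that $X_2(x) = x^2 + x$ and $X_3(x) = x^3 + x^2$, and the roundtrip x2m(m2x(a, n), n) == a holds for every input, which gives a cheap internal consistency check on the recursion.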
If $\ell_1 = \lceil\ell/2^{d_v}\rceil - 1 \neq 0$, then Lines 10–17 of Algorithm 7, or Lines 4–11 of Algorithm 8, perform at most $(\ell_1 - 1)\left(2^{d_v} + 1\right) + \ell_0 = \ell + \lceil\ell/2^{d_v}\rceil - 2^{d_v} - 2$ multiplications, while TaylorExpansion or InverseTaylorExpansion performs at most $\lfloor\ell/2\rfloor\left\lceil\log_2\lceil\ell/2^{d_v}\rceil\right\rceil$ additions. It follows that we should once again aim to avoid small values of $d_v$ when choosing a reduction tree for the algorithms. However, compared to the algorithms for conversion between the LCH and the Newton and Lagrange bases, a much greater cost in terms of multiplications and additions is incurred if one fails to do so. If $\beta$ is a Cantor basis, then $\beta_{v_\delta,0} = 1$ for all internal $v \in V$, regardless of the choice of reduction tree (see Remark 3.1). Thus, Algorithms 7 and 8 perform no multiplications in this case, and require no precomputations.

Lin et al. [20] provide two algorithms for converting from the monomial basis to the LCH basis when $\ell$ is a power of two, one for arbitrary bases and one for Cantor bases. The reduction strategy they apply for the arbitrary bases corresponds to reduction trees with $\operatorname{Im}(d) \subseteq \{1, 2\}$. For such reduction trees, Algorithm 8 performs the same number of additions as their algorithm, but fewer multiplications in the recursive case (after equalising precomputations). For Cantor bases, Algorithm 8 reduces to their algorithm by choosing the reduction tree so that $d_v = 2^{\lceil\log_2 n_v\rceil - 1}$ for all internal $v \in V$.

Theorem 3.27.
Algorithms 7 and 8 perform at most ⌊ℓ/2⌋(3⌈log₂ ℓ⌉ − 4) + 1 multiplications and ⌊ℓ/2⌋ \binom{⌈log₂ ℓ⌉ + 1}{2} additions in F. If d_v = 2^{⌈log₂ n_v⌉−1} for all internal v ∈ V, then the algorithms perform at most ⌊ℓ/2⌋⌈log₂ ℓ⌉⌈log₂ log₂ max(ℓ, 4)⌉ additions in F. If β is a Cantor basis, then the algorithms perform no multiplications.

We have already shown that Algorithms 7 and 8 perform no multiplications when β is a Cantor basis. We split the remainder of the proof of Theorem 3.27 into three lemmas, one for each of the three remaining bounds. It is clear that Algorithms 7 and 8 perform the same number of multiplications when given identical inputs. Consequently, we only prove the bounds for Algorithm 7. All three bounds are equal to zero or one for ℓ ≤ 2, while Algorithm 7 performs no additions or multiplications for such input values of ℓ. In particular, it follows that all three bounds hold if the input vertex is a leaf. Consequently, for each of the three bounds it is sufficient to show that if v ∈ V is an internal vertex such that the bound holds whenever the input vertex is v_α or v_δ, then the bound holds whenever the input vertex is v and ℓ ∈ {1, . . . , 2^{n_v}}.

Lemma 3.28.
Algorithms 7 and 8 perform at most ⌊ℓ/2⌋(3⌈log₂ ℓ⌉ − 4) + 1 multiplications in F.

Proof. Suppose that for some internal vertex v ∈ V, Algorithm 7 performs at most ⌊ℓ/2⌋(3⌈log₂ ℓ⌉ − 4) + 1 multiplications in F whenever v_α or v_δ is given as the input vertex. Furthermore, suppose that v and ℓ ∈ {1, . . . , 2^{n_v}} are given as inputs to the algorithm. If ℓ_1 = 0, then ℓ_2 = ℓ′ = ℓ, and the algorithm performs at most

⌊ℓ/2⌋(3⌈log₂ ℓ⌉ − 4) + 1 + ℓ_2 × 0 = ⌊ℓ/2⌋(3⌈log₂ ℓ⌉ − 4) + 1

multiplications. Therefore, suppose that ℓ_1 ≠ 0. Then, as ℓ_2 ≤ 2^{d_v}, Lines 3–5 of the algorithm perform at most

ℓ_1 (2^{d_v−1}(3d_v − 4) + 1) + ⌊ℓ_2/2⌋(3⌈log₂ ℓ_2⌉ − 4) + 1 ≤ ⌊ℓ/2⌋(3d_v − 4) + ℓ_1 + 1

multiplications. Lines 6–9 perform at most

ℓ_2 ⌊(ℓ_1 + 1)/2⌋(3⌈log₂(ℓ_1 + 1)⌉ − 4) + (2^{d_v} − ℓ_2)⌊ℓ_1/2⌋(3⌈log₂ ℓ_1⌉ − 4) + 2^{d_v}
  ≤ ⌊(ℓ_2(ℓ_1 + 1) + (2^{d_v} − ℓ_2)ℓ_1)/2⌋(3⌈log₂(ℓ_1 + 1)⌉ − 4) + 2^{d_v}
  = ⌊ℓ/2⌋(3⌈log₂ ℓ⌉ − 3d_v − 4) + 2^{d_v}

multiplications, Lines 10–17 perform (ℓ_1 − 1)(2^{d_v} + 1) + ℓ_2 multiplications, and Line 18 performs no multiplications. Summing these bounds, it follows that Algorithm 7 performs at most

⌊ℓ/2⌋(3⌈log₂ ℓ⌉ − 4) + 1 − 4⌊ℓ/2⌋ + (2^{d_v} + 2)ℓ_1 + ℓ_2 − 1
  ≤ ⌊ℓ/2⌋(3⌈log₂ ℓ⌉ − 4) + 1 − 2ℓ + (2^{d_v} + 2)ℓ_1 + ℓ_2 + 1
  = ⌊ℓ/2⌋(3⌈log₂ ℓ⌉ − 4) + 1 − (2^{d_v} − 2)ℓ_1 − (ℓ_2 − 1)
  ≤ ⌊ℓ/2⌋(3⌈log₂ ℓ⌉ − 4) + 1

multiplications. ∎
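The bookkeeping in the inductive step above can be sanity-checked numerically. In the sketch below (helper names are ours), B is the claimed bound ⌊x/2⌋(3⌈log₂ x⌉ − 4) + 1, extended by B(1) = 1 as in the proof's accounting, and inductive_step totals the per-line bounds used for a vertex with block size 2^d.

```python
from math import ceil, log2

def B(x):
    """Claimed multiplication bound: floor(x/2)(3*ceil(log2 x) - 4) + 1."""
    return (x // 2) * (3 * ceil(log2(x)) - 4) + 1 if x >= 2 else x

def inductive_step(l, d):
    """Sum of the per-line bounds from the proof of Lemma 3.28 at a vertex
    with block size 2^d, for an input length l > 2^d (so that l_1 != 0)."""
    l1 = -(-l // 2 ** d) - 1                  # ceil(l / 2^d) - 1
    l2 = l - 2 ** d * l1
    lines_3_5 = l1 * B(2 ** d) + B(l2)        # recursive calls on v_alpha
    lines_6_9 = l2 * B(l1 + 1) + (2 ** d - l2) * B(l1)  # calls on v_delta
    lines_10_17 = (l1 - 1) * (2 ** d + 1) + l2          # multiplications
    return lines_3_5 + lines_6_9 + lines_10_17
```

Exhaustively checking small parameters confirms that the total never exceeds B(ℓ), and that the bound is attained (e.g. at d = 1, ℓ = 63), so the constant in Theorem 3.27 cannot be improved by this argument alone.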
Lemma 3.29.
Algorithms 7 and 8 perform at most ⌊ℓ/2⌋ \binom{⌈log₂ ℓ⌉ + 1}{2} additions in F.

Proof. Suppose that for some internal vertex v ∈ V, Algorithm 7 performs at most ⌊ℓ/2⌋ \binom{⌈log₂ ℓ⌉ + 1}{2} additions in F whenever v_α or v_δ is given as the input vertex. Furthermore, suppose that v and ℓ ∈ {1, . . . , 2^{n_v}} are given as inputs to the algorithm. If ℓ_1 = 0, then ℓ_2 = ℓ′ = ℓ, and the algorithm performs at most

⌊ℓ/2⌋ \binom{⌈log₂ ℓ⌉ + 1}{2} + ℓ_2 × 0 = ⌊ℓ/2⌋ \binom{⌈log₂ ℓ⌉ + 1}{2}

additions. Therefore, suppose that ℓ_1 > 0. Then, as ℓ_2 ≤ 2^{d_v}, Lines 3–5 of the algorithm perform at most

ℓ_1 2^{d_v−1} \binom{d_v + 1}{2} + ⌊ℓ_2/2⌋ \binom{⌈log₂ ℓ_2⌉ + 1}{2} ≤ ℓ_1 2^{d_v−1} \binom{d_v + 1}{2} + ⌊ℓ_2/2⌋ \binom{d_v + 1}{2} = ⌊ℓ/2⌋ \binom{d_v + 1}{2}

additions. Lines 6–9 of the algorithm perform at most

ℓ_2 ⌊(ℓ_1 + 1)/2⌋ \binom{⌈log₂(ℓ_1 + 1)⌉ + 1}{2} + (2^{d_v} − ℓ_2)⌊ℓ_1/2⌋ \binom{⌈log₂ ℓ_1⌉ + 1}{2} ≤ ⌊ℓ/2⌋ \binom{⌈log₂(ℓ_1 + 1)⌉ + 1}{2}

additions, since ℓ_2(ℓ_1 + 1) + (2^{d_v} − ℓ_2)ℓ_1 = ℓ. Lines 10–17 perform no additions, while Lemma 3.24 implies that Line 18 performs at most ⌊ℓ/2⌋⌈log₂⌈ℓ/2^{d_v}⌉⌉ = ⌊ℓ/2⌋⌈log₂(ℓ_1 + 1)⌉ additions. As ⌈log₂(ℓ_1 + 1)⌉ = ⌈log₂ ℓ⌉ − d_v, it follows by summing these bounds that Algorithm 7 performs at most

⌊ℓ/2⌋ ( \binom{⌈log₂ ℓ⌉ + 1}{2} − ⌈log₂(ℓ_1 + 1)⌉(d_v − 1) ) ≤ ⌊ℓ/2⌋ \binom{⌈log₂ ℓ⌉ + 1}{2}

additions. ∎

Lemma 3.30.
Suppose that d_v = 2^{⌈log₂ n_v⌉−1} for all internal v ∈ V. Then Algorithms 7 and 8 perform at most ⌊ℓ/2⌋⌈log₂ ℓ⌉⌈log₂ log₂ max(ℓ, 4)⌉ additions in F.

Proof. Suppose that d_v = 2^{⌈log₂ n_v⌉−1} for all internal vertices v ∈ V. Furthermore, suppose that for some internal vertex v ∈ V, Algorithm 7 performs at most ⌊ℓ/2⌋⌈log₂ ℓ⌉⌈log₂ log₂ max(ℓ, 4)⌉ additions in F whenever v_α or v_δ is given as the input vertex. Finally, suppose that v and ℓ ∈ {1, . . . , 2^{n_v}} are given as inputs to the algorithm. If ℓ_1 = 0, then ℓ_2 = ℓ′ = ℓ, and the algorithm performs at most

⌊ℓ/2⌋⌈log₂ ℓ⌉⌈log₂ log₂ max(ℓ, 4)⌉ + ℓ_2 × 0 = ⌊ℓ/2⌋⌈log₂ ℓ⌉⌈log₂ log₂ max(ℓ, 4)⌉

additions. Therefore, suppose that ℓ_1 > 0. Then, as ℓ_2 ≤ 2^{d_v} < ℓ, Lines 3–5 of the algorithm perform at most

(3.27)  ℓ_1 2^{d_v−1} d_v ⌈log₂ log₂ ℓ⌉ + ⌊ℓ_2/2⌋ d_v ⌈log₂ log₂ ℓ⌉ = ⌊ℓ/2⌋ d_v ⌈log₂ log₂ ℓ⌉

additions. Lines 6–7 of the algorithm perform at most

ℓ_2 ⌊(ℓ_1 + 1)/2⌋⌈log₂(ℓ_1 + 1)⌉⌈log₂ log₂(ℓ_1 + 1)⌉

additions, while Lines 8–9 perform at most

(2^{d_v} − ℓ_2)⌊ℓ_1/2⌋⌈log₂ ℓ_1⌉⌈log₂ log₂ max(ℓ_1, 4)⌉

additions. As ℓ_2(ℓ_1 + 1) + (2^{d_v} − ℓ_2)ℓ_1 = ℓ, it follows that Lines 6–9 of the algorithm perform at most ⌊ℓ/2⌋⌈log₂(ℓ_1 + 1)⌉⌈log₂ log₂(ℓ_1 + 1)⌉ additions. If ℓ_1 ≥ 2, then there exists an integer k ≥ 1 such that 2^{2^{k−1}} < ℓ_1 + 1 ≤ 2^{2^k}. Then ⌈log₂ log₂(ℓ_1 + 1)⌉ = k, 2^{k−1} < ⌈log₂(ℓ_1 + 1)⌉ ≤ n_v − d_v ≤ d_v and ℓ = 2^{d_v}(ℓ_1 + ℓ_2/2^{d_v}) > 2^{d_v + 2^{k−1}} ≥ 2^{2^k}. Thus, ⌈log₂ log₂(ℓ_1 + 1)⌉ ≤ ⌈log₂ log₂ ℓ⌉ − 1 if ℓ_1 ≥ 2. As ℓ ≥ 3, the inequality also holds if ℓ_1 = 1. Therefore, Lines 6–9 of the algorithm perform at most

(3.28)  ⌊ℓ/2⌋(⌈log₂ ℓ⌉ − d_v)⌈log₂ log₂ ℓ⌉ − ⌊ℓ/2⌋⌈log₂(ℓ_1 + 1)⌉

additions. Lines 10–17 of the algorithm perform no additions, while Lemma 3.24 implies that Line 18 performs at most ⌊ℓ/2⌋⌈log₂⌈ℓ/2^{d_v}⌉⌉ = ⌊ℓ/2⌋⌈log₂(ℓ_1 + 1)⌉ additions. By combining this last bound with the bounds (3.27) and (3.28) on the number of additions performed by Lines 3–5 and Lines 6–9, it follows that Algorithm 7 performs at most ⌊ℓ/2⌋⌈log₂ ℓ⌉⌈log₂ log₂ ℓ⌉ additions, which is the required bound since ℓ ≥ 3. ∎

Constructing a basis and reduction tree
Let β = (β_0, . . . , β_{n−1}) ∈ F^n have entries that are linearly independent over F_2. If n = 1, then there exists a unique reduction tree for β, the tree consisting of a single vertex. If n > 1, then a full binary tree is a reduction tree for β if it has n leaves, the subtrees rooted on the children r_α and r_δ of the tree's root vertex r are themselves reduction trees for α(β, d(r)) and δ(β, d(r)), respectively, and the quotients β_1/β_0, . . . , β_{d(r)−1}/β_0 belong to F_{2^{d(r)}}. The requirement on the quotients is trivially satisfied if d(r) = 1. Consequently, the full binary tree with n leaves and Im(d) ⊆ {0, 1} is a reduction tree for β (Proposition 2.7). We view this tree as the trivial choice of reduction tree for the basis, and as capturing the approach used by existing algorithms. We expect such trees to yield the worst algebraic complexity for the algorithms of Section 3. Accordingly, when we have freedom to choose the basis vector, our choice should enable us to avoid their use. Cantor bases provide such a choice, and we witnessed the benefits they provide in Section 3. However, Cantor bases are restricted to extensions with degree divisible by sufficiently large powers of two. In this section, we propose new basis constructions that allow us to benefit similarly in other extensions.

Regardless of the chosen basis vector, the choice of reduction trees is limited by the subfield structure of F.
Proposition 4.1. If (V, E) is a reduction tree for some vector in F^n, then d(v_α) < d(v) < n ≤ [F : F_2] and max(d(v_α), 1) | d(v) | [F : F_2] for all internal v ∈ V.

Proof. Suppose that (V, E) is a reduction tree for some vector β ∈ F^n. Then n ≤ [F : F_2], since Definition 2.6 requires β to have linearly independent entries over F_2. The definition also requires the tree to have n leaves. Thus,

d(v_α) < |L_{v_α}| = d(v) < |L_v| ≤ n ≤ [F : F_2]

for all internal v ∈ V.

Let v ∈ V be an internal vertex. Then it follows from Definition 2.6 that the subtree rooted on v is a reduction tree for some vector β_v = (β_{v,0}, . . . , β_{v,|L_v|−1}) ∈ F^{|L_v|} that has linearly independent entries over F_2. Moreover, as |L_v| > 1, the definition implies that the quotients β_{v,1}/β_{v,0}, . . . , β_{v,d(v)−1}/β_{v,0} belong to F_{2^{d(v)}}. Together with β_{v,0}/β_{v,0} = 1, these quotients inherit linear independence over F_2. Thus, they form a basis of the extension F_{2^{d(v)}}/F_2. As the quotients also belong to F, it follows that F_{2^{d(v)}} is a subfield of F. Therefore, d(v) divides [F : F_2]. Similarly, if v_α is an internal vertex, then F_{2^{d(v_α)}} is a subfield of F_{2^{d(v)}}, since the subtree rooted on v_α is a reduction tree for α(β_v, d(v)) = (β_{v,0}, . . . , β_{v,d(v)−1}). As v_α is an internal vertex if and only if d(v_α) ≥ 1, it follows that max(d(v_α), 1) divides d(v). ∎

Corollary 4.2.
Suppose that the entries of β ∈ F^n are linearly independent over F_2, and that [F : F_2] has no proper factor less than n. Then a full binary tree is a reduction tree for β if and only if it has n leaves and Im(d) ⊆ {0, 1}.

Proof. Proposition 2.7 implies that it is sufficient to have n leaves and Im(d) ⊆ {0, 1}, while Proposition 4.1 implies that it is also necessary. ∎

A construction for arbitrary fields.
It follows from Proposition 4.1 that a path in a reduction tree that consists of two or more edges of the form {v, v_α} admits a nontrivial tower of subfields of F. However, the existence of a basis vector of a prescribed dimension that has a nontrivial reduction tree is not guaranteed by the existence of a nontrivial tower of subfields. Indeed, Corollary 4.2 shows that it is necessary for the tower to contain a subfield other than F_2 of degree bounded by the dimension. In this section, we show that this requirement is also sufficient.

Theorem 4.3.
Suppose there exists a tower of subfields

(4.1)  F_2 = F_{2^{n_0}} ⊂ F_{2^{n_1}} ⊂ · · · ⊂ F_{2^{n_m}} = F.

Let {ϑ_{k,0}, . . . , ϑ_{k,n_{k+1}/n_k−1}} be a basis of F_{2^{n_{k+1}}}/F_{2^{n_k}} for k ∈ {0, . . . , m − 1}, and

β_i = ∏_{k=0}^{m−1} ϑ_{k,i_k}  such that  ∑_{k=0}^{m−1} i_k n_k = i  with  i_k ∈ {0, . . . , n_{k+1}/n_k − 1},

for i ∈ {0, . . . , n_m − 1}. Then β_0, . . . , β_{n_m−1} ∈ F are linearly independent over F_2. Moreover, a full binary tree (V, E) with n ≤ n_m leaves is a reduction tree for (β_0, . . . , β_{n−1}) if Im(d) ⊆ {0, n_0, . . . , n_{m−1}} and d(v_δ) ≤ d(v) for all internal v ∈ V.

The requirements of Theorem 4.3 are satisfied by the full binary tree with n leaves and Im(d) ⊆ {0, 1}. Consequently, Proposition 2.7 follows from the case m = 1. We delay the proof of the theorem until the end of the section. Instead, we now show that the basis vectors given by the construction of the theorem allow a nontrivial and, more importantly, beneficial choice of reduction trees.

Proposition 4.4. If I ⊆ N and (V, E) is a full binary tree such that d(v) = max{i ∈ I | i < |L_v|} for all internal v ∈ V, then d(v_δ) ≤ d(v) for all internal v ∈ V.

Proof. Suppose that I ⊆ N and a full binary tree (V, E) satisfy the conditions of the proposition. Then d(v_δ) < |L_{v_δ}| < |L_v| and d(v_δ) ∈ I ∪ {0} for all internal v ∈ V. Hence, d(v_δ) ≤ max{i ∈ I | i < |L_v|} = d(v) for all internal v ∈ V. ∎

For tuples of positive integers (n_0, . . . , n_m) such that (4.1) holds, let T_n^{(n_0,...,n_m)} denote the full binary tree with n leaves and d(v) = max{n_k | n_k < |L_v|} for all internal vertices v. If n_1 < n ≤ n_m, then the root vertex r of T_n^{(n_0,...,n_m)} satisfies d(r) ≥ n_1 > 1, establishing the existence of a tree with Im(d) ⊄ {0, 1} that satisfies the conditions of Theorem 4.3. Moreover, we expect this tree to approximately minimise the algebraic complexity of the conversion algorithms of Section 3 over all trees that satisfy the conditions of the theorem.

Example 4.5.
Suppose that F = F_{2^{12}}. Then there are eight tuples of positive integers (n_0, . . . , n_m) such that (4.1) holds. For each such tuple, Figure 6 displays the relative number of additions performed by the basis conversion algorithms of Section 3 for β = (β_0, . . . , β_{11}) given by the construction of Theorem 4.3 (the choice of the bases for the extensions F_{2^{n_{k+1}}}/F_{2^{n_k}} does not matter here), the reduction tree T_{12}^{(n_0,...,n_m)}, and polynomial length ℓ ranging over {1, . . . , 2^{12}}. The number of additions performed in each case is given as a fraction of the number performed for the tuple (1, 12), since the corresponding tree has Im(d) = {0, 1}. Thus, it represents the complexity obtained with the reduction strategy of existing algorithms. The additional parameters c = ℓ and b = 0 are used for Algorithm 3, and c = ℓ is used for Algorithm 4. Figure 7 similarly displays the relative number of multiplications performed by Algorithms 7 and 8, under the assumption that β_{v_δ,0} is never equal to one in Line 10 of Algorithm 7 and Line 4 of Algorithm 8. The daggered tuples that appear in the figure are discussed in the next section.

Example 4.5 demonstrates that the construction of Theorem 4.3, and the choice of reduction trees it provides, allows us to achieve a lower algebraic complexity if [F : F_2] has even a single sufficiently small factor. The potential benefits are greater still when the degree of the field contains many small prime factors, echoing the benefits obtained by using roots of unity with smooth order in multiplicative FFTs. While a reduction in algebraic complexity is certainly desirable, it is not the only consideration in practice. For example, Harvey's "cache-friendly" variant [15] of the radix-2 truncated Fourier transform [25] obtains better practical performance by optimising cache effects. This variant employs a reduction strategy that more rapidly reduces to problems of a size that fits into cache, helping to reduce data exchanges with RAM. An analogous approach for the algorithms of Section 3 is to use reduction trees that balance the sizes of |L_{v_α}| and |L_{v_δ}| for internal vertices. If [F : F_2] is smooth and the construction of Theorem 4.3 is applied with the number of subfields in the tower taken as large as possible, then the following proposition shows that it is possible to construct such trees while meeting the requirements of
Figure 6.
Relative number of additions performed by the algorithms of Section 3 (see Example 4.5).

the theorem by choosing d(v) ∈ {n_0, . . . , n_{m−1}} to minimise max(n_k, |L_v| − n_k) for all internal vertices. While there are trade-offs between optimising cache effects and reducing the operation count to be considered in practice, it appears that Theorem 4.3 offers us some freedom, especially when the degree of the field is smooth, to tune the algorithms of Section 3.

Proposition 4.6. If I ⊆ N and (V, E) is a full binary tree such that d(v) ∈ arg min_{i∈I} max(i, |L_v| − i) for all internal v ∈ V, then d(v_δ) ≤ d(v) for all internal v ∈ V.

Proof. We prove the proposition by contradiction. Suppose that I ⊆ N and a full binary tree (V, E) satisfy the conditions of the proposition. Furthermore, suppose there exists an internal vertex v ∈ V such that d(v_δ) > d(v). Then v_δ is an internal

Figure 7.
Relative number of multiplications performed by Algorithms 7 and 8 (see Example 4.5).

vertex, since d(v_δ) > 1. Thus, d(v_δ) ∈ I and

max(d(v_δ), |L_v| − d(v_δ)) < max(|L_{v_δ}|, |L_v| − d(v)) = max(|L_v| − |L_{v_α}|, |L_v| − d(v)) = max(|L_v| − d(v), |L_v| − d(v)) ≤ max(d(v), |L_v| − d(v)),

which contradicts the minimality of max(d(v), |L_v| − d(v)). ∎

We now turn our attention to the proof of Theorem 4.3, which we obtain as a consequence of the following more general result.
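Before that proof, note that the two tree-shaping rules just described — taking d(v) as large as possible below |L_v| (Proposition 4.4) or balancing the two subtrees (Proposition 4.6) — are easy to prototype. The sketch below uses a hypothetical nested-tuple representation of full binary trees; only leaf counts and the function d are modelled, not the associated basis vectors.

```python
def build_tree(n, sizes, rule):
    """Full binary tree with n leaves; d(v) = |L_{v_alpha}| is drawn from
    `sizes` (e.g. a divisor chain n_0, ..., n_{m-1}) according to `rule`."""
    if n == 1:
        return ('leaf',)
    candidates = [s for s in sizes if 0 < s < n]
    if rule == 'greedy':                       # rule of Proposition 4.4
        d = max(candidates)
    else:                                      # balanced rule of Proposition 4.6
        d = min(candidates, key=lambda s: max(s, n - s))
    return ('node', d, build_tree(d, sizes, rule), build_tree(n - d, sizes, rule))

def check_delta_monotone(tree):
    """Verify d(v_delta) <= d(v) for every internal vertex (leaves have d = 0)."""
    if tree[0] == 'leaf':
        return True
    _, d, left, right = tree
    d_right = right[1] if right[0] == 'node' else 0
    return d_right <= d and check_delta_monotone(left) and check_delta_monotone(right)
```

Running both rules over small leaf counts confirms empirically that each produces trees satisfying the condition d(v_δ) ≤ d(v) required by Theorem 4.3.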
Lemma 4.7.
Suppose there exists a tower of subfields F_2 = F_{2^{n_0}} ⊂ · · · ⊂ F_{2^{n_m}} = F. Let β = (β_0, . . . , β_{n−1}) ∈ F^n have entries that are linearly independent over F_2, and let (V, E) be a full binary tree with n leaves and root vertex r ∈ V. Then (V, E) is a reduction tree for β if the following conditions are satisfied:

(1) Im(d) ⊆ {0, n_0, . . . , n_{m−1}},
(2) d(v_δ) ≤ d(v) for all internal v ∈ V, and
(3) β_i/β_{n_k⌊i/n_k⌋} ∈ F_{2^{n_k}} for i ∈ {0, . . . , n − 1} and k ∈ {0, . . . , m − 1} such that n_k ≤ d(r).

Proof. We prove the lemma by induction on n. The lemma holds trivially if n = 1, since it is sufficient for (V, E) to have n leaves in this case. Therefore, let n ≥ 2, and suppose that the lemma holds for all smaller values of n. Suppose that β = (β_0, . . . , β_{n−1}) ∈ F^n and a full binary tree (V, E) satisfy the conditions of the lemma. Then the root vertex r ∈ V of the tree is not a leaf, since |L_r| = n ≥ 2. Moreover, (1) implies that d(r) = n_ℓ for some ℓ ∈ {0, . . . , m − 1} such that n_ℓ < n.

Let α(β, d(r)) = (α_0, . . . , α_{n_ℓ−1}) and δ(β, d(r)) = (δ_0, . . . , δ_{n−n_ℓ−1}), which have linearly independent entries over F_2 by Lemma 2.1. Then, as d(r_α) < d(r), (3) implies that α_i/α_{n_k⌊i/n_k⌋} = β_i/β_{n_k⌊i/n_k⌋} ∈ F_{2^{n_k}} for i ∈ {0, . . . , n_ℓ − 1} and k ∈ {0, . . . , m − 1} such that n_k ≤ d(r_α). Moreover, as n_0, . . . , n_ℓ divide n_ℓ, we have

β_{n_ℓ+i}/β_0 = (β_{n_ℓ+i}/β_{n_ℓ+n_k⌊i/n_k⌋})(β_{n_ℓ+n_k⌊i/n_k⌋}/β_0) = (β_{n_ℓ+i}/β_{n_k⌊(n_ℓ+i)/n_k⌋})(β_{n_ℓ+n_k⌊i/n_k⌋}/β_0),

where β_{n_ℓ+i}/β_{n_k⌊(n_ℓ+i)/n_k⌋} ∈ F_{2^{n_k}} ⊆ F_{2^{n_ℓ}}, for i ∈ {0, . . . , n − n_ℓ − 1} and k ∈ {0, . . . , ℓ}. It follows that

δ_i = (β_{n_ℓ+i}/β_{n_k⌊(n_ℓ+i)/n_k⌋}) ((β_{n_ℓ+n_k⌊i/n_k⌋}/β_0)^{2^{n_ℓ}} − β_{n_ℓ+n_k⌊i/n_k⌋}/β_0) = (β_{n_ℓ+i}/β_{n_k⌊(n_ℓ+i)/n_k⌋}) δ_{n_k⌊i/n_k⌋}

for i ∈ {0, . . . , n − n_ℓ − 1} and k ∈ {0, . . . , ℓ}. As d(r_δ) ≤ d(r) = n_ℓ by (2), it follows that δ_i/δ_{n_k⌊i/n_k⌋} ∈ F_{2^{n_k}} for i ∈ {0, . . . , n − n_ℓ − 1} and k ∈ {0, . . . , m − 1} such that n_k ≤ d(r_δ). Conditions (1) and (2) are satisfied by the subtrees of (V, E) rooted on r_α and r_δ through inheritance. The subtree rooted on r_α has |L_{r_α}| = d(r) = n_ℓ < n leaves, while the subtree rooted on r_δ has |L_{r_δ}| = |L_r| − |L_{r_α}| = n − n_ℓ < n leaves. Therefore, the induction hypothesis implies that the subtree rooted on r_α is a reduction tree for α(β, d(r)), and the subtree rooted on r_δ is a reduction tree for δ(β, d(r)). Finally, (3) implies that β_i/β_0 = β_i/β_{n_ℓ⌊i/n_ℓ⌋} ∈ F_{2^{d(r)}} for i ∈ {0, . . . , d(r) − 1}. Hence, (V, E) is a reduction tree for β. ∎
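The product defining β_i in Theorem 4.3 is driven by the mixed-radix expansion i = ∑_k i_k n_k. A small sketch (function names are our own; the field elements ϑ_{k,i_k} are left symbolic as index pairs) makes the bookkeeping concrete:

```python
def mixed_radix_digits(i, chain):
    """Digits (i_0, ..., i_{m-1}) of i with respect to the divisor chain
    (n_0, ..., n_m), n_0 = 1: i = sum(i_k * n_k) with
    0 <= i_k < n_{k+1}/n_k, as in the construction of Theorem 4.3."""
    digits = []
    for k in range(len(chain) - 1):
        radix = chain[k + 1] // chain[k]
        digits.append((i // chain[k]) % radix)
    return tuple(digits)

def beta_factors(i, chain):
    """beta_i = prod_k theta[k][i_k], returned symbolically as (k, i_k) pairs."""
    return tuple((k, ik) for k, ik in enumerate(mixed_radix_digits(i, chain)))
```

Since the digit map is a bijection onto the product of the digit ranges, the n_m products β_i are pairwise distinct monomials in the ϑ_{k,j}, which is the combinatorial backbone of the linear-independence claim proved in Lemma 4.8 below.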
4, then properties (1)and (2) of Lemma 2.9 imply that β ∈ F \ F . Thus, β /β does not belong to F ,since β = β /β + 1. Consequently, Proposition 2.8 implies that the converseof Lemma 4.7 does not hold. We now complete the proof of Theorem 4.3 byestablishing linear independence and showing that the vectors ( β , . . . , β n − ) for n ∈ { , . . . , n m } always satisfy the third condition of Lemma 4.7. Lemma 4.8.
Suppose that β_0, . . . , β_{n_m−1} are given by the construction of Theorem 4.3. Then β_0, . . . , β_{n_m−1} are linearly independent over F_2, and β_i/β_{n_k⌊i/n_k⌋} ∈ F_{2^{n_k}} for i ∈ {0, . . . , n_m − 1} and k ∈ {0, . . . , m − 1}.

Proof. Suppose that β_0, . . . , β_{n_m−1} are given by the construction of Theorem 4.3. Let i ∈ {0, . . . , n_m − 1} and i = ∑_{k=0}^{m−1} i_k n_k with i_k ∈ {0, . . . , n_{k+1}/n_k − 1} for k ∈ {0, . . . , m − 1}. Then

β_i/β_{n_k⌊i/n_k⌋} = (ϑ_{0,i_0} · · · ϑ_{k−1,i_{k−1}} ϑ_{k,i_k} · · · ϑ_{m−1,i_{m−1}})/(ϑ_{0,0} · · · ϑ_{k−1,0} ϑ_{k,i_k} · · · ϑ_{m−1,i_{m−1}}) = (ϑ_{0,i_0} · · · ϑ_{k−1,i_{k−1}})/(ϑ_{0,0} · · · ϑ_{k−1,0}) ∈ F_{2^{n_k}}

for k ∈ {0, . . . , m − 1}, as required. Similarly,

(4.2)  β_{in_k+j}/β_0 = (ϑ_{k,i}/ϑ_{k,0})(β_j/β_0)  and  β_j/β_0 = β_j/β_{n_k⌊j/n_k⌋} ∈ F_{2^{n_k}}

for i ∈ {0, . . . , n_{k+1}/n_k − 1}, j ∈ {0, . . . , n_k − 1} and k ∈ {0, . . . , m − 1}. Now {ϑ_{k,0}/ϑ_{k,0}, . . . , ϑ_{k,n_{k+1}/n_k−1}/ϑ_{k,0}} is a basis of F_{2^{n_{k+1}}}/F_{2^{n_k}} for k ∈ {0, . . . , m − 1}. Therefore, if β_0/β_0, . . . , β_{n_k−1}/β_0 are linearly independent over F_2 for some k ∈ {0, . . . , m − 1}, then (4.2) implies that β_0/β_0, . . . , β_{n_{k+1}−1}/β_0 are linearly independent over F_2. As n_0 = 1, it follows that β_0, . . . , β_{n_m−1} are linearly independent over F_2. ∎

Fewer multiplications for quadratic extensions.
Theorem 4.3 does not place any restrictions on the choice of bases for the extensions F_{2^{n_{k+1}}}/F_{2^{n_k}}. In this section, we show that if some of these extensions are quadratic, then it is possible to choose their bases so that Algorithms 7 and 8 perform fewer multiplications.

Theorem 4.9.
Let β_0, . . . , β_{n_m−1} ∈ F be constructed as per Theorem 4.3, let n ∈ {1, . . . , n_m}, and let V be the vertex set of the tree T_n^{(n_0,...,n_m)}. Recursively define vectors β_v = (β_{v,0}, . . . , β_{v,|L_v|−1}) for v ∈ V as follows: if v is the root of the tree, then β_v = (β_0, . . . , β_{n−1}); and if v is an internal vertex, then β_{v_α} = α(β_v, d(v)) and β_{v_δ} = δ(β_v, d(v)). Suppose there exists t ∈ {0, . . . , m − 1} such that

(4.3)  n_{t+1}/n_t = 2  and  Tr_{F_{2^{n_{t+1}}}/F_{2^{n_t}}}(ϑ_{t,1}/ϑ_{t,0}) = 1.

Then β_{v_δ,0} = 1 for all v ∈ V such that d(v) = n_t.

The definition of the vectors β_v in Theorem 4.9 matches that of Section 3. Consequently, for β = (β_0, . . . , β_{n−1}) given by the construction of Theorem 4.3, and the reduction tree T_n^{(n_0,...,n_m)}, Lines 10–17 of Algorithm 7 (similarly, Lines 4–11 of Algorithm 8) perform no multiplications if the input vertex v satisfies d(v) = n_t for some t such that (4.3) holds. If n_{t+1}/n_t = 2, then we may ensure that the condition on the trace in (4.3) is satisfied by taking ϑ_{t,0} = 1 and ϑ_{t,1} ∈ F_{2^{n_{t+1}}} with Tr_{F_{2^{n_{t+1}}}/F_{2^{n_t}}}(ϑ_{t,1}) = 1, which yields a basis since the trace of one is equal to zero. Therefore, it is possible to achieve a significant reduction in the number of multiplications performed by Algorithms 7 and 8 when the tower used in the construction contains several quadratic extensions. Returning to Example 4.5, we see such an improvement for F = F_{2^{12}} by looking at the relative number of multiplications performed for the daggered tuples in Figure 7. For these tuples, it is assumed that β_{v_δ,0} = 1 if and only if d(v) = n_t for some t such that n_{t+1}/n_t = 2. We now turn our attention to the proof of Theorem 4.9.

Lemma 4.10.
Assume the hypothesis and notation of Theorem 4.9. If v ∈ V is an internal vertex and d(v) = n_ℓ, then there exist elements ϑ′_{ℓ,0}, ϑ′_{ℓ,1}, . . . ∈ F_{2^{n_{ℓ+1}}} that are linearly independent over F_{2^{n_ℓ}}, for which

(4.4)  β_{v_δ,i} = (ϑ_{0,i_0}/ϑ_{0,0}) · · · (ϑ_{ℓ−1,i_{ℓ−1}}/ϑ_{ℓ−1,0}) ϑ′_{ℓ,i_ℓ}  such that  ∑_{k=0}^{ℓ} i_k n_k = i,

for i ∈ {0, . . . , |L_{v_δ}| − 1}. Furthermore,

(4.5)  ϑ′_{ℓ,0} = (ϑ_{ℓ,1}/ϑ_{ℓ,0})^{2^{n_ℓ}} − ϑ_{ℓ,1}/ϑ_{ℓ,0}

if there is no vertex u ∈ V such that u_δ = v and d(u) = n_ℓ.

Proof. Throughout the proof, if i ∈ {0, . . . , n_m − 1}, then i_0, . . . , i_{m−1} denote the coefficients of the expansion i = ∑_{k=0}^{m−1} i_k n_k such that i_k ∈ {0, . . . , n_{k+1}/n_k − 1} for k ∈ {0, . . . , m − 1}. We note in particular that {0, . . . , |L_v| − 1} ⊆ {0, . . . , n_m − 1} for v ∈ V, since |L_v| ≤ n ≤ n_m. We begin by showing that the assertions of the lemma hold for a special subset of the internal vertices of the tree, before completing the proof by induction.

Let v ∈ V be an internal vertex such that the (possibly trivial) path r = v_0, . . . , v_h = v that connects v to the root vertex r ∈ V satisfies v_i = (v_{i−1})_α for i ∈ {1, . . . , h}. Then

β_{v,i} = β_{v_{h−1},i} = · · · = β_{v_1,i} = β_{r,i} = ∏_{k=0}^{m−1} ϑ_{k,i_k}

for i ∈ {0, . . . , |L_v| − 1}. As v is an internal vertex, d(v) = max{n_k | n_k < |L_v|} by the definition of the tree T_n^{(n_0,...,n_m)}. Thus, d(v) = n_ℓ for some ℓ ∈ {0, . . . , m − 1}. Moreover, |L_{v_δ}| = |L_v| − n_ℓ < n_{ℓ+1}, since otherwise the maximality of n_ℓ is contradicted. Consequently,

(4.6)  β_{v,n_ℓ+i}/β_{v,0} = (ϑ_{0,i_0}/ϑ_{0,0}) · · · (ϑ_{ℓ−1,i_{ℓ−1}}/ϑ_{ℓ−1,0})(ϑ_{ℓ,i_ℓ+1}/ϑ_{ℓ,0})

for i ∈ {0, . . . , |L_{v_δ}| − 1}. As the quotients ϑ_{k,i_k}/ϑ_{k,0} ∈ F_{2^{n_{k+1}}} ⊆ F_{2^{n_ℓ}} for k ∈ {0, . . . , ℓ − 1}, it follows that (4.4) holds with

ϑ′_{ℓ,i} = (ϑ_{ℓ,i+1}/ϑ_{ℓ,0})^{2^{n_ℓ}} − ϑ_{ℓ,i+1}/ϑ_{ℓ,0}  for i ∈ {0, . . . , n_{ℓ+1}/n_ℓ − 2}.

Consequently, (4.5) also holds. Finally, ϑ′_{ℓ,0}, . . . , ϑ′_{ℓ,n_{ℓ+1}/n_ℓ−2} inherit linear independence over F_{2^{n_ℓ}} from ϑ_{ℓ,0}, . . . , ϑ_{ℓ,n_{ℓ+1}/n_ℓ−1}, since

∑_{i=0}^{n_{ℓ+1}/n_ℓ−2} λ_{i+1} ϑ′_{ℓ,i} = 0  if and only if  ∑_{i=1}^{n_{ℓ+1}/n_ℓ−1} λ_i (ϑ_{ℓ,i}/ϑ_{ℓ,0}) ∈ F_{2^{n_ℓ}},

for λ_1, . . . , λ_{n_{ℓ+1}/n_ℓ−1} ∈ F_{2^{n_ℓ}}.

We now proceed by induction on the depth of v, i.e., on the length h of the path r = v_0, . . . , v_h = v that connects v to the root vertex of the tree. If v ∈ V is an internal vertex of depth zero, then v is the root of the tree, which is covered by the case already proved. Therefore, let h be a positive integer, and suppose that the assertions of the lemma hold for all internal vertices with depth less than h. Let v ∈ V be an internal vertex of depth h (if no such vertex exists, then we are done) and let r = v_0, . . . , v_h = v be the path that connects v to the root vertex r ∈ V. We may assume that v_i = (v_{i−1})_δ for some i ∈ {1, . . . , h}. Let j be the maximum of all such indices. Then v_{j−1} is an internal vertex. Thus, d(v_{j−1}) = n_k for some k ∈ {0, . . . , m − 1}. Consequently, the induction hypothesis and the choice of j imply that there exists a basis {ϑ″_{k,0}, . . . , ϑ″_{k,n_{k+1}/n_k−1}} of F_{2^{n_{k+1}}}/F_{2^{n_k}} such that

(4.7)  β_{v,i} = β_{v_{h−1},i} = · · · = β_{v_j,i} = (ϑ_{0,i_0}/ϑ_{0,0}) · · · (ϑ_{k−1,i_{k−1}}/ϑ_{k−1,0}) ϑ″_{k,i_k}

for i ∈ {0, . . . , |L_v| − 1}. As v is an internal vertex, d(v) = n_ℓ for some ℓ ∈ {0, . . . , m − 1}.
Moreover, the definition of the tree T_n^{(n_0,...,n_m)} implies that

n_ℓ = max{n_t | n_t < |L_v|} ≤ max{n_t | n_t < |L_{v_{j−1}}|} = n_k,

since v is descended from v_{j−1}. If ℓ = k, then (4.7) implies that (4.4) holds with

ϑ′_{ℓ,i} = (ϑ″_{ℓ,i+1}/ϑ″_{ℓ,0})^{2^{n_ℓ}} − ϑ″_{ℓ,i+1}/ϑ″_{ℓ,0}  for i ∈ {0, . . . , n_{ℓ+1}/n_ℓ − 2},

which inherit linear independence over F_{2^{n_ℓ}} from ϑ″_{k,0}, . . . , ϑ″_{k,n_{k+1}/n_k−1}. Moreover, we must have j = h, since otherwise the maximality of j implies that v is descended from (v_j)_α, which in turn implies that n_k = n_ℓ < |L_v| ≤ d(v_j) ≤ n_k. It follows that u = v_{h−1} satisfies u_δ = v and d(u) = n_ℓ. Consequently, we are not required to show that (4.5) holds in this case.

If ℓ < k, then |L_{v_δ}| = |L_v| − n_ℓ < n_{ℓ+1} ≤ n_k, since n_ℓ = max{n_t | n_t < |L_v|}. Thus, (4.7) implies that (4.6) holds, which we have already shown to be sufficient for the two assertions of the lemma to hold. Hence, the lemma follows by induction. ∎

Remark 4.11. Lemma 4.10 implies that β_{v_δ,0} = ϑ′_{ℓ,0} lies in the subfield F_{2^{n_{ℓ+1}}}, regardless of the choice of bases used in the construction of Theorem 4.3. Consequently, if β_{v_δ,0} cannot be forced to equal one, then it may still be possible to reduce the cost of the multiplications performed in Lines 10–17 of Algorithm 7 (similarly, Lines 4–11 of Algorithm 8) by choosing the representation of the elements of F so that the cost of multiplication is reduced whenever one of the multiplicands belongs to F_{2^{n_{ℓ+1}}}. Such optimisations have previously been shown to be beneficial in practice, particularly for multiplications by elements of small subfields, by Bernstein and Chou [3] and Chen et al. [8]. These considerations also extend to the algorithms for conversion between the LCH and the Newton or Lagrange bases of Sections 3.2–3.4. For these algorithms, if the tower used in the construction of the basis contains small subfields, then Lemma 4.10 can be used to show that some of the precomputed elements ϕ_v(u, σ_{v,i}) belong to small subfields. Consequently, if the initial shift parameter λ also lies in a small subfield, as is the case when it is zero, then so too do some of the multiplicands in the base cases of the algorithms.

Proof of Theorem 4.9.
Suppose there exists t ∈ {0, . . . , m − 1} such that (4.3) holds, and there exists a vertex v ∈ V such that d(v) = n_t. If v = u_δ for some u ∈ V, then d(u) ≠ n_t, since otherwise n_{t+1} = 2n_t < n_t + |L_v| = n_t + (|L_u| − n_t) = |L_u|, contradicting the maximality of n_t. Therefore, Lemma 4.10 implies that

β_{v_δ,0} = (ϑ_{0,1}/ϑ_{0,0}) ⋯ (ϑ_{t−1,1}/ϑ_{t−1,0}) ((ϑ_{t,1}/ϑ_{t,0})^{2^{n_t}} − ϑ_{t,1}/ϑ_{t,0}) = Tr_{F_{2^{n_{t+1}}}/F_{2^{n_t}}}(ϑ_{t,1}/ϑ_{t,0}) = 1,

as required. □

Generalised Cantor basis.
Gao and Mateer propose a generic method of constructing Cantor bases in the appendix of their paper [14]. We generalise their construction to one that extends an arbitrarily chosen basis of F_{2^t}/F_2 to a basis of F_{2^{2^m t}}/F_2 that enjoys properties similar to those offered by Cantor bases. The original construction of Gao and Mateer then corresponds to the case t = 1. By generalising their construction, we are able to take advantage of quadratic extensions in a different manner to the previous section in order to provide a greater selection of reduction trees.

Hereafter, we assume that F_{q^{2^m}} ⊆ F with positive m ∈ N and q = 2^t for some positive t ∈ N. We also fix a basis β_0, . . . , β_{2^m t−1} of F_{q^{2^m}}/F_2 which is given by the following generalisation of the construction of Gao and Mateer: choose a basis {ϑ_0, . . . , ϑ_{t−1}} of F_q/F_2, choose β_{(2^m−1)t}, . . . , β_{2^m t−1} ∈ F_{q^{2^m}} such that

Tr_{F_{q^{2^m}}/F_q}(β_{(2^m−1)t+i}) = ϑ_i for i ∈ {0, . . . , t − 1},

and recursively define β_i = β_{i+t}^q − β_{i+t} for i ∈ {0, . . . , (2^m − 1)t − 1}.

[Figure 8. Construction of Theorem 4.12: the trees T_1, . . . , T_{⌈n/t⌉} are grafted onto the leaves u_1, . . . , u_{⌈n/t⌉} of the tree T_0.]

For this construction, we provide generalisations of the properties of Cantor bases given in Lemma 2.9. The properties are then used to prove the following theorem, which provides a method of constructing reduction trees for the vectors (β_0, . . . , β_{n−1}) for n ∈ {1, . . . , 2^m t}.

Theorem 4.12.
Let n ∈ {1, . . . , 2^m t}, T_0 = (V_0, E_0) be a full binary tree with ⌈n/t⌉ leaves such that Im(d) ⊆ {t, 2t, 4t, . . . , 2^{⌈log₂⌈n/t⌉⌉−1} t}, and T_i = (V_i, E_i) be a reduction tree for (ϑ_0, . . . , ϑ_{min(n−(i−1)t, t)−1}), for i ∈ {1, . . . , ⌈n/t⌉}. Let u_1, . . . , u_{⌈n/t⌉} ∈ V_0 be the leaves of T_0, ordered such that for all i, j ∈ {1, . . . , ⌈n/t⌉} with i < j, there exists an internal vertex v ∈ V_0 with u_i ∈ L_{v_α} and u_j ∈ L_{v_δ}. Construct a new tree T = (V, E) by identifying the root vertex of T_i with u_i for i ∈ {1, . . . , ⌈n/t⌉}, as shown in Figure 8. Then T is a reduction tree for (β_0, . . . , β_{n−1}).

Theorem 4.12 provides greater freedom than Theorem 4.3 by not requiring the inequality d(v_δ) ≤ d(v) to hold for v ∈ V that are initially internal vertices in the tree T_0. Proposition 2.7 guarantees the existence of trees T_1, . . . , T_{⌈n/t⌉} to use in the construction. We can of course provide a better selection for these trees if the methods of Sections 4.1 and 4.2 are used to construct the basis {ϑ_0, . . . , ϑ_{t−1}}. The remainder of the section is dedicated to the proof of Theorem 4.12.

Lemma 4.13.
The following hold:
(1) β_i = Σ_{r=0}^{j} \binom{j}{r} β_{i+jt}^{q^r} for i ∈ {0, . . . , (2^m − j)t − 1} and j ∈ {0, . . . , 2^m − 1},
(2) β_i = β_{i+2^k t}^{q^{2^k}} − β_{i+2^k t} for i ∈ {0, . . . , (2^m − 2^k)t − 1} and k ∈ {0, . . . , m − 1},
(3) β_i = ϑ_i for i ∈ {0, . . . , t − 1},
(4) β_0, . . . , β_{2^k t−1} ∈ F_{q^{2^k}} for k ∈ {0, . . . , m − 1}, and
(5) β_0, . . . , β_{2^m t−1} are linearly independent over F_2.

Our proof of Lemma 4.13 generalises arguments found in Cantor's paper [7] and the paper of Gao and Mateer [14].
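Before the proof, a minimal numerical sketch (not from the paper) may help make the construction and the lemma concrete. The code below carries out the construction in the original Gao–Mateer case t = 1, q = 2 with m = 3, so that the ambient field is GF(2^8); the defining polynomial of the field and the search for a trace-one element are arbitrary illustrative choices. It then checks properties (3), (4) and (5) for the resulting basis:

```python
M = 8
MOD = 0b100011011  # x^8 + x^4 + x^3 + x + 1, irreducible over GF(2)

def gf_mul(a, b):
    """Carry-less multiplication in GF(2^M), reduced modulo MOD."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> M:
            a ^= MOD
    return r

def gf_sq(a):
    return gf_mul(a, a)

def trace(a):
    """Absolute trace from GF(2^M) to GF(2): a + a^2 + a^4 + ... + a^(2^(M-1))."""
    t, x = 0, a
    for _ in range(M):
        t ^= x
        x = gf_sq(x)
    return t

def gf2_rank(vectors):
    """Rank over GF(2) of int-encoded vectors, by row reduction on pivot bits."""
    pivots, rank = {}, 0
    for v in vectors:
        while v:
            h = v.bit_length() - 1
            if h not in pivots:
                pivots[h] = v
                rank += 1
                break
            v ^= pivots[h]
    return rank

# Choose beta_7 with Tr(beta_7) = 1 (here {theta_0} = {1} is the basis of
# GF(2)), then set beta_i = beta_{i+1}^2 - beta_{i+1}; subtraction is XOR
# in characteristic two.
beta = [0] * M
beta[M - 1] = next(a for a in range(1, 1 << M) if trace(a) == 1)
for i in range(M - 2, -1, -1):
    beta[i] = gf_sq(beta[i + 1]) ^ beta[i + 1]

assert beta[0] == 1                      # property (3): beta_0 = theta_0 = 1
assert gf_sq(gf_sq(beta[1])) == beta[1]  # property (4): beta_0, beta_1 lie in GF(4)
assert gf2_rank(beta) == M               # property (5): a basis of GF(2^8)/GF(2)
```

Here the first assertion recovers β_0 = Tr(β_7) = 1, and the rank computation confirms that the recursion really does extend the chosen trace-one element to a basis.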
Proof.
We prove (1) by induction on j. It is clear that (1) holds if j = 0, regardless of the value of i. Therefore, suppose that (1) holds for some j ∈ {0, . . . , 2^m − 2} and each i ∈ {0, . . . , (2^m − j)t − 1}. Then

β_{i+t} = Σ_{r=0}^{j} \binom{j}{r} β_{i+t+jt}^{q^r} = Σ_{r=0}^{j} \binom{j}{r} β_{i+(j+1)t}^{q^r}

for i ∈ {0, . . . , (2^m − j − 1)t − 1}. As (2^m − j − 1)t − 1 ≤ (2^m − 1)t − 1, it follows that

β_i = β_{i+t}^q − β_{i+t} = Σ_{r=0}^{j} \binom{j}{r} (β_{i+(j+1)t}^{q^{r+1}} − β_{i+(j+1)t}^{q^r}) = β_{i+(j+1)t} + Σ_{r=1}^{j+1} (\binom{j}{r−1} + \binom{j}{r}) β_{i+(j+1)t}^{q^r} = β_{i+(j+1)t} + Σ_{r=1}^{j+1} \binom{j+1}{r} β_{i+(j+1)t}^{q^r} = Σ_{r=0}^{j+1} \binom{j+1}{r} β_{i+(j+1)t}^{q^r}

for i ∈ {0, . . . , (2^m − j − 1)t − 1}. Thus, property (1) holds.

For i, j ∈ N, Lucas' lemma [22, p. 230] (see also [13]) implies that \binom{i}{j} ≡ 1 (mod 2) if and only if [j]_k ≤ [i]_k for all k ∈ N. Using the lemma, property (2) follows from property (1) by setting j = 2^k. Similarly, Lucas' lemma and property (1) with j = (2^{m−k} − 1)2^k imply that

β_{(2^k−1)t+i} = Σ_{r=0}^{(2^{m−k}−1)2^k} \binom{(2^{m−k}−1)2^k}{r} β_{(2^m−1)t+i}^{q^r} = Σ_{r=0}^{2^{m−k}−1} β_{(2^m−1)t+i}^{q^{2^k r}} = Tr_{F_{q^{2^m}}/F_{q^{2^k}}}(β_{(2^m−1)t+i})

for i ∈ {0, . . . , t − 1} and k ∈ {0, . . . , m − 1}. Setting k = 0, property (3) follows by the choice of β_{(2^m−1)t}, . . . , β_{2^m t−1}. Moreover, the trace formula implies that β_{(2^k−1)t}, . . . , β_{2^k t−1} ∈ F_{q^{2^k}} for k ∈ {0, . . . , m − 1}, after which the recursive definition of β_0, . . . , β_{(2^m−1)t−1} implies that property (4) holds.

Property (3) implies that β_0, . . . , β_{t−1} are linearly independent over F_2, and belong to the kernel of the F_2-linear map ϕ : F → F given by ω ↦ ω^q − ω. Thus, for i ∈ {2, . . . , 2^m}, any nontrivial F_2-linear relation amongst β_0, . . . , β_{it−1}, which necessarily involves at least one of β_{(i−1)t}, . . . , β_{it−1}, translates under ϕ to a nontrivial relation amongst β_0, . . . , β_{(i−1)t−1}. It follows that property (5) holds by induction on i. □

Lemma 4.13 provides generalisations of the properties of Cantor bases given in Lemma 2.9. Lemma 4.14 below may be viewed as a partial generalisation of Proposition 2.8.
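As an aside, the parity criterion from Lucas' lemma used in the proof of Lemma 4.13 has a convenient bitwise form: \binom{i}{j} is odd exactly when (i & j) == j, that is, when every binary digit of j is at most the corresponding digit of i. A quick check in Python:

```python
from math import comb

# Lucas' lemma mod 2: binom(i, j) is odd iff each binary digit of j is at
# most the corresponding digit of i, i.e. iff (i & j) == j.
for i in range(64):
    for j in range(64):
        assert (comb(i, j) % 2 == 1) == ((i & j) == j)

# For j = 2^k, only r = 0 and r = 2^k give an odd binom(2^k, r), which is
# how property (2) follows from property (1).
assert [r for r in range(9) if comb(8, r) % 2 == 1] == [0, 8]
```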
Lemma 4.14.
Let (V, E) be a full binary tree with n ≤ 2^m t leaves that satisfies the following conditions:
(1) if v ∈ V such that |L_v| > t, then d(v) ∈ {2^k t | k < ⌈log₂⌈n/t⌉⌉}, and
(2) if v ∈ V such that |L_v| ≤ t, and v is either the root of the tree or a child of a vertex v′ ∈ V with |L_{v′}| > t, then the subtree of (V, E) rooted on v is a reduction tree for (β_0, . . . , β_{|L_v|−1}).
Then (V, E) is a reduction tree for (β_0, . . . , β_{n−1}).

The following technical lemma is required for the proof of Lemma 4.14.
Lemma 4.15.
Let μ ∈ F^n have linearly independent entries over F_2, (V, E) be a reduction tree for μ, and ω ∈ F be nonzero. Then (V, E) is a reduction tree for ωμ.

Proof. We prove the lemma by induction on n. The lemma holds trivially for n = 1. Therefore, let n ≥ 2 and suppose that the lemma is true for all smaller values of n. Let μ = (μ_0, . . . , μ_{n−1}) ∈ F^n have linearly independent entries over F_2, (V, E) be a reduction tree for μ, r ∈ V be the root vertex of the tree, and ω ∈ F be nonzero. Then μ_i/μ_0 = ωμ_i/(ωμ_0) ∈ F_{2^{d(r)}} for i ∈ {0, . . . , d(r) − 1}, the induction hypothesis implies that the subtree rooted on r_α is a reduction tree for ω α(μ, d(r)) = α(ωμ, d(r)), and the subtree rooted on r_δ is a reduction tree for δ(μ, d(r)) = δ(ωμ, d(r)). Therefore, (V, E) is a reduction tree for ωμ. Hence, the lemma follows by induction. □

Proof of Lemma 4.14.
We prove the lemma by induction on n. Condition (2) implies that the lemma holds trivially if n ≤ t. Therefore, let n ∈ {t + 1, . . . , 2^m t} and suppose that the lemma is true for all smaller values of n. Let (V, E) be a full binary tree with n leaves that satisfies conditions (1) and (2) of the lemma. Let β = (β_0, . . . , β_{n−1}) and r ∈ V be the root vertex of the tree. Then |L_r| = n > t. Thus, (1) implies that d(r) = 2^k t for some k < ⌈log₂⌈n/t⌉⌉ ≤ m. Therefore, property (4) of Lemma 4.13 implies that β_i/β_0 ∈ F_{2^{d(r)}} for i ∈ {0, . . . , d(r) − 1}. Moreover, properties (2) and (4) of Lemma 4.13 imply that

(4.8) α(β, d(r)) = (β_0, . . . , β_{d(r)−1}) and δ(β, d(r)) = (1/β_0)(β_0, . . . , β_{n−d(r)−1}).

The subtrees of (V, E) rooted on r_α and r_δ satisfy the conditions of the lemma through inheritance. Thus, the induction hypothesis and (4.8) imply that the subtree rooted on r_α is a reduction tree for α(β, d(r)). Similarly, the induction hypothesis, Lemma 4.15 and (4.8) imply that the subtree rooted on r_δ is a reduction tree for δ(β, d(r)). Therefore, (V, E) is a reduction tree for β. Hence, the lemma follows by induction. □

We now complete the proof of Theorem 4.12 by showing that its construction produces binary trees that satisfy the conditions of Lemma 4.14.
Proof of Theorem 4.12.
For i ∈ {1, . . . , ⌈n/t⌉}, (V_i, E_i) is a full binary tree with min(n − (i − 1)t, t) leaves. Therefore, it is clear that (V, E) is a full binary tree with

Σ_{i=1}^{⌈n/t⌉} min(n − (i − 1)t, t) = n

leaves. We show that (V, E) satisfies the conditions of Lemma 4.14.

Suppose there exists a vertex v ∈ V such that |L_v| > t. Then v is not descended from or equal to u_i for i ∈ {1, . . . , ⌈n/t⌉}, since (V_i, E_i) has at most t leaves. By the choice of (V_0, E_0), it follows that 2^k of the vertices u_i are descended from v_α for some k < ⌈log₂⌈n/t⌉⌉. Let i_1, . . . , i_{2^k} be the indices of these vertices. Then i_1, . . . , i_{2^k} < ⌈n/t⌉, since the ordering of the vertices u_i implies that u_{⌈n/t⌉} must be equal to v_δ or one of its descendants. It follows that the subtrees of (V, E) rooted on u_{i_1}, . . . , u_{i_{2^k}} each have t leaves. Thus, d(v) = 2^k t for some k < ⌈log₂⌈n/t⌉⌉. Therefore, (V, E) satisfies condition (1) of Lemma 4.14.

Suppose there exists a vertex v ∈ V such that |L_v| ≤ t, and v is either the root of the tree or the child of a vertex v′ ∈ V with |L_{v′}| > t. Then v is descended from or equal to u_i for some i ∈ {1, . . . , ⌈n/t⌉}, since the subtrees rooted on u_1, . . . , u_{⌈n/t⌉−1} each have t leaves, while the subtree rooted on u_{⌈n/t⌉} has at least one leaf. If v is the root of (V, E), then (V, E) = (V_i, E_i). If v is the child of a vertex v′ ∈ V such that |L_{v′}| > t, and thus |L_{v′}| > |L_{u_i}|, then u_i is a descendant of v′. In either case, v is equal to u_i. Thus, the choice of (V_i, E_i) and property (3) of Lemma 4.13 imply that the subtree of (V, E) rooted on v is a reduction tree for (β_0, . . . , β_{|L_v|−1}). Hence, (V, E) satisfies condition (2) of Lemma 4.14. □
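The leaf count at the start of the proof above can be sanity-checked numerically; the sketch below is illustrative only, and the values of n and t are arbitrary:

```python
import math

# Leaves of the grafted trees T_1, ..., T_{ceil(n/t)} in Theorem 4.12:
# T_i has min(n - (i - 1)t, t) leaves, and the counts sum to n.
def leaf_counts(n, t):
    return [min(n - (i - 1) * t, t) for i in range(1, math.ceil(n / t) + 1)]

assert leaf_counts(13, 4) == [4, 4, 4, 1]
assert all(sum(leaf_counts(n, t)) == n
           for t in range(1, 8) for n in range(1, 40))
```

All trees but the last have exactly t leaves, which is the fact used to verify condition (1) of Lemma 4.14.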
References
1. Eli Ben-Sasson, Iddo Bentov, Alessandro Chiesa, Ariel Gabizon, Daniel Genkin, Matan Hamilis, Evgenya Pergament, Michael Riabzev, Mark Silberstein, Eran Tromer, and Madars Virza, Computational integrity with a public random string from quasi-linear PCPs, Advances in cryptology—EUROCRYPT 2017. Part III, Lecture Notes in Comput. Sci., vol. 10212, Springer, Cham, 2017, pp. 551–579.
2. Eli Ben-Sasson, Iddo Bentov, Yinon Horesh, and Michael Riabzev, Scalable, transparent, and post-quantum secure computational integrity, Cryptology ePrint Archive, Report 2018/046, 2018, https://eprint.iacr.org/2018/046.
3. Daniel J. Bernstein and Tung Chou, Faster binary-field multiplication and faster binary-field MACs, Selected areas in cryptography—SAC 2014, Lecture Notes in Comput. Sci., vol. 8781, Springer, Cham, 2014, pp. 92–111.
4. Daniel J. Bernstein, Tung Chou, and Peter Schwabe, McBits: Fast constant-time code-based cryptography, Cryptographic Hardware and Embedded Systems—CHES 2013, Lecture Notes in Comput. Sci., vol. 8086, Springer, Berlin, 2013, pp. 250–272.
5. James R. Bitner, Gideon Ehrlich, and Edward M. Reingold, Efficient generation of the binary reflected Gray code and its applications, Comm. ACM (1976), no. 9, 517–521.
6. Richard P. Brent, Pierrick Gaudry, Emmanuel Thomé, and Paul Zimmermann, Faster multiplication in GF(2)[x], Algorithmic number theory—ANTS 2008, Lecture Notes in Comput. Sci., vol. 5011, Springer, Berlin, 2008, pp. 153–166.
7. David G. Cantor, On arithmetical algorithms over finite fields, J. Combin. Theory Ser. A (1989), no. 2, 285–300.
8. Ming-Shing Chen, Chen-Mou Cheng, Po-Chun Kuo, Wen-Ding Li, and Bo-Yin Yang, Faster multiplication for long binary polynomials, 2017, arXiv:1708.09746 [cs.SC].
9. ———, Multiplying boolean polynomials with Frobenius partitions in additive fast Fourier transform, 2018, arXiv:1803.11301 [cs.SC].
10. Tung Chou, McBits revisited, Cryptographic Hardware and Embedded Systems—CHES 2017, Lecture Notes in Comput. Sci., vol. 10529, Springer, Cham, 2017, pp. 213–231.
11. Nicholas Coxon, Fast systematic encoding of multiplicity codes, arXiv:1704.07083 [cs.IT], Apr 2017.
12. ———, Fast Hermite interpolation and evaluation over finite fields of characteristic two, arXiv:1807.00645 [cs.SC], July 2018.
13. N. J. Fine, Binomial coefficients modulo a prime, Amer. Math. Monthly (1947), 589–592.
14. Shuhong Gao and Todd Mateer, Additive fast Fourier transforms over finite fields, IEEE Trans. Inform. Theory (2010), no. 12, 6265–6272.
15. David Harvey, A cache-friendly truncated FFT, Theoret. Comput. Sci. (2009), no. 27-29, 2649–2658.
16. Donald E. Knuth, The art of computer programming. Vol. 4, Fasc. 2, Addison-Wesley, Upper Saddle River, NJ, 2005.
17. Robin Larrieu, The truncated Fourier transform for mixed radices, ISSAC'17—Proceedings of the 2017 ACM International Symposium on Symbolic and Algebraic Computation, ACM, New York, 2017, pp. 261–268.
18. Wen-Ding Li, Ming-Shing Chen, Po-Chun Kuo, Chen-Mou Cheng, and Bo-Yin Yang, Frobenius additive fast Fourier transform, 2018, arXiv:1802.03932 [cs.SC].
19. Sian-Jheng Lin, Tareq Y. Al-Naffouri, and Yunghsiang S. Han, FFT algorithm for binary extension finite fields and its application to Reed-Solomon codes, IEEE Trans. Inform. Theory (2016), no. 10, 5343–5358.
20. Sian-Jheng Lin, Tareq Y. Al-Naffouri, Yunghsiang S. Han, and Wei-Ho Chung, Novel polynomial basis with fast Fourier transform and its application to Reed–Solomon erasure codes, IEEE Trans. Inform. Theory (2016), no. 11, 6284–6299.
21. Sian-Jheng Lin, Wei-Ho Chung, and Yunghsiang S. Han, Novel polynomial basis and its application to Reed-Solomon erasure codes, 55th Annual IEEE Symposium on Foundations of Computer Science—FOCS 2014, IEEE Computer Soc., Los Alamitos, CA, 2014, pp. 316–325.
22. Édouard Lucas, Théorie des Fonctions Numériques Simplement Périodiques. [Continued], Amer. J. Math. (1878), no. 3, 197–240.
23. Todd Mateer, Fast Fourier Transform algorithms with applications, ProQuest LLC, Ann Arbor, MI, 2008, Ph.D. thesis, Clemson University.
24. J. van der Hoeven, Notes on the Truncated Fourier Transform, Tech. Report 2005-5, Université Paris-Sud, Orsay, France, 2005.
25. Joris van der Hoeven, The truncated Fourier transform and applications, ISSAC 2004—Proceedings of the 2004 International Symposium on Symbolic and Algebraic Computation, ACM, New York, 2004, pp. 290–296.
26. Joris van der Hoeven and Éric Schost, Multi-point evaluation in higher dimensions, Appl. Algebra Engrg. Comm. Comput. (2013), no. 1, 37–52.
27. Joachim von zur Gathen, Functional decomposition of polynomials: the tame case, J. Symbolic Comput. (1990), no. 3, 281–299.
28. Joachim von zur Gathen and Jürgen Gerhard, Arithmetic and factorization of polynomials over F_2 (extended abstract), ISSAC '96—Proceedings of the 1996 International Symposium on Symbolic and Algebraic Computation, ACM, New York, 1996, pp. 1–9.
29. Y. Wang and X. Zhu, A fast algorithm for the Fourier transform over finite fields and its VLSI implementation, IEEE Journal on Selected Areas in Communications (1988), no. 3, 572–577.

INRIA Saclay–Île-de-France & Laboratoire d'Informatique, École polytechnique, 91128 Palaiseau Cedex, France
E-mail address: