An Illustrated Introduction to the Truncated Fourier Transform
aa r X i v : . [ c s . S C ] F e b An Illustrated Introduction to the Truncated FourierTransform
Paul Vrbik.School of Mathematical and Physical SciencesThe University of NewcastleCallaghan, Australia [email protected]
February 18, 2016
Abstract
The Truncated Fourier Transform ( tft ) is a variation of the Discrete Fourier Trans-form ( dft / fft ) that allows for input vectors that do not have length 2 n for n a positiveinteger. We present the univariate version of the tft , originally due to Joris van derHoeven, heavily illustrating the presentation in order to make these methods accessibleto a broader audience. In 1965 Cooley and Tukey developed the
Discrete Fourier Transform ( dft ) to recover con-tinuous functions from discrete samples. This was a landmark discovery because it allowedfor the digital manipulation of analogue signals (like sound) by computers. Soon after a vari-ant called the Fast Fourier Transform ( fft ) eclipsed the dft to the extent that fft is oftenmistakenly substituted for dft . As its name implies, the fft is a method for computing the dft faster. fft s have an interesting application in Computer Algebra. Let R be a ring with 2 ∈ R a unit. If R has a primitive n th root of unity ω with n = 2 p (i.e. ω n/ = −
1) then the fft computes the product of two polynomials
P, Q ∈ R [ x ] with deg( P Q ) < n in O ( n log n )operations in R . Unfortunately, when deg( P Q ) is sufficiently far from a power of two many computations wasted. This deficiency was addressed by the signal processing communityusing a method called fft -pruning [3]. However, the difficult inversion of this method is dueto van der Hoeven [4][5].In Section 2 we outline the dft , including a method for its non-recursive implementation.In Section 3 we develop the “pruned” variant, called Truncated Fourier Transform ( tft ).1inally, in Section 4, we show how the tft can be inverted and outline the algorithm fordoing so in Section 5.
For this paper let R be a ring with 2 ∈ R a unit and ω ∈ R an n th root of unity. TheDiscrete Fourier Transform ∗ , with respect to ω , of vector a = ( a , . . . , a n − ) ∈ R n is thevector ˆ a = (ˆ a , . . . , ˆ a n − ) ∈ R n with ˆ a i = n − X j =0 a j ω ij . Alternatively we can view these n -tuples as encoding the coefficients of polynomials from R [ x ] and define the dft with respect to ω as the mappingDFT ω : R [ x ] → R n A ( x ) = a + · · · + a n − x n − ( A ( ω ) , . . . , A ( ω n − )) . We let the relationship between A and its coefficients be implicit and writeDFT ω ( a , . . . , a n − ) := ( A ( ω ) , . . . , A ( ω n − ))when A = a + · · · + a n − x n − .The dft can be computed efficiently using binary splitting. This method requires evalu-ation only at ω i for i ∈ { , . . . , p − } , rather than at all ω , . . . , ω n − . To compute the dft of A with respect to ω we write( b , c , . . . , b n/ − , c n/ − ) := ( a , . . . , a n − )and recursively compute the dft of ( b , . . . , b n/ − ) and ( c , . . . , c n/ − ) with respect to ω :(ˆ b , . . . , ˆ b n/ − ) := DFT ω ( b , . . . , b n/ − ) , (ˆ c , . . . , ˆ c n/ − ) := DFT ω ( c , . . . , c n/ − ) . Finally, we construct ˆ a according toDFT ω ( a , . . . , a n − ) = (ˆ b + ˆ c , . . . , ˆ b n/ − + ˆ c n/ − ω n/ − , ˆ b − ˆ c , . . . , ˆ b n/ − − ˆ c n/ − ω n/ − ) . This description has a natural implementation as a recursive algorithm, but in practiceit is often more efficient to implement an in-place algorithm that eliminates the overhead ofcreating recursive stacks. ∗ In signal processing community this is called the “decimation-in-time” variant of the fft . efinition. Let i and p be a positive integers and let i = i + · · · + i p p for i , . . . , i p ∈ { , } .The length- p bitwise reverse of i is given by[ i ] p := i p + · · · + i p . Example. [3] = 24 and [11] = 26 because[3] = [1 · + 1 · + 0 · + 0 · + 0 · ] = 0 · + 0 · + 0 · + 1 · + 1 · = 24and [11] = [1 · + 1 · + 0 · + 1 · + 0 · ] = 0 · + 1 · + 0 · + 1 · + 1 · = 26 . Notice if we were to write 3, 24, 11, and 26 as a binary numbers to five digits we have 00011reverses to 11000 and 01011 reverses to 11010 — in fact this is the inspiration for the name“bitwise reverse.”For the in-place non-recursive dft algorithm, we require only one vector of length n .Initially, at step zero, this vector is x = ( x , , . . . , x ,n − ) := ( a , . . . , a n − )and is updated (incrementally) at steps s ∈ { , . . . , p } by the rule " x s,im s + j x s, ( i +1) m s + j := " ω [ i ] s m s − ω [ i ] s m s x s − ,im s + j x s − , ( i +1) m s + j (1)where m s = 2 p − s and for all i ∈ { , , . . . , n/m s − } , j ∈ { , . . . , m s − } . Note that twoadditions and one multiplication are done in (1) as one product is merely the negation ofthe other.We illustrate the dependencies of the x s,i values in (1) with x s − , im s + j x s − , ( i +1) m s + j x s, im s + j x s, ( i +1) m s + j
3e call this a “butterfly” after the shape it forms and may say m s controls the width — thevalue of which decreases as s increases. By placing these butterflies on a s × n grid (Figure1) we can see which values of x s are required to compute particular entires of x s +1 (andvice-versa). For example x , x , x , x , denotes that x , and x , are required to determine x , and x , (and vice-versa). i = 0 i = 1 · · · · · · i = 15 s = 0 s = 1 s = 2 s = 3 s = 4 x , x , x , x , Figure 1: Schematic representation of Equation (1) using butterflies to illustrate the valuedependencies at various steps s . The grid has rows s = 0 , . . . , n = 0 , . . . , x s,i values.Using induction over s , x s,im s + j = (DFT ω ms ( a j , a m s + j , . . . , a n − m s + j )) [ i ] s , for all i ∈ { , . . . , n/m s − } and j ∈ { , . . . , m s − } [4]. In particular, when s = p and j = 0, we have x p,i = ˆ a [ i ] p and ˆ a i = x p, [ i ] p for all i ∈ { , . . . , n − } . That is, ˆ a is a (specific) permutation of x p as illustrated in Figure2. The key property of the dft is that it is straightforward to invert, that is to recover a from ˆ a : DFT ω − (ˆ a ) i = DFT ω − (DFT ω ( a )) i = n − X k =0 n − X j =0 a i ω ( i − k ) j = na i (2)since P n − j =0 ω ( i − k ) j = 0 whenever i = k . This yields a polynomial multiplication algorithmthat does O ( n log n ) operations in R (see [1, § a a a a a a a a a a a a a a a b a [0] b a [1] b a [2] b a [3] b a [4] b a [5] b a [6] b a [7] b a [8] b a [9] b a [10] b a [11] b a [12] b a [13] b a [14] b a [15] Figure 2: The Discrete Fourier Transform for n = 16. The top row, corresponding to s = 0,are the initial values a . The bottom row, corresponding to s = 4, is a permutation of ˆ a (theresult of the DFT on a ). The motivation behind the Truncated Fourier Transform ( tft ) is the observation that manycomputations are wasted when the length of a (the input) is not a power of two. † This isentirely the fault of the strategy where one “completes” the ℓ -tuple a = ( a , . . . , a ℓ − ) bysetting a i = 0 when i ≥ ℓ to artificially extend the length of a to the nearest power of two(so the dft can be executed as usual).However, despite the fact that we may only want ℓ components of ˆ a , the dft will calculate all of them. Thus computation is wasted. We illustrate this in Figures 3 and 4. This typeof wasted computation is relevant when using the dft to multiply polynomials — theirproducts are rarely of degree one less some power of two.The definition of the tft is similar to that of the dft with the exception that the inputand output vector ( a resp. ˆ a ) are not necessarily of length some power of two. More preciselythe tft of an ℓ -tuple ( a , . . . , a ℓ − ) ∈ R ℓ is the ℓ -tuple (cid:0) A ( ω [0] p ) , . . . , A ( ω [ ℓ − p ) (cid:1) ∈ R ℓ . where n = 2 p , ℓ < n (usually ℓ ≥ n/
2) and ω a n th root of unity. Remark. van der Hoeven [4] gives a more general description of the tft where one canchoose an initial vector ( x ,i , . . . , x ,i n ) and target vector ( x p,j , . . . , x p,j n ). Provided the i k ’sare distinct one can carry out the tft by considering the full dft and removing all computa-tions not required for the desired output. In this paper, we restrict our discussion to that ofthe scenario in Figure 4 (where the input and output are the same initial segments) becauseit can be used for polynomial multiplication, and because it yields the most improvement. ⋄ † The tft is exactly equivalent to a technique called “ fft pruning” in the signal processing literature [3]. dft with “artificial” zero points (large black dots).Figure 4: Removing all unnecessary computations from Figure 3 gives the schematic repre-sentation of the tft . 6f we only allow ourselves to operate in a size n vector it is straightforward to modifythe in-place dft algorithm from the previous section to execute the tft . (It should beemphasized that this only saves computation and not space. For a “true” in-place tft algorithm that operates in an array of size ℓ , see Harvey and Roche’s [2].) At stage s itsuffices to compute ( x s, , . . . , x s,j ) with j = ⌈ ℓ/m s ⌉ m s − m s = 2 p − s . ‡ Theorem 1.
Let n = 2 p , ≤ ℓ < n and ω ∈ R be a primitive n th root of unity in R .The tft of an ℓ -tuple ( a , . . . , a ℓ − ) with respect to ω can be computed using at most ℓp + n additions and ⌊ ( ℓp + n ) / ⌋ multiplications of powers of ω .Proof. Let j = ( ⌈ ℓ s /m s ⌉ ) m s −
1; at stage s we compute ( x s, , . . . , x s,j ). So, in addition to x s, , . . . , x s,ℓ − we compute ( ⌈ ℓ/m s ⌉ ) m s − − ℓ ≤ m s more values. Therefore, in total, we compute at most pℓ + p X s =1 m s = pℓ + 2 p − + 2 p − + · · · + 1 = pℓ + 2 p − < pℓ + n values x s,i . The result follows. Unfortunately, the tft cannot be inverted by merely doing another tft with ω − andadjusting the output by some constant factor like inverse of the dft . Simply put: we aremissing information and must account for this. Example.
Let R = Z / Z , n = 2 = 4, with ω = 5 a n th primitive root of unity. Setting A ( x ) = a + a x + a x , the tft of a = ( a , a , a ) at 5 is A ( ω ) A ( ω ) A ( ω ) = A (1) A ( − A (5) = a + a + a a − a + a a + 5 a − a . Now, to show the tft of this with respect to ω − is not a , define b = b b b = a + a + a a − a + a a + 5 a − a . ‡ This is a correction to the bound given in [5] as pointed out in [4]. tft of b with respect to ω − = − B ( ω ) B ( ω − ) B ( ω − ) = B (1) B ( − B (5) = b + b + b b − b + b b − b − b = a + 5 a + a a − a − a − a + a − a which is not a constant multiple of TFT ω ( a ).This discrepancy is caused by the completion of b to ( b , b , b ,
0) — we should haveinstead completed b to ( b , b , b , A ( − tft we use the fact that whenever two values among x s,im s + j , x s − ,im s + j and x s, ( i +1) m s + j , x s − , ( i +1) m s + j are known, that the other values can be deduced. That is, if two values of some butterflyare known then the other two values can be calculated using (1) as the relevant matrix isinvertible. Moreover, these relations only involve shifting (multiplication and division bytwo), additions, subtractions and multiplications by roots of unity — an ideal scenario forimplementation.As with dft , observe that x p − k, , . . . , x p − k, k − can be calculated from x p, , . . . , x p, k − .This is because all the butterfly relations necessary to move up like this never require x s, k + j for any s ∈ { p − k, . . . , p } and j >
0. This is illustrated in Figure 5. More generally, we havethat x p, j +2 k , . . . , x p, j +2 k − is sufficient information to compute x p − k, j , . . . , x p − k, j +2 k − provided that 0 < k ≤ j < p . (In Algorithm 1, that follows, we call this a “self-containedpush up”.) Finally, in this section, we present a simple recursive description of the inverse tft algorithmfor the case we have restricted ourselves to (all zeroes packed at the end). The algorithmoperates in a length n array x = ( x , . . . , x n − ) for which we assume access; here n = 2 p corresponds to ω , a n th primitive root of unity. Initially, the content of the array is x := ( x p, , . . . , x p, ℓ − , , . . . , x p, , . . . , x p, ℓ − ) is the result of the tft on ( x , , . . . , x , ℓ − , , . . . , • ) and what value to calculate (empty dot ◦ ). For8igure 5: The computations in the boxes are self contained.instance, “push down x k with Figure 6”, is shorthand for: use x k = x s − , im s + j and x k + m s + j = x s − , ( i +1) m s + j to determine x s, im s + j . We emphasize with an arrow that this new value shouldalso overwrite the one at x k . This calculation is easily accomplished using (1) with a caveat:the values i and j are not explicitly known. What is known is s , and therefore m s , and somearray position k . Observe that i is recovered by i = k quo m s (the quotient of k/m s ). x k = x s − , im s + j x s − , ( i +1) m s + j = x k + m s + j x s, im s + j x s, ( i +1) m s + j Figure 6: Overwrite x [ im s + j ] with x [ im s + j ] + ω [ i ] s m s x [( i + 1) m s + j ].The full description of the inverse tft follows in Algorithm 1; note that the initial callis InvTFT (0 , ℓ − , n − , Theorem 2.
Algorithm 1, initially called with
InvTFT (0 , ℓ − , n − , and given accessto the zero-indexed length n array x = ( x p, , . . . , x p, ℓ − , , . . . ,
0) (3) will terminate with x = ( x , , . . . , x , ℓ − , , . . . ,
0) (4) where (3) is the result of the tft on (4).Termination.
Let head i , tail i , and last i be the values of head, tail, and last at the i th9 lgorithm 1: InvTFT (head , tail , last , s ) Initial call : InvTFT(0 , ℓ − , n − , middle ← last − head2 + head; LeftMiddle ← ⌊ middle ⌋ ; RightMiddle ← LeftMiddle + 1; if head > tail then Base case—do nothing; return null; else if tail ≥ LeftMiddle then Push up the self-contained region x head to x LeftMiddle ; Push down x tail+1 to x last with ; InvTFT (RightMiddle , tail , last , s + 1); s ← p − log (LeftMiddle − head + 1); Push up (in pairs) ( x head , x head+ m s ) to ( x LeftMiddle , x LeftMiddle+ m s ) with ; else if tail < LeftMiddle then Push down x tail+1 to x LeftMiddle with ; InvTFT (head , tail , LeftMiddle , s + 1); Push up x head to x LeftMiddle with ;10ecursive call. Consider the integer sequences given by α i = tail i − head i ∈ Z ,β i = tail i − (cid:22) last i − head i (cid:23) ∈ Z . If head i > tail i then we have termination. Otherwise, either branch (7) executes givinghead i +1 = (cid:22) last i − head i (cid:23) + head i > head i tail i +1 = tail i last i +1 = last i and thus α i +1 < α i , or branch (13) executes, givinghead i +1 = head i tail i +1 = tail i last i +1 = (cid:22) last i − head i (cid:23) + head i > head i . and thus β i +1 < β i .Neither branch can run forever since α < β < α strictly decreases or condition (13) fails, forcing termination. Sketch of correctness.
Figure 7 and Figure 16 demonstrate that self contained regions canbe exploited to obtain the initial values required to complete the inversion. That is to say,for n ∈ { , . . . , p − } , that x = ( x p − n − , , . . . , x p − n − , n +1 − )can always be calculated from x = ( x p, , . . . , x p, ℓ − , x p − n − , tail+1 , . . . , x p − n − , n +1 − ) . The Truncated Fourier Transform is a novel and elegant way to reduce the number of com-putations of a dft -based computation by a possible factor of two (which may be significant).Additionally, with the advent of Harvey and Roche’s paper [2], it is possible to save as muchspace as computation. The hidden “cost” of working with the tft algorithm is the increaseddifficulty of determining the inverse tft . Although in most cases this is still less costly thanthe inverse dft , the algorithm is no doubt more difficult to implement.11igure 7: tail ≥ LeftMiddle (i.e. at least half the values are at x = p ). x head x tail x last n n s = p − n − s = p − ns = p (a) Line (8): push up the self contained (dashed) region. This yields values sufficient to push down at line(9). x head x tail x last n s = p − n − s = p − ns = p (b) This enables us to make a recursive call on the dashed region (line (12)). By our induction hypothesisthis brings all points at s = p to s = p − n . s = p − n − s = p − ns = p (c) Sufficient points at s = p − n are known to move to s = p − n − head x tail x last n n s = p − n − s = p − ns = p (d) Initially there is sufficient information to push down at line (14). x head x tail x last n n s = p − n − s = p − ns = p (e) This enables us to make the prescribed recursive call at line (15). n n s = p − n − s = p − ns = p (f) By the induction hypothesis this brings the values in the dashed region to s = p − n , leaving enoughinformation to move up at line (16). Figure 8: tail < LeftMiddle (i.e. less than half the values are at x = p ).13 a) Initial state of the algorithm. Grey dots are theresult of the forward tft ; larger grey dots are zeros. (b) tail ≥ LeftMiddle. Push up; calculate x , , . . . , x , from x , , . . . , x , (contained region). Then pushdown.(c) Recursive call on right half. (d) tail < LeftMiddle. Push down with.(e) Recursive call on left half. (f) tail ≥ LeftMiddle. Push up the contained (dashed)region then push down.(g) Recursive call on right half. (h) Hiding details. The result of (g).(i) Finish step (e) by pushing up. (j) Finish step (c) by pushing up.(k) Resolve the original call by pushing up. (l) Done.
Figure 9: Schematic representation of the recursive computation of the inverse tft for n = 16and ℓ = 11. 14 cknowledgements The author wishes to thank Dr. Dan Roche and Dr. ´Eric Schost for reading a draft of thispaper and offering suggestions.
References [1] K. O. Geddes, S. R. Czapor, and G. Labahn.
Algorithms for Computer Algebra . KluwerAcademic Publishers, 1992.[2] David Harvey and Daniel S. Roche. An in-place truncated fourier transform and applica-tions to polynomial multiplication. In
Proceedings of the 2010 International Symposiumon Symbolic and Algebraic Computation , ISSAC ’10, pages 325–329, New York, NY,USA, 2010. ACM.[3] H.V. Sorensen and C.S. Burrus. Efficient computation of the dft with only a subset ofinput or output points.
Signal Processing, IEEE Transactions on , 41(3):1184 –1200, mar1993.[4] J. van der Hoeven. Notes on the Truncated Fourier Transform. Technical report, Uni-versit´e Paris-Sud, Orsay, France, 2008.[5] Joris van der Hoeven. The truncated fourier transform and applications. In