[PDF] A fast FFT-based discrete Legendre transform

Abstract

An O(N(logN ) 2 /loglogN) algorithm for computing the discrete Legendre transform and its inverse is described. The algorithm combines a recently developed fast transform for converting between Legendre and Chebyshev coefficients with a Taylor series expansion for Chebyshev polynomials about equally-spaced points in the frequency domain. Both components are based on the FFT, and as an intermediate step we obtain an O(NlogN) algorithm for evaluating a degree N−1 Chebyshev expansion at an N -point Legendre grid. Numerical results are given to demonstrate performance and accuracy.

Full PDF

IIMA Journal of Numerical Analysis (2015) Page 1 of 15doi:10.1093/imanum/drnxxx

A fast FFT-based discrete Legendre transform

Nicholas Hale † and Alex Townsend ‡[Received on ; revised on ] An O ( N ( log N ) / loglog N ) algorithm for the computation of the discrete Legendre transform and its in-verse is described. The algorithm combines a recently developed fast transform for converting betweenLegendre and Chebyshev coefﬁcients with a Taylor series expansion for Chebyshev polynomials aboutequally-spaced points in the frequency domain. Both components are based on the FFT, and as an inter-mediate step we obtain an O ( N log N ) algorithm for evaluating a degree N − N -point Legendre grid. Numerical results are given to demonstrate performance and accuracy. Keywords : discrete Legendre transform; Legendre polynomials; Chebyshev polynomials; fast Fouriertransform

1. Introduction

Given N values c , . . . , c N − , the discrete Legendre transform (DLT) or ﬁnite Legendre transform calcu-lates the discrete sums f k = N − ∑ n = c n P n ( x legk ) , (cid:54) k (cid:54) N − , (1.1)where P n ( x ) denotes the degree n Legendre polynomial and the Legendre nodes, 1 > x leg > . . . > x legN − > −

1, are the roots of P N ( x ) . The inverse discrete Legendre transform (IDLT), which computes c , . . . , c N − given f , . . . , f N − , can be expressed as c n = ( n + ) N − ∑ k = w legk f k P n ( x legk ) , (cid:54) n (cid:54) N − , (1.2)where w leg , . . . w legN − are the Gauss–Legendre quadrature weights. It is the goal of this paper to describefast algorithms for computing the DLT and IDLT.Expansions in Legendre polynomials can be preferred over Chebyshev expansions because Legendrepolynomials are orthogonal in the standard L -norm. However, Legendre expansions are less convenientin practice because fast algorithms are not as readily available or involve a precomputational cost thatinhibits applications employing adaptive discretizations. By deriving fast algorithms for the DLT andIDLT we take some steps towards removing this barrier and allowing Legendre expansions and adaptivediscretizations at Legendre nodes to become a practical tool in computational science.Our approach makes use of the fast Chebyshev–Legendre transform described in (Hale & Townsend,2014), which is based on carefully exploiting an asymptotic expansion of Legendre polynomials and thefast Fourier transform (FFT). Using the FFT has advantages and disadvantages. On the one hand, fast † Department of Applied Mathematics, University of Stellenbosch, Stellenbosch, 7600, South Africa. ([email protected]) ‡ Department of Mathematics, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139-4307.([email protected]) c (cid:13) The author 2015. Published by Oxford University Press on behalf of the Institute of Mathematics and its Applications. All rights reserved. a r X i v : . [ m a t h . NA ] O c t of 15 N. HALE AND A. TOWNSENDValues atChebyshevpoints Legendrecoefﬁcients ChebyshevcoefﬁcientsValues in thecomplex plane Values atLegendrepoints(Potts et al. , 1998)(Mori et al. , 1999) (Hale & Townsend, 2014)(Keiner, 2008)(Alpert & Rokhlin, 1991)DCT(Iserles, 2011) (Potts, 2003)(Keiner, 2009)(Tygert, 2010) NDCT(Dutt & Rokhlin, 1993, 1995)(Fenn & Potts, 2005)DLTIDLTF IG . 1. Some existing fast algorithms related to the DLT and IDLT. Solid lines depict transforms discussed in this paper. The“non-uniform discrete cosine transform” (NDCT) is described in Section 2, the DLT in Section 3, and the IDLT in Section 4.This table is far from complete. Furthermore, some of the papers referenced belong to more than one of the connecting lines; forexample, (Keiner, 2009) computes the DLT by combining a transformation from Legendre coefﬁcients to Chebyshev coefﬁcientswith an NDCT. We have tried to include as many of the key papers as possible, without overcomplicating the diagram. and industrial-strength implementations of algorithms for computing the FFT are ubiquitous. On theother hand, the FFT is restricted to equally-spaced samples and is therefore not immediately applicableto situations like computing the DLT. In this paper we overcome the equally-spaced restriction by con-sidering Legendre nodes as a perturbation of a Chebyshev grid and employing a truncated Taylor seriesexpansion. A similar approach for the discrete Hankel transform is described in (Townsend, 2015a).Examples of existing approaches that may be used or adapted to compute the DLT include butterﬂyschemes (O’Neil et al. , 2010), divide-and-conquer approaches (Keiner, 2008; Tygert, 2010), and theoversampled non-uniform discrete cosine transform (Potts, 2003; Fenn & Potts, 2005; Keiner et al. ,2009). The algorithm presented here has three advantages: (1) It is simple (the main ingredients are justthe FFT and Taylor approximations), (2) There is essentially no precomputational cost; and, (3) Thereare no algorithmic parameters for the user to tweak as they are all optimally determined by analysis. Inparticular, these ﬁnal two points make the algorithm presented in this paper well-suited to applicationswhere N is not known in advance or when there are multiple accuracy goals.It is interesting to note that before the work of Glaser et al. (2007) there was no known stable methodfor computing the Legendre nodes in fewer than O ( N ) operations. Therefore, prior to 2007, it seemslikely that all DLT algorithms had a precomputational cost of at least O ( N ) . Nowadays, we believethat the algorithms described above, with the exception of (O’Neil et al. , 2010), can be considered tohave a lower precomputational cost.The paper is structured as follows: In the next section we develop an O ( N log N ) algorithm forevaluating a Chebyshev series expansion at Legendre nodes (we refer to this as a “non-uniform discretecosine transform” or NDCT). In Section 3 we combine this with the fast Chebyshev–Legendre transformfrom (Hale & Townsend, 2014) to obtain an O ( N ( log N ) / loglog N ) algorithm for computing the DLT.A similar algorithm for the IDLT is described in Section 4. Throughout we use the following notation: The current state of the art for computing the Legendre nodes is due to Bogaert (2014), whose O ( N ) algorithm can be rapidlycompute millions of nodes to 16-digits of accuracy in under a second. For a discussion of the history of computing Gauss–Legendre quadrature nodes and weights, see (Townsend, 2015b). FAST FFT-BASED DISCRETE LEGENDRE TRANSFORM v , . . . , v N − is denoted by v , a row vector by v T , a diagonal matrix withentries v , . . . , v N − by D v , and the N × N matrix with ( j , k ) entry P k ( x j ) by P N ( x ) .

2. A non-uniform discrete cosine transform

Before considering the DLT (1.1) we describe an O ( N log N ) algorithm for evaluating a Chebyshevexpansion at Legendre nodes, f k = N − ∑ n = c n T n ( x legk ) , (cid:54) k (cid:54) N − . (2.1)This will become an important step in the DLT. Here, T n ( x ) = cos ( n cos − x ) is the degree n Chebyshevpolynomial of the ﬁrst kind and the sums in (2.1) may be rewritten as f k = N − ∑ n = c n cos ( n θ legk ) , (cid:54) k (cid:54) N − , (2.2)where x legk = cos ( θ legk ) . If the Legendre nodes in (2.2) are replaced by the Chebyshev points of the ﬁrstkind, i.e., x cheb k = cos ( θ cheb k ) = cos (cid:32) ( k + ) π N (cid:33) , (cid:54) k (cid:54) N − , (2.3)then (2.1) can be computed in O ( N log N ) operations via a diagonally-scaled discrete cosine transformof type III (DCT-III) (Gentleman, 1972). Since (2.2) has a similar form to a DCT, but with non-equally-spaced samples θ legk , we refer to (2.1) as a “non-uniform discrete cosine transform” (NDCT).Our algorithm to compute the NDCT is summarized as follows: consider the transformed Legendrenodes, θ leg , . . . , θ legN − , as a perturbation of an equally-spaced grid, θ ∗ , . . . , θ ∗ N − , i.e., θ legk = θ ∗ k + δ θ k , (cid:54) k (cid:54) N − , and then approximate each cos ( n θ legk ) term in (2.2) by a truncated Taylor series expansion about θ ∗ k . If | δ θ k | is small (see Section 2.2) then only a few terms in the Taylor series expansion are required. More-over, since the θ ∗ , . . . , θ ∗ N − are equally-spaced, each term in the series can be computed in O ( N log N ) operations via a DCT (see Section 2.3). The Legendre nodes, both x leg , . . . , x legN − and θ leg , . . . , θ legN − , canbe computed in O ( N ) operations using the algorithm described in (Bogaert, 2014).2.1 Computing an NDCT via a Taylor series expansion

As in (Hale & Townsend, 2014) we ﬁnd it convenient to work in the θ = cos − x variable. There arethree reasons for this:(1) The quantities θ legk can be computed more accurately than x legk (Swarztrauber, 2003).(2) The function cos − x is ill-conditioned near x = ±

1. In particular, since ddx cos − x = ( − x ) − / ,1 − ( x leg ) = O ( N − ) , and 1 − ( x legN − ) = O ( N − ) , one can expect rounding errors that grow like O ( N ) when evaluating cos − ( x leg ) and cos − ( x legN − ) . Furthermore, when evaluating T N − ( cos − ( x leg )) and T N − ( cos − ( x legN − )) this rounding error can increase to O ( N ) ; of 15 N. HALE AND A. TOWNSEND (3) The Taylor series of T n ( cos ( θ + δ θ )) = cos ( n ( θ + δ θ )) about θ involves simple trigonometricterms and is more convenient to work with than the ultraspherical polynomials that arise in aTaylor series of T n ( x + δ x ) about x .The Taylor series expansion of T n ( cos ( θ + δ θ )) about θ ∈ [ , π ] can be expressed ascos ( n ( θ + δ θ )) = ∞ ∑ (cid:96) = cos ( (cid:96) ) ( n θ ) ( δ θ ) (cid:96) (cid:96) ! = ∞ ∑ (cid:96) = ( − ) (cid:98) ( (cid:96) + ) / (cid:99) Φ (cid:96) ( n θ ) ( n δ θ ) (cid:96) (cid:96) ! , (2.4)where Φ (cid:96) ( θ ) = (cid:40) cos ( θ ) , (cid:96) even,sin ( θ ) , (cid:96) odd.If | δ θ | is small then a good approximation to T n ( cos ( θ + δ θ )) can be obtained by truncating this seriesafter just a handful of terms. The following lemma determines the accuracy we can expect from takingthe ﬁrst L terms.L EMMA L (cid:62) n (cid:62) R L , n , δθ : = max θ ∈ [ , π ] (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) cos ( n ( θ + δ θ )) − L − ∑ (cid:96) = cos ( (cid:96) ) ( n θ ) ( δ θ ) (cid:96) (cid:96) ! (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (cid:54) ( n | δ θ | ) L L ! . Proof.

1, we write θ legk = θ ∗ k + δ θ k and substitute the ﬁrst L terms of (2.4) into (2.1) toobtain, f k , L , δθ k = N − ∑ n = c n (cid:32) L − ∑ (cid:96) = ( − ) (cid:98) ( (cid:96) + ) / (cid:99) Φ (cid:96) ( n θ ∗ k ) ( n δ θ k ) (cid:96) (cid:96) ! (cid:33) (2.5) = L − ∑ (cid:96) = ( − ) (cid:98) ( (cid:96) + ) / (cid:99) δ θ (cid:96) k (cid:96) ! (cid:32) N − ∑ n = ( n (cid:96) c n ) Φ (cid:96) ( n θ ∗ k ) (cid:33) , (2.6)where the second equality follows by interchanging the order of the summations.C OROLLARY f k is as in (2.1) and f k , L , δθ k as in (2.5), then for any L (cid:62) | f k − f k , L , δθ k | (cid:54) N − ∑ n = | c n | R L , n , δθ k (cid:54) ( N | δ θ k | ) L L ! || c || , (cid:54) k (cid:54) N − . Proof.

This follows immediately from applying Lemma 2.1 to each of the terms in (2.1). (cid:3)

Hence, if we approximate θ leg , . . . , θ legN − by points θ ∗ , . . . , θ ∗ N − such that • max (cid:54) k (cid:54) N − | θ legk − θ ∗ k | is sufﬁciently small (see Section 2.2); and, • θ ∗ , . . . , θ ∗ N − is equally spaced,then the number of terms, L , required to achieve an accuracy of | f k − f k , L , δθ k | (cid:54) ε || c || is small. More-over, the inner sums of (2.6) can be computed via discrete cosine and sine transforms (see Section 2.3),which can be computed in O ( N log N ) via an FFT. This gives us the foundations of a fast algorithm. FAST FFT-BASED DISCRETE LEGENDRE TRANSFORM

Legendre nodes as a perturbation of a Chebyshev-like grid

The Chebyshev points of the ﬁrst kind, x cheb , . . . , x cheb N − , are an approximation to x leg , . . . , x legN − whichsatisfy the requirements in the two bullet points above. In particular, letting θ cheb k = cos − ( x cheb k ) , onecan readily show that (Olver et al. , 2010, (18.16.3)) | θ legk − θ cheb k | (cid:54) π ( N + ) , (cid:54) k (cid:54) N − . However, this bound is a little weak. A better bound is given in Lemma 2.2.Another possibility is to consider the leading order term of the asymptotic expansion of Legendrepolynomials of large degree (Stieltjes, 1890): P N ( cos θ ) ∼ (cid:114) π Γ ( N + ) Γ ( N + ) cos (cid:0) ( N + ) θ − π (cid:1) ( θ ) / , N → ∞ . The zeros of this leading term are θ cheb ∗ k = ( k + ) π N + , (cid:54) k (cid:54) N − , which are also equally-spaced and provide us with an approximation to Legendre nodes. In fact, multi-plying the numerator and denominator by a factor of 2 and comparing to (2.3) we see that these pointscorrespond precisely to the odd terms (i.e., when the index k is odd) in the (2 N + θ cheb ∗ k and θ cheb k approximate θ legk , and Figure 2 demon-strates that the derived bounds are tight.L EMMA θ leg , . . . , θ legN − denote the N roots of P N ( cos θ ) . Then,max (cid:54) k (cid:54) N − (cid:12)(cid:12)(cid:12) θ legk − θ cheb ∗ k (cid:12)(cid:12)(cid:12) (cid:54) π ( N + ) (cid:54) π N , (2.7)and max (cid:54) k (cid:54) N − (cid:12)(cid:12)(cid:12) θ legk − θ cheb k (cid:12)(cid:12)(cid:12) (cid:54) π + π ( N + ) (cid:54) . N . (2.8) Proof.

We ﬁrst consider | θ legk − θ cheb ∗ k | and restrict our attention to 0 (cid:54) k (cid:54) (cid:98) N / (cid:99) −

1. From (Szeg¨o,1936, Thm. 6.3.2) we know that θ cheb ∗ k < θ legk . Moreover, by (Brass et al. , 1996, Thm. 3.1 & eqn. (3.2))and cot ( x ) < / x , we have θ legk − θ cheb ∗ k (cid:54) ( N + ) cot θ cheb ∗ k (cid:54) N + ( N + ) ( k + ) π (cid:54) ( N + ) π , (cid:54) k (cid:54) (cid:98) N / (cid:99) − . By symmetry of the θ legk and θ cheb ∗ k , the same bound holds for k > (cid:98) N / (cid:99) − | θ legk − θ cheb k | follows from the triangle inequality. That is, for 0 (cid:54) k (cid:54) N −

1, wehave (cid:12)(cid:12)(cid:12) θ legk − θ cheb k (cid:12)(cid:12)(cid:12) (cid:54) (cid:12)(cid:12)(cid:12) θ legk − θ cheb ∗ k (cid:12)(cid:12)(cid:12) + (cid:12)(cid:12)(cid:12) θ cheb k − θ cheb ∗ k (cid:12)(cid:12)(cid:12) (cid:54) π ( N + ) + π ( k − N + ) N ( N + ) . of 15 N. HALE AND A. TOWNSEND -6 -5 -4 -3 -2 -1 || θ leg − θ cheb || ∞ || θ leg − θ cheb ∗ || ∞ N (cid:107) θ l e g − θ c h e b (cid:107) ∞ (cid:54) . N − (cid:107) θ l e g − θ c h e b ∗ (cid:107) ∞ (cid:54) ( π N ) − F IG . 2. The maximum distance between corresponding entries of θ leg and θ cheb (upper solid line) and θ leg and θ cheb ∗ (lowersolid line) as a function of N . The dashed lines depict the bounds derived in Lemma 2.2, which appear to be tight. Since k (cid:54) N −

1, we obtain (cid:12)(cid:12)(cid:12) θ legk − θ cheb k (cid:12)(cid:12)(cid:12) (cid:54) π ( N + ) + π ( N − ) N ( N + / ) (cid:54) π + π ( N + ) (cid:54) . N , (cid:54) k (cid:54) N − . (cid:3) C OROLLARY θ ∗ k = θ cheb k or θ ∗ k = θ cheb ∗ k for 0 (cid:54) k (cid:54) N − (cid:54) k (cid:54) N − | f k − f k , L , δθ cheb k | (cid:54) ( . ) L L ! || c || (2.9)or max (cid:54) k (cid:54) N − | f k − f k , L , δθ cheb ∗ k | (cid:54) ( π ) L L ! || c || , (2.10)respectively. Proof.

Follows immediately from combining Corollary 2.1 and Lemma 2.2. (cid:3)

We note that in (2.9) and (2.10) the terms multiplying (cid:107) c (cid:107) are independent of N , and so the numberof terms required in (2.6) to obtain a given precision is bounded independently of N . Table 1 showsthese terms for 1 (cid:54) L (cid:54)

18. Double precision is obtained for L (cid:38) θ ∗ k = θ cheb ∗ k and L (cid:38) θ ∗ k = θ cheb k .2.3 Computing the discrete cosine and sine transforms

To complete our fast algorithm for the NDCT we require a fast way to compute the sums containedinside the parenthesis of (2.6). In particular, writing (2.6) in vector form, we have f k , L = L − ∑ (cid:96) = even ( − ) (cid:98) ( (cid:96) + ) / (cid:99) (cid:96) ! D (cid:96) δθ DCT ( c [ (cid:96) ] ) + L − ∑ (cid:96) = odd ( − ) (cid:98) ( (cid:96) + ) / (cid:99) (cid:96) ! D (cid:96) δθ DST ( c [ (cid:96) ] ) , (2.11) FAST FFT-BASED DISCRETE LEGENDRE TRANSFORM L . L / L ! 8 . × − . × − . × − . × − . × − . × − . × − . × − . × − (( π ) L L ! ) − . × − . × − . × − . × − . × − . × − . × − . × − . × − L

10 11 12 13 14 15 16 17 180 . L / L ! 4 . × − . × − . × − . × − . × − . × − . × − . × − . × − (( π ) L L ! ) − . × − . × − . × − . × − . × − . × − . × − . × − . × − Table 1. The terms multiplying (cid:107) c (cid:107) in (2.9) and (2.10) for 1 (cid:54) L (cid:54)

18. The shaded boxes show the smallest value of L for whichthese fall below single and double precision. where [ c [ (cid:96) ] ] n = n (cid:96) c n , n = , . . . , N −

1. When θ ∗ = θ cheb the DCT and DST take the form N − ∑ n = n (cid:96) c n cos (cid:32) n ( k + ) π N (cid:33) , N − ∑ n = n (cid:96) c n sin (cid:32) n ( k + ) π N (cid:33) , (cid:54) k (cid:54) N − , which are readily related to the DCT-III and DST-III, respectively. When θ ∗ = θ cheb ∗ they become N − ∑ n = n (cid:96) c n cos (cid:32) n (( k + ) + ) π N + (cid:33) , N − ∑ n = n (cid:96) c n sin (cid:32) n (( k + ) + ) π N + (cid:33) , (cid:54) k (cid:54) N − , which are equivalent to the odd terms of a DCT-III and DST-III of length 2 N +

1. Hence, both may beevaluated in O ( N log N ) operations.R EMARK θ cheb ∗ points require around half thenumber of terms in the Taylor series compared to θ cheb (see Table 1), we see from above that eachterm will be roughly twice as expensive to evaluate. Hence, we expect little difference between the twoapproaches when there is a 16-digit accuracy goal.R EMARK

REDFT01 and

RODFT01 , respectively.). Instead, we use the DCT and DST codesimplemented in Chebfun (Driscoll et al. , 2014), which compute the DCT and DST via a complex-valued FFT of double the length. Based on our experiments, we believe this to be a factor of 2-4 slowerthan calling

REDFT01 and

RODFT01 directly.R

EMARK The MATLAB signal processing toolbox has a dct() function, but this is computed via an FFT of double the length. of 15

N. HALE AND A. TOWNSEND

Direct approach

To test the fast algorithm above we construct T N ( x leg ) via the three-term recurrence relation: T ( x ) = , T ( x ) = x , T n + ( x ) = xT n ( x ) − T n − ( x ) , n (cid:62) . Although this approach requires O ( N ) operations to compute T N ( x leg ) c , the memory cost can easilybe reduced to O ( N ) by computing the matrix-vector product as a recurrence. Hereinafter, we refer tothis as the direct approach .We note that one could instead compute the NDCT using the closed form expression of T n ( cos θ ) toconstruct T N ( x leg ) = cos (cid:0) θ leg [ , , . . . , N − ] (cid:1) and then evaluate the matrix-vector product f = T N ( x leg ) c .However, such a closed form is not available when we come to compute P N ( x leg ) in the next section.2.5 Numerical results for the NDCT

All the numerical results were performed on a 3.40GHz Intel Core i7-2600 PC with MATLAB 2015ain Ubuntu Linux. The vector of transformed nodes θ leg is computed with the legpts() command inChebfun (Driscoll et al. , 2014, v5.2) which uses Bogaert’s algorithm (Bogaert, 2014). As discussed inRemark 2.2, DCTs and DSTs are computed using the chebfun.dct() and chebfun.dst() routines.MATLAB codes for reproducing all of the results contained within this paper are available online (Hale& Townsend, 2015).As a ﬁrst test we take randomly distributed vectors c with various rates of decay and comparethe accuracy of both the direct and FFT-based approaches against an extended precision computation(Figure 3). The results for θ cheb and θ cheb ∗ in the FFT-based approach are virtually indistinguishableso we show only the former. We ﬁnd the FFT-based approaches are more accurate than the directapproach, despite our algorithm involving many signiﬁcant approximations. In many applications theChebyshev expansion in (2.1) is derived by an approximation of a smooth function. In this setting,if the function is H¨older continuous with α > /

2, we observe that that the FFT-based algorithm hasessentially no error growth with N .Our second test investigates the time taken to compute a real-valued NDCT of length N . Figure 4compares the direct approach (squares) and the FFT-based approach with θ cheb (triangles) and θ cheb ∗ (circles), in the range 100 (cid:54) N (cid:54) , N is typical of FFT-based algorithms, where the theoretical cost of the FFT as well as theprecise algorithm employed by FFTW depends on the prime factorization of N . The triangles andcircles lie on a piecewise linear least squares ﬁt to the computational timings, to suggest some kind of‘average’ time for near-by values of N . The variation also makes it difﬁcult to say when the FFT-basedalgorithm becomes more efﬁcient than the direct approach, but it occurs around N = ,

3. The discrete Legendre transform

Our procedure for computing the DLT in (1.1) splits naturally into two stages. The ﬁrst deals with thenon-uniform frequency present in P n ( cos θ ) and converts the transform to one involving T n ( cos θ ) withuniform frequency in θ . The second stage deals with the non-uniform grid cos ( θ leg ) , . . . , cos ( θ legN − ) andis the NDCT from the previous section. In particular, the vector corresponding to, say, N =

50 with O ( n − . ) decay can be reproduced exactly by the MATLAB code rng(0,’twister’); c = rand(50,1)./sqrt(1:50).’; . FAST FFT-BASED DISCRETE LEGENDRE TRANSFORM -16 -15 -14 -13 -12 -11 -10 -9 O ( n ) O ( n − . ) O ( n − ) O ( n − . ) N A b s o l u t ee rr o r A b s o l u t ee rr o r O ( N . ) O ( N ) O ( N . ) O ( N ) -16 -15 -14 -13 -12 -11 -10 -9 O ( n ) O ( n − . ) O ( n − ) O ( n − . ) N O ( N / l o g N ) O ( N . / l og N ) O ( N . l og N ) O ( l og N ) F IG . 3. Errors in computing the NDCT of vectors with various rates of decay using the direct method (left) and the new FFT-basedapproach with θ ∗ = θ cheb (right). In both cases, a vector of length N is created using randn(N,1) in MATLAB and then scaledso that the n th entry is O ( n ) , O ( n − . ) , O ( n − ) , and O ( n − . ) , giving the four curves in each panel. The dashed lines depictheuristic observations of the error growth in each case. The log factors may seem somewhat arbitrary, but the lines give a muchbetter ﬁt to the data when these are included. Denote by M the N × N upper-triangular matrix given by M kn =  π Γ ( n / + / ) Γ ( n / + ) , = k (cid:54) n (cid:54) N − , n even , π Γ (( n − k ) / + / ) Γ (( n + k ) / + ) , < k (cid:54) n (cid:54) N − , k + n even , , otherwise . If c is the vector of Legendre coefﬁcients in (1.1) and ˆ c = Mc , then we have (Alpert & Rokhlin, 1991) f k = N − ∑ n = c n P n ( x legk ) = N − ∑ n = ˆ c n T n ( x legk ) , (cid:54) k (cid:54) N − . (3.1)Now, the summation on the righthand side takes the form of the NDCT (2.1) and we may use thealgorithm described in Section 2.3 to compute f in O ( N log N ) operations.To compute ˆ c directly, i.e. via the matrix-vector product Mc , requires O ( N ) operations. Instead,we note that the transform ˆ c = Mc can be computed in O ( N ( log N ) / loglog N ) operations using theFFT-based algorithm described in (Hale & Townsend, 2014) or by the fast multipole-based approachin (Alpert & Rokhlin, 1991). In this paper we use the algorithm in (Hale & Townsend, 2014) because ithas no precomputational cost and so allows for adaptive discretizations.Another convenient property of the transforms in this paper is that there is an asymptotically optimalselection of algorithmic parameters for any working tolerance. In the NDCT, the number of terms inthe Taylor series expansion can be precisely determined based on the working tolerance. We would likethis to also hold for the O ( N ( log N ) / loglog N ) Chebyshev–Legendre transform described in (Hale &Townsend, 2014). We remark that though the paper considered only a working tolerance of machineprecision throughout, the algorithmic parameters were derived and determined in terms of ε >

0. Weslightly modiﬁed our implementation to exploit arbitrary working tolerances.0 of 15

N. HALE AND A. TOWNSEND -3 -2 -1 recurrencecheb cheb ∗ N T i m e ( s ec s ) O ( N ) O ( N l o g N ) F IG . 4. Time to compute a NDCT of various lengths using the direct approach (squares) and FFT-based approaches from Section 2with x cheb (triangles) and x cheb ∗ (circles). For larger values of N the two FFT-based approaches require a comparable time, andthe crossover with the direct approach occurs at around N = , x cheb ∗ time (circles) at around N =

650 is caused by a switch in algorithmic details of the FFT routine.

Numerical results for the discrete Legendre transform

First, we take random vectors c with normally distributed entries and then scale them to have algebraicrates of decay O ( n ) , O ( n − . ) , O ( n − ) , and O ( n − . ) , as in Section 2.5. The accuracy of both the directand FFT-based approaches is compared against an extended precision computation. The results areshown in Figure 5 and one can see that the errors for the direct and FFT-based approach are comparable.In most practical applications the Legendre coefﬁcients are associated to a smooth function so theydecay at least algebraically.While we achieve a similar error to the direct approach, our FFT-based approach has a lower com-plexity and are more computationally efﬁcient for large N . Figure 6 shows a comparison of the compu-tational times for the direct and FFT-based approach. When comparing with Figure 4 we see that thecomputational time for the DLT is dominated by the leg2cheb transformation. For an accuracy goal ofdouble precision, the crossover between the direct and FFT-based approach is approximately N = ,

4. The inverse discrete Legendre transform

We now turn to the IDLT, which can be seen as taking values of a function on an N -point Legen-dre grid and returning the corresponding Legendre coefﬁcients of the degree N − O ( N ( log N ) / loglog N ) complexity. Unlike most NFFT algorithms which resort to an iterative ap-proach for computing the inverse transform (Anderson & Dahleh, 1996; Dutt & Rokhlin, 1993), herewe take advantage of the orthogonality of Legendre polynomials, and in particular the exactness ofGauss–Legendre quadrature, to derive a direct method for the IDLT.It is convenient to work with matrix notation rather than summations, and we denote the N × N matrix with ( j , k ) entries P k ( x legj ) as P N ( x leg ) and the N × N with ( j , k ) entry T k ( x legj ) as T N ( x leg ) . Us-ing the orthogonality of Legendre polynomials and the exactness of Gauss–Legendre quadrature, theIDLT (1.2) can then be conveniently written as c = P − N ( x leg ) f = D s P TN ( x leg ) D w leg f , (4.1) FAST FFT-BASED DISCRETE LEGENDRE TRANSFORM

11 of 15 -16 -15 -14 -13 -12 -11 -10 -9 O ( n ) O ( n − . ) O ( n − ) O ( n − . ) N A b s o l u t ee rr o r O ( N . / l o g N ) O ( N / l o g N ) O ( N . / l o g N ) O ( N / l og N ) -16 -15 -14 -13 -12 -11 -10 -9 O ( n ) O ( n − . ) O ( n − ) O ( n − . ) N O ( N . / l o g N ) O ( N / l o g N ) O ( N . / l o g N ) O ( N / l og N ) F IG . 5. Errors in computing the DLT of vectors with various rates of decay using the direct approach (left) and the FFT-basedapproach (right). In both cases, a vector of length N is created using randn(N,1) in MATLAB and then scaled so that the n th entry is O ( n ) , O ( n − . ) , O ( n − ) , and O ( n − . ) , giving the four curves in each panel. The dashed lines depict heuristicobservations of the error growth in each case. where s = ( (cid:107) P (cid:107) − , . . . , (cid:107) P N − (cid:107) − ) T and w leg is the vector of Gauss–Legendre quadrature weights.From (3.1) we know that, for any vector c , P N ( x leg ) c = T N ( x leg ) ˆ c = T N ( x leg ) Mc , and hence we have thematrix decomposition P N ( x leg ) = T N ( x leg ) M . Substituting this into (4.1) we obtain c = D s M T T TN ( x leg ) D w f . Thus, c can also be computed in O ( N ( log N ) / loglog N ) operations because:1. D w and D s are diagonal matrices, so can be readily be applied to a vector in O ( N ) operations,2. T TN ( x leg ) is precisely the transpose of the NDCT in Section 2. Since the transpose of the DCTsand DSTs in Section 2.3 can themselves be expressed in terms of DCTs and DSTs (of type-II), T TN ( x leg ) can be approximated in O ( N log N ) operations by taking the transpose of (2.11).3. The matrix-vector product with M T is closely related to the Chebyshev–Legendre transform andcan be computed in O ( N ( log N ) / loglog N ) operations by a slight modiﬁcation of the algorithmin (Hale & Townsend, 2014). (Alternatively, one could use the ideas in (Alpert & Rokhlin, 1991).)4.1 Numerical results for the inverse discrete Legendre transform

For the inverse transform, the direct approach requires O ( N ) operations to compute the IDLT of length N , rather than the naive O ( N ) standard inversion algorithm. This is because the direct approachwe have implemented exploits the discrete orthogonality relation that holds for Legendre polynomi-als (see (4.1)). The matrix-vector product with P TN ( x leg ) in (4.1) is computed by using the three-termrecurrence relations satisﬁed by Legendre polynomials, except now along rows instead of columns.For our ﬁrst numerical experiment we take randomly generated vectors with normally distributedentries and compare the accuracy of both the direct and FFT-based approach against an extended preci-sion computation (see Figure 7, left). Unlike in Section 2.5 and Section 3.1 we do not consider decayin these vectors, as they will typically correspond to function values in space where there is no reason2 of 15 REFERENCES -2 -1 recurrencecheb N T i m e ( s ec s ) O ( N ) O ( N ( l o g N ) / l o g l o g N ) F IG . 6. Time taken to compute the DLT of length N using the direct approach (squares) and FFT-based approach (triangles). Herewe choose 1000 logarithmically spaced values of N in the range 10 –10 . The larger squares and rectangles are every 50th one ofthese. We omit the results for the FFT-based approach using the points x cheb ∗ as they are indistinguishable from those of x cheb . to expect decay. We ﬁnd that the errors incurred by our FFT-based IDLT are consistent with the directapproach.In Figure 7 (right) we show the execution time for computing the IDLT with the direct approach(squares) and FFT-based approach (triangles). As with the DLT, our IDLT algorithm is more computa-tionally efﬁcient than the direct approach when N (cid:62) , Conclusion

We have presented fast and simple algorithms for the discrete Legendre transform and its inverse, whichrely on a nonuniform discrete cosine transform and the Chebyshev–Legendre transform. Both com-ponents are based on the fast Fourier transform and have no precomputational cost. For an N -pointtransform both algorithms have a complexity of O ( N ( log N ) / log log N ) operations and are faster thanthe direct approach when N (cid:62) , et al. , 2014) via chebfun.dlt() and chebfun.idlt() , respectively. As part of theanalysis of the algorithm we derived a bound on the distance between associated points in a Chebyshevand Legendre grid of the same size (see Lemma 2.2), which we believe to be tighter than any similarbounds appearing in the literature. Acknowledgments

We thank Andr´e Weideman and Anthony Austin for reading a draft version of this manuscript, and theanonymous referees for their useful suggestions.

References A LPERT , B. K. & R

OKHLIN , V. (1991) A fast algorithm for the evaluation of Legendre expansions.

EFERENCES

13 of 15 -14 -13 -12 recurrencecheb N A b s o l u t ee rr o r O ( N l o g N ) -2 -1 recurrencecheb N T i m e ( s ec s ) O ( N ) O ( N ( l o g N ) / l o g l o g N ) F IG . 7. Left: Errors in computing the IDLT of vectors using the direct method and the FFT-based approach. A vector of length N is created using randn(N,1) in MATLAB with the dashed line showing the observed growth of the maximum absolute error.Right: Time taken to compute the IDLT of length N using the direct approach (squares) and FFT-based approach (triangles). SIAM J. Sci. Statist. Comput. , , 158–179.A NDERSON , C. & D

AHLEH , M. D. (1996) Rapid computation of the discrete Fourier transform.

SIAMJ. Sci. Comput. , , 913–919.B OGAERT , I. (2014) Iteration-free computation of Gauss–Legendre quadrature nodes and weights.

SIAM J. Sci. Comput. , , A1008–A1026.B RASS , H., F

ISCHER , J.-W. & P

ETRAS , K. (1996) The Gaussian quadrature method.

Abh. Braun-schweig. Wiss. Ges. , , 115–150 (1997).DLMF (2014) http://dlmf.nist.gov/ (Release 1.0.9). Online companion to Olver et al. (2010) .D RISCOLL , T. A., H

ALE , N. & T

REFETHEN , L. N. (2014)

Chebfun Guide . Pafnuty Publications,Oxford, UK.D

UTT , A. & R

OKHLIN , V. (1993) Fast Fourier transforms for nonequispaced data.

SIAM J. Sci.Comput. , , 1368–1393.D UTT , A. & R

OKHLIN , V. (1995) Fast Fourier transforms for nonequispaced data. II.

Appl. Comput.Harmon. Anal. , , 85–100.F ENN , M. & P

OTTS , D. (2005) Fast summation based on fast trigonometric transforms at non-equispaced nodes.

Numer. Linear Algebra Appl. , , 161–169.F RIGO , M. & J

OHNSON , S. G. (2005) The design and implementation of FFTW3.

Proceedings of theIEEE , , 216–231. Special issue on “Program Generation, Optimization, and Platform Adaptation”.G ENTLEMAN , W. M. (1972) Implementing Clenshaw–Curtis quadrature II: Computing the cosinetransformation.

Communications of the ACM , , 343–346.G LASER , A., L IU , X. & R OKHLIN , V. (2007) A fast algorithm for the calculation of the roots ofspecial functions.

SIAM Journal on Scientiﬁc Computing , , 1420–1438.4 of 15 REFERENCES H ALE , N. & T

OWNSEND , A. (2014) A fast, simple, and stable Chebyshev–Legendre transform usingan asymptotic formula.

SIAM J. Sci. Comput. , , A148–A167.H ALE , N. & T

OWNSEND , A. (2015) (Accessed 30April 2015). MATLAB code to reproduce the results in this paper.I

SERLES , A. (2011) A fast and simple algorithm for the computation of Legendre coefﬁcients.

Numer.Math. , , 529–553.K EINER , J. (2008) Gegenbauer polynomials and semiseparable matrices.

Electron. Trans. Numer.Anal. , , 26–53.K EINER , J. (2009) Computing with expansions in Gegenbauer polynomials.

SIAM J. Sci. Comput. , ,2151–2171.K EINER , J., K

UNIS , S. & P

OTTS , D. (2009) Using NFFT 3 — a software library for various nonequi-spaced fast Fourier transforms.

ACM Trans. Math. Software , , Art. 19, 30.K UNIS , S. (2008) Nonequispaced fast fourier transforms without oversampling.

PAMM , , 10977–10978.M ORI , A., S

UDA , R. & S

UGIHARA , M. (1999) An improvement on Orszag’s fast algorithm forLegendre polynomial transform.

Trans. Inform. Process. Soc. Japan , , 3612–3615.O LVER , F. W. J., L

OZIER , D. W., B

OISVERT , R. F. & C

LARK , C. W. (eds) (2010)

NIST Handbookof Mathematical Functions . New York, NY: Cambridge University Press. Print companion to DLMF(2014).O’N

EIL , M., W

OOLFE , F. & R

OKHLIN , V. (2010) An algorithm for the rapid evaluation of specialfunction transforms.

Applied and Computational Harmonic Analysis , , 203–226.P OTTS , D., S

TEIDL , G. & T

ASCHE , M. (1998) Fast algorithms for discrete polynomial transforms.

Math. Comp. , , 1577–1590.P OTTS , D. (2003) Fast algorithms for discrete polynomial transforms on arbitrary grids.

Linear AlgebraAppl. , , 353–370. Special issue on structured matrices: analysis, algorithms and applications(Cortona, 2000).S TIELTJES , T.-J. (1890) Sur les polynˆomes de Legendre.

Ann. Fac. Sci. Toulouse Sci. Math. Sci. Phys. , , G1–G17.S WARZTRAUBER , P. N. (2003) On computing the points and weights for Gauss–Legendre quadrature.

SIAM J. Sci. Comput. , , 945–954.S ZEG ¨ O , G. (1936) Inequalities for the zeros of Legendre polynomials and related functions. Trans.Amer. Math. Soc. , , 1–17.T OWNSEND , A. (2015a) A fast analysis-based discrete Hankel transform using asymptotic expansions. (Submitted) .T OWNSEND , A. (2015b) The race for high order gauss-legendre quadrature.

SIAM News , March . EFERENCES

15 of 15T

YGERT , M. (2010) Recurrence relations and fast algorithms.

Appl. Comput. Harmon. Anal. , ,121–128.W ARE , A. F. (1998) Fast approximate Fourier transforms for irregularly spaced data.

SIAM Rev. ,40