A Note on the High-dimensional Sparse Fourier Transform in the Continuous Setting
Liang Chen∗

Abstract
In this paper, we propose a new hashing scheme to establish a sparse Fourier transform in high-dimensional space. The estimate of the algorithm's complexity shows that this sparse Fourier transform can overcome the curse of dimensionality. To the best of our knowledge, this is the first polynomial-time algorithm to recover high-dimensional continuous frequencies.
Keywords: Curse of dimensionality, frequency estimation, runtime complexity, sparse Fourier transform.
The sparse Fourier transform (SFT) has received continuous attention from the applied mathematics ([1, 2, 3, 4, 5, 6]), signal processing ([7, 8, 9, 10, 11, 12]), and theoretical computer science ([13, 14, 15, 16, 17, 18]) communities over the last two decades. Since the sample complexity and runtime complexity of the SFT are mainly determined by the sparsity, and only weakly by the bandwidth, the SFT has great advantages in signal processing. Most of the relevant work deals with the discrete case, where the frequencies lie on a grid. Under this condition, the SFT can overcome the curse of dimensionality ([19, 18, 20, 21, 6]). However, the condition that the frequencies lie on a grid is quite restrictive, and it is natural to consider frequencies lying in a continuous region. This has led researchers to establish the sparse Fourier transform in the one-dimensional continuous setting ([22, 23, 24]). Recently, [25] initiated the study of the SFT in the high-dimensional continuous setting. Unfortunately, their method is still subject to the curse of dimensionality: its runtime complexity is greater than $2^{O(d)}$, where $d$ stands for the dimension.

In this paper, we present a new hashing scheme that transforms the high-dimensional SFT into the one-dimensional SFT. The computational complexity of the resulting algorithm is polynomial, which means that the algorithm breaks the curse of dimensionality. To the best of our knowledge, this is the first polynomial-time algorithm to recover frequencies in the high-dimensional continuous setting.

Formally, we consider a signal $f$ of the form
$$f(t) = \sum_{j=1}^{k} f_j(t) \triangleq \sum_{j=1}^{k} a_j \exp(2\pi i\, w_j \cdot t), \tag{1.1}$$
where $t \in \mathbb{R}^d$, $w_j \in [-M, M]^d$, $a_j \in \mathbb{C}$ with $0 < A' \le |a_j| \le A$ for all $j = 1, 2, \ldots, k$, and $\min_{i \ne j} |w_i - w_j| > \eta$ (the frequency gap).
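As a concrete illustration of the model (1.1), the following minimal Python sketch draws a $k$-sparse multi-tone signal with an $\eta$-separated frequency set; `make_signal` and every parameter value here are illustrative choices of ours, not constructions from the paper.

```python
import numpy as np

def make_signal(k, d, M, eta, A_lo=0.5, A_hi=1.0, rng=np.random.default_rng(0)):
    """Draw f(t) = sum_j a_j exp(2*pi*i w_j . t) with frequencies in [-M, M]^d
    separated by at least eta (rejection sampling); A_lo, A_hi play the roles
    of A' and A in (1.1)."""
    freqs = []
    while len(freqs) < k:
        w = rng.uniform(-M, M, size=d)
        if all(np.linalg.norm(w - u) > eta for u in freqs):
            freqs.append(w)
    W = np.array(freqs)                                  # k x d frequency matrix
    a = rng.uniform(A_lo, A_hi, k) * np.exp(2j * np.pi * rng.uniform(0, 1, k))
    def f(t):                                            # t: shape (d,) or (n, d)
        t = np.atleast_2d(t)
        return np.exp(2j * np.pi * (t @ W.T)) @ a        # shape (n,)
    return f, W, a

f, W, a = make_signal(k=4, d=10, M=100.0, eta=1.0)
print(f(np.zeros(10)))   # equals the sum of the amplitudes at t = 0
```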
Lemma 2.1
Suppose $0 < \delta < 1$, $0 < \epsilon < \min\{1, \eta, A', A\}$, $s = O(\sqrt{d}\,k^2/\delta)$, $T = O\big(k^3 d^{3/2}/((\epsilon\delta/(ds))\,\eta\delta)\big)$, $F = O(k^2 M/\delta)$. With probability at least $1 - \delta/2$ over the randomness of $\{h_1, \ldots, h_d, b\}$, for each $j \in \{1, \ldots, s\}$, either
$$I_{j,1} \triangleq \frac{1}{(TF)^d} \sum_{x \in \Gamma} \Big| e^{2\pi i b x_d/(TF)}\, \mathcal{F}^{-1}\big[X_j \cdot \mathcal{F}[f_H]\big](h^*(x)) \Big|^2 \le \Big(\frac{A\epsilon\delta}{ds}\Big)^2, \tag{2.3}$$
or there exists a unique $j_k \in \{1, 2, \ldots, k\}$ such that
$$I_{j,2} \triangleq \frac{1}{(TF)^d} \sum_{x \in \Gamma} \Big| f_{j_k}(x) - e^{2\pi i b x_d/(TF)}\, \mathcal{F}^{-1}\big[X_j \cdot \mathcal{F}[f_H]\big](h^*(x)) \Big|^2 \le \Big(\frac{A\epsilon\delta}{ds}\Big)^2, \tag{2.4}$$
where $f_H(x) \triangleq f\big((h^{-1})^* x\big) \exp\big(-2\pi i\, (h^{-1})^* x \cdot (0, \ldots, 0, b/T)^*\big)$, and
$$X_j(\xi) \triangleq \begin{cases} 1 & \xi \in B_j \\ 0 & \xi \notin B_j \end{cases}, \qquad j = 1, 2, \ldots, s.$$
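The precise definitions of the hash $h$, the random translation $b$, and the buckets $B_j$ do not survive in the extracted text above; consistent with Lemma 3.2 below, $(h(w))_d$ is a random linear form $\sum_i h_i w_i$, and the buckets partition the last coordinate into $s$ cells of width $F/s$. Under those assumptions, here is a minimal sketch of the bucketing step; `hash_last_coord`, `bucket_of`, and all scalings are hypothetical illustrations, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(1)

def hash_last_coord(W, h):
    """Last coordinate of the hashed frequencies: (h(w))_d = sum_i h_i w_i."""
    return W @ h

def bucket_of(v, F, s, b):
    """Bucket index after the random translation b: [0, F) is split into
    s buckets of width F/s (taken mod F). This is an assumed reading of B_j."""
    return np.floor(((v + b) % F) / (F / s)).astype(int)

# toy check that eta-separated frequencies usually land in distinct buckets
d, k, M = 10, 4, 100.0
F = 50 * k**2 * M                        # scaling in the spirit of F = O(k^2 M / delta)
s = 10 * int(np.sqrt(d)) * k**2          # scaling in the spirit of s = O(sqrt(d) k^2 / delta)
W = rng.uniform(-M, M, size=(k, d))      # assume these are eta-separated
h = rng.uniform(-F / (2 * M * d), F / (2 * M * d), d)   # keeps |(h(w))_d| below F/2
b = rng.uniform(0, F)
print(bucket_of(hash_last_coord(W, h), F, s, b))        # distinct indices w.h.p.
```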
By Markov's inequality, we have the following corollary.

Corollary 2.2
All parameters are set as in Lemma 2.1. For any $l \in \{1, 2, \ldots, d\}$, choose the random vector $X_l \triangleq (x_1, \ldots, x_{l-1}, x_{l+1}, \ldots, x_d)$ uniformly distributed on the set $\mathbb{Z}^{d-1} \cap [0, FT)^{d-1}$; then with probability at least $1 - \delta/(4ds)$ over the randomness of $(x_1, \ldots, x_{l-1}, x_{l+1}, \ldots, x_d)$, we have
$$\sum_{x_l \in \mathbb{Z} \cap [0, FT)} \frac{1}{TF} \Big| e^{2\pi i b x_d/(TF)}\, \mathcal{F}^{-1}\big[X_j \cdot \mathcal{F}[f_H]\big](h^*(x)) \Big|^2 \le 4ds\, I_{j,1}/\delta$$
and
$$\sum_{x_l \in \mathbb{Z} \cap [0, FT)} \frac{1}{TF} \Big| f_{j_k}(x) - e^{2\pi i b x_d/(TF)}\, \mathcal{F}^{-1}\big[X_{j_s} \cdot \mathcal{F}[f_H]\big](h^*(x)) \Big|^2 \le 4ds\, I_{j,2}/\delta,$$
where $x = (x_1/F, \ldots, x_{l-1}/F, x_l/F, x_{l+1}/F, \ldots, x_d/F)^*$.

We now explain Lemma 2.1 at a high level. It is well known that, for the discrete Fourier transform, enlarging the sampling interval enhances the frequency concentration (the frequency resolution) of each component $f_j$ (see Lemma 3.1), while the frequency gap between different components is greater than the constant $\eta$. Therefore, as the sampling interval expands, $\eta$ becomes greater than the radius of the support set (the main part of the energy) of each signal component in the frequency domain. The hashing transform we define converts the gap between frequencies into a gap between the last coordinates of the frequencies. Thus the hashing transform keeps the main frequencies of the same component in the same bucket, while different components are isolated into different buckets.

Once the different components of the signal are isolated, we only need to recover the frequency and amplitude in each bucket. By Lemma 2.1, there are only two possibilities for this amplitude and frequency: either they are close to the amplitude and frequency of some component (up to the hashing transformation), or the amplitude is at most $O(\epsilon)$.

Specifically, for $1 \le j \le s$ and $1 \le l \le d$, set
$$g_{j,l}(t) = \exp\big(2\pi i\, \delta_{d,l}\, b \lfloor tF \rfloor/(TF)\big)\, \mathcal{F}^{-1}\big[X_{j_s} \cdot \mathcal{F}[f_H]\big](h^*(x_{l,t})),$$
where $\delta_{d,l} = 1$ when $l = d$ and $\delta_{d,l} = 0$ otherwise, and $x_{l,t} = (x_1, \ldots, x_{l-1}, \lfloor tF \rfloor/F, x_{l+1}, \ldots, x_d)$. Here $(x_1, \ldots, x_{l-1}, x_{l+1}, \ldots, x_d)$ is the random vector uniformly distributed on the set $\mathbb{Z}^{d-1} \cap [0, FT)^{d-1}$. The sample values of the function $g_{j,l}$ are computed by the method of Lemma 4.1. Let $F \ge O(\sqrt{d}M/\epsilon)$; by Lemma 2.1 and Corollary 2.2, for each bucket $j$, either there is a signal component $f_{j_k}$ with frequency $w_{j_k}$ to which $g_{j,l}$ is close in the time-averaged sense, or $g_{j,l}$ has negligible energy. In the latter case, we turn to recovering the amplitude and frequency of $g_{j+1,1}(t)$; otherwise, we keep $w^o_{j_k,1}$ and continue to use the algorithm in [23] to recover the frequencies of $g_{j,2}(t), \ldots, g_{j,d}(t)$. Finally, $\{w^o_{j_k,1}, \ldots, w^o_{j_k,l}, \ldots, w^o_{j_k,d}\}$ can be recovered element-wise.

Note: From Theorem 1.1 in [23], it takes at most $O(ds \log(TF) \log(ds/(\delta\epsilon))/\delta)$ samples of the function $g_{j,l}$ to recover the frequency $w_{j_k,l}$ (up to accuracy $O(A\epsilon/(TA'))$) with probability at least $1 - \delta/(4ds)$. By Lemma 4.1, in order to obtain $O(ds \log(TF) \log(ds/(\delta\epsilon)))$ samples (up to accuracy $A\epsilon$) of $g_{j,l}$ with probability at least $1 - \delta$, we need a total of $O\big(k^2 ds \ln^2(TF+1) \log(TF) \log(TFds/(\delta\epsilon)) \ln(1/\delta)/(\delta\epsilon^2)\big)$ samples and running time. Therefore, we have the following result.

Theorem 2.3 All parameters are set as in Lemma 2.1; in addition, $F \ge O(\sqrt{d}M/\epsilon)$. The above algorithm outputs $\{w^o_1, \ldots, w^o_k\}$ such that
$$|w^o_j - w_j| < O\Big(\frac{\epsilon A \sqrt{d}}{A' T}\Big) < O\Big(\frac{A\epsilon^2}{A'\sqrt{d}\,k}\Big), \qquad j = 1, 2, \ldots, k,$$
holds with probability at least $1 - \delta$ over the randomness of $\{h_1, \ldots, h_d, b\}$, $\{X_l\}_{l=1}^d$ (see Corollary 2.2), and $\mathcal{T}$ (see Lemma 4.1). The runtime complexity and sample complexity are at most
$$O\big(k^2 ds \ln^2(TF+1) \log(TF) \log(TFds/(\delta\epsilon)) \ln(1/\delta)/(\epsilon^2\delta)\big).$$

The success probability can be boosted arbitrarily close to $1$ by repeated restarting (as indicated in [14, 23]). Since $|w^o_j - w_j| < O(A\epsilon^2/(A'\sqrt{d}\,k))$ and $\epsilon < \eta$, we have
$$a_j = (\epsilon/k)^d \int_{t \in [0, k/\epsilon]^d} f(t)\, e^{-2\pi i w^o_j \cdot t}\, dt + O\big(A\epsilon/(A'k)\big).$$
We can use the Monte Carlo method to compute the integral $(\epsilon/k)^d \int_{t \in [0, k/\epsilon]^d} f(t) e^{-2\pi i w^o_j \cdot t} dt$; by Hoeffding's inequality, with probability at least $1 - \delta$, the amplitude $a_j$ can be recovered (up to $A\epsilon/A'$) with $O\big(\log(1/\delta)\, k^2 A'^2/\epsilon^2\big)$ random samples.

To prove Lemma 2.1, we need some technical lemmas.

Lemma 3.1 Let $0 < \beta < F/2$. For each $f_j$, the inequality
$$\frac{1}{(TF)^d} \sum_{x \in \Gamma} \big| f_j(x) - \mathcal{F}^{-1}[X'_{j,\beta} \cdot \mathcal{F}(f_j)](x) \big|^2 \le O\Big(\frac{d^{3/2} A^2}{T\beta}\Big)$$
holds for any $l \in \{1, 2, \ldots, d\}$, where $x = (x_1/F, \ldots, x_{l-1}/F, x_l/F, x_{l+1}/F, \ldots, x_d/F)^*$, and
$$X'_{j,\beta}(\xi) \triangleq \begin{cases} 1 & |\xi - w_j| \le \beta/2 \\ 0 & \text{otherwise} \end{cases}, \qquad j = 1, \ldots, k.$$

Proof: For any $\alpha \in [0, 1/T)$ and any $n \in \mathbb{Z} \cap [0, FT)$, we have
$$\frac{1}{TF} \left| \sum_{j=0}^{TF-1} \exp(2\pi i \alpha) \exp\Big(\Big(-\frac{2\pi i j}{F}\Big)\Big(\frac{n}{T}\Big)\Big) \right| \le \frac{2}{TF\,\big|1 - \exp\big(2\pi i (1/F)(n/T)\big)\big|} \le O\Big(\frac{1}{n}\Big). \tag{3.5}$$
Since $|a_j| \le A$, by (3.5), using Parseval's identity, we have
$$\frac{1}{(TF)^d} \sum_{\xi \in \Gamma} \big| X'_{j,\beta}(\xi) \cdot \mathcal{F}[f_j](\xi) - \mathcal{F}(f_j)(\xi) \big|^2 \le O\Big(\frac{d^{3/2} A^2}{T\beta}\Big), \tag{3.6}$$
which completes the proof. ✷

Lemma 3.2 Let $s = O(\sqrt{d}\,k^2/\delta)$, $F = O(k^2 M/\delta)$. For any $w \in [-M, M]^d$ with $|w| \ge \eta$, $|(h(w))_d| > F/s$ holds with probability at least $1 - \delta/k^2$ over the randomness of $h_1, h_2, \ldots, h_d$, where $(h(w))_d$ denotes the $d$-th component of the vector $h(w)$.

Proof: Without loss of generality, assume that the $d$-th component $w_d$ of $w$ is greater than $\eta/\sqrt{d}$. Fix arbitrary $h_1, \ldots, h_{d-1}$; since $\eta/\sqrt{d} \le |w_d| \le M$, the event $\sum_{i=1}^d h_i w_i \in (-F/s, F/s)$ has probability at most $\delta/k^2$ (no greater than the ratio of the width of the set $(-F/s, F/s)$ to $F/6$) over the randomness of $h_d$, which completes the proof. ✷

From Lemma 3.2, we have a direct consequence.

Corollary 3.3 All parameters are set as in the above lemma. Suppose the vectors $w_1, \ldots, w_k$ satisfy
$$\min_{i \ne j} |w_i - w_j| > \eta, \qquad \max_{1 \le i \le k} |w_i| \le M;$$
then $\min_{1 \le i < j \le k} |(h(w_i) - h(w_j))_d| > F/s$ holds with probability at least $1 - \delta$ over the randomness of $h_1, \ldots, h_d$.

Now suppose $F\beta/\eta \le F\delta/(\sqrt{d}ks)$. For any $\xi_1, \xi_2 \in \phi_{j,\beta} \triangleq \{\xi : |\xi - w_j| \le \beta/2\}$, we have $|(h(\xi_1) - h(\xi_2))_d| \le \delta F/(ks) < F/s$. However, this does not mean that there is a bucket $B_{j_s}$ such that $h(\phi_{j,\beta}) \subset B_{j_s}$: it is possible that $h(\phi_{j,\beta})$ intersects the boundary of some bucket. Clearly, after adding a random translation, with a certain probability (the ratio of the total "width" of the sets $\phi_{j,\beta}$, $j = 1, 2, \ldots, k$, to the "width" of a bucket), there exists a bucket $B_{j_s}$ such that $H(\phi_{j,\beta})$ is completely inside it; that is, the following conclusion holds.

Lemma 3.4 Suppose $F\beta/\eta \le F\delta/(\sqrt{d}ks)$. With probability at least $1 - \delta$ over the randomness of $b$, there exists a bucket $B_{j_s}$ such that $H(\phi_{j,\beta}) \subset B_{j_s}$ for $j = 1, 2, \ldots, k$.

Choose $s = O(\sqrt{d}\,k^2/\delta)$, $\beta = \eta\delta/(\sqrt{d}ks)$, $T = O\big(k^3 d^{3/2}/((\epsilon\delta/(ds))\,\eta\delta)\big)$ and $F = O(k^2 M/\delta)$. Since $\mathcal{F}[f_H](\xi) = \mathcal{F}[f](H\xi)$, combining Lemma 3.2, Corollary 3.3, Lemma 3.4 and Lemma 3.1 proves Lemma 2.1.
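As a concrete rendering of the Monte Carlo amplitude-recovery step after Theorem 2.3, the sketch below estimates $(\epsilon/k)^d \int_{[0,k/\epsilon]^d} f(t) e^{-2\pi i w^o_j \cdot t} dt$ by uniform sampling; the test signal and sample count are illustrative choices, and the Hoeffding-style sample bound is followed only in spirit.

```python
import numpy as np

def estimate_amplitude(f, w_est, k, eps, n_samples, rng=np.random.default_rng(2)):
    """Monte Carlo estimate of (eps/k)^d * integral of f(t) exp(-2*pi*i w_est . t)
    over [0, k/eps]^d. Sampling t uniformly on the box cancels the (eps/k)^d
    volume factor, so the estimator is a plain sample mean; since the integrand
    is bounded by max |f|, Hoeffding's inequality controls the deviation."""
    d = len(w_est)
    t = rng.uniform(0.0, k / eps, size=(n_samples, d))
    return np.mean(f(t) * np.exp(-2j * np.pi * (t @ w_est)))

# toy check with two tones: the off-frequency tone averages out over the box
k, eps = 1, 0.2
w1 = np.array([3.0, -1.0, 0.5, 2.0, -4.0])
w2 = w1 + np.array([0.2, 0.0, 0.0, 0.0, 0.0])   # one full period over the box side k/eps
a1 = 0.7 * np.exp(0.3j)
f = lambda t: a1 * np.exp(2j * np.pi * (t @ w1)) + 0.4 * np.exp(2j * np.pi * (t @ w2))
print(estimate_amplitude(f, w1, k, eps, n_samples=20000))   # close to a1
```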
The following lemma reduces the computational complexity of the convolution operation.

Lemma 4.1 For any $0 < \epsilon < 1/2$ and $1 \le j \le k$, suppose $N = O\big(k^2 A^2 \ln^2(TF) \ln(1/\delta)/\epsilon^2\big)$. Then with probability at least $1 - \delta/2$ over the randomness of $\mathcal{T}$, the following inequality holds:
$$\sup_{x = (\frac{x_1}{F}, \ldots, \frac{x_d}{F})^* \in \Gamma} \left| \mathcal{F}^{-1}\big[X_{j_s} \cdot \mathcal{F}[f_H]\big](x) - \frac{1}{N} \sum_{i=1}^{N} f_H\Big(x - \big(0, \ldots, 0, \tfrac{\lfloor (TF+1)^{t_i} - 1 \rfloor}{F}\big)^*\Big) v(t_i) \right| \le \epsilon, \tag{4.7}$$
where $\mathcal{T} \triangleq \{t_1, \ldots, t_N\}$ are independent draws from the uniform distribution on $[0, 1]$,
$$v(t) = \ln(TF+1)\,(TF+1)^t\, v_1\big(\lfloor (TF+1)^t - 1 \rfloor\big),$$
and
$$v_1(z) \triangleq \begin{cases} \dfrac{\exp(-2\pi i F z)\big(\exp(2\pi i z F (j_s - 1)/s) - \exp(2\pi i z F j_s/s)\big)}{TF\,\big(1 - \exp(2\pi i z/(TF))\big)} & z > 0, \\[4pt] 1/s & z = 0. \end{cases} \tag{4.8}$$

Proof: Observe that $\sum_{w \in \mathbb{Z} \cap [0, TF)} \exp\big(\frac{2\pi i n w}{TF}\big) = 0$ for $n \in \mathbb{N}_+$ with $TF \nmid n$; then
$$\mathcal{F}^{-1}\big[X_{j_s} \cdot \mathcal{F}[f_H]\big](x) = \sum_{y \in \Gamma} \frac{f_H(x - y)\, \mathcal{F}^{-1}(X_{j_s})(y)}{(\sqrt{TF})^{d}} = \sum_{z \in \mathbb{Z} \cap [0, TF)} f_H\big(x - (0, \ldots, 0, \tfrac{z}{F})^*\big)\, v_1(z)$$
$$= \int_0^{TF} f_H\big(x - (0, \ldots, 0, \tfrac{\lfloor z \rfloor}{F})^*\big)\,(z+1) \ln(TF+1)\, v_1(\lfloor z \rfloor)\, d\Big(\frac{\ln(z+1)}{\ln(TF+1)}\Big) = \int_0^1 f_H\big(x - (0, \ldots, 0, \tfrac{\lfloor (TF+1)^t - 1 \rfloor}{F})^*\big)\, v(t)\, dt, \tag{4.9}$$
where $\sup_{t \in [0,1]} |v(t)| \le O(\ln(TF+1))$.

Next, we consider the empirical Rademacher complexity ([26, 27]) of the following function spaces:
$$Q_{w,u,C} \triangleq \Big\{ q_x(t) \triangleq u(t) \sin\Big(2\pi w \big(x - \tfrac{\lfloor (TF+1)^t - 1 \rfloor}{F}\big)\Big) \Big\}, \qquad Q'_{w,u,C} \triangleq \Big\{ q'_x(t) \triangleq u(t) \cos\Big(2\pi w \big(x - \tfrac{\lfloor (TF+1)^t - 1 \rfloor}{F}\big)\Big) \Big\},$$
where $w, x, t \in \mathbb{R}$ and $u(t)$ is any real-valued function satisfying $|u| \le C$. Let $\widehat{\mathcal{R}}(\mathcal{T}; Q_{w,u,C})$ denote the empirical Rademacher complexity of $Q_{w,u,C}$ with respect to $\mathcal{T}$; then
$$\widehat{\mathcal{R}}(\mathcal{T}; Q_{w,u,C}) = \mathbb{E}_{\xi \sim \{\pm 1\}^N} \Big[ \sup_{x \in \mathbb{R}} \frac{\sum_{i=1}^N \xi_i q_x(t_i)}{N} \Big] \le \mathbb{E}_{\xi \sim \{\pm 1\}^N} \Big[ \sup_{(z_1, z_2) \in [-1,1]^2} \frac{\sum_{i=1}^N \xi_i (z_1 y_{i,1} - z_2 y_{i,2})}{N} \Big] \le \frac{\sqrt{2}\,C}{\sqrt{N}}, \tag{4.10}$$
where
$$y_{i,1} = \cos\Big(\frac{2\pi w \lfloor (TF+1)^{t_i} - 1 \rfloor}{F}\Big) u(t_i), \qquad y_{i,2} = \sin\Big(\frac{2\pi w \lfloor (TF+1)^{t_i} - 1 \rfloor}{F}\Big) u(t_i);$$
the first inequality in (4.10) expands $q_x(t_i)$ by the angle-addition formula, with $z_1 = \sin(2\pi w x)$ and $z_2 = \cos(2\pi w x)$, and the last inequality is obtained from Lemma 26.10 in [26]. Similarly, $\widehat{\mathcal{R}}(\mathcal{T}; Q'_{w,u,C}) \le \frac{\sqrt{2}\,C}{\sqrt{N}}$. Combining the last equality in (4.9) with Lemma A.10 in [27] proves Lemma 4.1. ✷
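The mechanism behind Lemma 4.1 is the change of variables in the last equality of (4.9): the length-$TF$ convolution sum becomes an integral over $[0,1]$ whose integrand is bounded by $O(\ln(TF+1))$, so uniform Monte Carlo needs only polylogarithmically many draws. Here is a minimal sketch of that substitution on a generic weighted sum; the kernel `v1` below is a decaying stand-in, not the exact kernel (4.8).

```python
import numpy as np

def log_substitution_mc(g, v1, L, N, rng=np.random.default_rng(3)):
    """Monte Carlo estimate of sum_{z=0}^{L-1} g(z) * v1(z) via the substitution
    z = floor((L+1)^t - 1), t ~ Uniform[0, 1], as in the last equality of (4.9):
    the integrand is g(z(t)) * v(t) with v(t) = ln(L+1) * (L+1)^t * v1(z(t))."""
    t = rng.uniform(0.0, 1.0, N)
    z = np.floor((L + 1.0) ** t - 1.0).astype(int)
    v = np.log(L + 1.0) * (L + 1.0) ** t * v1(z)
    return np.mean(g(z) * v)

# toy check against the exact sum for a kernel decaying like 1/z (cf. (3.5));
# for this choice |v(t)| stays near ln(L+1), so the variance is small
L = 2**12
g  = lambda z: np.cos(0.01 * z)
v1 = lambda z: 1.0 / (1.0 + z)
exact = np.sum(g(np.arange(L)) * v1(np.arange(L)))
print(exact, log_substitution_mc(g, v1, L, N=200000))
```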
We do not consider the noisy case in this paper; this issue is left for future work. In addition, one can also discuss the reconstruction of signals without a frequency gap, as in the one-dimensional case [24]. In terms of applications, it is worthwhile to optimize the algorithm and its complexity estimate to make it more practical.

References

[1] M. A. Iwen, "Combinatorial sublinear-time Fourier algorithms," Foundations of Computational Mathematics, pp. 303–338, 2010.
[2] ——, "Improved approximation guarantees for sublinear-time Fourier algorithms," Applied and Computational Harmonic Analysis, vol. 34, no. 1, pp. 57–82, 2013.
[3] D. Potts and T. Volkmer, "Sparse high-dimensional FFT based on rank-1 lattice sampling," Applied and Computational Harmonic Analysis, vol. 41, no. 3, pp. 713–748, 2016.
[4] S. Merhi, R. Zhang, M. A. Iwen, and A. Christlieb, "A new class of fully discrete sparse Fourier transforms: Faster stable implementations with guarantees," Journal of Fourier Analysis and Applications, no. 1, pp. 1–34, 2017.
[5] S. Bittens, R. Zhang, and M. A. Iwen, "A deterministic sparse FFT for functions with structured Fourier sparsity," Advances in Computational Mathematics, vol. 45, no. 2, pp. 519–561, 2019.
[6] B. Choi, M. A. Iwen, and F. Krahmer, "Sparse harmonic transforms: A new class of sublinear-time algorithms for learning functions of many variables," Foundations of Computational Mathematics, pp. 1–55, 2020.
[7] S.-H. Hsieh, C.-S. Lu, and S.-C. Pei, "Sparse fast Fourier transform by downsampling," in 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2013, pp. 5637–5641.
[8] A. C. Gilbert, P. Indyk, M. Iwen, and L. Schmidt, "Recent developments in the sparse Fourier transform: A compressed Fourier transform for big data," IEEE Signal Processing Magazine, vol. 31, no. 5, pp. 91–100, 2014.
[9] S. Liu, T. Shan, R. Tao, Y. D. Zhang, and Y. Wang, "Sparse discrete fractional Fourier transform and its applications," IEEE Transactions on Signal Processing, vol. 62, no. 24, pp. 6582–6595, 2014.
[10] S. Pawar and K. Ramchandran, "FFAST: An algorithm for computing an exactly $k$-sparse DFT in $O(k \log k)$ time," IEEE Transactions on Information Theory, vol. 64, no. 1, pp. 429–450, 2017.
[11] S. Wang, V. M. Patel, and A. Petropulu, "The robust sparse Fourier transform (RSFT) and its application in radar signal processing," IEEE Transactions on Aerospace and Electronic Systems, pp. 2735–2755, 2017.
[12] ——, "Multidimensional sparse Fourier transform based on the Fourier projection-slice theorem," IEEE Transactions on Signal Processing, vol. 67, no. 1, pp. 54–69, 2018.
[13] E. Kushilevitz and Y. Mansour, "Learning decision trees using the Fourier spectrum," SIAM Journal on Computing, vol. 22, no. 6, pp. 1331–1348, 1993.
[14] A. C. Gilbert, S. Guha, P. Indyk, S. Muthukrishnan, and M. Strauss, "Near-optimal sparse Fourier representations via sampling," in Proceedings of the Thirty-Fourth Annual ACM Symposium on Theory of Computing, 2002, pp. 152–161.
[15] A. C. Gilbert, S. Muthukrishnan, and M. Strauss, "Improved time bounds for near-optimal sparse Fourier representations," in Wavelets XI, vol. 5914. International Society for Optics and Photonics, 2005, p. 59141A.
[16] H. Hassanieh, P. Indyk, D. Katabi, and E. Price, "Simple and practical algorithm for sparse Fourier transform," in Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms. SIAM, 2012, pp. 1183–1194.
[17] M. Kapralov, "Sparse Fourier transform in any constant dimension with nearly-optimal sample complexity in sublinear time," in Proceedings of the Forty-Eighth Annual ACM Symposium on Theory of Computing, 2016, pp. 264–277.
[18] M. Kapralov, A. Velingker, and A. Zandieh, "Dimension-independent sparse Fourier transform," in Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms. SIAM, 2019, pp. 2709–2728.
[19] L. Kämmerer, D. Potts, and T. Volkmer, "High-dimensional sparse FFT based on sampling along multiple rank-1 lattices," Applied and Computational Harmonic Analysis, vol. 51, pp. 225–257, 2020.
[20] B. Choi, A. Christlieb, and Y. Wang, "Multiscale high-dimensional sparse Fourier algorithms for noisy data," arXiv preprint arXiv:1907.03692, 2019.
[21] ——, "High-dimensional sparse Fourier algorithms," Numerical Algorithms, pp. 1–26, 2020.
[22] P. Boufounos, V. Cevher, A. C. Gilbert, Y. Li, and M. J. Strauss, "What's the frequency, Kenneth?: Sublinear Fourier sampling off the grid," Algorithmica, vol. 73, no. 2, pp. 261–288, 2015.
[23] E. Price and Z. Song, "A robust sparse Fourier transform in the continuous setting," in 2015 IEEE 56th Annual Symposium on Foundations of Computer Science (FOCS). IEEE, 2015, pp. 583–600.
[24] X. Chen, D. M. Kane, E. Price, and Z. Song, "Fourier-sparse interpolation without a frequency gap," in 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS). IEEE, 2016, pp. 741–750.
[25] Y. Jin, D. Liu, and Z. Song, "A robust multi-dimensional sparse Fourier transform in the continuous setting," arXiv preprint arXiv:2005.06156, 2020.
[26] S. Shalev-Shwartz and S. Ben-David, Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014.
[27] Z. Allen-Zhu, Y. Li, and Y. Liang, "Learning and generalization in overparameterized neural networks, going beyond two layers," in Advances in Neural Information Processing Systems, 2019.