Sparse Time-Frequency decomposition by dictionary learning
Thomas Y. Hou∗, Zuoqiang Shi†

June 25, 2018

∗ Applied and Comput. Math, MC 9-94, Caltech, Pasadena, CA 91125. Email: [email protected].
† Mathematical Sciences Center, Tsinghua University, Beijing, China, 100084. Email: [email protected].
Abstract
In this paper, we propose a time-frequency analysis method to obtain instantaneous frequencies and the corresponding decomposition by solving an optimization problem. In this optimization problem, the basis to decompose the signal is not known a priori. Instead, it is adapted to the signal and is determined as part of the optimization problem. In this sense, this optimization problem can be seen as a dictionary learning problem. This dictionary learning problem is solved by using the Augmented Lagrangian Multiplier method (ALM) iteratively. We further accelerate the convergence of the ALM method in each iteration by using the fast wavelet transform. We apply our method to decompose several signals, including signals with poor scale separation, signals with outliers and polluted by noise, and a real signal. The results show that this method can give accurate recovery of both the instantaneous frequencies and the intrinsic mode functions.
1 Introduction

Nowadays we must process a massive amount of data in our daily life and in scientific research, and data analysis methods play an important role in processing and analyzing these data. Most data analysis methods use a pre-determined basis, including the most commonly used Fourier transform and wavelet transform. While such methods are very efficient in processing data, each component of the decomposition in general does not reveal the intrinsic physical information of the data, due to the presence of harmonics in the decomposition. For example, applying these traditional methods to a modulated oscillatory chirp signal would produce many components. Thus, it is essential to develop a truly adaptive data analysis method that can extract hidden physical information, such as trends and time-varying cycles, from the data while preserving the integrity of the physically meaningful components. To achieve this, we need to use a data-driven basis that is adapted to the signal instead of being determined a priori.

In our data-driven time-frequency analysis, we aim to decompose a signal f(t) into the following form:

    f(t) = Σ_{j=1}^M a_j(t) cos θ_j(t),   t ∈ R,    (1)

where a_j(t), θ_j(t) are smooth functions, θ'_j(t) > 0, j = 1, ..., M, and M is an integer that is given a priori. We assume that a_j(t) and θ'_j are less oscillatory than cos θ_j(t). We call a_j(t) cos θ_j(t) the Intrinsic Mode Functions (IMFs) and θ'_j, j = 1, ..., M, the Instantaneous Frequencies [19]. The objective of our data-driven time-frequency analysis is to extract the Intrinsic Mode Functions and their Instantaneous Frequencies.

In [17], we proposed to decompose the signal by solving the following optimization problem:

    Minimize over (a_k)_{1≤k≤M}, (θ_k)_{1≤k≤M}:  M
    Subject to:  f = Σ_{k=1}^M a_k cos θ_k,   a_k cos θ_k ∈ D,    (2)

where D is the dictionary consisting of all IMFs (see [17] for its precise definition). Furthermore, an efficient algorithm based on matching pursuit and the Fast Fourier transform was proposed to solve the above nonlinear optimization problem. In a subsequent paper [18], we proved the convergence of this algorithm for periodic data that satisfy a certain scale separation property.

Basis pursuit is another powerful method to obtain the sparsest representation of a signal. Unfortunately, basis pursuit cannot be applied directly to (2), since the dictionary D has infinitely many atoms. But if the number of IMFs, M, is known, the idea of basis pursuit can be generalized to approximate the optimization problem (2). In this case, we can obtain the desired decomposition by solving the following dictionary learning problem:

    min_{x, θ_1, ..., θ_M} ‖x‖_1,   subject to  Φ_{θ_1,...,θ_M} x = f,    (3)

where

    Φ_{θ_1,...,θ_M} = [Φ_{θ_1}, ..., Φ_{θ_M}],    (4)

and Φ_{θ_j}, j = 1, ..., M, is the basis used to decompose the signal f. The specific form of the basis as a function of θ_j will be given in (8).

In (3), the basis Φ_{θ_1,...,θ_M} is not known a priori. It is determined by the phase functions θ_j, j = 1, ..., M, and the phase functions are adapted to the data. We need to solve not only for the optimal coefficients x but also for the optimal basis Φ_{θ_1,...,θ_M}. In this sense, the problem described above can be seen as a dictionary learning problem. Dictionary learning is a well-established subject in signal processing, and many efficient methods have been developed for this purpose [1, 13, 20, 21, 23]. But the problem we study here has some important differences from a traditional dictionary learning problem. A traditional dictionary learning problem typically starts with a training set, a collection of training vectors, and the atoms of the dictionary can be arbitrary functions. For the problem we consider, we have only one signal, not a training set. Moreover, the atoms of the dictionary in our problem cannot be arbitrary functions: if we put no constraint on the atoms, we may get only trivial decompositions. In order to get a reasonable result, the atoms of the dictionary are restricted to be nonlinear functionals of the phase functions θ_j. At the same time, the phase functions are confined to a low-dimensional space to ensure that the overall number of degrees of freedom is not too large. Then we can still get a reasonable decomposition from only one measurement of the signal. We need to develop a new method to solve this non-convex optimization problem, which is not required in a traditional dictionary learning problem.
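To fix ideas, the following sketch synthesizes a two-component instance of the model (1). All envelope and phase choices are illustrative assumptions of this sketch, not the paper's test signals; the point is that only f is observed, while the a_j and θ_j on the right-hand side are the unknowns of the dictionary learning problem (3).

```python
import numpy as np

# A synthetic instance of model (1) with M = 2 components.
t = np.linspace(0.0, 1.0, 1024, endpoint=False)

a1 = 1.0 + 0.3 * np.cos(2 * np.pi * t)               # smooth envelope
theta1 = 40 * np.pi * t + 2 * np.sin(2 * np.pi * t)  # theta1'(t) > 0 for all t

a2 = 1.5 + 0.2 * np.sin(2 * np.pi * t)
theta2 = 90 * np.pi * t                              # second, faster phase

f = a1 * np.cos(theta1) + a2 * np.cos(theta2)        # the observed signal

# The instantaneous frequencies theta_j'/(2*pi) are known analytically here;
# this is what a decomposition method should recover from f alone.
if1 = (40 * np.pi + 4 * np.pi * np.cos(2 * np.pi * t)) / (2 * np.pi)
if2 = np.full_like(t, 45.0)                          # 90*pi / (2*pi) = 45
```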
The key part is to find the phase functions. Once the phase functions are known, we only need to solve an ℓ1 optimization problem to get the decomposition. Based on this observation, we develop an iterative algorithm to solve (3). This algorithm starts from an initial guess of the phase functions. In each step, the phase functions are fixed and one ℓ1 optimization problem is solved; the phase functions are then updated in the next iteration. This procedure is repeated until the iteration converges. In each step, the Augmented Lagrangian Multiplier method (ALM) is used to solve the linear ℓ1 optimization problem, and we further accelerate the ALM method in each iteration by using the fast wavelet transform. The method can also be generalized to decompose signals that contain outliers by enlarging the dictionary to include impulses.

We will demonstrate the effectiveness of our method by applying it to decompose several multiscale data sets, including synthetic data and real data. For the data that we consider in this paper, we will demonstrate that we can recover both the instantaneous frequencies and the intrinsic mode functions very accurately. Even for signals that are polluted by noise, we can still approximate the instantaneous frequencies and the IMFs with accuracy comparable to the noise level.

The remainder of the paper is organized as follows. In Section 2, we give the formulation of our problem. In Section 3, an iterative algorithm is introduced to solve the nonlinear optimization problem. An accelerated procedure based on the fast wavelet transform is introduced in Section 4. In Section 5, we generalize this method to deal with signals that have outliers. In Section 6, several numerical results are presented to demonstrate the effectiveness of our method. Finally, some concluding remarks are made in Section 7.

2 Formulation of the problem

In this section, we set up the framework of the sparse time-frequency decomposition. Let {V_l}_{l∈Z} be a multi-resolution approximation of L²(R), ϕ the associated scaling function, and ψ the corresponding wavelet function. Assume that ϕ is real and ϕ̂ has compact support, supp(ϕ̂) = (−s_ϕ, s_ϕ), where ϕ̂ is the Fourier transform of ϕ defined below:

    ϕ̂(k) = (1/2π) ∫_R ϕ(t) e^{−ikt} dt.
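Meyer-type wavelets are the classical example of this setup: the scaling function is real and its Fourier transform is compactly supported. As a quick numerical illustration (assuming the discrete Meyer wavelet 'dmey' from PyWavelets, which the paper does not prescribe), one can check that the spectral energy of ϕ is confined to a bounded band:

```python
import numpy as np
import pywt

# Approximate the scaling function phi of the discrete Meyer wavelet and
# inspect its spectrum; 'dmey' is a finite approximation, so the support of
# phi-hat is compact only up to a small truncation error.
phi, psi, x = pywt.Wavelet('dmey').wavefun(level=8)

spectrum = np.abs(np.fft.rfft(phi)) ** 2
freqs = np.fft.rfftfreq(len(phi), d=x[1] - x[0])

# Fraction of spectral energy below each frequency: it saturates quickly,
# reflecting the (approximately) band-limited character of phi.
energy = np.cumsum(spectrum) / np.sum(spectrum)
print("99.9%% of phi's spectral energy lies below f = %.3f"
      % freqs[np.searchsorted(energy, 0.999)])
```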
For each 1 ≤ j ≤ M, we assume that θ'_j > 0. Since θ_j is a strictly monotonically increasing function, we can use θ_j as a coordinate. Then we can define the following wavelet basis in the θ_j-coordinate:

    Π_{θ_j} = [ψ_{l,n}(θ_j)]_{l,n∈Z}.

In this paper, we construct the basis from the wavelet basis instead of the overcomplete Fourier basis used in our previous paper [17]. We choose the wavelet basis over the overcomplete Fourier basis because there are fast decomposition and reconstruction algorithms for the wavelet basis, which makes our algorithm very efficient. For periodic signals, we can use the standard Fourier basis, which can be made very efficient by using the Fast Fourier transform; but the Fourier basis works only for periodic signals. For general nonperiodic signals, the wavelet basis seems to be more robust, although it still suffers from the "end effect" near the boundary of the signal.

3 An iterative algorithm

Inspired by the algorithm in [17], we propose the following iterative algorithm to solve the nonlinear ℓ1 optimization problem (11).

Algorithm 1 (Gauss-Newton type iteration)
Input: Initial guess of the phase functions θ⁰_j, j = 1, ..., M; η = l, where l is the same as that in (5).
Output: Phase functions and the corresponding envelopes: θ_j, a_j, j = 1, ..., M.
while η ≥ 1 do
    while Σ_{j=1}^M ‖θ_j^{n+1} − θ_j^n‖_2 > ε do
        Solve the following ℓ1 optimization problem:
            (ã^{n+1}, b̃^{n+1}) = argmin_{x,y} (‖x‖_1 + ‖y‖_1),    (12)
            subject to  Φ_{θ_1^n,...,θ_M^n} · x + Ψ_{θ_1^n,...,θ_M^n} · y = f,
        where Ψ_{θ_1^n,...,θ_M^n} = [Ψ_{θ_1^n}, ..., Ψ_{θ_M^n}] and
            Ψ_{θ_j^n} = (sin θ_j^n) Π_{θ_j^n},   j = 1, ..., M.    (13)
        Calculate the envelopes a_j, b_j, j = 1, ..., M:
            a_j = Π_{θ_j^n} · ã_j^{n+1},   b_j = Π_{θ_j^n} · b̃_j^{n+1}.    (14)
        Update θ_j^n, j = 1, ..., M:
            Δθ'_j = P_{V_η(θ_j^n)} ( d/dt ( arctan( b_j / a_j ) ) ),   Δθ_j = ∫_0^t Δθ'_j(s) ds,   θ_j^{n+1} = θ_j^n − β_j Δθ_j,
        where β_j ∈ [0, 1] is chosen to make sure that θ_j^{n+1} is monotonically increasing:
            β_j = max { α ∈ [0, 1] : d/dt ( θ_j^n − α Δθ_j ) ≥ 0 }.
        Here P_{V_η(θ_j^n)} is the projection operator onto the space V_η(θ_j^n), where
            V_η(θ) = span { ψ_{l,n}(θ) : l, n ∈ Z, l ≤ η }.
    end while
    η = η − 1
end while

The main computational cost of Algorithm 1 lies in solving the ℓ1 optimization problem (12). Let Θ_{θ_j} = [Φ_{θ_j}, Ψ_{θ_j}] denote the combined basis associated with θ_j, as defined in (15), and let p_j denote the corresponding coefficients. For a fixed Lagrange multiplier q, the augmented Lagrangian subproblem can be solved component by component by the following Gauss-Seidel type sweeping iteration:

Algorithm 2 (Sweeping iteration)
Input: p⁰_j = 0, j = 1, ..., M; μ > 0.
while not converged do
    for j = 1 : M do
        Compute r_j^m = f − Σ_{l=1}^{j−1} Θ_{θ_l} p_l^{m+1} − Σ_{l=j+1}^M Θ_{θ_l} p_l^m.
        Compute p_j^{m+1} = argmin_{p_j} ‖p_j‖_1 + μ ‖r_j^m + q/μ − Θ_{θ_j} p_j‖_2².
    end for
end while

Theoretically, we need to run the above sweeping process several times until the solution converges; in practical computations, in order to save computational cost, we run the sweeping process only once. Combining this idea with the augmented Lagrange multiplier method, we obtain the following algorithm to solve the ℓ1 optimization problem (12):

Algorithm 3 (Sweeping ALM)
Input: p⁰_j = 0, j = 1, ..., M; q⁰ = 0; μ > 0.
while not converged do
    for j = 1 : M do
        Compute r_j^k = f − Σ_{l=1}^{j−1} Θ_{θ_l} p_l^{k+1} − Σ_{l=j+1}^M Θ_{θ_l} p_l^k.
        Compute p_j^{k+1} = argmin_{p_j} ‖p_j‖_1 + μ ‖r_j^k + q^k/μ − Θ_{θ_j} p_j‖_2².
    end for
    q^{k+1} = q^k + μ ( f − Σ_{j=1}^M Θ_{θ_j} p_j^{k+1} ).
end while
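A minimal dense-matrix sketch of Algorithm 3 might look as follows. It assumes the matrices Θ_{θ_j} have already been assembled from the current phase functions, and it solves each p_j-subproblem by a single shrinkage step, which is exact when the columns of Θ_{θ_j} are orthonormal in the relevant inner product (the situation arranged in Section 4). Names and defaults here are illustrative, not part of the paper.

```python
import numpy as np

def soft_threshold(x, tau):
    # Shrinkage operator S_tau from (22).
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def sweeping_alm(f, Thetas, n_iter=50, mu=1.0):
    """Dense sketch of Algorithm 3 (sweeping ALM).

    Thetas is a list of M matrices Theta_{theta_j} (columns = atoms). The
    theta'-weighted version of Section 4 would replace Theta.T below by
    Theta.T @ diag(theta'); we omit the weight for clarity.
    """
    M = len(Thetas)
    p = [np.zeros(Th.shape[1]) for Th in Thetas]
    q = np.zeros_like(f)                       # Lagrange multiplier
    for _ in range(n_iter):
        for j in range(M):
            # Residual r_j: f minus all the other components (one sweep).
            r = f - sum(Thetas[l] @ p[l] for l in range(M) if l != j)
            # One shrinkage step for the p_j-subproblem, cf. (21)-(22).
            p[j] = soft_threshold(Thetas[j].T @ (r + q / mu), 1.0 / mu)
        # Multiplier update: q <- q + mu (f - sum_j Theta_j p_j).
        q = q + mu * (f - sum(Thetas[j] @ p[j] for j in range(M)))
    return p
```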
4 Acceleration by the fast wavelet transform

In this section, we propose an approximate solver to accelerate the computation of

    p_j^{k+1} = argmin_{p_j} ‖p_j‖_1 + μ ‖r_j^k + q^k/μ − Θ_{θ_j} p_j‖_2²,

which is the most expensive step in Algorithm 3. In this paper, we only consider signals that are well resolved by the samples; the samples are uniformly distributed in time, and the total number of samples is N. Based on these assumptions, we can approximate the continuous integral by the discrete summation, and the integration error is negligible. This greatly simplifies our calculations.

Now we turn to simplifying the optimization problem. First, we replace the standard L² norm by a weighted L² norm, which gives the following approximation:

    p_j^{k+1} = argmin_{p_j} ‖p_j‖_1 + μ ‖r_j^k + q^k/μ − Θ_{θ_j} p_j‖_{2,θ_j}²,    (17)

where ‖g‖_{2,θ_j}² = Σ_i g_i² θ'_j(t_i).

Using the fact that supp(ϕ̂) = (−s_ϕ, s_ϕ), it is easy to check that the columns of the matrix Θ_θ are orthonormal under the weighted discrete inner product

    ⟨g, h⟩_θ = Σ_{i=1}^N g_i h_i θ'(t_i) = g · (θ'h),    (18)

where g = [g_i], h = [h_i], θ'h = [θ'(t_i) h_i], i = 1, ..., N. Using this property, it is easy to derive the following equality:

    ‖Θ_θ · x‖_{2,θ} = ‖x‖_2.    (19)

We can also define the projection operator onto V(θ), the linear space spanned by the columns of the matrix Θ_θ. Since the columns of Θ_θ are orthonormal under the weighted inner product (18), the projection P_{V(θ)} can be calculated as follows:

    P_{V(θ)}(r) = Θ_θ · r̂,   r̂ = Θ_θ^T · [θ'r].    (20)

Now we are ready to show that the optimization problem (17) can be solved explicitly by the shrinkage operator. To simplify the notation, we denote w = r_j^k + q^k/μ. Dropping the component of w orthogonal to V(θ_j), which only shifts the objective by a constant, we obtain

    argmin_{p_j} ‖p_j‖_1 + μ ‖w − Θ_{θ_j} p_j‖_{2,θ_j}²
    = argmin_{p_j} ‖p_j‖_1 + μ ‖Θ_{θ_j} p_j − P_{V(θ_j)}(w)‖_{2,θ_j}²
    = argmin_{p_j} ‖p_j‖_1 + μ ‖Θ_{θ_j} · [ p_j − Θ_{θ_j}^T · (θ'_j w) ]‖_{2,θ_j}²   (by (20))
    = argmin_{p_j} ‖p_j‖_1 + μ ‖p_j − Θ_{θ_j}^T · (θ'_j w)‖_2²   (by (19))
    = S_{μ⁻¹} ( Θ_{θ_j}^T · [ θ'_j ( r_j^k + q^k/μ ) ] ),    (21)

where S_τ is the shrinkage operator defined below:

    S_τ(x) = sgn(x) max(|x| − τ, 0).    (22)

Notice that, by the definition of Θ_{θ_j} in (15), the matrix-vector product in (21) has the following structure:

    Θ_{θ_j}^T · (θ'_j r) = [ Π_{θ_j}^T · (cos θ_j · r θ'_j),  Π_{θ_j}^T · (sin θ_j · r θ'_j) ]^T.

These are just the wavelet coefficients of r cos θ_j and r sin θ_j in the θ_j-coordinate, since the columns of Π_{θ_j} form the standard wavelet basis in the θ_j-coordinate. The product can therefore be computed efficiently by interpolating r cos θ_j and r sin θ_j to a uniform grid in the θ_j-coordinate and employing the fast wavelet transform.

Summarizing the above discussion, we obtain Algorithm 4, based on the fast wavelet transform, to solve the optimization problem (12).

Remark 4.1. The continuous version of formula (23) is given as follows:

    ã = ( ∫_R r(t) cos θ_j(t) ψ_{l,n}(θ_j(t)) θ'_j(t) dt )_{l,n∈Z}.
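The sketch below illustrates this accelerated step for one component. The wavelet family and grid handling are assumptions of the sketch, not prescribed by the paper; the θ'_j weight in (21) is absorbed by the change of variables when interpolating to a uniform grid in the θ_j-coordinate.

```python
import numpy as np
import pywt

def fast_l1_step(r, q, mu, theta, t, wavelet='dmey'):
    """Sketch of the accelerated solve (21) for one component.

    Computes the wavelet coefficients of w*cos(theta) and w*sin(theta) in
    the theta-coordinate via interpolation plus the fast wavelet transform,
    then applies the shrinkage S_{1/mu}.
    """
    w = r + q / mu
    # Uniform grid in the theta-coordinate (theta is strictly increasing);
    # the change of variables t -> theta absorbs the theta'(t_i) weight.
    theta_grid = np.linspace(theta[0], theta[-1], len(t))
    wc = np.interp(theta_grid, theta, w * np.cos(theta))
    ws = np.interp(theta_grid, theta, w * np.sin(theta))
    # Applying Pi_theta^T to the two modulated signals = fast wavelet transform.
    coeffs_c = pywt.wavedec(wc, wavelet)
    coeffs_s = pywt.wavedec(ws, wavelet)
    shrink = lambda c: np.sign(c) * np.maximum(np.abs(c) - 1.0 / mu, 0.0)
    return [shrink(c) for c in coeffs_c], [shrink(c) for c in coeffs_s]
```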
5 Signals with outliers

As mentioned in the introduction, our method can be generalized to decompose signals that contain outliers by enlarging the dictionary to include impulses. This leads to the following variant of Algorithm 1:

Algorithm 5 (Gauss-Newton type iteration with outliers)
Input: Initial guess of the phase functions θ⁰_j, j = 1, ..., M; η = l, where l is the same as that in (5).
Output: Phase functions and the corresponding envelopes: θ_j, a_j, j = 1, ..., M.
while η ≥ 1 do
    while Σ_{j=1}^M ‖θ_j^{n+1} − θ_j^n‖_2 > ε do
        Solve the following ℓ1 optimization problem:
            (ã^{n+1}, b̃^{n+1}, z^{n+1}) = argmin_{x,y,z} (‖x‖_1 + ‖y‖_1 + ‖z‖_1),
            subject to:  Φ_{θ_1^n,...,θ_M^n} · x + Ψ_{θ_1^n,...,θ_M^n} · y + z = f.
        Calculate the envelopes in the same way as we did in Algorithm 1.
        Update θ_j^n, j = 1, ..., M, in the same way as we did in Algorithm 1.
    end while
    η = η − 1
end while

Moreover, the ℓ1 optimization problem in the above iterative algorithm can be solved by the following sweeping ALM method:

Algorithm 6 (Sweeping ALM with outliers)
Input: p⁰_j = 0, j = 1, ..., M; q⁰ = 0; z⁰ = 0; μ > 0.
Output: Coefficients p_j, j = 1, ..., M, and the outlier component z.
while not converged do
    for j = 1 : M do
        r_j^k = f − Σ_{l=1}^{j−1} Θ_{θ_l} p_l^{k+1} − Σ_{l=j+1}^M Θ_{θ_l} p_l^k − z^k.
        p_j^{k+1} = argmin_{p_j} ‖p_j‖_1 + μ ‖r_j^k + q^k/μ − Θ_{θ_j} p_j‖_2².
    end for
    r^k = f − Σ_{l=1}^M Θ_{θ_l} p_l^{k+1}.
    z^{k+1} = S_{μ⁻¹}( r^k + q^k/μ ).
    q^{k+1} = q^k + μ ( f − Σ_{j=1}^M Θ_{θ_j} p_j^{k+1} − z^{k+1} ).
end while

The subproblem p_j^{k+1} = argmin_{p_j} ‖p_j‖_1 + μ ‖r_j^k + q^k/μ − Θ_{θ_j} p_j‖_2² can again be accelerated by Algorithm 4 of the previous section.

6 Numerical results

In this section, we present several numerical results to demonstrate the effectiveness of our time-frequency analysis method. In our previous paper [17], we performed extensive numerical experiments to demonstrate the effectiveness of our data-driven time-frequency analysis method for signals with good scale separation. To save space, we will not consider examples with good scale separation in this paper; instead, we consider more challenging signals that do not have the scale separation property.

Example 1: In the first example, the signal contains two components whose instantaneous frequencies intersect each other. More specifically, the signal is generated by the formula below:

    f = cos θ_1(t) + cos θ_2(t) + X(t),   t ∈ [0, 1],    (29)

where the phase functions θ_1, θ_2 are given as follows:

    θ_1(t) = 39.πt − 12 sin 2πt,    (30)
    θ_2(t) = 85.πt + 12 sin 2πt,    (31)

and X(t) is white noise with zero mean and variance σ² = 1. The signal is sampled over 1024 grid points uniformly distributed over the interval [0, 1]. The initial guesses of the phase functions are linear phases (the second one being 32πt), which are far from the ground truth. As we can see, even with these rough initial guesses, our iterative algorithm can still recover the instantaneous frequencies and the corresponding IMFs with reasonable accuracy, although the signal is polluted by noise. We note that the end effect is more pronounced in this case due to the noise pollution.

Example 2: The next signal we consider is polluted by outliers. We generate the signal using the following formula:

    f = cos θ_1(t) + cos θ_2(t) + σ(t),    (32)

where σ(t) denotes the outliers.

Figure 1: Left: original signal in Example 2, where the red curve is the clean signal without noise; Middle: instantaneous frequencies (red: exact, blue: numerical); Right: corresponding IMFs (red: exact, blue: numerical).

Figure 2: Left: instantaneous frequencies (red: exact, blue: numerical); Middle: corresponding IMFs (red: exact, blue: numerical); Right: outliers (red circles: exact, blue crosses: numerical).

The signal is sampled over 1024 uniform grid points. Among these samples, 32 are outliers; their locations are selected randomly and their strengths follow the normal distribution. In Fig. 2, we present the results obtained using the algorithm given in Section 5. As we can see, both the IMFs and the outliers are captured very accurately.
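The z-update of Algorithm 6 is the reason impulse atoms pick out the outliers: since the impulses form an identity block in the enlarged dictionary, the update reduces to soft-thresholding the residual. A hypothetical illustration, with the oscillatory part assumed known so that only the z-step is exercised:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 1024, endpoint=False)
clean = np.cos(60 * np.pi * t) + np.cos(120 * np.pi * t)  # illustrative IMFs

f = clean.copy()
idx = rng.choice(t.size, size=32, replace=False)  # 32 random outlier locations
f[idx] += rng.standard_normal(32)                 # strengths ~ N(0, 1)

# z-update of Algorithm 6: soft-threshold the residual f - (fitted IMFs).
mu = 5.0
residual = f - clean
z = np.sign(residual) * np.maximum(np.abs(residual) - 1.0 / mu, 0.0)

# Every detected spike sits at a true outlier location; outliers smaller
# than the threshold 1/mu are missed, as discussed in the next paragraph.
print(np.setdiff1d(np.nonzero(z)[0], idx).size, "false detections")
```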
We also test the signal with both outliers and noise; the results are shown in Fig. 3. In this example, the noise and the outliers are added to the original signal together. The signal is sampled over 1024 uniform grid points, among which 32 samples are outliers. The locations of these outliers are selected randomly, and their strengths follow a normal distribution with standard deviation 1. The noise is Gaussian with standard deviation 0.1. In this case, the instantaneous frequencies and the IMFs are still quite accurate, but the outliers are not captured as well as in the noise-free case. We would like to emphasize that it is hard to distinguish the outliers from the noise when the amplitude of the outliers is small. Outliers with large amplitude can be separated from the noise by a proper shrinkage operator; however, the shrinkage operator also removes outliers whose amplitude is comparable to or smaller than the noise level.

Figure 3: Left: instantaneous frequencies (red: exact, blue: numerical); Middle: corresponding IMFs (red: exact, blue: numerical); Right: outliers (red circles: exact, blue crosses: numerical).

Example 3: In our third example, we consider a real signal: a bat chirp signal. It is the digitized echolocation pulse emitted by the Large Brown Bat, Eptesicus fuscus.¹ The signal consists of 400 samples with a sampling period of 7 microseconds, so the total time span is 2.8 milliseconds. The signal is shown in the left panel of Fig. 4; the IMFs and instantaneous frequencies obtained by our method are given in the middle and right panels of Fig. 4. Our method gives precise instantaneous frequencies as well as a sparse decomposition of the original signal. From Fig. 4, we can see that the signal is approximated very well by only three IMFs. Near the boundaries, the original signal and the IMFs are all very small; in these regions the frequencies do not have any physical meaning and are only auxiliary variables in the algorithm. In the region where the amplitude of the signal is order one, the recovered instantaneous frequencies reveal some interesting patterns that have not been seen before using traditional time-frequency methods. The physical significance of these patterns needs to be investigated further in the future.

¹The authors wish to thank Curtis Condon, Ken White, and Al Feng of the Beckman Institute of the University of Illinois for the bat data and for permission to use it in this paper.

Figure 4: Left: bat chirp signal; Middle: IMFs; Right: instantaneous frequencies.

Example 4: In the last example, we consider data from an ODE system, a multiple degree of freedom (MDOF) system:

    ü + K(t) u = 0,    (33)

where K(t) is an N × N symmetric positive definite stiffness matrix and u is an N-vector. Our goal is to recover the time-varying stiffness matrix K(t) (at least part of it) from incomplete measurement of the solution u. Here, we assume that K(t) is slowly varying. Under this assumption, the solutions u_j(t), j = 1, ..., N, have the following approximate expression:

    u_j(t) = Σ_{k=1}^N ρ_{j,k} e^{iθ_k(t)},    (34)

where (θ'_k(t))², k = 1, ..., N, are the eigenvalues of K(t). Using this formulation, we can see that the instantaneous frequencies can help us retrieve the stiffness matrix.

As a test example, we consider a simple case with N = 2 degrees of freedom:

    K = [ k_1 + k_2    −k_2    ]
        [ −k_2       k_2 + k_3 ],    (35)

and

    k_1 = k_3 = 100 cos(0.πt) + 500,   k_2 = 400 cos(0.πt) + 400.    (36)

The initial conditions are

    u_1(0) = 1,   u'_1(0) = 0,   u_2(0) = 2,   u'_2(0) = 0.    (37)

This system models the movement of two objects of equal mass connected by springs, where k_1, k_2, k_3 are the stiffnesses of the springs. The above system is solved from t = 0 to t = 9.
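This test problem can be reproduced with a standard ODE integrator. In the sketch below, the cosine argument 0.2πt is an assumed value (the digit printed in (36) is not fully legible in the source), chosen only so that K(t) varies slowly, as the derivation of (34) requires.

```python
import numpy as np
from scipy.integrate import solve_ivp

def stiffness(t):
    # Spring stiffnesses (36); 0.2*pi is the assumed slow frequency.
    k1 = 100 * np.cos(0.2 * np.pi * t) + 500   # k1 = k3
    k2 = 400 * np.cos(0.2 * np.pi * t) + 400
    return k1, k2, k1

def rhs(t, s):
    u1, u2, v1, v2 = s
    k1, k2, k3 = stiffness(t)
    # u'' + K(t) u = 0 with K = [[k1 + k2, -k2], [-k2, k2 + k3]], cf. (35).
    return [v1, v2,
            -((k1 + k2) * u1 - k2 * u2),
            -(-k2 * u1 + (k2 + k3) * u2)]

# Initial conditions (37): u1(0) = 1, u2(0) = 2, zero initial velocities.
sol = solve_ivp(rhs, (0.0, 9.0), [1.0, 2.0, 0.0, 0.0], max_step=1e-2)

# Slowly varying theoretical angular frequencies, for comparison with Fig. 5:
k1, k2, k3 = stiffness(sol.t)
w1 = np.sqrt((k1 + k3) / 2)              # = sqrt(k1), since k1 = k3
w2 = np.sqrt((k1 + k3 + 4 * k2) / 2)     # = sqrt(k1 + 2 k2)
```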
The initial guesses of the phase functions are 20t and 40t. Here we only analyze the first component of the solution, u_1(t), which is shown in the left panel of Fig. 5.

Figure 5: Left: solution u_1 of the ODE system (33); Right: instantaneous frequencies (red: theoretical frequencies, blue: numerical results).

According to (34), the theoretical frequencies are √((k_1 + k_3)/2) and √((k_1 + k_3 + 4k_2)/2); as Fig. 5 shows, the numerical results agree well with these theoretical frequencies.

7 Concluding remarks

In this paper, we introduced a novel formulation to obtain a sparse time-frequency decomposition and the corresponding instantaneous frequencies. We formulated the decomposition as a dictionary learning problem: the dictionary is parametrized by phase functions, and the phase functions are determined by the signal itself. Based on our previous work and methods from dictionary learning, we developed an iterative algorithm to find the phase functions and the corresponding decomposition. By designing the dictionary carefully so that its atoms are orthogonal in the coordinates of the phase functions, we can accelerate the algorithm by using the fast wavelet transform. This makes our algorithm very efficient.

Another advantage of this method is that it can easily be generalized to deal with more complicated data that are not sparse over the time-frequency dictionary, such as data with outliers. For this kind of signal, we just need to enlarge the dictionary and follow a similar procedure to look for the sparsest decomposition over the enlarged dictionary. We presented several numerical examples to demonstrate the effectiveness of our method, including data that do not have scale separation and data that are polluted by noise or outliers. The results we obtained suggest that our method offers an effective way to decompose multiscale data even with poor scale separation.

We remark that decomposing several IMFs simultaneously increases the complexity of the optimization problem. As a result, the robustness of this new method is not as good as that of the previous method we introduced in [17]. Moreover, to apply this method, we need to know some information about the signal. In future work, we are considering combining it with other time-frequency analysis methods, such as the synchrosqueezed wavelet transform [10].

Another interesting problem that we are considering is to decompose data with intra-wave frequency modulation. This type of data is known to be very challenging: naive application of traditional data analysis methods tends to introduce artificial harmonics. To deal with this kind of data, we have to extend the definition of the dictionary by replacing the cosine function with an unknown periodic shape function that must be found as part of the optimization problem. This work will be reported in a future paper. We are also working on applying the method of this paper to analyze signals from real bridges, where we have made some progress; more results will be reported in the future.

Acknowledgments. This research was supported in part by NSF Grants No. DMS-318377 and DMS-1159138, DOE Grant DE-FG02-06ER25727, and AFOSR MURI Grant FA9550-09-1-0613. The research of Dr. Z. Shi was supported by NSFC Grant 11201257.

References

[1] M. Aharon, M. Elad, and A. M. Bruckstein, K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation, IEEE Transactions on Signal Processing, pp. 4311-4322, 2006.

[2] D. P. Bertsekas, Constrained Optimization and Lagrange Multiplier Methods, Academic Press, 1982.
[3] A. M. Bruckstein, D. L. Donoho, and M. Elad, From sparse solutions of systems of equations to sparse modeling of signals and images, SIAM Review, pp. 34-81, 2009.

[4] A. K. Chopra, Dynamics of Structures: Theory and Applications to Earthquake Engineering, Prentice-Hall, 1995.

[5] E. Candès and T. Tao, Near optimal signal recovery from random projections: Universal encoding strategies?, IEEE Trans. on Information Theory, pp. 5406-5425, 2006.

[6] E. Candès, J. Romberg, and T. Tao, Robust uncertainty principles: Exact signal recovery from highly incomplete frequency information, IEEE Trans. Inform. Theory, pp. 489-509, 2006.

[7] E. Candès, J. Romberg, and T. Tao, Stable signal recovery from incomplete and inaccurate measurements, Comm. Pure and Appl. Math., pp. 1207-1223, 2006.

[8] S. Chen, D. Donoho, and M. Saunders, Atomic decomposition by basis pursuit, SIAM J. Sci. Comput., pp. 33-61, 1998.

[9] I. Daubechies, Ten Lectures on Wavelets, CBMS-NSF Regional Conference Series in Applied Mathematics, Vol. 61, SIAM Publications, 1992.

[10] I. Daubechies, J. Lu, and H. Wu, Synchrosqueezed wavelet transforms: an empirical mode decomposition-like tool, Appl. Comp. Harmonic Anal., pp. 243-261, 2011.

[11] D. L. Donoho, Compressed sensing, IEEE Trans. Inform. Theory, pp. 1289-1306, 2006.

[12] K. Dragomiretskiy and D. Zosso, Variational mode decomposition, IEEE Trans. on Signal Processing.

[13] K. Engan, K. Skretting, and J. Husøy, Family of iterative LS-based dictionary learning algorithms, ILS-DLA, for sparse signal representation, Digital Signal Processing, pp. 32-49, 2007.

[14] J. Gilles, Empirical wavelet transform, IEEE Trans. on Signal Processing, pp. 3999-4010, 2013.

[15] T. Goldstein and S. Osher, The split Bregman method for L1-regularized problems, SIAM J. Imaging Sci., pp. 323-343, 2009.

[16] T. Y. Hou and Z. Shi, Adaptive data analysis via sparse time-frequency representation, Advances in Adaptive Data Analysis, pp. 1-28, 2011.

[17] T. Y. Hou and Z. Shi, Data-driven time-frequency analysis, Applied and Comput. Harmonic Analysis, pp. 284-308, 2013.

[18] T. Y. Hou, Z. Shi, and P. Tavallali, Convergence of a data-driven time-frequency analysis method, submitted to Applied and Comput. Harmonic Analysis.

[19] N. E. Huang et al., The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis, Proc. R. Soc. Lond. A (1998), pp. 903-995.

[20] M. Lewicki and T. Sejnowski, Learning overcomplete representations, Neural Computation, pp. 337-365, 2000.

[21] J. Mairal, F. Bach, J. Ponce, and G. Sapiro, Online dictionary learning for sparse coding, in Proceedings of the International Conference on Machine Learning (ICML), 2009.

[22] S. Mallat and Z. Zhang, Matching pursuit with time-frequency dictionaries, IEEE Trans. Signal Process., pp. 3397-3415, 1993.

[23] K. Skretting and K. Engan, Recursive least squares dictionary learning algorithm, IEEE Transactions on Signal Processing, pp. 2121-2131, 2010.

[24] Y. Shi, K. F. Li, Y. L. Yung, H. H. Aumann, Z. Shi, and T. Y. Hou, A decadal microwave record of tropical air temperature from AMSU-A/Aqua observations, Climate Dynamics, accepted, 2013, DOI 10.1007/s00382-013-1696-x.

[25] Z. Wu and N. E. Huang, Ensemble empirical mode decomposition: a noise-assisted data analysis method, Advances in Adaptive Data Analysis, 1, pp. 1-41, 2009.