Sparse constrained projection approximation subspace tracking
Denis Belomestny, Ekaterina Krymova ∗†‡§
November 27, 2018
Abstract
In this paper we revisit the well-known constrained projection approximation subspace tracking algorithm (CPAST) and derive, for the first time, non-asymptotic error bounds. Furthermore, we introduce a novel sparse modification of CPAST which is able to exploit sparsity in the underlying covariance structure. We present a non-asymptotic analysis of the proposed algorithm and study its empirical performance on simulated and real data.
∗ The authors are with the Faculty of Mathematics at the University of Duisburg-Essen, Essen, Germany.
† D. Belomestny, e-mail: [email protected]
‡ E. Krymova, e-mail: [email protected]
§ This work is funded by the German Research Foundation (DFG), Collaborative Research Center 823, Subprojects B3 and C5. The first author was also supported by the RSF grant 18-11-00132.

1 Introduction

Subspace tracking methods are intensively used in the statistical and signal processing communities. Given observations of a multidimensional signal, one is interested in estimating or tracking the subspace spanned by the eigenvectors corresponding to the first largest eigenvalues of the signal covariance matrix. Over the past few decades many variations of the original projection approximation subspace tracking (PAST) method [1] were developed, which found applications in data compression, filtering, speech enhancement, etc. (see [2] and references therein). Despite the popularity of subspace tracking methods, only partial results are known about their convergence. The asymptotic convergence of the PAST algorithm was first established in [3, 4] using a general theory of stability for ordinary differential equations. However, no finite-sample error bounds are available in the literature. Furthermore, in the case of a high-dimensional signal the empirical covariance matrix estimator performs poorly if the number of observations is small. A common way to improve the estimation quality in this case is to impose some kind of sparsity assumption on the signal itself or on the eigensubspace of the underlying covariance matrix. In [5] a sparse modification of the orthogonal iteration scheme for a fixed number of observations was proposed.
A thorough analysis in [5] shows that, under appropriate sparsity assumptions on the leading eigenvectors, the orthogonal iteration scheme combined with thresholding allows one to perform dimension reduction in the high-dimensional setting. Our main goal is to propose a novel modification of the constrained projection approximation subspace tracking method (CPAST) [6], called sparse constrained projection approximation subspace tracking (SCPAST), which can be used for efficient subspace tracking in the case of a high-dimensional sparse signal and a small number of available observations. Another contribution of our paper is a non-asymptotic convergence analysis of the CPAST and SCPAST algorithms showing the advantage of SCPAST in the case of a sparse covariance structure. Last but not least, we analyse the numerical performance of SCPAST on simulated and real data. In particular, the problem of tracking the leading subspace of a music signal is considered.

The structure of the paper is as follows. In Section 2 we introduce our observational model and formulate the main assumptions. Section 3 first reviews the CPAST algorithm and then provides non-asymptotic error bounds for CPAST in a "stationary" case. In Section 4 we introduce our sparse constrained projection approximation subspace tracking method and prove non-asymptotic upper bounds for the estimation error. A numerical study of the proposed algorithm is presented in Section 5. Finally, the proofs are collected in Section 6.

2 Main setup
One important problem in signal processing is adaptive estimation of a dominant subspace given incoming noisy observations. Specifically, one considers the model

x(t) = s(t) + σ(t) ξ(t),  t = 1, …, T,  (1)

where the observations x(t) ∈ R^n contain the signal s(t) ∈ R^n corrupted by a vector ξ(t) ∈ R^n with independent standard Gaussian components. The signal s(t) is usually modelled as s(t) = A(t) η(t), where A(t) is a deterministic n × d matrix of rank d with d ≪ n, and η(t) is a random vector in R^d independent of ξ(t) such that E[η(t)] = 0 and E[η_i²(t)] = 1, i = 1, …, d. Under these assumptions the process x(t) has a covariance matrix Σ(t) which may be decomposed in the following way:

Σ(t) = E[x(t) x^⊤(t)] = A(t) A^⊤(t) + σ²(t) I_n,  (2)

where I_n stands for the unit matrix in R^n. Note that the matrix A(t) A^⊤(t) has rank d, and by the singular value decomposition (SVD)

A(t) A^⊤(t) = Σ_{i=1}^{d} λ_i(t) v_i(t) v_i^⊤(t),

where v_i(t) ∈ R^n, i = 1, …, d, are the eigenvectors of A(t) A^⊤(t) corresponding to the eigenvalues λ_1(t) ≥ λ_2(t) ≥ ⋯ ≥ λ_d(t) > 0. It follows from (2) that the first d eigenvalues of Σ(t) are λ_1(t) + σ²(t), …, λ_d(t) + σ²(t), whereas the remaining n − d eigenvalues are equal to σ²(t). Since λ_d(t) > 0, the subspace corresponding to the first d eigenvectors of A(t) A^⊤(t) is identifiable. Subspace tracking methods aim to estimate the subspace span(v_1(t), …, v_d(t)) based on the observations (x(k))_{k=1}^{t}. The overall number of observations T is assumed to be fixed and known.

Relying on a heuristic assumption of slowly (in time) varying Σ(t), subspace tracking methods use the following estimator of the covariance matrix (up to scaling):

Σ̂_γ(t) = Σ_{i=0}^{t} γ^{t−i} x(i) x^⊤(i),  (3)

where 0 < γ ≤ 1 is a forgetting factor. The estimator Σ̂_γ(t) can adapt to changes in Σ(t) by discounting the past observations. In the stationary regime, that is, if Σ(t) is a constant matrix, one would use γ = 1. It is well known that in the case of Gaussian independent noise the estimator Σ̂_1(t) is consistent.

For the general model (1) and the non-stationary case, the constrained projection approximation subspace tracking (CPAST) method allows one to iteratively compute a matrix V̂_γ(t), t = 1, …, T, containing the first d leading eigenvectors of the matrix Σ̂_γ(t) (see (3)) based on the sequentially arriving observations x(j), j = 1, …, t. The procedure starts with some initial approximation V̂_γ(0) = V̂_0 and consists of the following two steps:

• multiplication: compute the n × d matrix Σ̂_{γ,V}(t) = Σ̂_γ(t) V̂_γ(t − 1);

• orthogonalization: compute an estimator V̂_γ(t) of the matrix V(t) containing the d leading eigenvectors via

V̂_γ(t) = Σ̂_{γ,V}(t) [Σ̂_{γ,V}^⊤(t) Σ̂_{γ,V}(t)]^{−1/2}.
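As an illustration, the two steps above together with the exponentially weighted covariance update (3) can be sketched in NumPy as follows. This is a minimal sketch under our own naming, not the authors' implementation; in particular the inverse square root is computed from a full eigendecomposition of the small d × d matrix Y^⊤Y rather than by a fast recursive update.

```python
import numpy as np

def update_cov(sigma_prev, x, gamma=1.0):
    """Rank-one update of the weighted covariance (3):
    Sigma_hat(t) = gamma * Sigma_hat(t-1) + x(t) x(t)^T."""
    return gamma * sigma_prev + np.outer(x, x)

def cpast_step(sigma_hat, v_prev):
    """One CPAST iteration given the current covariance estimate
    sigma_hat (n x n) and the previous basis v_prev (n x d)."""
    y = sigma_hat @ v_prev                 # multiplication step
    w, q = np.linalg.eigh(y.T @ y)         # Y^T Y = Q diag(w) Q^T
    inv_sqrt = q @ np.diag(w ** -0.5) @ q.T
    return y @ inv_sqrt                    # orthogonalization step
```

On a covariance with a well-separated d-dimensional spike, iterating `cpast_step` with a fixed covariance reproduces the orthogonal iteration scheme and converges to the leading eigenspace.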
In the ”stationary” case (γ = 1) the method may be regarded as an ”online” version of the orthogonal iteration scheme (see [7]) for computing the eigen-subspace of a non-negatively definite matrix. With the use of the Sherman-Morrison-Woodbury formula for the inversion at each time t, one has to perform O(nd) operations to compute the updated matrix V̂_γ(t) given V̂_γ(t − 1), Σ̂_γ(t − 1) and x(t).
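The Sherman-Morrison-Woodbury identity mentioned above is what makes such a cheap update possible: when the covariance estimate changes by a rank-one term x x^⊤, the inverse of a matrix A + x x^⊤ can be refreshed from A^{−1} without a new O(n³) inversion. A minimal sketch of the rank-one case (our own helper, not the authors' implementation):

```python
import numpy as np

def sherman_morrison_update(a_inv, x):
    """Inverse of (A + x x^T) from A^{-1} in O(n^2) operations:
    (A + x x^T)^{-1} = A^{-1} - (A^{-1} x x^T A^{-1}) / (1 + x^T A^{-1} x)."""
    ax = a_inv @ x
    return a_inv - np.outer(ax, ax) / (1.0 + x @ ax)
```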
1) and x ( t ). Throughout this section we consider the stationary case where Σ( t ) = Σ, A ( t ) = A , v i ( t ) = v i , λ i ( t ) = λ i , i = 1 , . . . , d , σ ( t ) = σ . In this situationone would like to keep all the available information to estimate V , that is, touse the estimator (3) for Σ with γ = 1. For notational simplicity, from now4n we skip the dependence on γ and use the notation (cid:98) Σ( t ) for the empiricalcovariance matrix (cid:98) Σ( t ) = t (cid:80) ti =1 x ( i ) x (cid:62) ( i ) and (cid:98) V ( t ) for CPAST estimator.Thus in the stationary case the CPAST estimator takes the form (cid:98) V ( t ) = [ (cid:98) Σ( t ) (cid:98) V ( t − (cid:98) V (cid:62) ( t − (cid:62) (cid:98) Σ ( t ) (cid:98) V ( t − − / . (4)We assume that the random vectors η ( t ) and ξ ( t ) have independent N (0 , t = 1 , . . . , T . Under these assumptions the covariance matrix(2) becomes Σ = d (cid:88) i =1 λ i v i v (cid:62) i + σ I n = V Λ d V (cid:62) + σ I n , (5)where V is n × d matrix with columns { v i } di =1 , Λ d is d × d diagonal matrix with { λ i } di =1 on the diagonal. Note that the observational model (1) in stationarycase can be alternatively written as the so-called spike model x ( t ) = d (cid:88) i =1 (cid:112) λ i u i ( t ) v i + σξ ( t ) , (6)where u i ( t ) are i.i.d. standard Gaussian random variables independent from ξ ( t ).For our non-asymptotic error analysis of CPAST, we assume that d , σ and λ i , i = 1 , . . . , d are known. With the known σ we can always normalizethe data and therefore without loss of generality we can assume that σ = 1.The typical condition while analyzing the quality of the eigenvectors es-timation is the so-called spectral gap condition, which says that the adjacenteigenvectors explain distinguishably different portion of the variance in thedata, namely there exists τ ≥
1, such that for all j = 1 , . . . , d,τ ( λ j − λ j +1 ) ≥ λ , where λ d +1 = 0 by definition. Since our goal is the estimation of the d -dimensional subspace of the first eigenvectors, and we are not interested inthe estimation of each particular eigenvector, we need only the condition forthe separation of this d -dimensional subspace, namely that the gap between λ d and λ d +1 is sufficiently large: τ λ d ≥ λ . (7)5efine a distance l between two subspaces W and Q spanning orthonormalcolumns w , . . . , w d and q , . . . , q d correspondingly via l ( W, Q ) = l ( W , Q ) = (cid:107) W W (cid:62) − QQ (cid:62) (cid:107) , (8)where the nuclear norm (cid:107) A (cid:107) of a matrix A ∈ R n × d is defined as (cid:107) A (cid:107) = sup x ∈ R d (cid:107) Ax (cid:107) (cid:107) x (cid:107) ,and W = { w , . . . , w d } and Q = { q , . . . , q d } are the matrices in R n × d withorthonormal columns.The next result shows that with high probability the subspace which spansthe CPAST estimator (cid:98) V ( t ) is close, in terms of l, to the subspace spanning V when the number of observation is large enough. We assume that the initialestimator (cid:98) V ( t ) = (cid:98) V is constructed from t first observations by means ofthe singular value decomposition of (cid:98) Σ( t ). Theorem 1.
Suppose that the spectral gap condition (7) holds and

√t_0 ≥ √2 R_max (λ_1 + 1)/λ_d,  where R_max = 5 √(n − d) + 5 √6 √ln(n ∨ T).

Then after t − t_0 iterations we get, with probability at least 1 − C(n ∨ t)^{−2},

l(V, V̂(t)) ≤ C_1 ((λ_d + 1)/λ_d²) ((n − d)/t) + C_2 ((λ_1 + 1)/λ_d²) (log(n ∨ t)/t),  (9)

where C, C_1 are absolute constants and C_2 depends on τ.

Remark 1.
The second term on the right-hand side of (9) corresponds to the error of separating the first d eigenvectors from the rest. The first term is an average error of estimating all components of the d leading eigenvectors. It originates from the interaction of the noise terms with the different coordinates, see [8].

We assume that in the stationary case (5) the first d leading eigenvectors v_i, i = 1, …, d, of Σ have most of their entries close to zero. Namely, we suppose that each v_i fulfils the so-called weak-l_r ball condition [9, 10], that is, for some r ∈ (0, 2),

|v_i|_(k) ≤ s_i k^{−1/r},  k = 1, …, n,

where |v_i|_(k) is the k-th largest coordinate of v_i. The weak-l_r ball condition is known to be more general than the l_r ball condition (which is ‖q‖_r ≤ s for q ∈ R^n, r ∈ (0, 2), s ≥ 0). We use a thresholding function g(x, β) with a thresholding parameter β > 0, defined for x ∈ R via

x − β ≤ g(x, β) ≤ x + β,  g(x, β) 1_{|x| ≤ β} = 0.  (10)

For example, the so-called hard-thresholding function g_H(x, β) given by

g_H(x, β) = x 1(|x| > β)  (11)

and the so-called soft-thresholding function defined as

g_S(x, β) = (|x| − β)_+ · sign(x)

fulfil the conditions (10). When β is a vector with components β_i, i = 1, …, d, and V is a matrix with columns v_i ∈ R^n, i = 1, …, d, we denote by g(V, β) the n × d matrix with the elements {g(v_ij, β_i)}, i = 1, …, d, j = 1, …, n.

Our primal goal is to propose a subspace tracking method for estimating a d-dimensional subspace of the process under the weak-l_r ball assumption on the leading eigenvectors of the covariance matrix Σ and to analyze its convergence. Our sparse modification of CPAST relies on the orthogonal iteration scheme with an additional thresholding step (cf. [5, 10]). From now on, by a slight abuse of notation, we will denote by V̂(t) the iterative estimator obtained with the help of the modified CPAST, given t observations. To get the initial approximation V̂(t_0), we use the following modification of a standard SPCA scheme, see [5, 10].

1. First compute the empirical covariance Σ̂(t_0) based on t_0 observations: Σ̂(t_0) = (1/t_0) Σ_{i=1}^{t_0} x(i) x^⊤(i).
2. Define a set of indices G corresponding to large enough diagonal elements of Σ̂(t_0):

G = { k : Σ̂_kk(t_0) > γ log(n ∨ t_0)/t_0 }  for γ ≥ √(log(n ∨ T)/log(n ∨ t_0)).

3. Let Σ̂_G(t_0) be the submatrix of Σ̂(t_0) corresponding to the row and column indices in G × G.

4. As the estimator at step zero, take the first d eigenvectors of Σ̂_G(t_0), completed with zeros in the coordinates {1, …, n} \ G to vectors of length n.

Now we describe the sparse modification of CPAST, which we call SCPAST. We start with V̂(t_0) obtained by the above procedure. Then for t = t_0 + 1, …, T we perform the following steps:

1. multiplication: Υ̂(t) = Σ̂(t) V̂(t − 1);

2. thresholding: define a matrix Υ̂^β(t) = g(Υ̂(t), β(t)), where g is a thresholding function satisfying (10) and β(t) is the corresponding thresholding vector;

3. orthogonalization: V̂(t) = Υ̂^β(t) [Υ̂^{β⊤}(t) Υ̂^β(t)]^{−1/2}.

First we define the thresholding parameter β(t) as follows. For t = t_0 + 1, …, T and a ≥ √(log(n ∨ T)/log(n ∨ t)), the components β_i(t), i = 1, …, d, of the vector β(t) are given by

β_i(t) = a √((λ_i + 1) log(n ∨ t)/t).  (12)

The motivation for thresholding the column vectors of Υ̂(t) comes from the following connection between the sparsity of the leading eigenvectors v_j, j = 1, …, d, and the vector ζ^v with components ζ^v_k = √(Σ_{j=1}^{d} λ_j v_jk²), k = 1, …, n ((ζ^v_k)² is the variance of the k-th coordinate of the signal part [12]): the weak-l_r sparsity of the vector ζ^v implies the weak-l_r sparsity of v_j, j = 1, …, d.

Suppose that d and the eigenvalues λ_1, …, λ_d are known. In the case of unknown d and λ_1, …, λ_d one might first estimate the eigenvalues of Σ̂_G(t_0) defined in the previous section and then select the largest set of eigenvalues satisfying the spectral gap condition (7) with some parameter τ (see [5] for more details).

Denote by S(t) the set of indices of “large” eigenvector components (S stands for ”signal”), that is, for a fixed t,

S(t) = { j : |v_ij| ≥ b h_i √(log(n ∨ t)/t) for some i = 1, …, d },

where h_i = √(λ_i + 1)/λ_i and b is a numerical constant multiple of a √τ √d. In fact, the quantity h_i²/t is an estimate of the noise variance in the entries of the i-th leading eigenvector [8]. The number of “large” entries of the first d leading eigenvectors to estimate thus might be estimated by the cardinality of S(t), which we denote by card(S(t)). One can bound card(S(t)) as

card(S(t)) ≤ Σ_{i=1}^{d} card(S_i(t)),  where S_i(t) = { j : |v_ij| ≥ b h_i √(log(n ∨ t)/t) }.

From Lemma 14 (see Appendix B) we see that

d ≤ card(S(t)) ≤ C M(t),

where C depends on b, r and

M(t) = n ∧ [ Σ_{j=1}^{d} s_j^r h_j^r (log(n ∨ t)/t)^{−r/2} ].  (13)

Note that in the sparse case the number of non-zero components card(S(t)) is much smaller than n. For example, if ‖v_j‖_r ≤ s, j = 1, …, d, then

M(t) ≤ n ∧ [ d s^r h_d^r (log(n ∨ t)/t)^{−r/2} ].

The quantity s^r h_j^r is often referred to as the effective dimension of the vector v_j. Thus M(t) is the number of effective coordinates of v_j, j = 1, …, d, in the case of disjoint S_j(t). Since h_d²/t is an upper bound for the estimation error of the components of the first d leading eigenvectors, the right-hand side of the above inequality gives, up to a logarithmic term, the overall number of components of the d leading eigenvectors to estimate. The next theorem gives non-asymptotic bounds for the distance between V and V̂(t).

Theorem 2.
Let

√t_0 ≥ (C_1 h_d M^{1/2}(T) + C_2) ((λ_1 + 1)/λ_d) √log(n ∨ T),  (14)

where C_1 depends on τ in (7), r, a, and C_2 depends on τ. After t iterations one has, with probability at least 1 − C(n ∨ t)^{−2},

l(V, V̂(t)) ≤ C_1 h_d² M(t) log(n ∨ t)/t + C_2 ((λ_1 + 1)/λ_d²) log(n ∨ t)/t  (15)

with some absolute constant C > 0.

Remark 2.
The second term in (15) is the same as in the non-sparse case, see Theorem 1. This term is always present, as an error of separating the first d eigenvectors from the remaining ones, regardless of how sparse they are. The first term in (15) and (9) is responsible for the interaction of the noise with different coordinates of the signal. The average error of estimating one entry of the first d leading eigenvectors based on t observations can be bounded by (λ_d + 1)/(λ_d² t), see [13]. The number of components to be estimated in SCPAST for each vector is bounded by M(t) (see (13)), which is small compared to n in the sparse case. Thus, the first term in (15) can be significantly smaller than the first one in (9), provided the first d leading eigenvectors are sparse. Note also that the computational complexity of SCPAST at each step t = t_0 + 1, …, T is O(d card(S(t))), with probability given by Theorem 2.

To illustrate the advantage of using SCPAST in the sparse case, we generate T = 2000 observations from (6) for the case of a single spike, that is, d = 1.

Figure 1: The components of the leading eigenvector to recover: (a) the step function; (b)–(d) the error l(v_1, v̂_1) for the three values of λ_1 and n = 1024.

Our aim is to estimate the leading eigenvector v_1. We use three functions, depicted in subplots (a) of Figs. 1–3, with different sparsity levels in the wavelet domain. The observations are generated for noise variance 1 and three values of the maximal eigenvalue λ_1. We used the Symmlet 8 basis from the Matlab package SPCALab to transform the initial data into the wavelet domain. We applied CPAST and SCPAST for the recovery of the wavelet coefficients of the vector v_1, then transformed the estimates back to the initial domain and computed the error l(v_1, v̂_1) depending on the number of observations. The results for the hard thresholding (11) with a = 1 are shown in panels (b)–(d) of Figs. 1–3.
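For concreteness, the thresholding rules (10)–(11) and a single SCPAST iteration built on them can be sketched as follows. This is a minimal NumPy sketch under our own naming; the per-column thresholds are passed in as a vector, as in (12), and the toy values below are ours, not those of the paper's experiments.

```python
import numpy as np

def g_hard(x, beta):
    """Hard thresholding (11): keep x where |x| > beta, else 0."""
    return np.where(np.abs(x) > beta, x, 0.0)

def g_soft(x, beta):
    """Soft thresholding: shrink x towards zero by beta."""
    return np.sign(x) * np.maximum(np.abs(x) - beta, 0.0)

def scpast_step(sigma_hat, v_prev, beta, g=g_hard):
    """One SCPAST iteration: multiplication, columnwise thresholding
    with the vector beta (one threshold per column), orthogonalization."""
    y = sigma_hat @ v_prev                      # multiplication
    y = np.column_stack([g(y[:, i], beta[i])    # thresholding
                         for i in range(y.shape[1])])
    w, q = np.linalg.eigh(y.T @ y)              # orthogonalization
    return y @ (q @ np.diag(w ** -0.5) @ q.T)
```

Both `g_hard` and `g_soft` satisfy (10): they vanish on |x| ≤ β and move x by at most β; the thresholding step is what zeroes out the coordinates outside the "signal" set.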
Figure 2: The components of the leading eigenvector to recover: (a) the three-peaks function; (b)–(d) the error l(v_1, v̂_1) for the three values of λ_1.

Figure 3: The components of the leading eigenvector to recover: (a) the one-peak function; (b)–(d) the error l(v_1, v̂_1) for the three values of λ_1.

Natural acoustic signals like musical ones exhibit a highly varying temporal structure, so there is a need for adaptive unsupervised methods of signal processing which reduce the complexity of the signal. In [14] a method was proposed which reduces the spectral complexity of music signals using adaptive segmentation of the signal in the spectral domain for principal component analysis, for listeners with cochlear hearing loss. In the following we apply CPAST and SCPAST as an alternative method for the complexity reduction of music signals. To illustrate the use of SCPAST and CPAST we set the memory parameter γ slightly below one and track the leading eigenvector v_1. Fig. 5 contains the results of the recovery of the leading eigenvector with 168 components. The results show that the SCPAST method allows one to obtain a sparse representation of the leading eigenvectors and seems promising for the construction of structure-preserving compressed representations of signals.

Denote by V̄ a matrix with n − d column vectors v_i, i = d + 1, …, n, which complete the orthonormal columns {v_i}_{i=1}^{d} of the matrix V to an orthonormal basis in R^n. Denote by X(t) a matrix with the columns {x(i)}_{i=1}^{t}. From (6) one gets the representation

X(t) = V Λ_d^{1/2} U^⊤(t) + σ Ξ(t),  t = 1, …, T,  (16)

Figure 4: CQT-spectrogram of Bach's Siciliano for oboe and piano. The deep blue color corresponds to zero values, the red color corresponds to higher values.

where U(t) ∈ R^{t×d} and Ξ(t) ∈ R^{n×t} are matrices with independent N(0, 1) entries, V is the orthonormal matrix with columns {v_i}_{i=1}^{n}, and Λ_d is the diagonal matrix with λ_i, i = 1, …, d, on the diagonal. Denote the set of indices of the small components of the leading eigenvectors by N(t) = {1, …, n} \ S(t) (where N stands for “noise”). From (16) the empirical covariance matrix can be decomposed as

Σ̂(t) = (1/t) V Λ_d^{1/2} U^⊤(t) U(t) Λ_d^{1/2} V^⊤ + (1/t) Ξ(t) Ξ^⊤(t) + (1/t) V Λ_d^{1/2} U^⊤(t) Ξ^⊤(t) + (1/t) Ξ(t) U(t) Λ_d^{1/2} V^⊤.  (17)

It is well known [7] that the distance (8) between subspaces 𝒲 and 𝒬, spanned by n × d matrices with orthonormal columns W and Q respectively, is related to the d-th principal angle between the subspaces 𝒲 and 𝒬 as

l(W, Q) = sin φ_d(W, Q),

where the principal angles 0 ≤ φ_1 ≤ ⋯ ≤ φ_d between 𝒲 and 𝒬 are recursively defined as [16]

φ_i(W, Q) = arccos ( ⟨x_i, y_i⟩ / (‖x_i‖ ‖y_i‖) ),

where {x_i, y_i} minimize the angle between x and y over x ∈ 𝒲, y ∈ 𝒬 with x ⊥ x_j, y ⊥ y_j, j < i.
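Numerically, the identity l(W, Q) = sin φ_d(W, Q) is convenient to check via the singular values of W^⊤Q, whose arccosines are exactly the principal angles. A small sketch with our own helper names:

```python
import numpy as np

def subspace_distance(w, q):
    """l(W, Q) = ||W W^T - Q Q^T|| in the operator norm, as in (8)."""
    return np.linalg.norm(w @ w.T - q @ q.T, 2)

def largest_principal_angle(w, q):
    """phi_d(W, Q): arccos of the smallest singular value of W^T Q."""
    s = np.linalg.svd(w.T @ q, compute_uv=False)
    return np.arccos(np.clip(s.min(), -1.0, 1.0))
```

For equal-dimensional subspaces the two quantities agree up to floating-point error, which is why the proofs below can switch freely between l and sin φ_d.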
Lemma 1. For CPAST (4), with probability 1 − C(n ∨ t)^{−2},

r(t) ≤ [ (λ_{d+1} + 1) tan φ_d(V̂(t − 1), V) + (λ_1 + 1)^{1/2} E(t) sec φ_d(V̂(t − 1), V) ] / [ λ_d + 1 − (λ_1 + 1) E(t) sec φ_d(V̂(t − 1), V) ],  (20)

where E(t) = 5 √((n − d)/t) + 5 √6 √(log(n ∨ t)/t).

The following lemma gives a bound for the error l(V̂(u), V) depending on the error of the previous iteration l(V̂(u − 1), V) for all u ∈ {t_0 + 1, …, t}.

Lemma 2.
With probability greater than 1 − C(n ∨ t)^{−2}, for all u ∈ {t_0 + 1, …, t},

r(u) ≤ [ α_1 r(u − 1) + α_2 R(t)/√u ] / [ √(1 − r²(u − 1)) − α_3 R(t)/√u ],  (21)

where r(u) = sin φ_d(V̂(u), V), R(t) = 5 √(n − d) + 5 √6 √log(n ∨ t), and

α_1 = 1/(λ_d + 1),  α_2 = √(λ_1 + 1)/(λ_d + 1),  α_3 = (λ_1 + 1)/(λ_d + 1).  (22)

Given Lemma 2 it is possible to derive a bound for l(V̂(t), V). First let us state a result which allows one to bound the error of the initial estimate V̂(t_0).

Lemma 3.
Let V̂(t_0) be the matrix containing the first d leading eigenvectors of the matrix Σ̂(t_0). Then with probability 1 − C(n ∨ T)^{−2},

r(t_0) = l(V̂(t_0), V) ≤ α_0/√t_0,

where α_0 = R_max (λ_1 + 1)/λ_d, with R_max = R(T).

The following lemma gives the error of CPAST after observing K vectors x_i, i = t_0 + 1, …, t_0 + K, based on the recursive bound (21). Note that the proof of Lemma 4 also ensures that the denominator in (21) is bounded away from zero.

Lemma 4.
Suppose that r(t_0) ≤ α_0/√t_0, where √t_0 ≥ R_max α_3 (1 − α_1)^{−1/2}. Then for K ≥ K_0(ρ, t_0),

r(t_0 + K) ≤ (α_2/(1 − α_1)) R(t_0 + K)/√(K + t_0).  (23)

The statement of Theorem 1 follows from Lemma 4 applied to the inequality (21) with the coefficients (22), which holds with probability 1 − C(n ∨ t)^{−2}, together with the initial conditions given by Lemma 3.

Define Σ̂°(t), the oracle version of Σ̂(t), and the corresponding expectation Σ°(t) = E Σ̂°(t) as follows:

Σ̂°(t) = [ Σ̂_S(t)  0 ; 0  I_{N(t)} ],  Σ°(t) = [ Σ_S(t)  0 ; 0  I_{N(t)} ],  (24)

where Σ̂_S(t) and Σ_S(t) are the submatrices of size card(S(t)) × card(S(t)) with column and row indices from S(t). The identity matrix I_{N(t)} has size card(N(t)) × card(N(t)). Here we assume without loss of generality that the indices in S(t) are always smaller than those in N(t).

First we obtain the oracle sequence V̂°(t) of solutions by iterating SCPAST with the matrices Σ̂°(t) instead of Σ̂(t). We define the initial estimate V̂°(t_0) by the initialization steps of Section 6.1 applied to the matrix Σ̂°(t_0), and then bound sin φ_d(V̂°(t_0), V). Denote the result of thresholding the columns of the matrix Υ̂°(t) = Σ̂°(t) V̂°(t − 1), with the thresholding parameters given by the vector β(t), by Υ̂°,β(t) = g(Υ̂°(t), β(t)).

Denote by V_{S(t)} the submatrix obtained by selecting the rows of V with indices in S(t). Denote by V_k, k = 1, …, n, the rows of V (recall that the columns are v_j, j = 1, …, d). For the estimators of V we omit the dependence of S(t) on t, as the estimator itself depends on t; that is, V̂_S(t) is the matrix of the rows of V̂(t) with indices from S(t).

The following bound for the oracle error r(t) = l(V̂°(t), V) of the SCPAST method is analogous to Lemma 2.

Lemma 5.
For u = t_0 + 1, …, t, with probability greater than 1 − C(n ∨ t)^{−2} the following bound holds true:

r(u) ≤ [ α_1 r(u − 1) + α_2 R°(t)/√u ] / [ √(1 − r²(u − 1)) − α_3 R°(t)/√u ],  (25)

where

α_1 = (λ_d + 1)^{−1},  α_2 = α_3 = (λ_1 + 1)/(λ_d + 1),  (26)

R°(t) = C_1 h_d M^{1/2}(t) √log(n ∨ t) + C_2 √log(n ∨ t), where C_1 and C_2 are constants and C_1 depends on r, d, a, τ.

In the sparse case a result similar to Lemma 3 holds true, giving a bound on the error of the initial oracle estimator.
Lemma 6.
The error of the initial oracle estimator is bounded as follows:

r(t_0) ≤ α_0/√t_0,

with probability 1 − C(n ∨ T)^{−2}, where

α_0 = ( C_1 (λ_1/λ_d) h_d M^{1/2}(t_0) + C_2 (λ_1 + 1)/λ_d ) √log(n ∨ T),

where C_1 and C_2 are constants and C_1 depends on r.

Lemma 7. After t iterations, t = t_0 + 1, …, T, with probability 1 − C(n ∨ t)^{−2} one has

l(V, V̂°(t)) ≤ C_1 h_d² M(t) log(n ∨ t)/t + C_2 ((λ_1 + 1)/λ_d²) log(n ∨ t)/t,

where C_1 depends on d, r, τ, a, and C_2 depends on τ.

The convergence of the oracle scheme does not immediately imply the convergence of the SCPAST estimators. The following two lemmas state that with high probability V̂°(t) = V̂(t). Thus the bound in Lemma 7 holds for SCPAST and Theorem 2 is justified.

Lemma 8.
For γ ≥ √(log(n ∨ T)/log(n ∨ t_0)), with probability 1 − C(n ∨ T)^{−2} the initial oracle estimate coincides with the initial SPCA estimate, that is, V̂°(t_0) = V̂(t_0).

Lemma 9.
With probability 1 − C(n ∨ t)^{−2}, for u = t_0 + 1, …, t, the oracle SCPAST and SCPAST solutions coincide: V̂°(u) = V̂(u).

Conclusions
We developed a new method, SCPAST, based on the constrained projection approximation subspace tracking method, for subspace tracking under sparsity assumptions on the underlying signal eigen-subspace. The thresholding step was introduced in order to ensure the sparsity of the solution. We presented non-asymptotic bounds for the errors of subspace recovery with SCPAST and CPAST, as well as empirical studies of the methods. The results of the experiments show that the SCPAST method allows one to obtain a sparse representation of the leading eigenvector of music signals and might be used for adaptive compression of musical signals in the spectral domain.

Appendix A. Proofs of Lemmas
Proof of Lemma 1
From the definition (19),

tan φ_d(Σ̂(t) V̂(t − 1), V) = max_{‖x‖=1} ‖V̄^⊤ Σ̂(t) V̂(t − 1) x‖ / ‖V^⊤ Σ̂(t) V̂(t − 1) x‖.

Using Σ̂(t) = Σ̂(t) − Σ + Σ, the latter maximum can be bounded by

max_{‖x‖=1} [ ‖V̄^⊤ V̂(t − 1) x‖ + ‖V̄^⊤ (Σ̂(t) − Σ) V̂(t − 1) x‖ ] / [ (λ_d + 1) ‖V^⊤ V̂(t − 1) x‖ − ‖V^⊤ (Σ̂(t) − Σ) V̂(t − 1) x‖ ].

Using (18),

max_{‖x‖=1} ‖V̄^⊤ (Σ̂(t) − Σ) V̂(t − 1) x‖ / ‖V^⊤ V̂(t − 1) x‖ ≤ ‖V̄^⊤ (Σ̂(t) − Σ)‖ / cos φ_d(V̂(t − 1), V),

max_{‖x‖=1} ‖V^⊤ (Σ̂(t) − Σ) V̂(t − 1) x‖ / ‖V^⊤ V̂(t − 1) x‖ ≤ ‖V^⊤ (Σ̂(t) − Σ)‖ / cos φ_d(V̂(t − 1), V).

From (17),

Σ̂(t) = Σ + (1/t) Ξ(t) Ξ^⊤(t) − I_n + (1/t) V Λ_d^{1/2} U^⊤(t) Ξ^⊤(t) + (1/t) Ξ(t) U(t) Λ_d^{1/2} V^⊤ + V Λ_d^{1/2} ( (1/t) U^⊤(t) U(t) − I_d ) Λ_d^{1/2} V^⊤.  (27)

Therefore, using V̄ V̄^⊤ + V V^⊤ = I_n,

‖V̄^⊤ (Σ̂(t) − Σ)‖ ≤ ‖(1/t) V̄^⊤ Ξ(t) [V̄^⊤ Ξ(t)]^⊤ − I_{n−d}‖ + (1/t) ‖V̄^⊤ Ξ(t) [V^⊤ Ξ(t)]^⊤‖ + (√λ_1/t) ‖V̄^⊤ Ξ(t) U(t)‖,

‖V^⊤ (Σ̂(t) − Σ)‖ ≤ λ_1 ‖(1/t) U^⊤(t) U(t) − I_d‖ + ‖(1/t) V^⊤ Ξ(t) [V̄^⊤ Ξ(t)]^⊤‖ + ‖(1/t) V^⊤ Ξ(t) [V^⊤ Ξ(t)]^⊤ − I_d‖ + 2 √λ_1 ‖(1/t) Ξ(t) U(t)‖.

Using √λ_1 ∨ 1 ≤ √(λ_1 + 1) ≤ λ_1 + 1, and Lemma 10 and Lemma 11 with p = √6, for big enough t,

‖V̄^⊤ (Σ̂(t) − Σ)‖ ≤ √(λ_1 + 1) E(t),  ‖V^⊤ (Σ̂(t) − Σ)‖ ≤ (λ_1 + 1) E(t),  (28)

where E(t) = 5 √((n − d)/t) + 5 √6 √(log(n ∨ t)/t), with probability 1 − C(n ∨ t)^{−2}, where C is a constant. The statement of the lemma follows from the observation that tan φ_d(V̂(t), V) = tan φ_d(Σ̂(t) V̂(t − 1), V).

Proof of Lemma 2
Lemma 1 gives a probabilistic bound on the error of the subspace estimate l(V̂(t), V) in terms of the previous iteration l(V̂(t − 1), V). Our goal is to bound l(V̂(t), V) for t ∈ {t_0 + 1, …, T}. Due to Lemmas 10 and 11 we get, for p > 0 and u = t_0 + 1, …, t: the term √u ‖(1/u) V^⊤ Ξ(u) [V^⊤ Ξ(u)]^⊤ − I_d‖ is bounded from above by 3(√d + p √log(n ∨ t)); the term √u ‖(1/u) V̄^⊤ Ξ(u) [V̄^⊤ Ξ(u)]^⊤ − I_{n−d}‖ by 3(√(n − d) + p √log(n ∨ t)); and the term √u ‖(1/u) V^⊤ Ξ(u) [V̄^⊤ Ξ(u)]^⊤‖ by

√( p log(n ∨ t)/√u ) ( √(n − d) + √d + p √log(n ∨ t) ).

Finally, √u ‖(1/u) V̄^⊤ Ξ(u) U(u)‖ is bounded by

√( p log(n ∨ t)/√u ) ( √(n − d) + √d + p √log(n ∨ t) ).

Each of the bounds holds with probability 1 − (n ∨ t)^{−2} for p = √6. Using the union bound we get the statement of the lemma for the intersection of events, with probability 1 − C(t − t_0)(n ∨ t)^{−2}.

Proof of Lemma 3
The proof is based on the Davis $\sin\theta$ Theorem 4, Lemma 10 (see Appendix B), and Weyl's theorem [17]. From the Davis $\sin\theta$ theorem
\[
l(V,\widehat V(t)) \le \frac{\|(\widehat\Sigma(t)-\Sigma)V\|}{\lambda_d+1-\lambda_{d+1}(\widehat\Sigma(t))}, \tag{29}
\]
where $\lambda_{d+1}(A)$ is the $(d+1)$-th singular value of a matrix $A$, i.e. the square root of the $(d+1)$-th eigenvalue of $A^\top A$. Weyl's theorem gives, for $j=1,\dots,n$,
\[
|\lambda_j+1-\lambda_j(\widehat\Sigma(t))| \le \|\widehat\Sigma(t)-\Sigma\|.
\]
Therefore the denominator in (29) may be bounded as
\[
|\lambda_d+1-\lambda_{d+1}(\widehat\Sigma(t))| \ge \lambda_d-\|\widehat\Sigma(t)-\Sigma\|.
\]
From (27)
\[
\|\widehat\Sigma(t)-\Sigma\| \le \lambda_1\Big\|\frac1tU(t)^\top U(t)-I_d\Big\| + \Big\|\frac1t\Xi(t)\Xi^\top(t)-I_n\Big\| + 2\sqrt{\lambda_1}\Big\|\frac1tU^\top(t)\Xi^\top(t)\Big\|,
\]
and by Lemmas 10 and 11
\[
|\lambda_d+1-\lambda_{d+1}(\widehat\Sigma(t))| \ge (1+o(1))\lambda_d.
\]
From (28) and Lemma 10 one has that with probability $1-C(n\vee t)^{-1}$
\[
\|(\widehat\Sigma(t)-\Sigma)V\| \le (\lambda_1+1)\Big(\sqrt{\frac nt}+\sqrt{\frac{\log(n\vee t)}{t}}\Big).
\]
Combining the last two inequalities we get the statement of the lemma.
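The $\sin\theta$ argument above can be sanity-checked numerically. The sketch below is a minimal illustration with hypothetical dimensions and spike strengths (not the paper's setup): it builds a spiked covariance $\Sigma = V\Lambda V^\top + I_n$, perturbs it, and verifies that the $\sin\theta$ distance between the leading subspaces is dominated by $\|(\widehat\Sigma-\Sigma)V\|$ divided by the spectral gap, as in (29).

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 30, 3
lam = np.array([10.0, 8.0, 6.0])               # hypothetical spike strengths

# spiked covariance Sigma = V diag(lam) V^T + I_n
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
V = Q[:, :d]
Sigma = V @ np.diag(lam) @ V.T + np.eye(n)

# symmetric perturbation playing the role of Sigma_hat - Sigma
G = rng.standard_normal((n, n))
Sigma_hat = Sigma + 0.1 * (G + G.T) / 2

# leading d eigenvectors of the perturbed matrix
w, U = np.linalg.eigh(Sigma_hat)               # ascending eigenvalues
V_hat = U[:, -d:]

# sin of the largest principal angle between the two subspaces
s = np.linalg.svd(V_hat.T @ V, compute_uv=False)
sin_theta = np.sqrt(max(0.0, 1.0 - s.min() ** 2))

# Davis sin(theta)-type bound: ||(Sigma_hat - Sigma) V|| over the gap;
# the d-th eigenvalue of Sigma is lam[d-1] + 1 because of the identity noise part
gap = (lam[-1] + 1.0) - np.sort(w)[-(d + 1)]
bound = np.linalg.norm((Sigma_hat - Sigma) @ V, 2) / gap
print(sin_theta, bound)
```

The gap in the denominator matches (29): the $d$-th eigenvalue of $\Sigma$ is $\lambda_d+1$, from which the $(d+1)$-th eigenvalue of the perturbed matrix is subtracted.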
Proof of Lemma 4
First we prove (23) for the pair $(t_0,r(t_0))$ which satisfies, by induction for all $k=1,\dots,K$ and some $\rho\in(\alpha_1,1)$,
\[
\sqrt{1-\Big(r(t_0)+R\,\frac{\alpha_2}{\alpha_1}\,\frac{\rho}{1-\rho}\,\frac1{\sqrt{t_0}}\Big)^2} - \frac{\alpha_2R}{\sqrt{t_0}} > \frac{\alpha_1}{\rho}. \tag{30}
\]
We have for $K=1$
\[
r(t_0+1) \le \frac{\alpha_1r(t_0)+\frac{\alpha_2R}{\sqrt{t_0+1}}}{\sqrt{1-r^2(t_0)}-\frac{\alpha_2R}{\sqrt{t_0+1}}} \le \rho\,r(t_0)+\frac{\rho\alpha_2}{\alpha_1}\,\frac{R}{\sqrt{t_0+1}}.
\]
Furthermore, suppose that (30) holds for $K=L$; then
\[
r(t_0+L) \le \rho^Lr(t_0) + R\,\frac{\alpha_2}{\alpha_1}\sum_{k=1}^{L}\frac{\rho^{L+1-k}}{\sqrt{t_0+k}}
\]
and
\[
r(t_0+L+1) \le \frac{\alpha_1r(t_0+L)+\frac{\alpha_2R}{\sqrt{t_0+L+1}}}{\sqrt{1-r^2(t_0+L)}-\frac{\alpha_2R}{\sqrt{t_0+L+1}}} \le \rho^{L+1}r(t_0) + R\,\frac{\alpha_2}{\alpha_1}\sum_{k=1}^{L+1}\frac{\rho^{L+1-k}}{\sqrt{t_0+k}}.
\]
A sufficient condition for the above formula to hold reads as
\[
\sqrt{1-\Big(\rho^{k-1}r(t_0)+\frac{\alpha_2R}{\alpha_1}\sum_{j=1}^{k-1}\frac{\rho^{k-j}}{\sqrt{t_0+j}}\Big)^2} - \frac{\alpha_2R}{\sqrt{t_0+k}} > \frac{\alpha_1}{\rho}.
\]
Note that $\sum_{j=1}^{k-1}\frac{\rho^{k-j}}{\sqrt{t_0+j}} \le \frac{\rho}{1-\rho}\frac1{\sqrt{t_0}}$; therefore the above condition is fulfilled given (30). Furthermore,
\[
r(t_0+K+1) \le \rho^{K+1}r(t_0) + R\,\frac{\alpha_2}{\alpha_1}\sum_{k=1}^{K+1}\frac{\rho^{K+1-k}}{\sqrt{t_0+k}},
\]
where for $K>K(\rho)$ and $j_{K,\rho} = \log(K)/(2\log(1/\rho))$
\[
\sum_{k=1}^{K}\frac{\rho^{K-k}}{\sqrt{t_0+k}} \le \sum_{j=0}^{j_{K,\rho}}\frac{\rho^{j}}{\sqrt{t_0+K-j}} + \sum_{j=j_{K,\rho}+1}^{K-1}\frac{\rho^{j}}{\sqrt{t_0+K-j}} \le \frac1{(1-\rho)\sqrt{t_0+K-j_{K,\rho}}} + \frac1{(1-\rho)\sqrt{K+t_0}},
\]
and
\[
r(t_0+K+1) \lesssim \frac{\alpha_2}{\alpha_1}\,\frac{\rho}{1-\rho}\,\frac{R}{\sqrt{K+t_0+1}}.
\]
From (30) the condition on the starting value $r(t_0)$ is
\[
r(t_0) \le \sqrt{1-\Big(\frac{\alpha_2R}{\sqrt{t_0}}+\frac{\alpha_1}{\rho}\Big)^2} - \frac{R\alpha_2}{\alpha_1}\,\frac{\rho}{1-\rho}\,\frac1{\sqrt{t_0}}. \tag{31}
\]
Thus the condition on the number of initial observations $t_0$ for (31) to be satisfied, given $r(t_0)\le\alpha_0/\sqrt{t_0}$, reads as
\[
\frac{\alpha_0}{\sqrt{t_0}} \le \sqrt{1-\Big(\frac{\alpha_2R}{\sqrt{t_0}}+\frac{\alpha_1}{\rho}\Big)^2} - R\,\frac{\alpha_2}{\alpha_1}\,\frac{\rho}{1-\rho}\,\frac1{\sqrt{t_0}}.
\]
Therefore, taking into account (26) and $\rho>\alpha_1$, a sufficient condition on $\sqrt{t_0}$ is
\[
\sqrt{t_0} \ge \frac{\alpha_0+\alpha_2R\big(1-\frac{\alpha_1}{\rho}\big)+2R\,\frac{\alpha_2}{\alpha_1}\,\frac{\rho}{1-\rho}}{\sqrt{1-\frac{\alpha_1^2}{\rho^2}}},
\]
with $\alpha_0 = \frac{R_{\max}\alpha_2}{1-\alpha_1}$. Set $\rho=\rho(\epsilon)=1-\epsilon(1-\alpha_1)$. It is easy to check that $\alpha_1<\rho(\epsilon)<1$ for $\epsilon\in(0,1)$. Moreover, $R_{\max}=R(T)$; therefore
\[
\sqrt{t_0} \gtrsim \frac{R_{\max}\,\alpha_2}{\epsilon(1-\alpha_1)^{3/2}\sqrt{1-\epsilon}}.
\]
The value $K(\rho)$ might be defined by $\rho^{K} \le \frac{\alpha_2}{\alpha_1}\,\frac{R}{\sqrt{t_0+K}}$; thus (using $|\ln(1-x)|\ge x$ for $x\in(0,1)$)
\[
K \ge \frac1{\epsilon(1-\alpha_1)}\ln\Big(\frac{\alpha_1}{\alpha_2}\sqrt T\Big).
\]
Put $\epsilon=1/2$.

Proof of Lemma 5
Using the triangle inequality,
\[
\sqrt{l(\widehat V^\circ(t),V)} \le \sqrt{l(\widehat\Upsilon^\circ(t),V)} + \sqrt{l(\widehat\Upsilon^\circ(t),\widehat V^\circ(t))}. \tag{32}
\]
We bound the first term as $l^{1/2}(\widehat\Upsilon^\circ(t),V) \le \tan\phi_d(\widehat\Sigma^\circ(t)\widehat V^\circ(t-1),V)$. Using the variational definition of $\tan\phi_d$,
\[
l^{1/2}(\widehat\Upsilon^\circ(t),V) \le \max_{\|x\|=1}\frac{\|\bar V^\top\widehat\Sigma^\circ(t)\widehat V^\circ(t-1)x\|}{\|V^\top\widehat\Sigma^\circ(t)\widehat V^\circ(t-1)x\|}.
\]
The right hand side may be bounded with
\[
\max_{\|x\|=1}\frac{\|\bar V^\top\Sigma\widehat V^\circ(t-1)x\|+\|\bar V^\top(\widehat\Sigma^\circ(t)-\Sigma)\widehat V^\circ(t-1)x\|}{\|V^\top\Sigma\widehat V^\circ(t-1)x\|-\|V^\top(\widehat\Sigma^\circ(t)-\Sigma)\widehat V^\circ(t-1)x\|}.
\]
The triangle inequality gives
\[
\|(\widehat\Sigma^\circ(t)-\Sigma)\bar V\| \le \|(\widehat\Sigma^\circ(t)-\Sigma^\circ(t))\bar V\| + \|(\Sigma^\circ(t)-\Sigma)\bar V\|.
\]
Note that
\[
\widehat\Sigma^\circ(t)-\Sigma^\circ(t) = \begin{bmatrix}\widehat\Sigma_{S(t)}-\Sigma_{S(t)} & 0\\ 0 & 0\end{bmatrix}, \tag{33}
\]
\[
\Sigma^\circ(t)-\Sigma = \begin{bmatrix}0 & -V_{S(t)}\Lambda_dV_{N(t)}^\top\\ -V_{N(t)}\Lambda_dV_{S(t)}^\top & -V_{N(t)}\Lambda_dV_{N(t)}^\top\end{bmatrix}, \tag{34}
\]
where $V_{S(t)}$ is a submatrix of $V$ with the row indices in $S(t)$. Decompose $\widehat\Sigma_{S(t)}-\Sigma_{S(t)}$ using (27) and (33):
\[
\|(\widehat\Sigma^\circ(t)-\Sigma^\circ(t))\bar V\| \le \lambda_1\|\bar V_{S(t)}^\top V_{S(t)}\|\Big\|\frac1tU(t)^\top U(t)-I_d\Big\| + \Big\|\frac1t\Xi_S(t)\Xi_S^\top(t)-I_{S(t)}\Big\| + 2\sqrt{\lambda_1}\Big\|\frac1tU(t)^\top\Xi_S^\top(t)\Big\|,
\]
where $\Xi_S(t)$ is a $t\times\mathrm{card}(S(t))$ matrix and $U(t)$ is a $t\times d$ matrix; the elements of both matrices are i.i.d. $\mathcal N(0,1)$. Since $\bar V^\top V = \bar V_{S(t)}^\top V_{S(t)} + \bar V_{N(t)}^\top V_{N(t)} = 0$, we may bound
\[
\|\bar V_{S(t)}^\top V_{S(t)}\| \le \|\bar V_{N(t)}^\top V_{N(t)}\| \le \|V_{N(t)}\| \le \|V_{N(t)}\|_F,
\]
where $\|\cdot\|_F$ is the Frobenius norm, i.e. $\|A\|_F = \sqrt{\mathrm{tr}(A^\top A)}$ for any matrix $A$. The quantity $\|V_{N(t)}\|_F$ is small since it depends only on the components of the eigenvectors below the corresponding thresholds (see Lemma 13 and definition (13)):
\[
\|V_{N(t)}\|_F^2 = \sum_{i=1}^d\|v_{i,N(t)}\|^2 \le \sum_{j=1}^d\Bigg[\frac{2^{-r}s_j^r\,t^{r/2}}{(bh_j)^r[\log(n\vee t)]^{r/2}}\wedge n\Bigg]\,b^2h_j\frac{\log(n\vee t)}{t} \le CM(t)h_d\frac{\log(n\vee t)}{t},
\]
where $C$ depends on $d$ and $r$. From Lemmas 10, 11 and 12 (see Appendix B), with probability $1-C(n\vee t)^{-1}$ one can bound
\[
\|(\widehat\Sigma^\circ(t)-\Sigma^\circ(t))\bar V\| \le C_1\lambda_1h_d\frac{M^{1/2}(t)}{\sqrt t} + C_2(\sqrt{\lambda_1}\vee1)\frac{\sqrt{\log(n\vee t)}}{\sqrt t}.
\]
From (34), $\|\bar V^\top(\Sigma^\circ(t)-\Sigma)\| \le \lambda_1\|V_{N(t)}\|$. Thus
\[
\|(\widehat\Sigma^\circ(t)-\Sigma)\bar V\| \le C_1\lambda_1h_dM^{1/2}(t)\sqrt{\frac{\log(n\vee t)}{t}} + C_2(\sqrt{\lambda_1}\vee1)\sqrt{\frac{\log(n\vee t)}{t}}, \tag{35}
\]
where $C_1$ depends on $r$, $d$ and $C_2$ is a constant. Similarly, from (34), $\|V^\top(\Sigma^\circ(t)-\Sigma)\| \le \lambda_1\|V_{N(t)}\|(1+o(1))$ and
\[
\|V^\top(\widehat\Sigma^\circ(t)-\Sigma)\| \le C_1\lambda_1h_dM^{1/2}(t)\sqrt{\frac{\log(n\vee t)}{t}} + C_2(\sqrt{\lambda_1}\vee1)\sqrt{\frac{\log(n\vee t)}{t}}, \tag{36}
\]
where $C_1$ depends on $r$, $d$ and $C_2$ is a constant. The bound on $l(\widehat\Upsilon^\circ(t),\widehat V^\circ(t)) = l(\widehat\Upsilon^\circ(t),\widehat\Upsilon^{\circ,\beta}(t))$ relies on Wedin's $\sin\theta$ Theorem 3 (see Appendix B):
\[
l(\widehat\Upsilon^\circ(t),\widehat\Upsilon^{\circ,\beta}(t)) \le \frac{\|\widehat\Sigma^\circ(t)\widehat V^\circ(t-1)-\widehat\Sigma^{\circ,\beta}(t)\|}{\lambda_d\big(\widehat\Sigma^\circ(t)\widehat V^\circ(t-1)\big)}. \tag{37}
\]
Note that $\|\widehat\Sigma^\circ(t)\widehat V^\circ(t-1)-\widehat\Upsilon^{\circ,\beta}(t)\| \le \|Z(t)\|_F$, where $Z(t)$ is a matrix with the entries $Z_{ij}(t) = \beta_j(t)$ if $i\in S(t)$ and $Z_{ij}(t) = 0$ if $i\in N(t)$. Thus $\|\widehat\Sigma^\circ(t)\widehat V^\circ(t-1)-\widehat\Upsilon^{\circ,\beta}(t)\| \le CM^{1/2}(t)\sum_{i=1}^d\beta_i(t)$, and from (12)
\[
\sum_{i=1}^d\beta_i(t) \le a\sqrt{\frac{\log(n\vee t)}{t}}\sum_{i=1}^d\sqrt{\lambda_i+1} \le da\lambda_d\sqrt{h_d}\sqrt{\frac{\log(n\vee t)}{t}}.
\]
That is,
\[
\|\widehat\Sigma^\circ(t)\widehat V^\circ(t-1)-\widehat\Upsilon^{\circ,\beta}(t)\| \le C'M^{1/2}(t)\lambda_d\sqrt{h_d}\sqrt{\frac{\log(n\vee t)}{t}},
\]
where $C'$ depends on $d$, $a$ and $r$. To bound the denominator of (37), note that one may decompose $\|\Sigma\widehat V^\circ(t-1)x\|^2 = \|\Sigma z_1\|^2+\|\Sigma z_2\|^2$, where $\widehat V^\circ(t-1)x = z_1+z_2$ with $z_1\in\mathrm{ran}(V)$ and $z_2\in\mathrm{ran}(\bar V)$. Thus $\|\Sigma\widehat V^\circ(t-1)x\| \ge \|\Sigma z_1\|$. Using $z_1\in\mathrm{ran}(V)$ one has $\|\Sigma z_1\| \ge (\lambda_d+1)\|z_1\| \ge (\lambda_d+1)\cos\phi_d(V,\widehat V^\circ(t-1))$ and
\[
\lambda_d\big(\widehat\Sigma^\circ(t)\widehat V^\circ(t-1)\big) \ge (\lambda_d+1)\cos\phi_d(V,\widehat V^\circ(t-1)) - \|\widehat\Sigma^\circ(t)-\Sigma^\circ(t)\| - \lambda_1\|V_{N(t)}\|.
\]
Therefore
\[
l^{1/2}(\widehat\Upsilon^\circ(t),\widehat V^\circ(t)) \le \frac{\lambda_dC'M^{1/2}(t)\sqrt{h_d}\sqrt{\frac{\log(n\vee t)}{t}}}{(\lambda_d+1)\cos\phi_d(V,\widehat V^\circ(t-1)) - E^\circ(t)},
\]
where $E^\circ(t) = \big(C_1\lambda_1h_dM^{1/2}(t)+C_2(\sqrt{\lambda_1}\vee1)\big)\sqrt{\frac{\log(n\vee t)}{t}}$. Combining the above inequality, (35), (36), (32) and the spectral gap condition (7), we get the result in the flavour of (1), that is, with probability $1-C(n\vee t)^{-1}$ for one step of the SCPAST algorithm. To get the bounds for $u=t_0+1,\dots,t$ simultaneously, similarly to Lemma 2 we define the events, each of which occurs with probability $1-C(n\vee t)^{-1}$: $\sqrt u\,\big\|\frac1uU(u)^\top U(u)-I_d\big\|$ is bounded from above by $2(\sqrt d+p\sqrt{\log(n\vee t)})$; $\sqrt u\,\big\|\frac1u\Xi_S(u)\Xi_S^\top(u)-I_{S(u)}\big\|$ by $2(\sqrt{\mathrm{card}(S(t))}+p\sqrt{\log(n\vee t)})$; and $\sqrt u\,\big\|\frac1uU(u)^\top\Xi_S^\top(u)\big\|$ by
\[
\frac{\sqrt{p\log(n\vee t)}}{\sqrt u}\big(\sqrt{\mathrm{card}(S(t))}+\sqrt d+p\sqrt{\log(n\vee t)}\big).
\]
Taking the intersection of the above events for $u=t_0+1,\dots,t$ and using Lemma 12, we get the statement of the lemma.

Proof of Lemma 6
Using the Wedin $\sin\theta$ Theorem 3 (Appendix B),
\[
l(V,\widehat V^\circ(t)) \le \frac{\|V^\top(\Sigma-\widehat\Sigma^\circ(t))\|}{\lambda_d-\lambda_{d+1}(\widehat\Sigma^\circ(t))}. \tag{38}
\]
Using Weyl's theorem [17] it may be shown that $\lambda_{d+1}(\widehat\Sigma^\circ(t)) = \lambda_{d+1}+o(\lambda_d)$, and thus $|\lambda_d-\lambda_{d+1}(\widehat\Sigma^\circ(t))| \ge \lambda_d(1+o(1))$. From (36), with probability $1-(n\vee T)^{-1}$,
\[
\|(\widehat\Sigma^\circ(t_0)-\Sigma)V\| \le C_1\lambda_1h_dM^{1/2}(t_0)\sqrt{\frac{\log(n\vee t_0)}{t_0}} + C_2(\sqrt{\lambda_1}\vee1)\sqrt{\frac{\log(n\vee T)}{t_0}}.
\]
Thus $r(t_0)\le\alpha_0/\sqrt{t_0}$ holds with probability $1-(n\vee T)^{-1}$, where
\[
\alpha_0 = \Bigg(\frac{C_1\lambda_1h_dM^{1/2}(t_0)+C_2\sqrt{\lambda_1+1}}{\lambda_d}\Bigg)\sqrt{\log(n\vee T)}.
\]

Proof of Lemma 7

The proof follows from Lemma 4 applied to (25) with $\alpha_1 = \frac{\lambda_{d+1}+1}{\lambda_d+1}$, $\alpha_2 = \frac{\lambda_1+1}{\lambda_d+1}$, and initial conditions given by Lemma 6.

Proof of Lemma 8
Following [12], define $\eta_j = \sum_{i=1}^d\lambda_iv_{ji}^2$, $j=1,\dots,n$, and for $0<a_-<1$
\[
G_+ = \Big\{j:\ \eta_j > a_-\gamma\sqrt{\tfrac{\log(n\vee t)}{t}}\Big\}.
\]
To show that $\widehat V^\circ(t) = \widehat V(t)$ one has to prove that, for a proper choice of $\gamma$ and $a_-$, it holds that $G\subseteq G_+\subseteq S(t)$ with probability $1-C(n\vee T)^{-1}$. We first note that $\widehat\Sigma_{jj}(t) \sim (1+\sum_{i=1}^d\lambda_iv_{ji}^2)\,\xi/t$, where $\xi$ is a $\chi^2_t$ random variable. Therefore
\begin{align*}
P(G\not\subset G_+) &= P\Big(\bigcup_{j\notin G_+}\Big\{\widehat\Sigma_{jj}(t)>1+\gamma\sqrt{\tfrac{\log(n\vee t)}{t}}\Big\}\Big) \le \sum_{j\notin G_+}P\Big(\widehat\Sigma_{jj}(t)>1+\gamma\sqrt{\tfrac{\log(n\vee t)}{t}}\Big)\\
&\le n\,P\Bigg(\frac\xi t-1>\frac{\gamma(1-a_-)\sqrt{\tfrac{\log(n\vee t)}{t}}}{1+a_-\gamma\sqrt{\tfrac{\log(n\vee t)}{t}}}\Bigg) \le \frac{\sqrt n}{\gamma}\exp\Bigg(-\frac{\gamma^2(1-a_-)^2\log(n\vee t)}{4\Big(1+a_-\gamma\sqrt{\tfrac{\log(n\vee t)}{t}}\Big)^2}\Bigg)\\
&\le \frac{\sqrt n}{\gamma}\,(n\vee t)^{-(\gamma^2(1-a_-)^2/4)(1+o(1))}.
\end{align*}
Thus $G\subset G_+$ holds with probability $1-C(n\vee T)^{-1}$, e.g. for $a_-$ bounded away from one and $\gamma\gtrsim\sqrt{\log(n\vee T)/\log(n\vee t)}$. Note that for any $j\in G_+$ there exists $i\in\{1,\dots,d\}$ with $\lambda_iv_{ji}^2 \ge \frac{a_-\gamma}{d}\sqrt{\tfrac{\log(n\vee t)}{t}}$; thus for $G_+\subset S(t)$ to hold it is sufficient that
\[
\frac{a_-\gamma}{d}\,\lambda_i\sqrt{\frac{\log(n\vee T)}{t}} > b^2\,\frac{\lambda_i+1}{\lambda_i}\,\frac{\log(n\vee t)}{t}.
\]
Hence for all $t\le T$, $G\cap S(t) = G$, that is, $\widehat V^\circ(t) = \widehat V(t)$ with probability $1-C(n\vee T)^{-1}$.

Proof of Lemma 9
From Lemma 8, with probability $1-(n\vee T)^{-1}$ the results of the original and the oracle versions of the zero-step estimation procedure coincide, that is, $\widehat V^\circ(t_0) = \widehat V(t_0)$. First let us show that a similar statement holds for $\widehat V^\circ(t_0+1)$ and $\widehat V(t_0+1)$. Denote $t_1 = t_0+1$. On the event on which $\widehat V^\circ(t_0) = \widehat V(t_0)$ holds, it is true that
\[
\widehat\Upsilon(t_1) = \widehat\Sigma(t_1)\widehat V(t_0) = \widehat\Sigma(t_1)\widehat V^\circ(t_0).
\]
From the construction of $\widehat V^\circ(t_0)$, the submatrix $\widehat V^\circ_{N(t_0)}$ has zero entries. Note that $S(t_0)\subseteq S(t_1)$ and $N(t_1)\subseteq N(t_0)$. Thus
\[
\widehat\Upsilon(t_1)_{k,l} = \widehat\Sigma_{k,S(t_0)}\,\widehat v^\circ_{l,S(t_0)}, \tag{39}
\]
where $\widehat v^\circ_{l,S(t_0)}$ is a vector of size $\mathrm{card}(S(t_0))$ containing the components of $\widehat v^\circ_l(t_0)$ indexed by $S(t_0)$, and $\widehat\Sigma_{k,S(t_0)}$ is a row containing the components of the $k$-th row of $\widehat\Sigma(t_1)$ indexed by $S(t_0)$. Let us show that for $k\in N(t_1)$, with high probability,
\[
\widehat\Upsilon(t_1)_{k,l} \le \beta_l(t_1) = a\sqrt{\frac{(\lambda_l+1)\log(n\vee t_1)}{t_1}},
\]
that is, during the thresholding step the components from $N(t_1)$ would be set to zero with high probability. From (16),
\[
t_1\widehat\Sigma_{k,S(t_0)} = V_k\Lambda_d^{1/2}U(t_1)^\top U(t_1)\Lambda_d^{1/2}V_{S(t_0)}^\top + \Xi_k(t_1)\Xi_{S(t_0)}^\top(t_1) + V_k\Lambda_d^{1/2}U(t_1)^\top\Xi_{S(t_0)}^\top(t_1) + \Xi_k(t_1)U(t_1)\Lambda_d^{1/2}V_{S(t_0)}^\top, \tag{40}
\]
where $\Xi_k(t_1)$ is the $k$-th row of $\Xi(t_1)$. Denote by $V^\circ_{S(t_0)}$ a matrix containing the first $d$ eigenvectors of $\Sigma_{S(t_0)}$ as columns (recall (24)) and by $\bar V^\circ_{S(t_0)}$ a matrix with $\mathrm{card}(S(t_0))-d$ columns which complete the columns of $V^\circ_{S(t_0)}$ to an orthonormal basis in $\mathbb R^{\mathrm{card}(S(t_0))}$. Note that
\[
\bar V^\circ_{S(t_0)}\bar V^{\circ,\top}_{S(t_0)} + V^\circ_{S(t_0)}V^{\circ,\top}_{S(t_0)} = I_{S(t_0)}.
\]
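The sets $S(\cdot)$ and $N(\cdot)$ used throughout this proof come from diagonal screening of the empirical covariance. As a standalone numerical illustration (hypothetical sizes and a hand-picked threshold of the form $1+\gamma\sqrt{\log(n\vee t)/t}$, in the spirit of the proof of Lemma 8, not the paper's exact constants), thresholding the diagonal recovers the support of a sparse spike:

```python
import numpy as np

rng = np.random.default_rng(3)
n, t, d, lam = 200, 500, 1, 10.0

# sparse leading eigenvector supported on the first 10 coordinates
v = np.zeros(n)
v[:10] = 1.0 / np.sqrt(10)

# spiked model: x_i = sqrt(lam) * u_i * v^T + xi_i
U = rng.standard_normal((t, d))
Xi = rng.standard_normal((t, n))
X = np.sqrt(lam) * U @ v[None, :] + Xi

# diagonal of the empirical covariance; signal coordinates have mean 1 + lam * v_j^2
diag = (X ** 2).mean(axis=0)
gamma = 3.0
threshold = 1.0 + gamma * np.sqrt(np.log(max(n, t)) / t)
S = np.flatnonzero(diag > threshold)
print(S)
```

With these margins (signal diagonal mean $2.0$ against a threshold near $1.33$) the selected set coincides with the true support for essentially every draw.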
Inserting this identity (applied to $V^\top_{S(t_0)}$) in view of (40), one gets
\[
\widehat\Upsilon(t_1)_{k,l} = q_{11}+q_{12}+q_{13}+q_{14}+q_{21}+q_{22}+q_{23}+q_{24},
\]
where the terms $q_{1i}$ listed below depend on $\bar V^\circ_{S(t_0)}$ and the terms $q_{2i}$ depend on $V^\circ_{S(t_0)}$. Let us first bound the terms $q_{11}$ and $q_{21}$. To this end, we add and subtract $V_k\Lambda_dV^\top_{S(t_0)}$ in the first term in (40) and use that $\big\|\frac1{t_1}U(t_1)^\top U(t_1)-I_d\big\| = o(1)$; thus
\[
|q_{11}| \le (1+o(1))\,\|V_k\Lambda_d\|\,\|\bar V^{\circ,\top}_{S(t_0)}\widehat v^\circ_{l,S(t_0)}\|, \tag{41}
\]
where it was also used that $\|V^\top_{S(t_0)}\bar V^\circ_{S(t_0)}\|\le1$. Recall that $k\in N(t_1)$, that is, $|v_{ki}| \le b\sqrt{h_i\frac{\log(n\vee t_1)}{t_1}}$. Using the definition of $\beta_k(t)$ (recall (12)),
\[
\|V_k\Lambda_d\| \le \frac ba\,\beta_k(t_1)\sqrt{\sum_{i=1}^d\frac{\lambda_i+1}{\lambda_k+1}}, \tag{42}
\]
\[
\|V_k\Lambda_d^{1/2}\| \le \frac ba\,\beta_k(t_1)\sqrt{\sum_{i=1}^d\frac{\lambda_i+1}{(\lambda_k+1)\lambda_i}}. \tag{43}
\]
Thus, using (42) and (43), the term (41) may be bounded as
\[
|q_{11}| \le (1+o(1))\,\frac ba\,\beta_k(t_1)\,\|\bar V^{\circ,\top}_{S(t_0)}\widehat v^\circ_{l,S(t_0)}\|\sqrt{\sum_{i=1}^d\frac{\lambda_i+1}{\lambda_l+1}},
\]
and in the same way it can be shown that
\[
|q_{21}| \le (1+o(1))\,\frac ba\,\beta_k(t_1)\,\|V^{\circ,\top}_{S(t_0)}\widehat v^\circ_{l,S(t_0)}\|\sqrt{\sum_{i=1}^d\frac{\lambda_i+1}{\lambda_l+1}}.
\]
Next,
\[
|q_{12}| = \frac1{t_1}\big|\Xi_k(t_1)\Xi^\top_{S(t_0)}(t_1)\bar V^\circ_{S(t_0)}\bar V^{\circ,\top}_{S(t_0)}\widehat v^\circ_{l,S(t_0)}\big| \le \frac1{t_1}\,|\zeta(k,l,S(t_0))|\,\|\Xi^\top_{S(t_0)}(t_1)\|\,\|\bar V^{\circ,\top}_{S(t_0)}\widehat v^\circ_{l,S(t_0)}\|,
\]
where
\[
\zeta(k,l,S(t_0)) = \frac{t_1\,q_{12}}{\|\Xi^\top_{S(t_0)}(t_1)\bar V^\circ_{S(t_0)}\bar V^{\circ,\top}_{S(t_0)}\widehat v^\circ_{l,S(t_0)}\|}.
\]
Note that $\widehat v^\circ_{l,S(t_0)}$ is the $l$-th eigenvector of $\widehat\Sigma_{S(t_0)}$ and doesn't depend on $\Xi_k(t_1)$, $k\in N(t_1)$; and since $N(t_1)\subseteq N(t_0)$, $\Xi_k(t_1)$ is independent from $\Xi_{S(t_0)}(t_1)$; thus $\zeta(k,l,S(t_0))$ has the $\mathcal N(0,1)$ distribution. Define the events
\[
\{|\zeta(k,l,S(t_0))|\le\sqrt{c\log(n\vee t_1)}\}\quad\text{and}\quad\{\|\Xi_{S(t_0)}(t_1)\| \le \sqrt{t_1}+\sqrt{\mathrm{card}(S(t_0))}+2\sqrt{\log(n\vee t_1)}\}.
\]
For big enough $t_1$ (guaranteed by (14)), $t_1$ dominates $\mathrm{card}(S(t_0))$ and $\log(n\vee t_1)$. Thus
\[
|q_{12}| \le \frac1{t_1}\,|\zeta(k,l,S(t_0))|\,\|\Xi^\top_{S(t_0)}(t_1)\|\,\|\bar V^{\circ,\top}_{S(t_0)}\widehat v^\circ_{l,S(t_0)}\| \le \frac1a\Big(\frac c{\lambda_l+1}\Big)^{1/2}\beta_l(t_1)\sqrt{\frac{\log(n\vee t_0)}{\log(n\vee t_1)}}\,\|\bar V^{\circ,\top}_{S(t_0)}\widehat v^\circ_{l,S(t_0)}\|.
\]
On the events defined in the end of the proof of Lemma 5, the bound for the term $q_{13}$ is as follows:
\[
|q_{13}| \le \frac1{t_1}\,\|V_k\Lambda_d^{1/2}\|\,\|U(t_1)^\top\Xi^\top_{S(t_0)}(t_1)\|\,\|\bar V^{\circ,\top}_{S(t_0)}\widehat v^\circ_{l,S(t_0)}\| \le \frac ba\,\frac{\mathrm{card}(S(t_0))\sqrt{\lambda_d}}{t_1}\,\sqrt{\frac{\log(n\vee t_0)}{\log(n\vee t_1)}}\,\beta_l(t_1)\,\|\bar V^{\circ,\top}_{S(t_0)}\widehat v^\circ_{l,S(t_0)}\|.
\]
From $\frac{\lambda_d\,\mathrm{card}(S(t_0))}{t_1} = o(1)$ (see the supplementary materials for [5], p. 16) it follows that $|q_{13}| = o(\beta_l(t_1))$. To bound the term $q_{14}$ one may utilize the same argument as for $q_{12}$:
\[
|q_{14}| \le \frac1{t_1}\,|g(k,l,S(t_0))|\,\|U(t_1)\Lambda_d^{1/2}\|\,\|\bar V^{\circ,\top}_{S(t_0)}\widehat v^\circ_{l,S(t_0)}\|,
\]
where
\[
g(k,l,S(t_0)) = \frac{t_1\,q_{14}}{\|U(t_1)\Lambda_d^{1/2}V^\top_{S(t_0)}\bar V^\circ_{S(t_0)}\bar V^{\circ,\top}_{S(t_0)}\widehat v^\circ_{l,S(t_0)}\|}
\]
is independent from $U(t_1)$ and $\widehat v^\circ_{l,S(t_0)}$; furthermore, $g(k,l,S(t_0))$ has the $\mathcal N(0,1)$ distribution. Define the events $\{\|U(t_1)\|\le\sqrt{t_1}+\sqrt d+2\sqrt{\log(n\vee t_1)}\}$ and $\{|g(k,l,S(t_0))|\le\sqrt{c\log(n\vee t_1)}\}$. On these events, with probability $1-C(n\vee t_1)^{-1}$,
\[
|q_{14}| \le \frac1{t_1}\,|g(k,l,S(t_0))|\,\|U(t_1)\|\,\|\Lambda_d^{1/2}\|\,\|\bar V^{\circ,\top}_{S(t_0)}\widehat v^\circ_{l,S(t_0)}\| \le \frac1a\Big(\frac{c\lambda_1}{\lambda_l+1}\Big)^{1/2}\|\bar V^{\circ,\top}_{S(t_0)}\widehat v^\circ_{l,S(t_0)}\|\,\beta_l(t_1)\sqrt{\frac{\log(n\vee t_0)}{\log(n\vee t_1)}}.
\]
In a similar way the term $q_{22}$ may be bounded as follows:
\[
|q_{22}| \le \Big\|\frac1{t_1}\Xi_k(t_1)\Xi^\top_{S(t_0)}(t_1)V^\circ_{S(t_0)}\Big\|\,\|V^{\circ,\top}_{S(t_0)}\widehat v^\circ_{l,S(t_0)}\| \le \frac ba(1+o(1))\sqrt{\frac c{\lambda_l+1}}\,\beta_l(t_1)\sqrt{\frac{\log(n\vee t_0)}{\log(n\vee t_1)}}\,\|V^{\circ,\top}_{S(t_0)}\widehat v^\circ_{l,S(t_0)}\|.
\]
The bound on the term $q_{23}$ due to (43) reads as
\[
|q_{23}| = \frac1{t_1}\,\|V_k\Lambda_d^{1/2}\|\,\|U(t_1)^\top\Xi^\top_{S(t_0)}(t_1)V^\circ_{S(t_0)}\|\,\|V^{\circ,\top}_{S(t_0)}\widehat v^\circ_{l,S(t_0)}\| \le \sqrt{\frac{\log(n\vee t_1)}{t_1}}\,\|V_k\Lambda_d^{1/2}\|\,\|V^{\circ,\top}_{S(t_0)}\widehat v^\circ_{l,S(t_0)}\| = o(\beta_l(t_1)).
\]
Similarly to the case of $q_{13}$, one can show that
\[
|q_{24}| \le \Big\|\frac1{t_1}\Xi_k(t_1)U(t_1)\Big\|\,\|\Lambda_d^{1/2}V^\top_{S(t_0)}V^\circ_{S(t_0)}\|\,\|V^{\circ,\top}_{S(t_0)}\widehat v^\circ_{l,S(t_0)}\| \le \sqrt{\frac{c\lambda_1}{\lambda_l+1}}\,\frac1a\,\beta_l(t_1)\sqrt{\frac{\log(n\vee t_0)}{\log(n\vee t_1)}}\,\|V^{\circ,\top}_{S(t_0)}\widehat v^\circ_{l,S(t_0)}\|.
\]
Note that (see [5]) $\|V^{\circ,\top}_{S(t_0)}\widehat v^\circ_{l,S(t_0)}\| = 1+o(1)$ and $\|\bar V^{\circ,\top}_{S(t_0)}\widehat v^\circ_{l,S(t_0)}\| = o(1)$, that is, from the above bounds, $\sum_{i=1}^4|q_{1i}| = o\big(\sum_{i=1}^4|q_{2i}|\big)$.
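All the events above are instances of Gaussian operator-norm concentration, with Lemma 10 of Appendix B as the workhorse. A quick Monte Carlo sketch (hypothetical sizes $t=2000$, $n=50$; `E_bound` implements the envelope $E(t,n,p)$ in the form stated in Lemma 10) checks the bound for one draw:

```python
import numpy as np

# E(t, n, p) as in Lemma 10 (Appendix B), with delta_{n,t} = log(n v t)
def E_bound(t, n, p):
    delta = np.log(max(n, t))
    a = np.sqrt(n / t) + p * np.sqrt(delta) / np.sqrt(t)
    return 3.0 * max(a, a ** 2)

rng = np.random.default_rng(1)
t, n, p = 2000, 50, np.sqrt(6)
X = rng.standard_normal((t, n))

# spectral-norm deviation of the sample Gram matrix from the identity
dev = np.linalg.norm(X.T @ X / t - np.eye(n), 2)
print(dev, E_bound(t, n, p))
```

The observed deviation is of order $2\sqrt{n/t}$ (the Bai-Yin scale), comfortably below the envelope $E(t,n,p)$.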
Therefore
\[
\widehat\Upsilon(t_1)_{k,l} \le \frac ba\,\beta_l(t_1)\sqrt{\sum_{j=1}^d\frac{\lambda_j+1}{\lambda_l+1}} + \frac1a\,\beta_l(t_1)\sqrt{\frac{c\log(n\vee t_0)}{\log(n\vee t_1)}}\sqrt{\frac{\lambda_1+1}{\lambda_l+1}}.
\]
Note that $\sqrt{\sum_{j=1}^d\frac{\lambda_j+1}{\lambda_l+1}} \le \sqrt\tau\sqrt d$, since $\lambda_1/\lambda_j\le\tau$. Bounding $\log(n\vee t_0)$ by $\log(n\vee T)$, it suffices to take $a \ge \sqrt{c\log(n\vee T)/\log(n\vee t_0)}$ and $b$ proportional to $\big(a-\sqrt{c\log(n\vee T)/\log(n\vee t_0)}\big)\big/\big(a\sqrt\tau\sqrt d\big)$. Therefore one gets $|\widehat\Upsilon(t_1)_{k,l}| \le \beta_l(t_1)$ for all $k\in N(t_1)$ and $\widehat V^\circ_N(t_0+1) = 0$, and so $\widehat V^\circ_N(t_0+1) = \widehat V_N(t_0+1)$. To show that $\widehat V^\circ_N(u) = \widehat V_N(u)$, $u = t_0+2,\dots,t$, we consider the events defined for the standard normal random variables $\zeta(k,l,S(u))$ and $g(k,l,S(u))$:
\[
\{|\zeta(k,l,S(u))|\le\sqrt{c\log(n\vee t)}\},\qquad\{|g(k,l,S(u))|\le\sqrt{c\log(n\vee t)}\}.
\]
Using the union bound,
\begin{align*}
P\Big(\bigcap_{l\in N(u),\,k=1,\dots,d,\,u=t_0+1,\dots,t}\big\{|\zeta(k,l,S(u))|\le\sqrt{c\log(n\vee t)}\big\}\Big) &\ge 1-\sum_{k=1}^d\sum_{u=t_0+1}^t\sum_{l\in N(u)}P\big\{|\zeta(k,l,S(u))|>\sqrt{c\log(n\vee t)}\big\}\\
&\ge 1-nd\,(t-t_0)\,P\big\{|\zeta(k,l,S(t))|>\sqrt{c\log(n\vee t)}\big\}\\
&\ge 1-\frac{Cn(t-t_0)}{\sqrt{\log(n\vee t)}}\,(n\vee t)^{-c/2}.
\end{align*}
Take $c$ big enough.

Appendix B. Concentration of the spectral norm of the perturbation

Denote $\delta_{n,t} = \log(n\vee t)$.

Lemma 10 ([18]). Let $X$ be a $t\times n$ matrix with i.i.d. $\mathcal N(0,1)$ entries. The following result holds true:
\[
P\Big(\Big\|\frac1tX^\top X-I_n\Big\| \ge E(t,n,p)\Big) \le (n\vee t)^{-p^2/2},
\]
where
\[
E(t,n,p) = 3\max\Bigg(\sqrt{\frac nt}+p\,\frac{\sqrt{\delta_{n,t}}}{\sqrt t},\ \Bigg[\sqrt{\frac nt}+p\,\frac{\sqrt{\delta_{n,t}}}{\sqrt t}\Bigg]^2\Bigg).
\]

Lemma 11 ([19]). Let $X$ and $Y$ be $t\times q$ and $t\times m$ matrices, $q>m$, with i.i.d.
$\mathcal N(0,1)$ entries. Then for any $0<x<1$ and $c>0$
\[
P\big(\|X^\top Y\| \ge t\,E(t,q,m,x,c)\big) \le e^{-c\delta_{n,t}} + qe^{-x\delta_{n,t}},
\]
where
\[
E(t,q,m,x,c) = \sqrt{\frac{x\,\delta_{n,t}}{t}}\Bigg(\sqrt{\frac qt}+\sqrt{\frac mt}+c\,\frac{\sqrt{\delta_{n,t}}}{\sqrt t}\Bigg).
\]

Lemma 12 ([5]). There exist constants $\tilde C$ and $\bar C$ depending on $r$ such that
\[
E(t,\mathrm{card}(S(t)),p) \le \tilde C\,\frac{\lambda_1M^{1/2}(t)}{\sqrt t}\,h_d + \tilde C\sqrt{\frac{\delta_{n,t}}{t}},
\]
\[
E(t,\mathrm{card}(S(t)),d,x,p) \le \bar C\,\frac{M^{1/2}(t)}{\sqrt t}\,h_d + \bar C\sqrt{\frac{\delta_{n,t}}{t}}.
\]

Theorem 3 (Wedin $\sin\theta$). Let $A$ and $B$ be $n\times k$, $n\ge k$, full-column-rank matrices. Let the columns of an $n\times(n-k)$ matrix $U$ form an orthonormal basis of the orthogonal complement of the range of $B$. If $\lambda_{\min}(A)\ge\epsilon>0$, then
\[
l(A,B) \le \frac{\|A^\top U\|}{\epsilon} \le \frac{\|B-A\|}{\epsilon}.
\]

Theorem 4 (Davis $\sin\theta$, [20]). Let $A$ and $B$ be symmetric matrices with the decompositions $A = W_1\Lambda_1W_1^\top + W_2\Lambda_2W_2^\top$ and $B = U_1\Delta_1U_1^\top + U_2\Delta_2U_2^\top$, with the conditions that $[U_1,U_2]$ is orthogonal, $W_1$ is orthonormal and $W_1^\top W_2 = 0$, the eigenvalues of $\Lambda_1$ are contained in the interval $(a_1,a_2)$, and the eigenvalues of $\Delta_2$ lie outside of the interval $(a_1-\epsilon,a_2+\epsilon)$ for some $\epsilon>0$. Then
\[
l(W_1,U_1) \le \frac{\|U_2^\top(B-A)W_1\|}{\epsilon\,\lambda_{\min}(W_1)}.
\]

Lemma 13.
The norms of the subvectors $v_{j,N(t)}$ of $v_j$ satisfy
\[
\|v_{j,N(t)}\|^2 \le \Bigg[2^{-r}s_j^r\,(b\,h_j(t))^{-r}\Big(\frac{\delta_{n,t}}{t}\Big)^{-r/2}\wedge n\Bigg]\,b^2h_j(t)\,\frac{\delta_{n,t}}{t}.
\]

Lemma 14.
A bound on the effective dimension $\mathrm{card}(S(t))$ is given by
\[
d \le \mathrm{card}(S(t)) \le CM(t) = C\Bigg[n\wedge\sum_{j=1}^d s_j^r\,h_j^{-r}\Big(\frac{\delta_{n,t}}{t}\Big)^{-r/2}\Bigg].
\]

Lemma 15.
For a $\chi^2_t$ random variable $\zeta_t$ the following bounds hold [21]:
\begin{align*}
P(\zeta_t > t(1+\varepsilon)) &\le e^{-3t\varepsilon^2/16}, && 0<\varepsilon<1/2,\\
P(\zeta_t < t(1-\varepsilon)) &\le e^{-t\varepsilon^2/4}, && 0<\varepsilon<1,\\
P(\zeta_t > t(1+\varepsilon)) &\le \frac{\sqrt2}{\varepsilon\sqrt t}\,e^{-t\varepsilon^2/4}, && 0<\varepsilon<\sqrt t,\ t>1.
\end{align*}

Acknowledgment
This work is funded by the German Research Foundation (DFG), Collaborative Research Center 823, Subprojects B3 and C5.
References

[1] B. Yang, "Projection approximation subspace tracking," IEEE Transactions on Signal Processing, vol. 43, no. 1, pp. 95-107, 1995.
[2] J.-P. Delmas, "Subspace tracking for signal processing," 2010.
[3] B. Yang, "Convergence analysis of the subspace tracking algorithms PAST and PASTd," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 3, 1996, pp. 1759-1762.
[4] B. Yang, "Asymptotic convergence analysis of the projection approximation subspace tracking algorithms," Signal Processing, vol. 50, no. 1-2, pp. 123-136, 1996.
[5] Z. Ma et al., "Sparse principal component analysis and iterative thresholding," The Annals of Statistics, vol. 41, no. 2, pp. 772-801, 2013.
[6] A. Valizadeh and M. Karimi, "Fast subspace tracking algorithm based on the constrained projection approximation," EURASIP Journal on Advances in Signal Processing, vol. 2009, p. 9, 2009.
[7] G. H. Golub and C. F. Van Loan, Matrix Computations. JHU Press, 2012, vol. 3.
[8] A. Birnbaum, I. M. Johnstone, B. Nadler, and D. Paul, "Minimax bounds for sparse PCA with noisy high-dimensional data," Annals of Statistics, vol. 41, no. 3, p. 1055, 2013.
[9] D. L. Donoho, "Unconditional bases are optimal bases for data compression and for statistical estimation," Applied and Computational Harmonic Analysis, vol. 1, no. 1, pp. 100-115, 1993.
[10] I. M. Johnstone and A. Y. Lu, "Sparse principal components analysis," Unpublished manuscript, vol. 7, 2004.
[11] J. Fan and R. Li, "Variable selection via nonconcave penalized likelihood and its oracle properties," Journal of the American Statistical Association, vol. 96, no. 456, pp. 1348-1360, 2001.
[12] D. Paul and I. M. Johnstone, "Augmented sparse principal component analysis for high dimensional data," arXiv preprint arXiv:1202.1242, 2012.
[13] D. Paul, "Nonparametric estimation of principal components," 2004.
[14] E. Krymova, A. Nagathil, D. Belomestny, and R. Martin, "Segmentation of music signals based on explained variance ratio for applications in spectral complexity reduction," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017, pp. 206-210.
[15] J. C. Brown, "Calculation of a constant Q spectral transform," The Journal of the Acoustical Society of America, vol. 89, no. 1, pp. 425-434, 1991.
[16] M. Hardt and E. Price, "The noisy power method: A meta algorithm with applications," in Advances in Neural Information Processing Systems, 2014, pp. 2861-2869.
[17] G. Stewart and J.-G. Sun, "Matrix perturbation theory (Computer Science and Scientific Computing)," 1990.
[18] R. Vershynin, "Introduction to the non-asymptotic analysis of random matrices," arXiv preprint arXiv:1011.3027, 2010.
[19] K. R. Davidson and S. J. Szarek, "Local operator theory, random matrices and Banach spaces," Handbook of the Geometry of Banach Spaces, vol. 1, pp. 317-366, 2001.
[20] C. Davis and W. M. Kahan, "The rotation of eigenvectors by a perturbation. III," SIAM Journal on Numerical Analysis, vol. 7, no. 1, pp. 1-46, 1970.
[21] I. M. Johnstone, "Chi-square oracle inequalities,"