Identification of Matrices having a Sparse Representation
Götz E. Pfander∗, Holger Rauhut†, Jared Tanner‡

We consider the problem of recovering a matrix from its action on a known vector in the setting where the matrix can be represented efficiently in a known matrix dictionary. Connections with sparse signal recovery allow for the use of efficient reconstruction techniques such as Basis Pursuit. Of particular interest is the dictionary of time-frequency shift matrices and its role for channel estimation and identification in communications engineering. We present recovery results for Basis Pursuit with the time-frequency shift dictionary and various dictionaries of random matrices.
1. INTRODUCTION
Inferring reliable information from limited data is a key task in the sciences. For example, identifying a channel operator from its response to a limited number of test signals is a crucial step in radar and communications engineering [25, 32, 34, 40, 43, 49]. Here we consider the canonical setting where an operator is approximated by a linear map, that is, by a matrix Γ ∈ C^{n×m}. While it is clear that Γ is determined by its action on any m vectors that span C^m, significantly fewer measurements may be sufficient if a-priori information about the operator is at hand. For instance, one commonly considers the question whether a single test signal h, also referred to as an identifier, can be used to identify Γ from Γh. A priori information guaranteeing that such an h exists is generally deduced from physical considerations which may ensure that Γ can be efficiently represented or approximated using relatively few basic matrices from a known matrix dictionary.

In wireless communications ([13, 28, 41] and references within) and sonar [39, 50], for example, the narrowband regime of a transmission channel can generally be well approximated by a linear combination of a small number of time-frequency shift matrices.

∗ School of Engineering and Science, Jacobs University Bremen, 28759 Bremen, Germany, [email protected]
† Numerical Harmonic Analysis Group, Faculty of Mathematics, University of Vienna, Nordbergstrasse 15, A-1090 Vienna, Austria, [email protected]. H.R. is supported by the European Union's Human Potential Programme under contract MEIF-CT 2006-022811.
‡ Department of Mathematics, University of Utah, 155 South 1400 East, Salt Lake City, UT 84112-0090, USA. J.T. would like to thank John E. and Marva M. Warnock for their generous support in the form of an endowed chair. [email protected]
Signals travel from the source to the receiver along a number of different paths, each of which can be modeled by a time shift (delay dependent on the length of the path traveled) and a frequency shift (Doppler effect caused by the motion of the transmitter, of the receiver, and of reflecting objects) [5, 28]. It is frequently assumed that the number of relevant (but unknown) paths, that is, in slightly simplified terms, the number of involved time-frequency shifts, is relatively small when compared to the symbol length. For example, for mobile communications the number of paths required to well approximate a channel in rural areas or typical urban regions does not exceed 10 [41, pages 266, 283]; see also [10, 13]. In wireless communications the benefit of recovering the operator at the receiver is clear: knowledge of the operator is necessary to invert it and to recover the information-carrying channel input from the channel output.

Complexity regularization has recently seen a resurgence of interest in the signal processing community under the monikers sparse signal recovery and sparse approximation. In sparse signal recovery, one seeks the solution of an underdetermined system of equations Ax = b, A ∈ C^{n×N}, n < N, with x having the fewest number of non-zero entries among all solutions of Ax = b. We show in Section 2 that the identification of a matrix from its action on a single test signal falls into the same setting as sparse signal recovery when the matrix is known to have a sparse representation. This observation allows us to adopt efficient algorithms from sparse signal recovery for the sparse matrix identification question. Examples of applications include the channel identification, estimation, or sounding problem described in part above, which has also been considered in the case of time-invariant channels in [11, 14, 30]. Numerical results based on Basis Pursuit have been obtained for time-varying channels in [48].
Further, the application of recovery methods for sparsely represented operators to radar measurements is discussed in [32].

In brief, the content of this paper is organized as follows. In Section 2 we formalize the matrix identification problem for matrices with sparse representations. We establish a connection to the recovery problem of vectors with sparse representations and state the main results, which are proven and discussed in greater detail in Section 4 and Section 5. In particular, we consider matrix ensembles of random Gaussian or Bernoulli matrices as well as partial Fourier matrices (Section 2.1 and Section 4). In Section 2.2 and Section 5 we consider matrix dictionaries of time-frequency shift matrices, which are of particular interest due to their efficacy in approximating time-varying transmission channels. We would like to emphasize that the common framework of the identification problem for matrices with a sparse representation and the sparse signal recovery problem implies that the results achieved on the recovery of matrices with a sparse representation in the dictionary of time-frequency shift matrices are at the same time results for the recovery of signals with a sparse representation in Gabor frames. In Section 6 we briefly discuss the use of several test vectors instead of just one, and comment on how this improves corresponding recovery results. We conclude with numerical experiments in Section 7. They verify our main results concerning sparse representations with time-frequency shift matrices stated in Theorem 2.5, and show that the precise recoverability thresholds follow those proven for Gaussian random matrices in [24]; that is, for matrices having a k-sparse representation we observe Basis Pursuit to successfully recover the matrix from its action on a single vector provided k ≤ n/(2 log n).
2. MAIN RESULTS AND CONTEXT
Before comparing the matrix identification problem with sparse signal recovery, we formalize the notion of a matrix having a k-sparse representation.

Definition 2.1.
A matrix Γ has a k-sparse representation in the matrix dictionary Ψ = {Ψ_j}_{j=1}^N if

Γ = ∑_j x_j Ψ_j with ‖x‖_0 = k,

where ‖x‖_0 counts the number of non-zero entries in x, that is, ‖x‖_0 = |supp x| = #{j : x_j ≠ 0}.

The set of elementary matrices comprising Ψ may form a basis for C^{n×m}, but it may as well only span a subspace of C^{n×m} and/or contain linearly dependent subsets. In Definition 2.1 we place no restrictions on the dictionary Ψ.

Identification of matrices having a sparse representation from their action on a single vector (henceforth referred to simply as sparse matrix identification, which is not to be confused with the notion of sparse matrices in numerical analysis) can be formulated as a sparse signal recovery problem through the simple observation that the action of Γ on a test signal h ∈ C^m can be expressed as

Γh = ( ∑_{j=1}^N x_j Ψ_j ) h = ∑_{j=1}^N x_j (Ψ_j h) = (Ψ_1 h | Ψ_2 h | … | Ψ_N h) x = (Ψh) x,   (1)

where x = (x_1, x_2, …, x_N)^T and (Ψh) = (Ψ_1 h | Ψ_2 h | … | Ψ_N h). In classical sparse signal recovery the sparsest vector x satisfying Ax = b is sought given b and A; to identify the matrix Γ, Γh takes the place of b and the j-th column of A is Ψ_j h for j = 1, 2, …, N. As mentioned above, we note that in case of the Ψ_j being time-frequency shift matrices, the columns of A = (Ψh) form a Gabor system with window h [12, 29, 37]. Consequently, all our identifiability results concerning representations with time-frequency shift matrices are also results for the recovery of signals that are sparse in a Gabor system.

Remark 2.2.
Although sparse matrix identification can be cast as sparse signal recovery, two important differences should be noted.

• In most applications, sparse signal recovery is only of interest for k-sparse vectors with k < n, as the linear dependence of the N columns of A ∈ C^{n×N}, n < N, implies that n-term solutions x of Ax = b are never unique. However, in some cases an n-term solution might be of interest if there is no sparser solution of Ax = b. In contrast, the goal in sparse matrix identification is not to represent b = Γh efficiently, but to recover Γ. The non-uniqueness of n-term solutions to (Ψh)x = Γh implies that there always exist infinitely many n-sparse matrices Γ′ consistent with the observations Γ′h = Γh. As such, the recovery of an n-sparse x in the sparse matrix identification setting does not give any information about the matrix to be identified, Γ.

• In sparse signal recovery the columns of A are used to represent or to approximate b, whereas for sparse matrix identification the matrices Ψ_j are used to represent or approximate Γ. However, unlike sparse signal recovery, where the columns of A appear explicitly in the reconstruction, the Ψ_j do not appear explicitly when sparse matrix identification is cast as sparse signal recovery (1); rather, only the action of Ψ_j on the test vector h is utilized. The test vector h ∈ C^m has no analog in traditional sparse signal recovery, and can be exploited in sparse matrix identification to design desirable characteristics in Ψ_j h. This design freedom is utilized extensively in our main results concerning the matrix dictionary of time-frequency shifts, Theorem 2.5.

Note that the computational difficulty in sparse signal recovery, sparse approximation, and our formulation of sparse matrix identification arises from the fact that the support set of the non-zero entries in x is unknown.
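As a concrete illustration of the identity (1), the following Python sketch (our own toy example; the helper name `action_dictionary` is not from the paper) stacks the actions Ψ_j h as columns of a dictionary matrix and checks that Γh = (Ψh)x:

```python
import numpy as np

def action_dictionary(Psi, h):
    """Stack the actions Psi_j @ h as the columns of A = (Psi h),
    so that Gamma @ h == A @ x whenever Gamma = sum_j x_j Psi_j."""
    return np.column_stack([P @ h for P in Psi])

rng = np.random.default_rng(0)
n, m, N = 8, 8, 20
Psi = [rng.standard_normal((n, m)) for _ in range(N)]  # toy matrix dictionary
h = rng.standard_normal(m)                             # test signal

# a 2-sparse coefficient vector x and the matrix Gamma it represents
x = np.zeros(N)
x[[3, 11]] = [1.5, -0.7]
Gamma = sum(xj * P for xj, P in zip(x, Psi))

A = action_dictionary(Psi, h)
assert np.allclose(Gamma @ h, A @ x)  # identity (1)
```

Recovering x (and hence Γ) from the single observed vector Γh is then exactly the sparse recovery problem Ax = b.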
While the direct solution of finding the sparsest representation of Γ in the dictionary Ψ,

min ‖x′‖_0 subject to (Ψh) x′ = Γh,   (2)

involves a combinatorial search over the support set and is therefore computationally intractable, a number of computationally efficient algorithms have been shown to recover the sparsest solution if appropriate conditions are met. We concentrate here on recoverability conditions for the canonical sparse signal recovery algorithm Basis Pursuit (BP), where the convex problem

min ‖x′‖_1 subject to (Ψh) x′ = Γh,   (3)

with ‖x‖_1 = ∑_j |x_j|, is solved as a proxy for (2). The convex program (3) can be solved efficiently using well established optimization algorithms for second-order cone programming and linear programming [6, 18, 33], for complex and real valued systems, respectively. We give theoretical and numerical evidence for conditions under which the solution of (3) coincides exactly with that of (2). Many other algorithms may also be used as proxies for (2), including Orthogonal Matching Pursuit (OMP) [26, 36, 52], Stagewise Orthogonal Matching Pursuit (StOMP) [16], and an algorithm based upon error correcting codes [2], to name a few. Our principal technical results in Section 5.1 also give results for OMP, but for conciseness we do not state them here, leaving them to the interested reader.

In practice, the measured vector Γh will be contaminated by noise, and, in addition, the operator Γ will not be strictly sparse, but will instead be well approximated by a sparse representation; in this case the minimization problem (3) is replaced by its well known variant

min ‖x′‖_1 subject to ‖(Ψh) x′ − Γh‖_2 ≤ ε,   (4)

where ‖z‖_2 = (∑_j |z_j|²)^{1/2} as usual.
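For real-valued data, the linear-programming route mentioned above can be sketched in a few lines. The following is our own minimal illustration (not code from the paper), using the standard reformulation x = u − v with u, v ≥ 0 and SciPy's `linprog`:

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, b):
    """Solve min ||x||_1 subject to A x = b (problem (3), real case)
    as an LP: minimize 1^T (u + v) with A u - A v = b, u, v >= 0."""
    n, N = A.shape
    c = np.ones(2 * N)
    res = linprog(c, A_eq=np.hstack([A, -A]), b_eq=b, bounds=(0, None))
    return res.x[:N] - res.x[N:]

# recover a 2-sparse x0 from n = 20 equations in N = 40 unknowns
rng = np.random.default_rng(1)
A = rng.standard_normal((20, 40))
x0 = np.zeros(40)
x0[[5, 17]] = [2.0, -1.0]
x_hat = basis_pursuit(A, A @ x0)
assert np.allclose(x_hat, x0, atol=1e-5)
```

For complex-valued systems, (3) becomes a second-order cone program and is handled by a conic solver instead of a linear program.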
Many known results in sparse signal recovery, sparse approximation, and the companion theory of compressed sensing involve random matrices [4, 9, 15, 24, 46]. Based on these results, we obtain recovery results for matrix dictionaries whose member matrices are all chosen at random. From a practical point of view such random matrix dictionaries do not seem to be useful in the sparse matrix identification setting; nevertheless, the statements give some insight into the sparse matrix identification question, as they give guidance on what kind of results to seek in the mathematical analysis of structured and more application-relevant matrix dictionaries.
Theorem 2.3.
Let h be a non-zero vector in R^m.

(a) Let all entries of the N matrices Ψ_j ∈ R^{n×m}, j = 1, …, N, be chosen independently according to a standard normal distribution (Gaussian ensemble); or

(b) let all entries of the N matrices Ψ_j ∈ R^{n×m}, j = 1, …, N, be independent Bernoulli ±1 variables (Bernoulli ensemble).

Then there exists a positive constant c such that for ε > 0,

k ≤ c n / log(N/(nε))

implies that with probability at least 1 − ε all matrices Γ having a k-sparse representation with respect to Ψ = {Ψ_j} can be recovered from Γh by Basis Pursuit (3).

Using Theorem 3.6, this recovery result can be made stable under perturbation of Γh by noise, and it also applies when Γ is not exactly k-sparse but can be well approximated by a k-sparse operator. Precise information on the constant c will be given in Section 4. In the case of the Gaussian ensemble, Donoho and Tanner [17, 19, 20, 23, 24] obtained sharp thresholds separating regions in the (k/n, n/N) plane where recovery holds or fails with high probability; Section 4.1 recounts these and additional results on Gaussian systems. Theorem 2.3(b) is proven in Section 4.2, and similar results for certain diagonal matrices are proven in Section 4.3.

As outlined in the introduction, the matrix dictionary of time-frequency shifts appears naturally in the channel identification problem in wireless communications [5] or sonar [50]. Due to physical considerations, wireless channels may indeed be modeled by sparse linear combinations of time-frequency shifts M_ℓ T_p, where the periodic translation operator T_p and modulation operator M_ℓ on C^n are given by

(T_p h)_q = h_{(p+q) mod n},   (M_ℓ h)_q = e^{2πiℓq/n} h_q.   (5)

The system of time-frequency shifts,

G = {M_ℓ T_p : ℓ, p = 0, …, n − 1},   (6)

forms a basis of C^{n×n}, and for any non-zero h the vector dictionary Gh is a Gabor system [29, 35, 37]. Below, we focus on the so-called Alltop window h^A [3, 51] with entries

h^A_q = (1/√n) e^{2πi q³/n},   q = 0, …, n − 1,   (7)

and the randomly generated window h^R with entries

h^R_q = (1/√n) ε_q,   q = 0, …, n − 1,   (8)

where the ε_q are independent and uniformly distributed on the torus {z ∈ C, |z| = 1}. Invoking existing recovery results [22, 27, 52, 53] (see Theorems 3.1 and 3.2 below) and our results on the coherence of the Gabor systems G h^A and G h^R in Section 5.1 (see Section 2.4), we will obtain

Theorem 2.4. (a) Let n be prime and h^A be the Alltop window defined in (7). If k < (√n + 1)/2, then Basis Pursuit recovers from Γh^A all matrices Γ ∈ C^{n×n} having a k-sparse representation, Γ = ∑_{(p,ℓ)∈Λ} x_{pℓ} M_ℓ T_p, |Λ| = k, with respect to the time-frequency shift dictionary G given in (6).

(b) Let n be even and choose h^R to be the random unimodular window in (8). Let t > 0 and suppose

k ≤ (1/4) √( n / (2 log n + log 4 + t) ) + 1/2.   (9)

Then with probability at least 1 − e^{−t} Basis Pursuit recovers from Γh^R all matrices Γ ∈ C^{n×n} having a k-sparse representation with respect to the time-frequency shift dictionary G given in (6).

A slight variation of part (b) also holds for n odd, but is omitted for conciseness. Further note that Theorem 2.4 also holds with Basis Pursuit literally replaced by Orthogonal Matching Pursuit [52]. Moreover, Theorem 3.2 shows that recovery is stable under perturbation of Γh^A and Γh^R by noise.

In contrast with Theorem 2.3 for random matrices, where k is allowed to be of order O(n/log n), Theorem 2.4 requires k to be of order √n or √(n/log n). Substantially larger order thresholds, O(n/log n) for h^A and O(n/log² n) for h^R, are also possible to identify a matrix Γ which is the linear combination of a small number of time-frequency shift matrices.
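The objects in (6)-(8) are easy to generate explicitly. The sketch below (our own illustration) builds the n² time-frequency shifts M_ℓ T_p applied to the Alltop window for a small prime n and checks that distinct columns of the resulting Gabor system have inner products of magnitude at most 1/√n:

```python
import numpy as np

def modulation(n, l):
    """(M_l h)_q = exp(2 pi i l q / n) h_q, as in (5)."""
    return np.diag(np.exp(2j * np.pi * l * np.arange(n) / n))

def translation(n, p):
    """Cyclic translation by p positions."""
    T = np.zeros((n, n))
    for q in range(n):
        T[q, (q + p) % n] = 1.0
    return T

n = 7                                             # a small prime
q = np.arange(n)
hA = np.exp(2j * np.pi * q**3 / n) / np.sqrt(n)   # Alltop window (7)

# Gabor system G h^A: all n^2 columns M_l T_p h^A
cols = [modulation(n, l) @ translation(n, p) @ hA
        for l in range(n) for p in range(n)]
A = np.column_stack(cols)

G = np.abs(A.conj().T @ A)
np.fill_diagonal(G, 0.0)
mu = G.max()                                      # coherence of the dictionary
assert np.isclose(mu, 1 / np.sqrt(n))
```

The maximal inner product equals 1/√n exactly here, the coherence value established for the Alltop window in Section 5.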
However, this larger regime of successful recovery necessitates passing from a worst case analysis for sparse Γ to an average case analysis, in the sense that the coefficient vector x is chosen at random. Theorem 2.5 will follow from recent work by Tropp [54] and our coherence results in Section 5.1; see Section 5.3.

Theorem 2.5.
Let k ≥ 2 and let Λ be chosen uniformly at random among all subsets of {0, …, n − 1}² of cardinality k. Suppose further that x ∈ C^{n²} has support Λ with random phases (sgn(x_{ℓp}))_{(ℓ,p)∈Λ} that are independent and uniformly distributed on the torus {z, |z| = 1}. Let

Γ = ∑_{(ℓ,p)∈Λ} x_{ℓp} M_ℓ T_p.

(a) Let n be prime and choose the Alltop window h^A from (7). Assume that for ε > 0,

k ≤ n / (8 log(n²/ε))   (10)

and

s := (1/144) ( e^{−1/4}/2 − k/n )² · n / (k log(k/2)) ≥ 1.

Then with probability at least 1 − (2ε + (k/2)^{−s}) Basis Pursuit (3) recovers Γ from Γh^A.

(b) Let n be an even number and choose the random window h^R from (8). Assume

k ≤ n / ( (σ + 2) log(n) log(2n²/ε) )

for some σ > 0 and

s := (1/(576(σ + 2))) ( e^{−1/4}/2 − k/n )² · n / (k log(k/2)) ≥ 1.

Then with probability at least 1 − (2ε + 4n^{−σ} + (k/2)^{−s}) Basis Pursuit (3) recovers Γ from Γh^R. (A similar result also holds for n odd.)
In simple terms, Theorem 2.5 states that Γ can be recovered from Γh^A or Γh^R with high probability 1 − ε provided that the sparsity of Γ satisfies k ≤ C_ε n/log n in the case of h^A and k ≤ C′_ε n/log² n in the case of h^R. In Section 5.4 we use a simple argument from time-frequency analysis to obtain

Corollary 2.6.
Theorems 2.4, 2.5, and 5.1 also hold with the windows h^A and h^R replaced by their Fourier transforms ĥ^A and ĥ^R, with entries defined as ĥ_j = (1/√n) ∑_{q=0}^{n−1} h_q e^{−2πijq/n}.
3. TOOLS IN SPARSE SIGNAL RECOVERY
It was shown in (1) that for any test signal h we have Γh = (Ψh)x, where x is the sparse coefficient vector of Γ. This observation links the sparse matrix identification question with sparse signal recovery, where one seeks the sparsest solution (2) to the underdetermined system Ax = b; in the sparse matrix identification setting, (Ψh) = (Ψ_1 h | Ψ_2 h | … | Ψ_N h) takes the place of A and Γh the place of b. In contrast to sparse approximation, where the dictionary A is usually fixed, for sparse matrix identification we have the additional freedom of designing the test signal h in order for (Ψh) to have desirable properties.

Let us briefly recall known results in sparse signal recovery and sparse approximation that we apply to the sparse matrix identification question. In Section 3.1 we review the notion of coherence (12) and its implications for sparse signal recovery and approximation using Basis Pursuit, (3) and (4), as well as Orthogonal Matching Pursuit. In Section 3.2 we review the restricted isometry property, allowing for improved recoverability results for Basis Pursuit.

The recoverability properties of sparse signal recovery algorithms for an underdetermined system Ax = b are often measured by the coherence of A,

µ = max_{r ≠ s} |⟨a_r, a_s⟩|,   (12)

where a_r is the r-th column of A and ‖a_r‖_2 = 1 for all r.

Theorem 3.1 (Tropp [52]; Donoho, Elad [21]).
Let A be a unit norm dictionary with coherence µ. If

(2k − 1) µ < 1,

then Basis Pursuit (as well as Orthogonal Matching Pursuit) recovers all k-sparse vectors x from b = Ax. Recovery is also stable under perturbation by noise when Basis Pursuit (3) is replaced with (4).
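Computationally, the coherence (12) is read off the Gram matrix of the column-normalized dictionary. Below is a small sketch (our own helper, not code from the paper) that also reads off the largest sparsity certified by Theorem 3.1:

```python
import numpy as np

def coherence(A):
    """Coherence (12): largest |<a_r, a_s>| over r != s, after
    normalizing all columns to unit l2 norm."""
    A = A / np.linalg.norm(A, axis=0)
    G = np.abs(A.conj().T @ A)
    np.fill_diagonal(G, 0.0)
    return G.max()

rng = np.random.default_rng(2)
A = rng.standard_normal((64, 128))
mu = coherence(A)

# largest k satisfying (2k - 1) mu < 1, the hypothesis of Theorem 3.1
k_max = int((1 / mu + 1) / 2)
if (2 * k_max - 1) * mu >= 1:
    k_max -= 1
assert 0 < mu < 1 and (2 * k_max - 1) * mu < 1
```

Since generic dictionaries have coherence of order 1/√n, this condition certifies recovery only up to k = O(√n), as discussed below.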
Theorem 3.2 (Donoho et al. [22], Theorem 3.1).
Let A, µ be as above and suppose that (4k − 1)µ < 1. Assume that x is k-sparse and that we have perturbed observations b = Ax + z with ‖z‖_2 ≤ ε. Then the solution x* of the Basis Pursuit variant

min ‖x′‖_1 subject to ‖Ax′ − b‖_2 ≤ δ

satisfies

‖x − x*‖_2 ≤ (ε + δ) / √(1 − µ(4k − 1)).

Theorems 3.1 and 3.2 ensure that the solutions of (3) and (4) correspond (exactly and approximately, respectively) to the solution of (2) for all k-sparse x. For a broad class of dictionaries the coherence is of order O(1/√n); see Sections 4 and 5 for random and Gabor dictionaries, respectively. Hence, Theorems 3.1 and 3.2 ensure (stable) recovery provided k = O(√n).

In contrast to these O(√n) thresholds, which are valid for all x, Tropp [54] developed a general framework for the analysis of Basis Pursuit (3) which is still based on the coherence of a general dictionary, but shows that (3) is often successful for substantially larger k than those considered in Theorems 3.1 and 3.2. This comes, however, at the cost of assuming a random model on the sparse signal to be recovered. It allows us to prove recoverability results of order O(n/log n) for h^A and O(n/log² n) for h^R for the time-frequency shift dictionary, Theorem 2.5. We state the results of Tropp, where ‖·‖_{2,2} denotes the operator norm given by ‖A‖_{2,2} = sup_{‖x‖_2 = 1} ‖Ax‖_2, and A_Λ is the restriction of a matrix A to the columns indexed by Λ.

Theorem 3.3 (Tropp [54], Theorem 12).
Let A be an n × N vector dictionary with unit norm columns and coherence µ. Suppose that Λ is selected uniformly at random among all subsets of {1, …, N} of size k ≥ 2. Let s ≥ 1. Then

12 √( s µ² k log(k/2) ) + (k/N) ‖A‖²_{2,2} ≤ e^{−1/4} δ   (13)

implies

P( ‖A*_Λ A_Λ − Id‖_{2,2} ≥ δ ) ≤ (k/2)^{−s}.

Theorem 3.4 (Tropp [54], Theorem 13). Let A be an n × N dictionary with coherence µ. Suppose Λ ⊆ {1, …, N} of cardinality k (|Λ| = k) is such that

‖A*_Λ A_Λ − Id‖_{2,2} ≤ 1/2.

Suppose that x ∈ C^N has support Λ with random phases sgn(x_r), r ∈ Λ, that are independent and uniformly distributed on the torus {z, |z| = 1}. Then with probability at least 1 − 2N e^{−1/(8µ²k)} the sparse vector x can be recovered from b = Ax by Basis Pursuit.

Candès, Romberg and Tao introduced the Restricted Isometry Property (RIP), which is an alternative perspective to coherence [8, 9].
Definition 3.5.
Let A ∈ C^{n×N} and k < n. The restricted isometry constant δ_k = δ_k(A) is the smallest number such that

(1 − δ_k) ‖x‖²_2 ≤ ‖Ax‖²_2 ≤ (1 + δ_k) ‖x‖²_2

for all k-sparse x. A is said to satisfy the restricted isometry property if it has small restricted isometry constants, say δ_k < 1/2.

Theorem 3.6 (Candès, Romberg and Tao [8]).
Assume that the restricted isometry constants of A satisfy

δ_{3k} + 3 δ_{4k} < 2.

Let x ∈ C^N and assume we have noisy data y = Ax + η with ‖η‖_2 ≤ ε. Denote by x_k the truncated vector corresponding to the k largest absolute values of x. Then the solution x* of (4) satisfies

‖x − x*‖_2 ≤ C_1 ε + C_2 ‖x − x_k‖_1 / √k.

The constants C_1 and C_2 depend only on δ_{3k} and δ_{4k}. Note that for x k-sparse and noise level ε = 0, Theorem 3.6 guarantees exact recovery of x by (3).

4. RANDOM MATRICES

Many of the recent results in sparse signal recovery with recoverability thresholds for k ≤ Cn/log n either assume that A is a random Gaussian or Bernoulli matrix [4, 9, 15, 46], or a partial random Fourier matrix [7, 36, 45, 44, 47]. Recoverability results in these cases can be obtained by establishing the restricted isometry property, see Definition 3.5, or through a careful analysis of the geometric structure of the convex hull associated with the columns of A [17, 19, 20, 23, 24]. We apply these results to the matrix identification problem when the matrix has a sparse representation in terms of certain random matrices.

4.1. Gaussian matrix ensemble

Assume all entries of the N matrices Ψ_j ∈ R^{n×m} in Ψ are independent standard Gaussian random variables and h is an arbitrary non-zero vector in R^m. Then the entries of the dictionary A = (Ψh) ∈ R^{n×N}, whose columns are given by Ψ_j h, j = 1, …, N, are jointly independent and of the form Z = ∑_{ℓ=1}^m g_ℓ h_ℓ, where the g_ℓ are independent standard Gaussian random variables. By rotational invariance of the distribution of the Gaussian vector (g_1, …, g_m), the random variable Z has the same distribution as ‖h‖_2 g, where g is a (scalar-valued) standard Gaussian. Hence, the dictionary (Ψh) has the same distribution as ‖h‖_2 A, where A is a random matrix whose entries are independent standard Gaussians.
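The rotational-invariance argument above can be checked empirically; in this sketch (our own, with an arbitrary fixed h) the entries Z = ∑_ℓ g_ℓ h_ℓ exhibit the mean and standard deviation of ‖h‖_2 times a standard Gaussian:

```python
import numpy as np

rng = np.random.default_rng(3)
m, trials = 16, 200_000
h = rng.standard_normal(m)              # arbitrary fixed non-zero vector

# 'trials' independent copies of one entry Z = sum_l g_l h_l of Psi_j h
G = rng.standard_normal((trials, m))
Z = G @ h

# Z should match ||h||_2 * g in distribution: mean 0, std ||h||_2
nrm = np.linalg.norm(h)
assert abs(Z.mean()) < 0.05 * nrm
assert abs(Z.std() - nrm) < 0.05 * nrm
```

Up to the harmless scaling ‖h‖_2, the dictionary (Ψh) is therefore statistically indistinguishable from an i.i.d. Gaussian matrix, which is why the Gaussian compressed sensing literature applies verbatim.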
Thus, the existing literature in sparse approximation concerning Gaussian matrices applies; see for instance [4, 9, 15, 24, 46] and the additional results discussed in the remainder of this section. In particular, the restricted isometry property ensures stable recovery with probability at least 1 − ε provided [4, 9, 46]

k ≤ c n / log(N/(nε)).   (14)

Hence, by Theorem 3.6 we have stable recovery by (4) in this regime, and the statement of Theorem 2.3(a) follows.

The work of Donoho and Tanner [19, 20] actually allows for a stronger statement than (14) in the context of noise-free and exactly k-sparse vectors x. A simple version of their results says that most k-sparse Γ can be recovered with high probability by Basis Pursuit provided k ≤ n/(2 log(N/n)). For details we refer to [19, 20], and for extensions to the noisy setting to Wainwright's work [55].

4.2. Bernoulli matrix ensemble

The recoverability results for Bernoulli matrices in Theorem 2.3(b) are based on establishing the restricted isometry property given in Definition 3.5. To this end, we assume that the entries of the N matrices Ψ_j ∈ R^{n×m} in Ψ are selected as independent ±1 Bernoulli variables, and we let h be an arbitrary non-zero vector. Then an entry of the dictionary A = (Ψh) is given by

a_{pq} = ∑_{ℓ=1}^n ε_{pqℓ} h_ℓ,   p = 1, …, m,  q = 1, …, N,   (15)

where the ε_{pqℓ} are independent Bernoulli variables; that is, the a_{pq} are independent Rademacher series [38]. Theorem 4.1 shows that the matrix A has the restricted isometry property with high probability for sparsities k that are nearly linear in m. Hence, by Theorem 3.6, for an arbitrary non-zero choice of h we can recover any Γ having a k-sparse representation in terms of random Bernoulli matrices from the action Γh through Basis Pursuit (3).

Theorem 4.1.
Let h ∈ R^m be normalized by ‖h‖_2 = 1/√m. Let A be the random matrix with entries defined in (15). Assume δ ∈ (0, 1) and t > 0. If

n ≥ C δ⁻² ( k log(N/k) + log(2e + 24e/δ) + t ),   (16)

then with probability at least 1 − e^{−t} the restricted isometry property is satisfied; that is, for all Λ ⊂ {1, …, N} of cardinality at most k it holds that

(1 − δ) ‖x‖²_2 ≤ ‖Ax‖²_2 ≤ (1 + δ) ‖x‖²_2

for all x supported on Λ. An explicit numerical bound on the constant C follows from [46, Theorem 2.2].

Proof. Let v ∈ R^N be an arbitrary vector. We form the inner product of a row of A with v,

X_p = ∑_{q=1}^N a_{pq} v_q = ∑_{q=1}^N ∑_{ℓ=1}^n ε_{pqℓ} h_ℓ v_q.

By independence of the ε_{pqℓ}, the X_p are likewise independent. By Khintchine's inequality the even moments of X_p can be estimated by the moments of a standard Gaussian variable g [38, 42],

E[ |X_p|^{2z} ] ≤ ‖v‖^{2z}_2 ‖h‖^{2z}_2 (2z)! / (2^z z!) = ‖v‖^{2z}_2 ‖h‖^{2z}_2 E[ |g|^{2z} ],   z ∈ N,

which yields the concentration inequality

P( | ‖Av‖²_2 − ‖v‖²_2 | ≥ ε ‖v‖²_2 ) ≤ 2 exp( −n (ε²/4 − ε³/6) ).

By Theorem 2.2 in [46], see also Theorem 5.2 in [4], this implies that the restricted isometry property holds under the stated condition on n. The estimate of the constant C follows from [46, Theorem 2.2] as well. □

Note that for fixed δ and t, condition (16) can be rewritten as

k ≤ c n / log(N/k)

for some constant c. Combining Theorems 3.6 and 4.1 yields Theorem 2.3(b).

4.3. Diagonal matrices

Diagonal matrices act as multiplication operators on C^n. Using a Fourier expansion of the diagonal, we observe that any diagonal matrix can be expressed as a linear combination of the modulation operators M_ℓ ∈ C^{n×n}, ℓ = 0, …, n − 1, defined in (5). We now consider the case where only a small number of components of the output of a diagonal operator Γ can be measured; the assumption that Γ is sparse in the dictionary of modulation operators shall be used to recover Γ from these components.

To this end, let Ω be a subset of {0, …, n − 1} of cardinality m and denote by M^Ω_ℓ ∈ C^{m×m} the submatrix of M_ℓ with columns and rows restricted to the index set Ω. Let Ψ^Ω = {M^Ω_ℓ, ℓ = 0, …, n − 1} and h = 1 = (1, …, 1)^T. If Γ^Ω = ∑_{ℓ=0}^{n−1} x_ℓ M^Ω_ℓ, then Γ^Ω 1 coincides with the restriction of Γ1 = ∑_{ℓ=0}^{n−1} x_ℓ M_ℓ 1 to the indices in Ω.

The matrix A whose columns are the elements of the dictionary (Ψ^Ω 1) = {M^Ω_ℓ 1, ℓ = 0, …, n − 1} is precisely a row submatrix of the Fourier matrix,

A = A_Ω = ( e^{2πirℓ/n} )_{r∈Ω, ℓ=0,…,n−1} ∈ C^{m×n}.

If the subset Ω is chosen uniformly at random among all subsets of size m, then A_Ω is a random matrix. This random partial Fourier matrix was studied in [7, 9, 47]; see also [45] for a slight variation. Indeed, under the condition

k ≤ c m / ( log⁴(n) log(ε⁻¹) )

the restricted isometry property holds with probability at least 1 − ε [47], and by Theorem 3.6 we obtain stable recovery of all matrices having a sparse representation in terms of Ψ^Ω.
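The identification of the dictionary (Ψ^Ω 1) with a row submatrix of the Fourier matrix can be verified directly; the following sketch (our own construction, following (5)) builds both and confirms they agree:

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 32, 12
Omega = np.sort(rng.choice(n, size=m, replace=False))  # random index set

def modulation(n, l):
    """(M_l h)_q = exp(2 pi i l q / n) h_q, as in (5)."""
    return np.diag(np.exp(2j * np.pi * l * np.arange(n) / n))

ones = np.ones(n)
# column l of A: the restriction of M_l 1 to the indices in Omega
A = np.column_stack([(modulation(n, l) @ ones)[Omega] for l in range(n)])

# direct construction of the partial Fourier matrix (e^{2 pi i r l / n})
F = np.exp(2j * np.pi * np.outer(Omega, np.arange(n)) / n)
assert np.allclose(A, F)
assert A.shape == (m, n)
```

Measuring the output of the diagonal operator on the test vector 1 at the indices in Ω thus amounts exactly to sampling rows of the Fourier matrix.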
5. TIME-FREQUENCY SHIFT DICTIONARIES
In this section we establish coherence results for the dictionary of time-frequency shift matrices and prove Theorems 2.4 and 2.5.
We apply known recovery results [22, 27, 52, 53, 54] for dictionaries with small coherence (12). Assuming ‖h‖_2 = 1, the coherence (12) of the Gabor system Gh is

µ = max_{(ℓ,p) ≠ (ℓ′,p′)} |⟨M_ℓ T_p h, M_{ℓ′} T_{p′} h⟩|.   (17)

Based on results by Alltop in [3], Strohmer and Heath showed in [51] that the coherence (17) of G h^A given in (7) satisfies

µ = 1/√n   (18)

for n prime. This is almost optimal, since the general lower bound in [51] for the coherence of frames with n² elements in C^n yields µ ≥ 1/√(n+1).

Unfortunately, the coherence result (18) for h^A applies only to n prime. For arbitrary n we consider the random window h^R.

Theorem 5.1.
Let n ∈ N and choose a random window h^R with entries h^R_q = (1/√n) ε_q, q = 0, …, n − 1, where the ε_q are independent and uniformly distributed on the torus {z ∈ C, |z| = 1}. Let µ be the coherence (17) of the associated Gabor dictionary. Then for α > 0 and n even,

P( µ ≥ α/√n ) ≤ 4 n(n − 1) e^{−α²/4},

while for n odd,

P( µ ≥ α/√n ) ≤ 2 n(n − 1) ( e^{−((n−1)/n) α²/4} + e^{−((n+1)/n) α²/4} ).   (19)

Up to the constant factor α, the coherence in Theorem 5.1 comes close to the lower bound µ ≥ 1/√(n+1) with high probability. Theorems 2.4 and 2.5 will follow from the order O(1/√n) coherence results of this section together with Theorems 3.1 and 3.2 of [22, 27, 52, 53] and Theorems 3.3 and 3.4 of Tropp [54], respectively.

Proof of Theorem 5.1.
The technical details for $n$ even and $n$ odd differ slightly; for conciseness we state the proof for $n$ even and only outline the proof for $n$ odd.

A direct computation shows that $|\langle M_{\ell'} T_{p'} h_R, M_{\ell} T_{p} h_R \rangle| = |\langle M_{\ell-\ell'} T_{p-p'} h_R, h_R \rangle|$ and, therefore, it suffices to consider $\langle M_\ell T_p h_R, h_R \rangle$, $\ell, p = 0, \ldots, n-1$. Furthermore, as $\langle M_\ell h_R, h_R \rangle = \frac{1}{n} \sum_{q=0}^{n-1} e^{2\pi i q\ell/n} = 0$ for $\ell \neq 0$, we consider only the case $p \neq 0$.

Writing $\epsilon_q = e^{2\pi i y_q}$ with $y_q \in [0,1)$ we obtain
$$\langle M_\ell T_p h_R, h_R \rangle = \frac{1}{n} \sum_{q=0}^{n-1} e^{2\pi i \frac{q\ell}{n}}\, \epsilon_{q-p}\, \overline{\epsilon_q} = \frac{1}{n} \sum_{q=0}^{n-1} e^{2\pi i \left( y_{q-p} - y_q + \frac{q\ell}{n} \right)},$$
where $\epsilon_{q-p} = \epsilon_{n+q-p}$ if $q-p < 0$; that is, the indices are understood modulo $n$. Set $\delta_q^{(p,\ell)} = e^{2\pi i ( y_{q-p} - y_q + \frac{q\ell}{n} )}$ and note that $\delta_q^{(p,\ell)}$ is uniformly distributed on the torus $\mathbb{T}$. However, the $\delta_q^{(p,\ell)}$, $q = 0, \ldots, n-1$, are no longer jointly independent. Nevertheless, as we demonstrate in the following, we can split all variables into two subsets of independent variables.

If $p = 1$, $p = n-1$, or if neither $p$ nor $n-p$ divides $n$, then the $n/2$ random variables $\epsilon_0 \overline{\epsilon_p}, \epsilon_p \overline{\epsilon_{2p}}, \ldots, \epsilon_{(n/2-1)p} \overline{\epsilon_{(n/2)p}}$ are jointly independent, as are the remaining $n/2$ variables $\epsilon_{(n/2)p} \overline{\epsilon_{(n/2+1)p}}, \ldots, \epsilon_{(n-1)p} \overline{\epsilon_0}$; the indices are again understood modulo $n$. If $p$ or $n-p$ divides $n$ and $2 \leq p \leq n-2$, then we form the $p$ random vectors
$$Y_1 = \big( \epsilon_0 \overline{\epsilon_p},\, \epsilon_p \overline{\epsilon_{2p}},\, \ldots,\, \epsilon_{n-p} \overline{\epsilon_0} \big), \quad Y_2 = \big( \epsilon_1 \overline{\epsilon_{p+1}},\, \epsilon_{p+1} \overline{\epsilon_{2p+1}},\, \ldots,\, \epsilon_{n-p+1} \overline{\epsilon_1} \big), \quad \ldots, \quad Y_p = \big( \epsilon_{p-1} \overline{\epsilon_{2p-1}},\, \epsilon_{2p-1} \overline{\epsilon_{3p-1}},\, \ldots,\, \epsilon_{n-1} \overline{\epsilon_{p-1}} \big).$$
These vectors are jointly independent. Moreover, since $p \leq n/2$, we may split the entries of each $Y_j$ into two sets $\Lambda_j^1$ and $\Lambda_j^2$ built from adjacent elements of the form $\{\epsilon_{k+jp}\overline{\epsilon_{k+(j+1)p}},\ \epsilon_{k+(j+1)p}\overline{\epsilon_{k+(j+2)p}}\}$, with possibly a remaining single-element subset. Then all subsets are jointly independent, and the two elements inside a subset are independent as well.

Now, by forming the unions $\bigcup_{i=1}^{p} \Lambda_i^1$ and $\bigcup_{i=1}^{p} \Lambda_i^2$ we can always partition the index set $\{0, \ldots, n-1\}$ into two subsets $\Lambda^1, \Lambda^2 \subset \{0, \ldots, n-1\}$ with $|\Lambda^1| = |\Lambda^2| = n/2$ such that the random variables $\{\delta_q^{(p,\ell)},\ q \in \Lambda^i\}$ are jointly independent for both $i = 1, 2$. Moreover, for a sequence $\epsilon_q$, $q = 1, \ldots, n$, of independent random variables uniformly distributed on the torus, a Hoeffding-type inequality yields
$$\mathbb{P}\Big( \Big| \sum_{q=1}^{n} \epsilon_q \Big| \geq nu \Big) \leq 2\, e^{-n u^2/2}. \qquad (20)$$
Using the pigeonhole principle and inequality (20) we obtain
$$\mathbb{P}\big( |\langle M_\ell T_p h_R, h_R \rangle| \geq t \big) = \mathbb{P}\Big( \Big| \sum_{q=0}^{n-1} \delta_q^{(p,\ell)} \Big| \geq nt \Big) \leq \mathbb{P}\Big( \Big| \sum_{q \in \Lambda^1} \delta_q^{(p,\ell)} \Big| \geq nt/2 \Big) + \mathbb{P}\Big( \Big| \sum_{q \in \Lambda^2} \delta_q^{(p,\ell)} \Big| \geq nt/2 \Big) \leq 4\, e^{-n t^2/4}.$$
Forming the union bound over the $n(n-1)$ pairs $(p, \ell)$ with $p \neq 0$ and choosing $t = \alpha/\sqrt{n}$ yields the statement of Theorem 5.1 for $n$ even.

The proof of Theorem 5.1 for $n$ odd uses essentially the same technique, with the difference that the random variables $\delta_q^{(p,\ell)}$ are grouped into sets of unequal cardinality, $|\Lambda^1| = (n-1)/2$ and $|\Lambda^2| = (n+1)/2$. For large $n$ the resulting tail bound (19) is nearly the same as the bound for $n$ even. $\Box$

5.2. Proof of Theorem 2.4

Part (a) follows directly from Theorem 3.1 and the coherence (18) of $G_{h_A}$.

Part (b) follows from Theorem 3.1 and Theorem 5.1. In fact, the probability that the condition $\mu < (2k-1)^{-1}$ of Theorem 3.1 does not hold for $G_{h_R}$ is estimated by
$$\mathbb{P}\big( \mu \geq (2k-1)^{-1} \big) \leq 4\, n(n-1) \exp\Big( -\frac{n}{4(2k-1)^2} \Big).$$
Requiring that the latter term is less than $e^{-t}$ and solving for $k$ gives (9). $\Box$

5.3. Proof of Theorem 2.5

Having established coherence results for $G_{h_A}$ and $G_{h_R}$ in Section 5.1, Theorem 2.5 follows from Theorems 3.3 and 3.4 of Tropp [54] as shown below.

(a) Recall from (18) that the coherence of $G_{h_A}$ satisfies $\mu = n^{-1/2}$. Next, observe that $h_A$ unimodular implies that the columns of $G_{h_A}$ form $n$ orthonormal bases and, hence, $\sqrt{n} = \|(G_{h_A})^*\|_{2,2} = \|G_{h_A}\|_{2,2}$. Plugging this into condition (13) of Tropp's theorem with $\delta = 1/2$ gives
$$12 \sqrt{\frac{s\, k \log(k/\epsilon)}{n}} + \frac{2k}{n} \leq \frac{e^{-1/4}}{2\sqrt{2}}.$$
Solving for $s$ yields (11). Applying Theorem 3.4, which requires $s \geq 1$, shows that condition (13) in Theorem 3.3 holds for $A = G_{h_A}$, and we conclude that $\|A_\Lambda^* A_\Lambda - \mathrm{Id}\|_{2,2} \leq 1/2$ holds with probability at least $1 - (k/\epsilon)^{-s}$. Now let $\delta = \|A_\Lambda^* A_\Lambda - \mathrm{Id}\|_{2,2}$. Then
$$\mathbb{P}(\text{BP does not recover } \Gamma \text{ from } \Gamma h_A) \leq \mathbb{P}(\text{BP does not recover } \Gamma \text{ from } \Gamma h_A \mid \delta \leq 1/2) + \mathbb{P}(\delta > 1/2).$$
Thus by Theorem 3.4 we can lower bound the probability that recovery is successful by $1 - \big( (k/\epsilon)^{-s} + 2n^2 \exp(-\tfrac{n}{8k}) \big)$. Furthermore, observe that $2n^2 \exp(-\tfrac{n}{8k}) \leq \epsilon$ under condition (10).

(b) Let $\mu$ be the coherence associated with the random Gabor window $h_R$. Setting $\alpha = \sqrt{p \log n}$ in Theorem 5.1, we obtain that the probability that $\mu$ exceeds $\sqrt{\frac{p \log n}{n}}$ is smaller than $4n(n-1)\exp(-\alpha^2/4) \leq 4\, n^{2 - p/4}$. Set $\sigma = p/4 - 2$, i.e., $p = 4(\sigma+2)$, and assume for the moment that $\mu \leq \sqrt{\frac{p \log n}{n}}$. Then condition (13) with $\delta = 1/2$ reads
$$24 \sqrt{\frac{(\sigma+2)\, s\, k \log(k/\epsilon) \log n}{n}} + \frac{2k}{n} \leq \frac{e^{-1/4}}{2\sqrt{2}}.$$
Requiring $s \geq 1$, we obtain $\|A_\Lambda^* A_\Lambda - \mathrm{Id}\|_{2,2} \leq 1/2$ for $A = G_{h_R}$ with probability at least $1 - (k/\epsilon)^{-s}$. Similarly to the proof of part (a), we estimate the probability of successful recovery by
$$\mathbb{P}(\text{BP recovers } \Gamma \text{ from } \Gamma h_R) \geq 1 - \Big( \mathbb{P}\big(\text{BP does not recover } \Gamma \text{ from } \Gamma h_R \mid \delta \leq 1/2,\ \mu \leq \sqrt{\tfrac{p\log n}{n}}\big) + \mathbb{P}\big(\delta > 1/2 \mid \mu \leq \sqrt{\tfrac{p\log n}{n}}\big) + \mathbb{P}\big(\mu > \sqrt{\tfrac{p\log n}{n}}\big) \Big).$$
By Theorem 3.3, the probability that $\Gamma$ can be reconstructed from $\Gamma h_R$ by Basis Pursuit (3) therefore exceeds
$$1 - \Big( 2n^2 \exp\Big( -\frac{n}{8\, p \log(n)\, k} \Big) + (k/\epsilon)^{-s} + 4\, n^{-\sigma} \Big).$$
Finally, observe that the term $2n^2 \exp(-\tfrac{n}{8 p \log(n) k})$ is less than $\epsilon$ provided $k \leq \frac{n}{32(\sigma+2) \log(n) \log(2n^2/\epsilon)}$.

Plancherel's theorem and $\widehat{M_\ell T_p h} = T_\ell M_{n-p} \hat{h} = \sigma M_{n-p} T_\ell \hat{h}$ with $|\sigma| = 1$ imply that the coherence remains the same under a Fourier transform of the window, that is,
$$\mu_h = \sup_{(\ell,p) \neq (\ell',p')} |\langle M_\ell T_p h,\, M_{\ell'} T_{p'} h \rangle| = \sup_{(\ell,p) \neq (\ell',p')} |\langle \widehat{M_\ell T_p h},\, \widehat{M_{\ell'} T_{p'} h} \rangle| = \sup_{(\ell,p) \neq (\ell',p')} |\langle M_{n-p} T_\ell \hat{h},\, M_{n-p'} T_{\ell'} \hat{h} \rangle| = \mu_{\hat{h}}.$$
Since all of the results stated above concerning the dictionary of time-frequency shift matrices are based on the coherence, this proves the claim. $\Box$
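The coherence computations of this section are easy to reproduce numerically. The following sketch is our own illustration and is not part of the original text; all helper names such as `alltop_window` and `gabor_matrix` are ours. It builds the full Gabor system for the Alltop window and for a random unimodular window and evaluates the coherence (17) directly from the Gram matrix:

```python
import numpy as np

rng = np.random.default_rng(0)

def alltop_window(n):
    # h_A(q) = n^{-1/2} exp(2*pi*i*q^3/n), defined for n >= 5 prime
    q = np.arange(n)
    return np.exp(2j * np.pi * q**3 / n) / np.sqrt(n)

def random_window(n, rng):
    # h_R(q) = n^{-1/2} eps_q with eps_q i.i.d. uniform on the unit circle
    return np.exp(2j * np.pi * rng.random(n)) / np.sqrt(n)

def gabor_matrix(h):
    # columns are M_l T_p h over all n^2 time-frequency shifts (l, p)
    n = len(h)
    q = np.arange(n)
    return np.column_stack([np.exp(2j * np.pi * l * q / n) * np.roll(h, p)
                            for p in range(n) for l in range(n)])

def coherence(A):
    # largest inner product in modulus between distinct unit-norm columns
    G = A.conj().T @ A
    np.fill_diagonal(G, 0.0)
    return np.abs(G).max()

n = 11                                                   # prime, so (18) applies
mu_alltop = coherence(gabor_matrix(alltop_window(n)))    # equals 1/sqrt(n)
mu_random = coherence(gabor_matrix(random_window(n, rng)))
```

For prime $n$ the Alltop coherence matches $1/\sqrt{n}$ to machine precision, while the random window typically lands a modest factor above that scale, in line with Theorem 5.1; both respect the lower bound $1/\sqrt{n+1}$.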
6. MULTIPLE TEST VECTORS

In addition to the goal of recovering the operator $\Gamma$ from the operator output caused by a single test signal, we may also consider using two or more test signals $h_1, \ldots, h_r$ to identify $\Gamma$. In this case, the vector of concatenated observations $(\Gamma h_1, \ldots, \Gamma h_r)$ is given as
$$\begin{pmatrix} \Gamma h_1 \\ \vdots \\ \Gamma h_r \end{pmatrix} = \begin{pmatrix} \Psi_1 h_1 & \cdots & \Psi_N h_1 \\ \vdots & & \vdots \\ \Psi_1 h_r & \cdots & \Psi_N h_r \end{pmatrix} x = \begin{pmatrix} \Psi^{h_1} \\ \vdots \\ \Psi^{h_r} \end{pmatrix} x,$$
and our sparse matrix identification task is again reduced to a sparse signal recovery problem. Although we will not pursue this task in depth here, we make some remarks and state extensions of our results to this more general setting.

Intuitively, using several test vectors instead of a single one should increase the maximal sparsity $k$ that allows for perfect reconstruction, as more information can be exploited. However, it is only interesting to consider $r < m$, since any operator $\Gamma \in \mathbb{C}^{n \times m}$ can be characterized by its action on $m$ basis vectors. The following lemma on the coherence of concatenated measurement matrices suggests that the maximal recoverable sparsity does not decrease. Its proof is straightforward and therefore omitted.

Lemma 6.1.
Let $h_1, \ldots, h_r \in \mathbb{C}^m$ be such that the matrices $\Psi^{h_j}$ have coherence $\mu_j$. Then the coherence $\mu$ of the normalized concatenated matrix
$$A_{h_1,\ldots,h_r} = \frac{1}{\sqrt{r}} \begin{pmatrix} \Psi^{h_1} \\ \vdots \\ \Psi^{h_r} \end{pmatrix} = \frac{1}{\sqrt{r}} \begin{pmatrix} \Psi_1 h_1 & \cdots & \Psi_N h_1 \\ \vdots & & \vdots \\ \Psi_1 h_r & \cdots & \Psi_N h_r \end{pmatrix}$$
satisfies
$$\mu \leq \frac{1}{r}\big( \mu_1 + \mu_2 + \cdots + \mu_r \big) \leq \max_{j=1,\ldots,r} \mu_j.$$
A straightforward extension of the proof of Theorem 5.1 yields the following result in the setting of time-frequency shifts and several randomly chosen windows $h_R^j$, $j = 1, \ldots, r$.

Theorem 6.2.
Let $n \in \mathbb{N}$ be even and choose random windows $h_R^j$, $j = 1, \ldots, r$, with entries $(h_R^j)_q = \frac{1}{\sqrt{n}}\,\epsilon_{qj}$, $q = 0, \ldots, n-1$, where the $\epsilon_{qj}$ are independent and uniformly distributed on the torus $\{z \in \mathbb{C},\ |z| = 1\}$. Let $\mu$ be the coherence of the concatenated matrix $\frac{1}{\sqrt{r}}\big( G_{h_R^1}; \ldots; G_{h_R^r} \big)$, where $G$ is defined in (6). Then for $\alpha > 0$,
$$\mathbb{P}\big( \mu \geq \tfrac{\alpha}{\sqrt{rn}} \big) \leq 4\, n(n-1)\, e^{-\alpha^2/4}. \qquad (21)$$
Similarly as in Theorem 2.4(b), we deduce that the condition
$$k \leq \frac{1}{4}\sqrt{\frac{rn}{2\log n + \log 4 + t}} + \frac{1}{2}$$
implies that Basis Pursuit (or Orthogonal Matching Pursuit) recovers all $k$-sparse $\Gamma$ from $\Gamma h_R^1, \ldots, \Gamma h_R^r$ with probability at least $1 - e^{-t}$. Hence, the maximal provable sparsity increases at least by a factor of $\sqrt{r}$. Of course, we may as well apply Tropp's result based on random support sets and phases to arrive at a statement analogous to Theorem 2.5.

Theorem 6.3.
Let $n$ be even and $k \geq 2$, and let $\Lambda$ be chosen uniformly at random among all subsets of $\{0, \ldots, n-1\}^2$ of cardinality $k$. Suppose further that $x \in \mathbb{C}^{n^2}$ has support $\Lambda$ with random phases $(\mathrm{sgn}(x_{\ell p}))_{(\ell,p) \in \Lambda}$ that are independent and uniformly distributed on the torus $\{z,\ |z| = 1\}$. Let
$$\Gamma = \sum_{(\ell,p) \in \Lambda} x_{\ell p}\, M_\ell T_p.$$
Choose $r$ independent random windows $h_R^1, \ldots, h_R^r$ according to (8). Assume
$$k \leq \frac{rn}{32(\sigma+2) \log n\, \log(2n^2/\epsilon)}$$
for some $\sigma > 0$ and
$$s := \frac{1}{576(\sigma+2)} \Big( \frac{e^{-1/4}}{2\sqrt{2}} - \frac{2k}{n} \Big)^2 \cdot \frac{rn}{k \log(k/\epsilon) \log n} \geq 1. \qquad (22)$$
Then with probability at least $1 - \big( \epsilon + 4 n^{-\sigma} + (k/\epsilon)^{-s} \big)$ Basis Pursuit (3) recovers $\Gamma$ from $\Gamma h_R^1, \ldots, \Gamma h_R^r$.
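As a quick numerical sanity check of Lemma 6.1 (our own sketch, not from the original text; the helper names are ours), one can concatenate several random-window Gabor matrices and compare the coherences:

```python
import numpy as np

rng = np.random.default_rng(3)

def random_window(n, rng):
    # h_R(q) = n^{-1/2} eps_q with eps_q uniform on the unit circle
    return np.exp(2j * np.pi * rng.random(n)) / np.sqrt(n)

def gabor_matrix(h):
    # columns M_l T_p h over all n^2 time-frequency shifts (l, p)
    n = len(h)
    q = np.arange(n)
    return np.column_stack([np.exp(2j * np.pi * l * q / n) * np.roll(h, p)
                            for p in range(n) for l in range(n)])

def coherence(A):
    # largest inner product in modulus between distinct unit-norm columns
    G = A.conj().T @ A
    np.fill_diagonal(G, 0.0)
    return np.abs(G).max()

n, r = 32, 3
blocks = [gabor_matrix(random_window(n, rng)) for _ in range(r)]
mus = [coherence(B) for B in blocks]                    # per-window coherences
mu_concat = coherence(np.vstack(blocks) / np.sqrt(r))   # normalized concatenation
# Lemma 6.1: mu_concat <= (mu_1 + ... + mu_r)/r <= max_j mu_j
```

The averaging over $r$ independent windows is exactly what drives the improved tail bound (21): the concatenated coherence is never worse than the largest individual one, and typically noticeably smaller.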
Figure 1. (a) Original 7-sparse coefficient vector ($n = 59$) in the time-frequency plane. (b) Reconstruction by Basis Pursuit using the Alltop window $h_A$. (c) For comparison, the reconstruction by traditional $\ell_2$-minimization (23).

Roughly speaking, with the chosen probabilistic model on the sparse coefficient vector $x$, the provable maximal sparsity $k$ that allows for recovery increases by a factor of $r$ when taking $r$ test vectors instead of only one. This fact is illustrated in Figure 5 in Section 7.
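The benefit of additional test vectors can also be probed in a toy experiment. The sketch below is our own illustration, not taken from the paper: it substitutes Orthogonal Matching Pursuit, mentioned above as an alternative to Basis Pursuit, for the convex solver, and all function names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(7)

def random_window(n, rng):
    # h_R(q) = n^{-1/2} eps_q with eps_q uniform on the unit circle
    return np.exp(2j * np.pi * rng.random(n)) / np.sqrt(n)

def gabor_matrix(h):
    # columns M_l T_p h over all n^2 time-frequency shifts
    n = len(h)
    q = np.arange(n)
    return np.column_stack([np.exp(2j * np.pi * l * q / n) * np.roll(h, p)
                            for p in range(n) for l in range(n)])

def omp(A, y, k):
    # Orthogonal Matching Pursuit: k greedy atom picks + least-squares refit
    residual, support = y.copy(), []
    for _ in range(k):
        support.append(int(np.argmax(np.abs(A.conj().T @ residual))))
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(A.shape[1], dtype=complex)
    x[support] = coef
    return x

def success_rate(n, k, r, trials, rng):
    hits = 0
    for _ in range(trials):
        # normalized concatenation of r random-window Gabor matrices
        A = np.vstack([gabor_matrix(random_window(n, rng))
                       for _ in range(r)]) / np.sqrt(r)
        x = np.zeros(n * n, dtype=complex)
        supp = rng.choice(n * n, size=k, replace=False)
        x[supp] = np.exp(2j * np.pi * rng.random(k))   # unimodular phases
        hits += np.linalg.norm(omp(A, A @ x, k) - x) < 1e-6
    return hits / trials

n, k = 16, 4
rate1 = success_rate(n, k, r=1, trials=25, rng=rng)
rate3 = success_rate(n, k, r=3, trials=25, rng=rng)
```

With three concatenated test vectors the same sparsity is typically recovered at least as reliably as with a single window, in the spirit of Figure 5.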
7. NUMERICAL RESULTS
Theorem 2.5 can be tested empirically for various values of $n$ by trying a number of sparsity levels $k$ and recording the fraction of times (3) recovers the true $k$-sparse coefficient vector $x$.

But before doing so, we illustrate in Figure 1 the recovery method for matrices which have a sparse representation in the dictionary of time-frequency shift matrices, as considered in Theorem 2.5. A 7-sparse coefficient vector $x$ in the time-frequency plane is chosen and reconstructed from $\Gamma h_A = \sum_{\ell,p} x_{\ell p}\, M_\ell T_p h_A$ by Basis Pursuit. As comparison, $x$ is reconstructed by a traditional $\ell_2$-minimization,
$$\min \|x\|_2 \quad \text{subject to} \quad (\Psi^{h_A}) x = \Gamma h_A. \qquad (23)$$

For the Alltop window $h_A$ in (7) we consider the prime values of $n$ from 11 to 59; for the random window $h_R$ in equation (8) we consider the prime values of $n$ from 11 to 59 as well as $n = 10 + 4j$ for $j = 0, 1, \ldots, 12$. Each empirical test consists of generating a random $k$-sparse $x$ with non-zero entries $x_q = r_q \exp(2\pi i \theta_q)$, with $r_q$ drawn independently from the Gaussian $N(0,1)$ distribution and $\theta_q$ drawn independently and uniformly from $[0, 1)$. For each value of $n$, 1000 tests are computed per value of $k = 1, 2, \ldots, n-1$. A test is considered successful if Basis Pursuit (3) recovers all components of the coefficient vector $x$ to within a small fixed error tolerance. The successful recovery of $x$, and hence of $\Gamma$ from $\Gamma h_A$ or $\Gamma h_R$, is recorded in $Y_{nk}$ as a 1, and failure to recover as a 0. Following the empirical examination of phase transitions in [18], we approximate the observed probability distribution by fitting the mean response of $Y_{nk}$ using the logistic regression model [31],
$$E(Y_{nk}) = \frac{\exp\big( \beta_0(n) + \beta_1(n) k \big)}{1 + \exp\big( \beta_0(n) + \beta_1(n) k \big)}. \qquad (24)$$
For illustration purposes, the fitted response for the windows $h_A$ with $n = 43$ and $h_R$ with $n = 30$ is shown in Figure 2 along with the mean response of $Y_{nk}$.

Figure 2. Empirical verification of Theorem 2.5 without noise. For the random window $h_R$ with $n = 30$, the mean response of $Y_{nk}$ (dash-dot) and the fitted logistic regression model $E(Y_{nk})$ (solid), plotted against the fractional sparsity $k/n$. For the Alltop window $h_A$ with $n = 43$, the mean response of $Y_{nk}$ (dot) and the fitted logistic regression model $E(Y_{nk})$ (dash), plotted against the fractional sparsity $k/n$.

The phase transition behavior is often observed through the fractional sparsity ratio $k/n$ and the so-called undersampling rate $n/N$ of the matrix, here $1/n$ for $G_{h_A}$ and $G_{h_R}$ [24]. Contours of the fitted logistic regression models for time-frequency shift dictionaries with identifiers $h_A$ and $h_R$ are shown in Figure 3 (a) and (b), respectively. To facilitate a quantitative inspection of the contours in Figure 3 and the theoretical results of [24], we overlay the contours in Figure 3 with the level curve for 93% success rate (dash) and $1/(2\log n)$ (solid). The curve $1/(2\log n)$ is known to be the threshold for overwhelming probability of successful recovery in the case of Gaussian random matrices for large $n$ [24]. It is observed in Figure 3 that the curve $1/(2\log n)$ remains below the 93% success rate level curve, indicating consistency of the empirical results with the phase transition $1/(2\log n)$ conjectured for the class of time-frequency shift matrices applied to identifiers $h_A$ and $h_R$. Moreover, the curve $1/(2\log n)$ falls increasingly below the 93% success rate level curve as $n$ increases, indicating improved agreement in the large $n$ limit. Note that this conjectured phase transition $1/(2\log n)$ is larger than that proven in the main Theorem 2.5, both in order (as $u = 0$ here) and in the constant.

Figure 3. Empirical verification of Theorem 2.5 for $h_A$ (a) and $h_R$ (b) without noise. Contours of the fitted logistic regression model (gray), the 93% success rate contour (dashed), and $1/(2\log n)$ (solid). Figure 2 shows vertical slices for $1/43$ (a) and $1/30$ (b).

As stated earlier, in practice the measurements $\Gamma h$ are observed with noise, and although $\Gamma$ can be well approximated by a $k$-sparse representation, it is rarely strictly $k$-sparse. For both of these reasons, the recovery algorithm (3) is not often used in practice; rather, (4) is used to allow for an inexact fit of the measurements.

In Figure 4 we empirically test Theorem 2.5 using (4) rather than (3) as the reconstruction algorithm. We choose the same values of $k$ and $n$, and the same number of tests were performed as for Figure 3. The non-zero entries in $x$ are also selected from the same distribution as was used to generate Figure 3. Additive noise is simulated at a level of 25 dB signal-to-noise ratio; that is, $\eta$ is added to $\Gamma h$ with the entries of $\eta$ drawn independently from the Gaussian $N(0,1)$ and $\eta$ normalized to $\|\eta\|_2 = \|\Gamma h\|_2 \cdot 10^{-5/4}$.

Figure 4. Empirical verification of Theorem 2.5 for $h_A$ (a) and $h_R$ (b) in the noisy setting, with (3) replaced by (4) and additive noise of 25 dB signal-to-noise ratio. Contours of the fitted logistic regression model (gray), the 93% success rate contour (dash), and $1/(2\log n)$ (solid).

Unlike the solution of (3), for which the exact solution can be exactly $k$-sparse and for which numerical algorithms can compute approximations of arbitrary precision, the solution of (4) from noisy measurements will not recover the solution exactly. For our numerical experiments involving noisy measurements, the vector $x$ associated with $\Gamma$ resulting from the solution of (4) is only considered to have been successfully recovered if the largest $k$ entries of the recovered $x'$ have the same support set $\Lambda$ as $x$. Alternative metrics of successful recovery, such as $\ell_2$ error or signal-to-noise ratio (SNR), are less demanding than requiring a match of the support set; moreover, the support set metric was previously examined in this setting by Wainwright [55], and following this convention allows for a more direct comparison. The inequality fit parameter $\epsilon$ in (4) is selected to be at the noise level, $10^{-5/4}\|\Gamma h\|_2$.

As in the noiseless setting, we approximate the probability distribution of the empirical observations $Y_{nk}$ using the logistic regression model (24). Contours of the fitted logistic regression models for time-frequency shift dictionaries with identifiers $h_A$ and $h_R$ are shown in Figure 4 (a) and (b), respectively. Overlaying these contours are the level curve for 93% success rate (dash) and $1/(2\log n)$ (solid). Unlike the noiseless case (3), it was shown in [55] that the threshold for overwhelming probability of successful recovery in the case of Gaussian random $n \times n$ matrices with noise using (4) is $1/(4\log n)$; however, we observe in Figure 4 that $1/(2\log n)$ fits the empirical data better in this instance. As Wainwright considered the Gaussian setting, this empirical observation for the Gabor system does not contradict the results in [55], but the difference is noteworthy.
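The logistic fit (24) needs no statistics package. The following sketch is our own minimal illustration (using synthetic fractional success rates rather than the paper's experimental data); it runs a damped Newton iteration for the two-parameter logistic likelihood:

```python
import numpy as np

def fit_logistic(k, y, iters=50):
    # damped Newton (IRLS) for E(Y) = exp(b0 + b1*k) / (1 + exp(b0 + b1*k))
    X = np.column_stack([np.ones(len(k)), k.astype(float)])
    beta = np.zeros(2)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1.0 - p)
        grad = X.T @ (y - p)                       # score of the log-likelihood
        hess = X.T @ (X * W[:, None]) + 1e-8 * np.eye(2)
        beta = beta + np.clip(np.linalg.solve(hess, grad), -3.0, 3.0)
    return beta

# synthetic fractional success rates: high for small k, low for large k
k = np.arange(1, 30)
y = 1.0 / (1.0 + np.exp(0.7 * (k - 12.0)))         # realizable logistic curve
b0, b1 = fit_logistic(k, y)
p_small = 1.0 / (1.0 + np.exp(-(b0 + b1 * 3.0)))   # deep in the success region
p_large = 1.0 / (1.0 + np.exp(-(b0 + b1 * 25.0)))  # deep in the failure region
```

The fitted slope is negative (success decays with sparsity $k$), and the fitted curve reproduces the sharp transition; contour plots like Figures 3 and 4 are obtained by repeating such fits across $n$.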
Figure 5. Empirical verification of Theorem 6.3 without noise. For the random windows $h_R^1, h_R^2, h_R^3$ with $n = 30$, the fraction of successful recovery based on $G_{h_R^1}$ (dash-dot), $G_{h_R^1}$ and $G_{h_R^2}$ (solid), and $G_{h_R^1}$, $G_{h_R^2}$ and $G_{h_R^3}$ (dash) test vectors.

In Figure 5 we illustrate the performance of Basis Pursuit when using multiple test signals as discussed in Section 6, in particular in Theorem 6.3. Figure 5 was obtained using the same procedure that provided Figure 2.

References

[1] D. Achlioptas. Database-friendly random projections. In
Proc. 20th Annual ACM SIGACT-SIGMOD-SIGART Symp. on Principles of Database Systems, pages 274–281, 2001.
[2] M. Akcakaya and V. Tarokh. Performance bounds on sparse representations using redundant frames. Preprint, 2007.
[3] W. O. Alltop. Complex sequences with low periodic correlations. IEEE Trans. Inf. Theory, 26(3):350–354, 1980.
[4] R. Baraniuk, M. Davenport, R. DeVore, and M. Wakin. A simple proof of the restricted isometry property for random matrices. Constr. Approx., to appear.
[5] P. Bello. Characterization of randomly time-variant linear channels. IEEE Trans. Commun., 11:360–393, 1963.
[6] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge Univ. Press, 2004.
[7] E. Candès, J. Romberg, and T. Tao. Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inf. Theory, 52(2):489–509, 2006.
[8] E. Candès, J. Romberg, and T. Tao. Stable signal recovery from incomplete and inaccurate measurements. Comm. Pure Appl. Math., 59(8):1207–1223, 2006.
[9] E. Candès and T. Tao. Near optimal signal recovery from random projections: universal encoding strategies? IEEE Trans. Inf. Theory, 52(12):5406–5425, 2006.
[10] I. Cavdar. Performance analysis in non-Rayleigh and non-Rician communication channels. Computers & Electrical Engineering, 28.
[11] M. Cetin and B. Sadler. Semi-blind sparse channel estimation with constant modulus symbols. In Proc. IEEE ICASSP 05, volume 3, pages iii/561–iii/564, Atlanta (GA), 2005.
[12] O. Christensen. An introduction to frames and Riesz bases. Applied and Numerical Harmonic Analysis. Birkhäuser Boston Inc., Boston, MA, 2003.
[13] L. M. Correia. Wireless Flexible Personalized Communications. John Wiley & Sons, Inc., New York, NY, USA, 2001.
[14] S. Cotter and B. Rao. Sparse channel estimation via matching pursuit with applications to equalization. IEEE Trans. on Comm., 50(3):374–377, 2002.
[15] D. Donoho. Compressed sensing. IEEE Trans. Inf. Theory, 52(4):1289–1306, 2006.
[16] D. Donoho, I. Drori, J.-L. Starck, and Y. Tsaig. Sparse solution of underdetermined linear equations by stagewise orthogonal matching pursuit. Preprint, 2006.
[17] D. Donoho and J. Tanner. Sparse nonnegative solutions of underdetermined linear equations by linear programming.
Proc. Nat. Acad. Sci., 102(27):9446–9451, 2005.
[18] D. Donoho and Y. Tsaig. Fast solution of l1-norm minimization problems when the solution may be sparse. Preprint, 2006.
[19] D. L. Donoho. Neighborly polytopes and sparse solutions of underdetermined linear equations. Preprint, 2005.
[20] D. L. Donoho. High-dimensional centrally symmetric polytopes with neighborliness proportional to dimension. Discrete Comput. Geom., 35(4):617–652, 2006.
[21] D. L. Donoho and M. Elad. Optimally sparse representations in general (non-orthogonal) dictionaries via ℓ1 minimization. Proc. Nat. Acad. Sci., 100:2197–2202, 2002.
[22] D. L. Donoho, M. Elad, and V. N. Temlyakov. Stable recovery of sparse overcomplete representations in the presence of noise. IEEE Trans. Inf. Theory, 52(1):6–18, 2006.
[23] D. L. Donoho and J. Tanner. Neighborliness of randomly projected simplices in high dimensions. Proc. Natl. Acad. Sci. USA, 102(27):9452–9457, 2005.
[24] D. L. Donoho and J. Tanner. Counting faces of randomly-projected polytopes when the projection radically lowers dimension. Preprint, 2006.
[25] P. Georgiev and A. Ralescu. Clustering on subspaces and sparse representation of signals. Circuits and Systems, 2:1843–1846, 2005.
[26] A. Gilbert and J. Tropp. Signal recovery from random measurements via orthogonal matching pursuit. IEEE Trans. Inform. Theory, to appear.
[27] R. Gribonval and P. Vandergheynst. On the exponential convergence of matching pursuits in quasi-incoherent dictionaries. IEEE Trans. Inform. Theory, 52(1):255–261, 2006.
[28] N. Grip and G. Pfander. A discrete model for the efficient analysis of time-varying narrowband communication channels. Multidim. Syst. Sign. Process., to appear.
[29] K. Gröchenig. Foundations of Time-Frequency Analysis. Applied and Numerical Harmonic Analysis. Birkhäuser, Boston, MA, 2001.
[30] D. Han, S.-P. Kim, and J. Principe. Sparse channel estimation with regularization method using convolution inequality for entropy. In Proc. IJCNN'05, International Joint Conf. on Neural Networks, volume 4, pages 2359–2362, August 2005.
[31] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer, 2001.
[32] M. Herman and T. Strohmer. High resolution radar via compressed sensing. Preprint, 2007.
[33] S. Kim, K. Koh, M. Lustig, S. Boyd, and D. Gorinevsky. A method for large-scale l1-regularized least squares problems with applications in signal processing and statistics. Preprint, 2007.
[34] W. Kozek and G. Pfander. Identification of operators with bandlimited symbols. SIAM J. Math. Anal., 37(3):867–888, 2006.
[35] F. Krahmer, G. Pfander, and P. Rashkov. Uncertainty principles for time-frequency representations on finite abelian groups. 2006.
[36] S. Kunis and H. Rauhut. Random sampling of sparse trigonometric polynomials II - orthogonal matching pursuit versus basis pursuit. Found. Comput. Math., to appear.
[37] J. Lawrence, G. Pfander, and D. Walnut. Linear independence of Gabor systems in finite dimensional vector spaces.
J. Fourier Anal. Appl., 11(6):715–726, 2005.
[38] M. Ledoux and M. Talagrand. Probability in Banach Spaces. Isoperimetry and Processes. Springer-Verlag, Berlin, Heidelberg, New York, 1991.
[39] D. Middleton. Channel modeling and threshold signal processing in underwater acoustics: An analytical overview. IEEE J. Oceanic Eng., 12(1):4–28, 1987.
[40] J. Parsons, D. Demery, and A. Turkmani. Sounding techniques for wideband mobile radio channels: a review. IEE Proceedings-I, 138(5):437–446, 1991.
[41] M. Pätzold. Mobile Fading Channels: Modelling, Analysis and Simulation. John Wiley & Sons, Inc., New York, NY, USA, 2001.
[42] G. Peskir and A. Shiryaev. The Khintchine inequalities and martingale expanding sphere of their action. Russ. Math. Surv., 50(5):849–904, 1995.
[43] G. Pfander and D. Walnut. Measurement of time-variant channels. IEEE Trans. Info. Theory, 52(11):4808–4820, 2006.
[44] H. Rauhut. Stability results for random sampling of sparse trigonometric polynomials. Preprint, 2006.
[45] H. Rauhut. Random sampling of sparse trigonometric polynomials. Appl. Comput. Harm. Anal., 22(1):16–42, 2007.
[46] H. Rauhut, K. Schnass, and P. Vandergheynst. Compressed sensing and redundant dictionaries. Preprint, 2006.
[47] M. Rudelson and R. Vershynin. Sparse reconstruction by convex relaxation: Fourier and Gaussian measurements. In Proc. CISS 2006 (40th Annual Conference on Information Sciences and Systems), 2006.
[48] S. Sanyal, S. Kukreja, E. Perreault, and D. Westwick. Identification of linear time varying systems using basis pursuit. In Proceedings of the IEEE EMBS 2005.
[49] M. Skolnik. Introduction to Radar Systems. McGraw-Hill Book Company, New York, 1980.
[50] M. Stojanovic. Underwater Acoustic Communications, volume 22, pages 688–698. John Wiley & Sons, 1999.
[51] T. Strohmer and R. W. Heath. Grassmannian frames with applications to coding and communication. Appl. Comput. Harmon. Anal., 14(3):257–275, 2003.
[52] J. Tropp. Greed is good: Algorithmic results for sparse approximation. IEEE Trans. Inf. Theory, 50(10):2231–2242, 2004.
[53] J. A. Tropp. Just relax: Convex programming methods for identifying sparse signals. IEEE Trans. Inf. Theory, 51(3):1030–1051, 2006.
[54] J. A. Tropp. On the conditioning of random subdictionaries.