Orthogonal Matching Pursuit: A Brownian Motion Analysis
Alyson K. Fletcher and Sundeep Rangan
Abstract—A well-known analysis of Tropp and Gilbert shows that orthogonal matching pursuit (OMP) can recover a k-sparse n-dimensional real vector from m = 4k log(n) noise-free linear measurements obtained through a random Gaussian measurement matrix with a probability that approaches one as n → ∞. This work strengthens this result by showing that a lower number of measurements, m = 2k log(n − k), is in fact sufficient for asymptotic recovery. More generally, when the sparsity level satisfies k_min ≤ k ≤ k_max but is unknown, m = 2 k_max log(n − k_min) measurements are sufficient. Furthermore, this number of measurements is also sufficient for detection of the sparsity pattern (support) of the vector with measurement errors, provided the signal-to-noise ratio (SNR) scales to infinity. The scaling m = 2k log(n − k) exactly matches the number of measurements required by the more complex lasso method for signal recovery with a similar SNR scaling.

Index Terms—compressed sensing, detection, lasso, orthogonal matching pursuit, random matrices, sparse approximation, sparsity, subset selection

This work was supported in part by a University of California President's Postdoctoral Fellowship. The material in this paper was presented in part at the Conference on Neural Information Processing Systems, Vancouver, BC, Canada, December 2009.

A. K. Fletcher (email: [email protected]) is with the Department of Electrical Engineering and Computer Sciences, University of California, Berkeley. S. Rangan (email: [email protected]) is with the Polytechnic Institute of New York University, Brooklyn, NY.
I. INTRODUCTION
Suppose x ∈ R^n is a sparse vector, meaning its number of nonzero entries k is smaller than n. The support of x is the set of locations of the nonzero entries and is sometimes called its sparsity pattern. A common sparse estimation problem is to infer the sparsity pattern of x from linear measurements of the form

y = Ax + w,   (1)

where A ∈ R^{m×n} is a known measurement matrix, y ∈ R^m represents a vector of measurements and w ∈ R^m is a vector of measurement errors (noise).

Sparsity pattern detection and related sparse estimation problems are classical problems in nonlinear signal processing and arise in a variety of applications including wavelet-based image processing [1] and statistical model selection in linear regression [2]. There has also been considerable recent interest in sparsity pattern detection in the context of compressed sensing, which focuses on large random measurement matrices A [3]–[5]. It is this scenario with random measurements that will be analyzed here.

Optimal subset recovery is NP-hard [6] and usually involves searches over all $\binom{n}{k}$ possible support sets of x. Thus, most attention has focused on approximate methods. One simple and popular approximate algorithm is orthogonal matching pursuit (OMP) [7]–[9]. OMP is a greedy method that identifies the location of one nonzero entry of x at a time. A version of the algorithm will be described in detail below in Section II. The best known analysis of the detection performance of OMP for large random matrices is due to Tropp and Gilbert [10], [11]. Among other results, Tropp and Gilbert show that when A has i.i.d. Gaussian entries, the measurements are noise-free (w = 0), and the number of measurements scales as

m ≥ (1 + δ) 4k log(n)   (2)

for some δ > 0, the OMP method will recover the correct sparsity pattern of x with a probability that approaches one as n and k → ∞. The analysis uses a deterministic sufficient condition for success on the matrix A based on a greedy selection ratio introduced in [12]. A similar deterministic condition on A was presented in [13], and a condition using the restricted isometry property was given in [14].

Numerical experiments reported in [10] suggest that a smaller number of measurements than (2) may be sufficient for asymptotic recovery with OMP. Specifically, the experiments suggest that the constant 4 can be reduced to 2. Our main result, Theorem 1 below, does a bit better than proving this conjecture. We show that the scaling in measurements

m ≥ (1 + δ) 2k log(n − k)   (3)

is sufficient for asymptotic reliable recovery with OMP provided both n − k and k → ∞. Theorem 1 goes further by allowing uncertainty in the sparsity level k.
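For concreteness, the following small Python sketch (not part of the paper; the example dimensions and δ = 0.1 are arbitrary choices) compares the Tropp–Gilbert measurement count (2) with the sharper count (3) for a few problem sizes:

```python
import numpy as np

def tropp_gilbert_measurements(n, k, delta=0.1):
    """Measurement count from the scaling (2): m >= (1+delta) * 4k log(n)."""
    return int(np.ceil((1 + delta) * 4 * k * np.log(n)))

def sharper_measurements(n, k, delta=0.1):
    """Measurement count from the scaling (3): m >= (1+delta) * 2k log(n-k)."""
    return int(np.ceil((1 + delta) * 2 * k * np.log(n - k)))

for n, k in [(1000, 10), (10_000, 50), (100_000, 200)]:
    print(n, k, tropp_gilbert_measurements(n, k), sharper_measurements(n, k))
```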
We also improve upon the Tropp–Gilbert analysis by accounting for the effect of the noise w. While the Tropp–Gilbert analysis requires that the measurements are noise-free, we show that the scaling (3) is also sufficient when there is noise w, provided the signal-to-noise ratio (SNR) goes to infinity.

The main significance of the new scaling (3) is that it exactly matches the conditions for sparsity pattern recovery using the well-known lasso method. The lasso method, which will be described in detail in Section IV, is based on a convex relaxation of the optimal detection problem. The best analysis of sparsity pattern recovery with lasso is due to Wainwright [15], [16]. He showed in [15] that under a similar high-SNR assumption, the scaling (3) in the number of measurements is both necessary and sufficient for asymptotic reliable sparsity pattern detection. The lasso method is often more complex than OMP, but it is widely believed to offset this disadvantage with superior performance [10]. Our results show that, at least for sparsity pattern recovery under our asymptotic assumptions, OMP performs at least as well as lasso. Hence, the additional complexity of lasso for these problems may not be warranted.

Neither lasso nor OMP is the best known approximate algorithm for sparsity pattern recovery. For example, when there is no noise in the measurements, the lasso minimization (15) can be replaced by

x̂ = arg min_{v ∈ R^n} ‖v‖_1,   s.t. y = Av.

A well-known analysis due to Donoho and Tanner [17] shows that, for i.i.d. Gaussian measurement matrices, this minimization will recover the correct vector with

m ≍ k log(n/m)   (4)

when k ≪ n. This scaling is fundamentally better than the scaling (3) achieved by OMP and lasso.

There are also several variants of OMP that have shown improved performance. The CoSaMP algorithm of Needell and Tropp [18] and the subspace pursuit algorithm of Dai and Milenkovic [19] achieve a scaling similar to (4). Other variants of OMP include stagewise OMP [20] and regularized OMP [21], [22]. Indeed, with the recent interest in compressed sensing, there is now a wide range of promising algorithms available. We do not claim that OMP achieves the best performance in any sense. Rather, we simply intend to show that both OMP and lasso have similar performance in certain scenarios.

Our proof of (3) follows along the same lines as Tropp and Gilbert's proof of (2), but with two key differences. First, we account for the effect of the noise by separately considering its effect in the "true" subspace and its orthogonal complement. Second, and more importantly, we address the "nasty independence issues" noted by Tropp and Gilbert [10] by providing a tighter bound on the maximum correlation of the incorrect vectors. Specifically, in each iteration of the OMP algorithm, there are n − k possible incorrect vectors that the algorithm can choose. Since the algorithm runs for k iterations, there is a total of k(n − k) possible error events. The Tropp and Gilbert proof bounds the probability of these error events with a union bound, essentially treating them as statistically independent. However, here we show that the energies on any one of the incorrect vectors across the k iterations are correlated. In fact, they are precisely described by samples of a certain normalized Brownian motion. Exploiting this correlation, we show that the tail bound on the error probability scales as if there were only n − k, rather than k(n − k), independent error events.
Sufficient conditions under weaker assumptions on the SNR are more subtle [16]: the scaling of the SNR with n determines the sequences of regularization parameters for which asymptotic almost-sure success is achieved, and the regularization parameter sequence affects the sufficient number of measurements.

Recall that our result is a sufficient condition for success, whereas the matching condition for lasso is both necessary and sufficient.
The outline of the remainder of this paper is as follows. Section II describes the OMP algorithm. Our main result, Theorem 1, is stated in Section III. A comparison to lasso is provided in Section IV, and we suggest some future problems in Section VII. The proof of the main result is somewhat long and is given in Section VIII. The main result was first reported in [23].

II. ORTHOGONAL MATCHING PURSUIT
To describe the algorithm, suppose we wish to determine the vector x from a vector y of the form (1). Let

I_true = { j : x_j ≠ 0 },   (5)

which is the support of the vector x. The set I_true will also be called the sparsity pattern. Let k = |I_true|, which is the number of nonzero entries of x. The OMP algorithm produces a sequence of estimates Î(t), t = 0, 1, 2, ..., of the sparsity pattern I_true, adding one index at a time. In the description below, let a_j denote the j-th column of A.

Algorithm 1 (Orthogonal Matching Pursuit):
Given a vector y ∈ R^m, a measurement matrix A ∈ R^{m×n}, and a threshold level µ > 0, compute an estimate Î_OMP of the sparsity pattern of x as follows:

1) Initialize t = 0 and Î(t) = ∅.
2) Compute P(t), the projection operator onto the orthogonal complement of the span of {a_i, i ∈ Î(t)}.
3) For each j, compute

ρ(t, j) = |a_j' P(t) y|² / ‖P(t) y‖²,

and let

[ρ*(t), i*(t)] = max_{j=1,...,n} ρ(t, j),   (6)

where ρ*(t) is the value of the maximum and i*(t) is an index that achieves the maximum.
4) If ρ*(t) > µ, set Î(t + 1) = Î(t) ∪ {i*(t)}. Also, increment t = t + 1 and return to step 2.
5) Otherwise stop. The final estimate of the sparsity pattern is Î_OMP = Î(t).

Note that since P(t) is the projection onto the orthogonal complement of the span of {a_j, j ∈ Î(t)}, for all j ∈ Î(t) we have P(t) a_j = 0. Hence, ρ(t, j) = 0 for all j ∈ Î(t), and therefore the algorithm will not select the same vector twice.

The algorithm above only provides an estimate, Î_OMP, of the sparsity pattern I_true. Using Î_OMP, one can estimate the vector x in a number of ways. For example, one can take the least-squares estimate,

x̂ = arg min ‖y − Av‖²,   (7)

where the minimization is over all vectors v such that v_j = 0 for all j ∉ Î_OMP. The estimate x̂ is the projection of the noisy vector y onto the space spanned by the vectors a_i with i in the sparsity pattern estimate Î_OMP. This paper only analyzes the sparsity pattern estimate Î_OMP itself, and not the vector estimate x̂.
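The following is a minimal Python sketch of Algorithm 1 (not the authors' code; the function name omp_detect and the small numerical guard are illustrative, and the squared-correlation statistic follows the reconstruction of (6) above):

```python
import numpy as np

def omp_detect(y, A, mu):
    """Minimal sketch of Algorithm 1: greedy support detection with a
    threshold-based stopping rule. Returns the estimated support as a set."""
    m, n = A.shape
    I_hat = []                       # current support estimate \hat{I}(t)
    resid = y.copy()                 # P(t) y, y projected onto the complement
    while True:
        denom = float(resid @ resid)
        if denom <= 1e-12:           # residual essentially zero: stop
            break
        # rho(t, j): squared normalized correlation of each column with P(t) y
        rho = np.abs(A.T @ resid) ** 2 / denom
        rho[I_hat] = 0.0             # columns already selected give rho = 0
        i_star = int(np.argmax(rho))
        if rho[i_star] <= mu or len(I_hat) >= m:
            break                    # stopping condition: rho*(t) <= mu
        I_hat.append(i_star)
        # recompute P(t+1) y by projecting y away from the selected columns
        Phi = A[:, I_hat]
        coef, *_ = np.linalg.lstsq(Phi, y, rcond=None)
        resid = y - Phi @ coef
    return set(I_hat)
```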
III. ASYMPTOTIC ANALYSIS
We analyze the OMP algorithm of the previous section under the following assumptions.
Assumption 1:
Consider a sequence of sparse recovery problems, indexed by the vector dimension n. For each n, let x ∈ R^n be a deterministic vector. Also assume:

(a) The sparsity level k = k(n) (i.e., the number of nonzero entries in x) satisfies

k(n) ∈ [k_min(n), k_max(n)]   (8)

for some deterministic sequences k_min(n) and k_max(n) with k_min(n) → ∞ as n → ∞ and k_max(n) < n/2 for all n.

(b) The number of measurements m = m(n) is a deterministic sequence satisfying

m ≥ (1 + δ) 2 k_max log(n − k_min)   (9)

for some δ > 0.

(c) The minimum component power x²_min satisfies

lim_{n→∞} k x²_min = ∞,   (10)

where

x_min = min_{j ∈ I_true} |x_j|   (11)

is the magnitude of the smallest nonzero entry of x.

(d) The power ‖x‖² of the vector satisfies

lim_{n→∞} (1/(n − k)^ε) log(‖x‖²) = 0   (12)

for all ε > 0.

(e) The vector y is a random vector generated by (1), where A and w have i.i.d. Gaussian entries with zero mean and variance 1/m.

Assumption 1(a) provides a range on the sparsity level k. As we will see below in Section V, bounds on this range are necessary for proper selection of the threshold level µ > 0.

Assumption 1(b) is the scaling law on the number of measurements that we will show is sufficient for asymptotic reliable recovery. In the special case when k is known, so that k_max = k_min = k, we obtain the simpler scaling law

m ≥ (1 + δ) 2k log(n − k).   (13)

We have contrasted this scaling law with the Tropp–Gilbert scaling law (2) in Section I. We will also compare it to the scaling law for lasso in Section IV.

Assumption 1(c) is critical and places constraints on the smallest component magnitude. The importance of the smallest component magnitude in the detection of the sparsity pattern was first recognized by Wainwright [15], [16], [24]. Also, as discussed in [25], the condition requires that the signal-to-noise ratio (SNR) goes to infinity. Specifically, if we define the SNR as

SNR = E‖Ax‖² / E‖w‖²,

then under Assumption 1(e) it can be easily checked that

SNR = ‖x‖².   (14)

Since x has k nonzero entries, ‖x‖² ≥ k x²_min, and therefore condition (10) requires that SNR → ∞. For this reason, we will call our analysis of OMP a high-SNR analysis. The analysis of OMP with SNR that remains bounded above is an interesting open problem.

Assumption 1(d) is technical and simply requires that the SNR does not grow too quickly with n. Note that even if SNR = O(k^α) for any α > 0, Assumption 1(d) will be satisfied.

Assumption 1(e) states that our analysis concerns large Gaussian measurement matrices A and Gaussian noise w.
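As a quick numerical sanity check of the SNR expression (14) (not part of the original paper; the dimensions, seed, and trial count below are arbitrary), one can generate the model of Assumption 1(e) and compare the empirical SNR to ‖x‖²:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 1000, 200, 10
x = np.zeros(n)
x[rng.choice(n, k, replace=False)] = 1.0       # k unit-magnitude nonzero entries

# Assumption 1(e): A and w have i.i.d. N(0, 1/m) entries
num, den = 0.0, 0.0
for _ in range(500):
    A = rng.normal(0.0, np.sqrt(1.0 / m), size=(m, n))
    w = rng.normal(0.0, np.sqrt(1.0 / m), size=m)
    num += np.sum((A @ x) ** 2)
    den += np.sum(w ** 2)

print("empirical SNR:", num / den)             # should be close to ...
print("||x||^2      :", np.sum(x ** 2))        # ... the prediction (14)
```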
Our main result is as follows.

Theorem 1: Under Assumption 1, there exists a sequence of threshold levels µ = µ(n) such that the OMP method in Algorithm 1 will asymptotically detect the correct sparsity pattern in that

lim_{n→∞} Pr( Î_OMP ≠ I_true ) = 0.

Moreover, the threshold levels µ can be selected simply as a function of k_min, k_max, n, m and δ.

Theorem 1 provides our main scaling law for OMP. The proof is given in Section VIII.
IV. COMPARISON TO LASSO PERFORMANCE

It is useful to compare the scaling law (13) to the number of measurements required by the widely used lasso method, described for example in [26]. The lasso method finds an estimate for the vector x in (1) by solving the quadratic program

x̂ = arg min_{v ∈ R^n} ‖y − Av‖²_2 + µ‖v‖_1,   (15)

where µ > 0 is an algorithm parameter that trades off the prediction error with the sparsity of the solution. Lasso is sometimes referred to as basis pursuit denoising [27]. While the optimization (15) is convex, the running time of lasso is significantly longer than that of OMP unless A has some particular structure [10]. However, it is generally believed that lasso has superior performance.

The best analysis of lasso for sparsity pattern recovery with large random matrices is due to Wainwright [15], [16]. There, it is shown that with an i.i.d. Gaussian measurement matrix and white Gaussian noise, the condition (13) is necessary for asymptotic reliable detection of the sparsity pattern. In addition, under the condition (10) on the minimum component magnitude, the scaling (13) is also sufficient. We thus conclude that OMP requires an identical scaling in the number of measurements to lasso. Therefore, at least for sparsity pattern recovery from measurements with large random Gaussian measurement matrices and high SNR, there is no additional performance improvement from the more complex lasso method over OMP.
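For illustration only (not from the paper), a minimal proximal-gradient (ISTA) sketch for solving the lasso program (15); the function name, step-size rule, and iteration count are arbitrary choices:

```python
import numpy as np

def lasso_ista(y, A, mu, n_iter=500):
    """Minimal ISTA sketch for min_v ||y - A v||^2 + mu * ||v||_1.
    The step size is set from the spectral norm of A."""
    step = 1.0 / (2.0 * np.linalg.norm(A, 2) ** 2)   # 1/L for the quadratic term
    v = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * A.T @ (A @ v - y)               # gradient of ||y - Av||^2
        u = v - step * grad
        v = np.sign(u) * np.maximum(np.abs(u) - step * mu, 0.0)  # soft threshold
    return v
```

A support estimate can then be read off as the set of indices where |v_j| exceeds a small tolerance.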
V. THRESHOLD SELECTION AND STOPPING CONDITIONS

In many problems, the sparsity level k is not known a priori and must be detected as part of the estimation process. In OMP, the sparsity level of the estimate vector is precisely the number of iterations conducted before the algorithm terminates. Thus, reliable sparsity level estimation requires a good stopping condition.

When the measurements are noise-free and one is concerned only with exact signal recovery, the optimal stopping condition is simple: the algorithm should simply stop whenever there is no more error; that is, ρ*(t) = 0 in (6). However, with noise, selecting the correct stopping condition requires some care. The OMP method as described in Algorithm 1 uses a stopping condition based on testing whether ρ*(t) > µ for some threshold µ.

One of the appealing features of Theorem 1 is that it provides a simple sufficient condition under which this threshold mechanism will detect the correct sparsity level. Specifically, Theorem 1 provides a range k ∈ [k_min, k_max] under which there exists a threshold such that the OMP algorithm will terminate in the correct number of iterations. The larger the number of measurements m, the wider one can make the range [k_min, k_max]. The formula for the threshold level is given later in (22).

In practice, one may deliberately want to stop the OMP algorithm after fewer iterations than the "true" sparsity level. As the OMP method proceeds, the detection becomes less reliable, and it is sometimes useful to stop the algorithm whenever there is a high chance of error. Stopping early may miss some small entries, but it may result in an overall better estimate by not introducing too many erroneous entries or entries with too much noise. However, since our analysis is only concerned with exact sparsity pattern recovery, we do not consider this type of stopping condition.

VI. NUMERICAL SIMULATIONS
To verify the above analysis, we simulated the OMP algorithm with fixed signal dimension n = 100 and different sparsity levels k, numbers of measurements m, and randomly generated vectors x.

In the first experiment, x ∈ R^n was generated with k randomly placed nonzero values, with all the nonzero entries having the same magnitude |x_j| = C for some C > 0. Following Assumption 1(e), the measurement matrix A ∈ R^{m×n} and noise vector w ∈ R^m were generated with i.i.d. N(0, 1/m) entries. Using (14) and the fact that x has k nonzero entries with power C², the SNR is given by

SNR = ‖x‖² = kC²,

so the SNR can be controlled by varying C.

Fig. 1 plots the probability that the OMP algorithm incorrectly detected the sparsity pattern for different values of k and m. The probability is estimated with 1000 Monte Carlo simulations per (k, m) pair. For each k and m, the threshold level µ was selected as the one with the lowest probability of error, assuming, of course, that the same µ is used across all 1000 Monte Carlo runs.

The solid curve in Fig. 1 is the theoretical number of measurements in (13) from Theorem 1 that guarantees exact sparsity recovery. The formula is theoretically valid as n → ∞ and SNR → ∞. At finite problem sizes, the probability of error for m satisfying (13) will be nonzero.
Fig. 1. OMP performance prediction. The colored bars show the probability of sparsity pattern misdetection based on 1000 Monte Carlo simulations of the OMP algorithm. The signal dimension is fixed to n = 100 and the error probability is plotted against the number of measurements m and sparsity level k; the left panel is the noise-free case and the right panel has SNR = 20 dB. The solid black curve shows the theoretical number of measurements m = 2k log(n − k) sufficient for asymptotic reliable detection.

However, Fig. 1 shows that for the problem size in the simulation, the probability of error for OMP is indeed low for values of m greater than the theoretical level. When there is no noise (i.e., SNR = ∞), the probability of error is between 3 and 5% for most values of k. When the SNR is 20 dB, the probability of error is between 15 and 20%. In either case, the formula provides a reasonable prediction of the threshold in the number of measurements at which the OMP method succeeds.

Theorem 1 is only a sufficient condition. It is possible that for some x, OMP could require a number of measurements smaller than predicted by (13). That is, the number of measurements (13) may not be necessary.

To illustrate such a case, we consider vectors with a nonzero dynamic range of component magnitudes. Fig. 2 shows the probability of sparsity pattern detection as a function of m for vectors x with different dynamic ranges. Specifically, the k nonzero entries of x were chosen to have powers uniformly distributed over a range of 0, 10 and 20 dB. In this simulation, we used k = 20 and n = 100, so the sufficient condition predicted by (13) is m ≈ 2(20) log(80) ≈ 175. When the dynamic range is 0 dB, all the nonzero entries have equal magnitude, and the probability of error at the value m = 136 is approximately 3%. However, with a dynamic range of 10 dB, the same probability of error can be achieved with significantly fewer measurements, well below the sufficient condition in (13). With a dynamic range of 20 dB, the required number of measurements decreases further.

This possible benefit of dynamic range in OMP-like algorithms has been observed in [28], [29] and in sparse Bayesian learning [30], [31]. A valuable line of future research would be to see if this benefit can be quantified. That is, it would be useful to develop a sufficient condition tighter than (13) that accounts for the dynamic range of the signals.
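A minimal Monte Carlo sketch in the spirit of these experiments (illustrative only, not the authors' code; it reuses the hypothetical omp_detect sketch given after Algorithm 1, and the trial count and example parameters are arbitrary):

```python
import numpy as np

def error_probability(n, k, m, C, mu, trials=200, seed=0):
    """Estimate Pr(sparsity pattern misdetection) for OMP on the model (1),
    with k nonzero entries of magnitude C and i.i.d. N(0, 1/m) matrix/noise."""
    rng = np.random.default_rng(seed)
    errors = 0
    for _ in range(trials):
        support = set(rng.choice(n, k, replace=False).tolist())
        x = np.zeros(n)
        x[list(support)] = C * rng.choice([-1.0, 1.0], size=k)
        A = rng.normal(0.0, np.sqrt(1.0 / m), size=(m, n))
        w = rng.normal(0.0, np.sqrt(1.0 / m), size=m)
        y = A @ x + w
        if omp_detect(y, A, mu) != support:   # omp_detect: sketch after Algorithm 1
            errors += 1
    return errors / trials

# Example call (arbitrary parameters; mu loosely follows (22) with eps near 0):
# p = error_probability(n=100, k=10, m=120, C=1.0, mu=2 * np.log(90) / 120)
```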
Fig. 2. OMP performance and dynamic range. Plotted is the probability of sparsity pattern detection as a function of the number of measurements for random vectors x with various dynamic ranges (0, 10 and 20 dB), together with the level m = 2k log(n − k). In all cases, n = 100, k = 20 and SNR = ∞.

VII. CONCLUSIONS AND FUTURE WORK
We have provided an improved scaling law on the number of measurements for asymptotic reliable sparsity pattern detection with OMP. Most importantly, the scaling law exactly matches the scaling needed by lasso under similar conditions.

However, much about the performance of OMP is still not fully understood. In particular, our analysis is limited to high SNR. It would be interesting to see whether reasonable sufficient conditions can be derived for finite SNR as well. Also, our analysis has been restricted to exact sparsity pattern recovery. However, in many problems, especially with noise, it is not necessary to detect every element in the sparsity pattern. It would be useful if partial support recovery results such as those in [32]–[34] could be obtained for OMP.

VIII. PROOF OF THEOREM 1

A. Proof Outline
The main difficulty in analyzing OMP is the statistical dependencies between iterations of the OMP algorithm. Following along the lines of the Tropp–Gilbert proof in [10], we avoid these difficulties by considering the following alternate "genie" algorithm. A similar alternate algorithm is analyzed in [28] as well.

1) Initialize t = 0 and I_true(t) = ∅.
2) Compute P_true(t), the projection operator onto the orthogonal complement of the span of {a_i, i ∈ I_true(t)}.
3) For all j = 1, ..., n, compute

ρ_true(t, j) = |a_j' P_true(t) y|² / ‖P_true(t) y‖²,   (16)

and let

[ρ*_true(t), i*(t)] = max_{j ∈ I_true} ρ_true(t, j).   (17)

4) If t < k, set I_true(t + 1) = I_true(t) ∪ {i*(t)}. Increment t = t + 1 and return to step 2.
5) Otherwise stop. The final estimate of the sparsity pattern is I_true(k).

This "genie" algorithm is identical to the regular OMP method in Algorithm 1, except that it runs for precisely k iterations as opposed to using a threshold µ for the stopping condition. Also, in the maximization in (17), the genie algorithm searches over only the correct indices j ∈ I_true. Hence, this genie algorithm can never select an incorrect index j ∉ I_true. Also, as in the regular OMP algorithm, the genie algorithm will never select the same vector twice for almost all vectors y. Therefore, after k iterations, the genie algorithm will have selected all the k indices in I_true and terminate with the correct sparsity pattern estimate I_true(k) = I_true with probability one.

The reason to consider the sequences P_true(t) and I_true(t) instead of P(t) and Î(t) is that the quantities P_true(t) and I_true(t) depend only on the vector y and the columns a_j for j ∈ I_true. The vector y also depends only on a_j for j ∈ I_true and the noise vector w. Hence, P_true(t) and I_true(t) are statistically independent of all the columns a_j, j ∉ I_true. This property will be essential in bounding the "false alarm" probability to be defined shortly.

Now, a simple induction argument shows that if

min_{t=0,...,k−1} max_{j ∈ I_true} ρ_true(t, j) > µ,   (18a)
max_{t=0,...,k−1} max_{j ∉ I_true} ρ_true(t, j) < µ,   (18b)

then the regular OMP algorithm, Algorithm 1, will terminate in k iterations. Moreover, the OMP algorithm will output P(t) = P_true(t), Î(t) = I_true(t), and ρ(t, j) = ρ_true(t, j) for all t and j. This will in turn result in the OMP algorithm detecting the correct sparsity pattern, Î_OMP = I_true. So, we need to show that the two events in (18a) and (18b) occur with high probability.

To this end, define the following two probabilities:

p_MD = Pr( min_{t=0,...,k−1} max_{j ∈ I_true} ρ_true(t, j) ≤ µ ),   (19)
p_FA = Pr( max_{t=0,...,k−1} max_{j ∉ I_true} ρ_true(t, j) ≥ µ ).   (20)

Both probabilities are implicitly functions of n. The first term, p_MD, can be interpreted as a "missed detection" probability, since it corresponds to the event that the maximum correlation energy ρ_true(t, j) on the correct vectors j ∈ I_true falls below the threshold. We call the second term p_FA the "false alarm" probability since it corresponds to the maximum energy on one of the "incorrect" indices j ∉ I_true exceeding the threshold.

The above arguments show that

Pr( Î_OMP ≠ I_true ) ≤ p_MD + p_FA.
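As an illustrative sketch (not from the paper; the function name and structure are hypothetical), the two quantities compared against µ in (18a)–(18b) can be computed for a given realization by running the genie recursion for k iterations:

```python
import numpy as np

def genie_statistics(y, A, support):
    """Run the 'genie' recursion for k = |support| iterations, restricting the
    selection to the true support, and return (min over t of the best correct
    statistic, max over t of the best incorrect statistic); cf. (18a)-(18b)."""
    support = list(support)
    selected, correct_stats, incorrect_stats = [], [], []
    incorrect = [j for j in range(A.shape[1]) if j not in support]
    for _ in range(len(support)):
        if selected:
            Phi = A[:, selected]
            coef, *_ = np.linalg.lstsq(Phi, y, rcond=None)
            resid = y - Phi @ coef              # P_true(t) y
        else:
            resid = y
        rho = np.abs(A.T @ resid) ** 2 / float(resid @ resid)
        remaining = [j for j in support if j not in selected]
        correct_stats.append(max(rho[j] for j in remaining))
        incorrect_stats.append(max(rho[j] for j in incorrect))
        selected.append(max(remaining, key=lambda j: rho[j]))  # genie stays in I_true
    # (18a) holds iff min(correct_stats) > mu; (18b) iff max(incorrect_stats) < mu
    return min(correct_stats), max(incorrect_stats)
```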
So we need to show that there exists a sequence of thresholds µ = µ(n) > 0 such that p_MD → 0 and p_FA → 0 as n → ∞. We will define the threshold level in Section VIII-B. Sections VIII-C and VIII-D then prove that p_MD → 0 with this threshold. The difficult part of the proof is to show p_FA → 0. This part is proven in Section VIII-G after some preliminary results in Sections VIII-E and VIII-F.

B. Threshold Selection
We will first select the threshold sequence µ(n). Given δ > 0 in (9), let ε > 0 be such that

(1 + δ)/(1 + ε) ≥ 1 + ε.   (21)

Then, define the threshold level

µ = µ(n) = (2(1 + ε)/m) log(n − k_min).   (22)

Observe that since k ≥ k_min, (22) implies that

µ ≥ (2(1 + ε)/m) log(n − k).   (23)

Also, since k ≤ k_max, (9), (21) and (22) show that

µ ≤ 1/((1 + ε) k).   (24)
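A small sketch (illustrative; the helper name and the particular choice of ε are not from the paper) of how the threshold sequence (22) can be computed from the problem parameters:

```python
import numpy as np

def omp_threshold(n, m, k_min, delta):
    """Compute the threshold mu of (22), choosing the largest eps > 0
    allowed by (21), i.e. (1 + eps)^2 = 1 + delta."""
    eps = np.sqrt(1.0 + delta) - 1.0     # satisfies (1+delta)/(1+eps) = 1+eps
    # Under (9), this mu also satisfies the upper bound (24).
    return 2.0 * (1.0 + eps) * np.log(n - k_min) / m
```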
C. Decomposition Representation and Related Bounds

To bound the missed detection probability, it is easiest to analyze the OMP algorithm in two separate subspaces: the span of the vectors {a_j, j ∈ I_true}, and its orthogonal complement. This subsection defines some notation for this orthogonal decomposition and proves some simple bounds. The actual limit of the missed detection probability will then be evaluated in the next subsection, Section VIII-D.

Assume without loss of generality that I_true = {1, 2, ..., k}, so that the vector x is supported on the first k elements. Let Φ be the m × k matrix formed by the k correct columns:

Φ = [a_1, a_2, ..., a_k].

Also, let x_true = [x_1, x_2, ..., x_k]' be the vector of the k nonzero entries, so that

Ax = Φ x_true.   (25)

Now rewrite the noise vector w as

w = Φv + w⊥,   (26)

where

v = (Φ'Φ)^{-1} Φ' w,   w⊥ = w − Φv.   (27)

The vectors Φv and w⊥ are, respectively, the projections of the noise vector w onto the k-dimensional range space of Φ and onto its orthogonal complement. Combining (25) with (26), we can rewrite (1) as

y = Φz + w⊥,   (28)

where

z = x_true + v.   (29)

We begin by computing the limits of the norms of the measurement vector y and the projected noise vector w⊥.

Lemma 1:
The limits

lim_{n→∞} ‖y‖²/(1 + ‖x‖²) = 1,   lim_{n→∞} ‖w⊥‖² = 1

hold almost surely and in probability.

Proof:
The vector w is Gaussian, zero mean and white with variance 1/m per entry. Therefore, its projection w⊥ will also be white in the (m − k)-dimensional orthogonal complement of the range of Φ with variance 1/m per dimension. Therefore, by the strong law of large numbers,

lim_{n→∞} ‖w⊥‖² = lim_{n→∞} (m − k)/m = 1,

where the last step follows from the fact that (9) implies that k/m → 0.

Similarly, it is easily verified that since A and w have i.i.d. Gaussian entries with variance 1/m, the vector y has i.i.d. Gaussian entries with per-entry variance (‖x‖² + 1)/m. Again, the strong law of large numbers shows that

lim_{n→∞} ‖y‖²/(1 + ‖x‖²) = 1.

We next need to compute the minimum singular value of Φ.

Lemma 2:
Let σ_min(Φ) and σ_max(Φ) be the minimum and maximum singular values of Φ, respectively. Then

lim_{n→∞} σ_min(Φ) = lim_{n→∞} σ_max(Φ) = 1,

where the limits are in probability.

Proof:
Since the matrix Φ has i.i.d. N(0, 1/m) entries, the Marčenko–Pastur theorem [35] states that

lim_{n→∞} σ_min(Φ) = lim_{n→∞} (1 − √(k/m)) = 1,
lim_{n→∞} σ_max(Φ) = lim_{n→∞} (1 + √(k/m)) = 1,

where the limits are in probability. The result now follows from (9), which implies that k/m → 0 as n → ∞.
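As an illustrative numerical check (not in the paper; the dimensions and seed are arbitrary) of the Marčenko–Pastur limits used in Lemma 2, one can compare the extreme singular values of a tall i.i.d. N(0, 1/m) matrix with 1 ∓ √(k/m):

```python
import numpy as np

rng = np.random.default_rng(0)
m, k = 4000, 100                      # k/m small, in the spirit of (9)
Phi = rng.normal(0.0, np.sqrt(1.0 / m), size=(m, k))
s = np.linalg.svd(Phi, compute_uv=False)
print("sigma_min:", s[-1], " vs 1 - sqrt(k/m):", 1 - np.sqrt(k / m))
print("sigma_max:", s[0],  " vs 1 + sqrt(k/m):", 1 + np.sqrt(k / m))
```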
We can also bound the singular values of submatrices of Φ. Given a subset I ⊆ {1, 2, ..., k}, let Φ_I be the submatrix of Φ formed by the columns a_i for i ∈ I. Also, let P_I be the projection onto the orthogonal complement of the span of the set {a_i, i ∈ I}. We have the following bound.

Lemma 3: Let I and J be any two disjoint subsets of indices such that I ∪ J = {1, 2, ..., k}. Then

σ_min(Φ_J' P_I Φ_J) ≥ σ²_min(Φ).

Proof:
The matrix S = [Φ_I Φ_J] is identical to Φ except that the columns may be permuted. In particular, σ_min(S) = σ_min(Φ). Therefore,

S'S = [ Φ_I'Φ_I  Φ_I'Φ_J ; Φ_J'Φ_I  Φ_J'Φ_J ] ⪰ σ²_min(S) I = σ²_min(Φ) I ⪰ [ 0  0 ; 0  σ²_min(Φ) I ].

The Schur complement (see, for example, [36]) now shows that

Φ_J'Φ_J − σ²_min(Φ) I ⪰ Φ_J'Φ_I (Φ_I'Φ_I)^{-1} Φ_I'Φ_J,

or equivalently,

Φ_J' ( I − Φ_I (Φ_I'Φ_I)^{-1} Φ_I' ) Φ_J ⪰ σ²_min(Φ) I.

The result now follows from the fact that P_I = I − Φ_I (Φ_I'Φ_I)^{-1} Φ_I'.

We also need the following tail bound on chi-squared random variables.
Lemma 4:
Suppose X_i, i = 1, 2, ..., is a sequence of real-valued, scalar Gaussian random variables with X_i ~ N(0, 1). The variables need not be independent. Let M_k be the maximum

M_k = max_{i=1,...,k} |X_i|.

Then

limsup_{k→∞} M²_k/(2 log k) ≤ 1,

where the limit is in probability.

Proof:
See for example [28].

This bound permits us to bound the minimum component of z.

Lemma 5:
Let z_min be the minimum component value

z_min = min_{j=1,...,k} |z_j|.   (30)

Then

liminf_{n→∞} z_min/x_min ≥ 1,

where the limit is in probability and x_min is defined in (11).

Proof:
Since w is zero mean and Gaussian, so is v as defined in (27). Also, the covariance of v is bounded above by

E[vv'] (a)= (Φ'Φ)^{-1} Φ' (E[ww']) Φ (Φ'Φ)^{-1} (b)= (1/m)(Φ'Φ)^{-1} (c)⪯ (1/m) σ^{-2}_min(Φ) I,

where (a) follows from the definition of v in (27); (b) follows from the assumption that E[ww'] = (1/m) I_m; and (c) is a basic property of singular values. This implies that for every i ∈ {1, 2, ..., k},

E|v_i|² ≤ (1/m) σ^{-2}_min(Φ).

Applying Lemma 4 shows that

limsup_{k→∞} m v²_max σ²_min(Φ)/(2 log k) ≤ 1,   (31)

where v_max = max_{i=1,...,k} |v_i|. Therefore,

lim_{n→∞} v²_max/x²_min = lim_{n→∞} ( m v²_max/(2 log k) )( 2 log(k)/(m x²_min) )
(a) ≤ lim_{n→∞} ( m v²_max σ²_min(Φ)/(2 log k) )( 2 log(k)/(m x²_min) )
(b) ≤ lim_{n→∞} 2 log(k)/(m x²_min)
(c) ≤ lim_{n→∞} 2 log(n − k)/(m x²_min)
(d) ≤ lim_{n→∞} 1/((1 + δ) k x²_min)
(e) = 0,

where all the limits are in probability and (a) follows from Lemma 2; (b) follows from (31); (c) follows from the fact that k < n/2 and hence k < n − k; (d) follows from (9); and (e) follows from (10).

Now, for j ∈ {1, 2, ..., k},

|z_j| = |x_j + v_j| ≥ |x_j| − |v_j|,

and therefore z_min ≥ x_min − v_max. Hence,

z_min/x_min ≥ 1 − v_max/x_min → 1,

where again the limit is in probability.

D. Probability of Missed Detection
With the bounds in the previous section, we can now show that the probability of missed detection goes to zero. The proof is similar to Tropp and Gilbert's proof in [10], with some modifications to account for the noise.

For any t ∈ {0, 1, ..., k − 1}, let J(t) = I_true ∩ I_true(t)^c, which is the set of indices j ∈ I_true that are not yet detected in iteration t of the genie algorithm in Section VIII-A. Then

Φz = Φ_{I_true(t)} z_{I_true(t)} + Φ_{J(t)} z_{J(t)},   (32)

where (using the notation of the previous subsection) Φ_I denotes the submatrix of Φ formed by the columns with indices i ∈ I, and z_I denotes the corresponding subvector. Now since P_true(t) is the projection onto the orthogonal complement of the span of {a_i, i ∈ I_true(t)},

P_true(t) Φ_{I_true(t)} = 0.   (33)

Also, since w⊥ is orthogonal to a_i for all i ∈ I_true and I_true(t) ⊆ I_true,

P_true(t) w⊥ = w⊥.   (34)
Therefore,

P_true(t) y (a)= P_true(t)(Φz + w⊥) (b)= P_true(t)(Φ_{J(t)} z_{J(t)} + w⊥) (c)= P_true(t) Φ_{J(t)} z_{J(t)} + w⊥,   (35)

where (a) follows from (28); (b) follows from (32) and (33); and (c) follows from (34).

Now using (34) and the fact that w⊥ is orthogonal to a_i for all i ∈ I_true, we have

a_i' P_true(t) w⊥ = a_i' w⊥ = 0   (36)

for all i ∈ I_true. Since the columns of Φ_{J(t)} are formed by vectors a_i with i ∈ I_true,

Φ_{J(t)}' P_true(t) w⊥ = 0.   (37)

Combining (37) and (35),

‖P_true(t) y‖² = ‖P_true(t) Φ_{J(t)} z_{J(t)}‖² + ‖w⊥‖².   (38)

Now for all t, we have that

max_{j ∈ I_true} ρ_true(t, j)
(a) = (1/‖P_true(t)y‖²) max_{j ∈ I_true} |a_j' P_true(t) y|²
(b) = (1/‖P_true(t)y‖²) max_{j ∈ J(t)} |a_j' P_true(t) y|²
(c) = (1/‖P_true(t)y‖²) ‖Φ_{J(t)}' P_true(t) y‖²_∞
(d) ≥ (1/(|J(t)| ‖P_true(t)y‖²)) ‖Φ_{J(t)}' P_true(t) y‖²
(e) = ‖Φ_{J(t)}' P_true(t) Φ_{J(t)} z_{J(t)}‖² / ( |J(t)| ‖P_true(t)y‖² )
(f) = ‖Φ_{J(t)}' P_true(t) Φ_{J(t)} z_{J(t)}‖² / ( |J(t)| ( ‖P_true(t) Φ_{J(t)} z_{J(t)}‖² + ‖w⊥‖² ) )
(g) ≥ σ²_min(Φ_{J(t)}' P_true(t) Φ_{J(t)}) ‖z_{J(t)}‖² / ( |J(t)| ( σ²_max(Φ) ‖z_{J(t)}‖² + ‖w⊥‖² ) )
(h) ≥ σ⁴_min(Φ) ‖z_{J(t)}‖² / ( |J(t)| ( σ²_max(Φ) ‖z_{J(t)}‖² + ‖w⊥‖² ) )
(i) ≥ σ⁴_min(Φ) z²_min / ( σ²_max(Φ) k z²_min + ‖w⊥‖² ),   (39)

where (a) follows from the definition of ρ_true(t, j) in (16); (b) follows from the fact that P_true(t) a_j = 0 for all j ∈ I_true(t), and hence the maximum will occur on the set j ∈ I_true ∩ I_true(t)^c = J(t); (c) follows from the fact that Φ_{J(t)} is the matrix of the columns a_j with j ∈ J(t); (d) follows from the bound ‖v‖² ≤ d ‖v‖²_∞ for any v ∈ R^d; (e) follows from (35) and (37); (f) follows from (38); (g) follows from the fact that P_true(t) is a projection operator and hence

σ_max(P_true(t) Φ_{J(t)}) ≤ σ_max(Φ_{J(t)}) ≤ σ_max(Φ);

(h) follows from Lemma 3; and (i) follows from the bound ‖z_{J(t)}‖² ≥ |J(t)| z²_min and |J(t)| ≤ k.

Therefore,

liminf_{n→∞} min_{t=0,...,k−1} max_{j ∈ I_true} ρ_true(t, j)/µ
(a) ≥ liminf_{n→∞} (1/µ) σ⁴_min(Φ) z²_min / ( σ²_max(Φ) k z²_min + ‖w⊥‖² )
(b) ≥ liminf_{n→∞} (1/µ) z²_min / ( k z²_min + 1 )
(c) ≥ liminf_{n→∞} (1/µ) x²_min / ( k x²_min + 1 )
(d) ≥ liminf_{n→∞} 1/(kµ)
(e) ≥ 1 + ε,   (40)

where (a) follows from (39); (b) follows from Lemmas 1 and 2; (c) follows from Lemma 5; (d) follows from the assumption of the theorem that k x²_min → ∞; and (e) follows from (24). The definition of p_MD in (19) now shows that

lim_{n→∞} p_MD = 0.

E. Bounds on Normalized Brownian Motions
Let B(t) be a standard Brownian motion. Define the normalized Brownian motion S(t) as the process

S(t) = B(t)/√t,   t > 0.   (41)

We call the process normalized since

E|S(t)|² = (1/t) E|B(t)|² = t/t = 1.
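As an illustrative check (not in the paper; the times s, t and the trial count are arbitrary) of the autocorrelation derived next, one can simulate the normalized process and compare the empirical correlation with √(s/t):

```python
import numpy as np

rng = np.random.default_rng(0)
s, t, trials = 2.0, 8.0, 200_000

# B(t) = B(s) + (B(t) - B(s)) with independent Gaussian increments
Bs = rng.normal(0.0, np.sqrt(s), trials)
Bt = Bs + rng.normal(0.0, np.sqrt(t - s), trials)
Ss, St = Bs / np.sqrt(s), Bt / np.sqrt(t)     # normalized process S(t) = B(t)/sqrt(t)

print("empirical E[S(s)S(t)]:", np.mean(Ss * St))
print("predicted sqrt(s/t)  :", np.sqrt(s / t))
```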
We first characterize the autocorrelation of this process.

Lemma 6: If t > s, the normalized Brownian motion has autocorrelation

E[S(t)S(s)] = √(s/t).

Proof:
Write

S(t) = (1/√t)( B(s) + B(t) − B(s) ).

Thus,

E[S(t)S(s)] = (1/√(st)) E[ ( B(s) + (B(t) − B(s)) ) B(s) ] (a)= (1/√(st)) E[ B(s)² ] (b)= s/√(st) = √(s/t),

where (a) follows from the orthogonal increments property of Brownian motion; and (b) follows from the fact that B(s) ~ N(0, s).

We now need the following standard Gaussian tail bound.

Lemma 7:
Suppose X is a real-valued, scalar Gaussian random variable, X ~ N(0, 1). Then,

Pr( X > µ ) ≤ (1/(√(2π) µ)) exp(−µ²/2).

Proof:
See for example [37].
We next provide a simple bound on the maximum of sample paths of S(t).

Lemma 8:
For any 0 < a < b, let

S_max(a, b) = sup_{t ∈ [a,b]} |S(t)|.

Then, for any µ > 0,

Pr( S_max(a, b) > µ ) ≤ (2/µ) √(2b/(πa)) exp( −aµ²/(2b) ).

Proof:
Since S(t) and −S(t) are identically distributed,

Pr( S_max(a, b) > µ ) ≤ 2 Pr( sup_{t ∈ [a,b]} S(t) > µ ).   (42)

So, it will suffice to bound the probability of the single-sided event sup S(t) > µ. For t ≥ 0, define B_a(t) = B(a + t) − B(a). Then B_a(t) is a standard Brownian motion independent of B(a). Also,

sup_{t ∈ [a,b]} S(t) > µ ⟹ sup_{t ∈ [a,b]} (1/√t) B(t) > µ ⟹ sup_{t ∈ [a,b]} B(t) > √a µ ⟹ B(a) + sup_{t ∈ [0, b−a]} B_a(t) > √a µ.

Now, the reflection principle (see, for example, [38]) states that for any y > 0,

Pr( max_{t ∈ [0, b−a]} B_a(t) > y ) = 2 Pr( √(b − a) Y > y ),

where Y is a unit-variance, zero-mean Gaussian. Also, B(a) ~ N(0, a), so if we define X = (1/√a) B(a), then X ~ N(0, 1). Since B(a) is independent of B_a(t) for all t ≥ 0, we can write

Pr( sup_{t ∈ [a,b]} S(t) > µ ) ≤ 2 Pr( √a X + √(b − a) Y > √a µ ),   (43)

where X and Y are independent zero-mean Gaussian random variables with unit variance. Now √a X + √(b − a) Y has variance

E[ ( √a X + √(b − a) Y )² ] = a + b − a = b.

Applying Lemma 7 shows that (43) can be bounded by

Pr( sup_{t ∈ [a,b]} S(t) > µ ) ≤ (1/µ) √(2b/(πa)) exp( −aµ²/(2b) ).

Substituting this bound in (42) proves the lemma.

Our next lemma improves the bound for large µ.

Lemma 9:
There exist constants C_0, C_1, and C_2 such that for any 0 < a < b and µ > C_0,

Pr( S_max(a, b) > µ ) ≤ ( C_1 + C_2 log(b/a) ) e^{−µ²/2}.

Proof:
Fix any integer n > 0, and define t_i = a (b/a)^{i/n} for i = 0, 1, ..., n. Observe that the t_i partition the interval [a, b] in that

a = t_0 < t_1 < · · · < t_n = b.

Also, let r = b/a. Then t_{i+1}/t_i = (b/a)^{1/n} = r^{1/n}. Applying Lemma 8 to each interval in the partition,

Pr( S_max(a, b) > µ ) ≤ Σ_{i=0}^{n−1} Pr( S_max(t_i, t_{i+1}) > µ ) ≤ (2√2/√π)( n r^{1/(2n)}/µ ) exp( −r^{−1/n} µ²/2 ).   (44)

Now, let δ > 0, and for µ² > δ, let

n = ⌈ −log(r)/log(1 − δ/µ²) ⌉.   (45)

Then

r^{−1/n} ≥ 1 − δ/µ²,   (46)

and hence

exp( −r^{−1/n} µ²/2 ) ≤ e^{δ/2} e^{−µ²/2}.   (47)

Also, (45) implies that

n ≤ 1 − log(r)/log(1 − δ/µ²) ≤ 1 + (µ²/δ) log(r),   (48)

where we have used the fact that log(1 − x) < −x for x > 0. Combining the bounds (46) and (48) yields

n r^{1/(2n)}/µ ≤ ( 1 + (µ²/δ) log(r) ) / √(µ² − δ).   (49)

Now, pick any δ > 0 and let C_0 = 2δ. Then if µ > C_0 = 2δ, (49) implies that

n r^{1/(2n)}/µ ≤ (1/δ)( 1 + 2 log(r) ).   (50)

Substituting (47) and (50) into (44) shows that

Pr( S_max(a, b) > µ ) ≤ ( C_1 + C_2 log(r) ) e^{−µ²/2},

where

C_1 = 2√2 e^{δ/2}/(√π δ),   C_2 = 4√2 e^{δ/2}/(√π δ).

The result now follows from the fact that r = b/a.

F. Bounds on Sequences of Projections
We can now apply the results in the previous subsection to bound the norms of sequences of projections. Let y ∈ R^m be any deterministic vector, and let P(i), i = 0, 1, ..., k, be a deterministic sequence of orthogonal projection operators on R^m. Assume that the sequence P(i) is decreasing in that P(i)P(j) = P(j) for j > i.

Lemma 10:
Let a ∈ R^m be a Gaussian random vector with zero mean and i.i.d. unit-variance entries, and define the random variable

M = max_{i=0,...,k} |a' P(i) y| / ‖P(i) y‖.

Then there exist constants C_0, C_1, and C_2 > 0 (all independent of the problem parameters) such that µ > C_0 implies

Pr( M > µ ) ≤ ( C_1 + C_2 log(r) ) e^{−µ²/2},

where r = ‖P(0) y‖² / ‖P(k) y‖².

Proof:
Define

z_i = a' P(i) y / ‖P(i) y‖,

so that M = max_{i=0,...,k} |z_i|. Since each z_i is the inner product of the Gaussian vector a with a fixed vector, the scalars {z_i, i = 0, 1, ..., k} are jointly Gaussian. Since a has mean zero, so do the z_i's.

To compute the cross-correlations, suppose that j ≥ i. Then

E[z_i z_j] = (1/(‖P(i)y‖ ‖P(j)y‖)) E[ y' P(i) a a' P(j) y ]
(a) = (1/(‖P(i)y‖ ‖P(j)y‖)) y' P(i) P(j) y
(b) = (1/(‖P(i)y‖ ‖P(j)y‖)) y' P(j) y = ‖P(j)y‖/‖P(i)y‖,

where (a) uses the fact that E[aa'] = I_m; and (b) uses the decreasing property that P(i)P(j) = P(j). Therefore, if we let t_i = ‖P(i)y‖², we have the cross-correlations

E[z_i z_j] = √( t_j / t_i )   (51)

for all j ≥ i. Also observe that since the projection operators are decreasing, so are the t_i's. That is, for j ≥ i,

t_j = ‖P(j)y‖² (a)= ‖P(j)P(i)y‖² (b)≤ ‖P(i)y‖² = t_i,

where again (a) uses the decreasing property; and (b) uses the fact that P(j) is a projection operator and hence norm non-increasing.

Now let S(t) be the normalized Brownian motion in (41). Lemma 6 and (51) show that the Gaussian vector

z = (z_0, z_1, ..., z_k)

has the same covariance as the vector of samples of S(t),

s = ( S(t_0), S(t_1), ..., S(t_k) ).

Since they are also both zero-mean and Gaussian, they have the same distribution. Hence, for all µ,

Pr( M > µ ) = Pr( max_{i=0,...,k} |z_i| > µ ) = Pr( max_{i=0,...,k} |S(t_i)| > µ ) ≤ Pr( sup_{t ∈ [t_k, t_0]} |S(t)| > µ ),

where the last step follows from the fact that the t_i's are decreasing and hence t_k ≤ t_i ≤ t_0 for all i ∈ {0, 1, ..., k}. The result now follows from Lemma 9.
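As an illustrative numerical check (not in the paper; the dimensions, indices, and the helper project_out are hypothetical) of the covariance identity (51), one can draw a fresh Gaussian vector a, apply a nested sequence of projections to a fixed y, and compare the empirical correlation of z_i and z_j with √(t_j/t_i):

```python
import numpy as np

rng = np.random.default_rng(0)
m, trials = 50, 100_000
y = rng.normal(size=m)

# Nested projections: P(i) projects onto the orthogonal complement of the
# span of the first i columns of a fixed matrix, so the ranges decrease in i.
V = rng.normal(size=(m, 6))
def project_out(u, cols):
    if cols == 0:
        return u
    Q, _ = np.linalg.qr(V[:, :cols])
    return u - Q @ (Q.T @ u)

i, j = 1, 4                                   # check a single pair with j >= i
Pi_y, Pj_y = project_out(y, i), project_out(y, j)
t_i, t_j = Pi_y @ Pi_y, Pj_y @ Pj_y           # t_i = ||P(i) y||^2

A = rng.normal(size=(trials, m))              # rows play the role of the vector a
z_i = (A @ Pi_y) / np.sqrt(t_i)
z_j = (A @ Pj_y) / np.sqrt(t_j)
print("empirical E[z_i z_j]   :", np.mean(z_i * z_j))
print("predicted sqrt(t_j/t_i):", np.sqrt(t_j / t_i))
```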
G. Probability of False Alarm

Recall that all the projection operators P_true(t) and the vector y are statistically independent of the vectors a_j for j ∉ I_true. Since the entries of the matrix A are i.i.d. Gaussian with zero mean and variance 1/m, the vector √m a_j is Gaussian with i.i.d. unit-variance entries. Hence, Lemma 10 shows that there exist constants C_0, C_1, and C_2 such that for any λ > C_0,

Pr( max_{t=0,...,k} √m |a_j' P_true(t) y| / ‖P_true(t) y‖ ≥ λ ) ≤ B e^{−λ²/2},   (52)

where j ∉ I_true and

B = C_1 + C_2 log( ‖P_true(0) y‖² / ‖P_true(k) y‖² ).   (53)

Therefore,

p_FA (a)= Pr( max_{t=0,...,k−1} max_{j ∉ I_true} ρ_true(t, j) > µ )
(b) ≤ (n − k) max_{j ∉ I_true} Pr( max_{t=0,...,k−1} ρ_true(t, j) > µ )
(c) = (n − k) max_{j ∉ I_true} Pr( max_{t=0,...,k−1} |a_j' P_true(t) y| / ‖P_true(t) y‖ > √µ )
(d) ≤ (n − k) B e^{−mµ/2}
(e) ≤ (n − k) B e^{−(1+ε) log(n−k)} = B/(n − k)^ε,   (54)

where (a) follows from the definition of p_FA in (20); (b) uses the union bound and the fact that I_true^c has n − k elements; (c) follows from the definition of ρ_true(t, j) in (16); (d) follows from (52) under the condition that √(mµ) > C_0; and (e) follows from (23).

By (23) and the hypothesis of the theorem that n − k → ∞,

mµ ≥ 2(1 + ε) log(n − k) → ∞

as n → ∞. Therefore, for sufficiently large n, √(mµ) > C_0 and (54) holds.

Now, since I_true(0) = ∅, P_true(0) = I and therefore

P_true(0) y = y.   (55)

Also, I_true(k) = I_true and so P_true(k) is the projection onto the orthogonal complement of the range of Φ. Hence P_true(k)Φ = 0. Combining this fact with (28) and (34) shows

P_true(k) y = w⊥.   (56)

Therefore,

limsup_{n→∞} p_FA (a)≤ limsup_{n→∞} (1/(n − k)^ε) B
(b) ≤ limsup_{n→∞} (1/(n − k)^ε)( C_1 + C_2 log( ‖P_true(0)y‖² / ‖P_true(k)y‖² ) )
(c) = limsup_{n→∞} (1/(n − k)^ε)( C_1 + C_2 log( ‖y‖² / ‖w⊥‖² ) )
(d) = limsup_{n→∞} (1/(n − k)^ε)( C_1 + C_2 log(1 + ‖x‖²) )
(e) = 0,

where (a) follows from (54); (b) follows from (53); (c) follows from (55) and (56); (d) follows from Lemma 1; and (e) follows from (12). This completes the proof of the theorem.
ACKNOWLEDGMENTS
The authors thank Vivek Goyal for comments on an earlier draft and Martin Vetterli for his support, wisdom, and encouragement.
REFERENCES

[1] S. Mallat, A Wavelet Tour of Signal Processing, 2nd ed. Academic Press, 1999.
[2] A. Miller, Subset Selection in Regression, 2nd ed., ser. Monographs on Statistics and Applied Probability. New York: Chapman & Hall/CRC, 2002, no. 95.
[3] E. J. Candès, J. Romberg, and T. Tao, "Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information," IEEE Trans. Inform. Theory, vol. 52, no. 2, pp. 489–509, Feb. 2006.
[4] D. L. Donoho, "Compressed sensing," IEEE Trans. Inform. Theory, vol. 52, no. 4, pp. 1289–1306, Apr. 2006.
[5] E. J. Candès and T. Tao, "Near-optimal signal recovery from random projections: Universal encoding strategies?" IEEE Trans. Inform. Theory, vol. 52, no. 12, pp. 5406–5425, Dec. 2006.
[6] B. K. Natarajan, "Sparse approximate solutions to linear systems," SIAM J. Computing, vol. 24, no. 2, pp. 227–234, Apr. 1995.
[7] S. Chen, S. A. Billings, and W. Luo, "Orthogonal least squares methods and their application to non-linear system identification," Int. J. Control, vol. 50, no. 5, pp. 1873–1896, Nov. 1989.
[8] Y. C. Pati, R. Rezaiifar, and P. S. Krishnaprasad, "Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition," in Conf. Rec. 27th Asilomar Conf. Sig., Sys., & Comput., vol. 1, Pacific Grove, CA, Nov. 1993, pp. 40–44.
[9] G. Davis, S. Mallat, and Z. Zhang, "Adaptive time-frequency decomposition," Optical Eng., vol. 33, no. 7, pp. 2183–2191, Jul. 1994.
[10] J. A. Tropp and A. C. Gilbert, "Signal recovery from random measurements via orthogonal matching pursuit," IEEE Trans. Inform. Theory, vol. 53, no. 12, pp. 4655–4666, Dec. 2007.
[11] ——, "Signal recovery from random measurements via orthogonal matching pursuit: The Gaussian case," California Inst. of Tech., Appl. Comput. Math. 2007-01, Aug. 2007.
[12] J. A. Tropp, "Greed is good: Algorithmic results for sparse approximation," IEEE Trans. Inform. Theory, vol. 50, no. 10, pp. 2231–2242, Oct. 2004.
[13] D. L. Donoho, M. Elad, and V. N. Temlyakov, "Stable recovery of sparse overcomplete representations in the presence of noise," IEEE Trans. Inform. Theory, vol. 52, no. 1, pp. 6–18, Jan. 2006.
[14] M. A. Davenport and M. B. Wakin, "Analysis of orthogonal matching pursuit using the restricted isometry property," IEEE Trans. Inform. Theory, vol. 56, no. 9, pp. 4395–4401, Sep. 2010.
[15] M. J. Wainwright, "Sharp thresholds for high-dimensional and noisy recovery of sparsity," Univ. of California, Berkeley, Dept. of Statistics, Tech. Rep., May 2006, arXiv:math.ST/0605740 v1 30 May 2006.
[16] ——, "Sharp thresholds for high-dimensional and noisy sparsity recovery using ℓ1-constrained quadratic programming (lasso)," IEEE Trans. Inform. Theory, vol. 55, no. 5, pp. 2183–2202, May 2009.
[17] D. L. Donoho and J. Tanner, "Counting faces of randomly-projected polytopes when the projection radically lowers dimension," J. Amer. Math. Soc., vol. 22, no. 1, pp. 1–53, Jan. 2009.
[18] D. Needell and J. A. Tropp, "CoSaMP: Iterative signal recovery from incomplete and inaccurate samples," Appl. Comput. Harm. Anal., vol. 26, no. 3, pp. 301–321, May 2009.
[19] W. Dai and O. Milenkovic, "Subspace pursuit for compressive sensing signal reconstruction," IEEE Trans. Inform. Theory, vol. 55, no. 5, pp. 2230–2249, May 2009.
[20] D. L. Donoho, Y. Tsaig, I. Drori, and J. L. Starck, "Sparse solution of underdetermined linear equations by stagewise orthogonal matching pursuit," preprint, Mar. 2006.
[21] D. Needell and R. Vershynin, "Uniform uncertainty principle and signal recovery via regularized orthogonal matching pursuit," Found. Comput. Math., vol. 9, no. 3, pp. 317–334, Jun. 2009.
[22] ——, "Signal recovery from incomplete and inaccurate measurements via regularized orthogonal matching pursuit," IEEE J. Sel. Topics Signal Process., vol. 4, no. 2, pp. 310–316, Apr. 2010.
[23] A. K. Fletcher and S. Rangan, "Orthogonal matching pursuit from noisy measurements: A new analysis," in Proc. Neural Information Process. Syst., Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, and A. Culotta, Eds., Vancouver, Canada, Dec. 2009.
[24] M. J. Wainwright, "Information-theoretic limits on sparsity recovery in the high-dimensional and noisy setting," IEEE Trans. Inform. Theory, vol. 55, no. 12, pp. 5728–5741, Dec. 2009.
[25] A. K. Fletcher, S. Rangan, and V. K. Goyal, "Necessary and sufficient conditions for sparsity pattern recovery," IEEE Trans. Inform. Theory, vol. 55, no. 12, pp. 5758–5772, Dec. 2009.
[26] R. Tibshirani, "Regression shrinkage and selection via the lasso," J. Royal Stat. Soc., Ser. B, vol. 58, no. 1, pp. 267–288, 1996.
[27] S. S. Chen, D. L. Donoho, and M. A. Saunders, "Atomic decomposition by basis pursuit," SIAM J. Sci. Comp., vol. 20, no. 1, pp. 33–61, 1999.
[28] A. K. Fletcher, S. Rangan, and V. K. Goyal, "On–off random access channels: A compressed sensing framework," arXiv:0903.1022v1 [cs.IT], Mar. 2009.
[29] ——, "A sparsity detection framework for on–off random access channels," in Proc. IEEE Int. Symp. Inform. Theory, Seoul, Korea, Jun.–Jul. 2009, pp. 169–173.
[30] D. Wipf and B. Rao, "Comparing the effects of different weight distributions on finding sparse representations," in Proc. Neural Information Process. Syst., Vancouver, Canada, Dec. 2006.
[31] Y. Jin and B. Rao, "Performance limits of matching pursuit algorithms," in Proc. IEEE Int. Symp. Inform. Theory, Toronto, Canada, Jun. 2008, pp. 2444–2448.
[32] M. Akçakaya and V. Tarokh, "Shannon-theoretic limits on noisy compressive sampling," IEEE Trans. Inform. Theory, vol. 56, no. 1, pp. 492–504, Jan. 2010.
[33] G. Reeves, "Sparse signal sampling using noisy linear projections," Univ. of California, Berkeley, Dept. of Elec. Eng. and Comp. Sci., Tech. Rep. UCB/EECS-2008-3, Jan. 2008.
[34] S. Aeron, V. Saligrama, and M. Zhao, "Information theoretic bounds for compressed sensing," IEEE Trans. Inform. Theory, vol. 56, no. 10, pp. 5111–5130, Oct. 2010.
[35] V. A. Marčenko and L. A. Pastur, "Distribution of eigenvalues for some sets of random matrices," Math. USSR–Sbornik, vol. 1, no. 4, pp. 457–483, 1967.
[36] R. A. Horn and C. R. Johnson, Matrix Analysis. Cambridge Univ. Press, 1985, reprinted with corrections 1987.
[37] J. Spanier and K. B. Oldham, An Atlas of Functions. Washington: Hemisphere Publishing, 1987.
[38] I. Karatzas and S. E. Shreve, Brownian Motion and Stochastic Calculus, 2nd ed. New York: Springer-Verlag, 1991.