Generalized Orthogonal Matching Pursuit
Jian Wang, Student Member, IEEE, Seokbeop Kwon, Student Member, IEEE, and Byonghyo Shim, Senior Member, IEEE
Abstract—As a greedy algorithm to recover sparse signals from compressed measurements, the orthogonal matching pursuit (OMP) algorithm has received much attention in recent years. In this paper, we introduce an extension of the OMP for pursuing efficiency in reconstructing sparse signals. Our approach, henceforth referred to as generalized OMP (gOMP), is literally a generalization of the OMP in the sense that N indices are identified per iteration. Owing to the selection of multiple "correct" indices, the gOMP algorithm is finished with a much smaller number of iterations when compared to the OMP. We show that the gOMP can perfectly reconstruct any K-sparse signal (K > 1), provided that the sensing matrix satisfies the RIP with δ_NK < √N/(√K + 3√N). We also demonstrate by empirical simulations that the gOMP has excellent recovery performance comparable to the ℓ₁-minimization technique, with fast processing speed and competitive computational complexity.

Index Terms—Compressive sensing (CS), orthogonal matching pursuit, sparse recovery, restricted isometry property (RIP).
I. INTRODUCTION
As a paradigm to acquire sparse signals at a rate significantly below the Nyquist rate, compressive sensing has attracted much attention in recent years [1]–[13]. The goal of compressive sensing is to recover the sparse vector using a small number of linearly transformed measurements. The process of acquiring compressed measurements is referred to as sensing, while that of recovering the original sparse signals from compressed measurements is called reconstruction. In the sensing operation, a K-sparse signal vector x, i.e., an n-dimensional vector having at most K nonzero elements, is transformed into m-dimensional measurements y via a matrix multiplication with Φ. The measurements are expressed as

y = Φx. (1)

Since n > m for most compressive sensing scenarios, the system in (1) can be classified as an underdetermined system having more unknowns than observations. Clearly, it is in general impossible to obtain an accurate reconstruction of the original input x using a conventional "inverse" transform of Φ. However, it is now well known that with prior information on the signal sparsity and a condition imposed on Φ, x can be reconstructed by solving the ℓ₁-minimization problem [6]:

min_x ||x||₁ subject to Φx = y. (2)

J. Wang, S. Kwon, and B. Shim are with the School of Information and Communication, Korea University, Seoul, Korea (email: {jwang, sbkwon, bshim}@isl.korea.ac.kr).

Copyright (c) 2012 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to [email protected].

This work was supported by the KCC (Korea Communications Commission), Korea, under the R&D program supervised by the KCA (Korea Communication Agency) (KCA-12-911-01-110) and the NRF grant funded by the Korea government (MEST) (No. 2011-0012525). A part of this paper was presented at the Asilomar Conference on Signals, Systems & Computers, Nov. 2011.
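Since (1) is underdetermined, a conventional inverse cannot single out x. The toy sketch below (dimensions, seed, and sparsity are illustrative assumptions, not values from the paper) shows that the minimum-ℓ₂-norm solution fits y exactly yet is dense, which is what motivates the sparsity-aware program in (2).

```python
# Sketch: the underdetermined system (1) has infinitely many solutions; the
# minimum-l2-norm one (via the pseudoinverse) fits y exactly but is not sparse.
# Dimensions m, n, K and the seed are illustrative choices, not from the paper.
import numpy as np

rng = np.random.default_rng(0)
m, n, K = 30, 60, 4
Phi = rng.standard_normal((m, n)) / np.sqrt(m)

x = np.zeros(n)
x[rng.choice(n, size=K, replace=False)] = rng.standard_normal(K)
y = Phi @ x                               # compressed measurements, m < n

x_l2 = np.linalg.pinv(Phi) @ y            # minimum-l2-norm solution of Phi x = y
print(np.allclose(Phi @ x_l2, y))         # consistent with the measurements
print(int(np.sum(np.abs(x_l2) > 1e-10)))  # but dense: far more than K nonzeros
```

The min-norm solution matches y because y lies in the range of Φ, yet it spreads energy over essentially all n coordinates instead of the K true ones.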
A widely used condition on Φ ensuring the exact recovery of x is called the restricted isometry property (RIP) [3]. A sensing matrix Φ is said to satisfy the RIP of order K if there exists a constant δ ∈ (0, 1) such that

(1 − δ)||x||₂² ≤ ||Φx||₂² ≤ (1 + δ)||x||₂² (3)

for any K-sparse vector x (||x||₀ ≤ K). In particular, the minimum of all constants δ satisfying (3) is referred to as the isometry constant δ_K. It has been shown that x can be perfectly recovered by solving the ℓ₁-minimization problem if δ_2K < √2 − 1 [6]. While the ℓ₁-norm is convex and hence the problem can be solved via a linear programming (LP) technique, the complexity associated with the LP is cubic (i.e., O(n³)) in the size of the original vector to be recovered [14], so that the complexity is burdensome for many real applications.

Recently, greedy algorithms sequentially investigating the support of x have received considerable attention as cost-effective alternatives to the LP approach. Algorithms in this category include orthogonal matching pursuit (OMP) [8], regularized OMP (ROMP) [15], stagewise OMP (StOMP) [2], subspace pursuit (SP) [16], and compressive sampling matching pursuit (CoSaMP) [17]. As a representative method in the greedy algorithm family, the OMP has been widely used due to its simplicity and competitive performance. Tropp and Gilbert [8] showed that, for a K-sparse vector x and an m × n Gaussian sensing matrix Φ, the OMP recovers x from y = Φx with overwhelming probability if the number of measurements follows m ∼ K log n.
Wakin and Davenport showed that the OMP can exactly reconstruct all K-sparse vectors if δ_{K+1} < 1/(3√K) [7], and Wang and Shim recently improved the condition to δ_{K+1} < 1/(√K + 1) [18].

The main principle behind the OMP is simple and intuitive: in each iteration, the column of Φ maximally correlated with the residual is chosen (identification), the index of this column is added to the list (augmentation), and then the vestige of the columns in the list is eliminated from the measurements, generating a new residual used for the next iteration (residual update). Among these, the computational complexity of the OMP is dominated by the identification and the residual update steps. In the k-th iteration, the identification requires a matrix-vector multiplication, so that the number of floating point operations (flops) becomes (2m − 1)n. The main operation of the residual update is to compute the estimate of x, which is completed by obtaining the least squares (LS) solution; the required flops is approximately 4km. Additionally, 2km flops are required to perform the residual update. Considering that the algorithm requires K iterations, the total number of flops of the OMP is about 2Kmn + 3K²m. Clearly, the sparsity K plays an important role in the complexity of the OMP. When the signal being recovered is not very sparse, therefore, the OMP may not be an excellent choice.

There have been some studies on the modification of the OMP, mainly on the identification step, to improve the computational efficiency and recovery performance. In [2], a method identifying more than one index in each iteration was proposed. In this approach, referred to as the StOMP, indices whose correlation magnitude exceeds a deliberately designed threshold are chosen. It is shown that while achieving performance comparable to the ℓ₁-minimization technique, the StOMP runs much faster than the OMP as well as the ℓ₁-minimization technique [2]. In [15], another variation of the OMP, so-called ROMP, was proposed.
After choosing a set of K indices with the largest correlations in magnitude, the ROMP narrows down the candidates by selecting a subset satisfying a predefined regularization rule. It is shown that the ROMP algorithm exactly recovers K-sparse signals under δ_2K < 0.03/√(log K) [19]. While the main focus of the StOMP and ROMP algorithms is on the modification of the identification step, the SP and CoSaMP algorithms require an additional operation, called the pruning step, to refine the signal estimate recursively.

Our approach lies on similar ground to these approaches in the sense that we pursue a reduction in complexity through modification of the identification step of the OMP. Specifically, towards reducing the complexity and speeding up the execution time of the algorithm, we choose multiple indices in each iteration. While previous efforts employ special treatment in the identification step, such as thresholding [2] (for StOMP) or regularization [15] (for ROMP), the proposed method pursues a direct extension of the OMP by choosing the indices corresponding to the N (≥ 1) largest correlations in magnitude. Therefore, our approach, henceforth referred to as generalized OMP (gOMP), is literally a generalization of the OMP and embraces the OMP as a special case (N = 1). Owing to the selection of multiple indices, multiple "correct" indices (i.e., indices in the support set) are added to the list, and the algorithm is finished with a much smaller number of iterations when compared to the OMP. Indeed, in both empirical simulations and complexity analysis, we observe that the gOMP achieves a substantial reduction in the number of calculations with competitive reconstruction performance.

The primary contributions of this paper are twofold:

• We present an extension of the OMP, termed gOMP, for pursuing efficiency in reconstructing sparse signals. Our empirical simulation shows that the recovery performance of the gOMP is comparable to the LP technique as well as to modified OMP algorithms (e.g., CoSaMP and StOMP).
• We develop a perfect recovery condition of the gOMP. To be specific, we show that the RIP of order NK with δ_NK < √N/(√K + 3√N) (K > 1) is sufficient for the gOMP to exactly recover any K-sparse vector within K iterations (Theorem 3.8). As a special case of the gOMP, we show that a sufficient condition of the OMP is given by δ_{K+1} < 1/(√K + 1). Also, we extend our work to the reconstruction of sparse signals in the presence of noise and obtain a bound on the estimation error.

It has been brought to our attention that, in parallel to our effort, the orthogonal multi matching pursuit (OMMP) or orthogonal super greedy algorithm (OSGA) [20] suggested a treatment similar to the one posted in this paper. A similar approach has also been introduced in [21]. Nevertheless, our work is sufficiently distinct from these works in the sufficient recovery condition analysis. Further, we also provide an analysis of the noisy scenario (upper bound of the recovery distortion in ℓ₂-norm) for which there is no counterpart in the OMMP and OSGA study.

The rest of this paper is organized as follows. In Section II, we introduce the proposed gOMP algorithm and provide empirical experiments on the reconstruction performance. In Section III, we provide the RIP based analysis of the gOMP guaranteeing the perfect reconstruction of K-sparse signals. We also revisit the OMP algorithm as a special case of the gOMP and obtain a sufficient condition ensuring the recovery of K-sparse signals. In Section IV, we study the reconstruction performance of the gOMP under a noisy measurement scenario. In Section V, we discuss the complexity of the gOMP algorithm, and we conclude the paper in Section VI.

We briefly summarize the notations used in this paper. Let Ω = {1, 2, · · · , n}; then T = {i | i ∈ Ω, x_i ≠ 0} denotes the support of the vector x. For D ⊆ Ω, |D| is the cardinality of D. T − D = T \ (T ∩ D) is the set of all elements contained in T but not in D. x_D ∈ R^{|D|} is a restriction of the vector x to the elements with indices in D. Φ_D ∈ R^{m×|D|} is a submatrix of Φ that contains only the columns indexed by D.
If Φ_D has full column rank, then Φ†_D = (Φ′_D Φ_D)⁻¹ Φ′_D is the pseudoinverse of Φ_D. span(Φ_D) is the span of the columns in Φ_D. P_D = Φ_D Φ†_D is the projection onto span(Φ_D). P⊥_D = I − P_D is the projection onto the orthogonal complement of span(Φ_D).

II. gOMP ALGORITHM
In each iteration of the gOMP algorithm, the correlations between the columns of Φ and the modified measurements (residual) are compared, and the indices of the N columns with maximal correlation are chosen as the new elements of the estimated support set Λ^k. As a trivial case, when N = 1, the gOMP returns to the OMP. Denoting the N indices as φ(1), · · · , φ(N), where φ(i) = arg max_{j ∈ Ω\{φ(i−1),···,φ(1)}} |⟨r^{k−1}, ϕ_j⟩|, the extended support set at the k-th iteration becomes Λ^k = Λ^{k−1} ∪ {φ(1), · · · , φ(N)}. After obtaining the LS solution x̂_{Λ^k} = arg min_u ||y − Φ_{Λ^k} u||₂ = Φ†_{Λ^k} y, the residual r^k is updated by subtracting Φ_{Λ^k} x̂_{Λ^k} from the measurements y. In other words, the projection of y onto the orthogonal complement of span(Φ_{Λ^k}) becomes the new residual (i.e., r^k = P⊥_{Λ^k} y). These operations are repeated until either the iteration number reaches the maximum k_max = min(K, m/N) or the ℓ₂-norm of the residual falls below a prespecified threshold (||r^k||₂ ≤ ε).

It is worth mentioning that the residual r^k of the gOMP is orthogonal to the columns of Φ_{Λ^k}, since

⟨Φ_{Λ^k}, r^k⟩ = ⟨Φ_{Λ^k}, P⊥_{Λ^k} y⟩ (4)
= Φ′_{Λ^k} P⊥_{Λ^k} y (5)
= Φ′_{Λ^k} (P⊥_{Λ^k})′ y (6)
= (P⊥_{Λ^k} Φ_{Λ^k})′ y = 0, (7)

where (6) follows from the symmetry of P⊥_{Λ^k} (P⊥_{Λ^k} = (P⊥_{Λ^k})′) and (7) is due to

P⊥_{Λ^k} Φ_{Λ^k} = (I − P_{Λ^k}) Φ_{Λ^k} = Φ_{Λ^k} − Φ_{Λ^k} Φ†_{Λ^k} Φ_{Λ^k} = 0.

Here we note that this property is satisfied when Φ_{Λ^k} has full column rank, which is true if k ≤ m/N in the gOMP operation. It is clear from this observation that the indices in Λ^k cannot be re-selected in the succeeding iterations, and the cardinality of Λ^k is simply kN. When the iteration loop of the gOMP is finished, therefore, it is possible that the final support set Λ^s contains indices not in T.
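The iteration just described can be rendered as a short sketch (NumPy; all instance parameters are illustrative assumptions, and with N = 1 the routine reduces to the OMP):

```python
# A minimal sketch of the gOMP iteration (identification, augmentation,
# estimation, residual update). Dimensions, seed, and coefficient values are
# illustrative assumptions, not taken from the paper.
import numpy as np

def gomp(Phi, y, K, N, eps=1e-9):
    m, n = Phi.shape
    Lambda = []                        # estimated support set
    r = y.copy()                       # residual r^0 = y
    x_ls = np.zeros(0)
    k = 0
    while np.linalg.norm(r) > eps and k < min(K, m // N):
        k += 1
        c = np.abs(Phi.T @ r)          # identification: correlations
        c[Lambda] = -1.0               # selected indices cannot reappear
        Lambda.extend(np.argsort(c)[-N:].tolist())  # N largest in magnitude
        x_ls, *_ = np.linalg.lstsq(Phi[:, Lambda], y, rcond=None)  # estimation
        r = y - Phi[:, Lambda] @ x_ls  # residual update: r^k = P_perp y
    x_hat = np.zeros(n)
    x_hat[Lambda] = x_ls
    return x_hat, Lambda, r

rng = np.random.default_rng(7)
m, n, K, N = 80, 160, 5, 3
Phi = rng.standard_normal((m, n)) / np.sqrt(m)
T = rng.choice(n, size=K, replace=False)
x = np.zeros(n)
x[T] = rng.uniform(1.0, 2.0, size=K) * rng.choice([-1.0, 1.0], size=K)
y = Phi @ x
x_hat, Lambda, r = gomp(Phi, y, K, N)
```

On such well-conditioned instances the selected support typically absorbs all of T within a few iterations, and the returned residual is numerically orthogonal to the selected columns, matching (4)–(7).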
Note that even when Λ^s contains indices not in T, the final result is unaffected and the original signal is recovered (provided that Λ^s contains all of T), because

x̂_{Λ^s} = Φ†_{Λ^s} y (8)
= (Φ′_{Λ^s} Φ_{Λ^s})⁻¹ Φ′_{Λ^s} Φ_T x_T (9)
= (Φ′_{Λ^s} Φ_{Λ^s})⁻¹ Φ′_{Λ^s} (Φ_{Λ^s} x_{Λ^s}) − (Φ′_{Λ^s} Φ_{Λ^s})⁻¹ Φ′_{Λ^s} Φ_{Λ^s−T} x_{Λ^s−T} (10)
= x_{Λ^s}, (11)

where (10) follows from the fact that x_{Λ^s−T} = 0. From this observation, we deduce that as long as at least one correct index is found in each iteration of the gOMP, the original signal is perfectly recovered within K iterations. In practice, however, the number of correct indices being selected is usually more than one, so that the required number of iterations is much smaller than K.

In order to observe the empirical performance of the gOMP algorithm, we performed computer simulations. In our experiment, we use the testing strategy in [16], [22], which measures the effectiveness of recovery algorithms by checking the empirical frequency of exact reconstruction in the noiseless environment. By comparing the maximal sparsity level of the underlying sparse signals at which perfect recovery is ensured (this point is often called the critical sparsity [16]), the accuracy of the reconstruction algorithms can be compared empirically. In our simulation, the following algorithms are considered:
1) LP technique for solving the ℓ₁-minimization problem (http://cvxr.com/cvx/);
2) OMP algorithm;
3) gOMP algorithm;
4) StOMP with false alarm control (FAC) based thresholding (http://sparselab.stanford.edu/). Since the FAC scheme outperforms the false discovery control (FDC) scheme, we exclusively use the FAC scheme in our simulation.
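A toy version of this testing strategy can be sketched as follows. The sizes and trial count below are illustrative assumptions, much smaller than the setup used for the paper's figures, with PAM-like coefficients from {±1, ±3}:

```python
# Sketch of the empirical-frequency testing strategy at toy scale.
# All sizes and the trial count are illustrative, not the paper's settings.
import numpy as np

def gomp(Phi, y, K, N, eps=1e-9):
    m, n = Phi.shape
    Lambda, r, x_ls, k = [], y.copy(), np.zeros(0), 0
    while np.linalg.norm(r) > eps and k < min(K, m // N):
        k += 1
        c = np.abs(Phi.T @ r)
        c[Lambda] = -1.0
        Lambda.extend(np.argsort(c)[-N:].tolist())
        x_ls, *_ = np.linalg.lstsq(Phi[:, Lambda], y, rcond=None)
        r = y - Phi[:, Lambda] @ x_ls
    x_hat = np.zeros(n)
    x_hat[Lambda] = x_ls
    return x_hat

rng = np.random.default_rng(11)
m, n, K, N, trials = 32, 64, 3, 3, 40
exact = 0
for _ in range(trials):
    Phi = rng.standard_normal((m, n)) / np.sqrt(m)
    x = np.zeros(n)
    x[rng.choice(n, size=K, replace=False)] = rng.choice([-3.0, -1.0, 1.0, 3.0], size=K)
    x_hat = gomp(Phi, Phi @ x, K, N)
    exact += int(np.linalg.norm(x_hat - x) < 1e-6)
freq = exact / trials   # empirical frequency of exact reconstruction
```

Sweeping K and plotting freq against it reproduces, in miniature, the shape of the curves in Figs. 1 and 2.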
Fig. 1. Reconstruction performance for K-sparse Gaussian signal vector as a function of sparsity K (curves: gOMP with N = 3, 6, 9; OMP; StOMP; ROMP; CoSaMP; LP).
Fig. 2. Reconstruction performance for K-sparse PAM signal vector as a function of sparsity K (same set of algorithms as in Fig. 1).

In each trial, we construct an m × n (m = 128 and n = 256) sensing matrix Φ with entries drawn independently from the Gaussian distribution N(0, 1/m). In addition, we generate a K-sparse vector x whose support is chosen at random. We consider two types of sparse signals: Gaussian signals and pulse amplitude modulation (PAM) signals. Each nonzero element of the Gaussian signals is drawn from the standard Gaussian distribution
TABLE I
THE GOMP ALGORITHM
Input: measurements y ∈ R^m, sensing matrix Φ ∈ R^{m×n}, sparsity K,
number of indices for each selection N (N ≤ K and N ≤ m/K).
Initialize: iteration count k = 0, residual vector r^0 = y,
estimated support set Λ^0 = ∅.
While ||r^k||₂ > ε and k < min{K, m/N} do
k = k + 1.
(Identification) Select the indices {φ(i)}_{i=1,2,···,N} corresponding to the N largest entries (in magnitude) in Φ′r^{k−1}.
(Augmentation) Λ^k = Λ^{k−1} ∪ {φ(1), · · · , φ(N)}.
(Estimation) x̂_{Λ^k} = arg min_u ||y − Φ_{Λ^k} u||₂.
(Residual Update) r^k = y − Φ_{Λ^k} x̂_{Λ^k}.
End
Output: the estimated signal x̂ = arg min_{u: supp(u)=Λ^k} ||y − Φu||₂.

and each nonzero element of the PAM signals is randomly chosen from the set {±1, ±3}. It should be noted that the gOMP algorithm should satisfy N ≤ K and N ≤ m/K, and thus the N value should not exceed √m (N ≤ √m). In view of this, we choose N = 3, 6, 9 in our simulations. For each recovery algorithm, we perform a large number of independent trials and plot the empirical frequency of exact reconstruction.

In Fig. 1, we provide the recovery performance as a function of the sparsity level K. Clearly, higher critical sparsity implies better empirical reconstruction performance. The simulation results reveal that the critical sparsity of the gOMP algorithm is larger than that of the ROMP, OMP, and StOMP algorithms. Even compared to the LP technique and CoSaMP, the gOMP exhibits slightly better recovery performance. Fig. 2 provides the results for the PAM input signals. We observe that the overall behavior is similar to the case of Gaussian signals, except that the ℓ₁-minimization is better than the gOMP. Overall, we observe that the gOMP algorithm is competitive for both Gaussian and PAM input scenarios.

In Fig. 3, the running time (averaged over Gaussian and PAM signals) of the recovery algorithms is provided. The running time is measured using MATLAB under a quad-core 64-bit processor and Windows environment.
Note that we do not add the result of the LP technique, simply because its running time is more than an order of magnitude higher than that of all the other algorithms. Overall, we observe that the running time of StOMP, CoSaMP, gOMP, and OMP is more or less similar when the signal vector is sparse (i.e., when K is small). However, when the signal vector becomes less sparse (i.e., when K is large), the running time of the CoSaMP and OMP increases much faster than that of the gOMP and StOMP. In particular, while the running time of the OMP, StOMP, and gOMP increases linearly over K, that of the CoSaMP seems to increase quadratically over K. Among the algorithms under test, the running time of the StOMP and gOMP (N = 6, 9) is smallest. Note that we do not use any option for enabling multithreaded operations.

Fig. 3. Running time as a function of sparsity K. Note that the running time of the ℓ₁-minimization is not in the figure, since it is more than an order of magnitude higher than that of the other algorithms.

III. RIP
BASED RECOVERY CONDITION ANALYSIS
In this section, we analyze the RIP based condition under which the gOMP can perfectly recover K-sparse signals. First, we analyze the condition ensuring a success at the first iteration (k = 1). A success means that at least one correct index is chosen in the iteration. Next, we study the condition ensuring a success in the non-initial iterations (k > 1). By combining the two conditions, we obtain the sufficient condition of the gOMP algorithm guaranteeing the perfect recovery of K-sparse signals. The following lemmas are useful in our analysis.

Lemma 3.1 (Lemma 3 in [3], [16]):
If the sensing matrix satisfies the RIP of both orders K₁ and K₂, then δ_{K₁} ≤ δ_{K₂} for any K₁ ≤ K₂. This property is referred to as the monotonicity of the isometry constant.

Lemma 3.2 (Consequences of RIP [3], [17]):
For I ⊂ Ω, if δ_{|I|} < 1, then for any u ∈ R^{|I|},

(1 − δ_{|I|}) ||u||₂ ≤ ||Φ′_I Φ_I u||₂ ≤ (1 + δ_{|I|}) ||u||₂,

(1/(1 + δ_{|I|})) ||u||₂ ≤ ||(Φ′_I Φ_I)⁻¹ u||₂ ≤ (1/(1 − δ_{|I|})) ||u||₂.

Lemma 3.3 (Lemma 2.1 in [6] and Lemma 1 in [16]):
Let I₁, I₂ ⊂ Ω be two disjoint sets (I₁ ∩ I₂ = ∅). If δ_{|I₁|+|I₂|} < 1, then

||Φ′_{I₁} Φu||₂ = ||Φ′_{I₁} Φ_{I₂} u_{I₂}||₂ ≤ δ_{|I₁|+|I₂|} ||u||₂

holds for any u supported on I₂.

A. Condition for Success at the Initial Iteration
As mentioned, if at least one index among the N selected indices is correct, we say that the gOMP makes a success in the iteration. The following theorem provides a sufficient condition guaranteeing the success of the gOMP in the first iteration.

Theorem 3.4:
Suppose x ∈ R^n is a K-sparse signal (K ≥ N); then the gOMP algorithm makes a success in the first iteration if

δ_{K+N} < √N/(√K + √N). (12)
Proof:
Let Λ^1 denote the set of N indices chosen in the first iteration. Then the elements of Φ′_{Λ^1} y are the N most significant elements of Φ′y, and thus

||Φ′_{Λ^1} y||₂ = max_{|I|=N} √(Σ_{i∈I} |⟨ϕ_i, y⟩|²), (13)

where ϕ_i denotes the i-th column of Φ. Further, we have

(1/√N) ||Φ′_{Λ^1} y||₂ = (1/√N) max_{|I|=N} √(Σ_{i∈I} |⟨ϕ_i, y⟩|²) (14)
= max_{|I|=N} √((1/|I|) Σ_{i∈I} |⟨ϕ_i, y⟩|²) (15)
≥ √((1/|T|) Σ_{i∈T} |⟨ϕ_i, y⟩|²) (16)
= (1/√K) ||Φ′_T y||₂, (17)

where (16) is from the fact that the average of the N best correlation powers is larger than or equal to the average of the K (true) correlation powers. Using this together with y = Φ_T x_T, we have

||Φ′_{Λ^1} y||₂ ≥ √(N/K) ||Φ′_T Φ_T x_T||₂ ≥ √(N/K) (1 − δ_K) ||x||₂, (18)

where the second inequality is from Lemma 3.2. On the other hand, when no correct index is chosen in the first iteration (i.e., Λ^1 ∩ T = ∅),

||Φ′_{Λ^1} y||₂ = ||Φ′_{Λ^1} Φ_T x_T||₂ ≤ δ_{K+N} ||x||₂, (19)

where the inequality follows from Lemma 3.3. This inequality contradicts (18) if

δ_{K+N} ||x||₂ < √(N/K) (1 − δ_K) ||x||₂. (20)

Note that, under (20), at least one correct index is chosen in the first iteration. Since δ_K ≤ δ_{K+N} by Lemma 3.1, (20) holds true when

δ_{K+N} ||x||₂ < √(N/K) (1 − δ_{K+N}) ||x||₂. (21)

Equivalently,

δ_{K+N} < √N/(√K + √N). (22)

In summary, if δ_{K+N} < √N/(√K + √N), then Λ^1 contains at least one element of T in the first iteration of the gOMP.

Fig. 4. Set diagram of Ω, T (true support set), and Λ^k (estimated support set).

B. Condition for Success in Non-initial Iterations
In this subsection, we investigate the condition guaranteeing the success of the gOMP in non-initial iterations.
Theorem 3.5:
Suppose that N ≤ min{K, m/K} and that the gOMP has performed k iterations (1 ≤ k < K) successfully. Then, under the condition

δ_NK < √N/(√K + 3√N), (23)

the gOMP will make a success at the (k + 1)-th iteration.

As mentioned, the newly selected N indices do not overlap with the previously selected ones, and hence |Λ^k| = kN. Also, under the hypothesis that the gOMP has performed k iterations successfully, Λ^k contains at least k correct indices. In other words, the number of correct indices l in Λ^k satisfies

l = |T ∩ Λ^k| ≥ k.

Note that we only consider the case where Λ^k does not include all the correct indices (l < K), since otherwise the reconstruction task is already finished. (When all the correct indices are chosen, T ⊆ Λ^k, the residual r^k = 0 and hence the gOMP algorithm is already finished.) Hence, we can safely assume that the set containing the rest of the correct indices is nonempty (T − Λ^k ≠ ∅).

The key ingredients in our proof are 1) the upper bound α_N of the N-th largest correlation in magnitude between r^k and the columns indexed by F = Ω \ (Λ^k ∪ T) (i.e., the set of remaining incorrect indices) and 2) the lower bound β₁ of the largest correlation in magnitude between r^k and the columns whose indices belong to T − Λ^k (i.e., the set of remaining correct indices). If β₁ is larger than α_N, then β₁ is contained in the top N among all values of |⟨ϕ_j, r^k⟩|, and hence at least one correct index is chosen in the (k + 1)-th iteration. The following two lemmas provide the upper bound of α_N and the lower bound of β₁, respectively.

Lemma 3.6:
Let α_i = |⟨ϕ_{φ(i)}, r^k⟩|, where φ(i) = arg max_{j ∈ F\{φ(1),···,φ(i−1)}} |⟨ϕ_j, r^k⟩|, so that the α_i are ordered in magnitude (α₁ ≥ α₂ ≥ · · · ). Then, in the (k + 1)-th iteration of the gOMP algorithm, α_N, the N-th largest correlation in magnitude between r^k and {ϕ_i}_{i∈F}, satisfies

α_N ≤ (δ_{N+K−l} + δ_{N+Nk} δ_{Nk+K−l}/(1 − δ_{Nk})) · ||x_{T−Λ^k}||₂/√N. (24)
Proof:
See Appendix A.
Lemma 3.7:
Let β_i = |⟨ϕ_{φ(i)}, r^k⟩|, where φ(i) = arg max_{j ∈ (T−Λ^k)\{φ(1),···,φ(i−1)}} |⟨ϕ_j, r^k⟩|, so that the β_i are ordered in magnitude (β₁ ≥ β₂ ≥ · · · ). Then, in the (k + 1)-th iteration of the gOMP algorithm, β₁, the largest correlation in magnitude between r^k and {ϕ_i}_{i∈T−Λ^k}, satisfies

β₁ ≥ (1 − δ_{K−l} − √(δ_{K−l}) √(δ_{Nk}) (1 + δ_{Nk+K−l})/(1 − δ_{Nk})) · ||x_{T−Λ^k}||₂/√(K−l). (25)
See Appendix B. We now have all the ingredients to prove Theorem 3.5.
Proof of Theorem 3.5
Proof:
A sufficient condition under which at least one correct index is selected at the (k + 1)-th step can be described as

α_N < β₁. (26)

Noting that 1 ≤ k ≤ l < K and 0 < N ≤ K, and also using the monotonicity of the restricted isometry constant in Lemma 3.1, we have

K − l < NK ⟹ δ_{K−l} < δ_NK,
Nk + K − l < NK ⟹ δ_{Nk+K−l} < δ_NK,
Nk < NK ⟹ δ_{Nk} < δ_NK,
N + Nk ≤ NK ⟹ δ_{N+Nk} ≤ δ_NK. (27)

From Lemma 3.6 and (27), we have

α_N ≤ (δ_{N+K−l} + δ_{N+Nk} δ_{Nk+K−l}/(1 − δ_{Nk})) · ||x_{T−Λ^k}||₂/√N (28)
≤ (δ_NK + δ_NK²/(1 − δ_NK)) · ||x_{T−Λ^k}||₂/√N (29)
= (δ_NK/(1 − δ_NK)) · ||x_{T−Λ^k}||₂/√N. (30)

Also, from Lemma 3.7 and (27), we have

β₁ ≥ (1 − δ_{K−l} − √(δ_{K−l}) √(δ_{Nk}) (1 + δ_{Nk+K−l})/(1 − δ_{Nk})) · ||x_{T−Λ^k}||₂/√(K−l) (31)
≥ (1 − δ_NK − δ_NK (1 + δ_NK)/(1 − δ_NK)) · ||x_{T−Λ^k}||₂/√(K−l) (32)
= ((1 − 3δ_NK)/(1 − δ_NK)) · ||x_{T−Λ^k}||₂/√(K−l). (33)

Using (30) and (33), we obtain a sufficient condition for (26) as

((1 − 3δ_NK)/(1 − δ_NK)) · ||x_{T−Λ^k}||₂/√(K−l) > (δ_NK/(1 − δ_NK)) · ||x_{T−Λ^k}||₂/√N. (34)

After some manipulations, we have

δ_NK < √N/(√(K−l) + 3√N). (35)

Since √(K−l) < √K, (35) holds if

δ_NK < √N/(√K + 3√N), (36)

which completes the proof.
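The step from (34) to (36) is elementary algebra and can be sanity-checked numerically; K, N, and l below are arbitrary illustrative values:

```python
# Numeric sanity check: if delta < sqrt(N)/(sqrt(K) + 3*sqrt(N)) as in (36),
# then (1 - 3*delta)/sqrt(K - l) > delta/sqrt(N), i.e., inequality (34) holds
# (the common factor ||x_{T-Lambda^k}||_2 / (1 - delta) cancels on both sides).
import numpy as np

for (K, N, l) in [(50, 4, 6), (100, 9, 1), (20, 2, 10)]:
    delta = 0.999 * np.sqrt(N) / (np.sqrt(K) + 3 * np.sqrt(N))
    lhs = (1 - 3 * delta) / np.sqrt(K - l)   # from (33)
    rhs = delta / np.sqrt(N)                 # from (30)
    assert lhs > rhs
```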
C. Overall Sufficient Condition
Thus far, we have investigated the conditions guaranteeing the success of the gOMP algorithm in the initial iteration (k = 1) and in non-initial iterations (k > 1). We now combine these results to describe the sufficient condition of the gOMP algorithm ensuring the perfect recovery of K-sparse signals. Recall from Theorem 3.4 that the gOMP makes a success in the first iteration if

δ_{K+N} < √N/(√K + √N). (37)

Also, recall from Theorem 3.5 that if the previous k iterations were successful, then the gOMP will be successful in the (k + 1)-th iteration if

δ_NK < √N/(√K + 3√N). (38)

In essence, the overall sufficient condition is determined by the stricter of (37) and (38).
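Comparing only the numerical values of the two bounds (the isometry-constant orders K + N and NK also differ, which is handled case by case below), the bound in (38) is always the smaller one. A quick illustrative check over an arbitrary K, N grid:

```python
# The bound in (38) is smaller than the bound in (37) for every K, N >= 1,
# so (38) dominates when the two numeric bounds are compared directly.
import numpy as np

K = np.array([4.0, 8.0, 16.0, 32.0, 64.0])   # illustrative sparsity levels
for N in (1, 2, 4, 8):
    b37 = np.sqrt(N) / (np.sqrt(K) + np.sqrt(N))       # first iteration, (37)
    b38 = np.sqrt(N) / (np.sqrt(K) + 3 * np.sqrt(N))   # later iterations, (38)
    assert np.all(b38 < b37)
```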
Theorem 3.8 (Sufficient condition of gOMP):
Let N ≤ min{K, m/K}; then the gOMP algorithm perfectly recovers any K-sparse vector x from y = Φx within at most K iterations if the sensing matrix Φ satisfies the RIP with isometry constant

δ_NK < √N/(√K + 3√N) for K > 1, (39)
δ₂ < 1/2 for K = 1. (40)
In order to prove the theorem, the following threecases need to be considered. • Case 1 [
In order to prove the theorem, the following three cases need to be considered.

• Case 1 [N > 1, K > 1]: In this case, NK ≥ K + N and hence δ_NK ≥ δ_{K+N}, and also √N/(√K + √N) > √N/(√K + 3√N). Thus, (38) is stricter than (37) and the general condition becomes

δ_NK < √N/(√K + 3√N). (41)

• Case 2 [N = 1, K > 1]: In this case, the general condition should be the stricter of δ_K < 1/(√K + 3) and δ_{K+1} < 1/(√K + 1). Unfortunately, since δ_K ≤ δ_{K+1} and 1/(√K + 3) ≤ 1/(√K + 1), one cannot compare the two conditions directly. As an indirect way, we borrow a sufficient condition guaranteeing the perfect recovery of the gOMP for N = 1:

δ_K < (√(4K+1) − 1)/(2K). (42)

Readers are referred to [23] for the proof of (42). Since 1/(√K + 3) < (√(4K+1) − 1)/(2K) for K > 1, the sufficient condition for Case 2 becomes

δ_K < 1/(√K + 3). (43)

It is interesting to note that (43) can be nicely combined with the result of Case 1, since applying N = 1 in (41) results in (43).

• Case 3 [K = 1]: Since x has a single nonzero element (K = 1), x should be recovered in the first iteration. Let u be the index of the nonzero element; then the exact recovery of x is ensured regardless of N if

|⟨ϕ_u, y⟩| = max_i |⟨ϕ_i, y⟩|. (44)

The condition ensuring (44) is obtained by applying N = K = 1 to Theorem 3.4 and is given by δ₂ < 1/2.

Remark 1 (δ_K based recovery condition): We can express our condition with an isometry constant of smaller order. By virtue of [17, Corollary 3.4] (δ_cK ≤ c·δ_K for a positive integer c), the proposed bound holds whenever δ_K < 1/(√(NK) + 3N).

Remark 2 (Comparison with previous work):
It is worth mentioning that there have been previous efforts to investigate a sufficient condition for this algorithm. In particular, the condition δ_NK < √N/((2 + √2)√K) was established in [20, Theorem 2.1]. The proposed bound δ_NK < √N/(√K + 3√N) holds the advantage over this bound if N < (3 + 2√2)K/9 ≈ 0.65K. Since N is typically much smaller than K, the proposed bound offers a better recovery condition in many practical scenarios.

Remark 3 (Measurement size of sensing matrix):
It is well known that an m × n random sensing matrix whose entries are i.i.d. with Gaussian distribution N(0, 1/m) obeys the RIP (δ_K < ε) with overwhelming probability if the dimension of the measurements satisfies [9]

m = O(K log(n/K)/ε²). (45)

In [7], it is shown that the OMP requires m = O(K² log(n/K)) random measurements for reconstructing a K-sparse signal. Plugging (39) into (45), we also get the same result. (In fact, N should not be large, since the inequality N ≤ m/K must be guaranteed.)
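The comparison in Remark 2 can be reproduced numerically; the bound of [20] is taken as read from Remark 2, and K below is an arbitrary illustrative value:

```python
# Crossover between the proposed bound (Theorem 3.8) and the bound of
# [20, Theorem 2.1] as quoted in Remark 2: the proposed bound is larger
# exactly when N < (3 + 2*sqrt(2)) * K / 9, roughly 0.65 K.
import numpy as np

def proposed(K, N):
    return np.sqrt(N) / (np.sqrt(K) + 3 * np.sqrt(N))

def ommp(K, N):
    return np.sqrt(N) / ((2 + np.sqrt(2)) * np.sqrt(K))

K = 100
N_star = (3 + 2 * np.sqrt(2)) / 9 * K         # crossover, about 64.8 here
assert proposed(K, 10) > ommp(K, 10)          # N well below the crossover
assert proposed(K, 90) < ommp(K, 90)          # N above the crossover
assert abs(proposed(K, N_star) - ommp(K, N_star)) < 1e-12
```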
In this subsection, we put our focus on the OMP algorithm, which is the special case of the gOMP algorithm for N = 1. To be sure, one can immediately obtain the condition δ_K < 1/(√K + 3) for the OMP by applying N = 1 to Theorem 3.8. Our result is an improved version of this, based on the fact that the non-initial steps of the OMP process are the same as the initial step, since the residual can be considered as a new measurement preserving the sparsity K of the input vector x [18], [23]. In this regard, a condition guaranteeing the selection of a correct index in the first iteration can be readily extended to the general condition without incurring any loss.

Corollary 3.9 (Direct consequence of Theorem 3.4):
Suppose x ∈ R^n is K-sparse; then the OMP algorithm recovers an index in T from y = Φx ∈ R^m in the first iteration if δ_{K+1} < 1/(√K + 1). We now state that the first-iteration condition extends to any iteration of the OMP algorithm.

Lemma 3.10 (Wang and Shim [18]):
Suppose that the first k iterations (1 ≤ k ≤ K − 1) of the OMP algorithm are successful (i.e., Λ^k ⊂ T); then the (k + 1)-th iteration is also successful (i.e., t^{k+1} ∈ T) under δ_{K+1} < 1/(√K + 1).

Combining Corollary 3.9 and Lemma 3.10, and also noting that the indices in Λ^k are not selected again in the succeeding iterations (since the index chosen in the (k + 1)-th step belongs to T − Λ^k), one can conclude that Λ^K = T and the OMP algorithm recovers the original signal x in exactly K iterations under δ_{K+1} < 1/(√K + 1). The following theorem formally describes the sufficient condition of the OMP algorithm.

Theorem 3.11 (Wang and Shim [18]):
Suppose x is a K-sparse vector; then the OMP algorithm recovers x from y = Φx under

δ_{K+1} < 1/(√K + 1). (46)

Proof:
Immediate from Corollary 3.9 and Lemma 3.10.
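A compact sketch of the OMP in this setting (the N = 1 special case of the gOMP, run for exactly K iterations; instance parameters are illustrative assumptions, not from the paper):

```python
# OMP = gOMP with N = 1: one index per iteration, K iterations in total.
# Dimensions, seed, and coefficient values are illustrative assumptions.
import numpy as np

def omp(Phi, y, K):
    n = Phi.shape[1]
    support, r = [], y.copy()
    x_ls = np.zeros(0)
    for _ in range(K):
        c = np.abs(Phi.T @ r)
        c[support] = -1.0                      # no re-selection
        support.append(int(np.argmax(c)))      # identification
        x_ls, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        r = y - Phi[:, support] @ x_ls         # residual update
    x_hat = np.zeros(n)
    x_hat[support] = x_ls
    return x_hat, support

rng = np.random.default_rng(5)
m, n, K = 80, 160, 5
Phi = rng.standard_normal((m, n)) / np.sqrt(m)
T = rng.choice(n, size=K, replace=False)
x = np.zeros(n)
x[T] = rng.uniform(1.0, 2.0, size=K) * rng.choice([-1.0, 1.0], size=K)
x_hat, support = omp(Phi, Phi @ x, K)
```

On such a well-conditioned random instance, the K selected indices coincide with T and the recovery is exact, consistent with the K-iteration picture of Theorem 3.11.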
Remark 4 (Comments on the bounds in (41) and (46)): The bounds of the gOMP in (41) and the OMP in (46) cannot be directly compared, since δ_NK ≥ δ_{K+1} and √N/(√K + 3√N) ≥ 1/(√K + 1). Nevertheless, the difference in the order might offer possible advantages to the OMP. It should be noted that the RIP condition, analyzed based on the worst-case scenario, is for perfect recovery and hence offers a too conservative bound. This explains why most sparse recovery algorithms perform better in practice than the bound predicts. It should also be noted that by allowing more iterations (larger than K iterations) for the OMP, one can improve the performance [20], [24], [25] and achieve performance comparable to the gOMP. However, this may incur a large delay and higher computational cost.

IV. RECONSTRUCTION OF SPARSE SIGNALS FROM NOISY MEASUREMENTS
In this section, we consider the reconstruction performance of the gOMP algorithm in the presence of noise. Since the measurements are expressed as y = Φx + v in this scenario, perfect reconstruction of x cannot be guaranteed, and hence we use an upper bound on the ℓ₂-norm distortion ||x − x̂||₂ as a performance measure.

Recall that the termination condition of the gOMP algorithm is either ||r^k||₂ < ε or k ≥ min{K, m/N}. Note that, since min{K, m/N} = K under the condition that Φ satisfies the RIP of order NK, the stopping rule of the gOMP can be simplified to ||r^k||₂ < ε or k = K. In these two scenarios, we investigate the upper bound of ||x − x̂||₂. The next theorem provides the upper bound of the ℓ₂-norm distortion when the gOMP algorithm is finished by ||r^k||₂ < ε.

Theorem 4.1:
Let Φ be the sensing matrix satisfying the RIP of order NK. If ‖r^s‖₂ < ε is satisfied after s (s < K) iterations, then

‖x − x̂‖₂ ≤ ε/√(1 − δ_{NK}) + ‖v‖₂/√(1 − δ_{NK}). (47)

Proof:
See Appendix C.

The next theorem provides the upper bound of ‖x − x̂‖₂ for the second scenario (i.e., when the gOMP is terminated after K iterations).

Theorem 4.2:
Let Φ be the sensing matrix satisfying the RIP of order NK + K and δ_{NK} < √N/(√K + 3√N). Suppose the gOMP algorithm is terminated after K iterations; then

‖x − x̂‖₂ ≤ ‖v‖₂/√(1 − δ_{NK}), if T ⊂ Λ^K, (48)
‖x − x̂‖₂ ≤ C_K ‖v‖₂, if T ⊄ Λ^K, (49)

where

C_K = [ (1 − δ_{NK}) ( 1 + δ_K + √((K/N)(1 + δ_N)(1 + δ_K)) ) + 2 ( 1 − 3δ_{NK} − √(K/N) δ_{NK} ) ] / [ ( 1 − 3δ_{NK} − √(K/N) δ_{NK} ) √(1 − δ_{NK+K}) ].

Since C_K > 1/√(1 − δ_{NK}), one can use the single upper bound ‖x − x̂‖₂ ≤ C_K ‖v‖₂ in both cases.

(If Φ satisfies the RIP of order NK, i.e., δ_{NK} ∈ (0, 1), then 0 < 1 − δ_{NK} ≤ λ_min(Φ'_D Φ_D) for every index set D with |D| ≤ NK, which indicates that all singular values of Φ_D are positive. Thus Φ_D must have full column rank, i.e., K ≤ m/N.)

Remark 5 (Comments on C_K): Note that, by the hypothesis of the theorem, 0 < 1 − 3δ_{NK} − √(K/N) δ_{NK} < 1, so C_K for large K is approximately

C_K ≈ [ (1 − δ_{NK}) √((1 + δ_N)(1 + δ_K)) ] / [ ( 1 − 3δ_{NK} − √(K/N) δ_{NK} ) √(1 − δ_{NK+K}) ] · √(K/N). (50)

This says that the ℓ₂-norm distortion is essentially upper bounded by the product of the noise power and c√(K/N) (c a constant). It is clear from (50) that C_K decreases as N increases. Hence, by increasing N (i.e., allowing more indices to be selected per step), we may obtain a better (smaller) distortion bound. However, since NK ≤ m needs to be satisfied, this benefit is guaranteed only for very sparse signal vectors.

Before providing the proof of Theorem 4.2, we analyze a sufficient condition for the gOMP to make a success at the (k+1)-th iteration when the former k iterations were successful. In our analysis, we reuse the notation α_i and β_i of Lemmas 3.6 and 3.7. The following lemma provides an upper bound of α_N and a lower bound of β₁.

Lemma 4.3: α_N and β₁ satisfy

α_N ≤ ( δ_{N+K−l} + δ_{N+Nk} δ_{Nk+K−l}/(1 − δ_{Nk}) ) ‖x_{T−Λ^k}‖₂/√N + √(1 + δ_N) ‖v‖₂/√N (51)

and

β₁ ≥ ( 1 − δ_{K−l} − √(1 + δ_{K−l}) √(1 + δ_{Nk}) δ_{Nk+K−l}/(1 − δ_{Nk}) ) ‖x_{T−Λ^k}‖₂/√(K−l) − √(1 + δ_{K−l}) ‖v‖₂/√(K−l). (52)

Proof:
See Appendix D.

As mentioned, the gOMP will select at least one correct index from T at the (k+1)-th iteration provided that α_N < β₁.

Lemma 4.4:
Suppose the gOMP has performed k iterations (1 ≤ k < K) successfully. Then, under the condition

‖x_{T−Λ^k}‖₂ > [ ( √(1 + δ_K) + √((K/N)(1 + δ_N)) ) (1 − δ_{NK}) / ( 1 − 3δ_{NK} − √(K/N) δ_{NK} ) ] ‖v‖₂, (53)

the gOMP makes a success at the (k+1)-th iteration.

Proof:
See Appendix E.

We are now ready to prove Theorem 4.2.
Proof:
We first consider the scenario where Λ^K contains all the correct indices (i.e., T ⊂ Λ^K). In this case,

‖x − x̂‖₂ ≤ ‖Φ(x − x̂)‖₂ / √(1 − δ_{|Λ^K ∪ T|}) (54)
= ‖Φ(x − x̂)‖₂ / √(1 − δ_{|Λ^K|}) (55)
= ‖Φx − Φ_{Λ^K} Φ†_{Λ^K} y‖₂ / √(1 − δ_{NK}) (56)
= ‖Φx − Φ_{Λ^K} Φ†_{Λ^K} (Φx + v)‖₂ / √(1 − δ_{NK}) (57)
= ‖Φx − Φ_{Λ^K} Φ†_{Λ^K} Φx − Φ_{Λ^K} Φ†_{Λ^K} v‖₂ / √(1 − δ_{NK}) (58)
= ‖P_{Λ^K} v‖₂ / √(1 − δ_{NK}) (59)
≤ ‖v‖₂ / √(1 − δ_{NK}), (60)

where (59) follows from Φx − Φ_{Λ^K} Φ†_{Λ^K} Φx = Φx − Φ_{Λ^K} Φ†_{Λ^K} Φ_{Λ^K} x_{Λ^K} = 0 for T ⊂ Λ^K.

Now we turn to the scenario where Λ^K does not contain all the correct indices (i.e., T ⊄ Λ^K). Since the algorithm has performed K iterations yet failed to find all the correct indices, it is clear that the gOMP does not make a success at some iteration (say this occurs at the (p+1)-th iteration). Then, by the contraposition of Lemma 4.4,

‖x_{T−Λ^p}‖₂ ≤ [ ( √(1 + δ_K) + √((K/N)(1 + δ_N)) ) (1 − δ_{NK}) / ( 1 − 3δ_{NK} − √(K/N) δ_{NK} ) ] ‖v‖₂. (61)

Since x − x̂ is at most (NK + K)-sparse,

‖x − x̂‖₂ ≤ ‖Φ(x − x̂)‖₂ / √(1 − δ_{|Λ^K ∪ T|}) (62)
≤ ‖Φx − Φ_{Λ^K} Φ†_{Λ^K} y‖₂ / √(1 − δ_{NK+K}) (63)
= ‖y − v − Φ_{Λ^K} Φ†_{Λ^K} y‖₂ / √(1 − δ_{NK+K}) (64)
= ‖r^K − v‖₂ / √(1 − δ_{NK+K}) (65)
≤ ( ‖r^K‖₂ + ‖v‖₂ ) / √(1 − δ_{NK+K}).
(66)

Also, ‖r^K‖₂ ≤ ‖r^p‖₂ (owing to the orthogonal projection in each iteration of the gOMP, the magnitude of the residual decreases as the iterations go on; that is, ‖r^i‖₂ ≤ ‖r^j‖₂ for i ≥ j), and thus

‖x − x̂‖₂ ≤ ( ‖r^p‖₂ + ‖v‖₂ ) / √(1 − δ_{NK+K}) (67)
= ( ‖P⊥_{Λ^p} y‖₂ + ‖v‖₂ ) / √(1 − δ_{NK+K}) (68)
= ( ‖P⊥_{Λ^p} Φ_T x_T + P⊥_{Λ^p} v‖₂ + ‖v‖₂ ) / √(1 − δ_{NK+K}) (69)
= ‖P⊥_{Λ^p} Φ_{T−Λ^p} x_{T−Λ^p} + P⊥_{Λ^p} v‖₂ / √(1 − δ_{NK+K}) + ‖v‖₂ / √(1 − δ_{NK+K}) (70)
≤ ‖P⊥_{Λ^p} Φ_{T−Λ^p} x_{T−Λ^p}‖₂ / √(1 − δ_{NK+K}) + ( ‖P⊥_{Λ^p} v‖₂ + ‖v‖₂ ) / √(1 − δ_{NK+K}) (71)
≤ ( ‖Φ_{T−Λ^p} x_{T−Λ^p}‖₂ + 2‖v‖₂ ) / √(1 − δ_{NK+K}), (72)

where (70) holds because P⊥_{Λ^p} cancels all the components in span(Φ_{Λ^p}). Using the definition of the RIP, we have

‖Φ_{T−Λ^p} x_{T−Λ^p}‖₂ ≤ √(1 + δ_{|T−Λ^p|}) ‖x_{T−Λ^p}‖₂ ≤ √(1 + δ_K) ‖x_{T−Λ^p}‖₂, (73)

and hence

‖x − x̂‖₂ ≤ ( √(1 + δ_K) ‖x_{T−Λ^p}‖₂ + 2‖v‖₂ ) / √(1 − δ_{NK+K}). (74)

Combining (74) and (61), we finally have

‖x − x̂‖₂ ≤ C_K ‖v‖₂, (75)

where

C_K = [ (1 − δ_{NK}) ( 1 + δ_K + √((K/N)(1 + δ_N)(1 + δ_K)) ) + 2 ( 1 − 3δ_{NK} − √(K/N) δ_{NK} ) ] / [ ( 1 − 3δ_{NK} − √(K/N) δ_{NK} ) √(1 − δ_{NK+K}) ].
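The first branch of the proof rests on the cancellation in (59): when T ⊂ Λ^K, the estimation error maps to the projected noise, Φ(x − x̂) = −P_{Λ^K} v, whose norm never exceeds ‖v‖₂. This step is easy to confirm numerically (a hedged Python/NumPy check; the dimensions and the oracle support below are our own choices, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, K, N = 30, 60, 3, 2
Phi = rng.standard_normal((m, n)) / np.sqrt(m)
x = np.zeros(n)
T = rng.choice(n, K, replace=False)
x[T] = rng.standard_normal(K)
v = 0.01 * rng.standard_normal(m)
y = Phi @ x + v

# Oracle version of the first branch: a support Lambda of size NK containing T
Lam = list(T) + [i for i in range(n) if i not in set(T)][: N * K - K]
coef, *_ = np.linalg.lstsq(Phi[:, Lam], y, rcond=None)
x_hat = np.zeros(n)
x_hat[Lam] = coef

P = Phi[:, Lam] @ np.linalg.pinv(Phi[:, Lam])   # projection onto span(Phi_Lam)
lhs = np.linalg.norm(Phi @ (x - x_hat))
# (59): Phi(x - x_hat) = -P_Lam v, and (60): its norm is at most ||v||_2
assert np.isclose(lhs, np.linalg.norm(P @ v))
assert lhs <= np.linalg.norm(v) + 1e-12
```

The bound (48) then follows by dividing by √(1 − δ_{NK}), which is where the RIP of order NK enters.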
V. DISCUSSIONS ON COMPLEXITY
In this section, we discuss the complexity of the gOMP algorithm. Our analysis shows that the computational complexity is approximately 2smn + (2N² + N)s²m, where s is the number of iterations. We show by empirical simulations that the number of iterations s is small, so that the proposed gOMP algorithm is effective in both running time and computational complexity. The complexity of each step of the gOMP algorithm is summarized as follows.

[Fig. 5. Number of iterations of the OMP and the gOMP (N = 3, 6, 9) as a function of the sparsity K. The red dashed line is the reference curve indicating K/3 iterations.]

• Identification: The gOMP performs a matrix-vector multiplication Φ'r^{k−1}, which requires (2m − 1)n flops (mn multiplications and (m − 1)n additions). Also, Φ'r^{k−1} needs to be sorted to find the N best indices, which requires nN − N(N + 1)/2 flops.

• Estimation of x̂_{Λ^k}: The LS solution x̂_{Λ^k} is obtained in this step. Using the QR factorization of Φ_{Λ^k} (Φ_{Λ^k} = QR), we have

x̂_{Λ^k} = (Φ'_{Λ^k} Φ_{Λ^k})^{−1} Φ'_{Λ^k} y = (R'R)^{−1} R'Q'y (76)

and this leads to a cost of O(k²m) [26]. Actually, since the elements of Λ^k and Λ^{k−1} largely overlap, it is possible to recycle part of the previous QR factorization of Φ_{Λ^{k−1}} and then apply the modified Gram-Schmidt (MGS) algorithm. In doing so, the LS solution can be computed efficiently (see Appendix F), and the associated cost is 4N²km + (−2N² + 5N)m + 2N³k² + (−2N³ + 5N²)k + O(N³) flops.

• Residual update: For the residual update, the gOMP performs the matrix-vector multiplication Φ_{Λ^k} x̂_{Λ^k} ((2Nk − 1)m flops) followed by the subtraction from y (m flops).

Table II summarizes the complexity of each operation in the k-th iteration of the gOMP.
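As a quick sanity check, the per-step counts above can be tabulated programmatically (a hedged sketch of our own; the estimation cost is kept to its leading term):

```python
def gomp_iter_flops(m, n, N, k):
    """Approximate flop count of the k-th gOMP iteration, following the
    per-step costs listed above (estimation kept to its leading term)."""
    identification = (2 * m - 1) * n + n * N - N * (N + 1) // 2
    estimation = 4 * N * N * k * m        # leading term of the MGS-based LS update
    residual = 2 * N * k * m              # (2Nk - 1)m products plus m subtractions, approx.
    return identification + estimation + residual

# These totals should track the quoted approximation 2mn + (4N^2 + 2N)km.
m, n, N, k = 1000, 4000, 4, 10
approx = 2 * m * n + (4 * N * N + 2 * N) * k * m
exact = gomp_iter_flops(m, n, N, k)
```

For these sample values the two figures agree to within a fraction of a percent, consistent with the O(mn) total in Table II.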
The complexity of the k-th iteration is approximately 2mn + (4N² + 2N)km. If the gOMP is finished in s iterations, then the complexity of the

(We note that the fast MGS approach for solving the LS problem can also be applied to the OMP, ROMP, and StOMP, but not to the CoSaMP. This is because the CoSaMP algorithm computes a completely new LS solution over a distinct subset of Φ in each iteration. Nevertheless, other fast approaches such as Richardson's iteration or the conjugate gradient method may be applied in the CoSaMP.)

TABLE II
COMPLEXITY OF THE gOMP ALGORITHM (k-TH STEP)

Operation: Complexity
Identification: (2m − 1 + N)n − N(N + 1)/2 = O(mn)
Estimation of x̂_{Λ^k}: ≈ 4N²km = O(km)
Residual update: 2Nkm
Total: ≈ 2mn + (4N² + 2N)km = O(mn)

gOMP, denoted as C_gOMP(N, s, m, n), becomes

C_gOMP(N, s, m, n) ≈ Σ_{k=1}^{s} [ 2mn + (4N² + 2N)km ] ≈ 2smn + (2N² + N)s²m.

Noting that s ≤ min{K, m/N} and N is a small constant, the complexity of the gOMP can be expressed as O(Kmn). In practice, however, the number of iterations of the gOMP is much smaller than K due to the inclusion of multiple correct indices in each iteration, which reduces the complexity of the gOMP substantially. Indeed, as shown in Fig. 5, the number of iterations of the gOMP is only a fraction of that of the OMP (on the order of K/3 for N = 3), so that the gOMP has a computational advantage over the OMP.

VI. CONCLUSION
As a cost-effective solution for recovering sparse signals from compressed measurements, the OMP algorithm has received much attention in recent years. In this paper, we presented a generalized version of the OMP algorithm for pursuing efficiency in reconstructing sparse signals. Since multiple indices can be identified without any additional postprocessing operation, the proposed gOMP algorithm lends itself to parallel processing, which expedites the algorithm and thereby reduces the running time. In fact, we demonstrated in the empirical simulations that the gOMP has excellent recovery performance comparable to the ℓ₁-minimization technique with fast processing speed and competitive computational complexity. We showed from the RIP analysis that if the isometry constant of the sensing matrix satisfies δ_{NK} < √N/(√K + 3√N), then the gOMP algorithm can perfectly recover K-sparse signals (K > 1) from compressed measurements. One important point we would like to mention is that the gOMP algorithm is potentially more effective than this analysis suggests. Indeed, the bound in (39) is derived from the worst-case scenario in which the algorithm selects only one correct index per iteration (and hence requires a maximum of K iterations). In reality, as observed in the empirical simulations, it is highly likely that multiple correct indices are identified in each iteration, and hence the number of iterations is usually much smaller than that of the OMP. Therefore, we believe that a less strict or probabilistic analysis will uncover the whole story of the CS recovery performance. Our future work will be directed toward this avenue.

APPENDIX A
PROOF OF LEMMA 3.6
Proof:
Let w_i be the index of the i-th largest correlation in magnitude between r^k and {φ_j}_{j∈F} (i.e., the columns corresponding to the remaining incorrect indices). Also, define the set of indices W = {w_1, w_2, ..., w_N}. The ℓ₂-norm of the correlation Φ'_W r^k is expressed as

‖Φ'_W r^k‖₂ = ‖Φ'_W P⊥_{Λ^k} Φ_{T−Λ^k} x_{T−Λ^k}‖₂
= ‖Φ'_W Φ_{T−Λ^k} x_{T−Λ^k} − Φ'_W P_{Λ^k} Φ_{T−Λ^k} x_{T−Λ^k}‖₂
≤ ‖Φ'_W Φ_{T−Λ^k} x_{T−Λ^k}‖₂ + ‖Φ'_W P_{Λ^k} Φ_{T−Λ^k} x_{T−Λ^k}‖₂, (77)

where P⊥_{Λ^k} = I − P_{Λ^k}. Note that W and T − Λ^k are disjoint (i.e., W ∩ (T − Λ^k) = ∅) and |W| + |T − Λ^k| = N + K − l (the number of correct indices in Λ^k is l by hypothesis). Using this together with Lemma 3.3, we have

‖Φ'_W Φ_{T−Λ^k} x_{T−Λ^k}‖₂ ≤ δ_{N+K−l} ‖x_{T−Λ^k}‖₂. (78)

Similarly, noting that W ∩ Λ^k = ∅ and |W| + |Λ^k| = N + Nk, we have

‖Φ'_W P_{Λ^k} Φ_{T−Λ^k} x_{T−Λ^k}‖₂ ≤ δ_{N+Nk} ‖Φ†_{Λ^k} Φ_{T−Λ^k} x_{T−Λ^k}‖₂, (79)

where

‖Φ†_{Λ^k} Φ_{T−Λ^k} x_{T−Λ^k}‖₂ = ‖(Φ'_{Λ^k} Φ_{Λ^k})^{−1} Φ'_{Λ^k} Φ_{T−Λ^k} x_{T−Λ^k}‖₂ (80)
≤ ‖Φ'_{Λ^k} Φ_{T−Λ^k} x_{T−Λ^k}‖₂ / (1 − δ_{Nk}) (81)
≤ δ_{Nk+K−l} ‖x_{T−Λ^k}‖₂ / (1 − δ_{Nk}), (82)

where (81) and (82) follow from Lemma 3.2 and Lemma 3.3, respectively (Λ^k and T − Λ^k are disjoint, and if the number of correct indices in Λ^k is l, then |Λ^k ∪ (T − Λ^k)| = Nk + K − l). Using (77), (78), (79), and (82), we have

‖Φ'_W r^k‖₂ ≤ ( δ_{N+K−l} + δ_{N+Nk} δ_{Nk+K−l}/(1 − δ_{Nk}) ) ‖x_{T−Λ^k}‖₂. (83)

Since α_i = |⟨φ_{w_i}, r^k⟩|, we have ‖Φ'_W r^k‖₁ = Σ_{i=1}^{N} α_i. Now, using the norm inequality ‖u‖₁ ≤ √N ‖u‖₂ for u ∈ R^N, we have

‖Φ'_W r^k‖₂ ≥ (1/√N) Σ_{i=1}^{N} α_i. (84)

Since α₁ ≥ α₂ ≥ ⋯ ≥ α_N, it is clear that ‖Φ'_W r^k‖₂ ≥ (1/√N) N α_N = √N α_N. Hence, we have

( δ_{N+K−l} + δ_{N+Nk} δ_{Nk+K−l}/(1 − δ_{Nk}) ) ‖x_{T−Λ^k}‖₂ ≥ √N α_N, (85)

and

α_N ≤ ( δ_{N+K−l} + δ_{N+Nk} δ_{Nk+K−l}/(1 − δ_{Nk}) ) ‖x_{T−Λ^k}‖₂ / √N. (86)

APPENDIX B
PROOF OF LEMMA 3.7
Proof:
Since β₁ is the largest correlation in magnitude between r^k and {φ_j}_{j∈T−Λ^k}, it is clear that

β₁ ≥ |⟨φ_j, r^k⟩| (87)

for all j ∈ T − Λ^k, and hence

β₁ ≥ (1/√(K−l)) ‖Φ'_{T−Λ^k} r^k‖₂ (88)
= (1/√(K−l)) ‖Φ'_{T−Λ^k} P⊥_{Λ^k} Φx‖₂, (89)

where (89) follows from r^k = y − Φ_{Λ^k} Φ†_{Λ^k} y = P⊥_{Λ^k} Φx. Using the triangle inequality,

β₁ ≥ ‖Φ'_{T−Λ^k} P⊥_{Λ^k} Φ_{T−Λ^k} x_{T−Λ^k}‖₂ / √(K−l) (90)
≥ (1/√(K−l)) ( ‖Φ'_{T−Λ^k} Φ_{T−Λ^k} x_{T−Λ^k}‖₂ − ‖Φ'_{T−Λ^k} P_{Λ^k} Φ_{T−Λ^k} x_{T−Λ^k}‖₂ ). (91)

Since |T − Λ^k| = K − l,

‖Φ'_{T−Λ^k} Φ_{T−Λ^k} x_{T−Λ^k}‖₂ ≥ (1 − δ_{K−l}) ‖x_{T−Λ^k}‖₂, (92)

and

‖Φ'_{T−Λ^k} P_{Λ^k} Φ_{T−Λ^k} x_{T−Λ^k}‖₂ ≤ ‖Φ'_{T−Λ^k}‖ ‖P_{Λ^k} Φ_{T−Λ^k} x_{T−Λ^k}‖₂ (93)
≤ √(1 + δ_{K−l}) ‖P_{Λ^k} Φ_{T−Λ^k} x_{T−Λ^k}‖₂, (94)

where (94) is from ‖Φ'_{T−Λ^k}‖ ≤ √λ_max(Φ'_{T−Λ^k} Φ_{T−Λ^k}) ≤ √(1 + δ_{K−l}). Furthermore, we observe that

‖P_{Λ^k} Φ_{T−Λ^k} x_{T−Λ^k}‖₂ = ‖Φ_{Λ^k} (Φ'_{Λ^k} Φ_{Λ^k})^{−1} Φ'_{Λ^k} Φ_{T−Λ^k} x_{T−Λ^k}‖₂ (95)
≤ √(1 + δ_{Nk}) ‖(Φ'_{Λ^k} Φ_{Λ^k})^{−1} Φ'_{Λ^k} Φ_{T−Λ^k} x_{T−Λ^k}‖₂ (96)
≤ ( √(1 + δ_{Nk}) / (1 − δ_{Nk}) ) ‖Φ'_{Λ^k} Φ_{T−Λ^k} x_{T−Λ^k}‖₂ (97)
≤ ( δ_{Nk+K−l} √(1 + δ_{Nk}) / (1 − δ_{Nk}) ) ‖x_{T−Λ^k}‖₂, (98)

where (96) is from the definition of the RIP, and (97) and (98) follow from Lemmas 3.2 and 3.3, respectively (Λ^k and T − Λ^k are disjoint sets and |Λ^k ∪ (T − Λ^k)| = Nk + K − l). By combining (94) and (98), we obtain

‖Φ'_{T−Λ^k} P_{Λ^k} Φ_{T−Λ^k} x_{T−Λ^k}‖₂ ≤ ( √(1 + δ_{K−l}) √(1 + δ_{Nk}) δ_{Nk+K−l} / (1 − δ_{Nk}) ) ‖x_{T−Λ^k}‖₂. (99)

Finally, by combining (91), (92), and (99), we obtain

β₁ ≥ ( 1 − δ_{K−l} − √(1 + δ_{K−l}) √(1 + δ_{Nk}) δ_{Nk+K−l} / (1 − δ_{Nk}) ) ‖x_{T−Λ^k}‖₂ / √(K−l). (100)

APPENDIX C
PROOF OF THEOREM 4.1
Proof:
We observe that

‖x − x̂‖₂ ≤ ‖Φ(x − x̂)‖₂ / √(1 − δ_{|Λ^s ∪ T|}) (101)
≤ ‖Φ(x − x̂)‖₂ / √(1 − δ_{Ns+K}) (102)
≤ ‖Φx − Φ_{Λ^s} Φ†_{Λ^s} y‖₂ / √(1 − δ_{Ns+K}) (103)
= ‖y − v − Φ_{Λ^s} Φ†_{Λ^s} y‖₂ / √(1 − δ_{Ns+K}), (104)

where (101) is due to the definition of the RIP, (102) follows from the fact that x − x̂ is at most (Ns + K)-sparse (so that δ_{|Λ^s ∪ T|} ≤ δ_{Ns+K}), and (103) is from x̂_{Λ^s} = Φ†_{Λ^s} y. Since y − Φ_{Λ^s} Φ†_{Λ^s} y = y − P_{Λ^s} y = P⊥_{Λ^s} y = r^s, we further have

‖x − x̂‖₂ ≤ ‖r^s − v‖₂ / √(1 − δ_{Ns+K}) (105)
≤ ( ‖r^s‖₂ + ‖v‖₂ ) / √(1 − δ_{Ns+K}) (106)
≤ ε/√(1 − δ_{NK}) + ‖v‖₂/√(1 − δ_{NK}), (107)

where the last inequality is due to Ns + K ≤ NK (and hence δ_{Ns+K} ≤ δ_{NK}) and ‖r^s‖₂ < ε.

APPENDIX D
PROOF OF LEMMA 4.3
Proof:
Using the triangle inequality,

‖Φ'_W r^k‖₂ = ‖Φ'_W P⊥_{Λ^k} (Φx + v)‖₂ (108)
≤ ‖Φ'_W P⊥_{Λ^k} Φ_{T−Λ^k} x_{T−Λ^k}‖₂ + ‖Φ'_W P⊥_{Λ^k} v‖₂ (109)
≤ ‖Φ'_W Φ_{T−Λ^k} x_{T−Λ^k}‖₂ + ‖Φ'_W P_{Λ^k} Φ_{T−Λ^k} x_{T−Λ^k}‖₂ + ‖Φ'_W P⊥_{Λ^k} v‖₂. (110)

Using (78), (79), and (82), ‖Φ'_W r^k‖₂ is upper bounded by

‖Φ'_W r^k‖₂ ≤ ( δ_{N+K−l} + δ_{N+Nk} δ_{Nk+K−l}/(1 − δ_{Nk}) ) ‖x_{T−Λ^k}‖₂ + ‖Φ'_W P⊥_{Λ^k} v‖₂. (111)

Further, we have

‖Φ'_W P⊥_{Λ^k} v‖₂ ≤ ‖Φ'_W‖ ‖P⊥_{Λ^k} v‖₂ (112)
≤ √λ_max(Φ'_W Φ_W) ‖P⊥_{Λ^k} v‖₂ (113)
≤ √(1 + δ_N) ‖P⊥_{Λ^k} v‖₂ (114)
≤ √(1 + δ_N) ‖v‖₂. (115)

Plugging (115) into (111), we have

‖Φ'_W r^k‖₂ ≤ ( δ_{N+K−l} + δ_{N+Nk} δ_{Nk+K−l}/(1 − δ_{Nk}) ) ‖x_{T−Λ^k}‖₂ + √(1 + δ_N) ‖v‖₂. (116)

Using ‖Φ'_W r^k‖₂ ≥ √N α_N together with (116), we have

α_N ≤ ( δ_{N+K−l} + δ_{N+Nk} δ_{Nk+K−l}/(1 − δ_{Nk}) ) ‖x_{T−Λ^k}‖₂/√N + √(1 + δ_N) ‖v‖₂/√N. (117)

Next, we derive a lower bound for β₁. First, recalling that β₁ = max_{j ∈ T−Λ^k} |⟨φ_j, r^k⟩|, we have

β₁ ≥ (1/√(K−l)) ‖Φ'_{T−Λ^k} r^k‖₂ (118)
= (1/√(K−l)) ‖Φ'_{T−Λ^k} P⊥_{Λ^k} (Φx + v)‖₂. (119)

Using the triangle inequality,

β₁ ≥ ‖Φ'_{T−Λ^k} P⊥_{Λ^k} Φ_{T−Λ^k} x_{T−Λ^k}‖₂/√(K−l) − ‖Φ'_{T−Λ^k} P⊥_{Λ^k} v‖₂/√(K−l) (120)
≥ ‖Φ'_{T−Λ^k} Φ_{T−Λ^k} x_{T−Λ^k}‖₂/√(K−l) − ‖Φ'_{T−Λ^k} P_{Λ^k} Φ_{T−Λ^k} x_{T−Λ^k}‖₂/√(K−l) − ‖Φ'_{T−Λ^k} P⊥_{Λ^k} v‖₂/√(K−l). (121)

From (92) and (99), we have

‖Φ'_{T−Λ^k} Φ_{T−Λ^k} x_{T−Λ^k}‖₂ ≥ (1 − δ_{K−l}) ‖x_{T−Λ^k}‖₂

and

‖Φ'_{T−Λ^k} P_{Λ^k} Φ_{T−Λ^k} x_{T−Λ^k}‖₂ ≤ ( √(1 + δ_{K−l}) √(1 + δ_{Nk}) δ_{Nk+K−l}/(1 − δ_{Nk}) ) ‖x_{T−Λ^k}‖₂. (122)

Also,

‖Φ'_{T−Λ^k} P⊥_{Λ^k} v‖₂ ≤ ‖Φ'_{T−Λ^k}‖ ‖P⊥_{Λ^k} v‖₂ (123)
≤ √(1 + δ_{K−l}) ‖P⊥_{Λ^k} v‖₂ (124)
≤ √(1 + δ_{K−l}) ‖v‖₂. (125)

Finally, by combining (121), (122), and (125), we obtain

β₁ ≥ ( 1 − δ_{K−l} − √(1 + δ_{K−l}) √(1 + δ_{Nk}) δ_{Nk+K−l}/(1 − δ_{Nk}) ) ‖x_{T−Λ^k}‖₂/√(K−l) − √(1 + δ_{K−l}) ‖v‖₂/√(K−l). (126)

APPENDIX E
PROOF OF LEMMA 4.4
Proof:
Using the relaxations of the isometry constants in (27), we have

α_N ≤ ( δ_{N+K−l} + δ_{N+Nk} δ_{Nk+K−l}/(1 − δ_{Nk}) ) ‖x_{T−Λ^k}‖₂/√N + √(1 + δ_N) ‖v‖₂/√N
≤ ( δ_{NK} + δ_{NK}²/(1 − δ_{NK}) ) ‖x_{T−Λ^k}‖₂/√N + √(1 + δ_N) ‖v‖₂/√N
= ( δ_{NK}/(1 − δ_{NK}) ) ‖x_{T−Λ^k}‖₂/√N + √(1 + δ_N) ‖v‖₂/√N (127)

and

β₁ ≥ ( 1 − δ_{K−l} − √(1 + δ_{K−l}) √(1 + δ_{Nk}) δ_{Nk+K−l}/(1 − δ_{Nk}) ) ‖x_{T−Λ^k}‖₂/√(K−l) − √(1 + δ_{K−l}) ‖v‖₂/√(K−l)
≥ ( 1 − δ_{NK} − (1 + δ_{NK}) δ_{NK}/(1 − δ_{NK}) ) ‖x_{T−Λ^k}‖₂/√(K−l) − √(1 + δ_{K−l}) ‖v‖₂/√(K−l)
= ( (1 − 3δ_{NK})/(1 − δ_{NK}) ) ‖x_{T−Λ^k}‖₂/√(K−l) − √(1 + δ_{K−l}) ‖v‖₂/√(K−l). (128)

These bounds imply that a sufficient condition for α_N < β₁ is

( (1 − 3δ_{NK})/(1 − δ_{NK}) ) ‖x_{T−Λ^k}‖₂/√(K−l) − √(1 + δ_{K−l}) ‖v‖₂/√(K−l) > ( δ_{NK}/(1 − δ_{NK}) ) ‖x_{T−Λ^k}‖₂/√N + √(1 + δ_N) ‖v‖₂/√N. (129)

After some manipulations, we have

‖x_{T−Λ^k}‖₂ > [ ( √(1 + δ_{K−l}) + √(((K−l)/N)(1 + δ_N)) ) (1 − δ_{NK}) / ( 1 − 3δ_{NK} − √((K−l)/N) δ_{NK} ) ] ‖v‖₂. (130)

Since K − l ≤ K, this condition is guaranteed if

‖x_{T−Λ^k}‖₂ > [ ( √(1 + δ_K) + √((K/N)(1 + δ_N)) ) (1 − δ_{NK}) / ( 1 − 3δ_{NK} − √(K/N) δ_{NK} ) ] ‖v‖₂. (131)

APPENDIX F
COMPUTATIONAL COST FOR THE "ESTIMATE" STEP OF gOMP
In the k-th iteration, the gOMP estimates the nonzero elements of x by solving the LS problem

x̂_{Λ^k} = arg min_x ‖y − Φ_{Λ^k} x‖₂ = (Φ'_{Λ^k} Φ_{Λ^k})^{−1} Φ'_{Λ^k} y. (132)

To solve (132), we employ the MGS algorithm, in which the QR decomposition of the previous iteration is maintained and, therefore, the computational cost can be reduced. Without loss of generality, we assume Φ_{Λ^k} = (φ₁ φ₂ ⋯ φ_{Nk}). The QR decomposition of Φ_{Λ^k} is given by Φ_{Λ^k} = QR, where Q = (q₁ q₂ ⋯ q_{Nk}) ∈ R^{m×Nk} consists of Nk orthonormal columns and R ∈ R^{Nk×Nk} is the upper triangular matrix

R = [ ⟨q₁, φ₁⟩  ⟨q₁, φ₂⟩  ⋯  ⟨q₁, φ_{Nk}⟩
               ⟨q₂, φ₂⟩  ⋯  ⟨q₂, φ_{Nk}⟩
                          ⋱       ⋮
                               ⟨q_{Nk}, φ_{Nk}⟩ ].

For notational simplicity, we denote R_{i,j} = ⟨q_i, φ_j⟩ and p = N(k − 1). In addition, we denote the QR decomposition of the (k − 1)-th iteration as Φ_{Λ^{k−1}} = Q₋R₋. Then it is clear that Q = (Q₋ Q₀) and

R = [ R₋  R_a
      0   R_b ], (133)

where Q₀ = (q_{p+1} ⋯ q_{Nk}) ∈ R^{m×N} and R_a and R_b are given by

R_a = [ R_{1,p+1} ⋯ R_{1,Nk} ; ⋮ ⋱ ⋮ ; R_{p,p+1} ⋯ R_{p,Nk} ],
R_b = [ R_{p+1,p+1} ⋯ R_{p+1,Nk} ; ⋱ ⋮ ; R_{Nk,Nk} ]. (134)

Applying Φ_{Λ^k} = QR to (132), we have

x̂_{Λ^k} = (R'R)^{−1} R'Q'y. (135)

We count the cost of solving (135) in the following steps. Here we assess the computational cost by counting floating-point operations (flops); that is, each +, −, ∗, /, √ counts as one flop.

• Cost of the QR decomposition: To obtain Q and R, one only needs to compute Q₀, R_a, and R_b, since the previous data Q₋ and R₋ are stored. For j = 1 to N, we sequentially calculate

{R_{i,p+j}}_{i=1,...,p+j−1} = {⟨q_i, φ_{p+j}⟩}_{i=1,...,p+j−1},
q̂_{p+j} = φ_{p+j} − Σ_{i=1}^{p+j−1} R_{i,p+j} q_i,
q_{p+j} = q̂_{p+j}/‖q̂_{p+j}‖₂,  R_{p+j,p+j} = ⟨q_{p+j}, φ_{p+j}⟩.

Take j = 1 as an example.
One first computes {R_{i,p+1}}_{i=1,...,p} using Q₋ (which requires p(2m − 1) flops) and then computes q̂_{p+1} = φ_{p+1} − Σ_{i=1}^{p} R_{i,p+1} q_i (which requires 2mp flops). Then, the normalization of q̂_{p+1} requires 3m flops. Finally, computing R_{p+1,p+1} requires 2m − 1 flops. The cost of this example amounts to 4mp + 5m − p − 1. Similarly, one can calculate the other data in Q₀ and (R_a' R_b')'. In summary, the cost of this QR factorization becomes

C_QR = 4N²mk − 2N²m + 3Nm − N²k + N²/2 − N/2.

• Cost of calculating Q'y: Since Q = (Q₋ Q₀), we have Q'y = [ Q₋'y ; Q₀'y ]. By reusing the data Q₋'y, Q'y is computed with C₁ = N(2m − 1).

• Cost of calculating R'Q'y: Applying R' to the vector Q'y, we have

R'Q'y = [ R₋'Q₋'y ; R_a'Q₋'y + R_b'Q₀'y ].

Since the data R₋'Q₋'y can be reused, R'Q'y is computed with C₂ = 2N²k − N².

• Cost of calculating (R'R)^{−1}: Since R is an upper triangular matrix, (R'R)^{−1} can be computed from the triangular inverses of R and R'. Using the block matrix inversion formula, we have

R^{−1} = [ R₋  R_a ; 0  R_b ]^{−1} = [ (R₋)^{−1}  −(R₋)^{−1} R_a (R_b)^{−1} ; 0  (R_b)^{−1} ].

Then we calculate

(R'R)^{−1} = [ M₁  M₂ ; M₃  M₄ ],

where

M₁ = (R₋')^{−1} (R₋)^{−1},
M₂ = −(R₋')^{−1} (R₋)^{−1} R_a (R_b)^{−1},
M₃ = −(R_b')^{−1} R_a' (R₋')^{−1} (R₋)^{−1},
M₄ = (R_b')^{−1} (R_b)^{−1}.

We can reuse the data (R₋')^{−1}(R₋)^{−1}, so that the costs of calculating (R_b')^{−1}, (R_b')^{−1}(R_b)^{−1}, and −(R₋')^{−1}(R₋)^{−1}R_a(R_b)^{−1} become N(N + 1)(2N + 1)/6 (using the Gaussian elimination method), N(N + 1)(2N + 1)/6, and 2N³k² − 2N³k + 2N³, respectively. The cost for computing (R'R)^{−1} is

C₃ = 2N³k² − 2N³k + 2N³ + N(N + 1)(2N + 1)/3.
• Cost of calculating x̂_{Λ^k} = (R'R)^{−1}R'Q'y: Applying (R'R)^{−1} to the vector R'Q'y, we obtain

x̂_{Λ^k} = [ (R₋')^{−1}(R₋)^{−1} R₋'Q₋'y + ξ₁ ; ξ₂ + ξ₃ ],

where

ξ₁ = −(R₋')^{−1}(R₋)^{−1} R_a (R_b)^{−1} ( R_a'Q₋'y + R_b'Q₀'y ),
ξ₂ = −(R_b')^{−1} R_a' (R₋')^{−1}(R₋)^{−1} R₋'Q₋'y,
ξ₃ = (R_b')^{−1}(R_b)^{−1} ( R_a'Q₋'y + R_b'Q₀'y ).

We can reuse (R₋')^{−1}(R₋)^{−1}R₋'Q₋'y, so that the computation of ξ₁, ξ₂, and ξ₃ requires (2N − 1)N(k − 1), 2N²(k − 1) − N, and (2N − 1)N flops, respectively. Including the final additions, the cost of this step becomes C₄ = 4N²k − 2N² − N.

In summary, the whole cost of solving the LS problem in the k-th iteration of the gOMP is the sum of the above quantities:

C_LS = C_QR + C₁ + C₂ + C₃ + C₄ = 4N²km + (−2N² + 5N)m + 2N³k² + (−2N³ + 5N²)k + O(N³).
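The update just costed can be mirrored in code: keep (Q, R) from the previous iteration, orthogonalize only the N new columns by MGS, and solve the resulting triangular system (a Python/NumPy sketch of the idea, not the paper's flop-counted routine):

```python
import numpy as np

def mgs_append(Q, R, new_cols):
    """Extend a thin QR factorization Phi = Q R with new columns
    via modified Gram-Schmidt, reusing the stored Q and R."""
    for a in new_cols.T:
        v = a.astype(float).copy()
        r_col = np.zeros(R.shape[0] + 1)
        for i in range(Q.shape[1]):
            r_col[i] = Q[:, i] @ v        # R_{i,j} = <q_i, phi_j>
            v -= r_col[i] * Q[:, i]
        r_col[-1] = np.linalg.norm(v)
        Q = np.column_stack([Q, v / r_col[-1]])
        R = np.pad(R, ((0, 1), (0, 0)))   # one new zero row...
        R = np.column_stack([R, r_col])   # ...and one new column
    return Q, R

def ls_via_qr(Q, R, y):
    """x = R^{-1} Q' y solves min_x ||y - Phi x||_2 for Phi = Q R.
    (R is upper triangular, so a back substitution would suffice.)"""
    return np.linalg.solve(R, Q.T @ y)
```

Starting from Q = zeros((m, 0)) and R = zeros((0, 0)) and appending N columns per iteration reproduces the batch LS solution on the accumulated support, while only the new columns are orthogonalized at each step.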
ACKNOWLEDGMENT

The authors would like to thank the anonymous reviewers for their valuable suggestions that improved the presentation of the paper.

REFERENCES

[1] D. L. Donoho and X. Huo, "Uncertainty principles and ideal atomic decomposition," IEEE Trans. Inform. Theory, vol. 47, no. 7, pp. 2845-2862, Nov. 2001.
[2] D. L. Donoho, I. Drori, Y. Tsaig, and J. L. Starck, Sparse Solution of Underdetermined Linear Equations by Stagewise Orthogonal Matching Pursuit, Citeseer, 2006.
[3] E. J. Candès and T. Tao, "Decoding by linear programming," IEEE Trans. Inform. Theory, vol. 51, no. 12, pp. 4203-4215, Dec. 2005.
[4] E. J. Candès, J. Romberg, and T. Tao, "Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information," IEEE Trans. Inform. Theory, vol. 52, no. 2, pp. 489-509, Feb. 2006.
[5] E. J. Candès and J. Romberg, "Sparsity and incoherence in compressive sampling," Inverse Problems, vol. 23, pp. 969, Apr. 2007.
[6] E. J. Candès, "The restricted isometry property and its implications for compressed sensing," Comptes Rendus Mathematique, vol. 346, no. 9-10, pp. 589-592, 2008.
[7] M. A. Davenport and M. B. Wakin, "Analysis of orthogonal matching pursuit using the restricted isometry property," IEEE Trans. Inform. Theory, vol. 56, no. 9, pp. 4395-4401, Sept. 2010.
[8] J. A. Tropp and A. C. Gilbert, "Signal recovery from random measurements via orthogonal matching pursuit," IEEE Trans. Inform. Theory, vol. 53, no. 12, pp. 4655-4666, Dec. 2007.
[9] R. Baraniuk, M. Davenport, R. DeVore, and M. Wakin, "A simple proof of the restricted isometry property for random matrices," Constructive Approximation, vol. 28, no. 3, pp. 253-263, 2008.
[10] M. Raginsky, R. M. Willett, Z. T. Harmany, and R. F. Marcia, "Compressed sensing performance bounds under Poisson noise," IEEE Trans. Signal Process., vol. 58, no. 8, pp. 3990-4002, Aug. 2010.
[11] J. Wang and B. Shim, "Exact reconstruction of sparse signals via generalized orthogonal matching pursuit," in Proc. Asilomar Conf. on Signals, Systems and Computers, Monterey, CA, Nov. 2011, pp. 1139-1142.
[12] K. Gao, S. N. Batalama, D. A. Pados, and B. W. Suter, "Compressive sampling with generalized polygons," IEEE Trans. Signal Process., vol. 59, no. 10, pp. 4759-4766, Oct. 2011.
[13] M. A. Khajehnejad, A. G. Dimakis, W. Xu, and B. Hassibi, "Sparse recovery of nonnegative signals with minimal expansion," IEEE Trans. Signal Process., vol. 59, no. 1, pp. 196-208, Jan. 2011.
[14] S. Sarvotham, D. Baron, and R. G. Baraniuk, "Compressed sensing reconstruction via belief propagation," preprint, 2006.
[15] D. Needell and R. Vershynin, "Signal recovery from incomplete and inaccurate measurements via regularized orthogonal matching pursuit," IEEE J. Sel. Topics Signal Process., vol. 4, no. 2, pp. 310-316, Apr. 2010.
[16] W. Dai and O. Milenkovic, "Subspace pursuit for compressive sensing signal reconstruction," IEEE Trans. Inform. Theory, vol. 55, no. 5, pp. 2230-2249, May 2009.
[17] D. Needell and J. A. Tropp, "CoSaMP: Iterative signal recovery from incomplete and inaccurate samples," Applied and Computational Harmonic Analysis, vol. 26, no. 3, pp. 301-321, March 2009.
[18] J. Wang and B. Shim, "On the recovery limit of sparse signals using orthogonal matching pursuit," IEEE Trans. Signal Process., vol. 60, no. 9, pp. 4973-4976, Sep. 2012.
[19] D. Needell and R. Vershynin, "Uniform uncertainty principle and signal recovery via regularized orthogonal matching pursuit," Foundations of Computational Mathematics, vol. 9, no. 3, pp. 317-334, 2009.
[20] E. Liu and V. N. Temlyakov, "The orthogonal super greedy algorithm and applications in compressed sensing," IEEE Trans. Inform. Theory, vol. 58, no. 4, pp. 2040-2047, Apr. 2012.
[21] R. Maleh, "Improved RIP analysis of orthogonal matching pursuit," arXiv:1102.4311, 2011.
[22] E. Candès, M. Rudelson, T. Tao, and R. Vershynin, "Error correction via linear programming," in IEEE Symposium on Foundations of Computer Science (FOCS), 2005, pp. 668-681.
[23] J. Wang, S. Kwon, and B. Shim, "Near optimal bound of orthogonal matching pursuit using restricted isometric constant," EURASIP Journal on Advances in Signal Processing, vol. 2012, no. 1, pp. 8, 2012.
[24] T. Zhang, "Sparse recovery with orthogonal matching pursuit under RIP," IEEE Trans. Inform. Theory, vol. 57, no. 9, pp. 6215-6221, Sept. 2011.
[25] S. Foucart, "Stability and robustness of weak orthogonal matching pursuits," submitted for publication, 2011.
[26] Å. Björck, Numerical Methods for Least Squares Problems, no. 51, Society for Industrial Mathematics, 1996.
Jian Wang (Student Member, IEEE) received the B.S. degree in material engineering and the M.S. degree in information and communication engineering from Harbin Institute of Technology, China, in 2006 and 2009, respectively. He is currently working toward the Ph.D. degree in electrical and computer engineering at Korea University. His research interests include compressive sensing, wireless communications, and statistical learning.

Seokbeop Kwon (Student Member, IEEE) received the B.S. and M.S. degrees from the School of Information and Communication, Korea University, in 2008 and 2010, where he is currently working toward the Ph.D. degree. His research interests include compressive sensing and signal processing.