Compressive Sensing of Analog Signals Using Discrete Prolate Spheroidal Sequences
Mark A. Davenport^s and Michael B. Wakin^{c,∗}

^s Department of Statistics, Stanford University, 390 Serra Mall, Sequoia Hall, Stanford, CA 94305, USA. Email: [email protected]
^c Department of Electrical Engineering and Computer Science, Colorado School of Mines, 1500 Illinois St., Golden, CO 80401, USA. Email: [email protected]

∗ Corresponding author. Phone: (303) 273-3607. Fax: (303) 273-3602. This work was partially supported by NSF grants DMS-1004718 and CCF-0830320, DARPA grant FA8650-08-C-7853, and AFOSR grant FA9550-09-1-0465.
September 2011; revised March 2012
Abstract
Compressive sensing (CS) has recently emerged as a framework for efficiently capturing signals that are sparse or compressible in an appropriate basis. While often motivated as an alternative to Nyquist-rate sampling, there remains a gap between the discrete, finite-dimensional CS framework and the problem of acquiring a continuous-time signal. In this paper, we attempt to bridge this gap by exploiting the Discrete Prolate Spheroidal Sequences (DPSS's), a collection of functions that trace back to the seminal work by Slepian, Landau, and Pollak on the effects of time-limiting and bandlimiting operations. DPSS's form a highly efficient basis for sampled bandlimited functions; by modulating and merging DPSS bases, we obtain a dictionary that offers high-quality sparse approximations for most sampled multiband signals. This multiband modulated DPSS dictionary can be readily incorporated into the CS framework. We provide theoretical guarantees and practical insight into the use of this dictionary for recovery of sampled multiband signals from compressive measurements.
Keywords:
Compressive sensing, multiband signals, Discrete Prolate Spheroidal Sequences, Fourier analysis, sampling, block-sparsity, approximation, signal recovery, greedy algorithms
In recent decades the digital signal processing community has enjoyed enormous success in developing hardware and algorithms for capturing and extracting information from signals. Capitalizing on the early work of Whittaker, Nyquist, Kotelnikov, and Shannon on the sampling and representation of continuous signals, signal processing has moved from the analog to the digital domain and ridden the wave of Moore's law. Digitization has enabled the creation of sensing and processing systems that are more robust, flexible, cheaper and, therefore, more ubiquitous than their analog counterparts.

The foundation of this progress has been the Nyquist sampling theorem, which states that in order to perfectly capture the information in an arbitrary continuous-time signal x(t) with bandlimit B_nyq/2 Hz, we must sample the signal at its Nyquist rate of B_nyq samples/sec. This requirement has placed a growing burden on analog-to-digital converters as applications that require processing signals of ever-higher bandwidth lead to ever-higher sampling rates. This pushes these devices toward a physical barrier, beyond which their design becomes increasingly difficult and costly [77].

In recent years, compressive sensing (CS) has emerged as a framework that can significantly reduce the acquisition cost at a sensor [2, 10, 28]. CS builds on the work of Candès, Romberg, and Tao [13] and Donoho [28], who showed that a signal that can be compressed using classical methods such as transform coding can also be efficiently acquired via a small set of nonadaptive, linear, and usually randomized measurements.

There remains, however, a prominent gap between the theoretical framework of CS, which deals with acquiring finite-length, discrete signals that are sparse or compressible in a basis or dictionary, and the problem of acquiring a continuous-time signal. Previous work has attempted to bridge this gap by employing two very different strategies. First, works such as [73] have employed a simple multitone analog signal model that maps naturally into a finite-dimensional sparse model. Although this assumption allows the reconstruction problem to be formulated directly within the CS framework, the multitone model can be unrealistic for many analog signals of practical interest. Alternatively, other authors have considered a more plausible multiband analog signal model that is also amenable to sub-Nyquist sampling [8, 34, 35, 47, 48, 75]. These works, however, have involved customized sampling protocols and reconstruction formulations that fall largely outside of the standard CS framework. Indeed, some of this body of literature and many of its underlying ideas actually pre-date the very existence of CS.

In this paper, we bridge this gap in a different manner. Namely, we show that when dealing with finite-length windows of samples, it is possible to map the multiband analog signal model, in a very natural way, into a finite-dimensional sparse model. One can then apply many of the standard theoretical tools of CS to develop algorithms for both recovery as well as compressive-domain processing of multiband signals.

Our work actually rests on ideas that trace back to the classical signal processing literature and the study of time-frequency localization.
The Weyl-Heisenberg uncertainty principle states that a signal cannot be simultaneously localized on a finite interval in both time and frequency. A natural question is to what extent it is possible to concentrate a signal x(t) and its continuous-time Fourier transform (CTFT) X(F) near finite intervals. In an extraordinary series of papers from the 1960s and 1970s, Slepian, Landau, and Pollak provide an in-depth investigation into this question [42, 43, 63, 65, 67]. The implications of this body of work have had a tremendous impact across a number of disciplines within mathematics and engineering, particularly in the fields of spectral estimation and harmonic analysis (e.g., [69]). Very few of these ideas have appeared in the CS literature, however, and so one goal of this paper is to carefully explain, from a CS perspective, the natural role that these ideas can indeed play in CS.

We begin this paper in Sections 2 and 3 with a description of our problem setup and a survey of the necessary CS background material. In Section 4, we introduce the multitone and multiband analog signal models. We then discuss how sparse representations for multiband signals can be incorporated into the CS framework through the use of Discrete Prolate Spheroidal Sequences (DPSS's) [65]. First described by Slepian in 1978, DPSS's form a highly efficient basis for sampled bandlimited functions. For the sake of clarity and completeness, we provide a self-contained review of the key results from Slepian's work that are most relevant to the problem of modeling sampled multiband signals. We then explain how, by modulating and merging DPSS bases, one obtains a dictionary that, to a very high degree of approximation, provides a sparse representation for most finite-length, Nyquist-rate sample vectors arising from multiband analog signals. We also explain why the qualifiers "approximation" and "most" in the preceding sentence are necessary; however, we characterize them formally and justify the use of the multiband modulated DPSS dictionary in practical settings.

In Section 5, we discuss the use of the multiband modulated DPSS dictionary for recovery of sampled multiband signals from compressive measurements. We discuss the implications (in terms of formulating reconstruction procedures and guaranteeing their performance) of the fact that our dictionary is not quite orthogonal; in fact, it may be undercomplete or overcomplete, depending on the setting of a user-defined parameter. We also provide theoretical guarantees for recovery algorithms that exploit the block-sparse nature of signal expansions in our dictionary. Ultimately, this allows us to guarantee that most finite-length sample vectors arising from multiband analog signals can, to a high degree of approximation, be recovered from a number of measurements that is proportional to the underlying information level (also known as the Landau rate [41]).

In Section 6, we present the results of a detailed suite of simulations for signal recovery from compressive measurements, illustrating the effectiveness of our proposed approaches on realistic signals. We show that the reconstruction quality achieved using the multiband modulated DPSS dictionary is far better than what is achieved using the discrete Fourier transform (DFT) as a sparsifying basis. These results confirm that a DPSS-based dictionary can provide a very attractive alternative to the DFT for sparse recovery. We conclude in Section 7 with a final discussion and directions for future work.
Although customized measurement and reconstruction schemes [8, 34, 35, 47, 48, 75] have previously been proposed for efficiently sampling multiband signals, we believe that our paper is of independent interest from these works, specifically because we restrict ourselves to operating within the finite-dimensional CS framework. There are a variety of plausible CS (and even non-CS) scenarios where a sparse representation of a finite-length Nyquist-rate sample vector would be useful, and it is this problem to which we devote our attention. This work may be of interest, for example, to any practitioner who has struggled with the lack of sparsity that the DFT dictionary provides even for pure sampled tones at "off-grid" frequencies. Moreover, as we discuss more fully in Section 2, several analog CS hardware architectures can be viewed as providing random projections of finite-length, Nyquist-rate sample vectors. Our formulation is compatible with these architectures and does not require a customized sampling protocol.

It is important to mention that we are not the first authors to recognize the potential role that DPSS's (or their continuous-time counterparts, the Prolate Spheroidal Wave Functions, or PSWF's [42, 43, 63, 67]) can play in CS. Izu and Lakey [40] have drawn an analogy between sampling bounds for multiband signals and classical results in CS, but not specifically for the purpose of using the finite-dimensional CS framework for sparse recovery of sample vectors from multiband analog signals. Gosse [37] has considered the recovery of smooth functions from random samples; however, this work focuses on a very different setting, employing a PSWF (not DPSS) dictionary, considering only baseband signals, and exploiting sparsity in a different way than our work. Senay et al. [60, 61] have also considered a PSWF dictionary for reconstruction of signals from nonuniform samples; however, this work also focuses on baseband signals and lacks formal approximation and CS recovery guarantees. Oh et al. [53] have employed a modulated DPSS dictionary for sampled bandpass signals; however, this work falls largely outside the standard CS framework and again lacks formal approximation and CS recovery guarantees of the type we provide. Finally, Sejdić et al. [59] have proposed a multiband modulated DPSS dictionary very similar to our own and a greedy algorithm for signal decomposition in that dictionary. However, this work is again not devoted to developing sparse approximation guarantees for sampled multiband signals. It focuses not on signal recovery but on identification of a communications channel, and the proposed reconstruction algorithm is not intended to exploit block-sparse structure in the signal coefficients. We hope that our paper will be a valuable addition to this nascent literature and help to encourage much further exploration of the connections between DPSS's, PSWF's, and CS.
Before proceeding, we first briefly introduce some mathematical notation that we will use throughout the paper. We use bold characters to indicate finite-dimensional vectors and matrices. All such vectors and matrices are indexed beginning at 0, so that the first element of a length-N vector x is given by x[0] and the last by x[N − 1]. We denote the conjugate (Hermitian) transpose of a matrix A by A^H. We use ‖·‖_p to denote the standard ℓ_p norm. We also use ‖x‖_0 := |supp(x)| to denote the number of nonzeros of x, and we say that x is S-sparse if ‖x‖_0 = S. We use E[x] to denote the expected value of a random variable x and P[E] to denote the probability of an event E. Finally, we adopt the convention j = √−1.

In the standard CS setting, one is concerned with recovering a finite-dimensional vector x ∈ C^N from a limited number of measurements. A typical first-order assumption is that the vector x is sparse, meaning that there exists some basis or dictionary Ψ such that x = Ψα and α has a small number of nonzeros, i.e., ‖α‖_0 ≤ S for some S ≪ N. One then acquires the measurements

y = Ax + e,    (1)

where A ∈ C^{M×N} maps x to a length-M vector of complex-valued measurements, and where e is a length-M vector that represents measurement noise generated by the acquisition hardware. In the context of CS, one seeks to design A so that M is on the order of S (the number of degrees of freedom of the signal) and potentially much smaller than N.

In the present work, however, we are concerned with the acquisition of a finite-length window of a complex-valued, continuous-time signal, which we denote by x(t). Specifically, we suppose that we are interested in a time window of length T_w seconds and that we acquire the measurements

y = Φ(x(t)) + e,    (2)

where Φ is a linear measurement operator that maps functions defined on [0, T_w) to a length-M vector of measurements and e again represents measurement noise. We assume throughout this paper that x(t) is bandlimited with bandlimit B_nyq/2 Hz, i.e., that x(t) has a continuous-time Fourier transform (CTFT)

X(F) = ∫_{−∞}^{∞} x(t) e^{−j2πFt} dt

such that X(F) = 0 for |F| > B_nyq/2. Additional assumptions on x(t) will be specified in Section 4.1.2.

Because we assume that x(t) is bandlimited and that the measurement process (2) takes place over a finite window of time, we restrict our attention to the problem of recovering the Nyquist-rate samples of x(t) over this time interval. Specifically, we let T_s ≤ 1/B_nyq denote a sampling interval (in seconds) chosen to meet the minimum Nyquist sampling rate, and we let x[n] denote the infinite-length sequence that would be obtained by uniformly sampling x(t) with sampling period T_s, i.e., x[n] = x(nT_s). We are interested in a time window of length T_w seconds, during which there are N = ⌈T_w/T_s⌉ samples. We let x = [x[0] x[1] ··· x[N − 1]]^T denote x[n] truncated to the N samples from 0 to N − 1.
This paper is specifically devoted to the problem of recovering x, the vector of Nyquist-rate samples of x(t) on [0, T_w), from compressive measurements y of the form (2).

To facilitate this, we first note that the sensing model in (2) is clearly very similar to the standard CS model in (1). We briefly describe conditions under which these models are equivalent. Recall from the Shannon-Nyquist sampling theorem that x(t) can be perfectly reconstructed from x[n] since 1/T_s ≥ B_nyq. Specifically, we have the formula

x(t) = Σ_{n=−∞}^{∞} x[n] sinc(t/T_s − n),    (3)

where

sinc(t) = { sin(πt)/(πt), t ≠ 0;  1, t = 0. }

Observe that since Φ is linear, we can express each measurement y[m] in (2) simply as the inner product between x(t) and some sensing functional φ_m(t), i.e.,

y[m] = ⟨φ_m(t), x(t)⟩ + e[m].    (4)

In this case we can use (3) to reduce (4) to

y[m] = ⟨φ_m(t), Σ_{n=−∞}^{∞} x[n] sinc(t/T_s − n)⟩ + e[m] = Σ_{n=−∞}^{∞} x[n] ⟨φ_m(t), sinc(t/T_s − n)⟩ + e[m].    (5)

If we let A denote the M × N matrix with entries given by A[m, n] = ⟨φ_m(t), sinc(t/T_s − n)⟩ and let w denote the length-M vector with entries given by

w[m] = Σ_{n ≤ −1 or n ≥ N} x[n] ⟨φ_m(t), sinc(t/T_s − n)⟩,    (6)

then (2) reduces to

y = Ax + w + e.    (7)

If the vector w = 0, then (7) is exactly equivalent to the standard CS sensing model in (1). Moreover, if w is not zero but is small compared to e, then we can simply absorb w into e and again reduce (7) to (1).

Note that our goal is to recover x, which of course carries useful information about x(t), but recovering x may not be sufficient for exactly recovering x(t) on the entire window [0, T_w). (This depends on the exact sampling rate and the decay of the analog interpolation kernel.) In practice, the methods we describe in this paper for digital single-window reconstruction could be implemented in a streaming multi-window setting, and this would allow for a more accurate reconstruction of x(t) on the entire window.

In our setup, since Φ maps functions defined on [0, T_w) to vectors in C^M, we are inherently assuming that φ_m(t) = 0 outside of [0, T_w), so that the sensing functionals are time-limited. Although certain acquisition systems (such as the modulated wideband converter of [48]) do not satisfy this condition, we believe that it is often a reasonable assumption in practice and that many acquisition systems can at least be well-approximated as time-limited.

In general, the size of w will depend greatly on the choice of the φ_m(t). While a detailed analysis of w for various practical choices of φ_m(t) is beyond the scope of this paper, we briefly mention some possible strategies for controlling w. First, one can easily show that if each φ_m(t) consists of any weighted combination of Dirac delta functions positioned at times 0, T_s, ..., (N − 1)T_s, then by construction w = 0. This should not be surprising, as in this case it is clear that the measurements are simply a linear combination of the Nyquist-rate samples from the finite window.
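A quick numerical illustration of this special case (a minimal Python/numpy sketch of our own; the ±1 comb weights and all dimensions are arbitrary, hypothetical choices): if each φ_m(t) is a Dirac comb on the sampling grid, then since sinc(k − n) = δ[k − n] for integers k and n, the matrix A in (5) is simply the matrix of comb weights and the sum in (6) vanishes, so w = 0.

    import numpy as np

    # Each phi_m(t) is a weighted Dirac comb at times 0, T_s, ..., (N-1)T_s with
    # i.i.d. +/-1 weights.  Then A[m, n] = <phi_m(t), sinc(t/T_s - n)> equals the
    # weight at time nT_s, and the out-of-window terms in (6) are zero, so w = 0.
    rng = np.random.default_rng(0)
    N, M = 256, 64
    A = rng.choice([-1.0, 1.0], size=(M, N))   # comb weights = sensing matrix

    n = np.arange(N)
    x = np.cos(2 * np.pi * 0.11 * n) + 0.5 * np.sin(2 * np.pi * 0.23 * n)  # samples

    e = 0.01 * rng.standard_normal(M)          # measurement noise
    y = A @ x + e                              # exactly the standard CS model (1)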
Importantly, it is possible to collect measurements of this type without first acquiring the Nyquist-rate samples (see, for example, the architecture proposed in [62]), although there are also plenty of situations in which one might explicitly apply a matrix multiplication to compress data after acquiring a length-N vector of Nyquist-rate samples.

For many architectures used in practice, it will not be the case that w = 0 exactly. However, it may still be possible to ensure that w remains very small. There are a number of possible routes to such a guarantee. For example, the φ_m(t) could be designed to incorporate a smooth window g(t) so that we effectively sample x(t)g(t) instead of x(t), where g(t) is designed to ensure that x[n]g[n] ≈ 0 for n ≤ −1 and n ≥ N. The reconstruction algorithm could then compensate for the effect of g[n] on the samples 0, 1, ..., N − 1.
Alternatively, by considering a slightly oversampled version of x(t) (so that 1/T_s exceeds B_nyq by some nontrivial amount) it is also possible to replace the sinc interpolation kernel with one that decays significantly faster, ensuring that the inner products in (6) decay to zero extremely quickly [18]. Finally, as we will see below, many constructions of φ_m(t) often involve a degree of randomness that could also be leveraged to show that with high probability, the inner products in (6) decay even faster. However, since the details will depend greatly on the particular architecture used, we leave such an investigation for future work.

Having argued that the measurement model in (2) can often be expressed in the form (1), we now turn to the central theoretical question of this paper:

Supposing that x(t) obeys the multiband model described in Section 4.1, how can we recover x, i.e., the Nyquist-rate samples of x(t) on [0, T_w), from compressive measurements of the form y = Ax + e?

In order to answer this question, of course, we will need a dictionary Ψ that provides a suitably sparse representation for x. We devote Section 4 to constructing such a dictionary. In addition to a dictionary Ψ, however, we will also need a reconstruction algorithm that can efficiently recover x from the compressive measurements y. While it is certainly possible to apply out-of-the-box CS recovery algorithms to this problem, there are certain properties of our dictionary that make the recovery problem worthy of further consideration. (In particular, the columns of our dictionary Ψ will typically not be orthogonal, and the sparse coefficient vectors α that arise will tend to have structured (block-sparse) sparsity patterns.) In light of these nuances, Section 3 now provides additional background on CS that will allow us to formulate a principled recovery technique.

Setting aside the question of how to design the sparsity-inducing dictionary Ψ, we first address the problem of designing A. Although many favorable properties for sensing matrices have been studied in the context of CS, the most common is the restricted isometry property (RIP) [14]. We say that the matrix AΨ satisfies the RIP of order S if there exists a constant δ_S ∈ (0, 1) such that
√(1 − δ_S) ≤ ‖AΨα‖_2 / ‖α‖_2 ≤ √(1 + δ_S)    (8)

holds for all α such that ‖α‖_0 ≤ S. In words, AΨ preserves the norm of S-sparse vectors. Note that for any pair of vectors α and α′ such that ‖α‖_0 = ‖α′‖_0 = S, we have that ‖α − α′‖_0 ≤ 2S. This gives us an alternative interpretation of (8), namely that the RIP of order 2S ensures that AΨ preserves Euclidean distances between S-sparse vectors α.

A related concept is what we call the Ψ-RIP (following the notation in [12]). Specifically, we say that the matrix A satisfies the Ψ-RIP of order S if there exists a constant δ_S ∈ (0, 1) such that
√(1 − δ_S) ≤ ‖AΨα‖_2 / ‖Ψα‖_2 ≤ √(1 + δ_S)    (9)

holds for all α such that ‖α‖_0 ≤ S. When Ψ is an orthonormal basis, (8) and (9) are equivalent. However, we will be concerned in this paper with non-orthogonal (and even non-square) dictionaries Ψ, in which case the RIP and the Ψ-RIP are slightly different concepts: the former ensures norm preservation of all sparse coefficient vectors α, while the latter ensures norm preservation of all signals having a sparse representation x = Ψα. In many problems (such as when Ψ is an overcomplete dictionary), the RIP is considered to be a stronger requirement.

There are a variety of approaches to constructing matrices that satisfy the RIP or Ψ-RIP, some of which are better suited to practical architectures than others. From a theoretical standpoint, however, the most fruitful approaches involve the use of random matrices. Specifically, we consider matrices constructed as follows: given M and N, we generate a random M × N matrix A by choosing the entries A[m, n] as independent and identically distributed (i.i.d.) random variables. While it is not strictly necessary, for the sake of simplicity we will consider only real-valued random variables, so that A ∈ R^{M×N}.

We impose two conditions on the random distribution. First, we require that the distribution is centered and normalized such that E(A[m, n]) = 0 and E(A[m, n]²) = 1/M. Second, we require that the distribution is subgaussian [9, 76], meaning that there exists a constant c > 0 such that

E(e^{A[m,n]t}) ≤ e^{c²t²/2}    (10)

for all t ∈ R. Examples of subgaussian distributions include the Gaussian distribution, the Rademacher distribution, and the uniform distribution. In general, any distribution with bounded support is subgaussian.

The key property of subgaussian random variables that will be of use in this paper is that for any x ∈ C^N, the random variable ‖Ax‖²_2 is highly concentrated about ‖x‖²_2. In particular, there exists a constant c(η) > 0, depending on η and the constant c in (10), such that

P[ |‖Ax‖²_2 − ‖x‖²_2| ≥ η‖x‖²_2 ] ≤ 4e^{−c(η)M},    (11)

where the probability is taken over all draws of the matrix A (see Lemma 6.1 of [27] or [19]). We leave the constant c(η) undefined since it will depend both on the particular subgaussian distribution under consideration and on the range of η considered. Importantly, however, for any subgaussian distribution and any η_max, we can write c(η) = κη² for η ≤ η_max, with κ being a constant that depends on certain properties of the distribution [19]. This concentration bound has an important consequence for the norms of vectors in a fixed subspace, which we state in the following lemma.

(The concentration result in (11) is typically stated for x ∈ R^N instead of C^N. The complex case follows from the real case by handling the real and imaginary parts separately and then applying the union bound, which results in a factor of 4 instead of 2 in front of the exponent.)

Lemma 3.1.
Let X denote any S-dimensional subspace of C^N. Fix δ, β ∈ (0, 1). Let A be an M × N random matrix with i.i.d. entries chosen from a distribution satisfying (11). If

M ≥ (S log(42/δ) + log(4/β)) / c(δ/√2),    (12)
then with probability exceeding 1 − β,

√(1 − δ) ‖x‖_2 ≤ ‖Ax‖_2 ≤ √(1 + δ) ‖x‖_2    (13)

for all x ∈ X.

When Ψ is an orthonormal basis, one can use this lemma to go beyond a single S-dimensional subspace to instead consider all possible subspaces spanned by S columns of Ψ, thereby establishing the RIP for AΨ. The proof follows that of Theorem 5.2 of [4]. (The constants in [4] differ from those in Lemma 3.1, but the proof is substantially the same; see [21]. Note that in [21] X is a subspace of R^N rather than C^N. In our case we incur an additional factor of 2 in the constant, which arises as a consequence of the increase in the covering number for a sphere in C^S; this can easily be derived from the fact that there is an isometry between C^S and R^{2S}.)

Lemma 3.2.
Let Ψ be an orthonormal basis for C^N and fix δ, β ∈ (0, 1). Let A be an M × N random matrix with i.i.d. entries chosen from a distribution satisfying (11). If

M ≥ (S log(42eN/(δS)) + log(4/β)) / c(δ/√2),    (14)
with e denoting the base of the natural logarithm, then with probability exceeding 1 − β, AΨ will satisfy the RIP of order S with constant δ.

Proof. This is a simple generalization of Lemma 3.1, which follows from the observation that (8) is equivalent to (13) holding for all S-dimensional subspaces. There are (N choose S) ≤ (eN/S)^S subspaces of dimension S aligned with the coordinate axes of Ψ, and so applying a union bound to Lemma 3.1 we obtain the desired result.

From essentially the same argument, we can also prove for more general dictionaries Ψ that A will satisfy the Ψ-RIP.

Corollary 3.1.
Let Ψ be an arbitrary N × D matrix and fix δ, β ∈ (0, 1). Let A be an M × N random matrix with i.i.d. entries chosen from a distribution satisfying (11). If

M ≥ (S log(42eD/(δS)) + log(4/β)) / c(δ/√2),    (15)
with e denoting the base of the natural logarithm, then with probability exceeding 1 − β, A will satisfy the Ψ-RIP of order S with constant δ.

As noted above, the random matrix approach is somewhat impractical to build in hardware. However, several hardware architectures have been implemented and/or proposed that enable compressive samples to be acquired in practical settings. Examples include the random demodulator [73], random filtering [74], the modulated wideband converter [48], random convolution [1, 57], and the compressive multiplexer [62]. In this paper we will rely on random matrices in the development of our theory, but we will see via simulations that the techniques we propose are also applicable to systems that use some of these more practical architectures.

3.2 CS recovery algorithms

Before we return to the problem of designing Ψ, we first discuss the question of how to recover the vector x from measurements of the form y = Ax + e = AΨα + e. The original CS theory proposed ℓ_1-minimization as a recovery technique [10, 28]. Convex optimization techniques are powerful methods for CS signal recovery, but there also exist a variety of alternative greedy or iterative algorithms that are commonly used in practice and that satisfy similar performance guarantees, including iterative hard thresholding (IHT) [6], orthogonal matching pursuit (OMP) [24, 26, 54, 72], and several more recent variations on OMP [16, 17, 29, 50–52].

In this paper we will restrict our attention to two of the most commonly used algorithms in practice: IHT and CoSaMP [6, 50]. We begin with IHT, which is probably the simplest of all CS recovery algorithms. As is the case for most iterative recovery algorithms, a core component of IHT is hard thresholding. Specifically, we define the operator

hard(α, S)[n] = { α[n], if |α[n]| is among the S largest elements of |α|;  0, otherwise. }    (16)

In words, the hard thresholding operator sets all but the S largest elements of a vector to zero (with ties broken according to any arbitrary rule).

To the best of our knowledge, there are no existing papers that specifically discuss how to implement IHT when Ψ is not an orthonormal basis or a tight frame (see [15] for a discussion of the latter case). Nonetheless, we can envision two natural (and reasonable) ways that the canonical IHT algorithm [6] can be extended to handle a general dictionary. In the first of these variations, the algorithm would consist of iteratively applying the update rule

α^{ℓ+1} = hard(α^ℓ + µ(AΨ)^H(y − AΨα^ℓ), S),    (17)

where µ is a parameter set by the user. In the second of these variations, the algorithm would consist of iteratively applying the update rule

x^{ℓ+1} = Ψ · hard(Ψ^H(x^ℓ + µA^H(y − Ax^ℓ)), S).    (18)

When Ψ is an orthonormal basis these algorithms are equivalent, but in general they are not. On the whole, IHT is a remarkably simple algorithm, but in practice its performance is greatly dependent on careful selection and adaptation of the parameter µ. We refer the reader to [6] for further details.
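For concreteness, the following minimal numpy sketch (ours, not the authors' code; the fixed step size and iteration count are arbitrary choices, whereas in practice µ should be tuned or adapted) implements the hard thresholding operator (16) and the coefficient-space IHT update (17):

    import numpy as np

    def hard(alpha, S):
        # Hard thresholding (16): keep the S largest-magnitude entries, zero the rest.
        out = np.zeros_like(alpha)
        idx = np.argsort(np.abs(alpha))[-S:]
        out[idx] = alpha[idx]
        return out

    def iht(A, Psi, y, S, mu=0.5, iters=200):
        # Coefficient-space IHT variant (17); the signal-space variant (18) would
        # instead apply hard(Psi^H (.), S) to a gradient step taken on x.
        APsi = A @ Psi
        alpha = np.zeros(APsi.shape[1], dtype=complex)
        for _ in range(iters):
            alpha = hard(alpha + mu * APsi.conj().T @ (y - APsi @ alpha), S)
        return alpha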
CoSaMP is a somewhat more complicated algorithm, but it can be easily understood as breaking the recovery problem into two separate sub-problems: identifying the S columns of Ψ that best represent x, and then projecting onto that subspace. The former problem is clearly somewhat challenging, but once solved, the latter is relatively straightforward. In particular, if we have identified the optimal columns of Ψ, indexed by the set Λ, then we can recover x via least-squares. In this case, an optimal recovery strategy is to solve the problem

x̂ = argmin_z ‖y − Az‖_2  s.t.  z ∈ R(Ψ_Λ),    (19)

where Ψ_Λ denotes the submatrix of Ψ that contains only the columns of Ψ corresponding to the index set Λ, and R(Ψ_Λ) denotes the range of Ψ_Λ. If we let Ã = AΨ, then one way to obtain the solution to (19) is via the pseudoinverse of Ã_Λ, denoted Ã_Λ^†. Specifically, we can compute

α̂|_Λ = Ã_Λ^† y = (Ã_Λ^H Ã_Λ)^{−1} Ã_Λ^H y  and  α̂|_{Λ^c} = 0,    (20)

and then set x̂ = Ψα̂. While this is certainly not the only approach to solving (19) (as we will see in Section 6.1.2), it allows us to easily observe that in the noise-free setting, if the support estimate Λ is correct, then y = Ax = Ã_Λ α|_Λ, and so plugging this into (20) yields α̂ = α (and hence x̂ = x) provided that Ã_Λ has full column rank. Thus, the central challenge in recovery is to correctly identify the set Λ. CoSaMP and related algorithms solve this problem by iteratively identifying likely columns, performing a projection, and then improving the estimate of which columns to use.

Unfortunately, we are again not aware of any papers that specifically discuss how to implement CoSaMP when Ψ is not an orthonormal basis. Nonetheless, we can envision two natural extensions of the canonical CoSaMP algorithm [50]. One of these is shown in Algorithm 1; in a sense, this formulation is more analogous to (18) than to (17) because it is focused on recovery of x rather than α.

Algorithm 1 Compressive Sampling Matching Pursuit (CoSaMP)
  input: A, Ψ, y, S, stopping criterion
  initialize: r^0 = y, x^0 = 0, ℓ = 0
  while not converged do
    proxy: h^ℓ = A^H r^ℓ
    identify: Ω^{ℓ+1} = supp(hard(Ψ^H h^ℓ, 2S))
    merge: Λ^{ℓ+1} = supp(hard(Ψ^H x^ℓ, S)) ∪ Ω^{ℓ+1}
    update: x̃ = argmin_z ‖y − Az‖_2 s.t. z ∈ R(Ψ_{Λ^{ℓ+1}})
            x^{ℓ+1} = Ψ · hard(Ψ^H x̃, S)
            r^{ℓ+1} = y − Ax^{ℓ+1}
    ℓ = ℓ + 1
  end while
  output: x̂ = x^ℓ

However, Algorithm 1 is actually quite flexible and can be invoked in multiple ways. To help distinguish among the different possibilities, it will be helpful to introduce the notation x̂ = CoSaMP(A, Ψ, y, S) to denote the output produced by Algorithm 1 when the arguments (A, Ψ, y, S) are provided as input. Having set this notation, it is also reasonable to consider invoking Algorithm 1 with the input arguments (AΨ, I, y, S). This formulation is more analogous to (17). In this case we will denote the output by α̂ = CoSaMP(AΨ, I, y, S), since the algorithm will construct and output an estimate of α (rather than x).

We note that the choice of 2S in the "identify" step is primarily driven by the proof technique, and is not intended to be interpreted as an optimal or necessary choice. For example, in [17] it is shown that the choice of S is sufficient to establish performance guarantees similar to those for CoSaMP. It is also important to note that when the number of measurements M is very small (less than 3S) it is necessary to make suitable modifications, as the assumptions of the algorithm are clearly violated in this case. Moreover, a simple extension of CoSaMP as presented here involves including an additional orthogonalization step after pruning x̃ down to an S-dimensional estimate, as is also done in [17]. This can often result in modest performance gains and is a technique that we exploit in our simulations.
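The following numpy sketch (our own illustrative rendering of Algorithm 1, with a fixed iteration count standing in for the stopping criterion and without the optional orthogonalization step) shows one way these steps translate into code:

    import numpy as np

    def hard(v, S):
        # hard thresholding (16): keep the S largest-magnitude entries
        out = np.zeros_like(v)
        idx = np.argsort(np.abs(v))[-S:]
        out[idx] = v[idx]
        return out

    def cosamp(A, Psi, y, S, iters=30):
        # Sketch of Algorithm 1.  cosamp(A, Psi, y, S) estimates x, while
        # cosamp(A @ Psi, np.eye(Psi.shape[1]), y, S) estimates alpha instead.
        x = np.zeros(A.shape[1], dtype=complex)
        r = y.astype(complex)
        for _ in range(iters):
            h = A.conj().T @ r                                       # proxy
            omega = np.argsort(np.abs(Psi.conj().T @ h))[-2 * S:]    # identify
            lam = np.union1d(omega, np.flatnonzero(hard(Psi.conj().T @ x, S)))
            z, *_ = np.linalg.lstsq(A @ Psi[:, lam], y, rcond=None)  # update (19)
            x = Psi @ hard(Psi.conj().T @ (Psi[:, lam] @ z), S)      # prune
            r = y - A @ x                                            # residual
        return x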
Traditional approaches to CS signal recovery, like those described above, place no prior assumptions on supp(α). Sparsity on its own implies nothing about the locations of the nonzeros, and hence most approaches to CS signal recovery treat every possible support as equally likely. However, in many practical applications the nonzeros are not distributed completely at random, but rather exhibit a structured pattern. By exploiting this structured sparsity, it is possible to both reduce the required number of measurements and develop specialized "model-based" algorithms for recovery that exploit this structure [3, 5].

In this paper, we are interested in the model of block-sparsity [3, 31]. In a block-sparse vector α, the nonzero coefficients cluster in a small number of blocks. Specifically, suppose that x = Ψα with Ψ being an N × D matrix, and that we decompose Ψ into J submatrices of size N × D/J, i.e.,

Ψ = [Ψ_0 Ψ_1 ··· Ψ_{J−1}].

Then we can write x = Σ_{i=0}^{J−1} Ψ_i α_i, where each α_i ∈ C^{D/J}. We say that α is K-block-sparse if there exists a set I ⊆ {0, 1, ..., J − 1} such that |I| = K and α_i = 0 for all i ∉ I. With some abuse of notation, we now let Ψ_I denote the submatrix of Ψ that contains only the columns of Ψ corresponding to the blocks indexed by I.

We first illustrate how we can exploit block-sparsity algorithmically. Our goal is to generalize IHT and CoSaMP to the block-sparse setting. To do this, we observe that the hard thresholding function plays a key role in both algorithms. One way to interpret this role is that hard(Ψ^H x, S) is actually computing a projection of Ψ^H x onto the set of S-sparse vectors. In the case where Ψ is an orthonormal basis, we can also interpret Ψ · hard(Ψ^H x, S) as projecting x onto the set of signals that are S-sparse with respect to the basis Ψ.

In the block-sparse case we must replace hard thresholding with an appropriate operator that takes a candidate signal and finds the closest K-block-sparse approximation. Towards this end, we define

S_Ψ(x, K) := argmin_{|I| ≤ K} min_{z ∈ R(Ψ_I)} ‖x − z‖_2.    (21)

S_Ψ(x, K) is analogous to the support of α in the traditional sparse setting: it tells us which set of K blocks of Ψ can best approximate x. Along with S_Ψ(x, K), we also define

P_Ψ(x, K) := argmin_z ‖x − z‖_2  s.t.  z ∈ R(Ψ_{S_Ψ(x,K)}).    (22)

P_Ψ is simply the projection of the vector x ∈ C^N onto the set of K-block-sparse signals.
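When Ψ is an orthonormal basis, both operators have a simple closed form: S_Ψ selects the K coefficient blocks of Ψ^H x with the largest energy, and P_Ψ resynthesizes from those blocks. A minimal numpy sketch of this special case (ours; for the non-orthogonal dictionaries of interest later in the paper it is only a heuristic approximation):

    import numpy as np

    def block_project(x, Psi, K, J):
        # S_Psi and P_Psi of (21)-(22) for Psi with orthonormal columns:
        # keep the K size-(D/J) coefficient blocks with the largest energy.
        alpha = Psi.conj().T @ x                 # analysis coefficients (length D)
        blocks = alpha.reshape(J, -1)            # row i = coefficients of Psi_i
        energy = np.sum(np.abs(blocks) ** 2, axis=1)
        keep = np.sort(np.argsort(energy)[-K:])  # S_Psi(x, K): best K blocks
        mask = np.zeros(J, dtype=bool)
        mask[keep] = True
        blocks[~mask] = 0                        # zero the other J - K blocks
        return keep, Psi @ blocks.reshape(-1)    # (S_Psi(x, K), P_Psi(x, K))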
To simplify our notation, we will often write S(α, K) = S_Ψ(α, K) and P(x, K) = P_Ψ(x, K) when Ψ is clear from the context.

For each of the IHT and CoSaMP algorithms proposed in Section 3.2.1, it is possible to propose a variation of the algorithm designed to exploit block-sparsity simply by replacing the hard thresholding operator with an appropriate block-sparse projection. For example, one block-based version of IHT (which is also a special case of the iterative projection algorithm in [5]) would consist of replacing the core iteration of IHT in (18) with

x^{ℓ+1} = P(x^ℓ + µA^H(y − Ax^ℓ), K).    (23)

Note that the only difference from (18) is that we have replaced hard thresholding with the projection onto the set of block-sparse signals. Similarly, a block-based version of CoSaMP is shown in Algorithm 2.

Algorithm 2 Block-Based CoSaMP (BBCoSaMP)
  input: A, Ψ, y, K, stopping criterion
  initialize: r^0 = y, x^0 = 0, ℓ = 0
  while not converged do
    proxy: h^ℓ = A^H r^ℓ
    identify: Ĩ^{ℓ+1} = S(P(h^ℓ, 2K), 2K)
    merge: I^{ℓ+1} = S(x^ℓ, K) ∪ Ĩ^{ℓ+1}
    update: x̃ = argmin_z ‖y − Az‖_2 s.t. z ∈ R(Ψ_{I^{ℓ+1}})
            x^{ℓ+1} = P(x̃, K)
            r^{ℓ+1} = y − Ax^{ℓ+1}
    ℓ = ℓ + 1
  end while
  output: x̂ = x^ℓ

Algorithm 2 is in differing senses both a special case and a generalization of the model-based CoSaMP algorithm proposed in [3]. Specifically, in [3] an algorithm for block-sparse signal recovery is proposed that is equivalent to Algorithm 2 when Ψ = I; the more general case of arbitrary Ψ is not discussed. However, there are alternative options for handling Ψ ≠ I besides the one specified in Algorithm 2. Like Algorithm 1, Algorithm 2 is quite flexible and can be invoked in multiple ways. Following our convention in Section 3.2.1, we use the notation x̂ = BBCoSaMP(A, Ψ, y, K) to denote the output produced by Algorithm 2 when the arguments (A, Ψ, y, K) are provided as input, and we use the notation α̂ = BBCoSaMP(AΨ, I, y, K) to denote the output produced by Algorithm 2 when the arguments (AΨ, I, y, K) are provided as input.

Theoretical guarantees for standard CS recovery algorithms typically rely on the RIP, and since in the standard case any S-sparse signal is possible, there is little room for improvement. However, the block-sparse model actually rules out a large number of possible signal supports, and so we no longer require the full RIP or Ψ-RIP, i.e., we no longer need (8) or (9) to hold for all possible S-sparse signals. Instead we only require that (8) or (9) hold for all α which are K-block-sparse. We will refer to these relaxed properties as the block-RIP and the Ψ-block-RIP, respectively.

The relaxation to block-sparse signals allows us to potentially dramatically reduce the required number of measurements. Specifically, note that a K-block-sparse vector α satisfies ‖α‖_0 ≤ KD/J. In the standard sparsity model we would have that the number of possible subspaces is (D choose S) with S = KD/J, whereas now the number of possible subspaces is given by (J choose K), which can be potentially much smaller. For example, with J = 10 blocks of size D/J = 10 and K = 3, there are only (10 choose 3) = 120 block-sparse supports, whereas the standard sparsity model with D = 100 and S = 30 admits (100 choose 30) ≈ 3 × 10^25 supports. Establishing (8) or (9) for a more general union of subspaces is a problem that has received some attention in the CS literature [7, 32, 45].
In our context it should be clear that we can simply apply Lemma 3.1 just as in the proofs of Lemma 3.2 and Corollary 3.1 to obtain the following improved bounds.

Corollary 3.2.
Let Ψ be an orthonormal basis for C^N and fix δ, β ∈ (0, 1). Let A be an M × N random matrix with i.i.d. entries chosen from a distribution satisfying (11). If

M ≥ (K((N/J) log(42/δ) + log(eJ/K)) + log(4/β)) / c(δ/√2),    (24)
with e denoting the base of the natural logarithm, then with probability exceeding 1 − β, AΨ will satisfy the block-RIP of order K with constant δ.

Corollary 3.3.
Let Ψ be an arbitrary N × D matrix and fix δ, β ∈ (0, 1). Let A be an M × N random matrix with i.i.d. entries chosen from a distribution satisfying (11). If

M ≥ (K((D/J) log(42/δ) + log(eJ/K)) + log(4/β)) / c(δ/√2),    (25)
with e denoting the base of the natural logarithm, then with probability exceeding 1 − β, A will satisfy the Ψ-block-RIP of order K with constant δ.

The measurement requirements in (24) and (25) represent improvements over (14) and (15) in that a straightforward application of (14) or (15) would lead to replacing the log(eJ/K) term above with (N/J) log(eJ/K) or (D/J) log(eJ/K), respectively.

We can combine these corollaries with the following theorems to show that we can stably recover block-sparse signals using fewer measurements. Note that the following theorems are simplified guarantees that do not address the case of only approximately block-sparse signals. Both theorems can be generalized to block-compressible signals, but we restrict our attention to the guarantees for exactly block-sparse signals to simplify the discussion. (By block-compressible, we mean signals that are well-approximated by a block-sparse signal. The guarantee on the recovery error for block-compressible signals is similar to those in Theorems 3.1 and 3.2 but includes an additional additive component that quantifies the error incurred by approximating α with a block-sparse signal. If a signal is close to being block-sparse, then this error is negligible, but if a signal is not block-sparse at all, then this error can potentially be large.)

Theorem 3.1 (Theorem 2 from [5]). Suppose that x can be written as x = Ψα where α is K-block-sparse and that we observe y = AΨα + e = Ax + e. If A satisfies the Ψ-block-RIP of order 3K with constant δ_{3K} ≤ 0.1 and 1/µ ∈ [1 + δ_{3K}, 1.5(1 − δ_{3K})), then the estimate obtained from block-based IHT (23) satisfies

‖x − x̂‖_2 ≤ κ‖e‖_2,    (26)

where κ > 0 is a constant determined by δ_{3K} and the stopping criterion.

Theorem 3.2 (Theorem 6 from [3]). Suppose that α is K-block-sparse and that we observe y = AΨα + e. If AΨ satisfies the block-RIP of order 4K with constant δ_{4K} ≤ 0.1, then the output of block-based CoSaMP (Algorithm 2) with α̂ = BBCoSaMP(AΨ, I, y, K) satisfies

‖α − α̂‖_2 ≤ κ‖e‖_2,    (27)

where κ > 0 is a constant determined by δ_{4K} and the stopping criterion.

Theorems 3.1 and 3.2 show that measurement noise has a controlled impact on the amount of noise in the reconstruction. However, note that Theorems 3.1 and 3.2 are fundamentally different from one another when the matrix Ψ is not an orthonormal basis. Theorem 3.2 requires the assumption that AΨ satisfies the block-RIP, while Theorem 3.1 requires that A satisfies the Ψ-block-RIP. Theorem 3.2 also provides a different guarantee (recovery of α) compared to Theorem 3.1 (which guarantees recovery of x). In the case where Ψ is an orthonormal basis, we could immediately set x̂ = Ψα̂ so that the recovery guarantee in (27) applies to the error in x as well. However, for arbitrary dictionaries Ψ this equivalence no longer holds. We conjecture that were we to instead consider x̂ = BBCoSaMP(A, Ψ, y, K) we should be able to establish a theorem for block-based CoSaMP analogous to Theorem 3.1, but we leave such an analysis for future work. (It is also worth noting that in the context of traditional, as opposed to model-based, CS, there do exist guarantees on ‖x̂ − x‖_2 for general Ψ when using an alternative optimization-based approach combined with the Ψ-RIP [12].)

In Section 4 we develop a multiband modulated DPSS dictionary Ψ designed to offer high-quality block-sparse representations for sampled multiband signals. Using this dictionary we establish block-RIP and Ψ-block-RIP guarantees in Section 5, which allows us to translate Theorems 3.1 and 3.2 into guarantees for recovery of sampled multiband signals from compressive measurements. When we then turn to implement these algorithms, however, it is important to note that, although we can implement the BBCoSaMP(AΨ, I, y, K) version of block-based CoSaMP with no trouble,
the same cannot be said for block-based IHT or the BBCoSaMP(A, Ψ, y, K) version of block-based CoSaMP. Specifically, both of those latter algorithms require that we be able to compute P(x, K) as defined in (22). Unfortunately, because our dictionary Ψ is not orthonormal, we are aware of no algorithms that are guaranteed to solve this problem. However, we will see empirically in Section 6 that we can attempt to solve this problem by applying many of the same algorithms commonly used for CS recovery. In other words, we can perfectly implement an algorithm (BBCoSaMP(AΨ, I, y, K)) that has a provable guarantee on the recovery of α, and we can approximately implement algorithms (block-based IHT and BBCoSaMP(A, Ψ, y, K)) that are intended to accurately recover x. We have implemented both variations of block-based CoSaMP, and while both perform well in practice, the empirical performance of BBCoSaMP(A, Ψ, y, K) turns out to be superior. In Section 6 we present a suite of simulations demonstrating the remarkable effectiveness of BBCoSaMP(A, Ψ, y, K) (when combined with the multiband modulated DPSS dictionary) in recovering sampled multiband signals from compressive measurements.

We now confront the problem of designing a dictionary Ψ in which x, a length-N vector of Nyquist-rate samples of x(t), will be sparse or compressible. This is typically a significant challenge to anyone applying CS techniques to analog signals, since many natural analog signal models cannot be obviously represented via a simple basis Ψ. We now describe two of the most appealing analog signal models and discuss the degree to which these models can fit within the CS framework.

There are a variety of possibilities for quantifying the notion of sparsity in a continuous-time signal x(t). Perhaps the simplest, dating back at least to the work of Prony [56], is the multitone model, which assumes that x(t) can be expressed as

x(t) = Σ_{k=0}^{S−1} α_k e^{j2πF_k t},    (28)

i.e., x(t) is simply a weighted combination of S complex exponentials of arbitrary frequencies. A related model is given by

x(t) = Σ_{n=0}^{N−1} α[n] e^{j2π(n − N/2)t/(NT_s)},    (29)

where ‖α‖_0 = S. This model is considered in [73], which provided one of the first extensions of the CS framework to the case of continuous-time signals. The advantage of this model is that (29) implies that x = Ψα, where x is the vector of discrete-time samples of x(t) on [0, T_w) and Ψ is the non-normalized N × N DFT matrix with suitably ordered columns. Thus, the model (29) immediately fits into the standard CS framework when the vector of coefficients α is sparse.

In practice, however, this approach is inadequate for two main reasons. First, the model in (29) assumes that any tones in the signal have a frequency that lies exactly on the so-called Nyquist grid, i.e., the tones are bounded harmonics. When dealing with tones of arbitrary frequencies, the corresponding "DFT leakage" results in α that are not sparse and are not even well-approximated as sparse.
In this case it can be useful to either incorporate a smooth windowing function into the measurement system, as in [73], or to consider the less restrictive model in (28), as in [30]. However, these approaches do not address the second main objection, which is that many (if not most) real-world signals are not mere combinations of a few pure tones. For a variety of reasons, it is typically more realistic to assume that each of the S signal components has some non-negligible bandwidth, which leads us to instead consider the following extension of the multitone model.

For the remainder of this paper, we will focus on multiband signals, or signals whose Fourier transform is concentrated on a small number of continuous intervals, or bands. Towards this end, for a general continuous-time signal x(t), we define F(x) as the support of X(F), i.e., F(x) = {F ∈ R : X(F) ≠ 0}. We will be interested in signals for which we can decompose F = F(x) into a union of K continuous intervals, so that we can write

F ⊆ ∪_{i=0}^{K−1} [a_i, b_i].

In the most general setting, we would allow the endpoints of each interval to be arbitrary but subject to a restriction on the total Lebesgue measure of their union; see, for example, [8, 34, 35, 75]. In this paper we restrict ourselves to a simpler model. Specifically, we divide the range of possible frequencies from −B_nyq/2 to B_nyq/2 into J = B_nyq/B_band equal bands of width B_band and require X(F) to be supported on at most K of these bands. An example is illustrated in Figure 1. More formally, we define the i-th band as

∆_i = [−B_nyq/2 + iB_band, −B_nyq/2 + (i + 1)B_band].

We then define
Γ(K, B_band) = { F : F ⊆ ∪_{i∈I} ∆_i for some I ⊆ {0, 1, ..., J − 1} with |I| ≤ K }

as the set of all possible supports. Using this, we define

M(K, B_band) = { x(t) : F(x) ⊆ F′ for some F′ ∈ Γ(K, B_band) }    (30)

as the set of multiband signals. Note that the total occupied bandwidth is at most KB_band. Our interest is in the setting where KB_band ≪ B_nyq, so that if we knew a priori where the K bands were located, we could acquire x(t) in a streaming setting with only KB_band samples per second. (This is the so-called Landau rate [41].) Our goal is to acquire finite windows of multiband signals without such a priori knowledge while keeping the measurement rate as close as possible to the Landau rate.

Note, however, that the set M(K, B_band) is defined for infinite-length signals x(t). Indeed, any signal with a Fourier transform supported on a finite range of frequencies cannot also be supported on a finite range of time. This would seem to be somewhat at odds with the finite-dimensional CS framework described above. As a result, previous efforts aimed at sampling multiband signals have developed largely outside the framework of CS [8, 34, 35, 47, 48, 75]. It is our goal in this paper to show that it is possible to recover a finite-length window of samples of a multiband signal using many of the standard tools from CS. To do this, we will need to construct an appropriate dictionary Ψ for capturing the structure of this set, which we do in the following section.

Figure 1: Illustration showing the CTFT of a multiband signal where J = B_nyq/B_band = 10 and K = 3.
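For illustration, the following numpy sketch (ours; the window length, band count, occupancy, and number of tones per band are arbitrary choices) draws a length-N vector of Nyquist-rate samples of a signal in M(K, B_band), working in digital frequency where each band has width 1/J:

    import numpy as np

    rng = np.random.default_rng(1)
    N, J, K = 1024, 16, 3                  # window length, total bands, occupied bands
    occupied = rng.choice(J, size=K, replace=False)

    n = np.arange(N)
    x = np.zeros(N, dtype=complex)
    for i in occupied:
        lo = -0.5 + i / J                  # digital edges of band i, mirroring
        hi = lo + 1.0 / J                  # Delta_i mapped onto [-1/2, 1/2)
        freqs = rng.uniform(lo, hi, size=50)           # off-grid tones in the band
        amps = rng.standard_normal(50) + 1j * rng.standard_normal(50)
        x += np.exp(2j * np.pi * np.outer(n, freqs)) @ amps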
Finally, we also note that while our signal model breaks up the spectrum into J = B_nyq/B_band bands with fixed boundaries and bandwidth, it actually encompasses the broader class of signals where the bandwidth and center frequency of each band are arbitrary. For example, a signal with K bands of width B_band but with arbitrary center frequencies will automatically lie within M(2K, B_band). Since we are primarily interested in the case where KB_band ≪ B_nyq, this factor of 2 will not be significant in the development of our theoretical results. However, we do note that in practice it may be possible to achieve a significant gain by exploiting a more flexible signal model.

In the remainder of this section we demonstrate that it is possible to construct discrete-time bases using Discrete Prolate Spheroidal Sequences (DPSS's) that efficiently capture the structure of sampled multiband signals. We first review DPSS's and their key properties as first developed in [65], and we then discuss some of the consequences of these properties in terms of their utility in representing sampled continuous-time signals. Ultimately, we demonstrate how to use DPSS's to construct a dictionary Ψ which sparsely represents windows of sampled multiband signals.

Our goal in this subsection is to provide a self-contained review of the concepts from Slepian's work on DPSS's [65] that are most relevant to the problem of modeling sampled multiband signals. We refer the reader to [42, 43, 63–67] for more complete overviews of DPSS's, PSWF's, and their implications in time-frequency localization.
Let N be an integer, and let 0 < W < 1/2. Given N and W, the DPSS's are a collection of N discrete-time sequences that are strictly bandlimited to the digital frequency range |f| ≤ W yet highly concentrated in time to the index range n = 0, 1, ..., N − 1. The DPSS's are defined to be the eigenvectors of a two-step procedure in which one first time-limits the sequence and then bandlimits the sequence. Before we can state a more formal definition, let us note that for a given discrete-time signal x[n], we let

X(f) = Σ_{n=−∞}^{∞} x[n] e^{−j2πfn}

denote the discrete-time Fourier transform (DTFT) of x[n]. (Note that we use lower-case f to indicate the digital frequency, so that X(f) represents the DTFT of a discrete-time sequence x[n], while X(F) represents the CTFT of a continuous-time signal x(t).) Next, we let B_W denote an operator that takes a discrete-time signal, bandlimits its DTFT to the frequency range |f| ≤ W, and returns the corresponding signal in the time domain. Additionally, we let T_N denote an operator that takes an infinite-length discrete-time signal and zeros out all entries outside the index range {0, 1, ..., N − 1} (but still returns an infinite-length signal). With these definitions, the DPSS's are defined as follows.

Definition 4.1 ([65]). Given N and W, the Discrete Prolate Spheroidal Sequences (DPSS's) are a collection of N real-valued discrete-time sequences s_{N,W}^{(0)}, s_{N,W}^{(1)}, ..., s_{N,W}^{(N−1)} that, along with the corresponding scalar eigenvalues

1 > λ_{N,W}^{(0)} > λ_{N,W}^{(1)} > ··· > λ_{N,W}^{(N−1)} > 0,

satisfy

B_W(T_N(s_{N,W}^{(ℓ)})) = λ_{N,W}^{(ℓ)} s_{N,W}^{(ℓ)}    (31)

for all ℓ ∈ {0, 1, ..., N − 1}. The DPSS's are normalized so that

‖T_N(s_{N,W}^{(ℓ)})‖_2 = 1    (32)

for all ℓ ∈ {0, 1, ..., N − 1}.

As we discuss in more detail below in Section 4.2.4, the eigenvalues λ_{N,W}^{(0)}, λ_{N,W}^{(1)}, ..., λ_{N,W}^{(N−1)} have a very distinctive behavior: the first 2NW eigenvalues tend to cluster extremely close to 1, while the remaining eigenvalues tend to cluster similarly close to 0.

Before proceeding, let us briefly mention several key properties of the DPSS's that will be useful in our subsequent analysis. First, it is clear from (31) that the DPSS's are, by definition, strictly bandlimited to the digital frequency range |f| ≤ W. Second, the DPSS's are also approximately time-limited to the index range n = 0, 1, ..., N − 1. Specifically, it can be shown that [65]

‖s_{N,W}^{(ℓ)}‖_2 = 1/√(λ_{N,W}^{(ℓ)}).    (33)

Comparing (32) with (33), we see that for values of ℓ where λ_{N,W}^{(ℓ)} ≈ 1, nearly all of the energy in s_{N,W}^{(ℓ)} is contained in the interval {0, 1, ..., N − 1}. Third, the DPSS's are orthogonal over Z [65], i.e., for any ℓ, ℓ′ ∈ {0, 1, ..., N − 1} with ℓ ≠ ℓ′, ⟨s_{N,W}^{(ℓ)}, s_{N,W}^{(ℓ′)}⟩ = 0.
While each DPSS actually has infinite support in time, several very useful properties hold for the collection of signals one obtains by time-limiting the DPSS's to the index range n = 0, 1, ..., N − 1. First, the time-limited DPSS's T_N(s_{N,W}^{(0)}), T_N(s_{N,W}^{(1)}), ..., T_N(s_{N,W}^{(N−1)}) are approximately bandlimited to the digital frequency range |f| ≤ W. Specifically, from (31) and (33), one can deduce that

‖B_W(T_N(s_{N,W}^{(ℓ)}))‖_2 = √(λ_{N,W}^{(ℓ)}).    (34)

Figure 2: Illustration of four example DPSS's (ℓ = 0, 127, 511, 767) time-limited to the interval [0, 1, ..., N − 1], and the magnitude of their DTFT's. In this example, N = 1024 and W = 1/4. Note that for ℓ up to approximately 2NW − 1 the energy of the spectrum is highly concentrated on the interval [−W, W], and when ℓ is sufficiently larger than 2NW − 1 the energy of the spectrum is concentrated almost entirely outside the interval [−W, W].

Comparing (32) with (34), we see that for values of ℓ where λ_{N,W}^{(ℓ)} ≈ 1, nearly all of the energy in T_N(s_{N,W}^{(ℓ)}) is contained in the frequencies |f| ≤ W. An illustration of four representative time-limited DPSS's and their DTFT's is provided in Figure 2. While by construction the DTFT of any DPSS is perfectly bandlimited, the DTFT of the corresponding time-limited DPSS will only be concentrated in the bandwidth of interest for the first 2NW DPSS's. As a result, we will frequently be primarily interested in roughly the first 2NW DPSS's.

Second, the time-limited DPSS's are also orthogonal [65], so that for any ℓ, ℓ′ ∈ {0, 1, ..., N − 1} with ℓ ≠ ℓ′,

⟨T_N(s_{N,W}^{(ℓ)}), T_N(s_{N,W}^{(ℓ′)})⟩ = 0.    (35)

Finally, like the DPSS's, the time-limited DPSS's have a special eigenvalue relationship with the time-limiting and bandlimiting operators. In particular, if we apply the operator T_N to both sides of (31), we see that the sequences T_N(s_{N,W}^{(ℓ)}) are actually eigenfunctions of the two-step procedure in which one first bandlimits a sequence and then time-limits the sequence.
Because our focus in this paper is primarily on representing and reconstructing finite-length vectors, we will find the following restriction of the time-limited DPSS's to be useful, where we restrict the domain exclusively to the index range $n = 0, 1, \ldots, N-1$.

Definition 4.2. Given $N$ and $W$, the DPSS vectors $\mathbf{s}^{(0)}_{N,W}, \mathbf{s}^{(1)}_{N,W}, \ldots, \mathbf{s}^{(N-1)}_{N,W} \in \mathbb{R}^N$ are defined by restricting the time-limited DPSS's to the index range $n = 0, 1, \ldots, N-1$:
$$\mathbf{s}^{(\ell)}_{N,W}[n] := \mathcal{T}_N(s^{(\ell)}_{N,W})[n] = s^{(\ell)}_{N,W}[n] \quad \text{for all } \ell, n \in \{0, 1, \ldots, N-1\}.$$
Figure 3:
Eigenvalue concentration for $N = 1024$ and $W = \frac{1}{4}$. Note that the first $2NW = 512$ eigenvalues cluster near 1 and the remaining eigenvalues rapidly approach the level of machine precision.

Following from our discussion in Section 4.2.2, the DPSS vectors obey several favorable properties. First, combining (32) and (35), it follows that the DPSS vectors form an orthonormal basis for $\mathbb{C}^N$ (or for $\mathbb{R}^N$). However, as we discuss in subsequent sections, bases constructed using just the first $\approx 2NW$ DPSS vectors can be remarkably effective for capturing the energy in our signals of interest. Second, if we define $\mathbf{B}_{N,W}$ to be the $N \times N$ matrix with entries given by
$$\mathbf{B}_{N,W}[m,n] := 2W \operatorname{sinc}(2W(m-n)), \tag{36}$$
we see that the eigenvectors of $\mathbf{B}_{N,W}$ are given by the DPSS vectors $\mathbf{s}^{(0)}_{N,W}, \mathbf{s}^{(1)}_{N,W}, \ldots, \mathbf{s}^{(N-1)}_{N,W}$, and the corresponding eigenvalues are $\lambda^{(0)}_{N,W}, \lambda^{(1)}_{N,W}, \ldots, \lambda^{(N-1)}_{N,W}$ [65]. Thus, if we concatenate the DPSS vectors into an $N \times N$ matrix
$$\mathbf{S}_{N,W} := \left[ \mathbf{s}^{(0)}_{N,W} \;\; \mathbf{s}^{(1)}_{N,W} \;\; \cdots \;\; \mathbf{s}^{(N-1)}_{N,W} \right] \in \mathbb{R}^{N \times N} \tag{37}$$
and let $\mathbf{\Lambda}_{N,W}$ denote an $N \times N$ diagonal matrix with the DPSS eigenvalues $\lambda^{(0)}_{N,W}, \lambda^{(1)}_{N,W}, \ldots, \lambda^{(N-1)}_{N,W}$ along the main diagonal, then we can write the eigendecomposition of $\mathbf{B}_{N,W}$ as
$$\mathbf{B}_{N,W} = \mathbf{S}_{N,W} \mathbf{\Lambda}_{N,W} \mathbf{S}^H_{N,W}. \tag{38}$$
This decomposition will prove useful in our analysis below.

As mentioned above, the eigenvalues $\lambda^{(0)}_{N,W}, \lambda^{(1)}_{N,W}, \ldots, \lambda^{(N-1)}_{N,W}$ have a very distinctive and important behavior: the first $2NW$ eigenvalues tend to cluster extremely close to 1, while the remaining eigenvalues tend to cluster similarly close to 0. This behavior—which will allow us to construct efficient bases using small numbers of DPSS vectors—is illustrated in Figure 3 and captured more formally in the following results.
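Before stating these results formally, here is a small numerical check of (36)–(38) and of the clustering behavior, given as a sketch assuming NumPy (note that np.sinc is the normalized sinc, matching (36)):

```python
import numpy as np

# Build the prolate matrix B_{N,W} of (36) for a small example.
N, W = 256, 0.25                 # so 2NW = 128
m = np.arange(N)
B = 2 * W * np.sinc(2 * W * (m[:, None] - m[None, :]))

# Eigendecomposition (38); sort eigenvalues in decreasing order.
lam, S = np.linalg.eigh(B)
lam, S = lam[::-1], S[:, ::-1]

print(lam.sum(), 2 * N * W)      # trace of B_{N,W} equals 2NW (cf. Lemma 4.3 below)
print(lam[:3])                   # leading eigenvalues: extremely close to 1
print(lam[140:143])              # beyond 2NW = 128: extremely close to 0
```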
Lemma 4.1 (Eigenvalues that cluster near one [65]). Suppose that $W$ is fixed, and let $\epsilon \in (0,1)$ be fixed. Then there exist constants $C_1, C_2$ (where $C_1$ may depend on $W$ and $\epsilon$) and an integer $N_0$ (which may also depend on $W$ and $\epsilon$) such that
$$\lambda^{(\ell)}_{N,W} \ge 1 - C_1 e^{-C_2 N} \quad \text{for all } \ell \le 2NW(1-\epsilon) \text{ and all } N \ge N_0. \tag{39}$$

Lemma 4.2 (Eigenvalues that cluster near zero [65]). Suppose that $W$ is fixed, and let $\epsilon \in (0, \frac{1}{2W} - 1)$ be fixed. Then there exist constants $C_3, C_4$ (where $C_3$ may depend on $W$ and $\epsilon$) and an integer $N_1$ (which may also depend on $W$ and $\epsilon$) such that
$$\lambda^{(\ell)}_{N,W} \le C_3 e^{-C_4 N} \quad \text{for all } \ell \ge 2NW(1+\epsilon) \text{ and all } N \ge N_1. \tag{40}$$
Alternatively, suppose that $W$ is fixed, and let $\alpha > 0$ be fixed. Then there exist constants $C_5, C_6$ and an integer $N_2$ (where $N_2$ may depend on $W$ and $\alpha$) such that
$$\lambda^{(\ell)}_{N,W} \le C_5 e^{-C_6 \alpha} \quad \text{for all } \ell \ge 2NW + \alpha \log(N) \text{ and all } N \ge N_2.$$

On occasion, we will have a need to compute bounds on sums of the eigenvalues. First, we note the following.
Lemma 4.3.
$$\sum_{\ell=0}^{N-1} \lambda^{(\ell)}_{N,W} = \operatorname{trace}(\mathbf{B}_{N,W}) = 2NW. \tag{41}$$
The following results will also prove useful.
Corollary 4.1.
Suppose that $W$ is fixed, let $\epsilon \in (0,1)$, and let $k = 2NW(1-\epsilon)$. Then for $N \ge N_0$,
$$\sum_{\ell=k}^{N-1} \lambda^{(\ell)}_{N,W} \le 2NW\left(\epsilon + C_1 e^{-C_2 N}\right),$$
where $C_1$, $C_2$, and $N_0$ are as specified in Lemma 4.1.

Proof. It follows from (39) and (41) that
$$\sum_{\ell=k}^{N-1} \lambda^{(\ell)}_{N,W} = 2NW - \sum_{\ell=0}^{k-1} \lambda^{(\ell)}_{N,W} \le 2NW - 2NW(1-\epsilon)\left(1 - C_1 e^{-C_2 N}\right) = 2NW\left(\epsilon + C_1 e^{-C_2 N} - \epsilon C_1 e^{-C_2 N}\right) \le 2NW\left(\epsilon + C_1 e^{-C_2 N}\right) \tag{42}$$
for $N \ge N_0$.

Corollary 4.2.
Suppose that $W$ is fixed, let $\epsilon \in (0, \frac{1}{2W} - 1)$, and let $k = 2NW(1+\epsilon)$. Then for $N \ge \max(N_0, N_1)$,
$$\sum_{\ell=k}^{N-1} \lambda^{(\ell)}_{N,W} \le 2NW \min\left(\epsilon + C_1 e^{-C_2 N},\; \frac{C_3}{2W} e^{-C_4 N}\right), \tag{43}$$
where $C_1$, $C_2$, $C_3$, $C_4$, $N_0$, and $N_1$ are as specified in Lemma 4.1 and Lemma 4.2.

Proof. This result follows simply from Corollary 4.1 and from (40).

4.3 DPSS bases for sampled bandpass signals
Using the DPSS vectors, it is possible to construct remarkably efficient bases for representing the discrete-time vectors that arise when collecting finite numbers of samples from most multiband signals (e.g., signals in $\mathcal{M}(K, B_{band})$). Before presenting our full construction, however, we illustrate the basic concepts using sampled bandpass signals (e.g., signals in $\mathcal{M}(1, B_{band})$).

For the moment, we restrict our attention to vectors of samples acquired from a continuous-time bandpass signal $x(t)$. We assume that $F(x)$, the support of $X(F)$, is restricted to some interval $[F_c - \frac{B_{band}}{2}, F_c + \frac{B_{band}}{2}]$, where the center frequency $F_c$ and width $B_{band}$ are known. We define $\mathbf{x} = [x(0) \; x(T_s) \; \cdots \; x((N-1)T_s)]^T$ as in Section 3 to be a finite-length vector of $N$ samples of $x(t)$ collected uniformly with a sampling interval $T_s \le 1/(2\max\{|F_c \pm \frac{B_{band}}{2}|\})$.

As a basis for efficiently representing many such vectors $\mathbf{x}$, we propose the following. First, let $W = \frac{B_{band} T_s}{2}$, and as in (37), let $\mathbf{S}_{N,W}$ denote the $N \times N$ matrix containing the $N$ DPSS vectors (constructed with parameters $N$ and $W$) as columns. Next, define $f_c = F_c T_s$ and let $\mathbf{E}_{f_c}$ denote an $N \times N$ diagonal matrix with entries
$$\mathbf{E}_{f_c}[m,n] := \begin{cases} e^{j2\pi f_c m}, & m = n, \\ 0, & m \neq n. \end{cases} \tag{44}$$
Multiplying a vector by $\mathbf{E}_{f_c}$ simply modulates that vector by a frequency $f_c$. Finally, consider the $N \times N$ matrix $\mathbf{E}_{f_c}\mathbf{S}_{N,W}$, whose columns are given by the DPSS vectors, each modulated by $f_c$. One can easily check that $\mathbf{E}_{f_c}\mathbf{S}_{N,W}$ forms an orthonormal basis for $\mathbb{C}^N$, since $(\mathbf{E}_{f_c}\mathbf{S}_{N,W})^H \mathbf{E}_{f_c}\mathbf{S}_{N,W} = (\mathbf{S}_{N,W})^H (\mathbf{E}_{f_c})^H \mathbf{E}_{f_c}\mathbf{S}_{N,W} = (\mathbf{S}_{N,W})^H \mathbf{S}_{N,W} = \mathbf{I}$. For a given integer $k \in \{1, 2, \ldots, N\}$, we let $[\mathbf{E}_{f_c}\mathbf{S}_{N,W}]_k$ denote the $N \times k$ matrix formed by taking the first $k$ columns of $\mathbf{E}_{f_c}\mathbf{S}_{N,W}$. We will see that taking $[\mathbf{E}_{f_c}\mathbf{S}_{N,W}]_k$ with $k \approx 2NW$ forms an efficient basis that, to a high degree of accuracy, captures most sample vectors $\mathbf{x}$ that can arise from sampling bandpass signals.

To best illustrate one of our key points regarding the approximation of bandpass signals, let us first restrict our focus to the simplest possible bandpass signals: pure complex exponentials. Specifically, consider a continuous-time signal of the form $x(t) = e^{j2\pi F t}$, where the frequency $F$ belongs to the interval $[F_c - \frac{B_{band}}{2}, F_c + \frac{B_{band}}{2}]$. We define $\mathbf{x} = [x(0) \; x(T_s) \; \cdots \; x((N-1)T_s)]^T$, $f_c = F_c T_s$, and $W = \frac{B_{band} T_s}{2}$ as in Section 4.3.1. Also, defining
$$\mathbf{e}_f := \begin{bmatrix} 1 \\ e^{j2\pi f} \\ \vdots \\ e^{j2\pi f(N-1)} \end{bmatrix}$$
for all $f \in [f_c - W, f_c + W]$, we note that the sample vector $\mathbf{x}$ will equal $\mathbf{e}_f$ for $f = FT_s$. Without knowing the exact carrier frequency $F$ in advance, we ask whether it is possible to define an efficient low-dimensional basis for capturing the energy in any sample vector $\mathbf{x}$ that could arise in this model.

At first glance, this problem may appear difficult or impossible. The infinite set of possible sample vectors $\{\mathbf{e}_f\}_{f \in [f_c-W, f_c+W]}$ traverses a 1-dimensional manifold (parameterized by $f$) within $\mathbb{C}^N$. Technically speaking, these vectors collectively span all of $\mathbb{C}^N$. What is remarkable, however, is that to a high degree of accuracy, these vectors are approximately concentrated along a very low-dimensional subspace of $\mathbb{C}^N$. Moreover, for any $k \in \{1, 2, \ldots, N\}$, it is possible to analytically find the best $k$-dimensional subspace that minimizes the average squared distance from the vectors $\mathbf{e}_f$ to the subspace, and this subspace is spanned by the first $k$ modulated DPSS vectors.

To see this, we let $Q$ denote a subspace of $\mathbb{C}^N$ and $P_Q$ denote the orthogonal projection operator onto $Q$. We would like to minimize
$$\frac{1}{B_{band}} \int_{F_c - \frac{B_{band}}{2}}^{F_c + \frac{B_{band}}{2}} \|\mathbf{e}_{FT_s} - P_Q \mathbf{e}_{FT_s}\|_2^2 \, dF = \frac{1}{2W} \int_{f_c - W}^{f_c + W} \|\mathbf{e}_f - P_Q \mathbf{e}_f\|_2^2 \, df \tag{45}$$
over all subspaces $Q$ of some prescribed dimension $k$. As we show below in Theorem 4.1, this minimization problem can be solved by relating it to the Karhunen-Loeve (KL) transform. For the benefit of the reader, we briefly review the relevant concepts from the KL transform in Appendix A.
Theorem 4.1.
For any $k$ with $k \in \{1, 2, \ldots, N\}$, the $k$-dimensional subspace which minimizes (45) is spanned by the columns of $\mathbf{Q} = [\mathbf{E}_{f_c}\mathbf{S}_{N,W}]_k$, i.e., by the first $k$ DPSS vectors modulated to center frequency $f_c$. Furthermore, with this choice of $Q$, we will have
$$\frac{1}{2W} \int_{f_c - W}^{f_c + W} \|\mathbf{e}_f - P_Q \mathbf{e}_f\|_2^2 \, df = \frac{1}{2W} \sum_{\ell=k}^{N-1} \lambda^{(\ell)}_{N,W}.$$
For a point of comparison, each sampled sinusoid has energy $\|\mathbf{e}_f\|_2^2 = N$.

Proof. See Appendix B.

Recall that the first $\approx 2NW$ DPSS eigenvalues are very close to 1 and the rest are small, so we are guaranteed a high degree of approximation to the sampled sinusoids if we choose $k \approx 2NW$. In particular, if we choose $k = 2NW(1-\epsilon)$, it follows from Corollary 4.1 that
$$\frac{1}{2W} \sum_{\ell=k}^{N-1} \lambda^{(\ell)}_{N,W} \le N\left(\epsilon + C_1 e^{-C_2 N}\right)$$
for $N \ge N_0$. Alternatively, if we choose $k = 2NW(1+\epsilon)$, it follows from Corollary 4.2 that
$$\frac{1}{2W} \sum_{\ell=k}^{N-1} \lambda^{(\ell)}_{N,W} \le N \min\left(\epsilon + C_1 e^{-C_2 N},\; \frac{C_3}{2W} e^{-C_4 N}\right)$$
for $N \ge \max(N_0, N_1)$. Since each sampled sinusoid has energy $\|\mathbf{e}_f\|_2^2 = N$, for the subspace we have chosen (say with $k = 2NW(1-\epsilon)$), the average relative approximation error across the sampled sinusoids is bounded by a small fraction $\epsilon + C_1 e^{-C_2 N}$ of the energy of a given sinusoid.

It is important to note that, while we are guaranteed a very high degree of approximation accuracy in an MSE sense, we are not guaranteed such accuracy uniformly over all sampled sinusoids in our band of interest. A relatively small number of sinusoids may have higher values of $\|\mathbf{e}_f - P_Q \mathbf{e}_f\|_2^2$, and in practice this diminished approximation performance tends to occur for those sinusoids with frequencies near the edge of the band (i.e., for $f$ near $f_c \pm W$). The observation that a connection exists between DPSS's and the KL transform is not a new one (see, for example, [33, 39, 49, 58]).
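The following sketch, assuming NumPy and SciPy (the helper name capture_snr is our own), illustrates Theorem 4.1 numerically in the baseband setting of Figure 4 below, projecting sampled complex exponentials onto the span of the first $k$ modulated DPSS vectors:

```python
import numpy as np
from scipy.signal.windows import dpss

N, W, fc = 1024, 0.25, 0.0                 # baseband example: E_{fc} is the identity
n = np.arange(N)
S = dpss(N, N * W, Kmax=600).T             # first 600 DPSS vectors, as columns
Q_full = np.exp(2j * np.pi * fc * n)[:, None] * S   # modulated basis E_{fc} S_{N,W}

def capture_snr(f, k):
    """SNR (dB) with which the first k modulated DPSS vectors capture e_f."""
    e_f = np.exp(2j * np.pi * f * n)
    Q = Q_full[:, :k]                      # orthonormal columns, so P_Q = Q Q^H
    resid = e_f - Q @ (Q.conj().T @ e_f)
    return 20 * np.log10(np.linalg.norm(e_f) / np.linalg.norm(resid))

print(capture_snr(0.10, 560))              # f well inside [-W, W], k slightly above 2NW
print(capture_snr(0.10, 500))              # k slightly below 2NW: noticeably lower SNR
```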
Figure 4:
DPSS bases efficiently capture the energy in sampled complex exponentials. (a) SNR captured from three sampled complex exponentials (with differing frequencies $f$) as a function of the number $k$ of DPSS basis elements. (b) SNR captured as a function of $f$ for four fixed values of $k$. In both plots, $N = 1024$ and $W = \frac{1}{4}$.

This behavior is illustrated in Figure 4. In Figure 4(a) we consider three possible frequencies (ranging from $f = 0$ up to $f = W$) and show the ability of the baseband DPSS basis (with $f_c = 0$ and $W = \frac{1}{4}$) to capture the energy in these sinusoids as a function of how many DPSS vectors are added to the basis. The ability of a basis to capture a given signal is quantified through
$$\mathrm{SNR} = 20\log_{10}\left(\frac{\|\mathbf{e}_f\|_2}{\|\mathbf{e}_f - P_Q \mathbf{e}_f\|_2}\right) \mathrm{dB}.$$
Overall, we observe broadly similar behavior for each frequency in that by adding slightly more than $2NW$ DPSS vectors to our basis, we capture essentially all of the energy in the original signal. However, we do observe slightly different behavior for the sinusoid with a frequency exactly at $W$. In this case we capture very little of the energy in the signal until we have added more than $2NW$ vectors, while for lower frequencies we begin to do well before this point.

To illustrate this phenomenon, Figure 4(b) considers four different sizes for the DPSS basis and shows the SNR as a function of the frequency of the sinusoid. In the cases where $k = 500 < 2NW$ and $k = 512 = 2NW$ we see somewhat similar behavior—we are capturing a good fraction, but not all, of any sinusoid whose frequency is not too close to $W$. We see a dramatic difference when we increase $k$ slightly to 560, at which point we are capturing virtually all of the energy in any sinusoid within the band of interest. However, this eventually has a potentially problematic side-effect, as we can see by further increasing $k$ to 700. Specifically, as we continue to increase the size of the DPSS basis we begin to capture energy of sinusoids that lie outside the targeted band as well. This tradeoff will play an important role in the selection of the appropriate $k$ in the algorithms we propose below.

The above analysis shows that sampled sinusoids, on average, are well-approximated by modulated DPSS bases. This is a strong indication that such bases might also be useful for approximating more general sampled bandpass signals, since the vectors $\{\mathbf{e}_f\}_{f \in [f_c-W, f_c+W]} \subset \mathbb{C}^N$ themselves act as "building blocks" for representing sampled, bandpass signals in $\mathbb{C}^N$. Formally, for any continuous-time bandpass signal $x(t)$ with frequency content restricted to the interval $[F_c - \frac{B_{band}}{2}, F_c + \frac{B_{band}}{2}]$, one can show that for each $n \in \{0, 1, \ldots, N-1\}$,
$$x[n] := x(nT_s) = \int_{F_c - \frac{B_{band}}{2}}^{F_c + \frac{B_{band}}{2}} X(F) e^{j2\pi F n T_s} \, dF = \int_{f_c - W}^{f_c + W} X(f) \mathbf{e}_f[n] \, df,$$
where we recall that $X(F)$ denotes the CTFT of $x(t)$ and $X(f)$ denotes the DTFT of $x[n]$. So, informally, $\mathbf{x}$ can be expressed as a linear combination of (infinitely many) sampled complex exponentials $\mathbf{e}_f$, where $f$ ranges from $f_c - W$ to $f_c + W$.

Our analysis from Section 4.3.2 allows us to show that in a certain sense, most continuous-time bandpass signals, when sampled and time-limited, are well-approximated by the modulated DPSS basis. In particular, the following result establishes that bandpass random processes, when sampled and time-limited, are in expectation well-approximated.

Theorem 4.2.
Let $x(t)$ denote a continuous-time, zero-mean, wide sense stationary random process with power spectrum
$$P_x(F) = \begin{cases} \frac{1}{B_{band}}, & F \in [F_c - \frac{B_{band}}{2}, F_c + \frac{B_{band}}{2}], \\ 0, & \text{otherwise}, \end{cases}$$
and let $\mathbf{x} = [x(0) \; x(T_s) \; \cdots \; x((N-1)T_s)]^T \in \mathbb{C}^N$ denote a finite vector of samples acquired from $x(t)$ with a sampling interval $T_s \le 1/(2\max\{|F_c \pm \frac{B_{band}}{2}|\})$. Then over all $k$-dimensional subspaces of $\mathbb{C}^N$, $\mathbf{x}$ is best approximated (in an MSE sense) by the subspace spanned by the columns of $\mathbf{Q} = [\mathbf{E}_{f_c}\mathbf{S}_{N,W}]_k$, where $f_c = F_c T_s$ and $W = \frac{B_{band} T_s}{2}$. The corresponding MSE is given by
$$\mathbb{E}\left[\|\mathbf{x} - P_Q \mathbf{x}\|_2^2\right] = \frac{1}{2W} \sum_{\ell=k}^{N-1} \lambda^{(\ell)}_{N,W}, \tag{46}$$
while $\mathbb{E}\left[\|\mathbf{x}\|_2^2\right] = N$.

Proof. See Appendix C.

As in our discussion following Theorem 4.1, we can ensure that the MSE in (46) is small compared to $\mathbb{E}[\|\mathbf{x}\|_2^2]$ by choosing $k \approx 2NW$. This suggests that in a probabilistic sense, most bandpass signals, when sampled and time-limited, will be well-approximated by a small number of modulated DPSS vectors. Again, however, we are not guaranteed such accuracy uniformly over all sampled bandpass signals in our band of interest. A relatively small number of bandpass signals $x(t)$ could lead to sample vectors $\mathbf{x}$ with higher values of $\|\mathbf{x} - P_Q \mathbf{x}\|_2^2$. In particular, recalling that the baseband DPSS's are themselves strictly bandlimited, it follows that there exist strictly bandpass signals that when sampled and time-limited yield the modulated DPSS vectors. If we restrict $Q$ to the first $k$ columns of $\mathbf{E}_{f_c}\mathbf{S}_{N,W}$, then any bandpass signal producing a sample vector $\mathbf{x} = \mathbf{E}_{f_c}\mathbf{s}^{(\ell)}_{N,W}$ with $\ell \ge k$ will have $P_Q \mathbf{x} = \mathbf{0}$ and $\|\mathbf{x} - P_Q \mathbf{x}\|_2 = \|\mathbf{x}\|_2$. Fortunately, Theorem 4.2 confirms that such bandpass signals are relatively uncommon: at the risk of belaboring this important point, most bandpass signals, when sampled and time-limited, produce sample vectors approximately in the span of just the first $k \approx 2NW$ modulated DPSS vectors.

On a related note, signal processing engineers often have a sense for how much "spectral leakage" to anticipate when collecting a finite window of samples of a continuous-time signal. (Frequently, this leakage is reduced via a smooth windowing function [46].) Such practitioners can rest assured that, in every case where the spectral leakage is small outside a bandpass range of frequencies, the resulting sample vector can be well-approximated by a small number of modulated DPSS vectors.

Theorem 4.3. Let $x[n] = \mathcal{T}_N(x[n])$ be a time-limited sequence, and suppose that $x[n]$ is approximately bandlimited to the frequency range $f \in [f_c - W, f_c + W]$ such that for some $\delta > 0$,
$$\|\mathcal{B}_{f_c,W} x\|_2 \ge (1 - \delta)\|x\|_2,$$
where $\mathcal{B}_{f_c,W}$ denotes an orthogonal projection operator that takes a discrete-time signal, bandlimits its DTFT to the frequency range $f \in [f_c - W, f_c + W]$, and returns the corresponding signal in the time domain. Let $\mathbf{x} \in \mathbb{C}^N$ denote the vector formed by restricting $x[n]$ to the indices $n = 0, 1, \ldots, N-1$. Set $k = 2NW(1+\epsilon)$ and let $\mathbf{Q} = [\mathbf{E}_{f_c}\mathbf{S}_{N,W}]_k$. Then for $N \ge N_1$,
$$\|\mathbf{x} - P_Q \mathbf{x}\|_2 \le \left(\delta + N C_3 e^{-C_4 N}\right)\|\mathbf{x}\|_2, \tag{47}$$
where $C_3$, $C_4$, and $N_1$ are as specified in Lemma 4.2.

Proof. See Appendix D.

Note that, in contrast to Theorem 4.2, Theorem 4.3 requires $k$ to be slightly larger than $2NW$.

In order to construct an efficient dictionary for sampled multiband signals, we propose simply to concatenate an ensemble of modulated DPSS bases, each one modulated to the center of a band in our model. In particular, in light of the multiband signal model discussed in Section 4.1.2, where the bandwidth $[-\frac{B_{nyq}}{2}, \frac{B_{nyq}}{2}]$ is partitioned into bands $\Delta_i$ of size $B_{band}$, let us define the midpoint of $\Delta_i$ as
$$F_i = -\frac{B_{nyq}}{2} + \left(i + \frac{1}{2}\right) B_{band}, \quad i \in \{0, 1, \ldots, J-1\},$$
where $J = \frac{B_{nyq}}{B_{band}}$. Let $W = \frac{B_{band} T_s}{2}$ (we assume a sampling interval $T_s \le \frac{1}{B_{nyq}}$), and for each $i$, let $f_i = F_i T_s$ and define
$$\mathbf{\Psi}_i = [\mathbf{E}_{f_i}\mathbf{S}_{N,W}]_k \tag{48}$$
for some value of $k$ that we can choose as desired. We construct the multiband modulated DPSS dictionary $\mathbf{\Psi}$ by concatenating all of the $\mathbf{\Psi}_i$:
$$\mathbf{\Psi} = [\mathbf{\Psi}_0 \;\; \mathbf{\Psi}_1 \;\; \cdots \;\; \mathbf{\Psi}_{J-1}]. \tag{49}$$
The matrix $\mathbf{\Psi}$ will have size $N \times kJ$ (note that if $k = 2NW$ and $T_s = \frac{1}{B_{nyq}}$, $\mathbf{\Psi}$ will be square).

In a probabilistic sense, most multiband signals, when sampled and time-limited, will be well-approximated by a small number of vectors from the multiband modulated DPSS dictionary. In particular, there exists a block-sparse approximation for such sample vectors using only the modulated DPSS vectors corresponding to the active signal bands.
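As an illustration of the construction (48)–(49), the following sketch builds $\mathbf{\Psi}$ under the assumption that NumPy and SciPy are available; the parameter values here are smaller than those used in Section 6, purely to keep the example fast:

```python
import numpy as np
from scipy.signal.windows import dpss

N, J = 1024, 64                       # signal length and number of bands
W = 1.0 / (2 * J)                     # digital half-bandwidth (assuming T_s = 1/B_nyq)
k = int(2 * N * W)                    # 2NW columns per band, so Psi is square here
n = np.arange(N)

S_k = dpss(N, N * W, Kmax=k).T        # first k DPSS vectors, as columns (N x k)
blocks = []
for i in range(J):
    f_i = -0.5 + (i + 0.5) * 2 * W    # band-center digital frequency f_i = F_i T_s
    blocks.append(np.exp(2j * np.pi * f_i * n)[:, None] * S_k)   # E_{f_i} S_{N,W}, per (48)
Psi = np.hstack(blocks)               # the dictionary (49), of size N x kJ
```

Given a candidate active set $\mathcal{I}$, the block-sparse approximation described next can then be computed, for example by least squares on the corresponding columns of $\mathbf{\Psi}$.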
Theorem 4.4.
Let $\mathcal{I} \subseteq \{0, 1, \ldots, J-1\}$ with $|\mathcal{I}| = K$. Suppose that for each $i \in \mathcal{I}$, $x_i(t)$ is a continuous-time, zero-mean, wide sense stationary random process with power spectrum
$$P_{x_i}(F) = \begin{cases} \frac{1}{K B_{band}}, & F \in \Delta_i, \\ 0, & \text{otherwise}, \end{cases}$$
and furthermore suppose the $\{x_i\}_{i \in \mathcal{I}}$ are independent and jointly wide sense stationary. Let $x(t) = \sum_{i \in \mathcal{I}} x_i(t)$, and let $\mathbf{x} = [x(0) \; x(T_s) \; \cdots \; x((N-1)T_s)]^T \in \mathbb{C}^N$ denote a finite vector of samples acquired from $x(t)$ with a sampling interval of $T_s \le \frac{1}{B_{nyq}}$. Let $\mathbf{\Psi}_{\mathcal{I}}$ denote the concatenation of the $\mathbf{\Psi}_i$ over all $i \in \mathcal{I}$, where the $\mathbf{\Psi}_i$ are as defined in (48). Then
$$\mathbb{E}\left[\|\mathbf{x} - P_{\mathbf{\Psi}_{\mathcal{I}}} \mathbf{x}\|_2^2\right] \le \frac{K}{2W} \sum_{\ell=k}^{N-1} \lambda^{(\ell)}_{N,W}, \tag{50}$$
whereas $\mathbb{E}\left[\|\mathbf{x}\|_2^2\right] = N$.

Proof. See Appendix E.

Theorem 4.4 confirms the existence of a high-quality block-sparse approximation for most sampled multiband signals; in particular, the signal approximation vector specified in (50) can be written as $P_{\mathbf{\Psi}_{\mathcal{I}}} \mathbf{x} = \mathbf{\Psi}\boldsymbol{\alpha}$ for a $K$-block-sparse coefficient vector $\boldsymbol{\alpha}$ given by $\boldsymbol{\alpha}|_{\mathcal{I}} = \mathbf{\Psi}_{\mathcal{I}}^\dagger \mathbf{x}$ and $\boldsymbol{\alpha}|_{\mathcal{I}^c} = \mathbf{0}$. As in our previous discussions for bandpass signals, we can ensure the MSE in (50) is small compared to $\mathbb{E}[\|\mathbf{x}\|_2^2]$ by choosing $k \approx 2NW$. Compared to our previous analysis, however, the MSE appearing in Theorem 4.4 is larger by a factor of $K$ (though this quantity may still be quite small). Although it may be possible to improve upon this figure, the reader should keep in mind that the multiband modulated DPSS dictionary is not necessarily the optimal basis for representing sampled multiband signals with a given sparsity pattern $\mathcal{I}$; it is merely a generic (and easily computable) dictionary that provides highly accurate approximations for most multiband signals having any possible sparsity pattern.

Although we omit the details here, one could also consider generalizing Theorem 4.3 to multiband signals that have small spectral leakage when windowed in time.

In this section, we proceed to develop theoretical guarantees for signal recovery using the multiband modulated DPSS dictionary $\mathbf{\Psi}$ as defined in (49). Throughout our theoretical discussion and the subsequent experiments, we pay special attention to the role played by the number $k$ of DPSS vectors per band. We begin in Section 5.1 with a collection of RIP guarantees, and we extend these to signal recovery guarantees in Section 5.2. Throughout this section and the remainder of the paper, we assume that $T_s \le \frac{1}{B_{nyq}}$.

We can actually immediately establish $\mathbf{\Psi}$-RIP and $\mathbf{\Psi}$-block-RIP guarantees for any value of $k$. The theorem below follows as a direct consequence of Corollaries 3.1 and 3.3.

Theorem 5.1.
Let $k \in \{1, 2, \ldots, N\}$, set $D = kJ = k\frac{B_{nyq}}{B_{band}}$, and let $\mathbf{\Psi}$ be the $N \times D$ multiband modulated DPSS dictionary defined in (49). The following statements hold:

1. Fix $\delta, \beta \in (0, 1)$, and let $\mathbf{A}$ be an $M \times N$ subgaussian matrix with $M$ satisfying (15). Then with probability exceeding $1 - \beta$, $\mathbf{A}$ satisfies the $\mathbf{\Psi}$-RIP of order $S$ with constant $\delta$.

2. Fix $\delta, \beta \in (0, 1)$, and let $\mathbf{A}$ be an $M \times N$ subgaussian matrix with $M$ satisfying (25). Then with probability exceeding $1 - \beta$, $\mathbf{A}$ satisfies the $\mathbf{\Psi}$-block-RIP of order $K$ with constant $\delta$.

In order to establish RIP and block-RIP bounds for $\mathbf{A\Psi}$, however, we must restrict our attention to values of $k$ that are not too large (this ensures the matrix $\mathbf{\Psi}$ is not overcomplete). To be specific, we note that for any $k$, the columns of $\mathbf{\Psi}$ have unit norm. In addition, when $k$ is suitably small, the columns of $\mathbf{\Psi}$ are approximately orthogonal.

Lemma 5.1.
Let $k = 2NW(1-\epsilon)$, set $D = kJ = N(1-\epsilon)B_{nyq}T_s < N$, and let $\mathbf{\Psi}$ be the $N \times D$ multiband modulated DPSS dictionary defined in (49). Then for any pair of distinct columns $\mathbf{q}_1, \mathbf{q}_2$ in $\mathbf{\Psi}$,
$$|\langle \mathbf{q}_1, \mathbf{q}_2 \rangle| \le 3 C_1^{1/2} e^{-C_2 N/2} \tag{51}$$
if $N \ge N_0$, where $C_1$, $C_2$, and $N_0$ are as specified in Lemma 4.1.

Proof. See Appendix F.

Using this fact, we can ensure that whenever $k = 2NW(1-\epsilon)$, $\mathbf{\Psi}$ must act as an approximate isometry between any coefficient vector $\boldsymbol{\alpha} \in \mathbb{C}^D$ and the corresponding signal vector $\mathbf{x} = \mathbf{\Psi}\boldsymbol{\alpha} \in \mathbb{C}^N$.

Lemma 5.2.
Let $k = 2NW(1-\epsilon)$, set $D = kJ = N(1-\epsilon)B_{nyq}T_s < N$, and let $\mathbf{\Psi}$ be the $N \times D$ multiband modulated DPSS dictionary defined in (49). Then
$$\sqrt{1 - 3N C_1^{1/2} e^{-C_2 N/2}} \le \frac{\|\mathbf{\Psi}\boldsymbol{\alpha}\|_2}{\|\boldsymbol{\alpha}\|_2} \le \sqrt{1 + 3N C_1^{1/2} e^{-C_2 N/2}} \tag{52}$$
for all $\boldsymbol{\alpha} \in \mathbb{C}^D$.

Proof. The sharpest possible lower and upper bounds in (52) are given by the smallest and largest singular values of $\mathbf{\Psi}$, respectively. Using standard results from linear algebra, $\sigma_{\min}(\mathbf{\Psi}) = \sqrt{\lambda_{\min}(\mathbf{\Psi}^H\mathbf{\Psi})}$ and $\sigma_{\max}(\mathbf{\Psi}) = \sqrt{\lambda_{\max}(\mathbf{\Psi}^H\mathbf{\Psi})}$. The Gram matrix $\mathbf{\Psi}^H\mathbf{\Psi}$ has size $D \times D$. All entries on the main diagonal of $\mathbf{\Psi}^H\mathbf{\Psi}$ are equal to 1, and all entries off of the main diagonal can be bounded using (51). From the Geršgorin circle theorem [36], it follows that all eigenvalues of $\mathbf{\Psi}^H\mathbf{\Psi}$ must fall in the interval $[1 - 3DC_1^{1/2}e^{-C_2 N/2}, 1 + 3DC_1^{1/2}e^{-C_2 N/2}]$, which for simplicity we note is contained within the interval $[1 - 3NC_1^{1/2}e^{-C_2 N/2}, 1 + 3NC_1^{1/2}e^{-C_2 N/2}]$ since by assumption $D < N$.

Lemma 5.2 is the key fact we need to establish RIP and block-RIP bounds for $\mathbf{A\Psi}$.
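A quick numerical check of Lemmas 5.1 and 5.2 can be sketched as follows, assuming NumPy and SciPy, with $k$ chosen somewhat below $2NW$ and with print statements in place of formal assertions:

```python
import numpy as np
from scipy.signal.windows import dpss

N, J = 1024, 64
W = 1.0 / (2 * J)                            # 2NW = 16
k = 8                                        # fewer than 2NW columns per band
n = np.arange(N)
S_k = dpss(N, N * W, Kmax=k).T
Psi = np.hstack([np.exp(2j * np.pi * (-0.5 + (i + 0.5) * 2 * W) * n)[:, None] * S_k
                 for i in range(J)])         # N x kJ dictionary, with kJ < N

G = Psi.conj().T @ Psi                       # Gram matrix; unit main diagonal
mu = np.abs(G - np.eye(G.shape[0])).max()    # largest off-diagonal entry, cf. (51)
s = np.linalg.svd(Psi, compute_uv=False)
print(mu)                                    # small
print(s.min(), s.max())                      # close to 1, consistent with (52)
```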
Theorem 5.2. Let $k = 2NW(1-\epsilon)$, set $D = kJ = N(1-\epsilon)B_{nyq}T_s < N$, and let $\mathbf{\Psi}$ be the $N \times D$ multiband modulated DPSS dictionary defined in (49). The following statements hold:

1. If $\mathbf{A}$ satisfies the $\mathbf{\Psi}$-RIP of order $S$ with constant $\delta$, then $\mathbf{A\Psi}$ satisfies the RIP of order $S$ with constant $\delta + 6NC_1^{1/2}e^{-C_2 N/2}$.

2. If $\mathbf{A}$ satisfies the $\mathbf{\Psi}$-block-RIP of order $K$ with constant $\delta$, then $\mathbf{A\Psi}$ satisfies the block-RIP of order $K$ with constant $\delta + 6NC_1^{1/2}e^{-C_2 N/2}$.

Proof. For any $\boldsymbol{\alpha} \in \mathbb{C}^D$ such that (9) holds, we can use (52) to conclude that
$$\sqrt{1-\delta} \cdot \sqrt{1 - 3NC_1^{1/2}e^{-C_2 N/2}} \le \frac{\|\mathbf{A\Psi}\boldsymbol{\alpha}\|_2}{\|\boldsymbol{\alpha}\|_2} \le \sqrt{1+\delta} \cdot \sqrt{1 + 3NC_1^{1/2}e^{-C_2 N/2}}.$$
For the upper bound, note that
$$(1+\delta)\left(1 + 3NC_1^{1/2}e^{-C_2 N/2}\right) = 1 + \delta + 3NC_1^{1/2}e^{-C_2 N/2} + 3\delta NC_1^{1/2}e^{-C_2 N/2} \le 1 + \delta + 6NC_1^{1/2}e^{-C_2 N/2},$$
where the inequality follows from the assumption that $\delta < 1$. The lower bound follows similarly.

5.2 Recovery guarantees
In this section we prove that with a sufficient number of measurements and an appropriately constructed multiband modulated DPSS dictionary, most sample vectors of multiband signals can be accurately reconstructed. Our proof of this fact relies on two principles.

The first of these principles is that, as a consequence of our RIP results in Section 5.1, signal vectors having representations that are exactly $K$-block-sparse in the dictionary $\mathbf{\Psi}$ can be accurately reconstructed from compressive samples. The following two results follow from combining Theorems 3.1, 3.2, 5.1, and 5.2.

Theorem 5.3.
Let $k \in \{1, 2, \ldots, N\}$, set $D = kJ = k\frac{B_{nyq}}{B_{band}}$, and let $\mathbf{\Psi}$ be the $N \times D$ multiband modulated DPSS dictionary defined in (49). Fix $\delta \in (0, 0.1)$ and $\beta \in (0, 1)$, and let $\mathbf{A}$ be an $M \times N$ subgaussian matrix with
$$M \ge \frac{2K\left(\frac{D}{J}\log(42/\delta) + \log(eJ/2K)\right) + \log(4/\beta)}{c(\delta/\sqrt{2})}. \tag{53}$$
Then with probability exceeding $1 - \beta$, the following statement holds: For any $\mathbf{x}' \in \mathbb{C}^N$ that has a $K$-block-sparse representation in the dictionary $\mathbf{\Psi}$ (i.e., that can be written as $\mathbf{x}' = \mathbf{\Psi}\boldsymbol{\alpha}'$ for some $K$-block-sparse vector $\boldsymbol{\alpha}' \in \mathbb{C}^D$), if we use block-based IHT (23) with $\mu \in [1+\delta, 1.5(1-\delta))$ to recover an estimate $\widehat{\mathbf{x}}$ of $\mathbf{x}'$ from the observations $\mathbf{y} = \mathbf{Ax}' + \mathbf{e}$, the resulting $\widehat{\mathbf{x}}$ will satisfy
$$\|\mathbf{x}' - \widehat{\mathbf{x}}\|_2 \le \kappa \|\mathbf{e}\|_2, \tag{54}$$
where $\kappa > 0$ is as specified in Theorem 3.1.

Theorem 5.4.
Let $k = 2NW(1-\epsilon)$, set $D = kJ = N(1-\epsilon)B_{nyq}T_s < N$, and let $\mathbf{\Psi}$ be the $N \times D$ multiband modulated DPSS dictionary defined in (49). Fix $\delta \in (0, 0.1 - 6NC_1^{1/2}e^{-C_2 N/2})$ and $\beta \in (0, 1)$. Let $\mathbf{A}$ be an $M \times N$ subgaussian matrix with
$$M \ge \frac{2K\left(\frac{D}{J}\log(42/\delta) + \log(eJ/2K)\right) + \log(4/\beta)}{c(\delta/\sqrt{2})}. \tag{55}$$
Then with probability exceeding $1 - \beta$, the following statement holds: For any $\mathbf{x}' \in \mathbb{C}^N$ that has a $K$-block-sparse representation in the dictionary $\mathbf{\Psi}$ (i.e., that can be written as $\mathbf{x}' = \mathbf{\Psi}\boldsymbol{\alpha}'$ for some $K$-block-sparse vector $\boldsymbol{\alpha}' \in \mathbb{C}^D$), if we use block-based CoSaMP (Algorithm 2) with $\widehat{\boldsymbol{\alpha}} = \mathrm{BBCoSaMP}(\mathbf{A\Psi}, \mathbf{I}, \mathbf{y}, K)$ to recover an estimate $\widehat{\boldsymbol{\alpha}}$ of $\boldsymbol{\alpha}'$ from the observations $\mathbf{y} = \mathbf{Ax}' + \mathbf{e}$, the resulting $\widehat{\boldsymbol{\alpha}}$ will satisfy
$$\|\boldsymbol{\alpha}' - \widehat{\boldsymbol{\alpha}}\|_2 \le \kappa \|\mathbf{e}\|_2, \tag{56}$$
where $\kappa > 0$ is as specified in Theorem 3.2.

The second of our principles is that most multiband signals with $K$ occupied bands, when sampled and time-limited, have a high-quality $K$-block-sparse representation in the dictionary $\mathbf{\Psi}$. Let $x(t)$ denote a continuous-time multiband signal with nonzero support on blocks indexed by $\mathcal{I} \subseteq \{0, 1, \ldots, J-1\}$, where $|\mathcal{I}| = K$. Let $\mathbf{x} = [x(0) \; x(T_s) \; \cdots \; x((N-1)T_s)]^T \in \mathbb{C}^N$ denote a finite vector of samples acquired from $x(t)$ with a sampling interval of $T_s \le \frac{1}{B_{nyq}}$. Let $\boldsymbol{\alpha}'$ be a $K$-block-sparse coefficient vector given by $\boldsymbol{\alpha}'|_{\mathcal{I}} = \mathbf{\Psi}_{\mathcal{I}}^\dagger \mathbf{x}$ and $\boldsymbol{\alpha}'|_{\mathcal{I}^c} = \mathbf{0}$. Defining $\mathbf{x}' := \mathbf{\Psi}\boldsymbol{\alpha}'$ and $\mathbf{e}_x := \mathbf{x} - \mathbf{\Psi}\boldsymbol{\alpha}' = \mathbf{x} - \mathbf{x}'$, we can then write
$$\mathbf{x} = \mathbf{x}' + \mathbf{e}_x, \tag{57}$$
where $\mathbf{x}'$ has an exactly $K$-block-sparse representation in the dictionary $\mathbf{\Psi}$ and we expect $\mathbf{e}_x$ to be small.

We can more formally bound the size of $\mathbf{e}_x$. For example, under the multiband random process model for $x(t)$ described in Theorem 4.4, we will have $\mathbb{E}[\|\mathbf{e}_x\|_2^2] \le \frac{K}{2W}\sum_{\ell=k}^{N-1}\lambda^{(\ell)}_{N,W}$. By setting the number of columns per band $k$ to be on the order of $2NW$, we can make this error small relative to $\mathbb{E}[\|\mathbf{x}\|_2^2] = N$. For example, if we take $k = 2NW(1-\epsilon)$ for some $\epsilon \in (0,1)$,
$$\mathbb{E}[\|\mathbf{e}_x\|_2^2] \le \frac{K}{2W}\sum_{\ell=k}^{N-1}\lambda^{(\ell)}_{N,W} \le KN\left(\epsilon + C_1 e^{-C_2 N}\right) \tag{58}$$
for $N \ge N_0$. The rightmost upper bound in (58) can be made as small as desired (relative to $\mathbb{E}[\|\mathbf{x}\|_2^2] = N$) by choosing $\epsilon$ sufficiently small and $N$ sufficiently large.

For any value of $k$, we can also establish a tail bound on $\|\mathbf{e}_x\|_2^2$, guaranteeing that it is unlikely to significantly exceed the quantity $\frac{K}{2W}\sum_{\ell=k}^{N-1}\lambda^{(\ell)}_{N,W}$. Using standard concentration of measure arguments for subgaussian and subexponential random variables (see [76] for a thorough discussion), one can make the following guarantee.
Lemma 5.3.
Let $\mathbf{z} \in \mathbb{C}^N$ be a complex-valued Gaussian random vector with mean zero. Then
$$\mathbb{P}\left[\left|\|\mathbf{z}\|_2^2 - \mathbb{E}[\|\mathbf{z}\|_2^2]\right| \ge \gamma\, \mathbb{E}[\|\mathbf{z}\|_2^2]\right] \le 2\exp\left\{-\min\left(\frac{\gamma^2 (\sum_n \lambda_n)^2}{c^2 \sum_n \lambda_n^2}, \frac{\gamma \sum_n \lambda_n}{c \max_n \lambda_n}\right)\right\},$$
where $c$ is a universal constant and $\{\lambda_n\}$ denote the eigenvalues of the autocorrelation matrix of the length-$2N$ random vector $[\mathrm{Re}(\mathbf{z})^T \; \mathrm{Im}(\mathbf{z})^T]^T$.

If we assume in our multiband random process model that $x(t)$ is a Gaussian random process, this will imply that $\mathbf{x}$ is a Gaussian random vector, and since $\mathbf{e}_x$ is a linear transformation of $\mathbf{x}$, $\mathbf{e}_x$ will be Gaussian as well. This allows us to apply Lemma 5.3 to the quantity $\|\mathbf{e}_x\|_2^2$. Using the fact that $\|\boldsymbol{\lambda}\|_1 \ge \|\boldsymbol{\lambda}\|_2 \ge \|\boldsymbol{\lambda}\|_\infty$ for any vector $\boldsymbol{\lambda}$, we can pessimistically simplify this bound to read:
$$\mathbb{P}\left[\|\mathbf{e}_x\|_2^2 \ge (1+\gamma)\frac{K}{2W}\sum_{\ell=k}^{N-1}\lambda^{(\ell)}_{N,W}\right] \le 2\exp\left\{-\min\left(\frac{\gamma^2}{c^2}, \frac{\gamma}{c}\right)\right\}. \tag{59}$$

To put these two principles together, we note that when taking noise-free compressive measurements $\mathbf{Ax}$ of a signal vector $\mathbf{x}$ obeying (57), we will have $\mathbf{Ax} = \mathbf{Ax}' + \mathbf{Ae}_x$. This allows us to invoke Theorems 5.3 and 5.4 with $\mathbf{e} := \mathbf{Ae}_x$, and when the number of columns per band $k$ is chosen so that we expect $\mathbf{e}_x$ to be small, the concentration of measure phenomenon tells us that $\mathbf{e}$ should (One could easily incorporate actual measurement noise into $\mathbf{e}$ as well, and thus extend Theorems 5.5 and 5.6 to the case of noisy measurements. However, for the sake of clarity we simply set $\mathbf{e} := \mathbf{Ae}_x$ in order to highlight the impact of modeling error in our main results.)
be small as well. In particular, note that for fixed $\mathbf{e}_x$ and random subgaussian $\mathbf{A}$, (11) guarantees that
$$\mathbb{P}\left[\left|\|\mathbf{e}\|_2^2 - \|\mathbf{e}_x\|_2^2\right| \ge \eta \|\mathbf{e}_x\|_2^2\right] \le 2e^{-c(\eta)M}. \tag{60}$$
Then, for example, if we let $\widehat{\mathbf{x}}$ denote the estimated signal vector recovered via block-based IHT, (60) allows us to write
$$\|\mathbf{x} - \widehat{\mathbf{x}}\|_2 \le \|\mathbf{x} - \mathbf{x}'\|_2 + \|\mathbf{x}' - \widehat{\mathbf{x}}\|_2 \le \|\mathbf{e}_x\|_2 + \kappa\|\mathbf{e}\|_2 \le \left(1 + \kappa(1+\eta)^{1/2}\right)\|\mathbf{e}_x\|_2,$$
where the second inequality follows from Theorem 5.3 and the third inequality holds with probability at least $1 - 2e^{-c(\eta)M}$. Combining this fact with the tail bound (59), we can establish the following guarantee.

Theorem 5.5.
Let $k \in \{1, 2, \ldots, N\}$, set $D = kJ = k\frac{B_{nyq}}{B_{band}}$, and let $\mathbf{\Psi}$ be the $N \times D$ multiband modulated DPSS dictionary defined in (49). Fix $\delta \in (0, 0.1)$ and $\beta \in (0, 1)$, and let $\mathbf{A}$ be an $M \times N$ subgaussian matrix with $M$ satisfying (53). If $x(t)$ is a Gaussian random process obeying the $K$-band model described in Theorem 4.4, $\mathbf{x} \in \mathbb{C}^N$ is generated by sampling $x(t)$ as in Theorem 4.4, and we use block-based IHT (23) with $\mu \in [1+\delta, 1.5(1-\delta))$ to recover an estimate of $\mathbf{x}$ from the observations $\mathbf{y} = \mathbf{Ax}$, then the resulting estimate $\widehat{\mathbf{x}}$ will satisfy
$$\|\mathbf{x} - \widehat{\mathbf{x}}\|_2^2 \le \left(1 + \kappa(1+\eta)^{1/2}\right)^2 (1+\gamma) \frac{K}{2W}\sum_{\ell=k}^{N-1}\lambda^{(\ell)}_{N,W} \tag{61}$$
with probability at least $1 - \beta - 2e^{-c(\eta)M} - 2e^{-\min(\gamma^2/c^2, \gamma/c)}$, where $W = \frac{B_{band}T_s}{2}$ as specified in the dictionary construction (49), $\kappa > 0$ is a constant as specified in Theorem 3.1, $c(\cdot)$ is as specified in (11), and $c$ is a universal constant.

Although the above result holds for any $k \in \{1, 2, \ldots, N\}$, it is perhaps most interesting when the number of columns per band $k$ is chosen to be on the order of $2NW$. For example, if we take $k = 2NW(1-\epsilon)$, (61) will read
$$\|\mathbf{x} - \widehat{\mathbf{x}}\|_2^2 \le \left(1 + \kappa(1+\eta)^{1/2}\right)^2 (1+\gamma)\left(\epsilon + C_1 e^{-C_2 N}\right) KN. \tag{62}$$
Supposing that $K$, $\kappa$, $\eta$, and $\gamma$ are fixed, one can make the upper bound appearing in (62) as small as desired (relative to $\mathbb{E}[\|\mathbf{x}\|_2^2] = N$) by choosing $\epsilon$ sufficiently small and $N$ sufficiently large. Specifically, the term $\epsilon + C_1 e^{-C_2 N}$ can be made arbitrarily close to zero by choosing $\epsilon$ sufficiently small and $N$ sufficiently large. Of course, as one increases the length $N$ of the signal vector, the requisite number $M$ of measurements will increase proportionally (this is captured in (53) via the dependence on $D$). What is important about the measurement bound is that $\frac{M}{N}$, the permitted undersampling ratio relative to the Nyquist sampling rate (supposing $T_s = \frac{1}{B_{nyq}}$), will scale like
$$\frac{M}{N} \sim \frac{K B_{band}}{B_{nyq}}.$$
In other words, we need only collect compressive samples at a rate proportional to $KB_{band}$, the total amount of occupied bandwidth (i.e., the so-called Landau rate [41]). (One could also use Lemma 5.3 to guarantee that, with high probability, $\|\mathbf{x}\|_2^2$ will not be too small compared to $N$. We omit these details in the interest of conciseness.)

For suitably restricted $k$, we can establish a similar guarantee for block-based CoSaMP. Let $\widehat{\boldsymbol{\alpha}}$ be the recovered coefficient vector, and define $\widehat{\mathbf{x}} := \mathbf{\Psi}\widehat{\boldsymbol{\alpha}}$. Then we can write
$$\|\mathbf{x} - \widehat{\mathbf{x}}\|_2 \le \|\mathbf{x} - \mathbf{x}'\|_2 + \|\mathbf{\Psi}\boldsymbol{\alpha}' - \mathbf{\Psi}\widehat{\boldsymbol{\alpha}}\|_2 \le \|\mathbf{e}_x\|_2 + \sqrt{1 + 3NC_1^{1/2}e^{-C_2 N/2}}\,\|\boldsymbol{\alpha}' - \widehat{\boldsymbol{\alpha}}\|_2 \le \left(1 + \kappa\left(1 + 3NC_1^{1/2}e^{-C_2 N/2}\right)^{1/2}(1+\eta)^{1/2}\right)\|\mathbf{e}_x\|_2,$$
where the second inequality follows from Lemma 5.2, and the last inequality follows from Theorem 5.4 and holds with probability at least $1 - 2e^{-c(\eta)M}$. Combining this fact with Corollary 4.1 and the tail bound (59), we can establish the following guarantee.

Theorem 5.6.
Let $k = 2NW(1-\epsilon)$, set $D = kJ = N(1-\epsilon)B_{nyq}T_s < N$, and let $\mathbf{\Psi}$ be the $N \times D$ multiband modulated DPSS dictionary defined in (49). Fix $\delta \in (0, 0.1 - 6NC_1^{1/2}e^{-C_2 N/2})$ and $\beta \in (0, 1)$, and let $\mathbf{A}$ be an $M \times N$ subgaussian matrix with $M$ satisfying (55). If $x(t)$ is a Gaussian random process obeying the $K$-band model described in Theorem 4.4, $\mathbf{x} \in \mathbb{C}^N$ is generated by sampling $x(t)$ as in Theorem 4.4, we use block-based CoSaMP (Algorithm 2) with $\widehat{\boldsymbol{\alpha}} = \mathrm{BBCoSaMP}(\mathbf{A\Psi}, \mathbf{I}, \mathbf{y}, K)$ to recover an estimate of $\boldsymbol{\alpha}$ from the observations $\mathbf{y} = \mathbf{Ax}$, and we set $\widehat{\mathbf{x}} := \mathbf{\Psi}\widehat{\boldsymbol{\alpha}}$, then the resulting estimate will satisfy
$$\|\mathbf{x} - \widehat{\mathbf{x}}\|_2^2 \le \left(1 + \kappa\left(1 + 3NC_1^{1/2}e^{-C_2 N/2}\right)^{1/2}(1+\eta)^{1/2}\right)^2 (1+\gamma)\left(\epsilon + C_1 e^{-C_2 N}\right) KN \tag{63}$$
with probability at least $1 - \beta - 2e^{-c(\eta)M} - 2e^{-\min(\gamma^2/c^2, \gamma/c)}$, where $\kappa > 0$ is a constant as specified in Theorem 3.2, $C_1$ and $C_2$ are constants as specified in Lemma 4.1, $c(\cdot)$ is as specified in (11), and $c$ is a universal constant.

Once again, supposing that $K$, $\kappa$, $\eta$, and $\gamma$ are fixed, one can make the upper bound appearing in (63) as small as desired (relative to $\mathbb{E}[\|\mathbf{x}\|_2^2] = N$) by choosing $\epsilon$ sufficiently small and $N$ sufficiently large. Specifically, the terms $3NC_1^{1/2}e^{-C_2 N/2}$ and $\epsilon + C_1 e^{-C_2 N}$ can be made arbitrarily close to zero by choosing $\epsilon$ sufficiently small and $N$ sufficiently large. Thus, we have again guaranteed that most finite-length sample vectors arising from multiband analog signals can—to a very high degree of approximation—be recovered from a number of compressive measurements that is proportional to the underlying information level.

In this section we present the results of a suite of simulations that demonstrate the effectiveness of our proposed approaches to multiband signal recovery from compressive measurements. In doing so, we address various practical considerations that arise within our framework. All of our simulations were performed via a MATLAB software package that we have made available for download. This software package contains all of the code and results necessary to reproduce the experiments and figures described below, but should additionally serve as a platform upon which other researchers can test and develop their own extensions of these ideas.
We begin with a brief discussion regarding our implementation and experimental setup.

6.1.1 Computing P(x, K)

We first recall that a key step in solving either block-based IHT (specifically, the variation in (23)) or block-based CoSaMP (specifically, the variation $\widehat{\mathbf{x}} = \mathrm{BBCoSaMP}(\mathbf{A}, \mathbf{\Psi}, \mathbf{y}, K)$) is the computation of $\mathrm{P}(\mathbf{x}, K)$, i.e., the projection of $\mathbf{x}$ onto the set of $K$-block sparse signals in the dictionary $\mathbf{\Psi}$ (as defined in (22)). If $\mathbf{\Psi}$ were an orthonormal basis, this projection could be computed simply by taking $\mathbf{\Psi}^H\mathbf{x}$ and setting to zero all but the $K$ blocks of this vector with the highest energy. Unfortunately, the columns of the multiband modulated DPSS dictionary $\mathbf{\Psi}$ (as defined in (49)) are not orthogonal, and so this thresholding approach is not guaranteed to even approximate the correct solution. In fact, when the number of columns per band $k$ exceeds $2NW(1-\epsilon)$, $\mathbf{\Psi}$ will fail to act as an isometry (recall Lemma 5.2), and when $k$ exceeds $2NW$, $\mathbf{\Psi}$ will actually be overcomplete. As we will see in our experiments, it can often be desirable to choose $k$ to be larger than $2NW$, but this lack of isometry and this overcompleteness prevent us from applying any of the standard arguments from sparse approximation to solving the projection problem.

It is important to note that in (22) we are seeking a vector $\mathbf{z}$ that minimizes $\|\mathbf{x} - \mathbf{z}\|_2$ and has a block-sparse representation in $\mathbf{\Psi}$. Although $\mathbf{z}$ will be unique, the corresponding representation in $\mathbf{\Psi}$ need not be unique if $\mathbf{\Psi}$ fails to satisfy the block-RIP. Unfortunately, there seems to be relatively little known about solving this type of sparse approximation problem. Nevertheless, since we are ultimately interested in $\mathbf{z}$ (not its representation with respect to $\mathbf{\Psi}$), this should not necessarily stop us from proceeding under the hope that standard sparse approximation algorithms can still succeed even when $\mathbf{\Psi}$ is moderately overcomplete. Thus, in our simulations, we use block-OMP to obtain an approximation to $\mathrm{P}(\mathbf{x}, K)$. Block-OMP is a straightforward generalization of the classical OMP algorithm. It proceeds by: (i) initializing the residual vector $\mathbf{r} = \mathbf{x}$ and the set $\mathcal{I} = \emptyset$, (ii) computing the proxy vector $\mathbf{h} = \mathbf{\Psi}^H\mathbf{r}$, (iii) identifying the block in $\mathbf{h}$ that has the largest energy and adding this block to the set $\mathcal{I}$, and (iv) orthogonalizing $\mathbf{r}$ against the columns in $\mathbf{\Psi}_{\mathcal{I}}$. This last step is equivalent to the "update" step in CoSaMP in Algorithms 1 and 2. Steps (ii) through (iv) are repeated until termination. The final output can be computed from $\mathbf{x} - \mathbf{r}_K$, where $\mathbf{r}_K$ is the residual after $K$ iterations. (A code sketch of this routine appears below.)

We close by noting that the lack of a provable technique for computing $\mathrm{P}(\mathbf{x}, K)$ represents an important gap between some of our theory and practice. Specifically, one of our main results (Theorem 5.5) pertains to block-based IHT (23), but this is an algorithm that we can only implement approximately. As noted at the end of Section 3.2.3, we conjecture that one could also establish theoretical guarantees for the $\mathrm{BBCoSaMP}(\mathbf{A}, \mathbf{\Psi}, \mathbf{y}, K)$ version of block-based CoSaMP, but this too relies on being able to compute $\mathrm{P}(\mathbf{x}, K)$ exactly. In light of this, we point out the following. First, we are able to implement the $\mathrm{BBCoSaMP}(\mathbf{A\Psi}, \mathbf{I}, \mathbf{y}, K)$ version of block-based CoSaMP exactly because this problem can be solved with simple block thresholding of the vector $\boldsymbol{\alpha}$, and one of our main results (Theorem 5.6) does pertain specifically to $\mathrm{BBCoSaMP}(\mathbf{A\Psi}, \mathbf{I}, \mathbf{y}, K)$.
Second, our simulations in Sections 6.2 and beyond, which rely on block-OMP for approximating $\mathrm{P}(\mathbf{x}, K)$, indicate that we are clearly finding high quality solutions to this problem despite the lack of provable guarantees. What is interesting is that our experimental results seem most favorable when $k$ exceeds $2NW$ by a nontrivial amount, and that is a regime where the dictionary $\mathbf{\Psi}$ is overcomplete. We hope to further address the implications of overcomplete $\mathbf{\Psi}$ in future work.

In our simulations below, we focus exclusively on testing the two versions of block-based CoSaMP. One could conceivably expect similar experimental results using a properly-tuned version of block-based IHT, but since we lack theory concerning $\mathrm{P}(\mathbf{x}, K)$ we find it more worthwhile to devote our space to exploring the various practical issues surrounding block-based CoSaMP. Specifically, let us note that when solving block-based CoSaMP, the key steps in this algorithm consist of solving the least-squares problem in the "update" step and of computing $\mathrm{P}(\mathbf{x}, K)$ (twice), which also involves solving (potentially multiple) least-squares problems. Thus, it is crucial that we solve these least-squares problems efficiently and accurately.

While there has been much work in the CS community on solving these problems efficiently (for example, see [50]), we leave aside the issue of speed for the moment and focus instead on the issue of stability. In our context, we must take some care in solving these problems, since when the number of columns $k$ per band becomes much larger than $2NW$, the matrices involved can become potentially rank deficient and thus standard approaches can be numerically unstable. Fortunately, this instability can be easily addressed using relatively simple techniques.

We first consider the least-squares problem that must be solved in the computation of $\mathrm{P}(\mathbf{x}, K)$. In this case we will not focus specifically on the block-OMP algorithm that we implement, but simply assume that we have some technique for obtaining an estimate $\mathcal{I}$ of $\mathrm{S}(\mathbf{x}, K)$. At that point, solving (22) is relatively straightforward. One could solve this via $\mathrm{P}(\mathbf{x}, K) = \mathbf{\Psi}_{\mathcal{I}}\mathbf{\Psi}_{\mathcal{I}}^\dagger\mathbf{x}$, but when $\mathcal{I}$ contains indices corresponding to adjacent bands, $\mathbf{\Psi}_{\mathcal{I}}$ can be nearly rank deficient. In this case, a better approach is to construct a reduced basis $\mathbf{U}$ for $\mathcal{R}(\mathbf{\Psi}_{\mathcal{I}})$. One can then compute the projection simply via $\mathrm{P}(\mathbf{x}, K) = \mathbf{U}\mathbf{U}^H\mathbf{x}$. In our context, we use this reduced basis approach to perform the orthogonalization in each step of the block-OMP approach to computing $\mathrm{P}(\mathbf{x}, K)$.

We now turn to the main least-squares problem in the "update" step of CoSaMP. In this step we are given a set of blocks $\mathcal{I}$ and wish to solve a problem of the form
$$\widetilde{\mathbf{x}} = \arg\min_{\mathbf{z}} \|\mathbf{y} - \mathbf{Az}\|_2 \quad \mathrm{s.t.} \quad \mathbf{z} \in \mathcal{R}(\mathbf{\Psi}_{\mathcal{I}}). \tag{64}$$
Note that in this case we require more than simply the projection of $\mathbf{y}$ onto $\mathcal{R}(\mathbf{A\Psi}_{\mathcal{I}})$—we wish to actually calculate the vector $\mathbf{z}$ corresponding to this projection. This requires a different approach than the simple reduced basis approach described above (although constructing a reduced basis for $\mathcal{R}(\mathbf{\Psi}_{\mathcal{I}})$ can still be useful in this context). In our simulations we regularize (64) via Tikhonov regularization [55, 70, 71]. The key idea is to replace (64) with
$$\widetilde{\mathbf{x}} = \arg\min_{\mathbf{z}} \|\mathbf{y} - \mathbf{Az}\|_2 \quad \mathrm{s.t.} \quad \mathbf{z} \in \mathcal{R}(\mathbf{\Psi}_{\mathcal{I}}), \; \|\mathbf{z}\|_2 \le \gamma. \tag{65}$$
The constraint $\|\mathbf{z}\|_2 \le \gamma$ significantly improves the conditioning of this problem when $\mathbf{A\Psi}_{\mathcal{I}}$ is ill-conditioned.
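To make the preceding discussion concrete, here is a minimal sketch of the block-OMP routine from Section 6.1.1, assuming NumPy; the function name, the SVD-based reduced-basis orthogonalization, and the rank tolerance are our own illustrative choices rather than the exact implementation in the accompanying MATLAB package:

```python
import numpy as np

def block_omp_project(x, Psi, J, k, K):
    """Approximate P(x, K): project x onto K-block-sparse signals in Psi,
    where Psi has J contiguous blocks of k columns each."""
    r, selected = x.copy(), []               # (i) initialize residual and index set
    for _ in range(K):
        h = Psi.conj().T @ r                 # (ii) proxy vector
        energy = np.sum(np.abs(h.reshape(J, k)) ** 2, axis=1)
        energy[selected] = -np.inf           # do not pick the same block twice
        selected.append(int(np.argmax(energy)))   # (iii) block with largest energy
        cols = np.concatenate([np.arange(i * k, (i + 1) * k) for i in selected])
        # (iv) orthogonalize r against Psi_I, via a reduced basis for its range,
        # since Psi_I may be nearly rank deficient
        U, s, _ = np.linalg.svd(Psi[:, cols], full_matrices=False)
        U = U[:, s > 1e-10 * s[0]]
        r = x - U @ (U.conj().T @ x)
    return x - r                             # equals x - r_K, the approximation to P(x, K)
```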
In our simulations below, we use the toolbox of [38] to efficiently solve (65). Note also that in our setting, the parameter $\gamma$ can be easily set using the norm of the original signal $\mathbf{x}$ that we are acquiring. In all simulations below (using either variation of block-based CoSaMP), we assume that an upper bound on $\|\mathbf{x}\|_2$ is known. In practical settings such an upper bound will usually be available, but if necessary one could also estimate this upper bound from $\|\mathbf{y}\|_2$.

In all of the experiments below, we assume that $T_s = \frac{1}{B_{nyq}}$ and that there are $J = \frac{B_{nyq}}{B_{band}} = 256$ possible bands. For each value of $k$ that we consider, we set $D = kJ$ and let $\mathbf{\Psi}$ be the $N \times D$ multiband modulated DPSS dictionary defined in (49). The digital half-bandwidth parameter $W$ is set to be $W = \frac{B_{band}T_s}{2} = \frac{1}{512}$, and we consider sample vectors with length $N = 4096$, so that $2NW = 16$.

In our experiments we generate our sampled multiband signal vectors $\mathbf{x}$ by selecting the positions of the $K$ occupied bands uniformly at random from the $J$ possibilities, and then within each band adding together 50 complex exponentials with frequencies selected uniformly at random from within the frequency band (not aligned with the "Nyquist grid"). Each complex exponential in this summation is given a random amplitude and phase via multiplication by a complex Gaussian random variable. There are a variety of other possibilities for generating test signals which we considered and for which we have observed essentially the same results as presented below. (A code sketch of this procedure appears below.)

We will typically report our results in terms of the SNR of the recovery, which in this context is defined as
$$\mathrm{SNR} = 20\log_{10}\left(\frac{\|\mathbf{x}\|_2}{\|\mathbf{x} - \widehat{\mathbf{x}}\|_2}\right) \mathrm{dB},$$
where $\mathbf{x}$ is the original input to the measurement matrix and $\widehat{\mathbf{x}}$ is the recovered estimate of $\mathbf{x}$ provided by CoSaMP. In all cases where we plot the SNR, what we actually show is the result of 50 independent trials (with a new signal for each trial). Rather than plotting the mean SNR, we plot the contour for which 95% of trials result in an SNR at least as large as the level indicated (so that only 5% of trials result in a worse performance than indicated).

6.2 BBCoSaMP(A, Ψ, y, K) versus BBCoSaMP(AΨ, I, y, K)

We begin with a comparison between the two versions of block-based CoSaMP discussed in Section 3.2—specifically, $\mathrm{BBCoSaMP}(\mathbf{A}, \mathbf{\Psi}, \mathbf{y}, K)$ and $\mathrm{BBCoSaMP}(\mathbf{A\Psi}, \mathbf{I}, \mathbf{y}, K)$. Recall that the key difference between these approaches is that the former essentially tries to recover $\mathbf{x}$ directly, while the latter instead attempts to first recover $\boldsymbol{\alpha}$ and then estimates $\widehat{\mathbf{x}}$ as $\mathbf{\Psi}\widehat{\boldsymbol{\alpha}}$. Since less is known about the theoretical properties of $\mathrm{BBCoSaMP}(\mathbf{A}, \mathbf{\Psi}, \mathbf{y}, K)$ (both in and of itself and with respect to the computation of $\mathrm{P}(\mathbf{x}, K)$), we have focused more on $\mathrm{BBCoSaMP}(\mathbf{A\Psi}, \mathbf{I}, \mathbf{y}, K)$ in our theoretical development. However, all of our theorems regarding $\mathrm{BBCoSaMP}(\mathbf{A\Psi}, \mathbf{I}, \mathbf{y}, K)$ (e.g., Theorems 5.4 and 5.6) are limited to the case where the number $k$ of DPSS vectors per band used to construct the multiband modulated DPSS dictionary $\mathbf{\Psi}$ satisfies $k < 2NW$, and as we will see shortly, in practice we obtain significant improvements in performance by considering $k > 2NW$.

In our experiments we evaluate both versions of block-based CoSaMP as a function of $k$. For the purposes of this experiment, we assume that there are $K = 5$ active bands, we set the number of measurements $M = 512$, and we construct the measurement matrix $\mathbf{A}$ to be $M \times N$ with i.i.d.
Gaussian entries with mean zero and variance $\frac{1}{M}$. In this experiment, we take $\mathbf{y} = \mathbf{Ax}$ and do not add noise to the measurements. In Figure 5(a) we see that when $k$ is small the two approaches perform similarly, but when $k$ exceeds $2NW$ by more than a small amount, the performance of $\mathrm{BBCoSaMP}(\mathbf{A}, \mathbf{\Psi}, \mathbf{y}, K)$ is far superior to that of $\mathrm{BBCoSaMP}(\mathbf{A\Psi}, \mathbf{I}, \mathbf{y}, K)$. In fact, when $k$ becomes large we observe that for at least 5% of trials $\mathrm{BBCoSaMP}(\mathbf{A\Psi}, \mathbf{I}, \mathbf{y}, K)$ results in an SNR of almost 0 dB (which can be achieved simply via the trivial estimate $\widehat{\mathbf{x}} = \mathbf{0}$). This gap is further illustrated in Figure 5(b), which shows the probability that the performance of $\mathrm{BBCoSaMP}(\mathbf{A\Psi}, \mathbf{I}, \mathbf{y}, K)$ is within 3 dB of $\mathrm{BBCoSaMP}(\mathbf{A}, \mathbf{\Psi}, \mathbf{y}, K)$ as a function of $k$. This also begins to rapidly decay when $k$ exceeds $2NW$. For a point of reference, in Figure 5(c) we plot the eigenvalue $\lambda^{(k)}_{N,W}$ corresponding to the first DPSS basis vector which is not used in constructing our dictionary $\mathbf{\Psi}$. This value is the dominant term in the approximation error that we can expect when using $\mathbf{\Psi}$ to represent multiband signals. We see that $\mathrm{BBCoSaMP}(\mathbf{A}, \mathbf{\Psi}, \mathbf{y}, K)$ continues to improve with increasing $k$, roughly until $\lambda^{(k)}_{N,W}$ approaches the level of machine precision.

Our experimental results suggest that in our analysis of $\mathrm{BBCoSaMP}(\mathbf{A\Psi}, \mathbf{I}, \mathbf{y}, K)$ we are correct in requiring that $k$ be relatively small, as the algorithm does indeed break down when $k$ becomes too large. However, before this breakdown the performance can be quite favorable, with recovery SNR exceeding 88 dB, and Theorem 5.6 does guarantee even better performance were $N$ to increase.
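For concreteness, the multiband test signals described above can be generated along the following lines; this is a sketch assuming NumPy, with the variable names and random seed our own:

```python
import numpy as np

rng = np.random.default_rng(0)
N, J, K, tones = 4096, 256, 5, 50
W = 1.0 / (2 * J)                     # digital half-bandwidth, so 2NW = 16
n = np.arange(N)

x = np.zeros(N, dtype=complex)
for i in rng.choice(J, size=K, replace=False):     # K active bands at random
    f_i = -0.5 + (i + 0.5) * 2 * W                 # center of band i
    freqs = f_i + W * (2 * rng.random(tones) - 1)  # 50 off-grid frequencies in the band
    amps = rng.standard_normal(tones) + 1j * rng.standard_normal(tones)
    x += np.exp(2j * np.pi * np.outer(n, freqs)) @ amps   # random amplitude and phase
```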
Figure 5:
Comparison of
$\mathrm{BBCoSaMP}(\mathbf{A}, \mathbf{\Psi}, \mathbf{y}, K)$ to $\mathrm{BBCoSaMP}(\mathbf{A\Psi}, \mathbf{I}, \mathbf{y}, K)$. (a) The recovery SNR of $\mathrm{BBCoSaMP}(\mathbf{A}, \mathbf{\Psi}, \mathbf{y}, K)$ and $\mathrm{BBCoSaMP}(\mathbf{A\Psi}, \mathbf{I}, \mathbf{y}, K)$ as a function of the number $k$ of DPSS vectors used per band in constructing the multiband modulated DPSS dictionary $\mathbf{\Psi}$. (b) The probability that the performance of $\mathrm{BBCoSaMP}(\mathbf{A\Psi}, \mathbf{I}, \mathbf{y}, K)$ is within 3 dB of $\mathrm{BBCoSaMP}(\mathbf{A}, \mathbf{\Psi}, \mathbf{y}, K)$ as a function of $k$. (c) The eigenvalue corresponding to the first DPSS basis vector that is not used.

We speculate that the breakdown in the performance of $\mathrm{BBCoSaMP}(\mathbf{A\Psi}, \mathbf{I}, \mathbf{y}, K)$ as $k$ grows is likely due to the fact that, for large enough $k$, the dictionary $\mathbf{\Psi}$ begins to contain highly coherent columns, so that any method that attempts to recover $\boldsymbol{\alpha}$ itself is likely to encounter significant problems. However, the strong performance of $\mathrm{BBCoSaMP}(\mathbf{A}, \mathbf{\Psi}, \mathbf{y}, K)$ (with no clear limitation on the size of $k$) seems to suggest that this latter approach likely satisfies the kinds of guarantees provided for block-based IHT in Theorems 5.3 and 5.5, which provide for arbitrarily large $k$ and hence arbitrarily accurate recovery. In light of these results, for the remainder of our experiments we focus exclusively on $\mathrm{BBCoSaMP}(\mathbf{A}, \mathbf{\Psi}, \mathbf{y}, K)$.

In most realistic scenarios, the compressive measurements $\mathbf{y}$ will be subject to various sources of noise (including noise in the signal itself, noise within the sensing hardware, quantization, etc.). As noted in Section 5.2.3, our approach to signal recovery inherits the same robustness to noise that is exhibited by traditional CoSaMP. To illustrate this, we consider the case where the measurements $\mathbf{y}$ are corrupted by additive white Gaussian noise, i.e., $\mathbf{y} = \mathbf{Ax} + \mathbf{e}$ where $\mathbf{e} \sim \mathcal{N}(\mathbf{0}, \sigma^2\mathbf{I})$. In our experiments we consider three different noise levels, quantified by the "measurement SNR" (MSNR), which is defined as
$$\mathrm{MSNR} = 20\log_{10}\left(\frac{\|\mathbf{Ax}\|_2}{\|\mathbf{e}\|_2}\right) \mathrm{dB}.$$
In general, the theoretical guarantees for this scenario suggest that the SNR of the recovery should be roughly comparable to the MSNR.

The results of our experiment are illustrated in Figure 6. In this case we are still assuming that there are $K = 5$ active bands, we use an i.i.d. Gaussian matrix $\mathbf{A}$ with $M = 512$ rows, and we plot the performance as a function of $k$. We observe that the results are essentially the same as in Figure 5(a), but that as we increase $k$ the recovery SNR will hit a plateau dictated by the best possible SNR achievable for a given MSNR. (Note that the case where the signal is corrupted with white noise as opposed to the measurements can be reduced to the case where the signal is noise-free and the measurements are noisy, and thus we restrict our attention to noisy measurements. See [11, 22] for further discussion of this equivalence and the challenges posed by signal noise.)

Figure 6: Impact of measurement noise (MSNR = 100, 50, and 25 dB) on the signal recovery SNR as a function of k.

Roughly speaking, when $k$ is small, performance is being limited by "modeling error," but as $k$ increases we eventually reach a regime where measurement noise surpasses modeling error as the limiting factor, and no further gains are possible by increasing $k$. Thus, in practice it will typically be the noise level that dictates the optimal choice of $k$. For the remainder of our experiments we wish to avoid any assumptions about the noise level, and so we restrict our attention to noise-free measurements. However, the results all translate to the noisy setting roughly as one would expect based on these results.

We now study the performance of our approach as a function of the number of measurements $M$. Specifically, we consider the cases where $K = 5$, 10, and 15 bands are active, and for each value of $K$ we let $M$ vary from $2NWK$ (the Landau rate) up to
$14NWK$ (oversampling the Landau rate by a factor of 7). The results are shown in Figures 7(a) and (b), which plot the results in terms of $\frac{M}{2NWK}$ and $M$, respectively. We observe that when the measurement rate is only 3 or 4 times that of the Landau rate, we are already doing extremely well, and we obtain near-perfect recovery at 6 times the Landau rate. We observe very similar behavior for all values of $K$. Thus, for the sake of simplicity, we restrict the remainder of our experiments to the case where there are only $K = 5$ active bands.

Before moving on, we note that an important caveat to these results is that as we vary $M$ and $K$, it is sometimes necessary to adjust the value of $k$ used in the construction of $\mathbf{\Psi}$. This can easily be seen by considering the regime where the measurement rate is very close to the Landau rate. Suppose $M = 2NWK$ and that we knew a priori which bands were occupied. In this case we could perform recovery using the appropriate submatrix of $\mathbf{\Psi}$, which would have $kK$ columns. Thus, if $k > 2NW$, then we would have only $M = 2NWK$ measurements and would need to estimate $kK > M$ values—a situation we would clearly like to avoid. Moreover, if we must also estimate the support from the data as well, then it is clearly best to avoid setting $k$ to be too large when $M$ is small. Thus, in the experiments for Figure 7 (and for those below) we set $k$ based on a rule of thumb that we determined based on empirical performance. Specifically, we set $k = 2NW$ when $\frac{M}{2NWK} \le 2$. For larger ratios we linearly increase $k$ from $2NW = 16$ up to, in our case, $k = 38$ (which represents the rough point at which the first omitted eigenvalue $\lambda^{(k)}_{N,W}$ reaches the level of machine precision). For larger values of $M$ there is no performance gain by considering $k$ larger than this level.
Figure 7: Recovery performance as a function of the number of measurements M for K = 5, 10, and 15 active bands. (a) Performance in terms of M/(NWK), which measures oversampling with respect to the Landau rate. (b) Performance in terms of the actual number of measurements M.

Measurement schemes

As promised in Section 3.1, we now evaluate the performance of our approach using more practical measurement schemes. We compare the performance achieved using an i.i.d. Gaussian matrix to that achieved using an M × N random demodulator matrix [73] and to a random sampling approach in which M samples are taken uniformly at random on the length-N Nyquist grid. The results are shown in Figure 8. We observe very similar performance among the three approaches, with the Gaussian matrix performing best, followed by the random demodulator and then random sampling. Note that while there is certainly a gap between these three approaches, it is most pronounced in a regime that is likely irrelevant in practice, since it is rare to find applications where a recovery SNR of 200 dB or more is feasible.
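For readers who wish to reproduce this comparison, the following sketch constructs the three measurement operators. The integrate-and-dump form of the random demodulator follows the architecture of [73]; the sqrt(M) normalization and the requirement that M divide N are simplifying assumptions of ours.

    import numpy as np

    def gaussian_matrix(M, N, rng):
        # i.i.d. Gaussian measurements, normalized by sqrt(M) (our choice).
        return rng.standard_normal((M, N)) / np.sqrt(M)

    def random_demodulator(M, N, rng):
        # Chip by a random +/-1 sequence, then integrate-and-dump over
        # consecutive blocks of N/M Nyquist-rate samples.
        assert N % M == 0, "this sketch assumes M divides N"
        D = np.diag(rng.choice([-1.0, 1.0], size=N))
        H = np.kron(np.eye(M), np.ones(N // M))
        return H @ D

    def random_sampling(M, N, rng):
        # Keep M rows of the identity: random samples on the Nyquist grid.
        rows = np.sort(rng.choice(N, size=M, replace=False))
        return np.eye(N)[rows]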
Comparison with the DFT basis

Finally, we close with a comparison between the performance achievable using our proposed multiband modulated DPSS dictionary Ψ and the performance achievable using the N × N DFT as a sparsifying basis. We test both dictionaries using an M × N random demodulator measurement matrix [73], which was originally designed with frequency-sparse signals in mind (but under a multitone signal model, not a multiband model). Due to DFT leakage effects, we believe that it would be inappropriate to break the DFT dictionary into bands and use a block-based recovery algorithm; thus, when Ψ is the DFT basis, we use $\widehat{x}$ = CoSaMP(A, Ψ, y, S) as our recovered signal estimate. In order to give the DFT approach the best possible chance for success, for each value of M we consider a range of possible values for the sparsity parameter S, selecting the value of S that achieves the best possible performance according to an oracle.

The performance gap between these two approaches, illustrated in Figure 9(a), is monumental. Using the DFT dictionary, we never achieve a recovery SNR better than 20 dB over this range of measurements, whereas using the multiband modulated DPSS dictionary with block-based CoSaMP, the recovery SNR can exceed 200 dB.
Figure 8: Comparison of recovery SNR using i.i.d. Gaussian measurements, random demodulator measurements, and random samples.
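Since all of these experiments revolve around the multiband modulated DPSS dictionary itself, we also include a compact sketch of one way to assemble it: the first k DPSS vectors modulated to the center of each width-2W band. The specific band grid (a tiling of [-1/2, 1/2)) and the absence of any column normalization are assumptions of ours, not verbatim details from the construction in the text.

    import numpy as np
    from scipy.signal.windows import dpss

    def multiband_dpss_dictionary(N, W, k):
        # N x k matrix whose columns are the first k DPSS vectors.
        S = dpss(N, N * W, Kmax=k).T
        n = np.arange(N)
        num_bands = int(round(1.0 / (2 * W)))   # width-2W bands tiling [-1/2, 1/2)
        centers = -0.5 + W + 2 * W * np.arange(num_bands)
        blocks = [np.exp(2j * np.pi * fc * n)[:, None] * S for fc in centers]
        return np.hstack(blocks)                # N x (k * num_bands)

Block-based recovery then treats each group of k columns as a single block when selecting the support.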
To illustrate the typical difference between these two approaches, in Figure 9(b) we show in blue (solid line) the DTFT of a representative signal from one trial in this experiment. Plotted in red (dashed line) is the DTFT of the signal vector recovered using the DPSS dictionary with M/(NWK) = 4; the recovery SNR is 109 dB and the recovered signal is visually indistinguishable from the original. Plotted in green (dots) are the DFT coefficients of the signal vector recovered using the DFT dictionary with the same measurements; the recovery SNR is now only 13.4 dB. While the DFT-based estimate does successfully capture the main peaks of each band, it naturally misses all of the sidelobes of each band (and despite the multiband nature of x(t), these sidelobes are important for accurately representing the window x). Moreover, due to modeling error, the DFT-based approach also produces a number of spurious artifacts in regions where the original signal has no significant frequency content.

We note that for the DFT-based approach in this experiment, the best estimate produced using CoSaMP(A, Ψ, y, S) was observed when setting S = 85. On the other hand, the DPSS approach (according to our rule of thumb) selects k = 27, so that in this approach there are a total of kK = 135 free parameters. It is not surprising that the DPSS approach results in superior performance, since its model has so many extra dimensions. What is potentially surprising is that if we set S = 135, the DFT approach exhibits a complete breakdown and is unable to make any practical use of these extra degrees of freedom.

Figure 9: Comparison of recovery performance between block-based CoSaMP with the multiband modulated DPSS dictionary and standard CoSaMP using the DFT basis. (a) Performance comparison as a function of M/(NWK). (b) The DTFT of a representative signal, and the recovered estimate using each approach.

Conclusions

There are likely to be many ways that one could bridge the gap between the discrete, finite framework of CS and the problem of acquiring a continuous-time signal. In this paper, using a dictionary constructed from a sequence of modulated DPSS basis elements, we have argued that when dealing with finite-length windows of Nyquist-rate samples, it is possible to map the multiband analog signal model, in a very natural way, into a finite-dimensional sparse model. This allows us to apply many of the standard theoretical and algorithmic tools from CS to the problem of recovering such a sample vector from compressive measurements. Moreover, the sparse signals that we encounter in this model actually have a structured sparsity pattern (namely, block-sparsity), which we can exploit through the use of specialized "model-based" CS recovery algorithms. Although our recovery bounds are qualified (we have shown that most finite-length sample vectors arising from multiband analog signals can be recovered to high accuracy from a number of measurements proportional to the Landau rate), these qualifiers are ultimately necessary. Moreover, we have demonstrated through a series of careful experiments that the multiband modulated DPSS dictionary can provide a far superior alternative to the DFT for sparse recovery. Our experiments also confirm that certain practical measurement architectures such as the random demodulator, which was previously viewed only as a mechanism for capturing multitone signals [44], can indeed be used to efficiently capture multiband signals.

As the reader will note from our discussions regarding the selection of the number of columns per band k, the computation of P(x, K), etc., there are certain subtle challenges in choosing the proper dictionary design and implementing an effective recovery algorithm. However, our experimental results do confirm that it is possible to navigate these waters. We have also aimed to give some practical insight into the proper choice of design parameters, and we have made all of the software for our simulations available for download.

Ultimately, in addition to the open algorithmic questions discussed in Section 6, there are many open questions concerning the most effective way to construct a multiband DPSS dictionary; one could imagine possible advantages to considering multiscale dictionaries (see also [59]), adaptive band sizes, overlapping bands, etc. We leave the consideration of these questions to future work.

It is worth emphasizing that the remarkable efficiency of our DPSS dictionary for sparsely approximating finite-length multiband sample vectors is owed entirely to the eigenvalue concentration behavior described in Section 4.2.4. It would be interesting to explore other possible problem domains (beyond multiband signal models) where similar eigenvalue concentration behavior holds. Again, we leave the consideration of such questions to future work.

We conclude by noting that applications of sparse representations abound, and so there are many possible settings outside of CS where the multiband modulated DPSS dictionary could be useful for processing finite-length sample vectors arising from multiband signals. For example, there are several possible applications in compressive signal processing [21], where one can attempt to answer certain questions about a signal vector directly from compressive measurements of that vector without having to actually perform signal recovery. One specific problem in this area could involve cancellation of an interfering signal that has corrupted the measurements.
Byapplying P to the measurement vector y , one can remove virtually all of the influence of theinterferer, since P Ai ≈ even for very strong interferers, and therefore P y = P As + P Ai ≈ P As . From these processed measurements
P y one can attempt to recover s or answer variouscompressive-domain questions that do not require signal recovery. It can even be possible to deriveRIP bounds for P A and thus provide guarantees on the performance of these techniques. Werefer the reader to [20, 23] for additional discussion of the interference cancellation problem and anexample using a preliminary version of the modulated DPSS dictionary.
Acknowledgements
We are very grateful to Richard Baraniuk, Emmanuel Candès, and Justin Romberg for helpful discussions during the formative stages of this work, and to Marco Duarte, Armin Eftekhari, Mark Embree, Chinmay Hegde, Deanna Needell, Yaniv Plan, and Borhan Sanandaji for their input on certain technical points. A preliminary version of this work was first presented at the 2011 Workshop on Signal Processing with Adaptive Sparse Structured Representations (SPARS'11) [25].
A Review of the Karhunen-Loeve (KL) Transform
We briefly review the basic ideas behind the KL transform [68]. Suppose that $r \in \mathbb{C}^N$ is a random vector with mean zero and a known autocorrelation matrix denoted by $R$, with entries defined as
$$R[m,n] = \mathbb{E}\left[ r[m] \overline{r[n]} \right],$$
where the overline denotes complex conjugation. Next suppose that we would like to find, for some fixed $k \in \{1, 2, \dots, N\}$, a $k$-dimensional subspace $Q$ of $\mathbb{C}^N$ that best captures the energy of $r$ in expectation. That is, we wish to find the $Q$ that minimizes
$$\mathbb{E}\left[ \| r - P_Q r \|_2^2 \right]$$
over all $k$-dimensional subspaces of $\mathbb{C}^N$. The optimal solution to this minimization problem can be found by computing an eigendecomposition of $R$. In particular, let $R = U \Lambda_R U^H$ denote the eigendecomposition of $R$, with the eigenvalues $\lambda_0(R) \ge \lambda_1(R) \ge \cdots \ge \lambda_{N-1}(R) \ge 0$ appearing in descending order along the diagonal of $\Lambda_R$. Let $U_k$ denote the $N \times k$ matrix that results from taking the first $k$ columns of $U$. The columns of $U_k$ provide an orthonormal basis for the optimal $Q$, and thus we will have $P_Q = U_k U_k^H$. For this optimal choice of $Q$, we will also have
$$\mathbb{E}\left[ \| r - P_Q r \|_2^2 \right] = \sum_{\ell=k}^{N-1} \lambda_\ell(R).$$
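As a numerical companion to this review (our own illustration, not part of the appendix), the optimal subspace and its expected residual follow directly from a single eigendecomposition:

    import numpy as np

    def kl_subspace(R, k):
        # Top-k eigenvectors of the autocorrelation matrix give the best
        # k-dimensional subspace; the expected residual E||r - P_Q r||^2
        # is the sum of the trailing eigenvalues.
        vals, vecs = np.linalg.eigh(R)          # ascending order
        vals, vecs = vals[::-1], vecs[:, ::-1]  # reorder to descending
        return vecs[:, :k], float(vals[k:].sum())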
B Proof of Theorem 4.1

Proof.
Let $f'$ denote a random variable with uniform distribution on $[f_c - W, f_c + W]$, let $\theta'$ denote a random variable with uniform distribution on $[0, 2\pi)$, and let $r = r(f', \theta') := e^{j\theta'} e_{f'}$ denote a random vector in $\mathbb{C}^N$ defined in terms of $f'$ and $\theta'$. Then we can write
$$\frac{1}{2W} \int_{f_c - W}^{f_c + W} \| e_f - P_Q e_f \|_2^2 \, df = \int_{f_c - W}^{f_c + W} \| e_{f'} - P_Q e_{f'} \|_2^2 \, p_{f'}(f') \, df' = \int_0^{2\pi} \int_{f_c - W}^{f_c + W} \| e^{j\theta'} ( e_{f'} - P_Q e_{f'} ) \|_2^2 \, p_{f'}(f') \, \frac{1}{2\pi} \, df' \, d\theta' = \int_0^{2\pi} \int_{f_c - W}^{f_c + W} \| e^{j\theta'} e_{f'} - P_Q ( e^{j\theta'} e_{f'} ) \|_2^2 \, p_{f'}(f') \, p_{\theta'}(\theta') \, df' \, d\theta' = \mathbb{E}\left[ \| r - P_Q r \|_2^2 \right].$$
We next verify that the random vector $r$ has mean zero, i.e., for each $n \in \{0, 1, \dots, N-1\}$ we have
$$\mathbb{E}[r[n]] = \int_0^{2\pi} \int_{f_c - W}^{f_c + W} e^{j\theta'} e_{f'}[n] \, p_{f'}(f') \, p_{\theta'}(\theta') \, df' \, d\theta' = \int_0^{2\pi} \int_{f_c - W}^{f_c + W} e^{j\theta'} e^{j 2\pi f' n} \frac{1}{4W\pi} \, df' \, d\theta' = \frac{1}{4W\pi} \int_{f_c - W}^{f_c + W} e^{j 2\pi f' n} \left( \int_0^{2\pi} e^{j\theta'} \, d\theta' \right) df' = 0.$$
Next, observe that the random vector $r$ has autocorrelation matrix $R$ with entries
$$R[m,n] = \mathbb{E}\left[ r[m] \overline{r[n]} \right] = \mathbb{E}\left[ e^{j\theta'} e_{f'}[m] \, e^{-j\theta'} \overline{e_{f'}[n]} \right] = \frac{1}{2W} \int_{f_c - W}^{f_c + W} e^{j 2\pi f' m} e^{-j 2\pi f' n} \, df' = \frac{1}{2W} \cdot e^{j 2\pi f_c m} \cdot 2W \operatorname{sinc}(2W(m-n)) \cdot e^{-j 2\pi f_c n}.$$
Therefore, we can write
$$R = \frac{1}{2W} \, E_{f_c} B_{N,W} E_{f_c}^H, \tag{66}$$
where $E_{f_c}$ is as defined in (44) and $B_{N,W}$ is as defined in (36). Plugging the eigendecomposition for $B_{N,W}$ from (38) into (66), we obtain
$$R = \frac{1}{2W} \, E_{f_c} S_{N,W} \Lambda_{N,W} S_{N,W}^H E_{f_c}^H.$$
From this eigendecomposition and standard results concerning the KL transform (see Appendix A), it follows that the optimal $k$-dimensional subspace $Q$ will be spanned by the first $k$ columns of $E_{f_c} S_{N,W}$. Furthermore, this choice of $Q$ yields
$$\frac{1}{2W} \int_{f_c - W}^{f_c + W} \| e_f - P_Q e_f \|_2^2 \, df = \mathbb{E}\left[ \| r - P_Q r \|_2^2 \right] = \sum_{\ell=k}^{N-1} \lambda_\ell(R) = \frac{1}{2W} \sum_{\ell=k}^{N-1} \lambda^{(\ell)}_{N,W},$$
as desired.
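The factorization (66) is easy to check numerically. The following self-contained sketch (our own sanity check, with arbitrarily chosen parameters) builds the prolate matrix $B_{N,W}$, forms $R$, and compares the leading eigenvalues of $R$ against the DPSS concentration ratios returned by SciPy:

    import numpy as np
    from scipy.signal.windows import dpss

    def prolate_matrix(N, W):
        # B_{N,W}[m,n] = sin(2*pi*W*(m-n)) / (pi*(m-n)) = 2W*sinc(2W(m-n)).
        m = np.arange(N)
        return 2 * W * np.sinc(2 * W * (m[:, None] - m[None, :]))

    # R has eigenvalues lambda^(l)_{N,W} / (2W); its eigenvectors are the
    # modulated DPSS's, so 2W times its eigenvalues should match SciPy's ratios.
    N, W, fc = 128, 1 / 16, 0.2
    E = np.diag(np.exp(2j * np.pi * fc * np.arange(N)))
    R = (1 / (2 * W)) * E @ prolate_matrix(N, W) @ E.conj().T
    eigvals = np.linalg.eigvalsh(R)[::-1]
    _, ratios = dpss(N, N * W, Kmax=10, return_ratios=True)
    print(np.allclose(2 * W * eigvals[:10], ratios, atol=1e-8))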
C Proof of Theorem 4.2

Proof.
The sequence $x[n] := x(nT_s)$, $n \in \mathbb{Z}$, is a discrete-time, zero-mean, wide sense stationary random process with power spectrum equal to
$$P_x(f) = \begin{cases} \frac{1}{2W}, & f_c - W \le f \le f_c + W, \\ 0, & \text{otherwise}, \end{cases}$$
and the vector $x \in \mathbb{C}^N$ equals the restriction of $x[n]$ to the indices $n = 0, 1, \dots, N-1$. The inverse DTFT of the power spectrum $P_x(f)$ gives the autocorrelation function for $x[n]$:
$$r_x[n] = e^{j 2\pi n f_c} \operatorname{sinc}(2Wn).$$
It follows that the random vector $x$ has mean zero and an autocorrelation matrix $R$ given by (66). Thus, $x$ has exactly the same autocorrelation structure as the random vector $r$ we considered in Appendix B, and so the same KL transform analysis applies. Finally, we can also compute
$$\mathbb{E}\left[ \|x\|_2^2 \right] = \sum_{n=0}^{N-1} \mathbb{E}\left[ |x[n]|^2 \right] = N \, r_x[0] = N.$$
D Proof of Theorem 4.3

Proof.
First, let us define $E_{f_c}$ to be an operator that modulates a discrete-time signal by $f_c$; more specifically,
$$E_{f_c}(s)[n] = e^{j 2\pi f_c n} s[n]$$
for all $n \in \mathbb{Z}$. Since the modulated DPSS vectors form an orthonormal basis for $\mathbb{C}^N$, and since the discrete-time signal $x[n]$ is time-limited, we can write $x[n]$ as
$$x = \sum_{\ell=0}^{N-1} \alpha[\ell] \, E_{f_c} \mathcal{T}_N ( s^{(\ell)}_{N,W} ),$$
where the coefficients $\alpha$ are given by $\alpha[\ell] = \langle x, E_{f_c} \mathcal{T}_N ( s^{(\ell)}_{N,W} ) \rangle = \langle x, E_{f_c} s^{(\ell)}_{N,W} \rangle$ for $\ell = 0, 1, \dots, N-1$, and $\|\alpha\|_2 = \|x\|_2$. By linearity, we can also write
$$\mathcal{B}_{f_c,W} x = \sum_{\ell=0}^{N-1} \alpha[\ell] \, \mathcal{B}_{f_c,W} E_{f_c} \mathcal{T}_N ( s^{(\ell)}_{N,W} ),$$
where the functions $\mathcal{B}_{f_c,W} E_{f_c} \mathcal{T}_N ( s^{(\ell)}_{N,W} )$ are modulated DPSS's and therefore remain orthogonal. Therefore,
$$(1 - \delta) \|x\|_2^2 \le \| \mathcal{B}_{f_c,W} x \|_2^2 = \sum_{\ell=0}^{N-1} |\alpha[\ell]|^2 \, \| \mathcal{B}_{f_c,W} E_{f_c} \mathcal{T}_N ( s^{(\ell)}_{N,W} ) \|_2^2 = \sum_{\ell=0}^{N-1} |\alpha[\ell]|^2 \, \lambda^{(\ell)}_{N,W}.$$
Supposing that $k = 2NW(1 + \epsilon)$, then
$$(1 - \delta) \|x\|_2^2 \le \sum_{\ell=0}^{k-1} |\alpha[\ell]|^2 \lambda^{(\ell)}_{N,W} + \sum_{\ell=k}^{N-1} |\alpha[\ell]|^2 \lambda^{(\ell)}_{N,W} \le \sum_{\ell=0}^{k-1} |\alpha[\ell]|^2 + \sum_{\ell=k}^{N-1} \|x\|_2^2 \, \lambda^{(\ell)}_{N,W} \le \| P_Q x \|_2^2 + \|x\|_2^2 \, N C_1 e^{-C_2 N} = \left( \|x\|_2^2 - \|x - P_Q x\|_2^2 \right) + \|x\|_2^2 \, N C_1 e^{-C_2 N}$$
for $N \ge N_0$. Rearranging terms in the above inequality yields (47).
E Proof of Theorem 4.4

Proof.
For each $i \in \mathcal{I}$, the sequence $x_i[n] := x_i(nT_s)$, $n \in \mathbb{Z}$, is a discrete-time, zero-mean, wide sense stationary random process with power spectrum equal to
$$P_{x_i}(f) = \begin{cases} \frac{1}{2WK}, & f \in \left[ -\frac{B_{\mathrm{nyq}} T_s}{2} + i \cdot 2W, \; -\frac{B_{\mathrm{nyq}} T_s}{2} + (i+1) \cdot 2W \right], \\ 0, & \text{otherwise}. \end{cases}$$
These discrete-time sequences will also be independent and jointly wide sense stationary. Defining $x_i \in \mathbb{C}^N$ to be the restriction of $x_i[n]$ to the indices $n = 0, 1, \dots, N-1$, we have $x = \sum_{i \in \mathcal{I}} x_i$ and $P_x(f) = \sum_{i \in \mathcal{I}} P_{x_i}(f)$. We can compute
$$\mathbb{E}\left[ \|x\|_2^2 \right] = \sum_{n=0}^{N-1} \mathbb{E}\left[ |x[n]|^2 \right] = \sum_{n=0}^{N-1} \sum_{i,\ell \in \mathcal{I}} \mathbb{E}\left[ x_i[n] \overline{x_\ell[n]} \right] = \sum_{n=0}^{N-1} \sum_{i \in \mathcal{I}} \mathbb{E}\left[ |x_i[n]|^2 \right] = N \cdot K \cdot \frac{1}{K} = N.$$
Also, we have
$$\mathbb{E}\left[ \| x - P_{\Psi_\mathcal{I}} x \|_2^2 \right] = \mathbb{E}\left\| \left( \sum_{i \in \mathcal{I}} x_i \right) - P_{\Psi_\mathcal{I}} \left( \sum_{i \in \mathcal{I}} x_i \right) \right\|_2^2 = \mathbb{E}\left\| \sum_{i \in \mathcal{I}} \left( x_i - P_{\Psi_\mathcal{I}} x_i \right) \right\|_2^2 \le \mathbb{E}\left[ \left( \sum_{i \in \mathcal{I}} \| x_i - P_{\Psi_\mathcal{I}} x_i \|_2 \right)^2 \right] \le \mathbb{E}\left[ \left( \sum_{i \in \mathcal{I}} \| x_i - P_{\Psi_i} x_i \|_2 \right)^2 \right] \le \mathbb{E}\left[ K \sum_{i \in \mathcal{I}} \| x_i - P_{\Psi_i} x_i \|_2^2 \right] = K \sum_{i \in \mathcal{I}} \mathbb{E}\left[ \| x_i - P_{\Psi_i} x_i \|_2^2 \right] = K \cdot K \cdot \frac{1}{2WK} \sum_{\ell=k}^{N-1} \lambda^{(\ell)}_{N,W},$$
where the first inequality follows from the triangle inequality, the second inequality follows because the columns of $\Psi_i$ belong to $\Psi_\mathcal{I}$ as well, the third inequality follows because $\|h\|_1^2 \le K \|h\|_2^2$ for any $K$-dimensional vector $h$, and the final equality follows from Theorem 4.2.
F Proof of Lemma 5.1

Proof.
Let $\Psi_{i_1}$ and $\Psi_{i_2}$ denote the blocks from which $q_1$ and $q_2$, respectively, are drawn. If $i_1 = i_2$, we have already established at the beginning of Section 4.3.1 that $\langle q_1, q_2 \rangle = 0$. Thus, we assume henceforth that $i_1 \ne i_2$.

Let $E_{f_{i_1}}$ denote an operator that modulates a discrete-time signal by $f_{i_1}$; more specifically, $E_{f_{i_1}}(s)[n] = e^{j 2\pi f_{i_1} n} s[n]$ for all $n \in \mathbb{Z}$. Let $\mathcal{B}_{f_{i_1},W}$ denote an orthogonal projection operator that takes a discrete-time signal, bandlimits its DTFT to the frequency range $f \in [f_{i_1} - W, f_{i_1} + W]$, and returns the corresponding signal in the time domain. Finally, define $\mathcal{B}^c_{f_{i_1}, f_{i_2}, W} := I - \mathcal{B}_{f_{i_1},W} - \mathcal{B}_{f_{i_2},W}$, which is also an orthogonal projection for finite-energy signals because $|f_{i_1} - f_{i_2}| \ge 2W$. For some $\ell, \ell' \in \{0, 1, \dots, 2NW(1-\epsilon) - 1\}$, we can write
$$\langle q_1, q_2 \rangle = \left\langle E_{f_{i_1}} s^{(\ell)}_{N,W}, \, E_{f_{i_2}} s^{(\ell')}_{N,W} \right\rangle = \left\langle E_{f_{i_1}} \mathcal{T}_N s^{(\ell)}_{N,W}, \, E_{f_{i_2}} \mathcal{T}_N s^{(\ell')}_{N,W} \right\rangle,$$
and expanding each argument into its three orthogonal components,
$$\langle q_1, q_2 \rangle = \left\langle \mathcal{B}_{f_{i_1},W} E_{f_{i_1}} \mathcal{T}_N s^{(\ell)}_{N,W}, \, \mathcal{B}_{f_{i_1},W} E_{f_{i_2}} \mathcal{T}_N s^{(\ell')}_{N,W} \right\rangle + \left\langle \mathcal{B}_{f_{i_2},W} E_{f_{i_1}} \mathcal{T}_N s^{(\ell)}_{N,W}, \, \mathcal{B}_{f_{i_2},W} E_{f_{i_2}} \mathcal{T}_N s^{(\ell')}_{N,W} \right\rangle + \left\langle \mathcal{B}^c_{f_{i_1},f_{i_2},W} E_{f_{i_1}} \mathcal{T}_N s^{(\ell)}_{N,W}, \, \mathcal{B}^c_{f_{i_1},f_{i_2},W} E_{f_{i_2}} \mathcal{T}_N s^{(\ell')}_{N,W} \right\rangle.$$
From (32), (34), and the fact that $|f_{i_1} - f_{i_2}| \ge 2W$, it follows that
$$|\langle q_1, q_2 \rangle| \le \sqrt{\lambda^{(\ell)}_{N,W}} \sqrt{1 - \lambda^{(\ell')}_{N,W}} + \sqrt{1 - \lambda^{(\ell)}_{N,W}} \sqrt{\lambda^{(\ell')}_{N,W}} + \sqrt{1 - \lambda^{(\ell)}_{N,W}} \sqrt{1 - \lambda^{(\ell')}_{N,W}}.$$
Finally, (51) follows by bounding the $\sqrt{\lambda}$ terms in this expression by 1 and by bounding the $\sqrt{1 - \lambda}$ terms in this expression using (39).

References
[1] W. Bajwa, J. Haupt, G. Raz, S. Wright, and R. Nowak. Toeplitz-structured compressed sensing matrices. In Proc. IEEE Work. Stat. Signal Processing, Madison, WI, Aug. 2007.
[2] R. Baraniuk. Compressive sensing. IEEE Signal Processing Mag., 24(4):118–120, 124, 2007.
[3] R. Baraniuk, V. Cevher, M. Duarte, and C. Hegde. Model-based compressive sensing. IEEE Trans. Inform. Theory, 56(4):1982–2001, 2010.
[4] R. Baraniuk, M. Davenport, R. DeVore, and M. Wakin. A simple proof of the restricted isometry property for random matrices. Const. Approx., 28(3):253–263, 2008.
[5] T. Blumensath. Sampling and reconstructing signals from a union of linear subspaces. IEEE Trans. Inform. Theory, 57(7):4660–4671, 2011.
[6] T. Blumensath and M. Davies. Iterative hard thresholding for compressive sensing. Appl. Comput. Harmon. Anal., 27(3):265–274, 2009.
[7] T. Blumensath and M. Davies. Sampling theorems for signals from the union of finite-dimensional linear subspaces. IEEE Trans. Inform. Theory, 55(4):1872–1882, 2009.
[8] Y. Bresler and P. Feng. Spectrum-blind minimum-rate sampling and reconstruction of 2D multiband signals. In Proc. IEEE Int. Conf. Image Processing (ICIP), Zurich, Switzerland, Sept. 1996.
[9] V. Buldygin and Y. Kozachenko. Metric Characterization of Random Variables and Random Processes. American Mathematical Society, Providence, RI, 2000.
[10] E. Candès. Compressive sampling. In Proc. Int. Congress of Math., Madrid, Spain, Aug. 2006.
[11] E. Candès and M. Davenport. How well can we estimate a sparse vector? Arxiv preprint arXiv:1104.5246, 2011.
[12] E. Candès, Y. Eldar, D. Needell, and P. Randall. Compressed sensing with coherent and redundant dictionaries. Appl. Comput. Harmon. Anal., 31(1):59–73, 2011.
[13] E. Candès, J. Romberg, and T. Tao. Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inform. Theory, 52(2):489–509, 2006.
[14] E. Candès and T. Tao. Decoding by linear programming. IEEE Trans. Inform. Theory, 51(12):4203–4215, 2005.
[15] V. Cevher. An ALPS view of sparse recovery. In Proc. IEEE Int. Conf. Acoust., Speech, and Signal Processing (ICASSP), Prague, Czech Republic, May 2011.
[16] A. Cohen, W. Dahmen, and R. DeVore. Instance optimal decoding by thresholding in compressed sensing. In Proc. Int. Conf. Harmon. Anal. and Partial Diff. Eq., El Escorial Madrid, Spain, Jun. 2008.
[17] W. Dai and O. Milenkovic. Subspace pursuit for compressive sensing signal reconstruction. IEEE Trans. Inform. Theory, 55(5):2230–2249, 2009.
[18] I. Daubechies and R. DeVore. Approximating a bandlimited function using very coarsely quantized data: A family of stable sigma-delta modulators of arbitrary order. Ann. Math., 158(2):679–710, 2003.
[19] M. Davenport. Concentration of measure and sub-gaussian distributions, 2009. Available online at http://cnx.org/content/m32583/latest/.
[20] M. Davenport, P. Boufounos, and R. Baraniuk. Compressive domain interference cancellation. In Proc. Work. Struc. Parc. Rep. Adap. Signaux (SPARS), Saint-Malo, France, Apr. 2009.
[21] M. Davenport, P. Boufounos, M. Wakin, and R. Baraniuk. Signal processing with compressive measurements. IEEE J. Select. Top. Signal Processing, 4(2):445–460, 2010.
[22] M. Davenport, J. Laska, J. Treichler, and R. Baraniuk. The pros and cons of compressive sensing for wideband signal acquisition: Noise folding vs. dynamic range. To appear in IEEE Trans. Signal Processing, 2012.
[23] M. Davenport, S. Schnelle, J.P. Slavinsky, R. Baraniuk, M. Wakin, and P. Boufounos. A wideband compressive radio receiver. In Proc. Military Communications Conference (MILCOM), San Jose, CA, Oct. 2010.
[24] M. Davenport and M. Wakin. Analysis of Orthogonal Matching Pursuit using the restricted isometry property. IEEE Trans. Inform. Theory, 56(9):4395–4401, 2010.
[25] M. Davenport and M. Wakin. Reconstruction and cancellation of sampled multiband signals using Discrete Prolate Spheroidal Sequences. In Proc. Work. Struc. Parc. Rep. Adap. Signaux (SPARS), Edinburgh, Scotland, Jun. 2011.
[26] G. Davis, S. Mallat, and Z. Zhang. Adaptive time-frequency decompositions. SPIE J. Opt. Engin., 33(7):2183–2191, 1994.
[27] R. DeVore, G. Petrova, and P. Wojtaszczyk. Instance-optimality in probability with an $\ell_1$-minimization decoder. Appl. Comput. Harmon. Anal., 27(3):275–288, 2009.
[28] D. Donoho. Compressed sensing. IEEE Trans. Inform. Theory, 52(4):1289–1306, 2006.
[29] D. Donoho, I. Drori, Y. Tsaig, and J.-L. Starck. Sparse solution of underdetermined linear equations by stagewise orthogonal matching pursuit. Stanford Technical Report, 2006.
[30] M. Duarte and R. Baraniuk. Spectral compressive sensing. Preprint, 2011.
[31] Y. Eldar, P. Kuppinger, and H. Bölcskei. Block-sparse signals: Uncertainty relations and efficient recovery. IEEE Trans. Signal Processing, 58(6):3042–3054, 2010.
[32] Y. Eldar and M. Mishali. Robust recovery of signals from a structured union of subspaces. IEEE Trans. Inform. Theory, 55(11):5302–5316, 2009.
[33] C. Fancourt and J. Principe. On the relationship between the Karhunen-Loeve transform and the prolate spheroidal wave functions. In Proc. IEEE Int. Conf. Acoust., Speech, and Signal Processing (ICASSP), Istanbul, Turkey, Jun. 2000.
[34] P. Feng. Universal spectrum blind minimum rate sampling and reconstruction of multiband signals. PhD thesis, University of Illinois at Urbana-Champaign, Mar. 1997.
[35] P. Feng and Y. Bresler. Spectrum-blind minimum-rate sampling and reconstruction of multiband signals. In Proc. IEEE Int. Conf. Acoust., Speech, and Signal Processing (ICASSP), Atlanta, GA, May 1996.
[36] S. Geršgorin. Über die Abgrenzung der Eigenwerte einer Matrix. Izv. Akad. Nauk SSSR Ser. Fiz.-Mat., 6:749–754, 1931.
[37] L. Gosse. Compressed sensing with preconditioning for sparse recovery with subsampled matrices of Slepian prolate functions. Preprint, 2010.
[38] P. C. Hansen. Regularization Tools Version 4.0 for Matlab 7.3. Numerical Algorithms, 46:189–194, 2007.
[39] S. Hein and A. Zakhor. Theoretical and numerical aspects of an SVD-based method for band-limiting finite-extent sequences. IEEE Trans. Signal Processing, 42(5):1227–1230, 1994.
[40] S. Izu and J. Lakey. Time-frequency localization and sampling of multiband signals. Acta Applicandae Mathematicae, 107(1):399–435, 2009.
[41] H. Landau. Necessary density conditions for sampling and interpolation of certain entire functions. Acta Math., 117:37–52, 1967.
[42] H. Landau and H. Pollak. Prolate spheroidal wave functions, Fourier analysis, and uncertainty. II. Bell Systems Tech. J., 40(1):65–84, 1961.
[43] H. Landau and H. Pollak. Prolate spheroidal wave functions, Fourier analysis, and uncertainty. III. Bell Systems Tech. J., 41(4):1295–1336, 1962.
[44] M. Lexa, M. Davies, and J. Thompson. Reconciling compressive sampling systems for spectrally-sparse continuous-time signals. IEEE Trans. Signal Processing, 60(1):155–171, 2012.
[45] Y. Lu and M. Do. A theory for sampling signals from a union of subspaces. IEEE Trans. Signal Processing, 56(6):2334–2345, 2008.
[46] R. Lyons. Understanding Digital Signal Processing. Pearson Education, Inc., Upper Saddle River, NJ, 2004.
[47] M. Mishali and Y. Eldar. Blind multi-band signal reconstruction: Compressed sensing for analog signals. IEEE Trans. Signal Processing, 57(3):993–1009, 2009.
[48] M. Mishali and Y. Eldar. From theory to practice: Sub-Nyquist sampling of sparse wideband analog signals. IEEE J. Select. Top. Signal Processing, 4(2):375–391, 2010.
[49] D. Mugler, Y. Wu, and S. Clary. Linear prediction of bandpass signals based on past samples. In Proc. Sampling Theory and Applications (SampTA), Loen, Norway, Aug. 1999.
[50] D. Needell and J. Tropp. CoSaMP: Iterative signal recovery from incomplete and inaccurate samples. Appl. Comput. Harmon. Anal., 26(3):301–321, 2009.
[51] D. Needell and R. Vershynin. Uniform uncertainty principle and signal recovery via regularized orthogonal matching pursuit. Found. Comput. Math., 9(3):317–334, 2009.
[52] D. Needell and R. Vershynin. Signal recovery from incomplete and inaccurate measurements via regularized orthogonal matching pursuit. IEEE J. Select. Top. Signal Processing, 4(2):310–316, 2010.
[53] J. Oh, S. Senay, and L. Chaparro. Signal reconstruction from nonuniformly spaced samples using evolutionary Slepian transform-based POCS. EURASIP J. Adv. Signal Processing, 2010, February 2010.
[54] Y. Pati, R. Rezaifar, and P. Krishnaprasad. Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition. In Proc. Asilomar Conf. Signals, Systems, and Computers, Pacific Grove, CA, Nov. 1993.
[55] D. Phillips. A technique for the numerical solution of certain integral equations of the first kind. J. ACM, 9:84–97, 1962.
[56] R. Prony. Essai expérimental et analytique sur les lois de la Dilatabilité des fluides élastiques et sur celles de la Force expansive de la vapeur de l'eau et de la vapeur de l'alkool, à différentes températures. J. de l'École Polytechnique, Floréal et Prairial III, 1(2):24–76, 1795. R. Prony is Gaspard Riche, baron de Prony.
[57] J. Romberg. Compressive sensing by random convolution. SIAM J. Imag. Sci., 2(4):1098–1128, 2009.
[58] L. Scharf. The SVD and reduced rank signal processing. Signal Processing, 25(2):113–133, 1991.
[59] E. Sejdić, M. Luccini, S. Primak, K. Baddour, and T. Willink. Channel estimation using DPSS based frames. In Proc. IEEE Int. Conf. Acoust., Speech, and Signal Processing (ICASSP), Las Vegas, Nevada, Mar. 2008.
[60] S. Senay, L. Chaparro, and L. Durak. Reconstruction of nonuniformly sampled time-limited signals using prolate spheroidal wave functions. Signal Processing, 89(12):2585–2595, 2009.
[61] S. Senay, L. Chaparro, M. Sun, and R. Sclabassi. Compressive sensing and random filtering of EEG signals using Slepian basis. In Proc. European Signal Processing Conf. (EUSIPCO), Lausanne, Switzerland, Aug. 2008.
[62] J. P. Slavinsky, J. Laska, M. Davenport, and R. Baraniuk. The compressive multiplexer for multi-channel compressive sensing. In Proc. IEEE Int. Conf. Acoust., Speech, and Signal Processing (ICASSP), Prague, Czech Republic, May 2011.
[63] D. Slepian. Prolate spheroidal wave functions, Fourier analysis, and uncertainty. IV. Bell Systems Tech. J., 43(6):3009–3058, 1964.
[64] D. Slepian. On Bandwidth. Proc. IEEE, 64(3):292–300, 1976.
[65] D. Slepian. Prolate spheroidal wave functions, Fourier analysis, and uncertainty. V – The discrete case. Bell Systems Tech. J., 57(5):1371–1430, 1978.
[66] D. Slepian. Some comments on Fourier analysis, uncertainty, and modeling. SIAM Rev., 25(3):379–393, 1983.
[67] D. Slepian and H. Pollak. Prolate spheroidal wave functions, Fourier analysis, and uncertainty. I. Bell Systems Tech. J., 40(1):43–64, 1961.
[68] H. Stark and J. Woods. Probability, Random Processes, and Estimation Theory for Engineers. Prentice-Hall, 1986.
[69] D. Thomson. Spectrum estimation and harmonic analysis. Proc. IEEE, 70(9):1055–1096, 1982.
[70] A. Tikhonov. Solution of incorrectly formulated problems and the regularization method. Dokl. Akad. Nauk. SSSR, 151:1035–1038, 1963.
[71] A. Tikhonov and V. Arsenin. Solutions of Ill-Posed Problems. Winston & Sons, Washington, D.C., 1977.
[72] J. Tropp. Greed is good: Algorithmic results for sparse approximation. IEEE Trans. Inform. Theory, 50(10):2231–2242, 2004.
[73] J. Tropp, J. Laska, M. Duarte, J. Romberg, and R. Baraniuk. Beyond Nyquist: Efficient sampling of sparse, bandlimited signals. IEEE Trans. Inform. Theory, 56(1):520–544, 2010.
[74] J. Tropp, M. Wakin, M. Duarte, D. Baron, and R. Baraniuk. Random filters for compressive sampling and reconstruction. In Proc. IEEE Int. Conf. Acoust., Speech, and Signal Processing (ICASSP), Toulouse, France, May 2006.
[75] R. Venkataramani and Y. Bresler. Further results on spectrum blind sampling of 2D signals. In Proc. IEEE Int. Conf. Image Processing (ICIP), Chicago, IL, Oct. 1998.
[76] R. Vershynin. Introduction to the non-asymptotic analysis of random matrices. Arxiv preprint arXiv:1011.3027, 2010.
[77] R. Walden. Analog-to-digital converter survey and analysis. IEEE J. Selected Areas Comm., 17(4):539–550, 1999.