Compressed sensing with structured sparsity and structured acquisition
Claire Boyer (1), Jérémie Bigot (2) and Pierre Weiss (3)

(1) Institut de Mathématiques de Toulouse (UMR 5219), CNRS, Université de Toulouse, France. [email protected]
(2) Institut de Mathématiques de Bordeaux (UMR 5251), CNRS, Université de Bordeaux, France. [email protected]
(3) Institut des Technologies Avancées du Vivant (USR 3505), CNRS, Toulouse, France. [email protected]
June 14, 2016
Abstract
Compressed Sensing (CS) is an appealing framework for applications such as Magnetic Resonance Imaging (MRI). However, up to date, the sensing schemes suggested by CS theories are made of random isolated measurements, which are usually incompatible with the physics of acquisition. To reflect the physical constraints of the imaging device, we introduce the notion of blocks of measurements: the sensing scheme is not a set of isolated measurements anymore, but a set of groups of measurements which may represent any arbitrary shape (parallel or radial lines for instance). Structured acquisitions with blocks of measurements are easy to implement, and provide good reconstruction results in practice. However, very few results exist on the theoretical guarantees of CS reconstructions in this setting. In this paper, we derive new CS results for structured acquisitions and signals satisfying a prior structured sparsity. The obtained results provide a recovery probability of sparse vectors that explicitly depends on their support. Our results are thus support-dependent, and offer the possibility of flexible assumptions on the sparsity structure. Moreover, the results are drawing-dependent, since we highlight an explicit dependency between the probability of reconstructing a sparse vector and the way of choosing the blocks of measurements. Numerical simulations show that the proposed theory is faithful to experimental observations.
Key-words:
Compressed Sensing, blocks of measurements, structured sparsity, MRI, exact recovery, $\ell_1$ minimization.

1 Introduction

Since its introduction in [CRT06b, Don06], compressed sensing has triggered a massive interest in fundamental and applied research. However, despite recent progress, existing theories are still insufficient to explain the success of compressed acquisitions in many practical applications. Our aim in this paper is to extend the applicability of the theory by combining two new ingredients: structured sparsity and acquisition structured by blocks.
In this section, we provide a brief history of the compressed sensing evolution, with a particular emphasis on Fourier imaging, in order to better highlight our contribution.

1.1 Sampling with matrices with i.i.d. entries

Compressed sensing, as proposed in [CT06], consists in recovering a signal $x \in \mathbb{C}^n$ from a vector of measurements $y = Ax$, where $A \in \mathbb{C}^{m \times n}$ is the sensing matrix. Typical theorems state that if $A$ is an i.i.d. Gaussian matrix, $x$ is $s$-sparse, and $m \gtrsim s \log(n)$, then $x$ can be recovered exactly from $y$ by solving the following $\ell_1$ minimization problem:

$$\min_{x \in \mathbb{C}^n,\ Ax = y} \|x\|_1. \quad (1)$$

Moreover, it can be shown that the recovery is robust to noise if the constraint in (1) is penalized. An important fact about this theorem is that the number of measurements mostly depends on the intrinsic dimension $s$ rather than on the ambient dimension $n$.

1.2 Isolated measurements from orthogonal bases

Nearly at the same time, the theory was extended to random linear projections from orthogonal bases [CRT06b, Rau10, CP11, FR13]. Let $A_0 \in \mathbb{C}^{n \times n}$ denote an orthogonal matrix with rows $(a_i^*)_{1 \le i \le n}$. A sensing matrix $A$ can be constructed by randomly drawing rows as follows:

$$A = \frac{1}{\sqrt{m}} \left( \frac{1}{\sqrt{\pi_{J_\ell}}}\, a_{J_\ell}^* \right)_{1 \le \ell \le m}, \quad (2)$$

where $(J_\ell)_{1 \le \ell \le m}$ are i.i.d. copies of a uniform random variable $J$ with $\mathbb{P}(J = j) = \pi_j = 1/n$ for all $1 \le j \le n$. The coherence of the matrix $A_0$ can be defined by $\kappa(A_0) = n \cdot \max_{1 \le i \le n} \|a_i\|_\infty^2$. A typical result in this setting states that if $m \gtrsim \kappa(A_0)\, s \ln(n/\varepsilon)$, then an $s$-sparse vector $x$ can be exactly recovered using the $\ell_1$-minimization problem (1) with probability exceeding $1 - \varepsilon$. This type of theorem is particularly helpful to explain the success of recovery of sparse signals (spikes) from Fourier measurements, since in that case $\kappa(A_0) = 1$.

1.3 Variable density sampling

Unfortunately, in most applications, the sensing matrix $A_0$ is coherent, meaning that $\kappa(A_0)$ is large. In practice, uniformly drawn measurements then lead to very poor reconstructions. A natural idea to reduce the coherence consists in drawing the highly coherent rows of $A_0$ more often than the others. A byproduct of standard compressed sensing results [CP11] implies that variable density sampling [PVW11, CCW13, KW14] allows perfect reconstruction with a limited (but usually too high) number of measurements. This idea is captured by the following result. Let $A_0 \in \mathbb{C}^{n \times n}$ denote an orthogonal matrix with rows $(a_i^*)_{1 \le i \le n}$, and let $A$ denote the random matrix

$$A = \frac{1}{\sqrt{m}} \left( \frac{1}{\sqrt{\pi_{J_\ell}}}\, a_{J_\ell}^* \right)_{1 \le \ell \le m}, \quad (3)$$

where $(J_\ell)_{1 \le \ell \le m}$ are i.i.d. copies of a random variable $J$ with $\mathbb{P}(J = j) = \pi_j = \frac{\|a_j\|_\infty^2}{\sum_{j'=1}^n \|a_{j'}\|_\infty^2}$ for all $1 \le j \le n$. Let $x$ denote an $s$-sparse vector and set $m \gtrsim \left( \sum_{j=1}^n \|a_j\|_\infty^2 \right) s \ln(n/\varepsilon)$. Then the minimizer of (1) coincides with $x$ with probability larger than $1 - \varepsilon$.

Unfortunately, it is quite easy to show experimentally that this principle alone cannot explain the success of CS in applications such as Fourier imaging. The flip test proposed in [AHPR13] is a striking illustration of this fact.

1.4 Variable density sampling with structured sparsity

A common aspect of the above results is that they assume no structure, apart from sparsity, in the signals to recover. Recovering arbitrary sparse vectors is a very demanding property that precludes the use of CS in many practical settings.
Exact recovery conditions for sparse vectors with a structured support appeared quite early, with the work of Tropp [Tro06]. To the best of our knowledge, the work [AHPR13] is the first to provide explicit constructions of random matrices allowing to recover sparse signals with a structured support. The theory in [AHPR13] also suggests variable density sampling strategies. There is however one major difference compared to the previously mentioned contributions: the density should depend both on the sensing basis and on the sparsity structure. The authors develop a comprehensive theory for Fourier sampling, based on isolated measurements under a sparsity-by-levels assumption in the wavelet domain. They illustrate through extensive numerical experiments in [AHR14b] that sampling structured signals in coherent bases can significantly outperform i.i.d. Gaussian measurements, usually considered as an optimal sampling strategy. This theory will be reviewed and compared to ours in Section 4.
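To make the sampling models (2)-(3) concrete, here is a minimal numerical sketch (our own illustration, not code from the paper; all names are ours). It draws rows of a random orthogonal matrix i.i.d. from a variable density $\pi$ and recovers a sparse vector by solving the $\ell_1$ problem (1), recast as a linear program in the real-valued case.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, s, m = 64, 3, 40

# Full orthogonal sensing basis A0 (random here; Fourier in the paper).
A0 = np.linalg.qr(rng.standard_normal((n, n)))[0]

# Variable density proportional to ||a_j||_inf^2, cf. the pi_j of (3).
pi = np.max(np.abs(A0), axis=1) ** 2
pi /= pi.sum()

# Draw m rows i.i.d. from pi and renormalize as in (2)-(3).
J = rng.choice(n, size=m, p=pi)
A = A0[J] / np.sqrt(m * pi[J])[:, None]

# s-sparse ground truth and measurements y = A x.
x = np.zeros(n)
x[rng.choice(n, size=s, replace=False)] = rng.standard_normal(s)
y = A @ x

# l1 minimization (1) as an LP: write x = u - v with u, v >= 0.
c = np.ones(2 * n)
res = linprog(c, A_eq=np.hstack([A, -A]), b_eq=y, bounds=(0, None))
x_hat = res.x[:n] - res.x[n:]
print("recovery error:", np.linalg.norm(x_hat - x))
```

With these (arbitrary) dimensions, the error is typically at machine precision, in line with the variable density sampling result recalled above.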
To fix the ideas, let us illustrate the application of the previously described theory in the context of Magnetic Resonance Imaging (MRI). In MRI, images are sampled in the Fourier domain and can be assumed to be sparse in the wavelet domain. Figure 1 (a) illustrates a variable density sampling pattern: the white dots indicate which Fourier coefficients are probed. Figure 1 (b) is the reconstruction of a phantom image from the measurements in (a) via $\ell_1$-minimization. Figure 1 (c) is a zoom on the reconstruction. As can be seen, only 4.6% of the coefficients are enough to reconstruct a well resolved image.
Probing measurements independently at random is infeasible, or at least impractical, in most measuring instruments. This is the case in MRI, where the samples have to lie on piecewise smooth trajectories [LDP07, CCKW14, CWKC16]. The same situation occurs in a number of other devices such as electron tomography [LSMH13], X-ray tomography [PSV09] and radio-interferometry [WJP+09]. Figure 2 displays a sampling pattern made of blocks of measurements, together with the corresponding $\ell_1$-minimization reconstruction. To the best of our knowledge, there currently exists no theory able to explain this favorable behavior. The only works dealing with such an acquisition are [PDG15, BBW14]. They assume no structure in the sparsity, and we showed in [BBW14] that structure was crucially needed to explain results such as those in Figure 2. We will recall this result in Section 4.3.1.

The main contribution of this paper is to derive a new compressed sensing theory:

(i) giving recovery guarantees with an explicit dependency on the support of the vector to reconstruct,

(ii) based on block-structured acquisition.
Figure 1: An example of reconstruction of a 2048 × 2048 image. (a) Variable density sampling pattern (4.6% measurements). (b) Corresponding reconstruction via $\ell_1$-minimization (SNR = 24.2 dB). (c) A zoom on a part of the reconstructed image. (d) Image obtained by using the pseudo-inverse transform (SNR = 21 dB). (e) A zoom on a part of this image.

Informally, our main result (Theorem 3.3) reads as follows. Let $x \in \mathbb{C}^n$ denote a vector with support $S \subset \{1, \dots, n\}$. Draw $m$ blocks of measurements with a distribution $\pi \in \mathbb{R}^M$, where $M$ denotes the number of available blocks. If

$$m \gtrsim \Gamma(S, \pi) \ln\left(\frac{n}{\varepsilon}\right),$$

then the vector $x$ is recovered by $\ell_1$ minimization with probability greater than $1 - \varepsilon$. The proposed theory has a few important consequences:

• The block structure proposed herein enriches the family of sensing matrices available for CS. Existing theories for structured sampling do not take the constraints of the sampling device into account. The proposed theory therefore gives keys to design realistic structured sampling schemes.

• Our theorem significantly departs from most works, which consider the reconstruction of any $s$-sparse vector. It is similar in spirit to the works [AH15, AHPR13]. However, this is the first time that the dependency on the support $S$ and on the drawing probability $\pi$ is made explicit, through the quantity $\Gamma(S, \pi)$. This provides many possibilities, such as optimizing the drawing probability $\pi$ or identifying the classes of supports recoverable with block sampling strategies.

• The proposed approach generalizes most existing compressed sensing theories. In particular, it allows recovering all the results mentioned in the introduction.

• The provided theory seems to predict accurately practical Fourier sampling experiments, which is quite rare in this field. The example given in Figure 2 can be analyzed precisely. In particular, we show that a block-structured acquisition can be used only if the support structure is adapted to it. The resulting structures are more complex than the sparsity by levels of [AHPR13].

• The proposed theory allows envisioning the use of CS in situations that were not possible before. The use of incoherent transforms is not necessary anymore, provided that the support $S$ has some favorable properties.

• The usual restricted isometry constant or coherence are replaced by the quantity $\Gamma(S, \pi)$, which seems to be much more adapted to describe the practical success of CS.

Figure 2: An example of reconstruction of a 2048 × 2048 image from blocks of measurements. (a) Block-structured sampling pattern. (b) Corresponding reconstruction via $\ell_1$-minimization (SNR = 24.1 dB). (c) A zoom on a part of the reconstructed image. (d) Image obtained by using the pseudo-inverse transform (SNR = 21 dB). (e) A zoom on a part of this image.
In this paper, structured acquisition denotes the constraints imposed by the physics of the acquisition, which are modeled using blocks of measurements extracted from a full deterministic matrix $A_0$. This notion of structured acquisition differs from the notion of structured random matrices, as described in [Rau10] and [DE11]. Indeed, this latter strategy is based on acquiring isolated measurements randomly drawn from the rows of a deterministic matrix. The resulting sensing matrix thus has some inherent structure, which is not the case of random matrices with i.i.d. entries that were initially considered in CS. In our paper, the sensing matrix $A$ is even more structured, in the sense that the full sampling matrix $A_0$ has been partitioned into blocks of measurements.

We also focus on obtaining RIPless results by combining structured acquisition and structured sparsity. RIPless results [CP11] refer to CS approaches that are non-uniform, in the sense that they hold for a given sensing matrix $A$ and a given support $S$ of cardinality $s$, but not for all $s$-sparse vectors. Nevertheless, existing RIPless results in the literature are only based on the degree of sparsity $s = |S|$. A main novelty of this paper is to develop RIPless results that explicitly depend on the support $S$ (and not only on its cardinality $s$) of the signal to reconstruct. This strategy allows incorporating any kind of prior information on the structure of $S$, in order to study its influence on the quality of CS reconstructions.

Structured sparsity is a concept that appeared early in the history of compressed sensing. The works [Tro06, GN08, HSIG13] provide sufficient conditions to recover structured sparse signals by using orthogonal matching pursuit or basis pursuit algorithms. Similar conditions (inexact dual certificates) are used in our work. The main novelty and difficulty in our contribution is to show that very structured sampling matrices satisfy these conditions. Other authors [EM09, BCDH10, DE11, BJMO12] proposed to change the recovery algorithm when a prior knowledge of structured sparsity is available. Their study is usually restricted to random sub-Gaussian matrices, which have no structure at all. At this point, we do not know whether better recovery guarantees could be obtained by using structured recovery algorithms together with structured sampling.

Finally, let us mention that a few papers recently considered the problem of mobile sampling [UV13b, UV13a, GRUV14]. In these papers, the authors provide theoretical guarantees for the exact reconstruction of bandlimited functions in the spirit of Shannon's sampling theorem. These papers thus strongly differ from our compressed sensing perspective.

The paper organization is as follows. Section 2 gives the formal setting of structured acquisition. Section 3 gives the main results, with a precise definition of $\Gamma(S, \pi)$. Applications of our main theorem to various settings are presented in Section 4. Technical appendices contain the proofs of the main results of this paper.
2 Preliminaries

2.1 Notation

In this paper, $n$ denotes the dimension of the signal to reconstruct. The notation $S \subset \{1, \dots, n\}$ refers to the support of the signal to reconstruct. The vectors $(e_i)_{1 \le i \le d}$ denote the canonical basis of $\mathbb{R}^d$, where $d$ will be equal to $n$ or $\sqrt{n}$, depending on the context. In the sequel, we set $P_S \in \mathbb{R}^{n \times n}$ to be the projection matrix onto $\mathrm{span}\{e_i,\ i \in S\}$, i.e. the diagonal matrix with the $j$-th diagonal entry equal to 1 if $j \in S$, and 0 otherwise. We will use the shorthand notation $M_S \in \mathbb{C}^{n \times n}$ and $v_S \in \mathbb{C}^n$ to denote the matrix $M P_S$ and the vector $P_S v$, for $M \in \mathbb{C}^{n \times n}$ and $v \in \mathbb{C}^n$. Similarly, if $M_k$ denotes a matrix indexed by $k$, then $M_{k,S} = M_k P_S$. For any matrix $M$ and any $1 \le p, q \le \infty$, the operator norm $\|M\|_{p \to q}$ is defined as

$$\|M\|_{p \to q} = \sup_{\|v\|_p \le 1} \|Mv\|_q,$$

with $\|\cdot\|_p$ and $\|\cdot\|_q$ denoting the standard $\ell_p$ and $\ell_q$ norms. Note that for a matrix $M \in \mathbb{R}^{n \times n}$, $\|M\|_{\infty \to \infty} = \max_{1 \le i \le n} \|e_i^* M\|_1$. The function $\mathrm{sign} : \mathbb{R}^n \to \mathbb{R}^n$ is defined by

$$(\mathrm{sign}(x))_i = \begin{cases} 1 & \text{if } x_i > 0, \\ -1 & \text{if } x_i < 0, \\ 0 & \text{if } x_i = 0, \end{cases}$$

and $\mathrm{Id}_n$ will denote the $n$-dimensional identity matrix.

2.2 Sampling strategy

In this paper, we assume that we are given some orthogonal matrix $A_0 \in \mathbb{C}^{n \times n}$ representing the set of possible linear measurements imposed by a specific sensor device. Let $(I_k)_{1 \le k \le M}$ denote a partition of the set $\{1, \dots, n\}$. The rows $(a_i^*)_{1 \le i \le n}$ of $A_0$ are partitioned into the following blocks dictionary $(B_k)_{1 \le k \le M}$, such that

$$B_k = (a_i^*)_{i \in I_k} \in \mathbb{C}^{|I_k| \times n}, \quad \text{with } I_k \subset \{1, \dots, n\} \text{ and } \bigsqcup_{k=1}^M I_k = \{1, \dots, n\}.$$

The sensing matrix $A$ is then constructed by randomly drawing blocks as follows:

$$A = \frac{1}{\sqrt{m}} \left( \frac{1}{\sqrt{\pi_{K_\ell}}}\, B_{K_\ell} \right)_{1 \le \ell \le m}, \quad (4)$$

where $(K_\ell)_{1 \le \ell \le m}$ are i.i.d. copies of a random variable $K$ such that $\mathbb{P}(K = k) = \pi_k$ for all $1 \le k \le M$. Moreover, thanks to the renormalization of the blocks $B_{K_\ell}$ by the weights $1/\sqrt{\pi_{K_\ell}}$ in model (4), the random block $B_K$ satisfies the isotropy condition

$$\mathbb{E}\left( \frac{B_K^* B_K}{\pi_K} \right) = \sum_{k=1}^M B_k^* B_k = \mathrm{Id}, \quad (5)$$

since $A_0$ is orthogonal and $(B_k)_{1 \le k \le M}$ is a partition of the rows of $A_0$.
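As a sanity check, the following sketch (an illustration under our own conventions, not code from the paper) partitions the rows of an orthogonal matrix into a blocks dictionary, verifies the isotropy condition (5), and assembles a sensing matrix as in (4).

```python
import numpy as np

rng = np.random.default_rng(1)
n, M = 16, 4

A0 = np.linalg.qr(rng.standard_normal((n, n)))[0]

# Partition {0,...,n-1} into M blocks of consecutive rows.
parts = np.array_split(np.arange(n), M)
blocks = [A0[I] for I in parts]            # B_k = (a_i^*)_{i in I_k}

# Any positive drawing probabilities pi_k summing to one.
pi = rng.random(M)
pi /= pi.sum()

# Isotropy (5): E[B_K^* B_K / pi_K] = sum_k B_k^* B_k = Id.
E = sum(p * (B.conj().T @ B) / p for B, p in zip(blocks, pi))
assert np.allclose(E, np.eye(n))

# Sensing matrix (4): stack m i.i.d. blocks, renormalized by 1/sqrt(m pi_K).
m = 3
K = rng.choice(M, size=m, p=pi)
A = np.vstack([blocks[k] / np.sqrt(m * pi[k]) for k in K])
print(A.shape)  # (total number of drawn rows, n)
```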
Remark 2.1. The case of overlapping blocks can also be handled. To do so, we may define the blocks $(B_k)_{1 \le k \le M}$ as follows:

$$B_k = \left( \frac{1}{\sqrt{\alpha_i}}\, a_i^* \right)_{i \in I_k}, \quad \text{for } 1 \le k \le M, \quad \text{where } \bigcup_{k=1}^M I_k = \{1, \dots, n\}.$$

The coefficients $(\alpha_i)_{1 \le i \le n}$ denote the multiplicity of the row $a_i^*$, namely the number of appearances $\alpha_i = |\{k : i \in I_k\}|$ of this row in different blocks. This renormalization is sufficient to ensure the isotropy condition $\mathbb{E}\left( \frac{B_K^* B_K}{\pi_K} \right) = \mathrm{Id}$, where $K$ is defined as above.

Note that our block sampling strategy encompasses the standard acquisition based on isolated measurements. Indeed, isolated measurements can be considered as blocks of measurements consisting of only one row of $A_0$.
Remark 2.2. More generally, the theorems could be extended, with slight adaptations, to the case where the sensing matrix is

$$A = \frac{1}{\sqrt{m}} \begin{pmatrix} B_{K_1} \\ \vdots \\ B_{K_m} \end{pmatrix},$$

where $B_{K_1}, \dots, B_{K_m}$ are i.i.d. copies of a random matrix $B \in \mathbb{C}^{b \times n}$ satisfying $\mathbb{E}(B^* B) = \mathrm{Id}$. The integer $b$ is itself random and $\mathrm{Id}$ is the $n \times n$ identity matrix. Assuming that $B$ takes its values in a countable family $(B_k)_{k \in \mathcal{K}}$, this formalism covers a large number of applications described in [BBW14]: (i) blocks with i.i.d. entries, (ii) partitions of the rows of orthogonal transforms, (iii) covers of the rows of orthogonal transforms, (iv) covers of the rows from tight frames.

3 Main Results
Before introducing our main results, we need to define some quantities (reminiscent of the coherence) that will play a key role in our analysis.
Definition 3.1.
Consider a blocks dictionary $(B_k)_{1 \le k \le M}$. Let $S \subset \{1, \dots, n\}$ and let $\pi$ be a probability distribution on $\{1, \dots, M\}$. Define

$$\Theta(S, \pi) := \max_{1 \le k \le M} \frac{1}{\pi_k} \left\| B_k^* B_{k,S} \right\|_{\infty \to \infty} = \max_{1 \le k \le M} \max_{1 \le i \le n} \frac{\|e_i^* B_k^* B_{k,S}\|_1}{\pi_k}, \quad (6)$$

$$\Upsilon(S, \pi) := \max_{1 \le i \le n} \sup_{\|v\|_\infty \le 1} \sum_{k=1}^M \frac{1}{\pi_k} \left| e_i^* B_k^* B_{k,S}\, v \right|^2, \quad (7)$$

$$\Gamma(S, \pi) := \max\left( \Upsilon(S, \pi),\ \Theta(S, \pi) \right). \quad (8)$$

For the sake of readability, we will sometimes use the shorter notations $\Theta$, $\Upsilon$ and $\Gamma$ to denote $\Theta(S, \pi)$, $\Upsilon(S, \pi)$ and $\Gamma(S, \pi)$.

In Definition 3.1, $\Theta$ is related to the local coherence and to the degree of sparsity when the blocks are made of only one row (the case of isolated measurements). Indeed, in such a case, $\Theta$ reads as follows:

$$\Theta(S, \pi) = \max_{1 \le k \le n} \frac{\|a_k\|_\infty \|a_{k,S}\|_1}{\pi_k} \le s \cdot \max_{1 \le k \le n} \frac{\|a_k\|_\infty^2}{\pi_k}.$$

The quantity $\max_{1 \le k \le n} \|a_k\|_\infty^2 / \pi_k$ refers to the usual notion of coherence described in [CP11]. The quantity $\Upsilon$ is new, and it is more delicate to interpret: it reflects an inter-block coherence. A rough upper bound for $\Upsilon$ is obtained by switching the maximum and the supremum with the sum in the definition of $\Upsilon$:

$$\Upsilon(S, \pi) \le \sum_{k=1}^M \frac{1}{\pi_k} \left\| B_k^* B_{k,S} \right\|_{\infty \to \infty}^2.$$

However, it is important to keep this order (maximum, supremum and then sum) in order to measure the interferences between blocks. In Section 4, we give more precise evaluations of $\Theta(S, \pi)$ and $\Upsilon(S, \pi)$ in particular cases.
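For small dictionaries, the quantities of Definition 3.1 can be computed directly. The sketch below (our own code, with hypothetical helper names) evaluates $\Theta(S, \pi)$ exactly, using the fact that $\|M\|_{\infty \to \infty}$ is the maximal $\ell_1$-norm of the rows of $M$, together with the rough upper bound on $\Upsilon(S, \pi)$ discussed above; evaluating $\Upsilon$ itself requires a supremum over the $\ell_\infty$-ball, which we do not attempt here.

```python
import numpy as np

def theta(blocks, S, pi):
    """Theta(S, pi) of (6): max_k ||B_k^* B_{k,S}||_{inf->inf} / pi_k."""
    vals = []
    for B, p in zip(blocks, pi):
        M = B.conj().T @ B[:, S]        # B_k^* B_{k,S}, nonzero columns only
        vals.append(np.abs(M).sum(axis=1).max() / p)
    return max(vals)

def upsilon_upper(blocks, S, pi):
    """Rough upper bound on Upsilon(S, pi): push max/sup inside the sum,
    giving sum_k ||B_k^* B_{k,S}||_{inf->inf}^2 / pi_k."""
    tot = 0.0
    for B, p in zip(blocks, pi):
        M = B.conj().T @ B[:, S]
        tot += np.abs(M).sum(axis=1).max() ** 2 / p
    return tot

# Example: isolated measurements from the identity (cf. Section 4.2.1),
# for which Theta = max_k delta_{k,S} / pi_k = n under the uniform density.
n, S = 8, [1, 5]
blocks = [np.eye(n)[k:k + 1] for k in range(n)]
pi = np.full(n, 1.0 / n)
print(theta(blocks, S, pi), upsilon_upper(blocks, S, pi))
```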
Remark 3.2 (Support-dependency and drawing-dependency). In Definition 3.1, the quantities $\Theta$ and $\Upsilon$ are drawing-dependent and support-dependent. Indeed, $\Gamma$ does not only depend on the degree of sparsity $s = |S|$. To the best of our knowledge, existing theories in CS only rely on $s$, see [CRT06a, CP11], or on degrees of sparsity structured by levels, see [AHPR13]. Since $\Gamma$ is explicitly related to $S$, this allows incorporating prior assumptions on the structure of $S$. Besides, the dependency on $\pi$ (i.e. on the way of drawing the measurements) is also explicit in the definition of $\Gamma$. This offers the flexibility to analyze the influence of $\pi$ on the required number of measurements. We therefore believe that the introduced quantities might play an important role in the future analysis of CS.

Our main result reads as follows.
Theorem 3.3.
Let $S \subset \{1, \dots, n\}$ be a set of indices of cardinality $s \ge 1$ and suppose that $x \in \mathbb{C}^n$ is an $s$-sparse vector supported on $S$. Fix $\varepsilon \in (0, 1)$. Suppose that the sampling matrix $A$ is constructed as in (4), and that $\Gamma(S, \pi) \ge 1$. If

$$m \ge C_0 \cdot \Gamma(S, \pi) \ln(64 s) \left( \ln\left(\frac{n}{\varepsilon}\right) + \ln \ln(64 s) \right), \quad (9)$$

where $C_0$ is a universal constant, then $x$ is the unique solution of (1) with probability larger than $1 - \varepsilon$.

Remark 3.4. In the sequel, we will simplify condition (9) by writing $m \ge C \cdot \Gamma(S, \pi) \ln(s) \ln\left(\frac{n}{\varepsilon}\right)$, where $C$ is a universal constant.

The proof of Theorem 3.3 is contained in Appendix A.1. It relies on the construction of an inexact dual certificate satisfying appropriate properties that are described in Lemma A.1. Our proof is then based on the so-called golfing scheme, introduced in [Gro11] for matrix completion and adapted by [CP11] to compressed sensing from isolated measurements. In the golfing scheme, the main difficulty is to control operator norms of random matrices extracted from the sensing matrix $A$. In [CP11], it is proposed to control (in probability) the operator norms $\|\cdot\|_{\infty \to 2}$ and $\|\cdot\|_{2 \to 2}$. However, this technique only gives results depending on the degree of sparsity $s$. In order to include an explicit dependency on the support $S$, one has to modify the golfing scheme of [CP11] by controlling the operator norm $\|\cdot\|_{\infty \to \infty}$ instead. A similar idea has been developed in [AHPR13].
Remark 3.5. Compared to most compressed sensing results, the condition required in Theorem 3.3 involves the extra multiplicative factor $\ln(64 s)$. This factor does not appear in [Gro11, CP11], but this is due to a mistake that was detected and corrected in [AH15]. Following the proofs proposed in [AH15], we could in fact obtain a bound of the form $m \ge C' \cdot \Gamma(S, \pi) \ln\left(\frac{n}{\varepsilon}\right)$, with $C' > C$. To the best of our knowledge, the ratio $C'/C$ obtained using the proof in [AH15] is large, so that the new bound becomes interesting only for very large $s$, i.e. in an asymptotic regime. In this paper, we therefore stick to the bound of Theorem 3.3, in order to (i) simplify the proof of the main result and (ii) obtain the best results in a non-asymptotic regime.

The explicit dependency of $\Gamma$ on $S$ allows us to consider the case of a random support $S$.
Proposition 3.6. Let $S \subset \{1, \dots, n\}$ denote a random support. For some positive real $\gamma$, suppose that the event $\{\Gamma(S, \pi) \le \gamma\}$ occurs with probability larger than $1 - \varepsilon'(\gamma)$. If $m \gtrsim \gamma \ln(s) \ln(n/\varepsilon)$, then $x$ is the unique solution of Problem (1) with probability larger than $1 - \varepsilon - \varepsilon'(\gamma)$.

Proof. Set $m \gtrsim \gamma \ln(s) \ln(n/\varepsilon)$. Define the event $R$ = "$x$ is the unique solution of Problem (1)", where $R$ stands for "reconstruction of the signal". Define also the event $\mathcal{A}$ = "$\Gamma(S, \pi) \le \gamma$". The hypothesis of Proposition 3.6 and Theorem 3.3 give that $\mathbb{P}(R \mid \mathcal{A}) \ge 1 - \varepsilon$. To prove Proposition 3.6, we must quantify

$$\mathbb{P}(R) = \mathbb{P}(R \cap \mathcal{A}) + \mathbb{P}(R \cap \mathcal{A}^c) \ge \mathbb{P}(R \mid \mathcal{A})\, \mathbb{P}(\mathcal{A}) \ge (1 - \varepsilon)\left(1 - \varepsilon'(\gamma)\right) \ge 1 - \varepsilon - \varepsilon'(\gamma),$$

which concludes the proof. ∎

The choice of a drawing probability $\pi$ minimizing the required number of block measurements in Theorem 3.3 is a delicate issue. The distribution $\pi^\star$ minimizing $\Theta(S, \pi)$ in Equation (6) can be obtained explicitly:

$$\pi_k^\star = \frac{\left\| B_k^* B_{k,S} \right\|_{\infty \to \infty}}{\sum_{\ell=1}^M \left\| B_\ell^* B_{\ell,S} \right\|_{\infty \to \infty}}, \quad \text{for } 1 \le k \le M. \quad (10)$$

Unfortunately, the minimization of $\Upsilon(S, \pi)$ with respect to $\pi$ seems much more involved, and we leave this issue as an open question in the general case. Note however that, in all the examples treated in this paper, we derive upper bounds on $\Upsilon(S, \pi)$ and $\Theta(S, \pi)$ that coincide; the distribution $\pi^\star$ is then set to minimize this common upper bound.
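As a complement, the distribution (10) is straightforward to evaluate numerically. The sketch below (our own helper with a hypothetical name, reusing the norm computation shown earlier) returns $\pi^\star$ for a given blocks dictionary and support.

```python
import numpy as np

def optimal_pi_theta(blocks, S):
    """Distribution (10) minimizing Theta(S, .):
    pi_k proportional to ||B_k^* B_{k,S}||_{inf->inf}."""
    w = np.array([np.abs(B.conj().T @ B[:, S]).sum(axis=1).max()
                  for B in blocks])
    return w / w.sum()
```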
Note also that optimizing $\pi$ independently of $S$ would result in a sole dependence on the degree of sparsity $s = |S|$, which is not desirable if one wants to exploit structured sparsity.

4 Applications

In this section, we first show that Theorem 3.3 can be used to recover state-of-the-art results in the case of isolated measurements [CP11]. We then show that it allows recovering recent results when a prior on the sparsity structure is available; the proposed setting however applies more widely, even in the case of isolated measurements. Finally, we illustrate the consequences of our results when the acquisition is constrained by blocks of measurements. In the latter case, we show that the sparsity structure should be adapted to the sampling structure for exact recovery.
4.1 Isolated measurements

First, we focus on an acquisition based on isolated measurements, which is the most widespread setting in CS. This case corresponds to choosing blocks of the form $B_k = a_k^*$ for $1 \le k \le n$, with $M = n$, where the $a_k^*$ are the rows of an orthogonal matrix. In such a setting, the sensing matrix can be written as follows:

$$A = \frac{1}{\sqrt{m}} \left( \frac{1}{\sqrt{\pi_{K_\ell}}}\, a_{K_\ell}^* \right)_{1 \le \ell \le m}, \quad (11)$$

where $(K_\ell)_{1 \le \ell \le m}$ are i.i.d. copies of $K$ such that $\mathbb{P}(K = k) = \pi_k$ for $1 \le k \le n$. We apply Theorem 3.3 when only the degree of sparsity $s$ of the signal to reconstruct is known. This is the setting considered in most CS papers (see e.g. [CT06, Rau10, CP11]). In this context, our main result can be rewritten as follows.

Corollary 4.1.
Let $S \subset \{1, \dots, n\}$ be a set of indices of cardinality $s$, and suppose that $x \in \mathbb{C}^n$ is an $s$-sparse vector. Fix $\varepsilon \in (0, 1)$. Suppose that the sampling matrix $A$ is constructed as in (11). If

$$m \ge C \cdot s \cdot \max_{1 \le k \le n} \frac{\|a_k\|_\infty^2}{\pi_k} \ln(s) \ln\left(\frac{n}{\varepsilon}\right), \quad (12)$$

then $x$ is the unique solution of (1) with probability at least $1 - \varepsilon$. Moreover, the drawing distribution minimizing (12) is

$$\pi_k = \frac{\|a_k\|_\infty^2}{\sum_{\ell=1}^n \|a_\ell\|_\infty^2},$$

which leads to

$$m \ge C \cdot s \cdot \sum_{k=1}^n \|a_k\|_\infty^2\ \ln(s) \ln\left(\frac{n}{\varepsilon}\right).$$

The proof is given in Appendix D.1. Note that Corollary 4.1 is identical to Theorem 1.1 in [CP11], up to a logarithmic factor. This result is usually used to explain the practical success of variable density sampling. It is at the core of papers such as [PVW11, KW14, CCKW14].
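As a quick illustration (our own sketch, with assumed names), the following snippet compares the sample-complexity factor of (12) under the uniform density with the one obtained under the optimal density of Corollary 4.1, for a generic orthogonal matrix; the second factor is never larger than the first.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 64
A0 = np.linalg.qr(rng.standard_normal((n, n)))[0]

mu = np.max(np.abs(A0), axis=1) ** 2   # per-row coherences ||a_k||_inf^2

# Factor max_k ||a_k||_inf^2 / pi_k in (12):
print("uniform density:", n * mu.max())   # pi_k = 1/n
print("optimal density:", mu.sum())       # pi_k = mu_k / sum(mu)
```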
4.2 Isolated measurements with structured sparsity

When using coherent transforms, meaning that the term $\max_{1 \le k \le n} \|a_k\|_\infty^2 / \pi_k$ in Equation (12) is an increasing function of $n$, Corollary 4.1 is insufficient to justify the use of CS in applications. In this section, we show that the proposed results allow justifying the use of CS even in the extreme case where the sensing is performed with the canonical basis.

4.2.1 Fixed support S

Suppose that the signal $x$ to reconstruct is $S$-sparse, where $S \subseteq \{1, \dots, n\}$ is a fixed subset. Consider the highly coherent case where $A_0 = \mathrm{Id}$. All current CS theories would give the same unsatisfactory conclusion: it is not possible to use CS, since $A_0$ is a perfectly coherent transform. Indeed, the bound on the required number of isolated measurements given by standard CS theories [CP11] reads as follows:

$$m \ge C \cdot s \cdot \max_{1 \le k \le n} \frac{\|e_k\|_\infty^2}{\pi_k} \ln(n/\varepsilon) = C \cdot s \cdot \max_{1 \le k \le n} \frac{1}{\pi_k} \ln(n/\varepsilon).$$

Without any assumption on the support $S$, one can choose to draw the measurements uniformly at random, i.e. $\pi_k = 1/n$ for $1 \le k \le n$. This particular choice leads to a required number of measurements of the order

$$m \ge C \cdot s \cdot n \ln(n/\varepsilon),$$

which corresponds to fully sampling the acquisition space several times. Let us now see what conclusion can be drawn with Theorem 3.3.

Corollary 4.2.
Let $S \subseteq \{1, \dots, n\}$ be of cardinality $s$, and suppose that $x \in \mathbb{C}^n$ is an $S$-sparse vector. Fix $\varepsilon \in (0, 1)$. Suppose that the sampling matrix $A$ is constructed as in (11), with $A_0 = \mathrm{Id}$. Set $\pi_k = \frac{\delta_{k,S}}{s}$ for $1 \le k \le n$, where $\delta_{k,S} = 1$ if $k \in S$ and $0$ otherwise. Suppose that

$$m \ge C \cdot s \cdot \ln(s) \ln\left(\frac{n}{\varepsilon}\right).$$

Then $x$ is the unique solution of (1) with probability at least $1 - \varepsilon$.

With this new result, $O(s \ln(s) \ln(n))$ measurements are sufficient to reconstruct the signal, even though the sensing is performed with a totally coherent transform. The least amount of measurements necessary to recover $x$ is of the order $O(s \ln(s))$, by a coupon collector argument [Fel08, p. 262]. Therefore, Corollary 4.2 is near-optimal up to logarithmic factors.
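Before turning to the proof, a quick simulation (ours, not from the paper) illustrates the coupon collector effect behind this near-optimality claim: drawing indices i.i.d. and uniformly from $S$, roughly $s \ln(s)$ draws are needed before the whole support has been observed at least once.

```python
import numpy as np

rng = np.random.default_rng(3)
n, s = 1024, 16
S = rng.choice(n, size=s, replace=False)

# Draw indices i.i.d. from pi = uniform on S (as in Corollary 4.2) and
# count the draws needed before every index of S has been seen.
draws, seen = 0, set()
while len(seen) < s:
    seen.add(int(rng.choice(S)))
    draws += 1
print(draws, "draws for s =", s, "; s ln s ~", round(s * np.log(s)))
```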
Proof. The result ensues from a direct evaluation of $\Gamma$. Indeed,

$$\left\| e_k e_{k,S}^* \right\|_{\infty \to \infty} = \max_{1 \le i \le n} \sup_{\|v\|_\infty \le 1} \left| \langle e_i, e_k e_{k,S}^* v \rangle \right| = \sup_{\|v\|_\infty \le 1} \left| e_{k,S}^* v \right| = \delta_{k,S},$$

where $\delta_{k,S} = 1$ if $k \in S$, and $0$ otherwise. Therefore

$$\Theta = \max_{1 \le k \le n} \frac{\delta_{k,S}}{\pi_k}.$$

Then, we can write that

$$\Upsilon(S, \pi) = \max_{1 \le i \le n} \sup_{\|v\|_\infty \le 1} \sum_{k=1}^n \frac{1}{\pi_k} \left| e_i^* e_k e_{k,S}^* v \right|^2 = \max_{1 \le i \le n} \sup_{\|v\|_\infty \le 1} \frac{\left| e_{i,S}^* v \right|^2}{\pi_i} = \max_{1 \le i \le n} \frac{\delta_{i,S}}{\pi_i}.$$

With the choice $\pi_k = \delta_{k,S}/s$, this gives $\Theta = \Upsilon = \Gamma(S, \pi) = s$. To conclude the proof, it suffices to apply Theorem 3.3. ∎

4.2.2 Isolated measurements when the degree of sparsity is structured by levels

In this part, we consider a partition of $\{1, \dots, n\}$ into levels $(\Omega_i)_{i = 1, \dots, N}$, such that

$$\bigsqcup_{1 \le i \le N} \Omega_i = \{1, \dots, n\} \quad \text{and} \quad |\Omega_i| = N_i.$$

We consider that $x$ is $S$-sparse with $|S \cap \Omega_i| = s_i$ for $1 \le i \le N$, meaning that, restricted to the level $\Omega_i$, the signal $P_{\Omega_i} x$ is $s_i$-sparse. This setting is studied extensively in the recent papers [AHPR13, RHA14, BH14]. Theorem 3.3 provides the following guarantees.

Corollary 4.3.
Let $S \subset \{1, \dots, n\}$ be a set of indices of cardinality $s$, such that $|S \cap \Omega_i| = s_i$ for $1 \le i \le N$. Suppose that $x \in \mathbb{C}^n$ is an $S$-sparse vector. Fix $\varepsilon \in (0, 1)$. Suppose that the sampling matrix $A$ is constructed as in (11). If

$$m \ge C \left( \max_{1 \le k \le n} \frac{\sum_{\ell=1}^N s_\ell\, \|a_{k, \Omega_\ell}\|_\infty \|a_k\|_\infty}{\pi_k} \right) \ln(s) \ln\left(\frac{n}{\varepsilon}\right) \quad (13)$$

and

$$m \ge C \left( \max_{1 \le i \le n} \sup_{\|v\|_\infty \le 1} \sum_{k=1}^n \frac{1}{\pi_k} \left| e_i^* a_k \right|^2 \left| a_{k,S}^* v \right|^2 \right) \ln(s) \ln\left(\frac{n}{\varepsilon}\right), \quad (14)$$

then $x$ is the unique solution of (1) with probability at least $1 - \varepsilon$.

The proof of Corollary 4.3 is given in Appendix D.2.1. We show in Appendix D.2.2 that a simple analysis leads to results that are nearly equivalent to those in [AHPR13]. It should be noted that the term $\|a_{k, \Omega_\ell}\|_\infty \|a_k\|_\infty / \pi_k$ is related to the notion of local coherence defined in [AHPR13]. There are however a few differences, making our approach potentially more interesting in the case of isolated measurements:

• Our paper is based on i.i.d. sampling with an arbitrary drawing distribution. This leaves a lot of freedom for generating sampling patterns and for optimizing the probability $\pi$ in order to minimize the upper bounds (13) and (14). In contrast, the results in [AHPR13] are based on uniform Bernoulli sampling over fixed levels. The dependency on the levels is not explicit there, and it therefore seems complicated to optimize them.

• We can deal with a fixed support $S$, which enlarges the possibilities for structured sparsity. It is also possible to consider random supports, as explained in Proposition 3.6.

The bounds in Corollary 4.3 are rather cryptic, and they have to be analyzed separately for each sampling strategy. To conclude the discussion on isolated measurements, we provide a practical example with the 1D Fourier-Haar system. We set $A_0 = F \phi^*$, where $F \in \mathbb{C}^{n \times n}$ is the 1D Fourier transform and $\phi^* \in \mathbb{C}^{n \times n}$ is the 1D inverse wavelet transform. To simplify the notation, we assume that $n = 2^J$ and we decompose the signal at the maximum wavelet level. In order to state our result, we introduce a dyadic partition $(\Omega_j)_{0 \le j \le J}$ of the set $\{1, \dots, n\}$: we set $\Omega_0 = \{1\}$, $\Omega_1 = \{2\}$, $\Omega_2 = \{3, 4\}, \dots, \Omega_J = \{n/2 + 1, \dots, n\}$. We also define the function $j : \{1, \dots, n\} \to \{0, \dots, J\}$ by $j(u) = j$ if $u \in \Omega_j$.

Corollary 4.4.
Let $S \subset \{1, \dots, n\}$ be a set of indices of cardinality $s$, such that $|S \cap \Omega_j| = s_j$ for $0 \le j \le J$. Suppose that $x \in \mathbb{C}^n$ is an $s$-sparse vector supported on $S$. Fix $\varepsilon \in (0, 1)$. Suppose that $A$ is constructed from the Fourier-Haar transform $A_0$. Choose $\pi_k$ to be constant by level, i.e. $\pi_k = \tilde{\pi}_{j(k)}$. If

$$m \ge C \cdot \max_{0 \le j \le J} \frac{2^{-j}}{\tilde{\pi}_j} \sum_{p=0}^J 2^{-|j - p|/2} s_p \cdot \ln(s) \ln\left(\frac{n}{\varepsilon}\right), \quad (15)$$

then $x$ is the unique solution of (1) with probability at least $1 - \varepsilon$. In particular, the distribution minimizing (15) is

$$\tilde{\pi}_j = \frac{2^{-j} \sum_{p=0}^J 2^{-|j - p|/2} s_p}{\sum_{\ell=1}^n 2^{-j(\ell)} \sum_{p=0}^J 2^{-|j(\ell) - p|/2} s_p},$$

which leads to

$$m \ge C \cdot \sum_{j=0}^J \left( s_j + \sum_{\substack{p=0 \\ p \ne j}}^J 2^{-|j - p|/2} s_p \right) \cdot \ln(s) \ln\left(\frac{n}{\varepsilon}\right). \quad (16)$$

The proof is presented in Section D.3. This corollary is once again similar to the results in [AHR14b]: the number of measurements in each level $j$ should depend on the degree of sparsity $s_j$, but also on the degrees of sparsity of the other levels, whose influence is more and more attenuated as these levels get far away from the $j$-th one.
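The level-constant density of Corollary 4.4 is easy to evaluate numerically. The sketch below (our own transcription of (15)-(16), with a hypothetical helper name and the level sizes $|\Omega_0| = 1$, $|\Omega_j| = 2^{j-1}$ for $j \ge 1$) computes $\tilde{\pi}_j$ from the per-level sparsities $(s_j)$.

```python
import numpy as np

def haar_level_density(s_levels):
    """pi_tilde_j proportional to 2^{-j} * sum_p 2^{-|j-p|/2} s_p,
    normalized over all n coefficients (cf. Corollary 4.4)."""
    s = np.asarray(s_levels, dtype=float)
    J = len(s) - 1
    jj = np.arange(J + 1)
    w = np.array([(2.0 ** (-np.abs(j - jj) / 2.0) * s).sum() for j in jj])
    sizes = np.array([1] + [2 ** (j - 1) for j in range(1, J + 1)])
    unnorm = 2.0 ** (-jj) * w
    return unnorm / (sizes * unnorm).sum()   # one value per level

# Example: sparsity over J+1 = 6 dyadic levels, so n = 2^5 = 32.
print(haar_level_density([1, 2, 4, 6, 4, 2]))
```

Each returned value is the probability of drawing any single index of the corresponding level, so coarse levels are sampled much more densely than fine ones, as expected.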
Remark 4.5. The Fourier-Wavelet system is coherent, and the initial compressed sensing theories cannot explain the success of sampling strategies based on such a transform. To overcome the coherence, two strategies have been devised. The first one is based on variable density sampling (see e.g. [PMG+12, CCKW14, KW14]). The second one is based on variable density sampling together with an additional structured sparsity assumption (see e.g. [AHPR13] and Corollary 4.4). First, note that the results obtained with the latter approach allow recovering signals with arbitrary supports. Indeed,

$$\sum_{j=0}^J \left( s_j + \sum_{\substack{p=0 \\ p \ne j}}^J 2^{-|j - p|/2} s_p \right) \lesssim s.$$

Second, it is not yet clear, from a theoretical point of view, that the structure assumption allows obtaining better guarantees. Indeed, it is possible to show that variable density sampling alone leads to perfect reconstruction from $m \propto s \ln(n)$ measurements, which is on par with bound (16). It will become clear that structured sparsity is essential when using the Fourier-Wavelet systems with structured acquisition. Moreover, the numerical experiments led in [AHR14b] leave no doubt about the fact that structured sparsity is essential to ensure good reconstruction with a low number of measurements.

4.3 Structured acquisition

In this section, we illustrate how Theorem 3.3 explains the practical success of structured acquisition in applications. We will mainly focus on the 2D setting: the vector $x \in \mathbb{C}^n$ to reconstruct can be seen as an image of size $\sqrt{n} \times \sqrt{n}$.

4.3.1 Structured acquisition without structured sparsity

In [BBW14, PDG15], the authors provided theoretical CS results when using block-constrained acquisitions. Moreover, the results in [BBW14] are proved to be tight in many practical situations. Unfortunately, the bounds on the number of blocks of measurements necessary for perfect reconstruction are incompatible with a faster acquisition. To illustrate this fact, let us recall a typical result emanating from [BBW14]. It shows that the recovery of sparse vectors with an arbitrary support is of little interest when sampling lines of tensor product transforms. This setting is widely used in imaging; it corresponds to the MRI sampling strategy proposed in [LDP07].
Proposition 4.6 ([BBW14]). Suppose that $A_0 = \phi \otimes \phi \in \mathbb{C}^{n \times n}$ is a 2D separable transform, where $\phi \in \mathbb{C}^{\sqrt{n} \times \sqrt{n}}$ is an orthogonal transform. Consider blocks of measurements made of the $\sqrt{n}$ horizontal lines of the 2D acquisition space, i.e., for $1 \le k \le \sqrt{n}$,

$$B_k = \left( \phi_{k,1} \phi, \dots, \phi_{k,\sqrt{n}} \phi \right).$$

If the number of acquired lines $m$ is less than $\min(2s, \sqrt{n})$, then there exists no decoder $\Delta$ such that $\Delta(Ax) = x$ for all $s$-sparse vectors $x \in \mathbb{C}^n$. In other words, the minimal number $m$ of distinct blocks required to identify every $s$-sparse vector is necessarily larger than $\min(2s, \sqrt{n})$.

This theoretical bound is quite surprising: it seems to enter in contradiction with the practical results obtained in Figure 2, or with one of the most standard CS strategies in MRI [LDP07]. Indeed, the equivalent number of isolated measurements required by Proposition 4.6 is of the order $O(s \sqrt{n})$. This theoretical result means that, in many applications, a full sampling strategy should be adopted when the acquisition is structured by horizontal lines. In the next paragraphs, we show how Theorem 3.3 allows bridging the gap between theoretical recovery and practical experiments.

4.3.2 Structured acquisition with structured sparsity: a simple example

In this paragraph, we illustrate, through a simple example, that additional assumptions of structured sparsity are the key to explain practical results.
Corollary 4.7.
Let $A_0 \in \mathbb{C}^{n \times n}$ be the 2D Fourier transform. Assume that $x$ is a 2D signal with support $S$ concentrated on $q$ horizontal lines of the spatial plane, i.e.

$$S \subset \bigcup_{j \in J} \left\{ (j - 1)\sqrt{n} + 1, \dots, (j - 1)\sqrt{n} + \sqrt{n} \right\}, \quad (17)$$

where $J \subset \{1, \dots, \sqrt{n}\}$ and $|J| = q$. Choose a uniform sampling strategy among the $\sqrt{n}$ horizontal lines, i.e. $\pi_k^\star = 1/\sqrt{n}$ for $1 \le k \le \sqrt{n}$. Then a number $m$ of sampled horizontal lines such that

$$m \ge C \cdot q \cdot \ln(s) \ln\left(\frac{n}{\varepsilon}\right)$$

is sufficient to reconstruct $x$ with probability $1 - \varepsilon$.

The proof is given in Appendix D.4. By Corollary 4.7, we observe that the required number of sampled lines is of the order of the number of non-zero lines in the 2D signal. In comparison, Proposition 4.6 in [BBW14] (with no structured sparsity) requires $m \gtrsim s \cdot \ln(n/\varepsilon)$ measurements to get the same guarantees. This means that, with structured sparsity, the required number of horizontal lines to sample is of the order of the number of lines carrying the non-zero coefficients. Putting aside the logarithmic factors, we see that the gain offered by our new approach is considerable. Clearly, our strategy is able to take advantage of the sparsity structure of the signal of interest.

4.3.3 Structured acquisition in MRI

We now turn to a real MRI application. We assume that the sensing matrix $A_0 \in \mathbb{C}^{n \times n}$ is the product of the 2D Fourier transform $F^{2D}$ with the inverse 2D wavelet transform $\Phi^*$. We aim at reconstructing a vector $x \in \mathbb{C}^n$ that can be seen as a 2D wavelet transform with $\sqrt{n} \times \sqrt{n}$ coefficients. Set $J = \log_2(\sqrt{n})$ and let $(\tau_j)_{0 \le j \le J}$ denote a dyadic partition of the set $\{1, \dots, \sqrt{n}\}$, i.e. $\tau_0 = \{1\}$, $\tau_1 = \{2\}$, $\tau_2 = \{3, 4\}, \dots, \tau_J = \{\sqrt{n}/2 + 1, \dots, \sqrt{n}\}$. Define $j : \{1, \dots, \sqrt{n}\} \to \{0, \dots, J\}$ by $j(u) = j$ if $u \in \tau_j$. Finally, define the sets $\Omega_{\ell, \ell'} = \tau_\ell \times \tau_{\ell'}$, for $0 \le \ell, \ell' \le J$. See Figure 3 for an illustration of these sets.
Definition 4.8. Given $S = \mathrm{supp}(x)$, define the following quantities:

$$s_\ell^c := \max_{0 \le \ell' \le J}\ \max_{k \in \tau_{\ell'}} \left| S \cap \Omega_{\ell, \ell'} \cap C_k \right|, \quad (18)$$

where $C_k$ represents the set of indices corresponding to the $k$-th vertical line (see Figure 3).

Figure 3: The vector $x \in \mathbb{C}^n$ to reconstruct, reshaped as a $\sqrt{n} \times \sqrt{n}$ matrix; $C_k$ represents the coefficient indices corresponding to the $k$-th vertical column.

The quantity $s_\ell^c$ represents the maximal sparsity of $x$ restricted to the columns (or vertical lines) of $\cup_{0 \le \ell' \le J}\, \Omega_{\ell, \ell'}$. We have now settled everything to state our results. As a first step, we will consider the case of Shannon's wavelets, leading to a block-diagonal sampling matrix $A_0$.
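For concreteness, the quantities $s_\ell^c$ of Definition 4.8 can be computed from a support mask as follows (our own sketch, with 0-based indexing and hypothetical names): within each horizontal band of rows $\tau_\ell$, one counts the support entries of every column and keeps the largest count.

```python
import numpy as np

def s_col(mask):
    """s^c_l of (18): mask is a boolean sqrt(n) x sqrt(n) support array;
    s^c_l is the largest number of support entries any single column C_k
    contains within the band of rows tau_l."""
    p = mask.shape[0]                  # p = sqrt(n) = 2^J
    J = int(np.log2(p))
    # Dyadic bands tau_0 = {0}, tau_1 = {1}, tau_l = {2^{l-1},...,2^l - 1}.
    bands = [range(1)] + [range(2 ** max(l - 1, 0), 2 ** l)
                          for l in range(1, J + 1)]
    return [int(mask[list(b), :].sum(axis=0).max()) for b in bands]

# Example: a support concentrated on two full vertical lines.
mask = np.zeros((16, 16), dtype=bool)
mask[:, [2, 9]] = True
print(s_col(mask))   # s^c grows like the band sizes: [1, 1, 2, 4, 8]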
Corollary 4.9. Let $S \subset \{1, \dots, n\}$ be a set of indices of cardinality $s$. Suppose that $x \in \mathbb{C}^n$ is an $s$-sparse vector supported on $S$. Fix $\varepsilon \in (0, 1)$. Suppose that $A_0$ is the product of the 2D Fourier transform with the 2D inverse Shannon wavelet transform. Consider that the blocks of measurements are the $\sqrt{n}$ horizontal lines of the 2D setting. Choose $(\pi_k)_{1 \le k \le \sqrt{n}}$ to be constant by level, i.e. $\pi_k = \tilde{\pi}_{j(k)}$. If the number of horizontal lines to acquire satisfies

$$m \gtrsim \max_{0 \le j \le J} \frac{2^{-j} s_j^c}{\tilde{\pi}_j}\ \ln(s) \ln\left(\frac{n}{\varepsilon}\right),$$

then $x$ is the unique solution of Problem (1). Furthermore, choosing

$$\tilde{\pi}_j = \frac{2^{-j} s_j^c}{\sum_{\ell=0}^J s_\ell^c}, \quad \text{for } 0 \le j \le J,$$

leads to the following upper bound:

$$m \gtrsim \sum_{j=0}^J s_j^c\ \ln(s) \ln\left(\frac{n}{\varepsilon}\right).$$

The proof is given in Section D.5. Corollary 4.9 shows that the number of lines acquired at level $j$ depends only on an extra-column structure of $S$. Now let us turn to a case where the matrix $A_0$ is not block-diagonal anymore.

Corollary 4.10.
Suppose that $x \in \mathbb{C}^n$ is an $S$-sparse vector. Fix $\varepsilon \in (0, 1)$. Suppose that $A_0$ is the product of the 2D Fourier transform with the 2D inverse Haar transform. Consider that the blocks of measurements are the $\sqrt{n}$ horizontal lines. Choose $(\pi_k)_{1 \le k \le \sqrt{n}}$ to be constant by level, i.e. $\pi_k = \tilde{\pi}_{j(k)}$. If the number $m$ of drawn horizontal lines satisfies

$$m \gtrsim \max_{0 \le j \le J} \frac{2^{-j}}{\tilde{\pi}_j} \sum_{r=0}^J 2^{-|j - r|/2} s_r^c\ \ln(s) \ln\left(\frac{n}{\varepsilon}\right),$$

then $x$ is the unique solution of Problem (1) with probability $1 - \varepsilon$. In particular, if

$$\pi_k = \frac{2^{-j(k)} \sum_{r=0}^J 2^{-|j(k) - r|/2} s_r^c}{\sum_{\ell=1}^{\sqrt{n}} 2^{-j(\ell)} \sum_{r=0}^J 2^{-|j(\ell) - r|/2} s_r^c},$$

then

$$m \gtrsim \sum_{j=0}^J \left( s_j^c + \sum_{\substack{r=0 \\ r \ne j}}^J 2^{-|j - r|/2} s_r^c \right) \cdot \ln(s) \ln\left(\frac{n}{\varepsilon}\right)$$

ensures perfect reconstruction with probability $1 - \varepsilon$.

The proof of Corollary 4.10 is given in Section D.6. This result indicates that the number of acquired lines in the "horizontal" level $j$ should be chosen depending on the quantities $s_j^c$. Note that this is very different from the sparsity by levels proposed in [AHPR13]. In conclusion, Corollary 4.10 reveals that, with a structured acquisition, the sparsity needs to be more structured in order to guarantee exact recovery. To the best of our knowledge, this is the first theoretical result which can explain why sampling lines in MRI, as in [LDP07], might work.

In Figure 4, we illustrate that the results of Corollary 4.10 seem to indeed correspond to the practical reality. In this experiment, we seek to reconstruct a reeds image from block-structured measurements. As test images, we chose a reeds image with vertical stripes and its rotated version. This particular geometrical structure explains why the quantities $s_j^c$ are much higher for the image with horizontal stripes than for the one with vertical stripes. As can be seen, the image with low $s_j^c$ is much better reconstructed than the one with high $s_j^c$, as predicted by our theory.

Figure 4: Reconstruction of a reeds image and of its rotated version from the same block-structured sampling scheme (shown at the top of the figure), corresponding to 9.8% of measurements. (a) and (c): original images; the values of the vector $s^c = (s_j^c)_{0 \le j \le J}$ are given for both images, and they are larger in the case of image (c). (b) and (d): corresponding reconstructions via $\ell_1$-minimization, with SNR = 27.8 dB and SNR = 14.7 dB respectively. The image in (d) has been rotated back to facilitate the comparison. Note that (b) is much better reconstructed than (d), as predicted by Corollary 4.10.
5 Extensions

We have analyzed the combination of structured acquisition and structured sparsity with i.i.d. drawings of random blocks. These results can be extended to a Bernoulli sampling setting. In such a setting, the sensing matrix is constructed as follows:

$$A = \left( \frac{\delta_k}{\sqrt{\pi_k}}\, B_k \right)_{1 \le k \le M},$$

where $(\delta_k)_{1 \le k \le M}$ are independent Bernoulli random variables such that $\mathbb{P}(\delta_k = 1) = \pi_k$, for $1 \le k \le M$. We may set $\sum_{k=1}^M \pi_k = m$ in order to measure $m$ blocks of measurements in expectation. By considering the same definition of $\Gamma(S, \pi)$ with $(\pi_k)_{1 \le k \le M}$ the Bernoulli weights, it is possible, in the case of Bernoulli block sampling, to give a reconstruction result with a similar flavor to Theorem 3.3.

The results in Section 4.3.3 lead to the conclusion that exact recovery with structured acquisition can only occur if the signal to reconstruct possesses an adequate sparsity pattern. We believe that the proposed theorems might help designing new efficient and feasible sampling schemes. Ideally, this could be done by optimizing $\Gamma(S, \pi)$, assuming that $S$ belongs to some set of realistic signals. Unfortunately, this optimization seems unrealistic to perform numerically, owing to the huge dimensions of the objects involved. We therefore leave this question open for future works.

However, probing the limits of a given system, as was proposed in Corollary 4.10, helps designing better sampling schemes. To illustrate this fact, we performed a simple experiment. Since the quantity $s_j^c$ is critical to characterize the efficiency of a sampler based on horizontal lines, it is likely that mixing horizontal and vertical sampling lines improves the situation. We aim at reconstructing the MR image shown in Figure 5 and assume that it is sparse in the wavelet basis. In Figure 5 (a) and (d), we propose two different sampling schemes. The first one is based solely on parallel lines in the horizontal direction, while the second one is based on a combination of vertical and horizontal lines. The combination of vertical and horizontal lines provides much better reconstruction results, despite a lower total number of measurements.

Figure 5: An example of MRI reconstruction of a 2048 × 2048 image. (a) and (d): sampling schemes. (b) and (e): corresponding reconstructions via $\ell_1$-minimization, with SNR = 24.47 dB and SNR = 26.74 dB respectively. (c) and (f): zooms on the reconstructed images. Note that the horizontal and vertical sampling scheme produces much better reconstruction results, despite a smaller number of measurements since samples are overlapping. Moreover, the acquisition time would be exactly the same for an MRI scanner.

Acknowledgements
The authors would like to thank Ben Adcock and Anders Hansen for their availability for discussion. They are also grateful to Nicolas Chauffert for discussions. This work was partially supported by the CIMI (Centre International de Mathématiques et d'Informatique) Excellence program.
A Proofs of the main results
A.1 Proof of Theorem 3.3
In this section, we give sufficient conditions guaranteeing that the vector $x$ is the unique minimizer of (1), using an inexact dual certificate, see [CP11].

Lemma A.1 (Inexact duality [CP11]). Suppose that $x \in \mathbb{R}^n$ is supported on $S \subset \{1, \dots, n\}$. Assume that $A_S$ is full column rank and that

$$\left\| (A_S^* A_S)^{-1} \right\|_{2 \to 2} \le 2 \quad \text{and} \quad \max_{i \in S^c} \|A_S^* A e_i\|_2 \le 1, \quad (19)$$

where $(A_S^* A_S)^{-1}$ only makes sense on the set $\mathrm{span}\{e_i,\ i \in S\}$. Moreover, suppose that there exists $v \in \mathbb{R}^n$ in the row space of $A$ obeying

$$\|v_S - \mathrm{sign}(x_S)\|_2 \le 1/4 \quad \text{and} \quad \|v_{S^c}\|_\infty \le 1/4. \quad (20)$$

Then the vector $x$ is the unique solution of the minimization problem (1).

First, let us focus on Conditions (19). Remark that $A_S^* A_S$ is invertible by assuming that $A_S$ is full column rank. Moreover,

$$\left\| (A_S^* A_S)^{-1} \right\|_{2 \to 2} = \left\| \sum_{k=0}^\infty (P_S - A_S^* A_S)^k \right\|_{2 \to 2} \le \sum_{k=0}^\infty \left\| A_S^* A_S - P_S \right\|_{2 \to 2}^k.$$

Therefore, if $\|A_S^* A_S - P_S\|_{2 \to 2} \le 1/2$, then $\|(A_S^* A_S)^{-1}\|_{2 \to 2} \le 2$. Moreover, by Lemma C.1, $\|A_S^* A_S - P_S\|_{2 \to 2} \le 1/2$ with probability larger than $1 - \varepsilon$, provided that

$$m \ge \frac{28}{3}\, \Theta(S, \pi) \ln\left(\frac{s}{\varepsilon}\right).$$

By definition of $\Gamma(S, \pi)$, the first inequality of Conditions (19) is therefore ensured with probability larger than $1 - \varepsilon$ if

$$m \ge \frac{28}{3}\, \Gamma(S, \pi) \ln\left(\frac{s}{\varepsilon}\right). \quad (21)$$

Furthermore, using Lemma C.5, we obtain that $\max_{i \in S^c} \|A_S^* A e_i\|_2 \le 1$ with probability larger than $1 - \varepsilon$ if

$$m \ge \Theta(S, \pi) \left( \sqrt{\ln\left(\frac{n}{\varepsilon}\right)} + 4 \ln\left(\frac{n}{\varepsilon}\right) \right).$$

Again, by definition of $\Gamma(S, \pi)$, the second part of Conditions (19) is ensured if

$$m \ge C_1\, \Gamma(S, \pi) \ln\left(\frac{n}{\varepsilon}\right), \quad (22)$$

for a suitable numerical constant $C_1$. Conditions (20) remain to be verified. The rest of the proof of Theorem 3.3 relies on the construction of a vector $v$ satisfying the conditions described in Lemma A.1 with high probability. To do so, we adapt the so-called golfing scheme introduced by Gross [Gro11] to our setting. More precisely, we iteratively construct a vector that converges to a vector $v$ satisfying (20) with high probability.

Let us first partition the sensing matrix $A$ into groups of blocks: from now on, we denote by $A^{(1)}$ the first $m_1$ blocks of $A$, by $A^{(2)}$ the next $m_2$ blocks, and so on. The $L$ random matrices $\{A^{(\ell)}\}_{\ell = 1, \dots, L}$ are independently distributed, and we have $m = m_1 + m_2 + \dots + m_L$. As explained before, $A_S^{(\ell)}$ denotes the matrix $A^{(\ell)} P_S$. The golfing scheme starts by defining $v^{(0)} = 0$, and then iteratively defines

$$v^{(\ell)} = \frac{m}{m_\ell} A^{(\ell)*} A_S^{(\ell)} \left( \mathrm{sign}(x) - v_S^{(\ell - 1)} \right) + v^{(\ell - 1)}, \quad (23)$$

for $\ell = 1, \dots, L$, where $\mathrm{sign}(x_i) = 0$ if $x_i = 0$. In the rest of the proof, we set $v = v^{(L)}$. By construction, $v$ is in the row space of $A$. The main idea of the golfing scheme is then to combine the results of the various lemmas of Section C with an appropriate choice of $L$, to show that the random vector $v$ satisfies the assumptions of Lemma A.1 with large probability. Using the shorthand notation $v_S^{(\ell)} = P_S v^{(\ell)}$, let us define

$$w^{(\ell)} = \mathrm{sign}(x) - v_S^{(\ell)}, \quad \ell = 1, \dots, L,$$

where $x \in \mathbb{C}^n$ is the solution of Problem (1). From the definition of $v^{(\ell)}$, it follows that, for any $1 \le \ell \le L$,

$$w^{(\ell)} = \left( P_S - \frac{m}{m_\ell} A_S^{(\ell)*} A_S^{(\ell)} \right) w^{(\ell - 1)} = \prod_{j=1}^{\ell} \left( P_S - \frac{m}{m_j} A_S^{(j)*} A_S^{(j)} \right) \mathrm{sign}(x), \quad (24)$$

and

$$v = \sum_{\ell=1}^L \frac{m}{m_\ell} A^{(\ell)*} A_S^{(\ell)} w^{(\ell - 1)}. \quad (25)$$

Note that, in particular, $w^{(0)} = \mathrm{sign}(x)$ and $w^{(L)} = \mathrm{sign}(x) - v_S$. In what follows, it will be shown that the matrices $P_S - \frac{m}{m_\ell} A_S^{(\ell)*} A_S^{(\ell)}$ are contractions, so that the norm of the vector $w^{(\ell)}$ decreases geometrically fast with $\ell$. Therefore, $v_S^{(\ell)}$ becomes close to $\mathrm{sign}(x_S)$ as $\ell$ tends to $L$. In particular, we will prove that $\|w^{(L)}\|_2 \le 1/4$ for an appropriate choice of $L$. In addition, we will also show that $v$ satisfies the condition $\|v_{S^c}\|_\infty \le 1/4$. All these conditions will be shown to be satisfied with a large probability (depending on $\varepsilon$).

For all $1 \le \ell \le L$, assume that

$$\left\| w^{(\ell)} \right\|_2 \le r_\ell \left\| w^{(\ell - 1)} \right\|_2, \quad (C1\text{-}\ell)$$

$$\left\| \frac{m}{m_\ell} \left( A_{S^c}^{(\ell)} \right)^* A_S^{(\ell)} w^{(\ell - 1)} \right\|_\infty \le t_\ell \left\| w^{(\ell - 1)} \right\|_\infty, \quad (C2\text{-}\ell)$$

$$\left\| \left( \frac{m}{m_\ell} \left( A_S^{(\ell)} \right)^* A_S^{(\ell)} - P_S \right) w^{(\ell - 1)} \right\|_\infty \le t'_\ell \left\| w^{(\ell - 1)} \right\|_\infty, \quad (C3\text{-}\ell)$$

with

(i) $L = 2 + \left\lceil \frac{\ln(s)}{2 \ln 2} \right\rceil$,
(ii) $r_\ell = 1/2$, for $\ell = 1, \dots, L$,
(iii) $t_\ell = t'_\ell = 1/5$, for $\ell = 1, \dots, L$.

Note that, using (C1-$\ell$), we can write

$$\left\| \mathrm{sign}(x_S) - v_S \right\|_2 = \left\| w_S^{(L)} \right\|_2 \le \left\| \mathrm{sign}(x_S) \right\|_2 \prod_{\ell=1}^L r_\ell \le \sqrt{s}\, \prod_{\ell=1}^L r_\ell \le \sqrt{s}\, 2^{-L} \le \frac{1}{4}, \quad (26)$$

where the last inequality follows from the previously specified choice of $L$. Furthermore, Equations (C2-$\ell$) and (C3-$\ell$) imply that

$$\|v_{S^c}\|_\infty = \left\| \sum_{\ell=1}^L \frac{m}{m_\ell} \left( A_{S^c}^{(\ell)} \right)^* A_S^{(\ell)} w^{(\ell - 1)} \right\|_\infty \le \sum_{\ell=1}^L t_\ell \left\| w^{(\ell - 1)} \right\|_\infty \le \sum_{\ell=1}^L t_\ell \prod_{j=1}^{\ell - 1} t'_j = \frac{1}{5} \cdot \frac{1 - (1/5)^L}{1 - 1/5} \le \frac{1}{4}. \quad (27)$$

Note that, in Inequality (27), the control of the operator norms $\infty \to \infty$ avoids the apparition of a $\sqrt{s}$ factor, contrary to the usual golfing scheme of [CP11]. Indeed, in our proof strategy, we have used the fact that $\|w^{(0)}\|_\infty = \|\mathrm{sign}(x_S)\|_\infty = 1$, whereas in [CP11] the quantity $\|w^{(0)}\|_2 = \|\mathrm{sign}(x_S)\|_2 \le \sqrt{s}$ is involved. This is a key step in the proof, since the absence of the degree of sparsity at this stage allows deriving results depending only on $S$, and not on its cardinality $s = |S|$.

We denote by $p_1(\ell)$, $p_2(\ell)$ and $p_3(\ell)$ the probabilities that the upper bounds (C1-$\ell$), (C2-$\ell$) and (C3-$\ell$) do not hold. Let us call "failure C" the event in which one of the $3L$ inequalities (C1-$\ell$), (C2-$\ell$), (C3-$\ell$) is not satisfied. Then

$$\mathbb{P}(\text{failure C}) \le \sum_{\ell=1}^L \mathbb{P}(\text{failure (C1-}\ell\text{)}) + \mathbb{P}(\text{failure (C2-}\ell\text{)}) + \mathbb{P}(\text{failure (C3-}\ell\text{)}).$$

Therefore, a sufficient condition for $\mathbb{P}(\text{failure C}) \le \varepsilon$ is $\sum_{\ell=1}^L p_1(\ell) + p_2(\ell) + p_3(\ell) \le \varepsilon$, which holds provided that $p_1(\ell) \le \varepsilon/(3L)$, $p_2(\ell) \le \varepsilon/(3L)$ and $p_3(\ell) \le \varepsilon/(3L)$ for every $\ell = 1, \dots, L$. By Lemma C.2, condition $p_1(\ell) \le \varepsilon/(3L)$ is satisfied if

$$m_\ell \ge 32\, \Theta(S, \pi) \left( \ln\left(\frac{3L}{\varepsilon}\right) + \frac{1}{4} \right).$$

By Lemma C.3, condition $p_2(\ell) \le \varepsilon/(3L)$ is satisfied if

$$m_\ell \ge C_2\, \Gamma(S, \pi) \ln\left(\frac{3nL}{\varepsilon}\right),$$

and by Lemma C.4, condition $p_3(\ell) \le \varepsilon/(3L)$ is satisfied if

$$m_\ell \ge C_3\, \Gamma(S, \pi) \ln\left(\frac{3nL}{\varepsilon}\right),$$

for suitable numerical constants $C_2$ and $C_3$. Overall, the condition

$$m_\ell \ge C_4\, \Gamma(S, \pi) \ln\left(\frac{3nL}{\varepsilon}\right) \quad (28)$$

ensures that (26) and (27) are satisfied with probability $1 - \varepsilon$. The resulting condition

$$m = \sum_{\ell=1}^L m_\ell \ge \left( \frac{\ln(s)}{2 \ln 2} + 3 \right) C_4\, \Gamma(S, \pi) \ln\left(3 n L \varepsilon^{-1}\right)$$

is in turn implied by

$$m \ge C_0 \cdot \Gamma(S, \pi) \ln(64 s) \left( \ln\left(\frac{n}{\varepsilon}\right) + \ln \ln(64 s) \right), \quad (29)$$

since $L \le \ln(64 s) / \ln 4$. The latter condition ensures that the random vector $v$ defined by (25) satisfies Assumptions (20) of Lemma A.1 with probability larger than $1 - \varepsilon$. Hence, we have shown that, if Conditions (21), (22) and (29) are satisfied, then Assumptions (19) and (20) of Lemma A.1 simultaneously hold with probability larger than $1 - \varepsilon$. Note that bound (29) implies (21) and (22).
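For the interested reader, the golfing iteration (23) is short enough to transcribe numerically. The sketch below (our own illustration for real-valued data, with assumed names and conventions) builds the candidate certificate $v$ from a partition of the rows of $A$; Conditions (20) can then be checked directly on the output.

```python
import numpy as np

def golfing_certificate(A_parts, m_parts, m, sgn, S_mask):
    """Direct transcription of the golfing iteration (23).
    A_parts[l] holds the rows of A^{(l+1)}, m_parts[l] its number of
    blocks, sgn = sign(x) (supported on S), S_mask a boolean support
    mask of length n."""
    v = np.zeros_like(sgn)
    for Al, ml in zip(A_parts, m_parts):
        w = sgn - v * S_mask                 # w^{(l-1)} = sign(x) - v_S
        v = v + (m / ml) * (Al.T @ ((Al * S_mask) @ w))   # update (23)
    return v
    # Check ||v[S] - sgn[S]||_2 <= 1/4 and ||v[~S]||_inf <= 1/4.
```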
B Bernstein's inequalities

Theorem B.1 (Scalar Bernstein inequality). Let $x_1, \dots, x_m$ be independent real-valued, zero-mean random variables such that $|x_\ell| \le K$ almost surely for every $\ell \in \{1, \dots, m\}$. Assume that $\mathbb{E}|x_\ell|^2 \le \sigma_\ell^2$ for $\ell \in \{1, \dots, m\}$. Then, for all $t > 0$,

$$\mathbb{P}\left( \left| \sum_{\ell=1}^m x_\ell \right| \ge t \right) \le 2 \exp\left( - \frac{t^2/2}{\sigma^2 + K t/3} \right),$$

with $\sigma^2 \ge \sum_{\ell=1}^m \sigma_\ell^2$.

Theorem B.2 (Vector Bernstein inequality (V1), [CP11, Theorem 2.6]). Let $(y_k)_{1 \le k \le m}$ be a finite sequence of independent random complex vectors of dimension $n$. Suppose that $\mathbb{E} y_k = 0$ and $\|y_k\|_2 \le K$ a.s. for some constant $K > 0$, and set $\sigma^2 \ge \sum_k \mathbb{E} \|y_k\|_2^2$. Let $Z = \left\| \sum_{k=1}^m y_k \right\|_2$. Then, for any $0 < t \le \sigma^2 / K$, we have

$$\mathbb{P}(Z \ge t) \le \exp\left( - \frac{(t/\sigma - 1)^2}{4} \right) \le \exp\left( - \frac{t^2}{8 \sigma^2} + \frac{1}{4} \right).$$

Theorem B.3 (Bernstein inequality for self-adjoint matrices). Let $(Z_k)_{1 \le k \le n}$ be a finite sequence of independent, random, self-adjoint matrices of dimension $d$, and let $(A_k)$ be a sequence of fixed self-adjoint matrices. Suppose that each $Z_k$ satisfies $\mathbb{E} Z_k = 0$ and $\|Z_k\|_{2 \to 2} \le K$ a.s. for some constant $K > 0$ that is independent of $k$. Moreover, assume that $\mathbb{E} Z_k^2 \preceq A_k$ for each $1 \le k \le n$. Define

$$\sigma^2 = \left\| \sum_{k=1}^n A_k \right\|_{2 \to 2}.$$

Then, for any $t > 0$, we have

$$\mathbb{P}\left( \left\| \sum_{k=1}^n Z_k \right\|_{2 \to 2} \ge t \right) \le d \exp\left( - \frac{t^2/2}{\sigma^2 + K t/3} \right).$$

Proof.
This result is an application of the techniques developed in [Tro12] to obtain tail bounds for sums of random matrices. Our arguments follow those in the proof of Theorem 6.1 in [Tro12]. We assume that $K = 1$, since the general result follows by a scaling argument. Using the assumption that $\mathbb{E}Z_k^2 \preceq A_k$, and by applying the arguments in the proof of Lemma 6.7 in [Tro12], we obtain that
\[
\mathbb{E}\exp(\theta Z_k) \preceq \exp\big(g(\theta)A_k\big),
\]
for any real $\theta > 0$, where $g(\theta) = e^\theta - \theta - 1$, and the notation $\exp(A)$ denotes the matrix exponential of a self-adjoint matrix $A$ (see [Tro12] for further details). Therefore, by Corollary 3.7 in [Tro12], it follows that
\[
\mathbb{P}\Big(\Big\|\sum_{k=1}^n Z_k\Big\|_{2\to2} \ge t\Big) \le d\,\inf_{\theta>0}\Big\{e^{-\theta t + \sigma^2 g(\theta)}\Big\}, \tag{30}
\]
where $\sigma^2 = \big\|\sum_{k=1}^n A_k\big\|_{2\to2}$. To conclude, we follow the proof of Theorem 6.1 in [Tro12]. The function $\theta \mapsto -\theta t + \sigma^2 g(\theta)$ attains its minimum at $\theta = \ln(1 + t/\sigma^2)$, which implies that the minimal value of the right-hand side of Inequality (30) is $d\exp\big(-\sigma^2 h(t/\sigma^2)\big)$, where $h(u) = (1+u)\ln(1+u) - u$ for $u \ge 0$. To complete the proof, it suffices to use the standard lower bound $h(u) \ge \frac{u^2/2}{1 + u/3}$ for $u \ge 0$. □
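To illustrate Theorem B.3, the following Monte Carlo sketch compares the empirical tail of $\|\sum_k Z_k\|_{2\to2}$ with the Bernstein bound for the toy choice $Z_k = \epsilon_k A$, with $\epsilon_k$ Rademacher signs and $A$ a fixed self-adjoint matrix (a choice made purely for this illustration, for which $\mathbb{E}Z_k^2 = A^2$ exactly):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_mat, trials, t = 4, 200, 20000, 40.0

A = np.diag(np.arange(1, d + 1)) / d          # fixed self-adjoint matrix, ||A||_{2->2} = 1
K = 1.0                                       # a.s. bound on ||Z_k||_{2->2}
sigma2 = n_mat * np.linalg.norm(A @ A, 2)     # ||sum_k E Z_k^2||, since E Z_k^2 = A^2

# Z_k = eps_k * A with Rademacher eps_k, so ||sum_k Z_k||_{2->2} = |sum_k eps_k| * ||A||.
sums = rng.choice([-1.0, 1.0], size=(trials, n_mat)).sum(axis=1)
empirical = np.mean(np.abs(sums) >= t)
bound = d * np.exp(-(t ** 2 / 2) / (sigma2 + K * t / 3))
print(f"empirical tail {empirical:.4f} <= Bernstein bound {bound:.4f}")
```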
C Estimates: auxiliary results

Let $S$ be the support of the signal to be reconstructed, with $|S| = s$. We set
\[
\Lambda(S,\pi) := \max_{1\le k\le M}\frac{\|B_{k,S}^*B_{k,S}\|_{2\to2}}{\pi_k}.
\]
Note that
\[
\|B_{k,S}^*B_{k,S}\|_{2\to2} \le \|B_{k,S}^*B_{k,S}\|_{\infty\to\infty} \le \|B_k^*B_{k,S}\|_{\infty\to\infty};
\]
therefore, $\Lambda(S,\pi) \le \Theta(S,\pi)$. To make the notation less cluttered, we will write $\Lambda$, $\Theta$, $\Upsilon$ and $\Gamma$ instead of $\Lambda(S,\pi)$, $\Theta(S,\pi)$, $\Upsilon(S,\pi)$ and $\Gamma(S,\pi)$.

Lemma C.1.
Let $S\subset\{1,\dots,n\}$ be of cardinality $s$. Suppose that $\Theta \ge 1$. Then, for any $\delta > 0$, one has that
\[
\mathbb{P}\big(\|A_S^*A_S - P_S\|_{2\to2} \ge \delta\big) \le s\exp\Big(-\frac{m\delta^2/2}{\Theta(1+\delta/3)}\Big). \tag{E1}
\]

Proof.
We decompose the matrix $A_S^*A_S - P_S$ as
\[
A_S^*A_S - P_S = \frac1m\sum_{k=1}^m\Big(\frac{B_{J_k,S}^*B_{J_k,S}}{\pi_{J_k}} - P_S\Big) = \frac1m\sum_{k=1}^mX_k, \quad\text{where } X_k := \frac{B_{J_k,S}^*B_{J_k,S}}{\pi_{J_k}} - P_S.
\]
It is clear that $\mathbb{E}X_k = 0$, and since for all $1\le k\le M$, $\|B_{k,S}^*B_{k,S}\|_{2\to2}/\pi_k \le \Lambda \le \Theta$, we have that
\[
\|X_k\|_{2\to2} \le \max\Big(\max_{1\le j\le M}\frac{\|B_{j,S}^*B_{j,S}\|_{2\to2}}{\pi_j} - 1,\ 1\Big) \le \Theta.
\]
Lastly, we remark that
\[
0 \preceq \mathbb{E}X_k^2 = \mathbb{E}\Big[\Big(\frac{B_{J_k,S}^*B_{J_k,S}}{\pi_{J_k}}\Big)^2\Big] - P_S \preceq \max_{1\le k\le M}\frac{\|B_{k,S}^*B_{k,S}\|_{2\to2}}{\pi_k}\,\mathbb{E}\Big[\frac{B_{J_k,S}^*B_{J_k,S}}{\pi_{J_k}}\Big] = \Lambda\,P_S \preceq \Theta\,P_S.
\]
Therefore, using Theorem B.3, we can set $\sigma^2 = \big\|\sum_{k=1}^m\mathbb{E}X_k^2\big\|_{2\to2} \le m\Theta$. Inequality (E1) then immediately follows from Bernstein's inequality for random matrices (Theorem B.3). □
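The following Monte Carlo sketch illustrates Lemma C.1 in the simplest setting of isolated, uniformly drawn Fourier measurements, for which $\Theta = s$; the dimensions, support and number of draws below are hypothetical values chosen only for this illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n, s, m, delta, trials = 64, 4, 200, 0.5, 500
A0 = np.fft.fft(np.eye(n), norm="ortho")      # unitary DFT, rows a_k^*
S = rng.choice(n, size=s, replace=False)      # hypothetical support
Theta = float(s)                              # for the DFT with uniform pi_k = 1/n, Theta = s

exceed = 0
for _ in range(trials):
    J = rng.integers(0, n, size=m)            # i.i.d. uniform row draws
    AS = np.sqrt(n) * A0[J][:, S]             # rows a_{J_k,S}^* / sqrt(pi_{J_k})
    G = AS.conj().T @ AS / m - np.eye(s)      # A_S^* A_S - P_S (restricted to S)
    exceed += np.linalg.norm(G, 2) >= delta
bound = s * np.exp(-(m * delta ** 2 / 2) / (Theta * (1 + delta / 3)))
print(f"empirical {exceed / trials:.3f}  vs  (E1) bound {bound:.3f}")
```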
Lemma C.2.
Let $S\subset\{1,\dots,n\}$, such that $|S| = s$. Let $w$ be a vector in $\mathbb{C}^n$. Then, for any $0\le t\le1$, one has that
\[
\mathbb{P}\big(\|(A_S^*A_S - P_S)w\|_2 \ge t\|w\|_2\big) \le \exp\Big(-\frac{mt^2}{8\Theta} + \frac14\Big). \tag{E2}
\]

Proof. Without loss of generality we may assume that $\|w\|_2 = 1$. We remark that
\[
(A_S^*A_S - P_S)w = \frac1m\sum_{k=1}^m\Big(\frac{B_{J_k,S}^*B_{J_k,S}}{\pi_{J_k}} - P_S\Big)w = \frac1m\sum_{k=1}^my_k,
\]
where $y_k = \big(\frac{B_{J_k,S}^*B_{J_k,S}}{\pi_{J_k}} - P_S\big)w$ is a random vector with zero mean. Simple calculations yield that
\[
\Big\|\frac{y_k}{m}\Big\|_2^2 = \frac{1}{m^2}\Big(w^*\Big(\frac{B_{J_k,S}^*B_{J_k,S}}{\pi_{J_k}}\Big)^2w - 2\,w^*\frac{B_{J_k,S}^*B_{J_k,S}}{\pi_{J_k}}w + w^*w\Big) \le \frac{1}{m^2}\Big(\Lambda\,w^*\frac{B_{J_k,S}^*B_{J_k,S}}{\pi_{J_k}}w - 2\,w^*\frac{B_{J_k,S}^*B_{J_k,S}}{\pi_{J_k}}w + 1\Big) = \frac{1}{m^2}\Big((\Lambda-2)\,w^*\frac{B_{J_k,S}^*B_{J_k,S}}{\pi_{J_k}}w + 1\Big) \le \frac{1}{m^2}\big((\Lambda-2)\Lambda + 1\big) = \frac{(\Lambda-1)^2}{m^2} \le \frac{\Lambda^2}{m^2} \le \frac{\Theta^2}{m^2}.
\]
Now, let us define $Z = \big\|\frac1m\sum_{k=1}^my_k\big\|_2$. By independence of the random vectors $y_k$, it follows that
\[
\mathbb{E}\big[Z^2\big] = \frac1m\mathbb{E}\|y_1\|_2^2 = \frac1m\mathbb{E}\Big[\Big\langle\frac{B_{J,S}^*B_{J,S}}{\pi_J}w, \frac{B_{J,S}^*B_{J,S}}{\pi_J}w\Big\rangle - 2\Big\langle\frac{B_{J,S}^*B_{J,S}}{\pi_J}w, w\Big\rangle + \langle w,w\rangle\Big] = \frac1m\mathbb{E}\Big[\Big\langle\Big(\frac{B_{J,S}^*B_{J,S}}{\pi_J}\Big)^2w, w\Big\rangle - 2\,\frac{\|B_{J,S}w\|_2^2}{\pi_J} + 1\Big].
\]
To bound the first term in the above equality, one can write
\[
\mathbb{E}\Big[\Big\langle\Big(\frac{B_{J,S}^*B_{J,S}}{\pi_J}\Big)^2w, w\Big\rangle\Big] \le \Lambda\Big\langle\mathbb{E}\Big[\frac{B_{J,S}^*B_{J,S}}{\pi_J}\Big]w, w\Big\rangle = \Lambda\|w\|_2^2 \le \Theta.
\]
One immediately has that $\mathbb{E}\frac{\|B_{J,S}w\|_2^2}{\pi_J} = \|w\|_2^2 = 1$. Therefore, one finally obtains that
\[
\mathbb{E}\big[Z^2\big] \le \frac{\Theta - 1}{m} \le \frac{\Theta}{m}.
\]
Using the above upper bounds, namely $\big\|\frac{y_k}{m}\big\|_2 \le \frac\Theta m$ and $\mathbb{E}[Z^2] \le \frac\Theta m$, the result of the lemma is thus a consequence of Bernstein's inequality for random vectors (Theorem B.2), which completes the proof. □
Let $S\subset\{1,\dots,n\}$, such that $|S| = s$. Let $v$ be a vector of $\mathbb{C}^n$. Then we have
\[
\mathbb{P}\big(\|A_{S^c}^*A_Sv\|_\infty \ge t\|v\|_\infty\big) \le 4n\exp\Big(-\frac{mt^2}{4\Upsilon + 2\sqrt2\,\Theta t/3}\Big). \tag{E3}
\]

Proof. Suppose without loss of generality that $\|v\|_\infty = 1$. Then,
\[
\|A_{S^c}^*A_Sv\|_\infty = \max_{i\in S^c}|\langle e_i, A^*A_Sv\rangle| = \max_{i\in S^c}\frac1m\Big|\sum_{k=1}^m\Big\langle e_i, \frac{B_{J_k}^*B_{J_k,S}}{\pi_{J_k}}v\Big\rangle\Big|.
\]
Let us define $Z_k = \big\langle e_i, \frac{B_{J_k}^*B_{J_k,S}}{\pi_{J_k}}v\big\rangle$. Note that $\mathbb{E}Z_k = 0$, since for $i\in S^c$,
\[
\mathbb{E}\Big\langle e_i, \frac{B_{J_k}^*B_{J_k,S}}{\pi_{J_k}}v\Big\rangle = e_i^*\sum_{k=1}^M\pi_k\frac{B_k^*B_{k,S}}{\pi_k}v = e_i^*P_Sv = 0.
\]
From Hölder's inequality, we get
\[
|Z_k| = \Big|e_i^*\frac{B_{J_k}^*B_{J_k,S}}{\pi_{J_k}}v\Big| \le \max_{\substack{j\in S^c\\ 1\le k\le M}}\frac{\|B_{k,S}^*B_ke_j\|_1}{\pi_k}\|v\|_\infty = \max_{\substack{j\in S^c\\ 1\le k\le M}}\frac{\|e_j^*B_k^*B_{k,S}\|_1}{\pi_k} \le \Theta.
\]
Furthermore,
\[
\mathbb{E}|Z_k|^2 = \mathbb{E}\Big|\Big\langle e_i, \frac{B_{J_k}^*B_{J_k,S}}{\pi_{J_k}}v\Big\rangle\Big|^2 = \sum_{\ell=1}^M\frac{|e_i^*B_\ell^*B_{\ell,S}v|^2}{\pi_\ell} \le \Upsilon.
\]
Therefore $\sum_{k=1}^m\mathbb{E}|Z_k|^2 \le m\Upsilon$. Using the real-valued Bernstein inequality (Theorem B.1) on the real and imaginary parts of the sum, we obtain
\[
\mathbb{P}\Big(\frac1m\Big|\sum_{k=1}^mZ_k\Big|\ge t\Big) \le \mathbb{P}\Big(\frac1m\Big|\sum_{k=1}^m\mathrm{Re}\,Z_k\Big|\ge\frac{t}{\sqrt2}\Big) + \mathbb{P}\Big(\frac1m\Big|\sum_{k=1}^m\mathrm{Im}\,Z_k\Big|\ge\frac{t}{\sqrt2}\Big) \le 4\exp\Big(-\frac{mt^2}{4\Upsilon + 2\sqrt2\,\Theta t/3}\Big).
\]
Taking the union bound over $i\in S^c$ completes the proof. □

Lemma C.4.
Let $S\subset\{1,\dots,n\}$, such that $|S| = s$. Suppose that $\Theta \ge 1$. Let $v$ be a vector of $\mathbb{C}^n$. Then we have
\[
\mathbb{P}\big(\|(A_S^*A_S - P_S)v\|_\infty \ge t\|v\|_\infty\big) \le 4s\exp\Big(-\frac{mt^2}{4\Upsilon + 2\sqrt2\,\Theta t/3}\Big). \tag{E4}
\]

Proof. Suppose without loss of generality that $\|v\|_\infty = 1$. Then,
\[
\|(A_S^*A_S - P_S)v\|_\infty = \max_{i\in S}|\langle e_i, (A_S^*A_S - P_S)v\rangle| = \max_{i\in S}\frac1m\Big|\sum_{k=1}^m\Big\langle e_i, \Big(\frac{B_{J_k,S}^*B_{J_k,S}}{\pi_{J_k}} - P_S\Big)v\Big\rangle\Big|.
\]
Let us define $Z_k = \big\langle e_i, \big(\frac{B_{J_k,S}^*B_{J_k,S}}{\pi_{J_k}} - P_S\big)v\big\rangle$, and note that $\mathbb{E}Z_k = 0$. From Hölder's inequality, we get
\[
|Z_k| \le \Big\|\frac{B_{J_k,S}^*B_{J_k,S}}{\pi_{J_k}} - P_S\Big\|_{\infty\to\infty} \le \max(\Theta - 1,\ 1) \le \Theta,
\]
using that $\|B_{k,S}^*B_{k,S}\|_{\infty\to\infty} \le \|B_k^*B_{k,S}\|_{\infty\to\infty}$ and the same argument as in Lemma C.3. Furthermore,
\[
\mathbb{E}|Z_k|^2 = \mathbb{E}\Big|\Big\langle e_i, \frac{B_{J_k,S}^*B_{J_k,S}}{\pi_{J_k}}v\Big\rangle\Big|^2 - |\langle e_i, P_Sv\rangle|^2 \le \mathbb{E}\Big|\Big\langle e_i, \frac{B_{J_k,S}^*B_{J_k,S}}{\pi_{J_k}}v\Big\rangle\Big|^2 = \sum_{\ell=1}^M\frac{|e_i^*B_{\ell,S}^*B_{\ell,S}v|^2}{\pi_\ell} \le \Upsilon.
\]
Therefore $\sum_{k=1}^m\mathbb{E}|Z_k|^2 \le m\Upsilon$, and using the real-valued Bernstein inequality (Theorem B.1) on the real and imaginary parts as in Lemma C.3, we obtain
\[
\mathbb{P}\Big(\frac1m\Big|\sum_{k=1}^mZ_k\Big|\ge t\Big) \le 4\exp\Big(-\frac{mt^2}{4\Upsilon + 2\sqrt2\,\Theta t/3}\Big).
\]
Taking the union bound over $i\in S$ completes the proof. □

Lemma C.5.
Let $S$ be a subset of $\{1,\dots,n\}$. Then, for any $0 < t \le 1$, one has that
\[
\mathbb{P}\Big(\max_{i\in S^c}\|A_S^*Ae_i\|_2 \ge t\Big) \le n\exp\Big(-\frac{\big(\sqrt m\,t/\sqrt\Theta - 1\big)^2}{4}\Big). \tag{E5}
\]

Proof.
Let us fix some $i\in S^c$. For $k = 1,\dots,m$, we define the random vector
\[
x_k := \frac{B_{J_k,S}^*B_{J_k}}{\pi_{J_k}}e_i.
\]
Then, since $i\in S^c$, one easily gets $\mathbb{E}x_k = \sum_{\ell=1}^MB_{\ell,S}^*B_\ell e_i = \sum_{\ell=1}^M(B_\ell P_S)^*B_\ell e_i = P_S\sum_{\ell=1}^MB_\ell^*B_\ell e_i = P_Se_i = 0$ (note that $P_S$ is self-adjoint). In addition, we can write
\[
\|A_S^*Ae_i\|_2 = \Big\|\frac1m\sum_{k=1}^m\frac{B_{J_k,S}^*B_{J_k}}{\pi_{J_k}}e_i\Big\|_2 = \Big\|\frac1m\sum_{k=1}^mx_k\Big\|_2.
\]
Then,
\[
\|x_k\|_2 = \Big\|\frac{B_{J_k,S}^*B_{J_k}}{\pi_{J_k}}e_i\Big\|_2 \le \Big\|\frac{B_{J_k,S}^*B_{J_k}}{\pi_{J_k}}e_i\Big\|_1 = \Big\|e_i^*\frac{B_{J_k}^*B_{J_k,S}}{\pi_{J_k}}\Big\|_1 \le \frac{1}{\pi_{J_k}}\|B_{J_k}^*B_{J_k,S}\|_{\infty\to\infty} \le \Theta.
\]
Furthermore, one has that
\[
\mathbb{E}\|x_k\|_2^2 = \mathbb{E}\Big\|\frac{B_{J_k,S}^*B_{J_k}}{\pi_{J_k}}e_i\Big\|_2^2 \le \mathbb{E}\Big[\Big\|\frac{B_{J_k,S}}{\sqrt{\pi_{J_k}}}\Big\|_{2\to2}^2\Big\|\frac{B_{J_k}}{\sqrt{\pi_{J_k}}}e_i\Big\|_2^2\Big] \le \Lambda\,\mathbb{E}\Big\|\frac{B_{J_k}}{\sqrt{\pi_{J_k}}}e_i\Big\|_2^2 = \Lambda\|e_i\|_2^2 = \Lambda,
\]
so that $\sum_{k=1}^m\mathbb{E}\|x_k\|_2^2 \le m\Lambda \le m\Theta$. The vector Bernstein inequality (Theorem B.2) then yields
\[
\mathbb{P}(\|A_S^*Ae_i\|_2 \ge t) \le \exp\Big(-\frac{\big(\sqrt m\,t/\sqrt\Theta - 1\big)^2}{4}\Big).
\]
Finally, Inequality (E5) follows from a union bound over $i\in S^c$, which completes the proof. □

D Proof of results in Applications
D.1 Proof of Corollary 4.1
The proof relies on the evaluation of $\Theta$ and $\Upsilon$ in the case of isolated measurements. In this case, we have $n$ blocks composed of isolated measurements: each block corresponds to one of the rows $(a_k^*)_{1\le k\le n}$ of $A$. Recall that $\|a_ka_{k,S}^*\|_{\infty\to\infty} = \max_{1\le i\le n}\sup_{\|v\|_\infty\le1}|e_i^*a_ka_{k,S}^*v|$, so the norm $\|a_ka_{k,S}^*\|_{\infty\to\infty}$ is the maximum $\ell^1$-norm of the rows of the matrix $a_ka_{k,S}^*$. Therefore, the quantities in Definition 3.1 can be rewritten as follows:
\[
\Theta(S,\pi) := \max_{1\le k\le n}\frac{\|a_ka_{k,S}^*\|_{\infty\to\infty}}{\pi_k} = \max_{1\le k\le n}\frac{\|a_k\|_\infty\|a_{k,S}\|_1}{\pi_k} \le s\cdot\max_{1\le k\le n}\frac{\|a_k\|_\infty^2}{\pi_k}, \tag{31}
\]
\[
\Upsilon(S,\pi) = \max_{1\le i\le n}\sup_{\|v\|_\infty\le1}\sum_{k=1}^n\frac{|e_i^*a_k|^2\,|a_{k,S}^*v|^2}{\pi_k} \tag{32}
\]
\[
\le \sup_{\|v\|_\infty\le1}\sum_{k=1}^n\frac{\|a_k\|_\infty^2}{\pi_k}|a_{k,S}^*v|^2 \le \sup_{\|v\|_\infty\le1}\max_{1\le\ell\le n}\frac{\|a_\ell\|_\infty^2}{\pi_\ell}\sum_{k=1}^n|a_{k,S}^*v|^2 = \sup_{\|v\|_\infty\le1}\|AP_Sv\|_2^2\,\max_{1\le\ell\le n}\frac{\|a_\ell\|_\infty^2}{\pi_\ell} = \sup_{\|v\|_\infty\le1}\|P_Sv\|_2^2\,\max_{1\le\ell\le n}\frac{\|a_\ell\|_\infty^2}{\pi_\ell} \le s\cdot\max_{1\le k\le n}\frac{\|a_k\|_\infty^2}{\pi_k}.
\]
Therefore we can choose $\Gamma(S,\pi) = s\cdot\max_{1\le k\le n}\|a_k\|_\infty^2/\pi_k$, and the result follows from Theorem 3.3.
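As an illustration, the sketch below evaluates this bound for the unitary DFT, with the drawing probabilities $\pi_k \propto \|a_k\|_\infty^2$ suggested by the bound itself; since all DFT rows are flat, one recovers $\Gamma(S,\pi) = s$, i.e. the classical coherence-one regime:

```python
import numpy as np

n, s = 64, 5
A = np.fft.fft(np.eye(n), norm="ortho")      # unitary DFT, rows a_k^*
row_inf2 = np.max(np.abs(A), axis=1) ** 2    # ||a_k||_inf^2 for each row
pi = row_inf2 / row_inf2.sum()               # pi_k proportional to ||a_k||_inf^2
gamma = s * np.max(row_inf2 / pi)            # Gamma(S, pi) = s * sum_k ||a_k||_inf^2
print(f"Gamma(S, pi) = {gamma:.2f}  (= s = {s} for the DFT, whose rows are flat)")
```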
D.2 Around Corollary 4.3

D.2.1 Proof of Corollary 4.3
Again, the proof is all about evaluating $\Theta$ and $\Upsilon$ in this specific case. Concerning the evaluation of $\Upsilon$, we can use expression (32) to conclude that
\[
\Upsilon(S,\pi) = \max_{1\le i\le n}\sup_{\|v\|_\infty\le1}\sum_{k=1}^n\frac{|e_i^*a_k|^2\,|a_{k,S}^*v|^2}{\pi_k}.
\]
To control $\Theta$, using (31), it suffices to write
\[
\Theta(S,\pi) = \max_{1\le k\le n}\frac{\|a_ka_{k,S}^*\|_{\infty\to\infty}}{\pi_k} \le \max_{1\le k\le n}\frac{\|a_k\|_\infty\|a_{k,S}\|_1}{\pi_k} \le \max_{1\le k\le n}\frac{\|a_k\|_\infty\sum_{\ell=1}^N\|a_{k,\Omega_\ell}\|_\infty s_\ell}{\pi_k}.
\]
By Theorem 3.3, the two conditions
\[
m \ge C\Big(\max_{1\le k\le n}\sum_{\ell=1}^N\frac{s_\ell\,\|a_{k,\Omega_\ell}\|_\infty\|a_k\|_\infty}{\pi_k}\Big)\ln(s)\ln\Big(\frac{n}{\varepsilon}\Big),
\]
\[
m \ge C\Big(\max_{1\le i\le n}\sup_{\|v\|_\infty\le1}\sum_{k=1}^n\frac{|e_i^*a_k|^2\,|a_{k,S}^*v|^2}{\pi_k}\Big)\ln(s)\ln\Big(\frac{n}{\varepsilon}\Big),
\]
lead to the desired conclusion.

D.2.2 Comparison of Corollary 4.3 and the results in [AHPR13]
Note that the sampling in [AHPR13] is based on Bernoulli drawings structured by levels. Their results are then easily transposable to the case of i.i.d. sampling with a constant probability on each level. The first condition on $m$ in Corollary 4.3 is similar to condition (4.4) in Theorem 4.4 of [AHPR13], since we recognize in the term $\|a_{k,\Omega_\ell}\|_\infty\|a_k\|_\infty/\pi_k$ the $(k,\ell)$-local coherence defined in [AHPR13]. Let us show that the second condition on $m$ is similar to Equation (4.5) in [AHPR13]. First, observe that
\[
\max_{1\le i\le n}\sup_{\|v\|_\infty\le1}\sum_{k=1}^n\frac{|e_i^*a_k|^2\,|a_{k,S}^*v|^2}{\pi_k} \le \max_{1\le\ell\le N}\sup_{\|v\|_\infty\le1}\sum_{k=1}^n\frac{\|a_{k,\Omega_\ell}\|_\infty^2}{\pi_k}|a_{k,S}^*v|^2 \le \max_{1\le\ell\le N}\sup_{\|v\|_\infty\le1}\sum_{k=1}^n\frac{\|a_{k,\Omega_\ell}\|_\infty\|a_k\|_\infty}{\pi_k}|a_{k,S}^*v|^2.
\]
Let $\tilde v$ denote the maximizer in the last expression, and define $\tilde s_k = |a_{k,S}^*\tilde v|^2$ for $1\le k\le n$. It follows that
\[
\max_{1\le i\le n}\sup_{\|v\|_\infty\le1}\sum_{k=1}^n\frac{|e_i^*a_k|^2\,|a_{k,S}^*v|^2}{\pi_k} \le \max_{1\le\ell\le N}\sum_{k=1}^n\frac{\|a_{k,\Omega_\ell}\|_\infty\|a_k\|_\infty}{\pi_k}\,\tilde s_k, \tag{33}
\]
and $\sum_{k=1}^n\tilde s_k = \sum_{k=1}^n|a_{k,S}^*\tilde v|^2 = \|AP_S\tilde v\|_2^2 = \|P_S\tilde v\|_2^2 \le \sum_{\ell=1}^Ns_\ell$. The last inequality and Equation (33) for i.i.d. sampling correspond to condition (4.5) in Theorem 4.4 of [AHPR13] in the case of Bernoulli sampling. This completes the comparison between Corollary 4.3 and the results in [AHPR13].

D.3 Proof of Corollary 4.4
Recall that $(\Omega_j)_{0\le j\le J}$ is the dyadic partition of the set of indices $\{1,\dots,n\}$. Recall also the function $j : \{1,\dots,n\}\to\{0,\dots,J\}$ defined by $j(u) = j$ if $u\in\Omega_j$. In the interest of simplifying notation, in this section the symbol '$\gtrsim$' will be equivalent to '$\ge C\cdot$', with $C$ a universal constant. The following lemma will be useful to bound the coefficients of $A$ in absolute value from above, and to derive Lemmas D.2 and D.3.

Lemma D.1. [AHR14a] The magnitude of the coefficients of the matrix $A = F\phi^*$, where $F$ is the 1D Fourier transform and $\phi$ is the 1D Haar transform, satisfies
\[
\|P_{\Omega_j}AP_{\Omega_\ell}\|_{1\to\infty} \lesssim 2^{-j/2}\,2^{-|j-\ell|/2}, \quad\text{for } 0\le j,\ell\le J. \tag{34}
\]

Lemma D.2. In the case of isolated measurements, with $A = F\phi^*$ and $\phi^*$ the inverse 1D Haar transform, suppose that the signal to reconstruct $x$ is sparse by levels, meaning that $\|P_{\Omega_j}x\|_0 \le s_j$ for $0\le j\le J$. Then,
\[
\Theta \lesssim \max_{1\le k\le n}\frac{2^{-j(k)}}{\pi_k}\Big(s_{j(k)} + \sum_{\substack{\ell=0\\ \ell\ne j(k)}}^Js_\ell\,2^{-|j(k)-\ell|/2}\Big). \tag{35}
\]
Choosing $\pi_k$ to be constant by level, i.e. $\pi_k = \tilde\pi_{j(k)}$, the last expression can be rewritten as follows:
\[
\Theta \lesssim \max_{0\le j\le J}\frac{2^{-j}}{\tilde\pi_j}\Big(s_j + \sum_{\substack{\ell=0\\ \ell\ne j}}^Js_\ell\,2^{-|j-\ell|/2}\Big). \tag{36}
\]

Proof.
Using (31), we can write
\[
\Theta = \max_{1\le k\le n}\frac{\|a_k\|_\infty\|a_{k,S}\|_1}{\pi_k} \le \max_{1\le k\le n}\frac{\|a_k\|_\infty\sum_{\ell=0}^J\|a_{k,\Omega_\ell}\|_\infty s_\ell}{\pi_k} \lesssim \max_{1\le k\le n}\frac{2^{-j(k)/2}}{\pi_k}\sum_{\ell=0}^J2^{-j(k)/2}\,2^{-|j(k)-\ell|/2}s_\ell = \max_{1\le k\le n}\frac{2^{-j(k)}}{\pi_k}\sum_{\ell=0}^J2^{-|j(k)-\ell|/2}s_\ell,
\]
where we used (34) to bound $\|a_{k,\Omega_\ell}\|_\infty$ from above. □
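The decay of Lemma D.1 can be observed numerically. The sketch below builds the 1D Fourier–Haar matrix and compares the maximal modulus of the entries of $P_{\Omega_j}AP_{\Omega_\ell}$ with $2^{-j/2}2^{-|j-\ell|/2}$; the dyadic banding of frequencies ordered by magnitude is a convention assumed here only for the illustration:

```python
import numpy as np

def haar(n):
    """Orthonormal 1D Haar transform matrix (rows = Haar functions, coarse scales first)."""
    if n == 1:
        return np.array([[1.0]])
    h = haar(n // 2)
    return np.vstack([np.kron(h, [1.0, 1.0]),
                      np.kron(np.eye(n // 2), [1.0, -1.0])]) / np.sqrt(2.0)

n = 256
J = int(np.log2(n))
F = np.fft.fft(np.eye(n), norm="ortho")               # unitary 1D Fourier transform
A = F @ haar(n).conj().T                              # A = F phi^*, phi = 1D Haar transform
A = A[np.argsort(np.abs(np.fft.fftfreq(n) * n)), :]   # sort rows by frequency magnitude

# Dyadic levels Omega_0 = {0, 1}, Omega_j = {2^j, ..., 2^{j+1} - 1} for j >= 1.
levels = [range(0, 2)] + [range(2 ** j, 2 ** (j + 1)) for j in range(1, J)]

worst = 0.0
for j, rows in enumerate(levels):
    for l, cols in enumerate(levels):
        mu = np.abs(A[np.ix_(rows, cols)]).max()      # max entry of P_j A P_l
        worst = max(worst, mu / 2.0 ** (-j / 2 - abs(j - l) / 2))
print(f"max over (j, l) of coherence / bound = {worst:.2f}")  # stays O(1) as n grows
```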
Lemma D.3. In the case of isolated measurements, with $A = F\phi^*$ and $\phi^*$ the inverse 1D Haar transform, suppose that the signal to reconstruct $x$ is sparse by levels, meaning that $\|P_{\Omega_j}x\|_0\le s_j$ for $0\le j\le J$. Choosing $\pi_k$ to be constant by level, i.e. $\pi_k = \tilde\pi_{j(k)}$, we have
\[
\Upsilon \lesssim \max_{0\le j\le J}\frac{2^{-j}}{\tilde\pi_j}\sum_{p=0}^J2^{-|j-p|/2}s_p. \tag{37}
\]

Proof.
Denoting by $\tilde v = \tilde v(i)$ the argument of the supremum in the definition of $\Upsilon$, we get
\[
\Upsilon := \max_{1\le i\le n}\sum_{k=1}^n\frac{|e_i^*a_k|^2\,|a_{k,S}^*\tilde v|^2}{\pi_k} \le \max_{0\le\ell\le J}\sum_{k=1}^n\frac{\|a_{k,\Omega_\ell}\|_\infty^2}{\pi_k}|a_{k,S}^*\tilde v|^2 \lesssim \max_{0\le\ell\le J}\sum_{k=1}^n\frac{2^{-j(k)}\,2^{-|j(k)-\ell|}}{\pi_k}|a_{k,S}^*\tilde v|^2 = \max_{0\le\ell\le J}\sum_{j=0}^J\frac{2^{-j}\,2^{-|j-\ell|}}{\tilde\pi_j}\underbrace{\sum_{k\in\Omega_j}|a_{k,S}^*\tilde v|^2}_{=:K_j}.
\]
We can rewrite $K_j$ as $K_j = \|P_{\Omega_j}AP_S\tilde v\|_2^2$. Therefore, since $\|\tilde v\|_\infty\le1$ implies $\|P_{\Omega_p}P_S\tilde v\|_2\le\sqrt{s_p}$,
\[
\sqrt{K_j} = \|P_{\Omega_j}AP_S\tilde v\|_2 = \Big\|P_{\Omega_j}A\sum_{p=0}^JP_{\Omega_p}P_S\tilde v\Big\|_2 \le \sum_{p=0}^J\|P_{\Omega_j}AP_{\Omega_p}\|_{2\to2}\,\|P_{\Omega_p}P_S\tilde v\|_2 \le \sum_{p=0}^J\|P_{\Omega_j}AP_{\Omega_p}\|_{2\to2}\,\sqrt{s_p}.
\]
Moreover, $\|P_{\Omega_j}AP_{\Omega_p}\|_{2\to2} \lesssim 2^{-|j-p|/2}$ for $0\le j,p\le J$ (see [AHR14a, Lemma 4.3]). Then $\sqrt{K_j} \lesssim \sum_{p=0}^J2^{-|j-p|/2}\sqrt{s_p}$, and thus
\[
K_j \lesssim \Big(\sum_{p=0}^J2^{-|j-p|/2}\sqrt{s_p}\Big)^2 \le \Big(\sum_{p=0}^J2^{-|j-p|/2}\Big)\Big(\sum_{p=0}^J2^{-|j-p|/2}s_p\Big) \lesssim \sum_{p=0}^J2^{-|j-p|/2}s_p,
\]
where in the second inequality we use the Cauchy-Schwarz inequality. Therefore,
\[
\Upsilon \lesssim \max_{0\le\ell\le J}\sum_{j=0}^J2^{-|j-\ell|}\,\frac{2^{-j}}{\tilde\pi_j}\sum_{p=0}^J2^{-|j-p|/2}s_p \le \max_{0\le\ell\le J}\sum_{j=0}^J2^{-|j-\ell|}\cdot\max_{0\le j\le J}\frac{2^{-j}}{\tilde\pi_j}\sum_{p=0}^J2^{-|j-p|/2}s_p \lesssim \max_{0\le j\le J}\frac{2^{-j}}{\tilde\pi_j}\sum_{p=0}^J2^{-|j-p|/2}s_p. \quad\Box
\]
Note that the upper bounds given in Lemmas D.2 and D.3 coincide. Therefore, we can apply Theorem 3.3 with the following upper bound for $\Gamma(S,\pi)$:
\[
\Gamma(S,\pi) \lesssim \max_{0\le j\le J}\frac{2^{-j}}{\tilde\pi_j}\sum_{p=0}^J2^{-|j-p|/2}s_p,
\]
and conclude the proof of Corollary 4.4.

D.4 Proof of Corollary 4.7
Recall that $A = \phi\otimes\phi\in\mathbb{C}^{n\times n}$, where $\phi\in\mathbb{C}^{\sqrt n\times\sqrt n}$ is a 1D orthogonal transform. Consider a blocks dictionary made of the $\sqrt n$ horizontal lines, i.e., for $1\le k\le\sqrt n$,
\[
B_k = \big(\phi_{k,1}\phi,\ \dots,\ \phi_{k,\sqrt n}\phi\big), \quad\text{and thus}\quad B_k^*B_k = \big(\phi_{k,i}^*\phi_{k,j}\,\mathrm{Id}_{\sqrt n}\big)_{1\le i,j\le\sqrt n}.
\]
Now, assume that the signal support $S$ is concentrated on $q$ horizontal lines of the spatial plane. Formally,
\[
S \subset \bigcup_{j\in J}\big((j-1)\sqrt n + \{1,\dots,\sqrt n\}\big), \tag{38}
\]
where $J\subset\{1,\dots,\sqrt n\}$ and $|J| = q$. Therefore,
\[
B_k^*B_{k,S} = \big(\delta_{j\in J}\,\phi_{k,i}^*\phi_{k,j}\,\mathrm{Id}_{\sqrt n}\big)_{1\le i,j\le\sqrt n},
\]
where $\delta_{j\in J} = 1$ if $j\in J$ and $0$ otherwise. In such a setting, the quantities in Definition 3.1 can be rewritten as follows:
\[
\Theta(S,\pi) = \max_{1\le k\le M}\max_{1\le i\le n}\frac{\|e_i^*B_k^*B_{k,S}\|_1}{\pi_k} = \max_{1\le k\le\sqrt n}\max_{1\le\tilde i\le\sqrt n}\frac{|\phi_{k,\tilde i}|\sum_{j\in J}|\phi_{k,j}|}{\pi_k} \le \max_{1\le k\le\sqrt n}\frac{q\,\|\phi_{k,:}\|_\infty^2}{\pi_k}. \tag{39}
\]
Recall that $\Upsilon(S,\pi) := \max_{1\le i\le n}\sup_{\|v\|_\infty\le1}\sum_{k=1}^M\frac{|e_i^*B_k^*B_{k,S}v|^2}{\pi_k}$, and call $(i^\star, v)$ the argument of the supremum over $\{1,\dots,n\}$ and $\{u,\ \|u\|_\infty\le1\}$. Therefore,
\[
\Upsilon(S,\pi) = \sum_{k=1}^M\frac{|e_{i^\star}^*B_k^*B_{k,S}v|^2}{\pi_k}.
\]
We can decompose $i^\star = (i_2-1)\sqrt n + i_1$, with $i_1, i_2$ integers of $\{1,\dots,\sqrt n\}$, and write
\[
\Upsilon(S,\pi) = \sum_{k=1}^{\sqrt n}\frac{1}{\pi_k}\Big|\sum_{j=1}^{\sqrt n}\delta_{j\in J}\,\phi_{k,i_2}^*\phi_{k,j}\,e_{i_1}^*v[j]\Big|^2 = \sum_{k=1}^{\sqrt n}\frac{|\phi_{k,i_2}|^2}{\pi_k}\Big|\sum_{j=1}^{\sqrt n}\delta_{j\in J}\,\phi_{k,j}\,w_j\Big|^2,
\]
where $w\in\mathbb{C}^{\sqrt n}$ is such that $w_j = e_{i_1}^*v[j]$, and $v[j]\in\mathbb{C}^{\sqrt n}$ is the restriction of $v$ to the $j$-th horizontal line, i.e. to the components of $v$ indexed by $\{(j-1)\sqrt n+1,\dots,j\sqrt n\}$. We can rewrite the last expression as follows:
\[
\Upsilon(S,\pi) = \sum_{k=1}^{\sqrt n}\frac{|\phi_{k,i_2}|^2}{\pi_k}\,|\langle e_k,\phi P_Jw\rangle|^2 \le \max_{1\le\ell\le\sqrt n}\frac{|\phi_{\ell,i_2}|^2}{\pi_\ell}\sum_{k=1}^{\sqrt n}|\langle e_k,\phi P_Jw\rangle|^2 = \max_{1\le\ell\le\sqrt n}\frac{|\phi_{\ell,i_2}|^2}{\pi_\ell}\,\|\phi P_Jw\|_2^2 = \max_{1\le\ell\le\sqrt n}\frac{|\phi_{\ell,i_2}|^2}{\pi_\ell}\,\|P_Jw\|_2^2 \le \max_{1\le\ell\le\sqrt n}\frac{|\phi_{\ell,i_2}|^2}{\pi_\ell}\cdot q,
\]
where in the last expression we use that $\|w\|_\infty\le1$. Choosing $\phi$ as the 1D Fourier transform gives $\|\phi_{\ell,:}\|_\infty^2 = n^{-1/2}$, and choosing a uniform sampling among the $\sqrt n$ horizontal lines, i.e. $\pi_\ell^\star = 1/\sqrt n$ for $1\le\ell\le\sqrt n$, leads to
\[
\Gamma(S,\pi^\star) \le q,
\]
which ends the proof of Corollary 4.7.
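The bound $\Theta\le q$ (in fact an equality for flat Fourier rows) can be checked directly; the following minimal sketch evaluates (39) for a hypothetical support spread over $q$ horizontal lines and a uniform drawing of the lines:

```python
import numpy as np

rn, q = 16, 3                                  # sqrt(n) and number of support lines
phi = np.fft.fft(np.eye(rn), norm="ortho")     # 1D unitary Fourier transform
Jset = np.random.default_rng(0).choice(rn, size=q, replace=False)  # hypothetical support lines
pi = np.full(rn, 1.0 / rn)                     # uniform drawing of the horizontal lines

# Evaluate (39): Theta = max_k max_i |phi_{k,i}| * sum_{j in J} |phi_{k,j}| / pi_k.
theta = max(np.abs(phi[k]).max() * np.abs(phi[k, Jset]).sum() / pi[k] for k in range(rn))
print(f"Theta = {theta:.3f}  vs  q = {q}")     # flat Fourier rows give Theta = q exactly
```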
D.5 Proof of Corollary 4.9

We recall that the sampling matrix is constructed from the full sampling matrix $A\in\mathbb{C}^{n\times n}$ in the 2D setting, where $A = F_{2D}\Psi^*$, with $F_{2D}\in\mathbb{C}^{n\times n}$ the 2D Fourier transform and $\Psi^*\in\mathbb{C}^{n\times n}$ the 2D inverse wavelet transform. Since both transforms are separable, $F_{2D} = F\otimes F$ and $\Psi = \psi\otimes\psi$, with $\otimes$ the Kronecker product and $F, \psi\in\mathbb{C}^{\sqrt n\times\sqrt n}$ the corresponding 1D transforms. Then $A$ can also be rewritten as $A = \phi\otimes\phi$, the Kronecker product of the 1D transforms $\phi := F\psi^*\in\mathbb{C}^{\sqrt n\times\sqrt n}$.

In this section, in order to avoid any confusion, we will denote by $\big(e_i^{(n)}\big)_{1\le i\le n}$ the canonical basis in dimension $n$.

In Corollary 4.9, we focus on the case where $A = \phi\otimes\phi\in\mathbb{C}^{n\times n}$ is the 2D Fourier-Shannon wavelet transform; then $\phi\in\mathbb{C}^{\sqrt n\times\sqrt n}$ is the 1D Fourier-Shannon wavelet transform. Therefore, $\phi$ and $A$ are block-diagonal orthogonal matrices. The sensing schemes are based on horizontal lines of the 2D plane, meaning that
\[
B_k = \big(\phi_{k,1}\phi\ \dots\ \phi_{k,\sqrt n}\phi\big), \quad\text{for } k = 1,\dots,\sqrt n.
\]
By definition of the Fourier-Shannon transform, we have that
\[
B_k^*B_k = \big(\phi_{k,\ell}^*\phi_{k,m}\,\mathrm{Id}_{\sqrt n}\big)_{1\le\ell,m\le\sqrt n} = \frac{1}{2^{j(k)}}\big(\delta_{\ell\in\tau_{j(k)}}\delta_{m\in\tau_{j(k)}}\,\mathrm{Id}_{\sqrt n}\big)_{1\le\ell,m\le\sqrt n}, \quad k = 1,\dots,\sqrt n,
\]
where $\delta_{\ell\in\tau_j} = 1$ if $\ell\in\tau_j$, and $0$ otherwise.

First, let us start with the evaluation of $\Theta$. By definition of $\|\cdot\|_{\infty\to\infty}$, we have
\[
\|B_k^*B_{k,S}\|_{\infty\to\infty} = \max_{1\le\ell\le n}\sup_{\substack{\|v\|_\infty\le1\\ v\in\mathbb{C}^n}}\Big|\big(e_\ell^{(n)}\big)^*B_k^*B_kP_Sv\Big|.
\]
Setting $\tilde v = \tilde v(k)$ the argument of the supremum in the last expression, then
\[
\Theta := \max_{1\le k\le\sqrt n}\max_{1\le\ell\le n}\frac{1}{\pi_k}\Big|\big(e_\ell^{(n)}\big)^*B_k^*B_kP_S\tilde v\Big|.
\]
Note that $\|\tilde v\|_\infty\le1$. The index $\ell$ can be rewritten as $\ell = (\ell_2-1)\sqrt n+\ell_1$, with $1\le\ell_1,\ell_2\le\sqrt n$, so that
\[
\Theta = \max_{1\le k\le\sqrt n}\max_{1\le\ell_1,\ell_2\le\sqrt n}\frac{1}{\pi_k}\Big|\phi_{k,\ell_2}^*\Big(\phi_{k,m}\big(e_{\ell_1}^{(\sqrt n)}\big)^*\Big)_{1\le m\le\sqrt n}P_S\tilde v\Big| = \max_{1\le k\le\sqrt n}\max_{1\le\ell_1,\ell_2\le\sqrt n}\frac{1}{\pi_k}\Big|\phi_{k,\ell_2}^*\sum_{m=1}^{\sqrt n}\phi_{k,m}\big(e_{\ell_1}^{(\sqrt n)}\big)^*(P_S\tilde v)[m]\Big|,
\]
where $(v)[m]\in\mathbb{C}^{\sqrt n}$ is the restriction of the vector $v$ to the $m$-th horizontal line, i.e. to the components indexed by $\{(m-1)\sqrt n+1,\dots,m\sqrt n\}$. Set $w|^{(m)} := (P_S\tilde v)[m]\in\mathbb{C}^{\sqrt n}$, the restriction of $P_S\tilde v$ to the $m$-th horizontal line. Then the $\ell_1$-th component of $w|^{(m)}$, written $w|^{(m)}_{\ell_1}$, is equal to $\big(e_{\ell_1}^{(\sqrt n)}\big)^*(P_S\tilde v)[m]$. Note that $\big|w|^{(m)}_{\ell_1}\big|\le1$ if $(m-1)\sqrt n+\ell_1\in S$, and it is equal to $0$ otherwise. Then,
\[
\Theta \le \max_{1\le k\le\sqrt n}\max_{1\le\ell_1,\ell_2\le\sqrt n}\frac{1}{\pi_k}\Big|\phi_{k,\ell_2}^*\sum_{m=1}^{\sqrt n}\phi_{k,m}\,w|^{(m)}_{\ell_1}\Big|. \tag{40}
\]
By the block-diagonality of the Fourier-Shannon transform, we have
\[
\Theta \le \max_{1\le k\le\sqrt n}\max_{1\le\ell_1,\ell_2\le\sqrt n}\frac{1}{\pi_k}\Big|\phi_{k,\ell_2}^*\sum_{m\in\tau_{j(k)}}\phi_{k,m}\,w|^{(m)}_{\ell_1}\Big| \tag{41}
\]
\[
\le \max_{1\le k\le\sqrt n}\max_{1\le\ell_1\le\sqrt n}\frac{\|\phi_{k,:}\|_\infty^2}{\pi_k}\sum_{m\in\tau_{j(k)}}\big|w|^{(m)}_{\ell_1}\big| \lesssim \max_{1\le k\le\sqrt n}\frac{1}{\pi_k\,2^{j(k)}}\,s^c_{j(k)}. \tag{42}
\]
Indeed, $\sum_{m\in\tau_{j(k)}}\big|w|^{(m)}_{\ell_1}\big|$ is bounded above by $\sum_{m\in\tau_{j(k)}}\delta_{(m-1)\sqrt n+\ell_1\in S}$, which counts the number of intersections between $S$, the $\ell_1$-th column and the $j(k)$-th (horizontal) level, see the blue line in Figure 3. Taking the maximum over $1\le\ell_1\le\sqrt n$ leads to $\sum_{m\in\tau_{j(k)}}\delta_{(m-1)\sqrt n+\ell_1\in S}\le s^c_{j(k)}$.

Secondly, let us evaluate $\Upsilon$. We have that
\[
\Upsilon := \max_{1\le\ell\le n}\sum_{k=1}^{\sqrt n}\frac{1}{\pi_k}\Big|\big(e_\ell^{(n)}\big)^*B_k^*B_{k,S}\tilde v\Big|^2,
\]
where $\tilde v = \tilde v(\ell)$ is the argument of the supremum on the $\ell^\infty$ unit ball. Using (41), with $\ell = (\ell_2-1)\sqrt n+\ell_1$, we can rewrite
\[
\Upsilon = \max_{1\le\ell_1,\ell_2\le\sqrt n}\sum_{k=1}^{\sqrt n}\frac{1}{\pi_k}\Big|\phi_{k,\ell_2}^*\sum_{m=1}^{\sqrt n}\phi_{k,m}\,w|^{(m)}_{\ell_1}\Big|^2,
\]
where $w|^{(m)}_{\ell_1} := \big(e_{\ell_1}^{(\sqrt n)}\big)^*(P_S\tilde v)[m]$. Note again that $\big|w|^{(m)}_{\ell_1}\big|\le1$ if $(m-1)\sqrt n+\ell_1\in S$, and it is equal to $0$ otherwise. Denoting by $w|^{(:,\ell_1)}$ the vector with components
\[
w|^{(:,\ell_1)} := \big(w|^{(1)}_{\ell_1}, w|^{(2)}_{\ell_1}, \dots, w|^{(\sqrt n)}_{\ell_1}\big)^*, \tag{43}
\]
we can rewrite the previous quantity as follows:
\[
\Upsilon = \max_{1\le\ell_1,\ell_2\le\sqrt n}\sum_{k=1}^{\sqrt n}\frac{1}{\pi_k}\big|\phi_{k,\ell_2}^*\big\langle\phi_{k,:}^*,\,w|^{(:,\ell_1)}\big\rangle\big|^2 = \max_{1\le\ell_1,\ell_2\le\sqrt n}\sum_{k=1}^{\sqrt n}\frac{|\phi_{k,\ell_2}|^2}{\pi_k}\big|\big\langle\phi_{k,:}^*,\,w|^{(:,\ell_1)}\big\rangle\big|^2. \tag{44}
\]
Since $\phi$ is an orthogonal block-diagonal transform, we have
\[
\Upsilon = \max_{1\le\ell_1,\ell_2\le\sqrt n}\sum_{k\in\tau_{j(\ell_2)}}\frac{|\phi_{k,\ell_2}|^2}{\pi_k}\big|\big\langle\phi_{k,:}^*,\,w|^{(:,\ell_1)}\big\rangle\big|^2.
\]
Choosing $\pi_k = \tilde\pi_j$ for $k\in\tau_j$, meaning that the probability of drawing lines is constant by level, we can write that
\[
\Upsilon = \max_{1\le\ell_1,\ell_2\le\sqrt n}\frac{1}{\tilde\pi_{j(\ell_2)}}\sum_{k\in\tau_{j(\ell_2)}}|\phi_{k,\ell_2}|^2\big|\big\langle\phi_{k,:}^*,\,w|^{(:,\ell_1)}\big\rangle\big|^2 \le \max_{1\le\ell_1,\ell_2\le\sqrt n}\frac{\max_{k\in\tau_{j(\ell_2)}}\|\phi_{k,:}\|_\infty^2}{\tilde\pi_{j(\ell_2)}}\sum_{k\in\tau_{j(\ell_2)}}\big|\big\langle\phi_{k,:}^*,\,w|^{(:,\ell_1)}\big\rangle\big|^2 \lesssim \max_{1\le\ell_1,\ell_2\le\sqrt n}\frac{2^{-j(\ell_2)}}{\tilde\pi_{j(\ell_2)}}\big\|P_{\tau_{j(\ell_2)}}\phi\,w|^{(:,\ell_1)}\big\|_2^2.
\]
Since $\phi$ is orthogonal and block-diagonal, we have $\big\|P_{\tau_{j(\ell_2)}}\phi\,w|^{(:,\ell_1)}\big\|_2 = \big\|P_{\tau_{j(\ell_2)}}w|^{(:,\ell_1)}\big\|_2$. Then,
\[
\Upsilon \lesssim \max_{1\le\ell_1,\ell_2\le\sqrt n}\frac{2^{-j(\ell_2)}}{\tilde\pi_{j(\ell_2)}}\big\|P_{\tau_{j(\ell_2)}}w|^{(:,\ell_1)}\big\|_2^2 \lesssim \max_{1\le\ell_2\le\sqrt n}\frac{2^{-j(\ell_2)}}{\tilde\pi_{j(\ell_2)}}\,s^c_{j(\ell_2)}, \tag{45}
\]
where the last step invokes that $\big\|P_{\tau_{j(\ell_2)}}w|^{(:,\ell_1)}\big\|_2^2 \le \sum_{m\in\tau_{j(\ell_2)}}\delta_{(m-1)\sqrt n+\ell_1\in S} \le s^c_{j(\ell_2)}$. Note that the upper bounds (42) and (45) on $\Theta$ and $\Upsilon$ coincide. They lead to the following choice: for $1\le k\le\sqrt n$,
\[
\pi_k = \tilde\pi_{j(k)} = \frac{s^c_{j(k)}\,2^{-j(k)}}{\sum_{\ell=1}^{\sqrt n}s^c_{j(\ell)}\,2^{-j(\ell)}} = \frac{s^c_{j(k)}\,2^{-j(k)}}{\sum_{j=0}^J\sum_{\ell\in\tau_j}s^c_j\,2^{-j}} = \frac{s^c_{j(k)}\,2^{-j(k)}}{\sum_{j=0}^Js^c_j},
\]
for which
\[
\max(\Theta,\Upsilon) \lesssim \sum_{j=0}^Js^c_j.
\]
To conclude, by Theorem 3.3, a lower bound on the required number of horizontal lines to acquire is thus
\[
m \gtrsim \sum_{j=0}^Js^c_j\,\ln(s)\ln(n/\varepsilon).
\]
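The normalization of this level-constant choice of $\pi$ is precisely what produces the factor $\sum_j s^c_j$. A minimal sketch, with hypothetical per-level sparsities $s^c_j$ assumed only for illustration:

```python
import numpy as np

# Level-constant drawing probabilities pi_k proportional to s^c_{j(k)} * 2^{-j(k)},
# with dyadic levels |tau_j| = 2^j; the per-level sparsities s_c are hypothetical.
J = 7
s_c = np.array([1.0, 1.0, 2.0, 3.0, 4.0, 6.0, 8.0, 10.0])
j_of_k = np.concatenate([np.full(2 ** j, j) for j in range(J + 1)])  # level of line k
weights = s_c[j_of_k] * 2.0 ** (-j_of_k)
pi = weights / weights.sum()

# Since sum_{l in tau_j} 2^{-j} = 1, the normalizer equals sum_j s^c_j, the factor
# driving the final bound m >~ (sum_j s^c_j) ln(s) ln(n/eps).
assert np.isclose(weights.sum(), s_c.sum())
print(f"normalizer = {weights.sum():.1f} = sum_j s^c_j;  pi in [{pi.min():.4f}, {pi.max():.4f}]")
```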
D.6 Proof of Corollary 4.10

In this part, using the formalism introduced in the previous section, $\psi$ is the 1D Haar transform, and $\phi$ is then the 1D Fourier-Haar transform. In such a case, we can reuse (40) from Section D.5 to evaluate $\Theta$:
\[
\Theta \le \max_{1\le k\le\sqrt n}\max_{1\le\ell_1,\ell_2\le\sqrt n}\frac{1}{\pi_k}\Big|\phi_{k,\ell_2}^*\sum_{m=1}^{\sqrt n}\phi_{k,m}\,w|^{(m)}_{\ell_1}\Big|.
\]
Using Lemma D.1, we have, for $1\le k,m\le\sqrt n$,
\[
|\phi_{k,m}| \lesssim 2^{-j(k)/2}\,2^{-|j(k)-j(m)|/2}.
\]
Therefore,
\[
\Theta \le \max_{1\le k\le\sqrt n}\max_{1\le\ell_1,\ell_2\le\sqrt n}\frac{|\phi_{k,\ell_2}^*|}{\pi_k}\sum_{j=0}^J\sum_{m\in\tau_j}|\phi_{k,m}|\,\big|w|^{(m)}_{\ell_1}\big| \lesssim \max_{1\le k\le\sqrt n}\max_{1\le\ell_1\le\sqrt n}\frac{2^{-j(k)}}{\pi_k}\sum_{j=0}^J2^{-|j(k)-j|/2}\sum_{m\in\tau_j}\big|w|^{(m)}_{\ell_1}\big| \lesssim \max_{1\le k\le\sqrt n}\frac{2^{-j(k)}}{\pi_k}\sum_{j=0}^J2^{-|j(k)-j|/2}\,s^c_j. \tag{46}
\]
Now let us study $\Upsilon$. Recalling the definition (43) of $w|^{(:,\ell_1)}$, we can reuse (44) to get
\[
\Upsilon = \max_{1\le\ell_1,\ell_2\le\sqrt n}\sum_{k=1}^{\sqrt n}\frac{|\phi_{k,\ell_2}|^2}{\pi_k}\big|\big\langle\phi_{k,:}^*,\,w|^{(:,\ell_1)}\big\rangle\big|^2 = \max_{1\le\ell_1,\ell_2\le\sqrt n}\sum_{j=0}^J\frac{1}{\tilde\pi_j}\sum_{k\in\tau_j}|\phi_{k,\ell_2}|^2\big|\big\langle\phi_{k,:}^*,\,w|^{(:,\ell_1)}\big\rangle\big|^2,
\]
by choosing $\pi_k = \tilde\pi_j$ for $k\in\tau_j$, meaning that the drawing probability is constant by level. For $k\in\tau_j$, we have $|\phi_{k,\ell_2}|^2 \lesssim 2^{-j}\,2^{-|j-j(\ell_2)|}$ by Lemma D.1. Then,
\[
\Upsilon \lesssim \max_{1\le\ell_1,\ell_2\le\sqrt n}\sum_{j=0}^J\frac{2^{-j}\,2^{-|j-j(\ell_2)|}}{\tilde\pi_j}\underbrace{\sum_{k\in\tau_j}\big|\big\langle\phi_{k,:}^*,\,w|^{(:,\ell_1)}\big\rangle\big|^2}_{=:K_j}.
\]
Dealing with $K_j$, we can derive that
\[
\sqrt{K_j} = \big\|P_{\tau_j}\phi\,w|^{(:,\ell_1)}\big\|_2 = \Big\|P_{\tau_j}\phi\sum_{r=0}^JP_{\tau_r}w|^{(:,\ell_1)}\Big\|_2 \le \sum_{r=0}^J\|P_{\tau_j}\phi P_{\tau_r}\|_{2\to2}\,\big\|P_{\tau_r}w|^{(:,\ell_1)}\big\|_2 \lesssim \sum_{r=0}^J2^{-|j-r|/2}\sqrt{s^c_r},
\]
where the upper bound $\|P_{\tau_j}\phi P_{\tau_r}\|_{2\to2} \lesssim 2^{-|j-r|/2}$ can be found in [AHR14a, Lemma 4.3]. Then,
\[
K_j \lesssim \Big(\sum_{r=0}^J2^{-|j-r|/2}\sqrt{s^c_r}\Big)^2 \le \Big(\sum_{r=0}^J2^{-|j-r|/2}\Big)\Big(\sum_{r=0}^J2^{-|j-r|/2}s^c_r\Big) \lesssim \sum_{r=0}^J2^{-|j-r|/2}s^c_r,
\]
where the middle step uses the Cauchy-Schwarz inequality. Therefore,
\[
\Upsilon \lesssim \max_{1\le\ell_2\le\sqrt n}\sum_{j=0}^J2^{-|j-j(\ell_2)|}\,\frac{2^{-j}}{\tilde\pi_j}\sum_{r=0}^J2^{-|j-r|/2}s^c_r \le \max_{1\le\ell_2\le\sqrt n}\sum_{j=0}^J2^{-|j-j(\ell_2)|}\cdot\max_{0\le j\le J}\frac{2^{-j}}{\tilde\pi_j}\sum_{r=0}^J2^{-|j-r|/2}s^c_r \lesssim \max_{0\le j\le J}\frac{2^{-j}}{\tilde\pi_j}\sum_{r=0}^J2^{-|j-r|/2}s^c_r. \tag{47}
\]
The upper bounds (46) and (47) give
\[
\max(\Theta,\Upsilon) \lesssim \max_{0\le j\le J}\frac{2^{-j}}{\tilde\pi_j}\sum_{r=0}^J2^{-|j-r|/2}s^c_r.
\]
Therefore, by Theorem 3.3, a lower bound on the required number of horizontal lines is
\[
m \gtrsim \max_{0\le j\le J}\frac{2^{-j}}{\tilde\pi_j}\sum_{r=0}^J2^{-|j-r|/2}s^c_r\;\ln(n/\varepsilon)\ln(s).
\]
By choosing
\[
\pi_k = \tilde\pi_{j(k)} = \frac{2^{-j(k)}\sum_{r=0}^J2^{-|j(k)-r|/2}s^c_r}{\sum_{\ell=1}^{\sqrt n}2^{-j(\ell)}\sum_{r=0}^J2^{-|j(\ell)-r|/2}s^c_r}, \qquad 1\le k\le\sqrt n,
\]
the lower bound on the required number of horizontal lines can be rewritten as
\[
m \gtrsim \sum_{\ell=1}^{\sqrt n}2^{-j(\ell)}\sum_{r=0}^J2^{-|j(\ell)-r|/2}s^c_r\cdot\ln(n/\varepsilon)\ln(s) \gtrsim \sum_{j=0}^J\sum_{\ell\in\tau_j}2^{-j}\sum_{r=0}^J2^{-|j-r|/2}s^c_r\cdot\ln(n/\varepsilon)\ln(s) \gtrsim \sum_{j=0}^J\sum_{r=0}^J2^{-|j-r|/2}s^c_r\cdot\ln(n/\varepsilon)\ln(s) \gtrsim \sum_{j=0}^J\Big(s^c_j + \sum_{\substack{r=0\\ r\ne j}}^J2^{-|j-r|/2}s^c_r\Big)\cdot\ln(n/\varepsilon)\ln(s),
\]
which concludes the proof of Corollary 4.10.

References

[AH15] Ben Adcock and Anders C. Hansen. Generalized sampling and infinite-dimensional compressed sensing. Foundations of Computational Mathematics, pages 1–61, 2015.
[AHPR13] Ben Adcock, Anders C. Hansen, Clarice Poon, and Bogdan Roman. Breaking the coherence barrier: A new theory for compressed sensing. arXiv preprint arXiv:1302.0561, 2013.

[AHR14a] Ben Adcock, Anders C. Hansen, and Bogdan Roman. A note on compressed sensing of structured sparse wavelet coefficients from subsampled Fourier measurements. arXiv preprint arXiv:1403.6541, 2014.

[AHR14b] Ben Adcock, Anders C. Hansen, and Bogdan Roman. The quest for optimal sampling: Computationally efficient, structure-exploiting measurements for compressed sensing. Book chapter, Compressed Sensing and its Applications, Springer (to appear), arXiv preprint arXiv:1403.6540, 2014.

[BBW14] Jérémie Bigot, Claire Boyer, and Pierre Weiss. An analysis of block sampling strategies in compressed sensing. arXiv preprint arXiv:1310.4393, 2014.

[BCDH10] Richard G. Baraniuk, Volkan Cevher, Marco F. Duarte, and Chinmay Hegde. Model-based compressive sensing. Information Theory, IEEE Transactions on, 56(4):1982–2001, 2010.

[BH14] Alexander Bastounis and Anders C. Hansen. On the absence of the RIP in real-world applications of compressed sensing and the RIP in levels. arXiv preprint arXiv:1411.4449, 2014.

[BJMO12] Francis Bach, Rodolphe Jenatton, Julien Mairal, and Guillaume Obozinski. Optimization with sparsity-inducing penalties. Foundations and Trends in Machine Learning, 4(1):1–106, 2012.

[CCKW14] Nicolas Chauffert, Philippe Ciuciu, Jonas Kahn, and Pierre Weiss. Variable density sampling with continuous sampling trajectories. SIAM Journal on Imaging Sciences, in press, 2014.

[CCW13] N. Chauffert, P. Ciuciu, and P. Weiss. Variable density compressed sensing in MRI. Theoretical vs heuristic sampling strategies. In Proceedings of IEEE ISBI, 2013.

[CP11] Emmanuel Candès and Yaniv Plan. A probabilistic and RIPless theory of compressed sensing. Information Theory, IEEE Transactions on, 57(11):7235–7254, 2011.

[CRT06a] Emmanuel Candès, Justin Romberg, and Terence Tao. Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. Information Theory, IEEE Transactions on, 52(2):489–509, 2006.

[CRT06b] Emmanuel Candès, Justin Romberg, and Terence Tao. Stable signal recovery from incomplete and inaccurate measurements. Communications on Pure and Applied Mathematics, 59(8):1207–1223, 2006.

[CT06] Emmanuel Candès and Terence Tao. Near-optimal signal recovery from random projections: Universal encoding strategies? Information Theory, IEEE Transactions on, 52(12):5406–5425, 2006.

[CWKC16] Nicolas Chauffert, Pierre Weiss, Jonas Kahn, and Philippe Ciuciu. Gradient waveform design for variable density sampling in magnetic resonance imaging. IEEE Transactions on Medical Imaging, in press, 2016.

[DE11] Marco F. Duarte and Yonina C. Eldar. Structured compressed sensing: From theory to applications. Signal Processing, IEEE Transactions on, 59(9):4053–4085, 2011.

[Don06] David Donoho. Compressed sensing. Information Theory, IEEE Transactions on, 52(4):1289–1306, 2006.

[EM09] Yonina C. Eldar and Moshe Mishali. Robust recovery of signals from a structured union of subspaces. Information Theory, IEEE Transactions on, 55(11):5302–5316, 2009.

[Fel08] William Feller. An Introduction to Probability Theory and Its Applications, volume 2. John Wiley & Sons, 2008.

[FR13] Simon Foucart and Holger Rauhut. A Mathematical Introduction to Compressive Sensing. Springer, 2013.

[GN08] Rémi Gribonval and Morten Nielsen. Beyond sparsity: Recovering structured representations by $\ell^1$ minimization and greedy algorithms. Advances in Computational Mathematics, 28(1):23–41, 2008.

[Gro11] David Gross. Recovering low-rank matrices from few coefficients in any basis. Information Theory, IEEE Transactions on, 57(3):1548–1566, 2011.

[GRUV14] Karlheinz Gröchenig, José Luis Romero, Jayakrishnan Unnikrishnan, and Martin Vetterli. On minimal trajectories for mobile sampling of bandlimited fields. Applied and Computational Harmonic Analysis, 2014.

[HSIG13] Cédric Herzet, Charles Soussen, Jérôme Idier, and Rémi Gribonval. Exact recovery conditions for sparse representations with partial support information. Information Theory, IEEE Transactions on, 59(11):7509–7524, 2013.

[KW14] Felix Krahmer and Rachel Ward. Stable and robust sampling strategies for compressive imaging. IEEE Trans. Image Proc., 23(2):612–622, 2014.

[LDP07] Michael Lustig, David Donoho, and John M. Pauly. Sparse MRI: The application of compressed sensing for rapid MR imaging. Magnetic Resonance in Medicine, 58(6):1182–1195, 2007.

[LSMH13] Rowan Leary, Zineb Saghi, Paul A. Midgley, and Daniel J. Holland. Compressed sensing electron tomography. Ultramicroscopy, 131:70–91, 2013.

[PDG15] Adam C. Polak, Marco F. Duarte, and Dennis L. Goeckel. Performance bounds for grouped incoherent measurements in compressive sensing. To appear in IEEE Signal Processing, 2015.

[PMG+12] Gilles Puy, Jose P. Marques, Rolf Gruetter, J. Thiran, Dimitri Van De Ville, Pierre Vandergheynst, and Yves Wiaux. Spread spectrum magnetic resonance imaging. Medical Imaging, IEEE Transactions on, 31(3):586–598, 2012.

[PSV09] Xiaochuan Pan, Emil Y. Sidky, and Michael Vannier. Why do commercial CT scanners still employ traditional, filtered back-projection for image reconstruction? Inverse Problems, 25(12):123009, 2009.

[PVW11] Gilles Puy, Pierre Vandergheynst, and Yves Wiaux. On variable density compressive sampling. Signal Processing Letters, IEEE, 18(10):595–598, 2011.

[Rau10] Holger Rauhut. Compressive sensing and structured random matrices. Theoretical Foundations and Numerical Methods for Sparse Recovery, 9:1–92, 2010.

[RHA14] Bogdan Roman, Anders Hansen, and Ben Adcock. On asymptotic structure in compressed sensing. arXiv preprint arXiv:1406.4178, 2014.

[TH08] Georg Tauböck and Franz Hlawatsch. A compressed sensing technique for OFDM channel estimation in mobile environments: Exploiting channel sparsity for reducing pilots. In Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on, pages 2885–2888. IEEE, 2008.

[Tro06] Joel A. Tropp. Just relax: Convex programming methods for identifying sparse signals in noise. Information Theory, IEEE Transactions on, 52(3):1030–1051, 2006.

[Tro12] Joel A. Tropp. User-friendly tail bounds for sums of random matrices. Foundations of Computational Mathematics, 12(4):389–434, 2012.

[UV13a] J. Unnikrishnan and M. Vetterli. Sampling and reconstruction of spatial fields using mobile sensors. Signal Processing, IEEE Transactions on, 61(9):2328–2340, 2013.

[UV13b] Jayakrishnan Unnikrishnan and Martin Vetterli. Sampling high-dimensional bandlimited fields on low-dimensional manifolds. Information Theory, IEEE Transactions on, 59(4):2103–2127, 2013.

[WJP+09] Yves Wiaux, Laurent Jacques, Gilles Puy, Anna M. M. Scaife, and Pierre Vandergheynst. Compressed sensing imaging techniques for radio interferometry.