Non-uniform recovery guarantees for binary measurements and infinite-dimensional compressed sensing
NNON-UNIFORM RECOVERY GUARANTEES FOR BINARY MEASUREMENTS ANDINFINITE-DIMENSIONAL COMPRESSED SENSING
L. THESING AND A. C. HANSENA
BSTRACT . Due to the many applications in Magnetic Resonance Imaging (MRI), Nuclear Magnetic Resonance(NMR), radio interferometry, helium atom scattering etc., the theory of compressed sensing with Fourier trans-form measurements has reached a mature level. However, for binary measurements via the Walsh transform, thetheory has been merely non-existent, despite the large number of applications such as fluorescence microscopy,single pixel cameras, lensless cameras, compressive holography, laser-based failure-analysis etc. Binary mea-surements are a mainstay in signal and image processing and can be modelled by the Walsh transform and Walshseries that are binary cousins of the respective Fourier counterparts. We help bridging the theoretical gap by pro-viding non-uniform recovery guarantees for infinite-dimensional compressed sensing with Walsh samples andwavelet reconstruction. The theoretical results demonstrate that compressed sensing with Walsh samples, as longas the sampling strategy is highly structured and follows the structured sparsity of the signal, is as effective asin the Fourier case. However, there is a fundamental difference in the asymptotic results when the smoothnessand vanishing moments of the wavelet increase. In the Fourier case, this changes the optimal sampling patterns,whereas this is not the case in the Walsh setting.
1. I
NTRODUCTION
Since Shannon’s classical sampling theorem [50, 54], sampling theory has been a widely studied fieldin signal and image processing. Infinite-dimensional compressed sensing [4, 9, 18, 36, 37, 48, 49] is partof this rich theory and offers a method that allows for infinite-dimensional signals to be recovered fromundersampled linear measurements. This gives a non-linear alternative to other methods like generalizedsampling [2, 5, 6, 8, 32, 34, 41] and the Parametrized-Background Data-Weak (PBDW)-method [13, 14, 24,42–44] that reconstruct infinite-dimensional objects from linear measurement. However, these methods donot allow for subsampling, and hence are dependent on consecutive samples of, for example, the Fouriertransform. Infinite-dimensional compressed sensing, on the other hand, is similar to generalized samplingand the PBDW-method, but utilises an (cid:96) optimisation problem that allows for subsampling.Beside the typical flagship of modern compressed sensing, namely MRI [31, 40], there is also a myriad ofother applications, like fluorescence microscopy [47,51], single pixel cameras [29], medical imaging deviceslike computer tomography [19], electron microscopy [38], lensless cameras [35], compressive holography[20] and laser-based failure-analysis [52] among others. The applications divide themselves in three differentgroups: those that are modelled by Fourier measurements, those that are based on the Radon transform, andthose that are represented by binary measurements. For Fourier measurements there exists a large history ofresearch, however for Radon and binary measurements, the theoretical results are scarce. In this paper weconsider binary measurements and provide the first non-uniform recovery guarantees for infinite-dimensionalcompressed sensing.The setup of infinite-dimensional compressed sensing is as follows. We consider an orthonormal basis { ϕ j } j ∈ N of a Hilbert space H and an element f = ∑ j ∈ N x j ϕ j ∈ H , x j ∈ C , Key words and phrases.
Sampling theory, compressed sensing, recovery guarantees, structured sparsity, Wavelets, Walsh functions,Hadamard transform, Hilbert spaces, 42C10, 42C40, 94A12, 94A20. a r X i v : . [ c s . I T ] S e p L. THESING AND A. C. HANSEN to be recovered from linear measurements. That is, we have another orthonormal basis { ω i } i ∈ N of H and wecan access the linear measurements given by l i ( f ) = ⟨ f, ω i ⟩ . Although the Hilbert space can be arbitrary,we will in applications mostly consider function spaces. Hence, we will often refer to the object f as well asthe basis elements as functions. We call the functions ω i , i ∈ N sampling functions and the space spanned bythem S = span { ω i ∶ i ∈ N } sampling space. Accordingly, ϕ j , j ∈ N are called reconstruction functions and R = span { ϕ j ∶ j ∈ N } reconstruction space. Generalized sampling [2–4] and the PBDW-method [44] use thechange of basis matrix U = { u i,j } i,j ∈ N ∈ B( (cid:96) ( N )) with u i,j = ⟨ ϕ i , ω j ⟩ to find a solution to the problem ofreconstructing coefficients in the reconstruction space from measurements in the sampling space. This is alsothe case in infinite-dimensional compressed sensing. In particular, we consider the following reconstructionproblem . For a fixed signal f = ∑ j x j ϕ j and the measurements g = P Ω U f + z , where Ω ⊂ { , , . . . , N r } is the subsampling set, P Ω the orthogonal projection onto the elements indexed by Ω and ∣∣ z ∣∣ ≤ δ someadditional noise. The reconstruction problem is to find a minimiser of(1.1) min ξ ∈ (cid:96) ( N ) ∥ ξ ∥ subject to ∥ P Ω U ξ − g ∥ ≤ δ.
2. P
RELIMINARIES
Setting and Definitions.
In this section we recall the settings from [9] that are needed to establish themain results. First, note that we will use a ≲ b to describe that a is smaller b modulo a constant, i.e. thereexists some C > such that a ≤ Cb . Moreover, for a set Ω ⊂ N the orthogonal projection corresponding tothe elements of the canonical bases of (cid:96) ( N ) with the indices of Ω is denoted by P Ω . Similar, for N ∈ N the orthogonal projection onto the first N elements of the canonical basis of (cid:96) ( N ) is represented by P N .Finally, P ba stands for the orthogonal projection onto the basis vectors related to the indices { a + , . . . , b } .Note that (1.1) is an infinite-dimensional optimisation problem, however, in practice (1.1) is replaced by min ξ ∈ (cid:96) ( N ) ∥ ξ ∥ subject to ∥ P Ω U P N ξ − g ∥ ≤ δ. As N → ∞ one recovers minimisers of (1.1) (see [4] for details).X-lets such as wavelets [45], Shearlets [22, 23] and Curvelets [15–17] yield a specific sparsity structureaccording to their level structure. To describe this phenomena the notation of ( s , M ) -sparsity is introducedinstead of pure sparsity. Definition 2.1 ( [9]) . Let x ∈ (cid:96) ( N ) . For r ∈ N let M = ( M , . . . , M r ) ∈ N with ≤ M < . . . < M r and s = ( s , . . . , s r ) ∈ N r , with s k ≤ M k − M k − , k = , . . . , r where M = . We say that x is ( s , M ) -sparse if,for each k = , . . . , r , ∆ k ∶= supp ( x ) ∩ { M k − + , . . . , M k } , satisfies ∣ ∆ k ∣ ≤ s k . We denote the set of ( s , M ) - sparse vectors by Σ s , M . The majority of natural signals is not perfectly sparse but instead has a small tail in the representationsystem. Hence, in a large number of applications it is unlikely to ask for sparsity but compressibility.
Definition 2.2 ( [9]) . Let f = ∑ j ∈ N x j ϕ j , where x = ( x j ) j ∈ N ∈ (cid:96) ( N ) . We say that f is ( s , M ) - compressiblewith respect to { ϕ j } j ∈ N if σ s , M ( f ) is small, where σ s , M ( f ) = min η ∈ Σ s , M ∣∣ x − η ∣∣ . In terms of this more detailed description of the signal instead of classical sparsity it is possible to adapt thesampling scheme accordingly. Complete random sampling will be substituted by the setting of multilevelrandom sampling. This allows us later to treat the different levels separately. Moreover, this representssampling schemes that are used in practice.
Definition 2.3 ( [9]) . Let r ∈ N , N = ( N , . . . , N r ) ∈ N r with ≤ N < . . . < N r , m = ( m , . . . , m r ) ∈ N r ,with m k ≤ N k − N k − , k = , . . . , r , and suppose that Ω k ⊂ { N k − + , . . . , N k } , ∣ Ω k ∣ = m k , k = , . . . , r, are chosen uniformly at random, where N = . We refer to the set Ω = Ω N , m = Ω ∪ . . . ∪ Ω r as an ( N , m ) - multilevel sampling scheme.Remark . To avoid pathological examples we assume as in [9] that we have for the total sparsity s = s + . . . s r ≥ . This results in the fact that log ( s ) ≥ and therefore also m k ≥ for all k = , . . . , r .3. M AIN RESULTS : N ON - UNIFORM RECOVERY FOR THE W ALSH - WAVELET CASE
In this paper we focus on the reconstruction from binary measurements. This arises naturally in exampleslike those mentioned in the introduction and applications where the sampling is performed with an apparatusthat has an ”on-off”-behaviour. We focus on the setting of recovering data in L ([ , ]) , however the theorybuilds on general results for Hilbert spaces as presented in §2. Linear measurements are typically representedby inner products between sampling functions and the data of interest. Binary measurements can be repre-sented with functions that take values in { , } , or, after a well known and convenient trick of subtractingconstant one measurements, with functions that take values in {− , } . For practical reasons it is sensible toconsider functions that provide fast transforms. Additionally, the function system should correspond well tothe reconstruction space. For the reconstruction with wavelets, Walsh functions have proven to be a sensiblechoice, and are discussed in more detail in §3.2.1. Sampling from binary measurements has been analysedfor the linear case in [12, 33, 53] and in the non-linear case in [1, 46]. We extend this results to the non-uniform recovery guarantees in the non-linear case. By filling this gap we gain broad knowledge about linearand non-linear reconstruction for two of the three main measurement systems: Fourier and binary.Let(3.1) U = { u i,j } i,j ∈ N ∈ B( (cid:96) ( N )) , u i,j = ⟨ ϕ i , ω j ⟩ , where { ϕ j } j ∈ N denotes the Walsh functions on [ , ] as described in §3.2.1, and { ω i } i ∈ N demotes the Daube-cies boundary wavelets on [ , ] described in §3.2.2. We are now able to state the recovery guarantees forthe Walsh-wavelet case. Theorem 3.1 (Main theorem) . Given the notation above, let N = ( N , . . . , N r ) define the sampling levelsas in (3.7) and M = ( M , . . . , M r ) represent the levels of the reconstruction space as in (3.6) . Consider U as in (3.1) , (cid:15) > and let Ω = Ω N,m be a multilevel sampling scheme such that the following holds:(1) Let N = N r , K = max k = ,...,r { N k − N k − m k } , M = M r , s = s + . . . + s r such that (3.2) N ≳ M ⋅ log ( M K √ s ) . (2) For each k = , . . . , r , m k ≳ log ( (cid:15) − ) log ( K s / N ) ⋅ N k − N k − N k − ⋅ ( r ∑ l = −∣ k − l ∣/ s l ) (3.3) Then with probability exceeding − s(cid:15) , any minimizer ξ ∈ (cid:96) ( N ) of (1.1) satisfies ∥ ξ − x ∥ ≤ c ⋅ ( δ √ K ( + L √ s ) + σ s,M ( f )) , for some constant c , where L = c ⋅ ( + √ log ( (cid:15) − ) log ( KM √ s ) ) . If m k = N k − N k − for ≤ k ≤ r then this holds withprobability . L. THESING AND A. C. HANSEN
This results allows one to exploit the asymptotic sparsity structure of most natural images under thewavelet transform. It was observed in [9] that the ratio of non-zero coefficients per level decreases very fastwith increasing level and at the same time the level size increases. Hence, most images are not that sparsein the first levels and sampling patterns should adapt to that. However, they are very sparse in the higherlevels. Therefore, we get that the number of measurements depends mainly on the sparsity in this level andthe influence of the sparsity in the other levels decays exponentially.
Remark . For awareness of potential extensions of this work to higher dimensions or other reconstructionand sampling spaces we kept the factor ( N k − N k − )/ N k − in (3.3). However, for the Walsh-wavelet case inone dimension this factor reduces to , when the values from Equation (3.7) are used. Hence, the Equation(3.3) can be further simplified to m k ≳ log ( (cid:15) − ) log ( K s / N ) ⋅ ( r ∑ l = −∣ k − l ∣/ s l ) , however, in general one needs the factor ( N k − N k − )/ N k − . Remark . We would like to highlight the differences to the Fourier-wavelet case, i.e. to Theorem 6.2.in [9]. The most striking difference is the squared factor in (3.2). In the Fourier-wavelet case this is dependenton the smoothness of the wavelet and shown to be N ≳ M + /( α − ) ⋅ ( log ( M K √ s )) /( α − ) , where α denotes the decay rate under the Fourier transform, i.e. the smoothness of the wavelet. For very smoothwavelets this can be improved to N ≳ M ⋅ ( log ( M K √ s )) /( α − ) . Hence, for very smooth wavelets we get the optimal linear relation, beside log terms. However, for non-smooth wavelets like the Haar wavelet, we get a squared relation instead of linear. The reason why we donot observe a similar dependence on the smoothness in terms of the sampling relation is that smoothness ofa wavelet does not relate to a faster decay under the Walsh transform. This is also related to the fact that forFourier measurements (3.3) become m k ≳ log ( (cid:15) − ) ⋅ log ( ˜ N ) ⋅ N k − N k − N k − ⋅ ⎛⎝ ˆ s k + k − ∑ l = s l ⋅ −( α − / ) A k,l + r ∑ l = k + s l ⋅ − vB k,l ⎞⎠ , (3.4)where A k,l and B k,l are positive numbers, ˜ N = ( K √ s ) + / v N , where v denotes the number of vanishingmoments, and ˆ s k = max { s k − , s k , s k + } . In particular, smoothness and vanishing moments of the waveletdoes have an impact in the Fourier case, but not in the Walsh case. This is also confirmed in Figure 1, wherewe have plotted the absolute values of sections of U , where U is the infinite matrix from (3.1). As can beseen in Figure 1, the matrix U gets more block diagonal in the Fourier case with more vanishing momentsconfirming the dependence of α and v in (3.4). Note that for a completely block diagonal matrix U the m k in (3.4) will only depend on s k and not any of the s l when l ≠ k . In contrast this effect is not visible in theWalsh situation suggesting that the estimate in (3.3) captures the correct behaviour by not depending on α and v . The reason why is that a function needs to be smooth in the dyadic sense to have a faster decay rateunder the Walsh transform. However, this is not related to classical smoothness. Finally, numerical examplesin §5 suggest that the squared relation in (3.2) is not sharp and is also possible to reconstruct images with areduced relation between the maximal sample and the maximal reconstructed coefficient.3.1. Connection to related work.
Reconstruction methods are mainly divided in two major classes of lin-ear and non-linear methods. For the linear case generalized sampling [2] and the
PBDW-method [44] areprominent examples. Preceding to the first one consistent sampling was investigated by Aldroubi, Eldar,Unser and others [11, 25–28, 55]. Then generalized sampling has been studied by Adcock, Hansen, Hrycak,Gr¨ochenig, Kutyniok, Ma, Poon, Shadrin in [2, 5, 6, 8, 32, 34, 41]. The PBDW-method evolved from the ( A ) Haar wavelets (Walsh) ( B ) vanish. mom. (Walsh) ( C ) vanish. mom. (Walsh)( D ) Haar wavelets (Fourier) ( E ) vanish. mom. (Fourier) ( F ) vanish. mom. (Fourier) F IGURE
1. Absolute values of P N U P N with N = , where U is the infinite matrixfrom (3.1), with Daubechies wavelets with different numbers of vanishing moments, andWalsh (upper row) and Fourier measurements (lower row). In the Fourier case, U becomesmore block diagonal as smoothness and the number of vanishing moments increase. Thisis not the case in the Walsh setting, suggesting that the non-dependence of smoothness andvanishing moments in the estimate (3.3) is correct.work of Maday, Patera, Penn and Yano in [43] first under the name generalized empirical interpolationmethod . This was then further analysed and extended to the PBDW-method by Binev, Cohen, Dahmen, De-Vore, Petrova, and Wojtaszczyk [13, 14, 24, 42, 44]. The stability and accuracy of both methods is securedby the stable sampling rate (SSR) which controls the number of samples needed for a stable and accuratereconstruction of a certain number of coefficients in the reconstruction space. It was shown that the SSR islinear for the Fourier-wavelet [7], Fourier-shearlet [41] and Walsh-wavelet case [33]. However, this is notalways the case as for the Fourier-polynomial situation [34]. In the non-linear setting the most prominentreconstruction technique is infinite-dimensional compressed sensing [18] as analysed in the Fourier case byAdcock, Hansen, Kutyniok, Lim, Poon and Roman [4, 9, 37, 48, 49]. There exists wide spread knowledge inthis area. For the Fourier wavelet case we know uniform recovery guarantees [39] and non-uniform recoveryguarantees [9, 10]. For Walsh measurements we have uniform recovery guarantees from Adcock, Antunand Hansen [1] and an analysis for variable and multilevel density sampling strategies for the Walsh-Haarcase and finite-dimensional signals by Moshtaghpour, Dias and Jacques in [46]. In this paper we present thenon-uniform results for the Walsh-wavelet case as has been studied for the Fourier case in [9, 10].3.2. Sampling and Reconstruction space.
Sampling Space.
We start with the sampling space. To model binary measurements Walsh functionshave proven to be a good choice. They behave similar to Fourier measurements with the difference that theywork in the dyadic rather than the decimal analysis. They also have an increasing number of zero crossing.This leads to the fact that the change of basis matrix gets a block diagonal structure, as can be seen in Figure
L. THESING AND A. C. HANSEN
1. One can exploit the asymptotic sparsity and incoherence. However, the fact that they are defined in thedyadic analysis leads to some difficulties and specialities in the proof.Let us now define the Walsh functions, which form the kernel of the Hadamard matrix. Then we proceedwith their properties and the definition of the Walsh transform.
Definition 3.4 (Walsh function) . Let n = ∑ i ∈ Z n i i − with n i ∈ { , } be the dyadic expansion of n ∈ R + .Analogously, let x = ∑ i ∈ Z x i i − with x i ∈ { , } . The generalized Walsh functions in L ([ , ]) are givenby Wal ( n, x ) = (− ) ∑ i ∈ Z ( n i + n i + ) x − i − . This definition can also be extended to negative inputs by
Wal (− n, x ) = Wal ( n, − x ) = − Wal ( n, x ) .Walsh functions are one-periodic in the second input if the first one is an integer. Moreover, the definition isextended to arbitrary inputs n ∈ R instead of the more classical definition for n ∈ N . We would like to makethe reader aware of different orderings of the Walsh functions. The one presented here is the Walsh-Kaczmarz ordering. It is ordered in terms of increasing number of zero crossings. This has the advantage that it relatesnicely with the scaling of wavelets. Two other possible orderings are
Walsh-Paley and
Walsh-Kronecker .Both have the drawback that the number of zero crossings is not increasing. Therefore, we are not able to getthe block diagonal structure in the change of basis matrix. The Walsh Kronecker ordering is also not oftenused in practice because one has to predefine the largest input of n and dependent on this value the orderingis changing, i.e. there is a third input n max which also leads to changes.For the sampling pattern we divide the sequency parameter n into blocks of size j with j ∈ N . Thisresults in an insightful relationship between the wavelets and the Walsh functions. Additionally, it is directlyrelated to the block structure observed in numerical experiments.After the small excursion on orderings we now define the sampling space in one dimension by S = span { Wal ( n, ⋅) , n ∈ N } . In general it is not possible to acquire or save an infinite number of samples. Therefore, we restrict ourselvesto the sampling space according to Ω N,m , i.e. S Ω N,m = span { Wal ( n, ⋅) , n ∈ Ω N,m } . The Walsh functions obey some interesting properties: the scaling property , i.e.
Wal ( j n, x ) = Wal ( n, j x ) for all j ∈ N and n, x ∈ R and the multiplicative identity , i.e. Wal ( n, x ) Wal ( n, y ) = Wal ( n, x ⊕ y ) , where ⊕ is the dyadic addition. With the Walsh functions we are able to define the continuous Walsh transformalmost everywhere: f ⋀ W ( n ) = ⟨ f (⋅) , Wal ( n, ⋅)⟩ = ∫ [ , ] f ( x ) Wal ( n, x ) dx, n ∈ R . The properties from the Walsh functions are easily transferred to the Walsh transform. We state now somemore statements about the Walsh functions and the Walsh transform, which are necessary for the main proof.
Lemma 3.5 ( [33]) . Let t ∈ N and x ∈ [ , ) , then the following holds: W { f ( x + t )} ( s ) = W { f ( x )} ( s ) Wal ( t, s ) . Remark that this only holds because x and t do not have non-zero elements in their dyadic representationat the same spot and therefore the dyadic addition equals the decimal addition. Next, we consider Walshpolynomials and see how we can relate the sum of squares of the polynomial to the sum of squares of itscoefficients. Definition 3.6 ( [33]) . Let
A, B ∈ Z such that A ≤ B and α j ∈ R . Then for z ∈ R + we define the Walshpolynomial of order n = ∣ B ∣ by Φ ( z ) = ∑ Bj = A α j Wal ( j, z ) . The set of all Walsh polynomials up to degree n is given by W P n = ⎧⎪⎪⎨⎪⎪⎩ B ∑ j = A α j Wal ( j, z ) , α j ∈ R , A, B ∈ Z , ∣ B ∣ ≤ n ⎫⎪⎪⎬⎪⎪⎭ . Lemma 3.7 ( [33]) . Let
A, B ∈ Z such that A ≤ B and consider the Walsh polynomial Φ ( z ) = ∑ Bj = A α j Wal ( j, z ) for z ∈ R + . If L = n , n ∈ N such that L ≥ B − A + , then L − ∑ j = L ∣ Φ ( j L )∣ = B ∑ j = A ∣ α j ∣ . In the proof we will combine the shifts in the wavelet in a Walsh polynomial. With this lemma at handthis is then easily bounded.3.2.2.
Reconstruction Space.
Next, we define the reconstruction space. As we are mainly interested in thereconstruction of images and audio signals, we use Daubechies wavelets. They provide good smoothnessand support properties. Moreover, they obey the Multi-resolution analysis (MRA). The wavelet space isdescribed by the wavelet ψ at different levels and shifts ψ j,m ( x ) = j / ψ ( j x − m ) for j, m ∈ N , i.e. wehave the wavelet space at level j W j ∶= span { ψ j,m , m ∈ N } . They build a representation system for L ( R ) , i.e. ⋃ j ∈ N W j = L ( R ) . For the MRA we define also thesampling function φ and the according sampling space V j = span { φ j,m , m ∈ N } , where φ j,m ( x ) = j / φ ( j x − m ) . We then have that V j = V j − ⊕ W j − and L ( R ) = closure { V J ⊕ ⋃ j ≥ J W j } .The Daubechies scaling function and wavelet have the same smoothness properties. This allows us to dealwith them interchangeably, as we only need the decay rate under the Walsh transform for the analysis.However, the classical definition of Daubechies wavelets has a large drawback for our setting. Normally,they are defined on the whole line R . Due to the fact that Walsh functions are defined on [ , ] it is necessaryto restrict the wavelets also to [ , ] . Otherwise there will be elements in the reconstruction space whichare not in the sampling space and therefore the solution could not be unique. Hence, we are using boundarycorrected wavelets (§4 [21]). In [33] this problem of the relation between the reconstruction and samplingspace is discussed in more length and we refer the interested reader.For the definition of boundary wavelets we have to correct those that intersect with the boundary. We startwith the definition of the scaling space and continue with the wavelet space. Let φ be the scaling function oforder p with support in [− p + , p ] . First consider the lowest level J such that the scaling functions do onlyintersect with one boundary or , i.e. J ≥ p − . Then we can keep the interior scaling functions. Theexterior ones are changed according to [21] and are denoted by φ bJ ,m ( x ) = ⎧⎪⎪⎪⎪⎪⎪⎪⎨⎪⎪⎪⎪⎪⎪⎪⎩ φ J ,m ( x ) m = p, . . . , j − p − φ left J ,m ( x ) m = , . . . , p − φ right J ,m ( x ) m = j − p, . . . j − . The left and right functions still have the same smoothness properties and staggered support, such that thenew system has the same properties as before. Additionally, it was proved in [21] that V j can be spanned bythe scaling function and it translates and the reflected version φ ( x ) = φ (− x + ) , i.e. V j = span { φ bj,m , m = , . . . , j − p − , φ j,m , m = j − p, . . . , j − } . L. THESING AND A. C. HANSEN
The new system still obeys the MRA hence the boundary wavelets can be deduced from the boundary cor-rected scaling functions. Fortunately, we only need the smoothness properties of the wavelet. The boundarywavelet will be denoted by ψ b and ψ bj,m ( x ) = j / ψ b ( j x − m ) . Because the smoothness properties are alsokept after the boundary correction, we do not get into the details about the construction of the wavelet. Theinterested reader should seek out for [21] for a detailed analysis.With this information at hand we can now have a look at the reconstruction space. To analyse L ([ , ]) we represent the low frequency part by the scaling space at J and the higher frequency with the waveletspaces with j ≥ J . Hence, the reconstruction space is given by R = { φ bJ ,m , m = , . . . , J − p − , φ J ,m , m = J − p, . . . , J − , (3.5) ψ bj,m , j ≥ J , m = , . . . , j − } . In the finite-dimensional setting we only consider wavelets up to a certain scale R = log ( N ) we denote thespace of the first N = R elements by R N = { φ J ,m , m = , . . . , J − p − , φ J ,m , m = J − p, . . . , J − ,ψ bj,m , j = J , . . . R − , m = , . . . , j − } . Remark . We consider here only the case of Daubechies wavelets of order p ≥ . The theory also holdsfor the case for order p = , . Nevertheless, we get unpleasant exponents α depending on the wavelet anddifferent cases to consider. For the Haar wavelet, we can get even better estimates due to the perfect blockstructure of the change of basis matrix in that case. A detailed analysis of the relation between Haar waveletsand Walsh functions can be found in [53] and we discuss the recovery guarantees for this special case in§4.4.For the future analysis we want to get the shift in the wavelets transferred to the Walsh function. For thissake we use Lemma 3.5. However, in the assumptions we have that t ∈ N and x ∈ [ , ] . Due to the largersupport of the wavelets this does not hold true, i.e. − R ( n + p ) ∫ − R ( n − p + ) R / φ ( R x − n ) Wal ( k, x ) dx = − R / p ∫ − p + φ ( x ) Wal ( k, − R ( x + n )) dx ≠ − R / p ∫ − p + φ ( x ) Wal ( k, − R ( x ⊕ n )) dx. Therefore, we have to separate the wavelets into parts which have support in [ , ] . Remark that this is nota contradiction to the construction of the boundary wavelets. They are indeed supported in [ , ] . However,only from the beginning of the scaling J and not the mother scaling function. Therefore, we represent themother scaling function as follows φ ( x ) = p ∑ i =− p + φ i ( x − i + ) with φ i ( x ) = φ ( x + i − )X [ , ] ( x ) and φ R,n = R / p ∑ i =− p + φ i ( R x − i + − n ) . This can also be done accordingly for the reflected function φ . More detailed information about thisproblem can be found in [33]. ( A ) Cancelation bythe wavelet ( B ) Addition due tothe same frequency ( C ) Cancelationdue to the Walshfunction F IGURE
2. Intuition for block diagonal structure of the change of basis matrix3.2.3.
Ordering.
We are now discussing the ordering of the sampling and reconstruction space. We orderthe reconstruction space according to the levels, as in (3.5). With this we get the multilevel subsamplingscheme with the level structure. For this sake, we bring the scaling function at level J and the wavelet atlevel J together into one block of size J + . The next level constitutes of the wavelets of order J + ofsize J + and so forth. Therefore, we define(3.6) M = ( M , M , . . . , M r ) = ( , J + , J + , . . . , J + r ) to represent the level structure of the reconstruction space. For the sampling space we use the same partition.We only allow by the choice of q ≥ oversampling. Let(3.7) N = ( N , N , . . . , N r − , N r ) = ( , J + , J + , . . . , J + r − , J + r + q ) . We then get for the reconstruction matrix U in (3.1) with u i,j = ⟨ ϕ i , ω j ⟩ that ω j ( x ) = Wal ( j, x ) and for thefirst block we have ϕ i = φ J ,i for i = , . . . , J − p − and ϕ i = φ J ,i for i = J − p, . . . , J − . For thenext levels, i.e. for i ≥ J we get for l with l ≤ i < l + and m = i − l that ϕ i = ψ bl,m .The proof of the main theorem relies mainly on the analysis of the change of basis matrix. Numericalexamples and rigour mathematics [53] show that it is perfectly block diagonal for the Walsh-Haar case.And it is also close to block diagonality for other Daubechies wavelets, which can be seen in Figure 1.An intuition about this phenomena is given in Figure 2. We plotted Haar wavelets at different scales withWalsh functions at different sequencies. In 2a the scaling of the Haar wavelet is higher than the sequency ofthe Walsh function. Therefore, the Walsh function does not change the wavelet on its support and hence itintegrates to zero. The next one 2b shows a wavelet and Walsh function at similar scale and sequency whichrelates to parts of the change of basis function in the inner block. Here the two functions add up nicely to geta non-zero output. Last, we have in 2c that the Walsh functions oscillate faster then the wavelet and hencethe Walsh function is not disturbed by the wavelet and can integrate to zero.4. P ROOF OF THE M AIN R ESULT
Preliminaries.
It is important to make sure that the uneven finite sections of the change of basis matrixare close to an isometry. In the finite-dimensional setting this is assured by the stable sampling rate. Detailedanalysis about the stable sampling rate for Walsh functions can be found in [33]. The analysis for Fouriermeasurements is conducted in [7, 30, 41]. For the infinite case the balancing property controls the relationbetween the number of samples and reconstructed coefficients, such that the matrix P N U P M is close to anisometry. Definition 4.1 ( [4]) . Let U ∈ B( (cid:96) ( N )) be an isometry. Then N ∈ N and K ≥ satisfy the strong balancingproperty with respect to U, M ∈ N and s ∈ N if ∣∣ P M U ∗ P N U P M − P M ∣∣ (cid:96) ∞ → (cid:96) ∞ ≤ ( log / ( √ sKM )) − , ∣∣ P ⊥ M U ∗ P N U P M ∣∣ (cid:96) ∞ → (cid:96) ∞ ≤ , where ∣∣ ⋅ ∣∣ (cid:96) ∞ → (cid:96) ∞ is the norm on B( (cid:96) ∞ ( N )) . In this setting we use the notation as in [9]. Let ˜ M = min { i ∈ N ∶ max k ≥ i ∣∣ P N U e k ∣∣ ≤ K √ s } . In the rest of the analysis we are interested in the number of samples needed for stable and accuraterecovery. This value depends besides known values on the local coherence and the relative sparsity whichare defined next. We start with the (global) coherence.
Definition 4.2 ( [9]) . Let U = ( u i,j ) Ni,j = ∈ C N × N be an isometry. The coherence of U is µ ( U ) = max i,j = ,...,N ∣ u i,j ∣ With this it is possible to define the local coherence for every level band.
Definition 4.3 ( [9]) . Let U ∈ B( (cid:96) ( N )) be an isometry. The ( k, l ) th local coherence of U with respect to M , N is given by (4.1) µ N , M ( k, l ) = √ µ ( P N k − N k U P M l − M l ) ⋅ µ ( P N k − N k U ) , k, l = , . . . , r. We also define (4.2) µ N , M ( k, ∞) = √ µ ( P N k − N k U P ⊥ M r − ) ⋅ µ ( P N k − N k U ) . Definition 4.4 ( [9]) . Let U ∈ B( (cid:96) ( N )) be an isometry and s = ( s , . . . , s r ) ∈ N r and ≤ k ≤ r the k th relative sparsity is given by S k = S k ( N , M , s ) = max η ∈ Θ ∣∣ P N k − N k U η ∣∣ , with Θ = { η ∶ ∣∣ η ∣∣ ∞ ≤ , ∣ supp ( P M l − M l η )∣ = s l , l = , . . . r } . After clarifying the notation and settings we are now able to state the main theorem from [9].
Theorem 4.5 ( [9]) . Let U ∈ B( (cid:96) ( N )) be an isometry and x ∈ (cid:96) ( N ) . Suppose that Ω = Ω N,m is amultilevel sampling scheme, where N = ( N , . . . , N r ) ∈ N r and m = ( m , . . . , m r ) ∈ N r . Let ( s , M ) , where M = ( M , . . . , M r ) ∈ N r , M < . . . < M r and s = ( s , . . . , s r ) ∈ N r , be any pair such that the followingholds:(1) The parameters N = N r , K = max k = ,...,r { N k − N k − m k } , satisfy the strong balancing property with respect to U, M ∶= M r and s ∶= s + . . . + s r ;(2) For (cid:15) ∈ ( , e − ] and ≤ k ≤ r , (4.3) ≳ N k − N k − m k ⋅ log ( (cid:15) − ) ( r ∑ l = µ N , M ( k, l ) s l ) ⋅ log ( K ˜ M √ s ) , (with µ N , M ( k, r ) replaced by µ N , M ( k, ∞) ) and m k ≳ ˆ m k log ( (cid:15) − ) log ( K ˜ M √ s ) , where ˆ m k issuch that (4.4) ≳ r ∑ k = ( N k − N k − ˆ m k − ) ⋅ µ N , M ( k, l ) ˜ s k , for all l = , . . . , r and all ˜ s , . . . , ˜ s r ∈ ( , ∞) satisfying ˜ s + . . . + ˜ s r = s + . . . + s r , ˜ s k ≤ S k ( N , N , s ) . Suppose that ξ ∈ (cid:96) ( N ) is a minimizer of (1.1) . Then, with probability exceeding − s(cid:15) , ∣∣ ξ − x ∣∣ ≤ c ⋅ ( δ ⋅ √ K ⋅ ( + L ⋅ √ s ) + σ s,M ( f )) for some constant c and L = c ⋅ ( + √ log ( (cid:15) − ) log ( KM √ s ) ) . If m k = N k − N k − for ≤ k ≤ r then this holds withprobability . It is a mathematical justification to use structured sampling schemes in contrast to the first compressedsensing results which promoted the use of random sampling masks.4.2.
Key estimates.
In this chapter we discuss the important estimates that are needed for the proof ofTheorem 3.1. They are also interesting for themselves and allow a deeper understanding of the relationbetween Walsh functions and wavelets.4.2.1.
Local coherence estimate.
We start with restating the results about the decay rate of wavelets underthe Walsh transform.
Lemma 4.6 ( [12]) . Let f be a H¨older continuous function of order α ≥ . Then the constant C f = sup t ∈[ , ] ∣ f ′ ( t )∣ exists and we have that ∣ f ⋀ W ( n )∣ ≤ C f n . This leads directly to the following estimate of the decay rate of wavelets under the Walsh transform.
Corollary 4.7.
Let φ be the mother scaling function of order p ≥ and φ be its reflected version. Moreover,let ψ be the corresponding mother wavelet. Then we have that ∣ φ i ⋀ W ( n )∣ ≤ C φ n , ∣ φ i ⋀ W ( n )∣ ≤ C φ n and ∣ ψ ⋀ W ( n )∣ ≤ C ψ n . We denote by C φ,ψ the maximum of { C φ , C φ , C ψ , } .Proof. The corollary follows directly from the H¨older continuity of the wavelet. (cid:3)
This decay rate is important in a lot of the following proofs. Next, we use it to estimate µ ( P N k − N k U ) . Lemma 4.8.
Let U be the change of basis matrix for the boundary Daubechies wavelets and Walsh functions.Moreover, let M and N be defined by (3.6) and (3.7) . Then we have that µ ( P N k − N k U ) ≤ C φ,ψ N k − . Proof.
The proof follows the lines in [9] and uses the decay estimates in Corollary 4.7. First, we have that µ ( P N k − N k U ) ≤ µ ( P ⊥ N k − U ) . Then we get using the arguments in Theorem 7.15 (ii) in [9] and the tensor product structure µ ( P ⊥ N U ) ≤ max k ≥ N max ϕ ∈R ∣⟨ ϕ, Wal ( k, ⋅)⟩∣ ≤ max k ≥ N max R ∈ N C φ,ψ R ( + k / R ) ≤ max R ∈ N C φ,ψ R ( + N / R ) ≤ C φ,ψ N .
This gives together with the first estimate the desired result. (cid:3)
Now, we recall the result from [12] about the local coherence. Note that the local coherence has a differentdefinition in [9] and [12].
Theorem 4.9 ( [12]) . Let U be the change of basis matrix for Walsh functions and boundary wavelets oforder p ≥ and minimal wavelet decomposition J . Moreover, let M and N as in (3.6) and (3.7) . Then let ˜ µ N , M ( k, l ) ∶= max {∣ u ij ∣ ∶ i = N k − + , . . . N k , j = M l − + , . . . , M l }= µ ( P N k − N k U P M l − M l ) . We have that ˜ µ N , M ( k, l ) ≤ ⎧⎪⎪⎪⎨⎪⎪⎪⎩ C ˜ µ − J − k + l + k ≥ lC ˜ µ − J − l + k ≤ l for the constant C ˜ µ = p C φ,ψ > independent of k, l . With this two theorems at hand we can now give an estimate for the local coherence.
Corollary 4.10.
Let µ N , M ( k, l ) be as in (4.1) . Then, µ N , M ( k, l ) ≤ C / µ −( J + k − ) ⋅ C φ,ψ − / −∣ k − l ∣/ . Proof.
First, we have that µ ( P N k − N k U P M l − M l ) = ˜ µ N , M ( k, l ) . Let l ≤ k then µ ( P N k − N k U P M l − M l ) = ˜ µ N , M ( k, l ) ≤ C ˜ µ (− J − k − l + ) . and similar for l > k µ ( P N k − N k U P M l − M l ) = ˜ µ N , M ( k, l ) ≤ C ˜ µ −( J + l − ) . Combining this with the result in Lemma 4.8 we get again first for l ≤ kµ N , M ( k, l ) ≤ ( C ˜ µ (− J − k − l + ) ) / ( C φ,ψ − −( J + k − ) ) / = C / µ −( J + k − ) C φ,ψ − / −∣ k − l ∣/ . and µ N , M ( k, l ) ≤ ( C ˜ µ (− J − l + ) ) / ( C φ,ψ − −( J + k − ) ) / = C / µ −( J + k − ) C φ,ψ − / −∣ k − l ∣/ . (cid:3) Because of the infinite-dimensional setting we also have to estimate µ N , M ( k, ∞) from (4.2). This is donein the following Corollary. Corollary 4.11.
Let µ N , M ( k, ∞) as in (4.2) . Then, µ N , M ( k, ∞) ≤ C / µ ⋅ −( J + k − ) C φ,ψ − / −( r − k )/ . Proof.
We have that µ N , M ( k, ∞) = √ µ ( P N k − N k U P ⊥ M r − ) ⋅ µ ( P N k − N k U ) . We know from Lemma 4.8 that µ ( P N k − N k U ) ≤ C / N k − moreover, we have with Theorem 4.9 that µ ( P N k − N k U P ⊥ M r − ) = max j ≥ M r − ∣ u i,j ∣ ≤ ˜ µ N , M ( k, r ) ≤ C ˜ µ ⋅ −( J + r − ) . Hence, we get µ N , M ( k, ∞) = √ µ ( P N k − N k U P ⊥ M r − ) ⋅ µ ( P N k − N k U )= √ µ ( P N k − N k U P ⊥ M r − ) ⋅ √ µ ( P N k − N k U )≤ C / µ ⋅ −( J + r − )/ C φ,ψ − / −( J + k − )/ = C / µ C φ,ψ − / ⋅ −( J + k − ) −( r − k )/ . (cid:3) Note that the same local coherence estimate was found for the Fourier-Haar case in [10].4.2.2.
Relative sparsity estimate.
Now we want to estimate the relative sparsity of the change of basis matrix U in the Walsh-wavelet case. To do so remember √ S k = max η ∈ Θ ∣∣ P N k − N k U η ∣∣ ≤ r ∑ l = ∣∣ P N k − N k U P M l − M l ∣∣ √ s l . Hence, we need to bound ∣∣ P N k − N k U P M l − M l ∣∣ . For this sake we first bound ∣∣ P ⊥ N U P M ∣∣ . Lemma 4.12.
Let U be the change of basis matrix for the Walsh measurements and boundary wavelets oforder p ≥ . Let the number of samples N be larger then the number of reconstructed coefficients M . Thenwe have that ∣∣ P ⊥ N U P M ∣∣ ≤ C rs ⋅ ( MN ) , where C rs = ( p − ) max { C φ , C φ } is dependent on the wavelet.Proof. We start with bounding ∣∣ P ⊥ N U P M ∣∣ . We rewrite it as follows ∣∣ P ⊥ N U P M ∣∣ = sup ϕ ∈R M ∣∣ P ⊥S N ϕ ∣∣ . It is clear that this value gets smaller if N grows in relation to M . However, from a practical perspective itis desirable to take as few samples N with in contrast a large number M . For the further analysis we definethe fraction of these two by S = MN .We include for completeness the intermediate steps, which are similar to the proof of the main theoremin [33]. However, we believe that this allows us to give a deeper understanding. Especially, the constant C rs is interesting to understand and see what impacts its size.We first use the MRA property to rewrite ϕ ∈ R M as the sum of the elements in the related scaling space.Take in mind at this point that we only consider values of M = R . Hence, we only jump from level to level.We get for ϕ ∈ R M with ∣∣ ϕ ∣∣ = ϕ = R − p − ∑ l = α l φ R,n + R − ∑ l = R − p β l φ R,n with R − p − ∑ l = ∣ α n ∣ + R − ∑ l = R − p ∣ β n ∣ = . This reduces the problem of the sum of the inner products for the orthogonal projection from a lot of differentwavelets to shifted scaling function at the same level. We have P ⊥S N ϕ = ∑ k > N ⟨ Wal ( k, ⋅) , ϕ ⟩= ∑ k > N ⟨ Wal ( k, ⋅) , R − p − ∑ l = α l φ R,n + R − ∑ l = R − p β l φ R,n ⟩= ∑ k > N R − p − ∑ l = α l ⟨ Wal ( k, ⋅) , φ R,n ⟩ + ∑ k > N R − ∑ l = R − p β l ⟨ Wal ( k, ⋅) , φ R,n ⟩ . Hence, we start with controlling the inner product ⟨ Wal ( k, ⋅) , φ R,n ⟩ and analogously ⟨ Wal ( k, ⋅) , φ R,n ⟩ . Ouraim is to remove the scaling and the shift from the wavelet and get instead the product between the Walshtransform of the original mother wavelet and a Walsh polynomial. For this we follow the ideas in [33].Remember first, that the mother scaling function is divided into the sum of functions that are supported in [ , ] , i.e. φ ( x ) = p ∑ i =− p + φ i ( x − i + ) with φ i ( x ) = φ ( x + i − )X [ , ] ( x ) and hence ⟨ Wal ( k, ⋅) , φ R,n ⟩ = p ∑ i =− p + ⟨ Wal ( k, ⋅) , φ i,R,n ⟩ . This allows us to only deal with ⟨ Wal ( k, ⋅) , φ i,R,n ⟩ . We get ⟨ Wal ( k, ⋅) , φ i,R,n ⟩ = − R / − R ( n + i ) ∫ − R ( n + i − ) φ i ( R x − n − i + ) Wal ( k, x ) dx = − R / ∫ φ i ( x ) Wal ( k, − R ( x + n + i − )) dx. Next, we use Lemma 3.5 to get the shift out of the integral. We define p R ∶ Z → N to map z to thethe smallest integer with p R ( z ) R + z > . This allows us to use Lemma 3.5 because x ∈ [ , ] and n + i − + R p R ( n + i − ) ∈ N . We get − R / ∫ φ i ( x ) Wal ( k, − R ( x + n + i − )) dx = − R / Wal ( k, − R ( n + i − + R p R ( i − ))) ∫ φ i ( x ) Wal ( k, − R x ) dx = − R / Wal ( n + i − + R p R ( i − ) , k R ) φ i ⋀ W ( k R ) . With this we are able to represent the inner product of every shifted version of φ i,R,n with the Walsh functionas product of the Walsh transform of φ i and a Walsh function which contains the shift information. In thefollowing we want to rewrite the inner products such that we are left with a Walsh polynomial and the Walshtransform of the mother scaling function. For this define Φ i ( z ) = R − p − ∑ n = α n Wal ( n + i − + R p R ( i − ) , z ) and Φ i ( z ) = R − ∑ n = R − p β n Wal ( n + i − + R p R ( i − ) , z ) . We get R − p − ∑ n = α n ⟨ φ i,R,n , Wal ( k, ⋅)⟩ = − R / φ i ⋀ W ( k R ) Φ i ( k R ) and R − ∑ n = R − p β n ⟨ φ i,R,n , Wal ( k, x )⟩ = − R / φ i ⋀ W ( k R ) Φ i ( k R ) . After this evaluation we can go back to estimate the norm of ∣∣ P ⊥ N U P M ∣∣ . We have with the Cauchy-Schwarzinequality ∣∣ P ⊥ N U P M ∣∣ ≤ p ∑ i =− p + ∣∣ P ⊥S N ( R − p − ∑ n = α n φ i,R,n + R − ∑ n = R − p β n φ i,R,n )∣∣ = p ∑ i =− p + ¿```(cid:192) ∑ k > N − R ∣ φ i ⋀ W ( k R ) Φ i ( k R ) + φ i ⋀ W ( k R ) Φ i ( k R )∣ . After multiplying out the brackets we are left with(4.5) ∑ k > N − R ∣ φ i ⋀ W ( k R ) Φ i ( k R )∣ and the analogue for φ as well as their product. Because φ and φ share the same decay rate, it is sufficientto only deal with (4.5) and deduce the rest from it. To estimate these values we use the one-periodicity of theWalsh functions. For this sake let M = R . We always want to reconstruct a full level as we do not know inwhich part of the level the information is located. Then we replace k = mM + j , where j = , . . . , M − and m ≥ S = N / M . This leads to ∑ k ≥ N − R ∣ φ i ⋀ W ( k R ) Φ i ( k R )∣ ≤ M − ∑ j = M ∣ Φ i ( jM )∣ ∑ m ≥ S ∣ φ i ⋀ W ( jM + m )∣ . We estimate with Lemma 4.7 ∑ m ≥ S ∣ φ i ⋀ W ( jM + m )∣ ≤ ∑ m ≥ S C φ m ≤ C φ S .
Here C φ depends on the choice of the wavelet. In contrast to the Fourier case there is no known relationshipbetween the smoothness of the wavelet and the decay rate or the behaviour of C φ , as discussed in Remark3.3.For the first sum we get from technical computations in [33] and Lemma 3.7 that M − ∑ j = M ∣ Φ i ( jM )∣ = R − p + i − + R p R ( i − ) ∑ l =− p + + i − + R p R ( i − ) ∣ α l − i + − R p R ( i − ) ∣ = R − p ∑ n =− p + ∣ α n ∣ ≤ . The analogue holds true for the φ part. Hence, we get together ∣∣ P ⊥ N U P M ∣∣ ≤ p ∑ i =− p + ⎛⎝ ∑ k ≥ M − R ∣ φ i ⋀ W ( k R ) Φ i ( k R )∣ + ∑ k ≥ M − R ∣ φ i ⋀ W ( k R ) Φ i ( k R )∣ + ( ∑ k ≥ M − R ∣ φ i ⋀ W ( k R ) Φ i ( k R )∣ ) / ⎛⎝ ∑ k ≥ M − R ∣ φ i ⋀ W ( k R ) Φ i ( k R )∣ ⎞⎠ / ⎞⎟⎠ / . ≤ p ∑ i =− p + ( C φ S + C φ S + C φ C φ S ) / ≤ S / ( p − ) max { C φ , C φ } / . When we now replace S = N / M and set C rs = ( p − ) max { C φ , C φ } we get ∣∣ P ⊥ N U P M ∣∣ ≤ C rs MN . (cid:3)
With this estimate at hand we can now proof the next lemma.
Lemma 4.13.
Let U be the change of basis matrix given by the Walsh measurements and boundary waveletsof order p ≥ . Then we have that ∣∣ P N k − N k U P M l − M l ∣∣ ≤ C max ⋅ −∣ l − k ∣+ , where C max = max { C ˜ µ , C rs } .Proof. We use similar estimates as in Corollary 4.10. We know from Lemma 4.12 that ∣∣ P ⊥ N U P M ∣∣ ≤ C rs ⋅ ( MN ) , whenever N ≥ M . With this we get for k > l : ∣∣ P N k − N k U P M l − M l ∣∣ ≤ ∣∣ P ⊥ N k − U P M l ∣∣ ≤ C rs ( M l N k − )= C rs ( ( J + l ) ⋅ −( J + k − ) ) = C rs ⋅ ( l − k )+ = C rs ⋅ −∣ l − k ∣+ . For l ≥ k we get max i = N k − + ,...,N k max j = M l − + ,...,M l ∣ u i,j ∣ ≤ C ˜ µ −( J + l − ) . Hence, we conclude ∣∣ P N k − N k U P M l − M l ∣∣ = sup z ∈ C ( J + l − ) , ∣∣ z ∣∣ = N k ∑ i = N k − + M l ∑ j = M l − + ∣ u i,j z j ∣ ≤ sup z ∈ C ( J + l − ) , ∣∣ z ∣∣ = C ˜ µ ⋅ −( J + l − ) N k ∑ i = N k − + M l ∑ j = M l − + ∣ z j ∣ ≤ C ˜ µ ( J + k )−( J + l − ) = C ˜ µ ( k − l )+ = C ˜ µ −∣ k − l ∣+ . (cid:3) With this Lemmas at hand we can now bound the relative sparsity S k ( N , M , s ) . Corollary 4.14.
For the setting as before we have S k ( N , M , s ) ≤ C max r − ∑ l = −∣ k − l ∣/ s l . Proof.
With the estimates from before and the Cauchy-Schwarz inequality we get S k = ( r ∑ l = ∣∣ P N k − N k U P M l − M l ∣∣ √ s l ) = C max ( r ∑ l = −∣ k − l ∣/ √ s l ) ≤ C max r ∑ l = −∣ k − l ∣/ r ∑ l = −∣ k − l ∣/ s l ≤ C max r ∑ l = −∣ k − l ∣/ s l . (cid:3) Bounding ˜ M . Next, for an estimate of ˜ M = min { i ∈ N ∶ max m ≥ i ∣∣ P N U e m ∣∣ ≤ K √ s } we make the following calculation with m = ( J + n ) = M n ≥ N ∣∣ P N U e m ∣∣ = ( N ∑ i = ∣ u i,m ∣ ) / ≤ ( C ˜ µ N ⋅ −( J + n ) ) / . Hence, for ( J + n ) ≥ C ˜ µ ⋅ N ⋅ ( K √ s ) we have ∣∣ P N U e m ∣∣ ≤ K √ s . Therefore,(4.6) ˜ M ≤ C ˜ µ ⋅ ⌈ N K s ⌉ . Balancing Property.
In this chapter we show that the first assumption of Theorem 4.5 is fulfilled.For this sake, we use the results from the previous chapter, especially Lemma 4.12.
Corollary 4.15.
Let S and R be the sampling and reconstruction space spanned by the Walsh functions andseparable boundary wavelets of order p ≥ respectively. Moreover, let M = R with R ∈ N . Then, we getfor all θ ∈ ( , ∞) ∣∣ P ⊥ N U P M ∣∣ ≤ θ, whenever N ≥ C rs ⋅ M ⋅ θ − . Proof.
Rewriting N ≥ C rs ⋅ M ⋅ θ − gives us θ ≥ C rs MN .
And hence with Lemma 4.12 and θ > we get ∣∣ P ⊥ N U P M ∣∣ ≤ ( C rs ⋅ MN ) / ≤ θ. (cid:3) With this at hand we can now proof the relation between
N, M such that the strong balancing property issatisfied.
Lemma 4.16.
For the setting as before
N, K satisfy the strong balancing property with respect to
U, M and s whenever N ≳ M ( log ( M K √ s )) .Proof. From Lemma 4.15 we have that ∣∣ P ⊥ N U P M ∣∣ ≤ √ M ( log / ( KM √ s )) − whenever it holds that N ≳ M ( log ( KM √ s )) . Using additionally that U is an isometry we get ∣∣ P M U ∗ P N U P M − P M ∣∣ ∞ = ∣∣ P M U ∗ P N U P M − P M U ∗ IU P M ∣∣ ∞ = ∣∣ P M U ∗ P ⊥ N U P M ∣∣ ∞ ≤ √ M ∣∣ P ⊥ N U P M ∣∣ ≤ ( log / ( KM √ s )) − . For the second inequality we have that ∣∣ P ⊥ M U ∗ P N U P M ∣∣ ∞ = ∣∣ P ⊥ M U ∗ P N U P M + P ⊥ M U ∗ IU P M ∣∣ ∞ = ∣∣ P ⊥ M U ∗ P ⊥ N U P M ∣∣ ∞ ≤ √ M ∣∣ P ⊥ N U P M ∣∣ ≤ ( log / ( KM √ s )) − ≤ . The last inequality follows from the fact that
K, M, s are integers and therefore log ( KM √ s ) ≥ . Hence,the strong balancing property is fulfilled. (cid:3) Proof of the main theorem.
In this chapter we bring the previous results together to proof Theorem3.1.
Proof of Theorem 3.1.
We show that the assumptions of Theorem . . in [9] are fulfilled. Moreover, wefollow the lines of [10].With Lemma 4.16 we have that N, K satisfy the strong balancing property with respect to
U, M and s .Hence, point (1) in Theorem 4.5 is fulfilled. The last two steps show that (4.3) and (4.4) are fulfilled and Theorem 4.5 can be applied. We have N k − N k − m k log ( (cid:15) − ) ( r ∑ l = µ N , M ( k, l ) s l ) log ( K ˜ M √ s )≤ N k − N k − m k log ( (cid:15) − ) ( r ∑ l = C / µ −( J + k − ) ⋅ C φ,ψ − / −∣ k − l ∣/ s l ) log ( C ˜ µ N K s / )= C / µ C φ,ψ − / log ( (cid:15) − ) m k N k − N k − N k − ( r ∑ l = −∣ k − l ∣/ s l ) log ( C ˜ µ N K s / ) , where we used the estimate of µ N , M from Corollary 4.10 and 4.11, (3.3) and (4.6). Moreover, C / µ C φ,ψ − / is independent of k, l, M , N , s . Therefore, C d / µ C φ,ψ − / log ( (cid:15) − ) m k N k − N k − N k − ( r ∑ l = −∣ k − l ∣/ s l ) log ( C ˜ µ N K s / ) ≲ and Equation (4.3) is fulfilled. Now, we consider Equation (4.4) r ∑ k = ( N k − N k − ˆ m k − ) µ N , M ( k, l ) ˜ s k ≤ r ∑ k = ( N k − N k − ˆ m k ) C / µ −( J + k − ) ⋅ C φ,ψ − / −∣ k − l ∣/ ˜ s k = C / µ C φ,ψ − / N k − N k − N k − r ∑ k = ˜ s k ˆ m k −∣ l − k ∣/ ≤ C / µ C φ,ψ / r ∑ k = ˜ s k ˆ m k −∣ l − k ∣/ Due to the fact that the geometric series is bounded we have r ∑ k = −∣ l − k ∣/ ≤ C geo , for all l = , . . . , r. We are left with bounding ˜ s k / ˆ m k for all k = , . . . , r . Denote the constant from ≲ in Theorem 4.5 by C .With the estimate in Corollary 4.14 we can then bound ˜ s k with (3.3) by ˜ s k ≤ S k ( N , M , s ) ≤ C ˜ µ ⋅ r − ∑ l = −∣ k − l ∣/ s l (4.7) ≤ Cm k ⋅ N k − N k − N k − ( log ( (cid:15) − ) log ( K sN )) − ≤ C N k − N k − N k − ˆ m k = C ˆ m k . All together yields r ∑ k = ( N k − N k − ˆ m k − ) µ N , M ( k, l ) ˜ s k ≤ / C / µ C φ,ψ C geo C ≲ . (cid:3) Recovery guarantees for the Walsh-Haar case.
In this section we pay attention to the Walsh-Haarcase. This relationship is of high interest because of the very similar behaviour of Walsh functions and Haarwavelets. As seen earlier this results in perfect block diagonality of the change of basis matrix, see Figure1a. For a detailed analysis we refer the reader to [53]. Due to the structure, the off diagonal blocks to notimpact the coherence and sparsity structure at one level. Therefore, the number of samples per level onlydepends on the incoherence in this given level and the relative sparsity within. With this the main theoremsimplifies for the Walsh-Haar case to the next Corollary.
Corollary 4.17.
Let the notation be as before, but let the wavelet be the Haar wavelet. Moreover, let (cid:15) > and Ω = Ω N,m be a multilevel sampling scheme such that: (1) The number of samples is larger or equal the number of reconstructed coefficients, i.e. N ≥ M .(2) Let K = max k = ,...,r { N k − N k − m k } , M = M r , N = N r and s = s + . . . + s r and for each k = , . . . , r : m k ≳ log ( (cid:15) − ) log ( K √ sN ) ⋅ s k . Then, with probability exceeding − s(cid:15) , any minimizer ξ ∈ (cid:96) ( N ) satisfies ∣∣ ξ − x ∣∣ ≤ c ⋅ ( δ √ K ( + L √ s ) + σ s,M ( f )) , for some constant c , where L = c ⋅ ( + √ log ( (cid:15) − ) log ( KM √ s ) ) . If m k = N k − N k − for ≤ k ≤ r then this holds withprobability .Proof. Due to the block diagonality we have for N ≥ M that ∣∣ P ⊥ N U P M ∣∣ = max ϕ ∈R M ∑ k > N ∣⟨ Wal ( k, ⋅) , ϕ ⟩∣ = and therefore the balancing property is satisfied for any K . Hence assumption (1) in 4.5 is fulfilled. Next,for the same reason where we replace P ⊥ N with P m we have that ˜ M = N and µ N , M ( k, l ) = for k ≠ l . Thisallows us to get rid of the sum in the estimate in the main theorem, i.e. N k − N k − m k log ( (cid:15) − ) ( r ∑ l = µ N , M ( k, l ) s l ) log ( K ˜ M √ s )≤ N k − N k − m k log ( (cid:15) − ) ( −( J + k − ) s k ) log ( K √ sN )= N k − N k − N k − log ( (cid:15) − ) m k s k log ( K √ sN )= log ( (cid:15) − ) m k s k log ( K √ sN ) ≲ . For the second equation (4.4) we get with Equation (4.7) r ∑ k = ( N k − N k − ˆ m k − ) µ N , M ( k, l ) ˜ s k ≤ ( N l − N l − ˆ m l ) −( J + l − ) ˜ s l ≤ N l − N l − N l − ˜ s l ˆ m l ≤ C ≲ . (cid:3)
5. N
UMERICAL EXPERIMENTS
In this chapter we show some examples which illustrate the performance gain we get from the use ofcompressed sensing in contrast to direct inversion. Additionally, we discuss the impact of the samplingpattern and that the coherence structure needs to be taken into account. For this sake we use a modificationof the flip test introduced in [9].First, we have a look again in figure 1 at the coherence structure of the change of basis matrix for differenttypes of Daubechies wavelets. One can directly spot the block structure of the matrices. This is especiallystriking for the Haar-Walsh case, but also for other wavelets it is easy to see that the first block has the largestvalues with nearly zeros outside the blocks and a decay along the diagonal. Together with the structuredsparsity of functions and images under the wavelet transform we can apply the main theorem to improve thereconstruction quality. To demonstrate this, let(5.1) f ( x ) = cos ( πx ) + . ( πx ) . Then, we can see the reconstruction in figure 4. Due to the discontinuous behaviour of the Walsh functionsand the smoothness of the function f , the direct inversion has a lot of block artefacts. Here, CS gives nearly F IGURE
3. Original function from Equation (5.1)perfect reconstruction in Figure 4. It is important to remark that the sampling pattern is chosen accordinglyto Equation (3.3) in the main theorem. However, as mentioned in Remark 3.3 we assume that the squaredrelation in Equation (3.2) is not sharp, which can also be observed in the numerical examples. Next, wedemonstrate that the structure is very important. For this sake we conducted the same experiment with aflipped sampling pattern, see figure 5b. Then, the reconstruction is nowhere close to perfect and the originalsignal is not even identifiable. 6. C
ONCLUSION
In this paper we have completed the theory about linear and non-linear recovery guarantees for the recon-struction from binary and Fourier measurements with wavelets. We underlined the results about the struc-tured sampling and sparsity theory with the special case of Walsh and wavelet reconstruction. Additionally,we showed the numerical gains and the problems that arise when the theory is not taken into account.A
CKNOWLEDGEMENT
LT acknowledges support by the UK Engineering and Physical Sciences Research Council (EPSRC)grant EP/L016516/1 for the University of Cambridge Centre for Doctoral Training, the Cambridge Centrefor Analysis. AH acknowledges support from a Royal Society University Research FellowshipR
EFERENCES[1] B. Adcock, V. Antun, and A. C. Hansen. Uniform recovery in infinite-dimensional compressed sensing and applications to struc-tured binary sampling. arXiv preprint arXiv:1905.00126 , 2019.[2] B. Adcock and A. C. Hansen. A generalized sampling theorem for stable reconstructions in arbitrary bases.
J. Fourier Anal. Appl. ,18(4):685–716, 2010.[3] B. Adcock and A. C. Hansen. Stable reconstructions in hilbert spaces and the resolution of the gibbs phenomenon.
Appl. Comput.Harmon. Anal. , 32(3):357–388, 2012.[4] B. Adcock and A. C. Hansen. Generalized sampling and infinite-dimensional compressed sensing.
Foundations of ComputationalMathematics , 16(5):1263–1323, Oct 2016.[5] B. Adcock, A. C. Hansen, G. Kutyniok, and J. Ma. Linear stable sampling rate: Optimality of 2d wavelet reconstructions fromfourier measurements.
SIAM J. Math. Anal. , 47(2):1196–1233, 2015.[6] B. Adcock, A. C. Hansen, and C. Poon. Beyond consistent reconstructions: optimality and sharp bounds for generalized sampling,and application to the uniform resampling problem.
SIAM J. Math. Anal. , 45(5):3132–3167, 2013.[7] B. Adcock, A. C. Hansen, and C. Poon. On optimal wavelet reconstructions from fourier samples: linearity and universality.
Appl.Comput. Harmon. Anal. , 36(3):387–415, 2014.[8] B. Adcock, A. C. Hansen, and C. Poon. On optimal wavelet reconstructions from Fourier samples: linearity and universality ofthe stable sampling rate.
Appl. Comput. Harmon. Anal. , 36(3):387–415, 2014. ( A ) Truncated Walsh transform for ∣ m ∣ = . M ( B ) Truncated Walsh transform for ∣ m ∣ = . M ( C ) CS reconstruction for ∣ m ∣ = . M ( D ) CS reconstruction for ∣ m ∣ = . M ( E ) Sampling patern with subsampling ( F ) Sampling patern with subsampling F IGURE
4. Reconstruction of the function f with Daubechies wavelets with vanishingmoments and N = , M = . [9] B. Adcock, A. C. Hansen, C. Poon, and B. Roman. Breaking the coherence barrier: A new theory for compressed sensing. In Forum of Mathematics, Sigma , volume 5. Cambridge University Press, 2017.[10] B. Adcock, A. C. Hansen, and B. Roman. A note on compressed sensing of structured sparse wavelet coefficients from subsampledfourier measurements.
IEEE signal processing letters , 23(5):732–736, 2016.[11] A. Aldroubi and M. Unser. A general sampling theory for nonideal acquisition devices.
IEEE Trans. Signal Process. , 42(11):2915–2925, 1994.[12] V. Antun. Coherence estimates between hadamard matrices and daubechies wavelets. Master’s thesis, University of Oslo, 2016.[13] M. Bachmayr, A. Cohen, R. DeVore, and G. Migliorati. Sparse polynomial approximation of parametric elliptic pdes. part ii:lognormal coefficients.
ESAIM: Mathematical Modelling and Numerical Analysis , 51(1):341–363, 2017.[14] P. Binev, A. Cohen, W. Dahmen, R. DeVore, G. Petrova, and P. Wojtaszczyk. Data assimilation in reduced modeling.
SIAM/ASAJournal on Uncertainty Quantification , 5(1):1–29, 2017.[15] E. J. Cand`es and L. Demanet. Curvelets and fourier integral operators.
C. R. Acad. Sci. , 336(1):395–398, 2003.[16] E. J. Cand`es and D. Donoho. New tight frames of curvelets and optimal representations of objects with piecewise c singularities. Comm. Pure Appl. Math. , 57(2):219–266, 2004. ( A ) CS reconstruction( B ) Flipped sampling pattern F IGURE
5. Reconstruction with flipped samples with N = , M = and ∣ m ∣ = . M [17] E. J. Cand`es and L. Donoho. Recovering edges in ill-posed inverse problems: optimality of curvelet frames. Ann. Statist. ,30(3):784–842, 2002.[18] E. J. Cand`es, J. Romberg, and T. Tao. Robust uncertainty principles: Exact signal reconstruction from highly incomplete fourierinformation.
IEEE Trans. Inform. Theory , (52):489–509, 2006.[19] K. Choi, S. Boyd, J. Wang, L. Xing, L. Zhu, and T.-S. Suh. Compressed Sensing Based Cone-Beam Computed TomographyReconstruction with a First-Order Method.
Medical Physics , 37(9), 2010.[20] P. Clemente, V. Dur´an, E. Tajahuerce, P. Andr´es, V. Climent, and J. Lancis. Compressive holography with a single-pixel detector.
Optics letters , 38(14):2524–2527, 2013.[21] A. Cohen, I. Daubechies, and P. Vial. Wavelets on the interval and fast wavelet transforms.
Comput. Harmon. Anal. , 1(1):54–81,1993.[22] S. Dahlke, G. Kutyniok, P. Maass, C. Sagiv, H.-G. Stark, and G. Teschke. The uncertainty principle associated with the continuousshearlet transform.
Int. J. Wavelets Multiresolut. Inf. Process. , 2(6):157–181, 2008.[23] S. Dahlke, G. Kutyniok, G. Steidl, and G. Teschke. Shearlet coorbit spaces and associated banach frames.
Appl. Comput. Harmon.Anal. , 2(27):195–214, 2009.[24] R. DeVore, G. Petrova, and P. Wojtaszczyk. Data assimilation and sampling in banach spaces.
Calcolo , 54(3):963–1007, 2017.[25] T. Dvorkind and Y. C. Eldar. Robust and consistent sampling.
IEEE Signal Process. Letters , 16(9):739–742, 2009.[26] Y. C. Eldar. Sampling with arbitrary sampling and reconstruction spaces and oblique dual frame vectors.
J. Fourier Anal. Appl. ,9(1):77–96, 2003.[27] Y. C. Eldar. Sampling without input constraints: Consistent reconstruction in arbitrary spaces.
Sampling, Wavelets and Tomogra-phy , 2003.[28] Y. C. Eldar and T. Werther. General framework for consistent sampling in hilbert spaces.
Int. J. Wavelets Multiresolut. Inf. Process. ,3(4):497–509, 2005.[29] S. Foucart and H. Rauhut.
A Mathematical Introduction to Compressive Sensing . Springer Science+Business Media, New York,2013.[30] M. Gataric and C. Poon. A practical guide to the recovery of wavelet coefficients from fourier measurements.
SIAM J. Sci. Comput. ,38(2):A1075–A1099, 2016.[31] M. Guerquin-Kern, M. H¨aberlin, K. Pruessmann, and M. Unser. A fast wavelet-based reconstruction method for magnetic reso-nance imaging.
IEEE Transactions on Medical Imaging , 30(9):1649–1660, 2011.[32] A. C. Hansen. On the solvability complexity index, the n -pseudospectrum and approximations of spectra of operators. J. Amer.Math. Soc. , 24(1):81–124, 2011. [33] A. C. Hansen and L. Thesing. On the stable sampling rate for binary measurements and wavelet reconstruction. Applied andComputational Harmonic Analysis , 2018.[34] T. Hrycak and K. Gr¨ochenig. Pseudospectral fourier reconstruction with the modified inverse polynomial reconstruction method.
J. Comput. Phys. , 229(3):933–946, 2010.[35] G. Huang, H. Jiang, K. Matthews, and P. Wilford. Lensless imaging by compressive sensing. In , pages 2101–2105. IEEE, 2013.[36] A. Jones, A. Tamt¨ogl, I. Calvo-Almaz´an, and A. C. Hansen. Continuous compressed sensing for surface dynamical processes withhelium atom scattering.
Nature Sci. Rep. , 6:27776 EP –, 06 2016.[37] G. Kutyniok and W.-Q. Lim. Optimal compressive imaging of fourier data.
SIAM Journal on Imaging Sciences , 11(1):507–546,2018.[38] R. Leary, Z. Saghi, P. A. Midgley, and D. J. Holland. Compressed sensing electron tomography.
Ultramicroscopy , 131(0):70–91,2013.[39] C. Li and B. Adcock. Compressed sensing with local structure: uniform recovery guarantees for the sparsity in levels class.
Appliedand Computational Harmonic Analysis , 2017.[40] M. Lustig, D. Donoho, and J. M. Pauly. Sparse mri: the application of compressed sensing for rapid mr imaging.
MagneticResonance in Medicine , 58(6):1182–1195, 2007.[41] J. Ma. Generalized sampling reconstruction from fourier measurements using compactly supported shearlets.
Appl. Comput. Har-mon. Anal. , 2015.[42] Y. Maday, T. Anthony, J. D. Penn, and M. Yano. Pbdw state estimation: Noisy observations; configuration-adaptive backgroundspaces; physical interpretations.
ESAIM: Proceedings and Surveys , 50:144–168, 2015.[43] Y. Maday and O. Mula. A generalized empirical interpolation method: application of reduced basis techniques to data assimilation.In
Analysis and numerics of partial differential equations , pages 221–235. Springer, 2013.[44] Y. Maday, A. T. Patera, J. D. Penn, and M. Yano. A parameterized-background data-weak approach to variational data assimilation:formulation, analysis, and application to acoustics.
International Journal for Numerical Methods in Engineering , 102(5):933–965,2015.[45] S. Mallat.
A wavelet tour of signal processing . Academic Press, San Diego, 1998.[46] A. Moshtaghpour, J. M. B. Dias, and L. Jacques. Close encounters of the binary kind: Signal reconstruction guarantees forcompressive hadamard sampling with haar wavelet basis. arXiv preprint arXiv:1907.09795 , 2019.[47] M. M¨uller.
Introduction to Confocal Fluorescence Microscopy . SPIE, Bellingham, Washington, 2006.[48] C. Poon. A consistent and stable approach to generalized sampling.
J. Fourier Anal. Appl. , (20):985–1019, 2014.[49] C. Poon. Structure dependent sampling in compressed sensing: theoretical guarantees for tight frames.
Applied and ComputationalHarmonic Analysis , 42(3):402–451, 2017.[50] C. Shannon. A mathematical theory of communication.
Bell Syst. Tech. J. , (27):379–423, 623–656, 1948.[51] V. Studer, J. Bobin, M. Chahid, H. S. Mousavi, E. J. Cand´es, and M. Dahan. Compressive fluorescence microscopy for biologicaland hyperspectral imaging.
Proceedings of the National Academy of Sciences , 109(26):E1679–E1687, 2012.[52] T. Sun, G. Woods, M. F. Duarte, K. Kelly, C. Li, and Y. Zhang. Obic measurements without lasers or raster-scanning based oncompressive sensing. In
Int. Symposium for Testing and Failure Analysis (ISTFA), San Jose, CA , pages 272–277, 2009.[53] L. Thesing and A. C. Hansen. Linear reconstructions and the analysis of the stable sampling rate.
Sampling theory in imageprocessing , 17(1):103–126, 2018.[54] M. Unser. Sampling - 50 years after shannon.
Proc. IEEE , 4(88):569–587, 2000.[55] M. Unser and J. Zerubia. A generalized sampling theory without band-limiting constraints.
IEEE Trans. Circuits Syst. II. ,45(8):959–969, 1998.DAMTP, U
NIVERSITY OF C AMBRIDGE
E-mail address : [email protected] DAMTP, U
NIVERSITY OF C AMBRIDGE
E-mail address ::