Analysis vs. synthesis sparsity for α-shearlets
Felix Voigtlaender, Anne Pein (arXiv, math.FA)
Abstract.
There are two notions of sparsity associated to a frame Ψ = (ψ_i)_{i∈I}: Analysis sparsity of f means that the analysis coefficients (⟨f, ψ_i⟩)_{i∈I} are sparse, while synthesis sparsity means that we can write f = ∑_{i∈I} c_i ψ_i with sparse synthesis coefficients (c_i)_{i∈I}. Here, sparsity of a sequence c = (c_i)_{i∈I} means c ∈ ℓ^p(I) for a given p < 2.

We show that both notions of sparsity coincide if Ψ = SH(ϕ, ψ; δ) = (ψ_i)_{i∈I} is a discrete (cone-adapted) shearlet frame with sufficiently nice generators ϕ, ψ and sufficiently small sampling density δ > 0. The required 'niceness' of ϕ, ψ is explicitly quantified in terms of Fourier decay and vanishing moment conditions. In addition to ℓ^p-sparsity, we even allow weighted ℓ^p-spaces ℓ^p_{w^s} as a sparsity measure, with weights of the form w^s = (2^{js})_{(j,ℓ,δ,k)}, where j encodes the scale of the corresponding shearlet elements.

More precisely, we show that the shearlet smoothness spaces S^{p,q}_s(ℝ²) introduced by Labate et al. simultaneously characterize analysis and synthesis sparsity with respect to a shearlet frame, in the sense that, for suitable ϕ, ψ, δ, the following are equivalent: (1) f ∈ S^{p,p}_{s+(p^{-1}−2^{-1})}(ℝ²); (2) (⟨f, ψ_i⟩)_{i∈I} ∈ ℓ^p_{w^s}; (3) f = ∑_{i∈I} c_i ψ_i for suitable coefficients c = (c_i)_{i∈I} ∈ ℓ^p_{w^s}.

As an application, we prove that shearlets yield (almost) optimal approximation rates for the class of cartoon-like functions: If f is cartoon-like and ε > 0, then ‖f − f_N‖_{L²} ≲ N^{−(1−ε)}, where f_N is a linear combination of N shearlets. This might appear to be a well-known statement, but an inspection of the existing proofs reveals that these only establish analysis sparsity of cartoon-like functions, which implies ‖f − g_N‖_{L²} ≲ N^{−1} · (1 + log N)^{3/2}, where g_N is a linear combination of N elements of the dual frame Ψ̃ of the shearlet frame Ψ. This is not completely satisfying, since only limited knowledge about the structure and properties of Ψ̃ is available.

In addition to classical shearlets, we also consider more general α-shearlet systems, for which the parabolic scaling is replaced by α-parabolic scaling. The resulting systems range from ridgelet-like systems (for α = 0) over classical shearlets (α = 1/2) to wavelet-like systems (α = 1). In this more general case, the shearlet smoothness spaces S^{p,q}_s(ℝ²) have to be replaced by the α-shearlet smoothness spaces S^{p,q}_{α,s}(ℝ²). We completely characterize the existence of embeddings between these spaces for different values of α. This allows us to decide whether sparsity with respect to α₁-shearlets implies sparsity with respect to α₂-shearlets, even for α₁ ≠ α₂.

1. Introduction
A cone-adapted shearlet system [46, 51, 44, 48, 43]
SH(ϕ, ψ, θ; δ) is a directional multiscale system in L²(ℝ²) that is obtained by applying suitable translations, shearings and parabolic dilations to the generators ϕ, ψ, θ. The shearings are utilized to obtain elements with different orientations; precisely, the number of different orientations on scale j is approximately 2^{j/2}, in stark contrast to wavelet-like systems, which only employ a constant number of directions per scale. We refer to Definition 5.6 for a more precise description of shearlet systems.

One of the most celebrated properties of shearlets is their ability to provide "optimally sparse approximations" for functions that are governed by directional features like edges. This can be made more precise by introducing the class E²(ℝ²) of C²-cartoon-like functions; roughly, these are all compactly supported functions that are C² away from a C² edge [44]. More rigorously, the class E²(ℝ²) consists of all functions f that can be written as f = f₁ + 𝟙_B · f₂ with f₁, f₂ ∈ C²_c(ℝ²) and a compact set B ⊂ ℝ² whose boundary ∂B is a C² Jordan curve; see also Definition 6.1 for a completely formal description of the class of cartoon-like functions. With this notion, the (almost) optimal sparse approximation of cartoon-like functions as understood in [44, 51] means that

  ‖f − f_N‖_{L²} ≲ N^{−1} · (1 + log N)^{3/2}  for all N ∈ ℕ and f ∈ E²(ℝ²).  (1.1)

Here, the N-term approximation f_N is obtained by retaining only the N largest coefficients in the expansion f = ∑_{i∈I} ⟨f, ψ_i⟩ ψ̃_i, where Ψ̃ = (ψ̃_i)_{i∈I} is a dual frame for the shearlet frame Ψ = SH(ϕ, ψ, θ; δ) = (ψ_i)_{i∈I}. Formally, this means f_N = ∑_{i∈I_N} ⟨f, ψ_i⟩ ψ̃_i, where the set I_N ⊂ I satisfies |I_N| = N and |⟨f, ψ_i⟩| ≥ |⟨f, ψ_j⟩| for all i ∈ I_N and j ∈ I \ I_N.
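The selection rule behind f_N (keep the N largest coefficients, discard the rest) is a simple thresholding procedure. As a numerical illustration, the following toy sketch (with a hypothetical power-law coefficient sequence, not an actual shearlet expansion) checks the Stechkin-type bound that converts ℓ^p-membership of a coefficient sequence into an N-term ℓ²-error rate:

```python
import numpy as np

def n_term_tail(c, N):
    """l^2 norm of what remains after keeping the N largest |c_i|."""
    mags = np.sort(np.abs(c))[::-1]          # nonincreasing rearrangement
    return np.sqrt(np.sum(mags[N:] ** 2))

# Hypothetical power-law coefficients c_n = n^{-1/p}; any sequence works here.
p = 2.0 / 3.0
n = np.arange(1, 100_001)
c = n ** (-1.0 / p)

# Stechkin-type estimate: tail(N) <= N^{-(1/p - 1/2)} * ||c||_{l^p}
lp_norm = np.sum(np.abs(c) ** p) ** (1.0 / p)
for N in (10, 100, 1000):
    assert n_term_tail(c, N) <= N ** (-(1.0 / p - 0.5)) * lp_norm
```

An estimate of this type underlies the rate N^{-(p^{-1} − 2^{-1})} appearing in the discussion below, once the ℓ²-error of the coefficients is converted into an L²-error via the frame property.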
Key words and phrases.
Shearlets; Sparsity; Nonlinear approximation; Decomposition spaces; Smoothness spaces; Banach frames; Atomic decompositions.

One can even show that the approximation rate in equation (1.1) is optimal up to log factors; i.e., up to log factors, no reasonable system (ϱ_n)_{n∈ℕ} can achieve a better approximation rate for the whole class E²(ℝ²). The restriction to "reasonable" systems is made to exclude pathological cases like dense subsets of L²(ℝ²) and involves a restriction of the search depth: The N-term approximation f_N = ∑_{n∈J_N} c_n ϱ_n has to satisfy |J_N| = N and furthermore J_N ⊂ {1, ..., π(N)} for a fixed polynomial π. For more details on this restriction, we refer to [38, Section 2.1.1].

The approximation rate achieved by shearlets is precisely the same as that obtained by (second generation) curvelets [2]. Note, however, that the construction of curvelets in [2] uses bandlimited frame elements, while shearlet frames can be chosen to have compact support [51, 46]. A frame with compactly supported elements is potentially advantageous for implementations, but also for theoretical considerations, since localization arguments are highly simplified and since compactly supported frames can be adapted to frames on bounded domains; see e.g. [40, 41]. A further advantage of shearlets over curvelets is that curvelets are defined using rotations, while shearlets employ shearings to change the orientation; in contrast to rotations, these shearings leave the digital grid ℤ² invariant, which is beneficial for implementations.

1.1. Cartoon approximation by shearlets. Despite its great utility, the approximation result in equation (1.1) has one remaining issue: It yields a rapid approximation of f by a linear combination of N elements of the dual frame Ψ̃ of the shearlet frame Ψ, not by a linear combination of N elements of Ψ itself.
If Ψ is a tight frame, this is no problem, but the only known construction of tight cone-adapted shearlet frames uses bandlimited generators. In case of a non-tight cone-adapted shearlet frame, the only knowledge about Ψ̃ that is available is that Ψ̃ is a frame with dual Ψ; but nothing seems to be known [36] about the support, the smoothness, the decay or the frequency localization of the elements of Ψ̃. Thus, it is highly desirable to have an approximation result similar to equation (1.1), but with f_N being a linear combination of N elements of the shearlet frame Ψ = SH(ϕ, ψ, θ; δ) itself.

We will provide such a result by showing that analysis sparsity with respect to a (suitable) shearlet frame SH(ϕ, ψ, θ; δ) is equivalent to synthesis sparsity with respect to the same frame, cf. Theorem 5.13. Here, analysis sparsity with respect to a frame Ψ = (ψ_i)_{i∈I} means that the analysis coefficients A_Ψ f = (⟨f, ψ_i⟩)_{i∈I} are sparse, i.e., they satisfy A_Ψ f ∈ ℓ^p(I) for some fixed p ∈ (0, 2). Note that an arbitrary function f ∈ L²(ℝ²) always satisfies A_Ψ f ∈ ℓ²(I) by the frame property. Synthesis sparsity means that we can write f = S_Ψ c = ∑_{i∈I} c_i ψ_i for a sparse sequence c = (c_i)_{i∈I}, i.e., c ∈ ℓ^p(I). For general frames, these two properties need not be equivalent, as shown in Section A.

Note though that such an equivalence would indeed imply the desired result, since the proof of equation (1.1) given in [51] proceeds by a careful analysis of the analysis coefficients A_Ψ f of a cartoon-like function f: By counting how many shearlets intersect the "problematic" region ∂B where f = f₁ + 𝟙_B · f₂ fails to be C², and by then distinguishing whether the orientation of the shearlet is aligned with the boundary curve ∂B or not, the authors show ∑_{n>N} |θ_n(f)|² ≲ N^{−2} · (1 + log N)³, where (θ_n(f))_{n∈ℕ} is the nonincreasing rearrangement of the shearlet analysis coefficients A_Ψ f. It is not too hard to see (see e.g.
the proof of Theorem 6.3) that this implies A_Ψ f ∈ ℓ^p(I) for all p > 2/3. Assuming that analysis sparsity with respect to the shearlet frame Ψ is indeed equivalent to synthesis sparsity, this implies f = ∑_{i∈I} c_i ψ_i for a sequence c = (c_i)_{i∈I} ∈ ℓ^p(I). Then, simply by taking only the N largest coefficients of the sequence c and by using that the synthesis map S_Ψ : ℓ²(I) → L²(ℝ²), (e_i)_{i∈I} ↦ ∑_{i∈I} e_i ψ_i is bounded, it is not hard to see

  ‖f − f_N‖_{L²} ≲ ‖c − c · 𝟙_{I_N}‖_{ℓ²} ≲ N^{−(p^{-1} − 2^{-1})},

where I_N ⊂ I is a set containing N largest coefficients of c. Thus, once we know that analysis sparsity with respect to a (suitable) shearlet frame is equivalent to synthesis sparsity, we only need to make the preceding argument completely rigorous.

1.2. Previous results concerning the equivalence of analysis and synthesis sparsity for shearlets.
As noted above, analysis sparsity and synthesis sparsity need not be equivalent for general frames. To address this and other problems, Gröchenig [35] and Gröchenig & Cordero [5], as well as Gröchenig & Fornasier [22] introduced the concept of (intrinsically) localized frames, for which these two properties are indeed equivalent, cf. [33, Proposition 2].

In contrast to Gabor and wavelet frames, however, it is quite nontrivial to verify that a shearlet or curvelet frame is intrinsically localized: To our knowledge, the only papers discussing a variant of this property are [36, 42], where the results from [36] about curvelets and shearlets are generalized in [42] to the setting of α-molecules; a generalization that we will discuss below in greater detail. For now, let us stick to the setting of [36]. In that paper, Grohs considers a certain distance function ω : Λ_S × Λ_S → [1, ∞) (cf. [36, Definition 3.9] for the precise formula) on the index set

  Λ_S := { (j, ℓ, k, δ) ∈ ℕ₀ × ℤ × ℤ² × {0, 1} : −⌊2^{j/2}⌋ ≤ ℓ < ⌊2^{j/2}⌋ },

which is (a slightly modified version of) the index set that is used for shearlet frames. A shearlet frame Ψ = (ψ_λ)_{λ∈Λ_S} is called N-localized with respect to ω if the associated Gramian matrix A := A_Ψ := (⟨ψ_λ, ψ_{λ'}⟩)_{λ,λ'∈Λ_S} satisfies

  |⟨ψ_λ, ψ_{λ'}⟩| ≤ ‖A‖_{B_N} · [ω(λ, λ')]^{−N}  for all λ, λ' ∈ Λ_S,  (1.2)

where ‖A‖_{B_N} is chosen to be the optimal constant in the preceding inequality. Then, if Ψ is a frame with frame bounds A, B > 0, i.e., if A · ‖f‖²_{L²} ≤ ∑_{λ∈Λ_S} |⟨f, ψ_λ⟩|² ≤ B · ‖f‖²_{L²} for all f ∈ L²(ℝ²), [36, Lemma 3.3] shows that the infinite matrix A induces a bounded, positive semi-definite operator A : ℓ²(Λ_S) → ℓ²(Λ_S) that furthermore satisfies σ(A) ⊂ {0} ∪ [A, B], and the Moore-Penrose pseudoinverse A⁺ of A is the Gramian associated to the canonical dual frame Ψ̃ of Ψ.
This is important, since [36, Theorem 3.11] now yields the following:

Theorem.
Assume that
Ψ = (ψ_λ)_{λ∈Λ_S} is a shearlet frame with sampling density δ > 0 and frame bounds A, B > 0. Furthermore, assume that Ψ is (N + L)-localized with respect to ω, where N > 0 and L > … · ln(10)/ln(5/…). Then the canonical dual frame Ψ̃ of Ψ is N⁺-localized with respect to ω, where

  N⁺ = N · ( … )^{…},  (1.3)

and the omitted expression (see [36, Theorem 3.11] for the exact formula) involves the frame bounds A and B, the localization constant ‖A‖_{B_{N+L}}, the parameter L, a factor log((B+A)/(B−A)), and a constant C_δ > 0 that only depends on the sampling density δ > 0.

To see how this theorem could in principle be used, note that the dual frame coefficients satisfy (⟨f, ψ̃_λ⟩)_{λ∈Λ_S} = A⁺ (⟨f, ψ_λ⟩)_{λ∈Λ_S}. Consequently, if(!) the Gramian A⁺ of the canonical dual frame Ψ̃ of Ψ restricts to a well-defined and bounded operator A⁺ : ℓ^p(Λ_S) → ℓ^p(Λ_S), then analysis sparsity with respect to Ψ would imply analysis sparsity with respect to Ψ̃ and thus synthesis sparsity with respect to Ψ, as desired. In fact, [36, Proposition 3.5] shows that if A⁺ is N⁺-localized with respect to ω, then A⁺ : ℓ^p(Λ_S) → ℓ^p(Λ_S) is bounded as long as N⁺ > p^{-1}.

Thus, it seems that all is well, in particular since a combination of [39, Theorem 2.9 and Proposition 3.11] provides readily verifiable conditions on the generators ϕ, ψ, θ which ensure that the shearlet frame Ψ = SH(ϕ, ψ, θ; δ) is N-localized with respect to ω.

There is, however, a well-hidden remaining problem, which is also the reason why the equivalence of analysis and synthesis sparsity is not explicitly claimed in any of the papers [36, 42, 39, 37]: As seen above, we need N⁺ > p^{-1}, but it is not at all clear that this can be achieved with N⁺ as in equation (1.3): There are strong interdependencies between the different quantities on the right-hand side of equation (1.3) which make it next to impossible to verify N⁺ > p^{-1}.
Indeed, the results in [39] only yield ‖A‖_{B_{N+L}} < ∞ under certain assumptions (which depend on N + L) concerning ϕ, ψ, θ, but no explicit control over ‖A‖_{B_{N+L}} is given. Thus, it is not at all clear that increasing N (or L) will increase N⁺. Likewise, the frame bounds A, B only depend on ϕ, ψ, θ (which are more or less fixed) and on the sampling density δ. Thus, one could be tempted to change δ to influence A, B in equation (1.3) and thus to achieve N⁺ > p^{-1}. But the sampling density δ also influences C_δ and ‖A‖_{B_{N+L}}, so that it is again not at all clear whether one can ensure N⁺ > p^{-1} by modifying δ.

A further framework for deriving the equivalence between analysis and synthesis sparsity for frames is provided by (generalized) coorbit theory [17, 18, 19, 53, 23, 54]. Here, one starts with a continuous frame Ψ = (ψ_x)_{x∈X} which is indexed by a locally compact measure space (X, μ). In the case of classical, group-based coorbit theory [17, 18, 19], it is even required that (ψ_x)_{x∈G} = (π(x)ψ)_{x∈G} arises from an integrable, irreducible unitary representation of a locally compact topological group G, although one can weaken certain of these conditions [10, 11, 6, 3].

Footnote: Strictly speaking, [39, Definition 2.4] uses the index distance ω(λ, λ') = 2^{|s_λ − s_{λ'}|} · (1 + 2^{min{s_λ, s_{λ'}}} d(λ, λ')), which is different from the distance ω(λ, λ') = 2^{|s_λ − s_{λ'}|} · (1 + d(λ, λ')) used in [36, Definition 3.9]. Luckily, this inconsistency is no serious problem, since the distance in [39] dominates the distance from [36], so that N-localization with respect to the [39]-distance implies N-localization with respect to the [36]-distance.
Based on the continuous frame Ψ, one can then introduce so-called coorbit spaces Co(Y), which are defined in terms of decay conditions (specified by the function space Y) concerning the voice transform V_Ψ f(x) := ⟨f, ψ_x⟩ of a function or distribution f. Coorbit theory then provides conditions under which one can sample the continuous frame Ψ to obtain a discrete frame Ψ_d = (ψ_{x_i})_{i∈I}, but such that membership of a distribution f in Co(Y) is simultaneously equivalent to analysis sparsity and to synthesis sparsity of f with respect to Ψ_d.

Thus, if one could find a continuous frame Ψ such that the prerequisites of coorbit theory are satisfied and such that the discretized frame Ψ_d coincides with a discrete, cone-adapted shearlet frame, one would obtain the desired equivalence between analysis sparsity and synthesis sparsity. There is, however, no known construction of such a frame Ψ: Although there is a rich theory of shearlet coorbit spaces [9, 13, 12, 7, 8, 14, 31] which fits into the more general framework of wavelet-type coorbit spaces [26, 30, 28, 29, 27, 31, 32, 25], the resulting discretized frames are not cone-adapted shearlet frames; instead, they are highly directionally biased (i.e., they treat the x and y directions in very different ways), and the number of directions per scale is infinite for each scale; therefore, these systems are unsuitable for most practical applications and for the approximation of cartoon-like functions, cf. [43, Section 3.3]. Hence, at least using the currently known constructions of continuous shearlet frames, coorbit theory cannot be used to derive the desired equivalence of analysis and synthesis sparsity with respect to cone-adapted shearlet frames.

1.3. Our approach for proving the equivalence of analysis and synthesis sparsity for shearlets.
In this paper, we use the recently introduced theory of structured Banach frame decompositions of decomposition spaces [62] to obtain the desired equivalence between analysis and synthesis sparsity for (cone-adapted) shearlet frames. A more detailed and formal exposition of this theory will be given in Section 2; for this introduction, we restrict ourselves to the bare essentials.

The starting point in [62] is a covering Q = (Q_i)_{i∈I} of the frequency space ℝ^d, where it is assumed that each Q_i is of the form Q_i = T_i Q + b_i for a fixed base set Q ⊂ ℝ^d and certain linear maps T_i ∈ GL(ℝ^d) and b_i ∈ ℝ^d. Then, using a suitable partition of unity Φ = (ϕ_i)_{i∈I} subordinate to Q and a suitable weight w = (w_i)_{i∈I} on the index set I of the covering Q, one defines the associated decomposition space (quasi)-norm

  ‖g‖_{D(Q, L^p, ℓ^q_w)} := ‖ ( w_i · ‖F^{−1}(ϕ_i · ĝ)‖_{L^p} )_{i∈I} ‖_{ℓ^q},

while the associated decomposition space D(Q, L^p, ℓ^q_w) contains exactly those distributions g for which this quasi-norm is finite. Roughly speaking, the decomposition space (quasi)-norm measures the size of the distribution g by frequency-localizing g to each of the sets Q_i (using the partition of unity Φ), where each of these frequency-localized pieces is measured in L^p(ℝ^d), while the individual contributions are aggregated using a certain weighted ℓ^q-norm.
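As a concrete illustration of this three-step recipe (frequency-localize, measure each piece in L^p, aggregate in a weighted ℓ^q-norm), the following one-dimensional toy sketch computes a discrete analogue of such a (quasi)-norm for a crude dyadic frequency covering. It is a hypothetical illustration only: it uses sharp indicator functions instead of a smooth partition of unity, and the covering is dyadic rather than a shearlet covering.

```python
import numpy as np

def decomposition_norm(g, p=2.0, q=1.0, s=0.0, n_bands=6):
    """Discrete toy analogue of ||g||_{D(Q, L^p, l^q_w)}: frequency-localize g
    to dyadic bands Q_j, measure each piece in (normalized) L^p, and aggregate
    the contributions in a weighted l^q norm with weights w_j = 2^{js}."""
    n = len(g)
    freqs = np.abs(np.fft.fftfreq(n, d=1.0 / n))  # integer frequency |k| per FFT bin
    g_hat = np.fft.fft(g)
    contributions = []
    for j in range(n_bands):
        # sharp indicator of the j-th dyadic band, standing in for a smooth phi_j
        lo, hi = (0, 1) if j == 0 else (2 ** (j - 1), 2 ** j)
        mask = (freqs >= lo) & (freqs < hi)
        piece = np.fft.ifft(g_hat * mask)              # ~ F^{-1}(phi_j * g^)
        lp = np.mean(np.abs(piece) ** p) ** (1.0 / p)  # normalized L^p norm
        contributions.append(2.0 ** (j * s) * lp)      # weight w_j = 2^{js}
    return np.sum(np.asarray(contributions) ** q) ** (1.0 / q)
```

For a pure frequency located in band j, the result is just the weight 2^{js}, which makes the role of the smoothness parameter s visible: it penalizes (or rewards) content at high scales.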
The underlying idea in [62] is to ask whether the strict frequency localization using the compactly supported partition of unity Φ can be replaced by a soft, qualitative frequency localization: Indeed, if ψ ∈ L²(ℝ^d) has essential frequency support in the base set Q, then it is not hard to see that the function

  ψ^{[i]} := |det T_i|^{−1/2} · F^{−1}( L_{b_i} [ψ̂ ∘ T_i^{−1}] ) = |det T_i|^{1/2} · M_{b_i} [ψ ∘ T_i^T]

has essential frequency support in Q_i = T_i Q + b_i, for arbitrary i ∈ I. Here, L_x and M_ξ denote the usual translation and modulation operators, cf. Section 1.6.

Using this notation, the theory developed in [62] provides criteria pertaining to the generator ψ which guarantee that the generalized shift-invariant system

  Ψ_δ := ( L_{δ · T_i^{−T} k} ψ^{[i]} )_{i∈I, k∈ℤ^d}  (1.4)

forms, respectively, a Banach frame or an atomic decomposition for the decomposition space D(Q, L^p, ℓ^q_w), for sufficiently fine sampling density δ > 0. The notions of Banach frames and atomic decompositions generalize the concept of frames for Hilbert spaces to the setting of (quasi)-Banach spaces. The precise definitions of these two concepts, however, are outside the scope of this introduction; see e.g. [34] for a lucid exposition.

For us, the most important conclusion is the following: If Ψ_δ simultaneously forms a Banach frame and an atomic decomposition for D(Q, L^p, ℓ^q_w), then there is an explicitly known (quasi)-Banach space of sequences C^{p,q}_w ≤ ℂ^{I×ℤ^d}, called the coefficient space, such that the following are equivalent for a distribution g:

(1) g ∈ D(Q, L^p, ℓ^q_w),
(2) the analysis coefficients ( ⟨g, L_{δ·T_i^{−T} k} ψ^{[i]}⟩ )_{i∈I, k∈ℤ^d} belong to C^{p,q}_w,
(3) we can write g = ∑_{i∈I} ∑_{k∈ℤ^d} c^{(i)}_k · L_{δ·T_i^{−T} k} ψ^{[i]} for a sequence (c^{(i)}_k)_{i∈I, k∈ℤ^d} ∈ C^{p,q}_w.
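The claimed frequency support of ψ^{[i]} can be checked by a routine computation with the conventions from Section 1.6, using F(f ∘ A) = |det A|^{−1} · f̂ ∘ A^{−T} and F(M_b f) = L_b f̂:

```latex
\widehat{\psi^{[i]}}
  = |\det T_i|^{1/2}\,\mathcal{F}\bigl(M_{b_i}\bigl[\psi \circ T_i^{T}\bigr]\bigr)
  = |\det T_i|^{1/2}\,L_{b_i}\Bigl[|\det T_i|^{-1}\,\widehat{\psi}\circ T_i^{-1}\Bigr]
  = |\det T_i|^{-1/2}\,L_{b_i}\bigl[\widehat{\psi}\circ T_i^{-1}\bigr].
```

Hence ψ̂^{[i]}(ξ) = |det T_i|^{−1/2} · ψ̂(T_i^{−1}(ξ − b_i)), which is concentrated where T_i^{−1}(ξ − b_i) ∈ Q, i.e., on ξ ∈ T_i Q + b_i = Q_i, as claimed.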
One can even derive slightly stronger conclusions which make these purely qualitative statements quantitative. Now, if one chooses p = q ∈ (0, 2] and a suitable weight w = (w_i)_{i∈I} depending on p, one can achieve C^{p,q}_w = ℓ^p(I × ℤ^d). Thus, in this case, the preceding equivalence can be summarized as follows: If ψ is nice and δ > 0 is small, then analysis sparsity is equivalent to synthesis sparsity with respect to Ψ_δ.

In fact, the theory developed in [62] even allows the base set Q to vary with i ∈ I, i.e., Q_i = T_i Q'_i + b_i, at least as long as the family {Q'_i : i ∈ I} of different base sets remains finite. Similarly, the generator ψ is allowed to vary with i ∈ I, so that ψ^{[i]} = |det T_i|^{1/2} · M_{b_i} [ψ_i ∘ T_i^T], again with the provision that the set {ψ_i : i ∈ I} of generators is finite.

As we will see, one can choose a suitable covering Q = S (the so-called shearlet covering of the frequency space ℝ²) such that the system Ψ_δ from above coincides with a shearlet frame. The resulting decomposition spaces D(S, L^p, ℓ^q_w) are then (slight modifications of) the shearlet smoothness spaces as introduced by Labate et al. [52].

In summary, the theory of structured Banach frame decompositions of decomposition spaces will imply the desired equivalence of analysis and synthesis sparsity with respect to cone-adapted shearlet frames. To this end, however, we first need to show that the technical conditions on the generators that are imposed in [62] are indeed satisfied if the generators of the shearlet system are sufficiently smooth and satisfy certain vanishing moment conditions.
As we will see, this is by no means trivial and requires a substantial number of technical estimates.

Finally, we remark that spaces similar to the shearlet smoothness spaces have also been considered by Vera: In [58], he introduced so-called shear anisotropic inhomogeneous Besov spaces, which are essentially a generalization of the shearlet smoothness spaces to ℝ^d. Vera then shows that the analysis and synthesis operators with respect to certain bandlimited shearlet systems are bounded between the shear anisotropic inhomogeneous Besov spaces and certain sequence spaces. Note that the assumption of bandlimited frame elements excludes the possibility of having compact support in space. Furthermore, boundedness of the analysis and synthesis operators alone does not imply that the bandlimited shearlet systems form Banach frames or atomic decompositions for the shear anisotropic Besov spaces, since this requires the existence of a certain reproducing formula. In [57], Vera also considers Triebel-Lizorkin type shearlet smoothness spaces and again derives similar boundedness results for the analysis and synthesis operators. Finally, in both papers [58, 57], certain embedding results between the classical Besov or Triebel-Lizorkin spaces and the new "shearlet adapted" smoothness spaces are considered, similarly to our results in Section 7. Note though that we are able to completely characterize the existence of such embeddings, while [58] only establishes certain necessary and certain sufficient conditions, without achieving a characterization.

1.4. α-shearlets and cartoon-like functions of different regularity. The usual construction of shearlets employs the parabolic dilations diag(2^j, 2^{j/2}), and (the dual frames of) the resulting shearlet systems turn out to be (almost) optimal for the approximation of functions that are C² away from a C² edge.
Beginning with the paper [49], it was realized that different regularities, i.e., "functions that are C^β away from a C^β edge", can be handled by employing a different type of dilations, namely the α-parabolic dilations diag(2^j, 2^{αj}), with the specific choice α = β^{-1}.

These modified shearlet systems were called hybrid shearlets in [49], where they were introduced in the three-dimensional setting. In the Bachelor's thesis [45], precisely in [45, Section 4], it was then shown also in the two-dimensional setting that shearlet systems using α-parabolic scaling, from now on called α-shearlet systems, indeed yield (almost) optimal approximation rates for the model class of C^β-cartoon-like functions, if α = β^{-1}. Again, this comes with the caveat that the approximation is actually performed using the dual frame of the α-shearlet frame.

Note, however, that the preceding result requires the regularity β of the C^β-cartoon-like functions to satisfy β ∈ (1, 2]. Outside of this range, the arguments in [45] are not applicable; in fact, it was shown in [56] that the result concerning the optimal approximation rate fails for β > 2, at least for α-curvelets [38] instead of α-shearlets. These α-curvelets are related to α-shearlets in the same way that shearlets and curvelets are related [39], in the sense that the associated coverings of the Fourier domain are equivalent and in that they agree with respect to analysis sparsity: If f is ℓ^p-analysis sparse with respect to a (reasonable) α-curvelet system, then the same holds with respect to any (reasonable) α-shearlet system, and vice versa. This was derived in [37] as an application of the framework of α-molecules, a common generalization of α-shearlets and α-curvelets; see also [20] for a generalization to dimensions larger than two.
Footnote: In fact, in [49, Section 4.1], the three-dimensional counterparts of the scaling matrices diag(2^{βj/2}, 2^{j/2}) are used, but the resulting hybrid shearlet systems have the same approximation properties as those defined using the α-parabolic dilations diag(2^j, 2^{αj}) with α = β^{-1}; see Section D for more details.

As we will see, one can modify the shearlet covering S slightly to obtain the so-called α-shearlet covering S^{(α)}. The systems Ψ_δ (cf. equation (1.4)) that result from an application of the theory of structured Banach frame decompositions with the covering S^{(α)} then turn out to be α-shearlet systems. Therefore, we will be able to establish the equivalence of analysis and synthesis sparsity not only for classical cone-adapted shearlet systems, but in fact for cone-adapted α-shearlet systems for arbitrary α ∈ [0, 1], essentially without additional effort.

Even more, recall from above that the theory of structured Banach frame decompositions not only yields equivalence of analysis and synthesis sparsity, but also shows that each of these properties is equivalent to membership of the distribution f under consideration in a suitable decomposition space D(S^{(α)}, L^p, ℓ^q_w). We will call these spaces α-shearlet smoothness spaces and denote them by S^{p,q}_{α,s}(ℝ²), where the smoothness parameter s determines the weight w.
Using a recently developed theory for embeddings between decomposition spaces [60], we are then able to completely characterize the existence of embeddings between α-shearlet smoothness spaces for different values of α. Roughly, such an embedding S^{p₁,q₁}_{α₁,s₁} ↪ S^{p₂,q₂}_{α₂,s₂} means that sparsity (in a certain sense) with respect to α₁-shearlets implies sparsity (in a possibly different sense) with respect to α₂-shearlets.

In a way, this extends the results of [37], where it is shown that analysis sparsity transfers from one α-scaled system to another (e.g. from α-curvelets to α-shearlets); in contrast, our embedding theory characterizes the possibility of transferring such results from α₁-shearlet systems to α₂-shearlet systems, even for α₁ ≠ α₂. It will turn out, however, that simple ℓ^p-sparsity with respect to α₁-shearlets never yields a nontrivial ℓ^q-sparsity with respect to α₂-shearlets if α₁ ≠ α₂. Luckily, one can remedy this situation by requiring ℓ^p-sparsity in conjunction with a certain decay of the coefficients with the scale. For more details, we refer to Section 7.

1.5. Structure of the paper.
Before we properly start the paper, we introduce several standard and non-standard notations in the next subsection.

In Section 2, we give an overview of the main aspects of the theory of structured Banach frame decompositions of decomposition spaces that was recently developed by one of the authors in [62]. The most important ingredient for the application of this theory is a suitable covering Q = (Q_i)_{i∈I} = (T_i Q'_i + b_i)_{i∈I} of the frequency space ℝ² such that the provided Banach frames and atomic decompositions are of the desired form; in our case, we want to obtain cone-adapted α-shearlet systems. Thus, in Section 3, we introduce the so-called α-shearlet coverings S^{(α)} for α ∈ [0, 1], and we verify that these coverings fulfill the standing assumptions from [62]. The more technical parts of this verification are deferred to Section B in order to not disrupt the flow of the paper. Furthermore, Section 3 also contains the definition of the α-shearlet smoothness spaces S^{p,q}_{α,s}(ℝ²) = D(S^{(α)}, L^p, ℓ^q_{w^s}) and an analysis of their basic properties.

Section 4 contains the main results of the paper. Here, we provide readily verifiable conditions (smoothness, decay and vanishing moments) concerning the generators ϕ, ψ of the α-shearlet system SH^{±}_α(ϕ, ψ; δ) which ensure that this α-shearlet system forms, respectively, a Banach frame or an atomic decomposition for the α-shearlet smoothness space S^{p,q}_{α,s}(ℝ²). This is done by verifying the technical conditions of the theory of structured Banach frame decompositions. All of these results rely on one technical lemma whose proof is extremely lengthy and therefore deferred to Section C.

For α-shearlet systems, it is expected that 1/2-shearlets are identical to the classical cone-adapted shearlet systems. This is not quite the case, however, for the shearlet systems SH^{±}_{1/2}(ϕ, ψ; δ) considered in Section 4.
The reason for this is that the α-shearlet covering S^{(α)} divides the frequency plane into four conic regions (the top, bottom, left, and right frequency cones) and a low-frequency region, while the usual definition of shearlets only divides the frequency plane into two cones (horizontal and vertical) and a low-frequency region. To remedy this fact, Section 5 introduces a slightly modified covering, the so-called unconnected α-shearlet covering S^{(α)}_u; the reason for this terminology is that the individual sets of the covering are no longer connected. Essentially, S^{(α)}_u is obtained by combining each pair of opposing sets of the α-shearlet covering S^{(α)} into one single set. We then verify that the associated decomposition spaces coincide with the previously defined α-shearlet smoothness spaces. Finally, we show that the Banach frames and atomic decompositions obtained by applying the theory of structured Banach frame decompositions with the covering S^{(1/2)}_u indeed yield conventional cone-adapted shearlet systems.

In Section 6, we apply the equivalence of analysis and synthesis sparsity for α-shearlets to prove that α-shearlet frames with sufficiently nice generators indeed yield (almost) optimal N-term approximations for the class E^β(ℝ²) of C^β-cartoon-like functions, for β ∈ (1, 2] and α = β^{-1}. In the case of usual shearlets (i.e., for α = 1/2), this is a straightforward application of the analysis sparsity of C²-cartoon-like functions with respect to shearlet systems. But in the case α ≠ 1/2, our α-shearlet systems use the α-parabolic scaling matrices diag(2^j, 2^{αj}), while analysis sparsity of C^β-cartoon-like functions is only known with respect to β-shearlet systems, which use the scaling matrices diag(2^{βj/2}, 2^{j/2}). Bridging the gap between these two different shearlet systems is not too hard, but cumbersome,
so that part of the proof for α ≠ 1/2 is deferred to Section D, since most readers are probably mainly interested in the (easier) case of classical shearlets (i.e., α = 1/2). The obtained approximation rate is almost optimal (cf. [38, Theorem 2.8]) if one restricts to systems where the N-term approximation is formed under a certain polynomial search depth restriction. In the main text of the paper, however, we just construct some N-term approximation which does not necessarily fulfill this restriction concerning the search depth. In Section E, we give a modified proof which shows that one can indeed retain the same approximation rate, even under a polynomial search depth restriction.
Finally, in Section 7 we completely characterize the existence of embeddings S^{p₁,q₁}_{α₁,s₁}(ℝ²) ↪ S^{p₂,q₂}_{α₂,s₂}(ℝ²) between α-shearlet smoothness spaces for different values of α. Effectively, this characterizes the cases in which one can obtain sparsity with respect to α₂-shearlets when the only knowledge available is a certain sparsity with respect to α₁-shearlets.
1.6. Notation.
We write ℕ = ℤ≥1 for the set of natural numbers and ℕ₀ = ℤ≥0 for the set of natural numbers including 0. For a matrix A ∈ ℝ^{d×d}, we denote by A^T the transpose of A. The norm ‖A‖ of A is the usual operator norm of A, acting on ℝ^d equipped with the usual euclidean norm |·| = ‖·‖₂. The open euclidean ball of radius r > 0 around x ∈ ℝ^d is denoted by B_r(x). For a bounded linear operator T: X → Y between (quasi)-normed spaces X, Y, we denote the operator norm of T by |||T||| := |||T|||_{X→Y} := sup_{‖x‖_X ≤ 1} ‖Tx‖_Y. For an arbitrary set M, we let |M| ∈ ℕ₀ ∪ {∞} denote the number of elements of the set. For n ∈ ℕ₀, we write n̲ := {1, …, n}; in particular, 0̲ = ∅. For the closure of a subset M of some topological space, we write M̄.
The d-dimensional Lebesgue measure of a (measurable) set M ⊂ ℝ^d is denoted by λ(M) or by λ_d(M). Occasionally, we will also use the constant s^d := H^{d−1}(S^{d−1}), the surface area of the euclidean unit sphere S^{d−1} ⊂ ℝ^d. The complex conjugate of z ∈ ℂ is denoted by z̄. We use the convention x⁰ = 1 for all x ∈ [0, ∞), even for x = 0.
For a subset M ⊂ B of a fixed base set B (which is usually implied by the context), we define the indicator function (or characteristic function) 𝟙_M of the set M by 𝟙_M: B → {0, 1}, where 𝟙_M(x) = 1 if x ∈ M and 𝟙_M(x) = 0 otherwise.
The translation and modulation of a function f: ℝ^d → ℂ^k by x ∈ ℝ^d or ξ ∈ ℝ^d are, respectively, denoted by L_x f: ℝ^d → ℂ^k, y ↦ f(y − x), and M_ξ f: ℝ^d → ℂ^k, y ↦ e^{2πi⟨ξ,y⟩} f(y). Furthermore, for g: ℝ^d → ℂ^k, we use the notation g̃ for the function g̃: ℝ^d → ℂ^k, x ↦ g(−x).
For the Fourier transform, we use the convention f̂(ξ) := (Ff)(ξ) := ∫_{ℝ^d} f(x) · e^{−2πi⟨x,ξ⟩} dx for f ∈ L¹(ℝ^d). It is well known that the Fourier transform extends to a unitary automorphism F: L²(ℝ^d) → L²(ℝ^d).
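The translation and modulation operators just defined satisfy the commutation relation M_ξ L_x = e^{2πi⟨ξ,x⟩} · L_x M_ξ, which follows directly from the definitions. A minimal numerical sketch (an illustration of ours, not part of the paper):

```python
import numpy as np

# Translation (L_x f)(y) = f(y - x) and modulation (M_xi f)(y) = e^{2 pi i <xi,y>} f(y).
def translate(f, x):
    return lambda y: f(y - x)

def modulate(f, xi):
    return lambda y: np.exp(2j * np.pi * np.dot(xi, y)) * f(y)

f = lambda y: np.exp(-np.dot(y, y))                  # a Gaussian test function
x, xi = np.array([0.3, -1.2]), np.array([2.0, 0.5])
y = np.array([0.7, 0.1])

# Commutation relation: M_xi L_x = e^{2 pi i <xi, x>} L_x M_xi.
lhs = modulate(translate(f, x), xi)(y)
rhs = np.exp(2j * np.pi * np.dot(xi, x)) * translate(modulate(f, xi), x)(y)
```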
The inverse of this map is the continuous extension of the inverse Fourier transform, given by (F^{−1}f)(x) = ∫_{ℝ^d} f(ξ) · e^{2πi⟨x,ξ⟩} dξ for f ∈ L¹(ℝ^d). We will make frequent use of the space S(ℝ^d) of Schwartz functions and its topological dual space S′(ℝ^d), the space of tempered distributions. For more details on these spaces, we refer to [21, Section 9]; in particular, we note that the Fourier transform restricts to a linear homeomorphism F: S(ℝ^d) → S(ℝ^d); by duality, we can thus define F: S′(ℝ^d) → S′(ℝ^d) by Fϕ := ϕ ∘ F for ϕ ∈ S′(ℝ^d).
Given an open subset U ⊂ ℝ^d, we let D′(U) denote the space of distributions on U, i.e., the topological dual space of D(U) := C_c^∞(U). For the precise definition of the topology on C_c^∞(U), we refer to [55, Chapter 6]. We remark that the dual pairings ⟨·,·⟩_{D′,D} and ⟨·,·⟩_{S′,S} are always taken to be bilinear instead of sesquilinear.
Occasionally, we will make use of the Sobolev space

W^{N,p}(ℝ^d) = { f ∈ L^p(ℝ^d) | ∂^α f ∈ L^p(ℝ^d) for all α ∈ ℕ₀^d with |α| ≤ N }

with p ∈ [1, ∞]. Here, as usual for Sobolev spaces, the partial derivatives ∂^α f have to be understood in the distributional sense.
Furthermore, we will use the notations ⌈x⌉ := min{k ∈ ℤ | k ≥ x} and ⌊x⌋ := max{k ∈ ℤ | k ≤ x} for x ∈ ℝ. We observe ⌊x⌋ ≤ x < ⌊x⌋ + 1 and ⌈x⌉ − 1 < x ≤ ⌈x⌉. Sometimes, we also write x₊ := (x)₊ := max{0, x} for x ∈ ℝ.
Finally, we will frequently make use of the shearing matrices S_x, the α-parabolic dilation matrices D_b^(α) and the involutive matrix R, given (with rows separated by semicolons) by

S_x := ( 1 x ; 0 1 ), D_b^(α) := ( b 0 ; 0 b^α ), and R := ( 0 1 ; 1 0 ),   (1.5)

for x ∈ ℝ and α, b ∈ [0, ∞).
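The matrices from (1.5) are easy to sketch numerically; the following is an illustration of ours (the names `shear` and `dilation` are not from the paper). One checks S_x S_y = S_{x+y}, R² = id and det D_b^(α) = b^{1+α}:

```python
import numpy as np

def shear(x):
    # shearing matrix S_x from equation (1.5)
    return np.array([[1.0, x], [0.0, 1.0]])

def dilation(b, alpha):
    # alpha-parabolic dilation D_b^(alpha) = diag(b, b^alpha)
    return np.array([[b, 0.0], [0.0, b ** alpha]])

# involutive coordinate swap R
R = np.array([[0.0, 1.0], [1.0, 0.0]])
```

The shearing matrices form a one-parameter group, and R is its own inverse, which is what makes it "involutive".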
2. Structured Banach frame decompositions of decomposition spaces — A crash course
In this section, we give a brief introduction to the theory of structured Banach frames and atomic decompositions for decomposition spaces that was recently developed by one of the authors in [62].
We start with a crash course on decomposition spaces. These are defined using a suitable covering Q = (Q_i)_{i∈I} of (a subset of) the frequency space ℝ^d. For the decomposition spaces to be well-defined and for the theory in [62] to be applicable, the covering Q needs to be a semi-structured covering for which a regular partition of unity exists. For this, it suffices if Q is an almost structured covering. Since the notion of almost structured coverings is somewhat easier to understand than that of general semi-structured coverings, we will restrict ourselves to this concept. Definition 2.1.
Let ∅ ≠ O ⊂ ℝ^d be open. A family Q = (Q_i)_{i∈I} is called an almost structured covering of O if, for each i ∈ I, there are an invertible matrix T_i ∈ GL(ℝ^d), a translation b_i ∈ ℝ^d and an open, bounded set Q′_i ⊂ ℝ^d such that the following conditions are fulfilled:
(1) We have Q_i = T_i Q′_i + b_i for all i ∈ I.
(2) We have Q_i ⊂ O for all i ∈ I.
(3) Q is admissible, i.e., there is some N_Q ∈ ℕ satisfying |i*| ≤ N_Q for all i ∈ I, where the index-cluster i* is defined as

i* := { ℓ ∈ I | Q_ℓ ∩ Q_i ≠ ∅ } for i ∈ I.   (2.1)

(4) There is a constant C_Q > 0 satisfying ‖T_i^{−1} T_j‖ ≤ C_Q for all i ∈ I and all j ∈ i*.
(5) For each i ∈ I, there is an open set P′_i ⊂ ℝ^d with the following additional properties:
(a) We have P̄′_i ⊂ Q′_i for all i ∈ I.
(b) The sets {P′_i | i ∈ I} and {Q′_i | i ∈ I} are finite.
(c) We have O ⊂ ∪_{i∈I} (T_i P′_i + b_i). ◭
Remark.
• In the following, if we require Q = (Q_i)_{i∈I} = (T_i Q′_i + b_i)_{i∈I} to be an almost structured covering of O, it is always implicitly understood that T_i, Q′_i and b_i are chosen in such a way that the conditions in Definition 2.1 are satisfied.
• Since each set Q′_i is bounded and since the set {Q′_i | i ∈ I} is finite, the family (Q′_i)_{i∈I} is uniformly bounded, i.e., there is some R_Q > 0 satisfying Q′_i ⊂ B_{R_Q}(0) for all i ∈ I. ♦
A crucial property of almost structured coverings is that they always admit a regular partition of unity, a notion which was originally introduced in [61, Definition 2.4].
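To make the admissibility condition (3) concrete, here is a toy computation of ours (not from the paper): for the one-dimensional dyadic family Q_n = (2^{n−1}, 2^{n+1}), each set meets only itself and its two immediate neighbors, so the index clusters from (2.1) have size at most N_Q = 3.

```python
def overlaps(a, b):
    # two open intervals intersect iff each starts before the other ends
    return a[0] < b[1] and b[0] < a[1]

def clusters(intervals):
    # index cluster i* = { l : Q_l meets Q_i } for every i, as in (2.1)
    return [{l for l, J in enumerate(intervals) if overlaps(I, J)}
            for I in intervals]

intervals = [(2.0 ** (n - 1), 2.0 ** (n + 1)) for n in range(10)]
star = clusters(intervals)
N_Q = max(len(s) for s in star)   # admissibility constant of this toy covering
```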
Definition 2.2.
Let Q = (Q_i)_{i∈I} = (T_i Q′_i + b_i)_{i∈I} be an almost structured covering of the open set ∅ ≠ O ⊂ ℝ^d. We say that the family Φ = (ϕ_i)_{i∈I} is a regular partition of unity subordinate to Q if the following hold:
(1) We have ϕ_i ∈ C_c^∞(O) with supp ϕ_i ⊂ Q_i for all i ∈ I.
(2) We have Σ_{i∈I} ϕ_i ≡ 1 on O.
(3) For each α ∈ ℕ₀^d, the constant C^(α) := sup_{i∈I} ‖∂^α ϕ_i^♮‖_sup is finite, where for each i ∈ I, the normalized version ϕ_i^♮ of ϕ_i is defined as ϕ_i^♮: ℝ^d → ℂ, ξ ↦ ϕ_i(T_i ξ + b_i). ◭
Theorem 2.3. (cf. [61, Theorem 2.8] and see [1, Proposition 1] for a similar statement)
Every almost structured covering Q of an open subset ∅ ≠ O ⊂ ℝ^d admits a regular partition of unity Φ = (ϕ_i)_{i∈I} subordinate to Q. ◭
Before we can give the formal definition of decomposition spaces, we need one further notion:
Definition 2.4. (cf. [16, Definition 3.1]) Let ∅ ≠ O ⊂ ℝ^d be open and assume that Q = (Q_i)_{i∈I} is an almost structured covering of O. A weight w on the index set I is simply a sequence w = (w_i)_{i∈I} of positive numbers w_i > 0. The weight w is called Q-moderate if there is a constant C_{Q,w} > 0 satisfying

w_j ≤ C_{Q,w} · w_i for all i ∈ I and all j ∈ i*.   (2.2)

For an arbitrary weight w = (w_i)_{i∈I} on I and q ∈ (0, ∞], we define the weighted ℓ^q space ℓ^q_w(I) as

ℓ^q_w(I) := { (c_i)_{i∈I} ∈ ℂ^I | (w_i · c_i)_{i∈I} ∈ ℓ^q(I) },

equipped with the natural (quasi)-norm ‖(c_i)_{i∈I}‖_{ℓ^q_w} := ‖(w_i · c_i)_{i∈I}‖_{ℓ^q}. We will also use the notation ‖c‖_{ℓ^q_w} for arbitrary sequences c = (c_i)_{i∈I} ∈ [0, ∞]^I, with the understanding that ‖c‖_{ℓ^q_w} = ∞ if c_i = ∞ for some i ∈ I or if c ∉ ℓ^q_w(I). ◭
Now, we can finally give a precise definition of decomposition spaces. We begin with the (easier) case of the so-called
Fourier-side decomposition spaces. Definition 2.5.
Let Q = (Q_i)_{i∈I} be an almost structured covering of the open set ∅ ≠ O ⊂ ℝ^d, let w = (w_i)_{i∈I} be a Q-moderate weight on I and let p, q ∈ (0, ∞]. Finally, let Φ = (ϕ_i)_{i∈I} be a regular partition of unity subordinate to Q. We then define the associated Fourier-side decomposition space (quasi)-norm as

‖g‖_{D_F(Q,L^p,ℓ^q_w)} := ‖ ( ‖F^{−1}(ϕ_i · g)‖_{L^p} )_{i∈I} ‖_{ℓ^q_w} ∈ [0, ∞]

for each distribution g ∈ D′(O). The associated
Fourier-side decomposition space is simply

D_F(Q, L^p, ℓ^q_w) := { g ∈ D′(O) | ‖g‖_{D_F(Q,L^p,ℓ^q_w)} < ∞ }. ◭

Remark.
Before we continue with the definition of the actual (space-side) decomposition spaces, a few remarks are in order:
• The expression ‖F^{−1}(ϕ_i · g)‖_{L^p} ∈ [0, ∞] makes sense for each i ∈ I: since ϕ_i ∈ C_c^∞(O), the product ϕ_i · g is a compactly supported distribution on ℝ^d (and thus also a tempered distribution), so that the Paley-Wiener theorem (see e.g. [55, Theorem 7.23]) shows that the tempered distribution F^{−1}(ϕ_i · g) is given by (integration against) a smooth function, of which we can take the L^p quasi-norm.
• The notations ‖g‖_{D_F(Q,L^p,ℓ^q_w)} and D_F(Q, L^p, ℓ^q_w) both suppress the specific regular partition of unity Φ that was chosen. This is justified, since [60, Corollary 3.18] shows that any two L^p-BAPUs Φ, Ψ yield equivalent quasi-norms and thus the same (Fourier-side) decomposition spaces. This suffices, since [61, Corollary 2.7] shows that every regular partition of unity is also an L^p-BAPU for Q, for arbitrary p ∈ (0, ∞].
• Finally, [60, Theorem 3.21] shows that D_F(Q, L^p, ℓ^q_w) is a quasi-Banach space. ♦
Definition 2.6.
For an open set ∅ ≠ O ⊂ ℝ^d, let Z(O) := F(C_c^∞(O)) ⊂ S(ℝ^d) and equip this space with the unique topology which makes the Fourier transform F: C_c^∞(O) → Z(O), ϕ ↦ ϕ̂ into a homeomorphism. The topological dual space of Z(O) is denoted by Z′(O). By duality, we define the Fourier transform on Z′(O) by ĝ := Fg := g ∘ F ∈ D′(O) for g ∈ Z′(O).
Finally, under the assumptions of Definition 2.5, we define the (space-side) decomposition space associated to the parameters Q, p, q, w as

D(Q, L^p, ℓ^q_w) := { g ∈ Z′(O) | ‖g‖_{D(Q,L^p,ℓ^q_w)} := ‖ĝ‖_{D_F(Q,L^p,ℓ^q_w)} < ∞ }.

It is not hard to see that the Fourier transform F: Z′(O) → D′(O) is an isomorphism which restricts to an isometric isomorphism F: D(Q, L^p, ℓ^q_w) → D_F(Q, L^p, ℓ^q_w). ◭
Remark.
For an explanation of why the reservoirs D′(O) and Z′(O) are the correct choices for defining D_F(Q, L^p, ℓ^q_w) and D(Q, L^p, ℓ^q_w), even in the case O = ℝ^d, we refer to [60, Remark 3.13]. ♦
Now that we have formally introduced the notion of decomposition spaces, we present the framework developed in [62] for the construction of Banach frames and atomic decompositions for these spaces. To this end, we introduce the following set of notations and standing assumptions:
Assumption 2.7.
We fix an almost structured covering Q = (T_i Q′_i + b_i)_{i∈I} with associated regular partition of unity Φ = (ϕ_i)_{i∈I} for the remainder of the section. By definition of an almost structured covering, the set {Q′_i | i ∈ I} is finite. Hence, we have {Q′_i | i ∈ I} = {Q_0^(1), …, Q_0^(n)} for certain (not necessarily distinct) open, bounded subsets Q_0^(1), …, Q_0^(n) ⊂ ℝ^d. In particular, for each i ∈ I, there is some k_i ∈ n̲ satisfying Q′_i = Q_0^(k_i). We fix the choice of n ∈ ℕ, of the sets Q_0^(1), …, Q_0^(n) and of the map I → n̲, i ↦ k_i for the remainder of the section. ◭
Finally, we need a suitable coefficient space for our Banach frames and atomic decompositions. (The exact definition of an L^p-BAPU is not important for us. The interested reader can find the definition in [60, Definition 3.5].)
Definition 2.8.
For given p, q ∈ (0, ∞] and a given weight w = (w_i)_{i∈I} on I, we define the associated coefficient space as

C^{p,q}_w := ℓ^q_{( |det T_i|^{1/2 − 1/p} · w_i )_{i∈I}}( [ℓ^p(ℤ^d)]_{i∈I} )
         := { c = (c_k^{(i)})_{i∈I, k∈ℤ^d} | ‖c‖_{C^{p,q}_w} := ‖ ( |det T_i|^{1/2 − 1/p} · w_i · ‖(c_k^{(i)})_{k∈ℤ^d}‖_{ℓ^p} )_{i∈I} ‖_{ℓ^q} < ∞ } ⊂ ℂ^{I×ℤ^d}. ◭

Remark.
Observe that if w_i = |det T_i|^{1/p − 1/2} and if p = q, then C^{p,q}_w = ℓ^p(I × ℤ^d), with equal (quasi)-norms. ♦
Now that we have introduced the coefficient space C^{p,q}_w, we are in a position to discuss the existence criteria for Banach frames and atomic decompositions that were derived in [62]. We begin with the case of Banach frames. Theorem 2.9.
Let w = (w_i)_{i∈I} be a Q-moderate weight, let ε, p₀, q₀ ∈ (0, 1] and let p, q ∈ (0, ∞] with p ≥ p₀ and q ≥ q₀. Define

N := ⌈ (d + ε) / min{1, p₀} ⌉, τ := min{1, p₀, q₀} and σ := τ · ( d/min{1, p₀} + N ).

Let γ_1^(0), …, γ_n^(0): ℝ^d → ℂ be given and define γ_i := γ_{k_i}^(0) for i ∈ I. Assume that the following conditions are satisfied:
(1) We have γ_k^(0) ∈ L¹(ℝ^d) and Fγ_k^(0) ∈ C^∞(ℝ^d) for all k ∈ n̲, where all partial derivatives of Fγ_k^(0) are polynomially bounded.
(2) We have γ_k^(0) ∈ C¹(ℝ^d) and ∇γ_k^(0) ∈ L¹(ℝ^d) ∩ L^∞(ℝ^d) for all k ∈ n̲.
(3) We have (Fγ_k^(0))(ξ) ≠ 0 for all ξ in the closure of Q_0^(k) and all k ∈ n̲.
(4) We have

C₁ := sup_{i∈I} Σ_{j∈I} M_{j,i} < ∞ and C₂ := sup_{j∈I} Σ_{i∈I} M_{j,i} < ∞, where

M_{j,i} := (w_j / w_i)^τ · (1 + ‖T_j^{−1} T_i‖)^σ · max_{|β|≤1} ( |det T_i|^{−1} · ∫_{Q_i} max_{|α|≤N} |(∂^α F[∂^β γ_j])(T_j^{−1}(ξ − b_j))| dξ )^τ.

Then there is some δ₀ = δ₀(p, q, w, ε, (γ_i)_{i∈I}) > 0 such that for arbitrary 0 < δ ≤ δ₀, the family

( L_{δ · T_i^{−T} k} γ̃^[i] )_{i∈I, k∈ℤ^d} with γ^[i] = |det T_i|^{1/2} · M_{b_i}[γ_i ∘ T_i^T] and γ̃^[i](x) = γ^[i](−x)

forms a Banach frame for D(Q, L^p, ℓ^q_w). Precisely, this means the following:
• The analysis operator

A^(δ): D(Q, L^p, ℓ^q_w) → C^{p,q}_w, f ↦ ( (γ̃^[i] ∗ f)(δ · T_i^{−T} k) )_{i∈I, k∈ℤ^d}

is well-defined and bounded for each δ ∈ (0, 1].
Here, the convolution γ̃^[i] ∗ f is defined as

(γ̃^[i] ∗ f)(x) = Σ_{ℓ∈I} F^{−1}( F γ̃^[i] · ϕ_ℓ · f̂ )(x) for all x ∈ ℝ^d,   (2.3)

where the series converges normally in L^∞(ℝ^d) and thus absolutely and uniformly, for each f ∈ D(Q, L^p, ℓ^q_w). For a more convenient expression of (γ̃^[i] ∗ f)(x), at least for f ∈ L²(ℝ^d) ⊂ Z′(O), see Lemma 5.12.
• For 0 < δ ≤ δ₀, there is a bounded linear reconstruction operator R^(δ): C^{p,q}_w → D(Q, L^p, ℓ^q_w) satisfying R^(δ) ∘ A^(δ) = id_{D(Q,L^p,ℓ^q_w)}.
• We have the following consistency property: If Q-moderate weights w^(1) = (w_i^(1))_{i∈I} and w^(2) = (w_i^(2))_{i∈I} and exponents p₁, p₂, q₁, q₂ ∈ (0, ∞] are chosen such that the assumptions of the current theorem are satisfied for p₁, q₁, w^(1), as well as for p₂, q₂, w^(2), and if

0 < δ ≤ min{ δ₀(p₁, q₁, w^(1), ε, (γ_i)_{i∈I}), δ₀(p₂, q₂, w^(2), ε, (γ_i)_{i∈I}) },

then we have the following equivalence:

for all f ∈ D(Q, L^{p₂}, ℓ^{q₂}_{w^(2)}): f ∈ D(Q, L^{p₁}, ℓ^{q₁}_{w^(1)}) ⟺ ( (γ̃^[i] ∗ f)(δ · T_i^{−T} k) )_{i∈I, k∈ℤ^d} ∈ C^{p₁,q₁}_{w^(1)}.

Finally, there is an estimate for the size of δ₀ which is independent of the choice of p ≥ p₀ and q ≥ q₀: There is a constant K = K(p₀, q₀, ε, d, Q, Φ, γ_1^(0), …, γ_n^(0)) > 0 such that we can choose

δ₀ = 1 / [ K · C_{Q,w} · ( C₁^{1/τ} + C₂^{1/τ} ) ]. ◭

Proof.
This is a special case of Theorem E.4, applied with Ω₀ = Ω₁ = 1, K = 0 and v₀ = v ≡ 1. □
Now, we provide criteria which ensure that a given family of prototypes generates atomic decompositions. Theorem 2.10.
Let w = (w_i)_{i∈I} be a Q-moderate weight, let ε, p₀, q₀ ∈ (0, 1] and let p, q ∈ (0, ∞] with p ≥ p₀ and q ≥ q₀. Define

N := ⌈ (d + ε) / min{1, p₀} ⌉, τ := min{1, p₀, q₀}, ϑ := (1/p − 1)₊ and Υ := 1 + d/min{1, p₀},

as well as

σ := τ · (d + 1) if p ∈ [1, ∞], and σ := τ · ( p^{−1} · d + ⌈ p^{−1} · (d + ε) ⌉ ) if p ∈ (0, 1).

Let γ_1^(0), …, γ_n^(0): ℝ^d → ℂ be given and define γ_i := γ_{k_i}^(0) for i ∈ I. Assume that there are functions γ_1^{(0,j)}, …, γ_n^{(0,j)} for j ∈ {1, 2} such that the following conditions are satisfied:
(1) We have γ_k^{(0,1)} ∈ L¹(ℝ^d) for all k ∈ n̲.
(2) We have γ_k^{(0,2)} ∈ C¹(ℝ^d) for all k ∈ n̲.
(3) We have

Ω^(p) := max_{k∈n̲} ‖γ_k^{(0,2)}‖_Υ + max_{k∈n̲} ‖∇γ_k^{(0,2)}‖_Υ < ∞,

where ‖f‖_Υ := sup_{x∈ℝ^d} (1 + |x|)^Υ · |f(x)| for f: ℝ^d → ℂ^ℓ and (arbitrary) ℓ ∈ ℕ.
(4) We have Fγ_k^{(0,j)} ∈ C^∞(ℝ^d), and all partial derivatives of Fγ_k^{(0,j)} are polynomially bounded, for all k ∈ n̲ and j ∈ {1, 2}.
(5) We have γ_k^(0) = γ_k^{(0,1)} ∗ γ_k^{(0,2)} for all k ∈ n̲.
(6) We have (Fγ_k^(0))(ξ) ≠ 0 for all ξ in the closure of Q_0^(k) and all k ∈ n̲.
(7) We have ‖γ_k^(0)‖_Υ < ∞ for all k ∈ n̲.
(8) We have

K₁ := sup_{i∈I} Σ_{j∈I} N_{i,j} < ∞ and K₂ := sup_{j∈I} Σ_{i∈I} N_{i,j} < ∞,

where γ_{j,2} := γ_{k_j}^{(0,2)} for j ∈ I and

N_{i,j} := ( (w_i / w_j) · ( |det T_j| / |det T_i| )^ϑ )^τ · (1 + ‖T_j^{−1} T_i‖)^σ · ( |det T_i|^{−1} · ∫_{Q_i} max_{|α|≤N} |(∂^α F γ_{j,2})(T_j^{−1}(ξ − b_j))| dξ )^τ.
Then there is some δ₀ ∈ (0, 1] such that the family

Ψ_δ := ( L_{δ · T_i^{−T} k} γ^[i] )_{i∈I, k∈ℤ^d} with γ^[i] = |det T_i|^{1/2} · M_{b_i}[γ_i ∘ T_i^T]

forms an atomic decomposition of D(Q, L^p, ℓ^q_w), for all δ ∈ (0, δ₀]. Precisely, this means the following:
• The synthesis map

S^(δ): C^{p,q}_w → D(Q, L^p, ℓ^q_w), (c_k^{(i)})_{i∈I, k∈ℤ^d} ↦ Σ_{i∈I} Σ_{k∈ℤ^d} c_k^{(i)} · L_{δ · T_i^{−T} k} γ^[i]

is well-defined and bounded for every δ ∈ (0, 1].
• For 0 < δ ≤ δ₀, there is a bounded linear coefficient map C^(δ): D(Q, L^p, ℓ^q_w) → C^{p,q}_w satisfying S^(δ) ∘ C^(δ) = id_{D(Q,L^p,ℓ^q_w)}.
Finally, there is an estimate for the size of δ₀ which is independent of p ≥ p₀ and q ≥ q₀: There is a constant K = K(p₀, q₀, ε, d, Q, Φ, γ_1^(0), …, γ_n^(0)) > 0 such that we can choose

δ₀ = min{ 1, [ K · Ω^(p) · ( K₁^{1/τ} + K₂^{1/τ} ) ]^{−1} }. ◭

Remark.
• Convergence of the series defining S^(δ) has to be understood as follows: For each i ∈ I, the series Σ_{k∈ℤ^d} c_k^{(i)} · L_{δ · T_i^{−T} k} γ^[i] converges pointwise absolutely to a function g_i which defines a tempered distribution, and the series Σ_{i∈I} g_i = S^(δ)( (c_k^{(i)})_{i∈I, k∈ℤ^d} ) converges unconditionally in the weak-∗-sense in Z′(O), i.e., for every φ ∈ Z(O) = F(C_c^∞(O)), the series Σ_{i∈I} ⟨g_i, φ⟩_{S′,S} converges absolutely, and the functional φ ↦ Σ_{i∈I} ⟨g_i, φ⟩_{S′,S} is continuous on Z(O).
• The action of C^(δ) on a given f ∈ D(Q, L^p, ℓ^q_w) is independent of the precise choice of p, q, w, as long as C^(δ)f is defined at all. ♦
Proof of Theorem 2.10.
This is a special case of Theorem E.5, applied with Ω₀ = Ω₁ = 1 and v₀ = v ≡ 1. □
The main limitation of Theorem 2.10—in comparison to Theorem 2.9—is that we require each γ_k^(0) to be factorized as a convolution product γ_k^(0) = γ_k^{(0,1)} ∗ γ_k^{(0,2)}, which is tedious to verify. To simplify such verifications, the following result is helpful:
Proposition 2.11. (cf. [62, Lemma 6.9])
Let ϱ ∈ L¹(ℝ^d) with ϱ ≥ 0. Let N ∈ ℕ with N ≥ d + 1 and assume that γ ∈ L¹(ℝ^d) satisfies γ̂ ∈ C^N(ℝ^d) with

|∂^α γ̂(ξ)| ≤ ϱ(ξ) · (1 + |ξ|)^{−(d+1+ε)} for all ξ ∈ ℝ^d and all α ∈ ℕ₀^d with |α| ≤ N,

for some ε ∈ (0, 1]. Then there are functions γ₁ ∈ C⁰(ℝ^d) ∩ L¹(ℝ^d) and γ₂ ∈ C¹(ℝ^d) ∩ W^{1,1}(ℝ^d) with γ = γ₁ ∗ γ₂ and with the following additional properties:
(1) We have ‖γ₂‖_K ≤ s^d · 2^{d+3K} · K! · (1 + d)^K and ‖∇γ₂‖_K ≤ (s^d/ε) · 2^{d+3K} · (1 + d)^{K+1} · (K + 1)! for all K ∈ ℕ₀, where ‖g‖_K := sup_{x∈ℝ^d} (1 + |x|)^K |g(x)|.
(2) We have γ̂₂ ∈ C^∞(ℝ^d), with all partial derivatives of γ̂₂ being polynomially bounded (even bounded).
(3) If γ̂ ∈ C^∞(ℝ^d) with all partial derivatives being polynomially bounded, the same also holds for γ̂₁.
(4) We have ‖γ₁‖_N ≤ (1 + d)^N · 2^{d+4N} · N! · ‖ϱ‖_{L¹} and ‖γ₂‖_N ≤ (1 + d)^{N+1} · ‖ϱ‖_{L¹}.
(5) We have |∂^α γ̂₁(ξ)| ≤ 2^{d+4N} · N! · (1 + d)^N · ϱ(ξ) for all ξ ∈ ℝ^d and all α ∈ ℕ₀^d with |α| ≤ N. ◭
3. Definition and basic properties of α-shearlet smoothness spaces
In this section, we introduce the class of α-shearlet smoothness spaces. These spaces are a generalization of the "ordinary" shearlet smoothness spaces as introduced by Labate et al. [52]. Later on (cf. Theorem 5.13), it will turn out that these spaces simultaneously describe analysis and synthesis sparsity with respect to (suitable) α-shearlet frames. We will define the α-shearlet smoothness spaces as certain decomposition spaces.
Thus, we first have to define the associated covering and the weight for the sequence space ℓ^q_w(I) that we will use: Definition 3.1.
Let α ∈ [0, 1]. The α-shearlet covering S^(α) is defined as S^(α) := (S_i^(α))_{i∈I^(α)} = (T_i Q′_i)_{i∈I^(α)} = (T_i Q′_i + b_i)_{i∈I^(α)}, where:
• The index set I^(α) is given by I := I^(α) := {0} ∪ I₀, where

I₀ := I₀^(α) := { (n, m, ε, δ) ∈ ℕ₀ × ℤ × {±1} × {0, 1} | |m| ≤ G_n } with G_n := G_n^(α) := ⌈2^{n(1−α)}⌉.

• The basic sets (Q′_i)_{i∈I^(α)} are given by Q′₀ := (−1, 1)² and by Q′_i := Q := U^{(1/3,3)}_{(−1,1)} for i ∈ I₀^(α), where we used the notation

U^{(γ,μ)}_{(a,b)} := { (ξ, η) ∈ (γ, μ) × ℝ | η/ξ ∈ (a, b) } for a, b ∈ ℝ and γ, μ ∈ (0, ∞).   (3.1)

• The matrices (T_i)_{i∈I^(α)} are given by T₀ := id and by T_i := T_i^(α) := R^δ · A_{n,m,ε}^(α), with A_{n,m,ε}^(α) := ε · D_{2^n}^(α) · S_m^T for i = (n, m, ε, δ) ∈ I₀^(α). Here, the matrices R, S_x and D_b^(α) are as in equation (1.5).
• The translations (b_i)_{i∈I^(α)} are given by b_i := 0 for all i ∈ I^(α).
Finally, we define the weight w = (w_i)_{i∈I} by w₀ := 1 and w_{n,m,ε,δ} := 2^n for (n, m, ε, δ) ∈ I₀. ◭
Our first goal is to show that the covering S^(α) is an almost structured covering of ℝ² (cf. Definition 2.1). To this end, we begin with the following auxiliary lemma:
Lemma 3.2. (1) Using the notation U^{(γ,μ)}_{(a,b)} from equation (3.1) and the shearing matrices S_x from equation (1.5), we have for arbitrary m, a, b ∈ ℝ and κ, λ, γ, μ > 0 that

S_m^T U^{(γ,μ)}_{(a,b)} = U^{(γ,μ)}_{(m+a, m+b)} and diag(λ, κ) U^{(γ,μ)}_{(a,b)} = U^{(λγ,λμ)}_{((κ/λ)a, (κ/λ)b)}.   (3.2)

Consequently,

T_i^(α) U^{(γ,μ)}_{(a,b)} = ε · U^{(2^n γ, 2^n μ)}_{(2^{n(α−1)}(m+a), 2^{n(α−1)}(m+b))} for all i = (n, m, ε, 0) ∈ I₀.   (3.3)

In particular, S_{n,m,ε,0}^(α) = ε · U^{(2^n/3, 3·2^n)}_{(2^{n(α−1)}(m−1), 2^{n(α−1)}(m+1))}.
(2) Let i = (n, m, ε, δ) ∈ I₀ and let (ξ, η) ∈ S_i^(α) be arbitrary.
Then the following hold:
(a) If i = (n, m, ε, 0), we have |η| < 3 · |ξ|.
(b) If i = (n, m, ε, 1), we have |ξ| < 3 · |η|.
(c) We have 2^{n−2} < 2^n/3 < |(ξ, η)| < 12 · 2^n < 2^{n+4}. ◭
Proof.
We establish the different claims individually:
(1) The following is essentially identical to the proof of [59, Lemma 6.3.4] and is only given here for the sake of completeness. We first observe the following equivalences:

(ξ, η) ∈ U^{(γ,μ)}_{(m+a, m+b)} ⟺ ξ ∈ (γ, μ) and m + a < η/ξ < m + b
⟺ ξ ∈ (γ, μ) and a < (η − mξ)/ξ < b
⟺ (S_m^T)^{−1} (ξ, η) = (ξ, η − mξ) ∈ U^{(γ,μ)}_{(a,b)},

and

(ξ, η) ∈ U^{(λγ,λμ)}_{((κ/λ)a, (κ/λ)b)} ⟺ ξ ∈ (λγ, λμ) and (κ/λ)a < η/ξ < (κ/λ)b
⟺ λ^{−1}ξ ∈ (γ, μ) and a < (κ^{−1}η)/(λ^{−1}ξ) < b
⟺ diag(λ, κ)^{−1} (ξ, η) = (λ^{−1}ξ, κ^{−1}η) ∈ U^{(γ,μ)}_{(a,b)}.

These equivalences show diag(λ, κ) U^{(γ,μ)}_{(a,b)} = U^{(λγ,λμ)}_{((κ/λ)a, (κ/λ)b)} and S_m^T U^{(γ,μ)}_{(a,b)} = U^{(γ,μ)}_{(m+a, m+b)}. But for i = (n, m, ε, 0), we have T_i^(α) = R⁰ · A_{n,m,ε}^(α) = ε · diag(2^n, 2^{αn}) · S_m^T. This easily yields the claim.
(2) We again show the three claims individually:
(a) For i = (n, m, ε, 0) ∈ I₀, equation (3.3) yields for (ξ, η) ∈ S_i^(α) that

η/ξ ∈ ( 2^{n(α−1)}(m − 1), 2^{n(α−1)}(m + 1) ) ⊂ ( −2^{n(α−1)}(|m| + 1), 2^{n(α−1)}(|m| + 1) ),

since 2^{n(α−1)}(m + 1) ≤ 2^{n(α−1)}(|m| + 1) and 2^{n(α−1)}(m − 1) ≥ 2^{n(α−1)}(−|m| −
1) = −2^{n(α−1)}(|m| + 1). Because of |m| ≤ G_n = ⌈2^{n(1−α)}⌉ < 2^{n(1−α)} + 1 and |ξ| > 0, it follows that

|η| = |ξ| · |η/ξ| ≤ |ξ| · 2^{n(α−1)} · (|m| + 1) < |ξ| · 2^{−n(1−α)} · ( 2^{n(1−α)} + 2 ) = |ξ| · ( 1 + 2 · 2^{−n(1−α)} ) ≤ 3 · |ξ|.

(b) For i = (n, m, ε, 1) ∈ I₀ we have

(η, ξ) = R · (ξ, η) ∈ R S_{n,m,ε,1}^(α) = R T_{n,m,ε,1}^(α) Q = R R A_{n,m,ε}^(α) Q = A_{n,m,ε}^(α) Q = S_{n,m,ε,0}^(α),

so that we get |ξ| < 3 · |η| from the previous case.
Lemma 3.3.
The α-shearlet covering S^(α) from Definition 3.1 is an almost structured covering of ℝ². ◭
Since the proof of Lemma 3.3 is quite lengthy, although it does not yield too much insight, we postpone it to the appendix (Section B). Finally, before we can formally define the α-shearlet smoothness spaces, we still need to verify that the weight w from Definition 3.1 is S^(α)-moderate (cf. Definition 2.4). Lemma 3.4.
For arbitrary s ∈ ℝ, the weight w^s = (w_i^s)_{i∈I}, with w = (w_i)_{i∈I} as in Definition 3.1, is S^(α)-moderate (cf. equation (2.2)) with C_{S^(α),w^s} ≤ 39^{|s|}. Furthermore, we have

(1/3) · w_i ≤ 1 + |ξ| ≤ 13 · w_i for all i ∈ I and all ξ ∈ S_i^(α). ◭

Proof.
First, let i = (n, m, ε, δ) ∈ I₀ be arbitrary. By Lemma 3.2, we get

(1/3) · w_i = 2^n/3 ≤ |ξ| ≤ 1 + |ξ| ≤ 1 + 12 · 2^n ≤ 13 · 2^n = 13 · w_i for all ξ ∈ S_i^(α).

Furthermore, for i = 0, we have S_i^(α) = (−1, 1)² and thus

(1/3) · w_i ≤ w_i = 1 ≤ 1 + |ξ| ≤ 3 · w_i ≤ 13 · w_i for all ξ ∈ S_i^(α).

This establishes the second part of the lemma.
Next, let i, j ∈ I with S_i^(α) ∩ S_j^(α) ≠ ∅. Pick an arbitrary ξ ∈ S_i^(α) ∩ S_j^(α) and note as a consequence of the preceding estimates that

w_i / w_j ≤ 3 · (1 + |ξ|) · 13 · (1 + |ξ|)^{−1} = 39.

By symmetry, this implies 1/39 ≤ w_i/w_j ≤ 39 and thus also w_i^s / w_j^s = (w_i/w_j)^s ≤ 39^{|s|}. □
Now, we can finally formally define the α-shearlet smoothness spaces: Definition 3.5.
For α ∈ [0, 1], p, q ∈ (0, ∞] and s ∈ ℝ, we define the α-shearlet smoothness space S^{p,q}_{α,s}(ℝ²) associated to these parameters as

S^{p,q}_{α,s}(ℝ²) := D(S^(α), L^p, ℓ^q_{w^s}),

where the covering S^(α) and the weight w^s are as in Definition 3.1 and Lemma 3.4, respectively. ◭
Remark.
Since S^(α) is an almost structured covering by Lemma 3.3 and since w^s is S^(α)-moderate by Lemma 3.4, Definition 2.5 and the associated remark show that S^{p,q}_{α,s}(ℝ²) is indeed well-defined, i.e., independent of the chosen regular partition of unity subordinate to S^(α). The same remark also implies that S^{p,q}_{α,s}(ℝ²) is a quasi-Banach space. ♦
Recall that with our definition of decomposition spaces, S^{p,q}_{α,s}(ℝ²) is a subspace of Z′(ℝ²) = [F(C_c^∞(ℝ²))]′. But as our next result shows, each f ∈ S^{p,q}_{α,s}(ℝ²) actually extends to a tempered distribution:
Lemma 3.6.
Let α ∈ [0, 1], p, q ∈ (0, ∞] and s ∈ ℝ. Then

S^{p,q}_{α,s}(ℝ²) ↪ S′(ℝ²),

in the sense that each f ∈ S^{p,q}_{α,s}(ℝ²) extends to a uniquely determined tempered distribution f_S ∈ S′(ℝ²). Furthermore, the map S^{p,q}_{α,s}(ℝ²) → S′(ℝ²), f ↦ f_S is linear and continuous with respect to the weak-∗-topology on S′(ℝ²). ◭
Proof.
It is well known (cf. [21, Proposition 9.9]) that C_c^∞(ℝ²) ≤ S(ℝ²) is dense. Since F: S(ℝ²) → S(ℝ²) is a homeomorphism, we see that Z(ℝ²) = F(C_c^∞(ℝ²)) ≤ S(ℝ²) is dense, too. Hence, for arbitrary f ∈ S^{p,q}_{α,s}(ℝ²), if there is any extension g ∈ S′(ℝ²) of f ∈ Z′(ℝ²), then g is uniquely determined.
Next, by Lemma 3.3, S^(α) is almost structured, so that [60, Theorem 8.2] shows that S^(α) is a regular covering of ℝ². Thus, once we verify that there is some N ∈ ℕ such that the sequence w^(N) = (w_i^(N))_{i∈I} defined by

w_i^(N) := |det T_i^(α)|^{1/p} · max{ 1, ‖T_i^{−1}‖ } · [ inf_{ξ ∈ (S_i^(α))*} (1 + |ξ|) ]^{−N}

satisfies w^(N) ∈ ℓ^{q′}_{1/w^s}(I), with q′ = ∞ in case of q ∈ (0, 1], then the claim of the present lemma is a consequence of [60, Theorem 8.3] and the associated remark. Here, (S_i^(α))* = ∪_{j∈i*} S_j^(α).
Since I = {0} ∪ I₀ and since the single (finite(!)) term w_0^(N) does not influence membership of w^(N) in ℓ^{q′}_{1/w^s}, we only need to show w^(N)|_{I₀} ∈ ℓ^{q′}_{1/w^s}(I₀). But for i = (n, m, ε, δ) ∈ I₀, we have

‖T_i^{−1}‖ = ‖ ( 2^{−n} 0 ; −2^{−n}m 2^{−αn} ) ‖ ≤ 3.

Here, the last step used that |2^{−n}| ≤ 1, |2^{−αn}| ≤ 1 and that |m| ≤ G_n = ⌈2^{n(1−α)}⌉ ≤ ⌈2^n⌉ = 2^n, so that |−2^{−n}m| ≤ 1 as well.
Furthermore, Lemma 3.2 shows 2^n/3 ≤ |ξ| ≤ 12 · 2^n for all ξ ∈ S_i^(α). In particular, since we have |ξ| ≤ √2 for arbitrary ξ ∈ S_0^(α) = (−1, 1)², we have i* ⊂ I₀ as soon as 2^n/3 > √2, i.e., for n ≥ 3. Now, for n ≥ 3 and j = (ν, μ, e, d) ∈ i* ⊂ I₀, there is some η ∈ S_i^(α) ∩ S_j^(α), so that Lemma 3.2 yields 2^n/3 ≤ |η| ≤ 12 · 2^ν. Another application of Lemma 3.2 then shows

|ξ| ≥ 2^ν/3 ≥ (1/3) · (1/36) · 2^n = 2^n/108 for all ξ ∈ S_j^(α).
All in all, we have shown |ξ| ≳ 2^n for all ξ ∈ (S_i^{(α)})^∗, for arbitrary i = (n, m, ε, δ) ∈ I₀ with n sufficiently large. But for the finitely many remaining values of n, the same estimate holds trivially after enlarging the implied constant, so that it is valid for all i = (n, m, ε, δ) ∈ I₀.

Overall, we conclude

  w_i^{(N)} ≲ 2^{(1+α)n p^{−1}} · 2^{−nN} = 2^{n((1+α)p^{−1} − N)}   for all i = (n, m, ε, δ) ∈ I₀.

For arbitrary θ ∈ (0,1], this implies

  ∑_{i=(n,m,ε,δ)∈I₀} [ w_i^{(N)} / w_i^s ]^θ ≲ ∑_{n=0}^∞ ∑_{|m|≤G_n} 2^{nθ((1+α)p^{−1} − s − N)}   (since w_i^s = 2^{ns})
    ≲ ∑_{n=0}^∞ 2^{θn(θ^{−1} + (1+α)p^{−1} − s − N)}   (since G_n ≤ 2^n)
    < ∞

as soon as N > θ^{−1} + (1+α)p^{−1} − s, which can always be satisfied. Since we have ℓ^θ(I) ↪ ℓ^{q′}(I) for θ ≤ q′, this shows that we always have w^{(N)} ∈ ℓ^{q′}_{1/w^s}(I) for sufficiently large N ∈ ℕ. As explained above, we can thus invoke [60, Theorem 8.3] to complete the proof. □

Now that we have verified that the α-shearlet smoothness spaces are indeed well-defined (quasi-)Banach spaces, our next goal is to verify that the theory of structured Banach frame decompositions for decomposition spaces—as outlined in Section 2—applies to these spaces. This is the goal of the next section. As we will see (e.g. in Theorem 5.13), this implies that the α-shearlet smoothness spaces simultaneously characterize analysis sparsity and synthesis sparsity with respect to (suitable) α-shearlet systems.
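The summability argument concluding the proof boils down to a geometric series whose ratio is a power of 2; with the exponent read as θn(θ^{-1} + (1+α)p^{-1} − s − N), convergence holds exactly when N exceeds θ^{-1} + (1+α)p^{-1} − s. The following sketch checks this threshold numerically for hypothetical parameter values (the exponent is our reconstruction from the surrounding estimates, not a formula quoted from the paper):

```python
def partial_sums(alpha, p, s, theta, N, n_max=200):
    """Partial sums of sum_n 2^{theta*n*(1/theta + (1+alpha)/p - s - N)},
    the geometric series bounding the l^theta-quasi-norm in the proof above.
    It converges iff the exponent is negative, i.e. iff N > 1/theta + (1+alpha)/p - s."""
    expo = theta * (1 / theta + (1 + alpha) / p - s - N)
    total, sums = 0.0, []
    for n in range(n_max + 1):
        total += 2.0 ** (expo * n)
        sums.append(total)
    return sums

# hypothetical parameters: alpha = 0.5, p = 1, s = 0, theta = 1,
# so the threshold is N > 1 + 1.5 = 2.5
good = partial_sums(0.5, 1.0, 0.0, 1.0, N=3)   # ratio 2^{-0.5} < 1: converges
bad = partial_sums(0.5, 1.0, 0.0, 1.0, N=2)    # ratio 2^{+0.5} > 1: diverges
print(good[-1], bad[-1])
```

The partial sums for N = 3 stabilize at 1/(1 − 2^{−1/2}) ≈ 3.41, while for N = 2 they blow up, mirroring the role of "sufficiently large N" in the proof.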
4. Construction of Banach frame decompositions for α-shearlet smoothness spaces

We now want to verify the pertinent conditions from Theorems 2.9 and 2.10 for the α-shearlet smoothness spaces. To this end, first recall from Definition 3.1 that we have Q′_i = Q for all i ∈ I₀ and furthermore Q′_0 = (−1,1)². Consequently, in the notation of Assumption 2.7, we can choose n = 2 and Q_0^{(1)} := Q, as well as Q_0^{(2)} := Q′_0.

We fix a low-pass filter ϕ ∈ W^{1,1}(ℝ²) ∩ C¹(ℝ²) and a mother shearlet ψ ∈ W^{1,1}(ℝ²) ∩ C¹(ℝ²). Then we set (again in the notation of Assumption 2.7) γ_1^{(0)} := ψ and γ_2^{(0)} := ϕ, as well as k_0 := 2 and k_i := 1 for i ∈ I₀. With these choices, the family Γ = (γ_i)_{i∈I} introduced in Theorems 2.9 and 2.10 satisfies γ_i = γ^{(0)}_{k_i} = γ_1^{(0)} = ψ for i ∈ I₀ and γ_0 = γ^{(0)}_{k_0} = γ_2^{(0)} = ϕ, so that the family Γ is completely determined by ϕ and ψ.

Our main goal in this section is to derive readily verifiable conditions on ϕ, ψ which guarantee that the generalized shift-invariant system Ψ_δ := (L_{δ·T_i^{−T}k} γ^{[i]})_{i∈I, k∈ℤ²}, with γ^{[i]} = |det T_i|^{1/2} · γ_i ∘ T_i^T, generates, respectively, a Banach frame or an atomic decomposition for the α-shearlet smoothness space S^{p,q}_{α,s}(ℝ²), for sufficiently small δ > 0.

Precisely, we assume ψ̂, ϕ̂ ∈ C^∞(ℝ²), where all partial derivatives of these functions are assumed to be polynomially bounded.
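To make the ingredients of Ψ_δ concrete, the sketch below builds matrices of the assumed form T_i = S_m · A_{2^n,α} (an integer shear composed with the anisotropic dyadic dilation diag(2^n, 2^{αn}); the exact matrices T_i are fixed in Definition 3.1, so this particular factorization is an assumption) and the normalized generator γ^{[i]} = |det T_i|^{1/2} · γ ∘ T_i^T:

```python
import numpy as np

def alpha_shear_dilation(n, m, alpha):
    """Assumed form of T_i for an index i = (n, m, ...): integer shear S_m
    composed with the anisotropic dyadic dilation diag(2^n, 2^{alpha*n})."""
    S = np.array([[1.0, float(m)], [0.0, 1.0]])
    A = np.diag([2.0 ** n, 2.0 ** (alpha * n)])
    return S @ A

def normalized_dilate(gamma, T):
    """gamma^{[i]} = |det T_i|^{1/2} * (gamma o T_i^T), as in the text; the
    factor |det T_i|^{1/2} makes the dilation an L^2-isometry."""
    root_det = np.sqrt(abs(np.linalg.det(T)))
    return lambda x: root_det * gamma(T.T @ np.asarray(x, dtype=float))

# the shear is volume-preserving, so |det T_i| = 2^{(1+alpha)*n} independently of m
T = alpha_shear_dilation(3, 2, 0.5)
print(abs(np.linalg.det(T)))   # 2^{4.5}
```

In particular |det T_i|^{1/2} = 2^{(1+α)n/2}, which is the scale-dependent normalization appearing throughout the estimates below.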
Furthermore, we assume (at least for the application of Theorem 2.9) that

  max_{|β|≤1} max_{|θ|≤N} |(∂^θ \widehat{∂^β ψ})(ξ)| ≤ C · min{|ξ₁|^{M₁}, (1+|ξ₁|)^{−M₂}} · (1+|ξ₂|)^{−K} = C · θ₁(ξ₁) · θ₂(ξ₂) = C · ϱ₁(ξ),
  max_{|β|≤1} max_{|θ|≤N} |(∂^θ \widehat{∂^β ϕ})(ξ)| ≤ C · (1+|ξ|)^{−H} = C · ϱ₂(ξ)   (4.1)

for all ξ = (ξ₁, ξ₂) ∈ ℝ², a suitable constant C > 0 and certain M₁, M₂, K, H ∈ [0,∞) and N ∈ ℕ. To be precise, we note that equation (4.1) employed the abbreviations θ₁(ξ₁) := min{|ξ₁|^{M₁}, (1+|ξ₁|)^{−M₂}} and θ₂(ξ₂) := (1+|ξ₂|)^{−K} for ξ₁, ξ₂ ∈ ℝ, as well as ϱ₁(ξ) := θ₁(ξ₁)·θ₂(ξ₂) and ϱ₂(ξ) := (1+|ξ|)^{−H} for ξ = (ξ₁, ξ₂) ∈ ℝ².

Our goal in the following is to derive conditions on N, M₁, M₂, K, H (depending on p, q, s, α) which ensure that the family Ψ_δ indeed forms a Banach frame or an atomic decomposition for S^{p,q}_{α,s}(ℝ²) = D(S^{(α)}, L^p, ℓ^q_{w^s}).

To verify the conditions of Theorem 2.9 (recalling that b_j = 0 for all j ∈ I), we need to estimate the quantity

  M_{j,i} := (w_j^s / w_i^s)^τ · ‖T_j^{−1} T_i‖^σ · ( max_{|β|≤1} |det T_i|^{−1} · ∫_{S_i^{(α)}} max_{|θ|≤N} |(∂^θ \widehat{∂^β γ_j})(T_j^{−1} ξ)| dξ )^τ
    ≤ C^τ · (w_j^s / w_i^s)^τ · ‖T_j^{−1} T_i‖^σ · ( |det T_i|^{−1} · ∫_{S_i^{(α)}} ϱ_j(T_j^{−1} ξ) dξ )^τ =: C^τ · M_{j,i}^{(0)}   (4.2)

(using eq. (4.1)) with σ, τ > 0 and N ∈ ℕ as in Theorem 2.9 and arbitrary i, j ∈ I, where we defined ϱ_j := ϱ₁ for j ∈ I₀ and ϱ₀ := ϱ₂, with ϱ₁ and ϱ₂ as defined in equation (4.1).

In view of equation (4.2), the following—highly nontrivial—lemma is crucial:

Lemma 4.1.
Let α ∈ [0,1] and τ₀, ω, c ∈ (0,∞). Furthermore, let K, H, M₁, M₂ ∈ [0,∞). Then there is a constant C = C(α, τ₀, ω, c, K, H, M₁, M₂) > 0 with the following property:

If σ, τ ∈ (0,∞) and s ∈ ℝ satisfy τ ≥ τ₀ and σ/τ ≤ ω, and if we have K ≥ K₀ + c, M₁ ≥ M₁^{(0)} + c and M₂ ≥ M₂^{(0)} + c, as well as H ≥ H₀ + c, for

  K₀ := max{σ/τ − s, σ/τ},                  if α = 1,
  K₀ := max{(1−α)/τ + 2σ/τ − s, σ/τ},       if α ∈ [0,1),
  M₁^{(0)} := 1/τ + s,                       if α = 1,
  M₁^{(0)} := 1/τ + max{s, 0},               if α ∈ [0,1),
  M₂^{(0)} := (1+α)·σ/τ − s,
  H₀ := (1−α)/τ + σ/τ − s,

then we have

  max{ sup_{i∈I} ∑_{j∈I} M_{j,i}^{(0)} , sup_{j∈I} ∑_{i∈I} M_{j,i}^{(0)} } ≤ C^τ,

where M_{j,i}^{(0)} is as in equation (4.2), i.e.,

  M_{j,i}^{(0)} := (w_j^s / w_i^s)^τ · ‖T_j^{−1} T_i‖^σ · ( |det T_i|^{−1} · ∫_{S_i^{(α)}} ϱ_j(T_j^{−1} ξ) dξ )^τ,

with ϱ₀(ξ) = (1+|ξ|)^{−H} and ϱ_j(ξ) = min{|ξ₁|^{M₁}, (1+|ξ₁|)^{−M₂}} · (1+|ξ₂|)^{−K} for arbitrary j ∈ I₀. ◭

The proof of Lemma 4.1 is highly technical and very lengthy. In order not to disrupt the flow of the paper too severely, we defer the proof to the appendix (Section C).

Using the general result of Lemma 4.1, we can now derive convenient sufficient conditions concerning the low-pass filter ϕ and the mother shearlet ψ which ensure that ϕ, ψ generate a Banach frame for S^{p,q}_{α,s}(ℝ²).

Theorem 4.2.
Let α ∈ [0,1], ε > 0, p₀, q₀ ∈ (0,1] and s₀, s₁ ∈ ℝ with s₀ ≤ s₁. Assume that ϕ, ψ : ℝ² → ℂ satisfy the following:

• ϕ, ψ ∈ L¹(ℝ²) and ϕ̂, ψ̂ ∈ C^∞(ℝ²), where all partial derivatives of ϕ̂, ψ̂ have at most polynomial growth.
• ϕ, ψ ∈ C¹(ℝ²) and ∇ϕ, ∇ψ ∈ L¹(ℝ²) ∩ L^∞(ℝ²).
• We have ψ̂(ξ) ≠ 0 for all ξ = (ξ₁, ξ₂) ∈ ℝ² with ξ₁ ∈ [3^{−1}, 3] and |ξ₂| ≤ |ξ₁|, and ϕ̂(ξ) ≠ 0 for all ξ ∈ [−1,1]².
• There is some C > 0 such that ψ̂ and ϕ̂ satisfy the estimates

  |∂^θ ψ̂(ξ)| ≤ C · |ξ₁|^{M₁} · (1+|ξ₂|)^{−(1+K)}              for all ξ = (ξ₁, ξ₂) ∈ ℝ² with |ξ₁| ≤ 1,
  |∂^θ ψ̂(ξ)| ≤ C · (1+|ξ₁|)^{−(M₂+1)} · (1+|ξ₂|)^{−(K+1)}      for all ξ = (ξ₁, ξ₂) ∈ ℝ²,
  |∂^θ ϕ̂(ξ)| ≤ C · (1+|ξ|)^{−(H+1)}                           for all ξ ∈ ℝ²   (4.3)

for all θ ∈ ℕ₀² with |θ| ≤ N₀, where N₀ := ⌈p₀^{−1}·(2+ε)⌉ and

  K := ε + max{ (1−α)/min{p₀, q₀} + 2(1/p₀ + N₀) − s₀ , 1/min{p₀, q₀} + 2/p₀ + N₀ },
  M₁ := ε + 1/min{p₀, q₀} + max{s₁, 0},
  M₂ := max{ 0, ε + (1+α)(1/p₀ + N₀) − s₀ },
  H := max{ 0, ε + (1−α)/min{p₀, q₀} + 2/p₀ + N₀ − s₀ }.

Then there is some δ₀ ∈ (0,1] such that for 0 < δ ≤ δ₀ and all p, q ∈ (0,∞] and s ∈ ℝ with p ≥ p₀, q ≥ q₀ and s₀ ≤ s ≤ s₁, the following is true: The family

  \widetilde{SH}^{(±)}_{α,ϕ,ψ,δ} := ( L_{δ·T_i^{−T}k} γ̃^{[i]} )_{i∈I, k∈ℤ²}   with γ̃^{[i]}(x) = γ^{[i]}(−x) and γ^{[i]} := |det T_i|^{1/2}·(ψ ∘ T_i^T) if i ∈ I₀, γ^{[0]} := ϕ,

forms a Banach frame for S^{p,q}_{α,s}(ℝ²) = D(S^{(α)}, L^p, ℓ^q_{w^s}). Precisely, this means the following:

(1) The analysis operator A^{(δ)} : S^{p,q}_{α,s}(ℝ²) → C^{p,q}_{w^s}, f ↦ [ (γ^{[i]} ∗ f)(δ·T_i^{−T}k) ]_{i∈I, k∈ℤ²} is well-defined and bounded for arbitrary δ ∈ (0,1], with the coefficient space C^{p,q}_{w^s} from Definition 2.8. The convolution γ^{[i]} ∗ f has to be understood as explained in equation (2.3); see Lemma 5.12 for a more convenient expression for this convolution, for f ∈ L²(ℝ²).

(2) For 0 < δ ≤ δ₀, there is a bounded linear reconstruction operator R^{(δ)} : C^{p,q}_{w^s} → S^{p,q}_{α,s}(ℝ²) satisfying R^{(δ)} ∘ A^{(δ)} = id_{S^{p,q}_{α,s}(ℝ²)}.

(3) For 0 < δ ≤ δ₀, we have the following consistency statement: If f ∈ S^{p,q}_{α,s}(ℝ²) and if p ≤ p̃ ≤ ∞, q ≤ q̃ ≤ ∞ and s₀ ≤ s̃ ≤ s₁, then the following equivalence holds:

  f ∈ S^{p̃,q̃}_{α,s̃}(ℝ²)  ⟺  [ (γ^{[i]} ∗ f)(δ·T_i^{−T}k) ]_{i∈I, k∈ℤ²} ∈ C^{p̃,q̃}_{w^{s̃}}. ◭

Proof.
First, we show that there are constants K , K > such that max | β |≤ max | θ |≤ N (cid:12)(cid:12)(cid:0) ∂ θ d ∂ β ψ (cid:1) ( ξ ) (cid:12)(cid:12) ≤ K · min n | ξ | M , (1 + | ξ | ) − M o · (1 + | ξ | ) − K =: K · ̺ ( ξ ) (4.4)and max | β |≤ max | θ |≤ N (cid:12)(cid:12)(cid:0) ∂ θ d ∂ β ϕ (cid:1) ( ξ ) (cid:12)(cid:12) ≤ K · (1 + | ξ | ) − H =: K · ̺ ( ξ ) (4.5)for all ξ = ( ξ , ξ ) ∈ R .To this end, we recall that ϕ, ψ ∈ C (cid:0) R (cid:1) ∩ W , (cid:0) R (cid:1) , so that standard properties of the Fourier transformshow for β = e ℓ (the ℓ -th unit vector) that d ∂ β ψ ( ξ ) = 2 πi · ξ ℓ · b ψ ( ξ ) and d ∂ β ϕ ( ξ ) = 2 πi · ξ ℓ · b ϕ ( ξ ) ∀ ξ ∈ R . Then, Leibniz’s rule yields for β = e ℓ and arbitrary θ ∈ N with | θ | ≤ N that (cid:12)(cid:12)(cid:0) ∂ θ d ∂ β ψ (cid:1) ( ξ ) (cid:12)(cid:12) = 2 π · (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)X ν ≤ θ (cid:18) θν (cid:19) · ( ∂ ν ξ ℓ ) · (cid:0) ∂ θ − ν b ψ (cid:1) ( ξ ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ N +1 π · (1 + | ξ ℓ | ) · max | η |≤ N | ( ∂ η b ψ ) ( ξ ) | (4.6) ≤ N +1 π · (1 + | ξ ℓ | ) · C · (1 + | ξ | ) − (1+ M ) (1 + | ξ | ) − (1+ K ) ≤ N +1 πC · (1 + | ξ | ) − M · (1 + | ξ | ) − K , (4.7)since we have | ∂ ν ξ ℓ | = | ξ ℓ | , if ν = 01 , if ν = e ℓ , otherwise and thus | ∂ ν ξ ℓ | ≤ | ξ ℓ | ≤ | ξ | . Above, we also used that P ν ≤ θ (cid:0) θν (cid:1) = (2 , . . . , θ = 2 | θ | ≤ N , as a consequence of the d -dimensional binomialtheorem (cf. 
[21, Section 8.1, Exercise 2.b]).Likewise, we get (cid:12)(cid:12)(cid:12)(cid:16) ∂ θ d ∂ β ϕ (cid:17) ( ξ ) (cid:12)(cid:12)(cid:12) = 2 π · (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)X ν ≤ θ (cid:18) θν (cid:19) · ( ∂ ν ξ ℓ ) · (cid:0) ∂ θ − ν b ϕ (cid:1) ( ξ ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ N +1 π · (1 + | ξ | ) · max | η |≤ N | ∂ η b ϕ ( ξ ) |≤ N +1 πC · (1 + | ξ | ) − H = 2 N +1 πC · ̺ ( ξ ) and, by assumption, (cid:12)(cid:12) ∂ θ b ϕ ( ξ ) (cid:12)(cid:12) ≤ C · (1 + | ξ | ) − ( H +1) ≤ C · (1 + | ξ | ) − H = C · ̺ ( ξ ) . With this, we have already established equation (4.5) with K := 2 N +1 πC .To validate equation (4.4), we now distinguish the two cases | ξ | > and | ξ | ≤ : Case 1 : We have | ξ | > . In this case, ̺ ( ξ ) = (1 + | ξ | ) − M (1 + | ξ | ) − K , so that equation (4.7) shows (cid:12)(cid:12)(cid:12)(cid:16) ∂ θ d ∂ β ψ (cid:17) ( ξ ) (cid:12)(cid:12)(cid:12) ≤ N +1 πC · ̺ ( ξ ) for β = e ℓ , ℓ ∈ { , } and arbitrary θ ∈ N with | θ | ≤ N . Finally, we also have (cid:12)(cid:12)(cid:12) ∂ θ b ψ ( ξ ) (cid:12)(cid:12)(cid:12) ≤ C · (1 + | ξ | ) − (1+ M ) (1 + | ξ | ) − (1+ K ) ≤ C · (1 + | ξ | ) − M (1 + | ξ | ) − K = C · ̺ ( ξ ) and hence max | β |≤ max | θ |≤ N (cid:12)(cid:12)(cid:12)(cid:16) ∂ θ d ∂ β ψ (cid:17) ( ξ ) (cid:12)(cid:12)(cid:12) ≤ N +1 πC · ̺ ( ξ ) for all ξ ∈ R with | ξ | > . nalysis vs. Synthesis Sparsity for α -Shearlets — Felix Voigtlaender & Anne Pein Case 2 : We have | ξ | ≤ . First note that this implies (1 + | ξ | ) − M ≥ − M ≥ − M | ξ | M and consequently ̺ ( ξ ) ≥ − M | ξ | M · (1 + | ξ | ) − K . Furthermore, we have for arbitrary ℓ ∈ { , } that | ξ ℓ | ≤ max { | ξ | , | ξ |} ≤ max { , | ξ |} ≤ · (1 + | ξ | ) . 
In conjunction with equation (4.6), this shows for β = e ℓ , ℓ ∈ { , } and θ ∈ N with | θ | ≤ N that (cid:12)(cid:12)(cid:12)(cid:16) ∂ θ d ∂ β ψ (cid:17) ( ξ ) (cid:12)(cid:12)(cid:12) ≤ N +1 π · (1 + | ξ ℓ | ) · max | η |≤ N (cid:12)(cid:12)(cid:12) ∂ η b ψ ( ξ ) (cid:12)(cid:12)(cid:12) ≤ N +2 πC · (1 + | ξ | ) · | ξ | M · (1 + | ξ | ) − (1+ K ) ≤ M + N πC · ̺ ( ξ ) . Finally, we also have (cid:12)(cid:12)(cid:12) ∂ θ b ψ ( ξ ) (cid:12)(cid:12)(cid:12) ≤ C · | ξ | M (1 + | ξ | ) − (1+ K ) ≤ C · | ξ | M (1 + | ξ | ) − K ≤ M C · ̺ ( ξ ) . All in all, we have shown max | β |≤ max | θ |≤ N (cid:12)(cid:12)(cid:12)(cid:16) ∂ θ d ∂ β ψ (cid:17) ( ξ ) (cid:12)(cid:12)(cid:12) ≤ M + N πC · ̺ ( ξ ) for all ξ ∈ R with | ξ | ≤ .All together, we have thus established eq. (4.4) with K := 2 M + N πC . Now, define C ♦ := max { K , K } = K .Now, for proving the current theorem, we want to apply Theorem 2.9 with γ (0)1 := ψ , γ (0)2 := ϕ and k i := 1 for i ∈ I and k := 2 , as well as Q (1)0 := Q = U ( − , ) ( − , and Q (2)0 := ( − , , cf. Assumption 2.7 and Definition 3.1. Inthe notation of Theorem 2.9, we then have γ i = γ (0) k i for all i ∈ I , i.e., γ i = ψ for i ∈ I and γ = ϕ . Using thisnotation and setting furthermore ̺ i := ̺ for i ∈ I , we have thus shown for arbitrary N ∈ N with N ≤ N that M j,i : = (cid:18) w sj w si (cid:19) τ · (cid:0) (cid:13)(cid:13) T − j T i (cid:13)(cid:13)(cid:1) σ · max | β |≤ | det T i | − · Z S ( α ) i max | θ |≤ N (cid:12)(cid:12)(cid:12)(cid:16) ∂ θ [ ∂ β γ j (cid:17) (cid:0) T − j ξ (cid:1)(cid:12)(cid:12)(cid:12) d ξ ! τ ≤ C τ ♦ · (cid:18) w sj w si (cid:19) τ · (cid:0) (cid:13)(cid:13) T − j T i (cid:13)(cid:13)(cid:1) σ · | det T i | − · Z S ( α ) i ̺ j (cid:0) T − j ξ (cid:1) d ξ ! τ =: C τ ♦ · M (0) j,i for arbitrary σ, τ > , s ∈ R and the S ( α ) -moderate weight w s (cf. 
Lemma 3.4).In view of the assumptions of the current theorem, the prerequisites (1)-(3) of Theorem 2.9 are clearly fulfilled,but we still need to verify C := sup i ∈ I X j ∈ I M j,i < ∞ and C := sup j ∈ I X i ∈ I M j,i < ∞ , with M j,i as above, τ := min { , p, q } ≥ min { p , q } =: τ , and N := (cid:24) ε min { , p } (cid:25) ≤ (cid:24) εp (cid:25) = N , as well as σ := τ · (cid:18) { , p } + N (cid:19) ≤ τ · (cid:18) p + N (cid:19) . (4.8)In particular, we have στ ≤ p + N = p + l εp m =: ω .Hence, Lemma 4.1 (with c = ε ) yields a constant C = C ( α, τ , ω, ε, K, H, M , M ) with max { C , C } ≤ C τ ♦ C τ ,provided that we can show H ≥ H + ε , K ≥ K + ε and M ℓ ≥ M (0) ℓ + ε for ℓ ∈ { , } , with H , K , M (0)1 , M (0)2 asdefined in Lemma 4.1. But we have H = 1 − ατ + στ − s ≤ − ατ + ω − s = 1 − α min { p , q } + 2 p + N − s ≤ H − ε. Furthermore, M (0)2 = (1 + α ) στ − s ≤ (1 + α ) ω − s = (1 + α ) (cid:18) p + N (cid:19) − s ≤ M − ε nalysis vs. Synthesis Sparsity for α -Shearlets — Felix Voigtlaender & Anne Pein and M ( )1 ≤ τ + max { s, } ≤ { p , q } + max { s , } = M − ε, as well as K ≤ max (cid:26) − ατ + 2 στ − s, στ (cid:27) ≤ max (cid:26) − ατ + 2 ω − s , τ + ω (cid:27) = max (cid:26) − α min { p , q } + 2 (cid:18) p + N (cid:19) − s , { p , q } + 2 p + N (cid:27) = K − ε. Thus, Lemma 4.1 is applicable, so that C /τ = (cid:18) sup i ∈ I X j ∈ I M j,i (cid:19) /τ ≤ C ♦ C , where the right-hand side is independent of p, q and s , since C is independent of p, q and s and since C ♦ = C ♦ ( ε, p , M , C ) = 2 M + N πC = 2 M + l εp m πC. The exact same estimate holds for C .We have shown that all prerequisites for Theorem 2.9 are fulfilled. Hence, the theorem implies that there is aconstant K ♦ = K ♦ (cid:0) p , q , ε, S ( α ) , ϕ, ψ (cid:1) > (independent of p, q, s ) such that the family f SH ( ± α,ϕ,ψ,δ forms a Banachframe for S p,qα,s (cid:0) R (cid:1) , as soon as δ ≤ δ , where δ := (cid:18) K ♦ · C S ( α ) ,w s · (cid:16) C /τ + C /τ (cid:17) (cid:19) − . 
From Lemma 3.4 we know that C S ( α ) ,w s ≤ | s | ≤ s where s := max {| s | , | s |} . Hence, choosing δ := (cid:0) · K ♦ · C ♦ · C · s (cid:1) − , we get δ ≤ δ and δ is independent of the precise choice of p, q, s , as long as p ≥ p , q ≥ q and s ≤ s ≤ s .Thus, for < δ ≤ δ and arbitrary p, q ∈ (0 , ∞ ] , s ∈ R with p ≥ p , q ≥ q and s ≤ s ≤ s , the family f SH ( ± α,ϕ,ψ,δ forms a Banach frame for S p,qα,s (cid:0) R (cid:1) . (cid:3) Finally, we also come to verifiable sufficient conditions which ensure that the low-pass ϕ and the mother shearlet ψ generate atomic decompositions for S p,qα,s (cid:0) R (cid:1) . Theorem 4.3.
Let α ∈ [0 , , ε, p , q ∈ (0 , and s , s ∈ R with s ≤ s . Assume that ϕ, ψ ∈ L (cid:0) R (cid:1) satisfy thefollowing properties: • We have k ϕ k p < ∞ and k ψ k p < ∞ , where k g k Λ = sup x ∈ R (1 + | x | ) Λ | g ( x ) | for g : R → C ℓ (witharbitrary ℓ ∈ N ) and Λ ≥ . • We have b ϕ, b ψ ∈ C ∞ (cid:0) R (cid:1) , where all partial derivatives of b ϕ, b ψ are polynomially bounded. • We have b ψ ( ξ ) = 0 for all ξ = ( ξ , ξ ) ∈ R with ξ ∈ (cid:2) − , (cid:3) and | ξ | ≤ | ξ | , b ϕ ( ξ ) = 0 for all ξ ∈ [ − , . • We have (cid:12)(cid:12) ∂ β b ϕ ( ξ ) (cid:12)(cid:12) . (1 + | ξ | ) − Λ , (cid:12)(cid:12) ∂ β b ψ ( ξ ) (cid:12)(cid:12) . min n | ξ | Λ , (1 + | ξ | ) − Λ o · (1 + | ξ | ) − Λ · (1 + | ξ | ) − (3+ ε ) (4.9) nalysis vs. Synthesis Sparsity for α -Shearlets — Felix Voigtlaender & Anne Pein for all ξ = ( ξ , ξ ) ∈ R and all β ∈ N with | β | ≤ (cid:6) p − · (2 + ε ) (cid:7) , where Λ := ε + max n − α min { p ,q } + 3 + s , o , if p = 1 , ε + max n − α min { p ,q } + − αp + 1 + α + l εp m + s , o , if p ∈ (0 , , Λ := ε + 1min { p , q } + max (cid:26) , (1 + α ) (cid:18) p − (cid:19) − s (cid:27) , Λ := ( ε + max { , α ) + s } , if p = 1 ,ε + max n , (1 + α ) (cid:16) p + l εp m(cid:17) + s o , if p ∈ (0 , , Λ := ε + max n − α min { p ,q } + 6 + s , { p ,q } + 3 o , if p = 1 ,ε + max n − α min { p ,q } + − αp + 2 l εp m + 1 + α + s , { p ,q } + p + l εp mo , if p ∈ (0 , . Then there is some δ ∈ (0 , such that for all < δ ≤ δ and all p, q ∈ (0 , ∞ ] and s ∈ R with p ≥ p , q ≥ q and s ≤ s ≤ s , the following is true: The family SH ( ± α,ϕ,ψ,δ := (cid:16) L δ · T − Ti k γ [ i ] (cid:17) i ∈ I, k ∈ Z with γ [ i ] := ( | det T i | / · (cid:0) ψ ◦ T Ti (cid:1) , if i ∈ I ,ϕ, if i = 0 forms an atomic decomposition for S p,qα,s (cid:0) R (cid:1) . 
Precisely, this means the following:

(1) The synthesis map

  S^{(δ)} : C^{p,q}_{w^s} → S^{p,q}_{α,s}(ℝ²), (c_k^{(i)})_{i∈I, k∈ℤ²} ↦ ∑_{i∈I} ∑_{k∈ℤ²} c_k^{(i)} · L_{δ·T_i^{−T}k} γ^{[i]}

is well-defined and bounded for all δ ∈ (0,1], where the coefficient space C^{p,q}_{w^s} is as in Definition 2.8. Convergence of the series has to be understood as described in the remark to Theorem 2.10.

(2) For 0 < δ ≤ δ₀, there is a bounded linear coefficient map C^{(δ)} : S^{p,q}_{α,s}(ℝ²) → C^{p,q}_{w^s} satisfying S^{(δ)} ∘ C^{(δ)} = id_{S^{p,q}_{α,s}(ℝ²)}. Furthermore, the action of C^{(δ)} is independent of the precise choice of p, q, s. Precisely, if p₁, p₂ ≥ p₀, q₁, q₂ ≥ q₀ and s^{(1)}, s^{(2)} ∈ [s₀, s₁], and if f ∈ S^{p₁,q₁}_{α,s^{(1)}} ∩ S^{p₂,q₂}_{α,s^{(2)}}, then C_1^{(δ)} f = C_2^{(δ)} f, where C_i^{(δ)} denotes the coefficient operator for the choices p = p_i, q = q_i and s = s^{(i)} for i ∈ {1, 2}. ◭

Proof.
Later in the proof, we will apply Theorem 2.10 to the decomposition space S p,qα,s ( R ) = D (cid:0) S ( α ) , L p , ℓ qw s (cid:1) with w and w s as in Lemma 3.4, while Theorem 2.10 itself considers the decomposition space D ( Q , L p , ℓ qw ) . Toavoid confusion between these two different choices of the weight w , we will write v for the weight defined in Lemma3.4, so that we get S p,qα,s ( R ) = D (cid:0) S ( α ) , L p , ℓ qv s (cid:1) . For the application of Theorem 2.10, we will thus choose Q = S ( α ) and w = v s .Our assumptions on ϕ show that there is a constant C > satisfying (cid:12)(cid:12) ∂ β b ϕ ( ξ ) (cid:12)(cid:12) ≤ C · (1 + | ξ | ) − Λ for all β ∈ N with | β | ≤ N := (cid:6) p − · (2 + ε ) (cid:7) . We first apply Proposition 2.11 (with N = N ≥ ⌈ ε ⌉ = 3 = d + 1 , with γ = ϕ and with ̺ = ̺ for ̺ ( ξ ) := C · (1 + | ξ | ) ε − Λ , where we note Λ − − ε ≥ ε , so that ̺ ∈ L (cid:0) R (cid:1) ).We indeed have (cid:12)(cid:12) ∂ β b ϕ ( ξ ) (cid:12)(cid:12) ≤ C · (1 + | ξ | ) − Λ = ̺ ( ξ ) · (1 + | ξ | ) − ( d +1+ ε ) for all | β | ≤ N , since we are working in R d = R . Consequently, Proposition 2.11 provides functions ϕ ∈ C (cid:0) R (cid:1) ∩ L (cid:0) R (cid:1) and ϕ ∈ C (cid:0) R (cid:1) ∩ W , (cid:0) R (cid:1) with ϕ = ϕ ∗ ϕ and with the following additional properties:(1) We have k ϕ k Λ < ∞ and k∇ ϕ k Λ < ∞ for all Λ ∈ N .(2) We have c ϕ ∈ C ∞ (cid:0) R (cid:1) , where all partial derivatives of c ϕ are polynomially bounded.(3) We have c ϕ ∈ C ∞ (cid:0) R (cid:1) , where all partial derivatives of c ϕ are polynomially bounded. This uses that b ϕ ∈ C ∞ (cid:0) R (cid:1) with all partial derivatives being polynomially bounded.(4) We have (cid:12)(cid:12) ∂ β c ϕ ( ξ ) (cid:12)(cid:12) ≤ C C · ̺ ( ξ ) = C · (1 + | ξ | ) ε − Λ ∀ ξ ∈ R and β ∈ N with | β | ≤ N . (4.10)Here, C is given by C := C · N · N ! · N . nalysis vs. 
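Proposition 2.11 splits the generator as a convolution ϕ = ϕ₁ ∗ ϕ₂; the mechanism behind this is that the Fourier transform turns convolution into a pointwise product, so the required Fourier decay distributes over the two factors. A discrete sanity check of that mechanism (of the convolution theorem only, not of Proposition 2.11 itself):

```python
import numpy as np

# If gamma = gamma1 * gamma2 (convolution), then hat(gamma) = hat(gamma1) * hat(gamma2)
# pointwise. Discrete (circular) analogue via the FFT:
rng = np.random.default_rng(0)
g1, g2 = rng.standard_normal(64), rng.standard_normal(64)
conv = np.real(np.fft.ifft(np.fft.fft(g1) * np.fft.fft(g2)))  # circular convolution
lhs = np.fft.fft(conv)                    # transform of the convolution
rhs = np.fft.fft(g1) * np.fft.fft(g2)     # product of the transforms
print(np.max(np.abs(lhs - rhs)))          # agree up to rounding error
```

This is why it suffices to bound the derivatives of ϕ̂₁ and ϕ̂₂ separately: their product reproduces the decay demanded of ϕ̂.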
Synthesis Sparsity for α -Shearlets — Felix Voigtlaender & Anne Pein Likewise, our assumptions on ψ show that there is a constant C > satisfying (cid:12)(cid:12)(cid:12) ∂ β b ψ ( ξ ) (cid:12)(cid:12)(cid:12) ≤ C · min n | ξ | Λ , (1 + | ξ | ) − Λ o · (1 + | ξ | ) − Λ · (1 + | ξ | ) − (3+ ε ) ∀ ξ ∈ R ∀ β ∈ N with | β | ≤ N . Now, we again apply Proposition 2.11, but this time with N = N ≥ d + 1 , with γ = ψ and with ̺ = ̺ for ̺ ( ξ ) := C · min n | ξ | Λ , (1 + | ξ | ) − Λ o · (1 + | ξ | ) − Λ , where we note that Λ ≥ ε and Λ ≥ ≥ ε , so that ̺ ( ξ ) ≤ C · (1 + | ξ | ) − (2+ ε ) · (1 + | ξ | ) − (2+ ε ) ≤ C · [max { | ξ | , | ξ |} ] − (2+ ε ) ≤ C · (1 + k ξ k ∞ ) − (2+ ε ) ∈ L ( R ) . As we just saw, we indeed have (cid:12)(cid:12)(cid:12) ∂ β b ψ ( ξ ) (cid:12)(cid:12)(cid:12) ≤ ̺ ( ξ ) · (1 + | ξ | ) − ( d +1+ ε ) for all | β | ≤ N , since we are working in R d = R . Consequently, Proposition 2.11 provides functions ψ ∈ C (cid:0) R (cid:1) ∩ L (cid:0) R (cid:1) and ψ ∈ C (cid:0) R (cid:1) ∩ W , (cid:0) R (cid:1) with ψ = ψ ∗ ψ and with the following additional properties:(1) We have k ψ k Λ < ∞ and k∇ ψ k Λ < ∞ for all Λ ∈ N .(2) We have c ψ ∈ C ∞ (cid:0) R (cid:1) , where all partial derivatives of c ψ are polynomially bounded.(3) We have c ψ ∈ C ∞ (cid:0) R (cid:1) , where all partial derivatives of c ψ are polynomially bounded. This uses that b ψ ∈ C ∞ (cid:0) R (cid:1) with all partial derivatives being polynomially bounded.(4) We have (cid:12)(cid:12) ∂ β c ψ ( ξ ) (cid:12)(cid:12) ≤ C C · ̺ ( ξ )= C · min n | ξ | Λ , (1 + | ξ | ) − Λ o · (1 + | ξ | ) − Λ ∀ ξ ∈ R and β ∈ N with | β | ≤ N . (4.11)Here, C is given by C := C · N · N ! 
· N .In summary, if we define M := Λ , M := Λ and K := Λ , as well as H := Λ − − ε , then we have M , M , K, H ≥ and max | β |≤ N (cid:12)(cid:12)(cid:12) ∂ β c ψ ( ξ ) (cid:12)(cid:12)(cid:12) ≤ C · min n | ξ | M , (1 + | ξ | ) − M o · (1 + | ξ | ) − K =: C · ̺ ( ξ ) , max | β |≤ N (cid:12)(cid:12) ∂ β c ϕ ( ξ ) (cid:12)(cid:12) ≤ C · (1 + | ξ | ) − H =: C · ̺ ( ξ ) , (4.12)where we defined C := max { C , C } for brevity. For consistency with Lemma 4.1, we define ̺ j := ̺ for arbitrary j ∈ I .Now, define n := 2 , γ (0)1 := ψ and γ (0)2 := ϕ , as well as γ (0 ,j )1 := ψ j and γ (0 ,j )2 := ϕ j for j ∈ { , } . We want toverify the assumptions of Theorem 2.10 for these choices and for S p,qα,s (cid:0) R (cid:1) = D (cid:0) S ( α ) , L p , ℓ qv s (cid:1) = D ( Q , L p , ℓ qw ) . Tothis end, we recall from Definition 3.1 that Q := S ( α ) = ( T i Q ′ i + b i ) i ∈ I , with Q ′ i = U ( − , ) ( − , = Q =: Q (1)0 = Q ( k i )0 for all i ∈ I , where k i := 1 for i ∈ I and with Q ′ = ( − , =: Q (2)0 = Q ( k )0 , where k := 2 and n := 2 , cf.Assumption 2.7.Now, let us verify the list of prerequisites of Theorem 2.10:(1) We have γ (0 , k ∈ { ϕ , ψ } ⊂ L (cid:0) R (cid:1) for k ∈ { , } by the properties of ϕ , ψ from above.(2) Likewise, we have γ (0 , k ∈ { ϕ , ψ } ⊂ C (cid:0) R (cid:1) by the properties of ϕ , ψ from above.(3) Next, with Υ = 1 + d min { ,p } as in Theorem 2.10, we have Υ ≤ p =: Υ and thus, with Ω ( p ) as inTheorem 2.10, Ω ( p ) = max k ∈ n (cid:13)(cid:13)(cid:13) γ (0 , k (cid:13)(cid:13)(cid:13) Υ + max k ∈ n (cid:13)(cid:13)(cid:13) ∇ γ (0 , k (cid:13)(cid:13)(cid:13) Υ = max {k ϕ k Υ , k ψ k Υ } + max {k∇ ϕ k Υ , k∇ ψ k Υ }≤ max n k ϕ k ⌈ Υ ⌉ , k ψ k ⌈ Υ ⌉ o + max n k∇ ϕ k ⌈ Υ ⌉ , k∇ ψ k ⌈ Υ ⌉ o =: C < ∞ (4.13)by the properties of ϕ , ψ from above.(4) We have F γ (0 ,j ) k ∈ nc ϕ , c ψ , c ϕ , c ψ o ⊂ C ∞ (cid:0) R (cid:1) and all partial derivatives of these functions are polynomi-ally bounded.(5) We have γ (0)1 = ψ = ψ ∗ ψ = γ (0 , ∗ γ (0 , and γ (0)2 = 
ϕ = ϕ ∗ ϕ = γ (0 , ∗ γ (0 , . nalysis vs. Synthesis Sparsity for α -Shearlets — Felix Voigtlaender & Anne Pein (6) By assumption, we have F γ (0)1 ( ξ ) = b ψ ( ξ ) = 0 for all ξ ∈ Q = Q (1)0 . Likewise, we have F γ (0)2 ( ξ ) = b ϕ ( ξ ) = 0 for all ξ ∈ [ − , = ( − , = Q (2)0 .(7) We have k γ (0)1 k Υ = k ψ k Υ ≤ k ψ k Υ = k ψ k p < ∞ and k γ (0)2 k Υ = k ϕ k Υ ≤ k ϕ k p < ∞ , thanks to ourassumptions on ϕ, ψ .Thus, as the last prerequisite of Theorem 2.10, we have to verify K := sup i ∈ I X j ∈ I N i,j < ∞ and K := sup j ∈ I X i ∈ I N i,j < ∞ , where γ j, := γ (0 , k j for j ∈ I (i.e., γ , = γ (0 , = ϕ and γ j, = γ (0 , = ψ for j ∈ I ) and N i,j := w i w j · (cid:20) | det T j || det T i | (cid:21) ϑ ! τ · (cid:0) (cid:13)(cid:13) T − j T i (cid:13)(cid:13)(cid:1) σ · (cid:18) | det T i | − · Z Q i max | β |≤ N (cid:12)(cid:12)(cid:2) ∂ β d γ j, (cid:3) (cid:0) T − j ( ξ − b j ) (cid:1)(cid:12)(cid:12) d ξ (cid:19) τ ( since b j =0 for all j ∈ I ) ( ∗ ) ≤ v (1+ α ) ϑ − sj v (1+ α ) ϑ − si ! τ · (cid:0) (cid:13)(cid:13) T − j T i (cid:13)(cid:13)(cid:1) σ · | det T i | − · Z S ( α ) i max | β |≤ N (cid:12)(cid:12)(cid:2) ∂ β d γ j, (cid:3) (cid:0) T − j ξ (cid:1)(cid:12)(cid:12) d ξ ! τ ( eq. (4.12) and N ≤ N ) ≤ C τ · M (0) j,i , where the quantity M (0) j,i is defined as in Lemma 4.1, but with s ♮ := (1 + α ) ϑ − s instead of s . At the step markedwith ( ∗ ) , we used that we have w = v s and | det T i | = v αi for all i ∈ I .To be precise, we recall from Theorem 2.10 that the quantities N, τ, σ, ϑ from above are given (because of d = 2 )by ϑ = (cid:0) p − − (cid:1) + , τ = min { , p, q } ≥ min { p , q } =: τ and N = (cid:6) ( d + ε ) (cid:14) min { , p } (cid:7) ≤ (cid:6) p − · (2 + ε ) (cid:7) = N , as well as σ = ( τ · ( d + 1) = 3 · τ, if p ∈ [1 , ∞ ] ,τ · (cid:16) dp + l d + εp m(cid:17) = τ · (cid:16) p + N (cid:17) ≤ τ · (cid:16) p + N (cid:17) , if p ∈ (0 , . 
In particular, we have στ ≤ p + N =: ω , even in case of p ∈ [1 , ∞ ] , since p + N ≥ N ≥ ⌈ ε ⌉ ≥ .Now, Lemma 4.1 (with c = ε ) yields a constant C = C ( α, τ , ω, ε, K, H, M , M ) = C ( α, p , q , ε, Λ , Λ , Λ , Λ ) > satisfying max { K , K } ≤ C τ C τ , provided that we can show H ≥ H + ε , K ≥ K + ε and M ℓ ≥ M (0) ℓ + ε for ℓ ∈ { , } , where K := ( max (cid:8) στ − s ♮ , στ (cid:9) , if α = 1 , max (cid:8) − ατ + 2 στ − s ♮ , στ (cid:9) , if α ∈ [0 , ,M (0)1 := ( τ + s ♮ , if α = 1 , τ + max (cid:8) s ♮ , (cid:9) , if α ∈ [0 , ,M (0)2 := (1 + α ) στ − s ♮ ,H := 1 − ατ + στ − s ♮ . But we have H = ( − ατ + 3 + s, if p ∈ [1 , ∞ ] , − ατ + p + N − h (1 + α ) (cid:16) p − (cid:17) − s i , if p ∈ (0 , ( − ατ + 3 + s, if p ∈ [1 , ∞ ] , − ατ + − αp + 1 + α + l εp m + s, if p ∈ (0 , ≤ ( − ατ + 3 + s , if p ∈ [1 , ∞ ] , − ατ + − αp + 1 + α + l εp m + s , if p ∈ (0 , ≤ Λ − − ε = H − ε, nalysis vs. Synthesis Sparsity for α -Shearlets — Felix Voigtlaender & Anne Pein as an easy case distinction (using (cid:6) p − · (2 + ε ) (cid:7) ≥ ⌈ ε ⌉ ≥ and the observation that p ∈ (0 , entails p ∈ (0 , )shows.Furthermore, M (0)2 = ( · (1 + α ) + s, if p ∈ [1 , ∞ ] , (1 + α ) (cid:16) p + N (cid:17) − h (1 + α ) (cid:16) p − (cid:17) − s i , if p ∈ (0 , ( · (1 + α ) + s, if p ∈ [1 , ∞ ] , (1 + α ) (cid:16) p + N (cid:17) + s, if p ∈ (0 , ≤ ( · (1 + α ) + s , if p ∈ [1 , ∞ ] , (1 + α ) (cid:16) p + l εp m(cid:17) + s , if p ∈ (0 , ≤ Λ − ε = M − ε, as one can see again using an easy case distinction, since (cid:6) p − · (2 + ε ) (cid:7) ≥ ⌈ ε ⌉ ≥ .Likewise, M (0)1 ≤ τ + max (cid:8) s ♮ , (cid:9) ≤ τ + max ( , (1 + α ) (cid:18) p − (cid:19) + − s ) ≤ τ + max (cid:26) , (1 + α ) (cid:18) p − (cid:19) − s (cid:27) = Λ − ε = M − ε. 
Finally, we also have K ≤ max (cid:26) − ατ + 2 στ − s ♮ , στ (cid:27) = ( max (cid:8) − ατ + 6 + s, τ + 3 (cid:9) , if p ∈ [1 , ∞ ] , max n − ατ + 2 (cid:16) p + N (cid:17) − h (1 + α ) (cid:16) p − (cid:17) − s i , τ + (cid:16) p + N (cid:17)o , if p ∈ (0 , ( max (cid:8) − ατ + 6 + s, τ + 3 (cid:9) , if p ∈ [1 , ∞ ] , max n − ατ + − αp + 2 N + 1 + α + s, τ + p + N o , if p ∈ (0 , ≤ max n − ατ + 6 + s , τ + 3 o , if p ∈ [1 , ∞ ] , max n − ατ + − αp + 2 l εp m + 1 + α + s , τ + p + l εp mo , if p ∈ (0 , ≤ Λ − ε = K − ε, as one can see again using an easy case distinction and the estimate (cid:6) p − · (2 + ε ) (cid:7) ≥ ⌈ ε ⌉ ≥ .Consequently, Lemma 4.1 is indeed applicable and yields max { K , K } ≤ C τ C τ . We have thus verified allassumptions of Theorem 2.10, which yields a constant K = K (cid:16) p , q , ε, d, Q , Φ , γ (0)1 , . . . , γ (0) n (cid:17) = K ( p , q , ε, α, ϕ, ψ ) > such that the family SH ( ± α,ϕ,ψ,δ from the statement of the current theorem yields an atomic decomposition of the α -shearlet smoothness space S p,qα,s (cid:0) R (cid:1) = D ( Q , L p , ℓ qv s ) , as soon as < δ ≤ δ := min (cid:26) , h K · Ω ( p ) · (cid:16) K /τ + K /τ (cid:17)i − (cid:27) . But in equation (4.13) we saw Ω ( p ) ≤ C independently of p ≥ p , q ≥ q and of s ∈ [ s , s ] , so that δ ≥ δ := min n , [2 K · C C C ] − o , where δ > is independent of the precise choice of p, q, s , as long as p ≥ p , q ≥ q and s ∈ [ s , s ] . The claimsconcerning the notion of convergence for the series defining S ( δ ) and concerning the independence of the action of C ( δ ) from the choice of p, q, s are consequences of the remark after Theorem 2.10. (cid:3) nalysis vs. Synthesis Sparsity for α -Shearlets — Felix Voigtlaender & Anne Pein If ϕ, ψ are compactly supported and if the mother shearlet ψ is a tensor product, the preceding conditions canbe simplified significantly: Corollary 4.4.
Let α ∈ [0,1], ε > 0, p₀, q₀ ∈ (0,1] and s₀, s₁ ∈ ℝ with s₀ ≤ s₁. Let Λ₁, …, Λ₄ be as in Theorem 4.3 and set N₀ := ⌈p₀^{−1}·(2+ε)⌉.

Assume that the mother shearlet ψ can be written as ψ = ψ₁ ⊗ ψ₂ and that ϕ, ψ₁, ψ₂ satisfy the following:

(1) We have ϕ ∈ C_c^{⌈Λ₄⌉}(ℝ²), ψ₁ ∈ C_c^{⌈Λ₂+3+ε⌉}(ℝ), and ψ₂ ∈ C_c^{⌈Λ₃+3+ε⌉}(ℝ).
(2) We have (d^ℓ/dξ^ℓ) ψ̂₁(0) = 0 for ℓ = 0, …, N₀ + ⌈Λ₁⌉ − 1.
(3) We have ϕ̂(ξ) ≠ 0 for all ξ ∈ [−1,1]².
(4) We have ψ̂₁(ξ) ≠ 0 for all ξ ∈ [3^{−1}, 3] and ψ̂₂(ξ) ≠ 0 for all ξ ∈ [−3,3].

Then ϕ, ψ satisfy all assumptions of Theorem 4.3. ◭

Proof.
Since ϕ, ψ ∈ L (cid:0) R (cid:1) are compactly supported, it is well known that b ϕ, b ψ ∈ C ∞ (cid:0) R (cid:1) with all partialderivatives being polynomially bounded (in fact bounded). Thanks to the compact support and boundedness of ϕ, ψ , we also clearly have k ϕ k p < ∞ and k ψ k p < ∞ .Next, if ξ = ( ξ , ξ ) ∈ R satisfies ξ ∈ (cid:2) − , (cid:3) and | ξ | ≤ | ξ | , then | ξ | ≤ | ξ | ≤ , i.e., ξ ∈ [ − , . Thus b ψ ( ξ ) = c ψ ( ξ ) · c ψ ( ξ ) = 0 , as required in Theorem 4.3.Hence, it only remains to verify (cid:12)(cid:12) ∂ β b ϕ ( ξ ) (cid:12)(cid:12) . (1 + | ξ | ) − Λ and (cid:12)(cid:12)(cid:12) ∂ β b ψ ( ξ ) (cid:12)(cid:12)(cid:12) . min n | ξ | Λ , (1 + | ξ | ) − Λ o · (1 + | ξ | ) − Λ · (1 + | ξ | ) − (3+ ε ) for all ξ ∈ R and all β ∈ N with | β | ≤ N . To this end, we first recall that differentiation under the integral showsfor g ∈ C c (cid:0) R d (cid:1) that b g ∈ C ∞ (cid:0) R d (cid:1) , where the derivatives are given by ∂ β b g ( ξ ) = Z R d g ( x ) · ∂ βξ e − πi h x,ξ i d x = Z R d ( − πix ) β g ( x ) · e − πi h x,ξ i d x = (cid:16) F h x ( − πix ) β g ( x ) i(cid:17) ( ξ ) . (4.14)Furthermore, the usual mantra that “smoothness of f implies decay of b f ” shows that every g ∈ W N, (cid:0) R d (cid:1) satisfies | b g ( ξ ) | . (1 + | ξ | ) − N , see e.g. [62, Lemma 6.3].Now, because of ϕ ∈ C ⌈ Λ ⌉ c (cid:0) R (cid:1) , we also have h x ( − πix ) β ϕ ( x ) i ∈ C ⌈ Λ ⌉ c (cid:0) R (cid:1) ֒ → W ⌈ Λ ⌉ , (cid:0) R (cid:1) and thus (cid:12)(cid:12) ∂ β b ϕ ( ξ ) (cid:12)(cid:12) = (cid:12)(cid:12)(cid:12)(cid:16) F h x ( − πix ) β ϕ ( x ) i(cid:17) ( ξ ) (cid:12)(cid:12)(cid:12) . (1 + | ξ | ) −⌈ Λ ⌉ ≤ (1 + | ξ | ) − Λ , as desired.For the estimate concerning b ψ , we have to work slightly harder: With the same arguments as for ϕ , we get (cid:12)(cid:12)(cid:12) ∂ β c ψ ( ξ ) (cid:12)(cid:12)(cid:12) . (1 + | ξ | ) − (Λ +3+ ε ) and (cid:12)(cid:12)(cid:12) ∂ β c ψ ( ξ ) (cid:12)(cid:12)(cid:12) . (1 + | ξ | ) − (Λ +3+ ε ) for all | β | ≤ N . 
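The vanishing-derivative condition on ψ̂₁ at the origin is, by differentiating under the Fourier integral, equivalent to vanishing moments: ψ̂₁^{(ℓ)}(0) = (−2πi)^ℓ ∫ x^ℓ ψ₁(x) dx. The sketch below checks this for a hypothetical compactly supported generator ψ₁ := b^{(L)} with the polynomial bump b(x) = (1 − x²)^K on [−1,1] (a standard construction, not the paper's choice of ψ₁): integration by parts makes the first L moments vanish exactly.

```python
import numpy as np
from numpy.polynomial import polynomial as P

K, L = 8, 3                      # bump b(x) = (1 - x^2)^K on [-1, 1]; psi1 := b^{(L)}
b = np.array([1.0])
for _ in range(K):
    b = P.polymul(b, [1.0, 0.0, -1.0])   # multiply by (1 - x^2)
psi1 = P.polyder(b, L)                    # L-th derivative, still supported in [-1, 1]

def moment(ell):
    """ell-th moment of psi1 over [-1, 1], via exact polynomial calculus."""
    integrand = P.polymul(psi1, [0.0] * ell + [1.0])   # x^ell * psi1(x)
    anti = P.polyint(integrand)
    return P.polyval(1.0, anti) - P.polyval(-1.0, anti)

print([float(round(moment(ell), 10)) for ell in range(L + 1)])
# the first L moments vanish; the L-th does not
```

Since b and its first K − 1 derivatives vanish at ±1, integrating ∫ x^ℓ b^{(L)}(x) dx by parts ℓ+1 ≤ L times produces no boundary terms and kills the moments for ℓ < L, which is exactly the mechanism that makes |∂^β ψ̂₁(ξ)| ≲ |ξ|^{Λ₁} near the origin.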
Now, in case of | ξ | ≥ , wehave | ξ | Λ ≥ ≥ (1 + | ξ | ) − Λ and thus (cid:12)(cid:12)(cid:12) ∂ β b ψ ( ξ ) (cid:12)(cid:12)(cid:12) = (cid:12)(cid:12)(cid:12)(cid:16) ∂ β c ψ (cid:17) ( ξ ) · (cid:16) ∂ β c ψ (cid:17) ( ξ ) (cid:12)(cid:12)(cid:12) . (1 + | ξ | ) − (Λ +3+ ε ) · (1 + | ξ | ) − (Λ +3+ ε ) = min n | ξ | Λ , (1 + | ξ | ) − Λ o · (1 + | ξ | ) − Λ · [(1 + | ξ | ) (1 + | ξ | )] − (3+ ε ) ≤ min n | ξ | Λ , (1 + | ξ | ) − Λ o · (1 + | ξ | ) − Λ · (1 + | ξ | ) − (3+ ε ) , as desired. Here, the last step used that (1 + | ξ | ) (1 + | ξ | ) ≥ | ξ | + | ξ | ≥ | ξ | .It remains to consider the case | ξ | ≤ . But for arbitrary β ∈ N with β ≤ N , our assumptions on c ψ ensure ∂ θ h ∂ β c ψ i (0) = 0 for all θ ∈ { , . . . , ⌈ Λ ⌉ − } , where we note Λ > , so that ⌈ Λ ⌉ − ≥ . But as the Fouriertransform of a compactly supported function, c ψ (and thus also ∂ β c ψ ) can be extended to an entire function on nalysis vs. Synthesis Sparsity for α -Shearlets — Felix Voigtlaender & Anne Pein C . In particular, ∂ β c ψ ( ξ ) = ∞ X θ =0 ∂ θ h ∂ β c ψ i (0) θ ! · ξ θ = ∞ X θ = ⌈ Λ ⌉ ∂ θ h ∂ β c ψ i (0) θ ! · ξ θ ( with ℓ = θ −⌈ Λ ⌉ ) = ξ ⌈ Λ ⌉ · ∞ X ℓ =0 ∂ ℓ + ⌈ Λ ⌉ h ∂ β c ψ i (0)( ℓ + ⌈ Λ ⌉ )! · ξ ℓ (4.15)for all ξ ∈ R , where the power series in the last line converges absolutely on all of R . In particular, the (continuous(!))function defined by the power series is bounded on [ − , , so that we get (cid:12)(cid:12)(cid:12) ∂ β c ψ ( ξ ) (cid:12)(cid:12)(cid:12) . | ξ | ⌈ Λ ⌉ ≤ | ξ | Λ for ξ ∈ [ − , . Furthermore, note (1 + | ξ | ) − Λ ≥ − Λ ≥ − Λ · | ξ | Λ , so that (cid:12)(cid:12)(cid:12) ∂ β b ψ ( ξ ) (cid:12)(cid:12)(cid:12) = (cid:12)(cid:12)(cid:12)(cid:16) ∂ β c ψ (cid:17) ( ξ ) · (cid:16) ∂ β c ψ (cid:17) ( ξ ) (cid:12)(cid:12)(cid:12) . 
| ξ | Λ · (1 + | ξ | ) − (Λ +3+ ε ) ≤ Λ · min n | ξ | Λ , (1 + | ξ | ) − Λ o · (1 + | ξ | ) − Λ · ε · [(1 + | ξ | ) (1 + | ξ | )] − (3+ ε ) ≤ ε +Λ · min n | ξ | Λ , (1 + | ξ | ) − Λ o · (1 + | ξ | ) − Λ · (1 + | ξ | ) − (3+ ε ) . (cid:3) Finally, we provide an analogous simplification of the conditions of Theorem 4.2:
Corollary 4.5.
Let α ∈ [0,1], ε > 0, p₀, q₀ ∈ (0,1] and s₀, s₁ ∈ ℝ with s₀ ≤ s₁. Let K, M₁, M₂, H be as in Theorem 4.2 and set N := ⌈p₀^{-1} · (2+ε)⌉.
The functions ϕ, ψ fulfill all assumptions of Theorem 4.2 if the mother shearlet ψ can be written as ψ = ψ₁ ⊗ ψ₂, where ϕ, ψ₁, ψ₂ satisfy the following:
(1) We have ϕ ∈ C_c^{⌈H+1⌉}(ℝ²), ψ₁ ∈ C_c^{⌈M₂+1⌉}(ℝ), and ψ₂ ∈ C_c^{⌈K+1⌉}(ℝ).
(2) We have (d^ℓ/dξ^ℓ) ψ̂₁(0) = 0 for ℓ = 0, …, N + ⌈M₁⌉ − 1.
(3) We have ϕ̂(ξ) ≠ 0 for all ξ ∈ [−1,1]².
(4) We have ψ̂₁(ξ) ≠ 0 for all ξ ∈ [3^{-1}, 3] and ψ̂₂(ξ) ≠ 0 for all ξ ∈ [−3,3]. ◭ Proof.
Observe ϕ, ψ ∈ C_c(ℝ²) ⊂ L¹(ℝ²) and note ϕ̂, ψ̂ ∈ C^∞(ℝ²), where all partial derivatives of these functions are bounded (and thus polynomially bounded), since ϕ, ψ are compactly supported. Next, since K, H, M₁, M₂ ≥ 0, our assumptions clearly entail ϕ, ψ ∈ C_c^1(ℝ²), so that ∇ϕ, ∇ψ ∈ L¹(ℝ²) ∩ L^∞(ℝ²). Furthermore, we see exactly as in the proof of Corollary 4.4 that ψ̂(ξ) ≠ 0 for all ξ = (ξ₁, ξ₂) ∈ ℝ² with ξ₁ ∈ [3^{-1}, 3] and |ξ₂| ≤ |ξ₁|.

Thus, it remains to verify that ϕ̂, ψ̂ satisfy the decay conditions in equation (4.3). But we see exactly as in the proof of Corollary 4.4 (cf. the argument around equation (4.14)) that |∂^β ϕ̂(ξ)| ≲ (1+|ξ|)^{−⌈H+1⌉} ≤ (1+|ξ|)^{−(H+1)}, as well as

  |∂^{β₁} ψ̂₁(ξ₁)| ≲ (1+|ξ₁|)^{−⌈M₂+1⌉} ≤ (1+|ξ₁|)^{−(M₂+1)}  and  |∂^{β₂} ψ̂₂(ξ₂)| ≲ (1+|ξ₂|)^{−⌈K+1⌉} ≤ (1+|ξ₂|)^{−(K+1)}

for all β ∈ ℕ₀² and β₁, β₂ ∈ ℕ₀ with |β|, β₁, β₂ ≤ N. This establishes the last two lines of equation (4.3).

For the first line of equation (4.3), we see as in the proof of Corollary 4.4 (cf. the argument around equation (4.15)) that |∂^{β₁} ψ̂₁(ξ₁)| ≲ |ξ₁|^{⌈M₁⌉} ≤ |ξ₁|^{M₁} for all ξ₁ ∈ [−1,1]. Since we saw above that |∂^{β₂} ψ̂₂(ξ₂)| ≲ (1+|ξ₂|)^{−(K+1)} for all ξ₂ ∈ ℝ, we have thus also established the first line of equation (4.3). □

The unconnected α-shearlet covering

The α-shearlet covering as introduced in Definition 3.1 divides the frequency space ℝ² into a low-frequency part and into four different frequency cones: the top, bottom, left and right cones. But for real-valued functions, the absolute value of the Fourier transform is symmetric.
Consequently, there is no non-zero real-valued function with Fourier transform essentially supported in the top (or left, …) cone.
For this reason, it is customary to divide the frequency plane into a low-frequency part and two different frequency cones: the horizontal and the vertical frequency cone. In this section, we account for this slightly different partition of the frequency plane by introducing the so-called unconnected α-shearlet covering. The reason for this nomenclature is that the connected base set Q = U^{(−1,1)}_{(3^{-1},3)} from Definition 3.1 is replaced by the unconnected set Q ∪ (−Q). We then show that all results from the preceding two sections remain true for this modified covering, essentially since the associated decomposition spaces are identical, cf. Lemma 5.5. Definition 5.1.
Let α ∈ [0,1]. The unconnected α-shearlet covering S_u^{(α)} is defined as

  S_u^{(α)} := (W_v^{(α)})_{v∈V^{(α)}} := (W_v)_{v∈V^{(α)}} := (B_v W'_v)_{v∈V^{(α)}} = (B_v W'_v + b_v)_{v∈V^{(α)}},

where:
• The index set V^{(α)} is given by V := V^{(α)} := {0} ∪ V₀, where V₀ := V₀^{(α)} := {(n, m, δ) ∈ ℕ₀ × ℤ × {0,1} : |m| ≤ G_n} with G_n := G_n^{(α)} := ⌈2^{n(1−α)}⌉.
• The basic sets (W'_v)_{v∈V^{(α)}} are given by W'_0 := (−1,1)² and by W'_v := Q_u := U^{(−1,1)}_{(3^{-1},3)} ∪ [−U^{(−1,1)}_{(3^{-1},3)}] for v ∈ V₀^{(α)}. The notation U^{(γ,μ)}_{(a,b)} used here is as defined in equation (3.1).
• The matrices (B_v)_{v∈V^{(α)}} are given by B_0 := B_0^{(α)} := id and by B_v := B_v^{(α)} := R^δ · A_{n,m}^{(α)}, where we define A_{n,m}^{(α)} := D_{2^n}^{(α)} · S_m^T for v = (n, m, δ) ∈ V₀. Here, the matrices R, S_x and D_b^{(α)} are as in equation (1.5).
• The translations (b_v)_{v∈V^{(α)}} are given by b_v := 0 for all v ∈ V^{(α)}.
Finally, we define the weight u = (u_v)_{v∈V} by u_0 := 1 and u_{(n,m,δ)} := 2^n for (n, m, δ) ∈ V₀. ◭

The unconnected α-shearlet covering S_u^{(α)} is highly similar to the (connected) α-shearlet covering S^{(α)} from Definition 3.1. In particular, we have Q_u = Q ∪ (−Q) with Q = U^{(−1,1)}_{(3^{-1},3)} as in Definition 3.1. To further exploit this connection between the two coverings, we define the projection map

  π : I^{(α)} → V^{(α)},  i ↦ 0 if i = 0,  and  i ↦ (n, m, δ) if i = (n, m, ε, δ) ∈ I₀^{(α)}.

Likewise, for ε ∈ {±1}, we define the ε-injection

  ι_ε : V^{(α)} → I^{(α)},  v ↦ 0 if v = 0,  and  v ↦ (n, m, ε, δ) if v = (n, m, δ) ∈ V₀^{(α)}.

Note that B_v^{(α)} = ε · T_{ι_ε(v)}^{(α)} for all v ∈ V₀^{(α)} and ε ∈ {±1}, so that

  W_v^{(α)} = S_{ι₁(v)}^{(α)} ∪ S_{ι₋₁(v)}^{(α)} = ⋃_{ε∈{±1}} S_{ι_ε(v)}^{(α)}  for all v ∈ V₀^{(α)},   (5.1)

since B_v^{(α)} [−U^{(−1,1)}_{(3^{-1},3)}] = −B_v^{(α)} U^{(−1,1)}_{(3^{-1},3)} = T_{ι₋₁(v)}^{(α)} Q = S_{ι₋₁(v)}^{(α)}. Because of W_0^{(α)} = (−1,1)² = S_0^{(α)}, equation (5.1) remains valid for v = 0.
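Since Definition 5.1 packs a lot of notation into one display, a small computational sketch may help. The following Python snippet (our own illustration, not part of the paper; the truncation at n ≤ 5 and the concrete value α = 1/2 are arbitrary choices) enumerates a finite part of the index set V^{(α)}, builds the matrices B_v = R^δ · D_{2^n} · S_m^T, and checks two bookkeeping identities used repeatedly below: |det B_v| = u_v^{1+α} for the weight u_v = 2^n, and π ∘ ι_ε = id.

```python
import math

def G(n, alpha):
    # shear range at scale n: |m| <= G_n = ceil(2^(n*(1-alpha)))
    return math.ceil(2 ** (n * (1 - alpha)))

def B(v, alpha):
    # B_v = R^delta * D_{2^n} * S_m^T, with D_b = diag(b, b^alpha),
    # S_m^T = [[1, 0], [m, 1]] and R = [[0, 1], [1, 0]]
    n, m, delta = v
    b = 2.0 ** n
    M = [[b, 0.0], [m * b ** alpha, b ** alpha]]  # D_{2^n} @ S_m^T
    return M if delta == 0 else [M[1], M[0]]      # left-multiplying by R swaps the rows

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

alpha = 0.5
V0 = [(n, m, d) for n in range(6) for d in (0, 1)
      for m in range(-G(n, alpha), G(n, alpha) + 1)]

# the weight is u_v = 2^n; check |det B_v| = u_v^(1+alpha)
for (n, m, d) in V0:
    assert abs(abs(det2(B((n, m, d), alpha))) - (2.0 ** n) ** (1 + alpha)) < 1e-6

# projection pi and injections iota_eps between the connected index set I
# and the unconnected index set V: pi inverts every iota_eps
iota = lambda eps, v: (v[0], v[1], eps, v[2])
pi = lambda i: (i[0], i[1], i[3])
assert all(pi(iota(eps, v)) == v for v in V0 for eps in (+1, -1))
```

The determinant identity |det B_v| = u_v^{1+α} is exactly the quantity that reappears in the proof of Lemma 5.5 below.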
Using these observations, we can now prove the following lemma: Lemma 5.2.
The unconnected α-shearlet covering S_u^{(α)} is an almost structured covering of ℝ². ◭ Proof.
In Lemma 3.3, we showed that the (connected) α-shearlet covering S^{(α)} is almost structured. Thus, for the proof of the present lemma, we will frequently refer to the proof of Lemma 3.3.
First of all, recall from the proof of Lemma 3.3 the open sets P'_i = P'_{(n,m,ε,δ)} for (n, m, ε, δ) ∈ I₀ and P'_0, which satisfy P'_i ⊂ Q'_i and ⋃_{i∈I} T_i P'_i = ℝ². Then, for v = (n, m, δ) ∈ V₀, let us define R'_{(n,m,δ)} := P'_{(n,m,1,δ)} ∪ (−P'_{(n,m,1,δ)}). Furthermore, set R'_0 := P'_0, again with P'_0 as in the proof of Lemma 3.3. Then it is not hard to verify R'_v ⊂ W'_v for all v ∈ V.
Furthermore, in the proof of Lemma 3.3, we showed ⋃_{i∈I} T_i P'_i = ℝ². But this implies

  ⋃_{v∈V} (B_v R'_v + b_v) = R'_0 ∪ ⋃_{(n,m,δ)∈V₀} B_{(n,m,δ)} R'_{(n,m,δ)}
    = P'_0 ∪ ⋃_{(n,m,δ)∈V₀} [B_{(n,m,δ)} P'_{(n,m,1,δ)} ∪ (−B_{(n,m,δ)} P'_{(n,m,1,δ)})]
    (since P'_{(n,m,1,δ)} = P'_{(n,m,−1,δ)})
    = P'_0 ∪ ⋃_{(n,m,δ)∈V₀} [T_{(n,m,1,δ)} P'_{(n,m,1,δ)} ∪ T_{(n,m,−1,δ)} P'_{(n,m,−1,δ)}]
    = ⋃_{i∈I} (T_i P'_i + b_i) = ℝ².

Next, if W_{(n,m,δ)}^{(α)} ∩ W_{(k,ℓ,γ)}^{(α)} ≠ ∅, then equation (5.1) yields certain ε, β ∈ {±1} such that S_{(n,m,ε,δ)}^{(α)} ∩ S_{(k,ℓ,β,γ)}^{(α)} ≠ ∅. But this implies (k, ℓ, γ) = π((k, ℓ, β, γ)), where (k, ℓ, β, γ) ∈ I ∩ (n, m, ε, δ)* and where the index cluster is formed with respect to the covering S^{(α)}. Consequently, we have shown

  (n, m, δ)* ⊂ {0} ∪ ⋃_{ε∈{±1}} π(I ∩ (n, m, ε, δ)*).   (5.2)

But since S^{(α)} is admissible, the constant N := sup_{i∈I} |i*| is finite. By what we just showed, we thus have |(n, m, δ)*| ≤ 1 + 2N for all (n, m, δ) ∈ V₀. Finally, using a very similar argument, one can show 0*_{S_u^{(α)}} ⊂ {0} ∪ π(I ∩ 0*_{S^{(α)}}), where the index cluster is taken with respect to S_u^{(α)} on the left-hand side and with respect to S^{(α)} on the right-hand side. Thus, |0*_{S_u^{(α)}}| ≤ 1 + N, so that sup_{v∈V} |v*| ≤ 1 + 2N < ∞. All in all, we have thus shown that S_u^{(α)} is an admissible covering of ℝ².
It remains to verify sup_{v∈V} sup_{r∈v*} ‖B_v^{-1} B_r‖ < ∞. To this end, recall that C := sup_{i∈I} sup_{j∈i*} ‖T_i^{-1} T_j‖ is finite, since S^{(α)} is an almost structured covering. Now, let v ∈ V and r ∈ v* be arbitrary. We distinguish several cases:
Case 1: We have v = (n, m, δ) ∈ V₀ and r = (k, ℓ, γ) ∈ V₀. As above, there are thus certain ε, β ∈ {±1} such that (k, ℓ, β, γ) ∈ (n, m, ε, δ)*. Hence, ‖B_v^{-1} B_r‖ = ‖(ε · T_{(n,m,ε,δ)})^{-1} · β · T_{(k,ℓ,β,γ)}‖ = ‖T_{(n,m,ε,δ)}^{-1} · T_{(k,ℓ,β,γ)}‖ ≤ C.
Case 2: We have v = 0 and r = (k, ℓ, γ) ∈ V₀. There is then some β ∈ {±1} satisfying (k, ℓ, β, γ) ∈ 0*, where the index cluster is taken with respect to S^{(α)}. Hence, we get again that ‖B_v^{-1} B_r‖ = ‖T_0^{-1} · β · T_{(k,ℓ,β,γ)}‖ = ‖T_0^{-1} · T_{(k,ℓ,β,γ)}‖ ≤ C.
Case 3: We have v = (n, m, δ) ∈ V₀ and r = 0. Hence, 0 ∈ (n, m, ε, δ)* for some ε ∈ {±1}, so that ‖B_v^{-1} B_r‖ = ‖(ε · T_{(n,m,ε,δ)})^{-1} · T_0‖ = ‖T_{(n,m,ε,δ)}^{-1} · T_0‖ ≤ C.
Case 4: We have v = r = 0. In this case, ‖B_v^{-1} B_r‖ = 1 ≤ C.
Hence, we have verified sup_{v∈V} sup_{r∈v*} ‖B_v^{-1} B_r‖ < ∞. Since the sets {W'_v : v ∈ V} and {R'_v : v ∈ V} are finite families of bounded, open sets (in fact, each of these families only has two elements), we have shown that S_u^{(α)} is an almost structured covering of ℝ². □
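The key quantitative condition just verified, sup_v sup_{r∈v*} ‖B_v^{-1} B_r‖ < ∞, can also be observed numerically. The sketch below (our own illustration; the chosen index pairs are a simplified stand-in for the true neighbor relation v*, and the max-row-sum norm replaces the operator norm, which only changes constants) computes ‖B_v^{-1} B_r‖ for shear-adjacent and scale-adjacent indices in the horizontal cone (δ = 0) for α = 1/2 and checks that the values do not grow with the scale n.

```python
alpha = 0.5

def Bmat(n, m):
    # B_{(n,m,0)} = D_{2^n} S_m^T with D_b = diag(b, b^alpha)
    b = 2.0 ** n
    return [[b, 0.0], [m * b ** alpha, b ** alpha]]

def inv2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def mul2(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def norm_inf(M):
    # maximum absolute row sum: an operator norm on R^2
    return max(abs(M[0][0]) + abs(M[0][1]), abs(M[1][0]) + abs(M[1][1]))

vals = []
for n in range(1, 20):
    m = 1  # a representative shear at scale n
    # shear-adjacent neighbor: same scale, m -> m + 1
    vals.append(norm_inf(mul2(inv2(Bmat(n, m)), Bmat(n, m + 1))))
    # scale-adjacent neighbor: n -> n + 1 with a comparable slope
    vals.append(norm_inf(mul2(inv2(Bmat(n, m)), Bmat(n + 1, 2 * m))))

# the norms are independent of the scale n and uniformly bounded
assert max(vals) < 10
```

For the shear-adjacent pair, B_{(n,m)}^{-1} B_{(n,m+1)} = S_{-m}^T S_{m+1}^T = S_1^T, so the norm is exactly 2, independently of n; the scale-adjacent norm is likewise scale-independent because the diagonal dilations cancel up to the fixed factor diag(2, 2^α).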
Before we can define the decomposition spaces associated to the unconnected α-shearlet covering S_u^{(α)}, we need to verify that the weights that we want to use are S_u^{(α)}-moderate. Lemma 5.3.
Let u = (u_v)_{v∈V} be as in Definition 5.1. Then, for every s ∈ ℝ, the weight u^s = (u_v^s)_{v∈V} is S_u^{(α)}-moderate, with C_{S_u^{(α)},u^s} ≤ C_{S^{(α)},w^s}, where C_{S^{(α)},w^s} is the moderateness constant from Lemma 3.4. ◭ Proof.
As seen in equation (5.1), we have W_v^{(α)} = ⋃_{ε∈{±1}} S_{ι_ε(v)}^{(α)} for arbitrary v ∈ V (also for v = 0). Furthermore, it is easy to see u_v = w_{ι_ε(v)} for arbitrary ε ∈ {±1} and v ∈ V.
Thus, if W_v^{(α)} ∩ W_r^{(α)} ≠ ∅ for certain v, r ∈ V, there are ε, β ∈ {±1} such that S_{ι_ε(v)}^{(α)} ∩ S_{ι_β(r)}^{(α)} ≠ ∅. But Lemma 3.4 shows that w^s is S^{(α)}-moderate. Hence, u_v^s / u_r^s = w_{ι_ε(v)}^s / w_{ι_β(r)}^s ≤ C_{S^{(α)},w^s}. □

Since we now know that S_u^{(α)} is an almost structured covering of ℝ² and since u^s is S_u^{(α)}-moderate, we see precisely as in the remark after Definition 3.5 that the unconnected α-shearlet smoothness spaces that we now define are well-defined quasi-Banach spaces. We emphasize that the following definition will only be of transitory relevance, since we will immediately show that the newly defined unconnected α-shearlet smoothness spaces are identical with the previously defined α-shearlet smoothness spaces. Definition 5.4.
For α ∈ [0,1], p, q ∈ (0,∞] and s ∈ ℝ, we define the unconnected α-shearlet smoothness space D_{α,s}^{p,q}(ℝ²) associated to these parameters as

  D_{α,s}^{p,q}(ℝ²) := D(S_u^{(α)}, L^p, ℓ_{u^s}^q),

where the covering S_u^{(α)} and the weight u^s are as in Definition 5.1 and Lemma 5.3, respectively. ◭ Lemma 5.5.
We have S_{α,s}^{p,q}(ℝ²) = D_{α,s}^{p,q}(ℝ²) for all α ∈ [0,1], p, q ∈ (0,∞] and s ∈ ℝ, with equivalent quasi-norms. ◭ Proof.
We will derive the claim from [60, Lemma 6.11, part (2)], with the choice Q := S_u^{(α)} and P := S^{(α)}, recalling that S_{α,s}^{p,q}(ℝ²) = D(S^{(α)}, L^p, ℓ_{w^s}^q) = F^{-1}[D_F(S^{(α)}, L^p, ℓ_{w^s}^q)] and likewise D_{α,s}^{p,q}(ℝ²) = F^{-1}[D_F(S_u^{(α)}, L^p, ℓ_{u^s}^q)].
To this end, we first have to verify that the coverings S^{(α)} and S_u^{(α)} are weakly equivalent. This means that

  sup_{i∈I} |{v ∈ V : W_v^{(α)} ∩ S_i^{(α)} ≠ ∅}| < ∞  and  sup_{v∈V} |{i ∈ I : S_i^{(α)} ∩ W_v^{(α)} ≠ ∅}| < ∞.

We begin with the first claim and thus let i ∈ I be arbitrary. It is easy to see S_i^{(α)} ⊂ W_{π(i)}^{(α)}. Consequently, if v ∈ V satisfies W_v^{(α)} ∩ S_i^{(α)} ≠ ∅, then ∅ ≠ W_v^{(α)} ∩ S_i^{(α)} ⊂ W_v^{(α)} ∩ W_{π(i)}^{(α)} and thus v ∈ [π(i)]*, where the index cluster is formed with respect to S_u^{(α)}. On the one hand, this implies

  w_i^t = u_{π(i)}^t ≍_t u_v^t  if S_i^{(α)} ∩ W_v^{(α)} ≠ ∅, for arbitrary t ∈ ℝ,   (5.3)

since u^t is S_u^{(α)}-moderate by Lemma 5.3. On the other hand, we get

  sup_{i∈I} |{v ∈ V : W_v^{(α)} ∩ S_i^{(α)} ≠ ∅}| ≤ sup_{i∈I} |[π(i)]*| ≤ sup_{v∈V} |v*| < ∞,

since we know that S_u^{(α)} is admissible (cf. Lemma 5.2).
Now, let us verify the second claim. To this end, let v ∈ V be arbitrary. For i ∈ I with S_i^{(α)} ∩ W_v^{(α)} ≠ ∅, equation (5.1) shows ∅ ≠ ⋃_{ε∈{±1}} (S_i^{(α)} ∩ S_{ι_ε(v)}^{(α)}) and thus i ∈ ⋃_{ε∈{±1}} [ι_ε(v)]*, where the index cluster is formed with respect to S^{(α)}. As above, this yields

  sup_{v∈V} |{i ∈ I : S_i^{(α)} ∩ W_v^{(α)} ≠ ∅}| ≤ sup_{v∈V} (|[ι₁(v)]*| + |[ι₋₁(v)]*|) ≤ 2 · sup_{i∈I} |i*| < ∞,

since S^{(α)} is admissible (cf. Lemma 3.3).
We have thus verified the two main assumptions of [60, Lemma 6.11], namely that Q, P are weakly equivalent and that u_v^s ≍ w_i^s if W_v^{(α)} ∩ S_i^{(α)} ≠ ∅, thanks to equation (5.3). But since we also want to get the claim for p ∈ (0,1), we have to verify the additional condition (2) from [60, Lemma 6.11], i.e., that P = S^{(α)} = (S_j^{(α)})_{j∈I} = (T_j Q'_j)_{j∈I} is almost subordinate to Q = S_u^{(α)} = (W_v)_{v∈V} = (B_v W'_v)_{v∈V} and that |det(T_j^{-1} B_v)| ≲ 1 if W_v ∩ S_j^{(α)} ≠ ∅. But we saw in equation (5.3) that if W_v^{(α)} ∩ S_j^{(α)} ≠ ∅, then |det(T_j^{-1} B_v)| = (w_j^{1+α})^{-1} · u_v^{1+α} ≍_α 1. Furthermore, S_j^{(α)} ⊂ W_{π(j)}^{(α)} for all j ∈ I, so that P = S^{(α)} is subordinate (and thus also almost subordinate, cf. [60, Definition 2.10]) to Q = S_u^{(α)}, as required. The claim is now an immediate consequence of [60, Lemma 6.11]. □

In order to allow for a more succinct formulation of our results about Banach frames and atomic decompositions in the setting of the unconnected α-shearlet covering, we now introduce the notion of cone-adapted α-shearlet systems. As we will see in Section D, these systems are different from, but intimately connected to, the cone-adapted β-shearlet systems (with β ∈ (1,∞)) as introduced in [37, Definition 3.10].
There are three main reasons why we think that the new definition is preferable to the old one:
(1) With the new definition, a family (L_{δk} ϕ)_{k∈ℤ²} ∪ (ψ_{j,ℓ,δ,k})_{j,ℓ,δ,k} of α-shearlets has the property that the shearlets ψ_{j,ℓ,δ,k} of scale j have essential frequency support in the dyadic corona {ξ ∈ ℝ² : 2^{j−c} < |ξ| < 2^{j+c}} for a suitable c > 0. In contrast, for β-shearlets, the shearlets of scale j have essential frequency support in {ξ ∈ ℝ² : 2^{β(j−c)} < |ξ| < 2^{β(j+c)}}, cf. Lemma D.2.
(2) With the new definition, a family of cone-adapted α-shearlets is also a family of α-molecules, if the generators are chosen suitably. In contrast, for β-shearlets, one has the slightly inconvenient fact that a family of cone-adapted β-shearlets is a family of β^{-1}-molecules, cf. [37, Proposition 3.11].
(3) The new definition includes the two boundary values α ∈ {0, 1}, which correspond to ridgelet-like systems and to wavelet-like systems, respectively. In contrast, for β-shearlets, the boundary values β ∈ {1, ∞} are excluded from the definition.
We remark that a very similar definition to the one given here is already introduced in [20, Definition 5.1], even generally in ℝ^d for d ≥ 2. Definition 5.6.
Let α ∈ [0,1]. For generators ϕ, ψ ∈ L¹(ℝ²) + L²(ℝ²) and a given sampling density δ > 0, we define the cone-adapted α-shearlet system with sampling density δ generated by ϕ, ψ as

  SH_α(ϕ, ψ; δ) := (γ^{[v,k]})_{v∈V, k∈ℤ²} := (L_{δ·B_v^{-T}k} γ^{[v]})_{v∈V, k∈ℤ²}
  with γ^{[v]} := |det B_v|^{1/2} · (ψ ∘ B_v^T) if v ∈ V₀, and γ^{[0]} := ϕ,

where V, V₀ and B_v are as in Definition 5.1. Note that the notation γ^{[v,k]} suppresses the sampling density δ > 0. If we want to emphasize this sampling density, we write γ^{[v,k,δ]} instead of γ^{[v,k]}. ◭

Remark 5.7. In case of α = 1/2, the preceding definition yields special cone-adapted shearlet systems: As defined in [51, Definition 1.2], the cone-adapted shearlet system SH(ϕ, ψ, θ; δ) with sampling density δ > 0 generated by ϕ, ψ, θ ∈ L²(ℝ²) is SH(ϕ, ψ, θ; δ) = Φ(ϕ; δ) ∪ Ψ(ψ; δ) ∪ Θ(θ; δ), where

  Φ(ϕ; δ) := {ϕ_k := ϕ(• − δk) : k ∈ ℤ²},
  Ψ(ψ; δ) := {ψ_{j,ℓ,k} := 2^{3j/4} · ψ(S_ℓ A_{2^j} • − δk) : j ∈ ℕ₀, ℓ ∈ ℤ with |ℓ| ≤ ⌈2^{j/2}⌉ and k ∈ ℤ²},
  Θ(θ; δ) := {θ_{j,ℓ,k} := 2^{3j/4} · θ(S_ℓ^T Ã_{2^j} • − δk) : j ∈ ℕ₀, ℓ ∈ ℤ with |ℓ| ≤ ⌈2^{j/2}⌉ and k ∈ ℤ²},

with S_ℓ = (1 ℓ; 0 1), A_{2^j} = diag(2^j, 2^{j/2}) and Ã_{2^j} = diag(2^{j/2}, 2^j).
Now, the most common choice for θ is θ = ψ ∘ R for R = (0 1; 1 0). With this choice, we observe in the notation of Definitions 5.6 and 5.1 that γ^{[0,k]} = L_{δ·B_0^{-T}k} γ^{[0]} = L_{δk} ϕ = ϕ(• − δk) = ϕ_k for all k ∈ ℤ².
Furthermore, we note because of α = 1/2 that

  B_{(j,ℓ,0)}^T = [diag(2^j, 2^{j/2}) · S_ℓ^T]^T = S_ℓ · A_{2^j},  with |det B_{(j,ℓ,0)}| = 2^{3j/2},

so that γ^{[(j,ℓ,0),k]} = L_{δ·[S_ℓ A_{2^j}]^{-1}k} γ^{[(j,ℓ,0)]} = 2^{3j/4} · ψ(S_ℓ A_{2^j} • − δk) = ψ_{j,ℓ,k} for all (j, ℓ, 0) ∈ V₀ and k ∈ ℤ².
Finally, we observe θ(S_ℓ^T Ã_{2^j} • − δk) = ψ(R S_ℓ^T Ã_{2^j} • − δRk), as well as

  R · S_ℓ^T · Ã_{2^j} = (0 1; 1 0) (1 0; ℓ 1) (2^{j/2} 0; 0 2^j) = (ℓ·2^{j/2} 2^j; 2^{j/2} 0)

and

  B_{(j,ℓ,1)}^T = [R · diag(2^j, 2^{j/2}) · S_ℓ^T]^T = S_ℓ · diag(2^j, 2^{j/2}) · R = (ℓ·2^{j/2} 2^j; 2^{j/2} 0) = R · S_ℓ^T · Ã_{2^j}.

Consequently, we also get

  γ^{[(j,ℓ,1),k]} = L_{δ·[R·S_ℓ^T·Ã_{2^j}]^{-1}k} γ^{[(j,ℓ,1)]} = 2^{3j/4} · ψ(R · S_ℓ^T · Ã_{2^j} • − δk) = 2^{3j/4} · ψ(R · S_ℓ^T · Ã_{2^j} • − δRRk) = θ_{j,ℓ,Rk}

for arbitrary (j, ℓ, 1) ∈ V₀ and k ∈ ℤ², where we used R² = id. Since ℤ² → ℤ², k ↦ Rk is bijective, this implies that SH(ϕ, ψ, θ; δ) = SH_{1/2}(ϕ, ψ; δ) up to a reordering in the translation variable k, if θ = ψ ∘ R. ♦

We now want to transfer Theorems 4.2 and 4.3 to the setting of the unconnected α-shearlet covering. The link between the connected and the unconnected setting is provided by the following lemma: Lemma 5.8.
With ̺₁, ̺₂ as in equation (4.1), set ̺̃₀ := ̺₁, as well as ̺̃_v := ̺₂ for v ∈ V₀. Moreover, set

  M̃_{r,v}^{(0)} := (u_r^s / u_v^s)^τ · ‖B_r^{-1} B_v‖^σ · ( |det B_v|^{-1} · ∫_{W_v^{(α)}} ̺̃_r(B_r^{-1} ξ) dξ )^τ  for v, r ∈ V.

Then we have M̃_{r,v}^{(0)} ≤ 2^τ · M_{ι₁(r),ι₁(v)}^{(0)} for all v, r ∈ V, where M_{ι₁(r),ι₁(v)}^{(0)} is as in Lemma 4.1. ◭ Proof.
First of all, recall

  W'_v = U^{(−1,1)}_{(3^{-1},3)} ∪ [−U^{(−1,1)}_{(3^{-1},3)}] = Q'_{ι₁(v)} ∪ [−Q'_{ι₁(v)}]  if v ∈ V₀,
  W'_0 = (−1,1)² = (−1,1)² ∪ [−(−1,1)²] = Q'_{ι₁(0)} ∪ [−Q'_{ι₁(0)}]  if v = 0,

and B_v = T_{ι₁(v)}, as well as u_v = w_{ι₁(v)} and ̺̃_v = ̺_{ι₁(v)} for all v ∈ V. Thus,

  M̃_{r,v}^{(0)} = (u_r^s / u_v^s)^τ · ‖B_r^{-1} B_v‖^σ · ( |det B_v|^{-1} · ∫_{W_v^{(α)}} ̺̃_r(B_r^{-1} ξ) dξ )^τ
    (with ζ = B_v^{-1} ξ)
    = (w_{ι₁(r)}^s / w_{ι₁(v)}^s)^τ · ‖T_{ι₁(r)}^{-1} T_{ι₁(v)}‖^σ · ( ∫_{W'_v} ̺̃_r(B_r^{-1} B_v ζ) dζ )^τ
    = (w_{ι₁(r)}^s / w_{ι₁(v)}^s)^τ · ‖T_{ι₁(r)}^{-1} T_{ι₁(v)}‖^σ · ( ∫_{Q'_{ι₁(v)} ∪ [−Q'_{ι₁(v)}]} ̺_{ι₁(r)}(T_{ι₁(r)}^{-1} T_{ι₁(v)} ζ) dζ )^τ
    ≤ (w_{ι₁(r)}^s / w_{ι₁(v)}^s)^τ · ‖T_{ι₁(r)}^{-1} T_{ι₁(v)}‖^σ · ( ∫_{Q'_{ι₁(v)}} ̺_{ι₁(r)}(T_{ι₁(r)}^{-1} T_{ι₁(v)} ζ) dζ + ∫_{−Q'_{ι₁(v)}} ̺_{ι₁(r)}(T_{ι₁(r)}^{-1} T_{ι₁(v)} ζ) dζ )^τ
    (since ̺_{ι₁(r)}(−ξ) = ̺_{ι₁(r)}(ξ))
    = (w_{ι₁(r)}^s / w_{ι₁(v)}^s)^τ · ‖T_{ι₁(r)}^{-1} T_{ι₁(v)}‖^σ · ( 2 · ∫_{Q'_{ι₁(v)}} ̺_{ι₁(r)}(T_{ι₁(r)}^{-1} T_{ι₁(v)} ζ) dζ )^τ
    (with ξ = T_{ι₁(v)} ζ)
    = 2^τ · (w_{ι₁(r)}^s / w_{ι₁(v)}^s)^τ · ‖T_{ι₁(r)}^{-1} T_{ι₁(v)}‖^σ · ( |det T_{ι₁(v)}|^{-1} · ∫_{S_{ι₁(v)}^{(α)}} ̺_{ι₁(r)}(T_{ι₁(r)}^{-1} ξ) dξ )^τ
    = 2^τ · M_{ι₁(r),ι₁(v)}^{(0)}. □

Since the map ι₁ : V → I is injective, Lemma 5.8 implies

  max{ sup_{v∈V} (Σ_{r∈V} M̃_{r,v}^{(0)})^{1/τ}, sup_{r∈V} (Σ_{v∈V} M̃_{r,v}^{(0)})^{1/τ} } ≤ 2 · max{ sup_{i∈I} (Σ_{j∈I} M_{j,i}^{(0)})^{1/τ}, sup_{j∈I} (Σ_{i∈I} M_{j,i}^{(0)})^{1/τ} }.
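The only real work in the proof above is the symmetry step: because ̺_{ι₁(r)} is even, the integral over the unconnected set Q ∪ (−Q) is exactly twice the integral over Q, which is where the factor 2^τ originates. A minimal numerical sketch of this step (our own illustration; the rectangle and the weight ̺ below are generic stand-ins for the actual sets and functions of the paper):

```python
# For an even function rho, the integral over Q ∪ (−Q) equals twice the
# integral over Q; this is the source of the factor 2^tau in Lemma 5.8.
def rho(x, y):
    # a generic even weight: rho(-xi) = rho(xi)
    return 1.0 / (1.0 + x * x + y * y) ** 2

def integrate(rect, f, n=200):
    # simple midpoint rule over the rectangle [a,b] x [c,d]
    (a, b), (c, d) = rect
    hx, hy = (b - a) / n, (d - c) / n
    return sum(f(a + (i + 0.5) * hx, c + (j + 0.5) * hy)
               for i in range(n) for j in range(n)) * hx * hy

Q = ((1/3, 3.0), (-1.0, 1.0))          # a rectangle playing the role of Q
minus_Q = ((-3.0, -1/3), (-1.0, 1.0))  # its reflection −Q

I_Q = integrate(Q, rho)
I_union = I_Q + integrate(minus_Q, rho)
assert abs(I_union - 2 * I_Q) < 1e-9
```

Since the two pieces of the unconnected set are disjoint, the integral over the union is the sum of the two integrals, and evenness makes the two summands equal.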
Then, recalling Lemma 5.5 and using precisely the same arguments as for proving Theorems 4.2 and 4.3, one canprove the following two theorems:
Theorem 5.9.
Theorem 4.2 remains essentially valid if the family S̃H^{(±1)}_{α,ϕ,ψ,δ} considered there is replaced by the α-shearlet system

  SH_α(ϕ̃, ψ̃; δ) = (L_{δ·B_v^{-T}k} γ̃^{[v]})_{v∈V, k∈ℤ²}  with γ^{[v]} := |det B_v|^{1/2} · (ψ ∘ B_v^T) if v ∈ V₀, and γ^{[0]} := ϕ,

where ϕ̃(x) = ϕ(−x) and ψ̃(x) = ψ(−x). The only two necessary changes are the following:
(1) The assumption ψ̂(ξ) ≠ 0 for ξ = (ξ₁, ξ₂) ∈ ℝ² with ξ₁ ∈ [3^{-1}, 3] and |ξ₂| ≤ |ξ₁| has to be replaced by

  ψ̂(ξ) ≠ 0 for ξ = (ξ₁, ξ₂) ∈ ℝ² with 3^{-1} ≤ |ξ₁| ≤ 3 and |ξ₂| ≤ |ξ₁|.

(2) For the definition of the analysis operator A^{(δ)}, the convolution γ^{[v]} ∗ f has to be defined as in equation (2.3), but using a regular partition of unity (φ_v)_{v∈V} for S_u^{(α)}, i.e.,

  (γ^{[v]} ∗ f)(x) = Σ_{ℓ∈V} F^{-1}(γ̂^{[v]} · φ_ℓ · f̂)(x)  for all x ∈ ℝ²,

where the series converges normally in L^∞(ℝ²), and thus absolutely and uniformly, for all f ∈ S_{α,s}^{p,q}(ℝ²). For a more convenient expression for this convolution—at least for f ∈ L²(ℝ²)—see Lemma 5.12 below. ◭ Theorem 5.10.
Theorem 4.3 remains essentially valid if the family SH^{(±1)}_{ϕ,ψ,δ} considered there is replaced by the α-shearlet system

  SH_α(ϕ, ψ; δ) = (L_{δ·B_v^{-T}k} γ^{[v]})_{v∈V, k∈ℤ²}  with γ^{[v]} := |det B_v|^{1/2} · (ψ ∘ B_v^T) if v ∈ V₀, and γ^{[0]} := ϕ.

The only necessary change is that the assumption ψ̂(ξ) ≠ 0 for ξ = (ξ₁, ξ₂) ∈ ℝ² with ξ₁ ∈ [3^{-1}, 3] and |ξ₂| ≤ |ξ₁| has to be replaced by

  ψ̂(ξ) ≠ 0 for ξ = (ξ₁, ξ₂) ∈ ℝ² with 3^{-1} ≤ |ξ₁| ≤ 3 and |ξ₂| ≤ |ξ₁|. ◭

Remark 5.11. With the exact same reasoning, one can also show that Corollaries 4.4 and 4.5 remain valid with the obvious changes. Again, one now has to require ψ̂₁(ξ) ≠ 0 for 3^{-1} ≤ |ξ| ≤ 3 instead of ψ̂₁(ξ) ≠ 0 for ξ ∈ [3^{-1}, 3]. ♦

The one remaining limitation of Theorems 4.2 and 5.9 is their somewhat strange definition of the convolution (γ^{[i]} ∗ f)(x). The following lemma makes this definition more concrete, under the assumption that we already know f ∈ L²(ℝ²). For general f ∈ S_{α,s}^{p,q}(ℝ²), this need not be the case, but for suitable values of p, q, s, we have S_{α,s}^{p,q}(ℝ²) ↪ L²(ℝ²), as we will see in Theorem 5.13. Lemma 5.12.
Let (φ_i)_{i∈I} be a regular partition of unity subordinate to some almost structured covering Q = (Q_i)_{i∈I} of ℝ^d. Assume that γ ∈ L¹(ℝ^d) ∩ L²(ℝ^d) with γ̂ ∈ C^∞(ℝ^d), where all partial derivatives of γ̂ are polynomially bounded. Let f ∈ L²(ℝ^d) ↪ S'(ℝ^d) ↪ Z'(ℝ^d) be arbitrary. Then we have

  Σ_{ℓ∈I} F^{-1}(γ̂ · φ_ℓ · f̂)(x) = ⟨f, L_x γ̃⟩  for all x ∈ ℝ^d,

where γ̃(x) = γ(−x) and where ⟨f, g⟩ = ∫_{ℝ^d} f(x) · g(x) dx. ◭ Proof.
In the expression F^{-1}(γ̂ · φ_ℓ · f̂)(x), the inverse Fourier transform is the inverse Fourier transform of the compactly supported, tempered distribution γ̂ · φ_ℓ · f̂ ∈ S'(ℝ^d). But by the Paley-Wiener theorem (see e.g. [55, Theorem 7.23]), the tempered distribution F^{-1}(γ̂ · φ_ℓ · f̂) is given by (integration against) a (uniquely determined) smooth function, whose value at x ∈ ℝ^d we denote by F^{-1}(γ̂ · φ_ℓ · f̂)(x). Precisely, we have

  F^{-1}(γ̂ · φ_ℓ · f̂)(x) = ⟨γ̂ · f̂, φ_ℓ · e^{2πi⟨x,•⟩}⟩_{D'(ℝ^d), C_c^∞(ℝ^d)} = ∫_{ℝ^d} γ̂(ξ) · f̂(ξ) · e^{2πi⟨x,ξ⟩} · φ_ℓ(ξ) dξ.

But since Q is an admissible covering of ℝ^d and since (φ_ℓ)_{ℓ∈I} is a regular partition of unity subordinate to Q, we have

  Σ_{ℓ∈I} |γ̂(ξ) · f̂(ξ) · e^{2πi⟨x,ξ⟩} · φ_ℓ(ξ)| ≤ |γ̂(ξ) · f̂(ξ)| · Σ_{ℓ∈I} |φ_ℓ(ξ)|
    ≤ sup_{ℓ∈I} ‖φ_ℓ‖_sup · |γ̂(ξ) · f̂(ξ)| · Σ_{ℓ∈I} 𝟙_{Q_ℓ}(ξ) ≤ N_Q · sup_{ℓ∈I} ‖φ_ℓ‖_sup · |γ̂(ξ) · f̂(ξ)| ∈ L¹(ℝ^d),

since γ̂, f̂ ∈ L²(ℝ^d). Since we also have Σ_{ℓ∈I} φ_ℓ ≡ 1 on ℝ^d, we get by the dominated convergence theorem that

  Σ_{ℓ∈I} F^{-1}(γ̂ · φ_ℓ · f̂)(x) = ∫_{ℝ^d} γ̂(ξ) · f̂(ξ) · e^{2πi⟨x,ξ⟩} · Σ_{ℓ∈I} φ_ℓ(ξ) dξ = ∫_{ℝ^d} γ̂(ξ) · f̂(ξ) · e^{2πi⟨x,ξ⟩} dξ = F^{-1}(γ̂ · f̂)(x),

where F^{-1}(γ̂ · f̂) ∈ L²(ℝ^d) ∩ C₀(ℝ^d) by the Riemann-Lebesgue lemma and Plancherel's theorem, because of γ̂ · f̂ ∈ L¹(ℝ^d) ∩ L²(ℝ^d). But Young's inequality shows γ ∗ f ∈ L^∞(ℝ^d), while the convolution theorem yields F[γ ∗ f] = γ̂ · f̂. Hence, γ ∗ f = F^{-1}(γ̂ · f̂) almost everywhere. But both sides of the identity are continuous functions, since the convolution of two L² functions is continuous. Thus, the equality holds everywhere, so that we finally get

  Σ_{ℓ∈I} F^{-1}(γ̂ · φ_ℓ · f̂)(x) = F^{-1}(γ̂ · f̂)(x) = (γ ∗ f)(x) = ∫_{ℝ^d} f(y) · γ(x − y) dy = ⟨f, L_x γ̃⟩. □

We close this section with a theorem that justifies the title of the paper: It formally encodes the fact that analysis sparsity is equivalent to synthesis sparsity for (suitable) α-shearlet systems. Theorem 5.13.
Let α ∈ [0,1], ε > 0, p₀ ∈ (0,1] and s^{(0)} ≥ 0 be arbitrary. Assume that ϕ, ψ ∈ L¹(ℝ²) satisfy the assumptions of Theorems 5.9 and 5.10 with q₀ = p₀ and s₀ = 0, as well as s₁ = s^{(0)} + (1+α) · (p₀^{-1} − 2^{-1}). For δ > 0, denote by SH_α(ϕ, ψ; δ) = (γ^{[v,k,δ]})_{v∈V, k∈ℤ²} the α-shearlet system generated by ϕ, ψ, as in Definition 5.6.
Then there is some δ₀ ∈ (0,1] with the following property: For all p ∈ [p₀, 1] and all s ∈ [0, s^{(0)}], we have

  S_{α,s+(1+α)(p^{-1}−2^{-1})}^{p,p}(ℝ²) = { f ∈ L²(ℝ²) : (u_v^s · ⟨f, γ^{[v,k,δ]}⟩_{L²})_{v∈V, k∈ℤ²} ∈ ℓ^p(V × ℤ²) }
    = { Σ_{(v,k)∈V×ℤ²} c_k^{(v)} · γ^{[v,k,δ]} : (u_v^s · c_k^{(v)})_{v∈V, k∈ℤ²} ∈ ℓ^p(V × ℤ²) },

as long as 0 < δ ≤ δ₀. Here, the weight u = (u_v)_{v∈V} is as in Definition 5.1, i.e., u_{(n,m,δ)} = 2^n and u_0 = 1.
In fact, for f ∈ S_{α,s+(1+α)(p^{-1}−2^{-1})}^{p,p}(ℝ²), we even have a (quasi)-norm equivalence

  ‖f‖_{S_{α,s+(1+α)(p^{-1}−2^{-1})}^{p,p}} ≍ ‖(u_v^s · ⟨f, γ^{[v,k,δ]}⟩_{L²})_{v∈V, k∈ℤ²}‖_{ℓ^p}
    ≍ inf{ ‖(u_v^s · c_k^{(v)})_{v∈V, k∈ℤ²}‖_{ℓ^p} : f = Σ_{(v,k)∈V×ℤ²} c_k^{(v)} · γ^{[v,k,δ]} with unconditional convergence in L²(ℝ²) }.

In particular, S_{α,s+(1+α)(p^{-1}−2^{-1})}^{p,p}(ℝ²) ↪ L²(ℝ²) and SH_α(ϕ, ψ; δ) is a frame for L²(ℝ²). ◭ Remark.
As one advantage of the decomposition space point of view, we observe that S_{α,s}^{p,q}(ℝ²) is easily seen to be translation invariant, while this is not so easy to see in the characterization via analysis or synthesis sparsity in terms of a discrete α-shearlet system. ♦ Proof.
We start with a few preparatory definitions and observations. For brevity, we set

  ‖f‖_{∗,p,s,δ} := inf{ ‖(u_v^s · c_k^{(v)})_{v∈V, k∈ℤ²}‖_{ℓ^p} : f = Σ_{(v,k)∈V×ℤ²} c_k^{(v)} · γ^{[v,k,δ]} with uncond. convergence in L²(ℝ²) }   (5.4)

for f ∈ S_{α,s+(1+α)(p^{-1}−2^{-1})}^{p,p}(ℝ²) and s ∈ [0, s^{(0)}], as well as p ∈ [p₀, 1].
Next, our assumptions entail that ϕ, ψ satisfy the assumptions of Theorem 5.9 (and thus equation (4.3)) for s₀ = 0 and s₁ = s^{(0)} + (1+α) · (p₀^{-1} − 2^{-1}) ≥ 0. But this implies (in the notation of Theorem 4.2) that K, H, M₂ ≥ 1 + ε. Hence,

  (1+|ξ₁|)^{−(M₂+1)} · (1+|ξ₂|)^{−(K+1)} ≤ [(1+|ξ₁|)(1+|ξ₂|)]^{−(2+ε)} ≤ (1+|ξ|)^{−(2+ε)} ∈ L¹(ℝ²).

Therefore, equation (4.3) entails ϕ̂, ψ̂ ∈ L¹(ℝ²), so that Fourier inversion yields ϕ, ψ ∈ L^∞(ℝ²) ∩ C⁰(ℝ²); in combination with ϕ, ψ ∈ L¹(ℝ²), this yields ϕ, ψ ∈ L¹(ℝ²) ∩ L²(ℝ²). Consequently, γ^{[v]} ∈ L¹(ℝ²) ∩ L²(ℝ²) for all v ∈ V, which will be important for our application of Lemma 5.12 later in the proof.
Finally, for g : ℝ² → ℂ, set g^* : ℝ² → ℂ, x ↦ \overline{g(−x)}. For g ∈ L¹(ℝ²), we then have ĝ^*(ξ) = \overline{ĝ(ξ)} for all ξ ∈ ℝ². Therefore, in case of g ∈ C¹(ℝ²) with g, ∇g ∈ L¹(ℝ²) ∩ L^∞(ℝ²) and with ĝ ∈ C^∞(ℝ²), this implies that g^* satisfies the same properties and that |∂^θ ĝ^*| = |∂^θ ĝ| for all θ ∈ ℕ₀².
These considerations easily show that since ϕ, ψ satisfy the assumptions of Theorem 5.9 (with q = p and s = 0 , as well as s = s (0) + (1 + α ) (cid:0) p − − − (cid:1) ),so do ϕ ∗ , ψ ∗ .Thus, Theorem 5.9 yields a constant δ ∈ (0 , such that the α -shearlet system SH α (cid:0) ϕ, ψ ; δ (cid:1) = SH α ( f ϕ ∗ , f ψ ∗ ; δ ) forms a Banach frame for S p,qα,s (cid:0) R (cid:1) , for all p, q ∈ [ p , ∞ ] and all s ∈ R with ≤ s ≤ s (0) + (1 + α ) (cid:0) p − − − (cid:1) ,as long as < δ ≤ δ . Likewise, Theorem 5.10 yields a constant δ ∈ (0 , such that SH α ( ϕ, ψ ; δ ) yields anatomic decomposition of S p,qα,s (cid:0) R (cid:1) for the same range of parameters, as long as < δ ≤ δ . Now, let us set δ := min { δ , δ } ∈ (0 , . nalysis vs. Synthesis Sparsity for α -Shearlets — Felix Voigtlaender & Anne Pein Let p ∈ [ p , and s ∈ (cid:2) , s (0) (cid:3) be arbitrary and set s ♮ := s + (1 + α ) (cid:0) p − − − (cid:1) . It is not hard to see directlyfrom Definition 2.8—and because of | det B v | = u αv for all v ∈ V —that the quasi-norm of the coefficient space C p,pu s♮ satisfies (cid:13)(cid:13)(cid:13) ( c ( v ) k ) v ∈ V,k ∈ Z (cid:13)(cid:13)(cid:13) C p,pus♮ = (cid:13)(cid:13)(cid:13)(cid:13)(cid:16) | det B v | − p · u s ♮ v · (cid:13)(cid:13) ( c ( v ) k ) k ∈ Z (cid:13)(cid:13) ℓ p (cid:17) v ∈ V (cid:13)(cid:13)(cid:13)(cid:13) ℓ p = (cid:13)(cid:13)(cid:13) ( u sv · c ( v ) k ) v ∈ V, k ∈ Z (cid:13)(cid:13)(cid:13) ℓ p ∈ [0 , ∞ ] for arbitrary sequences ( c ( v ) k ) v ∈ V,k ∈ Z , and C p,pu s♮ contains exactly those sequences for which this (quasi)-norm isfinite. Now, note because of s ≥ and p ≤ that C p,pu s♮ ֒ → ℓ (cid:0) V × Z (cid:1) , since u v ≥ for all v ∈ V and since ℓ p ֒ → ℓ .Next, note that we have s = 0 ≤ s ≤ s ♮ ≤ s (0) + (1 + α ) (cid:0) p − − − (cid:1) = s , so that SH α ( ϕ, ψ ; δ ) forms an atomic decomposition of S p,pα,s ♮ (cid:0) R (cid:1) for all < δ ≤ δ . 
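The interplay of coefficient (analysis) and synthesis operators with $S^{(\delta)} \circ C^{(\delta)} = \mathrm{id}$, which Theorems 5.9 and 5.10 provide in the Banach-space setting, can be made concrete in finite dimensions, where the canonical dual frame is explicit. A toy sketch (not the shearlet setting), using the standard Mercedes-Benz tight frame for $\mathbb{R}^2$:

```python
import math

# Mercedes-Benz frame: three unit vectors at equal angles; a tight frame
# for R^2 with frame bound 3/2, so the canonical dual is (2/3) * psi_k.
psi = [(math.cos(a), math.sin(a))
       for a in (math.pi / 2, math.pi / 2 + 2 * math.pi / 3, math.pi / 2 + 4 * math.pi / 3)]

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

def analysis(f):                 # coefficient operator C: f -> (<f, psi_k>)_k
    return [dot(f, p) for p in psi]

def synthesis(c, atoms):         # synthesis operator S: c -> sum_k c_k * atoms_k
    return tuple(sum(ck * p[i] for ck, p in zip(c, atoms)) for i in (0, 1))

dual = [(2 / 3 * p[0], 2 / 3 * p[1]) for p in psi]   # canonical dual frame

f = (0.8, -1.3)
rec = synthesis(analysis(f), dual)   # f = sum_k <f, psi_k> * dual_k
assert all(abs(a - b) < 1e-12 for a, b in zip(f, rec))
```

For non-tight frames the dual is $S^{-1}\psi_k$ with $S$ the frame operator, and its properties (decay, smoothness) are in general much harder to control, which is exactly the issue discussed in Section 6 below.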
This means that the synthesisoperator S ( δ ) : C p,pu s♮ → S p,pα,s ♮ (cid:0) R (cid:1) , ( c ( v ) k ) v ∈ V, k ∈ Z X ( v,k ) ∈ V × Z c ( v ) k · γ [ v,k,δ ] is well-defined and bounded with unconditional convergence of the series in S p,pα,s ♮ (cid:0) R (cid:1) . This implicitly uses that thesynthesis operator S ( δ ) as defined in Theorem 4.3 is bounded and satisfies S ( δ ) ( δ v,k ) = γ [ v,k,δ ] for all ( v, k ) ∈ V × Z and that we have c = ( c ( v ) k ) v ∈ V, k ∈ Z = P ( v,k ) ∈ V × Z c ( v ) k · δ v,k for all c ∈ C p,pu s♮ , with unconditional convergence in C p,pu s♮ , since p ≤ < ∞ . This immediately yields Ω := X ( v,k ) ∈ V × Z c ( v ) k · γ [ v,k,δ ] (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ( u sv · c ( v ) k ) v ∈ V, k ∈ Z ∈ ℓ p (cid:0) V × Z (cid:1) = range ( S ( δ ) ) ⊂ S p,pα,s ♮ (cid:0) R (cid:1) . (5.5)Further, if f ∈ S p,pα,s ♮ (cid:0) R (cid:1) and if c = ( c ( v ) k ) v ∈ V, k ∈ Z is an arbitrary sequence satisfying f = P ( v,k ) ∈ V × Z c ( v ) k · γ [ v,k,δ ] with unconditional convergence in L (cid:0) R (cid:1) , there are two cases: Case
1. We have $\big\|(u_v^s \cdot c_k^{(v)})_{v \in V,\, k \in \mathbb{Z}^2}\big\|_{\ell^p} = \infty$. In this case, the estimate $\|f\|_{S^{p,p}_{\alpha,s^\natural}(\mathbb{R}^2)} \leq |||S^{(\delta)}||| \cdot \big\|(u_v^s \cdot c_k^{(v)})_{v \in V,\, k \in \mathbb{Z}^2}\big\|_{\ell^p}$ is trivial.

Case
2. We have (cid:13)(cid:13)(cid:13) ( u sv · c ( v ) k ) v ∈ V, k ∈ Z (cid:13)(cid:13)(cid:13) ℓ p < ∞ . In this case, we get c ∈ C p,pu s♮ and f = S ( δ ) c . Therefore, we see k f k S p,pα,s♮ ( R ) ≤ ||| S ( δ ) ||| · k c k C p,pus♮ = ||| S ( δ ) ||| · (cid:13)(cid:13)(cid:13) ( u sv · c ( v ) k ) v ∈ V, k ∈ Z (cid:13)(cid:13)(cid:13) ℓ p .All in all, we have thus established k f k S p,pα,s♮ ( R ) ≤ ||| S ( δ ) ||| · k f k ∗ ,p,s,δ ∀ f ∈ S p,pα,s ♮ (cid:0) R (cid:1) . Next, note that the considerations from the preceding paragraph with the choice p = 2 and s = 0 also show that S ( δ ) : ℓ (cid:0) V × Z (cid:1) → S , α, (cid:0) R (cid:1) is well-defined and bounded. But [60, Lemma 6.10] yields S , α, (cid:0) R (cid:1) = L (cid:0) R (cid:1) with equivalent norms. Since we saw above that C p,pu s♮ ֒ → ℓ (cid:0) V × Z (cid:1) for all p ≤ and s ≥ , this implies inparticular that the series defining S ( δ ) c converges unconditionally in L (cid:0) R (cid:1) for arbitrary c ∈ C p,pu s♮ , for arbitrary s ∈ (cid:2) , s (0) (cid:3) and p ∈ [ p , .But from the atomic decomposition property of SH α ( ϕ, ψ ; δ ) , we also know that there is a bounded coefficientoperator C ( δ ) : S p,pα,s ♮ (cid:0) R (cid:1) → C p,pu s♮ satisfying S ( δ ) ◦ C ( δ ) = id S p,pα,s♮ . Thus, for arbitrary f ∈ S p,pα,s ♮ (cid:0) R (cid:1) and e = ( e v,k ) v ∈ V,k ∈ Z := C ( δ ) f ∈ C p,pu s♮ , we have f = S ( δ ) e = P ( v,k ) ∈ V × Z e ( v ) k · γ [ v,k,δ ] ∈ Ω , where the series convergesunconditionally in L (cid:0) R (cid:1) (and in S p,pα,s ♮ (cid:0) R (cid:1) ). In particular, we get k f k ∗ ,p,s,δ ≤ (cid:13)(cid:13)(cid:13) ( u sv · e ( v ) k ) v ∈ V, k ∈ Z (cid:13)(cid:13)(cid:13) ℓ p = k e k C p,pus♮ ≤ ||| C ( δ ) ||| · k f k S p,pα,s♮ < ∞ , as well as k f k L ( R ) . k f k S , α, ≤ ||| S ( δ ) ||| ℓ → S , α, · k e k C , u = ||| S ( δ ) ||| ℓ → S , α, · k e k ℓ ≤ ||| S ( δ ) ||| ℓ → S , α, · k e k C p,pus♮ ≤ ||| S ( δ ) ||| ℓ → S , α, · ||| C ( δ ) ||| · k f k S p,pα,s♮ < ∞ nalysis vs. 
Synthesis Sparsity for α -Shearlets — Felix Voigtlaender & Anne Pein for all f ∈ S p,pα,s ♮ (cid:0) R (cid:1) . Up to now, we have thus shown S p,pα,s ♮ (cid:0) R (cid:1) = Ω (with Ω as in equation (5.5)) and k f k ∗ ,p,s,δ ≍ k f k S p,pα,s♮ for all f ∈ S p,pα,s ♮ (cid:0) R (cid:1) , with k f k ∗ ,p,s,δ as in equation (5.4). Finally, we have also shown S p,pα,s ♮ (cid:0) R (cid:1) ֒ → L (cid:0) R (cid:1) .Thus, it remains to show Ω := (cid:26) f ∈ L (cid:0) R (cid:1) (cid:12)(cid:12)(cid:12)(cid:12) (cid:16) u sv · h f, γ [ v,k,δ ] i L (cid:17) v ∈ V, k ∈ Z ∈ ℓ p (cid:0) V × Z (cid:1)(cid:27) ! = S p,pα,s ♮ (cid:0) R (cid:1) , as well as k f k S p,pα,s♮ ≍ (cid:13)(cid:13)(cid:13)(cid:0) u sv · h f, γ [ v,k,δ ] i L (cid:1) v ∈ V, k ∈ Z (cid:13)(cid:13)(cid:13) ℓ p for f ∈ S p,pα,s ♮ (cid:0) R (cid:1) . But Theorem 5.9 (applied with ϕ ∗ , ψ ∗ instead of ϕ, ψ , see above) shows that the analysis operator A ( δ ) : S p,pα,s ♮ (cid:0) R (cid:1) → C p,pu s♮ , f (cid:2) ( ̺ [ v ] ∗ f ) (cid:0) δ · B − Tv k (cid:1)(cid:3) v ∈ V, k ∈ Z is well-defined and bounded, where (cf. Theorem 4.2), the family (cid:0) ̺ [ v ] (cid:1) v ∈ V is given by ̺ [ v ] = | det B v | / · (cid:0) ψ ∗ ◦ B Tv (cid:1) for v ∈ V and by ̺ [0] = ϕ ∗ . Note that this yields g ̺ [ v ] = γ [ v ] , where the family (cid:0) γ [ v ] (cid:1) v ∈ V is as in Definition 5.6.Now, since we already showed S p,pα,s ♮ (cid:0) R (cid:1) ֒ → L (cid:0) R (cid:1) and since ̺ [ v ] ∈ L (cid:0) R (cid:1) ∩ L (cid:0) R (cid:1) for all v ∈ V , as we sawat the start of the proof, Lemma 5.12 yields ( ̺ [ v ] ∗ f ) (cid:0) δ · B − Tv k (cid:1) = D f, L δ · B − Tv k g ̺ [ v ] E = D f, L δ · B − Tv k γ [ v ] E = D f, L δ · B − Tv k γ [ v ] E L = D f, γ [ v,k,δ ] E L for all f ∈ S p,pα,s ♮ (cid:0) R (cid:1) and ( v, k ) ∈ V × Z . 
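The identity from Lemma 5.12 used here, namely that sampling a convolution $(\varrho * f)$ equals taking $L^2$ inner products with translates of the reflected atom, can be illustrated with Riemann sums in one dimension; all concrete functions below are our own illustrative choices:

```python
import math

dx = 0.01
xs = [-5 + k * dx for k in range(1001)]

rho = lambda t: math.exp(-t * t)                     # real-valued "atom"
f   = lambda t: math.sin(3 * t) * math.exp(-t * t / 4)

def convolve_at(x):
    # (rho * f)(x) = \int rho(x - t) f(t) dt  (Riemann sum)
    return sum(rho(x - t) * f(t) for t in xs) * dx

def inner_with_translate(x):
    # <f, L_x rho~>_{L^2} with rho~(t) = conj(rho(-t)); rho is real here,
    # so no complex conjugation is needed
    rho_tilde = lambda t: rho(-t)
    return sum(f(t) * rho_tilde(t - x) for t in xs) * dx

for x in (-0.7, 0.0, 1.2):
    assert abs(convolve_at(x) - inner_with_translate(x)) < 1e-9
```

The two sums are identical term by term, since $\widetilde{\varrho}(t - x) = \varrho(x - t)$; this is the discrete shadow of $(\varrho * f)(x) = \langle f, L_x \widetilde{\varrho}\,\rangle_{L^2}$.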
We thus see S p,pα,s ♮ (cid:0) R (cid:1) ⊂ Ω and (cid:13)(cid:13)(cid:13)(cid:13)(cid:16) u sv · h f, γ [ v,k,δ ] i L (cid:17) v ∈ V, k ∈ Z (cid:13)(cid:13)(cid:13)(cid:13) ℓ p = k A ( δ ) f k C p,pus♮ ≤ ||| A ( δ ) ||| · k f k S p,pα,s♮ ∀ f ∈ S p,pα,s ♮ (cid:0) R (cid:1) . Conversely, let f ∈ Ω be arbitrary, i.e., f ∈ L (cid:0) R (cid:1) with (cid:0) u sv · h f, γ [ v,k,δ ] i L (cid:1) v ∈ V, k ∈ Z ∈ ℓ p (cid:0) V × Z (cid:1) . Thismeans f ∈ L (cid:0) R (cid:1) = S , α, (cid:0) R (cid:1) and (cid:2)(cid:0) ̺ [ v ] ∗ f (cid:1) (cid:0) δ · B − Tv k (cid:1)(cid:3) v ∈ V, k ∈ Z ∈ C p,pu s♮ , again by Lemma 5.12. Thus, theconsistency statement of Theorem 4.2 shows f ∈ S p,pα,s ♮ (cid:0) R (cid:1) . Therefore, f = R ( δ ) A ( δ ) f for the reconstructionoperator R ( δ ) : C p,pu s♮ → S p,pα,s ♮ (cid:0) R (cid:1) that is provided by Theorem 5.9 (applied with ϕ ∗ , ψ ∗ instead of ϕ, ψ ). Thus, k f k S p,pα,s♮ ≤ ||| R ( δ ) ||| · k A ( δ ) f k C p,pus♮ = ||| R ( δ ) ||| · (cid:13)(cid:13)(cid:13)(cid:13)(cid:16) u sv · h f, γ [ v,k,δ ] i L (cid:17) v ∈ V, k ∈ Z (cid:13)(cid:13)(cid:13)(cid:13) ℓ p . If we apply the preceding considerations for s = 0 and p = 2 , we in particular get k f k L ≍ k f k S , α, ≍ (cid:13)(cid:13)(cid:13)(cid:13)(cid:16) h f, γ [ v,k,δ ] i L (cid:17) v ∈ V, k ∈ Z (cid:13)(cid:13)(cid:13)(cid:13) ℓ ∀ f ∈ L (cid:0) R (cid:1) = S , α, (cid:0) R (cid:1) , which implies that the α -shearlet system SH α ( ϕ, ψ ; δ ) = (cid:0) γ [ v,k,δ ] (cid:1) v ∈ V, k ∈ Z is a frame for L (cid:0) R (cid:1) . (cid:3) Approximation of cartoon-like functions using α -shearlets One of the most celebrated properties of shearlet systems is that they provide (almost) optimal approximationrates for the model class E (cid:0) R ; ν (cid:1) of cartoon-like functions , which we introduce formally in Definition 6.1below. More precisely, this means (cf. 
[51, Theorem 1.3] for the case of compactly supported shearlets) that
\[
  \|f - f^{(N)}\|_{L^2} \leq C \cdot N^{-1} \cdot (1+\log N)^{3/2} \qquad \forall\, N \in \mathbb{N} \text{ and } f \in \mathcal{E}(\mathbb{R}^2;\nu), \tag{6.1}
\]
where $f^{(N)}$ is the so-called $N$-term approximation of $f$.

The exact interpretation of this $N$-term approximation, however, requires some explanation, as was briefly discussed in the introduction: In general, given a dictionary $\Psi = (\psi_i)_{i \in I}$ in a Hilbert space $\mathcal{H}$ (which is assumed to satisfy $\overline{\operatorname{span}}\{\psi_i \mid i \in I\} = \mathcal{H}$), we let
\[
  \mathcal{H}^{(N)}_\Psi := \Big\{ \sum_{i \in J} \alpha_i \psi_i \,\Big|\, J \subset I \text{ with } |J| \leq N \text{ and } (\alpha_i)_{i \in J} \in \mathbb{C}^J \Big\} \tag{6.2}
\]
denote the subset (which is in general not a subspace) of $\mathcal{H}$ consisting of linear combinations of (at most) $N$ elements of $\Psi$. The usual definition of a (in general non-unique) best $N$-term approximation to $f \in \mathcal{H}$ is any $f^{(N)}_\Psi \in \mathcal{H}^{(N)}_\Psi$ satisfying $\|f - f^{(N)}_\Psi\| = \inf_{g \in \mathcal{H}^{(N)}_\Psi} \|f - g\|$. This definition is given for example in [50, Section 3.1]. Note, however, that in general it is not clear whether such a best $N$-term approximation exists. But regardless of whether a best $N$-term approximation exists or not, we can always define the $N$-term approximation error as
\[
  \alpha^{(N)}_\Psi(f) := \inf_{g \in \mathcal{H}^{(N)}_\Psi} \|f - g\|. \tag{6.3}
\]
All in all, the goal of (nonlinear) $N$-term approximation is to approximate an element $f \in \mathcal{H}$ using only a fixed number of elements from the dictionary $\Psi$. Thus, when one reads the usual statement that shearlets provide (almost) optimal $N$-term approximation rates for cartoon-like functions, one could be tempted to think that equation (6.1) has to be understood as
\[
  \alpha^{(N)}_\Psi(f) \leq C \cdot N^{-1} \cdot (1+\log N)^{3/2} \qquad \forall\, N \in \mathbb{N} \text{ and } f \in \mathcal{E}(\mathbb{R}^2;\nu), \tag{6.4}
\]
where the dictionary $\Psi$ is a (suitable) shearlet system. This, however, is not what is shown e.g. in [50].
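For an orthonormal dictionary, a best $N$-term approximation does exist: keep the $N$ largest coefficients, so that $\alpha^{(N)}_\Psi(f)$ is the $\ell^2$-norm of the discarded tail. A sketch with a hypothetical coefficient sequence; the harmonic decay $c_n = 1/n$ sits on the boundary of $\ell^1$ and empirically gives the rate $N^{-1/2}$, matching the exponent $1/p - 1/2$ for $p = 1$:

```python
import math

# toy analysis coefficients of some f in an orthonormal basis (our own choice)
coeffs = [1.0 / n for n in range(1, 10001)]

def n_term_error(c, N):
    # For an orthonormal system, the best N-term approximation keeps the N
    # largest coefficients; the error is the l^2 norm of the remaining tail.
    tail = sorted((abs(x) for x in c), reverse=True)[N:]
    return math.sqrt(sum(x * x for x in tail))

e10, e100 = n_term_error(coeffs, 10), n_term_error(coeffs, 100)
# tenfold increase of N should shrink the error by roughly sqrt(10)
assert e100 < e10
assert 0.25 < e100 / e10 < 0.40
```

For a general (non-orthonormal) frame, thresholding the coefficients no longer yields the *best* $N$-term approximation, which is exactly why the distinction between the primal and the dual frame discussed next matters.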
What is shown there, instead, is that if $\widetilde{\Psi} = (\widetilde{\psi}_i)_{i \in I}$ denotes the (canonical) dual frame (in fact, any dual frame will do) of a suitable shearlet system $\Psi$, then we have
\[
  \alpha^{(N)}_{\widetilde{\Psi}}(f) \leq C \cdot N^{-1} \cdot (1+\log N)^{3/2} \qquad \forall\, N \in \mathbb{N} \text{ and } f \in \mathcal{E}(\mathbb{R}^2;\nu).
\]
This approximation rate using the dual frame $\widetilde{\Psi}$ is not completely satisfactory, since for non-tight shearlet systems $\Psi$, the properties of $\widetilde{\Psi}$ (like smoothness, decay, etc.) are largely unknown. Note that there is no known construction of a tight, compactly supported cone-adapted shearlet frame. Furthermore, to our knowledge, there is, up to now, nothing nontrivial known about $\alpha^{(N)}_\Psi(f)$ for $f \in \mathcal{E}(\mathbb{R}^2)$ in the case that $\Psi$ is itself a shearlet system, unless $\Psi$ is a tight shearlet frame.

This difference between approximation using the primal and the dual frame is essentially a difference between analysis and synthesis sparsity: The usual proof strategy to obtain the approximation rate with respect to the dual frame is to show that the analysis coefficients $(\langle f, \psi_i \rangle)_{i \in I}$ are sparse in the sense that they lie in some (weak) $\ell^p$ space. Then one uses the reconstruction formula
\[
  f = \sum_{i \in I} \langle f, \psi_i \rangle \, \widetilde{\psi}_i
\]
using the dual frame $\widetilde{\Psi} = (\widetilde{\psi}_i)_{i \in I}$ and truncates this series to the $N$ terms with the largest coefficients $|\langle f, \psi_i \rangle|$. Using the sparsity of the coefficients, one then obtains the claim. In other words, since the analysis coefficients with respect to $\Psi = (\psi_i)_{i \in I}$ are the synthesis coefficients with respect to $\widetilde{\Psi}$, analysis sparsity with respect to $\Psi$ yields synthesis sparsity with respect to $\widetilde{\Psi}$. Conversely, analysis sparsity with respect to $\widetilde{\Psi}$ yields synthesis sparsity with respect to $\Psi$ itself. But since only limited knowledge about $\widetilde{\Psi}$ is available, this fact is essentially impossible to apply.

But our preceding results concerning Banach frames and atomic decompositions for ($\alpha$-)shearlet smoothness spaces show that analysis sparsity is equivalent to synthesis sparsity (cf.
Theorem 5.13) for sufficiently nice and sufficiently densely sampled $\alpha$-shearlet frames. Using this fact, we will show in this section that we indeed have
\[
  \alpha^{(N)}_\Psi(f) \leq C_\varepsilon \cdot N^{-(1-\varepsilon)} \qquad \forall\, N \in \mathbb{N} \text{ and } f \in \mathcal{E}(\mathbb{R}^2;\nu),
\]
where $\varepsilon \in (0,1)$ can be chosen arbitrarily and where $\Psi$ is a (suitable) shearlet frame. In fact, we will also obtain a corresponding statement for $\alpha$-shearlet frames. Note though that the approximation rate $N^{-(1-\varepsilon)}$ is slightly inferior to the rate of decay in equation (6.4). Nevertheless, to the best of our knowledge, this is still the best known result on approximating cartoon-like functions by shearlets (instead of using the dual frame of a shearlet frame).

Our proof strategy is straightforward: The known analysis-sparsity results, in conjunction with our results about Banach frames for shearlet smoothness spaces, show that $\mathcal{E}(\mathbb{R}^2;\nu)$ is a bounded subset of a certain range of shearlet smoothness spaces. Thus, using our results about atomic decompositions for these shearlet smoothness spaces, we get synthesis sparsity with respect to the (primal(!)) shearlet frame. We then truncate this (quickly decaying) series to obtain a good $N$-term approximation.

We begin our considerations by recalling the notion of $C^\beta$-cartoon-like functions, which were originally introduced (in a preliminary form) in [15]. Of course, one knows $\alpha^{(N)}_\Psi(f) \to 0$ as $N \to \infty$, but this holds for every $f \in L^2(\mathbb{R}^2)$ and every frame $\Psi$ of $L^2(\mathbb{R}^2)$.

Definition 6.1.
Fix parameters $0 < \varrho_0 < \varrho_1 < 1$ once and for all.

• For $\nu > 0$ and $\beta \in (1,2]$, the set $\mathrm{STAR}^\beta(\nu)$ is the family of all subsets $B \subset [0,1]^2$ for which there is some $x_0 \in \mathbb{R}^2$ and a $2\pi$-periodic function $\varrho : \mathbb{R} \to [\varrho_0, \varrho_1]$ with $\varrho \in C^\beta(\mathbb{R})$ such that
\[
  B - x_0 = \Big\{ r \cdot \binom{\cos\varphi}{\sin\varphi} \,\Big|\, \varphi \in [0, 2\pi] \text{ and } 0 \leq r \leq \varrho(\varphi) \Big\}
\]
and such that the $(\beta-1)$-Hölder semi-norm $[\varrho']_{\beta-1} = \sup_{\varphi,\phi \in \mathbb{R},\, \varphi \neq \phi} \frac{|\varrho'(\varphi) - \varrho'(\phi)|}{|\varphi - \phi|^{\beta-1}}$ satisfies $[\varrho']_{\beta-1} \leq \nu$.

• For $\nu > 0$ and $\beta \in (1,2]$, the class $\mathcal{E}^\beta(\mathbb{R}^2;\nu)$ of cartoon-like functions with regularity $\beta$ is defined as
\[
  \mathcal{E}^\beta(\mathbb{R}^2;\nu) := \big\{ f_0 + \mathbb{1}_B \cdot f_1 \,\big|\, B \in \mathrm{STAR}^\beta(\nu) \text{ and } f_i \in C^\beta_c([0,1]^2) \text{ with } \|f_i\|_{C^\beta} \leq \min\{1,\nu\} \text{ for } i \in \{0,1\} \big\},
\]
where $\|f\|_{C^\beta} = \|f\|_{\sup} + \|\nabla f\|_{\sup} + [\nabla f]_{\beta-1}$ and $[g]_{\beta-1} = \sup_{x,y \in \mathbb{R}^2,\, x \neq y} \frac{|g(x) - g(y)|}{|x-y|^{\beta-1}}$ for $g : \mathbb{R}^2 \to \mathbb{C}^\ell$, as well as
\[
  C^\beta_c([0,1]^2) = \big\{ f \in C^{\lfloor \beta \rfloor}(\mathbb{R}^2) \,\big|\, \operatorname{supp} f \subset [0,1]^2 \text{ and } \|f\|_{C^\beta} < \infty \big\}.
\]
Finally, we set $\mathcal{E}^\beta(\mathbb{R}^2) := \bigcup_{\nu > 0} \mathcal{E}^\beta(\mathbb{R}^2;\nu)$. ◭

Remark.
The definition of $\mathrm{STAR}^\beta(\nu)$ given here is slightly more conservative than in [38, Definition 2.5], where it is only assumed that $\varrho : \mathbb{R} \to [0, \varrho_1]$ with $0 < \varrho_1 < 1$, instead of $\varrho : \mathbb{R} \to [\varrho_0, \varrho_1]$. We also note that $[\varrho']_{\beta-1} = \|\varrho''\|_{\sup}$ in case of $\beta = 2$. This is a simple consequence of the definition of the derivative and of the mean-value theorem. Hence, in case of $\beta = 2$, the definition given here is consistent with (in fact, slightly stronger than) the one used in [50, Definition 1.1].

Further, we note that in [37, Definition 5.9], the class $\mathcal{E}^\beta(\mathbb{R}^2)$ is simply defined as
\[
  \big\{ f_0 + \mathbb{1}_B \cdot f_1 \,\big|\, f_0, f_1 \in C^\beta_c([0,1]^2),\ B \subset [0,1]^2 \text{ a Jordan domain with regular closed piecewise } C^\beta \text{ boundary curve} \big\}.
\]
Even for this much more general definition, the authors of [37] then invoke the results which are derived in [38] under the more restrictive assumptions. This is somewhat unpleasant, but does not need to concern us: In fact, in the following, we will frequently use the notation $\mathcal{E}^\beta(\mathbb{R}^2;\nu)$, but the precise definition of this space is not really used; all that we need to know is that if $\varphi, \psi$ are suitable shearlet generators, then the $\beta$-shearlet coefficients $c = (c_{j,k,\varepsilon,m})_{j,k,\varepsilon,m}$ of $f \in \mathcal{E}^\beta(\mathbb{R}^2;\nu)$ satisfy $c \in \ell^{\frac{2}{1+\beta}+\varepsilon}$ for all $\varepsilon > 0$, with $\|c\|_{\ell^{\frac{2}{1+\beta}+\varepsilon}} \leq C_{\varepsilon,\nu,\beta,\varphi,\psi}$. Below, we will derive this by combining [38, Theorem 4.2] with [37, Theorem 5.6], where [37, Theorem 5.6] does not use the notion of cartoon-like functions at all. ◊

As our first main technical result in this section, we show that the $C^\beta$-cartoon-like functions are bounded subsets of suitably chosen $\alpha$-shearlet smoothness spaces. Once we have established this property, we obtain the claimed approximation rate by invoking the atomic decomposition results from Theorem 5.10.

Proposition 6.2.
Let $\nu > 0$ and $\beta \in (1,2]$ be arbitrary and let $p \in (2/(1+\beta),\, 2]$. Then $\mathcal{E}^\beta(\mathbb{R}^2;\nu)$ is a bounded subset of $S^{p,p}_{\beta^{-1},\, (1+\beta^{-1})(p^{-1}-2^{-1})}(\mathbb{R}^2)$. ◭

Proof.
Here, we only give the proof for the case β = 2 . For β ∈ (1 , , the proof is more involved and thus postponedto the appendix (Section D). The main reason for the additional complications in case of β ∈ (1 , is that ourproof essentially requires that we already know that there is some sufficiently nice, cone-adapted α -shearlet systemwith respect to which the C β -cartoon-like functions are analysis sparse (in a suitable “almost ℓ / (1+ β ) ” sense). Incase of β = 2 , this is known, since we then have α = β − = , so that the α -shearlet systems from Definition 5.6coincide with the usual cone-adapted shearlets, cf. Remark 5.7. But in case of β ∈ (1 , , it is only known (cf. [37,Theorem 5.6]) that C β -cartoon-like functions are analysis sparse with respect to suitable β -shearlet systems (cf.Definition D.7 and note β / ∈ [0 , , so that the notion of β -shearlets does not collide with our notion of α -shearletsfor α ∈ [0 , ) which are different, but closely related to the β − -shearlet systems from Definition 5.6. Making thisclose connection precise is what mainly makes the proof in case of β ∈ (1 , more involved, cf. Section D.Thus, let us consider the case β = 2 . Choose φ ∈ C ∞ c ( R ) with φ ≥ and φ , so that c φ (0) = k φ k L > .By continuity of c φ , there is thus some ν > with c φ ( ξ ) = 0 on [ − ν, ν ] . Now, define φ := φ (3 • /ν ) and notethat φ ∈ C ∞ c ( R ) with c φ ( ξ ) = ν · c φ ( νξ/ = 0 for ξ ∈ [ − , .Now, set ϕ := φ ⊗ φ ∈ C ∞ c (cid:0) R (cid:1) and ψ := φ , as well as ψ := φ (8)1 , the -th derivative of φ . By differentiatingunder the integral and by performing partial integration, we get for ≤ k ≤ that d k d ξ k (cid:12)(cid:12)(cid:12)(cid:12) ξ =0 c ψ = d k d ξ k (cid:12)(cid:12)(cid:12)(cid:12) ξ =0 d φ (8)1 = Z R φ (8)1 ( x ) · ( − πix ) k d x = ( − · Z R φ ( x ) · d ( − πix ) k d x d x = 0 , (6.5) nalysis vs. Synthesis Sparsity for α -Shearlets — Felix Voigtlaender & Anne Pein since d ( − πix ) k d x ≡ for ≤ k ≤ . 
Next, observe b ϕ ( ξ ) = c φ ( ξ ) · c φ ( ξ ) = 0 for ξ ∈ [ − , ⊃ [ − , , as well as c ψ ( ξ ) = 0 for ξ ∈ [ − , and finally c ψ ( ξ ) = (2 πiξ ) · c φ ( ξ ) = (2 π ) · ξ · c φ ( ξ ) = 0 for ξ ∈ [ − , \ { } , which in particular implies c ψ ( ξ ) = 0 for ≤ | ξ | ≤ .Now, setting ψ := ψ ⊗ ψ , we want to verify that ϕ, ψ satisfy the assumptions of Theorem 5.13 with the choices ε = , p = , s (0) = 0 and α = . Since we have ϕ ∈ C ∞ c (cid:0) R (cid:1) and ψ , ψ ∈ C ∞ c ( R ) and since c ψ ( ξ ) = 0 for ξ ∈ [ − , and c ψ ( ξ ) = 0 for ≤ | ξ | ≤ and since finally b ϕ ( ξ ) = 0 for ξ ∈ [ − , , Remark 5.11 andCorollaries 4.4 and 4.5 show that all we need to check is d ℓ d ξ ℓ (cid:12)(cid:12) ξ =0 c ψ = 0 for all ℓ = 0 , . . . , N + ⌈ Λ ⌉ − and all ℓ = 0 , . . . , N + ⌈ M ⌉ − , where N = (cid:6) p − · (2 + ε ) (cid:7) = ⌈ / ⌉ = 4 , Λ = ε + p − + max (cid:8) , (1 + α ) (cid:0) p − − (cid:1)(cid:9) = 52 ≤ , and M = ε + p − + max (cid:8) , (1 + α ) (cid:0) p − − − (cid:1)(cid:9) = 14 + 3 ≤ , cf. Theorems 5.13, 4.2, and 4.3. Hence, N + ⌈ Λ ⌉ − ≤ and N + ⌈ M ⌉ − ≤ , so that equation (6.5) showsthat ϕ, ψ indeed satisfy the assumptions of Theorem 5.13. That theorem yields because of α = some δ ∈ (0 , such that the following hold for all < δ ≤ δ : • The shearlet system SH / ( ϕ, ψ ; δ ) = (cid:0) γ [ v,k,δ ] (cid:1) v ∈ V,k ∈ Z is a frame for L (cid:0) R (cid:1) . • Since p ∈ (2 / (1 + β ) , ⊂ (cid:2) , (cid:3) = [ p , , we have S p,pβ − , (1+ β − ) ( p − ) (cid:0) R (cid:1) = S p,pα, (1+ α ) ( p − ) (cid:0) R (cid:1) = (cid:26) f ∈ L (cid:0) R (cid:1) (cid:12)(cid:12)(cid:12)(cid:12) (cid:16) h f, γ [ v,k,δ ] i L (cid:17) v ∈ V, k ∈ Z ∈ ℓ p (cid:0) V × Z (cid:1)(cid:27) and there is a constant C p = C p ( ϕ, ψ, δ ) > such that k f k S p,pβ − , ( β − )( p − − − ) ≤ C p · (cid:13)(cid:13)(cid:13)(cid:13)(cid:16) h f, γ [ v,k,δ ] i L (cid:17) v ∈ V, k ∈ Z (cid:13)(cid:13)(cid:13)(cid:13) ℓ p ∀ f ∈ S p,pβ − , (1+ β − )( p − − − ) (cid:0) R (cid:1) . 
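The mechanism behind the vanishing-moment computation in equation (6.5), namely that each derivative produces one vanishing moment, has an exact discrete analogue: the 8-fold finite difference of any finitely supported sequence has vanishing discrete moments of order $\leq 7$. A sketch (the bump sequence is an arbitrary choice of ours; integer values keep the arithmetic exact):

```python
def forward_diff(seq):
    return [seq[i + 1] - seq[i] for i in range(len(seq) - 1)]

# an arbitrary finitely supported integer "bump", zero-padded so that all
# eight difference steps remain fully supported inside the list
phi1 = [0] * 8 + [2, 5, 7, 3, 1] + [0] * 8

psi1 = phi1
for _ in range(8):          # discrete analogue of taking the 8th derivative
    psi1 = forward_diff(psi1)

# discrete moments sum_n n^k * psi1[n] vanish exactly for k = 0, ..., 7,
# by (discrete) summation by parts -- just as in equation (6.5)
for k in range(8):
    assert sum(n ** k * c for n, c in enumerate(psi1)) == 0
```

Summation by parts moves each difference onto the polynomial $n^k$, lowering its degree by one; after eight differences a polynomial of degree $\leq 7$ is annihilated, exactly mirroring the eight partial integrations applied to $\psi_1 = \varphi_1^{(8)}$.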
Thus, since we clearly have E (cid:0) R ; ν (cid:1) ⊂ L (cid:0) R (cid:1) , it suffices to show that there is a constant C = C ( p, ν, δ, ϕ, ψ ) > such that (cid:13)(cid:13) A ( δ ) f (cid:13)(cid:13) ℓ p ≤ C < ∞ for all f ∈ E (cid:0) R ; ν (cid:1) , where A ( δ ) f := (cid:0)(cid:10) f, γ [ v,k,δ ] (cid:11) L (cid:1) v ∈ V, k ∈ Z . Here, we note thatthe sequence A ( δ ) f just consists of the shearlet coefficients of f (up to a trivial reordering in the translation variable k ) with respect to the shearlet frame SH ( ϕ, ψ ; δ ) = SH ( ϕ, ψ, θ ; δ ) with θ ( x, y ) = ψ ( y, x ) , cf. Remark 5.7. Hence,there is hope to derive the estimate (cid:13)(cid:13) A ( δ ) f (cid:13)(cid:13) ℓ p ≤ C as a consequence of [51, equation (3)], which states that X n>N | λ ( f ) | n ≤ C · N − · (1 + log N ) ∀ N ∈ N and f ∈ E (cid:0) R ; ν (cid:1) , (6.6)where ( | λ ( f ) | n ) n ∈ N are the absolute values of the shearlet coefficients of f with respect to the shearlet frame SH ( ϕ, ψ, θ ; δ ) , ordered nonincreasingly. In particular, (cid:13)(cid:13) A ( δ ) f (cid:13)(cid:13) ℓ p = (cid:13)(cid:13) [ | λ ( f ) | n ] n ∈ N (cid:13)(cid:13) ℓ p .Note though that in order for [51, equation (3)] to be applicable, we need to verify that ϕ, ψ, θ satisfy theassumptions of [51, Theorem 1.3], i.e., ϕ, ψ, θ need to be compactly supported (which is satisfied) and(1) (cid:12)(cid:12)(cid:12) b ψ ( ξ ) (cid:12)(cid:12)(cid:12) . min { , | ξ | σ } · min n , | ξ | − τ o · min n , | ξ | − τ o and(2) (cid:12)(cid:12)(cid:12) ∂∂ξ b ψ ( ξ ) (cid:12)(cid:12)(cid:12) ≤ | h ( ξ ) | · (cid:16) | ξ || ξ | (cid:17) − τ for some h ∈ L ( R ) for certain (arbitrary) σ > and τ ≥ . Furthermore, θ needs to satisfy the same estimate with interchanged rolesof ξ , ξ . But in view of θ ( x, y ) = ψ ( y, x ) , it suffices to establish the estimates for ψ . To this end, recall from abovethat c ψ ∈ C ∞ ( R ) is analytic with d k d ξ k (cid:12)(cid:12)(cid:12)(cid:12) ξ =0 c ψ = 0 for ≤ k ≤ . 
This easily implies (cid:12)(cid:12)(cid:12)c ψ ( ξ ) (cid:12)(cid:12)(cid:12) . | ξ | for | ξ | ≤ , see e.g.the proof of Corollary 4.4, in particular equation (4.15). Furthermore, since ψ , ψ ∈ C ∞ c ( R ) , we get for arbitrary K ∈ N that (cid:12)(cid:12)(cid:12) b ψ i ( ξ ) (cid:12)(cid:12)(cid:12) . (1 + | ξ | ) − K for i ∈ { , } . Altogether, we conclude (cid:12)(cid:12)(cid:12)c ψ ( ξ ) (cid:12)(cid:12)(cid:12) . min n , | ξ | o · (1 + | ξ | ) − andlikewise (cid:12)(cid:12)(cid:12)c ψ ( ξ ) (cid:12)(cid:12)(cid:12) . (1 + | ξ | ) − for all ξ ∈ R , so that the first estimate is fulfilled for σ := 8 > and τ := 8 ≥ .Next, we observe for ξ ∈ R with ξ = 0 that | ξ || ξ | ≤ (1 + | ξ | ) · (cid:16) | ξ | − (cid:17) ≤ · (1 + | ξ | ) · max n , | ξ | − o nalysis vs. Synthesis Sparsity for α -Shearlets — Felix Voigtlaender & Anne Pein and thus (1 + | ξ | / | ξ | ) − ≥ − · (1 + | ξ | ) − · min n , | ξ | o . But since we have c ψ ∈ S ( R ) and thus (cid:12)(cid:12)(cid:12)c ψ ′ ( ξ ) (cid:12)(cid:12)(cid:12) . (1 + | ξ | ) − , this implies (cid:12)(cid:12)(cid:12)(cid:12) ∂∂ξ b ψ ( ξ ) (cid:12)(cid:12)(cid:12)(cid:12) = (cid:12)(cid:12)(cid:12)c ψ ( ξ ) (cid:12)(cid:12)(cid:12) · (cid:12)(cid:12)(cid:12)c ψ ′ ( ξ ) (cid:12)(cid:12)(cid:12) . (1 + | ξ | ) − · (1 + | ξ | ) − · min n , | ξ | o . (1 + | ξ | ) − · (1 + | ξ | / | ξ | ) − , so that the second condition from above is satisfied for our choice τ = 8 , with h ( ξ ) = (1 + | ξ | ) − .Consequently, we conclude from [51, equation (3)] that equation (6.6) is satisfied. Now, for arbitrary M ∈ N ≥ ,we apply equation (6.6) with N = (cid:6) M (cid:7) ≥ , noting that (cid:6) M (cid:7) ≤ M + 1 ≤ M + M = M ≤ M to deduce M · | λ ( f ) | M ≤ | λ ( f ) | M · ( M − ⌈ M/ ⌉ ) ≤ X ⌈ M/ ⌉
Let $\beta \in (1,2]$ be arbitrary. Assume that $\varphi, \psi \in L^1(\mathbb{R}^2)$ satisfy the conditions of Theorem 5.10 for $\alpha = \beta^{-1}$, $p_0 = q_0 = \frac{2}{1+\beta}$, $s_0 = 0$ and $s_1 = \frac{1+\beta}{2}$ and some $\varepsilon \in (0,1]$ (see Remark 6.4 for simplified conditions which ensure that these assumptions are satisfied).

Then there is some $\delta_0 = \delta_0(\varepsilon,\beta,\varphi,\psi) > 0$ such that for all $0 < \delta \leq \delta_0$ and arbitrary $f \in \mathcal{E}^\beta(\mathbb{R}^2)$ and $N \in \mathbb{N}$, there is a function $f^{(N)} \in L^2(\mathbb{R}^2)$ which is a linear combination of $N$ elements of the $\beta^{-1}$-shearlet frame $\Psi = \mathrm{SH}_{\beta^{-1}}(\varphi,\psi;\delta) = \big(\gamma^{[v,k,\delta]}\big)_{v \in V,\, k \in \mathbb{Z}^2}$ such that the following holds: For arbitrary $\sigma, \nu > 0$, there is a constant $C = C(\beta,\delta,\nu,\sigma,\varphi,\psi) > 0$ satisfying
\[
  \|f - f^{(N)}\|_{L^2} \leq C \cdot N^{-(\frac{\beta}{2}-\sigma)} \qquad \forall\, f \in \mathcal{E}^\beta(\mathbb{R}^2;\nu) \text{ and } N \in \mathbb{N}. \quad ◭
\]

Remark.
It was shown in [38, Theorem 2.8] that no dictionary $\Phi$ can achieve an error $\alpha^{(N)}_\Phi(f) \leq C \cdot N^{-\theta}$ for all $N \in \mathbb{N}$ and $f \in \mathcal{E}^\beta(\mathbb{R}^2;\nu)$ with $\theta > \beta/2$, as long as one insists on a polynomial depth restriction for forming the $N$-term approximation. In this sense, the resulting approximation rate is almost optimal. We remark, however, that it is not immediately clear whether the $N$-term approximation whose existence is claimed by the theorem above can be chosen to satisfy the polynomial depth search restriction. There is a long-standing tradition [2, 38, 44, 37, 51] of omitting further considerations concerning this question; therefore, we deferred to Section E the proof that the above approximation rate can also be achieved using a polynomially restricted search depth.

For more details on the technical assumption of polynomial depth restriction in $N$-term approximations, we refer to [38, Section 2.1.1]. ◊

Proof.
Set α := β − . Under the given assumptions, Theorem 5.10 ensures that SH α ( ϕ, ψ ; δ ) forms an atomicdecomposition for S p,qα,s (cid:0) R (cid:1) for all p ≥ p , q ≥ q and s ≤ s ≤ s , for arbitrary < δ ≤ δ , where the constant δ = δ ( α, ε, p , q , s , s , ϕ, ψ ) = δ ( ε, β, ϕ, ψ ) > is provided by Theorem 5.10. Fix some < δ ≤ δ .Let S ( δ ) : C , u → S , α, (cid:0) R (cid:1) and C ( δ ) : S , α, (cid:0) R (cid:1) → C , u be the synthesis map and the coefficient map whoseexistence and boundedness is guaranteed by Theorem 5.10, since ≥ p = q and since s ≤ ≤ s . Note directlyfrom Definition 2.8 that C , u = ℓ (cid:0) V × Z (cid:1) and that S , α, (cid:0) R (cid:1) = L (cid:0) R (cid:1) (cf. [60, Lemma 6.10]). Now, forarbitrary f ∈ E β (cid:0) R (cid:1) ⊂ L (cid:0) R (cid:1) , let ( c ( f ) j ) j ∈ V × Z := c ( f ) := C ( δ ) f ∈ ℓ (cid:0) V × Z (cid:1) . nalysis vs. Synthesis Sparsity for α -Shearlets — Felix Voigtlaender & Anne Pein Furthermore, for f ∈ E β (cid:0) R (cid:1) and N ∈ N , choose a set J ( f ) N ⊂ V × Z with | J ( f ) N | = N and such that | c ( f ) j | ≥ | c ( f ) i | for all j ∈ J ( f ) N and all i ∈ (cid:0) V × Z (cid:1) \ J ( f ) N . For a general sequence, such a set need not exist, but since we have c ( f ) ∈ ℓ , a moment’s thought shows that it does, since for each ε > , there are only finitely many indices i ∈ V × Z satisfying | c ( f ) i | ≥ ε .Finally, set f ( N ) := S ( δ ) h c ( f ) · J ( f ) N i ∈ L (cid:0) R (cid:1) and note that f ( N ) is indeed a linear combination of (at most) N elements of SH β − ( ϕ, ψ ; δ ) = (cid:0) γ [ v,k,δ ] (cid:1) v ∈ V,k ∈ Z , by definition of S ( δ ) . Moreover, note that the so-called Stechkinlemma (see e.g. [47, Lemma 3.3]) shows (cid:13)(cid:13)(cid:13) c ( f ) − J ( f ) N · c ( f ) (cid:13)(cid:13)(cid:13) ℓ ≤ N − ( p − ) · k c ( f ) k ℓ p ∀ N ∈ N and p ∈ (0 , for which k c ( f ) k ℓ p < ∞ . (6.7)It remains to verify that the f ( N ) satisfy the stated approximation rate. 
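The Stechkin estimate in equation (6.7) holds with constant $1$ and is easy to verify numerically: after sorting by magnitude, the $\ell^2$-norm of everything past the $N$ largest entries is bounded by $N^{-(1/p - 1/2)} \|c\|_{\ell^p}$. A sketch; `stechkin_check` is our own helper name and the test sequence is an arbitrary choice:

```python
import math
import random

def stechkin_check(c, p, N):
    # ||c - c_N||_{l^2} <= N^{-(1/p - 1/2)} * ||c||_{l^p}, where c_N keeps
    # the N entries of largest magnitude
    mags = sorted((abs(x) for x in c), reverse=True)
    tail_l2 = math.sqrt(sum(x * x for x in mags[N:]))
    c_lp = sum(x ** p for x in mags) ** (1 / p)
    bound = N ** (-(1 / p - 1 / 2)) * c_lp
    return tail_l2 <= bound

random.seed(0)
c = [random.gauss(0, 1) / (n + 1) ** 2 for n in range(5000)]
for p in (0.5, 0.8, 1.0):
    for N in (5, 50, 500):
        assert stechkin_check(c, p, N)
```

The proof behind the check is two lines: the $N$-th largest magnitude is at most $N^{-1/p}\|c\|_{\ell^p}$, and the tail's squared $\ell^2$-norm is bounded by this magnitude to the power $2-p$ times $\|c\|_{\ell^p}^p$.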
To show this, let σ, ν > be arbitrary.Because of p − → β as p ↓ / (1 + β ) , there is some p ∈ (2 / (1 + β ) , satisfying p − ≥ β − σ . Set s := (1 + α ) (cid:0) p − − − (cid:1) . Observe that p ≥ p = q , as well as s = 0 ≤ s = (1 + α ) (cid:0) p − − − (cid:1) ≤ (1 + α ) (cid:18) β − (cid:19) = β (cid:0) β − (cid:1) = 12 (1 + β ) = s . Now, observe | det B ( α ) v | = u αv for all v ∈ V , so that the remark after Definition 2.8 shows that the coefficientspace C p,pu s satisfies C p,pu s = ℓ p (cid:0) V × Z (cid:1) ֒ → ℓ (cid:0) V × Z (cid:1) = C , u . Therefore, Theorem 5.10 and the associated remark(and the inclusion S p,pα,s (cid:0) R (cid:1) ֒ → L (cid:0) R (cid:1) from Theorem 5.13) show that the synthesis map and the coefficient mapfrom above restrict to bounded linear operators S ( δ ) : ℓ p (cid:0) V × Z (cid:1) → S p,pα,s (cid:0) R (cid:1) and C ( δ ) : S p,pα,s (cid:0) R (cid:1) → ℓ p (cid:0) V × Z (cid:1) . Next, Proposition 6.2 shows E β (cid:0) R (cid:1) ⊂ S p,pα,s (cid:0) R (cid:1) and even yields a constant C = C ( β, ν, p ) > satisfying k f k S p,pα,s ≤ C for all f ∈ E β (cid:0) R ; ν (cid:1) . This implies k c ( f ) k ℓ p = k C ( δ ) f k ℓ p ≤ ||| C ( δ ) ||| S p,pα,s → ℓ p · k f k S p,pα,s ≤ C · ||| C ( δ ) ||| S p,pα,s → ℓ p < ∞ ∀ f ∈ E β (cid:0) R ; ν (cid:1) . (6.8)By putting everything together and recalling S ( δ ) ◦ C ( δ ) = id S , α, = id L , we finally arrive at k f − f ( N ) k L = (cid:13)(cid:13)(cid:13) S ( δ ) C ( δ ) f − S ( δ ) h J ( f ) N · c ( f ) i(cid:13)(cid:13)(cid:13) L (cid:0) since S , α, ( R )= L ( R ) with equivalent norms (cid:1) ≍ (cid:13)(cid:13)(cid:13) S ( δ ) h c ( f ) − J ( f ) N · c ( f ) i(cid:13)(cid:13)(cid:13) S , α, ≤ ||| S ( δ ) ||| C , u → S , α, · (cid:13)(cid:13)(cid:13) c ( f ) − J ( f ) N · c ( f ) (cid:13)(cid:13)(cid:13) ℓ ( eq. (6.7) ) ≤ ||| S ( δ ) ||| C , u → S , α, · k c ( f ) k ℓ p · N − ( p − )( eq. 
(6.8) ) ≤ C · ||| C ( δ ) ||| S p,pα,s → ℓ p · ||| S ( δ ) ||| C , u → S , α, · N − ( p − ) (cid:0) since p − ≥ β − σ (cid:1) ≤ C · ||| C ( δ ) ||| S p,pα,s → ℓ p · ||| S ( δ ) ||| C , u → S , α, · N − ( β − σ ) for all N ∈ N and f ∈ E β (cid:0) R ; ν (cid:1) . Since p only depends on σ, β , this easily yields the desired claim. (cid:3) We close this section by making the assumptions of Theorem 6.3 more transparent:
Remark . With the choices of α, p , q , s , s from Theorem 6.3, one can choose ε = ε ( β ) ∈ (0 , such that theconstants (cid:6) p − · (2 + ε ) (cid:7) and Λ , . . . , Λ from Theorem 4.3 satisfy Λ ≤ , as well as (cid:24) εp (cid:25) = ( , if β < , , if β = 2 , Λ ≤ ( , if β < , , if β = 2 , Λ < ( , if β < , , if β = 2 and Λ < ( , if β < , , if β = 2 . Thus, in view of Remark 5.11 (which refers to Corollary 4.4), it suffices in every case to have ϕ ∈ C c (cid:0) R (cid:1) and ψ = ψ ⊗ ψ with ψ ∈ C c ( R ) and ψ ∈ C c ( R ) and with the following additional properties:(1) b ϕ ( ξ ) = 0 for all ξ ∈ [ − , ,(2) c ψ ( ξ ) = 0 for ≤ | ξ | ≤ and c ψ ( ξ ) = 0 for all ξ ∈ [ − , ,(3) We have d ℓ d ξ ℓ (cid:12)(cid:12) ξ =0 c ψ = 0 for ≤ ℓ ≤ . In case of β < , it even suffices to have this for ≤ ℓ ≤ . (cid:7) nalysis vs. Synthesis Sparsity for α -Shearlets — Felix Voigtlaender & Anne Pein Proof.
We have $p_0^{-1} - 2^{-1} = \beta/2$ and thus $2/p_0 = 1+\beta \in (2,3)$ in case of $\beta < 2$. Hence, $(2+\varepsilon)/p_0 \in (2,3)$ for $\varepsilon = \varepsilon(\beta)$ sufficiently small. In case of $\beta = 2$, we get $(2+\varepsilon)/p_0 = 3 + \varepsilon/p_0 \in (3,4)$ for $\varepsilon > 0$ sufficiently small. This establishes the claimed identity for $N_0 := \lceil (2+\varepsilon)/p_0 \rceil$. For the remainder of the proof, we always assume that $\varepsilon$ is chosen small enough for this identity to hold.

Next, the constant $\Lambda_1$ from Theorem 4.3 satisfies, because of $\alpha = \beta^{-1}$,
$$\Lambda_1 = \varepsilon + \tfrac12 + \beta - \beta^{-1},$$
which is strictly increasing with respect to $\beta > 0$. Therefore, we always have $\Lambda_1 \le \varepsilon + \tfrac12 + 2 - 2^{-1} = 2 + \varepsilon \le 3$ for $\varepsilon \le 1$.

Furthermore, the constant $\Lambda_2$ from Theorem 4.3 is—because of $2/p_0 = 1+\beta$ and $\alpha = \beta^{-1}$—given by
$$\Lambda_2 = 2\varepsilon + 3 + \max\Big\{ \frac{(1-\beta^{-1})(1+\beta)}{2} + \frac{(1-\beta^{-1})(1+\beta)}{2} + 1 + \beta^{-1} + N_0 + \frac12(1+\beta),\ 0 \Big\} = 2\varepsilon + 3 + \frac32 + \frac32\beta + N_0 \le N_0 + \frac{15}{2} + 2\varepsilon.$$
Since $N_0 = 3$ for $\beta < 2$, this easily yields $\Lambda_2 \le 11$ for $\varepsilon \le \tfrac14$. Similarly, we get $\Lambda_2 \le 12$ for $\beta = 2$ and $\varepsilon \le \tfrac14$.

Likewise, the constant $\Lambda_3$ from Theorem 4.3 satisfies
$$\Lambda_3 = \varepsilon + \max\Big\{ 0,\ \big(1+\beta^{-1}\big)\Big(\frac{1+\beta}{2} + 1 + N_0\Big) + \frac12(1+\beta) \Big\} = \varepsilon + \frac52 + \beta + N_0 + \frac32\,\beta^{-1} + \beta^{-1} N_0.$$
Hence, in case of $\beta < 2$, we thus get $\Lambda_3 = \varepsilon + \frac{11}{2} + \beta + \frac92\,\beta^{-1} =: \varepsilon + g(\beta)$, where $g : (0,\infty) \to \mathbb{R}$ is strictly convex with $g(1) = 11$ and $g(2) = \frac{39}{4} < 11$, so that $g(\beta) < 11$ for all $\beta \in (1,2)$. Thus, $\Lambda_3 < 11$ for sufficiently small $\varepsilon = \varepsilon(\beta) > 0$. Finally, for $\beta = 2$, we get $\Lambda_3 = 11 + \tfrac14 + \varepsilon < 12$ for $0 < \varepsilon < \tfrac34$.

As the final constant, we consider $\Lambda_4$, for which one obtains
$$\Lambda_4 = \varepsilon + \max\Big\{ \frac52\beta + \frac52 + 2N_0,\ \frac32(1+\beta) + N_0 \Big\} = \varepsilon + \frac52\beta + \frac52 + 2N_0.$$
In case of $\beta = 2$, this means $\Lambda_4 = \varepsilon + 15 + \tfrac12 < 16$ for $0 < \varepsilon < \tfrac12$. Finally, for $\beta < 2$, we get $\Lambda_4 < \varepsilon + 13 + \tfrac12 < 14$ for $0 < \varepsilon < \tfrac12$. □

7. Embeddings between α-shearlet smoothness spaces

In the preceding sections, we saw that the α-shearlet smoothness spaces $S^{p,q}_{\alpha,s}(\mathbb{R}^2)$ simultaneously characterize analysis and synthesis sparsity with respect to (sufficiently nice) α-shearlet systems; see in particular Theorem 5.13. Since we have a whole family of α-shearlet systems, parametrized by $\alpha \in [0,1]$, it is natural to ask whether the different systems are related in some way, e.g. whether $\ell^p$-sparsity, $p \in (0,2)$, with respect to $\alpha_1$-shearlet systems implies $\ell^q$-sparsity with respect to $\alpha_2$-shearlet systems, for some $q \in (0,2)$.

In view of Theorem 5.13, this is equivalent to asking whether there is an embedding
$$S^{p,p}_{\alpha_1,\,(1+\alpha_1)(p^{-1}-2^{-1})}\big(\mathbb{R}^2\big) \hookrightarrow S^{q,q}_{\alpha_2,\,(1+\alpha_2)(q^{-1}-2^{-1})}\big(\mathbb{R}^2\big). \tag{7.1}$$
Note, however, that equation (7.1) is equivalent to asking whether one can deduce $\ell^q$-sparsity with respect to $\alpha_2$-shearlets from $\ell^p$-sparsity with respect to $\alpha_1$-shearlets without any additional information. If one does have additional information, e.g., if one is only interested in functions $f$ with $\operatorname{supp} f \subset \Omega$, where $\Omega \subset \mathbb{R}^2$ is fixed and bounded, then the embedding in equation (7.1) is a sufficient, but in general not a necessary criterion for guaranteeing that $f$ is $\ell^q$-sparse with respect to $\alpha_2$-shearlets if it is $\ell^p$-sparse with respect to $\alpha_1$-shearlets.

More generally than equation (7.1), we will completely characterize the existence of the embedding
$$S^{p_1,q_1}_{\alpha_1,s_1}\big(\mathbb{R}^2\big) \hookrightarrow S^{p_2,q_2}_{\alpha_2,s_2}\big(\mathbb{R}^2\big) \tag{7.2}$$
for arbitrary $p_1,p_2,q_1,q_2 \in (0,\infty]$, $\alpha_1,\alpha_2 \in [0,1]$ and $s_1,s_2 \in \mathbb{R}$. As an application, we will then see that the embedding (7.1) is never fulfilled for $p,q \in (0,2)$, but that if one replaces the left-hand side of the embedding (7.1) by $S^{p,p}_{\alpha_1,\,\varepsilon+(1+\alpha_1)(p^{-1}-2^{-1})}(\mathbb{R}^2)$ for some $\varepsilon > 0$, then the embedding holds for suitable $p,q \in (0,2)$.
Thus, without further information, $\ell^p$-sparsity with respect to $\alpha_1$-shearlets never implies nontrivial $\ell^q$-sparsity with respect to $\alpha_2$-shearlets; but one can still transfer sparsity in some sense if one has $\ell^p$-sparsity with respect to $\alpha_1$-shearlets, together with a certain decay of the $\alpha_1$-shearlet coefficients with the scale.

We remark that the results in this section can be seen as a continuation of the work in [37]: In that paper, the authors develop the framework of α-molecules, which allows one to transfer (analysis) sparsity results between different systems that employ α-parabolic scaling, for example between α-shearlets and α-curvelets. Before [37, Theorem 4.2], the authors note that "it might though be very interesting for future research to also let α-molecules for different α's interact." In a way, this is precisely what we are doing in this section, although we focus on the special case of α-shearlets instead of (more general) α-molecules.

In order to characterize the embedding (7.2), we will invoke the embedding theory for decomposition spaces [60] that was developed by one of the authors; this will greatly simplify the proof, since we do not need to start from scratch. In order for the theory in [60] to be applicable to an embedding $\mathcal{D}(\mathcal{Q}, L^{p_1}, \ell^{q_1}_w) \hookrightarrow \mathcal{D}(\mathcal{P}, L^{p_2}, \ell^{q_2}_v)$, the two coverings $\mathcal{Q} = (Q_i)_{i\in I}$ and $\mathcal{P} = (P_j)_{j\in J}$ need to be compatible in a certain sense. For this, it suffices if $\mathcal{Q}$ is almost subordinate to $\mathcal{P}$ (or vice versa); roughly speaking, this means that the covering $\mathcal{Q}$ is finer than $\mathcal{P}$. Precisely, it means that each set $Q_i$ is contained in $P^{n*}_{j_i}$ for some $j_i \in J$, where $n \in \mathbb{N}$ is fixed and where $P^{n*}_{j_i} = \bigcup_{\ell \in j_i^{n*}} P_\ell$. Here, the sets $j^{n*}$ are defined inductively, via $L^* := \bigcup_{\ell\in L} \ell^*$ (with $\ell^*$ as in Definition 2.1) and with $L^{(n+1)*} := (L^{n*})^*$ for $L \subset J$. The following lemma establishes this compatibility between different α-shearlet coverings.

Lemma 7.1.
Let $0 \le \alpha_2 \le \alpha_1 \le 1$. Then $\mathcal{S}^{(\alpha_2)} = \big(S^{(\alpha_2)}_i\big)_{i\in I^{(\alpha_2)}}$ is almost subordinate to $\mathcal{S}^{(\alpha_1)} = \big(S^{(\alpha_1)}_j\big)_{j\in I^{(\alpha_1)}}$. ◭

Proof.
Since we have $\bigcup_{i\in I^{(\alpha_2)}} S^{(\alpha_2)}_i = \mathbb{R}^2 = \bigcup_{j\in I^{(\alpha_1)}} S^{(\alpha_1)}_j$ and since all of the sets $S^{(\alpha_2)}_i$ and $S^{(\alpha_1)}_j$ are open and path-connected, [60, Corollary 2.13] shows that it suffices to show that $\mathcal{S}^{(\alpha_2)}$ is weakly subordinate to $\mathcal{S}^{(\alpha_1)}$. This means that we have $\sup_{i\in I^{(\alpha_2)}} |L_i| < \infty$, with
$$L_i := \big\{ j \in I^{(\alpha_1)} \,\big|\, S^{(\alpha_1)}_j \cap S^{(\alpha_2)}_i \neq \emptyset \big\} \qquad\text{for } i \in I^{(\alpha_2)}.$$
To show this, we first consider only the case $i = (n,m,\varepsilon,0) \in I^{(\alpha_2)}_0$ and let $j \in L_i$ be arbitrary. We now distinguish several cases regarding $j$:

Case 1: We have $j = (k,\ell,\beta,0) \in I^{(\alpha_1)}_0$. Let $(\xi,\eta) \in S^{(\alpha_2)}_i \cap S^{(\alpha_1)}_j$. In view of equation (3.3), this implies $\xi \in \varepsilon\cdot(2^n/3,\, 3\cdot2^n) \cap \beta\cdot(2^k/3,\, 3\cdot2^k)$, so that in particular $\varepsilon = \beta$. Furthermore, we see $2^k/3 < |\xi| < 3\cdot2^n$, which yields $2^{n-k} > \tfrac19 > 2^{-4}$ and thus $n-k \ge -3$. Analogously, we get $2^n/3 < |\xi| < 3\cdot2^k$ and thus $n-k \le 3$. Together, these considerations imply $|n-k| \le 3$.

Furthermore, since $(\xi,\eta) \in S^{(\alpha_2)}_i \cap S^{(\alpha_1)}_j$, equation (3.3) also shows
$$\frac{\eta}{\xi} = \frac{\varepsilon\eta}{\varepsilon\xi} = \frac{\beta\eta}{\beta\xi} \in 2^{n(\alpha_2-1)}\cdot(m-1,\, m+1) \cap 2^{k(\alpha_1-1)}\cdot(\ell-1,\, \ell+1).$$
Hence, we get the two inequalities $2^{n(\alpha_2-1)}(m+1) > 2^{k(\alpha_1-1)}(\ell-1)$ and $2^{n(\alpha_2-1)}(m-1) < 2^{k(\alpha_1-1)}(\ell+1)$ and thus
$$\ell < (m+1)\cdot 2^{n\alpha_2-k\alpha_1+k-n} + 1 \qquad\text{and}\qquad \ell > (m-1)\cdot 2^{n\alpha_2-k\alpha_1+k-n} - 1.$$
In other words, $\ell \in \big( (m-1)\,2^{n\alpha_2-k\alpha_1+k-n} - 1,\ (m+1)\,2^{n\alpha_2-k\alpha_1+k-n} + 1 \big) \cap \mathbb{Z} =: s^{(n,k)}_m$. But since any interval $I = (A,B)$ with $A \le B$ satisfies $|I\cap\mathbb{Z}| \le B - A + 1$, the cardinality of $s^{(n,k)}_m$ can be estimated by
$$\big|s^{(n,k)}_m\big| \le 3 + 2\cdot 2^{n\alpha_2-k\alpha_1+k-n} \overset{(\text{since } |n-k|\le3)}{\le} 3 + 2\cdot 2^{n\alpha_2-(n-3)\alpha_1+3} = 3 + 2\cdot 2^{n(\alpha_2-\alpha_1)}\cdot 2^{3\alpha_1+3} \overset{(\text{since } \alpha_2-\alpha_1\le0 \text{ and } \alpha_1\le1)}{\le} 3 + 2\cdot 2^6 = 131.$$
Thus, $L^{(0)}_i := \big\{ j = (k,\ell,\beta,0) \in I^{(\alpha_1)}_0 \,\big|\, S^{(\alpha_1)}_j \cap S^{(\alpha_2)}_i \neq \emptyset \big\} \subset \bigcup_{t=n-3}^{n+3} \big( \{t\} \times s^{(n,t)}_m \times \{\pm1\} \times \{0\} \big)$, which is a finite set, with at most $7\cdot131\cdot2$ elements.

Case 2: We have $j = (k,\ell,\beta,1) \in I^{(\alpha_1)}_0$. Let $(\xi,\eta) \in S^{(\alpha_2)}_i \cap S^{(\alpha_1)}_j$. With similar arguments as in the previous case, this implies $\xi \in \varepsilon\cdot(2^n/3,\, 3\cdot2^n)$, $\eta \in \beta\cdot(2^k/3,\, 3\cdot2^k)$ and $\tfrac{\eta}{\xi} \in 2^{n(\alpha_2-1)}(m-1,\,m+1)$, as well as $\tfrac{\xi}{\eta} \in 2^{k(\alpha_1-1)}(\ell-1,\,\ell+1)$. Furthermore, since $(\xi,\eta) \in S^{(\alpha_2)}_{n,m,\varepsilon,0}$ and $(\xi,\eta) \in S^{(\alpha_1)}_{k,\ell,\beta,1}$, we know from Lemma 3.2 that $|\eta| < 2|\xi|$ and $|\xi| < 2|\eta|$. Thus, $2^k/3 < |\eta| < 2|\xi| < 2\cdot3\cdot2^n$ and hence $2^{k-n} < 18 < 2^5$. Likewise, $2^n/3 < |\xi| < 2|\eta| < 2\cdot3\cdot2^k$ and hence $2^{n-k} < 18 < 2^5$, so that we get $|n-k| \le 4$. Now, we distinguish two subcases regarding $|\eta/\xi|$:

(1) We have $|\eta/\xi| > 1$. Because of $|m| \le \lceil 2^{n(1-\alpha_2)} \rceil \le 2^{n(1-\alpha_2)} + 1$, this implies
$$1 < \Big|\frac{\eta}{\xi}\Big| < 2^{n(\alpha_2-1)}\,(|m|+1) \le 2^{n(\alpha_2-1)}\big(2^{n(1-\alpha_2)} + 1 + 1\big) = 1 + 2\cdot2^{n(\alpha_2-1)}$$
and hence $\frac{1}{1+2\cdot2^{n(\alpha_2-1)}} < \big|\tfrac{\xi}{\eta}\big| < 1$. Furthermore, we know $|\xi/\eta| < 2^{k(\alpha_1-1)}(|\ell|+1)$, so that we get
$$\frac{1}{1+2\cdot2^{n(\alpha_2-1)}} < 2^{k(\alpha_1-1)}\,(|\ell|+1) \qquad\text{and hence}\qquad |\ell| > \frac{2^{k(1-\alpha_1)}}{1+2\cdot2^{n(\alpha_2-1)}} - 1.$$
Thus, we have
$$|\ell| \in \mathbb{Z} \cap \Big( \frac{2^{k(1-\alpha_1)}}{1+2\cdot2^{n(\alpha_2-1)}} - 1,\ \big\lceil 2^{k(1-\alpha_1)} \big\rceil \Big] \subset \mathbb{Z} \cap \Big( \frac{2^{k(1-\alpha_1)}}{1+2\cdot2^{n(\alpha_2-1)}} - 1,\ 2^{k(1-\alpha_1)} + 1 \Big) =: s^{(n,k)}_1,$$
where as above
$$\big|s^{(n,k)}_1\big| \le 3 + 2^{k(1-\alpha_1)}\cdot\frac{2\cdot2^{n(\alpha_2-1)}}{1+2\cdot2^{n(\alpha_2-1)}} \le 3 + 2\cdot2^{k(1-\alpha_1)-n(1-\alpha_2)} \overset{(\text{since } 1-\alpha_1\ge0 \text{ and } |n-k|\le4)}{\le} 3 + 2\cdot2^{(n+4)(1-\alpha_1)-n(1-\alpha_2)} = 3 + 2\cdot2^{4(1-\alpha_1)}\cdot2^{n(\alpha_2-\alpha_1)} \overset{(\text{since } \alpha_2-\alpha_1\le0 \text{ and } \alpha_1\ge0)}{\le} 3 + 2\cdot2^4 = 35.$$
Finally, note that $|\ell| \in s^{(n,k)}_1$ implies $\ell \in \pm s^{(n,k)}_1$, with $\big|\pm s^{(n,k)}_1\big| \le 70$.

(2) We have $|\eta/\xi| \le 1$. This yields $1 \le |\xi/\eta| < 2^{k(\alpha_1-1)}(|\ell|+1)$ and hence $|\ell| > 2^{k(1-\alpha_1)} - 1$. Thus, we have $|\ell| \in \mathbb{Z} \cap \big( 2^{k(1-\alpha_1)}-1,\ \lceil 2^{k(1-\alpha_1)}\rceil \big] \subset \mathbb{Z} \cap \big( 2^{k(1-\alpha_1)}-1,\ 2^{k(1-\alpha_1)}+1 \big) =: \tilde{s}^{(n,k)}$, where one easily sees $|\tilde{s}^{(n,k)}| \le 3$ and then $\ell \in \pm\tilde{s}^{(n,k)}$ with $|\pm\tilde{s}^{(n,k)}| \le 6$.

All in all, we see
$$L^{(1)}_i := \big\{ j = (k,\ell,\beta,1) \in I^{(\alpha_1)}_0 \,\big|\, S^{(\alpha_1)}_j \cap S^{(\alpha_2)}_i \neq \emptyset \big\} \subset \bigcup_{t=n-4}^{n+4} \Big[ \{t\} \times \big( [\pm s^{(n,t)}_1] \cup [\pm\tilde{s}^{(n,t)}] \big) \times \{\pm1\} \times \{1\} \Big]$$
and hence $|L^{(1)}_i| \le 9\cdot(70+6)\cdot2$.

In total, Cases 1 and 2 show, because of $L_i \subset L^{(0)}_i \cup L^{(1)}_i \cup \{0\}$, that $|L_i| \le |L^{(0)}_i| + |L^{(1)}_i| + |\{0\}| \le 7\cdot131\cdot2 + 9\cdot76\cdot2 + 1 = 3203$ for all $i = (n,m,\varepsilon,0) \in I^{(\alpha_2)}_0$.

But in case of $i = (n,m,\varepsilon,1) \in I^{(\alpha_2)}_0$, we get the same result. Indeed, if we set $\tilde\gamma := 1-\gamma$ for $\gamma \in \{0,1\}$, then
$$I^{(\alpha_1)}_0 \cap L_{(n,m,\varepsilon,1)} = \big\{ (k,\ell,\beta,\gamma) \in I^{(\alpha_1)}_0 \,\big|\, S^{(\alpha_1)}_{k,\ell,\beta,\gamma} \cap S^{(\alpha_2)}_{n,m,\varepsilon,1} \neq \emptyset \big\} = \big\{ (k,\ell,\beta,\gamma) \in I^{(\alpha_1)}_0 \,\big|\, R\,S^{(\alpha_1)}_{k,\ell,\beta,\tilde\gamma} \cap R\,S^{(\alpha_2)}_{n,m,\varepsilon,0} \neq \emptyset \big\} = \big\{ (k,\ell,\beta,\tilde\gamma) \,\big|\, (k,\ell,\beta,\gamma) \in I^{(\alpha_1)}_0 \cap L_{(n,m,\varepsilon,0)} \big\},$$
and thus $\big| I^{(\alpha_1)}_0 \cap L_{(n,m,\varepsilon,1)} \big| = \big| I^{(\alpha_1)}_0 \cap L_{(n,m,\varepsilon,0)} \big| \le 3202$, so that $\big| L_{(n,m,\varepsilon,1)} \big| \le 3203$.

It remains to consider the case $i = 0$. But for $\xi \in S^{(\alpha_2)}_0 = (-1,1)^2$, we have $|\xi| \le \sqrt2$. Conversely, Lemma 3.4 shows $|\xi| \ge \tfrac17\cdot w_j = \tfrac17\cdot2^k$ for all $\xi \in S^{(\alpha_1)}_j$ and all $j = (k,\ell,\beta,\gamma) \in I^{(\alpha_1)}_0$. Hence, $j \in L_0$ can only hold if $2^k/7 \le \sqrt2$, i.e., if $k \le 3$. Since we also have $|\ell| \le \lceil 2^{k(1-\alpha_1)}\rceil \le 2^k \le 2^3 = 8$, this implies
$$L_0 \subset \{0\} \cup \big[ \{0,1,2,3\} \times \{-8,\dots,8\} \times \{\pm1\} \times \{0,1\} \big] \qquad\text{and hence}\qquad |L_0| \le 1 + 4\cdot17\cdot2\cdot2 \le 273.$$
In total, we have shown $\sup_{i\in I^{(\alpha_2)}} |L_i| \le 3203 < \infty$, so that $\mathcal{S}^{(\alpha_2)}$ is weakly subordinate to $\mathcal{S}^{(\alpha_1)}$. As seen at the beginning of the proof, this suffices. □

Now that we have seen that $\mathcal{S}^{(\alpha_2)}$ is almost subordinate to $\mathcal{S}^{(\alpha_1)}$ for $\alpha_2 \le \alpha_1$, the theory from [60] is applicable. But the resulting conditions simplify greatly if, in addition to the coverings, also the employed weights are compatible in a certain sense. Precisely, for two coverings $\mathcal{Q} = (Q_i)_{i\in I}$ and $\mathcal{P} = (P_j)_{j\in J}$ and for a weight $w = (w_i)_{i\in I}$ on the index set of $\mathcal{Q}$, we say that $w$ is relatively $\mathcal{P}$-moderate if there is a constant $C > 0$ with
$$w_i \le C\cdot w_\ell \qquad\text{for all } i,\ell \in I \text{ with } Q_i \cap P_j \neq \emptyset \neq Q_\ell \cap P_j \text{ for some } j \in J.$$
Likewise, the covering $\mathcal{Q} = (T_i Q_i' + b_i)_{i\in I}$ is called relatively $\mathcal{P}$-moderate if the weight $(|\det T_i|)_{i\in I}$ is relatively $\mathcal{P}$-moderate. Our next lemma shows that these two conditions are satisfied if $\mathcal{Q}$ and $\mathcal{P}$ are two α-shearlet coverings.

Lemma 7.2.
Let $0 \le \alpha_2 \le \alpha_1 \le 1$ and let $\mathcal{S}^{(\alpha_1)}$ and $\mathcal{S}^{(\alpha_2)}$ be the associated α-shearlet coverings. Then the following hold:

(1) $\mathcal{S}^{(\alpha_2)}$ is relatively $\mathcal{S}^{(\alpha_1)}$-moderate.

(2) For arbitrary $s \in \mathbb{R}$, the weight $w^s = (w_i^s)_{i\in I^{(\alpha_2)}}$ with $w = (w_i)_{i\in I^{(\alpha_2)}}$ as in Definition 3.1 (considered as a weight for $\mathcal{S}^{(\alpha_2)}$) is relatively $\mathcal{S}^{(\alpha_1)}$-moderate. More precisely, we have $49^{-|s|}\cdot w_j^s \le w_i^s \le 49^{|s|}\cdot w_j^s$ for all $i \in I^{(\alpha_2)}$ and $j \in I^{(\alpha_1)}$ with $S^{(\alpha_2)}_i \cap S^{(\alpha_1)}_j \neq \emptyset$. ◭

Proof.
It is not hard to see $|\det T^{(\alpha)}_i| = w_i^{1+\alpha}$ for all $i \in I^{(\alpha)}$. Thus, the second claim implies the first one. To prove the second one, let $i \in I^{(\alpha_2)}$ and $j \in I^{(\alpha_1)}$ with $S^{(\alpha_2)}_i \cap S^{(\alpha_1)}_j \neq \emptyset$. Thus, there is some $\xi \in S^{(\alpha_2)}_i \cap S^{(\alpha_1)}_j$. In view of Lemma 3.4, this implies
$$\tfrac17\, w_j \le 1 + |\xi| \le 7\, w_j \qquad\text{and}\qquad \tfrac17\, w_i \le 1 + |\xi| \le 7\, w_i,$$
so that $w_i \le 49\, w_j$ and $w_j \le 49\, w_i$, from which it easily follows that $49^{-|s|}\cdot w_j^s \le w_i^s \le 49^{|s|}\cdot w_j^s$. This establishes the second part of the second claim of the lemma.

But this easily implies that the weight $w^s$ is relatively $\mathcal{S}^{(\alpha_1)}$-moderate: Indeed, let $i,\ell \in I^{(\alpha_2)}$ be arbitrary with $S^{(\alpha_2)}_i \cap S^{(\alpha_1)}_j \neq \emptyset \neq S^{(\alpha_2)}_\ell \cap S^{(\alpha_1)}_j$ for some $j \in I^{(\alpha_1)}$. This implies $w_i^s \le 49^{|s|}\cdot w_j^s \le \big(49^2\big)^{|s|}\cdot w_\ell^s$, as desired. □

Now that we have established the strong compatibility between the α-shearlet coverings $\mathcal{S}^{(\alpha_1)}$ and $\mathcal{S}^{(\alpha_2)}$ and of the associated weights, we can easily characterize the existence of embeddings between the α-shearlet smoothness spaces.

Theorem 7.3.
Let $\alpha_1, \alpha_2 \in [0,1]$ with $\alpha_2 \le \alpha_1$. For $s, r \in \mathbb{R}$ and $p_1,p_2,q_1,q_2 \in (0,\infty]$, the map
$$S^{p_1,q_1}_{\alpha_1,r}\big(\mathbb{R}^2\big) \to S^{p_2,q_2}_{\alpha_2,s}\big(\mathbb{R}^2\big), \qquad f \mapsto f$$
is well-defined and bounded if and only if we have $p_1 \le p_2$ and
$$\begin{cases} r > s + (1+\alpha_2)\big(\tfrac1{p_1}-\tfrac1{p_2}\big) + (\alpha_1-\alpha_2)\big(\tfrac1{q_2}-\tfrac1{p_2^{\pm\triangle}}\big)_+ + (1-\alpha_2)\big(\tfrac1{q_2}-\tfrac1{q_1}\big), & \text{if } q_2 < q_1,\\[0.3em] r \ge s + (1+\alpha_2)\big(\tfrac1{p_1}-\tfrac1{p_2}\big) + (\alpha_1-\alpha_2)\big(\tfrac1{q_2}-\tfrac1{p_2^{\pm\triangle}}\big)_+, & \text{if } q_2 \ge q_1. \end{cases}$$
Likewise, the map
$$S^{p_1,q_1}_{\alpha_2,s}\big(\mathbb{R}^2\big) \to S^{p_2,q_2}_{\alpha_1,r}\big(\mathbb{R}^2\big), \qquad f \mapsto f$$
is well-defined and bounded if and only if we have $p_1 \le p_2$ and
$$\begin{cases} s > r + (1+\alpha_2)\big(\tfrac1{p_1}-\tfrac1{p_2}\big) + (\alpha_1-\alpha_2)\big(\tfrac1{p_2^{\triangledown}}-\tfrac1{q_1}\big)_+ + (1-\alpha_2)\big(\tfrac1{q_2}-\tfrac1{q_1}\big), & \text{if } q_2 < q_1,\\[0.3em] s \ge r + (1+\alpha_2)\big(\tfrac1{p_1}-\tfrac1{p_2}\big) + (\alpha_1-\alpha_2)\big(\tfrac1{p_2^{\triangledown}}-\tfrac1{q_1}\big)_+, & \text{if } q_2 \ge q_1. \end{cases}$$
Here, we used the notations $p^{\triangledown} := \min\{p, p'\}$, and $p^{\pm\triangle}$ defined by $\tfrac{1}{p^{\pm\triangle}} := \min\big\{\tfrac1p,\ 1-\tfrac1p\big\}$, where the conjugate exponent $p'$ is defined as usual for $p \in [1,\infty]$ and as $p' := \infty$ for $p \in (0,1)$. ◭

Proof.
For the first part, we want to invoke part (4) of [60, Theorem 7.2], with $\mathcal{Q} = \mathcal{S}^{(\alpha_1)} = (T^{(\alpha_1)}_i Q_i')_{i\in I^{(\alpha_1)}}$ and $\mathcal{P} = \mathcal{S}^{(\alpha_2)} = (T^{(\alpha_2)}_i Q_i')_{i\in I^{(\alpha_2)}}$ and with $w = (w_i^r)_{i\in I^{(\alpha_1)}}$ and $v = (w_i^s)_{i\in I^{(\alpha_2)}}$. To this end, we first have to verify that $\mathcal{Q}, \mathcal{P}, w, v$ satisfy [60, Assumption 7.1]. But we saw in Lemma 3.4 that $w$ and $v$ are $\mathcal{Q}$-moderate and $\mathcal{P}$-moderate, respectively. Furthermore, $\mathcal{Q}, \mathcal{P}$ are almost structured coverings (cf. Lemma 3.3) and thus also semi-structured coverings (cf. [60, Definition 2.5]) of $\mathcal{O} = \mathcal{O}' = \mathbb{R}^2$. Furthermore, since $\{Q_i' \mid i \in I^{(\alpha)}\}$ is a finite family of nonempty open sets (for arbitrary $\alpha \in [0,1]$), it is not hard to see that $\mathcal{S}^{(\alpha)}$ is an open covering of $\mathbb{R}^2$ and that there is some $\varepsilon > 0$ and for each $i \in I^{(\alpha)}$ some $\eta_i \in \mathbb{R}^2$ with $B_\varepsilon(\eta_i) \subset Q_i'$. Thus, $\mathcal{S}^{(\alpha)}$ is a tight, open semi-structured covering of $\mathbb{R}^2$ for all $\alpha \in [0,1]$. Hence, so are $\mathcal{Q}, \mathcal{P}$. Finally, [61, Corollary 2.7] shows that if $\Phi = (\varphi_i)_{i\in I^{(\alpha_1)}}$ and $\Psi = (\psi_j)_{j\in I^{(\alpha_2)}}$ are regular partitions of unity for $\mathcal{Q}, \mathcal{P}$, respectively, then $\Phi, \Psi$ are $L^p$-BAPUs (cf. [60, Definitions 3.5 and 3.6]) for $\mathcal{Q}, \mathcal{P}$, simultaneously for all $p \in (0,\infty]$. Hence, all assumptions of [60, Assumption 7.1] are satisfied.

Next, Lemma 7.1 shows that $\mathcal{P} = \mathcal{S}^{(\alpha_2)}$ is almost subordinate to $\mathcal{Q} = \mathcal{S}^{(\alpha_1)}$, and Lemma 7.2 shows that $\mathcal{P}$ and $v$ are relatively $\mathcal{Q}$-moderate, so that all assumptions of [60, Theorem 7.2, part (4)] are satisfied.

Now, let us choose, for each $j \in I^{(\alpha_2)}$, an arbitrary index $i_j \in I^{(\alpha_1)}$ with $S^{(\alpha_1)}_{i_j} \cap S^{(\alpha_2)}_j \neq \emptyset$. Then [60, Theorem 7.2, part (4)] shows that the embedding $S^{p_1,q_1}_{\alpha_1,r}(\mathbb{R}^2) \hookrightarrow S^{p_2,q_2}_{\alpha_2,s}(\mathbb{R}^2)$ holds if and only if we have $p_1 \le p_2$ and if, furthermore, the following expression (then a constant) is finite:
$$K := \Bigg\| \Bigg( \frac{w_j^s}{w_{i_j}^r} \cdot \big|\det T^{(\alpha_1)}_{i_j}\big|^{\big(\frac1{q_2}-\frac1{p_2^{\pm\triangle}}\big)_+} \cdot \big|\det T^{(\alpha_2)}_{j}\big|^{\frac1{p_1}-\frac1{p_2}-\big(\frac1{q_2}-\frac1{p_2^{\pm\triangle}}\big)_+} \Bigg)_{j\in I^{(\alpha_2)}} \Bigg\|_{\ell^{q_2\cdot(q_1/q_2)'}}.$$
In view of Lemma 7.2, this satisfies
$$K \overset{\text{Lemma 7.2}}{\asymp} \Bigg\| \Big( 2^{k(s-r)}\cdot 2^{k(1+\alpha_1)\big(\frac1{q_2}-\frac1{p_2^{\pm\triangle}}\big)_+}\cdot 2^{k(1+\alpha_2)\big[\frac1{p_1}-\frac1{p_2}-\big(\frac1{q_2}-\frac1{p_2^{\pm\triangle}}\big)_+\big]} \Big)_{(k,\ell,\beta,\gamma)\in I^{(\alpha_2)}_0} \Bigg\|_{\ell^{q_2\cdot(q_1/q_2)'}} = \Bigg\| \Big( 2^{k\big(s-r+(\alpha_1-\alpha_2)\big(\frac1{q_2}-\frac1{p_2^{\pm\triangle}}\big)_+ + (1+\alpha_2)\big(\frac1{p_1}-\frac1{p_2}\big)\big)} \Big)_{(k,\ell,\beta,\gamma)\in I^{(\alpha_2)}_0} \Bigg\|_{\ell^{q_2\cdot(q_1/q_2)'}}.$$
Note that we only took the norm of the sequence with $j \in I^{(\alpha_2)}_0$, omitting the term for $j = 0$, in contrast to the definition of $K$ in [60, Theorem 7.2]. This is justified, since we are only interested in finiteness of the norm, for which the single (finite(!)) term for $j = 0$ is irrelevant.

Now, we distinguish two different cases regarding $q_1$ and $q_2$:

Case 1: We have $q_2 < q_1$. This implies $\varrho := q_2\cdot(q_1/q_2)' < \infty$, cf. [60, Equation (4.3)]. For brevity, let us define $\Theta := s - r + (\alpha_1-\alpha_2)\big(\tfrac1{q_2}-\tfrac1{p_2^{\pm\triangle}}\big)_+ + (1+\alpha_2)\big(\tfrac1{p_1}-\tfrac1{p_2}\big)$. Then, we get
$$K^\varrho \asymp \big\| \big(2^{k\Theta}\big)_{(k,\ell,\beta,\gamma)\in I^{(\alpha_2)}_0} \big\|_{\ell^\varrho}^\varrho = \sum_{(k,\ell,\beta,\gamma)\in I^{(\alpha_2)}_0} 2^{k\varrho\Theta} = \sum_{k=0}^\infty 2^{k\varrho\Theta} \sum_{|\ell|\le\lceil 2^{k(1-\alpha_2)}\rceil}\ \sum_{\beta\in\{\pm1\}}\ \sum_{\gamma\in\{0,1\}} 1 = 4\sum_{k=0}^\infty 2^{k\varrho\Theta}\big(1+2\lceil 2^{k(1-\alpha_2)}\rceil\big) \asymp \sum_{k=0}^\infty 2^{k(\varrho\Theta+1-\alpha_2)}.$$
Now, note from the remark to [60, Lemma 4.8] that $\frac{1}{p\cdot(q/p)'} = \big(\tfrac1p - \tfrac1q\big)_+$ for arbitrary $p,q \in (0,\infty]$. Hence, in the present case, we have $\varrho^{-1} = \big(q_2^{-1}-q_1^{-1}\big)_+ = q_2^{-1}-q_1^{-1}$. Therefore, we see that the last sum from above—and therefore $K$—is finite if and only if $\varrho\Theta + 1 - \alpha_2 < 0$. But this is equivalent to
$$s - r + (\alpha_1-\alpha_2)\Big(\tfrac1{q_2}-\tfrac1{p_2^{\pm\triangle}}\Big)_+ + (1+\alpha_2)\Big(\tfrac1{p_1}-\tfrac1{p_2}\Big) = \Theta \overset{!}{<} (\alpha_2-1)\cdot\big(q_2^{-1}-q_1^{-1}\big),$$
from which it easily follows that the claimed equivalence from the first part of the theorem holds in case of $q_2 < q_1$.

Case 2: We have $q_2 \ge q_1$. This implies $q_2\cdot(q_1/q_2)' = \infty$, cf. [60, Equation (4.3)]. Thus, with $\Theta$ as in the previous case, we have $K \asymp \sup_{(k,\ell,\beta,\gamma)\in I^{(\alpha_2)}_0} 2^{k\Theta}$, so that $K$ is finite if and only if $\Theta \le 0$, which is equivalent to
$$r \ge s + (\alpha_1-\alpha_2)\Big(\tfrac1{q_2}-\tfrac1{p_2^{\pm\triangle}}\Big)_+ + (1+\alpha_2)\Big(\tfrac1{p_1}-\tfrac1{p_2}\Big).$$
As in the previous case, this shows for $q_2 \ge q_1$ that the claimed equivalence from the first part of the theorem holds.

For the second part of the theorem, we make use of part (4) of [60, Theorem 7.4], with $\mathcal{Q} = \mathcal{S}^{(\alpha_1)} = (T^{(\alpha_1)}_i Q_i')_{i\in I^{(\alpha_1)}}$ and $\mathcal{P} = \mathcal{S}^{(\alpha_2)} = (T^{(\alpha_2)}_i Q_i')_{i\in I^{(\alpha_2)}}$ and with $w = (w_i^s)_{i\in I^{(\alpha_2)}}$ and $v = (w_i^r)_{i\in I^{(\alpha_1)}}$. As above, one sees that the corresponding assumptions are fulfilled. Thus, [60, Theorem 7.4, part (4)] shows that the embedding $S^{p_1,q_1}_{\alpha_2,s}(\mathbb{R}^2) \hookrightarrow S^{p_2,q_2}_{\alpha_1,r}(\mathbb{R}^2)$ holds if and only if we have $p_1 \le p_2$ and if furthermore the following expression (then a constant) is finite:
$$C := \Bigg\| \Bigg( \frac{w_{i_j}^r}{w_j^s} \cdot \big|\det T^{(\alpha_1)}_{i_j}\big|^{\big(\frac1{p_2^{\triangledown}}-\frac1{q_1}\big)_+} \cdot \big|\det T^{(\alpha_2)}_{j}\big|^{\frac1{p_1}-\frac1{p_2}-\big(\frac1{p_2^{\triangledown}}-\frac1{q_1}\big)_+} \Bigg)_{j\in I^{(\alpha_2)}} \Bigg\|_{\ell^{q_2\cdot(q_1/q_2)'}},$$
where for each $j \in I^{(\alpha_2)}$ an arbitrary index $i_j \in I^{(\alpha_1)}$ with $S^{(\alpha_1)}_{i_j} \cap S^{(\alpha_2)}_j \neq \emptyset$ is chosen. But in view of Lemma 7.2, it is not hard to see that $C$ satisfies
$$C \asymp \Bigg\| \Big( 2^{k\big(r-s+(\alpha_1-\alpha_2)\big(\frac1{p_2^{\triangledown}}-\frac1{q_1}\big)_+ + (1+\alpha_2)\big(\frac1{p_1}-\frac1{p_2}\big)\big)} \Big)_{(k,\ell,\beta,\gamma)\in I^{(\alpha_2)}_0} \Bigg\|_{\ell^{q_2\cdot(q_1/q_2)'}}.$$
As above, we distinguish two cases regarding $q_1$ and $q_2$:

Case 1: We have $q_2 < q_1$, so that $\varrho := q_2\cdot(q_1/q_2)' < \infty$. Setting $\Gamma := (1+\alpha_2)\big(\tfrac1{p_1}-\tfrac1{p_2}\big) + (\alpha_1-\alpha_2)\big(\tfrac1{p_2^{\triangledown}}-\tfrac1{q_1}\big)_+ - s + r$, we have
$$C^\varrho \asymp \big\| \big(2^{k\Gamma}\big)_{(k,\ell,\beta,\gamma)\in I^{(\alpha_2)}_0} \big\|_{\ell^\varrho}^\varrho = \sum_{(k,\ell,\beta,\gamma)\in I^{(\alpha_2)}_0} 2^{k\varrho\Gamma} = \sum_{k=0}^\infty 2^{k\varrho\Gamma} \sum_{|\ell|\le\lceil 2^{k(1-\alpha_2)}\rceil}\ \sum_{\beta\in\{\pm1\}}\ \sum_{\gamma\in\{0,1\}} 1 \asymp \sum_{k=0}^\infty 2^{k(\varrho\Gamma+1-\alpha_2)}.$$
As above, we have $\varrho^{-1} = \big(q_2^{-1}-q_1^{-1}\big)_+ = q_2^{-1}-q_1^{-1}$, and we see that the last sum—and thus $C$—is finite if and only if we have $\varrho\Gamma + 1 - \alpha_2 < 0$, which is equivalent to
$$(1+\alpha_2)\Big(\tfrac1{p_1}-\tfrac1{p_2}\Big) + (\alpha_1-\alpha_2)\Big(\tfrac1{p_2^{\triangledown}}-\tfrac1{q_1}\Big)_+ - s + r = \Gamma \overset{!}{<} (\alpha_2-1)\cdot\big(q_2^{-1}-q_1^{-1}\big).$$
Based on this, it is not hard to see that the equivalence stated in the second part of the theorem is valid for $q_2 < q_1$.

Case 2: We have $q_2 \ge q_1$, so that $q_2\cdot(q_1/q_2)' = \infty$. In this case, we have—with $\Gamma$ as above—that $C \asymp \sup_{(k,\ell,\beta,\gamma)\in I^{(\alpha_2)}_0} 2^{k\Gamma}$, which is finite if and only if $\Gamma \le 0$, which is equivalent to
$$s \ge r + (1+\alpha_2)\Big(\tfrac1{p_1}-\tfrac1{p_2}\Big) + (\alpha_1-\alpha_2)\Big(\tfrac1{p_2^{\triangledown}}-\tfrac1{q_1}\Big)_+.$$
This easily shows that the claimed equivalence from the second part of the theorem also holds for $q_2 \ge q_1$. □

With Theorem 7.3, we have established the characterization of the general embedding from equation (7.2). Our main application, however, was to determine under which conditions $\ell^p$-sparsity of $f$ with respect to $\alpha_1$-shearlet systems implies $\ell^q$-sparsity of $f$ with respect to $\alpha_2$-shearlet systems, if one has no additional information. As discussed around equation (7.1), this amounts to an embedding $S^{p,p}_{\alpha_1,(1+\alpha_1)(p^{-1}-2^{-1})}(\mathbb{R}^2) \hookrightarrow S^{q,q}_{\alpha_2,(1+\alpha_2)(q^{-1}-2^{-1})}(\mathbb{R}^2)$. Since we are only interested in nontrivial sparsity, and since arbitrary $L^2$ functions have α-shearlet coefficients in $\ell^2$, the only interesting case is $p,q \le 2$. This setting is considered in our next lemma:

Lemma 7.4.
Let $\alpha_1, \alpha_2 \in [0,1]$ with $\alpha_1 \neq \alpha_2$, let $p,q \in (0,2]$ and let $\varepsilon \in [0,\infty)$. The embedding
$$S^{p,p}_{\alpha_1,\,\varepsilon+(1+\alpha_1)(p^{-1}-2^{-1})}\big(\mathbb{R}^2\big) \hookrightarrow S^{q,q}_{\alpha_2,\,(1+\alpha_2)(q^{-1}-2^{-1})}\big(\mathbb{R}^2\big)$$
holds if and only if we have $p \le q$ and $q \ge \big(\tfrac12 + \tfrac{\varepsilon}{|\alpha_1-\alpha_2|}\big)^{-1}$. ◭

Remark.
The case $\varepsilon = 0$ corresponds to the embedding which is considered in equation (7.1). Here, the preceding lemma shows that the embedding can only hold if $q \ge 2$. Since the α-shearlet coefficients of every $L^2$ function are $\ell^2$-sparse, we see that $\ell^p$-sparsity with respect to $\alpha_1$-shearlets does not imply any nontrivial $\ell^q$-sparsity with respect to $\alpha_2$-shearlets for $\alpha_1 \neq \alpha_2$, if no additional information than the $\ell^p$-sparsity with respect to $\alpha_1$-shearlets is given.

But in conjunction with Theorem 5.13, we see that if the $\alpha_1$-shearlet coefficients $\big(\langle f, \psi^{[(j,\ell,\iota),k]}\rangle_{L^2}\big)_{(j,\ell,\iota)\in I^{(\alpha_1)},\, k\in\mathbb{Z}^2}$ satisfy
$$\Big\| \Big( 2^{\varepsilon j}\cdot\big\langle f, \psi^{[(j,\ell,\iota),k]}\big\rangle_{L^2} \Big)_{(j,\ell,\iota)\in I^{(\alpha_1)},\, k\in\mathbb{Z}^2} \Big\|_{\ell^p} < \infty \tag{7.3}$$
for some $\varepsilon > 0$, then one can derive $\ell^q$-sparsity with respect to $\alpha_2$-shearlets for $q \ge \max\big\{ p,\ \big(\tfrac12 + \tfrac{\varepsilon}{|\alpha_1-\alpha_2|}\big)^{-1} \big\}$. Observe that equation (7.3) combines an $\ell^p$-estimate with a decay of the coefficients with the scale parameter $j$. ♦

Proof.
Theorem 7.3 shows that the embedding can only hold if $p \le q$. Thus, we only need to show for $0 < p \le q \le 2$ that the stated embedding holds if and only if we have $q \ge \big(\tfrac12+\tfrac{\varepsilon}{|\alpha_1-\alpha_2|}\big)^{-1}$.

For brevity, let $s := \varepsilon + (1+\alpha_1)\big(p^{-1}-2^{-1}\big)$ and $r := (1+\alpha_2)\big(q^{-1}-2^{-1}\big)$. We start with a few auxiliary observations: Because of $p \le q \le 2$, we have $q^{\triangledown} = \min\{q,q'\} = q$ and $\tfrac1{p^{\pm\triangle}} = \min\big\{\tfrac1p,\ 1-\tfrac1p\big\} = 1-\tfrac1p$, as well as $\tfrac1{q^{\triangledown}}-\tfrac1p = \tfrac1q-\tfrac1p \le 0$ and $\tfrac1p+\tfrac1q \ge 1$, so that $\tfrac1q-\tfrac1{p^{\pm\triangle}} = \tfrac1q+\tfrac1p-1 \ge 0$.

Now, let us first consider the case $\alpha_1 < \alpha_2$. Since we assume $p \le q$, Theorem 7.3 (applied with interchanged roles of $\alpha_1,\alpha_2$ and of $r,s$) shows that the embedding holds if and only if
$$\begin{aligned}
& s \overset{!}{\ge} r + (1+\alpha_1)\big(\tfrac1p-\tfrac1q\big) + (\alpha_2-\alpha_1)\big(\tfrac1{q^{\triangledown}}-\tfrac1p\big)_+\\
\Longleftrightarrow\quad & (1+\alpha_1)\big(\tfrac1p-\tfrac12\big) + \varepsilon \overset{!}{\ge} (1+\alpha_2)\big(\tfrac1q-\tfrac12\big) + (1+\alpha_1)\big(\tfrac1p-\tfrac1q\big)\\
\Longleftrightarrow\quad & \varepsilon \overset{!}{\ge} (1+\alpha_2)\big(\tfrac1q-\tfrac12\big) + (1+\alpha_1)\big(\tfrac12-\tfrac1q\big) = (\alpha_2-\alpha_1)\big(\tfrac1q-\tfrac12\big)\\
\overset{(\text{since } \alpha_2-\alpha_1>0)}{\Longleftrightarrow}\quad & \frac{\varepsilon}{\alpha_2-\alpha_1} + \frac12 \overset{!}{\ge} \frac1q
\quad\Longleftrightarrow\quad q \overset{!}{\ge} \Big(\frac12+\frac{\varepsilon}{\alpha_2-\alpha_1}\Big)^{-1} = \Big(\frac12+\frac{\varepsilon}{|\alpha_1-\alpha_2|}\Big)^{-1}.
\end{aligned}$$
Finally, we consider the case $\alpha_1 > \alpha_2$. Again, since $p \le q$, Theorem 7.3 shows that the desired embedding holds if and only if
$$\begin{aligned}
& s \overset{!}{\ge} r + (1+\alpha_2)\big(\tfrac1p-\tfrac1q\big) + (\alpha_1-\alpha_2)\big(\tfrac1q-\tfrac1{p^{\pm\triangle}}\big)_+\\
\overset{(\text{since } \frac1q-\frac1{p^{\pm\triangle}}\ge0)}{\Longleftrightarrow}\quad & (1+\alpha_1)\big(\tfrac1p-\tfrac12\big) + \varepsilon \overset{!}{\ge} (1+\alpha_2)\big(\tfrac1q-\tfrac12\big) + (1+\alpha_2)\big(\tfrac1p-\tfrac1q\big) + (\alpha_1-\alpha_2)\big(\tfrac1q+\tfrac1p-1\big)\\
\Longleftrightarrow\quad & \varepsilon \overset{!}{\ge} (\alpha_2-\alpha_1)\big(\tfrac1p-\tfrac12\big) + (\alpha_1-\alpha_2)\big(\tfrac1q+\tfrac1p-1\big) = (\alpha_1-\alpha_2)\big(\tfrac1q+\tfrac1p-1-\tfrac1p+\tfrac12\big) = (\alpha_1-\alpha_2)\big(\tfrac1q-\tfrac12\big)\\
\overset{(\text{since } \alpha_1-\alpha_2>0)}{\Longleftrightarrow}\quad & \frac{\varepsilon}{\alpha_1-\alpha_2} + \frac12 \overset{!}{\ge} \frac1q
\quad\Longleftrightarrow\quad q \overset{!}{\ge} \Big(\frac12+\frac{\varepsilon}{\alpha_1-\alpha_2}\Big)^{-1} = \Big(\frac12+\frac{\varepsilon}{|\alpha_1-\alpha_2|}\Big)^{-1}.
\end{aligned}$$
This completes the proof. □
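The threshold from Lemma 7.4 and the subsequent remark is easy to evaluate mechanically. The following sketch (the function name and the numeric examples are ours, not from the paper) computes the smallest exponent $q$ permitted by the condition $q \ge \max\{p, (\tfrac12 + \varepsilon/|\alpha_1-\alpha_2|)^{-1}\}$:

```python
def min_transfer_exponent(p, eps, alpha1, alpha2):
    """Smallest q such that l^p-sparsity w.r.t. alpha1-shearlets, together
    with coefficient decay 2^(eps*j) in the scale j, yields l^q-sparsity
    w.r.t. alpha2-shearlets, following Lemma 7.4 and the remark after it."""
    if alpha1 == alpha2:
        raise ValueError("Lemma 7.4 requires alpha1 != alpha2")
    return max(p, 1.0 / (0.5 + eps / abs(alpha1 - alpha2)))

# Without extra decay (eps = 0), the threshold degenerates to q >= 2,
# i.e., no nontrivial sparsity is transferred:
print(min_transfer_exponent(1.0, 0.0, 0.5, 1.0))   # -> 2.0
# With decay eps = 1/4 between the parabolic (alpha = 1/2) and the
# wavelet-type (alpha = 1) system, some sparsity survives:
print(min_transfer_exponent(1.0, 0.25, 0.5, 1.0))  # -> 1.0
```

Note how the transferable sparsity improves monotonically in the decay rate $\varepsilon$ and degrades as $|\alpha_1-\alpha_2|$ grows, exactly as the closed-form threshold predicts.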
Acknowledgments
We would like to thank Gitta Kutyniok for pushing us to improve the statement and the proof of Lemma 4.1 and thus also of Theorems 4.2 and 4.3 and Remark 6.4. Without her positive insistence, the proof of Lemma 4.1 would be considerably longer, and Theorem 6.3 concerning the approximation of cartoon-like functions with shearlets would require far more vanishing moments and generators of much higher smoothness than the improved conditions of Remark 6.4.

FV would like to express warm thanks to Hartmut Führ for several fruitful discussions and suggestions related to the present paper, in particular for suggesting the title "analysis vs. synthesis sparsity for shearlets", which we adopted nearly unchanged. FV would also like to thank Philipp Petersen for useful discussions related to the topics in this paper and for suggesting some changes in the notation.

Both authors would like to thank Jackie Ma for raising the question whether membership in shearlet smoothness spaces can also be characterized using compactly supported shearlets. We also thank Martin Schäfer for checking parts of the introduction related to the paper [56] for correctness.

Both authors acknowledge support from the European Commission through DEDALE (contract no. 665044) within the H2020 Framework Program. AP also acknowledges partial support by the Lichtenberg Professorship Grant of the Volkswagen Stiftung awarded to Christian Kuehn.

Appendix A. Nonequivalence of analysis and synthesis sparsity for general frames
In this section, we present two examples which show that for general frames, neither does analysis sparsity imply synthesis sparsity, nor vice versa. We begin with the (easier) case that synthesis sparsity does not imply analysis sparsity:
Example A.1.
We consider the Hilbert space $\ell^2(\mathbb{N})$ with the standard orthonormal basis given by $(\delta_n)_{n\in\mathbb{N}}$. The family $\Psi := (\psi_n)_{n\in\mathbb{N}_0}$ given by $\psi_n := \delta_n$ for $n \in \mathbb{N}$ and by $\psi_0 := \big(\tfrac1\ell\big)_{\ell\in\mathbb{N}}$ clearly forms a frame in $\ell^2(\mathbb{N})$. Furthermore, $f := \psi_0$ is clearly $\ell^p$-synthesis sparse with respect to $\Psi$ for arbitrary $p \in (0,2]$, since we have $f = \sum_{n\in\mathbb{N}_0} c_n\psi_n$ with $(c_n)_{n\in\mathbb{N}_0} = \delta_0 \in \ell^p(\mathbb{N}_0)$ for all $p \in (0,2]$. But the analysis coefficients are given by $A_\Psi f = (\langle f,\psi_n\rangle)_{n\in\mathbb{N}_0}$ with $\langle f,\psi_n\rangle = \tfrac1n$ for $n \in \mathbb{N}$. Hence, $A_\Psi f \notin \ell^p(\mathbb{N}_0)$ for $p \in (0,1]$.

Thus, for general frames, it is not true that $\ell^p$-synthesis sparsity implies $\ell^p$-analysis sparsity. ♦

Finally, we give a counterexample to the reverse implication. We remark that the counterexample constructed below is in fact a Riesz basis, not simply a frame.
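The failure of analysis sparsity in Example A.1 can be observed numerically. The following minimal sketch (a finite truncation; the variable names are ours) contrasts the growing $\ell^1$ partial sums of the analysis coefficients $(1/n)_n$ with their bounded $\ell^2$ norm, assuming NumPy is available:

```python
import numpy as np

# Finite section of Example A.1: psi_0 = (1/l)_{l>=1} and psi_n = delta_n.
# f = psi_0 is synthesis sparse (a single nonzero coefficient), but its
# analysis coefficients <f, psi_n> = 1/n do not lie in l^p for any p <= 1.
L = 10**6
analysis = 1.0 / np.arange(1, L + 1)           # <f, delta_n> = 1/n

ell1_partial = analysis.sum()                  # ~ log(L): diverges as L grows
ell2_partial = np.sqrt((analysis**2).sum())    # bounded: tends to pi/sqrt(6)

print(ell1_partial)   # ~ 14.39 for L = 10**6, unbounded in L
print(ell2_partial)   # ~ 1.28
```

So the same coefficient sequence is $\ell^2$-sparse (as it must be, for an $L^2$-type element) while every $\ell^1$ truncation keeps growing, which is exactly the dichotomy exploited in the example.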
Example A.2.
We again consider the Hilbert space $\ell^2(\mathbb{N})$ with the standard orthonormal basis given by $(\delta_n)_{n\in\mathbb{N}}$. Choose some $N_0 \in \mathbb{N}$ with $N_0 > \sum_{n=1}^\infty \tfrac1{n^2}$ (i.e., $N_0 > \tfrac{\pi^2}{6} \approx 1.64$) and set
$$\psi_n := \delta_n - \frac{1}{N_0\, n^2} \sum_{\ell=1}^{N_0 n^2} \delta_{n+\ell} \qquad\text{for } n \in \mathbb{N}.$$
Note that $\psi_n \in \ell^1(\mathbb{N}) \hookrightarrow \ell^2(\mathbb{N})$ with $\|\psi_n\|_{\ell^1} \le 1 + \frac{1}{N_0 n^2}\sum_{\ell=1}^{N_0 n^2} 1 = 2$. We now want to show that the analysis operator $A_\Psi : \ell^2(\mathbb{N}) \to \ell^2(\mathbb{N}),\ x \mapsto (\langle x,\psi_n\rangle)_{n\in\mathbb{N}}$ associated to the family $\Psi = (\psi_n)_{n\in\mathbb{N}}$ is well-defined, bounded and invertible. For this, it suffices by a Neumann series argument to show $\sup_{\|x\|_{\ell^2}\le1} \|x - A_\Psi x\|_{\ell^2} < 1$. But for arbitrary $x = (x_n)_{n\in\mathbb{N}} \in \ell^2(\mathbb{N})$, we have
$$\begin{aligned}
\|x - A_\Psi x\|_{\ell^2}^2
&= \big\| (x_n)_{n\in\mathbb{N}} - (\langle x,\psi_n\rangle)_{n\in\mathbb{N}} \big\|_{\ell^2}^2
= \sum_{n=1}^\infty \Bigg| \frac{1}{N_0 n^2} \sum_{\ell=1}^{N_0 n^2} x_{n+\ell} \Bigg|^2\\
(\text{Cauchy-Schwarz})\quad
&\le \sum_{n=1}^\infty \frac{1}{(N_0 n^2)^2} \Bigg( \sqrt{\sum_{\ell=1}^{N_0 n^2} |x_{n+\ell}|^2}\cdot\sqrt{\sum_{\ell=1}^{N_0 n^2} 1} \Bigg)^{\!2}
= \sum_{n=1}^\infty \frac{1}{N_0 n^2} \sum_{\ell=1}^{N_0 n^2} |x_{n+\ell}|^2
\le \sum_{m=1}^\infty |x_m|^2 \cdot \frac{1}{N_0} \sum_{n=1}^\infty \frac{1}{n^2},
\end{aligned}$$
so that we get $\sup_{\|x\|_{\ell^2}\le1} \|x - A_\Psi x\|_{\ell^2} \le \sqrt{N_0^{-1}\sum_{n=1}^\infty n^{-2}} < 1$, as desired.

As seen above, this implies that $A_\Psi : \ell^2(\mathbb{N}) \to \ell^2(\mathbb{N})$ is well-defined, bounded and boundedly invertible. Hence, so is the synthesis operator $S_\Psi : \ell^2(\mathbb{N}) \to \ell^2(\mathbb{N}),\ (c_n)_{n\in\mathbb{N}} \mapsto \sum_{n\in\mathbb{N}} c_n\psi_n$, since $S_\Psi = A_\Psi^*$. Therefore, the family $\Psi = (\psi_n)_{n\in\mathbb{N}} = (S_\Psi\delta_n)_{n\in\mathbb{N}}$ is the image of an orthonormal basis under an invertible linear operator, so that $\Psi$ is a Riesz basis and in particular a frame for $\ell^2(\mathbb{N})$, see [4, Definition 3.6.1, Proposition 3.6.4 and Theorem 3.6.6].

Now, set $f := \delta_1 \in \ell^2(\mathbb{N})$ and note $\operatorname{supp}\psi_n \subset \{n, n+1, \dots\}$ for every $n \in \mathbb{N}$, so that $\langle f,\psi_n\rangle = 0$ for all $n \ge 2$. Hence, $A_\Psi f = \delta_1 \in \ell^p(\mathbb{N})$ for all $p \in (0,\infty]$, so that $f$ is analysis sparse with respect to $\Psi$.

But $f$ is not $\ell^p$-synthesis sparse with respect to $\Psi$ for $p \le 1$: If $f = S_\Psi c$ for $c = (c_n)_{n\in\mathbb{N}} \in \ell^p(\mathbb{N}) \hookrightarrow \ell^1(\mathbb{N})$ with $p \le 1$, then the uniform boundedness $\|\psi_n\|_{\ell^1} \le 2$ ensures that the series $f = \sum_{n\in\mathbb{N}} c_n\psi_n$ converges unconditionally in $\ell^1(\mathbb{N})$. In particular, with the continuous linear functional $\varphi : \ell^1(\mathbb{N}) \to \mathbb{C},\ (x_n)_{n\in\mathbb{N}} \mapsto \sum_{n\in\mathbb{N}} x_n$, we would have $\varphi(f) = \sum_{n\in\mathbb{N}} c_n\varphi(\psi_n) = 0$, since $\varphi(\psi_n) = 0$ for all $n \in \mathbb{N}$.

This contradiction (note $\varphi(f) = 1 \neq 0$) shows that $f$ is not $\ell^p$-synthesis sparse with respect to $\Psi$ for $p \le 1$, even though $f$ is $\ell^p$-analysis sparse. ♦

Appendix B. The α-shearlet covering is almost structured

In this section, we provide the proof of Lemma 3.3, whose statement we repeat here for the sake of convenience:
Lemma.
The α-shearlet covering $\mathcal{S}^{(\alpha)}$ from Definition 3.1 is an almost structured covering of $\mathbb{R}^2$.

Proof. First of all, we define the family $(T_i P_i' + b_i)_{i \in I}$, with $P_i' := U_{(1/2,\,5/2)}^{(-3/4,\,3/4)}$ for $i \in I_0$ and $P_0' := (-1,1)^2$; all being open sets. It is not hard to see $P_i' \subset Q_i'$ for all $i \in I$. We now show that $(T_i P_i' + b_i)_{i \in I}$ covers $\mathbb{R}^2$. First, we note
$$\bigcup_{m = -\lceil 2^{n(1-\alpha)} \rceil}^{\lceil 2^{n(1-\alpha)} \rceil} \Big( 2^{n(\alpha-1)} \big(m - \tfrac{3}{4}\big),\; 2^{n(\alpha-1)} \big(m + \tfrac{3}{4}\big) \Big) = 2^{n(\alpha-1)} \bigcup_{m = -\lceil 2^{n(1-\alpha)} \rceil}^{\lceil 2^{n(1-\alpha)} \rceil} \Big( m - \tfrac{3}{4},\; m + \tfrac{3}{4} \Big) = 2^{n(\alpha-1)} \Big( -\lceil 2^{n(1-\alpha)} \rceil - \tfrac{3}{4},\; \lceil 2^{n(1-\alpha)} \rceil + \tfrac{3}{4} \Big)$$
$$\supset 2^{n(\alpha-1)} \Big( -2^{n(1-\alpha)} - \tfrac{3}{4},\; 2^{n(1-\alpha)} + \tfrac{3}{4} \Big) = \Big( -1 - \tfrac{3}{4} \cdot 2^{n(\alpha-1)},\; 1 + \tfrac{3}{4} \cdot 2^{n(\alpha-1)} \Big) \supset [-1,1].$$
Using this inclusion, as well as equation (3.3), and recalling $G_n = \lceil 2^{n(1-\alpha)} \rceil$, we conclude
$$\bigcup_{n=0}^{\infty} \bigcup_{m=-G_n}^{G_n} T_{n,m,1,0}^{(\alpha)} P_{n,m,1,0}' = \bigcup_{n=0}^{\infty} \bigcup_{m=-\lceil 2^{n(1-\alpha)} \rceil}^{\lceil 2^{n(1-\alpha)} \rceil} U_{(2^{n-1},\, \frac{5}{2} \cdot 2^{n})}^{\big(2^{n(\alpha-1)}(m - 3/4),\; 2^{n(\alpha-1)}(m + 3/4)\big)} \supset \bigcup_{n=0}^{\infty} \bigg\{ \binom{\xi}{\eta} \in \Big( 2^{n-1},\, \tfrac{5}{2} \cdot 2^{n} \Big) \times \mathbb{R} \;\bigg|\; \frac{\eta}{\xi} \in [-1,1] \bigg\} \supset \bigg\{ \binom{\xi}{\eta} \in \Big( \tfrac{1}{2}, \infty \Big) \times \mathbb{R} \;\bigg|\; |\eta| \leq |\xi| \bigg\}.$$
Furthermore, since $T_{j,\ell,-1,0}^{(\alpha)} = -T_{j,\ell,1,0}^{(\alpha)}$, we have
$$\bigcup_{n=0}^{\infty} \bigcup_{m=-G_n}^{G_n} \bigcup_{\varepsilon \in \{\pm 1\}} T_{n,m,\varepsilon,0}^{(\alpha)} P_{n,m,\varepsilon,0}' \supset \bigg\{ \binom{\xi}{\eta} \in \mathbb{R}^2 \;\bigg|\; |\xi| > \tfrac{1}{2} \text{ and } |\eta| \leq |\xi| \bigg\},$$
and since $R \binom{\xi}{\eta} = \binom{\eta}{\xi}$, we finally get
$$\bigcup_{i \in I_0} T_i P_i' \supset \bigg\{ \binom{\xi}{\eta} \in \mathbb{R}^2 \;\bigg|\; |\xi| > \tfrac{1}{2} \text{ and } |\eta| \leq |\xi| \bigg\} \cup \bigg\{ \binom{\xi}{\eta} \in \mathbb{R}^2 \;\bigg|\; |\eta| > \tfrac{1}{2} \text{ and } |\xi| \leq |\eta| \bigg\} =: M.$$
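The interval computation used here boils down to an elementary fact: integer shifts of an interval of radius $3/4$ overlap (since $3/2 > 1$), so finitely many rescaled shifts cover $[-1,1]$. The sketch below spot-checks this numerically; the radius $3/4$ and the shift range $|m| \leq \lceil 2^{n(1-\alpha)} \rceil$ are read off from the (partly garbled) display and should be treated as assumptions:

```python
import math

def covered(x, n, alpha):
    """Is x inside some 2^{n(alpha-1)} * (m - 3/4, m + 3/4) with |m| <= ceil(2^{n(1-alpha)})?"""
    s = 2.0 ** (n * (alpha - 1))
    G = math.ceil(2.0 ** (n * (1 - alpha)))
    m = round(x / s)               # nearest integer shift; |x/s - m| <= 1/2 < 3/4
    return abs(m) <= G and abs(x / s - m) < 0.75

grid = [-1 + 2 * k / 400 for k in range(401)]   # sample points of [-1, 1]
all_covered = all(covered(x, n, a) for a in (0.0, 0.5, 1.0) for n in range(6) for x in grid)
assert all_covered
```

The check passes for every sampled scale $n$ and every $\alpha \in \{0, 1/2, 1\}$, matching the inclusion $\supset [-1,1]$ asserted in the proof.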
Since we clearly have $T_0 P_0' + b_0 = (-1,1)^2 \supset [-\tfrac{1}{2}, \tfrac{1}{2}]^2$, it suffices to show that each $\binom{\xi}{\eta} \in \mathbb{R}^2 \setminus [-\tfrac{1}{2}, \tfrac{1}{2}]^2$ satisfies $\binom{\xi}{\eta} \in M$, in order to prove that $(T_i P_i' + b_i)_{i \in I}$ covers all of $\mathbb{R}^2$. To see this, we distinguish two cases for $\binom{\xi}{\eta} \in \mathbb{R}^2 \setminus [-\tfrac{1}{2}, \tfrac{1}{2}]^2$:

Case $|\eta| \geq |\xi|$: Then $|\eta| > \tfrac{1}{2}$, since otherwise we would have $|\xi| \leq |\eta| \leq \tfrac{1}{2}$, contradicting $\binom{\xi}{\eta} \in \mathbb{R}^2 \setminus [-\tfrac{1}{2}, \tfrac{1}{2}]^2$. Hence, $\binom{\xi}{\eta} \in M$.

Case $|\eta| \leq |\xi|$: Then $|\xi| > \tfrac{1}{2}$, since otherwise we would have $|\eta| \leq |\xi| \leq \tfrac{1}{2}$, contradicting $\binom{\xi}{\eta} \in \mathbb{R}^2 \setminus [-\tfrac{1}{2}, \tfrac{1}{2}]^2$. Hence, $\binom{\xi}{\eta} \in M$.

All in all, we have shown that $(T_i P_i' + b_i)_{i \in I}$ is a covering of $\mathbb{R}^2$; because of $Q_i = T_i Q_i' + b_i \supset T_i P_i' + b_i$ for all $i \in I$, we also see that $\mathcal{S}^{(\alpha)}$ covers all of $\mathbb{R}^2$. Moreover, the sets $\{P_i' \mid i \in I\}$ and $\{Q_i' \mid i \in I\}$ are finite; in fact, each of these sets only has two elements. Furthermore, we clearly have $Q_i = T_i Q_i' + b_i \subset \mathbb{R}^2$ for all $i \in I$.
Thus, to verify that $\mathcal{S}^{(\alpha)}$ is an almost structured covering of $\mathbb{R}^2$, we only have to verify that $\mathcal{S}^{(\alpha)}$ is admissible and that $\sup_{i \in I} \sup_{j \in i^*} \|T_i^{-1} T_j\|$ is finite, cf. Definition 2.1. To this end, we define $M_i := i^* \cap I_0$ and $M_i^{(\nu)} := \{(k,\ell,\beta,\gamma) \in M_i \mid \gamma = \nu\}$, as well as
$$C_i^{(\nu)} := \sup_{j \in M_i^{(\nu)}} \|T_i^{-1} T_j\|$$
for $i \in I_0$ and $\nu \in \{0,1\}$. Note $M_i = M_i^{(0)} \uplus M_i^{(1)}$. Next, for $(k,\ell,\beta,\gamma) \in I_0$, we define
$$(k,\ell,\beta,\gamma)' := \begin{cases} (k,\ell,\beta,1), & \text{if } \gamma = 0, \\ (k,\ell,\beta,0), & \text{if } \gamma = 1. \end{cases} \tag{B.1}$$
It is not hard to see $T_{i'} = R\,T_i$ and $Q_{i'}' = Q_i' = Q'$ for all $i \in I_0$. Hence, we have the following equivalence for $i, j \in I_0$:
$$\emptyset \neq S_i^{(\alpha)} \cap S_j^{(\alpha)} \iff \emptyset \neq T_i Q' \cap T_j Q' \iff \emptyset \neq R\,\big[ T_i Q' \cap T_j Q' \big] = T_{i'} Q' \cap T_{j'} Q' \iff \emptyset \neq S_{i'}^{(\alpha)} \cap S_{j'}^{(\alpha)}.$$
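The two-case argument above amounts to the elementary observation that every point outside $[-\tfrac12, \tfrac12]^2$ lies in the set $M$: the coordinate of larger modulus exceeds $1/2$ and dominates the other. A minimal brute-force check (illustrative only):

```python
import random

def in_M(xi, eta):
    """M = {|xi| > 1/2 and |eta| <= |xi|}  union  {|eta| > 1/2 and |xi| <= |eta|}."""
    return (abs(xi) > 0.5 and abs(eta) <= abs(xi)) or \
           (abs(eta) > 0.5 and abs(xi) <= abs(eta))

random.seed(0)
samples = [(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(10_000)]
# points of the sample lying outside the square [-1/2, 1/2]^2
outside = [(x, y) for (x, y) in samples if max(abs(x), abs(y)) > 0.5]
ok = all(in_M(x, y) for (x, y) in outside)
assert ok
```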
Furthermore, $T_{i'}^{-1} T_{j'} = (R\,T_i)^{-1}\, R\, T_j = T_i^{-1} T_j$. Hence, $M_{i'} = \{ j' \mid j \in M_i \}$ and the constants $C_{i'}^{(\nu)}$ coincide with those for $i$, for $\nu \in \{0,1\}$ and all $i \in I_0$, so that it suffices to consider the case $i = (n,m,\varepsilon,0) \in I_0$ from now on. We distinguish two cases regarding $j \in M_i$:

Case 1: $j = (k,\ell,\beta,0) \in M_i^{(0)}$. We have $\emptyset \neq S_i^{(\alpha)} \cap S_j^{(\alpha)}$. Since $S_i^{(\alpha)} \subset \varepsilon \cdot (0,\infty) \times \mathbb{R}$ and $S_j^{(\alpha)} \subset \beta \cdot (0,\infty) \times \mathbb{R}$, this implies $\varepsilon = \beta$, so that equation (3.3) yields
$$\emptyset \neq \varepsilon \cdot \Big( S_i^{(\alpha)} \cap S_j^{(\alpha)} \Big) = S_{n,m,1,0}^{(\alpha)} \cap S_{k,\ell,1,0}^{(\alpha)} = U_{(2^{n}/3,\; 3 \cdot 2^{n})}^{\big( 2^{n(\alpha-1)}(m-1),\; 2^{n(\alpha-1)}(m+1) \big)} \cap U_{(2^{k}/3,\; 3 \cdot 2^{k})}^{\big( 2^{k(\alpha-1)}(\ell-1),\; 2^{k(\alpha-1)}(\ell+1) \big)} \subset (0,\infty) \times \mathbb{R}.$$
Now, we consider the diffeomorphism
$$\Phi : (0,\infty) \times \mathbb{R} \to (0,\infty) \times \mathbb{R},\; (\xi,\eta) \mapsto \Big( \xi, \frac{\eta}{\xi} \Big)$$
and observe the easily verifiable identity $\Phi\big( U_{(\gamma,\mu)}^{(a,b)} \big) = (\gamma,\mu) \times (a,b)$. Consequently, we get
$$\emptyset \neq \Big[ \Big( \tfrac{2^{n}}{3},\, 3 \cdot 2^{n} \Big) \cap \Big( \tfrac{2^{k}}{3},\, 3 \cdot 2^{k} \Big) \Big] \times \Big[ \Big( 2^{n(\alpha-1)}(m-1),\, 2^{n(\alpha-1)}(m+1) \Big) \cap \Big( 2^{k(\alpha-1)}(\ell-1),\, 2^{k(\alpha-1)}(\ell+1) \Big) \Big].$$
In particular, $2^{k}/3 < 3 \cdot 2^{n}$ and $2^{n}/3 < 3 \cdot 2^{k}$, which yields $k - n < \log_2 9 < 4$ and $n - k < \log_2 9 < 4$. Thus, $|k-n| < 4$ and hence $|k-n| \leq 3$, since $k - n \in \mathbb{Z}$.
Furthermore, we get $2^{k(\alpha-1)}(\ell - 1) < 2^{n(\alpha-1)}(m+1)$ and $2^{n(\alpha-1)}(m-1) < 2^{k(\alpha-1)}(\ell+1)$, which implies $\ell - 1 < 2^{(n-k)(\alpha-1)}(m+1)$ and $\ell + 1 > 2^{(n-k)(\alpha-1)}(m-1)$. Because of $0 \leq 1 - \alpha \leq 1$ and $|k-n| \leq 3$, we have $2^{(n-k)(\alpha-1)} = 2^{(1-\alpha)(k-n)} \leq 8$ and thus
$$2^{(1-\alpha)(k-n)} m - 9 \leq -1 - 2^{(1-\alpha)(k-n)} + 2^{(1-\alpha)(k-n)} m < \ell < 1 + 2^{(1-\alpha)(k-n)} + 2^{(1-\alpha)(k-n)} m \leq 2^{(1-\alpha)(k-n)} m + 9.$$
Thus, with $M_{n,m,\lambda} := \mathbb{Z} \cap \big[ 2^{(1-\alpha)(\lambda-n)} m - 9,\; 2^{(1-\alpha)(\lambda-n)} m + 9 \big]$, we have shown
$$j = (k,\ell,\beta,0) \in \bigcup_{\lambda = n-3}^{n+3} \big[ \{\lambda\} \times M_{n,m,\lambda} \times \{\varepsilon\} \times \{0\} \big].$$
Because of $|M_{n,m,\lambda}| \leq 19$, the set on the right-hand side has at most $7 \cdot 19 = 133$ elements, so that we get $\big| M_i^{(0)} \big| \leq 133 =: N_1$.
Finally, we note
$$\|T_i^{-1} T_j\| = \left\| \begin{pmatrix} 1 & 0 \\ -m & 1 \end{pmatrix} \begin{pmatrix} 2^{-n} & 0 \\ 0 & 2^{-n\alpha} \end{pmatrix} \begin{pmatrix} 2^{k} & 0 \\ 0 & 2^{k\alpha} \end{pmatrix} \begin{pmatrix} 1 & 0 \\ \ell & 1 \end{pmatrix} \right\| = \left\| \begin{pmatrix} 2^{k-n} & 0 \\ 2^{\alpha(k-n)} \ell - 2^{k-n} m & 2^{\alpha(k-n)} \end{pmatrix} \right\|.$$
Now, since $|k-n| \leq 3$, we have $2^{k-n} \leq 8$ and $2^{\alpha(k-n)} \leq 2^{3\alpha} \leq 8$. Furthermore, we saw above that $\big| \ell - 2^{(1-\alpha)(k-n)} m \big| \leq 9$, so that we get
$$\big| 2^{\alpha(k-n)} \ell - 2^{k-n} m \big| = 2^{\alpha(k-n)} \cdot \big| \ell - 2^{(1-\alpha)(k-n)} m \big| \leq 9 \cdot 2^{\alpha(k-n)} \leq 9 \cdot 8 = 72.$$
All in all, this implies $\|T_i^{-1} T_j\| \leq 8 + 8 + 72 \leq 128$. Since $j \in M_i^{(0)}$ was arbitrary, we conclude $C_i^{(0)} \leq$
128 =: K . Case 2 : j = ( k, ℓ, β, ∈ M (1) i . By definition of M i , there is some (cid:0) ξη (cid:1) ∈ S ( α ) i ∩ S ( α ) j . Lemma 3.2 implies n − < (cid:12)(cid:12)(cid:0) ξη (cid:1)(cid:12)(cid:12) < n +4 , as well as k − < (cid:12)(cid:12)(cid:0) ξη (cid:1)(cid:12)(cid:12) < k +4 and thus n − < k +4 as well as k − < n +4 . Consequently, | n − k | < and thus | n − k | ≤ , since n − k ∈ Z . nalysis vs. Synthesis Sparsity for α -Shearlets — Felix Voigtlaender & Anne Pein Next, we explicitly compute the transition matrix T − i T j : T − i T j = (cid:16) A ( α ) n,m,ε (cid:17) − RA ( α ) k,ℓ,β = εβ (cid:18) − m (cid:19) (cid:18) − n
00 2 − αn (cid:19) (cid:18) (cid:19) (cid:18) k kα ℓ kα (cid:19) = εβ (cid:18) − n − − n m − nα (cid:19) (cid:18) ℓ kα kα k (cid:19) = εβ (cid:18) kα − n ℓ kα − n k − nα − kα − n ℓm − kα − n m (cid:19) . (B.2)Now we distinguish three different subcases regarding α ∈ [0 , and n ∈ N : Case 2(a) : α = 1 and n < − α . Since | n − k | ≤ this implies k ≤ n < − α = − α − α ≤ − α . We thushave M (1) i ⊂ ⌈ / (1 − α ) ⌉ [ k =0 [ { k } × {− ⌈ k (1 − α ) ⌉ , . . . , ⌈ k (1 − α ) ⌉} × {± } × { , } ] =: M, and hence | M (1) i | ≤ | M | ≤ N , for some absolute constant N = N ( α ) ∈ N , since M is a finite set. Note also that i ∈ M , since n < − α ≤ − α . Consequently, C (1) i = sup j ∈ M (1) i (cid:13)(cid:13) T − i T j (cid:13)(cid:13) ≤ max γ,λ ∈ M (cid:13)(cid:13) T − λ T γ (cid:13)(cid:13) =: K . Case 2(b) : α = 1 and n ≥ − α . Since | n − k | ≤ this implies k ≥ n − ≥ − α − α − α . We know fromLemma 3.2 that < | ξ | < | η | and < | η | < | ξ | , i.e., < (cid:12)(cid:12)(cid:12) ηξ (cid:12)(cid:12)(cid:12) < .Now, we claim | m | ≥ − . To see this, assume towards a contradiction that | m | < − . This implies becauseof n ≥ − α , because of equation (3.3) and because of (cid:0) ξη (cid:1) ∈ S ( α ) i = S ( α ) n,m,ε, that ηξ ∈ (cid:16) − n (1 − α ) ( m − , − n (1 − α ) ( m + 1) (cid:17) ⊂ (cid:16) − / n (1 − α ) , / n (1 − α ) (cid:17) ⊂ (cid:18) − / , / (cid:19) ⊂ (cid:18) − , (cid:19) , in contradiction to (cid:12)(cid:12)(cid:12) ηξ (cid:12)(cid:12)(cid:12) > . Thus we must have | m | ≥ − .Likewise, we have | ℓ | ≥ − . Indeed, since we have k ≥ α − α and < (cid:12)(cid:12)(cid:12) ξη (cid:12)(cid:12)(cid:12) < , the assumption | ℓ | < − yields the contradiction ξη ∈ (cid:16) − k (1 − α ) ( ℓ − , − k (1 − α ) ( ℓ + 1) (cid:17) ⊂ (cid:16) − / α , / α (cid:17) ⊂ (cid:16) − / , / (cid:17) ⊂ (cid:18) − , (cid:19) . Consequently, we must have | ℓ | ≥ − .Now, since | m | ≥ − , we either have m ≥ − > or m ≤ − < . Let us distinguish these two cases: Case 2(b)(i ): m ≥ − . 
Since (cid:0) ξη (cid:1) ∈ S ( α ) n,m,ε, ∩ S ( α ) k,ℓ,β, = S ( α ) n,m,ε, ∩ RS ( α ) k,ℓ,β, and using equation (3.3), we see ηξ > − n (1 − α ) ( m − > and < ξη < − k (1 − α ) ( ℓ + 1) . Hence, ℓ > − and since | ℓ | ≥ − , we have ℓ ≥ − .First, we want to show m ≥ n (1 − α ) − . Thus, assume towards a contradiction that m < n (1 − α ) − − andnote that n (1 − α ) − − n (1 − α ) − ≥ − > , since n ≥ − α . Now, we get ξη < − k (1 − α ) ( ℓ + 1) ≤ − k (1 − α ) ( ⌈ k (1 − α ) ⌉ + 1) < − k (1 − α ) (cid:16) k (1 − α ) + 1 + 1 (cid:17) = 1 + 2 − k (1 − α )+1 and ξη = (cid:18) ηξ (cid:19) − > (cid:16) − n (1 − α ) ( m + 1) (cid:17) − = 2 n (1 − α ) m + 1 > n (1 − α ) n (1 − α ) − = 1 + 2 n (1 − α ) − > n (1 − α ) . Thus − k (1 − α )+1 > n (1 − α ) and hence ( n − k )(1 − α ) > in contradiction to ( n − k )(1 − α ) ≤ | n − k | (1 − α ) ≤ | n − k | ≤ .Thus, m ≥ n (1 − α ) − .Next, we similarly show ℓ ≥ k (1 − α ) − . Again, we assume towards a contradiction that ℓ < k (1 − α ) − − and note k (1 − α ) − − ≥ α − − > . Now, on the one hand we get (cid:18) ξη (cid:19) − > (cid:16) − k (1 − α ) ( ℓ + 1) (cid:17) − = 2 k (1 − α ) ℓ + 1 ≥ k (1 − α ) k (1 − α ) − = 1 + 2 k (1 − α ) − > k (1 − α ) , nalysis vs. Synthesis Sparsity for α -Shearlets — Felix Voigtlaender & Anne Pein but on the other hand (cid:18) ξη (cid:19) − = ηξ < − n (1 − α ) ( m + 1) ≤ − n (1 − α ) ( ⌈ n (1 − α ) ⌉ + 1) < − n (1 − α ) (cid:16) n (1 − α ) + 2 (cid:17) = 1 + 2 − n (1 − α )+1 , i.e., ( k − n )(1 − α ) > in contradiction to | n − k | ≤ . Thus, ℓ ≥ k (1 − α ) − − k (1 − α ) − .Using these estimates for m and ℓ , we can now bound the entries of T − i T j (cf. eq. 
(B.2)): We have (cid:12)(cid:12) kα − n ℓ (cid:12)(cid:12) ≤ kα − n ⌈ k (1 − α ) ⌉ < kα − n (cid:16) k (1 − α ) + 1 (cid:17) = 2 k − n + 2 kα − n ≤ + 2 k − n ≤ · and furthermore (cid:12)(cid:12) kα − n (cid:12)(cid:12) ≤ k − n ≤ , as well as (cid:12)(cid:12) − kα − n m (cid:12)(cid:12) = 2 kα − n | m | ≤ kα − n ⌈ n (1 − α ) ⌉ < kα − n (cid:16) n (1 − α ) + 1 (cid:17) = 2 ( k − n ) α + 2 kα − n ≤ α + 2 k − n ≤ + 2 . Finally, having in mind ≤ ℓm ≤ (cid:16) k (1 − α ) + 1 (cid:17) (cid:16) n (1 − α ) + 1 (cid:17) = 2 n (1 − α ) k (1 − α ) + 2 k (1 − α ) + 2 n (1 − α ) + 1 , as well as ℓ ≥ k (1 − α ) − > and m ≥ n (1 − α ) − > , we get (cid:12)(cid:12) k − nα − kα − n ℓm (cid:12)(cid:12) = 2 kα − n · (cid:12)(cid:12)(cid:12) n (1 − α )+ k (1 − α ) − ℓm (cid:12)(cid:12)(cid:12) ≤ kα − n · (cid:16)(cid:12)(cid:12)(cid:12) n (1 − α )+ k (1 − α ) + 2 k (1 − α ) + 2 n (1 − α ) + 1 − ℓm (cid:12)(cid:12)(cid:12) + (cid:12)(cid:12)(cid:12) − k (1 − α ) − n (1 − α ) − (cid:12)(cid:12)(cid:12)(cid:17) = 2 kα − n · h(cid:16) n (1 − α )+ k (1 − α ) + 2 k (1 − α ) + 2 n (1 − α ) + 1 − ℓm (cid:17) + (2 k (1 − α ) + 2 n (1 − α ) + 1) i ≤ kα − n · h n (1 − α )+ k (1 − α ) − (cid:16) k (1 − α ) − (cid:17) (cid:16) n (1 − α ) − (cid:17) + 2 · (cid:16) k (1 − α ) + 2 n (1 − α ) + 1 (cid:17)i = 2 kα − n · h · k (1 − α ) + 65 · n (1 − α ) − + 2 · (cid:16) k (1 − α ) + 2 n (1 − α ) + 1 (cid:17)i ≤ kα − n · (cid:16) · k (1 − α ) + 67 · n (1 − α ) + 2 (cid:17) = 2 k − n (cid:16)
67 + 67 · ( n − k )(1 − α ) + 2 · − k (1 − α ) (cid:17) ≤ (cid:0)
67 + 67 · + 2 (cid:1) = 70 816 . Thus, we have (cid:13)(cid:13) T − i T j (cid:13)(cid:13) ≤ + 2 + 2 + 70 816 = 70 976 =: K for all j ∈ M (1) i , as long as α = 1 and i = ( n, m, ε, ∈ I with n ≥ − α and m ≥ − . Case 2(b)(ii) : m ≤ − + 1 . Then we have ηξ < − n (1 − α ) ( m + 1) < and − k (1 − α ) ( ℓ − < ξη < . Hence, ℓ < and since | ℓ | ≥ − , we have ℓ ≤ − + 1 . Setting ˜ m := − m and ˜ ℓ := − ℓ and using − ηξ , − ξη instead of ηξ , ξη we get, with the same arguments as in the previous case, that ˜ m ≥ n (1 − α ) − and ˜ ℓ ≥ k (1 − α ) − , i.e. m ≤ − n (1 − α ) + 65 and ℓ ≤ − k (1 − α ) + 65 . Consequently, since mℓ = ˜ m ˜ ℓ and | m | = | ˜ m | , as well as | ℓ | = | ˜ ℓ | , we getthe same bounds for the matrix entries as in the previous case. Thus, (cid:13)(cid:13) T − i T j (cid:13)(cid:13) ≤ K .All in all, since the cases 2(b)(i) and 2(b)(ii) are the only ones possible—assuming that we are in case 2(b)—weget C (1) i ≤ K if α = 1 and if i = ( n, m, ε, satisfies n ≥ − α . Finally, in both of the cases from above, we sawthat ℓ ≤ − k (1 − α ) + 65 ≤ − (cid:6) k (1 − α ) (cid:7) + 66 or that ℓ ≥ k (1 − α ) − ≥ (cid:6) k (1 − α ) (cid:7) − . Consequently, we get for thewhole case 2(b) that M (1) i ⊂ f M with f M := n +5 [ λ = n − [ { λ } × ( {⌈ λ (1 − α ) ⌉ − , . . . , ⌈ λ (1 − α ) ⌉} ∪ {− ⌈ λ (1 − α ) ⌉ , . . . , − ⌈ λ (1 − α ) ⌉ + 66 } ) × {± } × { } ] and thus | M (1) i | ≤ | f M | ≤ · · · N , independent of i = ( n, m, ε, ∈ I , as long as α = 1 and n ≥ − α . Case 2(c) : α = 1 . In this case, the matrix T − i T j from equation (B.2) reduces to T − i T j = εβ (cid:18) k − n ℓ k − n k − n − k − n ℓm − k − n m (cid:19) nalysis vs. Synthesis Sparsity for α -Shearlets — Felix Voigtlaender & Anne Pein and we have | m | ≤ G n = 1 , as well as | ℓ | ≤ G k = 1 . 
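For the case $\alpha = 1$, the transition matrix takes (up to the sign $\varepsilon\beta$, which does not affect the norm) the reconstructed form $2^{k-n} \begin{psmallmatrix} \ell & 1 \\ 1 - \ell m & -m \end{psmallmatrix}$, and with $|k-n| \leq 5$ and $\ell, m \in \{-1,0,1\}$ there are only finitely many possibilities. The sketch below is a non-authoritative check under two assumptions: this closed form of the product, and the convention $T_{n,m} = \operatorname{diag}(2^{n}, 2^{\alpha n}) \begin{psmallmatrix} 1 & 0 \\ m & 1 \end{psmallmatrix}$ with $R = \begin{psmallmatrix} 0 & 1 \\ 1 & 0 \end{psmallmatrix}$. It confirms the matrix identity and a uniform norm bound of $160$:

```python
import itertools
import numpy as np

R = np.array([[0.0, 1.0], [1.0, 0.0]])   # coordinate swap

def T(n, m, alpha=1.0):
    """Assumed convention: T_{n,m} = diag(2^n, 2^{alpha*n}) @ [[1, 0], [m, 1]]."""
    return np.diag([2.0**n, 2.0**(alpha * n)]) @ np.array([[1.0, 0.0], [float(m), 1.0]])

worst = 0.0
for dk, l, m in itertools.product(range(-5, 6), (-1, 0, 1), (-1, 0, 1)):
    n, k = 7, 7 + dk                       # only the difference k - n matters below
    prod = np.linalg.inv(T(n, m)) @ R @ T(k, l)
    # reconstructed closed form (sign epsilon*beta omitted -- it does not change the norm)
    closed = 2.0**dk * np.array([[l, 1.0], [1.0 - l * m, -float(m)]])
    assert np.allclose(prod, closed)
    worst = max(worst, np.linalg.norm(prod, 2))

assert worst <= 160.0    # uniform spectral-norm bound over all enumerated cases
```

The actual maximal spectral norm over this enumeration is well below $160$, which is consistent with the entrywise bound $32 + 32 + 32 + 64 = 160$ used in the text.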
Thus, recalling $|n-k| \leq 5$, we can easily bound all matrix elements uniformly: We have $\big| 2^{k-n} \ell \big| = 2^{k-n} |\ell| \leq 32$ and $\big| 2^{k-n} \big| \leq 32$, as well as $\big| -2^{k-n} m \big| \leq 2^{k-n} \leq 32$ and finally
$$\big| 2^{k-n} - 2^{k-n} \ell m \big| = 2^{k-n} \, |1 - \ell m| \leq 2^{k-n} (1 + |\ell| \cdot |m|) \leq 2 \cdot 32 = 64,$$
and thus $\|T_i^{-1} T_j\| \leq 32 + 32 + 32 + 64 = 160 =: K_4$, independent of $i = (n,m,\varepsilon,0) \in I_0$, as long as $\alpha = 1$.
Furthermore, since we saw above that $|k-n| \leq 5$ for $j = (k,\ell,\beta,1) \in M_i^{(1)}$, we get
$$M_i^{(1)} \subset \bigcup_{\lambda = n-5}^{n+5} \big[ \{\lambda\} \times \{-1,0,1\} \times \{\pm 1\} \times \{1\} \big]$$
and thus $\big| M_i^{(1)} \big| \leq 11 \cdot 3 \cdot 2 =: N_4$.
All in all, the cases 2(a), 2(b) and 2(c) entail for $i = (n,m,\varepsilon,0) \in I_0$ that
$$C_i^{(1)} \leq K' := \begin{cases} K_4, & \text{if } \alpha = 1, \\ \max\{K_2, K_3\}, & \text{if } \alpha \neq 1 \end{cases} \qquad \text{and also} \qquad \big| M_i^{(1)} \big| \leq N' := \begin{cases} N_4, & \text{if } \alpha = 1, \\ \max\{N_2, N_3\}, & \text{if } \alpha \neq 1. \end{cases}$$
Furthermore, putting cases 1 and 2 together yields for arbitrary $i = (n,m,\varepsilon,0) \in I_0$ that
$$C_i := \sup_{j \in M_i} \|T_i^{-1} T_j\| = \max\big\{ C_i^{(0)}, C_i^{(1)} \big\} \leq \max\{K_1, K'\} =: K \qquad \text{and} \qquad |M_i| = \big| M_i^{(0)} \cup M_i^{(1)} \big| \leq \big| M_i^{(0)} \big| + \big| M_i^{(1)} \big| \leq N_1 + N' =: N.$$
As we saw above, this even holds for arbitrary $i \in I_0$ (i.e., without assuming that the last component of $i$ is $0$), since $M_{i'} = \{ j' \mid j \in M_i \}$ and since $C_{i'} = C_i$, cf. equation (B.1) and the ensuing paragraph.
Next, we show that $0^*$ is finite: For $i = (n,m,\varepsilon,\delta) \in I_0$ and $\binom{\xi}{\eta} \in S_i^{(\alpha)}$, we saw in Lemma 3.2 that $|(\xi,\eta)| > 2^{n-1}$. Since we clearly have $|(\xi,\eta)| < \sqrt{2}$ for $\binom{\xi}{\eta} \in S_0^{(\alpha)} = (-1,1)^2$, this implies that $S_i^{(\alpha)} \cap S_0^{(\alpha)} \neq \emptyset$ can only hold if $2^{n-1} < \sqrt{2}$, i.e., if $n \leq 1$. This implies
$$0^* \subset \{0\} \cup \{ (n,m,\varepsilon,\delta) \in I_0 \mid n \leq 1 \} \subset \{0\} \cup \bigcup_{n=0}^{1} \big[ \{n\} \times \{ -\lceil 2^{n(1-\alpha)} \rceil, \dots, \lceil 2^{n(1-\alpha)} \rceil \} \times \{\pm 1\} \times \{0,1\} \big],$$
which is clearly a finite set. In fact, since $\lceil 2^{n(1-\alpha)} \rceil \leq 2^{n} \leq 2$, we get $|0^*| \leq 1 + 2 \cdot 5 \cdot 2 \cdot 2$.
Now, for $i \in I_0$, we have $i^* \subset M_i \cup \{0\}$ and thus $|i^*| \leq N + 1$.
Furthermore, for an arbitrary $i \in I = I_0 \cup \{0\}$ we have $|i^*| \leq \max\{N + 1, |0^*|\}$ and thus $N_{\mathcal{S}^{(\alpha)}} < \infty$, i.e., $\mathcal{S}^{(\alpha)}$ is admissible.
Moreover, for $i \in I_0 \setminus 0^*$, we have $0 \notin i^*$ and thus $\sup_{j \in i^*} \|T_i^{-1} T_j\| = \sup_{j \in M_i} \|T_i^{-1} T_j\| \leq K$. Next, for $i \in 0^* \cap I_0$, we have
$$\sup_{j \in i^*} \|T_i^{-1} T_j\| \leq \sup_{\lambda \in I_0 \cap 0^*} \Big[ \max\big\{ \|T_\lambda^{-1} T_0\|,\, C_\lambda \big\} \Big] =: K_5,$$
for some fixed constant $K_5$, since $0^*$ is finite. Finally, again by finiteness of $0^*$, we also get $\sup_{j \in 0^*} \|T_0^{-1} T_j\| \leq K_6$ for a fixed constant $K_6$. Thus, in total we get
$$C_{\mathcal{S}^{(\alpha)}} = \sup_{i \in I} \sup_{j \in i^*} \|T_i^{-1} T_j\| \leq \max\{K, K_5, K_6\} < \infty.$$
All in all, we have shown that $\mathcal{S}^{(\alpha)}$ is an almost structured covering of $\mathbb{R}^2$, as claimed. □

Appendix C. The proof of Lemma 4.1
In this section, we provide the (highly technical and lengthy) proof of Lemma 4.1. For this proof, the followinglemma will turn out to be extremely useful.
Lemma C.1.
For f : R d → C and θ ∈ [0 , ∞ ) , define k f k θ := sup x ∈ R d (1 + | x | ) θ | f ( x ) | ∈ [0 , ∞ ] .Then, for each N ∈ [0 , ∞ ) and p ∈ (0 , ∞ ) , arbitrary β, L > and M ∈ R and all measurable f : R d → C wehave X k ∈ Z | βk + M | N Z βk + M + Lβk + M − L | f ( x ) | d x ! p ≤ p · N +3 · k f k p p ( N +2) · L p · (cid:0) L N (cid:1) · (cid:18) L + 1 β (cid:19) . ◭ Remark.
Note that (1 + θ ) N ≤ [2 · max { , θ } ] N ≤ N · max (cid:8) , θ N (cid:9) ≤ N · (cid:0) θ N (cid:1) for arbitrary θ ≥ , so that anapplication of the preceding lemma with N = λ for λ ∈ { , N } yields X k ∈ Z (1 + | βk + M | ) N Z βk + M + Lβk + M − L | f ( x ) | d x ! p ≤ N · X λ ∈{ ,N } X k ∈ Z | βk + M | λ Z βk + M + Lβk + M − L | f ( x ) | d x ! p ≤ N · X λ ∈{ ,N } p · λ +3 · k f k p p ( λ +2) · L p · (cid:0) L λ (cid:1) · (cid:18) L + 1 β (cid:19) ≤ p + N · N +3 · k f k p p ( N +2) · L p · (cid:0) L N (cid:1) · (cid:18) L + 1 β (cid:19) . (C.1)Here, the last step used that we have λ ≤ N and hence k f k p p ( λ +2) ≤ k f k p p ( N +2) for λ ∈ { , N } and furthermorethat L λ = 2 ≤ · (cid:0) L N (cid:1) for λ = 0 and trivially L λ ≤ · (cid:0) L N (cid:1) for λ = N . (cid:7) Proof.
Since otherwise the claim is trivial, we can assume k f k p ( N +2) < ∞ . We distinguish three cases for k ∈ Z : Case 1 : We have βk + M ≥ · L > . This implies x ≥ βk + M − L ≥ ( βk + M ) > for arbitrary x ∈ [ βk + M − L, βk + M + L ] and hence | f ( x ) | ≤ k f k p ( N +2) · (1 + | x | ) − p ( N +2) ≤ k f k p ( N +2) · (cid:18) βk + M ) (cid:19) − p ( N +2) ≤ (cid:18) (cid:19) p ( N +2) · k f k p ( N +2) · (1 + | βk + M | ) − p ( N +2) . This yields | βk + M | N Z βk + M + Lβk + M − L | f ( x ) | d x ! p ≤ (cid:18) (cid:19) N +2 k f k p p ( N +2) · (2 L ) p · (1 + | βk + M | ) − . (C.2) Case 2 : We have βk + M ≤ − · L < . This implies x ≤ βk + M + L ≤ ( βk + M ) < and hence | x | ≥ | βk + M | for arbitrary x ∈ [ βk + M − L, βk + M + L ] . This easily implies that estimate (C.2) also holdsin this case. Case 3 : We have | βk + M | ≤ · L . In this case, we have − · L ≤ βk + M ≤ · L and hence − L − Mβ ≤ k ≤ L − Mβ , nalysis vs. Synthesis Sparsity for α -Shearlets — Felix Voigtlaender & Anne Pein which implies k ∈ Z ∩ h − Mβ − Lβ , − Mβ + Lβ i . But every (closed) interval I of length R ≥ satisfies | I ∩ Z | ≤ R ,so that there are at most Lβ possible values of k for which the present case is satisfied. Hence, X k ∈ Z | βk + M |≤ L | βk + M | N | {z } ≤ (10 L ) N Z βk + M + Lβk + M − L | f ( x ) | d x ! p ≤ N k f k pL ∞ · (cid:18) Lβ (cid:19) · L N · (2 L ) p ≤ p · N · k f k p p ( N +2) · L N + p · (cid:18) Lβ (cid:19) . All in all, we arrive at X k ∈ Z | βk + M | N Z βk + M + Lβk + M − L | f ( x ) | d x ! p ≤ p N k f k p p ( N +2) L N + p · (cid:18) Lβ (cid:19) + 2 p (cid:18) (cid:19) N +2 k f k p p ( N +2) L p · X k ∈ Z | βk + M |≥ L (1 + | βk + M | ) − . 
(C.3)Now, define g : R → [0 , ∞ ] , x P k ∈ Z (1 + | β ( k + x ) | ) − and note that g is -periodic and also that X k ∈ Z | βk + M |≥ L (1 + | βk + M | ) − ≤ X k ∈ Z (1 + | βk + M | ) − = X k ∈ Z (cid:18) (cid:12)(cid:12)(cid:12)(cid:12) β (cid:18) k + Mβ (cid:19)(cid:12)(cid:12)(cid:12)(cid:12)(cid:19) − = g ( M/β ) . Our next goal is to show g ( x ) ≤ β for all x ∈ R . Since g is -periodic, it suffices to consider x ∈ [0 , . Now,we again distinguish three cases regarding k ∈ Z : Case 1 : We have k ≥ β and hence β ( k + x ) ≥ βk ≥ . This implies X k ≥ /β (1 + | β ( k + x ) | ) − ≤ X k ≥ /β ( βk ) − = β − · X k ≥ /β k − . Now, note for arbitrary y > that for n ∈ Z ≥ y , we have n ≥ y > and hence n ≥ , which implies n + 1 ≤ n , sothat we get for z ∈ [ n, n + 1] the estimate z − ≥ ( n + 1) − ≥ (2 n ) − = n − / and hence X n ∈ Z ≥ y n − = X n ≥ y Z n +1 n n − d z ≤ X n ≥ y Z n +1 n z − d z ≤ · Z ∞ y z − d z = 4 · z − − (cid:12)(cid:12)(cid:12)(cid:12) ∞ z = y = 4 y . Thus, P k ≥ /β (1 + | β ( k + x ) | ) − ≤ β − · P k ≥ /β k − ≤ β − · /β = β . Case 2 : We have k ≤ − β − , which entails − ( k + 1) ≥ β . For x ∈ [0 , , this implies β ( k + x ) ≤ β ( k + 1) ≤ β · (cid:18) − β (cid:19) = − < and hence | β ( k + x ) | = − β ( k + x ) ≥ − β ( k + 1) > , so that we get X k ∈ Z ≤− β − (1 + | β ( k + x ) | ) − ≤ X k ∈ Z ≤− β − ( − β ( k + 1)) − ( with ℓ = − ( k +1) ) = X ℓ ∈ Z ≥ /β ( βℓ ) − ( as above ) ≤ β − · /β = 4 β . Case 3 : We have − β − ≤ k ≤ β and hence k ∈ Z ∩ h − β − , β i , so that there are at most β possiblevalues of k for which this case holds. Hence, X k ∈ Z − β − ≤ k ≤ β (1 + | β ( k + x ) | ) − ≤ (cid:18) β (cid:19) . Summarizing all three cases, we easily see g ( x ) ≤ β + β + 2 (cid:16) β (cid:17) = 2 + β for all x ∈ R , as claimed. nalysis vs. 
Synthesis Sparsity for α -Shearlets — Felix Voigtlaender & Anne Pein Returning to the proof of the claim of the lemma, we recall from equation (C.3) (and the displayed equationafter that) that we have X k ∈ Z | βk + M | N Z βk + M + Lβk + M − L | f ( x ) | d x ! p ≤ p · N k f k p p ( N +2) · L N + p · (cid:18) Lβ (cid:19) + 2 p (cid:18) (cid:19) N +2 k f k p p ( N +2) · L p · g (cid:18) Mβ (cid:19) ≤ p · N +2 · k f k p p ( N +2) · (cid:20) L N + p · (cid:18) Lβ (cid:19) + L p · g ( M/β ) (cid:21) ≤ p · N +2 · k f k p p ( N +2) · L p · (cid:20) L N · (cid:18) Lβ (cid:19) + (cid:18) β (cid:19)(cid:21) ≤ p · N +3 · k f k p p ( N +2) · L p · (cid:20) L N · (cid:18) Lβ (cid:19) + (cid:18) β (cid:19)(cid:21) ≤ p · N +3 · k f k p p ( N +2) · L p · (cid:0) L N (cid:1) · max (cid:26) Lβ , β (cid:27) ≤ p · N +3 · k f k p p ( N +2) · L p · (cid:0) L N (cid:1) · (cid:18) L + 1 β (cid:19) , which completes the proof. (cid:3) The proof of Lemma 4.1 will occupy the whole remainder of this section. In fact, we divide the remainder ofthis section into several subsections, each of which handles a certain subset of the whole set of pairs ( i, j ) ∈ I .Precisely, we define for ( e, d ) ∈ {± } × { , } the set I ( e,d ) := { ( n, m, ε, δ ) ∈ I | ε = e and δ = d } . Furthermore, we set I (0) := { } and L := { } ∪ ( {± } × { , } ) . Then I = U ℓ ∈ L I ( ℓ ) , so that sup i ∈ I X j ∈ I M (0) j,i ≤ X ℓ ∈ L sup i ∈ I ( ℓ X ℓ ∈ L X j ∈ I ( ℓ M (0) j,i ≤ X ℓ ,ℓ ∈ L sup i ∈ I ( ℓ X j ∈ I ( ℓ M (0) j,i (C.4)and likewise sup j ∈ I X i ∈ I M (0) j,i ≤ X ℓ ,ℓ ∈ L sup j ∈ I ( ℓ X i ∈ I ( ℓ M (0) j,i . (C.5)Now, each of the subsections of this section handles a specific choice of ℓ , ℓ ∈ L , which in principle are cases.Luckily, it will turn out that many of these cases can be handled completely analogously, so that the actual numberof subsections is smaller.We first only consider the case ℓ , ℓ ∈ {± } × { , } . 
Then, $I^{(\ell_1)}, I^{(\ell_2)} \subset I_0$, so that $\varrho_j = \varrho_i = \varrho$ and so that $i \in I^{(\ell_1)}$ and $j \in I^{(\ell_2)}$ are of the form $i = (n,m,\varepsilon,\delta)$ and $j = (\nu,\mu,e,d)$ for certain $n, \nu \in \mathbb{N}_0$, $m, \mu \in \mathbb{Z}$ with $|m| \leq G_n$ and $|\mu| \leq G_\nu$ and certain $\varepsilon, e \in \{\pm 1\}$ and $\delta, d \in \{0,1\}$. We will keep this convention throughout the section, without mentioning it explicitly.
In the remainder of the proof, the notation $x_+ := (x)_+ := \max\{0, x\}$ for $x \in \mathbb{R}$ will be frequently useful. We immediately observe $2^{x_+} = \max\{1, 2^{x}\}$ and $\min\{1, 2^{x}\} = 2^{-(-x)_+}$.
Next, we collect two estimates concerning $\theta_1, \theta_2$ that will frequently be useful: First, if $C^{-1} \leq \eta \leq C$ for some $C \geq 1$, then $|\eta \xi| \geq C^{-1} |\xi|$ and $1 + |\eta \xi| \geq C^{-1} \cdot (1 + |\xi|)$, and thus
$$\theta_1(\eta \xi) = \min\big\{ |\eta \xi|^{M_1},\, (1 + |\eta \xi|)^{-M_2} \big\} \leq \min\big\{ C^{M_1} \cdot |\xi|^{M_1},\; C^{M_2} \cdot (1 + |\xi|)^{-M_2} \big\} \leq C^{M} \cdot \theta_1(\xi) \tag{C.6}$$
for arbitrary $\xi \in \mathbb{R}$ and $M := \max\{M_1, M_2\}$.
Finally, if $\eta \geq C$ for some $C \in (0,1]$, then $1 + |\eta \xi| \geq C \cdot |\xi| + C \geq C \cdot (1 + |\xi|)$, so that
$$\theta_2(\eta \xi) = (1 + |\eta \xi|)^{-K} \leq C^{-K} \cdot (1 + |\xi|)^{-K} = C^{-K} \cdot \theta_2(\xi) \qquad \forall\, \xi \in \mathbb{R}. \tag{C.7}$$
Now, we properly start the proof of Lemma 4.1 by distinguishing the different values of $\ell_1, \ell_2 \in L$.

C.1.
We have ℓ = ℓ = (1 , . For brevity, let ℓ := (1 , . Geometrically, the present case means that i, j ∈ I ( ℓ ) both belong to the right cone, i.e., ε = e = 1 and δ = d = 0 . Thus, we have T − j T i = (cid:18) − µ (cid:19) (cid:18) − ν
00 2 − να (cid:19) (cid:18) n
00 2 nα (cid:19) (cid:18) m (cid:19) = (cid:18) − µ (cid:19) (cid:18) n − ν
00 2 α ( n − ν ) (cid:19) (cid:18) m (cid:19) = (cid:18) − µ (cid:19) (cid:18) n − ν α ( n − ν ) m α ( n − ν ) (cid:19) = (cid:18) n − ν α ( n − ν ) m − n − ν µ α ( n − ν ) (cid:19) and hence, since α ( n − ν ) ≤ α ( n − ν ) + ≤ ( n − ν ) + and n − ν ≤ ( n − ν ) + , (cid:13)(cid:13) T − j T i (cid:13)(cid:13) ≤ · (cid:16) ( n − ν ) + + ω n,m,ν,µ (cid:17) ≤ · ( n − ν ) + · (1 + ω n,m,ν,µ ) for ω n,m,ν,µ := (cid:12)(cid:12)(cid:12) α ( n − ν ) m − n − ν µ (cid:12)(cid:12)(cid:12) , which finally yields (cid:0) (cid:13)(cid:13) T − j T i (cid:13)(cid:13)(cid:1) σ ≤ σ · σ · ( n − ν ) + · (1 + ω n,m,ν,µ ) σ . (C.8)On the other hand, with ̺, θ , θ as in equation (4.1), we have because of ̺ j = ̺ that | det T i | − · Z S ( α ) i ̺ j (cid:0) T − j ξ (cid:1) d ξ = | det T i | − · Z T i Q ̺ (cid:0) T − j ξ (cid:1) d ξ ( ξ = T i η ) = Z Q ̺ (cid:0) T − j T i η (cid:1) d η ( def. of Q, cf. Def. .
1) = Z / Z R ( − , (cid:18) η η (cid:19) · ̺ (cid:18) n − ν η (cid:0) α ( n − ν ) m − n − ν µ (cid:1) η + 2 α ( n − ν ) η (cid:19) d η d η (cid:0) ξ = η η in inner integral (cid:1) = Z η Z R ( − , ( ξ ) · θ (cid:0) n − ν η (cid:1) · θ (cid:16)(cid:16) α ( n − ν ) m − n − ν µ (cid:17) η +2 α ( n − ν ) ξη (cid:17) d ξ d η ≤ · Z / θ (cid:0) n − ν η (cid:1) · Z − (cid:16) η · (cid:12)(cid:12)(cid:12)(cid:16) α ( n − ν ) m − n − ν µ (cid:17) + 2 α ( n − ν ) ξ (cid:12)(cid:12)(cid:12)(cid:17) − K d ξ d η ( eq. (C.7) ) ≤ K +1 · Z / θ (cid:0) n − ν η (cid:1) d η · Z − (cid:16) (cid:12)(cid:12)(cid:12)(cid:16) α ( n − ν ) m − n − ν µ (cid:17) + 2 α ( n − ν ) ξ (cid:12)(cid:12)(cid:12)(cid:17) − K d ξ ( η =2 α ( n − ν ) m − n − ν µ +2 α ( n − ν ) ξ ) = 3 K +1 · α ( ν − n ) · Z / θ (cid:0) n − ν η (cid:1) d η · Z α ( n − ν ) m − n − ν µ +2 α ( n − ν ) α ( n − ν ) m − n − ν µ − α ( n − ν ) (1 + | η | ) − K d η ( eq. (C.6) ) ≤ K + M · α ( ν − n ) · θ (cid:0) n − ν (cid:1) · Z α ( n − ν ) m − n − ν µ +2 α ( n − ν ) α ( n − ν ) m − n − ν µ − α ( n − ν ) (1 + | η | ) − K d η . 
(C.9)Now, since the assumptions of Lemma 4.1 ensure K ≥ τ ( σ + 2) , an application of Lemma C.1 and of theassociated remark (with p = τ ∈ (0 , ∞ ) , β = 2 α ( n − ν ) > , N = σ ≥ , M = − n − ν µ ∈ R and L = 2 α ( n − ν ) > )yields X m ∈ Z (cid:16) (cid:12)(cid:12)(cid:12) α ( n − ν ) m + (cid:0) − n − ν µ (cid:1)(cid:12)(cid:12)(cid:12)(cid:17) σ "Z α ( n − ν ) m + ( − n − ν µ ) +2 α ( n − ν ) α ( n − ν ) m +( − n − ν µ ) − α ( n − ν ) (1+ | η | ) − K d η τ ≤ p + N N +3 · (cid:13)(cid:13)(cid:13) (1 + |•| ) − K (cid:13)(cid:13)(cid:13) p p ( N +2) · L p · (cid:0) L N (cid:1) · (cid:18) L + 1 β (cid:19)(cid:16) k (1+ |•| ) − K k p p ( N +2) ≤ since K ≥ στ (cid:17) ≤ τ + σ · σ +3 · ατ ( n − ν ) · (cid:16) α ( n − ν ) σ (cid:17) · (cid:18) α ( n − ν ) α ( n − ν ) (cid:19) ≤ τ + σ · σ +3 · ατ ( n − ν ) · ασ · ( n − ν ) + · (cid:16) α ( ν − n ) (cid:17) ≤ τ + σ · σ +3 · ατ ( n − ν )+ ασ ( n − ν ) + · α · ( ν − n ) + ≤ τ +5 σ · ατ ( n − ν )+ ασ ( n − ν ) + + α · ( ν − n ) + . (C.10) nalysis vs. Synthesis Sparsity for α -Shearlets — Felix Voigtlaender & Anne Pein Consequently, we get for arbitrary j = ( ν, µ, , ∈ I ( ℓ ) the estimate X i ∈ I ( ℓ ) "(cid:18) w sj w si (cid:19) τ (cid:0) (cid:13)(cid:13) T − j T i (cid:13)(cid:13)(cid:1) σ | det T i | − Z S ( α ) i ̺ j (cid:0) T − j ξ (cid:1) d ξ ! τ ( eqs. (C.8) , (C.9) ) ≤ X n ∈ N σ · τ (2+ K + M ) · ( τs + τα )( ν − n )+ σ ( n − ν ) + · (cid:2) θ (cid:0) n − ν (cid:1)(cid:3) τ · X m ∈ Z "(cid:16) (cid:12)(cid:12)(cid:12) α ( n − ν ) m − n − ν µ (cid:12)(cid:12)(cid:12)(cid:17) σ · Z α ( n − ν ) m − n − ν µ +2 α ( n − ν ) α ( n − ν ) m − n − ν µ − α ( n − ν ) (1+ | η | ) − K d η ! τ ( eq. (C.10) ) ≤ τ +7 σ τ (2+ K + M ) · X n ∈ N (cid:16) ( τs + τα )( ν − n )+ σ ( n − ν ) + · (cid:2) θ (cid:0) n − ν (cid:1)(cid:3) τ · ατ ( n − ν )+ ασ ( n − ν ) + + α · ( ν − n ) + (cid:17) ≤ σ + τ (5+2 K +2 M ) · X n ∈ N (cid:16) τs ( ν − n )+ α · ( ν − n ) + + σ (1+ α )( n − ν ) + · (cid:2) θ (cid:0) n − ν (cid:1)(cid:3) τ (cid:17) . 
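The summations over scales that follow repeatedly reduce to a two-sided geometric series: $\sum_{n \in \mathbb{N}_0} 2^{-c|\nu - n|} \leq \sum_{\ell \in \mathbb{Z}} 2^{-c|\ell|} = \frac{1 + 2^{-c}}{1 - 2^{-c}} \leq \frac{2}{1 - 2^{-c}}$ for every $c > 0$. A quick numerical confirmation of both the closed form and the cruder bound (illustrative sketch only):

```python
def two_sided(c, tol=1e-15):
    """Sum 2^{-c*|l|} over l in Z, truncated once the terms drop below tol."""
    total, l = 1.0, 1
    while 2.0 ** (-c * l) > tol:
        total += 2 * 2.0 ** (-c * l)   # terms for +l and -l
        l += 1
    return total

for c in (0.25, 0.5, 1.0, 2.0):
    exact = (1 + 2.0**-c) / (1 - 2.0**-c)   # closed form of the geometric series
    assert abs(two_sided(c) - exact) < 1e-9
    assert exact <= 2.0 / (1 - 2.0**-c)      # the bound used in the text
```

For instance, $c = 1$ gives $\sum_{\ell \in \mathbb{Z}} 2^{-|\ell|} = 3 \leq 4$.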
Now, observe θ (2 n − ν ) = min n ( n − ν ) M , (1 + 2 n − ν ) − M o ≤ min (cid:8) M ( n − ν ) , − M ( n − ν ) (cid:9) and hence τs ( ν − n )+ α · ( ν − n ) + + σ (1+ α )( n − ν ) + · (cid:2) θ (cid:0) n − ν (cid:1)(cid:3) τ ≤ ( −| ν − n | ( τM − τs − α ) ≤ − τc | ν − n | , if ν ≥ n, −| ν − n | ( τM + τs − σ (1+ α )) ≤ − τc | ν − n | , if ν ≤ n. Here, we used that M ≥ M (0)2 + c ≥ (1 + α ) στ − s + c , as well as M ≥ M (0)1 + c ≥ s + ατ + c by the assumptionsof Lemma 4.1. Thus, all in all, we arrive at X i ∈ I ( ℓ ) "(cid:18) w sj w si (cid:19) τ · (cid:0) (cid:13)(cid:13) T − j T i (cid:13)(cid:13)(cid:1) σ · | det T i | − Z S ( α ) i ̺ j (cid:0) T − j ξ (cid:1) d ξ ! τ ≤ σ + τ (5+2 K +2 M ) · X n ∈ N − τc | ν − n | ≤ σ + τ (5+2 K +2 M ) · X ℓ ∈ Z − τc | ℓ | ≤ σ + τ (5+2 K +2 M ) / (cid:0) − − τc (cid:1) . Likewise, for the summation over j instead of i , we apply Lemma C.1 and the associated remark (using thechoices p = τ ∈ (0 , ∞ ) , β = 2 n − ν > , N = σ ≥ , M = 2 α ( n − ν ) m ∈ R and L = 2 α ( n − ν ) > ) to get X µ ∈ Z (cid:16) (cid:12)(cid:12)(cid:12) α ( n − ν ) m + (cid:0) − n − ν µ (cid:1)(cid:12)(cid:12)(cid:12)(cid:17) σ Z α ( n − ν ) m + ( − n − ν µ ) +2 α ( n − ν ) α ( n − ν ) m +( − n − ν µ ) − α ( n − ν ) (1 + | η | ) − K d η ! τ ( ζ = − µ ) = X ζ ∈ Z (cid:16) (cid:12)(cid:12)(cid:12) n − ν ζ + 2 α ( n − ν ) m (cid:12)(cid:12)(cid:12)(cid:17) σ Z n − ν ζ +2 α ( n − ν ) m +2 α ( n − ν ) n − ν ζ +2 α ( n − ν ) m − α ( n − ν ) (1 + | η | ) − K d η ! τ ≤ p + N · N +3 · (cid:13)(cid:13)(cid:13) (1 + |•| ) − K (cid:13)(cid:13)(cid:13) p p ( N +2) · L p · (cid:0) L N (cid:1) · (cid:18) L + 1 β (cid:19) ( since K ≥ σ +2 τ ) ≤ τ + σ · σ +3 · ατ ( n − ν ) · (cid:16) σα ( n − ν ) (cid:17) · (cid:16) (1 − α )( ν − n ) + 2 ν − n (cid:17) ≤ τ +5 σ · ατ ( n − ν ) · σα · ( n − ν ) + · ( ν − n ) + , (C.11)where (cid:13)(cid:13)(cid:13) (1 + |•| ) − K (cid:13)(cid:13)(cid:13) p p ( N +2) ≤ , since K ≥ στ by the assumptions of Lemma 4.1. nalysis vs. 
Synthesis Sparsity for α -Shearlets — Felix Voigtlaender & Anne Pein Now we get as above for arbitrary i = ( n, m, , ∈ I ( ℓ ) that X j ∈ I ( ℓ ) "(cid:18) w sj w si (cid:19) τ (cid:0) (cid:13)(cid:13) T − j T i (cid:13)(cid:13)(cid:1) σ | det T i | − Z S ( α ) i ̺ j (cid:0) T − j ξ (cid:1) d ξ ! τ ( ∗ ) ≤ τ +7 σ · τ (2+ K + M ) · X ν ∈ N τs ( ν − n )+ σ · ( n − ν ) + + τα ( ν − n )+ ατ ( n − ν ) · (cid:2) θ (cid:0) n − ν (cid:1)(cid:3) τ · σα · ( n − ν ) + · ( ν − n ) + ≤ σ + τ (5+2 K +2 M ) · X ν ∈ N τs ( ν − n )+ σ (1+ α ) · ( n − ν ) + +( ν − n ) + · (cid:2) θ (cid:0) n − ν (cid:1)(cid:3) τ . Here, the step marked with ( ∗ ) is justified by equations (C.8), (C.9), and (C.11).As above, we observe τs ( ν − n )+ σ (1+ α ) · ( n − ν ) + +( ν − n ) + · (cid:2) θ (cid:0) n − ν (cid:1)(cid:3) τ ≤ ( −| ν − n | ( τM − τs − ≤ − τc | ν − n | , if ν ≥ n, −| ν − n | ( τM + τs − σ (1+ α )) ≤ − τc | ν − n | , if ν ≤ n, where we used that we have M ≥ M (0)1 + c ≥ τ + s + c and M ≥ M (0)2 + c ≥ (1 + α ) στ − s + c by the assumptionsof Lemma 4.1. Consequently, we conclude X j ∈ I ( ℓ ) "(cid:18) w sj w si (cid:19) τ (cid:0) (cid:13)(cid:13) T − j T i (cid:13)(cid:13)(cid:1) σ | det T i | − Z S ( α ) i ̺ j (cid:0) T − j ξ (cid:1) d ξ ! τ ≤ σ + τ (5+2 K +2 M ) · X ν ∈ N − τc | ν − n | ≤ σ + τ (5+2 K +2 M ) · X ℓ ∈ Z − τc | ℓ | ≤ σ + τ (5+2 K +2 M ) / (cid:0) − − τc (cid:1) . In summary, in this subsection, we have shown for C (1)0 := 2 σ + τ (5+2 K +2 M ) / (1 − − τc ) that sup i ∈ I (1 , X j ∈ I (1 , M (0) j,i ≤ C (1)0 and sup j ∈ I (1 , X i ∈ I (1 , M (0) j,i ≤ C (1)0 . C.2.
We have \ell_i = (1, 1) and \ell_j = (1, 0). Geometrically, the present case means that i belongs to the top cone, while j belongs to the right cone, i.e., e = ε = 1, δ = 1 and d = 0. In this case, we have
\[
  T_j^{-1} T_i
  = \begin{pmatrix} 1 & 0 \\ -\mu & 1 \end{pmatrix}
    \begin{pmatrix} 2^{-\nu} & 0 \\ 0 & 2^{-\nu\alpha} \end{pmatrix}
    \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}
    \begin{pmatrix} 2^{n} & 0 \\ 0 & 2^{n\alpha} \end{pmatrix}
    \begin{pmatrix} 1 & 0 \\ m & 1 \end{pmatrix}
  = \begin{pmatrix} 2^{-\nu} & 0 \\ -2^{-\nu}\mu & 2^{-\nu\alpha} \end{pmatrix}
    \begin{pmatrix} 2^{n\alpha} m & 2^{n\alpha} \\ 2^{n} & 0 \end{pmatrix}
  = \begin{pmatrix}
      2^{n\alpha-\nu} m & 2^{n\alpha-\nu} \\
      2^{n-\nu\alpha} - 2^{n\alpha-\nu}\mu m & -2^{n\alpha-\nu}\mu
    \end{pmatrix}.
\]
00 2 − να (cid:19) (cid:18) nα n (cid:19) (cid:18) m (cid:19) = (cid:18) − ν − − ν µ − να (cid:19) (cid:18) nα m nα n (cid:19) = (cid:18) nα − ν m nα − ν n − να − nα − ν µm − nα − ν µ (cid:19) . (C.12)As our first step, we want to obtain an estimate for (cid:13)(cid:13) T − j T i (cid:13)(cid:13) .To this end, recall | m | ≤ G n = (cid:6) n (1 − α ) (cid:7) ≤ n (1 − α ) ; hence | nα − ν m | ≤ nα − ν + 2 n − ν ≤ · n − ν ≤ · ( n − ν ) + .Likewise, | nα − ν µ | ≤ nα − ν + 2 nα − να ≤ · α ( n − ν ) ≤ · α ( n − ν ) + ≤ · ( n − ν ) + and nα − ν ≤ n − ν ≤ ( n − ν ) + .Finally, setting κ := µ ν (1 − α ) and ι := m n (1 − α ) , (C.13)we have n − να − nα − ν µm = 2 n − να · (cid:16) − µ ν (1 − α ) m n (1 − α ) (cid:17) = 2 n − να · (1 − κι ) =: 2 n − να · λ n,m,ν,µ , and also | κ | = | µ | ν (1 − α ) ≤ ν (1 − α ) ν (1 − α ) = 1 + 2 − ν (1 − α ) ≤ and | ι | = | m | n (1 − α ) ≤ − n (1 − α ) ≤ . nalysis vs. Synthesis Sparsity for α -Shearlets — Felix Voigtlaender & Anne Pein All in all, we have shown (cid:0) (cid:13)(cid:13) T − j T i (cid:13)(cid:13)(cid:1) σ ≤ (cid:16) · ( n − ν ) + + 2 n − να · | λ n,m,ν,µ | (cid:17) σ ≤ (cid:16) · ( n − ν ) + (cid:17) σ · (cid:0) n − να · | λ n,m,ν,µ | (cid:1) σ ≤ σ · σ · ( n − ν ) + · (cid:0) n − να · | λ n,m,ν,µ | (cid:1) σ . (C.14)Next, we consider the integral term occurring in M (0) j,i . Precisely, with ̺ and θ as in equation (4.1), we observe | det T i | − Z S ( α ) i ̺ j (cid:0) T − j ξ (cid:1) d ξ = Z Q ̺ (cid:0) T − j T i η (cid:1) d η = Z / Z η − η ̺ (cid:18) nα − ν ( mη + η )2 n − να · λ n,m,ν,µ · η − nα − ν µη (cid:19) d η d η ( with ξ = η /η ) = Z / η Z − ̺ (cid:18) nα − ν ( m + ξ ) η η · (2 n − να λ n,m,ν,µ − nα − ν µξ ) (cid:19) d ξ d η ( since η ≤ ) ≤ · Z / Z − θ (cid:0) nα − ν ( m + ξ ) η (cid:1) · (cid:0) (cid:12)(cid:12) η · (cid:0) n − να λ n,m,ν,µ − nα − ν µξ (cid:1)(cid:12)(cid:12)(cid:1) − K d ξ d η ( ≤ η ≤ , cf. eqs. 
(C.6) , (C.7) ) ≤ K + M · Z / Z − θ (cid:0) nα − ν ( m + ξ ) (cid:1) · (cid:0) (cid:12)(cid:12) n − να λ n,m,ν,µ − nα − ν µξ (cid:12)(cid:12)(cid:1) − K d ξ d η ( eq. (C.13) ) ≤ K + M · Z − θ (cid:0) nα − ν ( m + ξ ) (cid:1) · (cid:16) (cid:12)(cid:12)(cid:12) n − να (1 − κι ) − α ( n − ν ) κξ (cid:12)(cid:12)(cid:12)(cid:17) − K d ξ. (C.15)As our next step, we derive several basic estimates for the quantities appearing in equation (C.15):(1) We have θ (cid:0) nα − ν · ( m + ξ ) (cid:1) ≤ M · min n , M ( n − ν ) o = 4 M · − M · ( ν − n ) + ∀ ξ ∈ [ − , . (C.16)To see this, we consider the cases | m | ≤ and | m | ≥ . In case of | m | ≤ , we have | m + ξ | ≤ | m | + | ξ | ≤ and thus θ (cid:0) nα − ν · ( m + ξ ) (cid:1) ≤ min n , (cid:12)(cid:12) nα − ν · ( m + ξ ) (cid:12)(cid:12) M o ≤ M · min n , M ( nα − ν ) o ( since nα ≤ n and M ≥ , as well as M ≤ M ) ≤ M · min n , M ( n − ν ) o , which is even slightly better than the estimate (C.16). Next, in case of | m | ≥ , we have | m | ≤ | m | − ≤ | m | − | ξ | ≤ | m + ξ | ≤ | m | + | ξ | ≤ | m | ≤ | m | ∀ ξ ∈ [ − , , (C.17)so that equation (C.6) yields θ (cid:0) nα − ν · ( m + ξ ) (cid:1) ≤ M · θ (cid:0) nα − ν · m (cid:1) ( cf. eq. (C.13) ) = 2 M · θ (cid:0) n − ν · ι (cid:1) ≤ M · min n , (cid:12)(cid:12) n − ν · ι (cid:12)(cid:12) M o ( since | ι |≤ ) ≤ M · min n , M · M ( n − ν ) o ( since M ≤ M ) ≤ M · min n , M ( n − ν ) o ∀ ξ ∈ [ − , . We have thus established equation (C.16) in both cases. nalysis vs. 
Synthesis Sparsity for α -Shearlets — Felix Voigtlaender & Anne Pein (2) Next, in case of | κ | ≤ , we have (cid:12)(cid:12) n − να λ n,m,ν,µ − nα − ν µξ (cid:12)(cid:12) = (cid:12)(cid:12)(cid:12) n − να (1 − κι ) − α ( n − ν ) κξ (cid:12)(cid:12)(cid:12) = 2 n − να · (cid:12)(cid:12)(cid:12) − κι − − n (1 − α ) κξ (cid:12)(cid:12)(cid:12) ≥ n − να · (cid:16) − | κι | − − n (1 − α ) · | κξ | (cid:17) ( since − n (1 − α ) ≤ and | κ |≤ , as well as | ι |≤ and | ξ |≤ ) ≥ n − να · (cid:18) − − (cid:19) = 2 n − να ∀ ξ ∈ [ − , . (C.18)(3) Finally, we want to obtain an estimate similar to equation (C.18) also if | ι | ≤ . To this end, we additionallyassume α < and n ≥ − α , since this ensures − n (1 − α ) ≤ − and thus − n (1 − α ) ≤ . Consequently, (cid:12)(cid:12) n − να λ n,m,ν,µ − nα − ν µξ (cid:12)(cid:12) = (cid:12)(cid:12)(cid:12) n − να (1 − κι ) − α ( n − ν ) κξ (cid:12)(cid:12)(cid:12) ≥ n − να · (cid:16) − | κι | − − n (1 − α ) · | κξ | (cid:17) ( since | ι |≤ and − n (1 − α ) ≤ , as well as | ξ |≤ and | κ |≤ ) ≥ n − να · (cid:18) − − · (cid:19) = 2 n − να ∀ ξ ∈ [ − , . (C.19)For the last estimate above, we needed to assume α < . To avoid cumbersome case distinctions later on, we nowconsider the special case α = 1 , so that we can then assume α < for the remainder of the subsection.C.2.1. The special case α = 1 . Because of α = 1 , we simply have κ = µ and ι = m . Further, G n = (cid:6) n (1 − α ) (cid:7) = 1 for all n ∈ N , i.e., m, µ ∈ {− , , } . Consequently, we also get λ n,m,ν,µ = 1 − mµ ∈ { , , } and estimate (C.15)takes the form | det T i | − · Z S ( α ) i ̺ j (cid:0) T − j ξ (cid:1) d ξ ≤ K + M · Z − θ (cid:0) n − ν ( m + ξ ) (cid:1) · (cid:0) (cid:12)(cid:12) n − ν [1 − µ ( m + ξ )] (cid:12)(cid:12)(cid:1) − K d ξ. 
(C.20)Finally, we get because of α = 1 and λ n,m,ν,µ ∈ { , , } from equation (C.14) that (cid:0) (cid:13)(cid:13) T − j T i (cid:13)(cid:13)(cid:1) σ ≤ (cid:16) · ( n − ν ) + + 2 n − να · | λ n,m,ν,µ | (cid:17) σ ≤ (cid:16) · ( n − ν ) + + 2 · n − ν (cid:17) σ ≤ σ · σ · ( n − ν ) + . (C.21)Next, we distinguish two subcases:(1) If n − ν ≤ , then | n − ν ( m + ξ ) | ≤ · n − ν since | m | ≤ and | ξ | ≤ . Hence θ (cid:0) n − ν ( m + ξ ) (cid:1) ≤ (cid:12)(cid:12) n − ν ( m + ξ ) (cid:12)(cid:12) M ≤ M · M ( n − ν ) = 2 M · − M | n − ν | ≤ M · − M | n − ν | . (2) Otherwise, n − ν ≥ , so that there are again two subcases:(a) If | m + ξ | ≥ , then ≤ | m + ξ | ≤ , so that equation (C.6) yields θ (cid:0) n − ν ( m + ξ ) (cid:1) ≤ M · θ (cid:0) n − ν (cid:1) ≤ M · (cid:0) (cid:12)(cid:12) n − ν (cid:12)(cid:12)(cid:1) − M ≤ M · − M | n − ν | . (b) Otherwise, | m + ξ | ≤ and hence | − µ ( m + ξ ) | ≥ − | µ ( m + ξ ) | ≥ − | m + ξ | ≥ , which implies (cid:0) (cid:12)(cid:12) n − ν [1 − µ ( m + ξ )] (cid:12)(cid:12)(cid:1) − K ≤ (cid:18) · n − ν (cid:19) − K ≤ K · − K | n − ν | . All in all, we have for all ξ ∈ [ − , that θ (cid:0) n − ν ( m + ξ ) (cid:1) · (cid:0) (cid:12)(cid:12) n − ν [1 − µ ( m + ξ )] (cid:12)(cid:12)(cid:1) − K ≤ ( M + K · − M | n − ν | , if n ≤ ν M + K · − min { M ,K }| n − ν | , if n ≥ ν = 2 M + K · − M ( ν − n ) + · − min { M ,K } ( n − ν ) + nalysis vs. Synthesis Sparsity for α -Shearlets — Felix Voigtlaender & Anne Pein and thus M (0) j,i = (cid:18) w sj w si (cid:19) τ · (cid:0) (cid:13)(cid:13) T − j T i (cid:13)(cid:13)(cid:1) σ · | det T i | − · Z S ( α ) i ̺ j (cid:0) T − j ξ (cid:1) d ξ ! τ ( eqs. (C.20) and (C.21) ) ≤ sτ ( ν − n ) · σ · σ · ( n − ν ) + · h K + M · M + K · − M ( ν − n ) + · − min { M ,K } ( n − ν ) + i τ ≤ σ · τ (2+ K + M ) · ( −| n − ν | [ τ min { M ,K } + sτ − σ ] , if n − ν ≥ , −| n − ν | [ τM − sτ ] , if n − ν ≤ . 
But the assumptions of Lemma 4.1 ensure that M ≥ M (0)1 + c ≥ s + c and M , K ≥ στ − s + c , which entails τ min { M , K } + sτ − σ ≥ τ c , as well as τ M − sτ ≥ τ c , so that M (0) j,i ≤ σ · τ (2+ K + M ) · − τc | n − ν | for all i ∈ I ( ℓ ) and j ∈ I ( ℓ ) . Consequently, we get because of G n = G ν = 1 for all n, ν ∈ N that X i ∈ I ( ℓ M (0) j,i ≤ σ · τ (2+ K + M ) · ∞ X n =0 h − τc | n − ν | · · G n i ≤ · σ · τ (2+ K + M ) · X ℓ ∈ Z − τc | ℓ | ≤ · σ · τ (2+ K + M ) · − − τc =: C if α = 1 (C.22)for arbitrary j = ( ν, µ, e, d ) ∈ I ( ℓ ) . Exactly the same estimate also yields P j ∈ I ( ℓ M (0) j,i ≤ C for arbitrary i = ( n, m, ε, δ ) ∈ I ( ℓ ) , as long as α = 1 .C.2.2. The general case α ∈ [0 , . In this subsection, we first consider two special cases and then the remaininggeneral case.
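The constants of the form 2^{…}/(1 − 2^{−τc}) obtained above all arise from the two-sided geometric sum \sum_{\ell \in \mathbb{Z}} 2^{-a|\ell|} \le 2/(1 - 2^{-a}) with a = τc. This elementary identity can be checked numerically as follows (the truncation cutoff is an artifact of the check, not of the estimate):

```python
def two_sided_geometric(a, cutoff=500):
    """Truncated version of sum_{l in Z} 2^(-a|l|); the cutoff is our assumption."""
    return sum(2.0 ** (-a * abs(l)) for l in range(-cutoff, cutoff + 1))

for a in (0.1, 0.5, 1.0, 3.0):
    s = two_sided_geometric(a)
    # closed form: 1 + 2 * 2^(-a)/(1 - 2^(-a)) = (1 + 2^(-a)) / (1 - 2^(-a))
    closed_form = (1.0 + 2.0 ** (-a)) / (1.0 - 2.0 ** (-a))
    assert abs(s - closed_form) < 1e-12
    assert s <= 2.0 / (1.0 - 2.0 ** (-a))
print("sum_l 2^(-a|l|) <= 2/(1 - 2^(-a)) verified")
```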
Case 1 : n ≤ − α . In this case, equation (C.16) yields θ (cid:0) nα − ν · ( m + ξ ) (cid:1) ≤ M · min n , M ( n − ν ) o ≤ M · M − α · − M ν ∀ ξ ∈ [ − , . Furthermore, equation (C.14) entails, because of | λ n,m,ν,µ | = | − κι | ≤ , that (cid:0) (cid:13)(cid:13) T − j T i (cid:13)(cid:13)(cid:1) σ ≤ σ · σ · ( n − ν ) + · (cid:0) · n − να (cid:1) σ ≤ σ · σ · ( n − ν ) + · (cid:16) · − α (cid:17) σ (cid:0) since ( n − ν ) + = n − ν ≤ n ≤ − α for n − ν ≥ and ( n − ν ) + =0 ≤ − α otherwise (cid:1) ≤ σ · σ − α · (cid:16) · − α (cid:17) σ ≤ σ · σ − α =: C . In combination with equation (C.15), we conclude M (0) j,i = (cid:18) w sj w si (cid:19) τ · (cid:0) (cid:13)(cid:13) T − j T i (cid:13)(cid:13)(cid:1) σ · | det T i | − · Z S ( α ) i ̺ j (cid:0) T − j ξ (cid:1) d ξ ! τ ≤ C · sτ ( ν − n ) · (cid:20) K + M · Z − θ (cid:0) nα − ν ( m + ξ ) (cid:1) · (cid:0) (cid:12)(cid:12) n − να λ n,m,ν,µ − nα − ν µξ (cid:12)(cid:12)(cid:1) − K d ξ (cid:21) τ ≤ C · sτν · h · K + M · M · M − α · − M ν i τ = C · τν ( s − M ) · h · K + M · M · M − α i τ =: 2 τν ( s − M ) · C . nalysis vs. Synthesis Sparsity for α -Shearlets — Felix Voigtlaender & Anne Pein Since our assumptions imply M ≥ M (0)1 + c ≥ s + τ + c ≥ s + − ατ + c , we get − α + τ s − τ M ≤ − τ c and hence sup i =( n,m,ε,δ ) ∈ I ( ℓ with n ≤ / (1 − α ) X j ∈ I ( ℓ M (0) j,i ≤ C · ∞ X ν =0 X | µ |≤ G ν τν ( s − M ) (cid:0) since G ν = ⌈ ν (1 − α ) ⌉ ≤ ν (1 − α ) ≤ · ν (1 − α ) (cid:1) ≤ C · ∞ X ν =0 ν (1 − α + τs − τM ) ≤ C · ∞ X ν =0 − τcν = 6 C − − τc . (C.23)Furthermore, since τ ( s − M ) ≤ − α + τ s − τ M ≤ − τ c < , we also have sup j ∈ I ( ℓ X i =( n,m,ε,δ ) ∈ I ( ℓ with n ≤ / (1 − α ) M (0) j,i ≤ C · sup ν ∈ N τν ( s − M ) · X n ≤ − α X | m |≤ G n (cid:0) since G n = ⌈ (1 − α ) n ⌉ ≤ ⌈ ⌉ =8 (cid:1) ≤ C · (cid:18) − α (cid:19) · (1 + 2 · ≤ · C − α . (C.24)This completes our considerations for the special case n ≤ − α . In the remainder of this subsection, we can (andwill) thus assume n ≥ − α . 
Case 2 : We have (cid:2) | κ | ≥ (cid:3) ∧ [ | m | ≥ ∧ (cid:2) ( n ≤ ν ) ∨ (cid:0) | ι | ≥ (cid:1)(cid:3) , as well as n ≥ − α . We first show that theseconditions imply θ (cid:0) nα − ν · ( m + ξ ) (cid:1) ≤ M · | ι | M · − M ( ν − n ) + · − M ( n − ν ) + = ( M · | ι | M · − M | n − ν | , if ν ≥ n M · | ι | M · − M | n − ν | , if n > ν ∀ ξ ∈ [ − , . (C.25)To establish equation (C.25), we first note | m | ≤ | m + ξ | ≤ | m | , (cf. equation (C.17)) since | m | ≥ . Hence,equations (C.6) and (C.13) yield θ (cid:0) nα − ν ( m + ξ ) (cid:1) ≤ M · θ (cid:0) nα − ν | m | (cid:1) = 2 M · θ (cid:0) n − ν | ι | (cid:1) . Now, we distinguish the two cases that are suggested by equation (C.25):(1) In case of n ≤ ν , we get n − ν = − | n − ν | and thus θ (cid:0) nα − ν ( m + ξ ) (cid:1) ≤ M · θ (cid:0) n − ν | ι | (cid:1) ≤ M · (cid:0) n − ν · | ι | (cid:1) M = 2 M · | ι | M · − M | n − ν | = 2 M · | ι | M · − M ( ν − n ) + , which is even slightly better than equation (C.25).(2) In case of n > ν , we have | ι | ≥ , since we assume ( n ≤ ν ) ∨ (cid:0) | ι | ≥ (cid:1) . Consequently, ≤ | ι | ≤ ≤ , sothat equation (C.6) yields θ (cid:0) nα − ν ( m + ξ ) (cid:1) ≤ M · θ (cid:0) n − ν | ι | (cid:1) ≤ M M · θ (cid:0) n − ν (cid:1) ≤ M · (cid:0) n − ν (cid:1) − M ≤ M · − M | n − ν | ( since | ι |≥ ) ≤ M · M · | ι | M · − M | n − ν | ≤ M · | ι | M · − M | n − ν | = 2 M · | ι | M · − M ( n − ν ) + , which establishes equation (C.25) also in this case.We now properly start the proof: First, note that | ι | M ≤ M ≤ M , so that equation (C.25) yields the estimate θ (2 nα − ν ( m + ξ )) ≤ M · − M ( ν − n ) + · − M ( n − ν ) + for all ξ ∈ [ − , . In combination with equation (C.15), we nalysis vs. 
Synthesis Sparsity for α -Shearlets — Felix Voigtlaender & Anne Pein conclude | det T i | − Z S ( α ) i ̺ j (cid:0) T − j ξ (cid:1) d ξ ≤ K + M · Z − θ (cid:0) nα − ν ( m + ξ ) (cid:1) · (cid:16) (cid:12)(cid:12)(cid:12) n − να (1 − κι ) − α ( n − ν ) κξ (cid:12)(cid:12)(cid:12)(cid:17) − K d ξ ( with η =2 α ( n − ν ) κξ ) ≤ K +5 M − M ( ν − n ) + − M ( n − ν ) + · α ( ν − n ) | κ | Z α ( n − ν ) | κ |− α ( n − ν ) | κ | (cid:0) (cid:12)(cid:12) n − να − n − να κι − η (cid:12)(cid:12)(cid:1) − K d η ( since ι = m/ n (1 − α ) and | κ |≥ ) ≤ K +5 M α ( ν − n ) − M ( ν − n ) + − M ( n − ν ) + Z α ( n − ν ) | κ |− α ( n − ν ) | κ | (cid:16) (cid:12)(cid:12)(cid:12) n − να − α ( n − ν ) κm − η (cid:12)(cid:12)(cid:12)(cid:17) − K d η ( with ξ =2 n − να − α ( n − ν ) κm − η ) = 3 K +5 M α ( ν − n ) − M ( ν − n ) + − M ( n − ν ) + Z n − να − α ( n − ν ) κm +2 α ( n − ν ) | κ | n − να − α ( n − ν ) κm − α ( n − ν ) | κ | (1+ | ξ | ) − K d ξ. For brevity, let us set L := 2 α ( n − ν ) | κ | (which is independent of m ) and C := 6 σ · τ (4+ K +5 M ) , as well as Λ n,m,ν,µ := (cid:0) n − αν | λ n,m,ν,µ | (cid:1) σ = (cid:0) n − αν | − κι | (cid:1) σ ( eq. (C.13) ) = (cid:16) n − αν (cid:12)(cid:12)(cid:12) − n ( α − κm (cid:12)(cid:12)(cid:12)(cid:17) σ = (cid:16) (cid:12)(cid:12)(cid:12) n − αν − α ( n − ν ) κm (cid:12)(cid:12)(cid:12)(cid:17) σ . (C.26)In combination with equation (C.14), the preceding estimate yields X | m |≤ G n s.t. Case 2 holds M (0) j,i = X | m |≤ G n s.t. Case 2 holds (cid:18) w sj w si (cid:19) τ · (cid:0) (cid:13)(cid:13) T − j T i (cid:13)(cid:13)(cid:1) σ · | det T i | − · Z S ( α ) i ̺ j (cid:0) T − j ξ (cid:1) d ξ ! τ ( eq. (C.14) ) ≤ C · τs ( ν − n )+ σ ( n − ν ) + · h − M ( ν − n ) + · − M ( n − ν ) + · α ( ν − n ) i τ · X m ∈ Z Λ n,m,ν,µ Z n − να − α ( n − ν ) κm + L n − να − α ( n − ν ) κm − L (1 + | ξ | ) − K d ξ ! τ ( ℓ = − sign( κ ) · m and eq. 
(C.26) ) = C · τ [ ( s + α )( ν − n )+ στ ( n − ν ) + − M ( ν − n ) + − M ( n − ν ) + ] · X ℓ ∈ Z (cid:16) (cid:12)(cid:12)(cid:12) n − να + 2 α ( n − ν ) | κ | ℓ (cid:12)(cid:12)(cid:12)(cid:17) σ Z α ( n − ν ) | κ | ℓ +2 n − να + L α ( n − ν ) | κ | ℓ +2 n − να − L (1 + | ξ | ) − K d ξ ! τ . Now, an application of Lemma C.1 and of the associated remark (with p = τ , N = σ , β = 2 α ( n − ν ) | κ | > and L = 2 α ( n − ν ) | κ | , as well as M = 2 n − να ) yields X ℓ ∈ Z (cid:16) (cid:12)(cid:12)(cid:12) n − να + 2 α ( n − ν ) | κ | ℓ (cid:12)(cid:12)(cid:12)(cid:17) σ Z α ( n − ν ) | κ | ℓ +2 n − να + L α ( n − ν ) | κ | ℓ +2 n − να − L (1 + | ξ | ) − K d ξ ! τ ≤ τ + σ · σ · (cid:13)(cid:13)(cid:13) (1+ |•| ) − K (cid:13)(cid:13)(cid:13) τ στ · (cid:16) α ( n − ν ) | κ | (cid:17) τ · (cid:16) h α ( n − ν ) | κ | i σ (cid:17) · (cid:18) α ( ν − n ) | κ | (cid:19) ( since K ≥ στ and ≤| κ |≤ ) ≤ σ +2 τ · σ · ατ ( n − ν ) · (cid:16) ασ ( n − ν ) (cid:17) · (cid:16) · α ( ν − n ) (cid:17) ≤ σ +2 τ · σ · ατ ( n − ν ) · ασ · ( n − ν ) + · α · ( ν − n ) + . All in all, we get for C := C · σ +2 τ · σ that X | m |≤ G n s.t. Case 2 holds M (0) j,i ≤ C · ατ ( n − ν ) · ασ · ( n − ν ) + · α · ( ν − n ) + · τ [ ( s + α )( ν − n )+ στ ( n − ν ) + − M ( ν − n ) + − M ( n − ν ) + ]= C · τs ( ν − n )+ α ( ν − n ) + +(1+ α ) σ ( n − ν ) + − τM ( ν − n ) + − τM ( n − ν ) + = C · ( − τ | ν − n | [ M − (1+ α ) στ + s ] , if n ≥ ν, − τ | ν − n | [ − s − ατ + M ] , if n ≤ ν ( since M ≥ M (0)1 + c and M ≥ M (0)2 + c ) ≤ C · − τc | ν − n | . nalysis vs. Synthesis Sparsity for α -Shearlets — Felix Voigtlaender & Anne Pein As usual, this implies sup j ∈ I ( ℓ X i =( n,m,ε,δ ) ∈ I ( ℓ s.t. Case 2 holds M (0) j,i ≤ C · X ℓ ∈ Z − τc | ℓ | ≤ C − − τc . (C.27)In addition to the preceding inequality, we also need to estimate the corresponding expression where the sum istaken over j instead of over i . 
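The repeated applications of Lemma C.1 and "the associated remark" rest on a covering principle: the shifted intervals [x_0 + βℓ − L, x_0 + βℓ + L], ℓ ∈ Z, cover each point of R at most 1 + 2L/β times, which produces the factors of the form 1 + 2L/β in the bounds above. Since the statement of Lemma C.1 is not reproduced in this excerpt, this reformulation is our assumption; the sketch below checks it numerically for the integrand (1+|ζ|)^{-K}:

```python
def antideriv(z, K):
    # odd antiderivative of (1+|z|)^(-K), valid for K > 1
    sgn = 1.0 if z >= 0 else -1.0
    return sgn * (1.0 - (1.0 + abs(z)) ** (1.0 - K)) / (K - 1.0)

def shifted_sum(x0, b, L, K, cutoff=10000):
    # sum over l of the integral of (1+|z|)^(-K) over [x0 + b*l - L, x0 + b*l + L]
    return sum(antideriv(x0 + b * l + L, K) - antideriv(x0 + b * l - L, K)
               for l in range(-cutoff, cutoff + 1))

K = 3.0
total_mass = 2.0 / (K - 1.0)        # = int_R (1+|z|)^(-K) dz
for (x0, b, L) in [(0.7, 0.5, 2.0), (-3.2, 2.0, 0.25), (1.0, 1.0, 1.0)]:
    # multiplicity of the covering is at most 1 + 2L/b
    assert shifted_sum(x0, b, L, K) <= (1.0 + 2.0 * L / b) * total_mass
print("covering bound verified")
```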
To this end, we set L := 2 α ( n − ν ) for brevity and estimate similar to the precedingcase Z − (cid:16) (cid:12)(cid:12)(cid:12) n − να (1 − κι ) − α ( n − ν ) κξ (cid:12)(cid:12)(cid:12)(cid:17) − K d ξ = 2 α ( ν − n ) | κ | · Z n − να (1 − κι )+2 α ( n − ν ) | κ | n − να (1 − κι ) − α ( n − ν ) | κ | (1 + | ζ | ) − K d ζ ( since ≤| κ |≤ ) ≤ · α ( ν − n ) · Z − n − να κι +2 n − να +2 α ( n − ν ) − n − να κι +2 n − να − α ( n − ν ) (1 + | ζ | ) − K d ζ ( since κ =2 ν ( α − µ ) = 4 · α ( ν − n ) · Z − n − ν µι +2 n − να + L − n − ν µι +2 n − να − L (1 + | ζ | ) − K d ζ. Now, a combination of equations (C.15) and (C.25) yields | det T i | − Z S ( α ) i ̺ j (cid:0) T − j ξ (cid:1) d ξ ≤ K +5 M · | ι | M · − M ( ν − n ) + − M ( n − ν ) + Z − (cid:16) (cid:12)(cid:12)(cid:12) n − να (1 − κι ) − α ( n − ν ) κξ (cid:12)(cid:12)(cid:12)(cid:17) − K d ξ ≤ K +5 M · | ι | M · α ( ν − n ) − M ( ν − n ) + − M ( n − ν ) + · Z − n − ν µι +2 n − να + L − n − ν µι +2 n − να − L (1 + | ζ | ) − K d ζ. In conjunction with equations (C.14) and (C.26), this entails M (0) j,i = (cid:18) w sj w si (cid:19) τ · (cid:0) (cid:13)(cid:13) T − j T i (cid:13)(cid:13)(cid:1) σ · | det T i | − · Z S ( α ) i ̺ j (cid:0) T − j ξ (cid:1) d ξ ! τ ≤ σ · τ (4+ K +5 M ) · τ ( s + α )( ν − n )+ σ ( n − ν ) + − τM ( ν − n ) + − τM ( n − ν ) + · | ι | τM · Λ n,m,ν,µ · "Z − n − ν µι +2 n − να + L − n − ν µι +2 n − να − L (1+ | ζ | ) − K d ζ τ . For brevity, set C := 6 σ · τ (4+ K +5 M ) and C := 2 τ +2 σ · σ · C and recall from equations (C.26) and(C.13) that Λ n,m,ν,µ = (cid:0) n − αν | − κι | (cid:1) σ = (cid:0) (cid:12)(cid:12) n − αν − n − ν µι (cid:12)(cid:12)(cid:1) σ = (cid:0) (cid:12)(cid:12) n − αν − n − ν | ι | sign ( ι ) µ (cid:12)(cid:12)(cid:1) σ . (C.28)We now invoke Lemma C.1 and the associated remark (with L = 2 α ( n − ν ) , N = σ , p = τ , M = 2 n − να and β = 2 n − ν | ι | ) to justify the following estimate: X | µ |≤ G ν s.t. 
Case 2 holds M (0) j,i ≤ C · τ ( s + α )( ν − n )+ σ ( n − ν ) + − τM ( ν − n ) + − τM ( n − ν ) + · | ι | τM · X µ ∈ Z Λ n,m,ν,µ · "Z − n − ν ιµ +2 n − να + L − n − ν ιµ +2 n − να − L (1+ | ζ | ) − K d ζ τ ! ( eq. (C.28) and ℓ = − sign( ι ) µ ) = C · τ ( s + α )( ν − n )+ σ ( n − ν ) + − τM ( ν − n ) + − τM ( n − ν ) + · | ι | τM · X ℓ ∈ Z (cid:0) (cid:12)(cid:12) n − αν + 2 n − ν | ι | · ℓ (cid:12)(cid:12)(cid:1) σ · "Z n − ν | ι | ℓ +2 n − να + L n − ν | ι | ℓ +2 n − να − L (1+ | ζ | ) − K d ζ τ ! ( Lem. C. and remark ) ≤ τ + σ · σ +3 · C · τ ( s + α )( ν − n )+ σ ( n − ν ) + − τM ( ν − n ) + − τM ( n − ν ) + · | ι | τM · (cid:13)(cid:13)(cid:13) (1 + |•| ) − K (cid:13)(cid:13)(cid:13) τ στ · τ [1+ α ( n − ν )] · (cid:16) σ [1+ α ( n − ν )] (cid:17) · (cid:18) α ( n − ν ) n − ν | ι | + 2 ν − n | ι | (cid:19) ( since K ≥ στ ) ≤ C · τs ( ν − n )+(1+ α ) σ ( n − ν ) + − τM ( ν − n ) + − τM ( n − ν ) + · | ι | τM · (cid:18)
1+ 2 (1 − α )( ν − n ) | ι | + 2 ν − n | ι | (cid:19) . nalysis vs. Synthesis Sparsity for α -Shearlets — Felix Voigtlaender & Anne Pein Here, we note that we indeed have β > , since | m | ≥ > , so that ι = 0 . Now, recall | ι | ≤ and M ≥ τ , so that | ι | τM ≤ τM ≤ τM and furthermore | ι | τM | ι | = | ι | τM − ≤ τM − ≤ τM . Hence, we can continue the estimate from above as follows: X | µ |≤ G ν s.t. Case 2 holds M (0) j,i ≤ τM C · τs ( ν − n )+(1+ α ) σ ( n − ν ) + − τM ( ν − n ) + − τM ( n − ν ) + · (cid:16) (1 − α )( ν − n ) +2 ν − n (cid:17) ≤ τM C · τs ( ν − n )+(1+ α ) σ ( n − ν ) + − τM ( ν − n ) + − τM ( n − ν ) + · ( ν − n ) + . Now, set C := 2 τM C and observe τs ( ν − n )+(1+ α ) σ ( n − ν ) + − τM ( ν − n ) + − τM ( n − ν ) + · ( ν − n ) + = ( τs ( ν − n ) − τM | n − ν | · ν − n = 2 −| ν − n | ( τM − τs − , if n ≤ ν, τs ( ν − n )+(1+ α ) σ ( n − ν ) − τM | n − ν | = 2 −| ν − n | ( τs − (1+ α ) σ + τM ) , if n ≥ ν ≤ − τc | ν − n | , since the assumptions of Lemma 4.1 ensure M ≥ M (0)1 + c ≥ s + τ + c , as well as M ≥ M (0)2 + c ≥ (1 + α ) στ − s + c .All in all, we finally conclude sup i =( n,m,ε,δ ) ∈ I ( ℓ X j =( ν,µ,e,d ) ∈ I ( ℓ s.t. Case 2 holds M (0) j,i ≤ sup n ∈ N C · ∞ X ν =0 − τc | ν − n | ≤ C · X ℓ ∈ Z − τc | ℓ | ≤ C − − τc , (C.29)which completes our considerations in the present case. Case 3 : The remaining case, i.e., (cid:2) | κ | < (cid:3) ∨ [ | m | ≤ ∨ (cid:2) ( n > ν ) ∧ (cid:0) | ι | < (cid:1)(cid:3) , as well as n ≥ − α . Our first stepis to show M (0) j,i ≤ C · τs ( ν − n ) · σ · ( n − ν ) + · (cid:0) n − να | λ n,m,ν,µ | (cid:1) σ · min n , τM ( n − ν ) o · min n , τK ( να − n ) o ( ∗ ) ≤ C · τs ( ν − n ) · σ · ( n − ν ) + · − τM ( ν − n ) + · ( σ − τK )( n − να ) + (C.30)for C := 6 σ · (cid:0) K +3 M · K (cid:1) τ and C := 6 σ · C . Furthermore, as an intermediate result of independent interest,we also show (cid:12)(cid:12) n − να λ n,m,ν,µ − nα − ν µξ (cid:12)(cid:12) ≥ n − να ∀ ξ ∈ [ − , . 
(C.31)Here, the step marked with ( ∗ ) in equation (C.30) used that | λ n,m,ν,µ | ≤ , so that Λ n,m,ν,µ = (cid:0) n − να | λ n,m,ν,µ | (cid:1) σ ≤ σ · σ ( n − να ) + . (C.32)To prove equations (C.30) and (C.31), we distinguish three subcases:(1) We have | m | ≤ . Because of n ≥ − α , this implies | ι | = 2 − (1 − α ) n | m | ≤ − (1 − α ) n ≤ − = 18 ≤ , so that equation (C.19) yields | n − να λ n,m,ν,µ − nα − ν µξ | ≥ n − να for all ξ ∈ [ − , , i.e., equation (C.31)holds. Hence, a combination of equations (C.14), (C.15), (C.31), and (C.16) yields M (0) j,i = (cid:18) w sj w si (cid:19) τ · (cid:0) (cid:13)(cid:13) T − j T i (cid:13)(cid:13)(cid:1) σ · | det T i | − · Z S ( α ) i ̺ j (cid:0) T − j ξ (cid:1) d ξ ! τ ≤ σ · τs ( ν − n ) · σ · ( n − ν ) + · (cid:0) n − να · | λ n,m,ν,µ | (cid:1) σ · " K +3 M · min n , M ( n − ν ) o · (cid:18)
1+ 2 n − να (cid:19) − K τ ≤ C · τs ( ν − n ) · σ · ( n − ν ) + · (cid:0) n − να · | λ n,m,ν,µ | (cid:1) σ · min n , τM ( n − ν ) o · min n , − τK ( n − να ) o . Thus, equations (C.30) and (C.31) are valid in this case.(2) We have | κ | < . In this case, equation (C.18) yields | n − να λ n,m,ν,µ − nα − ν µξ | ≥ n − να for all ξ ∈ [ − , .Then, validity of equations (C.30) and (C.31) follows just as in the previous case. nalysis vs. Synthesis Sparsity for α -Shearlets — Felix Voigtlaender & Anne Pein (3) The remaining case, i.e., | κ | ≥ and | m | ≥ . Since we are in Case 3, this entails n > ν and | ι | < .Since we also have α < and n ≥ − α , equation (C.19) yields | n − να λ n,m,ν,µ − nα − ν µξ | ≥ n − να for all ξ ∈ [ − , , so that the desired estimates follow just as in the previous two cases.Now, we observe that n ≥ ν implies ( n − ν ) + = n − ν , as well as ( ν − n ) + = 0 and finally ( n − να ) + = n − να ,since n ≥ ν ≥ να . Consequently, equation (C.30) yields X i =( n,m,ε,δ ) ∈ I ( ℓ s.t. n ≥ ν and Case 3 holds M (0) j,i ≤ C · ∞ X n = ν X | m |≤ G n h τs ( ν − n ) · σ ( n − ν ) · ( σ − τK )( n − να ) i(cid:0) since G n = ⌈ n (1 − α ) ⌉ ≤ n (1 − α ) ≤ · n (1 − α ) (cid:1) ≤ C · ∞ X n = ν n (1 − α ) · τs ( ν − n ) · σ ( n − ν ) · ( σ − τK )( n − να ) = 6 C · ν ( τs − σ + ταK − ασ ) · ∞ X n = ν n (1 − α − τs +2 σ − τK ) ( eq. (C.34) ) ( ∗ ) ≤ C − − α − τs +2 σ − τK · ν ( τs − σ + ταK − ασ ) · ν (1 − α − τs +2 σ − τK ) ≤ C − − τc · ν (1 − α )(1+ σ − τK ) ≤ C − − τc . (C.33)Here, the last step used that K ≥ στ ≥ στ by the assumptions of Lemma 4.1, so that σ − τ K ≤ . Furthermore,the step marked with ( ∗ ) used that the assumptions of Lemma 4.1 imply K ≥ K + c ≥ − ατ + 2 στ − s + c and thus − α − τ s + 2 σ − τ K ≤ − τ c < and finally that ∞ X n = ν nφ = 2 νφ · ∞ X ℓ =0 ℓφ = 2 νφ − φ for arbitrary φ ∈ ( −∞ , . (C.34)To estimate the sum over j instead of over i , we observe again that n ≥ ν implies n ≥ ν ≥ αν . 
In combinationwith equation (C.30), this implies X j =( ν,µ,e,d ) ∈ I ( ℓ s.t. ν ≤ n and Case 3 holds M (0) j,i ≤ C · n X ν =0 X | µ |≤ G ν ( σ − τs )( n − ν ) · ( σ − τK )( n − να ) (cid:0) since G ν = ⌈ ν (1 − α ) ⌉ ≤ ν (1 − α ) ≤ · ν (1 − α ) (cid:1) ≤ C · n (2 σ − τs − τK ) · n X ν =0 ν (1 − α + τs − (1+ α ) σ + ταK ) . (C.35)To further estimate the right-hand side of this expression, we first observe that g : [0 , ∞ ) → [0 , ∞ ) , x x · − x is differentiable with derivative g ′ ( x ) = 2 − x · (1 − x · ln 2) . Hence, g ′ ( x ) > for ≤ x < and g ′ ( x ) < for x > . Consequently, g attains its unique global maximum at x = . But we have ln 2 = ln 2 ≥ ln e = and thus g ( x ) ≤ g (cid:0) (cid:1) = · − ≤ e ≤ for all x ∈ [0 , ∞ ) . For arbitrary n ∈ N and φ > , this implies n · − φn = φ · (cid:0) φn · − φn (cid:1) = φ · g ( φn ) ≤ φ and thus ( n + 1) · − φn ≤ n · − φn ≤ φ ∀ n ∈ N and φ ∈ (0 , ∞ ) . Now, set β := 1 − α + τ s − (1 + α ) σ + τ αK for brevity and note n (2 σ − τs − τK ) · n X ν =0 βν ≤ ( n (2 σ − τs − τK ) · ( n + 1) · βn = ( n + 1) · n (1 − α )( σ +1 − τK ) , if β ≥ , n (2 σ − τs − τK ) · ( n + 1) , if β < ( ∗ ) ≤ ( ( n + 1) · − n (1 − α ) τc , if β ≥ , ( n + 1) · − nτc , if β < (cid:0) since ( n +1) · − φn ≤ φ (cid:1) ≤ ( − α ) τc , if β ≥ , τc , if β < ≤ − α ) τ c , nalysis vs. Synthesis Sparsity for α -Shearlets — Felix Voigtlaender & Anne Pein where we recall that we assume α < in the present case. Furthermore, the step marked with ( ∗ ) used that theassumptions of Lemma 4.1 ensure K ≥ K + c ≥ − ατ + 2 στ − s + c ≥ στ − s + c and thus σ − τ s − τ K ≤ − τ c , aswell as K ≥ K + c ≥ στ + c , so that σ + 1 − τ K ≤ − τ c .By plugging this into equation (C.35), we obtain X j =( ν,µ,e,d ) ∈ I ( ℓ s.t. ν ≤ n and Case 3 holds M (0) j,i ≤ C · (cid:18) − α ) τ c (cid:19) ∀ i = ( n, m, ε, δ ) ∈ I ( ℓ ) . (C.36)Together, equations (C.33) and (C.36) take care of the case ν ≤ n , under the general assumptions of the currentcase. 
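Two elementary facts drive the series estimates around (C.33)–(C.36): the maximization of g(x) = x · 2^{-x} and the geometric tail formula (C.34). Both are checked numerically below; the constant 4 in the first bound is our reading of the garbled display and, for n = 0, requires φ ≤ 4:

```python
import math

# (i) g(x) = x * 2^(-x) has g'(x) = 2^(-x) * (1 - x*ln 2), so its maximum sits
#     at x = 1/ln 2 with value 1/(e * ln 2) < 1; this yields the bound
#     (n + 1) * 2^(-phi*n) <= 4/phi used in the text.
g_max = (1.0 / math.log(2)) * 2.0 ** (-1.0 / math.log(2))
assert abs(g_max - 1.0 / (math.e * math.log(2))) < 1e-12
for phi in (0.05, 0.3, 1.0, 4.0):
    for n in range(500):
        assert (n + 1) * 2.0 ** (-phi * n) <= 4.0 / phi

# (ii) the geometric tail (C.34): sum_{n >= nu} 2^(n*phi) = 2^(nu*phi) / (1 - 2^phi)
#     for phi < 0, checked with a truncation at 2000 terms.
phi, nu = -0.7, 5
tail = sum(2.0 ** (n * phi) for n in range(nu, nu + 2000))
assert abs(tail - 2.0 ** (nu * phi) / (1.0 - 2.0 ** phi)) < 1e-12
print("series estimates verified")
```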
Hence, we only need to further consider the case ν > n , which we now do.Using estimate (C.30), we get X i =( n,m,ε,δ ) ∈ I ( ℓ s.t. n<ν and Case 3 holds M (0) j,i ≤ C · ν X n =0 X | m |≤ G n τs ( ν − n ) · − τM ( ν − n ) · ( σ − τK )( n − να ) + (cid:0) since G n = ⌈ (1 − α ) n ⌉ ≤ (1 − α ) n ≤ · (1 − α ) n (cid:1) ≤ C · ν X n =0 ( τs − τM )( ν − n ) ( σ − τK ) · ( n − να ) + · n (1 − α ) . We now divide the sum into the two parts were we know the sign of n − να . First, we observe that the assumptionsof Lemma 4.1 entail σ − τ K ≤ , so that ( σ − τK ) · ( n − να ) + ≤ . Consequently, X ≤ n ≤ να ( τs − τM )( ν − n ) ( σ − τK ) · ( n − να ) + · n (1 − α ) ≤ ν ( τs − τM ) · ⌊ να ⌋ X n =0 ( τM − τs +1 − α ) n ( eq. (C.37) and τM − τs +1 − α ≥ τM − τs> ) ( ∗ ) ≤ τM − τs +1 − α τM − τs +1 − α − · ν ( τs − τM ) · ⌊ να ⌋ ( τM − τs +1 − α ) ( since τM − τs +1 − α> and ⌊ να ⌋≤ να ) ≤ τM − τs +1 − α τM − τs +1 − α − · ν ( τs − τM ) · να ( τM − τs +1 − α ) ≤ − − ( τM − τs +1 − α ) ν (1 − α )( α + τs − τM ) ( since α + τs − τM ≤ and τM − τs +1 − α ≥ τM − τs ≥ ) ≤ − − = 2 . Here, the step marked with ( ∗ ) used that the geometric sum formula shows n X ℓ =0 φℓ = 2 ( n +1) φ − φ − ≤ ( n +1) φ φ − φ φ − · nφ = 11 − − φ · nφ =: C ( φ ) · nφ for arbitrary φ > . (C.37)Now, we consider the remaining part of the sum. To this end, we first observe that the assumptions of Lemma4.1 entail M ≥ M := s + ατ . In conjunction with n ≤ ν , this implies − τ M ( ν − n ) ≤ − τ M ( ν − n ) and thus nalysis vs. Synthesis Sparsity for α -Shearlets — Felix Voigtlaender & Anne Pein − τM ( ν − n ) ≤ − τM ( ν − n ) . Consequently, X να
We have \ell_i = \ell_j = (1, 1). Geometrically, this case means that both i and j belong to the upper cone. In this case, we have i = (n, m, 1, 1) and j = (ν, μ, 1, 1) and hence
\[
  T_{\nu,\mu,1,1}^{-1} \, T_{n,m,1,1}
  = \big( R \cdot A^{(\alpha)}_{\nu,\mu,1} \big)^{-1} \cdot \big( R \cdot A^{(\alpha)}_{n,m,1} \big)
  = \big( A^{(\alpha)}_{\nu,\mu,1} \big)^{-1} \cdot A^{(\alpha)}_{n,m,1}
  = T_{\nu,\mu,1,0}^{-1} \, T_{n,m,1,0},
\]
as well as \varrho_{\nu,\mu,1,1} = \varrho_{\nu,\mu,1,0} = \varrho. Precisely as in the preceding subsection, this implies M^{(0)}_{(\nu,\mu,1,1),(n,m,1,1)} = M^{(0)}_{(\nu,\mu,1,0),(n,m,1,0)}. Then, we use that Subsection C.1 shows under the assumptions of Lemma 4.1 that
\[
  \sup_{i \in I^{(1,1)}} \sum_{j \in I^{(1,1)}} M^{(0)}_{j,i}
  = \sup_{i \in I^{(1,0)}} \sum_{j \in I^{(1,0)}} M^{(0)}_{j,i}
  \le C^{(1)}_0 \le \big[ C^{(1)}_{00} \big]^{\tau}
\]
and
\[
  \sup_{j \in I^{(1,1)}} \sum_{i \in I^{(1,1)}} M^{(0)}_{j,i}
  = \sup_{j \in I^{(1,0)}} \sum_{i \in I^{(1,0)}} M^{(0)}_{j,i}
  \le C^{(1)}_0 \le \big[ C^{(1)}_{00} \big]^{\tau}.
\]
C.5.
We have \ell_i, \ell_j \in \{-1\} \times \{0, 1\}. This case comprises all the cases considered in Subsections C.1–C.4, with the only difference that geometrically the lower and left cones are considered instead of the upper and right cones. In this case, we have i = (n, m, −1, δ) and j = (ν, μ, −1, d) and hence
\[
  T_{\nu,\mu,-1,d}^{-1} \, T_{n,m,-1,\delta}
  = (-1)^{-1} \cdot (-1) \cdot T_{\nu,\mu,1,d}^{-1} \, T_{n,m,1,\delta}
  = T_{\nu,\mu,1,d}^{-1} \, T_{n,m,1,\delta},
\]
as well as \varrho_{\nu,\mu,-1,d} = \varrho_{\nu,\mu,1,d} = \varrho. As in Subsection C.3, this implies that M^{(0)}_{(\nu,\mu,-1,d),(n,m,-1,\delta)} = M^{(0)}_{(\nu,\mu,1,d),(n,m,1,\delta)}. Hence, depending on δ and d, we get the same estimates as in Subsections C.1–C.4.

C.6. We have \ell_i \in \{-1\} \times \{0, 1\} and \ell_j \in \{1\} \times \{0, 1\}. Geometrically, this means that i belongs to the left or lower cone and j belongs to the right or upper cone. In this case, we have i = (n, m, −1, δ) and j = (ν, μ, 1, d) and hence T_{\nu,\mu,1,d}^{-1} T_{n,m,-1,\delta} = (-1) \cdot T_{\nu,\mu,1,d}^{-1} T_{n,m,1,\delta}. Consequently, we get
\[
  \big\| T_{\nu,\mu,1,d}^{-1} \, T_{n,m,-1,\delta} \big\| = \big\| T_{\nu,\mu,1,d}^{-1} \, T_{n,m,1,\delta} \big\|.
\]
Now, since we have \varrho(-\xi) = \varrho(\xi) for all \xi \in \mathbb{R}^2 and \varrho_{(\nu,\mu,1,d)} = \varrho, we finally see
\[
  \int_Q \varrho_{\nu,\mu,1,d} \big( T_{\nu,\mu,1,d}^{-1} T_{n,m,-1,\delta} \, \zeta \big) \, d\zeta
  = \int_Q \varrho_{\nu,\mu,1,d} \big( {-T_{\nu,\mu,1,d}^{-1} T_{n,m,1,\delta} \, \zeta} \big) \, d\zeta
  = \int_Q \varrho_{\nu,\mu,1,d} \big( T_{\nu,\mu,1,d}^{-1} T_{n,m,1,\delta} \, \zeta \big) \, d\zeta.
\]
As before, this implies M^{(0)}_{(\nu,\mu,1,d),(n,m,-1,\delta)} = M^{(0)}_{(\nu,\mu,1,d),(n,m,1,\delta)}, and depending on δ and d we get the same estimates as in Subsections C.1–C.4.

C.7. We have \ell_i \in \{1\} \times \{0, 1\} and \ell_j \in \{-1\} \times \{0, 1\}. Geometrically, this means that i belongs to the right or upper cone and j belongs to the left or lower cone. In this case, we have i = (n, m, 1, δ) and j = (ν, μ, −1, d) and hence T_{\nu,\mu,-1,d}^{-1} T_{n,m,1,\delta} = (-1) \cdot T_{\nu,\mu,1,d}^{-1} T_{n,m,1,\delta}. Consequently,
\[
  \big\| T_{\nu,\mu,-1,d}^{-1} \, T_{n,m,1,\delta} \big\| = \big\| T_{\nu,\mu,1,d}^{-1} \, T_{n,m,1,\delta} \big\|.
\]
Now, since ̺ ν,µ, − ,d = ̺ ν,µ, ,d = ̺ and since ̺ ( − ξ ) = ̺ ( ξ ) for all ξ ∈ R , we get Z Q ̺ ν,µ, − ,d ) (cid:16) T − ν,µ, − ,d T n,m, ,δ ζ (cid:17) d ζ = Z Q ̺ ν,µ, ,d (cid:16) − T − ν,µ, ,d T n,m, ,δ ζ (cid:17) d ζ = Z Q ̺ ν,µ, ,d (cid:16) T − ν,µ, ,d T n,m, ,δ ζ (cid:17) d ζ. As before this implies M (0)( ν,µ, − ,d ) , ( n,m, ,δ ) = M (0)( ν,µ, ,d ) , ( n,m, ,δ ) and depending on δ and d we get the same estimatesas in Subsections C.1–C.4.C.8. We have ℓ = 0 and ℓ ∈ {± } × { , } . In this case, we have for j = ( ν, µ, e, d ) ∈ I ( ℓ ) ⊂ I and i ∈ I ( ℓ ) = I (0) = { } that (cid:13)(cid:13) T − j T i (cid:13)(cid:13) = (cid:13)(cid:13) T − j (cid:13)(cid:13) = (cid:13)(cid:13)(cid:13)(cid:13) e · (cid:16) A ( α ) ν,µ, (cid:17) − · R − d (cid:13)(cid:13)(cid:13)(cid:13) = (cid:13)(cid:13)(cid:13)(cid:13)(cid:16) A ( α ) ν,µ, (cid:17) − (cid:13)(cid:13)(cid:13)(cid:13) = (cid:13)(cid:13)(cid:13)(cid:13)(cid:18) − ν − − ν µ − αν (cid:19)(cid:13)(cid:13)(cid:13)(cid:13) ≤ · max (cid:8) , − ν | µ | (cid:9) . But because of | µ | ≤ G ν = ⌈ (1 − α ) ν ⌉ ≤ ⌈ ν ⌉ = 2 ν , we have − ν | µ | ≤ , which yields (cid:13)(cid:13) T − j T i (cid:13)(cid:13) ≤ .Next, recall that S ( α )0 = Q ′ = ( − , . Because of − ( − , = ( − , , this implies in case of d = 0 that T − j S ( α )0 = (cid:16) A ( α ) ν,µ, (cid:17) − ( − , = (cid:26)(cid:18) − ν − − ν µ − αν (cid:19) (cid:18) ξ ξ (cid:19) (cid:12)(cid:12)(cid:12)(cid:12) ξ , ξ ∈ ( − , (cid:27) = (cid:8) ( η , η ) ∈ R (cid:12)(cid:12) η ∈ (cid:0) − − ν , − ν (cid:1) , η ∈ (cid:0) − − αν , − αν (cid:1) − µη (cid:9) . Likewise, since R = R − and since R ( − , = ( − , , we also get in case of d = 1 that T − j S ( α )0 = (cid:16) A ( α ) ν,µ, (cid:17) − R ( − , = (cid:16) A ( α ) ν,µ, (cid:17) − ( − , = (cid:8) ( η , η ) ∈ R (cid:12)(cid:12) η ∈ (cid:0) − − ν , − ν (cid:1) , η ∈ (cid:0) − − αν , − αν (cid:1) − µη (cid:9) . nalysis vs. 
Synthesis Sparsity for α -Shearlets — Felix Voigtlaender & Anne Pein Consequently, we get in all cases that | det T | − · Z S ( α )0 ̺ j (cid:0) T − j ξ (cid:1) d ξ = | det T j | · Z T − j S ( α )0 ̺ ( η ) d η ≤ (1+ α ) ν · Z − ν − − ν θ ( η ) · Z − µη +2 − αν − µη − − αν (1 + | η | ) − K d η d η . But for | η | ≤ − ν , we have θ ( η ) ≤ | η | M ≤ − M ν , so that | det T | − · Z S ( α )0 ̺ (cid:0) T − j ξ (cid:1) d ξ ≤ − M ν · (1+ α ) ν · Z − ν − − ν Z − µη +2 − αν − µη − − αν d η d η ≤ · − M ν . All in all, this implies M (0) j, = (cid:18) w sj w s (cid:19) τ · (cid:0) (cid:13)(cid:13) T − j T (cid:13)(cid:13)(cid:1) σ · | det T | − Z S ( α )0 ̺ j (cid:0) T − j ξ (cid:1) d ξ ! τ ≤ σ · τsν · τ · − M τν , which yields X j ∈ I ( ℓ M (0) j, = ≤ τ + σ · ∞ X ν =0 X | µ |≤ G ν τν ( s − M ) ( since G ν = ⌈ (1 − α ) ν ⌉≤ (1 − α ) ν ≤ · (1 − α ) ν ) ≤ τ + σ · ∞ X ν =0 ν [ τ ( s − M )+(1 − α )] ≤ τ + σ · ∞ X ν =0 − ντc ≤ τ +2 σ − − τc =: C (4)0 , since the assumptions of Lemma 4.1 entail M ≥ M (0)1 + c ≥ s + τ + c ≥ s + − ατ + c .Likewise, we get sup j ∈ I ( ℓ X i ∈ I (0) M (0) j,i = sup j ∈ I ( ℓ M (0) j, ≤ X j ∈ I ( ℓ M (0) j, ≤ C (4)0 . Finally, we see as at the end of Subsection C.2 that h C (4)0 i /τ ≤ (cid:18) − − τ c (cid:19) /τ · τ +2+2 στ ≤ (cid:18) − − τ c (cid:19) /τ · τ +2+2 ω =: C (4)00 , where C (4)00 only depends on α, τ , ω, c, K, H, M , M .C.9. We have ℓ = 0 and ℓ ∈ {± } × { , } . In this case, we have for i = ( n, m, ε, δ ) ∈ I ( ℓ ) ⊂ I and j ∈ I ( ℓ ) = I (0) = { } that (cid:13)(cid:13) T − j T i (cid:13)(cid:13) = 1 + k T i k = 1 + (cid:13)(cid:13)(cid:13)(cid:13)(cid:18) n nα m nα (cid:19)(cid:13)(cid:13)(cid:13)(cid:13) ≤ · n , since | nα m | ≤ nα G n ≤ nα (cid:0) n (1 − α ) + 1 (cid:1) ≤ · n .Furthermore, we note λ d ( Q ) ≤ , since Q = Q ′ i = U ( − , ) ( − , ⊂ (cid:0) , (cid:1) × ( − , . Thus, | det T i | − · Z S ( α ) i ̺ j (cid:0) T − j ξ (cid:1) d ξ = Z Q ̺ ( T i η ) d η ≤ · sup η ∈ Q ̺ ( T i η ) . 
(C.41)Now, we distinguish the cases δ = 0 and δ = 1 :(1) For δ = 0 , we have T i η = (cid:18) ε · n η ε · (2 nα mη + 2 nα η ) (cid:19) for η = ( η , η ) ∈ R . But for η ∈ Q , we have < η < and hence n η ≥ n / , so that we get ̺ ( T i η ) ≤ (1 + | ε · n η | ) − H ≤ H · − Hn , nalysis vs. Synthesis Sparsity for α -Shearlets — Felix Voigtlaender & Anne Pein which yields | det T i | − · R S ( α ) i ̺ j (cid:0) T − j ξ (cid:1) d ξ ≤ · H · − Hn by virtue of equation (C.41).(2) For δ = 1 , we have T i η = (cid:18) ε · (2 nα mη + 2 nα η ) ε · n η (cid:19) for η = ( η , η ) ∈ R . Again, for η ∈ Q , we have n η ≥ n / and hence ̺ ( T i η ) ≤ (1 + | ε · n η | ) − H ≤ H · − Hn , which as above yields | det T i | − · R S ( α ) i ̺ j (cid:0) T − j ξ (cid:1) d ξ ≤ · H · − Hn .In total, we get for each case the estimate M (0)0 ,i = (cid:18) w s w si (cid:19) τ · (cid:0) (cid:13)(cid:13) T − T i (cid:13)(cid:13)(cid:1) σ · | det T i | − · Z S ( α ) i ̺ (cid:0) T − ξ (cid:1) d ξ ! τ ≤ − sτn · σ · nσ · τ · Hτ · − τHn ≤ σ +5 τ +2 Hτ · nτ ( στ − s − H ) . Thus, we get on the one hand X i ∈ I ( ℓ M (0)0 ,i ≤ σ +5 τ +2 Hτ · ∞ X n =0 X | m |≤ G n nτ ( στ − s − H ) (cid:0) since G n = ⌈ n (1 − α ) ⌉ ≤ n (1 − α ) ≤ · n (1 − α ) (cid:1) ≤ σ +5 τ +2 Hτ · ∞ X n =0 n (1 − α ) nτ ( στ − s − H ) ≤ σ +5 τ +2 Hτ · − − cτ =: C (5)0 , since the assumptions of Lemma 4.1 imply H ≥ H + c = − ατ + στ − s + c .Likewise, the summation over j yields sup i ∈ I ( ℓ X j ∈ I (0) M (0) j,i = sup i ∈ I ( ℓ M (0)0 ,i ≤ X i ∈ I ( ℓ M (0)0 ,i ≤ C (5)0 . Finally, we get as at the end of Subsection C.2 that h C (5)0 i /τ ≤ (cid:18) − − cτ (cid:19) /τ · τ +3 στ +5+2 H ≤ (cid:18) − − cτ (cid:19) /τ · τ +3 ω +5+2 H =: C (5)00 , where C (5)00 only depends on α, τ , ω, c, K, H, M , M .C.10. We have ℓ = ℓ = 0 . 
Here, the sum and the supremum reduce to a single term, namely to
$$M^{(0)}_{0,0} = \Big(\frac{w_0^s}{w_0^s}\Big)^{\tau} \cdot \big(1 + \|T_0^{-1} T_0\|\big)^{\sigma} \cdot \Big(|\det T_0|^{-1} \int_{S_0^{(\alpha)}} \varrho_0\big(T_0^{-1}\xi\big)\, d\xi\Big)^{\tau} \;\overset{\text{since } Q_0' = (-1,1)^2}{\leq}\; 2^{\sigma} \cdot \Big(\int_{Q_0'} (1+|\xi|)^{-H}\, d\xi\Big)^{\tau} \leq 2^{\sigma} \cdot \big[\lambda(Q_0')\big]^{\tau} \leq 2^{\sigma} \cdot 4^{\tau} =: C_0^{(6)},$$
where $\big[C_0^{(6)}\big]^{1/\tau} \leq 2^{\sigma/\tau} \cdot 4 \leq 4 \cdot 2^{\omega} =: C_{00}^{(6)}$.

C.11. Completing the proof of Lemma 4.1.
By recalling equations (C.4) and (C.5) and by collecting our results from Subsections C.1–C.10, we finally conclude that
$$\max\Big\{ \sup_{j \in I} \sum_{i \in I} M^{(0)}_{j,i},\ \sup_{i \in I} \sum_{j \in I} M^{(0)}_{j,i} \Big\} \leq C \cdot \Big( \max\big\{ C^{(1)}_{00}, C^{(2)}_{00}, C^{(3)}_{00}, C^{(4)}_{00}, C^{(5)}_{00}, C^{(6)}_{00} \big\} \Big)^{\tau},$$
with a constant $C > 0$ accounting for the finitely many subcases considered above, given that the assumptions of Lemma 4.1 are fulfilled. This easily yields the claim of Lemma 4.1. □

Appendix D. The proof of Proposition 6.2 in the general case
Recall that the parameter α used in the definition of the α-shearlet smoothness spaces $S^{p,q}_{\alpha,s}(\mathbb{R}^2)$ satisfies $\alpha \in [0,1]$, as for the theory of α-molecules developed in [37] and for α-curvelets [38]. In contrast, there is a definition of cone-adapted β-shearlets (cf. [37, Definition 3.10]) for $\beta \in (1,\infty)$.
In this section, we introduce so-called reciprocal β-shearlet smoothness spaces $S^{p,q}_{\beta,s}(\mathbb{R}^2)$, which will turn out to be the smoothness spaces associated to β-shearlets. Our main goal is to show $S^{p,q}_{\beta,s}(\mathbb{R}^2) = S^{p,q}_{\beta^{-1},s}(\mathbb{R}^2)$ for $\beta \in (1,\infty)$, i.e., the reciprocal β-shearlet smoothness spaces coincide with the usual α-shearlet smoothness spaces for $\alpha = \beta^{-1}$. This will allow us to transfer approximation results that are known for β-shearlets to approximation results for α-shearlets, which is not entirely trivial, since the two definitions differ quite heavily, even for $\beta = 2$; see also the discussion before Definition 5.6. Once this transference from β-shearlets to α-shearlets is established, we use it to prove Proposition 6.2 in the general case.
We begin with the definition of the reciprocal β-shearlet covering:

Definition D.1.
For $\beta \in (1,\infty)$, define
$$J_0 := J_0^{(\beta)} := \big\{ (j,\ell,\delta) \in \mathbb{N}_0 \times \mathbb{Z} \times \{0,1\} \ \big|\ |\ell| \leq H_j \big\} \quad\text{with}\quad H_j := H_j^{(\beta)} := \big\lceil 2^{j(\beta-1)/2} \big\rceil.$$
Furthermore, recall the matrices $S_x$, $D_b^{(\alpha)}$ and $R$ from equation (1.5), and define
$$Y_{j,\ell,\delta} := Y_{j,\ell,\delta}^{(\beta)} := R^{\delta} \cdot D^{(1/\beta)}_{2^{\beta j/2}} \cdot S_{\ell}^{T} \quad\text{for } (j,\ell,\delta) \in J_0$$
and
$$P_j' := P := U^{(\mu_0^{-1},\,\mu_0)}_{(-3,3)} \cup \big( -U^{(\mu_0^{-1},\,\mu_0)}_{(-3,3)} \big) \quad\text{for } j \in J_0,$$
with $U^{(\gamma,\mu)}_{(a,b)}$ as in equation (3.1) and with $\mu_0 := \mu_0^{(\beta)} := 3 \cdot 2^{\beta/2}$.
Finally, define $J := J^{(\beta)} := \{0\} \uplus J_0$, set $c_j := 0$ for all $j \in J$ and $Y_0 := Y_0^{(\beta)} := \mathrm{id}$, as well as $P_0' := (-1,1)^2$. Then, the reciprocal β-shearlet covering is defined as
$$\mathcal{S}^{(\beta)} := \big( S_j^{(\beta)} \big)_{j \in J} := \big( Y_j^{(\beta)} P_j' \big)_{j \in J} = \big( Y_j^{(\beta)} P_j' + c_j \big)_{j \in J}. \qquad \blacktriangleleft$$
Remark.
The notation $\mathcal{S}^{(\beta)}$ for the reciprocal β-shearlet covering might appear to be ambiguous with the notation $\mathcal{S}^{(\alpha)}$ for the α-shearlet covering introduced in Definition 3.1, but there is no real ambiguity: The parameter β in the preceding definition always satisfies $\beta \in (1,\infty)$, while the parameter α from Definition 3.1 satisfies $\alpha \in [0,1]$, so that no confusion is possible.
As for the usual α-shearlet covering, our first goal is to show that $\mathcal{S}^{(\beta)}$ is an almost structured covering of $\mathbb{R}^2$. In this case, however, it will turn out to be useful to show the following slightly more general result:

Lemma D.2.
Let $\beta \in (1,\infty)$, $a, b \in \mathbb{R}$ and $\gamma, \mu, A \in (0,\infty)$ be arbitrary and let $U := U^{(\gamma,\mu)}_{(a,b)} \cup \big( -U^{(\gamma,\mu)}_{(a,b)} \big)$, as well as $U_0' := (-A,A)^2$. Define $U_j' := U$ for $j \in J_0$ and consider the family $\mathcal{U} := (U_j)_{j \in J} := \big( Y_j^{(\beta)} U_j' \big)_{j \in J}$. Then there are constants $N \in \mathbb{N}$ and $C, L \geq 1$ (depending on $\beta, a, b, \gamma, \mu, A$) such that the following are true:
(1) We have $L^{-1} \cdot 2^{\beta n/2} \leq |\xi| \leq L \cdot 2^{\beta n/2}$ for all $\xi \in U_{n,m,\varepsilon}$ and arbitrary $(n,m,\varepsilon) \in J_0$.
(2) We have $|i^{\ast}| \leq N$ for all $i \in J$, where $i^{\ast} := \{ j \in J \mid U_j \cap U_i \neq \emptyset \}$.
(3) We have $\| Y_i^{-1} Y_j \| \leq C$ for all $i \in J$ and $j \in i^{\ast}$. ◭

Proof.
The proof uses the same ideas as that of Lemma 3.3 and is only provided here for completeness.Set c := max {| a | , | b |} and note U ( γ,µ )( a,b ) ⊂ U ( γ,µ )( − c,c ) , so that we can assume a = − c and b = c , since the claim of thelemma is stronger the larger the set U ( γ,µ )( a,b ) is. By even further enlarging this set, we can also assume c ≥ . Withthe same reasoning, we can assume A ≥ .Next, note with U ( κ,λ )( B,C ) as in equation (3.1) that V ( κ,λ )( B,C ) := U ( κ,λ )( B,C ) ∪ (cid:16) − U ( κ,λ )( B,C ) (cid:17) = (cid:26)(cid:18) ξη (cid:19) ∈ R ∗ × R (cid:12)(cid:12)(cid:12)(cid:12) | ξ | ∈ ( κ, λ ) and ηξ ∈ ( B, C ) (cid:27) (D.1)for arbitrary B, C ∈ R and κ, λ > . It is now an easy consequence of equation (3.2) and of a = − c and b = c that U n,m, = V ( βn/ γ, βn/ µ )( n (1 − β ) / ( m − c ) , n (1 − β ) / ( m + c ) ) ∀ ( n, m, ∈ J . (D.2) nalysis vs. Synthesis Sparsity for α -Shearlets — Felix Voigtlaender & Anne Pein Now, since we have m + c ≤ | m | + c and m − c ≥ − | m | − c = − ( | m | + c ) , we get for arbitrary (cid:0) ξη (cid:1) ∈ U ( n,m, because of | m | ≤ (cid:6) n ( β − (cid:7) ≤ n ( β − + 1 that (cid:12)(cid:12)(cid:12)(cid:12) ηξ (cid:12)(cid:12)(cid:12)(cid:12) < n (1 − β ) ( | m | + c ) ≤ n (1 − β ) (cid:16) n ( β − + 1 + c (cid:17) ≤ c + 2 ≤ c. (D.3)Here, we used that n (1 − β ) ≤ , since β > . Consequently, we get γ · β n ≤ | ξ | ≤ (cid:12)(cid:12)(cid:12)(cid:12)(cid:18) ξη (cid:19)(cid:12)(cid:12)(cid:12)(cid:12) ≤ | ξ | + | η | ≤ (1 + 3 c ) · | ξ | < β n · µc. This establishes the first part of the lemma for L := max (cid:8) γ − , µc, (cid:9) , since we have U n,m, = R · U n,m, and | Rξ | = | ξ | for all ξ ∈ R .Now, let i = ( n, m, δ ) ∈ J be fixed and let ( j, ℓ, ε ) ∈ J such that there is some (cid:0) ξη (cid:1) ∈ U n,m,δ ∩ U j,ℓ,ε = ∅ . 
Inthe following, we want to derive conditions on ( j, ℓ, ε ) which allow us to estimate the set i ∗ , as well as the norm (cid:13)(cid:13) Y − i Y j (cid:13)(cid:13) .First of all, set M := l β · log (cid:0) L (cid:1)m ∈ N , so that M ≥ β · log ( L ) and thus β M ≥ log ( L ) = L . Conse-quently, the first part of the lemma implies L − · β j ≤ (cid:12)(cid:12)(cid:0) ξη (cid:1)(cid:12)(cid:12) ≤ L · β n and thus β ( j − n ) ≤ L ≤ β M , whichentails j − n ≤ M . By symmetry, we in fact get | j − n | ≤ M and thus j ∈ { n − M, . . . , n + M } .In order to establish further conditions on ( j, ℓ, ε ) , we distinguish several cases depending on ε, δ : Case 1 : We have ε = δ = 0 . In this case, equation (D.2) shows − β n ( m − c ) < ηξ < − β n ( m + c ) and − β j ( ℓ − c ) < ηξ < − β j ( ℓ + c ) . By rearranging, this implies for C := (cid:16) β − M + 1 (cid:17) · c that ℓ < β − ( j − n ) ( m + c ) + c ≤ β − ( j − n ) m + C , as well as ℓ > β − ( j − n ) ( m − c ) − c ≥ β − ( j − n ) m − C . Consequently, with Γ n,m,t := Z ∩ h β − ( t − n ) m − C , β − ( t − n ) m + C i , we have established ( j, ℓ, ε ) ∈ S n + Mt = n − M [ { t } × Γ n,m,t × { } ] . But since every (closed) interval I = [ B, D ] satisfies | I ∩ Z | ≤ D − B , we have | Γ n,m,t | ≤ C and thus |{ ( j, ℓ, ∈ J | U j,ℓ, ∩ U n,m, = ∅ }| ≤ n + M X t = n − M |{ t } × Γ n,m,t × { }| ≤ (1 + 2 M ) · (1 + 2 C ) . (D.4)Furthermore, a direct computation shows Y − n,m, Y j,ℓ, = β ( j − n ) j − n ℓ − β ( j − n ) m j − n ! . But thanks to | j − n | ≤ M , we have ≤ β ( j − n ) ≤ β M and ≤ j − n ≤ M . Finally, we saw above that (cid:12)(cid:12)(cid:12) ℓ − β − ( j − n ) m (cid:12)(cid:12)(cid:12) ≤ C , so that (cid:12)(cid:12)(cid:12) j − n ℓ − β ( j − n ) m (cid:12)(cid:12)(cid:12) = 2 j − n (cid:12)(cid:12)(cid:12) ℓ − β − ( j − n ) m (cid:12)(cid:12)(cid:12) ≤ M C . All in all, this implies (cid:13)(cid:13) Y − n,m, · Y j,ℓ, (cid:13)(cid:13) ≤ β M + 2 M + 2 M C and thus concludes our considerations for the presentcase. 
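The closed form of the transition matrix $Y^{-1}_{n,m,0} Y_{j,\ell,0}$ derived in Case 1 can be spot-checked numerically. The following sketch is illustrative only (the parameter values and the helper `Y` are ours, with `numpy` assumed); it verifies that the transition matrix is lower triangular with diagonal $\big(2^{\beta(j-n)/2},\, 2^{(j-n)/2}\big)$ and lower-left entry $2^{(j-n)/2}\ell - 2^{\beta(j-n)/2}m$.

```python
import numpy as np

def Y(j, l, beta):
    """Y_{j,l,0} = D^{(1/beta)}_{2^(beta*j/2)} . S_l^T (no coordinate swap, i.e. delta = 0)."""
    s = 2.0**(beta * j / 2)
    return np.diag([s, s**(1.0 / beta)]) @ np.array([[1.0, 0.0], [l, 1.0]])

beta = 1.5
for (n, m), (j, l) in [((4, 2), (5, 3)), ((6, -1), (4, 0)), ((3, 1), (3, -2))]:
    # transition matrix Y_{n,m,0}^{-1} . Y_{j,l,0}
    trans = np.linalg.solve(Y(n, m, beta), Y(j, l, beta))
    expected = np.array([
        [2.0**(beta * (j - n) / 2), 0.0],
        [2.0**((j - n) / 2) * l - 2.0**(beta * (j - n) / 2) * m, 2.0**((j - n) / 2)],
    ])
    assert np.allclose(trans, expected)
```

The assertions pass because the shear factors cancel via $(S_m^T)^{-1} D_n^{-1} D_j S_\ell^T$, exactly as in the computation above.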
Case 2 : We have ε = 1 and δ = 0 . In this case, a direct calculation shows Y − n,m, Y j,ℓ, = (cid:18) ( j − βn ) ℓ ( j − βn ) ( βj − n ) − ( j − βn ) mℓ − ( j − βn ) m (cid:19) . (D.5) nalysis vs. Synthesis Sparsity for α -Shearlets — Felix Voigtlaender & Anne Pein We immediately recall that | m | ≤ (cid:6) n ( β − (cid:7) ≤ n ( β − ≤ · n ( β − and likewise | ℓ | ≤ · j ( β − . In conjunctionwith | n − j | ≤ M and β > , this implies (cid:12)(cid:12)(cid:12) ( j − βn ) ℓ (cid:12)(cid:12)(cid:12) ≤ · ( j − βn ) j ( β − = 2 · β ( j − n ) ≤ · β M , (cid:12)(cid:12)(cid:12) ( j − βn ) (cid:12)(cid:12)(cid:12) ≤ ( j − n ) n (1 − β ) ≤ ( j − n ) ≤ M , (cid:12)(cid:12)(cid:12) − ( j − βn ) m (cid:12)(cid:12)(cid:12) ≤ · ( j − βn ) n ( β − = 2 · ( j − n ) ≤ · M . (D.6)In order to estimate the remaining entry of Y − n,m, Y j,ℓ, and to obtain an estimate similar to equation (D.4), wehave to work harder. To this end, define K := min n(cid:0) · c (cid:1) − , (cid:0) βM · c (cid:1) − o ∈ (0 , and n := 2 β − · log (cid:0) K − (cid:1) ∈ (0 , ∞ ) . (D.7)Based on these quantities, we now distinguish two subcases: Case 2(a) : We have n ≥ M + n . First note that this implies j ≥ n − M ≥ n . Furthermore, we have n ( β − = 2 log ( K − ) = K − and thus n ( β − ≥ K − and j ( β − ≥ K − . Next, note that equation (D.3) impliesbecause of (cid:0) ξη (cid:1) ∈ U n,m, that | η/ξ | < c . Likewise, since (cid:18) ηξ (cid:19) = R (cid:18) ξη (cid:19) ∈ R · U j,ℓ, = RR · U j,ℓ, = U j,ℓ, , (D.8)another application of equation (D.3) shows η = 0 and | ξ/η | < c , so that (3 c ) − < | η/ξ | < c .We now claim that this implies | m | > c . Indeed, if this was false, we would get from equation (D.2) because of n (1 − β ) ≤ K that (3 c ) − < (cid:12)(cid:12)(cid:12)(cid:12) ηξ (cid:12)(cid:12)(cid:12)(cid:12) < n (1 − β ) · ( | m | + c ) ≤ c · n (1 − β ) ≤ cK ≤ c · c = 12 · c < (3 c ) − , a contradiction. Because of | m | > c we either have m > c or m < − c . 
Let us now set C := 2 βM · c and distinguishthese two subcases: Case 2(a)(i) : We have m > c . We first claim that this implies m ≥ n ( β − − C . To see this, assume towardsa contradiction that m < n ( β − − C . But equation (D.2) shows because of (cid:0) ξη (cid:1) ∈ U n,m, that < n (1 − β ) · ( m − c ) < ηξ < n (1 − β ) · ( m + c ) < n (1 − β ) · (cid:16) n ( β − − C + c (cid:17) . By taking reciprocals and by noting C ≥ c > c , we arrive at ξη > n ( β − n ( β − − C + c = 1 + C − c n ( β − − C + c > C − c n ( β − . But another application of equations (D.2) and (D.8) shows because of | ℓ | ≤ (cid:6) j ( β − / (cid:7) ≤ j ( β − / that ξη < j (1 − β ) ( ℓ + c ) ≤ j (1 − β ) (cid:16) j ( β − + 1 + c (cid:17) ≤ c j ( β − . A combination of the last two displayed equations finally yields C − c n ( β − < c j ( β − and thus C < c + 2 β − ( n − j ) · (1 + c ) ≤ c + 2 β − M · c ≤ c + 2 βM · c ≤ βM · c = C , a contradiction. Here, we used that | n − j | ≤ M and that c ≥ . This contradiction shows m ≥ n ( β − − C .Now, we claim similarly that ℓ ≥ j ( β − − C . To see this, assume towards a contradiction that ℓ < j ( β − − C .Recall from equation (D.2) and because of m > c that ηξ > n (1 − β ) · ( m − c ) > , so that also ξη > . Now, anapplication of equations (D.2) and (D.8) shows < ξη < j (1 − β ) · ( ℓ + c ) < j (1 − β ) · (cid:16) j ( β − − C + c (cid:17) . By taking reciprocals, we get as above because of C ≥ c > c that ηξ > j ( β − j ( β − − C + c = 1 + C − c j ( β − − C + c > C − c j ( β − . nalysis vs. Synthesis Sparsity for α -Shearlets — Felix Voigtlaender & Anne Pein But equation (D.2) shows because of (cid:0) ξη (cid:1) ∈ U n,m, and since | m | ≤ (cid:6) n ( β − / (cid:7) ≤ n ( β − / that ηξ < n (1 − β ) ( m + c ) ≤ n (1 − β ) (cid:16) n ( β − + 1 + c (cid:17) ≤ c n ( β − . Again, by combining the preceding two displayed equations, we obtain a contradiction.We have thus shown ℓ ≥ j ( β − − C ≥ ⌈ j ( β − ⌉ − (1 + C ) ≥ ⌈ j ( β − ⌉ − (1 + ⌈ C ⌉ ) . 
Hence, setting C := 1 + ⌈ C ⌉ , we have shown for n ≥ M + n and m ≥ (which entails m > c ) that |{ ( j, ℓ, ∈ J | U j,ℓ, ∩ U n,m, = ∅ }| ≤ n + M X t = n − M |{ t } × {⌈ t ( β − ⌉ − C , . . . , ⌈ t ( β − ⌉} × { }|≤ (1 + 2 M ) · (1 + C ) . (D.9)Now, we can finally also estimate the remaining entry of the transition matrix Y − n,m, Y j,ℓ, (cf. equation (D.5)):Recall from the beginning of Case 2(a) and from equation (D.7) that j ( β − ≥ K − ≥ βM · c = C and likewisethat n ( β − ≥ C . Hence, ℓ ≥ j ( β − − C ≥ and similarly m ≥ , so that ≤ ℓm ≤ ⌈ j ( β − ⌉ · ⌈ n ( β − ⌉ ≤ (cid:16) j ( β − (cid:17) · (cid:16) n ( β − (cid:17) = 2 j ( β − n ( β − + 2 n ( β − + 2 j ( β − + 1 . Consequently, we get because of m ≥ n ( β − − C ≥ and ℓ ≥ j ( β − − C ≥ that (cid:12)(cid:12)(cid:12) ( βj − n ) − ( j − βn ) mℓ (cid:12)(cid:12)(cid:12) = 2 ( j − βn ) (cid:12)(cid:12)(cid:12) n ( β − j ( β − − mℓ (cid:12)(cid:12)(cid:12) ≤ ( j − βn ) · (cid:16)(cid:12)(cid:12)(cid:12) n ( β − j ( β − + 2 n ( β − + 2 j ( β − + 1 − mℓ (cid:12)(cid:12)(cid:12) + (cid:12)(cid:12)(cid:12) n ( β − + 2 j ( β − + 1 (cid:12)(cid:12)(cid:12)(cid:17) = 2 ( j − βn ) · (cid:16) n ( β − j ( β − + 2 · n ( β − + 2 · j ( β − + 2 − mℓ (cid:17) ≤ ( j − βn ) · (cid:16) n ( β − j ( β − +2 · n ( β − +2 · j ( β − +2 − (cid:16) n ( β − − C (cid:17) (cid:16) j ( β − − C (cid:17)(cid:17) = 2 ( j − βn ) · (cid:16) (2 + C ) · n ( β − + (2 + C ) · j ( β − + 2 − C (cid:17) ≤ β ( j − n ) · j (1 − β ) · (cid:16) (2 + C ) · n ( β − + (2 + C ) · j ( β − + 2 (cid:17) = 2 β ( j − n ) · (cid:16) (2 + C ) · ( n − j ) β − + 2 + C + 2 · j (1 − β ) (cid:17) ( since | j − n |≤ M and β> ) ≤ β M · (cid:16) (2 + C ) · M β − + 2 + C + 2 (cid:17) =: C . (D.10)In conjunction with equation (D.6), this implies (cid:13)(cid:13) Y − n,m, Y j,ℓ, (cid:13)(cid:13) ≤ · β M + 3 · M + C . Case 2(a)(ii) : We have m < − c . 
Here, we set e m := − m and e ℓ := − ℓ and note that n (1 − β ) ( m − c ) < ηξ < n (1 − β ) ( m + c ) implies n (1 − β ) ( − m − c ) < − ηξ < n (1 − β ) ( − m + c ) , so that ( ξ, − η ) ∈ U n, − m, = U n, e m, . Likewise, it is not hard to see ( ξ, − η ) ∈ U j, − ℓ, = U j, e ℓ, , so that Case2(a)(i) shows (because of e m > c ) that e m ≥ n ( β − − C and e ℓ ≥ j ( β − − C ≥ ⌈ j ( β − ⌉ − C , which entails ℓ ≤ − ⌈ j ( β − ⌉ + C . Hence, we have shown for n ≥ M + n and m < (which entails m < − c ) that |{ ( j, ℓ, ∈ J | U j,ℓ, ∩ U n,m, = ∅ }| ≤ n + M X t = n − M |{ t } × {− ⌈ t ( β − ⌉ , . . . , − ⌈ t ( β − ⌉ + C } × { }|≤ (1 + 2 M ) · (1 + C ) , (D.11)as in the preceding case.Finally, because of ℓm = e ℓ · e m , we get (cid:12)(cid:12)(cid:12) ( βj − n ) − ( j − βn ) mℓ (cid:12)(cid:12)(cid:12) = (cid:12)(cid:12)(cid:12) ( βj − n ) − ( j − βn ) e m e ℓ (cid:12)(cid:12)(cid:12) ≤ C from equation(D.10) and thus (cid:13)(cid:13) Y − n,m, Y j,ℓ, (cid:13)(cid:13) ≤ · β M + 3 · M + C as in the previous case. Case 2(b) : We have n ≤ n + M . This implies j ≤ n + 2 M and | ℓ | ≤ ⌈ j ( β − ⌉ ≤ ⌈ β j ⌉ ≤ ⌈ β ( n +2 M ) ⌉ ,because of | n − j | ≤ M . On the one hand, this implies |{ ( j, ℓ, ∈ J | U j,ℓ, ∩ U n,m, = ∅ }| ≤ |{ , . . . , n + 2 M } × {− ⌈ β ( n +2 M ) ⌉ , . . . , ⌈ β ( n +2 M ) ⌉} × { }|≤ ( n + 2 M + 1) · (1 + 2 · ⌈ β ( n +2 M ) ⌉ ) (D.12) nalysis vs. Synthesis Sparsity for α -Shearlets — Felix Voigtlaender & Anne Pein and on the other hand (cid:13)(cid:13) Y − n,m, Y j,ℓ, (cid:13)(cid:13) ≤ max n ′ ≤ M + n max | m ′ |≤ ⌈ n ′· ( β − / ⌉ max j ′ ≤ n +2 M max | ℓ ′ |≤ ⌈ j ′· ( β − / ⌉ (cid:13)(cid:13)(cid:13) Y − n ′ ,m ′ , · Y j ′ ,ℓ ′ , (cid:13)(cid:13)(cid:13) =: C . Case 3 : We have ε = δ = 1 . Here, we observe that U n,m, ∩ U j,ℓ, = R · ( U n,m, ∩ U j,ℓ, ) , so that U n,m, ∩ U j,ℓ, = ∅ if and only if U n,m, ∩ U j,ℓ, = ∅ . 
Consequently, we get from Case 1, equation (D.4), that
$$\big|\{ (j,\ell,1) \in J \mid U_{j,\ell,1} \cap U_{n,m,1} \neq \emptyset \}\big| = \big|\{ (j,\ell,0) \in J \mid U_{j,\ell,0} \cap U_{n,m,0} \neq \emptyset \}\big| \leq (1+2M_0)\cdot(1+2C_1).$$
Likewise, since $Y_{n,m,1}^{-1} \cdot Y_{j,\ell,1} = Y_{n,m,0}^{-1} \cdot R^{-1} R \cdot Y_{j,\ell,0} = Y_{n,m,0}^{-1} \cdot Y_{j,\ell,0}$, we get in case of $U_{n,m,1} \cap U_{j,\ell,1} \neq \emptyset$ that
$$\| Y_{n,m,1}^{-1} \cdot Y_{j,\ell,1} \| = \| Y_{n,m,0}^{-1} \cdot Y_{j,\ell,0} \| \leq 2^{\beta M_0/2} + 2^{M_0/2} + 2^{M_0/2} C_1,$$
since $U_{n,m,0} \cap U_{j,\ell,0} \neq \emptyset$; cf. Case 1.

Case 4: We have $\varepsilon = 0$ and $\delta = 1$. As in the previous case, we observe $U_{n,m,1} \cap U_{j,\ell,0} = R \cdot \big( U_{n,m,0} \cap U_{j,\ell,1} \big)$, so that we can reduce the present case to the setting of Case 2, similar to what was done in Case 3. In view of equations (D.9), (D.11) and (D.12), this implies that $\big|\{ (j,\ell,0) \in J \mid U_{j,\ell,0} \cap U_{n,m,1} \neq \emptyset \}\big| = \big|\{ (j,\ell,1) \in J \mid U_{j,\ell,1} \cap U_{n,m,0} \neq \emptyset \}\big|$ is bounded by the maximum of the right-hand sides of equations (D.9), (D.11) and (D.12), while $\| Y_{n,m,1}^{-1} Y_{j,\ell,0} \|$ is bounded by the maximum of the norm bounds obtained in Cases 2(a) and 2(b), provided that $U_{n,m,1} \cap U_{j,\ell,0} \neq \emptyset$.

It remains to consider the case $i = 0$ or $j = 0$. Recall from the first part of the lemma that $|\xi| \geq L^{-1} \cdot 2^{\beta n/2}$ for all $\xi \in U_{n,m,\varepsilon}$. Conversely, for $\xi \in U_0 = U_0'$, we have $|\xi| \leq 2A$, so that $U_0 \cap U_{n,m,\varepsilon} \neq \emptyset$ can only hold if $2^{\beta n/2} \leq 2AL$, i.e., if $n \leq \big\lfloor \tfrac{2}{\beta} \cdot \log_2(2AL) \big\rfloor =: n_0 \in \mathbb{N}_0$. On the one hand, this implies, because of $|\ell| \leq \lceil 2^{j(\beta-1)/2} \rceil \leq \lceil 2^{\beta j/2} \rceil$ for $(j,\ell,\varepsilon) \in J_0$, that
$$\big|\{ j \in J \mid U_j \cap U_0 \neq \emptyset \}\big| \leq \Big| \{0\} \cup \Big( \{0,\dots,n_0\} \times \big\{ -\lceil 2^{\beta n_0/2} \rceil, \dots, \lceil 2^{\beta n_0/2} \rceil \big\} \times \{0,1\} \Big) \Big| \leq 2 \cdot (1+n_0) \cdot \big(1 + 2 \lceil 2^{\beta n_0/2} \rceil\big).$$
On the other hand, we get in case of $U_0 \cap U_{n,m,\varepsilon} \neq \emptyset$ for some $(n,m,\varepsilon) \in J_0$ that
$$\| Y_0^{-1} Y_{n,m,\varepsilon} \| = \left\| \begin{pmatrix} 2^{\beta n/2} & 0 \\ 0 & 2^{n/2} \end{pmatrix} \cdot \begin{pmatrix} 1 & 0 \\ m & 1 \end{pmatrix} \right\| \leq \left\| \begin{pmatrix} 2^{\beta n/2} & 0 \\ 0 & 2^{n/2} \end{pmatrix} \right\| \cdot \left\| \begin{pmatrix} 1 & 0 \\ m & 1 \end{pmatrix} \right\| \leq \max\big\{ 2^{n/2}, 2^{\beta n/2} \big\} \cdot (2+|m|) \leq 2^{\beta n_0/2} \cdot 3\,\lceil 2^{\beta n_0/2} \rceil,$$
as well as
$$\| Y_{n,m,\varepsilon}^{-1} Y_0 \| = \left\| \begin{pmatrix} 1 & 0 \\ -m & 1 \end{pmatrix} \cdot \begin{pmatrix} 2^{-\beta n/2} & 0 \\ 0 & 2^{-n/2} \end{pmatrix} \right\| \leq \left\| \begin{pmatrix} 1 & 0 \\ -m & 1 \end{pmatrix} \right\| \cdot \left\| \begin{pmatrix} 2^{-\beta n/2} & 0 \\ 0 & 2^{-n/2} \end{pmatrix} \right\| \leq 2 + |m| \leq 3\,\lceil 2^{\beta n_0/2} \rceil.$$
Taken together, the preceding cases easily yield the claim of the lemma. □
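Part (1) of the lemma can be spot-checked numerically. The sketch below is illustrative only (β, γ, μ, a, b are chosen arbitrarily and the helper `Y` is ours, with `numpy` assumed): it samples points of $U_{n,m,\varepsilon} = Y^{(\beta)}_{n,m,\varepsilon} U$ and checks that their norms lie between $\gamma \cdot 2^{\beta n/2}$ and $\mu(3+c) \cdot 2^{\beta n/2}$, the explicit two-sided bound from the proof, so that $L = \max\{\gamma^{-1}, \mu(3+c)\}$ works for these parameters.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
beta, gamma, mu, a, b = 1.5, 0.5, 4.0, -3.0, 3.0  # illustrative parameters
c = max(abs(a), abs(b), 1.0)

def Y(n, m, eps, beta):
    """Y_{n,m,eps} = R^eps . D^{(1/beta)}_{2^(beta*n/2)} . S_m^T."""
    s = 2.0**(beta * n / 2)
    mat = np.diag([s, s**(1.0 / beta)]) @ np.array([[1.0, 0.0], [m, 1.0]])
    return mat[::-1] if eps else mat  # left-multiplying by R swaps the rows

for n in range(6):
    H_n = math.ceil(2.0**(n * (beta - 1) / 2))
    for m in range(-H_n, H_n + 1):
        for eps in (0, 1):
            # sample U = U^{(gamma,mu)}_{(a,b)} ∪ (−U): |xi| in (gamma,mu), eta/xi in (a,b)
            xi = rng.uniform(gamma, mu, 200) * rng.choice([-1.0, 1.0], 200)
            eta = xi * rng.uniform(a, b, 200)
            norms = np.linalg.norm(Y(n, m, eps, beta) @ np.stack([xi, eta]), axis=0)
            ratio = norms / 2.0**(beta * n / 2)
            assert np.all(gamma < ratio) and np.all(ratio < mu * (3 + c))
```

The lower bound holds because the first coordinate of $Y x$ (before the possible swap by $R$) is $2^{\beta n/2}\xi_1$ with $|\xi_1| > \gamma$; the upper bound follows from $|m + \eta/\xi| \leq H_n + c$ as in the proof.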
As a corollary of the preceding lemma, we can now easily show that the reciprocal β-shearlet covering is indeed an almost structured covering of $\mathbb{R}^2$.

Corollary D.3.
For every $\beta \in (1,\infty)$, the family $\mathcal{S}^{(\beta)}$ from Definition D.1 is an almost structured covering of $\mathbb{R}^2$. Furthermore, if we set $v_{n,m,\varepsilon} := 2^{\beta n/2}$ for $(n,m,\varepsilon) \in J_0$ and $v_0 := 1$, then the weight $v^s = \big( v_j^s \big)_{j \in J}$ is $\mathcal{S}^{(\beta)}$-moderate for arbitrary $s \in \mathbb{R}$.
More precisely, we have $C_{\mathcal{S}^{(\beta)}, v^s} \leq K^{|s|}$ for some constant $K = K(\beta) \geq 1$ which also satisfies
$$K^{-1} \cdot v_j \leq 1 + |\xi| \leq K \cdot v_j \qquad \forall\, \xi \in S_j^{(\beta)} \text{ and all } j \in J. \qquad ◭$$
Proof.
First of all, note that an application of Lemma D.2 with a = − , b = 3 , µ = µ ( β )0 = 3 · β/ and γ = µ − , aswell as A = 1 yields constants L, N, C satisfying L − · β n ≤ | ξ | ≤ L · β n for all ( n, m, ε ) ∈ J ( β )0 and all ξ ∈ S ( β ) n,m,ε ,as well as | j ∗ | ≤ N for all j ∈ J ( β ) and finally (cid:13)(cid:13) Y − i Y j (cid:13)(cid:13) ≤ C for all j ∈ J ( β ) and i ∈ j ∗ .Thus, since we have S ( β ) = (cid:0) Y j P ′ j + c j (cid:1) j ∈ J with (cid:8) P ′ j (cid:12)(cid:12) j ∈ J (cid:9) having only two elements, in order to establishthat S ( β ) is an almost structured covering of R it suffices to prove R = S j ∈ J T j R ′ j for R ′ := (cid:0) − , (cid:1) and nalysis vs. Synthesis Sparsity for α -Shearlets — Felix Voigtlaender & Anne Pein R ′ j := U ( − β/ , β/ ) ( − , ∪ (cid:20) − U ( − β/ , β/ ) ( − , (cid:21) , since clearly each R ′ j is open with R ′ j ⊂ P ′ j and since (cid:8) R ′ j (cid:12)(cid:12) j ∈ J (cid:9) is finite.But an analog of equation (3.2) (see equations (D.1) and (D.2) for more details) shows Y n,m, R ′ n,m, = V ( β ( n − / , β ( n +1) / )( n (1 − β ) / ( m − , n (1 − β ) / ( m +1) )= (cid:26)(cid:18) ξη (cid:19) ∈ R ∗ × R (cid:12)(cid:12)(cid:12)(cid:12) | ξ | ∈ (cid:16) β ( n − , β ( n +1) (cid:17) and ηξ ∈ (cid:16) n (1 − β ) ( m − , n (1 − β ) ( m + 1) (cid:17)(cid:27) for all ( n, m, ∈ J ( β )0 . But recalling the notation H n = H ( β ) n = (cid:6) n ( β − / (cid:7) , we see H n [ m = − H n (cid:16) n (1 − β ) ( m − , n (1 − β ) ( m + 1) (cid:17) = 2 n (1 − β ) · H n [ m = − H n ( m − , m + 1) ⊃ n (1 − β ) · ( − ⌈ n ( β − / ⌉ − , ⌈ n ( β − / ⌉ + 1) ⊃ n (1 − β ) · h − n ( β − / , n ( β − / i = [ − , and because of β > and since (cid:0) − / (cid:1) = < = (cid:0) (cid:1) , we also get ∞ [ n =0 (cid:16) β ( n − , β ( n +1) (cid:17) ⊃ (cid:16) − β , ∞ (cid:17) ⊃ (cid:16) − , ∞ (cid:17) ⊃ [3 / , ∞ ) . 
Taken together, this implies ∞ [ n =0 H n [ m = − H n Y n,m, R ′ n,m, ⊃ ∞ [ n =0 (cid:26)(cid:18) ξη (cid:19) ∈ R ∗ × R (cid:12)(cid:12)(cid:12)(cid:12) | ξ | ∈ (cid:16) β ( n − , β ( n +1) (cid:17) and ηξ ∈ [ − , (cid:27) ⊃ { ( ξ, η ) ∈ R ∗ × R | | ξ | ∈ [3 / , ∞ ) and | η | ≤ | ξ |} =: M and therefore also ∞ [ n =0 H n [ m = − H n Y n,m, R ′ n,m, = R · " ∞ [ n =0 H n [ m = − H n Y n,m, R ′ n,m, = { ( ξ, η ) ∈ R ∗ × R | | η | ∈ [3 / , ∞ ) and | ξ | ≤ | η |} =: M . Altogether, we see R = S j ∈ J Y j R ′ j , since for (cid:0) ξη (cid:1) ∈ R \ (cid:0) − , (cid:1) = R \ [ Y R ′ ] , there are only two cases: Case
1. We have $|\xi| \leq |\eta|$. This implies $|\eta| \geq 3/4$ and thus $\binom{\xi}{\eta} \in M_2$, since otherwise $\binom{\xi}{\eta} \in \big(-\tfrac{3}{4}, \tfrac{3}{4}\big)^2$. Case
2. We have | η | ≤ | ξ | . This yields | ξ | ≥ and thus (cid:0) ξη (cid:1) ∈ M , since otherwise (cid:0) ξη (cid:1) ∈ (cid:0) − , (cid:1) .We have thus shown that S ( β ) is an almost structured covering of R , so that it remains to verify the part of thelemma related to the weight v .But for j = 0 and ξ ∈ S ( β )0 = ( − , , we simply have (2 + L ) − · v j ≤ v j = 1 ≤ | ξ | ≤ ≤ (2 + L ) · v j since L ≥ . Furthermore, for j = ( n, m, ε ) ∈ J ( β )0 , we have (2 + L ) − · v j ≤ L − · β n ≤ | ξ | ≤ | ξ | ≤ L · β n ≤ (1 + L ) · β n ≤ (2 + L ) · v j for all ξ ∈ S ( β ) j . Therefore, we have shown K − · v j ≤ | ξ | ≤ K · v j for all j ∈ J ( β ) and ξ ∈ S ( β ) j with K := 2 + L ,as claimed in the last part of the lemma.Finally, assume S ( β ) j ∩ S ( β ) i = ∅ . For an arbitrary ξ ∈ S ( β ) j ∩ S ( β ) i , this implies v i ≤ K · (1 + | ξ | ) ≤ K · v j and thus K − ≤ v i v j ≤ K by symmetry. This easily yields v si v sj ≤ K | s | , so that v s is S ( β ) -moderate, with C S ( β ) ,v s ≤ K | s | , asclaimed. (cid:3) Since we now know that S ( β ) is an almost structured covering of R and that v s is S ( β ) -moderate, we seeprecisely as in the remark after Definition 3.5 that the reciprocal β -shearlet smoothness spaces that we now defineare well-defined Quasi-Banach spaces. As for the unconnected α -shearlet smoothness spaces, the following definitionwill only be of transitory relevance, since we will immediately show that the newly defined reciprocal β -shearletsmoothness spaces are identical with the previously defined α -shearlet smoothness spaces, for α = β − . nalysis vs. Synthesis Sparsity for α -Shearlets — Felix Voigtlaender & Anne Pein Definition D.4.
For $\beta \in (1,\infty)$, $p, q \in (0,\infty]$ and $s \in \mathbb{R}$, we define the reciprocal β-shearlet smoothness space $S^{p,q}_{\beta,s}(\mathbb{R}^2)$ associated to these parameters as
$$S^{p,q}_{\beta,s}(\mathbb{R}^2) := \mathcal{D}\big( \mathcal{S}^{(\beta)}, L^p, \ell^q_{v^s} \big),$$
where the covering $\mathcal{S}^{(\beta)}$ and the weight $v^s$ are defined as in Definition D.1 and Corollary D.3, respectively. ◭
Next, we want to show $S^{p,q}_{\beta,s}(\mathbb{R}^2) = S^{p,q}_{\beta^{-1},s}(\mathbb{R}^2)$. To this end, we will utilize the general theory of embeddings between decomposition spaces that was developed in [60]. The main prerequisite for an application of this theory is a certain compatibility between the two relevant coverings. This compatibility is established in the next lemma:

Lemma D.5.
Let $\beta \in (1,\infty)$ and set $\alpha := \beta^{-1} \in (0,1)$. Then, for each $i \in I^{(\alpha)}$, there is some $j = j_i \in J^{(\beta)}$ satisfying $S_i^{(\alpha)} \subset S_{j_i}^{(\beta)}$. ◭

Remark.
The set $P$ in Definition D.1 is chosen precisely so as to make the preceding lemma true. In general, one could have chosen $P$ to be smaller.

Proof.
For i = 0 , we clearly have S ( α )0 = ( − , = S ( β )0 , so that we can assume i = ( n, m, ε, δ ) ∈ I ( α )0 in thefollowing. Let us first consider the case ε = 1 and δ = 0 . Define j := ⌊ αn ⌋ ∈ N and observe αn − < j ≤ αn .Recall the notation µ = 3 · β/ from Definition D.1 and note for arbitrary ℓ ∈ Z with | ℓ | ≤ H j that S ( β )( j,ℓ, ⊃ diag (cid:16) β j , j (cid:17) · (cid:18) ℓ (cid:19) U ( µ − ,µ ) ( − , = U ( βj/ · µ − , βj/ · µ ) (cid:18) j − β ( ℓ − , j − β ( ℓ +3) (cid:19) and S ( α ) i = U (2 n / , · n ) ( n ( α − ( m − , n ( α − ( m +1) ) , thanks to equation (3.2). Consequently, it suffices to show that we have (cid:0) βj/ · µ − , βj/ · µ (cid:1) ⊃ (2 n / , · n ) and that one can choose ℓ ∈ Z with | ℓ | ≤ H j such that (cid:16) j (1 − β ) / ( ℓ − , j (1 − β ) / ( ℓ + 3) (cid:17) ⊃ (cid:16) n ( α − ( m − , n ( α − ( m + 1) (cid:17) . (D.13)The first of these inclusions is straightforward to verify: We have µ = 3 · β/ ≥ and j ≤ αn = β n , so that βj/ µ − ≤ · βj/ ≤ · n . Furthermore, since j > αn − , βj/ · µ ≥ β (2 αn − · µ = 2 n · − β · · β/ = 3 · n . Thus, all that remains is to show that one can choose ℓ suitably. To this end, let ℓ := j n ( α − β − j ( m − k ∈ Z and observe ℓ ≤ n ( α − β − j ( m − ≤ n ( α − β − j ( | m | − (cid:0) since | m |− ≤ ⌈ n (1 − α ) ⌉ − < n (1 − α ) (cid:1) ≤ ( β − j ≤ ⌈ ( β − j ⌉ = H ( β ) j . We now distinguish two cases:
Case 1 : We have ℓ ≥ − H ( β ) j . In this case, we set ℓ := ℓ and note | ℓ | ≤ H ( β ) j . Furthermore, we observe j (1 − β ) ( ℓ − < j (1 − β ) ℓ ≤ n ( α − ( m − . Finally, since we have ℓ + 1 > n ( α − β − j ( m − , we get j (1 − β ) ( ℓ + 3) > j (1 − β ) h n ( α − β − j ( m − i = 2 · j (1 − β ) + 2 n ( α − ( m − since − β< and j ≤ αn ) ≥ · αn (1 − β ) + 2 n ( α − ( m − · n ( α − + 2 n ( α − ( m −
1) = 2 n ( α − ( m + 1) . The last two displayed equations establish the desired inclusion (D.13), so that indeed S ( α ) i ⊂ S ( β ) j,ℓ, . Case 2 : We have ℓ < − H ( β ) j . This implies n ( α − ( m − ≤ − , since we would otherwise have ℓ = j n ( α − β − j ( m − k ≥ j − ( β − j k = − l ( β − j m = − H ( β ) j . nalysis vs. Synthesis Sparsity for α -Shearlets — Felix Voigtlaender & Anne Pein Consequently, we get for ℓ := − H ( β ) j ∈ Z that n ( α − ( m + 1) = 2 n ( α − ( m −
1) + 2 · n ( α − ≤ − · n ( α − ( since α − < and n ≥ j α ) ≤ − · j α ( α − = 2 j (1 − β ) h − j ( β − + 2 i(cid:16) since j β − ≥⌈ j β − ⌉− − ℓ − (cid:17) ≤ j (1 − β ) ( ℓ + 3) . Finally, recall | m | ≤ (cid:6) n (1 − α ) (cid:7) ≤ n (1 − α ) , so that n ( α − ( m − ≥ − n ( α − ( | m | + 1) ≥ − n ( α − (cid:16) n (1 − α ) + 2 (cid:17) = − − · n ( α − ( since α − < and n ≥ j α ) ≥ − − · j α ( α − = 2 j (1 − β ) · (cid:16) − j ( β − − (cid:17)(cid:16) since j β − ≤⌈ j β − ⌉ = − ℓ (cid:17) ≥ j (1 − β ) · ( ℓ − ≥ j (1 − β ) · ( ℓ − . We have thus again established the inclusion (D.13), so that S ( α ) i ⊂ S ( β )( j,ℓ, .Up to now, we have constructed for i = ( n, m, ε, δ ) ∈ I ( α )0 with ε = 1 and δ = 0 some ( j, ℓ, ∈ J ( β )0 with S ( α ) i ⊂ S ( β ) j , so that it remains to consider the general case ε ∈ {± } and δ ∈ { , } . But since the base-set P fromDefinition D.1 satisfies P = − P , we have S ( β ) j = − S ( β ) j , so that S ( α ) n,m, − , = − S ( α ) n,m, , ⊂ − S ( β ) j,ℓ, = S ( β ) j,ℓ, , assuming S ( α ) n,m, , ⊂ S ( β ) j,ℓ, . Finally, assuming that S ( α ) n,m,ε, ⊂ S ( β ) j,ℓ, , we get S ( α ) n,m,ε, = R · S ( α ) n,m,ε, ⊂ R · S ( β ) j,ℓ, = S ( β ) j,ℓ, . This completes the proof. (cid:3)
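Behind the index map $i \mapsto j_i$ with $j = \lfloor \alpha n \rfloor$ lies the scale correspondence between the two coverings: for $\alpha = \beta^{-1}$, the reciprocal-β dilation $\mathrm{diag}(2^{\beta j/2}, 2^{j/2})$ coincides with the α-parabolic dilation $\mathrm{diag}(2^{\nu}, 2^{\alpha\nu})$ at the (generally fractional) scale $\nu = \beta j/2$. The following sketch is illustrative only (`numpy` assumed, names ours); it also checks the determinant identity $|\det Y^{(\beta)}_j| = v_j^{1+\alpha}$ used when comparing the two coverings.

```python
import math
import numpy as np

beta = 1.8
alpha = 1.0 / beta

for j in range(8):
    nu = beta * j / 2  # matching fractional alpha-shearlet scale
    recip = np.diag([2.0**(beta * j / 2), 2.0**(j / 2)])  # D^{(1/beta)}_{2^(beta*j/2)}
    parab = np.diag([2.0**nu, 2.0**(alpha * nu)])         # alpha-parabolic dilation at scale nu
    assert np.allclose(recip, parab)
    # determinant identity: |det Y_j| = v_j^(1+alpha) with v_j = 2^(beta*j/2)
    v_j = 2.0**(beta * j / 2)
    assert math.isclose(abs(np.linalg.det(recip)), v_j**(1 + alpha))
```

Both assertions reduce to the exponent identity $\tfrac{\beta j}{2}\big(1 + \tfrac{1}{\beta}\big) = \tfrac{j(\beta+1)}{2}$.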
Now, we can finally show that the reciprocal β-shearlet smoothness spaces are identical to the α-shearlet smoothness spaces from Section 3.

Lemma D.6.
Let $\beta \in (1,\infty)$, $s \in \mathbb{R}$ and $p, q \in (0,\infty]$. Then $S^{p,q}_{\beta,s}(\mathbb{R}^2) = S^{p,q}_{\beta^{-1},s}(\mathbb{R}^2)$. ◭

Proof.
Set α := β − ∈ (0 , for brevity. As in the proof of Lemma 5.5, we want to invoke [60, Lemma 6.11, part (2)],with the choice P := S ( α ) and Q := S ( β ) , recalling that S p,qβ − ,s (cid:0) R (cid:1) = D (cid:0) S ( α ) , L p , ℓ qw s (cid:1) = F − (cid:2) D F (cid:0) S ( α ) , L p , ℓ qw s (cid:1)(cid:3) and likewise S p,qβ,s (cid:0) R (cid:1) = F − (cid:2) D F (cid:0) S ( β ) , L p , ℓ qv s (cid:1)(cid:3) .To this end, we first have to verify that we have v sj ≍ w si if S ( α ) i ∩ S ( β ) j = ∅ and that the coverings S ( α ) and S ( β ) are weakly equivalent . This means that sup i ∈ I ( α ) (cid:12)(cid:12)(cid:12)n j ∈ J ( β ) (cid:12)(cid:12)(cid:12) S ( β ) j ∩ S ( α ) i = ∅ o(cid:12)(cid:12)(cid:12) < ∞ and sup j ∈ J ( β ) (cid:12)(cid:12)(cid:12)n i ∈ I ( α ) (cid:12)(cid:12)(cid:12) S ( α ) i ∩ S ( β ) j = ∅ o(cid:12)(cid:12)(cid:12) < ∞ . For the first point, let K ≥ as in Corollary D.3, i.e., such that K − · v j ≤ | ξ | ≤ K · v j for all j ∈ J ( β ) andall ξ ∈ S ( β ) j . Likewise, Lemma 3.4 shows · w i ≤ | ξ | ≤ · w i for all i ∈ I ( α ) and ξ ∈ S ( α ) i . Consequently, if S ( α ) i ∩ S ( β ) j = ∅ , we can choose some ξ ∈ S ( α ) i ∩ S ( β ) j , so that (13 K ) − · w i ≤ K − · (1 + | ξ | ) ≤ v j ≤ K · (1 + | ξ | ) ≤ K · w i . Consequently, we get (13 K ) −| t | ≤ v tj w ti ≤ (13 K ) | t | ∀ t ∈ R if i ∈ I ( α ) and j ∈ J ( β ) with S ( α ) i ∩ S ( β ) j = ∅ . (D.14) nalysis vs. Synthesis Sparsity for α -Shearlets — Felix Voigtlaender & Anne Pein It remains to show that S ( α ) and S ( β ) are weakly equivalent. To this end, let i ∈ I ( α ) be arbitrary and note fromLemma D.5 that S ( α ) i ⊂ S ( β ) j i for some j i ∈ J ( β ) . Thus, for arbitrary j ∈ J ( β ) with ∅ = S ( β ) j ∩ S ( α ) i = ∅ , we get ∅ ( S ( β ) j ∩ S ( α ) i ⊂ S ( β ) j ∩ S ( β ) j i and thus j ∈ j ∗ i . 
This implies sup i ∈ I ( α ) (cid:12)(cid:12)(cid:12)n j ∈ J ( β ) (cid:12)(cid:12)(cid:12) S ( β ) j ∩ S ( α ) i = ∅ o(cid:12)(cid:12)(cid:12) ≤ sup i ∈ I ( α ) | j ∗ i | ≤ sup j ∈ J ( β ) | j ∗ | = N S ( β ) < ∞ , since S ( β ) is an almost structured covering of R .For the second part of weak equivalence, we have to work harder: Let j ∈ J ( β ) be arbitrary. For each i ∈ I ( α ) with S ( α ) i ∩ S ( β ) j = ∅ , Lemma D.5 yields some j i ∈ J ( β ) satisfying S ( α ) i ⊂ S ( β ) j i . Hence, ∅ ( S ( α ) i ∩ S ( β ) j ⊂ S ( β ) j i ∩ S ( β ) j ,so that j i ∈ j ∗ . Thus, with [ S ( β ) j ] ∗ := S ℓ ∈ j ∗ S ( β ) ℓ , we have shown S ( α ) i ⊂ S ( β ) j i ⊂ [ S ( β ) j ] ∗ for arbitrary i ∈ I ( α ) with S ( α ) i ∩ S ( β ) j = ∅ .Now, we will need the easily verifiable identities | det T ( α ) i | = w αi and | det Y ( β ) j | = v β − j = v αj for i ∈ I ( α ) and j ∈ J ( β ) . To use these identities, set M j := n i ∈ I ( α ) (cid:12)(cid:12)(cid:12) S ( α ) i ∩ S ( β ) j = ∅ o , as well as C := min n λ (cid:16) ( − , (cid:17) , λ (cid:16) U ( − , ) ( − , (cid:17)o > and C := max n λ (cid:16) ( − , (cid:17) , λ ( P ) o > , with P as in Definition D.1. We clearly have P i ∈ M j S ( α ) i ≤ P i ∈ I ( α ) S ( α ) i ≤ N S ( α ) . But because of S ( α ) i ⊂ [ S ( β ) j ] ∗ for i ∈ M j , this implies < C · X i ∈ M j w αi = C · X i ∈ M j | det T ( α ) i | ≤ X i ∈ M j λ ( S ( α ) i ) = Z R X i ∈ M j S ( α ) i ( ξ ) d ξ ≤ N S ( α ) · λ (cid:0) [ S ( β ) j ] ∗ (cid:1) ≤ N S ( α ) · X ℓ ∈ j ∗ λ ( S ( β ) ℓ ) ≤ C N S ( α ) · X ℓ ∈ j ∗ | det Y ( β ) ℓ | = C N S ( α ) · X ℓ ∈ j ∗ v αℓ ( Corollary D. ≤ C N S ( α ) · | j ∗ | · K α ) · v αj (cid:0) eq. (D.14) and S ( α ) i ∩ S ( β ) j = ∅ for i ∈ M j (cid:1) ≤ C N S ( α ) N S ( β ) · K α ) · (13 K ) α · inf i ∈ M j w αi . 
Now, observe $\inf_{i \in M_j} w_i^{1+\alpha} \geq 1 > 0$, so that the preceding inequality shows that $M_j$ is finite with
$$|M_j| \leq C_1^{-1} \cdot C_2 \, N_{\mathcal{S}^{(\alpha)}} N_{\mathcal{S}^{(\beta)}} \cdot K^{2(1+\alpha)} \cdot (13K)^{1+\alpha} ,$$
where the right-hand side is independent of $j \in J^{(\beta)}$.

We have thus verified the main requirements of [60, Lemma 6.11]. But since we want to apply that lemma also in case of $p \in (0,1)$, we still have to verify the extra condition that $\mathcal{P} = \mathcal{S}^{(\alpha)} = \big(T_i^{(\alpha)} Q_i'\big)_{i \in I^{(\alpha)}}$ is almost subordinate to $\mathcal{Q} = \mathcal{S}^{(\beta)} = \big(Y_j^{(\beta)} P_j'\big)_{j \in J^{(\beta)}}$ and that we have $\big|\det\big[(T_i^{(\alpha)})^{-1} Y_j^{(\beta)}\big]\big| \lesssim 1$ if $S_j^{(\beta)} \cap S_i^{(\alpha)} \neq \emptyset$. But Lemma D.5 shows that $\mathcal{P} = \mathcal{S}^{(\alpha)}$ is subordinate (and thus also almost subordinate, cf. [60, Definition 2.10]) to $\mathcal{Q} = \mathcal{S}^{(\beta)}$. Furthermore, in case of $S_j^{(\beta)} \cap S_i^{(\alpha)} \neq \emptyset$, equation (D.14) yields $\big|\det\big[(T_i^{(\alpha)})^{-1} Y_j^{(\beta)}\big]\big| = \big(w_i^{1+\alpha}\big)^{-1} \cdot v_j^{1+\alpha} \asymp 1$, as desired. The claim now follows from [60, Lemma 6.11]. $\square$

Now, we show that a suitable $\beta$-shearlet system generated by bandlimited functions yields a Banach frame for the reciprocal shearlet smoothness spaces. We restrict ourselves to bandlimited functions, since this simplifies the proof. But first, we review the precise definition of a $\beta$-shearlet system from [37, Definition 3.10].

Definition D.7.
For $c \in (0,\infty)$ and $\beta \in (1,\infty)$ and given generators $\varphi, \psi, \theta \in L^2(\mathbb{R}^2)$, the cone-adapted $\beta$-shearlet system $\mathrm{SH}(\varphi,\psi,\theta;\, c, \beta)$ with sampling density $c$ generated by $\varphi, \psi, \theta$ is defined as
$$\mathrm{SH}(\varphi,\psi,\theta;\, c,\beta) := \Phi(\varphi;\, c,\beta) \cup \Psi(\psi;\, c,\beta) \cup \Theta(\theta;\, c,\beta) ,$$
where
$$\Phi(\varphi;\, c,\beta) = \big\{ \varphi(\bullet - ck) \,:\, k \in \mathbb{Z}^2 \big\} ,$$
$$\Psi(\psi;\, c,\beta) = \big\{ 2^{j(\beta+1)/4} \cdot \psi\big(S_{\ell}\, A_{\beta^{-1},\, 2^{\beta j/2}} \bullet - ck\big) \,:\, j \in \mathbb{N}_0,\ k \in \mathbb{Z}^2 \text{ and } \ell \in \mathbb{Z} \text{ with } |\ell| \leq \lceil 2^{j(\beta-1)/2} \rceil \big\} ,$$
$$\Theta(\theta;\, c,\beta) = \big\{ 2^{j(\beta+1)/4} \cdot \theta\big(S_{\ell}^T\, \widetilde{A}_{\beta^{-1},\, 2^{\beta j/2}} \bullet - ck\big) \,:\, j \in \mathbb{N}_0,\ k \in \mathbb{Z}^2 \text{ and } \ell \in \mathbb{Z} \text{ with } |\ell| \leq \lceil 2^{j(\beta-1)/2} \rceil \big\} ,$$
where $A_{\alpha,s} = \mathrm{diag}(s, s^{\alpha})$ and $\widetilde{A}_{\alpha,s} = \mathrm{diag}(s^{\alpha}, s)$, as well as $S_{\ell} = \begin{pmatrix} 1 & \ell \\ 0 & 1 \end{pmatrix}$ for $\alpha \in [0,1]$, $s \in (0,\infty)$ and $\ell \in \mathbb{R}$. ◭

Proposition D.8.
Let $\varphi, \psi \in \mathcal{S}(\mathbb{R}^2)$ with $\widehat{\varphi}, \widehat{\psi} \in C_c^{\infty}(\mathbb{R}^2)$ and the following additional properties:
(1) We have $\widehat{\varphi}(\xi) \neq 0$ for all $\xi \in [-1,1]^2$.
(2) We have $\widehat{\psi}(\xi) \neq 0$ for all $\xi \in P$ with $P$ as in Definition D.1.
(3) We have $\mathrm{supp}\, \widehat{\psi} \subset \mathbb{R}^{\ast} \times \mathbb{R}$.
Then, for $\beta \in (1,\infty)$, $p_0, q_0 \in (0,1]$ and $s_0, s_1 \in \mathbb{R}$ with $s_0 \leq s_1$, there is some $\delta_0 = \delta_0(\beta, p_0, q_0, s_0, s_1, \varphi, \psi) > 0$ such that for every $0 < \delta \leq \delta_0$, all $p \in [p_0,\infty]$, all $q \in [q_0,\infty]$ and all $s \in \mathbb{R}$ with $s_0 \leq s \leq s_1$, the family $\mathrm{SH}(\varphi,\psi,\theta;\, \delta, \beta)$ forms a Banach frame for $\mathscr{S}^{p,q}_{\beta,s}(\mathbb{R}^2)$, where $\theta := \psi \circ R$, i.e., $\theta(x,y) = \psi(y,x)$.

Precisely, this means with the coefficient space $C^{p,q}_{v^s}$ as in Definition 2.8 (with $\mathcal{Q} = \mathcal{S}^{(\beta)}$ and $w = v^s$) and with
$$\gamma^{[i,k,\delta]} := \begin{cases} \varphi(\bullet - \delta k), & \text{if } i = 0, \\ 2^{j(\beta+1)/4} \cdot \psi\big(S_{\ell}\, A_{\beta^{-1},\, 2^{\beta j/2}} \bullet - \delta k\big), & \text{if } i = (j,\ell,0), \\ 2^{j(\beta+1)/4} \cdot \theta\big(S_{\ell}^T\, \widetilde{A}_{\beta^{-1},\, 2^{\beta j/2}} \bullet - \delta k\big), & \text{if } i = (j,\ell,1), \end{cases}$$
for $i \in J^{(\beta)}$ and $k \in \mathbb{Z}^2$ that the following hold:
(1) For each $0 < \delta \leq 1$, the analysis map
$$A^{(\delta)} : \mathscr{S}^{p,q}_{\beta,s}(\mathbb{R}^2) \to C^{p,q}_{v^s}, \quad f \mapsto \Big( \big\langle f, \gamma^{[i,k,\delta]} \big\rangle_{Z'(\mathbb{R}^2), Z(\mathbb{R}^2)} \Big)_{i \in J^{(\beta)},\, k \in \mathbb{Z}^2}$$
is well-defined and bounded.
(2) For all $0 < \delta \leq \delta_0$, there is a bounded linear reconstruction map $R^{(\delta)} : C^{p,q}_{v^s} \to \mathscr{S}^{p,q}_{\beta,s}(\mathbb{R}^2)$ satisfying $R^{(\delta)} \circ A^{(\delta)} = \mathrm{id}_{\mathscr{S}^{p,q}_{\beta,s}(\mathbb{R}^2)}$.
(3) We have the following consistency statement: If $f \in \mathscr{S}^{p,q}_{\beta,s}(\mathbb{R}^2)$ and if $\tilde{p} \in [p_0,\infty]$ and $\tilde{q} \in [q_0,\infty]$ and $s_0 \leq r \leq s_1$, then we have the following equivalence: $f \in \mathscr{S}^{\tilde{p},\tilde{q}}_{\beta,r}(\mathbb{R}^2) \iff A^{(\delta)} f \in C^{\tilde{p},\tilde{q}}_{v^r}$. ◭

Proof.
We want to verify that Theorem 2.9 applies in the current setting, i.e., with Q = S ( β ) = (cid:0) Y ( β ) j P ′ j (cid:1) j ∈ J ( β ) . Tothis end, we first recall the notation introduced in Assumption 2.7: If we set n := 2 and Q (1)0 := P with µ = 3 · β/ and P = U ( µ − ,µ ) ( − , ∪ (cid:0) − U ( µ − ,µ ) ( − , (cid:1) as in Definition D.1, as well as Q (2)0 := ( − , and finally k j := 1 for j ∈ J ( β )0 and k := 2 , then we have P ′ j = Q ( k j )0 for all j ∈ J ( β ) .Now, we set γ (0)1 := ψ and γ (0)2 := ϕ , as well as ε := 1 . With these choices, we want to verify the prerequisites ofTheorem 2.9. We clearly have γ (0) k , F γ (0) k ∈ S (cid:0) R (cid:1) ⊂ W , (cid:0) R (cid:1) ∩ W , ∞ (cid:0) R (cid:1) ∩ C ∞ (cid:0) R (cid:1) and all partial derivatives ofthese functions are (polynomially) bounded, so that the first two prerequisites of Theorem 2.9 clearly hold. Next, ourassumptions on b ϕ, b ψ ensure that F γ (0)1 ( ξ ) = b ψ ( ξ ) = 0 for all ξ ∈ P = Q (1)0 and likewise that F γ (0)2 ( ξ ) = b ϕ ( ξ ) = 0 for all ξ ∈ [ − , = Q (2)0 .Consequently, since we are interested in the decomposition space S p,qβ,s (cid:0) R (cid:1) = D (cid:0) S ( β ) , L p , ℓ qv s (cid:1) in R d = R , itremains to verify C := sup i ∈ J ( β ) X j ∈ J ( β ) M j,i < ∞ and C := sup j ∈ J ( β ) X i ∈ J ( β ) M j,i < ∞ , where M j,i := (cid:18) v sj v si (cid:19) τ · (cid:0) (cid:13)(cid:13) Y − j Y i (cid:13)(cid:13)(cid:1) σ · max | ν |≤ | det Y i | − · Z S ( β ) i max | α |≤ N (cid:12)(cid:12)(cid:12)(cid:16)h ∂ α [ ∂ ν γ j i (cid:0) Y − j ξ (cid:1)(cid:17)(cid:12)(cid:12)(cid:12) d ξ ! τ , with N := (cid:24) d + ε min { , p } (cid:25) ≤ (cid:24) p (cid:25) =: N , τ := min { , p, q } ≥ τ := min { p , q } and σ := τ · (cid:18) d min { , p } + N (cid:19) . nalysis vs. 
Synthesis Sparsity for α -Shearlets — Felix Voigtlaender & Anne Pein and where γ j := γ (0) k j for j ∈ J ( β ) , i.e., γ j = ψ for j ∈ J ( β )0 and γ = ϕ .Now, since b ϕ ∈ C ∞ c (cid:0) R (cid:1) , there is some A > satisfying supp b ϕ ⊂ ( − A, A ) . Furthermore, since supp b ψ ⊂ R ∗ × R is compact, there are < λ < µ and B > with supp b ψ ⊂ (cid:8) ( ξ, η ) ∈ R (cid:12)(cid:12) λ < | ξ | < µ and | η | < λB (cid:9) ⊂ (cid:8) ( ξ, η ) ∈ R (cid:12)(cid:12) λ < | ξ | < µ and − B < η/ξ < B (cid:9) = U ( λ,µ )( − B,B ) ∪ h − U ( λ,µ )( − B,B ) i =: U. By possibly shrinking λ and enlarging µ and B , we can assume λ ≤ µ − , µ ≥ µ and B ≥ , so that U ⊃ P = Q (1)0 .Setting U ′ := ( − A, A ) and U ′ j := U for j ∈ J ( β )0 , we have just shown supp b γ j ⊂ U ′ j for all j ∈ J ( β ) . But standardproperties of the Fourier transform (see e.g. [21, Theorem 8.22]) show [ ∂ ν γ j ( ξ ) = (2 πiξ ) ν · b γ j ( ξ ) and thus again supp ∂ α [ ∂ ν γ j ⊂ U ′ j for all j ∈ J ( β ) and arbitrary α, ν ∈ N . Therefore, we get max | ν |≤ max | α |≤ N (cid:12)(cid:12)(cid:12)(cid:16)h ∂ α [ ∂ ν γ j i (cid:0) Y − j ξ (cid:1)(cid:17)(cid:12)(cid:12)(cid:12) ≤ " sup j ∈ J ( β ) max | ν |≤ max | α |≤ N (cid:13)(cid:13)(cid:13) ∂ α [ ∂ ν γ j (cid:13)(cid:13)(cid:13) sup · U ′ j (cid:0) Y − j ξ (cid:1) =: K · U ′ j (cid:0) Y − j ξ (cid:1) = K · Y j U ′ j ( ξ ) for all ξ ∈ R and j ∈ J ( β ) . Here, we emphasize that the constant K is finite since (cid:8) γ j (cid:12)(cid:12) j ∈ J ( β ) (cid:9) = { ϕ, ψ } ⊂ S (cid:0) R (cid:1) is a finite set.Next, if we set U j := Y ( β ) j U ′ j for j ∈ J ( β ) , then Lemma D.2 yields constants L , C > and M ∈ N (dependingonly on λ, µ, A, B, β ) such that Υ j := n i ∈ J ( β ) (cid:12)(cid:12)(cid:12) U i ∩ U j = ∅ o satisfies | Υ j | ≤ M ∀ j ∈ J ( β ) , (cid:13)(cid:13) Y − i Y j (cid:13)(cid:13) ≤ C ∀ i, j ∈ J ( β ) with U i ∩ U j = ∅ ,L − · v j ≤ | ξ | ≤ L · v j ∀ j ∈ J ( β )0 and ξ ∈ U j . 
As a slight modification, the last estimate yields because of v j ≥ that (1 + L ) − · v j ≤ | ξ | ≤ | ξ | ≤ (1 + L ) · v j for all ξ ∈ U j and j ∈ J ( β )0 . Likewise, for ξ ∈ U = ( − A, A ) , we have (1 + 2 A ) − · v j ≤ ≤ | ξ | ≤ A = (1 + 2 A ) · v j , so that there is a constant L = L ( A, B, λ, µ, β ) > satisfying L − · v j ≤ | ξ | ≤ L · v j for all ξ ∈ U j and j ∈ J ( β ) . In particular, for i ∈ Υ j there is some ξ ∈ U i ∩ U j = ∅ , so that v i ≤ L · (1 + | ξ | ) ≤ L · v j . By symmetry,we also get v j ≤ L · v i and thus (cid:0) v sj /v si (cid:1) ≤ L | s | for all j ∈ J ( β ) and i ∈ Υ j .Putting everything together and recalling P ′ j ⊂ U ′ j for all j ∈ J ( β ) , we thus see M j,i = (cid:18) v sj v si (cid:19) τ · (cid:0) (cid:13)(cid:13) Y − j Y i (cid:13)(cid:13)(cid:1) σ · max | ν |≤ | det Y i | − · Z S ( β ) i max | α |≤ N (cid:12)(cid:12)(cid:12)(cid:16)h ∂ α [ ∂ ν γ j i (cid:0) Y − j ξ (cid:1)(cid:17)(cid:12)(cid:12)(cid:12) d ξ ! τ ≤ (cid:18) v sj v si (cid:19) τ · (cid:0) (cid:13)(cid:13) Y − j Y i (cid:13)(cid:13)(cid:1) σ · (cid:18) K · | det Y i | − · Z U i Y j U ′ j ( ξ ) d ξ (cid:19) τ = (cid:18) v sj v si (cid:19) τ · (cid:0) (cid:13)(cid:13) Y − j Y i (cid:13)(cid:13)(cid:1) σ · (cid:16) K · | det Y i | − · λ ( U i ∩ U j ) (cid:17) τ ≤ Υ j ( i ) · (cid:18) v sj v si (cid:19) τ · (cid:0) (cid:13)(cid:13) Y − j Y i (cid:13)(cid:13)(cid:1) σ · (cid:16) K · | det Y i | − · λ ( Y i U ′ i ) (cid:17) τ ≤ Υ j ( i ) · L τ | s | · (1 + C ) σ · K · sup i ∈ J ( β ) λ ( U ′ i ) ! τ = Υ j ( i ) · L τ | s | · (1 + C ) σ · K τ for K := K · max { λ (( − A, A ) ) , λ ( U ) } . Hence, using Υ j ( i ) = Υ i ( j ) , we finally get C /τ = sup i ∈ J ( β ) (cid:16) X j ∈ J ( β ) M j,i (cid:17) /τ ≤ L | s | · (1 + C ) στ · K · sup i ∈ J ( β ) | Υ i | /τ ≤ L | s | · (1 + C ) στ · K · M /τ ≤ L {| s | , | s |} · (1 + C ) p + N · K · M /τ =: K , nalysis vs. 
Synthesis Sparsity for α -Shearlets — Felix Voigtlaender & Anne Pein where the last step used στ = d min { ,p } + N ≤ p + N . Precisely the same arguments also show C /τ ≤ K < ∞ .Observe that K is independent of p, q, s , as long as p ≥ p , q ≥ q and s ≤ s ≤ s .Consequently, all assumptions of Theorem 2.9 are satisfied. Furthermore, since the sets R ∗ × R , P and ( − , are symmetric, we see analogously that all assumptions of Theorem 2.9 are still satisfied (possibly with a slightlydifferent constant K ) if ϕ is replaced by e ϕ and ψ by e ψ , where e f ( x ) = f ( − x ) . Thus, γ j = e ψ for j ∈ J ( β )0 and γ = e ϕ . Consequently, with a fixed regular partition of unity Φ = ( ϕ ℓ ) ℓ ∈ J ( β ) for S ( β ) , Theorem 2.9 yields a constant K = K (cid:16) p , q , S ( β ) , Φ , e ϕ, e ψ (cid:17) = K ( p , q , β, ϕ, ψ ) > , such that for arbitrary < δ ≤ δ = (cid:18) K · C S ( β ) ,v s · (cid:16) C /τ + C /τ (cid:17) (cid:19) − , the family (cid:16) L δ · Y − Ti k f γ [ i ] (cid:17) i ∈ J ( β ) ,k ∈ Z with γ [ i ] = | det Y i | / · M c i (cid:2) γ i ◦ Y Ti (cid:3) and f γ [ i ] ( x ) = γ [ i ] ( − x ) yields a Banach frame for S p,qβ,s (cid:0) R (cid:1) = D (cid:0) S ( β ) , L p , ℓ qv s (cid:1) , as precisely described in Theorem 2.9. But with what wejust saw and thanks to Corollary D.3, we have δ = (cid:18) K · C S ( β ) ,v s · (cid:16) C /τ + C /τ (cid:17) (cid:19) − ≥ (cid:16) K · K | s | · (2 K ) (cid:17) − ≥ (cid:16) · K · K {| s | , | s |} · K (cid:17) − =: δ for a suitable constant K = K ( β ) ≥ which is provided by Corollary D.3.Finally, note that the coefficient map A ( δ ) f = (cid:0)(cid:2) γ [ i ] ∗ f (cid:3) (cid:0) δ · Y − Ti k (cid:1)(cid:1) i ∈ J ( β ) , k ∈ Z from Theorem 2.9 uses a some-what peculiar definition of the convolution γ [ i ] ∗ f , cf. equation (2.3). 
Precisely, with the regular partition of unity Φ = ( ϕ ℓ ) ℓ ∈ J ( β ) from above, we have h γ [ i ] ∗ f i ( x ) = X ℓ ∈ J F − (cid:16) c γ [ i ] · ϕ ℓ · b f (cid:17) ( x ) (cid:0) series is finite sum, since Φ is a locally finite and d γ [ i ] ∈ C ∞ c ( R ) (cid:1) = F − X ℓ ∈ J ϕ ℓ · c γ [ i ] · b f ! ( x )( P ℓ ∈ J ϕ ℓ ≡ on R ) = F − (cid:16) c γ [ i ] · b f (cid:17) ( x ) = D b f , e πi h x, •i · c γ [ i ] E D ′ ( R ) ,C ∞ c ( R ) = D f, F h e πi h x, •i · c γ [ i ] iE Z ′ ( R ) ,Z ( R ) = (cid:28) f, L x · cc γ [ i ] (cid:29) Z ′ ( R ) ,Z ( R ) ( Fourier inversion ) = D f, L x · f γ [ i ] E Z ′ ( R ) ,Z ( R ) for all x ∈ R and i ∈ J ( β ) .It remains to verify that the family (cid:16) L δ · Y − Ti k f γ [ i ] (cid:17) i ∈ J ( β ) ,k ∈ Z is (almost) identical to the family (cid:0) γ [ i,k,δ ] (cid:1) i ∈ J ( β ) ,k ∈ Z from the statement of the theorem. Recall that c i = 0 for all i ∈ J ( β ) . Now, for i = 0 , Y i = Y = id and thus L δ · Y − Ti k f γ [ i ] = L δ · k e γ i = L δ · k ϕ = ϕ ( • − δk ) = γ [ i,k,δ ] . Next, in case of i = ( j, ℓ, ∈ J ( β )0 , recall from Definition D.1 that Y Ti = S ℓ · diag (cid:0) βj/ , j/ (cid:1) = S ℓ · A β − , βj/ and | det Y i | = 2 j (1+ β ) , so that L δ · Y − Ti k f γ [ i ] = L δ · h S ℓ · A β − , βj/ i − k f γ [ i ] = 2 j (1+ β ) · ψ (cid:0) S ℓ · A β − , βj/ • − δk (cid:1) = γ [ i,k,δ ] . Finally, in case of i = ( j, ℓ, ∈ J ( β )0 , a direct calculation shows Y Ti = (cid:18) j/ ℓ βj/ j/ (cid:19) = R · S Tℓ · e A β − , βj/ , nalysis vs. Synthesis Sparsity for α -Shearlets — Felix Voigtlaender & Anne Pein so that L δ · Y − Ti k f γ [ i ] = 2 j (1+ β ) · ψ (cid:0) Y Ti (cid:2) • − δY − Ti k (cid:3)(cid:1) = 2 j (1+ β ) · ψ (cid:16) R h S Tℓ · e A β − , βj/ • − δRk i(cid:17) = 2 j (1+ β ) · θ (cid:16) S Tℓ · e A β − , βj/ • − δRk (cid:17) = γ [ i,Rk,δ ] . But since Z → Z , k Rk is bijective, it is not hard to see directly from the definition of the coefficient space C p,qv s (cf. 
Definition 2.8) that if we set δ i := 1 for i of the form i = ( j, ℓ, and δ i := 0 otherwise, then Ω : C p,qv s → C p,qv s , ( c ( i ) k ) i ∈ J,k ∈ Z (cid:16) c ( i ) R δi · k (cid:17) i ∈ J,k ∈ Z is an isometric isomorphism. All in all, we have shown A ( δ ) f = (cid:16)h γ [ i ] ∗ f i (cid:0) δ · Y − Ti k (cid:1)(cid:17) i ∈ J ( β ) , k ∈ Z = (cid:18)D f, L δ · Y − Ti k f γ [ i ] E Z ′ ( R ) ,Z ( R ) (cid:19) i ∈ J ( β ) , k ∈ Z = (cid:18)D f, γ [ i,R δi k,δ ] E Z ′ ( R ) ,Z ( R ) (cid:19) i ∈ J ( β ) , k ∈ Z = Ω "(cid:18)D f, γ [ i,k,δ ] E Z ′ ( R ) ,Z ( R ) (cid:19) i ∈ J ( β ) , k ∈ Z . In conjunction with Theorem 2.9, this easily yields all claimed properties. (cid:3)
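The transformation matrices $S_{\ell}\, A_{\beta^{-1},\, 2^{\beta j/2}}$ appearing in Definition D.7 and in the frame elements $\gamma^{[i,k,\delta]}$ above can be tabulated directly. The following sketch is our own illustration (the helper names are not from the paper): it enumerates the horizontal-cone parameters $(j,\ell)$ and checks the determinant identity $\det\big(S_{\ell}\, A_{\beta^{-1},\, 2^{\beta j/2}}\big) = 2^{j(\beta+1)/2}$, whose square root gives the $L^2$-normalization of the system.

```python
# Sketch: index set and transformation matrices of the horizontal cone of a
# cone-adapted beta-shearlet system (cf. Definition D.7). Illustration only;
# the function names are ours, not the paper's.
import math

def A(alpha, s):
    """Anisotropic dilation A_{alpha,s} = diag(s, s^alpha)."""
    return [[s, 0.0], [0.0, s ** alpha]]

def S(l):
    """Shearing matrix S_l."""
    return [[1.0, float(l)], [0.0, 1.0]]

def matmul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def shearlet_matrices(beta, jmax):
    """Yield (j, l, S_l @ A_{beta^-1, 2^(beta j/2)}) for all scales j <= jmax."""
    alpha = 1.0 / beta
    for j in range(jmax + 1):
        s = 2.0 ** (beta * j / 2)
        lmax = math.ceil(2.0 ** (j * (beta - 1) / 2))
        for l in range(-lmax, lmax + 1):
            yield j, l, matmul(S(l), A(alpha, s))

beta = 1.5
mats = list(shearlet_matrices(beta, jmax=4))
for j, l, M in mats:
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    # shearing has determinant 1, so det = 2^(j(beta+1)/2) at every shear l
    assert abs(det - 2.0 ** (j * (beta + 1) / 2)) < 1e-9
```

Note how the number of shears per scale, $2\lceil 2^{j(\beta-1)/2}\rceil + 1$, grows with $j$: this is the anisotropic scaling that distinguishes $\beta$-shearlets from wavelets.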
Now, we can finally provide the proof of Proposition 6.2 for the general case $\beta \in (1,2]$.

Proof of Proposition 6.2 for $\beta \in (1,2]$. Set $\alpha := \beta^{-1} \in (0,1)$. For $j \in \mathbb{N}_0$, let $L_j := 2^{\lfloor j(1-\alpha) \rfloor}$ and $M := M_0 \times \mathbb{Z}^2$, with $M_0 := \{0\} \cup \big\{ (j,\ell) \in \mathbb{N} \times \mathbb{Z} \,:\, \ell \in \{0,\dots,L_j - 1\} \big\}$. Furthermore, let $(\psi_{\mu})_{\mu \in M}$ be the tight $\alpha$-curvelet frame constructed in [38, Section 3]; see also [37, Definition 2.2]. Then, [38, Theorem 4.2] yields a constant $C = C(\beta,\nu) > 0$ such that we have
$$|\theta^{\ast}_N(f)| \leq C \cdot \big[ N^{-1} \cdot (1+\log N) \big]^{(1+\beta)/2} \qquad \forall\, N \in \mathbb{N} \text{ and } f \in \mathcal{E}^{\beta}(\mathbb{R}^2; \nu) ,$$
where $\theta^{\ast}_N(f)$ denotes the $N$-th largest (in absolute value) $\alpha$-curvelet coefficient of $f$ with respect to the $\alpha$-curvelet frame $(\psi_{\mu})_{\mu \in M}$. This easily implies
$$\big\| \big( \langle f, \psi_{\mu} \rangle_{L^2} \big)_{\mu \in M} \big\|_{\ell^p} = \big\| (\theta^{\ast}_N(f))_{N \in \mathbb{N}} \big\|_{\ell^p} \leq C_1^{(p)} \qquad \forall\, f \in \mathcal{E}^{\beta}(\mathbb{R}^2; \nu) \text{ and } p > \tfrac{2}{1+\beta} , \tag{D.15}$$
for a suitable constant $C_1^{(p)} = C_1^{(p)}(\beta,\nu)$.

Now, let $\varphi, \psi$ be real-valued functions satisfying the requirements of Proposition D.8 and let $\theta := \psi \circ R$. Let $p_0 = q_0 = \frac{2}{1+\beta}$, $s_0 = 0$ and $s_1 = \frac{1}{2}(1+\beta)$ and choose $\delta_0 = \delta_0(\beta, p_0, q_0, s_0, s_1, \varphi, \psi) > 0$ as provided by Proposition D.8, so that the cone-adapted $\beta$-shearlet system $\mathrm{SH}(\varphi,\psi,\theta;\, \delta, \beta)$ forms a Banach frame for $\mathscr{S}^{p,q}_{\beta,s}(\mathbb{R}^2)$ for all $0 < \delta \leq \delta_0$, $p \geq p_0$, $q \geq q_0$ and $s_0 \leq s \leq s_1$, in the sense of Proposition D.8.

From this point on, the proof heavily uses the results and terminology of [37]: Since $\varphi, \psi, \theta$ are bandlimited, [37, Proposition 3.11(ii)] shows that $\mathrm{SH}(\varphi,\psi,\theta;\, \delta, \beta)$ is a system of $\beta^{-1}$-molecules of order $(\infty,\infty,\infty,\infty)$ with respect to the parametrization $(\Lambda_s, \Phi_s)$ with $\tau = \delta$, $\sigma = 2^{\beta/2}$, $\eta_j = \sigma^{-j(1-\alpha)}$ and $L_j = \lceil \sigma^{j(1-\alpha)} \rceil$, cf. [37, Definitions 3.7 and 3.8] for details of this parametrization. Furthermore, [37, Proposition 3.3(iii)] shows that the $\alpha$-curvelet frame $(\psi_{\mu})_{\mu \in M}$ from above is a system of $\alpha$-molecules of order $(\infty,\infty,\infty,\infty)$ with respect to the parametrization $(\Lambda_c, \Phi_c)$ given in [37, Definition 3.2], with parameters $\sigma = 2$, $\tau = 1$ and $L_j = 2^{\lfloor j(1-\alpha) \rfloor}$ as above.

Next, [37, Theorem 5.7] shows that the $\alpha$-curvelet parametrization $(\Lambda_c, \Phi_c)$ (defined in [37, Definition 3.2]) and the $\alpha$-shearlet parametrization $(\Lambda_s, \Phi_s)$ are $(\alpha,k)$-consistent for all $k > 0$; cf. [37, Definition 5.5] for the definition of $(\alpha,k)$-consistency. Now, for arbitrary $p \in (\frac{2}{1+\beta}, 1]$, [37, Theorem 5.6] shows that $(\psi_{\mu})_{\mu \in M}$ and $\mathrm{SH}(\varphi,\psi,\theta;\, \delta, \beta) = \big(\gamma^{[i,k,\delta]}\big)_{(i,k) \in J^{(\beta)} \times \mathbb{Z}^2}$ are sparsity equivalent in $\ell^p$, which means (cf.
[37, Definition 5.3]) that the operator $A : \ell^p(M) \to \ell^p\big(J^{(\beta)} \times \mathbb{Z}^2\big)$ given by the infinite matrix $\big( \langle \psi_{\mu}, \gamma^{[i,k,\delta]} \rangle_{L^2} \big)_{\mu \in M,\, (i,k) \in J^{(\beta)} \times \mathbb{Z}^2}$ is well-defined and bounded. Now, since $(\psi_{\mu})_{\mu \in M}$ is a tight frame, we get
$$f = \sum_{\mu \in M} \langle f, \psi_{\mu} \rangle_{L^2} \cdot \psi_{\mu} \qquad\text{and thus}\qquad \big\langle f, \gamma^{[i,k,\delta]} \big\rangle_{L^2} = \sum_{\mu \in M} \big\langle \psi_{\mu}, \gamma^{[i,k,\delta]} \big\rangle_{L^2} \, \langle f, \psi_{\mu} \rangle_{L^2} = A\Big[ \big( \langle f, \psi_{\mu} \rangle_{L^2} \big)_{\mu \in M} \Big] .$$
Consequently, since $\varphi, \psi$ and thus also $\gamma^{[i,k,\delta]}$ are real-valued, we get from equation (D.15) that
$$\Big\| \Big( \big\langle f, \gamma^{[i,k,\delta]} \big\rangle_{Z'(\mathbb{R}^2), Z(\mathbb{R}^2)} \Big)_{(i,k) \in J^{(\beta)} \times \mathbb{Z}^2} \Big\|_{\ell^p} = \Big\| \big( \langle f, \gamma^{[i,k,\delta]} \rangle_{L^2} \big)_{(i,k) \in J^{(\beta)} \times \mathbb{Z}^2} \Big\|_{\ell^p} = \Big\| A\Big[ \big( \langle f, \psi_{\mu} \rangle_{L^2} \big)_{\mu \in M} \Big] \Big\|_{\ell^p} \leq |\!|\!| A |\!|\!|_{\ell^p \to \ell^p} \cdot C_1^{(p)} =: C_2^{(p)} < \infty$$
for all $f \in \mathcal{E}^{\beta}(\mathbb{R}^2; \nu) \subset L^2(\mathbb{R}^2) = \mathscr{S}^{2,2}_{\beta,0}(\mathbb{R}^2)$ (cf. [60, Lemma 6.10]) and $p \in (\frac{2}{1+\beta}, 1]$.

Footnote: Before [37, Proposition 3.11], it is required that the generators $\varphi, \psi, \theta$ of a band-limited $\beta$-shearlet system satisfy $\mathrm{supp}\, \varphi \subset Q$, $\mathrm{supp}\, \psi \subset W$ and $\mathrm{supp}\, \theta \subset \widetilde{W}$, where $Q \subset \mathbb{R}^2$ is a cube centered at the origin and $W, \widetilde{W} \subset \mathbb{R}^2$ satisfy $W \subset [-a,a] \times ([-c,-b] \cup [b,c])$ and $\widetilde{W} \subset ([-c,-b] \cup [b,c]) \times [-a,a]$ for certain $a > 0$ and $0 < b < c$. This is of course impossible, since $\varphi, \psi, \theta$ would then need to be simultaneously bandlimited and compactly supported. What is actually meant is $\mathrm{supp}\, \widehat{\varphi} \subset Q$, $\mathrm{supp}\, \widehat{\psi} \subset \widetilde{W}$ and $\mathrm{supp}\, \widehat{\theta} \subset W$, with $Q, W, \widetilde{W}$ as above. Note the interchange of the sets $\widetilde{W}$ and $W$ compared to the condition in [37]. It is not hard to see that our generators $\varphi, \psi, \theta$ satisfy these corrected assumptions, since $\mathrm{supp}\, \widehat{\psi} \subset \mathbb{R}^{\ast} \times \mathbb{R}$.
But since $\ell^1 \hookrightarrow \ell^p$ for $p \geq 1$, this estimate in fact holds for all $p \in (\frac{2}{1+\beta}, \infty]$, with $C_2^{(p)} := C_2^{(1)}$ for $p \geq 1$. In view of the consistency statement in Proposition D.8, and since the remark after Definition 2.8 shows $C^{p,p}_{v^{(1+\beta^{-1})(1/p - 1/2)}} = \ell^p\big(J^{(\beta)} \times \mathbb{Z}^2\big)$, we thus get $f \in \mathscr{S}^{p,p}_{\beta,\, (1+\beta^{-1})(1/p - 1/2)}(\mathbb{R}^2)$, with
$$\| f \|_{\mathscr{S}^{p,p}_{\beta,\, (1+\beta^{-1})(1/p - 1/2)}} = \big\| R^{(\delta)} A^{(\delta)} f \big\|_{\mathscr{S}^{p,p}_{\beta,\, (1+\beta^{-1})(1/p - 1/2)}} \leq |\!|\!| R^{(\delta)} |\!|\!|_{C^{p,p}_{v^{(1+\beta^{-1})(1/p-1/2)}} \to \mathscr{S}^{p,p}_{\beta,\, (1+\beta^{-1})(1/p-1/2)}} \cdot \Big\| \Big( \big\langle f, \gamma^{[i,k,\delta]} \big\rangle_{Z'(\mathbb{R}^2), Z(\mathbb{R}^2)} \Big)_{(i,k) \in J^{(\beta)} \times \mathbb{Z}^2} \Big\|_{\ell^p} \leq C_3^{(p)}$$
for all $f \in \mathcal{E}^{\beta}(\mathbb{R}^2; \nu)$ and arbitrary $p \in (\frac{2}{1+\beta}, 1]$, for a suitable constant $C_3^{(p)} = C_3^{(p)}(\varphi,\psi,\beta,\nu,p,\delta)$. Here, $R^{(\delta)}$ is the reconstruction operator provided by Proposition D.8. This uses that we indeed have $p \geq p_0 = q_0 = \frac{2}{1+\beta}$ and
$$s_0 = 0 \leq \big(1+\beta^{-1}\big)\Big(\frac{1}{p} - \frac{1}{2}\Big) \leq \big(1+\beta^{-1}\big)\Big(\frac{1+\beta}{2} - \frac{1}{2}\Big) = \frac{1}{2}(1+\beta) = s_1 ,$$
so that Proposition D.8 applies. Since Lemma D.6 shows $\mathscr{S}^{p,p}_{\beta,\, (1+\beta^{-1})(1/p - 1/2)} = \mathscr{S}^{p,p}_{\beta^{-1},\, (1+\beta^{-1})(1/p - 1/2)}(\mathbb{R}^2)$, the proof is complete. $\square$

Appendix E. A slight twist for achieving polynomial search depth
In Theorem 6.3, we saw for $\beta \in (1,2]$ and $\alpha = \beta^{-1}$ that suitable $\alpha$-shearlet systems achieve the approximation rate $\| f - f_N \|_{L^2} \lesssim N^{-(\beta/2 - \varepsilon)}$ for arbitrary $\varepsilon > 0$ and $C^{\beta}$-cartoon-like functions $f \in \mathcal{E}^{\beta}(\mathbb{R}^2)$. Furthermore, we recalled from [38, Theorem 2.8] that this approximation rate is essentially optimal, in the sense that no system $\Phi = (\varphi_n)_{n \in \mathbb{N}}$ can achieve an approximation rate better than $N^{-\beta/2}$ for the whole class $\mathcal{E}^{\beta}(\mathbb{R}^2; \nu)$, if one imposes a polynomial search depth for forming the $N$-term approximation $f_N$. This means that $f_N$ is assumed to be a linear combination of $N$ elements of $\{\varphi_1, \dots, \varphi_{\pi(N)}\}$, where $\pi$ is a fixed polynomial, independent of $f$. We did not show, however, that the $N$-term approximations $f_N$ constructed in Theorem 6.3 satisfy such a polynomial search depth restriction. The goal of this section is precisely to show that this is possible for a suitable enumeration $(\psi_n)_{n \in \mathbb{N}}$ of the $\alpha$-shearlet system under consideration.

The proof, however, is surprisingly nontrivial: In the proof of Theorem 6.3, we used that $f = \sum_{i \in V \times \mathbb{Z}^2} c_i \psi_i$ for a sequence $c = (c_i)_{i \in V \times \mathbb{Z}^2}$ with $c \in \bigcap_{p > 2/(1+\beta)} \ell^p\big(V \times \mathbb{Z}^2\big)$ and then truncated $c$ to $c \cdot \mathbb{1}_{J_N}$ to form $f_N = \sum_{i \in J_N} c_i \psi_i$, where $J_N \subset V \times \mathbb{Z}^2$ contains the indices of the $N$ largest entries of $c$. But the positions of these indices depend heavily on $c = c(f)$ and thus on $f$, while the polynomial search depth restriction requires us to use only indices in $\{1, \dots, \pi(N)\}$, where $\pi$ is independent of $f$.

Thus, what we essentially need is a certain (weak) decay of the coefficients, uniformly over the whole class $\mathcal{E}^{\beta}(\mathbb{R}^2; \nu)$. But with our present decomposition space formalism, we cannot express such a decay, cf. Theorem 5.13: By choosing the exponent $s$ for the weight $u^s$ suitably, we can enforce a decay of the coefficients with the scale. But since the weight is independent of the translation variable $k \in \mathbb{Z}^2$ and since the space $\ell^p(\mathbb{Z}^2)$ is permutation invariant, the current formalism cannot impose a decay of the coefficients as $|k| \to \infty$.

Ultimately, this is caused by the definition of the decomposition spaces: It is not hard to see that the spaces $\mathcal{D}(\mathcal{Q}, L^p, \ell^q_w)$ are isometrically translation invariant. What we need, therefore, is a modified type of decomposition spaces which does not have this property. Luckily, such a type of decomposition spaces already exists. In fact, the theory of structured Banach frame decompositions in [62] was developed for the spaces $\mathcal{D}(\mathcal{Q}, L^p_v, \ell^q_w)$, where the Lebesgue spaces $L^p(\mathbb{R}^d)$ are replaced by the weighted Lebesgue spaces $L^p_v(\mathbb{R}^d) = \big\{ f : \mathbb{R}^d \to \mathbb{C} \,:\, v \cdot f \in L^p(\mathbb{R}^d) \big\}$ with $\| f \|_{L^p_v} = \| v \cdot f \|_{L^p}$, where $v : \mathbb{R}^d \to (0,\infty)$ is measurable. This theory is briefly discussed in the next subsection.

Footnote: Note that an infinite matrix $(A_{i,j})_{i \in I, j \in J}$ usually would yield an operator $\ell^p(J) \to \ell^p(I)$, not $\ell^p(I) \to \ell^p(J)$. But the convention used here is the same as in [37], see e.g. the proof of [37, Proposition 5.2].

E.1. Structured Banach frame decompositions of weighted decomposition spaces.
The weight $v$ from above needs to satisfy certain regularity properties to ensure that the spaces $\mathcal{D}(\mathcal{Q}, L^p_v, \ell^q_w)$ are well-defined. Precisely, we say that a measurable weight $v : \mathbb{R}^d \to (0,\infty)$ is $v_0$-moderate for some weight $v_0 : \mathbb{R}^d \to (0,\infty)$ if we have
$$v(x+y) \leq v(x) \cdot v_0(y) \qquad \forall\, x, y \in \mathbb{R}^d . \tag{E.1}$$
Now, as in Section 2, let us fix an almost structured covering $\mathcal{Q} = (T_i Q_i' + b_i)_{i \in I}$ of an open set $\emptyset \neq \mathcal{O} \subset \mathbb{R}^d$ with associated regular partition of unity $\Phi = (\varphi_i)_{i \in I}$ for the remainder of the subsection and assume that $\mathcal{Q}$ satisfies Assumption 2.7. The weight $v_0$ is called $(\mathcal{Q}, \Omega_0, \Omega_1, K)$-regular, for $\Omega_0, \Omega_1 \in [1,\infty)$ and $K \in [0,\infty)$, if it satisfies the following:
(1) $v_0$ is measurable and symmetric, i.e., $v_0(-x) = v_0(x)$ for all $x \in \mathbb{R}^d$.
(2) $v_0$ is submultiplicative, i.e., $v_0(x+y) \leq v_0(x) \cdot v_0(y)$ for all $x, y \in \mathbb{R}^d$.
(3) We have $v_0(x) \leq \Omega_0 \cdot (1+|x|)^K$ for all $x \in \mathbb{R}^d$.
(4) We have $K = 0$, or $\| T_i^{-1} \| \leq \Omega_1$ for all $i \in I$.
We note that the preceding assumptions imply $v_0(x) \geq 1$ for all $x \in \mathbb{R}^d$. Indeed, $v_0(0) = v_0(x + (-x)) \leq [v_0(x)]^2$ for all $x \in \mathbb{R}^d$ by symmetry and submultiplicativity. For $x = 0$, this yields $v_0(0) \geq 1$, since $v_0(0) > 0$. Finally, we then see $1 \leq v_0(0) \leq [v_0(x)]^2$ and hence $v_0(x) \geq 1$ for all $x \in \mathbb{R}^d$.

The following example introduces the class of weights in which we will be mainly interested.

Example E.1.
The standard weight $\omega$ is given by $\omega : \mathbb{R}^d \to (0,\infty),\ x \mapsto 1+|x|$. It is submultiplicative, since
$$1 + |x+y| \leq 1 + |x| + |y| \leq (1+|x|) \cdot (1+|y|) \qquad \forall\, x, y \in \mathbb{R}^d .$$
Hence, if we have $K = 0$ and $\Omega_1 = 1$, or if $K > 0$ and $\| T_i^{-1} \| \leq \Omega_1$ for all $i \in I$, then $\omega^K$ is $(\mathcal{Q}, 1, \Omega_1, K)$-regular. Furthermore, if $L \in \mathbb{R}$ with $|L| \leq K$, then $\omega^L$ is $\omega^K$-moderate. For $L \geq 0$, this follows from submultiplicativity of $\omega$, since $\omega^L(x+y) \leq \omega^L(x)\, \omega^L(y) \leq \omega^L(x)\, \omega^K(y)$. If $L < 0$, then our considerations for $-L \geq 0$ show $\omega^{-L}(x) = \omega^{-L}([x+y] - y) \leq \omega^{-L}(x+y)\, \omega^{-L}(-y) \leq \omega^{-L}(x+y)\, \omega^K(y)$. Rearranging again yields the claim.

Finally, in case of the unconnected $\alpha$-shearlet covering $\mathcal{Q} = \mathcal{S}^{(\alpha)}_u = (B_v W_v')_{v \in V^{(\alpha)}}$, we have $\| B_v^{-1} \| \leq 4$ for all $v \in V^{(\alpha)}$. Indeed, for $v = 0$, this is trivial and for $v = (j,\ell,\delta) \in V^{(\alpha)}_0$, we have
$$\big\| B_{(j,\ell,\delta)}^{-1} \big\| = \bigg\| \begin{pmatrix} 2^{-j} & -2^{-j}\ell \\ 0 & 2^{-\alpha j} \end{pmatrix} R_{\delta}^{-1} \bigg\| = \bigg\| \begin{pmatrix} 2^{-j} & -2^{-j}\ell \\ 0 & 2^{-\alpha j} \end{pmatrix} \bigg\| \leq 2^{-j} + 2^{-\alpha j} + \big| 2^{-j}\ell \big| \leq 4 .$$
Here, the last step used that $|\ell| \leq \lceil 2^{(1-\alpha)j} \rceil \leq 2^{(1-\alpha)j} + 1$, so that $2^{-j}|\ell| \leq 2^{-\alpha j} + 2^{-j} \leq 2$. Therefore, $\omega^K$ is $(\mathcal{S}^{(\alpha)}_u, 1, 4, K)$-regular for $K \geq 0$. ♦
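The two properties established in Example E.1, submultiplicativity of $\omega$ and $\omega^K$-moderateness of $\omega^L$ for $|L| \leq K$, are easy to sanity-check numerically. A minimal sketch of our own (in 1D, with illustrative sample values for $K$ and $L$ that are not from the paper):

```python
# Numerical sanity check of Example E.1: the standard weight w(x) = 1 + |x|
# is submultiplicative, and w^L is w^K-moderate for |L| <= K (here in 1D,
# with illustrative values K = 3, L = -2).
import itertools

def w(x):
    return 1.0 + abs(x)

K, L = 3.0, -2.0
pts = [x / 4.0 for x in range(-40, 41)]   # grid in [-10, 10]

for x, y in itertools.product(pts, pts):
    # submultiplicativity: w(x + y) <= w(x) * w(y)
    assert w(x + y) <= w(x) * w(y) + 1e-12
    # moderateness (E.1) with v = w^L and v0 = w^K
    assert w(x + y) ** L <= (w(x) ** L) * (w(y) ** K) + 1e-12

print("checks passed")
```

A finite grid of course proves nothing; the point is only to make the rearrangement argument for $L < 0$ tangible, since that is the one step of the example that is easy to get wrong by hand.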
Definition E.2.
Let $p, q \in (0,\infty]$ and let $w = (w_i)_{i \in I}$ be $\mathcal{Q}$-moderate. Further, let $v_0$ be $(\mathcal{Q}, \Omega_0, \Omega_1, K)$-regular and let $v$ be $v_0$-moderate.

Then, the (weighted) decomposition space (quasi)-norm of $g \in Z'(\mathcal{O})$ is defined as
$$\| g \|_{\mathcal{D}(\mathcal{Q}, L^p_v, \ell^q_w)} := \Big\| \Big( \big\| \mathcal{F}^{-1}(\varphi_i \cdot \widehat{g}\,) \big\|_{L^p_v} \Big)_{i \in I} \Big\|_{\ell^q_w} \in [0,\infty]$$
and the associated (weighted) decomposition space is
$$\mathcal{D}(\mathcal{Q}, L^p_v, \ell^q_w) := \big\{ g \in Z'(\mathcal{O}) \,:\, \| g \|_{\mathcal{D}(\mathcal{Q}, L^p_v, \ell^q_w)} < \infty \big\} . \quad ◭$$
It is a consequence of [62, Proposition 2.24, Lemma 5.5, and Corollary 6.5] that the resulting space is a well-defined quasi-Banach space, with equivalent (quasi)-norms for different choices of $\Phi$. Indeed, [62, Proposition 2.24] shows that the definition is independent of the $\mathcal{Q}$-$v_0$-BAPU $\Phi$, while [62, Corollary 6.5] ensures that every regular partition of unity is a $\mathcal{Q}$-$v_0$-BAPU. Finally, [62, Lemma 5.5] establishes completeness of $\mathcal{D}(\mathcal{Q}, L^p_v, \ell^q_w)$. ♦

Recall from Section 2 that the Banach frame and atomic decomposition results for $\mathcal{D}(\mathcal{Q}, L^p, \ell^q_w)$ were formulated in terms of the coefficient space $C^{p,q}_w$ from Definition 2.8. This coefficient space needs to be slightly adjusted in the present case.

Definition E.3.
Under the assumptions of Definition E.2 and for $\delta \in (0,\infty)$, define the weighted coefficient space $C^{p,q}_{w,v,\delta}$ as
$$C^{p,q}_{w,v,\delta} := \bigg\{ c = \big( c_k^{(i)} \big)_{i \in I,\, k \in \mathbb{Z}^d} \,:\, \| c \|_{C^{p,q}_{w,v,\delta}} := \bigg\| \Big( |\det T_i|^{\frac{1}{2} - \frac{1}{p}} \cdot w_i \cdot \Big\| \big[ v\big(\delta \cdot T_i^{-T} k\big) \cdot c_k^{(i)} \big]_{k \in \mathbb{Z}^d} \Big\|_{\ell^p} \Big)_{i \in I} \bigg\|_{\ell^q} < \infty \bigg\} . \quad ◭$$

The corresponding “weighted version” of Theorem 2.9 on the existence of Banach frames for decomposition spaces reads as follows:
Theorem E.4.
Assume that Q satisfies Assumption 2.7. Let Ω , Ω ∈ [1 , ∞ ) , K ∈ [0 , ∞ ) and ε, p , q ∈ (0 , .Let v be ( Q , Ω , Ω , K ) -regular. Let w = ( w i ) i ∈ I be a Q -moderate weight and let v be v -moderate. Finally, let p, q ∈ (0 , ∞ ] with p ≥ p and q ≥ q .Define N := (cid:24) K + d + ε min { , p } (cid:25) , τ := min { , p, q } and σ := τ · (cid:18) d min { , p } + K + N (cid:19) . Let γ (0)1 , . . . , γ (0) n : R d → C be given and define γ i := γ (0) k i for i ∈ I . Assume that the following conditions aresatisfied:(1) We have γ (0) k ∈ L |•| ) K (cid:0) R d (cid:1) and F γ (0) k ∈ C ∞ (cid:0) R d (cid:1) for all k ∈ n , where all partial derivatives of F γ (0) k arepolynomially bounded.(2) We have h F γ (0) k i ( ξ ) = 0 for all ξ ∈ Q ( k )0 and all k ∈ n .(3) We have γ (0) k ∈ C (cid:0) R d (cid:1) and ∇ γ (0) k ∈ L v (cid:0) R d (cid:1) ∩ L ∞ (cid:0) R d (cid:1) for all k ∈ n .(4) We have C := sup i ∈ I X j ∈ I M j,i < ∞ and C := sup j ∈ I X i ∈ I M j,i < ∞ , where M j,i := (cid:18) w j w i (cid:19) τ · (cid:0) (cid:13)(cid:13) T − j T i (cid:13)(cid:13)(cid:1) σ · max | β |≤ (cid:18) | det T i | − · Z Q i max | α |≤ N (cid:12)(cid:12)(cid:12)(cid:16)h ∂ α [ ∂ β γ j i (cid:0) T − j ( ξ − b j ) (cid:1)(cid:17)(cid:12)(cid:12)(cid:12) d ξ (cid:19) τ . Then there is some δ = δ (cid:0) p, q, w, v, v , ε, ( γ i ) i ∈ I (cid:1) > such that for arbitrary < δ ≤ δ , the family (cid:16) L δ · T − Ti k f γ [ i ] (cid:17) i ∈ I,k ∈ Z d with γ [ i ] = | det T i | / · M b i (cid:2) γ i ◦ T Ti (cid:3) and f γ [ i ] ( x ) = γ [ i ] ( − x ) forms a Banach frame for D ( Q , L pv , ℓ qw ) . Precisely, this means the following: • The analysis operator A ( δ ) : D ( Q , L pv , ℓ qw ) → C p,qw,v,δ , f (cid:0) [ γ [ i ] ∗ f ] (cid:0) δ · T − Ti k (cid:1)(cid:1) i ∈ I,k ∈ Z d is well-defined and bounded for each δ ∈ (0 , . 
Here, the convolution γ [ i ] ∗ f is defined as in equation (2.3),where now the series converges normally in L ∞ (1+ |•| ) − K (cid:0) R d (cid:1) and thus absolutely and locally uniformly, for each f ∈ D ( Q , L pv , ℓ qw ) . Of course, the simplified expression from Lemma 5.12 still holds if f ∈ L (cid:0) R d (cid:1) ⊂ Z ′ ( O ) . • For < δ ≤ δ , there is a bounded linear reconstruction operator R ( δ ) : C p,qw,v,δ → D ( Q , L pv , ℓ qw ) satisfying R ( δ ) ◦ A ( δ ) = id D ( Q ,L pv ,ℓ qw ) . • We have the following consistency property : If Q -moderate weights w (1) = ( w (1) i ) i ∈ I and w (2) = ( w (2) i ) i ∈ I and exponents p , p , q , q ∈ (0 , ∞ ] , as well as two v -moderate weights v , v : R d → (0 , ∞ ) are chosen suchthat the assumptions of the current theorem are satisfied for p , q , w (1) , v , as well as for p , q , w (2) , v andif < δ ≤ min (cid:8) δ (cid:0) p , q , w (1) , v , v , ε, ( γ i ) i ∈ I (cid:1) , δ (cid:0) p , q , w (2) , v , v , ε, ( γ i ) i ∈ I (cid:1)(cid:9) , then we have the followingequivalence: ∀ f ∈ D (cid:0) Q , L p v , ℓ q w (2) (cid:1) : f ∈ D (cid:0) Q , L p v , ℓ q w (1) (cid:1) ⇐⇒ (cid:0) [ γ [ i ] ∗ f ] (cid:0) δ · T − Ti k (cid:1)(cid:1) i ∈ I,k ∈ Z d ∈ C p ,q w (1) ,v ,δ . Finally, there is an estimate for the size of δ which is independent of the choice of p ≥ p and q ≥ q and of v, v :There is a constant L = L (cid:16) p , q , K, ε, d, Q , Φ , Ω , Ω , γ (0)1 , . . . , γ (0) n (cid:17) > such that we can choose δ = (cid:18) L · C Q ,w · (cid:16) C /τ + C /τ (cid:17) (cid:19) − . ◭ nalysis vs. Synthesis Sparsity for α -Shearlets — Felix Voigtlaender & Anne Pein Proof.
For brevity, set N := (cid:6) K + p − · ( d + ε ) (cid:7) and note N ≤ N .First of all, we verify that the family Γ = ( γ i ) i ∈ I satisfies [62, Assumption 3.6]. To this end, we want to apply[62, Lemma 3.7] (with N = n ). Recall that γ i = γ (0) k i and from Assumption 2.7 that Q ′ i = Q ( k i )0 for all i ∈ I . Thus,in the notation of [62, Lemma 3.7], we have for k ∈ n that Q ( k ) = [ { Q ′ i | i ∈ I and k i = k } ⊂ Q ( k )0 . But by our assumption, by continuity of F γ (0) k and by compactness of the sets Q ( k )0 , there is some c > satisfying | [ F γ (0) k ] ( ξ ) | ≥ c for all ξ ∈ Q ( k )0 ⊃ Q ( k ) and all k ∈ n . Consequently, [62, Lemma 3.7] shows that Γ satisfies [62,Assumption 3.6] and also yields the estimate Ω ( p,K )2 ≤ Ω for a constant Ω = Ω (cid:16) Q , γ (0)1 , . . . , γ (0) n , p , K, d (cid:17) > .Here, Ω ( p,K )2 is a constant defined in [62, Assumption 3.6]. To obtain this estimate, we used that p ≥ p .Now, since the family Γ satisfies [62, Assumption 3.6], the assumptions of the present theorem easily imply thatall assumptions of [62, Corollary 6.6] are satisfied. This uses the special structure of the family Γ = ( γ i ) i ∈ I , i.e.,that γ i = γ (0) k i for each i ∈ I .In particular, [62, Corollary 6.6] shows that the operators −→ A and −→ B from [62, Assumption 3.1 and Assumption 4.1]are well-defined and bounded with |||−→ A ||| max { , p } ≤ L (0)1 · (cid:16) C /τ + C /τ (cid:17) and |||−→ B ||| max { , p } ≤ L (0)1 · (cid:16) C /τ + C /τ (cid:17) for L (0)1 = Ω K Ω · d / min { ,p } · (4 d ) N · (cid:0) ε − · s d (cid:1) / min { ,p } · max | α |≤ N C ( α ) ≤ Ω K Ω · d /p · (4 d ) N · (cid:0) ε − · s d (cid:1) /p · max | α |≤ N C ( α ) =: L . Note that L = L ( d, ε, Q , Φ , p , Ω , Ω , K ) , since the constants C ( α ) from Definition 2.2 only depend on α, Q , Φ .Since [62, Corollary 6.6] is applicable to Γ , we see that Γ satisfies [62, Assumption 4.1]. 
Therefore, [62, Lemma 4.3] shows that the series in equation (2.3) converges normally in $L^\infty_{(1+|\bullet|)^{-K}}(\mathbb{R}^d)$ for all $f \in \mathcal{D}(\mathcal{Q}, L^p_v, \ell^q_w)$. Since each of the summands of the series is a continuous function, this yields absolute and locally uniform convergence of the series.
Next, since $\vec{A}$ and $\vec{B}$ are bounded, [62, Theorem 4.7] is applicable. This shows that the family $\big( L_{\delta \cdot T_i^{-T} k}\, \widetilde{\gamma^{[i]}} \big)_{i \in I, k \in \mathbb{Z}^d}$ yields a Banach frame for $\mathcal{D}(\mathcal{Q}, L^p_v, \ell^q_w)$ as in the statement of the current theorem, as soon as $0 < \delta \leq \delta_{00}$ for $\delta_{00} = 1 / (1 + 2 \cdot |||F|||)$, where the operator $F$ is defined in [62, Lemma 4.6]. That lemma also yields the estimate
$|||F||| \leq C_{\mathcal{Q},\Phi,v_0,p} \cdot |||\Gamma_\mathcal{Q}||| \cdot \big( |||\vec{A}|||^{\max\{1,p\}} + |||\vec{B}|||^{\max\{1,p\}} \big) \cdot L_2^{(0)} \leq C_{\mathcal{Q},\Phi,v_0,p} \cdot |||\Gamma_\mathcal{Q}||| \cdot \big( C_1^{1/\tau} + C_2^{1/\tau} \big) \cdot L_1 \cdot L_2^{(0)},$
where $C_{\mathcal{Q},\Phi,v_0,p} = \sup_{i \in I} \big[ |\det T_i|^{\min\{1,p\}-1} \cdot \|\mathcal{F}^{-1}\varphi_i\|_{L^1_{v_0}}^{\min\{1,p\}} \big]$ and where $\Gamma_\mathcal{Q} : \ell^q_w(I) \to \ell^q_w(I)$ is the $\mathcal{Q}$-clustering map given by $\Gamma_\mathcal{Q}\,(c_i)_{i \in I} = (c_i^*)_{i \in I}$ with $c_i^* = \sum_{\ell \in i^*} c_\ell$.
Further, with $M := \lceil (K + d + 1)/\min\{1,p\} \rceil \leq \lceil (K + d + 1)/p_0 \rceil =: M_0$, the constant $L_2^{(0)}$ is given by
$L_2^{(0)} = \big( 2 \cdot 3^{1/d} \big)^{d/p} \cdot d \cdot 2^d \cdot \big( 2 d M \big)^{M+1} \cdot N_\mathcal{Q}^{(p^{-1}-1)} \cdot (1 + R_\mathcal{Q} C_\mathcal{Q})^{d(p^{-1}-1)} \cdot \Omega_0^K \Omega_1 \Omega_2^{(p,K)}$ if $p < 1$, and
$L_2^{(0)} = \big( \lceil K \rceil \cdot \sqrt{d} \big) \cdot \big( 2 \cdot d^{3/2} \cdot M \big)^{\lceil K \rceil + d + 2} \cdot (1 + R_\mathcal{Q})^d \cdot \Omega_0^K \Omega_1 \Omega_2^{(p,K)}$ if $p \geq 1$. In either case, since $C_\mathcal{Q} \geq 1$,
$L_2^{(0)} \leq 2^{d/p_0} \cdot \big( 2 d M_0 \big)^{M_0+1} \cdot N_\mathcal{Q}^{1/p_0} \cdot (1 + R_\mathcal{Q} C_\mathcal{Q})^{d/p_0} \cdot \Omega_0^K \Omega_1 \Omega_3 =: L_2 .$
Here, the last step used that $M_0 = \lceil (K + d + 1)/p_0 \rceil \geq \lceil K + d + 1 \rceil = \lceil K \rceil + d + 1$, as well as $\Omega_1, \Omega_3 \geq 1$. Note as above that $L_2 = L_2(d, p_0, \mathcal{Q}, \Omega_0, \Omega_1, \Omega_3, K, M_0) = L_2(d, p_0, \mathcal{Q}, \Omega_0, \Omega_1, K, \gamma_1^{(0)}, \dots, \gamma_n^{(0)})$.
As seen in [60, Lemma 4.13], we have $|||\Gamma_\mathcal{Q}||| \leq C_{\mathcal{Q},w} \cdot N_\mathcal{Q}^{1/q} \leq C_{\mathcal{Q},w} \cdot N_\mathcal{Q}^{1/q_0}$. Furthermore, [62, Corollary 6.5] shows that there is a function $\varrho \in C_c^\infty(\mathbb{R}^d)$ (which only depends on $\mathcal{Q}$) such that
$C_{\mathcal{Q},\Phi,v_0,p} \leq \Omega_0^K \Omega_1 \cdot (4d)^N \cdot \big( s_d / \varepsilon \big)^{1/\min\{1,p\}} \cdot 2^N \cdot \lambda_d\Big( \bigcup_{i \in I} Q_i' \Big) \cdot \max_{|\alpha| \leq N} \|\partial^\alpha \varrho\|_{\sup} \cdot \max_{|\alpha| \leq N} C^{(\alpha)} \leq \Omega_0^K \Omega_1 \cdot (8d)^{N_0} \cdot \big( s_d / \varepsilon \big)^{1/p_0} \cdot (2 R_\mathcal{Q})^d \cdot \max_{|\alpha| \leq N_0} \|\partial^\alpha \varrho\|_{\sup} \cdot \max_{|\alpha| \leq N_0} C^{(\alpha)} =: L_3 .$
Here, we used that $Q_i' \subset B_{R_\mathcal{Q}}(0) \subset [-R_\mathcal{Q}, R_\mathcal{Q}]^d$ for all $i \in I$. Since the constants $C^{(\alpha)} = C^{(\alpha)}(\Phi, \mathcal{Q})$ from Definition 2.2 only depend on $\alpha, \Phi, \mathcal{Q}$ and since $\varrho$ only depends on $\mathcal{Q}$, we see $L_3 = L_3(d, \varepsilon, p_0, \mathcal{Q}, \Phi, \Omega_0, \Omega_1, K)$.
All in all, we arrive at
$|||F||| \leq N_\mathcal{Q}^{1/q_0} \cdot L_1 L_2 L_3 \cdot C_{\mathcal{Q},w} \cdot \big( C_1^{1/\tau} + C_2^{1/\tau} \big) = L_4 \cdot C_{\mathcal{Q},w} \cdot \big( C_1^{1/\tau} + C_2^{1/\tau} \big)$
for a suitable constant $L_4 = L_4\big( d, \varepsilon, p_0, q_0, \mathcal{Q}, \Phi, \gamma_1^{(0)}, \dots, \gamma_n^{(0)}, \Omega_0, \Omega_1, K \big)$, so that the family $\big( L_{\delta \cdot T_i^{-T} k}\, \widetilde{\gamma^{[i]}} \big)_{i \in I, k \in \mathbb{Z}^d}$ yields a Banach frame for $\mathcal{D}(\mathcal{Q}, L^p_v, \ell^q_w)$ as soon as $0 < \delta \leq \delta_0$ for $\delta_0 := \Big( 2 \big[ L_4 \cdot C_{\mathcal{Q},w} \cdot \big( C_1^{1/\tau} + C_2^{1/\tau} \big) \big] \Big)^{-1}$, since $\delta_0 \leq \delta_{00}$. Now, setting $L := 2 \cdot L_4$ yields the claim. □

Finally, we present a "weighted version" of Theorem 2.10 concerning the existence of atomic decompositions for decomposition spaces.
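For intuition, the $\mathcal{Q}$-clustering map $\Gamma_\mathcal{Q}$ used above simply replaces each coefficient by the sum over the neighbors of its index. A small numerical sketch confirms the kind of weighted boundedness that the estimate $|||\Gamma_\mathcal{Q}||| \leq C_{\mathcal{Q},w} \cdot N_\mathcal{Q}^{1/q}$ expresses; the one-dimensional index set, the adjacent-index neighborhoods, and the exponential weight are illustrative assumptions, not the covering from the theorem:

```python
import numpy as np

q = 1.0                          # any exponent q >= q0 behaves the same way
w = 2.0 ** np.arange(50)         # moderate weight: w_{i+1} / w_i is bounded
rng = np.random.default_rng(2)
c = rng.standard_normal(50)

def cluster(c):
    # (Gamma_Q c)_i = sum of c over the neighbor set i* = {i-1, i, i+1}
    out = np.zeros_like(c)
    for i in range(len(c)):
        out[i] = sum(c[j] for j in (i - 1, i, i + 1) if 0 <= j < len(c))
    return out

def lq_w(c):
    # weighted ell^q norm
    return float(((np.abs(c) * w) ** q).sum() ** (1.0 / q))

# the weight ratio between neighbors is at most 2 and each index has at
# most 3 neighbors, so a crude bound on the operator norm is 2 * 3 = 6
assert lq_w(cluster(c)) <= 6 * lq_w(c)
```

The crude constant $2 \cdot 3$ here plays the role of $C_{\mathcal{Q},w} \cdot N_\mathcal{Q}^{1/q}$: moderateness of the weight controls the ratio between neighboring weights, and $N_\mathcal{Q}$ bounds the number of neighbors.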
Theorem E.5.
Assume that $\mathcal{Q}$ satisfies Assumption 2.7. Let $\Omega_0, \Omega_1 \in [1,\infty)$, $K \in [0,\infty)$ and $\varepsilon, p_0, q_0 \in (0,1]$. Let $v_0$ be $(\mathcal{Q}, \Omega_0, \Omega_1, K)$-regular. Let $w = (w_i)_{i \in I}$ be a $\mathcal{Q}$-moderate weight and let $v$ be $v_0$-moderate. Finally, let $p, q \in (0,\infty]$ with $p \geq p_0$ and $q \geq q_0$.
Define
$N_0 := \Big\lceil \frac{K + d + \varepsilon}{\min\{1, p_0\}} \Big\rceil, \quad \tau := \min\{1, p, q\}, \quad \vartheta := \big( p^{-1} - 1 \big)_+, \quad \text{and} \quad \Upsilon := K + 1 + \frac{d}{\min\{1, p_0\}},$
as well as
$\sigma := \tau \cdot N_0$ if $p \in [1,\infty]$, and $\sigma := \tau \cdot \big( p^{-1} \cdot d + K + N_0 \big)$ if $p \in (0,1)$.
Let $\gamma_1^{(0)}, \dots, \gamma_n^{(0)} : \mathbb{R}^d \to \mathbb{C}$ be given and define $\gamma_i := \gamma_{k_i}^{(0)}$ for $i \in I$. Assume that there are functions $\gamma_1^{(0,j)}, \dots, \gamma_n^{(0,j)}$ for $j \in \{1, 2\}$ such that the following conditions are satisfied:
(1) We have $\gamma_k^{(0,1)} \in L^1_{(1+|\bullet|)^K}(\mathbb{R}^d)$ for all $k \in \underline{n}$.
(2) We have $\gamma_k^{(0,2)} \in C^1(\mathbb{R}^d)$ for all $k \in \underline{n}$.
(3) We have $\Omega_4^{(p)} := \max_{k \in \underline{n}} \big\| \gamma_k^{(0,2)} \big\|_\Upsilon + \max_{k \in \underline{n}} \big\| \nabla \gamma_k^{(0,2)} \big\|_\Upsilon < \infty$, where $\|f\|_\Upsilon = \sup_{x \in \mathbb{R}^d} (1+|x|)^\Upsilon \cdot |f(x)|$ for $f : \mathbb{R}^d \to \mathbb{C}^\ell$ and (arbitrary) $\ell \in \mathbb{N}$.
(4) We have $\mathcal{F}\gamma_k^{(0,j)} \in C^\infty(\mathbb{R}^d)$ and all partial derivatives of $\mathcal{F}\gamma_k^{(0,j)}$ are polynomially bounded, for all $k \in \underline{n}$ and $j \in \{1,2\}$.
(5) We have $\gamma_k^{(0)} = \gamma_k^{(0,1)} \ast \gamma_k^{(0,2)}$ for all $k \in \underline{n}$.
(6) We have $\big\| \gamma_k^{(0)} \big\|_\Upsilon < \infty$ for all $k \in \underline{n}$.
(7) We have $\big[ \mathcal{F}\gamma_k^{(0)} \big](\xi) \neq 0$ for all $\xi \in \overline{Q_0^{(k)}}$ and all $k \in \underline{n}$.
(8) We have $K_1 := \sup_{i \in I} \sum_{j \in I} N_{i,j} < \infty$ and $K_2 := \sup_{j \in I} \sum_{i \in I} N_{i,j} < \infty$, where $\gamma_{j,2} := \gamma_{k_j}^{(0,2)}$ for $j \in I$ and
$N_{i,j} := \Big( \frac{w_i}{w_j} \cdot \big( |\det T_j| \big/ |\det T_i| \big)^\vartheta \Big)^\tau \cdot \big\| T_j^{-1} T_i \big\|^\sigma \cdot \Big( |\det T_i|^{-1} \cdot \int_{Q_i} \max_{|\alpha| \leq N_0} \big| \big[ \partial^\alpha \widehat{\gamma_{j,2}} \big]\big( T_j^{-1} (\xi - b_j) \big) \big| \, d\xi \Big)^\tau .$
Then there is some $\delta_0 \in (0,1]$ such that the family
$\Psi_\delta := \big( L_{\delta \cdot T_i^{-T} k}\, \gamma^{[i]} \big)_{i \in I,\, k \in \mathbb{Z}^d}$ with $\gamma^{[i]} = |\det T_i|^{1/2} \cdot M_{b_i} \big[ \gamma_i \circ T_i^T \big]$
forms an atomic decomposition of $\mathcal{D}(\mathcal{Q}, L^p_v, \ell^q_w)$, for all $\delta \in (0, \delta_0]$. Precisely, this means the following:
• The synthesis map
$S^{(\delta)} : C^{p,q}_{w,v,\delta} \to \mathcal{D}(\mathcal{Q}, L^p_v, \ell^q_w), \quad \big( c_k^{(i)} \big)_{i \in I,\, k \in \mathbb{Z}^d} \mapsto \sum_{i \in I} \sum_{k \in \mathbb{Z}^d} \big[ c_k^{(i)} \cdot L_{\delta \cdot T_i^{-T} k}\, \gamma^{[i]} \big]$
is well-defined and bounded for every $\delta \in (0,1]$.
• For $0 < \delta \leq \delta_0$, there is a bounded linear coefficient map $C^{(\delta)} : \mathcal{D}(\mathcal{Q}, L^p_v, \ell^q_w) \to C^{p,q}_{w,v,\delta}$ satisfying $S^{(\delta)} \circ C^{(\delta)} = \mathrm{id}_{\mathcal{D}(\mathcal{Q}, L^p_v, \ell^q_w)}$.
Finally, there is an estimate for the size of $\delta_0$ which is independent of $p \geq p_0$, $q \geq q_0$ and of $v, v_0$: There is a constant $L = L\big( p_0, q_0, \varepsilon, d, \mathcal{Q}, \Phi, \gamma_1^{(0)}, \dots, \gamma_n^{(0)}, \Omega_0, \Omega_1, K \big) > 0$ such that we can choose
$\delta_0 = \min\Big\{ 1,\; 1 \Big/ \Big[ L \cdot \Omega_4^{(p)} \cdot \big( K_1^{1/\tau} + K_2^{1/\tau} \big) \Big] \Big\}.$ ◭

Remark.
Convergence of the series defining $S^{(\delta)} c$ has to be understood as in the remark to Theorem 2.10. Also as in that remark, the action of the coefficient map $C^{(\delta)}$ on a given $f \in \mathcal{D}(\mathcal{Q}, L^p_v, \ell^q_w)$ is independent of the precise choice of $p, q, v, w$, as long as $C^{(\delta)} f$ is defined at all. ⋄

Proof.
For brevity, set $N := \lceil K + p^{-1} \cdot (d + \varepsilon) \rceil$. As in the proof of Theorem E.4, we see as a consequence of [62, Lemma 3.7] that $\Gamma = (\gamma_i)_{i \in I}$ satisfies [62, Assumption 3.6], with $\Omega_2^{(p,K)} \leq \Omega_3$ for a suitable constant $\Omega_3 = \Omega_3\big( \mathcal{Q}, \gamma_1^{(0)}, \dots, \gamma_n^{(0)}, p_0, d, K \big) > 0$. For brevity, set $\Omega := \Omega_0^K \Omega_1 \Omega_3$.
Now, since we have $\gamma_i = \gamma_{k_i}^{(0)} = \gamma_{k_i}^{(0,1)} \ast \gamma_{k_i}^{(0,2)}$ for all $i \in I$, it is easy to see that all assumptions of [62, Corollary 6.7] are satisfied. Consequently, [62, Corollary 6.7] shows that the operator $\vec{C}$ defined in [62, Assumption 5.1] is well-defined and bounded, with $|||\vec{C}|||^{\max\{1,p\}} \leq L_1^{(0)} \cdot \big( K_1^{1/\tau} + K_2^{1/\tau} \big)$, where
$L_1^{(0)} = \Omega_0^K \Omega_1 \cdot (4d)^N \cdot \big( s_d / \varepsilon \big)^{1/\min\{1,p\}} \cdot \max_{|\alpha| \leq N} C^{(\alpha)} \leq \Omega_0^K \Omega_1 \cdot (4d)^{N_0} \cdot \big( s_d / \varepsilon \big)^{1/p_0} \cdot \max_{|\alpha| \leq N_0} C^{(\alpha)} =: L_1 ,$
where the constants $C^{(\alpha)} = C^{(\alpha)}(\mathcal{Q}, \Phi)$ are as in Definition 2.2. Thus, $L_1 = L_1(d, p_0, \varepsilon, \mathcal{Q}, \Phi, \Omega_0, \Omega_1, K)$.
Finally, [62, Corollary 6.7] shows that $\Gamma$ satisfies all assumptions of [62, Theorem 5.6], so that the family $\Psi_\delta$ defined in the statement of the theorem yields an atomic decomposition of $\mathcal{D}(\mathcal{Q}, L^p_v, \ell^q_w)$ as soon as $0 < \delta \leq \min\{1, \delta_1\}$, where $\delta_1 > 0$ is defined by
$\delta_1^{-1} := s_d \sqrt{d} \cdot \big( 2 d \cdot (K + 2 + d) \big)^{K+d+3} \cdot (1 + R_\mathcal{Q})^{d+1} \cdot \Omega_0^K \Omega_1 \Omega_2^{(p,K)} \Omega_4^{(p)} \cdot |||\vec{C}|||$ if $p \geq 1$, and
$\delta_1^{-1} := \big( 2 \cdot 3^{1/d} \big)^{d/p} \cdot d \cdot \big( s_d / p \big)^{1/p} \cdot \Big( d \cdot \big( K + 1 + \tfrac{d+1}{p} \big) \Big)^{K + 2 + \frac{d+1}{p}} \cdot (1 + R_\mathcal{Q})^{d/p} \cdot \Omega_0^K \Omega_1 \Omega_2^{(p,K)} \Omega_4^{(p)} \cdot |||\vec{C}|||^{\,p}$ if $p < 1$.
In both cases, using $\Omega_2^{(p,K)} \leq \Omega_3$ and the bound for $|||\vec{C}|||$, we obtain
$\delta_1^{-1} \leq L_2 \cdot \Omega_4^{(p)} \cdot \big( K_1^{1/\tau} + K_2^{1/\tau} \big)$ for $L_2 := \big[ d \cdot \big( p_0^{-1} \cdot s_d \big) \big]^{1/p_0} \cdot \Big( d \cdot \big( K + 1 + (d+1) \cdot p_0^{-1} \big) \Big)^{K + 2 + \frac{d+1}{p_0}} \cdot (1 + R_\mathcal{Q})^{d/p_0} \cdot \Omega \cdot L_1 .$
Here, our application of [62, Theorem 5.6] implicitly used that the constant $\Omega_4^{(p)}$ from the statement of Theorem E.5 satisfies $\Omega_4^{(p)} = \Omega_4^{(p,K)}$ with $\Omega_4^{(p,K)}$ as in [62, Assumption 5.1].
Note that $L := L_2 = L_2\big( d, \varepsilon, p_0, \mathcal{Q}, \Phi, \gamma_1^{(0)}, \dots, \gamma_n^{(0)}, \Omega_0, \Omega_1, K \big)$ and finally observe that if
$\delta_0 = \min\Big\{ 1, \big[ L \cdot \Omega_4^{(p)} \cdot \big( K_1^{1/\tau} + K_2^{1/\tau} \big) \big]^{-1} \Big\}$
is defined as in the statement of Theorem E.5, then $\delta_0 \leq \min\{1, \delta_1\}$, so that the family $\Psi_\delta$ indeed yields an atomic decomposition of $\mathcal{D}(\mathcal{Q}, L^p_v, \ell^q_w)$ as soon as $\delta \in (0, \delta_0]$. Finally, the remark associated to [62, Theorem 5.6] shows that convergence of the series in the definition of $S^{(\delta)}$ occurs as claimed in the remark after Theorem 2.10 and that the action of $C^{(\delta)}$ on a given $f \in \mathcal{D}(\mathcal{Q}, L^p_v, \ell^q_w)$ is independent of the precise choice of $p, q, w, v$, as claimed in the remark to Theorem E.5. □

E.2.
Cartoon approximation with α-shearlets and polynomial search depth. In view of the results in the preceding subsection, we first define a new variant of the α-shearlet smoothness spaces:

Definition E.6.
Let $\alpha \in [0,1]$, let $\omega$ be the standard weight from Example E.1 and let $u = (u_v)_{v \in V^{(\alpha)}}$ be as in Definition 5.1. For $p, q \in (0,\infty]$ and $s, \kappa \in \mathbb{R}$, we define the (weighted) $\alpha$-shearlet smoothness space as
$\mathscr{S}^{p,q}_{\alpha,s,\kappa}\big( \mathbb{R}^2 \big) := \mathcal{D}\big( \mathcal{S}^{(\alpha)}, L^p_{\omega^\kappa}, \ell^q_{u^s} \big).$ ◭
In this section, we will only consider exponents $\kappa \geq 0$, for which clearly $\mathscr{S}^{p,q}_{\alpha,s,\kappa}(\mathbb{R}^2) \hookrightarrow \mathscr{S}^{p,q}_{\alpha,s}(\mathbb{R}^2) \hookrightarrow S'(\mathbb{R}^2)$, cf. Lemma 3.6. Now, for $0 \leq \kappa \leq \kappa_0$, Example E.1 shows that the weight $\omega^\kappa$ used above is $\omega^{\kappa_0}$-moderate and that $\omega^{\kappa_0}$ is $(\mathcal{S}^{(\alpha)}, \Omega_0, \Omega_1, \kappa_0)$-regular for certain $\Omega_0, \Omega_1 \in [1,\infty)$. Then, by repeating the proofs of Theorems 4.2 and 4.3 for the modified values of $N_0, \sigma, \tau$ or $N_0, \sigma, \tau, \Upsilon$, one easily sees that Theorems 5.9 and 5.10 remain valid (with the proper modifications) for the more general spaces $\mathscr{S}^{p,q}_{\alpha,s,\kappa}(\mathbb{R}^2)$, cf. Theorems E.7 and E.8 below.
The only nontrivial modification in the proof is the following: In the proof of Theorem 4.3, Proposition 2.11 (with $N = N_0$) is used to obtain factorizations $\varphi = \varphi_1 \ast \varphi_2$ and $\psi = \psi_1 \ast \psi_2$, where one still has a certain control over $\varphi_1, \varphi_2, \psi_1, \psi_2$. Indeed, Proposition 2.11 ensures that $\varphi_2, \psi_2, \nabla\varphi_2, \nabla\psi_2$ decay faster than any polynomial, so that the constant $\Omega_4^{(p)}$ from Theorem E.5 is finite. But Theorem E.5 requires $\varphi_1, \psi_1 \in L^1_{(1+|\bullet|)^{\kappa_0}}(\mathbb{R}^2)$, whereas Theorem 2.10 only required $\varphi_1, \psi_1 \in L^1(\mathbb{R}^2)$. But this is still guaranteed by Proposition 2.11, since it implies $\|\varphi_1\|_{N_0}, \|\psi_1\|_{N_0} < \infty$, where now $N_0 = \lceil \kappa_0 + p_0^{-1} \cdot (2 + \varepsilon) \rceil \geq \kappa_0 + 2 + \varepsilon > \kappa_0 + 2$, from which we easily get $\varphi_1, \psi_1 \in L^1_{(1+|\bullet|)^{\kappa_0}}(\mathbb{R}^2)$.
The precise statements of the "weighted" versions of Theorems 5.9 and 5.10 are as follows:

Theorem E.7.
Let $\alpha \in [0,1]$, $\varepsilon, p_0, q_0 \in (0,1]$, $\kappa_0 \in [0,\infty)$ and $s_0, s_1 \in \mathbb{R}$ with $s_0 \leq s_1$. Assume that $\varphi, \psi : \mathbb{R}^2 \to \mathbb{C}$ satisfy the following:
• $\varphi, \psi \in L^1_{(1+|\bullet|)^{\kappa_0}}(\mathbb{R}^2)$ and $\widehat{\varphi}, \widehat{\psi} \in C^\infty(\mathbb{R}^2)$, where all partial derivatives of $\widehat{\varphi}, \widehat{\psi}$ have at most polynomial growth.
• $\varphi, \psi \in C^1(\mathbb{R}^2)$ and $\nabla\varphi, \nabla\psi \in L^1_{(1+|\bullet|)^{\kappa_0}}(\mathbb{R}^2) \cap L^\infty(\mathbb{R}^2)$.
• We have $\widehat{\psi}(\xi) \neq 0$ for all $\xi = (\xi_1, \xi_2) \in \mathbb{R}^2$ with $|\xi_1| \in \big[ 3^{-1}, 3 \big]$ and $|\xi_2| \leq |\xi_1|$, and $\widehat{\varphi}(\xi) \neq 0$ for all $\xi \in [-1,1]^2$.
• $\varphi, \psi$ satisfy equation (4.3) for all $\theta \in \mathbb{N}_0^2$ with $|\theta| \leq N_0$, where $N_0 := \lceil \kappa_0 + p_0^{-1} \cdot (2 + \varepsilon) \rceil$ and
$K := \varepsilon + \max\Big\{ \frac{1-\alpha}{\min\{p_0, q_0\}} + 2 \Big( \frac{1}{p_0} + \kappa_0 + N_0 \Big) - s_0 ,\; \frac{2}{\min\{p_0, q_0\}} + \frac{2}{p_0} + \kappa_0 + N_0 \Big\},$
$M_1 := \varepsilon + \frac{1}{\min\{p_0, q_0\}} + \max\{0, s_1\},$
$M_2 := \max\Big\{ 0,\; \varepsilon + (1+\alpha) \Big( \frac{1}{p_0} + \kappa_0 + N_0 \Big) - s_0 \Big\},$
$H := \max\Big\{ 0,\; \varepsilon + \frac{1-\alpha}{\min\{p_0, q_0\}} + \frac{2}{p_0} + \kappa_0 + N_0 - s_0 \Big\}.$
Then there is some $\delta_0 \in (0,1]$ such that for $0 < \delta \leq \delta_0$ and all $p, q \in (0,\infty]$ and $\kappa, s \in \mathbb{R}$ with $p \geq p_0$, $q \geq q_0$ and $s_0 \leq s \leq s_1$, as well as $0 \leq \kappa \leq \kappa_0$, the following is true: The family $SH_\alpha(\widetilde{\varphi}, \widetilde{\psi}; \delta) = \big( L_{\delta \cdot B_v^{-T} k}\, \widetilde{\gamma^{[v]}} \big)_{v \in V^{(\alpha)}, k \in \mathbb{Z}^2}$ with $\widetilde{\gamma^{[v]}}(x) = \gamma^{[v]}(-x)$ and
$\gamma^{[v]} := |\det B_v|^{1/2} \cdot \big( \psi \circ B_v^T \big)$ if $v \in V_0^{(\alpha)}$, and $\gamma^{[v]} := \varphi$ if $v = 0$,
forms a Banach frame for $\mathscr{S}^{p,q}_{\alpha,s,\kappa}(\mathbb{R}^2)$.
The precise interpretation of this statement is as in Theorem 4.2, with the obvious changes. In particular, the coefficient space $C^{p,q}_{u^s}$ needs to be replaced by $C^{p,q}_{u^s, \omega^\kappa, \delta}$. ◭

Theorem E.8.
Let $\alpha \in [0,1]$, $\varepsilon, p_0, q_0 \in (0,1]$, $\kappa_0 \in [0,\infty)$ and $s_0, s_1 \in \mathbb{R}$ with $s_0 \leq s_1$. Assume that $\varphi, \psi : \mathbb{R}^2 \to \mathbb{C}$ satisfy the following:
• We have $\|\varphi\|_{\kappa_0 + 1 + 2/p_0} < \infty$ and $\|\psi\|_{\kappa_0 + 1 + 2/p_0} < \infty$, where $\|g\|_\Lambda = \sup_{x \in \mathbb{R}^2} (1+|x|)^\Lambda |g(x)|$ for $g : \mathbb{R}^2 \to \mathbb{C}^\ell$ (with arbitrary $\ell \in \mathbb{N}$) and $\Lambda \geq 0$.
• We have $\widehat{\varphi}, \widehat{\psi} \in C^\infty(\mathbb{R}^2)$, where all partial derivatives of $\widehat{\varphi}, \widehat{\psi}$ are polynomially bounded.
• We have $\widehat{\psi}(\xi) \neq 0$ for all $\xi = (\xi_1, \xi_2) \in \mathbb{R}^2$ with $|\xi_1| \in \big[ 3^{-1}, 3 \big]$ and $|\xi_2| \leq |\xi_1|$, and $\widehat{\varphi}(\xi) \neq 0$ for all $\xi \in [-1,1]^2$.
• $\varphi, \psi$ satisfy equation (4.9) for all $\xi = (\xi_1, \xi_2) \in \mathbb{R}^2$ and all $\beta \in \mathbb{N}_0^2$ with $|\beta| \leq N_0 := \lceil \kappa_0 + p_0^{-1} \cdot (2 + \varepsilon) \rceil$, where
$\Lambda_1 := \varepsilon + \max\big\{ 0,\; \tfrac{1-\alpha}{\min\{p_0,q_0\}} + N_0 + s_1 \big\}$ if $p_0 = 1$, and $\Lambda_1 := \varepsilon + \max\big\{ 0,\; \tfrac{1-\alpha}{\min\{p_0,q_0\}} + \tfrac{1-\alpha}{p_0} + \kappa_0 + N_0 + 1 + \alpha + s_1 \big\}$ if $p_0 \in (0,1)$;
$\Lambda_2 := \varepsilon + \tfrac{1}{\min\{p_0,q_0\}} + \max\big\{ 0,\; (1+\alpha)\big( p_0^{-1} - 1 \big) - s_0 \big\};$
$\Lambda_3 := \varepsilon + \max\big\{ 0,\; (1+\alpha) N_0 + s_1 \big\}$ if $p_0 = 1$, and $\Lambda_3 := \varepsilon + \max\big\{ 0,\; (1+\alpha)\big( p_0^{-1} + \kappa_0 + N_0 \big) + s_1 \big\}$ if $p_0 \in (0,1)$;
$\Lambda_4 := \varepsilon + \max\big\{ \tfrac{1-\alpha}{\min\{p_0,q_0\}} + 2 N_0 + s_1 ,\; \tfrac{2}{\min\{p_0,q_0\}} + N_0 \big\}$ if $p_0 = 1$, and $\Lambda_4 := \varepsilon + \max\big\{ \tfrac{1-\alpha}{\min\{p_0,q_0\}} + \tfrac{1-\alpha}{p_0} + 2\kappa_0 + 2 N_0 + 1 + \alpha + s_1 ,\; \tfrac{2}{\min\{p_0,q_0\}} + \tfrac{2}{p_0} + \kappa_0 + N_0 \big\}$ if $p_0 \in (0,1)$.
Then there is some $\delta_0 \in (0,1]$ such that for all $0 < \delta \leq \delta_0$ and all $p, q \in (0,\infty]$ and $\kappa, s \in \mathbb{R}$ with $p \geq p_0$, $q \geq q_0$ and $s_0 \leq s \leq s_1$, as well as $0 \leq \kappa \leq \kappa_0$, the following is true: The family $SH_\alpha(\varphi, \psi; \delta) = \big( L_{\delta \cdot B_v^{-T} k}\, \gamma^{[v]} \big)_{v \in V^{(\alpha)},\, k \in \mathbb{Z}^2}$ with
$\gamma^{[v]} := |\det B_v|^{1/2} \cdot \big( \psi \circ B_v^T \big)$ if $v \in V_0^{(\alpha)}$, and $\gamma^{[v]} := \varphi$ if $v = 0$,
forms an atomic decomposition for $\mathscr{S}^{p,q}_{\alpha,s,\kappa}(\mathbb{R}^2)$. Precisely, this has to be understood as in Theorem 4.3, with the obvious changes. In particular, the coefficient space $C^{p,q}_{u^s}$ needs to be replaced by $C^{p,q}_{u^s, \omega^\kappa, \delta}$. ◭

Remark
E.9. Of course, Remark 5.11 (cf. Corollaries 4.4 and 4.5) also applies in the current setting; one simply needs to replace the old values of $N_0$ and $K, M_1, M_2, H$ or $\Lambda_1, \dots, \Lambda_4$ with the modified ones. ⋄
We can now finally show that the approximation rate stated in Theorem 6.3 can also be achieved when restricting to polynomial search depth:
Theorem E.10.
Let $\beta \in (1, 2]$ be arbitrary and set $\alpha := \beta^{-1} \in [1/2, 1)$. Let $\varepsilon \in (0,1]$ be arbitrary and set $\pi(x) := 40000 \cdot x^{14 + 4 \lceil 1/\varepsilon \rceil}$ for $x \in \mathbb{R}$. There is an enumeration $\varrho : \mathbb{N} \to V^{(\alpha)} \times \mathbb{Z}^2$, with the index set $V^{(\alpha)}$ from Definition 5.1, such that the following is true:
Assume that $\varphi, \psi$ satisfy the assumptions of Theorem E.8 for the choices $p_0 = q_0 = 2/(1+\beta)$, $\kappa_0 = \varepsilon$ and $s_0 = 0$, as well as $s_1 := (1+\beta)/2$, and for $\varepsilon$ as above. Then there is some $\delta_0 \in (0,1]$ such that every $0 < \delta \leq \delta_0$ satisfies the following: If $\big( \gamma^{[v,k]} \big)_{v \in V^{(\alpha)}, k \in \mathbb{Z}^2} = SH_\alpha(\varphi, \psi; \delta)$ denotes the $\alpha$-shearlet system generated by $\varphi, \psi$, then there is for each $f \in \mathcal{E}^\beta(\mathbb{R}^2)$ and each $N \in \mathbb{N}$ a function $f_N$ which is a linear combination of $N$ elements of the set $\big\{ \gamma^{[\varrho(n)]} \,\big|\, n = 1, \dots, \pi(N) \big\}$ and such that for all $\sigma, \nu > 0$ there is a constant $C = C(\varphi, \psi, \delta, \varepsilon, \sigma, \nu, \beta) > 0$ (independent of $f, N$) satisfying
$\| f - f_N \|_{L^2} \leq C \cdot N^{-(\beta - \sigma)/2} \quad \forall\, f \in \mathcal{E}^\beta\big( \mathbb{R}^2; \nu \big) \text{ and all } N \in \mathbb{N}.$ ◭

Remark.
Using Remark E.9, one can show similarly to Remark 6.4 that the above theorem is applicable (with a suitable choice of $\varepsilon$), if $\varphi, \psi$ satisfy the assumptions stated in Remark 6.4. ⋄

Proof.
Let $N \in \mathbb{N}$ be arbitrary and choose $n \in \mathbb{N}_0$ with $2^n \leq N < 2^{n+1}$, i.e., $n = \lfloor \log_2 N \rfloor$. For $v \in V^{(\alpha)}$, we denote by $s(v)$ the scale encoded by $v$, i.e., $s(0) := -1$ and $s(j,m,\iota) := j$ for $(j,m,\iota) \in V_0^{(\alpha)}$. Then, we define
$W_N := \big\{ (v, k) \in V^{(\alpha)} \times \mathbb{Z}^2 \,\big|\, s(v) \leq 4n \text{ and } \big| B_v^{-T} k \big| \leq 2^{\lceil 2n/\varepsilon \rceil} \big\}.$ (E.2)
Now, note that if $(v,k) = ((j,m,\iota), k) \in (V_0^{(\alpha)} \times \mathbb{Z}^2) \cap W_N$, then $j \leq 4n$ and $|m| \leq \lceil 2^{(1-\alpha)j} \rceil \leq 2 \cdot 2^{(1-\alpha)j}$, so that we get
$|k| = \big| B_v^T B_v^{-T} k \big| \leq \big\| B_v^T \big\| \cdot 2^{\lceil 2n/\varepsilon \rceil} \leq 2^{\lceil 2n/\varepsilon \rceil} \cdot \Big\| \begin{pmatrix} 2^j & 2^{\alpha j} m \\ 0 & 2^{\alpha j} \end{pmatrix} \Big\| \leq 2^{\lceil 2n/\varepsilon \rceil} \cdot \big( 2^j + 2^{\alpha j} + 2^{\alpha j} |m| \big) \leq 2^{2 + \lceil 2n/\varepsilon \rceil} \cdot 2^j \leq 16 \cdot 2^{(4 + 2\lceil 1/\varepsilon \rceil) n},$
and thus $k \in \{ -16 \cdot 2^{n_0 n}, \dots, 16 \cdot 2^{n_0 n} \}^2$, where we defined $n_0 := 4 + 2 \lceil 1/\varepsilon \rceil \in \mathbb{N}$ for brevity. Furthermore, clearly $|m| \leq \lceil 2^{(1-\alpha)j} \rceil \leq 2^{4n} + 1$ and thus $m \in \{ -(2^{4n}+1), \dots, 2^{4n}+1 \}$. Finally, in case of $(v,k) = (0,k) \in V^{(\alpha)} \times \mathbb{Z}^2$, we get $|k| = \big| B_v^{-T} k \big| \leq 2^{\lceil 2n/\varepsilon \rceil} \leq 2^{n_0 n}$ and hence $k \in \{ -2^{n_0 n}, \dots, 2^{n_0 n} \}^2$. All in all, we have shown
$W_N \subset \big[ \{0\} \times \{ -2^{n_0 n}, \dots, 2^{n_0 n} \}^2 \big] \cup \bigcup_{j=0}^{4n} \big[ \{j\} \times \{ -(2^{4n}+1), \dots, 2^{4n}+1 \} \times \{0,1\} \times \{ -16 \cdot 2^{n_0 n}, \dots, 16 \cdot 2^{n_0 n} \}^2 \big],$
and thus
$|W_N| \leq (1 + 2 \cdot 2^{n_0 n})^2 + (1 + 4n) \cdot (2^{4n+1} + 3) \cdot 2 \cdot (1 + 32 \cdot 2^{n_0 n})^2 \leq (3 \cdot 2^{n_0 n})^2 + 4 \cdot 2^n \cdot 4 \cdot 2^{4n} \cdot 2 \cdot (33 \cdot 2^{n_0 n})^2$ (since $1 + 4n \leq 4 \cdot 2^n$ and $2^{4n+1} + 3 \leq 4 \cdot 2^{4n}$) $= 9 \cdot 2^{2 n_0 n} + 34848 \cdot 2^{(5 + 2 n_0) n} \leq 40000 \cdot 2^{(5 + 2 n_0) n} \leq 40000 \cdot N^{5 + 2 n_0}$ (since $2^n \leq N$).
Next, note for arbitrary $(v,k) \in V^{(\alpha)} \times \mathbb{Z}^2$ that there is some $n \in \mathbb{N}_0$ with $s(v) \leq 4n$ and $\big| B_v^{-T} k \big| \leq 2^{\lceil 2n/\varepsilon \rceil}$, so that $(v,k) \in W_N$ for $N = 2^n$. Hence, $W := V^{(\alpha)} \times \mathbb{Z}^2 = \bigcup_{N \in \mathbb{N}} W_N$. Now, choose the enumeration $\varrho : \mathbb{N} \to W$ such that $\varrho$ first enumerates $W_1$ (in an arbitrary way), then $W_2 \setminus W_1$ (again arbitrarily), then $W_3 \setminus (W_1 \cup W_2)$, and so on.
Formally, if we define $M_0 := 0$ and $M_N := \big| W_N \setminus \bigcup_{\ell=1}^{N-1} W_\ell \big| \in \mathbb{N}_0$, then $\varrho$ satisfies $\varrho\big( \{ 1, \dots, \sum_{\ell=1}^N M_\ell \} \big) = \bigcup_{\ell=1}^N W_\ell$ for all $N \in \mathbb{N}$. Because of $\sum_{\ell=1}^N M_\ell \leq \sum_{\ell=1}^N |W_\ell| \leq 40000 \cdot \sum_{\ell=1}^N \ell^{5 + 2 n_0} \leq 40000 \cdot N^{6 + 2 n_0} = \pi(N)$, we thus have $\varrho\big( \{1, \dots, \pi(N)\} \big) \supset \bigcup_{\ell=1}^N W_\ell \supset W_N$ for all $N \in \mathbb{N}$. For brevity, let us set $Z_N := \varrho\big( \{1, \dots, \pi(N)\} \big) \subset W$ for $N \in \mathbb{N}$.
We have thus constructed the enumeration $\varrho : \mathbb{N} \to V^{(\alpha)} \times \mathbb{Z}^2$ from the statement of the theorem. Now, let $\varphi, \psi$ be as in the assumptions of the theorem. Then Theorem E.8 yields some $\delta_0 \in (0,1]$ such that if $0 < \delta \leq \delta_0$, then the system $SH_\alpha(\varphi, \psi; \delta)$ forms an atomic decomposition simultaneously for all $\alpha$-shearlet smoothness spaces $\mathscr{S}^{p,q}_{\alpha,s,\kappa}(\mathbb{R}^2)$ for $p, q \geq p_0$, $s_0 \leq s \leq s_1$ and $0 \leq \kappa \leq \kappa_0 = \varepsilon$. Let $0 < \delta \leq \delta_0$ be arbitrary and let $S^{(\delta)}, C^{(\delta)}$ be the associated synthesis and coefficient operators. As noted in Theorem E.8 (see Theorem 4.3), the domain and codomain of these operators strictly speaking depend on the choice of $p, q, s, \kappa$, but the action of these operators does not. Hence, we commit the weak notational crime of not indicating this dependence.
For $f \in \mathcal{E}^\beta(\mathbb{R}^2) \subset L^2(\mathbb{R}^2) = \mathscr{S}^{2,2}_{\alpha,0,0}(\mathbb{R}^2)$, let $c^{(f)} := (c^{(f)}_w)_{w \in W} := C^{(\delta)} f \in C^{2,2}_{u^0, \omega^0, \delta} = \ell^2(W)$ and choose a subset $J_N^{(f)} \subset Z_N$ satisfying $|J_N^{(f)}| = N$ and $|c^{(f)}_j| \geq |c^{(f)}_i|$ for all $j \in J_N^{(f)}$ and all $i \in Z_N \setminus J_N^{(f)}$. Such a choice is possible, since $Z_N$ is finite with $|Z_N| = \pi(N) \geq N$. Finally, set $f_N := S^{(\delta)}\big( \mathbb{1}_{J_N^{(f)}} \cdot c^{(f)} \big)$. By definition of $S^{(\delta)}$, $f_N$ is then a linear combination of $N$ elements of the set $\big\{ \gamma^{[\varrho(\ell)]} \,\big|\, \ell = 1, \dots, \pi(N) \big\}$, as desired. It remains to verify the claimed approximation rate.
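The selection rule just constructed (restrict to the first $\pi(N)$ atoms of the enumeration, then keep the $N$ largest coefficients) can be sketched as follows; the decaying toy coefficient sequence and the quadratic search depth are illustrative assumptions, not the actual shearlet coefficients or the actual $\pi$:

```python
import numpy as np

def select_terms(coeffs, N, depth):
    # Polynomial-depth best N-term selection: among the first `depth`
    # coefficients of the enumeration, keep the indices of the N largest
    # moduli (the analogue of J_N^(f), with Z_N = {0, ..., depth - 1}).
    window = np.abs(np.asarray(coeffs[:depth]))
    order = np.argsort(window)[::-1]          # indices by decreasing modulus
    return set(order[:N].tolist())

# toy coefficient sequence along a hypothetical enumeration rho
rng = np.random.default_rng(0)
c = rng.permutation(np.arange(1, 1001, dtype=float) ** -1.5)

N = 10
depth = min(len(c), 40 * N ** 2)              # toy polynomial search depth
J = select_terms(c, N, depth)

assert len(J) == N
kept = min(abs(c[i]) for i in J)
dropped = max(abs(c[i]) for i in range(depth) if i not in J)
assert kept >= dropped
```

The point of the polynomial search depth is that `select_terms` only ever inspects $\pi(N)$ candidates, instead of searching the full (infinite) index set.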
Thus, let $\sigma, \nu > 0$ be arbitrary. We start with some preliminary considerations: In view of Remark E.9, we see that there are symmetric, real-valued functions $\varphi_0, \psi_0 \in C_c^\infty(\mathbb{R}^2)$ which satisfy the assumptions of Theorem E.7 for the choices of $p_0, q_0, \kappa_0, s_0, s_1, \varepsilon$ from the current theorem. Hence, there is $\tau_0 > 0$ such that the $\alpha$-shearlet system $\big( \theta^{[v,k]} \big)_{v \in V^{(\alpha)}, k \in \mathbb{Z}^2} := SH_\alpha(\varphi_0, \psi_0; \tau_0)$ forms a Banach frame for all $\alpha$-shearlet smoothness spaces $\mathscr{S}^{p,q}_{\alpha,s,\kappa}(\mathbb{R}^2)$, for the same range of parameters as above. Note that the distinction between $SH_\alpha(\widetilde{\varphi_0}, \widetilde{\psi_0}; \tau_0)$ and $SH_\alpha(\varphi_0, \psi_0; \tau_0)$ does not matter by symmetry of $\varphi_0, \psi_0$. As a consequence of Lemma 5.12 and of the symmetry and real-valuedness of $\varphi_0, \psi_0$, we then see that the analysis operator $A^{(\tau_0)}$ from Theorem E.7 satisfies $A^{(\tau_0)} f = \big( \langle f, \theta^{[v,k]} \rangle_{L^2} \big)_{v \in V^{(\alpha)}, k \in \mathbb{Z}^2}$ for all $f \in L^2(\mathbb{R}^2) = \mathscr{S}^{2,2}_{\alpha,0,0}(\mathbb{R}^2)$ and thus in particular for $f \in \mathcal{E}^\beta(\mathbb{R}^2)$.
Now, for $v \in V^{(\alpha)}$ and $f \in \mathcal{E}^\beta(\mathbb{R}^2; \nu)$, we have $\|f\|_{L^\infty} \leq C_1 = C_1(\nu)$ and thus
$\big| \langle f, \theta^{[v,k]} \rangle_{L^2} \big| \leq \|f\|_{L^\infty} \cdot \big\| \theta^{[v,k]} \big\|_{L^1} \leq C_1 \cdot \big| \det B_v^{(\alpha)} \big|^{-1/2} \cdot \max\big\{ \|\varphi_0\|_{L^1}, \|\psi_0\|_{L^1} \big\} = C_1 C_2 \cdot u_v^{-(1+\alpha)/2},$
with $C_2 := \max\{ \|\varphi_0\|_{L^1}, \|\psi_0\|_{L^1} \}$. By the consistency statement of Theorem E.7 (see Theorem 4.2), this shows $f \in \mathscr{S}^{\infty,\infty}_{\alpha,0}(\mathbb{R}^2)$ with $\|f\|_{\mathscr{S}^{\infty,\infty}_{\alpha,0}} \leq C_3 \cdot \big\| A^{(\tau_0)} f \big\|_{C^{\infty,\infty}_{u^0}} \leq C_1 C_2 C_3$ with $C_3 := \big|\big|\big| R^{(\tau_0)} \big|\big|\big|_{C^{\infty,\infty}_{u^0} \to \mathscr{S}^{\infty,\infty}_{\alpha,0}}$, with the reconstruction operator $R^{(\tau_0)}$ provided by Theorem E.7 (for $\varphi_0, \psi_0$). Here, we used the easily verifiable identity $C^{\infty,\infty}_{u^0} = \ell^\infty_{u^{(1+\alpha)/2}}\big( V^{(\alpha)} \times \mathbb{Z}^2 \big)$, where $u^{(1+\alpha)/2} = \big( u_v^{(1+\alpha)/2} \big)_{v \in V^{(\alpha)}}$ is interpreted as a weight on $V^{(\alpha)} \times \mathbb{Z}^2$ in the obvious way.
Now, choose $A \geq 1$ with $\operatorname{supp} \varphi_0, \operatorname{supp} \psi_0 \subset (-A, A)^2$. Further, note that $R = R^{-1} = R^T = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ preserves the $\ell^\infty$-norm, so that every $(j, m, \iota) \in V_0^{(\alpha)}$ satisfies
$\big\| B_{j,m,\iota}^{-T} \big\|_{\ell^\infty \to \ell^\infty} = \Big\| \begin{pmatrix} 2^{-j} & 0 \\ 0 & 2^{-\alpha j} \end{pmatrix} \begin{pmatrix} 1 & -m \\ 0 & 1 \end{pmatrix} \Big\|_{\ell^\infty \to \ell^\infty} = \Big\| \begin{pmatrix} 2^{-j} & -2^{-j} m \\ 0 & 2^{-\alpha j} \end{pmatrix} \Big\|_{\ell^\infty \to \ell^\infty} \leq 2^{-j} + 2^{-\alpha j} + \big| 2^{-j} m \big| \leq 3,$
since $|m| \leq \lceil 2^{(1-\alpha)j} \rceil$. Further, clearly $\big\| B_0^{-T} \big\|_{\ell^\infty \to \ell^\infty} = \| \operatorname{id} \|_{\ell^\infty \to \ell^\infty} = 1 \leq 3$. Now, since each $f \in \mathcal{E}^\beta(\mathbb{R}^2)$ satisfies $\operatorname{supp} f \subset [-1,1]^2$, we see that $\langle f, \theta^{[v,k]} \rangle_{L^2} \neq 0$ can only hold if
$\emptyset \neq [-1,1]^2 \cap \operatorname{supp} \theta^{[v,k]}$ (with $\theta_v = \psi_0$ for $v \in V_0^{(\alpha)}$ and $\theta_0 = \varphi_0$) $= [-1,1]^2 \cap \operatorname{supp} L_{\tau_0 \cdot B_v^{-T} k} \big[ \theta_v \circ B_v^T \big] \subset [-1,1]^2 \cap \big[ \tau_0 \cdot B_v^{-T} k + B_v^{-T} (-A, A)^2 \big],$
which implies $\tau_0 \cdot B_v^{-T} k \in [-1,1]^2 - B_v^{-T} (-A, A)^2 \subset [-1,1]^2 + 3 \cdot (-A, A)^2 \subset [-4A, 4A]^2$, since $A \geq 1$. Hence,
$\omega^\kappa\big( \tau_0 \cdot B_v^{-T} k \big) = \big( 1 + \big| \tau_0 \cdot B_v^{-T} k \big| \big)^\kappa \leq (1 + 8A)^\varepsilon \leq 9A$ for all $(v, k) \in W$ with $\langle f, \theta^{[v,k]} \rangle_{L^2} \neq 0$, since $\kappa \leq \varepsilon \leq 1$. But Proposition 6.2 shows, because of $1 \in (2/(1+\beta), 2]$, that $\mathcal{E}^\beta(\mathbb{R}^2; \nu) \subset \mathscr{S}^{1,1}_{\alpha, (1+\alpha)(1 - 2^{-1})}(\mathbb{R}^2)$ is bounded, i.e., $\|f\|_{\mathscr{S}^{1,1}_{\alpha,(1+\alpha)(1-2^{-1})}(\mathbb{R}^2)} \leq C_4 = C_4(\beta, \nu)$. Since the associated coefficient space is $C^{1,1}_{u^{(1+\alpha)/2}} = \ell^1(W)$, this implies $\big\| A^{(\tau_0)} f \big\|_{\ell^1} \leq C_5 = C_5(\beta, \nu)$. But since we just saw that $\omega^\kappa\big( \tau_0 \cdot B_v^{-T} k \big) \leq 9A$ for those $(v, k) \in W$ for which $\big( A^{(\tau_0)} f \big)_{v,k} \neq 0$, this implies $\big\| A^{(\tau_0)} f \big\|_{C^{1,1}_{u^{(1+\alpha)/2}, \omega^\kappa, \tau_0}} \leq 9A \cdot C_5$ for all $f \in \mathcal{E}^\beta(\mathbb{R}^2; \nu)$, as one can see directly from Definition E.3. By the consistency statement of Theorem E.7 (see Theorem 4.2), this shows as above that $f \in \mathscr{S}^{1,1}_{\alpha, (1+\alpha)/2, \kappa}(\mathbb{R}^2)$ with $\|f\|_{\mathscr{S}^{1,1}_{\alpha,(1+\alpha)/2,\kappa}} \leq C_6 = C_6(\beta, \nu, \kappa_0, \varphi_0, \psi_0, \tau_0)$, for all $f \in \mathcal{E}^\beta(\mathbb{R}^2; \nu)$.
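The operator norm bound $\|B_{j,m,\iota}^{-T}\|_{\ell^\infty \to \ell^\infty} \leq 3$ is easy to confirm numerically. The following sketch assumes the convention $B_{j,m} = \operatorname{diag}(2^j, 2^{\alpha j}) \cdot \big(\begin{smallmatrix} 1 & 0 \\ m & 1 \end{smallmatrix}\big)$ matching the factorization displayed above, and checks all admissible shears $|m| \leq \lceil 2^{(1-\alpha)j} \rceil$ over a range of scales:

```python
import math
import numpy as np

def B_inv_T(j, m, alpha):
    # Assumed convention: B_{j,m} = diag(2^j, 2^(alpha*j)) @ [[1, 0], [m, 1]],
    # so that B^{-T} = [[2^-j, -m * 2^-j], [0, 2^(-alpha*j)]].
    return np.array([[2.0 ** -j, -m * 2.0 ** -j],
                     [0.0, 2.0 ** (-alpha * j)]])

def op_norm_inf(M):
    # ell^inf -> ell^inf operator norm = maximum absolute row sum
    return float(np.abs(M).sum(axis=1).max())

worst = 0.0
for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
    for j in range(12):
        m_max = math.ceil(2 ** ((1 - alpha) * j))  # admissible shears: |m| <= m_max
        for m in (-m_max, 0, m_max):
            worst = max(worst, op_norm_inf(B_inv_T(j, m, alpha)))

assert worst <= 3.0
```

Only the extreme shears $m = \pm m_{\max}$ need to be checked, since the row sum $2^{-j}(1 + |m|)$ is monotone in $|m|$.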
Here, we used that $s_0 = 0 \leq (1+\alpha)/2 \leq (\beta+1)/2 = s_1$, since $\alpha \leq 1 \leq \beta$.
Now, we continue with the proof of the approximation rate: Since we have $p^{-1} - 2^{-1} \to \beta/2$ as $p \downarrow 2/(1+\beta)$ and $(\beta - \sigma)/2 < \beta/2$, there is some $p = p(\beta, \sigma) \in (2/(1+\beta), 2]$ with $p^{-1} - 2^{-1} > (\beta - \sigma)/2$. By Proposition 6.2, $\mathcal{E}^\beta(\mathbb{R}^2; \nu) \subset \mathscr{S}^{p,p}_{\alpha, (1+\alpha)(p^{-1} - 2^{-1})}(\mathbb{R}^2)$ is bounded and the associated coefficient space to this $\alpha$-shearlet smoothness space is $C^{p,p}_{u^{(1+\alpha)(p^{-1}-2^{-1})}} = \ell^p(W)$, so that we get $\big\| c^{(f)} \big\|_{\ell^p} = \big\| C^{(\delta)} f \big\|_{C^{p,p}_{u^{(1+\alpha)(p^{-1}-2^{-1})}}} \leq C_7 = C_7(\varphi, \psi, \beta, \delta, p, \nu)$. Here, we used that $s_0 = 0 \leq (1+\alpha)\big( p^{-1} - 2^{-1} \big) \leq (1+\alpha) \cdot \frac{\beta}{2} = \frac{\beta+1}{2} = s_1$ and $p \geq p_0 = 2/(1+\beta)$, so that $\mathscr{S}^{p,p}_{\alpha,(1+\alpha)(p^{-1}-2^{-1})}(\mathbb{R}^2)$ is in the "allowed" range.
Likewise, our considerations from above showed that $\mathcal{E}^\beta(\mathbb{R}^2; \nu)$ is a bounded subset of $\mathscr{S}^{\infty,\infty}_{\alpha,0}(\mathbb{R}^2)$ and of $\mathscr{S}^{1,1}_{\alpha,(1+\alpha)/2,\kappa}(\mathbb{R}^2)$, so that there are constants $C_8, C_9$ (only dependent on $\varphi, \psi, \delta, \beta, \nu, \varepsilon$) with $\big\| c^{(f)} \big\|_{C^{1,1}_{u^{(1+\alpha)/2}, \omega^\kappa, \delta}} \leq C_8$ and $\big\| c^{(f)} \big\|_{\ell^\infty_{u^{(1+\alpha)/2}}} \leq C_9$, since $C^{\infty,\infty}_{u^0} = \ell^\infty_{u^{(1+\alpha)/2}}\big( V^{(\alpha)} \times \mathbb{Z}^2 \big)$. Finally, set $C_{10} := \big|\big|\big| S^{(\delta)} \big|\big|\big|_{\ell^2 \to L^2}$. Because of $S^{(\delta)} \circ C^{(\delta)} = \operatorname{id}_{\mathscr{S}^{2,2}_{\alpha,0,0}} = \operatorname{id}_{L^2}$ and since $c^{(f)} = C^{(\delta)} f$, we have
$\| f - f_N \|_{L^2} = \big\| S^{(\delta)} \big[ c^{(f)} - \mathbb{1}_{J_N^{(f)}} \cdot c^{(f)} \big] \big\|_{L^2} \leq C_{10} \cdot \big\| c^{(f)} - \mathbb{1}_{J_N^{(f)}} \cdot c^{(f)} \big\|_{\ell^2(W)} \leq C_{10} \cdot \Big( \big\| c^{(f)} \big\|_{\ell^2(W \setminus Z_N)} + \big\| c^{(f)} - \mathbb{1}_{J_N^{(f)}} \cdot c^{(f)} \big\|_{\ell^2(Z_N)} \Big)$ (since $J_N^{(f)} \subset Z_N$). (E.3)
Now, our choice of the set $J_N^{(f)}$, together with Stechkin's estimate (see e.g. [24, Proposition 2.3]), shows
$\big\| c^{(f)} - \mathbb{1}_{J_N^{(f)}} \cdot c^{(f)} \big\|_{\ell^2(Z_N)} \leq N^{-(p^{-1} - 2^{-1})} \cdot \big\| c^{(f)} \big\|_{\ell^p(Z_N)} \leq C_7 \cdot N^{-(p^{-1} - 2^{-1})} \leq C_7 \cdot N^{-(\beta - \sigma)/2},$
since $p^{-1} - 2^{-1} \geq (\beta - \sigma)/2$, so that it suffices to further estimate the first term in equation (E.3).
But for $(v, k) \in W \setminus Z_N \subset W \setminus W_N$, we have $s(v) > 4n$ (and thus in particular $v \in V_0^{(\alpha)}$), or $\big| B_v^{-T} k \big| > 2^{\lceil 2n/\varepsilon \rceil}$, where we recall that $2^n \leq N < 2^{n+1}$. In the first case, we have $\big| c^{(f)}_{v,k} \big| \leq C_9 \cdot u_v^{-(1+\alpha)/2} \leq C_9 \cdot u_v^{-1/2} \leq C_9 \cdot 2^{-2n}$, and in the second case, we get $\omega^\kappa\big( \delta \cdot B_v^{-T} k \big) = \big( 1 + \big| \delta \cdot B_v^{-T} k \big| \big)^\varepsilon \geq \delta^\varepsilon \cdot 2^{2n} \geq \delta \cdot 2^{2n}$ (for the choice $\kappa = \varepsilon$) and thus
$\big| c^{(f)}_{v,k} \big|^2 \leq C_9 \cdot \big| c^{(f)}_{v,k} \big| \leq C_9 \cdot \frac{2^{-2n}}{\delta} \cdot \omega^\kappa\big( \delta \cdot B_v^{-T} k \big) \cdot \big| c^{(f)}_{v,k} \big| .$
Synthesis Sparsity for α -Shearlets — Felix Voigtlaender & Anne Pein Therefore, (cid:13)(cid:13)(cid:13) c ( f ) (cid:13)(cid:13)(cid:13) ℓ ( W \ Z N ) ≤ X v ∈ V ( α )0 with s ( v ) ≥ n X k ∈ Z (cid:12)(cid:12)(cid:12) c ( f ) v,k (cid:12)(cid:12)(cid:12) + X v ∈ V ( α )0 X k ∈ Z with | B − Tv k | > ⌈ n/ε ⌉ (cid:12)(cid:12)(cid:12) c ( f ) v,k (cid:12)(cid:12)(cid:12) ≤ C · − n X v ∈ V ( α ) X k ∈ Z (cid:12)(cid:12)(cid:12) c ( f ) v,k (cid:12)(cid:12)(cid:12) + C δ · − n · X v ∈ V ( α ) X k ∈ Z ω κ (cid:0) δ · B − Tv k (cid:1) (cid:12)(cid:12)(cid:12) c ( f ) v,k (cid:12)(cid:12)(cid:12) ( since ω κ ≥ ) ≤ C · (cid:0) δ − (cid:1) · − n · (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:18) | det B v | − · u α v · (cid:13)(cid:13)(cid:13)(cid:13)(cid:16) ω κ (cid:0) δ · B − Tv k (cid:1) · c ( f ) v,k (cid:17) k ∈ Z (cid:13)(cid:13)(cid:13)(cid:13) ℓ (cid:19) v ∈ V ( α ) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) ℓ = C · (cid:0) δ − (cid:1) · − n · (cid:13)(cid:13)(cid:13) c ( f ) (cid:13)(cid:13)(cid:13) C , u (1+ α ) / ,ωκ ,δ ≤ C C · (cid:0) δ − (cid:1) · − n ( since N ≤ n +1 and β − σ ≤ β ≤ ) ≤ C C · (cid:0) δ − (cid:1) · N − ≤ C C · (cid:0) δ − (cid:1) · N − ( β − σ ) . Taking the square root and recalling equation (E.3) finishes the proof. (cid:3)
References [1] L. Borup and M. Nielsen. Frame decomposition of decomposition spaces.
J. Fourier Anal. Appl. , 13(1):39–70, 2007.[2] E.J. Candès and D.L. Donoho. New tight frames of curvelets and optimal representations of objects with piecewise C singularities. Comm. Pure Appl. Math. , 57(2):219–266, 2004.[3] J.G. Christensen and G. Ólafsson. Coorbit spaces for dual pairs.
Appl. Comput. Harmon. Anal. , 31(2):303–324, 2011.[4] O. Christensen.
An Introduction to Frames and Riesz Bases , volume 7 of
Appl. Numer. Harmon. Anal.
Birkhäuser Boston, Inc.,Boston, MA, 2003.[5] E. Cordero and K. Gröchenig. Localization of frames II.
Appl. Comput. Harmon. Anal. , 17(1):29–47, 2004.[6] S. Dahlke, F. De Mari, E. De Vito, D. Labate, G. Steidl, G. Teschke, and S. Vigogna. Coorbit spaces with voice in a Fréchet space.
J. Fourier Anal. Appl. , pages 1–66, 2016.[7] S. Dahlke, S. Häuser, G. Steidl, and G. Teschke. Shearlet coorbit spaces: traces and embeddings in higher dimensions.
Monatsh.Math. , 169(1):15–32, 2013.[8] S. Dahlke, S. Häuser, and G. Teschke. Coorbit space theory for the Toeplitz shearlet transform.
Int. J. Wavelets Multiresolut. Inf. Process., 10(4):1250037, 13, 2012.
[9] S. Dahlke, G. Kutyniok, G. Steidl, and G. Teschke. Shearlet coorbit spaces and associated Banach frames. Appl. Comput. Harmon. Anal., 27(2):195–214, 2009.
[10] S. Dahlke, G. Steidl, and G. Teschke. Coorbit spaces and Banach frames on homogeneous spaces with applications to the sphere. Adv. Comput. Math., 21(1-2):147–180, 2004.
[11] S. Dahlke, G. Steidl, and G. Teschke. Weighted coorbit spaces and Banach frames on homogeneous spaces. J. Fourier Anal. Appl., 10(5):507–539, 2004.
[12] S. Dahlke, G. Steidl, and G. Teschke. The continuous shearlet transform in arbitrary space dimensions. J. Fourier Anal. Appl., 16(3):340–364, 2010.
[13] S. Dahlke, G. Steidl, and G. Teschke. Shearlet coorbit spaces: compactly supported analyzing shearlets, traces and embeddings. J. Fourier Anal. Appl., 17(6):1232–1255, 2011.
[14] S. Dahlke, G. Steidl, and G. Teschke. Multivariate shearlet transform, shearlet coorbit spaces and their structural properties. In Shearlets, Appl. Numer. Harmon. Anal., pages 105–144. Birkhäuser/Springer, New York, 2012.
[15] D.L. Donoho. Sparse components of images and optimal atomic decompositions. Constr. Approx., 17(3):353–382, 2001.
[16] H.G. Feichtinger and P. Gröbner. Banach spaces of distributions defined by decomposition methods, I. Math. Nachr., 123(1):97–120, 1985.
[17] H.G. Feichtinger and K. Gröchenig. A unified approach to atomic decompositions via integrable group representations. In Function spaces and applications (Lund, 1986), volume 1302 of Lecture Notes in Math., pages 52–73. Springer, Berlin, 1988.
[18] H.G. Feichtinger and K. Gröchenig. Banach spaces related to integrable group representations and their atomic decompositions, I. J. Funct. Anal., 86:307–340, 1989.
[19] H.G. Feichtinger and K. Gröchenig. Banach spaces related to integrable group representations and their atomic decompositions, II. Monatsh. Math., 108:129–148, 1989.
[20] A. Flinth and M. Schäfer. Multivariate α-molecules. J. Approx. Theory, 202(C):64–108, February 2016.
[21] G.B. Folland. Real Analysis: Modern Techniques and Their Applications. Pure and Applied Mathematics. Wiley, second edition, 1999.
[22] M. Fornasier and K. Gröchenig. Intrinsic localization of frames. Constr. Approx., 22(3):395–415, 2005.
[23] M. Fornasier and H. Rauhut. Continuous frames, function spaces, and the discretization problem. J. Fourier Anal. Appl., 11(3):245–287, 2005.
[24] S. Foucart and H. Rauhut. A Mathematical Introduction to Compressive Sensing. Appl. Numer. Harmon. Anal. Birkhäuser/Springer, New York, 2013.
[25] H. Führ. Wavelet frames and admissibility in higher dimensions. J. Math. Phys., 37(12):6353–6366, 1996.
[26] H. Führ. Continuous wavelet transforms from semidirect products.
Cienc. Mat. (Havana), 18(2):179–191, 2000.
[27] H. Führ. Generalized Calderón conditions and regular orbit spaces. Colloq. Math., 120(1):103–126, 2010.
[28] H. Führ. Coorbit spaces and wavelet coefficient decay over general dilation groups.
Trans. Amer. Math. Soc., 367(10):7373–7401, 2015.
[29] H. Führ. Vanishing moment conditions for wavelet atoms in higher dimensions. Adv. Comput. Math., 42(1):127–153, 2016.
[30] H. Führ and M. Mayer. Continuous wavelet transforms from semidirect products: cyclic representations and Plancherel measure. J. Fourier Anal. Appl., 8(4):375–397, 2002.
[31] H. Führ and R. Raisi-Tousi. Simplified vanishing moment criteria for wavelets over general dilation groups, with applications to abelian and shearlet dilation groups. Appl. Comput. Harmon. Anal., 2016.
[32] H. Führ and F. Voigtlaender. Wavelet coorbit spaces viewed as decomposition spaces. J. Funct. Anal., 269:80–154, April 2015.
[33] R. Gribonval and M. Nielsen. Highly sparse representations from dictionaries are unique and independent of the sparseness measure. Appl. Comput. Harmon. Anal., 22(3):335–355, 2007.
[34] K. Gröchenig. Describing functions: atomic decompositions versus frames. Monatsh. Math., 112(1):1–42, 1991.
[35] K. Gröchenig. Localization of frames, Banach frames, and the invertibility of the frame operator. J. Fourier Anal. Appl., 10(2):105–132, 2004.
[36] P. Grohs. Intrinsic localization of anisotropic frames. Appl. Comput. Harmon. Anal., 35(2):264–283, 2013.
[37] P. Grohs, S. Keiper, G. Kutyniok, and M. Schäfer. α-molecules. Appl. Comput. Harmon. Anal., 41(1):297–336, 2016.
[38] P. Grohs, S. Keiper, G. Kutyniok, and M. Schäfer. Cartoon approximation with α-curvelets. J. Fourier Anal. Appl., 22(6):1235–1293, 2016.
[39] P. Grohs and G. Kutyniok. Parabolic molecules. Found. Comput. Math., 14(2):299–337, 2014.
[40] P. Grohs, G. Kutyniok, J. Ma, and P. Petersen. Anisotropic multiscale systems on bounded domains. arXiv preprint, 2015. arxiv.org/abs/1510.04538.
[41] P. Grohs, G. Kutyniok, P. Petersen, and M. Raslan. Shearlet frames for Sobolev spaces: frame and approximation properties on R and bounded domains. 2017. In preparation.
[42] P. Grohs and S. Vigogna. Intrinsic localization of anisotropic frames II: α-molecules. J. Fourier Anal. Appl., 21(1):182–205, 2015.
[43] K. Guo, G. Kutyniok, and D. Labate. Sparse multidimensional representations using anisotropic dilation and shear operators. In Wavelets and splines: Athens 2005, Mod. Methods Math., pages 189–201. Nashboro Press, Brentwood, TN, 2006.
[44] K. Guo and D. Labate. Optimally sparse multidimensional representation using shearlets. SIAM J. Math. Anal., 39(1):298–318, 2007.
[45] S. Keiper. A Flexible Shearlet Transform – Sparse Approximations and Dictionary Learning. Bachelor thesis, TU Berlin, 2012.
[46] P. Kittipoom, G. Kutyniok, and W. Lim. Construction of compactly supported shearlet frames. Constr. Approx., 35(1):21–72, 2012.
[47] D. Kressner and C. Tobler. Low-rank tensor Krylov subspace methods for parametrized linear systems. SIAM J. Matrix Anal. Appl., 32(4):1288–1316, 2011.
[48] G. Kutyniok and D. Labate, editors. Shearlets. Appl. Numer. Harmon. Anal. Birkhäuser/Springer, New York, 2012.
[49] G. Kutyniok, J. Lemvig, and W. Lim. Optimally sparse approximations of 3D functions by compactly supported shearlet frames. SIAM J. Math. Anal., 44(4):2962–3017, 2012.
[50] G. Kutyniok, J. Lemvig, and W. Lim. Shearlets and optimally sparse approximations. In Shearlets, Appl. Numer. Harmon. Anal., pages 145–197. Birkhäuser/Springer, New York, 2012.
[51] G. Kutyniok and W. Lim. Compactly supported shearlets are optimally sparse. J. Approx. Theory, 163(11):1564–1589, 2011.
[52] D. Labate, L. Mantovani, and P. Negi. Shearlet smoothness spaces. J. Fourier Anal. Appl., 19(3):577–611, 2013.
[53] H. Rauhut. Coorbit space theory for quasi-Banach spaces. Studia Math., 180(3):237–253, 2007.
[54] H. Rauhut and T. Ullrich. Generalized coorbit space theory and inhomogeneous function spaces of Besov–Lizorkin–Triebel type. J. Funct. Anal., 260(11):3299–3362, 2011.
[55] W. Rudin. Functional Analysis. International Series in Pure and Applied Mathematics. McGraw-Hill, 1991.
[56] M. Schäfer. The Role of α-Scaling for Cartoon Approximation. arXiv preprint, 2016. arxiv.org/abs/1612.01036.
[57] D. Vera. Triebel–Lizorkin spaces and shearlets on the cone in R. Appl. Comput. Harmon. Anal., 35(1):130–150, 2013.
[58] D. Vera. Shear anisotropic inhomogeneous Besov spaces in R^d. Int. J. Wavelets Multiresolut. Inf. Process., 12(01):1450007, 2014.
[59] F. Voigtlaender. Embedding Theorems for Decomposition Spaces with Applications to Wavelet Coorbit Spaces. PhD thesis, RWTH Aachen University, 2015. http://publications.rwth-aachen.de/record/564979.
[60] F. Voigtlaender. Embeddings of Decomposition Spaces. arXiv preprint, 2016. http://arxiv.org/abs/1605.09705.
[61] F. Voigtlaender. Embeddings of Decomposition Spaces into Sobolev and BV Spaces. arXiv preprint, 2016. http://arxiv.org/abs/1601.02201.
[62] F. Voigtlaender. Structured, Compactly Supported Banach Frame Decompositions of Decomposition Spaces. arXiv preprint, 2016. arxiv.org/abs/1612.08772.