arXiv:[math.FA]

The Role of α-Scaling for Cartoon Approximation

Martin Schäfer∗

Institute of Mathematics, Technische Universität Berlin
Straße des 17. Juni 136, 10623 Berlin, Germany

October 10, 2018
Abstract
The class of cartoon-like functions, classically defined as piecewise C² functions consisting of smooth regions separated by C² discontinuity curves, is a well-established model for image data. The quest for frames providing optimal approximation for this class has among others led to the development of curvelets, contourlets, and shearlets. Due to parabolic scaling, these systems are able to provide N-term approximations converging with a quasi-optimal rate of order N^{−2}. Replacing parabolic scaling by α-scaling, one can construct α-curvelet and α-shearlet frames which interpolate between wavelet-type systems for α = 1, the classic parabolically scaled systems for α = 1/2, and ridgelet-type systems for α = 0. Previous research shows that if α ∈ [1/2, 1) they provide quasi-optimal approximation for cartoons of regularity C^{1/α} with a rate of order N^{−1/α}.

In this work we continue the exploration of the approximation properties of α-scaled representation systems, with the aim to better understand the role of the parameter α. Concerning α-curvelets with α < 1, we prove that the best possible N-term approximation rate achievable for cartoons with curved edges is limited to at most N^{−2/(1−α)}, independent of the smoothness of the cartoons. The maximal rate that can be obtained by simple thresholding of the frame coefficients is even bounded by N^{−1/max{α,1−α}}. Systems of α-curvelets thus cannot take advantage of regularity higher than C^{1/α} if α ∈ [1/2, 1), where the rate N^{−1/α} cannot be surpassed. For C^β cartoons with β ≥ 2, the classic curvelets with α = 1/2 provide the best performance with a rate of order N^{−2}, however below the optimal rate of order N^{−β} if β > 2. In the range α ∈ [0, 1/2] the achievable rate cannot exceed N^{−1/(1−α)} and deteriorates as α approaches 0.

The approximation performance of α-curvelets is different if the edges of the cartoons are straight. Assuming C^β regularity, we establish an approximation rate of order N^{−min{α^{−1},β}}, which improves as α tends to 0. In the range α ∈ [0, β^{−1}] it is even quasi-optimal, generalizing optimality results for ridgelets. By applying the framework of α-molecules, we finally extend the obtained results to other α-scaled representation systems, including for instance α-shearlet frames. Keywords:
Cartoon Images, Nonlinear Approximation, Wavelets, Curvelets, Shearlets, Ridgelets, Anisotropic Scaling, α-Molecules.

MSC2000 Subject Classification:
1 Introduction

In the age of ‘big data’, efficient data representation is an objective of ever increasing importance. Not only does it simplify the handling of the data due to the reduction of needed storage space or the possible speed-up of processing times. The knowledge of a ‘good’ representation also gives valuable information about the structure of the data itself, simplifying certain processing tasks or even just enabling them in the first place. As examples we may think of the restoration of corrupted signals or the separation of several superimposed signals of distinct types.

∗ Email: [email protected]

Given a dictionary in some Hilbert space, a common methodology is to represent a signal f by a sequence of approximants (f_N)_{N∈ℕ} converging to the signal. A standard choice here is to use N-term approximations in the respective dictionary, i.e., approximants built from just N dictionary elements. A main goal of approximation theory is the development of approximation schemes with a best possible speed of convergence, commonly quantified by the asymptotic decay of the approximation error ‖f − f_N‖ as N → ∞. With regard to N-term approximations, the achievable rate is determined by the utilized dictionary in the background, and one aims to find dictionaries providing high approximation rates for the data. Such dictionaries are said to sparsely approximate the corresponding signals and clearly need to be chosen depending on the considered signal class. For efficient data representation it is therefore essential, first, to be able to precisely specify the type of data under consideration, e.g., in the form of an appropriate model, and, second, to develop dictionaries, well adapted to the specific data class, that provide sparse approximations.

Subsequently, we are interested in the sparse approximation of image data. In our investigation, we will always stay in the continuum setting, where images are as usual represented as functions supported on some compact image domain Ω ⊂ ℝ² with values containing pixel information at the respective positions, such as e.g.
color or brightness information. Being compactly supported and bounded, the image data can conveniently be modeled as a subset of the Hilbert space L²(Ω), which in turn is considered as a subspace of L²(ℝ²). Hence, we are in a concrete Hilbert space scenario and can resort to the methodology described above, i.e., we aim for appropriate image models and sparsifying dictionaries.

For the space L²(Ω) the classic Fourier systems constitute an orthonormal basis, providing a straightforward procedure for representation. However, Fourier systems work well only if the functions under consideration are smooth. For general images such smoothness assumptions are certainly not fulfilled. As another popular representation system, wavelets [12, 43] come to mind. Nowadays, they are one of the most widely used systems in applied harmonic analysis, with various applications ranging from signal compression (e.g. JPEG2000 [10]) and restoration [1] to PDE solvers [11]. In particular, they have the ability to sparsely approximate functions which are smooth apart from isolated point singularities. For general image data, however, such regularity assumptions are still too strict. A characteristic feature of images are edges, leading to curvilinear discontinuities in the data. With respect to such line singularities, wavelet systems do not perform optimally any more. The isotropy of their scaling prohibits an optimal resolution of this kind of anisotropic structure.

With the desire to specifically model the occurrence of edges, the concept of cartoon-like functions emerged. These are piecewise smooth functions featuring discontinuities along lower-dimensional manifolds, in our case along the 1-dimensional edge curves of the image. Based on such functions, suitable models for natural images have been conceived and different model classes have been introduced. Typically, these classes are characterized by a specific smoothness of the regions and by certain conditions on the separating edges.
As examples, let us mention the classic cartoons [5] with C² regularity of the regions and the discontinuity curves, or the horizon classes considered e.g. in [15, 8, 39].

The achievable approximation rate for a class of cartoon-like functions essentially depends on the regularity of the cartoons, including both the smoothness of the edge curves and the smoothness of the regions in between. It was shown in [39, 38] that for C^β regularity of the regions and the separating edges with β > 1, the optimal achievable rate is of order N^{−β}. By information theoretic arguments, it has further been established that this rate cannot be surpassed [18], at least in a class-wise sense. Interestingly, the benchmark N^{−β} is the same for the class of so-called binary cartoons, i.e., cartoon-like functions with constant regions, and it also does not change if one restricts to C^β smooth functions without any edges.

With the model of cartoon-like functions at hand, let us turn again to the question of efficient image representation. In the past, a great amount of energy has been devoted to the effort of constructing dictionaries well-suited for cartoon approximation. Thereby, many different paths have been pursued, and the developed methods can be divided into two general categories: adaptive and nonadaptive methods.

Adaptive methods are by nature more flexible and have the inherent advantage of being able to adjust to the given data. On the downside, the increased flexibility typically comes at the cost of higher computational complexity of the employed approximation and reconstruction schemes. Some prominent examples of adaptive methods for image data are based on wedgelet dictionaries [15] and their higher-order relatives, so-called surflets [9, 8]. They have been shown to reach the optimality bound N^{−β} for binary cartoons with C^β regularity [6, 7].
Other notable dictionaries used for adaptive approximation include beamlets [19], platelets [45], and derivatives of wedgelets such as multiwedgelets [42] or smoothlets [41]. More recently, new adaptive schemes have emerged that use bases, e.g., bandelets [39], grouplets [44], and tetrolets [34]. Quasi-optimal approximation for C^β cartoons with β > 1 can be achieved by some of these schemes.

Turning to nonadaptive methods, an early idea for the sparse representation of straight line singularities were ridgelets, originally conceived as ridge functions, which however are not contained in L²(ℝ²). By giving them a slow decay along the ridge, Donoho constructed an orthonormal basis whose elements are called ‘orthonormal ridgelets’ [16]. Their close relationship to the original concept has been analyzed in [17]. Another construction, based on directional scaling, goes back to Grohs, providing tight frames [22]. This kind of construction coincides with the concept of ‘0-curvelets’ presented below.

To deal with curved edges, numerous types of frames have been developed. An important milestone was the introduction of the first generation of curvelets [4] by Candès and Donoho in 1999. They represent the first frame to reach the optimal approximation order of N^{−2} for C² cartoons via simple thresholding. A modification of this system, the second generation of curvelets [5], was introduced in 2002 by the same authors. It is based on a more elegant and simpler construction principle, yet features the same quasi-optimal approximation properties. Following this early breakthrough, other constructions better suited for digital implementation were developed. Let us mention contourlets [14] by Do and Vetterli and shearlets, whose construction goes back mainly to Guo, Kutyniok, Labate, Lim, and Weiss. The first shearlet construction consisted of band-limited functions and was presented in [35, 28]. Later, more sophisticated shearlet systems were developed, such as e.g. the well-localized band-limited Parseval frame in [30] or even systems of compactly supported shearlets [33]. Like curvelets, those systems provide quasi-optimal approximation for C² cartoons.
For the classic band-limited shearlets this was established in [29], for those with compact support in [37].

A common principle underlying the above constructions is parabolic scaling, a type of scaling optimally adapted to C² singularity curves. It is essential for the quasi-optimal approximation of C² cartoons and led to the notion of parabolic molecules [25]. This concept unifies various parabolically scaled systems under one roof, in particular the classic curvelet and shearlet systems, and is the predecessor of the more general framework of α-molecules [24].

α-Scaling

Comparing the approximation properties of wavelets, curvelets, and ridgelets, a distinct behavior with respect to their ability to resolve edges is characteristic. Ridgelets are optimally adapted to straight edges, curvelets are optimal for C² line singularities, and wavelets for point singularities. This distinct behavior is due to the different scaling laws underlying their respective constructions: isotropic scaling for wavelets, parabolic scaling for curvelets, and directional scaling for ridgelets.

Introducing a parameter α ∈ ℝ and associated α-scaling matrices

A_{α,s} = diag(s, s^α), s > 0, (1)

one can interpolate between these different types of scaling and construct corresponding α-scaled representation systems. Incorporating α-scaling in the original construction of curvelets, for instance, yields so-called α-curvelets [23]. For α ∈ [0, 1], these comprise ridgelets for α = 0, the classic curvelets for α = 1/2, and wavelets for α = 1. In a similar fashion, α-shearlet systems [32, 36] can be obtained by modifying the classic shearlet constructions.

A natural question concerning such α-scaled systems is how their approximation properties are affected by a change of the parameter α. With regard to cartoon approximation, this question has been pursued in [23] for α-curvelet frames and in [32, 36] for α-shearlet frames. It was shown that, if α ∈ [1/2, 1) and if the cartoon f is of regularity C^β with β = α^{−1}, simple thresholding of the coefficients yields N-term approximations f_N with a convergence of

‖f − f_N‖²₂ ≲ N^{−β} log(N)^{1+β} as N → ∞, (2)

which apart from the log-factor is optimal. Later, these results were further extended utilizing the theory of α-molecules [24]. This is a framework providing a unified approach to α-scaled systems, based solely on assumptions on the time-frequency localization of the respective functions. It allows to transfer approximation results obtained for one system of α-molecules to other systems, under certain consistency conditions. In particular, the rate (2) for α-curvelets was generalized (in a weak form) to other α-scaled representation systems [24], which all achieve a rate of N^{−β+ε} with ε > 0 arbitrary. Still, many questions concerning α-scaled representation systems and their ability to approximate cartoon-like functions remain open, e.g., their performance in the range α < 1/2 or their suitability for the approximation of straight edges. In this research we want to address these open questions, shedding (even) more light on the role of the parameter α.

Our exposition starts with a short review of α-scaled systems in Section 2, where also a specific construction of an α-curvelet frame for L²(ℝ²) is presented. This frame, denoted by C_{s,α}, will serve as a prototypical system whose properties have ramifications for other α-scaled systems, such as for example α-shearlets, due to the transference principle of the framework of α-molecules.

In the main part of the article, Sections 3 and 4, we analyze the N-term approximation properties of the frame C_{s,α} with regard to different classes of cartoon images. In Section 3 we start with cartoons with curved edges and first introduce corresponding signal classes of C^β regularity for β ∈ [0, ∞).
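The simple thresholding scheme behind rates such as (2) can be sketched in a few lines. A minimal toy illustration (my own setup, using the orthonormal discrete Fourier basis instead of a curvelet frame, so that the approximation error is exactly the ℓ²-norm of the discarded coefficients):

```python
import numpy as np

def n_term_approximation(f, N):
    """N-term approximation by simple thresholding: keep the N largest
    coefficients of f with respect to an orthonormal basis (here the
    normalized discrete Fourier basis) and discard the rest."""
    n = len(f)
    coeffs = np.fft.fft(f) / np.sqrt(n)          # orthonormal analysis
    keep = np.argsort(np.abs(coeffs))[::-1][:N]  # indices of the N largest moduli
    kept = np.zeros_like(coeffs)
    kept[keep] = coeffs[keep]
    f_N = np.fft.ifft(kept) * np.sqrt(n)         # orthonormal synthesis
    err = np.linalg.norm(f - f_N)                # = l2-norm of discarded coefficients
    return f_N, err

rng = np.random.default_rng(1)
f = rng.standard_normal(128)
_, e16 = n_term_approximation(f, 16)
_, e64 = n_term_approximation(f, 64)
print(e64 <= e16)  # keeping more terms can only decrease the error
```

For a Parseval frame such as C_{s,α} the same recipe is applied to the frame coefficients ⟨f, ψ_μ⟩; there the error is only bounded by (not equal to) the ℓ²-tail of the coefficient sequence, which is the mechanism behind rates like (2).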
Theorem 3.2 recalls N^{−β} as the order of the maximal achievable approximation rate for such C^β cartoons, which cannot be surpassed by any polynomial-depth restricted N-term approximation scheme, independent of the utilized dictionary. Then we recall the quasi-optimal approximation (2) of α-curvelets, proved in [23], if α ∈ [1/2, 1) and β = α^{−1}. Our main findings in Section 3, Theorems 3.9 and 3.11, extend and complement this result. Theorem 3.9 shows that the best possible N-term approximation rate achievable by C_{s,α} for cartoons with curved edges is limited to at most N^{−2/(1−α)}, where α < 1. Theorem 3.11 shows that the rate even reduces to N^{−1/max{α,1−α}} if a simple thresholding scheme is used.

These bounds show that α-curvelets with α ∈ [1/2,
1) cannot take advantage of regularity higher than C^{1/α}. Furthermore, they prohibit optimal approximation of C^β cartoons if β > 2, since decreasing α beyond 1/2 deteriorates the achievable rates compared to the classic curvelets. Hence, with a rate of order N^{−2}, these provide the best performance among all α-curvelet systems if the regularity of the cartoons is at least C² and curved singularities are involved. As a consequence, no curvelet system can reach the optimality bound N^{−β} if β > 2. In fact, up to now, no frame construction is known where a nonadaptive approximation scheme can break this N^{−2} barrier, and the quest for such frames remains open.

In Section 4 we consider cartoons featuring only straight edges. For the corresponding classes of regularity C^β the same optimality benchmark N^{−β} holds true as for the cartoons with curved edges. Our main result of Section 4, Theorem 4.1, shows that a simple thresholding scheme for the α-curvelet frame C_{s,α} yields approximation rates of order N^{−min{α^{−1},β}}. Hence, here a smaller α is beneficial and even ensures quasi-optimal approximation if α ∈ [0, β^{−1}]. This finding generalizes earlier results for ridgelets.

We finish with a short discussion of our results in Section 5. In particular, we point out some ramifications for other α-scaled representation systems, utilizing the framework of α-molecules. All α-scaled systems which are frames and in a certain sense consistent with C_{s,α} feature similar properties, formulated in Theorem 5.3 and Corollary 5.4. Some useful properties of Bessel functions needed in Section 3 are collected in the appendix.

Before we begin, let us fix some general notation. Writing ℕ we will refer to the natural numbers without zero, and we let ℕ₀ := ℕ ∪ {0}. As usual, ℤ, ℝ and ℂ denote the integer, real and complex numbers. Further, we put ℝ⁺₀ := [0, ∞) and ℝ⁺ := (0, ∞). We also introduce the ‘floor’ and ‘ceiling’ of t ∈ ℝ, ⌊t⌋ := max{n ∈ ℤ : n ≤ t} and ⌈t⌉ := min{n ∈ ℤ : n ≥ t}. The symbol T is used for the torus obtained from the interval [0, 2π] by identifying the endpoints. The unit circle in ℂ ≃ ℝ² is denoted by S¹.

The vector space ℝ^d, d ∈ ℕ, is equipped with the Euclidean scalar product ⟨·,·⟩ and associated norm |·|. The notation |·|_p, p ∈ (0, ∞], is used for the p-(quasi-)norms on ℝ^d. For a multi-index m = (m₁, ..., m_d) ∈ ℕ₀^d, ∂^m := ∂₁^{m₁} ··· ∂_d^{m_d} is a differential operator with ∂_i, i ∈ {1, . . .
, d}, the partial derivative in the i-th coordinate direction. Given a vector x = (x₁, ..., x_d) ∈ ℝ^d, we further define x^m := x₁^{m₁} ··· x_d^{m_d} (with the convention 0⁰ := 1).

If A(ω) ≤ C·B(ω) holds true for two quantities A, B ∈ ℝ depending on a set of parameters ω with a uniform constant C > 0, we write A ≲ B or equivalently B ≳ A. If both A ≲ B and B ≲ A hold true, we denote this by A ≍ B.

For measurable subsets Ω ⊆ ℝ^d we let L^p(Ω), p ∈ (0, ∞], denote the usual Lebesgue spaces with respect to the Lebesgue measure. The corresponding (quasi-)norms are denoted by ‖·‖_{L^p(Ω)}; in case Ω = ℝ^d we abbreviate ‖·‖_p := ‖·‖_{L^p(ℝ^d)}. For the scalar product on L²(Ω) the same notation ⟨·,·⟩ as for the Euclidean product on ℝ^d is used. The Lebesgue sequence spaces, for a discrete index set Λ, are denoted by ℓ^p(Λ) with associated (quasi-)norms ‖·‖_{ℓ^p}. The definition of their weak counterparts wℓ^p(Λ), equipped with (quasi-)norms ‖·‖_{wℓ^p}, is recalled in Section 4.

The space C^β_loc(ℝ^d), for an integer β ∈ ℕ₀ ∪ {∞}, shall comprise all continuous real-valued functions on ℝ^d whose classic derivatives up to order β exist. For β ∈ [0, ∞) we then define

C^β(ℝ^d) := { f ∈ C^{⌊β⌋}_loc(ℝ^d) : ‖f‖_{C^β(ℝ^d)} := ‖f‖_{C^{⌊β⌋}(ℝ^d)} + Σ_{|m|=⌊β⌋} Höl(∂^m f, β − ⌊β⌋) < ∞ },

where ‖f‖_{C^{⌊β⌋}(ℝ^d)} := Σ_{|m|≤⌊β⌋} sup_{x∈ℝ^d} |∂^m f(x)| and the Hölder constant of exponent α ∈ [0, 1] is given by

Höl(f, α) := sup_{x,y∈ℝ^d, x≠y} |f(x) − f(y)| / |x − y|^α.

The notation C^β(Ω), for some open subset Ω ⊆ ℝ^d, is used for functions f ∈ C^β(ℝ^d) whose support supp f is compact and contained in the closure Ω̄ of Ω. Frequently, we also need to measure functions f ∈ C^β_loc(ℝ^d), β ∈ ℕ₀, with the following Sobolev norms, where p ∈ [1, ∞],

‖f‖_{β,p} := ‖f‖_{W^{β,p}(ℝ^d)} := Σ_{|m|≤β} ‖∂^m f‖_{L^p(ℝ^d)}.

Finally, we will use the following version of the Fourier transform. For a Schwartz function f ∈ S(ℝ^d),

F f(ξ) := ∫_{ℝ^d} f(x) exp(−2πi⟨x, ξ⟩) dx, ξ ∈ ℝ^d.

As usual, F is extended to the tempered distributions S′(ℝ^d), and we often write f̂ for F f.

2 α-Curvelets

Directional multi-scale systems based on α-scaling feature a characteristic tiling of the frequency domain. The multi-scale structure is reflected by a partition of the Fourier plane into dyadic coronae, further divided into wedge-like tiles, where the energy of the system elements is concentrated. In case of inhomogeneous systems, a ball around the origin corresponds to the low-frequency base scale.

A prototypical instance of such an α-scaled system is the frame C_{s,α} of α-curvelets, thoroughly defined in this section. It is prototypical in the sense that many of its properties transfer – via the framework of α-molecules [24] – to other α-scaled systems. Among these are other α-curvelet constructions [5, 23], but also band-limited [35, 28, 30] as well as compactly supported [33, 32, 36] α-shearlet systems. This fact gives the system C_{s,α} a special significance for our purpose and motivates its detailed discussion here.

Before defining C_{s,α}, whose construction is similar to that of the α-curvelets in [23], let us first elaborate the geometric aspects of the corresponding frequency tiling.
At scales j ≥ 1, the coronae in the frequency plane are given by

C_j := { ξ ∈ ℝ² : C·2^{s(j−1)} ≤ |ξ| ≤ C·2^{s(j+1)} }, (3)

where s > 0 determines the width of the coronae and C > 0 is a constant fixed below. The opening angle of the wedges in the corona C_j is given by the angle

φ_j := π·2^{−⌊js(1−α)⌋−1} (4)

and depends on another parameter α ∈ (−∞, 1]. Each wedge approximately fits into an α-scaled rectangle of dimensions 2^{js} × 2^{jsα}. By combining opposite wedges to wedge pairs, we obtain the tiles for the scales j ≥ 1. There is only one tile associated with the base scale j = 0, the low-frequency ball C₀ := { ξ ∈ ℝ² : |ξ| ≤ C·2^{s} }.

For convenience, let us also introduce the angle φ₀ := π. According to the above construction, at each scale j ∈ ℕ₀ the number of tiles L_j is given by

L₀ := π·φ₀^{−1} = 1 and L_j := π·φ_j^{−1} = 2^{⌊js(1−α)⌋+1}, j ≥ 1. (5)

In the following, the individual tiles will be denoted by W_{j,ℓ} and indexed by the set

J := { (j, ℓ) : j ∈ ℕ₀, ℓ ∈ {−L_j^−, ..., L_j^+} } with L_j^− := ⌊L_j/2⌋ and L_j^+ := ⌈L_j/2⌉ − 1.

Hereby we let W_{0,0} := C₀, and in each corona C_j with j ≥ 1 the tile W_{j,0} shall be aligned horizontally, i.e.,

W_{j,0} := { ξ = (ξ₁, ξ₂) ∈ C_j : |ξ₁| ≥ cos(φ_j/2)·|ξ| }.

The remaining tiles W_{j,ℓ}, ℓ ≠ 0, are obtained via rotations of W_{j,0} by integer multiples φ_{j,ℓ} := ℓφ_j of the angle φ_j defined in (4). Hence, W_{j,ℓ} := R_{j,ℓ}^{−1} W_{j,0} with rotation matrix R_{j,ℓ} := R_{φ_{j,ℓ}}, where

R_φ := ( cos(φ) −sin(φ) ; sin(φ) cos(φ) ), φ ∈ ℝ. (6)

The resulting tiling of the Fourier domain is schematically depicted in Figure 1 (a). We remark that in contrast to [23], where α ∈ [0, 1], we allow α ∈ (−∞,
1] in the α-curvelet construction. This range is natural for the considered inhomogeneous systems. If α > 1, the number of tiles L_j in each corona decreases with rising scale, and eventually L_j = 1. Thus, at high scales, those systems would behave like isotropically scaled systems with α = 1.

α-Curvelets C_{s,α}

Let us now turn to the actual construction of the α-curvelet frame C_{s,α}. To realize the described frequency tiling, smooth functions W_J : ℝ² → ℂ, J ∈ J, are used, with compact support approximately given by the tiles W_J. It is convenient to construct them as tensor products of a radial and an angular component. This allows to realize the desired support separately on the ray ℝ⁺₀ = [0, ∞) and on the circle S¹ ⊂ ℝ². Projecting the coronae C_j onto the ray ℝ⁺₀ yields the intervals

I₀ := C·[0, 2^s] and I_j := C·[2^{s(j−1)}, 2^{s(j+1)}], j ≥ 1. (7)

Figure 1: (a): Tiling of the Fourier domain into coronae C_j and wedges W_{j,ℓ}. (b): Schematic display of the frequency support of a wedge function W_{j,0}.

For the radial subdivision, we thus utilize nonnegative smooth functions U_j ∈ C^∞(ℝ⁺₀), j ∈ ℕ₀, which satisfy the support condition supp U_j ⊆ I_j and for r ∈ ℝ⁺₀

A₁ ≤ Σ_{j≥0} U_j(r)² ≤ B₁ with constants 0 < A₁ ≤ B₁ < ∞. (8)

More concretely, we assume that the functions U_j, j ≥
1, are generated by a single function U ∈ C^∞(ℝ⁺₀, [0, 1]) via U_j(·) := U(2^{−js}·), and that there are 1 < τ₀ < τ₁ < 2^s such that

supp U₀ ⊆ C·[0, τ₁], √A₁ ≤ U₀ ≤ √B₁ on C·[0, τ₀],
supp U ⊆ C·[2^{−s}τ₀, τ₁], √A₁ ≤ U ≤ √B₁ on C·[2^{−s}τ₁, τ₀]. (9)

Such functions exist and can even be constructed with A₁ = B₁ = 1 in (8).

For the angular subdivision, we construct at each scale j ∈ ℕ₀ a smooth partition on the unit circle S¹ ⊂ ℝ², reflecting the angular support of the tiles W_{j,ℓ}. We start with a function Ṽ ∈ C^∞(ℝ, [0, 1]) with

supp Ṽ ⊆ [−π, π], √A₂ ≤ Ṽ ≤ √B₂ on [−π/2, π/2], A₂ ≤ Σ_{k∈ℤ} Ṽ(· − kπ)² ≤ B₂,

where 0 < A₂ ≤ B₂ < ∞. Scaling then gives rise to the functions Ṽ_j(·) := Ṽ(L_j ·) ∈ C^∞(ℝ, [0, 1]), j ∈ ℕ₀. Via the bijection t ↦ e^{it} these functions yield functions Ṽ_{j,0} ∈ C^∞(S¹, [0, 1]) on the circle. We then put

V_{j,0}(ξ) := Ṽ_{j,0}(ξ) + Ṽ_{j,0}(−ξ), ξ ∈ S¹,

and note that √A₂ ≤ V_{0,0} ≤ √B₂ on S¹. Applying the rotation (6) then yields functions V_{j,ℓ}(·) := V_{j,0}(R_{j,ℓ}·) for every J = (j, ℓ) ∈ J, which satisfy A₂ ≤ Σ_{|J|=j} V_J(ξ)² ≤ B₂ for all ξ ∈ S¹. Here we use the notation |J| := j for J = (j, ℓ) ∈ J.

Finally, we are ready to define the wedge functions W_{j,ℓ} ∈ C^∞(ℝ²) as the polar tensor products

W_{j,ℓ}(ξ) := U_j(|ξ|)·V_{j,ℓ}(ξ/|ξ|), ξ ∈ ℝ². (10)

These functions are non-negative ‘bumps’ approximately supported in the corresponding wedges W_{j,ℓ}. They are symmetric, i.e., W_{j,ℓ}(ξ) = W_{j,ℓ}(−ξ) for ξ ∈ ℝ², and they satisfy

A := A₁A₂ ≤ Σ_{J=(j,ℓ)∈J} W_J(ξ)² ≤ B₁B₂ =: B, ξ ∈ ℝ². (11)

Let us analyze the support of W_J in more detail. Recall the angular function Ṽ_{j,0} and note that its support on S¹ covers an angle range of φ_j^+ := 2φ_j with φ_j = π·L_j^{−1} as in (4). Moreover, √A₂ ≤ Ṽ_{j,0} ≤ √B₂ on a range of size φ_j^− := φ_j.
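The squared partition bounds of the form A ≤ Σ_J W_J² ≤ B are what drive the frame property later (cf. Lemma 2.2 below): windowing and then expanding each piece in an orthonormal Fourier basis preserves the total energy. A one-dimensional numerical toy check with two hand-made windows satisfying W₀² + W₁² = 1 (my own simplified choice, not the functions U_j or Ṽ above):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 256
f_hat = rng.standard_normal(n) + 1j * rng.standard_normal(n)

# Two smooth windows with W0^2 + W1^2 = 1, a 1D analogue of the
# condition A <= sum_J W_J^2 <= B with A = B = 1.
t = np.linspace(0.0, np.pi / 2, n)
windows = [np.cos(t), np.sin(t)]
assert np.allclose(sum(w**2 for w in windows), 1.0)

# Expanding each windowed piece in an orthonormal Fourier basis and
# summing the squared coefficients recovers ||f_hat||^2 exactly.
energy = sum(np.sum(np.abs(np.fft.fft(f_hat * w) / np.sqrt(n))**2) for w in windows)
print(np.isclose(energy, np.sum(np.abs(f_hat)**2)))
```

This mirrors, in simplified form, the computation in the proof of Lemma 2.2, where the per-tile Fourier systems {u_{J,k}} play the role of the orthonormal basis.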
Hence, supp V_{j,ℓ} ⊆ A_{j,ℓ} and √A₂ ≤ V_{j,ℓ} ≤ √B₂ on A_{j,ℓ}^− for the angular intervals

A_{j,ℓ} := R_{j,ℓ}^{−1} A_{j,0} with A_{j,0} := { ξ = (ξ₁, ξ₂) ∈ S¹ : |ξ₁| ≥ cos(φ_j^+/2) },
A_{j,ℓ}^− := R_{j,ℓ}^{−1} A_{j,0}^− with A_{j,0}^− := { ξ = (ξ₁, ξ₂) ∈ S¹ : |ξ₁| ≥ cos(φ_j^−/2) }. (12)

Next, recall the functions U_j on the ray with supp U_j ⊆ I_j. Due to (8) and (9) their function values are between √A₁ and √B₁ on

I₀^− := C·[0, τ₀] and I_j^− := C·[2^{s(j−1)}τ₁, 2^{sj}τ₀], j ≥ 1, (13)

respectively. This leads us to the following definition. For J = (j, ℓ) ∈ J we introduce the wedge pairs

W_J^+ := { ξ ∈ ℝ² : |ξ| ∈ I_j, ξ/|ξ| ∈ A_J } and W_J^− := { ξ ∈ ℝ² : |ξ| ∈ I_j^−, ξ/|ξ| ∈ A_J^− }. (14)

The following support properties will be of essential importance later,

supp W_J ⊆ W_J^+ and √A ≤ W_J ≤ √B on W_J^−. (15)

A geometric illustration is displayed in Figure 1 (b). Now we fix C = 2^{−s}/(3π) in (3) such that each W_J^+ is contained in the respective rectangle Ξ_J := R_J^{−1} Ξ_{j,0}, where

Ξ_{j,0} := [−2^{js−1}, 2^{js−1}] × [−2^{jsα−1}, 2^{jsα−1}]. (16)

The rectangles Ξ_{j,0} are of size 2^{js} × 2^{jsα} and hence the Fourier system {u_{j,0,k}}_{k∈ℤ²} given by

u_{j,0,k}(ξ) := 2^{−js(1+α)/2} exp( 2πi (2^{−sj}k₁ξ₁ + 2^{−sjα}k₂ξ₂) ), ξ ∈ ℝ²,

constitutes an orthonormal basis for L²(Ξ_{j,0}). Consequently, the rotated system {u_{j,ℓ,k}}_{k∈ℤ²} of functions

u_{j,ℓ,k}(ξ) := u_{j,0,k}(R_{j,ℓ}ξ), ξ ∈ ℝ², (17)

is an orthonormal basis for L²(Ξ_J). After this preparation, we are ready to define the α-curvelet system C_{s,α}. Definition 2.1.
Let s > 0, α ∈ (−∞, 1], and assume that {W_J}_{J∈J} is a family of functions of the form (10) such that (11) holds for 0 < A ≤ B < ∞. Further, let u_{j,ℓ,k} be the functions defined in (17). The curvelet system C_{s,α}(A, B) := {ψ_μ}_{μ∈M} with associated index set M := J × ℤ² consists of the functions ψ_μ = ψ_{j,ℓ,k} given by

ψ̂_{j,ℓ,k}(ξ) := W_{j,ℓ}(ξ)·u_{j,ℓ,k}(ξ), ξ ∈ ℝ². (18)

Note that C_{s,α}(A, B) depends on the utilized family {W_J}_{J∈J}, which is not accounted for in the notation. The curvelets ψ_μ are real-valued due to the symmetry of W_{j,ℓ}. Their L²-norms may vary slightly with scale, however there are constants 0 < C₁ ≤ C₂ < ∞ such that C₁ ≤ ‖ψ_μ‖₂ ≤ C₂ holds true for all μ ∈ M. Most importantly, the system C_{s,α}(A, B) is a frame for L²(ℝ²). Lemma 2.2.
The system C_{s,α}(A, B) given by (18) is a frame for L²(ℝ²) with frame bounds A and B.

Proof. The functions W_J satisfy condition (11), wherefore

A‖f‖₂² = A‖f̂‖₂² ≤ Σ_{J∈J} ‖f̂ W_J‖₂² ≤ B‖f̂‖₂² = B‖f‖₂² for every f ∈ L²(ℝ²).

Since supp(f̂ W_J) ⊆ Ξ_J and since {u_{J,k}}_{k∈ℤ²} is an orthonormal basis of L²(Ξ_J), we have the orthogonal expansion f̂ W_J = Σ_k ⟨f̂ W_J, u_{J,k}⟩ u_{J,k} χ_{Ξ_J}. The proof is finished by the following equality,

‖f̂ W_J‖₂² = Σ_{k∈ℤ²} |⟨f̂ W_J, u_{J,k}⟩|² = Σ_{k∈ℤ²} |⟨f̂, W_J u_{J,k}⟩|² = Σ_{k∈ℤ²} |⟨f̂, ψ̂_{J,k}⟩|² = Σ_{k∈ℤ²} |⟨f, ψ_{J,k}⟩|². □

The Parseval frame C_{s,α}(1,
1) is of most interest to us, and one might wonder why we did not fix the frame bounds A = B = 1 in the beginning. The reason is that, in the proof of Lemma 4.16, we need the additional flexibility provided by variable A and B. Remark 2.3.
Subsequently, we will write C_{s,α} to refer to the Parseval frame C_{s,α}(1, 1).

Let us finish this section with a short discussion of the situation in the spatial domain. Here the α-curvelets {ψ_{j,ℓ,k}}_{k∈ℤ²} are translates of the functions ψ_{j,ℓ,0}. Indeed, since ψ̂_{j,ℓ,0} = 2^{−js(1+α)/2} W_{j,ℓ} and

u_{j,ℓ,k}(·) = u_{j,0,k}(R_{j,ℓ}·) = 2^{−js(1+α)/2} exp( 2πi ⟨R_{j,ℓ}^{−1} A_j^{−1} k, ·⟩ ),

where R_{j,ℓ} is the rotation matrix defined in (6) and A_j := A_{α,2^{js}} is an α-scaling matrix of the form (1), we have ψ̂_{j,ℓ,k} = ψ̂_{j,ℓ,0} · exp( 2πi ⟨R_{j,ℓ}^{−1} A_j^{−1} k, ·⟩ ) and hence

ψ_{j,ℓ,k} = ψ_{j,ℓ,0}(· − x_{j,ℓ,k}) with x_{j,ℓ,k} := R_{j,ℓ}^{−1} A_j^{−1} k.

Since ψ_{j,ℓ,0} is the rotation of ψ_{j,0,0} by the angle φ_{j,ℓ} = ℓφ_j, we arrive at the representation

ψ_{j,ℓ,k}(x) = ψ_{j,0,0}( R_{j,ℓ}(x − x_{j,ℓ,k}) ). (19)

In fact, these systems are instances of α-molecules, a concept recalled in the definition below.

Definition 2.4 ([24, Def. 2.9]). Let Λ be a set and Φ_Λ : Λ → P a map, assigning to each λ ∈ Λ a point (s_λ, θ_λ, x_λ) ∈ P in the so-called phase space P = ℝ⁺ × T × ℝ². Further, assume that L, M, N₁, N₂ ∈ ℕ₀ ∪ {∞}. A family {m_λ}_{λ∈Λ} of functions in L²(ℝ²) is called a family of α-molecules of order (L, M, N₁, N₂) with respect to the parametrization (Λ, Φ_Λ), if there exist generators a^{(λ)} ∈ L²(ℝ²) such that for all λ ∈ Λ

m_λ(·) = s_λ^{(1+α)/2} a^{(λ)}( A_{α,s_λ} R_{θ_λ}(· − x_λ) ),

and if for each ρ ∈ ℕ₀², |ρ| ≤ L, there is a constant C_ρ > 0 such that for all λ ∈ Λ

|∂^ρ â^{(λ)}(ξ)| ≤ C_ρ min{ 1, s_λ^{−1} + |ξ₁| + s_λ^{−(1−α)}|ξ₂| }^M (1 + |ξ|²)^{−N₁/2} (1 + |ξ₂|²)^{−N₂/2}, ξ ∈ ℝ². (20)

We can deduce from (19) that the α-curvelets ψ_{j,ℓ,k} can be represented in the form

ψ_{j,ℓ,k}(x) = 2^{js(1+α)/2} a_j( A_j R_{j,ℓ}(x − x_{j,ℓ,k}) ) = 2^{js(1+α)/2} a_j( A_j R_{j,ℓ} x − k ) (21)

with respect to the generators

a_j := 2^{−js(1+α)/2} ψ_{j,0,0}( A_j^{−1} · ).
(22)

Since these generators fulfill condition (20), as shown in Lemma 2.5 below, C_{s,α} is a system of α-molecules of arbitrary order, at least in the range α ∈ [0, 1] for which the concept was formulated. The associated parametrization, mapping the curvelet index set M into the phase space P = ℝ⁺ × T × ℝ², is given by

Φ_M : M → P, (j, ℓ, k) ↦ (2^{js}, φ_{j,ℓ}, x_{j,ℓ,k}) = (2^{js}, ℓφ_j, R_{j,ℓ}^{−1} A_j^{−1} k). (23)

Lemma 2.5.
Let M, N₁, N₂ ∈ ℕ₀ and ρ = (ρ₁, ρ₂) ∈ ℕ₀² be fixed. There is a constant C > 0 such that for all j ∈ ℕ₀ the generators (22) satisfy the estimate

|∂^ρ â_j(ξ)| ≤ C min{ 1, 2^{−js} + |ξ₁| + 2^{−js(1−α)}|ξ₂| }^M (1 + |ξ|²)^{−N₁/2} (1 + |ξ₂|²)^{−N₂/2}. (24)

Proof. On the Fourier side the functions (22) have the form â_j = 2^{js(1+α)/2} ψ̂_{j,0,0}(A_j ·) = W_{j,0}(A_j ·). Let j ∈ ℕ₀ be arbitrary. We have supp W_{j,0} ⊆ W_{j,0}^+ and W_{j,0}^+ ⊆ [−2^{js−1}, 2^{js−1}] × [−2^{jsα−1}, 2^{jsα−1}] = Ξ_{j,0}, whence

supp â_j ⊆ [−2^{−1}, 2^{−1}] × [−2^{−1}, 2^{−1}] = Ξ_{0,0}. (25)

Further, if j > 0, ψ̂_{j,0,0} vanishes on the square [−2^{s(j−1)−1}, 2^{s(j−1)−1}]². Consequently, â_j vanishes on [−2^{−s−1}, 2^{−s−1}] × ( 2^{js(1−α)} · [−2^{−s−1}, 2^{−s−1}] ).

The mixed derivatives ∂₁^{ρ₁}∂₂^{ρ₂} W_{j,0} obey, uniformly in j ∈ ℕ₀,

‖∂₁^{ρ₁}∂₂^{ρ₂} W_{j,0}‖_∞ ≲ 2^{−jsρ₁} 2^{−jsαρ₂}. (26)

With the chain rule we deduce

‖∂^ρ â_j‖_∞ = ‖∂^ρ ( W_{j,0}(A_j ·) )‖_∞ = 2^{jsρ₁} 2^{jsαρ₂} ‖( ∂₁^{ρ₁}∂₂^{ρ₂} W_{j,0} )(A_j ·)‖_∞ ≲ 1.

Due to supp ∂^ρ â_j ⊆ supp â_j, this estimate together with the support properties of â_j implies (24). □

With the machinery of α-molecules at our disposal, it is possible to use C_{s,α} as an anchor system whose properties have consequences for other α-scaled systems if they fulfill certain consistency conditions. In particular, approximation properties of C_{s,α} are shared by other α-scaled systems such as e.g. α-shearlets. A short discussion of this can be found in Section 5. For more details on the topic of α-molecules we refer to [24, 20].

In the two central sections of this article, Sections 3 and 4, we study the approximation performance of the α-curvelet frame C_{s,α} with respect to different cartoon classes. We begin in this section with classes of general cartoons, used e.g. to model natural images. In Section 4 we then turn our focus to cartoons featuring only straight edges.

Many suitable and well-established models for natural images are based on the concept of so-called cartoon-like functions. In a nutshell, such functions can be thought of as a patchwork of smooth regions separated from one another by piecewise-smooth discontinuity curves.
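Such a cartoon-like function is easy to sketch numerically. A toy example of the form f = f₀ + f₁χ_D with smooth parts f₀, f₁ and a disc D (all concrete choices here are illustrative only, not taken from the models discussed below):

```python
import numpy as np

# Toy cartoon f = f0 + f1 * chi_D on [-1,1]^2: globally smooth parts f0, f1
# and a jump across the boundary of a disc D, a smooth discontinuity curve.
n = 256
x, y = np.meshgrid(np.linspace(-1, 1, n), np.linspace(-1, 1, n))
f0 = np.exp(-(x**2 + y**2))               # smooth everywhere
f1 = 0.5 * np.cos(2 * x) * np.sin(3 * y)  # smooth, switched on inside D only
chi_D = ((x - 0.1)**2 + y**2 <= 0.25).astype(float)
f = f0 + f1 * chi_D

# The gradient magnitude of the sampled image concentrates along the edge.
gy, gx = np.gradient(f)
edge_strength = np.sqrt(gx**2 + gy**2)
```

The discrete gradient peaks along ∂D, which is precisely the anisotropic structure that α-scaled systems are designed to resolve.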
Their structure imitates the fact that edges, a typical feature of natural images, are characterized by abrupt changes of color and brightness, whereas changes in the regions in between occur smoothly.

Mathematically, models based on this idea can be concretized in different ways. A classic model [5] postulates a compact image domain separated into two $C^2$ regions by a closed $C^2$ discontinuity curve. This model was generalized in various directions, e.g., to take into account piecewise-smooth edges or to allow more general $C^\beta$ regularity with $\beta \in [0,\infty)$. Cartoon classes of this kind have been studied extensively, especially in the range $\beta \in (1,2]$.

Definition 3.1.
Let $\beta \in [0,\infty)$ and $\nu > 0$. Given a domain $\Omega \subseteq \mathbb{R}^2$ and a set $\mathcal{A}$ of admissible subsets of $\mathbb{R}^2$, the class $\mathcal{E}^\beta(\Omega;\mathcal{A},\nu)$ consists of all functions $f \in L^2(\mathbb{R}^2)$ of the form
$$f = f_0 + f_1\chi_D,$$
where $D \in \mathcal{A}$ and $f_0, f_1 \in C^\beta(\mathbb{R}^2)$ with $\operatorname{supp} f_0, f_1 \subseteq \Omega$ and $\|f_0\|_{C^\beta}, \|f_1\|_{C^\beta} \le \nu$. The class $\mathcal{E}^\beta_{bin}(\Omega;\mathcal{A})$ shall be the collection of all `binary functions' $\chi_D$, where $D \in \mathcal{A}$ and $D \subseteq \Omega$.

By suitable choices of $\Omega$ and $\mathcal{A}$, many of the classes appearing in the literature can be retrieved, including classes of horizon-type. In this section we focus on the class $\mathcal{E}^\beta(\Omega;\mathcal{A},\nu)$ with fixed image domain $\Omega = [-1,1]^2$ and certain $C^\beta$ domains as admissible sets $\mathcal{A}$. Similar to [18, 5, 37, 36], we restrict our investigation to star-shaped domains, since those allow a simple parametrization of the boundary curve. The results obtained however also hold true for more general domains.

Let us introduce the collection of admissible sets $\mathrm{Star}^\beta(\nu)$, $\nu > 0$, as all translates of sets $B \subseteq \mathbb{R}^2$ whose boundary $\partial B$ possesses a parametrization $b: \mathbb{T} \to \mathbb{R}^2$ of the form
$$b(\varphi) = \rho(\varphi)\begin{pmatrix}\cos(\varphi)\\ \sin(\varphi)\end{pmatrix}, \quad \varphi \in \mathbb{T} = [0, 2\pi],$$
where the radius function $\rho: \mathbb{T} \to \mathbb{R}_+$ is a $C^\beta$ function with
$$|\partial^{\lfloor\beta\rfloor}\rho(\varphi) - \partial^{\lfloor\beta\rfloor}\rho(\varphi')| \le \nu\rho_0\,|\varphi - \varphi'|^{\beta-\lfloor\beta\rfloor} \quad \text{for all } \varphi, \varphi' \in \mathbb{T}, \qquad (27)$$
where we set $\rho_0 := \min_{\varphi\in\mathbb{T}}\rho(\varphi) \ge \nu^{-1}$. The condition (27) implies that, with $C = C(\beta) = (2\pi)^\beta \ge 1$, we have $\|\rho^{(k)}\|_{C(\mathbb{T})} \le C\rho_0\nu$ for every $k \in \{1,\ldots,\lfloor\beta\rfloor\}$ if $\beta \ge 1$, and $|\rho(\varphi) - \rho(\varphi')| \le C\rho_0\nu$ for $\varphi, \varphi' \in \mathbb{T}$. In particular $\rho_0 \le \rho(\varphi) \le \rho_0(1 + C\nu)$ for all $\varphi \in \mathbb{T}$.

Note that the set $\mathrm{Star}^\beta(\nu)$ differs from the set of star-shaped domains used in [18, 5, 37, 36]. The domains in $\mathrm{Star}^\beta(\nu)$ are not restricted to subsets of $[-1,1]^2$. In fact, every star-shaped $C^\beta$ domain with center $0$ and $\rho_0 > 0$ is contained in $\mathrm{Star}^\beta(\nu)$ for suitably large $\nu$. Moreover, the collection $\mathrm{Star}^\beta(\nu)$ is scaling invariant in the sense that for $B \in \mathrm{Star}^\beta(\nu)$ and $\lambda > 0$ also $\lambda B \in \mathrm{Star}^\beta(\nu)$, provided $\lambda\rho_0 \ge \nu^{-1}$. In addition, with $B \in \mathrm{Star}^\beta(\nu)$ also the complement $B^c = \mathbb{R}^2\setminus B$ is contained in $\mathrm{Star}^\beta(\nu)$.

Building upon Definition 3.1 we now define the class of functions which we want to study in this section. We put $\Omega = [-1,1]^2$ and $\mathcal{A} = \mathrm{Star}^\beta(\nu)$. Further, we assume $\beta \in [0,\infty)$ and $\nu > 0$. For the resulting class $\mathcal{E}^\beta([-1,1]^2;\mathrm{Star}^\beta(\nu),\nu)$ we simplify the notation
$$\mathcal{E}^\beta([-1,1]^2;\nu) := \mathcal{E}^\beta([-1,1]^2;\mathrm{Star}^\beta(\nu),\nu). \qquad (28)$$
The associated binary class shall be denoted by $\mathcal{E}^\beta_{bin}([-1,1]^2;\nu) := \mathcal{E}^\beta_{bin}([-1,1]^2;\mathrm{Star}^\beta(\nu))$.

Before we investigate the approximation performance of the $\alpha$-curvelet frame $\mathcal{C}_{s,\alpha}$ with respect to the class $\mathcal{E}^\beta([-1,1]^2;\nu)$, let us take a broader stance and aim for the best possible $N$-term approximation in case we can freely choose the utilized dictionary. Of course, a countable dense subset of $L^2(\mathbb{R}^2)$ would yield arbitrarily good 1-term approximations. This shows that, without further restrictions, the question of best possible approximation is not well-posed.

To cast a realistic scenario, when computing $N$-term approximations typically a constraint on the search depth is imposed. More concretely, given a fixed ordering of the dictionary and some polynomial $\pi$, it is common to allow only $N$-term approximants built from the first $\pi(N)$ elements of the dictionary. Under this so-called polynomial depth search constraint, an upper bound on the maximal achievable approximation rate was first derived by Donoho [18, Thm. 1] for binary $C^\beta$ cartoons in the range $\beta \in (1,2]$. The following version applies to the class $\mathcal{E}^\beta([-1,1]^2;\nu)$ specified in (28).

Theorem 3.2.
Let $\beta, \gamma \in [0,\infty)$ and $\nu > 0$. Assume that there is a constant $C > 0$ such that
$$\sup_{f\in\mathcal{E}^\beta([-1,1]^2;\nu)} \|f - f_N\|_2^2 \le CN^{-\gamma} \quad \text{for all } N \in \mathbb{N},$$
where $f_N$ denotes the best $N$-term approximation of $f$ obtained by polynomial depth search in a fixed dictionary. Then necessarily $\gamma \le \beta$.

In principle, this is a known result (see e.g. [36]). However, for reasons of completeness, we outline a short proof based on the technique used in [18]. It relies on Theorem 3.4 below and the fact that the class $\mathcal{E}^\beta([-1,1]^2;\nu)$ contains a copy of $\ell^0_p$ for $p = 2/(\beta+1)$. Let us recall this notion introduced in [18].

Definition 3.3 ([18, Def. 1&2]). A function class $\mathcal{F} \subseteq L^2(\mathbb{R}^2)$ is said to contain an embedded orthogonal hypercube of dimension $m$ and side-length $\delta$ if there exist $f \in \mathcal{F}$ and orthogonal functions $\psi_\ell \in L^2(\mathbb{R}^2)$, $\ell \in \{1,\ldots,m\}$, with $\|\psi_\ell\|_2 = \delta$ such that the collection of hypercube vertices embeds, i.e.,
$$\Big\{ f + \sum_{\ell=1}^m \epsilon_\ell\psi_\ell \,:\, \epsilon = (\epsilon_1,\ldots,\epsilon_m) \in \{0,1\}^m \Big\} \subseteq \mathcal{F}.$$
It is said to contain a copy of $\ell^0_p$, $p > 0$, if it contains a sequence of embedded orthogonal hypercubes, whose associated dimensions $m_k$ and side-lengths $\delta_k$ satisfy $\delta_k \to 0$ for $k \to \infty$ and, with a constant $C > 0$, $C\delta_k^{-p} \le m_k$ for all $k \in \mathbb{N}$.

The significance of this notion is due to the following result, which was first obtained in [18, Thm. 2]. The reformulated version below can be found in [23, Thm. 2.2].
Theorem 3.4 ([23, Thm. 2.2]). Suppose that a class of functions $\mathcal{F} \subseteq L^2(\mathbb{R}^2)$ is uniformly $L^2$-bounded and contains a copy of $\ell^0_p$. Then, allowing only polynomial depth search in a given dictionary, there is a constant $C > 0$ such that for every $N_0 \in \mathbb{N}$ there is a function $f \in \mathcal{F}$ and an $N \in \mathbb{N}$, $N \ge N_0$, such that
$$\|f - f_N\|_2^2 \ge C\big(N\log_2(N)\big)^{-(2-p)/p},$$
where $f_N$ denotes the best $N$-term approximation under the polynomial depth search constraint.

It remains to investigate for which $p > 0$ the class $\mathcal{E}^\beta([-1,1]^2;\nu)$ contains a copy of $\ell^0_p$. To this end, let us introduce the following subclass of smooth functions for $\beta \in [0,\infty)$ and $\nu > 0$:
$$C^\beta([-1,1]^2;\nu) := \big\{ f \in C^\beta([-1,1]^2) \,:\, \|f\|_{C^\beta} \le \nu \big\}. \qquad (29)$$
Note that the choice $\Omega = [-1,1]^2$ and $\mathcal{A} = \{\emptyset\}$ in Definition 3.1 yields this class. As a consequence,
$$C^\beta([-1,1]^2;\nu) \subset \mathcal{E}^\beta([-1,1]^2;\nu). \qquad (30)$$
Lemma 3.5 below is the 2D analogue of the statement of [36, Thm. 3.2]. It shows, in particular, that $C^\beta([-1,1]^2;\nu)$ contains a copy of $\ell^0_{2/(\beta+1)}$. Hence, as a consequence of (30), also $\mathcal{E}^\beta([-1,1]^2;\nu)$ contains a copy of $\ell^0_{2/(\beta+1)}$. An application of Theorem 3.4 thus yields Theorem 3.2.

Lemma 3.5.
Let $\nu > 0$, $\beta \in [0,\infty)$, and $p = 2/(\beta+1)$. Then the following holds true.

(i) The function class $C^\beta([-1,1]^2;\nu)$ contains a copy of $\ell^0_p$.

(ii) The class of binary cartoons $\mathcal{E}^\beta_{bin}([-1,1]^2;\nu)$ contains a copy of $\ell^0_p$ if $\nu \ge 1$; otherwise it only contains the zero-function.

Proof. The proof is a 2D adaptation of the proof of [36, Thm. 3.2].

Summarizing, this establishes $N^{-\beta}$ as an upper bound for the possible order of approximation for general $C^\beta$ cartoons. This rate is the benchmark against which the performance of $\mathcal{C}_{s,\alpha}$ has to be measured. We end this paragraph with the following observation.

Remark 3.6.
According to Lemma 3.5(i), the bound of Theorem 3.2 actually holds true for the class $C^\beta([-1,1]^2;\nu)$. This is a stronger statement due to the inclusion (30). Further, due to Lemma 3.5(ii), a statement analogous to Theorem 3.2 holds true for the binary class $\mathcal{E}^\beta_{bin}([-1,1]^2;\nu)$ if $\nu \ge 1$.

According to Theorem 3.2 and Remark 3.6, the order of the $N$-term approximation rate achievable for the classes $\mathcal{E}^\beta_{bin}([-1,1]^2;\nu)$, $\nu \ge 1$, and $\mathcal{E}^\beta([-1,1]^2;\nu)$, $\nu > 0$, cannot exceed $N^{-\beta}$. This bound is valid for arbitrary dictionaries and independent of the approximation scheme employed, as long as it respects a polynomial depth search condition. Even adaptive approximation schemes cannot perform better. Schemes where these rates are provably achieved, at least up to order, have been developed for binary cartoons based on wedgelets [15] and surflets [9], and for general cartoons utilizing bandelets [38, 39]. These results show that the optimality benchmark $N^{-\beta}$ can indeed be realized in practice, at least up to order. However, the utilized schemes are mostly adaptive; only for certain cartoon classes are nonadaptive methods with quasi-optimal performance known.

A breakthrough concerning the nonadaptive approximation of $C^2$ cartoons with curved edges was the introduction of curvelets by Cand\`es and Donoho [4, 5]. By a simple thresholding scheme, curvelet frames achieve an approximation rate matching the class bound $N^{-2}$ up to a log-factor. The reason for this performance is the parabolic scaling employed. The following argument shall heuristically explain why this type of scaling is ideal for the representation of $C^2$ edges.

In local Cartesian coordinates, a $C^2$ curve can be represented as the graph $(E(x_2), x_2)$ of a function $E \in C^2(\mathbb{R})$, and one can choose a coordinate system such that $E'(0) = E(0) = 0$. A Taylor expansion then yields approximately $E(x_2) \approx \tfrac{1}{2}E''(0)x_2^2$, which matches the essential support relation width $\approx$ length$^2$ of parabolically scaled functions. Hence, those can provide optimal resolution of the curve across all scales. A similar heuristic applies to $C^\beta$ curves if $\beta \in (1,2]$: a Taylor expansion of $E \in C^\beta(\mathbb{R})$ yields $|E(x_2)| \lesssim x_2^\beta$. The curve is thus contained in a rectangle of size width $\approx$ length$^\beta$, which suggests $\alpha$-scaling with $\alpha = \beta^{-1}$ for optimal approximation. And indeed, the classic approximation result by Cand\`es and Donoho could be extended in [23, Thm. 4.1] to the range $\beta \in (1,2]$. Note that the class $\mathcal{E}^\beta([-1,1]^2;\nu)$ used here is not fully identical to the class in [23]. Moreover, only curvelet frames of the type $\mathcal{C}_{s,\alpha}$ with $s = 1$ were considered there. It is not hard to verify though that the proof carries over to general $s > 0$.

Theorem 3.7 ([23, Thm. 4.1]). Let $\beta \in (1,2]$, $\nu > 0$. For the choice $\alpha = \beta^{-1}$, $s > 0$ arbitrary, the frame of $\alpha$-curvelets $\mathcal{C}_{s,\alpha}$ provides almost optimal sparse approximations for the class $\mathcal{E}^\beta([-1,1]^2;\nu)$. More precisely, there exists a constant $C > 0$ such that for every $f \in \mathcal{E}^\beta([-1,1]^2;\nu)$ and $N \in \mathbb{N}$
$$\|f - f_N\|_2^2 \le CN^{-\beta}\log(1+N)^{1+\beta},$$
where $f_N$ denotes the $N$-term approximation of $f$ obtained by choosing the $N$ largest coefficients.

This theorem naturally raises the question of extendibility beyond the range $\beta \in (1,2]$, i.e., whether the choice $\alpha = \beta^{-1}$ is still optimal for $\beta > 2$. We will see that for cartoons with curved edges and $\beta > 2$ this is not the case. In fact, the optimal choice is still $\alpha = \tfrac{1}{2}$, and choosing $\alpha < \tfrac{1}{2}$ deteriorates the approximation performance.

The main results of this subsection, Theorems 3.9 and 3.11, establish bounds on the achievable $N$-term approximation rate for the class $\mathcal{E}^\beta([-1,1]^2;\nu)$, $\beta \in [0,\infty)$, when using the $\alpha$-curvelet frame $\mathcal{C}_{s,\alpha}$ for approximation. Unlike the bounds in Theorem 3.2 associated with the signal class, the bounds derived here are tied to the particular approximation system $\mathcal{C}_{s,\alpha}$. However, via the framework of $\alpha$-molecules they are also effective for other $\alpha$-scaled systems, such as $\alpha$-shearlets, as discussed in Section 5.

In order to establish these bounds we study the approximability of certain example cartoons. As a suitable object, we choose the characteristic function of the ball $B(0,\tfrac{1}{2}) \subset \mathbb{R}^2$ of radius $\tfrac{1}{2}$, for which we subsequently use the symbol
$$\Theta(x) := \chi_{B(0,\frac{1}{2})}(x_1,x_2), \quad x \in \mathbb{R}^2. \qquad (31)$$
This function embodies an exceptionally regular cartoon with a closed curved $C^\infty$-singularity. It is radially symmetric and binary, contained in $\mathcal{E}^\beta_{bin}([-1,1]^2;\nu)$ for arbitrary $\beta \in [0,\infty)$ and $\nu \ge$
2. Furthermore, for every $\beta \in [0,\infty)$ and $\nu \ge 2$ there is a $\gamma > 0$ such that $\gamma\Theta \in \mathcal{E}^\beta([-1,1]^2;\nu)$, wherefore the approximability of $\Theta$ has implications for the approximability of these cartoon classes.

The Fourier transform of $\Theta$ is explicitly known. Let $J_1$ denote the Bessel function of order 1; then according to (67)
$$\hat\Theta(\xi) = \frac{J_1(\pi|\xi|)}{2|\xi|}, \quad \xi \in \mathbb{R}^2. \qquad (32)$$
Some properties of $J_1$ and Bessel functions in general are collected in the appendix.

At the center of the following investigation is the lemma below, which estimates the energy of $\hat\Theta$ contained in the wedges $W_J$, $J \in \mathcal{J}$. Let $\{W_J\}_{J\in\mathcal{J}}$ be a family of functions of the kind (10) with property (11) for $0 < A \le B < \infty$. Further, let $W^-_J := \chi_{W^-_J}$ and $W^+_J := \chi_{W^+_J}$ be the characteristic functions of the sets $W^-_J$ and $W^+_J$ defined in (14).

Lemma 3.8. There are constants $0 < C_1 \le C_2 < \infty$, independent of scale $j \ge j_0$, where $j_0 \in \mathbb{N}$ is a suitable base scale, such that for all $J \in \mathcal{J}$ with $|J| \ge j_0$, where $|J| = j$ for $J = (j,\ell) \in \mathcal{J}$,
$$AC_1\,2^{-js(2-\alpha)} \le A\|\hat\Theta W^-_J\|_2^2 \le \|\hat\Theta W_J\|_2^2 \le B\|\hat\Theta W^+_J\|_2^2 \le BC_2\,2^{-js(2-\alpha)}.$$

Proof.
Let us recall the Bessel function $J_1$ of order 1 and its asymptotic behavior. According to (69) there is a constant $C > 0$ and a function $R$ on $[1,\infty)$ satisfying $|R(r)| \le Cr^{-3/2}$ such that
$$J_1(r) = \sqrt{\frac{2}{\pi r}}\cos\Big(r - \frac{3\pi}{4}\Big) + R(r) \quad \text{for } r \ge 1.$$
This allows us to separate terms of higher order from $J_1^2$. We decompose
$$J_1(r)^2 = \Big[\frac{2}{\pi r}\cos^2\Big(r - \frac{3\pi}{4}\Big)\Big] + \Big[2\sqrt{\frac{2}{\pi r}}\cos\Big(r - \frac{3\pi}{4}\Big)R(r) + R(r)^2\Big] =: T_1(r) + T_2(r).$$
For the following argumentation we need the square wave function $\sqcap: \mathbb{R} \to \{0,1\}$ defined by
$$\sqcap(r) := \begin{cases} 1, & r \in \bigcup_{k\in\mathbb{Z}} k\pi + [-\tfrac{\pi}{2}, 0], \\ 0, & r \in \bigcup_{k\in\mathbb{Z}} k\pi + (0, \tfrac{\pi}{2}). \end{cases}$$
For all $r \in \mathbb{R}$ it has the property $2\cos^2(r - 3\pi/4) \ge \sqcap(r)$. Therefore we can deduce for $1 \le a \le b$
$$\int_a^b T_1(r)\,r^{-1}\,dr = \frac{2}{\pi}\int_a^b \cos^2\Big(r - \frac{3\pi}{4}\Big)r^{-2}\,dr \ge \frac{1}{\pi}\int_a^b \sqcap(r)\,r^{-2}\,dr \ge \frac{1}{2}\sum_{k\in I_{a,b}} (k\pi)^{-2}$$
with $I_{a,b} := \{k \in \mathbb{Z} \,:\, k\pi \in [a + \tfrac{\pi}{2}, b]\}$. To proceed, we use the relation
$$\sum_{k=m}^n (k\pi)^{-2} \ge \frac{1}{\pi}\int_{m\pi}^{(n+1)\pi} k^{-2}\,dk,$$
which is valid for all $m, n \in \mathbb{N}$ with $m \le n$. We obtain
$$\frac{1}{2}\sum_{k\in I_{a,b}} (k\pi)^{-2} \ge \frac{1}{2\pi}\int_{a+2\pi}^{b} k^{-2}\,dk = \frac{1}{2\pi}\Big(\int_a^b k^{-2}\,dk - \int_a^{a+2\pi} k^{-2}\,dk\Big) \ge \frac{1}{2\pi}\big(a^{-1} - b^{-1}\big) - a^{-2}.$$
Next, we see that with a constant $C_0 > 0$ for $1 \le a \le b$
$$\int_a^b |T_2(r)|\,r^{-1}\,dr \le C_0\int_a^b r^{-3}\,dr \le C_0\int_a^\infty r^{-3}\,dr \le C_0\,a^{-2}.$$
Altogether,
$$\int_a^b J_1(r)^2\,r^{-1}\,dr \ge \frac{1}{2\pi}\big(1 - ab^{-1}\big)a^{-1} - (1 + C_0)\,a^{-2}.$$
If $c = ab^{-1} < 1$ is fixed, this yields for $a \ge 4\pi(1+C_0)(1-c)^{-1}$ the estimate
$$\int_a^{a/c} J_1(r)^2\,r^{-1}\,dr \ge \frac{1}{4\pi}(1-c)\,a^{-1}. \qquad (33)$$
After this preparation, we can now turn to the actual proof of the assertion. The relation
$$A\|\hat\Theta W^-_J\|_2^2 \le \|\hat\Theta W_J\|_2^2 \le B\|\hat\Theta W^+_J\|_2^2$$
is a direct consequence of (15) and $\|W_J\|_\infty \le \sqrt{B}$. Let $I_j$ be the intervals defined in (7). Further, recall the intervals $I^-_j \subset I_j$ defined in (13). Using (32) and the definition (14) of $W^-_J$ we calculate
$$\|\hat\Theta W^-_J\|_2^2 = \int_{W^-_J} \frac{J_1(\pi|\xi|)^2}{4|\xi|^2}\,d\xi = \int_{I^-_j}\int_{A^-_J} \frac{J_1(\pi r)^2}{4r^2}\,r\,d\varphi\,dr \asymp 2^{-js(1-\alpha)}\int_{\pi I^-_j} J_1(r)^2\,r^{-1}\,dr.$$
The intervals $I^-_j$ scale like $\sim 2^{js}$. Hence, if $j_0 \in \mathbb{N}$ is chosen large enough, by (33)
$$\|\hat\Theta W^-_J\|_2^2 \asymp 2^{-js(1-\alpha)}\int_{\pi I^-_j} J_1(r)^2\,r^{-1}\,dr \gtrsim 2^{-js(1-\alpha)}\,2^{-js} = 2^{-js(2-\alpha)}.$$
The estimate from above is much easier to establish. If $j_0 \in \mathbb{N}$ is such that $\pi I_j \subset [1,\infty)$ we have
$$\|\hat\Theta W^+_J\|_2^2 = \int_{W^+_J} \frac{J_1(\pi|\xi|)^2}{4|\xi|^2}\,d\xi = \int_{I_j}\int_{A_J} \frac{J_1(\pi r)^2}{4r^2}\,r\,d\varphi\,dr \asymp 2^{-js(1-\alpha)}\int_{\pi I_j} J_1(r)^2\,r^{-1}\,dr \lesssim 2^{-js(1-\alpha)}\int_{I_j} r^{-2}\,dr \lesssim 2^{-js(2-\alpha)}.$$

Based on Lemma 3.8 we can prove the first main result of this article.
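Before doing so, the two-sided estimate of Lemma 3.8 can be checked numerically. The following sketch makes simplifying assumptions (corona boundaries $[2^{js}, 2^{js+1}]$, parameters $s = 1$, $\alpha = 1/2$, all constants ignored) and evaluates the radial Bessel integral directly; the observed dyadic decay of the wedge energies should be close to the predicted exponent $-s(2-\alpha)$:

```python
import math

# Illustrative numerical check of Lemma 3.8 (simplified coronas and constants).
def j1(x, n=2000):
    # Bessel function J_1 via Bessel's integral (midpoint rule):
    # J_1(x) = (1/pi) * int_0^pi cos(theta - x*sin(theta)) dtheta
    h = math.pi / n
    return sum(math.cos((k + 0.5) * h - x * math.sin((k + 0.5) * h))
               for k in range(n)) * h / math.pi

def wedge_energy(j, s=1.0, alpha=0.5, n=600):
    # ||hat(Theta) W_J||^2 for one wedge: angular width ~ 2^{-js(1-alpha)}
    # times the radial integral of J_1(pi r)^2 / (4 r) over the corona
    a, b = 2.0 ** (j * s), 2.0 ** ((j + 1) * s)
    h = (b - a) / n
    radial = sum(j1(math.pi * (a + (k + 0.5) * h)) ** 2 / (4 * (a + (k + 0.5) * h))
                 for k in range(n)) * h
    return 2.0 ** (-j * s * (1 - alpha)) * radial

# Lemma 3.8 predicts wedge_energy(j) ~ 2^{-js(2-alpha)}, i.e. a dyadic decay
# slope of -s(2-alpha) = -1.5 for s = 1, alpha = 1/2.
e2, e3, e4 = wedge_energy(2), wedge_energy(3), wedge_energy(4)
slopes = (math.log2(e3 / e2), math.log2(e4 / e3))
print(slopes)  # both should be near -1.5
```

The slopes are only asymptotically exact, so moderate deviations at these small scales are expected.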
Theorem 3.9.
Let $\mathcal{C}_{s,\alpha}$ be the $\alpha$-curvelet frame constructed in Section 2 for fixed $\alpha \in (-\infty, 1)$ and $s > 0$. There exists a constant $C > 0$ such that for any given $N \in \mathbb{N}$ every $N$-term approximation $f_N$ of $\Theta$ with respect to $\mathcal{C}_{s,\alpha}$ (not even subject to a polynomial depth search constraint) satisfies
$$\|\Theta - f_N\|_2^2 \ge CN^{-\frac{1}{1-\alpha}}.$$

Proof.
Let $N \in \mathbb{N}$ be fixed and assume that
$$f_N = \sum_{r=1}^N \theta_{J_r,k_r}\,\psi_{J_r,k_r}$$
is a linear combination of $\alpha$-curvelets $\psi_{J_r,k_r}$ with coefficients $\theta_{J_r,k_r} \in \mathbb{R}$. The curvelets $\psi_{J_r,k_r} \in \mathcal{C}_{s,\alpha}$ satisfy $\operatorname{supp}\hat\psi_{J_r,k_r} \subseteq W^+_{J_r}$ as recorded in (15). It follows that $\operatorname{supp}\hat f_N \subseteq \mathcal{W}_N$, where $\mathcal{W}_N := \bigcup_{J\in\mathcal{J}_N} W^+_J$ for $\mathcal{J}_N := \{J_1,\ldots,J_N\} \subset \mathcal{J}$. Using the notation $\mathcal{J}^c_N := \mathcal{J}\setminus\mathcal{J}_N$ and $\mathcal{W}^c_N := \mathbb{R}^2\setminus\mathcal{W}_N$ we get with Lemma 3.8
$$\|\Theta - f_N\|_2^2 = \|\hat\Theta - \hat f_N\|_2^2 \ge \|\hat\Theta\|^2_{L^2(\mathcal{W}^c_N)} \ge \sum_{J\in\mathcal{J}^c_N} \|\hat\Theta W^-_J\|_2^2 \gtrsim \sum_{J\in\mathcal{J}^c_N} 2^{-js(2-\alpha)}.$$
We want to bound the right-hand side from below. By (5), the number of tiles in each corona $\mathcal{C}_j$, $j \in \mathbb{N}$, is given by $L_j$, where $L_0 = 1$ and $L_j = 2^{\lfloor js(1-\alpha)\rfloor + 1}$ for $j \ge 1$. Let $j(N) \in \mathbb{N}$ denote the unique number such that
$$\sum_{j=0}^{j(N)-1} L_j < N \le \sum_{j=0}^{j(N)} L_j.$$
Since $2^{-js(2-\alpha)}$ decreases with rising scale we obtain
$$\sum_{J\in\mathcal{J}^c_N} 2^{-js(2-\alpha)} \ge \sum_{j=j(N)+1}^\infty L_j\,2^{-js(2-\alpha)} \ge \sum_{j=j(N)+1}^\infty 2^{-js} \gtrsim 2^{-j(N)s}.$$
Here we used $L_j \ge 2^{js(1-\alpha)}$. Since $N \lesssim \sum_{j=0}^{j(N)} 2^{js(1-\alpha)} \lesssim 2^{j(N)s(1-\alpha)}$ we can finally deduce
$$\|\Theta - f_N\|_2^2 \gtrsim 2^{-j(N)s} = \big(2^{j(N)s(1-\alpha)}\big)^{-\frac{1}{1-\alpha}} \gtrsim N^{-\frac{1}{1-\alpha}}.$$

This result can be strengthened if we restrict to greedy $N$-term approximations obtained by thresholding the coefficients. Essential is the following observation, which has also been used in [23]. Due to its importance we give a rigorous proof here.

Lemma 3.10.
There is a constant $C > 0$ such that all curvelets $\psi_\mu \in \mathcal{C}_{s,\alpha}$, $\mu \in M$, satisfy
$$\|\psi_\mu\|_{L^1} \le C\,2^{-js(1+\alpha)/2}.$$

Proof.
Let $a_j$ be the functions from (22) and recall that according to (25) the support of $\hat a_j$ is contained in the unit square $\Xi_{0,0}$ for every $j \in \mathbb{N}_0$. Let $\mathrm{Id}$ denote the identity operator. We have the estimate
$$\big\|\mathcal{F}^{-1}\big((\mathrm{Id} - \partial_1^2)(\mathrm{Id} - \partial_2^2)\hat a_j\big)\big\|_\infty \le \big\|(\mathrm{Id} - \partial_1^2)(\mathrm{Id} - \partial_2^2)\hat a_j\big\|_{L^1} \le \big\|(\mathrm{Id} - \partial_1^2)(\mathrm{Id} - \partial_2^2)\hat a_j\big\|_\infty.$$
According to Lemma 2.5 the right-hand side is bounded uniformly over all scales. We conclude that there is a constant $C > 0$, independent of $j \in \mathbb{N}_0$, such that
$$\sup_{x\in\mathbb{R}^2} \big|(1 + x_1^2)(1 + x_2^2)\,a_j(x)\big| \le C.$$
In other words, $|a_j(x)| \le C(1 + x_1^2)^{-1}(1 + x_2^2)^{-1}$. Using the representation (21) we obtain
$$|\psi_{j,0,0}(x)| = 2^{js(1+\alpha)/2}\,|a_j(A_jx)| \le C\,2^{js(1+\alpha)/2}\,(1 + 2^{2js}x_1^2)^{-1}(1 + 2^{2js\alpha}x_2^2)^{-1}$$
and hence
$$\int_{\mathbb{R}^2} |\psi_{j,0,0}(x)|\,dx \lesssim 2^{js(1+\alpha)/2}\int_{\mathbb{R}^2} (1 + 2^{2js}x_1^2)^{-1}(1 + 2^{2js\alpha}x_2^2)^{-1}\,dx = 2^{-js(1+\alpha)/2}\int_{\mathbb{R}^2} (1 + x_1^2)^{-1}(1 + x_2^2)^{-1}\,dx \lesssim 2^{-js(1+\alpha)/2}.$$
Since $\|\psi_{j,\ell,k}\|_{L^1} = \|\psi_{j,0,0}\|_{L^1}$ the proof is finished.

Lemma 3.10 allows us to deduce a simple a-priori estimate of the curvelet coefficient size, namely
$$|\theta_\mu| = |\langle f, \psi_\mu\rangle| \le \|f\|_\infty\|\psi_\mu\|_{L^1} \le C\,\|f\|_\infty\,2^{-js(1+\alpha)/2} \quad \text{for } \mu = (j,\ell,k) \in M. \qquad (34)$$
Note that the constant $C > 0$ depends only on the frame $\mathcal{C}_{s,\alpha}$. Using (34) we now prove a stronger statement than Theorem 3.9 for greedy approximations.

Theorem 3.11.
Let $\alpha \in (-\infty, 1)$ and $s > 0$ be fixed. Further, let $f_N$ denote the $N$-term approximation of $\Theta$ with respect to the $\alpha$-curvelet frame $\mathcal{C}_{s,\alpha}$ obtained by thresholding the coefficients. There is a constant $C > 0$ such that for every $N \in \mathbb{N}$
$$\|\Theta - f_N\|_2^2 \ge CN^{-\frac{1}{\max\{\alpha,\,1-\alpha\}}}.$$

Proof. If $\alpha \le \tfrac{1}{2}$ the assertion is true by Theorem 3.9. It remains to handle the range $\tfrac{1}{2} < \alpha < 1$. Let $\theta_{J_r,k_r} = \langle\Theta, \psi_{J_r,k_r}\rangle$, $r \in \{1,\ldots,N\}$, be the $N$ largest curvelet coefficients, which determine the approximant $f_N := \sum_{r=1}^N \theta_{J_r,k_r}\psi_{J_r,k_r}$. On the Fourier side the curvelet $\psi_{J,k} \in \mathcal{C}_{s,\alpha}$ is the product of the functions $W_J$ and $u_{J,k}$ defined in (10) and (17), respectively. Using condition (11) we first estimate
$$\|\Theta - f_N\|_2^2 = \|\hat\Theta - \hat f_N\|_2^2 \ge B^{-1}\sum_{J\in\mathcal{J}} \|\hat\Theta W_J - \hat f_N W_J\|_2^2 \ge \frac{A}{B}\sum_{J\in\mathcal{J}} \|\hat\Theta W^-_J - \hat f_N W^-_J\|_2^2,$$
where $W^-_J$ is the characteristic function of the set $W^-_J$ defined in (14). The triangle inequality yields
$$\|\hat\Theta W^-_J\|_2 \le \|\hat\Theta W^-_J - \hat f_N W^-_J\|_2 + \|\hat f_N W^-_J\|_2 \quad \text{for every } J \in \mathcal{J}. \qquad (35)$$
Observe the relation $\sqrt{A}\,W^-_J \le W^-_J W_J \le \sqrt{B}\,W^-_J$ and $W^-_J W_{J'} = 0$ for $J \ne J'$. Therefore, it holds
$$\hat f_N W^-_J = \sum_{r=1}^N \theta_{J_r,k_r}\hat\psi_{J_r,k_r}W^-_J = \sum_{r=1}^N \theta_{J_r,k_r}u_{J_r,k_r}W_{J_r}W^-_J \asymp \sum_{k\in K_J} \theta_{J,k}u_{J,k}W^-_J$$
with $K_J = \{k_r \,:\, r \in \{1,\ldots,N\},\, J_r = J\}$. Next, we use that $\{u_{J,k}\}_k$ is an orthonormal basis of $L^2(\Xi_J)$, where $\Xi_J \supset W^-_J$ is the set defined in (16). We estimate
$$\Big\|\sum_{k\in K_J}\theta_{J,k}u_{J,k}W^-_J\Big\|_2^2 \le \Big\|\sum_{k\in K_J}\theta_{J,k}u_{J,k}\Big\|^2_{L^2(\Xi_J)} = \sum_{k\in K_J} |\theta_{J,k}|^2.$$
The frame coefficients satisfy the a-priori estimate $|\theta_{J,k}|^2 \lesssim 2^{-js(1+\alpha)}$ according to (34). Thus we obtain
$$\|\hat f_N W^-_J\|_2^2 \asymp \Big\|\sum_{k\in K_J}\theta_{J,k}u_{J,k}W^-_J\Big\|_2^2 \lesssim \#(K_J)\,2^{-js(1+\alpha)}.$$
By Lemma 3.8 we have $\|\hat\Theta W^-_J\|_2^2 \gtrsim 2^{-js(2-\alpha)}$. We deduce from (35)
$$\|\hat\Theta W^-_J - \hat f_N W^-_J\|_2 \ge \|\hat\Theta W^-_J\|_2 - \|\hat f_N W^-_J\|_2, \quad \text{hence} \quad \|\hat\Theta W^-_J - \hat f_N W^-_J\|_2^2 \gtrsim \max\big\{0,\, 2^{-js(2-\alpha)} - \#(K_J)\,2^{-js(1+\alpha)}\big\}.$$
Altogether, we conclude
$$\|\Theta - f_N\|_2^2 \gtrsim \sum_{J\in\mathcal{J}} \max\big\{0,\, 2^{-js(2-\alpha)} - \#(K_J)\,2^{-js(1+\alpha)}\big\}.$$
Note that $\sum_J \#(K_J) \le N$. To derive a lower bound let us consider the following minimization problem:
$$\underset{\{N_J\}_{J\in\mathcal{J}}}{\text{minimize}} \;\sum_{J\in\mathcal{J}} \max\big\{0,\, 2^{-js(2-\alpha)} - N_J\,2^{-js(1+\alpha)}\big\} \quad \text{s.t.} \quad \sum_{J\in\mathcal{J}} N_J \le N,\; N_J \in [0,\infty) \;(J \in \mathcal{J}).$$
The condition $N_J \in [0,\infty)$, which simplifies the subsequent argumentation, is possible since we are only interested in a bound. For the optimal choice $\{N_J\}_J$ it necessarily holds $\sum_J N_J = N$ and
$$N_J \le 2^{-js(2-\alpha)}\,2^{js(1+\alpha)} = 2^{js(2\alpha-1)}.$$
Hence, the minimization problem can be reformulated as minimizing the term
$$\sum_{J\in\mathcal{J}} \big(2^{-js(2-\alpha)} - N_J\,2^{-js(1+\alpha)}\big)$$
under the constraints $\sum_J N_J = N$ and $N_J \le 2^{js(2\alpha-1)}$. Assume that the family $\{N_J\}_J$ fulfills these constraints. Further, let $j(N) \in \mathbb{N}$ denote the number determined by the property
$$\sum_{j=0}^{j(N)-1} 2^{js(2\alpha-1)}L_j < N \le \sum_{j=0}^{j(N)} 2^{js(2\alpha-1)}L_j, \qquad (36)$$
where $L_j$ from (5) counts the wedges in the corona $\mathcal{C}_j$. Then the following estimate holds true:
$$\sum_{J\in\mathcal{J}} \big(2^{-js(2-\alpha)} - N_J\,2^{-js(1+\alpha)}\big) \ge \sum_{j=j(N)+1}^\infty \Big(\sum_{|J|=j} 2^{-js(2-\alpha)}\Big) \ge \sum_{j=j(N)+1}^\infty 2^{-js} \gtrsim 2^{-j(N)s}.$$
To see this, note that $2^{-js(1+\alpha)}$ is decreasing with rising scale and that $L_j \ge 2^{js(1-\alpha)}$. Since $N \asymp 2^{j(N)s\alpha}$, which follows from (36), we have proven
$$\|\Theta - f_N\|_2^2 \gtrsim \sum_{J\in\mathcal{J}} \max\big\{0,\, 2^{-js(2-\alpha)} - \#(K_J)\,2^{-js(1+\alpha)}\big\} \gtrsim 2^{-j(N)s} \asymp N^{-\frac{1}{\alpha}}$$
and the proof is finished.

The approximation results for $\Theta$ have direct implications for the class-wise approximation of cartoon-like functions. If $\nu \ge$
2, then $\Theta \in \mathcal{E}^\beta_{bin}([-1,1]^2;\nu)$ for arbitrary $\beta \in [0,\infty)$. Moreover, we can always find $\gamma > 0$ such that $\gamma\Theta \in \mathcal{E}^\beta([-1,1]^2;\nu)$. This allows us to draw the following conclusion.

Corollary 3.12. Let $\beta \in [0,\infty)$ and $\nu \ge 2$. The uniform decay of the $N$-term approximation error for $\mathcal{E}^\beta_{bin}([-1,1]^2;\nu)$ and $\mathcal{E}^\beta([-1,1]^2;\nu)$ provided by $\mathcal{C}_{s,\alpha}$ cannot be faster than $N^{-\frac{1}{1-\alpha}}$. Furthermore, thresholding of coefficients cannot yield rates better than $N^{-\frac{1}{\max\{\alpha,1-\alpha\}}}$.

If $\beta > 2$ it is thus not possible for $\mathcal{C}_{s,\alpha}$ to reach the theoretically possible approximation order of $N^{-\beta}$ for the class $\mathcal{E}^\beta([-1,1]^2;\nu)$. The best performance is achieved for the classic choice $\alpha = \tfrac{1}{2}$, with a corresponding approximation rate of order $N^{-2}$. A smaller $\alpha$ leads to a deterioration of the approximation. As is obvious from our investigation, this behavior applies to cartoons with curved edges exemplified by the function $\Theta = \chi_{B(0,\frac{1}{2})}$ from (31). For such cartoons the rate inevitably deteriorates as $\alpha$ tends to 0, since their energy is spread more or less uniformly across all directions of the Fourier plane. In the next section, we narrow our focus and consider only cartoons with straight edges. Such cartoons are highly anisotropic and in a certain sense the opposite extreme of the isotropic function $\Theta$. Since their Fourier energy is concentrated in only one direction, a smaller $\alpha$ will be an advantage for their approximation.

In the following, we investigate the approximation performance of the curvelet frame $\mathcal{C}_{s,\alpha}$ with respect to cartoons with straight edges. To specify the associated signal class, let $\mathrm{Straight}$ be the collection of all closed half-spaces of $\mathbb{R}^2$. Parameterized by $\varphi \in [0,2\pi)$ and $c \in \mathbb{R}$, these are subsets of the form
$$H(\varphi, c) = \big\{ (x_1,x_2) \in \mathbb{R}^2 \,:\, x_1\cos(\varphi) - x_2\sin(\varphi) \ge c \big\}.$$
Using Definition 3.1 we then introduce the image class $\mathcal{E}^\beta([-1,1]^2;\mathrm{Straight},\nu)$ with parameters $\beta \in [0,\infty)$ and $\nu > 0$. This is a subclass of the general cartoons (28) considered in Section 3. Indeed, for $\nu > 0$ and $\tilde\nu \ge \nu$ chosen large enough,
$$C^\beta([-1,1]^2;\nu) \subset \mathcal{E}^\beta([-1,1]^2;\mathrm{Straight},\nu) \subset \mathcal{E}^\beta([-1,1]^2;\tilde\nu),$$
where $C^\beta([-1,1]^2;\nu)$ is the class defined in (29).
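The anisotropy heuristic above can be made tangible with a toy computation (an illustrative sketch, not taken from the paper): for a "mutilated Gaussian" with a straight edge along the $x_2$-axis, the Fourier transform factors, and it decays only like $|\xi_1|^{-1}$ across the edge but like a Gaussian along it.

```python
import math, cmath

# Toy example: f(x1,x2) = chi_{x1>=0} * exp(-pi x1^2) * exp(-pi x2^2).
# Its 2D Fourier transform factors into a 1D half-Gaussian transform in xi_1
# (slow O(1/|xi_1|) decay caused by the jump) and a Gaussian in xi_2.
def half_gauss_ft(xi, n=4000, xmax=6.0):
    # FT of chi_{x>=0} exp(-pi x^2), computed by midpoint quadrature
    h = xmax / n
    return sum(math.exp(-math.pi * ((k + 0.5) * h) ** 2)
               * cmath.exp(-2j * math.pi * (k + 0.5) * h * xi)
               for k in range(n)) * h

across = abs(half_gauss_ft(4.0))       # ~ 1/(2*pi*|xi|): only O(|xi|^{-1}) decay
along = math.exp(-math.pi * 4.0 ** 2)  # Gaussian decay in the edge direction
print(across, along)                   # across >> along
```

Doubling the frequency across the edge roughly halves the magnitude, confirming the $|\xi_1|^{-1}$ behavior; all Fourier energy of the edge sits in one direction, which is exactly the situation that favors small $\alpha$.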
These inclusions allow us to transfer the optimality benchmark $N^{-\beta}$, valid for both $\mathcal{E}^\beta([-1,1]^2;\tilde\nu)$ and $C^\beta([-1,1]^2;\nu)$ (see Theorem 3.2 and Remark 3.6). For $\mathcal{E}^\beta([-1,1]^2;\mathrm{Straight},\nu)$ we thus again aim for an approximation rate of order $N^{-\beta}$.

Ridgelet frames were developed specifically for the optimal representation of functions with straight line singularities. For both variants, `orthonormal ridgelets' [16] and `0-curvelets' [22], it has been shown that they reach the optimality bound $N^{-\beta}$. More precisely, this rate was proved for `mutilated Sobolev functions' with compact support [3, 26], i.e., compactly supported functions which are in the Sobolev space $H^\beta(\mathbb{R}^2)$ apart from straight line singularities. In line with the result from [26] for 0-curvelets, we can expect that decreasing $\alpha$ improves the approximation ability of $\mathcal{C}_{s,\alpha}$ for $\mathcal{E}^\beta([-1,1]^2;\mathrm{Straight},\nu)$.

Our main result concerning the $\alpha$-curvelet approximation of $\mathcal{E}^\beta([-1,1]^2;\mathrm{Straight},\nu)$ is Theorem 4.1 below. It is formulated and proved for integer $\beta \in \mathbb{N}$ only, although the statement should extend to the whole range $\beta \in \mathbb{R}_+$. In this way, we avoid technical difficulties which would arise if we used finite differences instead of integer derivatives (compare [23]).

Theorem 4.1. The parameters $\beta \in \mathbb{N}$, $\nu > 0$, $\alpha \in [0,1)$, and $s > 0$ shall be fixed. Further, let $f_N$ be the $N$-term approximation of a signal $f \in L^2(\mathbb{R}^2)$ provided by the $N$ largest coefficients with respect to the frame $\mathcal{C}_{s,\alpha} = \{\psi_\mu\}_{\mu\in M}$. There exists a constant $C > 0$ such that for every $f \in \mathcal{E}^\beta([-1,1]^2;\mathrm{Straight},\nu)$ and $N \in \mathbb{N}$
$$\|f - f_N\|_2^2 \le C\begin{cases} N^{-\beta}\log(1+N)^{1+\beta}, & \text{if } \alpha \le \beta^{-1}, \\ N^{-1/\alpha}, & \text{if } \alpha > \beta^{-1}. \end{cases}$$

As expected, decreasing the parameter $\alpha$ improves the approximation performance. If $\alpha \in [0,\beta^{-1}]$ the achieved rate is even optimal up to the log-factor. In this range signals from $\mathcal{E}^\beta([-1,1]^2;\mathrm{Straight},\nu)$ are represented with the same efficiency as a smooth function from $C^\beta([-1,1]^2;\nu)$.

Theorem 4.1 is deduced by studying the curvelet coefficients, whose decay is closely related to the achieved $N$-term approximation rate. Recall that a typical measure for the sparsity of a sequence $\{c_\lambda\}_{\lambda\in\Lambda} \subset \mathbb{C}$ is given by the weak $\ell^p$-(quasi)-norms, for $p > 0$:
$$\|\{c_\lambda\}_\lambda\|_{w\ell^p} := \Big(\sup_{\varepsilon>0}\; \varepsilon^p\cdot\#\{\lambda \,:\, |c_\lambda| > \varepsilon\}\Big)^{1/p}.$$
By definition, the sequence $\{c_\lambda\}_\lambda$ belongs to $w\ell^p(\Lambda)$ if and only if the quantity $\|\{c_\lambda\}_\lambda\|_{w\ell^p}$ is finite. This is the case precisely if there exists a constant $C > 0$ with $\#\{\lambda \,:\, |c_\lambda| > \varepsilon\} \le C^p\varepsilon^{-p}$ for all $\varepsilon > 0$. The smallest possible such constant then coincides with the weak $\ell^p$-(quasi)-norm of the sequence. Another useful characterization of a sequence $\{c_\lambda\}_\lambda \in w\ell^p(\Lambda)$ is given in terms of its non-increasing rearrangement $\{c^*_n\}_{n\in\mathbb{N}}$. It holds $|c^*_n| \lesssim n^{-1/p}$ and $\sup_{n>0} n^{1/p}|c^*_n| = \|\{c_\lambda\}_\lambda\|_{w\ell^p}$.

As illustrated by the following well-known lemma (see e.g. [13]), the decay of the frame coefficients determines the $N$-term approximation rate achieved by thresholding. A full proof is given e.g. in [24].

Lemma 4.2 ([24, Lem. 5.1]). Let $\{m_\lambda\}_{\lambda\in\Lambda}$ be a frame in $L^2(\mathbb{R}^2)$ and $f = \sum c_\lambda m_\lambda$ an expansion of $f \in L^2(\mathbb{R}^2)$ with respect to this frame. If $\{c_\lambda\}_\lambda \in w\ell^{2/(\beta+1)}(\Lambda)$ for some $\beta \ge 0$, then the $N$-term approximations $f_N$ obtained by keeping the $N$ largest coefficients satisfy
$$\|f - f_N\|_2^2 \lesssim N^{-\beta}.$$

Beginning in Subsection 4.1, we study the sparsity of the coefficients $\theta_\mu = \langle f, \psi_\mu\rangle$ provided by the frame $\mathcal{C}_{s,\alpha} = \{\psi_\mu\}_{\mu\in M}$ for a signal $f \in \mathcal{E}^\beta([-1,1]^2;\mathrm{Straight},\nu)$. The decay rates proved in Theorem 4.3 are the foundation of the following proof of Theorem 4.1.

Proof of Theorem 4.1. If $\alpha > \beta^{-1}$, the sequence $\{\theta_\mu\}_{\mu\in M}$ of curvelet coefficients $\theta_\mu = \langle f,\psi_\mu\rangle$ belongs to $w\ell^p(M)$ with $p = 2/(1 + 1/\alpha)$. This is proved in Theorem 4.3. Lemma 4.2 directly translates this into the statement of Theorem 4.1. In case $\alpha \le \beta^{-1}$, Theorem 4.3 yields $|\theta^*_m|^2 \le Cm^{-(1+\beta)}(\log_2 m)^{1+\beta}$ for the curvelet coefficient $\theta^*_m$ of $m$-th largest modulus. Utilizing the frame property of $\mathcal{C}_{s,\alpha}$ we can estimate
$$\|f - f_N\|_2^2 \lesssim \sum_{m>N} |\theta^*_m|^2 \lesssim \sum_{m>N} m^{-(1+\beta)}\big(\log_2 m\big)^{1+\beta} \le \int_N^\infty t^{-(1+\beta)}\big(\log_2(1+t)\big)^{1+\beta}\,dt.$$
Note that $N \ge 1$. Partial integration leads to
$$\int_N^\infty t^{-(1+\beta)}\big(\log_2(1+t)\big)^{1+\beta}\,dt \lesssim N^{-\beta}\big(\log_2(1+N)\big)^{1+\beta} + \int_N^\infty t^{-(1+\beta)}\big(\log_2(1+t)\big)^{\beta}\,dt.$$
We repeat this $(1+\beta)$-times and finally arrive at
$$\int_N^\infty t^{-(1+\beta)}\big(\log_2(1+t)\big)^{1+\beta}\,dt \lesssim N^{-\beta}\big(\log_2(1+N)\big)^{1+\beta}.$$

Subsequently, we study the decay of the curvelet coefficients $\theta_\mu = \langle f, \psi_\mu\rangle$. Our main result is Theorem 4.3.

Theorem 4.3.
Let $\alpha \in [0,1)$, $s > 0$, $\beta \in \mathbb{N}$, and $\nu > 0$ be fixed. Further, denote by $\theta^*_N$ the (in modulus) $N$-th largest coefficient of $f \in \mathcal{E}^\beta([-1,1]^2;\mathrm{Straight},\nu)$ with respect to $\mathcal{C}_{s,\alpha} = \{\psi_\mu\}_{\mu\in M}$. There exists a constant $C > 0$ independent of $N \ge 2$ such that
$$\sup_{f\in\mathcal{E}^\beta([-1,1]^2;\mathrm{Straight},\nu)} |\theta^*_N|^2 \le C\cdot\begin{cases} N^{-(1+\beta)}\cdot(\log_2 N)^{1+\beta}, & \text{if } \alpha \le \beta^{-1}, \\ N^{-(1+1/\alpha)}, & \text{if } \alpha > \beta^{-1}. \end{cases}$$

Proof. Let $M_j$ denote the subset of the curvelet index set $M$ corresponding to scale $j$. Further, given $\varepsilon > 0$, let us define $M_\varepsilon := \{\mu \in M \,:\, |\theta_\mu| > \varepsilon\}$ and $M_{j,\varepsilon} := \{\mu \in M_j \,:\, |\theta_\mu| > \varepsilon\}$. According to (34) there is a constant $\tilde C > 0$, independent of scale, such that
$$|\theta_\mu| \le \tilde C\,\|f\|_\infty\,2^{-js(1+\alpha)/2} \le \tilde C\nu\,2^{-js(1+\alpha)/2}.$$
At scales $j > j_\varepsilon := \tfrac{2}{s(1+\alpha)}\log_2(\tilde C\nu\varepsilon^{-1})$ the coefficients thus satisfy $|\theta_\mu| < \varepsilon$ and the sets $M_{j,\varepsilon}$ are empty. In particular $\#M_\varepsilon = 0$ in case $\varepsilon > \tilde C\nu$, since then $j_\varepsilon < 0$. If $j \le j_\varepsilon$, Proposition 4.4, which is stated and proved below, gives the estimate
$$\#M_{j,\varepsilon} \lesssim 2^{j\rho}\,\varepsilon^{-2/(1+\beta)} \quad \text{with} \quad \rho = s\max\{\alpha\beta - 1,\, 0\}/(1+\beta) \ge 0.$$
If $\alpha > \beta^{-1}$ we have $\rho > 0$ and obtain
$$\#M_\varepsilon = \sum_{j=0}^{\lfloor j_\varepsilon\rfloor} \#M_{j,\varepsilon} \lesssim \sum_{j=0}^{\lfloor j_\varepsilon\rfloor} 2^{j\rho}\,\varepsilon^{-2/(1+\beta)} \lesssim 2^{j_\varepsilon\rho}\,\varepsilon^{-2/(1+\beta)} = \varepsilon^{-\frac{2(\alpha\beta-1)}{(1+\beta)(1+\alpha)}}\,\varepsilon^{-2/(1+\beta)} = \varepsilon^{-2/(1+1/\alpha)}.$$
From here, a direct argument leads to $|\theta^*_N|^2 \lesssim N^{-(1+1/\alpha)}$ for the $N$-th largest coefficient $\theta^*_N$.

If $\alpha \le \beta^{-1}$ we have $\rho = 0$ and the estimate
$$\#M_\varepsilon \lesssim \sum_{j=0}^{\lfloor j_\varepsilon\rfloor} \varepsilon^{-2/(1+\beta)} \lesssim \big(\log_2(\tilde C\nu\varepsilon^{-1}) + 1\big)\,\varepsilon^{-2/(1+\beta)} = \log_2(2\tilde C\nu\varepsilon^{-1})\,\varepsilon^{-2/(1+\beta)}.$$
Hence, there is a constant $C_2 \ge 1$ with $\#M_\varepsilon \le C_2\log_2(C_1\varepsilon^{-1})(C_1\varepsilon^{-1})^{2/(1+\beta)}$, where $C_1 = \max\{1, \tilde C\nu\}$. It follows that $|\theta^*_N|^2 \le C_1^2\,\delta_N$ for the number $\delta_N$ which solves $N = C_2\log_2(\delta_N^{-1})\,\delta_N^{-1/(1+\beta)}$. In general $\delta_N$ cannot be calculated explicitly, wherefore we resort to an estimate.

If $N \ge 2$, then $\varepsilon_N := N^{-(1+\beta)} \le 1$ since $\beta \ge 1$. Taking into account $C_2 \ge 1$, we get
$$C_2\,\varepsilon_N^{-1/(1+\beta)}\log_2(\varepsilon_N^{-1}) \ge N = C_2\,\delta_N^{-1/(1+\beta)}\log_2(\delta_N^{-1}),$$
which in turn proves $\delta_N \ge \varepsilon_N = N^{-(1+\beta)}$. Therefore $\tilde\delta_N \ge \delta_N$ for the solution $\tilde\delta_N$ of
$$N = C_2\,\tilde\delta_N^{-1/(1+\beta)}\log_2(N^{1+\beta}).$$
An explicit calculation yields $\tilde\delta_N = \big(C_2(1+\beta)\big)^{1+\beta}\,N^{-(1+\beta)}\,(\log_2 N)^{1+\beta}$, which proves the claim.

The missing ingredient in the proof of Theorem 4.3 is Proposition 4.4.

Proposition 4.4.
Let the parameters α ∈ [0 , , s > , β ∈ N , and ν > be fixed. Further, let M j denote the curvelet indices at scale j . The sequence { θ µ } µ ∈ M j of coefficients θ µ = h f, ψ µ i obeys k{ θ µ } µ ∈ M j k / (1+ β ) wℓ / (1+ β ) . jρ with ρ = s max { αβ − , } / (1 + β ) and an implicit constant independent of scale j ∈ N and f ∈ E β ([ − , ; ν ) . f into fragments, a technique pioneered in [5]. To thisend, let Q j at every scale j ∈ N denote the collection of cubes Q := Q ( j )( k ,k ) := [2 − jsα ( k − , − jsα ( k + 1)] × [2 − jsα ( k − , − jsα ( k + 1)] , ( k , k ) ∈ Z . Further, let ω ∈ C ∞ ([ − , ) be a nonnegative window vanishing outside the square [ − , , such thatthe family { ω Q } Q ∈Q j of functions ω Q ( x ) := ω (2 jsα x − k , jsα x − k ) is a partition of unity, i.e., it hasthe property P Q ∈Q j ω Q = 1. Following [5] we then decompose f = P Q f Q into the fragments f Q := f ω Q , Q ∈ Q j . (37)Note that supp f Q ⊆ Q and that the size of the squares Q ∈ Q j corresponds to the ‘essential’ length ofthe curvelets at scale j . Therefore h f, ψ µ i ≈ h f Q , ψ µ i for a curvelet ψ µ at the location of the cube Q .For every Q ∈ Q j we now investigate the sparsity of the sequence θ Q := {h f Q , ψ µ i} µ ∈ M j . (38)Clearly, due to supp f ⊆ [ − , we only need to consider cubes Q ∈ Q j which meet the square [ − , .Of these relevant cubes, let us collect those which intersect the straight edge in Q j , the others in Q j .The associated fragments f Q will be called edge fragments and smooth fragments , respectively. The mainresult concerning the sparsity of (38) is Proposition 4.5. Proposition 4.5.
Let α ∈ [0, 1), s > 0, β ∈ N, and ν > 0 be fixed. Let Q ∈ Q_j, j ∈ N, be a square and θ_Q the curvelet coefficient sequence of the fragment f_Q = f ω_Q defined in (38). There is a constant C > 0, independent of j ∈ N and Q ∈ Q_j, such that for all f ∈ E^β([−1,1]²; ν) the following estimates hold true.

(i) If Q ∈ Q⁰_j the sequence θ_Q satisfies ‖θ_Q‖_{wℓ^{2/(1+β)}}^{2/(1+β)} ≤ C · 2^{−2jsα}.

(ii) If Q ∈ Q¹_j the sequence θ_Q satisfies ‖θ_Q‖_{wℓ^{2/(1+β)}}^{2/(1+β)} ≤ C · 2^{−jsα} 2^{jρ} with ρ = s max{αβ − 1, 0}/(1+β).

A direct consequence of Proposition 4.5, whose proof is given later on, is Proposition 4.4.
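The weak-ℓ^p quasi-norms with p = 2/(1+β) bounded in Propositions 4.4 and 4.5 are what drive N-term thresholding rates: for a sequence whose decreasing rearrangement decays like n^{−1/p}, the squared ℓ²-tail after keeping the N largest entries decays like N^{1−2/p} = N^{−β}. The following minimal numerical sketch illustrates this mechanism; the model sequence is purely illustrative, not actual curvelet coefficients.

```python
import math

def weak_lp_quasinorm(coeffs, p):
    # sup_n (n+1)^{1/p} * c_(n) over the decreasing rearrangement c_(0) >= c_(1) >= ...
    srt = sorted((abs(c) for c in coeffs), reverse=True)
    return max((n + 1) ** (1.0 / p) * c for n, c in enumerate(srt))

def tail_l2(coeffs, N):
    # l2 norm of what thresholding to the N largest coefficients discards
    srt = sorted((abs(c) for c in coeffs), reverse=True)
    return math.sqrt(sum(c * c for c in srt[N:]))

beta = 2
p = 2.0 / (1 + beta)                                     # p = 2/3 for beta = 2
coeffs = [(n + 1) ** (-1.0 / p) for n in range(10000)]   # model weak-lp decay
for N in (10, 100, 1000):
    # the product stays roughly constant, reflecting tail^2 ~ N^{-beta}
    print(N, tail_l2(coeffs, N) ** 2 * N ** beta)
```

The weak-ℓ^p quasi-norm of this model sequence equals 1, and the printed products stay bounded as N grows.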
Proof of Proposition 4.4.
We have the decomposition {θ_µ}_{µ∈M_j} = Σ_{Q∈Q_j} θ_Q. Since 0 < 2/(1+β) ≤ 1, the p-triangle inequality with p = 2/(1+β) yields

‖{θ_µ}_{µ∈M_j}‖_{wℓ^{2/(1+β)}}^{2/(1+β)} ≤ Σ_{Q∈Q_j} ‖θ_Q‖_{wℓ^{2/(1+β)}}^{2/(1+β)} ≤ #(Q¹_j) · sup_{Q∈Q¹_j} ‖θ_Q‖_{wℓ^{2/(1+β)}}^{2/(1+β)} + #(Q⁰_j) · sup_{Q∈Q⁰_j} ‖θ_Q‖_{wℓ^{2/(1+β)}}^{2/(1+β)}.

Since f is supported in [−1,1]², there are constants C₁, C₂ > 0, independent of scale, such that #Q¹_j ≤ C₁ 2^{jsα} and #Q⁰_j ≤ C₂ 2^{2jsα}. Utilizing the estimates of Proposition 4.5, we thus obtain with ρ = s max{αβ − 1, 0}/(1+β) ≥ 0

‖{θ_µ}_{µ∈M_j}‖_{wℓ^{2/(1+β)}}^{2/(1+β)} ≲ C₂ + C₁ 2^{jρ} ≲ 2^{jρ}.

In the remainder of this section we are concerned with the proof of Proposition 4.5. Hereby, we restrict to functions f ∈ E^β([−1,1]²; ν) of the simple form

f = g χ_{H(ϕ,c)} (39)

with g ∈ C^β([−1,1]²; ν) and a half-space H(ϕ,c), determined by ϕ ∈ [0, 2π) and c ∈ R. Note that for a general cartoon f = f₁ + f₂ χ_{H(ϕ,c)} both components f̃₁ := f₁ and f̃₂ := f₂ χ_{H(ϕ,c)} have the form (39), due to the representation f₁ = f₁ χ_{H(0,−1)}. Hence, if the estimates of Proposition 4.5 are proven for elements of type (39), they are then also true for all f ∈ E^β([−1,1]²; ν). This is a consequence of the estimate 2^{−2jsα} ≤ 2^{−jsα} 2^{jρ} and

‖θ_Q‖_{wℓ^{2/(1+β)}}^{2/(1+β)} ≤ ‖{⟨f̃₁ ω_Q, ψ_µ⟩}_{µ∈M_j}‖_{wℓ^{2/(1+β)}}^{2/(1+β)} + ‖{⟨f̃₂ ω_Q, ψ_µ⟩}_{µ∈M_j}‖_{wℓ^{2/(1+β)}}^{2/(1+β)}.

Let Q ∈ Q_j be a cube at scale j ∈ N with center M_Q := 2^{−jsα}(k₁, k₂) ∈ R², which nontrivially intersects the cartoon domain [−1,1]². If Q ∈ Q⁰_j we put P_Q := M_Q. If Q ∈ Q¹_j let us fix a point P_Q ∈ Q on the edge curve {(x₁, x₂) ∈ R² : x₁ cos(ϕ) − x₂ sin(ϕ) = c} of the cartoon such that χ_{H(ϕ,c)}(x) = H(R_ϕ(x − P_Q)), with rotation matrix (6) and where H := h ⊗ 1 with the univariate step function

h(t) = 0 if t < 0, and h(t) = 1 if t ≥ 0. (40)

Putting g̃_Q(x) := g(R_{−ϕ} x + P_Q) and ω̃_Q(x) := ω_Q(R_{−ϕ} x + P_Q), the fragment f_Q can then be written as f_Q(x) = f̃_Q(R_ϕ(x − P_Q)) with a function f̃_Q of the form

(i) f̃_Q := g̃_Q ω̃_Q, if Q ∈ Q⁰_j, or (ii) f̃_Q := g̃_Q ω̃_Q H, if Q ∈ Q¹_j. (41)

On the Fourier side we have (f_Q)^∧(ξ) = (f̃_Q)^∧(R_ϕ ξ) exp(−2πi ⟨P_Q, ξ⟩).
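The p-triangle inequality invoked in the proof of Proposition 4.4 above states that ‖·‖_p^p is subadditive for 0 < p ≤ 1, which follows from the pointwise inequality |x + y|^p ≤ |x|^p + |y|^p. A quick numerical check with random illustrative sequences:

```python
import random

def lp_p(seq, p):
    # ||seq||_p^p = sum of |x|^p
    return sum(abs(x) ** p for x in seq)

random.seed(0)
beta = 2
p = 2.0 / (1 + beta)  # p = 2/3 <= 1, as in the proof of Proposition 4.4
a = [random.uniform(-1, 1) for _ in range(1000)]
b = [random.uniform(-1, 1) for _ in range(1000)]
s = [x + y for x, y in zip(a, b)]
assert lp_p(s, p) <= lp_p(a, p) + lp_p(b, p)  # p-triangle inequality
print("p-triangle inequality holds for p =", p)
```

For p > 1 the inequality fails in this raw form; it is the concavity of t ↦ t^p on [0, ∞) for p ≤ 1 that makes ‖·‖_p^p subadditive.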
Now, let ψ µ = ψ j,ℓ,k ∈ C s,α be a fixed curvelet and recall b ψ j,ℓ,k = W j,ℓ u j,ℓ,k with the real-valued wedgefunctions W j,ℓ ( · ) = W j, ( R j,ℓ · ) from (10) and the functions u j,ℓ,k ( · ) = 2 − js (1+ α ) / exp(2 πi h R − j,ℓ A − j k, ·i ) . There are unique k • ∈ Z and ∆ k ∈ [0 , such that P Q = R − j,ℓ A − j ( k • + ∆ k ). Further, we can express ϕ as a ‘fractional multiple’ of the angle ϕ j defined in (4), writing ϕ = ( ℓ • − ∆ ℓ ) ϕ j with unique ℓ • ∈ Z and ∆ ℓ ∈ [0 , h f Q , ψ j,ℓ,k i = h b f Q , b ψ j,ℓ,k ih f Q , ψ j,ℓ,k i = Z R c f ˜ Q (cid:0) R j,ℓ • − ∆ ℓ ξ (cid:1) exp (cid:0) − πi h R − j,ℓ A − j ( k • + ∆ k ) , ξ i (cid:1) W j,ℓ ( ξ ) u j,ℓ,k ( ξ ) dξ = Z R c f ˜ Q ( ξ ) W j,ℓ − ℓ • +∆ ℓ ( ξ ) u j,ℓ − ℓ • +∆ ℓ,k + k • +∆ k • ( ξ ) dξ. Relabelling the indices ( l , k ) := ([ ℓ − ℓ • ] , k + k • ), where [ ℓ − ℓ • ] ∈ {− L − j , . . . , L + j } is the unique numberobtained by shifting ℓ − ℓ • ∈ Z by integer multiples of L j = πϕ − j (see (5)), we can write h f Q , ψ j,ℓ,k i = Z R c f ˜ Q ( ξ ) W j, l +∆ ℓ ( ξ ) u j, l +∆ ℓ, k +∆ k ( ξ ) dξ. (42)To estimate the integral (42) we need knowledge about the Fourier localization of the functions f ˜ Q . Thisinvestigation is carried out in the next two subsections. The Fourier analysis of the functions f ˜ Q , Q ∈ Q j , from (41) is conducted in a generic setting, independentof the concrete cube Q . We assume α ∈ [0 , β ∈ N , and let κ, ν, ˜ ν > f j , j ∈ N , called standard fragments, defined by( i ) f j := gω j , or ( ii ) f j := gω j H , (43)where H is the step function (40), g ∈ C β ( κ [ − , , ν ) and ω j := ω (2 jsα · ) with ω ∈ C ∞ ( R ) ∩ C β ( κ [ − , , ˜ ν ). For every Q ∈ Q j the corresponding fragment f ˜ Q is of the form (43) with specificfunctions g and ω , namely g = e g Q and ω = e ω Q (2 − jsα · ) (compare to (41)). Note that the parameters κ, ν, ˜ ν > Q ∈ Q j , e.g. κ = 2 √
2, and ˜ ν, ν > f ∈ E β ([ − , ; ν ) and the partition of unity { ω Q } Q utilized in (37). Since the resultsof this subsection are valid uniformly for all choices of g and ω , as long as they fulfill the specificationsin accordance with κ, ν, ˜ ν >
0, they hence apply to all fragments f̃_Q.

The investigation starts with an elementary lemma, where I_j, j ∈ N, denote the dyadic intervals introduced in (7).

Lemma 4.6.
Let s > 0 be fixed and for j ∈ N let f_j be fragments of the form (43). Then there exists a constant C > 0, independent of j ∈ N and the concrete choice of the functions g and ω in (43), such that for every p ∈ N and ϕ ∈ [−π, π)

∫_{I_p} |f̂_j(r, ϕ)|² dr ≤ C ε_{j,p}(ϕ) 2^{−ps} 2^{−2jsα} ‖g‖_∞² ‖ω‖₂²

with functions ε_{j,p} : [−π, π) → R satisfying Σ_{p∈N} ∫_{−π}^{π} ε_{j,p}(ϕ) dϕ ≤ 1.

Proof. Let us assume ‖g‖_∞ ≠ 0 and ‖ω‖₂ ≠ 0, otherwise the proof is trivial. Since for every p, j ∈ N and ϕ ∈ [−π, π)

I_{j,p}(ϕ) := ∫_{I_p} |f̂_j(r, ϕ)|² dr < ∞,

we can define functions ǫ_{j,p} : [−π, π) → R via

ǫ_{j,p}(ϕ) := I_{j,p}(ϕ) 2^{ps} 2^{2jsα} ‖g‖_∞^{−2} ‖ω‖₂^{−2}.

Then I_{j,p}(ϕ) = ǫ_{j,p}(ϕ) 2^{−ps} 2^{−2jsα} ‖g‖_∞² ‖ω‖₂². Let us prove that there is a constant
C >
0, independent of the relevant parameters, such that

Σ_{p∈N} ∫_{−π}^{π} ǫ_{j,p}(ϕ) dϕ ≤ C. (44)

We put f̃_j = f_j(2^{−jsα}·). Then f̂_j = 2^{−2jsα} (f̃_j)^∧(2^{−jsα}·) and it follows for p ∈ N

‖f̂_j‖²_{L²(C_p)} = 2^{−2jsα} ‖(f̃_j)^∧‖²_{L²(2^{−jsα}C_p)},

where C_p are the coronae defined in (3). We conclude

‖g‖_∞² ‖ω‖₂² Σ_{p∈N} ∫_{−π}^{π} ǫ_{j,p}(ϕ) dϕ = Σ_{p∈N} 2^{2jsα} ∫_{−π}^{π} I_{j,p}(ϕ) 2^{ps} dϕ ≍ Σ_{p∈N} 2^{2jsα} ‖f̂_j‖²_{L²(C_p)} = Σ_{p∈N} ‖(f̃_j)^∧‖²_{L²(2^{−jsα}C_p)} ≍ ‖(f̃_j)^∧‖₂² = ‖f̃_j‖₂².

Using ‖f̃_j‖₂² ≤ ‖g(2^{−jsα}·)‖_∞² ‖ω‖₂² = ‖g‖_∞² ‖ω‖₂² we arrive at (44). Finally, note that the functions ε_{j,p} := C^{−1} ǫ_{j,p} have the properties as desired.

An immediate consequence of Lemma 4.6 is the following corollary, with the particular choice p = j.

Corollary 4.7.
Let s > 0 be fixed and assume that f_j, j ∈ N, are fragments of the form (43). There exist functions ε_j : [−π, π) → R, each with the property ∫_{−π}^{π} ε_j(ϕ) dϕ ≤ 1, and a constant C > 0 such that for every j ∈ N and ϕ ∈ [−π, π)

∫_{I_j} |f̂_j(r, ϕ)|² dr ≤ C ε_j(ϕ) 2^{−js} 2^{−2jsα} ‖g‖_∞² ‖ω‖₂².

Moreover, the constant C can be chosen independent of the functions ω and g.

Proof. The functions ε_j := ε_{j,j} obtained from Lemma 4.6 by choosing p = j have the desired properties. In particular they satisfy ∫_{−π}^{π} ε_j(ϕ) dϕ ≤ 1 for every j ∈ N.

Note that the smoothness of f_j did not enter the proofs of the previous two results. By incorporating smoothness information we can strengthen Corollary 4.7 for a smooth fragment of the form (i) in (43).

Lemma 4.8.
Let s > 0, α ∈ [0, 1), and put γ = ⌈1/(1−α)⌉. For j ∈ N let f_j be a smooth fragment of the form (i) in (43) with regularity C^β, β ∈ N. Then there exist functions ε_j : [−π, π) → R and a constant C > 0 such that for every j ∈ N and ϕ ∈ [−π, π)

∫_{I_j} |f̂_j(r, ϕ)|² dr ≤ C ε_j(ϕ) 2^{−js} 2^{−2jsα} 2^{−2jsβ} ‖g‖²_{β,∞} ‖ω‖²_{β,2}

with ∫_{−π}^{π} ε_j(ϕ) dϕ ≤ 1 for every j ∈ N. The constant C can be chosen independent of ω and g.

Proof. If β = 0 the assertion is given by Corollary 4.7. For β ≥ 1 we argue by induction on β, whereby we restrict our considerations to j ≥ 1. For j = 0 the asserted estimate is clearly true, also due to Corollary 4.7.

For fixed angle ϕ ∈ [−π, π) let ∂_r denote the radial derivative in the corresponding direction. Put g̃ := ∂_r g, ω̃ := ∂_r ω, and ω̃_j := ω̃(2^{jsα}·). Then ∂_r f_j(·, ϕ) = g̃ ω_j + 2^{jsα} g ω̃_j and we conclude for j ∈ N

2^{2js} ∫_{I_j} |f̂_j(r, ϕ)|² dr ≍ ∫_{I_j} |r f̂_j(r, ϕ)|² dr ≲ ∫_{I_j} |(∂_r f_j)^∧(r, ϕ)|² dr ≍ ∫_{I_j} |(g̃ ω_j)^∧(r, ϕ)|² dr + 2^{2jsα} ∫_{I_j} |(g ω̃_j)^∧(r, ϕ)|² dr =: Ĩ⁽⁰⁾_j(ϕ) + 2^{2jsα} I⁽¹⁾_j(ϕ).

Hence, we get

I⁽⁰⁾_j(ϕ) := ∫_{I_j} |f̂_j(r, ϕ)|² dr ≲ 2^{−2js} Ĩ⁽⁰⁾_j(ϕ) + 2^{−2js(1−α)} I⁽¹⁾_j(ϕ).

The integral I⁽¹⁾_j(ϕ) can be estimated in the same way as I⁽⁰⁾_j(ϕ). After γ = ⌈1/(1−α)⌉ iterations we end up with Ĩ⁽⁰⁾_j(ϕ), …, Ĩ⁽^{γ−1}⁾_j(ϕ), and Ĩ⁽^γ⁾_j(ϕ) := I⁽^γ⁾_j(ϕ). Since γ ≥ 1/(1−α) it holds

I⁽⁰⁾_j(ϕ) ≲ 2^{−2js} Σ_{k=0}^{γ−1} 2^{−2js(1−α)k} Ĩ⁽ᵏ⁾_j(ϕ) + 2^{−2js(1−α)γ} Ĩ⁽^γ⁾_j(ϕ) ≤ 2^{−2js} Σ_{k=0}^{γ} Ĩ⁽ᵏ⁾_j(ϕ).

Note that g ∈ C^β([−κ, κ]²) and g̃ ∈ C^{β−1}([−κ, κ]²), with κ the fixed parameter from (43). Using the induction hypothesis, the expressions Ĩ⁽ᵏ⁾_j can be estimated with corresponding functions ε⁽ᵏ⁾_j : [−π, π) → R. Putting ε_j := Σ_{k=0}^{γ} ε⁽ᵏ⁾_j yields the desired result.

Our next goal is to estimate the energy of f̂_j contained in wedges W⁺_J of the form (14).
However, weallow more general scale-angle pairs J = ( j, ℓ ) ∈ J + from the set J + := (cid:8) ( j, ℓ ) : j ∈ N , ℓ ∈ [ − L − j , L + j + 1) (cid:9) . The associated orientations, given by ϕ J = ℓϕ j with ϕ j = π −⌊ js (1 − α ) ⌋− fixed as in (4), then comprisethe whole interval [ − π , π ). To formulate the next result we need the quantities A J := 12 Z A J ε j ( ϕ ) dϕ, J ∈ J + , (45)corresponding to angular intervals A J given as in (12) and the functions ε j : [ − π, π ) → R associated to f j from Corollary 4.7. Lemma 4.9.
Let ( m , m ) ∈ N be fixed and assume that f j is of the form (43) . Further, for J ∈ J + let A J be the value defined in (45) . Then k ∂ ( m ,m ) b f j k L ( W + J ) . A J − j ( m + m ) sα − jsα k g k ∞ k ω k , with an implicit constant independent of J ∈ J + and the functions g and ω .Proof. Using Corollary 4.7 we calculate (in the nontrivial case when g = 0 and ω = 0) k g k − ∞ k ω k − Z W + J | b f j ( ξ ) | dξ = Z I j Z A J | b f j ( r, ϕ ) | r dϕ dr . − jsα Z A J ε j ( ϕ ) dϕ ≍ A J − jsα . This proves the assertion for ( m , m ) = (0 , m = ( m , m ) = (0 ,
0) we define a new window˜ ω ( x ) := x m ω ( x ) and put ˜ ω j ( x ) := ˜ ω (2 jsα x ) for x ∈ R . Then x m ω j ( x ) = 2 − jsα ( m + m ) ˜ ω (2 jsα x ) = 2 − jsα ( m + m ) ˜ ω j ( x ) , x ∈ R . f ˜ j := g ˜ ω j H (or in case of a smooth fragment f ˜ j := g ˜ ω j ) we can write Z W + J | ∂ ( m ,m ) b f j ( ξ ) | dξ ≍ Z W + J | [ x m f j ( ξ ) | dξ = 2 − jsα ( m + m ) Z W + J | c f ˜ j ( ξ ) | dξ. Since f ˜ j is of the form (43), the integral on the right-hand side can be estimated as above with Corol-lary 4.7. The proof is finished since k ˜ ω k . k ω k .For the smooth fragments we can improve this result, taking into account smoothness information. Lemma 4.10.
Let s > , α ∈ [0 , , and γ = ⌈ / (1 − α ) ⌉ . For j ∈ N let f j be a smooth fragment of theform (i) in (43) with regularity C β , β ∈ N . Let J = ( j, ℓ ) ∈ J + be a scale-angle pair, A J be given as in (45) . For ( m , m ) ∈ N k ∂ ( m ,m ) b f j k L ( W + J ) . A J − j ( m + m ) sα − jsα − jsβ k g k β, ∞ k ω k β, . Proof.
The proof is analogous to Lemma 4.9, using Lemma 4.8 instead of Corollary 4.7.

To formulate the main result of this subsection we need the differential operator

L_{J,0} := (Id − 2^{2jsα} D²_{J,1})(Id − 2^{2jsα} D²_{J,2}), (46)

where Id is the identity and the partial derivatives D_{J,1} and D_{J,2}, dependent on J ∈ J⁺, are given by

D_{J,1} := cos(ϕ_J) ∂₁ + sin(ϕ_J) ∂₂ and D_{J,2} := −sin(ϕ_J) ∂₁ + cos(ϕ_J) ∂₂. (47)

Recall that ϕ_J = ℓ ϕ_j with ϕ_j as in (4). Further, recall the functions W_J from (10) with supp W_J ⊆ W⁺_J.

Proposition 4.11.
Let L J, be the differential operator (46) and let d ∈ N be arbitrary but fixed.(i) An edge fragment f j of the form (ii) in (43) satisfies the estimate Z R |L dJ, ( b f j W J )( ξ ) | dξ . A J − jsα . (ii) A smooth fragment f j of the form (i) in (43) satisfies the improved estimate Z R |L dJ, ( b f j W J )( ξ ) | dξ . A J − jsα − jsβ . Here A J are the quantities defined in (45) . The implicit constants are independent of J ∈ J + , ω and g .Proof. Using the definition (47) of the operators D J, and D J, we obtain for ( m , m ) ∈ N D m J, D m J, = X a + b = m a + b = m c a ,a ,b ,b (sin ϕ J ) a + b (cos ϕ J ) a + b ∂ ( a + a ,b + b ) (48)with purely combinatorial coefficients c a ,a ,b ,b ∈ Z . This leads to kD m J, D m J, b f j k L ( W + J ) ≤ C ( m , m ) X a + b = m a + b = m k ∂ ( a + a ,b + b ) b f j k L ( W + J ) with a constant C ( m , m ) >
0. If f j is an edge fragment, we proceed with Lemma 4.9 and deduce kD m J, D m J, b f j k L ( W + J ) . X a + b = m a + b = m A J − j ( m + m ) sα − jsα . A J − j ( m + m ) sα − jsα . d , d ∈ N . The function D d J, D d J, ( b f j W J ) is a linear combination of terms ( D m J, D m J, b f j )( D n J, D n J, W J )with m + n = d and m + n = d . In view of (26) and the estimate above, it holds kD m J, D m J, b f j k L ( W + J ) · kD n J, D n J, W J k ∞ . A J − j ( m + m ) sα − jsα · − sjn − sjαn ≤ A J − jsαd − jsαd − jsα . Using H¨older’s inequality we thus obtain for d , d ∈ N kD d J, D d J, ( b f j W J ) k . A J − jsα ( d + d ) − jsα . Since L dJ, ( b f j W J ) consists of terms of the form2 jsα ( d + d ) D d D d with d , d ≤ d , not taking into account combinatorial coefficients, the desired estimate for each term of L dJ, ( b f j W J ) follows.If f j is a smooth fragment of regularity C β , we use Lemma 4.10 instead of Lemma 4.9. The rest ofthe proof is completely analogous. As in the previous Subsection 4.2, let α ∈ [0 , β ∈ N , and κ, ν, ˜ ν > g ∈ C β ( κ [ − , , ν ), ω j = ω (2 jsα · ) and ω ∈ C ∞ ( R ) ∩ C β ( κ [ − , , ˜ ν ). Further, let δ denote the univariateDirac distribution and define δ { x =0 } := δ ⊗
1. We are interested in the Fourier localization of thedistributions d j := gω j δ { x =0 } , j ∈ N . (49)The exposition is analogous to the investigation of the functions (43) in Subsection 4.2. A valuable toolis given by the following lemma, where I j are the intervals defined in (7). Lemma 4.12.
Let Ã ≠ 0 and κ, s > 0 be fixed. Further assume that h ∈ C^β(R), β ∈ N, is a function with supp h ⊆ [−κ, κ]. Then there are a constant C > 0 and numbers η_j ∈ [0, 1], j ∈ N, with Σ_{j∈N} η_j ≤ 1 such that for every j ∈ N

∫_{Ã I_j} |ĥ(r)|² dr = C η_j |Ã 2^{js}|^{−2β} ‖h^{(β)}‖₂².

Moreover, the constant C can be chosen independent of h and Ã.

Proof. Define

η̃_j := |Ã 2^{js}|^{2β} ∫_{Ã I_j} |ĥ(r)|² dr.

Then Σ_{j∈N} η̃_j ≤ C ‖h^{(β)}‖₂² with a constant C > 0, since

Σ_{j∈N} η̃_j ≍ Σ_{j∈N} ∫_{Ã I_j} |r|^{2β} |ĥ(r)|² dr ≍ Σ_{j∈N} ∫_{Ã I_j} |(h^{(β)})^∧(r)|² dr ≲ ∫_R |(h^{(β)})^∧(r)|² dr = ‖h^{(β)}‖₂².

In case ‖h^{(β)}‖₂ ≠ 0, rescaling yields numbers η_j := C^{−1} ‖h^{(β)}‖₂^{−2} η̃_j as desired. The case ‖h^{(β)}‖₂ = 0 is trivial, since then h ≡ 0 due to supp h ⊆ [−κ, κ].

With Lemma 4.12 we can prove the following result.

Lemma 4.13.
Let s > be fixed and ϕ ∈ [ − π, π ) . We have for j ∈ N Z I j | b d j ( r, ϕ ) | dr . − jsα js (1 − α ) (1 + 2 js (1 − α ) | sin( ϕ ) | ) − β − k g k β, ∞ k ω k β, . Proof.
The distribution d j = gω j δ { x =0 } can be written as the tensor product d j = δ ⊗ h j of the Diracdistribution δ with the function h j := ( gω j ) | { x =0 } . Therefore, we have b d j = \ δ ⊗ h j = 1 ⊗ b h j = b h j ◦ π , where π : R → R is the orthogonal projection onto the second variable.Let ϕ ∈ [ − π, π ) and assume first that | sin( ϕ ) | ≥ − js (1 − α ) . Then ϕ / ∈ {− π, } and it holds Z I j | b d j ( r, ϕ ) | dr = Z I j | b h j ( r sin( ϕ )) | dr = | sin( ϕ ) | − Z sin( ϕ ) I j | b h j ( r ) | dr. Applying Lemma 4.12 with e A = sin( ϕ ) yields Z I j | b d j ( r, ϕ ) | dr . η j − jsβ | sin( ϕ ) | − β − k h ( β ) j k = η j − jsαβ − js (1 − α ) β | sin( ϕ ) | − β − k h ( β ) j k L ( R ) , where η j ≤ j ∈ N . Note that Lemma 4.12 is applied with a different integrand | b h j | at eachscale. However, the implicit constants are uniform over all j ∈ N .Applying Leibniz’s rule h ( β ) j = P γ ≤ β (cid:0) βγ (cid:1) ∂ γ g (0 , · ) ∂ β − γ ω j (0 , · ) we further deduce k h ( β ) j k . X γ ≤ β k ∂ γ g (0 , · ) k ∞ k ∂ β − γ ω j (0 , · ) k . . − jsα jsαβ k ω k β, k g k β, ∞ . This settles the case | sin( ϕ ) | ≥ − js (1 − α ) . If | sin( ϕ ) | < − js (1 − α ) we argue differently based on k b h j k ∞ ≤k h j k ≤ · − jsα k h j k . We deduce Z I j | b d j ( r, ϕ ) | dr = Z I j | b h j ( r sin( ϕ )) | dr . js k b h j k ∞ . js (1 − α ) k h j k . The proof is finished since k h j k ≤ k ω j (0 , · ) k k g (0 , · ) k ∞ ≤ − jsα k ω k k g k ∞ .Lemma 4.13 shows that the Fourier decay of d j is highly dependent on the direction ϕ ∈ [ − π, π ). Itmotivates the introduction of the quantity ℓ J := 1 + 2 js (1 − α ) | sin( ϕ J ) | , J = ( j, ℓ ) ∈ J + , (50)where ϕ J = ℓϕ j and ϕ j = π −⌊ js (1 − α ) ⌋− is the angle in (4). Note that 1 ≤ ℓ J ≤ js (1 − α ) .Similar to the analysis of the fragments (43), we now proceed to estimate the Fourier energy of b d j concentrated in a wedge W + J . The following result corresponds to Lemmas 4.9 and 4.10. Lemma 4.14.
Let J ∈ J + be a scale-angle pair, ℓ J the associated quantity (50) . For ( m , m ) ∈ N k ∂ ( m ,m ) b d j k L ( W + J ) . k g k β, ∞ k ω k β, ( , m = 0 , − jm sα js (1 − α ) ℓ − β − J , m = 0 . The implicit constant is independent of J ∈ J + and g and ω .Proof. If m = 0 the assertion follows from ∂ m b d j = ∂ m (cid:0)b h j ◦ π (cid:1) = 0. To handle the case m = 0,let us introduce the modified window ˜ ω ( x ) = x m ω ( x ) and its rescaled versions ˜ ω j = ˜ ω (2 jsα · ). Then˜ ω j ( x ) = 2 jsαm x m ω j ( x ), and as a consequence ∂ m b d j = ( − πi ) m \ x m d j = (2 πi ) m − jsαm c d ˜ j with d ˜ j := g ˜ ω j δ { x =0 } of the form (49). Hence, we can apply Lemma 4.13, which yields Z W + J | ∂ m b d j ( ξ ) | dξ ≍ − jsαm Z I j Z A J | c d ˜ j ( r, ϕ ) | r dϕ dr . − jsαm k g k β, ∞ k ˜ ω k β, Z A J js (1 − α ) (1 + 2 js (1 − α ) | sin( ϕ ) | ) − β − dϕ . − jsαm js (1 − α ) ℓ − β − J k g k β, ∞ k ω k β, . L J, := ( Id − js ℓ − J D J, )( Id − jsα D J, ) , (51)where we use the same notation as in the definition of the operator (46). Similar to Proposition 4.11 weobtain the following result. Proposition 4.15.
Let L J, be the differential operator (51) , J ∈ J + , and d ∈ N . We have Z R |L dJ, ( b d j W J )( ξ ) | dξ . js (1 − α ) ℓ − β − J . The implicit constant is independent of J ∈ J + , ω and g .Proof. Let ( m , m ) ∈ N . In view of (48) and Lemma 4.14 we obtain kD m J, D m J, b d j k L ( W + J ) . X a + b = m a + b = m | sin( ϕ J ) | a + b ) k ∂ ( a + a ,b + b ) b d j k L ( W + J ) = | sin( ϕ J ) | m k ∂ (0 ,m + m ) b d j k L ( W + J ) . | sin( ϕ J ) | m − j ( m + m ) sα js (1 − α ) ℓ − β − J . Using | sin( ϕ J ) | ≤ − js (1 − α ) ℓ J , we can further deduce kD m J, D m J, b d j k L ( W + J ) . − jm s − jm sα js (1 − α ) ℓ m − β − J . The function D d J, D d J, ( b d j W J ) is a linear combination of terms ( D m J, D m J, b d j )( D n J, D n J, W J ) with m + n = d and m + n = d . They satisfy kD m J, D m J, b d j k L ( W + J ) · kD n J, D n J, W J k ∞ . − jm s − jm sα js (1 − α ) ℓ m − β − J · − jsn − jsαn = 2 − jsd − sjαd js (1 − α ) ℓ m − β − J . Using H¨older’s inequality, it follows for d , d ∈ N kD d J, D d J, ( b d j W J ) k . − jsd − jsαd js (1 − α ) ℓ d − β − J . This proves the desired estimate for each term of L dJ, ( b d j W J ), since these are of the form2 jsd jsαd ℓ − d J D d D d ( b d j W J ) with d , d ≤ d . After the preparation of the preceding two subsections we now turn back to the proof of Proposition 4.5.Due to the assumptions, α ∈ [0 , s > β ∈ N , ν > f ∈ E β ([ − , ; ν ) is of thesimplified form (39). Further recall that for a cube Q ∈ Q j , j ∈ N , the notation f Q is used for theassociated fragment (37).Instead of the sequence θ Q = { θ µ } µ ∈ M j , we will analyze the relabelled sequence ˜ θ Q := { ˜ θ µ } µ ∈ M j withelements ˜ θ j,ℓ,k := θ j, [ ℓ + ℓ • ] ,k − k • , where we use the notation introduced at the end of Subsection 4.1. Recallthat the quantities ℓ • ∈ Z , k • ∈ Z are determined by Q ∈ Q j . 
In view of (42), we then have˜ θ j,ℓ,k = Z R c f ˜ Q ( ξ ) W j,ℓ +∆ ℓ ( ξ ) u j,ℓ +∆ ℓ,k +∆ k ( ξ ) dξ (52)with fixed ∆ k ∈ [0 , , ∆ ℓ ∈ [0 ,
1) depending on Q ∈ Q j . We define ∆ J := (0 , ∆ ℓ ) and J + := J + ∆ J for scale-angle pairs J = ( j, ℓ ) ∈ J . Further, we define for J = ( j, ℓ ) ∈ J and K = ( K , K ) ∈ Z the sets Z QJ,K := n ( k , k ) ∈ Z : ℓ − J +∆ J ( k + ∆ k ) ∈ [ K , K + 1) , k + ∆ k ∈ [ K , K + 1) o , e Z QJ,K := n ( k , k ) ∈ Z : 2 − js (1 − α ) ( k + ∆ k ) ∈ [ K , K + 1) , k + ∆ k ∈ [ K , K + 1) o . (53)9In the definition of Z QJ,K the quantity ℓ J +∆ J = 1 + 2 − js (1 − α ) | sin( ϕ J +∆ J ) | is used, with angle ϕ J +∆ J =( ℓ + ∆ ℓ ) ϕ j and ϕ j as in (4). To shorten notation, it is further useful to henceforth abbreviate L K := (1 + K )(1 + K ) . (54)Essential for the proof of Proposition 4.5, especially part (ii), is the following lemma which disentanglesthe smooth contribution from the singular part. Lemma 4.16.
Let j ∈ N and Q ∈ Q j be fixed. Under the assumptions of Proposition 4.5, the relabelledcoefficients ˜ θ Q = { ˜ θ µ } µ ∈ M j given by (52) can be decomposed in the form ˜ θ µ = a µ + b µ , µ ∈ M j , such that for every J ∈ J with | J | = j and every K ∈ Z , with a uniform constant and d ∈ N fixed, X k ∈ Z QJ,K | a j,ℓ,k | . L − dK − js (1+ α ) ℓ − β − J and X k ∈ e Z QJ,K | b j,ℓ,k | . L − dK e A J − jsα − jsβ . Here L K is the quantity defined in (54) , Z QJ,K and e Z QJ,K are given by (53) , and e A J ∈ [0 , are numberswith P | J | = j e A J ≤ . If f Q is a smooth fragment, a possible decomposition is given by a µ := 0 and b µ := ˜ θ µ for µ ∈ M j . It is important to note that the implicit constants in Lemma 4.16 can be chosen uniformly for all j ∈ N and Q ∈ Q j . Proof.
Recall, that the functions u J,k , J ∈ J + , are obtained by rotation of the function u j, ,k ( ξ ) = 2 − js (1+ α ) / exp (cid:0) h πi (2 − js k , − jsα k ) , ξ i (cid:1) , ξ ∈ R . Hence D J, u J,k = (2 πi )2 − js k u J,k and D J, u J,k = (2 πi )2 − jsα k u J,k for each J ∈ J + . We thus establish L J, u J,k = (cid:0) π ) − js (1 − α ) k (cid:1)(cid:0) π ) k (cid:1) u J,k for the differential operator L J, defined in (46). Applying partial integration, we obtain from (52)˜ θ J,k = (cid:0)(cid:0) π − js (1 − α ) ( k + ∆ k ) (cid:1)(cid:0) π ) ( k + ∆ k ) (cid:1)(cid:1) − d Z R L dJ + , ( c f ˜ Q W J + )( ξ ) u J + ,k +∆ k ( ξ ) dξ. Further, since u J +∆ J,k +∆ k ( ξ ) = u J +∆ J,k ( ξ ) · exp (cid:0) h πi (2 − js ∆ k , − jsα ∆ k ) , R J +∆ J ξ i (cid:1) and { u J + ,k } k ∈ Z is an orthonormal basis for L (Ξ J + ), we obtain for J ∈ J , | J | = j , and K = ( K , K ) ∈ Z X k ∈ e Z QJ,K | ˜ θ j,ℓ,k | ≤ (1 + K ) − d (1 + K ) − d Z R |L dJ + , ( c f ˜ Q W J + )( ξ ) | dξ. (55)In case that f ˜ Q is a smooth fragment, Proposition 4.11 (ii) yields X k ∈ e Z QJ,K | ˜ θ j,ℓ,k | . L − dK A J +∆ J − jsα − jsβ . By relabelling e A J := A J +∆ J we get the desired result.If f ˜ Q is an edge fragment, we prove the assertion by induction on β . In case β = 0, we choose b µ := ˜ θ µ and a µ := 0. Then the assertion is fulfilled, since by (55) and Proposition 4.11 (i) X k ∈ e Z QJ,K | ˜ θ j,ℓ,k | . L − dK A J − jsα . β ≥ j = 0, also due toProposition 4.11 (i).It thus remains to prove the assertion for j, β ∈ N . If j ∈ N , by definition, W J ( ξ ) = U j ( | ξ | ) V J ( ξ/ | ξ | ) = U (2 − js | ξ | ) V J ( ξ/ | ξ | ). To use induction we rewrite (52) in the form˜ θ J,k = 2 − js Z R | ξ | c f ˜ Q ( ξ ) U (2 − js | ξ | ) V J +∆ J ( ξ/ | ξ | )2 − js | ξ | u J +∆ J,k +∆ k ( ξ ) dξ. We introduce the function e U ( r ) = U ( r ) r , r ∈ R +0 , and put e U j := e U (2 − js · ) for j ≥
1. In addition,we put e U ( r ) = U ( r ), r ∈ R +0 . Further, we define e V J ( ξ ) := V J ( ξ ) cos( | ϕ ( ξ ) − ϕ J | ) − for ξ ∈ S and J ∈ J + , | J | ≥
1. For J = (0 ,
0) we define e V J := V J . Note that for ξ ∈ A J , | J | ≥
1, we have | ϕ ( ξ ) − ϕ J | ≤ ϕ + j / ≤ π/ ≤ cos( | ϕ ( ξ ) − ϕ J | ) − ≤
3. For J ∈ J + we then define f W J ( ξ ) := e U j ( | ξ | ) e V J ( ξ/ | ξ | ) , ξ ∈ R . The functions { f W J } J ∈ J are again wedge functions of the form (10) which satisfy condition (11) with some(possibly different) constants 0 < A ≤ B < ∞ . Using these functions the coefficients take the form˜ θ J,k = 2 − js Z R | ξ | cos( | ϕ ( ξ ) − ϕ J +∆ J | ) c f ˜ Q ( ξ ) f W J +∆ J ( ξ ) u J +∆ J,k +∆ k ( ξ ) dξ. ( J, k ) ∈ M j . (56)Now recall the directional derivative D J, = cos( ϕ J ) ∂ +sin( ϕ J ) ∂ depending on J ∈ J + . For ξ = ( ξ , ξ ) =( | ξ | cos ϕ, | ξ | sin ϕ ) ∈ R we have ξ cos( ϕ J ) + ξ sin( ϕ J ) = | ξ | (cid:0) cos( ϕ ) cos( ϕ J ) + sin( ϕ ) sin( ϕ J ) (cid:1) = | ξ | cos( | ϕ − ϕ J | ) . Hence, (56) becomes ˜ θ J,k = (2 πi ) − − js Z R (cid:0) D J + , f ˜ Q (cid:1) ∧ ( ξ ) f W J + ( ξ ) u J + ,k +∆ k ( ξ ) dξ. The edge fragment f ˜ Q is of the form f j = gω (2 jsα · ) H with g ∈ C β ( R ), ω ∈ C ∞ ( R ), and the bivariatestep function H = h ⊗ e g = D J + , g , e ω = D J + , ω , and e ω j = e ω (2 jsα · ). Further,recall ∂ H = δ { x =0 } and note that D J + , H = cos( ϕ J + ) ∂ H + sin( ϕ J + ) ∂ H = cos( ϕ J + ) δ { x =0 } . The product rule yields D J + , f j = e gω j H + cos( ϕ J + ) δ { x =0 } ω j g + 2 jsα g e ω j H = T + cos( ϕ J + ) T + 2 jsα T with terms T := e gω j H , T := δ { x =0 } ω j g , and T := g e ω j H . This leads to the decomposition˜ θ j,ℓ,k ≍ − js c (0) j,ℓ,k + 2 − js cos( ϕ J + ) d (0) j,ℓ,k + 2 − js (1 − α ) ˜ θ (1) j,ℓ,k (57)with c (0) j,ℓ,k := Z R b T f W J + ( ξ ) u J + ,k +∆ k ( ξ ) dξ,d (0) j,ℓ,k := Z R b T f W J + ( ξ ) u J + ,k +∆ k ( ξ ) dξ, ˜ θ (1) j,ℓ,k := Z R b T f W J + ( ξ ) u J + ,k +∆ k ( ξ ) dξ. Note that e g ∈ C β − ( R ) and e ω ∈ C ∞ ( R ) with supp e ω ⊆ supp ω . By induction we can decompose c (0) µ = a (0) µ + b (0) µ , µ ∈ M j , { a (0) µ } µ ∈ M j and { b (0) µ } µ ∈ M j satisfy the assertion for β −
1. The coefficients { d (0) j,ℓ,k } µ ∈ M j can be handled with the help of Proposition 4.15. We have for the differential operator L J, from (51) L J, u J,k = (cid:0) π ) ℓ − dJ k (cid:1)(cid:0) π ) k (cid:1) u J,k . Partial integration leads to d (0) J,k = (cid:0) π ) ℓ − dJ + ( k + ∆ k ) (cid:1) − d (cid:0) π ) ( k + ∆ k ) (cid:1) − d Z R L dJ + , ( b T f W J + )( ξ ) u J + ,k +∆ k ( ξ ) dξ. We deduce that for every J ∈ J with | J | = j and every K = ( K , K ) ∈ Z X k ∈ Z QJ,K | d (0) J,k | ≤ ( L K ) − d Z R |L dJ + , ( b T f W J + )( ξ ) | dξ . ( L K ) − d js (1 − α ) ℓ − β − J + . Here we applied the fact that { u J + ,k } k ∈ Z is an orthonormal basis for L (Ξ J + ) and Proposition 4.15.Finally, note that | sin( ϕ J ) | ≍ | ϕ J | ≍ | ℓ | − js (1 − α ) uniformly for J ∈ J + . Hence, due to ∆ ℓ ∈ [0 , ℓ J ≍ | ℓ | ≍ | ℓ + ∆ ℓ | ≍ ℓ J +∆ J .It remains to handle the sequence { ˜ θ (1) µ } µ ∈ M j which resembles the original sequence { ˜ θ µ } µ ∈ M j and canbe handled accordingly. After γ iterations of the decomposition process (57) we end up with sequences { c (0) µ } µ ∈ M j , . . . , { c ( γ − µ } µ ∈ M j , { d (0) µ } µ ∈ M j , . . . , { d ( γ − µ } µ ∈ M j , and { ˜ θ ( γ ) µ } µ ∈ M j . We choose γ = ⌈ − α ⌉ sothat 2 − js (1 − α ) γ ≤ − js . We can apply the induction hypothesis on { c ( τ ) µ } µ ∈ M j for every τ ∈ { , . . . , γ − } , which leads to sequences { a ( τ ) µ } µ ∈ M j and { b ( τ ) µ } µ ∈ M j . Since g ∈ C β ( R ) ⊂ C β − ( R ) also { ˜ θ ( γ ) µ } µ ∈ M j can be decomposed into twosequences { a ( γ ) µ } µ ∈ M j and { b ( γ ) µ } µ ∈ M j .Finally, we obtain the desired decomposition ˜ θ µ = a µ + b µ , µ ∈ M j , with a µ := 2 − js γ − X τ =0 − js (1 − α ) τ a ( τ ) µ + 2 − js (1 − α ) γ a ( γ ) µ ,b µ := 2 − js γ − X τ =0 − js (1 − α ) τ b ( τ ) µ + 2 − js (1 − α ) γ b ( γ ) µ + 2 − js cos( ϕ J + ) γ − X τ =0 − js (1 − α ) τ d ( τ ) µ . With Lemma 4.16 in our toolbox, it is not difficult any more to prove Proposition 4.5. 
The remaining considerations are merely interpolation arguments.
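The interpolation arguments alluded to here rest on the elementary finite-sequence embedding ‖{c_λ}_{λ∈Λ}‖_{ℓ^p} ≤ (#Λ)^{1/p−1/2} ‖{c_λ}_{λ∈Λ}‖_{ℓ²}, valid for 0 < p ≤ 2 by Hölder's inequality. A numerical sanity check with an arbitrary test vector:

```python
def lp_norm(c, p):
    # ||c||_p = (sum |x|^p)^{1/p}
    return sum(abs(x) ** p for x in c) ** (1.0 / p)

n = 500
p = 0.5                                       # p = 2/(1+beta) with beta = 3
c = [(-1) ** k / (k + 1.0) for k in range(n)]
lhs = lp_norm(c, p)
rhs = n ** (1.0 / p - 0.5) * lp_norm(c, 2)
assert lhs <= rhs  # Hoelder embedding for finite sequences
print(lhs, rhs)
```

Equality is attained for constant sequences, so the cardinality factor (#Λ)^{1/p−1/2} cannot be improved.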
Proof of Proposition 4.5.
We first handle part (i) of the proposition, when f_j is a smooth fragment. Let M_j denote the curvelet indices at scale j ∈ N and define M^Q_{j,K} := {(j, ℓ, k) ∈ M_j : k ∈ Z̃^Q_{J,K}} for K ∈ Z². Since Σ_{|J|=j} Ã_J ≲
1, Lemma 4.16 yields for K ∈ Z²

Σ_{µ∈M^Q_{j,K}} |θ̃_{J,k}|² = Σ_{|J|=j} Σ_{k∈Z̃^Q_{J,K}} |θ̃_{J,k}|² ≲ L_K^{−2d} 2^{−2jsα} 2^{−2jsβ}.

Let us fix d ∈ N as the smallest integer satisfying d > (1+β)/4, i.e., d := ⌊(1+β)/4⌋ + 1. This ensures

Σ_{K∈Z²} L_K^{−2d/(1+β)} = Σ_{K∈Z²} ((1 + K₁²)(1 + K₂²))^{−2d/(1+β)} ≲ 1, (58)

which will be important below. Further, note that we have the estimate

Σ_{|J|=j} #Z̃^Q_{J,K} ≤ Σ_{|J|=j} 2^{js(1−α)} ≲ 2^{2js(1−α)}.

Next, recall the norm estimate ‖{c_λ}_{λ∈Λ}‖_{ℓ^p} ≤ (#Λ)^{1/p−1/2} ‖{c_λ}_{λ∈Λ}‖_{ℓ²}, valid for 0 < p ≤ 2 and finite sequences {c_λ}_{λ∈Λ}. Interpolation with p = 2/(1+β) yields

‖{θ̃_µ}_{µ∈M^Q_{j,K}}‖_{2/(1+β)} ≲ 2^{jsβ(1−α)} (L_K)^{−d} 2^{−jsα} 2^{−jsβ} = (L_K)^{−d} 2^{−jsα(1+β)}.

The proof of part (i) is finished by applying the p-triangle inequality with p = 2/(1+β) ≤
1. In viewof (58) we arrive at k{ ˜ θ µ } µ ∈ M j k / (1+ β )2 / (1+ β ) ≤ X K ∈ Z k{ ˜ θ µ } µ ∈ M Qj,K k / (1+ β )2 / (1+ β ) . − jsα . We finally turn to the proof of part (ii) and assume that f j is an edge fragment. We denote by { a µ } µ ∈ M j and { b µ } µ ∈ M j the decomposition of the sequence { ˜ θ µ } µ ∈ M j according to Lemma 4.16. Analogous to thetreatment of the smooth case, one can deduce k{ b µ } µ ∈ M j k / (1+ β )2 / (1+ β ) . − jsα . (59)It remains to handle { a µ } µ ∈ M j . Due to Lemma 4.16 we have with d ∈ N chosen as above X k ∈ Z QJ,K | a j,ℓ,k | . L − dK − js (1+ α ) ℓ − β − J . (60)Recall that ℓ J = 1 + 2 js (1 − α ) | sin( ϕ J ) | ≥ Z QJ,K ≤ ℓ J +∆ J ≍ ℓ J . (61)In view of (60) and (61) we conclude for ε > N QJ,K ( ε ) := n k ∈ Z QJ,K : | a j,ℓ,k | > ε o . min n ℓ J , ε − L − dK − js (1+ α ) ℓ − β − J o . The next step is to show X | J | = j N QJ,K ( ε ) . ε − / ( β +1) L − d/ ( β +1) K − js (1+ α ) / (1+ β ) . (62)Since ℓ J ≍ | ℓ | we can estimate, where we use the quantities ℓ −∗ := ⌈ ℓ ∗ ⌉ − ℓ + ∗ := ⌈ ℓ ∗ ⌉ with ℓ ∗ := ε − / (1+ β ) L − d/ (1+ β ) K − js α β ) , L + j X ℓ =0 N j,ℓ,K ( ε ) . L + j +1 X ℓ =1 min n ℓ, ε − L − dK − js (1+ α ) ℓ − β − o ≤ ℓ −∗ X ℓ =1 ℓ + L + j +1 X ℓ = ℓ + ∗ ε − L − dK − js (1+ α ) ℓ − β − . Note that ℓ −∗ ∈ N . Therefore, it holds ℓ −∗ X ℓ =1 ℓ = 12 ℓ −∗ ( ℓ −∗ + 1) ≤ ℓ ∗ = rhs(62) . Further, taking into account ℓ ∗ ≤ ℓ + ∗ , we obtain L + j +1 X ℓ = ℓ + ∗ ε − L − dK − js (1+ α ) ℓ − β − . ε − L − dK − js (1+ α ) ℓ − β ∗ = rhs(62) . Altogether, this proves (62) since the sum P ℓ = − L − j N Qj,ℓ,K ( ε ) can be estimated analogously.3Recall that M j denotes the curvelet indices at scale j . Using (58) we deduce from (62) n µ ∈ M j : | a µ | > ε o = X K ∈ Z X | J | = j N QJ,K ( ε ) . − js (1+ α ) / (1+ β ) ε − / (1+ β ) . This implies the following estimate, where we let ρ = max (cid:8) , s ( αβ − / (1 + β ) (cid:9) , k{ a µ } µ ∈ M j k / (1+ β ) wℓ / (1+ β ) . 
− js (1+ α ) / (1+ β ) = 2 − jsα js ( αβ − / (1+ β ) ≤ − jsα jρ . (63)In a last step, we combine (59) and (63). Using the p -triangle inequality with p = β ≤ k{ ˜ θ µ } µ ∈ M j k / (1+ β ) wℓ / (1+ β ) ≤ k{ a µ } µ ∈ M j k / (1+ β ) wℓ / (1+ β ) + k{ b µ } µ ∈ M j k / (1+ β ) wℓ / (1+ β ) . − jsα jρ + 2 − jsα . − jsα jρ , which finishes the proof. In this final section we interpret and discuss the results of our previous investigations. First we note thatTheorem 3.11 complements the result of Theorem 3.7. The latter guarantees at least an approximationrate of order N − /α for E β ([ − , ; ν ) if β ≥ α − and α ∈ [ , β = α − . Theorem 3.11 now tells us that this rate doesnot improve for C β cartoons with β > α − , at least if we restrict to greedy approximations obtainedby simple thresholding. Hence, α -curvelets in the range α ∈ [ ,
1) cannot take advantage of cartoon regularity higher than C^{α^{−1}}.

Turning to the range α ∈ [0, 1/2), according to both Theorem 3.9 and Theorem 3.11 the approximation deteriorates as α tends to 0. In Theorem 3.11 the achievable rate peaks for α = 1/2, a confirmation of the outstanding role of parabolic scaling for cartoon approximation. Among all α-curvelet frames, the classic parabolically scaled systems provide the best performance for E^β([−1,1]²; ν) if β ≥ 2. However, if β > 2, the obtained rate of order N^{−2} is suboptimal.

To better understand this behavior, recall the heuristic considerations in Subsection 3.3. A Taylor expansion showed that C^β curves with β ∈ (1, 2] are locally contained in (properly aligned) rectangles of size width ≈ length^β. This explains why α-scaling with α = β^{−1} is optimally suited to resolve such curves. It also indicates that it is not the smoothness of the curves that determines the best type of scaling, but their local scaling behavior. If the second-order Taylor term at some point of a C^β curve, where β ≥ 2, does not vanish, the scaling locally obeys width ≈ length². Consequently, the choice α = 1/2 is still the best for C^β curves with β ≥ 2. For straight edges, on the other hand, moving α away from 0 deteriorates the approximability of the edge, but according to Theorem 4.1, for signals from E^β([−1,1]²; ν) this deterioration is masked by the overall approximation performance of order N^{−β} if α ∈ [0, β^{−1}].

It is remarkable that up to now no frame is known where a nonadaptive thresholding scheme yields approximation rates better than N^{−2} for the class E^β([−1,1]²; ν), β > 2. As we have seen, α-scaling is not able to take advantage of smoothness beyond C², wherefore new ideas need to be considered. One approach might be based on the bendlet transform [40], which incorporates bending in addition to α-scaling for improved adaptability to the edges. While the bendlet dictionary seems to be useful for certain image analysis tasks, it is not yet clear how to extract bendlet frames suitable for approximation, and this question requires further research.

Finally, let us derive some implications of the obtained results for other α-scaled representation systems. The framework of α-molecules allows to transfer properties of C_{s,α} to other systems of α-molecules if their parametrization is consistent with the parametrization (M, Φ_M) of C_{s,α} from (23). For the required notion of consistency, let us first recall the phase-space metric ω_α introduced in [24] for the phase space P = R_+ × T × R².

Definition 5.1 ([24, Def. 4.1]). Let α ∈ [0, 1]. The α-scaled index distance ω_α : P × P → [1, ∞) is defined by

ω_α(p_λ, p_µ) = max{ s_λ/s_µ , s_µ/s_λ } (1 + d_α(p_λ, p_µ)),

where p_λ = (s_λ, θ_λ, x_λ) ∈ P, p_µ = (s_µ, θ_µ, x_µ) ∈ P, and with s₀ = min{s_λ, s_µ}, e_λ = (cos(θ_λ), −sin(θ_λ)),

d_α(p_λ, p_µ) = s₀^{2(1−α)} |θ_λ − θ_µ|² + s₀^{2α} |x_λ − x_µ|² + (s₀ / (1 + s₀^{1−α} |θ_λ − θ_µ|)) |⟨e_λ, x_λ − x_µ⟩|.

The consistency of two parametrizations is then defined as follows.
Definition 5.2 ([24, Def. 5.5]). Let α ∈ [0, 1] and k > 0. Two parametrizations (Λ, Φ_Λ) and (∆, Φ_∆), for index sets Λ and ∆ respectively, are called (α, k)-consistent if

sup_{λ∈Λ} Σ_{µ∈∆} ω_α(Φ_Λ(λ), Φ_∆(µ))^{−k} < ∞  and  sup_{µ∈∆} Σ_{λ∈Λ} ω_α(Φ_Λ(λ), Φ_∆(µ))^{−k} < ∞.

Since C_{s,α} is a tight frame of α-molecules of arbitrary order, as shown by Lemma 2.5, the theory of α-molecules allows us to deduce the following result practically for free.

Theorem 5.3.
Let α ∈ [0, 1] and let M := {m_λ}_{λ∈Λ} be a frame of α-molecules whose parametrization, for some k > 0, is (α, k)-consistent with the α-curvelet parametrization (M, Φ_M) of C_{s,α}. Further, assume that for some γ ∈ R⁺₀ the order (L, M, N₁, N₂) of M satisfies

L ≥ k(1+γ),  M ≥ 3k(1+γ)/2 + (α−3)/2,  N₁ ≥ k(1+γ)/2 + (1+α)/2,  N₂ ≥ k(1+γ).   (64)

Then the following holds true:

(i) Let c̃_λ := ⟨f, m_λ⟩, λ ∈ Λ, denote the analysis coefficients of f ∈ E^β([−1,1]²; ν) with respect to M, and assume β ∈ N. If (64) is fulfilled for γ = min{β, α^{−1}}, then {c̃_λ}_{λ∈Λ} ∈ ℓ^p(Λ) for all p > 2/(1+γ).

(ii) Let Θ = Σ_{λ∈Λ} c_λ m_λ be a representation of the function Θ from (31) with respect to M. If (64) is fulfilled for some γ > γ̃ := (max{α, 1−α})^{−1}, then {c_λ}_{λ∈Λ} ∉ ℓ^p(Λ) for p ≤ 2/(1+γ).

Proof. According to [24, Thm. 5.6], condition (64) ensures that the systems M and C_{s,α} are sparsity equivalent in ℓ^p for p := 2/(1+γ), which means ‖(⟨m_λ, ψ_µ⟩)_{λ,µ}‖_{ℓ^p→ℓ^p} < ∞ (see [24, Def. 5.3]). Since f = Σ_µ ⟨f, ψ_µ⟩ ψ_µ and {⟨f, ψ_µ⟩}_µ ∈ ℓ^{p+ε}(M) for every ε > 0 by Theorem 4.3, assertion (i) follows. For (ii) assume that {c_λ}_λ ∈ ℓ^p(Λ), which implies by sparsity equivalence {⟨Θ, ψ_µ⟩}_µ ∈ ℓ^p(M). Using Θ = Σ_µ ⟨Θ, ψ_µ⟩ ψ_µ and Lemma 4.2, this then implies an N-term approximation rate of order N^{−γ}, in contradiction to Theorem 3.11.

A direct corollary is obtained via Lemma 4.2.

Corollary 5.4.
Under the assumptions of Theorem 5.3(i), every dual frame {m̃_λ}_{λ∈Λ} of M yields – via simple thresholding – N-term approximations f_N to f ∈ E^β([−1,1]²; ν) satisfying

‖f − f_N‖₂² ≲ N^{−min{β, α^{−1}}+ε},  ε > 0 arbitrary,  as N → ∞.

To see the reach of these results, let us mention that the α-shearlet parametrization is (α, k)-consistent with the α-curvelet parametrization for sufficiently large k. The obtained results thus carry over to α-shearlet frames, including both band-limited and compactly supported constructions (see [24, Prop. 3.11]).

A Bessel Functions
In this appendix we collect some useful facts about Bessel functions, mainly taken from [31] and [21]. We are only interested in Bessel functions J_ν of integer and half-integer order in the range ν ∈ {−1/2, 0, 1/2, 1, 3/2, ...}. Bessel functions of this kind occur naturally in the Fourier analysis of radial functions. For t ∈ R_+ the value J_ν(t) is conveniently defined by either of the two series (see [31] and [21, Appendix B.3])

J_ν(t) = (t/2)^ν Σ_{k=0}^∞ ((−1)^k / (Γ(k+1) Γ(k+ν+1))) (t/2)^{2k} = (1/√π) (t/2)^ν Σ_{k=0}^∞ (−1)^k (Γ(k+1/2) / Γ(k+ν+1)) (t^{2k} / (2k)!),   (65)

where the Gamma function Γ extends the factorial z! to the complex numbers with Γ(z) = (z−1)! for z ∈ N, and Γ(k+1/2) = ((2k)! / (k! 4^k)) √π for k ∈ N₀.

We explicitly remark that definition (65) is also valid for ν = −1/2, although this case is not included in the exposition of [21]. As is obvious from the second representation, the functions J_ν of half-integer order can be expressed in closed form in terms of trigonometric functions. For integer orders such closed form representations do not exist.

If f(x) = f₀(|x|) is a radial function on R^d, d ∈ N, with a suitable function f₀ defined on R⁺₀, the Fourier transform of f is given by the formula

f̂(ξ) = 2π |ξ|^{−(d−2)/2} ∫₀^∞ f₀(r) J_{d/2−1}(2πr|ξ|) r^{d/2} dr,  ξ ∈ R^d.

Applying this formula to the characteristic function χ_{B_d(0,1)} of the d-dimensional unit ball B_d(0,1) ⊂ R^d yields

(χ_{B_d(0,1)})^∧(ξ) = 2π |ξ|^{−(d−2)/2} ∫₀^1 J_{d/2−1}(2π|ξ|r) r^{d/2} dr = J_{d/2}(2π|ξ|) / |ξ|^{d/2},  ξ ∈ R^d.   (66)

Here, for the integration, we used the second of the following recurrence relations [21, Appendix B.2], which are valid for ν ∈ {1/2, 1, 3/2, 2, ...} and all t ∈ R_+:

t^{−ν+1} J_ν(t) = −(d/dt)(t^{−ν+1} J_{ν−1}(t))  and  t^ν J_{ν−1}(t) = (d/dt)(t^ν J_ν(t)).
The case ν = 1/2 is not treated in [21], yet it can be easily confirmed by a direct calculation. By scaling, we can further deduce from (66) the following Fourier representation of the bivariate function Θ(x) = χ_{B₂(0,1)}(2x), x ∈ R², from (31):

Θ̂(ξ) = (1/4) (χ_{B₂(0,1)})^∧(ξ/2) = J₁(π|ξ|) / (2|ξ|),  ξ ∈ R².   (67)

Important for our investigation in Section 3 is the asymptotic behavior of J_ν(r) as r → ∞. We cite the following result from [21, Appendix B.8], which states for ν ∈ N₀ the identity

J_ν(r) = √(2/(πr)) cos(r − πν/2 − π/4) + R_ν(r),  r ∈ R_+,   (68)

with a function R_ν given on R_+ by

R_ν(r) = (2π)^{−1/2} (r^ν / Γ(ν+1/2)) e^{i(r−πν/2−π/4)} ∫₀^∞ e^{−rt} t^{ν+1/2} [(1+it/2)^{ν−1/2} − 1] dt/t + (2π)^{−1/2} (r^ν / Γ(ν+1/2)) e^{−i(r−πν/2−π/4)} ∫₀^∞ e^{−rt} t^{ν+1/2} [(1−it/2)^{ν−1/2} − 1] dt/t.

Further, for each ν ∈ N₀ there is a constant C_ν > 0 such that R_ν satisfies the estimate

|R_ν(r)| ≤ C_ν r^{−3/2}  whenever r ≥ 1.   (69)

The representation (68) and the estimate (69) play an important role in the proof of Lemma 3.8. For completeness, let us finally note that the identity (68) especially holds true in the case ν = −1/2, with vanishing remainder R_{−1/2} ≡ 0. This is a direct consequence of the definition (65) and the Taylor series of the cosine.
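The facts collected above are easy to check numerically. The following sketch is our own illustration (not part of [21] or [31]): it evaluates J_ν through the power series (65) using only the Python standard library, confirms the closed form J_{−1/2}(t) = √(2/(πt)) cos(t) — that is, the vanishing of the remainder in (68) for ν = −1/2 — and checks the decay (69) for ν = 0. The truncation length and tolerances are ad hoc.

```python
import math

def bessel_j(nu, t, terms=60):
    """J_nu(t) via the power series (65); suitable for nu >= -1/2 and t > 0."""
    s = 0.0
    for k in range(terms):
        s += ((-1) ** k / (math.gamma(k + 1) * math.gamma(k + nu + 1))
              * (t / 2) ** (2 * k))
    return (t / 2) ** nu * s

# Closed form for nu = -1/2: the remainder R_{-1/2} in (68) vanishes.
for t in (0.5, 1.0, 3.0, 10.0):
    closed = math.sqrt(2 / (math.pi * t)) * math.cos(t)
    assert abs(bessel_j(-0.5, t) - closed) < 1e-8

# Decay (69) for nu = 0: the remainder behaves like O(r^{-3/2}).
for r in (5.0, 10.0, 20.0):
    remainder = bessel_j(0, r) - math.sqrt(2 / (math.pi * r)) * math.cos(r - math.pi / 4)
    assert abs(remainder) <= 0.15 * r ** -1.5
```

The closed-form check for ν = 1/2, namely J_{1/2}(t) = √(2/(πt)) sin(t), passes in the same way.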
Acknowledgements
The author acknowledges support by the BMS (Berlin Mathematical School) and thanks Prof. Dr. Gitta Kutyniok and Anton Kolleck for proofreading the manuscript, as well as for many helpful comments.
References

[1] J. Cai, B. Dong, S. Osher, and Z. Shen. Image restoration: total variation, wavelet frames, and beyond. J. Amer. Math. Soc., 25(4):1033–1089, 2012.
[2] E. J. Candès. Ridgelets: theory and applications. Ph.D. thesis, Stanford University, CA, 1998. Online available: http://statweb.stanford.edu/~candes/publications.html.
[3] E. J. Candès. Ridgelets and the representation of mutilated Sobolev functions. SIAM J. Math. Anal., 33(2):347–368, 2001.
[4] E. J. Candès and D. L. Donoho. Curvelets – a surprisingly effective nonadaptive representation for objects with edges. In C. Rabut, A. Cohen, and L. Schumaker, editors, Curves and Surfaces, pages 105–120. Vanderbilt University Press, 2000.
[5] E. J. Candès and D. L. Donoho. New tight frames of curvelets and optimal representations of objects with C² singularities. Comm. Pure Appl. Math., 57(2):219–266, 2004.
[6] V. Chandrasekaran, M. B. Wakin, D. Baron, and R. G. Baraniuk. Compressing piecewise smooth multidimensional functions using surflets: rate-distortion analysis. Technical report, Department of Electrical and Computer Engineering, Rice University, Mar. 2004. Online available: http://dsp.rice.edu/sites/dsp.rice.edu/files/publications/report/2004/compressin-riceece-2004.pdf.
[7] V. Chandrasekaran, M. B. Wakin, D. Baron, and R. G. Baraniuk. Compression of higher dimensional functions containing smooth discontinuities. In Conference on Information Sciences and Systems, Princeton, Mar. 2004.
[8] V. Chandrasekaran, M. B. Wakin, D. Baron, and R. G. Baraniuk. Surflets: a sparse representation for multidimensional functions containing smooth discontinuities. In IEEE Symposium on Information Theory, Chicago, Jul. 2004.
[9] V. Chandrasekaran, M. B. Wakin, D. Baron, and R. G. Baraniuk. Representation and compression of multidimensional piecewise functions using surflets. IEEE Trans. Inform. Theory, 55(1):374–400, 2009.
[10] C. Christopoulos, A. Skodras, and T. Ebrahimi. The JPEG2000 still image coding system: an overview. IEEE Trans. Consum. Electron., 46(4):1103–1127, 2000.
[11] A. Cohen, W. Dahmen, and R. DeVore. Adaptive wavelet methods for elliptic operator equations: convergence rates. Math. Comp., 70(233):27–75, 2001.
[12] I. Daubechies. Ten Lectures on Wavelets. SIAM, Philadelphia, 1992.
[13] R. A. DeVore. Nonlinear approximation. Acta Numerica, 7:51–150, 1998.
[14] M. N. Do and M. Vetterli. The contourlet transform: an efficient directional multiresolution image representation. IEEE Trans. Image Process., 14(12):2091–2106, 2005.
[15] D. L. Donoho. Wedgelets: nearly-minimax estimation of edges. Ann. Statist., 27:859–897, 1999.
[16] D. L. Donoho. Orthonormal ridgelets and linear singularities. SIAM J. Math. Anal., 31(5):1062–1099, 2000.
[17] D. L. Donoho. Ridge functions and orthonormal ridgelets. J. Approx. Theory, 111(2):143–179, 2001.
[18] D. L. Donoho. Sparse components of images and optimal atomic decompositions. Constr. Approx., 17(3):353–382, 2001.
[19] D. L. Donoho and X. Huo. Beamlet pyramids: a new form of multiresolution analysis suited for extracting lines, curves, and objects from very noisy image data. In Wavelet Applications in Signal and Image Processing VIII (San Diego, CA, 2000), Proc. SPIE, volume 4119, pages 434–444. SPIE, 2000.
[20] A. Flinth and M. Schäfer. Multivariate α-molecules. J. Approx. Theory, 202:64–108, 2016.
[21] L. Grafakos. Classical Fourier Analysis. Springer, 2nd edition, 2008.
[22] P. Grohs. Ridgelet-type frame decompositions for Sobolev spaces related to linear transport. J. Fourier Anal. Appl., 18(2):309–325, 2012.
[23] P. Grohs, S. Keiper, G. Kutyniok, and M. Schäfer. Cartoon approximation with α-curvelets. J. Fourier Anal. Appl., 22(6):1235–1293, 2016.
[24] P. Grohs, S. Keiper, G. Kutyniok, and M. Schäfer. α-Molecules. Appl. Comput. Harmon. Anal., 41(1):297–336, 2016.
[25] P. Grohs and G. Kutyniok. Parabolic molecules. Found. Comput. Math., 14(2):299–337, 2014.
[26] P. Grohs and A. Obermeier. On the approximation of functions with line singularities by ridgelets. Technical Report 2016-4, Seminar for Applied Mathematics, ETH Zürich, Switzerland, 2016. Online available: .
[27] P. Grohs and A. Obermeier. Optimal adaptive ridgelet schemes for linear advection equations. Appl. Comput. Harmon. Anal., 41(3):768–814, 2016.
[28] K. Guo, G. Kutyniok, and D. Labate. Sparse multidimensional representations using anisotropic dilation and shear operators. In Wavelets and Splines (Athens, GA, 2005), pages 189–201. Nashboro Press, Nashville, TN, 2006.
[29] K. Guo and D. Labate. Optimally sparse multidimensional representation using shearlets. SIAM J. Math. Anal., 39(1):298–318, 2007.
[30] K. Guo and D. Labate. The construction of smooth Parseval frames of shearlets. Math. Model. Nat. Phenom., 8(1):82–105, 2013.
[31] W. Hackbusch, H. R. Schwarz, and E. Zeidler. Teubner-Taschenbuch der Mathematik. B. G. Teubner Stuttgart, Leipzig, 1996.
[32] S. Keiper. A flexible shearlet transform – sparse approximation and dictionary learning. Bachelor's thesis, TU Berlin, Germany, 2012.
[33] P. Kittipoom, G. Kutyniok, and W.-Q Lim. Construction of compactly supported shearlet frames. Constr. Approx., 35(1):21–72, 2012.
[34] J. Krommweh. Image approximation by adaptive tetrolet transform. In International Conference on Sampling Theory and Applications, Marseille, France, May 2009.
[35] G. Kutyniok, D. Labate, W.-Q Lim, and G. Weiss. Sparse multidimensional representation using shearlets. In Wavelets XI (San Diego, CA, 2005), SPIE Proc., volume 5914, pages 254–262. SPIE, Bellingham, WA, 2005.
[36] G. Kutyniok, J. Lemvig, and W.-Q Lim. Optimally sparse approximations of 3D functions by compactly supported shearlet frames. SIAM J. Math. Anal., 44(4):2962–3017, 2012.
[37] G. Kutyniok and W.-Q Lim. Compactly supported shearlets are optimally sparse. J. Approx. Theory, 163(11):1564–1589, 2011.
[38] E. Le Pennec and S. Mallat. Bandelet image approximation and compression. Multiscale Model. Simul., 4(3):992–1039, 2005.
[39] E. Le Pennec and S. Mallat. Sparse geometric image representations with bandelets. IEEE Trans. Image Process., 14(4):423–438, 2005.
[40] C. Lessig, P. Petersen, and M. Schäfer. Bendlets: a second-order shearlet transform with bent elements. 2016. Submitted. arXiv:1607.05520 [math.FA].
[41] A. Lisowska. Smoothlets – multiscale functions for adaptive representation of images. IEEE Trans. Image Process., 20(7):1777–1787, 2011.
[42] A. Lisowska. Multiwedgelets in image denoising. In J. Park, J. Ng, H.-Y. Jeong, and B. Waluyo, editors, Multimedia and Ubiquitous Engineering: MUE 2013, pages 3–11. Springer Netherlands, Dordrecht, 2013.
[43] S. Mallat. A Wavelet Tour of Signal Processing: The Sparse Way. Academic Press, 2nd edition, 2008.
[44] S. Mallat. Geometrical grouplets. Appl. Comput. Harmon. Anal., 26(2):161–180, 2009.
[45] R. M. Willet and R. D. Nowak. Platelets: a multiscale approach for recovering edges and surfaces in photon-limited medical imaging.