arXiv:[math.FA]

The Role of α-Scaling for Cartoon Approximation

Martin Schäfer∗

Institute of Mathematics, Technische Universität Berlin
Straße des 17. Juni 136, 10623 Berlin, Germany

October 10, 2018
Abstract
The class of cartoon-like functions, classically defined as piecewise C² functions consisting of smooth regions separated by C² discontinuity curves, is a well-established model for image data. The quest for frames providing optimal approximation for this class has among others led to the development of curvelets, contourlets, and shearlets. Due to parabolic scaling, these systems are able to provide N-term approximations converging with a quasi-optimal rate of order N^{−2}. Replacing parabolic scaling by α-scaling, one can construct α-curvelet and α-shearlet frames which interpolate between wavelet-type systems for α = 1, the classic parabolically scaled systems for α = 1/2, and ridgelet-type systems for α = 0. Previous research shows that if α ∈ [1/2, 1) they provide quasi-optimal approximation for cartoons of regularity C^{1/α} with a rate of order N^{−1/α}.

In this work we continue the exploration of the approximation properties of α-scaled representation systems, with the aim to better understand the role of the parameter α. Concerning α-curvelets with α < 1, we prove that the best possible N-term approximation rate achievable for cartoons with curved edges is limited to at most N^{−2/(1−α)}, independent of the smoothness of the cartoons. The maximal rate that can be obtained by simple thresholding of the frame coefficients is even bounded by N^{−1/max{α,1−α}}. Systems of α-curvelets thus cannot take advantage of regularity higher than C^{1/α} if α ∈ [1/2, 1), where the rate N^{−1/α} cannot be surpassed. For C^β cartoons with β ≥ 2, the classic curvelets with α = 1/2 provide the best performance with a rate of order N^{−2}, however below the optimal rate of order N^{−β} if β > 2. In the range α ∈ [0, 1/2] the achievable rate cannot exceed N^{−1/(1−α)} and deteriorates as α approaches 0.

The approximation performance of α-curvelets is different if the edges of the cartoons are straight. Assuming C^β regularity, we establish an approximation rate of order N^{−min{α^{−1},β}}, which improves as α tends to 0. In the range α ∈ [0, β^{−1}] it is even quasi-optimal, generalizing optimality results for ridgelets. By applying the framework of α-molecules, we finally extend the obtained results to other α-scaled representation systems, including for instance α-shearlet frames. Keywords:
Cartoon Images, Nonlinear Approximation, Wavelets, Curvelets, Shearlets, Ridgelets, Anisotropic Scaling, α-Molecules.

MSC2000 Subject Classification:
1 Introduction

In the age of ‘big data’, efficient data representation is an objective of ever increasing importance. Not only does it simplify the handling of the data due to the reduction of needed storage space or the possible speed-up of processing times. The knowledge of a ‘good’ representation also gives valuable information about the structure of the data itself, simplifying certain processing tasks or even just enabling them in the first place. As examples we may think of the restoration of corrupted signals or the separation of several superimposed signals of distinct types.

∗ Email: [email protected]

Given a dictionary in some Hilbert space, a common methodology is to represent a signal f by a sequence of approximants (f_N)_{N∈ℕ} converging to the signal. A standard choice here is to use N-term approximations in the respective dictionary, i.e., approximants built from just N dictionary elements. A main goal of approximation theory is the development of approximation schemes with a best possible speed of convergence, commonly quantified by the asymptotic decay of the approximation error ‖f − f_N‖ as N → ∞. With regard to N-term approximations, the achievable rate is determined by the utilized dictionary in the background, and one aims to find dictionaries providing high approximation rates for the data. Such dictionaries are said to sparsely approximate the corresponding signals and clearly need to be chosen depending on the considered signal class. For efficient data representation it is therefore essential, first, to be able to precisely specify the type of data under consideration, e.g., in the form of an appropriate model, and, second, to develop dictionaries, well adapted to the specific data class, that provide sparse approximations.

Subsequently, we are interested in the sparse approximation of image data. In our investigation, we will always stay in the continuum setting, where images are as usual represented as functions supported on some compact image domain Ω ⊂ ℝ² with values containing pixel information at the respective positions, such as e.g.
color or brightness information. Being compactly supported and bounded, the image data can conveniently be modeled as a subset of the Hilbert space L²(Ω), which in turn is considered as a subspace of L²(ℝ²). Hence, we are in a concrete Hilbert space scenario and can resort to the methodology described above, i.e., we aim for appropriate image models and sparsifying dictionaries.

For the space L²(Ω) the classic Fourier systems constitute an orthonormal basis, providing a straightforward procedure for representation. However, Fourier systems work well only if the functions under consideration are smooth. For general images such smoothness assumptions are certainly not fulfilled. As another popular representation system, wavelets [12, 43] come to mind. Nowadays, they are one of the most widely used systems in applied harmonic analysis, with various applications ranging from signal compression (e.g. JPEG2000 [10]) and restoration [1] to PDE solvers [11]. In particular, they have the ability to sparsely approximate functions which are smooth apart from isolated point singularities. For general image data, however, such regularity assumptions are still too strict. A characteristic feature of images are edges, leading to curvilinear discontinuities in the data. With respect to such line singularities, wavelet systems do not perform optimally any more. The isotropy of their scaling prohibits an optimal resolution of this kind of anisotropic structure.

With the desire to specifically model the occurrence of edges, the concept of cartoon-like functions emerged. These are piecewise smooth functions featuring discontinuities along lower-dimensional manifolds, in our case along the 1-dimensional edge curves of the image. Based on such functions, suitable models for natural images have been conceived and different model classes have been introduced. Typically, these classes are characterized by a specific smoothness of the regions and by certain conditions on the separating edges.
As examples, let us mention the classic cartoons [5] with C² regularity of the regions and the discontinuity curves, or the horizon classes considered e.g. in [15, 8, 39].

The achievable approximation rate for a class of cartoon-like functions essentially depends on the regularity of the cartoons, including both the smoothness of the edge curves and the smoothness of the regions in between. It was shown in [39, 38] that for C^β regularity of the regions and the separating edges with β > 1, the optimal achievable rate is of order N^{−β}. By information theoretic arguments, it has further been established that this rate cannot be surpassed [18], at least in a class-wise sense. Interestingly, the benchmark N^{−β} is the same for the class of so-called binary cartoons, i.e., cartoon-like functions with constant regions, and it also does not change if one restricts to C^β smooth functions without any edges.

With the model of cartoon-like functions at hand, let us turn again to the question of efficient image representation. In the past, a great amount of energy has been devoted to the effort of constructing dictionaries well-suited for cartoon approximation. Thereby, many different paths have been pursued, and the developed methods can be divided into two general categories: adaptive and nonadaptive methods.

Adaptive methods are by nature more flexible and have the inherent advantage of being able to adjust to the given data. On the downside, the increased flexibility typically comes at the cost of higher computational complexity of the employed approximation and reconstruction schemes. Some prominent examples of adaptive methods for image data are based on wedgelet dictionaries [15] and their higher-order relatives, so-called surflets [9, 8]. They have been shown to reach the optimality bound N^{−β} for binary cartoons with C^β regularity [6, 7].
Other notable dictionaries used for adaptive approximation include beamlets [19], platelets [45], and derivatives of wedgelets such as multiwedgelets [42] or smoothlets [41]. More recently, new adaptive schemes have emerged that use bases, e.g., bandelets [39], grouplets [44], and tetrolets [34]. Quasi-optimal approximation for C^β cartoons with β > 1 can be achieved by some of these schemes.

Turning to nonadaptive methods, an early idea for the sparse representation of straight line singularities were ridgelets, originally conceived as ridge functions, which however are not contained in L²(ℝ²). By giving them a slow decay along the ridge, Donoho constructed an orthonormal basis whose elements are called ‘orthonormal ridgelets’ [16]. Their close relationship to the original concept has been analyzed in [17]. Another construction, based on directional scaling, goes back to Grohs, providing tight frames [22]. This kind of construction coincides with the concept of ‘0-curvelets’ presented below.

To deal with curved edges, numerous types of frames have been developed. An important milestone was the introduction of the first generation of curvelets [4] by Candès and Donoho in 1999. They represent the first frame to reach the optimal approximation order of N^{−2} for C² cartoons via simple thresholding. A modification of this system, the second generation of curvelets [5], was introduced in 2002 by the same authors. It is based on a more elegant and simpler construction principle, yet features the same quasi-optimal approximation properties. Following this early breakthrough, other constructions better suited for digital implementation were developed. Let us mention contourlets [14] by Do and Vetterli and shearlets, whose construction goes back mainly to Guo, Kutyniok, Labate, Lim, and Weiss. The first shearlet construction consisted of band-limited functions and was presented in [35, 28]. Later, more sophisticated shearlet systems were developed, such as e.g. the well-localized band-limited Parseval frame in [30] or even systems of compactly supported shearlets [33]. Like curvelets, those systems provide quasi-optimal approximation for C² cartoons.
For the classic band-limited shearlets this was established in [29], for those with compact support in [37].

A common principle underlying the above constructions is parabolic scaling, a type of scaling optimally adapted to C² singularity curves. It is essential for the quasi-optimal approximation of C² cartoons and led to the notion of parabolic molecules [25]. This concept unifies various parabolically scaled systems under one roof, in particular the classic curvelet and shearlet systems, and is the predecessor of the more general framework of α-molecules [24].

α-Scaling

Comparing the approximation properties of wavelets, curvelets, and ridgelets, a distinct behavior with respect to their ability to resolve edges is characteristic. Ridgelets are optimally adapted to straight edges, curvelets are optimal for C² line singularities, and wavelets for point singularities. This distinct behavior is due to the different scaling laws underlying their respective constructions: isotropic scaling for wavelets, parabolic scaling for curvelets, and directional scaling for ridgelets.

Introducing a parameter α ∈ ℝ and associated α-scaling matrices

A_{α,s} = diag(s, s^α), s > 0, (1)

one can interpolate between these different types of scaling and construct corresponding α-scaled representation systems. Incorporating α-scaling in the original construction of curvelets, for instance, yields so-called α-curvelets [23]. For α ∈ [0, 1], these comprise ridgelets for α = 0, the classic curvelets for α = 1/2, and wavelets for α = 1. In a similar fashion, α-shearlet systems [32, 36] can be obtained by modifying the classic shearlet constructions.

A natural question concerning such α-scaled systems is how their approximation properties are affected by a change of the parameter α. With regard to cartoon approximation, this question has been pursued in [23] for α-curvelet frames and in [32, 36] for α-shearlet frames. It was shown that, if α ∈ [1/2, 1) and if the cartoon f is of regularity C^β with β = α^{−1}, simple thresholding of the coefficients yields N-term approximations f_N with a convergence of

‖f − f_N‖²₂ ≲ N^{−β} log(N)^{1+β} as N → ∞, (2)

which apart from the log-factor is optimal. Later, these results were further extended utilizing the theory of α-molecules [24]. This is a framework providing a unified approach to α-scaled systems, based solely on assumptions on the time-frequency localization of the respective functions. It allows to transfer approximation results obtained for one system of α-molecules to other systems, under certain consistency conditions. In particular, the rate (2) for α-curvelets was generalized (in a weak form) to other α-scaled representation systems [24], which all achieve a rate of N^{−β+ε} with ε > 0 arbitrary. Still, many questions concerning α-scaled representation systems and their ability to approximate cartoon-like functions remain open, e.g., their performance in the range α < 1/2 or their suitability for the approximation of straight edges. In this research we want to address these open questions, shedding (even) more light on the role of the parameter α.

Our exposition starts with a short review of α-scaled systems in Section 2, where also a specific construction of an α-curvelet frame for L²(ℝ²) is presented. This frame, denoted by C_{s,α}, will serve as a prototypical system whose properties have ramifications for other α-scaled systems, such as for example α-shearlets, due to the transference principle of the framework of α-molecules.

In the main part of the article, Sections 3 and 4, we analyze the N-term approximation properties of the frame C_{s,α} with regard to different classes of cartoon images. In Section 3 we start with cartoons with curved edges and first introduce corresponding signal classes of C^β regularity for β ∈ [0, ∞).
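The simple thresholding scheme behind rates such as (2) can be sketched in a few lines. A minimal toy illustration (my own setup, using the orthonormal discrete Fourier basis instead of a curvelet frame, so that the approximation error is exactly the ℓ²-norm of the discarded coefficients):

```python
import numpy as np

def n_term_approximation(f, N):
    """N-term approximation by simple thresholding: keep the N largest
    coefficients of f with respect to an orthonormal basis (here the
    normalized discrete Fourier basis) and discard the rest."""
    n = len(f)
    coeffs = np.fft.fft(f) / np.sqrt(n)          # orthonormal analysis
    keep = np.argsort(np.abs(coeffs))[::-1][:N]  # indices of the N largest moduli
    kept = np.zeros_like(coeffs)
    kept[keep] = coeffs[keep]
    f_N = np.fft.ifft(kept) * np.sqrt(n)         # orthonormal synthesis
    err = np.linalg.norm(f - f_N)                # = l2-norm of discarded coefficients
    return f_N, err

rng = np.random.default_rng(1)
f = rng.standard_normal(128)
_, e16 = n_term_approximation(f, 16)
_, e64 = n_term_approximation(f, 64)
print(e64 <= e16)  # keeping more terms can only decrease the error
```

For a Parseval frame such as C_{s,α} the same recipe is applied to the frame coefficients ⟨f, ψ_μ⟩; there the error is only bounded by (not equal to) the ℓ²-tail of the coefficient sequence, which is the mechanism behind rates like (2).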
Theorem 3.2 recalls N^{−β} as the order of the maximal achievable approximation rate for such C^β cartoons, which cannot be surpassed by any polynomial-depth restricted N-term approximation scheme, independent of the utilized dictionary. Then we recall the quasi-optimal approximation (2) of α-curvelets, proved in [23], if α ∈ [1/2, 1) and β = α^{−1}. Our main findings in Section 3, Theorems 3.9 and 3.11, extend and complement this result. Theorem 3.9 shows that the best possible N-term approximation rate achievable by C_{s,α} for cartoons with curved edges is limited to at most N^{−2/(1−α)}, where α < 1. Theorem 3.11 shows that the rate even reduces to N^{−1/max{α,1−α}} if a simple thresholding scheme is used.

These bounds show that α-curvelets with α ∈ [1/2,
1) cannot take advantage of regularity higher than C^{1/α}. Furthermore, they prohibit optimal approximation of C^β cartoons if β > 2, since decreasing α beyond 1/2 deteriorates the achievable rates compared to the classic curvelets. Hence, with a rate of order N^{−2}, these provide the best performance among all α-curvelet systems if the regularity of the cartoons is at least C² and curved singularities are involved. As a consequence, no curvelet system can reach the optimality bound N^{−β} if β > 2. In fact, up to now, no frame construction is known where a nonadaptive approximation scheme can break this N^{−2} barrier, and the quest for such frames remains open.

In Section 4 we consider cartoons featuring only straight edges. For the corresponding classes of regularity C^β the same optimality benchmark N^{−β} holds true as for the cartoons with curved edges. Our main result of Section 4, Theorem 4.1, shows that a simple thresholding scheme for the α-curvelet frame C_{s,α} yields approximation rates of order N^{−min{α^{−1},β}}. Hence, here a smaller α is beneficial and even ensures quasi-optimal approximation if α ∈ [0, β^{−1}]. This finding generalizes earlier results for ridgelets.

We finish with a short discussion of our results in Section 5. In particular, we point out some ramifications for other α-scaled representation systems, utilizing the framework of α-molecules. All α-scaled systems which are frames and in a certain sense consistent with C_{s,α} feature similar properties, formulated in Theorem 5.3 and Corollary 5.4. Some useful properties of Bessel functions needed in Section 3 are collected in the appendix.

Before we begin, let us fix some general notation. Writing ℕ we will refer to the natural numbers without zero, and we let ℕ₀ := ℕ ∪ {0}. As usual, ℤ, ℝ and ℂ denote the integer, real and complex numbers. Further, we put ℝ⁺₀ := [0, ∞) and ℝ⁺ := (0, ∞). We also introduce the ‘floor’ and ‘ceiling’ of t ∈ ℝ, ⌊t⌋ := max{n ∈ ℤ : n ≤ t} and ⌈t⌉ := min{n ∈ ℤ : n ≥ t}. The symbol T is used for the torus obtained from the interval [0, 2π] by identifying the endpoints. The unit circle in ℂ ≃ ℝ² is denoted by S¹.

The vector space ℝ^d, d ∈ ℕ, is equipped with the Euclidean scalar product ⟨·,·⟩ and associated norm |·|. The notation |·|_p, p ∈ (0, ∞], is used for the p-(quasi-)norms on ℝ^d. For a multi-index m = (m₁, ..., m_d) ∈ ℕ₀^d, ∂^m := ∂₁^{m₁} ··· ∂_d^{m_d} is a differential operator with ∂_i, i ∈ {1, . . .
, d}, the partial derivative in the i-th coordinate direction. Given a vector x = (x₁, ..., x_d) ∈ ℝ^d, we further define x^m := x₁^{m₁} ··· x_d^{m_d} (with the convention 0⁰ := 1).

If A(ω) ≤ C·B(ω) holds true for two quantities A, B ∈ ℝ depending on a set of parameters ω with a uniform constant C > 0, we write A ≲ B or equivalently B ≳ A. If both A ≲ B and B ≲ A hold true, we denote this by A ≍ B.

For measurable subsets Ω ⊆ ℝ^d we let L^p(Ω), p ∈ (0, ∞], denote the usual Lebesgue spaces with respect to the Lebesgue measure. The corresponding (quasi-)norms are denoted by ‖·‖_{L^p(Ω)}; in case Ω = ℝ^d we abbreviate ‖·‖_p := ‖·‖_{L^p(ℝ^d)}. For the scalar product on L²(Ω) the same notation ⟨·,·⟩ as for the Euclidean product on ℝ^d is used. The Lebesgue sequence spaces, for a discrete index set Λ, are denoted by ℓ^p(Λ) with associated (quasi-)norms ‖·‖_{ℓ^p}. The definition of their weak counterparts wℓ^p(Λ), equipped with (quasi-)norms ‖·‖_{wℓ^p}, is recalled in Section 4.

The space C^β_loc(ℝ^d), for an integer β ∈ ℕ₀ ∪ {∞}, shall comprise all continuous real-valued functions on ℝ^d whose classic derivatives up to order β exist. For β ∈ [0, ∞) we then define

C^β(ℝ^d) := { f ∈ C^{⌊β⌋}_loc(ℝ^d) : ‖f‖_{C^β(ℝ^d)} := ‖f‖_{C^{⌊β⌋}(ℝ^d)} + Σ_{|m|=⌊β⌋} Höl(∂^m f, β − ⌊β⌋) < ∞ },

where ‖f‖_{C^{⌊β⌋}(ℝ^d)} := Σ_{|m|≤⌊β⌋} sup_{x∈ℝ^d} |∂^m f(x)| and the Hölder constant of exponent α ∈ [0, 1] is given by

Höl(f, α) := sup_{x,y∈ℝ^d, x≠y} |f(x) − f(y)| / |x − y|^α.

The notation C^β(Ω), for some open subset Ω ⊆ ℝ^d, is used for functions f ∈ C^β(ℝ^d) whose support supp f is compact and contained in the closure Ω̄ of Ω. Frequently, we also need to measure functions f ∈ C^β_loc(ℝ^d), β ∈ ℕ₀, with the following Sobolev norms, where p ∈ [1, ∞],

‖f‖_{β,p} := ‖f‖_{W^{β,p}(ℝ^d)} := Σ_{|m|≤β} ‖∂^m f‖_{L^p(ℝ^d)}.

Finally, we will use the following version of the Fourier transform. For a Schwartz function f ∈ S(ℝ^d),

F f(ξ) := ∫_{ℝ^d} f(x) exp(−2πi⟨x, ξ⟩) dx, ξ ∈ ℝ^d.

As usual, F is extended to the tempered distributions S′(ℝ^d), and we often write f̂ for F f.

2 α-Curvelets

Directional multi-scale systems based on α-scaling feature a characteristic tiling of the frequency domain. The multi-scale structure is reflected by a partition of the Fourier plane into dyadic coronae, further divided into wedge-like tiles, where the energy of the system elements is concentrated. In case of inhomogeneous systems, a ball around the origin corresponds to the low-frequency base scale.

A prototypical instance of such an α-scaled system is the frame C_{s,α} of α-curvelets, thoroughly defined in this section. It is prototypical in the sense that many of its properties transfer – via the framework of α-molecules [24] – to other α-scaled systems. Among these are other α-curvelet constructions [5, 23], but also band-limited [35, 28, 30] as well as compactly supported [33, 32, 36] α-shearlet systems. This fact gives the system C_{s,α} a special significance for our purpose and motivates its detailed discussion here.

Before defining C_{s,α}, whose construction is similar to that of the α-curvelets in [23], let us first elaborate the geometric aspects of the corresponding frequency tiling.
At scales j ≥ 1, the coronae in the frequency plane are given by

C_j := { ξ ∈ ℝ² : C·2^{s(j−1)} ≤ |ξ| ≤ C·2^{s(j+1)} }, (3)

where s > 0 determines the width of the coronae and C > 0 is a constant fixed below. The opening angle of the wedges in the corona C_j is given by the angle

φ_j := π·2^{−⌊js(1−α)⌋−1} (4)

and depends on another parameter α ∈ (−∞, 1]. Each wedge approximately fits into an α-scaled rectangle of dimensions 2^{js} × 2^{jsα}. By combining opposite wedges to wedge pairs, we obtain the tiles for the scales j ≥ 1. There is only one tile associated with the base scale j = 0, the low-frequency ball C₀ := { ξ ∈ ℝ² : |ξ| ≤ C·2^{s} }.

For convenience, let us also introduce the angle φ₀ := π. According to the above construction, at each scale j ∈ ℕ₀ the number of tiles L_j is given by

L₀ := π·φ₀^{−1} = 1 and L_j := π·φ_j^{−1} = 2^{⌊js(1−α)⌋+1}, j ≥ 1. (5)

In the following, the individual tiles will be denoted by W_{j,ℓ} and indexed by the set

J := { (j, ℓ) : j ∈ ℕ₀, ℓ ∈ {−L_j^−, ..., L_j^+} } with L_j^− := ⌊L_j/2⌋ and L_j^+ := ⌈L_j/2⌉ − 1.

Hereby we let W_{0,0} := C₀, and in each corona C_j with j ≥ 1 the tile W_{j,0} shall be aligned horizontally, i.e.,

W_{j,0} := { ξ = (ξ₁, ξ₂) ∈ C_j : |ξ₁| ≥ cos(φ_j/2)·|ξ| }.

The remaining tiles W_{j,ℓ}, ℓ ≠ 0, are obtained via rotations of W_{j,0} by integer multiples φ_{j,ℓ} := ℓφ_j of the angle φ_j defined in (4). Hence, W_{j,ℓ} := R_{j,ℓ}^{−1} W_{j,0} with rotation matrix R_{j,ℓ} := R_{φ_{j,ℓ}}, where

R_φ := ( cos(φ) −sin(φ) ; sin(φ) cos(φ) ), φ ∈ ℝ. (6)

The resulting tiling of the Fourier domain is schematically depicted in Figure 1 (a). We remark that in contrast to [23], where α ∈ [0, 1], we allow α ∈ (−∞,
1] in the α-curvelet construction. This range is natural for the considered inhomogeneous systems. If α > 1, the number of tiles L_j in each corona decreases with rising scale, and eventually L_j = 1. Thus, at high scales, those systems would behave like isotropically scaled systems with α = 1.

α-Curvelets C_{s,α}

Let us now turn to the actual construction of the α-curvelet frame C_{s,α}. To realize the described frequency tiling, smooth functions W_J : ℝ² → ℂ, J ∈ J, are used, with compact support approximately given by the tiles W_J. It is convenient to construct them as tensor products of a radial and an angular component. This allows to realize the desired support separately on the ray ℝ⁺₀ = [0, ∞) and on the circle S¹ ⊂ ℝ². Projecting the coronae C_j onto the ray ℝ⁺₀ yields the intervals

I₀ := C·[0, 2^s] and I_j := C·[2^{s(j−1)}, 2^{s(j+1)}], j ≥ 1. (7)

Figure 1: (a): Tiling of the Fourier domain into coronae C_j and wedges W_{j,ℓ}. (b): Schematic display of the frequency support of a wedge function W_{j,0}.

For the radial subdivision, we thus utilize nonnegative smooth functions U_j ∈ C^∞(ℝ⁺₀), j ∈ ℕ₀, which satisfy the support condition supp U_j ⊆ I_j and for r ∈ ℝ⁺₀

A₁ ≤ Σ_{j≥0} U_j(r)² ≤ B₁ with constants 0 < A₁ ≤ B₁ < ∞. (8)

More concretely, we assume that the functions U_j, j ≥
1, are generated by a single function U ∈ C^∞(ℝ⁺₀, [0, 1]) via U_j(·) := U(2^{−js}·), and that there are 1 < τ₀ < τ₁ < 2^s such that

supp U₀ ⊆ C·[0, τ₁], √A₁ ≤ U₀ ≤ √B₁ on C·[0, τ₀],
supp U ⊆ C·[2^{−s}τ₀, τ₁], √A₁ ≤ U ≤ √B₁ on C·[2^{−s}τ₁, τ₀]. (9)

Such functions exist and can even be constructed with A₁ = B₁ = 1 in (8).

For the angular subdivision, we construct at each scale j ∈ ℕ₀ a smooth partition on the unit circle S¹ ⊂ ℝ², reflecting the angular support of the tiles W_{j,ℓ}. We start with a function Ṽ ∈ C^∞(ℝ, [0, 1]) with

supp Ṽ ⊆ [−π, π], √A₂ ≤ Ṽ ≤ √B₂ on [−π/2, π/2], A₂ ≤ Σ_{k∈ℤ} Ṽ(· − kπ)² ≤ B₂,

where 0 < A₂ ≤ B₂ < ∞. Scaling then gives rise to the functions Ṽ_j(·) := Ṽ(L_j ·) ∈ C^∞(ℝ, [0, 1]), j ∈ ℕ₀. Via the bijection t ↦ e^{it} these functions yield functions Ṽ_{j,0} ∈ C^∞(S¹, [0, 1]) on the circle. We then put

V_{j,0}(ξ) := Ṽ_{j,0}(ξ) + Ṽ_{j,0}(−ξ), ξ ∈ S¹,

and note that √A₂ ≤ V_{0,0} ≤ √B₂ on S¹. Applying the rotation (6) then yields functions V_{j,ℓ}(·) := V_{j,0}(R_{j,ℓ}·) for every J = (j, ℓ) ∈ J, which satisfy A₂ ≤ Σ_{|J|=j} V_J(ξ)² ≤ B₂ for all ξ ∈ S¹. Here we use the notation |J| := j for J = (j, ℓ) ∈ J.

Finally, we are ready to define the wedge functions W_{j,ℓ} ∈ C^∞(ℝ²) as the polar tensor products

W_{j,ℓ}(ξ) := U_j(|ξ|)·V_{j,ℓ}(ξ/|ξ|), ξ ∈ ℝ². (10)

These functions are non-negative ‘bumps’ approximately supported in the corresponding wedges W_{j,ℓ}. They are symmetric, i.e., W_{j,ℓ}(ξ) = W_{j,ℓ}(−ξ) for ξ ∈ ℝ², and they satisfy

A := A₁A₂ ≤ Σ_{J=(j,ℓ)∈J} W_J(ξ)² ≤ B₁B₂ =: B, ξ ∈ ℝ². (11)

Let us analyze the support of W_J in more detail. Recall the angular function Ṽ_{j,0} and note that its support on S¹ covers an angle range of φ_j^+ := 2φ_j with φ_j = π·L_j^{−1} as in (4). Moreover, √A₂ ≤ Ṽ_{j,0} ≤ √B₂ on a range of size φ_j^− := φ_j.
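The squared partition bounds of the form A ≤ Σ_J W_J² ≤ B are what drive the frame property later (cf. Lemma 2.2 below): windowing and then expanding each piece in an orthonormal Fourier basis preserves the total energy. A one-dimensional numerical toy check with two hand-made windows satisfying W₀² + W₁² = 1 (my own simplified choice, not the functions U_j or Ṽ above):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 256
f_hat = rng.standard_normal(n) + 1j * rng.standard_normal(n)

# Two smooth windows with W0^2 + W1^2 = 1, a 1D analogue of the
# condition A <= sum_J W_J^2 <= B with A = B = 1.
t = np.linspace(0.0, np.pi / 2, n)
windows = [np.cos(t), np.sin(t)]
assert np.allclose(sum(w**2 for w in windows), 1.0)

# Expanding each windowed piece in an orthonormal Fourier basis and
# summing the squared coefficients recovers ||f_hat||^2 exactly.
energy = sum(np.sum(np.abs(np.fft.fft(f_hat * w) / np.sqrt(n))**2) for w in windows)
print(np.isclose(energy, np.sum(np.abs(f_hat)**2)))
```

This mirrors, in simplified form, the computation in the proof of Lemma 2.2, where the per-tile Fourier systems {u_{J,k}} play the role of the orthonormal basis.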
Hence, supp V_{j,ℓ} ⊆ A_{j,ℓ} and √A₂ ≤ V_{j,ℓ} ≤ √B₂ on A_{j,ℓ}^− for the angular intervals

A_{j,ℓ} := R_{j,ℓ}^{−1} A_{j,0} with A_{j,0} := { ξ = (ξ₁, ξ₂) ∈ S¹ : |ξ₁| ≥ cos(φ_j^+/2) },
A_{j,ℓ}^− := R_{j,ℓ}^{−1} A_{j,0}^− with A_{j,0}^− := { ξ = (ξ₁, ξ₂) ∈ S¹ : |ξ₁| ≥ cos(φ_j^−/2) }. (12)

Next, recall the functions U_j on the ray with supp U_j ⊆ I_j. Due to (8) and (9) their function values are between √A₁ and √B₁ on

I₀^− := C·[0, τ₀] and I_j^− := C·[2^{s(j−1)}τ₁, 2^{sj}τ₀], j ≥ 1, (13)

respectively. This leads us to the following definition. For J = (j, ℓ) ∈ J we introduce the wedge pairs

W_J^+ := { ξ ∈ ℝ² : |ξ| ∈ I_j, ξ/|ξ| ∈ A_J } and W_J^− := { ξ ∈ ℝ² : |ξ| ∈ I_j^−, ξ/|ξ| ∈ A_J^− }. (14)

The following support properties will be of essential importance later,

supp W_J ⊆ W_J^+ and √A ≤ W_J ≤ √B on W_J^−. (15)

A geometric illustration is displayed in Figure 1 (b). Now we fix C = 2^{−s}/(3π) in (3) such that each W_J^+ is contained in the respective rectangle Ξ_J := R_J^{−1} Ξ_{j,0}, where

Ξ_{j,0} := [−2^{js−1}, 2^{js−1}] × [−2^{jsα−1}, 2^{jsα−1}]. (16)

The rectangles Ξ_{j,0} are of size 2^{js} × 2^{jsα} and hence the Fourier system {u_{j,0,k}}_{k∈ℤ²} given by

u_{j,0,k}(ξ) := 2^{−js(1+α)/2} exp( 2πi (2^{−sj}k₁ξ₁ + 2^{−sjα}k₂ξ₂) ), ξ ∈ ℝ²,

constitutes an orthonormal basis for L²(Ξ_{j,0}). Consequently, the rotated system {u_{j,ℓ,k}}_{k∈ℤ²} of functions

u_{j,ℓ,k}(ξ) := u_{j,0,k}(R_{j,ℓ}ξ), ξ ∈ ℝ², (17)

is an orthonormal basis for L²(Ξ_J). After this preparation, we are ready to define the α-curvelet system C_{s,α}. Definition 2.1.
Let s > 0, α ∈ (−∞, 1], and assume that {W_J}_{J∈J} is a family of functions of the form (10) such that (11) holds for 0 < A ≤ B < ∞. Further, let u_{j,ℓ,k} be the functions defined in (17). The curvelet system C_{s,α}(A, B) := {ψ_μ}_{μ∈M} with associated index set M := J × ℤ² consists of the functions ψ_μ = ψ_{j,ℓ,k} given by

ψ̂_{j,ℓ,k}(ξ) := W_{j,ℓ}(ξ)·u_{j,ℓ,k}(ξ), ξ ∈ ℝ². (18)

Note that C_{s,α}(A, B) depends on the utilized family {W_J}_{J∈J}, which is not accounted for in the notation. The curvelets ψ_μ are real-valued due to the symmetry of W_{j,ℓ}. Their L²-norms may vary slightly with scale, however there are constants 0 < C₁ ≤ C₂ < ∞ such that C₁ ≤ ‖ψ_μ‖₂ ≤ C₂ holds true for all μ ∈ M. Most importantly, the system C_{s,α}(A, B) is a frame for L²(ℝ²). Lemma 2.2.
The system C_{s,α}(A, B) given by (18) is a frame for L²(ℝ²) with frame bounds A and B.

Proof. The functions W_J satisfy condition (11), wherefore

A‖f‖₂² = A‖f̂‖₂² ≤ Σ_{J∈J} ‖f̂ W_J‖₂² ≤ B‖f̂‖₂² = B‖f‖₂² for every f ∈ L²(ℝ²).

Since supp(f̂ W_J) ⊆ Ξ_J and since {u_{J,k}}_{k∈ℤ²} is an orthonormal basis of L²(Ξ_J), we have the orthogonal expansion f̂ W_J = Σ_k ⟨f̂ W_J, u_{J,k}⟩ u_{J,k} χ_{Ξ_J}. The proof is finished by the following equality,

‖f̂ W_J‖₂² = Σ_{k∈ℤ²} |⟨f̂ W_J, u_{J,k}⟩|² = Σ_{k∈ℤ²} |⟨f̂, W_J u_{J,k}⟩|² = Σ_{k∈ℤ²} |⟨f̂, ψ̂_{J,k}⟩|² = Σ_{k∈ℤ²} |⟨f, ψ_{J,k}⟩|². □

The Parseval frame C_{s,α}(1,
1) is of most interest to us, and one might wonder why we did not fix the frame bounds A = B = 1 in the beginning. The reason is that, in the proof of Lemma 4.16, we need the additional flexibility provided by variable A and B. Remark 2.3.
Subsequently, we will write C_{s,α} to refer to the Parseval frame C_{s,α}(1, 1).

Let us finish this section with a short discussion of the situation in the spatial domain. Here the α-curvelets {ψ_{j,ℓ,k}}_{k∈ℤ²} are translates of the functions ψ_{j,ℓ,0}. Indeed, since ψ̂_{j,ℓ,0} = 2^{−js(1+α)/2} W_{j,ℓ} and

u_{j,ℓ,k}(·) = u_{j,0,k}(R_{j,ℓ}·) = 2^{−js(1+α)/2} exp( 2πi ⟨R_{j,ℓ}^{−1} A_j^{−1} k, ·⟩ ),

where R_{j,ℓ} is the rotation matrix defined in (6) and A_j := A_{α,2^{js}} is an α-scaling matrix of the form (1), we have ψ̂_{j,ℓ,k} = ψ̂_{j,ℓ,0} · exp( 2πi ⟨R_{j,ℓ}^{−1} A_j^{−1} k, ·⟩ ) and hence

ψ_{j,ℓ,k} = ψ_{j,ℓ,0}(· − x_{j,ℓ,k}) with x_{j,ℓ,k} := R_{j,ℓ}^{−1} A_j^{−1} k.

Since ψ_{j,ℓ,0} is the rotation of ψ_{j,0,0} by the angle φ_{j,ℓ} = ℓφ_j, we arrive at the representation

ψ_{j,ℓ,k}(x) = ψ_{j,0,0}( R_{j,ℓ}(x − x_{j,ℓ,k}) ). (19)

In fact, these systems are instances of α-molecules, a concept recalled in the definition below.

Definition 2.4 ([24, Def. 2.9]). Let Λ be a set and Φ_Λ : Λ → P a map, assigning to each λ ∈ Λ a point (s_λ, θ_λ, x_λ) ∈ P in the so-called phase space P = ℝ⁺ × T × ℝ². Further, assume that L, M, N₁, N₂ ∈ ℕ₀ ∪ {∞}. A family {m_λ}_{λ∈Λ} of functions in L²(ℝ²) is called a family of α-molecules of order (L, M, N₁, N₂) with respect to the parametrization (Λ, Φ_Λ), if there exist generators a^{(λ)} ∈ L²(ℝ²) such that for all λ ∈ Λ

m_λ(·) = s_λ^{(1+α)/2} a^{(λ)}( A_{α,s_λ} R_{θ_λ}(· − x_λ) ),

and if for each ρ ∈ ℕ₀², |ρ| ≤ L, there is a constant C_ρ > 0 such that for all λ ∈ Λ

|∂^ρ â^{(λ)}(ξ)| ≤ C_ρ min{ 1, s_λ^{−1} + |ξ₁| + s_λ^{−(1−α)}|ξ₂| }^M (1 + |ξ|²)^{−N₁/2} (1 + |ξ₂|²)^{−N₂/2}, ξ ∈ ℝ². (20)

We can deduce from (19) that the α-curvelets ψ_{j,ℓ,k} can be represented in the form

ψ_{j,ℓ,k}(x) = 2^{js(1+α)/2} a_j( A_j R_{j,ℓ}(x − x_{j,ℓ,k}) ) = 2^{js(1+α)/2} a_j( A_j R_{j,ℓ} x − k ) (21)

with respect to the generators

a_j := 2^{−js(1+α)/2} ψ_{j,0,0}( A_j^{−1} · ).
(22)

Since these generators fulfill condition (20), as shown in Lemma 2.5 below, C_{s,α} is a system of α-molecules of arbitrary order, at least in the range α ∈ [0, 1] for which the concept was formulated. The associated parametrization, mapping the curvelet index set M into the phase space P = ℝ⁺ × T × ℝ², is given by

Φ_M : M → P, (j, ℓ, k) ↦ (2^{js}, φ_{j,ℓ}, x_{j,ℓ,k}) = (2^{js}, ℓφ_j, R_{j,ℓ}^{−1} A_j^{−1} k). (23)

Lemma 2.5.
Let M, N₁, N₂ ∈ ℕ₀ and ρ = (ρ₁, ρ₂) ∈ ℕ₀² be fixed. There is a constant C > 0 such that for all j ∈ ℕ₀ the generators (22) satisfy the estimate

|∂^ρ â_j(ξ)| ≤ C min{ 1, 2^{−js} + |ξ₁| + 2^{−js(1−α)}|ξ₂| }^M (1 + |ξ|²)^{−N₁/2} (1 + |ξ₂|²)^{−N₂/2}. (24)

Proof. On the Fourier side the functions (22) have the form â_j = 2^{js(1+α)/2} ψ̂_{j,0,0}(A_j ·) = W_{j,0}(A_j ·). Let j ∈ ℕ₀ be arbitrary. We have supp W_{j,0} ⊆ W_{j,0}^+ and W_{j,0}^+ ⊆ [−2^{js−1}, 2^{js−1}] × [−2^{jsα−1}, 2^{jsα−1}] = Ξ_{j,0}, whence

supp â_j ⊆ [−2^{−1}, 2^{−1}] × [−2^{−1}, 2^{−1}] = Ξ_{0,0}. (25)

Further, if j > 0, ψ̂_{j,0,0} vanishes on the square [−2^{s(j−1)−1}, 2^{s(j−1)−1}]². Consequently, â_j vanishes on [−2^{−s−1}, 2^{−s−1}] × ( 2^{js(1−α)} · [−2^{−s−1}, 2^{−s−1}] ).

The mixed derivatives ∂₁^{ρ₁}∂₂^{ρ₂} W_{j,0} obey, uniformly in j ∈ ℕ₀,

‖∂₁^{ρ₁}∂₂^{ρ₂} W_{j,0}‖_∞ ≲ 2^{−jsρ₁} 2^{−jsαρ₂}. (26)

With the chain rule we deduce

‖∂^ρ â_j‖_∞ = ‖∂^ρ ( W_{j,0}(A_j ·) )‖_∞ = 2^{jsρ₁} 2^{jsαρ₂} ‖( ∂₁^{ρ₁}∂₂^{ρ₂} W_{j,0} )(A_j ·)‖_∞ ≲ 1.

Due to supp ∂^ρ â_j ⊆ supp â_j, this estimate together with the support properties of â_j implies (24). □

With the machinery of α-molecules at our disposal, it is possible to use C_{s,α} as an anchor system whose properties have consequences for other α-scaled systems if they fulfill certain consistency conditions. In particular, approximation properties of C_{s,α} are shared by other α-scaled systems such as e.g. α-shearlets. A short discussion of this can be found in Section 5. For more details on the topic of α-molecules we refer to [24, 20].

In the two central sections of this article, Sections 3 and 4, we study the approximation performance of the α-curvelet frame C_{s,α} with respect to different cartoon classes. We begin in this section with classes of general cartoons, used e.g. to model natural images. In Section 4 we then turn our focus to cartoons featuring only straight edges.

Many suitable and well-established models for natural images are based on the concept of so-called cartoon-like functions. In a nutshell, such functions can be thought of as a patchwork of smooth regions separated from one another by piecewise-smooth discontinuity curves.
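Such a cartoon-like function is easy to sketch numerically. A toy example of the form f = f₀ + f₁χ_D with smooth parts f₀, f₁ and a disc D (all concrete choices here are illustrative only, not taken from the models discussed below):

```python
import numpy as np

# Toy cartoon f = f0 + f1 * chi_D on [-1,1]^2: globally smooth parts f0, f1
# and a jump across the boundary of a disc D, a smooth discontinuity curve.
n = 256
x, y = np.meshgrid(np.linspace(-1, 1, n), np.linspace(-1, 1, n))
f0 = np.exp(-(x**2 + y**2))               # smooth everywhere
f1 = 0.5 * np.cos(2 * x) * np.sin(3 * y)  # smooth, switched on inside D only
chi_D = ((x - 0.1)**2 + y**2 <= 0.25).astype(float)
f = f0 + f1 * chi_D

# The gradient magnitude of the sampled image concentrates along the edge.
gy, gx = np.gradient(f)
edge_strength = np.sqrt(gx**2 + gy**2)
```

The discrete gradient peaks along ∂D, which is precisely the anisotropic structure that α-scaled systems are designed to resolve.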
Their structure imitates the fact that edges, a typical feature of natural images, are characterized by abrupt changes of color and brightness, whereas changes in the regions in between occur smoothly.

Mathematically, models based on this idea can be concretized in different ways. A classic model [5] postulates a compact image domain separated into two $C^2$ regions by a closed $C^2$ discontinuity curve. This model was generalized in various directions, e.g., to take into account piecewise-smooth edges or to allow more general $C^\beta$ regularity with $\beta \in [0,\infty)$. Cartoon classes of this kind have been studied extensively, especially in the range $\beta \in (1,2]$.

Definition 3.1.
Let $\beta \in [0,\infty)$ and $\nu > 0$. Given a domain $\Omega \subseteq \mathbb{R}^2$ and a set $\mathcal{A}$ of admissible subsets of $\mathbb{R}^2$, the class $\mathcal{E}^\beta(\Omega;\mathcal{A},\nu)$ consists of all functions $f \in L^2(\mathbb{R}^2)$ of the form
$$f = f_0 + f_1\chi_D,$$
where $D \in \mathcal{A}$ and $f_0, f_1 \in C^\beta(\mathbb{R}^2)$ with $\operatorname{supp} f_0, f_1 \subseteq \Omega$ and $\|f_0\|_{C^\beta}, \|f_1\|_{C^\beta} \le \nu$. The class $\mathcal{E}^\beta_{bin}(\Omega;\mathcal{A})$ shall be the collection of all `binary functions' $\chi_D$, where $D \in \mathcal{A}$ and $D \subseteq \Omega$.

By suitable choices of $\Omega$ and $\mathcal{A}$, many of the classes appearing in the literature can be retrieved, including classes of horizon-type. In this section we focus on the class $\mathcal{E}^\beta(\Omega;\mathcal{A},\nu)$ with fixed image domain $\Omega = [-1,1]^2$ and certain $C^\beta$ domains as admissible sets $\mathcal{A}$. Similar to [18, 5, 37, 36], we restrict our investigation to star-shaped domains, since those allow a simple parametrization of the boundary curve. The results obtained however also hold true for more general domains.

Let us introduce the collection of admissible sets $\mathrm{Star}^\beta(\nu)$, $\nu > 0$, as all translates of sets $B \subseteq \mathbb{R}^2$ whose boundary $\partial B$ possesses a parametrization $b: \mathbb{T} \to \mathbb{R}^2$ of the form
$$b(\varphi) = \rho(\varphi)\begin{pmatrix}\cos(\varphi)\\ \sin(\varphi)\end{pmatrix}, \quad \varphi \in \mathbb{T} = [0, 2\pi],$$
where the radius function $\rho: \mathbb{T} \to \mathbb{R}_+$ is a $C^\beta$ function with
$$|\partial^{\lfloor\beta\rfloor}\rho(\varphi) - \partial^{\lfloor\beta\rfloor}\rho(\varphi')| \le \nu\rho_0\,|\varphi - \varphi'|^{\beta-\lfloor\beta\rfloor} \quad \text{for all } \varphi, \varphi' \in \mathbb{T}, \qquad (27)$$
where we set $\rho_0 := \min_{\varphi\in\mathbb{T}}\rho(\varphi) \ge \nu^{-1}$. The condition (27) implies that, with $C = C(\beta) = (2\pi)^\beta \ge 1$, we have $\|\rho^{(k)}\|_{C(\mathbb{T})} \le C\rho_0\nu$ for every $k \in \{1,\ldots,\lfloor\beta\rfloor\}$ if $\beta \ge 1$, and $|\rho(\varphi) - \rho(\varphi')| \le C\rho_0\nu$ for $\varphi, \varphi' \in \mathbb{T}$. In particular $\rho_0 \le \rho(\varphi) \le \rho_0(1 + C\nu)$ for all $\varphi \in \mathbb{T}$.

Note that the set $\mathrm{Star}^\beta(\nu)$ differs from the set of star-shaped domains used in [18, 5, 37, 36]. The domains in $\mathrm{Star}^\beta(\nu)$ are not restricted to subsets of $[-1,1]^2$. In fact, every star-shaped $C^\beta$ domain with center $0$ and $\rho_0 > 0$ is contained in $\mathrm{Star}^\beta(\nu)$ for suitably large $\nu$. Moreover, the collection $\mathrm{Star}^\beta(\nu)$ is scaling invariant in the sense that for $B \in \mathrm{Star}^\beta(\nu)$ and $\lambda > 0$ also $\lambda B \in \mathrm{Star}^\beta(\nu)$, provided $\lambda\rho_0 \ge \nu^{-1}$. In addition, with $B \in \mathrm{Star}^\beta(\nu)$ also the complement $B^c = \mathbb{R}^2\setminus B$ is contained in $\mathrm{Star}^\beta(\nu)$.

Building upon Definition 3.1 we now define the class of functions which we want to study in this section. We put $\Omega = [-1,1]^2$ and $\mathcal{A} = \mathrm{Star}^\beta(\nu)$. Further, we assume $\beta \in [0,\infty)$ and $\nu > 0$. For the resulting class $\mathcal{E}^\beta([-1,1]^2;\mathrm{Star}^\beta(\nu),\nu)$ we simplify the notation
$$\mathcal{E}^\beta([-1,1]^2;\nu) := \mathcal{E}^\beta([-1,1]^2;\mathrm{Star}^\beta(\nu),\nu). \qquad (28)$$
The associated binary class shall be denoted by $\mathcal{E}^\beta_{bin}([-1,1]^2;\nu) := \mathcal{E}^\beta_{bin}([-1,1]^2;\mathrm{Star}^\beta(\nu))$.

Before we investigate the approximation performance of the $\alpha$-curvelet frame $\mathcal{C}_{s,\alpha}$ with respect to the class $\mathcal{E}^\beta([-1,1]^2;\nu)$, let us take a broader stance and aim for the best possible $N$-term approximation in case we can freely choose the utilized dictionary. Of course, a countable dense subset of $L^2(\mathbb{R}^2)$ would yield arbitrarily good 1-term approximations. This shows that, without further restrictions, the question of best possible approximation is not well-posed.

To cast a realistic scenario, when computing $N$-term approximations typically a constraint on the search depth is imposed. More concretely, given a fixed ordering of the dictionary and some polynomial $\pi$, it is common to allow only $N$-term approximants built from the first $\pi(N)$ elements of the dictionary. Under this so-called polynomial depth search constraint, an upper bound on the maximal achievable approximation rate was first derived by Donoho [18, Thm. 1] for binary $C^\beta$ cartoons in the range $\beta \in (1,2]$. The following version applies to the class $\mathcal{E}^\beta([-1,1]^2;\nu)$ specified in (28).

Theorem 3.2.
Let $\beta, \gamma \in [0,\infty)$ and $\nu > 0$. Assume that there is a constant $C > 0$ such that
$$\sup_{f\in\mathcal{E}^\beta([-1,1]^2;\nu)} \|f - f_N\|_2^2 \le CN^{-\gamma} \quad \text{for all } N \in \mathbb{N},$$
where $f_N$ denotes the best $N$-term approximation of $f$ obtained by polynomial depth search in a fixed dictionary. Then necessarily $\gamma \le \beta$.

In principle, this is a known result (see e.g. [36]). However, for reasons of completeness, we outline a short proof based on the technique used in [18]. It relies on Theorem 3.4 below and the fact that the class $\mathcal{E}^\beta([-1,1]^2;\nu)$ contains a copy of $\ell^0_p$ for $p = 2/(\beta+1)$. Let us recall this notion introduced in [18].

Definition 3.3 ([18, Def. 1&2]). A function class $\mathcal{F} \subseteq L^2(\mathbb{R}^2)$ is said to contain an embedded orthogonal hypercube of dimension $m$ and side-length $\delta$ if there exist $f \in \mathcal{F}$ and orthogonal functions $\psi_\ell \in L^2(\mathbb{R}^2)$, $\ell \in \{1,\ldots,m\}$, with $\|\psi_\ell\|_2 = \delta$ such that the collection of hypercube vertices embeds, i.e.,
$$\Big\{ f + \sum_{\ell=1}^m \epsilon_\ell\psi_\ell \,:\, \epsilon = (\epsilon_1,\ldots,\epsilon_m) \in \{0,1\}^m \Big\} \subseteq \mathcal{F}.$$
It is said to contain a copy of $\ell^0_p$, $p > 0$, if it contains a sequence of embedded orthogonal hypercubes, whose associated dimensions $m_k$ and side-lengths $\delta_k$ satisfy $\delta_k \to 0$ for $k \to \infty$ and, with a constant $C > 0$, $C\delta_k^{-p} \le m_k$ for all $k \in \mathbb{N}$.

The significance of this notion is due to the following result, which was first obtained in [18, Thm. 2]. The reformulated version below can be found in [23, Thm. 2.2].
Theorem 3.4 ([23, Thm. 2.2]). Suppose that a class of functions $\mathcal{F} \subseteq L^2(\mathbb{R}^2)$ is uniformly $L^2$-bounded and contains a copy of $\ell^0_p$. Then, allowing only polynomial depth search in a given dictionary, there is a constant $C > 0$ such that for every $N_0 \in \mathbb{N}$ there is a function $f \in \mathcal{F}$ and an $N \in \mathbb{N}$, $N \ge N_0$, such that
$$\|f - f_N\|_2^2 \ge C\big(N\log_2(N)\big)^{-(2-p)/p},$$
where $f_N$ denotes the best $N$-term approximation under the polynomial depth search constraint.

It remains to investigate for which $p > 0$ the class $\mathcal{E}^\beta([-1,1]^2;\nu)$ contains a copy of $\ell^0_p$. To this end, let us introduce the following subclass of smooth functions for $\beta \in [0,\infty)$ and $\nu > 0$:
$$C^\beta([-1,1]^2;\nu) := \big\{ f \in C^\beta([-1,1]^2) \,:\, \|f\|_{C^\beta} \le \nu \big\}. \qquad (29)$$
Note that the choice $\Omega = [-1,1]^2$ and $\mathcal{A} = \{\emptyset\}$ in Definition 3.1 yields this class. As a consequence,
$$C^\beta([-1,1]^2;\nu) \subset \mathcal{E}^\beta([-1,1]^2;\nu). \qquad (30)$$
Lemma 3.5 below is the 2D analogue of the statement of [36, Thm. 3.2]. It shows, in particular, that $C^\beta([-1,1]^2;\nu)$ contains a copy of $\ell^0_{2/(\beta+1)}$. Hence, as a consequence of (30), also $\mathcal{E}^\beta([-1,1]^2;\nu)$ contains a copy of $\ell^0_{2/(\beta+1)}$. An application of Theorem 3.4 thus yields Theorem 3.2.

Lemma 3.5.
Let $\nu > 0$, $\beta \in [0,\infty)$, and $p = 2/(\beta+1)$. Then the following holds true.

(i) The function class $C^\beta([-1,1]^2;\nu)$ contains a copy of $\ell^0_p$.

(ii) The class of binary cartoons $\mathcal{E}^\beta_{bin}([-1,1]^2;\nu)$ contains a copy of $\ell^0_p$ if $\nu \ge 1$; otherwise it only contains the zero-function.

Proof. The proof is a 2D adaptation of the proof of [36, Thm. 3.2].

Summarizing, this establishes $N^{-\beta}$ as an upper bound for the possible order of approximation for general $C^\beta$ cartoons. This rate is the benchmark against which the performance of $\mathcal{C}_{s,\alpha}$ has to be measured. We end this paragraph with the following observation.

Remark 3.6.
According to Lemma 3.5(i), the bound of Theorem 3.2 actually holds true for the class $C^\beta([-1,1]^2;\nu)$. This is a stronger statement due to the inclusion (30). Further, due to Lemma 3.5(ii), a statement analogous to Theorem 3.2 holds true for the binary class $\mathcal{E}^\beta_{bin}([-1,1]^2;\nu)$ if $\nu \ge 1$.

According to Theorem 3.2 and Remark 3.6, the order of the $N$-term approximation rate achievable for the classes $\mathcal{E}^\beta_{bin}([-1,1]^2;\nu)$, $\nu \ge 1$, and $\mathcal{E}^\beta([-1,1]^2;\nu)$, $\nu > 0$, cannot exceed $N^{-\beta}$. This bound is valid for arbitrary dictionaries and independent of the approximation scheme employed, as long as it respects a polynomial depth search condition. Even adaptive approximation schemes cannot perform better. Schemes where these rates are provably achieved, at least up to order, have been developed for binary cartoons based on wedgelets [15] and surflets [9], and for general cartoons utilizing bandelets [38, 39]. These results show that the optimality benchmark $N^{-\beta}$ can indeed be realized in practice, at least up to order. However, the utilized schemes are mostly adaptive; only for certain cartoon classes are nonadaptive methods with quasi-optimal performance known.

A breakthrough concerning the nonadaptive approximation of $C^2$ cartoons with curved edges was the introduction of curvelets by Cand\`es and Donoho [4, 5]. By a simple thresholding scheme, curvelet frames achieve an approximation rate matching the class bound $N^{-2}$ up to a log-factor. The reason for this performance is the parabolic scaling employed. The following argument shall heuristically explain why this type of scaling is ideal for the representation of $C^2$ edges.

In local Cartesian coordinates, a $C^2$ curve can be represented as the graph $(E(x_2), x_2)$ of a function $E \in C^2(\mathbb{R})$, and one can choose a coordinate system such that $E'(0) = E(0) = 0$. A Taylor expansion then yields approximately $E(x_2) \approx \tfrac{1}{2}E''(0)x_2^2$, which matches the essential support relation width $\approx$ length$^2$ of parabolically scaled functions. Hence, those can provide optimal resolution of the curve across all scales. A similar heuristic applies to $C^\beta$ curves if $\beta \in (1,2]$: a Taylor expansion of $E \in C^\beta(\mathbb{R})$ yields $|E(x_2)| \lesssim x_2^\beta$. The curve is thus contained in a rectangle of size width $\approx$ length$^\beta$, which suggests $\alpha$-scaling with $\alpha = \beta^{-1}$ for optimal approximation. And indeed, the classic approximation result by Cand\`es and Donoho could be extended in [23, Thm. 4.1] to the range $\beta \in (1,2]$. Note that the class $\mathcal{E}^\beta([-1,1]^2;\nu)$ used here is not fully identical to the class in [23]. Moreover, only curvelet frames of the type $\mathcal{C}_{s,\alpha}$ with $s = 1$ were considered there. It is not hard to verify though that the proof carries over to general $s > 0$.

Theorem 3.7 ([23, Thm. 4.1]). Let $\beta \in (1,2]$, $\nu > 0$. For the choice $\alpha = \beta^{-1}$, $s > 0$ arbitrary, the frame of $\alpha$-curvelets $\mathcal{C}_{s,\alpha}$ provides almost optimal sparse approximations for the class $\mathcal{E}^\beta([-1,1]^2;\nu)$. More precisely, there exists a constant $C > 0$ such that for every $f \in \mathcal{E}^\beta([-1,1]^2;\nu)$ and $N \in \mathbb{N}$
$$\|f - f_N\|_2^2 \le CN^{-\beta}\log(1+N)^{1+\beta},$$
where $f_N$ denotes the $N$-term approximation of $f$ obtained by choosing the $N$ largest coefficients.

This theorem naturally raises the question of extendibility beyond the range $\beta \in (1,2]$, i.e., whether the choice $\alpha = \beta^{-1}$ is still optimal for $\beta > 2$. We will see that for cartoons with curved edges and $\beta > 2$ this is not the case. In fact, the optimal choice is still $\alpha = \tfrac{1}{2}$, and choosing $\alpha < \tfrac{1}{2}$ deteriorates the approximation performance.

The main results of this subsection, Theorems 3.9 and 3.11, establish bounds on the achievable $N$-term approximation rate for the class $\mathcal{E}^\beta([-1,1]^2;\nu)$, $\beta \in [0,\infty)$, when using the $\alpha$-curvelet frame $\mathcal{C}_{s,\alpha}$ for approximation. Unlike the bounds in Theorem 3.2 associated with the signal class, the bounds derived here are tied to the particular approximation system $\mathcal{C}_{s,\alpha}$. However, via the framework of $\alpha$-molecules they are also effective for other $\alpha$-scaled systems, such as $\alpha$-shearlets, as discussed in Section 5.

In order to establish these bounds we study the approximability of certain example cartoons. As a suitable object, we choose the characteristic function of the ball $B(0,\tfrac{1}{2}) \subset \mathbb{R}^2$ of radius $\tfrac{1}{2}$, for which we subsequently use the symbol
$$\Theta(x) := \chi_{B(0,\frac{1}{2})}(x_1,x_2), \quad x \in \mathbb{R}^2. \qquad (31)$$
This function embodies an exceptionally regular cartoon with a closed curved $C^\infty$-singularity. It is radially symmetric and binary, contained in $\mathcal{E}^\beta_{bin}([-1,1]^2;\nu)$ for arbitrary $\beta \in [0,\infty)$ and $\nu \ge$
2. Furthermore, for every $\beta \in [0,\infty)$ and $\nu \ge 2$ there is a $\gamma > 0$ such that $\gamma\Theta \in \mathcal{E}^\beta([-1,1]^2;\nu)$, wherefore the approximability of $\Theta$ has implications for the approximability of these cartoon classes.

The Fourier transform of $\Theta$ is explicitly known. Let $J_1$ denote the Bessel function of order 1; then according to (67)
$$\hat\Theta(\xi) = \frac{J_1(\pi|\xi|)}{2|\xi|}, \quad \xi \in \mathbb{R}^2. \qquad (32)$$
Some properties of $J_1$ and Bessel functions in general are collected in the appendix.

At the center of the following investigation is the lemma below, which estimates the energy of $\hat\Theta$ contained in the wedges $W_J$, $J \in \mathcal{J}$. Let $\{W_J\}_{J\in\mathcal{J}}$ be a family of functions of the kind (10) with property (11) for $0 < A \le B < \infty$. Further, let $W^-_J := \chi_{W^-_J}$ and $W^+_J := \chi_{W^+_J}$ be the characteristic functions of the sets $W^-_J$ and $W^+_J$ defined in (14).

Lemma 3.8. There are constants $0 < C_1 \le C_2 < \infty$, independent of scale $j \ge j_0$, where $j_0 \in \mathbb{N}$ is a suitable base scale, such that for all $J \in \mathcal{J}$ with $|J| \ge j_0$, where $|J| = j$ for $J = (j,\ell) \in \mathcal{J}$,
$$AC_1\,2^{-js(2-\alpha)} \le A\|\hat\Theta W^-_J\|_2^2 \le \|\hat\Theta W_J\|_2^2 \le B\|\hat\Theta W^+_J\|_2^2 \le BC_2\,2^{-js(2-\alpha)}.$$

Proof.
Let us recall the Bessel function $J_1$ of order 1 and its asymptotic behavior. According to (69) there is a constant $C > 0$ and a function $R$ on $[1,\infty)$ satisfying $|R(r)| \le Cr^{-3/2}$ such that
$$J_1(r) = \sqrt{\frac{2}{\pi r}}\cos\Big(r - \frac{3\pi}{4}\Big) + R(r) \quad \text{for } r \ge 1.$$
This allows us to separate terms of higher order from $J_1^2$. We decompose
$$J_1(r)^2 = \Big[\frac{2}{\pi r}\cos^2\Big(r - \frac{3\pi}{4}\Big)\Big] + \Big[2\sqrt{\frac{2}{\pi r}}\cos\Big(r - \frac{3\pi}{4}\Big)R(r) + R(r)^2\Big] =: T_1(r) + T_2(r).$$
For the following argumentation we need the square wave function $\sqcap: \mathbb{R} \to \{0,1\}$ defined by
$$\sqcap(r) := \begin{cases} 1, & r \in \bigcup_{k\in\mathbb{Z}} k\pi + [-\tfrac{\pi}{2}, 0], \\ 0, & r \in \bigcup_{k\in\mathbb{Z}} k\pi + (0, \tfrac{\pi}{2}). \end{cases}$$
For all $r \in \mathbb{R}$ it has the property $2\cos^2(r - 3\pi/4) \ge \sqcap(r)$. Therefore we can deduce for $1 \le a \le b$
$$\int_a^b T_1(r)\,r^{-1}\,dr = \frac{2}{\pi}\int_a^b \cos^2\Big(r - \frac{3\pi}{4}\Big)r^{-2}\,dr \ge \frac{1}{\pi}\int_a^b \sqcap(r)\,r^{-2}\,dr \ge \frac{1}{2}\sum_{k\in I_{a,b}} (k\pi)^{-2}$$
with $I_{a,b} := \{k \in \mathbb{Z} \,:\, k\pi \in [a + \tfrac{\pi}{2}, b]\}$. To proceed, we use the relation
$$\sum_{k=m}^n (k\pi)^{-2} \ge \frac{1}{\pi}\int_{m\pi}^{(n+1)\pi} k^{-2}\,dk,$$
which is valid for all $m, n \in \mathbb{N}$ with $m \le n$. We obtain
$$\frac{1}{2}\sum_{k\in I_{a,b}} (k\pi)^{-2} \ge \frac{1}{2\pi}\int_{a+2\pi}^{b} k^{-2}\,dk = \frac{1}{2\pi}\Big(\int_a^b k^{-2}\,dk - \int_a^{a+2\pi} k^{-2}\,dk\Big) \ge \frac{1}{2\pi}\big(a^{-1} - b^{-1}\big) - a^{-2}.$$
Next, we see that with a constant $C_0 > 0$ for $1 \le a \le b$
$$\int_a^b |T_2(r)|\,r^{-1}\,dr \le C_0\int_a^b r^{-3}\,dr \le C_0\int_a^\infty r^{-3}\,dr \le C_0\,a^{-2}.$$
Altogether,
$$\int_a^b J_1(r)^2\,r^{-1}\,dr \ge \frac{1}{2\pi}\big(1 - ab^{-1}\big)a^{-1} - (1 + C_0)\,a^{-2}.$$
If $c = ab^{-1} < 1$ is fixed, this yields for $a \ge 4\pi(1+C_0)(1-c)^{-1}$ the estimate
$$\int_a^{a/c} J_1(r)^2\,r^{-1}\,dr \ge \frac{1}{4\pi}(1-c)\,a^{-1}. \qquad (33)$$
After this preparation, we can now turn to the actual proof of the assertion. The relation
$$A\|\hat\Theta W^-_J\|_2^2 \le \|\hat\Theta W_J\|_2^2 \le B\|\hat\Theta W^+_J\|_2^2$$
is a direct consequence of (15) and $\|W_J\|_\infty \le \sqrt{B}$. Let $I_j$ be the intervals defined in (7). Further, recall the intervals $I^-_j \subset I_j$ defined in (13). Using (32) and the definition (14) of $W^-_J$ we calculate
$$\|\hat\Theta W^-_J\|_2^2 = \int_{W^-_J} \frac{J_1(\pi|\xi|)^2}{4|\xi|^2}\,d\xi = \int_{I^-_j}\int_{A^-_J} \frac{J_1(\pi r)^2}{4r^2}\,r\,d\varphi\,dr \asymp 2^{-js(1-\alpha)}\int_{\pi I^-_j} J_1(r)^2\,r^{-1}\,dr.$$
The intervals $I^-_j$ scale like $\sim 2^{js}$. Hence, if $j_0 \in \mathbb{N}$ is chosen large enough, by (33)
$$\|\hat\Theta W^-_J\|_2^2 \asymp 2^{-js(1-\alpha)}\int_{\pi I^-_j} J_1(r)^2\,r^{-1}\,dr \gtrsim 2^{-js(1-\alpha)}\,2^{-js} = 2^{-js(2-\alpha)}.$$
The estimate from above is much easier to establish. If $j_0 \in \mathbb{N}$ is such that $\pi I_j \subset [1,\infty)$ we have
$$\|\hat\Theta W^+_J\|_2^2 = \int_{W^+_J} \frac{J_1(\pi|\xi|)^2}{4|\xi|^2}\,d\xi = \int_{I_j}\int_{A_J} \frac{J_1(\pi r)^2}{4r^2}\,r\,d\varphi\,dr \asymp 2^{-js(1-\alpha)}\int_{\pi I_j} J_1(r)^2\,r^{-1}\,dr \lesssim 2^{-js(1-\alpha)}\int_{I_j} r^{-2}\,dr \lesssim 2^{-js(2-\alpha)}.$$

Based on Lemma 3.8 we can prove the first main result of this article.
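Before doing so, the two-sided estimate of Lemma 3.8 can be checked numerically. The following sketch makes simplifying assumptions (corona boundaries $[2^{js}, 2^{js+1}]$, parameters $s = 1$, $\alpha = 1/2$, all constants ignored) and evaluates the radial Bessel integral directly; the observed dyadic decay of the wedge energies should be close to the predicted exponent $-s(2-\alpha)$:

```python
import math

# Illustrative numerical check of Lemma 3.8 (simplified coronas and constants).
def j1(x, n=2000):
    # Bessel function J_1 via Bessel's integral (midpoint rule):
    # J_1(x) = (1/pi) * int_0^pi cos(theta - x*sin(theta)) dtheta
    h = math.pi / n
    return sum(math.cos((k + 0.5) * h - x * math.sin((k + 0.5) * h))
               for k in range(n)) * h / math.pi

def wedge_energy(j, s=1.0, alpha=0.5, n=600):
    # ||hat(Theta) W_J||^2 for one wedge: angular width ~ 2^{-js(1-alpha)}
    # times the radial integral of J_1(pi r)^2 / (4 r) over the corona
    a, b = 2.0 ** (j * s), 2.0 ** ((j + 1) * s)
    h = (b - a) / n
    radial = sum(j1(math.pi * (a + (k + 0.5) * h)) ** 2 / (4 * (a + (k + 0.5) * h))
                 for k in range(n)) * h
    return 2.0 ** (-j * s * (1 - alpha)) * radial

# Lemma 3.8 predicts wedge_energy(j) ~ 2^{-js(2-alpha)}, i.e. a dyadic decay
# slope of -s(2-alpha) = -1.5 for s = 1, alpha = 1/2.
e2, e3, e4 = wedge_energy(2), wedge_energy(3), wedge_energy(4)
slopes = (math.log2(e3 / e2), math.log2(e4 / e3))
print(slopes)  # both should be near -1.5
```

The slopes are only asymptotically exact, so moderate deviations at these small scales are expected.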
Theorem 3.9.
Let $\mathcal{C}_{s,\alpha}$ be the $\alpha$-curvelet frame constructed in Section 2 for fixed $\alpha \in (-\infty, 1)$ and $s > 0$. There exists a constant $C > 0$ such that for any given $N \in \mathbb{N}$ every $N$-term approximation $f_N$ of $\Theta$ with respect to $\mathcal{C}_{s,\alpha}$ (not even subject to a polynomial depth search constraint) satisfies
$$\|\Theta - f_N\|_2^2 \ge CN^{-\frac{1}{1-\alpha}}.$$

Proof.
Let $N \in \mathbb{N}$ be fixed and assume that
$$f_N = \sum_{r=1}^N \theta_{J_r,k_r}\,\psi_{J_r,k_r}$$
is a linear combination of $\alpha$-curvelets $\psi_{J_r,k_r}$ with coefficients $\theta_{J_r,k_r} \in \mathbb{R}$. The curvelets $\psi_{J_r,k_r} \in \mathcal{C}_{s,\alpha}$ satisfy $\operatorname{supp}\hat\psi_{J_r,k_r} \subseteq W^+_{J_r}$ as recorded in (15). It follows that $\operatorname{supp}\hat f_N \subseteq \mathcal{W}_N$, where $\mathcal{W}_N := \bigcup_{J\in\mathcal{J}_N} W^+_J$ for $\mathcal{J}_N := \{J_1,\ldots,J_N\} \subset \mathcal{J}$. Using the notation $\mathcal{J}^c_N := \mathcal{J}\setminus\mathcal{J}_N$ and $\mathcal{W}^c_N := \mathbb{R}^2\setminus\mathcal{W}_N$ we get with Lemma 3.8
$$\|\Theta - f_N\|_2^2 = \|\hat\Theta - \hat f_N\|_2^2 \ge \|\hat\Theta\|^2_{L^2(\mathcal{W}^c_N)} \ge \sum_{J\in\mathcal{J}^c_N} \|\hat\Theta W^-_J\|_2^2 \gtrsim \sum_{J\in\mathcal{J}^c_N} 2^{-js(2-\alpha)}.$$
We want to bound the right-hand side from below. By (5), the number of tiles in each corona $\mathcal{C}_j$, $j \in \mathbb{N}$, is given by $L_j$, where $L_0 = 1$ and $L_j = 2^{\lfloor js(1-\alpha)\rfloor + 1}$ for $j \ge 1$. Let $j(N) \in \mathbb{N}$ denote the unique number such that
$$\sum_{j=0}^{j(N)-1} L_j < N \le \sum_{j=0}^{j(N)} L_j.$$
Since $2^{-js(2-\alpha)}$ decreases with rising scale we obtain
$$\sum_{J\in\mathcal{J}^c_N} 2^{-js(2-\alpha)} \ge \sum_{j=j(N)+1}^\infty L_j\,2^{-js(2-\alpha)} \ge \sum_{j=j(N)+1}^\infty 2^{-js} \gtrsim 2^{-j(N)s}.$$
Here we used $L_j \ge 2^{js(1-\alpha)}$. Since $N \lesssim \sum_{j=0}^{j(N)} 2^{js(1-\alpha)} \lesssim 2^{j(N)s(1-\alpha)}$ we can finally deduce
$$\|\Theta - f_N\|_2^2 \gtrsim 2^{-j(N)s} = \big(2^{j(N)s(1-\alpha)}\big)^{-\frac{1}{1-\alpha}} \gtrsim N^{-\frac{1}{1-\alpha}}.$$

This result can be strengthened if we restrict to greedy $N$-term approximations obtained by thresholding the coefficients. Essential is the following observation, which has also been used in [23]. Due to its importance we give a rigorous proof here.

Lemma 3.10.
There is a constant $C > 0$ such that all curvelets $\psi_\mu \in \mathcal{C}_{s,\alpha}$, $\mu \in M$, satisfy
$$\|\psi_\mu\|_{L^1} \le C\,2^{-js(1+\alpha)/2}.$$

Proof.
Let $a_j$ be the functions from (22) and recall that according to (25) the support of $\hat a_j$ is contained in the unit square $\Xi_{0,0}$ for every $j \in \mathbb{N}_0$. Let $\mathrm{Id}$ denote the identity operator. We have the estimate
$$\big\|\mathcal{F}^{-1}\big((\mathrm{Id} - \partial_1^2)(\mathrm{Id} - \partial_2^2)\hat a_j\big)\big\|_\infty \le \big\|(\mathrm{Id} - \partial_1^2)(\mathrm{Id} - \partial_2^2)\hat a_j\big\|_{L^1} \le \big\|(\mathrm{Id} - \partial_1^2)(\mathrm{Id} - \partial_2^2)\hat a_j\big\|_\infty.$$
According to Lemma 2.5 the right-hand side is bounded uniformly over all scales. We conclude that there is a constant $C > 0$, independent of $j \in \mathbb{N}_0$, such that
$$\sup_{x\in\mathbb{R}^2} \big|(1 + x_1^2)(1 + x_2^2)\,a_j(x)\big| \le C.$$
In other words, $|a_j(x)| \le C(1 + x_1^2)^{-1}(1 + x_2^2)^{-1}$. Using the representation (21) we obtain
$$|\psi_{j,0,0}(x)| = 2^{js(1+\alpha)/2}\,|a_j(A_jx)| \le C\,2^{js(1+\alpha)/2}\,(1 + 2^{2js}x_1^2)^{-1}(1 + 2^{2js\alpha}x_2^2)^{-1}$$
and hence
$$\int_{\mathbb{R}^2} |\psi_{j,0,0}(x)|\,dx \lesssim 2^{js(1+\alpha)/2}\int_{\mathbb{R}^2} (1 + 2^{2js}x_1^2)^{-1}(1 + 2^{2js\alpha}x_2^2)^{-1}\,dx = 2^{-js(1+\alpha)/2}\int_{\mathbb{R}^2} (1 + x_1^2)^{-1}(1 + x_2^2)^{-1}\,dx \lesssim 2^{-js(1+\alpha)/2}.$$
Since $\|\psi_{j,\ell,k}\|_{L^1} = \|\psi_{j,0,0}\|_{L^1}$ the proof is finished.

Lemma 3.10 allows us to deduce a simple a-priori estimate of the curvelet coefficient size, namely
$$|\theta_\mu| = |\langle f, \psi_\mu\rangle| \le \|f\|_\infty\|\psi_\mu\|_{L^1} \le C\,\|f\|_\infty\,2^{-js(1+\alpha)/2} \quad \text{for } \mu = (j,\ell,k) \in M. \qquad (34)$$
Note that the constant $C > 0$ depends only on the frame $\mathcal{C}_{s,\alpha}$. Using (34) we now prove a stronger statement than Theorem 3.9 for greedy approximations.

Theorem 3.11.
Let $\alpha \in (-\infty, 1)$ and $s > 0$ be fixed. Further, let $f_N$ denote the $N$-term approximation of $\Theta$ with respect to the $\alpha$-curvelet frame $\mathcal{C}_{s,\alpha}$ obtained by thresholding the coefficients. There is a constant $C > 0$ such that for every $N \in \mathbb{N}$
$$\|\Theta - f_N\|_2^2 \ge CN^{-\frac{1}{\max\{\alpha,\,1-\alpha\}}}.$$

Proof. If $\alpha \le \tfrac{1}{2}$ the assertion is true by Theorem 3.9. It remains to handle the range $\tfrac{1}{2} < \alpha < 1$. Let $\theta_{J_r,k_r} = \langle\Theta, \psi_{J_r,k_r}\rangle$, $r \in \{1,\ldots,N\}$, be the $N$ largest curvelet coefficients, which determine the approximant $f_N := \sum_{r=1}^N \theta_{J_r,k_r}\psi_{J_r,k_r}$. On the Fourier side the curvelet $\psi_{J,k} \in \mathcal{C}_{s,\alpha}$ is the product of the functions $W_J$ and $u_{J,k}$ defined in (10) and (17), respectively. Using condition (11) we first estimate
$$\|\Theta - f_N\|_2^2 = \|\hat\Theta - \hat f_N\|_2^2 \ge B^{-1}\sum_{J\in\mathcal{J}} \|\hat\Theta W_J - \hat f_N W_J\|_2^2 \ge \frac{A}{B}\sum_{J\in\mathcal{J}} \|\hat\Theta W^-_J - \hat f_N W^-_J\|_2^2,$$
where $W^-_J$ is the characteristic function of the set $W^-_J$ defined in (14). The triangle inequality yields
$$\|\hat\Theta W^-_J\|_2 \le \|\hat\Theta W^-_J - \hat f_N W^-_J\|_2 + \|\hat f_N W^-_J\|_2 \quad \text{for every } J \in \mathcal{J}. \qquad (35)$$
Observe the relation $\sqrt{A}\,W^-_J \le W^-_J W_J \le \sqrt{B}\,W^-_J$ and $W^-_J W_{J'} = 0$ for $J \ne J'$. Therefore, it holds
$$\hat f_N W^-_J = \sum_{r=1}^N \theta_{J_r,k_r}\hat\psi_{J_r,k_r}W^-_J = \sum_{r=1}^N \theta_{J_r,k_r}u_{J_r,k_r}W_{J_r}W^-_J \asymp \sum_{k\in K_J} \theta_{J,k}u_{J,k}W^-_J$$
with $K_J = \{k_r \,:\, r \in \{1,\ldots,N\},\, J_r = J\}$. Next, we use that $\{u_{J,k}\}_k$ is an orthonormal basis of $L^2(\Xi_J)$, where $\Xi_J \supset W^-_J$ is the set defined in (16). We estimate
$$\Big\|\sum_{k\in K_J}\theta_{J,k}u_{J,k}W^-_J\Big\|_2^2 \le \Big\|\sum_{k\in K_J}\theta_{J,k}u_{J,k}\Big\|^2_{L^2(\Xi_J)} = \sum_{k\in K_J} |\theta_{J,k}|^2.$$
The frame coefficients satisfy the a-priori estimate $|\theta_{J,k}|^2 \lesssim 2^{-js(1+\alpha)}$ according to (34). Thus we obtain
$$\|\hat f_N W^-_J\|_2^2 \asymp \Big\|\sum_{k\in K_J}\theta_{J,k}u_{J,k}W^-_J\Big\|_2^2 \lesssim \#(K_J)\,2^{-js(1+\alpha)}.$$
By Lemma 3.8 we have $\|\hat\Theta W^-_J\|_2^2 \gtrsim 2^{-js(2-\alpha)}$. We deduce from (35)
$$\|\hat\Theta W^-_J - \hat f_N W^-_J\|_2 \ge \|\hat\Theta W^-_J\|_2 - \|\hat f_N W^-_J\|_2, \quad \text{hence} \quad \|\hat\Theta W^-_J - \hat f_N W^-_J\|_2^2 \gtrsim \max\big\{0,\, 2^{-js(2-\alpha)} - \#(K_J)\,2^{-js(1+\alpha)}\big\}.$$
Altogether, we conclude
$$\|\Theta - f_N\|_2^2 \gtrsim \sum_{J\in\mathcal{J}} \max\big\{0,\, 2^{-js(2-\alpha)} - \#(K_J)\,2^{-js(1+\alpha)}\big\}.$$
Note that $\sum_J \#(K_J) \le N$. To derive a lower bound let us consider the following minimization problem:
$$\underset{\{N_J\}_{J\in\mathcal{J}}}{\text{minimize}} \;\sum_{J\in\mathcal{J}} \max\big\{0,\, 2^{-js(2-\alpha)} - N_J\,2^{-js(1+\alpha)}\big\} \quad \text{s.t.} \quad \sum_{J\in\mathcal{J}} N_J \le N,\; N_J \in [0,\infty) \;(J \in \mathcal{J}).$$
The condition $N_J \in [0,\infty)$, which simplifies the subsequent argumentation, is possible since we are only interested in a bound. For the optimal choice $\{N_J\}_J$ it necessarily holds $\sum_J N_J = N$ and
$$N_J \le 2^{-js(2-\alpha)}\,2^{js(1+\alpha)} = 2^{js(2\alpha-1)}.$$
Hence, the minimization problem can be reformulated as minimizing the term
$$\sum_{J\in\mathcal{J}} \big(2^{-js(2-\alpha)} - N_J\,2^{-js(1+\alpha)}\big)$$
under the constraints $\sum_J N_J = N$ and $N_J \le 2^{js(2\alpha-1)}$. Assume that the family $\{N_J\}_J$ fulfills these constraints. Further, let $j(N) \in \mathbb{N}$ denote the number determined by the property
$$\sum_{j=0}^{j(N)-1} 2^{js(2\alpha-1)}L_j < N \le \sum_{j=0}^{j(N)} 2^{js(2\alpha-1)}L_j, \qquad (36)$$
where $L_j$ from (5) counts the wedges in the corona $\mathcal{C}_j$. Then the following estimate holds true:
$$\sum_{J\in\mathcal{J}} \big(2^{-js(2-\alpha)} - N_J\,2^{-js(1+\alpha)}\big) \ge \sum_{j=j(N)+1}^\infty \Big(\sum_{|J|=j} 2^{-js(2-\alpha)}\Big) \ge \sum_{j=j(N)+1}^\infty 2^{-js} \gtrsim 2^{-j(N)s}.$$
To see this, note that $2^{-js(1+\alpha)}$ is decreasing with rising scale and that $L_j \ge 2^{js(1-\alpha)}$. Since $N \asymp 2^{j(N)s\alpha}$, which follows from (36), we have proven
$$\|\Theta - f_N\|_2^2 \gtrsim \sum_{J\in\mathcal{J}} \max\big\{0,\, 2^{-js(2-\alpha)} - \#(K_J)\,2^{-js(1+\alpha)}\big\} \gtrsim 2^{-j(N)s} \asymp N^{-\frac{1}{\alpha}}$$
and the proof is finished.

The approximation results for $\Theta$ have direct implications for the class-wise approximation of cartoon-like functions. If $\nu \ge$
2, then $\Theta \in \mathcal{E}^\beta_{bin}([-1,1]^2;\nu)$ for arbitrary $\beta \in [0,\infty)$. Moreover, we can always find $\gamma > 0$ such that $\gamma\Theta \in \mathcal{E}^\beta([-1,1]^2;\nu)$. This allows us to draw the following conclusion.

Corollary 3.12. Let $\beta \in [0,\infty)$ and $\nu \ge 2$. The uniform decay of the $N$-term approximation error for $\mathcal{E}^\beta_{bin}([-1,1]^2;\nu)$ and $\mathcal{E}^\beta([-1,1]^2;\nu)$ provided by $\mathcal{C}_{s,\alpha}$ cannot be faster than $N^{-\frac{1}{1-\alpha}}$. Furthermore, thresholding of coefficients cannot yield rates better than $N^{-\frac{1}{\max\{\alpha,1-\alpha\}}}$.

If $\beta > 2$ it is thus not possible for $\mathcal{C}_{s,\alpha}$ to reach the theoretically possible approximation order of $N^{-\beta}$ for the class $\mathcal{E}^\beta([-1,1]^2;\nu)$. The best performance is achieved for the classic choice $\alpha = \tfrac{1}{2}$, with a corresponding approximation rate of order $N^{-2}$. A smaller $\alpha$ leads to a deterioration of the approximation. As is obvious from our investigation, this behavior applies to cartoons with curved edges exemplified by the function $\Theta = \chi_{B(0,\frac{1}{2})}$ from (31). For such cartoons the rate inevitably deteriorates as $\alpha$ tends to 0, since their energy is spread more or less uniformly across all directions of the Fourier plane. In the next section, we narrow our focus and consider only cartoons with straight edges. Such cartoons are highly anisotropic and in a certain sense the opposite extreme of the isotropic function $\Theta$. Since their Fourier energy is concentrated in only one direction, a smaller $\alpha$ will be an advantage for their approximation.

In the following, we investigate the approximation performance of the curvelet frame $\mathcal{C}_{s,\alpha}$ with respect to cartoons with straight edges. To specify the associated signal class, let $\mathrm{Straight}$ be the collection of all closed half-spaces of $\mathbb{R}^2$. Parameterized by $\varphi \in [0,2\pi)$ and $c \in \mathbb{R}$, these are subsets of the form
$$H(\varphi, c) = \big\{ (x_1,x_2) \in \mathbb{R}^2 \,:\, x_1\cos(\varphi) - x_2\sin(\varphi) \ge c \big\}.$$
Using Definition 3.1 we then introduce the image class $\mathcal{E}^\beta([-1,1]^2;\mathrm{Straight},\nu)$ with parameters $\beta \in [0,\infty)$ and $\nu > 0$. This is a subclass of the general cartoons (28) considered in Section 3. Indeed, for $\nu > 0$ and $\tilde\nu \ge \nu$ chosen large enough,
$$C^\beta([-1,1]^2;\nu) \subset \mathcal{E}^\beta([-1,1]^2;\mathrm{Straight},\nu) \subset \mathcal{E}^\beta([-1,1]^2;\tilde\nu),$$
where $C^\beta([-1,1]^2;\nu)$ is the class defined in (29).
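The anisotropy heuristic above can be made tangible with a toy computation (an illustrative sketch, not taken from the paper): for a "mutilated Gaussian" with a straight edge along the $x_2$-axis, the Fourier transform factors, and it decays only like $|\xi_1|^{-1}$ across the edge but like a Gaussian along it.

```python
import math, cmath

# Toy example: f(x1,x2) = chi_{x1>=0} * exp(-pi x1^2) * exp(-pi x2^2).
# Its 2D Fourier transform factors into a 1D half-Gaussian transform in xi_1
# (slow O(1/|xi_1|) decay caused by the jump) and a Gaussian in xi_2.
def half_gauss_ft(xi, n=4000, xmax=6.0):
    # FT of chi_{x>=0} exp(-pi x^2), computed by midpoint quadrature
    h = xmax / n
    return sum(math.exp(-math.pi * ((k + 0.5) * h) ** 2)
               * cmath.exp(-2j * math.pi * (k + 0.5) * h * xi)
               for k in range(n)) * h

across = abs(half_gauss_ft(4.0))       # ~ 1/(2*pi*|xi|): only O(|xi|^{-1}) decay
along = math.exp(-math.pi * 4.0 ** 2)  # Gaussian decay in the edge direction
print(across, along)                   # across >> along
```

Doubling the frequency across the edge roughly halves the magnitude, confirming the $|\xi_1|^{-1}$ behavior; all Fourier energy of the edge sits in one direction, which is exactly the situation that favors small $\alpha$.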
These inclusions allow us to transfer the optimality benchmark $N^{-\beta}$, valid for both $\mathcal{E}^\beta([-1,1]^2;\tilde\nu)$ and $C^\beta([-1,1]^2;\nu)$ (see Theorem 3.2 and Remark 3.6). For $\mathcal{E}^\beta([-1,1]^2;\mathrm{Straight},\nu)$ we thus again aim for an approximation rate of order $N^{-\beta}$.

Ridgelet frames were developed specifically for the optimal representation of functions with straight line singularities. For both variants, `orthonormal ridgelets' [16] and `0-curvelets' [22], it has been shown that they reach the optimality bound $N^{-\beta}$. More precisely, this rate was proved for `mutilated Sobolev functions' with compact support [3, 26], i.e., compactly supported functions which are in the Sobolev space $H^\beta(\mathbb{R}^2)$ apart from straight line singularities. In line with the result from [26] for 0-curvelets, we can expect that decreasing $\alpha$ improves the approximation ability of $\mathcal{C}_{s,\alpha}$ for $\mathcal{E}^\beta([-1,1]^2;\mathrm{Straight},\nu)$.

Our main result concerning the $\alpha$-curvelet approximation of $\mathcal{E}^\beta([-1,1]^2;\mathrm{Straight},\nu)$ is Theorem 4.1 below. It is formulated and proved for integer $\beta \in \mathbb{N}$ only, although the statement should extend to the whole range $\beta \in \mathbb{R}_+$. In this way, we avoid technical difficulties which would arise if we used finite differences instead of integer derivatives (compare [23]).

Theorem 4.1. The parameters $\beta \in \mathbb{N}$, $\nu > 0$, $\alpha \in [0,1)$, and $s > 0$ shall be fixed. Further, let $f_N$ be the $N$-term approximation of a signal $f \in L^2(\mathbb{R}^2)$ provided by the $N$ largest coefficients with respect to the frame $\mathcal{C}_{s,\alpha} = \{\psi_\mu\}_{\mu\in M}$. There exists a constant $C > 0$ such that for every $f \in \mathcal{E}^\beta([-1,1]^2;\mathrm{Straight},\nu)$ and $N \in \mathbb{N}$
$$\|f - f_N\|_2^2 \le C\begin{cases} N^{-\beta}\log(1+N)^{1+\beta}, & \text{if } \alpha \le \beta^{-1}, \\ N^{-1/\alpha}, & \text{if } \alpha > \beta^{-1}. \end{cases}$$

As expected, decreasing the parameter $\alpha$ improves the approximation performance. If $\alpha \in [0,\beta^{-1}]$ the achieved rate is even optimal up to the log-factor. In this range signals from $\mathcal{E}^\beta([-1,1]^2;\mathrm{Straight},\nu)$ are represented with the same efficiency as a smooth function from $C^\beta([-1,1]^2;\nu)$.

Theorem 4.1 is deduced by studying the curvelet coefficients, whose decay is closely related to the achieved $N$-term approximation rate. Recall that a typical measure for the sparsity of a sequence $\{c_\lambda\}_{\lambda\in\Lambda} \subset \mathbb{C}$ is given by the weak $\ell^p$-(quasi)-norms, for $p > 0$:
$$\|\{c_\lambda\}_\lambda\|_{w\ell^p} := \Big(\sup_{\varepsilon>0}\; \varepsilon^p\cdot\#\{\lambda \,:\, |c_\lambda| > \varepsilon\}\Big)^{1/p}.$$
By definition, the sequence $\{c_\lambda\}_\lambda$ belongs to $w\ell^p(\Lambda)$ if and only if the quantity $\|\{c_\lambda\}_\lambda\|_{w\ell^p}$ is finite. This is the case precisely if there exists a constant $C > 0$ with $\#\{\lambda \,:\, |c_\lambda| > \varepsilon\} \le C^p\varepsilon^{-p}$ for all $\varepsilon > 0$. The smallest possible such constant then coincides with the weak $\ell^p$-(quasi)-norm of the sequence. Another useful characterization of a sequence $\{c_\lambda\}_\lambda \in w\ell^p(\Lambda)$ is given in terms of its non-increasing rearrangement $\{c^*_n\}_{n\in\mathbb{N}}$. It holds $|c^*_n| \lesssim n^{-1/p}$ and $\sup_{n>0} n^{1/p}|c^*_n| = \|\{c_\lambda\}_\lambda\|_{w\ell^p}$.

As illustrated by the following well-known lemma (see e.g. [13]), the decay of the frame coefficients determines the $N$-term approximation rate achieved by thresholding. A full proof is given e.g. in [24].

Lemma 4.2 ([24, Lem. 5.1]). Let $\{m_\lambda\}_{\lambda\in\Lambda}$ be a frame in $L^2(\mathbb{R}^2)$ and $f = \sum c_\lambda m_\lambda$ an expansion of $f \in L^2(\mathbb{R}^2)$ with respect to this frame. If $\{c_\lambda\}_\lambda \in w\ell^{2/(\beta+1)}(\Lambda)$ for some $\beta \ge 0$, then the $N$-term approximations $f_N$ obtained by keeping the $N$ largest coefficients satisfy
$$\|f - f_N\|_2^2 \lesssim N^{-\beta}.$$

Beginning in Subsection 4.1, we study the sparsity of the coefficients $\theta_\mu = \langle f, \psi_\mu\rangle$ provided by the frame $\mathcal{C}_{s,\alpha} = \{\psi_\mu\}_{\mu\in M}$ for a signal $f \in \mathcal{E}^\beta([-1,1]^2;\mathrm{Straight},\nu)$. The decay rates proved in Theorem 4.3 are the foundation of the following proof of Theorem 4.1.

Proof of Theorem 4.1. If $\alpha > \beta^{-1}$, the sequence $\{\theta_\mu\}_{\mu\in M}$ of curvelet coefficients $\theta_\mu = \langle f,\psi_\mu\rangle$ belongs to $w\ell^p(M)$ with $p = 2/(1 + 1/\alpha)$. This is proved in Theorem 4.3. Lemma 4.2 directly translates this into the statement of Theorem 4.1. In case $\alpha \le \beta^{-1}$, Theorem 4.3 yields $|\theta^*_m|^2 \le Cm^{-(1+\beta)}(\log_2 m)^{1+\beta}$ for the curvelet coefficient $\theta^*_m$ of $m$-th largest modulus. Utilizing the frame property of $\mathcal{C}_{s,\alpha}$ we can estimate
$$\|f - f_N\|_2^2 \lesssim \sum_{m>N} |\theta^*_m|^2 \lesssim \sum_{m>N} m^{-(1+\beta)}\big(\log_2 m\big)^{1+\beta} \le \int_N^\infty t^{-(1+\beta)}\big(\log_2(1+t)\big)^{1+\beta}\,dt.$$
Note that $N \ge 1$. Partial integration leads to
$$\int_N^\infty t^{-(1+\beta)}\big(\log_2(1+t)\big)^{1+\beta}\,dt \lesssim N^{-\beta}\big(\log_2(1+N)\big)^{1+\beta} + \int_N^\infty t^{-(1+\beta)}\big(\log_2(1+t)\big)^{\beta}\,dt.$$
We repeat this $(1+\beta)$-times and finally arrive at
$$\int_N^\infty t^{-(1+\beta)}\big(\log_2(1+t)\big)^{1+\beta}\,dt \lesssim N^{-\beta}\big(\log_2(1+N)\big)^{1+\beta}.$$

Subsequently, we study the decay of the curvelet coefficients $\theta_\mu = \langle f, \psi_\mu\rangle$. Our main result is Theorem 4.3.

Theorem 4.3.
Let $\alpha \in [0,1)$, $s > 0$, $\beta \in \mathbb{N}$, and $\nu > 0$ be fixed. Further, denote by $\theta^*_N$ the (in modulus) $N$-th largest coefficient of $f \in \mathcal{E}^\beta([-1,1]^2;\mathrm{Straight},\nu)$ with respect to $\mathcal{C}_{s,\alpha} = \{\psi_\mu\}_{\mu\in M}$. There exists a constant $C > 0$ independent of $N \ge 2$ such that
$$\sup_{f\in\mathcal{E}^\beta([-1,1]^2;\mathrm{Straight},\nu)} |\theta^*_N|^2 \le C\cdot\begin{cases} N^{-(1+\beta)}\cdot(\log_2 N)^{1+\beta}, & \text{if } \alpha \le \beta^{-1}, \\ N^{-(1+1/\alpha)}, & \text{if } \alpha > \beta^{-1}. \end{cases}$$

Proof. Let $M_j$ denote the subset of the curvelet index set $M$ corresponding to scale $j$. Further, given $\varepsilon > 0$, let us define $M_\varepsilon := \{\mu \in M \,:\, |\theta_\mu| > \varepsilon\}$ and $M_{j,\varepsilon} := \{\mu \in M_j \,:\, |\theta_\mu| > \varepsilon\}$. According to (34) there is a constant $\tilde C > 0$, independent of scale, such that
$$|\theta_\mu| \le \tilde C\,\|f\|_\infty\,2^{-js(1+\alpha)/2} \le \tilde C\nu\,2^{-js(1+\alpha)/2}.$$
At scales $j > j_\varepsilon := \tfrac{2}{s(1+\alpha)}\log_2(\tilde C\nu\varepsilon^{-1})$ the coefficients thus satisfy $|\theta_\mu| < \varepsilon$ and the sets $M_{j,\varepsilon}$ are empty. In particular $\#M_\varepsilon = 0$ in case $\varepsilon > \tilde C\nu$, since then $j_\varepsilon < 0$. If $j \le j_\varepsilon$, Proposition 4.4, which is stated and proved below, gives the estimate
$$\#M_{j,\varepsilon} \lesssim 2^{j\rho}\,\varepsilon^{-2/(1+\beta)} \quad \text{with} \quad \rho = s\max\{\alpha\beta - 1,\, 0\}/(1+\beta) \ge 0.$$
If $\alpha > \beta^{-1}$ we have $\rho > 0$ and obtain
$$\#M_\varepsilon = \sum_{j=0}^{\lfloor j_\varepsilon\rfloor} \#M_{j,\varepsilon} \lesssim \sum_{j=0}^{\lfloor j_\varepsilon\rfloor} 2^{j\rho}\,\varepsilon^{-2/(1+\beta)} \lesssim 2^{j_\varepsilon\rho}\,\varepsilon^{-2/(1+\beta)} = \varepsilon^{-\frac{2(\alpha\beta-1)}{(1+\beta)(1+\alpha)}}\,\varepsilon^{-2/(1+\beta)} = \varepsilon^{-2/(1+1/\alpha)}.$$
From here, a direct argument leads to $|\theta^*_N|^2 \lesssim N^{-(1+1/\alpha)}$ for the $N$-th largest coefficient $\theta^*_N$.

If $\alpha \le \beta^{-1}$ we have $\rho = 0$ and the estimate
$$\#M_\varepsilon \lesssim \sum_{j=0}^{\lfloor j_\varepsilon\rfloor} \varepsilon^{-2/(1+\beta)} \lesssim \big(\log_2(\tilde C\nu\varepsilon^{-1}) + 1\big)\,\varepsilon^{-2/(1+\beta)} = \log_2(2\tilde C\nu\varepsilon^{-1})\,\varepsilon^{-2/(1+\beta)}.$$
Hence, there is a constant $C_2 \ge 1$ with $\#M_\varepsilon \le C_2\log_2(C_1\varepsilon^{-1})(C_1\varepsilon^{-1})^{2/(1+\beta)}$, where $C_1 = \max\{1, \tilde C\nu\}$. It follows that $|\theta^*_N|^2 \le C_1^2\,\delta_N$ for the number $\delta_N$ which solves $N = C_2\log_2(\delta_N^{-1})\,\delta_N^{-1/(1+\beta)}$. In general $\delta_N$ cannot be calculated explicitly, wherefore we resort to an estimate.

If $N \ge 2$, then $\varepsilon_N := N^{-(1+\beta)} \le 1$ since $\beta \ge 1$. Taking into account $C_2 \ge 1$, we get
$$C_2\,\varepsilon_N^{-1/(1+\beta)}\log_2(\varepsilon_N^{-1}) \ge N = C_2\,\delta_N^{-1/(1+\beta)}\log_2(\delta_N^{-1}),$$
which in turn proves $\delta_N \ge \varepsilon_N = N^{-(1+\beta)}$. Therefore $\tilde\delta_N \ge \delta_N$ for the solution $\tilde\delta_N$ of
$$N = C_2\,\tilde\delta_N^{-1/(1+\beta)}\log_2(N^{1+\beta}).$$
An explicit calculation yields $\tilde\delta_N = \big(C_2(1+\beta)\big)^{1+\beta}\,N^{-(1+\beta)}\,(\log_2 N)^{1+\beta}$, which proves the claim.

The missing ingredient in the proof of Theorem 4.3 is Proposition 4.4.

Proposition 4.4.
Let the parameters α ∈ [0 , , s > , β ∈ N , and ν > be fixed. Further, let M j denote the curvelet indices at scale j . The sequence { θ µ } µ ∈ M j of coefficients θ µ = h f, ψ µ i obeys k{ θ µ } µ ∈ M j k / (1+ β ) wℓ / (1+ β ) . jρ with ρ = s max { αβ − , } / (1 + β ) and an implicit constant independent of scale j ∈ N and f ∈ E β ([ − , ; ν ) . f into fragments, a technique pioneered in [5]. To thisend, let Q j at every scale j ∈ N denote the collection of cubes Q := Q ( j )( k ,k ) := [2 − jsα ( k − , − jsα ( k + 1)] × [2 − jsα ( k − , − jsα ( k + 1)] , ( k , k ) ∈ Z . Further, let ω ∈ C ∞ ([ − , ) be a nonnegative window vanishing outside the square [ − , , such thatthe family { ω Q } Q ∈Q j of functions ω Q ( x ) := ω (2 jsα x − k , jsα x − k ) is a partition of unity, i.e., it hasthe property P Q ∈Q j ω Q = 1. Following [5] we then decompose f = P Q f Q into the fragments f Q := f ω Q , Q ∈ Q j . (37)Note that supp f Q ⊆ Q and that the size of the squares Q ∈ Q j corresponds to the ‘essential’ length ofthe curvelets at scale j . Therefore h f, ψ µ i ≈ h f Q , ψ µ i for a curvelet ψ µ at the location of the cube Q .For every Q ∈ Q j we now investigate the sparsity of the sequence θ Q := {h f Q , ψ µ i} µ ∈ M j . (38)Clearly, due to supp f ⊆ [ − , we only need to consider cubes Q ∈ Q j which meet the square [ − , .Of these relevant cubes, let us collect those which intersect the straight edge in Q j , the others in Q j .The associated fragments f Q will be called edge fragments and smooth fragments , respectively. The mainresult concerning the sparsity of (38) is Proposition 4.5. Proposition 4.5.
Let α ∈ [0, 1), s > 0, β ∈ N, and ν > 0 be fixed. Let Q ∈ Q_j, j ∈ N, be a square and θ_Q the curvelet coefficient sequence of the fragment f_Q = f ω_Q defined in (38). There is a constant C > 0, independent of j ∈ N and Q ∈ Q_j, such that for all f ∈ E^β([−1,1]²; ν) the following estimates hold true.

(i) If Q ∈ Q⁰_j the sequence θ_Q satisfies ‖θ_Q‖_{wℓ^{2/(1+β)}}^{2/(1+β)} ≤ C · 2^{−2jsα}.

(ii) If Q ∈ Q¹_j the sequence θ_Q satisfies ‖θ_Q‖_{wℓ^{2/(1+β)}}^{2/(1+β)} ≤ C · 2^{−jsα} 2^{jρ} with ρ = s max{αβ − 1, 0}/(1+β).

A direct consequence of Proposition 4.5, whose proof is given later on, is Proposition 4.4.
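The weak-ℓ^p quasi-norms with p = 2/(1+β) bounded in Propositions 4.4 and 4.5 are what drive N-term thresholding rates: for a sequence whose decreasing rearrangement decays like n^{−1/p}, the squared ℓ²-tail after keeping the N largest entries decays like N^{1−2/p} = N^{−β}. The following minimal numerical sketch illustrates this mechanism; the model sequence is purely illustrative, not actual curvelet coefficients.

```python
import math

def weak_lp_quasinorm(coeffs, p):
    # sup_n (n+1)^{1/p} * c_(n) over the decreasing rearrangement c_(0) >= c_(1) >= ...
    srt = sorted((abs(c) for c in coeffs), reverse=True)
    return max((n + 1) ** (1.0 / p) * c for n, c in enumerate(srt))

def tail_l2(coeffs, N):
    # l2 norm of what thresholding to the N largest coefficients discards
    srt = sorted((abs(c) for c in coeffs), reverse=True)
    return math.sqrt(sum(c * c for c in srt[N:]))

beta = 2
p = 2.0 / (1 + beta)                                     # p = 2/3 for beta = 2
coeffs = [(n + 1) ** (-1.0 / p) for n in range(10000)]   # model weak-lp decay
for N in (10, 100, 1000):
    # the product stays roughly constant, reflecting tail^2 ~ N^{-beta}
    print(N, tail_l2(coeffs, N) ** 2 * N ** beta)
```

The weak-ℓ^p quasi-norm of this model sequence equals 1, and the printed products stay bounded as N grows.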
Proof of Proposition 4.4.
We have the decomposition {θ_µ}_{µ∈M_j} = Σ_{Q∈Q_j} θ_Q. Since 0 < 2/(1+β) ≤ 1, the p-triangle inequality with p = 2/(1+β) yields

‖{θ_µ}_{µ∈M_j}‖_{wℓ^{2/(1+β)}}^{2/(1+β)} ≤ Σ_{Q∈Q_j} ‖θ_Q‖_{wℓ^{2/(1+β)}}^{2/(1+β)} ≤ #(Q¹_j) · sup_{Q∈Q¹_j} ‖θ_Q‖_{wℓ^{2/(1+β)}}^{2/(1+β)} + #(Q⁰_j) · sup_{Q∈Q⁰_j} ‖θ_Q‖_{wℓ^{2/(1+β)}}^{2/(1+β)}.

Since f is supported in [−1,1]², there are constants C₁, C₂ > 0, independent of scale, such that #Q¹_j ≤ C₁ 2^{jsα} and #Q⁰_j ≤ C₂ 2^{2jsα}. Utilizing the estimates of Proposition 4.5, we thus obtain with ρ = s max{αβ − 1, 0}/(1+β) ≥ 0

‖{θ_µ}_{µ∈M_j}‖_{wℓ^{2/(1+β)}}^{2/(1+β)} ≲ C₂ + C₁ 2^{jρ} ≲ 2^{jρ}.

In the remainder of this section we are concerned with the proof of Proposition 4.5. Hereby, we restrict to functions f ∈ E^β([−1,1]²; ν) of the simple form

f = g χ_{H(ϕ,c)} (39)

with g ∈ C^β([−1,1]²; ν) and a half-space H(ϕ,c), determined by ϕ ∈ [0, 2π) and c ∈ R. Note that for a general cartoon f = f₁ + f₂ χ_{H(ϕ,c)} both components f̃₁ := f₁ and f̃₂ := f₂ χ_{H(ϕ,c)} have the form (39), due to the representation f₁ = f₁ χ_{H(0,−1)}. Hence, if the estimates of Proposition 4.5 are proven for elements of type (39), they are then also true for all f ∈ E^β([−1,1]²; ν). This is a consequence of the estimate 2^{−2jsα} ≤ 2^{−jsα} 2^{jρ} and

‖θ_Q‖_{wℓ^{2/(1+β)}}^{2/(1+β)} ≤ ‖{⟨f̃₁ ω_Q, ψ_µ⟩}_{µ∈M_j}‖_{wℓ^{2/(1+β)}}^{2/(1+β)} + ‖{⟨f̃₂ ω_Q, ψ_µ⟩}_{µ∈M_j}‖_{wℓ^{2/(1+β)}}^{2/(1+β)}.

Let Q ∈ Q_j be a cube at scale j ∈ N with center M_Q := 2^{−jsα}(k₁, k₂) ∈ R², which nontrivially intersects the cartoon domain [−1,1]². If Q ∈ Q⁰_j we put P_Q := M_Q. If Q ∈ Q¹_j let us fix a point P_Q ∈ Q on the edge curve {(x₁, x₂) ∈ R² : x₁ cos(ϕ) − x₂ sin(ϕ) = c} of the cartoon such that χ_{H(ϕ,c)}(x) = H(R_ϕ(x − P_Q)), with rotation matrix (6) and where H := h ⊗ 1 with the univariate step function

h(t) = 0 if t < 0, and h(t) = 1 if t ≥ 0. (40)

Putting g̃_Q(x) := g(R_{−ϕ} x + P_Q) and ω̃_Q(x) := ω_Q(R_{−ϕ} x + P_Q), the fragment f_Q can then be written as f_Q(x) = f̃_Q(R_ϕ(x − P_Q)) with a function f̃_Q of the form

(i) f̃_Q := g̃_Q ω̃_Q, if Q ∈ Q⁰_j, or (ii) f̃_Q := g̃_Q ω̃_Q H, if Q ∈ Q¹_j. (41)

On the Fourier side we have (f_Q)^∧(ξ) = (f̃_Q)^∧(R_ϕ ξ) exp(−2πi ⟨P_Q, ξ⟩).
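The p-triangle inequality invoked in the proof of Proposition 4.4 above states that ‖·‖_p^p is subadditive for 0 < p ≤ 1, which follows from the pointwise inequality |x + y|^p ≤ |x|^p + |y|^p. A quick numerical check with random illustrative sequences:

```python
import random

def lp_p(seq, p):
    # ||seq||_p^p = sum of |x|^p
    return sum(abs(x) ** p for x in seq)

random.seed(0)
beta = 2
p = 2.0 / (1 + beta)  # p = 2/3 <= 1, as in the proof of Proposition 4.4
a = [random.uniform(-1, 1) for _ in range(1000)]
b = [random.uniform(-1, 1) for _ in range(1000)]
s = [x + y for x, y in zip(a, b)]
assert lp_p(s, p) <= lp_p(a, p) + lp_p(b, p)  # p-triangle inequality
print("p-triangle inequality holds for p =", p)
```

For p > 1 the inequality fails in this raw form; it is the concavity of t ↦ t^p on [0, ∞) for p ≤ 1 that makes ‖·‖_p^p subadditive.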
Now, let ψ µ = ψ j,ℓ,k ∈ C s,α be a fixed curvelet and recall b ψ j,ℓ,k = W j,ℓ u j,ℓ,k with the real-valued wedgefunctions W j,ℓ ( · ) = W j, ( R j,ℓ · ) from (10) and the functions u j,ℓ,k ( · ) = 2 − js (1+ α ) / exp(2 πi h R − j,ℓ A − j k, ·i ) . There are unique k • ∈ Z and ∆ k ∈ [0 , such that P Q = R − j,ℓ A − j ( k • + ∆ k ). Further, we can express ϕ as a ‘fractional multiple’ of the angle ϕ j defined in (4), writing ϕ = ( ℓ • − ∆ ℓ ) ϕ j with unique ℓ • ∈ Z and ∆ ℓ ∈ [0 , h f Q , ψ j,ℓ,k i = h b f Q , b ψ j,ℓ,k ih f Q , ψ j,ℓ,k i = Z R c f ˜ Q (cid:0) R j,ℓ • − ∆ ℓ ξ (cid:1) exp (cid:0) − πi h R − j,ℓ A − j ( k • + ∆ k ) , ξ i (cid:1) W j,ℓ ( ξ ) u j,ℓ,k ( ξ ) dξ = Z R c f ˜ Q ( ξ ) W j,ℓ − ℓ • +∆ ℓ ( ξ ) u j,ℓ − ℓ • +∆ ℓ,k + k • +∆ k • ( ξ ) dξ. Relabelling the indices ( l , k ) := ([ ℓ − ℓ • ] , k + k • ), where [ ℓ − ℓ • ] ∈ {− L − j , . . . , L + j } is the unique numberobtained by shifting ℓ − ℓ • ∈ Z by integer multiples of L j = πϕ − j (see (5)), we can write h f Q , ψ j,ℓ,k i = Z R c f ˜ Q ( ξ ) W j, l +∆ ℓ ( ξ ) u j, l +∆ ℓ, k +∆ k ( ξ ) dξ. (42)To estimate the integral (42) we need knowledge about the Fourier localization of the functions f ˜ Q . Thisinvestigation is carried out in the next two subsections. The Fourier analysis of the functions f ˜ Q , Q ∈ Q j , from (41) is conducted in a generic setting, independentof the concrete cube Q . We assume α ∈ [0 , β ∈ N , and let κ, ν, ˜ ν > f j , j ∈ N , called standard fragments, defined by( i ) f j := gω j , or ( ii ) f j := gω j H , (43)where H is the step function (40), g ∈ C β ( κ [ − , , ν ) and ω j := ω (2 jsα · ) with ω ∈ C ∞ ( R ) ∩ C β ( κ [ − , , ˜ ν ). For every Q ∈ Q j the corresponding fragment f ˜ Q is of the form (43) with specificfunctions g and ω , namely g = e g Q and ω = e ω Q (2 − jsα · ) (compare to (41)). Note that the parameters κ, ν, ˜ ν > Q ∈ Q j , e.g. κ = 2 √
2, and ˜ ν, ν > f ∈ E β ([ − , ; ν ) and the partition of unity { ω Q } Q utilized in (37). Since the resultsof this subsection are valid uniformly for all choices of g and ω , as long as they fulfill the specificationsin accordance with κ, ν, ˜ ν >
0, they hence apply to all fragments f̃_Q.

The investigation starts with an elementary lemma, where I_j, j ∈ N, denote the dyadic intervals introduced in (7).

Lemma 4.6.
Let s > 0 be fixed and for j ∈ N let f_j be fragments of the form (43). Then there exists a constant C > 0, independent of j ∈ N and the concrete choice of the functions g and ω in (43), such that for every p ∈ N and ϕ ∈ [−π, π)

∫_{I_p} |f̂_j(r, ϕ)|² dr ≤ C ε_{j,p}(ϕ) 2^{−ps} 2^{−2jsα} ‖g‖_∞² ‖ω‖₂²

with functions ε_{j,p} : [−π, π) → R satisfying Σ_{p∈N} ∫_{−π}^{π} ε_{j,p}(ϕ) dϕ ≤ 1.

Proof. Let us assume ‖g‖_∞ ≠ 0 and ‖ω‖₂ ≠ 0, otherwise the proof is trivial. Since for every p, j ∈ N and ϕ ∈ [−π, π)

I_{j,p}(ϕ) := ∫_{I_p} |f̂_j(r, ϕ)|² dr < ∞,

we can define functions ǫ_{j,p} : [−π, π) → R via

ǫ_{j,p}(ϕ) := I_{j,p}(ϕ) 2^{ps} 2^{2jsα} ‖g‖_∞^{−2} ‖ω‖₂^{−2}.

Then I_{j,p}(ϕ) = ǫ_{j,p}(ϕ) 2^{−ps} 2^{−2jsα} ‖g‖_∞² ‖ω‖₂². Let us prove that there is a constant
C >
0, independent of the relevant parameters, such that

Σ_{p∈N} ∫_{−π}^{π} ǫ_{j,p}(ϕ) dϕ ≤ C. (44)

We put f̃_j = f_j(2^{−jsα}·). Then f̂_j = 2^{−2jsα} (f̃_j)^∧(2^{−jsα}·) and it follows for p ∈ N

‖f̂_j‖²_{L²(C_p)} = 2^{−2jsα} ‖(f̃_j)^∧‖²_{L²(2^{−jsα}C_p)},

where C_p are the coronae defined in (3). We conclude

‖g‖_∞² ‖ω‖₂² Σ_{p∈N} ∫_{−π}^{π} ǫ_{j,p}(ϕ) dϕ = Σ_{p∈N} 2^{2jsα} ∫_{−π}^{π} I_{j,p}(ϕ) 2^{ps} dϕ ≍ Σ_{p∈N} 2^{2jsα} ‖f̂_j‖²_{L²(C_p)} = Σ_{p∈N} ‖(f̃_j)^∧‖²_{L²(2^{−jsα}C_p)} ≍ ‖(f̃_j)^∧‖₂² = ‖f̃_j‖₂².

Using ‖f̃_j‖₂² ≤ ‖g(2^{−jsα}·)‖_∞² ‖ω‖₂² = ‖g‖_∞² ‖ω‖₂² we arrive at (44). Finally, note that the functions ε_{j,p} := C^{−1} ǫ_{j,p} have the properties as desired.

An immediate consequence of Lemma 4.6 is the following corollary, with the particular choice p = j.

Corollary 4.7.
Let s > 0 be fixed and assume that f_j, j ∈ N, are fragments of the form (43). There exist functions ε_j : [−π, π) → R, each with the property ∫_{−π}^{π} ε_j(ϕ) dϕ ≤ 1, and a constant C > 0 such that for every j ∈ N and ϕ ∈ [−π, π)

∫_{I_j} |f̂_j(r, ϕ)|² dr ≤ C ε_j(ϕ) 2^{−js} 2^{−2jsα} ‖g‖_∞² ‖ω‖₂².

Moreover, the constant C can be chosen independent of the functions ω and g.

Proof. The functions ε_j := ε_{j,j} obtained from Lemma 4.6 by choosing p = j have the desired properties. In particular they satisfy ∫_{−π}^{π} ε_j(ϕ) dϕ ≤ 1 for every j ∈ N.

Note that the smoothness of f_j did not enter the proofs of the previous two results. By incorporating smoothness information we can strengthen Corollary 4.7 for a smooth fragment of the form (i) in (43).

Lemma 4.8.
Let s > 0, α ∈ [0, 1), and put γ = ⌈1/(1−α)⌉. For j ∈ N let f_j be a smooth fragment of the form (i) in (43) with regularity C^β, β ∈ N. Then there exist functions ε_j : [−π, π) → R and a constant C > 0 such that for every j ∈ N and ϕ ∈ [−π, π)

∫_{I_j} |f̂_j(r, ϕ)|² dr ≤ C ε_j(ϕ) 2^{−js} 2^{−2jsα} 2^{−2jsβ} ‖g‖²_{β,∞} ‖ω‖²_{β,2}

with ∫_{−π}^{π} ε_j(ϕ) dϕ ≤ 1 for every j ∈ N. The constant C can be chosen independent of ω and g.

Proof. If β = 0 the assertion is given by Corollary 4.7. For β ≥ 1 we argue by induction on β, whereby we restrict our considerations to j ≥ 1. For j = 0 the asserted estimate is clearly true, also due to Corollary 4.7.

For fixed angle ϕ ∈ [−π, π) let ∂_r denote the radial derivative in the corresponding direction. Put g̃ := ∂_r g, ω̃ := ∂_r ω, and ω̃_j := ω̃(2^{jsα}·). Then ∂_r f_j(·, ϕ) = g̃ ω_j + 2^{jsα} g ω̃_j and we conclude for j ∈ N

2^{2js} ∫_{I_j} |f̂_j(r, ϕ)|² dr ≍ ∫_{I_j} |r f̂_j(r, ϕ)|² dr ≲ ∫_{I_j} |(∂_r f_j)^∧(r, ϕ)|² dr ≍ ∫_{I_j} |(g̃ ω_j)^∧(r, ϕ)|² dr + 2^{2jsα} ∫_{I_j} |(g ω̃_j)^∧(r, ϕ)|² dr =: Ĩ⁽⁰⁾_j(ϕ) + 2^{2jsα} I⁽¹⁾_j(ϕ).

Hence, we get

I⁽⁰⁾_j(ϕ) := ∫_{I_j} |f̂_j(r, ϕ)|² dr ≲ 2^{−2js} Ĩ⁽⁰⁾_j(ϕ) + 2^{−2js(1−α)} I⁽¹⁾_j(ϕ).

The integral I⁽¹⁾_j(ϕ) can be estimated in the same way as I⁽⁰⁾_j(ϕ). After γ = ⌈1/(1−α)⌉ iterations we end up with Ĩ⁽⁰⁾_j(ϕ), …, Ĩ⁽^{γ−1}⁾_j(ϕ), and Ĩ⁽^γ⁾_j(ϕ) := I⁽^γ⁾_j(ϕ). Since γ ≥ 1/(1−α) it holds

I⁽⁰⁾_j(ϕ) ≲ 2^{−2js} Σ_{k=0}^{γ−1} 2^{−2js(1−α)k} Ĩ⁽ᵏ⁾_j(ϕ) + 2^{−2js(1−α)γ} Ĩ⁽^γ⁾_j(ϕ) ≤ 2^{−2js} Σ_{k=0}^{γ} Ĩ⁽ᵏ⁾_j(ϕ).

Note that g ∈ C^β([−κ, κ]²) and g̃ ∈ C^{β−1}([−κ, κ]²), with κ the fixed parameter from (43). Using the induction hypothesis, the expressions Ĩ⁽ᵏ⁾_j can be estimated with corresponding functions ε⁽ᵏ⁾_j : [−π, π) → R. Putting ε_j := Σ_{k=0}^{γ} ε⁽ᵏ⁾_j yields the desired result.

Our next goal is to estimate the energy of f̂_j contained in wedges W⁺_J of the form (14).
However, weallow more general scale-angle pairs J = ( j, ℓ ) ∈ J + from the set J + := (cid:8) ( j, ℓ ) : j ∈ N , ℓ ∈ [ − L − j , L + j + 1) (cid:9) . The associated orientations, given by ϕ J = ℓϕ j with ϕ j = π −⌊ js (1 − α ) ⌋− fixed as in (4), then comprisethe whole interval [ − π , π ). To formulate the next result we need the quantities A J := 12 Z A J ε j ( ϕ ) dϕ, J ∈ J + , (45)corresponding to angular intervals A J given as in (12) and the functions ε j : [ − π, π ) → R associated to f j from Corollary 4.7. Lemma 4.9.
Let ( m , m ) ∈ N be fixed and assume that f j is of the form (43) . Further, for J ∈ J + let A J be the value defined in (45) . Then k ∂ ( m ,m ) b f j k L ( W + J ) . A J − j ( m + m ) sα − jsα k g k ∞ k ω k , with an implicit constant independent of J ∈ J + and the functions g and ω .Proof. Using Corollary 4.7 we calculate (in the nontrivial case when g = 0 and ω = 0) k g k − ∞ k ω k − Z W + J | b f j ( ξ ) | dξ = Z I j Z A J | b f j ( r, ϕ ) | r dϕ dr . − jsα Z A J ε j ( ϕ ) dϕ ≍ A J − jsα . This proves the assertion for ( m , m ) = (0 , m = ( m , m ) = (0 ,
0) we define a new window˜ ω ( x ) := x m ω ( x ) and put ˜ ω j ( x ) := ˜ ω (2 jsα x ) for x ∈ R . Then x m ω j ( x ) = 2 − jsα ( m + m ) ˜ ω (2 jsα x ) = 2 − jsα ( m + m ) ˜ ω j ( x ) , x ∈ R . f ˜ j := g ˜ ω j H (or in case of a smooth fragment f ˜ j := g ˜ ω j ) we can write Z W + J | ∂ ( m ,m ) b f j ( ξ ) | dξ ≍ Z W + J | [ x m f j ( ξ ) | dξ = 2 − jsα ( m + m ) Z W + J | c f ˜ j ( ξ ) | dξ. Since f ˜ j is of the form (43), the integral on the right-hand side can be estimated as above with Corol-lary 4.7. The proof is finished since k ˜ ω k . k ω k .For the smooth fragments we can improve this result, taking into account smoothness information. Lemma 4.10.
Let s > , α ∈ [0 , , and γ = ⌈ / (1 − α ) ⌉ . For j ∈ N let f j be a smooth fragment of theform (i) in (43) with regularity C β , β ∈ N . Let J = ( j, ℓ ) ∈ J + be a scale-angle pair, A J be given as in (45) . For ( m , m ) ∈ N k ∂ ( m ,m ) b f j k L ( W + J ) . A J − j ( m + m ) sα − jsα − jsβ k g k β, ∞ k ω k β, . Proof.
The proof is analogous to Lemma 4.9, using Lemma 4.8 instead of Corollary 4.7.

To formulate the main result of this subsection we need the differential operator

L_{J,0} := (Id − 2^{2jsα} D²_{J,1})(Id − 2^{2jsα} D²_{J,2}), (46)

where Id is the identity and the partial derivatives D_{J,1} and D_{J,2}, dependent on J ∈ J⁺, are given by

D_{J,1} := cos(ϕ_J) ∂₁ + sin(ϕ_J) ∂₂ and D_{J,2} := −sin(ϕ_J) ∂₁ + cos(ϕ_J) ∂₂. (47)

Recall that ϕ_J = ℓ ϕ_j with ϕ_j as in (4). Further, recall the functions W_J from (10) with supp W_J ⊆ W⁺_J.

Proposition 4.11.
Let L J, be the differential operator (46) and let d ∈ N be arbitrary but fixed.(i) An edge fragment f j of the form (ii) in (43) satisfies the estimate Z R |L dJ, ( b f j W J )( ξ ) | dξ . A J − jsα . (ii) A smooth fragment f j of the form (i) in (43) satisfies the improved estimate Z R |L dJ, ( b f j W J )( ξ ) | dξ . A J − jsα − jsβ . Here A J are the quantities defined in (45) . The implicit constants are independent of J ∈ J + , ω and g .Proof. Using the definition (47) of the operators D J, and D J, we obtain for ( m , m ) ∈ N D m J, D m J, = X a + b = m a + b = m c a ,a ,b ,b (sin ϕ J ) a + b (cos ϕ J ) a + b ∂ ( a + a ,b + b ) (48)with purely combinatorial coefficients c a ,a ,b ,b ∈ Z . This leads to kD m J, D m J, b f j k L ( W + J ) ≤ C ( m , m ) X a + b = m a + b = m k ∂ ( a + a ,b + b ) b f j k L ( W + J ) with a constant C ( m , m ) >
0. If f j is an edge fragment, we proceed with Lemma 4.9 and deduce kD m J, D m J, b f j k L ( W + J ) . X a + b = m a + b = m A J − j ( m + m ) sα − jsα . A J − j ( m + m ) sα − jsα . d , d ∈ N . The function D d J, D d J, ( b f j W J ) is a linear combination of terms ( D m J, D m J, b f j )( D n J, D n J, W J )with m + n = d and m + n = d . In view of (26) and the estimate above, it holds kD m J, D m J, b f j k L ( W + J ) · kD n J, D n J, W J k ∞ . A J − j ( m + m ) sα − jsα · − sjn − sjαn ≤ A J − jsαd − jsαd − jsα . Using H¨older’s inequality we thus obtain for d , d ∈ N kD d J, D d J, ( b f j W J ) k . A J − jsα ( d + d ) − jsα . Since L dJ, ( b f j W J ) consists of terms of the form2 jsα ( d + d ) D d D d with d , d ≤ d , not taking into account combinatorial coefficients, the desired estimate for each term of L dJ, ( b f j W J ) follows.If f j is a smooth fragment of regularity C β , we use Lemma 4.10 instead of Lemma 4.9. The rest ofthe proof is completely analogous. As in the previous Subsection 4.2, let α ∈ [0 , β ∈ N , and κ, ν, ˜ ν > g ∈ C β ( κ [ − , , ν ), ω j = ω (2 jsα · ) and ω ∈ C ∞ ( R ) ∩ C β ( κ [ − , , ˜ ν ). Further, let δ denote the univariateDirac distribution and define δ { x =0 } := δ ⊗
1. We are interested in the Fourier localization of thedistributions d j := gω j δ { x =0 } , j ∈ N . (49)The exposition is analogous to the investigation of the functions (43) in Subsection 4.2. A valuable toolis given by the following lemma, where I j are the intervals defined in (7). Lemma 4.12.
Let Ã ≠ 0 and κ, s > 0 be fixed. Further assume that h ∈ C^β(R), β ∈ N, is a function with supp h ⊆ [−κ, κ]. Then there are a constant C > 0 and numbers η_j ∈ [0, 1], j ∈ N, with Σ_{j∈N} η_j ≤ 1 such that for every j ∈ N

∫_{Ã I_j} |ĥ(r)|² dr = C η_j |Ã 2^{js}|^{−2β} ‖h^{(β)}‖₂².

Moreover, the constant C can be chosen independent of h and Ã.

Proof. Define

η̃_j := |Ã 2^{js}|^{2β} ∫_{Ã I_j} |ĥ(r)|² dr.

Then Σ_{j∈N} η̃_j ≤ C ‖h^{(β)}‖₂² with a constant C > 0, since

Σ_{j∈N} η̃_j ≍ Σ_{j∈N} ∫_{Ã I_j} |r|^{2β} |ĥ(r)|² dr ≍ Σ_{j∈N} ∫_{Ã I_j} |(h^{(β)})^∧(r)|² dr ≲ ∫_R |(h^{(β)})^∧(r)|² dr = ‖h^{(β)}‖₂².

In case ‖h^{(β)}‖₂ ≠ 0, rescaling yields numbers η_j := C^{−1} ‖h^{(β)}‖₂^{−2} η̃_j as desired. The case ‖h^{(β)}‖₂ = 0 is trivial, since then h ≡ 0 due to supp h ⊆ [−κ, κ].

With Lemma 4.12 we can prove the following result.

Lemma 4.13.
Let s > be fixed and ϕ ∈ [ − π, π ) . We have for j ∈ N Z I j | b d j ( r, ϕ ) | dr . − jsα js (1 − α ) (1 + 2 js (1 − α ) | sin( ϕ ) | ) − β − k g k β, ∞ k ω k β, . Proof.
The distribution d j = gω j δ { x =0 } can be written as the tensor product d j = δ ⊗ h j of the Diracdistribution δ with the function h j := ( gω j ) | { x =0 } . Therefore, we have b d j = \ δ ⊗ h j = 1 ⊗ b h j = b h j ◦ π , where π : R → R is the orthogonal projection onto the second variable.Let ϕ ∈ [ − π, π ) and assume first that | sin( ϕ ) | ≥ − js (1 − α ) . Then ϕ / ∈ {− π, } and it holds Z I j | b d j ( r, ϕ ) | dr = Z I j | b h j ( r sin( ϕ )) | dr = | sin( ϕ ) | − Z sin( ϕ ) I j | b h j ( r ) | dr. Applying Lemma 4.12 with e A = sin( ϕ ) yields Z I j | b d j ( r, ϕ ) | dr . η j − jsβ | sin( ϕ ) | − β − k h ( β ) j k = η j − jsαβ − js (1 − α ) β | sin( ϕ ) | − β − k h ( β ) j k L ( R ) , where η j ≤ j ∈ N . Note that Lemma 4.12 is applied with a different integrand | b h j | at eachscale. However, the implicit constants are uniform over all j ∈ N .Applying Leibniz’s rule h ( β ) j = P γ ≤ β (cid:0) βγ (cid:1) ∂ γ g (0 , · ) ∂ β − γ ω j (0 , · ) we further deduce k h ( β ) j k . X γ ≤ β k ∂ γ g (0 , · ) k ∞ k ∂ β − γ ω j (0 , · ) k . . − jsα jsαβ k ω k β, k g k β, ∞ . This settles the case | sin( ϕ ) | ≥ − js (1 − α ) . If | sin( ϕ ) | < − js (1 − α ) we argue differently based on k b h j k ∞ ≤k h j k ≤ · − jsα k h j k . We deduce Z I j | b d j ( r, ϕ ) | dr = Z I j | b h j ( r sin( ϕ )) | dr . js k b h j k ∞ . js (1 − α ) k h j k . The proof is finished since k h j k ≤ k ω j (0 , · ) k k g (0 , · ) k ∞ ≤ − jsα k ω k k g k ∞ .Lemma 4.13 shows that the Fourier decay of d j is highly dependent on the direction ϕ ∈ [ − π, π ). Itmotivates the introduction of the quantity ℓ J := 1 + 2 js (1 − α ) | sin( ϕ J ) | , J = ( j, ℓ ) ∈ J + , (50)where ϕ J = ℓϕ j and ϕ j = π −⌊ js (1 − α ) ⌋− is the angle in (4). Note that 1 ≤ ℓ J ≤ js (1 − α ) .Similar to the analysis of the fragments (43), we now proceed to estimate the Fourier energy of b d j concentrated in a wedge W + J . The following result corresponds to Lemmas 4.9 and 4.10. Lemma 4.14.
Let J ∈ J + be a scale-angle pair, ℓ J the associated quantity (50) . For ( m , m ) ∈ N k ∂ ( m ,m ) b d j k L ( W + J ) . k g k β, ∞ k ω k β, ( , m = 0 , − jm sα js (1 − α ) ℓ − β − J , m = 0 . The implicit constant is independent of J ∈ J + and g and ω .Proof. If m = 0 the assertion follows from ∂ m b d j = ∂ m (cid:0)b h j ◦ π (cid:1) = 0. To handle the case m = 0,let us introduce the modified window ˜ ω ( x ) = x m ω ( x ) and its rescaled versions ˜ ω j = ˜ ω (2 jsα · ). Then˜ ω j ( x ) = 2 jsαm x m ω j ( x ), and as a consequence ∂ m b d j = ( − πi ) m \ x m d j = (2 πi ) m − jsαm c d ˜ j with d ˜ j := g ˜ ω j δ { x =0 } of the form (49). Hence, we can apply Lemma 4.13, which yields Z W + J | ∂ m b d j ( ξ ) | dξ ≍ − jsαm Z I j Z A J | c d ˜ j ( r, ϕ ) | r dϕ dr . − jsαm k g k β, ∞ k ˜ ω k β, Z A J js (1 − α ) (1 + 2 js (1 − α ) | sin( ϕ ) | ) − β − dϕ . − jsαm js (1 − α ) ℓ − β − J k g k β, ∞ k ω k β, . L J, := ( Id − js ℓ − J D J, )( Id − jsα D J, ) , (51)where we use the same notation as in the definition of the operator (46). Similar to Proposition 4.11 weobtain the following result. Proposition 4.15.
Let L J, be the differential operator (51) , J ∈ J + , and d ∈ N . We have Z R |L dJ, ( b d j W J )( ξ ) | dξ . js (1 − α ) ℓ − β − J . The implicit constant is independent of J ∈ J + , ω and g .Proof. Let ( m , m ) ∈ N . In view of (48) and Lemma 4.14 we obtain kD m J, D m J, b d j k L ( W + J ) . X a + b = m a + b = m | sin( ϕ J ) | a + b ) k ∂ ( a + a ,b + b ) b d j k L ( W + J ) = | sin( ϕ J ) | m k ∂ (0 ,m + m ) b d j k L ( W + J ) . | sin( ϕ J ) | m − j ( m + m ) sα js (1 − α ) ℓ − β − J . Using | sin( ϕ J ) | ≤ − js (1 − α ) ℓ J , we can further deduce kD m J, D m J, b d j k L ( W + J ) . − jm s − jm sα js (1 − α ) ℓ m − β − J . The function D d J, D d J, ( b d j W J ) is a linear combination of terms ( D m J, D m J, b d j )( D n J, D n J, W J ) with m + n = d and m + n = d . They satisfy kD m J, D m J, b d j k L ( W + J ) · kD n J, D n J, W J k ∞ . − jm s − jm sα js (1 − α ) ℓ m − β − J · − jsn − jsαn = 2 − jsd − sjαd js (1 − α ) ℓ m − β − J . Using H¨older’s inequality, it follows for d , d ∈ N kD d J, D d J, ( b d j W J ) k . − jsd − jsαd js (1 − α ) ℓ d − β − J . This proves the desired estimate for each term of L dJ, ( b d j W J ), since these are of the form2 jsd jsαd ℓ − d J D d D d ( b d j W J ) with d , d ≤ d . After the preparation of the preceding two subsections we now turn back to the proof of Proposition 4.5.Due to the assumptions, α ∈ [0 , s > β ∈ N , ν > f ∈ E β ([ − , ; ν ) is of thesimplified form (39). Further recall that for a cube Q ∈ Q j , j ∈ N , the notation f Q is used for theassociated fragment (37).Instead of the sequence θ Q = { θ µ } µ ∈ M j , we will analyze the relabelled sequence ˜ θ Q := { ˜ θ µ } µ ∈ M j withelements ˜ θ j,ℓ,k := θ j, [ ℓ + ℓ • ] ,k − k • , where we use the notation introduced at the end of Subsection 4.1. Recallthat the quantities ℓ • ∈ Z , k • ∈ Z are determined by Q ∈ Q j . 
In view of (42), we then have˜ θ j,ℓ,k = Z R c f ˜ Q ( ξ ) W j,ℓ +∆ ℓ ( ξ ) u j,ℓ +∆ ℓ,k +∆ k ( ξ ) dξ (52)with fixed ∆ k ∈ [0 , , ∆ ℓ ∈ [0 ,
1) depending on Q ∈ Q j . We define ∆ J := (0 , ∆ ℓ ) and J + := J + ∆ J for scale-angle pairs J = ( j, ℓ ) ∈ J . Further, we define for J = ( j, ℓ ) ∈ J and K = ( K , K ) ∈ Z the sets Z QJ,K := n ( k , k ) ∈ Z : ℓ − J +∆ J ( k + ∆ k ) ∈ [ K , K + 1) , k + ∆ k ∈ [ K , K + 1) o , e Z QJ,K := n ( k , k ) ∈ Z : 2 − js (1 − α ) ( k + ∆ k ) ∈ [ K , K + 1) , k + ∆ k ∈ [ K , K + 1) o . (53)9In the definition of Z QJ,K the quantity ℓ J +∆ J = 1 + 2 − js (1 − α ) | sin( ϕ J +∆ J ) | is used, with angle ϕ J +∆ J =( ℓ + ∆ ℓ ) ϕ j and ϕ j as in (4). To shorten notation, it is further useful to henceforth abbreviate L K := (1 + K )(1 + K ) . (54)Essential for the proof of Proposition 4.5, especially part (ii), is the following lemma which disentanglesthe smooth contribution from the singular part. Lemma 4.16.
Let j ∈ N and Q ∈ Q j be fixed. Under the assumptions of Proposition 4.5, the relabelledcoefficients ˜ θ Q = { ˜ θ µ } µ ∈ M j given by (52) can be decomposed in the form ˜ θ µ = a µ + b µ , µ ∈ M j , such that for every J ∈ J with | J | = j and every K ∈ Z , with a uniform constant and d ∈ N fixed, X k ∈ Z QJ,K | a j,ℓ,k | . L − dK − js (1+ α ) ℓ − β − J and X k ∈ e Z QJ,K | b j,ℓ,k | . L − dK e A J − jsα − jsβ . Here L K is the quantity defined in (54) , Z QJ,K and e Z QJ,K are given by (53) , and e A J ∈ [0 , are numberswith P | J | = j e A J ≤ . If f Q is a smooth fragment, a possible decomposition is given by a µ := 0 and b µ := ˜ θ µ for µ ∈ M j . It is important to note that the implicit constants in Lemma 4.16 can be chosen uniformly for all j ∈ N and Q ∈ Q j . Proof.
Recall, that the functions u J,k , J ∈ J + , are obtained by rotation of the function u j, ,k ( ξ ) = 2 − js (1+ α ) / exp (cid:0) h πi (2 − js k , − jsα k ) , ξ i (cid:1) , ξ ∈ R . Hence D J, u J,k = (2 πi )2 − js k u J,k and D J, u J,k = (2 πi )2 − jsα k u J,k for each J ∈ J + . We thus establish L J, u J,k = (cid:0) π ) − js (1 − α ) k (cid:1)(cid:0) π ) k (cid:1) u J,k for the differential operator L J, defined in (46). Applying partial integration, we obtain from (52)˜ θ J,k = (cid:0)(cid:0) π − js (1 − α ) ( k + ∆ k ) (cid:1)(cid:0) π ) ( k + ∆ k ) (cid:1)(cid:1) − d Z R L dJ + , ( c f ˜ Q W J + )( ξ ) u J + ,k +∆ k ( ξ ) dξ. Further, since u J +∆ J,k +∆ k ( ξ ) = u J +∆ J,k ( ξ ) · exp (cid:0) h πi (2 − js ∆ k , − jsα ∆ k ) , R J +∆ J ξ i (cid:1) and { u J + ,k } k ∈ Z is an orthonormal basis for L (Ξ J + ), we obtain for J ∈ J , | J | = j , and K = ( K , K ) ∈ Z X k ∈ e Z QJ,K | ˜ θ j,ℓ,k | ≤ (1 + K ) − d (1 + K ) − d Z R |L dJ + , ( c f ˜ Q W J + )( ξ ) | dξ. (55)In case that f ˜ Q is a smooth fragment, Proposition 4.11 (ii) yields X k ∈ e Z QJ,K | ˜ θ j,ℓ,k | . L − dK A J +∆ J − jsα − jsβ . By relabelling e A J := A J +∆ J we get the desired result.If f ˜ Q is an edge fragment, we prove the assertion by induction on β . In case β = 0, we choose b µ := ˜ θ µ and a µ := 0. Then the assertion is fulfilled, since by (55) and Proposition 4.11 (i) X k ∈ e Z QJ,K | ˜ θ j,ℓ,k | . L − dK A J − jsα . β ≥ j = 0, also due toProposition 4.11 (i).It thus remains to prove the assertion for j, β ∈ N . If j ∈ N , by definition, W J ( ξ ) = U j ( | ξ | ) V J ( ξ/ | ξ | ) = U (2 − js | ξ | ) V J ( ξ/ | ξ | ). To use induction we rewrite (52) in the form˜ θ J,k = 2 − js Z R | ξ | c f ˜ Q ( ξ ) U (2 − js | ξ | ) V J +∆ J ( ξ/ | ξ | )2 − js | ξ | u J +∆ J,k +∆ k ( ξ ) dξ. We introduce the function e U ( r ) = U ( r ) r , r ∈ R +0 , and put e U j := e U (2 − js · ) for j ≥
1. In addition,we put e U ( r ) = U ( r ), r ∈ R +0 . Further, we define e V J ( ξ ) := V J ( ξ ) cos( | ϕ ( ξ ) − ϕ J | ) − for ξ ∈ S and J ∈ J + , | J | ≥
1. For J = (0 ,
0) we define e V J := V J . Note that for ξ ∈ A J , | J | ≥
1, we have | ϕ ( ξ ) − ϕ J | ≤ ϕ + j / ≤ π/ ≤ cos( | ϕ ( ξ ) − ϕ J | ) − ≤
3. For J ∈ J + we then define f W J ( ξ ) := e U j ( | ξ | ) e V J ( ξ/ | ξ | ) , ξ ∈ R . The functions { f W J } J ∈ J are again wedge functions of the form (10) which satisfy condition (11) with some(possibly different) constants 0 < A ≤ B < ∞ . Using these functions the coefficients take the form˜ θ J,k = 2 − js Z R | ξ | cos( | ϕ ( ξ ) − ϕ J +∆ J | ) c f ˜ Q ( ξ ) f W J +∆ J ( ξ ) u J +∆ J,k +∆ k ( ξ ) dξ. ( J, k ) ∈ M j . (56)Now recall the directional derivative D J, = cos( ϕ J ) ∂ +sin( ϕ J ) ∂ depending on J ∈ J + . For ξ = ( ξ , ξ ) =( | ξ | cos ϕ, | ξ | sin ϕ ) ∈ R we have ξ cos( ϕ J ) + ξ sin( ϕ J ) = | ξ | (cid:0) cos( ϕ ) cos( ϕ J ) + sin( ϕ ) sin( ϕ J ) (cid:1) = | ξ | cos( | ϕ − ϕ J | ) . Hence, (56) becomes ˜ θ J,k = (2 πi ) − − js Z R (cid:0) D J + , f ˜ Q (cid:1) ∧ ( ξ ) f W J + ( ξ ) u J + ,k +∆ k ( ξ ) dξ. The edge fragment f ˜ Q is of the form f j = gω (2 jsα · ) H with g ∈ C β ( R ), ω ∈ C ∞ ( R ), and the bivariatestep function H = h ⊗ e g = D J + , g , e ω = D J + , ω , and e ω j = e ω (2 jsα · ). Further,recall ∂ H = δ { x =0 } and note that D J + , H = cos( ϕ J + ) ∂ H + sin( ϕ J + ) ∂ H = cos( ϕ J + ) δ { x =0 } . The product rule yields D J + , f j = e gω j H + cos( ϕ J + ) δ { x =0 } ω j g + 2 jsα g e ω j H = T + cos( ϕ J + ) T + 2 jsα T with terms T := e gω j H , T := δ { x =0 } ω j g , and T := g e ω j H . This leads to the decomposition˜ θ j,ℓ,k ≍ − js c (0) j,ℓ,k + 2 − js cos( ϕ J + ) d (0) j,ℓ,k + 2 − js (1 − α ) ˜ θ (1) j,ℓ,k (57)with c (0) j,ℓ,k := Z R b T f W J + ( ξ ) u J + ,k +∆ k ( ξ ) dξ,d (0) j,ℓ,k := Z R b T f W J + ( ξ ) u J + ,k +∆ k ( ξ ) dξ, ˜ θ (1) j,ℓ,k := Z R b T f W J + ( ξ ) u J + ,k +∆ k ( ξ ) dξ. Note that e g ∈ C β − ( R ) and e ω ∈ C ∞ ( R ) with supp e ω ⊆ supp ω . By induction we can decompose c (0) µ = a (0) µ + b (0) µ , µ ∈ M j , { a (0) µ } µ ∈ M j and { b (0) µ } µ ∈ M j satisfy the assertion for β −
1. The coefficients { d (0) j,ℓ,k } µ ∈ M j can be handled with the help of Proposition 4.15. We have for the differential operator L J, from (51) L J, u J,k = (cid:0) π ) ℓ − dJ k (cid:1)(cid:0) π ) k (cid:1) u J,k . Partial integration leads to d (0) J,k = (cid:0) π ) ℓ − dJ + ( k + ∆ k ) (cid:1) − d (cid:0) π ) ( k + ∆ k ) (cid:1) − d Z R L dJ + , ( b T f W J + )( ξ ) u J + ,k +∆ k ( ξ ) dξ. We deduce that for every J ∈ J with | J | = j and every K = ( K , K ) ∈ Z X k ∈ Z QJ,K | d (0) J,k | ≤ ( L K ) − d Z R |L dJ + , ( b T f W J + )( ξ ) | dξ . ( L K ) − d js (1 − α ) ℓ − β − J + . Here we applied the fact that { u J + ,k } k ∈ Z is an orthonormal basis for L (Ξ J + ) and Proposition 4.15.Finally, note that | sin( ϕ J ) | ≍ | ϕ J | ≍ | ℓ | − js (1 − α ) uniformly for J ∈ J + . Hence, due to ∆ ℓ ∈ [0 , ℓ J ≍ | ℓ | ≍ | ℓ + ∆ ℓ | ≍ ℓ J +∆ J .It remains to handle the sequence { ˜ θ (1) µ } µ ∈ M j which resembles the original sequence { ˜ θ µ } µ ∈ M j and canbe handled accordingly. After γ iterations of the decomposition process (57) we end up with sequences { c (0) µ } µ ∈ M j , . . . , { c ( γ − µ } µ ∈ M j , { d (0) µ } µ ∈ M j , . . . , { d ( γ − µ } µ ∈ M j , and { ˜ θ ( γ ) µ } µ ∈ M j . We choose γ = ⌈ − α ⌉ sothat 2 − js (1 − α ) γ ≤ − js . We can apply the induction hypothesis on { c ( τ ) µ } µ ∈ M j for every τ ∈ { , . . . , γ − } , which leads to sequences { a ( τ ) µ } µ ∈ M j and { b ( τ ) µ } µ ∈ M j . Since g ∈ C β ( R ) ⊂ C β − ( R ) also { ˜ θ ( γ ) µ } µ ∈ M j can be decomposed into twosequences { a ( γ ) µ } µ ∈ M j and { b ( γ ) µ } µ ∈ M j .Finally, we obtain the desired decomposition ˜ θ µ = a µ + b µ , µ ∈ M j , with a µ := 2 − js γ − X τ =0 − js (1 − α ) τ a ( τ ) µ + 2 − js (1 − α ) γ a ( γ ) µ ,b µ := 2 − js γ − X τ =0 − js (1 − α ) τ b ( τ ) µ + 2 − js (1 − α ) γ b ( γ ) µ + 2 − js cos( ϕ J + ) γ − X τ =0 − js (1 − α ) τ d ( τ ) µ . With Lemma 4.16 in our toolbox, it is not difficult any more to prove Proposition 4.5. 
The remaining considerations are merely interpolation arguments.
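The interpolation arguments alluded to here rest on the elementary finite-sequence embedding ‖{c_λ}_{λ∈Λ}‖_{ℓ^p} ≤ (#Λ)^{1/p−1/2} ‖{c_λ}_{λ∈Λ}‖_{ℓ²}, valid for 0 < p ≤ 2 by Hölder's inequality. A numerical sanity check with an arbitrary test vector:

```python
def lp_norm(c, p):
    # ||c||_p = (sum |x|^p)^{1/p}
    return sum(abs(x) ** p for x in c) ** (1.0 / p)

n = 500
p = 0.5                                       # p = 2/(1+beta) with beta = 3
c = [(-1) ** k / (k + 1.0) for k in range(n)]
lhs = lp_norm(c, p)
rhs = n ** (1.0 / p - 0.5) * lp_norm(c, 2)
assert lhs <= rhs  # Hoelder embedding for finite sequences
print(lhs, rhs)
```

Equality is attained for constant sequences, so the cardinality factor (#Λ)^{1/p−1/2} cannot be improved.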
Proof of Proposition 4.5.
We first handle part (i) of the proposition, when f_j is a smooth fragment. Let M_j denote the curvelet indices at scale j ∈ N and define M^Q_{j,K} := {(j, ℓ, k) ∈ M_j : k ∈ Z̃^Q_{J,K}} for K ∈ Z². Since Σ_{|J|=j} Ã_J ≲
1, Lemma 4.16 yields for K ∈ Z²

Σ_{µ∈M^Q_{j,K}} |θ̃_{J,k}|² = Σ_{|J|=j} Σ_{k∈Z̃^Q_{J,K}} |θ̃_{J,k}|² ≲ L_K^{−2d} 2^{−2jsα} 2^{−2jsβ}.

Let us fix d ∈ N as the smallest integer satisfying d > (1+β)/4, i.e., d := ⌊(1+β)/4⌋ + 1. This ensures

Σ_{K∈Z²} L_K^{−2d/(1+β)} = Σ_{K∈Z²} ((1 + K₁²)(1 + K₂²))^{−2d/(1+β)} ≲ 1, (58)

which will be important below. Further, note that we have the estimate

Σ_{|J|=j} #Z̃^Q_{J,K} ≤ Σ_{|J|=j} 2^{js(1−α)} ≲ 2^{2js(1−α)}.

Next, recall the norm estimate ‖{c_λ}_{λ∈Λ}‖_{ℓ^p} ≤ (#Λ)^{1/p−1/2} ‖{c_λ}_{λ∈Λ}‖_{ℓ²}, valid for 0 < p ≤ 2 and finite sequences {c_λ}_{λ∈Λ}. Interpolation with p = 2/(1+β) yields

‖{θ̃_µ}_{µ∈M^Q_{j,K}}‖_{2/(1+β)} ≲ 2^{jsβ(1−α)} (L_K)^{−d} 2^{−jsα} 2^{−jsβ} = (L_K)^{−d} 2^{−jsα(1+β)}.

The proof of part (i) is finished by applying the p-triangle inequality with p = 2/(1+β) ≤
1. In viewof (58) we arrive at k{ ˜ θ µ } µ ∈ M j k / (1+ β )2 / (1+ β ) ≤ X K ∈ Z k{ ˜ θ µ } µ ∈ M Qj,K k / (1+ β )2 / (1+ β ) . − jsα . We finally turn to the proof of part (ii) and assume that f j is an edge fragment. We denote by { a µ } µ ∈ M j and { b µ } µ ∈ M j the decomposition of the sequence { ˜ θ µ } µ ∈ M j according to Lemma 4.16. Analogous to thetreatment of the smooth case, one can deduce k{ b µ } µ ∈ M j k / (1+ β )2 / (1+ β ) . − jsα . (59)It remains to handle { a µ } µ ∈ M j . Due to Lemma 4.16 we have with d ∈ N chosen as above X k ∈ Z QJ,K | a j,ℓ,k | . L − dK − js (1+ α ) ℓ − β − J . (60)Recall that ℓ J = 1 + 2 js (1 − α ) | sin( ϕ J ) | ≥ Z QJ,K ≤ ℓ J +∆ J ≍ ℓ J . (61)In view of (60) and (61) we conclude for ε > N QJ,K ( ε ) := n k ∈ Z QJ,K : | a j,ℓ,k | > ε o . min n ℓ J , ε − L − dK − js (1+ α ) ℓ − β − J o . The next step is to show X | J | = j N QJ,K ( ε ) . ε − / ( β +1) L − d/ ( β +1) K − js (1+ α ) / (1+ β ) . (62)Since ℓ J ≍ | ℓ | we can estimate, where we use the quantities ℓ −∗ := ⌈ ℓ ∗ ⌉ − ℓ + ∗ := ⌈ ℓ ∗ ⌉ with ℓ ∗ := ε − / (1+ β ) L − d/ (1+ β ) K − js α β ) , L + j X ℓ =0 N j,ℓ,K ( ε ) . L + j +1 X ℓ =1 min n ℓ, ε − L − dK − js (1+ α ) ℓ − β − o ≤ ℓ −∗ X ℓ =1 ℓ + L + j +1 X ℓ = ℓ + ∗ ε − L − dK − js (1+ α ) ℓ − β − . Note that ℓ −∗ ∈ N . Therefore, it holds ℓ −∗ X ℓ =1 ℓ = 12 ℓ −∗ ( ℓ −∗ + 1) ≤ ℓ ∗ = rhs(62) . Further, taking into account ℓ ∗ ≤ ℓ + ∗ , we obtain L + j +1 X ℓ = ℓ + ∗ ε − L − dK − js (1+ α ) ℓ − β − . ε − L − dK − js (1+ α ) ℓ − β ∗ = rhs(62) . Altogether, this proves (62) since the sum P ℓ = − L − j N Qj,ℓ,K ( ε ) can be estimated analogously.3Recall that M j denotes the curvelet indices at scale j . Using (58) we deduce from (62) n µ ∈ M j : | a µ | > ε o = X K ∈ Z X | J | = j N QJ,K ( ε ) . − js (1+ α ) / (1+ β ) ε − / (1+ β ) . This implies the following estimate, where we let ρ = max (cid:8) , s ( αβ − / (1 + β ) (cid:9) , k{ a µ } µ ∈ M j k / (1+ β ) wℓ / (1+ β ) . 
− js (1+ α ) / (1+ β ) = 2 − jsα js ( αβ − / (1+ β ) ≤ − jsα jρ . (63)In a last step, we combine (59) and (63). Using the p -triangle inequality with p = β ≤ k{ ˜ θ µ } µ ∈ M j k / (1+ β ) wℓ / (1+ β ) ≤ k{ a µ } µ ∈ M j k / (1+ β ) wℓ / (1+ β ) + k{ b µ } µ ∈ M j k / (1+ β ) wℓ / (1+ β ) . − jsα jρ + 2 − jsα . − jsα jρ , which finishes the proof. In this final section we interpret and discuss the results of our previous investigations. First we note thatTheorem 3.11 complements the result of Theorem 3.7. The latter guarantees at least an approximationrate of order N − /α for E β ([ − , ; ν ) if β ≥ α − and α ∈ [ , β = α − . Theorem 3.11 now tells us that this rate doesnot improve for C β cartoons with β > α − , at least if we restrict to greedy approximations obtainedby simple thresholding. Hence, α -curvelets in the range α ∈ [ ,
1) cannot take advantage of cartoon regularity higher than C^{α^{−1}}.

Turning to the range α ∈ [0, 1/2), according to both Theorem 3.9 and Theorem 3.11 the approximation deteriorates as α tends to 0. In Theorem 3.11 the achievable rate peaks for α = 1/2, a confirmation of the outstanding role of parabolic scaling for cartoon approximation. Among all α-curvelet frames, the classic parabolically scaled systems provide the best performance for E^β([−1,1]²; ν) if β ≥ 2. However, if β > 2, the obtained rate of order N^{−2} is suboptimal.

To better understand this behavior, recall the heuristic considerations in Subsection 3.3. A Taylor expansion showed that C^β curves with β ∈ (1, 2] are locally contained in (properly aligned) rectangles of size width ≈ length^β. This explains why α-scaling with α = β^{−1} is optimally suited to resolve such curves. It also indicates that it is not the smoothness of the curves that determines the best type of scaling, but their local scaling behavior. If the second-order Taylor term at some point of a C^β curve, where β ≥ 2, does not vanish, the scaling locally obeys width ≈ length². Consequently, the choice α = 1/2 is still the best for C^β curves with β ≥ 2. For straight edges, on the other hand, moving α away from 0 deteriorates the approximability of the edge, but according to Theorem 4.1, for signals from E^β([−1,1]²; ν) this deterioration is masked by the overall approximation performance of order N^{−β} if α ∈ [0, β^{−1}].

It is remarkable that up to now no frame is known where a nonadaptive thresholding scheme yields approximation rates better than N^{−2} for the class E^β([−1,1]²; ν), β > 2. As we have seen, α-scaling is not able to take advantage of smoothness beyond C², wherefore new ideas need to be considered. One approach might be based on the bendlet transform [40], which incorporates bending in addition to α-scaling for improved adaptability to the edges. While the bendlet dictionary seems to be useful for certain image analysis tasks, it is not yet clear how to extract bendlet frames suitable for approximation, and this question requires further research.

Finally, let us derive some implications of the obtained results for other α-scaled representation systems. The framework of α-molecules allows to transfer properties of C_{s,α} to other systems of α-molecules if their parametrization is consistent with the parametrization (M, Φ_M) of C_{s,α} from (23). For the required notion of consistency, let us first recall the phase-space metric ω_α introduced in [24] for the phase space P = R_+ × T × R².

Definition 5.1 ([24, Def. 4.1]). Let α ∈ [0, 1]. The α-scaled index distance ω_α : P × P → [1, ∞) is defined by

ω_α(p_λ, p_µ) = max{ s_λ/s_µ , s_µ/s_λ } (1 + d_α(p_λ, p_µ)),

where p_λ = (s_λ, θ_λ, x_λ) ∈ P, p_µ = (s_µ, θ_µ, x_µ) ∈ P, and with s₀ = min{s_λ, s_µ}, e_λ = (cos(θ_λ), −sin(θ_λ)),

d_α(p_λ, p_µ) = s₀^{2(1−α)} |θ_λ − θ_µ|² + s₀^{2α} |x_λ − x_µ|² + (s₀ / (1 + s₀^{1−α} |θ_λ − θ_µ|)) |⟨e_λ, x_λ − x_µ⟩|.

The consistency of two parametrizations is then defined as follows.
Definition 5.2 ([24, Def. 5.5]). Let α ∈ [0, 1] and k > 0. Two parametrizations (Λ, Φ_Λ) and (∆, Φ_∆), for index sets Λ and ∆ respectively, are called (α, k)-consistent if

sup_{λ∈Λ} Σ_{µ∈∆} ω_α(Φ_Λ(λ), Φ_∆(µ))^{−k} < ∞  and  sup_{µ∈∆} Σ_{λ∈Λ} ω_α(Φ_Λ(λ), Φ_∆(µ))^{−k} < ∞.

Since C_{s,α} is a tight frame of α-molecules of arbitrary order, as shown by Lemma 2.5, the theory of α-molecules allows us to deduce the following result practically for free.

Theorem 5.3.
Let α ∈ [0, 1] and let M := {m_λ}_{λ∈Λ} be a frame of α-molecules whose parametrization, for some k > 0, is (α, k)-consistent with the α-curvelet parametrization (M, Φ_M) of C_{s,α}. Further, assume that for some γ ∈ R⁺₀ the order (L, M, N₁, N₂) of M satisfies

L ≥ k(1+γ),  M ≥ 3k(1+γ)/2 + (α−3)/2,  N₁ ≥ k(1+γ)/2 + (1+α)/2,  N₂ ≥ k(1+γ).   (64)

Then the following holds true:

(i) Let c̃_λ := ⟨f, m_λ⟩, λ ∈ Λ, denote the analysis coefficients of f ∈ E^β([−1,1]²; ν) with respect to M, and assume β ∈ N. If (64) is fulfilled for γ = min{β, α^{−1}}, then {c̃_λ}_{λ∈Λ} ∈ ℓ^p(Λ) for all p > 2/(1+γ).

(ii) Let Θ = Σ_{λ∈Λ} c_λ m_λ be a representation of the function Θ from (31) with respect to M. If (64) is fulfilled for some γ > γ̃ := (max{α, 1−α})^{−1}, then {c_λ}_{λ∈Λ} ∉ ℓ^p(Λ) for p ≤ 2/(1+γ).

Proof. According to [24, Thm. 5.6], condition (64) ensures that the systems M and C_{s,α} are sparsity equivalent in ℓ^p for p := 2/(1+γ), which means ‖(⟨m_λ, ψ_µ⟩)_{λ,µ}‖_{ℓ^p→ℓ^p} < ∞ (see [24, Def. 5.3]). Since f = Σ_µ ⟨f, ψ_µ⟩ ψ_µ and {⟨f, ψ_µ⟩}_µ ∈ ℓ^{p+ε}(M) for every ε > 0 by Theorem 4.3, assertion (i) follows. For (ii) assume that {c_λ}_λ ∈ ℓ^p(Λ), which implies by sparsity equivalence {⟨Θ, ψ_µ⟩}_µ ∈ ℓ^p(M). Using Θ = Σ_µ ⟨Θ, ψ_µ⟩ ψ_µ and Lemma 4.2, this then implies an N-term approximation rate of order N^{−γ}, in contradiction to Theorem 3.11.

A direct corollary is obtained via Lemma 4.2.

Corollary 5.4.
Under the assumptions of Theorem 5.3(i), every dual frame {m̃_λ}_{λ∈Λ} of M yields – via simple thresholding – N-term approximations f_N to f ∈ E^β([−1,1]²; ν) satisfying

‖f − f_N‖₂² ≲ N^{−min{β, α^{−1}}+ε},  ε > 0 arbitrary,  as N → ∞.

To see the reach of these results, let us mention that the α-shearlet parametrization is (α, k)-consistent with the α-curvelet parametrization for sufficiently large k. The obtained results thus carry over to α-shearlet frames, including both band-limited and compactly supported constructions (see [24, Prop. 3.11]).

A Bessel Functions
In this appendix we collect some useful facts about Bessel functions, mainly taken from [31] and [21]. We are only interested in Bessel functions J_ν of integer and half-integer order in the range ν ∈ {−1/2, 0, 1/2, 1, 3/2, ...}. Bessel functions of this kind occur naturally in the Fourier analysis of radial functions. For t ∈ R_+ the value J_ν(t) is conveniently defined by either of the two series (see [31] and [21, Appendix B.3])

J_ν(t) = (t/2)^ν Σ_{k=0}^∞ ((−1)^k / (Γ(k+1) Γ(k+ν+1))) (t/2)^{2k} = (1/√π) (t/2)^ν Σ_{k=0}^∞ (−1)^k (Γ(k+1/2) / Γ(k+ν+1)) (t^{2k} / (2k)!),   (65)

where the Gamma function Γ extends the factorial z! to the complex numbers with Γ(z) = (z−1)! for z ∈ N, and Γ(k+1/2) = ((2k)! / (k! 4^k)) √π for k ∈ N₀.

We explicitly remark that definition (65) is also valid for ν = −1/2, although this case is not included in the exposition of [21]. As is obvious from the second representation, the functions J_ν of half-integer order can be expressed in closed form in terms of trigonometric functions. For integer orders such closed form representations do not exist.

If f(x) = f₀(|x|) is a radial function on R^d, d ∈ N, with a suitable function f₀ defined on R⁺₀, the Fourier transform of f is given by the formula

f̂(ξ) = 2π |ξ|^{−(d−2)/2} ∫₀^∞ f₀(r) J_{d/2−1}(2πr|ξ|) r^{d/2} dr,  ξ ∈ R^d.

Applying this formula to the characteristic function χ_{B_d(0,1)} of the d-dimensional unit ball B_d(0,1) ⊂ R^d yields

(χ_{B_d(0,1)})^∧(ξ) = 2π |ξ|^{−(d−2)/2} ∫₀^1 J_{d/2−1}(2π|ξ|r) r^{d/2} dr = J_{d/2}(2π|ξ|) / |ξ|^{d/2},  ξ ∈ R^d.   (66)

Here, for the integration, we used the second of the following recurrence relations [21, Appendix B.2], which are valid for ν ∈ {1/2, 1, 3/2, 2, ...} and all t ∈ R_+:

t^{−ν+1} J_ν(t) = −(d/dt)(t^{−ν+1} J_{ν−1}(t))  and  t^ν J_{ν−1}(t) = (d/dt)(t^ν J_ν(t)).
The case ν = 1/2 is not treated in [21], yet it can be easily confirmed by a direct calculation. By scaling, we can further deduce from (66) the following Fourier representation of the bivariate function Θ(x) = χ_{B₂(0,1)}(2x), x ∈ R², from (31):

Θ̂(ξ) = (1/4) (χ_{B₂(0,1)})^∧(ξ/2) = J₁(π|ξ|) / (2|ξ|),  ξ ∈ R².   (67)

Important for our investigation in Section 3 is the asymptotic behavior of J_ν(r) as r → ∞. We cite the following result from [21, Appendix B.8], which states for ν ∈ N₀ the identity

J_ν(r) = √(2/(πr)) cos(r − πν/2 − π/4) + R_ν(r),  r ∈ R_+,   (68)

with a function R_ν given on R_+ by

R_ν(r) = (2π)^{−1/2} (r^ν / Γ(ν+1/2)) e^{i(r−πν/2−π/4)} ∫₀^∞ e^{−rt} t^{ν+1/2} [(1+it/2)^{ν−1/2} − 1] dt/t + (2π)^{−1/2} (r^ν / Γ(ν+1/2)) e^{−i(r−πν/2−π/4)} ∫₀^∞ e^{−rt} t^{ν+1/2} [(1−it/2)^{ν−1/2} − 1] dt/t.

Further, for each ν ∈ N₀ there is a constant C_ν > 0 such that R_ν satisfies the estimate

|R_ν(r)| ≤ C_ν r^{−3/2}  whenever r ≥ 1.   (69)

The representation (68) and the estimate (69) play an important role in the proof of Lemma 3.8. For completeness, let us finally note that the identity (68) especially holds true in the case ν = −1/2, with vanishing remainder R_{−1/2} ≡ 0. This is a direct consequence of the definition (65) and the Taylor series of the cosine.
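The facts collected above are easy to check numerically. The following sketch is our own illustration (not part of [21] or [31]): it evaluates J_ν through the power series (65) using only the Python standard library, confirms the closed form J_{−1/2}(t) = √(2/(πt)) cos(t) — that is, the vanishing of the remainder in (68) for ν = −1/2 — and checks the decay (69) for ν = 0. The truncation length and tolerances are ad hoc.

```python
import math

def bessel_j(nu, t, terms=60):
    """J_nu(t) via the power series (65); suitable for nu >= -1/2 and t > 0."""
    s = 0.0
    for k in range(terms):
        s += ((-1) ** k / (math.gamma(k + 1) * math.gamma(k + nu + 1))
              * (t / 2) ** (2 * k))
    return (t / 2) ** nu * s

# Closed form for nu = -1/2: the remainder R_{-1/2} in (68) vanishes.
for t in (0.5, 1.0, 3.0, 10.0):
    closed = math.sqrt(2 / (math.pi * t)) * math.cos(t)
    assert abs(bessel_j(-0.5, t) - closed) < 1e-8

# Decay (69) for nu = 0: the remainder behaves like O(r^{-3/2}).
for r in (5.0, 10.0, 20.0):
    remainder = bessel_j(0, r) - math.sqrt(2 / (math.pi * r)) * math.cos(r - math.pi / 4)
    assert abs(remainder) <= 0.15 * r ** -1.5
```

The closed-form check for ν = 1/2, namely J_{1/2}(t) = √(2/(πt)) sin(t), passes in the same way.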
Acknowledgements
The author acknowledges support by the BMS (Berlin Mathematical School) and thanks Prof. Dr. Gitta Kutyniok and Anton Kolleck for proofreading the manuscript, as well as for many helpful comments.
References

[1] J. Cai, B. Dong, S. Osher, and Z. Shen. Image restoration: total variation, wavelet frames, and beyond. J. Amer. Math. Soc., 25(4):1033–1089, 2012.
[2] E. J. Candès. Ridgelets: theory and applications. Ph.D. thesis, Stanford University, CA, 1998. Online available: http://statweb.stanford.edu/~candes/publications.html.
[3] E. J. Candès. Ridgelets and the representation of mutilated Sobolev functions. SIAM J. Math. Anal., 33(2):347–368, 2001.
[4] E. J. Candès and D. L. Donoho. Curvelets – a surprisingly effective nonadaptive representation for objects with edges. In C. Rabut, A. Cohen, and L. Schumaker, editors, Curves and Surfaces, pages 105–120. Vanderbilt University Press, 2000.
[5] E. J. Candès and D. L. Donoho. New tight frames of curvelets and optimal representations of objects with C² singularities. Comm. Pure Appl. Math., 57(2):219–266, 2004.
[6] V. Chandrasekaran, M. B. Wakin, D. Baron, and R. G. Baraniuk. Compressing piecewise smooth multidimensional functions using surflets: rate-distortion analysis. Technical report, Department of Electrical and Computer Engineering, Rice University, Mar. 2004. Online available: http://dsp.rice.edu/sites/dsp.rice.edu/files/publications/report/2004/compressin-riceece-2004.pdf.
[7] V. Chandrasekaran, M. B. Wakin, D. Baron, and R. G. Baraniuk. Compression of higher dimensional functions containing smooth discontinuities. In Conference on Information Sciences and Systems, Princeton, Mar. 2004.
[8] V. Chandrasekaran, M. B. Wakin, D. Baron, and R. G. Baraniuk. Surflets: a sparse representation for multidimensional functions containing smooth discontinuities. In IEEE Symposium on Information Theory, Chicago, Jul. 2004.
[9] V. Chandrasekaran, M. B. Wakin, D. Baron, and R. G. Baraniuk. Representation and compression of multidimensional piecewise functions using surflets. IEEE Trans. Inform. Theory, 55(1):374–400, 2009.
[10] C. Christopoulos, A. Skodras, and T. Ebrahimi. The JPEG2000 still image coding system: an overview. IEEE Trans. Consum. Electron., 46(4):1103–1127, 2000.
[11] A. Cohen, W. Dahmen, and R. DeVore. Adaptive wavelet methods for elliptic operator equations: convergence rates. Math. Comp., 70(233):27–75, 2001.
[12] I. Daubechies. Ten Lectures on Wavelets. SIAM, Philadelphia, 1992.
[13] R. A. DeVore. Nonlinear approximation. Acta Numerica, 7:51–150, 1998.
[14] M. N. Do and M. Vetterli. The contourlet transform: an efficient directional multiresolution image representation. IEEE Trans. Image Process., 14(12):2091–2106, 2005.
[15] D. L. Donoho. Wedgelets: nearly-minimax estimation of edges. Ann. Statist., 27:859–897, 1999.
[16] D. L. Donoho. Orthonormal ridgelets and linear singularities. SIAM J. Math. Anal., 31(5):1062–1099, 2000.
[17] D. L. Donoho. Ridge functions and orthonormal ridgelets. J. Approx. Theory, 111(2):143–179, 2001.
[18] D. L. Donoho. Sparse components of images and optimal atomic decompositions. Constr. Approx., 17(3):353–382, 2001.
[19] D. L. Donoho and X. Huo. Beamlet pyramids: a new form of multiresolution analysis suited for extracting lines, curves, and objects from very noisy image data. In Wavelet Applications in Signal and Image Processing VIII (San Diego, CA, 2000), Proc. SPIE, volume 4119, pages 434–444. SPIE, 2000.
[20] A. Flinth and M. Schäfer. Multivariate α-molecules. J. Approx. Theory, 202:64–108, 2016.
[21] L. Grafakos. Classical Fourier Analysis. Springer, 2nd edition, 2008.
[22] P. Grohs. Ridgelet-type frame decompositions for Sobolev spaces related to linear transport. J. Fourier Anal. Appl., 18(2):309–325, 2012.
[23] P. Grohs, S. Keiper, G. Kutyniok, and M. Schäfer. Cartoon approximation with α-curvelets. J. Fourier Anal. Appl., 22(6):1235–1293, 2016.
[24] P. Grohs, S. Keiper, G. Kutyniok, and M. Schäfer. α-Molecules. Appl. Comput. Harmon. Anal., 41(1):297–336, 2016.
[25] P. Grohs and G. Kutyniok. Parabolic molecules. Found. Comput. Math., 14(2):299–337, 2014.
[26] P. Grohs and A. Obermeier. On the approximation of functions with line singularities by ridgelets. Technical Report 2016-4, Seminar for Applied Mathematics, ETH Zürich, Switzerland, 2016. Online available: .
[27] P. Grohs and A. Obermeier. Optimal adaptive ridgelet schemes for linear advection equations. Appl. Comput. Harmon. Anal., 41(3):768–814, 2016.
[28] K. Guo, G. Kutyniok, and D. Labate. Sparse multidimensional representations using anisotropic dilation and shear operators. In Wavelets and Splines (Athens, GA, 2005), pages 189–201. Nashboro Press, Nashville, TN, 2006.
[29] K. Guo and D. Labate. Optimally sparse multidimensional representation using shearlets. SIAM J. Math. Anal., 39(1):298–318, 2007.
[30] K. Guo and D. Labate. The construction of smooth Parseval frames of shearlets. Math. Model. Nat. Phenom., 8(1):82–105, 2013.
[31] W. Hackbusch, H. R. Schwarz, and E. Zeidler. Teubner-Taschenbuch der Mathematik. B. G. Teubner Stuttgart, Leipzig, 1996.
[32] S. Keiper. A flexible shearlet transform – sparse approximation and dictionary learning. Bachelor's thesis, TU Berlin, Germany, 2012.
[33] P. Kittipoom, G. Kutyniok, and W.-Q Lim. Construction of compactly supported shearlet frames. Constr. Approx., 35(1):21–72, 2012.
[34] J. Krommweh. Image approximation by adaptive tetrolet transform. In International Conference on Sampling Theory and Applications, Marseille, France, May 2009.
[35] G. Kutyniok, D. Labate, W.-Q Lim, and G. Weiss. Sparse multidimensional representation using shearlets. In Wavelets XI (San Diego, CA, 2005), SPIE Proc., volume 5914, pages 254–262. SPIE, Bellingham, WA, 2005.
[36] G. Kutyniok, J. Lemvig, and W.-Q Lim. Optimally sparse approximations of 3D functions by compactly supported shearlet frames. SIAM J. Math. Anal., 44(4):2962–3017, 2012.
[37] G. Kutyniok and W.-Q Lim. Compactly supported shearlets are optimally sparse. J. Approx. Theory, 163(11):1564–1589, 2011.
[38] E. Le Pennec and S. Mallat. Bandelet image approximation and compression. Multiscale Model. Simul., 4(3):992–1039, 2005.
[39] E. Le Pennec and S. Mallat. Sparse geometric image representations with bandelets. IEEE Trans. Image Process., 14(4):423–438, 2005.
[40] C. Lessig, P. Petersen, and M. Schäfer. Bendlets: a second-order shearlet transform with bent elements. 2016. Submitted. arXiv:1607.05520 [math.FA].
[41] A. Lisowska. Smoothlets – multiscale functions for adaptive representation of images. IEEE Trans. Image Process., 20(7):1777–1787, 2011.
[42] A. Lisowska. Multiwedgelets in image denoising. In J. Park, J. Ng, H.-Y. Jeong, and B. Waluyo, editors, Multimedia and Ubiquitous Engineering: MUE 2013, pages 3–11. Springer Netherlands, Dordrecht, 2013.
[43] S. Mallat. A Wavelet Tour of Signal Processing: The Sparse Way. Academic Press, 2nd edition, 2008.
[44] S. Mallat. Geometrical grouplets. Appl. Comput. Harmon. Anal., 26(2):161–180, 2009.
[45] R. M. Willet and R. D. Nowak. Platelets: a multiscale approach for recovering edges and surfaces in photon-limited medical imaging.