Analysis of simultaneous inpainting and geometric separation based on sparse decomposition
VAN TIEP DO, RON LEVIE, AND GITTA KUTYNIOK

Abstract.
Natural images are often the superposition of various parts with different geometric characteristics. For instance, an image might be a mixture of cartoon and texture structures. In addition, images are often given with missing data. In this paper we develop a method for simultaneously decomposing an image into its two underlying parts and inpainting the missing data. Our separation–inpainting method is based on an $\ell_1$ minimization approach, using two dictionaries, each sparsifying one of the image parts but not the other. We introduce a comprehensive convergence analysis of our method, in a general setting, utilizing the concepts of joint concentration, clustered sparsity, and cluster coherence. As the main application of our theory, we consider the problem of separating and inpainting an image into cartoon and texture parts.

1. Introduction
A digital image typically has two or more distinct constituents; e.g., it might contain a cartoon and a texture component. A key question is whether we can separate the image into its two components. This task is of interest in many applications, such as compression and restoration [5, 28, 57]. The separation problem is underdetermined and seems impossible to solve stably. However, if we have prior knowledge about the types of geometric components underlying the image, the separation task is possible, as shown in previous papers [4, 23, 27, 32, 36]. Some approaches commonly used in this context for image separation are variational methods, e.g., [28, 37], and PDE-based separation methods [37]. More recently, compressed sensing approaches showed that $\ell_1$ minimization can stably and precisely solve this problem both theoretically [10, 26, 31, 35, 40, 42] and empirically [4, 27]. The core idea is to use multiple dictionaries, each sparsely representing one geometric part of the image. As opposed to thresholding approaches like [22], which are harder to analyze, the $\ell_1$ minimization approach comes with strong theoretical results.

Another classical problem in imaging science is to restore corrupted or missing parts of images, namely image inpainting. Images are often damaged due to different factors, including improper storage, chemical processing, or losses of image data during transmission. The problem of image inpainting is of interest in numerous applications, from the restoration of scratched photos and corrupted images to the removal of selected objects. Image inpainting is an ill-posed problem, so some prior information on the missing part is required here as well. One approach that has been effectively applied to this problem is total variation minimization [44, 48, 50, 51]. Variational approaches work well on piecewise smooth images, but generally perform poorly on images that contain a superposition of cartoon and texture.
On the other hand, local statistical analysis and prediction have been shown to perform well at inpainting the texture content [46, 49]. In our analysis, we focus on a compressed sensing driven approach, based on the prior information that the components of the image are sparsely represented by two representation systems. For an illustration of the texture and cartoon parts of an image, we present in Figure 1 a corrupted photo with a missing part and a region covered by texture.

Department of Mathematics, Technische Universität Berlin, 10623 Berlin, Germany. Department of Mathematics, Mechanics and Informatics, Vietnam National University, 334 Nguyen Trai, Thanh Xuan, Hanoi, Vietnam. Department of Mathematics, Ludwig-Maximilians-Universität München, Munich, Germany.
E-mail address : [email protected], [email protected], [email protected] . a r X i v : . [ m a t h . F A ] S e p VAN TIEP DO, RON LEVIE, GITTA KUTYNIOK
Figure 1.
A corrupted photo, where the noise is a texture part together with missing stripes, and the clean image is the cartoon part.

1.1.
Separation and inpainting through cluster coherence.
The problem of inpainting and separation of cartoon from texture was considered in various papers, for instance [24, 33, 45]. The general approach is to find two dictionaries, each sparsely representing one image part and not sparsely representing the other. For example, texture is sparsely represented by a Gabor system, and cartoon by curvelets/shearlets. Then, in the separation and inpainting algorithm, the image is decomposed into a sparse representation based on the combined Gabor–curvelet system, using some compressed sensing approach. However, in the above-mentioned papers, there is no theoretical analysis of the success of the proposed methods. In this paper, we consider a setting similar to the above papers, and give a full approximation analysis of the method, with convergence guarantees. Our analysis is based on the theoretical machinery of joint concentration and cluster coherence. These notions were used in the past for analyzing separation problems and inpainting problems, but never (to the best of our knowledge) for the simultaneous problem. Given two dictionaries, both joint concentration and cluster coherence are definitions that quantify the ability of each dictionary to sparsely represent signals of one type, but not signals of another type. Some papers use joint concentration for inpainting [19, 25], some for separation [10, 13]. Proving theoretical results is typically easier when using joint concentration. However, checking joint concentration for dictionaries in practice is difficult. Thus the notion of cluster coherence was introduced in [10]; it is strictly stronger than joint concentration, and makes checking the assumptions of the theory easier in practical examples. Papers like [10, 13, 19, 25] use the notion of cluster coherence either for separation or for inpainting, but not simultaneously. In this paper, we modify and extend the definitions of joint concentration to accommodate a simultaneous analysis of separation and inpainting.
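The two-dictionary $\ell_1$ decomposition idea just described can be sketched numerically. The toy below is entirely our own construction (not the paper's image setting): an identity basis stands in for the system sparsifying one part, a normalized Hadamard basis for the other, one coordinate plays the missing region, and the $\ell_1$ problem is cast as a linear program.

```python
import numpy as np
from scipy.linalg import hadamard
from scipy.optimize import linprog

# Toy separation-inpainting by l1 minimization (illustrative setup only).
# H = R^8; Phi1 = identity basis (sparsifies spikes), Phi2 = Hadamard basis
# (sparsifies flat, oscillatory patterns); coordinate 0 is missing.
n = 8
Phi1 = np.eye(n)
Phi2 = hadamard(n) / np.sqrt(n)          # orthogonal, hence a Parseval frame

C = np.zeros(n); C[3] = 1.0              # one part: a single spike
T = Phi2[:, 0]                           # other part: 1-sparse in the Hadamard basis
f = C + T
known = np.arange(1, n)                  # P_K keeps coordinates 1..7

# Minimize ||Phi1* x1||_1 + ||Phi2* x2||_1  s.t.  P_K(x1 + x2) = P_K f.
# With orthonormal dictionaries we optimize directly over the coefficients,
# split into positive and negative parts to obtain a linear program.
A_eq = np.hstack([Phi1[known], -Phi1[known], Phi2[known], -Phi2[known]])
res = linprog(np.ones(4 * n), A_eq=A_eq, b_eq=f[known], bounds=(0, None))
u = res.x[:n] - res.x[n:2 * n]
v = res.x[2 * n:3 * n] - res.x[3 * n:]
C_star, T_star = Phi1 @ u, Phi2 @ v

print(np.allclose((C_star + T_star)[known], f[known], atol=1e-7))  # True
print(res.fun <= 2.0 + 1e-7)  # True: objective no worse than the ground-truth cost
```

The theory developed in this paper makes precise when such a minimizer is guaranteed to recover both parts from the incomplete observation.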
We then further extend the notion of cluster coherence to fit the modified definitions of joint concentration. With the new general definitions, we consider the problem of inpainting and separating texture from cartoon. We show that the universal shearlet systems [25] and Gabor systems [1] satisfy the cluster coherence requirement, thus proving the success of our inpainting–separation method.

1.2.
Our contribution.
We summarize our main contributions as follows.
• We modify the joint concentration and cluster coherence definitions to accommodate the simultaneous separation–inpainting problem (Section 3).
• Using the modified notions of joint concentration and cluster coherence, we prove that the $\ell_1$ sparse decomposition method successfully inpaints and separates the two parts of an image in a generic setting (Theorem 3.8).
• We use the general theory to inpaint and separate cartoon from texture in images. Here, the sparse decomposition method is based on a Gabor basis, sparsifying texture, and a universal shearlet dictionary, sparsifying cartoons (Section 5). The two systems are shown to satisfy the cluster coherence property, thus proving the success of the inpainting–separation of cartoon from texture method (Theorem 6.4).

2.
Simultaneous inpainting and separation approach
In this section we summarize our general theory of simultaneous inpainting and separation.

2.1.
The separation problem.
The task is to extract the two components $\mathcal{C}$ and $\mathcal{T}$ from the observed image $f$, where we assume that
\[ f = \mathcal{C} + \mathcal{T}. \tag{1} \]
Here, only $f$ is given, and the components $\mathcal{C}$ and $\mathcal{T}$ are unknown to us. To solve this underdetermined problem, we assume that each component can be sparsely represented by some dictionary that cannot sparsely represent the other component. In our theory, dictionaries are modeled as frames [38].

Definition 2.1.
Let $I$ be a discrete index set. A sequence $\Phi = \{\varphi_i\}_{i \in I}$ in a separable Hilbert space $\mathcal{H}$ is called a frame for $\mathcal{H}$ if there exist constants $0 < A \le B < \infty$ such that
\[ A \|f\|_2^2 \le \sum_{i \in I} |\langle f, \varphi_i \rangle|^2 \le B \|f\|_2^2, \quad \forall f \in \mathcal{H}, \]
where $A$ and $B$ are called the lower and upper frame bounds. If $A$ and $B$ can be chosen to be equal, we call the frame an ($A$-)tight frame. If $A = B = 1$, then $\{\varphi_i\}_{i \in I}$ is called a Parseval frame.

In our context, the Hilbert space $\mathcal{H}$ is the space of signals/images, and $\Phi$ is the dictionary. Abusing notation, we also denote by $\Phi$ the synthesis operator
\[ \Phi : \ell^2(I) \to \mathcal{H}, \quad \Phi(\{a_i\}_{i \in I}) = \sum_{i \in I} a_i \varphi_i, \]
which synthesizes an image $f \in \mathcal{H}$ from given coefficients $\{a_i\}_{i \in I}$. We denote by $\Phi^\star$ the analysis operator
\[ \Phi^\star : \mathcal{H} \to \ell^2(I), \quad \Phi^\star(f) = (\langle f, \varphi_i \rangle)_{i \in I}. \]
The analysis operator is interpreted as the transform that computes the different dictionary coefficients of an image $f$.

2.2. The inpainting problem.
To model the inpainting problem, we suppose that there is some missing data in the image $f$. Given the Hilbert space $\mathcal{H}$, we assume that $\mathcal{H} = \mathcal{H}_K \oplus \mathcal{H}_M$, where the subspaces $\mathcal{H}_K$ and $\mathcal{H}_M$ denote the known part and the missing part, respectively. Let $P_K$ and $P_M$ denote the orthogonal projections of $\mathcal{H}$ onto these two subspaces, respectively. In the inpainting problem, we are only given the image on the known part, $P_K f$, and the goal is to reconstruct $f$. The inpainting problem is solved by considering a dictionary $\{\varphi_i\}_{i \in I}$ that sparsely represents $f$ using a known subset of indices $\Lambda \subset I$, but does not sparsely represent $P_M f$ using the indices $\Lambda$. This idea is formalized in Definition 3.2.

2.3. The simultaneous separation–inpainting problem.
In the simultaneous problem, the goal is to extract $\mathcal{C}$ and $\mathcal{T}$, satisfying (1), given only the image on the known part, $P_K f$. In our approach, we consider two Parseval frames $\Phi_1$ and $\Phi_2$ in $\mathcal{H}$ that sparsely represent their respective components $\mathcal{C}$ and $\mathcal{T}$, but do not sparsely represent the other component. This is formalized through Definitions 3.3 and 3.1 for the joint concentration approach, and 3.3 and 3.5 for the cluster coherence approach. We moreover suppose that the index subsets $\Lambda_1$ and $\Lambda_2$ of $\Phi_1$ and $\Phi_2$, respectively, can sparsely represent $\mathcal{C}$ and $\mathcal{T}$, but cannot sparsely represent $P_M \mathcal{T}$ and $P_M \mathcal{C}$. This is formalized through Definitions 3.3 and 3.2 for the joint concentration approach, and 3.3 and 3.5 for the cluster coherence approach. We consider the following algorithm for simultaneously inpainting and separating (INP-SEP) geometric components, based on $\ell_1$ minimization.

Algorithm (INP-SEP)
INPUT: corrupted signal $P_K f \in \mathcal{H}_K$, two Parseval frames $\Phi_1 = \{\varphi_{1,i}\}_{i \in I}$ and $\Phi_2 = \{\varphi_{2,j}\}_{j \in J}$.
COMPUTE: $f^\star = (\mathcal{C}^\star, \mathcal{T}^\star)$, where
\[ (\mathcal{C}^\star, \mathcal{T}^\star) = \operatorname*{arg\,min}_{x_1, x_2} \|\Phi_1^\ast x_1\|_1 + \|\Phi_2^\ast x_2\|_1, \quad \text{s.t. } P_K(x_1 + x_2) = P_K(f). \tag{2} \]
OUTPUT: recovered components $\mathcal{C}^\star$, $\mathcal{T}^\star$.

In Section 3, we provide a theoretical analysis of the success of algorithm (2).

3.
General separation and inpainting theory
In this section, we introduce a theory in which we can prove the success of Algorithm (INP-SEP) for a general separation and inpainting problem.

3.1.
Joint concentration.
We propose a sufficient condition for the success of the (INP-SEP) algorithm, based on the notion of joint concentration. Joint concentration was first introduced in [10]. We present two slightly different notions of joint concentration, modified for our needs.
Definition 3.1.
Let $\Phi_1, \Phi_2$ be two Parseval frames. Given two sets of coefficients $\Lambda_1, \Lambda_2$, define the mixed joint concentration $\kappa_1 = \kappa_1(\Lambda_1, \Lambda_2)$ by
\[ \kappa_1 = \kappa_1(\Lambda_1, \Lambda_2) = \sup_{x, y \in \mathcal{H}} \frac{\|1_{\Lambda_1} \Phi_1^\ast x\|_1 + \|1_{\Lambda_2} \Phi_2^\ast y\|_1}{\|\Phi_1^\ast y\|_1 + \|\Phi_2^\ast x\|_1}. \tag{3} \]
Given an $\ell_2$ normalized signal $f$, it is common to interpret the $\ell_1$ norm of $f$ as a measure of spread (the opposite of sparsity). Thus, the mixed joint concentration with respect to the coefficient sets $\Lambda_1$ and $\Lambda_2$ quantifies the extent to which signals can have most of their energy supported and well spread in $\Lambda_j$ while being sparsely concentrated in the other frame $\Phi_k$, for $j \ne k$. Bounding $\kappa_1$ from above is one of the conditions that ensure that Algorithm (INP-SEP) succeeds in the separation task. Next, we introduce another joint concentration notion, which will be used to prove the success of the inpainting method.

Definition 3.2.
Let $\Phi_1, \Phi_2$ be two Parseval frames. Given two sets of coefficients $\Lambda_1, \Lambda_2$, define the joint concentration of the missing part $\kappa_2 = \kappa_2(\Lambda_1, \Lambda_2)$ by
\[ \kappa_2 = \kappa_2(\Lambda_1, \Lambda_2) = \sup_{x, y \in \mathcal{H};\, P_K x = P_K y} \frac{\|1_{\Lambda_1} \Phi_1^\ast (x - y)\|_1 + \|1_{\Lambda_2} \Phi_2^\ast (x - y)\|_1}{\|\Phi_1^\ast x\|_1 + \|\Phi_2^\ast y\|_1}. \]
The joint concentration of the missing part quantifies the extent to which signals which coincide on the known part can have most of the energy of their difference supported and well spread in $\Lambda_1, \Lambda_2$ while being sparse in $\Phi_1$ and $\Phi_2$. The joint concentration $\kappa_2$ is thus used to encode the geometric relation between the missing part $\mathcal{H}_M$ and expansions in $\Phi_1$ and $\Phi_2$. Bounding $\kappa_2$ from above is another one of the conditions that ensure the success of Algorithm (INP-SEP).

3.2. Recovery guarantee through joint concentration.

The sufficient condition for recovery is based on finding sets of coefficients $\Lambda_1$ and $\Lambda_2$ for which the joint concentrations are small, while $\Lambda_1$ and $\Lambda_2$ capture most of the energy of the ground truth separated components $\mathcal{C}$ and $\mathcal{T}$. For this, we recall the following definition from [19].

Definition 3.3.
Fix $\delta > 0$. Given a Hilbert space $\mathcal{H}$ with a Parseval frame $\Phi$, $f \in \mathcal{H}$ is $\delta$-relatively sparse in $\Phi$ with respect to $\Lambda$ if
\[ \|1_{\Lambda^c} \Phi^\star f\|_1 \le \delta, \]
where $\Lambda^c$ denotes the complement of $\Lambda$ in the index set. We now prove our first separation result.
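Definition 3.3 is straightforward to evaluate numerically. A minimal sketch (our toy example, using an orthonormal basis, which is in particular a Parseval frame):

```python
import numpy as np

# delta-relative sparsity sketch: ||1_{Lambda^c} Phi* f||_1 <= delta.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((6, 6)))   # orthonormal basis = Parseval frame
coeffs = np.array([2.0, -1.5, 0.01, 0.0, 0.02, 0.0])
f = Q @ coeffs                                     # f built from mostly-Lambda coefficients

Lambda = [0, 1]                                    # cluster of significant indices
analysis = Q.T @ f                                 # Phi* f
delta = np.abs(np.delete(analysis, Lambda)).sum()  # l1 mass outside the cluster
print(round(delta, 3))  # 0.03: f is 0.03-relatively sparse with respect to Lambda
```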
Proposition 3.4.
For $\delta_1, \delta_2 > 0$, fix $\delta = \delta_1 + \delta_2$, and suppose that $f \in \mathcal{H}$ can be decomposed as $f = \mathcal{C} + \mathcal{T}$ so that each component $\mathcal{C}$, $\mathcal{T}$ is $\delta_1$-, $\delta_2$-relatively sparse in $\Phi_1$ and $\Phi_2$ with respect to $\Lambda_1$ and $\Lambda_2$, respectively. Let $(\mathcal{C}^\star, \mathcal{T}^\star)$ solve (INP-SEP). If we have $\kappa_1 + \kappa_2 < \tfrac{1}{2}$, then
\[ \|\mathcal{C}^\star - \mathcal{C}\|_2 + \|\mathcal{T}^\star - \mathcal{T}\|_2 \le \frac{2\delta}{1 - 2(\kappa_1 + \kappa_2)}. \tag{4} \]

Proof.
First, we set
\[ \kappa = \kappa(\Lambda_1, \Lambda_2) := \sup_{P_K u = P_K v} \frac{\|1_{\Lambda_1} \Phi_1^\ast u\|_1 + \|1_{\Lambda_2} \Phi_2^\ast v\|_1}{\|\Phi_1^\ast u\|_1 + \|\Phi_2^\ast v\|_1}, \]
and $x = \mathcal{C}^\star - \mathcal{C}$, $y = \mathcal{T} - \mathcal{T}^\star$. Since $\Phi_1$ and $\Phi_2$ are Parseval frames,
\begin{align*}
\|\mathcal{C}^\star - \mathcal{C}\|_2 + \|\mathcal{T}^\star - \mathcal{T}\|_2 &= \|\Phi_1^\ast(\mathcal{C}^\star - \mathcal{C})\|_2 + \|\Phi_2^\ast(\mathcal{T}^\star - \mathcal{T})\|_2 \\
&\le \|\Phi_1^\ast(\mathcal{C}^\star - \mathcal{C})\|_1 + \|\Phi_2^\ast(\mathcal{T}^\star - \mathcal{T})\|_1 = \|\Phi_1^\ast x\|_1 + \|\Phi_2^\ast y\|_1 =: S.
\end{align*}
Now we invoke the relation $P_K(\mathcal{C}^\star + \mathcal{T}^\star) = P_K(f) = P_K(\mathcal{C} + \mathcal{T})$, and hence $P_K(x) = P_K(y)$. By the definition of $\kappa$, we have
\begin{align*}
S &= \|1_{\Lambda_1} \Phi_1^\ast x\|_1 + \|1_{\Lambda_2} \Phi_2^\ast y\|_1 + \|1_{\Lambda_1^c} \Phi_1^\ast(\mathcal{C}^\star - \mathcal{C})\|_1 + \|1_{\Lambda_2^c} \Phi_2^\ast(\mathcal{T}^\star - \mathcal{T})\|_1 \\
&\le \kappa S + \|1_{\Lambda_1^c} \Phi_1^\ast \mathcal{C}^\star\|_1 + \|1_{\Lambda_1^c} \Phi_1^\ast \mathcal{C}\|_1 + \|1_{\Lambda_2^c} \Phi_2^\ast \mathcal{T}^\star\|_1 + \|1_{\Lambda_2^c} \Phi_2^\ast \mathcal{T}\|_1 \\
&\le \kappa S + \|1_{\Lambda_1^c} \Phi_1^\ast \mathcal{C}^\star\|_1 + \|1_{\Lambda_2^c} \Phi_2^\ast \mathcal{T}^\star\|_1 + \delta \\
&= \kappa S + \delta + \|\Phi_1^\ast \mathcal{C}^\star\|_1 + \|\Phi_2^\ast \mathcal{T}^\star\|_1 - \|1_{\Lambda_1} \Phi_1^\ast \mathcal{C}^\star\|_1 - \|1_{\Lambda_2} \Phi_2^\ast \mathcal{T}^\star\|_1.
\end{align*}
We note that $(\mathcal{C}^\star, \mathcal{T}^\star)$ is a minimizer of (INP-SEP). Thus
\[ \|\Phi_1^\ast \mathcal{C}^\star\|_1 + \|\Phi_2^\ast \mathcal{T}^\star\|_1 \le \|\Phi_1^\ast \mathcal{C}\|_1 + \|\Phi_2^\ast \mathcal{T}\|_1. \]
Therefore,
\begin{align*}
S &\le \kappa S + \delta + \|\Phi_1^\ast \mathcal{C}\|_1 + \|\Phi_2^\ast \mathcal{T}\|_1 - \|1_{\Lambda_1} \Phi_1^\ast \mathcal{C}^\star\|_1 - \|1_{\Lambda_2} \Phi_2^\ast \mathcal{T}^\star\|_1 \\
&\le \kappa S + \delta + \|\Phi_1^\ast \mathcal{C}\|_1 + \|\Phi_2^\ast \mathcal{T}\|_1 + \|1_{\Lambda_1} \Phi_1^\ast x\|_1 + \|1_{\Lambda_2} \Phi_2^\ast y\|_1 - \|1_{\Lambda_1} \Phi_1^\ast \mathcal{C}\|_1 - \|1_{\Lambda_2} \Phi_2^\ast \mathcal{T}\|_1 \\
&\le \kappa S + 2\delta + \kappa S = 2\kappa S + 2\delta.
\end{align*}
Thus, $S \le \frac{2\delta}{1 - 2\kappa}$.
The bound $\kappa \le \kappa_1 + \kappa_2$ comes from the fact that, for $P_K x = P_K y$,
\[ \frac{\|1_{\Lambda_1} \Phi_1^\ast x\|_1 + \|1_{\Lambda_2} \Phi_2^\ast y\|_1}{\|\Phi_1^\ast x\|_1 + \|\Phi_2^\ast y\|_1} \le \frac{\|1_{\Lambda_1} \Phi_1^\ast y\|_1 + \|1_{\Lambda_2} \Phi_2^\ast x\|_1}{\|\Phi_1^\ast x\|_1 + \|\Phi_2^\ast y\|_1} + \frac{\|1_{\Lambda_1} \Phi_1^\ast (x - y)\|_1 + \|1_{\Lambda_2} \Phi_2^\ast (x - y)\|_1}{\|\Phi_1^\ast x\|_1 + \|\Phi_2^\ast y\|_1}. \]
This leads to $\kappa \le \kappa_1 + \kappa_2$. Finally, we obtain
\[ \|\mathcal{C}^\star - \mathcal{C}\|_2 + \|\mathcal{T}^\star - \mathcal{T}\|_2 \le \frac{2\delta}{1 - 2(\kappa_1 + \kappa_2)}. \qquad \square \]

3.3. Recovery guarantee through cluster coherence.
Deriving bounds for joint concentrations can be difficult in practice. Our goal in this subsection is to replace the joint concentrations by terms that are easier to compute. For that, we recall the notion of cluster coherence. We then prove that the joint concentrations can be bounded by cluster coherence terms. Deriving bounds on cluster coherence terms is generally easier than bounding joint concentrations. The following definition is taken from [10].
Definition 3.5.
Given two Parseval frames $\Phi_1 = (\varphi_{1,i})_{i \in I}$ and $\Phi_2 = (\varphi_{2,j})_{j \in J}$. Then the cluster coherence $\mu_c(\Lambda_1, \Phi_1; \Phi_2)$ of $\Phi_1$ and $\Phi_2$ with respect to the index set $\Lambda_1 \subset I$ is defined by
\[ \mu_c(\Lambda_1, \Phi_1; \Phi_2) = \max_{j \in J} \sum_{i \in \Lambda_1} |\langle \varphi_{1,i}, \varphi_{2,j} \rangle|, \]
in case this maximum exists.

We allow applying a projection $P_M$ to one or both of the frames in Definition 3.5. For example, similarly to [19], we use the notation $\mu_c(\Lambda_1, P_M \Phi_1; P_M \Phi_2)$ to denote
\[ \mu_c(\Lambda_1, P_M \Phi_1; P_M \Phi_2) = \max_{j \in J} \sum_{i \in \Lambda_1} |\langle P_M \varphi_{1,i}, P_M \varphi_{2,j} \rangle|. \]
Next, we present two lemmas which bound the joint concentrations by corresponding cluster coherences.
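Before the lemmas, Definition 3.5 can be illustrated numerically. A small sketch with a toy pair of orthonormal bases (our choice, not from the paper):

```python
import numpy as np
from scipy.linalg import hadamard

# Cluster coherence: mu_c(Lambda, Phi1; Phi2) = max_j sum_{i in Lambda} |<phi1_i, phi2_j>|.
# Toy frames: identity basis versus normalized Hadamard basis in R^8.
n = 8
Phi1 = np.eye(n)                 # columns phi1_i
Phi2 = hadamard(n) / np.sqrt(n)  # columns phi2_j

def cluster_coherence(Lambda, Phi_a, Phi_b):
    gram = np.abs(Phi_a.T @ Phi_b)        # |<phi_a_i, phi_b_j>|
    return gram[Lambda, :].sum(axis=0).max()

# Every Gram entry here is 1/sqrt(8), so a cluster of size s has coherence s/sqrt(8).
print(np.isclose(cluster_coherence([0], Phi1, Phi2), 1 / np.sqrt(8)))        # True
print(np.isclose(cluster_coherence([0, 1, 2], Phi1, Phi2), 3 / np.sqrt(8)))  # True
```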
Lemma 3.6.
We have
\[ \kappa_1(\Lambda_1, \Lambda_2) \le \max\{\mu_c(\Lambda_1, \Phi_1; \Phi_2),\ \mu_c(\Lambda_2, \Phi_2; \Phi_1)\}. \]
Proof.
For $x, y \in \mathcal{H}$, we set $\alpha_1 = \Phi_1^\ast y$, $\alpha_2 = \Phi_2^\ast x$. Then, invoking the fact that $\Phi_1$ and $\Phi_2$ are Parseval frames, hence $x = \Phi_2 \Phi_2^\ast x = \Phi_2 \alpha_2$ and $y = \Phi_1 \Phi_1^\ast y = \Phi_1 \alpha_1$, we have
\begin{align*}
\|1_{\Lambda_1} \Phi_1^\ast x\|_1 + \|1_{\Lambda_2} \Phi_2^\ast y\|_1 &= \|1_{\Lambda_1} \Phi_1^\ast \Phi_2 \alpha_2\|_1 + \|1_{\Lambda_2} \Phi_2^\ast \Phi_1 \alpha_1\|_1 \\
&\le \sum_{i \in \Lambda_1} \Big( \sum_j |\langle \varphi_{1,i}, \varphi_{2,j} \rangle|\, |\alpha_{2,j}| \Big) + \sum_{j \in \Lambda_2} \Big( \sum_i |\langle \varphi_{1,i}, \varphi_{2,j} \rangle|\, |\alpha_{1,i}| \Big) \\
&= \sum_j \Big( \sum_{i \in \Lambda_1} |\langle \varphi_{1,i}, \varphi_{2,j} \rangle| \Big) |\alpha_{2,j}| + \sum_i \Big( \sum_{j \in \Lambda_2} |\langle \varphi_{1,i}, \varphi_{2,j} \rangle| \Big) |\alpha_{1,i}| \\
&\le \mu_c(\Lambda_1, \Phi_1; \Phi_2) \|\alpha_2\|_1 + \mu_c(\Lambda_2, \Phi_2; \Phi_1) \|\alpha_1\|_1 \\
&\le \max\{\mu_c(\Lambda_1, \Phi_1; \Phi_2), \mu_c(\Lambda_2, \Phi_2; \Phi_1)\} (\|\alpha_1\|_1 + \|\alpha_2\|_1) \\
&= \max\{\mu_c(\Lambda_1, \Phi_1; \Phi_2), \mu_c(\Lambda_2, \Phi_2; \Phi_1)\} (\|\Phi_1^\ast y\|_1 + \|\Phi_2^\ast x\|_1).
\end{align*}
Thus, we obtain $\kappa_1(\Lambda_1, \Lambda_2) \le \max\{\mu_c(\Lambda_1, \Phi_1; \Phi_2), \mu_c(\Lambda_2, \Phi_2; \Phi_1)\}$. $\square$

Lemma 3.7.
We have
\begin{align*}
\kappa_2(\Lambda_1, \Lambda_2) &\le \max\{\mu_c(\Lambda_1, P_M \Phi_1; P_M \Phi_1) + \mu_c(\Lambda_2, P_M \Phi_2; P_M \Phi_1),\ \mu_c(\Lambda_1, P_M \Phi_1; P_M \Phi_2) + \mu_c(\Lambda_2, P_M \Phi_2; P_M \Phi_2)\} \\
&= \max\{\mu_c(\Lambda_1, P_M \Phi_1; \Phi_1) + \mu_c(\Lambda_2, P_M \Phi_2; \Phi_1),\ \mu_c(\Lambda_1, P_M \Phi_1; \Phi_2) + \mu_c(\Lambda_2, P_M \Phi_2; \Phi_2)\}.
\end{align*}

Proof.
For each $h \in \mathcal{H}_M$ and $x \in \mathcal{H}$, set $\alpha = \Phi_1^\ast(x + h)$ and $\beta = \Phi_2^\ast x$. Then we have
\[ h = x + h - x = \Phi_1 \Phi_1^\ast(x + h) - \Phi_2 \Phi_2^\ast(x) = \Phi_1 \alpha - \Phi_2 \beta. \]
Note that $P_M$ is an orthogonal projection and $h \in \mathcal{H}_M$, hence
\[ h = P_M^\ast P_M h = P_M^\ast P_M \Phi_1 \alpha - P_M^\ast P_M \Phi_2 \beta. \]
Therefore, we obtain
\begin{align*}
\|1_{\Lambda_1} \Phi_1^\ast h\|_1 &= \|1_{\Lambda_1} \Phi_1^\ast P_M^\ast P_M \Phi_1 \alpha - 1_{\Lambda_1} \Phi_1^\ast P_M^\ast P_M \Phi_2 \beta\|_1 \\
&= \|1_{\Lambda_1} (P_M \Phi_1)^\ast (P_M \Phi_1) \alpha - 1_{\Lambda_1} (P_M \Phi_1)^\ast (P_M \Phi_2) \beta\|_1 \\
&\le \|1_{\Lambda_1} (P_M \Phi_1)^\ast (P_M \Phi_1) \alpha\|_1 + \|1_{\Lambda_1} (P_M \Phi_1)^\ast (P_M \Phi_2) \beta\|_1 \\
&\le \sum_{i \in \Lambda_1} \Big( \sum_j |\langle P_M \varphi_{1,i}, P_M \varphi_{1,j} \rangle|\, |\alpha_j| \Big) + \sum_{i \in \Lambda_1} \Big( \sum_j |\langle P_M \varphi_{1,i}, P_M \varphi_{2,j} \rangle|\, |\beta_j| \Big) \\
&\le \sum_j \Big( \sum_{i \in \Lambda_1} |\langle P_M \varphi_{1,i}, P_M \varphi_{1,j} \rangle| \Big) |\alpha_j| + \sum_j \Big( \sum_{i \in \Lambda_1} |\langle P_M \varphi_{1,i}, P_M \varphi_{2,j} \rangle| \Big) |\beta_j| \\
&\le \mu_c(\Lambda_1, P_M \Phi_1; P_M \Phi_1) \|\alpha\|_1 + \mu_c(\Lambda_1, P_M \Phi_1; P_M \Phi_2) \|\beta\|_1 \\
&= \mu_c(\Lambda_1, P_M \Phi_1; P_M \Phi_1) \|\Phi_1^\ast(x + h)\|_1 + \mu_c(\Lambda_1, P_M \Phi_1; P_M \Phi_2) \|\Phi_2^\ast(x)\|_1.
\end{align*}
Similarly, we have
\[ \|1_{\Lambda_2} \Phi_2^\ast h\|_1 \le \mu_c(\Lambda_2, P_M \Phi_2; P_M \Phi_1) \|\Phi_1^\ast(x + h)\|_1 + \mu_c(\Lambda_2, P_M \Phi_2; P_M \Phi_2) \|\Phi_2^\ast(x)\|_1. \]
This leads to
\begin{align*}
\|1_{\Lambda_1} \Phi_1^\ast h\|_1 + \|1_{\Lambda_2} \Phi_2^\ast h\|_1 \le \max\{&\mu_c(\Lambda_1, P_M \Phi_1; P_M \Phi_1) + \mu_c(\Lambda_2, P_M \Phi_2; P_M \Phi_1), \\
&\mu_c(\Lambda_1, P_M \Phi_1; P_M \Phi_2) + \mu_c(\Lambda_2, P_M \Phi_2; P_M \Phi_2)\} \big( \|\Phi_1^\ast(x + h)\|_1 + \|\Phi_2^\ast(x)\|_1 \big).
\end{align*}
Finally,
\begin{align*}
\kappa_2(\Lambda_1, \Lambda_2) &\le \max\{\mu_c(\Lambda_1, P_M \Phi_1; P_M \Phi_1) + \mu_c(\Lambda_2, P_M \Phi_2; P_M \Phi_1),\ \mu_c(\Lambda_1, P_M \Phi_1; P_M \Phi_2) + \mu_c(\Lambda_2, P_M \Phi_2; P_M \Phi_2)\} \\
&= \max\{\mu_c(\Lambda_1, P_M \Phi_1; \Phi_1) + \mu_c(\Lambda_2, P_M \Phi_2; \Phi_1),\ \mu_c(\Lambda_1, P_M \Phi_1; \Phi_2) + \mu_c(\Lambda_2, P_M \Phi_2; \Phi_2)\}. \qquad \square
\end{align*}
We can now formulate a general guarantee for the success of Algorithm (INP-SEP), based on cluster coherence.
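As a numerical sanity check (our toy frames: the identity and a normalized Hadamard basis), one can verify that random signals never violate the cluster-coherence bound of Lemma 3.6 on the mixed joint-concentration ratio:

```python
import numpy as np
from scipy.linalg import hadamard

# Sample the ratio in the definition of kappa_1 and compare it against
# max{mu_c(Lambda1, Phi1; Phi2), mu_c(Lambda2, Phi2; Phi1)} (toy setup).
n = 8
Phi1, Phi2 = np.eye(n), hadamard(n) / np.sqrt(n)
Lam1, Lam2 = [0, 1], [0]

def mu_c(Lam, A, B):
    return np.abs(A.T @ B)[Lam].sum(axis=0).max()

bound = max(mu_c(Lam1, Phi1, Phi2), mu_c(Lam2, Phi2, Phi1))
rng = np.random.default_rng(0)
ok = True
for _ in range(1000):
    x, y = rng.standard_normal(n), rng.standard_normal(n)
    num = np.abs(Phi1.T @ x)[Lam1].sum() + np.abs(Phi2.T @ y)[Lam2].sum()
    den = np.abs(Phi1.T @ y).sum() + np.abs(Phi2.T @ x).sum()
    ok &= num <= bound * den + 1e-12
print(ok)   # True: every sampled ratio respects the cluster-coherence bound
```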
Theorem 3.8.
Let $\Phi_1, \Phi_2$ be two Parseval frames for a Hilbert space $\mathcal{H}$. For $\delta_1, \delta_2 > 0$, fix $\delta = \delta_1 + \delta_2$, and suppose that $f \in \mathcal{H}$ can be decomposed as $f = \mathcal{C} + \mathcal{T}$ so that each component $\mathcal{C}$, $\mathcal{T}$ is $\delta_1$-, $\delta_2$-relatively sparse in $\Phi_1$ and $\Phi_2$ with respect to $\Lambda_1$, $\Lambda_2$, respectively. Let $(\mathcal{C}^\star, \mathcal{T}^\star)$ solve (INP-SEP). If we have $\mu_c < \tfrac{1}{2}$, then
\[ \|\mathcal{C}^\star - \mathcal{C}\|_2 + \|\mathcal{T}^\star - \mathcal{T}\|_2 \le \frac{2\delta}{1 - 2\mu_c}, \tag{5} \]
where
\begin{align*}
\mu_c = \max\{&\mu_c(\Lambda_1, P_M \Phi_1; \Phi_1) + \mu_c(\Lambda_2, P_M \Phi_2; \Phi_1),\ \mu_c(\Lambda_1, P_M \Phi_1; \Phi_2) + \mu_c(\Lambda_2, P_M \Phi_2; \Phi_2)\} \\
&+ \max\{\mu_c(\Lambda_1, \Phi_1; \Phi_2),\ \mu_c(\Lambda_2, \Phi_2; \Phi_1)\}.
\end{align*}

Proof. The bound (5) holds as a consequence of Lemmas 3.6 and 3.7 and Proposition 3.4. $\square$
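To get a feel for the quantities entering Theorem 3.8, the following toy computation (our construction, not from the paper) shows how enlarging $\Lambda_1$ decreases the relative sparsity $\delta_1$ of a fixed signal while increasing the cluster coherence term:

```python
import numpy as np
from scipy.linalg import hadamard

# Trade-off sketch: Phi1 = identity, Phi2 = normalized Hadamard in R^16,
# and a signal with rapidly decaying Phi1-coefficients.
n = 16
Phi1, Phi2 = np.eye(n), hadamard(n) / np.sqrt(n)
coeffs = 2.0 ** -np.arange(n)            # decreasing magnitudes
f = Phi1 @ coeffs

for size in (2, 4, 8):
    Lam1 = np.arange(size)                                # keep the `size` largest coefficients
    delta1 = np.abs(Phi1.T @ f)[size:].sum()              # ||1_{Lam1^c} Phi1* f||_1 shrinks
    mu = np.abs(Phi1.T @ Phi2)[Lam1].sum(axis=0).max()    # cluster coherence grows (= size/4)
    print(size, float(np.round(delta1, 4)), mu)
```

Larger index sets make the signal look more relatively sparse but inflate the coherence term in the error bound, so the bound is only useful for a well-chosen intermediate cluster size.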
Let us interpret this estimate. The relative sparsity $\delta$ and the cluster coherence $\mu_c$ depend on the geometric sets of indices $\Lambda_1, \Lambda_2$, which are not used at all in Algorithm (INP-SEP). Hence, $\Lambda_1, \Lambda_2$ are analytic tools that we can choose arbitrarily, and are used only for deriving theoretical bounds. If we choose very large $\Lambda_1, \Lambda_2$, we get very small relative sparsities $\delta_1, \delta_2$, but we might lose control over the cluster coherence $\mu_c$. Therefore, choosing appropriate $\Lambda_1$ and $\Lambda_2$ is an important step when applying our theory in concrete examples.

4. Mathematical models of texture and cartoon
We now focus on the specific problem of inpainting and separating texture from cartoon. In this section we define our models of the texture and cartoon parts, and of the missing part.

4.1.
Model of texture.
One of the earliest qualitative texture descriptions that corresponds to human visual perception comprises: coarseness, contrast, directionality, line-likeness, regularity, and roughness [41]. From a mathematical modeling point of view, many definitions of texture have been proposed in the past, for instance [13, 30, 39, 43]. Our model for texture is inspired by [13], and is based on an expansion of Gabor frame elements. To motivate our definition of texture, we offer the following discussion. A very restrictive definition of texture would be a periodic signal with a short period. The Fourier transform of a periodic signal is supported on a delta train over a regular grid in the frequency domain. To relax the hard periodicity condition, suppose that we allow the locations of the points of the regular frequency grid to be perturbed. This results in a signal supported on a sparse set of Fourier coefficients, which is exactly our definition of texture. Such a definition indeed produces images that look like a repeating pattern, but not in a strictly periodic fashion (see Figure 2 for an example). Before formally defining texture, we recall the
Schwartz functions, or rapidly decreasing functions,
\[ \mathcal{S}(\mathbb{R}^2) := \Big\{ f \in C^\infty(\mathbb{R}^2) \ \Big|\ \forall K, N \in \mathbb{N} : \sup_{x \in \mathbb{R}^2} (1 + |x|^2)^{N/2} \sum_{|\alpha| \le K} |D^\alpha f(x)| < \infty \Big\}. \tag{6} \]
We define the Fourier transform and inverse Fourier transform for $f, F \in \mathcal{S}(\mathbb{R}^2)$ by
\[ \hat{f}(\xi) = \mathcal{F}[f](\xi) = \int_{\mathbb{R}^2} f(x) e^{-2\pi i x^T \xi} \, dx, \qquad \check{F}(x) = \mathcal{F}^{-1}[F](x) = \int_{\mathbb{R}^2} F(\xi) e^{2\pi i \xi^T x} \, d\xi, \]
which can be extended to a well-defined Fourier transform and inverse Fourier transform for functions in $L^2(\mathbb{R}^2)$ (cf. [56]). Using a window function $g : \mathbb{R}^2 \to \mathbb{R}$ which localizes a texture patch in the spatial domain, we now introduce our model for texture.

Definition 4.1.
Let $g \in L^2(\mathbb{R}^2)$ be a window with $\hat{g} \in C^\infty(\mathbb{R}^2)$ and frequency support $\operatorname{supp} \hat{g} \subseteq [-1, 1]^2$, satisfying the partition of unity condition
\[ \sum_{n \in \mathbb{Z}^2} |\hat{g}(\xi + n)|^2 = 1, \quad \xi \in \mathbb{R}^2. \tag{7} \]
For $s > 0$, we define the $L^2$-normalized scaled version of $g$ by $g_s(x) = s \cdot g(sx)$. Let $I_T \subseteq \mathbb{Z}^2$ be a subset of Fourier elements. A texture is defined by
\[ \mathcal{T}_s(x) = \sum_{n \in I_T} d_n g_s(x) e^{2\pi i x^T s n}, \tag{8} \]
where $(d_n)_{n \in \mathbb{Z}^2}$ denotes a sequence of complex numbers.

In Subsection 6.3 we add a restriction on the size of the index set $I_T$. For now, we just mention that $I_T$ is a small/sparse set in some sense. We also remark that by Definition 4.1, we have $\hat{g}_s(\xi) = s^{-1} \cdot \hat{g}(s^{-1}\xi)$ and $\operatorname{supp} \hat{g}_s \subseteq [-s, s]^2$, and the partition of unity condition now reads
\[ \sum_{n \in \mathbb{Z}^2} |\hat{g}_s(\xi + sn)|^2 = s^{-2}, \quad \xi \in \mathbb{R}^2. \tag{9} \]

Figure 2.
A texture sample produced by randomly choosing a sparse set of Fourier coefficients with random values, with the rest of the Fourier coefficients set to zero.

4.2.
The local cartoon patch.
In [52, 53], cartoon functions are defined as
\[ \mathcal{C} = f_1 + f_2 \cdot 1_{B_\tau}, \tag{10} \]
where $f_1, f_2 \in L^2(\mathbb{R}^2) \cap C^\beta(\mathbb{R}^2)$, $\beta \in (0, +\infty)$, have compact support, and $B_\tau$ denotes the interior of a closed, non-intersecting curve $\tau$ in $C^\beta(\mathbb{R}^2)$. In the separation algorithm of cartoon and texture, we analyze the input image using a Gabor frame and the so-called universal shearlet frame (see Section 5). Both of these frames analyze images on local patches. Thus, for the sake of simplicity, we also reduce our model of the cartoon part to a local cartoon model. This is done in two steps. Cartoon images are locally close to piecewise constant functions with a discontinuity along an edge curve, which is locally close to a line. We thus consider the windowed step function
\[ w_S(x) = \begin{cases} w(x_2) & x_1 \le 0, \\ 0 & x_1 > 0, \end{cases} \tag{11} \]
where $w \in C^\infty(\mathbb{R})$ is a weight function satisfying $w \not\equiv 0$, $0 \le w(u) \le 1$, and $\operatorname{supp} w \subset [-\rho, \rho]$. In our work, the cartoon part is analyzed using universal shearlets, which form a shear invariant system up to the cone adaptation. Since shearing is a way of changing orientation, our analysis also applies to the more general case where the discontinuity in (11) is along a general line in any orientation. We note that it is possible to extend our results from the local cartoon model (11) to the global cartoon model (10) by using a tubular neighborhood argument, as shown in [13]. The cartoon part (11) can be seen as a tempered distribution acting on
Schwartz functions by
\[ \langle w_S, f \rangle = \int_{-\rho}^{\rho} w(x_2) \int_{-\infty}^{0} f(x_1, x_2) \, dx_1 \, dx_2, \quad f \in \mathcal{S}(\mathbb{R}^2). \tag{12} \]
The following lemma is used for computing the Fourier transform of $w_S$.

Lemma 4.2.
We have
\[ \int_{-\infty}^{0} e^{-2\pi i \omega x} \, dx = \frac{1}{2} \Big[ \delta(\omega) + \frac{i}{\pi \omega} \Big]. \]
By Lemma 4.2, we can compute the Fourier transform of $w_S$ by
\begin{align*}
\langle \widehat{w_S}, f \rangle = \langle w_S, \hat{f} \rangle &= \int_{-\rho}^{\rho} w(x_2) \int_{-\infty}^{0} \Big( \int_{\mathbb{R}^2} f(\xi) e^{-2\pi i (\xi_1 x_1 + \xi_2 x_2)} \, d\xi \Big) dx_1 \, dx_2 \\
&= \int_{\mathbb{R}^2} \hat{w}(\xi_2) \Big( \int_{-\infty}^{0} e^{-2\pi i \xi_1 x_1} \, dx_1 \Big) f(\xi) \, d\xi = \frac{1}{2} \int_{\mathbb{R}^2} \hat{w}(\xi_2) \Big( \delta(\xi_1) + \frac{i}{\pi \xi_1} \Big) f(\xi) \, d\xi. \tag{13}
\end{align*}
By (13), we obtain
\[ \widehat{w_S}(\xi) = \frac{1}{2} \hat{w}(\xi_2) \Big( \delta(\xi_1) + \frac{i}{\pi \xi_1} \Big). \]
Now, we modify the local cartoon part to get an image in $L^2(\mathbb{R}^2)$. We note that we are interested in the local behaviour of $w_S$ about $(x_1, x_2) = 0$. Thus, the “DC part” of $\widehat{w_S}$ is not of interest to us, as it models some global asymptotic behaviour of the patch in $\mathbb{R}^2$. We thus filter out the band $|\xi_1| < r$ from $\widehat{w_S}(\xi_1, \xi_2)$, for some arbitrarily small $r > 0$, to obtain
\[ 1_{\{|\xi_1| \ge r\}} \widehat{w_S}. \tag{14} \]
The main change incurred by this filtering of $w_S$ is a translation along the $y$ axis, $w_S \mapsto w_S - \mathrm{DC}$, which does not affect the analysis via shearlet frames. This filtering changes the behaviour of $w_S$ about $(x_1, x_2) = 0$ negligibly, since $r$ is small, and retains the quality of $w_S$ being approximately piecewise constant locally with a line discontinuity (see Figure 3). Henceforth, we call the local cartoon part $w_S$ simply a cartoon part.

Figure 3.
A local cartoon patch.

4.3.
The missing part.
In our analysis, the shape of the missing region is chosen to be
\[ M_h = \{ x = (x_1, x_2) \in \mathbb{R}^2 \mid |x_1| \le h \}, \quad \text{for } h > 0, \tag{15} \]
and the orthogonal projection associated with this missing part is $P_M = 1_{M_h}$.

Figure 4.
The missing part (grey) at scale $j$.

Note that this models a local and axis-aligned missing part, corresponding to the local model of texture (8) and the local and axis-aligned cartoon model (14). When combining and re-orienting the local patches, we can obtain many missing stripes at various orientations. Moreover, in practice, the missing part can be any domain contained in $M_h$ of (15). One practical example where the missing part is of the form (15) is seismic data, where the image is commonly incomplete due to missing or faulty sensors, or land development, causing white stripes [16, 17].

5. Sparsifying systems for texture and cartoon
In this section, we choose frames that sparsely represent the texture and cartoon parts. To represent texture, it is clear from Definition 4.1 that a Gabor frame is a natural choice. For the cartoon part, among popular sparsifying representation systems such as wavelets, curvelets, and shearlets, it was shown in [15] that shearlets outperform not only wavelets, but also most other directional sparsifying systems for inpainting larger gap sizes [19, 25]. We hence choose the following sparse representation systems:
• Gabor frame: a tight frame with time-frequency balanced elements;
• Universal shearlet frame: a directional tight frame.

5.1.
Gabor frame.
Gabor frames are defined as follows.
Definition 5.1.
Denote, for $x, y \in \mathbb{R}^2$ and $g \in L^2(\mathbb{R}^2)$,
\[ g(x, y)(t) = g(t - x) e^{2\pi i t^T y}. \]
For $a, b > 0$ we call the collection of functions $\{ g(ma, nb) \mid m, n \in \mathbb{Z}^2 \}$ a Gabor system. Such a
Gabor frame if it forms a frame.

For our Gabor system, we use a window $\tilde{g}$ that satisfies the assumptions on the window $g$ of the texture model (Definition 4.1). We note that $\tilde{g}$ and $g$ may be different in general. However, the analysis for $\tilde{g} \ne g$ is almost identical to the analysis in the case $g = \tilde{g}$. Without loss of generality, we assume henceforth that both Definitions 4.1 and 5.1 are based on the same window $g$, satisfying the assumptions of Definition 4.1. For each scaling factor $s > 0$, we consider the Gabor tight frame $\mathcal{G}_s = \{ (g_s)_\lambda(x) \}_\lambda$. This frame is represented in the frequency domain as
\[ \widehat{(g_s)_\lambda}(\xi) = \hat{g}_s(\xi - sn) \, e^{2\pi i \xi^T \frac{m}{s}}, \tag{16} \]
where $\lambda = (m, n)$ indexes the spatial and frequency positions, and the parameter $s$ denotes the band size. This system constitutes a tight frame for $L^2(\mathbb{R}^2)$; see [11] for more details.

5.2. The universal-scaling shearlet frame.
Shearlets were first introduced in [6] as a representation system extending the wavelet framework. They are a directional representation system with similar optimal approximation properties as curvelets, but with the advantage of allowing for a unified treatment of the continuum and digital domains. Extending the shearlet system, $\alpha$-shearlets, first introduced in [20], can be regarded as a parametrized family ranging from wavelets to shearlets. In [52, 53] it was shown that $\alpha$-shearlets provide optimally sparse approximations for cartoon images defined as piecewise $C^{1/\alpha}$ functions, separated by a $C^{1/\alpha}$ singularity curve. A further extension is universal shearlets [25], motivated by [29], with the aim of constructing a type of $\alpha$-shearlets that forms a Parseval frame. In our setting we choose universal shearlets as the sparsifying system of the cartoon part, since they form a Parseval frame, a necessary condition in our theory.

Let $\phi$ be a function in $\mathcal{S}(\mathbb{R})$ satisfying $0 \le \hat{\phi}(u) \le 1$ for $u \in \mathbb{R}$, $\hat{\phi}(u) = 1$ for $u \in [-1/16, 1/16]$, and $\operatorname{supp} \hat{\phi} \subset [-1/8, 1/8]$. Define the low pass function $\hat{\Phi}(\xi)$ and the corona scaling functions, for $j \in \mathbb{N}$ and $\xi = (\xi_1, \xi_2) \in \mathbb{R}^2$, by
\[ \hat{\Phi}(\xi) := \hat{\phi}(\xi_1) \hat{\phi}(\xi_2), \qquad W(\xi) := \sqrt{\hat{\Phi}^2(2^{-2}\xi) - \hat{\Phi}^2(\xi)}, \qquad W_j(\xi) := W(2^{-2j}\xi). \tag{17} \]

Figure 5.
Frequency tiling of a cone-adapted shearlet.

It is easy to see that we have the partition of unity property

$\hat\Phi^2(\xi) + \sum_{j \ge 0} W_j^2(\xi) = 1, \qquad \xi \in \mathbb R^2$.   (18)

Next, we use a bump-like function $v \in C^\infty(\mathbb R)$ to produce the directional scaling feature of the system. Suppose $\operatorname{supp}(v) \subset [-1, 1]$ and $|v(u-1)|^2 + |v(u)|^2 + |v(u+1)|^2 = 1$ for $u \in [-1, 1]$. Define the horizontal frequency cone and the vertical frequency cone

$C^{(h)} := \big\{(\xi_1, \xi_2) \in \mathbb R^2 \ \big|\ |\xi_2/\xi_1| \le 1\big\}$   (19)

and

$C^{(v)} := \big\{(\xi_1, \xi_2) \in \mathbb R^2 \ \big|\ |\xi_1/\xi_2| \le 1\big\}$.   (20)

Define the cone functions $V^{(h)}, V^{(v)}$ by

$V^{(h)}(\xi) := v(\xi_2/\xi_1), \qquad V^{(v)}(\xi) := v(\xi_1/\xi_2)$.   (21)

The scaling and shearing matrices are defined by

$A_{\alpha,(h)} := \begin{pmatrix} 4 & 0 \\ 0 & 2^{\alpha} \end{pmatrix}, \qquad S_{(h)} := \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$,   (22)

$A_{\alpha,(v)} := \begin{pmatrix} 2^{\alpha} & 0 \\ 0 & 4 \end{pmatrix}, \qquad S_{(v)} := \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}$,   (23)

where $\alpha \in (-\infty, 2)$ is the scaling parameter. The following two definitions are taken from [25].

Definition 5.2.
Let $\Phi, W, v$ be defined as above. For $\alpha \in (-\infty, 2)$ and $k \in \mathbb Z^2$, we define

(1) Coarse scaling functions: $\psi_{-1,k}(x) := \Phi(x - k)$, $k \in \mathbb Z^2$, $x \in \mathbb R^2$.

(2) Interior shearlets: let $j \ge 0$, $l \in \mathbb Z$ such that $|l| < 2^{(2-\alpha)j}$, $k \in \mathbb Z^2$, and $(\iota) \in \{(h), (v)\}$. Then we define $\psi^{\alpha,(\iota)}_{j,l,k}(x)$ by its Fourier transform

$\hat\psi^{\alpha,(\iota)}_{j,l,k}(\xi) := 2^{-(2+\alpha)j/2}\, W_j(\xi)\, V^{(\iota)}\big(\xi^T A^{-j}_{\alpha,(\iota)} S^{-l}_{(\iota)}\big)\, e^{-2\pi i \xi^T A^{-j}_{\alpha,(\iota)} S^{-l}_{(\iota)} k}, \qquad \xi \in \mathbb R^2$.

(3) Boundary shearlets: let $j \ge 1$ and $l = \pm\big\lceil 2^{(2-\alpha)j} \big\rceil$. We define

$\hat\psi^{\alpha,(b)}_{j,l,k}(\xi) := \begin{cases} 2^{-(2+\alpha)j/2 - 1/2}\, W_j(\xi)\, V^{(h)}\big(\xi^T A^{-j}_{\alpha,(h)} S^{-l}_{(h)}\big)\, e^{-2\pi i \xi^T A^{-j}_{\alpha,(h)} S^{-l}_{(h)} k}, & \xi \in C^{(h)}, \\ 2^{-(2+\alpha)j/2 - 1/2}\, W_j(\xi)\, V^{(v)}\big(\xi^T A^{-j}_{\alpha,(v)} S^{-l}_{(v)}\big)\, e^{-2\pi i \xi^T A^{-j}_{\alpha,(v)} S^{-l}_{(v)} k}, & \xi \in C^{(v)}, \end{cases}$

and in the case $j = 0$, $l = \pm 1$, we define

$\hat\psi^{\alpha,(b)}_{0,l,k}(\xi) := \begin{cases} W_0(\xi)\, V^{(h)}\big(\xi^T S^{-l}_{(h)}\big)\, e^{-2\pi i \xi^T k}, & \xi \in C^{(h)}, \\ W_0(\xi)\, V^{(v)}\big(\xi^T S^{-l}_{(v)}\big)\, e^{-2\pi i \xi^T k}, & \xi \in C^{(v)}. \end{cases}$

Definition 5.3.
Let $(\alpha_j)_{j \in \mathbb N} \subset \mathbb R$ be a scaling sequence, i.e.,

$\alpha_j \in Z_j := \big\{ m/j \ \big|\ m \in \mathbb Z,\ m \le 2j - 1 \big\}$.   (24)

We define the associated universal-scaling shearlet system, or shorter, universal shearlet system, by

$\Psi = SH(\phi, v, (\alpha_j)_j) := SH_{\mathrm{Low}}(\phi) \cup SH_{\mathrm{Int}}(\phi, v, (\alpha_j)_j) \cup SH_{\mathrm{Bound}}(\phi, v, (\alpha_j)_j)$,

where

$SH_{\mathrm{Low}}(\phi) := \{\psi_{-1,k}(x) \mid k \in \mathbb Z^2\}$,

$SH_{\mathrm{Int}}(\phi, v, (\alpha_j)_j) := \big\{\psi^{\alpha_j,(\iota)}_{j,l,k}(x) \ \big|\ j \ge 0,\ l \in \mathbb Z,\ |l| < 2^{(2-\alpha_j)j},\ k \in \mathbb Z^2,\ (\iota) \in \{(h),(v)\}\big\}$,

$SH_{\mathrm{Bound}}(\phi, v, (\alpha_j)_j) := \big\{\psi^{\alpha_j,(b)}_{j,l,k}(x) \ \big|\ j \ge 0,\ l \in \mathbb Z,\ l = \pm\lceil 2^{(2-\alpha_j)j}\rceil,\ k \in \mathbb Z^2\big\}$.

In [25] it was shown that $\{\Psi_\eta\}_\eta$, $\eta = (j, l, k; \alpha_j, (\iota))$, constitutes a Parseval frame for $L^2(\mathbb R^2)$. The universal shearlet system is based on discrete $\alpha_j \in Z_j$ in order for the set of sheared cone functions to exactly cover the horizontal and vertical frequency cones. We note that for real $\alpha \in (-\infty, 2)$ we can choose $(\alpha_j)_j$ that provide the best possible approximation of $\alpha$, i.e.,

$\alpha_j := \arg\min_{\tilde\alpha_j \in Z_j} |\tilde\alpha_j - \alpha|, \qquad j \ge 1$.   (25)

This implies that $2^{\alpha_j j} = \Theta(2^{\alpha j})$ and $\lim \alpha_j = \alpha$ as $j \to \infty$. Indeed, we can show that

$\alpha_j = \lfloor j\alpha + 0.5 \rfloor / j$.   (26)

For our model, we consider universal shearlets with a scaling sequence $(\alpha_j)_{j \in \mathbb N}$ chosen as in (26) with $\alpha \in (0, 1]$.

6. Inpainting and separation of cartoon and texture
In this section we formulate our main result on inpainting and separation of cartoon and texture.

6.1.
Multi-scale separation and inpainting.
In our analysis, the separation problem is analyzed in a multi-scale setting. What we show is that the inpainting and separation method is more accurate for smaller scale components of the image, as long as the size of the missing part gets smaller in scale. This is the same approach taken by [25] for the problem of inpainting.

Consider the window function $W$ defined in (17). We construct a family of frequency filters $F_j$ with the Fourier representation

$\hat F_j(\xi) := W_j(\xi) = W(\xi/2^{2j}), \qquad \forall j \ge 1,\ \xi \in \mathbb R^2$,   (27)

and in the case $j = 0$, $\hat F_0(\xi) := \hat\Phi(\xi)$, $\xi \in \mathbb R^2$. Notice that $W_j$ is compactly supported in the corona

$A_j := [-2^{2j-1}, 2^{2j-1}]^2 \setminus [-2^{2j-4}, 2^{2j-4}]^2, \qquad \forall j \ge 1$.   (28)

We now consider the texture and cartoon parts at different scales by filtering them with $F_j$. For the patch size of texture, we typically use a fixed size $s$ for all scales $j$. However, we also allow $s$ to depend on the scale $j$ to make the setting more flexible. We hence consider a variable $s_j$, and associated with $s_j$ the domain $A_{s,j}$, $\forall j \ge 1$, defined by

$A_{s,j} = \{\xi \in \mathbb R^2 : s_j \xi \in A_j\}$.   (29)

We split the components $\mathcal C$ and $\mathcal T_s$ into pieces of different scales $\mathcal C_j = F_j \star \mathcal C$ and $\mathcal T_{s,j} = F_j \star \mathcal T_s$, where $\mathcal T_s$ is the texture of patch size $s = s_j$. The total signal at scale $j$ is defined to be $f_j = \mathcal C_j + \mathcal T_{s,j}$. As mentioned above, we typically pick a constant $s_j = s$. In the constant $s$ case, the pieces $f_j$ can be filtered directly from $f$, namely, $f_j = F_j \star f$. We moreover have by (18)

$f = \sum_j f_j$.   (30)

In the general case, the Fourier transform $\hat f_j$ of $f_j$ is supported in the corona $A_j$, with inner radius $2^{2j-4}$ and outer radius $2^{2j-1}$.

Now, at each scale $j$ we consider the simultaneous inpainting and separation problem of extracting $\mathcal C_j$ and $\mathcal T_{s,j}$ from

$(1 - P_j) f_j = (1 - P_j)\mathcal C_j + (1 - P_j)\mathcal T_{s,j}$.   (31)

As in [25], we assume that at scale $j$ the size of the missing part $h_j$ depends on $j$.
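The multi-scale splitting $f_j = F_j \star f$ and the reconstruction $f = \sum_j f_j$ can be sketched numerically. The following is a minimal 1D illustration under our own assumptions for the demo: the ramp profile `low_pass` is an illustrative smooth cutoff, not the paper's Meyer-type window, and the band-pass responses are built to sum to one on the grid, which is what reconstruction by plain summation requires.

```python
import numpy as np

# Illustrative smooth low-pass frequency profile (our own choice, not the
# paper's window): equals 1 for |xi| <= a and 0 for |xi| >= b.
def low_pass(xi, a=0.25, b=0.5):
    t = np.clip((np.abs(xi) - a) / (b - a), 0.0, 1.0)
    return np.cos(np.pi * t / 2.0) ** 2

n = 512
xi = np.fft.fftfreq(n) * n                  # integer frequency grid
f = np.random.default_rng(0).standard_normal(n)
F = np.fft.fft(f)

# Band-pass filters F_0, F_1, ..., F_J whose responses telescope to 1 on the
# grid, mimicking the filtering f_j = F_j * f and the sum f = sum_j f_j.
J = int(np.log2(n)) + 2
filters = [low_pass(xi)]
for j in range(J):
    filters.append(low_pass(xi / 2.0 ** (j + 1)) - low_pass(xi / 2.0 ** j))

pieces = [np.fft.ifft(F * h).real for h in filters]
assert np.allclose(sum(pieces), f)          # perfect reconstruction
```

Note that the paper's coronae satisfy a partition of unity of squares, as in (18); the demo instead telescopes the responses themselves so that they sum to one directly, which is the property that plain summation of the filtered pieces needs.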
Denote by $P_j$ the characteristic function of the missing part at scale $j$, i.e.,

$P_j = \mathbb 1_{\{|x_1| \le h_j\}}$.   (32)

After inpainting and decomposing each scale component $(1 - P_j) f_j$ into texture and cartoon components $\mathcal T_{s,j}$ and $\mathcal C_j$, the total components $\mathcal T_s$ and $\mathcal C$ can be reconstructed similarly to (30), in case $s$ is constant. Since generally $s = s_j$ depends on $j$, we use different Gabor frames to solve (31) for different scales. We denote by $G_j$ the Gabor tight frame associated with $s = s_j$, i.e.,

$G_j := G_{s_j} = \{(g_{s_j})_{(m,n)}(x)\}_{(m,n) \in \mathbb Z^2 \times \mathbb Z^2}$.   (33)

In the analysis of the next subsections, we need the following representation of $P_j$ in the frequency domain.

Lemma 6.1.
We have $\hat P_j(\xi) = 2h_j\, \operatorname{sinc}(2h_j \xi_1)\, \delta(\xi_2)$.

Proof.
By definition, we have

$\hat P_j(\xi) = \int_{\mathbb R^2} P_j(x)\, e^{-2\pi i x^T \xi}\, dx = \Big(\int_{-h_j}^{h_j} e^{-2\pi i x_1 \xi_1}\, dx_1\Big)\Big(\int_{\mathbb R} e^{-2\pi i x_2 \xi_2}\, dx_2\Big) = 2h_j\, \operatorname{sinc}(2h_j \xi_1)\, \delta(\xi_2)$. □

6.2. Balancing the texture and cartoon parts.
It is important to avoid the trivial case where one of the parts, cartoon or texture, is much larger than the other. This would lead to the trivial separation outcome in which the whole image is taken as the estimation of the larger part, and the smaller part is estimated as zero. We thus suppose that the filtered components $\mathcal C_j$ and $\mathcal T_{s,j}$ have comparable magnitudes at each scale.

Consider the sub-band components

$wS_j = F_j * wS$,   (34)

and

$\mathcal T_{s,j} = F_j * \mathcal T_s$,   (35)

where the cartoon patch $wS$ is defined in (14) and the texture $\mathcal T_s$ is given in Definition 4.1. The following claim formulates the energy balancing condition.

Claim 6.2.
Consider the cartoon patch and texture with sub-bands defined in (34) and (35), respectively. We have

$\|wS_j\|^2 \sim 2^{-2j}, \qquad j \to \infty$,

and

$\|\mathcal T_{s,j}\|^2 \sim \sum_{n \in \mathbb Z^2 \cap A_{s,j}} |d_n|^2$.

Therefore, energy balance is achieved for

$c_1\, 2^{-2j} \le \sum_{n \in I_T \cap A_{s,j}} |d_n|^2 \le c_2\, 2^{-2j}, \qquad c_1, c_2 > 0$.   (36)

Proof.
We present the following sketch of the proof. For $W$ sufficiently nice we have

$\|wS_j\|^2 = \int_{\xi \in \mathbb R^2,\ |\xi_2| \ge r} |\hat w(\xi_1)|^2\, W^2(2^{-2j}\xi)\, \Big|\delta(\xi_2) + \frac{1}{2\pi i \xi_2}\Big|^2\, d\xi \sim \int_{\xi \in A_j,\ |\xi_2| \ge r} \xi_2^{-2}\, |\hat w(\xi_1)|^2\, d\xi$

$\sim \int_{\xi \in A_j,\ |\xi_2| \ge 2^{2j-4}} \xi_2^{-2} |\hat w(\xi_1)|^2\, d\xi + \int_{\xi \in A_j,\ 2^{2j-4} \ge |\xi_2| \ge r} \xi_2^{-2} |\hat w(\xi_1)|^2\, d\xi$.

Note that by (28), $A_j = [-2^{2j-1}, 2^{2j-1}]^2 \setminus [-2^{2j-4}, 2^{2j-4}]^2$. Thus, $\{\xi \in A_j,\ 2^{2j-4} \ge |\xi_2| \ge r\} \subset \{\xi \in A_j,\ |\xi_1| \ge 2^{2j-4}\}$. We now use the rapid decay of $\hat w$,

$\sup_{|\xi_1| \ge 2^{2j-4}} |\hat w(\xi_1)| \le C_N\, \langle 2^{2j-4} \rangle^{-N}, \qquad \forall N \in \mathbb N$.

This leads to

$\int_{\xi \in A_j,\ 2^{2j-4} \ge |\xi_2| \ge r} \xi_2^{-2} |\hat w(\xi_1)|^2\, d\xi \le C_N \cdot 2^{2j} \cdot r^{-2} \cdot 2^{2j}\, \langle 2^{2j-4} \rangle^{-N}, \qquad \forall N \in \mathbb N$.   (37)

In addition,

$\int_{\xi \in A_j,\ |\xi_2| \ge 2^{2j-4}} \xi_2^{-2} |\hat w(\xi_1)|^2\, d\xi \sim C \cdot 2^{-2j}$.   (38)

Combining (37) and (38), we finally obtain

$\|wS_j\|^2 \sim 2^{-2j}$.   (39)

On the other hand, we have

$\mathcal T_s = \sum_{n \in I_T} d_n\, g_{s_j}(x)\, e^{2\pi i \langle s_j n, x \rangle}$.

This leads to

$\hat{\mathcal T}_s = \sum_{n \in I_T} d_n\, \hat g_{s_j}(\xi - s_j n)$.   (40)

Now, using the change of variable $\omega = \xi / s_j$ we get

$\|\mathcal T_{s,j}\|^2 = \|\hat{\mathcal T}_{s,j}\|^2 = \sum_{n, \tilde n \in I_T} \int d_n \bar d_{\tilde n}\, W^2(\xi/2^{2j})\, \hat g_{s_j}(\xi - s_j n)\, \overline{\hat g_{s_j}(\xi - s_j \tilde n)}\, d\xi$

$= s_j^2 \sum_{\substack{n, \tilde n \in I_T \\ |n - \tilde n| \le 2}} \int d_n \bar d_{\tilde n}\, W^2(s_j \omega / 2^{2j})\, \hat g_{s_j}(s_j(\omega - n))\, \overline{\hat g_{s_j}(s_j(\omega - \tilde n))}\, d\omega = \sum_{\substack{n, \tilde n \in I_T \\ |n - \tilde n| \le 2}} \int d_n \bar d_{\tilde n}\, W^2(s_j \omega / 2^{2j})\, \hat g(\omega - n)\, \overline{\hat g(\omega - \tilde n)}\, d\omega$.

Note that for each $n$ there exists a finite number of $\tilde n$'s, independent of $j$, satisfying $|n - \tilde n| \le 2$, and since $W$ is sufficiently nice,

$\int W^2(s_j \omega / 2^{2j})\, \hat g(\omega - n)\, \overline{\hat g(\omega - \tilde n)}\, d\omega \sim \int_{A_{s,j}} \hat g(\omega - n)\, \overline{\hat g(\omega - \tilde n)}\, d\omega$.

Thus, we obtain

$\|\mathcal T_{s,j}\|^2 \sim \sum_{\substack{n \in I_T \cap A_{s,j} \\ |n - \tilde n| \le 2}} \int_{A_{s,j}} d_n \bar d_{\tilde n}\, \hat g(\omega - n)\, \overline{\hat g(\omega - \tilde n)}\, d\omega \sim \sum_{n \in I_T \cap A_{s,j}} \int_{A_{s,j}} |d_n|^2\, |\hat g(\omega - n)|^2\, d\omega$.

Up to a small set of coefficients, we assume that the support of $\hat g(\cdot - n)$ is always entirely contained in $A_{s,j}$. Hence,

$\|\mathcal T_{s,j}\|^2 \sim \sum_{n \in I_T \cap A_{s,j}} \int_{\mathbb R^2} |d_n|^2\, |\hat g(\omega - n)|^2\, d\omega = \sum_{n \in I_T \cap A_{s,j}} |d_n|^2 \int_{\mathbb R^2} |\hat g(\omega)|^2\, d\omega \sim \sum_{n \in I_T \cap A_{s,j}} |d_n|^2$.   (41)

Combining (39) and (41), we finish the proof. □

For later analysis, we also need an energy estimate of the cartoon patch in the missing region. We use the following notation in the whole paper:

$\langle |x| \rangle = (1 + |x|^2)^{1/2}$.   (42)

Lemma 6.3.
For $h_j = o(2^{-\alpha_j j})$, $\alpha_j \in (0, 1]$, and $w(0) \neq 0$, there exists a constant $C > 0$ such that

$\|P_j\, wS_j\|^2 \ge C \cdot h_j^2\, 2^{-2j}$.   (43)

Proof.
By Plancherel's theorem and Lemma 6.1, we have

$\|P_j\, wS_j\|^2 = \int_{\xi \in \mathbb R^2,\ |\xi_2| \ge r} \Big|\int_{\mathbb R} \hat P_j(t)\, \widehat{wS_j}(\xi_1 - t, \xi_2)\, dt\Big|^2\, d\xi = \int_{\xi \in \mathbb R^2,\ |\xi_2| \ge r} \Big|\int_{\mathbb R} 2h_j\, \operatorname{sinc}(2h_j t)\, \hat w(\xi_1 - t)\, \frac{1}{2\pi \xi_2}\, W_j(\xi_1 - t, \xi_2)\, dt\Big|^2\, d\xi$

$= \frac{h_j^2}{\pi^2} \int_{\xi \in \mathbb R^2,\ |\xi_2| \ge r} \xi_2^{-2} \Big|\int_{\mathbb R} \operatorname{sinc}(2h_j t)\, \hat w(\xi_1 - t)\, W_j(\xi_1 - t, \xi_2)\, dt\Big|^2\, d\xi$.   (44)

Next, recalling the definition of $W_j(\xi)$ in (17), we have

$W_j(\xi) = 1, \qquad \forall \xi \in [-2^{2j-2}, 2^{2j-2}]^2 \setminus [-2^{2j-3}, 2^{2j-3}]^2$.   (45)

For $\xi$ in the corona-shaped set $[-2^{2j-2}, 2^{2j-2}]^2 \setminus [-2^{2j-3}, 2^{2j-3}]^2$, if $|\xi_1|$ is small then $|\xi_2|$ must be large. More accurately, for $\xi_1$ with $|\xi_1| \le 1/2$, we are guaranteed to have $\xi \in [-2^{2j-2}, 2^{2j-2}]^2 \setminus [-2^{2j-3}, 2^{2j-3}]^2$ in case $\xi_2 \in (2^{2j-3}, 2^{2j-2})$. Combining this observation with (44), the following holds:

$\|P_j\, wS_j\|^2 \ge C h_j^2 \int_{2^{2j-3}}^{2^{2j-2}} \xi_2^{-2} \int_{-1/2}^{1/2} \Big|\int_{\mathbb R} \operatorname{sinc}(2h_j t)\, \hat w(\xi_1 - t)\, W_j(\xi_1 - t, \xi_2)\, dt\Big|^2\, d\xi_1\, d\xi_2$.   (46)

Next, we study the term $I_j(\xi) := \big|\int_{\mathbb R} \operatorname{sinc}(2h_j t)\, \hat w(\xi_1 - t)\, W_j(\xi_1 - t, \xi_2)\, dt\big|$. By the triangle inequality, we have

$I_j(\xi) \ge \Big|\int_{|t| \le 2^{\alpha_j j - 1}} \operatorname{sinc}(2h_j t)\, \hat w(\xi_1 - t)\, W_j(\xi_1 - t, \xi_2)\, dt\Big| - \Big|\int_{|t| \ge 2^{\alpha_j j - 1}} \operatorname{sinc}(2h_j t)\, \hat w(\xi_1 - t)\, W_j(\xi_1 - t, \xi_2)\, dt\Big|$.   (47)

We now use the rapid decay of $\hat w$ to obtain

$\sup_{|t| \ge 2^{\alpha_j j - 1}} |\hat w(\xi_1 - t)| \le C_N\, \langle 2^{\alpha_j j} \rangle^{-N}, \qquad \forall \xi_1 \in [-1/2, 1/2],\ \forall N \in \mathbb N$.   (48)

This leads to

$\Big|\int_{|t| \ge 2^{\alpha_j j - 1}} \operatorname{sinc}(2h_j t)\, \hat w(\xi_1 - t)\, W_j(\xi_1 - t, \xi_2)\, dt\Big| \le C'_N\, \langle 2^{\alpha_j j} \rangle^{-N}$.   (49)

For the other term, we observe that for fine scales $j$, we have $(\xi_1 - t, \xi_2) \in [-2^{2j-2}, 2^{2j-2}]^2 \setminus [-2^{2j-3}, 2^{2j-3}]^2$ for $\xi_1 \in [-1/2, 1/2]$, $|t| \le 2^{\alpha_j j - 1}$, $\alpha_j \in (0, 1]$, and $\xi_2 \in (2^{2j-3}, 2^{2j-2})$. Combining this with (45), we derive $W_j(\xi_1 - t, \xi_2) = 1$. Thus, we obtain

$\int_{|t| \le 2^{\alpha_j j - 1}} \operatorname{sinc}(2h_j t)\, \hat w(\xi_1 - t)\, W_j(\xi_1 - t, \xi_2)\, dt = \int_{|t| \le 2^{\alpha_j j - 1}} \operatorname{sinc}(2h_j t)\, \hat w(\xi_1 - t)\, dt$.

We now prove that there exists a constant $C' > 0$ such that for sufficiently large $j$, we have

$\Big|\int_{|t| \le 2^{\alpha_j j - 1}} \operatorname{sinc}(2h_j t)\, \hat w(\xi_1 - t)\, dt\Big| \ge C', \qquad \forall \xi_1 \in [-1/2, 1/2]$.   (50)

By the triangle inequality, we obtain

$\Big|\int_{|t| \le 2^{\alpha_j j - 1}} \operatorname{sinc}(2h_j t)\, \hat w(\xi_1 - t)\, dt\Big| \ge \Big|\int_{|t| \le 2^{\alpha_j j - 1}} \hat w(\xi_1 - t)\, dt\Big| - \Big|\int_{|t| \le 2^{\alpha_j j - 1}} \big(\operatorname{sinc}(2h_j t) - 1\big)\, \hat w(\xi_1 - t)\, dt\Big|$.   (51)

For the first term, the following holds for all $\xi_1 \in [-1/2, 1/2]$:

$\Big|\int_{|t| \le 2^{\alpha_j j - 1}} \hat w(\xi_1 - t)\, dt\Big| \ge \Big|\int_{\mathbb R} \hat w(t)\, dt\Big| - \Big|\int_{|t| \ge 2^{\alpha_j j - 1}} \hat w(\xi_1 - t)\, dt\Big| \overset{\text{by (48)}}{\approx} |w(0)| = \mathrm{const} > 0$.   (52)

For the second term of the right-hand side of (51), by the assumption $h_j\, 2^{\alpha_j j} \xrightarrow{j \to +\infty} 0$, we have

$\Big|\int_{|t| \le 2^{\alpha_j j - 1}} \big(\operatorname{sinc}(2h_j t) - 1\big)\, \hat w(\xi_1 - t)\, dt\Big| \le \sup_{|t| \le 2^{\alpha_j j - 1}} \big|\operatorname{sinc}(2h_j t) - 1\big| \int_{\mathbb R} |\hat w(t)|\, dt \le C \cdot \sup_{|t| \le 2^{\alpha_j j - 1}} \big|\operatorname{sinc}(2h_j t) - 1\big| \xrightarrow{j \to +\infty} 0$.   (53)

Thus, by (51), (53), and (52), we conclude the proof of (50).

Now, by (47), (49), and (50), we have $I_j(\xi) \ge C' = \mathrm{const}$ for large $j$. Combining this with (46), we finally obtain

$\|P_j\, wS_j\|^2 \ge C h_j^2 \Big(\int_{2^{2j-3}}^{2^{2j-2}} \xi_2^{-2}\, d\xi_2\Big)\Big(\int_{-1/2}^{1/2} C'^2\, d\xi_1\Big) = C \cdot h_j^2\, 2^{-2j}$,

which concludes the proof. □

6.3. Sparsity assumptions on texture.
Next, we restrict the definition of texture by bounding the size of the index set $I_T$ of Definition 4.1. Define the domain

$M_{j,l,(\iota)} := \Big[ s_j^{-1} \cdot S^{-l}_{(\iota)} \cdot \big( \operatorname{supp} \hat\psi^{\alpha_j,(\iota)}_{j,0,0} + B(0, 2) \big) \Big] \cap \mathbb Z^2$,   (54)

where $A + B := \{a + b \mid a \in A,\ b \in B\}$ is the Minkowski sum, and multiplication of a set of vectors by a matrix is done elementwise. We assume that at every scale $j$, the number of non-zero Gabor elements with the same position, generating $\mathcal T_{s,j}$, is not too large. More accurately, denote by $|A| = \#\{n \in \mathbb Z^2 \cap A\}$ the number of integer points of $A$, and define the neighborhood set $I_T^\pm$ of the index set $I_T$ by

$I_T^\pm = \{n' \in \mathbb Z^2 \mid \exists n \in I_T : |n' - n| \le 2\}$.   (55)

What we assume is that at every scale $j$, for all $l \in \mathbb Z$ satisfying $|l| \le 2^{(2-\alpha_j)j}$ we have

$|I_T \cap M_{j,l,(\iota)}| \le 2^{(2-\alpha_j-\epsilon)j/2} = o\big(2^{(2-\alpha_j)j/2}\big), \qquad \forall (\iota) \in \{(h), (v), (b)\}$,   (56)

and

$|I_T^\pm \cap A_{s,j}| \lesssim 2^{\alpha_j j}\, s_j^{-2} \quad \text{as } j \to \infty$.   (57)

Figure 6.
Left: Interaction between a cluster of Gabor elements (grey small squares) and a shearlet (green) in the frequency domain. Right: Some shearlet elements (green) associated with $\Lambda_{2,j}$ in the spatial domain, for $l = 0$.

6.4. Cluster sets for texture and cartoon.
We define the cluster for texture at scale $j$ to be

$\Lambda_{1,j} := \big(\mathbb Z^2 \cap B(0, M_j)\big) \times \big(I_T^\pm \cap A_{s,j}\big)$,   (58)

where $M_j := 2^{\epsilon j/2}$ and $B(0, r)$ denotes the closed $\ell^2$ ball of radius $r$ around the origin in $\mathbb R^2$. The term $M_j = 2^{\epsilon j/2}$ controls the trade-off between the relative sparsity and the cluster coherence of the Gabor systems.

To represent the cartoon part, consider a universal shearlet system with $\alpha_j$ from (25), where $\alpha \in (0, 1]$ is a global constant. To define the cluster sets, we fix a constant $\epsilon$ satisfying

$0 < \epsilon < 2 - \alpha$.   (59)

Since $\alpha_j$ satisfies $\lim_{j \to \infty} \alpha_j = \alpha$, we have

$0 < \epsilon < 2 - \alpha_j$   (60)

at fine enough scales $j$. For the cartoon model $wS_j$ we define the set of significant coefficients of the universal shearlet system by

$\Lambda^\pm_{2,j} := \Lambda_{2,j-1} \cup \Lambda_{2,j} \cup \Lambda_{2,j+1}, \qquad \forall j \ge 0$,   (61)

where

$\Lambda_{2,j} := \big\{(j, l, k; \alpha_j, \mathrm{v}) \ \big|\ |l| \le 2,\ k = (k_1, k_2) \in \mathbb Z^2,\ |k_2 - l k_1| \le 2^{\epsilon j}\big\}$.

6.5. The separation and inpainting theorem for cartoon and texture.
We now present our main result. In the following theorem, we prove the success of separation and inpainting via Algorithm (INP-SEP). Since we inpaint a band of width $h_j$ for each scale $j$, it is important to show that the inpainting error is asymptotically smaller than the energy that typical cartoon and texture parts have in the missing band. For the cartoon part, we can prove that the relative reconstruction error, restricted to the missing part, goes to zero as $j \to \infty$. For texture, we note that it is possible for $\mathcal T_{s,j}$ to be close to zero in the missing part, and thus the relative error restricted to the missing part need not go to zero in the general case. However, generic texture parts are not close to zero in the missing band, and typically

$\|P_j \mathcal T_{s,j}\|^2 \propto h_j\, \|\mathcal T_{s,j}\|^2 \propto h_j\, 2^{-2j}$.

We thus consider for texture a relative error of the form

$\dfrac{\|P_j \mathcal T^*_j - P_j \mathcal T_{s,j}\|^2}{h_j\, 2^{-2j}}$.

Theorem 6.4.
Consider the cartoon patch and texture with $wS_j$ and $\mathcal T_{s,j}$ defined in (34) and (35), respectively. Suppose that the energy matching (36) holds. Suppose that the index set of the texture $\mathcal T_{s,j}$ satisfies (56) and (57). Then, for $0 < h_j = o(2^{-(\alpha_j + \epsilon)j})$ with $\alpha_j \in (0, 1]$, $\liminf \alpha_j > 0$, and $\epsilon$ satisfying (60), the recovery error provided by Algorithm (INP-SEP) decays rapidly and we have asymptotically perfect simultaneous separation and inpainting. Namely, for all $N \in \mathbb N$,

$\dfrac{\|\mathcal C^*_j - wS_j\| + \|\mathcal T^*_j - \mathcal T_{s,j}\|}{\|wS_j\| + \|\mathcal T_{s,j}\|} = o(2^{-Nj}) \to 0, \qquad j \to \infty$,   (62)

where $(\mathcal C^*_j, \mathcal T^*_j)$ is the solution of (INP-SEP) and $(\mathcal C_j, \mathcal T_{s,j})$ are the ground truth components. In addition, if $w(0) \neq 0$, we have an asymptotically accurate relative reconstruction error in the missing part:

$\dfrac{\|P_j \mathcal C^*_j - P_j\, wS_j\|}{\|P_j\, wS_j\|} \xrightarrow{j \to +\infty} 0 \quad \text{and} \quad \dfrac{\|P_j \mathcal T^*_j - P_j \mathcal T_{s,j}\|^2}{h_j\, 2^{-2j}} \xrightarrow{j \to +\infty} 0$.   (63)

We postpone the proof of Theorem 6.4 to Section 10, after we discuss preliminary material. Note that in Theorem 6.4 there is no direct restriction on the texture patch sizes $s_j^{-1}$, and the theorem works even when the texture patch at each scale is smaller than the scale $2^{-2j}$. However, the most useful case of Theorem 6.4, which appropriately models the relation between texture and cartoon, is when $s_j = s$ is constant for all $j$. This is formulated in the following corollary.

Corollary 6.5.
Consider the conditions of Theorem 6.4, and suppose that the texture part $\mathcal T_{s,j}$ is based on a constant $s_j = s$. Then, we have asymptotically perfect separation and inpainting, i.e., for all $N \in \mathbb N$,

$\dfrac{\|\mathcal C^*_j - \mathcal C_j\| + \|\mathcal T^*_j - \mathcal T_{s,j}\|}{\|\mathcal C_j\| + \|\mathcal T_{s,j}\|} = o(2^{-Nj}) \to 0, \qquad j \to \infty$.   (64)

If in addition $w(0) \neq 0$, we have

$\dfrac{\|P_j \mathcal C^*_j - P_j\, wS_j\|}{\|P_j\, wS_j\|} \xrightarrow{j \to +\infty} 0 \quad \text{and} \quad \dfrac{\|P_j \mathcal T^*_j - P_j \mathcal T_{s,j}\|^2}{h_j\, 2^{-2j}} \xrightarrow{j \to +\infty} 0$.   (65)

7. Extensions and future directions
In this section, we present potential extensions and future directions of our approach.(1)
Global cartoon model. Using the technique introduced in [10] and [13], we can localize a global cartoon part $\mathcal C_j$ by a partition of unity $(w_Q)_{Q \in \mathcal Q}$. Denoting $\mathcal C_{j,Q} = \mathcal C_j \cdot w_Q$, we have $\sum_Q \mathcal C_{j,Q} = \mathcal C_j$. In this approach, using a tubular neighborhood theorem ([10, Sect. 6] and [13]), we apply a diffeomorphism $\Phi_Q$ to each piece $\mathcal C_{j,Q}$ to straighten out the local curve discontinuity. By combining the different neighborhoods, we can derive convergence results for a global cartoon part.

(2) Other types of components. Our theoretical analysis holds for general representation systems which form Parseval frames. Thus, our general technique can be applied to problems of image separation and image inpainting with other types of image parts.

(3)
Noisy case . In future work we will extend our theory to the case where the image containsnoise.(4)
More components. In our analysis we consider the case where there are two components. In future work we will extend our theoretical guarantee to separation and inpainting of more than two geometric components.

In the rest of the paper we prove Theorem 6.4. To this end, in Section 8 we study the relative sparsity of texture and cartoon, and in Section 9 we study the cluster coherence of the sparsifying systems.

8.
Relative sparsity of texture and cartoon
In this section, we bound the relative sparsity of cartoon with respect to the shearlet frame (Subsection 8.2) and of texture with respect to the Gabor frame (Subsection 8.3).

8.1.
Decay estimates of shearlet and Gabor elements.
First, we provide decay estimates ofshearlet and Gabor elements.
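The mechanism behind such decay estimates can be checked numerically: a window whose Fourier transform is smooth and compactly supported decays rapidly in space, with the rate governed by the smoothness, as in the integration-by-parts arguments of this subsection. A minimal sketch under our own assumptions: the specific window $\hat g(\zeta) = \cos^2(\pi\zeta/2)$ on $[-1, 1]$ is an illustrative choice, and being only $C^1$ (with piecewise-smooth second derivative) it yields polynomial decay of order $x^{-3}$ rather than the arbitrarily fast decay of a $C^\infty$ window.

```python
import numpy as np

# g(x) = integral of g_hat(zeta) * e^{2 pi i zeta x} over [-1, 1], with the
# illustrative C^1 window g_hat(zeta) = cos^2(pi zeta / 2).  Smoothness of
# g_hat translates into spatial decay of g; here |g(x)| = O(x^{-3}).
N = 200001
zeta = -1.0 + (np.arange(N) + 0.5) * (2.0 / N)   # midpoint quadrature nodes
w = 2.0 / N
g_hat = np.cos(np.pi * zeta / 2.0) ** 2

def g(x):
    # midpoint-rule approximation of the inverse Fourier integral
    return np.abs(np.sum(g_hat * np.exp(2j * np.pi * zeta * x)) * w)

xs = [2.25, 4.25, 8.25, 16.25]                   # offsets avoid sinc zeros
vals = [g(x) for x in xs]
assert all(a > b for a, b in zip(vals, vals[1:]))       # |g| decays
assert max(v * x ** 3 for v, x in zip(vals, xs)) < 0.1  # consistent with O(x^-3)
```

Each additional derivative of $\hat g$ buys one more power of decay, which is exactly the bookkeeping performed by the repeated integration by parts in the proofs below.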
Lemma 8.1.
Consider the shearlet frame $\Psi$ of Definition 5.3 and the Gabor frame of scale $j$, $G_j$, defined in (33). For any integer $N = 1, 2, \ldots$ there exists a constant $C_N$, independent of $j$, such that the following estimates hold:

i) $|(g_{s_j})_{m,n}(x)| \le C_N \cdot s_j \cdot \langle |s_j x_1 + m_1| \rangle^{-N} \langle |s_j x_2 + m_2| \rangle^{-N}$,

ii) $|\psi^{\alpha_j,(v)}_{j,l,k}(x)| \le C_N \cdot 2^{(2+\alpha_j)j/2} \cdot \langle |2^{\alpha_j j} x_1 - k_1| \rangle^{-N} \langle |2^{2j} x_2 + l\, 2^{\alpha_j j} x_1 - k_2| \rangle^{-N}$,

$|\psi^{\alpha_j,(h)}_{j,l,k}(x)| \le C_N \cdot 2^{(2+\alpha_j)j/2} \cdot \langle |2^{2j} x_1 + l\, 2^{\alpha_j j} x_2 - k_1| \rangle^{-N} \langle |2^{\alpha_j j} x_2 - k_2| \rangle^{-N}$,

iii) $|\langle (g_{s_j})_{m,n}, \psi^{\alpha_j,(\iota)}_{j,l,k} \rangle| \le C_N \cdot 2^{-(2-\alpha_j)j/2}, \qquad \forall (\iota) \in \{(v), (h), (b)\}$.

Proof. i) By the change of variable $\zeta = s_j^{-1}\xi - n$, we have

$|(g_{s_j})_{m,n}(x)| = \Big|\int_{\mathbb R^2} (\hat g_{s_j})_{m,n}(\xi)\, e^{2\pi i \xi^T x}\, d\xi\Big| = \Big|\int_{\mathbb R^2} s_j^{-1}\, \hat g(s_j^{-1}\xi - n)\, e^{2\pi i \xi^T (x + m/s_j)}\, d\xi\Big|$

$= \Big| s_j\, e^{2\pi i n^T (s_j x + m)} \int_{\mathbb R^2} \hat g(\zeta)\, e^{2\pi i \zeta^T (s_j x + m)}\, d\zeta \Big| \le s_j\, \Big|\int_{\mathbb R^2} \hat g(\zeta)\, e^{2\pi i \zeta^T (s_j x + m)}\, d\zeta\Big|$.

We now apply integration by parts $N_1$ and $N_2$ times, $N_1, N_2 = 1, 2, \ldots$, with respect to $\zeta_1$ and $\zeta_2$, respectively, and obtain

$|(g_{s_j})_{m,n}(x)| \le s_j\, |s_j x_1 + m_1|^{-N_1}\, |s_j x_2 + m_2|^{-N_2}\, \Big|\int_{\mathbb R^2} \frac{\partial^{N_1+N_2}}{\partial \zeta_1^{N_1} \partial \zeta_2^{N_2}}[\hat g(\zeta)]\, d\zeta\Big|$,

$|(g_{s_j})_{m,n}(x)| \le s_j\, |s_j x_k + m_k|^{-N_k}\, \Big|\int_{\mathbb R^2} \frac{\partial^{N_k}}{\partial \zeta_k^{N_k}}[\hat g(\zeta)]\, d\zeta\Big| \quad \text{for } k = 1, 2$.

Here, the boundary terms vanish due to the compact support of $\hat g$. Thus,

$s_j^{-1}\, \big(1 + |s_j x_1 + m_1|^{N_1}\big)\big(1 + |s_j x_2 + m_2|^{N_2}\big)\, |(g_{s_j})_{m,n}(x)|$

$\le \Big|\int_{\mathbb R^2} \hat g(\zeta)\, d\zeta\Big| + \Big|\int_{\mathbb R^2} \frac{\partial^{N_1}}{\partial \zeta_1^{N_1}}[\hat g(\zeta)]\, d\zeta\Big| + \Big|\int_{\mathbb R^2} \frac{\partial^{N_2}}{\partial \zeta_2^{N_2}}[\hat g(\zeta)]\, d\zeta\Big| + \Big|\int_{\mathbb R^2} \frac{\partial^{N_1+N_2}}{\partial \zeta_1^{N_1}\partial \zeta_2^{N_2}}[\hat g(\zeta)]\, d\zeta\Big|$.

By the smoothness of $\hat g$ and $\operatorname{supp} \hat g \subset [-1, 1]^2$, there exists a constant $C'_{N_1,N_2}$, independent of $j$, bounding the right-hand side. We obtain

$|(g_{s_j})_{m,n}(x)| \le C'_{N_1,N_2} \cdot s_j \cdot \frac{1}{1 + |s_j x_1 + m_1|^{N_1}} \cdot \frac{1}{1 + |s_j x_2 + m_2|^{N_2}}$.

Moreover, for each $N_1, N_2 = 1, 2, \ldots$ there exist constants $C'_{N_1}, C'_{N_2}$ such that

$\langle |s_j x_1 + m_1| \rangle^{N_1} = (1 + |s_j x_1 + m_1|^2)^{N_1/2} \le C'_{N_1} (1 + |s_j x_1 + m_1|^{N_1})$,

$\langle |s_j x_2 + m_2| \rangle^{N_2} = (1 + |s_j x_2 + m_2|^2)^{N_2/2} \le C'_{N_2} (1 + |s_j x_2 + m_2|^{N_2})$.

Thus,

$|(g_{s_j})_{m,n}(x)| \le C_{N_1,N_2} \cdot s_j \cdot \langle |s_j x_1 + m_1| \rangle^{-N_1} \langle |s_j x_2 + m_2| \rangle^{-N_2}$.

This proves the claim for any integer $N$.

ii) By the change of variable $\zeta^T = \xi^T A^{-j}_{\alpha_j,(\iota)} S^{-l}_{(\iota)}$, we have

$|\psi^{\alpha_j,(\iota)}_{j,l,k}(x)| = \Big|\int_{\mathbb R^2} \hat\psi^{\alpha_j,(\iota)}_{j,l,k}(\xi)\, e^{2\pi i \xi^T x}\, d\xi\Big| = \Big|\int_{\mathbb R^2} 2^{-(2+\alpha_j)j/2}\, W_j(\xi)\, V^{(\iota)}\big(\xi^T A^{-j}_{\alpha_j,(\iota)} S^{-l}_{(\iota)}\big)\, e^{2\pi i \xi^T (x - A^{-j}_{\alpha_j,(\iota)} S^{-l}_{(\iota)} k)}\, d\xi\Big|$

$= \Big|\int_{\mathbb R^2} 2^{(2+\alpha_j)j/2}\, W_j\big((S^l_{(\iota)} A^j_{\alpha_j,(\iota)})^T \zeta\big)\, V^{(\iota)}(\zeta)\, e^{2\pi i \zeta^T (S^l_{(\iota)} A^j_{\alpha_j,(\iota)} x - k)}\, d\zeta\Big|$.

Similarly to (i), we apply integration by parts $N_1, N_2 = 1, 2, \ldots$ times with respect to $\zeta_1, \zeta_2$ and obtain the decay estimates of the universal shearlets stated in ii).

iii) We consider three cases.

Case 1: $(\iota) = (v)$. Applying (i) and (ii) and the change of variables $(y_1, y_2) = (s_j x_1,\ 2^{2j} x_2 + l\, 2^{\alpha_j j} x_1)$, with Jacobian $s_j\, 2^{2j}$, we obtain

$|\langle (g_{s_j})_{m,n}, \psi^{\alpha_j,(v)}_{j,l,k} \rangle| = \Big|\int_{\mathbb R^2} (g_{s_j})_{m,n}(x)\, \overline{\psi^{\alpha_j,(v)}_{j,l,k}(x)}\, dx\Big| \le C'_N \cdot 2^{-(2-\alpha_j)j/2} \int_{\mathbb R^2} \langle |y_1 + m_1| \rangle^{-N} \langle |y_2 - k_2| \rangle^{-N}\, dy_1\, dy_2 \le C_N \cdot 2^{-(2-\alpha_j)j/2}$,

where the factor $2^{-(2-\alpha_j)j/2} = s_j \cdot 2^{(2+\alpha_j)j/2} \cdot (s_j\, 2^{2j})^{-1}$ collects the normalizations of (i), (ii) and the Jacobian.

Case 2: $(\iota) = (h)$. Similarly, we can use (i), (ii) and the change of variables $(y_1, y_2) = (2^{2j} x_1 + l\, 2^{\alpha_j j} x_2,\ s_j x_2)$ to obtain $|\langle (g_{s_j})_{m,n}, \psi^{\alpha_j,(h)}_{j,l,k} \rangle| \le C_N \cdot 2^{-(2-\alpha_j)j/2}$.

Case 3: $(\iota) = (b)$. By the definition of the boundary shearlets, we can verify this estimate by combining the two cases above. □
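The decay products appearing in these estimates are handled by a standard convolution-type inequality for the brackets $\langle |x| \rangle = (1 + |x|^2)^{1/2}$ of (42). A quick numerical sanity check, where $N = 3$ and the truncated integration range $[-200, 200]$ are our own illustrative choices:

```python
import numpy as np

# Check numerically that the integral of <z>^{-N} <z+t>^{-N} over z stays
# comparable to <t>^{-N}, where <x> = (1 + |x|^2)^{1/2}.  N >= 2 is needed
# for integrability of a single bracket factor.
def bracket(x):
    return np.sqrt(1.0 + x ** 2)

N = 3
z = np.linspace(-200.0, 200.0, 400001)
dz = z[1] - z[0]
ratios = []
for t in (0.0, 1.0, 5.0, 25.0, 100.0):
    integral = np.sum(bracket(z) ** (-N) * bracket(z + t) ** (-N)) * dz
    ratios.append(integral * bracket(t) ** N)   # should stay bounded in t
assert max(ratios) < 10.0
assert min(ratios) > 0.1
```

The bound works because at every $z$ at least one of $|z|$ and $|z + t|$ is $\ge |t|/2$, so the smaller bracket factor is at most $\langle |t/2| \rangle^{-N}$ while the larger one remains integrable.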
Lemma 8.2.
For each integer $N \ge 2$ there exists a constant $C_N > 0$ such that

$\int_{\mathbb R} \langle |z| \rangle^{-N} \langle |z + t| \rangle^{-N}\, dz \le C_N\, \langle |t| \rangle^{-N}, \qquad \forall t \in \mathbb R$.

Proof.
We have

$\int_{\mathbb R} \langle |z| \rangle^{-N} \langle |z+t| \rangle^{-N}\, dz = \int_{\mathbb R} \max\big\{\langle |z| \rangle^{-N}, \langle |z+t| \rangle^{-N}\big\} \cdot \min\big\{\langle |z| \rangle^{-N}, \langle |z+t| \rangle^{-N}\big\}\, dz$

$\le \int_{\mathbb R} \big( \langle |z| \rangle^{-N} + \langle |z+t| \rangle^{-N} \big) \cdot \langle |t/2| \rangle^{-N}\, dz \le C_N\, \langle |t| \rangle^{-N}$,

where we used that $\max\{|z|, |z+t|\} \ge |t|/2$, so that the minimum of the two bracket factors is at most $\langle |t/2| \rangle^{-N}$. □

8.2. Cartoon patch.
For the sake of brevity, we use the following indexing sets for universal shearlets:

$\Delta := \big\{(j, l, k; \alpha_j, (\iota)) \ \big|\ j \ge 0,\ |l| < 2^{(2-\alpha_j)j},\ k \in \mathbb Z^2,\ (\iota) \in \{(h), (v)\}\big\} \cup \big\{(j, l, k; \alpha_j, (b)) \ \big|\ j \ge 0,\ |l| = \lceil 2^{(2-\alpha_j)j} \rceil,\ k \in \mathbb Z^2\big\}$,   (66)

$\Delta_j := \{(j', l, k; \alpha_{j'}, (\iota)) \in \Delta \mid j' = j\}, \qquad j \ge 0$,   (67)

$\Delta^\pm_j := \Delta_{j-1} \cup \Delta_j \cup \Delta_{j+1}$,   (68)

where $\Delta_{-1} = \emptyset$. We now have the following result.

Proposition 8.3.
Consider the shearlet frame $\Psi$ of Definition 5.3 and the cartoon patch $wS_j$ as defined in (34). We assume that $\liminf_{j \to \infty} \alpha_j > 0$. Then, we have the following decay estimate of the cluster approximation error $\delta_{2,j}$:

$\delta_{2,j} := \sum_{\eta \in \Delta,\ \eta \notin \Lambda^\pm_{2,j}} \big|\big\langle \psi^{\alpha_{j'},(\iota)}_{j',l,k},\, wS_j \big\rangle\big| = o(2^{-Nj}), \qquad \forall N \in \mathbb N$,   (69)

where $\eta = (j', l, k; \alpha_{j'}, (\iota))$.

The proof of this proposition roughly follows the lines of Proposition 5.2 in [25]. For the sake of brevity we denote

$t^{(v)} = (t^{(v)}_1, t^{(v)}_2) := A^{-j}_{\alpha_j,(v)} S^{-l}_{(v)} k = \big(2^{-\alpha_j j} k_1,\ 2^{-2j}(k_2 - l k_1)\big)$,   (70)

$t^{(h)} = (t^{(h)}_1, t^{(h)}_2) := A^{-j}_{\alpha_j,(h)} S^{-l}_{(h)} k = \big(2^{-2j}(k_1 - l k_2),\ 2^{-\alpha_j j} k_2\big)$.   (71)

The proof also relies on the following two lemmas.

Lemma 8.4.
Consider the shearlet frame $\Psi$ of Definition 5.3 and the cartoon patch with sub-band $wS_j$ defined in (34). Then, for $\alpha_j \in (0, 1]$, $j' \in \{j-1, j, j+1\}$, $j \ge 0$, the following estimates hold for arbitrary integers $M \ge 1$:

i) If $(\iota) = (v)$ and $|l| > 1$, we have

$\big|\big\langle wS_j, \psi^{\alpha_{j'},(v)}_{j',l,k} \big\rangle\big| \le C_M \cdot 2^{-(2+\alpha_{j'})j'/2} \cdot \langle |t^{(v)}_1| \rangle^{-2} \langle |t^{(v)}_2| \rangle^{-2} \langle |2^{\alpha_{j'} j'}| \rangle^{-M}$.

ii) If $(\iota) = (h)$, we have

$\big|\big\langle wS_j, \psi^{\alpha_{j'},(h)}_{j',l,k} \big\rangle\big| \le C_M \cdot 2^{j'} r^{-2} \cdot \langle |t^{(h)}_1| \rangle^{-2} \langle |t^{(h)}_2| \rangle^{-2} \langle |2^{j'}| \rangle^{-M}$,

where $r$ is defined in (14).

iii) If $(\iota) = (b)$ and $|l| = \lceil 2^{(2-\alpha_{j'})j'} \rceil$, we have

$\big|\big\langle wS_j, \psi^{\alpha_{j'},(b)}_{j',l,k} \big\rangle\big| \le C_M \cdot 2^{-(2+\alpha_{j'})j'/2} \cdot \langle |t^{(b)}_1| \rangle^{-2} \langle |t^{(b)}_2| \rangle^{-2} \langle |2^{\alpha_{j'} j'}| \rangle^{-M}$.

Proof.
Without loss of generality, we prove the case $j' = j$. The other cases are treated similarly.

(i) By the definition of $wS_j$ and Plancherel's theorem, we obtain

$\big|\big\langle wS_j, \psi^{\alpha_j,(v)}_{j,l,k} \big\rangle\big| = \big|\big\langle \widehat{wS_j}, \hat\psi^{\alpha_j,(v)}_{j,l,k} \big\rangle\big| = \Big|\int_{\mathbb R^2} \hat w(\xi_1)\, \frac{1}{2\pi \xi_2}\, W_j(\xi)\, \overline{\hat\psi^{\alpha_j,(v)}_{j,l,k}(\xi)}\, d\xi\Big|$

$= \frac{1}{2\pi}\Big|\int_{\mathbb R} e^{2\pi i t^{(v)}_2 \xi_2} \int_{\mathbb R} \hat w(\xi_1)\, \tau_{j,l}(\xi)\, e^{2\pi i t^{(v)}_1 \xi_1}\, d\xi_1\, d\xi_2\Big|$,

where $\tau_{j,l}(\xi) := \xi_2^{-1}\, W_j(\xi)\, \hat\psi^{\alpha_j,(v)}_{j,l,0}(\xi)$.

We now apply repeated integration by parts with respect to $\xi_i$, $i = 1, 2$. We get

$\big|\big\langle wS_j, \psi^{\alpha_j,(v)}_{j,l,k} \big\rangle\big| = \frac{1}{2\pi}\Big|\int_{\mathbb R} e^{2\pi i t^{(v)}_2 \xi_2} \int_{\mathbb R} \frac{\partial^2}{\partial \xi_1^2}\big[\hat w(\xi_1)\, \tau_{j,l}(\xi)\big]\, \frac{e^{2\pi i t^{(v)}_1 \xi_1}}{(2\pi i t^{(v)}_1)^2}\, d\xi_1\, d\xi_2\Big| \le |t^{(v)}_1|^{-2} \int_{\mathbb R}\int_{\mathbb R} \Big|\frac{\partial^2}{\partial \xi_1^2}\big[\hat w(\xi_1)\, \tau_{j,l}(\xi)\big]\Big|\, d\xi_1\, d\xi_2$   (72)

and similarly

$\big|\big\langle wS_j, \psi^{\alpha_j,(v)}_{j,l,k} \big\rangle\big| \le |t^{(v)}_2|^{-2} \int_{\mathbb R}\int_{\mathbb R} \Big|\frac{\partial^2}{\partial \xi_2^2}\big[\hat w(\xi_1)\, \tau_{j,l}(\xi)\big]\Big|\, d\xi_1\, d\xi_2$,   (73)

$\big|\big\langle wS_j, \psi^{\alpha_j,(v)}_{j,l,k} \big\rangle\big| \le |t^{(v)}_1|^{-2} |t^{(v)}_2|^{-2} \int_{\mathbb R}\int_{\mathbb R} \Big|\frac{\partial^4}{\partial \xi_1^2 \partial \xi_2^2}\big[\hat w(\xi_1)\, \tau_{j,l}(\xi)\big]\Big|\, d\xi_1\, d\xi_2$.   (74)

Combining the trivial bound with (72), (73), and (74), this leads to

$\big(1 + |t^{(v)}_1|^2 + |t^{(v)}_2|^2 + |t^{(v)}_1|^2 |t^{(v)}_2|^2\big)\, \big|\big\langle wS_j, \psi^{\alpha_j,(v)}_{j,l,k} \big\rangle\big| = \langle |t^{(v)}_1| \rangle^2 \langle |t^{(v)}_2| \rangle^2\, \big|\big\langle wS_j, \psi^{\alpha_j,(v)}_{j,l,k} \big\rangle\big| \le \int_{\mathbb R} \Gamma(\xi_1)\, d\xi_1$,   (75)

where

$\Gamma(\xi_1) := \int_{\mathbb R} \Big( \big|\hat w(\xi_1)\, \tau_{j,l}(\xi)\big| + \Big|\frac{\partial^2}{\partial \xi_1^2}\big[\hat w(\xi_1)\, \tau_{j,l}(\xi)\big]\Big| + \Big|\frac{\partial^2}{\partial \xi_2^2}\big[\hat w(\xi_1)\, \tau_{j,l}(\xi)\big]\Big| + \Big|\frac{\partial^4}{\partial \xi_1^2 \partial \xi_2^2}\big[\hat w(\xi_1)\, \tau_{j,l}(\xi)\big]\Big| \Big)\, d\xi_2$.

Next, we bound $\Gamma$.
By the definition of the universal shearlets, $\hat\psi^{\alpha_j,(v)}_{j,l,k}$ has compact support in the trapezoidal region
$$\mathrm{supp}\,\hat\psi^{\alpha_j,(v)}_{j,l,k} = \Big\{\xi\in\mathbb{R}^2 \;:\; \xi_1\in[-2^{2j-1},2^{2j-1}]\setminus[-2^{2j-4},2^{2j-4}],\ \Big|\frac{\xi_2}{\xi_1}-l\,2^{(\alpha_j-2)j}\Big|\le 2^{(\alpha_j-2)j}\Big\}. \quad (76)$$
This implies that for any $\xi\in\mathrm{supp}\,\tau_{j,l}$ we have $(l-1)2^{(\alpha_j-2)j}\le \xi_2/\xi_1\le (l+1)2^{(\alpha_j-2)j}$ and $2^{2j-4}\le|\xi_1|\le 2^{2j-1}$. Using the assumption $|l|>2$, it follows that there exist constants $C_1, C_2>0$ such that
$$\xi_2\in I_{j,l} := [-C_2 l 2^{\alpha_j j}, -C_1 l 2^{\alpha_j j}]\cup[C_1 l 2^{\alpha_j j}, C_2 l 2^{\alpha_j j}].$$
Since $\tau_{j,l}(\xi)$ vanishes for $|\xi_1|<2^{2j-4}$, a direct computation gives
$$2^{(2+\alpha_j)j/2}\,\Big|\frac{\partial\tau_{j,l}}{\partial\xi_1}(\xi)\Big| = \Big|\frac{\partial}{\partial\xi_1}\Big[\frac{1}{\xi_1}\,W_j(\xi)\,v\Big(2^{(2-\alpha_j)j}\frac{\xi_2}{\xi_1}-l\Big)\Big]\Big| \le C\,2^{-4j}. \quad (77)$$
Furthermore, writing $\tau_{j,l}(\xi) = \sigma_{j,l}(\xi)/\xi_1$ with $\sigma_{j,l}(\xi) := W_j(\xi)\,v\big(2^{(2-\alpha_j)j}\xi_2/\xi_1-l\big)$, we have on the support (76)
$$2^{(2+\alpha_j)j/2}\,\Big|\frac{\partial\sigma_{j,l}}{\partial\xi_2}(\xi)\Big| \le C\,2^{-2j} + C\,2^{(2-\alpha_j)j}\,\frac{1}{|\xi_1|} \overset{0<\alpha_j\le2}{\le} C\,2^{-\alpha_j j},$$
and consequently
$$2^{(2+\alpha_j)j/2}\,\Big|\frac{\partial\tau_{j,l}}{\partial\xi_2}(\xi)\Big| = 2^{(2+\alpha_j)j/2}\,\Big|\frac{1}{\xi_1}\,\frac{\partial\sigma_{j,l}}{\partial\xi_2}(\xi)\Big| \le C\,2^{-\alpha_j j}\,2^{-2j+4} \le C''\,2^{-2j}. \quad (78)$$
Similarly, for the higher-order derivatives we obtain
$$2^{(2+\alpha_j)j/2}\,\Big|\frac{\partial^2\tau_{j,l}}{\partial\xi_1^2}(\xi)\Big| \le C\,2^{-2j} \quad (79)$$
and
$$2^{(2+\alpha_j)j/2}\,\Big|\frac{\partial^4\tau_{j,l}}{\partial\xi_1^2\partial\xi_2^2}(\xi)\Big| \le C\,2^{-2j}. \quad (80)$$
We now exploit the specific form of $I_{j,l}$ as well as the rapid decay of $\hat w$ to bound $\hat w^{(n)}$ for $n=0,1,2$. This leads to (see Figure 7 for an illustration)
$$\|\hat w^{(n)}\|_{L^1(I_{j,l})} \le \mathrm{vol}(I_{j,l})\,\sup_{\xi_2\in I_{j,l}}|\hat w^{(n)}(\xi_2)| \le C_{M,n}\,|l2^{\alpha_j j}|\,\langle|l2^{\alpha_j j}|\rangle^{-(M+1)} \le C_{M,n}\,\langle|l2^{\alpha_j j}|\rangle^{-M} \overset{|l|>2}{\le} C_{M,n}\,\langle|2^{\alpha_j j}|\rangle^{-M}, \quad (81)$$
for all $M\in\mathbb{N}$ and $n=0,1,2$. Combining (77), (78), (79), (80), and (81) with (75), we have
$$\Gamma(\xi_1) \le C\cdot2^{-(2+\alpha_j)j/2}\cdot2^{-2j}\,\max\Big\{\|\hat w\|_{L^1(I_{j,l})},\ \Big\|\frac{\partial\hat w}{\partial\xi_2}\Big\|_{L^1(I_{j,l})},\ \Big\|\frac{\partial^2\hat w}{\partial\xi_2^2}\Big\|_{L^1(I_{j,l})}\Big\} \quad (82)$$
$$\le C_M\cdot2^{-(2+\alpha_j)j/2}\cdot2^{-2j}\cdot\langle|2^{\alpha_j j}|\rangle^{-M}. \quad (83)$$
In addition, since $\mathrm{supp}\,\Gamma\subseteq\{2^{2j-4}\le|\xi_1|\le2^{2j-1}\}$, we finally obtain
$$\Big|\langle w_{S_j},\psi^{\alpha_j,(v)}_{j,l,k}\rangle\Big| \le C_M\cdot2^{-(2+\alpha_j)j/2}\cdot\langle|t^{(v)}_1|\rangle^{-2}\,\langle|t^{(v)}_2|\rangle^{-2}\,\langle|2^{\alpha_j j}|\rangle^{-M}.$$
(ii) The proof of this claim is similar to (i), but we need to modify the intervals $I_{j,l}$ to $[-C_2 2^{2j}, -C_1 2^{2j}]\cup[C_1 2^{2j}, C_2 2^{2j}]$, where $C_1, C_2>0$. This leads to the additional terms $\langle|2^{2j}|\rangle^{-M}$ and $r$ in (ii) when comparing with (i). Moreover, we use $r\le|\xi_1|\le2^{2j-1}$ instead of $2^{2j-4}\le|\xi_1|\le2^{2j-1}$ for the support of $\tau_{j,l}$.
Thus, redefining $\tau_{j,l}(\xi) = \frac{1}{\xi_1}\,W_j(\xi)\,\overline{\hat\psi^{\alpha_j,(h)}_{j,l,0}(\xi)}$, we modify (77), (78), (79), (80) as follows:
$$2^{(2+\alpha_j)j/2}\,\Big|\frac{\partial\tau_{j,l}}{\partial\xi_2}(\xi)\Big| = \Big|\frac{\partial}{\partial\xi_2}\Big[\frac{1}{\xi_1}\,W_j(\xi)\,v\Big(2^{(2-\alpha_j)j}\frac{\xi_1}{\xi_2}-l\Big)\Big]\Big| \le C\,r^{-1}\,2^{-2j} + C\,2^{(2-\alpha_j)j}\,2^{-4j} \overset{0<\alpha_j\le2}{\le} C\,r^{-1}\,2^{-2j}. \quad (84)$$
Furthermore, writing $\tau_{j,l}(\xi) = \sigma_{j,l}(\xi)/\xi_1$ with $\sigma_{j,l}(\xi) := W_j(\xi)\,v\big(2^{(2-\alpha_j)j}\xi_1/\xi_2-l\big)$, we have
$$2^{(2+\alpha_j)j/2}\,\Big|\frac{\partial\sigma_{j,l}}{\partial\xi_1}(\xi)\Big| \le C\,2^{-2j} + C\,2^{(2-\alpha_j)j}\,\frac{1}{|\xi_2|} \overset{0<\alpha_j\le2}{\le} C.$$
Consequently, by the quotient rule and $|\xi_1|\ge r$,
$$2^{(2+\alpha_j)j/2}\,\Big|\frac{\partial\tau_{j,l}}{\partial\xi_1}(\xi)\Big| = 2^{(2+\alpha_j)j/2}\,\Big|\frac{\frac{\partial\sigma_{j,l}}{\partial\xi_1}(\xi)\,\xi_1-\sigma_{j,l}(\xi)}{\xi_1^2}\Big| \le C'\,r^{-2}. \quad (85)$$
Similarly, we obtain
$$2^{(2+\alpha_j)j/2}\,\Big|\frac{\partial^2\tau_{j,l}}{\partial\xi_2^2}(\xi)\Big| \le C\,r^{-1}\,2^{-2j} \quad (86)$$
and
$$2^{(2+\alpha_j)j/2}\,\Big|\frac{\partial^2\tau_{j,l}}{\partial\xi_1^2}(\xi)\Big| \le C'\,r^{-2}. \quad (87)$$
These bounds lead to
$$\Gamma(\xi_2) \le C\,r^{-2}\cdot\max\Big\{\|\hat w\|_{L^1(I_{j,l})},\ \Big\|\frac{\partial\hat w}{\partial\xi_1}\Big\|_{L^1(I_{j,l})},\ \Big\|\frac{\partial^2\hat w}{\partial\xi_1^2}\Big\|_{L^1(I_{j,l})}\Big\}.$$
Thus, by the rapid decay of $\hat w$, as before,
$$\Big|\langle w_{S_j},\psi^{\alpha_j,(h)}_{j,l,k}\rangle\Big| \le C_M\cdot2^{2j}\,r^{-2}\cdot\langle|t^{(h)}_1|\rangle^{-2}\,\langle|t^{(h)}_2|\rangle^{-2}\,\langle|2^{2j}|\rangle^{-M}.$$
For an illustration, we refer to Figure 7. Finally, we obtain (iii): recalling the definition of the shearlets, the estimate can easily be verified, since the boundary shearlets are just a particular case of the horizontal and vertical shearlets. $\square$
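The mechanism behind Lemma 8.4 — repeated integration by parts trading frequency-side smoothness for polynomial decay in the translation parameter — can be observed numerically. The following sketch is our own illustration, not part of the paper's argument: for a $C^\infty$ bump in frequency, the weighted Fourier magnitudes $t^2|c(t)|$ themselves still decay, i.e., $c(t)$ decays faster than $|t|^{-2}$.

```python
import numpy as np

# Our own toy illustration of the integration-by-parts principle used in
# Lemma 8.4: a smooth, compactly supported frequency profile has "Fourier
# coefficients" c(t) decaying faster than any polynomial, so t^2 |c(t)|
# still tends to 0 along dyadic bands of t.

def bump(xi):
    """C^infinity bump supported on (-1, 1), vanishing to all orders at the edges."""
    out = np.zeros_like(xi)
    inside = np.abs(xi) < 1.0
    out[inside] = np.exp(-1.0 / (1.0 - xi[inside] ** 2))
    return out

def fourier_coeff(t, n_grid=8001):
    """Approximate c(t) = int bump(xi) * exp(2*pi*i*t*xi) d(xi) by a Riemann sum."""
    xi = np.linspace(-1.0, 1.0, n_grid)
    dxi = xi[1] - xi[0]
    return np.sum(bump(xi) * np.exp(2j * np.pi * t * xi)) * dxi

def weighted_max(T):
    """Maximum of t^2 |c(t)| over the dyadic band t in [T, 2T]."""
    return max((t ** 2) * abs(fourier_coeff(t)) for t in range(T, 2 * T + 1))

W4, W16 = weighted_max(4), weighted_max(16)
print(W4, W16)  # the band around t ~ 16..32 is markedly smaller than t ~ 4..8
```

The same computation with higher weights $t^M$ shows the analogous behavior for every fixed $M$, mirroring the arbitrary exponent $M$ in the lemma.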
Lemma 8.5.
Consider the shearlet frame $\Psi$ of Definition 5.3 and the cartoon patch with sub-band $w_{S_j}$ defined in (34). For $j'\in\{j-1,j,j+1\}$, $j\ge0$, and for any arbitrary $N\ge2$, there exists a constant $C_N>0$ such that
$$\Big|\langle w_{S_j},\psi^{\alpha_{j'},(v)}_{j',l,k}\rangle\Big| \le C_N\,2^{(2-\alpha_{j'})j'/2}\int_{\mathbb{R}} \tilde w_{N,j}\big(2^{-\alpha_{j'}j'}(x+k_1)\big)\,\langle|x|\rangle^{-N}\,\langle|lx+lk_1-k_2|\rangle^{-N}\,dx,$$
where $\tilde w_{N,j'} := |w|\star\langle|2^{2j'}[\,\cdot\,]|\rangle^{-N}$, and $\langle|\cdot|\rangle$ is defined in (42).

Proof. Without loss of generality, we prove the claim only for $j'=j$. We first note that in $\mathrm{supp}\,\psi^{\alpha_j,(v)}_{j,l,k}$ we have $w_{S_0}\equiv w_S$, where $w_S$ is defined in (14). We now consider the line distribution $w_L$ introduced in [25] by
$$\langle w_L, f\rangle = \int_{-\rho}^{\rho} w(x_2)\,f(0,x_2)\,dx_2,\qquad f\in\mathcal{S}(\mathbb{R}^2), \quad (88)$$
with Fourier transform
$$\langle\widehat{w_L}, f\rangle = \int_{\mathbb{R}^2}\hat w(\xi_2)\,f(\xi)\,d\xi. \quad (89)$$
Next, we observe that
$$\langle w_L, f\rangle = \int_{-\rho}^{\rho} w(x_2)\,f(0,x_2)\,dx_2 = \int_{-\rho}^{\rho} w(x_2)\int_{\mathbb{R}} f(x_1,x_2)\,\delta(x_1)\,dx_1\,dx_2 = \langle w(x_2)\,\delta(x_1),\,f\rangle,$$
where $\delta$ denotes the usual Dirac delta functional. Thus,
$$w_L(x) = w(x_2)\,\delta(x_1). \quad (90)$$
In addition,
$$\langle w_S, f\rangle = \int_{-\rho}^{\rho} w(x_2)\int_{-\infty}^{0} f(x_1,x_2)\,dx_1\,dx_2 = \int_{-\rho}^{\rho} w(x_2)\int_{\mathbb{R}} f(x_1,x_2)\Big(\int_{x_1}^{+\infty}\delta(y)\,dy\Big)\,dx_1\,dx_2 = \Big\langle\int_{x_1}^{+\infty} w(x_2)\,\delta(y)\,dy,\ f\Big\rangle.$$
Hence, we obtain
$$w_S(x) = \int_{x_1}^{+\infty} w_L(y,x_2)\,dy. \quad (91)$$
Since integration commutes with convolution, we also have
$$w_{S_j} = w_S\star F_j = \int_{x_1}^{+\infty} w_{L_j}(y,x_2)\,dy,$$
where $w_{L_j} := w_L\star F_j$. In addition, by integration by parts with respect to the variable $x_1$, we obtain
$$\langle w_{S_j},\psi^{\alpha_j,(v)}_{j,l,k}\rangle = \Big\langle\int_{x_1}^{+\infty} w_{L_j}(y,x_2)\,dy,\ \psi^{\alpha_j,(v)}_{j,l,k}\Big\rangle = \Big\langle w_{L_j}(x),\ \int_{x_1}^{+\infty}\psi^{\alpha_j,(v)}_{j,l,k}(y,x_2)\,dy\Big\rangle,$$
where the boundary terms vanish due to the compact support of $\psi^{\alpha_j,(v)}_{j,l,k}$. Now we put
$$\Xi^{\alpha_j,(v)}_{j,l,k}(x) := \int_{x_1}^{+\infty}\psi^{\alpha_j,(v)}_{j,l,k}(y,x_2)\,dy,$$
and note that $\hat\Xi^{\alpha_j,(v)}_{j,l,k}(\xi) = \frac{i}{2\pi\xi_1}\,\hat\psi^{\alpha_j,(v)}_{j,l,k}(\xi)$. Similarly to Lemma 8.1, which was based on $\psi^{\alpha_j,(v)}_{j,l,k}$, we can prove a lemma for $\Xi^{\alpha_j,(v)}_{j,l,k}$, and establish the rapid decay property
$$\big|\Xi^{\alpha_j,(v)}_{j,l,k}(x)\big| \le C'_N\cdot2^{-2j}\cdot2^{(2+\alpha_j)j/2}\cdot\langle|2^{\alpha_j j}x_1-k_1|\rangle^{-N}\,\langle|2^{2j}x_2+l2^{\alpha_j j}x_1-k_2|\rangle^{-N} \le C_N\cdot2^{(\alpha_j-2)j/2}\cdot\langle|2^{\alpha_j j}x_1-k_1|\rangle^{-N}\,\langle|2^{2j}x_2+l2^{\alpha_j j}x_1-k_2|\rangle^{-N}. \quad (92)$$
Now we can use the decay estimate of the line singularity $w_{L_j}$:
$$|w_{L_j}(x)| = \big|[w_L\star F_j](x)\big| = \Big|\int_{\mathbb{R}} w(y_2)\,F_j\big(x-(0,y_2)\big)\,dy_2\Big| \le \int_{\mathbb{R}} |w(y_2)|\,2^{4j}\,\big|\check W\big(2^{2j}(x-(0,y_2))\big)\big|\,dy_2$$
$$\le \int_{\mathbb{R}} |w(y_2)|\,2^{4j}\,C_N\,\langle|2^{2j}x_1|\rangle^{-N}\,\langle|2^{2j}(y_2-x_2)|\rangle^{-N}\,dy_2 = C_N\,2^{4j}\,\langle|2^{2j}x_1|\rangle^{-N}\,\big[|w|\star\langle|2^{2j}[\,\cdot\,]|\rangle^{-N}\big](x_2) = C_N\,2^{4j}\,\langle|2^{2j}x_1|\rangle^{-N}\,\tilde w_{N,j}(x_2).$$
Combining this with the rapid decay property (92) of $\Xi^{\alpha_j,(v)}_{j,l,k}$, applying a change of variables, and integrating out one variable by Lemma 8.2, we obtain
$$\Big|\langle w_{S_j},\psi^{\alpha_j,(v)}_{j,l,k}\rangle\Big| \le C_N\int_{\mathbb{R}^2} 2^{4j}\,\langle|2^{2j}x_1|\rangle^{-N}\,\tilde w_{N,j}(x_2)\,2^{(\alpha_j-2)j/2}\,\langle|2^{\alpha_j j}x_1-k_1|\rangle^{-N}\,\langle|2^{2j}x_2+l2^{\alpha_j j}x_1-k_2|\rangle^{-N}\,dx$$
$$\le C_N\,2^{(2-\alpha_j)j/2}\int_{\mathbb{R}} \tilde w_{N,j}\big(2^{-\alpha_j j}(x+k_1)\big)\,\langle|x|\rangle^{-N}\,\langle|lx+lk_1-k_2|\rangle^{-N}\,dx. \qquad\square$$

Figure 7.
Interactions between horizontal shearlets, vertical shearlets, and the sub-image of the cartoon part $w_{S_j}$. Left: frequency support of horizontal shearlets at scale $j$ (grey) and of $w_{S_j}$ (blue). Right: frequency support of vertical shearlets with $|l|>2$ (brown) and of $w_{S_j}$ (blue).

Now we are ready to prove Proposition 8.3.

Proof of Proposition 8.3.
By the definition of $\delta_{1,j}$ and the support of the window function $F_j$, only shearlet coefficients in $\Delta^{\pm}_j$ have nonzero inner products with $w_{S_j}$, so we have
$$\delta_{1,j} = \sum_{\eta\in\Delta_1,\ \eta\notin\Lambda^{\pm}_{1,j}} \big|\langle\psi^{\alpha_{j'},(\iota)}_{j',l,k},\,w_{S_j}\rangle\big| = \sum_{\eta\in\Delta^{\pm}_j,\ \eta\notin\Lambda^{\pm}_{1,j}} \big|\langle\psi^{\alpha_{j'},(\iota)}_{j',l,k},\,w_{S_j}\rangle\big| = T_1+T_2+T_3+T_4,$$
where
$$T_1 := \sum_{\substack{k\in\mathbb{Z}^2,\ |l|\le2\\ |k_2-lk_1|>2^{\epsilon j'}}} \big|\langle\psi^{\alpha_{j'},(v)}_{j',l,k},\,w_{S_j}\rangle\big|,\qquad T_2 := \sum_{k\in\mathbb{Z}^2,\ |l|>2} \big|\langle\psi^{\alpha_{j'},(v)}_{j',l,k},\,w_{S_j}\rangle\big|,$$
$$T_3 := \sum_{k\in\mathbb{Z}^2,\ l\in\mathbb{Z}} \big|\langle\psi^{\alpha_{j'},(h)}_{j',l,k},\,w_{S_j}\rangle\big|,\qquad T_4 := \sum_{k\in\mathbb{Z}^2} \big|\langle\psi^{\alpha_{j'},(b)}_{j',\pm2^{(2-\alpha_{j'})j'},k},\,w_{S_j}\rangle\big|.$$
Without loss of generality, we restrict to the scale index $j'=j$. The respective arguments for the other cases $j'=j\pm1$ are similar.

We start with an estimation of $T_1$. By Lemma 8.5, for $N\ge2$, we obtain
$$T_1 \le C_N\,2^{(2-\alpha_j)j/2} \sum_{\substack{k\in\mathbb{Z}^2,\ |l|\le2\\ |k_2-lk_1|>2^{\epsilon j}}} \int_{\mathbb{R}} \tilde w_{N,j}\big(2^{-\alpha_j j}(x+k_1)\big)\,\langle|x|\rangle^{-N}\,\langle|lx+lk_1-k_2|\rangle^{-N}\,dx$$
$$= C_N\,2^{(2-\alpha_j)j/2} \sum_{\substack{k\in\mathbb{Z},\ |l|\le2\\ |k|>2^{\epsilon j}}} \int_{\mathbb{R}} \Big(\sum_{k_1\in\mathbb{Z}} \tilde w_{N,j}\big(2^{-\alpha_j j}(x+k_1)\big)\Big)\,\langle|x|\rangle^{-N}\,\langle|lx+k|\rangle^{-N}\,dx.$$
Furthermore, we have
$$\sum_{k_1\in\mathbb{Z}} \tilde w_{N,j}\big(2^{-\alpha_j j}(x+k_1)\big) = \sum_{k_1\in\mathbb{Z}}\int_{\mathbb{R}} |w(y)|\,\langle|2^{2j}\big(y-2^{-\alpha_j j}(x+k_1)\big)|\rangle^{-N}\,dy = \sum_{k_1\in\mathbb{Z}}\int_{\mathbb{R}} |w(y)|\,\langle|2^{(2-\alpha_j)j}\big(k_1+x-2^{\alpha_j j}y\big)|\rangle^{-N}\,dy$$
$$\overset{\alpha_j\le2}{\le} \int_{\mathbb{R}} |w(y)|\Big(\sum_{k_1\in\mathbb{Z}}\langle|k_1+x-2^{\alpha_j j}y|\rangle^{-N}\Big)\,dy \le C'\int_{\mathbb{R}} |w(y)|\Big(\int_{\mathbb{R}}\langle|t+x-2^{\alpha_j j}y|\rangle^{-N}\,dt\Big)\,dy = C'\int_{\mathbb{R}} |w(y)|\Big(\int_{\mathbb{R}}\langle|t|\rangle^{-N}\,dt\Big)\,dy \le C_N.$$
This implies
$$T_1 \le C_N\,2^{(2-\alpha_j)j/2} \sum_{\substack{k\in\mathbb{Z},\ |l|\le2\\ |k|>2^{\epsilon j}}} \int_{\mathbb{R}} \langle|x|\rangle^{-N}\,\langle|lx+k|\rangle^{-N}\,dx \overset{\text{Lemma 8.2}}{\le} C_N\,2^{(2-\alpha_j)j/2} \sum_{|k|>2^{\epsilon j}} \langle|k|\rangle^{-N} \le C_N\,2^{(2-\alpha_j)j/2} \int_{t>2^{\epsilon j}} \langle|t|\rangle^{-N}\,dt \le C_N\,2^{(2-\alpha_j)j/2}\,2^{-(N-1)\epsilon j}.$$
We now bound the decay rate of the term $T_2$. First we fix $j\ge0$ and $|l|>2$. For $N\ge2$, we obtain
$$\sum_{k\in\mathbb{Z}^2} \langle|t^{(v)}_1|\rangle^{-2}\,\langle|t^{(v)}_2|\rangle^{-2} = \sum_{k\in\mathbb{Z}^2} \langle|2^{-\alpha_j j}k_1|\rangle^{-2}\,\langle|2^{-2j}(k_2-lk_1)|\rangle^{-2} \le C'\,2^{(2+\alpha_j)j}\int_{\mathbb{R}}\int_{\mathbb{R}}\langle|x_1|\rangle^{-2}\,\langle|x_2|\rangle^{-2}\,dx_1\,dx_2 \le C\,2^{(2+\alpha_j)j}. \quad (93)$$
Now, by Lemma 8.4, we have
$$T_2 = \sum_{\substack{k\in\mathbb{Z}^2\\ 2<|l|\le2^{(2-\alpha_j)j}}} \big|\langle\psi^{\alpha_j,(v)}_{j,l,k},\,w_{S_j}\rangle\big| \le C_M \sum_{\substack{k\in\mathbb{Z}^2\\ 2<|l|\le2^{(2-\alpha_j)j}}} 2^{-(2+\alpha_j)j/2}\,\langle|t^{(v)}_1|\rangle^{-2}\,\langle|t^{(v)}_2|\rangle^{-2}\,\langle|2^{\alpha_j j}|\rangle^{-M}$$
$$\le C_M\,2^{(2-\alpha_j)j}\,2^{-(2+\alpha_j)j/2}\,\langle|2^{\alpha_j j}|\rangle^{-M} \sum_{k\in\mathbb{Z}^2}\langle|t^{(v)}_1|\rangle^{-2}\,\langle|t^{(v)}_2|\rangle^{-2} \le C'_M\,2^{(2-\alpha_j)j}\,2^{-(2+\alpha_j)j/2}\,2^{(2+\alpha_j)j}\,\langle|2^{\alpha_j j}|\rangle^{-M}.$$
Using the assumption $\liminf_{j\to\infty}\alpha_j>0$ and choosing $M$ sufficiently large, we obtain the desired decay rates. By using (ii) and (iii) of Lemma 8.4, the estimates for $T_3$ and $T_4$ are done similarly; here, note that $r^{-2}\cdot o(2^{-Nj}) = o(2^{-Nj})$ for fixed $r>0$. For an illustration of Lemmas 8.4 and 8.5 and of the main proposition, we refer to Figure 7. $\square$

Texture.
We now bound the relative sparsity error (Definition 3.3) for the texture part.
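Before the formal statement, the following small numerical sketch (ours, not from the paper; the Gaussian window and the concrete frequencies are simplifying assumptions) illustrates the mechanism used in the proof below: Gabor coefficients of a modulated window concentrate at the matching modulation index, so coefficients at frequencies away from the active set $I_T$ are negligible.

```python
import numpy as np

# Our own toy sketch: a 1-D "texture" built from a modulated window has
# Gabor coefficients that localize at the active frequency (n = 10 here);
# coefficients at distant modulations decay essentially to numerical zero.

x = np.linspace(-6.0, 6.0, 12001)
dx = x[1] - x[0]
g = np.exp(-np.pi * x ** 2)                 # Gaussian window (stand-in for g_sj)
texture = g * np.cos(2 * np.pi * 10 * x)    # "texture" with active frequency n = 10

def gabor_coeff(n):
    """<texture, g * exp(2*pi*i*n*x)>, i.e. the coefficient of the atom at m = 0."""
    atom = g * np.exp(2j * np.pi * n * x)
    return np.sum(texture * np.conj(atom)) * dx

on_band = abs(gabor_coeff(10))    # ~ 0.3536 = 1/(2*sqrt(2)), half the window energy
off_band = abs(gabor_coeff(16))   # leakage at a distant frequency: tiny
print(on_band, off_band)
```

The proof below makes this quantitative via the support condition on $\hat g$ and the rapid decay estimates of Lemmas 8.1 and 8.2.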
Proposition 8.6.
Consider the Gabor frame of scale $j$, $\mathcal{G}_j$, defined in (33), and the texture defined in Definition 4.1. Then, for every $N\in\mathbb{N}$, the sequence $(\delta_{2,j})_{j\in\mathbb{N}}$ decays rapidly in the sense that
$$\delta_{2,j} := \sum_{\lambda\notin B(0,M_j)\times I^{\pm}_T} \big|\langle\mathcal{T}_{s,j},\,(g_{s_j})_\lambda\rangle\big| = o(2^{-Nj}). \quad (94)$$

Proof.
First, note that
$$\hat{\mathcal{T}}_s(\xi) = \int_{\mathbb{R}^2}\sum_{n\in I_T} d_n\,g_{s_j}(x)\,e^{-2\pi i\langle\xi-s_jn,\,x\rangle}\,dx = \sum_{n\in I_T} d_n\,\hat g_{s_j}(\xi-s_jn).$$
Using the support condition on $\hat g$ and denoting $\lambda=(\tilde m,\tilde n)$, we obtain
$$\big|\langle\mathcal{T}_{s,j},(g_{s_j})_\lambda\rangle\big| = \big|\langle\hat{\mathcal{T}}_{s,j},(\hat g_{s_j})_\lambda\rangle\big| = \Big|\sum_{n\in I_T} d_n\int W_j(\xi)\,\hat g_{s_j}(\xi-s_jn)\,\overline{\hat g_{s_j}(\xi-s_j\tilde n)}\,e^{-2\pi i\langle\tilde m,\xi\rangle/s_j}\,d\xi\Big|$$
$$= \Big|\sum_{\substack{n\in I_T\\ |n-\tilde n|\le2}} d_n\int W_j(\xi)\,\hat g_{s_j}(\xi-s_jn)\,\overline{\hat g_{s_j}(\xi-s_j\tilde n)}\,e^{-2\pi i\langle\tilde m,\xi\rangle/s_j}\,d\xi\Big|.$$
Indeed, $|\langle\mathcal{T}_{s,j},(g_{s_j})_\lambda\rangle| = |\langle\hat{\mathcal{T}}_{s,j},(\hat g_{s_j})_\lambda\rangle| = 0$ for $\tilde n\notin I^{\pm}_T\cap\mathcal{A}_{s,j}$. Moreover, we have
$$\big\{(\tilde m,\tilde n)\notin B(0,M_j)\times I^{\pm}_T\big\} = \big\{(\tilde m,\tilde n)\in\mathbb{Z}^2\times\mathbb{Z}^2 : |\tilde m|>M_j\big\}\cup\big\{(\tilde m,\tilde n)\in\mathbb{Z}^2\times\mathbb{Z}^2 : \tilde n\notin I^{\pm}_T\big\}.$$
Thus, since the Gabor coefficients of $\mathcal{T}_{s,j}$ are zero for frequencies outside $I^{\pm}_T$,
$$\delta_{2,j} = \sum_{\lambda\notin B(0,M_j)\times I^{\pm}_T} \big|\langle\mathcal{T}_{s,j},(g_{s_j})_\lambda\rangle\big| = \sum_{\substack{|\tilde m|\ge M_j\\ \tilde n\in I^{\pm}_T\cap\mathcal{A}_{s,j}}} \Big|\int_{\mathbb{R}^2}\sum_{\substack{n\in I_T\\ |n-\tilde n|\le2}} d_n\,(g_{s_j})_{(0,n)}(x)\,\overline{(g_{s_j})_{(\tilde m,\tilde n)}(x)}\,dx\Big| \le C\sum_{\substack{|\tilde m|\ge M_j\\ \tilde n\in I^{\pm}_T\cap\mathcal{A}_{s,j}}} |\bar d_{\tilde n}|\int_{\mathbb{R}^2} \big|(g_{s_j})_{(0,\tilde n)}(x)\big|\,\big|(g_{s_j})_{(\tilde m,\tilde n)}(x)\big|\,dx,$$
where $\bar d_{\tilde n} := \max\{|d_n| : n\in I_T,\ |n-\tilde n|\le2\}$. Note that the last inequality holds since for each $\tilde n$ there exists only a finite number of $n$ satisfying $|n-\tilde n|\le2$. Moreover, by (56), there are $o(2^{(2-\alpha_j)j/2})$ points of $I^{\pm}_T$ in the support of each shearlet and there are $o(2^{(2-\alpha_j)j})$ shearlets in $\mathcal{A}_j$, so
$$\#\{n\in I^{\pm}_T\cap\mathcal{A}_{s,j}\} \le \#\{n\in I_T\cap\mathcal{A}_{s,j}\} = o(2^{(2-\alpha_j)j/2})\cdot o(2^{(2-\alpha_j)j}) = o(2^{3(2-\alpha_j)j/2}).$$
Since
$$\sum_{n\in I_T\cap\mathcal{A}_{s,j}} |d_n|^2 \le c\,2^{-2j},\quad\text{which implies}\quad |d_n|\le\sqrt{c}\ \ \forall n\in I_T\cap\mathcal{A}_{s,j},$$
we have
$$\delta_{2,j} \le C\cdot o(2^{3(2-\alpha_j)j/2})\sum_{|\tilde m|>M_j}\int_{\mathbb{R}^2}\big|(g_{s_j})_{(0,\tilde n)}(x)\big|\,\big|(g_{s_j})_{(\tilde m,\tilde n)}(x)\big|\,dx$$
$$\overset{\text{Lemma 8.1}}{\le} C_N\cdot o(2^{3(2-\alpha_j)j/2})\int_{\mathbb{R}^2}\sum_{|\tilde m|>M_j} s_j^2\,\langle|s_jx_1|\rangle^{-N}\,\langle|s_jx_2|\rangle^{-N}\,\langle|s_jx_1+\tilde m_1|\rangle^{-N}\,\langle|s_jx_2+\tilde m_2|\rangle^{-N}\,dx.$$
By Lemma 8.2, we now obtain
$$\delta_{2,j} \le C_N\cdot o(2^{3(2-\alpha_j)j/2})\sum_{|\tilde m|>M_j}\langle|\tilde m_1|\rangle^{-N}\,\langle|\tilde m_2|\rangle^{-N} \le C_N\cdot o(2^{3(2-\alpha_j)j/2})\int_{\mathbb{R}}\int_{M_j}^{\infty}\langle|t_1|\rangle^{-N}\,\langle|t_2|\rangle^{-N}\,dt_1\,dt_2 \overset{M_j=2^{\epsilon j/2}}{\le} C_N\cdot o(2^{3(2-\alpha_j)j/2})\cdot2^{-(N-1)\epsilon j/2}.$$
This proves the claim, since we can choose an arbitrarily large $N\in\mathbb{N}$. $\square$

9. Cluster coherence for cartoon and texture
In the previous section we proved relative sparsity for the texture and cartoon parts. For the success of separation and inpainting, by Theorem 3.8, we also need the cluster coherence terms to sum to less than $1/2$. To guarantee this asymptotically, we prove that the cluster coherence terms decay to zero when the scale goes to infinity.

9.1. Cluster coherence of the un-projected frames. We now analyze the cluster coherences of the un-projected frames, $\mu_c(\Lambda^{\pm}_{1,j},\Psi;\mathcal{G}_j)$ and $\mu_c(\Lambda_{2,j},\mathcal{G}_j;\Psi)$.

Proposition 9.1.
Consider the shearlet frame $\Psi$ of Definition 5.3 and the Gabor frame of scale $j$, $\mathcal{G}_j$, defined in (33). We have
$$\mu_c(\Lambda^{\pm}_{1,j},\Psi;\mathcal{G}_j) \to 0,\qquad j\to\infty,$$
where $\Lambda^{\pm}_{1,j}$ is defined in (61).

Proof. Let us recall the definition of the cluster of shearlets
$$\Lambda_{1,j} = \big\{(j,l,k;\alpha_j,(v)) : |l|\le2,\ k\in\mathbb{Z}^2,\ |k_2-lk_1|\le2^{\epsilon j}\big\},$$
and $\Lambda^{\pm}_{1,j} = \Lambda_{1,j-1}\cup\Lambda_{1,j}\cup\Lambda_{1,j+1}$ as defined in (61). Without loss of generality, we prove only $\mu_c(\Lambda_{1,j},\Psi;\mathcal{G}_j)\to0$ as $j\to\infty$, and note that summing over $j'=j-1,j,j+1$ does not change the asymptotics. By the definition of the cluster coherence, we have
$$\mu_c(\Lambda_{1,j},\Psi;\mathcal{G}_j) = \max_{(m,n)}\sum_{\eta\in\Lambda_{1,j}}\big|\langle\psi^{\alpha_j,(\iota)}_{j,l,k},\,(g_{s_j})_{m,n}\rangle\big|,$$
where $\eta=(j,l,k;\alpha_j,(\iota))$. Suppose that the maximum is attained for $(\bar m,\bar n)\in\mathbb{Z}^2\times\mathbb{Z}^2$ (the maximum exists, since both frames are translation invariant and the shearlet elements at scale $j$ are compactly supported in frequency). By Lemma 8.1 and the change of variables $(y_1,y_2) = \big(s_jx_1,\ 2^{2j}x_2+l2^{\alpha_j j}x_1-lk_2\big)$, we get
$$\mu_c(\Lambda_{1,j},\Psi;\mathcal{G}_j) = \sum_{\eta\in\Lambda_{1,j}}\big|\langle\psi^{\alpha_j,(\iota)}_{j,l,k},\,(g_{s_j})_{\bar m,\bar n}\rangle\big|$$
$$\le \sum_{\substack{|l|\le2,\,k\in\mathbb{Z}^2\\ |k_2-lk_1|\le2^{\epsilon j}}}\int_{\mathbb{R}^2} C_N\,s_j\,2^{(2+\alpha_j)j/2}\,\langle|2^{\alpha_j j}x_1-k_1|\rangle^{-N}\,\langle|2^{2j}x_2+l2^{\alpha_j j}x_1-k_2|\rangle^{-N}\,\langle|s_jx_1+\bar m_1|\rangle^{-N}\,\langle|s_jx_2+\bar m_2|\rangle^{-N}\,dx$$
$$\le \sum_{\substack{|l|\le2,\,k\in\mathbb{Z}^2\\ |k|\le2^{\epsilon j}}}\int_{\mathbb{R}^2} C_N\,2^{-(2-\alpha_j)j/2}\,\langle|2^{\alpha_j j}s_j^{-1}y_1-k_1|\rangle^{-N}\,\langle|y_2-k|\rangle^{-N}\,\langle|y_1+\bar m_1|\rangle^{-N}\,dy_1\,dy_2.$$
Summing over $k_1\in\mathbb{Z}$ (the sum $\sum_{k_1}\langle|2^{\alpha_j j}s_j^{-1}y_1-k_1|\rangle^{-N}$ is uniformly bounded) and integrating the remaining factors, we arrive at
$$\mu_c(\Lambda_{1,j},\Psi;\mathcal{G}_j) \le C_N\cdot2^{-(2-\alpha_j)j/2}\cdot2^{\epsilon j} = C_N\,2^{-(2-\alpha_j-2\epsilon)j/2} \xrightarrow{\ j\to+\infty\ } 0,$$
where $2-\alpha_j-2\epsilon>0$ at each scale $j$ by (60). $\square$

Next, we prove the following cluster coherence decay.
Proposition 9.2.
Consider the shearlet frame $\Psi$ of Definition 5.3 and the Gabor frame of scale $j$, $\mathcal{G}_j$, defined in (33). We have
$$\mu_c(\Lambda_{2,j},\mathcal{G}_j;\Psi) \to 0,\qquad j\to\infty,$$
where $\Lambda_{2,j}$ is defined in (58).

Proof. Suppose that the maximum in the cluster coherence is attained at $\bar l\in\mathbb{Z}$, $\bar k\in\mathbb{Z}^2$, and $(\iota)\in\{(h),(v),(b)\}$ (there is a maximum since the systems are translation invariant and the inner products go to zero as $n$ goes to infinity). For each $N=1,2,\ldots$, using (56) and Lemma 8.1(iii), we have
$$\mu_c(\Lambda_{2,j},\mathcal{G}_j;\Psi) = \sum_{m\in B(0,M_j),\,n\in I^{\pm}_T}\big|\langle(g_{s_j})_{m,n},\,\psi^{\alpha_j,(\iota)}_{j,\bar l,\bar k}\rangle\big| \le C_N\cdot2^{-(2-\alpha_j)j/2}\cdot\#\{m\in B(0,M_j)\}\cdot\#\{n\in I^{\pm}_T\cap M_{j,l,(\iota)}\}$$
$$\le C_N\cdot2^{-(2-\alpha_j)j/2}\cdot M_j\cdot2^{(2-\alpha_j-2\epsilon)j/2} \overset{M_j=2^{\epsilon j/2}}{=} C_N\cdot2^{-\epsilon j/2} \xrightarrow{\ j\to+\infty\ } 0. \qquad\square$$

9.2. Cluster coherence of the projected frames.
In this subsection we compute the cluster coherence terms corresponding to the missing stripe. To be able to inpaint the missing part, the following propositions rely on the fact that the gap size $h_j$ at each scale $j$ is smaller than the essential length of the shearlet elements at this scale. For an illustration, see Figure 8.

In [25], the problem of inpainting a missing strip from a line singularity via shearlet frames was studied. As part of their construction, they prove the following proposition.

Proposition 9.3. ([25], Prop. 5.6) Consider the shearlet frame $\Psi$ of Definition 5.3. For $h_j = o(2^{(\alpha_j+\epsilon)j})$ with $\alpha_j\in(0,2)$, $\liminf\alpha_j>0$, and $\epsilon$ satisfying (60), we have
$$\mu_c(\Lambda^{\pm}_{1,j},P_j\Psi;\Psi) \to 0,\qquad j\to\infty,$$
where $\Lambda^{\pm}_{1,j}$ is defined in (61) and $P_j$ is defined in (32).

Figure 8.
Left: frequency support of a shearlet (green) and of Gabor elements (small grey squares). Right: missing part (light brown), essential spatial support of a shearlet (green), and a cluster of Gabor elements (grey). The gap width does not exceed the essential length of the shearlet.

Next, we prove the following result.
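Before turning to it, the geometry of Figure 8 can be probed numerically. The 1-D toy sketch below is our own illustration (the Gaussian window, the grid, and the cluster ranges are arbitrary choices, not the paper's frames): masking a strip of width $h$ plays the role of the projection $P_j$, and the masked coherence sum shrinks as the gap narrows relative to the essential support of the atoms — the mechanism behind Propositions 9.4–9.6.

```python
import numpy as np

# Our own toy sketch: sum of |<P g_{m,n}, g_{0,0}>| over a small cluster of
# 1-D Gaussian Gabor atoms, where P restricts to a "missing strip" |x| <= h/2.
# A narrow gap (h small) yields a much smaller coherence sum than a gap
# comparable to the atoms' essential support.

x = np.linspace(-8.0, 8.0, 4001)
dx = x[1] - x[0]

def atom(m, n):
    """Gaussian Gabor atom with integer translation m and modulation n (toy grid)."""
    return np.exp(-np.pi * (x - m) ** 2) * np.exp(2j * np.pi * n * x)

def masked_coherence(h):
    mask = np.abs(x) <= h / 2.0            # P: restriction to the missing strip
    ref = atom(0, 0)
    total = 0.0
    for m in range(-3, 4):
        for n in range(-3, 4):
            total += abs(np.sum(mask * atom(m, n) * np.conj(ref)) * dx)
    return total

small, large = masked_coherence(0.05), masked_coherence(1.6)
print(small, large)  # the coherence sum shrinks with the gap width
```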
Proposition 9.4.
Consider the Gabor frame of scale $j$, $\mathcal{G}_j$, defined in (33). Suppose that $h_j = o(2^{-\alpha_j j})$ with $\alpha_j\in(0,2)$, $\liminf\alpha_j>0$, and that $I_T$ satisfies (56) and (57). Then we have
$$\mu_c(\Lambda_{2,j},P_j\mathcal{G}_j;\mathcal{G}_j) \to 0,\qquad j\to\infty,$$
where $\Lambda_{2,j}$ is defined in (58) and $P_j$ is defined in (32). For an illustration of this proposition, we refer to Figure 8 (grey part).
Proof.
First, from the definition of the cluster, we may assume that the maximum is attained for $(m',n')\in\Lambda_{2,j}$. Namely, we have
$$\mu_c(\Lambda_{2,j},P_j\mathcal{G}_j;\mathcal{G}_j) = \sum_{(m,n)\in M_j\times(I^{\pm}_T\cap\mathcal{A}_{s,j})}\big|\langle P_j(g_{s_j})_{m,n},\,(g_{s_j})_{m',n'}\rangle\big| = \sum_{(m,n)\in M_j\times(I^{\pm}_T\cap\mathcal{A}_{s,j})}\Big|\int_{-h_j/2}^{h_j/2}\int_{\mathbb{R}}(g_{s_j})_{m,n}(x)\,\overline{(g_{s_j})_{m',n'}(x)}\,dx\Big|$$
$$\le \sum_{(m,n)\in M_j\times(I^{\pm}_T\cap\mathcal{A}_{s,j})}\int_{-h_j/2}^{h_j/2}\int_{\mathbb{R}}\big|(g_{s_j})_{m,n}(x)\big|\,\big|(g_{s_j})_{m',n'}(x)\big|\,dx. \quad (95)$$
By Lemma 8.1, we obtain the following decay estimate of $(g_{s_j})_{m,n}(x)$ for any $N\in\mathbb{N}$:
$$\big|(g_{s_j})_{m,n}(x)\big| \le C_N\cdot s_j\cdot\langle|s_jx_1+m_1|\rangle^{-N}\,\langle|s_jx_2+m_2|\rangle^{-N}.$$
Substituting $y = s_jx$, we have
$$\int_{-h_j/2}^{h_j/2}\int_{\mathbb{R}}\big|(g_{s_j})_{m,n}(x)\big|\,\big|(g_{s_j})_{m',n'}(x)\big|\,dx \le \int_{-h_js_j/2}^{h_js_j/2}\int_{\mathbb{R}} C_N\,\langle|y_1+m_1|\rangle^{-N}\,\langle|y_2+m_2|\rangle^{-N}\,\langle|y_1+m'_1|\rangle^{-N}\,\langle|y_2+m'_2|\rangle^{-N}\,dy. \quad (96)$$
Next,
$$\sum_{m\in\mathbb{Z}^2}\langle|y_1+m_1|\rangle^{-N}\,\langle|y_2+m_2|\rangle^{-N} \le C'_N \quad\text{and}\quad \int_{\mathbb{R}}\langle|y_2+m'_2|\rangle^{-N}\,dy_2 \le C''_N. \quad (97)$$
Thus, by (57) and (95) combined with (96) and (97), we obtain
$$\mu_c(\Lambda_{2,j},P_j\mathcal{G}_j;\mathcal{G}_j) \le \sum_{n\in I^{\pm}_T\cap\mathcal{A}_{s,j}} C_N\,h_js_j \le C_N\cdot\frac{2^{\alpha_j j}}{s_j}\cdot h_js_j = C_N\cdot2^{\alpha_j j}\,h_j \xrightarrow{\ j\to+\infty\ } 0. \qquad\square$$

Proposition 9.5.
Consider the shearlet frame $\Psi$ of Definition 5.3 and the Gabor frame of scale $j$, $\mathcal{G}_j$, defined in (33). Assuming that $I_T$ satisfies (56) and (57), we have
$$\mu_c(\Lambda_{2,j},P_j\mathcal{G}_j;\Psi) \to 0,\qquad j\to\infty,$$
where $\Lambda_{2,j}$ is defined in (58) and $P_j$ is defined in (32).

Proof. The maximum of the cluster coherence is attained for some $(j,\bar l,\bar k;\alpha_j,(\iota))\in\Delta_j$. Thus,
$$\mu_c(\Lambda_{2,j},P_j\mathcal{G}_j;\Psi) = \sum_{(m,n)\in M_j\times(I^{\pm}_T\cap\mathcal{A}_{s,j})}\big|\langle P_j(g_{s_j})_{m,n},\,\psi^{\alpha_j,(\iota)}_{j,\bar l,\bar k}\rangle\big| \le \sum_{(m,n)\in M_j\times(I^{\pm}_T\cap\mathcal{A}_{s,j})}\int_{-h_j/2}^{h_j/2}\int_{\mathbb{R}}\big|(g_{s_j})_{m,n}(x)\big|\,\big|\psi^{\alpha_j,(\iota)}_{j,\bar l,\bar k}(x)\big|\,dx.$$
Now we consider three cases.

1) Case $(\iota)=(v)$. For each $N=1,2,\ldots$, by Lemma 8.1, we derive the following decay estimates of $(g_{s_j})_{m,n}(x)$ and $\psi^{\alpha_j,(v)}_{j,\bar l,\bar k}(x)$:
$$\big|(g_{s_j})_{m,n}(x)\big| \le C_N\cdot s_j\cdot\langle|s_jx_1+m_1|\rangle^{-N}\,\langle|s_jx_2+m_2|\rangle^{-N},$$
$$\big|\psi^{\alpha_j,(v)}_{j,\bar l,\bar k}(x)\big| \le C_N\cdot2^{(2+\alpha_j)j/2}\cdot\langle|2^{\alpha_j j}x_1-\bar k_1|\rangle^{-N}\,\langle|2^{2j}x_2+\bar l2^{\alpha_j j}x_1-\bar k_2|\rangle^{-N}.$$
By the change of variables $y = S^{(v)}_{\bar l}A^{(v)}_{j,\alpha_j}x = (2^{\alpha_j j}x_1,\ 2^{2j}x_2+\bar l2^{\alpha_j j}x_1)$, we have
$$\int_{-h_j/2}^{h_j/2}\int_{\mathbb{R}}\big|(g_{s_j})_{m,n}(x)\big|\,\big|\psi^{\alpha_j,(v)}_{j,\bar l,\bar k}(x)\big|\,dx \le C_N\cdot2^{-(2+\alpha_j)j/2}\cdot s_j\int_{-2^{\alpha_j j}h_j/2}^{2^{\alpha_j j}h_j/2}\int_{\mathbb{R}}\langle|s_j2^{-\alpha_j j}y_1+m_1|\rangle^{-N}\,\langle|s_j2^{-2j}(y_2-\bar ly_1)+m_2|\rangle^{-N}\,\langle|y_1-\bar k_1|\rangle^{-N}\,\langle|y_2-\bar k_2|\rangle^{-N}\,dy_1\,dy_2$$
$$\le C_N\cdot2^{-(2+\alpha_j)j/2}\cdot s_j\int_{\mathbb{R}}\int_{\mathbb{R}}\langle|s_j2^{-\alpha_j j}y_1+m_1|\rangle^{-N}\,\langle|s_j2^{-2j}(y_2-\bar ly_1)+m_2|\rangle^{-N}\,\langle|y_1-\bar k_1|\rangle^{-N}\,\langle|y_2-\bar k_2|\rangle^{-N}\,dy_1\,dy_2. \quad (98)$$
Furthermore,
$$\sum_{m\in\mathbb{Z}^2}\langle|s_j2^{-\alpha_j j}y_1+m_1|\rangle^{-N}\,\langle|s_j2^{-2j}(y_2-\bar ly_1)+m_2|\rangle^{-N} \le C'_N \quad (99)$$
and
$$\int_{\mathbb{R}}\langle|y_1-\bar k_1|\rangle^{-N}\,dy_1 \le C''_N,\qquad \int_{\mathbb{R}}\langle|y_2-\bar k_2|\rangle^{-N}\,dy_2 \le C'''_N. \quad (100)$$
Thus, by (98), (99), (100), and (57), we obtain
$$\mu_c(\Lambda_{2,j},P_j\mathcal{G}_j;\Psi) \le \sum_{n\in I^{\pm}_T\cap\mathcal{A}_{s,j}} C_N\cdot2^{-(2+\alpha_j)j/2}\cdot s_j \le C_N\cdot2^{-(2+\alpha_j)j/2}\cdot s_j\cdot\frac{2^{\alpha_j j}}{s_j} = C_N\cdot2^{-(2-\alpha_j)j/2} \xrightarrow{\ j\to+\infty\ } 0.$$
2) Case $(\iota)=(h)$. Similarly, by the change of variables $y = S^{(h)}_{\bar l}A^{(h)}_{j,\alpha_j}x = (2^{2j}x_1+\bar l2^{\alpha_j j}x_2,\ 2^{\alpha_j j}x_2)$, we have
$$2^{(2+\alpha_j)j/2}\,s_j^{-1}\int_{-h_j/2}^{h_j/2}\int_{\mathbb{R}}\big|(g_{s_j})_{m,n}(x)\big|\,\big|\psi^{\alpha_j,(h)}_{j,\bar l,\bar k}(x)\big|\,dx \le C_N\int_{\mathbb{R}^2}\langle|s_j2^{-2j}(y_1-\bar ly_2)+m_1|\rangle^{-N}\,\langle|s_j2^{-\alpha_j j}y_2+m_2|\rangle^{-N}\,\langle|y_1-\bar k_1|\rangle^{-N}\,\langle|y_2-\bar k_2|\rangle^{-N}\,dy.$$
Using a similar argument as in the case $(\iota)=(v)$, we finally obtain
$$\mu_c(\Lambda_{2,j},P_j\mathcal{G}_j;\Psi) \le \sum_{n\in I^{\pm}_T\cap\mathcal{A}_{s,j}} C_N\cdot2^{-(2+\alpha_j)j/2}\cdot s_j \overset{(57)}{\le} C_N\cdot2^{-(2-\alpha_j)j/2} \xrightarrow{\ j\to+\infty\ } 0.$$
3) Case $(\iota)=(b)$. Recalling the definition of the boundary shearlets, the decay estimate can be derived similarly. $\square$
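All propositions in this section control sums of the same shape, $\mu_c(\Lambda,\Phi_1;\Phi_2) = \max_i \sum_{\lambda\in\Lambda}|\langle\varphi_{1,\lambda},\varphi_{2,i}\rangle|$. As a self-contained toy computation of this quantity (our own illustration; the spike and DFT bases merely stand in for the shearlet and Gabor systems), consider the identity basis against the unitary DFT basis in $\mathbb{C}^n$, where every inner product has modulus exactly $1/\sqrt{n}$:

```python
import numpy as np

# Our own toy computation of cluster coherence: spikes vs. the unitary DFT
# basis in C^n. A cluster Lambda of S spikes gives mu_c = S / sqrt(n), so
# small clusters in high dimension have small cluster coherence.

n, S = 256, 8
dft = np.fft.fft(np.eye(n)) / np.sqrt(n)    # rows: orthonormal Fourier vectors
spikes = np.eye(n)
cluster = list(range(S))                     # cluster Lambda: the first S spikes

gram = spikes @ dft.conj().T                 # Gram matrix <e_k, f_m>
mu_c = np.max(np.sum(np.abs(gram[cluster, :]), axis=0))
print(mu_c)  # equals S / sqrt(n) = 0.5 here
```

In the paper's setting, the analogous smallness comes not from unitarity but from the rapid off-diagonal decay established in Lemmas 8.1 and 8.2.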
Proposition 9.6.
Consider the shearlet frame $\Psi$ of Definition 5.3 and the Gabor frame of scale $j$, $\mathcal{G}_j$, defined in (33). Suppose that $s_j\le2^{\alpha_j j}$, $h_j = o(2^{-\alpha_j j})$, and $\alpha_j\in(0,2)$ with $\liminf\alpha_j>0$. Then we have
$$\mu_c(\Lambda^{\pm}_{1,j},P_j\Psi;\mathcal{G}_j) \to 0,\qquad j\to\infty,$$
where $\Lambda^{\pm}_{1,j}$ is defined in (61) and $P_j$ is defined in (32).

Proof.
Without loss of generality, we prove only $\mu_c(\Lambda_{1,j},P_j\Psi;\mathcal{G}_j)\to0$ as $j\to\infty$. First, we assume that the maximum in the definition of the cluster coherence is attained for some $(\bar m,\bar n)\in\mathbb{Z}^2\times\mathbb{Z}^2$. We have
$$\mu_c(\Lambda_{1,j},P_j\Psi;\mathcal{G}_j) = \sum_{(j,l,k;\alpha_j,\iota)\in\Lambda_{1,j}}\big|\langle P_j\psi^{\alpha_j,(\iota)}_{j,l,k},\,(g_{s_j})_{\bar m,\bar n}\rangle\big| = \sum_{\substack{|l|\le2,\,k\in\mathbb{Z}^2\\ |k_2-lk_1|\le2^{\epsilon j}}}\Big|\int_{-h_j/2}^{h_j/2}\int_{\mathbb{R}}\psi^{\alpha_j,(v)}_{j,l,k}(x)\,\overline{(g_{s_j})_{\bar m,\bar n}(x)}\,dx\Big|$$
$$\le \sum_{\substack{|l|\le2,\,k\in\mathbb{Z}^2\\ |k_2-lk_1|\le2^{\epsilon j}}}\int_{-h_j/2}^{h_j/2}\int_{\mathbb{R}}\big|\psi^{\alpha_j,(v)}_{j,l,k}(x)\big|\,\big|(g_{s_j})_{\bar m,\bar n}(x)\big|\,dx.$$
For each $N=1,2,\ldots$, by Lemma 8.1, we derive the following decay estimates of $\psi^{\alpha_j,(v)}_{j,l,k}(x)$ and $(g_{s_j})_{\bar m,\bar n}(x)$:
$$\big|\psi^{\alpha_j,(v)}_{j,l,k}(x)\big| \le C_N\cdot2^{(2+\alpha_j)j/2}\cdot\langle|2^{\alpha_j j}x_1-k_1|\rangle^{-N}\,\langle|2^{2j}x_2+l2^{\alpha_j j}x_1-k_2|\rangle^{-N},$$
$$\big|(g_{s_j})_{\bar m,\bar n}(x)\big| \le C_N\cdot s_j\cdot\langle|s_jx_1+\bar m_1|\rangle^{-N}\,\langle|s_jx_2+\bar m_2|\rangle^{-N}.$$
Thus, by the change of variables $y = S^{(v)}_{l}A^{(v)}_{j,\alpha_j}x = (2^{\alpha_j j}x_1,\ 2^{2j}x_2+l2^{\alpha_j j}x_1)$, we have
$$\int_{-h_j/2}^{h_j/2}\int_{\mathbb{R}}\big|\psi^{\alpha_j,(v)}_{j,l,k}(x)\big|\,\big|(g_{s_j})_{\bar m,\bar n}(x)\big|\,dx \le C_N\cdot2^{-(2+\alpha_j)j/2}\cdot s_j\int_{-2^{\alpha_j j}h_j/2}^{2^{\alpha_j j}h_j/2}\int_{\mathbb{R}}\langle|y_1-k_1|\rangle^{-N}\,\langle|y_2-k_2|\rangle^{-N}\,\langle|s_j2^{-\alpha_j j}y_1+\bar m_1|\rangle^{-N}\,\langle|s_j2^{-2j}(y_2-ly_1)+\bar m_2|\rangle^{-N}\,dy_1\,dy_2. \quad (101)$$
Next,
$$\langle|s_j2^{-\alpha_j j}y_1+\bar m_1|\rangle^{-N}\,\langle|s_j2^{-2j}(y_2-ly_1)+\bar m_2|\rangle^{-N} \le 1 \quad (102)$$
and
$$\int_{\mathbb{R}}\Big(\sum_{\substack{|l|\le2,\,k\in\mathbb{Z}^2\\ |k_2-lk_1|\le2^{\epsilon j}}}\langle|y_1-k_1|\rangle^{-N}\,\langle|y_2-k_2|\rangle^{-N}\Big)\,dy_2 = \sum_{\substack{|l|\le2,\,k_1,k_2\in\mathbb{Z}\\ |k_2-lk_1|\le2^{\epsilon j}}}\langle|y_1-k_1|\rangle^{-N}\Big(\int_{\mathbb{R}}\langle|y_2-k_2|\rangle^{-N}\,dy_2\Big) \le C_N\cdot2^{\epsilon j}\int_{\mathbb{R}}\langle|t-y_1|\rangle^{-N}\,dt \le C'_N\,2^{\epsilon j}. \quad (103)$$
Thus, by (101), (102), and (103), we obtain
$$\mu_c(\Lambda_{1,j},P_j\Psi;\mathcal{G}_j) \le C'_N\cdot2^{-(2+\alpha_j)j/2}\cdot s_j\cdot2^{\alpha_j j}h_j\cdot2^{\epsilon j} \overset{s_j\le2^{\alpha_j j}}{\le} C'_N\cdot2^{-(2-\alpha_j-2\epsilon)j/2}\cdot2^{\alpha_j j}h_j \xrightarrow{\ j\to+\infty\ } 0. \qquad\square$$

Proof of Theorem 6.4
We are now ready to present the proof of Theorem 6.4.
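The proof below rests on the estimate (104). As plain arithmetic, one can watch the bound collapse once the relative sparsity error decays rapidly and the coherence stays below $1/2$ and tends to $0$; the rates in this sketch are hypothetical placeholders of ours, not quantities derived in the paper.

```python
# Our own numerical illustration of the driving estimate (104):
#   error_j <= 2*delta_j / (1 - 2*mu_cj).
# The decay rates below are hypothetical placeholders chosen only to show
# how the right-hand side behaves as the scale j grows.

def error_bound(j, N=3, eps=0.25):
    delta_j = 2.0 ** (-N * j)           # stand-in for delta_{1,j} + delta_{2,j}
    mu_cj = 0.4 * 2.0 ** (-eps * j)     # stand-in coherence, < 1/2 for all j >= 1
    return 2.0 * delta_j / (1.0 - 2.0 * mu_cj)

bounds = [error_bound(j) for j in range(1, 11)]
print(bounds)  # strictly decreasing toward 0
```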
Proof.
By Theorem 3.8, we have
$$\|P_jC^*_j - P_jw_{S_j}\|_2 + \|P_jT^*_j - P_jT_{s,j}\|_2 \le \frac{2\delta_j}{1-2\mu_{c,j}}, \quad (104)$$
where $\delta_j = \delta_{1,j}+\delta_{2,j}$ and
$$\mu_{c,j} = \max\big\{\mu_c(\Lambda^{\pm}_{1,j},P_j\Psi;\Psi)+\mu_c(\Lambda_{2,j},P_j\mathcal{G}_j;\Psi),\ \mu_c(\Lambda_{2,j},P_j\mathcal{G}_j;\mathcal{G}_j)+\mu_c(\Lambda^{\pm}_{1,j},P_j\Psi;\mathcal{G}_j)\big\} + \max\big\{\mu_c(\Lambda^{\pm}_{1,j},\Psi;\mathcal{G}_j),\ \mu_c(\Lambda_{2,j},\mathcal{G}_j;\Psi)\big\}.$$
Moreover, it follows from Propositions 8.3 and 8.6 that
$$\delta_j = \delta_{1,j}+\delta_{2,j} = o(2^{-Nj}),\qquad \forall N\in\mathbb{N}. \quad (105)$$
For the other term, the following estimate holds as a consequence of Propositions 9.1, 9.2, 9.3, 9.4, 9.5, and 9.6:
$$\mu_{c,j} \to 0,\qquad j\to+\infty. \quad (106)$$
Combining (104), (105), and (106), we get
$$\|C^*_j - w_{S_j}\|_2 + \|T^*_j - T_{s,j}\|_2 = o(2^{-Nj}),\qquad \forall N\in\mathbb{N}.$$
This proves the first claim of the theorem.

The second claim is obtained since the following estimate holds for any arbitrarily large number $N\in\mathbb{N}$:
$$\|P_jC^*_j - P_jw_{S_j}\|_2 + \|P_jT^*_j - P_jT_{s,j}\|_2 \le \|C^*_j - w_{S_j}\|_2 + \|T^*_j - T_{s,j}\|_2 \le 2^{-Nj}. \quad (107)$$
By Lemma 6.3 and (107), for $N\in\mathbb{N}$ chosen sufficiently large, we obtain
$$\frac{\|P_jC^*_j - P_jw_{S_j}\|_2}{\|P_jw_{S_j}\|_2} \le \frac{o(2^{-Nj})}{h_j2^{-2j}} \xrightarrow{\ j\to+\infty\ } 0 \qquad\text{and}\qquad \frac{\|P_jT^*_j - P_jT_{s,j}\|_2}{h_j2^{-2j}} = \frac{o(2^{-Nj})}{h_j2^{-2j}} \xrightarrow{\ j\to+\infty\ } 0,$$
which concludes the proof. $\square$

References

[1] Ole Christensen,
An Introduction to Frames and Riesz Bases, Applied and Numerical Harmonic Analysis, Birkhäuser Boston Inc., Boston, MA, 2003.
[2] M. Davenport, M. Duarte, Y. Eldar, and G. Kutyniok, Compressed Sensing: Theory and Applications, Cambridge University Press, 2012.
[3] J.-L. Starck, M. Elad, and D. L. Donoho, Image decomposition: separation of texture from piecewise smooth content, SPIE Proc. 5207, SPIE, Bellingham, WA, 2003.
[4] R. Gribonval and E. Bacry, Harmonic decomposition of audio signals with matching pursuit, IEEE Trans. Signal Process. 51 (2003), no. 1, 101–111.
[5] M. Zibulevsky and B. Pearlmutter, Blind source separation by sparse decomposition in a signal dictionary, Neural Comput. 13 (2001), 863–882.
[6] K. Guo, G. Kutyniok, and D. Labate, Sparse multidimensional representations using anisotropic dilation and shear operators, Wavelets and Splines (Athens, GA, 2005), Nashboro Press, 2006, 189–201.
[7] K. Guo and D. Labate, Optimally sparse multidimensional representation using shearlets, SIAM J. Math. Anal. 39 (2007), no. 1, 298–318.
[8] E. J. Candès and D. L. Donoho, New tight frames of curvelets and optimal representations of objects with piecewise C² singularities, Comm. Pure Appl. Math. 57 (2004), no. 2, 219–266.
[9] E. J. Candès and D. L. Donoho, Continuous curvelet transform. I. Resolution of the wavefront set, Appl. Comput. Harmon. Anal. 19 (2005), no. 2, 162–197.
[10] D. L. Donoho and G. Kutyniok, Microlocal analysis of the geometric separation problem, Comm. Pure Appl. Math. 66 (2013), no. 1, 1–47.
[11] I. Daubechies, A. Grossmann, and Y. Meyer, Painless nonorthogonal expansions, J. Math. Phys. 27 (1986), 1271–1283.
[12] P. Grohs, S. Keiper, G. Kutyniok, and M. Schäfer, Parabolic molecules: curvelets, shearlets, and beyond, Springer Proc. Math. Stat. 83, Springer, 2014.
[13] G. Kutyniok, Clustered sparsity and separation of cartoon and texture, SIAM J. Imaging Sci. 6 (2013), 848–874.
[14] Z. Amiri and R. Kamyabi-Gol, Inpainting via high-dimensional universal shearlet systems, Acta Appl. Math. 156 (2018).
[15] G. Kutyniok, W.-Q Lim, and R. Reisenhofer, ShearLab 3D: faithful digital shearlet transforms based on compactly supported shearlets, ACM Trans. Math. Software 42 (2016), no. 1.
[16] G. Hennenfent, L. Fenelon, and F. J. Herrmann, Nonequispaced curvelet transform for seismic data reconstruction: a sparsity promoting approach, Geophysics 75 (2010), no. 6, WB203–WB210.
[17] F. J. Herrmann and G. Hennenfent, Non-parametric seismic data recovery with curvelet frames, Geophys. J. Int. 173 (2008), 233–248.
[18] G. Hennenfent and F. J. Herrmann, Application of stable signal recovery to seismic interpolation, SEG International Exposition and 76th Annual Meeting, SEG, Tulsa, 2006.
[19] E. J. King, G. Kutyniok, and X. Zhuang, Analysis of inpainting via clustered sparsity and microlocal analysis, J. Math. Imaging Vision 48 (2014), 205–234.
[20] G. Kutyniok, J. Lemvig, and W.-Q Lim, Optimally sparse approximations of 3D functions by compactly supported shearlet frames, SIAM J. Math. Anal. 44 (2012), 2962–3017.
[21] E. J. King, G. Kutyniok, and X. Zhuang, Analysis of data separation and recovery problems using clustered sparsity, Wavelets and Sparsity XIV, SPIE Proc. 8138, SPIE, Bellingham, WA, 2011.
[22] J. A. Tropp, Greed is good: algorithmic results for sparse approximation, IEEE Trans. Inform. Theory 50 (2004), no. 10, 2231–2242.
[23] J.-L. Starck, M. Elad, and D. L. Donoho, Redundant multiscale transforms and their application for morphological component analysis, Adv. Imag. Electron Phys. 132 (2004), 287–348.
[24] J.-F. Cai, R. H. Chan, and Z. Shen, Simultaneous cartoon and texture inpainting, Inverse Probl. Imaging 4 (2010), no. 3, 379–395.
[25] M. Genzel and G. Kutyniok, Asymptotic analysis of inpainting via universal shearlet systems, SIAM J. Imaging Sci. 7 (2014), no. 4, 2301–2339.
[26] M. Elad and A. M. Bruckstein, A generalized uncertainty principle and sparse representation in pairs of bases, IEEE Trans. Inform. Theory 48 (2002), no. 9, 2558–2567.
[27] J.-L. Starck, Y. Moudden, J. Bobin, M. Elad, and D. L. Donoho, Morphological component analysis, Wavelets XI (San Diego, CA, 2005), SPIE Proc. 5914, SPIE, Bellingham, WA, 2005.
[28] J.-L. Starck, M. Elad, and D. L. Donoho, Image decomposition via the combination of sparse representations and a variational approach, IEEE Trans. Image Process. 14 (2005), no. 10, 1570–1582.
[29] K. Guo and D. Labate, The construction of smooth Parseval frames of shearlets, Math. Model. Nat. Phenom. 8 (2013), 82–105.
[30] S. Osher, A. Solé, and L. Vese, Image decomposition and restoration using total variation minimization and the H^{-1} norm, Multiscale Model. Simul. 1 (2003), no. 3, 349–370.
[31] D. L. Donoho and G. Kutyniok, Geometric separation using a wavelet-shearlet dictionary, Proceedings of the 8th International Conference on Sampling Theory and Applications (SampTA) (L. Fesquet and B. Torrésani, eds.), Marseille, 2009.
[32] D. L. Donoho and X. Huo, Uncertainty principles and ideal atomic decomposition, IEEE Trans. Inform. Theory 47 (2001), no. 7, 2845–2862.
[33] M. Bertalmio, L. Vese, G. Sapiro, and S. Osher, Simultaneous structure and texture image inpainting, IEEE Trans. Image Process. 12 (2003), no. 8, 882–889.
[34] G. Kutyniok and W.-Q Lim, Image separation using shearlets, Curves and Surfaces (Avignon, France, 2010), Lecture Notes in Computer Science 6920, Springer, 2012.
[35] D. L. Donoho, M. Elad, and V. N. Temlyakov, Stable recovery of sparse overcomplete representations in the presence of noise, IEEE Trans. Inform. Theory 52 (2006), no. 1, 6–18.
[36] L. Borup, R. Gribonval, and M. Nielsen, Beyond coherence: recovering structured time-frequency representations, Appl. Comput. Harmon. Anal. 24 (2008), no. 1, 120–128.
[37] L. A. Vese and S. J. Osher, Modeling textures with total variation minimization and oscillating patterns in image processing, J. Sci. Comput. 19 (2003), no. 1–3, 553–572.
[38] R. J. Duffin and A. C. Schaeffer, A class of nonharmonic Fourier series, Trans. Amer. Math. Soc. 72 (1952), no. 2, 341–366.
[39] O. Ben-Shahar and S. W. Zucker, The perceptual organization of texture flow: a contextual inference approach, IEEE Trans. Pattern Anal. Mach. Intell. 25 (2003), no. 4, 401–417.
[40] D. L. Donoho and M. Elad, Optimally sparse representation in general (nonorthogonal) dictionaries via l¹ minimization, Proc. Natl. Acad. Sci. USA 100 (2003), no. 5, 2197–2202.
[41] H. Tamura, S. Mori, and T. Yamawaki, Textural features corresponding to visual perception, IEEE Trans. Syst. Man Cybern. SMC-8 (1978), 460–473.
[42] R. Gribonval and M. Nielsen, Sparse representations in unions of bases, IEEE Trans. Inform. Theory 49 (2003), no. 12, 3320–3325.
[43] Y. Meyer, Oscillating Patterns in Image Processing and Nonlinear Evolution Equations, University Lecture Series, vol. 22, Amer. Math. Soc., 2001.
[44] C. Ballester, M. Bertalmio, V. Caselles, G. Sapiro, and J. Verdera, Filling-in by joint interpolation of vector fields and gray levels, IEEE Trans. Image Process. 10 (2001), no. 8, 1200–1211.
[45] M. Elad, J.-L. Starck, P. Querre, and D. L. Donoho, Simultaneous cartoon and texture image inpainting using morphological component analysis (MCA), Appl. Comput. Harmon. Anal. 19 (2005), no. 3, 340–358.
[46] A. A. Efros and T. K. Leung, Texture synthesis by non-parametric sampling, IEEE International Conference on Computer Vision (Corfu, Greece, 1999), 1033–1038.
[47] B. Dong, H. Ji, J. Li, Z. Shen, and Y. Xu, Wavelet frame based blind image inpainting, Appl. Comput. Harmon. Anal. 32 (2012), 268–279.
[48] M. Bertalmio, A. L. Bertozzi, and G. Sapiro, Navier–Stokes, fluid dynamics, and image and video inpainting, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001), vol. 1, IEEE, 2001, I-355–I-362.
[49] J. A. Tropp, Just relax: convex programming methods for subset selection and sparse approximation, IEEE Trans. Inform. Theory (2004), in press.
[50] M. Bertalmio, G. Sapiro, V. Caselles, and C. Ballester, Image inpainting, Proceedings of SIGGRAPH 2000 (New Orleans, LA), 2000, 417–424.
[51] T. F. Chan and J. Shen, Mathematical models for local nontexture inpaintings, SIAM J. Appl. Math. 62 (2001/02), 1019–1043.
[52] P. Grohs, S. Keiper, G. Kutyniok, and M. Schäfer, Cartoon approximation with α-curvelets, J. Fourier Anal. Appl. 22 (2016), no. 6, 1235–1293.
[53] M. Schäfer, The role of α-scaling for cartoon approximation, arXiv:1612.01036.
[54] D. L. Donoho, Sparse components of images and optimal atomic decomposition, Constr. Approx. 17 (2001), 353–382.
[55] K. Guo, D. Labate, and J. Ayllon, Image inpainting using sparse multiscale representations: image recovery performance guarantees, Appl. Comput. Harmon. Anal. (2020).
[56] L. Grafakos, Classical Fourier Analysis, vol. 86, Springer, 2008.
[57] P. Y. Simard, H. S. Malvar, J. Rinker, and E. Renshaw,