Removing Gaussian Noise by Optimization of Weights in Non-Local Means
Qiyu Jin^(a,b), Ion Grama^(a,b), Quansheng Liu^(a,b)
([email protected], [email protected], [email protected])
(a) Université de Bretagne-Sud, Campus de Tohannic, BP 573, 56017 Vannes, France
(b) Université Européenne de Bretagne, France
Abstract
A new image denoising algorithm to deal with the additive Gaussian white noise model is given. Like the non-local means method, the filter is based on a weighted average of the observations in a neighborhood, with weights depending on the similarity of local patches. But in contrast to the non-local means filter, instead of using a fixed Gaussian kernel, we propose to choose the weights by minimizing a tight upper bound of the mean square error. This approach makes it possible to define weights adapted to the function at hand, mimicking the weights of the oracle filter. Under some regularity conditions on the target image, we show that the obtained estimator converges at the usual optimal rate. The proposed algorithm is parameter free in the sense that it automatically calculates the bandwidth of the smoothing kernel; it is fast and its implementation is straightforward. The performance of the new filter is illustrated by numerical simulations.
Keywords: non-local means, image denoising, optimization of weights, oracle, statistical estimation.
We deal with the additive Gaussian noise model
$$Y(x) = f(x) + \varepsilon(x), \quad x \in I, \qquad (1)$$
where $I$ is a uniform $N \times N$ grid of pixels on the unit square, $Y = (Y(x))_{x\in I}$ is the observed image brightness, $f : [0,1]^2 \to \mathbb{R}_+$ is an unknown target regression function, and $\varepsilon = (\varepsilon(x))_{x\in I}$ are independent and identically distributed (i.i.d.) Gaussian random variables with mean $0$ and standard deviation $\sigma > 0$. Important denoising techniques for the model (1) have been developed in recent years; see for example Buades, Coll and Morel (2005 [1]), Kervrann (2006 [10]), Lou, Zhang, Osher and Bertozzi (2010 [14]), Polzehl and Spokoiny (2006 [17]), Garnett, Huegerich and Chui (2005 [8]), Cai, Chan and Nikolova (2008 [3]), Katkovnik, Foi, Egiazarian and Astola (2010 [9]), and Dabov, Foi, Katkovnik and Egiazarian (2006 [2]). A significant step in these developments was the introduction of the Non-Local Means filter by Buades, Coll and Morel [1] and its variants (see e.g. [10], [11], [14]). In these filters, the basic idea is to estimate the unknown image $f(x_0)$ by a weighted average of the form
$$\widetilde f_w(x_0) = \sum_{x\in I} w(x)\, Y(x), \qquad (2)$$
where $w = (w(x))_{x\in I}$ are some non-negative weights satisfying $\sum_{x\in I} w(x) = 1$. The choice of the weights $w$ is based essentially on two criteria: a local criterion, so that the weights are a decreasing function of the distance to the estimated pixel, and a non-local criterion, which gives more important weights to the pixels whose brightness is close to the brightness of the estimated pixel (see e.g. Yaroslavsky (1985 [25]) and Tomasi and Manduchi (1998 [23])). The non-local approach has been further completed by a fruitful idea which consists in attaching small regions, called data patches, to each pixel and comparing these data patches instead of the pixels themselves.

The methods based on the non-local criterion constitute a comparatively novel direction which is less studied in the literature. In this paper we shall address two problems related to this criterion.

The first problem is how to choose the data-dependent weights $w$ in (2) in some optimal way. Generally, the weights $w$ are defined through some a priori fixed kernel, often the Gaussian one, and the important problem of the choice of the kernel has not been addressed so far for the non-local approach. Although the choice of the Gaussian kernel seems to show reasonable numerical performance, there is no particular reason to restrict ourselves only to this type of kernel. Our theoretical results and the accompanying simulations show that another kernel should be preferred. In addition to this, for the obtained optimal kernel we shall also be interested in deriving a locally adaptive rule for the bandwidth choice. The second problem that we shall address is the convergence of the obtained filter to the true image. Insights can be found in [1], [10], [11] and [13]; however, the problem of convergence of the Non-Local Means filter has not been completely settled so far. In this paper, we shall give some new elements of the proof of the convergence of the constructed filter, thereby giving a theoretical justification of the proposed approach from the asymptotic point of view.

Our main idea is to produce a very tight upper bound of the mean square error
$$R\big(\widetilde f_w(x_0)\big) = E\big(\widetilde f_w(x_0) - f(x_0)\big)^2$$
in terms of the bias and variance, and to minimize this upper bound in $w$ under the constraints $w \ge 0$ and $\sum_{x\in I} w(x) = 1$.
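To fix ideas, here is a minimal numerical sketch of the model (1) and of a generic weighted average of the form (2), assuming a grayscale image stored as a NumPy array (the function names are ours, not part of the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def add_gaussian_noise(f, sigma):
    """Simulate the model (1): Y(x) = f(x) + eps(x), i.i.d. N(0, sigma^2) noise."""
    return f + sigma * rng.standard_normal(f.shape)

def weighted_average(Y_values, weights):
    """Generic estimate (2): sum_x w(x) Y(x), with non-negative weights
    normalized to sum to one, as required by the constraints on w."""
    w = np.clip(np.asarray(weights, dtype=float).ravel(), 0.0, None)
    w /= w.sum()
    return float(np.dot(w, np.asarray(Y_values, dtype=float).ravel()))
```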
In contrast to the usual approach, where a specific class of target functions is considered, here we give a bound of the bias depending only on the target function $f$ at hand, instead of using a bound expressed in terms of the parameters of the class. We first obtain an explicit formula for the optimal weights $w^*$ in terms of the unknown function $f$. In order to get a computable filter, we estimate $w^*$ by some adaptive weights $\widehat w$ based on data patches from the observed image $Y$. We thus obtain a new filter, which we call the Optimal Weights Filter. To justify our filter theoretically, we prove that it achieves the optimal rate of convergence under some regularity conditions on $f$. Numerical results show that the Optimal Weights Filter outperforms the typical Non-Local Means filter, thus giving a practical justification that the optimal choice of the kernel improves the quality of the denoising, all other conditions being the same.

We would like to point out that related optimization problems for nonparametric signal and density recovering have been proposed earlier in Sacks and Ylvisaker (1978 [22]), Roll (2003 [19]), Roll and Ljung (2004 [20]), Roll, Nazin and Ljung (2005 [21]), and Nazin, Roll, Ljung and Grama (2008 [15]). In these papers the weights are optimized over a given class of regular functions and thus depend only on some parameters of the class. This approach corresponds to the minimax setting, where the resulting minimax estimator has the best rate of convergence corresponding to the worst image in the given class of images. If the image happens to have better regularity than the worst one, the minimax estimator will exhibit a slower rate of convergence than expected. The novelty of our work is to find the optimal weights depending on the image $f$ at hand, which implies that our Optimal Weights Filter automatically attains the optimal rate of convergence for each particular image $f$. Results of this type are related to the "oracle" concept developed in Donoho and Johnstone (1994 [6]).

Filters with data-dependent weights have been previously studied in many papers, among which we mention Polzehl and Spokoiny (2000 [18], 2003 [16], 2006 [17]) and Kervrann (2006 [10] and 2007 [12]). Compared with these filters, our algorithm is straightforward to implement and gives a quality of denoising which is close to that of the best recent methods (see Table 2). The weight optimization approach can also be applied to improve these algorithms. In particular, we can use it with recent versions of the Non-Local Means filter, like BM3D (see 2006 [2], 2007 [4, 5]); however, this is beyond the scope of the present paper and will be done elsewhere.

The paper is organized as follows. Our new filter based on the optimization of weights is introduced in Section 2, where we present the main idea and the algorithm. Our main theoretical results are presented in Section 3, where we give the rate of convergence of the constructed estimators. In Section 4, we present our simulation results with a brief analysis. Proofs of the main results are deferred to Section 5.

To conclude this section, let us set some important notations to be used throughout the paper. The Euclidean norm of a vector $x = (x_1, \dots, x_d) \in \mathbb{R}^d$ is denoted by $\|x\|_2 = \big(\sum_{i=1}^d x_i^2\big)^{1/2}$. The supremum norm of $x$ is denoted by $\|x\|_\infty = \sup_{1\le i\le d} |x_i|$. The cardinality of a set $A$ is denoted $\operatorname{card} A$. For a positive integer $N$, the uniform $N \times N$ grid of pixels on the unit square is defined by
$$I = \Big\{\tfrac{1}{N}, \tfrac{2}{N}, \cdots, \tfrac{N-1}{N}, 1\Big\}^2. \qquad (3)$$
Each element $x$ of the grid $I$ will be called a pixel. The number of pixels is $n = N^2$. For any pixel $x_0 \in I$ and a given $h > 0$, the square window of pixels
$$U_{x_0,h} = \{x \in I : \|x - x_0\|_\infty \le h\} \qquad (4)$$
will be called the search window at $x_0$. We naturally take $h$ as a multiple of $1/N$ ($h = k/N$ for some $k \in \{1, 2, \cdots, N\}$). The size of the square search window $U_{x_0,h}$ is the positive integer $M = (2Nh+1)^2 = \operatorname{card} U_{x_0,h}$.
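As an illustration of these notations, the following sketch (ours) enumerates the pixel indices of a search window on the digital grid, a $(2k+1)\times(2k+1)$ square with $h = k/N$; borders are handled here by clipping, while the simulations in Section 4 use mirror extension instead:

```python
import numpy as np

def search_window_indices(i0, j0, k, N):
    """Indices of U_{x0,h} for the pixel x0 = (i0/N, j0/N) and h = k/N:
    all pixels at Chebyshev (sup-norm) distance at most k from (i0, j0),
    clipped at the image border."""
    rows = np.arange(max(i0 - k, 0), min(i0 + k, N - 1) + 1)
    cols = np.arange(max(j0 - k, 0), min(j0 + k, N - 1) + 1)
    ii, jj = np.meshgrid(rows, cols, indexing="ij")
    return ii, jj   # card U_{x0,h} = ii.size, equal to (2k+1)^2 away from borders
```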
For any pixel $x \in U_{x_0,h}$ and a given $\eta > 0$, the square window
$$V_{x,\eta} = \{y \in I : \|y - x\|_\infty \le \eta\} \qquad (5)$$
will be called for short a patch window at $x$, in order to be distinguished from the search window $U_{x_0,h}$. Like $h$, the parameter $\eta$ is also taken as a multiple of $1/N$. The size of the patch window $V_{x,\eta}$ is the positive integer $m = (2N\eta+1)^2 = \operatorname{card} V_{x,\eta}$. The vector $Y_{x,\eta} = (Y(y))_{y\in V_{x,\eta}}$, formed by the values of the observed noisy image $Y$ at pixels in the patch $V_{x,\eta}$, will be called simply the data patch at $x \in U_{x_0,h}$. Finally, the positive part of a real number $a$ is denoted by $a_+$, that is, $a_+ = a$ if $a \ge 0$ and $a_+ = 0$ if $a < 0$.

Let $h > 0$. For $x_0 \in I$, consider a family of weighted estimates $\widetilde f_{h,w}(x_0)$ of the form
$$\widetilde f_{h,w}(x_0) = \sum_{x\in U_{x_0,h}} w(x)\, Y(x), \qquad (6)$$
where the unknown weights satisfy
$$w(x) \ge 0 \quad\text{and}\quad \sum_{x\in U_{x_0,h}} w(x) = 1. \qquad (7)$$
The usual bias plus variance decomposition of the mean square error gives
$$E\big(\widetilde f_{h,w}(x_0) - f(x_0)\big)^2 = \mathrm{Bias}^2 + \mathrm{Var}, \qquad (8)$$
with
$$\mathrm{Bias} = \sum_{x\in U_{x_0,h}} w(x)\big(f(x) - f(x_0)\big) \quad\text{and}\quad \mathrm{Var} = \sigma^2 \sum_{x\in U_{x_0,h}} w(x)^2.$$
The decomposition (8) is commonly used to construct asymptotically minimax estimators over some given classes of functions in nonparametric function estimation. In order to highlight the difference between the approach proposed in the present paper and the previous work, suppose that $f$ belongs to the class of functions satisfying the Hölder condition
$$|f(x) - f(y)| \le L \|x - y\|_\infty^\beta, \quad \forall x, y \in I.$$
In this case, it is easy to see that
$$E\big(\widetilde f_{h,w}(x_0) - f(x_0)\big)^2 \le \Big(\sum_{x\in U_{x_0,h}} w(x)\, L\|x - x_0\|_\infty^\beta\Big)^2 + \sigma^2 \sum_{x\in U_{x_0,h}} w(x)^2. \qquad (9)$$
Optimizing further the weights $w$ in the obtained upper bound gives an asymptotically minimax estimate with weights depending on the unknown parameters $L$ and $\beta$ (for details see [22]). With our approach, the bias term will instead be bounded in terms of the unknown function $f$ itself. As a result, we obtain some "oracle" weights $w$ adapted to the unknown function $f$ at hand, which will be estimated further using data patches from the image $Y$.

First, we shall address the problem of determining the "oracle" weights. With this aim, denote
$$\rho_{f,x_0}(x) = |f(x) - f(x_0)|. \qquad (10)$$
Note that the value $\rho_{f,x_0}(x)$ characterizes the variation of the image brightness at the pixel $x$ with respect to the pixel $x_0$. From the decomposition (8), we easily obtain a tight upper bound in terms of the vector $\rho_{f,x_0}$:
$$E\big(\widetilde f_{h,w}(x_0) - f(x_0)\big)^2 \le g_{\rho_{f,x_0}}(w), \qquad (11)$$
where
$$g_{\rho_{f,x_0}}(w) = \Big(\sum_{x\in U_{x_0,h}} w(x)\,\rho_{f,x_0}(x)\Big)^2 + \sigma^2 \sum_{x\in U_{x_0,h}} w(x)^2. \qquad (12)$$
From the following theorem we can obtain the form of the weights $w$ which minimize the function $g_{\rho_{f,x_0}}(w)$ under the constraints (7), in terms of the values $\rho_{f,x_0}(x)$. For the sake of generality, we shall formulate the result for an arbitrary non-negative function $\rho(x)$, $x \in U_{x_0,h}$. Define the objective function
$$g_\rho(w) = \Big(\sum_{x\in U_{x_0,h}} w(x)\,\rho(x)\Big)^2 + \sigma^2 \sum_{x\in U_{x_0,h}} w(x)^2. \qquad (13)$$
Introduce into consideration the strictly increasing function
$$M_\rho(t) = \sum_{x\in U_{x_0,h}} \rho(x)\,(t - \rho(x))_+, \quad t \ge 0. \qquad (14)$$
Let $K_{tr}$ be the usual triangular kernel:
$$K_{tr}(t) = (1 - |t|)_+, \quad t \in \mathbb{R}. \qquad (15)$$

Theorem 1. Assume that $\rho(x)$, $x \in U_{x_0,h}$, is a non-negative function. Then the unique weights which minimize $g_\rho(w)$ subject to (7) are given by
$$w_\rho(x) = \frac{K_{tr}\big(\rho(x)/a\big)}{\sum_{y\in U_{x_0,h}} K_{tr}\big(\rho(y)/a\big)}, \quad x \in U_{x_0,h}, \qquad (16)$$
where the bandwidth $a > 0$ is the unique solution on $(0, \infty)$ of the equation
$$M_\rho(a) = \sigma^2. \qquad (17)$$

Theorem 1 can be obtained from a result of Sacks and Ylvisaker [22]. The proof is deferred to Section 5.1.
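In code, Theorem 1 translates into the following sketch (ours): given the values $\rho(x)$ on the search window and the noise level $\sigma$, it solves the bandwidth equation (17) by the sorting procedure made explicit in Remark 2 below, and returns the triangular-kernel weights (16).

```python
import numpy as np

def optimal_weights(rho, sigma):
    """Weights (16) minimizing g_rho(w) subject to (7).

    The bandwidth a solves M_rho(a) = sigma^2 (equation (17)); on each interval
    [rho_k, rho_{k+1}) the equation is linear in a, whence the candidate values
    a_k of (18) and the index k* of (19)."""
    rho = np.asarray(rho, dtype=float).ravel()
    r = np.sort(rho)                            # rho_1 <= ... <= rho_M
    c1 = np.cumsum(r)                           # partial sums of rho_i
    c2 = np.cumsum(r ** 2)                      # partial sums of rho_i^2
    # convention: a_k = +inf when the leading rho_i vanish (c1 == 0)
    a_k = np.where(c1 > 0, (sigma ** 2 + c2) / np.where(c1 > 0, c1, 1.0), np.inf)
    k_star = np.max(np.nonzero(a_k >= r)[0])    # k* = max{k : a_k >= rho_k}
    a = a_k[k_star]
    w = np.maximum(1.0 - rho / a, 0.0)          # K_tr(rho(x)/a), cf. (15)-(16)
    return w / w.sum(), a
```

In the degenerate case $\rho \equiv 0$ the bandwidth is infinite and the weights become uniform, which is consistent with minimizing the variance term alone.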
Remark 2. The value of $a > 0$ can be calculated as follows. We sort the set $\{\rho(x) : x \in U_{x_0,h}\}$ in ascending order, $\rho_1 \le \rho_2 \le \cdots \le \rho_M < \rho_{M+1} = +\infty$, where $M = \operatorname{card} U_{x_0,h}$. Let
$$a_k = \frac{\sigma^2 + \sum_{i=1}^k \rho_i^2}{\sum_{i=1}^k \rho_i}, \quad 1 \le k \le M, \qquad (18)$$
and
$$k^* = \max\{1 \le k \le M : a_k \ge \rho_k\} = \min\{1 \le k \le M : a_k < \rho_k\} - 1, \qquad (19)$$
with the convention that $a_k = +\infty$ if $\rho_k = 0$ and that $\min \emptyset = M + 1$. Then the solution $a > 0$ of (17) can be expressed as $a = a_{k^*}$; moreover, $k^*$ is the unique integer $k \in \{1, \cdots, M\}$ such that $a_k \ge \rho_k$, and $a_{k+1} < \rho_{k+1}$ if $k < M$.

The proof of the remark is deferred to Section 5.2.

Let $x_0 \in I$. Using the optimal weights given by Theorem 1, we first introduce the following non-computable approximation of the true image, called the "oracle":
$$f^*_h(x_0) = \frac{\sum_{x\in U_{x_0,h}} K_{tr}\big(\rho_{f,x_0}(x)/a\big)\, Y(x)}{\sum_{y\in U_{x_0,h}} K_{tr}\big(\rho_{f,x_0}(y)/a\big)}, \qquad (20)$$
where the bandwidth $a$ is the solution of the equation $M_{\rho_{f,x_0}}(a) = \sigma^2$. A computable filter can be obtained by estimating the unknown function $\rho_{f,x_0}(x)$ and the bandwidth $a$ from the data as follows.

Let $h > 0$ and $\eta > 0$. For any $x_0 \in I$ and any $x \in U_{x_0,h}$, consider the distance between the data patches $Y_{x,\eta} = (Y(y))_{y\in V_{x,\eta}}$ and $Y_{x_0,\eta} = (Y(y))_{y\in V_{x_0,\eta}}$ defined by
$$d^2(Y_{x,\eta}, Y_{x_0,\eta}) = \frac{1}{m}\, \|Y_{x,\eta} - Y_{x_0,\eta}\|_2^2, \quad m = \operatorname{card} V_{x,\eta},$$
where $\|Y_{x,\eta} - Y_{x_0,\eta}\|_2^2 = \sum_{x_0+z \in V_{x_0,\eta}} (Y(x+z) - Y(x_0+z))^2$. Since Buades, Coll and Morel [1], the distance $d(Y_{x,\eta}, Y_{x_0,\eta})$ is known to be a flexible tool to measure the variations of the brightness of the image $Y$. As
$$Y(x+z) - Y(x_0+z) = f(x+z) - f(x_0+z) + \varepsilon(x+z) - \varepsilon(x_0+z),$$
we have $E(Y(x+z) - Y(x_0+z))^2 = (f(x+z) - f(x_0+z))^2 + 2\sigma^2$. If we use the approximation
$$(f(x+z) - f(x_0+z))^2 \approx (f(x) - f(x_0))^2 = \rho^2_{f,x_0}(x)$$
and the law of large numbers, it seems reasonable that $\rho^2_{f,x_0}(x) \approx d^2(Y_{x,\eta}, Y_{x_0,\eta}) - 2\sigma^2$. But our simulations show that a much better approximation is
$$\rho_{f,x_0}(x) \approx \widehat\rho_{x_0}(x) = \big(d(Y_{x,\eta}, Y_{x_0,\eta}) - \sqrt{2}\,\sigma\big)_+. \qquad (21)$$
The fact that $\widehat\rho_{x_0}(x)$ is a good estimator of $\rho_{f,x_0}$ will be justified by convergence theorems: cf. Theorems 4 and 5 of Section 3. Thus our Optimal Weights Filter is defined by
$$\widehat f(x_0) = \widehat f_{h,\eta}(x_0) = \frac{\sum_{x\in U_{x_0,h}} K_{tr}\big(\widehat\rho_{x_0}(x)/\widehat a\big)\, Y(x)}{\sum_{y\in U_{x_0,h}} K_{tr}\big(\widehat\rho_{x_0}(y)/\widehat a\big)}, \qquad (22)$$
where the bandwidth $\widehat a > 0$ is the solution of $M_{\widehat\rho_{x_0}}(\widehat a) = \sigma^2$, which can be calculated as in Remark 2 (with $\rho(x)$ and $a$ replaced by $\widehat\rho_{x_0}(x)$ and $\widehat a$, respectively).
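The estimate (21) is a plain patch-distance computation; here is a sketch (with helper names of ours, and with the image assumed mirror-extended so that every patch fits inside the array):

```python
import numpy as np

def rho_hat(Y_ext, i0, j0, i, j, p, sigma):
    """Estimate (21) of rho_{f,x0}(x): the root mean squared difference of the
    two (2p+1)x(2p+1) patches centered at x0 and x, minus sqrt(2)*sigma,
    positive part.  Y_ext is the mirror-extended image; indices refer to it."""
    P0 = Y_ext[i0 - p:i0 + p + 1, j0 - p:j0 + p + 1]
    P = Y_ext[i - p:i + p + 1, j - p:j + p + 1]
    d = np.sqrt(np.mean((P - P0) ** 2))          # d(Y_{x,eta}, Y_{x0,eta})
    return max(d - np.sqrt(2.0) * sigma, 0.0)
```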
We end this section by giving an algorithm for computing the filter (22). The input values of the algorithm are the image $Y(x)$, $x \in I$, the variance $\sigma^2$ of the noise, and two numbers $m$ and $M$ representing the sizes of the patch window and the search window, respectively.

Algorithm: Optimal Weights Filter.

Repeat for each $x_0 \in I$:
  give an initial value of $\widehat a$: $\widehat a = 1$ (it can be an arbitrary positive number);
  compute $\{\widehat\rho_{x_0}(x) : x \in U_{x_0,h}\}$ by (21);
  // compute the bandwidth $\widehat a$ at $x_0$:
  reorder $\{\widehat\rho_{x_0}(x) : x \in U_{x_0,h}\}$ as an increasing sequence, say $\widehat\rho_{x_0}(x_1) \le \widehat\rho_{x_0}(x_2) \le \cdots \le \widehat\rho_{x_0}(x_M)$;
  loop from $k = 1$ to $M$:
    if $\sum_{i=1}^k \widehat\rho_{x_0}(x_i) > 0$ and $\big(\sigma^2 + \sum_{i=1}^k \widehat\rho^{\,2}_{x_0}(x_i)\big)\big/\sum_{i=1}^k \widehat\rho_{x_0}(x_i) \ge \widehat\rho_{x_0}(x_k)$,
      then set $\widehat a = \big(\sigma^2 + \sum_{i=1}^k \widehat\rho^{\,2}_{x_0}(x_i)\big)\big/\sum_{i=1}^k \widehat\rho_{x_0}(x_i)$ and continue the loop;
      else quit the loop;
  end loop;
  // compute the estimated weights $\widehat w$ at $x_0$:
  compute $\widehat w(x_i) = K_{tr}\big(\widehat\rho_{x_0}(x_i)/\widehat a\big) \big/ \sum_{x_j\in U_{x_0,h}} K_{tr}\big(\widehat\rho_{x_0}(x_j)/\widehat a\big)$;
  // compute the filter $\widehat f$ at $x_0$:
  compute $\widehat f(x_0) = \sum_{x_i\in U_{x_0,h}} \widehat w(x_i)\, Y(x_i)$.
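Putting the pieces together, a direct implementation of the algorithm above might look as follows (a sketch of ours, reusing `optimal_weights` and the patch distance from the earlier snippets, with mirror extension at the borders as in Section 4); the double loop over pixels is meant for illustration, not speed:

```python
import numpy as np

def optimal_weights_filter(Y, sigma, k=6, p=10):
    """Sketch of the Optimal Weights Filter (22).

    Search windows have M = (2k+1)^2 pixels and patches m = (2p+1)^2 pixels;
    the defaults mimic M = 13x13 and m = 21x21 used in Section 4."""
    N1, N2 = Y.shape
    r = k + p
    Y_ext = np.pad(np.asarray(Y, dtype=float), r, mode="symmetric")  # mirrored
    f_hat = np.empty((N1, N2))
    for i0 in range(N1):
        for j0 in range(N2):
            ic, jc = i0 + r, j0 + r              # center in the extended image
            P0 = Y_ext[ic - p:ic + p + 1, jc - p:jc + p + 1]
            rho, vals = [], []
            for di in range(-k, k + 1):
                for dj in range(-k, k + 1):
                    i, j = ic + di, jc + dj
                    P = Y_ext[i - p:i + p + 1, j - p:j + p + 1]
                    d = np.sqrt(np.mean((P - P0) ** 2))
                    rho.append(max(d - np.sqrt(2.0) * sigma, 0.0))   # (21)
                    vals.append(Y_ext[i, j])
            w, _ = optimal_weights(rho, sigma)   # weights (16), bandwidth (17)
            f_hat[i0, j0] = np.dot(w, vals)
    return f_hat
```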
The proposed algorithm is computationally fast and its implementation is straightforward compared to more sophisticated algorithms developed in recent years. Notice that an important issue in the non-local means filter is the choice of the bandwidth parameter in the Gaussian kernel; our algorithm is parameter free in the sense that it automatically chooses the bandwidth.

The numerical simulations show that our filter outperforms the classical non-local means filter under the same conditions. The overall performance of the proposed filter, compared to its simplicity, is very good, which can be a big advantage in some practical applications. We hope that the optimal weights that we deduced can be useful with more complicated algorithms and can give similar improvements of the denoising quality. However, these investigations are beyond the scope of the present paper. A detailed analysis of the performance of our filter is given in Section 4.

In this section, we present two theoretical results. The first result is a mathematical justification of the "oracle" filter introduced in the previous section. It shows that, despite the fact that we minimized an upper bound of the mean square error instead of the mean square error itself, the obtained "oracle" still has the optimal rate of convergence. Moreover, we show that the weights optimization approach possesses the following important adaptivity property: our procedure automatically chooses the correct bandwidth $a > 0$, even if the radius $h > 0$ of the search window $U_{x_0,h}$ is larger than necessary. The second result shows the convergence of the Optimal Weights Filter $\widehat f_{h,\eta}$ under some more restricted conditions than those formulated in Section 2. To prove the convergence, we split the image into two independent parts. From the first one, we construct the "oracle" filter; from the second one, we estimate the weights. Under some regularity assumptions on the target image, we are able to show that the resulting filter has nearly the optimal rate of convergence.

Let $\rho(x)$, $x \in U_{x_0,h}$, be an arbitrary non-negative function and let $w_\rho$ be the optimal weights given by (16). Using these weights $w_\rho$, we define the family of estimates
$$f^*_h(x_0) = \sum_{x\in U_{x_0,h}} w_\rho(x)\, Y(x) \qquad (23)$$
depending on the unknown function $\rho$. The next theorem shows that one can pick a useful estimate from the family $f^*_h$ if the function $\rho$ is close to the "true" function $\rho_{f,x_0}(x) = |f(x) - f(x_0)|$, i.e. if
$$\rho(x) = |f(x) - f(x_0)| + \delta_n, \qquad (24)$$
where $\delta_n \ge 0$ is a small deviation. We establish a rate of convergence of $f^*_h$ under the local Hölder condition
$$|f(x) - f(y)| \le L \|x - y\|_\infty^\beta, \quad \forall x, y \in U_{x_0,h}, \qquad (25)$$
where $\beta > 0$, $L > 0$, $h > 0$, and $x_0 \in I$. In the following, $c_i > 0$ ($i \ge 1$) denotes a positive constant, and $O(a_n)$ ($n \ge 1$) denotes a number bounded by $c \cdot a_n$ for some constant $c > 0$. All the constants $c_i > 0$ and $c > 0$ depend only on $L$, $\beta$ and $\sigma$; their values can be different from line to line.
Theorem 3. Assume that $h = c_0 n^{-\frac{1}{2\beta+2}}$ with $c_0 > c_* = \Big(\frac{\sigma^2(\beta+2)(2\beta+2)}{8L^2\beta}\Big)^{\frac{1}{2\beta+2}}$, or $h \ge c_0 n^{-\alpha}$ with $0 \le \alpha < \frac{1}{2\beta+2}$ and $c_0 > 0$. Suppose that $f$ satisfies the local Hölder condition (25) and that $\delta_n = O\big(n^{-\frac{\beta}{2\beta+2}}\big)$. Then
$$E\big(f^*_h(x_0) - f(x_0)\big)^2 = O\big(n^{-\frac{2\beta}{2\beta+2}}\big). \qquad (26)$$
The proof will be given in Section 5.3.

Recall that a bandwidth $h$ of order $n^{-\frac{1}{2\beta+2}}$ is required to attain the optimal minimax rate of convergence $O\big(n^{-\frac{2\beta}{2\beta+2}}\big)$ of the mean squared error for estimating a function $f$ of global Hölder smoothness $\beta$ (cf. e.g. [7]). To better understand the adaptivity property of the oracle $f^*_h(x_0)$, assume that the image $f$ at $x_0$ has Hölder smoothness $\beta$ (see [24]) and that $h \ge c_0 n^{-\alpha}$ with $0 \le \alpha < \frac{1}{2\beta+2}$, which means that the radius $h > 0$ of the search window $U_{x_0,h}$ has been chosen larger than the "standard" $n^{-\frac{1}{2\beta+2}}$. Then, by Theorem 3, the rate of convergence of the oracle is still of order $n^{-\frac{2\beta}{2\beta+2}}$, contrary to the global case mentioned above. If we choose a sufficiently large search window $U_{x_0,h}$, then the oracle $f^*_h(x_0)$ will have a rate of convergence which depends only on the unknown maximal local smoothness $\beta$ of the image $f$. In particular, if $\beta$ is very large, then the rate will be close to $n^{-1}$, which ensures good estimation of the flat regions, in cases where the regions are indeed flat. More generally, since Theorem 3 is valid for arbitrary $\beta$, it applies with the maximal local Hölder smoothness $\beta_{x_0}$ at $x_0$; therefore the oracle $f^*_h(x_0)$ will exhibit the best rate of convergence of order $n^{-\frac{2\beta_{x_0}}{2\beta_{x_0}+2}}$ at $x_0$. In other words, the procedure adapts to the best rate of convergence at each point $x_0$ of the image.

We justify by simulation results that the difference between the oracle $f^*_h$ computed with $\rho = \rho_{f,x_0} = |f(x) - f(x_0)|$ and the true image $f$ is extremely small (see Table 1). This shows that, at least from the practical point of view, it is justified to optimize the upper bound $g_{\rho_{f,x_0}}(w)$ instead of optimizing the mean square error $E(f^*_h(x_0) - f(x_0))^2$ itself. The estimate $f^*_h$ with the choice $\rho(x) = \rho_{f,x_0}(x)$ will be called the oracle filter. In particular, for the oracle filter $f^*_h$, under the conditions of Theorem 3, we have
$$E\big(f^*_h(x_0) - f(x_0)\big)^2 \le g_\rho(w_\rho) \le c\, n^{-\frac{2\beta}{2\beta+2}}.$$

Now fix $x_0 \in I$, $h > 0$ and $\eta > 0$. To prove the convergence, we split the set of pixels into two parts, $I = I'_{x_0} \cup I''_{x_0}$, where
$$I'_{x_0} = \Big\{x_0 + \big(\tfrac{i}{N}, \tfrac{j}{N}\big) \in I : i + j \text{ is even}\Big\} \qquad (27)$$
is the set of pixels whose coordinate shifts with respect to $x_0$ have an even sum $i + j$, and $I''_{x_0} = I \setminus I'_{x_0}$. Denote $U'_{x_0,h} = U_{x_0,h} \cap I'_{x_0}$ and $V''_{x,\eta} = V_{x,\eta} \cap I''_{x_0}$. Consider the distance between the data patches $Y''_{x,\eta} = (Y(y))_{y\in V''_{x,\eta}}$ and $Y''_{x_0,\eta} = (Y(y))_{y\in V''_{x_0,\eta}}$ defined by
$$d\big(Y''_{x,\eta}, Y''_{x_0,\eta}\big) = \frac{1}{\sqrt{m''}}\, \big\|Y''_{x,\eta} - Y''_{x_0,\eta}\big\|_2, \quad\text{where } m'' = \operatorname{card} V''_{x,\eta}.$$
An estimate of the function $\rho_{f,x_0}$ is given by
$$\rho_{f,x_0}(x) \approx \widehat\rho''_{x_0}(x) = \big(d\big(Y''_{x,\eta}, Y''_{x_0,\eta}\big) - \sqrt{2}\,\sigma\big)_+, \qquad (28)$$
see (21). Define the filter $\widehat f'_{h,\eta}$ by
$$\widehat f'_{h,\eta}(x_0) = \sum_{x\in U'_{x_0,h}} \widehat w''(x)\, Y(x), \qquad (29)$$
where
$$\widehat w'' = \arg\min_w \Big[\Big(\sum_{x\in U'_{x_0,h}} w(x)\, \widehat\rho''_{x_0}(x)\Big)^2 + \sigma^2 \sum_{x\in U'_{x_0,h}} w(x)^2\Big]. \qquad (30)$$
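The splitting (27) is an ordinary checkerboard decomposition of the grid. A small sketch (ours), stated relative to the grid origin rather than $x_0$, which only swaps the two masks; computing $\widehat\rho''$ then amounts to evaluating the patch distance of (28) on the pixels of $V''_{x,\eta}$ only, e.g. by masking the squared differences before averaging:

```python
import numpy as np

def checkerboard_split(N):
    """Masks of the two interlaced pixel sets I' and I'' of (27):
    I' holds the pixels whose index sum i + j is even, I'' the rest."""
    i, j = np.indices((N, N))
    even = (i + j) % 2 == 0
    return even, ~even
```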
The next theorem gives a rate of convergence of the Optimal Weights Filter if the parameters $h > 0$ and $\eta > 0$ are chosen appropriately in terms of the smoothness $\beta$.

Theorem 4. Assume that $h = c_0 n^{-\frac{1}{2\beta+2}}$ with $c_0 > c_* = \Big(\frac{\sigma^2(\beta+2)(2\beta+2)}{8L^2\beta}\Big)^{\frac{1}{2\beta+2}}$, and that $\eta = c_2 n^{-\frac{1}{2\beta+2}}$. Suppose that the function $f$ satisfies the local Hölder condition (25). Then
$$E\big(\widehat f'_{h,\eta}(x_0) - f(x_0)\big)^2 = O\big(n^{-\frac{2\beta}{2\beta+2}} \ln n\big). \qquad (31)$$
For the proof of this theorem see Section 5.4.

Theorem 4 states that with proper choices of the parameters $h$ and $\eta$, the mean square error of the estimator $\widehat f'_{h,\eta}(x_0)$ converges nearly at the rate $O\big(n^{-\frac{2\beta}{2\beta+2}}\big)$, which is the usual optimal rate of convergence for a given Hölder smoothness $\beta > 0$. Our simulations show that the bandwidth $\widehat a$ provided by our algorithm depends essentially on the local properties of the image and does not depend much on the radius $h$ of the search window. These simulations, together with Theorem 3, suggest that the Optimal Weights Filter (22) can also be applied with larger $h$, as is the case for the "oracle" filter $f^*_h$. The following theorem deals with the case where $h$ is large.

Theorem 5. Assume that $h = c_0 n^{-\alpha}$ with $c_0 > 0$ and $0 < \alpha \le \frac{1}{2\beta+2}$, and that $\eta = c_2 n^{-\frac{1}{2\beta+2}}$. Suppose that the function $f$ satisfies the local Hölder condition (25). Then
$$E\big(\widehat f'_{h,\eta}(x_0) - f(x_0)\big)^2 = O\big(n^{-\frac{\beta}{2\beta+2}} \ln n\big).$$
For the proof of this theorem see Section 5.5. Note that in this case the obtained rate of convergence is not the usual optimal one, in contrast to Theorems 3 and 4, but we believe that this is the best rate that can be obtained for the proposed filter.
The performance of the Optimal Weights Filter $\widehat f_{h,\eta}(x_0)$ is measured by the usual Peak Signal-to-Noise Ratio (PSNR) in decibels (dB), defined as
$$PSNR = 10 \log_{10} \frac{255^2}{MSE}, \qquad MSE = \frac{1}{\operatorname{card} I} \sum_{x\in I} \big(f(x) - \widehat f_{h,\eta}(x)\big)^2,$$
where $f$ is the original image and $\widehat f$ the estimated one.

In the simulations, we shall sometimes use a smoothed version $d_K(Y_{x,\eta}, Y_{x_0,\eta})$ of the estimate of the brightness variation instead of the non-smoothed one $d(Y_{x,\eta}, Y_{x_0,\eta})$. It should be noted that for the smoothed versions of the estimated brightness variation we can establish similar convergence results. The smoothed estimate $d_K(Y_{x,\eta}, Y_{x_0,\eta})$ is defined by
$$d_K(Y_{x,\eta}, Y_{x_0,\eta}) = \frac{\|K(y) \cdot (Y_{x,\eta} - Y_{x_0,\eta})\|_2}{\sqrt{\sum_{y'\in V_{x_0,\eta}} K(y')}},$$
where $K$ are some weights defined on $V_{x_0,\eta}$. The corresponding estimate of the brightness variation $\rho_{f,x_0}(x)$ is given by
$$\widehat\rho_{K,x_0}(x) = \big(d_K(Y_{x,\eta}, Y_{x_0,\eta}) - \sqrt{2}\,\sigma\big)_+. \qquad (32)$$
With the rectangular kernel
$$K_r(y) = \begin{cases} 1, & y \in V_{x_0,\eta}, \\ 0, & \text{otherwise}, \end{cases} \qquad (33)$$
we obtain exactly the distance $d(Y_{x,\eta}, Y_{x_0,\eta})$ and the filter described in Section 2. Other smoothing kernels $K$ used in the simulations are the Gaussian kernel
$$K_g(y) = \exp\Big(-\frac{N^2 \|y - x_0\|_2^2}{h_g^2}\Big), \qquad (34)$$
where $h_g$ is the bandwidth parameter, and the kernel
$$K_0(y) = \begin{cases} \sum_{k=N\|y-x_0\|_\infty}^{p} \frac{1}{(2k+1)^2}, & y \ne x_0, \\ \sum_{k=1}^{p} \frac{1}{(2k+1)^2}, & y = x_0, \end{cases} \qquad (35)$$
where $m = (2p+1)^2$ is the width of the similarity window. The shape of these two kernels is displayed in Figure 1.
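A sketch of the two smoothing kernels and of the weighted distance $d_K$, under the assumption that patch offsets are measured in pixels ($k_0$ below is the Chebyshev distance to the patch center); the normalizations are ours:

```python
import numpy as np

def gaussian_kernel(p, h_g):
    """Gaussian patch weights in the spirit of (34) on a (2p+1)x(2p+1) patch."""
    i, j = np.indices((2 * p + 1, 2 * p + 1)) - p
    return np.exp(-(i ** 2 + j ** 2) / h_g ** 2)

def k0_kernel(p):
    """Kernel (35): a pixel at Chebyshev distance k0 >= 1 from the center gets
    sum_{k=k0}^{p} 1/(2k+1)^2, and the center gets the k0 = 1 value."""
    i, j = np.indices((2 * p + 1, 2 * p + 1)) - p
    k0 = np.maximum(np.maximum(np.abs(i), np.abs(j)), 1)
    tails = np.array([sum(1.0 / (2 * k + 1) ** 2 for k in range(q, p + 1))
                      for q in range(1, p + 2)])   # tails[q-1] = sum_{k=q}^{p}
    return tails[k0 - 1]

def smoothed_distance(P, P0, K):
    """Kernel-weighted patch distance d_K used to define (32)."""
    return np.sqrt(np.sum(K * (P - P0) ** 2) / np.sum(K))
```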
Images        Lena       Barbara    Boat       House      Peppers
Sizes         512×512    512×512    512×512    256×256    256×256

σ = 10
M = 11×11     41.20 dB   40.06 dB   40.23 dB   41.50 dB   40.36 dB
M = 13×13     41.92 dB   40.82 dB   40.99 dB   42.24 dB   41.01 dB
M = 15×15     42.54 dB   41.48 dB   41.62 dB   42.85 dB   41.53 dB
M = 17×17     43.07 dB   42.05 dB   42.79 dB   43.38 dB   41.99 dB

σ = 20
M = 11×11     37.17 dB   35.92 dB   36.23 dB   37.18 dB   36.25 dB
M = 13×13     37.91 dB   36.70 dB   37.01 dB   37.97 dB   36.85 dB
M = 15×15     38.57 dB   37.37 dB   37.65 dB   38.59 dB   37.38 dB
M = 17×17     39.15 dB   37.95 dB   38.22 dB   39.11 dB   37.80 dB

σ = 30
M = 11×11     34.81 dB   33.65 dB   33.79 dB   34.93 dB   33.57 dB
M = 13×13     35.57 dB   34.47 dB   34.58 dB   35.78 dB   34.23 dB
M = 15×15     36.24 dB   35.15 dB   35.25 dB   36.48 dB   34.78 dB
M = 17×17     36.79 dB   35.75 dB   35.84 dB   37.07 dB   35.26 dB
Table 1: PSNR values when the oracle estimator $f^*_h$ is applied with different values of $M$.

[Figure 1: two surface plots of the patch kernels over a 21 × 21 window.]

Figure 1: The shape of the kernels $K_g$ (left) and $K_0$ (right) with $M = 21 \times 21$.

To avoid undesirable border effects, in our simulations we mirror the image outside its limits, that is, we extend the image outside its limits symmetrically with respect to the border. At the corners, the image is extended symmetrically with respect to the corner pixels.

We have done simulations on a commonly used set of test images, available at http://decsai.ugr.es/~javier/denoise/test_images/, which includes Lena, Barbara, Boat, House and Peppers.
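The border convention and the PSNR criterion are one-liners; a sketch (NumPy's "symmetric" padding is one natural reading of the mirror extension just described):

```python
import numpy as np

def mirror_extend(img, r):
    """Extend the image by r pixels on each side, mirroring it with respect
    to the borders (and hence to the corner pixels at the corners)."""
    return np.pad(img, r, mode="symmetric")

def psnr(f, f_hat):
    """Peak Signal-to-Noise Ratio in dB for 8-bit images, as defined above."""
    mse = np.mean((np.asarray(f, float) - np.asarray(f_hat, float)) ** 2)
    return 10.0 * np.log10(255.0 ** 2 / mse)
```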
The potential of the estimation method is illustrated with the 512 × 512 images "Lena" (Figure 2(a)) and "Barbara" (Figure 3(a)), corrupted by an additive white Gaussian noise (Figure 2(b), PSNR = 22.11 dB, $\sigma = 20$, and Figure 3(b), PSNR = 18.59 dB, $\sigma = 30$). We first used the rectangular kernel $K_r$ for computing the estimated brightness variation function $\widehat\rho_{K,x_0}$, which corresponds to the Optimal Weights Filter as defined in Section 2. Empirically we found that the parameters $m$ and $M$ can be fixed to $m = 21 \times 21$ and $M = 13 \times 13$. In Figures 2(c) and 3(c), we can see that the noise is reduced in a natural manner and that significant geometric features, fine textures, and original contrasts are visually well recovered with no undesirable artifacts (PSNR ≈ 32 dB for "Lena" and PSNR ≈ 28 dB for "Barbara"). To illustrate how the filter works, we display several search windows $U_{x_0,h}$ with centers at some testing pixels $x_0$ of the noisy image, see Figure 4. The distribution of the weights inside the search window $U_{x_0,h}$ depends on the estimated brightness variation function $\widehat\rho_{K,x_0}(x)$, $x \in U_{x_0,h}$. If the estimated brightness variation $\widehat\rho_{K,x_0}(x)$ is less than $\widehat a$ (see Theorem 1), the similarity between pixels is measured by a linearly decreasing function of $\widehat\rho_{K,x_0}(x)$; otherwise it is zero. Thus $\widehat a$ acts as an automatic threshold. In Figure 5, it is shown how the Optimal Weights Filter chooses in each case a proper weight configuration.

The best numerical results are obtained using $K = K_g$ and $K = K_0$ in the definition of $\widehat\rho_{K,x_0}$. In Table 2, we compare the Non-Local Means Filter and the Optimal Weights Filter with different choices of the kernel: $K = K_g, K_0, K_r$. The best PSNR values obtained by varying the size $m$ of the similarity windows and the size $M$ of the search windows are reported in Tables 3 ($\sigma = 10$), 4 ($\sigma = 20$) and 5 ($\sigma = 30$) for $K = K_0$. Note that the PSNR values are close for every $m$ and $M$, and that the optimal $m$ and $M$ depend on the image content. The values $m = 21 \times 21$ and $M = 13 \times 13$ seem appropriate in most cases, and a smaller patch size $m$ can be considered for processing piecewise smooth images.
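For instance, the whole pipeline of this section could be exercised as follows with the sketches given earlier (parameter values from the recommendation above, on a synthetic test pattern):

```python
import numpy as np

# Exercise the sketches above on a synthetic piecewise-constant image.
f = np.zeros((64, 64)); f[16:48, 16:48] = 128.0   # simple 8-bit test pattern
sigma = 20.0
Y = add_gaussian_noise(f, sigma)                      # model (1)
f_hat = optimal_weights_filter(Y, sigma, k=6, p=10)   # M = 13x13, m = 21x21
print(f"noisy PSNR:    {psnr(f, Y):.2f} dB")
print(f"restored PSNR: {psnr(f, f_hat):.2f} dB")
```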
Figure 2: Results of denoising the 512 × 512 image "Lena": (a) original image; (b) noisy image with $\sigma = 20$, PSNR = 22.11 dB; (c) restored with OWF, PSNR ≈ 32 dB; (d) square error with OWF; (e) restored with NLMF, PSNR ≈ 31 dB; (f) square error with NLMF. Comparing (d) and (f) we see that the Optimal Weights Filter (OWF) captures more details than the Non-Local Means Filter (NLMF).

Figure 3: Results of denoising the 512 × 512 image "Barbara": (a) original image; (b) noisy image with $\sigma = 30$, PSNR = 18.59 dB; (c) restored with OWF, PSNR ≈ 28 dB; (d) square error with OWF; (e) restored with NLMF, PSNR ≈ 27 dB; (f) square error with NLMF. Comparing (d) and (f) we see that the Optimal Weights Filter (OWF) captures more details than the Non-Local Means Filter (NLMF).

Figure 4: The noisy image with six selected search windows with centers at pixels a, b, c, d, e, f.

Table 2: Comparison between the Non-Local Means Filter (NLMF) and the Optimal Weights Filter (OWF) with the kernels $K_r$, $K_g$ and $K_0$, on Lena, Barbara, Boat (512 × 512), House and Peppers (256 × 256), for $\sigma = 10, 20, 30$. [PSNR entries not recoverable.]

Figure 5: These pictures show how the Optimal Weights Filter detects the features of the image by choosing appropriate weights. The first column displays six selected search windows used to estimate the image at the corresponding central pixels a, b, c, d, e and f. The second column displays the corresponding search windows corrupted by a Gaussian noise with standard deviation $\sigma = 20$. The third column displays the two-dimensional representation of the weights used to estimate the central pixels. The fourth column gives the three-dimensional representation of the weights. The fifth column gives the restored images.
σ = 10            Lena       Barbara    Boat       House      Peppers
m / M                        (sizes: 512×512, 512×512, 512×512, 256×256, 256×256)
11×11 / 11×11     35.35 dB   34.03 dB   33.43 dB   35.69 dB   34.16 dB
13×13 / 11×11     35.40 dB   34.06 dB   33.45 dB   35.72 dB   34.14 dB
15×15 / 11×11     35.44 dB   34.07 dB   33.47 dB   35.73 dB   34.10 dB
17×17 / 11×11     35.47 dB   34.08 dB   33.47 dB   35.74 dB   34.06 dB
19×19 / 11×11     35.50 dB   34.07 dB   33.48 dB   35.74 dB   34.02 dB
21×21 / 11×11     35.52 dB   34.06 dB   33.47 dB   35.73 dB   33.97 dB
11×11 / 13×13     35.35 dB   34.08 dB   33.43 dB   35.77 dB   34.15 dB
13×13 / 13×13     35.40 dB   34.11 dB   33.46 dB   35.79 dB   34.12 dB
15×15 / 13×13     35.44 dB   34.12 dB   33.47 dB   35.80 dB   34.09 dB
17×17 / 13×13     35.47 dB   34.12 dB   33.48 dB   35.81 dB   34.05 dB
19×19 / 13×13     35.50 dB   34.12 dB   33.48 dB   35.81 dB   34.01 dB
21×21 / 13×13     35.52 dB   34.10 dB   33.48 dB   35.80 dB   33.96 dB
11×11 / 15×15     35.33 dB   34.11 dB   33.43 dB   35.82 dB   34.14 dB
13×13 / 15×15     35.39 dB   34.13 dB   33.45 dB   35.84 dB   34.11 dB
15×15 / 15×15     35.43 dB   34.14 dB   33.47 dB   35.85 dB   34.08 dB
17×17 / 15×15     35.47 dB   34.14 dB   33.48 dB   35.86 dB   34.04 dB
19×19 / 15×15     35.49 dB   34.14 dB   33.48 dB   35.85 dB   34.00 dB
21×21 / 15×15     35.52 dB   34.12 dB   33.48 dB   35.84 dB   33.96 dB
11×11 / 17×17     35.32 dB   34.13 dB   33.42 dB   35.86 dB   34.12 dB
13×13 / 17×17     35.37 dB   34.15 dB   33.44 dB   35.88 dB   34.10 dB
15×15 / 17×17     35.42 dB   34.16 dB   33.46 dB   35.89 dB   34.07 dB
17×17 / 17×17     35.46 dB   34.16 dB   33.47 dB   35.89 dB   34.03 dB
19×19 / 17×17     35.48 dB   34.15 dB   33.47 dB   35.88 dB   34.00 dB
21×21 / 17×17     35.51 dB   34.14 dB   33.47 dB   35.87 dB   33.95 dB

Table 3: PSNR values when the Optimal Weights Filter with $K = K_0$ is applied with different values of $m$ and $M$ ($\sigma = 10$).
σ = 20            Lena       Barbara    Boat       House      Peppers
m / M                        (sizes: 512×512, 512×512, 512×512, 256×256, 256×256)
11×11 / 11×11     32.08 dB   30.60 dB   30.00 dB   32.56 dB   30.65 dB
13×13 / 11×11     32.20 dB   30.70 dB   30.06 dB   32.64 dB   30.68 dB
15×15 / 11×11     32.30 dB   30.78 dB   30.11 dB   32.71 dB   30.70 dB
17×17 / 11×11     32.39 dB   30.84 dB   30.15 dB   32.76 dB   30.70 dB
19×19 / 11×11     32.47 dB   30.88 dB   30.18 dB   32.79 dB   30.70 dB
21×21 / 11×11     32.53 dB   30.91 dB   30.21 dB   32.81 dB   30.69 dB
11×11 / 13×13     32.06 dB   30.67 dB   29.99 dB   32.63 dB   30.61 dB
13×13 / 13×13     32.18 dB   30.78 dB   30.05 dB   32.71 dB   30.64 dB
15×15 / 13×13     32.29 dB   30.86 dB   30.10 dB   32.79 dB   30.66 dB
17×17 / 13×13     32.38 dB   30.92 dB   30.14 dB   32.84 dB   30.67 dB
19×19 / 13×13     32.46 dB   30.97 dB   30.18 dB   32.88 dB   30.67 dB
21×21 / 13×13     32.52 dB   31.00 dB   30.20 dB   32.90 dB   30.66 dB
11×11 / 15×15     32.02 dB   30.71 dB   29.97 dB   32.67 dB   30.56 dB
13×13 / 15×15     32.15 dB   30.82 dB   30.03 dB   32.76 dB   30.59 dB
15×15 / 15×15     32.26 dB   30.90 dB   30.08 dB   32.83 dB   30.62 dB
17×17 / 15×15     32.35 dB   30.96 dB   30.12 dB   32.89 dB   30.63 dB
19×19 / 15×15     32.43 dB   31.01 dB   30.16 dB   32.92 dB   30.63 dB
21×21 / 15×15     32.50 dB   31.04 dB   30.19 dB   32.94 dB   30.63 dB
11×11 / 17×17     31.97 dB   30.72 dB   29.94 dB   32.70 dB   30.52 dB
13×13 / 17×17     32.10 dB   30.83 dB   30.00 dB   32.79 dB   30.56 dB
15×15 / 17×17     32.22 dB   30.92 dB   30.05 dB   32.86 dB   30.58 dB
17×17 / 17×17     32.32 dB   30.98 dB   30.10 dB   32.92 dB   30.59 dB
19×19 / 17×17     32.40 dB   31.02 dB   30.13 dB   32.96 dB   30.60 dB
21×21 / 17×17     32.47 dB   31.06 dB   30.17 dB   32.98 dB   30.60 dB

Table 4: PSNR values when the Optimal Weights Filter with $K = K_0$ is applied with different values of $m$ and $M$ ($\sigma = 20$).
σ = 30            Lena       Barbara    Boat       House      Peppers
m / M                        (sizes: 512×512, 512×512, 512×512, 256×256, 256×256)
11×11 / 11×11     29.96 dB   28.38 dB   27.96 dB   30.26 dB   28.36 dB
13×13 / 11×11     30.10 dB   28.53 dB   28.03 dB   30.39 dB   28.43 dB
15×15 / 11×11     30.23 dB   28.65 dB   28.10 dB   30.50 dB   28.47 dB
17×17 / 11×11     30.34 dB   28.75 dB   28.15 dB   30.58 dB   28.50 dB
19×19 / 11×11     30.43 dB   28.83 dB   28.20 dB   30.65 dB   28.51 dB
21×21 / 11×11     30.50 dB   28.81 dB   28.23 dB   30.70 dB   28.52 dB
11×11 / 13×13     29.94 dB   28.42 dB   27.95 dB   30.35 dB   28.30 dB
13×13 / 13×13     30.08 dB   28.58 dB   28.02 dB   30.49 dB   28.37 dB
15×15 / 13×13     30.21 dB   28.70 dB   28.09 dB   30.60 dB   28.42 dB
17×17 / 13×13     30.32 dB   28.80 dB   28.14 dB   30.68 dB   28.46 dB
19×19 / 13×13     30.42 dB   28.88 dB   28.19 dB   30.75 dB   28.48 dB
21×21 / 13×13     30.50 dB   28.89 dB   28.23 dB   30.80 dB   28.49 dB
11×11 / 15×15     29.89 dB   28.43 dB   27.92 dB   30.39 dB   28.23 dB
13×13 / 15×15     30.04 dB   28.58 dB   27.99 dB   30.53 dB   28.30 dB
15×15 / 15×15     30.17 dB   28.71 dB   28.06 dB   30.64 dB   28.36 dB
17×17 / 15×15     30.28 dB   28.81 dB   28.11 dB   30.73 dB   28.40 dB
19×19 / 15×15     30.38 dB   28.89 dB   28.16 dB   30.80 dB   28.43 dB
11×11 / 17×17     29.82 dB   28.40 dB   27.89 dB   30.39 dB   28.18 dB
13×13 / 17×17     29.98 dB   28.56 dB   27.96 dB   30.54 dB   28.26 dB
15×15 / 17×17     30.11 dB   28.69 dB   28.02 dB   30.66 dB   28.31 dB
17×17 / 17×17     30.22 dB   28.79 dB   28.08 dB   30.76 dB   28.36 dB
19×19 / 17×17     30.33 dB   28.87 dB   28.13 dB   30.84 dB   28.39 dB
21×21 / 17×17     30.42 dB   28.96 dB   28.17 dB   30.89 dB   28.41 dB

Table 5: PSNR values when the Optimal Weights Filter with $K = K_0$ is applied with different values of $m$ and $M$ ($\sigma = 30$).

We begin with some preliminary results. The following lemma can be obtained from Theorem 1 of Sacks and Ylvisaker [22]. For the convenience of readers, we prefer to give a direct proof adapted to our situation.

Lemma 6. Let $g_\rho(w)$ be defined by (13). Then there are unique weights $w_\rho$ which minimize $g_\rho(w)$ subject to (7), given by
$$w_\rho(x) = \frac{1}{\sigma^2}\big(b - \lambda\rho(x)\big)_+, \qquad (36)$$
where $b$ and $\lambda$ are determined by
$$\sum_{x\in U_{x_0,h}} \frac{1}{\sigma^2}\big(b - \lambda\rho(x)\big)_+ = 1, \qquad (37)$$
$$\sum_{x\in U_{x_0,h}} \frac{1}{\sigma^2}\big(b - \lambda\rho(x)\big)_+\,\rho(x) = \lambda. \qquad (38)$$
Proof. Let $w'$ be a minimizer of $g_\rho(w)$ under the constraint (7). According to Theorem 3.9 of Whittle (1971 [24]), there are Lagrange multipliers $b \ge 0$ and $b(x) \ge 0$, $x \in U_{x_0,h}$ (written here as $2b$ and $2b(x)$ for convenience), such that the function
$$G(w) = g_\rho(w) - 2b\Big(\sum_{x\in U_{x_0,h}} w(x) - 1\Big) - 2\sum_{x\in U_{x_0,h}} b(x)\, w(x)$$
is minimized at the same point $w'$. Since the function $G$ is strictly convex, it admits a unique point of minimum. This implies that there is also a unique minimizer of $g_\rho(w)$ under the constraint (7), which coincides with the unique minimizer of $G$.

Let $w_\rho$ be the unique minimizer of $G$ satisfying the constraint (7). Again using the fact that $G$ is strictly convex, for any $x \in U_{x_0,h}$,
$$\frac{\partial}{\partial w(x)} G(w)\Big|_{w=w_\rho} = 2\sum_{y\in U_{x_0,h}} w_\rho(y)\rho(y)\,\rho(x) + 2\sigma^2 w_\rho(x) - 2b - 2b(x) \ge 0. \qquad (39)$$
Note that in general we do not have an equality in (39). In addition, by the Karush-Kuhn-Tucker condition,
$$b(x)\, w_\rho(x) = 0. \qquad (40)$$
Let
$$\lambda = \sum_{y\in U_{x_0,h}} w_\rho(y)\, \rho(y). \qquad (41)$$
Then (39) becomes
$$\frac{1}{2}\frac{\partial}{\partial w(x)} G(w)\Big|_{w=w_\rho} = \lambda\rho(x) + \sigma^2 w_\rho(x) - b - b(x) \ge 0, \quad x \in U_{x_0,h}. \qquad (42)$$
If $b(x) = 0$, then, with respect to the single variable $w(x)$, the function $G(w)$ attains its minimum at an interior point $w_\rho(x) \ge 0$, so that we have
$$\lambda\rho(x) + \sigma^2 w_\rho(x) - b = 0.$$
From this we obtain $b - \lambda\rho(x) = \sigma^2 w_\rho(x) \ge 0$, so
$$w_\rho(x) = \frac{(b - \lambda\rho(x))_+}{\sigma^2}.$$
If $b(x) > 0$, by (40) we have $w_\rho(x) = 0$. Consequently, from (42) we have
$$b - \lambda\rho(x) \le -b(x) \le 0, \qquad (43)$$
so that we get again $w_\rho(x) = 0 = (b - \lambda\rho(x))_+/\sigma^2$. As to the conditions (37) and (38), they follow immediately from the constraint (7) and the equation (41). ∎
Proof of Theorem 1. Applying Lemma 6 with $b = \lambda a$, we see that the unique optimal weights minimizing $g_\rho(w)$ subject to (7) are given by
$$w_\rho(x) = \frac{\lambda}{\sigma^2}\big(a - \rho(x)\big)_+, \qquad (44)$$
where $a$ and $\lambda$ satisfy
$$\lambda \sum_{x\in U_{x_0,h}} \big(a - \rho(x)\big)_+ = \sigma^2 \qquad (45)$$
and
$$\sum_{x\in U_{x_0,h}} \big(a - \rho(x)\big)_+\, \rho(x) = \sigma^2. \qquad (46)$$
Since the function $M_\rho(t) = \sum_{x\in U_{x_0,h}} (t - \rho(x))_+\, \rho(x)$ is strictly increasing and continuous with $M_\rho(0) = 0$ and $\lim_{t\to\infty} M_\rho(t) = +\infty$, the equation $M_\rho(a) = \sigma^2$ has a unique solution $a$ on $(0, \infty)$. By (45),
$$\frac{\sigma^2}{\lambda} = \sum_{x\in U_{x_0,h}} \big(a - \rho(x)\big)_+,$$
which together with (44) implies (16) and (17). ∎

Proof of Remark 2. Expression (14) can be rewritten as
$$M_\rho(t) = \sum_{i=1}^{M} \rho_i\,(t - \rho_i)_+. \qquad (47)$$
Since the function $M_\rho(t)$ is strictly increasing with $M_\rho(0) = 0$ and $M_\rho(+\infty) = +\infty$, equation (17) admits a unique solution $a$ on $(0, +\infty)$, which must be located in some interval $[\rho_k, \rho_{k+1})$, $1 \le k \le M$, where $\rho_{M+1} = \infty$ (see Figure 6). Hence the equation (17) becomes
$$\sum_{i=1}^{k} \rho_i\,(a - \rho_i) = \sigma^2, \qquad (48)$$
where $\rho_k \le a < \rho_{k+1}$. From (48) it follows that
$$a = \frac{\sigma^2 + \sum_{i=1}^k \rho_i^2}{\sum_{i=1}^k \rho_i} = a_k, \quad \rho_k \le a < \rho_{k+1}. \qquad (49)$$
We now show that $k = k^*$ (so that $a = a_{k^*}$), where $k^* := \max\{1 \le k \le M : a_k \ge \rho_k\}$. To this end, it suffices to verify that $a_k \ge \rho_k$ and that $a_{k_1} < \rho_{k_1}$ for $k < k_1 \le M$. We have already seen that $a_k \ge \rho_k$; if $k < k_1 \le M$, then $a_k < \rho_{k+1} \le \rho_{k_1}$, so that
$$a_{k_1} = \frac{a_k \sum_{i=1}^k \rho_i + \sum_{i=k+1}^{k_1} \rho_i^2}{\sum_{i=1}^{k_1} \rho_i} < \frac{\rho_{k_1} \sum_{i=1}^k \rho_i + \sum_{i=k+1}^{k_1} \rho_{k_1}\rho_i}{\sum_{i=1}^{k_1} \rho_i} = \rho_{k_1}. \qquad (50)$$
We finally prove that if $1 \le k < M$ and $a_k < \rho_k$, then $a_{k+1} < \rho_{k+1}$, so that the last equality in (19) holds and $k^*$ is the unique integer $k \in \{1, \cdots, M\}$ such that $a_k \ge \rho_k$ and $a_{k+1} < \rho_{k+1}$ if $1 \le k < M$. In fact, for $1 \le k < M$, the inequality $a_k < \rho_k$ implies that
$$\sigma^2 + \sum_{i=1}^k \rho_i^2 < \rho_k \sum_{i=1}^k \rho_i.$$
Figure 6: The number axis of $\rho_i$, $i = 1, 2, \cdots, M$: the solution $a$ of (17) lies in $[\rho_k, \rho_{k+1})$.

This, in turn, implies that
$$a_{k+1} = \frac{\sigma^2 + \sum_{i=1}^k \rho_i^2 + \rho_{k+1}^2}{\sum_{i=1}^{k+1} \rho_i} < \frac{\rho_k \sum_{i=1}^k \rho_i + \rho_{k+1}^2}{\sum_{i=1}^{k+1} \rho_i} \le \rho_{k+1}. \quad ∎$$

We now turn to the proof of Theorem 3. First assume that $\rho(x) = \rho_{f,x_0}(x) = |f(x) - f(x_0)|$. Recall that $g_\rho$ and $w_\rho$ were defined by (13) and (16). Using the Hölder condition (25) we have, for any $w$,
$$g_\rho(w_\rho) \le g_\rho(w) \le g^0(w),$$
where
$$g^0(w) = \Big(\sum_{x\in U_{x_0,h}} w(x)\, L\|x - x_0\|_\infty^\beta\Big)^2 + \sigma^2 \sum_{x\in U_{x_0,h}} w(x)^2.$$
In particular, denoting $w^0 = \arg\min_w g^0(w)$, we get $g_\rho(w_\rho) \le g^0(w^0)$. By Theorem 1,
$$w^0(x) = \big(a - L\|x - x_0\|_\infty^\beta\big)_+ \Big/ \sum_{y\in U_{x_0,h}} \big(a - L\|y - x_0\|_\infty^\beta\big)_+,$$
where $a > 0$ is the unique solution on $(0, \infty)$ of the equation $M_h(a) = \sigma^2$, with
$$M_h(t) = \sum_{x\in U_{x_0,h}} L\|x - x_0\|_\infty^\beta\, \big(t - L\|x - x_0\|_\infty^\beta\big)_+, \quad t \ge 0.$$
Theorem 3 will be a consequence of the following lemma.
Lemma 7. Assume that $\rho(x) = L\|x - x_0\|_\infty^\beta$ and that $h \ge c_0 n^{-\alpha}$ with $0 \le \alpha < \frac{1}{2\beta+2}$, or $h = c_0 n^{-\frac{1}{2\beta+2}}$ with $c_0 > c_* = \Big(\frac{\sigma^2(2\beta+2)(\beta+2)}{8L^2\beta}\Big)^{\frac{1}{2\beta+2}}$. Then
$$a = c_1 n^{-\frac{\beta}{2\beta+2}} (1 + o(1)) \qquad (51)$$
and
$$g^0(w^0) \le c_2\, n^{-\frac{2\beta}{2\beta+2}}, \qquad (52)$$
where $c_1$ and $c_2$ are positive constants depending only on $\beta$, $L$ and $\sigma$.

Proof. We first prove (51) in the case where $h = 1$, i.e. $U_{x_0,h} = I$. Then, by the definition of $a$, we have
$$M_1(a) = \sum_{x\in I} \big(a - L\|x - x_0\|_\infty^\beta\big)_+\, L\|x - x_0\|_\infty^\beta = \sigma^2. \qquad (53)$$
Let $h_0 = (a/L)^{1/\beta}$. Then $a - L\|x - x_0\|_\infty^\beta \ge 0$ if and only if $\|x - x_0\|_\infty \le h_0$. So from (53) we get
$$L^2 h_0^\beta \sum_{\|x-x_0\|_\infty \le h_0} \|x - x_0\|_\infty^\beta - L^2 \sum_{\|x-x_0\|_\infty \le h_0} \|x - x_0\|_\infty^{2\beta} = \sigma^2. \qquad (54)$$
By the definition of the neighborhood $U_{x_0,h_0}$, it is easily seen that
$$\sum_{\|x-x_0\|_\infty \le h_0} \|x - x_0\|_\infty^\beta = 8 N^{-\beta} \sum_{k=1}^{Nh_0} k^{\beta+1} = \frac{8 N^2 h_0^{\beta+2}}{\beta+2}\,(1 + o(1))$$
and
$$\sum_{\|x-x_0\|_\infty \le h_0} \|x - x_0\|_\infty^{2\beta} = 8 N^{-2\beta} \sum_{k=1}^{Nh_0} k^{2\beta+1} = \frac{8 N^2 h_0^{2\beta+2}}{2\beta+2}\,(1 + o(1)).$$
Therefore, (54) implies
$$\frac{8 L^2 \beta}{(\beta+2)(2\beta+2)}\, N^2 h_0^{2\beta+2}\,(1 + o(1)) = \sigma^2,$$
from which we infer that
$$h_0 = c_*\, n^{-\frac{1}{2\beta+2}} (1 + o(1)) \qquad (55)$$
with $c_* = \Big(\frac{\sigma^2(\beta+2)(2\beta+2)}{8L^2\beta}\Big)^{\frac{1}{2\beta+2}}$. From (55) and the definition of $h_0$, we obtain
$$a = L h_0^\beta = L c_*^\beta\, n^{-\frac{\beta}{2\beta+2}} (1 + o(1)),$$
which proves (51) in the case where $h = 1$.

We next prove (51) under the conditions of the lemma. If $h \ge c_0 n^{-\alpha}$, where $0 \le \alpha < \frac{1}{2\beta+2}$, then clearly $h \ge h_0$ for $n$ sufficiently large. Therefore $M_h(a) = M_1(a)$, and we arrive at equation (53), from which we deduce (55). If $h = c_0 n^{-\frac{1}{2\beta+2}}$ and $c_0 > c_*$, then again $h \ge h_0$ for $n$ sufficiently large; therefore $M_h(a) = M_1(a)$, and we arrive again at (55).

We finally prove (52). Denote for brevity
$$G_h = \sum_{\|x-x_0\|_\infty \le h} \big(h_0^\beta - \|x - x_0\|_\infty^\beta\big)_+.$$
Since $h \ge h_0$ for $n$ sufficiently large, we have $M_h(a) = M_{h_0}(a) = \sigma^2$ and $G_h = G_{h_0}$. Then, using (44)-(46), it is easy to see that the minimal value of $g^0$ satisfies
$$g^0(w^0) = \lambda a = \frac{\sigma^2 a}{L\, G_h}, \quad\text{where } \lambda = \frac{\sigma^2}{\sum_{x} (a - L\|x-x_0\|_\infty^\beta)_+} = \frac{\sigma^2}{L\, G_h}.$$
Since
$$G_h = \sum_{\|x-x_0\|_\infty \le h_0} \big(h_0^\beta - \|x - x_0\|_\infty^\beta\big) = h_0^\beta \sum_{1\le k\le Nh_0} 8k - N^{-\beta} \sum_{1\le k\le Nh_0} 8k^{\beta+1} = \frac{4\beta}{\beta+2}\, N^2 h_0^{\beta+2} (1 + o(1)) = \frac{4\beta}{(\beta+2)\, L^{(\beta+2)/\beta}}\, N^2 a^{(\beta+2)/\beta} (1 + o(1)),$$
we obtain
$$g^0(w^0) = \frac{\sigma^2(\beta+2)}{4\beta}\, L^{2/\beta}\, a^{-2/\beta}\, N^{-2}\, (1 + o(1)) = c_2\, n^{-\frac{2\beta}{2\beta+2}} (1 + o(1)),$$
where $c_2$ is a constant depending only on $\beta$, $L$ and $\sigma$. ∎

Proof of Theorem 3. As $\rho(x) = |f(x) - f(x_0)| + \delta_n$, we have
$$\sum_{x\in U_{x_0,h}} w^0(x)\, \rho(x) \le \sum_{x\in U_{x_0,h}} w^0(x)\, |f(x) - f(x_0)| + \delta_n \le \sum_{x\in U_{x_0,h}} w^0(x)\, L\|x-x_0\|_\infty^\beta + \delta_n,$$
and the first sum on the right-hand side is bounded by $\sqrt{g^0(w^0)}$. Hence
$$g_\rho(w_\rho) \le g_\rho(w^0) \le \Big(\sqrt{g^0(w^0)} + \delta_n\Big)^2.$$
Therefore, by Lemma 7 and the condition $\delta_n = O\big(n^{-\frac{\beta}{2\beta+2}}\big)$, we obtain $g_\rho(w_\rho) = O\big(n^{-\frac{2\beta}{2\beta+2}}\big)$. This gives (26). ∎
We now turn to the proof of Theorem 4. We begin with a decomposition of $\widehat\rho''_{x_0}(x)$. Note that
$$\widehat\rho''_{x_0}(x) = \big(d\big(Y''_{x,\eta}, Y''_{x_0,\eta}\big) - \sqrt{2}\,\sigma\big)_+ \le \big|d\big(Y''_{x,\eta}, Y''_{x_0,\eta}\big) - \sqrt{2}\,\sigma\big|. \qquad (56)$$
Recall that $M' = \operatorname{card} U'_{x_0,h} \approx nh^2/2$ and $m'' = \operatorname{card} V''_{x_0,\eta} \approx n\eta^2/2$. Let $T_{x_0,x}$ be the translation mapping $T_{x_0,x}\, y = x + (y - x_0)$. Denote
$$\Delta_{x_0,x}(y) = f(y) - f(T_{x_0,x}\, y) \quad\text{and}\quad \zeta(y) = \varepsilon(y) - \varepsilon(T_{x_0,x}\, y).$$
Since $Y(y) - Y(T_{x_0,x}\, y) = \Delta_{x_0,x}(y) + \zeta(y)$, it is easy to see that
$$d^2\big(Y''_{x,\eta}, Y''_{x_0,\eta}\big) = \frac{1}{m''} \sum_{y\in V''_{x_0,\eta}} \big(\Delta_{x_0,x}(y) + \zeta(y)\big)^2 = \bar\Delta^2(x) + S(x) + 2\sigma^2,$$
where
$$\bar\Delta^2(x) = \frac{1}{m''} \sum_{y\in V''_{x_0,\eta}} \Delta^2_{x_0,x}(y), \qquad (57)$$
$$S(x) = 2 S_1(x) + S_2(x), \qquad (58)$$
with
$$S_1(x) = \frac{1}{m''} \sum_{y\in V''_{x_0,\eta}} \Delta_{x_0,x}(y)\, \zeta(y), \qquad S_2(x) = \frac{1}{m''} \sum_{y\in V''_{x_0,\eta}} \big(\zeta^2(y) - 2\sigma^2\big).$$
Notice that $E\, S_1(x) = E\, S_2(x) = E\, S(x) = 0$. Then, obviously,
$$d\big(Y''_{x,\eta}, Y''_{x_0,\eta}\big) - \sqrt{2}\,\sigma = \sqrt{\bar\Delta^2(x) + S(x) + 2\sigma^2} - \sqrt{2}\,\sigma = \frac{\bar\Delta^2(x) + S(x)}{\sqrt{\bar\Delta^2(x) + S(x) + 2\sigma^2} + \sqrt{2}\,\sigma}. \qquad (59)$$
First we prove the following lemma.

Lemma 8.
Suppose that the function $f$ satisfies the local Hölder condition (25). Then, for any $x \in U'_{x_0,h}$,
$$\tfrac{1}{3}\rho^2_{f,x_0}(x) - 2L^2\eta^{2\beta} \le \bar\Delta^2(x) \le 3\rho^2_{f,x_0}(x) + 6L^2\eta^{2\beta}.$$

Proof. By the decomposition
$$f(y) - f(T_{x_0,x}(y)) = [f(x_0) - f(x)] + [f(y) - f(x_0)] + [f(x) - f(T_{x_0,x}(y))]$$
and the inequality $(a + b + c)^2 \le 3(a^2 + b^2 + c^2)$, we obtain
$$\bar\Delta^2(x) = \frac{1}{m''} \sum_{y\in V''_{x_0,\eta}} \big(f(y) - f(T_{x_0,x}(y))\big)^2 \le 3\big(f(x) - f(x_0)\big)^2 + \frac{3}{m''} \sum_{y\in V''_{x_0,\eta}} \big(f(y) - f(x_0)\big)^2 + \frac{3}{m''} \sum_{y\in V''_{x_0,\eta}} \big(f(x) - f(T_{x_0,x}(y))\big)^2.$$
Since $\|y - x_0\|_\infty \le \eta$ and $\|T_{x_0,x}(y) - x\|_\infty = \|y - x_0\|_\infty \le \eta$, the local Hölder condition (25) implies
$$\bar\Delta^2(x) \le 3\rho^2_{f,x_0}(x) + 3L^2\eta^{2\beta} + 3L^2\eta^{2\beta},$$
which gives the upper bound. The lower bound can be proved similarly, using the inequality $(a + b + c)^2 \ge a^2/3 - b^2 - c^2$. ∎

We first prove a large deviation inequality for $S(x)$.

Lemma 9.
Let $S(x)$ be defined by (58). Then there are two constants $c_3$ and $c_4$ such that, for any $0 \le z \le c_3 (m'')^{1/2}$,
$$P\Big(|S(x)| \ge \frac{z}{\sqrt{m''}}\Big) \le 2\exp\big(-c_4 z^2\big).$$

Proof. Denote $\xi(y) = \zeta^2(y) - 2\sigma^2 + 2\Delta_{x_0,x}(y)\,\zeta(y)$. Since $\zeta(y) = \varepsilon(y) - \varepsilon(T_{x_0,x}\, y)$ is a normal random variable with mean $0$ and variance $2\sigma^2$, the random variable $\xi(y)$ has an exponential moment: there exist two positive constants $t_0$ and $c_0$, depending only on $\beta$, $L$ and $\sigma$, such that $\varphi_y(t) = E\, e^{t\xi(y)} \le c_0$ for any $|t| \le t_0$. Let $\psi_y(t) = \ln \varphi_y(t)$ be the cumulant generating function. By Chebyshev's exponential inequality we get
$$P\Big(S(x) > \frac{z}{\sqrt{m''}}\Big) \le \exp\Big(-tz\sqrt{m''} + \sum_{y\in V''_{x_0,\eta}} \psi_y(t)\Big),$$
for any $0 \le t \le t_0$ and any $z > 0$. By the three-term Taylor expansion, for $|t| \le t_0$,
$$\psi_y(t) = \psi_y(0) + t\psi'_y(0) + \frac{t^2}{2}\psi''_y(\theta t), \quad\text{where } |\theta| \le 1,$$
$\psi_y(0) = 0$, $\psi'_y(0) = E\,\xi(y) = 0$, and
$$0 \le \psi''_y(t) = \frac{\varphi''_y(t)\varphi_y(t) - (\varphi'_y(t))^2}{(\varphi_y(t))^2} \le \frac{\varphi''_y(t)}{\varphi_y(t)}.$$
Since $\varphi_y(t) = E\, e^{t\xi(y)} \ge e^{t\, E\xi(y)} = 1$, we obtain the upper bound $\psi''_y(t) \le \varphi''_y(t) = E\,\xi^2(y)\, e^{t\xi(y)}$. Splitting the expectation according to the sign of $\xi(y)$ and using the elementary inequality $x^2 e^x \le e^{3x}$, $x \ge 0$, we have, for $0 \le t \le t_0/3$,
$$\psi''_y(t) \le \frac{9}{t_0^2}\, E\, e^{t_0\xi(y)} + E\,\xi^2(y) \le c'_0,$$
for some constant $c'_0$ depending only on $\beta$, $L$ and $\sigma$. This implies that, for $0 \le t \le t_0/3$, $\psi_y(t) \le c'_0 t^2/2$ and
$$P\Big(S(x) > \frac{z}{\sqrt{m''}}\Big) \le \exp\Big(-tz\sqrt{m''} + \frac{c'_0 t^2}{2}\, m''\Big).$$
Choosing $t = z/(c'_0\sqrt{m''})$, which satisfies $t \le t_0/3$ provided $z \le c_3 (m'')^{1/2}$ with $c_3 = c'_0 t_0/3$, we obtain
$$P\Big(S(x) > \frac{z}{\sqrt{m''}}\Big) \le \exp\big(-c_4 z^2\big)$$
for the constant $c_4 = 1/(2c'_0) > 0$. In the same way we show that $P\big(S(x) < -z/\sqrt{m''}\big) \le \exp(-c_4 z^2)$. This proves the lemma. ∎

We next prove that $\widehat\rho''_{x_0}(x)$ is uniformly of order $O\big(n^{-\frac{\beta}{2\beta+2}}\sqrt{\ln n}\big)$ with probability $1 - O(n^{-1})$, if $h$ is of order $n^{-\frac{1}{2\beta+2}}$.

Lemma 10.
Suppose that the function $f$ satisfies the local Hölder condition (25). Assume that $h = c_0 n^{-\frac{1}{2\beta+2}}$ with $c_0 > c_* = \Big(\frac{\sigma^2(\beta+2)(2\beta+2)}{8L^2\beta}\Big)^{\frac{1}{2\beta+2}}$ and that $\eta = c_2 n^{-\frac{1}{2\beta+2}}$. Then there exists a constant $c_6 > 0$, depending only on $\beta$, $L$ and $\sigma$, such that
$$P\Big(\max_{x\in U'_{x_0,h}} \widehat\rho''_{x_0}(x) \ge c_6\, n^{-\frac{\beta}{2\beta+2}}\sqrt{\ln n}\Big) = O\big(n^{-1}\big). \qquad (60)$$

Proof. Using Lemma 9, there are two constants $c_3$, $c_4$ such that, for any $z$ satisfying $0 \le z \le c_3 (m'')^{1/2}$,
$$P\Big(\max_{x\in U'_{x_0,h}} |S(x)| \ge \frac{z}{\sqrt{m''}}\Big) \le \sum_{x\in U'_{x_0,h}} P\Big(|S(x)| \ge \frac{z}{\sqrt{m''}}\Big) \le 2M' \exp\big(-c_4 z^2\big).$$
Note that $m'' \approx n\eta^2/2 \ge c\, n^{\frac{2\beta}{2\beta+2}}$. Letting $z = \sqrt{c \ln n}$ with $c$ sufficiently large, we obtain
$$P\Big(\max_{x\in U'_{x_0,h}} |S(x)| \ge c_5\, n^{-\frac{\beta}{2\beta+2}}\sqrt{\ln n}\Big) \le \frac{c}{n}. \qquad (61)$$
Using Lemma 8 and the local Hölder condition (25), we have $\bar\Delta^2(x) \le c L^2 h^{2\beta}$ for $x \in U'_{x_0,h}$. From (56) and (59), with probability $1 - O(n^{-1})$ we have
$$\max_{x\in U'_{x_0,h}} \widehat\rho''_{x_0}(x) \le \max_{x\in U'_{x_0,h}} \frac{\bar\Delta^2(x) + |S(x)|}{\sqrt{\bar\Delta^2(x) + S(x) + 2\sigma^2} + \sqrt{2}\,\sigma} \le \frac{c L^2 h^{2\beta} + c_5\, n^{-\frac{\beta}{2\beta+2}}\sqrt{\ln n}}{\sqrt{2}\,\sigma}.$$
Since $h^{2\beta} = O\big(n^{-\frac{2\beta}{2\beta+2}}\big)$, this gives the desired result. ∎

We then prove that, given $\{Y(x), x \in I''_{x_0}\}$, the conditional expectation of $|\widehat f'_{h,\eta}(x_0) - f(x_0)|^2$ is of order $O\big(n^{-\frac{2\beta}{2\beta+2}} \ln n\big)$ with probability $1 - O(n^{-1})$.

Lemma 11.
Suppose that the conditions of Theorem 4 are satisfied. Then
$$P\Big(E\big\{|\widehat f'_{h,\eta}(x_0) - f(x_0)|^2 \,\big|\, Y(x), x \in I''_{x_0}\big\} \ge c\, n^{-\frac{2\beta}{2\beta+2}} \ln n\Big) = O(n^{-1}),$$
where $c > 0$ is a constant depending only on $\beta$, $L$ and $\sigma$.

Proof. By (29) and the independence of the $\varepsilon(x)$, we have
$$E\big\{|\widehat f'_{h,\eta}(x_0) - f(x_0)|^2 \,\big|\, Y(x), x \in I''_{x_0}\big\} \le \Big(\sum_{x\in U'_{x_0,h}} \widehat w''(x)\, \rho_{f,x_0}(x)\Big)^2 + \sigma^2 \sum_{x\in U'_{x_0,h}} \widehat w''(x)^2. \qquad (62)$$
Since $\rho_{f,x_0}(x) \le Lh^\beta$, from (62) we get
$$E\{\cdot\} \le \Big(\sum_{x\in U'_{x_0,h}} \widehat w''(x)\, Lh^\beta\Big)^2 + \sigma^2 \sum \widehat w''(x)^2 \le L^2 h^{2\beta} + \sigma^2 \sum \widehat w''(x)^2 \le \Big(\sum_{x\in U'_{x_0,h}} \widehat w''(x)\, \widehat\rho''_{x_0}(x)\Big)^2 + \sigma^2 \sum \widehat w''(x)^2 + L^2 h^{2\beta}. \qquad (63)$$
Let $w^* = \arg\min_w g(w)$, where
$$g(w) = \Big(\sum_{x\in U'_{x_0,h}} w(x)\, \rho_{f,x_0}(x)\Big)^2 + \sigma^2 \sum_{x\in U'_{x_0,h}} w(x)^2. \qquad (64)$$
As $\widehat w''$ minimizes the function in (30), from (63) we obtain
$$E\{\cdot\} \le \Big(\sum_{x\in U'_{x_0,h}} w^*(x)\, \widehat\rho''_{x_0}(x)\Big)^2 + \sigma^2 \sum w^*(x)^2 + L^2 h^{2\beta}. \qquad (65)$$
By Lemma 10 and $\sum w^* = 1$, with probability $1 - O(n^{-1})$,
$$\sum_{x\in U'_{x_0,h}} w^*(x)\, \widehat\rho''_{x_0}(x) \le c_6\, n^{-\frac{\beta}{2\beta+2}}\sqrt{\ln n}.$$
Therefore, by (65), with probability $1 - O(n^{-1})$,
$$E\{\cdot\} \le \sigma^2 \sum w^*(x)^2 + c_6^2\, n^{-\frac{2\beta}{2\beta+2}} \ln n + L^2 h^{2\beta} \le g(w^*) + c_6^2\, n^{-\frac{2\beta}{2\beta+2}} \ln n + L^2 h^{2\beta}.$$
This gives the assertion of Lemma 11, since $h^{2\beta} = O\big(n^{-\frac{2\beta}{2\beta+2}}\big)$ and $g(w^*) = O\big(n^{-\frac{2\beta}{2\beta+2}}\big)$ by Lemma 7, with $U'_{x_0,h}$ instead of $U_{x_0,h}$. ∎

Now we are ready to prove Theorem 4.
Proof of Theorem 4. Since the function $f$ satisfies Hölder's condition, by the definition of $g(w)$ (cf. (64)) we have
$$g(\widehat w'') \le \Big(\sum \widehat w''(x)\, Lh^\beta\Big)^2 + \sigma^2 \sum \widehat w''(x)^2 \le L^2 h^{2\beta} + \sigma^2 \le L^2 + \sigma^2,$$
so that
$$E\big(|\widehat f'_{h,\eta}(x_0) - f(x_0)|^2 \,\big|\, Y(x), x \in I''_{x_0}\big) \le g(\widehat w'') \le L^2 + \sigma^2.$$
Denote by $X$ the conditional expectation in the above display and write $\mathbf 1\{\cdot\}$ for the indicator function of the set $\{\cdot\}$. Then
$$E X = E\, X \cdot \mathbf 1\{X \ge c n^{-\frac{2\beta}{2\beta+2}} \ln n\} + E\, X \cdot \mathbf 1\{X < c n^{-\frac{2\beta}{2\beta+2}} \ln n\} \le \big(L^2 + \sigma^2\big)\, P\{X \ge c n^{-\frac{2\beta}{2\beta+2}} \ln n\} + c n^{-\frac{2\beta}{2\beta+2}} \ln n.$$
So, applying Lemma 11, we see that
$$E\big(|\widehat f'_{h,\eta}(x_0) - f(x_0)|^2\big) = E X \le O(n^{-1}) + c n^{-\frac{2\beta}{2\beta+2}} \ln n = O\big(n^{-\frac{2\beta}{2\beta+2}} \ln n\big).$$
This proves Theorem 4. ∎
We finally turn to the proof of Theorem 5. We keep the notations of the previous subsection. The following result gives a two-sided bound for $\widehat\rho''_{x_0}(x)$.

Lemma 12. Suppose that the function $f$ satisfies the local Hölder condition (25). Assume that $h = c_0 n^{-\alpha}$ with $c_0 > 0$ and $0 < \alpha \le \frac{1}{2\beta+2}$, and that $\eta = c_2 n^{-\frac{1}{2\beta+2}}$. Then there exist positive constants $c_7$, $c_8$, $c_9$ and $c_{10}$, depending only on $\beta$, $L$ and $\sigma$, such that
$$P\Big(\max_{x\in U'_{x_0,h}} \big(\widehat\rho''_{x_0}(x) - c_7\, \rho^2_{f,x_0}(x)\big) \le c_8\, n^{-\frac{\beta}{2\beta+2}}\sqrt{\ln n}\Big) = 1 - O\big(n^{-1}\big) \qquad (66)$$
and
$$P\Big(\max_{x\in U'_{x_0,h}} \big(\rho^2_{f,x_0}(x) - c_9\, \widehat\rho''_{x_0}(x)\big) \le c_{10}\, n^{-\frac{\beta}{2\beta+2}}\sqrt{\ln n}\Big) = 1 - O\big(n^{-1}\big). \qquad (67)$$

Proof. As in the proof of Lemma 10, we have
$$P\Big(\max_{x\in U'_{x_0,h}} |S(x)| \ge c_5\, n^{-\frac{\beta}{2\beta+2}}\sqrt{\ln n}\Big) \le \frac{c}{n}.$$
Using Lemma 8, for any $x \in U'_{x_0,h}$,
$$\tfrac{1}{3}\rho^2_{f,x_0}(x) - 2L^2\eta^{2\beta} \le \bar\Delta^2(x) \le 3\rho^2_{f,x_0}(x) + 6L^2\eta^{2\beta}. \qquad (68)$$
From (56) and (59) we have
$$\widehat\rho''_{x_0}(x) = \big(d\big(Y''_{x,\eta}, Y''_{x_0,\eta}\big) - \sqrt{2}\,\sigma\big)_+ = \frac{\big(\bar\Delta^2(x) + S(x)\big)_+}{\sqrt{\bar\Delta^2(x) + S(x) + 2\sigma^2} + \sqrt{2}\,\sigma}.$$
For the upper bound, since the denominator is at least $\sqrt{2}\,\sigma$, for any $x \in U'_{x_0,h}$,
$$\widehat\rho''_{x_0}(x) \le \frac{3\rho^2_{f,x_0}(x) + 6L^2\eta^{2\beta} + |S(x)|}{\sqrt{2}\,\sigma}.$$
Therefore, with probability $1 - O(n^{-1})$,
$$\max_{x\in U'_{x_0,h}} \Big(\widehat\rho''_{x_0}(x) - \frac{3}{\sqrt{2}\,\sigma}\rho^2_{f,x_0}(x)\Big) \le \frac{6L^2\eta^{2\beta} + c_5\, n^{-\frac{\beta}{2\beta+2}}\sqrt{\ln n}}{\sqrt{2}\,\sigma} \le c_8\, n^{-\frac{\beta}{2\beta+2}}\sqrt{\ln n},$$
which gives (66) with $c_7 = 3/(\sqrt{2}\,\sigma)$. For the lower bound, note that on the set $\big\{\max_{x\in U'_{x_0,h}} |S(x)| < c_5\, n^{-\frac{\beta}{2\beta+2}}\sqrt{\ln n}\big\}$ the denominator $\sqrt{\bar\Delta^2(x) + S(x) + 2\sigma^2} + \sqrt{2}\,\sigma$ is bounded by a constant (by (68) and since $\rho_{f,x_0}(x) \le Lh^\beta$ is bounded), so that
$$\widehat\rho''_{x_0}(x) \ge c\,\big(\bar\Delta^2(x) + S(x)\big)_+ \ge c\,\big(\bar\Delta^2(x) - |S(x)|\big) \ge c\Big(\tfrac{1}{3}\rho^2_{f,x_0}(x) - 2L^2\eta^{2\beta} - |S(x)|\Big).$$
Therefore, with probability $1 - O(n^{-1})$,
$$\max_{x\in U'_{x_0,h}} \Big(\rho^2_{f,x_0}(x) - \frac{3}{c}\,\widehat\rho''_{x_0}(x)\Big) \le 3\big(2L^2\eta^{2\beta} + c_5\, n^{-\frac{\beta}{2\beta+2}}\sqrt{\ln n}\big) \le c_{10}\, n^{-\frac{\beta}{2\beta+2}}\sqrt{\ln n},$$
so the lemma is proved with $c_9 = 3/c$. ∎

We then prove that, given $\{Y(x), x \in I''_{x_0}\}$, the conditional expectation of $|\widehat f'_{h,\eta}(x_0) - f(x_0)|^2$ is of order $O\big(n^{-\frac{\beta}{2\beta+2}} \ln n\big)$ with probability $1 - O(n^{-1})$.

Lemma 13.
Suppose that the conditions of Theorem 5 are satisfied. Then
$$P\Big(E\big\{|\widehat f'_{h,\eta}(x_0) - f(x_0)|^2 \,\big|\, Y(x), x \in I''_{x_0}\big\} \ge c\, n^{-\frac{\beta}{2\beta+2}} \ln n\Big) = O(n^{-1}),$$
where $c > 0$ is a constant depending only on $\beta$, $L$ and $\sigma$.

Proof. By (29) and the independence of the $\varepsilon(x)$, we have
$$E\big\{|\widehat f'_{h,\eta}(x_0) - f(x_0)|^2 \,\big|\, Y(x), x \in I''_{x_0}\big\} \le \Big(\sum_{x\in U'_{x_0,h}} \widehat w''(x)\, \rho_{f,x_0}(x)\Big)^2 + \sigma^2 \sum_{x\in U'_{x_0,h}} \widehat w''(x)^2.$$
Since, by Lemma 12, with probability $1 - O(n^{-1})$,
$$\max_{x\in U'_{x_0,h}} \big(\rho^2_{f,x_0}(x) - c_9\, \widehat\rho''_{x_0}(x)\big) \le c_{10}\, n^{-\frac{\beta}{2\beta+2}}\sqrt{\ln n},$$
the inequality $\sqrt{u+v} \le \sqrt{u} + \sqrt{v}$ gives $\rho_{f,x_0}(x) \le \sqrt{c_9}\sqrt{\widehat\rho''_{x_0}(x)} + \sqrt{c_{10}}\, n^{-\frac{\beta}{4\beta+4}} (\ln n)^{1/4}$, and hence, with probability $1 - O(n^{-1})$,
$$E\{\cdot\} \le c\Big(\sum_{x\in U'_{x_0,h}} \widehat w''(x)\sqrt{\widehat\rho''_{x_0}(x)}\Big)^2 + c\, n^{-\frac{\beta}{2\beta+2}}\sqrt{\ln n} + \sigma^2 \sum \widehat w''(x)^2. \qquad (69)$$
A simple truncation argument, splitting according to whether $\widehat\rho''_{x_0}(x) \le n^{-\frac{\beta}{2\beta+2}}$ or not, gives
$$\sum \widehat w''\sqrt{\widehat\rho''_{x_0}} \le n^{-\frac{\beta}{4\beta+4}} \sum \widehat w'' + n^{\frac{\beta}{4\beta+4}} \sum \widehat w''\,\widehat\rho''_{x_0} = n^{-\frac{\beta}{4\beta+4}} + n^{\frac{\beta}{4\beta+4}} \sum \widehat w''\,\widehat\rho''_{x_0}. \qquad (70)$$
From (69) and (70) one gets
$$E\{\cdot\} \le c\, n^{\frac{\beta}{2\beta+2}} \Big(\sum \widehat w''\,\widehat\rho''_{x_0}\Big)^2 + c\,\sigma^2 \sum \widehat w''(x)^2 + c\, n^{-\frac{\beta}{2\beta+2}}\sqrt{\ln n}.$$
As $\widehat w''$ minimizes the function in (30) and $n^{\frac{\beta}{2\beta+2}} \ge 1$, we obtain, with $w^* = \arg\min_w g(w)$ and $g$ defined by (64),
$$E\{\cdot\} \le c\, n^{\frac{\beta}{2\beta+2}} \Big[\Big(\sum_{x\in U'_{x_0,h}} w^*(x)\,\widehat\rho''_{x_0}(x)\Big)^2 + \sigma^2 \sum w^*(x)^2\Big] + c\, n^{-\frac{\beta}{2\beta+2}}\sqrt{\ln n}. \qquad (71)$$
By Lemma 12, with probability $1 - O(n^{-1})$,
$$\max_{x\in U'_{x_0,h}} \big(\widehat\rho''_{x_0}(x) - c_7\, \rho^2_{f,x_0}(x)\big) \le c_8\, n^{-\frac{\beta}{2\beta+2}}\sqrt{\ln n};$$
since $\rho_{f,x_0}(x) \le Lh^\beta$ is bounded, this yields $\sum w^*\widehat\rho''_{x_0} \le c \sum w^*\rho_{f,x_0} + c_8\, n^{-\frac{\beta}{2\beta+2}}\sqrt{\ln n}$. Therefore, with probability $1 - O(n^{-1})$,
$$E\{\cdot\} \le c\, n^{\frac{\beta}{2\beta+2}}\, g(w^*) + c\, n^{-\frac{\beta}{2\beta+2}} \ln n.$$
The assertion follows, since $g(w^*) = O\big(n^{-\frac{2\beta}{2\beta+2}}\big)$ by Lemma 7, with $U'_{x_0,h}$ instead of $U_{x_0,h}$. ∎

Proof of Theorem 5. Since the function $f$ satisfies Hölder's condition, by the definition of $g(w)$ (cf. (64)) we have, as in the proof of Theorem 4, $g(w) \le L^2 + \sigma^2$, so that
$$E\big(|\widehat f'_{h,\eta}(x_0) - f(x_0)|^2 \,\big|\, Y(x), x \in I''_{x_0}\big) \le g(\widehat w'') \le L^2 + \sigma^2.$$
Denote by $X$ the conditional expectation in the above display. Then
$$E X = E\, X \cdot \mathbf 1\{X \ge c n^{-\frac{\beta}{2\beta+2}} \ln n\} + E\, X \cdot \mathbf 1\{X < c n^{-\frac{\beta}{2\beta+2}} \ln n\} \le \big(L^2 + \sigma^2\big)\, P\{X \ge c n^{-\frac{\beta}{2\beta+2}} \ln n\} + c n^{-\frac{\beta}{2\beta+2}} \ln n.$$
So, applying Lemma 13, we see that
$$E\big(|\widehat f'_{h,\eta}(x_0) - f(x_0)|^2\big) = E X \le O(n^{-1}) + c n^{-\frac{\beta}{2\beta+2}} \ln n = O\big(n^{-\frac{\beta}{2\beta+2}} \ln n\big).$$
This proves Theorem 5. ∎
A new image denoising filter to deal with the additive Gaussian white noise model, based on a weights optimization problem, is proposed. The proposed algorithm is computationally fast and its implementation is straightforward. Our work leads to the following conclusions.

1. In the non-local means filter the choice of the Gaussian kernel is not justified. Our approach shows that it is preferable to choose the triangular kernel.
2. The obtained estimator is shown to converge at the usual optimal rate, under some regularity conditions on the target function. To the best of our knowledge, such convergence results have not been established so far.
3. Our filter is parameter free in the sense that it chooses the bandwidth parameter automatically.
4. Our numerical results confirm that the optimal choice of the kernel improves the performance of the non-local means filter, under the same conditions.

References

[1] A. Buades, B. Coll, and J.M. Morel. A review of image denoising algorithms, with a new one. Multiscale Model. Simul., 4(2):490-530, 2006.
[2] T. Buades, Y. Lou, J.M. Morel, and Z. Tang. A note on multi-image denoising. In Int. Workshop on Local and Non-Local Approximation in Image Processing, pages 1-15, August 2009.
[3] J.F. Cai, R.H. Chan, and M. Nikolova. Two-phase approach for deblurring images corrupted by impulse plus Gaussian noise. Inverse Problems and Imaging, 2(2):187-204, 2008.
[4] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian. Color image denoising via sparse 3D collaborative filtering with grouping constraint in luminance-chrominance space. In IEEE Int. Conf. Image Process., volume 1, pages 313-316, September 2007.
[5] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian. Image denoising by sparse 3-D transform-domain collaborative filtering. IEEE Trans. Image Process., 16(8):2080-2095, 2007.
[6] D.L. Donoho and I.M. Johnstone. Ideal spatial adaptation by wavelet shrinkage. Biometrika, 81(3):425-455, 1994.
[7] J. Fan and I. Gijbels. Local Polynomial Modelling and Its Applications. Chapman & Hall, London, 1996.
[8] R. Garnett, T. Huegerich, C. Chui, and W. He. A universal noise removal algorithm with an impulse detector. IEEE Trans. Image Process., 14(11):1747-1754, 2005.
[9] V. Katkovnik, A. Foi, K. Egiazarian, and J. Astola. From local kernel to nonlocal multiple-model image denoising. Int. J. Comput. Vis., 86(1):1-32, 2010.
[10] C. Kervrann and J. Boulanger. Optimal spatial adaptation for patch-based image denoising. IEEE Trans. Image Process., 15(10):2866-2878, 2006.
[11] C. Kervrann and J. Boulanger. Local adaptivity to variable smoothness for exemplar-based image regularization and representation. Int. J. Comput. Vis., 79(1):45-69, 2008.
[12] C. Kervrann, J. Boulanger, and P. Coupé. Bayesian non-local means filter, image redundancy and adaptive dictionaries for noise removal. In Proc. Conf. Scale-Space and Variational Meth. (SSVM'07), pages 520-532. Springer, June 2007.
[13] B. Li, Q.S. Liu, J.W. Xu, and X.J. Luo. A new method for removing mixed noises. Sci. China Ser. F (Information Sciences), 54:1-9, 2010.
[14] Y. Lou, X. Zhang, S. Osher, and A. Bertozzi. Image recovery via nonlocal operators. J. Sci. Comput., 42(2):185-197, 2010.
[15] A.V. Nazin, J. Roll, L. Ljung, and I. Grama. Direct weight optimization in statistical estimation and system identification. In System Identification and Control Problems (SICPRO'08), Moscow, January 28-31, 2008.
[16] J. Polzehl and V. Spokoiny. Image denoising: pointwise adaptive approach. Ann. Stat., 31(1):30-57, 2003.
[17] J. Polzehl and V. Spokoiny. Propagation-separation approach for local likelihood estimation. Probab. Theory Rel., 135(3):335-362, 2006.
[18] J. Polzehl and V.G. Spokoiny. Adaptive weights smoothing with applications to image restoration. J. Roy. Stat. Soc. B, 62(2):335-354, 2000.
[19] J. Roll. Local and piecewise affine approaches to system identification. Ph.D. dissertation, Dept. Elect. Eng., Linköping University, Linköping, Sweden, 2003.
[20] J. Roll and L. Ljung. Extending the direct weight optimization approach. Technical Report LiTH-ISY-R-2601, Dept. of EE, Linköping Univ., Sweden, 2004.
[21] J. Roll, A. Nazin, and L. Ljung. Nonlinear system identification via direct weight optimization. Automatica, 41(3):475-490, 2005.
[22] J. Sacks and D. Ylvisaker. Linear estimation for approximately linear models. Ann. Stat., 6(5):1122-1137, 1978.
[23] C. Tomasi and R. Manduchi. Bilateral filtering for gray and color images. In Proc. Int. Conf. Computer Vision, pages 839-846, January 1998.
[24] P. Whittle. Optimization under Constraints: Theory and Applications of Nonlinear Programming. Wiley-Interscience, New York, 1971.
[25] L.P. Yaroslavsky. Digital Picture Processing. An Introduction. Springer-Verlag, Berlin, 1985.