A Generalized Framework for Edge-preserving and Structure-preserving Image Smoothing
Wei Liu, Pingping Zhang, Yinjie Lei, Xiaolin Huang, Jie Yang, Ian Reid
Department of Automation, Shanghai Jiao Tong University; The University of Adelaide; Dalian University of Technology; Sichuan University; Institute of Medical Robotics, Shanghai Jiao Tong University
{wei.liu02, ian.reid}@adelaide.edu.au, [email protected], [email protected], {xiaolinhuang, jieyang}@sjtu.edu.cn

Abstract
Image smoothing is a fundamental procedure in applications of both computer vision and graphics. The required smoothing properties can be different or even contradictive among different tasks. Nevertheless, the inherent smoothing nature of a smoothing operator is usually fixed and thus cannot meet the various requirements of different applications. In this paper, a non-convex non-smooth optimization framework is proposed that can achieve diverse smoothing natures, including contradictive smoothing behaviors. To this end, we first introduce the truncated Huber penalty function, which has seldom been used in image smoothing. A robust framework is then proposed. Combined with the strong flexibility of the truncated Huber penalty function, our framework is capable of a range of applications and can outperform state-of-the-art approaches in several tasks. In addition, an efficient numerical solution is provided, and its convergence is theoretically guaranteed even though the optimization framework is non-convex and non-smooth. The effectiveness and superior performance of our approach are validated through comprehensive experimental results in a range of applications.
Introduction
The key challenge of many tasks in both computer vision and graphics can be attributed to image smoothing. At the same time, the required smoothing properties can vary dramatically across tasks. In this paper, depending on the required smoothing properties, we roughly classify a large number of applications into four groups.

Applications in the first group require the smoothing operator to smooth out small details while preserving strong edges; the amplitudes of these strong edges can be reduced, but the edges should be neither blurred nor sharpened. Representatives in this group are image detail enhancement and HDR tone mapping (Farbman et al. 2008; Fattal, Agrawala, and Rusinkiewicz 2007; He, Sun, and Tang 2013). Blurring edges can result in halos, while sharpening edges will lead to gradient reversals (Farbman et al. 2008).

* Jie Yang and Yinjie Lei are the corresponding authors of this paper.
Figure 1: Our method is capable of (a) image detail enhancement, (b) clip-art compression artifacts removal, (c) guided depth map upsampling and (d) image texture removal.

These applications are representatives of edge-preserving and structure-preserving image smoothing and require contradictive smoothing properties.

The second group includes tasks like clip-art compression artifacts removal (Nguyen and Brown 2015; Xu et al. 2011), image abstraction and pencil sketch production (Xu et al. 2011). In contrast to the ones in the first group, these tasks require smoothing out small details while sharpening strong edges. This is because edges can be blurred in the compressed clip-art image and need to be sharpened when the image is recovered (see Fig. 1(b) for an example). Sharper edges can produce better visual quality in image abstraction and pencil sketch. At the same time, the amplitudes of strong edges are not allowed to be reduced in these tasks.

Guided image filtering, such as guided depth map upsampling (Park et al. 2011; Ferstl et al. 2013; Liu et al. 2017b) and flash/no-flash filtering (Kopf et al. 2007; Petschnigg et al. 2004), is categorized into the third group. The structure inconsistency between the guidance image and the target image, which can cause blurred edges and texture-copy artifacts in the smoothed image (Ham, Cho, and Ponce 2015; Liu et al. 2017b), should be properly handled by the specially designed smoothing operator. These tasks also need to sharpen edges in the smoothed image, because low-quality capture of depth and noise in the no-flash images can lead to blurred edges (see Fig. 1(c) for an example).

Tasks in the fourth group require smoothing the image in a scale-aware manner, e.g., image texture removal (Xu et al. 2012; Zhang et al. 2014; Cho et al. 2014). These tasks require smoothing out small structures even when they contain strong edges, while large structures should be properly preserved even when their edges are weak (see Fig.
1(d) for example). This is totally different from the above three groups, which all aim at preserving strong edges.

To be more explicit, we categorize the smoothing procedures in the first to third groups as edge-preserving image smoothing since they try to preserve salient edges, while the smoothing processes in the fourth group are classified as structure-preserving image smoothing because they aim at preserving salient structures.

A diversity of edge-preserving and structure-preserving smoothing operators have been proposed for various tasks. Generally, each of them is designed to meet the requirements of certain applications, and thus its inherent smoothing nature is usually fixed. Therefore, there is seldom a smoothing operator that can meet all the smoothing requirements of the above four groups, which are quite different or even contradictive. For example, the L0 norm smoothing (Xu et al. 2011) can sharpen strong edges and is suitable for clip-art compression artifacts removal; however, it leads to gradient reversals in image detail enhancement and HDR tone mapping. The weighted least squares (WLS) smoothing (Farbman et al. 2008) performs well in image detail enhancement and HDR tone mapping, but it is not capable of sharpening edges or of structure-preserving smoothing.

In contrast to most smoothing operators in the literature, a new smoothing operator, based on a non-convex non-smooth optimization framework, is proposed in this paper. It can achieve different and even contradictive smoothing behaviors and is able to handle the applications in the four groups mentioned above. The main contributions of this paper are as follows:

1. We introduce the truncated Huber penalty function, which has seldom been used in image smoothing. By varying its parameters, it shows strong flexibility.

2. A robust non-convex non-smooth optimization framework is proposed.
When combined with the strong flexibility of the truncated Huber penalty function, our model can achieve various and even contradictive smoothing behaviors. We show that it is able to handle the tasks in the four groups mentioned above. This has seldom been achieved by previous smoothing operators.

3. An efficient numerical solution to the proposed optimization framework is provided. Its convergence is theoretically guaranteed.

4. Our method is able to outperform specially designed approaches in many tasks, achieving state-of-the-art performance.

Related Work
Numerous smoothing operators have been proposed in recent decades. In terms of edge-preserving smoothing, the bilateral filter (BLF) (Tomasi and Manduchi 1998) is an early work that has been used in various tasks such as image detail enhancement (Fattal, Agrawala, and Rusinkiewicz 2007), HDR tone mapping (Durand and Dorsey 2002), etc. However, it is prone to produce results with gradient reversals and halos (Farbman et al. 2008). Its alternatives (Gastal and Oliveira 2012; 2011) share a similar problem. The guided filter (GF) (He, Sun, and Tang 2013) can produce results free of gradient reversals, but halos still exist. The WLS smoothing (Farbman et al. 2008) solves a global optimization problem and performs well in handling these artifacts. The L0 norm smoothing (Xu et al. 2011) is able to eliminate low-amplitude structures while sharpening strong edges, which suits the tasks in the second group. To handle the structure inconsistency problem, Shen et al. (2015b) proposed to perform mutual-structure joint filtering. They also explored the relation between the guidance image and the target image via optimizing a scale map (Shen et al. 2015a); however, additional processing was adopted for structure inconsistency handling. Ham, Cho, and Ponce (2015) proposed to handle the structure inconsistency by combining a static guidance weight with a Welsch's penalty (Holland and Welsch 1977) regularized smoothness term, leading to the static/dynamic (SD) filter. Gu et al. (2017b) presented a weighted analysis representation model for guided depth map enhancement.

In terms of structure-preserving smoothing, Zhang et al. (2014) proposed to smooth structures of different scales with a rolling guidance filter (RGF). Cho et al. (2014) modified the original BLF with local patch-based analysis of texture features and obtained a bilateral texture filter (BTF) for image texture removal. Karacan et al.
(Karacan, Erdem, and Erdem 2013) proposed to smooth image textures by making use of region covariances that capture local structure and textural information. Xu et al. (2012) adopted the relative total variation (RTV) as a prior to regularize the texture smoothing procedure. Fan et al. (2018; 2019) proposed to perform various kinds of image smoothing through convolutional neural networks. Chan and Esedoglu (2005) proved that the TV-L1 model (Chan and Esedoglu 2005; Nikolova 2004) can smooth images in a scale-aware manner; it is thus ideal for structure-preserving smoothing such as image texture removal (Buades et al. 2010).

Most of the approaches mentioned above are limited to a few applications because their inherent smoothing natures are usually fixed. In contrast, the method proposed in this paper has strong flexibility in achieving various smoothing behaviors, which enables wider applications than most of them. Moreover, our method can show better performance than these methods in several applications for which they were specially designed.

Our Approach
Truncated Huber Penalty Function
We first introduce the truncated Huber penalty function, which is defined as:

$$h_T(x)=\begin{cases} h(x), & |x|\le b\\[2pt] b-\dfrac{a}{2}, & |x|> b\end{cases}\quad \text{s.t. } a\le b, \qquad (1)$$

where a, b are constants and h(·) is the Huber penalty function (Huber 1964) defined as:

$$h(x)=\begin{cases} \dfrac{x^2}{2a}, & |x|< a\\[2pt] |x|-\dfrac{a}{2}, & |x|\ge a.\end{cases} \qquad (2)$$

Figure 2: Plots of (a) different penalty functions and (b) the truncated Huber penalty function with different parameter settings.

h_T(·) and h(·) are plotted in Fig. 2(a) with a = ε, a sufficiently small value. h(·) is an edge-preserving penalty function, but it cannot sharpen edges when adopted to regularize the smoothing procedure. In contrast, h_T(·) can sharpen edges because, due to the truncation, it does not penalize strong image edges. The Welsch's penalty function (Holland and Welsch 1977), which was adopted in the recently proposed SD filter (Ham, Cho, and Ponce 2015), is also plotted in the figure. This penalty function is known to be capable of sharpening edges, again because it seldom penalizes strong image edges. The Welsch's penalty function is close to the L2 norm when the input is small, while h_T(·) can be close to the L1 norm when a is set sufficiently small, which indicates that h_T(·) can better preserve weak edges than the Welsch's penalty function.

With different parameter settings, h_T(·) shows strong flexibility to yield different penalty behaviors. Assume the input intensity values are within [0, I_m]; then the amplitude of any edge falls in [0, I_m]. We first set a = ε. If we then set b > I_m, h_T(·) is identical to h(·) because the second condition in Eq. (1) can never be met. Because a is sufficiently small, h_T(·) is close to the L1 norm in this case, and thus it is an edge-preserving penalty function that does not sharpen edges. Conversely, when we set b < I_m, the truncation in h_T(·) is activated.
This leads to penalizing weak edges without penalizing strong edges, and thus the strong edges are sharpened. In short, b acts as a switch that decides whether h_T(·) sharpens edges or not. Similarly, by setting a = b > I_m or a = b < I_m, h_T(·) can be switched between the L2 norm and the truncated L2 norm. Note that the truncated L2 norm is also able to sharpen edges (Xu, Zheng, and Jia 2013). In contrast, the Welsch's penalty function does not enjoy this kind of flexibility. Different cases of h_T(·) are illustrated in Fig. 2(b).

Model
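Before specifying the model, the switching behavior of h_T described above can be made concrete. The following is a minimal NumPy sketch of Eqs. (1) and (2); the function names and the numeric values (ε, I_m, the edge amplitudes) are illustrative choices of ours, not the paper's settings:

```python
import numpy as np

def huber(x, a):
    # Huber penalty (Eq. 2): quadratic for |x| < a, linear beyond.
    ax = np.abs(x)
    return np.where(ax < a, ax**2 / (2 * a), ax - a / 2)

def truncated_huber(x, a, b):
    # Truncated Huber penalty (Eq. 1): the cost saturates at b - a/2
    # for |x| > b, so edges stronger than b pay no extra cost.
    assert a <= b, "Eq. (1) requires a <= b"
    return np.where(np.abs(x) <= b, huber(x, a), b - a / 2)

# Switching behavior: with b above the maximum edge amplitude the
# truncation never fires and h_T coincides with h; with b below it,
# all strong edges pay the same constant cost (enabling sharpening).
eps, I_m = 1e-7, 1.0
edges = np.array([0.05, 0.3, 0.9])           # hypothetical edge amplitudes
assert np.allclose(truncated_huber(edges, eps, 2 * I_m), huber(edges, eps))
flat = truncated_huber(edges[1:], eps, 0.2)  # both exceed b = 0.2
assert flat[0] == flat[1]                    # identical (saturated) cost
```

The two assertions mirror the "switch" role of b: raising b above I_m recovers the plain Huber penalty, lowering it makes the cost of all edges above b identical.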
Given an input image f and a guidance image g, the smoothed output image u is the solution to the following objective function:

$$E_u(u)=\sum_i\Big(\sum_{j\in N_d(i)} h_T(u_i-f_j)+\lambda\sum_{j\in N_s(i)}\omega_{i,j}\, h_T(u_i-u_j)\Big), \qquad (3)$$

where h_T is defined in Eq. (1); N_d(i) is the (2r_d + 1) × (2r_d + 1) square patch centered at i; N_s(i) is the (2r_s + 1) × (2r_s + 1) square patch centered at i; and λ is a parameter that controls the overall smoothing strength. To be clear, we adopt {a_d, b_d} and {a_s, b_s} to denote the parameters of h_T(·) in the data term and the smoothness term, respectively. The guidance weight ω_{i,j} is defined as:

$$\omega_{i,j}=\frac{1}{(|g_i-g_j|+\delta)^{\alpha}}, \qquad (4)$$

where α determines the sensitivity to the edges in g, which can be the input image itself, i.e., g = f; |·| represents the absolute value; and δ is a small positive constant.

The adoption of h_T(·) gives our model in Eq. (3) strong flexibility. As will be shown in the property analysis section, with different parameter settings our model is able to achieve different smoothing behaviors, and it is thus capable of various tasks that require either edge-preserving or structure-preserving smoothing.

Numerical Solution
Our model in Eq. (3) is not only non-convex but also non-smooth, which arises from the adopted h_T(·). Commonly used approaches (Lanckriet and Sriperumbudur 2009; Nikolova and Ng 2005; Wang et al. 2008; Zhang, Kwok, and Yeung 2004) for solving non-convex optimization problems are not applicable. To tackle this problem, we first rewrite h_T(·) in an equivalent form. Defining ∇^d_{i,j} = u_i − f_j and ∇^s_{i,j} = u_i − u_j, we have:

$$h_T(\nabla^*_{i,j})=\min_{l^*_{i,j}}\Big\{h(\nabla^*_{i,j}-l^*_{i,j})+\Big(b_*-\frac{a_*}{2}\Big)|l^*_{i,j}|_0\Big\}, \qquad (5)$$

where * ∈ {d, s} and |l^*_{i,j}|_0 is the L0 norm of l^*_{i,j}. The minimum of the right side of Eq. (5) is attained under the condition:

$$l^*_{i,j}=\begin{cases}0, & |\nabla^*_{i,j}|\le b_*\\ \nabla^*_{i,j}, & |\nabla^*_{i,j}|> b_*\end{cases},\quad *\in\{d,s\}. \qquad (6)$$

The detailed proof of Eq. (5) and Eq. (6) is provided in our supplementary file. These two equations also theoretically validate our analysis in Fig. 2(b): we have |∇^*_{i,j}| ∈ [0, I_m] if the intensity values are in [0, I_m]. Then if b > I_m, based on Eq. (5) and Eq. (6), we always have h_T(∇^*_{i,j}) = h(∇^*_{i,j}), which means h_T(·) degrades to h(·).

A new energy function is defined as:

$$E_{ul}(u,l^d,l^s)=\sum_{i,j}\Big(h(\nabla^d_{i,j}-l^d_{i,j})+\Big(b_d-\frac{a_d}{2}\Big)|l^d_{i,j}|_0\Big)+\lambda\sum_{i,j}\omega_{i,j}\Big(h(\nabla^s_{i,j}-l^s_{i,j})+\Big(b_s-\frac{a_s}{2}\Big)|l^s_{i,j}|_0\Big). \qquad (7)$$

Based on Eq. (5) and Eq. (6), we then have:

$$E_u(u)=\min_{l^d,l^s} E_{ul}(u,l^d,l^s). \qquad (8)$$

Given Eq. (6) as the optimum condition of Eq. (8) with respect to l^*, optimizing E_{ul}(u, l^d, l^s) with respect to u only involves the Huber penalty function h(·).
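The decomposition in Eqs. (5) and (6) is easy to verify numerically. The sketch below re-declares the penalty functions so it is self-contained; the helper names are ours, and the a, b values are arbitrary test values:

```python
import numpy as np

def huber(x, a):
    ax = np.abs(x)
    return np.where(ax < a, ax**2 / (2 * a), ax - a / 2)

def truncated_huber(x, a, b):
    return np.where(np.abs(x) <= b, huber(x, a), b - a / 2)

def l_update(grad, b):
    # Closed-form minimizer of Eq. (5) w.r.t. l (Eq. 6): zero inside the
    # truncation threshold b, the full difference beyond it.
    return np.where(np.abs(grad) <= b, 0.0, grad)

# Check h_T(x) = min_l { h(x - l) + (b - a/2) |l|_0 } on a dense grid:
# substituting the optimal l reproduces the truncated penalty exactly.
a, b = 0.1, 1.0
x = np.linspace(-3, 3, 601)
l = l_update(x, b)
assert np.allclose(huber(x - l, a) + (b - a / 2) * (l != 0),
                   truncated_huber(x, a, b))
```

Intuitively, l = 0 leaves the plain Huber cost, while l = ∇ pays only the constant b − a/2; the pointwise minimum of the two is exactly the truncated penalty.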
The problem can thus be optimized through the half-quadratic (HQ) optimization technique (Geman and Yang 1995; Nikolova and Ng 2005). More specifically, there exist a variable μ^* (* ∈ {d, s}) and a function ψ(μ^*_{i,j}) such that:

$$h(\nabla^*_{i,j}-l^*_{i,j})=\min_{\mu^*_{i,j}}\big\{\mu^*_{i,j}(\nabla^*_{i,j}-l^*_{i,j})^2+\psi(\mu^*_{i,j})\big\}, \qquad (9)$$

where the optimum is attained under the condition:

$$\mu^*_{i,j}=\begin{cases}\dfrac{1}{2a_*}, & |\nabla^*_{i,j}-l^*_{i,j}|< a_*\\[4pt] \dfrac{1}{2|\nabla^*_{i,j}-l^*_{i,j}|}, & |\nabla^*_{i,j}-l^*_{i,j}|\ge a_*\end{cases},\quad *\in\{d,s\}. \qquad (10)$$

The detailed proof of Eq. (9) and Eq. (10) is provided in our supplementary file. Then we can further define a new energy function:

$$E_{ul\mu}(u,l^d,l^s,\mu^d,\mu^s)=\sum_{i,j}\Big(\mu^d_{i,j}(\nabla^d_{i,j}-l^d_{i,j})^2+\psi(\mu^d_{i,j})+\Big(b_d-\frac{a_d}{2}\Big)|l^d_{i,j}|_0\Big)+\lambda\sum_{i,j}\omega_{i,j}\Big(\mu^s_{i,j}(\nabla^s_{i,j}-l^s_{i,j})^2+\psi(\mu^s_{i,j})+\Big(b_s-\frac{a_s}{2}\Big)|l^s_{i,j}|_0\Big). \qquad (11)$$

Based on Eq. (9) and Eq. (10), we then have:

$$E_{ul}(u,l^d,l^s)=\min_{\mu^d,\mu^s} E_{ul\mu}(u,l^d,l^s,\mu^d,\mu^s). \qquad (12)$$

Given Eq. (10) as the optimum condition of μ^* for Eq. (12), optimizing E_{ulμ}(u, l^d, l^s, μ^d, μ^s) with respect to u only involves the L2 norm penalty, which has a closed-form solution. However, since the optimum conditions in Eq. (6) and Eq. (10) both involve u, the final solution u can only be obtained in an iterative manner. Assuming we have obtained u^k, then (l^*)^k and (μ^*)^k, * ∈ {d, s}, can be updated through Eq. (6) and Eq.
(10) with u^k. Finally, u^{k+1} is obtained as:

$$u^{k+1}=\arg\min_u E_{ul\mu}\big(u,(l^*)^k,(\mu^*)^k\big). \qquad (13)$$

Eq. (13) has a closed-form solution:

$$u^{k+1}=\big(A^k-2\lambda W^k\big)^{-1}\big(D^k+2\lambda S^k\big), \qquad (14)$$

where W^k is an affinity matrix with W^k_{i,j} = ω_{i,j}(μ^s_{i,j})^k; A^k is a diagonal matrix with A^k_{ii} = Σ_{j∈N_d(i)} (μ^d_{i,j})^k + 2λ Σ_{j∈N_s(i)} ω_{i,j}(μ^s_{i,j})^k; D^k is a vector with D^k_i = Σ_{j∈N_d(i)} (μ^d_{i,j})^k (f_j + (l^d_{i,j})^k); and S^k is a vector with S^k_i = Σ_{j∈N_s(i)} ω_{i,j}(μ^s_{i,j})^k (l^s_{i,j})^k.

The above optimization procedure monotonically decreases the value of E_u(u) in each step, so its convergence is theoretically guaranteed. Given u^k in the k-th iteration and * ∈ {d, s}, for any u we have:

$$E_u(u)\le E_{ul}(u,(l^*)^k),\qquad E_u(u^k)=E_{ul}(u^k,(l^*)^k), \qquad (15)$$

$$E_{ul}(u,(l^*)^k)\le E_{ul\mu}(u,(l^*)^k,(\mu^*)^k),\qquad E_{ul}(u^k,(l^*)^k)=E_{ul\mu}(u^k,(l^*)^k,(\mu^*)^k). \qquad (16)$$

Given that (l^*)^k has been updated through Eq. (6), Eq. (15) follows from Eq. (8) and Eq. (5). After (μ^*)^k has been updated through Eq. (10), Eq. (16) follows from Eq. (12) and Eq. (9). We now have:

$$E_{ul}(u^{k+1},(l^*)^k)\le E_{ul\mu}(u^{k+1},(l^*)^k,(\mu^*)^k)\le E_{ul\mu}(u^k,(l^*)^k,(\mu^*)^k)=E_{ul}(u^k,(l^*)^k), \qquad (17)$$

where the first and second inequalities follow from Eq. (16) and Eq. (13), respectively. We finally have:

$$E_u(u^{k+1})\le E_{ul}(u^{k+1},(l^*)^k)\le E_{ul}(u^k,(l^*)^k)=E_u(u^k), \qquad (18)$$

Algorithm 1
Image Smoothing via Non-convex Non-smooth Optimization
Require: input image f, guidance image g, iteration number N, parameters λ, α, a_*, b_*, r_*, with * ∈ {d, s}; u^0 ← f
for k = 0 : N − 1 do
    With u^k, compute (∇^*_{i,j})^k and update (l^*_{i,j})^k according to Eq. (6)
    With (l^*_{i,j})^k, update (μ^*_{i,j})^k according to Eq. (10)
    With (l^*_{i,j})^k and (μ^*_{i,j})^k, solve for u^{k+1} according to Eq. (13) (or Eq. (14))
end for
Ensure: smoothed image u^N

where the first and second inequalities follow from Eq. (15) and Eq. (17), respectively. Since the value of E_u(u) is bounded from below, Eq. (18) indicates that the convergence of our iterative scheme is theoretically guaranteed.

The above optimization procedure is performed iteratively N times to obtain the output u^N. In all our experiments, we set u^0 = f, which produces promising results in each application. Our optimization procedure is summarized in Algorithm 1.

Property Analysis
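Before analyzing the properties, Algorithm 1 can be illustrated with a compact NumPy sketch for a 1D signal with r_d = 0 and r_s = 1, i.e., a single data-term neighbor and two smoothness neighbors per sample. The helper names and all default parameter values are ours, chosen only for illustration; the updates follow Eqs. (4), (6), (10) and (14):

```python
import numpy as np

def smooth_1d(f, lam=1.0, alpha=0.5, a_d=1e-7, b_d=10.0,
              a_s=1e-7, b_s=10.0, delta=1e-5, n_iter=10):
    """Sketch of Algorithm 1 for a 1D signal, r_d = 0 and r_s = 1.

    All defaults are illustrative, not the paper's settings; g = f is
    used as its own guidance, as in the text.
    """
    f = np.asarray(f, dtype=float)
    n = len(f)
    g, u = f, f.copy()
    # Guidance weights for the single 1D neighbor pair i <-> i+1 (Eq. 4).
    w_r = 1.0 / (np.abs(g[:-1] - g[1:]) + delta) ** alpha

    def l_update(grad, b):          # Eq. (6)
        return np.where(np.abs(grad) <= b, 0.0, grad)

    def mu_update(resid, a):        # Eq. (10)
        r = np.abs(resid)
        return np.where(r < a, 1.0 / (2 * a),
                        1.0 / (2 * np.maximum(r, 1e-12)))

    for _ in range(n_iter):
        gd = u - f                  # data-term differences (r_d = 0)
        gs = u[:-1] - u[1:]         # smoothness differences, i vs i+1
        ld, ls = l_update(gd, b_d), l_update(gs, b_s)
        mud = mu_update(gd - ld, a_d)
        mus = mu_update(gs - ls, a_s)
        # Assemble the linear system of Eq. (14): (A - 2*lam*W) u = D + 2*lam*S.
        wm = w_r * mus              # omega_{i,i+1} * mu^s_{i,i+1}
        A = np.zeros((n, n))
        A[np.arange(n), np.arange(n)] = mud
        A[np.arange(n - 1), np.arange(n - 1)] += 2 * lam * wm
        A[np.arange(1, n), np.arange(1, n)] += 2 * lam * wm
        A[np.arange(n - 1), np.arange(1, n)] -= 2 * lam * wm
        A[np.arange(1, n), np.arange(n - 1)] -= 2 * lam * wm
        D = mud * (f + ld)
        S = np.zeros(n)
        S[:-1] += wm * ls           # l^s_{i,i+1} seen from pixel i
        S[1:] -= wm * ls            # and with opposite sign from i+1
        u = np.linalg.solve(A, D + 2 * lam * S)
    return u

# A constant signal is a fixed point: all differences fall in the
# quadratic region, l = 0 everywhere, and the system is solved exactly.
assert np.allclose(smooth_1d(np.ones(8)), np.ones(8))
```

For images, the same assembly would use sparse matrices over the (2r_s + 1)-squared patch neighbors; the dense `np.linalg.solve` here is only practical for short 1D signals.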
With different parameter settings, the strong flexibility of h_T(·) makes our model able to achieve various smoothing behaviors. First, we show that some classical approaches can be viewed as special cases of our model. For example, by setting a_d = b_d > I_m, a_s = ε, b_s > I_m, α = 0, r_d = 0, r_s = 1, our model is an approximation of the TV model (Rudin, Osher, and Fatemi 1992), a representative edge-preserving smoothing operator. If we instead use a positive α with g = f and keep the other parameters the same as above, then the first iteration of Algorithm 1 is the WLS smoothing (Farbman et al. 2008), which performs well in handling gradient reversals and halos in image detail enhancement and HDR tone mapping. With parameters a_d = ε, b_d > I_m, a_s = ε, b_s > I_m, α = 0, r_d = 0, r_s = 1, our model yields smoothing natures very close to the TV-L1 model (Buades et al. 2010), a classical choice for structure-preserving smoothing.

For different kinds of applications, our model can produce better results than the special cases mentioned above. For convenience, we first start with the tasks in the fourth group, which require structure-preserving smoothing. For these tasks, the parameters are set as a_d = ε, b_d > I_m, a_s = ε, b_s > I_m, r_d = r_s, with a positive α and g = f. This parameter setting has two advantages: first, the setting a_d = ε, b_d > I_m, a_s = ε, b_s > I_m enables our model to have a structure-preserving property similar to that of the TV-L1 model; second, the guidance weight with positive α and g = f makes our model obtain sharper edges in the results than the TV-L1 model does. We illustrate this with 1D smoothing results in Fig. 3(a) and (b). Fig. 6(b) and (c) further show a comparison of image texture removal results.
As shown in the figure, both the TV-L1 model and our model can properly remove the small textures; however, edges in our result are much sharper than those in the result of

Figure 3: 1D signal with structures of different scales and amplitudes. Smoothing results of (a) TV-L1 smoothing (Buades et al. 2010), (c) WLS (Farbman et al. 2008), (e) SD filter (Ham, Cho, and Ponce 2015); our results in (b), (d) and (f).

Figure 4: Image detail enhancement results of different approaches. (a) Input image. Results of (b) WLS (Farbman et al. 2008) and (c) our method. The upper parts of each close-up in (b) and (c) correspond to the patches in the smoothed image.
Figure 5: Clip-art compression artifacts removal results of different approaches. (a) Input image. (b) Our result. Close-ups of (c) the input image and results of (d) the SD filter (Ham, Cho, and Ponce 2015), (e) our method with the structure-preserving parameter setting, (f) our method with the edge-preserving and structure-preserving parameter setting.
Figure 6: Texture smoothing results of different approaches. (a) Input image. Results of (b) TV-L1 smoothing (Buades et al. 2010) and (c) our method.

the TV-L1 model. The typical values of r_d = r_s depend on the texture size, and λ is usually smaller than 1. Larger r_d, r_s and λ lead to larger structures being removed. The iteration number is set as N = 10.

When dealing with image detail enhancement and HDR tone mapping in the first group, one option is to set the parameters so that our model performs WLS smoothing. We can instead further exploit the structure-preserving property of our model to produce better results. The parameters are set as follows: a_d = ε, b_d > I_m, a_s = ε, b_s > I_m, r_d = r_s, a positive α, and g = f. This parameter setting is based on the following observation in our experiments: when we adopt N = 1 and set λ to a large value, the amplitudes of different structures decrease at different rates, i.e., the amplitudes of small structures can have a larger decrease than the large ones, as illustrated in Fig. 3(d). At the same time, edges are neither blurred nor sharpened. This smoothing behavior is desirable for image detail enhancement and HDR tone mapping. As a comparison, Fig. 3(c) shows the result of WLS smoothing. As can be observed from the figures, our method better preserves the edges (see the bottom of the 1D signals in Fig. 3(c) and (d)). Fig. 4(b) and (c) further show a comparison of image detail enhancement results. We fix r_d = r_s = 2 and vary λ to control the smoothing strength. λ for the tasks in the first group is usually much larger than for the ones in the fourth group; for example, the result in Fig. 4(c) is generated with λ = 20.

To sharpen edges, as required by the tasks in the second and third groups, we can set b_s < I_m in the smoothness term. In addition, we further set the other parameters as a_d = ε, b_d < I_m, a_s = ε.
The truncation b_d < I_m in the data term helps our model be robust against outliers in the input image, for example, the noise in the no-flash image or a low-quality depth map. The truncation b_s < I_m in the smoothness term enables our model to be edge-preserving. By setting a_d = a_s = ε, our model can further enjoy the structure-preserving property. With both edge-preserving and structure-preserving smoothing natures, our model has the ability to preserve large structures with weak

Figure 7: HDR tone mapping results of different approaches. Results of (a) BF (Tomasi and Manduchi 1998), (b) GF (He, Sun, and Tang 2013), (c) L0 norm smoothing (Xu et al. 2011), (d) WLS (Farbman et al. 2008), (e) SG-WLS (Liu et al. 2017a) and (f) our method.

Figure 8: Clip-art compression artifacts removal results of different methods. (a) Input compressed image. Results of (b) the approach proposed by Wang, Wong, and Heng (2006), (c) L0 norm smoothing (Xu et al. 2011), (d) the region fusion approach (Nguyen and Brown 2015), (e) BTF (Cho et al. 2014) and (f) our method.

edges and small structures with strong edges at the same time, which is challenging but of practical importance. Fig. 5(a) illustrates this kind of case with an example of clip-art compression artifacts removal: both the thin black circle around the "wheel" and the gray part in the center of the "wheel" should be preserved. The challenge lies in two facts. On one hand, if we perform edge-preserving smoothing, the gray part will be removed because the corresponding edge is weak. Fig. 5(d) shows the result of the SD filter (Ham, Cho, and Ponce 2015). The SD filter can properly preserve the thin black circle and sharpen the edges thanks to the adopted Welsch's penalty function; however, it fails to preserve the weak edge between the black part and the gray part.
On the other hand, if we adopt structure-preserving smoothing, the thin black circle will be smoothed away due to its small structure size. Fig. 5(e) shows the corresponding result of our method with the structure-preserving parameter setting described above. In contrast, our method with the edge-preserving and structure-preserving parameter setting can preserve both of these parts and sharpen the edges, as shown in Fig. 5(f). Fig. 3(e) and (f) also show a comparison of the SD filter and our method with 1D smoothing results. We fix a positive α, r_d = r_s and N = 10 for the tasks in both the second and third groups. We empirically set b_d = b_s to a fraction of I_m and r_d = r_s to a few pixels, depending on the applied task and the input noise level.

The structure inconsistency issue in the third group can also be easily handled by our model. Note that μ^s_{i,j} in Eq. (11) is computed from the smoothed image in each iteration, as formulated in Eq. (10); it thus reflects the inherent natures of the smoothed image. The guidance weight ω_{i,j} provides additional structural information from the guidance image g. This means that μ^s_{i,j} and ω_{i,j} can complement each other. In fact, the equivalent guidance weight of Eq. (11) in each iteration is μ^s_{i,j} ω_{i,j}, which reflects the properties of both the smoothed image and the guidance image. In this way, it can properly handle the structure inconsistency problem and avoid blurred edges and texture-copy artifacts. Similar ideas were also adopted in (Ham, Cho, and Ponce 2015; Liu et al. 2017b).

Applications and Experimental Results
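Before turning to the individual applications, the parameter regimes described above can be collected into presets. This is our own summary sketch; the symbolic conditions in the text (b > I_m, b < I_m, a positive α) are replaced by illustrative stand-in numbers:

```python
# Parameter regimes for the application groups, collected from the text
# as Python presets. EPS stands for the small constant epsilon and IM
# for the maximum intensity I_m; the concrete numbers (2 * IM for
# "b > I_m", 0.5 * IM for "b < I_m", alpha = 0.5) are stand-ins.
EPS, IM = 1e-7, 1.0

PRESETS = {
    # Group 4: structure-preserving smoothing (e.g., texture removal).
    "texture_removal":    dict(a_d=EPS, b_d=2 * IM, a_s=EPS, b_s=2 * IM,
                               alpha=0.5, n_iter=10),
    # Group 1: detail enhancement / HDR tone mapping; one iteration,
    # large lambda so small structures shrink faster than large ones.
    "detail_enhancement": dict(a_d=EPS, b_d=2 * IM, a_s=EPS, b_s=2 * IM,
                               alpha=0.5, n_iter=1, lam=20.0),
    # Groups 2-3: edge sharpening with a robust (truncated) data term.
    "edge_sharpening":    dict(a_d=EPS, b_d=0.5 * IM, a_s=EPS, b_s=0.5 * IM,
                               alpha=0.5, n_iter=10),
}

# The truncation thresholds encode the switch discussed earlier:
# b > IM disables truncation (no sharpening), b < IM activates it.
assert PRESETS["texture_removal"]["b_s"] > IM
assert PRESETS["edge_sharpening"]["b_s"] < IM
```

The point of the table is that the same objective covers all groups; only the (a, b) pairs, λ and the iteration count change.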
Our method is applied to various tasks in the first to fourth groups to validate its effectiveness. Comparisons with state-of-the-art approaches in each application are also presented. Due to limited space, we only show experimental results for four applications.

Our experiments are performed on a PC with an Intel Core i5 3.4GHz CPU (one thread used) and 8GB memory. For an RGB image with N = 10 in Algorithm 1, the running time in MATLAB grows with the patch radius r_d = r_s and is on the order of seconds per image. Note that, as described in the property analysis section, the value of r_d = r_s is smaller than 3 in most cases except for guided depth map upsampling. For the tasks in the first group, which require N = 1, the computational cost is reduced accordingly.

HDR tone mapping is a representative task in the first
Figure 9: Guided depth map upsampling results on simulated ToF data. (a) Guidance color image. (b) Ground-truth depth map. Results of (c) the approach proposed by Gu et al. (2017b), (d) SGF (Zhang et al. 2015), (e) SD filter (Ham, Cho, and Ponce 2015), (f) Park et al. (2011) and (g) our method.
Figure 10: Guided depth upsampling results on real ToF data. (a) Guidance intensity image. (b) Ground-truth depth map. Results of (c) the approach proposed by Gu et al. (2017b), (d) TGV (Ferstl et al. 2013), (e) SD filter (Ham, Cho, and Ponce 2015), (f) SGF (Zhang et al. 2015) and (g) our method.
Figure 11: Image texture removal results. (a) Input image. Results of (b) JCAS (Gu et al. 2017a), (c) RTV (Xu et al. 2012), (d) the FCN-based approach (Chen, Xu, and Koltun 2017), (e) muGIF (Guo et al. 2018), (f) BTF (Cho et al. 2014) and (g) our method.

group. It requires decomposing the input image into a base layer and a detail layer through edge-preserving smoothing. The challenge of this task is that if the edges are sharpened by the smoothing procedure, the result will exhibit gradient reversals, and halos will occur if the edges are blurred. Fig. 7 shows the tone mapping results using different edge-preserving smoothing operators. The results of BF (Tomasi and Manduchi 1998) and GF (He, Sun, and Tang 2013) contain clear halos around the picture frames and the light fixture, as shown in Fig. 7(a) and (b). This is due to their local smoothing natures, where strong smoothing can also blur salient edges (Farbman et al. 2008; He, Sun, and Tang 2013). The L0 norm smoothing (Xu et al. 2011) can properly eliminate halos, but there are gradient reversals in its result, as illustrated in Fig. 7(c). This is because the L0 smoothing is prone to sharpen salient edges. The WLS (Farbman et al. 2008) and SG-WLS (Liu et al. 2017a) smoothing perform well in handling gradient reversals and halos in most cases. However, there are slight halos in their results, as illustrated in the left close-up in Fig. 7(d) and (e). These artifacts are properly eliminated in our results.

Clip-art compression artifacts removal. Clip-art images are piecewise constant with sharp edges. When they are compressed in JPEG format at low quality, edge-related artifacts appear and the edges are usually blurred, as shown in Fig. 8(a). Therefore, when removing the compression artifacts, the edges should also be sharpened in the restored image. We thus classify this task into the second group. The approach proposed by Wang, Wong, and Heng (2006) can seldom handle heavy compression artifacts, as shown in Fig.
8(b). The L0 norm smoothing fails to preserve weak edges, as shown in Fig. 8(c). The region fusion approach (Nguyen and Brown 2015) is able to produce results with sharpened edges; however, it also enhances the blocky artifacts along strong edges, as highlighted in Fig. 8(d). The edges in the result of BTF (Cho et al. 2014) are blurred in Fig. 8(e). Our result is illustrated in Fig. 8(f), with edges sharpened and compression artifacts removed.

Table 1: Quantitative comparison on the noisy simulated ToF data (Art, Book, Dolls, Laundry, Moebius and Reindeer scenes at different upsampling factors). Results are evaluated in MAE. The best results are in bold; the second best results are underlined.

Table 2: Quantitative comparison on the real ToF dataset. The errors are calculated as MAE to the measured ground truth in mm. The best results are in bold; the second best results are underlined.

Method                                   Books      Devil      Shark
Bicubic                                  16.23mm    17.78mm    16.66mm
GF (He, Sun, and Tang 2013)              15.55mm    16.1mm     17.1mm
SD Filter (Ham, Cho, and Ponce 2015)     13.47mm    15.99mm    16.18mm
SG-WLS (Liu et al. 2017a)                14.71mm    16.24mm    16.51mm
Shen et al. (Shen et al. 2015b)          15.47mm    16.18mm    17.33mm
Park et al. (Park et al. 2011)           14.31mm    15.36mm    15.88mm
TGV (Ferstl et al. 2013)                 12.8mm     14.97mm    15.53mm
AR (Yang et al. 2014)                    14.37mm    15.41mm    16.27mm
Gu et al. (Gu et al. 2017b)              13.87mm    15.36mm    15.88mm
SGF (Zhang et al. 2015)                  13.57mm    15.74mm    16.21mm
FGI (Li et al. 2016b)                    14.21mm    16.43mm    16.37mm
FBS (Barron and Poole 2016)              15.93mm    17.21mm    16.33mm
Li et al. (Li et al. 2016a)              14.33mm    15.09mm    15.82mm
Ours
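The MAE numbers reported in Tab. 1 and Tab. 2 are straightforward to compute. A minimal sketch follows; the function and parameter names are our own, and the optional validity mask is an assumption, since evaluation protocols often exclude invalid depth pixels:

```python
import numpy as np

def mae(pred, gt, mask=None):
    """Mean absolute error between a restored depth map and the ground truth.

    pred, gt : array-like depth maps of the same shape.
    mask     : optional boolean array; if given, the error is averaged
               only over pixels where mask is True (hypothetical option,
               used to skip invalid ground-truth measurements).
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    diff = np.abs(pred - gt)
    if mask is not None:
        diff = diff[np.asarray(mask, dtype=bool)]
    return float(diff.mean())
```

On the real dataset the depth maps are in millimeters, so the returned value is directly comparable to the mm entries in Tab. 2.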
Guided depth map upsampling belongs to the guided image filtering tasks in the third group. The RGB guidance image can provide additional structural information to restore and sharpen the depth edges. The challenge of this task is the structure inconsistency between the depth map and the RGB guidance image, which can cause blurred depth edges and texture copy artifacts in the upsampled depth map. We test our method on the simulated dataset provided in (Yang et al. 2014). Fig. 9 shows the visual comparison between our result and the results of the recent state-of-the-art approaches. Our method shows better performance in preserving sharp depth edges and avoiding texture copy artifacts. Tab. 1 also shows the quantitative evaluation of the results of different methods. Following the measurement used in (Guo et al. 2018; Li et al. 2016b; Liu et al. 2017a; Yang et al. 2014), the evaluation is performed in terms of mean absolute error (MAE). As Tab. 1 shows, our method achieves the best or the second best performance among all the compared approaches.

We further validate our method on the real data introduced by Ferstl et al. (Ferstl et al. 2013). This dataset contains three low-resolution depth maps captured by a ToF depth camera and the corresponding highly accurate ground-truth depth maps captured with structured light. The upsampling factor for the real dataset is ∼ . ×. The visual comparison in Fig. 10 and the quantitative comparison in Tab. 2 show that our method can outperform the compared methods and achieve state-of-the-art performance.

Image texture removal belongs to the tasks in the fourth group. It aims at extracting salient meaningful structures while removing small complex texture patterns. The challenge of this task is that it requires structure-preserving smoothing rather than the edge-preserving smoothing in the above tasks. Fig.
11(a) shows a classical example of image texture removal: the small textures with strong edges should be smoothed out, while the salient structures with weak edges should be preserved. Fig. 11(b)∼(f) show the results of the recent state-of-the-art approaches. The joint convolutional analysis and synthesis sparse (JCAS) model (Gu et al. 2017a) can well remove the textures, but the resulting edges are also blurred. The RTV method (Xu et al. 2012), muGIF (Guo et al. 2018), BTF (Cho et al. 2014) and the FCN based approach (Chen, Xu, and Koltun 2017) cannot completely remove the textures; in addition, the weak edges of the salient structures are also smoothed out in their results. Our method can both preserve the weak edges of the salient structures and remove the small textures.

Conclusion
We propose a non-convex non-smooth optimization framework for edge-preserving and structure-preserving image smoothing. We first introduce the truncated Huber penalty function, which shows strong flexibility. A robust framework is then presented. Combined with the flexibility of the truncated Huber penalty function, our framework is able to achieve different and even contradictory smoothing behaviors under different parameter settings. This differs from most previous approaches, whose inherent smoothing natures are usually fixed. We further propose an efficient numerical solution to our model and prove its convergence theoretically. Comprehensive experimental results in a number of applications demonstrate the effectiveness of our method.
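For reference, a minimal sketch of a truncated Huber penalty of the kind the framework builds on. The parameters a (Huber knee) and b (truncation threshold) and the exact constants here are illustrative and may differ from the paper's precise definition:

```python
import numpy as np

def truncated_huber(x, a=1.0, b=4.0):
    """Truncated Huber penalty (illustrative form, assumed constants):
    quadratic for |x| <= a, linear for a < |x| <= b,
    and constant for |x| > b (the truncation)."""
    x = np.abs(np.asarray(x, dtype=np.float64))
    quad = x ** 2 / (2.0 * a)          # smooth quadratic region near zero
    lin = x - a / 2.0                  # linear, edge-preserving region
    huber = np.where(x <= a, quad, lin)
    return np.minimum(huber, b - a / 2.0)  # cap the cost beyond b
```

Intuitively, the truncation is what permits contradictory smoothing behaviors: below b the penalty behaves like a standard Huber function, while differences larger than b incur only a constant cost and are therefore left untouched by the smoothing.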
Acknowledgement
We gratefully acknowledge the support of the Australian Centre for Robotic Vision. This paper is also partly supported by NSFC, China (No. U1803261, 61977046), the Key Research and Development Program of Sichuan Province (No. 2019YFG0409) and the National Key Research and Development Project (No. 2018AAA0100702).

References
Barron, J. T., and Poole, B. 2016. The fast bilateral solver. In ECCV, 617–632. Springer.
Buades, A.; Le, T. M.; Morel, J.-M.; Vese, L. A.; et al. 2010. Fast cartoon + texture image filters. TIP.
Chen, Q.; Xu, J.; and Koltun, V. 2017. Fast image processing with fully-convolutional networks. In ICCV, 2516–2525.
Cho, H.; Lee, H.; Kang, H.; and Lee, S. 2014. Bilateral texture filtering. ToG.
Durand, F., and Dorsey, J. 2002. Fast bilateral filtering for the display of high-dynamic-range images. In ToG, volume 21, 257–266. ACM.
Fan, Q.; Yang, J.; Wipf, D.; Chen, B.; and Tong, X. 2018. Image smoothing via unsupervised learning. In SIGGRAPH Asia 2018 Technical Papers, 259. ACM.
Fan, Q.; Chen, D.; Yuan, L.; Hua, G.; Yu, N.; and Chen, B. 2019. A general decoupled learning framework for parameterized image operators. TPAMI.
Farbman, Z.; Fattal, R.; Lischinski, D.; and Szeliski, R. 2008. Edge-preserving decompositions for multi-scale tone and detail manipulation. In ToG, volume 27, 67. ACM.
Fattal, R.; Agrawala, M.; and Rusinkiewicz, S. 2007. Multiscale shape and detail enhancement from multi-light image collections. In ToG, volume 26, 51. ACM.
Ferstl, D.; Reinbacher, C.; Ranftl, R.; Rüther, M.; and Bischof, H. 2013. Image guided depth upsampling using anisotropic total generalized variation. In ICCV, 993–1000.
Gastal, E. S., and Oliveira, M. M. 2011. Domain transform for edge-aware image and video processing. In ToG, volume 30, 69. ACM.
Gastal, E. S., and Oliveira, M. M. 2012. Adaptive manifolds for real-time high-dimensional filtering. ToG.
Gu, S.; Meng, D.; Zuo, W.; and Zhang, L. 2017a. Joint convolutional analysis and synthesis sparse model for single image layer separation. In ICCV, 1717–1725. IEEE.
Gu, S.; Zuo, W.; Guo, S.; Chen, Y.; Chen, C.; and Zhang, L. 2017b. Learning dynamic guidance for depth image enhancement. In CVPR.
Guo, X.; Li, Y.; Ma, J.; and Ling, H. 2018. Mutually guided image filtering. TPAMI.
Ham, B.; Cho, M.; and Ponce, J. 2015. Robust image filtering using joint static and dynamic guidance. In CVPR, 4823–4831.
He, K.; Sun, J.; and Tang, X. 2013. Guided image filtering. TPAMI.
Huber, P. J. 1964. Robust estimation of a location parameter. The Annals of Mathematical Statistics.
Kopf, J.; Cohen, M. F.; Lischinski, D.; and Uyttendaele, M. 2007. Joint bilateral upsampling. In ToG, volume 26, 96. ACM.
Lanckriet, G. R., and Sriperumbudur, B. K. 2009. On the convergence of the concave-convex procedure. In NeurIPS, 1759–1767.
Li, Y.; Huang, J.-B.; Ahuja, N.; and Yang, M.-H. 2016a. Deep joint image filtering. In ECCV, 154–169. Springer.
Li, Y.; Min, D.; Do, M. N.; and Lu, J. 2016b. Fast guided global interpolation for depth and motion. In ECCV, 717–733. Springer.
Liu, W.; Chen, X.; Shen, C.; Liu, Z.; and Yang, J. 2017a. Semi-global weighted least squares in image filtering. In ICCV, 5861–5869.
Liu, W.; Chen, X.; Yang, J.; and Wu, Q. 2017b. Robust color guided depth map restoration. TIP.
Nguyen, R. M. H., and Brown, M. S. 2015. Fast and effective L0 gradient minimization by region fusion. In ICCV, 208–216.
Nikolova, M., and Ng, M. K. 2005. Analysis of half-quadratic minimization methods for signal and image recovery. SIAM Journal on Scientific Computing.
Park, J.; Kim, H.; Tai, Y.-W.; Brown, M. S.; and Kweon, I. 2011. High quality depth map upsampling for 3D-ToF cameras. In ICCV, 1623–1630. IEEE.
Petschnigg, G.; Szeliski, R.; Agrawala, M.; Cohen, M.; Hoppe, H.; and Toyama, K. 2004. Digital photography with flash and no-flash image pairs. ToG.
Rudin, L. I.; Osher, S.; and Fatemi, E. 1992. Nonlinear total variation based noise removal algorithms. Physica D: Nonlinear Phenomena.
Shen, X.; Zhou, C.; Xu, L.; and Jia, J. 2015b. Mutual-structure for joint filtering. In ICCV, 3406–3414.
Tomasi, C., and Manduchi, R. 1998. Bilateral filtering for gray and color images. In ICCV, 839–846. IEEE.
Wang, G.; Wong, T.-T.; and Heng, P.-A. 2006. Deringing cartoons by image analogies. ToG.
Wang, Y.; Yang, J.; Yin, W.; and Zhang, Y. 2008. A new alternating minimization algorithm for total variation image reconstruction. SIAM Journal on Imaging Sciences.
Xu, L.; Lu, C.; Xu, Y.; and Jia, J. 2011. Image smoothing via L0 gradient minimization. In ToG, volume 30, 174. ACM.
Xu, L.; Yan, Q.; Xia, Y.; and Jia, J. 2012. Structure extraction from texture via relative total variation. ToG.
Yang, J.; Ye, X.; Li, K.; Hou, C.; and Wang, Y. 2014. Color-guided depth recovery from RGB-D data using an adaptive autoregressive model. TIP.
Zhang, Q.; Shen, X.; Xu, L.; and Jia, J. 2014. Rolling guidance filter. In ECCV, 815–830. Springer.
Zhang, F.; Dai, L.; Xiang, S.; and Zhang, X. 2015. Segment graph based image filtering: fast structure-preserving smoothing. In ICCV, 361–369.
Zhang, Z.; Kwok, J. T.; and Yeung, D.-Y. 2004. Surrogate maximization/minimization algorithms for adaboost and the logistic regression model. In ICML.