Fast High-Dimensional Bilateral and Nonlocal Means Filtering
IEEE TRANSACTIONS ON IMAGE PROCESSING
Pravin Nair, Student Member, IEEE, and Kunal N. Chaudhury, Senior Member, IEEE
Abstract—Existing fast algorithms for bilateral and nonlocal means filtering mostly work with grayscale images. They cannot easily be extended to high-dimensional data such as color and hyperspectral images, patch-based data, flow-fields, etc. In this paper, we propose a fast algorithm for high-dimensional bilateral and nonlocal means filtering. Unlike existing approaches, where the focus is on approximating the data (using quantization) or the filter kernel (via analytic expansions), we locally approximate the kernel using weighted and shifted copies of a Gaussian, where the weights and shifts are inferred from the data. The algorithm emerging from the proposed approximation essentially involves clustering and fast convolutions, and is easy to implement. Moreover, a variant of our algorithm comes with a guarantee (bound) on the approximation error, which is not enjoyed by existing algorithms. We present some results for high-dimensional bilateral and nonlocal means filtering to demonstrate the speed and accuracy of our proposal. We also show that our algorithm can outperform state-of-the-art fast approximations in terms of accuracy and timing.
Index Terms—High-dimensional filter, bilateral filter, nonlocal means, shiftability, kernel, approximation, fast algorithm.
I. INTRODUCTION
Smoothing images while preserving structures (edges, corners, lines, etc.) is a fundamental task in image processing. A classic example in this regard is the diffusion framework of Perona and Malik [1]. In the last few decades, several filtering-based approaches have been proposed for this task. Prominent examples include bilateral filtering [2], [3], [4], mean shift filtering [5], weighted least squares [6], domain transform [7], guided filtering [8], and nonlocal means [9]. The brute-force implementation of most of these filters is computationally prohibitive and cannot be used for real-time applications. To address this problem, researchers have come up with approximation algorithms that can significantly accelerate the filtering without compromising the quality. Unfortunately, for bilateral and nonlocal means filtering, most of these algorithms can be used only for grayscale images. It is difficult to use them even for color filtering while preserving their efficiency. The situation is more challenging for multispectral and hyperspectral images, where the range dimension is much larger. Of course, any algorithm for grayscale filtering can be used for channel-by-channel processing of high-dimensional images. However, by working in the combined intensity space, we can exploit the strong correlation between channels [4].
The authors are with the Department of Electrical Engineering, Indian Institute of Science, Bangalore 560012, India. Correspondence: [email protected]. K. N. Chaudhury was supported by a Startup Grant from Indian Institute of Science and EMR Grant SERB/F/6047/2016-2017 from the Department of Science and Technology, Government of India.
A. High-Dimensional Filtering
The focus of the present work is on two popular smoothers, the bilateral and the nonlocal means filters, and their application to high-dimensional data. The former is used for edge-preserving smoothing in a variety of applications [10], while the latter is primarily used for denoising. Though nonlocal means has limited denoising capability compared to state-of-the-art denoisers [11], [12], it continues to be of interest due to its simplicity and the availability of low-complexity algorithms [13], [14], [15]. The connection between these filters is that they can be interpreted as a multidimensional Gaussian filter operating in the joint spatio-range space [16]. The term high-dimensional filtering is used when the dimension of the spatio-range space is large [14], [17], [18]. In this paper, we will use this term when the range dimension is greater than one.

A unified formulation of high-dimensional bilateral and nonlocal means filtering is as follows. Suppose that the input data is f : Ω → [0, R]^n, where Ω ⊂ Z^d is the spatial domain, [0, R]^n is the range space, and d and n are the dimensions of the domain and range of f. Let p : Ω → R^ρ be the guide image, which is used to control the filtering [10]. The output g : Ω → [0, R]^n is given by

g(ı) = (1/η(ı)) ∑_{j∈W} ω(j) ϕ(p(ı−j) − p(ı)) f(ı−j),   (1)

where

η(ı) = ∑_{j∈W} ω(j) ϕ(p(ı−j) − p(ı)).   (2)

The aggregations in (1) and (2) are performed over a window around the pixel of interest, i.e., W = [−S, S]^d, where S is the window length. We call ω : Z^d → R the spatial kernel and ϕ : R^ρ → R the range kernel [4], where we denote the dimension of the range space of p by ρ. The spatial kernel is used to measure proximity in the spatial domain Ω (as in classical linear filtering). On the other hand, the range kernel is used to measure proximity in the range space of p. For (joint) bilateral filtering, ω and ϕ are generally Gaussian.
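To make (1) and (2) concrete, here is a direct (brute-force) implementation for Gaussian ω and ϕ. This is our own minimal numpy sketch for reference, not the authors' code; the function name and interface are our own choices:

```python
import numpy as np

def brute_force_filter(f, p, S, sigma_s, sigma_r):
    """Direct evaluation of (1)-(2): f is (H, W, n), p is (H, W, rho)."""
    H, W, n = f.shape
    g = np.zeros((H, W, n))
    for i0 in range(H):
        for i1 in range(W):
            num, den = np.zeros(n), 0.0
            # aggregate over the window [-S, S]^2 around pixel (i0, i1)
            for j0 in range(-S, S + 1):
                for j1 in range(-S, S + 1):
                    k0, k1 = i0 - j0, i1 - j1
                    if 0 <= k0 < H and 0 <= k1 < W:
                        w = np.exp(-(j0**2 + j1**2) / (2 * sigma_s**2))
                        d = p[k0, k1] - p[i0, i1]
                        phi = np.exp(-(d @ d) / (2 * sigma_r**2))
                        num += w * phi * f[k0, k1]
                        den += w * phi
            g[i0, i1] = num / den
    return g
```

As written, this costs O(S²(n + ρ)) per pixel, which is exactly the cost that the fast algorithm proposed in the paper avoids.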
While f and p are identical for the bilateral filter (ρ = n), they are different for the joint bilateral filter [10]. In the original proposal for nonlocal means [9], p(ı) collects the spatial neighbors of pixel ı (extracted from a patch around ı), and hence ρ is n times the size of the image patch; ϕ is a multidimensional Gaussian, and ω is a box filter. Later, it was shown in [21] that by reducing the dimension of each patch (e.g., by applying PCA to the collection of patches), we can improve the speed and denoising performance. Similar to [14], [17], [18], we have also considered PCA-based nonlocal means
Fig. 1. Bilateral filtering of a color image [20], with σ_s = 10 and σ_r = 40. Panels: (a) Lena; (b) Proposed, 274 ms, 55.36 dB; (c) GCS [19]; (d) AM [14]; (e) brute-force; (f)-(h) errors of (b)-(d) with respect to (e), shown for the red channel. The same number of clusters was used for GCS and the proposed method; the number of manifolds in AM was set automatically. PSNR is defined in (21).

in this work. Needless to say, the proposed algorithm can also work with full patches.

The brute-force computation of (1) and (2) clearly requires O(S^d(n + ρ)) operations per pixel. In particular, the complexity scales with the window length as S^d, which can be large for some smoothing applications. In the last decade, several fast algorithms have been proposed for bilateral filtering of grayscale images [20], [22], [23], [24], [25], [26]. The complexity can be cut down from O(S^d) to O(1) using these algorithms. Similarly, fast algorithms have been proposed for nonlocal means [13], [15] that can reduce the complexity from O(S^d ρ) to O(S^d).

B. Previous Work

Quantization methods. Durand et al. [23] proposed a novel framework for approximating (1) and (2) using clustering and interpolation. In terms of our notation, their approximation of (1) and (2) can be expressed as

ĝ(ı) = (1/η̂(ı)) ∑_{k=1}^{K} c_k(ı) ( ∑_{j∈W} ω(j) ϕ(p(ı−j) − µ_k) f(ı−j) ),   (3)

and

η̂(ı) = ∑_{k=1}^{K} c_k(ı) ( ∑_{j∈W} ω(j) ϕ(p(ı−j) − µ_k) ).   (4)

The centers µ_k are obtained by uniformly sampling the range space of p, whereas the interpolation coefficient c_k(ı) is determined from the distance between p(ı) and µ_k.
Notice that the inner summations in (3) and (4) can be expressed as convolutions, which are computed using FFT in [23]. Furthermore, the entire processing is performed on a subsampled image, following which the output image is upsampled. The approximation turns out to be effective for grayscale images. This is because the range space is one-dimensional in this case, and hence a good approximation can be achieved using a small number of samples K. Moreover, it suffices to interpolate between just the two neighboring samples. Paris et al. [20] showed that the accuracy of [23] can be improved by downsampling the intermediate images (instead of the input image) involved in the convolutions. Chen et al. [16] proposed to accelerate [20] by performing convolutions in the higher-dimensional spatio-range space. Yang et al. [26] observed that this framework can also be used with non-Gaussian range kernels, and that O(1) convolutions can be used to improve the computational complexity. As the range dimension ρ increases, however, these methods become computationally inefficient. In particular, an exponentially large K is required to achieve a decent approximation, and linear interpolation requires ρ neighboring samples. In [27], Yang et al. extended this method to perform bilateral filtering of color images, and proposed a fast scheme for linear interpolation. However, [27] is based on uniform quantization, which is not optimal given the strong correlation between the color channels. In fact, a more efficient algorithm for color filtering was later proposed in [19], where clustering is used to find µ_k. Moreover, instead of interpolation, they used a local statistics prior to improve the accuracy. However, the algorithm is used only for color bilateral filtering, and its application to other forms of high-dimensional filtering is not reported in [19].

Splat-blur-slice method. Another line of work is [14], [17], [18], which is based on a slightly different form of approximation.
More precisely, they are based on the “splat-blur-slice” framework, which involves data partitioning (clustering
or tessellation), fast convolutions, and interpolation as the core operations. These are considered to be the state-of-the-art fast algorithms for high-dimensional bilateral and nonlocal means filtering. The idea, originally proposed in [16], is based on the observation that [20] corresponds to convolutions in the joint spatio-range space. The general idea is to sample the input pixels in a different space (splatting), perform Gaussian convolutions (blurring), and resample the result back to the original pixel space (slicing). Adams et al. tessellated the domain and performed blurring using the k-d tree in [17] and the permutohedral lattice in [18]. Gastal et al. [14] divided and resampled the domain into non-linear manifolds, and performed blurring on them. This was shown to be faster than all other methods within the splat-blur-slice framework.

Fig. 2. Flowchart of our algorithm with the main modules: Cluster Range Space → Form Intermediate Images → Calculate Coefficients → Fast Convolutions → Weighted Recombination, taking the input f and guide p to the output ĝ (intermediate quantities: µ_k, c_k, b_k, u_k, ω ∗ u_k, ω ∗ b_k). Also shown are the timings for the bilateral filtering of a megapixel color image with σ_s = 10, σ_r = 75, and K = 4 clusters. The symbols used in the flowchart are defined in Section II (also see Algorithm 1).

Kernel approximation. In an altogether different direction, it was shown in [22], [24], [25], [28] that fast algorithms for bilateral filtering can be obtained by approximating the range kernel ϕ using so-called shiftable functions, which include polynomials and trigonometric functions. The above-mentioned algorithms provide excellent accelerations for grayscale images, and are generally superior to algorithms based on data approximation. Unfortunately, it is difficult to approximate a high-dimensional Gaussian using low-order shiftable functions.
For example, the Fourier expansion in [25] is quite effective in one dimension (grayscale images), but its straightforward extension to even three dimensions (color images) results in exponentially many terms. This is referred to as the "curse of dimensionality".

C. Contributions
Existing algorithms have focused on either approximating the data [14], [17], [18], [19], [26] or the range kernel [22], [24], [25], [29], [30]. In this paper, we combine the ideas of shiftability [22], [24], [28], [31] and data approximation [23], [26] within a single framework to approximate (1) and (2). In particular, we locally approximate the range kernel in (1) and (2) using weighted and shifted copies of a Gaussian, where the weights and shifts are determined from the data. More specifically, we use clustering to determine the shifts, whereby the correlation between data channels is taken into account. Once the shifts have been fixed, we determine the weights (coefficients) using least-squares fitting, where the data is again taken into account. An important technical point is that we are required to solve a least-squares problem at each pixel, which can be expensive. We show how this can be done using just one matrix inversion. In summary, the key distinctions of our method in relation to previous approaches are as follows:

• We use data-dependent methods to calculate both µ_k and c_k in (3) and (4); the latter is done in a heuristic fashion in [14], [17], [18]. We note that our approximation also reduces to the form in (3) and (4). However, the notion of shiftable approximation (in a sense which will be made precise in Section II) plays an important role. Namely, it allows us to interpret the coefficients in (3) and (4) from an approximation-theoretic point of view. This in turn forms the basis of the proposed optimization used to compute the coefficients.

• Unlike [19], [20], [23], [26], [27], our approach is scalable in the dimensions of the input f and the guide p.

• An important difference with [22], [24], [25], [28] is that a different shiftable approximation is used at each pixel in our proposal, while the approximation is global in [22], [24], [25], [28]. This will be made precise in Section II.
• Similar to [14], [17], [18], our method also involves splatting (clustering), blurring (convolutions), and slicing (weighted recombination). However, the important difference is that we perform slicing in an optimal manner, whereas it is done in a heuristic fashion in these methods.

We explain the proposed shiftable approximation in detail in Section II. The algorithm emerging from the approximation is conceptually simple and easy to code. The flowchart of the processing pipeline is shown in Figure 2. The computation-intensive components are "Cluster Range Space" and "Fast Convolutions", for which highly optimized routines are available. Excluding clustering, the complexity is O(K(n + ρ)), where K is the number of clusters. Notice that there is no dependence on S. This is a big advantage compared to fast nonlocal means [13], [15], where the scaling is O(S^d). The highlight of our proposal is that we are able to derive a bound on the approximation error for a particular variant of our algorithm [32]. In particular, the error is guaranteed to vanish if we use more clusters. This kind of guarantee is not offered by state-of-the-art algorithms for high-dimensional filtering [14], [17], [18], partly due to the complex nature of their formulation.

We use the proposed fast algorithm for various applications such as smoothing [4], denoising [9], low-light denoising [33], hyperspectral filtering [34], and flow-field denoising [35]. In particular, we demonstrate that our algorithm is competitive with existing fast algorithms for color bilateral and nonlocal means filtering (cf. Figures 1 and 11). We note that although our original target was high-dimensional filtering, our algorithm outperforms [26], which is considered the state-of-the-art for bilateral filtering of grayscale images (cf. Figure 5).
Finally, we note that we have not used the particular form of the Gaussian kernel in the derivation of the fast algorithm. Therefore, it can also be used for non-Gaussian kernels, such as the exponential and the Lorentz kernel [23], [36].
D. Organization
The rest of the paper is organized as follows. In Section II, we describe the proposed approximation and the resulting fast algorithm. We also derive error bounds for a particular variant of the approximation. In Section III, we apply the fast algorithm to high-dimensional bilateral and nonlocal means filtering, and compare its performance (timing and accuracy) with state-of-the-art algorithms. We conclude the paper with a summary of the results in Section IV.

II. PROPOSED METHOD
Notice that, if the guide p has constant intensity value at each pixel, then (1) and (2) reduce to linear convolutions. This observation is essentially at the core of the fast algorithms in [20], [23], [26]. On the other hand, the fast algorithms in [22], [25], [28] are derived using a completely different idea, where the univariate Gaussian kernel ϕ is approximated using a polynomial or a trigonometric function ψ. These functions are shiftable in the following sense: there exist basis functions ψ_1, . . . , ψ_K such that, for any τ ∈ R, we can write

ψ(x − τ) = ∑_{k=1}^{K} c_k(τ) ψ_k(x),   (5)

where the coefficients c_1, . . . , c_K depend only on τ. One can readily see that polynomials and trigonometric functions,

ψ(x) = ∑_{n=0}^{N} p_n x^n   and   ψ(x) = ∑_{n=0}^{N} q_n cos(nωx),   (6)

are shiftable. The utility of expansion (5) is that it allows us to factor out p(ı) from the kernel. In particular, by replacing ϕ with ψ, it was shown in [22] that we can compute the bilateral filter using fast convolutions. The Taylor polynomial of ϕ was used for ψ in [24], and the Fourier approximation of ϕ was used in [22], [25], [28].

A. Shiftable Approximation
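The cosine case in (6) is the simplest instance of (5): by the addition theorem, cos(ω(x − τ)) = cos(ωτ) cos(ωx) + sin(ωτ) sin(ωx), so K = 2 basis functions suffice, with coefficients that depend only on the shift τ. A quick numerical check (our own illustration; the values of ω and τ are arbitrary):

```python
import numpy as np

# Shiftability of psi(x) = cos(w0 * x): the shifted kernel is an exact
# combination of two fixed basis functions, cos(w0 * x) and sin(w0 * x).
w0, tau = 0.7, 2.3
x = np.linspace(-5.0, 5.0, 201)
lhs = np.cos(w0 * (x - tau))
c1, c2 = np.cos(w0 * tau), np.sin(w0 * tau)   # coefficients depend only on tau
rhs = c1 * np.cos(w0 * x) + c2 * np.sin(w0 * x)
assert np.allclose(lhs, rhs)
```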
As mentioned earlier, it is rather difficult to obtain shiftable approximations of high-dimensional Gaussians that are practically useful. We can use separable extensions of (6) to generate such approximations. However, the difficulty is that the number of terms grows as N^ρ in this case, where N is the order in (6). We address this fundamental problem using a data-centric approach. First, we do not use a shiftable (trigonometric or polynomial) approximation of ϕ [22], [25]. Instead, for some fixed pixel ı ∈ Ω, we consider the multidimensional Gaussian ϕ(x − p(ı)) centered at p(ı) appearing in (1) and (2). Similar to (5), we wish to find ψ_1, . . . , ψ_K such that

ϕ(x − p(ı)) ≈ ∑_{k=1}^{K} c_k(ı) ψ_k(x),   (7)

where c_1, . . . , c_K depend only on p(ı). The key distinction from [22], [24], [25], [28] is that the functional approximation in (7) is defined locally, namely, the approximation is different at each pixel. Moreover, (7) is neither a polynomial nor a trigonometric function. In spite of this, we continue to use the term "shiftable" to highlight the fact that the approximation is based on "shifts" of the basis functions ψ_1, . . . , ψ_K. In this work, we propose to use translates of the original kernel as the basis functions, i.e., we set ψ_k(x) = ϕ(x − µ_k). One can in principle use different basis functions, but we will not investigate this possibility in this paper.

The important consideration is that the shifts {µ_k} are set globally. Since the range kernel is defined via the guide p, we propose to cluster the range space of p and use the cluster centroids for {µ_k}. That is, we partition the range space

Θ = {p(ı) : ı ∈ Ω},   (8)

into clusters C_1, . . . , C_K, and set µ_k to be the centroid of C_k. In summary, for each ı, we require that

ϕ(x − p(ı)) ≈ ∑_{k=1}^{K} c_k(ı) ϕ(x − µ_k).   (9)

At this point, notice the formal resemblance between (5) and (9).
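As a 1-D sanity check of an approximation of the form (9) (our own illustration; the shifts below are arbitrary stand-ins for cluster centroids): a Gaussian centered at an intermediate intensity is reproduced quite accurately, over the whole intensity axis, by a least-squares combination of a few fixed Gaussian translates.

```python
import numpy as np

sigma_r = 40.0
phi = lambda t: np.exp(-t**2 / (2 * sigma_r**2))   # 1-D Gaussian range kernel
mu = np.array([40.0, 80.0, 120.0, 160.0, 200.0])   # K = 5 fixed shifts
x = np.linspace(0.0, 255.0, 256)
p_i = 100.0                                        # kernel center at this pixel
B = np.stack([phi(x - m) for m in mu], axis=1)     # basis translates on the range
c, *_ = np.linalg.lstsq(B, phi(x - p_i), rcond=None)
max_err = np.max(np.abs(B @ c - phi(x - p_i)))     # small: the fit is close
```

This fits over the full intensity grid; the paper instead restricts the fit to the shifts themselves, which reduces the per-pixel cost to a single matrix-vector product with a precomputed pseudo-inverse.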
Hence, we will refer to (9) as a shiftable approximation in the rest of the discussion, even though it is not shiftable in the sense of (5).

Note that we need a good approximation in (9) only for x ∈ Θ_ı = {p(ı − j) : j ∈ W}. This is because the samples appearing in (1) and (2) are ϕ(x − p(ı)), where x takes values in Θ_ı. To determine the coefficients, it is therefore natural to consider the following problem:

min_{c(ı) ∈ R^K} ∑_{x ∈ Θ_ı} ( ϕ(x − p(ı)) − ∑_{k=1}^{K} c_k(ı) ϕ(x − µ_k) )².   (10)

The difficulty is that this requires us to solve an expensive least-squares problem (the matrix to be inverted is large) for each ı. More importantly, we have a different inversion at each pixel. This is time consuming and impractical. On the other hand, notice that Θ_ı is included in Θ. Hence, we could perform the fitting over Θ. Instead, we set Λ = {µ_1, . . . , µ_K} and choose to fit over Λ, which are quantized representatives of the points in Θ. In short, while the cluster centers Λ most likely do not belong to Θ_ı, they are representative of the larger set Θ. Therefore, instead of (10), we consider the surrogate problem:

min_{c(ı) ∈ R^K} ∑_{x ∈ Λ} ( ϕ(x − p(ı)) − ∑_{k=1}^{K} c_k(ı) ϕ(x − µ_k) )².

In matrix notation, we can compactly write this as

min_{c(ı) ∈ R^K} ‖A c(ı) − b(ı)‖²,   (11)

where A ∈ R^{K×K} and b(ı) ∈ R^K are given by

A_{kl} = ϕ(µ_k − µ_l)   and   b_k(ı) = ϕ(µ_k − p(ı)).   (12)

The solution of (11) is c(ı) = A† b(ı), where A† is the pseudo-inverse of A. In particular, we need to compute the pseudo-inverse just once; the coefficients at each pixel are then obtained using a matrix-vector multiplication. In Figure 6, we show that solving (11) is much faster than solving (10), and that the coefficients from (11) are close to those from (10).

Fig. 3. Proposed approximation of the Gaussian range kernel ϕ(x − p(ı)) in (9) for p(ı) = 25 (left) and p(ı) = 100 (right). The shifts {µ_k} are obtained by clustering a grayscale image whose histogram is shown above. As expected, the approximation is better for order K = 8 than for K = 4. In particular, the approximation is quite accurate over the entire dynamic range when K = 8. However, notice that for K = 4, the approximation is better in the interval where the density is higher. As the approximation is data-driven, the error gets distributed in tune with the underlying histogram.

Fig. 4. Error rates for the proposed approximation (see text for the definition of E_K), where ρ is the dimension of the range space of the guide image p. We notice that the approximation error falls off with increasing order K. The error is averaged over the color images (ρ = 3) from the Kodak dataset and hyperspectral images (ρ = 33) from [37].

The approximation results for univariate Gaussians are shown in Figure 3. Notice that our approximation depends on the distribution of p (because of the clustering), and the coefficients are obtained by solving a least-squares problem at each pixel ı. Hence, in Figure 3, a particular distribution is selected and the approximations for p(ı) = 25 and p(ı) = 100 are shown. The error rates (error vs. K) for multidimensional Gaussians corresponding to dimensions ρ = 3 and ρ = 33 are shown in Figure 4. The error E_K for a fixed K is simply (10) averaged over all the pixels.

To summarize, the steps in the proposed shiftable approximation are as follows:

• Cluster the range space Θ and use the centers for µ_k.
• Set the basis functions as ψ_k(x) = ϕ(x − µ_k).
• Set up A using (12) and compute its pseudo-inverse A†.
• At each pixel ı, set b(ı) using (12) and compute c(ı) = A† b(ı).

We note that one can freely choose different shifts (and basis functions) in (9) for different applications. That is, (9) offers a broad approximation framework, where one can consider other rules for fixing the parameters (shifts and coefficients).

B. Fast Algorithm
We now develop a fast algorithm by replacing the kernel in (1) and (2) with the approximation in (9). For 1 ≤ k ≤ K, define v_k : Ω → R^n and r_k : Ω → R to be

v_k(ı) = ∑_{j∈W} ω(j) ϕ(p(ı−j) − µ_k) f(ı−j),   (13)

and

r_k(ı) = ∑_{j∈W} ω(j) ϕ(p(ı−j) − µ_k).   (14)

For x = p(ı − j), j ∈ W, we replace ϕ(x − p(ı)) in (1) and (2) with the approximation in (9). After some manipulations (see Appendix V-A), (1) and (2) are approximated as

ĝ(ı) = (1/η̂(ı)) ∑_{k=1}^{K} c_k(ı) v_k(ı),   (15)

and

η̂(ı) = ∑_{k=1}^{K} c_k(ı) r_k(ı).   (16)

The advantage of (15) and (16) is clear once we notice that (13) and (14) can be expressed as convolutions. Indeed, by defining u_k : Ω → R^n to be u_k(ı) = b_k(ı) f(ı), where b_k is given by (12), we can write

v_k(ı) = (ω ∗ u_k)(ı) = ∑_{j∈W} ω(j) u_k(ı−j),

and

r_k(ı) = (ω ∗ b_k)(ı) = ∑_{j∈W} ω(j) b_k(ı−j),

where ∗ denotes linear convolution. In summary, using (9), we have been able to approximate the high-dimensional filtering using weighted combinations of fast convolutions. The overall process is summarized in Algorithm 1, where we use ⊕, ⊗ and ⊘ to represent pointwise addition, multiplication, and division. In steps 17 and 18, c_k denotes the mapping
c_k : Ω → R.

Algorithm 1: Fast High-Dimensional Filtering
Input: f : Ω → R^n and p : Ω → R^ρ, kernels ω and ϕ;
Parameter: number of clusters K;
Output: approximation in (15);
// CLUSTER RANGE SPACE
{µ_k} ← cluster Θ in (8) using bisecting K-means;
// FORM INTERMEDIATE IMAGES
Set up A in (12) using {µ_k} and ϕ, and compute A†;
for ı ∈ Ω do
  for k = 1, . . . , K do
    b_k(ı) ← ϕ(µ_k − p(ı));
    u_k(ı) ← b_k(ı) f(ı);
  end
end
// CALCULATE COEFFICIENTS
for ı ∈ Ω do
  c(ı) ← A† b(ı);
end
// FAST CONVOLUTIONS
Initialize v : Ω → R^n and r : Ω → R with zeros;
for k = 1, . . . , K do
  v ← v ⊕ [c_k ⊗ (ω ∗ u_k)];
  r ← r ⊕ [c_k ⊗ (ω ∗ b_k)];
end
ĝ ← v ⊘ r.

The dominant computation in Algorithm 1 is the (n + 1)K convolutions in steps 17 and 18. Clustering and the inversion of A are performed just once. Notice that b_k, which is used for computing the coefficients in step 13, is anyway required to form the intermediate images in step 9. The flowchart of our algorithm, along with typical timings for a megapixel image, is shown in Figure 2. Note that the steps in Figure 2 (resp. loops in Algorithm 1) can be parallelized: clustering the range space (loop 6), finding the coefficients (loop 12), and performing the convolutions (loop 16).

C. Implementation
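For concreteness, the pipeline of Algorithm 1 can be sketched in a few lines of Python. This is our own illustrative rendering, not the authors' MATLAB code: it uses scipy's plain k-means and Gaussian filtering as stand-ins for the bisecting K-means and the O(1) convolutions discussed below.

```python
import numpy as np
from scipy.cluster.vq import kmeans2
from scipy.ndimage import gaussian_filter

def fast_hd_filter(f, p, sigma_s, sigma_r, K=4):
    """Shiftable approximation of (1)-(2) via (13)-(16)."""
    H, W, n = f.shape
    rho = p.shape[2]
    phi = lambda d2: np.exp(-d2 / (2 * sigma_r**2))
    # Step 4: cluster the range space of the guide (stand-in for bisecting K-means).
    mu, _ = kmeans2(p.reshape(-1, rho).astype(float), K, minit='++', seed=0)
    # Step 5: A_kl = phi(mu_k - mu_l) and its pseudo-inverse, eq. (12).
    A = phi(((mu[:, None, :] - mu[None, :, :])**2).sum(-1))
    A_pinv = np.linalg.pinv(A)
    # Steps 8-9: b_k(i) = phi(mu_k - p(i)); u_k = b_k * f is formed on the fly.
    b = phi(((p[:, :, None, :] - mu[None, None, :, :])**2).sum(-1))  # (H, W, K)
    # Step 13: coefficients c(i) = A^+ b(i) at every pixel, eq. (11).
    c = b @ A_pinv.T                                                  # (H, W, K)
    # Steps 17-18: K Gaussian convolutions and weighted recombination.
    num = np.zeros((H, W, n))
    den = np.zeros((H, W))
    for k in range(K):
        den += c[..., k] * gaussian_filter(b[..., k], sigma_s)
        for ch in range(n):
            num[..., ch] += c[..., k] * gaussian_filter(b[..., k] * f[..., ch], sigma_s)
    return num / den[..., None]
```

The step numbers in the comments refer to Algorithm 1; beyond the K(n + 1) convolutions, the only per-pixel work is the matrix-vector product c(ı) = A† b(ı).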
We have used bisecting K-means [38] for iteratively clustering the range space in step 4. In particular, the cluster with the largest variance is divided into two parts (using 2-means clustering) at every iteration. This ensures that we do not end up with a few large clusters at the end. We initialize the 2-means clustering using a pair of points with the largest separation. We picked bisecting K-means over other clustering algorithms as its cost is linear in K, and it is generally faster than other clustering algorithms [38]. Importantly, its computational overhead is negligible compared to the time required for the convolutions. Needless to say, we can use any efficient clustering method and the fast algorithm would still go through.

We have used the Matlab routine pinv to compute A† in step 5. As is well known, when ω is a box or Gaussian, we can convolve using O(1) operations, i.e., the cost does not depend on the size of the spatial filter [39], [40]. The Gaussian convolutions in steps 17 and 18 are implemented using Young's O(1) algorithm [40]. For box convolutions, we used a standard recursive procedure which requires just four operations per pixel for any box size. Since the leading computations are the convolutions, and there are O(Kn) convolutions, the overall complexity is O(K(n + ρ)); the additional O(Kρ) term is for computing (12). Importantly, we are able to obtain a speedup of S^d/K over the brute-force computation of (1) and (2). The storage complexity of Algorithm 1 is clearly linear in n, K, as well as the image size. The MATLAB implementation of our algorithm is available here: https://github.com/pravin1390/FastHDFilter.

D. Approximation Accuracy
We note that the algorithm in [32] is a variant of our method. Indeed, for 1 ≤ k ≤ K, if we set the coefficients in (9) to be

c_k(ı) = 1 if p(ı) ∈ C_k, and 0 otherwise,   (17)

then we obtain the fast algorithm in [32]. In other words, the filtering is performed on a cluster-by-cluster basis in this case. Correspondingly, (15) reduces to

ĝ(ı) = v_s(ı) / r_s(ı),   (18)

assuming that p(ı) is in cluster C_s.

In (15) and (16), we weight the results from the K clusters, where the weights are obtained using the optimization in (11). We will demonstrate in Section III that the weighting helps in reducing the approximation error. Unfortunately, it also makes the analysis of the algorithm difficult. Intuitively though, one would expect the error from (15) to be less than that from (18). For the latter approximation, we have the following result [32].

Theorem 1 (Error bound):
Suppose that ω and ϕ are non-negative, and that the latter is Lipschitz continuous, i.e., for some L > 0, |ϕ(x) − ϕ(y)| ≤ L ‖x − y‖ for any x and y in R^ρ. Then, for some constant C > 0,

∑_{ı∈Ω} ‖ĝ(ı) − g(ı)‖ ≤ C L n |W| E_K,   (19)

where ĝ is given by (18), and E_K is the clustering error:

E_K = ∑_{k=1}^{K} ∑_{p(ı)∈C_k} ‖p(ı) − µ_k‖.   (20)

We note that the assumptions in Theorem 1 are valid when ω is a box or Gaussian, and ϕ is a multidimensional Gaussian. For completeness, we provide the derivation of Theorem 1 in Appendix V-B. For several clustering methods, the clustering error E_K vanishes as K → ∞ [38]. For any of these methods, we can approximate (1) with arbitrary accuracy provided K is sufficiently large. We note that such a guarantee is not available for existing fast approximation algorithms for high-dimensional filtering [14], [17], [18].

III. NUMERICAL RESULTS
We now report some numerical results for high-dimensional filtering. We have used the Matlab implementation of Algorithm 1. For fair timing comparisons with Adaptive Manifolds
For Adaptive Manifolds (AM) [14] and Global Color Sparseness (GCS) [19], we have used the Matlab code provided by the authors. The timing analysis was performed on an Intel 4-core 3.4 GHz machine with 32 GB memory. The grayscale and color images used for our experiments were obtained from standard databases. The infrared and hyperspectral images are the ones used in [14] and [37], [41]. To compare the filtering accuracy with existing methods, we have fixed the timings by adjusting the respective approximation order. The objective of this section is to demonstrate that, in spite of its simple formulation, our algorithm is competitive with existing fast approximations of bilateral and nonlocal means filtering.

We have used an isotropic Gaussian range kernel for bilateral and nonlocal means filtering: ϕ(x) = exp(−‖x‖²/2σ_r²), where the standard deviation σ_r controls the effective width (influence) of the kernel. We recall from (1) and (2) that the input to ϕ is the difference x = p(ı) − p(ı − j). In one of the experiments, ϕ(x) is an anisotropic Gaussian; we have explicitly mentioned this at the right place. The spatial kernel ω for bilateral filtering is also Gaussian: ω(ı) = exp(−‖ı‖²/2σ_s²), where σ_s is again the standard deviation. For nonlocal means (NLM), ω is a box filter, namely, no spatial weighting is used. The filter width for bilateral filtering is S = 3σ_s, while that for NLM is explicitly mentioned (typically, S = 10).

Following [24], [26], the error between the outputs of the brute-force implementation and the fast algorithm is measured using the peak signal-to-noise ratio (PSNR). This is given by

PSNR = −10 log₁₀(MSE), where MSE = |Ω|⁻¹ Σ_{ı ∈ Ω} ‖ĝ(ı) − g(ı)‖²,   (21)

and |Ω| is the number of pixels. Note that PSNR is also used to measure denoising performance; in this case, ĝ(ı) and g(ı) are the denoised and clean images in (21). It should be evident from the context as to what is being measured using PSNR. We also use SSIM [42] for measuring visual quality.
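As a concrete reference for the metric in (21), the following is a minimal sketch, assuming the images are floating-point H × W × C arrays scaled to [0, 1] (which is what the formula PSNR = −10 log₁₀(MSE) presumes); the function name is ours.

```python
import numpy as np

def approximation_psnr(g_hat, g):
    """PSNR of (21): MSE = |Omega|^{-1} sum_i ||g_hat(i) - g(i)||^2,
    PSNR = -10 log10(MSE). Images are H x W x C arrays in [0, 1]."""
    g_hat = np.asarray(g_hat, dtype=np.float64)
    g = np.asarray(g, dtype=np.float64)
    num_pixels = g.shape[0] * g.shape[1]           # |Omega|
    mse = np.sum((g_hat - g) ** 2) / num_pixels    # squared norms averaged over pixels
    return -10.0 * np.log10(mse)
```

Here the norm in (21) is over the C range channels, so the channel dimension is summed (not averaged) before dividing by the number of pixels.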
A. Grayscale Bilateral Filtering
As mentioned earlier, although the proposed method is targeted at high-dimensional filtering, we can outperform [26], which is considered the state-of-the-art for bilateral filtering of grayscale images. We demonstrate this in Table I and Figure 5 using a couple of examples. In particular, notice in Figure 5 that artifacts are visible in Yang's result (when K is small). Our PSNR is markedly higher, and our output does not show any artifacts. A detailed PSNR comparison is provided in Table I. We notice that, for the same timing, our PSNR is consistently better than [26]. Note that we have set K = 4 as the number of clusters (resp. quantization bins) for our method (resp. [26]).

For completeness, we have reported the timings for different σ_s in Table II, where we have used Young's fast algorithm [40] for performing the Gaussian convolutions. As expected, the timing essentially does not change with σ_s.

(Standard databases: https://goo.gl/821N2G, https://goo.gl/2fcNmu, https://goo.gl/MvxCMX.)

Fig. 5. Visual comparison of bilateral filtering for the House image: (a) input; (b) brute-force filtering; (c) proposed (43.5 dB); (d) Yang (– dB). Filter parameters: σ_s = 10 and σ_r = 30. For a fair comparison, we have used four clusters for the proposed method and four bins for Yang's method [26]. Notice that our result is better than Yang's, both visually (compare boxed areas) and in terms of PSNR. This is because we use non-uniform quantization (clustering) and data-driven approximation (see Figure 3).

TABLE I
Comparison of approximation accuracy (PSNR) for bilateral filtering of Barbara (– × –). Average timing is – ms for Yang and – ms for the proposed method.

σ_s \ σ_r   |  10   |  20   |  30   |  40   |  50   |  60   |  70   |  80   |  90   | 100
Yang [26]:
–           | –.48  | 53.01 | 53.48 | 53.65 | 53.48 | 53.17 | 53.01 | 52.71 | 52.28 | 52.–
–           | –.23  | 53.17 | 53.81 | 53.98 | 53.98 | 53.81 | 53.81 | 53.98 | 53.98 | 53.–
–           | –.86  | 53.32 | 53.81 | 53.98 | 53.82 | 53.65 | 53.65 | 53.82 | 53.98 | 54.–
Proposed:
–           | –.08  | 64.61 | 61.69 | 59.50 | 57.50 | 56.09 | 54.88 | 53.81 | 53.32 | 52.–
–           | –.53  | 68.13 | 64.61 | 62.11 | 60.53 | 59.19 | 58.03 | 57.25 | 56.77 | 56.–
–           | –.01  | 70.07 | 67.30 | 64.61 | 62.56 | 60.89 | 60.17 | 59.19 | 58.88 | 58.–

TABLE II
Timings for bilateral filtering of Barbara for varying σ_s (σ_r = 100, K = 16).

σ_s       |  10 |  20 |  30 |  40 |  50 |  60 |  70 |  80
Time (ms) | 496 | 498 | 475 | 486 | 500 | 489 | 485 | 481
In Figure 6, we have compared the bilateral filtering results obtained using (10) and (11). Since (11) is an approximation of (10), we see a drop in PSNR when using (11); importantly, though, there is a significant speedup. In fact, the time required using (10) is more than that of the brute-force implementation.
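The grayscale pipeline evaluated above can be sketched in a few lines: cluster the intensities, evaluate the range kernel only at the K cluster centers, and reduce the filtering to K spatial Gaussian convolutions. This is a simplified stand-in that uses hard nearest-cluster coefficients (in the spirit of (18)); the optimized coefficients of (15) are not reproduced here, and the function and parameter names are ours.

```python
import numpy as np
from scipy.cluster.vq import kmeans2
from scipy.ndimage import gaussian_filter

def fast_bilateral_gray(f, sigma_s, sigma_r, K=4):
    """Cluster-and-convolve bilateral filtering (hard-assignment variant).

    f is a 2-D float array. The range kernel is evaluated only at the K
    cluster centers mu_k, so the per-pixel sums reduce to K Gaussian
    convolutions: v_k = omega * (phi_k f) and r_k = omega * phi_k.
    """
    centers, labels = kmeans2(f.reshape(-1, 1), K, minit='++', seed=0)
    centers = centers.ravel()
    labels = labels.reshape(f.shape)
    out = np.zeros_like(f)
    for k in range(K):
        # range kernel evaluated at the cluster center mu_k
        phi_k = np.exp(-(f - centers[k]) ** 2 / (2 * sigma_r ** 2))
        v_k = gaussian_filter(phi_k * f, sigma_s)  # numerator convolution
        r_k = gaussian_filter(phi_k, sigma_s)      # denominator convolution
        mask = labels == k
        out[mask] = v_k[mask] / r_k[mask]
    return out
```

The cost is K spatial convolutions regardless of σ_s (when a recursive Gaussian filter is used), which is consistent with the flat timings in Table II.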
B. Color Bilateral Filtering
We next perform fast bilateral filtering of color images, for which the state-of-the-art methods are AM [14] and GCS [19]. We have also compared with our previous work [32], which we simply refer to as "Clustering". In Figure 7, we have studied the effect of the number of clusters on the
filtering accuracy, for our method, GCS, AM, and [32]. While GCS performs better for small K (coarse approximation), our method outperforms GCS for larger K. Note that the number of manifolds can only take dyadic (discrete) values [14].

Fig. 6. Bilateral filtering of a grayscale image: (a) input; (b) brute-force (– ms); (c) using (10) (– sec, – dB); (d) using (11) (27 ms, 32.6 dB). Parameters: σ_s = 2, σ_r = 20, and K = 4. The coefficients in (c) and (d) are obtained from (10) and (11), respectively.
Fig. 7. Scaling of filtering accuracy (average PSNR) with the number of clusters (manifolds for AM), for bilateral filtering of color images from the Kodak dataset. Methods: Clustering, GCS, Proposed, AM. The parameters are σ_s = 20 and σ_r = 40.

TABLE III
PSNR for color images (σ_s = 10, σ_r = 50).

size \ K |   –   |   –   |  –
1080P    | –.54  | 48.68 | 53.–
4K       | –.29  | 47.42 | 53.–
8K       | –.25  | 46.90 | 54.–

Fig. 8. Timing (seconds) vs. resolution (MP) for color filtering.
In Table III, we report the filtering accuracy for full HD, 4K ultra HD, and 8K ultra HD images by varying the number of clusters K. For a given resolution and K, we have averaged the PSNR over six high-resolution images of that resolution. Notice that, irrespective of the image resolution, the PSNR values are high with just a few clusters. In Figure 8, the timings for filtering color images are plotted for varying image sizes (from – MP to – MP; images obtained from https://hdqwalls.com). This plot verifies the linear dependence of the timing on the image size, irrespective of the number of clusters used.
Fig. 9. Approximation accuracy (PSNR) for bilateral filtering of color Barbara (– × – × –), at different σ_r values, averaged over σ_s = 10, –, –. Methods: Proposed, GCS, AM, Clustering. The average timings are: Proposed: – ms, Global Color Sparseness: – ms, Adaptive Manifolds: – ms, and Clustering [32]: – ms.

In Figure 9, we have compared the PSNR and timings for bilateral filtering of a color image, at different σ_s and σ_r values. We have used K = 16 clusters for all three methods: proposed, GCS, and [32]. We use the default settings for the number of manifolds in AM. Notice that we are better by – dB, while the timings are roughly the same. In particular, notice the PSNR improvement that we obtain over [32] by optimizing the coefficients, i.e., by using (15) in place of (18). Interestingly, [32] can outperform GCS and AM at large σ_r values.

A visual comparison for color bilateral filtering was already shown in Figure 1, along with the error images, timings, and PSNRs. In particular, notice that the error from our method is smaller than that of AM and GCS. This is interesting, as our method is conceptually simpler than these algorithms: AM has a complex formulation and GCS is based on two-pass filtering. In Figure 10, a visual result is provided to highlight how the approximation accuracy scales with K. Notice that the accuracy improves significantly as K increases. In fact, the approximation already resembles the result of brute-force filtering for moderate K.

C. Color Nonlocal Means
In NLM, f(ı) is the noisy image (corrupted with additive Gaussian noise) and p(ı) is a square patch of pixels around ı. In particular, if the patch is m × m, then the dimension of p(ı) is ρ = 3m² for a color image. A popular trick to accelerate NLM (called PCA-NLM) is to project the patch onto a lower-dimensional space using PCA [21]. We note that PCA was used for reducing the patch dimension (as explained earlier) for both AM and our algorithm.

A visual comparison is provided in Figure 11 for low and high noise, where the superior performance of our method over AM is evident. We have also compared with a state-of-the-art Gaussian denoiser (CPU implementation: https://github.com/cszn/FFDNet) based on deep neural nets [12]. It is not surprising that the result from [12] is better both in terms of PSNR and visual quality. However, we are somewhat faster. Figure 11 also suggests that our method is visually closer to PCA-NLM at both low and high noise levels, albeit with a significant speedup (from minutes down to a few seconds). In Figure 12, we have reported Gaussian denoising results on the Kodak dataset for different noise levels (σ).

Fig. 10. Bilateral filtering of Peppers for different values of K: (a) K = 2; (b) K = 4; (c) K = 8; (d) K = 16; (e) brute-force (σ_s = 10 and σ_r = 40). Notice how the accuracy improves with K.

Fig. 11. Color image denoising (Gaussian noise) at σ = 20 (top) and σ = 63 (bottom): clean/noisy input, PCA-NLM [21], ours (1.7 sec, 31.2 dB, 0.92 at σ = 20; 2.7 sec, 26.8 dB, 0.82 at σ = 63), AM [14], and the deep denoiser [12]. For PCA-NLM, AM, and our method, PCA was used to reduce the range dimension. The number of manifolds was set automatically in the AM code [14]. While the
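The patch construction (ρ = 3m² for an m × m color patch) and the PCA reduction used by PCA-NLM, AM, and our method can be sketched as follows. This is a minimal illustration; the function names, reflection padding, and SVD-based PCA are our choices, not prescribed by the paper.

```python
import numpy as np

def color_patches(img, m):
    """Stack the m x m color patch around every pixel as a row vector,
    giving patches of dimension rho = 3 * m**2 (borders by reflection)."""
    h, w, _ = img.shape
    r = m // 2
    padded = np.pad(img, ((r, r), (r, r), (0, 0)), mode='reflect')
    rows = [padded[i:i + m, j:j + m, :].ravel()
            for i in range(h) for j in range(w)]
    return np.array(rows)  # shape (h * w, 3 * m**2)

def pca_project(patches, d):
    """Project the patches onto their d leading principal components,
    the dimensionality-reduction trick used by PCA-NLM."""
    centered = patches - patches.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)  # rows of vt = PCs
    return centered @ vt[:d].T
```

The reduced d-dimensional vectors then play the role of the guide p(ı) in the clustering and filtering steps.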
proposed method and AM perform similarly for small σ, the former is more robust when σ is large. In fact, the denoising performance of AM degrades rapidly with the increase in σ. Another important point is that our result is close to PCA-NLM in terms of PSNR and SSIM for all noise levels.

Fig. 12. Comparison of the denoising performances (PSNR and SSIM [42], averaged over the images from the Kodak dataset) of PCA-NLM [21], Adaptive Manifolds, and the proposed method, at different noise levels σ. The same PCA dimension was used for all methods. The parameters are K = 31, S = 10, and m = 3.

D. Hyperspectral Denoising
We next perform denoising of hyperspectral images using the bilateral filter. For such images, the range dimension (i.e., the number of spectral bands) is high. In Figure 13, denoising results are exclusively shown to highlight the speedup of our approximation over the brute-force implementation for a multiband image. Notice the dramatic reduction in timing compared to the brute-force implementation. Moreover, notice that the denoising quality is in fact quite good for our method (compare the text in the boxed regions).

Fig. 13. Denoising of a hyperspectral image [37] (1340 × – × – bands) using bilateral filtering (σ_s = 5 and σ_r = 100): (a) input; (b) one of the noisy bands (440 nm); the same band after brute-force filtering (– min) in (c) and the proposed method (15 sec) in (d). We have shown one of the bands just for visualization; the filtering was performed on the entire hyperspectral image and not on a band-by-band basis.

Moreover, we also compare with recent optimization-based
denoising methods [44], [45], where the parameters are tuned accordingly. Visual and quantitative comparisons are shown in Figure 15 (Pavia dataset). For quantitative comparisons, we have used MPSNR and MSSIM, which are simply the PSNR and SSIM values averaged over the spectral bands. We notice that the restoration obtained using our method is better than [44], [45] (source code made public by the authors), which is supported by the metrics shown in the figures. The same is visually evident from a comparison of the boxed regions in Figure 15. In particular, the color is not restored properly in [44], and grains can be seen in [45]. As expected, we are much faster than these iterative methods, since we perform the filtering in one shot.

Fig. 14. Nonlocal means denoising of a low-light image using [43] and the proposed method, with and without infrared data [33]: (a) infrared input; (b) noisy low-light input; (c) without infrared; (d) with infrared data; (e) [43]. Notice that we obtain better denoising by taking the infrared information into account (see zoomed sections).

Fig. 15. Hyperspectral denoising of a natural image (– × – × – bands) corrupted with Gaussian noise (σ = 0.–): (a) clean/noisy; (b) proposed; (c) [44]; (d) [45]. For our method, σ_s = 3, σ_r = 100, and K = 32. We used – iterations for [44] and – iteration for [45]. The (timing, MPSNR, MSSIM) for (b), (c), and (d) are (36 sec, 31.29 dB, 0.87), (– sec, – dB, –), and (– sec, – dB, –).

E. Low-light Denoising
Finally, we use our fast algorithm for NLM denoising of low-light images using additional infrared data [33]. In Figure 14, we have shown a visual result which compares our method, both with and without infrared data, and [43] (an optimization method). The patch and search window sizes are – × – and – × –. We used an anisotropic Gaussian kernel, where σ_r = 50 for the low-light data and σ_r = 10 for the infrared data. The patch dimension was reduced to six using PCA, and the infrared data was added as the seventh dimension. We used – clusters for our method. In Figure 14, notice that some salient features are lost if we solely use the low-light input as the guide. Instead, if we add the infrared data to the guide, then the restored image is sharper and the features are more apparent compared to [43] and NLM without infrared data.
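The anisotropic range kernel used above (a different σ_r on the PCA dimensions and on the infrared dimension) can be folded into the guide features themselves: dividing each dimension by its σ turns exp(−Σᵢ xᵢ²/2σᵢ²) into a unit-sigma isotropic Gaussian on the scaled features, after which any isotropic implementation applies. A sketch under our own naming assumptions:

```python
import numpy as np

def scaled_guide(lowlight_feats, infrared, sigma_low=50.0, sigma_ir=10.0):
    """Append infrared as an extra guide dimension and divide each dimension
    by its sigma, so that the anisotropic kernel
    exp(-sum_i x_i^2 / (2 sigma_i^2)) becomes a unit-sigma isotropic
    Gaussian on the scaled features. lowlight_feats: N x d, infrared: N."""
    guide = np.column_stack([lowlight_feats, infrared])
    sigmas = np.concatenate([np.full(lowlight_feats.shape[1], sigma_low),
                             [sigma_ir]])
    return guide / sigmas
```

This whitening trick is a standard identity for Gaussian kernels, not a detail specific to the paper; it simply lets the same clustering-and-convolution machinery handle the anisotropic case.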
Finally, we apply the proposed method to flow-field denoising. In particular, sharp directional changes in the flow can be preserved much better using NLM, while simultaneously removing the noise (with Gaussian filtering as the baseline). An instance of flow-field denoising [35] is shown in Figure 16 and compared with Gaussian smoothing.

Fig. 16. Denoising of a synthetic flow-field using the proposed NLM approximation and Gaussian smoothing: (a) clean flow-field; (b) noisy flow-field (σ = 7.–); (c) NLM output; (d) Gaussian smoothing. The synthetic flow-field used is u(x, y, z) = x − z e^y, v(x, y, z) = y − x z e^y, and w(x, y, z) = z − x e^y. The patch and search window sizes for NLM are – × – × – and – × – × –, and σ_r = 7.–. We have used PCA to reduce the dimension of the patch from – to –. Notice that isotropic Gaussian smoothing fails to preserve sharp changes in the flow direction.

IV. CONCLUSION

We proposed a framework for fast high-dimensional filtering by approximating both the data and the kernel. In particular, we derived an algorithm that fuses the scalability of the former with the approximation capability of the latter. At the core of our algorithm is the concept of shiftable approximation, which allows us to interpret the coefficients in the framework of Durand et al. [23] from an approximation-theoretic point of view. We proposed an efficient method for determining the shifts (centers) and the coefficients using K-means clustering (inspired by [19]) and data-driven optimization. Though the proposed algorithm is conceptually simple and easy to implement (about – lines of code), it was shown to yield promising results for diverse applications. In particular, our algorithm was shown to be competitive with state-of-the-art methods for fast bilateral and nonlocal means filtering of color images.

ACKNOWLEDGEMENTS

The authors thank the editor and the anonymous reviewers for their comments and suggestions. We also thank the authors of [12], [14], [19], [44] for distributing their code. We also thank Jingxiang Yang for helping us debug the code of [45].

V. APPENDIX
A. Derivation of (15) and (16)

We recall that the basis of the approximation is the following: for x = p(ı − j), j ∈ W, we replace ϕ(x − p(ı)) in (1) and (2) with Σ_{k=1}^{K} c_k(ı) ϕ(x − μ_k). That is, we approximate the numerator of (1) with

Σ_{j ∈ W} ω(j) ( Σ_{k=1}^{K} c_k(ı) ϕ(p(ı − j) − μ_k) ) f(ı − j),   (22)

and (2) with

Σ_{j ∈ W} ω(j) ( Σ_{k=1}^{K} c_k(ı) ϕ(p(ı − j) − μ_k) ).   (23)

Exchanging the sums, we can write (22) as

Σ_{k=1}^{K} c_k(ı) Σ_{j ∈ W} ω(j) ϕ(p(ı − j) − μ_k) f(ı − j) = Σ_{k=1}^{K} c_k(ı) v_k(ı),

where v_k is defined in (13). Similarly, we can write (23) as Σ_{k=1}^{K} c_k(ı) r_k(ı), where r_k is defined in (14). This completes the derivation of (15) and (16).

B. Proof of Theorem 1
For notational convenience, let g(ı) = ξ(ı)/η(ı), where

ξ(ı) = Σ_{j ∈ W} ω(j) ϕ(p(ı − j) − p(ı)) f(ı − j),

and η is as defined in (2). For some fixed ı ∈ Ω, assume that p(ı) ∈ C_s, where 1 ≤ s ≤ K. Then, following (18),

ĝ(ı) = v_s(ı)/r_s(ı).   (24)

Note that we can write

ĝ(ı) − g(ı) = (1/η(ı)) ( ĝ(ı) (η(ı) − r_s(ı)) + (v_s(ı) − ξ(ı)) ).

By the triangle inequality, we can bound ‖ĝ(ı) − g(ı)‖ using

(1/|η(ı)|) ( √n R |η(ı) − r_s(ı)| + ‖v_s(ı) − ξ(ı)‖ ),   (25)

where we have used the fact that ĝ(ı) ∈ [0, R]^n. This follows from (13), (14), and (24). Now,

v_s(ı) − ξ(ı) = Σ_{j ∈ W} ω(j) δ(ı, j) f(ı − j),

where δ(ı, j) = ϕ(p(ı − j) − p(ı)) − ϕ(p(ı − j) − μ_s). By the Lipschitz property,

|δ(ı, j)| ≤ L ‖p(ı) − μ_s‖.

Note that we can assume that each ω(j) is in [0, 1]. This is simply because the weights appear both in the numerator and denominator in (1) and (24). Moreover, since the range of f is [0, R]^n, using the triangle inequality again, we obtain

‖v_s(ı) − ξ(ı)‖ ≤ L |W| √n R ‖p(ı) − μ_s‖.   (26)

Similarly,

|η(ı) − r_s(ı)| ≤ L |W| ‖p(ı) − μ_s‖.   (27)

Now, since ω and ϕ are non-negative, η(ı) ≥ ω(0) ϕ(0). Using the fact that 1/|η(ı)| ≤ 1/(ω(0) ϕ(0)) along with (25), (26), and (27), we obtain

‖ĝ(ı) − g(ı)‖ ≤ C |W| √n L ‖p(ı) − μ_s‖,   (28)

where C = 2R/(ω(0) ϕ(0)). On squaring (28) and summing over all pixels, we arrive at (19).

REFERENCES

[1] P. Perona and J. Malik, "Scale-space and edge detection using anisotropic diffusion," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 12, no. 7, pp. 629–639, 1990.
[2] V. Aurich and J. Weule, "Non-linear Gaussian filters performing edge preserving diffusion," Mustererkennung, pp. 538–545, 1995.
[3] S. M. Smith and J. M. Brady, "SUSAN: A new approach to low level image processing," International Journal of Computer Vision, vol. 23, no. 1, pp. 45–78, 1997.
[4] C. Tomasi and R. Manduchi, "Bilateral filtering for gray and color images," Proc. IEEE International Conference on Computer Vision, pp. 839–846, 1998.
[5] D. Comaniciu and P. Meer, "Mean shift analysis and applications," Proc. IEEE International Conference on Computer Vision, vol. 2, pp. 1197–1203, 1999.
[6] Z. Farbman, R. Fattal, D. Lischinski, and R. Szeliski, "Edge-preserving decompositions for multi-scale tone and detail manipulation," ACM Transactions on Graphics, vol. 27, no. 3, p. 67, 2008.
[7] E. S. Gastal and M. M. Oliveira, "Domain transform for edge-aware image and video processing," ACM Transactions on Graphics, vol. 30, no. 4, pp. 69:1–69:12, 2011.
[8] K. He, J. Sun, and X. Tang, "Guided image filtering," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 6, pp. 1397–1409, 2013.
[9] A. Buades, B. Coll, and J.-M. Morel, "A non-local algorithm for image denoising," Proc. IEEE Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 60–65, 2005.
[10] S. Paris, P. Kornprobst, J. Tumblin, and F. Durand, "Bilateral filtering: Theory and applications," Foundations and Trends in Computer Graphics and Vision, vol. 4, no. 1, pp. 1–73, 2009.
[11] K. Dabov, A. Foi, V. Katkovnik, K. Egiazarian et al., "Image denoising with block-matching and 3D filtering," Proceedings of SPIE, vol. 6064, p. 606414, 2006.
[12] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang, "Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising," IEEE Transactions on Image Processing, vol. 26, no. 7, pp. 3142–3155, 2017.
[13] J. Darbon, A. Cunha, T. F. Chan, S. Osher, and G. J. Jensen, "Fast nonlocal filtering applied to electron cryomicroscopy," Proc. IEEE International Symposium on Biomedical Imaging: From Nano to Macro, pp. 1331–1334, 2008.
[14] E. S. Gastal and M. M. Oliveira, "Adaptive manifolds for real-time high-dimensional filtering," ACM Transactions on Graphics, vol. 31, no. 4, pp. 33:1–33:13, 2012.
[15] J. Wang, Y. Guo, Y. Ying, Y. Liu, and Q. Peng, "Fast non-local algorithm for image denoising," Proc. IEEE International Conference on Image Processing, pp. 1429–1432, 2006.
[16] J. Chen, S. Paris, and F. Durand, "Real-time edge-aware image processing with the bilateral grid," ACM Transactions on Graphics, vol. 26, no. 3, p. 103, 2007.
[17] A. Adams, J. Baek, and M. A. Davis, "Fast high-dimensional filtering using the permutohedral lattice," Computer Graphics Forum, vol. 29, no. 2, pp. 753–762, 2010.
[18] A. Adams, N. Gelfand, J. Dolson, and M. Levoy, "Gaussian kd-trees for fast high-dimensional filtering," ACM Transactions on Graphics, vol. 28, no. 3, p. 21, 2009.
[19] M. G. Mozerov and J. Van De Weijer, "Global color sparseness and a local statistics prior for fast bilateral filtering," IEEE Transactions on Image Processing, vol. 24, no. 12, pp. 5842–5853, 2015.
[20] S. Paris and F. Durand, "A fast approximation of the bilateral filter using a signal processing approach," Proc. European Conference on Computer Vision, pp. 568–580, 2006.
[21] T. Tasdizen, "Principal neighborhood dictionaries for nonlocal means image denoising," IEEE Transactions on Image Processing, vol. 18, no. 12, pp. 2649–2660, 2009.
[22] K. N. Chaudhury, D. Sage, and M. Unser, "Fast O(1) bilateral filtering using trigonometric range kernels," IEEE Transactions on Image Processing, vol. 20, no. 12, pp. 3376–3382, 2011.
[23] F. Durand and J. Dorsey, "Fast bilateral filtering for the display of high-dynamic-range images," ACM Transactions on Graphics, vol. 21, no. 3, pp. 257–266, 2002.
[24] F. Porikli, "Constant time O(1) bilateral filtering," Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8, 2008.
[25] K. Sugimoto and S. I. Kamata, "Compressive bilateral filtering," IEEE Transactions on Image Processing, vol. 24, no. 11, pp. 3357–3369, 2015.
[26] Q. Yang, K. H. Tan, and N. Ahuja, "Real-time O(1) bilateral filtering," Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 557–564, 2009.
[27] Q. Yang, N. Ahuja, and K.-H. Tan, "Constant time median and bilateral filtering," International Journal of Computer Vision, vol. 112, no. 3, pp. 307–318, 2015.
[28] K. N. Chaudhury and S. D. Dabhade, "Fast and provably accurate bilateral filtering," IEEE Transactions on Image Processing, vol. 25, no. 6, pp. 2519–2528, 2016.
[29] S. Ghosh and K. N. Chaudhury, "Fast bilateral filtering of vector-valued images," Proc. IEEE International Conference on Image Processing, pp. 1823–1827, 2016.
[30] S. Ghosh and K. N. Chaudhury, "Color bilateral filtering using stratified Fourier sampling," Proc. Global Conference on Signal and Information Processing, 2018.
[31] K. N. Chaudhury, "Acceleration of the shiftable O(1) algorithm for bilateral filtering and nonlocal means," IEEE Transactions on Image Processing, vol. 22, no. 4, pp. 1291–1300, 2013.
[32] P. Nair and K. N. Chaudhury, "Fast high-dimensional filtering using clustering," Proc. IEEE International Conference on Image Processing, pp. 240–244, 2017.
[33] S. Zhuo, X. Zhang, X. Miao, and T. Sim, "Enhancing low light images using near infrared flash images," Proc. IEEE International Conference on Image Processing, pp. 2537–2540, 2010.
[34] Q. Yuan, L. Zhang, and H. Shen, "Hyperspectral image denoising employing a spectral-spatial adaptive total variation model," IEEE Transactions on Geoscience and Remote Sensing, vol. 50, no. 10, pp. 3660–3677, 2012.
[35] M. A. Westenberg and T. Ertl, "Denoising 2-D vector fields by vector wavelet thresholding," Journal of WSCG, pp. 33–40, 2005.
[36] Q. Yang, "Hardware-efficient bilateral filtering for stereo matching," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 5, pp. 1026–1032, 2014.
[37] D. H. Foster, K. Amano, S. M. C. Nascimento, and M. J. Foster, "Frequency of metamerism in natural scenes," Journal of the Optical Society of America A, vol. 23, pp. 2359–2372, 2006.
[38] P.-N. Tan, M. Steinbach, and V. Kumar, Introduction to Data Mining. Addison-Wesley Longman Publishing Co., Inc., 2005.
[39] R. Deriche, "Recursively implementating the Gaussian and its derivatives," INRIA, Research Report RR-1893, 1993.
[40] I. T. Young and L. J. Van Vliet, "Recursive implementation of the Gaussian filter," Signal Processing, vol. 44, no. 2, pp. 139–151, 1995.
[41] "Hyperspectral image database." [Online]. Available: http://lesun.weebly.com/hyperspectral-data-set.html
[42] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: From error visibility to structural similarity," IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, 2004.
[43] X. Yang and X. Lin, "Brightening and denoising lowlight images," Proc. International Conference on Information, Communications and Signal Processing, pp. 1–5, 2015.
[44] F. Fan, Y. Ma, C. Li, X. Mei, J. Huang, and J. Ma, "Hyperspectral image denoising with superpixel segmentation and low-rank representation," Information Sciences, vol. 397, pp. 48–68, 2017.
[45] Y.-Q. Zhao and J. Yang, "Hyperspectral image denoising via sparse representation and low-rank constraint,"