Multiplicative Noise Removal: Nonlocal Low-Rank Model and Its Proximal Alternating Reweighted Minimization Algorithm
MMultiplicative Noise Removal: Nonlocal Low-Rank Model and ItsProximal Alternating Reweighted Minimization Algorithm ∗ Xiaoxia Liu † Jian Lu †‡ Lixin Shen § Chen Xu † Yuesheng Xu ¶ Abstract
The goal of this paper is to develop a novel numerical method for efficient multiplicativenoise removal. The nonlocal self-similarity of natural images implies that the matrices formedby their nonlocal similar patches are low-rank. By exploiting this low-rank prior with applicationto multiplicative noise removal, we propose a nonlocal low-rank model for this task and developa proximal alternating reweighted minimization (PARM) algorithm to solve the optimizationproblem resulting from the model. Specifically, we utilize a generalized nonconvex surrogateof the rank function to regularize the patch matrices and develop a new nonlocal low-rankmodel, which is a nonconvex nonsmooth optimization problem having a patchwise data fidelityand a generalized nonlocal low-rank regularization term. To solve this optimization problem,we propose the PARM algorithm, which has a proximal alternating scheme with a reweightedapproximation of its subproblem. A theoretical analysis of the proposed PARM algorithmis conducted to guarantee its global convergence to a critical point. Numerical experimentsdemonstrate that the proposed method for multiplicative noise removal significantly outperformsexisting methods such as the benchmark SAR-BM3D method in terms of the visual quality ofthe denoised images, and the PSNR (the peak-signal-to-noise ratio) and SSIM (the structuralsimilarity index measure) values.
Key words: multiplicative noise removal, nonlocal low-rank regularization, image restoration
AMS subject classification:
We consider in this paper the problem of multiplicative noise removal. To effectively restoreimages degraded by multiplicative noise, we develop a method which consists of an optimizationmodel and an iterative algorithm to solve the minimization problem. Based on the nonlocal self-similarity of natural images, we propose a nonlocal low-rank model for multiplicative noise removal. ∗ Submitted to the editors January 30, 2020.
Funding:
The research of J. Lu is partially supported by the Natural Science Foundation of China under grant61972265 and 11871348. The research of L. Shen is partially supported by the National Science Foundation undergrant DMS-1913039. The research of C. Xu is partially supported by the Natural Science Foundation of Chinaunder grant 61872429. The research of Y. Xu is partially supported by the National Science Foundation under grantDMS-1912958 and by the Natural Science Foundation of China under grant 11771464. † Shenzhen Key Laboratory of Advanced Machine Learning and Applications, College of Mathematics and Statis-tics, Shenzhen University, Shenzhen, 518060, P.R. China. ([email protected],[email protected],[email protected]) ‡ Corresponding author. § Department of Mathematics, Syracuse University, Syracuse, NY 13244, USA. ([email protected]) ¶ Department of Mathematics and Statistics, Old Dominion University, Norfolk, VA 23529, USA. ([email protected]) a r X i v : . [ m a t h . O C ] F e b he resulting model is a nonconvex nonsmooth minimization problem. We develop a proximalalternating reweighted minimization (PARM) algorithm with a convergence guarantee to efficientlysolve the problem.Multiplicative noise (i.e., speckle noise) widely occurs in coherent imaging systems due to theinterference of coherent waves scattered from distributed targets. For example, images obtainedfrom synthetic aperture radar (SAR) [24], ultrasound imaging [30] and laser imaging [27] are nat-urally contaminated with multiplicative noise. Removing multiplicative noise from such images isinevitable in many areas of applications.Methods employed for multiplicative noise removal in the literature include the total variation(TV) regularization based models, patch-based methods, and nonlocal low-rank based methods. TVregularization has been widely used to preserve edges in the restored images. In a TV regularizationbased model, the objective function is the sum of a data fidelity term and a TV regularization term.The data fidelity term measures the closeness between the desired image and the observed noisyimage, while the TV regularization term measures the total variation of a desired image or an imagein its transformed domain. The AA model [3] used the Bayesian maximum a posteriori probability(MAP) estimation to derive the data fidelity term in terms of the desired image. However, thisdata fidelity term is nonconvex and the resulting optimization problem is challenging to solve. Toovercome this challenge, the DZ model [10] modified the data fidelity term by adding a quadraticterm. As a consequence, the objective function of the DZ model becomes convex under some mildconditions. The I-DIV model [29] used the so-called I-divergence as the data fidelity term. Byperforming the logarithmic transformation, the SO model [28], the HNM model [16], and the Expmodel [23] led to convex, even strictly convex, data fidelity terms. The m V model [34] and the TwL- m V model [18] used convex or strongly convex data fidelity terms via the m th root transformation.The TV regularization based models have good performance in denoising. However, they tend toover-smooth image textures and generate unexpected artifacts.The patch-based methods make use of the redundancy of image patches to yield a restoredimage with fine details. Sparse representations of image patches have been studied in the patch-based methods for multiplicative noise removal. 
In the learned dictionary method [15], an optimalover-complete dictionary was learned from the patches of the logarithmic transformed noisy imageand then an image was restored via a variational model based on the learned dictionary and aTV regularization. The SAR-BM3D method [25] is another remarkable approach relying on asparse representation, which takes advantage of the nonlocal self-similarity of natural images [5].Nonlocal similar patches, collected as 3D groups, were identified based on a probabilistic similaritymeasure for multiplicative noise, and then were denoised by using jointly nonlocal filtering anda local linear minimum-mean-square-error shrinkage in a wavelet domain. We remark that thosemethods constrain the sparsity priors in either a fixed dictionary or a fixed wavelet domain, whichlimits their capability in multiplicative noise removal.Recently, the nonlocal low-rank based methods were extensively exploited in image processing.It is recognized that natural images are of nonlocal self-similarity. Matrices formed by nonlocalsimilar patches are low-rank, and hence the desired image can be restored by low-rank estimationsof nonlocal similar patch matrices. To regularize the rank of the matrices formed by nonlocalsimilar patches, different approximations of the rank function including the weighted nuclear normand the log-det function were adopted, see, e.g., [9, 12, 14, 17, 31].Existing studies have shown impressive empirical performance of nonlocal low-rank based meth-ods. However, theoretical analysis of the existing methods is missing and there is little work onapplications of nonlocal low-rank based methods to multiplicative noise removal. To address thisissue, we propose to develop a new nonlocal low-rank based method that is theoretically and prac-tically suitable for multiplicative noise removal. The proposed method includes a novel nonlocal2ow-rank model and an efficient iterative algorithm to solve the proposed model with a convergenceguarantee. We explore the underlying low-rank prior of the patch matrices and propose a nonlocallow-rank model for multiplicative noise removal. The resulting optimization problem is nonconvexand nonsmooth, which is challenging to design efficient and theoretically convergence-guaranteedalgorithms to solve. In fact, the well-known alternating direction method of multipliers (ADMM)algorithm is not applicable to this optimization problem, and the alternating minimization (AM)algorithm and the augmented Lagrange multiplier (ALM) algorithm may not converge [4, 33]. Toaddress this challenge on developing an efficient convergent algorithm, we propose a proximal al-ternating minimization scheme with a reweighted approximation of its subproblem and furtheruse the Kurdyka-(cid:32)Lojasiewicz theory [2, 4] to prove its global convergence to a critical point. Theexperiments demonstrate that the proposed nonlocal low-rank based method is well suitable formultiplicative noise removal.The main contributions of this work are: • We propose a nonlocal low-rank model for multiplicative noise removal. This model is for-mulated in the log-transformed domain of images. The objective function of the model asthe sum of a fidelity term and a regularization term is nonconvex and nonsmooth. Its fidelityterm is adapted from the corresponding one in the Exp model [23] to patches, and is strictlyconvex under certain conditions. 
Its regularization term is the application of the compositionof the rank operator with the patch extraction operator onto the underlying image. Due tothe difficulties caused by the composition and the rank function in solving this model, wepropose to split this composition by introducing an auxiliary variable and to approximate therank function using a smooth concave function. • We develop a proximal alternating reweighted minimization (PARM) algorithm for solvingthe proposed nonlocal low-rank model. The key in the PARM algorithm is to deal with theconcave function that is used to approximate the rank function in the model. We propose toapproximate this concave function by its affine approximation (i.e., the reweighted approxi-mation) in each iteration of the PARM algorithm. This approach could be useful for a widerange of nonlocal low-rank models. • We provide a theoretical analysis of the PARM algorithm which guarantees its global con-vergence to a critical point, in contrast to the practically used algorithms such as in [9, 32]which are lack of convergence analysis. • We give a detailed description on the implementation of the PARM algorithm includingparameter settings, patch sizes, and search windows. We also test the proposed method forvarious images at different noise levels. Furthermore, we conduct the performance comparisonof the proposed method with many existing ones for multiplicative noise removal, with respectto the visual quality of the denoised images, and the PSNR (the peak-signal-to-noise ratio)and SSIM (the structural similarity index measure) values.This paper is organized into six sections. In section 2, we present the nonlocal low-rank modelfor multiplicative noise removal. The proposed PARM algorithm to solve the resulting nonconvexnonsmooth optimization problem is presented in section 3. Section 4 is devoted to the convergenceanalysis of the proposed algorithm. In section 5, we demonstrate the efficiency of the new methodnumerically by experiment results. Section 6 concludes this paper.3
Nonlocal Low-Rank Model for Multiplicative Noise Removal
We propose in this section a nonlocal low-rank model for multiplicative noise removal by exploit-ing low-rank priors of the nonlocal similar patch matrices extracted from the underlying images.Throughout this paper, matrices are bold capital, vectors are bold lowercase and scalars orentries are not bold. Given x , y ∈ R d , (cid:104) x , y (cid:105) := (cid:80) di =1 (cid:104) x i , y i (cid:105) is the standard inner product and (cid:107) x (cid:107) := (cid:112) (cid:104) x , x (cid:105) is the standard (cid:96) norm. Let S d + denote the set of symmetric positive definitematrices of size d × d and let I d denote the identity matrix of size d × d . Given x , y ∈ R d and H ∈ S d + , (cid:104) x , y (cid:105) H := (cid:104) x , Hy (cid:105) is the H -weighted inner product and (cid:107) x (cid:107) H := (cid:112) (cid:104) x , x (cid:105) H is the H -weighted (cid:96) norm. Given X , Y ∈ R m × n , (cid:104) X , Y (cid:105) F := tr( X (cid:62) Y ) is the Frobenius inner productand (cid:107) X (cid:107) F := (cid:112) (cid:104) X , X (cid:105) F is the Frobenius norm.Multiplicative noise removal in this paper refers to reducing multiplicative noise in an L -lookimage obtained by the multi-look averaging technique. An L -look image v ∈ R N in the intensityformat degraded by multiplicative noise can be modeled as v = uη , where u ∈ R N is the desired image to be restored, η ∈ R N is the multiplicative noise and themultiplication operation is a componentwise operation. The multiplicative noise in each pixelfollows a Gamma distribution [11], whose probability distribution function is defined as p ( η i ) = L L η L − i Γ( L ) e − Lη i , i = 1 , . . . , N, which has mean 1 and variance of 1 /L . A list of TV regularization based models for multiplicativenoise removal is presented in Table 1. Table 1: TV regularization based models for multiplicative noise removal.
Name Model Φ Transform. Properties of
ΦAA [3] min u ∈ R N + (cid:104) log u + vu , (cid:105) + λ (cid:107) u (cid:107) T V – nonconvexDZ [10] min u ∈ R N + (cid:104) log u + vu , (cid:105) + ρ (cid:107) (cid:112) uv − (cid:107) + λ (cid:107) u (cid:107) T V – strictly convex if ρ ≥ √ I-DIV [29] min u ∈ R N + (cid:104) u − v log u , (cid:105) + λ (cid:107) u (cid:107) T V – convexSO [28] min u ∈ R N (cid:104) x + v e x , (cid:105) + λ (cid:107) x (cid:107) T V x = log u strictly convexHNM [16] min x ∈ R N , w ∈ R N (cid:104) x + v e x , (cid:105) + ρ (cid:107) x − w (cid:107) + λ (cid:107) w (cid:107) T V x = log u convexExp [23] min x ∈ R N (cid:104) x + v e x , (cid:105) + ρ (cid:107) (cid:113) e x v − γ (cid:107) + λ (cid:107) x (cid:107) T V x = log u strictly convex if ργ ≤ m V [34] min x ∈ m √ U (cid:104) m log x + vx m , (cid:105) + λ (cid:107) x (cid:107) T V x = m √ u convex if m is suf-ficiently largeTwL- m V [18] min a> , x ∈ m √ U s (cid:104) a, x s (cid:105) − s (cid:104) m log a − s vx m , (cid:105) + λ (cid:107) x (cid:107) T V x = m √ u strongly convexwith respect to x R + = (0 , + ∞ ); 2. U = (0 , C ] N , C ∈ R + ; 3. λ > ρ > γ ≥
1, and s ≥
1; 4. denotes the vector whose entries are allones; 5. The division, multiplication, logarithmic, exponential, square root operations are componentwise operations. In the following, we present our nonlocal low-rank model for multiplicative noise removal stepby step. According to the nonlocal self-similarity of natural images, for an image patch, we can findnonlocal similar patches across the image or within a local window [5]. We begin with collectingsimilar patches using block matching [8, 25] and formulating patch matrices. Suppose that ˆ u ∈ R N
4s an estimated clean image in the intensity format and that J is the number of nonlocal similarpatch groups to be collected. For the reference patch ˆ u j ∈ R m j with size √ m j × √ m j in the j thpatch group, we search within a local window for a total of n j patches that are similar to thereference patch, assuming m j ≤ n j , j = 1 , , . . . , J . To fully exploit the statistics of L -look images,we measure the similarity between two patches ˆ u j ∈ R m j and ˆ u (cid:48) j ∈ R m j using the block similaritymeasure introduced in [25] d ( ˆ u j , ˆ u (cid:48) j ) = (2 L − m j (cid:88) i =1 log (cid:115) (ˆ u j ) i (ˆ u (cid:48) j ) i + (cid:115) (ˆ u (cid:48) j ) i (ˆ u j ) i . Following the above, for each group we construct a patch matrix from all the patches in thegiven group through an extraction operator. Define R jl ∈ R m j × N be a binary matrix (i.e., itsentries are either 1 or 0) such that R jl ˆ u is the l th patch in the j th nonlocal similar patch groupof the given estimated image ˆ u , l = 1 , . . . , n j , j = 1 , . . . , J . Then we define a linear operator R j : R N → R m j × n j , m j ≤ n j , as follows, R j ( x ) = (cid:2) R j x R j x · · · R jn j x (cid:3) . Here, R j ( x ) is called the j th patch matrix of the (transformed) image x ∈ R N . After the patchmatrix is extracted, the patch matrix can be further processed using, for example, normalizationwith mean zero, and the corresponding extraction operator R j can be defined accordingly. Intu-itively, the patch matrix R j ( x ) with similar structures should be a low-rank matrix if x is close tothe clean image ˆ u , for example, up to a transformation.Taking advantage of the low-rank prior of image patch matrices R j ( x )’s, the objective functionof a patch-based nonlocal low-rank model consists of a data fidelity term to restore the desiredimage and a nonlocal low-rank regularization term as followsmin x τ f ( x ) + J (cid:88) j =1 λ j rank( R j ( x )) , (1)where x ∈ R N is the desired (transformed) image to be restored, f : R N → ( −∞ , + ∞ ] is thedata fidelity term that measures the closeness between the observed image and the desired image, R j : R N → R m j × n j , m j ≤ n j , is the (normalized) extraction of j th nonlocal similar patch matrix,and λ j > j = 1 , . . . , J .Model (1) regularizes low-rank priors on image patch matrices, but it is not a feasible modelfrom both theoretical and practical perspectives. First, model (1) as a composition optimizationis not easy to solve. Second, the rank function is discontinuous and nonconvex, and minimizing aproblem involving the rank function is NP-hard [26]; therefore, it is challenging to solve model (1).To tackle the above challenges, we plan to relax model (1) in the following ways. We first applythe variable splitting method to model (1) to address the composition optimization problem, adopta nonconvex surrogate of the rank function to replace the rank function, and preferably utilize apatchwise data fidelity term.First, we apply the variable splitting method to relax model (1). 
By introducing auxiliary(splitting) variables Y j ∈ R m j × n j such that Y j = R j ( x ) and then relaxing these equalities of thesplitting variables, we obtain the following modelmin x , Y ,..., Y J τ f ( x ) + J (cid:88) j =1 (cid:110) µ j (cid:107) Y j − R j ( x ) (cid:107) F + λ j rank( Y j ) (cid:111) , µ j > g : [0 , ∞ ) → R such that g is monotonicallyincreasing, a generalized relaxation of the rank function is defined as (cid:107) Y (cid:107) ∗ ,g = m (cid:88) i =1 g ( σ i ( Y )) , where Y ∈ R m × n , m ≤ n , and σ i ( Y ) is the i th largest singular value of Y . Here, we give two specialcases of the function g . If g ( t ) = (cid:107) t (cid:107) as the (cid:96) norm, then (cid:107) Y (cid:107) ∗ ,g exactly reduces to the rankfunction. If g ( t ) = t as a linear function, then (cid:107) Y (cid:107) ∗ ,g = (cid:107) Y (cid:107) ∗ is exactly the nuclear norm, whichis the tightest convex surrogate of the rank function. However, the rank minimization is NP-hard,while the nuclear norm minimization may over-shrink the singular values with large values [12].To better approximate the rank function, we are interested in its nonconvex relaxation (cid:107) · (cid:107) ∗ ,g with the function g to be monotonically increasing, concave and smooth. For example, a decentchoice of g : [0 , ∞ ) → R is the logarithmic function defined as g ( t ) = log( t + ε ) , (2)where ε > v ∈ R N be the given noisy image and let x ∈ R N be the unknown clean log-transformedimage to be restored. We extend the pixelwise data fidelity term of the Exp model [23] as shownin Table 1 to a patchwise data fidelity term that is in terms of patch matrices R j ( x )’s as follows f ( x ) = J (cid:88) j =1 µ j (cid:28) R j ( x ) + R j ( v ) e R j ( x ) , R j ( N ) (cid:29) F + ρ (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:115) e R j ( x ) R j ( v ) − γR j ( N ) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) F , where µ j > N denotes the vector of size N × ρ > γ ≥ f isstrictly convex if ργ ≤ .The patchwise data fidelity term can be further viewed as a weighted pixelwise data fidelityterm. Define R (cid:62) j : R m j × n j → R N as R (cid:62) j ( Y ) = (cid:80) n j l =1 R (cid:62) jl y i , where y i ∈ R m j is the i th vector of Y . Since R j and R (cid:62) j are linear operators such that (cid:104) R j ( x ) , Y (cid:105) F = (cid:104) x , R (cid:62) j ( Y ) (cid:105) for all x ∈ R N and Y ∈ R m j × n j , where (cid:104)· , ·(cid:105) is the standard inner product for vectors, then f can be written as f ( x ) = J (cid:88) j =1 µ j (cid:68) x + v e x , ( R (cid:62) j ◦ R j ) N (cid:69) + ρ (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) R j (cid:32)(cid:114) e x v − γ N (cid:33)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) F = (cid:104) x + v e x , N (cid:105) W + ρ (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:114) e x v − γ N (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) W , (3)where W = (cid:80) Jj =1 µ j R (cid:62) j ◦ R j = (cid:80) Jj =1 µ j (cid:80) n j l =1 R (cid:62) jl R jl is a diagonal matrix whose main diagonalentries indicate the weighted counts for each pixel. Since we assume that each pixel belongs to atleast one nonlocal similar patch group, then W ∈ S N + and the W -weighted inner product (cid:104)· , ·(cid:105) W and the W -weighted (cid:96) norm (cid:107) · (cid:107) W are well-defined. The proposed data fidelity term assigns6ore weights to the image pixels that belong to multiple patch groups. 
It helps develop efficientalgorithms and cooperates well with the framework of our algorithm introduced in section 3.Putting all the above discussion together, we come up with the following nonlocal low-rankmodel min x , Y ,..., Y J τ f ( x ) + J (cid:88) j =1 (cid:32) µ j (cid:107) Y j − R j ( x ) (cid:107) F + λ j m j (cid:88) i =1 g ( σ i ( Y j )) (cid:33) , (4)where x ∈ R N , Y j ∈ R m j × n j , f : R N → ( −∞ , + ∞ ] is defined as (3), g : [0 , ∞ ) → R is definedas (2), R j : R N → R m j × n j is the (normalized) extraction of j th nonlocal similar patch matrix, m j ≤ n j , τ > µ j > λ j > j = 1 , . . . , J .Clearly, the objective function of model (4) is nonconvex and nonsmooth. Existing algorithmsare not directly applicable to this problem. It is challenging to design theoretically convergence-guaranteed and practically efficient algorithms to solve this nonconvex nonsmooth optimizationproblem. In the next section, we will propose an efficient algorithm for the nonlocal low-rankmodel (4) and analyze its convergence in section 4. We present a proximal alternating reweighted minimization algorithm for solving the nonconvexnonsmooth optimization problem of model (4).The nonlocal low-rank model, which has the form of model (4), regularizes the low-rank priorof patch matrices and can also be applicable to many image restoration problems such as imagedenoising and compressive sensing if the patch matrix extraction R j and the data fidelity term f are appropriately selected. In the following, we consider the nonlocal low-rank model in a generalsetting. The objective function of model (4), denoted as Φ, can be written asΦ( x , Y , . . . , Y J ) = τ f ( x ) + J (cid:88) j =1 Φ j ( x , Y j ) , (5)where Φ j ( x , Y ) = µ j (cid:107) Y − R j ( x ) (cid:107) F + λ j m j (cid:88) i =1 g ( σ i ( Y )) , (6)and we assume (A1) f : R N → ( −∞ , + ∞ ] is inf-bounded, proper and lower semicontinuous, i.e., inf f > −∞ ,dom f := { x ∈ R N : f ( x ) < + ∞} (cid:54) = ∅ and f ( a ) ≤ lim inf x → a f ( x ) , ∀ a ∈ R N ; (A2) g : [0 , ∞ ) → R is monotonically increasing and concave (and nonconvex); and g is continuouslydifferentiable with an L g -Lipschitz continuous gradient, i.e., | g (cid:48) ( t ) − g (cid:48) ( t ) | ≤ L g | t − t | , ∀ t ≥ , t ≥ (A3) Φ( x , Y , . . . , Y J ) is coercive, i.e.,lim (cid:107) ( x , Y ..., Y J ) (cid:107)→∞ Φ( x , Y , . . . , Y J ) = + ∞ .
7n the application of multiplicative noise removal, we utilize the nonlocal low-rank model (4)with f defined as (3) and g defined as (2). It is easy to verify that f satisfies Assumption (A1)and g satisfies Assumption (A2). These together with the coercivity of f and g imply that Φ isinf-bounded and coercive. Hence, Assumption (A1)-(A3) hold for our proposed model.In this general setting, no convexity or smoothness is assumed for f and the objective functionΦ of the nonlocal low-rank model (4) is nonconvex and nonsmooth. For solving this nonconvexand nonsmooth optimization problem, the alternating minimization (AM) algorithm was adoptedfor compressive sensing [9] and the augmented Lagrange multiplier (ALM) algorithm was adoptedfor speckle noise removal [32]. However, there is no guarantee that those methods will converge.Because the sequence generated by the AM algorithm may cycle indefinitely without converging ifthe minimum in each alternating step is not uniquely obtained [4]; and the sequence generated bythe ALM algorithm may diverge even with bounded penalty parameters [33]. Therefore, we willpropose an algorithm called the Proximal Alternating Reweighted Minimization (PARM) algorithm customized for model (4) as shown below Y k +1 j ∈ argmin Y j (cid:101) Φ j ( x k , Y j ) + α jk (cid:107) Y j − Y kj (cid:107) F , (7) x k +1 ∈ argmin x Φ( x , Y k +11 , . . . , Y k +1 J ) + β k (cid:107) x − x k (cid:107) W , (8)where (cid:101) Φ j ( x k , Y j ) is a reweighted approximation of Φ j ( x k , Y j ) with respect to Y j , W = (cid:80) Jj =1 µ j R (cid:62) j ◦ R j ∈ S N + , and α jk > β k > (A4) For the sequences { α jk } k ∈ N , j = 1 , , . . . , J , and the sequence { β k } k ∈ N , there exist positiveconstants α − , α + , β − , β + such thatinf { α jk : k ∈ N , j = 1 , , . . . , J } ≥ α − , and inf { β k : k ∈ N } ≥ β − , sup { α jk : k ∈ N , j = 1 , , . . . , J } ≤ α + , and sup { β k : k ∈ N } ≤ β + . The convergence analysis of the PARM algorithm will be provided in the next section.The proposed PARM algorithm has a proximal alternating scheme similar to the proximalalternating linearized minimization [4] for nonconvex and nonsmooth problems proposed by Bolteet al., in which a proximal term at the previous iterate is added to each subproblem. In (7), we utilize (cid:101) Φ j ( x k , Y j ), a reweighted approximation of Φ j ( x k , Y j ), to approximate the nonconvex surrogate ofthe rank function, which yields a closed form for (7). In (8), the proximal term is in term of the W -weighted norm, which is to be consistent with the patchwise data fidelity term f , for example,as defined in (3). In fact, we will continue to use the W -weighted norm to measure the variable x throughout the entire paper. Moreover, as an algorithm for nonlocal low-rank models applied toimage restoration, the PARM algorithm can be intuitively interpreted as follows. Equation (7) canbe viewed as a low-rank patch matrix estimation, which returns the nonlocal patch matrices Y j ’swith a low-rank property, while equation (8) can be viewed as the image restoration step, whichaggregates all the estimated nonlocal patch matrices from (7) to form the desired image x .Before further derive our PARM algorithm, we review some preliminaries on subdifferentialsand proximity operators for nonconvex and nonsmooth functions. For nonconvex and nonsmooth functions, we use the following definitions for subdifferentialsand proximity operators. 8 efinition 3.1 (Subdifferentials)
Let f : R d → ( −∞ , + ∞ ] be a proper and lower semicontin-uous function.(1) For a given x ∈ dom f , the Fr´echet subdifferential of f at x , written ˆ ∂f ( x ), is the set of allvectors u ∈ R d which satisfylim inf y (cid:54) = x y → x f ( y ) − f ( x ) − (cid:104) u , y − x (cid:105)(cid:107) y − x (cid:107) ≥ . When x / ∈ dom f , we set ˆ ∂f ( x ) = ∅ .(2) The subdifferential (or called the limiting-subdifferential) of f at x ∈ R d , written ∂f ( x ), isdefined through the following closure process ∂f ( x ) := { u ∈ R d : ∃ x k → x , f ( x k ) → f ( x ) and u k ∈ ˆ ∂f ( x k ) → u as k → ∞} . Definition 3.2 (Proximity operators)
Let f : R d → ( −∞ , + ∞ ] be a proper and lower semi-continuous function such that inf R d f > −∞ . The proximity operator of f at x ∈ R d is definedas prox f ( x ) = argmin u ∈ R d f ( u ) + 12 (cid:107) u − x (cid:107) . Note that prox f ( x ) is a set-valued map. If f is convex, then prox f ( x ) is reduced to a single-valuedmap.The definitions above for subdifferentials and proximity operators are defined on vectors withrespect to the standard (cid:96) norm. Without loss of generality, these definitions can be extended tovectors with respect to the weighted (cid:96) norm and matrices with respect to the Frobenius norm.Let H ∈ S d + . The Fr´echet subdifferential of f : R d → ( −∞ , + ∞ ] at a vector x ∈ R d withrespect to H is denoted as ˆ ∂ H f ( x ); its subdifferential is denoted as ∂ H f ( x ); and its proximityoperator is denoted as prox H f ( x ).For the function f : R m × n → ( −∞ , + ∞ ] at a matrix X ∈ R m × n with respect to the Frobeniusnorm, its Fr´echet subdifferential is denoted as ˆ ∂ F f ( X ) or ˆ ∂f ( X ); its subdifferential is denoted as ∂ F f ( X ) or ∂f ( X ); and its proximity operator is denoted as prox Ff ( X ) or prox f ( X ).Now, we are ready to discuss in detail the proposed PARM algorithm in (7) and (8). To estimate low-rank patch matrices, the minimization of Φ j ( x k , Y j ), as a generalized rankminimization of the patch matrix Y j , is approximated via a reweighted scheme, as shown in (7).Since g is concave on [0 , ∞ ) and continuously differentiable, by the definition of the supergra-dient, we have g ( σ i ( Y j )) ≤ g ( σ i ( Y kj )) + ( w kj ) i ( σ i ( Y j ) − σ i ( Y kj )) , (9)where w kj = [( w kj ) , . . . , ( w kj ) m j ] (cid:62) and ( w kj ) i = g (cid:48) ( σ i ( Y kj )), i = 1 , , . . . , m j . Then we replace theterm g ( σ i ( Y j )) in Φ j ( x k , Y j ) by the right hand side of the inequality (9) and have its reweightedapproximation (cid:101) Φ j ( x k , Y j ) as follows (cid:101) Φ j ( x k , Y j ) = µ j (cid:107) Y j − R j ( x k ) (cid:107) F + λ j m j (cid:88) i =1 g ( σ i ( Y kj )) + ( w kj ) i ( σ i ( Y j ) − σ i ( Y kj )) . (10)9ence, the update of the low-rank patch matrix Y k +1 j in (7) at the ( k + 1)th step can be rewrittenas follows Y k +1 j ∈ argmin Y j µ j (cid:107) Y j − R j ( x k ) (cid:107) F + λ j m j (cid:88) i =1 ( w kj ) i σ i ( Y j ) + α jk (cid:107) Y j − Y kj (cid:107) F (11)= argmin Y j λ j m j (cid:88) i =1 ( w kj ) i σ i ( Y j ) + µ j + α jk (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) Y j − µ j R j ( x k ) + α jk Y kj µ j + α jk (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) F . By introducing the definition of the weighted nuclear norm of Y ∈ R m × n , m ≤ n , with theweight vector w = [ w , . . . , w m ] (cid:62) and w i ≥ i = 1 , . . . , m , as follows (cid:107) Y (cid:107) ∗ , w = m (cid:88) i =1 w i σ i ( Y ) , where σ ( Y ) ≥ σ ( Y ) ≥ · · · ≥ σ m ( Y ) ≥
0. It was proved in [7] that (cid:107) · (cid:107) ∗ , w is convex if and onlyif w ≥ w ≥ · · · ≥ w m ≥
0. In other words, for (cid:107) · (cid:107) ∗ , w being a convex function, the weights mustincrease with singular values. However, in order for large singular values to receive less penaltyto help reducing the bias and smaller singular values to receive heavier penalty to help promotingsparsity, the opposite order of the weight is desirable, i.e., 0 ≤ w ≤ w ≤ · · · ≤ w m . Under thisorder of the weights, the weighted nuclear norm is a nonconvex function and in general its proximityoperator may be a set-valued map. Fortunately, the proximity operator is a single-value map, asshown in the following lemma. Lemma 3.1 (see [7, Theorem 2.3])
For any λ > Y ∈ R m × n , m ≤ n and w = [ w , . . . , w m ] (cid:62) with 0 ≤ w ≤ w ≤ · · · ≤ w m , prox λ (cid:107)·(cid:107) ∗ , w ( Y ) = U S λ, w ( Σ ) V (cid:62) , where Y = U Σ V (cid:62) is the singular value decomposition (SVD) of Y and S λ, w ( Σ ) = diag { (Σ ii − λw i ) + } is the weighted singular value thresholding (WSVT) operator.The assumption that g is monotonically increasing and concave implies that g (cid:48) is nonnegativeand monotonically decreasing. Then the weight vector w kj satisfies the ascending constraint, thatis, 0 ≤ ( w kj ) ≤ · · · ≤ ( w kj ) m j . Hence, by Lemma 3.1, the low-rank patch matrix Y k +1 j can beuniquely achieved Y k +1 j = prox λjµj + αjk (cid:107)·(cid:107) ∗ , w kj (cid:32) µ j R j ( x k ) + α jk Y kj µ j + α jk (cid:33) , = 1 µ j + α jk prox λ j (cid:107)·(cid:107) ∗ , w kj (cid:16) µ j R j ( x k ) + α jk Y kj (cid:17) , = 1 µ j + α jk U k +1 j S λ j , w kj ( (cid:101) Σ kj )( V k +1 j ) (cid:62) , where U k +1 j (cid:101) Σ kj ( V k +1 j ) (cid:62) is the SVD of µ j R j ( x k ) + α jk Y kj . Remark 3.1
The ascending constraint on the weight vector w kj may not be automatically satis-fied, if g is not differentiable and ( w kj ) i is chosen as a supergradient of g at σ i ( Y kj ), i.e., − ( w kj ) i ∈ ( − g )( σ i ( Y kj )), as defined in [22]. For example, suppose that g is not differentiable at σ i ( Y kj ) andthen ∂ ( − g )( σ i ( Y kj )) contains more than one element. If σ i +1 ( Y kj ) = σ i ( Y kj ), then the weights − ( w kj ) i +1 and − ( w kj ) i that are selected from the same set ∂ ( − g )( σ i ( Y kj )) = ∂ ( − g )( σ i +1 ( Y kj ))may have ( w kj ) i +1 < ( w kj ) i rather than an ascending order. Thus, we have to carefully select the( w kj ) i in the case where g is not differentiable. For example, let ( w kj ) i = − min ∂ ( − g )( σ i ( Y kj )). After obtaining the estimates of the low-rank patch matrices Y k +1 j ’s from the generalized rankminimization in the previous step, we may have a situation where the same pixel may have sev-eral estimated values. That is because one pixel may belong to more than one nonlocal similarpatch matrices, when we group nonlocal similar patches by block matching. Thus, at this imagerestoration step in (8) of the PARM algorithm, we aggregate all the estimated patches to restorethe entire image by minimizing the proximal regularization of Φ( x , Y k +11 , . . . , Y k +1 J ) with respectto x .Note that the term (cid:80) Jj =1 µ j (cid:107) Y k +1 j − R j ( x ) (cid:107) F in Φ( x , Y k +11 , . . . , Y k +1 J ) can be written as J (cid:88) j =1 µ j (cid:107) Y k +1 j − R j ( x ) (cid:107) F = J (cid:88) j =1 µ j (cid:107) Y k +1 j − R j ( x k ) (cid:107) F + J (cid:88) j =1 µ j (cid:107) R j ( x ) − R j ( x k ) (cid:107) F − J (cid:88) j =1 µ j (cid:104) R j ( x ) − R j ( x k ) , Y k +1 j − R j ( x k ) (cid:105) F . Recall that R (cid:62) j : R m j × n j → R N is defined as R (cid:62) j ( Y ) = (cid:80) n j l =1 R (cid:62) jl y i , where y i is the i th vectorof Y . Since (cid:104) R j ( x ) , Y (cid:105) F = (cid:104) x , R (cid:62) j ( Y ) (cid:105) = (cid:104) x , W − R (cid:62) j ( Y ) (cid:105) W and W = (cid:80) Jj =1 µ j R (cid:62) j ◦ R j ∈ S N + ,then the right hand side of the above equality can be written as J (cid:88) j =1 µ j (cid:107) Y k +1 j − R j ( x ) (cid:107) F = J (cid:88) j =1 µ j (cid:107) Y k +1 j − R j ( x k ) (cid:107) F + 12 (cid:107) x − x k (cid:107) W − (cid:104) x − x k , J (cid:88) j =1 µ j W − R (cid:62) j ( Y k +1 j ) − x k (cid:105) W . The update of the estimated image x k +1 in (8) at the ( k + 1)th step can be rewritten as follows x k +1 ∈ argmin x τ f ( x ) − (cid:104) x − x k , J (cid:88) j =1 µ j W − R (cid:62) j ( Y k +1 j ) − x k (cid:105) W + β k + 12 (cid:107) x − x k (cid:107) W (12)= prox W τβk +1 f x k + 1 β k + 1 J (cid:88) j =1 µ j W − R (cid:62) j ( Y k +1 j ) − x k . The overall procedure of the PARM algorithm in (7) and (8) is summarized in Algorithm 1.
To remove multiplicative noise, we apply the PARM algorithm in Algorithm 1 to solve thenonlocal low-rank model (4) with f defined as (3) and g defined as (2). Accordingly, using the11 lgorithm 1 Proximal alternating reweighted minimization algorithm for model (4) Set parameters τ , µ j , λ j , α jk , and β k Set extraction R j by block matching Compute matrix W Initialize x , Y j , and w j Set k = 0 repeat for j from 1 to J do [ U k +1 j , (cid:101) Σ kj , V k +1 j ] = SVD (cid:16) µ j R j ( x k ) + α jk Y kj (cid:17) (cid:46) SVD Σ k +1 j = µ j + α jk S λ j , w kj ( (cid:101) Σ kj ) (cid:46) WSVT Y k +1 j = U k +1 j Σ k +1 j ( V k +1 j ) (cid:62) (cid:46) Update Y k +1 j ( w k +1 j ) i = g (cid:48) ((Σ k +1 j ) ii ) (cid:46) Update w k +1 j end for x k +1 ∈ prox W τβk +1 f (cid:16) x k + β k +1 (cid:16)(cid:80) Jj =1 µ j W − R (cid:62) j ( Y k +1 j ) − x k (cid:17)(cid:17) (cid:46) Update x k +1 k ← k + 1 until stopping criterion is satisfieddefinition of f and g , Algorithm 1 can be specifically implemented as follows. In line 11 of Algo-rithm 1, ( w k +1 j ) i = k +1 j ) ii + ε ; in line 13, the proximity operator of τβ k +1 f with respect to W canbe computed using Newton’s method. Given that ργ ≤ , the function f is strictly convex andhence prox W τβk +1 f is single-valued defined as x = prox W τβk +1 f ( (cid:101) x ) = argmin x f ( x ) + β k + 12 τ (cid:107) x − (cid:101) x (cid:107) W . Since f is differentiable with respect to the W -weighted (cid:96) norm with its gradient ∇ W f ( x ) = − v e x + ρ (cid:16) e x v − γ (cid:113) e x v (cid:17) , then x is the unique solution of the following equation ∇ W f ( x ) + β k + 1 τ ( x − (cid:101) x ) = 0 , and this equation can be efficiently solved by Newton’s method. The aim of this section is to analyze the convergence of the PARM algorithm for model (4). Theproof is motivated by the inexact descent convergence results for Kurdyka-(cid:32)Lojasiewicz functionsin [2, 4].In the sequel, we use the notation Z := ( x , Y , . . . , Y J ) and (cid:107) Z (cid:107) := (cid:118)(cid:117)(cid:117)(cid:116) (cid:107) x (cid:107) W + J (cid:88) j =1 (cid:107) Y j (cid:107) F , and we denote by Φ( Z ) the objective function in model (4).Here are three essential conditions to guarantee convergence of the sequence { Z k } k ∈ N generatedby the PARM algorithm. 12 H1) Sufficient descent condition:
There exists a positive constant c such that for ∀ k ∈ N , c (cid:107) Z k +1 − Z k (cid:107) ≤ Φ( Z k ) − Φ( Z k +1 ) . (H2) Relative error condition: There exists a positive constant c such that for ∀ k ∈ N , (cid:107) A k +1 (cid:107) ≤ c (cid:107) Z k +1 − Z k (cid:107) and A k +1 ∈ ∂ Φ( Z k +1 ) . (H3) Continuity condition: There exists a subsequence { Z k t } t ∈ N and Z ∗ such thatlim t →∞ Z k t = Z ∗ and lim t →∞ Φ( Z k t ) = Φ( Z ∗ ) . In the following, we prove that the sequence { Z k } k ∈ N satisfies Condition (H1)-(H3), and thenconclude that { Z k } k ∈ N converges to a critical point of Φ using the fact that Φ is a Kurdyka-(cid:32)Lojasiewicz function. We show that the objective function Φ in model (4) evaluated at Z k , denoted Φ( Z k ), decreasessufficiently as k increases. Proposition 4.1 (Sufficient descent condition)
Suppose that the objective function Φ inmodel (4) satisfies Assumption (A1)-(A3). Let { Z k } k ∈ N be the sequence generated by the PARMalgorithm provided that the parameters satisfy Assumption (A4). Then { Φ( Z k ) } k ∈ N is strictlydecreasing and, in particular, there exists a positive constant c such that for ∀ k ∈ N , c (cid:107) Z k +1 − Z k (cid:107) ≤ Φ( Z k ) − Φ( Z k +1 ) . (13) Proof.
Let Φ j ( x , Y ) be defined as (6) and let (cid:101) Φ j ( x k , Y j ) be defined as (10). Then, accordingthe concavity of g illustrated in inequality (9), Φ j ( x k , Y k +1 j ) and its reweighted approximation (cid:101) Φ j ( x k , Y j ) have the following relationshipΦ j ( x k , Y k +1 j ) ≤ (cid:101) Φ j ( x k , Y k +1 j ) and Φ j ( x k , Y kj ) = (cid:101) Φ j ( x k , Y kj ) . Thus, the objective function Φ in (5) evaluated at x k and Y k +1 j ’s can be rewritten asΦ( x k , Y k +11 , . . . , Y k +1 J ) = τ f ( x k ) + J (cid:88) j =1 Φ j ( x k , Y k +1 j ) ≤ τ f ( x k ) + J (cid:88) j =1 (cid:101) Φ j ( x k , Y k +1 j ) . By the update of Y k +1 j in (7), we have (cid:101) Φ j ( x k , Y k +1 j ) + α jk (cid:107) Y k +1 j − Y kj (cid:107) F ≤ (cid:101) Φ j ( x k , Y kj ) = Φ j ( x k , Y kj ) . Combining the two inequalities above, we have the following inequality onΦ( x k , Y k +11 , . . . , Y k +1 J ) and Φ( Z k )Φ( x k , Y k +11 , . . . , Y k +1 J ) ≤ τ f ( x k ) + J (cid:88) j =1 Φ j ( x k , Y kj ) − J (cid:88) j =1 α jk (cid:107) Y k +1 j − Y kj (cid:107) F , = Φ( Z k ) − J (cid:88) j =1 α jk (cid:107) Y k +1 j − Y kj (cid:107) F .
13y the update of x k +1 in (8), we haveΦ( Z k +1 ) ≤ Φ( x k , Y k +11 , . . . , Y k +1 J ) − β k (cid:107) x k +1 − x k (cid:107) W . Combining the two inequalities above, we have that Φ( Z k +1 ) and Φ( Z k ) satisfy the followinginequality β k (cid:107) x k +1 − x k (cid:107) W + J (cid:88) j =1 α jk (cid:107) Y k +1 j − Y kj (cid:107) F ≤ Φ( Z k ) − Φ( Z k +1 ) . Equation (13) holds with c = min { β − , α − } > { Φ( Z k ) } k ∈ N is strictly decreasing. Here, β − and α − are two positive parameters given in Assumption (A4).The sufficient descent condition proved in Proposition 4.1 immediately yields the followingcorollary. Corollary 4.1
Suppose that the objective function Φ in model (4) satisfies Assumption (A1)-(A3). Let { Z k } k ∈ N be the sequence generated by the PARM algorithm provided that the parameterssatisfy Assumption (A4). Then lim k →∞ (cid:107) Z k − Z k +1 (cid:107) = 0 . Proof.
Summing inequality (13) from k = 0 to k = K −
1, we have c K − (cid:88) k =0 (cid:107) Z k +1 − Z k (cid:107) ≤ Φ( Z ) − Φ( Z K ) ≤ Φ( Z ) − Φ inf , where Φ inf = inf Z Φ( Z ) > −∞ .Taking K → ∞ , we have ∞ (cid:88) k =0 (cid:107) Z k +1 − Z k (cid:107) < ∞ , which implies lim k →∞ (cid:107) Z k +1 − Z k (cid:107) = 0. Before proving that a subgradient of Φ at Z k +1 is upper bounded by the iterates gap, we firstcharacterize the subdifferential of Φ.Recall that the variable x is measured in terms of the W -weight (cid:96) norm and that the vari-ables Y j ’s are measured in terms of the Frobenius norm. Then using the notations introduced insubsection 3.1 we define the subdifferential of Φ by ∂ Φ( Z ) = (cid:8) ( A x , A Y , . . . , A Y J ) : A x ∈ ∂ Wx Φ( Z ) , A Y j ∈ ∂ Y j Φ( Z ) , j = 1 , . . . , J (cid:9) , where ∂ Wx Φ( Z ) is the partial subdifferential of Φ with respect to the variable x and with respect tothe W -weight (cid:96) norm and ∂ Y j Φ( Z ) is the partial subdifferential of Φ with respect to the variable Y j and with respect to the Frobenius norm.By the definition of Φ in model (4) and the fact that J (cid:88) j =1 µ j (cid:107) Y j − R j ( x ) (cid:107) F = 12 (cid:104) x , x − J (cid:88) j =1 µ j W − R (cid:62) j ( Y j ) (cid:105) W + J (cid:88) j =1 µ j (cid:107) Y j (cid:107) F ,
14e have ∂ Wx Φ( Z ) = τ ∂ W f ( x ) + x − J (cid:88) j =1 µ j W − R (cid:62) j ( Y j )and ∂ Y j Φ( Z ) = µ j ( Y j − R j ( x )) + λ j ∂ (cid:32) m j (cid:88) i =1 g ◦ σ i (cid:33) ( Y j ) . To compute the subdifferential of the singular value function (cid:80) m j i =1 g ◦ σ i and further characterize ∂ Y j Φ( Z ), we introduce some definitions and a lemma on singular value functions in [19, 20]. Definition 4.1
A function f : R n → R is absolutely symmetric if f ( x , x , . . . , x n ) = f ( | x π (1) | , | x π (2) | , . . . , | x π ( n ) | ) , for any permutation π . Definition 4.2
A function F : R m × n → R , m ≤ n , is a singular value function if F ( X ) =( f ◦ σ )( X ), where f : R m → R is an absolutely symmetric function, σ ( X ) = [ σ ( X ) , . . . , σ m ( X )] (cid:62) and σ i ( X ) is the i th largest singular value of X .The function (cid:80) mi =1 g ◦ σ i can be viewed as a singular value function of the form (cid:32) m (cid:88) i =1 g ◦ σ i (cid:33) ( Y ) = ( (cid:101) g ◦ σ )( Y ) , where (cid:101) g : R m → R is defined as (cid:101) g ( t ) = (cid:80) mi =1 g ( | t i | ) and is absolutely symmetric. Lemma 4.1
The subdifferential of a singular value function f ◦ σ at X ∈ R m × n is given by theformula ∂ ( f ◦ σ )( X ) = (cid:110) U diag( d ) V (cid:62) : d ∈ ∂f ( σ ( X )) , ( U , V ) ∈ M ( X ) (cid:111) , where M ( X ) = (cid:8) ( U , V ) ∈ R m × l × R n × l : U (cid:62) U = V (cid:62) V = I , X = U diag( σ ( X )) V (cid:62) (cid:9) .By Lemma 4.1, the subdifferential of (cid:80) mi =1 g ◦ σ i at Y ∈ R m × n can be computed as follows ∂ (cid:32) m (cid:88) i =1 g ◦ σ i (cid:33) ( Y ) = { U diag( d ) V (cid:62) : d i = c i g (cid:48) [ σ i ( Y )] , c i ∈ ∂ | · | ( σ i ( Y )) ,i = 1 , . . . , m, ( U , V ) ∈ M ( Y ) } , where ∂ | · | ( σ i ( Y )) = (cid:40) { } , if σ i ( Y ) > − , , if σ i ( Y ) = 0 . Next, we are ready to derive a subgradient of Φ at Z k +1 using the lemma below and to provethat it is upper bounded. Lemma 4.2
Suppose that the objective function Φ in model (4) satisfies Assumption (A1)-(A3).Let { Z k } k ∈ N be the sequence generated by the PARM algorithm provided that the parameters15atisfy Assumption (A4). Let U k +1 j Σ k +1 j ( V k +1 j ) (cid:62) be the SVD of Y k +1 j . Then, for each k and each j , there exists c k +1 j ∈ R m j such that( c k +1 j ) i ∈ ∂ | · | ( σ i ( Y k +1 j )) , i = 1 , . . . , m j , (14)and λ j U k +1 j diag( d k +1 j )( V k +1 j ) (cid:62) = − α jk ( Y k +1 j − Y kj ) − µ j ( Y k +1 j − R j ( x k )) , (15)where diag( d k +1 j ) = diag( c k +1 j ) diag( w kj ). Proof.
According to the update of Y k +1 j in (11), we have0 ∈ µ j ( Y k +1 j − R j ( x k )) + λ j ∂ (cid:107) · (cid:107) ∗ , w kj ( Y k +1 j ) + α jk ( Y k +1 j − Y kj ) . Since the weighted nuclear norm (cid:107) · (cid:107) ∗ , w is a singular value function, then by Lemma 4.1 thesubdifferential of (cid:107) · (cid:107) ∗ , w can be computed as follows ∂ (cid:107) · (cid:107) ∗ , w ( Y ) = { U diag( d ) V (cid:62) : d i = c i w i , c i ∈ ∂ | · | ( σ i ( Y )) , i = 1 , . . . , m, ( U , V ) ∈ M ( Y ) } . Note that ( U k +1 j , V k +1 j ) ∈ M ( Y k +1 j ). Thus, there exists c k +1 j ∈ R m j such that (14) holds and − α jk ( Y k +1 j − Y kj ) − µ j ( Y k +1 j − R j ( x k )) = λ j U k +1 j diag( d k +1 j )( V k +1 j ) (cid:62) ∈ λ j ∂ (cid:107) · (cid:107) ∗ , w kj ( Y k +1 j ) , where diag( d k +1 j ) = diag( c k +1 j ) diag( w kj ). Proposition 4.2 (Relative error condition)
Suppose that the objective function Φ in model(4) satisfies Assumption (A1)-(A3). Let { Z k } k ∈ N be the sequence generated by the PARM algo-rithm provided that the parameters satisfy Assumption (A4). Let U k +1 j Σ k +1 j ( V k +1 j ) (cid:62) be the SVDof Y k +1 j and let c k +1 j and d k +1 j be in R m j satisfying (14) and (15).Define A k +1 = ( A k +1 x , A k +1 Y , . . . , A k +1 Y J ), where A k +1 x = β k ( x k +1 − x k ) (16)and A k +1 Y j = µ j ( R j ( x k ) − R j ( x k +1 )) + λ j U k +1 j diag( (cid:101) d k +1 j − d k +1 j )( V k +1 j ) (cid:62) − α jk ( Y k +1 j − Y kj ) , (17)where diag( (cid:101) d k +1 j ) = diag( c k +1 j ) diag( w k +1 j ).Then the following assertions hold for ∀ k ∈ N ,(a) A k +1 ∈ ∂ Φ( Z k +1 );(b) (cid:107) A k +1 (cid:107) ≤ c (cid:107) Z k +1 − Z k (cid:107) , for some c > Proof. (a) According to the update of x k +1 in (12), we have0 ∈ τ ∂ W f ( x k +1 ) + x k − J (cid:88) j =1 µ j W − R (cid:62) j ( Y k +1 j ) + ( β k + 1)( x k +1 − x k ) . A k +1 x in (16) implies A k +1 x ∈ τ ∂ W f ( x k +1 ) + x k +1 − J (cid:88) j =1 µ j W − R (cid:62) j ( Y k +1 j ) = ∂ Wx Φ( Z k +1 ) . Also, for each j , the definition of A k +1 Y j in (17) and Lemma 4.2 imply A k +1 Y j = µ j ( Y k +1 j − R j ( x k +1 )) + λ j U k +1 j diag( (cid:101) d k +1 j )( V k +1 j ) (cid:62) ∈ µ j ( Y k +1 j − R j ( x k +1 )) + λ j ∂ (cid:107) · (cid:107) ∗ , w k +1 j ( Y k +1 j )= µ j ( Y k +1 j − R j ( x k +1 )) + λ j ∂ (cid:32) m j (cid:88) i =1 g ◦ σ i (cid:33) ( Y k +1 j )= ∂ Y j Φ( Z k +1 ) . (b) It follows from the Cauchy-Schwarz inequality that (cid:107) A k +1 (cid:107) ≤ (cid:107) A k +1 x (cid:107) W + J (cid:88) j =1 (cid:107) A k +1 Y j (cid:107) F , where (cid:107) A k +1 x (cid:107) W = β k (cid:107) x k +1 − x k (cid:107) W and (cid:107) A k +1 Y j (cid:107) F ≤ µ j (cid:107) R j ( x k ) − R j ( x k +1 ) (cid:107) F + λ j (cid:107) U k +1 j diag( (cid:101) d k +1 j − d k +1 j )( V k +1 j ) (cid:62) (cid:107) F + α jk (cid:107) Y k +1 j − Y kj (cid:107) F . The right hand side of the above inequality can be computed term by term as follows. The squareof the first term is bounded above by the square of the weighted iterates of the variable x , µ j (cid:107) R j ( x k ) − R j ( x k +1 ) (cid:107) F ≤ µ j J (cid:88) j =1 µ j (cid:107) R j ( x k − x k +1 ) (cid:107) F = µ j (cid:107) x k − x k +1 (cid:107) W . This implies that µ j (cid:107) R j ( x k ) − R j ( x k +1 ) (cid:107) F ≤ √ µ j (cid:107) x k − x k +1 (cid:107) W .Also, the second term is bounded above by the iterates of the variable Y j . Since (cid:107) (cid:101) d k +1 j − d k +1 j (cid:107) ≤(cid:107) (cid:101) d k +1 j − d k +1 j (cid:107) , then we have λ j (cid:107) U k +1 j diag( (cid:101) d k +1 j − d k +1 j )( V k +1 j ) (cid:62) (cid:107) F = λ j (cid:107) (cid:101) d k +1 j − d k +1 j (cid:107) ≤ λ j m j (cid:88) i =1 (cid:12)(cid:12)(cid:12) ( c k +1 j ) i (cid:12)(cid:12)(cid:12) (cid:12)(cid:12)(cid:12) ( w k +1 j ) i − ( w kj ) i (cid:12)(cid:12)(cid:12) ≤ λ j m j (cid:88) i =1 (cid:12)(cid:12)(cid:12) g (cid:48) ( σ i ( Y k +1 j )) − g (cid:48) ( σ i ( Y kj )) (cid:12)(cid:12)(cid:12) . 
Using the condition that g (cid:48) is L g -Lipschitz continuous, we further obtain λ j (cid:107) U k +1 j diag( (cid:101) d k +1 j − d k +1 j )( V k +1 j ) (cid:62) (cid:107) F ≤ λ j m j (cid:88) i =1 L g | σ i ( Y k +1 j ) − σ i ( Y kj ) |≤ λ j m j L g (cid:107) Y k +1 j − Y kj (cid:107) F , (cid:107) Y k +1 j − Y kj (cid:107) ≤ (cid:107) Y k +1 j − Y kj (cid:107) F .Therefore, combining all the inequalities above, we obtain (cid:107) A k +1 (cid:107) ≤ ( β k + M µ ) (cid:107) x k +1 − x k (cid:107) W + J (cid:88) j =1 ( λ j m j L g + α jk ) (cid:107) Y k +1 j − Y kj (cid:107) F ≤ c (cid:107) Z k +1 j − Z kj (cid:107) , where M µ = (cid:80) Jj =1 √ µ j and c = max { β + + M µ , λ m L g + α + , . . . , λ J m J L g + α + } .The relative error condition proved in Proposition 4.2 immediately yields the following corollary. Corollary 4.2
Suppose that the objective function Φ in model (4) satisfies Assumption (A1)-(A3). Let { Z k } k ∈ N be the sequence generated by the PARM algorithm provided that the parameterssatisfy Assumption (A4). Define A k +1 = ( A k +1 x , A k +1 Y , . . . , A k +1 Y J ), where A k +1 x is defined as (16)and A k +1 Y j is defined as (17), j = 1 , . . . , J . Thenlim k →∞ (cid:107) A k +1 (cid:107) = 0 . Proof.
The result is immediately followed by Proposition 4.2 and Corollary 4.1.
We first show the existence of a limit point of { Z k } k ∈ N using the boundedness of { Z k } k ∈ N ,and then prove a continuity condition for any convergent subsequence of { Z k } k ∈ N , which impliesCondition (H3). Proposition 4.3
Suppose that the objective function Φ in model (4) satisfies Assumption (A1)-(A3). Let { Z k } k ∈ N be the sequence generated by the PARM algorithm provided that the parameterssatisfy Assumption (A4). Let S denote the set of all limit points of the sequence { Z k } k ∈ N . Thenthe following assertions hold.(a) S (cid:54) = ∅ ;(b) If { Z k t } t ∈ N is a subsequence of { Z k } k ∈ N such that lim t →∞ Z k t = Z ∗ ∈ S , thenlim t →∞ Φ( Z k t ) = Φ( Z ∗ ) . Proof. (a) We show that { Z k } k ∈ N is bounded by contradiction.Assume for the sake of contradiction that there exists a subsequence { Z k l } l ∈ N such that (cid:107) Z k l (cid:107) → ∞ as l → ∞ . According to Assumption (A3), Φ is coercive, and then Φ( Z k l ) → ∞ as l → ∞ . However, since { Φ( Z k ) } k ∈ N is strictly decreasing and lower bounded by Φ inf > −∞ ,then { Φ( Z k ) } k ∈ N converges and { Φ( Z k l ) } l ∈ N also converges, which yields a contradiction. Thus, { Z k } k ∈ N is bounded and there exists a convergent subsequence of { Z k } k ∈ N .(b) Let { Z k t } t ∈ N be a subsequence such that Z k t → Z ∗ as t → ∞ .Since f is lower semicontinuous, then we havelim inf t →∞ τ f ( x k t +1 ) ≥ τ f ( x ∗ ) . x k t +1 referring to (12), we obtain the following inequality τ f ( x k t +1 ) + (cid:104) x k t +1 − x k t , x k t − J (cid:88) j =1 µ j W − R (cid:62) j ( Y k t +1 j ) (cid:105) W + β k t + 12 (cid:107) x k t +1 − x k t (cid:107) W ≤ τ f ( x ∗ ) + (cid:104) x ∗ − x k t , x k t − J (cid:88) j =1 µ j W − R (cid:62) j ( Y k t +1 j ) (cid:105) W + β k t + 12 (cid:107) x ∗ − x k t (cid:107) W . Letting t → ∞ on both sides of the above inequality, we getlim sup t →∞ τ f ( x k t +1 ) ≤ τ f ( x ∗ ) + lim sup t →∞ (cid:104) x ∗ − x k t , x k t − J (cid:88) j =1 µ j W − R (cid:62) j ( Y k t +1 j ) (cid:105) W + β k t + 12 (cid:107) x ∗ − x k t (cid:107) W = τ f ( x ∗ ) , where we use the boundedness of the sequences { x k t +1 } t ∈ N , { Y k t +1 j } t ∈ N and { β k t } t ∈ N and the resultthat lim t →∞ (cid:107) x k t +1 − x k t (cid:107) W → t →∞ τ f ( x k t +1 ) = f ( x ∗ ).Due to the continuity of µ j (cid:107) Y j − R j ( x ) (cid:107) F with respect to Y j and x and the continuity of g ( t )with respect to t , we havelim t →∞ Φ( Z k t +1 ) = τ f ( x ∗ ) + J (cid:88) j =1 (cid:32) µ j (cid:107) Y ∗ j − R j ( x ∗ ) (cid:107) F + λ j m j (cid:88) i =1 g ( σ i ( Y ∗ j )) (cid:33) = Φ( Z ∗ ) . In this subsection, we show the convergence of the sequence { Z k } k ∈ N generated by the PARMalgorithm.Let us first review a definition and a theorem on the Kurdyka-(cid:32)Lojasiewicz (KL) property of afunction in [1, 2]. Definition 4.3 (Kurdyka-(cid:32)Lojasiewicz)
Let f : R d → ( −∞ , + ∞ ] be proper and lower semi-continuous.(a) The function f is called to have the Kurdyka-(cid:32)Lojasiewicz (KL) property at ˜ x ∈ dom ∂f ifthere exist η ∈ (0 , + ∞ ], a neighborhood U of ˜ x and a continuous function ϕ : [0 , η ) → [0 , ∞ )such that(i) ϕ (0) = 0;(ii) ϕ is C on (0 , η ) and continuous at 0;(iii) for all s ∈ (0 , η ), ϕ (cid:48) ( s ) > x ∈ U ∩ { x ∈ R d : f ( ˜ x ) < f ( x ) < f ( ˜ x ) + η } , the following Kurdyka-(cid:32)Lojasiewiczinequality holds ϕ (cid:48) ( f ( x ) − f ( ˜ x ))dist( , ∂f ( x )) ≥ . f is called a KL function if f has the KL property at each point of dom ∂f . Theorem 4.1 (see [2, Theorem 2.9])
Let f : R d → ( −∞ , + ∞ ] be a proper lower semicon-tinuous function. Consider a sequence { x k } k ∈ N that satisfies Condition (H1)-(H3). If f has theKurdyka-(cid:32)Lojasiewicz property at the limit point x ∗ specified in (H3), then the sequence { x k } k ∈ N converges to x ∗ as k goes to ∞ , and x ∗ is a critical point of f . Moreover, the sequence { x k } k ∈ N has a finite length, i.e., ∞ (cid:88) k =0 (cid:107) x k +1 − x k (cid:107) < ∞ . The KL theory is a powerful tool for nonconvex nonsmooth optimization problems and KLfunctions are ubiquitous. For example, for multiplicative noise removal, the objective function Φ inmodel (4) with f defined as (3) and g as (2) is a KL function. For more examples of KL functionssee [1, 2].Next, equipped with Condition (H1)-(H3) discussed in the previous subsections, we can showthat any limit point of { Z k } k ∈ N is a critical point of Φ in the following theorem. Theorem 4.2
Suppose that the objective function Φ in model (4) satisfies Assumption (A1)-(A3).Let { Z k } k ∈ N be the sequence generated by the PARM algorithm provided that the parameterssatisfy Assumption (A4). Let S denote the set of all limit points of the sequence { Z k } k ∈ N and letcrit(Φ) denote the set of all critical points of the function Φ. Then ∅ (cid:54) = S ⊆ crit(Φ), that is, anylimit point of { Z k } k ∈ N is a critical point of Φ. Proof.
Let Z ∗ be in S (cid:54) = ∅ and let { Z k t } t ∈ N be a subsequence of { Z k } k ∈ N such that lim t →∞ Z k t = Z ∗ . Then by Proposition 4.3, lim t →∞ Φ( Z k t ) = Φ( Z ∗ ). Also, it follows from Proposition 4.2 andCorollary 4.2 that A k t ∈ ∂ Φ( Z k t ) and A k t → as t → ∞ . Thus, by the definition of subdifferentialin Definition 3.1, we have ∈ ∂ Φ( Z ∗ ).In addition to Condition (H1)-(H3), if Φ is a Kurdyka-(cid:32)Lojasiewicz (KL) function, then a strongerconvergence result can be achieved for the sequence { Z k } k ∈ N . That is, we can prove that thesequence { Z k } k ∈ N itself converges a critical point of Φ using the KL theory. Theorem 4.3
Suppose that the objective function Φ in model (4) satisfies Assumption (A1)-(A3).Let { Z k } k ∈ N be the sequence generated by the PARM algorithm provided that the parameterssatisfy Assumption (A4). If Φ is a KL function, then the following assertions hold.(a) The sequence { Z k } k ∈ N has finite length, that is, ∞ (cid:88) k =0 (cid:107) Z k +1 − Z k (cid:107) < ∞ ;(b) The sequence { Z k } k ∈ N converges to a critical point of Φ. Proof.
It follows from Proposition 4.1, Proposition 4.2 and Proposition 4.3 that the sequence { Z k } k ∈ N satisfies Condition (H1)-(H3), respectively. Then the results (a) and (b) immediatelyfollow from Theorem 4.1. 20 Numerical Results
In this section, we first describe a practical version of Algorithm 1 and then test the proposedalgorithms to solve the proposed nonlocal low-rank model for multiplicative noise removal. Wecompare our proposed method with six existing methods: the DZ method [10], the HNW method[16], the I-DIV method [29], the TwL-mV method [18], the learned dictionary (Dict) method [15]and the SAR-BM3D method [25]. Numerical results show superior performance of the proposedmethod over the existing ones.The experiments were implemented in Matlab 2016b running a 64 bit Ubuntu 18.04 systemand executed on an eight-core Intel Xeon E5-2640v3 128GB CPU at 2.6 GHz, with four NVIDIATesla P100 16GB GPUs. The proposed algorithms were accelerated using graphics processing units(GPUs), as the estimation of each patch matrix can be computed in parallel.
5.1 A practical version of the PARM algorithm

The PARM algorithm presented in Algorithm 1 converges theoretically, as shown in Section 4, if the patch extraction operator R_j is assumed to be fixed. The extraction operator R_j plays an important role in the denoising performance because a better initialization of R_j can yield a better denoised image. When the optimal R_j is not available, it is empirically challenging to find an appropriate choice of R_j with only a noisy image given.
Algorithm 2 Practical version of the PARM algorithm

    Set parameters τ, µ_j, λ_j, and β_k
    Initialize x^0 and w_j^0
    for k from 0 to K − 1 do
        Set extraction R̂_j by block matching
        Compute matrix Ŵ
        for j from 1 to J do
            [U_j^{k+1}, Σ̃_j^k, V_j^{k+1}] = SVD( R̂_j(x^k) )                        ▷ SVD
            Σ_j^{k+1} = S_{λ_j/µ_j, w_j^k}( Σ̃_j^k )                                 ▷ WSVT
            Y_j^{k+1} = U_j^{k+1} Σ_j^{k+1} (V_j^{k+1})^⊤                            ▷ Update Y_j^{k+1}
            (w_j^{k+1})_i = g′( (Σ_j^{k+1})_{ii} )                                    ▷ Update w_j^{k+1}
        end for
        x^{k+1} ∈ prox^{Ŵ}_{τβ_{k+1} f}( x^k + β_{k+1}( ∑_{j=1}^J µ_j Ŵ^{−1} R̂_j^⊤(Y_j^{k+1}) − x^k ) )   ▷ Update x^{k+1}
    end for

Here, we provide a practical version of the PARM algorithm with a dynamically updated patch extraction operator, denoted by R̂_j. The operator R̂_j is recomputed at each step by block matching based on the update of the estimated image e^{x^k}, and the weighted counts matrix, now denoted by Ŵ, is recomputed based on the newest R̂_j. As a result of this dynamic updating scheme for R̂_j, the patch matrices Y_j^{k+1} and Y_j^k may not refer to the same patch group: Y_j^{k+1} is associated with R̂_j(x^k), while Y_j^k is associated with R̂_j(x^{k−1}) using a different extraction operator R̂_j. Hence, in this practical version of the PARM algorithm, Y_j^{k+1} is updated without using the previous update Y_j^k and its parameter α_k^j. The overall procedure of the practical version of the PARM algorithm for multiplicative noise removal is summarized in Algorithm 2.

5.2 Parameter settings

First, we utilize block matching and normalization with mean zero to extract patch matrices, using the following parameter settings for block matching. In Algorithm 1, the fixed extraction R_j is initialized via block matching based on the estimated image from the SAR-BM3D method; in Algorithm 2, the dynamically updated extraction R̂_j is computed at each step via block matching based on the update of e^{x^k}. Apart from this, both algorithms share the same parameter settings for block matching, including the search window, the patch size and the number of patches in each patch group, as presented in Table 2; a sketch of these two steps follows the table.

Table 2: Settings for block matching, listing for each noise level L the search window, the patch size and the patch number.
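To make the block-matching step just described and the WSVT step of Algorithm 2 concrete, the following NumPy sketch collects, for a reference patch, the most similar patches inside a search window and then applies one weighted singular value thresholding step. All names and parameter values are illustrative assumptions, and the surrogate derivative g′ is exemplified with the log-type choice g(σ) = log(ε + σ), so that g′(σ) = 1/(ε + σ); this is one common reweighting choice, not necessarily the paper's exact surrogate g.

    import numpy as np

    def block_match(img, top_left, patch=8, window=20, n_patches=50):
        # Stack, as columns, the n_patches vectorized patches inside a search
        # window that are closest (in Euclidean distance) to the reference
        # patch at top_left; this plays the role of one extraction R_j.
        i0, j0 = top_left
        ref = img[i0:i0 + patch, j0:j0 + patch].ravel()
        cands = []
        for i in range(max(0, i0 - window), min(img.shape[0] - patch, i0 + window) + 1):
            for j in range(max(0, j0 - window), min(img.shape[1] - patch, j0 + window) + 1):
                p = img[i:i + patch, j:j + patch].ravel()
                cands.append((float(np.sum((p - ref) ** 2)), (i, j), p))
        cands.sort(key=lambda c: c[0])
        return np.stack([c[2] for c in cands[:n_patches]], axis=1)

    def wsvt_step(Y, lam_over_mu, w, eps=1e-2):
        # One WSVT step: shrink each singular value by lam_over_mu * w_i,
        # rebuild the low-rank estimate, and recompute the weights with the
        # illustrative surrogate g(s) = log(eps + s), i.e. g'(s) = 1/(eps + s).
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)
        s_new = np.maximum(s - lam_over_mu * w, 0.0)
        w_new = 1.0 / (eps + s_new)
        return (U * s_new) @ Vt, w_new

Initializing w as a vector of ones makes the first pass a plain singular value thresholding; subsequent passes shrink small singular values more aggressively, which is the reweighting idea behind the algorithm.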
Second, we set the model parameters and the algorithm parameters for Algorithm 1 and Algorithm 2, respectively. The model parameters τ, λ_j's, µ_j's, ρ, γ and ε are adaptive to the noise level. The algorithm parameters (α_k^j's and) β_k's influence the computational speed. The settings of the above parameters are presented in Table 3 and Table 4.

Table 3: Parameter settings for Algorithm 1.

         Standard images       Remote images         Common parameters
    L    τ          λ_j        τ          λ_j        µ_j    ρ      γ      ε      α_k^j    β_k
    1    β_k/50     1.8        β_k/100    1          1      0.01   4      10^-
    3    β_k/150    1          β_k/150    0.45       1      1.5    1.9    10^-
    5    β_k/250    0.6        β_k/200    0.15       1      2      1.3    10^-

Table 4: Parameter settings for Algorithm 2.

         Standard images       Remote images         Common parameters
    L    τ          λ_j        τ          λ_j        µ_j    ρ      γ      ε      β_k
    1    β_k/50     2.6        β_k/50     2.6        1      0.01   4      10^-
    3    β_k/150    1.3        β_k/150    1.2        1      1.5    1.9    10^-
    5    β_k/250    0.8        β_k/250    0.7        1      2      1.3    10^-

Third, the initialization settings and the stopping criteria are set differently for Algorithm 1 and Algorithm 2. Algorithm 1 is initialized using the estimated image from the SAR-BM3D method and is terminated when the relative error reaches a tolerance threshold, namely

    ‖x^{k+1} − x^k‖_W / ‖x^k‖_W < max{ 10^{−}, ( ‖x^1 − x^0‖_W / ‖x^0‖_W ) × 10^{−} }.

Algorithm 2 is initialized using the given noisy image and is terminated empirically based on the number of iterations K. For L = 1, 3, 5, K is set to 65-70, 23-25 and 18-20, respectively.

Lastly, the restored image is estimated by û = e^{x̂}, where x̂ is the log-transformed image obtained from the proposed algorithms.

5.3 Numerical results tested on standard test images

In this experiment, we use standard test images "Monarch", "Lena" and "House", all of size 256 × 256, as shown in Figure 1. To generate the observed images, we degrade the original test images by multiplicative Gamma noise at L = 1, L = 3 and L = 5.

Figure 1: Standard test images. (a) Monarch; (b) Lena; (c) House.

The image quality is evaluated in the intensity format between the original image u ∈ R^N and the estimated image û ∈ R^N, using the peak-signal-to-noise ratio (PSNR), defined as

    PSNR = 10 log_{10}( N / ‖u − û‖_2^2 ),

and the structural similarity index measure (SSIM) [35].
Table 5: Numerical results tested on standard test images at different noise levels by different methods (columns: Image, L, Meas., Alg 1, Alg 2, SAR-BM3D, DZ, HNW, I-DIV, TwL-4V, Dict).
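For reference, a minimal NumPy sketch of the data generation and evaluation just described (function names are ours): multiplicative Gamma noise with L looks is drawn with shape L and scale 1/L, giving a noise field of mean 1 and variance 1/L, and the PSNR follows the definition above, assuming intensities normalized to peak value 1.

    import numpy as np

    def add_gamma_speckle(u, L, seed=0):
        # L-look multiplicative Gamma speckle: shape L, scale 1/L,
        # so the noise field has mean 1 and variance 1/L.
        rng = np.random.default_rng(seed)
        return u * rng.gamma(shape=L, scale=1.0 / L, size=u.shape)

    def psnr(u, u_hat):
        # PSNR as defined above, for intensities with peak value 1.
        return 10.0 * np.log10(u.size / np.sum((u - u_hat) ** 2))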
Table 5 reports the PSNR and SSIM values of the denoised images tested on the three standard test images. The best results for each case are marked in bold and the second-best results are underlined.

Figure 2: Comparison of denoised images restored from "Monarch" at noise level L = 1 by different methods: (a) noisy image (L = 1); (b) ground truth. The (PSNR, SSIM) values for each denoised image: (c) Alg 1 (21.94dB, 0.6926); (d) Alg 2 (21.55dB, 0.6966); (e) SAR-BM3D (21.36dB, 0.6404); (f) DZ (19.38dB, 0.5758); (g) HNW (19.73dB, 0.5523); (h) I-DIV (19.91dB, 0.5883); (i) TwL-4V (19.26dB, 0.5848); (j) Dictionary (19.50dB, 0.5726).

Figure 3: Comparison of denoised images restored from "Lena" at noise level L = 3 by different methods: (a) noisy image (L = 3); (b) ground truth. The (PSNR, SSIM) values for each denoised image: (c) Alg 1 (26.44dB, 0.7892); (d) Alg 2 (26.29dB, 0.7944); (e) SAR-BM3D (26.00dB, 0.7596); (f) DZ (24.06dB, 0.6907); (g) HNW (24.36dB, 0.6911); (h) I-DIV (24.48dB, 0.7073); (i) TwL-4V (24.29dB, 0.7128); (j) Dictionary (24.81dB, 0.7379).

Figure 4: Comparison of denoised images restored from "House" at noise level L = 5 by different methods: (a) noisy image (L = 5); (b) ground truth. The (PSNR, SSIM) values for each denoised image: (c) Alg 1 (29.04dB, 0.8115); (d) Alg 2 (29.12dB, 0.8163); (e) SAR-BM3D (28.36dB, 0.7641); (f) DZ (25.70dB, 0.7339); (g) HNW (25.73dB, 0.6995); (h) I-DIV (25.84dB, 0.7291); (i) TwL-4V (25.79dB, 0.7197); (j) Dictionary (24.56dB, 0.6474).

Both Algorithm 1 and Algorithm 2 outperform all the other methods in terms of PSNR and SSIM values. Compared with the benchmark SAR-BM3D method, Algorithm 1 achieves improvements in PSNR of 0.54-0.59dB, 0.42-0.66dB and 0.46-0.68dB when L = 1, L = 3 and L = 5, respectively. Algorithm 2, with updated patch extraction, also surpasses the SAR-BM3D method, and it even surpasses Algorithm 1 in some cases, especially in terms of SSIM values.

Figures 2-4 present the denoised images tested on "Monarch" at noise level L = 1, "Lena" at L = 3 and "House" at L = 5. In terms of visual quality, Algorithm 1 and Algorithm 2 perform better than the other methods, as they reconstruct more details and smoother textures with less noise and fewer artifacts. For example, compared to the DZ method, the HNW method, the I-DIV method, the TwL-mV method and the learned dictionary method, the proposed methods preserve more details of the hair of "Lena" and generate smoother textures on the wings of "Monarch" and the sky of "House". Compared to the benchmark SAR-BM3D method, the proposed methods generate fewer artifacts, resulting in better images in terms of PSNR and SSIM values.

5.4 Numerical results tested on remote sensing images

In this experiment, we use remote sensing images "Remote 1", "Remote 2" and "Remote 3", as shown in Figure 5. To generate the observed images, we degrade the original test images by multiplicative Gamma noise at L = 1, L = 3 and L = 5. The image quality is evaluated using PSNR and SSIM values.

Figure 5: Remote sensing images. (a) Remote 1; (b) Remote 2; (c) Remote 3.

Table 6 reports the PSNR and SSIM values of the denoised images tested on the three remote sensing images. Algorithm 1 and Algorithm 2 achieve strong performance in PSNR and SSIM values relative to the other methods. For example, Algorithm 1 outperforms the benchmark SAR-BM3D method by 0.11-0.28dB, 0.06-0.19dB and 0.06-0.15dB in PSNR when L = 1, L = 3 and L = 5, respectively; and it outperforms the other traditional methods by 0.76-1.57dB, 0.94-2.93dB and 0.86-3.62dB in PSNR when L = 1, L = 3 and L = 5, respectively. Algorithm 2 is also comparable to Algorithm 1 and the SAR-BM3D method.
Table 6: Numerical results tested on remote sensing images at different noise levels by different methods (columns: Image, L, Meas., Alg 1, Alg 2, SAR-BM3D, DZ, HNW, I-DIV, TwL-4V, Dict).

Figure 6: Comparison of denoised images restored from "Remote 1" at noise level L = 1 by different methods: (a) noisy image (L = 1); (b) ground truth. The (PSNR, SSIM) values for each denoised image: (c) Alg 1 (21.23dB, 0.5459); (d) Alg 2 (21.11dB, 0.5510); (e) SAR-BM3D (21.12dB, 0.5393); (f) DZ (20.47dB, 0.4950); (g) HNW (20.24dB, 0.4551); (h) I-DIV (20.03dB, 0.4709); (i) TwL-4V (20.07dB, 0.4934); (j) Dictionary (20.44dB, 0.4867).

Figure 7: Comparison of denoised images restored from "Remote 2" at noise level L = 3 by different methods: (a) noisy image (L = 3); (b) ground truth. The (PSNR, SSIM) values for each denoised image: (c) Alg 1 (24.13dB, 0.6471); (d) Alg 2 (24.07dB, 0.6474); (e) SAR-BM3D (24.03dB, 0.6449); (f) DZ (22.76dB, 0.5758); (g) HNW (22.71dB, 0.5805); (h) I-DIV (22.49dB, 0.5744); (i) TwL-4V (22.66dB, 0.5845); (j) Dictionary (22.34dB, 0.5592).

Figure 8: Comparison of denoised images restored from "Remote 3" at noise level L = 5 by different methods: (a) noisy image (L = 5); (b) ground truth. The (PSNR, SSIM) values for each denoised image: (c) Alg 1 (25.80dB, 0.7427); (d) Alg 2 (25.83dB, 0.7460); (e) SAR-BM3D (25.65dB, 0.7316); (f) DZ (24.45dB, 0.6745); (g) HNW (23.79dB, 0.6562); (h) I-DIV (24.10dB, 0.6713); (i) TwL-4V (24.30dB, 0.6737); (j) Dictionary (22.70dB, 0.5736).

Figures 6-8 present the denoised images by different methods tested on "Remote 1" at noise level L = 1, "Remote 2" at L = 3 and "Remote 3" at L = 5. Algorithm 1, Algorithm 2 and the benchmark SAR-BM3D method achieve significantly better visual quality than the other methods. For example, they reconstruct buildings, roads and patterns with fine edges and textures.

5.5 Numerical results tested on real SAR images

In this experiment, we use real SAR images "SAR 1" of size 370 × 370 and "SAR 2" of size 350 × 350, as shown in Figure 9(a) and Figure 10(a), respectively.

Figure 9: Comparison of denoised images restored from "SAR 1" by different methods: (a) SAR 1; (b) Alg 1; (c) Alg 2; (d) SAR-BM3D; (e) DZ; (f) HNW; (g) I-DIV; (h) TwL-4V; (i) Dictionary.

Figure 10: Comparison of denoised images restored from "SAR 2" by different methods: (a) SAR 2; (b) Alg 1; (c) Alg 2; (d) SAR-BM3D; (e) DZ; (f) HNW; (g) I-DIV; (h) TwL-4V; (i) Dictionary.

Figure 9 and Figure 10 demonstrate that Algorithm 1 and Algorithm 2 achieve better denoising performance than the other methods. For example, they reconstruct more local structures and smoother textures than the DZ method, the HNW method, the I-DIV method, the TwL-4V method and the learned dictionary method; and they remove more noise and generate fewer artifacts than the benchmark SAR-BM3D method.

In addition to the visual comparison of the denoised images, we can further assess the methods by computing the equivalent number of looks (ENL) and analyzing the ratio images for the different methods.

The ENL of an estimated image û ∈ R^N measures the multiplicative noise reduction in homogeneous regions and is defined as

    ENL = µ_û^2 / σ_û,

where µ_û is the average intensity of the selected region and σ_û is its variance.

For computing the ENL values, two homogeneous regions are selected from each of "SAR 1" and "SAR 2", as indicated by the white boxes in Figure 11(a) and Figure 12(a). Table 7 presents the ENL values for the different methods. The SAR-BM3D method has the lowest ENL values among the compared methods, which indicates that the multiplicative noise is not effectively reduced or that artifacts remain in the homogeneous regions.

Table 7: ENL values of denoised images restored from real SAR images by different methods.
    Image    Region    Noisy    Alg 1    Alg 2     SAR-BM3D    DZ         HNW       I-DIV     TwL-4V    Dict
    SAR 1    Left      9.46     63.09    289.17    42.84       745.14     521.24    306.90    117.48    183.57
    SAR 1    Right     10.65    82.59    203.24    57.36       333.18     360.12    276.04    144.54    175.49
    SAR 2    Left      22.64    97.02    894.03    91.92       1008.50    816.97    501.22    579.26    336.58
    SAR 2    Right     21.91    96.58    734.47    91.03       985.64     740.27    566.30    724.76    444.01
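The ENL entries in Table 7 reduce to a one-line computation over a hand-picked homogeneous box; a minimal sketch following the definition above, in which σ_û denotes the variance of the region (the region coordinates in the usage comment are hypothetical):

    import numpy as np

    def enl(region):
        # Equivalent number of looks: squared mean intensity of a
        # homogeneous region divided by its variance.
        return np.mean(region) ** 2 / np.var(region)

    # e.g., the ENL of a selected homogeneous box in the estimate:
    # enl(u_hat[120:160, 40:80])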
The pointwise ratio between the real SAR image u ∈ R^N and the estimated image û ∈ R^N simulates the multiplicative noise that has been removed by the given method and is defined as

    Ratio = u / û.

The ratio images for the different methods are presented in Figure 11 and Figure 12. The ratio images for Algorithm 1, Algorithm 2 and the SAR-BM3D method present almost pure random speckle, which matches the expected statistics. On the contrary, the ratio images for the other methods still contain geometric structures such as edges and details correlated with the real SAR images, which indicates that those methods have removed some valuable information besides the noise.
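Since fully developed L-look Gamma speckle has mean 1 and variance 1/L, a quick sanity check on a ratio image is to compare its empirical moments with these values; a hedged sketch with hypothetical names:

    import numpy as np

    def ratio_image(u, u_hat, eps=1e-12):
        # Pointwise ratio between the observed image and the estimate;
        # eps guards against division by zero.
        return u / (u_hat + eps)

    def check_speckle_stats(ratio, L):
        # For ideal removal of L-look Gamma speckle, the ratio image
        # should have mean close to 1 and variance close to 1/L.
        return np.mean(ratio), np.var(ratio), 1.0 / L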
Figure 11: Comparison of the ratio images between "SAR 1" and the estimated images by different methods: (a) SAR 1; (b) Alg 1; (c) Alg 2; (d) SAR-BM3D; (e) DZ; (f) HNW; (g) I-DIV; (h) TwL-4V; (i) Dictionary.

Figure 12: Comparison of the ratio images between "SAR 2" and the estimated images by different methods: (a) SAR 2; (b) Alg 1; (c) Alg 2; (d) SAR-BM3D; (e) DZ; (f) HNW; (g) I-DIV; (h) TwL-4V; (i) Dictionary.

6 Conclusions

We have proposed an effective method for multiplicative noise removal. The proposed method consists of a nonlocal low-rank model, which exploits the low-rank prior of nonlocal similar patch matrices, and the PARM iterative algorithm, which solves the nonconvex nonsmooth optimization problem resulting from the proposed model. We have established the global convergence of the sequence generated by the PARM algorithm to a critical point of the nonconvex nonsmooth objective function of the resulting optimization problem. Numerical results have demonstrated that the proposed method, with a theoretical convergence guarantee, outperforms several existing methods including the state-of-the-art SAR-BM3D method.
References

[1] H. Attouch, J. Bolte, P. Redont, and A. Soubeyran, Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka-Łojasiewicz inequality, Mathematics of Operations Research, 35 (2010), pp. 438–457.

[2] H. Attouch, J. Bolte, and B. F. Svaiter, Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods, Mathematical Programming, 137 (2013), pp. 91–129.

[3] G. Aubert and J.-F. Aujol, A variational approach to removing multiplicative noise, SIAM Journal on Applied Mathematics, 68 (2008), pp. 925–946.

[4] J. Bolte, S. Sabach, and M. Teboulle, Proximal alternating linearized minimization for nonconvex and nonsmooth problems, Mathematical Programming, 146 (2014), pp. 459–494.

[5] A. Buades, B. Coll, and J.-M. Morel, A non-local algorithm for image denoising, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2 (2005), pp. 60–65.

[6] R. Chan, H. Yang, and T. Zeng, A two-stage image segmentation method for blurry images with Poisson or multiplicative Gamma noise, SIAM Journal on Imaging Sciences, 7 (2014), pp. 98–127.

[7] K. Chen, H. Dong, and K.-S. Chan, Reduced rank regression via adaptive nuclear norm penalization, Biometrika, 100 (2013), pp. 901–920.

[8] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, Image denoising by sparse 3-D transform-domain collaborative filtering, IEEE Transactions on Image Processing, 16 (2007), pp. 2080–2095.

[9] W. Dong, G. Shi, X. Li, Y. Ma, and F. Huang, Compressive sensing via nonlocal low-rank regularization, IEEE Transactions on Image Processing, 23 (2014), pp. 3618–3632.

[10] Y. Dong and T. Zeng, A convex variational model for restoring blurred images with multiplicative noise, SIAM Journal on Imaging Sciences, 6 (2013), pp. 1598–1625.

[11] J. W. Goodman, Some fundamental properties of speckle, JOSA, 66 (1976), pp. 1145–1150.

[12] S. Gu, L. Zhang, W. Zuo, and X. Feng, Weighted nuclear norm minimization with application to image denoising, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 2862–2869.

[13] R. A. Horn and C. R. Johnson, Topics in Matrix Analysis, Cambridge University Press, 1994.

[14] T. Huang, W. Dong, X. Xie, G. Shi, and X. Bai, Mixed noise removal via Laplacian scale mixture modeling and nonlocal low-rank approximation, IEEE Transactions on Image Processing, 26 (2017), pp. 3171–3186.

[15] Y.-M. Huang, L. Moisan, M. K. Ng, and T. Zeng, Multiplicative noise removal via a learned dictionary, IEEE Transactions on Image Processing, 21 (2012), pp. 4534–4543.

[16] Y.-M. Huang, M. K. Ng, and Y.-W. Wen, A new total variation method for multiplicative noise removal, SIAM Journal on Imaging Sciences, 2 (2009), pp. 20–40.

[17] Y.-M. Huang, H.-Y. Yan, Y.-W. Wen, and X. Yang, Rank minimization with applications to image noise removal, Information Sciences, 429 (2018), pp. 147–163.

[18] M. Kang, S. Yun, and H. Woo, Two-level convex relaxed variational model for multiplicative denoising, SIAM Journal on Imaging Sciences, 6 (2013), pp. 875–903.

[19] A. S. Lewis and H. S. Sendov, Nonsmooth analysis of singular values. Part I: Theory, Set-Valued Analysis, 13 (2005), pp. 213–241.

[20] A. S. Lewis and H. S. Sendov, Nonsmooth analysis of singular values. Part II: Applications, Set-Valued Analysis, 13 (2005), pp. 243–264.

[21] Y. Lou, X. Zhang, S. Osher, and A. Bertozzi, Image recovery via nonlocal operators, Journal of Scientific Computing, 42 (2010), pp. 185–197.

[22] C. Lu, J. Tang, S. Yan, and Z. Lin, Generalized nonconvex nonsmooth low-rank minimization, in 2014 IEEE Conference on Computer Vision and Pattern Recognition, June 2014, pp. 4130–4137.

[23] J. Lu, L. Shen, C. Xu, and Y. Xu, Multiplicative noise removal in imaging: An exp-model and its fixed-point proximity algorithm, Applied and Computational Harmonic Analysis, 41 (2016), pp. 518–539.

[24] C. J. Oliver and S. Quegan, Understanding Synthetic Aperture Radar Images, SciTech Publishing, Inc., Raleigh, NC, 2004.

[25] S. Parrilli, M. Poderico, C. V. Angelino, and L. Verdoliva, A nonlocal SAR image denoising algorithm based on LLMMSE wavelet shrinkage, IEEE Transactions on Geoscience and Remote Sensing, 50 (2011), pp. 606–616.

[26] B. Recht, M. Fazel, and P. A. Parrilo, Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization, SIAM Review, 52 (2010), pp. 471–501.

[27] J. M. Schmitt, S. Xiang, and K. M. Yung, Speckle in optical coherence tomography, Journal of Biomedical Optics, 4 (1999), pp. 95–105.

[28] J. Shi and S. Osher, A nonlinear inverse scale space method for a convex multiplicative noise model, SIAM Journal on Imaging Sciences, 1 (2008), pp. 294–321.

[29] G. Steidl and T. Teuber, Removing multiplicative noise by Douglas-Rachford splitting methods, Journal of Mathematical Imaging and Vision, 36 (2010), pp. 168–184.

[30] R. F. Wagner, S. W. Smith, J. M. Sandrik, and H. Lopez, Statistics of speckle in ultrasound B-scans, IEEE Transactions on Sonics and Ultrasonics, 30 (1983), pp. 156–163.

[31] J. Wei, Y. Huang, L. Ke, and L. Wang, Nonlocal low-rank-based compressed sensing for remote sensing image reconstruction, IEEE Geoscience and Remote Sensing Letters, 13 (2017), pp. 1557–1561.

[32] Y. Wu, Speckle noise removal via nonlocal low-rank regularization, Journal of Visual Communication and Image Representation, 39 (2016), pp. 172–180.

[33] W. Yu, W. Yin, and J. Zeng, Global convergence of ADMM in nonconvex nonsmooth optimization, Journal of Scientific Computing, 78 (2018), pp. 1–35.

[34]