Multiplicative Noise Removal: Nonlocal Low-Rank Model and Its Proximal Alternating Reweighted Minimization Algorithm
MMultiplicative Noise Removal: Nonlocal Low-Rank Model and ItsProximal Alternating Reweighted Minimization Algorithm ∗ Xiaoxia Liu † Jian Lu †‡ Lixin Shen § Chen Xu † Yuesheng Xu ¶ Abstract
The goal of this paper is to develop a novel numerical method for efficient multiplicativenoise removal. The nonlocal self-similarity of natural images implies that the matrices formedby their nonlocal similar patches are low-rank. By exploiting this low-rank prior with applicationto multiplicative noise removal, we propose a nonlocal low-rank model for this task and developa proximal alternating reweighted minimization (PARM) algorithm to solve the optimizationproblem resulting from the model. Specifically, we utilize a generalized nonconvex surrogateof the rank function to regularize the patch matrices and develop a new nonlocal low-rankmodel, which is a nonconvex nonsmooth optimization problem having a patchwise data fidelityand a generalized nonlocal low-rank regularization term. To solve this optimization problem,we propose the PARM algorithm, which has a proximal alternating scheme with a reweightedapproximation of its subproblem. A theoretical analysis of the proposed PARM algorithmis conducted to guarantee its global convergence to a critical point. Numerical experimentsdemonstrate that the proposed method for multiplicative noise removal significantly outperformsexisting methods such as the benchmark SAR-BM3D method in terms of the visual quality ofthe denoised images, and the PSNR (the peak-signal-to-noise ratio) and SSIM (the structuralsimilarity index measure) values.
Key words: multiplicative noise removal, nonlocal low-rank regularization, image restoration
AMS subject classification:
We consider in this paper the problem of multiplicative noise removal. To effectively restoreimages degraded by multiplicative noise, we develop a method which consists of an optimizationmodel and an iterative algorithm to solve the minimization problem. Based on the nonlocal self-similarity of natural images, we propose a nonlocal low-rank model for multiplicative noise removal. ∗ Submitted to the editors January 30, 2020.
Funding:
The research of J. Lu is partially supported by the Natural Science Foundation of China under grant61972265 and 11871348. The research of L. Shen is partially supported by the National Science Foundation undergrant DMS-1913039. The research of C. Xu is partially supported by the Natural Science Foundation of Chinaunder grant 61872429. The research of Y. Xu is partially supported by the National Science Foundation under grantDMS-1912958 and by the Natural Science Foundation of China under grant 11771464. † Shenzhen Key Laboratory of Advanced Machine Learning and Applications, College of Mathematics and Statis-tics, Shenzhen University, Shenzhen, 518060, P.R. China. ([email protected],[email protected],[email protected]) ‡ Corresponding author. § Department of Mathematics, Syracuse University, Syracuse, NY 13244, USA. ([email protected]) ¶ Department of Mathematics and Statistics, Old Dominion University, Norfolk, VA 23529, USA. ([email protected]) a r X i v : . [ m a t h . O C ] F e b he resulting model is a nonconvex nonsmooth minimization problem. We develop a proximalalternating reweighted minimization (PARM) algorithm with a convergence guarantee to efficientlysolve the problem.Multiplicative noise (i.e., speckle noise) widely occurs in coherent imaging systems due to theinterference of coherent waves scattered from distributed targets. For example, images obtainedfrom synthetic aperture radar (SAR) [24], ultrasound imaging [30] and laser imaging [27] are nat-urally contaminated with multiplicative noise. Removing multiplicative noise from such images isinevitable in many areas of applications.Methods employed for multiplicative noise removal in the literature include the total variation(TV) regularization based models, patch-based methods, and nonlocal low-rank based methods. TVregularization has been widely used to preserve edges in the restored images. In a TV regularizationbased model, the objective function is the sum of a data fidelity term and a TV regularization term.The data fidelity term measures the closeness between the desired image and the observed noisyimage, while the TV regularization term measures the total variation of a desired image or an imagein its transformed domain. The AA model [3] used the Bayesian maximum a posteriori probability(MAP) estimation to derive the data fidelity term in terms of the desired image. However, thisdata fidelity term is nonconvex and the resulting optimization problem is challenging to solve. Toovercome this challenge, the DZ model [10] modified the data fidelity term by adding a quadraticterm. As a consequence, the objective function of the DZ model becomes convex under some mildconditions. The I-DIV model [29] used the so-called I-divergence as the data fidelity term. Byperforming the logarithmic transformation, the SO model [28], the HNM model [16], and the Expmodel [23] led to convex, even strictly convex, data fidelity terms. The m V model [34] and the TwL- m V model [18] used convex or strongly convex data fidelity terms via the m th root transformation.The TV regularization based models have good performance in denoising. However, they tend toover-smooth image textures and generate unexpected artifacts.The patch-based methods make use of the redundancy of image patches to yield a restoredimage with fine details. Sparse representations of image patches have been studied in the patch-based methods for multiplicative noise removal. 
In the learned dictionary method [15], an optimalover-complete dictionary was learned from the patches of the logarithmic transformed noisy imageand then an image was restored via a variational model based on the learned dictionary and aTV regularization. The SAR-BM3D method [25] is another remarkable approach relying on asparse representation, which takes advantage of the nonlocal self-similarity of natural images [5].Nonlocal similar patches, collected as 3D groups, were identified based on a probabilistic similaritymeasure for multiplicative noise, and then were denoised by using jointly nonlocal filtering anda local linear minimum-mean-square-error shrinkage in a wavelet domain. We remark that thosemethods constrain the sparsity priors in either a fixed dictionary or a fixed wavelet domain, whichlimits their capability in multiplicative noise removal.Recently, the nonlocal low-rank based methods were extensively exploited in image processing.It is recognized that natural images are of nonlocal self-similarity. Matrices formed by nonlocalsimilar patches are low-rank, and hence the desired image can be restored by low-rank estimationsof nonlocal similar patch matrices. To regularize the rank of the matrices formed by nonlocalsimilar patches, different approximations of the rank function including the weighted nuclear normand the log-det function were adopted, see, e.g., [9, 12, 14, 17, 31].Existing studies have shown impressive empirical performance of nonlocal low-rank based meth-ods. However, theoretical analysis of the existing methods is missing and there is little work onapplications of nonlocal low-rank based methods to multiplicative noise removal. To address thisissue, we propose to develop a new nonlocal low-rank based method that is theoretically and prac-tically suitable for multiplicative noise removal. The proposed method includes a novel nonlocal2ow-rank model and an efficient iterative algorithm to solve the proposed model with a convergenceguarantee. We explore the underlying low-rank prior of the patch matrices and propose a nonlocallow-rank model for multiplicative noise removal. The resulting optimization problem is nonconvexand nonsmooth, which is challenging to design efficient and theoretically convergence-guaranteedalgorithms to solve. In fact, the well-known alternating direction method of multipliers (ADMM)algorithm is not applicable to this optimization problem, and the alternating minimization (AM)algorithm and the augmented Lagrange multiplier (ALM) algorithm may not converge [4, 33]. Toaddress this challenge on developing an efficient convergent algorithm, we propose a proximal al-ternating minimization scheme with a reweighted approximation of its subproblem and furtheruse the Kurdyka-(cid:32)Lojasiewicz theory [2, 4] to prove its global convergence to a critical point. Theexperiments demonstrate that the proposed nonlocal low-rank based method is well suitable formultiplicative noise removal.The main contributions of this work are: • We propose a nonlocal low-rank model for multiplicative noise removal. This model is for-mulated in the log-transformed domain of images. The objective function of the model asthe sum of a fidelity term and a regularization term is nonconvex and nonsmooth. Its fidelityterm is adapted from the corresponding one in the Exp model [23] to patches, and is strictlyconvex under certain conditions. 
Its regularization term is the application of the compositionof the rank operator with the patch extraction operator onto the underlying image. Due tothe difficulties caused by the composition and the rank function in solving this model, wepropose to split this composition by introducing an auxiliary variable and to approximate therank function using a smooth concave function. • We develop a proximal alternating reweighted minimization (PARM) algorithm for solvingthe proposed nonlocal low-rank model. The key in the PARM algorithm is to deal with theconcave function that is used to approximate the rank function in the model. We propose toapproximate this concave function by its affine approximation (i.e., the reweighted approxi-mation) in each iteration of the PARM algorithm. This approach could be useful for a widerange of nonlocal low-rank models. • We provide a theoretical analysis of the PARM algorithm which guarantees its global con-vergence to a critical point, in contrast to the practically used algorithms such as in [9, 32]which are lack of convergence analysis. • We give a detailed description on the implementation of the PARM algorithm includingparameter settings, patch sizes, and search windows. We also test the proposed method forvarious images at different noise levels. Furthermore, we conduct the performance comparisonof the proposed method with many existing ones for multiplicative noise removal, with respectto the visual quality of the denoised images, and the PSNR (the peak-signal-to-noise ratio)and SSIM (the structural similarity index measure) values.This paper is organized into six sections. In section 2, we present the nonlocal low-rank modelfor multiplicative noise removal. The proposed PARM algorithm to solve the resulting nonconvexnonsmooth optimization problem is presented in section 3. Section 4 is devoted to the convergenceanalysis of the proposed algorithm. In section 5, we demonstrate the efficiency of the new methodnumerically by experiment results. Section 6 concludes this paper.3
Nonlocal Low-Rank Model for Multiplicative Noise Removal
We propose in this section a nonlocal low-rank model for multiplicative noise removal by exploit-ing low-rank priors of the nonlocal similar patch matrices extracted from the underlying images.Throughout this paper, matrices are bold capital, vectors are bold lowercase and scalars orentries are not bold. Given x , y ∈ R d , (cid:104) x , y (cid:105) := (cid:80) di =1 (cid:104) x i , y i (cid:105) is the standard inner product and (cid:107) x (cid:107) := (cid:112) (cid:104) x , x (cid:105) is the standard (cid:96) norm. Let S d + denote the set of symmetric positive definitematrices of size d × d and let I d denote the identity matrix of size d × d . Given x , y ∈ R d and H ∈ S d + , (cid:104) x , y (cid:105) H := (cid:104) x , Hy (cid:105) is the H -weighted inner product and (cid:107) x (cid:107) H := (cid:112) (cid:104) x , x (cid:105) H is the H -weighted (cid:96) norm. Given X , Y ∈ R m × n , (cid:104) X , Y (cid:105) F := tr( X (cid:62) Y ) is the Frobenius inner productand (cid:107) X (cid:107) F := (cid:112) (cid:104) X , X (cid:105) F is the Frobenius norm.Multiplicative noise removal in this paper refers to reducing multiplicative noise in an L -lookimage obtained by the multi-look averaging technique. An L -look image v ∈ R N in the intensityformat degraded by multiplicative noise can be modeled as v = uη , where u ∈ R N is the desired image to be restored, η ∈ R N is the multiplicative noise and themultiplication operation is a componentwise operation. The multiplicative noise in each pixelfollows a Gamma distribution [11], whose probability distribution function is defined as p ( η i ) = L L η L − i Γ( L ) e − Lη i , i = 1 , . . . , N, which has mean 1 and variance of 1 /L . A list of TV regularization based models for multiplicativenoise removal is presented in Table 1. Table 1: TV regularization based models for multiplicative noise removal.
Name Model Φ Transform. Properties of
ΦAA [3] min u ∈ R N + (cid:104) log u + vu , (cid:105) + λ (cid:107) u (cid:107) T V – nonconvexDZ [10] min u ∈ R N + (cid:104) log u + vu , (cid:105) + ρ (cid:107) (cid:112) uv − (cid:107) + λ (cid:107) u (cid:107) T V – strictly convex if ρ ≥ √ I-DIV [29] min u ∈ R N + (cid:104) u − v log u , (cid:105) + λ (cid:107) u (cid:107) T V – convexSO [28] min u ∈ R N (cid:104) x + v e x , (cid:105) + λ (cid:107) x (cid:107) T V x = log u strictly convexHNM [16] min x ∈ R N , w ∈ R N (cid:104) x + v e x , (cid:105) + ρ (cid:107) x − w (cid:107) + λ (cid:107) w (cid:107) T V x = log u convexExp [23] min x ∈ R N (cid:104) x + v e x , (cid:105) + ρ (cid:107) (cid:113) e x v − γ (cid:107) + λ (cid:107) x (cid:107) T V x = log u strictly convex if ργ ≤ m V [34] min x ∈ m √ U (cid:104) m log x + vx m , (cid:105) + λ (cid:107) x (cid:107) T V x = m √ u convex if m is suf-ficiently largeTwL- m V [18] min a> , x ∈ m √ U s (cid:104) a, x s (cid:105) − s (cid:104) m log a − s vx m , (cid:105) + λ (cid:107) x (cid:107) T V x = m √ u strongly convexwith respect to x R + = (0 , + ∞ ); 2. U = (0 , C ] N , C ∈ R + ; 3. λ > ρ > γ ≥
1, and s ≥
1; 4. denotes the vector whose entries are allones; 5. The division, multiplication, logarithmic, exponential, square root operations are componentwise operations. In the following, we present our nonlocal low-rank model for multiplicative noise removal stepby step. According to the nonlocal self-similarity of natural images, for an image patch, we can findnonlocal similar patches across the image or within a local window [5]. We begin with collectingsimilar patches using block matching [8, 25] and formulating patch matrices. Suppose that ˆ u ∈ R N
4s an estimated clean image in the intensity format and that J is the number of nonlocal similarpatch groups to be collected. For the reference patch ˆ u j ∈ R m j with size √ m j × √ m j in the j thpatch group, we search within a local window for a total of n j patches that are similar to thereference patch, assuming m j ≤ n j , j = 1 , , . . . , J . To fully exploit the statistics of L -look images,we measure the similarity between two patches ˆ u j ∈ R m j and ˆ u (cid:48) j ∈ R m j using the block similaritymeasure introduced in [25] d ( ˆ u j , ˆ u (cid:48) j ) = (2 L − m j (cid:88) i =1 log (cid:115) (ˆ u j ) i (ˆ u (cid:48) j ) i + (cid:115) (ˆ u (cid:48) j ) i (ˆ u j ) i . Following the above, for each group we construct a patch matrix from all the patches in thegiven group through an extraction operator. Define R jl ∈ R m j × N be a binary matrix (i.e., itsentries are either 1 or 0) such that R jl ˆ u is the l th patch in the j th nonlocal similar patch groupof the given estimated image ˆ u , l = 1 , . . . , n j , j = 1 , . . . , J . Then we define a linear operator R j : R N → R m j × n j , m j ≤ n j , as follows, R j ( x ) = (cid:2) R j x R j x · · · R jn j x (cid:3) . Here, R j ( x ) is called the j th patch matrix of the (transformed) image x ∈ R N . After the patchmatrix is extracted, the patch matrix can be further processed using, for example, normalizationwith mean zero, and the corresponding extraction operator R j can be defined accordingly. Intu-itively, the patch matrix R j ( x ) with similar structures should be a low-rank matrix if x is close tothe clean image ˆ u , for example, up to a transformation.Taking advantage of the low-rank prior of image patch matrices R j ( x )’s, the objective functionof a patch-based nonlocal low-rank model consists of a data fidelity term to restore the desiredimage and a nonlocal low-rank regularization term as followsmin x τ f ( x ) + J (cid:88) j =1 λ j rank( R j ( x )) , (1)where x ∈ R N is the desired (transformed) image to be restored, f : R N → ( −∞ , + ∞ ] is thedata fidelity term that measures the closeness between the observed image and the desired image, R j : R N → R m j × n j , m j ≤ n j , is the (normalized) extraction of j th nonlocal similar patch matrix,and λ j > j = 1 , . . . , J .Model (1) regularizes low-rank priors on image patch matrices, but it is not a feasible modelfrom both theoretical and practical perspectives. First, model (1) as a composition optimizationis not easy to solve. Second, the rank function is discontinuous and nonconvex, and minimizing aproblem involving the rank function is NP-hard [26]; therefore, it is challenging to solve model (1).To tackle the above challenges, we plan to relax model (1) in the following ways. We first applythe variable splitting method to model (1) to address the composition optimization problem, adopta nonconvex surrogate of the rank function to replace the rank function, and preferably utilize apatchwise data fidelity term.First, we apply the variable splitting method to relax model (1). 
By introducing auxiliary(splitting) variables Y j ∈ R m j × n j such that Y j = R j ( x ) and then relaxing these equalities of thesplitting variables, we obtain the following modelmin x , Y ,..., Y J τ f ( x ) + J (cid:88) j =1 (cid:110) µ j (cid:107) Y j − R j ( x ) (cid:107) F + λ j rank( Y j ) (cid:111) , µ j > g : [0 , ∞ ) → R such that g is monotonicallyincreasing, a generalized relaxation of the rank function is defined as (cid:107) Y (cid:107) ∗ ,g = m (cid:88) i =1 g ( σ i ( Y )) , where Y ∈ R m × n , m ≤ n , and σ i ( Y ) is the i th largest singular value of Y . Here, we give two specialcases of the function g . If g ( t ) = (cid:107) t (cid:107) as the (cid:96) norm, then (cid:107) Y (cid:107) ∗ ,g exactly reduces to the rankfunction. If g ( t ) = t as a linear function, then (cid:107) Y (cid:107) ∗ ,g = (cid:107) Y (cid:107) ∗ is exactly the nuclear norm, whichis the tightest convex surrogate of the rank function. However, the rank minimization is NP-hard,while the nuclear norm minimization may over-shrink the singular values with large values [12].To better approximate the rank function, we are interested in its nonconvex relaxation (cid:107) · (cid:107) ∗ ,g with the function g to be monotonically increasing, concave and smooth. For example, a decentchoice of g : [0 , ∞ ) → R is the logarithmic function defined as g ( t ) = log( t + ε ) , (2)where ε > v ∈ R N be the given noisy image and let x ∈ R N be the unknown clean log-transformedimage to be restored. We extend the pixelwise data fidelity term of the Exp model [23] as shownin Table 1 to a patchwise data fidelity term that is in terms of patch matrices R j ( x )’s as follows f ( x ) = J (cid:88) j =1 µ j (cid:28) R j ( x ) + R j ( v ) e R j ( x ) , R j ( N ) (cid:29) F + ρ (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:115) e R j ( x ) R j ( v ) − γR j ( N ) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) F , where µ j > N denotes the vector of size N × ρ > γ ≥ f isstrictly convex if ργ ≤ .The patchwise data fidelity term can be further viewed as a weighted pixelwise data fidelityterm. Define R (cid:62) j : R m j × n j → R N as R (cid:62) j ( Y ) = (cid:80) n j l =1 R (cid:62) jl y i , where y i ∈ R m j is the i th vector of Y . Since R j and R (cid:62) j are linear operators such that (cid:104) R j ( x ) , Y (cid:105) F = (cid:104) x , R (cid:62) j ( Y ) (cid:105) for all x ∈ R N and Y ∈ R m j × n j , where (cid:104)· , ·(cid:105) is the standard inner product for vectors, then f can be written as f ( x ) = J (cid:88) j =1 µ j (cid:68) x + v e x , ( R (cid:62) j ◦ R j ) N (cid:69) + ρ (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) R j (cid:32)(cid:114) e x v − γ N (cid:33)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) F = (cid:104) x + v e x , N (cid:105) W + ρ (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:114) e x v − γ N (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) W , (3)where W = (cid:80) Jj =1 µ j R (cid:62) j ◦ R j = (cid:80) Jj =1 µ j (cid:80) n j l =1 R (cid:62) jl R jl is a diagonal matrix whose main diagonalentries indicate the weighted counts for each pixel. Since we assume that each pixel belongs to atleast one nonlocal similar patch group, then W ∈ S N + and the W -weighted inner product (cid:104)· , ·(cid:105) W and the W -weighted (cid:96) norm (cid:107) · (cid:107) W are well-defined. The proposed data fidelity term assigns6ore weights to the image pixels that belong to multiple patch groups. 
It helps develop efficientalgorithms and cooperates well with the framework of our algorithm introduced in section 3.Putting all the above discussion together, we come up with the following nonlocal low-rankmodel min x , Y ,..., Y J τ f ( x ) + J (cid:88) j =1 (cid:32) µ j (cid:107) Y j − R j ( x ) (cid:107) F + λ j m j (cid:88) i =1 g ( σ i ( Y j )) (cid:33) , (4)where x ∈ R N , Y j ∈ R m j × n j , f : R N → ( −∞ , + ∞ ] is defined as (3), g : [0 , ∞ ) → R is definedas (2), R j : R N → R m j × n j is the (normalized) extraction of j th nonlocal similar patch matrix, m j ≤ n j , τ > µ j > λ j > j = 1 , . . . , J .Clearly, the objective function of model (4) is nonconvex and nonsmooth. Existing algorithmsare not directly applicable to this problem. It is challenging to design theoretically convergence-guaranteed and practically efficient algorithms to solve this nonconvex nonsmooth optimizationproblem. In the next section, we will propose an efficient algorithm for the nonlocal low-rankmodel (4) and analyze its convergence in section 4. We present a proximal alternating reweighted minimization algorithm for solving the nonconvexnonsmooth optimization problem of model (4).The nonlocal low-rank model, which has the form of model (4), regularizes the low-rank priorof patch matrices and can also be applicable to many image restoration problems such as imagedenoising and compressive sensing if the patch matrix extraction R j and the data fidelity term f are appropriately selected. In the following, we consider the nonlocal low-rank model in a generalsetting. The objective function of model (4), denoted as Φ, can be written asΦ( x , Y , . . . , Y J ) = τ f ( x ) + J (cid:88) j =1 Φ j ( x , Y j ) , (5)where Φ j ( x , Y ) = µ j (cid:107) Y − R j ( x ) (cid:107) F + λ j m j (cid:88) i =1 g ( σ i ( Y )) , (6)and we assume (A1) f : R N → ( −∞ , + ∞ ] is inf-bounded, proper and lower semicontinuous, i.e., inf f > −∞ ,dom f := { x ∈ R N : f ( x ) < + ∞} (cid:54) = ∅ and f ( a ) ≤ lim inf x → a f ( x ) , ∀ a ∈ R N ; (A2) g : [0 , ∞ ) → R is monotonically increasing and concave (and nonconvex); and g is continuouslydifferentiable with an L g -Lipschitz continuous gradient, i.e., | g (cid:48) ( t ) − g (cid:48) ( t ) | ≤ L g | t − t | , ∀ t ≥ , t ≥ (A3) Φ( x , Y , . . . , Y J ) is coercive, i.e.,lim (cid:107) ( x , Y ..., Y J ) (cid:107)→∞ Φ( x , Y , . . . , Y J ) = + ∞ .
7n the application of multiplicative noise removal, we utilize the nonlocal low-rank model (4)with f defined as (3) and g defined as (2). It is easy to verify that f satisfies Assumption (A1)and g satisfies Assumption (A2). These together with the coercivity of f and g imply that Φ isinf-bounded and coercive. Hence, Assumption (A1)-(A3) hold for our proposed model.In this general setting, no convexity or smoothness is assumed for f and the objective functionΦ of the nonlocal low-rank model (4) is nonconvex and nonsmooth. For solving this nonconvexand nonsmooth optimization problem, the alternating minimization (AM) algorithm was adoptedfor compressive sensing [9] and the augmented Lagrange multiplier (ALM) algorithm was adoptedfor speckle noise removal [32]. However, there is no guarantee that those methods will converge.Because the sequence generated by the AM algorithm may cycle indefinitely without converging ifthe minimum in each alternating step is not uniquely obtained [4]; and the sequence generated bythe ALM algorithm may diverge even with bounded penalty parameters [33]. Therefore, we willpropose an algorithm called the Proximal Alternating Reweighted Minimization (PARM) algorithm customized for model (4) as shown below Y k +1 j ∈ argmin Y j (cid:101) Φ j ( x k , Y j ) + α jk (cid:107) Y j − Y kj (cid:107) F , (7) x k +1 ∈ argmin x Φ( x , Y k +11 , . . . , Y k +1 J ) + β k (cid:107) x − x k (cid:107) W , (8)where (cid:101) Φ j ( x k , Y j ) is a reweighted approximation of Φ j ( x k , Y j ) with respect to Y j , W = (cid:80) Jj =1 µ j R (cid:62) j ◦ R j ∈ S N + , and α jk > β k > (A4) For the sequences { α jk } k ∈ N , j = 1 , , . . . , J , and the sequence { β k } k ∈ N , there exist positiveconstants α − , α + , β − , β + such thatinf { α jk : k ∈ N , j = 1 , , . . . , J } ≥ α − , and inf { β k : k ∈ N } ≥ β − , sup { α jk : k ∈ N , j = 1 , , . . . , J } ≤ α + , and sup { β k : k ∈ N } ≤ β + . The convergence analysis of the PARM algorithm will be provided in the next section.The proposed PARM algorithm has a proximal alternating scheme similar to the proximalalternating linearized minimization [4] for nonconvex and nonsmooth problems proposed by Bolteet al., in which a proximal term at the previous iterate is added to each subproblem. In (7), we utilize (cid:101) Φ j ( x k , Y j ), a reweighted approximation of Φ j ( x k , Y j ), to approximate the nonconvex surrogate ofthe rank function, which yields a closed form for (7). In (8), the proximal term is in term of the W -weighted norm, which is to be consistent with the patchwise data fidelity term f , for example,as defined in (3). In fact, we will continue to use the W -weighted norm to measure the variable x throughout the entire paper. Moreover, as an algorithm for nonlocal low-rank models applied toimage restoration, the PARM algorithm can be intuitively interpreted as follows. Equation (7) canbe viewed as a low-rank patch matrix estimation, which returns the nonlocal patch matrices Y j ’swith a low-rank property, while equation (8) can be viewed as the image restoration step, whichaggregates all the estimated nonlocal patch matrices from (7) to form the desired image x .Before further derive our PARM algorithm, we review some preliminaries on subdifferentialsand proximity operators for nonconvex and nonsmooth functions. For nonconvex and nonsmooth functions, we use the following definitions for subdifferentialsand proximity operators. 8 efinition 3.1 (Subdifferentials)
Let f : R d → ( −∞ , + ∞ ] be a proper and lower semicontin-uous function.(1) For a given x ∈ dom f , the Fr´echet subdifferential of f at x , written ˆ ∂f ( x ), is the set of allvectors u ∈ R d which satisfylim inf y (cid:54) = x y → x f ( y ) − f ( x ) − (cid:104) u , y − x (cid:105)(cid:107) y − x (cid:107) ≥ . When x / ∈ dom f , we set ˆ ∂f ( x ) = ∅ .(2) The subdifferential (or called the limiting-subdifferential) of f at x ∈ R d , written ∂f ( x ), isdefined through the following closure process ∂f ( x ) := { u ∈ R d : ∃ x k → x , f ( x k ) → f ( x ) and u k ∈ ˆ ∂f ( x k ) → u as k → ∞} . Definition 3.2 (Proximity operators)
Let f : R d → ( −∞ , + ∞ ] be a proper and lower semi-continuous function such that inf R d f > −∞ . The proximity operator of f at x ∈ R d is definedas prox f ( x ) = argmin u ∈ R d f ( u ) + 12 (cid:107) u − x (cid:107) . Note that prox f ( x ) is a set-valued map. If f is convex, then prox f ( x ) is reduced to a single-valuedmap.The definitions above for subdifferentials and proximity operators are defined on vectors withrespect to the standard (cid:96) norm. Without loss of generality, these definitions can be extended tovectors with respect to the weighted (cid:96) norm and matrices with respect to the Frobenius norm.Let H ∈ S d + . The Fr´echet subdifferential of f : R d → ( −∞ , + ∞ ] at a vector x ∈ R d withrespect to H is denoted as ˆ ∂ H f ( x ); its subdifferential is denoted as ∂ H f ( x ); and its proximityoperator is denoted as prox H f ( x ).For the function f : R m × n → ( −∞ , + ∞ ] at a matrix X ∈ R m × n with respect to the Frobeniusnorm, its Fr´echet subdifferential is denoted as ˆ ∂ F f ( X ) or ˆ ∂f ( X ); its subdifferential is denoted as ∂ F f ( X ) or ∂f ( X ); and its proximity operator is denoted as prox Ff ( X ) or prox f ( X ).Now, we are ready to discuss in detail the proposed PARM algorithm in (7) and (8). To estimate low-rank patch matrices, the minimization of Φ j ( x k , Y j ), as a generalized rankminimization of the patch matrix Y j , is approximated via a reweighted scheme, as shown in (7).Since g is concave on [0 , ∞ ) and continuously differentiable, by the definition of the supergra-dient, we have g ( σ i ( Y j )) ≤ g ( σ i ( Y kj )) + ( w kj ) i ( σ i ( Y j ) − σ i ( Y kj )) , (9)where w kj = [( w kj ) , . . . , ( w kj ) m j ] (cid:62) and ( w kj ) i = g (cid:48) ( σ i ( Y kj )), i = 1 , , . . . , m j . Then we replace theterm g ( σ i ( Y j )) in Φ j ( x k , Y j ) by the right hand side of the inequality (9) and have its reweightedapproximation (cid:101) Φ j ( x k , Y j ) as follows (cid:101) Φ j ( x k , Y j ) = µ j (cid:107) Y j − R j ( x k ) (cid:107) F + λ j m j (cid:88) i =1 g ( σ i ( Y kj )) + ( w kj ) i ( σ i ( Y j ) − σ i ( Y kj )) . (10)9ence, the update of the low-rank patch matrix Y k +1 j in (7) at the ( k + 1)th step can be rewrittenas follows Y k +1 j ∈ argmin Y j µ j (cid:107) Y j − R j ( x k ) (cid:107) F + λ j m j (cid:88) i =1 ( w kj ) i σ i ( Y j ) + α jk (cid:107) Y j − Y kj (cid:107) F (11)= argmin Y j λ j m j (cid:88) i =1 ( w kj ) i σ i ( Y j ) + µ j + α jk (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) Y j − µ j R j ( x k ) + α jk Y kj µ j + α jk (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) F . By introducing the definition of the weighted nuclear norm of Y ∈ R m × n , m ≤ n , with theweight vector w = [ w , . . . , w m ] (cid:62) and w i ≥ i = 1 , . . . , m , as follows (cid:107) Y (cid:107) ∗ , w = m (cid:88) i =1 w i σ i ( Y ) , where σ ( Y ) ≥ σ ( Y ) ≥ · · · ≥ σ m ( Y ) ≥
0. It was proved in [7] that (cid:107) · (cid:107) ∗ , w is convex if and onlyif w ≥ w ≥ · · · ≥ w m ≥
0. In other words, for (cid:107) · (cid:107) ∗ , w being a convex function, the weights mustincrease with singular values. However, in order for large singular values to receive less penaltyto help reducing the bias and smaller singular values to receive heavier penalty to help promotingsparsity, the opposite order of the weight is desirable, i.e., 0 ≤ w ≤ w ≤ · · · ≤ w m . Under thisorder of the weights, the weighted nuclear norm is a nonconvex function and in general its proximityoperator may be a set-valued map. Fortunately, the proximity operator is a single-value map, asshown in the following lemma. Lemma 3.1 (see [7, Theorem 2.3])
For any λ > Y ∈ R m × n , m ≤ n and w = [ w , . . . , w m ] (cid:62) with 0 ≤ w ≤ w ≤ · · · ≤ w m , prox λ (cid:107)·(cid:107) ∗ , w ( Y ) = U S λ, w ( Σ ) V (cid:62) , where Y = U Σ V (cid:62) is the singular value decomposition (SVD) of Y and S λ, w ( Σ ) = diag { (Σ ii − λw i ) + } is the weighted singular value thresholding (WSVT) operator.The assumption that g is monotonically increasing and concave implies that g (cid:48) is nonnegativeand monotonically decreasing. Then the weight vector w kj satisfies the ascending constraint, thatis, 0 ≤ ( w kj ) ≤ · · · ≤ ( w kj ) m j . Hence, by Lemma 3.1, the low-rank patch matrix Y k +1 j can beuniquely achieved Y k +1 j = prox λjµj + αjk (cid:107)·(cid:107) ∗ , w kj (cid:32) µ j R j ( x k ) + α jk Y kj µ j + α jk (cid:33) , = 1 µ j + α jk prox λ j (cid:107)·(cid:107) ∗ , w kj (cid:16) µ j R j ( x k ) + α jk Y kj (cid:17) , = 1 µ j + α jk U k +1 j S λ j , w kj ( (cid:101) Σ kj )( V k +1 j ) (cid:62) , where U k +1 j (cid:101) Σ kj ( V k +1 j ) (cid:62) is the SVD of µ j R j ( x k ) + α jk Y kj . Remark 3.1
The ascending constraint on the weight vector w kj may not be automatically satis-fied, if g is not differentiable and ( w kj ) i is chosen as a supergradient of g at σ i ( Y kj ), i.e., − ( w kj ) i ∈ ( − g )( σ i ( Y kj )), as defined in [22]. For example, suppose that g is not differentiable at σ i ( Y kj ) andthen ∂ ( − g )( σ i ( Y kj )) contains more than one element. If σ i +1 ( Y kj ) = σ i ( Y kj ), then the weights − ( w kj ) i +1 and − ( w kj ) i that are selected from the same set ∂ ( − g )( σ i ( Y kj )) = ∂ ( − g )( σ i +1 ( Y kj ))may have ( w kj ) i +1 < ( w kj ) i rather than an ascending order. Thus, we have to carefully select the( w kj ) i in the case where g is not differentiable. For example, let ( w kj ) i = − min ∂ ( − g )( σ i ( Y kj )). After obtaining the estimates of the low-rank patch matrices Y k +1 j ’s from the generalized rankminimization in the previous step, we may have a situation where the same pixel may have sev-eral estimated values. That is because one pixel may belong to more than one nonlocal similarpatch matrices, when we group nonlocal similar patches by block matching. Thus, at this imagerestoration step in (8) of the PARM algorithm, we aggregate all the estimated patches to restorethe entire image by minimizing the proximal regularization of Φ( x , Y k +11 , . . . , Y k +1 J ) with respectto x .Note that the term (cid:80) Jj =1 µ j (cid:107) Y k +1 j − R j ( x ) (cid:107) F in Φ( x , Y k +11 , . . . , Y k +1 J ) can be written as J (cid:88) j =1 µ j (cid:107) Y k +1 j − R j ( x ) (cid:107) F = J (cid:88) j =1 µ j (cid:107) Y k +1 j − R j ( x k ) (cid:107) F + J (cid:88) j =1 µ j (cid:107) R j ( x ) − R j ( x k ) (cid:107) F − J (cid:88) j =1 µ j (cid:104) R j ( x ) − R j ( x k ) , Y k +1 j − R j ( x k ) (cid:105) F . Recall that R (cid:62) j : R m j × n j → R N is defined as R (cid:62) j ( Y ) = (cid:80) n j l =1 R (cid:62) jl y i , where y i is the i th vectorof Y . Since (cid:104) R j ( x ) , Y (cid:105) F = (cid:104) x , R (cid:62) j ( Y ) (cid:105) = (cid:104) x , W − R (cid:62) j ( Y ) (cid:105) W and W = (cid:80) Jj =1 µ j R (cid:62) j ◦ R j ∈ S N + ,then the right hand side of the above equality can be written as J (cid:88) j =1 µ j (cid:107) Y k +1 j − R j ( x ) (cid:107) F = J (cid:88) j =1 µ j (cid:107) Y k +1 j − R j ( x k ) (cid:107) F + 12 (cid:107) x − x k (cid:107) W − (cid:104) x − x k , J (cid:88) j =1 µ j W − R (cid:62) j ( Y k +1 j ) − x k (cid:105) W . The update of the estimated image x k +1 in (8) at the ( k + 1)th step can be rewritten as follows x k +1 ∈ argmin x τ f ( x ) − (cid:104) x − x k , J (cid:88) j =1 µ j W − R (cid:62) j ( Y k +1 j ) − x k (cid:105) W + β k + 12 (cid:107) x − x k (cid:107) W (12)= prox W τβk +1 f x k + 1 β k + 1 J (cid:88) j =1 µ j W − R (cid:62) j ( Y k +1 j ) − x k . The overall procedure of the PARM algorithm in (7) and (8) is summarized in Algorithm 1.
To remove multiplicative noise, we apply the PARM algorithm in Algorithm 1 to solve thenonlocal low-rank model (4) with f defined as (3) and g defined as (2). Accordingly, using the11 lgorithm 1 Proximal alternating reweighted minimization algorithm for model (4) Set parameters τ , µ j , λ j , α jk , and β k Set extraction R j by block matching Compute matrix W Initialize x , Y j , and w j Set k = 0 repeat for j from 1 to J do [ U k +1 j , (cid:101) Σ kj , V k +1 j ] = SVD (cid:16) µ j R j ( x k ) + α jk Y kj (cid:17) (cid:46) SVD Σ k +1 j = µ j + α jk S λ j , w kj ( (cid:101) Σ kj ) (cid:46) WSVT Y k +1 j = U k +1 j Σ k +1 j ( V k +1 j ) (cid:62) (cid:46) Update Y k +1 j ( w k +1 j ) i = g (cid:48) ((Σ k +1 j ) ii ) (cid:46) Update w k +1 j end for x k +1 ∈ prox W τβk +1 f (cid:16) x k + β k +1 (cid:16)(cid:80) Jj =1 µ j W − R (cid:62) j ( Y k +1 j ) − x k (cid:17)(cid:17) (cid:46) Update x k +1 k ← k + 1 until stopping criterion is satisfieddefinition of f and g , Algorithm 1 can be specifically implemented as follows. In line 11 of Algo-rithm 1, ( w k +1 j ) i = k +1 j ) ii + ε ; in line 13, the proximity operator of τβ k +1 f with respect to W canbe computed using Newton’s method. Given that ργ ≤ , the function f is strictly convex andhence prox W τβk +1 f is single-valued defined as x = prox W τβk +1 f ( (cid:101) x ) = argmin x f ( x ) + β k + 12 τ (cid:107) x − (cid:101) x (cid:107) W . Since f is differentiable with respect to the W -weighted (cid:96) norm with its gradient ∇ W f ( x ) = − v e x + ρ (cid:16) e x v − γ (cid:113) e x v (cid:17) , then x is the unique solution of the following equation ∇ W f ( x ) + β k + 1 τ ( x − (cid:101) x ) = 0 , and this equation can be efficiently solved by Newton’s method. The aim of this section is to analyze the convergence of the PARM algorithm for model (4). Theproof is motivated by the inexact descent convergence results for Kurdyka-(cid:32)Lojasiewicz functionsin [2, 4].In the sequel, we use the notation Z := ( x , Y , . . . , Y J ) and (cid:107) Z (cid:107) := (cid:118)(cid:117)(cid:117)(cid:116) (cid:107) x (cid:107) W + J (cid:88) j =1 (cid:107) Y j (cid:107) F , and we denote by Φ( Z ) the objective function in model (4).Here are three essential conditions to guarantee convergence of the sequence { Z k } k ∈ N generatedby the PARM algorithm. 12 H1) Sufficient descent condition:
There exists a positive constant c such that for ∀ k ∈ N , c (cid:107) Z k +1 − Z k (cid:107) ≤ Φ( Z k ) − Φ( Z k +1 ) . (H2) Relative error condition: There exists a positive constant c such that for ∀ k ∈ N , (cid:107) A k +1 (cid:107) ≤ c (cid:107) Z k +1 − Z k (cid:107) and A k +1 ∈ ∂ Φ( Z k +1 ) . (H3) Continuity condition: There exists a subsequence { Z k t } t ∈ N and Z ∗ such thatlim t →∞ Z k t = Z ∗ and lim t →∞ Φ( Z k t ) = Φ( Z ∗ ) . In the following, we prove that the sequence { Z k } k ∈ N satisfies Condition (H1)-(H3), and thenconclude that { Z k } k ∈ N converges to a critical point of Φ using the fact that Φ is a Kurdyka-(cid:32)Lojasiewicz function. We show that the objective function Φ in model (4) evaluated at Z k , denoted Φ( Z k ), decreasessufficiently as k increases. Proposition 4.1 (Sufficient descent condition)
Suppose that the objective function Φ inmodel (4) satisfies Assumption (A1)-(A3). Let { Z k } k ∈ N be the sequence generated by the PARMalgorithm provided that the parameters satisfy Assumption (A4). Then { Φ( Z k ) } k ∈ N is strictlydecreasing and, in particular, there exists a positive constant c such that for ∀ k ∈ N , c (cid:107) Z k +1 − Z k (cid:107) ≤ Φ( Z k ) − Φ( Z k +1 ) . (13) Proof.
Let Φ j ( x , Y ) be defined as (6) and let (cid:101) Φ j ( x k , Y j ) be defined as (10). Then, accordingthe concavity of g illustrated in inequality (9), Φ j ( x k , Y k +1 j ) and its reweighted approximation (cid:101) Φ j ( x k , Y j ) have the following relationshipΦ j ( x k , Y k +1 j ) ≤ (cid:101) Φ j ( x k , Y k +1 j ) and Φ j ( x k , Y kj ) = (cid:101) Φ j ( x k , Y kj ) . Thus, the objective function Φ in (5) evaluated at x k and Y k +1 j ’s can be rewritten asΦ( x k , Y k +11 , . . . , Y k +1 J ) = τ f ( x k ) + J (cid:88) j =1 Φ j ( x k , Y k +1 j ) ≤ τ f ( x k ) + J (cid:88) j =1 (cid:101) Φ j ( x k , Y k +1 j ) . By the update of Y k +1 j in (7), we have (cid:101) Φ j ( x k , Y k +1 j ) + α jk (cid:107) Y k +1 j − Y kj (cid:107) F ≤ (cid:101) Φ j ( x k , Y kj ) = Φ j ( x k , Y kj ) . Combining the two inequalities above, we have the following inequality onΦ( x k , Y k +11 , . . . , Y k +1 J ) and Φ( Z k )Φ( x k , Y k +11 , . . . , Y k +1 J ) ≤ τ f ( x k ) + J (cid:88) j =1 Φ j ( x k , Y kj ) − J (cid:88) j =1 α jk (cid:107) Y k +1 j − Y kj (cid:107) F , = Φ( Z k ) − J (cid:88) j =1 α jk (cid:107) Y k +1 j − Y kj (cid:107) F .
13y the update of x k +1 in (8), we haveΦ( Z k +1 ) ≤ Φ( x k , Y k +11 , . . . , Y k +1 J ) − β k (cid:107) x k +1 − x k (cid:107) W . Combining the two inequalities above, we have that Φ( Z k +1 ) and Φ( Z k ) satisfy the followinginequality β k (cid:107) x k +1 − x k (cid:107) W + J (cid:88) j =1 α jk (cid:107) Y k +1 j − Y kj (cid:107) F ≤ Φ( Z k ) − Φ( Z k +1 ) . Equation (13) holds with c = min { β − , α − } > { Φ( Z k ) } k ∈ N is strictly decreasing. Here, β − and α − are two positive parameters given in Assumption (A4).The sufficient descent condition proved in Proposition 4.1 immediately yields the followingcorollary. Corollary 4.1
Suppose that the objective function Φ in model (4) satisfies Assumption (A1)-(A3). Let { Z k } k ∈ N be the sequence generated by the PARM algorithm provided that the parameterssatisfy Assumption (A4). Then lim k →∞ (cid:107) Z k − Z k +1 (cid:107) = 0 . Proof.
Summing inequality (13) from k = 0 to k = K −
1, we have c K − (cid:88) k =0 (cid:107) Z k +1 − Z k (cid:107) ≤ Φ( Z ) − Φ( Z K ) ≤ Φ( Z ) − Φ inf , where Φ inf = inf Z Φ( Z ) > −∞ .Taking K → ∞ , we have ∞ (cid:88) k =0 (cid:107) Z k +1 − Z k (cid:107) < ∞ , which implies lim k →∞ (cid:107) Z k +1 − Z k (cid:107) = 0. Before proving that a subgradient of Φ at Z k +1 is upper bounded by the iterates gap, we firstcharacterize the subdifferential of Φ.Recall that the variable x is measured in terms of the W -weight (cid:96) norm and that the vari-ables Y j ’s are measured in terms of the Frobenius norm. Then using the notations introduced insubsection 3.1 we define the subdifferential of Φ by ∂ Φ( Z ) = (cid:8) ( A x , A Y , . . . , A Y J ) : A x ∈ ∂ Wx Φ( Z ) , A Y j ∈ ∂ Y j Φ( Z ) , j = 1 , . . . , J (cid:9) , where ∂ Wx Φ( Z ) is the partial subdifferential of Φ with respect to the variable x and with respect tothe W -weight (cid:96) norm and ∂ Y j Φ( Z ) is the partial subdifferential of Φ with respect to the variable Y j and with respect to the Frobenius norm.By the definition of Φ in model (4) and the fact that J (cid:88) j =1 µ j (cid:107) Y j − R j ( x ) (cid:107) F = 12 (cid:104) x , x − J (cid:88) j =1 µ j W − R (cid:62) j ( Y j ) (cid:105) W + J (cid:88) j =1 µ j (cid:107) Y j (cid:107) F ,
14e have ∂ Wx Φ( Z ) = τ ∂ W f ( x ) + x − J (cid:88) j =1 µ j W − R (cid:62) j ( Y j )and ∂ Y j Φ( Z ) = µ j ( Y j − R j ( x )) + λ j ∂ (cid:32) m j (cid:88) i =1 g ◦ σ i (cid:33) ( Y j ) . To compute the subdifferential of the singular value function (cid:80) m j i =1 g ◦ σ i and further characterize ∂ Y j Φ( Z ), we introduce some definitions and a lemma on singular value functions in [19, 20]. Definition 4.1
A function f : R n → R is absolutely symmetric if f ( x , x , . . . , x n ) = f ( | x π (1) | , | x π (2) | , . . . , | x π ( n ) | ) , for any permutation π . Definition 4.2
A function F : R m × n → R , m ≤ n , is a singular value function if F ( X ) =( f ◦ σ )( X ), where f : R m → R is an absolutely symmetric function, σ ( X ) = [ σ ( X ) , . . . , σ m ( X )] (cid:62) and σ i ( X ) is the i th largest singular value of X .The function (cid:80) mi =1 g ◦ σ i can be viewed as a singular value function of the form (cid:32) m (cid:88) i =1 g ◦ σ i (cid:33) ( Y ) = ( (cid:101) g ◦ σ )( Y ) , where (cid:101) g : R m → R is defined as (cid:101) g ( t ) = (cid:80) mi =1 g ( | t i | ) and is absolutely symmetric. Lemma 4.1
The subdifferential of a singular value function f ◦ σ at X ∈ R m × n is given by theformula ∂ ( f ◦ σ )( X ) = (cid:110) U diag( d ) V (cid:62) : d ∈ ∂f ( σ ( X )) , ( U , V ) ∈ M ( X ) (cid:111) , where M ( X ) = (cid:8) ( U , V ) ∈ R m × l × R n × l : U (cid:62) U = V (cid:62) V = I , X = U diag( σ ( X )) V (cid:62) (cid:9) .By Lemma 4.1, the subdifferential of (cid:80) mi =1 g ◦ σ i at Y ∈ R m × n can be computed as follows ∂ (cid:32) m (cid:88) i =1 g ◦ σ i (cid:33) ( Y ) = { U diag( d ) V (cid:62) : d i = c i g (cid:48) [ σ i ( Y )] , c i ∈ ∂ | · | ( σ i ( Y )) ,i = 1 , . . . , m, ( U , V ) ∈ M ( Y ) } , where ∂ | · | ( σ i ( Y )) = (cid:40) { } , if σ i ( Y ) > − , , if σ i ( Y ) = 0 . Next, we are ready to derive a subgradient of Φ at Z k +1 using the lemma below and to provethat it is upper bounded. Lemma 4.2
Suppose that the objective function Φ in model (4) satisfies Assumption (A1)-(A3).Let { Z k } k ∈ N be the sequence generated by the PARM algorithm provided that the parameters15atisfy Assumption (A4). Let U k +1 j Σ k +1 j ( V k +1 j ) (cid:62) be the SVD of Y k +1 j . Then, for each k and each j , there exists c k +1 j ∈ R m j such that( c k +1 j ) i ∈ ∂ | · | ( σ i ( Y k +1 j )) , i = 1 , . . . , m j , (14)and λ j U k +1 j diag( d k +1 j )( V k +1 j ) (cid:62) = − α jk ( Y k +1 j − Y kj ) − µ j ( Y k +1 j − R j ( x k )) , (15)where diag( d k +1 j ) = diag( c k +1 j ) diag( w kj ). Proof.
According to the update of Y k +1 j in (11), we have0 ∈ µ j ( Y k +1 j − R j ( x k )) + λ j ∂ (cid:107) · (cid:107) ∗ , w kj ( Y k +1 j ) + α jk ( Y k +1 j − Y kj ) . Since the weighted nuclear norm (cid:107) · (cid:107) ∗ , w is a singular value function, then by Lemma 4.1 thesubdifferential of (cid:107) · (cid:107) ∗ , w can be computed as follows ∂ (cid:107) · (cid:107) ∗ , w ( Y ) = { U diag( d ) V (cid:62) : d i = c i w i , c i ∈ ∂ | · | ( σ i ( Y )) , i = 1 , . . . , m, ( U , V ) ∈ M ( Y ) } . Note that ( U k +1 j , V k +1 j ) ∈ M ( Y k +1 j ). Thus, there exists c k +1 j ∈ R m j such that (14) holds and − α jk ( Y k +1 j − Y kj ) − µ j ( Y k +1 j − R j ( x k )) = λ j U k +1 j diag( d k +1 j )( V k +1 j ) (cid:62) ∈ λ j ∂ (cid:107) · (cid:107) ∗ , w kj ( Y k +1 j ) , where diag( d k +1 j ) = diag( c k +1 j ) diag( w kj ). Proposition 4.2 (Relative error condition)
Suppose that the objective function Φ in model(4) satisfies Assumption (A1)-(A3). Let { Z k } k ∈ N be the sequence generated by the PARM algo-rithm provided that the parameters satisfy Assumption (A4). Let U k +1 j Σ k +1 j ( V k +1 j ) (cid:62) be the SVDof Y k +1 j and let c k +1 j and d k +1 j be in R m j satisfying (14) and (15).Define A k +1 = ( A k +1 x , A k +1 Y , . . . , A k +1 Y J ), where A k +1 x = β k ( x k +1 − x k ) (16)and A k +1 Y j = µ j ( R j ( x k ) − R j ( x k +1 )) + λ j U k +1 j diag( (cid:101) d k +1 j − d k +1 j )( V k +1 j ) (cid:62) − α jk ( Y k +1 j − Y kj ) , (17)where diag( (cid:101) d k +1 j ) = diag( c k +1 j ) diag( w k +1 j ).Then the following assertions hold for ∀ k ∈ N ,(a) A k +1 ∈ ∂ Φ( Z k +1 );(b) (cid:107) A k +1 (cid:107) ≤ c (cid:107) Z k +1 − Z k (cid:107) , for some c > Proof. (a) According to the update of x k +1 in (12), we have0 ∈ τ ∂ W f ( x k +1 ) + x k − J (cid:88) j =1 µ j W − R (cid:62) j ( Y k +1 j ) + ( β k + 1)( x k +1 − x k ) . A k +1 x in (16) implies A k +1 x ∈ τ ∂ W f ( x k +1 ) + x k +1 − J (cid:88) j =1 µ j W − R (cid:62) j ( Y k +1 j ) = ∂ Wx Φ( Z k +1 ) . Also, for each j , the definition of A k +1 Y j in (17) and Lemma 4.2 imply A k +1 Y j = µ j ( Y k +1 j − R j ( x k +1 )) + λ j U k +1 j diag( (cid:101) d k +1 j )( V k +1 j ) (cid:62) ∈ µ j ( Y k +1 j − R j ( x k +1 )) + λ j ∂ (cid:107) · (cid:107) ∗ , w k +1 j ( Y k +1 j )= µ j ( Y k +1 j − R j ( x k +1 )) + λ j ∂ (cid:32) m j (cid:88) i =1 g ◦ σ i (cid:33) ( Y k +1 j )= ∂ Y j Φ( Z k +1 ) . (b) It follows from the Cauchy-Schwarz inequality that (cid:107) A k +1 (cid:107) ≤ (cid:107) A k +1 x (cid:107) W + J (cid:88) j =1 (cid:107) A k +1 Y j (cid:107) F , where (cid:107) A k +1 x (cid:107) W = β k (cid:107) x k +1 − x k (cid:107) W and (cid:107) A k +1 Y j (cid:107) F ≤ µ j (cid:107) R j ( x k ) − R j ( x k +1 ) (cid:107) F + λ j (cid:107) U k +1 j diag( (cid:101) d k +1 j − d k +1 j )( V k +1 j ) (cid:62) (cid:107) F + α jk (cid:107) Y k +1 j − Y kj (cid:107) F . The right hand side of the above inequality can be computed term by term as follows. The squareof the first term is bounded above by the square of the weighted iterates of the variable x , µ j (cid:107) R j ( x k ) − R j ( x k +1 ) (cid:107) F ≤ µ j J (cid:88) j =1 µ j (cid:107) R j ( x k − x k +1 ) (cid:107) F = µ j (cid:107) x k − x k +1 (cid:107) W . This implies that µ j (cid:107) R j ( x k ) − R j ( x k +1 ) (cid:107) F ≤ √ µ j (cid:107) x k − x k +1 (cid:107) W .Also, the second term is bounded above by the iterates of the variable Y j . Since (cid:107) (cid:101) d k +1 j − d k +1 j (cid:107) ≤(cid:107) (cid:101) d k +1 j − d k +1 j (cid:107) , then we have λ j (cid:107) U k +1 j diag( (cid:101) d k +1 j − d k +1 j )( V k +1 j ) (cid:62) (cid:107) F = λ j (cid:107) (cid:101) d k +1 j − d k +1 j (cid:107) ≤ λ j m j (cid:88) i =1 (cid:12)(cid:12)(cid:12) ( c k +1 j ) i (cid:12)(cid:12)(cid:12) (cid:12)(cid:12)(cid:12) ( w k +1 j ) i − ( w kj ) i (cid:12)(cid:12)(cid:12) ≤ λ j m j (cid:88) i =1 (cid:12)(cid:12)(cid:12) g (cid:48) ( σ i ( Y k +1 j )) − g (cid:48) ( σ i ( Y kj )) (cid:12)(cid:12)(cid:12) . 
Using the condition that g (cid:48) is L g -Lipschitz continuous, we further obtain λ j (cid:107) U k +1 j diag( (cid:101) d k +1 j − d k +1 j )( V k +1 j ) (cid:62) (cid:107) F ≤ λ j m j (cid:88) i =1 L g | σ i ( Y k +1 j ) − σ i ( Y kj ) |≤ λ j m j L g (cid:107) Y k +1 j − Y kj (cid:107) F , (cid:107) Y k +1 j − Y kj (cid:107) ≤ (cid:107) Y k +1 j − Y kj (cid:107) F .Therefore, combining all the inequalities above, we obtain (cid:107) A k +1 (cid:107) ≤ ( β k + M µ ) (cid:107) x k +1 − x k (cid:107) W + J (cid:88) j =1 ( λ j m j L g + α jk ) (cid:107) Y k +1 j − Y kj (cid:107) F ≤ c (cid:107) Z k +1 j − Z kj (cid:107) , where M µ = (cid:80) Jj =1 √ µ j and c = max { β + + M µ , λ m L g + α + , . . . , λ J m J L g + α + } .The relative error condition proved in Proposition 4.2 immediately yields the following corollary. Corollary 4.2
Suppose that the objective function Φ in model (4) satisfies Assumption (A1)-(A3). Let { Z k } k ∈ N be the sequence generated by the PARM algorithm provided that the parameterssatisfy Assumption (A4). Define A k +1 = ( A k +1 x , A k +1 Y , . . . , A k +1 Y J ), where A k +1 x is defined as (16)and A k +1 Y j is defined as (17), j = 1 , . . . , J . Thenlim k →∞ (cid:107) A k +1 (cid:107) = 0 . Proof.
The result is immediately followed by Proposition 4.2 and Corollary 4.1.
We first show the existence of a limit point of { Z k } k ∈ N using the boundedness of { Z k } k ∈ N ,and then prove a continuity condition for any convergent subsequence of { Z k } k ∈ N , which impliesCondition (H3). Proposition 4.3
Suppose that the objective function Φ in model (4) satisfies Assumption (A1)-(A3). Let { Z k } k ∈ N be the sequence generated by the PARM algorithm provided that the parameterssatisfy Assumption (A4). Let S denote the set of all limit points of the sequence { Z k } k ∈ N . Thenthe following assertions hold.(a) S (cid:54) = ∅ ;(b) If { Z k t } t ∈ N is a subsequence of { Z k } k ∈ N such that lim t →∞ Z k t = Z ∗ ∈ S , thenlim t →∞ Φ( Z k t ) = Φ( Z ∗ ) . Proof. (a) We show that { Z k } k ∈ N is bounded by contradiction.Assume for the sake of contradiction that there exists a subsequence { Z k l } l ∈ N such that (cid:107) Z k l (cid:107) → ∞ as l → ∞ . According to Assumption (A3), Φ is coercive, and then Φ( Z k l ) → ∞ as l → ∞ . However, since { Φ( Z k ) } k ∈ N is strictly decreasing and lower bounded by Φ inf > −∞ ,then { Φ( Z k ) } k ∈ N converges and { Φ( Z k l ) } l ∈ N also converges, which yields a contradiction. Thus, { Z k } k ∈ N is bounded and there exists a convergent subsequence of { Z k } k ∈ N .(b) Let { Z k t } t ∈ N be a subsequence such that Z k t → Z ∗ as t → ∞ .Since f is lower semicontinuous, then we havelim inf t →∞ τ f ( x k t +1 ) ≥ τ f ( x ∗ ) . x k t +1 referring to (12), we obtain the following inequality τ f ( x k t +1 ) + (cid:104) x k t +1 − x k t , x k t − J (cid:88) j =1 µ j W − R (cid:62) j ( Y k t +1 j ) (cid:105) W + β k t + 12 (cid:107) x k t +1 − x k t (cid:107) W ≤ τ f ( x ∗ ) + (cid:104) x ∗ − x k t , x k t − J (cid:88) j =1 µ j W − R (cid:62) j ( Y k t +1 j ) (cid:105) W + β k t + 12 (cid:107) x ∗ − x k t (cid:107) W . Letting t → ∞ on both sides of the above inequality, we getlim sup t →∞ τ f ( x k t +1 ) ≤ τ f ( x ∗ ) + lim sup t →∞ (cid:104) x ∗ − x k t , x k t − J (cid:88) j =1 µ j W − R (cid:62) j ( Y k t +1 j ) (cid:105) W + β k t + 12 (cid:107) x ∗ − x k t (cid:107) W = τ f ( x ∗ ) , where we use the boundedness of the sequences { x k t +1 } t ∈ N , { Y k t +1 j } t ∈ N and { β k t } t ∈ N and the resultthat lim t →∞ (cid:107) x k t +1 − x k t (cid:107) W → t →∞ τ f ( x k t +1 ) = f ( x ∗ ).Due to the continuity of µ j (cid:107) Y j − R j ( x ) (cid:107) F with respect to Y j and x and the continuity of g ( t )with respect to t , we havelim t →∞ Φ( Z k t +1 ) = τ f ( x ∗ ) + J (cid:88) j =1 (cid:32) µ j (cid:107) Y ∗ j − R j ( x ∗ ) (cid:107) F + λ j m j (cid:88) i =1 g ( σ i ( Y ∗ j )) (cid:33) = Φ( Z ∗ ) . In this subsection, we show the convergence of the sequence { Z k } k ∈ N generated by the PARMalgorithm.Let us first review a definition and a theorem on the Kurdyka-(cid:32)Lojasiewicz (KL) property of afunction in [1, 2]. Definition 4.3 (Kurdyka-(cid:32)Lojasiewicz)
Let f : R d → ( −∞ , + ∞ ] be proper and lower semi-continuous.(a) The function f is called to have the Kurdyka-(cid:32)Lojasiewicz (KL) property at ˜ x ∈ dom ∂f ifthere exist η ∈ (0 , + ∞ ], a neighborhood U of ˜ x and a continuous function ϕ : [0 , η ) → [0 , ∞ )such that(i) ϕ (0) = 0;(ii) ϕ is C on (0 , η ) and continuous at 0;(iii) for all s ∈ (0 , η ), ϕ (cid:48) ( s ) > x ∈ U ∩ { x ∈ R d : f ( ˜ x ) < f ( x ) < f ( ˜ x ) + η } , the following Kurdyka-(cid:32)Lojasiewiczinequality holds ϕ (cid:48) ( f ( x ) − f ( ˜ x ))dist( , ∂f ( x )) ≥ . f is called a KL function if f has the KL property at each point of dom ∂f . Theorem 4.1 (see [2, Theorem 2.9])
Let f : R d → ( −∞ , + ∞ ] be a proper lower semicon-tinuous function. Consider a sequence { x k } k ∈ N that satisfies Condition (H1)-(H3). If f has theKurdyka-(cid:32)Lojasiewicz property at the limit point x ∗ specified in (H3), then the sequence { x k } k ∈ N converges to x ∗ as k goes to ∞ , and x ∗ is a critical point of f . Moreover, the sequence { x k } k ∈ N has a finite length, i.e., ∞ (cid:88) k =0 (cid:107) x k +1 − x k (cid:107) < ∞ . The KL theory is a powerful tool for nonconvex nonsmooth optimization problems and KLfunctions are ubiquitous. For example, for multiplicative noise removal, the objective function Φ inmodel (4) with f defined as (3) and g as (2) is a KL function. For more examples of KL functionssee [1, 2].Next, equipped with Condition (H1)-(H3) discussed in the previous subsections, we can showthat any limit point of { Z k } k ∈ N is a critical point of Φ in the following theorem. Theorem 4.2
Suppose that the objective function Φ in model (4) satisfies Assumption (A1)-(A3).Let { Z k } k ∈ N be the sequence generated by the PARM algorithm provided that the parameterssatisfy Assumption (A4). Let S denote the set of all limit points of the sequence { Z k } k ∈ N and letcrit(Φ) denote the set of all critical points of the function Φ. Then ∅ (cid:54) = S ⊆ crit(Φ), that is, anylimit point of { Z k } k ∈ N is a critical point of Φ. Proof.
Let Z ∗ be in S (cid:54) = ∅ and let { Z k t } t ∈ N be a subsequence of { Z k } k ∈ N such that lim t →∞ Z k t = Z ∗ . Then by Proposition 4.3, lim t →∞ Φ( Z k t ) = Φ( Z ∗ ). Also, it follows from Proposition 4.2 andCorollary 4.2 that A k t ∈ ∂ Φ( Z k t ) and A k t → as t → ∞ . Thus, by the definition of subdifferentialin Definition 3.1, we have ∈ ∂ Φ( Z ∗ ).In addition to Condition (H1)-(H3), if Φ is a Kurdyka-(cid:32)Lojasiewicz (KL) function, then a strongerconvergence result can be achieved for the sequence { Z k } k ∈ N . That is, we can prove that thesequence { Z k } k ∈ N itself converges a critical point of Φ using the KL theory. Theorem 4.3
Suppose that the objective function Φ in model (4) satisfies Assumption (A1)-(A3).Let { Z k } k ∈ N be the sequence generated by the PARM algorithm provided that the parameterssatisfy Assumption (A4). If Φ is a KL function, then the following assertions hold.(a) The sequence { Z k } k ∈ N has finite length, that is, ∞ (cid:88) k =0 (cid:107) Z k +1 − Z k (cid:107) < ∞ ;(b) The sequence { Z k } k ∈ N converges to a critical point of Φ. Proof.
It follows from Proposition 4.1, Proposition 4.2 and Proposition 4.3 that the sequence { Z k } k ∈ N satisfies Condition (H1)-(H3), respectively. Then the results (a) and (b) immediatelyfollow from Theorem 4.1. 20 Numerical Results
In this section, we first describe a practical version of Algorithm 1 and then test the proposedalgorithms to solve the proposed nonlocal low-rank model for multiplicative noise removal. Wecompare our proposed method with six existing methods: the DZ method [10], the HNW method[16], the I-DIV method [29], the TwL-mV method [18], the learned dictionary (Dict) method [15]and the SAR-BM3D method [25]. Numerical results show superior performance of the proposedmethod over the existing ones.The experiments were implemented in Matlab 2016b running a 64 bit Ubuntu 18.04 systemand executed on an eight-core Intel Xeon E5-2640v3 128GB CPU at 2.6 GHz, with four NVIDIATesla P100 16GB GPUs. The proposed algorithms were accelerated using graphics processing units(GPUs), as the estimation of each patch matrix can be computed in parallel.
5.1 A practical version of the PARM algorithm

The PARM algorithm presented in Algorithm 1 converges theoretically, as shown in Section 4, if the patch extraction operator R_j is assumed to be fixed. The extraction operator R_j plays an important role in the denoising performance because a better initialization of R_j can yield a better denoised image. When the optimal R_j is not available, it is empirically challenging to find an appropriate choice of R_j with only a noisy image given.
Algorithm 2 Practical version of the PARM algorithm

    Set parameters τ, µ_j, λ_j, and β_k
    Initialize x^0 and w_j^0
    for k from 0 to K − 1 do
        Set extraction R̂_j by block matching
        Compute matrix Ŵ
        for j from 1 to J do
            [U_j^{k+1}, Σ̃_j^k, V_j^{k+1}] = SVD( R̂_j(x^k) )                        ▷ SVD
            Σ_j^{k+1} = S_{λ_j/µ_j, w_j^k}( Σ̃_j^k )                                 ▷ WSVT
            Y_j^{k+1} = U_j^{k+1} Σ_j^{k+1} (V_j^{k+1})^⊤                            ▷ Update Y_j^{k+1}
            (w_j^{k+1})_i = g′( (Σ_j^{k+1})_{ii} )                                    ▷ Update w_j^{k+1}
        end for
        x^{k+1} ∈ prox^{Ŵ}_{τβ_{k+1} f}( x^k + β_{k+1}( ∑_{j=1}^J µ_j Ŵ^{−1} R̂_j^⊤(Y_j^{k+1}) − x^k ) )   ▷ Update x^{k+1}
    end for

Here, we provide a practical version of the PARM algorithm with a dynamically updated patch extraction operator, denoted by R̂_j. The operator R̂_j is recomputed at each step by block matching based on the update of the estimated image e^{x^k}, and the weighted counts matrix, now denoted by Ŵ, is recomputed based on the newest R̂_j. As a result of this dynamic updating scheme for R̂_j, the patch matrices Y_j^{k+1} and Y_j^k may not refer to the same patch group: Y_j^{k+1} is associated with R̂_j(x^k), while Y_j^k is associated with R̂_j(x^{k−1}) using a different extraction operator R̂_j. Hence, in this practical version of the PARM algorithm, Y_j^{k+1} is updated without using the previous update Y_j^k and its parameter α_k^j. The overall procedure of the practical version of the PARM algorithm for multiplicative noise removal is summarized in Algorithm 2.

5.2 Parameter settings

First, we utilize block matching and normalization with mean zero to extract patch matrices, using the following parameter settings for block matching. In Algorithm 1, the fixed extraction R_j is initialized via block matching based on the estimated image from the SAR-BM3D method; in Algorithm 2, the dynamically updated extraction R̂_j is computed at each step via block matching based on the update of e^{x^k}. Apart from this, both algorithms share the same parameter settings for block matching, including the search window, the patch size and the number of patches in each patch group, as presented in Table 2; a sketch of these two steps follows the table.

Table 2: Settings for block matching, listing for each noise level L the search window, the patch size and the patch number.
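To make the block-matching step just described and the WSVT step of Algorithm 2 concrete, the following NumPy sketch collects, for a reference patch, the most similar patches inside a search window and then applies one weighted singular value thresholding step. All names and parameter values are illustrative assumptions, and the surrogate derivative g′ is exemplified with the log-type choice g(σ) = log(ε + σ), so that g′(σ) = 1/(ε + σ); this is one common reweighting choice, not necessarily the paper's exact surrogate g.

    import numpy as np

    def block_match(img, top_left, patch=8, window=20, n_patches=50):
        # Stack, as columns, the n_patches vectorized patches inside a search
        # window that are closest (in Euclidean distance) to the reference
        # patch at top_left; this plays the role of one extraction R_j.
        i0, j0 = top_left
        ref = img[i0:i0 + patch, j0:j0 + patch].ravel()
        cands = []
        for i in range(max(0, i0 - window), min(img.shape[0] - patch, i0 + window) + 1):
            for j in range(max(0, j0 - window), min(img.shape[1] - patch, j0 + window) + 1):
                p = img[i:i + patch, j:j + patch].ravel()
                cands.append((float(np.sum((p - ref) ** 2)), (i, j), p))
        cands.sort(key=lambda c: c[0])
        return np.stack([c[2] for c in cands[:n_patches]], axis=1)

    def wsvt_step(Y, lam_over_mu, w, eps=1e-2):
        # One WSVT step: shrink each singular value by lam_over_mu * w_i,
        # rebuild the low-rank estimate, and recompute the weights with the
        # illustrative surrogate g(s) = log(eps + s), i.e. g'(s) = 1/(eps + s).
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)
        s_new = np.maximum(s - lam_over_mu * w, 0.0)
        w_new = 1.0 / (eps + s_new)
        return (U * s_new) @ Vt, w_new

Initializing w as a vector of ones makes the first pass a plain singular value thresholding; subsequent passes shrink small singular values more aggressively, which is the reweighting idea behind the algorithm.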
Second, we set the model parameters and the algorithm parameters for Algorithm 1 and Algorithm 2, respectively. The model parameters τ, λ_j's, µ_j's, ρ, γ and ε are adaptive to the noise level. The algorithm parameters (α_k^j's and) β_k's influence the computational speed. The settings of the above parameters are presented in Table 3 and Table 4.

Table 3: Parameter settings for Algorithm 1.

         Standard images       Remote images         Common parameters
    L    τ          λ_j        τ          λ_j        µ_j    ρ      γ      ε      α_k^j    β_k
    1    β_k/50     1.8        β_k/100    1          1      0.01   4      10^-
    3    β_k/150    1          β_k/150    0.45       1      1.5    1.9    10^-
    5    β_k/250    0.6        β_k/200    0.15       1      2      1.3    10^-

Table 4: Parameter settings for Algorithm 2.

         Standard images       Remote images         Common parameters
    L    τ          λ_j        τ          λ_j        µ_j    ρ      γ      ε      β_k
    1    β_k/50     2.6        β_k/50     2.6        1      0.01   4      10^-
    3    β_k/150    1.3        β_k/150    1.2        1      1.5    1.9    10^-
    5    β_k/250    0.8        β_k/250    0.7        1      2      1.3    10^-

Third, the initialization settings and the stopping criteria are set differently for Algorithm 1 and Algorithm 2. Algorithm 1 is initialized using the estimated image from the SAR-BM3D method and is terminated when the relative error reaches a tolerance threshold, namely

    ‖x^{k+1} − x^k‖_W / ‖x^k‖_W < max{ 10^{−}, ( ‖x^1 − x^0‖_W / ‖x^0‖_W ) × 10^{−} }.

Algorithm 2 is initialized using the given noisy image and is terminated empirically based on the number of iterations K. For L = 1, 3, 5, K is set to 65-70, 23-25 and 18-20, respectively.

Lastly, the restored image is estimated by û = e^{x̂}, where x̂ is the log-transformed image obtained from the proposed algorithms.

5.3 Numerical results tested on standard test images

In this experiment, we use standard test images "Monarch", "Lena" and "House", all of size 256 × 256, as shown in Figure 1. To generate the observed images, we degrade the original test images by multiplicative Gamma noise at L = 1, L = 3 and L = 5.

Figure 1: Standard test images. (a) Monarch; (b) Lena; (c) House.

The image quality is evaluated in the intensity format between the original image u ∈ R^N and the estimated image û ∈ R^N, using the peak-signal-to-noise ratio (PSNR), defined as

    PSNR = 10 log_{10}( N / ‖u − û‖_2^2 ),

and the structural similarity index measure (SSIM) [35].
Table 5: Numerical results tested on standard test images at different noise levels by different methods (columns: Image, L, Meas., Alg 1, Alg 2, SAR-BM3D, DZ, HNW, I-DIV, TwL-4V, Dict).
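For reference, a minimal NumPy sketch of the data generation and evaluation just described (function names are ours): multiplicative Gamma noise with L looks is drawn with shape L and scale 1/L, giving a noise field of mean 1 and variance 1/L, and the PSNR follows the definition above, assuming intensities normalized to peak value 1.

    import numpy as np

    def add_gamma_speckle(u, L, seed=0):
        # L-look multiplicative Gamma speckle: shape L, scale 1/L,
        # so the noise field has mean 1 and variance 1/L.
        rng = np.random.default_rng(seed)
        return u * rng.gamma(shape=L, scale=1.0 / L, size=u.shape)

    def psnr(u, u_hat):
        # PSNR as defined above, for intensities with peak value 1.
        return 10.0 * np.log10(u.size / np.sum((u - u_hat) ** 2))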
Table 5 reports the PSNR and SSIM values of the denoised images tested on the three standard test images. The best results for each case are marked in bold and the second-best results are underlined.

Figure 2: Comparison of denoised images restored from "Monarch" at noise level L = 1 by different methods: (a) noisy image (L = 1); (b) ground truth. The (PSNR, SSIM) values for each denoised image: (c) Alg 1 (21.94dB, 0.6926); (d) Alg 2 (21.55dB, 0.6966); (e) SAR-BM3D (21.36dB, 0.6404); (f) DZ (19.38dB, 0.5758); (g) HNW (19.73dB, 0.5523); (h) I-DIV (19.91dB, 0.5883); (i) TwL-4V (19.26dB, 0.5848); (j) Dictionary (19.50dB, 0.5726).

Figure 3: Comparison of denoised images restored from "Lena" at noise level L = 3 by different methods: (a) noisy image (L = 3); (b) ground truth. The (PSNR, SSIM) values for each denoised image: (c) Alg 1 (26.44dB, 0.7892); (d) Alg 2 (26.29dB, 0.7944); (e) SAR-BM3D (26.00dB, 0.7596); (f) DZ (24.06dB, 0.6907); (g) HNW (24.36dB, 0.6911); (h) I-DIV (24.48dB, 0.7073); (i) TwL-4V (24.29dB, 0.7128); (j) Dictionary (24.81dB, 0.7379).

Figure 4: Comparison of denoised images restored from "House" at noise level L = 5 by different methods: (a) noisy image (L = 5); (b) ground truth. The (PSNR, SSIM) values for each denoised image: (c) Alg 1 (29.04dB, 0.8115); (d) Alg 2 (29.12dB, 0.8163); (e) SAR-BM3D (28.36dB, 0.7641); (f) DZ (25.70dB, 0.7339); (g) HNW (25.73dB, 0.6995); (h) I-DIV (25.84dB, 0.7291); (i) TwL-4V (25.79dB, 0.7197); (j) Dictionary (24.56dB, 0.6474).

Both Algorithm 1 and Algorithm 2 outperform all the other methods in terms of PSNR and SSIM values. Compared with the benchmark SAR-BM3D method, Algorithm 1 achieves improvements in PSNR of 0.54-0.59dB, 0.42-0.66dB and 0.46-0.68dB when L = 1, L = 3 and L = 5, respectively. Algorithm 2, with updated patch extraction, also surpasses the SAR-BM3D method, and it even surpasses Algorithm 1 in some cases, especially in terms of SSIM values.

Figures 2-4 present the denoised images tested on "Monarch" at noise level L = 1, "Lena" at L = 3 and "House" at L = 5. In terms of visual quality, Algorithm 1 and Algorithm 2 perform better than the other methods, as they reconstruct more details and smoother textures with less noise and fewer artifacts. For example, compared to the DZ method, the HNW method, the I-DIV method, the TwL-mV method and the learned dictionary method, the proposed methods preserve more details of the hair of "Lena" and generate smoother textures on the wings of "Monarch" and the sky of "House". Compared to the benchmark SAR-BM3D method, the proposed methods generate fewer artifacts, resulting in better images in terms of PSNR and SSIM values.

5.4 Numerical results tested on remote sensing images

In this experiment, we use remote sensing images "Remote 1", "Remote 2" and "Remote 3", as shown in Figure 5. To generate the observed images, we degrade the original test images by multiplicative Gamma noise at L = 1, L = 3 and L = 5. The image quality is evaluated using PSNR and SSIM values.

Figure 5: Remote sensing images. (a) Remote 1; (b) Remote 2; (c) Remote 3.

Table 6 reports the PSNR and SSIM values of the denoised images tested on the three remote sensing images. Algorithm 1 and Algorithm 2 achieve strong performance in PSNR and SSIM values relative to the other methods. For example, Algorithm 1 outperforms the benchmark SAR-BM3D method by 0.11-0.28dB, 0.06-0.19dB and 0.06-0.15dB in PSNR when L = 1, L = 3 and L = 5, respectively; and it outperforms the other traditional methods by 0.76-1.57dB, 0.94-2.93dB and 0.86-3.62dB in PSNR when L = 1, L = 3 and L = 5, respectively. Algorithm 2 is also comparable to Algorithm 1 and the SAR-BM3D method.
Table 6: Numerical results tested on remote sensing images at different noise levels by different methods (columns: Image, L, Meas., Alg 1, Alg 2, SAR-BM3D, DZ, HNW, I-DIV, TwL-4V, Dict).

Figure 6: Comparison of denoised images restored from "Remote 1" at noise level L = 1 by different methods: (a) noisy image (L = 1); (b) ground truth. The (PSNR, SSIM) values for each denoised image: (c) Alg 1 (21.23dB, 0.5459); (d) Alg 2 (21.11dB, 0.5510); (e) SAR-BM3D (21.12dB, 0.5393); (f) DZ (20.47dB, 0.4950); (g) HNW (20.24dB, 0.4551); (h) I-DIV (20.03dB, 0.4709); (i) TwL-4V (20.07dB, 0.4934); (j) Dictionary (20.44dB, 0.4867).

Figure 7: Comparison of denoised images restored from "Remote 2" at noise level L = 3 by different methods: (a) noisy image (L = 3); (b) ground truth. The (PSNR, SSIM) values for each denoised image: (c) Alg 1 (24.13dB, 0.6471); (d) Alg 2 (24.07dB, 0.6474); (e) SAR-BM3D (24.03dB, 0.6449); (f) DZ (22.76dB, 0.5758); (g) HNW (22.71dB, 0.5805); (h) I-DIV (22.49dB, 0.5744); (i) TwL-4V (22.66dB, 0.5845); (j) Dictionary (22.34dB, 0.5592).

Figure 8: Comparison of denoised images restored from "Remote 3" at noise level L = 5 by different methods: (a) noisy image (L = 5); (b) ground truth. The (PSNR, SSIM) values for each denoised image: (c) Alg 1 (25.80dB, 0.7427); (d) Alg 2 (25.83dB, 0.7460); (e) SAR-BM3D (25.65dB, 0.7316); (f) DZ (24.45dB, 0.6745); (g) HNW (23.79dB, 0.6562); (h) I-DIV (24.10dB, 0.6713); (i) TwL-4V (24.30dB, 0.6737); (j) Dictionary (22.70dB, 0.5736).

Figures 6-8 present the denoised images by different methods tested on "Remote 1" at noise level L = 1, "Remote 2" at L = 3 and "Remote 3" at L = 5. Algorithm 1, Algorithm 2 and the benchmark SAR-BM3D method achieve significantly better visual quality than the other methods. For example, they reconstruct buildings, roads and patterns with fine edges and textures.

5.5 Numerical results tested on real SAR images

In this experiment, we use real SAR images "SAR 1" of size 370 × 370 and "SAR 2" of size 350 × 350, as shown in Figure 9(a) and Figure 10(a), respectively.

Figure 9: Comparison of denoised images restored from "SAR 1" by different methods: (a) SAR 1; (b) Alg 1; (c) Alg 2; (d) SAR-BM3D; (e) DZ; (f) HNW; (g) I-DIV; (h) TwL-4V; (i) Dictionary.

Figure 10: Comparison of denoised images restored from "SAR 2" by different methods: (a) SAR 2; (b) Alg 1; (c) Alg 2; (d) SAR-BM3D; (e) DZ; (f) HNW; (g) I-DIV; (h) TwL-4V; (i) Dictionary.

Figure 9 and Figure 10 demonstrate that Algorithm 1 and Algorithm 2 achieve better denoising performance than the other methods. For example, they reconstruct more local structures and smoother textures than the DZ method, the HNW method, the I-DIV method, the TwL-4V method and the learned dictionary method; and they remove more noise and generate fewer artifacts than the benchmark SAR-BM3D method.

In addition to the visual comparison of the denoised images, we can further assess the methods by computing the equivalent number of looks (ENL) and analyzing the ratio images for the different methods.

The ENL of an estimated image û ∈ R^N measures the multiplicative noise reduction in homogeneous regions and is defined as

    ENL = µ_û^2 / σ_û,

where µ_û is the average intensity of the selected region and σ_û is its variance.

For computing the ENL values, two homogeneous regions are selected from each of "SAR 1" and "SAR 2", as indicated by the white boxes in Figure 11(a) and Figure 12(a). Table 7 presents the ENL values for the different methods. The SAR-BM3D method has the lowest ENL values among the compared methods, which indicates that the multiplicative noise is not effectively reduced or that artifacts remain in the homogeneous regions.

Table 7: ENL values of denoised images restored from real SAR images by different methods.
    Image    Region    Noisy    Alg 1    Alg 2     SAR-BM3D    DZ         HNW       I-DIV     TwL-4V    Dict
    SAR 1    Left      9.46     63.09    289.17    42.84       745.14     521.24    306.90    117.48    183.57
    SAR 1    Right     10.65    82.59    203.24    57.36       333.18     360.12    276.04    144.54    175.49
    SAR 2    Left      22.64    97.02    894.03    91.92       1008.50    816.97    501.22    579.26    336.58
    SAR 2    Right     21.91    96.58    734.47    91.03       985.64     740.27    566.30    724.76    444.01
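The ENL entries in Table 7 reduce to a one-line computation over a hand-picked homogeneous box; a minimal sketch following the definition above, in which σ_û denotes the variance of the region (the region coordinates in the usage comment are hypothetical):

    import numpy as np

    def enl(region):
        # Equivalent number of looks: squared mean intensity of a
        # homogeneous region divided by its variance.
        return np.mean(region) ** 2 / np.var(region)

    # e.g., the ENL of a selected homogeneous box in the estimate:
    # enl(u_hat[120:160, 40:80])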
The pointwise ratio between the real SAR image u ∈ R^N and the estimated image û ∈ R^N simulates the multiplicative noise that has been removed by the given method and is defined as

    Ratio = u / û.

The ratio images for the different methods are presented in Figure 11 and Figure 12. The ratio images for Algorithm 1, Algorithm 2 and the SAR-BM3D method present almost pure random speckle, which matches the expected statistics. On the contrary, the ratio images for the other methods still contain geometric structures such as edges and details correlated with the real SAR images, which indicates that those methods have removed some valuable information besides the noise.
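Since fully developed L-look Gamma speckle has mean 1 and variance 1/L, a quick sanity check on a ratio image is to compare its empirical moments with these values; a hedged sketch with hypothetical names:

    import numpy as np

    def ratio_image(u, u_hat, eps=1e-12):
        # Pointwise ratio between the observed image and the estimate;
        # eps guards against division by zero.
        return u / (u_hat + eps)

    def check_speckle_stats(ratio, L):
        # For ideal removal of L-look Gamma speckle, the ratio image
        # should have mean close to 1 and variance close to 1/L.
        return np.mean(ratio), np.var(ratio), 1.0 / L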
Figure 11: Comparison of the ratio images between "SAR 1" and the estimated images by different methods: (a) SAR 1; (b) Alg 1; (c) Alg 2; (d) SAR-BM3D; (e) DZ; (f) HNW; (g) I-DIV; (h) TwL-4V; (i) Dictionary.

Figure 12: Comparison of the ratio images between "SAR 2" and the estimated images by different methods: (a) SAR 2; (b) Alg 1; (c) Alg 2; (d) SAR-BM3D; (e) DZ; (f) HNW; (g) I-DIV; (h) TwL-4V; (i) Dictionary.

6 Conclusions

We have proposed an effective method for multiplicative noise removal. The proposed method consists of a nonlocal low-rank model, which exploits the low-rank prior of nonlocal similar patch matrices, and the PARM iterative algorithm, which solves the nonconvex nonsmooth optimization problem resulting from the proposed model. We have established the global convergence of the sequence generated by the PARM algorithm to a critical point of the nonconvex nonsmooth objective function of the resulting optimization problem. Numerical results have demonstrated that the proposed method, with a theoretical convergence guarantee, outperforms several existing methods including the state-of-the-art SAR-BM3D method.
References

[1] H. Attouch, J. Bolte, P. Redont, and A. Soubeyran, Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka-Łojasiewicz inequality, Mathematics of Operations Research, 35 (2010), pp. 438–457.

[2] H. Attouch, J. Bolte, and B. F. Svaiter, Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods, Mathematical Programming, 137 (2013), pp. 91–129.

[3] G. Aubert and J.-F. Aujol, A variational approach to removing multiplicative noise, SIAM Journal on Applied Mathematics, 68 (2008), pp. 925–946.

[4] J. Bolte, S. Sabach, and M. Teboulle, Proximal alternating linearized minimization for nonconvex and nonsmooth problems, Mathematical Programming, 146 (2014), pp. 459–494.

[5] A. Buades, B. Coll, and J.-M. Morel, A non-local algorithm for image denoising, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2 (2005), pp. 60–65.

[6] R. Chan, H. Yang, and T. Zeng, A two-stage image segmentation method for blurry images with Poisson or multiplicative Gamma noise, SIAM Journal on Imaging Sciences, 7 (2014), pp. 98–127.

[7] K. Chen, H. Dong, and K.-S. Chan, Reduced rank regression via adaptive nuclear norm penalization, Biometrika, 100 (2013), pp. 901–920.

[8] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, Image denoising by sparse 3-D transform-domain collaborative filtering, IEEE Transactions on Image Processing, 16 (2007), pp. 2080–2095.

[9] W. Dong, G. Shi, X. Li, Y. Ma, and F. Huang, Compressive sensing via nonlocal low-rank regularization, IEEE Transactions on Image Processing, 23 (2014), pp. 3618–3632.

[10] Y. Dong and T. Zeng, A convex variational model for restoring blurred images with multiplicative noise, SIAM Journal on Imaging Sciences, 6 (2013), pp. 1598–1625.

[11] J. W. Goodman, Some fundamental properties of speckle, JOSA, 66 (1976), pp. 1145–1150.

[12] S. Gu, L. Zhang, W. Zuo, and X. Feng, Weighted nuclear norm minimization with application to image denoising, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 2862–2869.

[13] R. A. Horn and C. R. Johnson, Topics in Matrix Analysis, Cambridge University Press, 1994.

[14] T. Huang, W. Dong, X. Xie, G. Shi, and X. Bai, Mixed noise removal via Laplacian scale mixture modeling and nonlocal low-rank approximation, IEEE Transactions on Image Processing, 26 (2017), pp. 3171–3186.

[15] Y.-M. Huang, L. Moisan, M. K. Ng, and T. Zeng, Multiplicative noise removal via a learned dictionary, IEEE Transactions on Image Processing, 21 (2012), pp. 4534–4543.

[16] Y.-M. Huang, M. K. Ng, and Y.-W. Wen, A new total variation method for multiplicative noise removal, SIAM Journal on Imaging Sciences, 2 (2009), pp. 20–40.

[17] Y.-M. Huang, H.-Y. Yan, Y.-W. Wen, and X. Yang, Rank minimization with applications to image noise removal, Information Sciences, 429 (2018), pp. 147–163.

[18] M. Kang, S. Yun, and H. Woo, Two-level convex relaxed variational model for multiplicative denoising, SIAM Journal on Imaging Sciences, 6 (2013), pp. 875–903.

[19] A. S. Lewis and H. S. Sendov, Nonsmooth analysis of singular values. Part I: Theory, Set-Valued Analysis, 13 (2005), pp. 213–241.

[20] A. S. Lewis and H. S. Sendov, Nonsmooth analysis of singular values. Part II: Applications, Set-Valued Analysis, 13 (2005), pp. 243–264.

[21] Y. Lou, X. Zhang, S. Osher, and A. Bertozzi, Image recovery via nonlocal operators, Journal of Scientific Computing, 42 (2010), pp. 185–197.

[22] C. Lu, J. Tang, S. Yan, and Z. Lin, Generalized nonconvex nonsmooth low-rank minimization, in 2014 IEEE Conference on Computer Vision and Pattern Recognition, June 2014, pp. 4130–4137.

[23] J. Lu, L. Shen, C. Xu, and Y. Xu, Multiplicative noise removal in imaging: An exp-model and its fixed-point proximity algorithm, Applied and Computational Harmonic Analysis, 41 (2016), pp. 518–539.

[24] C. J. Oliver and S. Quegan, Understanding Synthetic Aperture Radar Images, SciTech Publishing, Inc., Raleigh, NC, 2004.

[25] S. Parrilli, M. Poderico, C. V. Angelino, and L. Verdoliva, A nonlocal SAR image denoising algorithm based on LLMMSE wavelet shrinkage, IEEE Transactions on Geoscience and Remote Sensing, 50 (2011), pp. 606–616.

[26] B. Recht, M. Fazel, and P. A. Parrilo, Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization, SIAM Review, 52 (2010), pp. 471–501.

[27] J. M. Schmitt, S. Xiang, and K. M. Yung, Speckle in optical coherence tomography, Journal of Biomedical Optics, 4 (1999), pp. 95–105.

[28] J. Shi and S. Osher, A nonlinear inverse scale space method for a convex multiplicative noise model, SIAM Journal on Imaging Sciences, 1 (2008), pp. 294–321.

[29] G. Steidl and T. Teuber, Removing multiplicative noise by Douglas-Rachford splitting methods, Journal of Mathematical Imaging and Vision, 36 (2010), pp. 168–184.

[30] R. F. Wagner, S. W. Smith, J. M. Sandrik, and H. Lopez, Statistics of speckle in ultrasound B-scans, IEEE Transactions on Sonics and Ultrasonics, 30 (1983), pp. 156–163.

[31] J. Wei, Y. Huang, L. Ke, and L. Wang, Nonlocal low-rank-based compressed sensing for remote sensing image reconstruction, IEEE Geoscience and Remote Sensing Letters, 13 (2017), pp. 1557–1561.

[32] Y. Wu, Speckle noise removal via nonlocal low-rank regularization, Journal of Visual Communication and Image Representation, 39 (2016), pp. 172–180.

[33] W. Yu, W. Yin, and J. Zeng, Global convergence of ADMM in nonconvex nonsmooth optimization, Journal of Scientific Computing, 78 (2018), pp. 1–35.

[34]