An iterative support shrinking algorithm for ℓ p - ℓ q minimization
Zhifang Liu, Yanan Zhao and Chunlin Wu∗
School of Mathematical Sciences, Nankai University, Tianjin 300071, China
October 17, 2018
Abstract.
We present an iterative support shrinking algorithm for $\ell_p$-$\ell_q$ minimization ($0 < p < 1 \le q < \infty$). This algorithm guarantees that the support set of the signal does not expand, and it can be easily implemented after proximal linearization. The resulting subproblem can be solved very efficiently owing to its convexity and its shrinking size along the iterations. We prove that the iterates of the algorithm globally converge to a stationary point of the $\ell_p$-$\ell_q$ objective function. In addition, we establish a lower bound theory for the iteration sequence, which is more practical than the lower bound results for local minimizers in the literature.

Keywords. nonconvex nonsmooth regularization, non-Lipschitz optimization, support shrinking, sparse signal reconstruction, lower bound theory
Mathematics subject classification (2010).
1 Introduction

Sparse reconstruction plays an important role in various applications such as signal and image processing, compressed sensing, model selection, variable selection, and many others [24, 9, 45, 36, 41]. This problem can be described as follows. Given an $M \times N$ measurement matrix $A$ with $M < N$, we consider recovering a sparse signal $x \in \mathbb{R}^N$ from an observed signal $y = Ax + n \in \mathbb{R}^M$, where $n$ represents the measurement noise. There are many different types of noise. Two typical and important examples are Gaussian noise and heavy-tailed noise [38], which obey the Gaussian distribution and distributions with heavier-than-Gaussian tails [29], respectively. To obtain the sparsest solution, one naturally proposes to solve the following $\ell_0$ minimization problem
$$\min_{x \in \mathbb{R}^N} \|x\|_0 \quad \text{subject to } \|Ax - y\|_q \le \varepsilon,$$
or its unconstrained counterpart
$$\min_{x \in \mathbb{R}^N} \|x\|_0 + \frac{1}{q\alpha}\|Ax - y\|_q^q, \qquad (1.1)$$
where $\|\cdot\|_0$ denotes the $\ell_0$ "norm" that returns the number of nonzero entries of its argument, $\|\cdot\|_q$ for $q \in [1, \infty)$ is the $\ell_q$ norm, and $\alpha \in (0, \infty)$ is a parameter that balances the regularization and the fidelity. The second term in (1.1), called the fidelity term, is constructed from the noise distribution by the maximum likelihood principle. As is well known, for Gaussian noise one uses the $\ell_2$ fidelity term ($q = 2$). For heavy-tailed noise such as impulsive noise, the $\ell_1$ fidelity term ($q = 1$) is a good choice [23, 38]. Since $\ell_0$ minimization is NP-hard [37], numerous methods have been proposed to approximate it. Two common approaches are to replace the $\ell_0$ "norm" with the $\ell_1$ norm [12, 24] or with the $\ell_p$ quasi-norm ($0 < p < 1$) [24, 16, 25, 22, 40], where the $\ell_p$ quasi-norm is defined as $\|x\|_p = \big(\sum_{j=1}^N |x_j|^p\big)^{1/p}$.

∗ Corresponding author. [email protected]
In sparse reconstruction, the nonconvex $\ell_p$ quasi-norm has some advantages [15, 25, 40, 32] over the convex $\ell_1$ norm. In this paper, we focus on the following $\ell_p$-$\ell_q$ minimization problem
$$\min_{x \in \mathbb{R}^N} E(x) := \|x\|_p^p + \frac{1}{q\alpha}\|Ax - y\|_q^q, \qquad (1.2)$$
where $p \in (0, 1)$, $q \in [1, \infty)$ and $\alpha \in (0, \infty)$. The objective function $E$ in (1.2) is nonsmooth, nonconvex and non-Lipschitz, which poses a great challenge for optimization. We now review some existing methods; as can be seen, most of them were designed for $\ell_p$-$\ell_2$ minimization.

One class of approaches consists of smoothing approximation methods [19, 17, 18, 4], which are based on the special structure of the nonsmooth function $E$. By a smoothing function $\varphi(x, \theta)$ for the absolute value function $|x|$, the $\ell_p$, $p \in (0, 1)$, regularization term can be smoothed. Two choices of $\varphi(x, \theta)$ in [17, 18, 4] are
$$\varphi(x, \theta) = \begin{cases} |x|, & |x| > \theta, \\ \dfrac{x^2}{2\theta} + \dfrac{\theta}{2}, & |x| \le \theta, \end{cases} \qquad \text{and} \qquad \varphi(x, \theta) = \sqrt{x^2 + 4\theta^2}.$$
Based on this technique, the hybrid orthogonal matching pursuit-smoothing gradient (OMP-SG) method [19], the smoothing quadratic regularization (SQR) algorithm [4], and the smoothing trust region Newton method [18] have been proposed for $\ell_p$, $p \in (0, 1)$, regularized problems with smooth fidelity terms, with convergence guarantees. They essentially reformulate the non-Lipschitz problem as a Lipschitz one through a smoothing parameter, which controls the approximation accuracy and needs to be driven progressively to zero.

The second class of approaches is general iterative shrinkage-thresholding algorithms (GISA) for the $\ell_p$-$\ell_2$ minimization problem [43, 46, 8]. GISA was inspired by the great success of soft thresholding and iterative shrinkage-thresholding algorithms (ISTA) [21, 3] for the convex $\ell_1$-$\ell_2$ minimization problem.
Specifically, the general step of GISA is
$$x^{(k+1)} = T_{\alpha\beta}\big(x^{(k)} + \beta A^T(y - Ax^{(k)})\big),$$
where $\beta > 0$ is an appropriate stepsize and $T_{\alpha\beta}: \mathbb{R}^N \to \mathbb{R}^N$ is a shrinkage-thresholding operator. GISA is easy to implement, but it applies only to the case $q = 2$. Even for $q = 2$, the operator $T_{\alpha\beta}$ has an analytical expression only for $p = 1/2$ and $p = 2/3$ [43, 30]. For a general $0 < p < 1$, the operator $T_{\alpha\beta}$ needs to be computed via numerical methods [46, 8].

The third class of approaches is iterative reweighted minimization methods; see, e.g., [27, 32, 33, 13, 20]. These include iterative reweighted least squares (IRLS) and iterative reweighted $\ell_1$ (IRL1) minimization methods; one can refer to [35] for a systematic review. In [32, 33], the authors considered a smoothed $\ell_p$-$\ell_2$ minimization
$$\min_{x \in \mathbb{R}^N} \sum_{j=1}^N (x_j^2 + \theta^2)^{p/2} + \frac{1}{2\alpha}\|Ax - y\|_2^2$$
and proposed IRLS algorithms to solve this approximate problem. In [20], Chen and Zhou considered the following approximation to $\ell_p$-$\ell_2$ minimization
$$\min_{x \in \mathbb{R}^N} \sum_{j=1}^N (|x_j| + \theta)^p + \frac{1}{2\alpha}\|Ax - y\|_2^2$$
for some small $\theta > 0$. An IRL1 algorithm was proposed to solve this approximate problem. Both IRLS and IRL1 are stable. Actually, reweighted methods reformulate the original non-Lipschitz $\ell_p$-$\ell_2$ problem as a Lipschitz one through a de-singularizing parameter.

In this paper, we consider (1.2) from a different perspective. We first obtain a proposition from the first-order optimality condition. Motivated by this proposition, we propose an iterative algorithm with constraints on the support set of the signals. The core idea is to guarantee that the signal support set does not expand during the iterative procedure. After constraint elimination and proximal linearization, this algorithm can be easily implemented. The subproblem therein is convex and of shrinking size along the iterations. It is solved inexactly by the alternating direction method of multipliers (ADMM).
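For orientation, the shrinkage-thresholding step of GISA reviewed above is easy to sketch in the special case $p = 1$, $q = 2$, where $T_{\alpha\beta}$ reduces to the classical soft-thresholding operator. The snippet below is a minimal illustration of one such iteration; the function names are ours and the stepsize choice is left to the caller:

```python
import numpy as np

def soft_threshold(v, tau):
    """Soft thresholding: the closed-form shrinkage operator for p = 1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def gisa_step(x, A, y, alpha, beta):
    """One shrinkage-thresholding iteration for the q = 2 fidelity,
    illustrated with the p = 1 (soft-thresholding) operator."""
    return soft_threshold(x + beta * A.T @ (y - A @ x), alpha * beta)
```

For $p = 1/2$ or $p = 2/3$ the thresholding rule has a different closed form, and for general $p \in (0,1)$ it must be evaluated numerically, as noted above.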
Furthermore, we establish the global convergence of the iterates to a stationary point of (1.2). We also prove a new lower bound theory for the iteration sequence, which is more practical than those lower bounds for local minimizers in the literature. Numerical examples show the good performance of our proposed algorithm for both $\ell_p$-$\ell_2$ and $\ell_p$-$\ell_1$ restoration.

The rest of this paper is organized as follows. In section 2, we give some basic notation and preliminaries. In section 3, we describe the motivation and propose our algorithms. In section 4, the convergence analysis is provided and the lower bound property of the iteration sequence is discussed. In section 5, we give implementation details. The numerical experiments are shown in section 6. Section 7 concludes the paper.

2 Notation and preliminaries

Denote $I = \{1, 2, \ldots, M\}$ and $J = \{1, 2, \ldots, N\}$. For a vector $x \in \mathbb{R}^N$, we refer to $x_j$ as its $j$th entry and denote the support set of $x$ by $\mathrm{supp}(x) := \{j \in J : x_j \ne 0\}$. We assume that all vectors are column vectors. For a matrix $A \in \mathbb{R}^{M \times N}$, we write its $i$th row as $A_i^T$, which is the transpose of $A_i \in \mathbb{R}^N$, so that $A = (A_1, \ldots, A_M)^T$. Let $S$ be a subset of $J$. We denote by $x_S$ the subvector of $x$ indexed by $S$, which consists of the nonzero entries of $x$ when $S = \mathrm{supp}(x)$. Similarly, we denote by $B = A_S$ the column submatrix of $A$ consisting of the columns indexed by $S$. Letting $B_i^T$ be the $i$th row of $B$, we have $B_i = (A_i)_S$.

Define $\phi: [0, \infty) \to [0, \infty)$ by $\phi(x) = x^p$ $(0 < p < 1)$. We state some useful properties of $\phi(\cdot)$.

Proposition 2.1.
The function $\phi(\cdot)$ has the following properties:

(i) $\phi(0) = 0$ and $\phi'(x) = p x^{p-1} > 0$ on $(0, \infty)$.

(ii) $\phi(x)$ is concave and the following inequality holds:
$$\phi(y) \le \phi(x) + \phi'(x)(y - x), \quad \forall x \in (0, \infty),\ y \in [0, \infty). \qquad (2.1)$$

(iii) For any $c > 0$, $\phi'(x)$ is $L_c$-Lipschitz continuous on $[c, \infty)$, i.e., there exists a constant $L_c > 0$ determined by $c$ such that for all $x, y \in [c, \infty)$,
$$|\phi'(x) - \phi'(y)| \le L_c |x - y|. \qquad (2.2)$$

(iv) The subdifferential of $\phi(|x|)$ at $x$ is given by
$$\partial\phi(|x|) = \begin{cases} (-\infty, \infty), & x = 0, \\ \{\mathrm{sgn}(x)\,\phi'(|x|)\}, & x \ne 0, \end{cases}$$
where $\mathrm{sgn}(x)$ is the signum function.

By the definition of $\phi(\cdot)$, we have $\|x\|_p^p = \sum_{j \in J} \phi(|x_j|)$. Thus the objective function $E$ in (1.2) reads
$$E(x) = \sum_{j \in J} \phi(|x_j|) + \frac{1}{q\alpha} \sum_{i \in I} \big|A_i^T x - y_i\big|^q, \qquad (2.3)$$
which is bounded below and coercive.

Now, we derive the subdifferential of $E$ at $x$. For $1 < q < \infty$, by [39, Exercise 8.8 and Proposition 10.5], we get
$$\partial E(x) = \partial\Big(\sum_{j \in J} \phi(|x_j|)\Big) + \frac{1}{q\alpha} \nabla\Big(\sum_{i \in I} \big|A_i^T x - y_i\big|^q\Big), \qquad (2.4)$$
where $\partial\big(\sum_{j \in J} \phi(|x_j|)\big) = \partial\phi(|x_1|) \times \cdots \times \partial\phi(|x_N|)$. For $q = 1$, we have checked the regularity requirement in [39, Corollary 10.9], indicating
$$\partial E(x) = \partial\Big(\sum_{j \in J} \phi(|x_j|)\Big) + \partial\Big(\frac{1}{\alpha} \sum_{i \in I} \big|A_i^T x - y_i\big|\Big). \qquad (2.5)$$

Throughout this paper, we say that $x^*$ is a stationary point of (1.2) if $x^*$ satisfies
$$0 \in \partial E(x^*). \qquad (2.6)$$
If $\bar{x}$ is a local minimizer of (1.2), then the first-order optimality condition (2.6) holds.

3 Motivation and algorithms

Proposition 3.1.
Given $x \in \mathbb{R}^N$, suppose that $x$ is sufficiently close to a local minimizer (or a stationary point) $x^*$ of (1.2). Then it holds that
$$x^*_j = 0, \quad \forall j \in \Omega_0 = J \setminus \mathrm{supp}(x). \qquad (3.1)$$

Proof.
We prove (3.1) by contradiction. Consider first the case $1 < q < \infty$. As $x^*$ is a local minimizer (or a stationary point) of $E$, the condition (2.6) implies that for any $j \in J$ we have
$$0 \in \partial\phi(|x^*_j|) + \frac{1}{\alpha}\Big(\sum_{i \in I} \mathrm{sgn}(A_i^T x^* - y_i)\,\big|A_i^T x^* - y_i\big|^{q-1} A_i\Big)_j.$$
Assume that there is $j' \in \Omega_0$ such that $x^*_{j'} \ne 0$. Then we have
$$0 = \mathrm{sgn}(x^*_{j'})\,\phi'(|x^*_{j'}|) + \frac{1}{\alpha}\Big(\sum_{i \in I} \mathrm{sgn}(A_i^T x^* - y_i)\,\big|A_i^T x^* - y_i\big|^{q-1} A_i\Big)_{j'}. \qquad (3.2)$$
The second term on the right-hand side of (3.2) is bounded. Since $x$ is sufficiently close to $x^*$, $x^*_{j'}$ can be sufficiently close to $x_{j'} = 0$, so that $\phi'(|x^*_{j'}|) = p|x^*_{j'}|^{p-1}$ becomes arbitrarily large. Then the equation (3.2) cannot hold. This is a contradiction.

For the case $q = 1$, the condition (2.6) can be rewritten as: for any $j \in J$,
$$0 \in \partial\phi(|x^*_j|) + \frac{1}{\alpha}\Big(\sum_{i \in I} \partial|\cdot|(A_i^T x^* - y_i)\, A_i\Big)_j. \qquad (3.3)$$
Using the boundedness of the second term on the right-hand side of (3.3), we can prove the result similarly. □

Proposition 3.1 suggests a natural iterative scheme. Suppose $x^{(k)}$ is an approximate solution at the $k$th iteration. In the next iteration, we minimize the objective function with the restriction of zero entries outside the support set of $x^{(k)}$. This idea yields the following iterative support shrinking algorithm (ISSA) for solving (1.2).

ISSA: Iterative Support Shrinking Algorithm

Initialization:
Select $x^{(0)} \in \mathbb{R}^N$.

Iteration:
For $k = 0, 1, \ldots$ until convergence:
1. Set $S^{(k)} = \mathrm{supp}(x^{(k)})$.
2. Compute $x^{(k+1)}$ by solving
$$\min_{x \in \mathbb{R}^N} \sum_{j \in S^{(k)}} \phi(|x_j|) + \frac{1}{q\alpha}\|Ax - y\|_q^q, \quad \text{s.t. } x_j = 0,\ \forall j \in \Omega_0^{(k)} = J \setminus S^{(k)}. \qquad (P_x)$$

In fact, the problem $(P_x)$ amounts to minimizing the objective function with respect to only the $S^{(k)}$ entries of $x$, with the remaining components being null. Note that $S^{(k)}$ is the support of $x^{(k)}$. Given a vector $x \in \mathbb{R}^N$ with $\mathrm{supp}(x) \subseteq S^{(k)}$, we let $z^{(k)} = x^{(k)}_{S^{(k)}}$, $B^{(k)} = A_{S^{(k)}}$ and $z = x_{S^{(k)}}$. It follows that
$$B^{(k)} z = Ax, \qquad \|z - z^{(k)}\| = \|x - x^{(k)}\|. \qquad (3.4)$$
These relationships help to reformulate the problem $(P_x)$ as an unconstrained problem with $z$ as the unknowns. At the same time, each term $\phi(|x_j|)$, $j \in S^{(k)}$, can be linearized at $|x^{(k)}_j| \ne 0$. Together with a proximal technique, we obtain an iterative support shrinking algorithm with proximal linearization (ISSAPL) to solve (1.2).

ISSAPL: Iterative Support Shrinking Algorithm with Proximal Linearization

Initialization:
Select $x^{(0)} \in \mathbb{R}^N$ and $\beta > 0$.

Iteration:
For $k = 0, 1, \ldots$ until convergence:
1. Set $S^{(k)} = \mathrm{supp}(x^{(k)})$.
2. Generate $x^{(k+1)}$ as follows:
2.1. Set $z^{(k)} = x^{(k)}_{S^{(k)}}$ and $B^{(k)} = A_{S^{(k)}}$.
2.2. Compute $\hat{z}^{(k+1)}$ by solving
$$\min_z \hat{E}^{(k)}(z) = \sum_{j \in S^{(k)}} \phi'(|x^{(k)}_j|)\,|z_j| + \frac{1}{q\alpha}\|B^{(k)} z - y\|_q^q + \frac{\beta}{2}\|z - z^{(k)}\|^2. \qquad (P_z)$$
2.3. Set
$$x^{(k+1)}_j = \begin{cases} 0, & j \in \Omega_0^{(k)} = J \setminus S^{(k)}, \\ \hat{z}^{(k+1)}_j, & j \in S^{(k)}. \end{cases} \qquad (3.5)$$

The problem $(P_z)$ in ISSAPL has a unique optimal solution due to the strong convexity of $\hat{E}^{(k)}$. Although $(P_z)$ is a convex optimization problem, it needs to be solved by iteration. In practice, we solve $(P_z)$ inexactly. We now present our inexact iterative support shrinking algorithm with proximal linearization (InISSAPL) to solve (1.2).

InISSAPL: Inexact Iterative Support Shrinking Algorithm with Proximal Linearization

Initialization:
Select $x^{(0)} \in \mathbb{R}^N$, $\beta > 0$ and $0 \le \varepsilon < 1$.

Iteration:
For $k = 0, 1, \ldots$ until convergence:
1. Set $S^{(k)} = \mathrm{supp}(x^{(k)})$.
2. Generate $x^{(k+1)}$ as follows:
2.1. Set $z^{(k)} = x^{(k)}_{S^{(k)}}$ and $B^{(k)} = A_{S^{(k)}}$.
2.2. Find $\hat{z}^{(k+1)} \approx \arg\min_z \hat{E}^{(k)}(z)$ and $\hat{u}^{(k+1)} \in \partial\hat{E}^{(k)}(\hat{z}^{(k+1)})$, such that
$$\|\hat{u}^{(k+1)}\| \le \frac{\beta\varepsilon}{2}\|\hat{z}^{(k+1)} - z^{(k)}\|. \qquad (3.6)$$
2.3. Set
$$x^{(k+1)}_j = \begin{cases} 0, & j \in \Omega_0^{(k)} = J \setminus S^{(k)}, \\ \hat{z}^{(k+1)}_j, & j \in S^{(k)}. \end{cases} \qquad (3.7)$$

Remark.
The condition (3.6) in InISSAPL is motivated by [2]. It corresponds to an inexact optimality condition and a guide for selecting the approximate solution of $(P_z)$. Due to the strong convexity of the problem $(P_z)$, it can be solved to any given accuracy. Therefore, the condition (3.6) in InISSAPL can be satisfied, as long as the problem $(P_z)$ is solved sufficiently accurately.

We have some useful representations of $\hat{u}^{(k+1)}$. Since $\hat{u}^{(k+1)} \in \partial\hat{E}^{(k)}(\hat{z}^{(k+1)})$, we have
$$\hat{u}^{(k+1)} \in \partial\Big(\sum_{j \in S^{(k)}} \phi'(|z^{(k)}_j|)\,|\hat{z}^{(k+1)}_j|\Big) + \beta(\hat{z}^{(k+1)} - z^{(k)}) + \frac{1}{q\alpha}\,\partial\Big(\sum_{i \in I} \big|(B^{(k)}_i)^T \hat{z}^{(k+1)} - y_i\big|^q\Big). \qquad (3.8)$$
Then for any $j \in S^{(k)}$ and $i \in I$, there are $\xi_j \in \partial|\cdot|(\hat{z}^{(k+1)}_j) = \partial|\cdot|(x^{(k+1)}_j)$ and $\eta_i \in \partial|\cdot|\big((B^{(k)}_i)^T \hat{z}^{(k+1)} - y_i\big) = \partial|\cdot|\big(A_i^T x^{(k+1)} - y_i\big)$, such that when $1 < q < \infty$,
$$\begin{aligned} \hat{u}^{(k+1)}_j &= \xi_j \phi'(|z^{(k)}_j|) + \beta(\hat{z}^{(k+1)}_j - z^{(k)}_j) + \frac{1}{\alpha}\Big(\sum_{i \in I} \mathrm{sgn}\big((B^{(k)}_i)^T \hat{z}^{(k+1)} - y_i\big)\,\big|(B^{(k)}_i)^T \hat{z}^{(k+1)} - y_i\big|^{q-1} B^{(k)}_i\Big)_j \\ &= \xi_j \phi'(|x^{(k)}_j|) + \beta(x^{(k+1)}_j - x^{(k)}_j) + \frac{1}{\alpha}\Big(\sum_{i \in I} \mathrm{sgn}\big(A_i^T x^{(k+1)} - y_i\big)\,\big|A_i^T x^{(k+1)} - y_i\big|^{q-1} A_i\Big)_j, \end{aligned} \qquad (3.9)$$
and when $q = 1$,
$$\hat{u}^{(k+1)}_j = \xi_j \phi'(|z^{(k)}_j|) + \beta(\hat{z}^{(k+1)}_j - z^{(k)}_j) + \frac{1}{\alpha}\Big(\sum_{i \in I} \eta_i B^{(k)}_i\Big)_j = \xi_j \phi'(|x^{(k)}_j|) + \beta(x^{(k+1)}_j - x^{(k)}_j) + \frac{1}{\alpha}\Big(\sum_{i \in I} \eta_i A_i\Big)_j. \qquad (3.10)$$
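In an implementation, condition (3.6) is just a norm comparison between the inner-solve residual and the outer step length. A minimal sketch, assuming the $\beta\varepsilon/2$ constant of (3.6) (the name `accept` is ours):

```python
import numpy as np

def accept(u_hat, z_hat, z_prev, beta, eps):
    """Inexactness test (3.6): accept the inner iterate once its
    subgradient residual u_hat is dominated by the step length."""
    return np.linalg.norm(u_hat) <= 0.5 * beta * eps * np.linalg.norm(z_hat - z_prev)
```

In practice the inner solver is simply run until `accept` returns True; with `eps = 0` this enforces an exact solve of $(P_z)$.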
A basic lemma for (cid:8) S ( k ) (cid:9) is showed in the following. Lemma 4.1.
The sequence $\{S^{(k)}\}$ converges in a finite number of iterations, i.e., there exists an integer $K > 0$ such that if $k \ge K$, then $S^{(k)} \equiv S^{(K)}$.

Proof. Since $J \supseteq S^{(0)} \supseteq \cdots \supseteq S^{(k)} \supseteq \cdots$, $\{S^{(k)}\}$ converges in a finite number of iterations. □

Lemma 4.1 plays a key role in the convergence analysis for the three vector sequences $\{x^{(k)}\}$, $\{z^{(k)}\}$ and $\{\hat{z}^{(k)}\}$ generated by InISSAPL. Note that $z^{(k)} = x^{(k)}_{S^{(k)}}$. By (3.7), $\hat{z}^{(k)}$ and $x^{(k)}$ have exactly the same nonzero entries. From Lemma 4.1, we can conclude that after a certain number $K$ of iterations the support of $x^{(k)}$ is fixed, i.e., if $k \ge K$, then $\mathrm{supp}(x^{(k)}) = S^{(K)}$, from which we directly obtain
$$z^{(k)} = x^{(k)}_{S^{(K)}} = \hat{z}^{(k)}, \quad \forall k > K.$$

Next, we establish the global convergence of the sequence $\{x^{(k)}\}$. For convenience of description, we introduce an auxiliary function
$$F^{(k)}(x) = \sum_{j \in S^{(k)}} \Big[\phi(|x^{(k)}_j|) + \phi'(|x^{(k)}_j|)\big(|x_j| - |x^{(k)}_j|\big)\Big] + \frac{1}{q\alpha}\|Ax - y\|_q^q + \frac{\beta}{2}\|x - x^{(k)}\|^2. \qquad (4.1)$$

Lemma 4.2.
For any $\beta > 0$ and $0 \le \varepsilon < 1$, let $\{x^{(k)}\}$ be a sequence generated by InISSAPL. Then:

(i) The sequence $\{E(x^{(k)})\}$ is nonincreasing and satisfies
$$E(x^{(k+1)}) + \frac{\beta}{2}(1 - \varepsilon)\|x^{(k+1)} - x^{(k)}\|^2 \le E(x^{(k)}). \qquad (4.2)$$

(ii) The sequence $\{x^{(k)}\}$ is bounded and satisfies $\lim_{k \to \infty} \|x^{(k+1)} - x^{(k)}\| = 0$.

Proof. Consider first the case $1 < q < \infty$. Due to the fact that $\phi(0) = 0$, we have
$$F^{(k)}(x^{(k)}) = \sum_{j \in S^{(k)}} \phi(|x^{(k)}_j|) + \frac{1}{q\alpha}\|Ax^{(k)} - y\|_q^q = \sum_{j \in J} \phi(|x^{(k)}_j|) + \frac{1}{q\alpha}\|Ax^{(k)} - y\|_q^q = E(x^{(k)}). \qquad (4.3)$$
When $x \in \mathbb{R}^N$ and $\mathrm{supp}(x) \subseteq S^{(k)}$, we obtain
$$\begin{aligned} F^{(k)}(x) &= \sum_{j \in \mathrm{supp}(x)} \Big[\phi(|x^{(k)}_j|) + \phi'(|x^{(k)}_j|)\big(|x_j| - |x^{(k)}_j|\big)\Big] + \sum_{j \in S^{(k)} \setminus \mathrm{supp}(x)} \Big[\phi(|x^{(k)}_j|) - \phi'(|x^{(k)}_j|)\,|x^{(k)}_j|\Big] \\ &\quad + \frac{1}{q\alpha}\|Ax - y\|_q^q + \frac{\beta}{2}\|x - x^{(k)}\|^2 \\ [\text{by (2.1)}]\ &\ge \sum_{j \in S^{(k)}} \phi(|x_j|) + \frac{1}{q\alpha}\|Ax - y\|_q^q + \frac{\beta}{2}\|x - x^{(k)}\|^2 \\ &= \sum_{j \in J} \phi(|x_j|) + \frac{1}{q\alpha}\|Ax - y\|_q^q + \frac{\beta}{2}\|x - x^{(k)}\|^2 = E(x) + \frac{\beta}{2}\|x - x^{(k)}\|^2. \end{aligned} \qquad (4.4)$$
The subdifferential of $F^{(k)}$ at $x$ is given by
$$\begin{aligned} \partial F^{(k)}(x) &= \partial\Big(\sum_{j \in S^{(k)}} \phi'(|x^{(k)}_j|)\,|x_j|\Big) + \beta(x - x^{(k)}) + \frac{1}{q\alpha}\nabla\Big(\sum_{i \in I} \big|A_i^T x - y_i\big|^q\Big) \\ &= \partial\Big(\sum_{j \in S^{(k)}} \phi'(|x^{(k)}_j|)\,|x_j|\Big) + \beta(x - x^{(k)}) + \frac{1}{\alpha}\sum_{i \in I} \mathrm{sgn}(A_i^T x - y_i)\,\big|A_i^T x - y_i\big|^{q-1} A_i. \end{aligned}$$
Let $u^{(k+1)} = (u^{(k+1)}_1, \ldots, u^{(k+1)}_N)^T$, where
$$u^{(k+1)}_j = \begin{cases} \hat{u}^{(k+1)}_j, & j \in S^{(k)}, \\ \frac{1}{\alpha}\Big(\sum_{i \in I} \mathrm{sgn}(A_i^T x^{(k+1)} - y_i)\,\big|A_i^T x^{(k+1)} - y_i\big|^{q-1} A_i\Big)_j, & \text{otherwise}, \end{cases}$$
with $\hat{u}^{(k+1)}_j$ as in (3.9).
Then $u^{(k+1)} \in \partial F^{(k)}(x^{(k+1)})$. Since $x^{(k+1)}_j = x^{(k)}_j = 0$ for any $j \in J \setminus S^{(k)}$, we have
$$\begin{aligned} \langle u^{(k+1)}, x^{(k)} - x^{(k+1)} \rangle &= \sum_{j \in S^{(k)}} u^{(k+1)}_j (x^{(k)}_j - x^{(k+1)}_j) = \sum_{j \in S^{(k)}} \hat{u}^{(k+1)}_j (z^{(k)}_j - \hat{z}^{(k+1)}_j) \\ &\ge -\|\hat{u}^{(k+1)}\|\,\|z^{(k)} - \hat{z}^{(k+1)}\| \\ [\text{by (3.6)}]\ &\ge -\frac{\beta\varepsilon}{2}\|z^{(k)} - \hat{z}^{(k+1)}\|^2 \\ [\text{by (3.4)}]\ &= -\frac{\beta\varepsilon}{2}\|x^{(k)} - x^{(k+1)}\|^2. \end{aligned} \qquad (4.5)$$
Putting (4.3), (4.4) and (4.5) together, and using the convexity of $F^{(k)}$, we obtain
$$E(x^{(k)}) = F^{(k)}(x^{(k)}) \ge F^{(k)}(x^{(k+1)}) + \langle u^{(k+1)}, x^{(k)} - x^{(k+1)} \rangle \ge F^{(k)}(x^{(k+1)}) - \frac{\beta\varepsilon}{2}\|x^{(k+1)} - x^{(k)}\|^2 \ge E(x^{(k+1)}) + \frac{\beta}{2}(1 - \varepsilon)\|x^{(k+1)} - x^{(k)}\|^2.$$
With the fact that $E(x)$ is bounded from below and $\frac{\beta}{2}(1 - \varepsilon) > 0$, it follows that $\{E(x^{(k)})\}$ is nonincreasing and converges to a finite value as $k \to \infty$. Thus $\lim_{k \to \infty} \|x^{(k+1)} - x^{(k)}\| = 0$. Because $E(x)$ is coercive, we know that $\{x^{(k)}\}$ is bounded.

For the case $q = 1$, the subdifferential of $F^{(k)}$ at $x$ is given by
$$\partial F^{(k)}(x) = \partial\Big(\sum_{j \in S^{(k)}} \phi'(|x^{(k)}_j|)\,|x_j|\Big) + \beta(x - x^{(k)}) + \frac{1}{\alpha}\sum_{i \in I} \partial|\cdot|(A_i^T x - y_i)\,A_i.$$
Let $u^{(k+1)} = (u^{(k+1)}_1, \ldots, u^{(k+1)}_N)^T$ with
$$u^{(k+1)}_j = \begin{cases} \hat{u}^{(k+1)}_j, & j \in S^{(k)}, \\ \frac{1}{\alpha}\big(\sum_{i \in I} \eta_i A_i\big)_j, & \text{otherwise}, \end{cases}$$
where $\hat{u}^{(k+1)}_j$ is as in (3.10) and $\eta_i \in \partial|\cdot|(A_i^T x^{(k+1)} - y_i)$. Then $u^{(k+1)} \in \partial F^{(k)}(x^{(k+1)})$. In a similar way, we can prove that (i) and (ii) hold. □

Recalling the results of Lemma 4.1, we now focus on iteration numbers $k \ge K$ to obtain the convergence of the sequence $\{x^{(k)}\}$.
Then the entries of $\hat{u}^{(k+1)}$ in (3.8) can be written as follows: for any $j \in S^{(k)}$, when $1 < q < \infty$,
$$\hat{u}^{(k+1)}_j = \mathrm{sgn}(x^{(k+1)}_j)\,\phi'(|x^{(k)}_j|) + \beta(x^{(k+1)}_j - x^{(k)}_j) + \frac{1}{\alpha}\Big(\sum_{i \in I} \mathrm{sgn}(A_i^T x^{(k+1)} - y_i)\,\big|A_i^T x^{(k+1)} - y_i\big|^{q-1} A_i\Big)_j, \qquad (4.6)$$
and when $q = 1$,
$$\hat{u}^{(k+1)}_j = \mathrm{sgn}(x^{(k+1)}_j)\,\phi'(|x^{(k)}_j|) + \beta(x^{(k+1)}_j - x^{(k)}_j) + \frac{1}{\alpha}\Big(\sum_{i \in I} \eta_i A_i\Big)_j. \qquad (4.7)$$
The condition (3.6) reads
$$\|\hat{u}^{(k+1)}\| \le \frac{\beta\varepsilon}{2}\|\hat{z}^{(k+1)} - z^{(k)}\| = \frac{\beta\varepsilon}{2}\|x^{(k+1)} - x^{(k)}\|, \qquad (4.8)$$
for $1 \le q < \infty$.

The following is a bound theory for the iteration sequence, which is important for establishing the convergence of $\{x^{(k)}\}$.

Theorem 4.3.
There are $0 < c < C < \infty$ such that either $x^{(k)}_j = 0$ or
$$c \le |x^{(k)}_j| \le C, \quad \forall j \in J,\ \forall k \ge K. \qquad (4.9)$$

Proof.
From Lemma 4.1, for any $j \in S^{(K)}$ and $k \ge K$, $x^{(k)}_j \ne 0$. We now prove by contradiction that $|x^{(k)}_j|$ has a nonzero lower bound and a finite upper bound for any $j \in S^{(K)}$ and all $k \ge K$.

For the case $1 < q < \infty$, assume there exists $j' \in S^{(K)}$ such that $x^{(k)}_{j'} \ne 0$ and $\lim_{k \to \infty} x^{(k)}_{j'} = 0$. Note that, if necessary, we can pass to a subsequence of $x^{(k)}_{j'}$. By letting
$$\zeta_{j'} = \beta(x^{(k+1)}_{j'} - x^{(k)}_{j'}) + \frac{1}{\alpha}\Big(\sum_{i \in I} \mathrm{sgn}(A_i^T x^{(k+1)} - y_i)\,\big|A_i^T x^{(k+1)} - y_i\big|^{q-1} A_i\Big)_{j'},$$
we have
$$\big|\phi'(|x^{(k)}_{j'}|)\big| \le \big|\hat{u}^{(k+1)}_{j'}\big| + |\zeta_{j'}| \qquad (4.10)$$
according to (4.6). It follows from the boundedness of $\{x^{(k)}\}$ (Lemma 4.2) that $|\zeta_{j'}|$ is bounded. The condition (4.8) implies that $|\hat{u}^{(k+1)}_{j'}|$ is also bounded. Since $\phi'(|x^{(k)}_{j'}|) = p|x^{(k)}_{j'}|^{p-1} \to \infty$ as $x^{(k)}_{j'} \to 0$, the inequality (4.10) cannot hold as $k \to \infty$.

For the case $q = 1$, by letting
$$\zeta_{j'} = \beta(x^{(k+1)}_{j'} - x^{(k)}_{j'}) + \frac{1}{\alpha}\Big(\sum_{i \in I} \eta_i A_i\Big)_{j'},$$
we have $\big|\phi'(|x^{(k)}_{j'}|)\big| \le \big|\hat{u}^{(k+1)}_{j'}\big| + |\zeta_{j'}|$ according to (4.7). Using the boundedness of the right-hand side, we can prove the result similarly. □

Compared with the lower bound theory for local minimizers in the literature [19], the bound theory in Theorem 4.3 is for the iterative sequence and hence more practical.
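Properties (ii) and (iii) of Proposition 2.1, which drive inequality (2.1) and the Lipschitz estimate used next, are easy to sanity-check numerically. Below is a quick check, assuming $p = 0.5$, $c = 0.1$, and taking $L_c = p(1-p)c^{p-2}$, the natural constant obtained from $\sup_{t \ge c} |\phi''(t)|$ (this explicit formula for $L_c$ is our choice, not stated in the text):

```python
import numpy as np

p, c = 0.5, 0.1
phi = lambda t: t ** p                   # phi(t) = t^p
dphi = lambda t: p * t ** (p - 1)        # phi'(t) = p t^(p-1)
L_c = p * (1 - p) * c ** (p - 2)         # sup of |phi''| on [c, inf)

rng = np.random.default_rng(0)
for x, y in rng.uniform(c, 10.0, size=(1000, 2)):
    # concavity inequality (2.1)
    assert phi(y) <= phi(x) + dphi(x) * (y - x) + 1e-12
    # Lipschitz continuity of phi' on [c, inf), as in (2.2)
    assert abs(dphi(x) - dphi(y)) <= L_c * abs(x - y) + 1e-12
```

The bound on $|\phi''|$ is attained at $t = c$ and blows up as $c \to 0$, which is exactly why the uniform lower bound of Theorem 4.3 is needed before Lipschitz arguments can be applied along the iterates.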
Theorem 4.3 and Proposition 2.1(iii) indicate that when $k \ge K$, there exists $L_c > 0$ such that for any $j \in S^{(K)}$,
$$\big|\phi'(|x^{(k+1)}_j|) - \phi'(|x^{(k)}_j|)\big| \le L_c \big||x^{(k+1)}_j| - |x^{(k)}_j|\big| \le L_c \big|x^{(k+1)}_j - x^{(k)}_j\big|. \qquad (4.11)$$
We now derive a subgradient lower bound for the iterates gap.

Lemma 4.4. For each $k \ge K$, there exists $v^{(k+1)} \in \partial E(x^{(k+1)})$ such that
$$\|v^{(k+1)}\| \le \Big(L_c + \frac{\beta(\varepsilon + 2)}{2}\Big)\|x^{(k+1)} - x^{(k)}\|. \qquad (4.12)$$

Proof.
For the case $1 < q < \infty$, denote
$$v^{(k+1)}_j = \mathrm{sgn}(x^{(k+1)}_j)\,\phi'(|x^{(k+1)}_j|) + \frac{1}{\alpha}\Big(\sum_{i \in I} \mathrm{sgn}(A_i^T x^{(k+1)} - y_i)\,\big|A_i^T x^{(k+1)} - y_i\big|^{q-1} A_i\Big)_j, \quad \forall j \in S^{(K)}, \qquad v^{(k+1)}_j = 0, \quad \forall j \in J \setminus S^{(K)};$$
$$\hat{v}^{(k+1)}_j = \mathrm{sgn}(x^{(k+1)}_j)\,\phi'(|x^{(k)}_j|) + \frac{1}{\alpha}\Big(\sum_{i \in I} \mathrm{sgn}(A_i^T x^{(k+1)} - y_i)\,\big|A_i^T x^{(k+1)} - y_i\big|^{q-1} A_i\Big)_j, \quad \forall j \in S^{(K)}, \qquad \hat{v}^{(k+1)}_j = 0, \quad \forall j \in J \setminus S^{(K)}.$$
Since $\partial\phi(|0|) = (-\infty, \infty)$, we have $v^{(k+1)} = (v^{(k+1)}_1, \ldots, v^{(k+1)}_N)^T \in \partial E(x^{(k+1)})$ and
$$\begin{aligned} \|\hat{v}^{(k+1)}\| &= \sqrt{\sum_{j \in S^{(K)}} |\hat{v}^{(k+1)}_j|^2} = \sqrt{\sum_{j \in S^{(K)}} \big|\hat{u}^{(k+1)}_j - \beta(x^{(k+1)}_j - x^{(k)}_j)\big|^2} \\ &\le \|\hat{u}^{(k+1)}\| + \beta\|x^{(k+1)} - x^{(k)}\| \\ [\text{by (4.8)}]\ &\le \frac{\beta(\varepsilon + 2)}{2}\|x^{(k+1)} - x^{(k)}\|. \end{aligned} \qquad (4.13)$$
From (4.11), it follows that
$$\begin{aligned} \|v^{(k+1)} - \hat{v}^{(k+1)}\| &= \sqrt{\sum_{j \in S^{(K)}} \big|\mathrm{sgn}(x^{(k+1)}_j)\,\phi'(|x^{(k+1)}_j|) - \mathrm{sgn}(x^{(k+1)}_j)\,\phi'(|x^{(k)}_j|)\big|^2} \\ [\text{by (4.11)}]\ &\le L_c \sqrt{\sum_{j \in S^{(K)}} \big||x^{(k+1)}_j| - |x^{(k)}_j|\big|^2} \le L_c \sqrt{\sum_{j \in S^{(K)}} \big|x^{(k+1)}_j - x^{(k)}_j\big|^2} = L_c \|x^{(k+1)} - x^{(k)}\|. \end{aligned} \qquad (4.14)$$
Combining (4.13) and (4.14) yields
$$\|v^{(k+1)}\| \le \|v^{(k+1)} - \hat{v}^{(k+1)}\| + \|\hat{v}^{(k+1)}\| \le \Big(L_c + \frac{\beta(\varepsilon + 2)}{2}\Big)\|x^{(k+1)} - x^{(k)}\|.$$
For the case $q = 1$, denote
$$v^{(k+1)}_j = \mathrm{sgn}(x^{(k+1)}_j)\,\phi'(|x^{(k+1)}_j|) + \frac{1}{\alpha}\Big(\sum_{i \in I} \eta_i A_i\Big)_j, \quad \forall j \in S^{(K)}, \qquad v^{(k+1)}_j = 0, \quad \forall j \in J \setminus S^{(K)};$$
$$\hat{v}^{(k+1)}_j = \mathrm{sgn}(x^{(k+1)}_j)\,\phi'(|x^{(k)}_j|) + \frac{1}{\alpha}\Big(\sum_{i \in I} \eta_i A_i\Big)_j, \quad \forall j \in S^{(K)}, \qquad \hat{v}^{(k+1)}_j = 0, \quad \forall j \in J \setminus S^{(K)}.$$
In a similar way, we can prove that (4.12) holds. □
Finally, we establish our main convergence result. An important tool for establishing the convergence is the so-called Kurdyka-Łojasiewicz (KL) property, which has attracted much attention in recent years. Related preliminaries are provided in the appendix (section 8).
Theorem 4.5.
The sequence $\{x^{(k)}\}$ generated by InISSAPL converges globally to a limit point $x^*$, which is a stationary point of $E$.

Proof. Since $\{x^{(k)}\}$ is bounded (Lemma 4.2), there exist a subsequence $(x^{(k_l)})$ and $x^*$ such that
$$x^{(k_l)} \to x^* \quad \text{and} \quad E(x^{(k_l)}) \to E(x^*), \quad \text{as } l \to \infty. \qquad (4.15)$$
The function $E$ satisfies the KL property [2]. Combining (4.2), (4.12) and (4.15), and by Theorem 2.9 in [2], the sequence $\{x^{(k)}\}$ converges globally to the limit point $x^*$, which is a stationary point of $E$. □

5 Implementation details

The subproblem in InISSAPL is a weighted $\ell_1$ minimization. Some standard methods such as ADMM [7, 42, 28, 44], the split Bregman method [26, 10, 11] and the primal-dual algorithm [14] can be used to solve it efficiently. We adopt ADMM here. For clarity of description in this section, we denote $S^{(k)}$, $\phi'(|z^{(k)}_j|)$, $B^{(k)}$ and $z^{(k)}$ by $S$, $w_j$, $\bar{B}$ and $\bar{z}$, respectively. Consequently, $(P_z)$ becomes
$$\min_z \sum_{j \in S} w_j |z_j| + \frac{1}{q\alpha}\|\bar{B}z - y\|_q^q + \frac{\beta}{2}\|z - \bar{z}\|^2. \qquad (5.1)$$
We rewrite (5.1) as the following constrained optimization problem:
$$\min_{z, s, t} \sum_{j \in S} w_j |s_j| + \frac{1}{q\alpha}\|t\|_q^q + \frac{\beta}{2}\|z - \bar{z}\|^2, \quad \text{s.t. } z = s,\ \bar{B}z - y = t, \qquad (5.2)$$
and define the augmented Lagrangian functional for the problem (5.2) as follows:
$$L(z, s, t; \lambda, \mu) = \sum_{j \in S} w_j |s_j| + \frac{1}{q\alpha}\|t\|_q^q + \frac{\beta}{2}\|z - \bar{z}\|^2 + \langle \lambda, z - s \rangle + \langle \mu, (\bar{B}z - y) - t \rangle + \frac{\gamma}{2}\|z - s\|^2 + \frac{\delta}{2}\|(\bar{B}z - y) - t\|^2,$$
where $\gamma, \delta > 0$ are the penalty parameters and $\lambda, \mu$ are the Lagrangian multipliers. The ADMM for solving (5.1) is described as follows.

ADMM: Alternating Direction Method of Multipliers for Solving (5.1)
Initialization:
Start with $z^{(0)} = \bar{z}$, $\lambda^{(0)} = 0$, $\mu^{(0)} = 0$.

Iteration:
For $l = 0, 1, \ldots$, MAXit:
1. Compute
$$(s^{(l+1)}, t^{(l+1)}) = \arg\min_{s, t} L(z^{(l)}, s, t; \lambda^{(l)}, \mu^{(l)}). \qquad (5.3)$$
2. Compute
$$z^{(l+1)} = \arg\min_z L(z, s^{(l+1)}, t^{(l+1)}; \lambda^{(l)}, \mu^{(l)}). \qquad (5.4)$$
3. Update
$$\lambda^{(l+1)} = \lambda^{(l)} + \gamma(z^{(l+1)} - s^{(l+1)}), \qquad (5.5)$$
$$\mu^{(l+1)} = \mu^{(l)} + \delta\big((\bar{B}z^{(l+1)} - y) - t^{(l+1)}\big). \qquad (5.6)$$

ADMM can solve (5.1) to any accuracy. For computational efficiency, we utilize in practice the following stopping criterion [7]:
$$\|\tau^{(l+1)}\| \le \sqrt{M + N}\,\epsilon^{\mathrm{abs}} + \epsilon^{\mathrm{rel}} \max\left\{ \left\|\begin{bmatrix} I \\ \bar{B} \end{bmatrix} z^{(l+1)}\right\|, \left\|\begin{bmatrix} -s^{(l+1)} \\ -t^{(l+1)} \end{bmatrix}\right\|, \left\|\begin{bmatrix} 0 \\ y \end{bmatrix}\right\| \right\},$$
$$\|\upsilon^{(l+1)}\| \le \sqrt{N}\,\epsilon^{\mathrm{abs}} + \epsilon^{\mathrm{rel}} \left\|\begin{bmatrix} I \\ \bar{B} \end{bmatrix}^T \begin{bmatrix} \lambda^{(l+1)} \\ \mu^{(l+1)} \end{bmatrix}\right\|,$$
where
$$\tau^{(l+1)} = \begin{bmatrix} z^{(l+1)} - s^{(l+1)} \\ \bar{B}z^{(l+1)} - y - t^{(l+1)} \end{bmatrix}, \qquad \upsilon^{(l+1)} = \begin{bmatrix} I \\ \bar{B} \end{bmatrix}^T \begin{bmatrix} \gamma(s^{(l)} - s^{(l+1)}) \\ \delta(t^{(l)} - t^{(l+1)}) \end{bmatrix}$$
are the primal and dual residuals, respectively, at the $l$th iteration, $\epsilon^{\mathrm{abs}} > 0$ is an absolute tolerance and $\epsilon^{\mathrm{rel}}$ is a relative tolerance.

The subproblems (5.3) and (5.4) can be solved efficiently.

1. For (5.3), the minimization with respect to $s$ and $t$ is
$$\min_{s, t} \sum_{j \in S} w_j |s_j| + \frac{1}{q\alpha}\|t\|_q^q - \langle \lambda^{(l)}, s \rangle - \langle \mu^{(l)}, t \rangle + \frac{\gamma}{2}\|z^{(l)} - s\|^2 + \frac{\delta}{2}\|(\bar{B}z^{(l)} - y) - t\|^2,$$
which can be separated into two independent subproblems.

1.1.
The minimization (5.3) with respect to $s$,
$$\min_s \sum_{j \in S} w_j |s_j| + \frac{\gamma}{2}\Big\|s - z^{(l)} - \frac{\lambda^{(l)}}{\gamma}\Big\|^2,$$
has the following closed-form solution:
$$s^{(l+1)}_j = \mathrm{sgn}\Big(z^{(l)}_j + \frac{\lambda^{(l)}_j}{\gamma}\Big) \max\left\{\Big|z^{(l)}_j + \frac{\lambda^{(l)}_j}{\gamma}\Big| - \frac{w_j}{\gamma},\ 0\right\}, \quad \forall j \in S.$$

1.2. The minimization (5.3) with respect to $t$ is
$$\min_t \frac{1}{q\alpha}\|t\|_q^q + \frac{\delta}{2}\Big\|t - (\bar{B}z^{(l)} - y) - \frac{\mu^{(l)}}{\delta}\Big\|^2.$$
For $q = 1$,
$$t^{(l+1)}_i = \mathrm{sgn}\Big((\bar{B}z^{(l)} - y)_i + \frac{\mu^{(l)}_i}{\delta}\Big) \max\left\{\Big|(\bar{B}z^{(l)} - y)_i + \frac{\mu^{(l)}_i}{\delta}\Big| - \frac{1}{\delta\alpha},\ 0\right\}, \quad \forall i \in I.$$
For $q = 2$,
$$t^{(l+1)}_i = \frac{\delta\alpha(\bar{B}z^{(l)} - y)_i + \alpha\mu^{(l)}_i}{1 + \delta\alpha}, \quad \forall i \in I.$$
For other $1 < q < \infty$, we can find $t^{(l+1)}_i$ via a numerical procedure such as Newton's method.

2. For (5.4), the minimization with respect to $z$ is a quadratic optimization problem,
$$\min_z \frac{\beta}{2}\|z - \bar{z}\|^2 + \langle \lambda^{(l)}, z \rangle + \langle \mu^{(l)}, \bar{B}z \rangle + \frac{\gamma}{2}\|z - s^{(l+1)}\|^2 + \frac{\delta}{2}\|(\bar{B}z - y) - t^{(l+1)}\|^2.$$
Its optimality condition gives a linear system
$$\big((\beta + \gamma)I + \delta \bar{B}^T \bar{B}\big) z = \beta\bar{z} + \gamma s^{(l+1)} + \delta \bar{B}^T(y + t^{(l+1)}) - \lambda^{(l)} - \bar{B}^T \mu^{(l)},$$
which can be solved efficiently [7].

Remark.
For q = 2, we actually only need to introduce one new variable s.

6 Numerical experiments

In this section, we present numerical experiments to demonstrate the efficiency of the InISSAPL algorithm. All the tests were performed using Windows 10 and MATLAB
R2016a (64-bit) on an HP Z228 microtower workstation with an Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz and 8GB of memory.

In our experiments, we generated the true signal $x_o$ of sparsity $\kappa$, supported on a random index set, with independently and identically distributed Gaussian entries. For the InISSAPL algorithm, we chose MAXit = 1000, $\epsilon_{\mathrm{abs}} = 10^{-}$ and $\epsilon_{\mathrm{rel}} = 10^{-}$ in the inner ADMM, and adopted the following stopping criterion for the outer iteration:
\[ \frac{\| x^{(k+1)} - x^{(k)} \|}{\| x^{(k)} \|} \le 10^{-}. \]

In our first example, we tested the InISSAPL algorithm for $\ell_p$-$\ell_2$ minimization to recover sparse vectors with $p$ varying among $\{0., 0., 0., 0., 0.\}$. We used a random Gaussian matrix $A$ and a true signal $x_o$ of sparsity $\kappa = 500$. Gaussian noise with $\sigma = 0.$ and $\sigma = 0.$ was added to the clean signal $A x_o$ to simulate the measurements $y$. The InISSAPL algorithm was applied to obtain recovered signals $x^*$. To assess the performance of our algorithm, we chose the results of LASSO as benchmarks, solved by ADMM-lasso [7]. For ADMM-lasso, we also set $\epsilon_{\mathrm{abs}} = 10^{-}$ and $\epsilon_{\mathrm{rel}} = 10^{-}$. We report the relative $L^2$ error $\frac{\| x^* - x_o \|}{\| x^* \|}$ in Table 1. As can be seen, for the low-level noise, our InISSAPL algorithm with smaller $p$ generates better results. However, with $p = 0.$, it is more robust to different levels of noise.

Table 1: Relative $L^2$ errors of LASSO and InISSAPL for $\ell_p$-$\ell_2$ minimization with different $p$.

6.2 Robust recovery from measurements with heavy-tailed noise

In the third example, we tested our InISSAPL algorithm for $\ell_p$-$\ell_1$ minimization. We chose $p = 0.$. We generated a random Gaussian matrix $A$ and a true signal $x_o$ of sparsity $\kappa = 25$.
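The test-data generation used in these experiments ($\kappa$-sparse signals with i.i.d. Gaussian entries on a random support, and scaled standard-Cauchy impulsive noise) and the relative-change outer stopping test can be sketched as follows; the helper names and the default tolerance are ours, since the paper's exact tolerance values are not legible here.

```python
import math
import random

def sparse_gaussian_signal(n, kappa, seed=0):
    # kappa-sparse true signal: i.i.d. standard Gaussian entries on a
    # uniformly random support of size kappa.
    rng = random.Random(seed)
    x = [0.0] * n
    for j in rng.sample(range(n), kappa):
        x[j] = rng.gauss(0.0, 1.0)
    return x

def cauchy_noise(m, scale, seed=0):
    # Standard (0,1)-Cauchy samples via the inverse CDF, then scaled.
    rng = random.Random(seed)
    return [scale * math.tan(math.pi * (rng.random() - 0.5)) for _ in range(m)]

def outer_converged(x_new, x_old, tol=1e-5):
    # Relative-change test  ||x^(k+1) - x^(k)|| / ||x^(k)|| <= tol
    # (tol here is illustrative only).
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_new, x_old)))
    base = math.sqrt(sum(b * b for b in x_old))
    return diff <= tol * base
```

A heavier `scale` in `cauchy_noise` produces more pronounced outliers of the kind visible in Figure 1(a).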
The clean measurements $A x_o$ were corrupted by impulsive noise obeying the standard $(0,1)$-Cauchy distribution, scaled by a factor of $10^{-}$. Figure 1(a) shows both the noiseless and noisy observations. The noisy observed signal closely approximates the noiseless observation almost everywhere, except for two outliers at entries 124 and 249. Figures 1(b)-(d) show the signals recovered by LASSO, by InISSAPL for $\ell_p$-$\ell_2$ minimization, and by InISSAPL for $\ell_p$-$\ell_1$ minimization, respectively. Note that LASSO and $\ell_p$-$\ell_2$ minimization are very sensitive to the outliers and fail to reconstruct the signal. As can be seen in Figure 1(d), $\ell_p$-$\ell_1$ minimization recovers the sparse signal with high accuracy.

(a) Noiseless and noisy observed signals
(b) Recovered signal by LASSO
(c) Recovered signal by $\ell_p$-$\ell_2$ minimization via InISSAPL

(d) Recovered signal by $\ell_p$-$\ell_1$ minimization via InISSAPL

Figure 1: Reconstructions of a sparse signal from measurements corrupted by impulsive noise using different methods. $\ell_p$-$\ell_1$ minimization via InISSAPL is the most robust in this case.

7 Conclusions
We proposed an iterative support shrinking algorithm for non-Lipschitz $\ell_p$-$\ell_q$ minimization. The proposed algorithm overcomes the non-Lipschitzianity by iteratively adding constraints on the support to the original problem. It is a new type of reweighted $\ell_1$ algorithm and is easy to implement. The subproblem in each iteration is solved inexactly by ADMM. We proved the global convergence of the iterative sequence, whose limit is a stationary point of the $\ell_p$-$\ell_q$ objective function. We also showed a more practical lower bound theory for the iterates. Numerical experiments demonstrated the performance of the algorithm. Due to the successive size reduction of the subproblem, it has good potential in applications to large-scale sparse signal recovery problems.

8 Appendix

We recall some definitions and results here.
Definition 8.1 (Subdifferentials [39]). Let $h : \mathbb{R}^N \to \mathbb{R} \cup \{+\infty\}$ be a proper, lower semicontinuous function.

(i) The regular subdifferential of $h$ at $\bar{x} \in \operatorname{dom} h = \{ x \in \mathbb{R}^N : h(x) < +\infty \}$ is defined as
\[ \hat{\partial} h(\bar{x}) := \left\{ v \in \mathbb{R}^N : \liminf_{\substack{x \to \bar{x} \\ x \neq \bar{x}}} \frac{h(x) - h(\bar{x}) - \langle v, x - \bar{x} \rangle}{\| x - \bar{x} \|} \ge 0 \right\}; \]

(ii) The (limiting) subdifferential of $h$ at $\bar{x} \in \operatorname{dom} h$ is defined as
\[ \partial h(\bar{x}) := \left\{ v \in \mathbb{R}^N : \exists\, x^{(k)} \to \bar{x},\; h(x^{(k)}) \to h(\bar{x}),\; v^{(k)} \in \hat{\partial} h(x^{(k)}),\; v^{(k)} \to v \right\}. \]

Remark.
From Definition 8.1, the following properties hold:

(i) For any $\bar{x} \in \operatorname{dom} h$, $\hat{\partial} h(\bar{x}) \subset \partial h(\bar{x})$. If $h$ is continuously differentiable at $\bar{x}$, then $\hat{\partial} h(\bar{x}) = \partial h(\bar{x}) = \{ \nabla h(\bar{x}) \}$;

(ii) For any $\bar{x} \in \operatorname{dom} h$, the subdifferential set $\partial h(\bar{x})$ is closed, i.e.,
\[ \left\{ v \in \mathbb{R}^N : \exists\, x^{(k)} \to \bar{x},\; h(x^{(k)}) \to h(\bar{x}),\; v^{(k)} \in \partial h(x^{(k)}),\; v^{(k)} \to v \right\} \subset \partial h(\bar{x}). \]

The foundational works on the Kurdyka-Łojasiewicz (KL) property are due to Łojasiewicz [34] and Kurdyka [31]. For the development of applications of the KL property in optimization theory, see [5, 1, 2, 6] and references therein.
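As a simple one-dimensional illustration of Definition 8.1 (and of how the two subdifferentials can differ), consider the functions $h(x) = |x|$ and $h(x) = -|x|$ at $\bar{x} = 0$:

```latex
% For h(x) = |x| at \bar{x} = 0, the regular and limiting
% subdifferentials coincide:
\hat{\partial} h(0) = \partial h(0) = [-1, 1].
% For h(x) = -|x| at \bar{x} = 0 they differ: no vector v satisfies
% the liminf inequality from both sides, while the gradients \pm 1
% of nearby points survive the limit:
\hat{\partial} h(0) = \emptyset, \qquad \partial h(0) = \{-1, +1\}.
```

The second example shows why the limiting subdifferential, rather than the regular one, is the right object for nonconvex functions such as the $\ell_p$ quasi-norm.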
Definition 8.2 (Kurdyka-Łojasiewicz property [1]). A proper function $h$ is said to have the Kurdyka-Łojasiewicz property at $\bar{x} \in \operatorname{dom} \partial h = \{ x \in \mathbb{R}^N : \partial h(x) \neq \emptyset \}$ if there exist $\zeta \in (0, +\infty]$, a neighborhood $U$ of $\bar{x}$, and a continuous concave function $\varphi : [0, \zeta) \to \mathbb{R}_+$ such that

(i) $\varphi(0) = 0$;
(ii) $\varphi$ is $C^1$ on $(0, \zeta)$;
(iii) for all $s \in (0, \zeta)$, $\varphi'(s) > 0$;
(iv) for all $x \in U$ satisfying $h(\bar{x}) < h(x) < h(\bar{x}) + \zeta$, the Kurdyka-Łojasiewicz inequality holds:
\[ \varphi'\big( h(x) - h(\bar{x}) \big)\, \operatorname{dist}(0, \partial h(x)) \ge 1, \]
where $\operatorname{dist}(0, \partial h(x)) = \min \{ \| v \| : v \in \partial h(x) \}$.

A proper, lower semicontinuous function $h$ satisfying the KL property at all points in $\operatorname{dom} \partial h$ is called a KL function. One can refer to [1, 2, 6] for examples of KL functions. For this paper, the function $E$ satisfies the KL property [2].

References

[1] H. Attouch, J. Bolte, P. Redont, and A. Soubeyran. Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the Kurdyka-Łojasiewicz inequality.
Math. Oper. Res., 35(2):438–457, 2010.
[2] H. Attouch, J. Bolte, and B. F. Svaiter. Convergence of descent methods for semi-algebraic and tame problems: Proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods. Math. Program., 137(1-2):91–129, 2013.
[3] A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci., 2(1):183–202, 2009.
[4] W. Bian and X. Chen. Worst-case complexity of smoothing quadratic regularization methods for non-Lipschitzian optimization. SIAM J. Optim., 23(3):1718–1741, 2013.
[5] J. Bolte, A. Daniilidis, and A. Lewis. The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems. SIAM J. Optim., 17(4):1205–1223, 2006.
[6] J. Bolte, S. Sabach, and M. Teboulle. Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program., 146(1-2):459–494, 2014.
[7] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn., 3(1):1–122, 2011.
[8] K. Bredies, D. A. Lorenz, and S. Reiterer. Minimization of non-smooth, non-convex functionals by iterative thresholding. J. Optim. Theory Appl., 165(1):78–112, 2015.
[9] A. M. Bruckstein, D. L. Donoho, and M. Elad. From sparse solutions of systems of equations to sparse modeling of signals and images. SIAM Rev., 51(1):34–81, 2009.
[10] J. Cai, S. Osher, and Z. Shen. Convergence of the linearized Bregman iteration for $\ell_1$-norm minimization. Math. Comput., 78(268):2127–2136, 2009.
[11] J. Cai, S. Osher, and Z. Shen. Linearized Bregman iterations for compressed sensing. Math. Comput., 78(267):1515–1536, 2009.
[12] E. J. Candès, J. Romberg, and T. Tao. Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inform. Theory, 52(2):489–509, 2006.
[13] E. J. Candès, M. B. Wakin, and S. P. Boyd. Enhancing sparsity by reweighted $\ell_1$ minimization. J. Fourier Anal. Appl., 14(5):877–905, 2008.
[14] A. Chambolle and T. Pock. A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vision, 40(1):120–145, 2011.
[15] R. Chartrand and V. Staneva. Restricted isometry properties and nonconvex compressive sensing. Inverse Problems, 24(3):035020, 2008.
[16] R. Chartrand and W. Yin. Iteratively reweighted algorithms for compressive sensing. In Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, pages 3869–3872, 2008.
[17] X. Chen. Smoothing methods for nonsmooth, nonconvex minimization. Math. Program., 134(1):71–99, 2012.
[18] X. Chen, L. Niu, and Y. Yuan. Optimality conditions and a smoothing trust region Newton method for non-Lipschitz optimization. SIAM J. Optim., 23(3):1528–1552, 2013.
[19] X. Chen, F. Xu, and Y. Ye. Lower bound theory of nonzero entries in solutions of $\ell_2$-$\ell_p$ minimization. SIAM J. Sci. Comput., 32(5):2832–2852, 2010.
[20] X. Chen and W. Zhou. Convergence of the reweighted $\ell_1$ minimization algorithm for $\ell_2$-$\ell_p$ minimization. Comput. Optim. Appl., 59(1):47–61, 2014.
[21] I. Daubechies, M. Defrise, and C. De Mol. An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Comm. Pure Appl. Math., 57(11):1413–1457, 2004.
[22] I. Daubechies, R. A. DeVore, M. Fornasier, and C. S. Güntürk. Iteratively reweighted least squares minimization for sparse recovery. Comm. Pure Appl. Math., 63(1):1–38, 2010.
[23] T. E. Dielman. Least absolute value regression: recent contributions. J. Stat. Comput. Simul., 75(4):263–286, 2005.
[24] D. L. Donoho. Compressed sensing. IEEE Trans. Inform. Theory, 52(4):1289–1306, 2006.
[25] S. Foucart and M.-J. Lai. Sparsest solutions of underdetermined linear systems via $\ell_q$-minimization for $0 < q \le 1$. Appl. Comput. Harmon. Anal., 26(3):395–407, 2009.
[26] T. Goldstein and S. Osher. The split Bregman method for L1-regularized problems. SIAM J. Imaging Sci., 2(2):323–343, 2009.
[27] I. F. Gorodnitsky and B. D. Rao. Sparse signal reconstruction from limited data using FOCUSS: a re-weighted minimum norm algorithm. IEEE Trans. Signal Process., 45(3):600–616, 1997.
[28] B. He and X. Yuan. On the O(1/n) convergence rate of the Douglas-Rachford alternating direction method. SIAM J. Numer. Anal., 50(2):700–709, 2012.
[29] P. J. Huber. Robust Statistics. Wiley, New York, 1981.
[30] D. Krishnan and R. Fergus. Fast image deconvolution using hyper-Laplacian priors. In Proc. 22nd Int. Conf. Neural Information Processing Systems, pages 1033–1041, 2009.
[31] K. Kurdyka. On gradients of functions definable in o-minimal structures. Ann. Inst. Fourier (Grenoble), 48(3):769–783, 1998.
[32] M.-J. Lai and J. Wang. An unconstrained $\ell_q$ minimization with $0 < q \le 1$ for sparse solution of underdetermined linear systems. SIAM J. Optim., 21(1):82–101, 2011.
[33] M.-J. Lai, Y. Xu, and W. Yin. Improved iteratively reweighted least squares for unconstrained smoothed $\ell_q$ minimization. SIAM J. Numer. Anal., 51(2):927–957, 2013.
[34] S. Łojasiewicz. Une propriété topologique des sous-ensembles analytiques réels. In Les Équations aux Dérivées Partielles (Paris, 1962), pages 87–89. Éditions du Centre National de la Recherche Scientifique, Paris, 1963.
[35] Z. Lu. Iterative reweighted minimization methods for $\ell_p$ regularized unconstrained nonlinear programming. Math. Program., 147(1):277–307, 2014.
[36] J. Lv and Y. Fan. A unified approach to model selection and sparse recovery using regularized least squares. Ann. Statist., 37(6A):3498–3528, 2009.
[37] B. K. Natarajan. Sparse approximate solutions to linear systems. SIAM J. Comput., 24(2):227–234, 1995.
[38] J. L. Paredes and G. R. Arce. Compressive sensing signal reconstruction by weighted median regression estimates. IEEE Trans. Signal Process., 59(6):2585–2601, 2011.
[39] R. T. Rockafellar and R. J.-B. Wets. Variational Analysis, volume 317 of Grundlehren der Mathematischen Wissenschaften. Springer-Verlag, Berlin Heidelberg, 1998.
[40] Q. Sun. Recovery of sparsest signals via $\ell_q$-minimization. Appl. Comput. Harmon. Anal., 32(3):329–341, 2012.
[41] J. A. Tropp and S. J. Wright. Computational methods for sparse solution of linear inverse problems. Proc. IEEE, 98(6):948–958, 2010.
[42] C. Wu and X.-C. Tai. Augmented Lagrangian method, dual methods, and split Bregman iteration for ROF, vectorial TV, and high order models. SIAM J. Imaging Sci., 3(3):300–339, 2010.
[43] Z. Xu, X. Chang, F. Xu, and H. Zhang. $L_{1/2}$ regularization: A thresholding representation theory and a fast solver. IEEE Trans. Neural Netw. Learn. Syst., 23(7):1013–1027, 2012.
[44] M. Yan and W. Yin. Self equivalence of the alternating direction method of multipliers. In R. Glowinski, S. J. Osher, and W. Yin, editors, Splitting Methods in Communication, Imaging, Science, and Engineering, Scientific Computation. Springer, Cham, 2016.
[45] H. Zou and R. Li. One-step sparse estimates in nonconcave penalized likelihood models. Ann. Statist., 36(4):1509–1533, 2008.
[46] W. Zuo, D. Meng, L. Zhang, X. Feng, and D. Zhang. A generalized iterated shrinkage algorithm for non-convex sparse coding. In