Linear convergence of inexact descent method and inexact proximal gradient algorithms for lower-order regularization problems
YAOHUA HU*, CHONG LI†, KAIWEN MENG‡, AND XIAOQI YANG§

* College of Mathematics and Statistics, Shenzhen University, Shenzhen 518060, P. R. China ([email protected]). This author's work was supported in part by the National Natural Science Foundation of China (11601343) and Natural Science Foundation of Guangdong (2016A030310038).
† School of Mathematical Sciences, Zhejiang University, Hangzhou 310027, P. R. China ([email protected]). This author's work was supported in part by the National Natural Science Foundation of China (11571308).
‡ School of Economics and Management, Southwest Jiaotong University, Chengdu 610031, P. R. China (mk-wfl[email protected]). This author's work was supported in part by the National Natural Science Foundation of China (11671329).
§ Department of Applied Mathematics, The Hong Kong Polytechnic University, Kowloon, Hong Kong ([email protected]). This author's work was supported in part by the Research Grants Council of Hong Kong (PolyU 152167/15E).

Abstract. The ℓ_p regularization problem with 0 < p < 1 is widely used for finding sparse solutions of linear inverse problems. In the present paper, we investigate the linear convergence issue of one inexact descent method and two inexact proximal gradient algorithms (PGA). For this purpose, an optimality condition theorem is explored to provide the equivalences among a local minimum, second-order optimality condition and second-order growth property of the ℓ_p regularization problem. By virtue of the second-order optimality condition and second-order growth property, we establish the linear convergence properties of the inexact descent method and inexact PGAs under some simple assumptions. Both linear convergence to a local minimal value and linear convergence to a local minimum are provided. Finally, the linear convergence results of the inexact numerical methods are extended to infinite-dimensional Hilbert spaces.

Key words. sparse optimization, nonconvex regularization, descent methods, proximal gradient algorithms, linear convergence.
AMS subject classifications.
Primary, 65K05, 65J22; Secondary, 90C26, 49M37
1. Introduction.
The following linear inverse problem is at the core of many problems in various areas of mathematics and applied sciences: finding x ∈ R^n such that

    Ax = b,

where A ∈ R^{m×n} and b ∈ R^m are known, and an unknown noise may be included in b. If m ≪ n, the above linear inverse problem is seriously ill-conditioned and has infinitely many solutions, and researchers are interested in finding solutions with certain structures, e.g., the sparsity structure. A popular technique for approaching a sparse solution of the linear inverse problem is to solve the ℓ₁ regularization problem

    min_{x∈R^n} ||Ax − b||² + λ||x||₁,
where ||·|| denotes the Euclidean norm, ||x||₁ := Σ_{i=1}^n |x_i| is a sparsity promoting norm, and λ > 0 is the regularization parameter. The ℓ₁ regularization problem has been extensively investigated (see, e.g., [4, 17, 18, 35, 51, 54]) and has gained successful applications in a wide range of fields, such as compressive sensing [12, 19], image science [4, 20], systems biology [44, 48] and machine learning [3, 33].

However, in recent years, it has been revealed by extensive empirical studies that the solutions obtained from the ℓ₁ regularization may be much less sparse than the true sparse solution, and that the ℓ₁ regularization cannot recover a signal or an image with the least measurements when applied to compressive sensing; see, e.g., [14, 53, 58]. To overcome these drawbacks, the following ℓ_p regularization problem (0 < p < 1) was introduced in [14, 53] to improve the performance of sparsity recovery:

    min_{x∈R^n} ||Ax − b||² + λ||x||_p^p,                                            (1.1)

where ||x||_p := (Σ_{i=1}^n |x_i|^p)^{1/p} is the ℓ_p quasi-norm. It was shown in [14] that the ℓ_p regularization requires a weaker restricted isometry property to guarantee perfect sparsity recovery and allows a sparser solution to be obtained from fewer linear measurements than that required by the ℓ₁ regularization; and it was illustrated in [23, 53] that the ℓ_p regularization has a significantly stronger capability in obtaining a sparse solution than the ℓ₁ regularization. Benefiting from these advantages, the ℓ_p regularization technique has been applied in many fields; see [23, 34, 38, 39] and references therein. It is worth noting that the ℓ_p regularization problem (1.1) is a variant of lower-order penalty problems, investigated in [11, 25, 31], for a constrained optimization problem. The main advantage of the lower-order penalty functions over the classical ℓ₁ penalty function in the context of constrained optimization is that they require weaker conditions to guarantee an exact penalization property and that their least exact penalty parameter is smaller.

Motivated by these significant advantages and successful applications of the ℓ_p regularization, tremendous efforts have been devoted to the study of optimization algorithms for the ℓ_p regularization problem. Many practical algorithms have been investigated for solving problem (1.1), such as an interior-point potential reduction algorithm [22], smoothing methods [15, 16], splitting methods [27, 28] and iterative reweighted minimization methods [26, 29]. In particular, Xu et al. [53] proposed an iterative half thresholding algorithm, which is efficient in signal recovery and image deconvolution. In the present paper, we are particularly interested in the proximal gradient algorithm (in short, PGA) for solving problem (1.1), which reduces to the algorithm proposed in [53] when p = 1/2.

Algorithm PGA. Given an initial point x⁰ ∈ R^n and a sequence of stepsizes {v_k} ⊆ R₊, for each k ∈ N, having x^k, we determine x^{k+1} as follows:

    z^k := x^k − 2v_k A^⊤(Ax^k − b),
    x^{k+1} ∈ argmin_{x∈R^n} { λ||x||_p^p + (1/(2v_k))||x − z^k||² }.                (1.2)

The PGA is one of the most widely studied first-order iterative algorithms for solving regularization problems, and a special case of several iterative methods (see [1, 2, 8, 40, 47]) for solving the composite minimization problem

    min_{x∈R^n} F(x) := H(x) + Φ(x),                                                 (1.3)

where H: R^n → R̄ := R ∪ {+∞} is smooth and convex, and Φ: R^n → R̄ is nonsmooth and possibly nonconvex. The convergence properties of these iterative methods have been explored under the framework of the so-called Kurdyka-Łojasiewicz (in short, KL) theory. In particular, Attouch et al. [2] established the global convergence of abstract descent methods for minimizing a KL function F: R^n → R̄ (see [2, Definition 2.4] for the definition of a KL function), in which the sequence {x^k} satisfies the following hypotheses for two positive constants α and β:

(H1) (Sufficient decrease condition). For each k ∈ N, F(x^{k+1}) − F(x^k) ≤ −α||x^{k+1} − x^k||².
(H2) (Relative error condition). For each k ∈ N, there exists w^{k+1} ∈ ∂F(x^{k+1}) such that ||w^{k+1}|| ≤ β||x^{k+1} − x^k||.
(H3) (Continuity condition). There exist a subsequence {x^{k_j}} and a point x* such that lim_{j→∞} x^{k_j} = x* and lim_{j→∞} F(x^{k_j}) = F(x*). (This condition is satisfied automatically for the ℓ_p regularization problem (1.1).)
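Returning to Algorithm PGA above, the following self-contained Python sketch runs it on a toy sparse-recovery instance. The scalar proximal subproblem in (1.2) is solved by a brute-force grid search, a simple stand-in for the analytic thresholding formulas available for special values of p (e.g., the half thresholding operator of [53] for p = 1/2); all data and names in the snippet are illustrative and are not taken from the paper.

```python
import numpy as np

def prox_lp_scalar(z, lam, v, p, grid=2000):
    """Approximately solve min_t lam*|t|^p + (t - z)^2/(2v) by grid search.

    Any minimizer is either t = 0 or a point with the same sign as z and
    magnitude at most |z|, so scanning sign(z)*[0, |z|] suffices.
    """
    if z == 0.0:
        return 0.0
    ts = np.sign(z) * np.linspace(0.0, abs(z), grid)
    vals = lam * np.abs(ts) ** p + (ts - z) ** 2 / (2.0 * v)
    return ts[np.argmin(vals)]

def pga(A, b, lam, p, v, x0, iters=300):
    """Algorithm PGA: gradient step on H(x) = ||Ax - b||^2, then prox step (1.2)."""
    x = x0.copy()
    for _ in range(iters):
        z = x - 2.0 * v * A.T @ (A @ x - b)
        x = np.array([prox_lp_scalar(zi, lam, v, p) for zi in z])
    return x

# toy sparse-recovery instance with m << n
rng = np.random.default_rng(0)
m, n = 20, 50
A = rng.standard_normal((m, n))
x_true = np.zeros(n); x_true[[3, 17, 31]] = [1.5, -2.0, 1.0]
b = A @ x_true
v = 0.45 / np.linalg.norm(A, 2) ** 2      # constant stepsize below 1/(2||A||^2)
x = pga(A, b, lam=0.1, p=0.5, v=v, x0=np.zeros(n))
print("recovered support:", np.nonzero(np.abs(x) > 1e-6)[0])
```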
The global convergence of Algorithm PGA follows from the established convergence results of [2].

The study of convergence rates of optimization algorithms is an important issue of numerical optimization, and much attention has been paid to establishing the convergence rates of relevant iterative algorithms for solving the structured optimization problem (1.3); see [1, 7, 24, 27, 36, 46, 47, 50, 52] and references therein. For example, the linear convergence of the PGA for solving the classical ℓ₁ (convex) regularization problem has been well investigated; see, e.g., [9, 45, 56, 57] and references therein. Under the general framework of the KL (possibly nonconvex) functions, the linear convergence of several iterative algorithms for solving problem (1.3), including the PGA as a special case, has been established in [1, 8, 47, 52] under the assumption that the KL exponent of the objective function is 1/2. However, the KL exponent of the ℓ_p regularized function is still unknown, and thus the linear convergence results in these references cannot be directly applied to the ℓ_p regularization problem (1.1). On the other hand, Zeng et al. [55] obtained the linear convergence of the PGA for problem (1.1) with an upper bound on p, which may be less than 1, and a lower bound on the stepsizes {v_k}, and Hu et al. [23] established the linear convergence of the PGA for the group-wise ℓ_p regularization problem under the assumption that the limiting point is a local minimum.

Another important issue is the practicability of the PGA for solving the ℓ_p regularization problem (1.1). It is worth noting that the main computation of the PGA is the calculation of the proximity operator of the ℓ_p regularizer in (1.2). The analytical solutions of the proximity operator of the ℓ_p regularizer in (1.2) when p = 1 (resp., 2/3, 1/2, 0) were provided in [18] (resp., [13], [53], [6]); see also [23, Proposition 18] for the group-wise ℓ_p regularizer. However, in the scenario of general p, the proximity operator of the ℓ_p regularizer may not have an analytic solution (see [23, Remark 21]), and it could be computationally expensive to solve subproblem (1.2) exactly at each iteration. Although some recent works showed impressive empirical performance of inexact versions of the PGA that use an approximate proximity operator (see, e.g., [23, 32] and references therein), there is little theoretical analysis, to the best of our knowledge, on how the error in the calculation of the proximity operator affects the convergence rate of the inexact PGA for solving the ℓ_p regularization problem (1.1). Two relevant papers on the linear convergence study of the inexact PGA should be mentioned: (a) Schmidt et al. [43] proved the linear convergence of the inexact PGA for solving the convex composite problem (1.3), in which H is strongly convex and Φ is convex; (b) Frankel et al. [21] provided a framework for establishing the linear convergence of descent methods satisfying (H1)-(H3), where (H2) is replaced by the inexact form (H2°); see section 4. However, the convergence analysis in [21] was based on the assumption that the KL exponent of F is 1/2, and the inexact version would not be convenient to implement in applications; see the explanation in Remark 5.2 below. Therefore, neither of the convergence analyses in [21, 43] can be applied to establish the linear convergence of the inexact PGA for solving the ℓ_p regularization problem.
Thus, a clear analysis of the convergence rate of the inexact PGA is required to advance our understanding of its strength for solving the ℓ_p regularization problem (1.1).

The aim of the present paper is to investigate the linear convergence issue of an inexact descent method and inexact PGAs for solving the ℓ_p regularization problem (1.1). For this purpose, we first investigate an optimality condition theorem for the local minima of the ℓ_p regularization problem (1.1), in which we establish the equivalences among a local minimum, second-order optimality condition and second-order growth property of the ℓ_p regularization problem (1.1). The established optimality conditions are not only of independent interest (in particular, they improve the result in [16]) in investigating the structure of local minima, but also provide a crucial tool for establishing the linear convergence of the inexact descent method and inexact PGAs for solving the ℓ_p regularization problem in sections 4 and 5.

We then consider a general framework of an inexact descent method, in which both (H1) and (H2) are relaxed to inexact forms (see (H1°) and (H2°) in section 4), for solving the ℓ_p regularization problem. Correspondingly, the solution sequence does not satisfy the descent property. This is an essential difference from the extensive studies of descent methods and the work of Frankel et al. [21]. Under some mild assumptions on the limiting points and inexact terms, we establish the linear convergence of the inexact descent method by virtue of both the second-order optimality condition and the second-order growth property (see Theorem 4.2).

The convergence theorem for the inexact descent method further provides a useful tool for establishing the linear convergence of the inexact PGAs in section 5. Our convergence analysis deviates significantly from that of [21] and relevant works on descent methods, where the KL inequality is used as a standard technique. Indeed, we investigate inexact versions of the PGA for solving the ℓ_p regularization problem (1.1), in which the proximity operator of the ℓ_p regularizer in (1.2) is approximately solved at each iteration (with progressively better accuracy). Inspired by the ideas in the seminal work of Rockafellar [41], we consider two types of inexact PGAs: one measures the inexact term by the approximation of the proximal regularized function value, and the other by the distance of the iterate to the exact proximal operator (see Algorithms IPGA-I and IPGA-II). Under some suitable assumptions on the inexact terms, we establish the linear convergence of these two inexact PGAs to a local minimum of problem (1.1); see Theorems 5.3 and 5.4. It is worth noting that neither of these inexact PGAs satisfies the conditions of the inexact descent method mentioned earlier; see the explanation in Remark 5.1(ii). In our analysis in this part, Theorem 4.2 plays an important role in that it allows us to show that the restrictions of the iterates to the support of the limiting point eventually satisfy the conditions of the inexact descent method.

Finally, we extend the linear convergence results to the ℓ_p regularization problem in infinite-dimensional Hilbert spaces. In this setting, the authors of [10] studied the PGA for a related nonsmooth, nonconvex regularization problem in infinite-dimensional Hilbert spaces and proved its global convergence to a critical point under some technical assumptions and using dedicated tools from algebraic geometry; see the explanation before Theorem 6.4. Dropping these technical assumptions, we prove the global convergence of the PGA under the only assumption on stepsizes (as in [10]), which significantly improves [10, Theorem 5.1], and, under a simple additional assumption, further establish the linear convergence of the descent method and PGA, as well as their inexact versions, for solving the ℓ_p regularization problem in infinite-dimensional Hilbert spaces.

The paper is organized as follows. In section 2, we present the notation and preliminary results to be used in the present paper. In section 3, we establish the equivalences among a local minimum, second-order optimality condition and second-order growth property of the ℓ_p regularization problem (1.1), as well as some interesting corollaries. By virtue of the second-order optimality condition and second-order growth property, the linear convergence of an inexact descent method and of inexact PGAs for solving problem (1.1) is established in sections 4 and 5, respectively. Finally, the convergence properties of the relevant algorithms are extended to infinite-dimensional Hilbert spaces in section 6.
2. Notation and preliminary results.
We consider the n-dimensional Euclidean space R^n with inner product ⟨·,·⟩ and Euclidean norm ||·||. For 0 < p < 1, the ℓ_p "norm" on R^n is denoted by ||·||_p and defined by

    ||x||_p := (Σ_{i=1}^n |x_i|^p)^{1/p}  for each x ∈ R^n,

while ||x||₀ denotes the number of nonzero components of x. It is well known (see, e.g., [23, Eq. (7)]) that

    ||x||_p ≥ ||x||_q  for each x ∈ R^n and 0 < p ≤ q.                              (2.1)

We write supp(x) := {i : x_i ≠ 0} for the support of x ∈ R^n and sign: R → R for the signum function. For an integer l ≤ n, fixing x ∈ R^l and δ ∈ R₊, we use B(x, δ) to denote the open ball of radius δ centered at x (in the Euclidean norm). Moreover, we write

    R^l_≠ := {x ∈ R^l : x_i ≠ 0 for each i = 1, ..., l}.

Let R^{l×l} denote the space of all l×l matrices. We endow R^{l×l} with the partial orders ≻ and ⪰, which are defined for any Y, Z ∈ R^{l×l} by

    Y ≻ (resp., ⪰) Z  ⟺  Y − Z is positive definite (resp., positive semi-definite).

Thus, for Z ∈ R^{l×l}, Z ≻ 0 (resp., Z ⪰ 0, Z ≺ 0) means that Z is positive definite (resp., positive semi-definite, negative definite). In particular, we use diag(x) to denote a square diagonal matrix with the components of vector x on its main diagonal.
For simplicity, associated with problem (1.1), we use F: R^n → R to denote the ℓ_p regularized function, and H: R^n → R and Φ: R^n → R are the functions defined by

    F(·) := H(·) + Φ(·),  H(·) := ||A· − b||²  and  Φ(·) := λ||·||_p^p.              (2.2)

Letting x* ∈ R^n \ {0}, we write

    s := ||x*||₀  and  I := supp(x*).                                               (2.3)

We write A_i to denote the i-th column of A, A_I := (A_i)_{i∈I} and x_I := (x_i)_{i∈I}. Let f: R^s → R, h: R^s → R and φ: R^s → R be the functions defined by

    f(·) := h(·) + φ(·),  h(·) := ||A_I· − b||²  and  φ(·) := λ||·||_p^p.            (2.4)

Obviously, φ is smooth (of arbitrary order) on R^s_≠, and so is f. The first- and second-order derivatives of φ at each y ∈ R^s_≠ are respectively given by

    ∇φ(y) = λp (|y_i|^{p−1} sign(y_i))_{i∈I}  and  ∇²φ(y) = λp(p−1) diag((|y_i|^{p−2})_{i∈I}).   (2.5)

Since 0 < p < 1, it is clear that ∇²φ(y) ≺ 0 for each y ∈ R^s_≠. By (2.2) and (2.4), one sees that

    Φ(x) = φ(x_I)  and  F(x) = f(x_I)  for each x satisfying supp(x) = I.           (2.6)

The point x* is called a critical point of problem (1.1) if it satisfies ∇f(x*_I) = 0. The following elementary equality (Taylor's formula applied to the function ||A· − b||²) is repeatedly used in our convergence analysis:

    ||Ay − b||² − ||Ax − b||² = 2⟨y − x, A^⊤(Ax − b)⟩ + ||A(y − x)||².               (2.7)

We end this section by providing the following lemma, which is useful for establishing the linear convergence of inexact descent methods.

Lemma 2.1.
Let η ∈ (0, 1), and let {a_k} and {δ_k} be two sequences of nonnegative scalars such that

    a_{k+1} ≤ η a_k + δ_k  for each k ∈ N  and  limsup_{k→∞} δ_{k+1}/δ_k < 1.       (2.8)

Then there exist θ ∈ (0, 1) and K > 0 such that

    a_k ≤ K θ^k  for each k ∈ N.                                                    (2.9)

Proof. We first claim that there exist θ ∈ (0, 1) and a sequence of nonnegative scalars {c_k} such that

    a_{k+1} ≤ θ a_k + c_k θ^k  for each k ∈ N  and  Σ_{k=0}^∞ c_k < +∞.             (2.10)

Indeed, by the second inequality of (2.8), there exist τ ∈ (0, 1) and N ∈ N such that δ_{k+1} ≤ τ² δ_k for each k ≥ N. Letting c_i := τ^{i−2N} δ_N when i ≥ N and c_i := δ_i τ^{−i} otherwise, one checks that

    δ_k ≤ c_k τ^k  for each k ∈ N,                                                  (2.11)

and that Σ_{k=0}^∞ c_k = Σ_{k=0}^{N−1} c_k + (τ^{−N}/(1−τ)) δ_N < +∞. Letting θ := max{η, τ} and combining (2.8) and (2.11), we arrive at (2.10), as desired.

Next, we show by mathematical induction that the following relation holds for each k ∈ N:

    a_k ≤ max{1, a_0/(c_0 + θ)} Π_{i=0}^{k−1} (c_i + θ).                            (2.12)

Clearly, (2.12) holds for k = 1. Assuming that (2.12) holds for each k ≤ N, we estimate a_{N+1} in the following two cases.

Case 1. If a_N < θ^N, it follows from the first inequality of (2.10) that

    a_{N+1} ≤ (θ + c_N) θ^N ≤ Π_{i=0}^N (c_i + θ) ≤ max{1, a_0/(c_0 + θ)} Π_{i=0}^N (c_i + θ).

Case 2. If a_N ≥ θ^N, one sees by (2.10) and (2.12) (with k = N) that

    a_{N+1} ≤ (θ + c_N) a_N ≤ max{1, a_0/(c_0 + θ)} Π_{i=0}^N (c_i + θ).

Hence, in both cases, (2.12) holds for k = N + 1, and so it holds for each k ∈ N by mathematical induction. Clearly, (2.12) can be reformulated as

    a_k ≤ max{1, a_0/(c_0 + θ)} θ^k exp(Σ_{i=0}^{k−1} ln(1 + c_i/θ)).               (2.13)

Note that ln(1 + t) ≤ t for any t ≥ 0. It follows that

    Σ_{i=0}^{k−1} ln(1 + c_i/θ) ≤ (1/θ) Σ_{i=0}^{k−1} c_i ≤ (1/θ) Σ_{i=0}^∞ c_i < +∞  (by (2.10)).

Letting K := max{1, a_0/(c_0 + θ)} exp((1/θ) Σ_{i=0}^∞ c_i), we conclude (2.9) by (2.13), and the proof is complete.
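The mechanism behind Lemma 2.1 can be checked numerically. The following small Python experiment (illustrative values only, not from the paper) runs the recursion in (2.8) with equality and shows the ratios a_{k+1}/a_k approaching max{η, limsup δ_{k+1}/δ_k}, i.e., a linear rate as asserted in (2.9).

```python
import numpy as np

eta = 0.9
delta = 0.5 * 0.8 ** np.arange(80)   # delta_{k+1}/delta_k = 0.8 < 1
a = [1.0]
for d in delta:
    a.append(eta * a[-1] + d)        # worst case: equality in (2.8)
a = np.array(a)
print((a[1:] / a[:-1])[-5:])         # ratios tend to max(0.9, 0.8) = 0.9
```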
3. Characterizations of local minima.
Optimality conditions are a crucial tool for optimization problems, either providing useful characterizations of (local) minima or guiding the design of effective optimization algorithms. Some sufficient or necessary optimality conditions for the ℓ_p regularization problem (1.1) have been developed in the literature; see [16, 23, 30, 37] and references therein. In particular, Chen et al. [16] established the following first- and second-order necessary optimality conditions for a local minimum x* of problem (1.1), i.e.,

    2A_I^⊤(A_I x*_I − b) + λp (|x*_i|^{p−1} sign(x*_i))_{i∈I} = 0,                   (3.1)

and

    2A_I^⊤ A_I + λp(p−1) diag((|x*_i|^{p−2})_{i∈I}) ⪰ 0,                             (3.2)

where I = supp(x*) is defined by (2.3). These necessary conditions were used to estimate the (lower/upper) bounds for the absolute values and the number of nonzero components of local minima. However, it seems that a complete optimality condition that is both necessary and sufficient for the local minima of the ℓ_p regularization problem has not yet been established in the literature. To remedy this gap, this section is devoted to providing some necessary and sufficient characterizations of the local minima of problem (1.1).

To begin with, the following lemma (i.e., [23, Lemma 10]) shows that the ℓ_p regularized function satisfies a first-order growth property at 0, which is useful for proving the equivalent characterizations of its local minima. This property also indicates a significant advantage of the ℓ_p regularization over the ℓ₁ regularization, namely that the ℓ_p regularization has a strong sparsity promoting capability.

Lemma 3.1. Let h: R^n → R be a continuously differentiable function. Then there exist ε > 0 and δ > 0 such that

    h(x) + λ||x||_p^p ≥ h(0) + ε||x||  for any x ∈ B(0, δ).

The main result of this section is presented in the following theorem, in which we establish the equivalences among a local minimum, second-order optimality condition and second-order growth property of the ℓ_p regularization problem (1.1). Note that the latter two conditions were provided in [23] as necessary conditions for the group-wise ℓ_p regularization problem, while the second-order optimality condition is an improvement of the result in [16] in that the matrix on the left-hand side of (3.2) is indeed positive definite. Recall that F: R^n → R is the ℓ_p regularized function defined by (2.2) and I = supp(x*) is defined by (2.3).

Theorem 3.2.
Let x* ∈ R^n \ {0}. Then the following assertions are equivalent:

(i) x* is a local minimum of problem (1.1).

(ii) (3.1) and the following condition hold:

    2A_I^⊤ A_I + λp(p−1) diag((|x*_i|^{p−2})_{i∈I}) ≻ 0.                             (3.3)

(iii) Problem (1.1) satisfies the second-order growth property at x*, i.e., there exist ε > 0 and δ > 0 such that

    F(x) ≥ F(x*) + ε||x − x*||²  for any x ∈ B(x*, δ).                               (3.4)

Proof. Without loss of generality, we assume that I = {1, ..., s}.

(i) ⇒ (ii). Suppose that (i) holds. Then x*_I is a local minimum of f (by (2.6)), and (3.1) and (3.2) hold by [16, pp. 76] (they can also be checked directly by the optimality conditions for smooth optimization in [5, Proposition 1.1.1]): ∇f(x*_I) = 0 and ∇²f(x*_I) ⪰ 0. Thus, it remains to prove (3.3), i.e., ∇²f(x*_I) ≻ 0. To do this, suppose on the contrary that (3.3) does not hold. Then, by (3.2), there exists w ≠ 0 such that ⟨w, ∇²f(x*_I)w⟩ = 0. Let ψ: R → R be defined by

    ψ(t) := f(x*_I + tw)  for each t ∈ R.

Then one sees that ψ′(0) = ⟨w, ∇f(x*_I)⟩ = 0 and ψ″(0) = ⟨w, ∇²f(x*_I)w⟩ = 0, and 0 is a local minimum of ψ (as x*_I is a local minimum of f). Therefore, ψ^(3)(0) = 0 and ψ^(4)(0) ≥ 0. However, since h is quadratic,

    ψ^(4)(0) = λp(p−1)(p−2)(p−3) Σ_{i∈I} w_i^4 |x*_i|^{p−4} < 0,

which yields a contradiction. Hence, assertion (ii) holds.

(ii) ⇒ (iii). Suppose that assertion (ii) of this theorem holds. Then

    ∇f(x*_I) = 0  and  ∇²f(x*_I) ≻ 0.                                                (3.5)

By Taylor's formula, we have that

    f(y) = f(x*_I) + ∇f(x*_I)(y − x*_I) + (1/2)⟨y − x*_I, ∇²f(x*_I)(y − x*_I)⟩ + o(||y − x*_I||²)  for each y ∈ R^s.

This, together with (3.5), implies that there exist ε₁ > 0 and δ₁ > 0 such that

    f(y) ≥ f(x*_I) + 2ε₁||y − x*_I||²  for any y ∈ B(x*_I, δ₁).                      (3.6)

Let τ > 0 be such that √(ε₁τ) ≥ ||A_I|| ||A_{I^c}||, and define g: R^{n−s} → R by

    g(z) := ||A_{I^c} z||² + 2⟨A_I x*_I − b, A_{I^c} z⟩ − 2τ||z||²  for each z ∈ R^{n−s}.   (3.7)

Clearly, g is continuously differentiable on R^{n−s} with g(0) = 0. Then, by Lemma 3.1, there exist ε₂ > 0 and δ₂ > 0 such that

    g(z) + λ||z||_p^p ≥ g(0) + ε₂||z|| = ε₂||z|| ≥ 0  for any z ∈ B(0, δ₂).          (3.8)

Fix x := (x_I, x_{I^c}) with x_I ∈ B(x*_I, δ₁) and x_{I^c} ∈ B(0, δ₂). Then it follows from the definitions of the functions F, f and g (see (2.2), (2.4) and (3.7)) that

    F(x) = ||A_I x_I + A_{I^c} x_{I^c} − b||² + λ||x_I||_p^p + λ||x_{I^c}||_p^p
         = ||A_I x_I − b||² + ||A_{I^c} x_{I^c}||² + 2⟨A_I x_I − b, A_{I^c} x_{I^c}⟩ + λ||x_I||_p^p + λ||x_{I^c}||_p^p
         = f(x_I) + g(x_{I^c}) + λ||x_{I^c}||_p^p + 2τ||x_{I^c}||² + 2⟨A_I(x_I − x*_I), A_{I^c} x_{I^c}⟩.

Applying (3.6) (to x_I in place of y) and (3.8) (to x_{I^c} in place of z), we have that

    F(x) ≥ f(x*_I) + 2ε₁||x_I − x*_I||² + 2τ||x_{I^c}||² + 2⟨A_I(x_I − x*_I), A_{I^c} x_{I^c}⟩.

By the definition of τ, we have that

    2|⟨A_I(x_I − x*_I), A_{I^c} x_{I^c}⟩| ≤ 2√(ε₁τ)||x_I − x*_I|| ||x_{I^c}|| ≤ ε₁||x_I − x*_I||² + τ||x_{I^c}||²,

and then it follows that

    F(x) ≥ f(x*_I) + ε₁||x_I − x*_I||² + τ||x_{I^c}||² ≥ f(x*_I) + min{ε₁, τ}||x − x*||²

(noting that x_{I^c} = x_{I^c} − x*_{I^c}). Hence F(x) ≥ F(x*) + min{ε₁, τ}||x − x*||², as f(x*_I) = F(x*) by (2.6). This means that (3.4) holds with ε := min{ε₁, τ} and δ := min{δ₁, δ₂}, and so (iii) is verified.

(iii) ⇒ (i). This is trivial. The proof is complete.
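In computations, Theorem 3.2 yields a finite test for local optimality of a nonzero candidate x: restrict the problem to the support I and check the stationarity condition (3.1) together with positive definiteness (3.3). The following Python sketch illustrates this; the function name and tolerances are ours and merely illustrative.

```python
import numpy as np

def check_local_min(A, b, lam, p, x, supp_tol=1e-10, grad_tol=1e-8):
    """Test conditions (3.1) and (3.3) of Theorem 3.2 at a candidate x != 0."""
    I = np.nonzero(np.abs(x) > supp_tol)[0]
    AI, xI = A[:, I], x[I]
    # gradient of f at x_I, as in (3.1)
    grad = 2 * AI.T @ (AI @ xI - b) + lam * p * np.abs(xI) ** (p - 1) * np.sign(xI)
    # Hessian of f at x_I, as in (3.3)
    hess = 2 * AI.T @ AI + lam * p * (p - 1) * np.diag(np.abs(xI) ** (p - 2))
    stationary = np.linalg.norm(grad) <= grad_tol       # condition (3.1)
    pos_def = np.all(np.linalg.eigvalsh(hess) > 0)      # condition (3.3)
    return stationary and pos_def
```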
Remark 3.1. As shown in Lemma 3.1, in the case when x* = 0, the equivalence between assertions (i) and (iii) in Theorem 3.2 remains true, while assertion (ii) is not well defined (as I = ∅).

The structure of local minima is a useful property for the numerical study of the ℓ_p regularization problem; see, e.g., [16, 53]. As a byproduct of Theorem 3.2, we will prove that the number of local minima of problem (1.1) is finite, which was claimed in [16, Corollary 2.2] but with an incomplete proof (their proof rests on the assertion that f has at most one local minimum whenever A_I^⊤ A_I is of full rank, which is unclear).

Corollary 3.3.
The ℓ_p regularization problem (1.1) has only a finite number of local minima.

Proof. Let I ⊆ {1, ..., n}. We use LM(F, R^n; I) to denote the set of local minima x* of problem (1.1) with supp(x*) = I, and set

    Θ(I) := {x_I : x ∈ LM(F, R^n; I)}.                                               (3.9)

Then the set of local minima of problem (1.1) can be expressed as the union of LM(F, R^n; I) over all subsets I ⊆ {1, ..., n}. Clearly, LM(F, R^n; I) and Θ(I) have the same cardinality. Thus, to complete the proof, it suffices to show that Θ(I) is finite. To do this, we may assume, without loss of generality, that I = {1, ..., s}, and write

    O := {y ∈ R^s_≠ : ∇²f(y) ≻ 0},                                                   (3.10)

where f: R^s → R is defined by (2.4). Clearly, O is open in R^s, and Θ(I) ⊆ O by Theorem 3.2. Thus, it follows from (3.9) that

    Θ(I) ⊆ LM(f, R^s) ∩ O                                                            (3.11)

(we can indeed show an equality), where, for an open subset U of R^s, LM(f, U) stands for the set of local minima of f over U. For simplicity, we set

    R^s_J := {y ∈ R^s : y_j > 0 for j ∈ J, y_j < 0 for j ∈ I \ J}  and  O_J := O ∩ R^s_J

for any J ⊆ I. Then each O_J is open in R^s (as so are O and R^s_J). This particularly implies that

    LM(f, R^s) ∩ O_J = LM(f, O_J)  for each J ⊆ I.                                   (3.12)

Moreover, it is clear that O = ∪_{J⊆I} O_J. Hence

    Θ(I) ⊆ LM(f, R^s) ∩ O = ∪_{J⊆I} (LM(f, R^s) ∩ O_J) = ∪_{J⊆I} LM(f, O_J)          (3.13)

(thanks to (3.11) and (3.12)). Below we show that

    O_J is convex for each J ⊆ I.                                                    (3.14)

Granting this, one concludes that each LM(f, O_J) is at most a singleton, because ∇²f ≻ 0 on O_J by (3.10) and then f is strictly convex on O_J by the higher-dimensional derivative tests for convexity (see, e.g., [42, Theorem 2.14]); hence Θ(I) is finite by (3.13), completing the proof.

To show (3.14), fix J ⊆ I and let y, z ∈ O_J. Then, by definition, one has that

    ∇²f(y) ≻ 0  and  ∇²f(z) ≻ 0.                                                     (3.15)

By elementary calculus, the map t ↦ t^{p−2} is convex on (0, +∞), and so

    (|y_i|^{p−2} + |z_i|^{p−2})/2 ≥ ((|y_i| + |z_i|)/2)^{p−2}  for each i ∈ I.

Consequently, we have

    diag(((|y_i|^{p−2} + |z_i|^{p−2})/2)_{i∈I}) ⪰ diag((((|y_i| + |z_i|)/2)^{p−2})_{i∈I}).

This, together with (2.5) and (3.15) (recalling that λp(p−1) < 0), implies that

    ∇²f((y + z)/2) ⪰ (∇²f(y) + ∇²f(z))/2 ≻ 0.

Since (y + z)/2 ∈ R^s_J ⊆ R^s_≠, it follows that (y + z)/2 ∈ O ∩ R^s_J = O_J, and (3.14) is proved.

Another byproduct of Theorem 3.2 is the following corollary, in which we show the isolation of a local minimum of problem (1.1) in the sense of critical points. This property is useful for establishing the global convergence of the inexact descent method and inexact PGA.

Corollary 3.4.
Let x* be a local minimum of the ℓ_p regularization problem (1.1). Then x* is an isolated critical point of problem (1.1).

Proof. Recall that I = supp(x*) and f are defined by (2.3) and (2.4), respectively. Since x* is a local minimum of problem (1.1), it follows from (2.6) that x*_I is a local minimum of f, and from Theorem 3.2 (cf. (3.3)) that ∇²f(x*_I) ≻ 0. By the fact that x*_I ∈ R^s_≠ and by the smoothness of f at x*_I, we can find a constant τ with

    0 < τ < (λp / (4||A^⊤(Ax* − b)||_∞))^{1/(1−p)}                                   (3.16)

such that

    B(x*_I, τ) ⊆ R^s_≠ ∩ {y ∈ R^s : ∇²f(y) ≻ 0}.                                     (3.17)

We aim to show that B(x*, τ) contains only one critical point of problem (1.1), namely x*. To do this, let x ∈ B(x*, τ) be a critical point of problem (1.1). We first claim that supp(x) = I. It is clear by (3.17) that

    x_i ≠ 0 when i ∈ I, and |x_i| < τ otherwise.                                     (3.18)

If i ∈ supp(x), by the definition of a critical point, it follows that 2A_i^⊤(Ax − b) + λp|x_i|^{p−1} sign(x_i) = 0; consequently, by the fact that x is close to x*, we obtain that

    |x_i| = (2|A_i^⊤(Ax − b)| / (λp))^{1/(p−1)} > (4|A_i^⊤(Ax* − b)| / (λp))^{1/(p−1)} ≥ (4||A^⊤(Ax* − b)||_∞ / (λp))^{1/(p−1)} > τ

(due to (3.16) and the fact that the map t ↦ t^{1/(p−1)} is decreasing). This, together with (3.18), shows that supp(x) = I, as desired.

Finally, we show that x = x*. By (3.17), one has that f is strongly convex on B(x*_I, τ). Since x is a critical point of problem (1.1), one has by the definition of a critical point that ∇f(x_I) = 0, and so x_I is a minimum of f on B(x*_I, τ). By the strong convexity of f on B(x*_I, τ), we obtain x_I = x*_I, and hence x = x* (since supp(x) = I). The proof is complete.
4. Linear convergence of inexact descent method.
This section aims to establish the linear convergence of an inexact version of descent methods in a general framework. In our analysis, we will employ both the second-order optimality condition and the second-order growth property established in Theorem 3.2.

Let α and β be fixed positive constants, let {ε_k} ⊆ R₊ be a sequence of nonnegative scalars, and recall that F: R^n → R is the ℓ_p regularized function defined by (2.2). We consider a sequence {x^k} that satisfies the following relaxed versions of (H1) and (H2):

(H1°) For each k ∈ N,

    F(x^{k+1}) − F(x^k) ≤ −α||x^{k+1} − x^k||² + ε_k².                               (4.1)

(H2°) For each k ∈ N, there exists w^{k+1} ∈ ∂F(x^{k+1}) such that

    ||w^{k+1}|| ≤ β||x^{k+1} − x^k|| + ε_k.

Frankel et al. [21] proposed an inexact version of descent methods in which only (H2) is relaxed to the inexact form (H2°) while the exact form (H1) is maintained; consequently, their sequence {x^k} satisfies a descent property. However, in our framework, note by (4.1) that the sequence {x^k} need not satisfy a descent property. This is an essential difference from [21] and from the extensive studies of descent methods.

We begin with the following useful properties of the inexact descent method; in particular, the consistency property that x^k has the same support as x* when k is large (assertion (ii)) is useful for providing a uniform decomposition of {x^k} in the convergence analysis.

Proposition 4.1.
(i) Let {x^k} be a sequence satisfying (H1°) with

    Σ_{k=0}^∞ ε_k < +∞.                                                              (4.2)

Then Σ_{k=0}^∞ ||x^{k+1} − x^k||² < +∞.

(ii) Let {x^k} be a sequence satisfying (H2°) with lim_{k→∞} ε_k = 0. Suppose that {x^k} converges to x*. Then there exists N ∈ N such that

    supp(x^k) = supp(x*)  for each k ≥ N.                                            (4.3)

Proof. Assertion (i) of this proposition is trivial by the assumption (which ensures Σ_{k=0}^∞ ε_k² < +∞) and the fact that F ≥ 0. To prove assertion (ii), set

    γ := (λp / (β + 1 + 4||A^⊤(Ax* − b)||_∞))^{1/(1−p)}.                             (4.4)

Since {x^k} converges to x*, there exists N ∈ N such that for each k ≥ N

    x_i^k ≠ 0 when i ∈ supp(x*), and |x_i^k| < γ otherwise.                          (4.5)

Fix k ≥ N and i ∈ supp(x^k). By assumption (H2°), there exists w^k ∈ ∂F(x^k) such that

    ||w^k|| ≤ β||x^k − x^{k−1}|| + ε_{k−1} < β + 1                                   (4.6)

(by the assumptions that lim_{k→∞} ε_k = 0 and lim_{k→∞} x^k = x*, enlarging N if necessary). Noting that i ∈ supp(x^k), we obtain by (2.5) that

    |w_i^k| = |2A_i^⊤(Ax^k − b) + λp|x_i^k|^{p−1} sign(x_i^k)| ≥ λp|x_i^k|^{p−1} − 4||A^⊤(Ax* − b)||_∞

(enlarging N if necessary so that 2||A^⊤(Ax^k − b)||_∞ ≤ 4||A^⊤(Ax* − b)||_∞). This, together with (4.6) and (4.4), shows that |x_i^k| > γ when i ∈ supp(x^k), which, combined with (4.5), shows that supp(x^k) = supp(x*) for each k ≥ N. The proof is complete.

The main theorem of this section is as follows. The convergence theorem is not only of independent interest for establishing the linear convergence of the inexact descent method, but also provides a useful approach for the linear convergence study of the inexact PGA in the next section. Recall that the functions F and f are defined by (2.2) and (2.4), respectively.

Theorem 4.2.
Let {x^k} be a sequence satisfying (H1°), and let {ε_k} satisfy (4.2). Suppose that one of the limiting points of {x^k}, denoted by x*, is a local minimum of problem (1.1). Then the following assertions are true.

(i) {x^k} converges to x*.

(ii) Suppose further that {x^k} satisfies (H2°) and that

    limsup_{k→∞} ε_{k+1}/ε_k < 1.                                                    (4.7)

Then {x^k} converges linearly to x*, that is, there exist C > 0 and η ∈ (0, 1) such that

    F(x^k) − F(x*) ≤ C η^k  and  ||x^k − x*|| ≤ C η^k  for each k ∈ N.               (4.8)

Proof. (i) It follows from Proposition 4.1(i) that lim_{k→∞} ||x^{k+1} − x^k|| = 0. By the assumption that x* is a local minimum of problem (1.1), it follows from Corollary 3.4 that x* is an isolated critical point of problem (1.1). Then we can prove that {x^k} converges to x* (the proof is standard; see, e.g., the proof of [10, Proposition 2.3]).

(ii) If x* = 0, it follows from Proposition 4.1(ii) that there exists N ∈ N such that x^k = 0 for each k ≥ N, and so the conclusion holds. Thus it remains to prove (4.8) in the case when x* ≠ 0.

Suppose that x* ≠ 0. Recall that I = supp(x*) is defined by (2.3). By the assumption that x* is a local minimum of problem (1.1), assertions (ii) and (iii) of Theorem 3.2 are satisfied; hence, it follows from (3.3) and (2.5) that 2A_I^⊤A_I + ∇²φ(x*_I) = ∇²f(x*_I) ≻ 0. This, together with x*_I ∈ R^s_≠ (cf. (2.3)) and the smoothness of φ at x*_I, implies that there exist ε > 0, δ > 0 and L_φ > 0 such that

    B(x*_I, δ) ⊆ R^s_≠ ∩ {y ∈ R^s : ∇²φ(y) ≻ −2A_I^⊤A_I},                            (4.9)

and

    ||∇φ(y) − ∇φ(z)|| ≤ L_φ||y − z||  for any y, z ∈ B(x*_I, δ).

By assertion (i) of this theorem, {x^k} converges to x*, and so there exists N ∈ N such that (4.3) holds (by Proposition 4.1(ii)) and x_I^k ∈ B(x*_I, δ) for each k ≥ N. In particular, by the second-order growth property (3.4) (shrinking δ if necessary), the following relations hold for each k ≥ N:

    F(x^{k+1}) ≥ F(x*) + ε||x^{k+1} − x*||²,                                         (4.10)

and

    ||∇φ(x_I^k) − ∇φ(x_I^{k+1})|| ≤ L_φ||x_I^k − x_I^{k+1}||.                        (4.11)

Noting by (2.5) and (4.9) that ∇²φ(w) ≺ 0 and ∇²f(w) ≻ 0 for each w ∈ B(x*_I, δ), it follows that φ is concave and f is convex on B(x*_I, δ). Fix k ≥ N. Then one has that

    ⟨∇φ(x_I^k), x_I^k − x_I^{k+1}⟩ ≤ φ(x_I^k) − φ(x_I^{k+1})                         (4.12)

and

    f(x_I^k) − f(x*_I) ≤ ⟨∇f(x_I^k), x_I^k − x*_I⟩                                   (4.13)

(as x_I^k, x_I^{k+1} ∈ B(x*_I, δ)). To proceed, we define

    r_k := F(x^k) − F(x*)  for each k ∈ N,                                           (4.14)

and then it follows from (4.3) and (2.6) that

    r_k = f(x_I^k) − f(x*_I).                                                        (4.15)

Hence, using (4.13), we obtain that

    r_k ≤ ⟨∇f(x_I^k), x_I^k − x_I^{k+1}⟩ + ⟨∇f(x_I^k), x_I^{k+1} − x*_I⟩.            (4.16)

By (2.4) and (4.12), it follows that

    ⟨∇f(x_I^k), x_I^k − x_I^{k+1}⟩ ≤ ⟨∇h(x_I^k), x_I^k − x_I^{k+1}⟩ + φ(x_I^k) − φ(x_I^{k+1}).

Recall from (2.4) that ∇h(x_I^k) = 2A_I^⊤(A_I x_I^k − b). Then, by (2.7) (with A_I, x_I^{k+1}, x_I^k in place of A, y, x), we have that

    ⟨∇f(x_I^k), x_I^k − x_I^{k+1}⟩ ≤ f(x_I^k) − f(x_I^{k+1}) + ||A_I(x_I^{k+1} − x_I^k)||² ≤ r_k − r_{k+1} + ||A||²||x^{k+1} − x^k||²   (4.17)

(due to (4.15)). On the other hand, one has that

    ⟨∇f(x_I^k), x_I^{k+1} − x*_I⟩ = ⟨∇f(x_I^{k+1}), x_I^{k+1} − x*_I⟩ + ⟨∇f(x_I^k) − ∇f(x_I^{k+1}), x_I^{k+1} − x*_I⟩.   (4.18)

By assumption (H2°), we obtain that

    ⟨∇f(x_I^{k+1}), x_I^{k+1} − x*_I⟩ ≤ ||∇f(x_I^{k+1})|| ||x_I^{k+1} − x*_I|| ≤ ||w^{k+1}|| ||x_I^{k+1} − x*_I||
                                      ≤ β||x^{k+1} − x^k|| ||x^{k+1} − x*|| + ε_k||x^{k+1} − x*||;

while by (4.11) we obtain that

    ⟨∇f(x_I^k) − ∇f(x_I^{k+1}), x_I^{k+1} − x*_I⟩ = ⟨∇h(x_I^k) − ∇h(x_I^{k+1}) + ∇φ(x_I^k) − ∇φ(x_I^{k+1}), x_I^{k+1} − x*_I⟩
                                                  ≤ (2||A||² + L_φ)||x^{k+1} − x^k|| ||x^{k+1} − x*||.

Combining the above two inequalities, it follows from (4.18) that

    ⟨∇f(x_I^k), x_I^{k+1} − x*_I⟩ ≤ (β + 2||A||² + L_φ)||x^{k+1} − x^k|| ||x^{k+1} − x*|| + ε_k||x^{k+1} − x*||.

Let

    σ := β + 2||A||² + L_φ  and  τ ∈ (0, ε).                                         (4.19)

Then, by Young's inequality, one has that

    ⟨∇f(x_I^k), x_I^{k+1} − x*_I⟩ ≤ (σ²/(2τ))||x^{k+1} − x^k||² + (τ/2)||x^{k+1} − x*||² + (1/(2τ))ε_k² + (τ/2)||x^{k+1} − x*||²
                                  = (σ²/(2τ))||x^{k+1} − x^k||² + τ||x^{k+1} − x*||² + (1/(2τ))ε_k².

This, together with (4.16) and (4.17), shows that

    r_k ≤ r_k − r_{k+1} + (||A||² + σ²/(2τ))||x^{k+1} − x^k||² + τ||x^{k+1} − x*||² + (1/(2τ))ε_k².   (4.20)

Recalling (4.14), we obtain by assumption (H1°) that

    ||x^{k+1} − x^k||² ≤ (1/α)(F(x^k) − F(x^{k+1})) + (1/α)ε_k² = (1/α)(r_k − r_{k+1}) + (1/α)ε_k²,

and by (4.10) that

    ||x^{k+1} − x*||² ≤ (1/ε)(F(x^{k+1}) − F(x*)) = (1/ε) r_{k+1}.

Hence, (4.20) reduces to

    r_k ≤ r_k − r_{k+1} + ((2τ||A||² + σ²)/(2τα))(r_k − r_{k+1}) + ((2τ||A||² + σ²)/(2τα))ε_k² + (τ/ε) r_{k+1} + (1/(2τ))ε_k²,

that is,

    r_{k+1} ≤ η̄ r_k + c̄ ε_k²  for each k ≥ N,                                        (4.21)

where

    η̄ := 1 − (1 − τ/ε)/(1 + (2τ||A||² + σ²)/(2τα) − τ/ε)  and  c̄ := (2τ||A||² + σ² + α)/(2τα + 2τ||A||² + σ² − 2τ²α/ε).

One can check that 0 < η̄ < 1 and c̄ > 0 (since τ < ε). Using Lemma 2.1 (with r_k, η̄ and c̄ε_k² in place of a_k, η and δ_k, which is legitimate by (4.7)), there exist θ ∈ (0, 1) and K > 0 such that

    F(x^k) − F(x*) = r_k ≤ K θ^k  for each k ≥ N

(by (4.14)). Furthermore, using (4.10), we have that

    ||x^k − x*|| ≤ ((F(x^k) − F(x*))/ε)^{1/2} ≤ (K/ε)^{1/2} (√θ)^k  for each k ≥ N.

This shows that (4.8) holds with C := max{K, (K/ε)^{1/2}} and η := √θ. The proof is complete.

Remark 4.1.
It is worth noting in (4.8) that the linear convergence of {F(x^k)} to F(x*) is a direct consequence of that of {x^k} to x*. Indeed, recalling from [23, Lemma 2] that ||x||_p^p − ||y||_p^p ≤ ||x − y||_p^p for any x, y ∈ R^n, we obtain by (2.2) and (2.7) that

    F(x^k) − F(x*) ≤ 2||A^⊤(Ax* − b)|| ||x^k − x*|| + ||A||²||x^k − x*||² + λ||x^k − x*||_p^p.

As an application of Theorem 4.2 in the case when ε_k ≡ 0, the linear convergence of the descent methods investigated in [1, 2] for solving the ℓ_p regularization problem (1.1) is presented in the following theorem.

Theorem 4.3.
Let {x^k} be a sequence satisfying (H1) and (H2). Then {x^k} converges to a critical point x* of problem (1.1). Suppose that x* is a local minimum of problem (1.1). Then {x^k} converges linearly to x*.
5. Linear convergence of inexact proximal gradient algorithms.
The main purpose of this section is to investigate the linear convergence rate of two inexact PGAs for solving the ℓ_p regularization problem (1.1). Associated with problem (1.2), we denote the (inexact) proximal operator of the ℓ_p regularizer by

    P_{v,ε}(x) := ε-argmin_{y∈R^n} { λ||y||_p^p + (1/(2v))||y − x||² }.              (5.1)

In the special case when ε = 0, we write P_v(x) for P_{v,0}(x) for simplicity. Recall that the functions F and H are defined by (2.2). It is clear that the iterative formula of Algorithm PGA is

    x^{k+1} ∈ P_{v_k}(x^k − v_k ∇H(x^k)).

Some useful properties of the proximal operator of the ℓ_p regularizer are presented as follows.

Proposition 5.1.
Let v > 0, ε > 0, x ∈ R^n, ξ ∈ R^n, y ∈ P_v(x − v∇H(x)) and z ∈ P_{v,ε}(x − v(∇H(x) + ξ)). Then the following assertions are true.

(i) F(z) − F(x) ≤ −(1/(2v) − ||A||²)||z − x||² − ⟨z − x, ξ⟩ + ε.

(ii) For each i ∈ {1, ..., n}, the following implication holds:

    y_i ≠ 0  ⇒  |y_i| ≥ (vλp(1 − p))^{1/(2−p)}.

Proof. (i) Recall that H and Φ are defined by (2.2), that is, H(·) = ||A· − b||² and Φ(·) = λ||·||_p^p. It follows from (5.1) that

    Φ(z) + (1/(2v))||z − (x − v(∇H(x) + ξ))||² ≤ Φ(x) + (1/(2v))||v(∇H(x) + ξ)||² + ε,

that is,

    Φ(z) − Φ(x) ≤ −(1/(2v))||z − x||² − 2⟨z − x, A^⊤(Ax − b)⟩ − ⟨z − x, ξ⟩ + ε.

Combining this with (2.7), we prove assertion (i) of this proposition.

(ii) Let i be such that y_i ≠ 0. Then, by (5.1) (with ε = 0), one has that

    y_i ∈ argmin_{t∈R} { λ|t|^p + (1/(2v))(t − (x − v∇H(x))_i)² }.

Thus, using the second-order necessary condition of this scalar problem, we obtain that λp(p−1)|y_i|^{p−2} + 1/v ≥ 0; consequently, |y_i| ≥ (vλp(1 − p))^{1/(2−p)}. The proof is complete.

Inspired by the ideas in the seminal work of Rockafellar [41], we propose the following two types of inexact PGAs.

Algorithm IPGA-I. Given an initial point x⁰ ∈ R^n, a sequence of stepsizes {v_k} ⊆ R₊ and a sequence of inexact terms {ε_k} ⊆ R₊, for each k ∈ N, having x^k, we determine x^{k+1} by

    x^{k+1} ∈ P_{v_k, ε_k}(x^k − v_k ∇H(x^k)).                                       (5.2)

Algorithm IPGA-II. Given an initial point x⁰ ∈ R^n, a sequence of stepsizes {v_k} ⊆ R₊ and a sequence of inexact terms {ε_k} ⊆ R₊, for each k ∈ N, having x^k, we determine x^{k+1} satisfying

    dist(x^{k+1}, P_{v_k}(x^k − v_k ∇H(x^k))) ≤ ε_k.                                 (5.3)

Remark 5.1. (i) Algorithms IPGA-I and IPGA-II adopt two popular inexact schemes in the calculation of proximal operators: Algorithm IPGA-I (resp., Algorithm IPGA-II) measures the inexact term by the approximation of the proximal regularized function value (resp., by the distance of the iterate to the exact proximal operator). The latter type of inexact scheme is commonly considered in theoretical analysis, while the former is more attractive to implement in practical applications. Recently, Frankel et al. [21] proposed an inexact PGA (based on an inexact scheme similar to Algorithm IPGA-II) for solving the general problem (1.3).

(ii) Neither Algorithm IPGA-I nor Algorithm IPGA-II satisfies both conditions (H1°) and (H2°) of the inexact descent method mentioned in section 4. Indeed, if both conditions (H1°) and (H2°) were satisfied, then Proposition 4.1 would ensure the consistency property (4.3) of the support of {x^k} with that of x*, which is not available for either Algorithm IPGA-I or Algorithm IPGA-II. In particular, Algorithm IPGA-I satisfies only condition (H1°) (as shown in the proof of Theorem 5.2), while neither (H1°) nor (H2°) can be shown for Algorithm IPGA-II.

Using Theorem 4.2, the global convergence result for Algorithm IPGA-I is presented in the following theorem; however, we are not able to prove the global convergence of Algorithm IPGA-II at this moment.
Theorem 5.2.
Let {x^k} be a sequence generated by Algorithm IPGA-I with {v_k} satisfying

    0 < v ≤ v_k ≤ v̄ < (1/2)||A||^{−2}  for each k ∈ N,                               (5.4)

and with {ε_k} such that {√ε_k} satisfies (4.2). Suppose that one of the limiting points of {x^k}, denoted by x*, is a local minimum of problem (1.1). Then {x^k} converges to x*.

Proof. In view of Algorithm IPGA-I (cf. (5.2)) and by Proposition 5.1(i) (with x^{k+1}, x^k, v_k, 0, ε_k in place of z, x, v, ξ, ε), we obtain that

    F(x^{k+1}) − F(x^k) ≤ −(1/(2v_k) − ||A||²)||x^{k+1} − x^k||² + ε_k ≤ −(1/(2v̄) − ||A||²)||x^{k+1} − x^k||² + ε_k

(by (5.4)). Note also by (5.4) that 1/(2v̄) − ||A||² > 0. This shows that {x^k} satisfies (H1°) with 1/(2v̄) − ||A||² and √ε_k in place of α and ε_k, respectively. Then the conclusion directly follows from Theorem 4.2(i). The proof is complete.

Recall that, for the inexact proximal point algorithm (see, e.g., [41, 49]), the inexact term is assumed to have progressively better accuracy in order to investigate its convergence rate; specifically, it is assumed that x^{k+1} ∈ P_{v_k, ε_k}(x^k) with ε_k = o(||x^{k+1} − x^k||²) or that dist(x^{k+1}, P_{v_k}(x^k)) ≤ o(||x^{k+1} − x^k||). However, we are not yet able to prove the linear convergence of the inexact PGAs under this assumption on the inexact terms (due to the nonconvexity of the ℓ_p regularized function), and we need some additional assumptions to ensure the linear convergence. Recall that I = supp(x*) is defined by (2.3). Let {t_k} ⊆ R₊ and {τ_k} ⊆ R₊. For Algorithms IPGA-I and IPGA-II, we respectively assume

    x_I^{k+1} ∈ P_{v_k, ε_k}((x^k − v_k ∇H(x^k))_I)  with  ε_k ≤ τ_k||x_I^{k+1} − x_I^k||²,          (5.5)
    x_{I^c}^{k+1} ∈ P_{v_k, ε_k}((x^k − v_k ∇H(x^k))_{I^c})  with  ε_k ≤ τ_k||x_{I^c}^{k+1} − x_{I^c}^k||²,   (5.6)

and

    dist(x_I^{k+1}, (P_{v_k}(x^k − v_k ∇H(x^k)))_I) ≤ t_k||x_I^{k+1} − x_I^k||,                      (5.7)
    dist(x_{I^c}^{k+1}, (P_{v_k}(x^k − v_k ∇H(x^k)))_{I^c}) ≤ t_k||x_{I^c}^{k+1} − x_{I^c}^k||.      (5.8)

Note that (5.5)-(5.6) and (5.7)-(5.8) are sufficient conditions for guaranteeing (5.2) with ε_k = τ_k||x^{k+1} − x^k||² and (5.3) with ε_k = t_k||x^{k+1} − x^k||, respectively. (An implementable strategy for inexact PGAs under which conditions (5.5)-(5.6) or (5.7)-(5.8) are satisfied will be proposed at the end of this section.) Now we establish the linear convergence of these two inexact PGAs for solving the ℓ_p regularization problem under the respective additional assumptions. Recall that f, h and φ are defined by (2.4).

Theorem 5.3.
Let {x^k} be a sequence generated by Algorithm IPGA-II with {v_k} satisfying (5.4). Suppose that {x^k} converges to a local minimum x* of problem (1.1) and that (5.7) and (5.8) are satisfied for each k ∈ N with lim_{k→∞} t_k = 0. Then {x^k} converges linearly to x*.

Proof. Note that P_{v_k}(x^k − v_k ∇H(x^k)) is closed for each k ∈ N. Then, by (5.7) and (5.8), one can choose

    y^k ∈ P_{v_k}(x^k − v_k ∇H(x^k))                                                 (5.9)

such that

    ||x_I^{k+1} − y_I^k|| ≤ t_k||x_I^{k+1} − x_I^k||  and  ||x_{I^c}^{k+1} − y_{I^c}^k|| ≤ t_k||x_{I^c}^{k+1} − x_{I^c}^k||  for each k ∈ N.   (5.10)

Noting that x*_I ∈ R^s_≠ (cf. (2.3)) and recalling that f, h and φ are defined by (2.4), there exist 0 < δ < (vλp(1 − p))^{1/(2−p)} and L_φ > 0 such that

    B(x*_I, δ) ⊆ R^s_≠  and  ||∇φ(y) − ∇φ(z)|| ≤ L_φ||y − z||  for any y, z ∈ B(x*_I, δ).   (5.11)

By the assumption that lim_{k→∞} x^k = x* and I = supp(x*) (cf. (2.3)), we have by (5.10) that lim_{k→∞} y_I^k = x*_I and lim_{k→∞} y_{I^c}^k = x*_{I^c} = 0. Then there exists N ∈ N such that

    ||x_I^k − x*_I|| ≤ δ/2,  ||y_I^k − x*_I|| ≤ δ/2  and  ||y_{I^c}^k|| ≤ δ  for each k ≥ N.

Consequently, one sees that

    x_I^k, y_I^k ∈ B(x*_I, δ) ⊆ R^s_≠  and  y_{I^c}^k = 0  for each k ≥ N             (5.12)

(the latter by Proposition 5.1(ii), as δ < (vλp(1 − p))^{1/(2−p)}), and by (5.11) that

    ||∇φ(x_I^{k+1}) − ∇φ(y_I^k)|| ≤ L_φ||x_I^{k+1} − y_I^k||  for each k ≥ N.         (5.13)

We first provide an estimate on {x_{I^c}^k}_{k≥N}. By the assumption that lim_{k→∞} t_k = 0, we can assume, without loss of generality, that t_k < 1/2 for each k ≥ N. By (5.12), we obtain from the second inequality of (5.10) that

    ||x_{I^c}^{k+1}|| ≤ t_k||x_{I^c}^{k+1} − x_{I^c}^k|| ≤ t_k||x_{I^c}^{k+1}|| + t_k||x_{I^c}^k||,

and so

    ||x_{I^c}^{k+1}|| ≤ (t_k/(1 − t_k))||x_{I^c}^k|| < 2t_k||x_{I^c}^k||  for each k ≥ N.   (5.14)

Below, we estimate {x_I^k}_{k≥N}. To do this, we fix k ≥ N and let τ be a constant such that 0 < τ < (1/5)(1/(2v̄) − ||A||²) (recalling (5.4)). By (5.10) and the triangle inequality, one has that

    (1/2)||x_I^{k+1} − x_I^k|| < (1 − t_k)||x_I^{k+1} − x_I^k|| ≤ ||y_I^k − x_I^k|| ≤ (1 + t_k)||x_I^{k+1} − x_I^k|| < (3/2)||x_I^{k+1} − x_I^k||   (5.15)

(by t_k < 1/2). By (5.9), (2.2) and (2.4), we check that y_I^k ∈ P_{v_k}(x_I^k − v_k(∇h(x_I^k) + 2A_I^⊤A_{I^c}x_{I^c}^k)), and so we obtain from Proposition 5.1(i) (with f, h, A_I, y_I^k, x_I^k, v_k, 2A_I^⊤A_{I^c}x_{I^c}^k, 0 in place of F, H, A, z, x, v, ξ, ε), together with Young's inequality, that

    f(y_I^k) − f(x_I^k) ≤ −(1/(2v_k) − ||A_I||²)||y_I^k − x_I^k||² − 2⟨y_I^k − x_I^k, A_I^⊤A_{I^c}x_{I^c}^k⟩
                        ≤ −(1/(2v_k) − ||A||² − τ)||y_I^k − x_I^k||² + (1/τ)||A||⁴||x_{I^c}^k||²                    (5.16)
                        ≤ −(1/4)(1/(2v̄) − ||A||² − τ)||x_I^{k+1} − x_I^k||² + (1/τ)||A||⁴||x_{I^c}^k||²

(by (5.4) and (5.15)). By the smoothness of f on B(x*_I, δ) (⊆ R^s_≠) and (5.12), there exists L > 0 such that

    f(x_I^{k+1}) − f(y_I^k) ≤ ||∇f(y_I^k)|| ||x_I^{k+1} − y_I^k|| + L||x_I^{k+1} − y_I^k||²           (5.17)

(by Taylor's formula). The first-order optimality condition of (5.9) says that

    ∇φ(y_I^k) + (1/v_k)(y_I^k − x_I^k + 2v_k A_I^⊤(Ax^k − b)) = 0.                    (5.18)

Then we obtain by (2.4) that

    ∇f(y_I^k) = 2A_I^⊤(A_I y_I^k − b) + ∇φ(y_I^k) = −((1/v_k)I_s − 2A_I^⊤A_I)(y_I^k − x_I^k) − 2A_I^⊤A_{I^c}x_{I^c}^k;

consequently, since 0 ⪯ 2A_I^⊤A_I ⪯ (1/v_k)I_s by (5.4),

    ||∇f(y_I^k)|| ≤ (1/v_k)||y_I^k − x_I^k|| + 2||A||²||x_{I^c}^k|| ≤ (3/(2v))||x_I^{k+1} − x_I^k|| + 2||A||²||x_{I^c}^k||

(due to (5.4) and (5.15)). Combining this with (5.17), we conclude by the first inequality of (5.10) and Young's inequality that

    f(x_I^{k+1}) − f(y_I^k) ≤ (3/(2v))t_k||x_I^{k+1} − x_I^k||² + 2||A||²t_k||x_{I^c}^k|| ||x_I^{k+1} − x_I^k|| + Lt_k²||x_I^{k+1} − x_I^k||²   (5.19)
                            ≤ ((3/(2v)) + τ + Lt_k)t_k||x_I^{k+1} − x_I^k||² + (1/τ)||A||⁴||x_{I^c}^k||².

Recalling that lim_{k→∞} t_k = 0, we can assume, without loss of generality, that

    ((3/(2v)) + τ + Lt_k)t_k ≤ τ  for each k ≥ N.

This, together with (5.16) and (5.19), yields

    f(x_I^{k+1}) − f(x_I^k) ≤ −α₀||x_I^{k+1} − x_I^k||² + (2/τ)||A||⁴||x_{I^c}^k||²,   (5.20)

where α₀ := (1/4)(1/(2v̄) − ||A||² − τ) − τ > 0 by the choice of τ. On the other hand, by the smoothness of f on B(x*_I, δ), we obtain by (5.12) and (2.4) that

    ||∇f(x_I^{k+1})|| ≤ ||∇h(x_I^k) + ∇φ(y_I^k)|| + ||∇h(x_I^{k+1}) − ∇h(x_I^k)|| + ||∇φ(x_I^{k+1}) − ∇φ(y_I^k)||.   (5.21)

Note by (5.18), (5.15) and (5.4) that

    ||∇h(x_I^k) + ∇φ(y_I^k)|| = ||(1/v_k)(x_I^k − y_I^k) − 2A_I^⊤A_{I^c}x_{I^c}^k|| ≤ (3/(2v))||x_I^{k+1} − x_I^k|| + 2||A||²||x_{I^c}^k||,
    ||∇h(x_I^{k+1}) − ∇h(x_I^k)|| ≤ 2||A||²||x_I^{k+1} − x_I^k||,

and by (5.13) and (5.10) that

    ||∇φ(x_I^{k+1}) − ∇φ(y_I^k)|| ≤ L_φ||x_I^{k+1} − y_I^k|| ≤ L_φ t_k||x_I^{k+1} − x_I^k||.

Hence

    ||∇f(x_I^{k+1})|| ≤ ((3/(2v)) + 2||A||² + L_φ)||x_I^{k+1} − x_I^k|| + 2||A||²||x_{I^c}^k||.

This and (5.20) show that {x_I^k}_{k≥N} satisfies (H1°) and (H2°) with f, x_I^k, α₀, (3/(2v)) + 2||A||² + L_φ and max{√(2/τ), 2}||A||²||x_{I^c}^k|| in place of F, x^k, α, β and ε_k, respectively. Furthermore, it follows from (5.14) that

    limsup_{k→∞} ||x_{I^c}^{k+1}||/||x_{I^c}^k|| ≤ lim_{k→∞} 2t_k = 0,

so {x_{I^c}^k} decays geometrically; this verifies (4.2) and (4.7) for the above choice of ε_k. Therefore, the assumptions of Theorem 4.2(ii) are satisfied (with f in place of F, noting that x*_I is a local minimum of f), and it follows that {x_I^k} converges linearly to x*_I. Recall from (5.14) that {x_{I^c}^k} converges linearly to x*_{I^c} = 0. Therefore, {x^k} converges linearly to x*. The proof is complete.

Remark 5.2.
Frankel et al. [21] considered an inexact PGA similar to Algorithm IPGA-II, with the inexact control given by

    ε_k = t_k dist(P_{v_k}(x^k − v_k ∇H(x^k)), P_{v_{k−1}}(x^{k−1} − v_{k−1} ∇H(x^{k−1}))).

However, this inexact control would not be convenient to implement in applications because ε_k is expressed in terms of P_v(·), which is usually expensive to calculate exactly. In Theorem 5.3, we established the linear convergence of Algorithm IPGA-II with the inexact control given by (5.7) and (5.8). Our convergence analysis deviates significantly from that of [21], in which the KL inequality is used as a standard technique.

Theorem 5.4.
Let {x^k} be a sequence generated by Algorithm IPGA-I with {v_k} satisfying (5.4). Suppose that {x^k} converges to a global minimum x* of problem (1.1) and that (5.5) and (5.6) are satisfied for each k ∈ N with lim_{k→∞} τ_k = 0. Then {x^k} converges linearly to x*.

Proof. For simplicity, we write y^k ∈ P_{v_k}(x^k − v_k ∇H(x^k)) for each k ∈ N. By Proposition 5.1(i) (with y^k, x^k, v_k, 0, 0 in place of z, x, v, ξ, ε) and by (5.4), one has that

    (1/(2v̄) − ||A||²)||y^k − x^k||² ≤ F(x^k) − F(y^k) ≤ F(x^k) − min_{x∈R^n} F(x).

Then, by the assumption that {x^k} converges to a global minimum x* of F, we have that {y^k} also converges to this x*. By Theorem 3.2, it follows from (3.3) that 2A_I^⊤A_I + ∇²φ(x*_I) = ∇²f(x*_I) ≻ 0. This, together with x*_I ∈ R^s_≠ (cf. (2.3)) and the smoothness of φ at x*_I, implies that there exists 0 < δ < (vλp(1 − p))^{1/(2−p)} such that

    B(x*_I, δ) ⊆ R^s_≠ ∩ {y ∈ R^s : ∇²φ(y) ≻ −2A_I^⊤A_I}.                             (5.22)

By the convergence of {x^k} and {y^k} to x*, there exists N ∈ N such that

    x_I^k, y_I^k ∈ B(x*_I, δ/2),  x_{I^c}^k ∈ B(0, δ)  and  y_{I^c}^k = 0  for each k ≥ N   (5.23)

(the latter by Proposition 5.1(ii)). Fix k ≥ N. Then, by (5.6) and (5.1), we have that

    φ(x_{I^c}^{k+1}) + (1/(2v_k))||x_{I^c}^{k+1} − x_{I^c}^k + 2v_k A_{I^c}^⊤(Ax^k − b)||² ≤ ε_k + (1/(2v_k))||−x_{I^c}^k + 2v_k A_{I^c}^⊤(Ax^k − b)||².

This implies that

    φ(x_{I^c}^{k+1}) ≤ ε_k + (1/(2v_k))(||x_{I^c}^k||² − ||x_{I^c}^k − x_{I^c}^{k+1}||²) − 2⟨x_{I^c}^{k+1}, A_{I^c}^⊤(Ax^k − b)⟩.   (5.24)

Note that lim_{k→∞} x_{I^c}^k = 0 and lim_{k→∞} τ_k = 0. Then, by (5.24) and (5.6), there exists K > 0 such that

    ||x_{I^c}^{k+1}||_p^p ≤ K(||x_{I^c}^{k+1}|| + ||x_{I^c}^k||).

It follows from (2.1) (as p < 1) that

    (1 − K||x_{I^c}^{k+1}||^{1−p})||x_{I^c}^{k+1}||^p ≤ ||x_{I^c}^{k+1}||_p^p − K||x_{I^c}^{k+1}|| ≤ K||x_{I^c}^k||.

Since lim_{k→∞} x_{I^c}^k = 0, we may assume, without loss of generality, that ||x_{I^c}^{k+1}|| ≤ (2K)^{−1/(1−p)}. Hence

    ||x_{I^c}^{k+1}||^p ≤ 2K||x_{I^c}^k|| = (2K||x_{I^c}^k||^{1−p})||x_{I^c}^k||^p.

Let α_k := (2K||x_{I^c}^k||^{1−p})^{1/p}, so that ||x_{I^c}^{k+1}|| ≤ α_k||x_{I^c}^k||. Then it follows that

    ||x_{I^c}^{k+1} − x_{I^c}^k|| ≥ ||x_{I^c}^k|| − ||x_{I^c}^{k+1}|| ≥ ((1 − α_k)/α_k)||x_{I^c}^{k+1}||.   (5.25)

On the other hand, let f_k: R^s → R be an auxiliary function defined by

    f_k(y) := φ(y) + (1/(2v_k))||y − (x_I^k − 2v_k A_I^⊤(Ax^k − b))||²  for each y ∈ R^s.   (5.26)

Obviously, f_k is smooth on R^s_≠, and Taylor's formula for f_k at y_I^k gives

    f_k(y) = f_k(y_I^k) + ∇f_k(y_I^k)(y − y_I^k) + (1/2)⟨y − y_I^k, ∇²f_k(y_I^k)(y − y_I^k)⟩ + o(||y − y_I^k||²)  for y ∈ R^s.   (5.27)

By (5.26), it is clear that y_I^k ∈ argmin_{y∈R^s} f_k(y). Its first-order necessary optimality condition says that ∇f_k(y_I^k) = 0, and its second-order derivative is ∇²f_k(y_I^k) = ∇²φ(y_I^k) + (1/v_k)I_s, where I_s denotes the identity matrix in R^{s×s}. Note by (5.22) and (5.23) that ∇²φ(y_I^k) ≻ −2A_I^⊤A_I. Then ∇²f_k(y_I^k) ≻ (1/v_k)I_s − 2A_I^⊤A_I ⪰ (1/v̄)I_s − 2A_I^⊤A_I ≻ 0 (by (5.4)). Letting σ > 0 be the smallest eigenvalue of (1/v̄)I_s − 2A_I^⊤A_I, we obtain by (5.27) that

    f_k(y) ≥ f_k(y_I^k) + (σ/4)||y − y_I^k||²  for any y ∈ B(y_I^k, δ)                (5.28)

(otherwise we can select a smaller δ). By (5.23), one observes that

    ||x_I^{k+1} − y_I^k|| ≤ ||x_I^{k+1} − x*_I|| + ||y_I^k − x*_I|| ≤ δ,

and so (5.28) and (5.5) imply that

    ||x_I^{k+1} − y_I^k||² ≤ (4/σ)(f_k(x_I^{k+1}) − f_k(y_I^k)) ≤ (4τ_k/σ)||x_I^{k+1} − x_I^k||².

Note that y^k ∈ P_{v_k}(x^k − v_k ∇H(x^k)) is arbitrary. This, together with (5.25), shows that {x^k} can be regarded as a special sequence generated by Algorithm IPGA-II that satisfies (5.7) and (5.8) with max{α_k/(1 − α_k), 2(τ_k/σ)^{1/2}} in place of t_k. Since lim_{k→∞} τ_k = 0 and lim_{k→∞} α_k = 0 (by the definition of α_k), one has that lim_{k→∞} max{α_k/(1 − α_k), 2(τ_k/σ)^{1/2}} = 0, and so the conclusion directly follows from Theorem 5.3.

Noting that both ||·||_p^p and ||· − x||² in the proximal operator (5.1) are separable, we can propose two implementable inexact PGAs, Algorithms IPGA-Ip and IPGA-IIp, which are the parallel (component-wise) versions of Algorithms IPGA-I and IPGA-II, respectively.

Algorithm IPGA-Ip. Given an initial point x⁰ ∈ R^n, a sequence of stepsizes {v_k} ⊆ R₊ and a sequence of nonnegative scalars {τ_k} ⊆ R₊, for each k ∈ N, having x^k, we determine x^{k+1} by

    x_i^{k+1} ∈ P_{v_k, ε_k}((x^k − v_k ∇H(x^k))_i)  with  ε_k = τ_k|x_i^{k+1} − x_i^k|²  for each i = 1, ..., n.

Algorithm IPGA-IIp. Given an initial point x⁰ ∈ R^n, a sequence of stepsizes {v_k} ⊆ R₊ and a sequence of nonnegative scalars {t_k} ⊆ R₊, for each k ∈ N, having x^k, we determine x^{k+1} satisfying

    dist(x_i^{k+1}, (P_{v_k}(x^k − v_k ∇H(x^k)))_i) ≤ t_k|x_i^{k+1} − x_i^k|  for each i = 1, ..., n.

It is easy to verify that Algorithms IPGA-Ip and IPGA-IIp satisfy conditions (5.5)-(5.6) and (5.7)-(5.8), respectively, and so their linear convergence properties follow directly from Theorems 5.4 and 5.3.
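For illustration, the following Python sketch (our own construction; it reuses prox_lp_inexact from the earlier sketch in this section) implements one step of Algorithm IPGA-IIp: each scalar prox is recomputed with a shrinking tolerance until the relative-error test dist ≤ t_k |x_i^{k+1} − x_i^k| is met, with a small floor to handle components that barely move.

```python
import numpy as np

def ipga_iip_step(x, A, b, lam, p, v_k, t_k, tol0=1e-3, tol_floor=1e-12):
    """One step of Algorithm IPGA-IIp (component-wise inexact prox)."""
    z = x - 2.0 * v_k * A.T @ (A @ x - b)        # gradient step on H
    x_new = np.empty_like(x)
    for i, zi in enumerate(z):
        tol = tol0
        while True:
            cand = prox_lp_inexact(zi, lam, v_k, p, tol)
            # bisection locates the nonzero candidate to within tol (near
            # ties between the zero and nonzero candidates this test is a
            # heuristic), so the relative-error condition of IPGA-IIp holds
            # once tol <= t_k * |cand - x_i|.
            if tol <= t_k * abs(cand - x[i]) or tol <= tol_floor:
                x_new[i] = cand
                break
            tol *= 0.1
    return x_new
```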
6. Extension to infinite-dimensional cases.
This section extends the results of the preceding sections to infinite-dimensional Hilbert spaces. Throughout this section, we adopt the following notation. Let $\mathbb{H}$ be a Hilbert space, and let $\ell^2$ denote the Hilbert space consisting of all square-summable sequences. We consider the following $\ell_p$ regularized least squares problem in infinite-dimensional Hilbert spaces:
\[
\min_{x\in\ell^2} F(x) := \|Ax - b\|^2 + \sum_{i=1}^{\infty}\lambda_i|x_i|^p, \tag{6.1}
\]
where $A : \ell^2 \to \mathbb{H}$ is a bounded linear operator, and $\lambda := (\lambda_i)$ is a sequence of weights satisfying
\[
\lambda_i \ge \lambda > 0 \quad \text{for each } i \in \mathbb{N}. \tag{6.2}
\]
We start with some useful properties of the (inexact) descent methods and then present the linear convergence of (inexact) descent methods and the PGA for solving problem (6.1).

Proposition 6.1.
Let $\{x^k\} \subseteq \ell^2$ be a sequence satisfying (H1$^\circ$) and (H2$^\circ$), and let $\{\epsilon_k\}$ satisfy (4.2). Then there exist $N \in \mathbb{N}$ and a finite index set $J \subseteq \mathbb{N}$ such that
\[
\mathrm{supp}(x^k) = J \quad \text{for each } k \ge N. \tag{6.3}
\]
Proof. Fix $k \in \mathbb{N}$. By (H1$^\circ$), one has that
\[
F(x^k) \le F(x^{k-1}) - \alpha\|x^k - x^{k-1}\|^2 + \epsilon_{k-1} \le F(x^{k-1}) + \epsilon_{k-1} \le F(x^0) + \sum_{i=0}^{\infty}\epsilon_i < +\infty
\]
(due to (4.2)). Then, it follows from (2.1) and (6.2) that
\[
\|x^k\|^p \le \|x^k\|_p^p \le \lambda^{-1}\sum_{i=1}^{\infty}\lambda_i|x_i^k|^p \le \lambda^{-1}F(x^k) < +\infty.
\]
Then $\{x^k\}$ is bounded; denote the upper bound of their norms by $M$. Let
\[
\tau := \min\left\{\beta^{-1}, \left(\frac{\lambda p}{2\|A\|^2 M + 2\|A\|\|b\| + 2}\right)^{\frac{1}{1-p}}\right\} \ (> 0). \tag{6.4}
\]
Note by Proposition 4.1(i) that $\lim_{k\to\infty}\|x^{k+1} - x^k\| = 0$, which, together with (4.2), shows that there exists $N \in \mathbb{N}$ such that
\[
\|x^{k+1} - x^k\| \le \tau \quad \text{and} \quad \epsilon_k < 1 \quad \text{for each } k \ge N. \tag{6.5}
\]
We claim that the following implication is true for each $k > N$ and $i \in \mathbb{N}$:
\[
x_i^k \neq 0 \quad \Rightarrow \quad |x_i^k| > \tau; \tag{6.6}
\]
hence, this, together with (6.5), implies (6.3), as desired. (Indeed, by (6.6) and (6.5), $x_i^k \neq 0$ implies $|x_i^{k+1}| \ge |x_i^k| - \|x^{k+1} - x^k\| > 0$, so the supports are nondecreasing for large $k$; moreover, (6.6) and $\|x^k\| \le M$ yield $\#\,\mathrm{supp}(x^k) \le (M/\tau)^2$, so the nondecreasing supports must eventually stabilize at some finite set $J$.)

Finally, we complete the proof by showing (6.6). Fix $k > N$ and $i \in \mathbb{N}$, and suppose that $x_i^k \neq 0$. Then, it follows from (6.2) and (H2$^\circ$) that
\[
\big|\lambda_i p|x_i^k|^{p-1}\mathrm{sgn}(x_i^k) + 2A_i^\top(Ax^k - b)\big| \le \|w^k\| \le \beta\|x^k - x^{k-1}\| + \epsilon_k < \beta\tau + 1 \le 2
\]
(by (6.5) and (6.4)). Noting that $\|x^k\| \le M$, we obtain from the above relation that
\[
\lambda p|x_i^k|^{p-1} \le \lambda_i p|x_i^k|^{p-1} < 2 + 2\|A\|(\|A\|M + \|b\|),
\]
and hence
\[
|x_i^k| > \left(\frac{\lambda p}{2\|A\|^2 M + 2\|A\|\|b\| + 2}\right)^{\frac{1}{1-p}} \ge \tau
\]
(by (6.4)), which verifies (6.6), as desired.

Remark 6.1. (i) Problem (6.1) for the $n$-dimensional Euclidean space has an equivalent formulation to that of problem (1.1). Indeed, let
\[
u_i := \Big(\frac{\lambda_i}{\lambda}\Big)^{\frac{1}{p}} x_i \quad \text{and} \quad K_i := \Big(\frac{\lambda}{\lambda_i}\Big)^{\frac{1}{p}} A_i \quad \text{for } i = 1,\dots,n.
\]
Then problem (6.1) is reformulated as $\min_{u\in\mathbb{R}^n}\|Ku - b\|^2 + \lambda\|u\|_p^p$, which is (1.1) with $K$ and $u$ in place of $A$ and $x$.

(ii) It is easy to verify by similar proofs that Theorem 3.2 and Corollary 3.4 are also true for problem (6.1) in infinite-dimensional Hilbert spaces.
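To make the reformulation in Remark 6.1(i) concrete, the following short Python check verifies numerically that the rescaling leaves the objective value unchanged; all names and the random data are illustrative and not part of the paper.

import numpy as np

rng = np.random.default_rng(0)
n, m, p, lam = 8, 5, 0.5, 0.1
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
lam_i = lam * (1.0 + rng.random(n))            # weights with lam_i >= lam > 0
x = rng.standard_normal(n)

u = (lam_i / lam) ** (1.0 / p) * x             # u_i := (lam_i/lam)^{1/p} x_i
K = A * (lam / lam_i) ** (1.0 / p)             # K_i := (lam/lam_i)^{1/p} A_i (columnwise)

F_weighted = np.linalg.norm(A @ x - b) ** 2 + np.sum(lam_i * np.abs(x) ** p)
F_uniform = np.linalg.norm(K @ u - b) ** 2 + lam * np.sum(np.abs(u) ** p)
assert np.isclose(F_weighted, F_uniform)       # the two objective values agree

In particular, any solver for the uniformly weighted problem (1.1) applies verbatim to finite-dimensional instances of (6.1).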
Theorem 6.2. Let $\{x^k\} \subseteq \ell^2$ be a sequence satisfying (H1) and (H2). Then $\{x^k\}$ converges to a critical point $x^*$ of problem (6.1). Suppose that $x^*$ is a local minimum of problem (6.1). Then $\{x^k\}$ converges linearly to $x^*$.

Proof. By the assumptions, it follows from Proposition 6.1 that there exist $N \in \mathbb{N}$ and a finite index set $J$ such that (6.3) is satisfied. Let $f_J : \mathbb{R}^{|J|} \to \mathbb{R}$ be the function defined by
\[
f_J(y) := \|A_J y - b\|^2 + \sum_{i\in J}\lambda_i|y_i|^p \quad \text{for any } y \in \mathbb{R}^{|J|}.
\]
By (6.3), $\{x_J^k\}_{k\ge N}$ satisfies (H1) and (H2) with $x_J^k$ and $f_J$ in place of $x^k$ and $F$. Hence, the convergence of $\{x_J^k\}$ to a critical point $x_J^*$ of $f_J$ directly follows from Theorem 4.3. Let $x_{J^c}^* = 0$. Then, by (6.3), it follows that $\{x^k\}$ converges to this $x^*$, which is a critical point of problem (6.1). Furthermore, suppose that $x^*$ is a local minimum of problem (6.1). Then $x_J^*$ is also a local minimum of $f_J$, and so the linear convergence of $\{x_J^k\}$ to $x_J^*$ also follows from Theorem 4.3. Then, by (6.3), we conclude that $\{x^k\}$ converges linearly to this $x^*$.

Theorem 6.3.
Let $\{x^k\} \subseteq \ell^2$ be a sequence satisfying (H1$^\circ$), and let $\{\epsilon_k\}$ satisfy (4.2). Suppose that one of the limiting points of $\{x^k\}$, denoted by $x^*$, is a local minimum of problem (6.1). Then the following assertions are true.

(i) $\{x^k\}$ converges to $x^*$.

(ii) Suppose further that $\{x^k\}$ satisfies (H2$^\circ$) and $\{\epsilon_k\}$ satisfies (4.7). Then $\{x^k\}$ converges linearly to $x^*$.

Proof. The proofs of assertions (i) and (ii) follow lines of analysis similar to those of assertion (i) of Theorem 4.2 (recalling from Remark 6.1(ii) that Corollary 3.4 remains true in the infinite-dimensional case) and of Theorem 6.2, respectively. The details are omitted.

Bredies et al. [10] investigated the PGA for solving problem (6.1) in infinite-dimensional Hilbert spaces and proved that the generated sequence converges to a critical point under the following additional assumptions: (a) $\{x \in \ell^2 : A^\top Ax = \|A^\top A\|x\}$ is finite-dimensional, (b) $\|A^\top A\|$ is not an accumulation point of the eigenvalues of $A^\top A$, (c) $A$ satisfies a finite basis injectivity property, and (d) $p$ is rational. Dropping these technical assumptions, we prove the global convergence of the PGA only under the commonly made assumption on stepsizes, which significantly improves [10, Theorem 5.1], and we further establish its linear convergence under a simple additional assumption in the following theorem. Recall from [2, Theorem 5.1] that the sequence $\{x^k\}$ generated by Algorithm PGA satisfies conditions (H1) and (H2) under assumption (5.4). Hence, as an application of Theorem 6.2, the results in the following theorem directly follow.

Theorem 6.4.
Let $\{x^k\} \subseteq \ell^2$ be a sequence generated by Algorithm PGA with $\{v_k\}$ satisfying (5.4). Then $\{x^k\}$ converges to a critical point $x^*$ of problem (6.1). Furthermore, suppose that $x^*$ is a local minimum of problem (6.1). Then $\{x^k\}$ converges linearly to $x^*$.

Let $x^*$ be a local minimum of problem (6.1). It was reported in [16, Theorem 2.1(i)] that
\[
|x_i^*| \ge \left(\frac{\lambda p(1-p)}{2\|A_i\|^2}\right)^{\frac{1}{2-p}} \quad \text{for each } i \in \mathrm{supp}(x^*).
\]
Since $\|A_i\| \le \|A\|$ for each $i$ and $x^* \in \ell^2$, this indicates that $\mathrm{supp}(x^*)$ is a finite index set. Then, following the proof lines of Theorems 5.2-5.4, we can obtain the linear convergence of inexact PGAs in infinite-dimensional Hilbert spaces, which is provided in the following theorems.
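Before stating them, we note that the lower bound displayed above is directly computable. The following Python sketch (the function names are ours, and $A$ is represented by a matrix whose $i$-th column plays the role of $A_i$; this is an illustration in a finite-dimensional instance, not part of the proofs) evaluates the bound and uses it to prune entries that cannot belong to the support of any local minimum.

import numpy as np

def nonzero_entry_lower_bound(A, lam, p):
    # Bound of [16, Theorem 2.1(i)]: every nonzero entry x*_i of a local
    # minimum satisfies |x*_i| >= (lam*p*(1-p) / (2*||A_i||^2))**(1/(2-p)).
    col_sq_norms = np.sum(A * A, axis=0)        # ||A_i||^2 for each column i
    return (lam * p * (1.0 - p) / (2.0 * col_sq_norms)) ** (1.0 / (2.0 - p))

def prune_spurious_entries(x, A, lam, p):
    # Entries smaller than the bound cannot survive in a local minimum, so
    # they may be set to zero in a candidate solution.
    return np.where(np.abs(x) >= nonzero_entry_lower_bound(A, lam, p), x, 0.0)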
Theorem 6.5. Let $\{x^k\} \subseteq \ell^2$ be a sequence generated by Algorithm IPGA-I with $\{v_k\}$ satisfying (5.4). Then the following assertions are true.

(i) Suppose that (4.2) is satisfied, and that one of the limiting points of $\{x^k\}$, denoted by $x^*$, is a local minimum of problem (6.1). Then $\{x^k\}$ converges to $x^*$.

(ii) Suppose that $\{x^k\}$ converges to a global minimum $x^*$ of problem (6.1) and that (5.5) and (5.6) are satisfied for each $k \in \mathbb{N}$ with $\lim_{k\to\infty}\tau_k = 0$. Then $\{x^k\}$ converges linearly to $x^*$.
Theorem 6.6.
Let $\{x^k\} \subseteq \ell^2$ be a sequence generated by Algorithm IPGA-II with $\{v_k\}$ satisfying (5.4). Suppose that $\{x^k\}$ converges to a local minimum $x^*$ of problem (6.1) and that (5.7) and (5.8) are satisfied for each $k \in \mathbb{N}$ with $\lim_{k\to\infty} t_k = 0$. Then $\{x^k\}$ converges linearly to $x^*$.

Remark 6.2.
Algorithms IPGA-Ip and IPGA-IIp, the parallel versions of Algorithms IPGA-I and IPGA-II, are implementable for solving problem (6.1) in infinite-dimensional Hilbert spaces, and the generated sequences share the same linear convergence properties as shown in Theorems 6.5 and 6.6, respectively.
REFERENCES
[1] H. Attouch, J. Bolte, P. Redont, and A. Soubeyran, Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the Kurdyka-Łojasiewicz inequality, Math. Oper. Res., 35 (2010), pp. 438–457.
[2] H. Attouch, J. Bolte, and B. F. Svaiter, Convergence of descent methods for semi-algebraic and tame problems: Proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods, Math. Program., 137 (2013), pp. 91–129.
[3] F. Bach, R. Jenatton, J. Mairal, and G. Obozinski, Structured sparsity through convex optimization, Statist. Sci., 27 (2012), pp. 450–468.
[4] A. Beck and M. Teboulle, A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM J. Imaging Sci., 2 (2009), pp. 183–202.
[5] D. P. Bertsekas, Nonlinear Programming, Athena Scientific, Cambridge, 1999.
[6] T. Blumensath and M. E. Davies, Iterative thresholding for sparse approximations, J. Fourier Anal. Appl., 14 (2008), pp. 629–654.
[7] J. Bolte, T. P. Nguyen, J. Peypouquet, and B. W. Suter, From error bounds to the complexity of first-order descent methods for convex functions, Math. Program., (2016), pp. 1–37.
[8] J. Bolte, S. Sabach, and M. Teboulle, Proximal alternating linearized minimization for nonconvex and nonsmooth problems, Math. Program., 146 (2013), pp. 459–494.
[9] K. Bredies and D. A. Lorenz, Linear convergence of iterative soft-thresholding, J. Fourier Anal. Appl., 14 (2008), pp. 813–837.
[10] K. Bredies, D. A. Lorenz, and S. Reiterer, Minimization of non-smooth, non-convex functionals by iterative thresholding, J. Optim. Theory App., 165 (2015), pp. 78–112.
[11] R. S. Burachik and A. Rubinov, Abstract convexity and augmented Lagrangians, SIAM J. Optim., 18 (2007), pp. 413–436.
[12] E. Candès and T. Tao, Decoding by linear programming, IEEE Trans. Inform. Theory, 51 (2005), pp. 4203–4215.
[13] W. Cao, J. Sun, and Z. Xu, Fast image deconvolution using closed-form thresholding formulas of L_q (q = 1/2, 2/3) regularization, J. Vis. Commun. Image R., 24 (2013), pp. 31–41.
[14] R. Chartrand and V. Staneva, Restricted isometry properties and nonconvex compressive sensing, Inverse Probl., 24 (2008), pp. 1–14.
[15] X. Chen, Smoothing methods for nonsmooth, nonconvex minimization, Math. Program., 134 (2012), pp. 71–99.
[16] X. Chen, F. Xu, and Y. Ye, Lower bound theory of nonzero entries in solutions of ℓ_2-ℓ_p minimization, SIAM J. Sci. Comput., 32 (2010), pp. 2832–2852.
[17] P. L. Combettes and V. R. Wajs, Signal recovery by proximal forward-backward splitting, Multiscale Model. Sim., 4 (2005), pp. 1168–1200.
[18] I. Daubechies, M. Defrise, and C. De Mol, An iterative thresholding algorithm for linear inverse problems with a sparsity constraint, Commun. Pur. Appl. Math., 57 (2004), pp. 1413–1457.
[19] D. L. Donoho, Compressed sensing, IEEE Trans. Inform. Theory, 52 (2006), pp. 1289–1306.
[20] M. Elad, Sparse and Redundant Representations, Springer, New York, 2010.
[21] P. Frankel, G. Garrigos, and J. Peypouquet, Splitting methods with variable metric for Kurdyka-Łojasiewicz functions and general convergence rates, J. Optim. Theory App., 165 (2015), pp. 874–900.
[22] D. Ge, X. Jiang, and Y. Ye, A note on the complexity of L_p minimization, Math. Program., 129 (2011), pp. 285–299.
[23] Y. Hu, C. Li, K. Meng, J. Qin, and X. Yang, Group sparse optimization via ℓ_{p,q} regularization, J. Mach. Learn. Res., 18 (2017), pp. 1–52.
[24] Y. Hu, C. Li, and X. Yang, On convergence rates of linearized proximal algorithms for convex composite optimization with applications, SIAM J. Optim., 26 (2016), pp. 1207–1235.
[25] X. Huang and X. Yang, A unified augmented Lagrangian approach to duality and exact penalization, Math. Oper. Res., 28 (2003), pp. 533–552.
[26] M. Lai and J. Wang, An unconstrained ℓ_q minimization with 0 < q ≤ 1 for sparse solution of underdetermined linear systems, SIAM J. Optim., 21 (2011), pp. 82–101.
[27] G. Li and T. K. Pong, Douglas-Rachford splitting for nonconvex optimization with application to nonconvex feasibility problems, Math. Program., 159 (2015), pp. 1–31.
[28] G. Li and T. K. Pong, Global convergence of splitting methods for nonconvex composite optimization, SIAM J. Optim., 25 (2015), pp. 2434–2460.
[29] Z. Lu, Iterative reweighted minimization methods for l_p regularized unconstrained nonlinear programming, Math. Program., 147 (2014), pp. 277–307.
[30] Z. Lu and Y. Zhang, Sparse approximation via penalty decomposition methods, SIAM J. Optim., 23 (2013), pp. 2448–2478.
[31] Z. Luo, J. Pang, and D. Ralph, Mathematical Programs with Equilibrium Constraints, Cambridge University Press, Cambridge, 1996.
[32] S. Ma, D. Goldfarb, and L. Chen, Fixed point and Bregman iterative methods for matrix rank minimization, Math. Program., 128 (2011), pp. 321–353.
[33] J. Mairal, Incremental majorization-minimization optimization with application to large-scale machine learning, SIAM J. Optim., 25 (2015), pp. 829–855.
[34] G. Marjanovic and V. Solo, On l_q optimization and sparse inverse covariance selection, IEEE Trans. Sig. Proc., 62 (2014), pp. 1644–1654.
[35] Y. Nesterov, Gradient methods for minimizing composite functions, Math. Program., 140 (2013), pp. 125–161.
[36] P. Ochs, Y. Chen, T. Brox, and T. Pock, iPiano: Inertial proximal algorithm for nonconvex optimization, SIAM J. Imaging Sci., 7 (2014), pp. 1388–1419.
[37] M. Nikolova, Description of the minimizers of least squares regularized with ℓ_0-norm. Uniqueness of the global minimizer, SIAM J. Imaging Sci., 6 (2013), pp. 904–937.
[38] J. K. Pant, W. S. Lu, and A. Antoniou, New improved algorithms for compressive sensing based on ℓ_p norm, IEEE Trans. Circuits-II, 61 (2014), pp. 198–202.
[39] J. Qin, Y. H. Hu, F. Xu, H. K. Yalamanchili, and J. Wang, Inferring gene regulatory networks by integrating ChIP-seq/chip and transcriptome data via LASSO-type regularization methods, Methods, 67 (2014), pp. 294–303.
[40] M. Razaviyayn, M. Hong, and Z. Luo, A unified convergence analysis of block successive minimization methods for nonsmooth optimization, SIAM J. Optim., 23 (2013), pp. 1126–1153.
[41] R. T. Rockafellar, Monotone operators and the proximal point algorithm, SIAM J. Control Optim., 14 (1976), pp. 877–898.
[42] R. T. Rockafellar and R. J.-B. Wets, Variational Analysis, Springer-Verlag, Berlin, 1998.
[43] M. Schmidt, N. L. Roux, and F. Bach, Convergence rates of inexact proximal-gradient methods for convex optimization, in Advances in Neural Information Processing Systems 24 (2011), pp. 1458–1466.
[44] N. Simon, J. Friedman, T. Hastie, and R. Tibshirani, A sparse-group Lasso, J. Comput. Graph. Stat., 22 (2013), pp. 231–245.
[45] S. Tao, D. Boley, and S. Zhang, Local linear convergence of ISTA and FISTA on the LASSO problem, SIAM J. Optim., 26 (2016), pp. 313–336.
[46] P. Tseng, Approximation accuracy, gradient methods, and error bound for structured convex optimization, Math. Program., 125 (2010), pp. 263–295.
[47] P. Tseng and S. Yun, A coordinate gradient descent method for nonsmooth separable minimization, Math. Program., 117 (2009), pp. 387–423.
[48] J. Wang, Y. Hu, C. Li, and J.-C. Yao, Linear convergence of CQ algorithms and applications in gene regulatory network inference, Inverse Probl., 33 (2017), 055017.
[49] J. Wang, C. Li, G. Lopez, and J.-C. Yao, Proximal point algorithms on Hadamard manifolds: Linear convergence and finite termination, SIAM J. Optim., 26 (2017), pp. 2696–2729.
[50] B. Wen, X. Chen, and T. K. Pong, Linear convergence of proximal gradient algorithm with extrapolation for a class of nonconvex nonsmooth minimization problems, SIAM J. Optim., 27 (2017), pp. 124–145.
[51] L. Xiao and T. Zhang, A proximal-gradient homotopy method for the sparse least-squares problem, SIAM J. Optim., 23 (2013), pp. 1062–1091.
[52] Y. Xu and W. Yin, A block coordinate descent method for regularized multiconvex optimization with applications to nonnegative tensor factorization and completion, SIAM J. Imaging Sci., 6 (2013), pp. 1758–1789.
[53] Z. Xu, X. Chang, F. Xu, and H. Zhang, L_{1/2} regularization: A thresholding representation theory and a fast solver, IEEE Trans. Neur. Net. Lear., 23 (2012), pp. 1013–1027.
[54] J. Yang and Y. Zhang, Alternating direction algorithms for ℓ_1-problems in compressive sensing, SIAM J. Sci. Comput., 33 (2011), pp. 250–278.
[55] J. Zeng, S. Lin, and Z. Xu, Sparse regularization: Convergence of iterative jumping thresholding algorithm, IEEE Trans. Sig. Proc., 64 (2016), pp. 5106–5118.
[56] H. Zhang, J. Jiang, and Z.-Q. Luo, On the linear convergence of a proximal gradient method for a class of nonsmooth convex minimization problems, J. Oper. Res. Soc. China, 1 (2013), pp. 163–186.
[57] L. Zhang, Y. Hu, C. Li, and J.-C. Yao, A new linear convergence result for the iterative soft thresholding algorithm, Optimization, 66 (2017), pp. 1177–1189.
[58]