Convergence Analysis of A Variable Metric Proximal Linearized ADMM with Over-Relaxation Parameter in Nonconvex Nonsmooth Optimization
Maryam Yashtini ∗ September 14, 2020
Abstract
We introduce a variable metric proximal linearized ADMM (VMP-LADMM) algorithm with an over-relaxation parameter $\beta \in (0, 2)$ in the multiplier update, and develop its theoretical analysis. The algorithm solves a broad class of linearly constrained nonconvex and nonsmooth minimization problems. Under mild assumptions, we show that the sequence generated by VMP-LADMM is bounded. Based on the powerful Łojasiewicz and Kurdyka-Łojasiewicz properties, we establish that the sequence converges globally to a critical point, and we derive convergence rates.
The alternating direction method of multipliers (ADMM) [11, 15, 17, 20, 21] is one of the most successful approaches to solving linearly constrained optimization problems. The ADMM algorithm is closely related to the Douglas-Rachford [16] and Peaceman-Rachford [35] operator splitting methods that date back to the 1950s. Owing to its success in solving structured convex optimization problems, ADMM has been widely used in applications such as machine learning, statistics, compressive sensing, image and signal processing, and sparse and low-rank optimization; references include [12, 21, 38, 43, 44, 46, 48] and the surveys [11, 18].

Theoretical analysis of ADMM and its variants, including linearized ADMM and proximal ADMM, has been established extensively in the context of convex optimization; see [5, 14, 15, 17, 20, 22, 31] and references therein. Boley [5] studied the local linear convergence for solving quadratic and linear programs. In [22], Hager, Yashtini, and Zhang established the ergodic convergence rate of a proximal linearized ADMM method [14], in which the proximal parameter is updated through a backtracking line search strategy. Deng and Yin [15] studied the convergence rate of a general ADMM method in which a proximal term is added to each subproblem. Eckstein and Bertsekas [17] showed the linear convergence of ADMM for solving linear programs, which depends on a bound on the largest iterate in the course of the algorithm.
Lions and Mercier [31] showed that the Douglas-Rachford operator splitting method converges linearly under the assumption that some of the involved monotone operators are both coercive and Lipschitz. The theory of multi-block ADMM, that is, ADMM with more than two blocks for minimizing the sum of more than two convex functions, has also been established [13, 23, 24, 28, 29, 30]. Without assuming strong convexity of the objective function, Hong and Luo [25] showed that a linear convergence rate can be achieved for ADMM by including an over-relaxation parameter/stepsize in the multiplier update.

∗ [email protected], Georgetown University, Department of Mathematics and Statistics, 327A St. Mary's Hall, 37th and O Streets, N.W., Washington D.C. 20057. Phone: (202) 687-6214. Fax: (202) 687.6067

ADMM-based methodologies have been developed to solve nonconvex and possibly nonsmooth optimization problems. Applications include phase retrieval [42], distributed clustering [19], image inpainting [49], sparse zero variance discriminant analysis [1], image denoising [48], image colorization [47], image reconstruction [45], matrix separation [37], and sparse feedback control [27], to name a few. In terms of theoretical advancements, some recent works include [10, 26, 32, 34, 39, 40, 41]. The work [41] proved the global convergence of a multi-block version of classical ADMM. Under the Łojasiewicz/Kurdyka-Łojasiewicz property [33], Bot and Nguyen established the convergence and convergence rates of proximal and proximal linearized ADMM [10]. When the functionals are either smooth/nonconvex or convex/nonsmooth, [26] analyzed the convergence of multi-block ADMM for solving a family of nonconvex consensus and sharing problems. The iteration complexity of two classes of fully and partially linearized multi-block ADMM, with a relaxation parameter $\beta$ in the multiplier update, was established in [34].
In [40], the authors show that ADMM is closely related to Douglas-Rachford splitting and Peaceman-Rachford splitting, and establish a unifying global convergence result for nonconvex problems.

In this paper, we study the global convergence of a variant of the ADMM algorithm for solving the following linearly constrained nonconvex and nonsmooth minimization problem
\[ \min_{x,y} \; F(x,y) := f(x) + g(x) + h(y) \quad \text{s.t.} \quad Ax + By + c = 0, \tag{1} \]
where $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^m$ are unknown variables, $A : \mathbb{R}^n \to \mathbb{R}^p$ and $B : \mathbb{R}^m \to \mathbb{R}^p$ are linear operators, $c \in \mathbb{R}^p$, $f : \mathbb{R}^n \to (-\infty, +\infty]$ is a proper nonsmooth function, while $g : \mathbb{R}^n \to (-\infty, +\infty]$ and $h : \mathbb{R}^m \to (-\infty, +\infty]$ are smooth functionals. We make no convexity assumption on $f$, $g$, and $h$. The augmented Lagrangian functional $\mathcal{L}_\alpha : \mathbb{R}^n \times \mathbb{R}^m \times \mathbb{R}^p \to \mathbb{R}$ associated with problem (1) is defined by
\[ \mathcal{L}_\alpha(x,y,z) = f(x) + g(x) + h(y) + \langle z, Ax + By + c \rangle + \frac{\alpha}{2} \| Ax + By + c \|^2, \tag{2} \]
where $\alpha > 0$ and $z \in \mathbb{R}^p$ is the Lagrange multiplier associated with the constraint $Ax + By + c = 0$.

Let $\{Q_1^k\}_{k \ge 0} \subseteq \mathbb{R}^{n \times n}$ and $\{Q_2^k\}_{k \ge 0} \subseteq \mathbb{R}^{m \times m}$ be two sequences of symmetric positive definite matrices. Given the initial vector $(x^0, y^0, z^0)$, and for $k = 1, 2, \dots$ until some stopping criterion is satisfied, the variable metric proximal ADMM algorithm generates the sequence $\{(x^k, y^k, z^k)\}_{k \ge 0}$ recursively as follows:
\[
\begin{aligned}
x^{k+1} &= \arg\min_{x} \; \mathcal{L}_\alpha(x, y^k, z^k) + \tfrac{1}{2} \| x - x^k \|_{Q_1^k}^2, \\
y^{k+1} &= \arg\min_{y} \; \mathcal{L}_\alpha(x^{k+1}, y, z^k) + \tfrac{1}{2} \| y - y^k \|_{Q_2^k}^2, \\
z^{k+1} &= z^k + \alpha \, (Ax^{k+1} + By^{k+1} + c),
\end{aligned}
\tag{3}
\]
where $\|v\|_Q^2 = \langle v, Qv \rangle$ for any $v \in \mathbb{R}^d$ and $Q \in \mathbb{R}^{d \times d}$, $\langle \cdot, \cdot \rangle$ denotes the Euclidean inner product, and $\|\cdot\| = \sqrt{\langle \cdot, \cdot \rangle}$ denotes the $\ell_2$ norm.
Algorithm (3) can be equivalently written as follows:
\[
\begin{aligned}
x^{k+1} &\in \arg\min_{x \in \mathbb{R}^n} \; f(x) + g(x) + \langle z^k, Ax \rangle + \tfrac{\alpha}{2} \| Ax + By^k + c \|^2 + \tfrac{1}{2} \| x - x^k \|_{Q_1^k}^2, \\
y^{k+1} &= \arg\min_{y \in \mathbb{R}^m} \; h(y) + \langle z^k, By \rangle + \tfrac{\alpha}{2} \| By + Ax^{k+1} + c \|^2 + \tfrac{1}{2} \| y - y^k \|_{Q_2^k}^2, \\
z^{k+1} &= z^k + \alpha \, (Ax^{k+1} + By^{k+1} + c).
\end{aligned}
\]
We take advantage of the smooth structure of $g(\cdot)$, $\tfrac{\alpha}{2}\|A \cdot + By^k + c\|^2$, and $h(\cdot)$, and replace them by their linearizations for more efficiency. This gives the Variable Metric Proximal Linearized ADMM (VMP-LADMM) algorithm:
\[
\begin{aligned}
x^{k+1} &\in \arg\min_{x \in \mathbb{R}^n} \hat f^k(x), \quad
\hat f^k(x) := f(x) + \big\langle \nabla g(x^k) + \alpha A^*\big(Ax^k + By^k + c + \alpha^{-1} z^k\big), \, x - x^k \big\rangle + \tfrac{1}{2} \| x - x^k \|_{Q_1^k}^2, \\
y^{k+1} &= \arg\min_{y \in \mathbb{R}^m} \hat h^k(y), \quad
\hat h^k(y) := \big\langle \nabla h(y^k) + B^* z^k, \, y - y^k \big\rangle + \tfrac{\alpha}{2} \| By + Ax^{k+1} + c \|^2 + \tfrac{1}{2} \| y - y^k \|_{Q_2^k}^2, \\
z^{k+1} &= z^k + \alpha\beta \, (Ax^{k+1} + By^{k+1} + c),
\end{aligned}
\tag{4}
\]
where an over-relaxation parameter $\beta \in (0, 2)$ is added to the multiplier update.

The VMP-LADMM algorithm is related to, but different from, [32, 10]. Work [32] considers VMP-LADMM with the proximal terms $\tfrac{L_x}{2}\|x - x^k\|^2$ and $\tfrac{L_y}{2}\|y - y^k\|^2$, where $L_x > 0$ and $L_y > 0$, and with $\beta = 1$. Algorithm 2 in [10] does not exploit linearization of $\tfrac{\alpha}{2}\|A \cdot + By^k + c\|^2$ in the $x$ subproblem, and solves (1) with $g(x) = 0$, $A = -I_{n \times n}$ and $c = 0$. Note that wise choices of the proximal matrices $\{Q_i^k\}_{k \ge 0}$ for $i = 1, 2$ can simplify the computation of $x^{k+1}$ and $y^{k+1}$, and consequently yield a more efficient scheme. For instance, for a positive sequence $\{t_k\}_{k \ge 0}$ and $Q_1^k = \tfrac{1}{t_k} I_n$, where $I_n$ is the $n \times n$ identity matrix, the $x$ subproblem in (4) leads to the following prox-linear problem
\[ x^{k+1} := \arg\min_{x \in \mathbb{R}^n} \Big\{ f(x) + \langle p^k, x - x^k \rangle + \frac{1}{2 t_k} \| x - x^k \|^2 \Big\}, \tag{5} \]
where $p^k := \nabla g(x^k) + \alpha A^*(Ax^k + By^k + c + \alpha^{-1} z^k)$. Prox-linear subproblems can be easier to compute, especially when $f$ is a separable function. The $y$ subproblem in (4) contains the second-order term $\tfrac{\alpha}{2} y^* B^* B y$. If $B^* B$ is nearly a diagonal matrix (or nearly an orthogonal matrix), one can replace $B^* B$ by a certain symmetric diagonal (orthogonal) matrix $D \approx B^* B$. This replacement gives rise to $\tfrac{\alpha}{2} y^* B^* B y \approx \tfrac{\alpha}{2} y^* D y$, and one can then choose $Q_2^k = \alpha (D - B^* B)$ for efficiency.

The main contribution of this paper is the establishment of global convergence and convergence rate analysis of the VMP-LADMM algorithm (4). We prove that the sequence generated by the VMP-LADMM algorithm is bounded (Theorem 1), and that any limit point is a stationary point (Lemma 3). In Theorem 2 we show that the sequence generated by the VMP-LADMM algorithm is Cauchy, hence converges to a unique limit point. The rate of convergence for the error of the regularized augmented Lagrangian is established in Theorem 3.
This consequently provides the rate for the objective functional error since, by Lemma 11, the objective function, the augmented Lagrangian, and the regularized augmented Lagrangian all approach the same limit. We derive the rate of convergence for the error of the sequence generated by VMP-LADMM in Theorem 4.

Notation and Preliminary Facts
The Euclidean scalar product on $\mathbb{R}^n$ and its corresponding norm are denoted by $\langle \cdot, \cdot \rangle$ and $\|\cdot\| = \sqrt{\langle \cdot, \cdot \rangle}$, respectively. If $n_1, \dots, n_p \in \mathbb{Z}_+$ and $p \in \mathbb{Z}_+$, then for any $v := (v_1, \dots, v_p) \in \mathbb{R}^{n_1} \times \mathbb{R}^{n_2} \times \cdots \times \mathbb{R}^{n_p}$ and $v' := (v'_1, \dots, v'_p) \in \mathbb{R}^{n_1} \times \mathbb{R}^{n_2} \times \cdots \times \mathbb{R}^{n_p}$, the scalar product on the Cartesian product and its norm are defined by
\[ \langle\langle v, v' \rangle\rangle = \sum_{i=1}^{p} \langle v_i, v'_i \rangle, \qquad \frac{1}{\sqrt{p}} \sum_{i=1}^{p} \|v_i\| \le |||v||| = \sqrt{\sum_{i=1}^{p} \|v_i\|^2} \le \sum_{i=1}^{p} \|v_i\|. \]
We denote by $I_n$ the $n \times n$ identity matrix. The minimum eigenvalue of a matrix $A \in \mathbb{R}^{n \times n}$ is denoted by $\lambda^{A}_{\min}$ and its maximum eigenvalue by $\lambda^{A}_{\max}$. For a sequence $\{u^k\}_{k \ge 1}$, $\Delta u^k := u^k - u^{k-1}$ for all $k \ge 1$.

Let $\Phi : \mathbb{R}^d \to (-\infty, +\infty]$ be a proper and lower semicontinuous function. The domain of $\Phi$, denoted $\operatorname{dom} \Phi$, is defined by
\[ \operatorname{dom} \Phi := \{ x \in \mathbb{R}^d : \Phi(x) < +\infty \}. \]
The graph of $\Phi$, denoted $\operatorname{Graph} \Phi$, is defined by
\[ \operatorname{Graph} \Phi := \{ (x, y) \in \mathbb{R}^d \times \mathbb{R} : y = \Phi(x) \}. \]
For any $x \in \operatorname{dom} \Phi$, the Fréchet (viscosity) subdifferential of $\Phi$ at $x$, denoted $\hat\partial \Phi(x)$, is defined by
\[ \hat\partial \Phi(x) = \Big\{ s \in \mathbb{R}^d : \liminf_{y \to x, \; y \ne x} \frac{\Phi(y) - \Phi(x) - \langle s, y - x \rangle}{\|y - x\|} \ge 0 \Big\}. \]
For $x \notin \operatorname{dom} \Phi$, $\hat\partial \Phi(x) = \emptyset$.

The limiting (Mordukhovich) subdifferential, or simply the subdifferential for short, of $\Phi$ at $x \in \operatorname{dom} \Phi$, denoted $\partial \Phi(x)$, is defined by
\[ \partial \Phi(x) := \{ s \in \mathbb{R}^d : \exists \, x^k \to x, \; \Phi(x^k) \to \Phi(x) \text{ and } s^k \in \hat\partial \Phi(x^k) \to s \text{ as } k \to +\infty \}. \]
For any $x \in \mathbb{R}^d$, the above definition implies $\hat\partial \Phi(x) \subset \partial \Phi(x)$, where the first set is convex and closed while the second one is closed ([36], Theorem 8.6, p. 302).

Let $(x^k, s^k) \in \operatorname{Graph} \partial \Phi := \{ (x, s) \in \mathbb{R}^d \times \mathbb{R}^d : s \in \partial \Phi(x) \}$.
If $(x^k, s^k) \to (x^*, s^*)$ and $\Phi(x^k) \to \Phi(x^*)$ as $k \to \infty$, then by the definition of the subdifferential $\partial \Phi$ we have $(x^*, s^*) \in \operatorname{Graph} \partial \Phi$. The well-known Fermat's rule "if $x \in \mathbb{R}^d$ is a local minimizer of $\Phi$, then $0 \in \partial \Phi(x)$" remains unchanged. If $x \in \mathbb{R}^d$ is such that $0 \in \partial \Phi(x)$, then $x$ is called a critical point. We denote by $\operatorname{crit} \Phi$ the set of critical points of $\Phi$, that is
\[ \operatorname{crit} \Phi = \{ x \in \mathbb{R}^d : 0 \in \partial \Phi(x) \}. \]
When $\Phi$ is convex the two sets coincide and
\[ \hat\partial \Phi(x) = \partial \Phi(x) = \{ s \in \mathbb{R}^d : \Phi(y) \ge \Phi(x) + \langle s, y - x \rangle \;\; \forall y \in \mathbb{R}^d \}. \]
Let $\Omega$ be a subset of $\mathbb{R}^d$ and $x$ any point in $\mathbb{R}^d$. The distance from $x$ to $\Omega$, denoted $\operatorname{dist}(x, \Omega)$, is defined by
\[ \operatorname{dist}(x, \Omega) = \inf \{ \|x - z\| : z \in \Omega \}. \]
If $\Omega = \emptyset$, then $\operatorname{dist}(x, \Omega) = +\infty$ for all $x \in \mathbb{R}^d$. For any real-valued function $\Phi$ on $\mathbb{R}^d$ we have
\[ \operatorname{dist}(0, \partial \Phi(x)) = \inf \{ \|s^*\| : s^* \in \partial \Phi(x) \}. \]
Let $F : \mathbb{R}^n \times \mathbb{R}^m \to (-\infty, +\infty]$ be a lower semicontinuous function. The subdifferential of $F$ at the point $(\hat x, \hat y)$ is defined by
\[ \partial F(\hat x, \hat y) = \big( \partial_x F(\hat x, \hat y), \; \partial_y F(\hat x, \hat y) \big), \]
where $\partial_x F$ and $\partial_y F$ are respectively the subdifferentials of the function $F(\cdot, y)$ when $y \in \mathbb{R}^m$ is fixed, and of $F(x, \cdot)$ when $x$ is fixed.

Let $\Phi : \mathbb{R}^d \to \mathbb{R}$ be Fréchet differentiable such that its gradient is Lipschitz continuous with constant $L > 0$. Then for every $u, v \in \mathbb{R}^d$ and every $\xi \in [u, v] = \{ (1 - t) u + t v : t \in [0, 1] \}$ it holds that
\[ \Phi(v) \le \Phi(u) + \langle \nabla \Phi(\xi), v - u \rangle + \frac{L}{2} \| v - u \|^2, \tag{6} \]
where, if $\xi = u$, inequality (6) gives the so-called Descent Lemma.

Let $\Phi : \mathbb{R}^d \to (-\infty, +\infty]$ be a proper lower semicontinuous function. For $-\infty < \eta_1 < \eta_2 \le +\infty$ we define the set
\[ [\eta_1 < \Phi < \eta_2] = \{ x \in \mathbb{R}^d : \eta_1 < \Phi(x) < \eta_2 \}. \]
Let $x^*$ be a critical point of $\Phi$, that is, $0 \in \partial \Phi(x^*)$. The function $\Phi$ has the Łojasiewicz property at $x^*$ if there exist an exponent $\theta \in [0, 1)$ and a neighborhood $U$ of $x^*$ such that for any $x \in U$ it holds that
\[ | \Phi(x) - \Phi(x^*) |^{\theta} \le \operatorname{dist}(0, \partial \Phi(x)). \]
Let $\eta \in (0, +\infty]$. We denote by $\Psi_\eta$ the set of all concave and continuous functions $\psi : [0, \eta) \to [0, +\infty)$ that satisfy the following conditions:
- $\psi(0) = 0$;
- $\psi$ is $C^1$ on $(0, \eta)$ and continuous at $0$;
- for all $s \in (0, \eta)$, $\psi'(s) > 0$.

A proper lower semicontinuous function $\Phi : \mathbb{R}^d \to (-\infty, +\infty]$ is said to have the KŁ property at $x^* \in \operatorname{dom} \partial \Phi$ if there exist $\eta \in (0, +\infty]$, a neighborhood $U$ of $x^*$, and a function $\psi \in \Psi_\eta$ such that for every
\[ x \in U \cap [\Phi(x^*) < \Phi < \Phi(x^*) + \eta], \]
the following KŁ inequality holds:
\[ \psi'\big( \Phi(x) - \Phi(x^*) \big) \operatorname{dist}\big( 0, \partial \Phi(x) \big) \ge 1. \]
If $\Phi$ satisfies the property at each point of $\operatorname{dom} \partial \Phi$, then $\Phi$ is a KŁ function.

Let $\Omega$ be a compact set. Assume that $\Phi$ is constant on $\Omega$ and satisfies the KŁ property at each point of $\Omega$. Then there exist $\epsilon > 0$, $\eta > 0$, and $\psi \in \Psi_\eta$ such that for every $x^* \in \Omega$ and every element $x$ of the intersection
\[ \{ x \in \mathbb{R}^d : \operatorname{dist}(x, \Omega) < \epsilon \} \cap [\Phi(x^*) < \Phi < \Phi(x^*) + \eta], \]
it holds that
\[ \psi'\big( \Phi(x) - \Phi(x^*) \big) \operatorname{dist}\big( 0, \partial \Phi(x) \big) \ge 1. \]
It is known that any proper and lower semicontinuous function has the KŁ property at any noncritical point. The KŁ property provides a parametrization of the function $\Phi$ in order to avoid flatness near its critical points. Functions that satisfy the KŁ property include semialgebraic functions, real subanalytic functions, uniformly convex functions and convex functions satisfying a growth condition, nonsmooth functions such as the $\ell$ norm and $\ell_p$ with $p \in \mathbb{Q}_+$, indicator functions of semialgebraic sets such as $\delta_{\|x\|_p \le \alpha}$ or $\delta_{\|x\|_p \le \alpha, \, x \ge 0}$, finite sums, products and compositions of semialgebraic functions, the cone of PSD matrices, and Stiefel manifolds. We refer interested readers to [2, 3, 4, 6, 7, 8, 9] for more properties of KŁ functions and illustrative examples.

Organization.
The paper is organized as follows. In the next section, we provide some properties of the algorithm based on the augmented Lagrangian functional. In Section 3, we introduce the regularized augmented Lagrangian and derive some related theoretical results which are essential for the analysis of convergence in Section 3. Section 4 concludes the paper.

Algorithm Properties and Augmented Lagrangian
In this section, we establish some important properties of the VMP-LADMM algorithm (4) based on the augmented Lagrangian functional (2).
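To make iteration (4) concrete before its properties are analyzed, here is a minimal Python sketch on a small toy instance. Everything specific in it is an assumption chosen for illustration and is not taken from the paper: the instance itself ($f = \lambda\|\cdot\|_1$, $g = \tfrac12\|\cdot - a\|^2$, $h(y) = \sum_i \log(1 + y_i^2)$, $A = I$, $B = -I$, $c = 0$), the constant metrics $Q_1^k = t^{-1} I$ and $Q_2^k = q_2 I$, the parameter values, and the names `soft_threshold` and `vmp_ladmm`. With these choices the $x$ step is the prox-linear step (5), which reduces to soft-thresholding, and the linearized $y$ step has a closed form.

```python
import math


def soft_threshold(v, tau):
    """Proximal map of tau * |.| applied to a scalar v."""
    return math.copysign(max(abs(v) - tau, 0.0), v)


def vmp_ladmm(a, lam=0.5, alpha=10.0, beta=1.0, t=1.0 / 12.0, q2=1.0, iters=2000):
    """Sketch of iteration (4) for the illustrative problem
        min  lam*||x||_1 + 0.5*||x - a||^2 + sum_i log(1 + y_i^2)   s.t.  x - y = 0,
    i.e. A = I, B = -I, c = 0, Q1^k = (1/t) I, Q2^k = q2 I (all assumed choices).
    The problem is separable, so each coordinate is updated independently."""
    n = len(a)
    x, y, z = [0.0] * n, [0.0] * n, [0.0] * n
    for _ in range(iters):
        for i in range(n):
            # x step: p^k = grad g(x^k) + alpha * A^T (A x^k + B y^k + c + z^k / alpha),
            # followed by the prox of t * lam * |.| at x^k - t * p^k, as in (5).
            p = (x[i] - a[i]) + alpha * (x[i] - y[i]) + z[i]
            x[i] = soft_threshold(x[i] - t * p, t * lam)
            # y step: the linearized subproblem is a strongly convex quadratic in y;
            # setting its gradient to zero (with B = -I) gives the formula below.
            grad_h = 2.0 * y[i] / (1.0 + y[i] ** 2)
            y[i] = (alpha * x[i] + q2 * y[i] + z[i] - grad_h) / (alpha + q2)
            # multiplier step with over-relaxation parameter beta in (0, 2).
            z[i] += alpha * beta * (x[i] - y[i])
    return x, y, z


if __name__ == "__main__":
    x, y, z = vmp_ladmm([3.0, -2.0, 0.05])
    print("x =", x)
    print("constraint residual =", max(abs(xi - yi) for xi, yi in zip(x, y)))
```

The step size `t` and the penalty `alpha` above were picked so that $1/t$ dominates $L_g + \alpha\|A\|^2$; that is a conservative design choice for this sketch, not a condition quoted from the analysis.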
Assumption 1
We begin by making some assumptions.
A1. $f$ is lower semicontinuous, $f$ and $g$ are coercive, and $h$ is bounded from below;
A2. $\lambda^{B^*B}_{\min} > 0$ and $\lambda^{BB^*}_{\min} > 0$, where $\lambda^{C}_{\min}$ denotes the minimum eigenvalue of the matrix $C$;
A3. $\alpha > 0$, $\beta \in (0, 2)$, $q_i^- := \inf_{k \ge 0} \|Q_i^k\| > 0$ and $q_i := \sup_{k \ge 0} \|Q_i^k\| < +\infty$, $i = 1, 2$;
A4. The functions $g$ and $h$ are $L_g$ and $L_h$ Lipschitz differentiable, respectively.
Lemma 1 (Subgradient bound)
Suppose that Assumption 1 holds. Let $\{(x^k, y^k, z^k)\}_{k \ge 0}$ be a sequence generated by the VMP-LADMM algorithm (4). There exist a constant $\rho > 0$ and $d^{k+1} := (d_x^{k+1}, d_y^{k+1}, d_z^{k+1}) \in \partial \mathcal{L}_\alpha(x^{k+1}, y^{k+1}, z^{k+1})$ such that
\[ |||d^{k+1}||| \le \rho \big( \|\Delta x^{k+1}\| + \|\Delta y^{k+1}\| + \|\Delta z^{k+1}\| \big), \tag{7} \]
where
\[ \rho := \max \Big\{ q_1 + L_g, \; \alpha \|A\| \|B\| + L_h + q_2, \; \|A\| + \|B\| + \frac{1}{\alpha\beta} \Big\}, \tag{8} \]
and for any sequence $\{u^k\}_{k \ge 0}$, $\Delta u^{k+1} = u^{k+1} - u^k$.

Proof. Let $k \ge 0$ be fixed. Taking the partial subdifferential of $\mathcal{L}_\alpha$ with respect to $x$, and evaluating the result at the point $(x^{k+1}, y^{k+1}, z^{k+1})$, yields
\[ \partial_x \mathcal{L}_\alpha(x^{k+1}, y^{k+1}, z^{k+1}) = \partial f(x^{k+1}) + \nabla g(x^{k+1}) + \alpha A^* \big( Ax^{k+1} + By^{k+1} + \alpha^{-1} z^{k+1} + c \big). \]
By the optimality condition of the $x$ subproblem in (4) we have
\[ -\nabla g(x^k) - \alpha A^* \big( Ax^{k+1} + By^k + \alpha^{-1} z^k + c \big) - Q_1^k \Delta x^{k+1} \in \partial f(x^{k+1}). \]
Therefore, we obtain
\[ d_x^{k+1} := \nabla g(x^{k+1}) - \nabla g(x^k) + A^* \Delta z^{k+1} + \alpha A^* B \Delta y^{k+1} - Q_1^k \Delta x^{k+1} \in \partial_x \mathcal{L}_\alpha(x^{k+1}, y^{k+1}, z^{k+1}). \tag{9} \]
Taking the partial derivative of $\mathcal{L}_\alpha$ with respect to $y$ and evaluating the result at $(x^{k+1}, y^{k+1}, z^{k+1})$ gives
\[ \nabla_y \mathcal{L}_\alpha(x^{k+1}, y^{k+1}, z^{k+1}) = \nabla h(y^{k+1}) + \alpha B^* \big( Ax^{k+1} + By^{k+1} + \alpha^{-1} z^{k+1} + c \big). \]
The optimality criterion of the $y$ subproblem in (4) is given by
\[ \nabla h(y^k) + \alpha B^* \big( Ax^{k+1} + By^{k+1} + \alpha^{-1} z^k + c \big) + Q_2^k \Delta y^{k+1} = 0. \]
Thus, we have
\[ d_y^{k+1} := \nabla h(y^{k+1}) - \nabla h(y^k) + B^* \Delta z^{k+1} - Q_2^k \Delta y^{k+1} \in \nabla_y \mathcal{L}_\alpha(x^{k+1}, y^{k+1}, z^{k+1}). \tag{10} \]
By the $z$ update in algorithm (4) it is easy to see that
\[ d_z^{k+1} := Ax^{k+1} + By^{k+1} + c = \frac{1}{\alpha\beta} \Delta z^{k+1} \in \nabla_z \mathcal{L}_\alpha(x^{k+1}, y^{k+1}, z^{k+1}). \tag{11} \]
Hence, by (9), (10), and (11), we have $d^{k+1} := (d_x^{k+1}, d_y^{k+1}, d_z^{k+1}) \in \partial \mathcal{L}_\alpha(x^{k+1}, y^{k+1}, z^{k+1})$.
From (9), by using the triangle inequality we have
\[ \|d_x^{k+1}\| \le \|\nabla g(x^{k+1}) - \nabla g(x^k)\| + \|A\| \, \|\Delta z^{k+1}\| + \alpha \|A\| \, \|B\| \, \|\Delta y^{k+1}\| + \|Q_1^k\| \, \|\Delta x^{k+1}\|. \]
Since $\nabla g$ is $L_g$ Lipschitz continuous and $q_1 = \sup_{k \ge 0} \|Q_1^k\| < +\infty$, we then have
\[ \|d_x^{k+1}\| \le \alpha \|A\| \|B\| \|\Delta y^{k+1}\| + (q_1 + L_g) \|\Delta x^{k+1}\| + \|A\| \|\Delta z^{k+1}\|. \tag{12} \]
From (10), by the triangle inequality,
\[ \|d_y^{k+1}\| \le \|\nabla h(y^{k+1}) - \nabla h(y^k)\| + \|B\| \|\Delta z^{k+1}\| + \|Q_2^k\| \|\Delta y^{k+1}\|. \]
Since $\nabla h$ is $L_h$ Lipschitz continuous and $q_2 = \sup_{k \ge 0} \|Q_2^k\| < +\infty$, we get
\[ \|d_y^{k+1}\| \le (L_h + q_2) \|\Delta y^{k+1}\| + \|B\| \|\Delta z^{k+1}\|. \tag{13} \]
We also have
\[ \|d_z^{k+1}\| = \frac{1}{\alpha\beta} \|\Delta z^{k+1}\|. \tag{14} \]
Combining (12), (13), and (14) gives
\[
\begin{aligned}
|||d^{k+1}||| &\le \|d_x^{k+1}\| + \|d_y^{k+1}\| + \|d_z^{k+1}\| \\
&\le (q_1 + L_g) \|\Delta x^{k+1}\| + \big( \alpha \|A\| \|B\| + L_h + q_2 \big) \|\Delta y^{k+1}\| + \Big( \|A\| + \|B\| + \frac{1}{\alpha\beta} \Big) \|\Delta z^{k+1}\|.
\end{aligned}
\]
We let $\rho := \max \{ q_1 + L_g, \; \alpha \|A\| \|B\| + L_h + q_2, \; \|A\| + \|B\| + \frac{1}{\alpha\beta} \}$ to obtain (7). ∎

Lemma 2 (Limiting continuity)
Suppose that Assumption 1 holds. If $(x^*, y^*, z^*)$ is the limit point of a subsequence $\{(x^{k_j}, y^{k_j}, z^{k_j})\}_{j \ge 0}$, then
\[ \mathcal{L}_\alpha(x^*, y^*, z^*) = \lim_{j \to \infty} \mathcal{L}_\alpha(x^{k_j}, y^{k_j}, z^{k_j}). \]
Proof. Let $\{(x^{k_j}, y^{k_j}, z^{k_j})\}_{j \ge 0}$ be a subsequence of the sequence generated by the VMP-LADMM algorithm such that
\[ \lim_{j \to \infty} (x^{k_j}, y^{k_j}, z^{k_j}) = (x^*, y^*, z^*). \]
The function $f$ is lower semicontinuous, hence we have
\[ f(x^*) \le \liminf_{j \to \infty} f(x^{k_j}). \tag{15} \]
From the $x$ subproblem in (4), we have $\hat f^k(x^{k+1}) \le \hat f^k(x)$ for any $x \in \mathbb{R}^n$. Choosing $k = k_j$ for all $j \ge 0$ and letting $x = x^*$ gives
\[
\begin{aligned}
& f(x^{k_j+1}) + \big\langle \nabla g(x^{k_j}) + \alpha A^* \big( Ax^{k_j} + By^{k_j} + \alpha^{-1} z^{k_j} + c \big), \, \Delta x^{k_j+1} \big\rangle + \tfrac{1}{2} \|\Delta x^{k_j+1}\|_{Q_1^{k_j}}^2 \\
& \quad \le f(x^*) + \big\langle \nabla g(x^{k_j}) + \alpha A^* \big( Ax^{k_j} + By^{k_j} + \alpha^{-1} z^{k_j} + c \big), \, x^* - x^{k_j} \big\rangle + \tfrac{1}{2} \|x^* - x^{k_j}\|_{Q_1^{k_j}}^2.
\end{aligned}
\]
By the continuity of $\nabla g$ and the fact that the distance between two successive iterates approaches zero, taking the limit superior of both sides leads to
\[ \limsup_{j \to \infty} f(x^{k_j+1}) \le f(x^*) + \limsup_{j \to \infty} \Big\{ \big\langle \nabla g(x^{k_j}) + \alpha A^* \big( Ax^{k_j} + By^{k_j} + \alpha^{-1} z^{k_j} + c \big), \, x^* - x^{k_j} \big\rangle + \tfrac{1}{2} \|x^* - x^{k_j}\|_{Q_1^{k_j}}^2 \Big\}. \]
We have $x^{k_j} \to x^*$ as $j \to \infty$, thus the latter inequality reduces to
\[ \limsup_{j \to \infty} f(x^{k_j+1}) \le f(x^*). \]
Thus, in view of (15), we have $\lim_{j \to \infty} f(x^{k_j}) = f(x^*)$. Since the functions $h(\cdot)$ and $g(\cdot)$ are smooth, we further have
\[ \lim_{j \to \infty} g(x^{k_j}) = g(x^*) \quad \text{and} \quad \lim_{j \to \infty} h(y^{k_j}) = h(y^*). \]
Thus
\[
\begin{aligned}
\lim_{j \to \infty} \mathcal{L}_\alpha(x^{k_j}, y^{k_j}, z^{k_j})
&= \lim_{j \to \infty} \Big\{ f(x^{k_j}) + g(x^{k_j}) + h(y^{k_j}) + \langle z^{k_j}, Ax^{k_j} + By^{k_j} + c \rangle + \tfrac{\alpha}{2} \|Ax^{k_j} + By^{k_j} + c\|^2 \Big\} \\
&= f(x^*) + g(x^*) + h(y^*) + \langle z^*, Ax^* + By^* + c \rangle + \tfrac{\alpha}{2} \|Ax^* + By^* + c\|^2 \\
&= \mathcal{L}_\alpha(x^*, y^*, z^*).
\end{aligned}
\]
That completes the proof. ∎
Lemma 3 (Limit point is critical point)
Suppose that Assumption 1 holds. Any limit point $(x^*, y^*, z^*)$ of the sequence $\{(x^k, y^k, z^k)\}_{k \ge 0}$ generated by the VMP-LADMM algorithm (4) is a stationary point. That is, $0 \in \partial \mathcal{L}_\alpha(x^*, y^*, z^*)$, or equivalently
\[ 0 \in \partial f(x^*) + \nabla g(x^*) + A^* z^*, \qquad 0 = \nabla h(y^*) + B^* z^*, \qquad 0 = Ax^* + By^* + c. \]
Proof. Let $\{(x^{k_j}, y^{k_j}, z^{k_j})\}_{j \ge 0}$ be a subsequence of $\{(x^k, y^k, z^k)\}_{k \ge 0}$ such that $(x^*, y^*, z^*) = \lim_{j \to \infty} (x^{k_j}, y^{k_j}, z^{k_j})$. It follows that $\|\Delta x^{k_j}\| \to 0$, $\|\Delta y^{k_j}\| \to 0$, and $\|\Delta z^{k_j}\| \to 0$ as $j \to \infty$. By Lemma 2, $\mathcal{L}_\alpha(x^{k_j}, y^{k_j}, z^{k_j}) \to \mathcal{L}_\alpha(x^*, y^*, z^*)$ as $j \to +\infty$. Let $d^{k_j} \in \partial \mathcal{L}_\alpha(x^{k_j}, y^{k_j}, z^{k_j})$; by Lemma 1 we have $|||d^{k_j}||| \le \rho \big( \|\Delta x^{k_j}\| + \|\Delta y^{k_j}\| + \|\Delta z^{k_j}\| \big)$, where $\rho > 0$ is a constant. Therefore $|||d^{k_j}||| \to 0$ as $j \to \infty$, hence $d^{k_j} \to 0$. By the closedness criterion of the limiting subdifferential we then have $0 \in \partial \mathcal{L}_\alpha(x^*, y^*, z^*)$, or equivalently, $(x^*, y^*, z^*) \in \operatorname{crit}(\mathcal{L}_\alpha)$. ∎

Lemma 4 (descent of $\mathcal{L}_\alpha$ during $x$ update)
Suppose that Assumption 1 holds. For the sequence $\{(x^k, y^k, z^k)\}_{k \ge 0}$ generated by the VMP-LADMM algorithm (4) we have
\[ \mathcal{L}_\alpha(x^k, y^k, z^k) - \mathcal{L}_\alpha(x^{k+1}, y^k, z^k) \ge \tfrac{1}{2} \|\Delta x^{k+1}\|_{A^k}^2, \tag{16} \]
where $A^k = Q_1^k + \big( \alpha \lambda^{A^*A}_{\min} - L_g \big) I_n$.

Proof. Let $k \ge 0$ be fixed. From the $x$ iterate of (4) it is clear that $\hat f^k(x^{k+1}) \le \hat f^k(x)$ for any $x \in \mathbb{R}^n$. Setting $x = x^k$ gives
\[ f(x^{k+1}) - f(x^k) + \langle \nabla g(x^k), \Delta x^{k+1} \rangle + \tfrac{1}{2} \|\Delta x^{k+1}\|_{Q_1^k}^2 \le -\alpha \big\langle A^* \big( Ax^k + By^k + \alpha^{-1} z^k + c \big), \Delta x^{k+1} \big\rangle. \tag{17} \]
We next consider
\[
\begin{aligned}
\mathcal{L}_\alpha(x^k, y^k, z^k) - \mathcal{L}_\alpha(x^{k+1}, y^k, z^k)
&= f(x^k) - f(x^{k+1}) + g(x^k) - g(x^{k+1}) - \langle z^k, A \Delta x^{k+1} \rangle \\
&\quad + \tfrac{\alpha}{2} \|Ax^k + By^k + c\|^2 - \tfrac{\alpha}{2} \|Ax^{k+1} + By^k + c\|^2 \\
&= f(x^k) - f(x^{k+1}) + g(x^k) - g(x^{k+1}) + \tfrac{\alpha}{2} \|A (x^{k+1} - x^k)\|^2 \\
&\quad - \langle z^k, A \Delta x^{k+1} \rangle - \alpha \big\langle A^* (Ax^k + By^k + c), \Delta x^{k+1} \big\rangle.
\end{aligned}
\]
By (17) we then have
\[ \mathcal{L}_\alpha(x^k, y^k, z^k) - \mathcal{L}_\alpha(x^{k+1}, y^k, z^k) \ge g(x^k) - g(x^{k+1}) + \langle \nabla g(x^k), \Delta x^{k+1} \rangle + \tfrac{\alpha}{2} \|A \Delta x^{k+1}\|^2 + \tfrac{1}{2} \|\Delta x^{k+1}\|_{Q_1^k}^2. \]
Since $\nabla g$ is $L_g$ Lipschitz continuous, this yields
\[ \mathcal{L}_\alpha(x^k, y^k, z^k) - \mathcal{L}_\alpha(x^{k+1}, y^k, z^k) \ge \tfrac{\alpha}{2} \|A \Delta x^{k+1}\|^2 - \tfrac{L_g}{2} \|\Delta x^{k+1}\|^2 + \tfrac{1}{2} \|\Delta x^{k+1}\|_{Q_1^k}^2. \]
That completes the proof. ∎
Lemma 5 (descent of $\mathcal{L}_\alpha$ during $y$ update)
Suppose that Assumption 1 holds. For the sequence $\{(x^k, y^k, z^k)\}_{k \ge 0}$ generated by the VMP-LADMM algorithm (4) we have
\[ \mathcal{L}_\alpha(x^{k+1}, y^k, z^k) - \mathcal{L}_\alpha(x^{k+1}, y^{k+1}, z^k) \ge \tfrac{1}{2} \|\Delta y^{k+1}\|_{B^k}^2, \tag{18} \]
where $B^k = Q_2^k + \big( \alpha \lambda^{B^*B}_{\min} - L_h \big) I_m$.

Proof. Let $k \ge 0$ be fixed. From the optimality condition of the $y$ subproblem in (4) we have
\[ \nabla h(y^k) + \alpha B^* \big( By^{k+1} + Ax^{k+1} + \alpha^{-1} z^k + c \big) + Q_2^k \Delta y^{k+1} = 0. \]
Multiply this equation by $\Delta y^{k+1}$ and rearrange to obtain
\[ -\alpha \big\langle B^* \big( By^{k+1} + Ax^{k+1} + \alpha^{-1} z^k + c \big), \Delta y^{k+1} \big\rangle = \langle \nabla h(y^k), \Delta y^{k+1} \rangle + \|\Delta y^{k+1}\|_{Q_2^k}^2. \tag{19} \]
We next consider
\[
\begin{aligned}
\mathcal{L}_\alpha(x^{k+1}, y^k, z^k) - \mathcal{L}_\alpha(x^{k+1}, y^{k+1}, z^k)
&= h(y^k) - h(y^{k+1}) + \tfrac{\alpha}{2} \|By^k + Ax^{k+1} + c\|^2 - \tfrac{\alpha}{2} \|By^{k+1} + Ax^{k+1} + c\|^2 - \langle z^k, B \Delta y^{k+1} \rangle \\
&= h(y^k) - h(y^{k+1}) + \tfrac{\alpha}{2} \|B \Delta y^{k+1}\|^2 - \alpha \big\langle \Delta y^{k+1}, B^* \big( By^{k+1} + Ax^{k+1} + \alpha^{-1} z^k + c \big) \big\rangle.
\end{aligned}
\]
By (19) and the fact that $\nabla h$ is $L_h$ Lipschitz continuous we then get
\[ \mathcal{L}_\alpha(x^{k+1}, y^k, z^k) - \mathcal{L}_\alpha(x^{k+1}, y^{k+1}, z^k) \ge \|\Delta y^{k+1}\|_{Q_2^k}^2 + \tfrac{\alpha}{2} \|B \Delta y^{k+1}\|^2 - \tfrac{L_h}{2} \|\Delta y^{k+1}\|^2. \]
This completes the proof. ∎
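The $y$-descent estimate can be sanity-checked numerically. The sketch below does this for a scalar instance that is assumed purely for illustration: $A = 1$, $B = -1$, $c = 0$, $h(y) = \log(1 + y^2)$ (so $L_h = 2$ and $\lambda^{B^*B}_{\min} = 1$) and a constant metric $Q_2^k = q_2$. The helper names are hypothetical. It evaluates the decrease of $\mathcal{L}_\alpha$ over one linearized $y$ step and subtracts the lower bound $\tfrac12 (q_2 + \alpha \lambda^{B^*B}_{\min} - L_h)|\Delta y|^2$, reading the bound in (18) with the factor $\tfrac12$.

```python
import math


def h(y):
    """Smooth nonconvex test function; its derivative is Lipschitz with L_h = 2."""
    return math.log(1.0 + y * y)


def grad_h(y):
    return 2.0 * y / (1.0 + y * y)


def y_descent_margin(x, y, z, alpha, q2, L_h=2.0):
    """Decrease of L_alpha over one y step of (4) minus the lower bound of (18),
    for the assumed scalar instance A = 1, B = -1, c = 0, Q2^k = q2."""
    # Closed-form minimizer of the linearized y subproblem.
    y_new = (alpha * x + q2 * y + z - grad_h(y)) / (alpha + q2)

    def lagr_y(v):
        # Only the y-dependent part of L_alpha is needed; f(x) + g(x) cancels.
        return h(v) + z * (x - v) + 0.5 * alpha * (x - v) ** 2

    decrease = lagr_y(y) - lagr_y(y_new)
    # B^k = q2 + alpha * lambda_min(B*B) - L_h, with lambda_min(B*B) = 1 here.
    bound = 0.5 * (q2 + alpha * 1.0 - L_h) * (y_new - y) ** 2
    return decrease - bound


# Smallest margin over a small deterministic grid of points and parameters.
worst_margin = min(
    y_descent_margin(x, y, z, alpha, q2)
    for x in (-3.0, -0.5, 0.0, 1.0, 4.0)
    for y in (-2.0, -0.3, 0.0, 0.7, 5.0)
    for z in (-1.0, 0.0, 2.5)
    for alpha in (0.5, 3.0, 10.0)
    for q2 in (0.1, 1.0, 8.0)
)
print("smallest margin on the grid:", worst_margin)
```

A nonnegative margin (up to rounding) at every grid point is what the lemma predicts; the grid is of course only a spot check, not a proof.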
Lemma 6
Suppose that Assumption 1 holds. For the sequence $\{(x^k, y^k, z^k)\}_{k \ge 0}$ generated by the VMP-LADMM algorithm (4) we have
\[ \mathcal{L}_\alpha(x^{k+1}, y^{k+1}, z^{k+1}) + \tfrac{1}{2} \|\Delta x^{k+1}\|_{A^k}^2 + \tfrac{1}{2} \|\Delta y^{k+1}\|_{B^k}^2 \le \mathcal{L}_\alpha(x^k, y^k, z^k) + \frac{1}{\alpha\beta} \|\Delta z^{k+1}\|^2, \tag{20} \]
where $A^k = Q_1^k + \big( \alpha \lambda^{A^*A}_{\min} - L_g \big) I_n$ and $B^k = Q_2^k + \big( \alpha \lambda^{B^*B}_{\min} - L_h \big) I_m$.

Proof. Let $k \ge 0$ be fixed. By the $z$ update in (4) and Lemmas 4 and 5 we have
\[
\begin{aligned}
\mathcal{L}_\alpha(x^{k+1}, y^{k+1}, z^{k+1})
&= \mathcal{L}_\alpha(x^{k+1}, y^{k+1}, z^k) + \frac{1}{\alpha\beta} \|\Delta z^{k+1}\|^2 \\
&\le \mathcal{L}_\alpha(x^{k+1}, y^k, z^k) - \tfrac{1}{2} \|\Delta y^{k+1}\|_{B^k}^2 + \frac{1}{\alpha\beta} \|\Delta z^{k+1}\|^2 \\
&\le \mathcal{L}_\alpha(x^k, y^k, z^k) - \tfrac{1}{2} \|\Delta x^{k+1}\|_{A^k}^2 - \tfrac{1}{2} \|\Delta y^{k+1}\|_{B^k}^2 + \frac{1}{\alpha\beta} \|\Delta z^{k+1}\|^2.
\end{aligned}
\]
Rearrange to obtain (20). ∎

Lemma 7 (Monotonicity of $\mathcal{L}_\alpha$)
Suppose that Assumption 1 holds. For the sequence $\{(x^k, y^k, z^k)\}_{k \ge 0}$ generated by the VMP-LADMM algorithm (4) we have
\[ \frac{1}{\alpha\beta} \|\Delta z^{k+1}\|^2 \le \theta_1 \|\Delta y^k\|^2 + \theta_2 \|\Delta y^{k+1}\|^2 + \gamma \|B^* \Delta z^k\|^2 - \gamma \|B^* \Delta z^{k+1}\|^2, \tag{21} \]
and consequently
\[
\begin{aligned}
& \mathcal{L}_\alpha(x^{k+1}, y^{k+1}, z^{k+1}) + \tfrac{1}{2} \|\Delta x^{k+1}\|_{A^k}^2 + \tfrac{1}{2} \|\Delta y^{k+1}\|_{B^k}^2 - r\theta_2 \|\Delta y^{k+1}\|^2 + \frac{r - 1}{\alpha\beta} \|\Delta z^{k+1}\|^2 + r\gamma \|B^* \Delta z^{k+1}\|^2 \\
& \quad \le \mathcal{L}_\alpha(x^k, y^k, z^k) + r\gamma \|B^* \Delta z^k\|^2 + r\theta_1 \|\Delta y^k\|^2,
\end{aligned}
\tag{22}
\]
where $r > 1$, and
\[ \theta_1 := \frac{2 \beta^2 (L_h + q_2)^2}{\alpha\beta \lambda^{BB^*}_{\min} \big( 1 - |1 - \beta| \big)^2}, \qquad \gamma := \frac{|1 - \beta|}{\alpha\beta \lambda^{BB^*}_{\min} \big( 1 - |1 - \beta| \big)}, \qquad \theta_2 := \frac{2 \beta^2 q_2^2}{\alpha\beta \lambda^{BB^*}_{\min} \big( 1 - |1 - \beta| \big)^2}. \tag{23} \]
Proof. Let $k \ge 1$ be fixed. We define the vector
\[ w^{k+1} := -Q_2^k \Delta y^{k+1} - \nabla h(y^k). \tag{24} \]
Then $\Delta w^{k+1} = Q_2^{k-1} \Delta y^k - Q_2^k \Delta y^{k+1} + \nabla h(y^{k-1}) - \nabla h(y^k)$, and by the triangle inequality we have
\[ \|\Delta w^{k+1}\| \le \|\nabla h(y^k) - \nabla h(y^{k-1})\| + \|Q_2^k\| \|\Delta y^{k+1}\| + \|Q_2^{k-1}\| \|\Delta y^k\|. \tag{25} \]
By the fact that $\nabla h$ is $L_h$ Lipschitz continuous and $q_2 = \sup_{k \ge 0} \|Q_2^k\| < +\infty$ we obtain
\[ \|\Delta w^{k+1}\| \le (L_h + q_2) \|\Delta y^k\| + q_2 \|\Delta y^{k+1}\|, \tag{26} \]
and hence
\[ \|\Delta w^{k+1}\|^2 \le 2 (L_h + q_2)^2 \|\Delta y^k\|^2 + 2 q_2^2 \|\Delta y^{k+1}\|^2. \tag{27} \]
Expressing the optimality condition of the $y$ subproblem using $w^{k+1}$ gives
\[ w^{k+1} = \alpha B^* \big( Ax^{k+1} + By^{k+1} + c + \alpha^{-1} z^k \big). \]
Combining this with the $z$ iterate in (4) yields
\[ B^* z^{k+1} = \beta w^{k+1} + (1 - \beta) B^* z^k. \tag{28} \]
It follows that $B^* \Delta z^{k+1} = \beta \Delta w^{k+1} + (1 - \beta) B^* \Delta z^k$. Since $\beta \in (0, 2)$, we can write this as the convex combination
\[ B^* \Delta z^{k+1} = \big( 1 - |1 - \beta| \big) \Big( \frac{\beta \Delta w^{k+1}}{1 - |1 - \beta|} \Big) + |1 - \beta| \big( \operatorname{sign}(1 - \beta) B^* \Delta z^k \big), \]
where $\operatorname{sign}(\lambda) = 1$ if $\lambda \ge 0$ and $\operatorname{sign}(\lambda) = -1$ if $\lambda < 0$. By the convexity of $\|\cdot\|^2$ we have
\[
\begin{aligned}
\lambda^{BB^*}_{\min} \big( 1 - |1 - \beta| \big) \|\Delta z^{k+1}\|^2
&\le \big( 1 - |1 - \beta| \big) \|B^* \Delta z^{k+1}\|^2 \\
&\le \frac{\beta^2}{1 - |1 - \beta|} \|\Delta w^{k+1}\|^2 + |1 - \beta| \, \|B^* \Delta z^k\|^2 - |1 - \beta| \, \|B^* \Delta z^{k+1}\|^2.
\end{aligned}
\]
Divide both sides of the latter inequality by $\alpha\beta \lambda^{BB^*}_{\min} \big( 1 - |1 - \beta| \big)$ and use (27) for $\|\Delta w^{k+1}\|^2$ to obtain
\[
\begin{aligned}
\frac{1}{\alpha\beta} \|\Delta z^{k+1}\|^2
&\le \frac{2 \beta^2 (L_h + q_2)^2}{\alpha\beta \lambda^{BB^*}_{\min} \big( 1 - |1 - \beta| \big)^2} \|\Delta y^k\|^2 + \frac{2 \beta^2 q_2^2}{\alpha\beta \lambda^{BB^*}_{\min} \big( 1 - |1 - \beta| \big)^2} \|\Delta y^{k+1}\|^2 \\
&\quad + \frac{|1 - \beta|}{\alpha\beta \lambda^{BB^*}_{\min} \big( 1 - |1 - \beta| \big)} \|B^* \Delta z^k\|^2 - \frac{|1 - \beta|}{\alpha\beta \lambda^{BB^*}_{\min} \big( 1 - |1 - \beta| \big)} \|B^* \Delta z^{k+1}\|^2,
\end{aligned}
\]
which is (21). Multiply this inequality by $r > 1$ and combine it with (20) to obtain (22). ∎
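The multiplier estimate (21) can also be spot-checked along an actual run. The sketch below does so on a scalar toy instance that is assumed for illustration only ($f = \lambda|\cdot|$, $g = \tfrac12(\cdot - a)^2$, $h(y) = \log(1 + y^2)$, $A = 1$, $B = -1$, $c = 0$, constant metrics), with a hypothetical function name. The constants are taken as $\theta_1 = \frac{2\beta^2 (L_h + q_2)^2}{\alpha\beta\lambda^{BB^*}_{\min}(1 - |1-\beta|)^2}$, $\theta_2 = \frac{2\beta^2 q_2^2}{\alpha\beta\lambda^{BB^*}_{\min}(1 - |1-\beta|)^2}$ and $\gamma = \frac{|1-\beta|}{\alpha\beta\lambda^{BB^*}_{\min}(1 - |1-\beta|)}$, which is how (23) is read here.

```python
import math


def multiplier_bound_slack(a=3.0, lam=0.5, alpha=10.0, beta=1.5,
                           t=1.0 / 12.0, q2=1.0, iters=60):
    """Run the scalar toy version of (4) and return the smallest normalized
    slack (rhs - lhs) / scale of inequality (21) over the iterations."""
    L_h = 2.0        # Lipschitz constant of h'(y) = 2y / (1 + y^2)
    lam_bbt = 1.0    # lambda_min(B B^*) for B = -1
    s = 1.0 - abs(1.0 - beta)
    theta1 = 2.0 * beta ** 2 * (L_h + q2) ** 2 / (alpha * beta * lam_bbt * s ** 2)
    theta2 = 2.0 * beta ** 2 * q2 ** 2 / (alpha * beta * lam_bbt * s ** 2)
    gamma = abs(1.0 - beta) / (alpha * beta * lam_bbt * s)

    x = y = z = 0.0
    dy_prev = dz_prev = None
    worst = math.inf
    for _ in range(iters):
        # x step (prox-linear with soft-thresholding).
        p = (x - a) + alpha * (x - y) + z
        v = x - t * p
        x = math.copysign(max(abs(v) - t * lam, 0.0), v)
        # y step (closed form) and over-relaxed multiplier step.
        y_new = (alpha * x + q2 * y + z - 2.0 * y / (1.0 + y * y)) / (alpha + q2)
        z_new = z + alpha * beta * (x - y_new)
        dy, dz = y_new - y, z_new - z
        if dy_prev is not None:
            lhs = dz * dz / (alpha * beta)
            pos = theta1 * dy_prev ** 2 + theta2 * dy ** 2 + gamma * dz_prev ** 2
            rhs = pos - gamma * dz ** 2          # here |B^* dz| = |dz|
            scale = 1.0 + lhs + pos + gamma * dz ** 2
            worst = min(worst, (rhs - lhs) / scale)
        y, z, dy_prev, dz_prev = y_new, z_new, dy, dz
    return worst


for b in (0.7, 1.0, 1.5):
    print("beta =", b, "smallest normalized slack:", multiplier_bound_slack(beta=b))
```

The check starts at the second iteration because (21) involves both $\Delta z^{k}$ and $\Delta z^{k+1}$; the slack is normalized only to keep the comparison insensitive to rounding.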
The regularized augmented Lagrangian functional $\mathcal{R} : \mathbb{R}^n \times \mathbb{R}^m \times \mathbb{R}^p \times \mathbb{R}^m \times \mathbb{R}^p \to (-\infty, +\infty]$ is defined by
\[ \mathcal{R}(x, y, z, y', z') = \mathcal{L}_\alpha(x, y, z) + r\gamma \|B^* (z - z')\|^2 + r\theta_1 \|y - y'\|^2, \tag{29} \]
where $r > 1$, and $\theta_1$ and $\gamma$ are as in (23). For any $k \ge 1$, we denote
\[ \mathcal{R}_k := \mathcal{R}(x^k, y^k, z^k, y^{k-1}, z^{k-1}) = \mathcal{L}_\alpha(x^k, y^k, z^k) + r\gamma \|B^* \Delta z^k\|^2 + r\theta_1 \|\Delta y^k\|^2. \tag{30} \]
By (22) we then have
\[ \mathcal{R}_{k+1} + \tfrac{1}{2} \|\Delta x^{k+1}\|_{A^k}^2 + \tfrac{1}{2} \|\Delta y^{k+1}\|_{D^k}^2 + \frac{r - 1}{\alpha\beta} \|\Delta z^{k+1}\|^2 \le \mathcal{R}_k, \tag{31} \]
where $D^k := B^k - 2r (\theta_1 + \theta_2) I_m$.

Assumption 2 (Sufficient decrease condition)
(i) The symmetric positive definite matrices $\{Q_1^k\}_{k \ge 0}$ and $\{Q_2^k\}_{k \ge 0}$ are chosen such that there exist $\sigma_1 > 0$ and $\sigma_2 > 0$ such that
\[ q_1^- + \alpha \lambda^{A^*A}_{\min} - L_g \ge \sigma_1 \quad \text{and} \quad q_2^- + \alpha \lambda^{B^*B}_{\min} - (L_h + \theta_1 + \theta_2) \ge \sigma_2. \]
We then let $\sigma = \min \{ \sigma_1, \sigma_2, \frac{r - 1}{\alpha\beta} \}$. By this assumption, for all $k \ge 1$ we have
\[ \mathcal{R}_{k+1} + \sigma \big( \|\Delta x^{k+1}\|^2 + \|\Delta y^{k+1}\|^2 + \|\Delta z^{k+1}\|^2 \big) \le \mathcal{R}_k \le \mathcal{R}_1. \tag{32} \]
(ii) (Relaxed version) The conditions in the assumption can be relaxed. Suppose that the symmetric positive definite matrices $\{Q_1^k\}_{k \ge 0}$ and $\{Q_2^k\}_{k \ge 0}$ are chosen such that after a finite number of iterations $k_0 \ge 0$ we have
\[ \sigma_k = \inf_{k \ge k_0} \Big\{ \|Q_1^k\| + \alpha \lambda^{A^*A}_{\min} - L_g, \; \|Q_2^k\| + \alpha \lambda^{B^*B}_{\min} - (L_h + \theta_1 + \theta_2), \; \frac{r - 1}{\alpha\beta} \Big\} > \sigma > 0. \]
With this, we would then have
\[ \mathcal{R}_{k+1} + \sigma \big( \|\Delta x^{k+1}\|^2 + \|\Delta y^{k+1}\|^2 + \|\Delta z^{k+1}\|^2 \big) \le \mathcal{R}_k \le \mathcal{R}_{k_0}, \quad \forall k \ge k_0. \tag{33} \]

Remark 1
We make a few remarks regarding to the Assumption 2.(i) As a special choice is to select Q k = Q k fixed and Q k = Q fixed for all k , set Q = L g I n and Q = ( L h + θ + θ ) I m . Following this σ = min { L g , αλ B ∗ B min , r − αβ } . Note that λ A ∗ A min can be zero, but λ B ∗ B min > by Assumption 1.(ii) If g ( x ) = 0 in (1), then L g = 0 , and hence σ = q − > . Lemma 8 (Convergence of R k ) Suppose that the Assumption 1 and 2 (i) hold. If { ( x k , y k , z k ) } k ≥ is a sequence generated by the VMP-LADMM algorithm (4), which assumed to be bounded, thenthe sequence {R k } k ≥ is bounded from below and convergent.Proof . Let k ≥ L α ( x k , y k , z k ) = f ( x k ) + g ( x k ) + h ( y k ) + (cid:104) z k , Ax k + By k + c (cid:105) + α (cid:107) Ax k + By k + c (cid:107) is bounded from below. Since { ( x k , y k , z k ) } k ≥ be a bounded sequence clearly (cid:104) z k , Ax k + By k + c (cid:105) and (cid:107) Ax k + By k + c (cid:107) are bounded for k ≥
0. By Assumption 1, $h$ is bounded from below, and since $f$ and $g$ are coercive and $\{x^k\}_{k\ge 0}$ is bounded, $\{f(x^k)\}_{k\ge 0}$ and $\{g(x^k)\}_{k\ge 0}$ are bounded. Therefore $\{\mathcal{L}_\alpha(x^k,y^k,z^k)\}_{k\ge 0}$ is bounded from below, and consequently $-\infty < \inf_k \mathcal{R}_k$. By Assumption 2(i), $\{\mathcal{R}_k\}$ is monotonically decreasing for all $k \ge 0$. Together with the fact that $\mathcal{R}_k$ is bounded from below, we conclude that $\{\mathcal{R}_k\}_{k\ge 0}$ is convergent. $\blacksquare$
Theorem 1 (bounded sequence)
We assume that Assumption 1, and 2 (ii) hold. Thensequence { ( x k , y k , z k ) } k ≥ generated by the VMP-LADMM algorithm (4) is bounded.Proof . Let { ( x k , y k , z k ) } k ≥ be a generated by the VMP-LADMM algorithm. By (33) thereexists a k ≥ R k +1 ≤ R k for all k ≥ k , and hence f ( x k +1 ) + g ( x k +1 ) + h ( y k +1 ) + α (cid:13)(cid:13)(cid:13) Ax k +1 + By k +1 + α − z k +1 + c (cid:13)(cid:13)(cid:13) − α (cid:13)(cid:13)(cid:13) z k +1 (cid:13)(cid:13)(cid:13) + σ (cid:107) ∆ x k +1 (cid:107) + σ (cid:107) ∆ y k +1 (cid:107) + σ (cid:107) ∆ z k +1 (cid:107) + rθ (cid:107) ∆ y k +1 (cid:107) + rγ (cid:107) B ∗ ∆ z k +1 (cid:107) ≤ R k . (34)We will next find a lower bound for − α (cid:107) z k +1 (cid:107) . Given w k +1 as in (24), we rearrange (28) toobtain βB ∗ z k +1 = βw k +1 + (1 − β ) B ∗ ( z k − z k +1 ) . Since β ∈ (0 ,
2) we can rewrite this equation as follows βB ∗ z k +1 = (1 − | − β | ) (cid:16) βw k +1 − | − β | (cid:17) + ( | − β | ) (cid:16) sign(1 − β ) B ∗ ( z k − z k +1 ) (cid:17) . y the convexity of (cid:107) · (cid:107) we then obtain λ BB ∗ min β (cid:13)(cid:13)(cid:13) z k +1 (cid:13)(cid:13)(cid:13) ≤ β − | − β | (cid:13)(cid:13)(cid:13) w k +1 (cid:13)(cid:13)(cid:13) + | − β | (cid:13)(cid:13)(cid:13) B ∗ ∆ z k +1 (cid:13)(cid:13)(cid:13) (35)Use (cid:107) w k +1 (cid:107) ≤ q + L h ) (cid:107) ∆ y k +1 (cid:107) + 2 (cid:107)∇ h ( y k +1 ) (cid:107) in (35) and then divide the both sides ofthe resulting inequality by − αβ λ BB ∗ min to get − α (cid:13)(cid:13)(cid:13) z k +1 (cid:13)(cid:13)(cid:13) ≥ − ϑ (cid:107)∇ h ( y k +1 ) (cid:107) − θ (cid:107) ∆ y k +1 (cid:107) − γ (cid:13)(cid:13)(cid:13) B ∗ ∆ z k +1 (cid:13)(cid:13)(cid:13) , where ϑ := 1 α (1 − | − β | ) λ BB ∗ min , θ := ( q + L h ) α (1 − | − β | ) λ BB ∗ min , γ := | − β | αβ λ BB ∗ min . Using the latter inequality, (34) leads to f ( x k +1 ) + g ( x k +1 ) + α (cid:13)(cid:13)(cid:13) Ax k +1 + By k +1 + α − z k +1 + c (cid:13)(cid:13)(cid:13) + ( rθ + σ − θ ) (cid:13)(cid:13)(cid:13) ∆ y k +1 (cid:13)(cid:13)(cid:13) + σ (cid:13)(cid:13)(cid:13) ∆ x k +1 (cid:13)(cid:13)(cid:13) + ( rγ − γ ) (cid:13)(cid:13)(cid:13) B ∗ ∆ z k +1 (cid:13)(cid:13)(cid:13) + σ (cid:107) ∆ z k +1 (cid:107) ≤ R k − inf y (cid:110) h ( y ) − ϑ (cid:13)(cid:13)(cid:13) ∇ h ( y ) (cid:13)(cid:13)(cid:13) (cid:111) . (36)By the Assumption 1, h is L h Lipschitz continuous, then for any k ≥ k ≥ y ∈ R m it holds h ( y ) ≤ h ( y k ) + (cid:104)∇ h ( y k ) , y − y k (cid:105) + L h (cid:107) y − y k (cid:107) . If δ > y = y k − δ ∇ h ( y k )yields h (cid:16) y k − δ ∇ h ( y k ) (cid:17) ≤ h ( y k ) − (cid:16) δ − L h δ (cid:17) (cid:107)∇ h ( y k ) (cid:107) . 
Since h is bounded from below, then we have −∞ < inf { h ( y ) − (cid:16) δ − L h δ (cid:17) (cid:107)∇ h ( y ) (cid:107) : y ∈ R m } (37)We choose δ > ϑ = δ − L h δ , then (37) follows that the right hand side of (36)is finite. It is easy to verify for any r > β ∈ (0 , rγ − γ > rθ + σ − θ >
0. Hence
\[ f(x^{k+1}) + g(x^{k+1}) + \big\| Ax^{k+1} + By^{k+1} + \alpha^{-1} z^{k+1} + c \big\|^2 + \|\Delta y^{k+1}\|^2 + \|\Delta z^{k}\|^2 < +\infty. \tag{38} \]
Since $f$ and $g$ are coercive, the sequence $\{x^k\}_{k\ge k_0}$, and consequently $\{Ax^k\}_{k\ge k_0}$, is bounded. By the $z$ iterate of (4) we have
\[ By^{k+1} = \frac{1}{\alpha\beta}\Delta z^{k+1} - Ax^{k+1} - c. \]
By (38), $\{\Delta z^k\}$ is bounded, and since $B^*B$ is invertible, $\{y^k\}_{k\ge k_0}$ is bounded. Finally, since $\{Ax^k\}_{k\ge k_0}$ and $\{By^k\}_{k\ge k_0}$ are bounded and $\{Ax^k + By^k + \frac{1}{\alpha} z^k\}_{k\ge k_0}$ is also bounded, $\{z^k\}_{k\ge k_0}$ is bounded.
We showed that the sequences $\{x^k\}_{k\ge k_0}$, $\{y^k\}_{k\ge k_0}$, and $\{z^k\}_{k\ge k_0}$ are bounded. Hence, there exist $M_x > 0$, $M_y > 0$, $M_z > 0$ such that
\[ \|x^k\| \le M_x, \quad \|y^k\| \le M_y, \quad \|z^k\| \le M_z, \quad \forall k \ge k_0. \tag{39} \]
We denote
\[ \hat M_x = \max\{\|x^k\| : k = 0, 1, \dots, k_0-1\}, \quad \hat M_y = \max\{\|y^k\| : k = 0, 1, \dots, k_0-1\}, \quad \hat M_z = \max\{\|z^k\| : k = 0, 1, \dots, k_0-1\}. \]
Thus we have
\[ \|x^k\| \le \max\{M_x, \hat M_x\}, \quad \|y^k\| \le \max\{M_y, \hat M_y\}, \quad \|z^k\| \le \max\{M_z, \hat M_z\}, \quad \forall k \ge 0. \]
This concludes the proof. $\blacksquare$
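The proof of Theorem 1 relies on the descent inequality $h(y - \delta\nabla h(y)) \le h(y) - (\delta - \tfrac{L_h}{2}\delta^2)\|\nabla h(y)\|^2$ for a function with $L_h$-Lipschitz gradient. The following is a minimal numerical sanity check of that inequality, not part of the paper. The test function $h(y) = \sum_i \log(1 + y_i^2)$ is a hypothetical choice (nonconvex, bounded from below, with $|h''| \le 2$, so $L_h = 2$ is assumed), and all names are illustrative.

```python
import math
import random

L_H = 2.0  # assumed Lipschitz constant of h' for h(t) = log(1 + t^2), since |h''(t)| <= 2


def h(y):
    # Hypothetical smooth, nonconvex test function bounded below by 0.
    return sum(math.log1p(t * t) for t in y)


def grad_h(y):
    return [2.0 * t / (1.0 + t * t) for t in y]


def descent_gap(y, delta):
    """Return RHS - LHS of h(y - delta*grad) <= h(y) - (delta - L_H*delta^2/2)*||grad||^2."""
    g = grad_h(y)
    step = [t - delta * gt for t, gt in zip(y, g)]
    grad_sq = sum(gt * gt for gt in g)
    rhs = h(y) - (delta - 0.5 * L_H * delta ** 2) * grad_sq
    return rhs - h(step)


random.seed(0)
gaps = [descent_gap([random.gauss(0.0, 3.0) for _ in range(5)], delta)
        for delta in (0.1, 0.5, 1.0) for _ in range(100)]
worst_gap = min(gaps)
print("inequality holds on all samples:", worst_gap >= -1e-12)
```

A nonnegative gap on every sample is what the inequality predicts; taking the infimum over $y$ then gives the finiteness statement (37) whenever $h$ is bounded from below.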
Remark 2
Theorem 1 was established under Assumption 2(ii). Under Assumption 2(i), the bound (39) holds for every $k$.
Lemma 9
Suppose that Assumptions 1 and 2(ii) hold. If $\{(x^k,y^k,z^k)\}_{k\ge 0}$ is a sequence generated by the VMP-LADMM algorithm (4), which is assumed to be bounded, then
\[ \lim_{k\to\infty}\|\Delta x^{k+1}\| = 0, \quad \lim_{k\to\infty}\|\Delta y^{k+1}\| = 0, \quad \lim_{k\to\infty}\|\Delta z^{k+1}\| = 0. \]
Proof. By Lemma 8, summing (33) from $k = k_0$ to some $K \ge k_0$ gives
\[ \sum_{k=k_0}^{K} \Big( \|\Delta x^{k+1}\|^2 + \|\Delta y^{k+1}\|^2 + \|\Delta z^{k+1}\|^2 \Big) \le \frac{1}{\sigma}\Big( \mathcal{R}_{k_0} - \inf_{k} \mathcal{R}_k \Big) < +\infty. \]
Letting $K$ approach infinity, and since $\{(x^k,y^k,z^k)\}_{k\ge 0}$ is bounded (so the finitely many terms with $k < k_0$ are finite), we have
\[ \sum_{k\ge 0} \Big( \|\Delta x^{k+1}\|^2 + \|\Delta y^{k+1}\|^2 + \|\Delta z^{k+1}\|^2 \Big) < +\infty. \]
It follows that $\|\Delta x^{k+1}\|^2 + \|\Delta y^{k+1}\|^2 + \|\Delta z^{k+1}\|^2 \to 0$ as $k \to \infty$, and thus
\[ \lim_{k\to\infty} \Big( \|\Delta x^{k+1}\| + \|\Delta y^{k+1}\| + \|\Delta z^{k+1}\| \Big) = 0. \]
This finishes the proof. $\blacksquare$
Lemma 10 (properties of limit point set)
Let Assumptions 1 and 2 hold. For a bounded sequence $\{(x^k,y^k,z^k)\}_{k\ge 0}$ generated by the VMP-LADMM algorithm (4) the following are true:
(i) The limit point set of the sequence, denoted $\omega\big(\{(x^k,y^k,z^k)\}_{k\ge 0}\big)$, is nonempty, connected, and compact.
(ii) $\lim_{k\to\infty} \mathrm{dist}\big[(x^k,y^k,z^k),\ \omega\big(\{(x^k,y^k,z^k)\}_{k\ge 0}\big)\big] = 0$.
(iii) $\omega\big(\{(x^k,y^k,z^k)\}_{k\ge 0}\big) \subseteq \mathrm{crit}\,\mathcal{L}_\alpha$.
Proof. These results follow from Lemma 9. We omit the proof.
Lemma 11
Suppose that Assumption 1 holds. If $(x^*,y^*,z^*)$ is a limit point of a subsequence $\{(x^{k_j},y^{k_j},z^{k_j})\}_{j\ge 0}$, then
\[ \mathcal{R}(x^*,y^*,z^*,y^*,z^*) = \mathcal{L}_\alpha(x^*,y^*,z^*) = f(x^*) + g(x^*) + h(y^*). \]
Proof. Let $\{(x^{k_j},y^{k_j},z^{k_j})\}_{j\ge 0}$ be a subsequence such that $(x^{k_j},y^{k_j},z^{k_j}) \to (x^*,y^*,z^*)$ as $j \to \infty$. Then $\|\Delta y^{k_j}\| \to 0$ and $\|B^*\Delta z^{k_j}\| \le \|B\|\,\|\Delta z^{k_j}\| \to 0$ as $j \to \infty$, hence
\[ \lim_{j\to\infty} \mathcal{R}_{k_j} = \lim_{j\to\infty} \mathcal{L}_\alpha(x^{k_j},y^{k_j},z^{k_j}) = \lim_{j\to\infty} \Big[ f(x^{k_j}) + g(x^{k_j}) + h(y^{k_j}) + \langle z^{k_j}, Ax^{k_j} + By^{k_j} + c\rangle + \frac{\alpha}{2}\|Ax^{k_j} + By^{k_j} + c\|^2 \Big]. \]
By the $z$ iterate of the algorithm (4), since $\|\Delta z^{k_j+1}\| \to 0$, we have $\|Ax^{k_j} + By^{k_j} + c\| \to 0$ as $j \to \infty$. Since $\{z^{k_j}\}_{j\ge 0}$ is a bounded sequence, we also have $\langle z^{k_j}, Ax^{k_j} + By^{k_j} + c\rangle \to 0$ as $j \to \infty$. Since $g$ and $h$ are smooth and $f$ is lower semicontinuous,
\[ \lim_{j\to\infty} \mathcal{R}(x^{k_j},y^{k_j},z^{k_j},y^{k_j},z^{k_j}) = \lim_{j\to\infty} \mathcal{L}_\alpha(x^{k_j},y^{k_j},z^{k_j}) = \lim_{j\to\infty} \Big[ f(x^{k_j}) + g(x^{k_j}) + h(y^{k_j}) \Big] = f(x^*) + g(x^*) + h(y^*). \]
This concludes the proof. $\blacksquare$
In this section, we establish the main theoretical results for the sequence generated by VMP-LADMM. We begin with some important lemmas.
Lemma 12
Suppose that Assumption 1 holds. Let $\{(x^k,y^k,z^k)\}_{k\ge 0}$ be a sequence generated by the VMP-LADMM algorithm (4). Define
\[ s_x^k := d_x^k, \quad s_y^k := d_y^k + 2r\theta\,\Delta y^k, \quad s_z^k := d_z^k + 2r\gamma\, BB^*\Delta z^k, \quad s_{y'}^k := -2r\theta\,\Delta y^k, \quad s_{z'}^k := -2r\gamma\, BB^*\Delta z^k, \]
where $(d_x^k, d_y^k, d_z^k) \in \partial\mathcal{L}_\alpha(x^k,y^k,z^k)$. Then $s^k := (s_x^k, s_y^k, s_z^k, s_{y'}^k, s_{z'}^k) \in \partial\mathcal{R}(x^k,y^k,z^k,y^{k-1},z^{k-1})$ for $k \ge 1$, and it holds that
\[ |||s^k||| \le \tilde\rho \big( \|\Delta x^k\| + \|\Delta y^k\| + \|\Delta z^k\| \big), \tag{40} \]
where
\[ \tilde\rho = \sqrt{3}\,\rho + 4r\max\{\theta, \gamma\}, \tag{41} \]
$r > 1$, $\rho$ is given in (8), and $\theta$ and $\gamma$ are as in (23).
Proof. Let $k \ge 1$ and $(d_x^k, d_y^k, d_z^k) \in \partial\mathcal{L}_\alpha(x^k,y^k,z^k)$. Taking partial derivatives of $\mathcal{R}_k$ with respect to $x, y, z, y', z'$ we obtain
\[ \begin{aligned}
s_x^k &:= \partial_x \mathcal{R}(x^k,y^k,z^k,y^{k-1},z^{k-1}) = \partial_x \mathcal{L}_\alpha(x^k,y^k,z^k) = d_x^k, \\
s_y^k &:= \nabla_y \mathcal{R}(x^k,y^k,z^k,y^{k-1},z^{k-1}) = \nabla_y \mathcal{L}_\alpha(x^k,y^k,z^k) + 2r\theta\,\Delta y^k = d_y^k + 2r\theta\,\Delta y^k, \\
s_z^k &:= \nabla_z \mathcal{R}(x^k,y^k,z^k,y^{k-1},z^{k-1}) = \nabla_z \mathcal{L}_\alpha(x^k,y^k,z^k) + 2r\gamma\,BB^*\Delta z^k = d_z^k + 2r\gamma\,BB^*\Delta z^k, \\
s_{y'}^k &:= \nabla_{y'} \mathcal{R}(x^k,y^k,z^k,y^{k-1},z^{k-1}) = -2r\theta\,\Delta y^k, \\
s_{z'}^k &:= \nabla_{z'} \mathcal{R}(x^k,y^k,z^k,y^{k-1},z^{k-1}) = -2r\gamma\,BB^*\Delta z^k.
\end{aligned} \]
By the triangle inequality we obtain
\[ \|s_x^k\| = \|d_x^k\|, \quad \|s_y^k\| \le \|d_y^k\| + 2r\theta\|\Delta y^k\|, \quad \|s_z^k\| \le \|d_z^k\| + 2r\gamma\|B\|^2\|\Delta z^k\|, \quad \|s_{y'}^k\| \le 2r\theta\|\Delta y^k\|, \quad \|s_{z'}^k\| \le 2r\gamma\|B\|^2\|\Delta z^k\|. \]
By Lemma 1, it follows that
\[ \begin{aligned}
|||s^k||| &\le \|s_x^k\| + \|s_y^k\| + \|s_z^k\| + \|s_{y'}^k\| + \|s_{z'}^k\| \\
&\le \|d_x^k\| + \|d_y^k\| + \|d_z^k\| + 4r\theta\|\Delta y^k\| + 4r\gamma\|B\|^2\|\Delta z^k\| \\
&\le \sqrt{3}\,|||d^k||| + 4r\theta\|\Delta y^k\| + 4r\gamma\|B\|^2\|\Delta z^k\| \\
&\le \sqrt{3}\,\rho\|\Delta x^k\| + (\sqrt{3}\,\rho + 4r\theta)\|\Delta y^k\| + (\sqrt{3}\,\rho + 4r\gamma\|B\|^2)\|\Delta z^k\|.
\end{aligned} \]
This concludes the proof. $\blacksquare$
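The chain of inequalities in the proof above uses two elementary norm comparisons for a stacked vector $d = (d_x, d_y, d_z)$: $|||d||| \le \|d_x\| + \|d_y\| + \|d_z\| \le \sqrt{3}\,|||d|||$. Lemma 1 lies outside this section, so treating it as exactly this pair of bounds is an assumption. The short sketch below checks both bounds on random data; the block sizes and function names are illustrative only.

```python
import math
import random


def norm(v):
    return math.sqrt(sum(t * t for t in v))


def block_norms(blocks):
    """Return (stacked, total): the Euclidean norm of the stacked vector and the sum of block norms."""
    stacked = math.sqrt(sum(norm(b) ** 2 for b in blocks))
    total = sum(norm(b) for b in blocks)
    return stacked, total


random.seed(1)
bounds_hold = True
for _ in range(1000):
    blocks = [[random.uniform(-1.0, 1.0) for _ in range(n)] for n in (3, 4, 2)]
    stacked, total = block_norms(blocks)
    # |||d||| <= ||d_x|| + ||d_y|| + ||d_z|| <= sqrt(3) * |||d|||
    bounds_hold = bounds_hold and stacked <= total + 1e-12
    bounds_hold = bounds_hold and total <= math.sqrt(3.0) * stacked + 1e-12
print(bounds_hold)
```

The factor $\sqrt{3}$ is the Cauchy-Schwarz constant for three blocks, which is where the $\sqrt{3}\,\rho$ term in (41) comes from.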
Lemma 13
Let Assumptions 1 and 2 hold. If $\{(x^k,y^k,z^k)\}_{k\ge 0}$ is a sequence generated by the VMP-LADMM algorithm (4), then the following statements hold.
(i) The set $\Omega := \omega\big(\{(x^k,y^k,z^k,y^{k-1},z^{k-1})\}_{k\ge 1}\big)$ is nonempty, connected, and compact.
(ii) $\Omega \subseteq \big\{ (x,y,z,y,z) \in \mathbb{R}^n\times\mathbb{R}^m\times\mathbb{R}^p\times\mathbb{R}^m\times\mathbb{R}^p : (x,y,z) \in \mathrm{crit}(\mathcal{L}_\alpha) \big\}$.
(iii) $\lim_{k\to\infty} \mathrm{dist}\big[(x^k,y^k,z^k,y^{k-1},z^{k-1}),\ \Omega\big] = 0$.
(iv) The sequences $\{\mathcal{R}_k\}_{k\ge 0}$, $\{\mathcal{L}_\alpha(x^k,y^k,z^k)\}_{k\ge 0}$, and $\{F(x^k,y^k)\}_{k\ge 0}$ approach the same limit, and if $(x^*,y^*,z^*,y^*,z^*) \in \Omega$, then
\[ \mathcal{R}(x^*,y^*,z^*,y^*,z^*) = \mathcal{L}_\alpha(x^*,y^*,z^*) = F(x^*,y^*). \]
Proof. These results follow immediately from Lemma 3, Theorem 8, and Lemma 12. $\blacksquare$
Theorem 2 (Convergence of sequence)
Suppose that Assumptions 1 and 2(ii) hold, $\{(x^k,y^k,z^k)\}_{k\ge 0}$ is a sequence generated by the VMP-LADMM algorithm (4) which is assumed to be bounded, and $\mathcal{R}$ satisfies the KŁ property on $\Omega := \omega\big(\{(x^k,y^k,z^k,y^{k-1},z^{k-1})\}_{k\ge 1}\big)$. That is, for every $v^* := (x^*,y^*,z^*,y^*,z^*) \in \Omega$ there exist $\epsilon > 0$, $\eta \in [0,+\infty)$, and a desingularizing function $\psi \in \Psi_\eta$ such that for every $v = (x,y,z,y',z') \in S$, where
\[ S := \big\{ v \in \mathbb{R}^n\times\mathbb{R}^m\times\mathbb{R}^p\times\mathbb{R}^m\times\mathbb{R}^p : \mathrm{dist}(v,\Omega) < \epsilon \ \text{and}\ \mathcal{R}(v^*) < \mathcal{R}(v) < \mathcal{R}(v^*) + \eta \big\}, \tag{42} \]
it holds that
\[ \psi'\big(\mathcal{R}(v) - \mathcal{R}(v^*)\big)\,\mathrm{dist}\big(0, \partial\mathcal{R}(v)\big) \ge 1. \]
Then $\{u^k\}_{k\ge 0} := \{(x^k,y^k,z^k)\}_{k\ge 0}$ satisfies the finite length property
\[ \sum_{k=0}^{\infty} \Big( \|\Delta x^k\| + \|\Delta y^k\| + \|\Delta z^k\| \Big) < +\infty, \]
and consequently converges to a stationary point of (1).
Proof. By Lemma 8, there exists an index $k_0$ such that $\{\mathcal{R}_k\}_{k\ge k_0}$ is monotonically decreasing and converges; let $\mathcal{R}_\infty := \lim_{k\to\infty}\mathcal{R}_k$. It follows that the error sequence $\mathcal{E}_k := \mathcal{R}_k - \mathcal{R}_\infty$ is nonnegative, monotonically decreasing for all $k \ge k_0$, and converges to 0. Let us consider two cases.
Case 1. There is $k_1 \ge k_0$ such that $\mathcal{E}_{k_1} = 0$. Hence $\mathcal{E}_k = 0$ for all $k \ge k_1$, and by (33) we have
\[ \|\Delta x^{k+1}\|^2 + \|\Delta y^{k+1}\|^2 + \|\Delta z^{k+1}\|^2 \le \frac{1}{\sigma}\big(\mathcal{E}_k - \mathcal{E}_{k+1}\big) = 0, \quad \forall k \ge k_1. \]
This gives
\[ \sum_{k\ge 0} \Big( \|\Delta x^{k+1}\| + \|\Delta y^{k+1}\| + \|\Delta z^{k+1}\| \Big) \le \sum_{k=0}^{k_1-1} \Big( \|\Delta x^{k+1}\| + \|\Delta y^{k+1}\| + \|\Delta z^{k+1}\| \Big) < +\infty. \]
The latter conclusion is due to the fact that the sequence is bounded.
Case 2. The error sequence satisfies $\mathcal{E}_k = \mathcal{R}_k - \mathcal{R}_\infty > 0$ for all $k \ge k_0$. Then by (33) we have
\[ |||\Delta u^{k+1}|||^2 = \|\Delta x^{k+1}\|^2 + \|\Delta y^{k+1}\|^2 + \|\Delta z^{k+1}\|^2 \le \frac{1}{\sigma}\big(\mathcal{E}_k - \mathcal{E}_{k+1}\big), \quad \forall k \ge k_0. \tag{43} \]
By Lemma 13, $\Omega$ is nonempty, compact, and connected, and $\mathcal{R}$ takes the constant value $\mathcal{R}_\infty$ on $\Omega$. Since the sequence $\{\mathcal{R}_k\}_{k\ge k_0}$ is monotonically decreasing to $\mathcal{R}_\infty$, there exists $k_1 \ge k_0$ such that
\[ \mathcal{R}_\infty < \mathcal{R}_k < \mathcal{R}_\infty + \eta, \quad \forall k \ge k_1. \]
By Lemma 13 we also have $\lim_{k\to\infty}\mathrm{dist}\big[(x^k,y^k,z^k,y^{k-1},z^{k-1}),\Omega\big] = 0$, thus there exists $k_2$ such that
\[ \mathrm{dist}\big[(x^k,y^k,z^k,y^{k-1},z^{k-1}),\Omega\big] < \epsilon, \quad \forall k \ge k_2. \]
Choose $\tilde k = \max\{k_1, k_2\}$; then $(x^k,y^k,z^k,y^{k-1},z^{k-1}) \in S$ for $k \ge \tilde k$, and it follows that
\[ \psi'(\mathcal{E}_k)\cdot\mathrm{dist}\big(0,\partial\mathcal{R}_k\big) \ge 1. \tag{44} \]
Since $\psi$ is concave, we have $\psi(\mathcal{E}_k) - \psi(\mathcal{E}_{k+1}) \ge \psi'(\mathcal{E}_k)(\mathcal{E}_k - \mathcal{E}_{k+1})$. Together with (43) and (44) we then obtain
\[ |||\Delta u^{k+1}|||^2 \le \psi'(\mathcal{E}_k)\,|||\Delta u^{k+1}|||^2\cdot\mathrm{dist}(0,\partial\mathcal{R}_k) \le \frac{1}{\sigma}\psi'(\mathcal{E}_k)(\mathcal{E}_k - \mathcal{E}_{k+1})\cdot\mathrm{dist}(0,\partial\mathcal{R}_k) \le \frac{1}{\sigma}\big(\psi(\mathcal{E}_k) - \psi(\mathcal{E}_{k+1})\big)\cdot\mathrm{dist}(0,\partial\mathcal{R}_k). \]
By the arithmetic mean-geometric mean inequality, for any $\gamma > 0$,
\[ |||\Delta u^{k+1}||| \le \frac{\gamma}{2\sigma}\big(\psi(\mathcal{E}_k) - \psi(\mathcal{E}_{k+1})\big) + \frac{1}{2\gamma}\,\mathrm{dist}(0,\partial\mathcal{R}_k). \]
It follows that
\[ \|\Delta x^{k+1}\| + \|\Delta y^{k+1}\| + \|\Delta z^{k+1}\| \le \frac{\sqrt{3}\,\gamma}{2\sigma}\big(\psi(\mathcal{E}_k) - \psi(\mathcal{E}_{k+1})\big) + \frac{\sqrt{3}}{2\gamma}\,\mathrm{dist}(0,\partial\mathcal{R}_k). \tag{45} \]
By Lemma 12 we then obtain
\[ \|\Delta x^{k+1}\| + \|\Delta y^{k+1}\| + \|\Delta z^{k+1}\| \le \frac{\sqrt{3}\,\gamma}{2\sigma}\big(\psi(\mathcal{E}_k) - \psi(\mathcal{E}_{k+1})\big) + \frac{\sqrt{3}\,\tilde\rho}{2\gamma}\Big( \|\Delta x^k\| + \|\Delta y^k\| + \|\Delta z^k\| \Big). \tag{46} \]
Exploit the identity $\sum_{k=\underline{k}}^{K}\|\Delta x^k\| = \sum_{k=\underline{k}}^{K}\|\Delta x^{k+1}\| + \|\Delta x^{\underline{k}}\| - \|\Delta x^{K+1}\|$, and choose $\gamma > 0$ large enough such that $1 > \sqrt{3}\,\tilde\rho/(2\gamma)$. Set $\delta = 1 - \frac{\sqrt{3}\,\tilde\rho}{2\gamma}$. Summing (46) from $k = \underline{k} \ge \tilde k$ to $K \ge \underline{k}$ leads to
\[ \sum_{k=\underline{k}}^{K} \Big( \|\Delta x^{k+1}\| + \|\Delta y^{k+1}\| + \|\Delta z^{k+1}\| \Big) \le \frac{\sqrt{3}\,\gamma}{2\sigma\delta}\big(\psi(\mathcal{E}_{\underline{k}}) - \psi(\mathcal{E}_{K+1})\big) + \frac{\sqrt{3}\,\tilde\rho}{2\gamma\delta}\Big( \|\Delta x^{\underline{k}}\| + \|\Delta y^{\underline{k}}\| + \|\Delta z^{\underline{k}}\| \Big) - \frac{\sqrt{3}\,\tilde\rho}{2\gamma\delta}\Big( \|\Delta x^{K+1}\| + \|\Delta y^{K+1}\| + \|\Delta z^{K+1}\| \Big). \]
Recall that $\mathcal{E}_k$ is monotonically decreasing and $\psi(\mathcal{E}_k) \ge \psi(\mathcal{E}_{k+1}) > 0$, hence
\[ \sum_{k=\underline{k}}^{K} \Big( \|\Delta x^{k+1}\| + \|\Delta y^{k+1}\| + \|\Delta z^{k+1}\| \Big) \le \frac{\sqrt{3}\,\gamma}{2\sigma\delta}\psi(\mathcal{E}_{\underline{k}}) + \frac{\sqrt{3}\,\tilde\rho}{2\gamma\delta}\Big( \|\Delta x^{\underline{k}}\| + \|\Delta y^{\underline{k}}\| + \|\Delta z^{\underline{k}}\| \Big). \]
The right-hand side of this inequality is bounded for any $K \ge \underline{k}$, so letting $K \to \infty$ we obtain
\[ \sum_{k\ge \underline{k}} \Big( \|\Delta x^{k+1}\| + \|\Delta y^{k+1}\| + \|\Delta z^{k+1}\| \Big) \le \frac{\sqrt{3}\,\gamma}{2\sigma\delta}\psi(\mathcal{E}_{\underline{k}}) + \frac{\sqrt{3}\,\tilde\rho}{2\gamma\delta}\Big( \|\Delta x^{\underline{k}}\| + \|\Delta y^{\underline{k}}\| + \|\Delta z^{\underline{k}}\| \Big). \tag{47} \]
Since $\{(x^k,y^k,z^k)\}_{k\ge 0}$ is a bounded sequence, for any $\underline{k} \in \mathbb{Z}_+$ we clearly have
\[ \lambda(\underline{k}) := \sum_{k=0}^{\underline{k}-1} \Big( \|\Delta x^{k+1}\| + \|\Delta y^{k+1}\| + \|\Delta z^{k+1}\| \Big) < +\infty. \]
Thus $\sum_{k\ge 0} \big( \|\Delta x^{k+1}\| + \|\Delta y^{k+1}\| + \|\Delta z^{k+1}\| \big) < +\infty$.
Note that for any $p, q \in \mathbb{Z}_+$ with $q \ge p > 0$,
\[ |||u^q - u^p||| = \Big|\Big|\Big| \sum_{k=p}^{q-1}\Delta u^{k+1} \Big|\Big|\Big| \le \sum_{k=p}^{q-1} |||\Delta u^{k+1}||| \le \sum_{k=p}^{q-1} \Big( \|\Delta x^{k+1}\| + \|\Delta y^{k+1}\| + \|\Delta z^{k+1}\| \Big) \le \sum_{k\ge p} \Big( \|\Delta x^{k+1}\| + \|\Delta y^{k+1}\| + \|\Delta z^{k+1}\| \Big). \]
Hence the finiteness of $\sum_{k\ge 0} \big( \|\Delta x^{k+1}\| + \|\Delta y^{k+1}\| + \|\Delta z^{k+1}\| \big)$ implies that $\{u^k\}_{k\ge 0} = \{(x^k,y^k,z^k)\}_{k\ge 0}$ is a Cauchy sequence, hence it converges.
By Lemma 3, it converges to a stationary point. $\blacksquare$
Remark 3
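Theorem 2 implies that the limit point set $\omega(\{(x^k,y^k,z^k)\}_{k\ge 0})$ is a singleton. Let us denote by $(x^\infty,y^\infty,z^\infty)$ the unique limit point of the sequence $\{(x^k,y^k,z^k)\}_{k\ge 0}$.
Theorem 3 (Convergence rate of $\mathcal{R}_k$)
Suppose that Assumptions 1 and 2(ii) hold, and $\mathcal{R}$ satisfies the Łojasiewicz property at $v^\infty := (x^\infty,y^\infty,z^\infty,y^\infty,z^\infty)$, that is, there exist an exponent $\theta \in [0,1)$, $C_L > 0$, and $\epsilon > 0$ such that for all $v := (x,y,z,y',z')$ with $\mathrm{dist}(v,v^\infty) < \epsilon$,
\[ |\mathcal{R}(v) - \mathcal{R}(v^\infty)|^{\theta} \le C_L\,\mathrm{dist}(0,\partial\mathcal{R}(v)) \tag{48} \]
holds. Denote $\mathcal{E}_k := \mathcal{R}_k - \mathcal{R}_\infty$, where $\mathcal{R}_\infty := \mathcal{R}(v^\infty) = \lim_{k\to\infty}\mathcal{R}_k$. There exists an index $K$ such that
\[ \bar\alpha\,\mathcal{E}_k^{2\theta} \le \mathcal{E}_{k-1} - \mathcal{E}_k, \quad \forall k \ge K, \tag{49} \]
where $\bar\alpha > 0$. Moreover,
(a) if $\theta = 0$, then $\mathcal{E}_k$ converges to zero in a finite number of iterations;
(b) if $\theta \in (0,1/2]$, then for all $k \ge K$ it holds that
\[ \mathcal{E}_k \le \frac{\max\{\mathcal{E}_i : 1 \le i \le K\}}{(1 + \bar\alpha\,\mathcal{E}_K^{2\theta-1})^{k-K+1}}; \tag{50} \]
(c) if $\theta \in (1/2,1)$, then there is a $\mu > 0$ such that
\[ \mathcal{E}_k \le \Big( \mu(k-K+1) + \mathcal{E}_K^{1-2\theta} \Big)^{-\frac{1}{2\theta-1}}, \quad \forall k \ge K. \]
Proof. By (40), for any $k \ge 1$,
\[ \frac{1}{3\tilde\rho^2}\,|||s^k|||^2 \le \|\Delta x^k\|^2 + \|\Delta y^k\|^2 + \|\Delta z^k\|^2. \tag{51} \]
By (33) there is an index $k_0$ such that for all $k \ge k_0$ we have
\[ \|\Delta x^k\|^2 + \|\Delta y^k\|^2 + \|\Delta z^k\|^2 \le \frac{1}{\sigma}\big(\mathcal{E}_{k-1} - \mathcal{E}_k\big). \tag{52} \]
Combining (51) and (52) leads to
\[ \frac{1}{3\tilde\rho^2}\,|||s^k|||^2 \le \frac{1}{\sigma}\big(\mathcal{E}_{k-1} - \mathcal{E}_k\big). \tag{53} \]
Since $\mathcal{R}$ satisfies the Łojasiewicz property at $v^\infty$, $v^k \to v^\infty$, $\mathcal{R}_k$ is monotonically decreasing, and $\mathcal{R}_k \to \mathcal{R}_\infty$ as $k \to \infty$, there exist $K \ge k_0$, $\epsilon > 0$, $\theta \in [0,1)$, and $C_L > 0$ such that for all $k \ge K$, $\mathrm{dist}(v^k,v^\infty) < \epsilon$ and $|\mathcal{R}_k - \mathcal{R}_\infty|^{\theta} \le C_L\,\mathrm{dist}(0,\partial\mathcal{R}_k)$ hold. It follows that
\[ \mathcal{E}_k^{2\theta} \le C_L^2\,|||s^k|||^2, \quad \forall k \ge K, \]
where $s^k \in \partial\mathcal{R}_k$. This, together with (53), yields
\[ \frac{\sigma}{3C_L^2\tilde\rho^2}\,\mathcal{E}_k^{2\theta} \le \mathcal{E}_{k-1} - \mathcal{E}_k. \]
Setting $\bar\alpha = \sigma/(3C_L^2\tilde\rho^2) > 0$, we obtain (49).
(i) Let $\theta = 0$. If $\mathcal{E}_k > 0$ for all $k \ge K$, we would have $\bar\alpha \le \mathcal{E}_{k-1} - \mathcal{E}_k$. As $k$ approaches infinity, the right-hand side approaches zero, so $0 < \bar\alpha \le 0$, which is a contradiction. Hence $\mathcal{E}_k$ must vanish for some $k \ge K$, and by monotonicity there is a $\tilde k$ such that $\mathcal{E}_k = 0$ for all $k \ge \tilde k$.
(ii) If $\theta \in (0,\frac12]$, then $2\theta - 1 \le 0$. Let $k \ge K+1$ be fixed. The sequence $\{\mathcal{E}_i\}_{i\ge K}$ is monotonically decreasing, hence $\mathcal{E}_i \le \mathcal{E}_K$ for $i = K+1, K+2, \dots, k$ and
\[ \bar\alpha\,\mathcal{E}_K^{2\theta-1}\,\mathcal{E}_k \le \mathcal{E}_{k-1} - \mathcal{E}_k. \]
We rearrange this to obtain
\[ \mathcal{E}_k \le \frac{\mathcal{E}_{k-1}}{1 + \bar\alpha\,\mathcal{E}_K^{2\theta-1}} \le \frac{\mathcal{E}_{k-2}}{(1 + \bar\alpha\,\mathcal{E}_K^{2\theta-1})^2} \le \cdots \le \frac{\mathcal{E}_K}{(1 + \bar\alpha\,\mathcal{E}_K^{2\theta-1})^{k-K}}. \]
Hence
\[ \mathcal{E}_k \le \frac{\max\{\mathcal{E}_i : 0 \le i \le K\}}{(1 + \bar\alpha\,\mathcal{E}_K^{2\theta-1})^{k-K}}, \quad k > K. \]
(iii) Let $\theta \in (1/2,1)$. Rearranging (49) gives
\[ \bar\alpha \le (\mathcal{E}_{k-1} - \mathcal{E}_k)\,\mathcal{E}_k^{-2\theta}, \quad \forall k \ge K. \tag{54} \]
Let $h : \mathbb{R}_+ \to \mathbb{R}$ be defined by $h(s) = s^{-2\theta}$ for $s \in \mathbb{R}_+$. Clearly, $h$ is monotonically decreasing since $h'(s) = -2\theta s^{-(1+2\theta)} < 0$, and therefore $h(\mathcal{E}_{k-1}) \le h(\mathcal{E}_k)$ for all $k > K$, as $\mathcal{E}_k$ is monotonically decreasing. We consider two cases. First, let $r \in (1,+\infty)$ be such that
\[ h(\mathcal{E}_k) \le r\,h(\mathcal{E}_{k-1}), \quad \forall k > K. \]
Hence, by (54) we obtain
\[ \bar\alpha \le r(\mathcal{E}_{k-1} - \mathcal{E}_k)\,h(\mathcal{E}_{k-1}) \le r\,h(\mathcal{E}_{k-1})\int_{\mathcal{E}_k}^{\mathcal{E}_{k-1}} 1\,ds \le r\int_{\mathcal{E}_k}^{\mathcal{E}_{k-1}} h(s)\,ds = r\int_{\mathcal{E}_k}^{\mathcal{E}_{k-1}} s^{-2\theta}\,ds = \frac{r}{1-2\theta}\big[\mathcal{E}_{k-1}^{1-2\theta} - \mathcal{E}_k^{1-2\theta}\big], \]
where $1 - 2\theta < 0$. Rearranging gives
\[ 0 < \frac{\bar\alpha(2\theta-1)}{r} \le \mathcal{E}_k^{1-2\theta} - \mathcal{E}_{k-1}^{1-2\theta}. \]
Setting $\hat\mu = \frac{\bar\alpha(2\theta-1)}{r} > 0$ and $\nu := 1 - 2\theta < 0$, one obtains
\[ 0 < \hat\mu \le \mathcal{E}_k^{\nu} - \mathcal{E}_{k-1}^{\nu}, \quad \forall k > K. \tag{55} \]
Next, consider the case where $h(\mathcal{E}_k) \ge r\,h(\mathcal{E}_{k-1})$, hence $\mathcal{E}_k^{-2\theta} \ge r\,\mathcal{E}_{k-1}^{-2\theta}$. Rearranging this gives $r^{-1}\mathcal{E}_{k-1}^{2\theta} \ge \mathcal{E}_k^{2\theta}$, which, by raising both sides to the power $1/(2\theta)$ and setting $q := r^{-\frac{1}{2\theta}} \in (0,1)$, leads to $q\,\mathcal{E}_{k-1} \ge \mathcal{E}_k$. Since $\nu = 1 - 2\theta < 0$, we have $q^{\nu}\mathcal{E}_{k-1}^{\nu} \le \mathcal{E}_k^{\nu}$, and it follows that
\[ (q^{\nu} - 1)\,\mathcal{E}_{k-1}^{\nu} \le \mathcal{E}_k^{\nu} - \mathcal{E}_{k-1}^{\nu}. \]
Since $q^{\nu} - 1 > 0$ and $\mathcal{E}_p \to 0^+$ as $p \to \infty$, there exists $\bar\mu > 0$ such that $(q^{\nu} - 1)\,\mathcal{E}_{k-1}^{\nu} > \bar\mu$ for all $k > K$. Therefore we obtain
\[ 0 < \bar\mu \le \mathcal{E}_k^{\nu} - \mathcal{E}_{k-1}^{\nu}. \tag{56} \]
Choosing $\mu = \min\{\hat\mu, \bar\mu\} > 0$, one can combine (55) and (56) to obtain
\[ 0 < \mu \le \mathcal{E}_k^{\nu} - \mathcal{E}_{k-1}^{\nu}, \quad \forall k > K. \]
Summing this inequality from $K+1$ to some $k \ge K+1$ gives $\mu(k-K) + \mathcal{E}_K^{\nu} \le \mathcal{E}_k^{\nu}$. Hence
\[ \mathcal{E}_k \le \big(\mu(k-K) + \mathcal{E}_K^{\nu}\big)^{1/\nu} = \big(\mu(k-K) + \mathcal{E}_K^{1-2\theta}\big)^{1/(1-2\theta)}. \]
This concludes the proof. $\blacksquare$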
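To see the two regimes of Theorem 3 numerically, one can generate a scalar sequence that satisfies the recursion $\bar\alpha\,\mathcal{E}_k^{2\theta} \le \mathcal{E}_{k-1} - \mathcal{E}_k$ and compare it with the stated bounds. The sketch below is illustrative only: the values of $\bar\alpha$, $\mathcal{E}_0$, and $\theta$ are arbitrary assumptions, the sequence is produced by bisection rather than by the algorithm, $K$ is taken to be 0, and $\mu$ is estimated from the generated sequence instead of from the constants in the proof.

```python
def next_error(prev, alpha_bar, theta):
    """Return x in [0, prev] with x + alpha_bar * x**(2*theta) <= prev, found by bisection."""
    lo, hi = 0.0, prev
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid + alpha_bar * mid ** (2.0 * theta) > prev:
            hi = mid
        else:
            lo = mid
    return lo  # lo always satisfies the recursion inequality


def error_sequence(e0, alpha_bar, theta, n):
    errs = [e0]
    for _ in range(n):
        errs.append(next_error(errs[-1], alpha_bar, theta))
    return errs


ALPHA_BAR, E0 = 0.5, 0.5  # hypothetical constants

# theta in (0, 1/2]: geometric bound E_k <= E_0 / (1 + alpha_bar * E_0^(2 theta - 1))^k
theta_lin = 0.4
lin = error_sequence(E0, ALPHA_BAR, theta_lin, 20)
ratio = 1.0 + ALPHA_BAR * E0 ** (2.0 * theta_lin - 1.0)
linear_ok = all(e <= E0 / ratio ** k * (1.0 + 1e-9) for k, e in enumerate(lin))

# theta in (1/2, 1): sublinear bound E_k <= (mu * k + E_0^nu)^(1/nu) with nu = 1 - 2 theta < 0
theta_sub = 0.75
nu = 1.0 - 2.0 * theta_sub
sub = error_sequence(E0, ALPHA_BAR, theta_sub, 200)
mu = min(sub[k] ** nu - sub[k - 1] ** nu for k in range(1, len(sub)))
sublinear_ok = all(e <= (mu * k + E0 ** nu) ** (1.0 / nu) * (1.0 + 1e-9)
                   for k, e in enumerate(sub))
print(linear_ok, sublinear_ok, mu > 0.0)
```

In the second regime the increments $\mathcal{E}_k^{\nu} - \mathcal{E}_{k-1}^{\nu}$ stay bounded away from zero, which is the content of (55) and (56) and gives the $O(k^{-1/(2\theta-1)})$ decay.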
Theorem 4 (Convergence rate of sequence)
Suppose that Assumptions 1 and 2(ii) hold, and $u^\infty := (x^\infty,y^\infty,z^\infty)$ is the unique limit point of the sequence $\{(x^k,y^k,z^k)\}_{k\ge 0}$ generated by the VMP-LADMM algorithm. If $\mathcal{R}$ satisfies the KŁ property at $v^\infty := (x^\infty,y^\infty,z^\infty,y^\infty,z^\infty)$, then there exists an index $K$ such that for all $k \ge K$ we have
\[ |||u^k - u^\infty||| \le C\max\{\psi(\mathcal{E}_k), \sqrt{\mathcal{E}_{k-1}}\}, \tag{57} \]
where $C > 0$ is a constant, $\mathcal{E}_k := \mathcal{R}_k - \mathcal{R}_\infty$, $\mathcal{R}_\infty := \mathcal{R}(v^\infty) = \lim_{k\to\infty}\mathcal{R}_k$, and $\psi \in \Psi_\eta$ with $\eta > 0$ denotes a desingularizing function. Moreover, if $\psi : [0,\eta) \to [0,+\infty)$, $\psi(s) = s^{1-\theta}$, where $\theta \in [0,1)$, then the following rates hold.
(i) If $\theta = 0$, then $u^k$ converges to $u^\infty$ in a finite number of iterations.
(ii) If $\theta \in (0,1/2]$, then for all $k \ge K$ it holds that
\[ |||u^k - u^\infty||| \le \frac{\max\{\sqrt{\mathcal{E}_i} : 1 \le i \le K\}}{\sqrt{(1 + \bar\alpha\,\mathcal{E}_K^{2\theta-1})^{k-K+1}}}, \]
where $\bar\alpha = \sigma/(3\tilde\rho^2)$.
(iii) If $\theta \in (1/2,1)$, then
\[ |||u^k - u^\infty||| \le \Big( \mu(k-K+1) + \mathcal{E}_K^{1-2\theta} \Big)^{-\frac{1-\theta}{2\theta-1}}, \quad \forall k \ge K. \]
Proof. Let $k_0$ be such that $\{\mathcal{E}_k\}_{k\ge k_0}$ is monotonically decreasing. By (33), for all $k \ge k_0 + 1$ it holds that
\[ \|\Delta x^k\| + \|\Delta y^k\| + \|\Delta z^k\| \le \frac{\sqrt{3}}{\sqrt{\sigma}}\sqrt{\mathcal{E}_{k-1} - \mathcal{E}_k} \le \frac{\sqrt{3}}{\sqrt{\sigma}}\sqrt{\mathcal{E}_{k-1}}. \tag{58} \]
By this, the fact that $\mathcal{R}_k$ converges to $\mathcal{R}_\infty$, $\lim_{k\to\infty} v^k = v^\infty$, and that $\mathcal{R}$ satisfies the KŁ property at $v^\infty$, we conclude that there exist $\epsilon > 0$, $\eta > 0$, $\psi \in \Psi_\eta$, and $K \ge k_0 + 1$ such that for all $k \ge K$, $\mathrm{dist}(v^k,v^\infty) < \epsilon$ and $\mathcal{R}_\infty < \mathcal{R}_k < \mathcal{R}_\infty + \eta$, and the KŁ property
\[ \psi'(\mathcal{E}_k)\cdot\mathrm{dist}(0,\partial\mathcal{R}_k) \ge 1 \tag{59} \]
holds. By the concavity of $\psi$ and (43) we then obtain
\[ |||\Delta u^{k+1}|||^2 \le \frac{1}{\sigma}\big(\psi(\mathcal{E}_k) - \psi(\mathcal{E}_{k+1})\big)\cdot\mathrm{dist}(0,\partial\mathcal{R}_k). \]
By the arithmetic mean-geometric mean inequality, for any $\gamma > 0$,
\[ \|\Delta x^{k+1}\| + \|\Delta y^{k+1}\| + \|\Delta z^{k+1}\| \le \frac{\sqrt{3}\,\gamma}{2\sigma}\big(\psi(\mathcal{E}_k) - \psi(\mathcal{E}_{k+1})\big) + \frac{\sqrt{3}}{2\gamma}\,\mathrm{dist}(0,\partial\mathcal{R}_k). \]
Using Lemma 12 gives
\[ \|\Delta x^{k+1}\| + \|\Delta y^{k+1}\| + \|\Delta z^{k+1}\| \le \frac{\sqrt{3}\,\gamma}{2\sigma}\psi(\mathcal{E}_k) + \frac{\sqrt{3}\,\tilde\rho}{2\gamma}\Big( \|\Delta x^k\| + \|\Delta y^k\| + \|\Delta z^k\| \Big). \tag{60} \]
Let $\gamma > 0$ be large enough such that $1 > \sqrt{3}\,\tilde\rho/(2\gamma)$. Denote $\delta := 1 - \frac{\sqrt{3}\,\tilde\rho}{2\gamma}$, then sum up the latter inequality over $k \ge K$ to get
\[ \sum_{k\ge K} \Big( \|\Delta x^{k+1}\| + \|\Delta y^{k+1}\| + \|\Delta z^{k+1}\| \Big) \le \frac{\sqrt{3}\,\gamma}{2\sigma\delta}\psi(\mathcal{E}_K) + \frac{\sqrt{3}\,\tilde\rho}{2\gamma\delta}\Big( \|\Delta x^K\| + \|\Delta y^K\| + \|\Delta z^K\| \Big). \]
Hence, by the triangle inequality, for any $k \ge K$ it holds that
\[ |||u^k - u^\infty||| \le \sum_{p\ge k} |||\Delta u^{p+1}||| \le \sum_{p\ge k} \Big( \|\Delta x^{p+1}\| + \|\Delta y^{p+1}\| + \|\Delta z^{p+1}\| \Big) \le \frac{\sqrt{3}\,\gamma}{2\sigma\delta}\psi(\mathcal{E}_k) + \frac{\sqrt{3}\,\tilde\rho}{2\gamma\delta}\Big( \|\Delta x^k\| + \|\Delta y^k\| + \|\Delta z^k\| \Big). \]
Exploiting (58), the latter inequality leads to
\[ |||u^k - u^\infty||| \le \frac{\sqrt{3}\,\gamma}{2\sigma\delta}\psi(\mathcal{E}_k) + \frac{3\tilde\rho}{2\gamma\delta\sqrt{\sigma}}\sqrt{\mathcal{E}_{k-1}} \le C\max\{\psi(\mathcal{E}_k), \sqrt{\mathcal{E}_{k-1}}\}, \]
where $C = \max\Big\{ \frac{\sqrt{3}\,\gamma}{2\sigma\delta}, \frac{3\tilde\rho}{2\gamma\delta\sqrt{\sigma}} \Big\}$.
Now let $\theta \in [0,1)$ and $\psi(s) = s^{1-\theta}$, so that $\psi'(s) = (1-\theta)s^{-\theta}$. Since $\psi$ is increasing and $\mathcal{E}_k \le \mathcal{E}_{k-1}$, it follows that for all $k \ge K$
\[ |||u^k - u^\infty||| \le C\max\{\mathcal{E}_{k-1}^{1-\theta}, \sqrt{\mathcal{E}_{k-1}}\}. \tag{61} \]
Then (59) yields
\[ \mathcal{E}_k^{\theta} \le \mathrm{dist}(0,\partial\mathcal{R}_k), \quad \forall k \ge K. \]
This implies that $\mathcal{R}_k$ satisfies the Łojasiewicz inequality (48) at $v^\infty$ for all $k \ge K$ with $C_L = 1$.
(i) If $\theta = 0$, then $\mathcal{E}_k$ reaches 0 in a finite number of iterations by Theorem 3, and hence $u^k$ must converge to $u^\infty$ in a finite number of iterations.
(ii) If $\theta \in (0,1/2]$, then $\max\{\mathcal{E}_{k-1}^{1-\theta}, \sqrt{\mathcal{E}_{k-1}}\} = \sqrt{\mathcal{E}_{k-1}}$. By Theorem 3(ii),
\[ |||u^k - u^\infty||| \le \frac{\max\{\sqrt{\mathcal{E}_i} : 1 \le i \le K\}}{\sqrt{(1 + \bar\alpha\,\mathcal{E}_K^{2\theta-1})^{k-K+1}}}, \quad \forall k \ge K, \]
where $\bar\alpha = \sigma/(3\tilde\rho^2)$.
(iii) If $\theta \in (1/2,1)$, then $\max\{\mathcal{E}_{k}^{1-\theta}, \sqrt{\mathcal{E}_{k-1}}\} = \mathcal{E}_{k}^{1-\theta}$. By Theorem 3(iii) we have
\[ |||u^k - u^\infty||| \le \Big( \mu(k-K+1) + \mathcal{E}_K^{1-2\theta} \Big)^{-\frac{1-\theta}{2\theta-1}}. \]
This completes the proof. $\blacksquare$
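As a small illustration of the finite length and convergence statements, the sketch below runs a proximal linearized ADMM iteration with an over-relaxed multiplier step on a toy separable problem. Everything in it is an assumption made for illustration, not the paper's experiment: the problem ($f = \mu\|x\|_1$, $g = \tfrac12\|x-a\|^2$, $h = \tfrac12\|y-b\|^2$, constraint $x - y = 0$) is convex, the metrics are fixed scalar multiples of the identity ($Q_1 = q_1 I$, $Q_2 = q_2 I$), and the update formulas are one reading of algorithm (4), which is defined outside this section.

```python
import math


def soft(v, t):
    """Soft-thresholding: proximal map of t * |.| evaluated at v."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def toy_vmp_ladmm(a, b, mu=1.0, alpha=1.0, beta=1.2, q1=1.5, q2=1.5, iters=300):
    """Toy run for min mu*|x|_1 + 0.5|x-a|^2 + 0.5|y-b|^2 s.t. x - y = 0 (A = I, B = -I, c = 0)."""
    n = len(a)
    x, y, z = [0.0] * n, [0.0] * n, [0.0] * n
    path_length = 0.0  # running sum of |||u^{k+1} - u^k|||
    for _ in range(iters):
        step_sq = 0.0
        for i in range(n):  # the toy problem is separable, so coordinates update independently
            # x-step: g linearized at x[i], f kept exact, proximal weight q1
            v = alpha * y[i] - z[i] + q1 * x[i] - (x[i] - a[i])
            xi = soft(v / (alpha + q1), mu / (alpha + q1))
            # y-step: h linearized at y[i], proximal weight q2
            yi = (alpha * xi + z[i] + q2 * y[i] - (y[i] - b[i])) / (alpha + q2)
            # multiplier step with over-relaxation parameter beta in (0, 2)
            zi = z[i] + alpha * beta * (xi - yi)
            step_sq += (xi - x[i]) ** 2 + (yi - y[i]) ** 2 + (zi - z[i]) ** 2
            x[i], y[i], z[i] = xi, yi, zi
        path_length += math.sqrt(step_sq)
    return x, y, z, path_length


a = [3.0, -0.2, 1.0]
b = [1.0, 0.4, -4.0]
x_hat, y_hat, z_hat, length = toy_vmp_ladmm(a, b)
# Closed-form minimizer of the toy problem: soft-threshold (a + b) / 2 at level mu / 2.
x_star = [soft(0.5 * (ai + bi), 0.5) for ai, bi in zip(a, b)]
print(x_hat, x_star, length)
```

On this instance the iterates settle at the closed-form minimizer and the accumulated path length stays finite, which is the behavior Theorems 2 and 4 describe. The convex toy problem only exercises the mechanics and says nothing about the nonconvex regime.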
In this paper, we considered the variable metric proximal linearized ADMM method, algorithm (4), and established its convergence and convergence rate analysis. The algorithm solves a broad class of linearly constrained nonconvex and nonsmooth minimization problems of the form (1), in which the functions $f$, $g$, and $h$ satisfy the Łojasiewicz and Kurdyka-Łojasiewicz inequalities. Since most convex functions in finite-dimensional spaces are semi-algebraic or sub-analytic, or involve o-minimal structures (see [3] for details), they satisfy the KŁ property, and hence our theoretical results also apply in the convex setting.
References
[1]
B. Ames and M. Hong, Alternating directions method of multipliers for $\ell_1$-penalized zero variance discriminant analysis and principal component analysis, Comput. Optim. Appl., 64 (2016), http://dx.doi.org/10.1007/s10589-016-9828-y.
[2] H. Attouch and J. Bolte, On the convergence of the proximal algorithm for nonsmooth functions involving analytic features, Math. Program., 116 (2009), pp. 5–16.
[3] H. Attouch, J. Bolte, P. Redont, and A. Soubeyran, Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka–Łojasiewicz inequality, Math. Oper. Res., 35 (2010), pp. 438–457.
[4] H. Attouch, J. Bolte, and B. Svaiter, Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods, Math. Program. Ser. A, 137 (2013), pp. 91–129.
[5] D. Boley, Local linear convergence of the alternating direction method of multipliers on quadratic or linear programs, SIAM J. Optim, 23 (2013).
[6] J. Bolte, A. Daniilidis, and A. Lewis, The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems, SIAM J. Optim, 17 (2006), pp. 1205–1223.
[7] J. Bolte, A. Daniilidis, M. Ley, and L. Mazet, Characterizations of Łojasiewicz inequalities: Subgradient flows, talweg, convexity, Trans. Amer. Math. Soc, 362 (2010), pp. 3319–3363.
[8] J. Bolte, S. Sabach, and M. Teboulle, Proximal alternating linearized minimization for nonconvex and nonsmooth problems, Math. Program., 146 (2014), pp. 459–494.
[9] J. Bolte, A. Daniilidis, A. Lewis, and M. Shiota, Clarke subgradients of stratifiable functions, SIAM J. Optim, 18 (2007), pp. 556–572.
[10] R. Boţ and D. Nguyen, The proximal alternating direction method of multipliers in the nonconvex setting: Convergence analysis and rates, Math. Oper. Res., 45 (2020), pp. 682–712.
[11] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, Distributed optimization and statistical learning via the alternating direction method of multipliers, Foundations and Trends in Machine Learning, 3 (2010), pp. 1–122.
[12] C. Chen, R. Chan, S. Ma, and J. Yang, Inertial proximal ADMM for linearly constrained separable convex optimization, SIAM J. Imaging Sci., 8 (2015), pp. 2239–2267.
[13] C. Chen, B. He, X. Yuan, and Y. Ye, The direct extension of ADMM for multi-block convex minimization problems is not necessarily convergent, Math. Program., 155 (2016), pp. 57–79.
[14] Y. Chen, W. W. Hager, M. Yashtini, X. Ye, and H. Zhang, Bregman operator splitting with variable stepsize for Total Variation image reconstruction, Comput. Optim. Appl., 54 (2013), pp. 317–342.
[15] W. Deng and W. Yin, On the global and linear convergence of the generalized alternating direction method of multipliers, J. Sci. Comput., 66 (2015), pp. 889–916.
[16] J. Douglas and H. Rachford, On the numerical solution of the heat conduction problem in 2 and 3 space variables, Trans. Am. Math. Soc., 82 (1956).
[17] J. Eckstein and D. Bertsekas, On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators, Math. Program., 55 (1992), pp. 293–318.
[18] J. Eckstein and W. Yao, Understanding the convergence of the alternating direction method of multipliers: theoretical and computational perspectives, Pac. J. Optim, 11 (2015), pp. 619–644.
[19] P. A. Forero, A. Cano, and G. B. Giannakis, Distributed clustering using wireless sensor networks, IEEE J. Selected Topics Signal Process., 5 (2011).
[20] D. Gabay and B. Mercier, A dual algorithm for the solution of nonlinear variational problems via finite element approximations, Computers and Mathematics with Applications, 2 (1976), pp. 17–40.
[21] W. W. Hager, C. Ngo, M. Yashtini, and H. Zhang, Alternating direction approximate Newton (ADAN) algorithm for ill-conditioned inverse problems with application to parallel MRI, J. Oper. Res. Soc. China, 3 (2015), pp. 139–162.
[22] W. W. Hager, M. Yashtini, and H. Zhang, An O(1/k) convergence rate for the variable stepsize Bregman operator splitting algorithm, SIAM J. Numer. Anal., 53 (2016), pp. 1535–1556.
[23] D. Han and X. Yuan, A note on the alternating direction method of multipliers, J. Optim. Theory Appl., 155 (2012), pp. 227–238.
[24] B. He, M. Tao, and X. Yuan, Alternating direction method with Gaussian back substitution for separable convex programming, SIAM J. Optim, 22 (2012), pp. 313–340.
[25] M. Hong and Z. Luo, On the linear convergence of the alternating direction method of multipliers, Math. Program., 162 (2017), pp. 165–199.
[26] M. Hong, Z. Luo, and M. Razaviyayn, Convergence analysis of alternating direction method of multipliers for a family of nonconvex problems, SIAM J. Optim, 26 (2016), pp. 337–364.
[27] F. Lin, M. Fardad, and M. R. Jovanovic, Design of optimal sparse feedback gains via the alternating direction method of multipliers, IEEE Trans. Automat. Control, 58 (2013), pp. 2426–2431.
[28] T. Lin, S. Ma, and S. Zhang, On the global linear convergence of the ADMM with multiblock variables, SIAM J. Optim, 25 (2015), pp. 1478–1497.
[29] T. Lin, S. Ma, and S. Zhang, On the sublinear convergence rate of multi-block ADMM, J. Oper. Res. Soc. China, 3 (2015), pp. 251–274.
[30] T. Lin, S. Ma, and S. Zhang, Iteration complexity analysis of multi-block ADMM for a family of convex minimization without strong convexity, J. Sci. Comput., 69 (2016), pp. 52–81.
[31] P. L. Lions and B. Mercier, Splitting algorithms for the sum of two nonlinear operators, SIAM J. Numer. Anal., 16 (1979), pp. 964–979.
[32] Q. Liu, X. Shen, and Y. Gu, Linearized ADMM for nonconvex nonsmooth optimization with convergence analysis, IEEE Access, 7 (2019), pp. 76131–76144.
[33] S. Łojasiewicz, Une propriété topologique des sous-ensembles analytiques réels, in: Les Équations aux Dérivées Partielles, Éditions du Centre National de la Recherche Scientifique, Paris, 1963.
[34] J. G. Melo and R. D. Monteiro, Iteration-complexity of a linearized proximal multiblock ADMM class for linearly constrained nonconvex optimization problems, (2017).
[35] D. Peaceman and H. Rachford, The numerical solution of parabolic and elliptic differential equations, SIAM J. Appl. Math., 3 (1955), pp. 28–41.
[36] R. T. Rockafellar and R. Wets, Variational analysis, vol. 317, Grundlehren der Mathematischen Wissenschaften, Springer, Berlin, 1998.
[37] Y. Shen, Z. Wen, and Y. Zhang, Augmented Lagrangian alternating direction method for matrix separation based on low-rank factorization, Optim. Methods Softw., 29 (2014), pp. 239–263.
[38] W. Shi, Q. Ling, K. Yuan, G. Wu, and W. Yin, On the linear convergence of the ADMM in decentralized consensus optimization, IEEE Trans. Signal Process., 62 (2014), pp. 1750–1761.
[39] D. Sun, K. C. Toh, and L. Yang, A convergent 3-block semi-proximal alternating direction method of multipliers for conic programming with 4-type constraints, SIAM J. Optim, 25 (2015), pp. 882–915.
[40] A. Themelis and P. Patrinos, Douglas–Rachford splitting and ADMM for nonconvex optimization: Tight convergence results, SIAM J. Optim, 30 (2020), pp. 149–181.
[41] Y. Wang, W. Yin, and J. Zeng, Global convergence of ADMM in nonconvex nonsmooth optimization, J. Sci. Comput., 78 (2019), pp. 29–63.
[42] Z. Wen, C. Yang, X. Liu, and S. Marchesini, Alternating direction methods for classical and ptychographic phase retrieval, Inverse Problems, 28 (2012), pp. 1–18.
[43] C. Wu and X.-C. Tai, Augmented Lagrangian method, dual methods, and split Bregman iteration for ROF, vectorial TV, and high order models, SIAM J. Imaging Sci., 3 (2010), pp. 300–339.
[44] J. Yang, Y. Zhang, and W. Yin, A fast alternating direction method for TVL1-L2 signal reconstruction from partial Fourier data, IEEE J. Selected Topics Signal Process., 4 (2010), pp. 288–297.
[45] M. Yashtini, Euler's Elastica-based algorithm for parallel MRI reconstruction using Sensitivity Encoding, Optimization Letters, (2019), https://doi.org/10.1007/s11590-019-01451-8.
[46] M. Yashtini, W. W. Hager, Y. Chen, and X. Ye, Partially parallel MR image reconstruction using sensitivity encoding, in 2012 IEEE International Conference on Image Processing, Orlando, 2012, IEEE, pp. 2077–2080.
[47] M. Yashtini, S. Kang, and W. Zhu, Efficient alternating minimization methods for variational edge-weighted colorization models, Adv. Comput. Math., 45 (2019), pp. 1735–1767.
[48] M. Yashtini and S. H. Kang, Alternating direction method of multiplier for Euler's elastica-based denoising, SSVM 2015, LNCS 9087, (2015), pp. 690–701.
[49] M. Yashtini and S. H. Kang, A fast relaxed normal two split method and an effective weighted TV approach for Euler's Elastica image inpainting, SIAM J. Imaging Sci., 9 (2016), pp. 1552–1581.