An inertial alternating direction method of multipliers
Radu Ioan Boţ∗   Ernö Robert Csetnek†   August 11, 2018
Abstract.
In the context of convex optimization problems in Hilbert spaces, we induce inertial effects into the classical ADMM numerical scheme and obtain in this way so-called inertial ADMM algorithms, the convergence properties of which we investigate in detail. To this aim we make use of the inertial version of the Douglas–Rachford splitting method for monotone inclusion problems recently introduced in [12], in the context of concomitantly solving a convex minimization problem and its Fenchel dual. The convergence of both the sequences of generated iterates and of the objective function values is addressed. We also show how the obtained results can be extended to the treatment of convex minimization problems having as objective a finite sum of convex functions.
Key Words. inertial ADMM algorithm, inertial Douglas–Rachford splitting, maximally monotone operator, resolvent, subdifferential, convex optimization, Fenchel duality
AMS subject classification.
1 Introduction

One of the most popular algorithms in the literature for solving the convex optimization problem

inf_{x ∈ R^n} { f(x) + g(Ax) },   (1)

where f : R^n → R̄ and g : R^m → R̄ are proper, convex and lower semicontinuous functions and A is an m × n matrix with real entries, is the alternating direction method of multipliers (ADMM). We briefly describe this procedure. By introducing an auxiliary variable one can rewrite (1) as

inf_{(x,z) ∈ R^n × R^m, Ax − z = 0} { f(x) + g(z) }.   (2)

For γ ≥ 0, consider the augmented Lagrangian L_γ : R^n × R^m × R^m → R̄ defined by

L_γ(x, z, y) = f(x) + g(z) + y^T(Ax − z) + (γ/2)‖Ax − z‖²   ∀(x, z, y) ∈ R^n × R^m × R^m,

where the Euclidean norm on R^m is taken. The ADMM algorithm reads: for given y^0, z^0 ∈ R^m and every k ≥ 0,

x^{k+1} = argmin_{x ∈ R^n} L_γ(x, z^k, y^k)   (3)
z^{k+1} = argmin_{z ∈ R^m} L_γ(x^{k+1}, z, y^k)   (4)
y^{k+1} = y^k + γ(Ax^{k+1} − z^{k+1}).   (5)

The convergence of the ADMM algorithm is guaranteed by assuming that the matrix A has full column rank and the unaugmented Lagrangian L_0 has a saddle point (x̄, z̄, ȳ) ∈ R^n × R^m × R^m, that is,

L_0(x̄, z̄, y) ≤ L_0(x̄, z̄, ȳ) ≤ L_0(x, z, ȳ)   ∀(x, z, y) ∈ R^n × R^m × R^m.

∗ University of Vienna, Faculty of Mathematics, Oskar-Morgenstern-Platz 1, A-1090 Vienna, Austria, email: [email protected]. Research partially supported by DFG (German Research Foundation), project BO 2516/4-1.
† University of Vienna, Faculty of Mathematics, Oskar-Morgenstern-Platz 1, A-1090 Vienna, Austria, email: [email protected]. Research supported by DFG (German Research Foundation), project BO 2516/4-1.
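As a concrete illustration of the updates (3)-(5), the following self-contained sketch applies ADMM to the toy instance f(x) = (1/2)‖x − c‖², g(z) = μ‖z‖₁ of problem (1). The data A, c and the parameters γ, μ are hypothetical choices, not taken from the paper; the x-update (3) then amounts to a linear solve and the z-update (4) to soft-thresholding.

```python
import numpy as np

# Toy instance of (1): f(x) = 0.5*||x - c||^2, g(z) = mu*||z||_1 (illustrative data).
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
c = rng.standard_normal(3)
mu, gamma = 0.5, 1.0

x, z, y = np.zeros(3), np.zeros(4), np.zeros(4)
for _ in range(5000):
    # (3): x-update, minimizer of a strongly convex quadratic -> linear system
    x = np.linalg.solve(np.eye(3) + gamma * A.T @ A,
                        c - A.T @ y + gamma * A.T @ z)
    # (4): z-update, prox of mu*||.||_1 with step 1/gamma (soft-thresholding)
    w = A @ x + y / gamma
    z = np.sign(w) * np.maximum(np.abs(w) - mu / gamma, 0.0)
    # (5): dual ascent step on the multiplier
    y = y + gamma * (A @ x - z)

# At convergence Ax = z, and the optimality conditions
# x - c + A^T y = 0 and y in mu*d||.||_1(z) hold approximately.
print(np.linalg.norm(A @ x - z))
```

At the saddle point the multiplier y plays the role of the dual solution ȳ discussed below, which the final assertions on optimality reflect.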
Let us mention that if (x̄, z̄, ȳ) is a saddle point of L_0, then x̄ is an optimal solution to (1), z̄ = Ax̄, and ȳ is an optimal solution to the Fenchel dual problem of (1),

sup_{v ∈ R^m} { −f*(−A^T v) − g*(v) },   (6)

where A^T denotes the transpose of the matrix A and f* and g* the conjugate functions of f and g, respectively. One of the limitations of this algorithm is the presence of the term Ax in the update rule for x^{k+1}, which means that the scheme is not really a full splitting algorithm, unlike the primal-dual algorithms recently considered in [10, 11, 14]. Nevertheless, the algorithm has been successfully employed in the context of different real-life problems, like location problems, the lasso problem in image processing, problems arising in statistics, support vector machines classification, etc. We refer the reader to the seminal work [13] for the history of the ADMM algorithm and various concrete applications of it (see also [16, 17, 19, 20]).

In this paper we propose new ADMM-type numerical schemes, which have their roots in the class of so-called inertial proximal point algorithms. The latter iterative schemes are designed for solving monotone inclusion problems and, as they arise from the time discretization of some differential inclusions of second order type (see [1, 3]), have the property that the next iterate is defined by using the previous two iterates. In this way an inertial effect is induced into the numerical scheme, the increasing interest in this class of algorithms being emphasized by a considerable number of papers written in the last fifteen years on this topic, see [1–4, 9, 12, 15, 21–23].

We derive the inertial version of the ADMM from the perspective of monotone operator theory, using as starting point the fact pointed out in [20] that the classical ADMM can be approached from the Douglas–Rachford splitting scheme for monotone inclusion problems (see also [17]).
In [12] we recently introduced and studied the convergence properties of an inertial Douglas–Rachford splitting algorithm. By combining this iterative scheme with the techniques from [17, 20], we are able to obtain an inertial ADMM scheme for simultaneously solving convex minimization problems and their Fenchel-type duals. For the sake of generality, the analysis is carried out in infinite dimensional Hilbert spaces, in contrast to the usual literature on ADMM algorithms, where the finite dimensional setting is preferred. Moreover, we prove the convergence of both the sequences of generated iterates and of the objective function values, and show that the classical ADMM scheme can be recovered as a particular instance of our inertial ADMM algorithm. We also point out how other ADMM-type algorithms from the literature turn out to be particular cases of the new ones presented here.

The paper is organized as follows. In the next section we make the reader familiar with the notions and results which will be used throughout the manuscript. In Section 3 we introduce the inertial ADMM algorithm for simultaneously solving, in Hilbert spaces, the convex optimization problem which assumes the minimization of the sum of a proper, convex and lower semicontinuous function with the composition of another proper, convex and lower semicontinuous function with a linear continuous operator, together with its Fenchel dual problem, and we study its convergence properties. Finally, in the last section we treat the convex minimization problem having as objective the finite sum of proper, convex and lower semicontinuous functions and its Fenchel-type dual, and provide for this primal-dual pair inertial ADMM algorithms and corresponding convergence statements.

2 Preliminaries
For the reader's convenience let us recall some standard notions and results in monotone operator theory and convex analysis which will be used further in the paper, see also [5–7, 18, 26, 27]. Let N = {0, 1, 2, ...} be the set of nonnegative integers. Let H be a real Hilbert space with inner product ⟨·,·⟩ and associated norm ‖·‖ = √⟨·,·⟩. The symbols ⇀ and → denote weak and strong convergence, respectively. When G is another Hilbert space and L : H → G a linear continuous operator, then L* : G → H, defined by ⟨L*y, x⟩ = ⟨y, Lx⟩ for all (x, y) ∈ H × G, denotes the adjoint operator of L.

For an arbitrary set-valued operator A : H ⇒ H we denote by Gr A = {(x, u) ∈ H × H : u ∈ Ax} its graph and by A^{−1} : H ⇒ H its inverse operator, defined by (u, x) ∈ Gr A^{−1} if and only if (x, u) ∈ Gr A. We also use the notation zer A = {x ∈ H : 0 ∈ Ax} for the set of zeros of A. We say that A is monotone if ⟨x − y, u − v⟩ ≥ 0 for all (x, u), (y, v) ∈ Gr A. A monotone operator A is said to be maximally monotone if there exists no proper monotone extension of the graph of A on H × H. The resolvent of A, J_A : H ⇒ H, is defined by J_A = (Id_H + A)^{−1}, where Id_H : H → H, Id_H(x) = x for all x ∈ H, is the identity operator on H. Moreover, if A is maximally monotone, then J_A : H → H is single-valued and maximally monotone (see [5, Proposition 23.7 and Corollary 23.10]). For an arbitrary γ > 0 we have

p ∈ J_{γA}x if and only if (p, γ^{−1}(x − p)) ∈ Gr A.   (7)

The operator A is said to be uniformly monotone if there exists an increasing function φ_A : [0, +∞) → [0, +∞] that vanishes only at 0, such that ⟨x − y, u − v⟩ ≥ φ_A(‖x − y‖) for every (x, u) ∈ Gr A and (y, v) ∈ Gr A. A well-known class of operators fulfilling this property is the one of the strongly monotone operators. Let γ > 0. We say that A is γ-strongly monotone if ⟨x − y, u − v⟩ ≥ γ‖x − y‖² for all (x, u), (y, v) ∈ Gr A.

Let us recall now some elements of convex analysis.
For a function f : H → R̄, where R̄ := R ∪ {±∞} is the extended real line, we denote by dom f = {x ∈ H : f(x) < +∞} its effective domain and say that f is proper if dom f ≠ ∅ and f(x) > −∞ for all x ∈ H. We denote by Γ(H) the family of proper, convex and lower semicontinuous extended real-valued functions defined on H. Let f* : H → R̄, f*(u) = sup_{x ∈ H} {⟨u, x⟩ − f(x)} for all u ∈ H, be the conjugate function of f. The subdifferential of f at x ∈ H, with f(x) ∈ R, is the set ∂f(x) := {v ∈ H : f(y) ≥ f(x) + ⟨v, y − x⟩ ∀y ∈ H}. We take by convention ∂f(x) := ∅ if f(x) ∈ {±∞}. Notice that if f ∈ Γ(H), then ∂f is a maximally monotone operator (see [24]) and it holds (∂f)^{−1} = ∂f*.

Let S ⊆ H be a nonempty set. The indicator function of S, δ_S : H → R̄, is the function which takes the value 0 on S and +∞ otherwise. The subdifferential of the indicator function is the normal cone of S, that is, N_S(x) = {u ∈ H : ⟨u, y − x⟩ ≤ 0 ∀y ∈ S} if x ∈ S, and N_S(x) = ∅ for x ∉ S. Notice that, if S is a linear subspace, then N_S(x) = S^⊥ = {u ∈ H : ⟨y, u⟩ = 0 ∀y ∈ S} for all x ∈ S.

When f ∈ Γ(H) and γ >
0, for every x ∈ H we denote by prox_{γf}(x) the proximal point of parameter γ of f at x, which is the unique optimal solution of the optimization problem

inf_{y ∈ H} { f(y) + (1/(2γ))‖y − x‖² }.   (8)

Notice that the resolvent of the maximally monotone operator ∂f is nothing else than the proximal point operator of f, namely,

J_{γ∂f} = (Id_H + γ∂f)^{−1} = prox_{γf}.   (9)

Moreover, if f = δ_S, where S ⊆ H is a nonempty, closed and convex set, then the proximal point operator of f is the orthogonal projection onto S.

Let us also recall that a proper function f : H → R̄ is said to be uniformly convex if there exists an increasing function φ : [0, +∞) → [0, +∞] which vanishes only at 0 and such that

f(tx + (1 − t)y) + t(1 − t)φ(‖x − y‖) ≤ tf(x) + (1 − t)f(y)   ∀x, y ∈ dom f and ∀t ∈ (0, 1).
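For intuition, the proximal points in (8)-(9) can be checked numerically. The sketch below implements prox_{γf} for the illustrative choices f = ‖·‖₁ (soft-thresholding) and f = δ_S with S a box (orthogonal projection), neither of which is taken from the text, and verifies on random perturbations that the computed point indeed minimizes (8).

```python
import numpy as np

def prox_l1(x, gamma):
    # prox_{gamma*||.||_1}(x): componentwise soft-thresholding at level gamma
    return np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)

def prox_box(x, lo, hi):
    # f = indicator of S = [lo, hi]^n: the prox is the orthogonal projection onto S
    return np.clip(x, lo, hi)

rng = np.random.default_rng(1)
x = rng.standard_normal(5)
gamma = 0.7

p = prox_l1(x, gamma)
obj = lambda y: np.sum(np.abs(y)) + np.sum((y - x) ** 2) / (2 * gamma)
# p is the unique minimizer of (8): any perturbation increases the objective
for _ in range(100):
    assert obj(p) <= obj(p + 0.1 * rng.standard_normal(5)) + 1e-12
```

The same perturbation test applied to prox_box would verify the projection interpretation of (9) for indicator functions.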
In case this inequality holds for φ = (β/2)(·)², where β >
0, then f is said to be β-strongly convex. Let us mention that this property implies β-strong monotonicity of ∂f (see [5, Example 22.3]) (more generally, if f is uniformly convex, then ∂f is uniformly monotone, see [5, Example 22.3]).

We close this section by presenting the inertial Douglas–Rachford splitting algorithm for determining the zeros of the sum of two maximally monotone operators recently obtained in [12], which will be crucial for the proofs of the main results in the next section.

Theorem 1 (Inertial Douglas–Rachford splitting algorithm, see [12]) Let
A, B : H ⇒ H be maximally monotone operators such that zer(A + B) ≠ ∅. Consider the following iterative scheme: for every k ≥ 1,

y^k = J_{γB}[w^k + α_k(w^k − w^{k−1})]
v^k = J_{γA}[2y^k − w^k − α_k(w^k − w^{k−1})]
w^{k+1} = w^k + α_k(w^k − w^{k−1}) + λ_k(v^k − y^k),

where γ > 0, w^0, w^1 are arbitrarily chosen in H, (α_k)_{k≥1} is nondecreasing with α_1 = 0 and 0 ≤ α_k ≤ α < 1 for every k ≥ 1, and λ, σ, δ > 0 are such that

δ > (α²(1 + α) + ασ)/(1 − α²) and 0 < λ ≤ λ_k ≤ 2(δ − α[α(1 + α) + αδ + σ]) / (δ[1 + α(1 + α) + αδ + σ])   ∀k ≥ 1.   (10)

Then there exists x̄ ∈ H such that the following statements are true:

(i) J_{γB}x̄ ∈ zer(A + B);
(ii) Σ_{k ∈ N} ‖w^{k+1} − w^k‖² < +∞;
(iii) (w^k)_{k ∈ N} converges weakly to x̄;
(iv) y^k − v^k → 0 as k → +∞;
(v) (y^k)_{k≥1} converges weakly to J_{γB}x̄;
(vi) (v^k)_{k≥1} converges weakly to J_{γB}x̄;
(vii) if A or B is uniformly monotone, then (y^k)_{k≥1} and (v^k)_{k≥1} converge strongly to the unique point in zer(A + B).

Remark 2
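To see the scheme of Theorem 1 at work, the sketch below runs it for A = ∂f and B = ∂g with the quadratics f = ½‖· − a‖² and g = ½‖· − b‖², whose resolvents are affine and for which zer(A + B) = {(a + b)/2}. The data a, b and the constant choices α_k ≡ 0.1 (with α_1 = 0) and λ_k ≡ 1 are illustrative assumptions; in practice (α_k) and (λ_k) must be chosen so that (10) holds.

```python
import numpy as np

a, b = np.array([1.0, -2.0]), np.array([3.0, 4.0])
gamma, alpha, lam = 1.0, 0.1, 1.0

def J_gA(x):  # resolvent of gamma*A, A = subdifferential of 0.5*||. - a||^2
    return (x + gamma * a) / (1 + gamma)

def J_gB(x):  # resolvent of gamma*B, B = subdifferential of 0.5*||. - b||^2
    return (x + gamma * b) / (1 + gamma)

w_prev = np.zeros(2)     # w^0
w = np.ones(2)           # w^1, arbitrary
for k in range(1, 2000):
    a_k = 0.0 if k == 1 else alpha      # alpha_1 = 0, as required
    m = w + a_k * (w - w_prev)          # inertial extrapolation
    y = J_gB(m)
    v = J_gA(2 * y - m)
    w_prev, w = w, m + lam * (v - y)

# Statement (v): y^k converges to J_gB(w_bar), the point of zer(A+B) = {(a+b)/2}
print(y)
```

For these two quadratics the iteration is an affine contraction, so the weak convergence of Theorem 1 is observed here as fast norm convergence.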
According to [12], the condition α_1 = 0 can be replaced with the assumption w^0 = w^1 without altering the conclusion of the above theorem.

Remark 3
Let us mention that in the hypotheses of the above theorem we have

0 < 2(δ − α[α(1 + α) + αδ + σ]) / (δ[1 + α(1 + α) + αδ + σ]) < 2.

Conversely, for a fixed α ∈ (0, 1) and a fixed λ ∈ (0, 2), one can choose σ > 0 small enough and δ ∈ {δ₁, δ₂} such that

2(δ − α[α(1 + α) + αδ + σ]) / (δ[1 + α(1 + α) + αδ + σ]) = λ,

where δ₁ and δ₂ denote the two solutions of the quadratic equation in δ obtained by setting the above expression equal to λ.

3 The inertial ADMM algorithm

In this section we present the main result of the paper, which consists in the formulation of an inertial ADMM algorithm for a primal-dual pair of convex optimization problems and in the investigation of its convergence properties. We start by describing the setting in which we work.
Problem 4
Let H and G be real Hilbert spaces, f ∈ Γ(H), g ∈ Γ(G) and L : H → G a linear continuous operator. We aim to solve the convex optimization problem

(P)  inf_{x ∈ H} { f(x) + g(Lx) }   (11)

together with its Fenchel-type dual problem

(D)  sup_{v ∈ G} { −f*(−L*v) − g*(v) }.   (12)

Denoting by v(P) and v(D) the optimal objective values of the two problems, respectively, the situation v(P) ≥ v(D), called in the literature weak duality, always holds. In case a regularity condition is fulfilled, one can guarantee equality of the optimal objective values and the existence of optimal solutions to the dual. For the reader's convenience, we discuss some regularity conditions which are suitable in this context. One of the weakest regularity conditions of interiority-type is the Attouch–Brézis condition, which reads

0 ∈ sqri(dom g − L(dom f)).   (13)

Here, for S ⊆ G a convex set, we denote by

sqri S := {x ∈ S : ∪_{λ>0} λ(S − x) is a closed linear subspace of G}

its strong quasi-relative interior. Notice that we always have int S ⊆ sqri S (in general this inclusion may be strict). If G is finite-dimensional, then sqri S coincides with ri S, the relative interior of S, which is the interior of S with respect to its affine hull. In this case, condition (13) holds if there exists x′ ∈ ri(dom f) such that Lx′ ∈ ri(dom g). Considering again the infinite dimensional setting, we remark that condition (13) is fulfilled if, for example, g is continuous at Lx′ for some x′ ∈ dom f ∩ L^{−1}(dom g). Let us mention that, if (13) holds, then we have strong duality, which means that v(P) = v(D) and (D) has an optimal solution.

Moreover, the optimality conditions for the primal-dual pair of optimization problems (11)-(12) read

−L*v̄ ∈ ∂f(x̄) and v̄ ∈ ∂g(Lx̄).   (14)

More precisely, if (P) has an optimal solution x̄ ∈ H and the regularity condition (13) is fulfilled, then there exists v̄ ∈ G, an optimal solution to (D), such that (14) holds.
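For a quick numerical illustration of strong duality and of the optimality conditions (14), take H = R³, G = R², f(x) = ½‖x‖² and g(y) = ½‖y − b‖²; since g is continuous everywhere, condition (13) holds. Both (P) and (D) then have closed-form solutions, and the sketch below (with hypothetical data A, b, the matrix A standing in for L) verifies v(P) = v(D) and (14).

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((2, 3))   # plays the role of the operator L
b = rng.standard_normal(2)

# (P): min 0.5*||x||^2 + 0.5*||Ax - b||^2  ->  x_bar = (I + A^T A)^{-1} A^T b
x_bar = np.linalg.solve(np.eye(3) + A.T @ A, A.T @ b)
vP = 0.5 * x_bar @ x_bar + 0.5 * np.sum((A @ x_bar - b) ** 2)

# (D): max -f*(-A^T v) - g*(v), with f*(u) = 0.5*||u||^2 and
# g*(v) = 0.5*||v||^2 + <v, b>  ->  v_bar = -(I + A A^T)^{-1} b
v_bar = -np.linalg.solve(np.eye(2) + A @ A.T, b)
vD = -0.5 * np.sum((A.T @ v_bar) ** 2) - (0.5 * v_bar @ v_bar + v_bar @ b)

print(vP - vD)                      # strong duality: v(P) = v(D)
print(x_bar + A.T @ v_bar)          # (14): -A^T v_bar = grad f(x_bar) = x_bar
print(v_bar - (A @ x_bar - b))      # (14): v_bar = grad g(A x_bar)
```

All three printed quantities vanish up to rounding, confirming that the pair (x̄, v̄) satisfies (14).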
Conversely, if the pair (x̄, v̄) ∈ H × G satisfies the optimality conditions (14), then x̄ is an optimal solution to (P) and v̄ is an optimal solution to (D). For further considerations concerning duality we invite the reader to consult [5–8, 18, 26, 27].

Let us mention some conditions ensuring that (P) has an optimal solution. Suppose that (P) is feasible, which means that its optimal objective value is not identically +∞. The existence of optimal solutions to (P) is guaranteed if, for instance, f is coercive (that is, lim_{‖x‖→∞} f(x) = +∞) and g is bounded from below. Indeed, under these circumstances, the objective function of (P) is coercive and the statement follows via [5, Corollary 11.15]. On the other hand, when f is strongly convex, then the objective function of (P) is strongly convex, too, thus (P) has a unique optimal solution (see [5, Corollary 11.16]).

Let us introduce now the inertial ADMM algorithm.

Algorithm 5
Choose y^0, y^1, z_2^0, z_2^1 ∈ G, γ > 0, (α_k)_{k≥1} nondecreasing with 0 ≤ α_k ≤ α < 1 for every k ≥ 1, and (λ_k)_{k≥1} and λ, σ, δ > 0 such that

δ > (α²(1 + α) + ασ)/(1 − α²) and 0 < λ ≤ λ_k ≤ 2(δ − α[α(1 + α) + αδ + σ]) / (δ[1 + α(1 + α) + αδ + σ])   ∀k ≥ 1.

Suppose that either α_2 = 0 or λ_1 = α_1 = 0. Further, for all k ≥ 1 set

x^{k+1} = argmin_{x ∈ H} { f(x) + ⟨y^k − α_k(y^k − y^{k−1}) − γα_k(z_2^k − z_2^{k−1}), Lx⟩ + (γ/2)‖Lx − z_2^k‖² }   (15)

z_1^{k+1} = α_{k+1}[λ_k(Lx^{k+1} − z_2^k) + ((1 − λ_k)α_k/γ)(y^k − y^{k−1} + γ(z_2^k − z_2^{k−1}))]   (16)

z_2^{k+1} = argmin_{z ∈ G} { g(z + z_1^{k+1}) + ⟨−y^k − (1 − λ_k)α_k(y^k − y^{k−1} + γ(z_2^k − z_2^{k−1})), z⟩ + (γ/2)‖z − λ_k Lx^{k+1} − (1 − λ_k)z_2^k‖² }   (17)

y^{k+1} = y^k + γ(λ_k Lx^{k+1} + (1 − λ_k)z_2^k − z_2^{k+1}) + (1 − λ_k)α_k(y^k − y^{k−1} + γ(z_2^k − z_2^{k−1})).   (18)

Remark 6
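In the case H = G and L = Id_H the two argmin steps of Algorithm 5 reduce to proximal steps (as made explicit in Remark 7), and the scheme can be run directly. The sketch below does so for the illustrative instance f = μ‖·‖₁, g = ½‖· − b‖², whose solution is known in closed form (soft-thresholding of b); the data and the parameter values α_k, λ_k are hypothetical choices, assumed to satisfy the stepsize conditions above.

```python
import numpy as np

b = np.array([1.0, -0.2, 0.5])
mu, gam, lam = 0.3, 1.0, 1.0

def prox_f(u):   # prox of gamma^{-1} f, f = mu*||.||_1
    return np.sign(u) * np.maximum(np.abs(u) - mu / gam, 0.0)

def prox_g(u):   # prox of gamma^{-1} g, g = 0.5*||. - b||^2
    return (b + gam * u) / (1 + gam)

n = b.size
y_prev, y = np.zeros(n), np.zeros(n)       # y^0, y^1
z2_prev, z2 = np.zeros(n), np.zeros(n)     # z_2^0, z_2^1
alpha = lambda k: 0.0 if k <= 2 else 0.05  # alpha_1 = alpha_2 = 0
for k in range(1, 3000):
    a_k, a_next = alpha(k), alpha(k + 1)
    d = y - y_prev + gam * (z2 - z2_prev)  # inertial difference
    # (15), written as a prox step since L = Id:
    x = prox_f(z2 - y / gam + (a_k / gam) * (y - y_prev) + a_k * (z2 - z2_prev))
    # (16):
    z1 = a_next * (lam * (x - z2) + (1 - lam) * (a_k / gam) * d)
    # (17), written as a prox step:
    z2_new = -z1 + prox_g(z1 + lam * x + (1 - lam) * z2 + y / gam
                          + ((1 - lam) * a_k / gam) * d)
    # (18):
    y_new = y + gam * (lam * x + (1 - lam) * z2 - z2_new) + (1 - lam) * a_k * d
    y_prev, y, z2_prev, z2 = y, y_new, z2, z2_new

# The minimizer of mu*||x||_1 + 0.5*||x - b||^2 is soft-thresholding of b
print(x)   # approx. [0.7, 0.0, 0.2]
```

Setting alpha to the zero function recovers the classical ADMM iteration of Remark 8 for this instance.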
In order to ensure that the sequence (x^k)_{k≥2} is uniquely determined we assume that the operator L satisfies the hypothesis

(H)  ∃θ > 0 such that ‖Lx‖ ≥ θ‖x‖ for all x ∈ H.   (19)

This condition guarantees that the objective function in (15) is strongly convex, hence (x^k)_{k≥2} is well defined (see [5, Corollary 11.16]). Let us mention that (H) will be used also in the proof of the convergence statements of the algorithm. Notice that if L is injective and ran L* is closed, then (H) holds (see [5, Fact 2.19]). Moreover, (H) implies that L is injective. We conclude that in case ran L* is closed, (H) is equivalent to L being injective. In finite dimensional spaces, namely, if H = R^n and G = R^m, with m ≥ n ≥
1, hypothesis (H) is nothing else than saying that L has full column rank, which is a condition widely used in the literature for proving the convergence of the ADMM algorithm.

Remark 7
Notice that the objective function of (17) is strongly convex, hence the sequence (z_2^k)_{k∈N} is well defined as well. Moreover, it can be expressed with the help of the proximal point operator of g: for every k ≥ 1,

z_2^{k+1} = −z_1^{k+1} + prox_{γ^{−1}g}( z_1^{k+1} + λ_k Lx^{k+1} + (1 − λ_k)z_2^k + (1/γ)y^k + ((1 − λ_k)α_k/γ)(y^k − y^{k−1} + γ(z_2^k − z_2^{k−1})) ).

In general, relation (15) cannot be solved via the proximal point operator of f, due to the presence of the operator L in the x-argument. Nevertheless, in case H = G and L is the identity operator on H, relation (15) can be expressed via the proximal point operator of f: for every k ≥ 1,

x^{k+1} = prox_{γ^{−1}f}( z_2^k − (1/γ)y^k + (α_k/γ)(y^k − y^{k−1}) + α_k(z_2^k − z_2^{k−1}) ).

Remark 8
Let us consider the case α_k = 0 for all k ≥ 1. Then the iterative scheme becomes: for every k ≥ 1,

x^{k+1} = argmin_{x ∈ H} { f(x) + ⟨y^k, Lx⟩ + (γ/2)‖Lx − z^k‖² }   (20)
z^{k+1} = argmin_{z ∈ G} { g(z) + ⟨−y^k, z⟩ + (γ/2)‖z − λ_k Lx^{k+1} − (1 − λ_k)z^k‖² }   (21)
y^{k+1} = y^k + γ(λ_k Lx^{k+1} + (1 − λ_k)z^k − z^{k+1}),   (22)

which is the error-free case of the classical ADMM algorithm as presented and investigated in [17]. Here (λ_k)_{k≥1} can be regarded as a sequence of relaxation parameters. If one takes further λ_k = 1 for all k ≥
1, one obtains the classical ADMM algorithm (see for example [13]):

x^{k+1} = argmin_{x ∈ H} { f(x) + ⟨y^k, Lx⟩ + (γ/2)‖Lx − z^k‖² }   (23)
z^{k+1} = argmin_{z ∈ G} { g(z) + ⟨−y^k, z⟩ + (γ/2)‖z − Lx^{k+1}‖² }   (24)
y^{k+1} = y^k + γ(Lx^{k+1} − z^{k+1}).   (25)

We are now in position to state the main result of the paper.

Theorem 9
In Problem 4 suppose that (P) has an optimal solution, the regularity condition (13) is fulfilled and the hypothesis (H) concerning the operator L holds, and consider the sequences generated by Algorithm 5. Then there exists (x̄, v̄) ∈ H × G satisfying the optimality conditions (14), hence x̄ is an optimal solution to (P), v̄ is an optimal solution to (D) and v(P) = v(D), such that the following statements are true:

(i) (x^k)_{k≥2} converges weakly to x̄;
(ii) (z_1^k)_{k≥2} converges strongly to 0;
(iii) (z_2^k)_{k∈N} converges weakly to Lx̄;
(iv) (Lx^{k+1} − z_2^k)_{k≥1} converges strongly to 0;
(v) (y^k)_{k∈N} converges weakly to v̄;
(vi) if g* is uniformly convex, then (y^k)_{k∈N} converges strongly to the unique optimal solution of (D);
(vii) lim_{k→+∞} (f(x^{k+1}) + g(z_1^k + z_2^k)) = v(P) = v(D) = lim_{k→+∞} (−f*(−L*v^k) − g*(y^k)), where the sequence (v^k)_{k≥1} is defined by

v^k = y^k − γz_2^k + γLx^{k+1} − α_k(y^k − y^{k−1} + γ(z_2^k − z_2^{k−1}))   ∀k ≥ 1,   (26)

and (v^k)_{k≥1} converges weakly to v̄.

Remark 10 Let us mention that the function g* is uniformly convex if g* is β-strongly convex for some β > 0. Further, g* is β-strongly convex if and only if g is Fréchet differentiable and ∇g is β^{−1}-Lipschitz continuous.

Proof.
We introduce the sequence (w^k)_{k∈N} defined by

w^k = y^k + γz_2^k   ∀k ∈ N.   (27)

We intend to prove that the sequences (y^k)_{k∈N}, (v^k)_{k≥1} and (w^k)_{k∈N} are nothing else than the ones generated by the inertial Douglas–Rachford algorithm presented in Theorem 1 for the maximally monotone operators

A := ∂(f* ∘ (−L*)) and B := ∂g*.   (28)

Notice that the hypotheses of the theorem ensure that there exists a pair (x̄, v̄) ∈ H × G satisfying the optimality conditions (14), from which one easily derives that zer(A + B) ≠ ∅.

We fix k ≥
1. We obtain from (17) that

0 ∈ ∂g(z_1^{k+1} + z_2^{k+1}) − y^k − (1 − λ_k)α_k(y^k − y^{k−1} + γ(z_2^k − z_2^{k−1})) + γ(z_2^{k+1} − λ_k Lx^{k+1} − (1 − λ_k)z_2^k),

hence, due to (18),

y^{k+1} ∈ ∂g(z_1^{k+1} + z_2^{k+1}).   (29)

From here we deduce z_1^{k+1} + z_2^{k+1} ∈ ∂g*(y^{k+1}) = By^{k+1}, hence

y^{k+1} = J_{γB}(y^{k+1} + γz_1^{k+1} + γz_2^{k+1}) = J_{γB}(w^{k+1} + γz_1^{k+1}).   (30)

By (18) we have

y^{k+1} = y^k + γz_2^k − γz_2^{k+1} + γλ_k(Lx^{k+1} − z_2^k) + (1 − λ_k)α_k(y^k − y^{k−1} + γ(z_2^k − z_2^{k−1})),   (31)

thus, in view of (16),

γz_1^{k+1} = α_{k+1}(y^{k+1} − y^k − γz_2^k + γz_2^{k+1}) = α_{k+1}(w^{k+1} − w^k).   (32)

From (30) and (32) we obtain

y^{k+1} = J_{γB}[w^{k+1} + α_{k+1}(w^{k+1} − w^k)].   (33)

Further, from (15) we get

0 ∈ ∂f(x^{k+1}) + L*(y^k − α_k(y^k − y^{k−1}) − γα_k(z_2^k − z_2^{k−1})) + γL*(Lx^{k+1} − z_2^k),

which by (26) gives

−L*v^k ∈ ∂f(x^{k+1}).   (34)

We derive x^{k+1} ∈ ∂f*(−L*v^k), hence −Lx^{k+1} ∈ −L∂f*(−L*v^k) ⊆ ∂(f* ∘ (−L*))(v^k) = Av^k, which leads to

v^k = J_{γA}(v^k − γLx^{k+1}).   (35)

Taking into account (27) and (26) we have

v^k − γLx^{k+1} = y^k − γz_2^k − α_k(y^k − y^{k−1} + γ(z_2^k − z_2^{k−1})) = 2y^k − w^k − α_k(w^k − w^{k−1}),

and from (35) we get

v^k = J_{γA}[2y^k − w^k − α_k(w^k − w^{k−1})].   (36)

Finally, from (31), (27) and (26) we derive

w^{k+1} = w^k + α_k(y^k − y^{k−1} + γ(z_2^k − z_2^{k−1})) + λ_k(γ(Lx^{k+1} − z_2^k) − α_k(y^k − y^{k−1} + γ(z_2^k − z_2^{k−1}))) = w^k + α_k(w^k − w^{k−1}) + λ_k(v^k − y^k),

hence

w^{k+1} = w^k + α_k(w^k − w^{k−1}) + λ_k(v^k − y^k).
(37)

In conclusion, for all k ≥ 1,

y^k = J_{γB}[w^k + α_k(w^k − w^{k−1})]
v^k = J_{γA}[2y^k − w^k − α_k(w^k − w^{k−1})]
w^{k+1} = w^k + α_k(w^k − w^{k−1}) + λ_k(v^k − y^k),

which is the inertial Douglas–Rachford scheme from Theorem 1. Notice that the relation α_2 = 0 from Algorithm 5 corresponds to the situation when in Theorem 1 the vectors w^1, w^2 can be chosen arbitrarily in G, while the condition λ_1 = α_1 = 0 ensures that w^1 = w^2, which is the situation mentioned in Remark 2. Indeed, in case α_2 ≠ 0 and λ_1 = α_1 = 0, from (16) we get z_1^2 = 0, hence, by (32), w^2 = w^1.

According to Theorem 1, there exists w̄ ∈ G such that

w^k ⇀ w̄ as k → +∞   (38)
w^{k+1} − w^k → 0 as k → +∞   (39)
y^k − v^k → 0 as k → +∞   (40)
y^k ⇀ J_{γB}w̄ as k → +∞   (41)
v^k ⇀ J_{γB}w̄ as k → +∞.   (42)

From (32) and (39) we derive that

z_1^k → 0 as k → +∞.   (43)

Further, by (26), (39) and (40) we obtain

Lx^{k+1} − z_2^k → 0 as k → +∞.   (44)

Moreover, from (27), (38) and (41) we get

z_2^k ⇀ (1/γ)(w̄ − J_{γB}w̄) as k → +∞.   (45)

We deduce from (44) that

Lx^k ⇀ (1/γ)(w̄ − J_{γB}w̄) as k → +∞.   (46)

Now, using the hypothesis (H), we easily derive that (x^k)_{k≥2} is bounded, thus, due to (46), it possesses at most one weak cluster point. As a consequence, (x^k)_{k≥2} is weakly convergent (see [5, Lemma 2.38]), hence there exists x̄ ∈ H such that

x^k ⇀ x̄ as k → +∞.   (47)

From (44), (45) and (47) we also have

z_2^k ⇀ Lx̄ as k → +∞   (48)

and

Lx̄ = (1/γ)(w̄ − J_{γB}w̄).   (49)

Using the notation v̄ = J_{γB}w̄, we prove that the pair (x̄, v̄) ∈ H × G satisfies the optimality conditions (14). To this end, observe that, due to (34) and (29), we have

(−L*v^{k+1} + L*y^{k+1}, z_1^{k+1} + z_2^{k+1} − Lx^{k+2}) ∈ (∂f × B + S)(x^{k+2}, y^{k+1})   ∀k ≥ 1,   (50)

where S : H × G → H × G is defined by S(x, y) = (L*y, −Lx) for all (x, y) ∈ H × G. Since S is monotone and continuous, it is maximally monotone (see [5, Corollary 20.25]).
Further, ∂f × B is also maximally monotone (see [5, Proposition 20.23]) and, since S has full domain, the sum ∂f × B + S is maximally monotone, too (see [5, Corollary 24.4]). Since the graph of a maximally monotone operator is sequentially closed in the weak-strong topology (see [5, Proposition 20.33(ii)]), by taking the limit in (50) and using (40), (44), (43), (47) and (41) we obtain

(0, 0) ∈ (∂f × B + S)(x̄, v̄).

One can easily show that the latter means that the pair (x̄, v̄) satisfies the optimality conditions (14). The statements (i)-(v) follow now from (47), (43), (48), (44) and (41). Further, (vi) follows from Theorem 1(vii).

We are going to prove now statement (vii). Notice that f and g are weakly lower semicontinuous (since f and g are convex and lower semicontinuous) and therefore, by (i), (ii) and (iii), we get

liminf_{k→+∞} (f(x^{k+1}) + g(z_1^k + z_2^k)) ≥ liminf_{k→+∞} f(x^{k+1}) + liminf_{k→+∞} g(z_1^k + z_2^k) ≥ f(x̄) + g(Lx̄) = v(P).   (51)

Further, from (34) we derive the inequality

f(x̄) ≥ f(x^{k+1}) + ⟨−L*v^k, x̄ − x^{k+1}⟩   ∀k ≥ 1,   (52)

while from (29) we obtain

g(Lx̄) ≥ g(z_1^k + z_2^k) + ⟨y^k, Lx̄ − z_1^k − z_2^k⟩   ∀k ≥ 2.   (53)

Summing up the last two inequalities we get

v(P) ≥ f(x^{k+1}) + g(z_1^k + z_2^k) + ⟨−v^k, Lx̄ − Lx^{k+1}⟩ + ⟨y^k, Lx̄ − z_1^k − z_2^k⟩   ∀k ≥ 2,

hence

f(x^{k+1}) + g(z_1^k + z_2^k) ≤ v(P) + ⟨v^k − y^k, Lx̄ − Lx^{k+1}⟩ + ⟨y^k, −Lx^{k+1} + z_1^k + z_2^k⟩   ∀k ≥ 2.

Taking into account (40), (ii), (iii), (iv) and (v) we obtain

limsup_{k→+∞} (f(x^{k+1}) + g(z_1^k + z_2^k)) ≤ v(P).   (54)

Combining (51) and (54) we get the first part of the statement. Again by (34) and (29) we have (see [5, Proposition 16.9])

f(x^{k+1}) + f*(−L*v^k) = ⟨x^{k+1}, −L*v^k⟩   ∀k ≥ 1   (55)

and

g(z_1^k + z_2^k) + g*(y^k) = ⟨y^k, z_1^k + z_2^k⟩   ∀k ≥ 2.   (56)

Adding these relations we derive, for every k ≥ 2,

−f*(−L*v^k) − g*(y^k) = f(x^{k+1}) + g(z_1^k + z_2^k) + ⟨v^k − y^k, Lx^{k+1}⟩ + ⟨y^k, Lx^{k+1} − z_1^k − z_2^k⟩.
Finally, by (40), (ii), (iii), (iv), (v) and the first part of (vii) we obtain

lim_{k→+∞} (−f*(−L*v^k) − g*(y^k)) = v(P) = v(D),

and the proof is complete. ∎

Remark 11
When working in finite dimensional spaces, there is no need for the construction considered in (50), since in this case one can simply take the limits in (34) and (29) in order to conclude that (x̄, v̄) satisfies the optimality conditions (14). In infinite dimensional spaces this naive procedure does not work anymore, since in (34) and (29) we have only weak convergence for the sequences involved (we refer to [5, Example 20.34] for an example of a maximally monotone operator whose graph is not sequentially closed in the weak-weak topology).
Remark 12
Let us notice that the conclusion of Theorem 9(vi) remains true if the uniform convexity of g* is replaced by the assumptions that f* is β-strongly convex, with β > 0, and that

(H*)  ∃θ* > 0 such that ‖L*v‖ ≥ θ*‖v‖ for all v ∈ G.   (57)

Indeed, under these conditions one can prove that the composition f* ∘ (−L*) is βθ*²-strongly convex, hence the operator A (see (28)) is strongly monotone and the conclusion follows from Theorem 1(vii).

4 Inertial ADMM algorithms for the minimization of finite sums of convex functions

The aim of this section is to derive from Theorem 9, via the product space approach, iterative schemes and corresponding convergence statements for solving the optimization problem which assumes the minimization of a finite sum of proper, convex and lower semicontinuous functions, together with its Fenchel-type dual. The goal is to evaluate each of the functions arising in the objective separately in the algorithmic scheme.

Problem 13
Let H be a real Hilbert space, m ≥ 2 and f_i ∈ Γ(H) for i = 1, ..., m. We aim to solve the convex optimization problem

(P_P)  inf_{x ∈ H} Σ_{i=1}^m f_i(x)   (58)

together with its Fenchel-type dual problem

(D_P)  sup_{v_i ∈ H, i=1,...,m, Σ_{i=1}^m v_i = 0} ( −Σ_{i=1}^m f_i*(v_i) ).   (59)

One of the regularity conditions which guarantees strong duality in this situation is (see [7]):

0 ∈ sqri( Π_{i=1}^m dom f_i − {(x, ..., x) : x ∈ H} ).   (60)

According to [7, Remark 2.5], this condition is fulfilled if there exists x′ ∈ ∩_{i=1}^m dom f_i such that m − 1 of the functions f_i are continuous at x′. In finite dimensional spaces, condition (60) holds if ∩_{i=1}^m ri(dom f_i) ≠ ∅. Also, let us mention that in case m = 2, the regularity condition (60) is equivalent to 0 ∈ sqri(dom f_1 − dom f_2) (see [7, Remark 2.5]).

The optimality conditions for the primal-dual pair of optimization problems (58)-(59) read

v̄_i ∈ ∂f_i(x̄), i = 1, ..., m, and Σ_{i=1}^m v̄_i = 0.   (61)

More precisely, if (P_P) has an optimal solution x̄ ∈ H and the regularity condition (60) is fulfilled, then there exists (v̄_1, ..., v̄_m) ∈ H^m, an optimal solution to (D_P), such that (61) holds. Conversely, if (x̄, v̄_1, ..., v̄_m) ∈ H × H^m satisfies relation (61), then x̄ is an optimal solution to (P_P) and (v̄_1, ..., v̄_m) is an optimal solution to (D_P).

Let us mention some conditions ensuring that (P_P) has an optimal solution. Suppose that (P_P) is feasible, which means that its optimal objective value is not identically +∞. The existence of optimal solutions to (P_P) is guaranteed if, for instance, one of the functions f_i is coercive and the remaining ones are bounded from below. Indeed, under these circumstances, the objective function of (P_P) is coercive and the statement follows via [5, Corollary 11.15].
On the other hand, if one of the functions f_i is strongly convex, then the objective function of (P_P) is strongly convex, too, thus (P_P) has a unique optimal solution (see [5, Corollary 11.16]).

We derive in the following two inertial ADMM algorithms for solving (58)-(59). To this end we reformulate Problem 13 as Problem 4 in the product space H^m endowed with the inner product and associated norm defined by

⟨x, u⟩_{H^m} = Σ_{i=1}^m ⟨x_i, u_i⟩_H and ‖x‖_{H^m} = ( Σ_{i=1}^m ‖x_i‖_H² )^{1/2}

for x = (x_i)_{1≤i≤m}, u = (u_i)_{1≤i≤m} ∈ H^m, respectively, where ⟨·,·⟩_H and ‖·‖_H denote the inner product and norm on H, respectively.

By using the notation C = {(x, ..., x) : x ∈ H}, one can easily rewrite (58) as

inf_{(x_1,...,x_m) ∈ H^m} { f(x_1, ..., x_m) + δ_C(x_1, ..., x_m) },   (62)

where f : H^m → R̄ is defined by f(x_1, ..., x_m) = Σ_{i=1}^m f_i(x_i) for all (x_1, ..., x_m) ∈ H^m. This corresponds to the optimization problem (11) with g = δ_C : H^m → R̄ and L the identity operator on H^m. Notice that the Fenchel dual problem (12) of (62) becomes (D_P), the regularity condition (13) is equivalent to (60) and the optimality conditions (14) are nothing else than the ones in (61). Moreover, (x̄_1, ..., x̄_m) ∈ H^m is an optimal solution to (62) if and only if x̄_i = x̄, i = 1, ..., m, where x̄ ∈ H is an optimal solution to (P_P), while (v̄_1, ..., v̄_m) ∈ H^m is an optimal solution to the dual of (62) if and only if (v̄_1, ..., v̄_m) is an optimal solution to (D_P).
This shows that we are in the context of Problem 4. Writing Algorithm 5 in this setting we get for every k ≥ 1:

x^{k+1} = argmin_{x ∈ H^m} { f(x) + ⟨y^k − α_k(y^k − y^{k−1}) − γα_k(z_2^k − z_2^{k−1}), x⟩_{H^m} + (γ/2)‖x − z_2^k‖²_{H^m} }   (63)

z_1^{k+1} = α_{k+1}[λ_k(x^{k+1} − z_2^k) + ((1 − λ_k)α_k/γ)(y^k − y^{k−1} + γ(z_2^k − z_2^{k−1}))]   (64)

z_2^{k+1} = argmin_{z ∈ H^m} { g(z + z_1^{k+1}) + ⟨−y^k − (1 − λ_k)α_k(y^k − y^{k−1} + γ(z_2^k − z_2^{k−1})), z⟩_{H^m} + (γ/2)‖z − λ_k x^{k+1} − (1 − λ_k)z_2^k‖²_{H^m} }   (65)

y^{k+1} = y^k + γ(λ_k x^{k+1} + (1 − λ_k)z_2^k − z_2^{k+1}) + (1 − λ_k)α_k(y^k − y^{k−1} + γ(z_2^k − z_2^{k−1})),   (66)

where x^{k+1} = (x_i^{k+1})_{1≤i≤m}, z_1^{k+1} = (z_{1,i}^{k+1})_{1≤i≤m}, z_2^{k+1} = (z_{2,i}^{k+1})_{1≤i≤m}, y^{k+1} = (y_i^{k+1})_{1≤i≤m}, and similarly x = (x_i)_{1≤i≤m} and z = (z_i)_{1≤i≤m} for generic elements of H^m.

We give in the following an explicit form of this algorithm. Due to the definition of f, relation (63) is nothing else than

x_i^{k+1} = argmin_{x ∈ H} { f_i(x) + ⟨y_i^k − α_k(y_i^k − y_i^{k−1}) − γα_k(z_{2,i}^k − z_{2,i}^{k−1}), x⟩ + (γ/2)‖x − z_{2,i}^k‖² }, i = 1, ..., m.   (67)

Further, from (65) and (29) we derive z_1^{k+1} + z_2^{k+1} ∈ C and y^{k+1} ∈ C^⊥, hence there exists a sequence (u^k)_{k≥2} in H such that for every k ≥ 1

z_{1,i}^{k+1} + z_{2,i}^{k+1} = u^{k+1}, i = 1, ..., m,   (68)

and

Σ_{i=1}^m y_i^{k+1} = 0.   (69)

If we suppose that Σ_{i=1}^m y_i^k = 0 for every k ≥
0, then from (66) we derive
\[
\sum_{i=1}^m z_i^{k+1} = \lambda_k \sum_{i=1}^m x_i^{k+1} + (1 - \lambda_k) \sum_{i=1}^m z_i^k + (1 - \lambda_k) \alpha_k \sum_{i=1}^m (z_i^k - z_i^{k-1}) \quad \forall k \ge 1. \tag{70}
\]
From this, (68) and (64), we get
\[
u^{k+1} = \frac{\lambda_k (1 + \alpha_{k+1})}{m} \sum_{i=1}^m x_i^{k+1} + \frac{1 - \alpha_{k+1} \lambda_k - \lambda_k}{m} \sum_{i=1}^m z_i^k + \frac{\alpha_k (1 - \lambda_k)(1 + \alpha_{k+1})}{m} \sum_{i=1}^m (z_i^k - z_i^{k-1}) \quad \forall k \ge 1. \tag{71}
\]
Conversely, if for a fixed k \ge 1 one has \sum_{i=1}^m y_i^{k-1} = \sum_{i=1}^m y_i^k = 0, then from (66), (68) and (71) we have \sum_{i=1}^m y_i^{k+1} = 0. All together, we derive the following algorithm and corresponding convergence theorem (notice that for the statement in (vii) we use also Remark 12).

Algorithm 14
Choose y_i^0, y_i^1, z_i^0, z_i^1 \in \mathcal{H}, i = 1, ..., m, such that \sum_{i=1}^m y_i^0 = \sum_{i=1}^m y_i^1 = 0, \gamma > 0, (\alpha_k)_{k \ge 1} nondecreasing with 0 \le \alpha_k \le \alpha < 1 for every k \ge 1, and (\lambda_k)_{k \ge 1} and \lambda, \sigma, \delta > 0 such that
\[
\delta > \frac{\alpha^2 (1 + \alpha) + \alpha \sigma}{1 - \alpha^2}
\]
and
\[
0 < \lambda \le \lambda_k \le 2 \cdot \frac{\delta - \alpha \left[ \alpha (1 + \alpha) + \alpha \delta + \sigma \right]}{\delta \left[ 1 + \alpha (1 + \alpha) + \alpha \delta + \sigma \right]} \quad \forall k \ge 1.
\]
Further, for every k \ge 1 set
\[
x_i^{k+1} = \operatorname*{argmin}_{x \in \mathcal{H}} \left\{ f_i(x) + \left\langle y_i^k - \alpha_k (y_i^k - y_i^{k-1}) - \gamma \alpha_k (z_i^k - z_i^{k-1}), x \right\rangle + \frac{\gamma}{2} \|x - z_i^k\|^2 \right\}, \quad i = 1, ..., m \tag{72}
\]
\[
\tilde z_i^{k+1} = \alpha_{k+1} \lambda_k (x_i^{k+1} - z_i^k) + (1 - \lambda_k) \frac{\alpha_k \alpha_{k+1}}{\gamma} \left( y_i^k - y_i^{k-1} + \gamma (z_i^k - z_i^{k-1}) \right), \quad i = 1, ..., m \tag{73}
\]
\[
u^{k+1} = \frac{\lambda_k (1 + \alpha_{k+1})}{m} \sum_{i=1}^m x_i^{k+1} + \frac{1 - \alpha_{k+1} \lambda_k - \lambda_k}{m} \sum_{i=1}^m z_i^k + \frac{\alpha_k (1 - \lambda_k)(1 + \alpha_{k+1})}{m} \sum_{i=1}^m (z_i^k - z_i^{k-1}) \tag{74}
\]
\[
z_i^{k+1} = u^{k+1} - \tilde z_i^{k+1}, \quad i = 1, ..., m \tag{75}
\]
\[
y_i^{k+1} = y_i^k + \gamma \left( \lambda_k x_i^{k+1} + (1 - \lambda_k) z_i^k - z_i^{k+1} \right) + (1 - \lambda_k) \alpha_k \left( y_i^k - y_i^{k-1} + \gamma (z_i^k - z_i^{k-1}) \right), \quad i = 1, ..., m. \tag{76}
\]
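To make the updates (72)-(76) concrete, here is a minimal numerical sketch in finite dimension. It is not the authors' implementation: it assumes \mathcal{H} = \mathbb{R}, quadratic functions f_i(x) = \frac{1}{2}(x - a_i)^2 (so that the argmin in (72) has a closed form), constant parameters \alpha_k \equiv \alpha and \lambda_k \equiv \lambda, and zero initial points (which satisfy the condition \sum_i y_i^0 = \sum_i y_i^1 = 0). The function name and the parameter values are illustrative choices only.

```python
import numpy as np

def inertial_admm(a, gamma=1.0, alpha=0.05, lam=0.9, iters=2000):
    """Sketch of the iteration (72)-(76) for the toy problem
    min_x sum_i 0.5*(x - a_i)**2, whose unique solution is the mean
    of the a_i.  For these f_i the argmin in (72) has the closed form
    x = (a_i - c_i + gamma*z_i)/(1 + gamma), where c_i is the linear
    coefficient.  alpha_{k+1} = alpha_k = alpha is kept constant, so
    both inertial factors coincide."""
    a = np.asarray(a, dtype=float)
    m = a.size
    y_prev = np.zeros(m); y = np.zeros(m)   # y^{k-1}, y^k
    z_prev = np.zeros(m); z = np.zeros(m)   # z^{k-1}, z^k
    for _ in range(iters):
        # inertial term shared by (73), (76)
        d = y - y_prev + gamma * (z - z_prev)
        # (72): proximal step on each f_i (closed form for quadratics)
        c = y - alpha * (y - y_prev) - gamma * alpha * (z - z_prev)
        x = (a - c + gamma * z) / (1.0 + gamma)
        # (73): auxiliary variable z tilde
        zt = alpha * lam * (x - z) + (1 - lam) * alpha * alpha / gamma * d
        # (74): the common point u^{k+1}
        u = (lam * (1 + alpha) / m) * x.sum() \
            + ((1 - alpha * lam - lam) / m) * z.sum() \
            + (alpha * (1 - lam) * (1 + alpha) / m) * (z - z_prev).sum()
        # (75)-(76): new z and dual update
        z_new = u - zt
        y_new = y + gamma * (lam * x + (1 - lam) * z - z_new) \
            + (1 - lam) * alpha * d
        y_prev, y = y, y_new
        z_prev, z = z, z_new
    return x
```

For f_i of this form all components x_i^k approach the mean of the a_i; with alpha = 0 and lam = 1 the loop reduces to the classical scheme (81)-(82).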
Theorem 15
In Problem 13 suppose that (P_P) has an optimal solution, the regularity condition (60) is fulfilled and consider the sequences generated by Algorithm 14. Then there exists (x, v_1, ..., v_m) \in \mathcal{H} \times \mathcal{H}^m satisfying the optimality conditions (61), hence x is an optimal solution to (P_P), (v_1, ..., v_m) is an optimal solution to (D_P) and v(P_P) = v(D_P), such that the following statements are true:
(i) (x_i^k)_{k \ge 2} converges weakly to x, i = 1, ..., m;
(ii) (\tilde z_i^k)_{k \ge 2} converges strongly to 0, i = 1, ..., m;
(iii) (z_i^k)_{k \in \mathbb{N}} converges weakly to x, i = 1, ..., m;
(iv) (x_i^{k+1} - z_i^k)_{k \ge 1} converges strongly to 0, i = 1, ..., m;
(v) (u^k)_{k \ge 2} converges weakly to x;
(vi) (y_i^k)_{k \in \mathbb{N}} converges weakly to v_i, i = 1, ..., m;
(vii) if f_i^* is strongly convex for every i = 1, ..., m, then ((y_1^k)_{k \in \mathbb{N}}, ..., (y_m^k)_{k \in \mathbb{N}}) converges strongly to the unique optimal solution to (D_P);
(viii) \lim_{k \to +\infty} \left( \sum_{i=1}^m f_i(x_i^{k+1}) \right) = v(P_P) = v(D_P) = \lim_{k \to +\infty} \left( -\sum_{i=1}^m f_i^*(-v_i^k) \right), where for every i = 1, ..., m the sequence (v_i^k)_{k \ge 1} is defined by
\[
v_i^k = y_i^k - \gamma z_i^k + \gamma x_i^{k+1} - \alpha_k \left( y_i^k - y_i^{k-1} + \gamma (z_i^k - z_i^{k-1}) \right) \quad \forall k \ge 1, \tag{77}
\]
and (v_i^k)_{k \ge 1} converges weakly to v_i.

Remark 16
If we take \lambda_k = 1 for every k \ge 1, then \tilde z_i^{k+1} = \alpha_{k+1}(x_i^{k+1} - z_i^k), i = 1, ..., m, and (see also relation (70))
\[
u^{k+1} = \frac{1 + \alpha_{k+1}}{m} \sum_{i=1}^m x_i^{k+1} - \frac{\alpha_{k+1}}{m} \sum_{i=1}^m x_i^k,
\]
hence the iterative scheme (72)-(76) can be simplified to
\[
x_i^{k+1} = \operatorname*{argmin}_{x \in \mathcal{H}} \left\{ f_i(x) + \left\langle y_i^k - \alpha_k (y_i^k - y_i^{k-1}) - \gamma \alpha_k (z_i^k - z_i^{k-1}), x \right\rangle + \frac{\gamma}{2} \|x - z_i^k\|^2 \right\}, \quad i = 1, ..., m \tag{78}
\]
\[
z_i^{k+1} = \frac{1 + \alpha_{k+1}}{m} \sum_{j=1}^m x_j^{k+1} - \frac{\alpha_{k+1}}{m} \sum_{j=1}^m x_j^k - \alpha_{k+1} \left( x_i^{k+1} - z_i^k \right), \quad i = 1, ..., m \tag{79}
\]
\[
y_i^{k+1} = y_i^k + \gamma \left( x_i^{k+1} - z_i^{k+1} \right), \quad i = 1, ..., m. \tag{80}
\]
If, moreover, \alpha_k = 0 for every k \ge
1, then (78)-(80) becomes
\[
x_i^{k+1} = \operatorname*{argmin}_{x \in \mathcal{H}} \left\{ f_i(x) + \langle y_i^k, x \rangle + \frac{\gamma}{2} \left\| x - \frac{1}{m} \sum_{j=1}^m x_j^k \right\|^2 \right\}, \quad i = 1, ..., m \tag{81}
\]
\[
y_i^{k+1} = y_i^k + \gamma \left( x_i^{k+1} - \frac{1}{m} \sum_{j=1}^m x_j^{k+1} \right), \quad i = 1, ..., m, \tag{82}
\]
which is the ADMM algorithm as considered in [13, page 50].

By interchanging the roles of f and g in (62) we obtain another inertial ADMM-type algorithm with a corresponding convergence statement.
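The classical scheme (81)-(82) is short enough to sketch directly. As above, this is a toy illustration, not code from [13]: it assumes \mathcal{H} = \mathbb{R} and f_i(x) = \frac{1}{2}(x - a_i)^2, for which the argmin in (81) has a closed form; the function name and parameters are our own choices.

```python
import numpy as np

def consensus_admm(a, gamma=1.0, iters=300):
    """Sketch of the classical scheme (81)-(82) for the toy problem
    min_x sum_i 0.5*(x - a_i)**2 (solution: the mean of the a_i).
    The argmin in (81) has the closed form
    x_i = (a_i - y_i + gamma*xbar)/(1 + gamma)."""
    a = np.asarray(a, dtype=float)
    m = a.size
    y = np.zeros(m)   # multipliers; sum_i y_i = 0 is preserved by (82)
    x = np.zeros(m)
    for _ in range(iters):
        xbar = x.mean()                           # (1/m) sum_j x_j^k
        x = (a - y + gamma * xbar) / (1 + gamma)  # (81)
        y = y + gamma * (x - x.mean())            # (82)
    return x
```

For general f_i one would replace the closed-form x-update by a numerical evaluation of the proximal step.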
Algorithm 17
Choose y_i^0, y_i^1, z_i^0, z_i^1 \in \mathcal{H}, i = 1, ..., m, \gamma > 0, (\alpha_k)_{k \ge 1} nondecreasing with 0 \le \alpha_k \le \alpha < 1 for every k \ge 1, and (\lambda_k)_{k \ge 1} and \lambda, \sigma, \delta > 0 such that
\[
\delta > \frac{\alpha^2 (1 + \alpha) + \alpha \sigma}{1 - \alpha^2}
\]
and
\[
0 < \lambda \le \lambda_k \le 2 \cdot \frac{\delta - \alpha \left[ \alpha (1 + \alpha) + \alpha \delta + \sigma \right]}{\delta \left[ 1 + \alpha (1 + \alpha) + \alpha \delta + \sigma \right]} \quad \forall k \ge 1.
\]
Further, for every k \ge 1 set
\[
x^{k+1} = \frac{1}{m} \sum_{i=1}^m z_i^k - \frac{1}{m\gamma} \sum_{i=1}^m y_i^k + \frac{\alpha_k}{m\gamma} \sum_{i=1}^m \left( y_i^k - y_i^{k-1} + \gamma (z_i^k - z_i^{k-1}) \right) \tag{83}
\]
\[
\tilde z_i^{k+1} = \alpha_{k+1} \lambda_k (x^{k+1} - z_i^k) + (1 - \lambda_k) \frac{\alpha_k \alpha_{k+1}}{\gamma} \left( y_i^k - y_i^{k-1} + \gamma (z_i^k - z_i^{k-1}) \right), \quad i = 1, ..., m \tag{84}
\]
\[
z_i^{k+1} = \operatorname*{argmin}_{z \in \mathcal{H}} \left\{ f_i(z + \tilde z_i^{k+1}) + \left\langle -y_i^k - (1 - \lambda_k) \alpha_k \left( y_i^k - y_i^{k-1} + \gamma (z_i^k - z_i^{k-1}) \right), z \right\rangle + \frac{\gamma}{2} \left\| z - \lambda_k x^{k+1} - (1 - \lambda_k) z_i^k \right\|^2 \right\}, \quad i = 1, ..., m \tag{85-86}
\]
\[
y_i^{k+1} = y_i^k + \gamma \left( \lambda_k x^{k+1} + (1 - \lambda_k) z_i^k - z_i^{k+1} \right) + (1 - \lambda_k) \alpha_k \left( y_i^k - y_i^{k-1} + \gamma (z_i^k - z_i^{k-1}) \right), \quad i = 1, ..., m. \tag{87}
\]

Theorem 18
In Problem 13 suppose that (P_P) has an optimal solution, the regularity condition (60) is fulfilled and consider the sequences generated by Algorithm 17. Then there exists (x, v_1, ..., v_m) \in \mathcal{H} \times \mathcal{H}^m satisfying the optimality conditions (61), hence x is an optimal solution to (P_P), (v_1, ..., v_m) is an optimal solution to (D_P) and v(P_P) = v(D_P), such that the following statements are true:
(i) (x^k)_{k \ge 2} converges weakly to x;
(ii) (\tilde z_i^k)_{k \ge 2} converges strongly to 0, i = 1, ..., m;
(iii) (z_i^k)_{k \in \mathbb{N}} converges weakly to x, i = 1, ..., m;
(iv) (x^{k+1} - z_i^k)_{k \ge 1} converges strongly to 0, i = 1, ..., m;
(v) (y_i^k)_{k \in \mathbb{N}} converges weakly to v_i, i = 1, ..., m;
(vi) if f_i^* is strongly convex for every i = 1, ..., m, then ((y_1^k)_{k \in \mathbb{N}}, ..., (y_m^k)_{k \in \mathbb{N}}) converges strongly to the unique optimal solution to (D_P);
(vii) \lim_{k \to +\infty} \left( \sum_{i=1}^m f_i(z_i^k + \tilde z_i^k) \right) = v(P_P) = v(D_P) = \lim_{k \to +\infty} \left( -\sum_{i=1}^m f_i^*(-y_i^k) \right).
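Analogously to the sketch given for Algorithm 14, the scheme (83)-(87) can be illustrated on the same toy problem. Again this is only an illustrative sketch under our own assumptions: \mathcal{H} = \mathbb{R}, f_i(x) = \frac{1}{2}(x - a_i)^2 (so the argmin in (85)-(86) has a closed form), constant \alpha_k \equiv \alpha and \lambda_k \equiv \lambda, and zero initial points.

```python
import numpy as np

def inertial_admm_swapped(a, gamma=1.0, alpha=0.05, lam=1.0, iters=2000):
    """Sketch of (83)-(87) for f_i(x) = 0.5*(x - a_i)**2.  The argmin
    in (85)-(86) then has the closed form
    z = (a_i - zt_i + c_i + gamma*(lam*x + (1-lam)*z_i))/(1 + gamma),
    where c_i is the linear coefficient.  alpha_{k+1} = alpha_k = alpha
    is kept constant."""
    a = np.asarray(a, dtype=float)
    m = a.size
    y_prev = np.zeros(m); y = np.zeros(m)   # y^{k-1}, y^k
    z_prev = np.zeros(m); z = np.zeros(m)   # z^{k-1}, z^k
    for _ in range(iters):
        # inertial term shared by (83), (84), (87)
        d = y - y_prev + gamma * (z - z_prev)
        # (83): explicit x-update (the averaging step comes first here)
        x = z.mean() - y.sum() / (m * gamma) + alpha * d.sum() / (m * gamma)
        # (84)
        zt = alpha * lam * (x - z) + (1 - lam) * alpha * alpha / gamma * d
        # (85)-(86): proximal step on each f_i (closed form for quadratics)
        c = y + (1 - lam) * alpha * d
        z_new = (a - zt + c + gamma * (lam * x + (1 - lam) * z)) / (1 + gamma)
        # (87)
        y_new = y + gamma * (lam * x + (1 - lam) * z - z_new) \
            + (1 - lam) * alpha * d
        y_prev, y = y, y_new
        z_prev, z = z, z_new
    return x
```

In contrast to Algorithm 14, the averaging step (83) is performed before the proximal steps on the f_i, reflecting the interchanged roles of f and g.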
Remark 19
Notice that relation (83) is derived from v^k \in C^\perp (see (34)), where for every i = 1, ..., m the sequence (v_i^k)_{k \ge 1} is defined by
\[
v_i^k = y_i^k - \gamma z_i^k + \gamma x^{k+1} - \alpha_k \left( y_i^k - y_i^{k-1} + \gamma (z_i^k - z_i^{k-1}) \right) \quad \forall k \ge 1,
\]
and (v_i^k)_{k \ge 1} converges weakly to v_i.

References

[1] F. Alvarez, On the minimizing property of a second order dissipative system in Hilbert spaces, SIAM Journal on Control and Optimization 38(4), 1102-1119, 2000
[2] F. Alvarez, Weak convergence of a relaxed and inertial hybrid projection-proximal point algorithm for maximal monotone operators in Hilbert space, SIAM Journal on Optimization 14(3), 773-782, 2004
[3] F. Alvarez, H. Attouch, An inertial proximal method for maximal monotone operators via discretization of a nonlinear oscillator with damping, Set-Valued Analysis 9, 3-11, 2001
[4] H. Attouch, J. Peypouquet, P. Redont, A dynamical approach to an inertial forward-backward algorithm for convex minimization, SIAM Journal on Optimization 24(1), 232-256, 2014
[5] H.H. Bauschke, P.L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces, CMS Books in Mathematics, Springer, New York, 2011
[6] J.M. Borwein, J.D. Vanderwerff, Convex Functions: Constructions, Characterizations and Counterexamples, Cambridge University Press, Cambridge, 2010
[7] R.I. Boţ, Conjugate Duality in Convex Optimization, Lecture Notes in Economics and Mathematical Systems, Vol. 637, Springer, Berlin Heidelberg, 2010
[8] R.I. Boţ, E.R. Csetnek, Regularity conditions via generalized interiority notions in convex optimization: new achievements and their relation to some classical statements, Optimization 61(1), 35-65, 2012
[9] R.I. Boţ, E.R. Csetnek, An inertial forward-backward-forward primal-dual splitting algorithm for solving monotone inclusion problems, arXiv:1402.5291, 2014
[10] R.I. Boţ, E.R. Csetnek, A. Heinrich, A primal-dual splitting algorithm for finding zeros of sums of maximally monotone operators, SIAM Journal on Optimization 23(4), 2011-2036, 2013
[11] R.I. Boţ, E.R. Csetnek, A. Heinrich, C. Hendrich, On the convergence rate improvement of a primal-dual splitting algorithm for solving monotone inclusion problems, Mathematical Programming, DOI 10.1007/s10107-014-0766-0
[12] R.I. Boţ, E.R. Csetnek, C. Hendrich, Inertial Douglas-Rachford splitting for monotone inclusion problems, arXiv:1403.3330v2, 2014
[13] S. Boyd, N. Parikh, E. Chu, B. Peleato, J. Eckstein, Distributed optimization and statistical learning via the alternating direction method of multipliers, Foundations and Trends in Machine Learning 3(1), 1-122, 2010
[14] L.M. Briceño-Arias, P.L. Combettes, A monotone + skew splitting model for composite monotone inclusions in duality, SIAM Journal on Optimization 21(4), 1230-1250, 2011
[15] A. Cabot, P. Frankel, Asymptotics for some proximal-like method involving inertia and memory aspects, Set-Valued and Variational Analysis 19, 59-74, 2011
[16] J. Eckstein, Augmented Lagrangian and alternating direction methods for convex optimization: a tutorial and some illustrative computational results, Rutcor Research Report 32-2012, 2012
[17] J. Eckstein, D.P. Bertsekas, On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators, Mathematical Programming 55, 293-318, 1992
[18] I. Ekeland, R. Temam, Convex Analysis and Variational Problems, North-Holland Publishing Company, Amsterdam, 1976
[19] E. Esser, Applications of Lagrangian-based alternating direction methods and connections to split Bregman, CAM Reports 09-31, UCLA, Center for Applied Mathematics, 2009
[20] D. Gabay, Applications of the method of multipliers to variational inequalities, in: M. Fortin, R. Glowinski (eds.), Augmented Lagrangian Methods: Applications to the Numerical Solution of Boundary-Value Problems, North-Holland, Amsterdam, 1983
[21] P.-E. Maingé, Convergence theorems for inertial KM-type algorithms, Journal of Computational and Applied Mathematics 219, 223-236, 2008
[22] P.-E. Maingé, A. Moudafi, Convergence of new inertial proximal methods for dc programming, SIAM Journal on Optimization 19(1), 397-413, 2008
[23] A. Moudafi, M. Oliny, Convergence of a splitting inertial proximal method for monotone operators, Journal of Computational and Applied Mathematics 155, 447-454, 2003
[24] R.T. Rockafellar, On the maximal monotonicity of subdifferential mappings, Pacific Journal of Mathematics 33(1), 209-216, 1970
[25] R.T. Rockafellar, Monotone operators and the proximal point algorithm, SIAM Journal on Control and Optimization 14(5), 877-898, 1976
[26] S. Simons, From Hahn-Banach to Monotonicity, Springer, Berlin, 2008
[27] C. Zălinescu, Convex Analysis in General Vector Spaces, World Scientific, Singapore, 2002