An inertial alternating direction method of multipliers
Radu Ioan Boţ∗   Ernö Robert Csetnek†   August 11, 2018
Abstract.
In the context of convex optimization problems in Hilbert spaces, we induce inertial effects into the classical ADMM numerical scheme and obtain in this way so-called inertial ADMM algorithms, the convergence properties of which we investigate in detail. To this aim we make use of the inertial version of the Douglas–Rachford splitting method for monotone inclusion problems recently introduced in [12], in the context of concomitantly solving a convex minimization problem and its Fenchel dual. The convergence of both the sequences of generated iterates and of the objective function values is addressed. We also show how the obtained results can be extended to the treatment of convex minimization problems having as objective a finite sum of convex functions.
Key Words. inertial ADMM algorithm, inertial Douglas–Rachford splitting, maximally monotone operator, resolvent, subdifferential, convex optimization, Fenchel duality
AMS subject classification.
1 Introduction

One of the most popular algorithms in the literature for solving the convex optimization problem

inf_{x ∈ R^n} { f(x) + g(Ax) },   (1)

where f : R^n → R̄ and g : R^m → R̄ are proper, convex and lower semicontinuous functions and A is an m × n matrix with real entries, is the alternating direction method of multipliers (ADMM). We briefly describe this procedure. By introducing an auxiliary variable one can rewrite (1) as

inf_{(x,z) ∈ R^n × R^m, Ax − z = 0} { f(x) + g(z) }.   (2)

For γ ≥ 0, consider the augmented Lagrangian L_γ : R^n × R^m × R^m → R̄ defined by

L_γ(x, z, y) = f(x) + g(z) + y^T(Ax − z) + (γ/2)‖Ax − z‖²   ∀(x, z, y) ∈ R^n × R^m × R^m,

where the Euclidean norm on R^m is taken. The ADMM algorithm reads: for given y^0, z^0 ∈ R^m and every k ≥ 0,

x^{k+1} = argmin_{x ∈ R^n} L_γ(x, z^k, y^k)   (3)
z^{k+1} = argmin_{z ∈ R^m} L_γ(x^{k+1}, z, y^k)   (4)
y^{k+1} = y^k + γ(Ax^{k+1} − z^{k+1}).   (5)

The convergence of the ADMM algorithm is guaranteed by assuming that the matrix A has full column rank and the unaugmented Lagrangian L_0 has a saddle point (x̄, z̄, ȳ) ∈ R^n × R^m × R^m, that is,

L_0(x̄, z̄, y) ≤ L_0(x̄, z̄, ȳ) ≤ L_0(x, z, ȳ)   ∀(x, z, y) ∈ R^n × R^m × R^m.

∗ University of Vienna, Faculty of Mathematics, Oskar-Morgenstern-Platz 1, A-1090 Vienna, Austria, email: [email protected]. Research partially supported by DFG (German Research Foundation), project BO 2516/4-1.
† University of Vienna, Faculty of Mathematics, Oskar-Morgenstern-Platz 1, A-1090 Vienna, Austria, email: [email protected]. Research supported by DFG (German Research Foundation), project BO 2516/4-1.
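As a concrete illustration of the updates (3)-(5), the following self-contained sketch applies ADMM to the toy instance f(x) = (1/2)‖x − c‖², g(z) = μ‖z‖₁ of problem (1). The data A, c and the parameters γ, μ are hypothetical choices, not taken from the paper; the x-update (3) then amounts to a linear solve and the z-update (4) to soft-thresholding.

```python
import numpy as np

# Toy instance of (1): f(x) = 0.5*||x - c||^2, g(z) = mu*||z||_1 (illustrative data).
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
c = rng.standard_normal(3)
mu, gamma = 0.5, 1.0

x, z, y = np.zeros(3), np.zeros(4), np.zeros(4)
for _ in range(5000):
    # (3): x-update, minimizer of a strongly convex quadratic -> linear system
    x = np.linalg.solve(np.eye(3) + gamma * A.T @ A,
                        c - A.T @ y + gamma * A.T @ z)
    # (4): z-update, prox of mu*||.||_1 with step 1/gamma (soft-thresholding)
    w = A @ x + y / gamma
    z = np.sign(w) * np.maximum(np.abs(w) - mu / gamma, 0.0)
    # (5): dual ascent step on the multiplier
    y = y + gamma * (A @ x - z)

# At convergence Ax = z, and the optimality conditions
# x - c + A^T y = 0 and y in mu*d||.||_1(z) hold approximately.
print(np.linalg.norm(A @ x - z))
```

At the saddle point the multiplier y plays the role of the dual solution ȳ discussed below, which the final assertions on optimality reflect.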
Let us mention that if (x̄, z̄, ȳ) is a saddle point of L_0, then x̄ is an optimal solution to (1), z̄ = Ax̄, and ȳ is an optimal solution to the Fenchel dual problem of (1),

sup_{v ∈ R^m} { −f*(−A^T v) − g*(v) },   (6)

where A^T denotes the transpose of the matrix A and f* and g* the conjugate functions of f and g, respectively. One of the limitations of this algorithm is the presence of the term Ax in the update rule for x^{k+1}, which means that the scheme is not really a full splitting algorithm, unlike the primal-dual algorithms recently considered in [10, 11, 14]. Nevertheless, the algorithm has been successfully employed in the context of different real-life problems, like location problems, the lasso problem in image processing, problems arising in statistics, support vector machines classification, etc. We refer the reader to the seminal work [13] for the history of the ADMM algorithm and various concrete applications of it (see also [16, 17, 19, 20]).

In this paper we propose new ADMM-type numerical schemes, which have their roots in the class of so-called inertial proximal point algorithms. The latter iterative schemes are designed for solving monotone inclusion problems and, as they arise from the time discretization of some differential inclusions of second order type (see [1, 3]), have the property that the next iterate is defined by using the previous two iterates. In this way an inertial effect is induced into the numerical scheme, the increasing interest in this class of algorithms being emphasized by a considerable number of papers written in the last fifteen years on this topic, see [1–4, 9, 12, 15, 21–23].

We derive the inertial version of the ADMM from the perspective of monotone operator theory, using as starting point the fact pointed out in [20] that the classical ADMM can be approached from the Douglas–Rachford splitting scheme for monotone inclusion problems (see also [17]).
In [12] we recently introduced and studied the convergence properties of an inertial Douglas–Rachford splitting algorithm. By combining this iterative scheme with the techniques from [17, 20], we are able to obtain an inertial ADMM scheme for simultaneously solving convex minimization problems and their Fenchel-type duals. For the sake of generality, the analysis is carried out in infinite dimensional Hilbert spaces, in contrast to the usual literature on ADMM algorithms, where the finite dimensional setting is preferred. Moreover, we prove the convergence of both the sequences of generated iterates and of the objective function values, and show that the classical ADMM scheme can be recovered as a particular instance of our inertial ADMM algorithm. We also point out how other ADMM-type algorithms from the literature turn out to be particular cases of the new ones presented here.

The paper is organized as follows. In the next section we make the reader familiar with the notions and results which will be used throughout the manuscript. In Section 3 we introduce the inertial ADMM algorithm for simultaneously solving, in Hilbert spaces, the convex optimization problem which assumes the minimization of the sum of a proper, convex and lower semicontinuous function with the composition of another proper, convex and lower semicontinuous function with a linear continuous operator, together with its Fenchel dual problem, and we study its convergence properties. Finally, in the last section we treat the convex minimization problem having as objective the finite sum of proper, convex and lower semicontinuous functions and its Fenchel-type dual, and provide for this primal-dual pair inertial ADMM algorithms and corresponding convergence statements.

2 Preliminaries
For the reader's convenience let us recall some standard notions and results in monotone operator theory and convex analysis which will be used further in the paper, see also [5–7, 18, 26, 27]. Let N = {0, 1, 2, ...} be the set of nonnegative integers. Let H be a real Hilbert space with inner product ⟨·,·⟩ and associated norm ‖·‖ = √⟨·,·⟩. The symbols ⇀ and → denote weak and strong convergence, respectively. When G is another Hilbert space and L : H → G a linear continuous operator, then L* : G → H, defined by ⟨L*y, x⟩ = ⟨y, Lx⟩ for all (x, y) ∈ H × G, denotes the adjoint operator of L.

For an arbitrary set-valued operator A : H ⇒ H we denote by Gr A = {(x, u) ∈ H × H : u ∈ Ax} its graph and by A^{−1} : H ⇒ H its inverse operator, defined by (u, x) ∈ Gr A^{−1} if and only if (x, u) ∈ Gr A. We also use the notation zer A = {x ∈ H : 0 ∈ Ax} for the set of zeros of A. We say that A is monotone if ⟨x − y, u − v⟩ ≥ 0 for all (x, u), (y, v) ∈ Gr A. A monotone operator A is said to be maximally monotone if there exists no proper monotone extension of the graph of A on H × H. The resolvent of A, J_A : H ⇒ H, is defined by J_A = (Id_H + A)^{−1}, where Id_H : H → H, Id_H(x) = x for all x ∈ H, is the identity operator on H. Moreover, if A is maximally monotone, then J_A : H → H is single-valued and maximally monotone (see [5, Proposition 23.7 and Corollary 23.10]). For an arbitrary γ > 0 we have

p ∈ J_{γA}x if and only if (p, γ^{−1}(x − p)) ∈ Gr A.   (7)

The operator A is said to be uniformly monotone if there exists an increasing function φ_A : [0, +∞) → [0, +∞] that vanishes only at 0, such that ⟨x − y, u − v⟩ ≥ φ_A(‖x − y‖) for every (x, u) ∈ Gr A and (y, v) ∈ Gr A. A well-known class of operators fulfilling this property is the one of the strongly monotone operators. Let γ > 0. We say that A is γ-strongly monotone if ⟨x − y, u − v⟩ ≥ γ‖x − y‖² for all (x, u), (y, v) ∈ Gr A.

Let us recall now some elements of convex analysis.
For a function f : H → R̄, where R̄ := R ∪ {±∞} is the extended real line, we denote by dom f = {x ∈ H : f(x) < +∞} its effective domain and say that f is proper if dom f ≠ ∅ and f(x) > −∞ for all x ∈ H. We denote by Γ(H) the family of proper, convex and lower semicontinuous extended real-valued functions defined on H. Let f* : H → R̄, f*(u) = sup_{x ∈ H} {⟨u, x⟩ − f(x)} for all u ∈ H, be the conjugate function of f. The subdifferential of f at x ∈ H, with f(x) ∈ R, is the set ∂f(x) := {v ∈ H : f(y) ≥ f(x) + ⟨v, y − x⟩ ∀y ∈ H}. We take by convention ∂f(x) := ∅ if f(x) ∈ {±∞}. Notice that if f ∈ Γ(H), then ∂f is a maximally monotone operator (see [24]) and it holds (∂f)^{−1} = ∂f*.

Let S ⊆ H be a nonempty set. The indicator function of S, δ_S : H → R̄, is the function which takes the value 0 on S and +∞ otherwise. The subdifferential of the indicator function is the normal cone of S, that is, N_S(x) = {u ∈ H : ⟨u, y − x⟩ ≤ 0 ∀y ∈ S} if x ∈ S, and N_S(x) = ∅ for x ∉ S. Notice that, if S is a linear subspace, then N_S(x) = S^⊥ = {u ∈ H : ⟨y, u⟩ = 0 ∀y ∈ S} for all x ∈ S.

When f ∈ Γ(H) and γ >
0, for every x ∈ H we denote by prox_{γf}(x) the proximal point of parameter γ of f at x, which is the unique optimal solution of the optimization problem

inf_{y ∈ H} { f(y) + (1/(2γ))‖y − x‖² }.   (8)

Notice that the resolvent of the maximally monotone operator ∂f is nothing else than the proximal point operator of f, namely,

J_{γ∂f} = (Id_H + γ∂f)^{−1} = prox_{γf}.   (9)

Moreover, if f = δ_S, where S ⊆ H is a nonempty, closed and convex set, then the proximal point operator of f is the orthogonal projection onto S.

Let us also recall that a proper function f : H → R̄ is said to be uniformly convex if there exists an increasing function φ : [0, +∞) → [0, +∞] which vanishes only at 0 and such that

f(tx + (1 − t)y) + t(1 − t)φ(‖x − y‖) ≤ tf(x) + (1 − t)f(y)   ∀x, y ∈ dom f and ∀t ∈ (0, 1).
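For intuition, the proximal points in (8)-(9) can be checked numerically. The sketch below implements prox_{γf} for the illustrative choices f = ‖·‖₁ (soft-thresholding) and f = δ_S with S a box (orthogonal projection), neither of which is taken from the text, and verifies on random perturbations that the computed point indeed minimizes (8).

```python
import numpy as np

def prox_l1(x, gamma):
    # prox_{gamma*||.||_1}(x): componentwise soft-thresholding at level gamma
    return np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)

def prox_box(x, lo, hi):
    # f = indicator of S = [lo, hi]^n: the prox is the orthogonal projection onto S
    return np.clip(x, lo, hi)

rng = np.random.default_rng(1)
x = rng.standard_normal(5)
gamma = 0.7

p = prox_l1(x, gamma)
obj = lambda y: np.sum(np.abs(y)) + np.sum((y - x) ** 2) / (2 * gamma)
# p is the unique minimizer of (8): any perturbation increases the objective
for _ in range(100):
    assert obj(p) <= obj(p + 0.1 * rng.standard_normal(5)) + 1e-12
```

The same perturbation test applied to prox_box would verify the projection interpretation of (9) for indicator functions.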
In case this inequality holds for φ = (β/2)(·)², where β >
0, then f is said to be β-strongly convex. Let us mention that this property implies β-strong monotonicity of ∂f (see [5, Example 22.3]) (more generally, if f is uniformly convex, then ∂f is uniformly monotone, see [5, Example 22.3]).

We close this section by presenting the inertial Douglas–Rachford splitting algorithm for determining the zeros of the sum of two maximally monotone operators recently obtained in [12], which will be crucial for the proofs of the main results in the next section.

Theorem 1 (Inertial Douglas–Rachford splitting algorithm, see [12]) Let
A, B : H ⇒ H be maximally monotone operators such that zer(A + B) ≠ ∅. Consider the following iterative scheme: for every k ≥ 1,

y^k = J_{γB}[w^k + α_k(w^k − w^{k−1})]
v^k = J_{γA}[2y^k − w^k − α_k(w^k − w^{k−1})]
w^{k+1} = w^k + α_k(w^k − w^{k−1}) + λ_k(v^k − y^k),

where γ > 0, w^0, w^1 are arbitrarily chosen in H, (α_k)_{k≥1} is nondecreasing with α_1 = 0 and 0 ≤ α_k ≤ α < 1 for every k ≥ 1, and λ, σ, δ > 0 are such that

δ > (α²(1 + α) + ασ)/(1 − α²) and 0 < λ ≤ λ_k ≤ 2(δ − α[α(1 + α) + αδ + σ]) / (δ[1 + α(1 + α) + αδ + σ])   ∀k ≥ 1.   (10)

Then there exists x̄ ∈ H such that the following statements are true:

(i) J_{γB}x̄ ∈ zer(A + B);
(ii) Σ_{k ∈ N} ‖w^{k+1} − w^k‖² < +∞;
(iii) (w^k)_{k ∈ N} converges weakly to x̄;
(iv) y^k − v^k → 0 as k → +∞;
(v) (y^k)_{k≥1} converges weakly to J_{γB}x̄;
(vi) (v^k)_{k≥1} converges weakly to J_{γB}x̄;
(vii) if A or B is uniformly monotone, then (y^k)_{k≥1} and (v^k)_{k≥1} converge strongly to the unique point in zer(A + B).

Remark 2
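To see the scheme of Theorem 1 at work, the sketch below runs it for A = ∂f and B = ∂g with the quadratics f = ½‖· − a‖² and g = ½‖· − b‖², whose resolvents are affine and for which zer(A + B) = {(a + b)/2}. The data a, b and the constant choices α_k ≡ 0.1 (with α_1 = 0) and λ_k ≡ 1 are illustrative assumptions; in practice (α_k) and (λ_k) must be chosen so that (10) holds.

```python
import numpy as np

a, b = np.array([1.0, -2.0]), np.array([3.0, 4.0])
gamma, alpha, lam = 1.0, 0.1, 1.0

def J_gA(x):  # resolvent of gamma*A, A = subdifferential of 0.5*||. - a||^2
    return (x + gamma * a) / (1 + gamma)

def J_gB(x):  # resolvent of gamma*B, B = subdifferential of 0.5*||. - b||^2
    return (x + gamma * b) / (1 + gamma)

w_prev = np.zeros(2)     # w^0
w = np.ones(2)           # w^1, arbitrary
for k in range(1, 2000):
    a_k = 0.0 if k == 1 else alpha      # alpha_1 = 0, as required
    m = w + a_k * (w - w_prev)          # inertial extrapolation
    y = J_gB(m)
    v = J_gA(2 * y - m)
    w_prev, w = w, m + lam * (v - y)

# Statement (v): y^k converges to J_gB(w_bar), the point of zer(A+B) = {(a+b)/2}
print(y)
```

For these two quadratics the iteration is an affine contraction, so the weak convergence of Theorem 1 is observed here as fast norm convergence.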
According to [12], the condition α_1 = 0 can be replaced with the assumption w^0 = w^1 without altering the conclusion of the above theorem.

Remark 3
Let us mention that in the hypotheses of the above theorem we have

0 < 2(δ − α[α(1 + α) + αδ + σ]) / (δ[1 + α(1 + α) + αδ + σ]) < 2.

Conversely, for a fixed α ∈ (0, 1) and a fixed λ ∈ (0, 2), one can choose σ > 0 small enough and δ ∈ {δ₁, δ₂} such that

2(δ − α[α(1 + α) + αδ + σ]) / (δ[1 + α(1 + α) + αδ + σ]) = λ,

where δ₁ and δ₂ denote the two solutions of the quadratic equation in δ obtained by setting the above expression equal to λ.

3 The inertial ADMM algorithm

In this section we present the main result of the paper, which consists in the formulation of an inertial ADMM algorithm for a primal-dual pair of convex optimization problems and in the investigation of its convergence properties. We start by describing the setting in which we work.
Problem 4
Let H and G be real Hilbert spaces, f ∈ Γ(H), g ∈ Γ(G) and L : H → G a linear continuous operator. We aim to solve the convex optimization problem

(P)  inf_{x ∈ H} { f(x) + g(Lx) }   (11)

together with its Fenchel-type dual problem

(D)  sup_{v ∈ G} { −f*(−L*v) − g*(v) }.   (12)

Denoting by v(P) and v(D) the optimal objective values of the two problems, respectively, the situation v(P) ≥ v(D), called in the literature weak duality, always holds. In case a regularity condition is fulfilled, one can guarantee equality of the optimal objective values and the existence of optimal solutions to the dual. For the reader's convenience, we discuss some regularity conditions which are suitable in this context. One of the weakest regularity conditions of interiority-type is the Attouch–Brézis condition, which reads

0 ∈ sqri(dom g − L(dom f)).   (13)

Here, for S ⊆ G a convex set, we denote by

sqri S := {x ∈ S : ∪_{λ>0} λ(S − x) is a closed linear subspace of G}

its strong quasi-relative interior. Notice that we always have int S ⊆ sqri S (in general this inclusion may be strict). If G is finite-dimensional, then sqri S coincides with ri S, the relative interior of S, which is the interior of S with respect to its affine hull. In this case, condition (13) holds if there exists x′ ∈ ri(dom f) such that Lx′ ∈ ri(dom g). Considering again the infinite dimensional setting, we remark that condition (13) is fulfilled if, for example, g is continuous at Lx′ for some x′ ∈ dom f ∩ L^{−1}(dom g). Let us mention that, if (13) holds, then we have strong duality, which means that v(P) = v(D) and (D) has an optimal solution.

Moreover, the optimality conditions for the primal-dual pair of optimization problems (11)-(12) read

−L*v̄ ∈ ∂f(x̄) and v̄ ∈ ∂g(Lx̄).   (14)

More precisely, if (P) has an optimal solution x̄ ∈ H and the regularity condition (13) is fulfilled, then there exists v̄ ∈ G, an optimal solution to (D), such that (14) holds.
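For a quick numerical illustration of strong duality and of the optimality conditions (14), take H = R³, G = R², f(x) = ½‖x‖² and g(y) = ½‖y − b‖²; since g is continuous everywhere, condition (13) holds. Both (P) and (D) then have closed-form solutions, and the sketch below (with hypothetical data A, b, the matrix A standing in for L) verifies v(P) = v(D) and (14).

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((2, 3))   # plays the role of the operator L
b = rng.standard_normal(2)

# (P): min 0.5*||x||^2 + 0.5*||Ax - b||^2  ->  x_bar = (I + A^T A)^{-1} A^T b
x_bar = np.linalg.solve(np.eye(3) + A.T @ A, A.T @ b)
vP = 0.5 * x_bar @ x_bar + 0.5 * np.sum((A @ x_bar - b) ** 2)

# (D): max -f*(-A^T v) - g*(v), with f*(u) = 0.5*||u||^2 and
# g*(v) = 0.5*||v||^2 + <v, b>  ->  v_bar = -(I + A A^T)^{-1} b
v_bar = -np.linalg.solve(np.eye(2) + A @ A.T, b)
vD = -0.5 * np.sum((A.T @ v_bar) ** 2) - (0.5 * v_bar @ v_bar + v_bar @ b)

print(vP - vD)                      # strong duality: v(P) = v(D)
print(x_bar + A.T @ v_bar)          # (14): -A^T v_bar = grad f(x_bar) = x_bar
print(v_bar - (A @ x_bar - b))      # (14): v_bar = grad g(A x_bar)
```

All three printed quantities vanish up to rounding, confirming that the pair (x̄, v̄) satisfies (14).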
Conversely, if the pair (x̄, v̄) ∈ H × G satisfies the optimality conditions (14), then x̄ is an optimal solution to (P) and v̄ is an optimal solution to (D). For further considerations concerning duality we invite the reader to consult [5–8, 18, 26, 27].

Let us mention some conditions ensuring that (P) has an optimal solution. Suppose that (P) is feasible, which means that its optimal objective value is not identically +∞. The existence of optimal solutions to (P) is guaranteed if, for instance, f is coercive (that is, lim_{‖x‖→∞} f(x) = +∞) and g is bounded from below. Indeed, under these circumstances, the objective function of (P) is coercive and the statement follows via [5, Corollary 11.15]. On the other hand, when f is strongly convex, then the objective function of (P) is strongly convex, too, thus (P) has a unique optimal solution (see [5, Corollary 11.16]).

Let us introduce now the inertial ADMM algorithm.

Algorithm 5
Choose y^0, y^1, z_2^0, z_2^1 ∈ G, γ > 0, (α_k)_{k≥1} nondecreasing with 0 ≤ α_k ≤ α < 1 for every k ≥ 1, and (λ_k)_{k≥1} and λ, σ, δ > 0 such that

δ > (α²(1 + α) + ασ)/(1 − α²) and 0 < λ ≤ λ_k ≤ 2(δ − α[α(1 + α) + αδ + σ]) / (δ[1 + α(1 + α) + αδ + σ])   ∀k ≥ 1.

Suppose that either α_2 = 0 or λ_1 = α_1 = 0. Further, for all k ≥ 1 set

x^{k+1} = argmin_{x ∈ H} { f(x) + ⟨y^k − α_k(y^k − y^{k−1}) − γα_k(z_2^k − z_2^{k−1}), Lx⟩ + (γ/2)‖Lx − z_2^k‖² }   (15)

z_1^{k+1} = α_{k+1}[λ_k(Lx^{k+1} − z_2^k) + ((1 − λ_k)α_k/γ)(y^k − y^{k−1} + γ(z_2^k − z_2^{k−1}))]   (16)

z_2^{k+1} = argmin_{z ∈ G} { g(z + z_1^{k+1}) + ⟨−y^k − (1 − λ_k)α_k(y^k − y^{k−1} + γ(z_2^k − z_2^{k−1})), z⟩ + (γ/2)‖z − λ_k Lx^{k+1} − (1 − λ_k)z_2^k‖² }   (17)

y^{k+1} = y^k + γ(λ_k Lx^{k+1} + (1 − λ_k)z_2^k − z_2^{k+1}) + (1 − λ_k)α_k(y^k − y^{k−1} + γ(z_2^k − z_2^{k−1})).   (18)

Remark 6
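In the case H = G and L = Id_H the two argmin steps of Algorithm 5 reduce to proximal steps (as made explicit in Remark 7), and the scheme can be run directly. The sketch below does so for the illustrative instance f = μ‖·‖₁, g = ½‖· − b‖², whose solution is known in closed form (soft-thresholding of b); the data and the parameter values α_k, λ_k are hypothetical choices, assumed to satisfy the stepsize conditions above.

```python
import numpy as np

b = np.array([1.0, -0.2, 0.5])
mu, gam, lam = 0.3, 1.0, 1.0

def prox_f(u):   # prox of gamma^{-1} f, f = mu*||.||_1
    return np.sign(u) * np.maximum(np.abs(u) - mu / gam, 0.0)

def prox_g(u):   # prox of gamma^{-1} g, g = 0.5*||. - b||^2
    return (b + gam * u) / (1 + gam)

n = b.size
y_prev, y = np.zeros(n), np.zeros(n)       # y^0, y^1
z2_prev, z2 = np.zeros(n), np.zeros(n)     # z_2^0, z_2^1
alpha = lambda k: 0.0 if k <= 2 else 0.05  # alpha_1 = alpha_2 = 0
for k in range(1, 3000):
    a_k, a_next = alpha(k), alpha(k + 1)
    d = y - y_prev + gam * (z2 - z2_prev)  # inertial difference
    # (15), written as a prox step since L = Id:
    x = prox_f(z2 - y / gam + (a_k / gam) * (y - y_prev) + a_k * (z2 - z2_prev))
    # (16):
    z1 = a_next * (lam * (x - z2) + (1 - lam) * (a_k / gam) * d)
    # (17), written as a prox step:
    z2_new = -z1 + prox_g(z1 + lam * x + (1 - lam) * z2 + y / gam
                          + ((1 - lam) * a_k / gam) * d)
    # (18):
    y_new = y + gam * (lam * x + (1 - lam) * z2 - z2_new) + (1 - lam) * a_k * d
    y_prev, y, z2_prev, z2 = y, y_new, z2, z2_new

# The minimizer of mu*||x||_1 + 0.5*||x - b||^2 is soft-thresholding of b
print(x)   # approx. [0.7, 0.0, 0.2]
```

Setting alpha to the zero function recovers the classical ADMM iteration of Remark 8 for this instance.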
In order to ensure that the sequence (x^k)_{k≥2} is uniquely determined we assume that the operator L satisfies the hypothesis

(H)  ∃θ > 0 such that ‖Lx‖ ≥ θ‖x‖ for all x ∈ H.   (19)

This condition guarantees that the objective function in (15) is strongly convex, hence (x^k)_{k≥2} is well defined (see [5, Corollary 11.16]). Let us mention that (H) will be used also in the proof of the convergence statements of the algorithm. Notice that if L is injective and ran L* is closed, then (H) holds (see [5, Fact 2.19]). Moreover, (H) implies that L is injective. We conclude that in case ran L* is closed, (H) is equivalent to L being injective. In finite dimensional spaces, namely, if H = R^n and G = R^m, with m ≥ n ≥
1, hypothesis (H) is nothing else than saying that L has full column rank, which is a condition widely used in the literature for proving the convergence of the ADMM algorithm.

Remark 7
Notice that the objective function of (17) is strongly convex, hence the sequence (z_2^k)_{k∈N} is well defined as well. Moreover, it can be expressed with the help of the proximal point operator of g: for every k ≥ 1,

z_2^{k+1} = −z_1^{k+1} + prox_{γ^{−1}g}( z_1^{k+1} + λ_k Lx^{k+1} + (1 − λ_k)z_2^k + (1/γ)y^k + ((1 − λ_k)α_k/γ)(y^k − y^{k−1} + γ(z_2^k − z_2^{k−1})) ).

In general, relation (15) cannot be solved via the proximal point operator of f, due to the presence of the operator L in the x-argument. Nevertheless, in case H = G and L is the identity operator on H, relation (15) can be expressed via the proximal point operator of f: for every k ≥ 1,

x^{k+1} = prox_{γ^{−1}f}( z_2^k − (1/γ)y^k + (α_k/γ)(y^k − y^{k−1}) + α_k(z_2^k − z_2^{k−1}) ).

Remark 8
Let us consider the case α_k = 0 for all k ≥ 1. Then the iterative scheme becomes: for every k ≥ 1,

x^{k+1} = argmin_{x ∈ H} { f(x) + ⟨y^k, Lx⟩ + (γ/2)‖Lx − z^k‖² }   (20)
z^{k+1} = argmin_{z ∈ G} { g(z) + ⟨−y^k, z⟩ + (γ/2)‖z − λ_k Lx^{k+1} − (1 − λ_k)z^k‖² }   (21)
y^{k+1} = y^k + γ(λ_k Lx^{k+1} + (1 − λ_k)z^k − z^{k+1}),   (22)

which is the error-free case of the classical ADMM algorithm as presented and investigated in [17]. Here (λ_k)_{k≥1} can be regarded as a sequence of relaxation parameters. If one takes further λ_k = 1 for all k ≥
1, one obtains the classical ADMM algorithm (see for example [13]):

x^{k+1} = argmin_{x ∈ H} { f(x) + ⟨y^k, Lx⟩ + (γ/2)‖Lx − z^k‖² }   (23)
z^{k+1} = argmin_{z ∈ G} { g(z) + ⟨−y^k, z⟩ + (γ/2)‖z − Lx^{k+1}‖² }   (24)
y^{k+1} = y^k + γ(Lx^{k+1} − z^{k+1}).   (25)

We are now in position to state the main result of the paper.

Theorem 9
In Problem 4 suppose that (P) has an optimal solution, the regularity condition (13) is fulfilled and the hypothesis (H) concerning the operator L holds, and consider the sequences generated by Algorithm 5. Then there exists (x̄, v̄) ∈ H × G satisfying the optimality conditions (14), hence x̄ is an optimal solution to (P), v̄ is an optimal solution to (D) and v(P) = v(D), such that the following statements are true:

(i) (x^k)_{k≥2} converges weakly to x̄;
(ii) (z_1^k)_{k≥2} converges strongly to 0;
(iii) (z_2^k)_{k∈N} converges weakly to Lx̄;
(iv) (Lx^{k+1} − z_2^k)_{k≥1} converges strongly to 0;
(v) (y^k)_{k∈N} converges weakly to v̄;
(vi) if g* is uniformly convex, then (y^k)_{k∈N} converges strongly to the unique optimal solution of (D);
(vii) lim_{k→+∞} (f(x^{k+1}) + g(z_1^k + z_2^k)) = v(P) = v(D) = lim_{k→+∞} (−f*(−L*v^k) − g*(y^k)), where the sequence (v^k)_{k≥1} is defined by

v^k = y^k − γz_2^k + γLx^{k+1} − α_k(y^k − y^{k−1} + γ(z_2^k − z_2^{k−1}))   ∀k ≥ 1,   (26)

and (v^k)_{k≥1} converges weakly to v̄.

Remark 10 Let us mention that the function g* is uniformly convex if g* is β-strongly convex for some β > 0. Further, g* is β-strongly convex if and only if g is Fréchet differentiable and ∇g is β^{−1}-Lipschitz continuous.

Proof.
We introduce the sequence (w^k)_{k∈N} defined by

w^k = y^k + γz_2^k   ∀k ∈ N.   (27)

We intend to prove that the sequences (y^k)_{k∈N}, (v^k)_{k≥1} and (w^k)_{k∈N} are nothing else than the ones generated by the inertial Douglas–Rachford algorithm presented in Theorem 1 for the maximally monotone operators

A := ∂(f* ∘ (−L*)) and B := ∂g*.   (28)

Notice that the hypotheses of the theorem ensure that there exists a pair (x̄, v̄) ∈ H × G satisfying the optimality conditions (14), from which one easily derives that zer(A + B) ≠ ∅.

We fix k ≥
1. We obtain from (17) that

0 ∈ ∂g(z_1^{k+1} + z_2^{k+1}) − y^k − (1 − λ_k)α_k(y^k − y^{k−1} + γ(z_2^k − z_2^{k−1})) + γ(z_2^{k+1} − λ_k Lx^{k+1} − (1 − λ_k)z_2^k),

hence, due to (18),

y^{k+1} ∈ ∂g(z_1^{k+1} + z_2^{k+1}).   (29)

From here we deduce z_1^{k+1} + z_2^{k+1} ∈ ∂g*(y^{k+1}) = By^{k+1}, hence

y^{k+1} = J_{γB}(y^{k+1} + γz_1^{k+1} + γz_2^{k+1}) = J_{γB}(w^{k+1} + γz_1^{k+1}).   (30)

By (18) we have

y^{k+1} = y^k + γz_2^k − γz_2^{k+1} + γλ_k(Lx^{k+1} − z_2^k) + (1 − λ_k)α_k(y^k − y^{k−1} + γ(z_2^k − z_2^{k−1})),   (31)

thus, in view of (16),

γz_1^{k+1} = α_{k+1}(y^{k+1} − y^k − γz_2^k + γz_2^{k+1}) = α_{k+1}(w^{k+1} − w^k).   (32)

From (30) and (32) we obtain

y^{k+1} = J_{γB}[w^{k+1} + α_{k+1}(w^{k+1} − w^k)].   (33)

Further, from (15) we get

0 ∈ ∂f(x^{k+1}) + L*(y^k − α_k(y^k − y^{k−1}) − γα_k(z_2^k − z_2^{k−1})) + γL*(Lx^{k+1} − z_2^k),

which by (26) gives

−L*v^k ∈ ∂f(x^{k+1}).   (34)

We derive x^{k+1} ∈ ∂f*(−L*v^k), hence −Lx^{k+1} ∈ −L∂f*(−L*v^k) ⊆ ∂(f* ∘ (−L*))(v^k) = Av^k, which leads to

v^k = J_{γA}(v^k − γLx^{k+1}).   (35)

Taking into account (27) and (26) we have

v^k − γLx^{k+1} = y^k − γz_2^k − α_k(y^k − y^{k−1} + γ(z_2^k − z_2^{k−1})) = 2y^k − w^k − α_k(w^k − w^{k−1}),

and from (35) we get

v^k = J_{γA}[2y^k − w^k − α_k(w^k − w^{k−1})].   (36)

Finally, from (31), (27) and (26) we derive

w^{k+1} = w^k + α_k(y^k − y^{k−1} + γ(z_2^k − z_2^{k−1})) + λ_k(γ(Lx^{k+1} − z_2^k) − α_k(y^k − y^{k−1} + γ(z_2^k − z_2^{k−1}))) = w^k + α_k(w^k − w^{k−1}) + λ_k(v^k − y^k),

hence

w^{k+1} = w^k + α_k(w^k − w^{k−1}) + λ_k(v^k − y^k).
(37)

In conclusion, for all k ≥ 1,

y^k = J_{γB}[w^k + α_k(w^k − w^{k−1})]
v^k = J_{γA}[2y^k − w^k − α_k(w^k − w^{k−1})]
w^{k+1} = w^k + α_k(w^k − w^{k−1}) + λ_k(v^k − y^k),

which is the inertial Douglas–Rachford scheme from Theorem 1. Notice that the relation α_2 = 0 from Algorithm 5 corresponds to the situation when in Theorem 1 the vectors w^1, w^2 can be chosen arbitrarily in G, while the condition λ_1 = α_1 = 0 ensures that w^1 = w^2, which is the situation mentioned in Remark 2. Indeed, in case α_2 ≠ 0 and λ_1 = α_1 = 0, from (16) we get z_1^2 = 0, hence, by (32), w^2 = w^1.

According to Theorem 1, there exists w̄ ∈ G such that

w^k ⇀ w̄ as k → +∞   (38)
w^{k+1} − w^k → 0 as k → +∞   (39)
y^k − v^k → 0 as k → +∞   (40)
y^k ⇀ J_{γB}w̄ as k → +∞   (41)
v^k ⇀ J_{γB}w̄ as k → +∞.   (42)

From (32) and (39) we derive that

z_1^k → 0 as k → +∞.   (43)

Further, by (26), (39) and (40) we obtain

Lx^{k+1} − z_2^k → 0 as k → +∞.   (44)

Moreover, from (27), (38) and (41) we get

z_2^k ⇀ (1/γ)(w̄ − J_{γB}w̄) as k → +∞.   (45)

We deduce from (44) that

Lx^k ⇀ (1/γ)(w̄ − J_{γB}w̄) as k → +∞.   (46)

Now, using the hypothesis (H), we easily derive that (x^k)_{k≥2} is bounded, thus, due to (46), it possesses at most one weak cluster point. As a consequence, (x^k)_{k≥2} is weakly convergent (see [5, Lemma 2.38]), hence there exists x̄ ∈ H such that

x^k ⇀ x̄ as k → +∞.   (47)

From (44), (45) and (47) we also have

z_2^k ⇀ Lx̄ as k → +∞   (48)

and

Lx̄ = (1/γ)(w̄ − J_{γB}w̄).   (49)

Using the notation v̄ = J_{γB}w̄, we prove that the pair (x̄, v̄) ∈ H × G satisfies the optimality conditions (14). To this end, observe that, due to (34) and (29), we have

(−L*v^{k+1} + L*y^{k+1}, z_1^{k+1} + z_2^{k+1} − Lx^{k+2}) ∈ (∂f × B + S)(x^{k+2}, y^{k+1})   ∀k ≥ 1,   (50)

where S : H × G → H × G is defined by S(x, y) = (L*y, −Lx) for all (x, y) ∈ H × G. Since S is monotone and continuous, it is maximally monotone (see [5, Corollary 20.25]).
Further, ∂f × B is also maximally monotone (see [5, Proposition 20.23]) and, since S has full domain, the sum ∂f × B + S is maximally monotone, too (see [5, Corollary 24.4]). Since the graph of a maximally monotone operator is sequentially closed in the weak-strong topology (see [5, Proposition 20.33(ii)]), by taking the limit in (50) and using (40), (44), (43), (47) and (41) we obtain

(0, 0) ∈ (∂f × B + S)(x̄, v̄).

One can easily show that the latter means that the pair (x̄, v̄) satisfies the optimality conditions (14). The statements (i)-(v) follow now from (47), (43), (48), (44) and (41). Further, (vi) follows from Theorem 1(vii).

We are going to prove now statement (vii). Notice that f and g are weakly lower semicontinuous (since f and g are convex and lower semicontinuous) and therefore, by (i), (ii) and (iii), we get

liminf_{k→+∞} (f(x^{k+1}) + g(z_1^k + z_2^k)) ≥ liminf_{k→+∞} f(x^{k+1}) + liminf_{k→+∞} g(z_1^k + z_2^k) ≥ f(x̄) + g(Lx̄) = v(P).   (51)

Further, from (34) we derive the inequality

f(x̄) ≥ f(x^{k+1}) + ⟨−L*v^k, x̄ − x^{k+1}⟩   ∀k ≥ 1,   (52)

while from (29) we obtain

g(Lx̄) ≥ g(z_1^k + z_2^k) + ⟨y^k, Lx̄ − z_1^k − z_2^k⟩   ∀k ≥ 2.   (53)

Summing up the last two inequalities we get

v(P) ≥ f(x^{k+1}) + g(z_1^k + z_2^k) + ⟨−v^k, Lx̄ − Lx^{k+1}⟩ + ⟨y^k, Lx̄ − z_1^k − z_2^k⟩   ∀k ≥ 2,

hence

f(x^{k+1}) + g(z_1^k + z_2^k) ≤ v(P) + ⟨v^k − y^k, Lx̄ − Lx^{k+1}⟩ + ⟨y^k, −Lx^{k+1} + z_1^k + z_2^k⟩   ∀k ≥ 2.

Taking into account (40), (ii), (iii), (iv) and (v) we obtain

limsup_{k→+∞} (f(x^{k+1}) + g(z_1^k + z_2^k)) ≤ v(P).   (54)

Combining (51) and (54) we get the first part of the statement. Again by (34) and (29) we have (see [5, Proposition 16.9])

f(x^{k+1}) + f*(−L*v^k) = ⟨x^{k+1}, −L*v^k⟩   ∀k ≥ 1   (55)

and

g(z_1^k + z_2^k) + g*(y^k) = ⟨y^k, z_1^k + z_2^k⟩   ∀k ≥ 2.   (56)

Adding these relations we derive, for every k ≥ 2,

−f*(−L*v^k) − g*(y^k) = f(x^{k+1}) + g(z_1^k + z_2^k) + ⟨v^k − y^k, Lx^{k+1}⟩ + ⟨y^k, Lx^{k+1} − z_1^k − z_2^k⟩.
Finally, by (40), (ii), (iii), (iv), (v) and the first part of (vii) we obtain

lim_{k→+∞} (−f*(−L*v^k) − g*(y^k)) = v(P) = v(D),

and the proof is complete. ∎

Remark 11
When working in finite dimensional spaces, there is no need for the construction considered in (50), since in this case one can simply take the limits in (34) and (29) in order to conclude that (x̄, v̄) satisfies the optimality conditions (14). In infinite dimensional spaces this naive procedure does not work anymore, since in (34) and (29) we have only weak convergence for the sequences involved (we refer to [5, Example 20.34] for an example of a maximally monotone operator whose graph is not sequentially closed in the weak-weak topology).
Remark 12
Let us notice that the conclusion of Theorem 9(vi) remains true if the uniform convexity of g* is replaced by the assumptions that f* is β-strongly convex, with β > 0, and that

(H*)  ∃θ* > 0 such that ‖L*v‖ ≥ θ*‖v‖ for all v ∈ G.   (57)

Indeed, under these conditions one can prove that the composition f* ∘ (−L*) is βθ*²-strongly convex, hence the operator A (see (28)) is strongly monotone and the conclusion follows from Theorem 1(vii).

4 Inertial ADMM algorithms for the minimization of finite sums of convex functions

The aim of this section is to derive from Theorem 9, via the product space approach, iterative schemes and corresponding convergence statements for solving the optimization problem which assumes the minimization of a finite sum of proper, convex and lower semicontinuous functions, together with its Fenchel-type dual. The goal is to evaluate each of the functions arising in the objective separately in the algorithmic scheme.

Problem 13
Let H be a real Hilbert space, m ≥ 2 and f_i ∈ Γ(H) for i = 1, ..., m. We aim to solve the convex optimization problem

(P_P)  inf_{x ∈ H} Σ_{i=1}^m f_i(x)   (58)

together with its Fenchel-type dual problem

(D_P)  sup_{v_i ∈ H, i=1,...,m, Σ_{i=1}^m v_i = 0} ( −Σ_{i=1}^m f_i*(v_i) ).   (59)

One of the regularity conditions which guarantees strong duality in this situation is (see [7]):

0 ∈ sqri( Π_{i=1}^m dom f_i − {(x, ..., x) : x ∈ H} ).   (60)

According to [7, Remark 2.5], this condition is fulfilled if there exists x′ ∈ ∩_{i=1}^m dom f_i such that m − 1 of the functions f_i are continuous at x′. In finite dimensional spaces, condition (60) holds if ∩_{i=1}^m ri(dom f_i) ≠ ∅. Also, let us mention that in case m = 2, the regularity condition (60) is equivalent to 0 ∈ sqri(dom f_1 − dom f_2) (see [7, Remark 2.5]).

The optimality conditions for the primal-dual pair of optimization problems (58)-(59) read

v̄_i ∈ ∂f_i(x̄), i = 1, ..., m, and Σ_{i=1}^m v̄_i = 0.   (61)

More precisely, if (P_P) has an optimal solution x̄ ∈ H and the regularity condition (60) is fulfilled, then there exists (v̄_1, ..., v̄_m) ∈ H^m, an optimal solution to (D_P), such that (61) holds. Conversely, if (x̄, v̄_1, ..., v̄_m) ∈ H × H^m satisfies relation (61), then x̄ is an optimal solution to (P_P) and (v̄_1, ..., v̄_m) is an optimal solution to (D_P).

Let us mention some conditions ensuring that (P_P) has an optimal solution. Suppose that (P_P) is feasible, which means that its optimal objective value is not identically +∞. The existence of optimal solutions to (P_P) is guaranteed if, for instance, one of the functions f_i is coercive and the remaining ones are bounded from below. Indeed, under these circumstances, the objective function of (P_P) is coercive and the statement follows via [5, Corollary 11.15].
On the other hand, if one of the functions f_i is strongly convex, then the objective function of (P_P) is strongly convex, too, thus (P_P) has a unique optimal solution (see [5, Corollary 11.16]).

We derive in the following two inertial ADMM algorithms for solving (58)-(59). To this end we reformulate Problem 13 as Problem 4 in the product space H^m endowed with the inner product and associated norm defined by

⟨x, u⟩_{H^m} = Σ_{i=1}^m ⟨x_i, u_i⟩_H and ‖x‖_{H^m} = ( Σ_{i=1}^m ‖x_i‖_H² )^{1/2}

for x = (x_i)_{1≤i≤m}, u = (u_i)_{1≤i≤m} ∈ H^m, respectively, where ⟨·,·⟩_H and ‖·‖_H denote the inner product and norm on H, respectively.

By using the notation C = {(x, ..., x) : x ∈ H}, one can easily rewrite (58) as

inf_{(x_1,...,x_m) ∈ H^m} { f(x_1, ..., x_m) + δ_C(x_1, ..., x_m) },   (62)

where f : H^m → R̄ is defined by f(x_1, ..., x_m) = Σ_{i=1}^m f_i(x_i) for all (x_1, ..., x_m) ∈ H^m. This corresponds to the optimization problem (11) with g = δ_C : H^m → R̄ and L the identity operator on H^m. Notice that the Fenchel dual problem (12) of (62) becomes (D_P), the regularity condition (13) is equivalent to (60) and the optimality conditions (14) are nothing else than the ones in (61). Moreover, (x̄_1, ..., x̄_m) ∈ H^m is an optimal solution to (62) if and only if x̄_i = x̄, i = 1, ..., m, where x̄ ∈ H is an optimal solution to (P_P), while (v̄_1, ..., v̄_m) ∈ H^m is an optimal solution to the dual of (62) if and only if (v̄_1, ..., v̄_m) is an optimal solution to (D_P).
This shows that we are in the context of Problem 4. Writing Algorithm 5 in this setting we get for every k ≥ 1:

x^{k+1} = argmin_{x ∈ H^m} { f(x) + ⟨y^k − α_k(y^k − y^{k−1}) − γα_k(z_2^k − z_2^{k−1}), x⟩_{H^m} + (γ/2)‖x − z_2^k‖²_{H^m} }   (63)

z_1^{k+1} = α_{k+1}[λ_k(x^{k+1} − z_2^k) + ((1 − λ_k)α_k/γ)(y^k − y^{k−1} + γ(z_2^k − z_2^{k−1}))]   (64)

z_2^{k+1} = argmin_{z ∈ H^m} { g(z + z_1^{k+1}) + ⟨−y^k − (1 − λ_k)α_k(y^k − y^{k−1} + γ(z_2^k − z_2^{k−1})), z⟩_{H^m} + (γ/2)‖z − λ_k x^{k+1} − (1 − λ_k)z_2^k‖²_{H^m} }   (65)

y^{k+1} = y^k + γ(λ_k x^{k+1} + (1 − λ_k)z_2^k − z_2^{k+1}) + (1 − λ_k)α_k(y^k − y^{k−1} + γ(z_2^k − z_2^{k−1})),   (66)

where x^{k+1} = (x_i^{k+1})_{1≤i≤m}, z_1^{k+1} = (z_{1,i}^{k+1})_{1≤i≤m}, z_2^{k+1} = (z_{2,i}^{k+1})_{1≤i≤m}, y^{k+1} = (y_i^{k+1})_{1≤i≤m}, and similarly x = (x_i)_{1≤i≤m} and z = (z_i)_{1≤i≤m} for generic elements of H^m.

We give in the following an explicit form of this algorithm. Due to the definition of f, relation (63) is nothing else than

x_i^{k+1} = argmin_{x ∈ H} { f_i(x) + ⟨y_i^k − α_k(y_i^k − y_i^{k−1}) − γα_k(z_{2,i}^k − z_{2,i}^{k−1}), x⟩ + (γ/2)‖x − z_{2,i}^k‖² }, i = 1, ..., m.   (67)

Further, from (65) and (29) we derive z_1^{k+1} + z_2^{k+1} ∈ C and y^{k+1} ∈ C^⊥, hence there exists a sequence (u^k)_{k≥2} in H such that for every k ≥ 1

z_{1,i}^{k+1} + z_{2,i}^{k+1} = u^{k+1}, i = 1, ..., m,   (68)

and

Σ_{i=1}^m y_i^{k+1} = 0.   (69)

If we suppose that Σ_{i=1}^m y_i^k = 0 for every k ≥
0, then from (66) we derive
\[
\sum_{i=1}^m z_i^{k+1} = \lambda_k \sum_{i=1}^m x_i^{k+1} + (1 - \lambda_k) \sum_{i=1}^m z_i^k + (1 - \lambda_k) \alpha_k \sum_{i=1}^m (z_i^k - z_i^{k-1}) \quad \forall k \ge 1. \tag{70}
\]
From this, (68) and (64), we get
\[
u^{k+1} = \frac{\lambda_k (1 + \alpha_{k+1})}{m} \sum_{i=1}^m x_i^{k+1} + \frac{1 - \alpha_{k+1} \lambda_k - \lambda_k}{m} \sum_{i=1}^m z_i^k + \frac{\alpha_k (1 - \lambda_k)(1 + \alpha_{k+1})}{m} \sum_{i=1}^m (z_i^k - z_i^{k-1}) \quad \forall k \ge 1. \tag{71}
\]
Conversely, if for a fixed k \ge 1 one has \sum_{i=1}^m y_i^{k-1} = \sum_{i=1}^m y_i^k = 0, then from (66), (68) and (71) we have \sum_{i=1}^m y_i^{k+1} = 0. All together, we derive the following algorithm and corresponding convergence theorem (notice that for the statement in (vii) we use also Remark 12).

Algorithm 14
Choose y_i^0, y_i^1, z_i^0, z_i^1 \in \mathcal{H}, i = 1, ..., m, such that \sum_{i=1}^m y_i^0 = \sum_{i=1}^m y_i^1 = 0, \gamma > 0, (\alpha_k)_{k \ge 1} nondecreasing with 0 \le \alpha_k \le \alpha < 1 for every k \ge 1, and (\lambda_k)_{k \ge 1} and \lambda, \sigma, \delta > 0 such that
\[
\delta > \frac{\alpha^2 (1 + \alpha) + \alpha \sigma}{1 - \alpha^2}
\]
and
\[
0 < \lambda \le \lambda_k \le 2 \cdot \frac{\delta - \alpha \left[ \alpha (1 + \alpha) + \alpha \delta + \sigma \right]}{\delta \left[ 1 + \alpha (1 + \alpha) + \alpha \delta + \sigma \right]} \quad \forall k \ge 1.
\]
Further, for every k \ge 1 set
\[
x_i^{k+1} = \operatorname*{argmin}_{x \in \mathcal{H}} \left\{ f_i(x) + \left\langle y_i^k - \alpha_k (y_i^k - y_i^{k-1}) - \gamma \alpha_k (z_i^k - z_i^{k-1}), x \right\rangle + \frac{\gamma}{2} \|x - z_i^k\|^2 \right\}, \quad i = 1, ..., m \tag{72}
\]
\[
\tilde z_i^{k+1} = \alpha_{k+1} \lambda_k (x_i^{k+1} - z_i^k) + (1 - \lambda_k) \frac{\alpha_k \alpha_{k+1}}{\gamma} \left( y_i^k - y_i^{k-1} + \gamma (z_i^k - z_i^{k-1}) \right), \quad i = 1, ..., m \tag{73}
\]
\[
u^{k+1} = \frac{\lambda_k (1 + \alpha_{k+1})}{m} \sum_{i=1}^m x_i^{k+1} + \frac{1 - \alpha_{k+1} \lambda_k - \lambda_k}{m} \sum_{i=1}^m z_i^k + \frac{\alpha_k (1 - \lambda_k)(1 + \alpha_{k+1})}{m} \sum_{i=1}^m (z_i^k - z_i^{k-1}) \tag{74}
\]
\[
z_i^{k+1} = u^{k+1} - \tilde z_i^{k+1}, \quad i = 1, ..., m \tag{75}
\]
\[
y_i^{k+1} = y_i^k + \gamma \left( \lambda_k x_i^{k+1} + (1 - \lambda_k) z_i^k - z_i^{k+1} \right) + (1 - \lambda_k) \alpha_k \left( y_i^k - y_i^{k-1} + \gamma (z_i^k - z_i^{k-1}) \right), \quad i = 1, ..., m. \tag{76}
\]
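To make the updates (72)-(76) concrete, here is a minimal numerical sketch in finite dimension. It is not the authors' implementation: it assumes \mathcal{H} = \mathbb{R}, quadratic functions f_i(x) = \frac{1}{2}(x - a_i)^2 (so that the argmin in (72) has a closed form), constant parameters \alpha_k \equiv \alpha and \lambda_k \equiv \lambda, and zero initial points (which satisfy the condition \sum_i y_i^0 = \sum_i y_i^1 = 0). The function name and the parameter values are illustrative choices only.

```python
import numpy as np

def inertial_admm(a, gamma=1.0, alpha=0.05, lam=0.9, iters=2000):
    """Sketch of the iteration (72)-(76) for the toy problem
    min_x sum_i 0.5*(x - a_i)**2, whose unique solution is the mean
    of the a_i.  For these f_i the argmin in (72) has the closed form
    x = (a_i - c_i + gamma*z_i)/(1 + gamma), where c_i is the linear
    coefficient.  alpha_{k+1} = alpha_k = alpha is kept constant, so
    both inertial factors coincide."""
    a = np.asarray(a, dtype=float)
    m = a.size
    y_prev = np.zeros(m); y = np.zeros(m)   # y^{k-1}, y^k
    z_prev = np.zeros(m); z = np.zeros(m)   # z^{k-1}, z^k
    for _ in range(iters):
        # inertial term shared by (73), (76)
        d = y - y_prev + gamma * (z - z_prev)
        # (72): proximal step on each f_i (closed form for quadratics)
        c = y - alpha * (y - y_prev) - gamma * alpha * (z - z_prev)
        x = (a - c + gamma * z) / (1.0 + gamma)
        # (73): auxiliary variable z tilde
        zt = alpha * lam * (x - z) + (1 - lam) * alpha * alpha / gamma * d
        # (74): the common point u^{k+1}
        u = (lam * (1 + alpha) / m) * x.sum() \
            + ((1 - alpha * lam - lam) / m) * z.sum() \
            + (alpha * (1 - lam) * (1 + alpha) / m) * (z - z_prev).sum()
        # (75)-(76): new z and dual update
        z_new = u - zt
        y_new = y + gamma * (lam * x + (1 - lam) * z - z_new) \
            + (1 - lam) * alpha * d
        y_prev, y = y, y_new
        z_prev, z = z, z_new
    return x
```

For f_i of this form all components x_i^k approach the mean of the a_i; with alpha = 0 and lam = 1 the loop reduces to the classical scheme (81)-(82).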
Theorem 15
In Problem 13 suppose that (P_P) has an optimal solution, the regularity condition (60) is fulfilled and consider the sequences generated by Algorithm 14. Then there exists (x, v_1, ..., v_m) \in \mathcal{H} \times \mathcal{H}^m satisfying the optimality conditions (61), hence x is an optimal solution to (P_P), (v_1, ..., v_m) is an optimal solution to (D_P) and v(P_P) = v(D_P), such that the following statements are true:
(i) (x_i^k)_{k \ge 2} converges weakly to x, i = 1, ..., m;
(ii) (\tilde z_i^k)_{k \ge 2} converges strongly to 0, i = 1, ..., m;
(iii) (z_i^k)_{k \in \mathbb{N}} converges weakly to x, i = 1, ..., m;
(iv) (x_i^{k+1} - z_i^k)_{k \ge 1} converges strongly to 0, i = 1, ..., m;
(v) (u^k)_{k \ge 2} converges weakly to x;
(vi) (y_i^k)_{k \in \mathbb{N}} converges weakly to v_i, i = 1, ..., m;
(vii) if f_i^* is strongly convex for every i = 1, ..., m, then ((y_1^k)_{k \in \mathbb{N}}, ..., (y_m^k)_{k \in \mathbb{N}}) converges strongly to the unique optimal solution to (D_P);
(viii) \lim_{k \to +\infty} \left( \sum_{i=1}^m f_i(x_i^{k+1}) \right) = v(P_P) = v(D_P) = \lim_{k \to +\infty} \left( -\sum_{i=1}^m f_i^*(-v_i^k) \right), where for every i = 1, ..., m the sequence (v_i^k)_{k \ge 1} is defined by
\[
v_i^k = y_i^k - \gamma z_i^k + \gamma x_i^{k+1} - \alpha_k \left( y_i^k - y_i^{k-1} + \gamma (z_i^k - z_i^{k-1}) \right) \quad \forall k \ge 1, \tag{77}
\]
and (v_i^k)_{k \ge 1} converges weakly to v_i.

Remark 16
If we take \lambda_k = 1 for every k \ge 1, then \tilde z_i^{k+1} = \alpha_{k+1}(x_i^{k+1} - z_i^k), i = 1, ..., m, and (see also relation (70))
\[
u^{k+1} = \frac{1 + \alpha_{k+1}}{m} \sum_{i=1}^m x_i^{k+1} - \frac{\alpha_{k+1}}{m} \sum_{i=1}^m x_i^k,
\]
hence the iterative scheme (72)-(76) can be simplified to
\[
x_i^{k+1} = \operatorname*{argmin}_{x \in \mathcal{H}} \left\{ f_i(x) + \left\langle y_i^k - \alpha_k (y_i^k - y_i^{k-1}) - \gamma \alpha_k (z_i^k - z_i^{k-1}), x \right\rangle + \frac{\gamma}{2} \|x - z_i^k\|^2 \right\}, \quad i = 1, ..., m \tag{78}
\]
\[
z_i^{k+1} = \frac{1 + \alpha_{k+1}}{m} \sum_{j=1}^m x_j^{k+1} - \frac{\alpha_{k+1}}{m} \sum_{j=1}^m x_j^k - \alpha_{k+1} \left( x_i^{k+1} - z_i^k \right), \quad i = 1, ..., m \tag{79}
\]
\[
y_i^{k+1} = y_i^k + \gamma \left( x_i^{k+1} - z_i^{k+1} \right), \quad i = 1, ..., m. \tag{80}
\]
If, moreover, \alpha_k = 0 for every k \ge
1, then (78)-(80) becomes
\[
x_i^{k+1} = \operatorname*{argmin}_{x \in \mathcal{H}} \left\{ f_i(x) + \langle y_i^k, x \rangle + \frac{\gamma}{2} \left\| x - \frac{1}{m} \sum_{j=1}^m x_j^k \right\|^2 \right\}, \quad i = 1, ..., m \tag{81}
\]
\[
y_i^{k+1} = y_i^k + \gamma \left( x_i^{k+1} - \frac{1}{m} \sum_{j=1}^m x_j^{k+1} \right), \quad i = 1, ..., m, \tag{82}
\]
which is the ADMM algorithm as considered in [13, page 50].

By interchanging the roles of f and g in (62) we obtain another inertial ADMM-type algorithm with a corresponding convergence statement.
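The classical scheme (81)-(82) is short enough to sketch directly. As above, this is a toy illustration, not code from [13]: it assumes \mathcal{H} = \mathbb{R} and f_i(x) = \frac{1}{2}(x - a_i)^2, for which the argmin in (81) has a closed form; the function name and parameters are our own choices.

```python
import numpy as np

def consensus_admm(a, gamma=1.0, iters=300):
    """Sketch of the classical scheme (81)-(82) for the toy problem
    min_x sum_i 0.5*(x - a_i)**2 (solution: the mean of the a_i).
    The argmin in (81) has the closed form
    x_i = (a_i - y_i + gamma*xbar)/(1 + gamma)."""
    a = np.asarray(a, dtype=float)
    m = a.size
    y = np.zeros(m)   # multipliers; sum_i y_i = 0 is preserved by (82)
    x = np.zeros(m)
    for _ in range(iters):
        xbar = x.mean()                           # (1/m) sum_j x_j^k
        x = (a - y + gamma * xbar) / (1 + gamma)  # (81)
        y = y + gamma * (x - x.mean())            # (82)
    return x
```

For general f_i one would replace the closed-form x-update by a numerical evaluation of the proximal step.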
Algorithm 17
Choose y_i^0, y_i^1, z_i^0, z_i^1 \in \mathcal{H}, i = 1, ..., m, \gamma > 0, (\alpha_k)_{k \ge 1} nondecreasing with 0 \le \alpha_k \le \alpha < 1 for every k \ge 1, and (\lambda_k)_{k \ge 1} and \lambda, \sigma, \delta > 0 such that
\[
\delta > \frac{\alpha^2 (1 + \alpha) + \alpha \sigma}{1 - \alpha^2}
\]
and
\[
0 < \lambda \le \lambda_k \le 2 \cdot \frac{\delta - \alpha \left[ \alpha (1 + \alpha) + \alpha \delta + \sigma \right]}{\delta \left[ 1 + \alpha (1 + \alpha) + \alpha \delta + \sigma \right]} \quad \forall k \ge 1.
\]
Further, for every k \ge 1 set
\[
x^{k+1} = \frac{1}{m} \sum_{i=1}^m z_i^k - \frac{1}{m\gamma} \sum_{i=1}^m y_i^k + \frac{\alpha_k}{m\gamma} \sum_{i=1}^m \left( y_i^k - y_i^{k-1} + \gamma (z_i^k - z_i^{k-1}) \right) \tag{83}
\]
\[
\tilde z_i^{k+1} = \alpha_{k+1} \lambda_k (x^{k+1} - z_i^k) + (1 - \lambda_k) \frac{\alpha_k \alpha_{k+1}}{\gamma} \left( y_i^k - y_i^{k-1} + \gamma (z_i^k - z_i^{k-1}) \right), \quad i = 1, ..., m \tag{84}
\]
\[
z_i^{k+1} = \operatorname*{argmin}_{z \in \mathcal{H}} \left\{ f_i(z + \tilde z_i^{k+1}) + \left\langle -y_i^k - (1 - \lambda_k) \alpha_k \left( y_i^k - y_i^{k-1} + \gamma (z_i^k - z_i^{k-1}) \right), z \right\rangle + \frac{\gamma}{2} \left\| z - \lambda_k x^{k+1} - (1 - \lambda_k) z_i^k \right\|^2 \right\}, \quad i = 1, ..., m \tag{85-86}
\]
\[
y_i^{k+1} = y_i^k + \gamma \left( \lambda_k x^{k+1} + (1 - \lambda_k) z_i^k - z_i^{k+1} \right) + (1 - \lambda_k) \alpha_k \left( y_i^k - y_i^{k-1} + \gamma (z_i^k - z_i^{k-1}) \right), \quad i = 1, ..., m. \tag{87}
\]

Theorem 18
In Problem 13 suppose that (P_P) has an optimal solution, the regularity condition (60) is fulfilled and consider the sequences generated by Algorithm 17. Then there exists (x, v_1, ..., v_m) \in \mathcal{H} \times \mathcal{H}^m satisfying the optimality conditions (61), hence x is an optimal solution to (P_P), (v_1, ..., v_m) is an optimal solution to (D_P) and v(P_P) = v(D_P), such that the following statements are true:
(i) (x^k)_{k \ge 2} converges weakly to x;
(ii) (\tilde z_i^k)_{k \ge 2} converges strongly to 0, i = 1, ..., m;
(iii) (z_i^k)_{k \in \mathbb{N}} converges weakly to x, i = 1, ..., m;
(iv) (x^{k+1} - z_i^k)_{k \ge 1} converges strongly to 0, i = 1, ..., m;
(v) (y_i^k)_{k \in \mathbb{N}} converges weakly to v_i, i = 1, ..., m;
(vi) if f_i^* is strongly convex for every i = 1, ..., m, then ((y_1^k)_{k \in \mathbb{N}}, ..., (y_m^k)_{k \in \mathbb{N}}) converges strongly to the unique optimal solution to (D_P);
(vii) \lim_{k \to +\infty} \left( \sum_{i=1}^m f_i(z_i^k + \tilde z_i^k) \right) = v(P_P) = v(D_P) = \lim_{k \to +\infty} \left( -\sum_{i=1}^m f_i^*(-y_i^k) \right).
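Analogously to the sketch given for Algorithm 14, the scheme (83)-(87) can be illustrated on the same toy problem. Again this is only an illustrative sketch under our own assumptions: \mathcal{H} = \mathbb{R}, f_i(x) = \frac{1}{2}(x - a_i)^2 (so the argmin in (85)-(86) has a closed form), constant \alpha_k \equiv \alpha and \lambda_k \equiv \lambda, and zero initial points.

```python
import numpy as np

def inertial_admm_swapped(a, gamma=1.0, alpha=0.05, lam=1.0, iters=2000):
    """Sketch of (83)-(87) for f_i(x) = 0.5*(x - a_i)**2.  The argmin
    in (85)-(86) then has the closed form
    z = (a_i - zt_i + c_i + gamma*(lam*x + (1-lam)*z_i))/(1 + gamma),
    where c_i is the linear coefficient.  alpha_{k+1} = alpha_k = alpha
    is kept constant."""
    a = np.asarray(a, dtype=float)
    m = a.size
    y_prev = np.zeros(m); y = np.zeros(m)   # y^{k-1}, y^k
    z_prev = np.zeros(m); z = np.zeros(m)   # z^{k-1}, z^k
    for _ in range(iters):
        # inertial term shared by (83), (84), (87)
        d = y - y_prev + gamma * (z - z_prev)
        # (83): explicit x-update (the averaging step comes first here)
        x = z.mean() - y.sum() / (m * gamma) + alpha * d.sum() / (m * gamma)
        # (84)
        zt = alpha * lam * (x - z) + (1 - lam) * alpha * alpha / gamma * d
        # (85)-(86): proximal step on each f_i (closed form for quadratics)
        c = y + (1 - lam) * alpha * d
        z_new = (a - zt + c + gamma * (lam * x + (1 - lam) * z)) / (1 + gamma)
        # (87)
        y_new = y + gamma * (lam * x + (1 - lam) * z - z_new) \
            + (1 - lam) * alpha * d
        y_prev, y = y, y_new
        z_prev, z = z, z_new
    return x
```

In contrast to Algorithm 14, the averaging step (83) is performed before the proximal steps on the f_i, reflecting the interchanged roles of f and g.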
Remark 19
Notice that relation (83) is derived from v^k \in C^\perp (see (34)), where for every i = 1, ..., m the sequence (v_i^k)_{k \ge 1} is defined by
\[
v_i^k = y_i^k - \gamma z_i^k + \gamma x^{k+1} - \alpha_k \left( y_i^k - y_i^{k-1} + \gamma (z_i^k - z_i^{k-1}) \right) \quad \forall k \ge 1,
\]
and (v_i^k)_{k \ge 1} converges weakly to v_i.

References

[1] F. Alvarez, On the minimizing property of a second order dissipative system in Hilbert spaces, SIAM Journal on Control and Optimization 38(4), 1102-1119, 2000
[2] F. Alvarez, Weak convergence of a relaxed and inertial hybrid projection-proximal point algorithm for maximal monotone operators in Hilbert space, SIAM Journal on Optimization 14(3), 773-782, 2004
[3] F. Alvarez, H. Attouch, An inertial proximal method for maximal monotone operators via discretization of a nonlinear oscillator with damping, Set-Valued Analysis 9, 3-11, 2001
[4] H. Attouch, J. Peypouquet, P. Redont, A dynamical approach to an inertial forward-backward algorithm for convex minimization, SIAM Journal on Optimization 24(1), 232-256, 2014
[5] H.H. Bauschke, P.L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces, CMS Books in Mathematics, Springer, New York, 2011
[6] J.M. Borwein, J.D. Vanderwerff, Convex Functions: Constructions, Characterizations and Counterexamples, Cambridge University Press, Cambridge, 2010
[7] R.I. Boţ, Conjugate Duality in Convex Optimization, Lecture Notes in Economics and Mathematical Systems, Vol. 637, Springer, Berlin Heidelberg, 2010
[8] R.I. Boţ, E.R. Csetnek, Regularity conditions via generalized interiority notions in convex optimization: new achievements and their relation to some classical statements, Optimization 61(1), 35-65, 2012
[9] R.I. Boţ, E.R. Csetnek, An inertial forward-backward-forward primal-dual splitting algorithm for solving monotone inclusion problems, arXiv:1402.5291, 2014
[10] R.I. Boţ, E.R. Csetnek, A. Heinrich, A primal-dual splitting algorithm for finding zeros of sums of maximally monotone operators, SIAM Journal on Optimization 23(4), 2011-2036, 2013
[11] R.I. Boţ, E.R. Csetnek, A. Heinrich, C. Hendrich, On the convergence rate improvement of a primal-dual splitting algorithm for solving monotone inclusion problems, Mathematical Programming, DOI 10.1007/s10107-014-0766-0
[12] R.I. Boţ, E.R. Csetnek, C. Hendrich, Inertial Douglas-Rachford splitting for monotone inclusion problems, arXiv:1403.3330v2, 2014
[13] S. Boyd, N. Parikh, E. Chu, B. Peleato, J. Eckstein, Distributed optimization and statistical learning via the alternating direction method of multipliers, Foundations and Trends in Machine Learning 3(1), 1-122, 2010
[14] L.M. Briceño-Arias, P.L. Combettes, A monotone + skew splitting model for composite monotone inclusions in duality, SIAM Journal on Optimization 21(4), 1230-1250, 2011
[15] A. Cabot, P. Frankel, Asymptotics for some proximal-like method involving inertia and memory aspects, Set-Valued and Variational Analysis 19, 59-74, 2011
[16] J. Eckstein, Augmented Lagrangian and alternating direction methods for convex optimization: a tutorial and some illustrative computational results, Rutcor Research Report 32-2012, 2012
[17] J. Eckstein, D.P. Bertsekas, On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators, Mathematical Programming 55, 293-318, 1992
[18] I. Ekeland, R. Temam, Convex Analysis and Variational Problems, North-Holland Publishing Company, Amsterdam, 1976
[19] E. Esser, Applications of Lagrangian-based alternating direction methods and connections to split Bregman, CAM Reports 09-31, UCLA, Center for Applied Mathematics, 2009
[20] D. Gabay, Applications of the method of multipliers to variational inequalities, in: M. Fortin, R. Glowinski (eds.), Augmented Lagrangian Methods: Applications to the Numerical Solution of Boundary-Value Problems, North-Holland, Amsterdam, 1983
[21] P.-E. Maingé, Convergence theorems for inertial KM-type algorithms, Journal of Computational and Applied Mathematics 219, 223-236, 2008
[22] P.-E. Maingé, A. Moudafi, Convergence of new inertial proximal methods for dc programming, SIAM Journal on Optimization 19(1), 397-413, 2008
[23] A. Moudafi, M. Oliny, Convergence of a splitting inertial proximal method for monotone operators, Journal of Computational and Applied Mathematics 155, 447-454, 2003
[24] R.T. Rockafellar, On the maximal monotonicity of subdifferential mappings, Pacific Journal of Mathematics 33(1), 209-216, 1970
[25] R.T. Rockafellar, Monotone operators and the proximal point algorithm, SIAM Journal on Control and Optimization 14(5), 877-898, 1976
[26] S. Simons, From Hahn-Banach to Monotonicity, Springer, Berlin, 2008
[27] C. Zălinescu, Convex Analysis in General Vector Spaces, World Scientific, Singapore, 2002