A Partially Inexact Alternating Direction Method of Multipliers and its Iteration-Complexity Analysis
Vando A. Adona∗   Max L.N. Gonçalves∗   Jefferson G. Melo∗

May 17, 2018
Abstract
This paper proposes a partially inexact alternating direction method of multipliers for computing approximate solutions of a linearly constrained convex optimization problem. This method allows its first subproblem to be solved inexactly using a relative approximate criterion, whereas a proximal term is added to its second subproblem in order to simplify it. A stepsize parameter is included in the updating rule of the Lagrangian multiplier to improve its computational performance. Pointwise and ergodic iteration-complexity bounds for the proposed method are established. To the best of our knowledge, this is the first time that complexity results for an inexact ADMM with a relative error criterion have been analyzed. Some preliminary numerical experiments are reported to illustrate the advantages of the new method.

2000 Mathematics Subject Classification: 47H05, 49M27, 90C25, 90C60, 65K10.

Key words: alternating direction method of multipliers, relative error criterion, hybrid extragradient method, convex program, pointwise iteration-complexity, ergodic iteration-complexity.
1 Introduction

In this paper, we propose and analyze a partially inexact alternating direction method of multipliers (ADMM) for computing approximate solutions of a linearly constrained convex optimization problem. Recently, there has been growing interest in the ADMM [1, 2] due to its efficiency for solving the aforementioned class of problems; see, for instance, [3] for a complete review.

Many variants of the ADMM have been studied in the literature. Some of these variants include proximal terms in the subproblems of the ADMM in order to make them easier to solve or even to have closed-form solutions. Others add a stepsize parameter in the Lagrangian multiplier update to improve the performance of the method; see, for example, [4, 5, 6, 7, 8, 9, 10, 11, 12] for papers in which one or both of these two strategies are used. Other works focus on inexact versions of the ADMM with different error conditions; for instance, [13, 14, 15] analyzed variants whose subproblems are solved inexactly using relative error criteria. Summable error conditions were also considered in [13, 16]; however, it was observed in [13] that, in general, relative error conditions are more interesting from a computational viewpoint. The aforementioned relative error criteria were derived from the one considered in [17] to study an inexact augmented Lagrangian method. The latter work, in turn, was motivated by [18, 19], where the authors proposed inexact proximal-point type methods based on relative error criteria.

The contributions of this paper are threefold:

(1) to propose an ADMM variant which combines three of the aforementioned strategies.
Namely, (i) the first subproblem of the method is allowed to be solved inexactly in such a way that a relative approximate criterion is satisfied; (ii) a general proximal term is added to the second subproblem; (iii) a stepsize parameter is included in the updating rule of the Lagrangian multiplier;

∗ IME, Universidade Federal de Goiás, Goiânia, GO 74001-970, BR (e-mails: [email protected], [email protected] and [email protected]). The work of these authors was supported in part by CAPES and CNPq Grants 302666/2017-6 and 406975/2016-7.
(2) to provide pointwise and ergodic iteration-complexity bounds for the proposed method;

(3) to illustrate, by means of numerical experiments, the efficiency of the new method for solving some real-life applications.

Iteration-complexity results have been obtained in the literature for most exact ADMM variants. Paper [20] presented an ergodic iteration-complexity analysis of the ADMM. Subsequently, [6] and [7] analyzed ergodic and pointwise iteration-complexities of a partially proximal ADMM, respectively. We refer the reader to [21, 22, 23, 24, 25, 26, 27, 28], where iteration-complexities of other ADMM variants have been considered. The complexity analysis of the present paper is based on showing that the proposed method falls within the setting of a hybrid proximal extragradient (HPE) framework whose iteration-complexity bounds were established in [9]. To the best of our knowledge, this work is the first one to present iteration-complexity results for an inexact ADMM with relative error.

This paper is organized as follows. Section 2 contains some preliminary results and is divided into two subsections. The first subsection presents our notation and basic definitions, while the second one recalls a modified HPE framework and its basic iteration-complexity results. Section 3 introduces the partially inexact proximal ADMM and establishes its iteration-complexity bounds. Section 4 is devoted to the numerical experiments.
2 Preliminaries

This section is divided into two subsections. The first one presents our notation and basic results. The second subsection recalls a modified HPE framework and its iteration-complexity bounds.
2.1 Notation and basic results

This section presents some definitions, notation and basic results used in this paper.

The $p$-norm ($p \ge 1$) and the maximum norm of $z \in \mathbb{R}^n$ are denoted, respectively, by $\|z\|_p = \left(\sum_{i=1}^n |z_i|^p\right)^{1/p}$ and $\|z\|_\infty = \max\{|z_1|, \dots, |z_n|\}$; when $p = 2$, we omit the index $p$. Let $\mathcal{V}$ be a finite-dimensional real vector space with inner product and associated norm denoted by $\langle \cdot,\cdot\rangle$ and $\|\cdot\|$, respectively. For a given self-adjoint positive semidefinite linear operator $Q : \mathcal{V} \to \mathcal{V}$, the seminorm induced by $Q$ on $\mathcal{V}$ is defined by $\|\cdot\|_Q = \langle Q(\cdot),\cdot\rangle^{1/2}$. Since $\langle Q(\cdot),\cdot\rangle$ is symmetric and bilinear, for all $v, \tilde v \in \mathcal{V}$, we have

  $\langle Qv, \tilde v\rangle \le \tfrac12\left(\|v\|_Q^2 + \|\tilde v\|_Q^2\right), \qquad \|v + v'\|_Q^2 \le 2\left(\|v\|_Q^2 + \|v'\|_Q^2\right).$  (1)

Given a set-valued operator $T : \mathcal{V} \rightrightarrows \mathcal{V}$, its domain and graph are defined, respectively, as

  $\mathrm{Dom}\,T = \{v \in \mathcal{V} : T(v) \ne \emptyset\}$ and $\mathrm{Gr}(T) = \{(v,\tilde v) \in \mathcal{V}\times\mathcal{V} : \tilde v \in T(v)\}.$

The operator $T$ is said to be monotone iff $\langle u - v, \tilde u - \tilde v\rangle \ge 0$ for all $(u,\tilde u), (v,\tilde v) \in \mathrm{Gr}(T)$. Moreover, $T$ is maximal monotone iff it is monotone and there is no other monotone operator $S$ such that $\mathrm{Gr}(T) \subsetneq \mathrm{Gr}(S)$. Given a scalar $\varepsilon \ge 0$, the $\varepsilon$-enlargement $T^{[\varepsilon]} : \mathcal{V} \rightrightarrows \mathcal{V}$ of a monotone operator $T : \mathcal{V} \rightrightarrows \mathcal{V}$ is defined as

  $T^{[\varepsilon]}(v) = \{\tilde v \in \mathcal{V} : \langle \tilde v - \tilde u, v - u\rangle \ge -\varepsilon,\ \forall (u,\tilde u) \in \mathrm{Gr}(T)\} \quad \forall v \in \mathcal{V}.$  (2)

The $\varepsilon$-subdifferential of a proper closed convex function $f : \mathcal{V} \to (-\infty,+\infty]$ is defined by

  $\partial_\varepsilon f(v) = \{u \in \mathcal{V} : f(\tilde v) \ge f(v) + \langle u, \tilde v - v\rangle - \varepsilon,\ \forall \tilde v \in \mathcal{V}\} \quad \forall v \in \mathcal{V}.$

When $\varepsilon = 0$, then $\partial_0 f(v)$ is denoted by $\partial f(v)$ and is called the subdifferential of $f$ at $v$. It is well known that the subdifferential operator of a proper closed convex function is maximal monotone [29].

The next result is a consequence of the transportation formula in [30, Theorem 2.3] combined with [31, Proposition 2(i)].

Theorem 2.1.
Suppose $T : \mathcal{V} \rightrightarrows \mathcal{V}$ is maximal monotone and let $\tilde v_i, v_i \in \mathcal{V}$, for $i = 1,\dots,k$, be such that $v_i \in T(\tilde v_i)$, and define

  $\tilde v_k^a = \frac1k\sum_{i=1}^k \tilde v_i, \qquad v_k^a = \frac1k\sum_{i=1}^k v_i, \qquad \varepsilon_k^a = \frac1k\sum_{i=1}^k \langle v_i, \tilde v_i - \tilde v_k^a\rangle.$

Then, the following hold:

(a) $\varepsilon_k^a \ge 0$ and $v_k^a \in T^{[\varepsilon_k^a]}(\tilde v_k^a)$;

(b) if, in addition, $T = \partial f$ for a proper closed and convex function $f$, then $v_k^a \in \partial_{\varepsilon_k^a} f(\tilde v_k^a)$.

2.2 A modified HPE framework

Our problem of interest in this section is the monotone inclusion problem

  $0 \in T(z),$  (3)

where $T : \mathcal{Z} \rightrightarrows \mathcal{Z}$ is a maximal monotone operator and $\mathcal{Z}$ is a finite-dimensional real vector space. We assume that the solution set of (3), denoted by $T^{-1}(0)$, is nonempty. The modified HPE framework for computing approximate solutions of (3) is formally described as follows. This framework was first considered in [9] in a more general setting.

Modified HPE framework
Step 0. Let $z_0 \in \mathcal{Z}$, $\eta_0 \in \mathbb{R}_+$, $\sigma \in [0,1[$ and a self-adjoint positive semidefinite linear operator $M : \mathcal{Z} \to \mathcal{Z}$ be given, and set $k = 1$.

Step 1. Obtain $(z_k, \tilde z_k, \eta_k) \in \mathcal{Z}\times\mathcal{Z}\times\mathbb{R}_+$ such that

  $M(z_{k-1} - z_k) \in T(\tilde z_k), \qquad \|\tilde z_k - z_k\|_M^2 + \eta_k \le \sigma\|\tilde z_k - z_{k-1}\|_M^2 + \eta_{k-1}.$  (4)

Step 2. Set $k \leftarrow k + 1$ and go to Step 1.

Remark 2.2. (i) The modified HPE framework is a generalization of the proximal point method. Indeed, if $M = I$ and $\sigma = \eta_0 = 0$, then (4) implies that $\eta_k = 0$, $z_k = \tilde z_k$ and $0 \in z_k - z_{k-1} + T(z_k)$ for every $k \ge 1$, which corresponds to the proximal point method for solving problem (3). (ii) In Section 3, we propose a partially inexact proximal ADMM and show that it falls within the modified HPE framework setting. In particular, it is specified how the triple $(z_k, \tilde z_k, \eta_k)$ can be computed in this context. It is worth mentioning that the use of a positive semidefinite operator $M$, instead of a positive definite one, is essential in the analysis of Section 3 (see (14)). More examples of algorithms which can be seen as special cases of HPE-type frameworks can be found in [18, 20, 32].

We first present a pointwise iteration-complexity bound for the modified HPE framework, whose proof can be found in [21, Theorem 2.2] (see also [9, Theorem 3.3] for a more general result).
Theorem 2.3.
Let $\{(z_k, \tilde z_k, \eta_k)\}$ be generated by the modified HPE framework. Then, for every $k \ge 1$, we have $M(z_{k-1} - z_k) \in T(\tilde z_k)$ and there exists $i \le k$ such that

  $\|z_{i-1} - z_i\|_M \le \frac{1}{\sqrt k}\sqrt{\frac{(1+\sigma)d_0^2 + 4\eta_0}{1-\sigma}},$

where $d_0 = \inf\{\|z^* - z_0\|_M : z^* \in T^{-1}(0)\}$.

Remark 2.4.
For a given tolerance $\bar\rho > 0$, it follows from Theorem 2.3 that in at most $\mathcal{O}(1/\bar\rho^2)$ iterations, the modified HPE framework computes an approximate solution $\tilde z$ of (3) and a residual $r$ in the sense that $Mr \in T(\tilde z)$ and $\|r\|_M \le \bar\rho$. Although $M$ is assumed to be only positive semidefinite, if $\|r\|_M = 0$, then $M^{1/2}r = 0$ which, in turn, implies that $Mr = 0$. Hence, the latter inclusion implies that $\tilde z$ is a solution of problem (3). Therefore, the aforementioned concept of approximate solution makes sense.
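Remark 2.2(i) above can also be checked numerically. The sketch below, with a hypothetical operator $T(z) = Qz$ for a symmetric positive definite matrix $Q$ (our choice, purely for illustration), runs the framework with $M = I$ and $\sigma = \eta_0 = 0$: Step 1 then reduces to the exact proximal step $z_k = (I + Q)^{-1}z_{k-1}$, the inclusion in (4) holds exactly, and the iterates approach the unique solution $z^* = 0$ of (3).

```python
import numpy as np

# Illustrative instance of Remark 2.2(i): with M = I and sigma = eta_0 = 0,
# the modified HPE framework reduces to the proximal point method
#   0 in z_k - z_{k-1} + T(z_k).
# Here T(z) = Q z with Q symmetric positive definite (hence maximal monotone),
# so each step is z_k = (I + Q)^{-1} z_{k-1}.

Q = np.array([[2.0, 0.0],
              [0.0, 0.5]])
z = np.array([1.0, -1.0])          # z_0

for k in range(200):
    z_prev = z
    z = np.linalg.solve(np.eye(2) + Q, z_prev)    # exact proximal step
    # inclusion in (4): M(z_{k-1} - z_k) = T(z_tilde_k), with z_tilde_k = z_k
    assert np.allclose(z_prev - z, Q @ z)
    # error condition in (4) holds trivially: z_tilde_k = z_k and eta_k = 0

print(np.linalg.norm(z))   # distance to the solution z* = 0 (near machine zero)
```

Since each eigenvalue $\lambda$ of $Q$ contracts the corresponding component by $1/(1+\lambda)$, convergence here is linear, in line with the $\mathcal{O}(1/\bar\rho^2)$ pointwise bound being conservative for this well-conditioned example.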
We now state an ergodic iteration-complexity bound for the modified HPE framework, whose proof can be found in [21, Theorem 2.3] (see also [9, Theorem 3.4] for a more general result).
Theorem 2.5.
Let $\{(z_k, \tilde z_k, \eta_k)\}$ be generated by the modified HPE framework. Consider the ergodic sequence $\{(\tilde z_k^a, r_k^a, \varepsilon_k^a)\}$ defined by

  $\tilde z_k^a = \frac1k\sum_{i=1}^k \tilde z_i, \qquad r_k^a = \frac1k\sum_{i=1}^k (z_{i-1} - z_i), \qquad \varepsilon_k^a = \frac1k\sum_{i=1}^k \langle M(z_{i-1} - z_i), \tilde z_i - \tilde z_k^a\rangle, \qquad \forall k \ge 1.$

Then, for every $k \ge 1$, there hold $\varepsilon_k^a \ge 0$, $Mr_k^a \in T^{[\varepsilon_k^a]}(\tilde z_k^a)$ and

  $\|r_k^a\|_M \le \frac{2\sqrt{d_0^2 + \eta_0}}{k}, \qquad \varepsilon_k^a \le \frac{(3-\sigma)(d_0^2 + \eta_0)}{2(1-\sigma)k},$

where $d_0$ is as defined in Theorem 2.3.

Remark 2.6.
For a given tolerance $\bar\rho > 0$, Theorem 2.5 ensures that in at most $\mathcal{O}(1/\bar\rho)$ iterations of the modified HPE framework, the triple $(\tilde z, r, \varepsilon) := (\tilde z_k^a, r_k^a, \varepsilon_k^a)$ satisfies $Mr \in T^{[\varepsilon]}(\tilde z)$ and $\max\{\|r\|_M, \varepsilon\} \le \bar\rho$. Similarly to Remark 2.4, we see that $\tilde z$ can be interpreted as an approximate solution of (3). Note that the above ergodic complexity bound is better than the pointwise one by a factor of $\mathcal{O}(1/\bar\rho)$; however, the above inclusion is, in general, weaker than that of the pointwise case.

3 Partially inexact proximal ADMM

Consider the following linearly constrained problem

  $\min\{f(x) + g(y) : Ax + By = b\},$  (5)

where $\mathcal{X}$, $\mathcal{Y}$ and $\Gamma$ are finite-dimensional real inner product vector spaces, $f : \mathcal{X} \to \bar{\mathbb{R}}$ and $g : \mathcal{Y} \to \bar{\mathbb{R}}$ are proper, closed and convex functions, $A : \mathcal{X} \to \Gamma$ and $B : \mathcal{Y} \to \Gamma$ are linear operators, and $b \in \Gamma$.

In this section, we propose a partially inexact proximal ADMM for computing approximate solutions of (5) and establish pointwise and ergodic iteration-complexity bounds for it. We begin by formally stating the method.

Partially Inexact Proximal ADMM
Step 0. Let an initial point $(x_0, y_0, \gamma_0) \in \mathcal{X}\times\mathcal{Y}\times\Gamma$, a penalty parameter $\beta > 0$, error tolerance parameters $\tau_1, \tau_2 \in [0,1[$, and a self-adjoint positive semidefinite linear operator $H : \mathcal{Y} \to \mathcal{Y}$ be given. Choose a stepsize parameter

  $\theta \in \left]0,\ \frac{1 - 2\tau_1 + \sqrt{(1-2\tau_1)^2 + 4(1-\tau_1)}}{2(1-\tau_1)}\right[,$  (6)

and set $k = 1$.

Step 1. Compute $(v_k, \tilde x_k) \in \mathcal{X}\times\mathcal{X}$ such that

  $v_k \in \partial f(\tilde x_k) - A^*\tilde\gamma_k, \qquad \|\tilde x_k - x_{k-1} + \beta v_k\|^2 \le \tau_1\|\tilde\gamma_k - \gamma_{k-1}\|^2 + \tau_2\|\tilde x_k - x_{k-1}\|^2,$  (7)

where

  $\tilde\gamma_k = \gamma_{k-1} - \beta(A\tilde x_k + By_{k-1} - b),$  (8)

and compute an optimal solution $y_k \in \mathcal{Y}$ of the subproblem

  $\min_{y\in\mathcal{Y}}\left\{g(y) - \langle\gamma_{k-1}, By\rangle + \frac\beta2\|A\tilde x_k + By - b\|^2 + \frac12\|y - y_{k-1}\|_H^2\right\}.$  (9)

Step 2. Set

  $x_k = x_{k-1} - \beta v_k, \qquad \gamma_k = \gamma_{k-1} - \theta\beta(A\tilde x_k + By_k - b)$  (10)

and $k \leftarrow k + 1$, and go to Step 1.

Remark 3.1. (i) If $\tau_1 = \tau_2 = 0$, then $\tilde x_k = x_k$ due to the inequality in (7) and the first relation in (10). Hence, since $v_k = (x_{k-1} - x_k)/\beta$, the first part of Step 1 is equivalent to computing an exact solution $x_k \in \mathcal{X}$ of the subproblem

  $\min_{x\in\mathcal{X}}\left\{f(x) - \langle\gamma_{k-1}, Ax\rangle + \frac\beta2\|Ax + By_{k-1} - b\|^2 + \frac{1}{2\beta}\|x - x_{k-1}\|^2\right\},$  (11)

and then the partially inexact proximal ADMM becomes the proximal ADMM with stepsize $\theta \in ]0, (1+\sqrt5)/2[$ and proximal terms given by $(1/\beta)I$ and $H$. Therefore, the proposed method can be seen as an extension of the proximal ADMM in which subproblem (11) is solved inexactly using a relative approximate criterion. (ii) Subproblem (9) contains a proximal term defined by a self-adjoint positive semidefinite linear operator $H$ which, appropriately chosen, makes the subproblem easier to solve or even gives it a closed-form solution. For instance, if $H = sI - \beta B^*B$ with $s > \beta\|B\|^2$, subproblem (9) is equivalent to

  $\min_{y\in\mathcal{Y}}\left\{g(y) + \frac s2\|y - \bar y\|^2\right\},$

for some $\bar y \in \mathcal{Y}$, which has a closed-form solution when $g(\cdot) = \|\cdot\|_1$. (iii) The use of a relative approximate criterion in (9) would require, as far as we know, the stepsize parameter $\theta \in ]0,1]$.
However, since, in many applications, the second subproblem (9) is solved exactly and a stepsize parameter $\theta > 1$ accelerates the method, only the first subproblem is assumed here to be solved inexactly. (iv) The partially inexact proximal ADMM is closely related to [13, Algorithm 2]. Indeed, the latter method corresponds to the former one with $H = 0$, $\theta = 1$ and the condition

  $2\beta|\langle\tilde x_k - x_{k-1}, v_k\rangle| + \beta^2\|v_k\|^2 \le \tau_1\|\tilde\gamma_k - \gamma_{k-1}\|^2$  (12)

instead of the inequality in (7). Numerical comparisons between the partially inexact proximal ADMM and Algorithm 2 in [13] will be provided in Section 4.

In the following, we proceed to provide iteration-complexity bounds for the partially inexact proximal ADMM. Our analysis is done by showing that it is an instance of the modified HPE framework for computing approximate solutions of the monotone inclusion problem

  $0 \in T(x,y,\gamma) := \left(\partial f(x) - A^*\gamma\right)\times\left(\partial g(y) - B^*\gamma\right)\times\{Ax + By - b\}.$  (13)

We assume that the solution set of (13), denoted by $\Omega^*$, is nonempty. The iteration-complexity results will follow immediately from Theorems 2.3 and 2.5. Let us now introduce the elements required by the setting of Section 2.2. Namely, consider the vector space $\mathcal{Z} = \mathcal{X}\times\mathcal{Y}\times\Gamma$ and the self-adjoint positive semidefinite linear operator

  $M = \mathrm{diag}\left(\frac1\beta I,\ H + \beta B^*B,\ \frac{1}{\theta\beta}I\right).$  (14)

In this setting, the quantity $d_0$ defined in Theorem 2.3 becomes

  $d_0 = \inf\left\{\|(x - x_0, y - y_0, \gamma - \gamma_0)\|_M : (x,y,\gamma) \in T^{-1}(0)\right\}.$  (15)

We start by presenting a preliminary technical result, which basically shows that a certain sequence generated by the partially inexact proximal ADMM satisfies the inclusion in (4) with $T$ and $M$ as above.

Lemma 3.2.
Consider $(x_k, y_k, \gamma_k)$ and $(\tilde x_k, \tilde\gamma_k)$ generated at the $k$-th iteration of the partially inexact proximal ADMM. Then,

  $\frac1\beta(x_{k-1} - x_k) \in \partial f(\tilde x_k) - A^*\tilde\gamma_k,$  (16)

  $(H + \beta B^*B)(y_{k-1} - y_k) \in \partial g(y_k) - B^*\tilde\gamma_k,$  (17)

  $\frac{1}{\theta\beta}(\gamma_{k-1} - \gamma_k) = A\tilde x_k + By_k - b.$  (18)

As a consequence, $z_k = (x_k, y_k, \gamma_k)$ and $\tilde z_k = (\tilde x_k, y_k, \tilde\gamma_k)$ satisfy the inclusion in (4) with $T$ and $M$ as in (13) and (14), respectively.

Proof. Inclusion (16) follows trivially from the inclusion in (7) and the first relation in (10). Now, from the optimality condition of (9) and the definition of $\tilde\gamma_k$ in (8), we obtain

  $0 \in \partial g(y_k) - B^*\gamma_{k-1} + \beta B^*(A\tilde x_k + By_k - b) + H(y_k - y_{k-1})$
  $\quad = \partial g(y_k) - B^*[\gamma_{k-1} - \beta(A\tilde x_k + By_{k-1} - b)] + \beta B^*B(y_k - y_{k-1}) + H(y_k - y_{k-1})$
  $\quad = \partial g(y_k) - B^*\tilde\gamma_k + \beta B^*B(y_k - y_{k-1}) + H(y_k - y_{k-1}),$

which proves (17). Relation (18) follows immediately from the second relation in (10). To end the proof, note that the last statement of the lemma follows directly from (16)–(18) and the definitions of $T$ and $M$ in (13) and (14), respectively.

The following result presents some relations satisfied by the sequences generated by the partially inexact proximal ADMM. These relations are essential to show that the latter method is an instance of the modified HPE framework.

Lemma 3.3.
Let $\{(x_k, y_k, \gamma_k)\}$ and $\{(\tilde x_k, \tilde\gamma_k)\}$ be generated by the partially inexact proximal ADMM. Then, the following hold:

(a) for any $k \ge 1$, we have

  $\tilde\gamma_k - \gamma_{k-1} = \frac1\theta(\gamma_k - \gamma_{k-1}) + \beta B(y_k - y_{k-1}), \qquad \tilde\gamma_k - \gamma_k = \frac{1-\theta}{\theta}(\gamma_k - \gamma_{k-1}) + \beta B(y_k - y_{k-1});$

(b) we have

  $\frac12\|y_1 - y_0\|_H^2 - \frac{1}{\sqrt\theta}\langle B(y_1 - y_0), \gamma_1 - \gamma_0\rangle \le 2\max\left\{1, \frac{3\theta - 2}{2 - \theta}\right\} d_0^2,$

where $d_0$ is as in (15);

(c) for every $k \ge 2$, we have

  $\frac1\theta\langle\gamma_k - \gamma_{k-1}, B(y_k - y_{k-1})\rangle \ge \frac{1-\theta}{\theta}\langle\gamma_{k-1} - \gamma_{k-2}, B(y_k - y_{k-1})\rangle + \frac12\left(\|y_k - y_{k-1}\|_H^2 - \|y_{k-1} - y_{k-2}\|_H^2\right).$

Proof. (a) The first relation follows by noting that the definitions of $\tilde\gamma_k$ and $\gamma_k$ in (8) and (10), respectively, yield

  $\tilde\gamma_k - \gamma_{k-1} = -\beta(A\tilde x_k + By_{k-1} - b) = \frac1\theta(\gamma_k - \gamma_{k-1}) + \beta B(y_k - y_{k-1}).$

The second relation in (a) follows trivially from the first one.

(b) First, note that

  $0 \le \frac{1}{2\beta}\left\|\frac{1}{\sqrt\theta}(\gamma_1 - \gamma_0) + \beta B(y_1 - y_0)\right\|^2 = \frac{1}{2\theta\beta}\|\gamma_1 - \gamma_0\|^2 + \frac{1}{\sqrt\theta}\langle B(y_1 - y_0), \gamma_1 - \gamma_0\rangle + \frac\beta2\|B(y_1 - y_0)\|^2,$

which, for every $z^* = (x^*, y^*, \gamma^*) \in \Omega^*$, yields

  $\frac12\|y_1 - y_0\|_H^2 - \frac{1}{\sqrt\theta}\langle B(y_1 - y_0), \gamma_1 - \gamma_0\rangle \le \frac12\left(\|y_1 - y_0\|_H^2 + \frac{1}{\theta\beta}\|\gamma_1 - \gamma_0\|^2 + \beta\|B(y_1 - y_0)\|^2\right)$
  $\quad \le \|y_1 - y^*\|_H^2 + \|y_0 - y^*\|_H^2 + \frac{1}{\theta\beta}\|\gamma_1 - \gamma^*\|^2 + \frac{1}{\theta\beta}\|\gamma_0 - \gamma^*\|^2 + \beta\|B(y_1 - y^*)\|^2 + \beta\|B(y_0 - y^*)\|^2,$

and hence

  $\frac12\|y_1 - y_0\|_H^2 - \frac{1}{\sqrt\theta}\langle B(y_1 - y_0), \gamma_1 - \gamma_0\rangle \le \|z_1 - z^*\|_M^2 + \|z_0 - z^*\|_M^2,$  (19)

where $z_0 = (x_0, y_0, \gamma_0)$ and $z_1 = (x_1, y_1, \gamma_1)$. On the other hand, from Lemma 3.2 with $k = 1$, we have $M(z_0 - z_1) \in T(\tilde z_1)$, where $\tilde z_1 = (\tilde x_1, y_1, \tilde\gamma_1)$ and $T$ is as in (13). Using this fact and the monotonicity of $T$, we obtain $\langle\tilde z_1 - z^*, M(z_0 - z_1)\rangle \ge 0$ for all $z^* = (x^*, y^*, \gamma^*) \in \Omega^*$. Hence,

  $\|z^* - z_0\|_M^2 - \|z^* - z_1\|_M^2 = \|\tilde z_1 - z_1\|_M^2 - \|\tilde z_1 - z_0\|_M^2 + 2\langle\tilde z_1 - z^*, M(z_0 - z_1)\rangle \ge \|\tilde z_1 - z_1\|_M^2 - \|\tilde z_1 - z_0\|_M^2.$  (20)

It follows from (14), item (a), and some direct calculations that

  $\|\tilde z_1 - z_1\|_M^2 = \frac1\beta\|\tilde x_1 - x_1\|^2 + \frac{1}{\theta\beta}\left\|\frac{1-\theta}{\theta}(\gamma_1 - \gamma_0) + \beta B(y_1 - y_0)\right\|^2$
  $\quad = \frac1\beta\|\tilde x_1 - x_1\|^2 + \frac{(1-\theta)^2}{\beta\theta^3}\|\gamma_1 - \gamma_0\|^2 + \frac{2(1-\theta)}{\theta^2}\langle B(y_1 - y_0), \gamma_1 - \gamma_0\rangle + \frac\beta\theta\|B(y_1 - y_0)\|^2.$  (21)

Moreover, (14) and item (a) also yield

  $\|\tilde z_1 - z_0\|_M^2 = \frac1\beta\|\tilde x_1 - x_0\|^2 + \|y_1 - y_0\|^2_{H+\beta B^*B} + \frac{1}{\theta\beta}\|\tilde\gamma_1 - \gamma_0\|^2$
  $\quad \ge \frac1\beta\|\tilde x_1 - x_0\|^2 + \beta\|B(y_1 - y_0)\|^2 + \frac{\tau_1}{\beta}\|\tilde\gamma_1 - \gamma_0\|^2 + \frac{1 - \tau_1\theta}{\theta\beta}\left\|\frac1\theta(\gamma_1 - \gamma_0) + \beta B(y_1 - y_0)\right\|^2$
  $\quad = \frac1\beta\|\tilde x_1 - x_0\|^2 + \frac{\tau_1}{\beta}\|\tilde\gamma_1 - \gamma_0\|^2 + \frac{[1 + (1-\tau_1)\theta]\beta}{\theta}\|B(y_1 - y_0)\|^2 + \frac{1 - \tau_1\theta}{\beta\theta^3}\|\gamma_1 - \gamma_0\|^2 + \frac{2(1 - \tau_1\theta)}{\theta^2}\langle B(y_1 - y_0), \gamma_1 - \gamma_0\rangle.$  (22)

Combining the above two conclusions, we obtain

  $\|\tilde z_1 - z_0\|_M^2 - \|\tilde z_1 - z_1\|_M^2 \ge \frac1\beta\left(\|\tilde x_1 - x_0\|^2 - \|\tilde x_1 - x_1\|^2 + \tau_1\|\tilde\gamma_1 - \gamma_0\|^2\right) + (1-\tau_1)\beta\|B(y_1 - y_0)\|^2 + \frac{2-\theta-\tau_1}{\beta\theta^2}\|\gamma_1 - \gamma_0\|^2 + \frac{2(1-\tau_1)}{\theta}\langle B(y_1 - y_0), \gamma_1 - \gamma_0\rangle.$  (23)

Now, note that the inequality in (7) with $k = 1$ and the definition of $x_1$ in (10) imply that

  $0 \le \tau_2\|\tilde x_1 - x_0\|^2 - \|\tilde x_1 - x_1\|^2 + \tau_1\|\tilde\gamma_1 - \gamma_0\|^2,$

which, combined with (23) and $\tau_2 \in [0,1[$, yields

  $\|\tilde z_1 - z_0\|_M^2 - \|\tilde z_1 - z_1\|_M^2 \ge (1-\tau_1)\beta\|B(y_1 - y_0)\|^2 + \frac{2-\theta-\tau_1}{\beta\theta^2}\|\gamma_1 - \gamma_0\|^2 + \frac{2(1-\tau_1)}{\theta}\langle B(y_1 - y_0), \gamma_1 - \gamma_0\rangle$
  $\quad = \frac{1-\theta}{\beta\theta^2}\|\gamma_1 - \gamma_0\|^2 + (1-\tau_1)\left\|\sqrt\beta\,B(y_1 - y_0) + \frac{1}{\theta\sqrt\beta}(\gamma_1 - \gamma_0)\right\|^2 \ge \frac{1-\theta}{\beta\theta^2}\|\gamma_1 - \gamma_0\|^2.$

Hence, if $\theta \in ]0,1]$, then we have

  $\|\tilde z_1 - z_1\|_M^2 \le \|\tilde z_1 - z_0\|_M^2.$  (24)

Now, if $\theta > 1$, then we have

  $\|\tilde z_1 - z_1\|_M^2 - \|\tilde z_1 - z_0\|_M^2 \le \frac{\theta-1}{\beta\theta^2}\|\gamma_1 - \gamma_0\|^2 \le \frac{2(\theta-1)}{\theta}\left(\frac{1}{\beta\theta}\|\gamma_1 - \gamma^*\|^2 + \frac{1}{\beta\theta}\|\gamma_0 - \gamma^*\|^2\right) \le \frac{2(\theta-1)}{\theta}\left[\|z_1 - z^*\|_M^2 + \|z_0 - z^*\|_M^2\right],$

where the second inequality is due to the second property in (1), and the last inequality is due to (14) and the definitions of $z_0$, $z_1$ and $z^*$. Hence, combining the last estimate with (20), we obtain

  $\|z_1 - z^*\|_M^2 \le \frac{3\theta - 2}{2 - \theta}\|z_0 - z^*\|_M^2.$

Thus, it follows from (20), (24) and the last inequality that

  $\|z_1 - z^*\|_M^2 \le \max\left\{1, \frac{3\theta - 2}{2 - \theta}\right\}\|z_0 - z^*\|_M^2.$  (25)

Therefore, the desired inequality follows from (19), (25) and the definition of $d_0$ in (15).

(c) From the optimality condition for (9), the definition of $\tilde\gamma_k$ in (8) and item (a), we have, for every $k \ge 1$,

  $\partial g(y_k) \ni B^*\left(\tilde\gamma_k - \beta B(y_k - y_{k-1})\right) - H(y_k - y_{k-1}) = \frac1\theta B^*(\gamma_k - (1-\theta)\gamma_{k-1}) - H(y_k - y_{k-1}).$

For any $k \ge 2$, using the above inclusion with $k \leftarrow k$ and $k \leftarrow k-1$ and the monotonicity of $\partial g$, we obtain

  $\frac1\theta\langle B^*(\gamma_k - \gamma_{k-1}) - (1-\theta)B^*(\gamma_{k-1} - \gamma_{k-2}),\ y_k - y_{k-1}\rangle \ge \langle H(y_k - y_{k-1}), y_k - y_{k-1}\rangle - \langle H(y_{k-1} - y_{k-2}), y_k - y_{k-1}\rangle$
  $\quad \ge \frac12\left(\|y_k - y_{k-1}\|_H^2 - \|y_{k-1} - y_{k-2}\|_H^2\right),$

where the last inequality is due to the first property in (1), and so the proof of the lemma follows.

We next consider a technical result.

Lemma 3.4.
Let the scalars $\tau_1$, $\tau_2$ and $\theta$ be as in Step 0 of the partially inexact proximal ADMM. Then, there exists a scalar $\sigma \in [\tau_2, 1[$ such that the matrix

  $G = \begin{pmatrix} \sigma - 1 + (\sigma - \tau_1)\theta & (1-\theta)\,[\sigma - 1 + (1-\tau_1)\theta] \\ (1-\theta)\,[\sigma - 1 + (1-\tau_1)\theta] & \sigma - 1 + (2 - \theta - \tau_1)\theta \end{pmatrix}$  (26)

is positive semidefinite.

Proof. Note that the matrix $G$ in (26) with $\sigma = 1$ reduces to

  $\theta\begin{pmatrix} 1 - \tau_1 & (1-\theta)(1-\tau_1) \\ (1-\theta)(1-\tau_1) & 2 - \theta - \tau_1 \end{pmatrix}.$

Using (6) and $\tau_1, \tau_2 \in [0,1[$, it can be verified that the above matrix is positive definite. Hence, we conclude that there exists $\hat\sigma \in [0,1[$ such that $G$ is positive semidefinite for all $\sigma \in [\hat\sigma, 1[$. Therefore, the lemma follows by taking $\sigma = \max\{\tau_2, \hat\sigma\}$.

In the following, we show that the partially inexact proximal ADMM can be regarded as an instance of the modified HPE framework.

Proposition 3.5.
Let $\{(x_k, y_k, \gamma_k)\}$ and $\{(\tilde x_k, \tilde\gamma_k)\}$ be generated by the partially inexact proximal ADMM. Let also $T$, $M$ and $d_0$ be as in (13), (14) and (15), respectively. Define $z_0 = (x_0, y_0, \gamma_0)$,

  $\mu = \frac{4[\sigma - 1 + (1-\tau_1)\theta]}{\theta^{3/2}}\max\left\{1, \frac{3\theta - 2}{2 - \theta}\right\}, \qquad \eta_0 = \mu d_0^2,$  (27)

and, for all $k \ge 1$,

  $z_k = (x_k, y_k, \gamma_k), \qquad \tilde z_k = (\tilde x_k, y_k, \tilde\gamma_k),$  (28)

  $\eta_k = \frac{\sigma - 1 + (2-\theta-\tau_1)\theta}{\beta\theta^3}\|\gamma_k - \gamma_{k-1}\|^2 + \frac{\sigma - 1 + (1-\tau_1)\theta}{\theta}\|y_k - y_{k-1}\|_H^2,$  (29)

where $\sigma \in [\tau_2, 1[$ is given by Lemma 3.4. Then, $(z_k, \tilde z_k, \eta_k)$ satisfies the error condition in (4) for every $k \ge 1$. As a consequence, the partially inexact proximal ADMM is an instance of the modified HPE framework.

Proof. First of all, since the matrix $G$ in (26) is positive semidefinite and $\sigma \in [\tau_2, 1[$, we have

  $[\sigma - 1 + (1-\tau_1)\theta] \ge [\sigma - 1 + (\sigma-\tau_1)\theta] = G_{11} \ge 0.$  (30)

Now, using (14) and the definitions of $\{z_k\}$ and $\{\tilde z_k\}$ in (28), we obtain

  $\|\tilde z_k - z_{k-1}\|_M^2 = \frac1\beta\|\tilde x_k - x_{k-1}\|^2 + \|y_k - y_{k-1}\|_H^2 + \beta\|B(y_k - y_{k-1})\|^2 + \frac{1}{\beta\theta}\|\tilde\gamma_k - \gamma_{k-1}\|^2$

and

  $\|\tilde z_k - z_k\|_M^2 = \frac1\beta\|\tilde x_k - x_k\|^2 + \frac{1}{\beta\theta}\|\tilde\gamma_k - \gamma_k\|^2.$

Hence,

  $\sigma\|\tilde z_k - z_{k-1}\|_M^2 - \|\tilde z_k - z_k\|_M^2 = \frac1\beta\left(\sigma\|\tilde x_k - x_{k-1}\|^2 - \|\tilde x_k - x_k\|^2 + \tau_1\|\tilde\gamma_k - \gamma_{k-1}\|^2\right) + \sigma\|y_k - y_{k-1}\|_H^2 + \sigma\beta\|B(y_k - y_{k-1})\|^2 + \frac{\sigma - \tau_1\theta}{\beta\theta}\|\tilde\gamma_k - \gamma_{k-1}\|^2 - \frac{1}{\beta\theta}\|\tilde\gamma_k - \gamma_k\|^2.$  (31)

Note that the inequality in (7) and the definition of $x_k$ in (10) imply that

  $0 \le \tau_2\|\tilde x_k - x_{k-1}\|^2 - \|\tilde x_k - x_k\|^2 + \tau_1\|\tilde\gamma_k - \gamma_{k-1}\|^2,$

which, combined with (31) and the fact that $\sigma \ge \tau_2$, yields

  $\sigma\|\tilde z_k - z_{k-1}\|_M^2 - \|\tilde z_k - z_k\|_M^2 \ge \sigma\|y_k - y_{k-1}\|_H^2 + \sigma\beta\|B(y_k - y_{k-1})\|^2 + \frac{\sigma - \tau_1\theta}{\beta\theta}\|\tilde\gamma_k - \gamma_{k-1}\|^2 - \frac{1}{\beta\theta}\|\tilde\gamma_k - \gamma_k\|^2.$  (32)

On the other hand, it follows from Lemma 3.3(a) that

  $\frac{\sigma - \tau_1\theta}{\beta\theta}\|\tilde\gamma_k - \gamma_{k-1}\|^2 - \frac{1}{\beta\theta}\|\tilde\gamma_k - \gamma_k\|^2 = \frac{\sigma - \tau_1\theta}{\beta\theta}\left\|\frac1\theta(\gamma_k - \gamma_{k-1}) + \beta B(y_k - y_{k-1})\right\|^2 - \frac{1}{\beta\theta}\left\|\frac{1-\theta}{\theta}(\gamma_k - \gamma_{k-1}) + \beta B(y_k - y_{k-1})\right\|^2$
  $\quad = \frac{\sigma - 1 + (2-\theta-\tau_1)\theta}{\beta\theta^3}\|\gamma_k - \gamma_{k-1}\|^2 + \frac{(\sigma - 1 - \tau_1\theta)\beta}{\theta}\|B(y_k - y_{k-1})\|^2 + \frac{2[\sigma - 1 + (1-\tau_1)\theta]}{\theta^2}\langle\gamma_k - \gamma_{k-1}, B(y_k - y_{k-1})\rangle.$

Combining the last identity with (32), we obtain

  $\sigma\|\tilde z_k - z_{k-1}\|_M^2 - \|\tilde z_k - z_k\|_M^2 \ge \sigma\|y_k - y_{k-1}\|_H^2 + \frac{\sigma - 1 + (2-\theta-\tau_1)\theta}{\beta\theta^3}\|\gamma_k - \gamma_{k-1}\|^2 + \frac{[\sigma - 1 + (\sigma-\tau_1)\theta]\beta}{\theta}\|B(y_k - y_{k-1})\|^2 + \frac{2[\sigma - 1 + (1-\tau_1)\theta]}{\theta^2}\langle\gamma_k - \gamma_{k-1}, B(y_k - y_{k-1})\rangle.$  (33)

We will now consider two cases: $k = 1$ and $k > 1$.

Case 1 ($k = 1$): It follows from (33) with $k = 1$, (30) and Lemma 3.3(b) that

  $\sigma\|\tilde z_1 - z_0\|_M^2 - \|\tilde z_1 - z_1\|_M^2 \ge \frac{\sigma - 1 + (2-\theta-\tau_1)\theta}{\beta\theta^3}\|\gamma_1 - \gamma_0\|^2 + \frac{[\sigma - 1 + (\sigma-\tau_1)\theta]\beta}{\theta}\|B(y_1 - y_0)\|^2 + \frac{\sigma - 1 + (1-\tau_1)\theta + \sigma\theta^{3/2}}{\theta^{3/2}}\|y_1 - y_0\|_H^2 - \frac{4[\sigma - 1 + (1-\tau_1)\theta]}{\theta^{3/2}}\max\left\{1, \frac{3\theta - 2}{2 - \theta}\right\}d_0^2,$

which, combined with the definitions of $\eta_0$ and $\eta_1$, yields

  $\sigma\|\tilde z_1 - z_0\|_M^2 - \|\tilde z_1 - z_1\|_M^2 + \eta_0 - \eta_1 \ge \frac{[\sigma - 1 + (\sigma-\tau_1)\theta]}{\theta}\left(\beta\|B(y_1 - y_0)\|^2 + \frac{1}{\sqrt\theta}\|y_1 - y_0\|_H^2\right) + \frac{(1-\sigma)(1 + \sqrt\theta - \theta) + \tau_1\theta}{\theta}\|y_1 - y_0\|_H^2.$  (34)

Using (6), we have $\theta \in ]0, (1+\sqrt5)/2[$, which in turn implies that $1 + \sqrt\theta - \theta \ge 0$. Hence, inequality (4) with $k = 1$ follows from (30), (34) and the fact that $\sigma < 1$.

Case 2 ($k > 1$): It follows from (33), (30) and Lemma 3.3(c) that

  $\sigma\|\tilde z_k - z_{k-1}\|_M^2 - \|\tilde z_k - z_k\|_M^2 \ge \frac{\sigma - 1 + (2-\theta-\tau_1)\theta}{\beta\theta^3}\|\gamma_k - \gamma_{k-1}\|^2 + \frac{[\sigma - 1 + (\sigma-\tau_1)\theta]\beta}{\theta}\|B(y_k - y_{k-1})\|^2 + \frac{2(1-\theta)[\sigma - 1 + (1-\tau_1)\theta]}{\theta^2}\langle\gamma_{k-1} - \gamma_{k-2}, B(y_k - y_{k-1})\rangle + \frac{\sigma - 1 + (1-\tau_1)\theta}{\theta}\left(\|y_k - y_{k-1}\|_H^2 - \|y_{k-1} - y_{k-2}\|_H^2\right),$

which, combined with the definition of $\{\eta_k\}$ in (29), yields

  $\sigma\|\tilde z_k - z_{k-1}\|_M^2 - \|\tilde z_k - z_k\|_M^2 + \eta_{k-1} - \eta_k \ge \frac{[\sigma - 1 + (\sigma-\tau_1)\theta]\beta}{\theta}\|B(y_k - y_{k-1})\|^2 + \frac{\sigma - 1 + (2-\theta-\tau_1)\theta}{\beta\theta^3}\|\gamma_{k-1} - \gamma_{k-2}\|^2 + \frac{2(1-\theta)[\sigma - 1 + (1-\tau_1)\theta]}{\theta^2}\langle\gamma_{k-1} - \gamma_{k-2}, B(y_k - y_{k-1})\rangle$
  $\quad = \frac1\theta\left\langle G\,w_k,\ w_k\right\rangle, \qquad w_k := \begin{pmatrix}\sqrt\beta\,B(y_k - y_{k-1}) \\ (\gamma_{k-1} - \gamma_{k-2})/(\theta\sqrt\beta)\end{pmatrix},$

where $G$ is as in (26). Therefore, since $G$ is positive semidefinite (see Lemma 3.4), we conclude that inequality (4) also holds for $k > 1$. To end the proof, note that the last statement of the proposition follows trivially from the first one and Lemma 3.2.

We are now ready to present the main results of this paper, namely, the pointwise and ergodic iteration-complexity bounds for the partially inexact proximal ADMM.

Theorem 3.6.
Consider the sequences $\{(x_k, y_k, \gamma_k)\}$ and $\{(\tilde x_k, \tilde\gamma_k)\}$ generated by the partially inexact proximal ADMM. Then, for every $k \ge 1$,

  $\left(\frac1\beta(x_{k-1} - x_k),\ (H + \beta B^*B)(y_{k-1} - y_k),\ \frac{1}{\beta\theta}(\gamma_{k-1} - \gamma_k)\right) \in \left(\partial f(\tilde x_k) - A^*\tilde\gamma_k\right)\times\left(\partial g(y_k) - B^*\tilde\gamma_k\right)\times\{A\tilde x_k + By_k - b\}$  (35)

and there exist $\sigma \in ]0,1[$ and $i \le k$ such that

  $\left(\frac1\beta\|x_{i-1} - x_i\|^2 + \|y_{i-1} - y_i\|^2_{H+\beta B^*B} + \frac{1}{\beta\theta}\|\gamma_{i-1} - \gamma_i\|^2\right)^{1/2} \le \frac{d_0}{\sqrt k}\sqrt{\frac{1 + \sigma + 4\mu}{1 - \sigma}},$

where $d_0$ and $\mu$ are as in (15) and (27), respectively.

Proof. This result follows by combining Proposition 3.5 and Theorem 2.3.
Remark 3.7.
For a given tolerance $\bar\rho > 0$, Theorem 3.6 ensures that in at most $\mathcal{O}(1/\bar\rho^2)$ iterations, the partially inexact proximal ADMM provides an approximate solution $\tilde z := (\tilde x, y, \tilde\gamma)$ of (13) together with a residual $r := (r_x, r_y, r_\gamma)$ in the sense that

  $\frac1\beta r_x \in \partial f(\tilde x) - A^*\tilde\gamma, \qquad (H + \beta B^*B)r_y \in \partial g(y) - B^*\tilde\gamma, \qquad \frac{1}{\beta\theta}r_\gamma = A\tilde x + By - b, \qquad \|(r_x, r_y, r_\gamma)\|_M \le \bar\rho,$

where $M$ is as in (14). Note that the above relations are equivalent to $Mr \in T(\tilde z)$ and $\|r\|_M \le \bar\rho$ with $T$ as in (13).

Theorem 3.8.
Let the sequences $\{(x_k, y_k, \gamma_k)\}$ and $\{(\tilde x_k, \tilde\gamma_k)\}$ be generated by the partially inexact proximal ADMM. Consider the ergodic sequences $\{(x_k^a, y_k^a, \gamma_k^a)\}$, $\{(\tilde x_k^a, \tilde\gamma_k^a)\}$, $\{(r^a_{k,x}, r^a_{k,y}, r^a_{k,\gamma})\}$ and $\{(\varepsilon^a_{k,x}, \varepsilon^a_{k,y})\}$ defined by

  $(x_k^a, y_k^a, \gamma_k^a) = \frac1k\sum_{i=1}^k (x_i, y_i, \gamma_i), \qquad (\tilde x_k^a, \tilde\gamma_k^a) = \frac1k\sum_{i=1}^k (\tilde x_i, \tilde\gamma_i), \qquad (r^a_{k,x}, r^a_{k,y}, r^a_{k,\gamma}) = \frac1k\sum_{i=1}^k (r_{i,x}, r_{i,y}, r_{i,\gamma}),$  (36)

  $(\varepsilon^a_{k,x}, \varepsilon^a_{k,y}) = \frac1k\sum_{i=1}^k \left(\langle r_{i,x}/\beta + A^*\tilde\gamma_i,\ \tilde x_i - \tilde x_k^a\rangle,\ \langle (H + \beta B^*B)r_{i,y} + B^*\tilde\gamma_i,\ y_i - y_k^a\rangle\right),$  (37)

where

  $(r_{i,x}, r_{i,y}, r_{i,\gamma}) = (x_{i-1} - x_i,\ y_{i-1} - y_i,\ \gamma_{i-1} - \gamma_i).$  (38)

Then, for every $k \ge 1$, we have $\varepsilon^a_{k,x}, \varepsilon^a_{k,y} \ge 0$,

  $\left(\frac1\beta r^a_{k,x},\ (H + \beta B^*B)r^a_{k,y},\ \frac{1}{\beta\theta}r^a_{k,\gamma}\right) \in \left(\partial_{\varepsilon^a_{k,x}} f(\tilde x_k^a) - A^*\tilde\gamma_k^a\right)\times\left(\partial_{\varepsilon^a_{k,y}} g(y_k^a) - B^*\tilde\gamma_k^a\right)\times\{A\tilde x_k^a + By_k^a - b\},$  (39)

and there exists $\sigma \in ]0,1[$ such that

  $\left(\frac1\beta\|r^a_{k,x}\|^2 + \|r^a_{k,y}\|^2_{H+\beta B^*B} + \frac{1}{\beta\theta}\|r^a_{k,\gamma}\|^2\right)^{1/2} \le \frac{2\sqrt{1+\mu}\,d_0}{k}$  (40)

and

  $\varepsilon^a_{k,x} + \varepsilon^a_{k,y} \le \frac{(1+\mu)(3-\sigma)d_0^2}{2(1-\sigma)k},$  (41)

where $d_0$ and $\mu$ are as in (15) and (27), respectively.

Proof. By combining Proposition 3.5 and Theorem 2.5, we conclude that inequality (40) holds, and

  $\varepsilon_k^a \le \frac{(1+\mu)(3-\sigma)d_0^2}{2(1-\sigma)k},$  (42)

where

  $\varepsilon_k^a = \frac1k\sum_{i=1}^k \left(\langle r_{i,x}/\beta,\ \tilde x_i - \tilde x_k^a\rangle + \langle (H + \beta B^*B)r_{i,y},\ y_i - y_k^a\rangle + \langle r_{i,\gamma}/(\theta\beta),\ \tilde\gamma_i - \tilde\gamma_k^a\rangle\right).$  (43)

On the other hand, (18), (36) and (38) yield

  $A\tilde x_k + By_k = \frac{1}{\theta\beta}r_{k,\gamma} + b, \qquad A\tilde x_k^a + By_k^a = \frac{1}{\theta\beta}r^a_{k,\gamma} + b.$

Additionally, it follows from the definitions of $r_{i,\gamma}$ and $r^a_{k,\gamma}$ that

  $\frac1k\sum_{i=1}^k \langle\tilde\gamma_i,\ r_{i,\gamma} - r^a_{k,\gamma}\rangle = \frac1k\sum_{i=1}^k \langle\tilde\gamma_i - \tilde\gamma_k^a,\ r_{i,\gamma} - r^a_{k,\gamma}\rangle = \frac1k\sum_{i=1}^k \langle\tilde\gamma_i - \tilde\gamma_k^a,\ r_{i,\gamma}\rangle.$

Hence, combining the identity in (43) with the last two relations, we have

  $\varepsilon_k^a = \frac1k\sum_{i=1}^k \left(\langle r_{i,x}/\beta,\ \tilde x_i - \tilde x_k^a\rangle + \langle (H + \beta B^*B)r_{i,y},\ y_i - y_k^a\rangle\right) + \frac1k\sum_{i=1}^k \left\langle\tilde\gamma_i,\ (r_{i,\gamma} - r^a_{k,\gamma})/(\theta\beta)\right\rangle$
  $\quad = \frac1k\sum_{i=1}^k \left(\langle r_{i,x}/\beta,\ \tilde x_i - \tilde x_k^a\rangle + \langle (H + \beta B^*B)r_{i,y},\ y_i - y_k^a\rangle + \langle\tilde\gamma_i,\ A\tilde x_i - A\tilde x_k^a + By_i - By_k^a\rangle\right)$
  $\quad = \frac1k\sum_{i=1}^k \langle r_{i,x}/\beta + A^*\tilde\gamma_i,\ \tilde x_i - \tilde x_k^a\rangle + \frac1k\sum_{i=1}^k \langle (H + \beta B^*B)r_{i,y} + B^*\tilde\gamma_i,\ y_i - y_k^a\rangle = \varepsilon^a_{k,x} + \varepsilon^a_{k,y},$

where the last equality is due to the definitions of $\varepsilon^a_{k,x}$ and $\varepsilon^a_{k,y}$ in (37). Therefore, the inequality in (41) follows trivially from the last equality and (42).

To finish the proof, note that direct use of Theorem 2.1(b) (for $f$ and $g$), together with (35)–(38), gives $\varepsilon^a_{k,x}, \varepsilon^a_{k,y} \ge 0$ and the inclusion in (39).

Remark 3.9.
For a given tolerance $\bar\rho > 0$, Theorem 3.8 ensures that in at most $\mathcal{O}(1/\bar\rho)$ iterations, the partially inexact proximal ADMM provides, in the ergodic sense, an approximate solution $\tilde z := (\tilde x^a, y^a, \tilde\gamma^a)$ of (13) together with residuals $r := (r^a_x, r^a_y, r^a_\gamma)$ and $(\varepsilon^a_x, \varepsilon^a_y)$ such that

  $\frac1\beta r^a_x \in \partial_{\varepsilon^a_x} f(\tilde x^a) - A^*\tilde\gamma^a, \qquad (H + \beta B^*B)r^a_y \in \partial_{\varepsilon^a_y} g(y^a) - B^*\tilde\gamma^a, \qquad \frac{1}{\beta\theta}r^a_\gamma = A\tilde x^a + By^a - b, \qquad \|(r^a_x, r^a_y, r^a_\gamma)\|_M \le \bar\rho,$

where $M$ is as in (14). The above ergodic complexity bound is better than the pointwise one by a factor of $\mathcal{O}(1/\bar\rho)$; however, the above inclusion is, in general, weaker than that of the pointwise case due to the $\varepsilon$-subdifferentials of $f$ and $g$ appearing instead of the subdifferentials.

4 Numerical experiments

In this section, we report some numerical experiments to illustrate the performance of the partially inexact proximal ADMM (PIP-ADMM) on two classes of problems, namely, LASSO and $\ell_1$-regularized logistic regression. Our main goal is to show that, in some applications, the method performs better with a stepsize parameter $\theta > 1$ instead of the choice $\theta = 1$ considered in the related literature. Similarly to [13, 14], we also used a hybrid inner stopping criterion for the PIP-ADMM, i.e., the inner loop terminates when $v_k$ satisfies either the inequality in (7) or $\|v_k\| \le 10^{-…}$. This strategy is motivated by the fact that, close to approximate solutions, the former condition seems to be more restrictive than the latter. We set $\tau_1 = 0.…\,(1 + \theta - \theta^2)/(\theta(2-\theta))$, $\tau_2 = 1 - 10^{-…}$ and $H = 0$. For comparison purposes, we also ran [13, Algorithm 2], denoted here by relerr-ADMM; see Remark 3.1(iv) for more details on the relationship between the PIP-ADMM and the relerr-ADMM. As suggested in [13], the error tolerance parameter $\tau_1$ in (12) was taken equal to $0.…$
For all tests, both algorithms used the initial point (x₀, y₀, γ₀) = (0, 0, 0) and the penalty parameter β = 1, and stopped when ‖(x_k − x_{k−1}, y_k − y_{k−1}, γ_k − γ_{k−1})‖_M fell below a fixed stopping tolerance, where M is as in (14). The computational results were obtained using MATLAB R2015a on a 2.4GHz Intel(R) Core i7 computer with 8GB of RAM.

4.1 LASSO Problem

We consider approximately solving the LASSO problem [33, 34]

    min_{x ∈ ℝⁿ}  (1/2)‖Cx − d‖² + δ‖x‖₁,

where C ∈ ℝ^{m×n}, d ∈ ℝᵐ, and δ is a regularization parameter, set as a fixed fraction of ‖C^*d‖_∞. By introducing a new variable, we can rewrite the above problem as

    min { (1/2)‖Cx − d‖² + δ‖y‖₁ : y − x = 0, x ∈ ℝⁿ, y ∈ ℝⁿ }.   (44)

Obviously, (44) is an instance of (5) with f(x) = (1/2)‖Cx − d‖², g(y) = δ‖y‖₁, A = −I, B = I, and b = 0. Note that, in this case, the pair (x̃_k, ṽ_k) in (7) can be obtained by computing an approximate solution x̃_k, with residual ṽ_k, of the linear system

    (C^*C + βI) x = C^*d + βy_{k−1} − γ_{k−1}.

To approximately solve this linear system, we used the conjugate gradient method [35] with starting point C^*d + βy_{k−1} − γ_{k−1}. Note also that subproblem (9) has the closed-form solution

    y_k = shrinkage_{δ/β}(x̃_k + γ_{k−1}/β),

where the shrinkage operator is defined as

    shrinkage_κ : ℝⁿ → ℝⁿ,  (shrinkage_κ(a))_i = sign(a_i) max(0, |a_i| − κ),  i = 1, …, n,   (45)

with sign(·) denoting the sign function.

We first tested the methods on three randomly generated LASSO instances. For a given dimension m × n, we generated a random matrix C and scaled its columns to have unit ℓ₂-norm. The vector d ∈ ℝᵐ was chosen as d = Cx₀ plus small random noise, where the sparse vector x₀ ∈ ℝⁿ (with density about 100/n) and the noise vector were also generated randomly.

Table 1: Performance of the relerr-ADMM and the PIP-ADMM on 3 randomly generated LASSO problems (stepsizes 1 < θ₁ < θ₂).

    Instance   relerr-ADMM        PIP-ADMM (θ = 1)   PIP-ADMM (θ = θ₁)  PIP-ADMM (θ = θ₂)
               Out  Inner  Time   Out  Inner  Time   Out  Inner  Time   Out  Inner  Time
    1          26   195    11.1   26   195    10.2   22   169    8.8    19   172    7.9
    2          26   193    22.7   26   193    20.9   21   155    20.9   19   169    17.9
    3          25   185    40.9   25   185    36.7   21   158    34.0   18   159    29.3
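For the LASSO instance above, one iteration of the scheme (inexact conjugate-gradient x-step, shrinkage y-step, multiplier update with stepsize θ) can be sketched as follows. This is a simplified illustration, not the authors' implementation: in particular, the inner CG stop here is a fixed absolute residual tolerance rather than the relative criterion (7), and all function and parameter names are illustrative.

```python
import numpy as np

def shrinkage(a, kappa):
    # Soft-thresholding operator of (45): sign(a_i) * max(0, |a_i| - kappa).
    return np.sign(a) * np.maximum(0.0, np.abs(a) - kappa)

def cg(A_mv, b, x0, tol=1e-2, maxit=100):
    # Plain conjugate gradient for A x = b, with A given as a matvec;
    # stopped loosely so the x-subproblem is solved only approximately.
    x = x0.copy()
    r = b - A_mv(x)
    p = r.copy()
    rs = r @ r
    for _ in range(maxit):
        if np.sqrt(rs) <= tol:
            break
        Ap = A_mv(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x, r  # r is the residual of the linear system (the inexactness certificate)

def pip_admm_lasso_step(C, d, y, g, beta, theta, delta):
    # One iteration sketch: inexact x-step via CG on (C'C + beta I)x = C'd + beta y - gamma,
    # closed-form y-step via shrinkage, multiplier update with stepsize theta.
    rhs = C.T @ d + beta * y - g
    A_mv = lambda x: C.T @ (C @ x) + beta * x
    x_t, v = cg(A_mv, rhs, rhs)                  # CG started at the right-hand side, as in the text
    y_new = shrinkage(x_t + g / beta, delta / beta)
    g_new = g - theta * beta * (y_new - x_t)     # constraint y - x = 0, so residual is y - x_t
    return x_t, y_new, g_new
```

The multiplier update sign follows the convention implied by the x-subproblem's right-hand side C^*d + βy_{k−1} − γ_{k−1} written above.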
We also tested the methods on five standard data sets from the Elvira biomedical data set repository [36]. The first data set is the colon tumor gene expression data [37] with m = 62 and n = 2000, the second is the central nervous system (CNS) data [38] with m = 60 and n = 7129, the third is the prostate cancer data [39] with m = 102 and n = 12600, the fourth is the leukemia (ALL-AML) cancer data [40] with m = 38 and n = 7129, and the fifth is the lung cancer (Michigan) data [41] with m = 96 and n = 7129. As in the randomly generated problems, we scaled the columns of C to have unit ℓ₂-norm.

The performances of the relerr-ADMM and the PIP-ADMM are listed in Tables 1 and 2, in which "Out" and "Inner" denote the number of outer iterations and the total number of inner iterations of the methods, respectively, whereas "Time" is the CPU time in seconds. From these tables, we see that the relerr-ADMM and the PIP-ADMM with θ = 1 had similar performances. However, the PIP-ADMM with θ = θ₁ and θ = θ₂ clearly outperformed the relerr-ADMM.

Table 2: Performance of the relerr-ADMM and the PIP-ADMM on 5 data sets (stepsizes 1 < θ₁ < θ₂).

    Data set   relerr-ADMM           PIP-ADMM (θ = 1)      PIP-ADMM (θ = θ₁)     PIP-ADMM (θ = θ₂)
               Out   Inner   Time    Out   Inner   Time    Out   Inner   Time    Out   Inner   Time
    Colon       87    1535    11.9    87    1517    11.9    78    1378    10.8    72    1390    10.2
    CNS        204    5979   466.6   204    5967   467.1   179    5293   425.7   164    5267   383.5
    Prostate   368   16176  3523.5   366   16030  3502.6   298   13212  2791.2   252   12319  2642.4
    Leukemia   415    7435   813.3   415    7435   811.6   347    6290   674.2   297    5710   591.4
    Lung       485   10975  1008.6   485   10949  1023.4   379    8612   805.6   314    7736   679.1

4.2 ℓ₁-Regularized Logistic Regression

Consider the ℓ₁-regularized logistic regression problem [42]

    min_{(u,t) ∈ ℝⁿ×ℝ}  Σ_{i=1}^m log(1 + exp(−d_i[⟨c_i, u⟩ + t])) + δm‖u‖₁,

where (c_i, d_i) ∈ ℝⁿ × {−1, +1} for every i = 1, …, m, and δ is a regularization parameter, set as a fixed fraction of λ_max, where λ_max is defined as in [42, Subsection 2.1].
Note that the above problem can be rewritten as

    min_{(x,u,t) ∈ ℝ^{n+1}×ℝⁿ×ℝ}  { Σ_{i=1}^m log(1 + exp(−d_i⟨(1, c_i), x⟩)) + δm‖u‖₁ : (u, t) − x = 0 },   (46)

which is an instance of (5) with f(x) = Σ_{i=1}^m log(1 + exp(−d_i⟨(1, c_i), x⟩)), g(y) = g(u, t) = mδ‖u‖₁, A = −I, B = I, and b = 0. In this case, the pair (x̃_k, ṽ_k) in (7) was obtained as follows: the iterate x̃_k was computed by Newton's method [35], with starting point (0, …, 0), as an approximate solution of the unconstrained optimization problem

    min_{x ∈ ℝ^{n+1}}  h(x) := Σ_{i=1}^m log(1 + exp(−d_i⟨(1, c_i), x⟩)) + ⟨x, γ_{k−1}⟩ + (β/2)‖y_{k−1} − x‖²,

whereas ṽ_k was taken as ṽ_k = ∇h(x̃_k). Note that (9) has a closed-form solution y_k = (u_k, t_k) given by

    u_k = shrinkage_{mδ/β}(x̃_k^u + γ_{k−1}^u/β),    t_k = x̃_k^t + γ_{k−1}^t/β,

where x̃_k^u, γ_k^u ∈ ℝⁿ and x̃_k^t, γ_k^t ∈ ℝ are the components of the vectors x̃_k and γ_k, i.e., (x̃_k^u, x̃_k^t) = x̃_k and (γ_k^u, γ_k^t) = γ_k, and the shrinkage operator is as in (45).

We tested the methods on seven ℓ₁-regularized logistic regression instances. We selected four instances from Section 4.1, and three from the UCI Machine Learning Repository [43], namely, the ionosphere data [44] with m = 351 and n = 34, the secom data with m = 1567 and n = 590, and the spambase data with m = 4601 and n = 57. We also scaled the columns (resp. rows) of C = [c_1, …, c_m]^* to have unit ℓ₂-norm when n ≥ m (resp. m > n).

Table 3 reports the performances of the relerr-ADMM and the PIP-ADMM on the aforementioned seven instances of problem (46). In Table 3, "Out" and "Inner" are the number of outer iterations and the total number of inner iterations of the methods, respectively, whereas "Time" is the CPU time in seconds.
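The Newton solve of the logistic x-subproblem described above can be sketched as below, writing ĉ_i for the augmented data vector paired with x ∈ ℝ^{n+1}. This is an illustrative sketch under stated assumptions, not the authors' code: the paper does not specify line-search details, so a damped Newton step with Armijo backtracking is used here for robustness. The returned residual v = ∇h(x̃) plays the role of ṽ_k.

```python
import numpy as np

def logistic_subproblem_newton(Chat, dvec, y, g, beta, tol=1e-8, maxit=50):
    # Approximately minimize
    #   h(x) = sum_i log(1 + exp(-d_i <chat_i, x>)) + <x, gamma> + (beta/2)||y - x||^2
    # by damped Newton's method; returns (x_t, v) with v = grad h(x_t).
    def grad_h(x):
        s = 1.0 / (1.0 + np.exp(dvec * (Chat @ x)))          # sigma(-z_i), z_i = d_i <chat_i, x>
        return -Chat.T @ (dvec * s) + g + beta * (x - y)

    def h(x):
        z = dvec * (Chat @ x)
        return np.sum(np.logaddexp(0.0, -z)) + g @ x + 0.5 * beta * np.sum((y - x) ** 2)

    x = np.zeros(Chat.shape[1])                              # starting point (0, ..., 0)
    for _ in range(maxit):
        gr = grad_h(x)
        if np.linalg.norm(gr) <= tol:
            break
        s = 1.0 / (1.0 + np.exp(dvec * (Chat @ x)))
        w = s * (1.0 - s)                                    # logistic curvature weights (d_i^2 = 1)
        H = Chat.T @ (w[:, None] * Chat) + beta * np.eye(Chat.shape[1])
        step = np.linalg.solve(H, gr)
        t = 1.0                                              # Armijo backtracking line search
        while h(x - t * step) > h(x) - 1e-4 * t * (gr @ step) and t > 1e-12:
            t *= 0.5
        x = x - t * step
    return x, grad_h(x)
```

The βI term in the Hessian comes from the quadratic penalty, so the subproblem is strongly convex and the Newton system is always well posed.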
Similarly to the numerical results of Section 4.1, we observe that the relerr-ADMM and the PIP-ADMM with θ = 1 had similar performances, whereas the PIP-ADMM with θ = θ₁ and θ = θ₂ outperformed the relerr-ADMM. This illustrates the efficiency of the PIP-ADMM on real-life applications.

5 Concluding Remarks

In this paper, we proposed a partially inexact proximal ADMM and established pointwise and ergodic iteration-complexity bounds for it. The proposed method allows its first subproblem to be solved inexactly using a relative approximate criterion, whereas a stepsize parameter is included in the updating rule of the Lagrangian multiplier in order to improve its computational performance. We presented computational results illustrating the numerical advantages of the method.

Table 3: Performance of the relerr-ADMM and the PIP-ADMM on 7 data sets (stepsizes 1 < θ₁ < θ₂).
    Data set     relerr-ADMM           PIP-ADMM (θ = 1)      PIP-ADMM (θ = θ₁)     PIP-ADMM (θ = θ₂)
                 Out   Inner    Time   Out   Inner    Time   Out   Inner    Time   Out   Inner    Time
    CNS          153     753  6545.3   153     753  6797.9   128     630  6298.5   113     564  5357.8
    Colon        149     596   172.2   149     596   180.5   125     500   150.5   110     464   139.0
    Leukemia     139     693  6264.4   139     693  6248.8   120     592  5203.9   112     563  4951.9
    Lung         225    1333 11676.9   225    1333 11354.4   219    1304 10910.7   215    1321 11152.5
    Ionosphere    54     208     0.2    54     208     0.2    42     162     0.2    35     142     0.1
    Secom         21     122    15.0    21     121    15.0    17      97    13.5    15      89    12.4
    Spambase      47     212    29.7    47     212    29.8    37     168    25.7    30     147    22.4

References

[1] Gabay, D., Mercier, B.: A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Comput. Math. Appl., 17–40 (1976).
[2] Glowinski, R., Marroco, A.: Sur l'approximation, par éléments finis d'ordre un, et la résolution, par pénalisation-dualité, d'une classe de problèmes de Dirichlet non linéaires (1975).
[3] Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. (1), 1–122 (2011).
[4] Attouch, H., Soueycatt, M.: Augmented Lagrangian and proximal alternating direction methods of multipliers in Hilbert spaces. Applications to games, PDE's and control. Pac. J. Optim. (1), 17–37 (2008).
[5] Xu, M.H.: Proximal alternating directions method for structured variational inequalities. J. Optim. Theory Appl. (1), 107–117 (2007).
[6] He, B., Yuan, X.: On the O(1/n) convergence rate of the Douglas–Rachford alternating direction method. SIAM J. Numer. Anal. (2), 700–709 (2012).
[7] He, B., Yuan, X.: On non-ergodic convergence rate of Douglas–Rachford alternating direction method of multipliers. Numer. Math. (1), 103–118 (2002).
[11] Cui, Y., Li, X., Sun, D., Toh, K.C.: On the convergence properties of a majorized ADMM for linearly constrained convex optimization problems with coupled objective functions. J. Optim. Theory Appl.
(3), 1013–1041 (2016).
[12] Gu, Y., Jiang, B., Han, D.: A semi-proximal-based strictly contractive Peaceman–Rachford splitting method. Available at https://arxiv.org/pdf/1506.02221.
[13] Eckstein, J., Yao, W.: Approximate ADMM algorithms derived from Lagrangian splitting. Comput. Optim. Appl. (2), 363–405 (2017).
[14] Eckstein, J., Yao, W.: Relative-error approximate versions of Douglas–Rachford splitting and special cases of the ADMM. Math. Program. (2017).
[15] Xie, J., Liao, A., Yang, X.: An inexact alternating direction method of multipliers with relative error criteria. Optim. Lett. (3), 583–596 (2017).
[16] Eckstein, J., Bertsekas, D.P.: On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators. Math. Program. (3, Ser. A), 293–318 (1992).
[17] Eckstein, J., Silva, P.J.S.: A practical relative error criterion for augmented Lagrangians. Math. Program. (1), 319–348 (2013).
[18] Solodov, M.V., Svaiter, B.F.: A hybrid approximate extragradient-proximal point algorithm using the enlargement of a maximal monotone operator. Set-Valued Anal. (4), 323–345 (1999).
[19] Solodov, M.V., Svaiter, B.F.: A hybrid projection-proximal point algorithm. J. Convex Anal. (1), 59–70 (1999).
[20] Monteiro, R.D.C., Svaiter, B.F.: Iteration-complexity of block-decomposition algorithms and the alternating direction method of multipliers. SIAM J. Optim. (1), 475–507 (2013).
[21] Adona, V.A., Gonçalves, M.L.N., Melo, J.G.: Iteration-complexity analysis of a generalized alternating direction method of multipliers. arXiv preprint arXiv:1705.06191 (2017).
[22] Gonçalves, M.L.N.: On the pointwise iteration-complexity of a dynamic regularized ADMM with over-relaxation stepsize. arXiv preprint arXiv:1705.03097 (2017).
[23] Gonçalves, M.L.N., Melo, J.G., Monteiro, R.D.C.: Improved pointwise iteration-complexity of a regularized ADMM and of a regularized non-Euclidean HPE framework. SIAM J. Optim. (1), 379–407 (2017).
[24] Gonçalves, M.L.N., Alves, M.M., Melo, J.G.: Pointwise and ergodic convergence rates of a variable metric proximal alternating direction method of multipliers. J. Optim. Theory Appl. (2018).
[25] Boţ, R.I., Csetnek, E.R.: ADMM for monotone operators: convergence analysis and rates. Available at https://arxiv.org/abs/1705.01913.
[26] Hager, W.W., Yashtini, M., Zhang, H.: An O(1/k) convergence rate for the variable stepsize Bregman operator splitting algorithm. SIAM J. Numer. Anal. (3), 1535–1556 (2016).
[27] Fang, E.X., He, B., Liu, H., Yuan, X.: Generalized alternating direction method of multipliers: new theoretical insights and applications. Math. Prog. Comp. (2), 149–187 (2015).
[28] Shefi, R., Teboulle, M.: Rate of convergence analysis of decomposition methods based on the proximal method of multipliers for convex minimization. SIAM J. Optim. (1), 269–297 (2014).
[29] Rockafellar, R.T.: On the maximal monotonicity of subdifferential mappings. Pacific J. Math., 209–216 (1970).
[30] Burachik, R.S., Sagastizábal, C.A., Svaiter, B.F.: ε-enlargements of maximal monotone operators: theory and applications. In: Reformulation: Nonsmooth, Piecewise Smooth, Semismooth and Smoothing Methods (Lausanne, 1997), Appl. Optim., vol. 22, pp. 25–43. Kluwer Acad. Publ., Dordrecht (1999).
[31] Burachik, R.S., Iusem, A.N., Svaiter, B.F.: Enlargement of monotone operators with applications to variational inequalities. Set-Valued Anal. (2), 159–180 (1997).
[32] Monteiro, R.D.C., Svaiter, B.F.: On the complexity of the hybrid proximal extragradient method for the iterates and the ergodic mean. SIAM J. Optim.
(6), 2755–2787 (2010).
[33] Tibshirani, R.: Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B (Methodological) (1), 267–288 (1996).
[34] Tibshirani, R.J.: The lasso problem and uniqueness. Electron. J. Statist., 1456–1490 (2013).
[35] Nocedal, J., Wright, S.J.: Numerical Optimization, 2nd edn. Springer, New York (2006).
[36] Cano, A., Masegosa, A., Moral, S.: ELVIRA biomedical data set repository. http://leo.ugr.es/elvira/DBCRepository/ (2005).
[37] Alon, U., Barkai, N., Notterman, D.A., Gish, K., Ybarra, S., Mack, D., Levine, A.J.: Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proceedings of the National Academy of Sciences (12), 6745–6750 (1999).
[38] Pomeroy, S.L., Tamayo, P., Gaasenbeek, M., Sturla, L.M., Angelo, M., McLaughlin, M.E., Kim, J.Y.H., Goumnerova, L.C., Black, P.M., Lau, C., et al.: Prediction of central nervous system embryonal tumour outcome based on gene expression. Nature (6870), 436 (2002).
[39] Singh, D., Febbo, P.G., Ross, K., Jackson, D.G., Manola, J., Ladd, C., Tamayo, P., Renshaw, A.A., D'Amico, A.V., Richie, J.P., et al.: Gene expression correlates of clinical prostate cancer behavior. Cancer Cell (2), 203–209 (2002).
[40] Golub, T.R., Slonim, D.K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J.P., Coller, H., Loh, M.L., Downing, J.R., Caligiuri, M.A., Bloomfield, C.D., Lander, E.S.: Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science (5439), 531–537 (1999).
[41] Beer, D.G., Kardia, S.L.R., Huang, C., Giordano, T.J., Levin, A.M., Misek, D.E., Lin, L., Chen, G., Gharib, T.G., Thomas, D.G., et al.: Gene-expression profiles predict survival of patients with lung adenocarcinoma. Nature Medicine (8), 816 (2002).
[42] Koh, K., Kim, S., Boyd, S.: An interior-point method for large-scale l1-regularized logistic regression.
Journal of Machine Learning Research 8(Jul), 1519–1555 (2007).
[43] Dheeru, D., Karra Taniskidou, E.: UCI machine learning repository. http://archive.ics.uci.edu/ml (2017).
[44] Sigillito, V.G., Wing, S.P., Hutton, L.V., Baker, K.B.: Classification of radar returns from the ionosphere using neural networks. Johns Hopkins APL Tech. Dig. 10, 262–266 (1989).