On Error Bounds and Multiplier Methods for Variational Problems in Banach Spaces
Christian Kanzow† and Daniel Steck†

March 27, 2018
Abstract.
This paper deals with a general form of variational problems in Banach spaces which encompasses variational inequalities as well as minimization problems. We prove a characterization of local error bounds for the distance to the (primal-dual) solution set and give a sufficient condition for such an error bound to hold. In the second part of the paper, we consider an algorithm of augmented Lagrangian type for the solution of such variational problems. We give some global convergence properties of the method and then use the error bound theory to provide estimates for the rate of convergence and to deduce boundedness of the sequence of penalty parameters. Finally, numerical results for optimal control, Nash equilibrium problems, and elliptic parameter estimation problems are presented.
Keywords.
Variational problem, variational inequality, error bound, augmented Lagrangian method, local convergence, global convergence, Nash equilibrium problem.
AMS subject classifications.
∗ This research was supported by the German Research Foundation (DFG) within the priority program “Non-smooth and Complementarity-based Distributed Parameter Systems: Simulation and Hierarchical Optimization” (SPP 1962) under grant number KA 1296/24-1.
† University of Würzburg, Institute of Mathematics, Campus Hubland Nord, Emil-Fischer-Str. 30, 97074 Würzburg, Germany; {kanzow,daniel.steck}@mathematik.uni-wuerzburg.de

1 Introduction

This paper deals with the following variational problem:
\[ \text{Find } x \in M \text{ such that } \langle F(x), v \rangle \ge 0 \quad \text{for all } v \in T_M(x), \]  (1)
where $M \subseteq X$ is a nonempty closed set, $X$ a real Banach space, and $F : X \to X^*$ a given mapping. The set $T_M(x)$ denotes the (Bouligand) tangent cone [9] to $M$ at $x$. If $M$ is additionally convex, then (1) is equivalent to
\[ \text{Find } x \in M \text{ such that } \langle F(x), y - x \rangle \ge 0 \quad \text{for all } y \in M, \]  (2)
which is often regarded as the standard form of a variational inequality (VI). Throughout this paper, we will use the terms “variational inequality” and “variational problem” interchangeably, and often refer to (1) as a VI. Note that, in the absence of convexity, (1) is the canonical formulation of variational problems; in particular, this form encompasses first-order necessary conditions for nonlinear optimization problems of the type
\[ \min\ f(x) \quad \text{s.t.} \quad x \in M \]  (3)
by choosing $F := f'$. Throughout this paper, we assume that $M$ is given in the form
\[ M = \{ x \in X : g(x) \in K \}, \]  (4)
where $g : X \to H$ is a given mapping, $H$ a real Hilbert space, and $K \subseteq H$ a nonempty closed convex set (not necessarily a cone). We make no blanket convexity assumptions on $g$ (although some of our results do pertain to the convex case).
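For convex $M$, solutions of (2) can be characterized through the projection: $x$ solves (2) if and only if $x = P_M(x - \gamma F(x))$ for an arbitrary fixed $\gamma > 0$. The following minimal sketch exploits this via a projected fixed-point iteration; the box $M = [0,1]^2$ and the mapping $F(x) = x - b$ are purely illustrative choices, not data from the paper.

```python
# Hedged sketch: solve the VI (2) over a convex set M by the projected
# fixed-point iteration x <- P_M(x - gamma * F(x)).  Illustrative data:
# M = [0, 1]^2 (a box) and F(x) = x - b, the gradient of 0.5*||x - b||^2.

def project_box(x, lo=0.0, hi=1.0):
    """Componentwise projection P_M onto the box [lo, hi]^n."""
    return [min(max(xi, lo), hi) for xi in x]

def F(x, b=(1.5, -0.3)):
    """Illustrative strongly monotone mapping F(x) = x - b."""
    return [xi - bi for xi, bi in zip(x, b)]

def solve_vi(x, gamma=0.5, iters=200):
    """Iterate x <- P_M(x - gamma * F(x)); a fixed point solves the VI (2)."""
    for _ in range(iters):
        step = [xi - gamma * fi for xi, fi in zip(x, F(x))]
        x = project_box(step)
    return x

x = solve_vi([0.5, 0.5])
# For this instance the solution is P_M(b), i.e. (1.0, 0.0).
```

Since $F$ here is strongly monotone and Lipschitz, the iteration is a contraction for small $\gamma$; for general nonmonotone $F$, or nonconvex $M$ as in (1), such a simple scheme need not converge.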
Hence, the set $M$ is nonconvex in general, and (1) is the natural framework for our setting.

Variational inequalities are a well-known and popular class in both finite- and infinite-dimensional optimization since they unify various problem types such as constrained minimization and equilibrium-type problems, in particular Nash and (certain) generalized Nash equilibrium problems [16, 17, 21, 30, 38]. This opens up a broad spectrum of applications including optimal control, parameter estimation, differential games, and problems in mechanics or shape optimization. Many further applications are given in [3, 25, 26, 42]. As a result, VIs have gained considerable attention in the literature and a variety of algorithms have been developed for their solution, e.g. [18, 23, 24, 52].

On the other hand, the augmented Lagrangian method (ALM, also called multiplier-penalty method or simply multiplier method) is one of the classical methods for nonlinear optimization; see [11, 28, 47–49] and the textbooks [5, 45]. In recent years, ALMs have seen a certain resurgence [1, 2, 6–8] in the form of modified methods which use a slightly different update of the Lagrange multiplier and turn out to have very strong global convergence properties [8]. A comparison of the classical and modified ALMs is given in [39]. We also note that ALMs have been generalized to VIs in finite dimensions [2] and to infinite-dimensional optimization problems in certain restricted settings [29, 33–36, 41, 54]. However, most of these papers either consider rather specific problem settings [29, 33–36] or deal with global convergence properties only [41].

The main purpose of the present paper is to analyze the local convergence properties of ALMs for variational inequalities in the general (possibly infinite-dimensional) setting (1).
To accomplish this, we will need certain elements of perturbation and error bound theory for generalized equations and KKT systems, some of which are refinements of the corresponding results in finite dimensions [12, 14, 20, 37]. Using these, we will prove that, given a KKT point which admits a primal-dual error bound, the ALM converges locally to this point with a rate of convergence that is essentially $1/\rho_k$ (where $\rho_k$ is the penalty parameter), and that $\{\rho_k\}$ remains bounded if updated suitably.

Sufficient conditions for the primal-dual error bound include a suitable second-order sufficient condition (SOSC) together with a strict version of the Robinson constraint qualification (see Section 2). These assumptions are akin to those used in [6] for ALMs in finite-dimensional nonlinear programming (NLP), where the authors obtain results similar to ours. Interestingly, however, it turns out that these results (for standard NLP) can be established under SOSC only [19] by using the specific structure of the constraints. In particular, when transferred to our notation, the set $K$ arising from NLP is polyhedral, and this yields, roughly speaking, the dual part of the error bound without any constraint qualification [19, 37]. However, apart from the NLP setting, polyhedrality is a rare property which is usually violated, e.g. in optimal control or semidefinite programming. As a result, SOSC alone does not yield a primal-dual error bound; see the example in Section 3. We resolve this issue by using SOSC together with a suitable constraint qualification.

The paper is organized as follows. We start with some preliminary material in Section 2 and give some results on primal-dual error bounds in Section 3. Section 4 contains a precise statement of our algorithm, and we continue with some global convergence results in Section 5. In Section 6, we prove the main results of this paper, i.e. local convergence of the ALM under the error bound hypothesis.
We then give some numerical results in Section 7 and final remarks in Section 8.

Notation:
Throughout the paper, $X$ is always a real Banach space, $H$ a real Hilbert space, and their duals are denoted by $X^*$ and $H^*$, the latter of which we usually identify with $H$. Fréchet derivatives are denoted by a prime $'$ or by $D_x$ if the variable is emphasized, and we use the abbreviation lsc for lower semicontinuity. Strong and weak convergence are denoted by $\to$ and $\rightharpoonup$, respectively. Duality pairings are written as $\langle \cdot, \cdot \rangle$, scalar products as $(\cdot, \cdot)$, and norms are denoted by $\|\cdot\|$ with an appropriate subscript to emphasize the corresponding space (e.g. $\|\cdot\|_X$). If $S$ is a nonempty subset of some normed space, we write $d_S = \operatorname{dist}(\cdot, S)$ for the distance to $S$. Additionally, if $S \subseteq H$ is closed and convex, we write $P_S$ for the projection onto $S$.

2 Preliminaries

This section is dedicated to establishing some preliminary results as well as fixing the setting we will consider later. Recall that the set $M$ is given by the formula (4) with a nonempty closed convex set $K \subseteq H$. If $S$ is a nonempty closed subset of some space $Z$, then
\[ S^\circ := \{ \psi \in Z^* : \langle \psi, s \rangle \le 0 \ \text{for all } s \in S \} \]
denotes the polar cone of $S$. If $Z$ is a Hilbert space, we of course treat $S^\circ$ as a subset of $Z$. Moreover, if $x \in S$ is a given point, we denote by
\[ T_S(x) := \big\{ d \in Z : \exists\, x_k \to x,\ t_k \downarrow 0 \ \text{such that } x_k \in S \ \text{and } (x_k - x)/t_k \to d \big\} \]
the tangent cone of $S$ at $x$. If $S$ is additionally convex, we also define the normal cone
\[ N_S(x) := \{ \psi \in Z^* : \langle \psi, y - x \rangle \le 0 \ \text{for all } y \in S \} = T_S(x)^\circ. \]
If $x \notin S$, we define $T_S(x)$ and $N_S(x)$ to be empty. Note that, if $S$ is a convex set, then $T_S(x)$ and $N_S(x)$ are closed convex cones for all $x \in S$.

Recall that the constraint system of the VI is given by $g(x) \in K$ with $K \subseteq H$ a nonempty closed convex set. A natural question is what the appropriate notion of convexity is in this general setting.
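The projection $P_S$ and the normal cone $N_S$ are closely linked: for a closed convex set $S$ in a Hilbert space, $\lambda \in N_S(x)$ holds if and only if $x = P_S(x + \lambda)$, an equivalence that is invoked repeatedly later (cf. [4, Prop. 6.46]). A small numerical check with the illustrative box $S = [0,1]^2$ (hypothetical data, not from the paper):

```python
# Hedged sketch: test membership in the normal cone N_S(x) via the
# projection identity  lambda in N_S(x)  <=>  x = P_S(x + lambda),
# for the illustrative box S = [0, 1]^2.

def project_box(y, lo=0.0, hi=1.0):
    """Componentwise projection P_S onto the box [lo, hi]^n."""
    return [min(max(yi, lo), hi) for yi in y]

def in_normal_cone(lam, x, tol=1e-12):
    """Return True iff x = P_S(x + lam) up to the tolerance tol."""
    z = project_box([xi + li for xi, li in zip(x, lam)])
    return all(abs(zi - xi) <= tol for zi, xi in zip(z, x))

x = [1.0, 0.5]                           # a point on the face x_1 = 1 of the box
print(in_normal_cone([2.0, 0.0], x))     # outward normal of that face: True
print(in_normal_cone([0.0, 1.0], x))     # not a normal direction at x: False
```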
In particular, we would like to give sufficient conditions for the convexity of the feasible set $M$. To this end, consider the recession cone
\[ K_\infty := \{ y \in H : y + K \subseteq K \}. \]  (5)
It is well known that $K_\infty$ is a nonempty closed convex cone [4, 9]. If $K$ itself is a cone, then $K_\infty = K$. We associate with $K$ (and $K_\infty$) the (partial) order relation
\[ y \le_K z \iff z - y \in K_\infty. \]  (6)
Note that we use the notation $\le_K$ for the sake of convenience, even though the order is actually induced by the cone $K_\infty$. We also note that $K_\infty$ may not be pointed (that is, $K_\infty \cap (-K_\infty)$ may contain a nonzero element) and, hence, the relation $\le_K$ does not necessarily satisfy the antisymmetry property
\[ a \le_K b \ \wedge\ b \le_K a \implies a = b. \]
In the terminology of order theory, this makes $\le_K$ a so-called preorder. We will simply call it an order relation due to the descriptiveness of the term. Note also that, throughout this paper, the symbol $\le$ without any index is always the standard ordering in $\mathbb{R}$.

The order relation (6) allows us to extend various familiar concepts from finite-dimensional optimization to our setting. For instance, we say that $g$ is convex if
\[ g(\alpha x + (1-\alpha) y) \le_K \alpha g(x) + (1-\alpha) g(y) \]
holds for all $x, y \in X$ and $\alpha \in [0, 1]$; concavity is defined analogously. Similarly, a function $m : H \to \mathbb{R}$ is called decreasing if $y \le_K z$ implies $m(z) \le m(y)$. For instance, the distance function $d_K : H \to \mathbb{R}$ is decreasing since $z \ge_K y$ implies $z = y + k$ with $k \in K_\infty$, and
\[ d_K(z) = d_K(y + k) \le \| y + k - (P_K(y) + k) \| = \| y - P_K(y) \| = d_K(y), \]
where the inequality uses the fact that $P_K(y) + k \in K$ by definition of $K_\infty$. Some other results pertaining to convexity, concavity, etc. are given in the following lemma. Note that, in the context of our constraint set (4) with $g(x) \in K$, it is more natural to consider concavity of $g$ with respect to the ordering (6) as opposed to convexity.

Lemma 2.1.
Assume that $g : X \to H$ is concave. If $m : H \to \mathbb{R}$ is convex and decreasing, then $m \circ g$ is convex. In particular:
(a) The function $d_K \circ g : X \to \mathbb{R}$ is convex.
(b) If $\lambda \in K_\infty^\circ$, then $x \mapsto (\lambda, g(x))$ is convex.
(c) The set $M = \{ x \in X : g(x) \in K \}$ is convex.

Proof. Let $x, y \in X$ and $x_\alpha = \alpha x + (1-\alpha) y$ with $\alpha \in [0,1]$. Then $g(x_\alpha) \ge_K \alpha g(x) + (1-\alpha) g(y)$ by the concavity of $g$. Applying $m$ on both sides yields
\[ m(g(x_\alpha)) \le m(\alpha g(x) + (1-\alpha) g(y)) \le \alpha\, m(g(x)) + (1-\alpha)\, m(g(y)), \]
where we used the monotonicity and the convexity of $m$. Hence, $m \circ g$ is convex. Assertion (a) now follows because $d_K$ is decreasing (see above) and convex [4, Cor. 12.12]. Similarly, for (b), the function $y \mapsto (\lambda, y)$ with $\lambda \in K_\infty^\circ$ is obviously a convex function, and it is decreasing because $(\lambda, k) \le 0$ for all $k \in K_\infty$. Finally, for (c), note that
\[ M = \{ x \in X : g(x) \in K \} = \{ x \in X : d_K(g(x)) \le 0 \}. \]
Hence, $M$ is a lower level set of the convex function $d_K \circ g$ and therefore a convex set.

Note that the extreme case $K_\infty = \{0\}$ can occur, e.g. if $K$ is bounded. In this case, monotonicity becomes trivial and convexity and concavity reduce to linearity. It is possible to characterize $K_\infty^\circ$ by means of the so-called barrier cone to $K$, see [4]. Here, we will only need the following observation.

Lemma 2.2. If $y \in H$, then $y - P_K(y) \in K_\infty^\circ$.

Proof. Let $k \in K$ be fixed and let $z \in K_\infty$ and $\alpha \ge 0$. Then $\alpha z \in K_\infty$ and therefore $\alpha z + k \in K$. A standard projection inequality yields $(y - P_K(y), \alpha z + k - P_K(y)) \le 0$ for all $\alpha \ge 0$, which is only possible if $(y - P_K(y), z) \le 0$. Since $z \in K_\infty$ was arbitrary, it follows that $y - P_K(y) \in K_\infty^\circ$.

We now turn to the variational inequality (1) and discuss its KKT conditions. Starting with this section, we assume that the mapping $F$ is continuously differentiable and that $g$ is twice continuously differentiable. Consider now the Lagrange function
\[ \mathcal{L} : X \times H \to X^*, \quad \mathcal{L}(x, \lambda) := F(x) + g'(x)^* \lambda. \]  (7)
Note that, if the VI originates from a minimization problem, then $\mathcal{L}$ is actually the derivative of the conventional Lagrange function. The following are the standard first-order conditions which we will use throughout this paper.

Definition 2.3.
A tuple $(\bar{x}, \bar{\lambda}) \in X \times H$ is a KKT point of (1), (4) if
\[ \mathcal{L}(\bar{x}, \bar{\lambda}) = 0 \quad \text{and} \quad \bar{\lambda} \in N_K(g(\bar{x})). \]  (8)
We call $\bar{x} \in X$ a stationary point if $(\bar{x}, \bar{\lambda})$ is a KKT point for some $\bar{\lambda} \in H$, and denote by $\mathcal{M}(\bar{x})$ the corresponding set of multipliers.

Note that $\bar{\lambda} \in N_K(g(\bar{x}))$ implies $g(\bar{x}) \in K$, since otherwise the normal cone would be empty. Moreover, we remark that, if $K$ is a cone, then $\bar{\lambda} \in N_K(g(\bar{x}))$ is equivalent to the complementarity conditions $g(\bar{x}) \in K$, $\bar{\lambda} \in K^\circ$, and $(\bar{\lambda}, g(\bar{x})) = 0$, see [9, Ex. 2.62].

The relationship between the VI and its KKT conditions is given as follows: if $\bar{x}$ solves the VI and a suitable constraint qualification holds in $\bar{x}$, then there exists a multiplier $\bar{\lambda}$ such that $(\bar{x}, \bar{\lambda})$ is a KKT point [9, Remark 5.8]. On the other hand, it is easy to see that the KKT conditions are always sufficient for the VI (1), even if $M$ is nonconvex. This result is contained in the following theorem and crucially depends on the fact that the VI uses the tangent cone $T_M$ and not $M$ itself.

Theorem 2.4. If $(\bar{x}, \bar{\lambda})$ is a KKT point of the VI, then $\bar{x}$ is a solution of the VI.

Proof. Let $(\bar{x}, \bar{\lambda})$ be a KKT point and $d \in T_M(\bar{x})$. Then $d = \lim_{k \to \infty} (x^k - \bar{x})/t_k$ with $\{x^k\} \subseteq M$, $x^k \to \bar{x}$, and $t_k \downarrow 0$. Hence,
\[ \langle F(\bar{x}), d \rangle = \Big\langle -g'(\bar{x})^* \bar{\lambda},\ \lim_{k \to \infty} \frac{x^k - \bar{x}}{t_k} \Big\rangle = -\lim_{k \to \infty} \frac{1}{t_k} \big( \bar{\lambda}, g'(\bar{x})(x^k - \bar{x}) \big). \]
But $g'(\bar{x})(x^k - \bar{x}) = g(x^k) - g(\bar{x}) + o(t_k)$ and therefore
\[ \langle F(\bar{x}), d \rangle = -\lim_{k \to \infty} \frac{1}{t_k} \big( \bar{\lambda}, g(x^k) - g(\bar{x}) \big) \ge 0, \]
where we used $\bar{\lambda} \in N_K(g(\bar{x}))$ and $g(x^k) \in K$ for all $k$.

For a given KKT point $(\bar{x}, \bar{\lambda})$ and $\eta \ge 0$, we define the extended critical cone
\[ C_\eta(\bar{x}) := \big\{ d \in X : \langle F(\bar{x}), d \rangle \le \eta \|d\|_X,\ g'(\bar{x}) d \in T_K(g(\bar{x})) \big\}. \]
The following is the second-order condition which we will use throughout this paper.
Definition 2.5.
Let $(\bar{x}, \bar{\lambda})$ be a KKT point of the VI. We say that the second-order sufficient condition (SOSC) holds in $(\bar{x}, \bar{\lambda})$ if there are $\eta, c > 0$ such that
\[ \langle D_x \mathcal{L}(\bar{x}, \bar{\lambda}) d, d \rangle \ge c \|d\|_X^2 \quad \text{for all } d \in C_\eta(\bar{x}). \]
Note that we use the terminology “second-order sufficient condition” mainly for the sake of consistency with a similar condition from nonlinear optimization, e.g. [9, Def. 3.60]. For variational problems such as (1), there is actually no need for sufficiency conditions to complement the KKT system because the latter always implies that $\bar{x}$ is a solution of the VI (see Theorem 2.4).

Let us also note that Definition 2.5 is slightly different from the second-order sufficient condition for nonlinear optimization [9, Def. 3.60] because our extended critical cone is slightly smaller. However, under the Robinson constraint qualification (see below and [9, Def. 2.86]), the corresponding second-order conditions coincide [9, Remark 3.68]. Moreover, and more importantly, our subsequent analysis will be based on [9, Thm. 5.9], which directly uses the “smaller” critical cone together with the following condition.

Definition 2.6.
Let $(\bar{x}, \bar{\lambda})$ be a KKT point of the VI, and let
\[ K_0 := \big\{ y \in K : (\bar{\lambda}, y - g(\bar{x})) = 0 \big\}. \]
We say that the strict Robinson condition (SRC) holds in $(\bar{x}, \bar{\lambda})$ if
\[ 0 \in \operatorname{int} \big( g(\bar{x}) + g'(\bar{x}) X - K_0 \big). \]  (9)
Note that the standard Robinson constraint qualification arises if we replace $K_0$ in (9) by the larger set $K$. Hence, SRC is stronger than the Robinson constraint qualification and the equivalent regularity condition of Zowe and Kurcyusz [55]. On the other hand, SRC implies the uniqueness of $\bar{\lambda}$ and is weaker than the surjectivity of $g'(\bar{x})$, which is a typical regularity assumption for infinite-dimensional problems.

It should be noted that the definition of SRC presupposes the existence of $\bar{\lambda}$ and therefore depends not only on the constraints but also on the function $F$. Hence, we refrain from calling (9) a constraint qualification (in contrast to [9], where SRC is called the strict constraint qualification). A similar condition which is occasionally used in the finite-dimensional literature is the strict Mangasarian-Fromovitz condition [6, 22, 43]. This condition turns out to be a special case of SRC [9, Remark 4.49] and is also not a constraint qualification [53].

3 Error Bounds

Recall that the KKT conditions of the VI are given by
\[ \mathcal{L}(\bar{x}, \bar{\lambda}) = 0 \quad \text{and} \quad \bar{\lambda} \in N_K(g(\bar{x})) \]
for a pair $(\bar{x}, \bar{\lambda}) \in X \times H$. The last condition is well known [4, Prop. 6.46] to be equivalent to $g(\bar{x}) = P_K(g(\bar{x}) + \bar{\lambda})$. This suggests defining the residual mapping
\[ \sigma(x, \lambda) := \|\mathcal{L}(x, \lambda)\|_{X^*} + \|g(x) - P_K(g(x) + \lambda)\|_H. \]  (10)
Clearly, the KKT conditions of the VI are equivalent to $\sigma(\bar{x}, \bar{\lambda}) = 0$. We will use this relationship to construct suitable error bounds for the primal-dual variables.

In order to establish the error bound we are looking for, we first need a characterization of local error bounds in terms of a local upper Lipschitz property (or calmness) of the KKT system.
This result has appeared in various forms in the literature [12, 20, 37], albeit mostly in a finite-dimensional setting. In our notation, it involves certain perturbations of the KKT system (8) with a parameter pair $p = (\alpha, \beta) \in X^* \times H$. Without loss of generality, we equip this product space with the norm $\|(\alpha, \beta)\|_{X^* \times H} := \|\alpha\|_{X^*} + \|\beta\|_H$. Recall also that $\mathcal{M}(\bar{x})$ denotes the set of Lagrange multipliers corresponding to $\bar{x}$.

Theorem 3.1.
Let $(\bar{x}, \bar{\lambda}) \in X \times H$ be a KKT point of the VI. Then the following assertions are equivalent:

(a) There are a neighborhood $U$ of $\bar{x}$ and $c > 0$ such that, for all $p = (\alpha, \beta) \in X^* \times H$ close to $(0, 0)$, any solution $(x_p, \lambda_p) \in U \times H$ of the perturbed KKT system
\[ \mathcal{L}(x, \lambda) = \alpha, \quad \lambda \in N_K(g(x) - \beta) \]  (11)
satisfies the estimate $\|x_p - \bar{x}\|_X + \operatorname{dist}(\lambda_p, \mathcal{M}(\bar{x})) \le c \|p\|_{X^* \times H}$.

(b) There are a neighborhood $U$ of $\bar{x}$ and $c > 0$ such that, for all $(x, \lambda) \in U \times H$ with $\sigma(x, \lambda)$ sufficiently small, $\|x - \bar{x}\|_X + \operatorname{dist}(\lambda, \mathcal{M}(\bar{x})) \le c\, \sigma(x, \lambda)$.

Proof. (b) ⇒ (a): Let $p = (\alpha, \beta) \in X^* \times H$. It is an easy consequence of [4, Cor. 4.10] that the mapping $y \mapsto y - P_K(y + \lambda_p)$ is nonexpansive. Hence, we obtain the inequality
\[ \|g(x_p) - P_K(g(x_p) + \lambda_p)\|_H \le \|\beta\|_H + \|g(x_p) - \beta - P_K(g(x_p) - \beta + \lambda_p)\|_H. \]
Since $\lambda_p \in N_K(g(x_p) - \beta)$, the last term is equal to zero [4, Prop. 6.46] and we obtain $\sigma(x_p, \lambda_p) \le \|\alpha\|_{X^*} + \|\beta\|_H = \|p\|_{X^* \times H}$. Choosing $p = (\alpha, \beta)$ sufficiently close to 0, we see that $\sigma(x_p, \lambda_p)$ becomes arbitrarily small. Hence, we can apply (b) and obtain
\[ \|x_p - \bar{x}\|_X + \operatorname{dist}(\lambda_p, \mathcal{M}(\bar{x})) \le c\, \sigma(x_p, \lambda_p) \le c \|p\|_{X^* \times H}. \]

(a) ⇒ (b): Shrinking $U$ if necessary, we may assume that $\|g'(x)^*\|_{L(H, X^*)} \le c_1$ for all $x \in U$ with some constant $c_1 \ge 0$. Let $(x, \lambda) \in U \times H$, set $\delta := \sigma(x, \lambda)$, and define
\[ \hat{g} := P_K(g(x) + \lambda), \quad \hat{\lambda} := g(x) + \lambda - \hat{g}. \]
Now, let $\alpha := \mathcal{L}(x, \hat{\lambda})$ and $\beta := g(x) - \hat{g}$. Then $\hat{\lambda} \in N_K(\hat{g})$ and, hence, $(x, \hat{\lambda})$ solves the perturbed KKT system corresponding to $p := (\alpha, \beta)$. Moreover, we have $\|\beta\|_H = \|\hat{g} - g(x)\|_H = \|g(x) - P_K(g(x) + \lambda)\|_H \le \delta$ and $\|\hat{\lambda} - \lambda\|_H = \|\beta\|_H \le \delta$. This implies
\[ \|p\|_{X^* \times H} = \|\mathcal{L}(x, \hat{\lambda})\|_{X^*} + \|\beta\|_H \le \|\mathcal{L}(x, \lambda)\|_{X^*} + (c_1 + 1)\|\beta\|_H \le (c_1 + 2)\,\delta. \]
If $\delta = \sigma(x, \lambda)$ is small enough, then $p$ becomes arbitrarily close to 0. We can therefore apply (a) to $(x, \hat{\lambda})$ and obtain
\[ \|x - \bar{x}\|_X + \operatorname{dist}(\hat{\lambda}, \mathcal{M}(\bar{x})) \le c \|p\|_{X^* \times H} \le c (c_1 + 2)\,\delta. \]
But $\|\hat{\lambda} - \lambda\|_H \le \delta$ and, hence, $\operatorname{dist}(\hat{\lambda}, \mathcal{M}(\bar{x})) \ge \operatorname{dist}(\lambda, \mathcal{M}(\bar{x})) - \delta$ by the nonexpansiveness of the distance function. This finally yields
\[ \|x - \bar{x}\|_X + \operatorname{dist}(\lambda, \mathcal{M}(\bar{x})) \le \big[ c(c_1 + 2) + 1 \big]\,\delta, \]
and the proof is complete.

Let us stress that the distance estimate provided by the above theorem holds if $x$ is close to $\bar{x}$; in particular, no assumption on the proximity of $\lambda$ to $\mathcal{M}(\bar{x})$ is necessary. We also remark that (a) does not make any assertion about the existence of solutions to the perturbed KKT conditions (11). These may have solutions for some but not all $p$.

Theorem 3.1 is our main tool for establishing local error bounds for the distance of $(x, \lambda)$ to the primal-dual solution set in terms of the residual mapping $\sigma$. To verify such an error bound, we only need to prove property (a) of the theorem. The following result does precisely that and is based on the perturbation theory from [9].

Theorem 3.2.
Assume that $(\bar{x}, \bar{\lambda})$ is a KKT point which satisfies SOSC and the strict Robinson condition. Then $\mathcal{M}(\bar{x}) = \{\bar{\lambda}\}$ and there is a $c > 0$ such that, for all $(x, \lambda) \in X \times H$ with $x$ sufficiently close to $\bar{x}$ and $\sigma(x, \lambda)$ sufficiently small,
\[ \|x - \bar{x}\|_X + \|\lambda - \bar{\lambda}\|_H \le c\, \sigma(x, \lambda). \]  (12)

Proof. The uniqueness of $\bar{\lambda}$ follows as in [9, Prop. 4.47], see also the discussion in Section 5.1.2 of that reference. For the error bound result, we essentially need to apply [9, Thm. 5.9] and Theorem 3.1. Since some technical details need to be considered, we give a formal proof here. To this end, assume that the error bound in question does not hold. Then property (a) from Theorem 3.1 does not hold either; hence, there are sequences $x^k \to \bar{x}$, $\{\lambda^k\} \subseteq H$, and $\{p^k\} \subseteq X^* \times H$ with $p^k = (\alpha^k, \beta^k) \to 0$ such that, for all $k$, $(x^k, \lambda^k)$ satisfies the perturbed KKT conditions (11) corresponding to $p^k$, and
\[ \|x^k - \bar{x}\|_X + \|\lambda^k - \bar{\lambda}\|_H \ge k \|p^k\|_{X^* \times H}. \]  (13)
Now, let $\mathcal{F}(x, p) := F(x) - \alpha$ and $\mathcal{G}(x, p) := g(x) - \beta$ for $p = (\alpha, \beta) \in X^* \times H$. Then $(x^k, \lambda^k)$ satisfies
\[ \mathcal{F}(x^k, p^k) + D_x \mathcal{G}(x^k, p^k)^* \lambda^k = 0, \quad \lambda^k \in N_K(\mathcal{G}(x^k, p^k)) \]
for all $k$. Applying [9, Thm. 5.9] yields a contradiction to (13).

The function $\sigma$ is locally Lipschitz continuous with respect to $(x, \lambda)$, and globally so with respect to $\lambda$. Hence, we can extend the one-sided error bound (12) to
\[ c_1\, \sigma(x, \lambda) \le \|x - \bar{x}\|_X + \|\lambda - \bar{\lambda}\|_H \le c_2\, \sigma(x, \lambda) \]  (14)
for suitable constants $c_1, c_2 > 0$ and all $(x, \lambda) \in X \times H$ with $x$ near $\bar{x}$.

For certain problem classes, it is possible to establish error bounds under weaker assumptions than those given above. The most important example in this direction is if the set $K$ is (generalized) polyhedral, e.g. in nonlinear programming. Roughly speaking, one can use Hoffman's lemma [9, Thm. 2.200] to get the “dual part” of the error bound for free, while the primal part again follows from SOSC. As a result, one obtains a primal-dual error bound under SOSC alone (with the restriction that the multiplier is not necessarily unique). Unsurprisingly, this result does not extend to the non-polyhedral case, which shows that additional assumptions such as SRC are inevitable.

Example 3.3.
Let $X := H := \ell^2(\mathbb{R})$ be the space of square-summable real sequences. Consider the optimization problem (3), (4) with $f(x) := \|x\|_X^2 / 2$, $g(x) := (x_i / i)_{i=1}^\infty$, and $K$ the nonnegative cone in $X$. It is easy to see that $(\bar{x}, \bar{\lambda}) := (0, 0)$ is the unique KKT point of this problem, and that SOSC holds. Now, let $x^k := e_k / k$ and $\lambda^k := -e_k$, where $\{e_k\}$ is the sequence of unit vectors. Then
\[ \sigma(x^k, \lambda^k) = \|\mathcal{L}(x^k, \lambda^k)\|_{X^*} + \|g(x^k) - P_K(g(x^k) + \lambda^k)\|_H = k^{-2} \]
for all $k$. Moreover, $x^k \to \bar{x}$, but $\lambda^k \not\to \bar{\lambda}$. Hence, a local error bound does not hold. (In particular, SRC cannot hold, even though the Lagrange multiplier is actually unique.)

A slightly different example is obtained by setting $\hat{x}^k := e_k / k^2$ and $\hat{\lambda}^k := -e_k / k$. In this case, $(\hat{x}^k, \hat{\lambda}^k) \to (\bar{x}, \bar{\lambda})$, but an easy calculation shows that $\sigma(\hat{x}^k, \hat{\lambda}^k) = k^{-3}$ and
\[ \|\hat{x}^k - \bar{x}\|_X + \|\hat{\lambda}^k - \bar{\lambda}\|_H = k^{-2} + k^{-1}. \]
In particular, the error bound is violated even if the multiplier is close to $\bar{\lambda}$.

We close this section by noting that the error bound in Theorem 3.2 necessarily implies that the Lagrange multiplier $\bar{\lambda}$ is unique. It is natural to ask whether sufficient conditions can be established which guarantee the error bound property with a nonunique multiplier (as in the statement of Theorem 3.1). However, it turns out that the resulting conditions are often of technical nature, see [9, Thm. 4.51], and not easily verified for common problem classes. Therefore, and since the case covered by Theorem 3.2 suffices for our applications, we restrict ourselves to the situation where $\bar{\lambda}$ is unique.

4 The Augmented Lagrangian Method

We now present the augmented Lagrangian method for the variational inequality (1). The main approach is to penalize the function $g$ and therefore reduce the VI to a sequence of (unconstrained) nonlinear equations. Consider the augmented Lagrangian
\[ \mathcal{L}_\rho : X \times H \to X^*, \quad \mathcal{L}_\rho(x, \lambda) := F(x) + \rho\, g'(x)^* \Big[ g(x) + \frac{\lambda}{\rho} - P_K\Big( g(x) + \frac{\lambda}{\rho} \Big) \Big]. \]
Note that, if $K$ is a cone, then we can simplify the above formula to $\mathcal{L}_\rho(x, \lambda) = F(x) + g'(x)^* P_{K^\circ}(\lambda + \rho g(x))$ by using Moreau's decomposition [4, 44].

For the construction of our algorithm, we will need a means of controlling the penalty parameters. To this end, we define the utility function
\[ V(x, \lambda, \rho) := \|\mathcal{L}_\rho(x, \lambda)\|_{X^*} + \Big\| g(x) - P_K\Big( g(x) + \frac{\lambda}{\rho} \Big) \Big\|_H. \]  (16)
This function requires some elaboration. The first term in (16) measures the precision with which the subproblem was solved in the current iteration. The second term is a composite measure of feasibility and complementarity; it arises from an inherent slack variable transformation which is often used to define the augmented Lagrangian for inequality or cone constraints. As a result, the function $V$ measures optimality, feasibility, and complementarity at the current iterate.

Algorithm 4.1 (Augmented Lagrangian method).

(S.0) Let $(x^0, \lambda^0) \in X \times H$, $B \subseteq H$ bounded, $\rho_0 > 0$, $\gamma > 1$, $\tau \in (0, 1)$, and set $k := 0$.
(S.1) If $(x^k, \lambda^k)$ satisfies a suitable termination criterion: STOP.
(S.2) Choose $w^k \in B$ and compute an inexact zero (see below) $x^{k+1}$ of $\mathcal{L}_{\rho_k}(\cdot, w^k)$.
(S.3) Update the vector of multipliers to
\[ \lambda^{k+1} := \rho_k \Big[ g(x^{k+1}) + \frac{w^k}{\rho_k} - P_K\Big( g(x^{k+1}) + \frac{w^k}{\rho_k} \Big) \Big]. \]  (17)
(S.4) If $k = 0$ or
\[ V(x^{k+1}, w^k, \rho_k) \le \tau\, V(x^k, w^{k-1}, \rho_{k-1}) \]  (18)
holds, set $\rho_{k+1} := \rho_k$; otherwise, set $\rho_{k+1} := \gamma \rho_k$.
(S.5) Set $k \leftarrow k + 1$ and go to (S.1).

Let us make some simple observations. First, regardless of the primal iterates $\{x^k\}$, the multipliers $\{\lambda^k\}$ always lie in the polar cone $K_\infty^\circ$ by Lemma 2.2.
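Steps (S.0)–(S.5) can be sketched on a toy one-dimensional instance; all data below are illustrative and not from the paper: minimize $\frac{1}{2}x^2$ subject to $g(x) = 1 - x \le 0$, so that $K = (-\infty, 0]$, $F(x) = x$, $g'(x) = -1$, and the unique KKT point is $(\bar{x}, \bar{\lambda}) = (1, 1)$. For simplicity, the penalty parameter is kept fixed (the test (18) in (S.4) is omitted), and the subproblem in (S.2) is solved exactly, which is possible in closed form here.

```python
# Toy sketch of Algorithm 4.1 (illustrative data): minimize 0.5*x^2
# subject to g(x) = 1 - x <= 0, i.e. K = (-inf, 0].  For this K the
# multiplier update (17) reduces to lambda <- max(0, w + rho*g(x)).

def solve_subproblem(w, rho):
    """Exact zero of L_rho(., w), i.e. solve x = max(0, w + rho*(1 - x)).

    This piecewise-linear equation has the closed form
    x = (w + rho)/(1 + rho) whenever the max term is active."""
    x = (w + rho) / (1.0 + rho)
    return x if w + rho * (1.0 - x) > 0.0 else 0.0

def alm(lam=0.0, rho=10.0, B=100.0, iters=30):
    x = 0.0
    for _ in range(iters):
        w = min(max(lam, 0.0), B)             # safeguard: w_k = P_B(lambda_k)
        x = solve_subproblem(w, rho)          # (S.2): zero of L_rho(., w_k)
        lam = max(0.0, w + rho * (1.0 - x))   # (S.3): update (17) for this K
    return x, lam

x, lam = alm()
# The iterates approach the KKT point (1, 1); the multiplier error
# contracts by a factor 1/(1 + rho) per outer iteration.
```

The observed contraction factor $1/(1+\rho)$ is in line with the $1/\rho_k$ local rate discussed in the introduction; in the full method, (S.4) would increase $\rho_k$ whenever $V$ fails to decrease sufficiently.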
Moreover, if $K$ is a cone, then the Moreau decomposition [44] implies that $\lambda^{k+1} = P_{K^\circ}(w^k + \rho_k g(x^{k+1}))$.

Secondly, we note that Algorithm 4.1 uses a safeguarded multiplier sequence $\{w^k\}$ in certain places where classical augmented Lagrangian methods use the sequence $\{\lambda^k\}$. This bounding scheme goes back to [1, 46] and is crucial to establishing strong global convergence results for the method [1, 7, 8, 41]. In practice, one usually tries to keep $w^k$ as “close” as possible to $\lambda^k$, e.g. by defining $w^k := P_B(\lambda^k)$, where $B$ (the bounded set from the algorithm) is chosen suitably to allow cheap projections.

The third observation is that if the sequence of penalty parameters $\{\rho_k\}$ remains bounded, then (18) yields $V(x^{k+1}, w^k, \rho_k) \to 0$. In this case, the definition of $V$ implies that both the residual $\|\mathcal{L}_{\rho_k}(x^{k+1}, w^k)\|_{X^*}$ of the subproblems and the composite feasibility-complementarity measure converge to zero. Hence, from a theoretical point of view, the case of bounded $\{\rho_k\}$ is the “good” case. In Section 6, we will actually prove the boundedness of $\{\rho_k\}$ under certain assumptions, and this result crucially depends on the fact that the function $V$ involves both terms from (16).

For the remainder of this paper, we make the following assumption.

Assumption 4.2.
There is a null sequence $\{\varepsilon_k\} \subseteq [0, \infty)$ such that
\[ \|\mathcal{L}_{\rho_k}(x^{k+1}, w^k)\|_{X^*} \le \varepsilon_{k+1} \quad \text{for all } k. \]
This assumption is fairly natural and basically asserts that $x^{k+1}$ is an approximate zero of $\mathcal{L}_{\rho_k}(\cdot, w^k)$, and that the degree of inexactness vanishes as $k \to \infty$.

5 Global Convergence
In this section, we discuss the global convergence properties of Algorithm 4.1. Some general results in this direction were obtained in [38, 41] for optimization and generalized Nash equilibrium problems by assuming that the sequence $\{x^k\}$ has a limit point which satisfies a suitable constraint qualification.

Here, we pursue a slightly different approach. Since the constraints occurring in VIs are often convex, we can use this convexity to directly show that (weak) limit points are solutions of the VI. This idea has the advantage that we do not need any constraint qualification (in return, we do not get much information on the sequence $\{\lambda^k\}$).

Recall that we have already assumed $F$ to be continuously differentiable and $g$ twice continuously differentiable (for this section, one degree less would actually be sufficient). We now make the following additional assumptions.

Assumption 5.1.
We assume that $g$ is concave with respect to $K_\infty$ (see Section 2) and that $\langle F(x), x - y \rangle$ is weakly sequentially lsc with respect to $x$ for all $y \in X$.

The first of the above conditions ensures the convexity of the set $M$, see Lemma 2.1. The second assumption implies, roughly speaking, that weak limit points of a sequence of “approximate solutions” of the VI are exact solutions. Note that this condition has also been used in certain existence results for VIs [31].

Lemma 5.2.
Let Assumptions 4.2, 5.1 hold, and let ¯ x be a weak limit point of { x k } .Then ¯ x is a minimizer of the convex function d K ◦ g . In particular, if the feasible set M is nonempty, then ¯ x is feasible.Proof. Note that the function d K ◦ g is convex by Lemma 2.1 and continuous, henceweakly sequentially lower semicontinuous [4, Thm. 9.1]. If { ρ k } remains bounded, thenthe penalty updating scheme (18) implies d K ( g ( x k +1 )) ≤ (cid:13)(cid:13)(cid:13)(cid:13) g ( x k +1 ) − P K (cid:18) g ( x k +1 ) + w k ρ k (cid:19)(cid:13)(cid:13)(cid:13)(cid:13) H ≤ V ( x k +1 , w k , ρ k ) → d K ( g (¯ x )) = 0. We now assume that ρ k → ∞ and define the auxiliaryfunctions h k ( x ) = d K ( g ( x ) + w k /ρ k ). Note that h k is continuously differentiable [4, Cor.12.30]. Let x k +1 (cid:42) K ¯ x for some index set K ⊆ N and assume that there is a point y ∈ X with d K ( g ( y )) < d K ( g (¯ x )). The weak sequential lower semicontinuity of d K ◦ g and theboundedness of { w k } imply thatlim inf k ∈K h k ( x k +1 ) = lim inf k ∈K d K (cid:0) g ( x k +1 ) + w k /ρ k (cid:1) ≥ d K ( g (¯ x ))and h k ( y ) → d K ( g ( y )). Hence, there is a constant c > h k ( x k +1 ) − h k ( y ) ≥ c for all k ∈ K sufficiently large. Since h k is convex by Lemma 2.1, it follows that (cid:10) h (cid:48) k ( x k +1 ) , y − x k +1 (cid:11) ≤ h k ( y ) − h k ( x k +1 ) ≤ − c (19)for all k ∈ K sufficiently large. Now, let { ε k } be the sequence from Assumption 4.2.Using [4, Cor. 12.30] for the derivative of h k , we obtain − ε k +1 (cid:107) y − x k +1 (cid:107) X ≤ (cid:10) L ρ k ( x k +1 , w k ) , y − x k +1 (cid:11) = (cid:10) F ( x k +1 ) , y − x k +1 (cid:11) + ρ k (cid:10) h (cid:48) k ( x k +1 ) , y − x k +1 (cid:11) .
By Assumption 5.1, the function $\langle F(x), x - y\rangle$ is weakly sequentially lsc with respect to $x$. Hence, there is a constant $c' \in \mathbb R$ such that $\langle F(x^{k+1}), y - x^{k+1}\rangle \le c'$ for all $k \in \mathcal K$. Together with (19), this implies
\[ -\varepsilon_{k+1} \| y - x^{k+1} \|_X \le c' - \rho_k c \to -\infty. \]
Since $\{x^{k+1}\}_{\mathcal K}$ is bounded and $\varepsilon_k \to 0$, this is a contradiction.

Note that Lemma 5.2 guarantees that every weak limit point $\bar x$ automatically minimizes the constraint violation even if the feasible set $M$ is empty. We now prove the optimality of limit points. To this end, we first need a technical lemma which essentially asserts some sort of "approximate normality" of $\lambda^{k+1}$ with respect to $K$ and $g(x^{k+1})$, the latter not necessarily being an element of $K$. Note that the result does not require any assumptions but follows directly from the definition of $\lambda^{k+1}$ and the updating scheme (18). Lemma 5.3.
We have $\limsup_{k \to \infty} \bigl( \lambda^{k+1}, y - g(x^{k+1}) \bigr) \le 0$ for all $y \in K$.

Proof. Let $y \in K$ and define the sequence $s^{k+1} := P_K\bigl( g(x^{k+1}) + w^k/\rho_k \bigr)$. Then $s^{k+1} \in K$, and it follows from [4, Prop. 6.46] that $\lambda^{k+1} \in N_K(s^{k+1})$. Moreover, we have
\[ g(x^{k+1}) = \frac{\lambda^{k+1} - w^k}{\rho_k} + s^{k+1}. \tag{20} \]
This yields
\[ \bigl( \lambda^{k+1}, y - g(x^{k+1}) \bigr) = \Bigl( \lambda^{k+1}, y - \frac{\lambda^{k+1} - w^k}{\rho_k} - s^{k+1} \Bigr) \le \frac{1}{\rho_k} \Bigl[ \bigl( \lambda^{k+1}, w^k \bigr) - \| \lambda^{k+1} \|_H^2 \Bigr], \tag{21} \]
where we used $\lambda^{k+1} \in N_K(s^{k+1})$ for the last inequality. Now, if $\{\rho_k\}$ is bounded, then (18) and (20) imply $\| \lambda^{k+1} - w^k \|_H / \rho_k \to 0$ and hence $\| \lambda^{k+1} - w^k \|_H \to 0$. This yields the boundedness of $\{\lambda^{k+1}\}$ in $H$ as well as
\[ \bigl( \lambda^{k+1}, w^k \bigr) - \| \lambda^{k+1} \|_H^2 = \bigl( \lambda^{k+1}, w^k - \lambda^{k+1} \bigr) \to 0, \]
and the assertion follows from (21). Assume now that $\rho_k \to \infty$. Note that the right-hand side of (21) is a quadratic function in $\lambda^{k+1}$; maximizing this quadratic (its maximum is $\| w^k \|_H^2 / 4$, attained at $\lambda^{k+1} = w^k/2$) shows that
\[ \bigl( \lambda^{k+1}, y - g(x^{k+1}) \bigr) \le \frac{\| w^k \|_H^2}{4 \rho_k}. \]
The boundedness of $\{w^k\}$ now implies $\limsup_{k \to \infty} \bigl( \lambda^{k+1}, y - g(x^{k+1}) \bigr) \le 0$.

An interesting special case arises when $K$ is a cone. By inserting $y = 0 \in K$ into the above inequality, it is easy to see that it is equivalent to $\liminf_{k \to \infty} \bigl( \lambda^{k+1}, g(x^{k+1}) \bigr) \ge 0$. Theorem 5.4.
Let Assumptions 4.2 and 5.1 hold, and let $\bar x$ be a weak limit point of $\{x^k\}$. If the feasible set $M$ is nonempty, then $\bar x$ is feasible and solves the VI.

Proof. Let $x^{k+1} \rightharpoonup_{\mathcal K} \bar x$ for some $\mathcal K \subseteq \mathbb N$. The feasibility claim follows from Lemma 5.2. For the optimality, let $y \in M$ be any feasible point. Then $\langle \mathcal L_{\rho_k}(x^{k+1}, w^k), y - x^{k+1} \rangle \ge -\varepsilon_{k+1} \| y - x^{k+1} \|_X$ by Assumption 4.2 and, since $\mathcal L_{\rho_k}(x^{k+1}, w^k) = \mathcal L(x^{k+1}, \lambda^{k+1})$, we get
\[ -\varepsilon_{k+1} \| y - x^{k+1} \|_X \le \bigl\langle F(x^{k+1}) + g'(x^{k+1})^* \lambda^{k+1}, y - x^{k+1} \bigr\rangle \le \bigl\langle F(x^{k+1}), y - x^{k+1} \bigr\rangle + \bigl( \lambda^{k+1}, g(y) - g(x^{k+1}) \bigr), \]
where we used the fact that $x \mapsto ( \lambda^{k+1}, g(x) )$ is convex by Lemma 2.1 (recall that $\lambda^{k+1} \in K_\infty^\circ$). Using $\varepsilon_k \to 0$, the boundedness of $\{x^{k+1}\}_{\mathcal K}$, and Lemma 5.3 (note that $g(y) \in K$), we obtain $\liminf_{k \in \mathcal K} \langle F(x^{k+1}), y - x^{k+1} \rangle \ge 0$. Since $\langle F(x), x - y \rangle$ is weakly sequentially lsc, this implies $\langle F(\bar x), y - \bar x \rangle \ge 0$. As $y \in M$ was arbitrary, $\bar x$ solves the VI.

We will now consider the local convergence characteristics of Algorithm 4.1. A key ingredient is the error bound property from Section 3, which allows us to estimate the distance from $(x^k, \lambda^k)$ to $(\bar x, \bar\lambda)$ by means of the function $\sigma$ from (10). Lemma 6.1.
Let Assumption 4.2 hold, and let $(\bar x, \bar\lambda)$ be a KKT point satisfying the error bound (12). Then there is an $r > 0$ such that, if $x^k \in B_r(\bar x)$ for all $k$ and $d_K(g(x^k)) \to 0$, then $(x^k, \lambda^k) \to (\bar x, \bar\lambda)$.

Proof. By Assumption 4.2, we have $\mathcal L(x^{k+1}, \lambda^{k+1}) = \mathcal L_{\rho_k}(x^{k+1}, w^k) \to 0$. Hence, in view of the error bound property, it suffices to show that $g(x^{k+1}) - P_K(g(x^{k+1}) + \lambda^{k+1}) \to 0$. Define $s^{k+1} := P_K(g(x^{k+1}) + w^k/\rho_k)$. Then $s^{k+1} \in K$ and, as noted before, $\lambda^{k+1} \in N_K(s^{k+1})$ for all $k$. We now use the fact that $y \mapsto y - P_K(y + \lambda^{k+1})$ is nonexpansive, which is an easy consequence of [4, Cor. 4.10]. Together with the triangle inequality, this yields
\[ \| g(x^{k+1}) - P_K(g(x^{k+1}) + \lambda^{k+1}) \|_H \le \| g(x^{k+1}) - s^{k+1} \|_H + \| s^{k+1} - P_K(s^{k+1} + \lambda^{k+1}) \|_H. \tag{22} \]
The last term is equal to zero since $\lambda^{k+1} \in N_K(s^{k+1})$, cf. [4, Cor. 6.46]. Hence, to complete the proof, we only need to show that $\| s^{k+1} - g(x^{k+1}) \|_H \to 0$. If $\{\rho_k\}$ is bounded, then this readily follows from the penalty updating scheme (18). On the other hand, if $\rho_k \to \infty$, then
\[ \| s^{k+1} - g(x^{k+1}) \|_H \le \| s^{k+1} - P_K(g(x^{k+1})) \|_H + d_K(g(x^{k+1})) \le \frac{\| w^k \|_H}{\rho_k} + d_K(g(x^{k+1})) \to 0, \]
where we used the nonexpansiveness of the projection operator.

The above lemma gives us some information about the behavior of zeros of the augmented Lagrangian in a neighborhood of $\bar x$. Note that the assumption $d_K(g(x^k)) \to 0$ asserts asymptotic feasibility of the iterates; compare Lemma 5.2. Assumption 6.2.
We assume that $(\bar x, \bar\lambda)$ is a KKT point of the VI which satisfies the local error bound (12). Moreover, the sequence $\{(x^k, \lambda^k)\}$ from Algorithm 4.1 converges strongly to $(\bar x, \bar\lambda)$, and we have $w^k = \lambda^k$ for all $k$ sufficiently large.

One of the above assumptions which might require some elaboration is $w^k = \lambda^k$ for all $k$ sufficiently large. The boundedness of $\{w^k\}$ is key to establishing global convergence of the algorithm, see Section 5. Since $\lambda^k \to \bar\lambda$ in our setting, we do not need to enforce boundedness of $\{w^k\}$ and can simply set $w^k := \lambda^k$ for all $k$. (In the context of Algorithm 4.1, we formally need to choose the bounded set $B$ sufficiently large to allow this.)

We will now prove convergence rates for the primal-dual sequence $\{(x^k, \lambda^k)\}$. Since the distance of $(x^k, \lambda^k)$ to $(\bar x, \bar\lambda)$ admits both upper and lower estimates in terms of the residuals $\sigma_k := \sigma(x^k, \lambda^k)$ by (14), we will largely base our analysis on the sequence $\{\sigma_k\}$; the corresponding results for the primal-dual sequence $\{(x^k, \lambda^k)\}$ then follow directly. Lemma 6.3.
Let Assumptions 4.2 and 6.2 hold, and let $\sigma_k := \sigma(x^k, \lambda^k)$. Then there is a constant $c > 0$ such that
\[ \Bigl( 1 - \frac{c}{\rho_k} \Bigr) \sigma_{k+1} \le \varepsilon_{k+1} + \frac{c}{\rho_k} \sigma_k \]
for all $k \in \mathbb N$ sufficiently large.

Proof. Observe that $\mathcal L_{\rho_k}(x^{k+1}, w^k) = \mathcal L(x^{k+1}, \lambda^{k+1})$ for all $k$. By Assumption 4.2 and the definition of $\sigma_k$, we therefore have
\[ \sigma_{k+1} \le \varepsilon_{k+1} + \| g(x^{k+1}) - P_K(g(x^{k+1}) + \lambda^{k+1}) \|_H. \tag{23} \]
Now, let $k \in \mathbb N$ be large enough so that $w^k = \lambda^k$, and consider again the sequence $s^{k+1} := P_K(g(x^{k+1}) + \lambda^k/\rho_k)$. Using (22), we see that
\[ \| g(x^{k+1}) - P_K(g(x^{k+1}) + \lambda^{k+1}) \|_H \le \| g(x^{k+1}) - s^{k+1} \|_H = \frac{\| \lambda^{k+1} - \lambda^k \|_H}{\rho_k}. \tag{24} \]
Inserting this into (23) and using the triangle inequality yields
\[ \sigma_{k+1} \le \varepsilon_{k+1} + \frac{1}{\rho_k} \bigl( \| \lambda^{k+1} - \bar\lambda \|_H + \| \lambda^k - \bar\lambda \|_H \bigr). \]
Now, by Assumption 6.2 and since $x^k \to \bar x$, there is a $c > 0$ such that $\| \lambda^k - \bar\lambda \|_H \le c\, \sigma_k$ for all $k \in \mathbb N$ sufficiently large. Hence,
\[ \sigma_{k+1} \le \varepsilon_{k+1} + \frac{c}{\rho_k} \sigma_{k+1} + \frac{c}{\rho_k} \sigma_k, \]
again for all $k \in \mathbb N$ sufficiently large. Reordering gives the desired result.

With the above lemma, it is easy to deduce convergence rates for the primal-dual sequence $\{(x^k, \lambda^k)\}$. Theorem 6.4.
Let Assumptions 4.2 and 6.2 hold, and let $\varepsilon_{k+1} = o(\sigma_k)$. Then: (a) For every $q \in (0, 1)$, there is a $\bar\rho_q > 0$ such that, if $\rho_k \ge \bar\rho_q$ for all sufficiently large $k$, then $(x^k, \lambda^k) \to (\bar x, \bar\lambda)$ Q-linearly with rate $q$. (b) The sequence of penalty parameters $\{\rho_k\}$ remains bounded.

Proof. Let $k \in \mathbb N$ be sufficiently large so that $w^k = \lambda^k$. By Lemma 6.3, if $\rho_k$ is large enough so that $1 - c/\rho_k > 0$, then
\[ \frac{\sigma_{k+1}}{\sigma_k} \le \frac{c}{\rho_k - c} + o(1). \tag{25} \]
Using (12) and the local Lipschitz continuity of $\sigma$ (see, e.g., (14)), it is easy to derive (a). For (b), let us again consider the sequence $s^{k+1} = P_K(g(x^{k+1}) + \lambda^k/\rho_k)$, and define
\[ V_{k+1} := V(x^{k+1}, w^k, \rho_k) = \| \mathcal L_{\rho_k}(x^{k+1}, w^k) \|_{X^*} + \| g(x^{k+1}) - s^{k+1} \|_H. \]
To prove the boundedness of $\{\rho_k\}$, we need to show that $V_{k+1} \le \tau V_k$ for all sufficiently large $k$. Using (24) and $\mathcal L_{\rho_k}(x^{k+1}, w^k) = \mathcal L(x^{k+1}, \lambda^{k+1})$, we obtain
\[ V_{k+1} \ge \| \mathcal L(x^{k+1}, \lambda^{k+1}) \|_{X^*} + \| g(x^{k+1}) - P_K(g(x^{k+1}) + \lambda^{k+1}) \|_H \ge \sigma_{k+1} \]
for all $k \in \mathbb N$ and, from (24) and Assumption 4.2,
\[ V_{k+1} = \| \mathcal L_{\rho_k}(x^{k+1}, w^k) \|_{X^*} + \frac{\| \lambda^{k+1} - \lambda^k \|_H}{\rho_k} \le \varepsilon_{k+1} + \frac{\| \lambda^{k+1} - \bar\lambda \|_H + \| \lambda^k - \bar\lambda \|_H}{\rho_k} \le \varepsilon_{k+1} + \frac{c}{\rho_k} ( \sigma_{k+1} + \sigma_k ) \]
for all $k \in \mathbb N$ sufficiently large, where $c$ is the constant from (12) (recall that $x^k \to \bar x$). Putting these inequalities together yields
\[ \frac{V_{k+1}}{V_k} \le \frac{\varepsilon_{k+1}}{\sigma_k} + \frac{c}{\rho_k} \cdot \frac{\sigma_{k+1} + \sigma_k}{\sigma_k} = \frac{\varepsilon_{k+1}}{\sigma_k} + \frac{c}{\rho_k} \Bigl( 1 + \frac{\sigma_{k+1}}{\sigma_k} \Bigr). \]
If we now assume that $\rho_k \to \infty$, then it is easy to deduce from (25) and $\varepsilon_{k+1} = o(\sigma_k)$ that $V_{k+1}/V_k \to 0$. Hence, $V_{k+1} \le \tau V_k$ for all $k$ sufficiently large, which contradicts the assumption that $\rho_k \to \infty$.

The assumption $\varepsilon_{k+1} = o(\sigma_k)$ in the above theorem says that, roughly speaking, the degree of inexactness should be small enough not to affect the rate of convergence. Note that we are comparing $\varepsilon_{k+1}$ to the optimality measure $\sigma_k$ of the previous iterates $(x^k, \lambda^k)$. Hence, it is easy to ensure this condition in practice, for instance, by always computing the next iterate $x^{k+1}$ with a precision $\varepsilon_{k+1} \le z_k \sigma_k$ for some fixed null sequence $\{z_k\}$.

Let us also note that one can easily adapt the proof of Theorem 6.4(a) to conclude that $(x^k, \lambda^k) \to (\bar x, \bar\lambda)$ Q-superlinearly if $\rho_k \to \infty$. However, the resulting assertion would be redundant because part (b) of the theorem actually implies the boundedness of $\{\rho_k\}$. On the other hand, the proof of (b) uses the specific penalty updating scheme (18) with the function $V$ from (16), whereas the proof of (a) does not depend on the penalty updating rule at all. If we replace $V$ by the function
\[ \tilde V(x, \lambda, \rho) := \Bigl\| g(x) - P_K\Bigl( g(x) + \frac{\lambda}{\rho} \Bigr) \Bigr\|_H \]
(which is just the second term from the definition of $V$), it is rather easy to see that the assertions of Lemmas 6.1, 6.3 and Theorem 6.4(a) remain true. In this case, we additionally obtain superlinear convergence if $\rho_k \to \infty$, but we do not get boundedness of $\{\rho_k\}$.

Let us close this section by mentioning two special cases for which different or stronger rate of convergence results can be obtained. The first case is that of convex optimization. Here, the augmented Lagrangian algorithm is essentially equivalent to a proximal point method applied to the dual problem, and this duality can be used to establish certain rate of convergence results, see [13, 27, 40, 50]. The second special case, which was already mentioned in the introduction, is that of nonlinear programming-type (NLP) constraints.
Here, it is possible to prove local linear convergence under SOSC alone [19]. Constraint qualifications are not needed since the set $K$ is polyhedral; see the discussion in the introduction and in [9, Section 4.4]. However, the techniques used in [19] rely heavily on finite-dimensional arguments and on the specific structure of NLP constraints, and thus cannot readily be adapted to our setting.

7 Applications and Numerical Results

This section describes some applications of our method. Recall that our variational setting encompasses constrained optimization problems (3). This opens up a broad spectrum of applications, including, as mentioned before, standard nonlinear programming (NLP). However, there already is a plethora of literature on this topic, in particular the recent paper [19]. Moreover, the discussion in Section 3 indicates that NLP is actually a very confined special case which does not allow us to demonstrate the full generality of our approach. In particular, NLPs are inherently finite-dimensional and the corresponding set $K$ is polyhedral, which is very restrictive.

As a result, we focus on problems in function space settings, where the constraint set is almost never polyhedral. This section contains two examples in this direction: we begin with a simple linear-quadratic optimal control problem and then continue with multiobjective optimal control in a Nash equilibrium framework. For both examples, we first present the general problem setting and then explain why the regularity properties from Assumption 6.2 are satisfied.

To verify our theoretical results in practice, we follow a standard approach: we discretize the respective problems and analyze the behavior of the algorithm for increasingly fine levels of discretization. As we shall see, the assertions of the previous section can be verified in both examples, independently of the discretization level $n$, which corroborates our theoretical results.

7.1 Linear-Quadratic Optimal Control

Let $\Omega \subseteq \mathbb R^d$, $d \in \{2, 3\}$, be a bounded domain.
The example presented in this section consists of minimizing
\[ J(y, u) := \frac{1}{2} \| y - y_d \|_{L^2(\Omega)}^2 + \frac{\alpha}{2} \| u \|_{L^2(\Omega)}^2 \]
subject to $y \in H_0^1(\Omega) \cap C(\bar\Omega)$ and $u \in L^2(\Omega)$ satisfying the partial differential equation (PDE) and pointwise control constraints
\[ -\Delta y = u + f \quad \text{and} \quad u_a \le u \le u_b. \]
Here, $y_d, u_a, u_b \in L^2(\Omega)$ are problem-specific data and $\alpha > 0$ is a regularization parameter. For every $w \in L^2(\Omega)$, the Poisson equation $-\Delta y = w$ admits a uniquely determined weak solution $y = Sw \in H_0^1(\Omega) \cap C(\bar\Omega)$, and the resulting operator $S : L^2(\Omega) \to H_0^1(\Omega) \cap C(\bar\Omega)$ is linear and compact [51, Thm. 4.17]. Writing $y_u := S(u + f)$, we can now restate the objective function as
\[ \bar J(u) := J(y_u, u) = \frac{1}{2} \| y_u - y_d \|_{L^2(\Omega)}^2 + \frac{\alpha}{2} \| u \|_{L^2(\Omega)}^2. \]
This function together with the control constraints $u_a \le u \le u_b$ is typically called the reduced formulation of the optimal control problem; it directly fits into our variational framework by setting $X := H := L^2(\Omega)$, $F(u) := \bar J'(u)$, and
\[ g(u) := u, \qquad K := \{ u \in X : u_a \le u \le u_b \}. \]
Since $\bar J$ is strongly convex and $g$ is just the identity mapping on $X = H$, it is easy to show that the above problem admits a unique primal-dual solution, and that both SOSC and SRC hold. Hence, by Theorem 3.2, the KKT system is upper Lipschitz stable and the control problem admits a local error bound.

We now present a numerical example which is constructed in such a way that the optimal solution is known analytically. Let $\Omega := (0, 1)^2$ be the unit square and define $\alpha := 1$, $u_a := -0.5$, and $u_b := 0.5$. Consider the functions
\[ \bar y(x) := \sin(\pi x_1) \sin(\pi x_2), \qquad \bar p(x) := \sin(2\pi x_1) \sin(2\pi x_2), \]
and set $y_d := \bar y + \Delta \bar p$. Now, using $\bar u := P_{[u_a, u_b]}(-\bar p / \alpha)$ and $f := -\Delta \bar y - \bar u$, it is easy to see that $\bar u$ is a solution to the problem. Moreover, $\bar y$ is the corresponding state, $\bar p$ the so-called adjoint state [51], and the Lagrange multiplier is given by $\bar\lambda := -\bar p - \alpha \bar u$.

For the numerical testing, we discretized the problem by means of a uniform grid with $n \in \mathbb N$ interior points per row and column (i.e., $n^2$ points in total) and approximated the Laplace operator by a standard five-point finite difference scheme. It is easy to argue that the resulting discretized versions of $\bar J$ and $g$ again satisfy the (now finite-dimensional) SOSC and SRC assumptions (since $\bar J$ is strongly convex and $g$ is the identity mapping). Hence, we can expect locally fast convergence of the augmented Lagrangian method, both from a continuous and from a discrete point of view.

The implementation of the algorithm was done in MATLAB® and uses the starting point $(u^0, \lambda^0) := (0, 0)$, the initial penalty parameter $\rho_0 := 1$, and the updating parameters $\gamma := 10$ and $\tau \in (0, 1)$, together with the formula $w^k := P_B(\lambda^k)$ for the safeguarded multipliers, where $B$ is a (large) box in $\mathbb R^{n^2}$ (see the discussion in Section 4). The outer and inner iterations are terminated as soon as $\sigma(x, \lambda)$ and $\| \mathcal L_{\rho_k}(x, w^k) \|$, respectively, fall below prescribed tolerances, where the norm is the discrete $L^2$-norm. The subproblems are nonlinear equations which we solve with a standard semismooth Newton method. It should be noted that, while the discrete Laplacian is a sparse matrix, the solution operator $S$ which occurs in the function $F$ is nearly dense. To circumvent this issue, we use a sparse Cholesky factorization of the negative Laplacian to obtain an "implicit" form of $S$ and solve the Newton equations with the MATLAB® conjugate gradient method pcg.
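The analytic construction above can be checked directly on the grid. The following NumPy sketch is our own illustration (not the MATLAB implementation used for the experiments): it evaluates $\bar u := P_{[u_a, u_b]}(-\bar p/\alpha)$ and $\bar\lambda := -\bar p - \alpha \bar u$ at the interior grid points and verifies the projected KKT residual $\bar u - P_K(\bar u + \bar\lambda)$, which should vanish for the exact pair.

```python
import numpy as np

# Sketch (not the authors' code): evaluate the analytic solution of the control
# example on a uniform interior grid and check the projected KKT residual.
n = 64                                    # interior grid points per row/column
alpha, ua, ub = 1.0, -0.5, 0.5            # parameters from the example
h = 1.0 / (n + 1)
grid = np.arange(1, n + 1) * h
x1, x2 = np.meshgrid(grid, grid)
pbar = np.sin(2 * np.pi * x1) * np.sin(2 * np.pi * x2)    # adjoint state
ubar = np.clip(-pbar / alpha, ua, ub)                      # optimal control
lbar = -pbar - alpha * ubar                                # Lagrange multiplier
# KKT projection residual u - P_K(u + lambda); zero wherever lambda has the
# correct sign structure relative to the active bounds.
res = np.max(np.abs(ubar - np.clip(ubar + lbar, ua, ub)))
print(res)
```

Wherever the bound is inactive we have $\bar\lambda = 0$, and wherever it is active the multiplier has the correct sign, so the projection returns $\bar u$ itself and `res` is zero up to rounding.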
[Table 1: iteration history for n = 64, 256, 1024; columns k, ρ_k, σ_k, dist_k.]

Table 1 shows the results for different values of $n$, where each line contains the penalty parameter $\rho_k$, the optimality measure $\sigma_k$, and the distance $\mathrm{dist}_k$ of $(u^k, \lambda^k)$ to $(\bar u, \bar\lambda)$. The results suggest that the algorithm works very well for this problem; in particular, the number of required iterations remains constant as $n$ increases. Moreover, we also observe that the rate of convergence appears to be proportional to $1/\rho_k$, as suggested by the theory. It should be noted, however, that the distances $\mathrm{dist}_k$ stop decreasing after a certain point because of the inexactness induced by the discretization; in particular, if we discretize the (known) optimal solution pair $(\bar u, \bar\lambda)$, we do not obtain an exact solution of the discretized problem. This phenomenon is also evidenced by the fact that the "limit" value of $\mathrm{dist}_k$ decreases as $n$ increases.

We close this section with an important remark on the analytical representation of the feasible set. This observation is crucial and was in fact one of our main motivations for considering constraint sets $K$ which are not necessarily cones. Remark 7.1.
It is important that we define the constraint system with $g$ and $K$ as above. Indeed, the alternative formulation of the box constraints as $\hat g(u) \in \hat K$ with
\[ \hat g(u) := (u - u_a, u_b - u), \qquad \hat K := \{ (v, w) \in L^2(\Omega)^2 : v, w \ge 0 \}, \]
may seem advantageous at first glance (since $\hat K$ is a closed convex cone, whereas $K$ is not). However, in this formulation, the strict Robinson condition is not satisfied. In fact, the function $\hat g$ does not even satisfy the standard Robinson constraint qualification (RCQ) [9] or the equivalent regularity condition of Zowe and Kurcyusz [55]. We refer the reader to [51] for a formal proof; an alternative way to verify this irregularity is to note that, if RCQ holds, then it remains stable under small perturbations of the constraint function [9]. However, even if $u_a$ and $u_b$ are "well separated", it is fairly easy to construct small perturbations (in the sense of $L^2$) which make the lower and upper bounds coincide on some set of positive measure. If this happens, then the set of Lagrange multipliers corresponding to a local minimum is unbounded, and RCQ is violated.

7.2 Optimal Control in a Nash Equilibrium Framework

We now present a generalization of the optimal control problem from the previous section by considering it in a multi-player framework [10, 15, 38]. The result is a Nash equilibrium problem (NEP) of two players with control variables $u_1, u_2 \in L^2(\Omega)$ and a state variable $y \in H_0^1(\Omega) \cap C(\bar\Omega)$, where $\Omega \subseteq \mathbb R^d$, $d \in \{2, 3\}$, is again a bounded domain. Similarly to before, each player attempts to minimize the objective function
\[ J_i(y, u_i) := \frac{1}{2} \| y - y_d^i \|_{L^2(\Omega)}^2 + \frac{\alpha_i}{2} \| u_i \|_{L^2(\Omega)}^2 \]
with respect to $u_i$, subject to the partial differential equation $-\Delta y = u_1 + u_2 + f$ and the pointwise control constraints $a_i \le u_i \le b_i$ with $a_i, b_i \in L^2(\Omega)$. The remaining problem parameters satisfy $\alpha_i > 0$ and $y_d^i \in L^2(\Omega)$ for all $i$.
As in Section 7.1, we can use the compact linear solution operator $S : L^2(\Omega) \to H_0^1(\Omega) \cap C(\bar\Omega)$ and the resulting control-to-state mapping $y_u := S(u_1 + u_2 + f)$ to transform the objective functions into
\[ \bar J_i(u) := J_i(y_u, u_i) = \frac{1}{2} \| y_u - y_d^i \|_{L^2(\Omega)}^2 + \frac{\alpha_i}{2} \| u_i \|_{L^2(\Omega)}^2, \]
where $u := (u_1, u_2)$. To establish the connection with our variational problem (1), (4), we only need to make some definitions and use the well-known correspondence between NEPs and VIs [16, 38]. Define
\[ X := H := L^2(\Omega)^2, \qquad F(u) := \bigl( D_{u_1} \bar J_1(u), D_{u_2} \bar J_2(u) \bigr), \]
and
\[ g(u_1, u_2) := (u_1, u_2), \qquad K := \{ (u_1, u_2) \in X : a_i \le u_i \le b_i \text{ for } i = 1, 2 \}. \]
Then it is easy to see that the NEP is equivalent to the VI (1) (and (2), since the feasible set is convex). The existence of a solution of the NEP (and of the VI) can be shown as in [10]; moreover, since $g$ is the identity operator on $X = H$, SRC holds and the problem admits a unique Lagrange multiplier. Finally, an easy calculation shows that
\[ F'(u) = \begin{pmatrix} S^* S + \alpha_1 I & S^* S \\ S^* S & S^* S + \alpha_2 I \end{pmatrix}, \]
where $I$ is the identity operator on $L^2(\Omega)$, see [38]. It follows that $F$ is strongly monotone and, since $g$ is linear, the problem automatically satisfies SOSC and therefore admits a local error bound by Theorem 3.2. Moreover, it is easy to see that the same holds for the discretized problems presented below.

We now present some numerical results for the example from [10]. The setting is again constructed in such a way that the optimal solution is known. In fact, the construction is very similar to the one from the previous section: let $\Omega := (0, 1)^2$ be the unit square and define $\alpha_i := 1$, $a_i := -0.5$, and $b_i := 0.5$ for all $i$. Consider the functions
\[ \bar y(x) := \sin(\pi x_1) \sin(\pi x_2), \qquad \bar p_1(x) := -\sin(2\pi x_1) \sin(2\pi x_2), \qquad \bar p_2(x) := -\sin(3\pi x_1) \sin(3\pi x_2), \]
as well as $y_d^i := \bar y + \Delta \bar p_i$ and $\bar u_i := P_{[a_i, b_i]}(-\bar p_i / \alpha_i)$ for all $i$, and finally $f := -\Delta \bar y - \bar u_1 - \bar u_2$. Then it is easy to see that $\bar u$ is a Nash equilibrium. The corresponding state is given by $\bar y$, the variables $\bar p_i$ are the adjoint states of the players, and the Lagrange multiplier is given by $\bar\lambda := (-\bar p_1 - \alpha_1 \bar u_1, -\bar p_2 - \alpha_2 \bar u_2)$.

[Table 2: iteration history for n = 64, 256, 1024; columns k, ρ_k, σ_k, dist_k.]

Table 2 shows the results for different values of $n$; each line contains the penalty parameter $\rho_k$, the optimality measure $\sigma_k$, and the distance $\mathrm{dist}_k$ of $(u^k, \lambda^k)$ to $(\bar u, \bar\lambda)$. We observe good consistency of the results with our established theory; in particular, the rate of convergence is roughly proportional to $1/\rho_k$. We also highlight once again that the distances $\mathrm{dist}_k$ do not converge to zero because of the inexactness induced by the discretization.

We close this section by noting that, as explained in Remark 7.1 for the standard (single-objective) optimal control problem, it is very important to define $g$ and $K$ precisely as we did in order to ensure the fulfillment of the strict Robinson condition.

7.3 Elliptic Parameter Estimation

This example is based on the theory in [32, 34]. For the sake of simplicity, we restrict ourselves to the one-dimensional case. Let $\Omega \subseteq \mathbb R$ be a bounded interval and consider the elliptic differential equation
\[ -\nabla \cdot (q \nabla u) = f, \qquad u \in H_0^1(\Omega), \tag{26} \]
where $q \in H^1(\Omega)$ and $f \in H^{-1}(\Omega)$. The parameter estimation problem now consists of the minimization of the tracking-type functional
\[ J(q, u) := \frac{1}{2} \| u - z \|_{H^1(\Omega)}^2 + \frac{\beta}{2} \| q \|_{H^1(\Omega)}^2 \tag{27} \]
subject to (26) and $q \ge \alpha$, where $z \in H^1(\Omega)$ and $\alpha, \beta > 0$. To formulate this problem in our variational framework, let $X := H := H^1(\Omega) \times H_0^1(\Omega)$, $F := (D_q J, D_u J)$, and
\[ g(q, u) := \begin{pmatrix} q - \alpha \\ -\Delta^{-1} \bigl( \nabla \cdot (q \nabla u) + f \bigr) \end{pmatrix}, \qquad K := H_+^1(\Omega) \times \{0\}, \]
where $H_+^1(\Omega)$ is the nonnegative cone in $H^1(\Omega)$. Note that the second component of $g$ is essentially the differential equation (26), but premultiplied with $-\Delta^{-1}$ to map the result back into $H_0^1(\Omega)$.

[Figure 1: computed solutions q of the parameter estimation problem for n = 256 (left) and n = 1024 (right).]

The existence of solutions to (27) can be shown by eliminating $u$ in (26) and using the coercivity of $J$, see [34]. Let $(\bar q, \bar u)$ be a solution of the problem. Then
\[ g'(\bar q, \bar u) = \begin{pmatrix} \mathrm{id}_{H^1} & 0 \\ T_{\bar u} & T_{\bar q} \end{pmatrix}, \]
where $T_{\bar u}(q) := -\Delta^{-1}(\nabla \cdot (q \nabla \bar u))$ and $T_{\bar q}(u) := -\Delta^{-1}(\nabla \cdot (\bar q \nabla u))$. Observe now that $T_{\bar q} : H_0^1(\Omega) \to H_0^1(\Omega)$ is surjective. This follows from the fact that $\Delta : H_0^1(\Omega) \to H^{-1}(\Omega)$ is an isomorphism and that $u \mapsto \nabla \cdot (\bar q \nabla u)$ is surjective onto $H^{-1}(\Omega)$ by the Lax-Milgram theorem (since $\bar q \ge \alpha > 0$). It follows that $g'(\bar q, \bar u)$ is surjective, and thus the strict Robinson condition is satisfied in $(\bar q, \bar u)$.

Let us furthermore assume that the second-order sufficient condition holds in $(\bar q, \bar u)$. The precise verification of this condition would require knowledge of the solution, but the second-order condition is very plausible since the objective in (27) is strongly convex (by virtue of the $H^1$-regularization term). Under the present assumptions, the problem admits the local error bound from Theorem 3.2. The corresponding residual mapping $\sigma : X \times H \to \mathbb R$ takes on the form
\[ \sigma(q, u, \mu, \lambda) := \| F(q, u) + g'(q, u)^* (\mu, \lambda) \|_{X^*} + \| g(q, u) - P_K\bigl( g(q, u) + (\mu, \lambda) \bigr) \|_H, \]
where $(\mu, \lambda) \in H = H^1(\Omega) \times H_0^1(\Omega)$ is the pair of Lagrange multipliers.
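The surjectivity of $g'(\bar q, \bar u)$ used above can be made explicit; the following short verification (our own sketch, using only the operators defined above) exploits the triangular structure of the derivative. Given an arbitrary target $(a, b) \in H^1(\Omega) \times H_0^1(\Omega)$, a preimage is obtained componentwise:

```latex
% Triangular structure of g'(\bar q, \bar u): solving for a preimage of (a, b).
g'(\bar q, \bar u)(\delta q, \delta u)
   = \begin{pmatrix} \delta q \\ T_{\bar u}\,\delta q + T_{\bar q}\,\delta u \end{pmatrix}
   = \begin{pmatrix} a \\ b \end{pmatrix}
\quad\Longleftrightarrow\quad
\delta q = a
\quad\text{and}\quad
T_{\bar q}\,\delta u = b - T_{\bar u}\,a.
```

The second equation is solvable for $\delta u$ because $T_{\bar q}$ is an isomorphism of $H_0^1(\Omega)$: it is the composition of $u \mapsto \nabla \cdot (\bar q \nabla u)$, which is invertible (up to sign) by the Lax-Milgram theorem since $\bar q \ge \alpha > 0$, with the isomorphism $-\Delta^{-1}$.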
The vector $\mu$ corresponds to the lower bound constraint $q \ge \alpha$ (the first component of $g$), whereas $\lambda$ belongs to the partial differential equation (the second component of $g$).

We now present some numerical results. For practical purposes, we slightly alter the penalization scheme from Algorithm 4.1 in the sense that we augment the equality constraint only and leave the inequality constraint $q \ge \alpha$ unchanged. This has the benefit that we avoid the computation of projections and distance functions involving $H_+^1(\Omega)$. The resulting modifications to Algorithm 4.1 are fairly straightforward (see, for instance, [6, 8, 38]). Indeed, the augmented subproblems are now variational problems constrained to the set $\{ q \in H^1(\Omega) : q \ge \alpha \}$. Moreover, in the updating scheme of the penalty parameter, we have to take into account the multiplier corresponding to the lower inequality constraint, which has to be recovered from the solution process of the corresponding constrained subproblem.

The example we present is [32, Ex. 6]. The domain $\Omega := (0, 1)$ is discretized by means of $n \in \mathbb N$ points, including boundary points, and the derivative operators are approximated by forward differences. The problem is constructed by setting
\[ q_0(x) := 1 + x, \qquad z(x) := u_0(x) := \sin(\pi x), \qquad f(x) := (1 + x) \pi^2 \sin(\pi x) - \pi \cos(\pi x), \]
so that $-\nabla \cdot (q_0 \nabla u_0) = f$. Since $z = u_0$, an exact solution of (27) for $\beta = 0$ is simply given by $(q_0, u_0)$. For $\beta > 0$, which is the preferable case from a numerical perspective, the solutions are different in general.

The implementation of the algorithm was done in MATLAB® and uses the starting point $(q^0, u^0, \mu^0, \lambda^0) := (1, 0, 0, 0)$, a lower bound $\alpha \in (0, 1)$, the initial penalty parameter $\rho_0 := 1$, and the updating parameters $\gamma := 10$ and $\tau \in (0, 1)$, together with $w^k := P_B(\lambda^k)$, where $B$ is the closed ball with radius 10 around zero in $H_0^1(\Omega)$. The outer and inner iterations are terminated as soon as $\sigma(q, u, \mu, \lambda)$ and $\| \mathcal L_{\rho_k}(q, u, w^k) + \mu^k \|_{X^*}$, respectively, fall below prescribed tolerances, where $\mu^k$ is the Lagrange multiplier corresponding to the constraint $q \ge \alpha$. Finally, the augmented subproblems were solved by the fmincon routine, which takes into account the lower box constraint.

[Table 3: iteration history for n ∈ {256, 1024} and two values of β; columns k, ρ_k, σ_k.]

Table 3 contains the corresponding iteration numbers for different values of $n$ and $\beta$. We again observe linear convergence of the optimality measures $\sigma_k$, and the sequences of penalty parameters remain bounded. The only exception is the eighth iteration for $n = 1024$ and $\beta = 1$, which may be due to the subproblem routine fmincon failing to find a sufficiently exact minimizer. Finally, Figure 1 compares the computed solutions $q$ for different $n$ and $\beta$ to the exact solution $q_0$ for $\beta = 0$.

8 Final Remarks
We have presented a method of augmented Lagrangian type for the solution of variational problems in Banach spaces. In particular, we have shown global and local convergence of the algorithm under suitable assumptions.

The assumptions needed for the local convergence results include, in particular, a local error bound for the distance of a pair $(x, \lambda)$ to a KKT point $(\bar x, \bar\lambda)$. This property has played a central role in our analysis and is a consequence of the second-order sufficient condition together with a strict version of the Robinson constraint qualification.

The above results suggest that error bounds are the natural framework for the local convergence analysis of augmented Lagrangian methods. We therefore hope that the results in this paper will find applications in other areas of optimization. In particular, an interesting idea would be to specialize some of the assumptions and results to problem classes such as optimal control or semidefinite programming. Another aspect which could lead to further developments is the concept of partial penalization, which arises when the problem formulation contains additional constraints that are not penalized, see [1, 6, 8] and the example in Section 7.3.

References

[1] R. Andreani, E. G. Birgin, J. M. Martínez, and M. L. Schuverdt. On augmented Lagrangian methods with general lower-level constraints.
SIAM J. Optim., 18(4):1286-1309, 2007.

[2] R. Andreani, E. G. Birgin, J. M. Martínez, and M. L. Schuverdt. Augmented Lagrangian methods under the constant positive linear dependence constraint qualification. Math. Program., 111(1-2, Ser. B):5-32, 2008.

[3] C. Baiocchi and A. Capelo. Variational and Quasivariational Inequalities. John Wiley & Sons, Inc., New York, 1984.

[4] H. H. Bauschke and P. L. Combettes. Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, New York, 2011.

[5] D. P. Bertsekas. Constrained Optimization and Lagrange Multiplier Methods. Academic Press, Inc. [Harcourt Brace Jovanovich, Publishers], New York-London, 1982.

[6] E. G. Birgin, D. Fernández, and J. M. Martínez. The boundedness of penalty parameters in an augmented Lagrangian method with constrained subproblems.
Optim. Methods Softw., 27(6):1001-1024, 2012.

[7] E. G. Birgin, C. A. Floudas, and J. M. Martínez. Global minimization using an augmented Lagrangian method with variable lower-level constraints. Math. Program., 125(1, Ser. A):139-162, 2010.

[8] E. G. Birgin and J. M. Martínez. Practical Augmented Lagrangian Methods for Constrained Optimization. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2014.

[9] J. F. Bonnans and A. Shapiro. Perturbation Analysis of Optimization Problems. Springer Series in Operations Research. Springer-Verlag, New York, 2000.

[10] A. Borzì and C. Kanzow. Formulation and numerical solution of Nash equilibrium multiobjective elliptic control problems. SIAM J. Control Optim., 51(1):718-744, 2013.

[11] A. R. Conn, N. I. M. Gould, and P. L. Toint. A globally convergent augmented Lagrangian algorithm for optimization with general constraints and simple bounds. SIAM J. Numer. Anal., 28(2):545-572, 1991.

[12] C. Ding, D. Sun, and L. Zhang. Characterization of the robust isolated calmness for a class of conic programming problems.
SIAM J. Optim., 27(1):67-90, 2017.

[13] Y. Dong. Comments on "The proximal point algorithm revisited". J. Optim. Theory Appl., 166(1):343-349, 2015.

[14] A. L. Dontchev and R. T. Rockafellar. Characterizations of Lipschitzian stability in nonlinear programming. In Mathematical Programming with Data Perturbations, volume 195 of Lecture Notes in Pure and Appl. Math., pages 65-82. Dekker, New York, 1998.

[15] A. Dreves and J. Gwinner. Jointly convex generalized Nash equilibria and elliptic multiobjective optimal control. J. Optim. Theory Appl., 168(3):1065-1086, 2016.

[16] F. Facchinei, A. Fischer, and V. Piccialli. On generalized Nash games and variational inequalities.
Oper. Res. Lett., 35(2):159-164, 2007.

[17] F. Facchinei and C. Kanzow. Generalized Nash equilibrium problems. Ann. Oper. Res., 175:177-211, 2010.

[18] F. Facchinei and J.-S. Pang. Finite-Dimensional Variational Inequalities and Complementarity Problems. Vol. I. Springer-Verlag, New York, 2003.

[19] D. Fernández and M. V. Solodov. Local convergence of exact and inexact augmented Lagrangian methods under the second-order sufficient optimality condition. SIAM J. Optim., 22(2):384-407, 2012.

[20] A. Fischer. Local behavior of an iterative framework for generalized equations with nonisolated solutions. Math. Program., 94(1, Ser. A):91-124, 2002.

[21] A. Fischer, M. Herrich, and K. Schönefeld. Generalized Nash equilibrium problems - recent advances and challenges. Pesquisa Operacional, 34:521-558, 2014.

[22] C. A. Floudas and P. M. Pardalos, editors.
Encyclopedia of Optimization. Vol. I–VI .Kluwer Academic Publishers, Dordrecht, 2001.[23] M. Fortin and R. Glowinski.
Augmented Lagrangian Methods: Applications to theNumerical Solution of Boundary-Value Problems , volume 15 of
Studies in Mathematicsand its Applications . North-Holland Publishing Co., Amsterdam, 1983.2424] R. Glowinski.
Numerical Methods for Nonlinear Variational Problems . ScientificComputation. Springer-Verlag, Berlin, 2008. Reprint of the 1984 original.[25] R. Glowinski.
Variational Methods for the Numerical Solution of Nonlinear EllipticProblems , volume 86 of
CBMS-NSF Regional Conference Series in Applied Mathe-matics . Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA,2015.[26] R. Glowinski, J.-L. Lions, and R. Tr´emoli`eres.
Numerical Analysis of VariationalInequalities , volume 8 of
Studies in Mathematics and its Applications . North-HollandPublishing Co., Amsterdam-New York, 1981.[27] O. G¨uler. On the convergence of the proximal point algorithm for convex minimization.
SIAM J. Control Optim. , 29(2):403–419, 1991.[28] M. R. Hestenes. Multiplier and gradient methods.
J. Optimization Theory Appl. ,4:303–320, 1969.[29] M. Hinterm¨uller and K. Kunisch. Feasible and noninterior path-following in con-strained minimization with low multiplier regularity.
SIAM J. Control Optim. ,45(4):1198–1221, 2006.[30] M. Hinterm¨uller, T. Surowiec, and A. K¨ammler. Generalized Nash equilibriumproblems in Banach spaces: theory, Nikaido-Isoda-based path-following methods, andapplications.
SIAM J. Optim. , 25(3):1826–1856, 2015.[31] G. Isac.
Complementarity Problems , volume 1528 of
Lecture Notes in Mathematics .Springer-Verlag, Berlin, 1992.[32] K. Ito, M. Kroller, and K. Kunisch. A numerical study of an augmented Lagrangianmethod for the estimation of parameters in elliptic systems.
SIAM J. Sci. Statist.Comput. , 12(4):884–910, 1991.[33] K. Ito and K. Kunisch. The augmented Lagrangian method for equality and inequalityconstraints in Hilbert spaces.
Math. Programming , 46(3, (Ser. A)):341–360, 1990.[34] K. Ito and K. Kunisch. The augmented Lagrangian method for parameter estimationin elliptic systems.
SIAM J. Control Optim. , 28(1):113–136, 1990.[35] K. Ito and K. Kunisch. Augmented Lagrangian methods for nonsmooth, convexoptimization in Hilbert spaces.
Nonlinear Anal. , 41(5-6, Ser. A: Theory Methods):591–616, 2000.[36] K. Ito and K. Kunisch.
Lagrange Multiplier Approach to Variational Problems andApplications . Society for Industrial and Applied Mathematics (SIAM), Philadelphia,PA, 2008.[37] A. F. Izmailov and M. V. Solodov. Stabilized SQP revisited.
Math. Program. , 133(1-2,Ser. A):93–120, 2012. 2538] C. Kanzow, V. Karl, D. Steck, and D. Wachsmuth. The multiplier-penalty method forgeneralized Nash equilibrium problems in Banach spaces.
Technical Report , Instituteof Mathematics, University of W¨urzburg, July 2017.[39] C. Kanzow and D. Steck. An example comparing the standard and safeguardedaugmented Lagrangian methods.
Oper. Res. Lett. , 45(6):598–603, 2017.[40] C. Kanzow and D. Steck. A generalized proximal-point method for convex optimiza-tion problems in Hilbert spaces.
Optimization , 66(10):1667–1676, 2017.[41] C. Kanzow, D. Steck, and D. Wachsmuth. An augmented Lagrangian method foroptimization problems in Banach spaces.
SIAM J. Control Optim. , to appear.[42] D. Kinderlehrer and G. Stampacchia.
An Introduction to Variational Inequalitiesand Their Applications , volume 31 of
Classics in Applied Mathematics . Society forIndustrial and Applied Mathematics (SIAM), Philadelphia, PA, 2000. Reprint of the1980 original.[43] J. Kyparisis. On uniqueness of Kuhn-Tucker multipliers in nonlinear programming.
Math. Programming , 32(2):242–246, 1985.[44] J.-J. Moreau. D´ecomposition orthogonale d’un espace hilbertien selon deux cˆonesmutuellement polaires.
C. R. Acad. Sci. Paris , 255:238–240, 1962.[45] J. Nocedal and S. J. Wright.
Numerical Optimization . Springer, New York, secondedition, 2006.[46] J.-S. Pang and M. Fukushima. Quasi-variational inequalities, generalized Nashequilibria, and multi-leader-follower games.
Comput. Manag. Sci. , 2(1):21–56, 2005.[47] M. J. D. Powell. A method for nonlinear constraints in minimization problems. In
Optimization (Sympos., Univ. Keele, Keele, 1968) , pages 283–298. Academic Press,London, 1969.[48] R. T. Rockafellar. A dual approach to solving nonlinear programming problems byunconstrained optimization.
Math. Programming , 5:354–373, 1973.[49] R. T. Rockafellar. Augmented Lagrange multiplier functions and duality in nonconvexprogramming.
SIAM J. Control , 12:268–285, 1974.[50] R. T. Rockafellar. Augmented Lagrangians and applications of the proximal pointalgorithm in convex programming.
Math. Oper. Res. , 1(2):97–116, 1976.[51] F. Tr¨oltzsch.
Optimal Control of Partial Differential Equations . American Mathe-matical Society, Providence, RI, 2010.[52] M. Ulbrich.
Semismooth Newton Methods for Variational Inequalities and ConstrainedOptimization Problems in Function Spaces , volume 11 of
MOS-SIAM Series onOptimization . Society for Industrial and Applied Mathematics (SIAM), Philadelphia,PA; Mathematical Optimization Society, Philadelphia, PA, 2011.2653] G. Wachsmuth. On LICQ and the uniqueness of Lagrange multipliers.
Oper. Res.Lett. , 41(1):78–80, 2013.[54] A. P. Wierzbicki and S. Kurcyusz. Projection on a cone, penalty functionals andduality theory for problems with inequality constraints in Hilbert space.
SIAM J.Control Optimization , 15(1):25–56, 1977.[55] J. Zowe and S. Kurcyusz. Regularity and stability for the mathematical programmingproblem in Banach spaces.