From perspective maps to epigraphical projections
Michael P. Friedlander
Department of Computer Science/Department of Mathematics, University of British Columbia, 2366 Main Mall, Vancouver, BC, V6T 1Z4, [email protected], https://friedlander.io
Ariel Goodwin
Department of Mathematics and Statistics, McGill University, 805 Sherbrooke St West, Montréal, Québec, H3A 0B9, [email protected], https://github.com/arielgoodwin
Tim Hoheisel
Dedicated to James V. Burke, our collaborator and friend, on the occasion of his 65th birthday
The projection onto the epigraph or a level set of a closed proper convex function can be achieved by finding a root of a scalar equation that involves the proximal operator as a function of the proximal parameter. This paper develops the variational analysis of this scalar equation. The approach is based on a study of the variational-analytic properties of general convex optimization problems that are (partial) infimal projections of the sum of the function in question and the perspective map of a convex kernel. When the kernel is the Euclidean norm squared, the solution map corresponds to the proximal map, and thus the variational properties derived for the general case apply to the proximal case. Properties of the value function and the corresponding solution map, including local Lipschitz continuity, directional differentiability, and semismoothness, are derived. An SC optimization framework for computing epigraphical and level-set projections is thus established. Numerical experiments on 1-norm projection illustrate the effectiveness of the approach as compared with specialized algorithms.

Key words: Proximal map, Moreau envelope, subdifferential, Fenchel conjugate, perspective map, epigraph, infimal projection, infimal convolution, set-valued map, coderivative, graphical derivative, semismoothness*, SC optimization

MSC2000 subject classification: 52A4, 65K10, 90C25, 90C46
1. Introduction
The Moreau proximal map of a closed proper convex function $f$ that maps a finite-dimensional Euclidean space $\mathbb{E}_f$ to $\overline{\mathbb{R}} := \mathbb{R} \cup \{+\infty\}$ is given by the minimizing set
$$P_\lambda f(x) = \operatorname*{argmin}_{u \in \mathbb{E}_f} \left\{ f(u) + \frac{1}{2\lambda}\|x - u\|^2 \right\} \qquad (\lambda > 0).$$
The proximal map is a central operation of algorithms for nonsmooth optimization, including first-order methods such as proximal gradient and operator splitting [3, 35]. Geometrically, the proximal map corresponds to the Euclidean projection $P_{\operatorname{epi} f}$ onto the epigraph $\operatorname{epi} f$; see Fig. 1. Indeed, for all positive $\lambda$ and $x_\lambda := P_\lambda f(x)$,
$$\bigl(x_\lambda, f(x_\lambda)\bigr) = P_{\operatorname{epi} f}\bigl(x, f(x_\lambda) - \lambda\bigr). \tag{1}$$
Thus, the projection of an arbitrary point $(x, \alpha) \in (\mathbb{E}_f \times \mathbb{R}) \setminus \operatorname{epi} f$ corresponds to the proximal map of the base point $x$ using the parameter $\lambda$ that is the unique positive root of the function
$$0 < \lambda \mapsto f(x_\lambda) - \lambda - \alpha. \tag{2}$$
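The root-finding view in (1) and (2) can be sketched in one dimension. The following is a minimal illustration, assuming $f = |\cdot|$ on $\mathbb{R}$, whose proximal map is soft-thresholding. The function names `prox_abs` and `project_epi_abs` are hypothetical, and plain bisection is used here only as a stand-in root finder; it is not the SC method developed later in the paper.

```python
import math

def prox_abs(x, lam):
    """Proximal map of f = |.| with parameter lam > 0 (soft-thresholding)."""
    return math.copysign(max(abs(x) - lam, 0.0), x)

def project_epi_abs(x, alpha, tol=1e-12):
    """Project (x, alpha) onto epi |.| by solving the scalar equation (2).

    Assumption: bisection replaces the paper's SC method as the root finder.
    """
    if abs(x) <= alpha:                 # the point already lies in the epigraph
        return x, alpha
    # g is the function in (2); it is decreasing in lam
    g = lambda lam: abs(prox_abs(x, lam)) - lam - alpha
    lo, hi = 0.0, abs(x) - alpha + 1.0  # g(lo) > 0 and g(hi) <= -1 < 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    x_lam = prox_abs(x, lam)
    return x_lam, abs(x_lam)            # the pair (x_lam, f(x_lam)) from (1)
```

For instance, `project_epi_abs(3.0, 0.0)` approximates the point $(1.5, 1.5)$, which is the Euclidean projection of $(3, 0)$ onto the cone $\{(x, t) \mid t \ge |x|\}$.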
[Figure 1: axes $\mathbb{E}$ and $\mathbb{R}$; labels $x$, $x_\lambda$, $f(x_\lambda)$, $f(x_\lambda) - \lambda$, $f$, $\operatorname{epi} f$.]

Figure 1. The proximal map $x_\lambda := P_\lambda f(x)$ corresponds to the projection of the pair $(x, f(x_\lambda) - \lambda)$ onto the epigraph of $f$; see (1).

This connection between epigraphical projection and the proximal map, described by Beck [4], Bauschke and Combettes [3, Section 29.5], Chierchia et al. [11, Proposition 1], and Meng et al. [31, 32], is a defining feature of a class of epigraphical first-order methods for structured convex optimization over $\mathbb{E}_f$ that operate through a sequence of projections onto the epigraphs of the underlying functions. In effect, these methods operate on an equivalent optimization problem over $\mathbb{E}_f \times \mathbb{R}$ [11, 43, 44, 45].

This paper develops a general analysis that provides, among other things, the variational properties of the maps
$$(x, \lambda) \mapsto x_\lambda := P_\lambda f(x) \quad\text{and}\quad (x, \lambda) \mapsto f(x_\lambda),$$
defined on $\mathbb{E}_f \times \mathbb{R}$. This analysis and its supporting calculus allow us to determine the sensitivity of the epigraphical projection with respect to the simultaneous variation of the base point $x$ and the scaling parameter $\lambda$. Although the resulting mathematical statements are key for our deeper understanding of epigraphical first-order methods, the overall analysis applies much more generally.

The approach we take is based on the variational analysis of the optimal value function
$$p_{L,\omega,f} : (x, \lambda) \in \mathbb{E}_x \times \mathbb{R} \mapsto \inf_{u \in \mathbb{E}_f} f(u) + \omega^\pi(L(u, x), \lambda) \tag{3}$$
and its corresponding solution map. Here, $L$ is a linear map, and the perspective transform $\omega^\pi$ of a closed proper convex function $\omega$ is defined by $\operatorname{epi} \omega^\pi = \operatorname{cl} \mathbb{R}_+(\operatorname{epi}\omega \times \{1\})$. When the linear map $L$ is defined as $(u, x) \mapsto x - u$, the value function (3) is the infimal convolution of the functions $f$ and $\omega^\pi(\cdot, \lambda)$. For this reason, we refer to this value function as the generalized convolution of these two functions.

The convex calculus we establish in Section 3 for the analysis of the generalized convolution (3) provides a key tool for understanding several important cases.
These include the variational properties of infimal convolution (Section 3.3); parametric constrained optimization (Section 3.4); the Moreau envelope of a convex function and the corresponding proximal map (Section 4); and epigraphical and level-set projections, including an SC optimization [20, 36] method for numerically evaluating these projections (Section 7).

The perspective map used in generalized convolution (3) first appears in Rockafellar [38, Corollary 13.5.1], without a particular name attached to it. More recently, Combettes [13], Combettes and Müller [14, 15], and Aravkin et al. [1] describe in detail the properties and applications of this map. Our systematic study of parametric optimization problems with perspective maps, outlined in Section 3, appears to be new.

Section 3.3 establishes the variational properties of infimal convolution, which occurs when the map $L$ is $(u, x) \mapsto x - u$. These results complement the functional smoothing framework described by Beck and Teboulle [5, Section 4.1] and Burke and Hoheisel [8, 9], wherein a smooth approximation to a function $f$ is constructed through the infimal convolution with the perspective map of a smooth and strongly convex regularizer $\omega$. Bougeard et al. [7] and Strömberg [42] provide early contributions to this topic. Theorem 3 describes the Lipschitzian properties of the corresponding optimal solution map as a function of $(x, \lambda)$. Corollary 3 establishes sufficient conditions for this solution map to be semismooth* [23]. These conditions hold, for instance, when $f$ is piecewise linear-quadratic. This analysis complements the study of the proximal case by Meng et al. [31, 32] and Milzarek [33].

A general form of parametric constrained optimization occurs when we specialize the convolution kernel $\omega$ in (3) to be the indicator function of a closed convex set. Section 3.4 focuses the variational analysis of the generalized convolution operation to obtain formulas for the sensitivity of the optimal value of parametric optimization problems with relaxed linear constraints. This analysis includes perturbations to the relaxation parameter and to the right-hand side.

In Section 4 we further focus our analysis of infimal convolution on the proximal case, which occurs when $\omega = \tfrac{1}{2}\|\cdot\|^2$. Here we develop the variational properties of the Moreau envelope and the associated proximal map as a function of the base point $x$ and the proximal parameter $\lambda$, simultaneously. We also establish conditions under which the proximal map is semismooth*.
Special attention is given to the limiting properties as $\lambda \downarrow 0$. […] $\lambda$ to a positive-definite matrix, but makes no statements regarding the limiting case where $\lambda$ (or its matrix counterpart) vanishes, as we do in our general analysis. See also Attouch's seminal monograph [2].

In Section 5 we describe the main continuity properties of the proximal value function
$$0 < \lambda \mapsto f(P_\lambda f(\bar x)), \tag{4}$$
where $\bar x \in \mathbb{E}$ is held fixed. Corollary 11 establishes its Lipschitzian properties and Corollary 12 characterizes it as the derivative of the map $\lambda \mapsto \lambda e_\lambda f(\bar x)$ on $\mathbb{R}_{++}$. Proposition 7 describes sufficient conditions under which the proximal value function is semismooth.

We use our analysis of the proximal value function (4) to establish, via Proposition 8, novel variational formulas for the Moreau envelope and proximal map of post-compositions, i.e., functions of the form $g \circ \psi$, where the scalar function $g$ is increasing and convex, and $\psi$ is closed proper convex. As a consequence, Corollary 14 provides a refined version of the epigraphical projection conditions in (1), including analogous results for the projection onto the level set of $f$ (Corollary 13). This analysis does not require the function to be finite-valued, and extends existing results [3, 4]. Importantly, Corollary 14 shows that the root of the aligning equation (2) coincides with the unique minimizer of a strongly convex scalar optimization problem. It follows from Proposition 7 that the objective for this problem is continuously differentiable with a locally Lipschitz derivative. We use this latter property to derive a novel SC optimization method to find the root of the function (2) and its analog in the level-set case. Numerical experiments in Section 7.2 show that for projection onto the 1-norm unit ball, the resulting SC method is competitive with two specialized state-of-the-art methods: CONDAT [16] and IBIS [30].
Let $\Gamma_0(\mathbb{E})$ denote the set of functions $f : \mathbb{E} \to \overline{\mathbb{R}}$ that are proper closed convex, i.e., the epigraph $\operatorname{epi} f = \{(x, \alpha) \in \mathbb{E} \times \mathbb{R} \mid f(x) \le \alpha\}$ contains no vertical lines and is closed convex. Its level sets are given by $\operatorname{lev}_\alpha f := \{x \in \mathbb{E} \mid f(x) \le \alpha\}$. The Fenchel conjugate of any function $f : \mathbb{E} \to \overline{\mathbb{R}}$ is $f^*(y) = \sup_{x \in \mathbb{E}} \{\langle y, x\rangle - f(x)\}$. The Jacobian of a differentiable map $F : \mathbb{R}^n \to \mathbb{R}^m$ at $x \in \mathbb{R}^n$ is denoted by $F'(x)$. We denote the Euclidean projection of $\bar x$ onto $C$ by $P_C(\bar x)$. Throughout, fractions such as $(1/(2\lambda))$ are abbreviated as $(1/2\lambda)$.

For a set $C \subset \mathbb{E}$, its indicator function is $\delta_C : \mathbb{E} \to \overline{\mathbb{R}}$ given by $\delta_C(x) := 0$ if $x \in C$ and $\delta_C(x) = +\infty$ otherwise. The subdifferential of $\delta_C$ is the normal cone of $C$, i.e.,
$$N_C(\bar x) := \partial\delta_C(\bar x) := \{v \in \mathbb{E} \mid \langle v, x - \bar x\rangle \le 0 \ (x \in C)\},$$
which is empty if $\bar x \notin C$. The relative interior of $C$ is the set $\operatorname{ri} C$ [38, Section 6], and the horizon cone is $C^\infty := \{v \in \mathbb{E} \mid \exists \{\lambda_k\} \downarrow 0, \{x_k \in C\} : \lambda_k x_k \to v\}$. The horizon function of $f \in \Gamma_0(\mathbb{E})$ is the closed proper convex and positively homogeneous function $f^\infty : \mathbb{E} \to \overline{\mathbb{R}}$ defined via $\operatorname{epi} f^\infty = (\operatorname{epi} f)^\infty$.

Let $f_k : \mathbb{E} \to \overline{\mathbb{R}}$. Then we say that the sequence $\{f_k\}$ epi-converges to a function $f : \mathbb{E} \to \overline{\mathbb{R}}$ if
$$\forall x \in \mathbb{E} : \begin{cases} \forall \{x_k\} \to x : & \liminf_{k\to\infty} f_k(x_k) \ge f(x), \\ \exists \{x_k\} \to x : & \limsup_{k\to\infty} f_k(x_k) \le f(x), \end{cases}$$
and we write $f_k \overset{e}{\to} f$. The sequence $\{f_k\}$ is said to converge continuously to $f$ if
$$\lim_{k\to\infty} f_k(x_k) = f(x) \quad \forall x \in \mathbb{E} \text{ and } \{x_k\} \to x,$$
and we write $f_k \overset{c}{\to} f$. Furthermore, $\{f_k\}$ is said to converge pointwise to $f$ if
$$\lim_{k\to\infty} f_k(x) = f(x) \quad \forall x \in \mathbb{E},$$
and we write $f_k \overset{p}{\to} f$. We extend these notions to families of functions $\{f_\lambda\}_{\{\lambda \downarrow 0\}}$ via
$$f_\lambda \overset{\xi}{\to} f \ :\Longleftrightarrow\ \forall \{\lambda_k\} \downarrow 0 : f_{\lambda_k} \overset{\xi}{\to} f \qquad (\xi \in \{p, e, c\}).$$
2. Properties of the perspective map
The perspective map $\omega^\pi$ that appears in the generalized infimal convolution (3) provides a mechanism for controlling, through the parameter $\lambda$, the degree to which the functions $f$ and $\omega$ are combined. Beck and Teboulle [5] and Burke and Hoheisel [8] promoted this technique for generating smooth approximations to nonsmooth functions.

We work with the following definition of the perspective map of $\omega$, which appears in Rockafellar [38, Corollary 13.5.1]:
$$\omega^\pi : (z, \lambda) \in \mathbb{E}_\omega \times \mathbb{R} \mapsto \begin{cases} \lambda\,\omega(z/\lambda) & \text{if } \lambda > 0, \\ \omega^\infty(z) & \text{if } \lambda = 0, \\ +\infty & \text{if } \lambda < 0. \end{cases} \tag{5}$$
For positive values of the parameter $\lambda$, the perspective map corresponds to epi-multiplication:
$$(\lambda \star \omega)(x) := \lambda\,\omega(x/\lambda).$$
The following result confirms the consistency of the perspective map (5) as the parameter $\lambda$ decreases towards zero.

Lemma 1 (Variational convergence of epi-multiplication). Let $\phi \in \Gamma_0(\mathbb{E})$. Then as $\lambda \downarrow 0$, $(\lambda \star \phi)(x) \to \phi^\infty(x)$ for all $x \in \operatorname{dom}\phi$, and $\lambda \star \phi \overset{e}{\to} \phi^\infty$.

Proof.
The pointwise convergence of $(\lambda \star \phi)$ over $\operatorname{dom}\phi$ follows from [38, Corollary 8.5.2]. To prove epi-convergence, observe that, for all $\lambda > 0$ and $x \in \mathbb{E}$, $(\lambda \star \phi)(x) = \phi^\pi(x, \lambda)$. Hence,
$$\liminf_{x \to \bar x,\ \lambda \downarrow 0} (\lambda \star \phi)(x) = \liminf_{x \to \bar x,\ \lambda \downarrow 0} \phi^\pi(x, \lambda) \ge \phi^\pi(\bar x, 0) = \phi^\infty(\bar x) \quad \forall \bar x \in \mathbb{E},$$
where the inequality follows because $\phi^\pi$ is a support function [38, Corollary 13.5.1] and thus closed [25, Proposition 2.1.2].

Fix any sequence $\{\lambda_k\} \downarrow 0$ and $\bar x \in \operatorname{dom}\phi$. Then $(\lambda_k \star \phi)(\bar x) \to \phi^\infty(\bar x)$. Hence, in particular, with $x_k := \bar x$ $(k \in \mathbb{N})$,
$$\limsup_{k\to\infty} (\lambda_k \star \phi)(x_k) \le \phi^\infty(\bar x) \tag{6}$$
for all $\bar x \in \operatorname{dom}\phi$. Now let $\bar x \notin \operatorname{dom}\phi$, take $\hat x \in \operatorname{dom}\phi$ and define $x_k := \lambda_k \hat x + (1 - \lambda_k)\bar x \to \bar x$. Then
$$\phi^\infty(\bar x) = \sup_{t > 0} \frac{\phi(\hat x + t\bar x) - \phi(\hat x)}{t} \ge \frac{\phi\bigl(\hat x + (\tfrac{1}{\lambda_k} - 1)\bar x\bigr) - \phi(\hat x)}{\tfrac{1}{\lambda_k} - 1} = \lambda_k \cdot \frac{\phi\bigl(\hat x + (\tfrac{1}{\lambda_k} - 1)\bar x\bigr) - \phi(\hat x)}{1 - \lambda_k}$$
for all $k \in \mathbb{N}$ sufficiently large. Hence for such $k \in \mathbb{N}$,
$$(\lambda_k \star \phi)(x_k) = \lambda_k\,\phi\!\left(\frac{\lambda_k \hat x + (1 - \lambda_k)\bar x}{\lambda_k}\right) \le (1 - \lambda_k)\,\phi^\infty(\bar x) + \lambda_k\,\phi(\hat x).$$
Take the limit superior to obtain (6) here. This establishes epi-convergence. □
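The pointwise part of Lemma 1 can be observed numerically. The sketch below is an illustration only, assuming the example $\phi(x) = \sqrt{1 + x^2}$ on $\mathbb{R}$ (not taken from the paper), whose horizon function is $\phi^\infty(x) = |x|$; the function names are hypothetical.

```python
import math

def phi(x):
    """Example function (an assumption for illustration): phi(x) = sqrt(1 + x^2)."""
    return math.sqrt(1.0 + x * x)

def epi_mult(lam, x):
    """Epi-multiplication (lam * phi)(x) = lam * phi(x / lam), for lam > 0."""
    return lam * phi(x / lam)

def phi_horizon(x):
    """Horizon function of phi: the limit of phi(t x) / t as t grows, here |x|."""
    return abs(x)

x = 2.0
# Gap between the epi-multiple and the horizon function as lam decreases to 0
gaps = [abs(epi_mult(lam, x) - phi_horizon(x)) for lam in (1.0, 0.1, 0.01, 0.001)]
```

Here $(\lambda \star \phi)(x) = \sqrt{\lambda^2 + x^2}$, so the entries of `gaps` shrink monotonically towards zero, in line with the lemma.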
The following result summarizes key properties of the perspective map. It also provides a support-function representation, which means that it can be written as the support function $\sigma_{\mathcal{D}}(y) \equiv \delta_{\mathcal{D}}^*(y) = \sup_{x \in \mathcal{D}} \langle x, y\rangle$ for some set $\mathcal{D}$.

Proposition 1 (Properties of perspective map). For $\omega \in \Gamma_0(\mathbb{E}_\omega)$, the following hold:
(a) $\omega^\pi(z, \lambda) = \sigma_{\operatorname{epi}\omega^*}(z, -\lambda)$, hence $\omega^\pi \in \Gamma_0(\mathbb{E}_\omega \times \mathbb{R})$ is sublinear with $\operatorname{dom}\omega^\pi = \mathbb{R}_+(\operatorname{dom}\omega \times \{1\})$;
(b) $(\omega^\pi)^*(y, \beta) = \delta_{\operatorname{epi}\omega^*}(y, -\beta)$;
(c) for all $(z, \lambda) \in \operatorname{dom}\omega^\pi$,
$$\partial\omega^\pi(z, \lambda) = \begin{cases} \{(y, -\beta) \mid y \in \partial\omega(z/\lambda),\ \beta = \omega^*(y)\} & \text{if } \lambda > 0, \\ \{(y, -\beta) \mid y \in \partial\omega^\infty(z),\ (y, \beta) \in \operatorname{epi}\omega^*\} & \text{if } \lambda = 0. \end{cases} \tag{7}$$

Proof. For Parts (a) and (b) see [38, Corollary 13.5.1]. Part (c) follows from [13, Proposition 2.3] or [1, Lemma 3.8]. □

The expression for the subdifferential (7), evaluated at the origin, reduces to $\partial\omega^\pi(0, 0) = \{(y, -\beta) \mid (y, \beta) \in \operatorname{epi}\omega^*\}$, which is just the epigraph of $\omega^*$ under the reflection $(z, \lambda) \mapsto (z, -\lambda)$. This follows from the subdifferential formula $\partial\omega^\infty(0) = \partial\sigma_{\operatorname{dom}\omega^*}(0) = \operatorname{dom}\omega^*$; cf. [39, Corollary 8.25]. Combettes [13, Corollary 2.5] provides a simplified characterization of Proposition 1 under the additional assumption that $\omega$ is supercoercive [3, Definition 11.11].
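As a worked instance of Proposition 1, consider the proximal kernel $\omega = \tfrac12\|\cdot\|^2$, for which $\omega^* = \tfrac12\|\cdot\|^2$ and $\omega^\infty = \delta_{\{0\}}$. The computation below is a sketch added for illustration; it uses only the definition (5) and the formulas in Parts (a) and (c).

```latex
% Perspective map of the kernel \omega = \tfrac12\|\cdot\|^2, from definition (5):
\omega^\pi(z,\lambda) =
\begin{cases}
  \dfrac{\|z\|^2}{2\lambda} & \text{if } \lambda > 0,\\[1ex]
  \delta_{\{0\}}(z)         & \text{if } \lambda = 0,\\[1ex]
  +\infty                   & \text{if } \lambda < 0.
\end{cases}

% Part (a): support-function representation, checked for \lambda > 0.
\sigma_{\operatorname{epi}\omega^*}(z,-\lambda)
  = \sup_{y,\ \beta \ge \frac12\|y\|^2} \bigl\{ \langle y, z\rangle - \lambda\beta \bigr\}
  = \sup_{y} \bigl\{ \langle y, z\rangle - \tfrac{\lambda}{2}\|y\|^2 \bigr\}
  = \frac{\|z\|^2}{2\lambda}.

% Part (c): for \lambda > 0 the subdifferential is the gradient of \|z\|^2/(2\lambda):
\partial\omega^\pi(z,\lambda)
  = \Bigl\{ \Bigl( \tfrac{z}{\lambda},\ -\tfrac{\|z\|^2}{2\lambda^2} \Bigr) \Bigr\},
\qquad\text{i.e. } y = \tfrac{z}{\lambda},\quad \beta = \omega^*(y) = \tfrac{\|z\|^2}{2\lambda^2}.
```

The supremum over $y$ is attained at $y = z/\lambda$, which is the same point that appears in the subdifferential formula.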
3. Partial infimal projection with perspective maps
Our main objective in this section is to deduce the variational properties of the generalized infimal convolution $p_{L,\omega,f}$ defined by (3). Throughout this section, we make the assumptions that $L$ is a linear map from $\mathbb{E}_f \times \mathbb{E}_x$ to $\mathbb{E}_\omega$ for Euclidean spaces $\mathbb{E}_i$, $i \in \{f, x, \omega\}$, that $f \in \Gamma_0(\mathbb{E}_f)$ and $\omega \in \Gamma_0(\mathbb{E}_\omega)$, and that $\operatorname{range} L \subseteq \mathbb{R}_+ \operatorname{dom}\omega$. Under these standing assumptions, it follows from Theorem 1 below that $p_{L,\omega,f}$ is convex.
We lead with a general result on infimal projections.
Theorem 1 (Conjugate and subdifferentials of infimal projection). For a function $\psi \in \Gamma_0(\mathbb{E}_1 \times \mathbb{E}_2)$, the infimal projection
$$p : x \in \mathbb{E}_1 \mapsto \inf_u \psi(x, u) \tag{8}$$
is convex and
(a) $p^* = \psi^*(\cdot, 0)$, which is closed and convex;
(b) $\partial p(x) = \{v \mid (v, 0) \in \partial\psi(x, \bar u)\}$ for all $\bar u \in \operatorname{argmin} \psi(x, \cdot)$;
(c) $p^* \in \Gamma_0(\mathbb{E}_1)$ if and only if $\operatorname{dom}\psi^*(\cdot, 0) \ne \emptyset$;
(d) $p \in \Gamma_0(\mathbb{E}_1)$ if $\operatorname{dom}\psi^*(\cdot, 0) \ne \emptyset$, and hence the infimum in (8) is attained when finite.

Proof. For convexity of $p$ and Parts (a,b,d,e), see, e.g., [26, Theorem 3.101]. Part (c) follows from Part (b) via Rockafellar [38, Theorem 23.5]. □

The following auxiliary result is used in this section to derive conjugate and subdifferential formulas for the value function $p_{L,\omega,f}$.

Lemma 2 (Domain and conjugate of linear-perspective composition). The function
$$\eta : (u, x, \lambda) \in \mathbb{E}_f \times \mathbb{E}_x \times \mathbb{R} \mapsto \omega^\pi(L(u, x), \lambda)$$
is closed proper convex, i.e., $\eta \in \Gamma_0(\mathbb{E}_f \times \mathbb{E}_x \times \mathbb{R})$. The nonempty domain and its (possibly empty) relative interior are given by
$$\operatorname{dom}\eta = \{(u, x, \lambda) \mid \lambda \ge 0,\ L(u, x) \in \lambda \cdot \operatorname{dom}\omega\},$$
$$\operatorname{ri}(\operatorname{dom}\eta) = \{(u, x, \lambda) \mid \lambda > 0,\ L(u, x) \in \lambda \cdot \operatorname{ri}(\operatorname{dom}\omega)\}.$$
If $\operatorname{ri}(\operatorname{dom}\eta)$ is nonempty, then $\eta^*$ is the indicator of the set
$$C = \{(w, z, \mu) \mid \exists y : (y, -\mu) \in \operatorname{epi}\omega^*,\ L^*(y) = (w, z)\}. \tag{9}$$

Proof.
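Proposition 1(a) asserts that $\eta \in \Gamma_0(\mathbb{E}_f \times \mathbb{E}_x \times \mathbb{R})$, and also yields the expression for its domain. Now assume that $\operatorname{ri}(\operatorname{dom}\eta)$ is nonempty, and that there exists an element $(u, x)$ such that $L(u, x) \in \lambda \cdot \operatorname{ri}(\operatorname{dom}\omega)$ for some $\lambda > 0$. Define the linear map $\tilde L : (u, x, \lambda) \mapsto (L(u, x), \lambda)$. Then,
$$\begin{aligned}
\emptyset &\ne \{(u, x, t) \mid t > 0,\ L(u, x) \in t \cdot \operatorname{ri}(\operatorname{dom}\omega)\} \\
&= \{(u, x, \lambda) \mid \exists t > 0 : L(u, x) \in t \cdot \operatorname{ri}(\operatorname{dom}\omega),\ \lambda = t\} \\
&= \tilde L^{-1}\bigl(\mathbb{R}_{++}(\operatorname{ri}(\operatorname{dom}\omega) \times \{1\})\bigr) \\
&\overset{\text{(i)}}{=} \tilde L^{-1} \operatorname{ri}\bigl(\mathbb{R}_+(\operatorname{dom}\omega \times \{1\})\bigr) \\
&\overset{\text{(ii)}}{=} \operatorname{ri}\bigl(\tilde L^{-1}\,\mathbb{R}_+(\operatorname{dom}\omega \times \{1\})\bigr) \\
&= \operatorname{ri}(\tilde L^{-1} \operatorname{dom}\omega^\pi) = \operatorname{ri}(\operatorname{dom}\eta),
\end{aligned}$$
where (i) uses [38, Corollary 6.8.1] and (ii) uses [38, Theorem 6.7] and the fact that $\tilde L^{-1}\operatorname{ri}(\mathbb{R}_+(\operatorname{dom}\omega \times \{1\})) \ne \emptyset$.

To derive the formula for $\eta^*$, observe that by our reasoning above $\tilde L^{-1}\operatorname{ri}(\operatorname{dom}\omega^\pi) = \operatorname{ri}(\operatorname{dom}\eta) \ne \emptyset$. Hence, by [38, Theorem 16.3] and Proposition 1(b),
$$\begin{aligned}
\eta^*(w, z, \mu) &= (\omega^\pi \circ \tilde L)^*(w, z, \mu) \\
&= \inf_{(u, \alpha)} \bigl\{(\omega^\pi)^*(u, \alpha) \bigm| \tilde L^*(u, \alpha) = (w, z, \mu)\bigr\} \\
&= \inf_u \{(\omega^\pi)^*(u, \mu) \mid L^*(u) = (w, z)\} \\
&= \inf_u \{\delta_{\operatorname{epi}\omega^*}(u, -\mu) \mid L^*(u) = (w, z)\} \\
&= \delta_C(w, z, \mu),
\end{aligned}$$
which establishes (9). □

We can now deduce the subdifferential and conjugate of the generalized convolution (3).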
Theorem 2 (Conjugate and subdifferential of the generalized convolution). Under the assumptions of Lemma 2, suppose in addition that
$$\exists (u, x) \in \operatorname{ri}(\operatorname{dom} f) \times \mathbb{E}_x : L(u, x) \in \mathbb{R}_{++}\operatorname{ri}(\operatorname{dom}\omega). \tag{10}$$
Then the following hold for the convex function $p_{L,\omega,f}$ defined in (3).
(a) $p^*_{L,\omega,f}(y, \mu) = \inf_w \{f^*(w) \mid \exists a : (a, -\mu) \in \operatorname{epi}\omega^*,\ L^*(a) = (-w, y)\}$ and the infimum is attained when finite.
(b) For all $(x, \lambda) \in \operatorname{dom} p_{L,\omega,f}$ and all $\bar u \in \operatorname{argmin}_{u \in \mathbb{E}_f} \{f(u) + \omega^\pi(L(u, x), \lambda)\}$,
$$\partial p_{L,\omega,f}(x, \lambda) = \begin{cases} \{(v, -\omega^*(y)) \mid y \in \partial\omega(L(\bar u, x)/\lambda),\ (0, v) \in D(\bar u, y)\} & \text{if } \lambda > 0, \\ \{(v, -\beta) \mid y \in \partial\omega^\infty(L(\bar u, x)),\ (0, v) \in D(\bar u, y),\ (y, \beta) \in \operatorname{epi}\omega^*\} & \text{if } \lambda = 0, \end{cases}$$
where $D(u, y) := \partial f(u) \times \{0\} + L^*(y)$.
(c) $p^*_{L,\omega,f} \in \Gamma_0(\mathbb{E}_x \times \mathbb{R})$ if and only if there exist $w \in \operatorname{dom} f^*$, $a \in \operatorname{dom}\omega^*$, $(y, \mu) \in \mathbb{E}_x \times \mathbb{R}$ such that $(a, -\mu) \in \operatorname{epi}\omega^*$ and $L^*(a) = (-w, y)$. In this case, $p_{L,\omega,f} \in \Gamma_0(\mathbb{E}_x \times \mathbb{R})$ and the infimum is attained when finite.

Proof. Set $p = p_{L,\omega,f}$.

Part (a). Observe that $p(x, \lambda) = \inf_u \psi(u, x, \lambda)$ for $\psi = \phi + \eta$ with $\phi(u, x, \lambda) = f(u)$ (and $\eta$ as in Lemma 2). We hence compute
$$\begin{aligned}
p^*(y, \mu) &= \psi^*(0, y, \mu) \\
&= (\phi + \eta)^*(0, y, \mu) \\
&= \inf_{(w, z, \delta)} \phi^*(w, z, \delta) + \eta^*(-w, y - z, \mu - \delta) \\
&= \inf_w f^*(w) + \delta_C(-w, y, \mu) \\
&= \inf_w \{f^*(w) \mid \exists a : (a, -\mu) \in \operatorname{epi}\omega^*,\ L^*(a) = (-w, y)\}.
\end{aligned}$$
Here the first identity uses Theorem 1. The second is clear from our definitions above. The third relies on [38, Theorem 16.4] and the fact that assumption (10) is, in view of Lemma 2(b) and the fact that $\operatorname{ri}(\operatorname{dom}\phi) = \operatorname{ri}(\operatorname{dom} f) \times \mathbb{E}_x \times \mathbb{R}$, equivalent to the condition $\operatorname{ri}(\operatorname{dom}\eta) \cap \operatorname{ri}(\operatorname{dom}\phi) \ne \emptyset$. The fourth uses the fact that $\phi^*(v, y, \mu) = f^*(v) + \delta_{\{0\}}(y, \mu)$ and Lemma 2(b). The last identity is simply the definition of the set $C$ in said proposition.

Part (b). By (10) we can apply [38, Theorems 23.8-23.9] to find
$$\begin{aligned}
\partial\psi(u, x, \lambda) &= \partial f(u) \times \{0\} \times \{0\} + \tilde L^*\,\partial\omega^\pi(\tilde L(u, x, \lambda)) \\
&= \partial f(u) \times \{0\} \times \{0\} + (L^* \times \operatorname{id})\,\partial\omega^\pi(L(u, x), \lambda).
\end{aligned}$$
Apply Proposition 1(c) and combine with Theorem 1 to obtain the desired result.

Part (c) follows from Theorem 1(d). □
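Theorem 2(b) can be sanity-checked in one dimension. The sketch below is an illustration under stated assumptions: $L(u, x) = x - u$, $f = |\cdot|$, and $\omega = \tfrac12(\cdot)^2$, so that $L^*(y) = (-y, y)$, the condition $(0, v) \in D(\bar u, y)$ reads $y \in \partial f(\bar u)$ with $v = y$, and the value function is the Huber function. The function names are hypothetical, and central finite differences stand in for the exact gradient.

```python
import math

def minimizer(x, lam):
    """argmin_u |u| + (x - u)^2 / (2 lam), i.e. soft-thresholding, for lam > 0."""
    return math.copysign(max(abs(x) - lam, 0.0), x)

def value(x, lam):
    """p(x, lam) = min_u |u| + (x - u)^2 / (2 lam) for lam > 0."""
    u = minimizer(x, lam)
    return abs(u) + (x - u) ** 2 / (2.0 * lam)

def subgradient_formula(x, lam):
    """The pair (v, -omega*(y)) from Theorem 2(b) in the case lam > 0."""
    u = minimizer(x, lam)
    y = (x - u) / lam          # y = omega'((x - u) / lam); also y lies in the subdifferential of |.| at u
    return y, -0.5 * y * y     # omega*(y) = y^2 / 2 for this kernel

def finite_difference(x, lam, h=1e-6):
    """Central differences of p in x and in lam (numerical stand-in only)."""
    dx = (value(x + h, lam) - value(x - h, lam)) / (2.0 * h)
    dlam = (value(x, lam + h) - value(x, lam - h)) / (2.0 * h)
    return dx, dlam
```

At $(x, \lambda) = (3, 1)$ both routes give approximately $(1, -\tfrac12)$, and at $(0.5, 2)$ approximately $(0.25, -0.03125)$.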
We now consider the value function
$$p_{\omega,f} : (x, \lambda) \in \mathbb{E} \times \mathbb{R} \mapsto \inf_{u \in \mathbb{E}} f(u) + \omega^\pi(x - u, \lambda), \tag{11}$$
which corresponds to the standard infimal convolution between $f$ and $\omega^\pi$. This is a special case of (3) where $L(u, x) = x - u$ and $\mathbb{E}_i = \mathbb{E}$ with $i = f, x, \omega$. The following result specializes Theorem 2.
Corollary 1 (Conjugate and subdifferential of infimal convolution). For the function $p_{\omega,f}$ given by (11), assume that $f, \omega \in \Gamma_0(\mathbb{E})$ and
$$\exists (u, x) \in \operatorname{ri}(\operatorname{dom} f) \times \mathbb{E} : x - u \in \mathbb{R}_{++}\operatorname{ri}(\operatorname{dom}\omega). \tag{12}$$
Then the following hold.
(a) $p^*_{\omega,f}(y, \mu) = f^*(y) + \delta_{\operatorname{epi}\omega^*}(y, -\mu)$.
(b) For all $(x, \lambda) \in \operatorname{dom} p_{\omega,f}$ and all $\bar u \in \operatorname{argmin}_{u \in \mathbb{E}} \{f(u) + \omega^\pi(x - u, \lambda)\}$ we have
$$\partial p_{\omega,f}(x, \lambda) = \begin{cases} \bigl\{(y, -\beta) \bigm| y \in \partial f(\bar u) \cap \partial\omega\bigl(\tfrac{x - \bar u}{\lambda}\bigr),\ \beta = \omega^*(y)\bigr\} & \text{if } \lambda > 0, \\ \{(y, -\beta) \mid y \in \partial f(\bar u) \cap \partial\omega^\infty(x - \bar u),\ (y, \beta) \in \operatorname{epi}\omega^*\} & \text{if } \lambda = 0. \end{cases}$$
(c) $p^*_{\omega,f} \in \Gamma_0(\mathbb{E})$ if and only if $\operatorname{dom} p^*_{\omega,f} = (\operatorname{dom} f^* \times \mathbb{E}) \cap \operatorname{epi}\omega^* \ne \emptyset$. In this case, $p_{\omega,f} \in \Gamma_0(\mathbb{E})$ also, and the infimum is attained when finite.

Proof. Use Theorem 2(a)–(c) and observe that $L^*(a) = (-a, a)$. □

Thus far, our analysis has focused exclusively on the variational properties of the optimal value function (3) and its specializations. We now turn our attention to the optimal solution map
$$P_{\omega,f} : (x, \lambda) \in \mathbb{E}_x \times \mathbb{R} \mapsto \operatorname*{argmin}_{u \in \mathbb{E}_f} f(u) + \omega^\pi(x - u, \lambda) \tag{13}$$
for the infimal convolution defined by (11). In this section we describe the variational-analytic properties of the solution map, including (Lipschitz) continuity and (directional) smoothness. To this end, we introduce required technical machinery from variational analysis [34, 39].

Let $S : \mathbb{E}_f \rightrightarrows \mathbb{E}_x$ be a set-valued map between spaces $\mathbb{E}_f$ and $\mathbb{E}_x$. The domain and graph of $S$, respectively, are the sets $\operatorname{dom} S := \{x \mid S(x) \ne \emptyset\}$ and $\operatorname{gph} S := \{(x, u) \in \mathbb{E}_f \times \mathbb{E}_x \mid u \in S(x)\}$. The outer limit of $S$ at $\bar x$ is
$$\operatorname*{Lim\,sup}_{x \to \bar x} S(x) := \{y \in \mathbb{E}_x \mid \exists \{x_k\} \to \bar x,\ \{y_k \in S(x_k)\} \to y\}.$$
Now let $A \subset \mathbb{E}$. The tangent cone of $A$ at $\bar x \in A$ is $T_A(\bar x) := \operatorname{Lim\,sup}_{t \downarrow 0} (A - \bar x)/t$. The regular normal cone of $A$ at $\bar x \in A$ is the polar of the tangent cone, i.e., $\hat N_A(\bar x) := \{v \mid \langle v, y\rangle \le 0\ \forall y \in T_A(\bar x)\}$. The limiting normal cone of $A$ at $\bar x \in A$ is $N_A(\bar x) := \operatorname{Lim\,sup}_{x \to \bar x} \hat N_A(x)$. The coderivative of $S$ at $(\bar x, \bar y) \in \operatorname{gph} S$ is the map $D^*S(\bar x \mid \bar y) : \mathbb{E}_x \rightrightarrows \mathbb{E}_f$ defined via
$$v \in D^*S(\bar x \mid \bar y)(y) \iff (v, -y) \in N_{\operatorname{gph} S}(\bar x, \bar y).$$
The graphical derivative of $S$ at $(\bar x, \bar y)$ is the map $DS(\bar x \mid \bar y) : \mathbb{E}_f \rightrightarrows \mathbb{E}_x$ given by
$$v \in DS(\bar x \mid \bar y)(u) \iff (u, v) \in T_{\operatorname{gph} S}(\bar x, \bar y),$$
or, equivalently,
$$DS(\bar x \mid \bar y)(u) = \operatorname*{Lim\,sup}_{t \downarrow 0,\ u' \to u} \frac{S(\bar x + t u') - \bar y}{t}$$
[39, Eq. 8(14)]. The strict graphical derivative of $S$ at $(\bar x, \bar y)$ is $D_*S(\bar x \mid \bar y) : \mathbb{E}_f \rightrightarrows \mathbb{E}_x$ given by
$$D_*S(\bar x \mid \bar y)(w) = \left\{ z \;\middle|\; \exists \begin{array}{l} \{t_k\} \downarrow 0,\ \{w_k\} \to w,\ \{z_k\} \to z, \\ \{(x_k, y_k) \in \operatorname{gph} S\} \to (\bar x, \bar y) \end{array} : z_k \in \frac{S(x_k + t_k w_k) - y_k}{t_k} \right\}.$$
We adopt the convention to set $D^*S(\bar x) := D^*S(\bar x \mid \bar u)$ if $S(\bar x)$ is a singleton, and proceed analogously for the graphical derivatives.

The above generalized derivatives possess the following definiteness properties when applied to a maximally monotone operator $T : \mathbb{E} \rightrightarrows \mathbb{E}$, which (by definition) satisfies the inequality
$$\langle v - w, x - y\rangle \ge 0 \quad \forall (v, w) \in T(x) \times T(y),$$
and there is no enlargement of $\operatorname{gph} T$ without destroying this inequality. Our conclusion relies on Minty parameterization.

Lemma 3.
Let $T : \mathbb{E} \rightrightarrows \mathbb{E}$ be maximally monotone and let $(\bar y, \bar u) \in \operatorname{gph} T$. Then the pair $(w, z) \in \mathbb{E} \times \mathbb{E}$ satisfies $\langle w, z\rangle \ge 0$ if one of the following conditions holds:
(a) $w \in D^*T(\bar y \mid \bar u)(z)$;
(b) $z \in D_*T(\bar y \mid \bar u)(w)$;
(c) $z \in DT(\bar y \mid \bar u)(w)$.

Proof. Part (a). See [34, Theorem 5.6].

Part (b). For $z \in D_*T(\bar y \mid \bar u)(w)$ there exist $\{z_k\} \to z$, $\{t_k\} \downarrow 0$, $\{(y_k, u_k) \in \operatorname{gph} T\} \to (\bar y, \bar u)$, and $\{w_k\} \to w$ such that
$$t_k z_k \in T(y_k + t_k w_k) - u_k \quad \forall k \in \mathbb{N}. \tag{14}$$
Now let $\lambda > 0$ and set $J_{\lambda T} := (\lambda T + \operatorname{id})^{-1}$. By Minty parameterization [3, Remark 23.23], there exists $\{x_k\}$ such that $(y_k, u_k) = \bigl(J_{\lambda T}(x_k), (x_k - J_{\lambda T}(x_k))/\lambda\bigr)$ for all $k \in \mathbb{N}$. Combining this with (14) yields $x_k + t_k(\lambda z_k + w_k) \in (\lambda T + \operatorname{id})(y_k + t_k w_k)$. Thus, as $y_k = J_{\lambda T}(x_k)$, we have
$$t_k w_k = J_{\lambda T}\bigl(x_k + t_k(\lambda z_k + w_k)\bigr) - J_{\lambda T}(x_k) \quad (k \in \mathbb{N}).$$
Because $J_{\lambda T}$ is firmly nonexpansive [3, Proposition 23.8] and hence 1-Lipschitz, it follows that $\|w_k\| \le \|\lambda z_k + w_k\|$ $(k \in \mathbb{N})$, hence $\|w\| \le \|\lambda z + w\|$. We infer that $-(\lambda/2)\|z\|^2 \le \langle z, w\rangle$. Since $\lambda > 0$ was arbitrary, letting $\lambda \downarrow 0$ yields $\langle z, w\rangle \ge 0$.

Part (c). This follows from Part (b) because $DS(\bar x \mid \bar u)(w) \subset D_*S(\bar x \mid \bar u)(w)$ for all $w \in \mathbb{E}_f$. □

We record another auxiliary result. Here we call $S : \mathbb{E}_f \rightrightarrows \mathbb{E}_x$ proto-differentiable at $(\bar x, \bar u) \in \operatorname{gph} S$ if for any $\bar z \in DS(\bar x \mid \bar u)(\bar w)$ and any $\{t_k\} \downarrow 0$ there exist $\{w_k\} \to \bar w$ and $\{z_k\} \to \bar z$ such that $z_k \in (S(\bar x + t_k w_k) - \bar u)/t_k$ for all $k \in \mathbb{N}$.

Lemma 4.
Let $S : \mathbb{E}_f \rightrightarrows \mathbb{E}_x$ be given by $S = F + T$, where $F$ is smooth and $T$ is proto-differentiable at $(\bar x, \bar u - F(\bar x))$. Then $S$ is proto-differentiable at $(\bar x, \bar u)$.

Proof. Let $z \in DS(\bar x \mid \bar u)(w)$ and $\{t_k\} \downarrow 0$. Then $z - F'(\bar x)w \in DT(\bar x \mid \bar u - F(\bar x))(w)$, cf. [39, Exercise 10.43]. By assumption on $T$, there exist $\tilde z_k \to z - F'(\bar x)w$ and $w_k \to w$ such that $\tilde z_k \in [T(\bar x + t_k w_k) - (\bar u - F(\bar x))]/t_k$, i.e., $\tilde z_k + [F(\bar x + t_k w_k) - F(\bar x)]/t_k \in [S(\bar x + t_k w_k) - \bar u]/t_k$ for all $k \in \mathbb{N}$. Therefore, $z_k := \tilde z_k + [F(\bar x + t_k w_k) - F(\bar x)]/t_k \to z$ and $z_k \in [S(\bar x + t_k w_k) - \bar u]/t_k$ for all $k \in \mathbb{N}$, which shows the proto-differentiability of $S$ at $(\bar x, \bar u)$. □

The next and main result in this subsection is based on the implicit mapping framework described by Rockafellar and Wets [39, Theorem 9.56] together with Lemma 3.
Theorem 3 (Variational properties of the solution map). Let $f \in \Gamma_0(\mathbb{E})$ and let $\omega : \mathbb{E} \to \mathbb{R}$ be strictly convex, level-bounded and twice continuously differentiable. Let $\bar x \in \mathbb{E}$ and $\bar\lambda > 0$, set $\bar y := P_{\omega,f}(\bar x, \bar\lambda)$ and $\bar V := \nabla^2\omega\bigl(\tfrac{\bar x - \bar y}{\bar\lambda}\bigr)$. Then for the solution map $P_{\omega,f}$ from (13) the following hold:
(a) We have $\operatorname{dom} P_{\omega,f} \subset \mathbb{E} \times \mathbb{R}_+$ and $P_{\omega,f}(\cdot, \lambda)$ is single-valued for all $\lambda > 0$.
(b) If $\bar V$ is positive definite, then $P_{\omega,f}$ is locally Lipschitz at $(\bar x, \bar\lambda)$.
(c) If $\bar V$ is positive definite and $\partial f$ is proto-differentiable at $\bigl(\bar y, \nabla\omega\bigl(\tfrac{\bar x - \bar y}{\bar\lambda}\bigr)\bigr)$, then $P_{\omega,f}$ is directionally differentiable at $(\bar x, \bar\lambda)$. Concretely, for all $(d, \Delta) \in \mathbb{E} \times \mathbb{R}$, we have
$$P'_{\omega,f}\bigl((\bar x, \bar\lambda); (d, \Delta)\bigr) = \left[\bar\lambda\, D(\partial f)\Bigl(\bar y \Bigm| \nabla\omega\Bigl(\frac{\bar x - \bar y}{\bar\lambda}\Bigr)\Bigr) + \bar V\right]^{-1}\left(\bar V d - \frac{\Delta}{\bar\lambda}\,\bar V(\bar x - \bar y)\right).$$
In fact, $P_{\omega,f}$ is semidifferentiable at $(\bar x, \bar\lambda)$ in the sense of [39, p. 332].
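The directional derivative formula in Theorem 3(c) can be checked in a smooth one-dimensional instance. The sketch below is an illustration only, assuming $f(u) = \tfrac{a}{2}u^2$ with $a > 0$ and $\omega = \tfrac12(\cdot)^2$, so that $D(\partial f)$ reduces to multiplication by $f'' = a$, $\bar V = 1$, and the solution map has the closed form $P(x, \lambda) = x/(1 + a\lambda)$. The function names and the finite-difference comparison are hypothetical additions.

```python
def solution_map(x, lam, a):
    """P(x, lam) = argmin_u (a/2) u^2 + (x - u)^2 / (2 lam) = x / (1 + a lam)."""
    return x / (1.0 + a * lam)

def directional_derivative(x, lam, d, delta, a):
    """Formula of Theorem 3(c) with D(df) = a (smooth case) and V = omega'' = 1."""
    y = solution_map(x, lam, a)
    V = 1.0
    return (V * d - (delta / lam) * V * (x - y)) / (lam * a + V)

def finite_difference(x, lam, d, delta, a, t=1e-7):
    """One-sided difference quotient of P along the direction (d, delta)."""
    moved = solution_map(x + t * d, lam + t * delta, a)
    return (moved - solution_map(x, lam, a)) / t
```

For example, with $x = 2$, $\lambda = 0.5$, $a = 3$ and direction $(d, \Delta) = (1, -0.4)$, the formula gives $0.784$, which matches $\partial_x P \cdot d + \partial_\lambda P \cdot \Delta = 0.4 \cdot 1 + (-0.96)(-0.4)$.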
Proof.
Set $P := P_{\omega,f}$.

Part (a). For $\lambda > 0$ and $x \in \mathbb{E}$, the function $u \mapsto f(u) + \omega^\pi(x - u, \lambda)$ is lsc, proper, strictly convex and level-bounded, and therefore attains a unique minimum.

Part (b). Without loss of generality, let $\mathbb{E} = \mathbb{R}^n$, and observe that, for $\lambda > 0$, we have $P(x, \lambda) = \{y \mid 0 \in S(x, \lambda, y)\}$, where $S(x, \lambda, u) := \partial f(u) - \nabla\omega\bigl(\tfrac{x - u}{\lambda}\bigr)$ $(\lambda > 0)$. We have
$$D^*S(\bar x, \bar\lambda, \bar y \mid 0)(y) = \left[-\frac{1}{\bar\lambda}\bar V y,\ \frac{(\bar x - \bar y)^T}{\bar\lambda^2}\bar V y,\ \frac{1}{\bar\lambda}\bar V y\right] + \{0\} \times \{0\} \times D^*(\partial f)\Bigl(\bar y \Bigm| \nabla\omega\Bigl(\frac{\bar x - \bar y}{\bar\lambda}\Bigr)\Bigr)(y).$$
Hence, $(r, \gamma, 0) \in D^*S(\bar x, \bar\lambda, \bar y \mid 0)(y)$ if and only if
$$r = -\frac{1}{\bar\lambda}\bar V y, \quad \gamma = \frac{(\bar x - \bar y)^T}{\bar\lambda^2}\bar V y, \quad -\frac{1}{\bar\lambda}\bar V y \in D^*(\partial f)\Bigl(\bar y \Bigm| \nabla\omega\Bigl(\frac{\bar x - \bar y}{\bar\lambda}\Bigr)\Bigr)(y).$$
Invoke Lemma 3(a) and use $\bar V \succ 0$ to obtain $y = 0$, hence $r = 0$ and $\gamma = 0$. Therefore, by [39, Theorem 9.56 (a)], we see that $P$ has the Aubin property at $(\bar x, \bar\lambda)$ for $\bar y$, and since $P$ is single-valued, it is locally Lipschitz at $(\bar x, \bar\lambda)$.

Part (c). With the definitions from Part (b), recall that the implication
$$(r, \gamma, 0) \in D^*S(\bar x, \bar\lambda, \bar y \mid 0)(y) \ \Rightarrow\ (r, \gamma) = 0,\ y = 0$$
was proved. Now let
$$0 \in D_*S(\bar x, \bar\lambda, \bar y \mid 0)(0, 0, w) = \frac{1}{\bar\lambda}\bar V w + D_*(\partial f)\Bigl(\bar y \Bigm| \nabla\omega\Bigl(\frac{\bar x - \bar y}{\bar\lambda}\Bigr)\Bigr)(w),$$
see [39, Exercise 10.43], i.e.,
$$-\frac{1}{\bar\lambda}\bar V w \in D_*(\partial f)\Bigl(\bar y \Bigm| \nabla\omega\Bigl(\frac{\bar x - \bar y}{\bar\lambda}\Bigr)\Bigr)(w).$$
By Lemma 3(b), we find that $w = 0$. Since $\partial f$ is assumed to be proto-differentiable at $\bigl(\bar y, \nabla\omega\bigl(\tfrac{\bar x - \bar y}{\bar\lambda}\bigr)\bigr)$, Lemma 4 yields that $S$ is proto-differentiable at $((\bar x, \bar\lambda, \bar y), 0)$. □

Remark 1 (Proto-differentiability of $\partial f$ from full amenability). Let $f \in \Gamma_0(\mathbb{E})$ and $\bar x \in \operatorname{dom} f$. By [39, Corollary 13.41], there exists a neighborhood $V$ of $\bar x$ such that $\partial f$ is proto-differentiable at $x \in V \cap \operatorname{dom} f$ for any $v \in \partial f(x)$ if $f$ is fully amenable at $\bar x$ in the sense that (on a neighborhood of $\bar x$) $f = g \circ F$ with $g \in \Gamma_0(\mathbb{E}_x)$ piecewise linear-quadratic and $F \in C^2(\mathbb{E}_f, \mathbb{E}_x)$ such that
$$\ker F'(\bar x)^* \cap N_{\operatorname{cl}(\operatorname{dom} g)}(F(\bar x)) = \{0\}.$$
This comprises the following special cases:
• $f(x) = \max_{i=1}^m f_i(x)$ with $f_i \in \Gamma_0(\mathbb{E}) \cap C^2$;
• $f$ is (convex and) piecewise linear quadratic;
• $f$ is (convex and) twice continuously differentiable.

Since a strongly convex function is both strictly convex and level-bounded (in fact supercoercive) and has positive definite Hessian everywhere, and since we have $D(\partial f) = \nabla^2 f$ wherever $f$ is twice continuously differentiable, we immediately obtain the following result which, of course, can also be derived directly from the implicit function theorem.

Corollary 2 (Differentiability of the solution map). Let $(\bar x, \bar\lambda) \in \mathbb{E} \times \mathbb{R}_{++}$ be such that $f \in \Gamma_0(\mathbb{E})$ is twice continuously differentiable around $P_{\omega,f}(\bar x, \bar\lambda)$, and let $\omega \in \Gamma_0(\mathbb{E})$ be strongly convex and twice continuously differentiable. Then $P_{\omega,f}$ from (13) is continuously differentiable around $(\bar x, \bar\lambda)$. Concretely, for all $(x, \lambda)$ sufficiently close to $(\bar x, \bar\lambda)$ and for all $(d, \Delta) \in \mathbb{E} \times \mathbb{R}$, we have
$$P'_{\omega,f}(x, \lambda)(d, \Delta) = \bigl(\lambda\nabla^2 f(y) + V\bigr)^{-1}\left[V d - \Delta \cdot V\Bigl(\frac{x - y}{\lambda}\Bigr)\right],$$
where $y := P_{\omega,f}(x, \lambda)$ and $V := \nabla^2\omega\bigl(\tfrac{x - y}{\lambda}\bigr)$.

We now refine our study of smoothness properties of the solution map $P_{\omega,f}$. We base our analysis on the notion of semismoothness* recently established by Gfrerer and Outrata [23], which, in turn, relies on the notion of the directional normal cone introduced by Ginchev and Mordukhovich [24] and further advanced by Gfrerer et al. [6, 21, 22].

For $\bar x \in A \subset \mathbb{E}$, the directional normal cone in the direction $\bar u \in \mathbb{E}$ is given by
$$N_A(\bar x; \bar u) = \operatorname*{Lim\,sup}_{u \to \bar u,\ t \downarrow 0} \hat N_A(\bar x + t u).$$
Note that $N_A(\bar x; \bar u) = \emptyset$ if $\bar u \notin T_A(\bar x)$ and that $N_A(\bar x; \bar u) \subset N_A(\bar x)$ for all $\bar u \in \mathbb{E}$.
Given a set-valued map $S\colon \mathbb{E}_f \rightrightarrows \mathbb{E}_x$, based on the directional normal cone, we define the directional coderivative [21] $D^*S((\bar x, \bar y); (u, v))\colon \mathbb{E}_x \rightrightarrows \mathbb{E}_f$ of $S$ at $(\bar x, \bar y) \in \operatorname{gph} S$ in the direction $(u, v)$ via
\[ D^*S((\bar x, \bar y); (u, v))(v^*) = \bigl\{ u^* \in \mathbb{E}_f \bigm| (u^*, -v^*) \in N_{\operatorname{gph} S}((\bar x, \bar y); (u, v)) \bigr\}. \]
As $N_A(\bar x; \bar u) = \emptyset$ if $\bar u \notin T_A(\bar x)$, we also have
\[ \operatorname{dom} D^*S((\bar x, \bar y); (u, v)) = \emptyset \quad \forall (u, v) \notin \operatorname{gph} DS(\bar x \mid \bar y). \tag{15} \]

Definition 1 (Semismoothness*). The set $A \subset \mathbb{E}$ is semismooth* at $\bar x \in A$ if
\[ \langle x^*, u \rangle = 0 \quad \forall u \in \mathbb{E},\ x^* \in N_A(\bar x; u). \]
The map $S\colon \mathbb{E} \rightrightarrows \mathbb{E}$ is semismooth* at $(\bar x, \bar y) \in \operatorname{gph} S$ if $\operatorname{gph} S$ is semismooth* at $(\bar x, \bar y)$, i.e.,
\[ \langle u, u^* \rangle = \langle v, v^* \rangle \quad \forall (u, v) \in \mathbb{E} \times \mathbb{E},\ (v^*, u^*) \in \operatorname{gph} D^*S((\bar x, \bar y); (u, v)). \]

The notion of metric (sub)regularity is used only in the next two results, and hence we refer the reader to the abundant literature for a definition, e.g., [18].
Proposition 2 (Metric regularity and semismoothness*). Let $F\colon \mathbb{E} \to \mathbb{E}$ be continuously differentiable at $\bar x$, let $Q \subset \mathbb{E}$ be semismooth* (as a set) at $F(\bar x)$, and let $S\colon \mathbb{E} \rightrightarrows \mathbb{E}$, $S(x) := F(x) - Q$, be metrically subregular at $(\bar x, 0)$. Then $F^{-1}(Q)$ is semismooth* at $\bar x$ (as a set).

Proof. By [6, Theorem 3.1], for any $h \in \mathbb{E}$,
\[ N_{F^{-1}(Q)}(\bar x; h) \subset F'(\bar x)^* N_Q\bigl(F(\bar x); F'(\bar x) h\bigr), \tag{16} \]
see also [6, Remark 2.1]. Since $Q$ is semismooth* at $F(\bar x)$,
\[ \langle v, z \rangle = 0 \quad \forall z \in \mathbb{E},\ v \in N_Q(F(\bar x); z). \]
Therefore
\[ \langle v, F'(\bar x) h \rangle = 0 \quad \forall h \in \mathbb{E},\ v \in N_Q(F(\bar x); F'(\bar x) h), \]
and hence
\[ \langle u, h \rangle = 0 \quad \forall h \in \mathbb{E},\ u \in F'(\bar x)^* N_Q(F(\bar x); F'(\bar x) h). \]
By (16) this implies that
\[ \langle u, h \rangle = 0 \quad \forall h \in \mathbb{E},\ u \in N_{F^{-1}(Q)}(\bar x; h), \]
i.e., $F^{-1}(Q)$ is semismooth* at $\bar x$. □

Corollary 3 (Semismoothness* of the infimal convolution solution map). Let $f \in \Gamma_0(\mathbb{E})$, let $(\bar x, \bar\lambda) \in \mathbb{E} \times \mathbb{R}_{++}$, and let $\omega$ be strongly convex and twice continuously differentiable. Then the map $P_{\omega,f}$ from (13) is semismooth* at $\bigl((\bar x, \bar\lambda), P_{\omega,f}(\bar x, \bar\lambda)\bigr)$ if $\partial f$ is semismooth* at $\bigl(P_{\omega,f}(\bar x, \bar\lambda), \nabla\omega\bigl(\tfrac{1}{\bar\lambda}[\bar x - P_{\omega,f}(\bar x, \bar\lambda)]\bigr)\bigr)$.

Proof. Without loss of generality, assume $\mathbb{E} = \mathbb{R}^n$. Let $F\colon \mathbb{R}^n \times \mathbb{R}_{++} \times \mathbb{R}^n \to \mathbb{R}^n \times \mathbb{R}^n$, $F(x, \lambda, z) := \bigl(z, \nabla\omega([x - z]/\lambda)\bigr)$. Then for all $x, z \in \mathbb{R}^n$ and $\lambda > 0$, setting $V := \nabla^2\omega([x - z]/\lambda) \succ 0$, we have
\[ F'(x, \lambda, z) = \begin{pmatrix} 0 & 0 & I \\ \frac{1}{\lambda} V & -\frac{1}{\lambda^2} V (x - z) & -\frac{1}{\lambda} V \end{pmatrix}. \]
Hence, $\ker F'(x, \lambda, z)^* = \{0\}$ for all $x, z \in \mathbb{R}^n$, $\lambda > 0$. Thus, $(x, \lambda, z) \mapsto F(x, \lambda, z) - \operatorname{gph} \partial f$ is metrically regular. As $\operatorname{gph} P_{\omega,f} = F^{-1}(\operatorname{gph} \partial f)$, if $\partial f$ is semismooth* at
\[ \Bigl(P_{\omega,f}(\bar x, \bar\lambda), \nabla\omega\Bigl(\frac{\bar x - P_{\omega,f}(\bar x, \bar\lambda)}{\bar\lambda}\Bigr)\Bigr) = F\bigl(\bar x, \bar\lambda, P_{\omega,f}(\bar x, \bar\lambda)\bigr), \]
then, by Proposition 2, $P_{\omega,f}$ is semismooth* at $\bigl((\bar x, \bar\lambda), P_{\omega,f}(\bar x, \bar\lambda)\bigr)$. □

Corollary 3 provides a sufficient criterion for establishing semismoothness* of the solution map $P$ on the interior of its domain. It will be a topic of future research to exploit this on a broad scale, but we can immediately state the following result for a function $f \in \Gamma_0(\mathbb{E})$ which is either twice continuously differentiable or piecewise linear-quadratic (PLQ) in the sense of Rockafellar and Wets [39, Definition 10.20].

Proposition 3 (Semismoothness* of the subdifferential). For $f \in \Gamma_0(\mathbb{E})$, the subdifferential $\partial f$ is semismooth* at $(\bar x, \bar y) \in \operatorname{gph} \partial f$ under one of the following conditions:
(a) $f$ is twice continuously differentiable at $\bar x$;
(b) $f$ is piecewise linear-quadratic (in which case $\partial f$ is semismooth* on $\mathbb{E}$).

Proof. Assume condition (a) holds. If $f$ is twice continuously differentiable, then $D(\partial f)(\bar x \mid \bar y) = \nabla^2 f(\bar x) = D^*(\partial f)(\bar x \mid \bar y)$, see [39, Example 8.43]. Now let $(u, v) \in T_{\operatorname{gph} \partial f}(\bar x, \bar y)$, i.e., $v \in D(\partial f)(\bar x \mid \bar y)(u) = \{\nabla^2 f(\bar x) u\}$, and let $(x^*, y^*) \in N_{\operatorname{gph} \partial f}((\bar x, \bar y); (u, v)) \subset N_{\operatorname{gph} \partial f}(\bar x, \bar y)$, hence $x^* \in D^*(\partial f)(\bar x \mid \bar y)(-y^*) = \{-\nabla^2 f(\bar x) y^*\}$. Thus, we have
\[ \langle (x^*, y^*), (u, v) \rangle = \langle y^*, \nabla^2 f(\bar x) u \rangle - \langle \nabla^2 f(\bar x) y^*, u \rangle = 0. \]
Now assume condition (b) holds. It follows from [39, Proposition 12.30] that $\operatorname{gph} \partial f$ is a finite union of polyhedra. Then [23, Proposition 3.4/3.5] yields that $\operatorname{gph} \partial f$ is semismooth*, which gives the desired statement.
(cid:3) We now consider an application of Theorem 1 to derive thevariational properties of the optimal value of the constrained optimization problem v : ( x, λ ) ∈ E x × R (cid:55)→ inf u ∈ E f { f ( u ) | L ( u, x ) ∈ λS } , (17)where S ⊂ E ω is a closed convex set. This function can be viewed as a special case of (3), where ω = δ S for some closed convex set S ⊂ E ω . To see this, it is sufficient to note that δ πS ( z, t ) = δ λS ( z ) if λ > δ S ∞ ( z ) if λ = 0,+ ∞ otherwise,and thus L ( u, x ) ∈ λS if and only if δ πS ( L ( u, x ) , λ ) vanishes. Let S ◦ := { v | (cid:104) v, s (cid:105) ≤ ∀ x ∈ S } be thepolar to the set S .The following result is an immediate consequence of the general study in Theorem 2. Corollary 4 (Conjugate and subdifferential of the constrained value function) . Let v be given by (17) with S ⊂ E ω closed and convex, and assume that ∃ u ∈ ri dom f, x ∈ E x : L ( u, x ) ∈ R ++ (ri S ) . Then the following hold. .P. Friedlander, A. Goodwin, and T. Hoheisel:
From perspective maps to epigraphical projections (a) We have v ∗ ( y, µ ) = inf w { f ∗ ( w ) | ∃ a ∈ − µS ◦ : L ∗ ( a ) = ( − w, y ) } . If S is a cone then v ∗ ( y, µ ) = inf w (cid:8) f ∗ ( w ) + δ R − ( µ ) | ( − w, y ) ∈ L ∗ ( S ◦ ) (cid:9) . (b) For any ( x, λ ) ∈ dom v and ¯ u ∈ argmin u { f ( u ) | L ( u, x ) ∈ λS } , ∂v ( x, λ ) = (cid:40) { ( v, − σ S ( y )) | y ∈ N S ( L (¯ u, x ) /λ ) , (0 , v ) ∈ D (¯ u, y ) } if λ > , { ( v, − β ) | ∃ y ∈ N S ∞ ( L (¯ u, y )) ∩ ( βS ◦ ) : (0 , v ) ∈ D (¯ u, y ) } if λ = 0 ,where D ( u, y ) := ∂f ( u ) × { } + L ∗ ( y ) . If S is bounded (hence compact), then ∂v ( x, λ ) = (cid:40) { ( v, − σ S ( y )) | y ∈ N S ( L (¯ u, x ) /λ ) , (0 , v ) ∈ D (¯ u, y ) } if λ > , { ( v, − β ) | ∃ y ∈ βS ◦ : (0 , v ) ∈ D (¯ u, y ) } if λ = 0 . (c) We have v ∗ ∈ Γ ( E x × R ) if and only if there exist y ∈ E x , w ∈ dom f ∗ , β ∈ R such that ( − w, y ) ∈− βL ∗ ( S ◦ ) . In this case, also v ∈ Γ ( E x × R ) and the infimum is attained when finite.Proof. Part (a) follows from Theorem 2(a) with w ∗ = σ S . If S is a cone then w ∗ = δ S ◦ . Part (b)follows from Theorem 2(b), observing that ω ∞ = δ S ∞ and that S ∞ = { } if S is bounded, in whichcase N S ∞ = E ω . Part (c) follows from (a) and Theorem 2(c). (cid:3) As an immediate specialization of Corollary 4 we obtaina result on the value function v : ( b, λ ) ∈ R m × R (cid:55)→ inf x ∈ R n { f ( x ) | (cid:107) Ax − b (cid:107) ≤ λ } , (18)where f ∈ Γ ( R n ), A ∈ R m × n is a matrix, and (cid:107) · (cid:107) is any norm in R n . Denote the associated dualnorm by (cid:107) · (cid:107) ◦ , and the corresponding unit-norm ball by B . Corollary 5 (Relaxed linear constraints value function) . If there exists a pair ( x, λ ) ∈ dom f × R ++ such that (cid:107) Ax − b (cid:107) < λ , then the following hold. (a) (conjugate) v ∗ ( y, µ ) = f ∗ ( A T y ) + δ µ B ◦ ( y ) , which is closed proper convex if and only if there exists β and (cid:107) y (cid:107) ◦ ≤ β such that A T y ∈ dom f ∗ . 
In this case, v is closed proper convex and the infimumis attained when finite. (b) (subdifferential) For any ( b, λ ) ∈ dom v and ¯ x that achieves the infimum in (18) (and hence (cid:107) A ¯ x − b (cid:107) ≤ λ ), ∂v ( b, λ ) = (cid:40) { ( y, −(cid:107) y (cid:107) ◦ ) | y ∈ N B ([ A ¯ x − b ] /λ ) , − A T y ∈ ∂f (¯ x ) } if λ > , { ( y, − β ) | (cid:107) y (cid:107) ◦ ≤ β, − A T y ∈ ∂f (¯ x ) } if λ = 0 . (c) (primal existence) For λ > and any b ∈ R m , if f ∞ ( y ) > ∀ y ∈ ker A \ { } , (19) then argmin x { f ( x ) + δ π B ( Ax − b, λ ) } (cid:54) = ∅ . This holds, e.g., when f is level-bounded or rank A = n .Proof. Part (a). The expression for the conjugate v ∗ follows from Corollary 4(a) by observingthat L : ( x, b ) (cid:55)→ Ax − b has adjoint L ∗ : z (cid:55)→ ( A T z, − z ) and that σ B = (cid:107) · (cid:107) ◦ . The remaining claimsfor Part (a) follow from Theorem 1. M.P. Friedlander, A. Goodwin, and T. Hoheisel:
Part (b) follows from Corollary 4(b) with the foregoing observations.

Part (c). For $\lambda > 0$ and $b \in \mathbb{R}^m$, the effective objective function in (18) is $\phi(x) := f(x) + \delta_{\lambda\mathbb{B}}(Ax - b)$. With $\hat x$ such that $\|A\hat x - b\| \le \lambda$, which exists by hypothesis, we have
\[ \bigl(\delta_{\lambda\mathbb{B}} \circ (A(\cdot) - b)\bigr)^\infty(x) = \sup_{\tau > 0} \delta_{\lambda\mathbb{B}}(A\hat x - b + \tau A x) = \delta_{\ker A}(x), \]
where the second identity uses the property that $\lambda\mathbb{B}$ is bounded. With [39, Exercise 3.29] we hence find that $\phi^\infty = f^\infty + \delta_{\ker A}$, which shows, using [39, Theorem 3.26], that $\phi$ is level-bounded if (19) holds. □
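4. Moreau envelope and proximal map

In this section we outline existing and new results regarding the variational properties of the Moreau envelope and the proximal map of a closed proper convex function.

The Moreau envelope of $f \in \Gamma_0(\mathbb{E})$ is defined by
\[ e_\lambda f(x) := \min_{u \in \mathbb{E}} \Bigl\{ f(u) + \frac{1}{2\lambda} \|x - u\|^2 \Bigr\} \quad \forall x \in \mathbb{E},\ \lambda > 0, \]
which has a Lipschitz gradient given by $\nabla e_\lambda f(x) = \frac{1}{\lambda}(x - P_\lambda f(x))$. The following result summarizes limiting properties of the Moreau envelope as $\lambda \downarrow 0$.

Proposition 4 (Convergence of the Moreau envelope). For $f \in \Gamma_0(\mathbb{E})$, the following hold as $\lambda \downarrow 0$:
(a) $e_\lambda f \xrightarrow{e} f$ and $e_\lambda f \xrightarrow{p} f$ (in fact $e_\lambda f(x) \uparrow f(x)$ for all $x \in \mathbb{E}$);
(b) $\lambda f \xrightarrow{e} \delta_{\operatorname{cl}(\operatorname{dom} f)}$;
(c) $\lambda e_\lambda f(x) \to \frac{1}{2} d^2_{\operatorname{cl}(\operatorname{dom} f)}(\bar x)$ as $x \to \bar x$;
(d) $\lambda \partial f$ converges to $N_{\operatorname{cl}(\operatorname{dom} f)}$ graphically in the sense of [39, Definition 5.32];
(e) for $x \in \operatorname{dom} \partial f$ we have $\nabla e_\lambda f(x) \to \operatorname{argmin}_{g \in \partial f(x)} \|g\|$.

Proof. Part (a). See, e.g., [39, Theorem 1.25, Proposition 7.4].
Part (b). By Lemma 1(b) and [38, Theorem 13.3], $\lambda \star f^* \xrightarrow{e} (f^*)^\infty = \sigma_{\operatorname{dom} f}$. Wijsman's theorem [39, Theorem 11.34] then yields $\lambda f = (\lambda \star f^*)^* \xrightarrow{e} \delta_{\operatorname{cl}(\operatorname{dom} f)}$.
Part (c). By Part (b), $\lambda f \xrightarrow{e} \delta_{\operatorname{cl}(\operatorname{dom} f)}$. Hence, by [39, Theorem 7.37], $\lambda e_\lambda f = e_1(\lambda f) \xrightarrow{c} e_1 \delta_{\operatorname{cl}(\operatorname{dom} f)} = \frac{1}{2} d^2_{\operatorname{cl}(\operatorname{dom} f)}$.
Part (d). Follows from Part (b) and Attouch's theorem [39, Theorem 12.35].
Part (e). See [2, Remark 3.32]. □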
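As a small numerical illustration of Proposition 4(a) and of the gradient formula for the envelope, the following sketch takes $f = |\cdot|$ on $\mathbb{R}$, whose proximal map is the well-known soft-thresholding operator. The function names (`prox_abs`, `moreau_env_abs`, `grad_env_abs`) and the test point are illustrative choices and do not come from the paper.

```python
import numpy as np

def prox_abs(x, lam):
    """Proximal map of f = |.| with parameter lam > 0 (soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def moreau_env_abs(x, lam):
    """Moreau envelope e_lam f(x) of f = |.|, evaluated at the prox point."""
    p = prox_abs(x, lam)
    return np.abs(p) + (x - p) ** 2 / (2.0 * lam)

def grad_env_abs(x, lam):
    """Gradient of the envelope via the formula (x - P_lam f(x)) / lam."""
    return (x - prox_abs(x, lam)) / lam

# Illustrative test point and a decreasing sequence of parameters.
x = 0.3
lams = [1.0, 0.5, 0.1, 0.01, 0.001]

# Envelope values should increase toward f(x) = |x| as lam decreases.
values = [moreau_env_abs(x, lam) for lam in lams]
for lam, val in zip(lams, values):
    print(f"lam={lam:<6} e_lam f(x)={val:.6f}  grad={grad_env_abs(x, lam):.6f}")
```

For this choice of $f$ the envelope is the Huber function, so the printed values can also be checked by hand.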
Note that Proposition 4(e) implies that
\[ \forall \bar x \in \operatorname{dom} \partial f \ \ \exists K > 0 \ \ \forall \lambda > 0: \quad \|P_\lambda f(\bar x) - \bar x\| \le K\lambda. \tag{20} \]
Proposition 4(a) suggests the following extension of the Moreau envelope at $\lambda = 0$:
\[ p_f\colon (x, \lambda) \in \mathbb{E} \times \mathbb{R} \mapsto \begin{cases} e_\lambda f(x) & \text{if } \lambda > 0, \\ f(x) & \text{if } \lambda = 0, \\ +\infty & \text{if } \lambda < 0. \end{cases} \]
This is the function $p_{\omega,f}$ from (11) with $\omega = \frac{1}{2}\|\cdot\|^2$. Hence, we may rely on our general study on infimal convolution from Section 3.3 to understand the properties of this extension of the Moreau envelope.

Corollary 6 (Conjugate and subdifferential of the Moreau envelope). Let $f \in \Gamma_0(\mathbb{E})$. Then $p_f \in \Gamma_0(\mathbb{E} \times \mathbb{R})$ and
(a) $p_f^*(y, \mu) = f^*(y) + \delta_{\operatorname{epi} \frac{1}{2}\|\cdot\|^2}(y, -\mu)$ and $p_f^* \in \Gamma_0(\mathbb{E} \times \mathbb{R})$;
(b) for all $(x, \lambda) \in \operatorname{dom} p_f$,
\[ \partial p_f(x, \lambda) = \begin{cases} \bigl( \tfrac{1}{\lambda}[x - P_\lambda f(x)],\ -\tfrac{1}{2}\bigl\|\tfrac{1}{\lambda}[x - P_\lambda f(x)]\bigr\|^2 \bigr) & \text{if } \lambda > 0, \\ \bigl\{ (v, -\beta) \bigm| v \in \partial f(x),\ \tfrac{1}{2}\|v\|^2 \le \beta \bigr\} & \text{if } \lambda = 0. \end{cases} \]

Proof. We are in the situation of Corollary 1 with $\omega = \frac{1}{2}\|\cdot\|^2$. In particular, the qualification condition (12) is trivially satisfied. □

We now turn our attention to the proximal map. It is straightforward to show $P_\lambda f(x) \to x$ as $\lambda \downarrow 0$ for $x \in \operatorname{dom} f$. The following proposition, which generalizes this statement, can be derived from monotone operator theory [39, Theorem 12.37]. The proof that we provide here instead relies on epigraphical convergence.

Proposition 5 (Convergence of the proximal map). Let $f \in \Gamma_0(\mathbb{E})$ and $\bar x \in \mathbb{E}$. Then
\[ \lim_{\lambda \downarrow 0,\ x \to \bar x} P_\lambda f(x) = P_{\operatorname{cl}(\operatorname{dom} f)}(\bar x). \]

Proof. Let $\{\lambda_k\} \downarrow 0$, $\{x_k\} \to \bar x$, and $\phi_k(u) := \lambda_k f(u) + \frac{1}{2}\|u - x_k\|^2$. Use Proposition 4(b) to deduce $\lambda_k f \xrightarrow{e} \delta_{\operatorname{cl}(\operatorname{dom} f)}$. Then because $\frac{1}{2}\|(\cdot) - x_k\|^2 \xrightarrow{c} \frac{1}{2}\|(\cdot) - \bar x\|^2$, we obtain $\phi_k \xrightarrow{e} \phi := \delta_{\operatorname{cl}(\operatorname{dom} f)} + \frac{1}{2}\|(\cdot) - \bar x\|^2$; see [39, Theorem 7.46 b)]. Now observe that $P_{\lambda_k} f(x_k) = \operatorname{argmin} \phi_k$ and $P_{\operatorname{cl}(\operatorname{dom} f)}(\bar x) = \operatorname{argmin} \phi$. Since all functions $\phi_k$ are convex and $\phi$ is level-bounded (in fact, strongly convex), the sequence $\{\phi_k\}$ is, by [39, Exercise 7.32 c)], eventually level-bounded (in the sense of [39, p. 266]). Therefore, we can apply [39, Theorem 7.33], with $\varepsilon_k = 0$ ($k \in \mathbb{N}$), to deduce $P_{\lambda_k} f(x_k) \to P_{\operatorname{cl}(\operatorname{dom} f)}(\bar x)$. □

We record the following auxiliary result.
Lemma 5. Let $f \in \Gamma_0(\mathbb{E})$ and fix positive scalars $\lambda$ and $\mu$. Then for all $x \in \mathbb{E}$,
\[ \frac{1}{2\mu}\bigl( \|P_\mu f(x) - x\|^2 - \|P_\lambda f(x) - x\|^2 + \|P_\mu f(x) - P_\lambda f(x)\|^2 \bigr) \le f(P_\lambda f(x)) - f(P_\mu f(x)) \le \frac{1}{2\lambda}\bigl( \|P_\mu f(x) - x\|^2 - \|P_\lambda f(x) - x\|^2 - \|P_\mu f(x) - P_\lambda f(x)\|^2 \bigr), \tag{21} \]
and
\[ \|P_\lambda f(x) - P_\mu f(x)\|^2 \le \frac{\mu - \lambda}{\lambda + \mu}\bigl( \|P_\mu f(x) - x\|^2 - \|P_\lambda f(x) - x\|^2 \bigr). \tag{22} \]

Proof. Denote the fixed point of the statement by $\bar x$ and set $P(\tau) := P_\tau f(\bar x)$ for all $\tau > 0$. To obtain the bounds in (21), use [39, Eq. 7(34)] to infer
\[ f(x) + \frac{1}{2\tau}\|x - \bar x\|^2 - f(P(\tau)) - \frac{1}{2\tau}\|P(\tau) - \bar x\|^2 \ge \frac{1}{2\tau}\|x - P(\tau)\|^2 \quad \forall \tau > 0,\ \forall x \in \mathbb{E}. \]
For $\tau = \lambda$ and $x = P(\mu)$, we hence obtain
\[ f(P(\mu)) + \frac{1}{2\lambda}\|P(\mu) - \bar x\|^2 - f(P(\lambda)) - \frac{1}{2\lambda}\|P(\lambda) - \bar x\|^2 \ge \frac{1}{2\lambda}\|P(\mu) - P(\lambda)\|^2. \]
Analogously, for $\tau = \mu$ and $x = P(\lambda)$, we find that
\[ f(P(\lambda)) + \frac{1}{2\mu}\|P(\lambda) - \bar x\|^2 - f(P(\mu)) - \frac{1}{2\mu}\|P(\mu) - \bar x\|^2 \ge \frac{1}{2\mu}\|P(\lambda) - P(\mu)\|^2. \]
Combining the last two inequalities now yields (21).

Next, use (21) to obtain
\[ \frac{1}{\mu}\bigl( \|P(\mu) - \bar x\|^2 - \|P(\lambda) - \bar x\|^2 + \|P(\lambda) - P(\mu)\|^2 \bigr) \le \frac{1}{\lambda}\bigl( \|P(\mu) - \bar x\|^2 - \|P(\lambda) - \bar x\|^2 - \|P(\lambda) - P(\mu)\|^2 \bigr), \]
or, equivalently,
\[ \Bigl( \frac{1}{\lambda} + \frac{1}{\mu} \Bigr) \|P(\lambda) - P(\mu)\|^2 \le \Bigl( \frac{1}{\lambda} - \frac{1}{\mu} \Bigr) \bigl( \|P(\mu) - \bar x\|^2 - \|P(\lambda) - \bar x\|^2 \bigr), \]
which is equivalent to the desired inequality (22). □

Proposition 5 suggests the following extension of the proximal map of $f \in \Gamma_0(\mathbb{E})$:
\[ P_f\colon \mathbb{E} \times \mathbb{R} \rightrightarrows \mathbb{E}, \quad P_f(x, \lambda) := \begin{cases} P_\lambda f(x) & \text{if } \lambda > 0, \\ P_{\operatorname{cl}(\operatorname{dom} f)}(x) & \text{if } \lambda = 0, \\ \emptyset & \text{if } \lambda < 0. \end{cases} \]
The following result collects continuity properties of $P_f$.

Corollary 7 (Lipschitz continuity of the proximal map). Let $f \in \Gamma_0(\mathbb{E})$. Then $P_f$ is continuous on $\operatorname{dom} P_f = \mathbb{E} \times \mathbb{R}_+$ and is locally Lipschitz on $\operatorname{int}(\operatorname{dom} P_f)$. If $\bar x \in \operatorname{dom} \partial f$, then $P_f$ is upper Lipschitz (or calm) at $(\bar x, 0)$, and the map $\mathbb{R}_+ \ni \mu \mapsto P_f(\bar x, \mu)$ is locally Lipschitz at $0$, i.e., there exist positive scalars $\kappa$ and $\varepsilon$ such that
\[ \|P_f(\bar x, 0) - P_f(x, \lambda)\| \le \kappa \|(\bar x - x, \lambda)\| \quad \forall (x, \lambda) \in B_\varepsilon(\bar x, 0) \cap \operatorname{dom} P_f, \tag{23a} \]
\[ \|P_f(\bar x, \lambda) - P_f(\bar x, \mu)\| \le \kappa |\mu - \lambda| \quad \forall \lambda, \mu \in [0, \varepsilon]. \tag{23b} \]

Proof.
The continuity up to the boundary of the domain follows from Proposition 5. The local Lipschitz continuity on $\operatorname{int}(\operatorname{dom} P_f)$ follows from Theorem 3 with $\omega = \frac{1}{2}\|\cdot\|^2$.

Now assume that $\bar x \in \operatorname{dom} \partial f$, which implies $P_f(\bar x, 0) = \bar x \in \operatorname{dom} f$. Then for all $\lambda > 0$,
\[ \|P_f(x, \lambda) - P_f(\bar x, 0)\| \le \|P_\lambda f(x) - P_\lambda f(\bar x)\| + \|\bar x - P_\lambda f(\bar x)\| \le \|x - \bar x\| + K\lambda, \]
where $K > 0$ is the constant from (20) and we use that $P_\lambda f$ is 1-Lipschitz [3]. Since $\|x - \bar x\| + K\lambda \le \sqrt{2}\max\{1, K\}\,\|(\bar x - x, \lambda)\|$, this yields (23a) with $\kappa := \sqrt{2}\max\{1, K\}$.

Let $P := P_f(\bar x, \cdot)$. By (23a), there exist positive scalars $\kappa$ and $\varepsilon$ such that $\|P(\tau) - \bar x\| \le \kappa\tau$ for all $\tau \in (0, \varepsilon]$. Hence for $\mu$ and $\lambda$ in $(0, \varepsilon]$, where we may assume $\lambda \le \mu$ without loss of generality,
\[ \begin{aligned} \|P(\mu) - P(\lambda)\|^2 &\le \frac{\mu - \lambda}{\mu + \lambda}\bigl( \|P(\mu) - \bar x\|^2 - \|P(\lambda) - \bar x\|^2 \bigr) \\ &= \frac{\mu - \lambda}{\mu + \lambda}\bigl( \|P(\mu) - \bar x\| - \|P(\lambda) - \bar x\| \bigr) \cdot \bigl( \|P(\mu) - \bar x\| + \|P(\lambda) - \bar x\| \bigr) \\ &\le \frac{\mu - \lambda}{\mu + \lambda}\,\kappa(\mu + \lambda)\bigl( \|P(\mu) - \bar x\| - \|P(\lambda) - \bar x\| \bigr) \\ &\le \kappa |\mu - \lambda| \cdot \|P(\mu) - P(\lambda)\|, \end{aligned} \]
where the first inequality follows from (22) of Lemma 5, and the last inequality uses the reverse triangle inequality. This gives (23b) for $\lambda, \mu \in (0, \varepsilon]$; the case where $\lambda = 0$ or $\mu = 0$ follows from (23a). □

The following example shows that the assumption $\bar x \in \operatorname{dom} \partial f$ required for (23) is not redundant.

Example 1 (Upper Lipschitz continuity of proximal map).
Consider the following two functions, both contained in $\Gamma_0(\mathbb{R})$:
\[ f(x) = \begin{cases} -\log x & \text{if } x > 0, \\ +\infty & \text{otherwise}, \end{cases} \qquad g(x) = \begin{cases} -\sqrt{x} & \text{if } x \ge 0, \\ +\infty & \text{otherwise}. \end{cases} \]
The corresponding extended proximal maps are
\[ P_f(x, \lambda) = \begin{cases} \frac{1}{2}\bigl( x + \sqrt{x^2 + 4\lambda} \bigr) & \text{if } \lambda > 0, \\ \max\{x, 0\} & \text{if } \lambda = 0, \end{cases} \qquad P_g(0, \lambda) = \Bigl( \frac{\lambda}{2} \Bigr)^{2/3} \quad \forall \lambda \ge 0. \]
Consider first $P_f$. Observe that $\operatorname{dom} f$ does not include the origin, and $|P_f(0, 0) - P_f(0, \lambda)| = \sqrt{\lambda}$ for all $\lambda > 0$, so $P_f$ is not upper Lipschitz at $(0, 0)$. Similarly, $\operatorname{dom} \partial g$ does not include the origin (although $0 \in \operatorname{dom} g$), and $P_g$ is not upper Lipschitz at $(0, 0)$ because $|P_g(0, \lambda) - P_g(0, 0)| = (\lambda/2)^{2/3}$. ◇

The next result on directional differentiability of $P_f$ follows from Theorem 3(c) with $\omega = \frac{1}{2}\|\cdot\|^2$.

Corollary 8 (Directional differentiability of the proximal map). Let $f \in \Gamma_0(\mathbb{E})$ and fix $(x, \lambda) \in \mathbb{E} \times \mathbb{R}_{++}$. If $\partial f$ is proto-differentiable at $\bigl( P_f(x, \lambda), \tfrac{1}{\lambda}[x - P_f(x, \lambda)] \bigr)$, then $P_f$ is directionally differentiable at $(x, \lambda)$ with
\[ P_f'((x, \lambda); (d, \Delta)) = \Bigl[ \lambda D(\partial f)\bigl( P_f(x, \lambda) \bigm| \tfrac{1}{\lambda}[x - P_f(x, \lambda)] \bigr) + I \Bigr]^{-1} \Bigl( d - \frac{\Delta}{\lambda}[x - P_f(x, \lambda)] \Bigr) \]
for all $(d, \Delta) \in \mathbb{E} \times \mathbb{R}$. In particular, for any $\lambda > 0$,
\[ (P_\lambda f)'(x; \cdot) = \Bigl[ \lambda D(\partial f)\bigl( P_f(x, \lambda) \bigm| \tfrac{1}{\lambda}[x - P_f(x, \lambda)] \bigr) + I \Bigr]^{-1}(\cdot). \]

We now establish semismoothness* of the extended proximal map $P_f$ on $\mathbb{E} \times \mathbb{R}_{++}$. We lead with an auxiliary result.

Lemma 6. The map $S\colon \mathbb{E} \rightrightarrows \mathbb{E}$ is semismooth* at $(y, z - y)$ if and only if $S + \operatorname{id}$ is semismooth* at $(y, z)$.

Proof. The map $S$ is semismooth* at $(y, z - y)$ if and only if
\[ \begin{aligned} & \bigl[\, v \in DS(y \mid z - y)(u),\ u^* \in D^*S((y, z - y); (u, v))(v^*) \ \Rightarrow\ \langle u, u^* \rangle = \langle v, v^* \rangle \,\bigr] \\ \iff\ & \bigl[\, v + u \in D(S + \operatorname{id})(y \mid z)(u),\ u^* + v^* \in D^*(S + \operatorname{id})((y, z); (u, u + v))(v^*) \ \Rightarrow\ \langle u, u^* + v^* \rangle = \langle u + v, v^* \rangle \,\bigr] \\ \iff\ & S + \operatorname{id} \text{ is semismooth* at } (y, z). \end{aligned} \]
Here the first equivalence is the definition of semismoothness* and (15). The second uses the sum rule for the graphical derivative [39, Exercise 10.43] and the directional coderivative [6, Corollary 5.3 (+ comment)], respectively, when one summand is smooth (here the identity map). The last equivalence is a variable change and the definition of semismoothness* and (15) again. □

Proposition 6 (Semismoothness* of $P_f$). For $f \in \Gamma_0(\mathbb{E})$,
(a) $P_f$ is semismooth* at $(x, \lambda)$ if $\partial f$ is semismooth* at $\bigl( P_f(x, \lambda), \tfrac{1}{\lambda}[x - P_f(x, \lambda)] \bigr)$;
(b) $P_\lambda f$ is semismooth* at $x$ if and only if $\partial f$ is semismooth* at $\bigl( P_\lambda f(x), \tfrac{1}{\lambda}[x - P_\lambda f(x)] \bigr)$.

Proof. Part (a) follows from Corollary 3 with $\omega = \frac{1}{2}\|\cdot\|^2$. For Part (b), observe that $P_\lambda f = (\lambda \partial f + \operatorname{id})^{-1}$ is semismooth* at $x$ if and only if $\lambda \partial f + \operatorname{id}$ is semismooth* at $(P_\lambda f(x), x)$ [23, p. 7]. By Lemma 6, this is the case if and only if $\lambda \partial f$ is semismooth* at $(P_\lambda f(x), x - P_\lambda f(x))$ which, in turn, holds if and only if $\partial f$ is semismooth* at $\bigl( P_\lambda f(x), \tfrac{1}{\lambda}[x - P_\lambda f(x)] \bigr)$. □
Various papers study the semismoothness à la Qi and Sun [37] of $P_f$ on $\mathbb{E} \times \mathbb{R}_{++}$. Most of these results trace the semismoothness of the latter back to the semismoothness of the Euclidean projection onto $\operatorname{epi} f$. The work by Meng et al. [31, 32] deserves explicit mention, and a good discussion of these results can be found in Milzarek's thesis [33]. Bearing our applications in Section 7 in mind, this is somewhat of a circular strategy, and hence we opened up a different path via our study in Section 3.3.1 on semismooth* properties of solution maps. For a map that is locally Lipschitz at a point, semismoothness* differs from traditional semismoothness only in directional differentiability, as the following result by Gfrerer and Outrata [23, Corollary 3.8] shows.

Lemma 7 (Semismooth vs. semismooth*). Let $F\colon D \subset \mathbb{E} \to \mathbb{E}$ be locally Lipschitz at $x \in \operatorname{int} D$. Then the following are equivalent:
(a) $F$ is semismooth at $x$;
(b) $F$ is semismooth* and directionally differentiable at $x$.

This lemma gives the following immediate consequence about semismoothness of $P_f$.

Corollary 9 (Semismoothness of $P_f$). Let $f \in \Gamma_0(\mathbb{E})$ and fix $(x, \lambda) \in \mathbb{E} \times \mathbb{R}_{++}$. If $\partial f$ is proto-differentiable and semismooth* at $\bigl( P_f(x, \lambda), \tfrac{1}{\lambda}[x - P_f(x, \lambda)] \bigr)$, then $P_f$ is semismooth at $(x, \lambda)$. This holds, in particular, if $f$ is PLQ or twice continuously differentiable at $P_f(x, \lambda)$; in the latter case $P_f$ is continuously differentiable at $(x, \lambda)$.

Proof. For the first statement combine Corollary 8, Proposition 6, and Lemma 7. For the second, invoke Remark 1 and Proposition 3. □

Note that semismoothness* does not require directional differentiability of the function in question. However, semismoothness* is still sufficient to yield convergence of Newton-type methods under suitable regularity conditions [23, 27]. In view of the above discussion, this is important because the Euclidean projector onto a closed convex set may not be directionally differentiable [40], in which case the arguments and methods based on (standard) semismoothness are invalidated.
5. The proximal value

The projection onto the epigraph of a function $f \in \Gamma_0(\mathbb{E})$ requires a particular value of $\lambda$ so that the equation (2) holds. In this section we examine the variational properties of the value of the proximal map as a function of $\lambda$, i.e., the function
\[ 0 < \lambda \mapsto f(P_\lambda f(\bar x)), \tag{24} \]
where $\bar x \in \mathbb{E}$ is fixed. Note that this map is not generally convex, as illustrated by the following counterexample.

Example 2 (Nonconvexity of the proximal value). Define $f = |\cdot| + \delta_{[-1,1]} \in \Gamma_0(\mathbb{R})$. By Beck [4, Example 6.22],
\[ P_\lambda f(x) = \min\{\max\{|x| - \lambda, 0\}, 1\} \cdot \operatorname{sgn}(x) \quad \forall x \in \mathbb{R},\ \lambda > 0. \]
Hence, for $\bar x = 2$, we obtain the nonconvex function
\[ f(P_\lambda f(\bar x)) = \begin{cases} 1 & \text{if } \lambda \in (0, 1], \\ 2 - \lambda & \text{if } \lambda \in (1, 2], \\ 0 & \text{if } \lambda > 2. \end{cases} \]
◇

The next result describes the monotonicity and continuity of the map (24).
Corollary 10 (Monotonicity and continuity in λ ) . Let f ∈ Γ ( E ) and fix ¯ x ∈ E . Then (a) 0 < λ (cid:55)→ f ( P λ f (¯ x )) is decreasing (i.e., increasing as λ ↓ ); .P. Friedlander, A. Goodwin, and T. Hoheisel: From perspective maps to epigraphical projections (b) 0 < λ (cid:55)→ (cid:107) ¯ x − P λ f (¯ x ) (cid:107) is increasing; (c) lim λ → f ( P λ f (¯ x )) = f ( P cl (dom f ) (¯ x )) .Proof. Parts (a) and (b). Let 0 < λ < µ and set P ( λ ) := P λ f (¯ x ), P ( µ ) := P µ f (¯ x ), and δ := ( (cid:107) P ( µ ) − x (cid:107) − (cid:107) P ( λ ) − x (cid:107) ). Then from (21) of Lemma 5, we obtain1 µ δ ≤ f ( P ( λ )) − f ( P ( µ )) ≤ λ δ. As λ < µ , this implies that δ ≥
0, i.e., (cid:107) P ( µ ) − x (cid:107) ≥ (cid:107) P ( λ ) − x (cid:107) , and hence f ( P ( µ )) ≤ f ( P ( λ )).Part (c). Let { λ k } ↓
0. Then p k := P λ k f (¯ x ) → p := P cl (dom f ) (¯ x ); see Proposition 5. It follows that f ( p ) ≥ lim sup k →∞ (cid:2) f ( p k ) + λ k (cid:0) (cid:107) ¯ x − p k (cid:107) − (cid:107) ¯ x − p (cid:107) (cid:1) (cid:3) ≥ lim sup k →∞ f ( p k ) ≥ lim inf k →∞ f ( p k ) ≥ f ( p ) . Here the first inequality uses that f ( p ) + λ k (cid:107) ¯ x − p (cid:107) ≥ f ( p k ) + λ k (cid:107) ¯ x − p k (cid:107) for all k ∈ N , bydefinition of p k . The second is due to (cid:107) ¯ x − p k (cid:107) ≥ (cid:107) ¯ x − p (cid:107) , by the definition of p and since p k ∈ dom f .The last one is just lower semicontinuity of f . (cid:3) As we did with the Moreau envelope and proximal map, we define the extension of the map (24)to include negative values of λ : η f ¯ x : λ ∈ R (cid:55)→ (cid:40) f ( P λ f (¯ x )) if λ > f ( P cl (dom f ) (¯ x )) if λ ≤ proximal value function . Observe that η f ¯ x ( λ ) = e λ f (¯ x ) − (1 / λ ) (cid:107) ¯ x − P λ f (¯ x ) (cid:107) ( λ > . (25)We use Corollary 10 to derive the following result. Corollary 11 (Continuity properties of the proximal value) . Let f ∈ Γ ( E ) and fix ¯ x ∈ E . Then the following hold: (a) η f ¯ x is decreasing, continuous (possibly in an extended real-valued sense), and finite-valued if (andonly if ) P cl (dom f ) (¯ x ) = ¯ x ∈ dom f . (b) η f ¯ x is locally Lipschitz on R ++ . (c) If ¯ x ∈ dom ∂f , then the assertion in (b) holds on R .Proof. Set η := η f ¯ x . Parts (a) and (b). The fact that η is decreasing follows from Corollary 10(a).Now consider (25). By Corollary 6, the map 0 < λ (cid:55)→ e λ f (¯ x ) is convex and finite-valued, hencelocally Lipschitz. By Corollary 7(a), this conclusion also holds for 0 < λ (cid:55)→ λ (cid:107) x − P λ f (¯ x ) (cid:107) . Thisgives the local Lipschitz continuity of η on R ++ . The continuity at 0 is due to Corollary 10(c).Part (c). 
By Parts (a) and (b), and because η is constant (and finite by assumption) on R − , weonly need to be concerned about the desired properties at 0. To this end, let µ > λ . If λ <
0, then (cid:12)(cid:12)(cid:12)(cid:12) η ( µ ) − η ( λ ) µ − λ (cid:12)(cid:12)(cid:12)(cid:12) ≤ (cid:12)(cid:12)(cid:12)(cid:12) η ( µ ) − η (0) µ − (cid:12)(cid:12)(cid:12)(cid:12) . Thus we can restrict ourselves to the case 0 ≤ λ < µ . Set P ( τ ) := P τ f (¯ x ) for all τ > P (0) := ¯ x .Then by Corollary 7(c), there exist positive scalars ε and κ such that (cid:107) P ( µ ) − P ( λ ) (cid:107) ≤ κ ( µ − λ ) ∀ ≤ λ ≤ µ ≤ ε. (26) M.P. Friedlander, A. Goodwin, and T. Hoheisel:
For 0 < λ < µ ≤ ε , we have | η ( λ ) − η ( µ ) | = η ( λ ) − η ( µ ) ≤ λ ( (cid:107) P ( µ ) − ¯ x (cid:107) − (cid:107) P ( λ ) − ¯ x (cid:107) − (cid:107) P ( µ ) − P ( λ ) (cid:107) )= λ [( (cid:107) P ( µ ) − ¯ x (cid:107) − (cid:107) P ( λ ) − ¯ x (cid:107) ) · ( (cid:107) P ( µ ) − ¯ x (cid:107) + (cid:107) P ( λ ) − ¯ x (cid:107) ) − (cid:107) P ( µ ) − P ( λ ) (cid:107) ] ≤ λ (cid:107) P ( µ ) − P ( λ ) (cid:107) · ( (cid:107) P ( µ ) − ¯ x (cid:107) + (cid:107) P ( λ ) − ¯ x (cid:107) − (cid:107) P ( µ ) − P ( λ ) (cid:107) ) ≤ κ λ | µ − λ | ( (cid:107) P ( µ ) − ¯ x (cid:107) + (cid:107) P ( λ ) − ¯ x (cid:107) − ( (cid:107) P ( µ ) − ¯ x (cid:107) − (cid:107) P ( λ ) − ¯ x (cid:107) ))= κλ (cid:107) ¯ x − P ( λ ) (cid:107) · | µ − λ |≤ κ | µ − λ | . Here, the first identity follows from Corollary 10(a), where the first inequality uses Lemma 5(a).The rest the follows from the reverse triangle inequality and (26), recalling that P (0) = ¯ x . (cid:3) Remark 2.
The requirement that ¯ x ∈ ∂f , made in Corollary 11, cannot be relaxed to ¯ x ∈ dom f . To see this, we again use Example 1(b), where η f ¯ x ( λ ) = (cid:40) − ( λ/ if λ ≥ λ < λ = 0. We also conclude fromthis example that the lack of calmness of the proximal map at λ = 0 is not necessarily compensatedby applying f .Under certain assumptions described by Corollary 12, we may interpret the extended proximalvalue function η f ¯ x as the derivative of the convex function¯ φ f ¯ x : λ ∈ R (cid:55)→ − λe λ f (¯ x ) if λ > − d f ) (¯ x ) if λ = 0, − λf ( P cl (dom f ) (¯ x )) − d f ) (¯ x ) if λ <
0; (27)cf. Attouch [2, Remark 3.32].
Corollary 12 (The function ¯ φ f ¯ x ) . Let f ∈ Γ ( E ) and fix ¯ x ∈ E . Then the following hold: (a) ¯ φ f ¯ x is proper, convex and continuous (possibly in an extended real-valued sense), and continuouslydifferentiable on R ++ with ddλ ¯ φ f ¯ x ( λ ) = − f ( P λ f (¯ x )) locally Lipschitz for all λ > . (b) If ¯ x ∈ dom f , then ¯ φ f ¯ x is continuously differentiable on R with derivative given by ddλ ¯ φ f ¯ x ( λ ) = − η f ¯ x ( λ ) = (cid:40) − f ( P λ f (¯ x )) if λ > , − f ( P cl (dom f ) (¯ x )) if λ ≤ .If, more strictly, ¯ x ∈ dom ∂f , then this derivative is locally Lipschitz on all of R . (c) If P cl (dom f ) (¯ x ) / ∈ dom f , then dom ¯ φ f ¯ x = R + and ∂ ¯ φ f ¯ x ( λ ) = (cid:40) − f ( P λ f (¯ x )) if λ > , ∅ if λ ≤ .Proof. Set ¯ φ := ¯ φ f ¯ x ( λ ). Part (a). It is an easy computation to see that0 < λ (cid:55)→ − ¯ φ ( λ ) = inf u (cid:8) λf ( y ) + (cid:107) u − ¯ x (cid:107) (cid:9) .P. Friedlander, A. Goodwin, and T. Hoheisel: From perspective maps to epigraphical projections is concave, i.e., 0 < λ (cid:55)→ ¯ φ ( λ ) is convex. By setting ¯ φ (0) = − d f ) (¯ x ) and using Proposition 4(a),we see that ¯ φ is a continuous convex function on R + , which is linearly extended to R − . All in all, ¯ φ is convex, proper and continuous (possibly in an extended real-valued) sense. From Corollary 6(b)(and the product rule) we infer, for all λ >
0, that¯ φ (cid:48) ( λ ) = − e λ f (¯ x ) − λ (cid:16) − (cid:13)(cid:13) λ [¯ x − P λ f (¯ x )] (cid:13)(cid:13) (cid:17) = (1 / λ ) (cid:107) ¯ x − P λ f (¯ x ) (cid:107) − e λ f (¯ x ) = − f ( P λ f (¯ x )) , where the last equality follows from (25). Hence, the local Lipschitz continuity follows from Corol-lary 11(b).Part (b). Here we assume that P cl (dom f (¯ x ) ∈ dom f . Then by definition of ¯ φ , we have ¯ φ (cid:48) ( λ ) = − f ( P cl (dom f ) (¯ x )) for all λ <
0. It remains to establish the case λ = 0. To this end, use the subgradientinequality to deduce that g ∈ ∂ ¯ φ (0) if and only if ¯ φ (0) + λg ≤ ¯ φ ( λ ) for all λ if and only if g + e λ f (¯ x ) − λ d f ) (¯ x ) ≤ ∀ λ > , (28a) λg + λf ( P cl (dom f ) (¯ x )) ≤ ∀ λ < , (28b)hold simultaneously. (The case with λ = 0 holds trivially.) From (28a), we infer that g ≤ inf λ> d f ) (¯ x ) − λe λ f (¯ x ) λ (i) = inf λ> ¯ φ ( λ ) − ¯ φ (0) λ (ii) = lim λ ↓ ¯ φ ( λ ) − ¯ φ (0) λ (iii) = lim λ ↓ d f ) (¯ x ) − λe λ f (¯ x ) λ (iv) = lim λ ↓ − f ( P λ f (¯ x ))1 (v) = − f ( P cl (dom f ) (¯ x )) . Here, (i) is simply the definition of ¯ φ ; (ii) holds because ¯ φ is convex [38, Theorem 23.1]; (iii) fol-lows from the definition of ¯ φ ; and (iv) follows from l’Hˆopital’s rule, which is applicable becausethe last limit exists by Corollary 10(c), which implies (v). Hence, (28a) is equivalent to g ≤− f ( P cl (dom f (¯ x )). Combined with (28b), which is equivalent to g ≥ − f ( P cl (dom f (¯ x )), establishes that ∂ ¯ φ (0) = {− f ( P cl (dom f (¯ x )) } . Thus, P cl (dom f (¯ x ) ∈ dom f , ¯ φ is differentiable, and hence continuouslydifferentiable by convexity [38, Corollary 25.5.1]. The remainder follows from Corollary 11(c).Part (c). Here we assume that P cl (dom f (¯ x ) / ∈ dom f . Suppose g ∈ ∂φ (0), i.e., analogous to somearguments in b), g ≤ (1 / λ ) d f ) (¯ x ) − e λ f (¯ x ) ∀ λ > . On the other hand, using e.g., Corollary 10(b), we have(1 / λ ) d f ) (¯ x ) − e λ f (¯ x ) = (1 / λ ) (cid:107) ¯ x − P cl (dom f ) (¯ x ) (cid:107) − (1 / λ ) (cid:107) ¯ x − P λ f (¯ x ) (cid:107) − f ( P λ f (¯ x )) ≤ − f ( P λ f (¯ x )) . Since − f ( P λ f (¯ x )) → −∞ as λ ↓
0, this concludes the proof. (cid:3)
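The derivative formula $\frac{d}{d\lambda}\bar\phi^f_{\bar x}(\lambda) = -f(P_\lambda f(\bar x))$ for $\lambda > 0$ can be sanity-checked numerically. The sketch below assumes $f = \|\cdot\|_1$ on $\mathbb{R}^3$, for which the proximal map is soft-thresholding, and compares a central finite difference of $\bar\phi(\lambda) = -\lambda e_\lambda f(\bar x)$ with the proximal value. The function names, the point `x_bar`, and the step size are illustrative choices, not taken from the paper.

```python
import numpy as np

def prox_l1(x, lam):
    """Proximal map of the 1-norm with parameter lam > 0 (soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def phi_bar(x_bar, lam):
    """phi_bar(lam) = -lam * e_lam f(x_bar) for f = ||.||_1 and lam > 0.

    Uses lam * e_lam f(x_bar) = lam * f(p) + 0.5 * ||p - x_bar||^2
    with p = P_lam f(x_bar).
    """
    p = prox_l1(x_bar, lam)
    return -(lam * np.abs(p).sum() + 0.5 * np.sum((p - x_bar) ** 2))

def proximal_value(x_bar, lam):
    """The proximal value f(P_lam f(x_bar)) for f = ||.||_1."""
    return np.abs(prox_l1(x_bar, lam)).sum()

# Illustrative data; lam is chosen away from the kinks at |x_bar[i]|.
x_bar = np.array([2.0, -0.5, 1.25])
lam, h = 0.3, 1e-6

# Central finite difference of phi_bar versus the claimed derivative.
fd = (phi_bar(x_bar, lam + h) - phi_bar(x_bar, lam - h)) / (2.0 * h)
print("finite difference :", fd)
print("-f(P_lam f(x_bar)):", -proximal_value(x_bar, lam))
```

The same script can be used to probe the convexity of $\bar\phi$ on $\mathbb{R}_{++}$ by evaluating `phi_bar` at midpoints.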
In view of the properties of the proximal value function, as outlined by Corollary 11, the question of the semismoothness of $\eta^f_{\bar x}$ on $\mathbb{R}_{++}$ arises naturally. Now consider the expression (25). The map $0 < \lambda \mapsto e_\lambda f(\bar x)$ is continuously differentiable by Corollary 6(a), hence semismooth [20, Proposition 7.4.5]. Moreover, the map $0 < \lambda \mapsto \tfrac{1}{2\lambda}\|\bar x - P_\lambda f(\bar x)\|^2$ is semismooth if $0 < \lambda \mapsto P_\lambda f(\bar x)$ is semismooth [20, Proposition 7.4.4]. Thus, when the latter holds, we can conclude that $\eta^f_{\bar x}$ is semismooth. We can in addition use Corollary 9, which establishes conditions for the semismoothness of the map $(x, \lambda) \in E \times \mathbb{R}_{++} \mapsto P_\lambda f(x)$, to obtain the following result.
Proposition 7 (Semismoothness of the proximal value function). Let $f \in \Gamma_0(E)$ and $\bar x \in E$. Then $\eta^f_{\bar x}$ is semismooth at $\bar\lambda > 0$ if $\partial f$ is proto-differentiable and semismooth* at $\bigl(P_{\bar\lambda} f(\bar x),\ \tfrac{1}{\bar\lambda}[\bar x - P_{\bar\lambda} f(\bar x)]\bigr)$. This is the case under either of the following conditions:
(a) (PLQ case) $f$ is piecewise linear-quadratic.
(b) ($C^2$ case) $f$ is twice continuously differentiable around $P_{\bar\lambda} f(\bar x)$. In this case, $\eta^f_{\bar x}$ is continuously differentiable.
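As a quick illustration of case (a), take $E = \mathbb{R}$ and $f = |\cdot|$. The computation below assumes, as the expression (25) and the derivative formula (33) suggest, that the proximal value function is $\eta^f_{\bar x}(\lambda) = f(P_\lambda f(\bar x))$; it is a worked sketch rather than part of the original argument. Soft-thresholding gives the closed forms

```latex
P_\lambda f(\bar x) = \operatorname{sign}(\bar x)\,\max\{|\bar x| - \lambda,\ 0\},
\qquad
\eta^f_{\bar x}(\lambda) = f\bigl(P_\lambda f(\bar x)\bigr) = \max\{|\bar x| - \lambda,\ 0\}
\quad (\lambda > 0).
```

This map is piecewise affine, hence semismooth on $\mathbb{R}_{++}$. For $\bar x \neq 0$ its only kink is at $\lambda = |\bar x|$, where the Bouligand subdifferential is $\{-1, 0\}$.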
6. Post-composition envelopes and proximal maps
Given functions $\psi \in \Gamma_0(E)$ and $g \in \Gamma_0(\mathbb{R})$, we consider the composition

$$(g \circ \psi)(x) := \begin{cases} g(\psi(x)) & \text{if } x \in \mathrm{dom}\,\psi, \\ +\infty & \text{otherwise.} \end{cases}$$

It is well known that $g \circ \psi$ is closed proper convex if $g$ is increasing and the intersection $\psi(E) \cap \mathrm{dom}\, g$ is nonempty; see, for example, Hiriart-Urruty and Lemaréchal [25, Theorem B.2.1.7], who describe this operation as post-composition. We establish variational formulas for the Moreau envelope and proximal map of the composition $g \circ \psi$ under a regularity assumption involving the intersection of domains. These results provide us with tools to infer properties of projections onto the epigraph and level sets of a closed proper convex function, as covered in Section 7.

Proposition 8 (Post-composition, Moreau envelopes, and proximal maps). Let $g \in \Gamma_0(\mathbb{R})$ be increasing and let $\psi \in \Gamma_0(E)$ be such that

$$(\mathrm{ri}\,\mathrm{dom}\, g) \cap \psi(\mathrm{ri}\,\mathrm{dom}\,\psi) \neq \emptyset. \tag{29}$$

Then the following properties hold.
(a) $e_1(g \circ \psi)(\bar x) = -\min_{\lambda \ge 0} \bigl\{ g^*(\lambda) + \bar\phi^\psi_{\bar x}(\lambda) \bigr\}$, where $\bar\phi^\psi_{\bar x}$ is given by (27).
(b) $P_1(g \circ \psi)(\bar x) = P_1(\bar\lambda \cdot \psi)(\bar x)$ for every $\bar\lambda \in \mathrm{argmin}_{\lambda \ge 0} \bigl\{ g^*(\lambda) + \bar\phi^\psi_{\bar x}(\lambda) \bigr\} \neq \emptyset$.
(c) If $\psi(P_{\mathrm{cl}(\mathrm{dom}\,\psi)}(\bar x)) \notin \partial g^*(0)$, then $\mathrm{argmin}_{\lambda \ge 0} \bigl\{ g^*(\lambda) + \bar\phi^\psi_{\bar x}(\lambda) \bigr\} \subset \mathbb{R}_{++}$. This is, in particular, the case if $P_{\mathrm{cl}(\mathrm{dom}\,\psi)}(\bar x) \notin \mathrm{dom}\,\psi$.

Proof. Part (a). We find that

$$\begin{aligned}
e_1(g \circ \psi)(\bar x)
&= \min_{x \in E} \bigl\{ \tfrac12\|x - \bar x\|^2 + (g \circ \psi)(x) \bigr\} \\
&= -\bigl( \tfrac12\|(\cdot) - \bar x\|^2 + g \circ \psi \bigr)^*(0) \\
&= \max_{y \in E,\ \lambda \ge 0} -\bigl\{ g^*(\lambda) + \tfrac12\|y\|^2 + \langle \bar x, y \rangle + (\lambda \cdot \psi)^*(-y) \bigr\} \\
&= \max_{\lambda \ge 0} \bigl\{ -g^*(\lambda) + \max_{y \in E} \bigl[ -\tfrac12\|y\|^2 - \langle \bar x, y \rangle - (\lambda \cdot \psi)^*(-y) \bigr] \bigr\} \\
&= \max_{\lambda \ge 0} \bigl\{ -g^*(\lambda) - \bar\phi^\psi_{\bar x}(\lambda) \bigr\}.
\end{aligned}$$
Here, the third identity uses [10, Corollary 3] with $f := \tfrac12\|(\cdot) - \bar x\|^2$, $F := \psi$, and $K = \mathbb{R}_+$, realizing that (29) is equivalent to the qualification condition [10, Equation (17)] because $\mathrm{dom}\, g - \mathbb{R}_+ = \mathrm{dom}\, g$, and observing that attainment is guaranteed by finiteness of the left-hand side. The last identity uses Fenchel duality [38, Theorem 31.1] and the definition of $\bar\phi^\psi_{\bar x}$ in (27).

Part (b). Note that by [10, Corollary 4],

$$\partial (g \circ \psi)(x) = \bigcup_{\lambda \in \partial g(\psi(x))} \partial(\lambda \cdot \psi)(x) \quad \forall x \in \mathrm{dom}\,(g \circ \psi), \tag{30}$$

and observe that $\partial g(x) \subset \mathbb{R}_+$ because $g$ is increasing. Next, observe that

$$\begin{aligned}
\bar\lambda \in \mathrm{argmin}_{\lambda \ge 0} \{ g^*(\lambda) + \bar\phi^\psi_{\bar x}(\lambda) \},\ \bar u = P_1(\bar\lambda \cdot \psi)(\bar x)
&\overset{(i)}{\iff} 0 \in \partial g^*(\bar\lambda) + \partial\bar\phi^\psi_{\bar x}(\bar\lambda),\ \bar u = P_1(\bar\lambda \cdot \psi)(\bar x) \\
&\overset{(ii)}{\iff} \psi(\bar u) \in \partial g^*(\bar\lambda),\ \bar u = P_1(\bar\lambda \cdot \psi)(\bar x) \\
&\overset{(iii)}{\iff} \bar\lambda \in \partial g(\psi(\bar u)),\ \bar u = P_1(\bar\lambda \cdot \psi)(\bar x) \\
&\overset{(iv)}{\iff} \bar\lambda \in \partial g(\psi(\bar u)),\ 0 \in \bar u - \bar x + \partial(\bar\lambda \cdot \psi)(\bar u) \\
&\overset{(v)}{\implies} \bar u = P_1(g \circ \psi)(\bar x).
\end{aligned}$$

Equivalence (i) is valid because $\mathrm{int}(\mathrm{dom}\, g^*) \subset \mathbb{R}_{++} \subset \mathrm{int}(\mathrm{dom}\,\bar\phi^\psi_{\bar x})$; see [10, Lemma 4] and Corollary 12, respectively. Corollary 12(b) justifies equivalence (ii). Equivalence (iii) is the inversion formula for the subdifferential [38, Corollary 23.5.1]. Equivalence (iv) uses the optimality conditions that uniquely determine $\bar u = P_1(\bar\lambda \cdot \psi)(\bar x)$. Implication (v) follows from (30) and the optimality conditions that uniquely determine $P_1(g \circ \psi)(\bar x)$. Taken together, we deduce that for any $\bar\lambda \in \mathrm{argmin}_{\lambda \ge 0} \{ g^*(\lambda) + \bar\phi^\psi_{\bar x}(\lambda) \}$, we have $P_1(g \circ \psi)(\bar x) = P_1(\bar\lambda \cdot \psi)(\bar x)$. The fact that $\mathrm{argmin}_{\lambda \ge 0} \{ g^*(\lambda) + \bar\phi^\psi_{\bar x}(\lambda) \} \neq \emptyset$ follows from Part (a).

Part (c). Recall from Part (b) that $0 \in \mathrm{argmin}_{\lambda \ge 0} \{ g^* + \bar\phi^\psi_{\bar x} \}$ entails $0 \in \partial g^*(0) + \partial\bar\phi^\psi_{\bar x}(0)$. In view of Corollary 12(c), we must have $P_{\mathrm{cl}(\mathrm{dom}\,\psi)}(\bar x) \in \mathrm{dom}\,\psi$, in which case $\partial\bar\phi^\psi_{\bar x}(0) = \{-\psi(P_{\mathrm{cl}(\mathrm{dom}\,\psi)}(\bar x))\}$, by Corollary 12(b). This proves the claim. □
7. Epigraphical and level-set projections
We are now equipped to answer the initial question about computing epigraphical and level-set projections via proximal mappings. Our approach is based on the Moreau envelopes of the indicator functions of the epigraph and level set of a function $f$, which we express as the post-compositions

$$\delta_{\mathrm{lev}_\alpha f} = \delta_{\mathbb{R}_-} \circ (f(\cdot) - \alpha) \quad\text{and}\quad \delta_{\mathrm{epi}\, f} = \delta_{\mathbb{R}_-} \circ (f(\cdot) - (\cdot)).$$

Proposition 8 provides the required tools.
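Corollary 13 (Level-set projection). Let $f \in \Gamma_0(E)$, $(\bar x, \bar\alpha) \in E \times \mathbb{R}$, and assume there exists $\hat x \in E$ such that $f(\hat x) < \bar\alpha$. Then the following statements hold.
(a) (Dual representation of distance to level set) $\tfrac12 d^2_{\mathrm{lev}_{\bar\alpha} f}(\bar x) = -\min_{\lambda \ge 0} \bigl\{ \bar\phi^f_{\bar x}(\lambda) + \bar\alpha\lambda \bigr\}$.
(b) (Projection onto level set)

$$P_{\mathrm{lev}_{\bar\alpha} f}(\bar x) = \begin{cases} P_{\mathrm{cl}(\mathrm{dom} f)}(\bar x) & \text{if } f(P_{\mathrm{cl}(\mathrm{dom} f)}(\bar x)) \le \bar\alpha, \\ P_{\bar\lambda} f(\bar x) & \text{otherwise,} \end{cases}$$

for any positive $\bar\lambda$ in the optimal solution set $\mathrm{argmin}_{\lambda \ge 0} \{ \bar\phi^f_{\bar x}(\lambda) + \bar\alpha\lambda \} = \{ \lambda \ge 0 \mid f(P_\lambda f(\bar x)) = \bar\alpha \} \neq \emptyset$.

Proof. Set $g := \delta_{\mathbb{R}_-}$ and $\psi : x \in E \mapsto f(x) - \bar\alpha$. Then $g \in \Gamma_0(\mathbb{R})$ is increasing and $\psi \in \Gamma_0(E)$ with $\mathrm{dom}\,\psi = \mathrm{dom} f$ and $\delta_{\mathrm{lev}_{\bar\alpha} f} = g \circ \psi$. Now observe that (29) applied to this setting is equivalent to saying that there exists $\bar y \in \mathrm{ri}(\mathrm{dom} f)$ such that $f(\bar y) < \bar\alpha$. We (only) assume that there exists $\hat x \in \mathrm{dom} f$ such that $f(\hat x) < \bar\alpha$. However, take any $z \in \mathrm{ri}(\mathrm{dom} f)$. Then, by the line segment principle [38, Theorem 6.1], we have $y_\lambda := \lambda z + (1 - \lambda)\hat x \in \mathrm{ri}(\mathrm{dom} f)$ for all $\lambda \in (0, 1]$, and, by convexity, $f(y_\lambda) \le \lambda f(z) + (1 - \lambda) f(\hat x) < \lambda f(z) + (1 - \lambda)\bar\alpha$ for $\lambda < 1$, where the middle term tends to $f(\hat x) < \bar\alpha$ as $\lambda \downarrow 0$. Hence there exists $\hat\lambda \in (0, 1]$ sufficiently small such that $f(y_{\hat\lambda}) < \bar\alpha$. Hence $\hat y := y_{\hat\lambda} \in \mathrm{ri}(\mathrm{dom} f)$ with $f(\hat y) < \bar\alpha$, and (29) holds.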
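Part (a). For all $\lambda \ge 0$,

$$\bar\phi^\psi_{\bar x}(\lambda)
= \begin{cases} -\lambda e_\lambda \psi(\bar x) & \text{if } \lambda > 0, \\ -\tfrac12 d^2_{\mathrm{cl}(\mathrm{dom}\,\psi)}(\bar x) & \text{if } \lambda = 0, \end{cases}
= \begin{cases} -\lambda (e_\lambda f(\bar x) - \bar\alpha) & \text{if } \lambda > 0, \\ -\tfrac12 d^2_{\mathrm{cl}(\mathrm{dom} f)}(\bar x) & \text{if } \lambda = 0, \end{cases}
= \bar\phi^f_{\bar x}(\lambda) + \bar\alpha\lambda.$$

Use Proposition 8(a) and the fact that $g^* = \delta_{\mathbb{R}_+}$ to deduce that

$$\tfrac12 d^2_{\mathrm{lev}_{\bar\alpha} f}(\bar x) = e_1 \delta_{\mathrm{lev}_{\bar\alpha} f}(\bar x) = e_1(g \circ \psi)(\bar x) = -\min_{\lambda \ge 0} \bigl\{ \bar\phi^f_{\bar x}(\lambda) + \bar\alpha\lambda \bigr\}.$$

Part (b). The equality of the two sets in question is clear from the (necessary and sufficient) optimality conditions and Corollary 12. The rest follows from Proposition 8, Parts (b) and (c), because $P_{\mathrm{lev}_{\bar\alpha} f}(\bar x) = P_1(g \circ \psi)(\bar x)$. □

Corollary 14 (Epigraphical projection). Let $f \in \Gamma_0(E)$ and $(\bar x, \bar\alpha) \in E \times \mathbb{R}$. Then the following statements hold.
(a) (Dual representation of distance to epigraph) $\tfrac12 d^2_{\mathrm{epi} f}(\bar x, \bar\alpha) = -\min_{\lambda \ge 0} \bigl\{ \bar\phi^f_{\bar x}(\lambda) + \bar\alpha\lambda + \tfrac12\lambda^2 \bigr\}$.
(b) (Projection onto epigraph)

$$P_{\mathrm{epi} f}(\bar x, \bar\alpha) = \begin{cases} [P_{\mathrm{cl}(\mathrm{dom} f)}(\bar x),\ \bar\alpha] & \text{if } f(P_{\mathrm{cl}(\mathrm{dom} f)}(\bar x)) \le \bar\alpha, \\ [P_{\bar\lambda} f(\bar x),\ \bar\alpha + \bar\lambda] & \text{otherwise,} \end{cases}$$

where $\bar\lambda > 0$ is the unique solution of the strongly convex optimization problem $\min_{\lambda \ge 0}\ \tfrac12\lambda^2 + \bar\alpha\lambda + \bar\phi^f_{\bar x}(\lambda)$. Equivalently, $\bar\lambda$ is the unique root of the strictly decreasing function $0 < \lambda \mapsto f(P_\lambda f(\bar x)) - \lambda - \bar\alpha$.

Proof. Analogous to the proof of Corollary 13, we define closed proper convex functions $g := \delta_{\mathbb{R}_-}$ and $\psi : (x, \alpha) \in E \times \mathbb{R} \mapsto f(x) - \alpha$ so that $\delta_{\mathrm{epi} f} = g \circ \psi$. Therefore, $\psi(\mathrm{ri}(\mathrm{dom}\,\psi)) = \psi(\mathrm{ri}(\mathrm{dom} f) \times \mathbb{R}) = f(\mathrm{ri}(\mathrm{dom} f)) - \mathbb{R} = \mathbb{R}$, and thus the qualification condition (29) is trivially satisfied in this setting.

Part (a). Note that $e_\lambda \psi(x, \alpha) = e_\lambda f(x) + e_\lambda(-\mathrm{id})(\alpha)$ for all $\lambda > 0$. Since $\mathrm{dom}\,\psi = \mathrm{dom} f \times \mathbb{R}$,

$$\bar\phi^\psi_{(\bar x, \bar\alpha)}(\lambda) = \bar\phi^f_{\bar x}(\lambda) + \bar\alpha \cdot \lambda + \tfrac12\lambda^2 \quad (\lambda \ge 0).$$

Apply Proposition 8(a) to obtain the desired result.

Part (b). Apply Proposition 8(b), observing that $P_{\mathrm{epi} f}(\bar x, \bar\alpha) = P_1 \delta_{\mathrm{epi} f}(\bar x, \bar\alpha)$ and $P_1(\lambda \cdot \psi)(\bar x, \bar\alpha) = [P_1(\lambda f)(\bar x),\ \bar\alpha + \lambda]$ for all $\lambda \ge 0$. □

Remark 3 (Prior work).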
The level-set projection result Corollary 13 encompasses the result described by Beck [4, Theorem 6.30]. For epigraphical projection, Corollary 14 generalizes Beck [4, Theorem 6.36] to include functions that are not finite-valued. For functions $f \in \Gamma_0(E)$ with open domain, Chierchia et al. [11, Proposition 1] describe an alternative formula for epigraphical projections via proximal maps.

Algorithm 1 SC$^1$ Newton method for minimizing $\theta_\xi$
(S.0) Choose $\lambda_0$, $\delta > 0$, $\{\varepsilon_k\} \downarrow 0$, and let $\beta, \sigma \in (0, 1)$. Set $k := 0$.
(S.1) If $|\theta'_\xi(\lambda_k)| \le \delta$: STOP.
(S.2) Choose $g_k \in \partial_B(\theta'_\xi)(\lambda_k)$ and set $\Delta_k := P_{[-\lambda_k, \infty)}\Bigl( -\dfrac{\theta'_\xi(\lambda_k)}{g_k + \varepsilon_k} \Bigr)$.
(S.3) Set $t_k := \max_{l \in \mathbb{N}} \bigl\{ \beta^l \bigm| \theta_\xi(\lambda_k + \beta^l \Delta_k) \le \theta_\xi(\lambda_k) + \beta^l \sigma\, \theta'_\xi(\lambda_k) \Delta_k \bigr\}$.
(S.4) Set $\lambda_{k+1} := \lambda_k + t_k \Delta_k$, $k \leftarrow k + 1$, and go to (S.1).

7.1. SC$^1$ optimization framework

In this section we present a unified algorithmic framework for computing projections onto the level sets and the epigraph of a closed proper convex function. Corollaries 13 and 14, respectively, guide us in how to compute these projections. For a given $f \in \Gamma_0(E)$ and $(\bar x, \bar\alpha) \in E \times \mathbb{R}$ such that $f(\bar x) > \bar\alpha$, the epigraphical and level-set projections, respectively, correspond to the proximal map of $f$ with parameter $\lambda$ that solves the scalar problem

$$\min_{\lambda \ge 0}\ \theta_\xi(\lambda) \quad (\xi \in \{\mathrm{epi}, \mathrm{lev}\}), \tag{31}$$

for $\theta_\xi : \mathbb{R} \to \overline{\mathbb{R}}$ given by

$$\theta_\xi(\lambda) = \begin{cases} \bar\phi^f_{\bar x}(\lambda) + \bar\alpha\lambda & \text{if } \xi = \mathrm{lev}, \\ \bar\phi^f_{\bar x}(\lambda) + \bar\alpha\lambda + \tfrac12\lambda^2 & \text{if } \xi = \mathrm{epi}. \end{cases} \tag{32}$$

Corollary 12 asserts that $\theta_\xi$ is convex, continuous (possibly in an extended real-valued sense), and continuously differentiable with monotonically increasing, locally Lipschitz derivative on $\mathbb{R}_{++}$. In particular, for any $\lambda > 0$,

$$\theta'_\xi(\lambda) = \begin{cases} -\eta^f_{\bar x}(\lambda) + \bar\alpha & \text{if } \xi = \mathrm{lev}, \\ -\eta^f_{\bar x}(\lambda) + \bar\alpha + \lambda & \text{if } \xi = \mathrm{epi}. \end{cases} \tag{33}$$

The minimization of $\theta_\xi$ could be accomplished using bisection if an upper bound on the optimal $\lambda$ is available. However, the semismoothness of the derivative (33), described by Proposition 7, allows us to tap into the powerful SC$^1$ optimization framework [20, 36] that operates on functions $\theta : \mathbb{R} \to \overline{\mathbb{R}}$ that are semismoothly differentiable (i.e., SC$^1$), which means that at points $\bar\lambda \in \mathrm{int}(\mathrm{dom}\,\theta)$, the gradient $\theta'$ exists, and it is locally Lipschitz around $\bar\lambda$ and semismooth at $\bar\lambda$.
The semismooth method, outlined by Algorithm 1, applies to the problem (31) whenever conditions (A1) and (A2) of Pang and Qi [36] hold, which is the case when $\bar x \in \mathrm{dom}\,\partial f$; see Corollary 12.

Algorithm 1 uses the notion of a Bouligand subdifferential, which for a function $\phi : \mathbb{R}^n \to \overline{\mathbb{R}}$ that is locally Lipschitz at a point $\bar x \in \mathrm{int}(\mathrm{dom}\,\phi)$, is defined at $\bar x$ as

$$\partial_B \phi(\bar x) = \{ v \mid \exists\, \{x_k\} \subset D_\phi,\ x_k \to \bar x : \nabla\phi(x_k) \to v \},$$

where $D_\phi$ is the set of points of differentiability of $\phi$. The Clarke subdifferential [12] of $\phi$ at $\bar x$ is $\partial_C \phi(\bar x) := \mathrm{conv}\,\partial_B \phi(\bar x)$, which coincides (on the interior of $\mathrm{dom}\,\phi$) with the convex subdifferential if $\phi$ is convex.

Remark 4.
Because $\theta_\xi$ is convex and differentiable with locally Lipschitz derivative on $\mathbb{R}_{++}$, all elements in the Clarke subdifferential $\partial_C(\theta'_\xi)(\lambda)$ are nonnegative for all $\lambda > 0$. In the epigraphical case ($\xi = \mathrm{epi}$), the quadratic term in the expression for $\theta_{\mathrm{epi}}$ in (32) implies that the elements are bounded below by 1. Thus, the sequence of regularization parameters $\{\varepsilon_k\} \downarrow 0$ […] $\theta'$ is piecewise affine, the regularization could be eliminated by setting the constant regularization $\varepsilon_k := 0$ for all $k$, which would improve numerical convergence regardless of the optimality parameter $\delta > 0$.
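The steps (S.0) to (S.4) of Algorithm 1 can be sketched in Python for the level-set case with $f = \|\cdot\|_1$ and $\bar\alpha$ equal to the ball radius. This is a minimal illustration and not the authors' C implementation. The closed forms for $\theta_{\mathrm{lev}}$ and $\theta'_{\mathrm{lev}}$ follow from the soft-thresholding formula for the proximal map of the 1-norm. The function names, the default starting point, the schedule $\varepsilon_k = \varepsilon_0 2^{-k}$, the cap on backtracking steps, and the rounding slack in the Armijo test are all assumptions of the sketch.

```python
import numpy as np


def theta_lev(lam, a, radius):
    """theta_lev(lam) = -e_1(lam * ||.||_1)(x) + radius * lam, with a = |x|."""
    # The envelope of lam * |.| separates into Huber-type terms.
    env = np.where(a <= lam, 0.5 * a**2, lam * a - 0.5 * lam**2).sum()
    return radius * lam - env


def dtheta_lev(lam, a, radius):
    """theta_lev'(lam) = radius - ||soft_threshold(x, lam)||_1."""
    return radius - np.maximum(a - lam, 0.0).sum()


def sc_newton_l1_ball(x, radius=1.0, lam0=None, delta=1e-10,
                      beta=0.5, sigma=1e-4, eps0=1e-6, max_iter=100):
    """Damped Newton iteration (S.0)-(S.4) for projecting x onto the 1-norm ball."""
    x = np.asarray(x, dtype=float)
    a = np.abs(x)
    if a.sum() <= radius:                 # x already lies in the level set
        return x.copy(), 0.0
    # (S.0) hypothetical default start; any lam0 > 0 is admissible.
    lam = 0.5 * a.max() if lam0 is None else float(lam0)
    for k in range(max_iter):
        d = dtheta_lev(lam, a, radius)
        if abs(d) <= delta:               # (S.1) stopping test
            break
        # (S.2) one element of the Bouligand subdifferential of theta'.
        g = float(np.count_nonzero(a > lam))
        eps_k = eps0 * 0.5**k             # assumed regularization schedule
        step = max(-lam, -d / (g + eps_k))   # projection onto [-lam, inf)
        # (S.3) Armijo backtracking; the slack only absorbs rounding error.
        th = theta_lev(lam, a, radius)
        slack = 10.0 * np.finfo(float).eps * max(1.0, abs(th))
        t = 1.0
        for _ in range(60):
            if theta_lev(lam + t * step, a, radius) <= th + sigma * t * d * step + slack:
                break
            t *= beta
        lam += t * step                   # (S.4)
    # The projection is the proximal map (soft-thresholding) at the final lam.
    return np.sign(x) * np.maximum(a - lam, 0.0), lam
```

For instance, `sc_newton_l1_ball([0.8, -0.6, 0.1])` returns a point whose 1-norm equals the radius up to the tolerance `delta`, together with the corresponding threshold $\lambda$.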
Algorithm 2 Full-step SC$^1$ Newton method
(S.0) Choose $\lambda_0 > 0$, $\delta > 0$, and $\{\varepsilon_k\} \downarrow 0$. Set $k := 0$.
(S.1) If $|\theta'_\xi(\lambda_k)| \le \delta$: STOP.
(S.2) Choose $g_k \in \partial_C(\theta'_\xi)(\lambda_k)$ and set $\Delta_k := \max\Bigl\{ -\lambda_k,\ -\dfrac{\theta'_\xi(\lambda_k)}{g_k + \varepsilon_k} \Bigr\}$.
(S.3) Set $\lambda_{k+1} := \lambda_k + \Delta_k$, $k \leftarrow k + 1$, and go to (S.1).

$\theta'_\xi$ is concave on $(0, \lambda_l)$

Corollaries 13 and 14 imply that there exist positive parameters $\lambda_l \le \lambda_u$ such that

$$[\lambda_l, \lambda_u] = \mathrm{argmin}_{\lambda \ge 0}\ \theta_\xi = \bigl\{ \lambda > 0 \bigm| \theta'_\xi(\lambda) = 0 \bigr\}, \tag{34}$$

for both the epigraphical and level-set cases. In the epigraphical case in particular, the solution is unique, and thus $\lambda_u = \lambda_l$; see Corollary 14(b). If the derivative $\theta'_\xi$ is concave on the interval $(0, \lambda_l)$, it is possible to take a full Newton step at every iteration while respecting positivity of the iterates, thus saving the computational cost of a backtracking line search. The simplified iteration is described by Algorithm 2.

For many important functions, e.g., the 1-norm or negative log (and their spectral counterparts), the respective map $\theta'_\xi$ is concave on $\mathbb{R}_{++}$, but, as suggested above, we only need the following:

Assumption 1 (Concavity on $(0, \lambda_l)$). The function $\theta'_\xi$ is concave on $(0, \lambda_l)$.

Proposition 9 (Convergence of Algorithm 2). Under Assumption 1, the full-step Newton method from Algorithm 2 converges to a minimizer of $\theta_\xi$.

Proof. Set $\theta = \theta_\xi$. If $0 < \lambda_k < \lambda_l$ for some $k \in \mathbb{N}$, then by Corollary 11(a), $\theta'(\lambda_k) < 0$ […] $-\theta'$. Therefore,

$$\lambda_{k+1} = \lambda_k - \frac{\theta'(\lambda_k)}{g_k + \varepsilon_k} > \lambda_k.$$

Since $-(g_k + \varepsilon_k)$ is a convex subgradient of $-(\theta' + \varepsilon_k(\cdot))$, the concavity of $\theta'_\xi$ implies that

$$-\theta'(\lambda_{k+1}) - \varepsilon_k(\lambda_{k+1} - \lambda_k) \ge -\theta'(\lambda_k) - (\lambda_{k+1} - \lambda_k)(g_k + \varepsilon_k) = 0,$$

and hence $\theta'(\lambda_{k+1}) < 0$, thus $0 < \lambda_k < \lambda_{k+1} < \lambda_l$. Consequently, by an inductive argument, $\{\lambda_k\}$ converges to some $\tilde\lambda$. Therefore, the sequence $\{ g_k \in \partial_C(\theta'_\xi)(\lambda_k) \}$ is bounded, and hence

$$0 = (\lambda_{k+1} - \lambda_k)(g_k + \varepsilon_k) + \theta'(\lambda_k) \to \theta'(\tilde\lambda),$$

which shows that $\tilde\lambda$ has the desired properties. We hence still need to cover the case where $\lambda_l < \lambda_k$ for all $k \in \mathbb{N}$. In view of (34), we can assume that $\lambda_u < \lambda_k$ for all $k \in \mathbb{N}$. (Otherwise, a solution has already been obtained.) Since $\theta'(\lambda_k) > 0$, we have

$$0 < \lambda_u < \lambda_{k+1} = \lambda_k + \max\Bigl\{ -\lambda_k,\ -\frac{\theta'(\lambda_k)}{g_k + \varepsilon_k} \Bigr\} \le \lambda_k,$$

hence the sequence $\{\lambda_k\}$ converges to some $\hat\lambda$. In particular, $\lambda_{k+1} = \lambda_k$ only finitely many times. Hence, without loss of generality, $0 = (\lambda_{k+1} - \lambda_k)(g_k + \varepsilon_k) + \theta'(\lambda_k) \to \theta'(\hat\lambda)$, which gives $\theta'(\hat\lambda) = 0$ also here. □

Figure 2.
The function $\theta'_{\mathrm{epi}}$ corresponding to the projection of a point $\bar x \in \mathbb{R}^4$ onto the 1-norm unit ball.

Figure 3. The function $\theta'_{\mathrm{epi}}$ for Example 3, for which Algorithm 2 may cycle.

The next example illustrates that cycling may occur in Algorithm 2 if Assumption 1 fails.
Example 3 (Cycling).
Consider the scalar function f ( x ) = 2 | x | + δ [ − , ( x ), and the task ofprojecting the (¯ x, ¯ α ) = (4 , −
1) onto epi f . Figure 3 illustrates the function θ (cid:48) epi whose root we seek.Then for λ outside of the interval [1 . ,
2] the iterates $\lambda_k$ ($k \in \mathbb{N}$) generated by Algorithm 2 oscillate between 1 .

We present numerical experiments that hint at the computational effectiveness of the SC$^1$ optimization framework described in Section 7.1. The two experiments in this section were run on an Apple MacBook Air with a 1.8GHz Intel Core i5 and 8GB RAM running OS 10.14.6. The code was written in C and is available at https://github.com/arielgoodwin/epi-proj.

An important instance of the level-set case ($\xi = \mathrm{lev}$) is the projection onto the unit 1-norm ball $\mathrm{lev}_1 \|\cdot\|_1 = \{ x \in \mathbb{R}^n \mid \|x\|_1 \le 1 \}$. The derivative of the corresponding function $\theta_{\mathrm{lev}}$ reads

$$\theta'_{\mathrm{lev}}(\lambda) = \begin{cases} 1 - \sum_{i=1}^n \max\{ |x_i| - \lambda,\ 0 \} & \text{if } \lambda \ge 0, \\ 1 - \|x\|_1 & \text{if } \lambda < 0, \end{cases}$$

which is concave on $\mathbb{R}_+$ (as required) and piecewise affine, as shown by Fig. 2.

We implemented Algorithm 2 and compared it numerically to two state-of-the-art algorithms specifically tailored to 1-norm-ball projection, namely Condat's sorting-based method [16] as implemented in the code condat_l1ballproject.c, and Liu and Ye's improved bisection algorithm (IBIS) [30] implemented in the eplb module in SLEP [41].

The entries of the projected vectors $\bar x \in \mathbb{R}^n$ are drawn from a Gaussian distribution with zero mean and standard deviations σ = { . , . , . , . }. The optimality tolerance was fixed at δ = 10^− , as in step (S.1) of Algorithm 2. Table 1 reports the average time required to compute the projection over 10 trials for vectors of dimension n ∈ { , }, and over 500 trials for n = 10 . The initial point $\lambda_0 > 0$ was chosen by sampling $\sqrt{n} \log n$ coordinates randomly from the vector $\bar x$ and setting $\lambda_0$ to be the largest of their absolute values. Observe that Algorithm 2 exhibits comparable performance relative to the specialized algorithms.

We now consider the epigraphical projection for a function that is not polyhedral. Define the function $f : x \in \mathbb{R}^n \mapsto -\sum_{i=1}^n \log x_i$, where we take the negative logarithm to be $\infty$ outside the positive orthant.
Figure 4 illustrates the function $\theta'_{\mathrm{epi}}$ for the cases when $P_{\mathrm{cl}(\mathrm{dom} f)}(\bar x)$ is in, and not in, the domain of $f$. These functions are concave over $(0, \infty)$. Hence $-\theta'_\xi$ is convex over this interval and Algorithm 2 applies.

Table 1. Average time (seconds) for projecting vectors onto the 1-norm unit ball in dimension $n$, with coordinates chosen using Gaussian distributions with standard deviation $\sigma$. (Columns: $n$, Algorithm 2, Condat, IBIS, for each value of $\sigma$.)

Figure 4. The graph of the function $\theta'_{\mathrm{epi}}(\lambda)$ that corresponds to the base points $(\bar x, \bar\alpha)$ shown for each figure. The left panel, with $(\bar x, \bar\alpha) = (+1, -1)$, depicts the case where $P_{\mathrm{cl}(\mathrm{dom} f)}(\bar x) \in \mathrm{dom} f$; the right panel depicts the case where $P_{\mathrm{cl}(\mathrm{dom} f)}(\bar x) \notin \mathrm{dom} f$.

Table 2. Time (seconds) for projecting vectors onto the epigraph of $f(x) = -\sum_{i=1}^n \log x_i$ in various dimensions $n$. (Rows: SSN, Bisection 1, Bisection 2.)

We numerically compare Algorithm 2 and the bisection method as solution approaches for (31). The coordinates of $\bar x$ were chosen uniformly at random on the interval [− , ] and $\alpha$ was chosen uniformly at random on the interval [− , − . ]. The initial value of $\lambda$ was chosen to be $\sqrt{N}$. The termination condition for Algorithm 2 was |θ′_ξ(λ)| < 10^− , and the termination conditions for bisection were |θ′_ξ(λ)| < 10^− (labeled Bisection 1) and |b − a| < 10^− (labeled Bisection 2), where $a, b$ denote the endpoints of the bisection interval. Table 2 shows the average times over 10 trials when n ∈ { , }, and over 500 trials when n = 10 .

The numerical examples we presented extend easily to other useful cases involving matrices, such as the nuclear norm on $\mathbb{R}^{m \times n}$ and the barrier function $-\log\det$ on the space of symmetric matrices, using variational formulas that depend on matrix spectra [28, 29]. In these cases, the main computational effort involves computing singular value and eigenvalue decompositions, respectively, of the matrix iterates.

The cases where $\theta'_\xi$ does not satisfy either Assumption 1 or the domain condition $P_{\mathrm{cl}(\mathrm{dom} f)}(\bar x) \in \mathrm{dom} f$ lie outside the theoretical guarantees presented in this section, though the algorithms we present may still work in practice. In the case where $\mathrm{dom} f \subsetneq E$ is open, the formula provided by Chierchia et al. [11, Proposition 1] is a viable option.
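The full-step iteration of Algorithm 2 can be sketched in Python for the epigraph of $f(x) = -\sum_i \log x_i$, the function used in the second experiment. This is an illustrative sketch under stated assumptions and not the authors' C code. It uses the closed-form proximal map $P_\lambda f(x)_i = \tfrac12\bigl(x_i + \sqrt{x_i^2 + 4\lambda}\bigr)$ and the derivative $\theta'_{\mathrm{epi}}(\lambda) = \bar\alpha + \lambda - f(P_\lambda f(\bar x))$ from (33). The function names, the default $\lambda_0 = \sqrt{n}$, and the schedule $\varepsilon_k = \varepsilon_0 2^{-k}$ are assumptions.

```python
import numpy as np


def neg_log(u):
    """f(u) = -sum(log u_i), evaluated for u in the positive orthant."""
    return -np.log(u).sum()


def prox_neg_log(x, lam):
    """Proximal map of lam * f: the positive root of u^2 - x*u - lam = 0."""
    return 0.5 * (x + np.sqrt(x * x + 4.0 * lam))


def project_epi_neg_log(x, alpha, lam0=None, delta=1e-10, eps0=1e-8, max_iter=50):
    """Full-step Newton iteration for projecting (x, alpha) onto epi f."""
    x = np.asarray(x, dtype=float)
    # First case of Corollary 14(b): the point already lies in the epigraph.
    if np.all(x > 0) and neg_log(x) <= alpha:
        return x.copy(), float(alpha), 0.0
    lam = np.sqrt(x.size) if lam0 is None else float(lam0)   # (S.0), assumed default
    for k in range(max_iter):
        u = prox_neg_log(x, lam)
        d = alpha + lam - neg_log(u)          # theta_epi'(lam)
        if abs(d) <= delta:                   # (S.1)
            break
        root = np.sqrt(x * x + 4.0 * lam)
        g = 1.0 + np.sum(1.0 / (u * root))    # derivative of theta_epi' at lam
        # (S.2)-(S.3): full step, clipped so that lam stays nonnegative.
        # Caveat: if the clip is active and some x_i <= 0, then f is infinite
        # at lam = 0 and this simple sketch breaks down.
        lam += max(-lam, -d / (g + eps0 * 0.5**k))
    u = prox_neg_log(x, lam)
    return u, alpha + lam, lam
```

The returned triple is the projected point `u`, the projected height `alpha + lam`, and the multiplier `lam`; on convergence the height agrees with `neg_log(u)` up to `delta`, so the projection lies on the graph of $f$.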
8. Final remarks
Our analysis of the variational properties of epigraphical projections and infimal convolution is motivated by the authors' larger research interests in variations of first-order methods that operate in a lifted space. The work by Chierchia et al. [11] on epigraphical-projection methods for minimizing convex functions over $p$-norm constraints shows promise for this algorithmic approach, and we aim to develop methods for more general problem classes. We are also motivated by statistical M-estimation approaches that include as an additional unknown a particular parameter that characterizes the data distribution [15]. The variational calculus that we derive is a useful tool for developing algorithmic approaches for solving these lifted M-estimation problems.

There are at least two avenues of future research that extend our analysis in this paper.

$K$-epigraphical projections. A significant generalization of the post-composition operation defined in Section 6 occurs when we allow compositions of the form $f = g \circ H : E \to \overline{\mathbb{R}}$, where
• $K \subset E$ is a closed convex cone;
• $H : E \to E$ is $K$-convex, i.e., the $K$-epigraph $\{ (x, y) \mid y - H(x) \in K \}$ is convex;
• $g \in \Gamma_0(E)$ is $K$-increasing, i.e., $g \le g((\cdot) + v)$ for all $v \in K$.
This convex convex-composite setting was studied by Burke et al. [10], and the required subdifferential formulas for the analysis are readily available. This may lead to a proximal calculus and ultimately to formulas and algorithms for projecting onto $K$-epigraphs, thus encompassing the study in Section 6.

Semismoothness* of subdifferential operators. The notion of semismooth* sets and maps is recent and still in development. One of the critical conditions in our study is the semismoothness* of the subdifferential operator $\partial f$, which also occurs in a recent report by Khanh et al. [27]. This suggests an important avenue of research that relaxes the overarching convexity assumption and, in particular, establishes verifiable sufficient conditions.

Acknowledgments

M.P. Friedlander and T. Hoheisel are supported by NSERC Discovery grants, while A. Goodwin's work was partially supported by an NSERC summer research stipend. T. Hoheisel would like to thank Dr. Matus Benko, University of Vienna, for valuable discussions on semismoothness*.
References
[1] A.Y. Aravkin, J.V. Burke, D. Drusvyatskiy, M.P. Friedlander, and K.J. MacPhee: Foundations of Gauge and Perspective Duality. SIAM Journal on Optimization 28(3), 2018, pp. 2406–2434.
[2] H. Attouch: Variational Convergence for Functions and Operators. Applied Mathematics Series, Pitman, Boston, 1984.
[3] H.H. Bauschke and P.L. Combettes: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. CMS Books in Mathematics, Springer, New York, 2nd Edition, 2017.
[4] A. Beck: First-Order Methods in Optimization. MOS-SIAM Series on Optimization, 2017.
[5] A. Beck and M. Teboulle: Smoothing and first order methods: A unified framework. SIAM Journal on Optimization 22(2), 2012, pp. 557–580.
[6] M. Benko, H. Gfrerer, and J.V. Outrata: Calculus for Directional Limiting Normal Cones and Subdifferentials. Set-Valued and Variational Analysis 27, 2019, pp. 713–745.
[7] M. Bougeard, J.P. Penot, and A. Pommellet: Towards minimal assumptions for the infimal convolution regularization. Journal of Approximation Theory 64(3), 1991, pp. 245–270.
[8] J.V. Burke and T. Hoheisel: Epi-convergent smoothing with applications to convex composite functions. SIAM Journal on Optimization 23(3), 2013, pp. 1457–1479.
[9] J.V. Burke and T. Hoheisel: Epi-convergence properties of smoothing by infimal convolution. Set-Valued and Variational Analysis 25, 2017, pp. 1–23.
[10] J.V. Burke, T. Hoheisel, and Q.V. Nguyen: A study of convex convex-composite functions via infimal convolution with applications. Mathematics of Operations Research, to appear.
[11] G. Chierchia, N. Pustelnik, J.-C. Pesquet, B. Pesquet-Popescu: Epigraphical projection and proximal tools for solving constrained convex optimization problems. Signal, Image and Video Processing 9, 2015, pp. 1737–1749.
[12] F.H. Clarke: Optimization and Nonsmooth Analysis. John Wiley & Sons, New York, 1983.
[13] P.L. Combettes: Perspective functions: properties, constructions, and examples. Set-Valued and Variational Analysis 26, 2019, pp. 247–264.
[14] P.L. Combettes and C.L. Müller: Perspective functions: proximal calculus and applications in high-dimensional statistics. Journal of Mathematical Analysis and Applications 457(2), 2018, pp. 1283–1306.
[15] P.L. Combettes and C.L. Müller: Perspective maximum likelihood-type estimation via proximal decomposition. Electronic Journal of Statistics 14, 2020, pp. 207–238.
[16] L. Condat: Fast projection onto the simplex and l1 ball. Mathematical Programming, Series A, Springer, 2016, 158(1), pp. 575–585.
[17] L. Condat: URL https://lcondat.github.io/software.html Last accessed January 27, 2021.
[18] A.L. Dontchev and R.T. Rockafellar: Implicit Functions and Solution Mappings. A View from Variational Analysis. Springer Series in Operations Research and Financial Engineering, Springer-Verlag New York, 2014.
[19] J. Duchi, S. Shalev-Shwartz, Y. Singer, and T. Chandra: Efficient projections onto the l1-ball for learning in high dimensions. ICML '08: Proceedings of the 25th international conference on Machine learning, ACM, New York, NY, USA, 2008, pp. 272–279.
[20] F. Facchinei and J.-S. Pang: Finite-Dimensional Variational Inequalities and Complementarity Problems, Volumes I and II, Springer, New York, 2003.
[21] H. Gfrerer: On directional metric subregularity and second-order optimality conditions for a class of nonsmooth mathematical programs. SIAM Journal on Optimization 23(1), 2013, pp. 63–665.
[22] H. Gfrerer: On directional metric regularity, subregularity and optimality conditions for nonsmooth mathematical programs. Set-Valued and Variational Analysis 21, 2013, pp. 151–176.
[23] H. Gfrerer and J.V. Outrata: On a semismooth* Newton method for solving generalized equations. SIAM Journal on Optimization 31(1), 2021, pp. 489–517.
[24] I. Ginchev and B.S. Mordukhovich: Directional subdifferentials and optimality conditions. Positivity 16, 2012, pp. 707–737.
[25] J.-B. Hiriart-Urruty and C. Lemaréchal: Fundamentals of Convex Analysis. Grundlehren Text Editions, Springer, Berlin, Heidelberg, 2001.
[26] T. Hoheisel: Topics in Convex Analysis in Matrix Space. Lecture Notes, Spring School on Variational Analysis, Paseky nad Jizerou, Czech Republic, 2019.
[27] P.D. Khanh, B.S. Mordukhovich, and V.T. Phat: A generalized Newton method for subgradient systems. arXiv:2009.10551, 2020.
[28] A.S. Lewis: The convex analysis of unitarily invariant matrix functions. Journal of Convex Analysis 2(1–2), 1995, pp. 173–183.
[29] A.S. Lewis: Convex analysis on the Hermitian Matrices. SIAM Journal on Optimization 6(1), 1996, pp. 164–177.
[30] J. Liu and J. Ye: Efficient Euclidean projections in linear time. Proceedings of the 26th Annual International Conference on Machine Learning, 2009, pp. 657–664.
[31] F. Meng, D. Sun, and G. Zhao: Semismoothness of solutions to generalized equations and the Moreau-Yosida regularization. Mathematical Programming 104, 2005, pp. 561–581.
[32] F. Meng, G. Zhao, M. Goh, and R. De Souza: Lagrangian-dual functions and Moreau-Yosida regularization. SIAM Journal on Optimization 19, 2008, pp. 39–61.
[33] A. Milzarek: Numerical Methods and Second Order Theory for Nonsmooth Problems. Dissertation, Technical University of Munich, 2016.
[34] B.S. Mordukhovich: Variational Analysis and Applications. Springer Monographs in Mathematics book series, Springer International Publishing AG, 2018.
[35] N. Parikh and S. Boyd: Proximal algorithms. Foundations and Trends in Optimization 1(3), 2013, pp. 123–231.
[36] J.S. Pang and L. Qi: A Globally convergent Newton method for convex SC$^1$ minimization problems. Journal of Optimization Theory and Applications 85(3), 1995, pp. 633–648.
[37] L. Qi and J. Sun: A nonsmooth version of Newton's method. Mathematical Programming 58, 1993, pp. 353–367.
[38] R.T. Rockafellar: Convex Analysis. Princeton Mathematical Series, No. 28. Princeton University Press, Princeton, N.J. 1970.
[39] R.T. Rockafellar and R.J.-B. Wets: Variational Analysis. Grundlehren der Mathematischen Wissenschaften, Vol. 317, Springer-Verlag, Berlin, 1998.
[40] A. Shapiro: Directionally nondifferentiable metric projection. Journal of Optimization Theory and Applications 81(1), 1994, pp. 203–204.
[41] J. Liu, S. Ji, and J. Ye: SLEP: Sparse Learning with Efficient Projections, Arizona State University, 2009.
[42] T. Strömberg: The Operation of Infimal Convolution. Dissertationes Mathematicae (Rozprawy Matematyczne) 352, 1996.
[43] M. Tofighi, K. Kose, and A.E. Cetin: Denoising using projections onto the epigraph set of convex cost functions. IEEE International Conference on Image Processing (ICIP), Paris, 2014, pp. 2709–2713.
[44]
M. Tofighi, A. Bozkurt, K. Kose, and A.E. Cetin:
Deconvolution using projections onto theepigraph set of a convex cost function.
P.-W. Wang, M. Wytock, and J.Z. Kolter: