From perspective maps to epigraphical projections

Abstract

The projection onto the epigraph or a level set of a closed proper convex function can be achieved by finding a root of a scalar equation that involves the proximal operator as a function of the proximal parameter. This paper develops the variational analysis of this scalar equation. The approach is based on a study of the variational-analytic properties of general convex optimization problems that are (partial) infimal projections of the sum of the function in question and the perspective map of a convex kernel. When the kernel is the Euclidean norm squared, the solution map corresponds to the proximal map, and thus the variational properties derived for the general case apply to the proximal case. Properties of the value function and the corresponding solution map, including local Lipschitz continuity, directional differentiability, and semismoothness, are derived. An SC^1 optimization framework for computing epigraphical and level-set projections is thus established. Numerical experiments on 1-norm projection illustrate the effectiveness of the approach as compared with specialized algorithms.


From perspective maps to epigraphical projections

Michael P. Friedlander

Department of Computer Science/Department of Mathematics, University of British Columbia, 2366 Main Mall, Vancouver, BC, V6T 1Z4, https://friedlander.io

Ariel Goodwin

Department of Mathematics and Statistics, McGill University, 805 Sherbrooke St West, Montréal, Québec, H3A 0B9, https://github.com/arielgoodwin

Tim Hoheisel

Dedicated to James V. Burke, our collaborator and friend, on the occasion of his 65th birthday

Key words: Proximal map, Moreau envelope, subdifferential, Fenchel conjugate, perspective map, epigraph, infimal projection, infimal convolution, set-valued map, coderivative, graphical derivative, semismoothness*, SC^1 optimization

MSC2000 subject classification: 52A41, 65K10, 90C25, 90C46

1. Introduction

The Moreau proximal map of a closed proper convex function $f$ that maps a finite-dimensional Euclidean space $\mathbb{E}_f$ to $\overline{\mathbb{R}} := \mathbb{R} \cup \{+\infty\}$ is given by the minimizing set

$$P_\lambda f(x) = \operatorname*{argmin}_{u \in \mathbb{E}_f} \left\{ f(u) + \frac{1}{2\lambda}\|x - u\|^2 \right\} \qquad (\lambda > 0).$$

The proximal map is a central operation of algorithms for nonsmooth optimization, including first-order methods such as proximal gradient and operator splitting [3, 35]. Geometrically, the proximal map corresponds to the Euclidean projection $P_{\operatorname{epi} f}$ onto the epigraph $\operatorname{epi} f$; see Fig. 1. Indeed, for all positive $\lambda$ and $x_\lambda := P_\lambda f(x)$,

$$\big(x_\lambda, f(x_\lambda)\big) = P_{\operatorname{epi} f}\big(x, f(x_\lambda) - \lambda\big). \tag{1}$$

Thus, the projection of an arbitrary point $(x, \alpha) \in (\mathbb{E}_f \times \mathbb{R}) \setminus \operatorname{epi} f$ corresponds to the proximal map of the base point $x$ using the parameter $\lambda$ that is the unique positive root of the function

$$0 < \lambda \mapsto f(x_\lambda) - \lambda - \alpha. \tag{2}$$

arXiv preprint, math.OC.
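The root-finding view in (1) and (2) can be illustrated with a small sketch. The code below is only an illustration under stated assumptions: it takes $f = \|\cdot\|_1$, whose proximal map is the familiar soft-thresholding operator, and it locates the root of (2) by plain bisection rather than by the method developed in this paper. The function names `soft_threshold` and `project_epi_l1` are hypothetical.

```python
import numpy as np

def soft_threshold(x, lam):
    """Proximal map of lam * ||.||_1 (componentwise soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def project_epi_l1(x, alpha, tol=1e-12, max_iter=200):
    """Project (x, alpha) onto epi ||.||_1 by solving equation (2) with bisection.

    Illustrative sketch only: bisection is a stand-in root finder.
    """
    x = np.asarray(x, dtype=float)
    if np.abs(x).sum() <= alpha:
        # The point already lies in the epigraph.
        return x.copy(), float(alpha)

    def g(lam):
        # The scalar function in (2): f(x_lam) - lam - alpha.
        return np.abs(soft_threshold(x, lam)).sum() - lam - alpha

    # g(0+) = f(x) - alpha > 0, and g(hi) < 0 because x_hi = 0 and hi > -alpha.
    lo = 0.0
    hi = max(np.abs(x).max(), abs(alpha)) + 1.0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
        if hi - lo <= tol:
            break
    lam = 0.5 * (lo + hi)
    # By (1), the projection is (x_lam, f(x_lam)) with f(x_lam) = alpha + lam at the root.
    return soft_threshold(x, lam), alpha + lam

point, height = project_epi_l1([3.0], -1.0)
print(point, height)
```

For the one-dimensional input above, the epigraph is the cone $\{(u, t) : |u| \le t\}$, so the result can be compared by hand with the projection of $(3, -1)$ onto that cone.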

Figure 1. The proximal map $x_\lambda := P_\lambda f(x)$ corresponds to the projection of the pair $(x, f(x_\lambda) - \lambda)$ onto the epigraph of $f$; see (1).

This connection between epigraphical projection and the proximal map, described by Beck [4], Bauschke and Combettes [3, Section 29.5], Chierchia et al. [11, Proposition 1], and Meng et al. [31, 32], is a defining feature of a class of epigraphical first-order methods for structured convex optimization over $\mathbb{E}_f$ that operate through a sequence of projections onto the epigraphs of the underlying functions. In effect, these methods operate on an equivalent optimization problem over $\mathbb{E}_f \times \mathbb{R}$ [11, 43, 44, 45].

This paper develops a general analysis that provides, among other things, the variational properties of the maps

$$(x, \lambda) \mapsto x_\lambda := P_\lambda f(x) \quad\text{and}\quad (x, \lambda) \mapsto f(x_\lambda),$$

defined on $\mathbb{E}_f \times \mathbb{R}$. This analysis and its supporting calculus allow us to determine the sensitivity of the epigraphical projection with respect to the simultaneous variation of the base point $x$ and the scaling parameter $\lambda$. Although the resulting mathematical statements are key for our deeper understanding of epigraphical first-order methods, the overall analysis applies much more generally.

The approach we take is based on the variational analysis of the optimal value function

$$p_{L,\omega,f} : (x, \lambda) \in \mathbb{E}_x \times \mathbb{R} \mapsto \inf_{u \in \mathbb{E}_f} \; f(u) + \omega^\pi(L(u, x), \lambda) \tag{3}$$

and its corresponding solution map. Here, $L$ is a linear map, and the perspective transform $\omega^\pi$ of a closed proper convex function $\omega$ is defined by $\operatorname{epi} \omega^\pi = \operatorname{cl}\big(\mathbb{R}_+(\operatorname{epi}\omega \times \{1\})\big)$. When the linear map $L$ is defined as $(u, x) \mapsto x - u$, the value function (3) is the infimal convolution of the functions $f$ and $\omega^\pi(\cdot, \lambda)$. For this reason, we refer to this value function as the generalized convolution of these two functions.

The convex calculus we establish in Section 3 for the analysis of the generalized convolution (3) provides a key tool for understanding several important cases. These include the variational properties of infimal convolution (Section 3.3); parametric constrained optimization (Section 3.4); the Moreau envelope of a convex function and the corresponding proximal map (Section 4); and epigraphical and level-set projections, including an $SC^1$ optimization [20, 36] method for numerically evaluating these projections (Section 7).

The perspective map used in the generalized convolution (3) first appears in Rockafellar [38, Corollary 13.5.1], without a particular name attached to it. More recently, Combettes [13], Combettes and Müller [14, 15], and Aravkin et al. [1] describe in detail the properties and applications of this map. Our systematic study of parametric optimization problems with perspective maps, outlined in Section 3, appears to be new.

Section 3.3 establishes the variational properties of infimal convolution, which occurs when the map $L$ is $(u, x) \mapsto x - u$. These results complement the functional smoothing framework described by Beck and Teboulle [5, Section 4.1] and Burke and Hoheisel [8, 9], wherein a smooth approximation to a function $f$ is constructed through the infimal convolution with the perspective map of a smooth and strongly convex regularizer $\omega$. Bougeard et al. [7] and Strömberg [42] provide early contributions to this topic. Theorem 3 describes the Lipschitzian properties of the corresponding optimal solution map as a function of $(x, \lambda)$. Corollary 3 establishes sufficient conditions for this solution map to be semismooth* [23]. These conditions hold, for instance, when $f$ is piecewise linear-quadratic. This analysis complements the study of the proximal case by Meng et al. [31, 32] and Milzarek [33].

A general form of parametric constrained optimization occurs when we specialize the convolution kernel $\omega$ in (3) to be the indicator function of a closed convex set. Section 3.4 focuses the variational analysis of the generalized convolution operation to obtain formulas for the sensitivity of the optimal value of parametric optimization problems with relaxed linear constraints. This analysis includes perturbations to the relaxation parameter and to the right-hand side.

In Section 4 we further focus our analysis of infimal convolution on the proximal case, which occurs when $\omega = \tfrac{1}{2}\|\cdot\|^2$. Here we develop the variational properties of the Moreau envelope and the associated proximal map as a function of the base point $x$ and the proximal parameter $\lambda$, simultaneously. We also establish conditions under which the proximal map is semismooth*. Special attention is given to the limiting properties as $\lambda \downarrow 0$. Existing work generalizes the parameter $\lambda$ to a positive-definite matrix, but makes no statements regarding the limiting case where $\lambda$ (or its matrix counterpart) vanishes, as we do in our general analysis. See also Attouch's seminal monograph [2].

In Section 5 we describe the main continuity properties of the proximal value function

$$0 < \lambda \mapsto f(P_\lambda f(\bar{x})), \tag{4}$$

where $\bar{x} \in \mathbb{E}$ is held fixed. Corollary 11 establishes its Lipschitzian properties and Corollary 12 characterizes it as the derivative of the map $\lambda \mapsto \lambda e_\lambda f(\bar{x})$ on $\mathbb{R}_{++}$. Proposition 7 describes sufficient conditions under which the proximal value function is semismooth.

We use our analysis of the proximal value function (4) to establish, via Proposition 8, novel variational formulas for the Moreau envelope and proximal map of post-compositions, i.e., functions of the form $g \circ \psi$, where the scalar function $g$ is increasing and convex, and $\psi$ is closed proper convex. As a consequence, Corollary 14 provides a refined version of the epigraphical projection conditions in (1), including analogous results for the projection onto the level set of $f$ (Corollary 13). This analysis does not require the function to be finite-valued, and extends existing results [3, 4]. Importantly, Corollary 14 shows that the root of the aligning equation (2) coincides with the unique minimizer of a strongly convex scalar optimization problem. It follows from Proposition 7 that the objective for this problem is continuously differentiable with a locally Lipschitz derivative. We use this latter property to derive a novel $SC^1$ optimization method to find the root of the function (2) and its analog in the level-set case. Numerical experiments in Section 7.2 show that for projection onto the 1-norm unit ball, the resulting $SC^1$ method is competitive with two specialized state-of-the-art methods: CONDAT [16] and IBIS [30].
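The level-set analog mentioned above can be sketched for the 1-norm ball. The code below is an illustration under assumptions, not the $SC^1$ method of this paper nor the CONDAT or IBIS algorithms: it assumes the level-set counterpart of (2) is the scalar equation $\|P_\lambda f(x)\|_1 - \tau = 0$ with $f = \|\cdot\|_1$ and radius $\tau > 0$, and it solves that equation with plain bisection. The names `soft_threshold` and `project_l1_ball` are hypothetical.

```python
import numpy as np

def soft_threshold(x, lam):
    """Proximal map of lam * ||.||_1 (componentwise soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def project_l1_ball(x, tau=1.0, tol=1e-12, max_iter=200):
    """Project x onto {u : ||u||_1 <= tau}, tau > 0, via a scalar root search.

    Illustrative sketch: bisection on lam -> ||soft_threshold(x, lam)||_1 - tau,
    which is nonincreasing in lam and changes sign on [0, max|x_i|].
    """
    x = np.asarray(x, dtype=float)
    if np.abs(x).sum() <= tau:
        # Already inside the ball: the projection is the point itself.
        return x.copy()
    lo, hi = 0.0, np.abs(x).max()
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if np.abs(soft_threshold(x, mid)).sum() > tau:
            lo = mid
        else:
            hi = mid
        if hi - lo <= tol:
            break
    # Using the upper end of the bracket keeps the result feasible.
    return soft_threshold(x, hi)

print(project_l1_ball([3.0, 1.0], tau=2.0))
```

Bisection is chosen here only for transparency; it uses no derivative information, whereas the approach described in this paper exploits the smoothness of the underlying scalar problem.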


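Let $\Gamma_0(\mathbb{E})$ denote the set of functions $f : \mathbb{E} \to \overline{\mathbb{R}}$ that are proper closed convex, i.e., the epigraph $\operatorname{epi} f = \{(x, \alpha) \in \mathbb{E} \times \mathbb{R} \mid f(x) \le \alpha\}$ contains no vertical lines and is closed convex. The level sets of $f$ are given by $\operatorname{lev}_\alpha f := \{x \in \mathbb{E} \mid f(x) \le \alpha\}$. The Fenchel conjugate of any function $f : \mathbb{E} \to \overline{\mathbb{R}}$ is $f^*(y) = \sup_{x \in \mathbb{E}} \{\langle y, x\rangle - f(x)\}$. The Jacobian of a differentiable map $F : \mathbb{R}^n \to \mathbb{R}^m$ at $x \in \mathbb{R}^n$ is denoted by $F'(x)$. We denote the Euclidean projection of $\bar{x}$ onto $C$ by $P_C(\bar{x})$. Throughout, fractions such as $(1/(2\lambda))$ are abbreviated as $(1/2\lambda)$.

For a set $C \subset \mathbb{E}$, its indicator function is $\delta_C : \mathbb{E} \to \overline{\mathbb{R}}$ given by $\delta_C(x) := 0$ if $x \in C$ and $\delta_C(x) := +\infty$ otherwise. The subdifferential of $\delta_C$ is the normal cone of $C$, i.e.,

$$N_C(\bar{x}) := \partial\delta_C(\bar{x}) := \{v \in \mathbb{E} \mid \langle v, x - \bar{x}\rangle \le 0 \ (x \in C)\},$$

which is empty if $\bar{x} \notin C$. The relative interior of $C$ is the set $\operatorname{ri} C$ [38, Section 6], and the horizon cone is $C^\infty := \{v \in \mathbb{E} \mid \exists \{\lambda_k\} \downarrow 0,\ \{x_k \in C\} : \lambda_k x_k \to v\}$. The horizon function of $f \in \Gamma_0(\mathbb{E})$ is the closed proper convex and positively homogeneous function $f^\infty : \mathbb{E} \to \overline{\mathbb{R}}$ defined via $\operatorname{epi} f^\infty = (\operatorname{epi} f)^\infty$.

Let $f_k : \mathbb{E} \to \overline{\mathbb{R}}$. Then we say that the sequence $\{f_k\}$ epi-converges to a function $f : \mathbb{E} \to \overline{\mathbb{R}}$ if

$$\forall x \in \mathbb{E} : \quad \begin{cases} \forall \{x_k\} \to x : & \liminf_{k\to\infty} f_k(x_k) \ge f(x), \\ \exists \{x_k\} \to x : & \limsup_{k\to\infty} f_k(x_k) \le f(x), \end{cases}$$

and we write $f_k \xrightarrow{e} f$. The sequence $\{f_k\}$ is said to converge continuously to $f$ if

$$\lim_{k\to\infty} f_k(x_k) = f(x) \quad \forall x \in \mathbb{E} \text{ and } \{x_k\} \to x,$$

and we write $f_k \xrightarrow{c} f$. Furthermore, $\{f_k\}$ is said to converge pointwise to $f$ if

$$\lim_{k\to\infty} f_k(x) = f(x) \quad \forall x \in \mathbb{E},$$

and we write $f_k \xrightarrow{p} f$. We extend these notions to families of functions $\{f_\lambda\}_{\{\lambda \downarrow 0\}}$ via

$$f_\lambda \xrightarrow{\xi} f \ :\Longleftrightarrow\ \forall \{\lambda_k\} \downarrow 0 : f_{\lambda_k} \xrightarrow{\xi} f \qquad (\xi \in \{p, e, c\}).$$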

2. Properties of the perspective map

The perspective map $\omega^\pi$ that appears in the generalized infimal convolution (3) provides a mechanism for controlling, through the parameter $\lambda$, the degree to which the functions $f$ and $\omega$ are combined. Beck and Teboulle [5] and Burke and Hoheisel [8] promoted this technique for generating smooth approximations to nonsmooth functions.

We work with the following definition of the perspective map of $\omega$, which appears in Rockafellar [38, Corollary 13.5.1]:

$$\omega^\pi : (z, \lambda) \in \mathbb{E}_\omega \times \mathbb{R} \mapsto \begin{cases} \lambda\,\omega(z/\lambda) & \text{if } \lambda > 0, \\ \omega^\infty(z) & \text{if } \lambda = 0, \\ +\infty & \text{if } \lambda < 0. \end{cases} \tag{5}$$

For positive values of the parameter $\lambda$, the perspective map corresponds to epi-multiplication:

$$(\lambda \star \omega)(x) := \lambda\,\omega(x/\lambda).$$

The following result confirms the consistency of the perspective map (5) as the parameter $\lambda$ decreases towards zero.

Lemma 1 (Variational convergence of epi-multiplication). Let $\phi \in \Gamma_0(\mathbb{E})$. Then as $\lambda \downarrow 0$, $(\lambda \star \phi)(x) \to \phi^\infty(x)$ for all $x \in \operatorname{dom}\phi$, and $\lambda \star \phi \xrightarrow{e} \phi^\infty$.

Proof. The pointwise convergence of $(\lambda \star \phi)$ over $\operatorname{dom}\phi$ follows from [38, Corollary 8.5.2]. To prove epi-convergence, observe that, for all $\lambda > 0$ and $x \in \mathbb{E}$, $(\lambda \star \phi)(x) = \phi^\pi(x, \lambda)$. Hence,

$$\liminf_{x \to \bar{x},\ \lambda \downarrow 0} (\lambda \star \phi)(x) = \liminf_{x \to \bar{x},\ \lambda \downarrow 0} \phi^\pi(x, \lambda) \ge \phi^\pi(\bar{x}, 0) = \phi^\infty(\bar{x}) \quad \forall \bar{x} \in \mathbb{E},$$

where the inequality follows because $\phi^\pi$ is a support function [38, Corollary 13.5.1] and thus closed [25, Proposition 2.1.2].

Fix any sequence $\{\lambda_k\} \downarrow 0$ and $\bar{x} \in \operatorname{dom}\phi$. Then $(\lambda_k \star \phi)(\bar{x}) \to \phi^\infty(\bar{x})$. Hence, in particular, with $x_k := \bar{x}$ ($k \in \mathbb{N}$),

$$\limsup_{k\to\infty} (\lambda_k \star \phi)(x_k) \le \phi^\infty(\bar{x}) \tag{6}$$

for all $\bar{x} \in \operatorname{dom}\phi$. Now let $\bar{x} \notin \operatorname{dom}\phi$, take $\hat{x} \in \operatorname{dom}\phi$ and define $x_k := \lambda_k \hat{x} + (1 - \lambda_k)\bar{x} \to \bar{x}$. Then

$$\phi^\infty(\bar{x}) = \sup_{t > 0} \frac{\phi(\hat{x} + t\bar{x}) - \phi(\hat{x})}{t} \ge \frac{\phi\big(\hat{x} + (\tfrac{1}{\lambda_k} - 1)\bar{x}\big) - \phi(\hat{x})}{\tfrac{1}{\lambda_k} - 1} = \lambda_k \cdot \frac{\phi\big(\hat{x} + (\tfrac{1}{\lambda_k} - 1)\bar{x}\big) - \phi(\hat{x})}{1 - \lambda_k}$$

for all $k \in \mathbb{N}$ sufficiently large. Hence for such $k \in \mathbb{N}$,

$$(\lambda_k \star \phi)(x_k) = \lambda_k\,\phi\!\left(\frac{\lambda_k \hat{x} + (1 - \lambda_k)\bar{x}}{\lambda_k}\right) \le (1 - \lambda_k)\,\phi^\infty(\bar{x}) + \lambda_k\,\phi(\hat{x}).$$

Taking the limit superior yields (6) in this case as well. This establishes epi-convergence. □
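A small numerical sketch can make the pointwise statement of Lemma 1 concrete. The example is our own illustrative choice, not taken from the paper: for $\phi(x) = \sqrt{1 + x^2}$ on $\mathbb{R}$, the epi-multiplication is $(\lambda \star \phi)(x) = \sqrt{\lambda^2 + x^2}$ and the horizon function is $\phi^\infty(x) = |x|$. The helper names below are hypothetical.

```python
import numpy as np

def phi(x):
    """Example function phi(x) = sqrt(1 + x^2), finite and convex on R."""
    return np.sqrt(1.0 + x ** 2)

def epi_multiply(func, lam):
    """Return the epi-multiplication x -> lam * func(x / lam) for lam > 0."""
    return lambda x: lam * func(x / lam)

def horizon_phi(x):
    """Horizon function of phi, which is |x| for this particular example."""
    return np.abs(x)

xs = np.linspace(-2.0, 2.0, 9)
lams = (1.0, 0.1, 0.01, 0.001)

# Largest gap between (lam * phi)(x) and phi^infty(x) over the sample points.
gaps = [float(np.max(np.abs(epi_multiply(phi, lam)(xs) - horizon_phi(xs))))
        for lam in lams]

for lam, gap in zip(lams, gaps):
    print(f"lambda = {lam:g}: max gap = {gap:.3e}")
```

On this grid the gap shrinks proportionally to $\lambda$, which is consistent with the convergence $(\lambda \star \phi)(x) \to \phi^\infty(x)$ on $\operatorname{dom}\phi$.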

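The following result summarizes key properties of the perspective map. It also provides a support-function representation, which means that it can be written as the support function $\sigma_{\mathcal{D}}(y) \equiv \delta^*_{\mathcal{D}}(y) = \sup_{x \in \mathcal{D}} \langle x, y\rangle$ for some set $\mathcal{D}$.

Proposition 1 (Properties of perspective map). For $\omega \in \Gamma_0(\mathbb{E}_\omega)$, the following hold:
(a) $\omega^\pi(z, \lambda) = \sigma_{\operatorname{epi}\omega^*}(z, -\lambda)$, hence $\omega^\pi \in \Gamma_0(\mathbb{E}_\omega \times \mathbb{R})$ is sublinear with $\operatorname{dom}\omega^\pi = \mathbb{R}_+(\operatorname{dom}\omega \times \{1\})$;
(b) $(\omega^\pi)^*(y, \beta) = \delta_{\operatorname{epi}\omega^*}(y, -\beta)$;
(c) for all $(z, \lambda) \in \operatorname{dom}\omega^\pi$,

$$\partial\omega^\pi(z, \lambda) = \begin{cases} \{(y, -\beta) \mid y \in \partial\omega(z/\lambda),\ \beta = \omega^*(y)\} & \text{if } \lambda > 0, \\ \{(y, -\beta) \mid y \in \partial\omega^\infty(z),\ (y, \beta) \in \operatorname{epi}\omega^*\} & \text{if } \lambda = 0. \end{cases} \tag{7}$$

Proof. For Parts (a) and (b) see [38, Corollary 13.5.1]. Part (c) follows from [13, Proposition 2.3] or [1, Lemma 3.8]. □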

The expression for the subdifferential (7), evaluated at the origin, reduces to

$$\partial\omega^\pi(0, 0) = \{(y, -\beta) \mid (y, \beta) \in \operatorname{epi}\omega^*\},$$

which is just the epigraph of $\omega^*$ under the reflection $(z, \lambda) \mapsto (z, -\lambda)$. This follows from the subdifferential formula $\partial\omega^\infty(0) = \partial\sigma_{\operatorname{dom}\omega^*}(0) = \operatorname{dom}\omega^*$; cf. [39, Corollary 8.25]. Combettes [13, Corollary 2.5] provides a simplified characterization of Proposition 1 under the additional assumption that $\omega$ is supercoercive [3, Definition 11.11].
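As a worked illustration (our own, not part of the source), consider the quadratic kernel $\omega = \tfrac{1}{2}\|\cdot\|^2$, which corresponds to the proximal case. Under this assumption the objects in Proposition 1 and formula (7) can be evaluated in closed form:

```latex
\begin{align*}
\omega(z) &= \tfrac{1}{2}\|z\|^2, \qquad
\omega^*(y) = \tfrac{1}{2}\|y\|^2, \qquad
\omega^\infty = \delta_{\{0\}}, \\
\omega^\pi(z,\lambda) &=
\begin{cases}
\|z\|^2/(2\lambda) & \text{if } \lambda > 0, \\
\delta_{\{0\}}(z)   & \text{if } \lambda = 0, \\
+\infty             & \text{if } \lambda < 0,
\end{cases} \\
\sigma_{\operatorname{epi}\omega^*}(z,-\lambda)
  &= \sup_{\beta \ge \frac{1}{2}\|y\|^2} \{\langle z, y\rangle - \lambda\beta\}
   = \sup_{y} \{\langle z, y\rangle - \tfrac{\lambda}{2}\|y\|^2\}
   = \frac{\|z\|^2}{2\lambda} \qquad (\lambda > 0), \\
\partial\omega^\pi(z,\lambda)
  &= \left\{ \left( \frac{z}{\lambda},\; -\frac{\|z\|^2}{2\lambda^2} \right) \right\}
  \qquad (\lambda > 0).
\end{align*}
```

The last line is formula (7) with $y = \nabla\omega(z/\lambda) = z/\lambda$ and $\beta = \omega^*(y)$; it agrees with differentiating $\|z\|^2/(2\lambda)$ directly with respect to $z$ and $\lambda$.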

3. Partial infimal projection with perspective maps

Our main objective in this section is to deduce the variational properties of the generalized infimal convolution $p_{L,\omega,f}$ defined by (3). Throughout this section, we make the assumptions that $L$ is a linear map from $\mathbb{E}_f \times \mathbb{E}_x$ to $\mathbb{E}_\omega$ for Euclidean spaces $\mathbb{E}_i$, $i \in \{f, x, \omega\}$, that $f \in \Gamma_0(\mathbb{E}_f)$ and $\omega \in \Gamma_0(\mathbb{E}_\omega)$, and that $\operatorname{range} L \subseteq \mathbb{R}_+ \operatorname{dom}\omega$. Under these standing assumptions, it follows from Theorem 1 below that $p_{L,\omega,f}$ is convex.


We lead with a general result on infimal projections.

Theorem 1 (Conjugate and subdifferentials of infimal projection). For a function $\psi \in \Gamma_0(\mathbb{E} \times \mathbb{E})$, the infimal projection

$$p : x \in \mathbb{E} \mapsto \inf_u \psi(x, u) \tag{8}$$

is convex and
(a) $p^* = \psi^*(\cdot, 0)$, which is closed and convex;
(b) $\partial p(x) = \{v \mid (v, 0) \in \partial\psi(x, \bar{u})\}$ for all $\bar{u} \in \operatorname{argmin} \psi(x, \cdot)$;
(c) $p^* \in \Gamma_0(\mathbb{E})$ if and only if $\operatorname{dom}\psi^*(\cdot, 0) \ne \emptyset$;
(d) $p \in \Gamma_0(\mathbb{E})$ if $\operatorname{dom}\psi^*(\cdot, 0) \ne \emptyset$, and hence the infimum in (8) is attained when finite.

Proof. For convexity of $p$ and Parts (a), (c), and (d), see, e.g., [26, Theorem 3.101]. Part (b) follows from Part (a) via Rockafellar [38, Theorem 23.5]. □

The following auxiliary result is used in this section to derive conjugate and subdifferential formulas for the value function $p_{L,\omega,f}$.

Lemma 2 (Domain and conjugate of linear-perspective composition). The function

$$\eta : (u, x, \lambda) \in \mathbb{E}_f \times \mathbb{E}_x \times \mathbb{R} \mapsto \omega^\pi(L(u, x), \lambda)$$

is closed proper convex, i.e., $\eta \in \Gamma_0(\mathbb{E}_f \times \mathbb{E}_x \times \mathbb{R})$. The nonempty domain and its (possibly empty) relative interior are given by

$$\operatorname{dom}\eta = \{(u, x, \lambda) \mid \lambda \ge 0,\ L(u, x) \in \lambda \cdot \operatorname{dom}\omega\}, \qquad \operatorname{ri}(\operatorname{dom}\eta) = \{(u, x, \lambda) \mid \lambda > 0,\ L(u, x) \in \lambda \cdot \operatorname{ri}(\operatorname{dom}\omega)\}.$$

If $\operatorname{ri}(\operatorname{dom}\eta)$ is nonempty, then $\eta^*$ is the indicator of the set

$$C = \{(w, z, \mu) \mid \exists y : (y, -\mu) \in \operatorname{epi}\omega^*,\ L^*(y) = (w, z)\}. \tag{9}$$

Proof. Proposition 1(a) asserts that $\eta \in \Gamma_0(\mathbb{E}_f \times \mathbb{E}_x \times \mathbb{R})$, and also yields the expression for its domain. Now assume that there exists an element $(u, x)$ such that $L(u, x) \in \lambda \cdot \operatorname{ri}(\operatorname{dom}\omega)$ for some $\lambda > 0$. Define the linear map $\tilde{L} : (u, x, \lambda) \mapsto (L(u, x), \lambda)$. Then,

$$\begin{aligned} \emptyset &\ne \{(u, x, t) \mid t > 0,\ L(u, x) \in t \cdot \operatorname{ri}(\operatorname{dom}\omega)\} \\ &= \{(u, x, \lambda) \mid \exists t > 0 : L(u, x) \in t \cdot \operatorname{ri}(\operatorname{dom}\omega),\ \lambda = t\} \\ &= \tilde{L}^{-1}\big[\mathbb{R}_{++}(\operatorname{ri}(\operatorname{dom}\omega) \times \{1\})\big] \\ &\overset{\text{(i)}}{=} \tilde{L}^{-1}\operatorname{ri}\big(\mathbb{R}_+(\operatorname{dom}\omega \times \{1\})\big) \\ &\overset{\text{(ii)}}{=} \operatorname{ri}\big(\tilde{L}^{-1}\,\mathbb{R}_+(\operatorname{dom}\omega \times \{1\})\big) \\ &= \operatorname{ri}\big(\tilde{L}^{-1}\operatorname{dom}\omega^\pi\big) = \operatorname{ri}(\operatorname{dom}\eta), \end{aligned}$$

where (i) uses [38, Corollary 6.8.1] and (ii) uses [38, Theorem 6.7] and the fact that $\tilde{L}^{-1}\operatorname{ri}(\mathbb{R}_+(\operatorname{dom}\omega \times \{1\})) \ne \emptyset$.

To derive the formula for $\eta^*$, observe that by our reasoning above $\tilde{L}^{-1}\operatorname{ri}(\operatorname{dom}\omega^\pi) = \operatorname{ri}(\operatorname{dom}\eta) \ne \emptyset$. Hence, by [38, Theorem 16.3] and Proposition 1(b),

$$\begin{aligned} \eta^*(w, z, \mu) &= (\omega^\pi \circ \tilde{L})^*(w, z, \mu) \\ &= \inf_{(u, \alpha)} \big\{ (\omega^\pi)^*(u, \alpha) \ \big|\ \tilde{L}^*(u, \alpha) = (w, z, \mu) \big\} \\ &= \inf_u \{ (\omega^\pi)^*(u, \mu) \mid L^*(u) = (w, z) \} \\ &= \inf_u \{ \delta_{\operatorname{epi}\omega^*}(u, -\mu) \mid L^*(u) = (w, z) \} \\ &= \delta_C(w, z, \mu), \end{aligned}$$

which establishes (9). □

We can now deduce the subdifferential and conjugate of the generalized convolution (3).

Theorem 2 (Conjugate and subdifferential of the generalized convolution). Under the assumptions of Lemma 2, suppose in addition that

$$\exists (u, x) \in \operatorname{ri}(\operatorname{dom} f) \times \mathbb{E}_x : \quad L(u, x) \in \mathbb{R}_{++}\operatorname{ri}(\operatorname{dom}\omega). \tag{10}$$

Then the following hold for the convex function $p_{L,\omega,f}$ defined in (3).
(a) $p^*_{L,\omega,f}(y, \mu) = \inf_w \{ f^*(w) \mid \exists a : (a, -\mu) \in \operatorname{epi}\omega^*,\ L^*(a) = (-w, y) \}$ and the infimum is attained when finite.
(b) For all $(x, \lambda) \in \operatorname{dom} p_{L,\omega,f}$ and all $\bar{u} \in \operatorname{argmin}_{u \in \mathbb{E}_f} \{ f(u) + \omega^\pi(L(u, x), \lambda) \}$,

$$\partial p_{L,\omega,f}(x, \lambda) = \begin{cases} \{ (v, -\omega^*(y)) \mid y \in \partial\omega(L(\bar{u}, x)/\lambda),\ (0, v) \in D(\bar{u}, y) \} & \text{if } \lambda > 0, \\ \{ (v, -\beta) \mid y \in \partial\omega^\infty(L(\bar{u}, x)),\ (0, v) \in D(\bar{u}, y),\ (y, \beta) \in \operatorname{epi}\omega^* \} & \text{if } \lambda = 0, \end{cases}$$

where $D(u, y) := \partial f(u) \times \{0\} + L^*(y)$.
(c) $p^*_{L,\omega,f} \in \Gamma_0(\mathbb{E}_x \times \mathbb{R})$ if and only if there exist $w \in \operatorname{dom} f^*$, $a \in \operatorname{dom}\omega^*$, $(y, \mu) \in \mathbb{E}_x \times \mathbb{R}$ such that $(a, -\mu) \in \operatorname{epi}\omega^*$ and $L^*(a) = (-w, y)$. In this case, $p_{L,\omega,f} \in \Gamma_0(\mathbb{E}_x \times \mathbb{R})$ and the infimum is attained when finite.

Proof. Set $p = p_{L,\omega,f}$.

Part (a). Observe that $p(x, \lambda) = \inf_u \psi(u, x, \lambda)$ for $\psi = \phi + \eta$ with $\phi(u, x, \lambda) = f(u)$ (and $\eta$ as in Lemma 2). We hence compute

$$\begin{aligned} p^*(y, \mu) &= \psi^*(0, y, \mu) \\ &= (\phi + \eta)^*(0, y, \mu) \\ &= \inf_{(w, z, \delta)} \ \phi^*(w, z, \delta) + \eta^*(-w, y - z, \mu - \delta) \\ &= \inf_w \ f^*(w) + \delta_C(-w, y, \mu) \\ &= \inf_w \{ f^*(w) \mid \exists a : (a, -\mu) \in \operatorname{epi}\omega^*,\ L^*(a) = (-w, y) \}. \end{aligned}$$

Here the first identity uses Theorem 1. The second is clear from our definitions above. The third relies on [38, Theorem 16.4] and the fact that assumption (10) is, in view of Lemma 2 and the fact that $\operatorname{ri}(\operatorname{dom}\phi) = \operatorname{ri}(\operatorname{dom} f) \times \mathbb{E}_x \times \mathbb{R}$, equivalent to the condition $\operatorname{ri}(\operatorname{dom}\eta) \cap \operatorname{ri}(\operatorname{dom}\phi) \ne \emptyset$. The fourth uses the fact that $\phi^*(v, y, \mu) = f^*(v) + \delta_{\{0\}}(y, \mu)$ and Lemma 2. The last identity is simply the definition of the set $C$ in that lemma.

Part (b). By (10) we can apply [38, Theorems 23.8-23.9] to find

$$\begin{aligned} \partial\psi(u, x, \lambda) &= \partial f(u) \times \{0\} \times \{0\} + \tilde{L}^*\,\partial\omega^\pi(\tilde{L}(u, x, \lambda)) \\ &= \partial f(u) \times \{0\} \times \{0\} + (L^* \times \mathrm{id})\,\partial\omega^\pi(L(u, x), \lambda). \end{aligned}$$

Apply Proposition 1(c) and combine with Theorem 1 to obtain the desired result.

Part (c) follows from Theorem 1(d). □
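The subdifferential formula of Theorem 2(b) can be sanity-checked numerically in a simple setting. The sketch below is our own illustration under explicit assumptions: a one-dimensional problem with $L(u, x) = x - u$, $\omega = \tfrac{1}{2}|\cdot|^2$ (so $\omega^*(y) = \tfrac{1}{2}y^2$), and $f = |\cdot|$. In that case $L^*(y) = (-y, y)$, the condition $(0, v) \in D(\bar{u}, y)$ reduces to $v = y \in \partial f(\bar{u})$, and the formula predicts the gradient $(y, -\tfrac{1}{2}y^2)$ with $y = (x - \bar{u})/\lambda$. All function names are hypothetical.

```python
import numpy as np

def prox_abs(x, lam):
    """Minimizer of |u| + (x - u)^2 / (2 * lam): scalar soft-thresholding."""
    return np.sign(x) * max(abs(x) - lam, 0.0)

def value(x, lam):
    """p(x, lam) = min_u |u| + (x - u)^2 / (2 * lam), for lam > 0."""
    u = prox_abs(x, lam)
    return abs(u) + (x - u) ** 2 / (2.0 * lam)

def gradient_from_formula(x, lam):
    """Gradient predicted by Theorem 2(b): (y, -omega^*(y)), y = (x - u_bar) / lam."""
    y = (x - prox_abs(x, lam)) / lam
    return np.array([y, -0.5 * y ** 2])

def gradient_fd(x, lam, h=1e-6):
    """Central finite differences of p in x and in lam."""
    dx = (value(x + h, lam) - value(x - h, lam)) / (2.0 * h)
    dl = (value(x, lam + h) - value(x, lam - h)) / (2.0 * h)
    return np.array([dx, dl])

for x, lam in [(0.3, 1.0), (2.0, 0.5), (-1.5, 0.7)]:
    print(x, lam, gradient_from_formula(x, lam), gradient_fd(x, lam))
```

The value function in this setting is the Moreau envelope of the absolute value (a Huber-type function), which is differentiable for $\lambda > 0$, so the subdifferential is a singleton and a finite-difference comparison is meaningful.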

We now consider the value function

$$p_{\omega,f} : (x, \lambda) \in \mathbb{E} \times \mathbb{R} \mapsto \inf_{u \in \mathbb{E}} \ f(u) + \omega^\pi(x - u, \lambda), \tag{11}$$

which corresponds to the standard infimal convolution between $f$ and $\omega^\pi(\cdot, \lambda)$. This is a special case of (3) where $L(u, x) = x - u$ and $\mathbb{E}_i = \mathbb{E}$ for $i \in \{f, x, \omega\}$. The following result specializes Theorem 2.


Corollary 1 (Conjugate and subdifferential of infimal convolution). For the function $p_{\omega,f}$ given by (11), assume that $f, \omega \in \Gamma_0(\mathbb{E})$ and

$$\exists (u, x) \in \operatorname{ri}(\operatorname{dom} f) \times \mathbb{E} : \quad x - u \in \mathbb{R}_{++}\operatorname{ri}(\operatorname{dom}\omega). \tag{12}$$

Then the following hold.
(a) $p^*_{\omega,f}(y, \mu) = f^*(y) + \delta_{\operatorname{epi}\omega^*}(y, -\mu)$.
(b) For all $(x, \lambda) \in \operatorname{dom} p_{\omega,f}$ and all $\bar{u} \in \operatorname{argmin}_{u \in \mathbb{E}} \{ f(u) + \omega^\pi(x - u, \lambda) \}$ we have

$$\partial p_{\omega,f}(x, \lambda) = \begin{cases} \big\{ (y, -\beta) \ \big|\ y \in \partial f(\bar{u}) \cap \partial\omega\big(\tfrac{x - \bar{u}}{\lambda}\big),\ \beta = \omega^*(y) \big\} & \text{if } \lambda > 0, \\ \{ (y, -\beta) \mid y \in \partial f(\bar{u}) \cap \partial\omega^\infty(x - \bar{u}),\ (y, \beta) \in \operatorname{epi}\omega^* \} & \text{if } \lambda = 0. \end{cases}$$

(c) $p^*_{\omega,f} \in \Gamma_0(\mathbb{E})$ if and only if $\operatorname{dom} p^*_{\omega,f} = (\operatorname{dom} f^* \times \mathbb{E}) \cap \operatorname{epi}\omega^* \ne \emptyset$. In this case, $p_{\omega,f} \in \Gamma_0(\mathbb{E})$ also, and the infimum is attained when finite.

Proof. Use Theorem 2(a)–(c) and observe that $L^*(a) = (-a, a)$. □

Thus far, our analysis has focused exclusively on the variational properties of the optimal value function (3) and its specializations. We now turn our attention to the optimal solution map

$$P_{\omega,f} : (x, \lambda) \in \mathbb{E} \times \mathbb{R} \mapsto \operatorname*{argmin}_{u \in \mathbb{E}} \ f(u) + \omega^\pi(x - u, \lambda) \tag{13}$$

for the infimal convolution defined by (11). In this section we describe the variational-analytic properties of the solution map, including (Lipschitz) continuity and (directional) smoothness. To this end, we introduce required technical machinery from variational analysis [34, 39].

Let $S : \mathbb{E}_f \rightrightarrows \mathbb{E}_x$ be a set-valued map between spaces $\mathbb{E}_f$ and $\mathbb{E}_x$. The domain and graph of $S$, respectively, are the sets $\operatorname{dom} S := \{x \mid S(x) \ne \emptyset\}$ and $\operatorname{gph} S := \{(x, u) \in \mathbb{E}_f \times \mathbb{E}_x \mid u \in S(x)\}$. The outer limit of $S$ at $\bar{x}$ is

$$\operatorname*{Lim\,sup}_{x \to \bar{x}} S(x) := \{ y \in \mathbb{E}_x \mid \exists \{x_k\} \to \bar{x},\ \{y_k \in S(x_k)\} \to y \}.$$

Now let $A \subset \mathbb{E}$. The tangent cone of $A$ at $\bar{x} \in A$ is $T_A(\bar{x}) := \operatorname{Lim\,sup}_{t \downarrow 0} (A - \bar{x})/t$. The regular normal cone of $A$ at $\bar{x} \in A$ is the polar of the tangent cone, i.e., $\hat{N}_A(\bar{x}) := \{ v \mid \langle v, y\rangle \le 0 \ \forall y \in T_A(\bar{x}) \}$. The limiting normal cone of $A$ at $\bar{x} \in A$ is $N_A(\bar{x}) := \operatorname{Lim\,sup}_{x \to \bar{x}} \hat{N}_A(x)$.

The coderivative of $S$ at $(\bar{x}, \bar{y}) \in \operatorname{gph} S$ is the map $D^*S(\bar{x} \mid \bar{y}) : \mathbb{E}_x \rightrightarrows \mathbb{E}_f$ defined via

$$v \in D^*S(\bar{x} \mid \bar{y})(y) \iff (v, -y) \in N_{\operatorname{gph} S}(\bar{x}, \bar{y}).$$

The graphical derivative of $S$ at $(\bar{x}, \bar{y})$ is the map $DS(\bar{x} \mid \bar{y}) : \mathbb{E}_f \rightrightarrows \mathbb{E}_x$ given by

$$v \in DS(\bar{x} \mid \bar{y})(u) \iff (u, v) \in T_{\operatorname{gph} S}(\bar{x}, \bar{y}),$$

or, equivalently,

$$DS(\bar{x} \mid \bar{y})(u) = \operatorname*{Lim\,sup}_{t \downarrow 0,\ u' \to u} \frac{S(\bar{x} + t u') - \bar{y}}{t}$$

[39, Eq. 8(14)]. The strict graphical derivative of $S$ at $(\bar{x}, \bar{y})$ is $D_*S(\bar{x} \mid \bar{y}) : \mathbb{E}_f \rightrightarrows \mathbb{E}_x$ given by

$$D_*S(\bar{x} \mid \bar{y})(w) = \left\{ z \ \middle|\ \exists \left\{ \begin{array}{l} \{t_k\} \downarrow 0,\ \{w_k\} \to w,\ \{z_k\} \to z, \\ \{(x_k, y_k) \in \operatorname{gph} S\} \to (\bar{x}, \bar{y}) \end{array} \right\} : \ z_k \in \frac{S(x_k + t_k w_k) - y_k}{t_k} \right\}.$$

We adopt the convention to set $D^*S(\bar{x}) := D^*S(\bar{x} \mid \bar{u})$ if $S(\bar{x}) = \{\bar{u}\}$ is a singleton, and proceed analogously for the graphical derivatives.

The above generalized derivatives possess the following definiteness properties when applied to a maximally monotone operator $T : \mathbb{E} \rightrightarrows \mathbb{E}$, which (by definition) satisfies the inequality

$$\langle v - w, x - y \rangle \ge 0 \quad \forall (v, w) \in T(x) \times T(y),$$

and there is no enlargement of $\operatorname{gph} T$ without destroying this inequality. Our conclusion relies on Minty parameterization.

Lemma 3. Let $T : \mathbb{E} \rightrightarrows \mathbb{E}$ be maximally monotone and let $(\bar{y}, \bar{u}) \in \operatorname{gph} T$. Then the pair $(w, z) \in \mathbb{E} \times \mathbb{E}$ satisfies $\langle w, z\rangle \ge 0$ if one of the following conditions holds:
(a) $w \in D^*T(\bar{y} \mid \bar{u})(z)$;
(b) $z \in D_*T(\bar{y} \mid \bar{u})(w)$;
(c) $z \in DT(\bar{y} \mid \bar{u})(w)$.

Proof. Part (a). See [34, Theorem 5.6].

Part (b). For $z \in D_*T(\bar{y} \mid \bar{u})(w)$ there exist $\{z_k\} \to z$, $\{t_k\} \downarrow 0$, $\{(y_k, u_k) \in \operatorname{gph} T\} \to (\bar{y}, \bar{u})$, and $\{w_k\} \to w$ such that

$$t_k z_k \in T(y_k + t_k w_k) - u_k \quad \forall k \in \mathbb{N}. \tag{14}$$

Now let $\lambda > 0$ and set $J_{\lambda T} := (\lambda T + \mathrm{id})^{-1}$. By Minty parameterization [3, Remark 23.23], there exists $\{x_k\}$ such that $(y_k, u_k) = \big(J_{\lambda T}(x_k), (x_k - J_{\lambda T}(x_k))/\lambda\big)$ for all $k \in \mathbb{N}$. Combining this with (14) yields $x_k + t_k(\lambda z_k + w_k) \in (\lambda T + \mathrm{id})(y_k + t_k w_k)$. Thus, as $y_k = J_{\lambda T}(x_k)$, we have

$$t_k w_k = J_{\lambda T}\big(x_k + t_k(\lambda z_k + w_k)\big) - J_{\lambda T}(x_k) \quad (k \in \mathbb{N}).$$

Because $J_{\lambda T}$ is firmly nonexpansive [3, Proposition 23.8] and hence 1-Lipschitz, it follows that $\|w_k\| \le \|\lambda z_k + w_k\|$ ($k \in \mathbb{N}$), hence $\|w\| \le \|\lambda z + w\|$. We infer that $-(\lambda/2)\|z\|^2 \le \langle z, w\rangle$. Since $\lambda > 0$ was arbitrary, letting $\lambda \downarrow 0$ yields $\langle z, w\rangle \ge 0$.

Part (c). This follows from Part (b) because $DT(\bar{y} \mid \bar{u})(w) \subset D_*T(\bar{y} \mid \bar{u})(w)$ for all $w \in \mathbb{E}$. □

We record another auxiliary result. Here we call $S : \mathbb{E}_f \rightrightarrows \mathbb{E}_x$ proto-differentiable at $(\bar{x}, \bar{u}) \in \operatorname{gph} S$ if for any $\bar{z} \in DS(\bar{x} \mid \bar{u})(\bar{w})$ and any $\{t_k\} \downarrow 0$ there exist $\{w_k\} \to \bar{w}$ and $\{z_k\} \to \bar{z}$ such that $z_k \in (S(\bar{x} + t_k w_k) - \bar{u})/t_k$ for all $k \in \mathbb{N}$.

Lemma 4. Let $S : \mathbb{E}_f \rightrightarrows \mathbb{E}_x$ be given by $S = F + T$, where $F$ is smooth and $T$ is proto-differentiable at $(\bar{x}, \bar{u} - F(\bar{x}))$. Then $S$ is proto-differentiable at $(\bar{x}, \bar{u})$.

Proof. Let $z \in DS(\bar{x} \mid \bar{u})(w)$ and $\{t_k\} \downarrow 0$. Then $z - F'(\bar{x})w \in DT(\bar{x} \mid \bar{u} - F(\bar{x}))(w)$, cf. [39, Exercise 10.43]. By assumption on $T$, there exist $\tilde{z}_k \to z - F'(\bar{x})w$ and $w_k \to w$ such that $\tilde{z}_k \in [T(\bar{x} + t_k w_k) - (\bar{u} - F(\bar{x}))]/t_k$, i.e., $\tilde{z}_k + [F(\bar{x} + t_k w_k) - F(\bar{x})]/t_k \in [S(\bar{x} + t_k w_k) - \bar{u}]/t_k$ for all $k \in \mathbb{N}$. Therefore, $z_k := \tilde{z}_k + [F(\bar{x} + t_k w_k) - F(\bar{x})]/t_k \to z$ and $z_k \in [S(\bar{x} + t_k w_k) - \bar{u}]/t_k$ for all $k \in \mathbb{N}$, which shows the proto-differentiability of $S$ at $(\bar{x}, \bar{u})$. □

The next and main result in this subsection is based on the implicit mapping framework described by Rockafellar and Wets [39, Theorem 9.56] together with Lemma 3.

Theorem 3 (Variational properties of the solution map). Let $f \in \Gamma_0(\mathbb{E})$ and let $\omega : \mathbb{E} \to \mathbb{R}$ be strictly convex, level-bounded and twice continuously differentiable. Let $\bar{x} \in \mathbb{E}$ and $\bar{\lambda} > 0$, set $\bar{y} := P_{\omega,f}(\bar{x}, \bar{\lambda})$ and $\bar{V} := \nabla^2\omega\big(\tfrac{\bar{x} - \bar{y}}{\bar{\lambda}}\big)$. Then for the solution map $P_{\omega,f}$ from (13) the following hold:
(a) We have $\operatorname{dom} P_{\omega,f} \subset \mathbb{E} \times \mathbb{R}_+$ and $P_{\omega,f}(\cdot, \lambda)$ is single-valued for all $\lambda > 0$.
(b) If $\bar{V}$ is positive definite, then $P_{\omega,f}$ is locally Lipschitz at $(\bar{x}, \bar{\lambda})$.
(c) If $\bar{V}$ is positive definite and $\partial f$ is proto-differentiable at $\big(\bar{y}, \nabla\omega\big(\tfrac{\bar{x} - \bar{y}}{\bar{\lambda}}\big)\big)$, then $P_{\omega,f}$ is directionally differentiable at $(\bar{x}, \bar{\lambda})$. Concretely, for all $(d, \Delta) \in \mathbb{E} \times \mathbb{R}$, we have

$$P'_{\omega,f}\big((\bar{x}, \bar{\lambda}); (d, \Delta)\big) = \left[ \bar{\lambda}\, D(\partial f)\!\left( \bar{y} \ \middle|\ \nabla\omega\!\left( \frac{\bar{x} - \bar{y}}{\bar{\lambda}} \right) \right) + \bar{V} \right]^{-1} \left( \bar{V} d - \frac{\Delta}{\bar{\lambda}} \bar{V} (\bar{x} - \bar{y}) \right).$$

In fact, $P_{\omega,f}$ is semidifferentiable at $(\bar{x}, \bar{\lambda})$ in the sense of [39, p. 332].


Proof.

Set P := P ω,f . Part (a). For λ > x ∈ E , the function u (cid:55)→ f ( y ) + ω π ( x − y, λ ) is lsc,proper, strictly convex and level-bounded, and therefore attains a unique minimum.Part (b). Without loss of generality, let E = R n , and observe that, for λ >

0, we have P ( x, λ ) = { y | ∈ S ( x, λ, y ) } , where S ( x, λ, u ) := ∂f ( y ) − ∇ ω (cid:0) x − uλ (cid:1) ( λ > D ∗ S (¯ x, ¯ λ, ¯ y | y ) = (cid:20) − λ ¯ V y, (¯ x − ¯ y ) T ¯ λ ¯ V y, λ ¯ V y (cid:21) + { } × { } × D ∗ ( ∂f ) (cid:18) ¯ y (cid:12)(cid:12)(cid:12) ∇ ω (cid:18) ¯ x − ¯ y ¯ λ (cid:19)(cid:19) ( y ) . Hence, ( r, γ, ∈ D ∗ S (¯ x, ¯ λ, ¯ y | y ) if and only if r = − λ ¯ V y, γ = (¯ x − ¯ y ) T ¯ λ ¯ V y, − λ ¯ V y ∈ D ∗ ( ∂f ) (cid:18) ¯ y (cid:12)(cid:12)(cid:12) ∇ ω (cid:18) ¯ x − ¯ y ¯ λ (cid:19)(cid:19) ( y ) . Invoke Lemma 3(a) and use ¯ V (cid:31) y = 0, hence r = 0 and γ = 0. Therefore, by [39,Theorem 9.56 (a)], we see that P has the Aubin property at (¯ x, ¯ λ ) for ¯ y , and since P is single-valued,it is locally Lipschitz at (¯ x, ¯ λ ).Part (c). With the deﬁnitions from Part (b), recall that the implication( r, γ, ∈ D ∗ S (¯ x, ¯ λ, ¯ y | y ) ⇒ ( r, γ ) = 0 , y = 0was proved. Now let0 ∈ D ∗ S (¯ x, ¯ λ, ¯ y | (cid:16) w (cid:17) = 1¯ λ ¯ V w + D ∗ ( ∂f ) (cid:18) ¯ y | ∇ ω (cid:18) ¯ x − ¯ y ¯ λ (cid:19)(cid:19) ( w ) , see [39, Exercise 10.43], i.e., − λ ¯ V w ∈ D ∗ ( ∂f ) (cid:18) ¯ y | ∇ ω (cid:18) ¯ x − ¯ y ¯ λ (cid:19)(cid:19) ( w ) . By Lemma 3(b), we ﬁnd that w = 0. Since ∂f is assumed to be proto-diﬀerentiable at (cid:0) ¯ y, ∇ ω (cid:0) ¯ x − ¯ y ¯ λ (cid:1)(cid:1) ,Lemma 4 yields that S is proto-diﬀerentiable at ((¯ x, ¯ λ, ¯ y ) , (cid:3) Remark 1 (Proto-differentiability of ∂f from full amenability). Let f ∈ Γ ( E )and ¯ x ∈ dom f . By [39, Corollary 13.41], there exists a neighborhood V of ¯ x such that ∂f is proto-diﬀerentiable at x ∈ V ∩ dom f for any v ∈ ∂f ( x ) if f is fully amenable at ¯ x in the sense that (on aneighborhood of ¯ x ) f = g ◦ F with g ∈ Γ ( E x ) piecewise linear-quadratic and F ∈ C ( E f , E x ) suchthat ker F (cid:48) (¯ x ) ∗ ∩ N cl (dom g ) ( F (¯ x )) = { } . 
This comprises the following special cases: • f ( x ) = max mi =1 f i ( x ) with f i ∈ Γ ( E ) ∩ C ; • f is (convex and) piecewise linear quadratic; • f is (convex and) twice continuously diﬀerentiable.Since a strongly convex function is both strictly convex and level-bounded (in fact supercoercive)and has positive deﬁnite Hessian everywhere, and since we have D ( ∂f ) = ∇ f wherever f is twicecontinuously diﬀerentiable, we immediately obtain the following result which, of course, can alsobe derived directly from the implicit function theorem. Corollary 2 (Diﬀerentiability of the solution map) . Let (¯ x, ¯ λ ) ∈ E × R ++ such that f ∈ Γ ( E ) is twice continuously diﬀerentiable around P ω,f (¯ x, ¯ λ ) , and let ω ∈ Γ ( E ) be strongly convexand twice continuously diﬀerentiable. Then P ω,f from (13) is continuously diﬀerentiable around (¯ x, ¯ λ ) . Concretely, for all ( x, λ ) suﬃciently close to (¯ x, ¯ λ ) and for all ( d, ∆) ∈ E × R , we have P (cid:48) ω,f ( x, λ )( d, ∆) = (cid:0) λ ∇ f ( y ) + V (cid:1) − (cid:20) V d − ∆ · V (cid:18) x − yλ (cid:19)(cid:21) , where y := P ω,f ( x, λ ) and V := ∇ ω (cid:0) x − yλ (cid:1) . .P. Friedlander, A. Goodwin, and T. Hoheisel: From perspective maps to epigraphical projections We now reﬁne our study of smoothness properties of the solutionmap P ω,f . We base our analysis on the notion of semismoothness* recently established by Gfrererand Outrata [23], which, in turn, relies on the notion of the directional normal cone introduced byGinchev and Mordukohovich [24] and further advanced by Gfrerer et al. [6, 21, 22].For ¯ x ∈ A ⊂ E , the directional normal cone in the direction ¯ u ∈ E is given by N (¯ x ; ¯ u ) = Lim sup u → ¯ u, t ↓ ˆ N A (¯ x + tu ) . Note that N (¯ x ; ¯ u ) = ∅ if ¯ u / ∈ T A (¯ x ) and that N (¯ x ; ¯ u ) ⊂ N A (¯ x ) for all u ∈ E . 
Given a set-valued map $S : \mathbb{E}_f \rightrightarrows \mathbb{E}_x$, based on the directional normal cone, we define the directional coderivative [21] $D^*S((\bar x, \bar y); (u,v)) : \mathbb{E}_x \rightrightarrows \mathbb{E}_f$ of $S$ at $(\bar x, \bar y) \in \operatorname{gph} S$ in the direction $(u,v)$ via
\[ D^*S((\bar x, \bar y); (u,v))(v^*) = \{ u^* \in \mathbb{E}_f \mid (u^*, -v^*) \in N_{\operatorname{gph} S}((\bar x, \bar y); (u,v)) \}. \]
As $N_A(\bar x; \bar u) = \emptyset$ if $\bar u \notin T_A(\bar x)$, we also have
\[ \operatorname{dom} D^*S((\bar x, \bar y); (u,v)) = \emptyset \quad \forall (u,v) \notin \operatorname{gph} DS(\bar x \mid \bar y). \tag{15} \]

Definition 1 (Semismoothness*). The set $A \subset \mathbb{E}$ is semismooth* at $\bar x \in A$ if
\[ \langle x^*, u \rangle = 0 \quad \forall u \in \mathbb{E},\ x^* \in N_A(\bar x; u). \]
The map $S : \mathbb{E} \rightrightarrows \mathbb{E}$ is semismooth* at $(\bar x, \bar y) \in \operatorname{gph} S$ if $\operatorname{gph} S$ is semismooth* at $(\bar x, \bar y)$, i.e.,
\[ \langle u, u^* \rangle = \langle v, v^* \rangle \quad \forall (u,v) \in \mathbb{E} \times \mathbb{E},\ (v^*, u^*) \in \operatorname{gph} D^*S((\bar x, \bar y); (u,v)). \]

The notion of metric (sub)regularity is used only in the next two results, and hence we refer the reader to the abundant literature for a definition, e.g., [18].

Proposition 2 (Metric regularity and semismoothness*). Let $F : \mathbb{E} \to \mathbb{E}$ be continuously differentiable at $\bar x$, let $Q \subset \mathbb{E}$ be semismooth* (as a set) at $F(\bar x)$ and let $S : \mathbb{E} \rightrightarrows \mathbb{E}$, $S(x) := F(x) - Q$ be metrically subregular at $(\bar x, 0)$. Then $F^{-1}(Q)$ is semismooth* at $\bar x$ (as a set).

Proof. By [6, Theorem 3.1], for any $h \in \mathbb{E}$,
\[ N_{F^{-1}(Q)}(\bar x; h) \subset F'(\bar x)^* N_Q(F(\bar x); F'(\bar x) h), \tag{16} \]
see also [6, Remark 2.1]. Since $Q$ is semismooth* at $F(\bar x)$,
\[ \langle v, z \rangle = 0 \quad \forall z \in \mathbb{E},\ v \in N_Q(F(\bar x); z). \]
Therefore
\[ \langle v, F'(\bar x) h \rangle = 0 \quad \forall h \in \mathbb{E},\ v \in N_Q(F(\bar x); F'(\bar x) h), \]
and hence
\[ \langle u, h \rangle = 0 \quad \forall h \in \mathbb{E},\ u \in F'(\bar x)^* N_Q(F(\bar x); F'(\bar x) h). \]
By (16) this implies that
\[ \langle u, h \rangle = 0 \quad \forall h \in \mathbb{E},\ u \in N_{F^{-1}(Q)}(\bar x; h), \]
i.e., $F^{-1}(Q)$ is semismooth* at $\bar x$. $\square$

Corollary 3 (Semismoothness* of the infimal convolution solution map). Let $f \in \Gamma_0(\mathbb{E})$, let $(\bar x, \bar\lambda) \in \mathbb{E} \times \mathbb{R}_{++}$ and let $\omega$ be strongly convex and twice continuously differentiable. Then the map $P_{\omega,f}$ from (13) is semismooth* at $((\bar x, \bar\lambda), P_{\omega,f}(\bar x, \bar\lambda))$ if $\partial f$ is semismooth* at $\big(P_{\omega,f}(\bar x, \bar\lambda), \nabla\omega\big(\tfrac{1}{\bar\lambda}[\bar x - P_{\omega,f}(\bar x, \bar\lambda)]\big)\big)$.

Proof. Without loss of generality, assume $\mathbb{E} = \mathbb{R}^n$. Let $F : \mathbb{R}^n \times \mathbb{R}_{++} \times \mathbb{R}^n \to \mathbb{R}^n \times \mathbb{R}^n$, $F(x,\lambda,z) := (z, \nabla\omega([x-z]/\lambda))$. Then for all $x, z \in \mathbb{R}^n$ and $\lambda > 0$, setting $V := \nabla^2\omega([x-z]/\lambda) \succ 0$, we have
\[ F'(x,\lambda,z) = \begin{pmatrix} 0 & 0 & I \\ \frac{1}{\lambda} V & -\frac{1}{\lambda^2} V (x-z) & -\frac{1}{\lambda} V \end{pmatrix}. \]
Hence, $\ker F'(x,\lambda,z)^* = \{0\}$ for all $x, z \in \mathbb{R}^n$, $\lambda > 0$. Thus, $(x,\lambda,z) \mapsto F(x,\lambda,z) - \operatorname{gph} \partial f$ is metrically regular. As $\operatorname{gph} P_{\omega,f} = F^{-1}(\operatorname{gph} \partial f)$, if $\partial f$ is semismooth* at
\[ \Big(P_{\omega,f}(\bar x, \bar\lambda), \nabla\omega\Big(\frac{\bar x - P_{\omega,f}(\bar x, \bar\lambda)}{\bar\lambda}\Big)\Big) = F(\bar x, \bar\lambda, P_{\omega,f}(\bar x, \bar\lambda)), \]
then, by Proposition 2, $P_{\omega,f}$ is semismooth* at $((\bar x, \bar\lambda), P_{\omega,f}(\bar x, \bar\lambda))$. $\square$

Corollary 3 provides a sufficient criterion for establishing semismoothness* of the solution map $P_{\omega,f}$ on the interior of its domain. It will be a topic of future research to exploit this on a broad scale, but we can immediately state the following result for a function $f \in \Gamma_0(\mathbb{E})$ which is either twice continuously differentiable or piecewise linear-quadratic (PLQ) in the sense of Rockafellar and Wets [39, Definition 10.20].

Proposition 3 (Semismoothness* of the subdifferential). For $f \in \Gamma_0(\mathbb{E})$, the subdifferential $\partial f$ is semismooth* at $(\bar x, \bar y) \in \operatorname{gph} \partial f$ under one of the following conditions:
(a) $f$ is twice continuously differentiable at $\bar x$;
(b) $f$ is piecewise linear-quadratic (in which case $\partial f$ is semismooth* on $\mathbb{E}$).

Proof. Assume condition (a) holds. If $f$ is twice continuously differentiable, then $D(\partial f)(\bar x \mid \bar y) = \nabla^2 f(\bar x) = D^*(\partial f)(\bar x \mid \bar y)$, see [39, Example 8.43]. Now let $(u,v) \in T_{\operatorname{gph} \partial f}(\bar x, \bar y)$, i.e., $v \in D(\partial f)(\bar x \mid \bar y)(u) = \{\nabla^2 f(\bar x) u\}$, and let $(x^*, y^*) \in N_{\operatorname{gph} \partial f}((\bar x, \bar y); (u,v)) \subset N_{\operatorname{gph} \partial f}(\bar x, \bar y)$, hence $x^* \in D^*(\partial f)(\bar x \mid \bar y)(-y^*) = \{-\nabla^2 f(\bar x) y^*\}$. Thus, we have
\[ \langle (x^*, y^*), (u,v) \rangle = \langle y^*, \nabla^2 f(\bar x) u \rangle - \langle \nabla^2 f(\bar x) y^*, u \rangle = 0. \]
Now assume condition (b) holds. It follows from [39, Proposition 12.30] that $\operatorname{gph} \partial f$ is a finite union of polyhedra. Then [23, Proposition 3.4/3.5] yields that $\operatorname{gph} \partial f$ is semismooth*, which gives the desired statement.
$\square$

We now consider an application of Theorem 1 to derive the variational properties of the optimal value of the constrained optimization problem
\[ v : (x,\lambda) \in \mathbb{E}_x \times \mathbb{R} \mapsto \inf_{u \in \mathbb{E}_f} \{ f(u) \mid L(u,x) \in \lambda S \}, \tag{17} \]
where $S \subset \mathbb{E}_\omega$ is a closed convex set. This function can be viewed as a special case of (3) with $\omega = \delta_S$. To see this, it is sufficient to note that
\[ \delta_S^\pi(z,\lambda) = \begin{cases} \delta_{\lambda S}(z) & \text{if } \lambda > 0, \\ \delta_{S^\infty}(z) & \text{if } \lambda = 0, \\ +\infty & \text{otherwise,} \end{cases} \]
and thus $L(u,x) \in \lambda S$ if and only if $\delta_S^\pi(L(u,x), \lambda)$ vanishes. Let $S^\circ := \{ v \mid \langle v, s \rangle \le 1 \ \forall s \in S \}$ be the polar of the set $S$.

The following result is an immediate consequence of the general study in Theorem 2.

Corollary 4 (Conjugate and subdifferential of the constrained value function). Let $v$ be given by (17) with $S \subset \mathbb{E}_\omega$ closed and convex, and assume that
\[ \exists\, u \in \operatorname{ri} \operatorname{dom} f,\ x \in \mathbb{E}_x : \ L(u,x) \in \mathbb{R}_{++}(\operatorname{ri} S). \]
Then the following hold.
(a) We have
\[ v^*(y,\mu) = \inf_w \{ f^*(w) \mid \exists\, a \in -\mu S^\circ : L^*(a) = (-w, y) \}. \]
If $S$ is a cone then
\[ v^*(y,\mu) = \inf_w \{ f^*(w) + \delta_{\mathbb{R}_-}(\mu) \mid (-w, y) \in L^*(S^\circ) \}. \]
(b) For any $(x,\lambda) \in \operatorname{dom} v$ and $\bar u \in \operatorname{argmin}_u \{ f(u) \mid L(u,x) \in \lambda S \}$,
\[ \partial v(x,\lambda) = \begin{cases} \{ (v, -\sigma_S(y)) \mid y \in N_S(L(\bar u, x)/\lambda),\ (0,v) \in D(\bar u, y) \} & \text{if } \lambda > 0, \\ \{ (v, -\beta) \mid \exists\, y \in N_{S^\infty}(L(\bar u, x)) \cap (\beta S^\circ) : (0,v) \in D(\bar u, y) \} & \text{if } \lambda = 0, \end{cases} \]
where $D(u,y) := \partial f(u) \times \{0\} + L^*(y)$. If $S$ is bounded (hence compact), then
\[ \partial v(x,\lambda) = \begin{cases} \{ (v, -\sigma_S(y)) \mid y \in N_S(L(\bar u, x)/\lambda),\ (0,v) \in D(\bar u, y) \} & \text{if } \lambda > 0, \\ \{ (v, -\beta) \mid \exists\, y \in \beta S^\circ : (0,v) \in D(\bar u, y) \} & \text{if } \lambda = 0. \end{cases} \]
(c) We have $v^* \in \Gamma_0(\mathbb{E}_x \times \mathbb{R})$ if and only if there exist $y \in \mathbb{E}_x$, $w \in \operatorname{dom} f^*$, $\beta \in \mathbb{R}$ such that $(-w, y) \in -\beta L^*(S^\circ)$. In this case, also $v \in \Gamma_0(\mathbb{E}_x \times \mathbb{R})$ and the infimum is attained when finite.

Proof. Part (a) follows from Theorem 2(a) with $\omega^* = \sigma_S$. If $S$ is a cone then $\omega^* = \delta_{S^\circ}$. Part (b) follows from Theorem 2(b), observing that $\omega^\infty = \delta_{S^\infty}$ and that $S^\infty = \{0\}$ if $S$ is bounded, in which case $N_{S^\infty} = \mathbb{E}_\omega$. Part (c) follows from (a) and Theorem 2(c). $\square$

As an immediate specialization of Corollary 4 we obtain a result on the value function
\[ v : (b,\lambda) \in \mathbb{R}^m \times \mathbb{R} \mapsto \inf_{x \in \mathbb{R}^n} \{ f(x) \mid \|Ax - b\| \le \lambda \}, \tag{18} \]
where $f \in \Gamma_0(\mathbb{R}^n)$, $A \in \mathbb{R}^{m \times n}$ is a matrix, and $\|\cdot\|$ is any norm on $\mathbb{R}^m$. Denote the associated dual norm by $\|\cdot\|_\circ$, and the unit ball of $\|\cdot\|$ by $\mathbb{B}$.

Corollary 5 (Relaxed linear constraints value function). If there exists a pair $(x,\lambda) \in \operatorname{dom} f \times \mathbb{R}_{++}$ such that $\|Ax - b\| < \lambda$, then the following hold.
(a) (conjugate) $v^*(y,\mu) = f^*(A^T y) + \delta_{\mu \mathbb{B}^\circ}(y)$, which is closed proper convex if and only if there exist $\beta$ and $y$ with $\|y\|_\circ \le \beta$ such that $A^T y \in \operatorname{dom} f^*$. In this case, $v$ is closed proper convex and the infimum is attained when finite.
(b) (subdifferential) For any $(b,\lambda) \in \operatorname{dom} v$ and $\bar x$ that achieves the infimum in (18) (and hence $\|A\bar x - b\| \le \lambda$),
\[ \partial v(b,\lambda) = \begin{cases} \{ (y, -\|y\|_\circ) \mid y \in N_{\mathbb{B}}([A\bar x - b]/\lambda),\ -A^T y \in \partial f(\bar x) \} & \text{if } \lambda > 0, \\ \{ (y, -\beta) \mid \|y\|_\circ \le \beta,\ -A^T y \in \partial f(\bar x) \} & \text{if } \lambda = 0. \end{cases} \]
(c) (primal existence) For $\lambda > 0$ and any $b \in \mathbb{R}^m$, if
\[ f^\infty(y) > 0 \quad \forall y \in \ker A \setminus \{0\}, \tag{19} \]
then $\operatorname{argmin}_x \{ f(x) + \delta_{\mathbb{B}}^\pi(Ax - b, \lambda) \} \ne \emptyset$. This holds, e.g., when $f$ is level-bounded or $\operatorname{rank} A = n$.

Proof. Part (a). The expression for the conjugate $v^*$ follows from Corollary 4(a) by observing that $L : (x,b) \mapsto Ax - b$ has adjoint $L^* : z \mapsto (A^T z, -z)$ and that $\sigma_{\mathbb{B}} = \|\cdot\|_\circ$. The remaining claims for Part (a) follow from Theorem 1.

Part (b) follows from Corollary 4(b) with the foregoing observations.

Part (c). For $\lambda > 0$ and $b \in \mathbb{R}^m$, the effective objective function in (18) is $\phi(x) := f(x) + \delta_{\lambda \mathbb{B}}(Ax - b)$. With $\hat x$ such that $\|A\hat x - b\| \le \lambda$, which exists by the hypothesis of the theorem, we have
\[ \big(\delta_{\lambda \mathbb{B}} \circ (A(\cdot) - b)\big)^\infty(x) = \sup_{\tau > 0} \delta_{\lambda \mathbb{B}}(A\hat x - b + \tau A x) = \delta_{\ker A}(x), \]
where the second identity uses the property that $\lambda \mathbb{B}$ is bounded. With [39, Exercise 3.29] we hence find that $\phi^\infty = f^\infty + \delta_{\ker A}$, which shows, using [39, Theorem 3.26], that $\phi$ is level-bounded if (19) holds. $\square$

4. Moreau envelope and proximal map

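In this section we outline existing and new results regarding the variational properties of the Moreau envelope and the proximal map of a closed proper convex function.

The Moreau envelope of $f \in \Gamma_0(\mathbb{E})$ is defined by
\[ e_\lambda f(x) := \min_{u \in \mathbb{E}} \Big\{ f(u) + \frac{1}{2\lambda} \|x - u\|^2 \Big\} \quad \forall x \in \mathbb{E},\ \lambda > 0, \]
which has a Lipschitz gradient given by $\nabla e_\lambda f(x) = \frac{1}{\lambda}(x - P_\lambda f(x))$.

The following result summarizes limiting properties of the Moreau envelope as $\lambda \downarrow 0$.

Proposition 4 (Convergence of the Moreau envelope). For $f \in \Gamma_0(\mathbb{E})$, the following hold as $\lambda \downarrow 0$:
(a) $e_\lambda f \overset{e}{\to} f$ and $e_\lambda f \overset{p}{\to} f$ (in fact $e_\lambda f(x) \uparrow f(x)$ for all $x \in \mathbb{E}$);
(b) $\lambda f \overset{e}{\to} \delta_{\operatorname{cl}(\operatorname{dom} f)}$;
(c) $\lambda e_\lambda f(x) \to \frac12 d^2_{\operatorname{cl}(\operatorname{dom} f)}(\bar x)$ as $x \to \bar x$;
(d) $\lambda \partial f$ converges to $N_{\operatorname{cl}(\operatorname{dom} f)}$ graphically in the sense of [39, Definition 5.32];
(e) for $x \in \operatorname{dom} \partial f$ we have $\nabla e_\lambda f(x) \to \operatorname{argmin}_{g \in \partial f(x)} \|g\|$.

Proof. Part (a). See, e.g., [39, Theorem 1.25, Proposition 7.4].

Part (b). By Lemma 1(b) and [38, Theorem 13.3], $\lambda \star f^* \overset{e}{\to} (f^*)^\infty = \sigma_{\operatorname{dom} f}$. Wijsman's theorem [39, Theorem 11.34] then yields $\lambda f = (\lambda \star f^*)^* \overset{e}{\to} \delta_{\operatorname{cl}(\operatorname{dom} f)}$.

Part (c). By Part (b), $\lambda f \overset{e}{\to} \delta_{\operatorname{cl}(\operatorname{dom} f)}$. Hence, by [39, Theorem 7.37], $\lambda e_\lambda f = e_1(\lambda f) \overset{c}{\to} e_1 \delta_{\operatorname{cl}(\operatorname{dom} f)} = \frac12 d^2_{\operatorname{cl}(\operatorname{dom} f)}$.

Part (d). Follows from Part (b) and Attouch [39, Theorem 12.35].

Part (e). See [2, Remark 3.32]. $\square$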
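The gradient formula and the limits in Proposition 4(a) and (e) can be checked numerically on a simple instance. The following is a minimal sketch, assuming the illustrative choice $f = |\cdot|$ on $\mathbb{R}$ (not an example taken from the paper), whose proximal map is soft-thresholding; the function names are hypothetical.

```python
import math


def prox_abs(x, lam):
    """Proximal map of f = |.| with parameter lam > 0 (soft-thresholding)."""
    return math.copysign(max(abs(x) - lam, 0.0), x)


def moreau_abs(x, lam):
    """Moreau envelope e_lam f(x), evaluated at the proximal point."""
    p = prox_abs(x, lam)
    return abs(p) + (x - p) ** 2 / (2.0 * lam)


def grad_moreau_abs(x, lam):
    """Gradient formula (1/lam) * (x - P_lam f(x))."""
    return (x - prox_abs(x, lam)) / lam


if __name__ == "__main__":
    x = 0.3
    for lam in (1.0, 0.5, 0.1, 0.01):
        # e_lam f(x) increases towards f(x) = |x| as lam decreases to 0, and the
        # gradient approaches the minimum-norm subgradient sgn(x) of |.| at x.
        print(lam, moreau_abs(x, lam), grad_moreau_abs(x, lam))
```

For this choice of $f$ the envelope is the Huber function, so the printed values can be compared against its closed form.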

Note that Proposition 4(e) implies that
\[ \forall \bar x \in \operatorname{dom} \partial f \ \ \exists K > 0 \ \ \forall \lambda > 0 : \quad \|P_\lambda f(\bar x) - \bar x\| \le K \lambda. \tag{20} \]
Proposition 4(a) suggests the following extension of the Moreau envelope at $\lambda = 0$:
\[ p_f : (x,\lambda) \in \mathbb{E} \times \mathbb{R} \mapsto \begin{cases} e_\lambda f(x) & \text{if } \lambda > 0, \\ f(x) & \text{if } \lambda = 0, \\ +\infty & \text{if } \lambda < 0. \end{cases} \]
This is the function $p_{\omega,f}$ from (11) with $\omega = \frac12\|\cdot\|^2$. Hence, we may rely on our general study on infimal convolution from Section 3.3 to understand the properties of this extension of the Moreau envelope.

Corollary 6 (Conjugate and subdifferential of the Moreau envelope). Let $f \in \Gamma_0(\mathbb{E})$. Then $p_f \in \Gamma_0(\mathbb{E} \times \mathbb{R})$ and
(a) $p_f^*(y,\mu) = f^*(y) + \delta_{\operatorname{epi} \frac12\|\cdot\|^2}(y, -\mu)$ and $p_f^* \in \Gamma_0(\mathbb{E} \times \mathbb{R})$;
(b) for all $(x,\lambda) \in \operatorname{dom} p_f$,
\[ \partial p_f(x,\lambda) = \begin{cases} \Big( \tfrac{1}{\lambda}[x - P_\lambda f(x)],\ -\tfrac12 \big\| \tfrac{1}{\lambda}[x - P_\lambda f(x)] \big\|^2 \Big) & \text{if } \lambda > 0, \\ \big\{ (v,\beta) \ \big|\ v \in \partial f(x),\ \tfrac12\|v\|^2 \le -\beta \big\} & \text{if } \lambda = 0. \end{cases} \]

Proof. We are in the situation of Corollary 1 with $\omega = \frac12\|\cdot\|^2$. In particular, the qualification condition (12) is trivially satisfied. $\square$

We now turn our attention to the proximal map. It is straightforward to show $P_\lambda f(x) \to x$ as $\lambda \downarrow 0$ for $x \in \operatorname{dom} f$. The following proposition, which generalizes this statement, can be derived from monotone operator theory [39, Theorem 12.37]. The proof that we provide here instead relies on epigraphical convergence.

Proposition 5 (Convergence of the proximal map). Let $f \in \Gamma_0(\mathbb{E})$ and $\bar x \in \mathbb{E}$. Then
\[ \lim_{\lambda \downarrow 0,\ x \to \bar x} P_\lambda f(x) = P_{\operatorname{cl}(\operatorname{dom} f)}(\bar x). \]

Proof. Let $\{\lambda_k\} \downarrow 0$, $\{x_k\} \to \bar x$, and $\phi_k(u) := \lambda_k f(u) + \frac12\|u - x_k\|^2$. Use Proposition 4(b) to deduce $\lambda_k f \overset{e}{\to} \delta_{\operatorname{cl}(\operatorname{dom} f)}$. Then because $\frac12\|(\cdot) - x_k\|^2 \overset{c}{\to} \frac12\|(\cdot) - \bar x\|^2$, we obtain $\phi_k \overset{e}{\to} \phi := \delta_{\operatorname{cl}(\operatorname{dom} f)} + \frac12\|(\cdot) - \bar x\|^2$; see [39, Theorem 7.46 b)]. Now observe that $P_{\lambda_k} f(x_k) = \operatorname{argmin} \phi_k$ and $P_{\operatorname{cl}(\operatorname{dom} f)}(\bar x) = \operatorname{argmin} \phi$. Since all functions $\phi_k$ are convex and $\phi$ is level-bounded (in fact, strongly convex), the sequence $\{\phi_k\}$ is, by [39, Exercise 7.32 c)], eventually level-bounded (in the sense of [39, p. 266]). Therefore, we can apply [39, Theorem 7.33], with $\varepsilon_k = 0$ ($k \in \mathbb{N}$), to deduce $P_{\lambda_k} f(x_k) \to P_{\operatorname{cl}(\operatorname{dom} f)}(\bar x)$. $\square$

We record the following auxiliary result.

Lemma 5. Let $f \in \Gamma_0(\mathbb{E})$ and fix positive scalars $\lambda$ and $\mu$. Then for all $x \in \mathbb{E}$,
\[ \begin{aligned} \frac{1}{2\mu}\big( \|P_\mu f(x) - x\|^2 - \|P_\lambda f(x) - x\|^2 + \|P_\mu f(x) - P_\lambda f(x)\|^2 \big) &\le f(P_\lambda f(x)) - f(P_\mu f(x)) \\ &\le \frac{1}{2\lambda}\big( \|P_\mu f(x) - x\|^2 - \|P_\lambda f(x) - x\|^2 - \|P_\mu f(x) - P_\lambda f(x)\|^2 \big), \end{aligned} \tag{21} \]
and
\[ \|P_\lambda f(x) - P_\mu f(x)\|^2 \le \frac{\mu - \lambda}{\lambda + \mu} \big( \|P_\mu f(x) - x\|^2 - \|P_\lambda f(x) - x\|^2 \big). \tag{22} \]

Proof. Fix $\bar x \in \mathbb{E}$ and set $P(\tau) := P_\tau f(\bar x)$ for all $\tau > 0$. To obtain the bounds in (21), use [39, Eq. 7(34)] to infer
\[ f(x) + \frac{1}{2\tau}\|x - \bar x\|^2 - f(P(\tau)) - \frac{1}{2\tau}\|P(\tau) - \bar x\|^2 \ge \frac{1}{2\tau}\|x - P(\tau)\|^2 \quad \forall \tau > 0,\ \forall x \in \mathbb{E}. \]
For $\tau = \lambda$ and $x = P(\mu)$, we hence obtain
\[ f(P(\mu)) + \frac{1}{2\lambda}\|P(\mu) - \bar x\|^2 - f(P(\lambda)) - \frac{1}{2\lambda}\|P(\lambda) - \bar x\|^2 \ge \frac{1}{2\lambda}\|P(\mu) - P(\lambda)\|^2. \]
Analogously, for $\tau = \mu$ and $x = P(\lambda)$, we find that
\[ f(P(\lambda)) + \frac{1}{2\mu}\|P(\lambda) - \bar x\|^2 - f(P(\mu)) - \frac{1}{2\mu}\|P(\mu) - \bar x\|^2 \ge \frac{1}{2\mu}\|P(\lambda) - P(\mu)\|^2. \]
Combining the last two inequalities now yields (21).

Next, use (21) to obtain
\[ \frac{1}{\mu}\big( \|P(\mu) - \bar x\|^2 - \|P(\lambda) - \bar x\|^2 + \|P(\lambda) - P(\mu)\|^2 \big) \le \frac{1}{\lambda}\big( \|P(\mu) - \bar x\|^2 - \|P(\lambda) - \bar x\|^2 - \|P(\lambda) - P(\mu)\|^2 \big), \]
or, equivalently,
\[ \Big( \frac{1}{\lambda} + \frac{1}{\mu} \Big) \|P(\lambda) - P(\mu)\|^2 \le \Big( \frac{1}{\lambda} - \frac{1}{\mu} \Big) \big( \|P(\mu) - \bar x\|^2 - \|P(\lambda) - \bar x\|^2 \big), \]
which is equivalent to the desired inequality (22). $\square$

Proposition 5 suggests the following extension of the proximal map of $f \in \Gamma_0(\mathbb{E})$:
\[ P_f : \mathbb{E} \times \mathbb{R} \rightrightarrows \mathbb{E}, \quad P_f(x,\lambda) := \begin{cases} P_\lambda f(x) & \text{if } \lambda > 0, \\ P_{\operatorname{cl}(\operatorname{dom} f)}(x) & \text{if } \lambda = 0, \\ \emptyset & \text{if } \lambda < 0. \end{cases} \]
The next result records continuity properties of $P_f$.

Corollary 7 (Lipschitz continuity of the proximal map). Let $f \in \Gamma_0(\mathbb{E})$. Then $P_f$ is continuous on $\operatorname{dom} P_f = \mathbb{E} \times \mathbb{R}_+$ and is locally Lipschitz on $\operatorname{int}(\operatorname{dom} P_f)$. If $\bar x \in \operatorname{dom} \partial f$, then $P_f$ is upper Lipschitz (or calm) at $(\bar x, 0)$, and the map $\mathbb{R}_+ \ni \mu \mapsto P_f(\bar x, \mu)$ is locally Lipschitz at $0$, i.e., there exist positive scalars $\kappa$ and $\varepsilon$ such that
\[ \|P_f(\bar x, 0) - P_f(x,\lambda)\| \le \kappa \|(\bar x - x, \lambda)\| \quad \forall (x,\lambda) \in B_\varepsilon(\bar x, 0) \cap \operatorname{dom} P_f, \tag{23a} \]
\[ \|P_f(\bar x, \lambda) - P_f(\bar x, \mu)\| \le \kappa |\mu - \lambda| \quad \forall \lambda, \mu \in [0, \varepsilon]. \tag{23b} \]

Proof.

The continuity up to the boundary of the domain follows from Proposition 5. The local Lipschitz continuity on $\operatorname{int}(\operatorname{dom} P_f)$ follows from Theorem 3 with $\omega = \frac12\|\cdot\|^2$.

Now assume that $\bar x \in \operatorname{dom} \partial f$, which implies $P_f(\bar x, 0) = \bar x \in \operatorname{dom} f$. Then for all $\lambda > 0$,
\[ \|P_f(x,\lambda) - P_f(\bar x, 0)\| \le \|P_\lambda f(x) - P_\lambda f(\bar x)\| + \|\bar x - P_\lambda f(\bar x)\| \le \|x - \bar x\| + K\lambda, \]
where $K > 0$ is the constant from (20) and we use that $P_\lambda f$ is 1-Lipschitz [3]. Set $\kappa := \max\{1, K\}$ to obtain (23a).

Let $P := P_f(\bar x, \cdot)$. By (23a), there exist positive scalars $\kappa$ and $\varepsilon$ such that $\|P(\tau) - \bar x\| \le \kappa \tau$ for all $\tau \in (0, \varepsilon]$. Hence for $\mu$ and $\lambda$ in $(0, \varepsilon]$,
\[ \begin{aligned} \|P(\mu) - P(\lambda)\|^2 &\le \frac{\mu - \lambda}{\mu + \lambda}\big( \|P(\mu) - \bar x\|^2 - \|P(\lambda) - \bar x\|^2 \big) \\ &= \frac{\mu - \lambda}{\mu + \lambda}\big( \|P(\mu) - \bar x\| - \|P(\lambda) - \bar x\| \big) \cdot \big( \|P(\mu) - \bar x\| + \|P(\lambda) - \bar x\| \big) \\ &\le \frac{\mu - \lambda}{\mu + \lambda}\, \kappa (\mu + \lambda) \big( \|P(\mu) - \bar x\| - \|P(\lambda) - \bar x\| \big) \\ &\le \kappa |\mu - \lambda| \cdot \|P(\mu) - P(\lambda)\|, \end{aligned} \]
where the first inequality follows from (22) of Lemma 5, and the last inequality uses the reverse triangle inequality. Use (23a) to obtain (23b). $\square$

The following example shows that the assumption $\bar x \in \operatorname{dom} \partial f$ required for (23) is not redundant.

Example 1 (Upper Lipschitz continuity of proximal map).

Consider the following two functions, both contained in $\Gamma_0(\mathbb{R})$:
\[ f(x) = \begin{cases} -\log x & \text{if } x > 0, \\ +\infty & \text{otherwise,} \end{cases} \qquad g(x) = \begin{cases} -\sqrt{x} & \text{if } x \ge 0, \\ +\infty & \text{otherwise.} \end{cases} \]
The corresponding extended proximal maps are
\[ P_f(x,\lambda) = \begin{cases} \frac12\big( x + \sqrt{x^2 + 4\lambda} \big) & \text{if } \lambda > 0, \\ \max\{x, 0\} & \text{if } \lambda = 0, \end{cases} \qquad P_g(0,\lambda) = \Big( \frac{\lambda}{2} \Big)^{2/3} \quad \forall \lambda \ge 0. \]
(a) Consider $P_f$. Observe that $\operatorname{dom} f$ does not include the origin, and $|P_f(0,0) - P_f(0,\lambda)| = \sqrt{\lambda}$ for all $\lambda > 0$, which is not upper Lipschitz at $(0,0)$.
(b) Consider $P_g$. Here $0 \in \operatorname{dom} g$, but $\operatorname{dom} \partial g$ does not include the origin, and $P_g$ is not upper Lipschitz at $(0,0)$. $\diamond$

The next result on directional differentiability of $P_f$ follows from Theorem 3(c) with $\omega = \frac12\|\cdot\|^2$.

Corollary 8 (Directional differentiability of the proximal map). Let $f \in \Gamma_0(\mathbb{E})$ and fix $(x,\lambda) \in \mathbb{E} \times \mathbb{R}_{++}$. If $\partial f$ is proto-differentiable at $\big( P_f(x,\lambda), \tfrac{1}{\lambda}[x - P_f(x,\lambda)] \big)$, then $P_f$ is directionally differentiable at $(x,\lambda)$ with
\[ P_f'((x,\lambda); (d,\Delta)) = \Big[ \lambda D(\partial f)\big( P_f(x,\lambda) \,\big|\, \tfrac{1}{\lambda}[x - P_f(x,\lambda)] \big) + I \Big]^{-1} \Big( d - \frac{\Delta}{\lambda}[x - P_f(x,\lambda)] \Big) \]
for all $(d,\Delta) \in \mathbb{E} \times \mathbb{R}$. In particular, for any $\lambda > 0$,
\[ (P_\lambda f)'(x; \cdot) = \Big[ \lambda D(\partial f)\big( P_f(x,\lambda) \,\big|\, \tfrac{1}{\lambda}[x - P_f(x,\lambda)] \big) + I \Big]^{-1}(\cdot). \]

We now establish semismoothness* of the extended proximal map $P_f$ on $\mathbb{E} \times \mathbb{R}_{++}$. We lead with an auxiliary result.

Lemma 6. The map $S : \mathbb{E} \rightrightarrows \mathbb{E}$ is semismooth* at $(y, z - y)$ if and only if $S + \mathrm{id}$ is semismooth* at $(y, z)$.

Proof. The map $S$ is semismooth* at $(y, z - y)$ if and only if
\[ \begin{aligned} & \big[\, v \in DS(y \mid z - y)(u),\ u^* \in D^*S((y, z - y); (u,v))(v^*) \,\big] \ \Rightarrow\ \langle u, u^* \rangle = \langle v, v^* \rangle \\ \iff\ & \big[\, v + u \in D(S + \mathrm{id})(y \mid z)(u),\ u^* + v^* \in D^*(S + \mathrm{id})((y,z); (u, u + v))(v^*) \,\big] \ \Rightarrow\ \langle u, u^* + v^* \rangle = \langle u + v, v^* \rangle \\ \iff\ & S + \mathrm{id} \text{ is semismooth* at } (y, z). \end{aligned} \]
Here the first equivalence is the definition of semismoothness* and (15). The second uses the sum rule for the graphical derivative [39, Exercise 10.43] and the directional coderivative [6, Corollary 5.3 (+ comment)], respectively, when one summand is smooth (here the identity map). The last equivalence is a variable change and the definition of semismoothness* and (15) again. $\square$

Proposition 6 (Semismoothness* of $P_f$). For $f \in \Gamma_0(\mathbb{E})$,
(a) $P_f$ is semismooth* at $(x,\lambda)$ if $\partial f$ is semismooth* at $\big( P_f(x,\lambda), \tfrac{1}{\lambda}[x - P_f(x,\lambda)] \big)$;
(b) $P_\lambda f$ is semismooth* at $x$ if and only if $\partial f$ is semismooth* at $\big( P_\lambda f(x), \tfrac{1}{\lambda}[x - P_\lambda f(x)] \big)$.

Proof. Part (a) follows from Corollary 3 with $\omega = \frac12\|\cdot\|^2$. For Part (b), observe that $P_\lambda f = (\lambda \partial f + \mathrm{id})^{-1}$ is semismooth* at $x$ if and only if $\lambda \partial f + \mathrm{id}$ is semismooth* at $(P_\lambda f(x), x)$ [23, p. 7]. By Lemma 6, this is the case if and only if $\lambda \partial f$ is semismooth* at $(P_\lambda f(x), x - P_\lambda f(x))$ which, in turn, holds if and only if $\partial f$ is semismooth* at $\big( P_\lambda f(x), \tfrac{1}{\lambda}[x - P_\lambda f(x)] \big)$. $\square$

Various papers study the semismoothness à la Qi and Sun [37] of $P_f$ on $\mathbb{E} \times \mathbb{R}_{++}$. Most of these results trace the semismoothness of the latter back to the semismoothness of the Euclidean projection onto $\operatorname{epi} f$. The work by Meng et al. [31, 32] deserves explicit mention, and a good discussion of these results can be found in Milzarek's thesis [33]. Bearing our applications in Section 7 in mind, this is somewhat of a circular strategy, and hence we opened up a different path via our study in Section 3.3.1 on semismooth* properties of solution maps. For a map that is locally Lipschitz at a point, semismoothness* differs from traditional semismoothness only in directional differentiability, as the following result by Gfrerer and Outrata [23, Corollary 3.8] shows.

Lemma 7 (Semismooth vs. semismooth*). Let $F : D \subset \mathbb{E} \to \mathbb{E}$ be locally Lipschitz at $x \in \operatorname{int} D$. Then the following are equivalent:
(a) $F$ is semismooth at $x$;
(b) $F$ is semismooth* and directionally differentiable at $x$.

This lemma gives the following immediate consequence about semismoothness of $P_f$.

Corollary 9 (Semismoothness of $P_f$). Let $f \in \Gamma_0(\mathbb{E})$ and fix $(x,\lambda) \in \mathbb{E} \times \mathbb{R}_{++}$. If $\partial f$ is proto-differentiable and semismooth* at $\big( P_f(x,\lambda), \tfrac{1}{\lambda}[x - P_f(x,\lambda)] \big)$, then $P_f$ is semismooth at $(x,\lambda)$. This holds, in particular, if $f$ is PLQ or twice continuously differentiable at $P_f(x,\lambda)$, in which case $P_f$ is continuously differentiable at $(x,\lambda)$.

Proof. For the first statement combine Corollary 8, Proposition 6, and Lemma 7. For the second, invoke Remark 1 and Proposition 3. $\square$

Note that semismoothness* does not require directional differentiability of the function in question. However, semismoothness* is still sufficient to yield convergence of Newton-type methods under suitable regularity conditions [23, 27]. In view of the above discussion, this is important because the Euclidean projector onto a closed convex set may not be directionally differentiable [40], in which case the arguments and methods based on (standard) semismoothness are invalidated.
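Inequality (22) of Lemma 5 and the failure of calmness in Example 1 can be illustrated numerically. The sketch below assumes $f(x) = -\log x$ from Example 1 and uses its closed-form proximal map; the helper names are hypothetical and the checks are illustrations, not proofs.

```python
import math


def prox_neg_log(x, lam):
    """Prox of f(x) = -log(x) for lam > 0: positive root of y**2 - x*y - lam = 0."""
    return 0.5 * (x + math.sqrt(x * x + 4.0 * lam))


def lemma5_gap(x, lam, mu):
    """Right-hand side minus left-hand side of (22); nonnegative if (22) holds."""
    p_lam = prox_neg_log(x, lam)
    p_mu = prox_neg_log(x, mu)
    lhs = (p_lam - p_mu) ** 2
    rhs = (mu - lam) / (lam + mu) * ((p_mu - x) ** 2 - (p_lam - x) ** 2)
    return rhs - lhs


def calmness_ratio(lam):
    """|P_f(0, lam) - P_f(0, 0)| / lam with P_f(0, 0) = 0; equals 1 / sqrt(lam)."""
    return abs(prox_neg_log(0.0, lam) - 0.0) / lam


if __name__ == "__main__":
    print(lemma5_gap(2.0, 0.1, 3.0))       # nonnegative, as (22) predicts
    for lam in (1e-2, 1e-4, 1e-6):
        print(lam, calmness_ratio(lam))    # grows without bound as lam -> 0
```

The unbounded ratio reflects that $0 \notin \operatorname{dom} \partial f$ here, so the upper Lipschitz estimate (23a) is not available at $(0,0)$.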

5. The proximal value

The projection onto the epigraph of a function $f \in \Gamma_0(\mathbb{E})$ requires a particular value of $\lambda$ so that the equation (2) holds. In this section we examine the variational properties of the value of the proximal map as a function of $\lambda$, i.e., the function
\[ 0 < \lambda \mapsto f(P_\lambda f(\bar x)), \tag{24} \]
where $\bar x \in \mathbb{E}$ is fixed. Note that this map is not generally convex, as illustrated by the following counterexample.

Example 2 (Nonconvexity of the proximal value). Define $f = |\cdot| + \delta_{[-1,1]} \in \Gamma_0(\mathbb{R})$. By Beck [4, Example 6.22],
\[ P_\lambda f(x) = \min\{ \max\{ |x| - \lambda, 0 \}, 1 \} \cdot \operatorname{sgn}(x) \quad \forall x \in \mathbb{R},\ \lambda > 0. \]
Hence, for $\bar x = 2$, we obtain the nonconvex function
\[ f(P_\lambda f(\bar x)) = \begin{cases} 1 & \text{if } \lambda \in (0, 1], \\ 2 - \lambda & \text{if } \lambda \in (1, 2], \\ 0 & \text{if } \lambda > 2. \end{cases} \]
$\diamond$

The next result describes the monotonicity and continuity of the map (24).

Corollary 10 (Monotonicity and continuity in $\lambda$). Let $f \in \Gamma_0(\mathbb{E})$ and fix $\bar x \in \mathbb{E}$. Then
(a) $0 < \lambda \mapsto f(P_\lambda f(\bar x))$ is decreasing (i.e., increasing as $\lambda \downarrow 0$);
(b) $0 < \lambda \mapsto \|\bar x - P_\lambda f(\bar x)\|$ is increasing;
(c) $\lim_{\lambda \downarrow 0} f(P_\lambda f(\bar x)) = f(P_{\operatorname{cl}(\operatorname{dom} f)}(\bar x))$.

Proof. Parts (a) and (b). Let $0 < \lambda < \mu$ and set $P(\lambda) := P_\lambda f(\bar x)$, $P(\mu) := P_\mu f(\bar x)$, and $\delta := \frac12\big( \|P(\mu) - \bar x\|^2 - \|P(\lambda) - \bar x\|^2 \big)$. Then from (21) of Lemma 5, we obtain
\[ \frac{1}{\mu}\delta \le f(P(\lambda)) - f(P(\mu)) \le \frac{1}{\lambda}\delta. \]
As $\lambda < \mu$, this implies that $\delta \ge 0$, i.e., $\|P(\mu) - \bar x\| \ge \|P(\lambda) - \bar x\|$, and hence $f(P(\mu)) \le f(P(\lambda))$.

Part (c). Let $\{\lambda_k\} \downarrow 0$. Then $p_k := P_{\lambda_k} f(\bar x) \to p := P_{\operatorname{cl}(\operatorname{dom} f)}(\bar x)$; see Proposition 5. It follows that
\[ f(p) \ge \limsup_{k \to \infty} \Big[ f(p_k) + \frac{1}{2\lambda_k}\big( \|\bar x - p_k\|^2 - \|\bar x - p\|^2 \big) \Big] \ge \limsup_{k \to \infty} f(p_k) \ge \liminf_{k \to \infty} f(p_k) \ge f(p). \]
Here the first inequality uses that $f(p) + \frac{1}{2\lambda_k}\|\bar x - p\|^2 \ge f(p_k) + \frac{1}{2\lambda_k}\|\bar x - p_k\|^2$ for all $k \in \mathbb{N}$, by definition of $p_k$. The second is due to $\|\bar x - p_k\| \ge \|\bar x - p\|$, by the definition of $p$ and since $p_k \in \operatorname{dom} f$. The last one is just lower semicontinuity of $f$. $\square$

As we did with the Moreau envelope and proximal map, we define the extension of the map (24) to include negative values of $\lambda$:
\[ \eta_{\bar x}^f : \lambda \in \mathbb{R} \mapsto \begin{cases} f(P_\lambda f(\bar x)) & \text{if } \lambda > 0, \\ f(P_{\operatorname{cl}(\operatorname{dom} f)}(\bar x)) & \text{if } \lambda \le 0. \end{cases} \]
We call this the proximal value function. Observe that
\[ \eta_{\bar x}^f(\lambda) = e_\lambda f(\bar x) - \frac{1}{2\lambda}\|\bar x - P_\lambda f(\bar x)\|^2 \quad (\lambda > 0). \tag{25} \]
We use Corollary 10 to derive the following result.

Corollary 11 (Continuity properties of the proximal value). Let $f \in \Gamma_0(\mathbb{E})$ and fix $\bar x \in \mathbb{E}$. Then the following hold:
(a) $\eta_{\bar x}^f$ is decreasing, continuous (possibly in an extended real-valued sense), and finite-valued if (and only if) $P_{\operatorname{cl}(\operatorname{dom} f)}(\bar x) = \bar x \in \operatorname{dom} f$.
(b) $\eta_{\bar x}^f$ is locally Lipschitz on $\mathbb{R}_{++}$.
(c) If $\bar x \in \operatorname{dom} \partial f$, then the assertion in (b) holds on $\mathbb{R}$.

Proof. Set $\eta := \eta_{\bar x}^f$. Parts (a) and (b). The fact that $\eta$ is decreasing follows from Corollary 10(a). Now consider (25). By Corollary 6, the map $0 < \lambda \mapsto e_\lambda f(\bar x)$ is convex and finite-valued, hence locally Lipschitz. By Corollary 7(a), this conclusion also holds for $0 < \lambda \mapsto \frac{1}{2\lambda}\|\bar x - P_\lambda f(\bar x)\|^2$. This gives the local Lipschitz continuity of $\eta$ on $\mathbb{R}_{++}$. The continuity at $0$ is due to Corollary 10(c).

Part (c). By Parts (a) and (b), and because $\eta$ is constant (and finite by assumption) on $\mathbb{R}_-$, we only need to be concerned about the desired properties at $0$. To this end, let $\mu > \lambda$. If $\lambda < 0$, then
\[ \Big| \frac{\eta(\mu) - \eta(\lambda)}{\mu - \lambda} \Big| \le \Big| \frac{\eta(\mu) - \eta(0)}{\mu - 0} \Big|. \]
Thus we can restrict ourselves to the case $0 \le \lambda < \mu$. Set $P(\tau) := P_\tau f(\bar x)$ for all $\tau > 0$ and $P(0) := \bar x$. Then by Corollary 7(c), there exist positive scalars $\varepsilon$ and $\kappa$ such that
\[ \|P(\mu) - P(\lambda)\| \le \kappa (\mu - \lambda) \quad \forall\, 0 \le \lambda \le \mu \le \varepsilon. \tag{26} \]
For $0 < \lambda < \mu \le \varepsilon$, we have
\[ \begin{aligned} |\eta(\lambda) - \eta(\mu)| &= \eta(\lambda) - \eta(\mu) \\ &\le \frac{1}{2\lambda}\big( \|P(\mu) - \bar x\|^2 - \|P(\lambda) - \bar x\|^2 - \|P(\mu) - P(\lambda)\|^2 \big) \\ &= \frac{1}{2\lambda}\big[ ( \|P(\mu) - \bar x\| - \|P(\lambda) - \bar x\| ) \cdot ( \|P(\mu) - \bar x\| + \|P(\lambda) - \bar x\| ) - \|P(\mu) - P(\lambda)\|^2 \big] \\ &\le \frac{1}{2\lambda}\|P(\mu) - P(\lambda)\| \cdot \big( \|P(\mu) - \bar x\| + \|P(\lambda) - \bar x\| - \|P(\mu) - P(\lambda)\| \big) \\ &\le \frac{\kappa}{2\lambda}|\mu - \lambda| \big( \|P(\mu) - \bar x\| + \|P(\lambda) - \bar x\| - ( \|P(\mu) - \bar x\| - \|P(\lambda) - \bar x\| ) \big) \\ &= \frac{\kappa}{\lambda}\|\bar x - P(\lambda)\| \cdot |\mu - \lambda| \\ &\le \kappa^2 |\mu - \lambda|. \end{aligned} \]
Here, the first identity follows from Corollary 10(a), and the first inequality uses inequality (21) of Lemma 5. The rest follows from the reverse triangle inequality and (26), recalling that $P(0) = \bar x$. $\square$

Remark 2.

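The requirement that $\bar x \in \operatorname{dom} \partial f$, made in Corollary 11, cannot be relaxed to $\bar x \in \operatorname{dom} f$. To see this, we again use Example 1(b) with $\bar x = 0$, where
\[ \eta_{\bar x}^g(\lambda) = \begin{cases} -(\lambda/2)^{1/3} & \text{if } \lambda \ge 0, \\ 0 & \text{if } \lambda < 0, \end{cases} \]
which is not locally Lipschitz at $\lambda = 0$. We also conclude from this example that the lack of calmness of the proximal map at $\lambda = 0$ is not necessarily compensated by applying $f$.

Under certain assumptions described by Corollary 12, we may interpret the extended proximal value function $\eta_{\bar x}^f$ as the (negative) derivative of the convex function
\[ \bar\phi_{\bar x}^f : \lambda \in \mathbb{R} \mapsto \begin{cases} -\lambda e_\lambda f(\bar x) & \text{if } \lambda > 0, \\ -\frac12 d^2_{\operatorname{cl}(\operatorname{dom} f)}(\bar x) & \text{if } \lambda = 0, \\ -\lambda f(P_{\operatorname{cl}(\operatorname{dom} f)}(\bar x)) - \frac12 d^2_{\operatorname{cl}(\operatorname{dom} f)}(\bar x) & \text{if } \lambda < 0; \end{cases} \tag{27} \]
cf. Attouch [2, Remark 3.32].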

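Corollary 12 (The function $\bar\phi_{\bar x}^f$). Let $f \in \Gamma_0(\mathbb{E})$ and fix $\bar x \in \mathbb{E}$. Then the following hold:
(a) $\bar\phi_{\bar x}^f$ is proper, convex and continuous (possibly in an extended real-valued sense), and continuously differentiable on $\mathbb{R}_{++}$ with $\frac{d}{d\lambda}\bar\phi_{\bar x}^f(\lambda) = -f(P_\lambda f(\bar x))$ locally Lipschitz for all $\lambda > 0$.
(b) If $\bar x \in \operatorname{dom} f$, then $\bar\phi_{\bar x}^f$ is continuously differentiable on $\mathbb{R}$ with derivative given by
\[ \frac{d}{d\lambda}\bar\phi_{\bar x}^f(\lambda) = -\eta_{\bar x}^f(\lambda) = \begin{cases} -f(P_\lambda f(\bar x)) & \text{if } \lambda > 0, \\ -f(P_{\operatorname{cl}(\operatorname{dom} f)}(\bar x)) & \text{if } \lambda \le 0. \end{cases} \]
If, more strictly, $\bar x \in \operatorname{dom} \partial f$, then this derivative is locally Lipschitz on all of $\mathbb{R}$.
(c) If $P_{\operatorname{cl}(\operatorname{dom} f)}(\bar x) \notin \operatorname{dom} f$, then $\operatorname{dom} \bar\phi_{\bar x}^f = \mathbb{R}_+$ and
\[ \partial\bar\phi_{\bar x}^f(\lambda) = \begin{cases} -f(P_\lambda f(\bar x)) & \text{if } \lambda > 0, \\ \emptyset & \text{if } \lambda \le 0. \end{cases} \]

Proof. Set $\bar\phi := \bar\phi_{\bar x}^f$. Part (a). It is an easy computation to see that
\[ 0 < \lambda \mapsto -\bar\phi(\lambda) = \inf_u \big\{ \lambda f(u) + \tfrac12\|u - \bar x\|^2 \big\} \]
is concave, i.e., $0 < \lambda \mapsto \bar\phi(\lambda)$ is convex. By setting $\bar\phi(0) = -\frac12 d^2_{\operatorname{cl}(\operatorname{dom} f)}(\bar x)$ and using Proposition 4(a), we see that $\bar\phi$ is a continuous convex function on $\mathbb{R}_+$, which is linearly extended to $\mathbb{R}_-$. All in all, $\bar\phi$ is convex, proper and continuous (possibly in an extended real-valued sense). From Corollary 6(b) (and the product rule) we infer, for all $\lambda > 0$, that
\[ \bar\phi'(\lambda) = -e_\lambda f(\bar x) - \lambda\Big( -\frac12 \Big\| \frac{1}{\lambda}[\bar x - P_\lambda f(\bar x)] \Big\|^2 \Big) = \frac{1}{2\lambda}\|\bar x - P_\lambda f(\bar x)\|^2 - e_\lambda f(\bar x) = -f(P_\lambda f(\bar x)), \]
where the last equality follows from (25). Hence, the local Lipschitz continuity follows from Corollary 11(b).

Part (b). Here we assume that $P_{\operatorname{cl}(\operatorname{dom} f)}(\bar x) \in \operatorname{dom} f$. Then by definition of $\bar\phi$, we have $\bar\phi'(\lambda) = -f(P_{\operatorname{cl}(\operatorname{dom} f)}(\bar x))$ for all $\lambda < 0$. It remains to establish the case $\lambda = 0$. To this end, use the subgradient inequality to deduce that $g \in \partial\bar\phi(0)$ if and only if $\bar\phi(0) + \lambda g \le \bar\phi(\lambda)$ for all $\lambda$, if and only if
\[ g + e_\lambda f(\bar x) - \frac{1}{2\lambda} d^2_{\operatorname{cl}(\operatorname{dom} f)}(\bar x) \le 0 \quad \forall \lambda > 0, \tag{28a} \]
\[ \lambda g + \lambda f(P_{\operatorname{cl}(\operatorname{dom} f)}(\bar x)) \le 0 \quad \forall \lambda < 0, \tag{28b} \]
hold simultaneously. (The case with $\lambda = 0$ holds trivially.) From (28a), we infer that
\[ \begin{aligned} g &\le \inf_{\lambda > 0} \frac{\frac12 d^2_{\operatorname{cl}(\operatorname{dom} f)}(\bar x) - \lambda e_\lambda f(\bar x)}{\lambda} \overset{(i)}{=} \inf_{\lambda > 0} \frac{\bar\phi(\lambda) - \bar\phi(0)}{\lambda} \overset{(ii)}{=} \lim_{\lambda \downarrow 0} \frac{\bar\phi(\lambda) - \bar\phi(0)}{\lambda} \\ &\overset{(iii)}{=} \lim_{\lambda \downarrow 0} \frac{\frac12 d^2_{\operatorname{cl}(\operatorname{dom} f)}(\bar x) - \lambda e_\lambda f(\bar x)}{\lambda} \overset{(iv)}{=} \lim_{\lambda \downarrow 0} \frac{-f(P_\lambda f(\bar x))}{1} \overset{(v)}{=} -f(P_{\operatorname{cl}(\operatorname{dom} f)}(\bar x)). \end{aligned} \]
Here, (i) is simply the definition of $\bar\phi$; (ii) holds because $\bar\phi$ is convex [38, Theorem 23.1]; (iii) follows from the definition of $\bar\phi$; and (iv) follows from l'Hôpital's rule, which is applicable because the last limit exists by Corollary 10(c), which implies (v). Hence, (28a) is equivalent to $g \le -f(P_{\operatorname{cl}(\operatorname{dom} f)}(\bar x))$. Combined with (28b), which is equivalent to $g \ge -f(P_{\operatorname{cl}(\operatorname{dom} f)}(\bar x))$, this establishes that $\partial\bar\phi(0) = \{ -f(P_{\operatorname{cl}(\operatorname{dom} f)}(\bar x)) \}$. Thus, when $P_{\operatorname{cl}(\operatorname{dom} f)}(\bar x) \in \operatorname{dom} f$, $\bar\phi$ is differentiable, and hence continuously differentiable by convexity [38, Corollary 25.5.1]. The remainder follows from Corollary 11(c).

Part (c). Here we assume that $P_{\operatorname{cl}(\operatorname{dom} f)}(\bar x) \notin \operatorname{dom} f$. Suppose $g \in \partial\bar\phi(0)$, i.e., analogous to some arguments in Part (b),
\[ g \le \frac{1}{2\lambda} d^2_{\operatorname{cl}(\operatorname{dom} f)}(\bar x) - e_\lambda f(\bar x) \quad \forall \lambda > 0. \]
On the other hand, using e.g. Corollary 10(b), we have
\[ \frac{1}{2\lambda} d^2_{\operatorname{cl}(\operatorname{dom} f)}(\bar x) - e_\lambda f(\bar x) = \frac{1}{2\lambda}\|\bar x - P_{\operatorname{cl}(\operatorname{dom} f)}(\bar x)\|^2 - \frac{1}{2\lambda}\|\bar x - P_\lambda f(\bar x)\|^2 - f(P_\lambda f(\bar x)) \le -f(P_\lambda f(\bar x)). \]
Since $-f(P_\lambda f(\bar x)) \to -\infty$ as $\lambda \downarrow 0$, this concludes the proof. $\square$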
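The derivative formula of Corollary 12(a) and the nonconvexity observed in Example 2 can be illustrated together. The sketch below assumes the setting of Example 2, $f = |\cdot| + \delta_{[-1,1]}$ with $\bar x = 2$, and compares a finite-difference derivative of $\bar\phi$ with $-\eta$; the function names and the finite-difference step are illustrative choices.

```python
import math

X_BAR = 2.0  # the fixed point x-bar used in Example 2


def prox_f(x, lam):
    """Prox of f = |.| + indicator of [-1, 1], using the formula from Example 2."""
    return math.copysign(min(max(abs(x) - lam, 0.0), 1.0), x)


def eta(lam, x_bar=X_BAR):
    """Proximal value: f(P_lam f(x_bar)) if lam > 0, else f(P_cl(dom f)(x_bar))."""
    p = prox_f(x_bar, lam) if lam > 0 else max(-1.0, min(1.0, x_bar))
    return abs(p)  # f(p) = |p| because p always lies in [-1, 1]


def phi_bar(lam, x_bar=X_BAR):
    """First branch of (27): -lam * e_lam f(x_bar), valid for lam > 0."""
    p = prox_f(x_bar, lam)
    return -(lam * abs(p) + 0.5 * (p - x_bar) ** 2)


def fd_derivative(fun, lam, h=1e-6):
    """Central finite difference, used only as a numerical sanity check."""
    return (fun(lam + h) - fun(lam - h)) / (2.0 * h)


if __name__ == "__main__":
    for lam in (0.5, 1.5, 3.0):
        print(lam, eta(lam), fd_derivative(phi_bar, lam))  # derivative ~ -eta(lam)
    # Midpoint test for convexity fails at lam = 1 between 0.5 and 1.5.
    print(eta(1.0) > 0.5 * (eta(0.5) + eta(1.5)))
```

In this instance $\bar\phi$ is convex while its negative derivative $\eta$ is decreasing but not convex, which matches the statements above.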

In view of the properties of the proximal value function, as outlined by Corollary 11, the question of semismoothness of $\eta^f_{\bar x}$ on $\mathbb{R}_{++}$ arises naturally. Now consider the expression (25). The map $0 < \lambda \mapsto e_\lambda f(\bar x)$ is continuously differentiable by Corollary 6(a), hence semismooth [20, Proposition 7.4.5]. Moreover, the map $0 < \lambda \mapsto \tfrac{1}{2\lambda}\|\bar x - P_\lambda f(\bar x)\|^2$ is semismooth if $0 < \lambda \mapsto P_\lambda f(\bar x)$ is semismooth [20, Proposition 7.4.4]. Thus, when the latter holds, we can conclude that $\eta^f_{\bar x}$ is semismooth. We can in addition use Corollary 9, which establishes conditions for the semismoothness of the map $(x, \lambda) \in \mathbb{E} \times \mathbb{R}_{++} \mapsto P_\lambda f(x)$, to obtain the following result.

Proposition 7 (Semismoothness of the proximal value function). Let $f \in \Gamma_0(\mathbb{E})$ and $\bar x \in \mathbb{E}$. Then $\eta^f_{\bar x}$ is semismooth at $\bar\lambda > 0$ if $\partial f$ is proto-differentiable and semismooth* at $\big(P_{\bar\lambda} f(\bar x),\ \tfrac{1}{\bar\lambda}[\bar x - P_{\bar\lambda} f(\bar x)]\big)$. This is the case under either of the following conditions:
(a) (PLQ case) $f$ is piecewise-linear quadratic.
(b) ($C^2$ case) $f$ is twice continuously differentiable around $P_{\bar\lambda} f(\bar x)$. In this case, $\eta^f_{\bar x}$ is continuously differentiable.
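As a small numerical illustration of the objects in Proposition 7, the following Python sketch evaluates the proximal value function for the 1-norm. It assumes, consistently with the root characterization in Corollary 14(b), that $\eta^f_{\bar x}(\lambda) = e_\lambda f(\bar x) - \tfrac{1}{2\lambda}\|\bar x - P_\lambda f(\bar x)\|^2 = f(P_\lambda f(\bar x))$, and that this quantity is the derivative of $\lambda \mapsto \lambda e_\lambda f(\bar x)$. The test point `x_bar` and all function names are hypothetical choices made for this sketch; they do not come from the paper's code.

```python
import math

def prox_l1(x, lam):
    """Proximal map P_lam f(x) for f = ||.||_1 (componentwise soft-thresholding)."""
    return [math.copysign(max(abs(v) - lam, 0.0), v) for v in x]

def l1_norm(x):
    return sum(abs(v) for v in x)

def moreau_envelope_l1(x, lam):
    """e_lam f(x) = f(p) + ||x - p||^2 / (2 lam), with p = P_lam f(x)."""
    p = prox_l1(x, lam)
    return l1_norm(p) + sum((a - b) ** 2 for a, b in zip(x, p)) / (2.0 * lam)

def eta(x, lam):
    """Assumed form of the proximal value function: e_lam f(x) - ||x - p||^2 / (2 lam)."""
    p = prox_l1(x, lam)
    return moreau_envelope_l1(x, lam) - sum((a - b) ** 2 for a, b in zip(x, p)) / (2.0 * lam)

def scaled_envelope(x, lam):
    """The map lam -> lam * e_lam f(x), whose derivative should equal eta."""
    return lam * moreau_envelope_l1(x, lam)

x_bar = [-2.0, 0.5, 1.0, 0.3]  # hypothetical test point
for lam in (0.2, 0.7, 1.5):    # values chosen away from the kinks |x_i|
    h = 1e-6
    finite_diff = (scaled_envelope(x_bar, lam + h) - scaled_envelope(x_bar, lam - h)) / (2 * h)
    print(lam, eta(x_bar, lam), l1_norm(prox_l1(x_bar, lam)), finite_diff)
```

For the 1-norm, $\eta^f_{\bar x}$ is piecewise affine in $\lambda$, which matches the PLQ case (a) of the proposition.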

6. Post-composition envelopes and proximal maps

Given functions $\psi \in \Gamma_0(\mathbb{E})$ and $g \in \Gamma_0(\mathbb{R})$, we consider the composition

$$(g \circ \psi)(x) := \begin{cases} g(\psi(x)) & \text{if } x \in \operatorname{dom}\psi, \\ +\infty & \text{otherwise.} \end{cases}$$

It is well known that $g \circ \psi$ is closed proper convex if $g$ is increasing and the intersection $\psi(\mathbb{E}) \cap \operatorname{dom} g$ is nonempty; see, for example, Hiriart-Urruty and Lemaréchal [25, Theorem B.2.1.7], who describe this operation as post-composition. We establish variational formulas for the Moreau envelope and proximal map of the composition $g \circ \psi$ under a regularity assumption involving the intersection of domains. These results provide us with tools to infer properties of projections onto the epigraph and level sets of a closed proper convex function, as covered in Section 7.

Proposition 8 (Post-composition, Moreau envelopes, and proximal maps). Let $g \in \Gamma_0(\mathbb{R})$ be increasing and let $\psi \in \Gamma_0(\mathbb{E})$ be such that

$$(\operatorname{ri}\operatorname{dom} g) \cap \psi(\operatorname{ri}\operatorname{dom}\psi) \ne \emptyset. \tag{29}$$

Then the following properties hold.
(a) $e_1(g \circ \psi)(\bar x) = -\min_{\lambda \ge 0} \{ g^*(\lambda) + \bar\phi^\psi_{\bar x}(\lambda) \}$, where $\bar\phi^\psi_{\bar x}$ is given by (27).
(b) $P_1(g \circ \psi)(\bar x) = P_1(\bar\lambda \cdot \psi)(\bar x)$ for every $\bar\lambda \in \operatorname{argmin}_{\lambda \ge 0} \{ g^*(\lambda) + \bar\phi^\psi_{\bar x}(\lambda) \} \ne \emptyset$.
(c) If $\psi(P_{\operatorname{cl}(\operatorname{dom}\psi)}(\bar x)) \notin \partial g^*(0)$, then $\operatorname{argmin}_{\lambda \ge 0} \{ g^*(\lambda) + \bar\phi^\psi_{\bar x}(\lambda) \} \subset \mathbb{R}_{++}$. This is, in particular, the case if $P_{\operatorname{cl}(\operatorname{dom}\psi)}(\bar x) \notin \operatorname{dom}\psi$.

Proof. Part (a). We find that

$$\begin{aligned} e_1(g \circ \psi)(\bar x) &= \min_{x \in \mathbb{E}} \big\{ \tfrac12\|x - \bar x\|^2 + (g \circ \psi)(x) \big\} \\ &= -\big( \tfrac12\|(\cdot) - \bar x\|^2 + g \circ \psi \big)^*(0) \\ &= \max_{y \in \mathbb{E},\, \lambda \ge 0} -\big\{ g^*(\lambda) + \tfrac12\|y\|^2 + \langle \bar x, y \rangle + (\lambda \cdot \psi)^*(-y) \big\} \\ &= \max_{\lambda \ge 0} \big\{ -g^*(\lambda) + \max_{y \in \mathbb{E}} \big[ -\tfrac12\|y\|^2 - \langle \bar x, y \rangle - (\lambda \cdot \psi)^*(-y) \big] \big\} \\ &= \max_{\lambda \ge 0}\ -g^*(\lambda) - \bar\phi^\psi_{\bar x}(\lambda). \end{aligned}$$
Here, the third identity uses [10, Corollary 3] with $f := \tfrac12\|(\cdot) - \bar x\|^2$, $F := \psi$, and $K = \mathbb{R}_+$, realizing that (29) is equivalent to the qualification condition [10, Equation (17)] because $\operatorname{dom} g - \mathbb{R}_+ = \operatorname{dom} g$, and observing that attainment is guaranteed by finiteness of the left-hand side. The last identity uses Fenchel duality [38, Theorem 31.1] and the definition of $\bar\phi^\psi_{\bar x}$ in (27).

Part (b). Note that by [10, Corollary 4],

$$\partial(g \circ \psi)(x) = \bigcup_{\lambda \in \partial g(\psi(x))} \partial(\lambda \cdot \psi)(x) \quad \forall x \in \operatorname{dom} g \circ \psi, \tag{30}$$

and observe that $\partial g(t) \subset \mathbb{R}_+$ for all $t \in \mathbb{R}$ because $g$ is increasing. Next, observe that

$$\begin{aligned} \bar\lambda \in \operatorname{argmin}_{\lambda \ge 0} \{ g^*(\lambda) + \bar\phi^\psi_{\bar x}(\lambda) \},\ \bar u = P_1(\bar\lambda \cdot \psi)(\bar x) &\overset{(i)}{\iff} 0 \in \partial g^*(\bar\lambda) + \partial\bar\phi^\psi_{\bar x}(\bar\lambda),\ \bar u = P_1(\bar\lambda \cdot \psi)(\bar x) \\ &\overset{(ii)}{\iff} \psi(\bar u) \in \partial g^*(\bar\lambda),\ \bar u = P_1(\bar\lambda \cdot \psi)(\bar x) \\ &\overset{(iii)}{\iff} \bar\lambda \in \partial g(\psi(\bar u)),\ \bar u = P_1(\bar\lambda \cdot \psi)(\bar x) \\ &\overset{(iv)}{\iff} \bar\lambda \in \partial g(\psi(\bar u)),\ 0 \in \bar u - \bar x + \partial(\bar\lambda \cdot \psi)(\bar u) \\ &\overset{(v)}{\implies} \bar u = P_1(g \circ \psi)(\bar x). \end{aligned}$$

Equivalence (i) is valid because $\operatorname{int}(\operatorname{dom} g^*) \subset \mathbb{R}_{++} \subset \operatorname{int}(\operatorname{dom}\bar\phi^\psi_{\bar x})$; see [10, Lemma 4] and Corollary 12, respectively. Corollary 12(b) justifies equivalence (ii). Equivalence (iii) is the inversion formula for the subdifferential [38, Corollary 23.5.1]. Equivalence (iv) uses the optimality conditions that uniquely determine $\bar u = P_1(\bar\lambda \cdot \psi)(\bar x)$. Implication (v) follows from (30) and the optimality conditions that uniquely determine $P_1(g \circ \psi)(\bar x)$. Taken together, we deduce that for any $\bar\lambda \in \operatorname{argmin}_{\lambda \ge 0} \{ g^*(\lambda) + \bar\phi^\psi_{\bar x}(\lambda) \}$, we have $P_1(g \circ \psi)(\bar x) = P_1(\bar\lambda \cdot \psi)(\bar x)$. The fact that $\operatorname{argmin}_{\lambda \ge 0} \{ g^*(\lambda) + \bar\phi^\psi_{\bar x}(\lambda) \} \ne \emptyset$ follows from Part (a).

Part (c). Recall from Part (b) that $0 \in \operatorname{argmin}_{\lambda \ge 0} \{ g^* + \bar\phi^\psi_{\bar x} \}$ entails $0 \in \partial g^*(0) + \partial\bar\phi^\psi_{\bar x}(0)$. In view of Corollary 12(c), we must have $P_{\operatorname{cl}(\operatorname{dom}\psi)}(\bar x) \in \operatorname{dom}\psi$, in which case $\partial\bar\phi^\psi_{\bar x}(0) = \{-\psi(P_{\operatorname{cl}(\operatorname{dom}\psi)}(\bar x))\}$, by Corollary 12(b). This proves the claim. □
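Proposition 8 can be checked numerically on a one-dimensional toy instance. The Python sketch below is an illustration under assumptions of our own choosing, not an example from the paper: it takes $g(t) = \max\{t, 0\}$ (increasing, with $g^* = \delta_{[0,1]}$) and $\psi(u) = |u| - 1$, so that $\bar\phi^\psi_{\bar x}(\lambda) = -\min_u \{\lambda\psi(u) + \tfrac12 (u - \bar x)^2\}$. It then compares the proximal point obtained from the scalar dual problem with a brute-force minimization on a grid. All function names and grid sizes are hypothetical.

```python
import math

def prox_psi(x, lam):
    """P_lam psi(x) for psi(u) = |u| - 1 (soft-thresholding; the constant drops out)."""
    return math.copysign(max(abs(x) - lam, 0.0), x)

def phi_bar(x, lam):
    """phi_bar(lam) = -min_u { lam * psi(u) + (u - x)^2 / 2 } for lam >= 0."""
    u = prox_psi(x, lam)
    return -(lam * (abs(u) - 1.0) + 0.5 * (u - x) ** 2)

def dual_minimizer(x, n=10001):
    """Grid search for argmin of g*(lam) + phi_bar(lam); g* is the indicator of [0, 1]."""
    grid = [i / (n - 1) for i in range(n)]
    return min(grid, key=lambda lam: phi_bar(x, lam))

def direct_prox(x, lo=-5.0, hi=5.0, n=10001):
    """Brute-force prox of g(psi(u)) = max(|u| - 1, 0) on a uniform grid."""
    grid = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    return min(grid, key=lambda u: max(abs(u) - 1.0, 0.0) + 0.5 * (u - x) ** 2)

for x_bar in (3.0, 1.5, 0.4):
    lam_bar = dual_minimizer(x_bar)
    u_dual = prox_psi(x_bar, lam_bar)        # Proposition 8(b)
    u_direct = direct_prox(x_bar)
    envelope = max(abs(u_direct) - 1.0, 0.0) + 0.5 * (u_direct - x_bar) ** 2
    # Proposition 8(a): the envelope should match -phi_bar(lam_bar), since g*(lam_bar) = 0
    print(x_bar, lam_bar, u_dual, u_direct, envelope, -phi_bar(x_bar, lam_bar))
```

The grid search stands in for a proper scalar solver and is only meant to make the duality relation visible.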

7. Epigraphical and level-set projections

We are now equipped to answer the initial question about computing epigraphical and level-set projections via proximal mappings. Our approach is based on the Moreau envelopes of the indicator functions to the epigraph and level set of a function $f$, which we express as the post-compositions

$$\delta_{\operatorname{lev}_\alpha f} = \delta_{\mathbb{R}_-} \circ (f(\cdot) - \alpha) \quad\text{and}\quad \delta_{\operatorname{epi} f} = \delta_{\mathbb{R}_-} \circ (f(\cdot) - (\cdot)).$$

Proposition 8 provides the required tools.

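Corollary 13 (Level-set projection). Let $f \in \Gamma_0(\mathbb{E})$, $(\bar x, \bar\alpha) \in \mathbb{E} \times \mathbb{R}$, and assume there exists $\hat x \in \mathbb{E}$ such that $f(\hat x) < \bar\alpha$. Then the following statements hold.
(a) (Dual representation of distance to level set) $\tfrac12 d^2_{\operatorname{lev}_{\bar\alpha} f}(\bar x) = -\min_{\lambda \ge 0} \{ \bar\phi^f_{\bar x}(\lambda) + \bar\alpha\lambda \}$.
(b) (Projection onto level set)

$$P_{\operatorname{lev}_{\bar\alpha} f}(\bar x) = \begin{cases} P_{\operatorname{cl}(\operatorname{dom} f)}(\bar x) & \text{if } f(P_{\operatorname{cl}(\operatorname{dom} f)}(\bar x)) \le \bar\alpha, \\ P_{\bar\lambda} f(\bar x) & \text{otherwise,} \end{cases}$$

for any positive $\bar\lambda$ in the optimal solution set $\operatorname{argmin}_{\lambda \ge 0} \{ \bar\phi^f_{\bar x}(\lambda) + \bar\alpha\lambda \} = \{ \lambda \ge 0 \mid f(P_\lambda f(\bar x)) = \bar\alpha \} \ne \emptyset$.

Proof. Set $g := \delta_{\mathbb{R}_-}$ and $\psi : x \in \mathbb{E} \mapsto f(x) - \bar\alpha$. Then $g \in \Gamma_0(\mathbb{R})$ is increasing and $\psi \in \Gamma_0(\mathbb{E})$ with $\operatorname{dom}\psi = \operatorname{dom} f$ and $\delta_{\operatorname{lev}_{\bar\alpha} f} = g \circ \psi$. Now observe that (29) applied to this setting is equivalent to saying that there exists $\bar y \in \operatorname{ri}(\operatorname{dom} f)$ such that $f(\bar y) < \bar\alpha$. We (only) assume that there exists $\hat x \in \operatorname{dom} f$ such that $f(\hat x) < \bar\alpha$. However, take any $z \in \operatorname{ri}(\operatorname{dom} f)$; then, by the line segment principle [38, Theorem 6.1], we have $y_\lambda := \lambda z + (1 - \lambda)\hat x \in \operatorname{ri}(\operatorname{dom} f)$ for all $\lambda \in (0, 1]$. By convexity, $f(y_\lambda) \le \lambda f(z) + (1 - \lambda) f(\hat x) \to f(\hat x) < \bar\alpha$ as $\lambda \downarrow 0$. Hence there exists $\hat\lambda \in (0, 1]$ sufficiently small such that $f(y_{\hat\lambda}) < \bar\alpha$. Hence $\hat y := y_{\hat\lambda} \in \operatorname{ri}(\operatorname{dom} f)$ with $f(\hat y) < \bar\alpha$, and (29) holds.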

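Part (a). For all $\lambda \ge 0$,

$$\bar\phi^\psi_{\bar x}(\lambda) = \begin{cases} -\lambda e_\lambda\psi(\bar x) & \text{if } \lambda > 0, \\ -\tfrac12 d^2_{\operatorname{cl}(\operatorname{dom}\psi)}(\bar x) & \text{if } \lambda = 0, \end{cases} = \begin{cases} -\lambda (e_\lambda f(\bar x) - \bar\alpha) & \text{if } \lambda > 0, \\ -\tfrac12 d^2_{\operatorname{cl}(\operatorname{dom} f)}(\bar x) & \text{if } \lambda = 0, \end{cases} = \bar\phi^f_{\bar x}(\lambda) + \bar\alpha\lambda.$$

Use Proposition 8(a) and the fact that $g^* = \delta_{\mathbb{R}_+}$ to deduce that

$$\tfrac12 d^2_{\operatorname{lev}_{\bar\alpha} f}(\bar x) = e_1\delta_{\operatorname{lev}_{\bar\alpha} f}(\bar x) = e_1(g \circ \psi)(\bar x) = -\min_{\lambda \ge 0} \{ \bar\phi^f_{\bar x}(\lambda) + \bar\alpha\lambda \}.$$

Part (b). The equality of the two sets in question is clear from the (necessary and sufficient) optimality conditions and Corollary 12. The rest follows from Proposition 8, Parts (b) and (c), because $P_{\operatorname{lev}_{\bar\alpha} f}(\bar x) = P_1(g \circ \psi)(\bar x)$. □

Corollary 14 (Epigraphical projection). Let $f \in \Gamma_0(\mathbb{E})$ and $(\bar x, \bar\alpha) \in \mathbb{E} \times \mathbb{R}$. Then the following statements hold.
(a) (Dual representation of distance to epigraph) $\tfrac12 d^2_{\operatorname{epi} f}(\bar x, \bar\alpha) = -\min_{\lambda \ge 0} \{ \bar\phi^f_{\bar x}(\lambda) + \bar\alpha\lambda + \tfrac12\lambda^2 \}$.
(b) (Projection onto epigraph)

$$P_{\operatorname{epi} f}(\bar x, \bar\alpha) = \begin{cases} [P_{\operatorname{cl}(\operatorname{dom} f)}(\bar x),\ \bar\alpha] & \text{if } f(P_{\operatorname{cl}(\operatorname{dom} f)}(\bar x)) \le \bar\alpha, \\ [P_{\bar\lambda} f(\bar x),\ \bar\alpha + \bar\lambda] & \text{otherwise,} \end{cases}$$

where $\bar\lambda > 0$ is the unique solution of the strongly convex optimization problem $\min_{\lambda \ge 0} \tfrac12\lambda^2 + \bar\alpha\lambda + \bar\phi^f_{\bar x}(\lambda)$. Equivalently, $\bar\lambda$ is the unique root of the strictly decreasing function $0 < \lambda \mapsto f(P_\lambda f(\bar x)) - \lambda - \bar\alpha$.

Proof. Analogous to the proof of Corollary 13, we define closed proper convex functions $g := \delta_{\mathbb{R}_-}$ and $\psi : (x, \alpha) \in \mathbb{E} \times \mathbb{R} \mapsto f(x) - \alpha$ so that $\delta_{\operatorname{epi} f} = g \circ \psi$. Therefore, $\psi(\operatorname{ri}(\operatorname{dom}\psi)) = \psi(\operatorname{ri}(\operatorname{dom} f) \times \mathbb{R}) = f(\operatorname{ri}(\operatorname{dom} f)) - \mathbb{R} = \mathbb{R}$, and thus the qualification condition (29) is trivially satisfied in this setting.

Part (a). Note that $e_\lambda\psi(x, \alpha) = e_\lambda f(x) + e_\lambda(-\operatorname{id})(\alpha)$ for all $\lambda > 0$. Since $\operatorname{dom}\psi = \operatorname{dom} f \times \mathbb{R}$, it follows that

$$\bar\phi^\psi_{(\bar x, \bar\alpha)}(\lambda) = \bar\phi^f_{\bar x}(\lambda) + \bar\alpha\lambda + \tfrac12\lambda^2 \quad (\lambda \ge 0).$$

Apply Proposition 8(a) to obtain the desired result.

Part (b). Apply Proposition 8(b), observing that $P_{\operatorname{epi} f}(\bar x, \bar\alpha) = P_1\delta_{\operatorname{epi} f}(\bar x, \bar\alpha)$ and $P_1(\lambda \cdot \psi)(\bar x, \bar\alpha) = [P_\lambda f(\bar x),\ \bar\alpha + \lambda]$ for all $\lambda \ge 0$. The uniqueness of $\bar\lambda > 0$ follows from the strong convexity of the objective. □

Remark 3 (Prior work). The level-set projection result Corollary 13 encompasses the result described by Beck [4, Theorem 6.30]. For epigraphical projection, Corollary 14 generalizes Beck [4, Theorem 6.36] to include functions that are not finite-valued. For functions $f \in \Gamma_0(\mathbb{E})$ with open domain, Chierchia et al. [11, Proposition 1] describe an alternative formula for epigraphical projections via proximal maps.

Algorithm 1. $SC^1$ Newton method for minimizing $\theta_\xi$
(S.0) Choose $\lambda_0 > 0$, $\delta > 0$, $\{\varepsilon_k\} \downarrow 0$, and let $\beta, \sigma \in (0, 1)$. Set $k := 0$.
(S.1) If $|\theta'_\xi(\lambda_k)| \le \delta$: STOP.
(S.2) Choose $g_k \in \partial_B(\theta'_\xi)(\lambda_k)$ and set $\Delta_k := P_{[-\lambda_k, \infty)}\left( -\dfrac{\theta'_\xi(\lambda_k)}{g_k + \varepsilon_k} \right)$.
(S.3) Set $t_k := \max_{l \in \mathbb{N}} \big\{ \beta^l \,\big|\, \theta_\xi(\lambda_k + \beta^l\Delta_k) \le \theta_\xi(\lambda_k) + \beta^l\sigma\,\theta'_\xi(\lambda_k)\Delta_k \big\}$.
(S.4) Set $\lambda_{k+1} := \lambda_k + t_k\Delta_k$, $k \leftarrow k + 1$, and go to (S.1).

7.1. $SC^1$ optimization framework

In this section we present a unified algorithmic framework for computing projections onto the level sets and the epigraph of a closed proper convex function. Corollaries 13 and 14, respectively, guide us in how to compute these projections. For a given $f \in \Gamma_0(\mathbb{E})$ and $(\bar x, \bar\alpha) \in \mathbb{E} \times \mathbb{R}$ such that $f(\bar x) > \bar\alpha$, the epigraphical and level-set projections, respectively, correspond to the proximal map of $f$ with parameter $\lambda$ that solves the scalar problem

$$\min_{\lambda \ge 0} \theta_\xi(\lambda) \quad (\xi \in \{\text{epi}, \text{lev}\}), \tag{31}$$

for $\theta_\xi : \mathbb{R} \to \overline{\mathbb{R}}$ given by

$$\theta_\xi(\lambda) = \begin{cases} \bar\phi^f_{\bar x}(\lambda) + \bar\alpha\lambda & \text{if } \xi = \text{lev}, \\ \bar\phi^f_{\bar x}(\lambda) + \bar\alpha\lambda + \tfrac12\lambda^2 & \text{if } \xi = \text{epi}. \end{cases} \tag{32}$$

Corollary 12 asserts that $\theta_\xi$ is convex, continuous (possibly in an extended real-valued sense), and continuously differentiable with monotonically increasing, locally Lipschitz derivative on $\mathbb{R}_{++}$. In particular, for any $\lambda > 0$,

$$\theta'_\xi(\lambda) = \begin{cases} -\eta^f_{\bar x}(\lambda) + \bar\alpha & \text{if } \xi = \text{lev}, \\ -\eta^f_{\bar x}(\lambda) + \bar\alpha + \lambda & \text{if } \xi = \text{epi}. \end{cases} \tag{33}$$

The minimization of $\theta_\xi$ could be accomplished using bisection if an upper bound on the optimal $\lambda$ is available. However, the semismoothness of the derivative (33), described by Proposition 7, allows us to tap into the powerful $SC^1$ optimization framework [20, 36] that operates on functions $\theta : \mathbb{R} \to \overline{\mathbb{R}}$ that are semismoothly differentiable (i.e., $SC^1$), which means that at points $\bar\lambda \in \operatorname{int}(\operatorname{dom}\theta)$, the gradient $\theta'$ exists, and it is locally Lipschitz around $\bar\lambda$ and semismooth at $\bar\lambda$.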
The semismooth method, outlined by Algorithm 1, applies to the problem (31) whenever conditions (A1) and (A2) of Pang and Qi [36] hold, which is the case when $\bar x \in \operatorname{dom}\partial f$; see Corollary 12.

Algorithm 1 uses the notion of a Bouligand subdifferential, which for a function $\phi : \mathbb{R}^n \to \overline{\mathbb{R}}$ that is locally Lipschitz at a point $\bar x \in \operatorname{int}(\operatorname{dom}\phi)$, is defined at $\bar x$ as

$$\partial_B\phi(\bar x) = \{ v \mid \exists \{ x_k \in D_\phi,\ x_k \to \bar x \} : \nabla\phi(x_k) \to v \},$$

where $D_\phi$ is the set of points of differentiability of $\phi$. The Clarke subdifferential [12] of $\phi$ at $\bar x$ is $\partial_C\phi(\bar x) := \operatorname{conv}\partial_B\phi(\bar x)$, which coincides (on the interior of $\operatorname{dom}\phi$) with the convex subdifferential if $\phi$ is convex.

Remark 4. Because $\theta_\xi$ is convex and differentiable with locally Lipschitz derivative on $\mathbb{R}_{++}$, all elements in the Clarke subdifferential $\partial_C(\theta'_\xi)(\lambda)$ are nonnegative for all $\lambda > 0$. In the epigraphical case ($\xi = \text{epi}$), the quadratic term in the expression for $\theta_{\text{epi}}$ in (32) implies that the elements are bounded below by 1. Thus, the sequence of regularization parameters $\{\varepsilon_k\} \downarrow 0$ […] $\theta'$ is piecewise affine, the regularization could be eliminated by setting the constant regularization $\varepsilon_k := 0$ for all $k$, which would improve numerical convergence regardless of the optimality parameter $\delta > 0$.
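The steps of Algorithm 1 can be sketched in Python for one concrete case, the projection onto the epigraph of $f = \|\cdot\|_1$. This is a minimal sketch under our own assumptions, not the authors' C implementation: it uses $\theta'_{\text{epi}}(\lambda) = -f(P_\lambda f(\bar x)) + \bar\alpha + \lambda$, which is what Corollary 14(b) suggests, picks the right-hand slope as the element of $\partial_B(\theta'_{\text{epi}})(\lambda)$, takes $\varepsilon_k = 2^{-(k+1)}$, and caps the backtracking loop. The function names, default parameters, and the tolerance `delta=1e-6` are hypothetical choices.

```python
import math

def prox_l1(x, lam):
    """Soft-thresholding: the proximal map of lam * ||.||_1."""
    return [math.copysign(max(abs(v) - lam, 0.0), v) for v in x]

def theta_epi(x, alpha, lam):
    """theta_epi(lam) = -lam * e_lam f(x) + alpha * lam + lam^2 / 2 for f = ||.||_1."""
    p = prox_l1(x, lam)
    lam_env = lam * sum(abs(v) for v in p) + 0.5 * sum((a - b) ** 2 for a, b in zip(x, p))
    return -lam_env + alpha * lam + 0.5 * lam * lam

def dtheta_epi(x, alpha, lam):
    """theta_epi'(lam) = -f(P_lam f(x)) + alpha + lam."""
    return -sum(max(abs(v) - lam, 0.0) for v in x) + alpha + lam

def slope_element(x, lam):
    """One element of the Bouligand subdifferential of theta_epi' (right-hand slope)."""
    return sum(1 for v in x if abs(v) > lam) + 1.0

def project_epi_l1(x, alpha, lam0=1.0, delta=1e-6, beta=0.5, sigma=1e-4, max_iter=100):
    """Returns (lam, u, t) with (u, t) approximating the projection of (x, alpha)."""
    if sum(abs(v) for v in x) <= alpha:
        return 0.0, list(x), alpha                     # (x, alpha) already lies in epi f
    lam = lam0
    for k in range(max_iter):
        d = dtheta_epi(x, alpha, lam)
        if abs(d) <= delta:                            # (S.1) stopping test
            break
        eps_k = 0.5 ** (k + 1)                         # regularization sequence, eps_k -> 0
        step = max(-lam, -d / (slope_element(x, lam) + eps_k))  # (S.2) projected Newton step
        t, base = 1.0, theta_epi(x, alpha, lam)
        for _ in range(50):                            # (S.3) Armijo backtracking, capped
            if theta_epi(x, alpha, lam + t * step) <= base + t * sigma * d * step:
                break
            t *= beta
        lam += t * step                                # (S.4) update
    return lam, prox_l1(x, lam), alpha + lam

lam, u, t = project_epi_l1([-2.0, 0.5, 1.0, 0.3], 0.5)
print(lam, u, t, sum(abs(v) for v in u))
```

At termination the returned pair satisfies $t \approx \|u\|_1$ up to the tolerance, so it lies (approximately) on the boundary of the epigraph. A much tighter tolerance would require a line search that is robust to rounding in the function values, which this sketch does not attempt.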

Algorithm 2. Full-step $SC^1$ Newton method
(S.0) Choose $\lambda_0 > 0$, $\delta > 0$, and $\{\varepsilon_k\} \downarrow 0$. Set $k := 0$.
(S.1) If $|\theta'_\xi(\lambda_k)| \le \delta$: STOP.
(S.2) Choose $g_k \in \partial_C(\theta'_\xi)(\lambda_k)$ and set $\Delta_k := \max\left\{ -\lambda_k,\ -\dfrac{\theta'_\xi(\lambda_k)}{g_k + \varepsilon_k} \right\}$.
(S.3) Set $\lambda_{k+1} := \lambda_k + \Delta_k$, $k \leftarrow k + 1$, and go to (S.1).

$\theta'_\xi$ is concave on $(0, \lambda_l)$. Corollaries 13 and 14 imply that there exist positive parameters $\lambda_l \le \lambda_u$ such that

$$[\lambda_l, \lambda_u] = \operatorname{argmin}_{\lambda \ge 0} \theta_\xi = \{ \lambda > 0 \mid \theta'_\xi(\lambda) = 0 \}, \tag{34}$$

for both the epigraphical and level-set cases. In the epigraphical case in particular, the solution is unique, and thus $\lambda_u = \lambda_l$; see Corollary 14(b). If the derivative $\theta'_\xi$ is concave on the interval $(0, \lambda_l)$, it is possible to take a full Newton step at every iteration while respecting positivity of the iterates, thus saving the computational cost of a backtracking line search. The simplified iteration is described by Algorithm 2.

For many important functions, e.g., the 1-norm or negative log (and their spectral counterparts), the respective map $\theta'_\xi$ is concave on $\mathbb{R}_{++}$, but, as suggested above, we only need the following:

Assumption 1 (Concavity on $(0, \lambda_l)$). The function $\theta'_\xi$ is concave on $(0, \lambda_l)$.

Proposition 9 (Convergence of Algorithm 2). Under Assumption 1, the full-step Newton method from Algorithm 2 converges to a minimizer of $\theta_\xi$.

Proof. Set $\theta = \theta_\xi$. If $0 < \lambda_k < \lambda_l$ for some $k \in \mathbb{N}$, then by Corollary 11(a), $\theta'(\lambda_k) < 0$ […]. Therefore,

$$\lambda_{k+1} = \lambda_k - \frac{\theta'(\lambda_k)}{g_k + \varepsilon_k} > \lambda_k.$$

Since $-(g_k + \varepsilon_k)$ is a convex subgradient of $-(\theta' + \varepsilon_k(\cdot))$ at $\lambda_k$, the concavity of $\theta'$ implies that

$$-\theta'(\lambda_{k+1}) - \varepsilon_k(\lambda_{k+1} - \lambda_k) \ge -\theta'(\lambda_k) - (\lambda_{k+1} - \lambda_k)(g_k + \varepsilon_k) = 0,$$

and hence $\theta'(\lambda_{k+1}) < 0$, thus $0 < \lambda_k < \lambda_{k+1} < \lambda_l$. Consequently, by an inductive argument, $\{\lambda_k\}$ converges to some $\tilde\lambda$. Therefore, the sequence $\{ g_k \in \partial_C(\theta'_\xi)(\lambda_k) \}$ is bounded, and hence

$$0 = (\lambda_{k+1} - \lambda_k)(g_k + \varepsilon_k) + \theta'(\lambda_k) \to \theta'(\tilde\lambda),$$

which shows that $\tilde\lambda$ has the desired properties. We hence still need to cover the case where $\lambda_l < \lambda_k$ for all $k \in \mathbb{N}$. In view of (34), we can assume that $\lambda_u < \lambda_k$ for all $k \in \mathbb{N}$. (Otherwise, a solution has already been obtained.) Since $\theta'(\lambda_k) > 0$, we have

$$0 < \lambda_u < \lambda_{k+1} = \lambda_k + \max\left\{ -\lambda_k,\ -\frac{\theta'(\lambda_k)}{g_k + \varepsilon_k} \right\} \le \lambda_k,$$

hence the sequence $\{\lambda_k\}$ converges to some $\hat\lambda$. In particular, $\lambda_{k+1} = \lambda_k$ only finitely many times. Hence, without loss of generality, $0 = (\lambda_{k+1} - \lambda_k)(g_k + \varepsilon_k) + \theta'(\lambda_k) \to \theta'(\hat\lambda)$, which gives $\theta'(\hat\lambda) = 0$ also here. □

Figure 2. The function $\theta'_{\text{epi}}$ corresponding to the projection of a point $\bar x \in \mathbb{R}^4$ onto the 1-norm unit ball.

Figure 3. The function $\theta'_{\text{epi}}$ for Example 3, for which Algorithm 2 may cycle.

The next example illustrates that cycling may occur in Algorithm 2 if Assumption 1 fails.

Example 3 (Cycling). Consider the scalar function $f(x) = 2|x| + \delta_{[-\,\cdot\,,\,\cdot\,]}(x)$, and the task of projecting the point $(\bar x, \bar\alpha) = (4, -1)$ onto $\operatorname{epi} f$. Figure 3 illustrates the function $\theta'_{\text{epi}}$ whose root we seek. Then for $\lambda_0$ outside of the interval $[1.\,\cdot\,, 2]$ the iterates $\lambda_k$ ($k \in \mathbb{N}$) generated by Algorithm 2 oscillate between $1$ […].

We present numerical experiments that hint at the computational effectiveness of the $SC^1$ optimization framework described in Section 7.1. The two experiments in this section were run on an Apple MacBook Air with a 1.8GHz Intel Core i5 and 8GB RAM running OS 10.14.6. The code was written in C and is available at https://github.com/arielgoodwin/epi-proj.

An important instance of the level-set case ($\xi = \text{lev}$) is the projection onto the unit 1-norm ball $\operatorname{lev}_1\|\cdot\|_1 = \{ x \in \mathbb{R}^n \mid \|x\|_1 \le 1 \}$. The derivative of the corresponding function $\theta_{\text{lev}}$ reads

$$\theta'_{\text{lev}}(\lambda) = \begin{cases} 1 - \sum_{i=1}^n \max\{ |\bar x_i| - \lambda,\ 0 \} & \text{if } \lambda \ge 0, \\ 1 - \|\bar x\|_1 & \text{if } \lambda < 0. \end{cases}$$

This function is concave on $\mathbb{R}_+$ (as required) and piecewise affine, as shown by Fig. 2.

We implemented Algorithm 2 and compared it numerically to two state-of-the-art algorithms specifically tailored to 1-norm-ball projection, namely Condat's sorting-based method [16] as implemented in the code condat_l1ballproject.c, and Liu and Ye's improved bisection algorithm (IBIS) [30] implemented in the eplb module in SLEP [41].

The entries of the projected vectors $\bar x \in \mathbb{R}^n$ are drawn from a Gaussian distribution with zero mean and four different standard deviations $\sigma$ […]. The optimality tolerance $\delta$ in step (S.1) of Algorithm 2 was fixed at $10^{-[\ldots]}$. Table 1 reports the average time required to compute the projection over 10 […] trials for vectors of dimension $n \in \{[\ldots]\}$, and over 500 trials for $n = 10^{[\ldots]}$. The initial point $\lambda_0 > 0$ was obtained by sampling $\sqrt{n}\log n$ coordinates randomly from the vector $\bar x$ and setting $\lambda_0$ to be the largest of their absolute values. Observe that Algorithm 2 exhibits comparable performance relative to the specialized algorithms.

Table 1. Average time (seconds) for projecting vectors onto the 1-norm unit ball in dimension $n$, with coordinates chosen using Gaussian distributions with standard deviation $\sigma$. [Columns: $n$, Algorithm 2, Condat, IBIS, for each of the four values of $\sigma$; the numerical entries are not recoverable from the extracted text.]

We now consider the epigraphical projection for a function that is not polyhedral. Define the function $f : x \in \mathbb{R}^n \mapsto -\sum_{i=1}^n \log x_i$, where we take the negative logarithm to be $+\infty$ outside the positive orthant. Figure 4 illustrates the function $\theta'_{\text{epi}}$ for the case when $P_{\operatorname{cl}(\operatorname{dom} f)}(\bar x)$ is in, and not in, the domain of $f$. These functions are concave over $(0, \infty)$. Hence $-\theta'_\xi$ is convex over this interval and Algorithm 2 applies.

Figure 4. The graph of the function $\theta'_{\text{epi}}(\lambda)$ that corresponds to the base points $(\bar x, \bar\alpha)$ shown for each figure. The left panel, with $(\bar x, \bar\alpha) = (+1, -1)$, depicts the case where $P_{\operatorname{cl}(\operatorname{dom} f)}(\bar x) \in \operatorname{dom} f$; the right panel depicts the case where $P_{\operatorname{cl}(\operatorname{dom} f)}(\bar x) \notin \operatorname{dom} f$.

We numerically compare Algorithm 2 and the bisection method as solution approaches for (31). The coordinates of $\bar x$ were chosen uniformly at random on an interval $[-\,\cdot\,, \cdot\,]$, and $\bar\alpha$ was chosen uniformly at random on an interval $[-\,\cdot\,, -\,\cdot\,]$. The initial point $\lambda_0$ was chosen to be $\sqrt{N}$. The termination condition for Algorithm 2 was $|\theta'_\xi(\lambda)| < 10^{-[\ldots]}$, and the termination conditions for bisection were $|\theta'_\xi(\lambda)| < 10^{-[\ldots]}$ (labeled Bisection 1) and $|b - a| < 10^{-[\ldots]}$ (labeled Bisection 2), where $a, b$ denote the endpoints of the bisection interval. Table 2 shows the average times over 10 […] trials when $n \in \{[\ldots]\}$, and over 500 trials when $n = 10^{[\ldots]}$.

Table 2. Time (seconds) for projecting vectors onto the epigraph of $f(x) = -\sum_{i=1}^n \log x_i$ in various dimensions $n$. [Rows: SSN, Bisection 1, Bisection 2; one column per dimension $n$; the numerical entries are not recoverable from the extracted text.]

The numerical examples we presented extend easily to other useful cases involving matrices, such as the nuclear norm on $\mathbb{R}^{m \times n}$ and the barrier function $-\log\det$ on the space of symmetric matrices, using variational formulas that depend on matrix spectra [28, 29]. In these cases, the main computational effort involves computing singular value and eigenvalue decompositions, respectively, of the matrix iterates.

The cases where $\theta'_\xi$ does not satisfy either Assumption 1 or the domain condition $P_{\operatorname{cl}(\operatorname{dom} f)}(\bar x) \in \operatorname{dom} f$ lie outside the theoretical guarantees presented in this section, though the algorithms we present may still work in practice. In the case where $\operatorname{dom} f \subsetneq \mathbb{E}$ is open, the formula provided by Chierchia et al. [11, Proposition 1] is a viable option.
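For the 1-norm ball, the full-step iteration of Algorithm 2 is short enough to sketch in Python. The code below is an illustration under our own assumptions and is not the authors' C code: it uses $\theta'_{\text{lev}}(\lambda) = r - \sum_i \max\{|\bar x_i| - \lambda, 0\}$ for a ball of radius $r > 0$ (with $r = 1$ the unit ball), takes the number of entries with $|\bar x_i| > \lambda$ as the slope $g_k$, and keeps a tiny regularization $\varepsilon_k$ only to avoid division by zero when no entry is active. The function names, the starting value `lam0`, and the tolerance are hypothetical.

```python
import math

def dtheta_lev(x, lam, radius=1.0):
    """theta_lev'(lam) = radius - sum_i max(|x_i| - lam, 0) for lam >= 0."""
    return radius - sum(max(abs(v) - lam, 0.0) for v in x)

def project_l1_ball(x, radius=1.0, lam0=1.0, delta=1e-10, max_iter=200):
    """Full-step Newton iteration in the spirit of Algorithm 2 for the 1-norm ball."""
    if sum(abs(v) for v in x) <= radius:
        return list(x)                                # already feasible, nothing to do
    lam = lam0
    for k in range(max_iter):
        d = dtheta_lev(x, lam, radius)
        if abs(d) <= delta:                           # (S.1) stopping test
            break
        g = float(sum(1 for v in x if abs(v) > lam))  # slope of the active affine piece
        eps_k = 1e-12 * 0.5 ** k                      # tiny regularization, guards g = 0
        lam += max(-lam, -d / (g + eps_k))            # (S.2)-(S.3): full step, lam >= 0
    # the projection is the proximal point P_lam f(x), i.e. soft-thresholding at lam
    return [math.copysign(max(abs(v) - lam, 0.0), v) for v in x]

print(project_l1_ball([0.9, -1.4, 0.2, 0.6]))
```

Because $\theta'_{\text{lev}}$ is concave and piecewise affine, each step from the left of the root moves to a new affine piece or onto the root itself, so the loop ends after at most roughly $n$ iterations in exact arithmetic.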

8. Final remarks

Our analysis of the variational properties of epigraphical projections and infimal convolution is motivated by the authors' larger research interests in variations of first-order methods that operate in a lifted space. The work by Chierchia et al. [11] on epigraphical-projection methods for minimizing convex functions over $p$-norm constraints shows promise for this algorithmic approach, and we aim to develop methods for more general problem classes. We are also motivated by statistical M-estimation approaches that include as an additional unknown a particular parameter that characterizes the data distribution [15]. The variational calculus that we derive is a useful tool for developing algorithmic approaches for solving these lifted M-estimation problems.

There are at least two avenues of future research that extend our analysis in this paper.

$K$-epigraphical projections. A significant generalization of the post-composition operation defined in Section 6 occurs when we allow compositions of the form $f = g \circ H : \mathbb{E}_1 \to \overline{\mathbb{R}}$, where
• $K \subset \mathbb{E}_2$ is a closed convex cone;
• $H : \mathbb{E}_1 \to \mathbb{E}_2$ is $K$-convex, i.e., the $K$-epigraph $\{ (x, y) \mid y - H(x) \in K \}$ is convex;
• $g \in \Gamma_0(\mathbb{E}_2)$ is $K$-increasing, i.e., $g \le g((\cdot) + v)$ for all $v \in K$.
This convex convex-composite setting was studied by Burke et al. [10], and the required subdifferential formulas for the analysis are readily available. This may lead to a proximal calculus and ultimately to formulas and algorithms for projecting onto $K$-epigraphs, thus encompassing the study in Section 6.

Semismoothness* of subdifferential operators. The notion of semismooth* sets and maps is recent and still in development. One of the critical conditions in our study is the semismoothness* of the subdifferential operator $\partial f$, which also occurs in a recent report by Khanh et al. [27]. This suggests an important avenue of research that relaxes the overarching convexity assumption and, in particular, establishes verifiable sufficient conditions.

Acknowledgments. M.P. Friedlander and T. Hoheisel are supported by NSERC Discovery grants, while A. Goodwin's work was partially supported by an NSERC summer research stipend. T. Hoheisel would like to thank Dr. Matus Benko, University of Vienna, for valuable discussions on semismoothness*.

References

[1] A.Y. Aravkin, J.V. Burke, D. Drusvyatskiy, M.P. Friedlander, and K.J. MacPhee: Foundations of Gauge and Perspective Duality. SIAM Journal on Optimization 28(3), 2018, pp. 2406–2434.
[2] H. Attouch: Variational Convergence for Functions and Operators. Applied Mathematics Series, Pitman, Boston, 1984.
[3] H.H. Bauschke and P.L. Combettes: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. CMS Books in Mathematics, Springer, New York, 2nd Edition, 2017.
[4] A. Beck: First-Order Methods in Optimization. MOS-SIAM Series on Optimization, 2017.
[5] A. Beck and M. Teboulle: Smoothing and first order methods: A unified framework. SIAM Journal on Optimization 22(2), 2012, pp. 557–580.
[6] M. Benko, H. Gfrerer, and J.V. Outrata: Calculus for Directional Limiting Normal Cones and Subdifferentials. Set-Valued and Variational Analysis 27, 2019, pp. 713–745.

[7] M. Bougeard, J.P. Penot, and A. Pommellet: Towards minimal assumptions for the infimal convolution regularization. Journal of Approximation Theory 64(3), 1991, pp. 245–270.
[8] J.V. Burke and T. Hoheisel: Epi-convergent smoothing with applications to convex composite functions. SIAM Journal on Optimization 23(3), 2013, pp. 1457–1479.
[9] J.V. Burke and T. Hoheisel: Epi-convergence properties of smoothing by infimal convolution. Set-Valued and Variational Analysis 25, 2017, pp. 1–23.
[10] J.V. Burke, T. Hoheisel, and Q.V. Nguyen: A study of convex convex-composite functions via infimal convolution with applications. Mathematics of Operations Research, to appear.
[11] G. Chierchia, N. Pustelnik, J.-C. Pesquet, and B. Pesquet-Popescu: Epigraphical projection and proximal tools for solving constrained convex optimization problems. Signal, Image and Video Processing 9, 2015, pp. 1737–1749.
[12] F.H. Clarke: Optimization and Nonsmooth Analysis. John Wiley & Sons, New York, 1983.
[13] P.L. Combettes: Perspective functions: properties, constructions, and examples. Set-Valued and Variational Analysis 26, 2019, pp. 247–264.
[14] P.L. Combettes and C.L. Müller: Perspective functions: proximal calculus and applications in high-dimensional statistics. Journal of Mathematical Analysis and Applications 457(2), 2018, pp. 1283–1306.
[15] P.L. Combettes and C.L. Müller: Perspective maximum likelihood-type estimation via proximal decomposition. Electronic Journal of Statistics 14, 2020, pp. 207–238.
[16] L. Condat: Fast projection onto the simplex and l1 ball. Mathematical Programming, Series A 158(1), Springer, 2016, pp. 575–585.
[17] L. Condat: URL https://lcondat.github.io/software.html. Last accessed January 27, 2021.
[18] A.L. Dontchev and R.T. Rockafellar: Implicit Functions and Solution Mappings. A View from Variational Analysis. Springer Series in Operations Research and Financial Engineering, Springer-Verlag, New York, 2014.
[19] J. Duchi, S. Shalev-Shwartz, Y. Singer, and T. Chandra: Efficient projections onto the l1-ball for learning in high dimensions. ICML '08: Proceedings of the 25th International Conference on Machine Learning, ACM, New York, NY, USA, 2008, pp. 272–279.
[20] F. Facchinei and J.-S. Pang: Finite-Dimensional Variational Inequalities and Complementarity Problems, Volumes I and II. Springer, New York, 2003.
[21] H. Gfrerer: On directional metric subregularity and second-order optimality conditions for a class of nonsmooth mathematical programs. SIAM Journal on Optimization 23(1), 2013, pp. 632–665.
[22] H. Gfrerer: On directional metric regularity, subregularity and optimality conditions for nonsmooth mathematical programs. Set-Valued and Variational Analysis 21, 2013, pp. 151–176.
[23] H. Gfrerer and J.V. Outrata: On a semismooth* Newton method for solving generalized equations. SIAM Journal on Optimization 31(1), 2021, pp. 489–517.
[24] I. Ginchev and B.S. Mordukhovich: Directional subdifferentials and optimality conditions. Positivity 16, 2012, pp. 707–737.
[25] J.-B. Hiriart-Urruty and C. Lemaréchal: Fundamentals of Convex Analysis. Grundlehren Text Editions, Springer, Berlin, Heidelberg, 2001.
[26] T. Hoheisel: Topics in Convex Analysis in Matrix Space. Lecture Notes, Spring School on Variational Analysis, Paseky nad Jizerou, Czech Republic, 2019.
[27] P.D. Khanh, B.S. Mordukhovich, and V.T. Phat: A generalized Newton method for subgradient systems. arXiv:2009.10551, 2020.
[28] A.S. Lewis: The convex analysis of unitarily invariant matrix functions. Journal of Convex Analysis 2(1–2), 1995, pp. 173–183.
[29] A.S. Lewis: Convex analysis on the Hermitian matrices. SIAM Journal on Optimization 6(1), 1996, pp. 164–177.

[30] J. Liu and J. Ye: Efficient Euclidean projections in linear time. Proceedings of the 26th Annual International Conference on Machine Learning, 2009, pp. 657–664.
[31] F. Meng, D. Sun, and G. Zhao: Semismoothness of solutions to generalized equations and the Moreau-Yosida regularization. Mathematical Programming 104, 2005, pp. 561–581.
[32] F. Meng, G. Zhao, M. Goh, and R. De Souza: Lagrangian-dual functions and Moreau-Yosida regularization. SIAM Journal on Optimization 19, 2008, pp. 39–61.
[33] A. Milzarek: Numerical Methods and Second Order Theory for Nonsmooth Problems. Dissertation, Technical University of Munich, 2016.
[34] B.S. Mordukhovich: Variational Analysis and Applications. Springer Monographs in Mathematics, Springer International Publishing AG, 2018.
[35] N. Parikh and S. Boyd: Proximal algorithms. Foundations and Trends in Optimization 1(3), 2013, pp. 123–231.
[36] J.S. Pang and L. Qi: A globally convergent Newton method for convex $SC^1$ minimization problems. Journal of Optimization Theory and Applications 85(3), 1995, pp. 633–648.
[37] L. Qi and J. Sun: A nonsmooth version of Newton's method. Mathematical Programming 58, 1993, pp. 353–367.
[38] R.T. Rockafellar: Convex Analysis. Princeton Mathematical Series, No. 28, Princeton University Press, Princeton, N.J., 1970.
[39] R.T. Rockafellar and R.J.-B. Wets: Variational Analysis. Grundlehren der Mathematischen Wissenschaften, Vol. 317, Springer-Verlag, Berlin, 1998.
[40] A. Shapiro: Directionally nondifferentiable metric projection. Journal of Optimization Theory and Applications 81(1), 1994, pp. 203–204.
[41] J. Liu, S. Ji, and J. Ye: SLEP: Sparse Learning with Efficient Projections. Arizona State University, 2009.
[42] T. Strömberg: The Operation of Infimal Convolution. Dissertationes Mathematicae (Rozprawy Matematyczne) 352, 1996.
[43] M. Tofighi, K. Kose, and A.E. Cetin: Denoising using projections onto the epigraph set of convex cost functions. IEEE International Conference on Image Processing (ICIP), Paris, 2014, pp. 2709–2713.
[44] M. Tofighi, A. Bozkurt, K. Kose, and A.E. Cetin: Deconvolution using projections onto the epigraph set of a convex cost function.
[45] P.-W. Wang, M. Wytock, and J.Z. Kolter: