Convex analysis in normed spaces and metric projections onto convex bodies
Vitor Balestro∗
Instituto de Matemática e Estatística
Universidade Federal Fluminense
24210-201 Niterói
[email protected]

Horst Martini
Fakultät für Mathematik
Technische Universität Chemnitz
09107 Chemnitz
[email protected]

Ralph Teixeira
Instituto de Matemática e Estatística
Universidade Federal Fluminense
24210-201 Niterói
[email protected]
Abstract
We investigate metric projections and distance functions referring to convex bodies in finite-dimensional normed spaces. For this purpose we identify the vector space with its dual space by using, instead of the usual identification via the standard inner product, the Legendre transform associated with the given norm. This approach yields re-interpretations of various properties of convex functions, and new relations between such functions and geometric properties of the studied norm are also derived.
Keywords: Birkhoff orthogonality, convex body, convex functions, (differentiability of) distance functions, Legendre transform, sub-gradient
MSC 2010:

∗ Corresponding author

1 Introduction
Let (M, d) be a metric space, and let C ⊆ M be a subset of M. The distance of a given point x ∈ M to C is defined to be the number

dist(x, C) = inf{d(x, y) : y ∈ C}.

If this number is finite, and if y ∈ C is such that d(x, y) = dist(x, C), then we say that y is a metric projection of x onto C.

Metric projections have been widely studied throughout the past decades and in a number of contexts (see, e.g., [1], [4], [10] and [34]; further references will be given throughout the text). A very special case is when one considers the metric projections onto a given convex body (i.e., a compact, convex set with non-empty interior) K in R^n endowed with the standard distance given by the Euclidean norm (which we denote by |·|). For this case we know that, for example,

(i) K is a Chebyshev set, meaning that a metric projection onto K exists and is unique for any x ∈ R^n. Actually, every Chebyshev set of R^n is convex, see [8] and [11]. For infinite-dimensional vector spaces this is more complicated, and we refer the reader to [30] and [31],

(ii) the map p_K : R^n → K which takes each x ∈ R^n to its metric projection onto K is contracting, meaning that |p_K(x) − p_K(y)| ≤ |x − y| for any x, y ∈ R^n,

(iii) the distance function dist(·, K) : R^n → R is convex, and differentiable on R^n \ K, even if K is not smooth (a convex body is said to be smooth if it has a unique supporting hyperplane at each boundary point),

(iv) the gradient of the distance function can be described in terms of the outer normals of K.

We originally wanted to understand which of these (and further) properties remain true if we consider the metric given by an arbitrary norm in R^n, instead of the usual norm given by the standard inner product. In this paper, we give precise answers to all the cases above except for (ii), describing which properties the norm and/or the convex body have to satisfy such that the desired property holds. For normed spaces, property (ii) was discussed in [17].
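Properties (i) and (ii) are easy to probe numerically in a simple special case. The following sketch (our illustration, not part of the paper) uses the axis-aligned box [0, 1]^2, for which the Euclidean metric projection is the coordinatewise clamp; the helper names `proj_box` and `euclid` are ours.

```python
import math
import random

def proj_box(x, lo=0.0, hi=1.0):
    """Euclidean metric projection onto the box [lo, hi]^n: clamp each coordinate."""
    return tuple(min(max(c, lo), hi) for c in x)

def euclid(u, v):
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

random.seed(0)
pts = [(random.uniform(-3, 3), random.uniform(-3, 3)) for _ in range(200)]

# (i) the projection is well-defined (the clamp formula picks a single point), and
# (ii) the projection map is contracting: |p(x) - p(y)| <= |x - y|
assert all(euclid(proj_box(x), proj_box(y)) <= euclid(x, y) + 1e-12
           for x in pts for y in pts)
```

The contraction check passes because clamping each coordinate is 1-Lipschitz in every coordinate separately.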
We mention that regularity properties of distance functions have also been extensively investigated in various contexts (see, e.g., [19], [22], [24] and [33]). In the context of normed (or gauge) spaces we mention the papers [12], [13], [14], [27] and [32]. However, to the best of our knowledge the following question was not explicitly answered: can we guarantee differentiability of the distance function to a convex body only assuming that the ambient norm is smooth and strictly convex (in the "geometric" sense that its unit ball is smooth and strictly convex)? We answer this question positively, also relating the so-called norm gradient of these functions to the Birkhoff orthogonality relation given by the norm.

Seeing properties (iii) and (iv) in the context of normed spaces, we observed that the early theory of convex functions on R^n can be built replacing the norm given by the standard inner product by a smooth and strictly convex norm. Besides providing a complete understanding of the differentiability of distance functions to convex sets in normed spaces, we believe that this approach is interesting in itself. Using the Legendre transform of the norm, we define and study the notions of norm gradients and norm sub-gradients of convex functions, relating them also to the concept of Birkhoff orthogonality (to the best of our knowledge, this relation is also new).

We organize the text as follows: in Section 2 we recall some definitions and properties regarding the classical theory of convex functions, briefly discussing also sub-linear functions. We begin to study metric projections onto convex sets in normed spaces in Section 3, where we give necessary and sufficient conditions for uniqueness (among other results). Section 4 is devoted to introducing and investigating the so-called norm gradients and norm sub-gradients of convex functions.
They are also used to "detect" differentiability, and this is applied in Section 5 to study the differentiability of distance functions to convex bodies.

To finish this introduction, we fix some basic concepts and notation. We say that a hyperplane h ⊆ R^n supports a given convex body K at a point x ∈ ∂K (where ∂K denotes the boundary of K) if K is contained in one of the (closed) half-spaces determined when h is translated to pass through x. As previously mentioned, a convex body is smooth if it is supported by a unique hyperplane at each boundary point. We also say that a convex body is strictly convex if its boundary contains no line segment. Let (X, ||·||) denote a normed or Minkowski space, i.e., a finite-dimensional, real Banach space. The unit ball of (X, ||·||) is the set B := {x ∈ X : ||x|| ≤ 1}, and it is immediate that B is a convex body which is symmetric with respect to the origin. The boundary of the unit ball is the unit sphere ∂B = {x ∈ X : ||x|| = 1}. We say that a norm is smooth (resp. strictly convex) if its unit ball B is a smooth (resp. strictly convex) convex body. Equivalently, a norm is strictly convex if and only if the triangle inequality is strict for linearly independent vectors (see [20]).

A norm ||·|| on a vector space X induces an orthogonality relation known as Birkhoff orthogonality. We say that a vector x ∈ X is Birkhoff left-orthogonal to a vector y ∈ X if ||x + ty|| ≥ ||x|| for every t ∈ R. We denote this relation by x ⊣_B y, and in this case we also say that y is Birkhoff right-orthogonal to x. It is easy to see that Birkhoff orthogonality is a homogeneous relation (meaning it is a relation between directions rather than vectors). Extending this concept, we say that a vector x ∈ X is Birkhoff left-orthogonal to a hyperplane h if x ⊣_B z for every z ∈ h. In this case, we also say that h is Birkhoff right-orthogonal to x, and this is similarly denoted by x ⊣_B h.
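Since Birkhoff orthogonality is defined by a one-parameter family of inequalities, it can be tested numerically on a grid of values of t. The sketch below (our illustration; `birkhoff_left_orth` is a hypothetical helper) checks ||x + ty|| ≥ ||x|| for the l1-norm, and also exhibits the well-known fact that the relation need not be symmetric.

```python
def birkhoff_left_orth(norm, x, y):
    """Numerically test x -|_B y, i.e. ||x + t*y|| >= ||x|| on a grid of t."""
    ts = [k / 100.0 for k in range(-300, 301)]
    return all(norm([a + t * b for a, b in zip(x, y)]) >= norm(x) - 1e-12
               for t in ts)

l1 = lambda v: sum(abs(c) for c in v)

# ||(1,0) + t(1,2)||_1 = |1+t| + 2|t| >= 1 for all t, so (1,0) is Birkhoff
# left-orthogonal to (1,2) in the l1-norm:
assert birkhoff_left_orth(l1, (1.0, 0.0), (1.0, 2.0))
# The relation need not be symmetric: at t = -1, ||(1,2) - (1,0)||_1 = 2 < 3
assert not birkhoff_left_orth(l1, (1.0, 2.0), (1.0, 0.0))
```

The grid probe is only a necessary check, of course; the displayed inequalities verify the two claims exactly.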
Birkhoff orthogonality between vectors and hyperplanes is related to the geometry of the unit ball: the norm is smooth if and only if each non-zero vector x ∈ X admits a unique Birkhoff right-orthogonal hyperplane, and the norm is strictly convex if and only if each hyperplane h ⊆ X has a unique Birkhoff left-orthogonal direction. We refer the reader to [2] for more information on Birkhoff orthogonality (and other orthogonality types in normed spaces).

A function f : R^n → R is said to be convex if

f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y)

for any x, y ∈ R^n and λ ∈ [0, 1]. Equivalently, f : R^n → R is convex if and only if its epigraph

epi(f) = {(x, c) ∈ R^{n+1} : f(x) ≤ c}

is a convex set. Replacing the domain by convex open subsets of R^n, the definition can be extended, as well as most of our results. For the sake of simplicity, we work with functions defined on all of R^n and taking values in R (and not in the extended line R ∪ {−∞, +∞}, as is usual in the convex analysis literature).

We deal with two differentiability notions for convex functions. First, if f : R^n → R is a convex function, then we denote by f'_+ the one-sided directional derivative

f'_+(x, v) := lim_{t→0+} (f(x + tv) − f(x))/t,

for any x, v ∈ R^n. In our context (that of convex functions whose domain is R^n), this limit always exists. Also, it is well known that f'_+(x, ·) is a sublinear map for each x ∈ R^n. For proofs we refer the reader to [28, Chapter 1]. Letting t → 0− in the quotient above, we get the left-sided derivative f'_-(x, v). It is not difficult to see that f'_-(x, v) = −f'_+(x, −v), and for that reason we work mainly with f'_+.

A function f : U → R, where U ⊆ R^n is an open set, is said to be (Fréchet) differentiable at x ∈ U if there exists a linear map df_x : R^n → R such that

lim_{||v||→0} |f(x + v) − f(x) − df_x(v)| / ||v|| = 0,

where the limit is taken in the metric given by the norm.
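The one-sided derivative is easy to approximate by a difference quotient. In the sketch below (our illustration; the helper `dplus` is ours) we evaluate it at a kink of the l1-norm on R^2, where f'_+(x, ·) is sublinear but not linear.

```python
def f(x):
    """A convex function with a kink at the origin: the l1-norm on R^2."""
    return abs(x[0]) + abs(x[1])

def dplus(f, x, v, t=1e-8):
    """Approximate f'_+(x, v) = lim_{t->0+} (f(x + t v) - f(x)) / t."""
    return (f([a + t * b for a, b in zip(x, v)]) - f(x)) / t

x = [0.0, 0.0]
# At the origin f'_+(x, v) = ||v||_1, so f'_+(x, u) + f'_+(x, -u) = 2 > 0:
# f'_+(x, .) is not linear, and f is not differentiable there
assert abs(dplus(f, x, [1.0, 0.0]) - 1.0) < 1e-6
assert abs(dplus(f, x, [-1.0, 0.0]) - 1.0) < 1e-6
```

Away from the kink the same quotient recovers the ordinary directional derivative, in accordance with the Fréchet definition above.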
It is immediate that this notion of differentiability does not depend on the chosen norm (and hence coincides with the usual one given by the Euclidean norm), since any two norms on R^n are equivalent. For convex functions, there are several characterizations of Fréchet differentiability, and next we state the ones which are important for us.

Proposition 2.1.
Let f : R^n → R be a convex function, and let x ∈ R^n. For simplicity, we write c := f(x). The following statements are equivalent:

(i) f is Fréchet differentiable at x,

(ii) the one-sided derivative f'_+(x, ·) is a linear map, and

(iii) epi(f) is supported by a unique hyperplane at (x, c).

In this case, we have that f'_+(x, ·) = df_x(·).

Proof. Again, for the proofs we refer to [28, Chapter 1].

At the points where f is not differentiable, we still can guarantee that the one-sided derivative f'_+(x, ·) : R^n → R is sub-linear, meaning that it has the following properties:

i. (sub-additivity) f'_+(x, u + v) ≤ f'_+(x, u) + f'_+(x, v) for every u, v ∈ R^n, and

ii. (positive homogeneity) f'_+(x, λu) = λ f'_+(x, u) for any u ∈ R^n and every λ ≥ 0.

If g : R^n → R is a sub-linear function, then it has well-defined one-sided derivatives g'_+(x, u) for any x, u ∈ R^n, and the maps g'_+(x, ·) are also sub-linear. A linearity direction of a sub-linear function g is a vector u ∈ R^n for which g(u) = −g(−u). The set of linearity directions of g is denoted by lin(g).

The next lemma establishes some important properties of sub-linear functions that we will use later. The proof is given in [28, Chapter 1].

Lemma 2.1.
Let g : R^n → R be a sub-linear function. The following statements hold:

(a) g'_+(x, λx) = λg(x) for any x ∈ R^n and λ ∈ R,

(b) g'_+(x, u) ≤ g(u) for any x, u ∈ R^n,

(c) lin(g) is a vector subspace of R^n,

(d) lin(g'_+(x, ·)) ⊇ lin(g) + span(x) for every x ∈ R^n,

(e) lin(g) = R^n if and only if g is linear.

If f is a convex function, then it is easy to notice that for each c ∈ R the sub-level set {f ≤ c} := {z ∈ R^n : f(z) ≤ c} is convex. Despite being simple, the next proposition provides important information about the structure of the sub-level sets of convex functions (namely, a point which is not a global minimizer lies in the boundary of some sub-level set).

Proposition 2.2.
Let f : R^n → R be a convex function and assume that f(x) = c. Then precisely one of the following statements holds:

(i) {f < c} ≠ ∅ and x ∈ ∂{f ≤ c},

(ii) c is the global minimum of f.

Proof. If c is not the minimum of f, then there exists a point z ∈ R^n such that f(z) < c = f(x). To prove that x ∈ ∂{f ≤ c}, let ε ∈ (0, 1) and set w = x − z. Then

f(x − εw) = f((1 − ε)x + εz) ≤ (1 − ε)f(x) + εf(z) < c,

from where we get that x − εw ∈ {f < c}. On the other hand, writing

x = 1/(1+ε) (x + εw) + ε/(1+ε) z,

we get

c = f(x) ≤ 1/(1+ε) f(x + εw) + ε/(1+ε) f(z) < 1/(1+ε) f(x + εw) + ε/(1+ε) c,

and this leads to f(x + εw) > c, that is, x + εw ∈ {f > c}. Since both x − εw and x + εw converge to x as ε → 0+, it follows that any neighborhood of x contains points which lie inside and outside of {f ≤ c}. Hence x ∈ ∂{f ≤ c}.

On the other hand, it is clear that if c is the global minimum of f, then {f < c} = ∅.

As a remark, notice that by continuity {f < c} is an open (convex) set. Moreover, observe that if c is not a global minimum, then int{f ≤ c} = {f < c} and {f = c} = ∂{f < c}.

Proposition 2.3.
Let f : R^n → R be a convex function, and assume that f is differentiable at x ∈ R^n. If f(x) = c is not a global minimum of f, then the sub-level set {f ≤ c} is supported by a unique hyperplane at x.

Proof. Let h be a supporting hyperplane of {f ≤ c} at x, and let z ∈ h (regarded as a direction parallel to h). Observe that f(x + εz) ≥ f(x) for every ε ∈ R, meaning that f does not decrease in either of the directions z and −z. That is, f'_+(x, ±z) ≥ 0. On the other hand, since f is differentiable at x, we have

0 ≤ f'_+(x, z) = df_x(z) = f'_-(x, z) = −f'_+(x, −z) ≤ 0,

from which it follows that df_x(z) = 0. This shows that df_x vanishes in any direction which supports {f ≤ c} at x. Consequently, if {f ≤ c} had more than one supporting hyperplane at x, then df_x would vanish on two distinct hyperplanes, hence df_x = 0, and this contradicts the fact that f(x) = c is not a global minimum.

Roughly speaking, the metric projection of a point x onto a convex body K is the point of ∂K where the minimum distance from x to points of K is attained. For the Euclidean case, this concept is studied in [28, Chapter 1], and in this section we extend the ideas presented there to Minkowski spaces. If (X, ||·||) is a normed space, then the distance from a point x ∈ X to a set C ⊆ X is defined as

dist(x, C) = inf{||x − y|| : y ∈ C}.

We are mainly interested in studying metric projections of points onto convex bodies. The reader should be aware of the fact that, in general, strict convexity and smoothness are not assumed, neither for the norm nor for the considered convex body.
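For experimentation, dist(x, K) and the set of (approximate) metric projections can be computed by brute force over a finite sample of K. In the sketch below (our illustration; the helper names are hypothetical) the max-norm distance from a point to the unit square is realized along a whole edge of the square, a phenomenon analyzed later in this section.

```python
def dist_to_body(norm, x, K_sample):
    """Brute-force dist(x, K) and its approximate metric projections over a
    finite sample of the body K."""
    d = min(norm([a - b for a, b in zip(x, y)]) for y in K_sample)
    projs = [y for y in K_sample
             if norm([a - b for a, b in zip(x, y)]) <= d + 1e-9]
    return d, projs

linf = lambda v: max(abs(c) for c in v)   # the max-norm: not strictly convex

# Grid sample of the square K = [0,1]^2
K = [(i / 50.0, j / 50.0) for i in range(51) for j in range(51)]
x = (0.5, 3.0)
d, projs = dist_to_body(linf, x, K)
assert abs(d - 2.0) < 1e-9
# Every sampled point of the top edge realizes the distance: the edge is
# parallel to a flat piece of the max-norm unit ball's boundary
assert len(projs) == 51
```

With the Euclidean norm in place of `linf`, the same computation returns a single (approximate) minimizer.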
Theorem 3.1. Let (X, ||·||) be a (finite-dimensional) normed space, and let K ⊆ X be a convex body. For each point x ∈ X, there exists a point p_K(x) ∈ K such that

dist(x, K) = ||x − p_K(x)|| ≤ ||x − y||

for any y ∈ K. Moreover, if x ∉ K, then p_K(x) ∈ ∂K, and the vector x − p_K(x) is Birkhoff left-orthogonal to some supporting hyperplane of K at p_K(x).

Remark. The point p_K(x) is called a metric projection of x onto K. Of course, if x ∈ K, then p_K(x) = x.

Proof.
The existence of p_K(x) comes straightforwardly from the compactness of K and from the continuity of the norm. It is also clear that if x ∉ K, then p_K(x) ∈ ∂K. Indeed, if x ∉ K and y ∈ int K, then the segment connecting x to y cuts the boundary of K (at a point z, say) with ||x − y|| > ||x − z||.

For the other claim, assume that x ∉ K and write ρ := dist(x, K) = ||x − p_K(x)||. Then K and B(x, ρ) are convex bodies which clearly intersect only along their boundaries, that is,

K ∩ B(x, ρ) ⊆ ∂K ∩ ∂B(x, ρ).

In fact, if this did not hold, then we would have dist(x, K) < ρ. It follows that int(K) ∩ int(B(x, ρ)) = ∅, and hence int(K) and int(B(x, ρ)) can be (properly) separated by some hyperplane h (see [28, Theorem 1.3.8]). Since we clearly have that p_K(x) ∈ h, we get that h supports both B(x, ρ) and K at p_K(x). The fact that h supports B(x, ρ) at p_K(x) gives that x − p_K(x) is Birkhoff left-orthogonal to h (translated to pass through the origin).

From now on, we sometimes denote the distance function dist(·, C) : X → R to a given non-empty compact set C by d_C(·). In general, we work with convex bodies, but some results remain true if we only demand compactness.

Proposition 3.1. Let K ⊆ X be a convex body. The distance function d_K : X → R is a convex function.

Proof. Let x, y ∈ X and λ ∈ (0, 1). Since K is convex, the segment connecting p_K(x) and p_K(y) is contained in K, and hence

d_K((1 − λ)x + λy) ≤ ||(1 − λ)x + λy − ((1 − λ)p_K(x) + λp_K(y))|| ≤ (1 − λ)||x − p_K(x)|| + λ||y − p_K(y)|| = (1 − λ)d_K(x) + λd_K(y),

proving that d_K is convex.

Lemma 3.1.
Let C ⊆ X be a non-empty compact set. The distance function d_C : X → R is a weak contraction, that is, we have |d_C(x) − d_C(y)| ≤ ||x − y|| for any x, y ∈ X.

Proof. Without loss of generality, assume that d_C(y) ≥ d_C(x), and write y = x + z. Let p_C(x) and p_C(y) be metric projections of x and y onto C, respectively. Then we have

|d_C(y) − d_C(x)| = d_C(y) − d_C(x) = ||y − p_C(y)|| − ||x − p_C(x)|| = ||x + z − p_C(x + z)|| − ||x − p_C(x)|| ≤ ||x + z − p_C(x)|| − ||x − p_C(x)|| ≤ ||z|| = ||x − y||,

and this concludes the proof.

Next we discuss uniqueness of the metric projection of a given point. As explained below, uniqueness can only fail when B_X is not strictly convex and ∂K contains a line segment which is parallel to some line segment of ∂B_X. The metric projection onto K is said to be well-defined if it is unique for each x ∈ X.

Proposition 3.2. If (X, ||·||) is a normed space and K ⊆ X is a convex body, then a metric projection p_K(x) of an exterior point x ∉ K is not unique if and only if p_K(x) is contained in some non-degenerate line segment of ∂K which is parallel to some non-degenerate line segment in the boundary ∂B_X of the unit ball.

Proof. Assume that there are two distinct points y, z ∈ ∂K where the distance from x to K is attained, that is, with

||x − y|| = ||x − z|| = dist(x, K) := ρ.

Hence y, z ∈ B(x, ρ) ∩ K. Since B(x, ρ) ∩ K ⊆ ∂K ∩ ∂B(x, ρ), we have two distinct points in the intersection of the boundaries of two convex bodies whose interiors are disjoint. This is only possible if both boundaries contain the segment seg[y, z] connecting these points. It follows that seg[y, z] is a non-degenerate line segment of ∂K. Since seg[y, z] is also a line segment in the boundary of B(x, ρ), it corresponds to a (parallel) line segment in ∂B_X.

Corollary 3.1.
If ( X, || · || ) is a strictly convex normed space, then the metric projection ona given convex body is unique for any point x ∈ X . z ∈ ∂K . An outer normal of K at z is any outward pointing unit vector which isBirkhoff left-orthogonal to some supporting hyperplane of K at z . Proposition 3.3.
Let x ∉ K, and let p_K(x) be a metric projection of x onto K. Then the outward pointing unit vector

η_K(x) := (x − p_K(x)) / dist(x, K)

is an outer normal of K at p_K(x).

Proof. The fact that η_K(x) is Birkhoff left-orthogonal to some supporting hyperplane of K at p_K(x) comes from Theorem 3.1. Since η_K(x) is clearly a unit vector and outward pointing, we get that η_K(x) is an outer normal of K.

In what follows, we denote the line passing through z in the direction of u by z + Ru. The ray starting at z in the direction of u will be denoted by z + R_+ u. Next we investigate the "converse" of the idea that the (normalized) vector subtraction between an exterior point and its metric projection is an outer normal.

Proposition 3.4.
Let K be any convex body in X, and let x ∈ X \ K. Then p_K(x) is a metric projection of any y ∈ p_K(x) + R_+ η_K(x).

Proof. We write ρ := dist(x, K). By Theorem 3.1, let h be a hyperplane which supports both K and B(x, ρ) at p_K(x). If y ∈ p_K(x) + R_+ η_K(x), then y − p_K(x) has the direction of x − p_K(x), and hence it is Birkhoff left-orthogonal to h. Writing α := ||y − p_K(x)||, it follows that h supports B(y, α) at p_K(x). Since B(y, α) clearly lies in the same half-space determined by h as B(x, ρ), we get that K ∩ int(B(y, α)) = ∅. Consequently, we have that ||y − z|| ≥ α for all z ∈ K, and hence p_K(x) is a metric projection of y onto K (because ||y − p_K(x)|| = α).

Remark. A set with the property proved in this proposition is often called a sun (see, e.g., [7, 15, 26] and references therein).
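The sun property is easy to visualize for the Euclidean projection onto a box, where the projection is a coordinatewise clamp. The following sketch (our illustration; the helper names are ours) checks it along one ray.

```python
import math

def proj_square(x):
    """Euclidean metric projection onto K = [0,1]^2 (coordinatewise clamp)."""
    return tuple(min(max(c, 0.0), 1.0) for c in x)

x = (2.0, 3.0)
p = proj_square(x)                              # p_K(x) = (1, 1)
d = math.hypot(x[0] - p[0], x[1] - p[1])        # dist(x, K)
eta = ((x[0] - p[0]) / d, (x[1] - p[1]) / d)    # outer normal at p_K(x)

# Sun property: every y on the ray p_K(x) + R_+ eta projects back to p_K(x)
for t in (0.5, 1.0, 5.0, 20.0):
    y = (p[0] + t * eta[0], p[1] + t * eta[1])
    assert proj_square(y) == p
```

Here both coordinates of every ray point exceed 1, so the clamp always returns the corner (1, 1), exactly as the proposition predicts.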
Scholium. If z ∈ ∂K, then any point in the ray starting at z and going into the direction of an outer normal of K at z has z as a metric projection.

Proof.
Let z ∈ ∂K and assume that u is an outer normal of K at z. Let h be a supporting hyperplane of K at z which is Birkhoff right-orthogonal to u. If w = z + βu for some β > 0, then z ∈ ∂B(w, β) and w − z has the direction of u, from where h supports B(w, β) at z. Clearly, K and B(w, β) lie in distinct half-spaces determined by h, from where we get that K ∩ int(B(w, β)) = ∅. It follows that ||w − y|| ≥ β for every y ∈ K, and this shows that z is a metric projection of w onto K, because ||w − z|| = β.

We have a geometric consequence that will be useful to understand the regularity of the distance function. A parallel set (or parallel body) of a given convex body K is a set of the type

K + δB := {x + δy : x ∈ K, y ∈ B} = {x ∈ X : dist(x, K) ≤ δ},

for δ > 0. The equality above is immediate, and we will skip the proof. Also, it is easy to see that each parallel set K + δB is a convex body. If we assume that B is smooth (in the sense that it has a unique supporting hyperplane at each boundary point), then K + δB is smooth for any δ > 0, even if K is not smooth (see, e.g., [18]).

Proposition 3.5. Let K be a convex body, and let z ∈ ∂K be a boundary point. Let u_K(z) be an outer normal vector of K at z, and let δ > 0. If x = z + δu_K(z), then u_K(z) is a Birkhoff normal vector of the parallel body K + δB at x.

Proof. Denote by h the hyperplane which is Birkhoff right-orthogonal to the vector u_K(z). It suffices to prove that h supports K + δB at x. Abusing the notation a little, we assume that h is translated to pass through z. Denoting the (closed) half-spaces determined by h by h+ and h−, we may assume, without loss of generality, that K ⊆ h−. Hence we have that K + δB ⊆ (h + δu_K(z))−. Indeed, if y ∈ int(h + δu_K(z))+, then ||y − w|| > δ for any w ∈ K, and this, together with the compactness of K, leads to dist(y, K) > δ, from where we get that y ∉ K + δB. It follows that h + δu_K(z), which passes through x, supports K + δB at x, yielding that u_K(z) is a Birkhoff normal vector of K + δB at x.

We know that the distance function to a convex body is convex, and hence continuous. However, we have said nothing about the continuity of the metric projection (in case it is unique) thus far. This will be clarified next. In the last section of the paper, we will prove that the continuity of the metric projection gives that the derivative of the distance function is continuous (that is, d_K is of class C¹).

Proposition 3.6.
Let K ⊆ X be a convex body, and assume that the metric projection p_K : X → K is well-defined. Then p_K is continuous.

Proof. It is clear that p_K is continuous in the interior of K. Hence we consider the case where x ∈ X is not an interior point of K. Let (x_n)_{n∈N} be a sequence converging to x, and assume that p_K(x_n) does not converge to p_K(x). Since K is compact, it follows that there exists a subsequence p_K(x_{n_j}) converging to a point w ∈ K with w ≠ p_K(x). From the continuity of the norm and of the distance function we get

||x − w|| = lim_{j→∞} ||x_{n_j} − p_K(x_{n_j})|| = lim_{j→∞} dist(x_{n_j}, K) = dist(x, K),

from where w is a metric projection of x onto K. This contradicts the uniqueness of the metric projection. Hence p_K(x_n) → p_K(x), and this shows that p_K is continuous at x.

For simplicity, we work from now on in the vector space R^n, so that we can use standard notions of differentiability. Also, we always suppose that the norm is smooth and strictly convex, so that Birkhoff orthogonality is "well-behaved". Let f : U → R be a (Fréchet) differentiable function, where U ⊆ R^n is an open set. A number c ∈ im(f) is said to be a regular value of f if df_x is surjective for each x ∈ f⁻¹(c). It is well known that the pre-image of any regular value is a hypersurface of R^n. The gradient vector grad f(x) of f at x ∈ U is the (unique) vector such that df_x(v) = ⟨grad f(x), v⟩ for each v ∈ R^n. We say that f is a function of class C¹ if grad f(x) depends continuously on x. This definition relies on an inner product fixed in R^n, and we would like to have a definition consistent with the geometry given by an arbitrary (smooth and strictly convex) norm.

Definition 4.1.
Let U ⊆ R^n be an open set, and let f : U → R be a differentiable function. Let c be a regular value of f. The norm gradient of f at a point x ∈ f⁻¹(c) is the unique vector ∇f(x) ∈ R^n with the following properties:

(i) ∇f(x) ⊣_B T_x(f⁻¹(c)), and

(ii) df_x(∇f(x)) = ||∇f(x)||².

If c ∈ im(f) is not a regular value and x ∈ f⁻¹(c), then we simply put ∇f(x) = 0.

Remark. From now on we always use the symbol ∇ as a notation for the norm gradient, despite the fact that this notation is often used for the standard Euclidean gradient. This choice is justified because we will not work with the Euclidean gradient throughout the text.

It is not difficult to see that the norm gradient is well-defined. If c is a regular value, then f⁻¹(c) is a hypersurface, and hence it has an (n−1)-dimensional tangent space T_x(f⁻¹(c)) at any point x ∈ f⁻¹(c). This hyperplane supports the unit ball of the norm at a point which determines the Birkhoff left-orthogonal direction to T_x(f⁻¹(c)). The condition (ii) gives both the choice of an orientation and a normalization.

This definition makes sense for C¹ functions, since in this case we can define the tangent space of the pre-image of a regular value. However, we can relax this regularity hypothesis when dealing with convex functions, and we are mostly concerned with this case. Based on Proposition 2.2, we define the norm gradient of a convex function f : R^n → R at a point x ∈ R^n where f is differentiable. If x ∈ R^n is such that c = f(x) is not a global minimum, then we take the unique non-zero vector ∇f(x) such that

(a) ∇f(x) is Birkhoff left-orthogonal to the (unique) supporting hyperplane of {f ≤ c} at x, and

(b) df_x(∇f(x)) = ||∇f(x)||².

Otherwise (that is, if c is a global minimum), we put ∇f(x) = 0.

Denote by (R^n)* the dual space of R^n, and assume that ||·|| is a smooth and strictly convex norm on R^n.
The Legendre transform on (R^n, ||·||) is the map L : R^n → (R^n)* which associates to each non-zero vector x ∈ R^n the unique linear functional L(x) ∈ (R^n)* such that

(a) ker L(x) is the (unique) hyperplane which is Birkhoff right-orthogonal to x, and

(b) L(x) · x = ||x||².

We also define L(0) = 0, that is, the null functional in (R^n)*. It is immediate to observe that L is a bijection.

Remark. Notice very carefully that, for defining the Legendre transform, we do not need the norm to be strictly convex. However, we always assume this hypothesis, because we are more interested in this case.
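For the l_p norms with 1 < p < ∞ (which are smooth and strictly convex) the Legendre transform admits an explicit formula. The sketch below is our illustration, assuming the normalization L(x) · x = ||x||², which in the Euclidean case p = 2 reduces to L(x) = ⟨x, ·⟩; the functional L(x) is represented by a vector via the standard dot product.

```python
import math

def lp_norm(x, p):
    """The l_p norm of a vector."""
    return sum(abs(c) ** p for c in x) ** (1.0 / p)

def legendre_lp(x, p):
    """Explicit Legendre transform of the l_p norm (1 < p < inf), represented
    as a vector via the dot product:
        L(x)_i = ||x||_p^{2-p} * sign(x_i) * |x_i|^{p-1},
    so that L(x) . x = ||x||_p^2 and ker L(x) is the supporting hyperplane
    direction at x/||x||_p."""
    n = lp_norm(x, p)
    if n == 0.0:
        return tuple(0.0 for _ in x)
    return tuple(n ** (2 - p) * math.copysign(abs(c) ** (p - 1), c) for c in x)

p, q = 3.0, 1.5                    # q = p/(p-1) is the dual exponent
x = (1.0, -2.0, 0.5)
Lx = legendre_lp(x, p)
dot = sum(a * b for a, b in zip(Lx, x))
assert abs(dot - lp_norm(x, p) ** 2) < 1e-9        # property (b): L(x).x = ||x||^2
assert abs(lp_norm(Lx, q) - lp_norm(x, p)) < 1e-9  # norm-preserving (cf. Prop. 4.1)
```

The second assertion anticipates the norm-preservation result proved below, using the classical fact that the dual norm of ||·||_p is ||·||_q.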
Remark. If |·| is the Euclidean norm in R^n derived from the standard inner product ⟨·,·⟩, then it is easy to see that the Legendre transform of |·| is given by L(x) = ⟨x, ·⟩. That is, in this case the Legendre transform is the natural isomorphism between R^n and (R^n)* given by the standard inner product.

Lemma 4.1.
The Legendre transform is homogeneous of degree one, that is, L(αx) = αL(x) for any α ∈ R and x ∈ R^n.

Proof. Of course, we may assume that α ≠ 0 and x ≠ 0, since otherwise the claim is trivial. Let h be the hyperplane such that x ⊣_B h. Then we also have that αx ⊣_B h, meaning that the linear functionals L(αx) and αL(x) have the same kernel h. Now we observe that

L(αx) · (αx) = ||αx||² = α²||x||² = α²(L(x) · x) = (αL(x)) · (αx),

hence L(αx) and αL(x) take the same value at αx. This concludes the proof, since now we have that both linear functionals agree on a basis of R^n (namely, a basis of h together with αx).

We also have that the Legendre transform is a norm-preserving map when one considers the dual norm on (R^n)*. We prove this next.

Proposition 4.1.
Let ||·||* be the dual norm on (R^n)*, defined as ||φ||* := sup{φ(x) : x ∈ B}. The Legendre transform is a norm-preserving map of (R^n, ||·||) onto ((R^n)*, ||·||*).

Proof. By homogeneity, it suffices to prove that ||L(x)||* = 1 whenever ||x|| = 1. Let x be a unit vector, and assume that h is the hyperplane such that x ⊣_B h. It is easy to see from the definition that L(x) · y is positive if and only if y lies in the open half-space determined by h which contains x (h+, say). Also, h supports the unit ball at x, and consequently any vector y ∈ B ∩ h+ can be written as

y = (x + z)/||x + z||

for some z ∈ h. Since ||x + z|| ≥ ||x||, we get

L(x) · y = L(x) · (x + z)/||x + z|| = (L(x) · x)/||x + z|| = ||x||²/||x + z|| ≤ ||x|| = 1,

and hence ||L(x)||* = sup{L(x) · v : v ∈ B} ≤ 1. On the other hand, x ∈ B and L(x) · x = ||x||² = 1, and this yields ||L(x)||* = 1 = ||x||.

Our definition of the Legendre transform may be a little intricate to work with, but it has the advantage of not demanding any differentiability properties of the norm (only the geometric features of smoothness and strict convexity). Actually, our approach will later give an easy proof of the fact that norms with these geometric properties must be of class C¹. Even without assuming differentiability properties a priori, we can guarantee that the Legendre transform is continuous.

Proposition 4.2.
Let ||·|| be a smooth and strictly convex norm on R^n. Then the associated Legendre transform L : (R^n, ||·||) → ((R^n)*, ||·||*) is continuous.

Proof. First, observe that continuity at the origin comes immediately from the fact that the Legendre transform is norm-preserving. Also, since the Legendre transform is homogeneous of degree one, it suffices to show that for any sequence (x_n)_{n∈N} of unit vectors converging to a (unit) vector x we have that L(x_n) → L(x) in the dual norm as n → ∞. We divide the proof into steps.

First step.
We prove that for each fixed v ∈ R^n we have the convergence L(x_n) · v → L(x) · v as n → ∞ (pointwise convergence). Denote by h_n the supporting hyperplane of B at x_n, and by h the supporting hyperplane of B at x. For each n ∈ N, we can write v = α_n x_n + z_n with α_n ∈ R and z_n ∈ h_n. Similarly, we write v = αx + z, with z ∈ h. We claim that, as n → ∞, we have α_n → α (and, as a consequence, z_n → z). First, observe that

||v|| = ||α_n x_n + z_n|| = |α_n| · ||x_n + z_n/α_n|| ≥ |α_n| · ||x_n|| = |α_n|

whenever α_n ≠ 0, where the last inequality comes from x_n ⊣_B z_n. This shows that (α_n) is bounded. Suppose that some subsequence α_{n_k} → β ≠ α. Then z_{n_k} → v − βx, and since Birkhoff orthogonality is a continuous relation, it follows that x ⊣_B (v − βx); by smoothness this gives v − βx ∈ h. Therefore, we would have two distinct decompositions of v in the direct sum span{x} ⊕ h, which is impossible. Consequently, every converging subsequence of (α_n) goes to α. This, together with the fact that (α_n) is bounded, yields that α_n → α.

Thus, we estimate

|L(x_n) · v − L(x) · v| = |L(x_n) · (α_n x_n + z_n) − L(x) · (αx + z)| = |α_n − α|,

and the latter goes to 0 as n → ∞. This concludes the first step.

Second step.
We show that L(x_n) → L(x) in the dual norm up to a subsequence. Consider the sequence of (continuous) functions F = (L(x_n)|_B)_{n∈N}, where |_B means that we restrict the domain to B. Since ||L(x_n)||* = ||x_n|| = 1 for every n ∈ N, this family is uniformly bounded. We also have that F is equicontinuous, because

|L(x_n) · v − L(x_n) · w| ≤ ||L(x_n)||* ||v − w|| = ||v − w||

for any n ∈ N and every v, w ∈ B. Noticing that B is compact, the Arzelà–Ascoli theorem gives that there exists a subsequence (L(x_{n_k})|_B)_{k∈N} such that L(x_{n_k})|_B → φ uniformly on B (that is, in the dual norm) for some continuous function φ : B → R. Since convergence in the dual norm implies pointwise convergence, it follows from the first step of the proof that φ = L(x)|_B. Hence L(x_{n_k}) → L(x) in the dual norm.

Third step.
We prove that L(x_n) → L(x) with respect to ||·||_* (that is, we can guarantee the convergence of the original sequence, without needing to pass to a subsequence). We proceed by contradiction. If this convergence does not hold, then there exist a number ε > 0 and a subsequence (x_{n_j})_{j∈N} such that ||L(x_{n_j}) − L(x)||_* > ε for every j ∈ N. Hence there exists a sequence (v_{n_j}) of vectors of B such that

|L(x_{n_j})·v_{n_j} − L(x)·v_{n_j}| > ε.

Passing to a subsequence if necessary, we can assume that v_{n_j} → v for some vector v ∈ B (recall that B is compact). However, the left-hand side of the inequality above can be made arbitrarily small as j → ∞. This comes from the estimate

|L(x_{n_j})·v_{n_j} − L(x)·v_{n_j}| ≤ |L(x_{n_j})·v_{n_j} − L(x_{n_j})·v| + |L(x_{n_j})·v − L(x)·v| + |L(x)·v − L(x)·v_{n_j}| ≤ ||L(x_{n_j})||_* ||v_{n_j} − v|| + |L(x_{n_j})·v − L(x)·v| + ||L(x)||_* ||v − v_{n_j}|| = 2||v − v_{n_j}|| + |L(x_{n_j})·v − L(x)·v|,

where the reader may observe that, as a consequence of the first step of the proof, the last term converges to 0 as j → ∞.

Let ((R^n)**, ||·||_**) be the bi-dual of R^n, which is the dual space of (R^n)*, endowed with the norm ||·||_** := (||·||_*)_*, that is, the dual norm of ||·||_*. Let J: R^n → (R^n)** be the canonical identification given by J(x)·φ = φ(x), for any x ∈ R^n and every φ ∈ (R^n)*. It is well known that J is a norm-preserving isomorphism. Next we discuss the duality of the Legendre transform.

Proposition 4.3.
Denote by L_* the Legendre transform of ((R^n)*, ||·||_*) onto (R^n)**, and let φ ∈ (R^n)*. Then L_*(φ) = J(x) if and only if φ = L(x). In other words, L_* = J ∘ L^{-1}.

Proof. Assume first that L_*(φ) = J(x). Since J is linear, and the Legendre transform is always homogeneous of degree one, we may assume that ||x|| = 1 (observe also that the case x = 0 is trivial). Noticing that L_* and J are norm-preserving, we have

φ(x) = J(x)·φ = L_*(φ)·φ = ||φ||_*² and ||φ||_* = ||L_*(φ)||_** = ||J(x)||_** = ||x|| = 1.

It follows that 1 = φ(x) = ||φ||_* = sup{φ(y) : y ∈ B}, that is, the dual norm of φ is attained at x. In particular, we have that L(x)·x = φ(x). For any z ∈ ker(φ) the inequality

1 ≥ φ((x + tz)/||x + tz||) = 1/||x + tz||

holds for any t ∈ R. Thus ||x + tz|| ≥ ||x|| for every t ∈ R, meaning that x ⊣_B z. Denoting by h the hyperplane which supports B at x, it follows that ker(φ) ⊆ h. Since both are (n−1)-dimensional, we get ker(φ) = h, and this proves that φ = L(x).

Now assume that L(x) = φ, and still consider that ||x|| = 1. Since the Legendre transform is norm-preserving, we have ||φ||_* = ||L(x)||_* = ||x|| = 1. Also,

J(x)·φ = φ(x) = L(x)·x = ||x||² = 1.

On the other hand,

L_*(φ)·φ = ||φ||_*² = 1,

from where we get that J(x)·φ = L_*(φ)·φ. It remains to prove that J(x) and L_*(φ) have the same kernel. If ψ ∈ ker(J(x)), then 0 = J(x)·ψ = ψ(x), and this leads to

||φ + tψ||_* ≥ φ(x) + tψ(x) = φ(x) = 1 = ||φ||_*

for any t ∈ R. Thus ψ is a vector of the supporting hyperplane of the dual unit ball B_* at φ, and hence ψ ∈ ker(L_*(φ)). It follows that ker(J(x)) ⊆ ker(L_*(φ)), and using again the fact that both are (n−1)-dimensional, we conclude that L_*(φ) = J(x).

Remark. What we have proved is that, up to the canonical identification between R^n and (R^n)**, the Legendre transform is self-dual.
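As a concrete illustration (not part of the paper's argument), for the ℓ_p norms the Legendre transform admits a closed form, which makes the defining properties and the self-duality easy to check numerically. The expression used for L below is an assumption derived from the normalization L(x)·x = ||x||² and ||L(x)||_* = ||x||; the helper names `lp_norm` and `legendre` are hypothetical.

```python
import numpy as np

def lp_norm(x, p):
    """The lp norm on R^n (smooth and strictly convex for 1 < p < infinity)."""
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

def legendre(x, p):
    """Assumed closed form of the Legendre transform for the lp norm:
    L(x) = ||x||_p^(2-p) * (|x_i|^(p-1) sign(x_i))_i, identified with a
    vector acting through the standard dot product."""
    return lp_norm(x, p) ** (2 - p) * np.abs(x) ** (p - 1) * np.sign(x)

p = 4.0
q = p / (p - 1)                      # dual exponent: ||.||_* = ||.||_q
rng = np.random.default_rng(0)
x = rng.normal(size=3)
phi = legendre(x, p)

assert np.isclose(phi @ x, lp_norm(x, p) ** 2)      # L(x)·x = ||x||^2
assert np.isclose(lp_norm(phi, q), lp_norm(x, p))   # ||L(x)||_* = ||x||
# self-duality (Proposition 4.3): the Legendre transform of the dual
# space sends L(x) back to x, after the canonical identification J
assert np.allclose(legendre(phi, q), x)
```

Any nonzero x and any 1 < p < ∞ should pass all three checks, since (p−1)(q−1) = 1 makes the composition of the two closed forms collapse to the identity.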
Notice that the proof relies heavily on the fact that the kernel of a non-zero functional is precisely the hyperplane which supports the unit ball at the boundary point where its dual norm is attained.

Corollary 4.1. The inverse of the Legendre transform of a smooth and strictly convex normed space is continuous.

Proof.
Simply observe that L^{-1} = J^{-1} ∘ L_*, and the latter is a composition of continuous maps (L_* is continuous because it is a Legendre transform).

Next we give a characterization of the norm gradient of a differentiable convex function.

Theorem 4.1.
Let f: R^n → R be a convex function differentiable at a point x ∈ R^n. Then

f(y) − f(x) ≥ L(∇f(x))·(y − x)    (4.1)

for any y ∈ R^n. The converse is also true: the norm gradient ∇f(x) is the unique vector for which this inequality holds for every y ∈ R^n.

Proof. We have to use some machinery from the theory of convex functions. First of all, we state and prove an inequality which “captures” the convexity in terms of one-sided derivatives. We have

f(x + w) − f(x) ≥ f'_+(x, w)    (4.2)

for any x, w ∈ R^n. To prove this inequality, observe that for any ε ∈ (0, 1) the inequality

f(x + εw) − f(x) = f(ε(x + w) + (1 − ε)x) − f(x) ≤ εf(x + w) + (1 − ε)f(x) − f(x) = ε(f(x + w) − f(x))

holds, yielding

f(x + w) − f(x) ≥ (f(x + εw) − f(x))/ε.

Hence, letting ε → 0^+, we get (4.2).

For simplicity of notation, we write v = ∇f(x) and c = f(x). First assume that c is not a global minimum of f. Then v ≠ 0, and there is a unique hyperplane h supporting {f ≤ c} at x. Consequently, we have the direct sum

R^n = h ⊕ span{v},

and since any translation is bijective, we get that any point y ∈ R^n can be written in the form

y = λv + x + z

for some λ ∈ R and some z ∈ h. From inequality (4.2) and the linearity of f'_+(x, ·) we get

f(y) − f(x) ≥ f'_+(x, λv + z) = λf'_+(x, v) + f'_+(x, z).    (4.3)

Since f'_+(x, ·) is the Fréchet derivative of f at x, we get from the definition of the norm gradient that f'_+(x, v) = df_x(v) = ||v||².
On the other hand, since z ∈ h we have that, for any ε ∈ R, the point x + εz lies in the hyperplane which supports {f ≤ c} at x. It follows that f(x + εz) ≥ c = f(x) for any ε > 0, and hence

f'_+(x, z) = lim_{ε→0^+} (f(x + εz) − f(x))/ε ≥ 0.

Plugging this inequality into (4.3), we get

f(y) − f(x) ≥ λf'_+(x, v) = λ||v||².    (4.4)

Now we observe that

L(v)·(y − x) = L(v)·(λv + z) = λL(v)·v + L(v)·z.

From the definition of the Legendre transform we have that L(v)·z = 0, since z is a vector of the hyperplane to which v is Birkhoff left-orthogonal, and L(v)·v = ||v||². Consequently, L(v)·(y − x) = λ||v||². Together with (4.4) this gives the desired inequality.

Now we assume that c = f(x) is a global minimum of f, and we have ∇f(x) = 0 by definition. As a consequence, we get that f(y) − f(x) ≥ 0 = L(∇f(x))·(y − x) for any y ∈ R^n.

For the converse, suppose first that c = f(x) is not the global minimum of f, and assume that v ∈ R^n is such that f(y) − f(x) ≥ L(v)·(y − x) for every y ∈ R^n. Notice that we must have v ≠ 0 (otherwise c would be a global minimum), and let h be the hyperplane which is Birkhoff right-orthogonal to v, translated to pass through x. We claim that h supports the sub-level set {f ≤ c} at x. Indeed, if this is not true, then we can take a point y ∈ h ∩ int{f ≤ c} = h ∩ {f < c}. It follows that v ⊣_B y − x, which means that

L(v)·(y − x) = 0.

Thus,

f(y) − f(x) < c − f(x) = c − c = 0 = L(v)·(y − x),

and this contradiction shows that v ⊣_B h. It still remains to prove that df_x(v) = ||v||². First we notice that for any λ > 0,

f(x + λv) − f(x) ≥ L(v)·(λv) = λ||v||²,

from which we get

df_x(v) = f'_+(x, v) = lim_{λ→0^+} (f(x + λv) − f(x))/λ ≥ ||v||².
For the reverse inequality we observe that for λ > 0

f(x − λv) − f(x) ≥ L(v)·(−λv) = −λ||v||²

holds, and hence

||v||² ≥ lim_{λ→0^+} (f(x − λv) − f(x))/(−λ) = df_x(v).

Finally, assume that c = f(x) is a global minimum of f, and let v ∈ R^n be a vector such that f(y) − f(x) ≥ L(v)·(y − x) for every y ∈ R^n. Since c is a global minimum, we have that df_x = 0, and then, in particular, df_x(v) = 0. For any λ > 0 we have f(x + λv) − f(x) ≥ L(v)·(λv) = λ||v||², whence

||v||² ≤ lim_{λ→0^+} (f(x + λv) − f(x))/λ = f'_+(x, v) = df_x(v) = 0,

implying that v = 0. The proof is complete.

Notice carefully that inequality (4.1) provides an equivalent definition for the norm gradient of a convex function f at a point x where f is differentiable. Inspired by that, we extend this notion to a convex function f: R^n → R which is not necessarily differentiable. We say that v ∈ R^n is a norm sub-gradient of f at x if

f(y) − f(x) ≥ L(v)·(y − x)    (4.5)

for any y ∈ R^n. For each x ∈ R^n, the set ∂f(x) of the norm sub-gradients of f at x is called the norm sub-differential of f at x. It is immediate to check that ∂f(x) is always closed. It is also easy to see that f(x) = c is a global minimum of f if and only if 0 ∈ ∂f(x). At this point, we warn the reader that other generalizations of the concept of sub-gradient have been studied, also in view of differentiability properties (see, e.g., [5] and [9]). As in the Euclidean case, we have a characterization of the norm sub-gradients in terms of one-sided derivatives (see [6], for example).

Lemma 4.2.
Let f: R^n → R be a convex function, and let x ∈ R^n. We have that v ∈ ∂f(x) if and only if f'_+(x, u) ≥ L(v)·u for every u ∈ R^n.

Proof. Assume first that v ∈ ∂f(x). Then for each u ∈ R^n and any λ > 0 we have

f(x + λu) − f(x) ≥ L(v)·(λu) = λL(v)·u.

It follows that

(f(x + λu) − f(x))/λ ≥ L(v)·u

for any λ > 0. Letting λ → 0^+, we get the desired inequality.

Now assume that v ∈ R^n is a vector such that f'_+(x, u) ≥ L(v)·u for every u ∈ R^n. If y ∈ R^n, then from inequality (4.2) we get

f(y) − f(x) ≥ f'_+(x, y − x) ≥ L(v)·(y − x),

and this concludes the proof.

Theorem 4.2. If f: R^n → R is a convex function, then for any x ∈ R^n we have that ∂f(x) ≠ ∅ and

f'_+(x, u) = max{L(w)·u : w ∈ ∂f(x)}

for each u ∈ R^n.

Proof. By the previous lemma, we already have that the inequality

max{L(w)·u : w ∈ ∂f(x)} ≤ f'_+(x, u)

holds for any u ∈ R^n. Hence we only have to show that for an arbitrarily given vector u ∈ R^n there exists a vector w ∈ ∂f(x) such that L(w)·u = f'_+(x, u).

To prove that, we first assume that u ≠ 0. Choose a basis {e_1, ..., e_n} of R^n with the property that e_1 = u. Let g(·) := f'_+(x, ·), and define recursively functions g_1, ..., g_n by setting g_m(·) := (g_{m−1})'_+(e_m, ·), with g_0 := g. Each of these functions (including g) is sub-linear, and from property (d) of Lemma 2.1 we have that

lin(g_m) ⊇ lin(g_{m−1}) + span{e_m}

for each m = 1, ..., n. From property (e) of Lemma 2.1 we have that g_n is a linear functional. Since the Legendre transform is a bijection, there exists w ∈ R^n such that g_n = L(w). We claim that w is a norm sub-gradient such that L(w)·u = f'_+(x, u). To check that, we first observe that by property (b) of Lemma 2.1 the inequalities g ≥ g_1 ≥ ... ≥ g_n hold. This yields

L(w)·(y − x) = g_n(y − x) ≤ g(y − x) = f'_+(x, y − x) ≤ f(y) − f(x)

for any y ∈ R^n, where the last inequality comes from (4.2). This shows that w ∈ ∂f(x) and, in particular, we also get that ∂f(x) ≠ ∅ (notice that this construction holds true for any u ≠ 0).
Finally, from properties (a) and (b) of Lemma 2.1 it follows that

g_n(u) ≤ g(u) = −g'_+(u, −u) = −g'_+(e_1, −u) = −g_1(−u) ≤ −g_n(−u) = g_n(u),

from where we get that L(w)·u = g_n(u) = g(u) = f'_+(x, u). This concludes the proof of the case u ≠ 0. If u = 0, then any w ∈ ∂f(x) satisfies L(w)·u = 0 = f'_+(x, u). Since we already have that ∂f(x) ≠ ∅, the proof is finished.

Remark. This proof appears for the Euclidean sub-case in [6, Theorem 3.1.8]. The reader may notice that the proof constructs a linear functional rather than a sub-gradient. Only after that do we identify this linear functional with a vector. In the classical theory, this is done via an inner product; in our case we use the Legendre transform.

The next and important corollary is also an immediate analogue of the Euclidean sub-case. It states that we can “detect” differentiability of a convex function at a given interior point of its domain by looking at the norm sub-differential at this point.
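The formula of Theorem 4.2 can be probed numerically in the Euclidean plane, where the Legendre transform is the identity and the norm sub-differential coincides with the classical one. The sketch below is only an illustration with a hypothetical non-differentiable convex function f(x) = |x_1| + x_2²; it compares a finite-difference one-sided derivative with the maximum of L(w)·u over a sampled piece of ∂f(x).

```python
import numpy as np

# f(x) = |x1| + x2**2 is convex but not differentiable where x1 = 0
f = lambda x: abs(x[0]) + x[1] ** 2

def one_sided(f, x, u, h=1e-6):
    """Finite-difference approximation of the one-sided derivative f'_+(x, u)."""
    return (f(x + h * u) - f(x)) / h

x = np.array([0.0, 1.5])
# With the Euclidean ambient norm the Legendre transform is the identity,
# so the norm sub-differential at x is {(s, 2*x2) : -1 <= s <= 1}
subgrads = [np.array([s, 2 * x[1]]) for s in np.linspace(-1, 1, 201)]

for u in [np.array([1.0, 0.0]), np.array([-1.0, 2.0]), np.array([0.3, -1.0])]:
    lhs = one_sided(f, x, u)
    rhs = max(w @ u for w in subgrads)   # max of L(w)·u, with L = identity
    assert np.isclose(lhs, rhs, atol=1e-4)
```

In each direction the one-sided derivative equals the maximum of the sampled pairings, as Theorem 4.2 predicts for the classical sub-differential.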
Corollary 4.2.
A convex function f: R^n → R is differentiable at x ∈ R^n if and only if ∂f(x) is a singleton. In this case, the unique norm sub-gradient is the norm gradient, and we have df_x(u) = L(∇f(x))·u for every u ∈ R^n.

Proof. The uniqueness part of Theorem 4.1 already gives that if f is differentiable at x, then ∇f(x) is the unique norm sub-gradient of f at x. Hence it remains to prove the converse. Assume that ∂f(x) = {v}. Then for any u ∈ R^n we have

f'_+(x, −u) = L(v)·(−u) = −L(v)·u = −f'_+(x, u),

meaning that every u ∈ R^n is a linearity direction of f'_+(x, ·), which is, therefore, a linear map. It follows from Proposition 2.1 that f is differentiable at x.

To finish the proof, just notice that if f is differentiable at x ∈ R^n, then ∂f(x) = {∇f(x)}, and consequently

df_x(u) = f'_+(x, u) = max{L(w)·u : w ∈ ∂f(x)} = L(∇f(x))·u

for each u ∈ R^n.

Remark. We have defined functions of class C¹ as differentiable functions whose respective Euclidean gradients are continuous. This is clearly the same as demanding that their respective differential maps are continuous as maps of R^n into (R^n)*. Namely, this follows easily from the fact that the identification between R^n and (R^n)* given by the standard inner product is linear (and hence continuous). Since the Legendre transform and its inverse are also continuous, it follows from the last corollary that a convex differentiable function is of class C¹ if and only if its norm gradient is continuous, for any norm.

For the sake of completeness, we state and prove two other properties of the norm sub-differential which are completely analogous to those of the usual sub-differential. One of them is a way to “detect” convexity via the norm sub-differential, and the other one is a version of Rockafellar's theorem.

Proposition 4.4.
A function f: (R^n, ||·||) → R is convex if and only if ∂f(x) ≠ ∅ for every x ∈ R^n.

Proof. We already know that if f is convex, then ∂f(x) is non-empty for every x ∈ R^n (from Theorem 4.2). Hence we prove the converse. Let x, y ∈ R^n and λ ∈ [0, 1], and take w ∈ ∂f((1−λ)x + λy). Therefore,

f(x) − f((1−λ)x + λy) ≥ L(w)·(x − (1−λ)x − λy) = λL(w)·(x − y)

and

f(y) − f((1−λ)x + λy) ≥ L(w)·(y − (1−λ)x − λy) = (1−λ)L(w)·(y − x).

Multiplying the first inequality by (1−λ), the second one by λ, and adding both, we get that

f((1−λ)x + λy) ≤ (1−λ)f(x) + λf(y),

and this proves that f is convex.

In what follows, we also regard the norm sub-differential of a convex function f as the set ∂f ⊆ R^n × R^n given by

∂f := {(x, w) ∈ R^n × R^n : w ∈ ∂f(x)}.

A subset S ⊆ R^n × R^n is said to be norm cyclically monotonic if for every m ∈ N and any subset {(x_1, w_1), ..., (x_m, w_m)} ⊆ S the number

L(w_1)·(x_2 − x_1) + L(w_2)·(x_3 − x_2) + ... + L(w_{m−1})·(x_m − x_{m−1}) + L(w_m)·(x_1 − x_m)

is non-positive. If f is a convex function, then it is clear that any finite subset {(x_1, w_1), ..., (x_m, w_m)} of ∂f is norm cyclically monotonic, because

L(w_1)·(x_2 − x_1) + ... + L(w_m)·(x_1 − x_m) ≤ f(x_2) − f(x_1) + f(x_3) − f(x_2) + ... + f(x_1) − f(x_m) = 0.

For the converse statement we have to consider convex functions taking values in (−∞, +∞]. A convex function f: R^n → (−∞, +∞] is said to be proper if {f = +∞} ≠ R^n.

Theorem 4.3.
A non-empty set S ⊆ R^n × R^n is norm cyclically monotonic if and only if there exists a proper convex function f: R^n → (−∞, +∞] such that S ⊆ ∂f.

Proof. We already know that if S ⊆ ∂f for some proper convex function f, then S is norm cyclically monotonic. Thus, we prove the converse. Let S ⊆ R^n × R^n be norm cyclically monotonic. We fix (x_0, w_0) ∈ S and define f: R^n → (−∞, +∞] by

f(x) = sup{L(w_m)·(x − x_m) + L(w_{m−1})·(x_m − x_{m−1}) + ... + L(w_0)·(x_1 − x_0)},

where the supremum is taken over all values of m ∈ N and every possible choice of pairs (x_j, w_j) ∈ S for j = 1, ..., m. The function f defined this way is the supremum of affine functions, and hence it is a convex function (see [28, Chapter 1]). Moreover, since S is norm cyclically monotonic, it follows that f(x_0) = 0, and then f is proper.

Now let (x, w) ∈ S. We have to prove that w is a norm sub-gradient of f at x. For this purpose, choose a number α < f(x). Hence there exist pairs (x_1, w_1), ..., (x_m, w_m) such that

α < L(w_m)·(x − x_m) + L(w_{m−1})·(x_m − x_{m−1}) + ... + L(w_0)·(x_1 − x_0).

Putting (x_{m+1}, w_{m+1}) = (x, w), it also comes from the definition of f that

f(y) ≥ L(w_{m+1})·(y − x_{m+1}) + L(w_m)·(x_{m+1} − x_m) + ... + L(w_0)·(x_1 − x_0)

for every y ∈ R^n. These two inequalities immediately yield

f(y) > α + L(w_{m+1})·(y − x_{m+1}) = α + L(w)·(y − x).

Since this holds for any α < f(x) and every y ∈ R^n, we have that f(y) ≥ f(x) + L(w)·(y − x) for each y ∈ R^n. It follows that w ∈ ∂f(x), and the proof is done.

Remark. The proof of the Euclidean case is given in [28, Theorem 1.5.16].

Here a disclaimer is due, namely on the reasons why all of this “works so well” when we only have a (smooth and strictly convex) norm to work with. The idea behind sub-gradients is to “control” the one-sided derivatives using linear functionals.
As it is pointed out in Remark 4.5, in the classical theory these functionals are simply identified with vectors via the standard inner product of R^n. When we have a norm, we can still identify vectors with linear functionals in a way which is coherent with the norm of the domain by using the Legendre transform.

Things get more interesting when we adopt the geometric point of view. In the classical theory, sub-differentials are related to normal cones of sub-level sets. The natural question that arises is whether this is also true in our case, when one replaces the standard normality notion (combined with the standard inner product) by Birkhoff orthogonality. The next theorems are devoted to answering this question.

Theorem 4.4. Let x ∈ R^n and c = f(x). Assume that c is not a global minimum of f, and let v ∈ R^n be a norm sub-gradient of f at x. Then the hyperplane h which is Birkhoff right-orthogonal to v supports {f ≤ c} at x. Moreover, we have the estimates

sup{f'_−(x, v + z) : z ∈ h} ≤ ||v||² ≤ inf{f'_+(x, v + z) : z ∈ h}.    (4.6)

Remark. If v ≠ 0 and (i) holds, then one can easily see that f'_−(x, v) and f'_+(x, v) are positive numbers.

Proof.
By abuse of notation, we assume that h is translated to pass through x. If h does not support {f ≤ c} at x, then one may take a point y ∈ h ∩ {f < c}. Since y − x ∈ h and v ⊣_B h, we have that L(v)·(y − x) = 0. Since f(y) < c = f(x), we get f(y) − f(x) < L(v)·(y − x), which contradicts the fact that v is a sub-gradient of f at x. This proves that h supports {f ≤ c} at x.

The estimates come from Lemma 4.2. For any z ∈ h we have that

f'_+(x, v + z) ≥ L(v)·(v + z) = L(v)·v = ||v||²

and

f'_−(x, v + z) = −f'_+(x, −v − z) ≤ −L(v)·(−v − z) = ||v||²,

and this concludes the proof.

The natural question that arises here is whether the converse holds true, namely the following implication: if a vector v is Birkhoff left-orthogonal to some hyperplane which supports {f ≤ c} at x, and if (4.6) also holds, is v then a norm sub-gradient of f? To answer this question (positively), we first need to check that the estimates (4.6) really “make sense”.

Proposition 4.5.
Let f: R^n → R be a convex function, and assume that f(x) = c is not a global minimum. If h is a supporting hyperplane of {f ≤ c} at x, and if v ⊣_B h, then

f'_−(x, v + z_1) ≤ f'_+(x, v + z_2)

for any z_1, z_2 ∈ h. As a consequence, we have that

sup{f'_−(x, v + z) : z ∈ h} ≤ inf{f'_+(x, v + z) : z ∈ h}.

Proof.
Suppose that there exist z_1, z_2 ∈ h such that

f'_−(x, v + z_1) > f'_+(x, v + z_2).

This yields

f'_+(x, z_2 − z_1) ≤ f'_+(x, v + z_2) + f'_+(x, −v − z_1) = f'_+(x, v + z_2) − f'_−(x, v + z_1) < 0,

and this is a contradiction because z_2 − z_1 ∈ h. Indeed, since h supports {f ≤ c} at x, we have that, at x, f is non-decreasing in the direction of z_2 − z_1.

It follows from Proposition 4.5 that if h supports {f ≤ c} at x, then we can always choose a vector v ⊣_B h for which (4.6) holds. Next we will prove that such a vector is a norm sub-gradient. This guarantees the existence of norm sub-gradients pointing in all Birkhoff outer normal directions of {f ≤ c} at x.

Theorem 4.5. Let f: R^n → R be a convex function. As usual, assume that f(x) = c is not a global minimum, and that the hyperplane h supports {f ≤ c} at x. If v ∈ R^n is a vector such that

(i) v ⊣_B h and

(ii) sup{f'_−(x, v + z) : z ∈ h} ≤ ||v||² ≤ inf{f'_+(x, v + z) : z ∈ h},

then v is a norm sub-gradient of f at x.

Proof. First observe that, by Remark 4.8, v is non-zero. We have to prove that f(y) − f(x) ≥ L(v)·(y − x). Any y ∈ R^n can be written as y = x + z + λv for some λ ∈ R and some z ∈ h. Assuming first that λ > 0, we have

f(y) − f(x) = f(x + z + λv) − f(x) ≥ f'_+(x, z + λv) = λf'_+(x, v + z/λ) ≥ λ||v||² = L(v)·(λv) = L(v)·(z + λv) = L(v)·(y − x),

where the first inequality comes from (4.2). We also used the definition of the Legendre transform. If λ < 0, we have

f(y) − f(x) ≥ f'_+(x, z + λv) = −λf'_+(x, −v − z/λ) = λf'_−(x, v + z/λ) ≥ λ||v||² = L(v)·(z + λv) = L(v)·(y − x).

It only remains to prove the case λ = 0. To do so, we recall that f'_+(x, z) ≥ 0, because z is a vector of a supporting hyperplane of {f ≤ c} at x. Hence

f(y) − f(x) = f(x + z) − f(x) ≥ f'_+(x, z) ≥ 0 = L(v)·z = L(v)·(y − x),

and the proof is complete.

Corollary 4.3.
Under the same conditions as in the previous theorem, let u be a unit outward pointing vector which is Birkhoff orthogonal to a supporting hyperplane h of {f ≤ c} at x. If

sup{f'_−(x, u + z) : z ∈ h} ≤ λ ≤ inf{f'_+(x, u + z) : z ∈ h},

then λu ∈ ∂f(x).

Proof. We have to prove that conditions (i) and (ii) of Theorem 4.5 hold for v = λu. The first is obvious, because Birkhoff orthogonality is homogeneous. For the second, recall that λ > 0, and that

λ · sup{f'_−(x, u + z) : z ∈ h} = sup{f'_−(x, λu + z) : z ∈ h},

and that the same holds for the infimum. Then

sup{f'_−(x, λu + z) : z ∈ h} ≤ λ² = ||λu||² ≤ inf{f'_+(x, λu + z) : z ∈ h},

which is condition (ii) of Theorem 4.5.

As we shall see later, Theorem 4.4 is a key ingredient in proving that the distance function to a convex body in a (smooth) normed space is differentiable outside the body. But it can also be used to give an easy proof of a well-known result regarding the regularity of norms.

Theorem 4.6. Let ρ: R^n → R be a norm. If its unit ball B is smooth, then ρ is C¹ on R^n \ {0}. Moreover, its norm gradient is given by

∇ρ(x) = x/ρ(x)

for each x ∈ R^n \ {0}.

Proof. First notice that ρ is a sub-linear function, and hence it is convex. Thus, to show that ρ is differentiable at a given point x, it suffices to prove that the norm sub-differential of ρ at x contains a unique element. Let x ∈ R^n \ {0} be such that ρ(x) = c (> 0). The sub-level set {ρ ≤ c} is the ball cB, which by the smoothness hypothesis is supported at x by a unique hyperplane (h, say). Thus, given v ∈ ∂ρ(x) we have v ⊣_B h, meaning that v = βx for some β ∈ R. Since

(d/dt) ρ(x ± tx)|_{t=0} = ±ρ(x),

we have that ρ'_+(x, x) = ρ(x) = −ρ'_+(x, −x). With Lemma 4.2 this yields

ρ(x) = ρ'_+(x, x) ≥ L(v)·x = L(βx)·x = βρ(x)²

and

−ρ(x) = ρ'_+(x, −x) ≥ L(v)·(−x) = −L(βx)·x = −βρ(x)².

These inequalities give that β = 1/ρ(x).
It follows that ∂ρ(x) is a singleton, and hence ρ is differentiable at x. The unique element of ∂ρ(x) is the norm gradient

∇ρ(x) = βx = x/ρ(x),

which is clearly continuous. It follows that ρ is C¹ on R^n \ {0}.

Remark. Notice that by the continuity of the Legendre transform we have that if the norm gradient is continuous (as a map into R^n), then the Euclidean gradient is also continuous. Indeed, the Euclidean gradient is the image of the norm gradient under the composition of the Legendre transform with the isomorphism between R^n and (R^n)* given by the standard inner product.

In some fields (such as Finsler geometry, for example) it is more common to define the Legendre transform by means of the derivative of the norm. However, in the beginning we did not assume differentiability of the norm, but only strict convexity and smoothness. The last theorem states that, under these hypotheses, the norm is indeed differentiable, and hence we can now characterize the Legendre transform by means of the derivative of the norm.

Corollary 4.4.
Let ρ be a smooth norm on R^n. The associated Legendre transform can be written as

L(x) = ρ(x)·dρ_x(·)

for each x ∈ R^n.

Proof. If x = 0, then there is nothing to prove. Hence we assume that x ≠ 0 and put c = ρ(x). With B as the unit ball of ρ, let h be the hyperplane such that x ⊣_B h. If z ∈ h, then we may take a differentiable curve γ: I ⊆ R → c·∂B such that γ(0) = x and γ'(0) = z. Thus,

ρ(x)·dρ_x(z) = ρ(x)·(d/dt)(ρ ∘ γ)(t)|_{t=0} = 0 = L(x)·z,

because ρ ∘ γ(t) = c for every t. Finally, we calculate

ρ(x)·dρ_x(x) = ρ(x)·(d/dt) ρ(x + tx)|_{t=0} = ρ(x)² = L(x)·x.

Remark. Having the differentiability of the norm (except at the origin, of course) a priori, the Legendre transform can be equivalently defined as

L(x)·v := (1/2)(d/dt) ρ²(x + tv)|_{t=0}

for any x, v ∈ R^n. This approach was taken in [3], for example. Even more, in [16] the authors study the map [v, x] := L(x)·v, which they call a semi-inner product.

To finish this section, we relate norm sub-differentials with normal cones of sub-level sets at boundary points. Let K be a convex body, and let x be one of its boundary points. The (Birkhoff) normal cone of K at x, denoted by NC(K, x), is the set of all outward pointing vectors which are Birkhoff left-orthogonal to some hyperplane that supports K at x, together with the zero vector (for other normal cones appearing in the geometry of Banach spaces we refer the reader to [25]). Since an outward pointing unit vector which is Birkhoff left-orthogonal to some supporting hyperplane of K at x is called an outer normal, and since Birkhoff orthogonality is homogeneous, we have

NC(K, x) = R_+ · {outer normals of K at x},

where R_+ · A := {λa : λ ≥ 0, a ∈ A}.
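The characterization of L by the derivative of the norm (Corollary 4.4 and the remark above) can be checked numerically. The sketch below is an illustration only: it uses the ℓ_p norm, an assumed closed form for its Legendre transform derived from the normalization L(x)·x = ||x||², and a central finite difference for (1/2) d/dt ρ²(x + tv)|_{t=0}.

```python
import numpy as np

def lp_norm(x, p):
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

def legendre(x, p):
    # assumed closed form for the lp norm, normalized so that L(x)·x = ||x||^2
    return lp_norm(x, p) ** (2 - p) * np.abs(x) ** (p - 1) * np.sign(x)

p = 3.0
rng = np.random.default_rng(1)
x, v = rng.normal(size=3), rng.normal(size=3)

# L(x)·v = (1/2) d/dt rho^2(x + t v) |_{t=0}, via a central difference;
# the divisor 4h already contains the leading factor 1/2
h = 1e-6
deriv = (lp_norm(x + h * v, p) ** 2 - lp_norm(x - h * v, p) ** 2) / (4 * h)
assert np.isclose(legendre(x, p) @ v, deriv, atol=1e-6)
```

The agreement holds for any x ≠ 0 and direction v, up to the discretization error of the central difference.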
A consequence of Theorem 4.4 is that the inclusion

R_+ · ∂f(x) ⊆ NC({f ≤ c}, x)

holds whenever f: R^n → R is a convex function and f(x) = c is not a global minimum. Under these same hypotheses, Theorem 4.5 gives the reverse inclusion:

NC({f ≤ c}, x) ⊆ R_+ · ∂f(x).

As mentioned before, the norm sub-differential is the pull-back by the Legendre transform of a (convex) set of linear functionals in (R^n)*:

∂f(x) = L^{-1}({φ ∈ (R^n)* : f(y) − f(x) ≥ φ(y − x) for all y ∈ R^n}),

and the same holds for the Birkhoff normal cone, as the next lemma shows.

Lemma 4.3.
Let K ⊆ R^n be a convex body, and let x ∈ ∂K be a boundary point. Then

NC(K, x) = L^{-1}({φ ∈ (R^n)* : φ(y − x) ≤ 0 for all y ∈ K}).

Proof. Let v ∈ NC(K, x). Of course, there is nothing to prove for the case v = 0, and hence we may assume that v is a non-zero vector. If h is the hyperplane such that v ⊣_B h, then h supports K at x. Denote φ = L(v). If y ∈ K, then y does not lie in the same half-space determined by h as v (recall that v is an outer normal), and hence we may write

y − x = αv + z

for some z ∈ h and some α ≤ 0. Thus,

φ(y − x) = L(v)·(αv + z) = α||v||² ≤ 0,

and this shows the inclusion “⊆”. Now assume that φ is a non-zero functional, and observe that if φ(y − x) ≤ 0 for every y ∈ K, then h := ker(φ) supports K at x. Hence, if v is a unit and outward pointing vector such that v ⊣_B h, then φ = L(λv) for some λ > 0, and this gives the reverse inclusion.

Remark. The “functional versions” of normal cones and sub-differentials can be used to investigate the case where the norms are not smooth or strictly convex. In this direction, we refer the reader to [23].
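Lemma 4.3 can be illustrated numerically. In the sketch below (an illustration with assumed helper names, not code from the paper) the ambient space is (R², ||·||₄), K is the box [−1, 1]², and x lies on the face {x_1 = 1}; the vector v = e_1 is Birkhoff orthogonal to that face, and the functional φ = L(v) is checked to be non-positive on K − x.

```python
import numpy as np

def lp_norm(x, p):
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

def legendre(x, p):
    # assumed closed form of the Legendre transform for the lp norm
    return lp_norm(x, p) ** (2 - p) * np.abs(x) ** (p - 1) * np.sign(x)

p = 4.0
x = np.array([1.0, 0.3])      # boundary point of K = [-1, 1]^2 on the face x1 = 1
v = np.array([1.0, 0.0])      # candidate outer normal at x

# v is Birkhoff orthogonal to the face direction e2: ||v + t e2|| >= ||v||
ts = np.linspace(-2.0, 2.0, 401)
assert all(lp_norm(v + t * np.array([0.0, 1.0]), p) >= lp_norm(v, p) for t in ts)

# phi = L(v) satisfies phi(y - x) <= 0 on sampled points y of K (Lemma 4.3)
phi = legendre(v, p)
rng = np.random.default_rng(2)
ys = rng.uniform(-1.0, 1.0, size=(1000, 2))
assert np.all((ys - x) @ phi <= 1e-12)
```

Here φ happens to be the coordinate functional y ↦ y_1 (up to the identification by the dot product), so the inequality φ(y − x) ≤ 0 is just y_1 ≤ 1, as expected for a supporting face.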
Throughout this section we will always assume that the involved norms are smooth and strictly convex. Let K ⊆ (R^n, ||·||) be a convex body which is not necessarily smooth or strictly convex. For a given x ∈ R^n \ K the metric projection p_K(x) ∈ ∂K is unique, and hence the outer normal η_K(x), defined as

η_K(x) := (x − p_K(x))/||x − p_K(x)||,

is unique. Denote by d_K the distance function to K, defined as

d_K(x) := dist(x, K).

In the next theorem we discuss the differentiability of d_K.

Theorem 5.1.
The function d_K is differentiable in R^n \ K. Moreover, we have ∇d_K(x) = η_K(x) for any x ∈ R^n \ K.

Proof. First observe that, for each c > 0, the sub-level set {d_K ≤ c} is precisely the convex body K + cB. Assume that d_K(x) = c, and let h be the (unique) supporting hyperplane of K + cB at x. Assume that v ∈ ∂d_K(x). Due to Theorem 4.4, the hyperplane which is right-orthogonal to v supports K + cB at x, and hence v ⊣_B h. From Propositions 3.3 and 3.5 we get that v is a multiple of η_K(x). It follows that ∂d_K(x) ⊆ span{η_K(x)}. From Proposition 3.4 we have that

d_K(x + tη_K(x)) = d_K(x) + t

for t ∈ R small enough. From that equality we get immediately that

(d_K)'_+(x, η_K(x)) = 1 = −(d_K)'_+(x, −η_K(x)).

Now suppose that αη_K(x) ∈ ∂d_K(x). Lemma 4.2 implies that

1 = (d_K)'_+(x, η_K(x)) ≥ L(αη_K(x))·η_K(x) = α||η_K(x)||² = α.

On the other hand,

−1 = (d_K)'_+(x, −η_K(x)) ≥ L(αη_K(x))·(−η_K(x)) = −α||η_K(x)||² = −α.

It follows that α = 1. Hence ∂d_K(x) = {η_K(x)}, that is, the norm sub-differential of d_K at x is a singleton. From Corollary 4.2 we get that d_K is differentiable at x, and ∇d_K(x) = η_K(x).

Corollary 5.1.
The norm gradient ∇d_K of the distance function d_K is continuous in R^n \ K. In particular, d_K is a function of class C¹ there.

Proof. This follows immediately from the equality

x = p_K(x) + dist(x, K)·η_K(x),

which holds for any x ∈ R^n \ K. Since the distance function dist(·, K) and the metric projection p_K are continuous in R^n \ K (see Proposition 3.6), we get that η_K is continuous in R^n \ K. Consequently, ∇d_K is continuous in R^n \ K, and from Remark 4.6 we get that d_K is a function of class C¹ in R^n \ K.

Observe that in the interior of K we clearly have that d_K is differentiable, with ∇d_K(x) = 0. In fact, one just has to recall that d_K is constant on K. Next we investigate what happens on the boundary of K. We show that d_K is not differentiable on ∂K, and we characterize its norm sub-differential at these points.

Theorem 5.2.
For each x ∈ ∂K we have ∂d_K(x) = NC(K, x). In particular, d_K is not differentiable at any boundary point of K.

Proof. The fact that 0 ∈ ∂d_K(x) follows from the property that d_K(x) = 0 is the global minimum of d_K. Now let v be a (unit) outer normal of K at x, and let h be the supporting hyperplane of K at x such that v ⊣_B h. Since d_K(x + tv) = t for t ≥ 0, we have that (d_K)'_+(x, v) = 1. If θ ∈ (0, +∞), then we will prove that θv ∈ ∂d_K(x). Let y ∈ R^n, and write y = λθv + z + x for some z ∈ h and some λ ∈ R. First, assume that λ > 0. From (4.2) we have

d_K(λθv + z + x) − d_K(x) ≥ (d_K)'_+(x, λθv + z).

Also, we claim that (d_K)'_+(x, λθv + z) ≥ (d_K)'_+(x, λθv). Indeed, observe that d_K(x + tv) = t for any t > 0, and that Proposition 3.5 implies that z is a supporting direction of {d_K ≤ t} at x + tv. Hence d_K(x + αv + βz) ≥ d_K(x + αv) for any α > 0 and β ∈ R. Consequently,

(d_K)'_+(x, λθv + z) = lim_{ε→0^+} [d_K(x + ε(λθv + z)) − d_K(x)]/ε ≥ lim_{ε→0^+} [d_K(x + ελθv) − d_K(x)]/ε = (d_K)'_+(x, λθv).

Since (d_K)'_+(x, λθv) = λθ(d_K)'_+(x, v) = λθ, we get

d_K(y) − d_K(x) ≥ (d_K)'_+(x, λθv + z) ≥ λθ.

On the other hand, we have

L(θv) · (y − x) = L(θv) · (λθv + z) = λθ‖v‖ = λθ,

and this concludes the case λ > 0. If λ ≤ 0, then we recall again that d_K(x) = 0 is the global minimum of d_K, and write

L(θv) · (y − x) = L(θv) · (λθv + z) = λθ‖v‖ = λθ ≤ 0 ≤ d_K(y) − d_K(x).

This shows that θv ∈ ∂d_K(x). Hence we have the inclusion ∂d_K(x) ⊇ NC(K, x). Now assume that v ∈ ∂d_K(x). For any y ∈ K we have

0 = d_K(y) − d_K(x) ≥ L(v) · (y − x).

Consequently, we get from Lemma 4.3 that v ∈ NC(K, x). This gives the remaining inclusion ∂d_K(x) ⊆ NC(
K, x).

References

[1] A. R. Alimov, I. G. Tsar'kov, Connectedness and other geometric properties of suns and Chebyshev sets (Russian).
Fundam. Prikl. Mat. (4), pp. 21–91, 2014; translation in J. Math. Sci. (N.Y.) (6), pp. 683–730, 2016.
[2] J. Alonso, H. Martini, S. Wu, On Birkhoff orthogonality and isosceles orthogonality in normed linear spaces. Aequationes Math., pp. 153–189, 2012.
[3] J. C. Álvarez-Paiva and A. C. Thompson, Volumes in normed and Finsler spaces, in: A Sampler of Riemann-Finsler Geometry (eds.: D. Bao, R. Bryant, S. S. Chern, and Z. Shen), pp. 1–49. Cambridge University Press, Cambridge, 2004.
[4] E. Asplund, Differentiability of the metric projection in finite-dimensional Euclidean space. Proc. Amer. Math. Soc., pp. 218–219, 1973.
[5] J. M. Borwein, S. P. Fitzpatrick and J. R. Giles, The differentiability of real functions on normed linear space using generalized subgradients. J. Math. Anal. Appl. (2), pp. 512–534, 1987.
[6] J. M. Borwein and A. S. Lewis, Convex Analysis and Nonlinear Optimization: Theory and Examples, Second Edition. CMS Books in Mathematics, Canadian Mathematical Society, 2006.
[7] A. L. Brown, Suns in normed linear spaces which are finite-dimensional. Math. Ann. (1), pp. 87–101, 1987.
[8] L. Bunt, Bijdrage tot de theorie der konvekse puntverzamelingen. Thesis, University of Groningen, Amsterdam, 1934.
[9] F. H. Clarke, Optimization and Nonsmooth Analysis. Classics in Applied Mathematics, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1990.
[10] R. Correa, D. Salas and L. Thibault, Smoothness of the metric projection onto nonconvex bodies in Hilbert spaces. J. Math. Anal. Appl. (2), pp. 1307–1332, 2018.
[11] F. Deutsch, The convexity of Chebyshev sets in Hilbert space, in: Topics in Polynomials of One and Several Variables and Applications. World Scientific Publishing Co., Inc., NJ, pp. 143–150, 1993.
[12] S. Fitzpatrick, Metric projections and the differentiability of distance functions. Bull. Aust. Math. Soc., pp. 291–312, 1980.
[13] J. R. Giles, A distance function property implying differentiability. Bull. Aust. Math. Soc. (1), pp. 59–70, 1989.
[14] J. R. Giles, Differentiability of distance functions and a proximinal property inducing convexity. Proc. Amer. Math. Soc. (2), pp. 458–464, 1988.
[15] L. Hetzelt, On suns and cosuns in finite-dimensional normed real vector spaces. Acta Math. Hungar. (1–2), pp. 53–68, 1985.
[16] Á. G. Horváth, Z. Lángi and M. Spirova, Semi-inner products and the concept of semi-polarity. Results Math. (1–2), pp. 127–144, 2017.
[17] L. A. Karlovitz, The construction and application of contractive retractions in 2-dimensional normed linear spaces. Indiana Univ. Math. J. (5), pp. 473–481, 1972.
[18] S. G. Krantz and H. R. Parks, On the vector sum of two convex sets in space. Canad. J. Math. (2), pp. 347–355, 1991.
[19] Y. Li and L. Nirenberg, Regularity of the distance function to the boundary. Rend. Accad. Naz. Sci. XL Mem. Mat. Appl. (5) (1), pp. 257–264, 2005.
[20] H. Martini, K. J. Swanepoel and G. Weiss, The geometry of Minkowski spaces – a survey. Part I. Expositiones Math., pp. 97–142, 2001.
[21] B. S. Mordukhovich and N. M. Nam, Convex Analysis with Applications to Optimization and Location Problems. Springer, to appear.
[22] D. Noll, Directional differentiability of the metric projection in Hilbert space. Pacific J. Math. (2), pp. 567–592, 1995.
[23] J. P. Penot and R. Ratsimahalo, Characterizations of metric projections in Banach spaces and applications. Abstr. Appl. Anal. (1–2), pp. 85–103, 1998.
[24] R. A. Poliquin, R. T. Rockafellar and L. Thibault, Local differentiability of distance functions. Trans. Amer. Math. Soc. (11), pp. 5231–5249, 2000.
[25] D. Sain, K. Paul and A. Mal, On approximate Birkhoff-James orthogonality and normal cones in a normed space. J. Convex Anal. (1), pp. 341–351, 2019.
[26] Sangeeta and T. D. Narang, On suns in linear metric spaces and convex metric spaces. East J. Approx. (2), pp. 127–139, 2011.
[27] M. Safdari, The distance function from the boundary of a domain with corners. Nonlinear Anal., pp. 294–310, 2019.
[28] R. Schneider, Convex Bodies: The Brunn-Minkowski Theory. Encyclopedia of Mathematics and its Applications, Cambridge University Press, Cambridge, 2014.
[29] A. C. Thompson, Minkowski Geometry. Encyclopedia of Mathematics and Its Applications, Cambridge University Press, Cambridge, 1996.
[30] L. P. Vlasov, Approximatively convex sets in uniformly smooth spaces. Mat. Zametki, pp. 443–450, 1967.
[31] L. P. Vlasov, On almost convex sets in Banach spaces (Russian). Dokl. Akad. Nauk SSSR, pp. 18–21, 1965.
[32] L. Zajíček, Differentiability of the distance function and points of multivaluedness of the metric projection in Banach space. Czechoslovak Math. J. (2), pp. 292–308, 1983.
[33] L. Zajíček, On the Fréchet differentiability of distance functions. Proceedings of the 12th winter school on abstract analysis (Srní, 1984). Rend. Circ. Mat. Palermo, Suppl. No. 5, pp. 161–165, 1984.
[34] Z. Wu, A Chebyshev set and its distance function. J. Approx. Theory 119
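In the Euclidean special case, the identity ∇d_K(x) = η_K(x) of Corollary 5.1 can be checked numerically: there p_K(x) is the nearest point of K and η_K(x) = (x − p_K(x))/|x − p_K(x)|. The sketch below is illustrative only and not part of the paper; it takes K to be the Euclidean unit ball (so that d_K and η_K have closed forms), and the helper names d_K and fd_grad are ad hoc.

```python
import math

def d_K(x):
    # Distance from x to the Euclidean unit ball K = {y : |y| <= 1}.
    return max(math.hypot(*x) - 1.0, 0.0)

def fd_grad(f, x, h=1e-6):
    # Central finite-difference approximation of the gradient of f at x.
    g = []
    for i in range(len(x)):
        xp = list(x); xp[i] += h
        xm = list(x); xm[i] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g

x = [2.0, 1.0, -2.0]                 # a point outside K, with |x| = 3
eta = [xi / 3.0 for xi in x]         # outer normal eta_K(x) = x/|x| here
grad = fd_grad(d_K, x)
print(all(abs(g - e) < 1e-5 for g, e in zip(grad, eta)))  # prints True
```

The same experiment with a polyhedral norm (e.g. the maximum norm) would show the gradient jumping across the cone of points sharing a metric projection, in line with the smoothness hypotheses used in Section 5.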