Conservative and semismooth derivatives are equivalent for semialgebraic maps

Abstract

Subgradient and Newton algorithms for nonsmooth optimization require generalized derivatives to satisfy subtle approximation properties: conservativity for the former and semismoothness for the latter. Though these two properties originate in entirely different contexts, we show that in the semi-algebraic setting they are equivalent. Both properties for a generalized derivative simply require it to coincide with the standard directional derivative on the tangent spaces of some partition of the domain into smooth manifolds. An appealing byproduct is a new short proof that semi-algebraic maps are semismooth relative to the Clarke Jacobian.

Damek Davis ∗    Dmitriy Drusvyatskiy †

arXiv [math.OC]

Algorithms for nonsmooth optimization, such as subgradient and semismooth Newton methods, crucially rely on first-order approximations of Lipschitz maps. Two seemingly disparate first-order approximation properties underlie existing results.

Conservativity.

In full generality, Lipschitz continuous functions can be highly pathological, resulting in failure of subgradient methods to find any critical points [8]. Consequently, it is essential to limit the class of functions under consideration. With this in mind, recent analysis of subgradient methods [3, 9, 15] requires conservativity of a generalized gradient mapping $G(x)$ for $f$. This property, axiomatically explored by Bolte and Pauwels [3], stipulates the validity of a formal chain rule

$$\frac{d}{dt}(f\circ\gamma)(t) = \langle G(\gamma(t)), \dot\gamma(t)\rangle \qquad \text{for a.e. } t\in(0,1),$$

along any absolutely continuous curve $\gamma$. Most importantly, conservativity holds automatically when $f$ is semialgebraic and $G$ is the Clarke subdifferential $\partial_C f$ [9]. More generally, Bolte and Pauwels [3, 4] showed that conservativity holds when $G$ is an output of an automatic differentiation scheme on a composition of semialgebraic functions.

∗ School of ORIE, Cornell University, Ithaca, NY 14850, USA; people.orie.cornell.edu/dsd95/. Research of Davis supported by an Alfred P. Sloan research fellowship and NSF DMS award 2047637.

† Department of Mathematics, U. Washington, Seattle, WA 98195; ∼ddrusv. Research of Drusvyatskiy was supported by the NSF DMS 1651851 and CCF 1740551 awards.

Semismoothness.

Newton methods for solving nonsmooth equations $F(x) = 0$ for a Lipschitz map $F$ require a semismoothness property of $F$ with respect to a generalized Jacobian mapping $G(x)$. This property, introduced by Mifflin [16] and explored for Newton methods by Qi and Sun [21], stipulates the estimate

$$F(y) - F(x) - G(y)(y-x) \subset o(\|x-y\|)\,\mathbb{B} \qquad \text{as } y\to x,$$

at any point $x\in\mathbb{R}^n$. Bolte, Daniilidis and Lewis [1] famously showed that semismoothness holds when $F$ is semialgebraic and $G = \partial_C F$ is the Clarke Jacobian.

Although they appear disparate, there is reason to believe that conservativity and semismoothness are closely related.
For example, Norkin's seminal work [18, 19] analyzed subgradient methods on functions $f$ that are semismooth with respect to certain generalized subdifferential operators. On the other hand, Ruszczyński [23] recently showed that semismoothness implies a weaker notion of conservativity along semismooth curves. In this paper, we prove that the notions of conservativity and semismoothness are equivalent for semialgebraic maps. Our proof strategy is to relate the two properties to an intermediate "stratified derivative" condition, already known from [14] to be equivalent to conservativity. The end result applies to a natural family of (set-valued) directional derivatives $D(x,u)$, akin to $G(x)u$. In particular, $D(x,u)$ could be a directional derivative, the map induced by the coordinate-wise Clarke Jacobian, or the output of an automatic differentiation procedure.

Theorem 1.1 (Informal). Suppose $F(\cdot)$ and $D(\cdot,\cdot)$ are semialgebraic. Then conservativity and semismoothness are both equivalent to the following condition: there exists a partition of $\mathbb{R}^n$ into finitely many semialgebraic $C^1$ manifolds such that for any point $x$ lying in a manifold $M$, equality holds:

$$D(x,u) = \{F'(x,u)\} \qquad \forall u\in T_M(x).$$

Thus both conservativity and semismoothness hold if and only if $D(x,u)$ coincides with the standard directional derivative $F'(\cdot,\cdot)$ on the tangent spaces of some partition of $\mathbb{R}^n$ into finitely many smooth manifolds. In dual terms, this means that semismooth generalized derivatives are simply selections of the map $(x,u)\mapsto J(x)u$, where the rows of $J(x)$ are Clarke subgradients of $F_i$ shifted by the normal space. Building on [3], the authors of [14] recently showed that the latter property is equivalent to $J(x)$ being conservative.
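The stratified condition in Theorem 1.1 can be illustrated on a one-dimensional toy example, which is not taken from the paper. The sketch below assumes the map $F(x) = \mathrm{relu}(x) - \mathrm{relu}(-x) = x$ and the convention $\mathrm{relu}'(0) = 0$, and builds $D$ by a formal chain rule, as an automatic differentiation tool might. The names `relu_grad`, `D_zero`, and `semismooth_ratio` are hypothetical helpers introduced only for this illustration. With the partition $(-\infty,0)$, $\{0\}$, $(0,\infty)$, the tangent space at the origin is $\{0\}$, so the "wrong" value $D(0,1) = 0 \neq F'(0,1) = 1$ does not violate the condition.

```python
# Toy illustration (assumptions: F, the relu'(0) = 0 convention, and all helper
# names are chosen for this sketch and do not come from the paper).

def relu(x):
    return max(x, 0.0)

def relu_grad(x):
    # Assumed convention: the derivative of relu at 0 is taken to be 0.
    return 1.0 if x > 0 else 0.0

def F(x):
    # F(x) = relu(x) - relu(-x), which equals x for every real x.
    return relu(x) - relu(-x)

def D(x, u):
    # Formal chain rule applied term by term, as an autodiff tool would do.
    return (relu_grad(x) + relu_grad(-x)) * u

def D_zero(x, u):
    # A poor candidate: it ignores F entirely.
    return 0.0

def directional_derivative(x, u, t=1e-6):
    # One-sided difference quotient approximating F'(x, u).
    return (F(x + t * u) - F(x)) / t

def semismooth_ratio(x, y, deriv):
    # |F(y) - F(x) - D(y, y - x)| / |y - x|, the quantity in the semismooth estimate.
    return abs(F(y) - F(x) - deriv(y, y - x)) / abs(y - x)

points = [s * 10.0 ** (-k) for k in range(1, 8) for s in (1.0, -1.0)]
ratios_at_zero = [semismooth_ratio(0.0, y, D) for y in points]
bad_ratios_at_zero = [semismooth_ratio(0.0, y, D_zero) for y in points]

print("D(0, 1) =", D(0.0, 1.0), "while F'(0, 1) is about", directional_derivative(0.0, 1.0))
print("largest semismooth ratio for D:", max(ratios_at_zero))
print("smallest semismooth ratio for D_zero:", min(bad_ratios_at_zero))
```

In this sketch the ratio for `D` vanishes at every sampled point because `D` is evaluated at the base point $y \neq 0$, where it agrees with the true derivative. The candidate `D_zero` disagrees with $F'$ on the one-dimensional strata and its ratio stays away from zero.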
Summarizing, the results of our paper together with those already available in the literature [3, 14] imply that semismooth directional derivatives, stratified directional derivatives/subgradients, and conservative set-valued vector fields are equivalent in the semialgebraic setting.

In this section, we record basic notation and preliminaries from variational analysis and semialgebraic geometry that will be used in the paper.

2.1 Variational analysis

We follow standard notation of variational analysis, as set out for example in the monographs [5, 12, 13, 17, 20, 22]. Throughout, fix an $n$-dimensional Euclidean space, denoted by $\mathbb{R}^n$, equipped with an inner product $\langle\cdot,\cdot\rangle$ and the induced norm $\|x\| = \sqrt{\langle x,x\rangle}$. A set-valued map $G\colon\mathbb{R}^n\rightrightarrows\mathbb{R}^m$ maps points $x\in\mathbb{R}^n$ to subsets $G(x)\subset\mathbb{R}^m$. The domain and graph of $G$, respectively, are defined by

$$\operatorname{dom} G = \{x\in\mathbb{R}^n : G(x)\neq\emptyset\} \qquad\text{and}\qquad \operatorname{gph} G = \{(x,y)\in\mathbb{R}^n\times\mathbb{R}^m : y\in G(x)\}.$$

The set $\operatorname{Limsup}_{x\to\bar x} G(x)$ consists of all limits of sequences $y_i\in G(x_i)$, where $x_i$ is any sequence converging to $\bar x$. The map $G$ is called outer-semicontinuous if $\operatorname{gph} G$ is a closed set. We say that $G$ is inner-semicontinuous on a set $Q\subset\mathbb{R}^n$ if for any points $x\in Q$ and $y\in G(x)$ and for any sequence $x_i\xrightarrow{Q} x$, there exist points $y_i\in G(x_i)$ converging to $y$.

The distance function and the projection onto a set $X\subset\mathbb{R}^n$ are defined by

$$\operatorname{dist}(y,X) := \inf_{x\in X}\|y-x\|, \qquad \operatorname{proj}(y,X) := \operatorname*{argmin}_{x\in X}\|y-x\|.$$

The deviation between sets $X,Y\subset\mathbb{R}^n$ will be measured by the Hausdorff distance

$$\operatorname{dist}(X,Y) := \max\Big\{\sup_{x\in X}\operatorname{dist}(x,Y),\ \sup_{y\in Y}\operatorname{dist}(y,X)\Big\}.$$

The directional derivative of any map $F\colon\mathbb{R}^n\to\mathbb{R}^m$ is defined by

$$F'(x,u) := \lim_{t\searrow 0}\frac{F(x+tu)-F(x)}{t}. \tag{2.1}$$

The map $F$ is called directionally differentiable if $F'(x,u)$ is well-defined, meaning the limit in (2.1) exists for every $x,u\in\mathbb{R}^n$. It is straightforward to verify that when $F$ is locally Lipschitz continuous and directionally differentiable, equality holds:

$$F'(x,u) = \lim_{t\searrow 0,\ v\to u}\frac{F(x+tv)-F(x)}{t}.$$

Abusing notation, for any curve $\gamma\colon[0,1]\to\mathbb{R}^n$, we define the one-sided velocity

$$\gamma'(0) := \lim_{t\searrow 0}\frac{\gamma(t)-\gamma(0)}{t},$$

and we let $\dot\gamma(t)$ denote the derivative of $\gamma$ at any point $t\in(0,1)$ where $\gamma$ is differentiable.

Given any locally Lipschitz continuous map $F\colon\mathbb{R}^n\to\mathbb{R}^m$, the Clarke Jacobian of $F$ at $x$ is the set

$$\partial_C F(x) = \operatorname{conv}\Big\{\lim_{i\to\infty}\nabla F(x_i) : x_i\xrightarrow{\Omega} x\Big\},$$

where $\Omega$ is the set of points at which $F$ is differentiable and $\operatorname{conv}(\cdot)$ denotes the convex hull. Finally, for any $C^1$ manifold $M\subset\mathbb{R}^n$ and a point $x\in M$, the symbols $T_M(x)$ and $N_M(x)$ will denote the tangent and normal spaces to $M$ at $x$.

2.2 Semialgebraic geometry

We next collect a few elementary facts from semialgebraic geometry. For details we refer the reader to [6, 7, 25]. All results in the paper hold more generally, and with identical proofs, for sets and functions definable in an o-minimal structure. We focus on the semialgebraic setting only for simplicity.

A set $Q\subset\mathbb{R}^n$ is called semialgebraic if it can be written as a union of finitely many sets, each defined by finitely many polynomial inequalities. A set-valued map $G\colon\mathbb{R}^n\rightrightarrows\mathbb{R}^m$ is called semialgebraic if its graph is a semialgebraic set. Univariate semialgebraic functions are particularly simple.

Lemma 2.1 (Curves). For any semialgebraic map $\gamma\colon[0,1]\to\mathbb{R}^n$, there exists $\epsilon>0$ such that $\gamma$ is $C^1$-smooth on the open interval $(0,\epsilon)$. Moreover, as long as the quotients $t^{-1}(\gamma(t)-\gamma(0))$ are bounded for all small $t>0$, the derivative $\gamma'(0)$ exists and equality $\gamma'(0)=\lim_{t\searrow 0}\dot\gamma(t)$ holds.

Notice that the equality $\gamma'(0)=\lim_{t\searrow 0}\dot\gamma(t)$ can be interpreted as semismoothness of univariate semialgebraic functions, an observation we will revisit. An immediate consequence is that locally Lipschitz semialgebraic maps are directionally differentiable. The following theorem moreover shows that semialgebraic maps are "generically smooth." To simplify notation, we use the term $C^1$ semialgebraic partition of $\mathbb{R}^n$ to mean a partition of $\mathbb{R}^n$ into finitely many semialgebraic $C^1$ manifolds.

Theorem 2.1 (Generic smoothness). Let $F\colon\mathbb{R}^n\to\mathbb{R}^m$ be a semialgebraic map. Then there exists a $C^1$-semialgebraic partition $\mathcal{A}$ of $\mathbb{R}^n$ such that the restriction of $F$ to each manifold $M\in\mathcal{A}$ is $C^1$-smooth.

A useful property that blends variational analytic and semialgebraic constructions is the projection formula, proved in [2, Proposition 4].

Theorem 2.2 (Projection formula). Let $f\colon\mathbb{R}^n\to\mathbb{R}$ be a semialgebraic locally Lipschitz continuous function. Then there exists a $C^1$-semialgebraic partition $\mathcal{A}$ of $\mathbb{R}^n$ such that for any point $x$ lying in a manifold $M\in\mathcal{A}$, equality holds:

$$\{f'(x,u)\} = \langle\partial_C f(x),u\rangle \qquad \forall u\in T_M(x).$$

The next result shows that any semialgebraic map admits a semialgebraic single-valued selection.

Lemma 2.2 (Semialgebraic selection). Any semialgebraic map $G\colon\mathbb{R}^n\rightrightarrows\mathbb{R}^m$ admits a semialgebraic selection $g\colon\operatorname{dom} G\to\mathbb{R}^m$ satisfying $g(x)\in G(x)$ for all $x\in\operatorname{dom} G$.

The following theorem, from [11, Proposition 2.28], shows that semialgebraic set-valued maps are generically inner-semicontinuous.

Theorem 2.3 (Generic inner semicontinuity). Let $G\colon\mathbb{R}^n\rightrightarrows\mathbb{R}^m$ be a semialgebraic map. Then there exists a $C^1$-semialgebraic partition $\mathcal{A}$ of $\mathbb{R}^n$ such that the restriction $G|_M$ is inner-semicontinuous for every $M\in\mathcal{A}$.

Many of the aforementioned results guarantee existence of certain partitions of $\mathbb{R}^n$. Consequently, it will be useful to refine partitions. A partition $\mathcal{A}$ of $\mathbb{R}^n$ is called compatible with a collection $\mathcal{B}$ of sets in $\mathbb{R}^n$ if each set in $\mathcal{B}$ is a union of some sets in $\mathcal{A}$.

Theorem 2.4 (Compatible partitions). Let $\mathcal{B}$ be a collection of finitely many semialgebraic sets. Then there exists a $C^1$-semialgebraic partition $\mathcal{A}$ of $\mathbb{R}^n$ that is compatible with $\mathcal{B}$.

Throughout, we let $F\colon\mathbb{R}^n\to\mathbb{R}^m$ be a locally Lipschitz continuous and directionally differentiable map and let $D\colon\mathbb{R}^n\times\mathbb{R}^n\rightrightarrows\mathbb{R}^m$ be a set-valued map. The reader may view $D(x,u)$ as a "generalized directional derivative" of $F$ at $x$ in direction $u$.

Assumption 1. We introduce the following assumptions on $D$.

1. (Full domain) The image $D(x,u)$ is nonempty for all $x,u\in\mathbb{R}^n$.

2. (Homogeneity) The map $D$ is positively homogeneous in the second argument: $D(x,0)=\{0\}$ and $D(x,tu)=tD(x,u)$ for all $x,u\in\mathbb{R}^n$ and $t>0$.

3. (Lipschitz continuity) The assignment $D(x,\cdot)$ is Lipschitz continuous locally uniformly in $x$. That is, for every point $\bar x\in\mathbb{R}^n$, there exists $L>0$ such that
$$\operatorname{dist}(D(x,u_1),D(x,u_2))\le L\|u_1-u_2\|$$
for all $u_1,u_2\in\mathbb{R}^n$ and all $x$ sufficiently close to $\bar x$.

The first condition is self-explanatory. The second condition, which asserts that $\operatorname{gph} D(x,\cdot)$ is a cone, is natural for directional derivatives. The third condition is more nuanced but is again mild. In particular, all three conditions hold for the directional derivative $D(x,u)=F'(x,u)$ and for any map of the form $D(x,u)=J(x)u$ where $J\colon\mathbb{R}^n\rightrightarrows\mathbb{R}^{m\times n}$ is locally bounded.

Clearly, $D(x,u)$ can be regarded as a generalized directional derivative of $F$ only if it accurately predicts the variations of $F$ at $x$ in direction $u$. There are a number of seemingly distinct conditions in the literature that model "goodness of approximation," depending on context. We record the most relevant ones for us below and comment on each.

Conditions 1. We introduce the following conditions.

1. (Semismooth I) For any point $x$ it holds:
$$\operatorname*{Limsup}_{y\to x}\frac{F(y)-F(x)-D(y,y-x)}{\|y-x\|}=\{0\}.$$

2. (Semismooth II) For any point $x$ it holds:
$$\operatorname*{Limsup}_{y\to x}\frac{F(y)-F(x)+D(y,x-y)}{\|y-x\|}=\{0\}.$$

3. (Conservative derivative) For any absolutely continuous curve $\gamma\colon[0,1]\to\mathbb{R}^n$, equality holds:
$$\Big\{\frac{d}{dt}(F\circ\gamma)(t)\Big\}=D(\gamma(t),\dot\gamma(t)) \qquad\text{for a.e. } t\in(0,1). \tag{3.1}$$

4. (Stratified derivative) There exists a semialgebraic $C^1$ partition $\mathcal{A}$ of $\mathbb{R}^n$ such that equality $D(x,u)=\{F'(x,u)\}$ holds for any manifold $M\in\mathcal{A}$, any point $x\in M$, and any tangent vector $u\in T_M(x)$.

5. (Stratified subdifferential) There exists a semialgebraic $C^1$ partition $\mathcal{A}$ of $\mathbb{R}^n$ such that for any manifold $M\in\mathcal{A}$ and $x\in M$, it holds:
$$D(x,u)\subset J(x)u,$$
where we set $J(x)=\{A\in\mathbb{R}^{m\times n} : A_i\in\partial_C F_i(x)+N_M(x)\}$, and $A_i$ denote the rows of $A$.

Let us discuss these conditions in turn.
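Condition 1 (Semismooth I) can be probed numerically by evaluating the quotient $\|F(y)-F(x)-D(y,y-x)\|/\|y-x\|$ along a sequence $y\to x$. The sketch below is an illustration under stated assumptions and is not part of the paper: it takes $F(x)=\|x\|_2$ on $\mathbb{R}^2$, uses the single-valued selection $D(y,u)=\langle y/\|y\|,u\rangle$ with $D(0,u)=0$, and compares it with a deliberately wrong candidate `D_bad`. All function names are hypothetical.

```python
import math

# Illustrative sketch (assumptions: the map F, the selections D and D_bad, the base
# point and the approach direction are all chosen for this example only).

def F(x):
    # Euclidean norm on R^2, a semialgebraic and Lipschitz function.
    return math.hypot(x[0], x[1])

def D(y, u):
    # Gradient-based directional derivative away from the origin; at the origin we
    # pick the value 0, one admissible selection from <unit ball, u>.
    n = F(y)
    if n == 0.0:
        return 0.0
    return (y[0] * u[0] + y[1] * u[1]) / n

def D_bad(y, u):
    # Wrong candidate: half of the correct directional derivative.
    return 0.5 * D(y, u)

def ratio(x, y, deriv):
    # Quotient from condition 1: |F(y) - F(x) - D(y, y - x)| / ||y - x||.
    u = (y[0] - x[0], y[1] - x[1])
    return abs(F(y) - F(x) - deriv(y, u)) / math.hypot(u[0], u[1])

x = (3.0, 4.0)
steps = [10.0 ** (-k) for k in range(1, 6)]

# Approach x along the fixed direction (1, -2) with shrinking step sizes.
good = [ratio(x, (x[0] + s, x[1] - 2.0 * s), D) for s in steps]
bad = [ratio(x, (x[0] + s, x[1] - 2.0 * s), D_bad) for s in steps]

# Approach the nonsmooth point 0; D is evaluated at y, where F is smooth.
origin = [ratio((0.0, 0.0), (s, -2.0 * s), D) for s in steps]

for s, g, b, o in zip(steps, good, bad, origin):
    print(f"step={s:.0e}  good={g:.3e}  bad={b:.3e}  origin={o:.3e}")
```

The quotient for `D` shrinks with the step size both at the smooth point and at the origin, while the quotient for `D_bad` settles at a positive constant. A finite sample of this kind can suggest, but not prove, that the limit in condition 1 is $\{0\}$.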

Semismoothness.

The first two conditions, 1 and 2, play a central role in ensuring superlinear convergence of Newton-type algorithms for nonsmooth equations. We refer the reader to the monograph [13, Chapter 10] for details. To place these conditions in context, recall that any locally Lipschitz and directionally differentiable map $F$ satisfies the first-order approximation property [24]:

$$F(y)=F(x)+F'(x,y-x)+o(\|y-x\|) \qquad\text{as } y\to x. \tag{3.2}$$

Condition 1 instead asserts

$$F(y)-F(x)-D(y,y-x)\subset o(\|y-x\|)\,\mathbb{B} \qquad\text{as } y\to x.$$

Notice that, contrary to (3.2), the value $D(y,y-x)$ is computed at the base point $y$. Condition 2 asserts instead

$$F(x)-F(y)-D(y,x-y)\subset o(\|y-x\|)\,\mathbb{B} \qquad\text{as } y\to x.$$

Clearly, when $D$ has the form $D(x,u)=J(x)u$ for some set-valued map $J\colon\mathbb{R}^n\rightrightarrows\mathbb{R}^{m\times n}$, conditions 1 and 2 are equivalent. In particular, if $F$ is directionally differentiable and $J(x)=\partial_C F(x)$ is the Clarke generalized Jacobian, both conditions reduce to the semismoothness property in the sense of [16, 21]. In the context of optimization, similar conditions have been used by Norkin [18, 19] to establish asymptotic convergence guarantees of subgradient methods.

Conservative derivative.

Condition 3 asserts that the generalized directional derivative $D$ satisfies a formal chain rule at almost every point along any absolutely continuous path $\gamma$. Equivalently, we may write condition (3.1) as:

$$\{F'(\gamma(t),\dot\gamma(t))\}=D(\gamma(t),\dot\gamma(t)) \qquad\text{for a.e. } t\in(0,1).$$

This property is equivalent to the conservative derivatives $J(x)$ introduced in [3], in the setting $D(x,u)=J(x)u$. Such generalized derivatives play a key role in justifying automatic differentiation techniques for nonsmooth functions [4], as well as in analyzing the asymptotic behavior of subgradient methods [9].

Stratified derivative/subdifferential.

Condition 4 is geometrically intuitive. It simply asserts that there exists a partition of $\mathbb{R}^n$ into finitely many smooth manifolds, so that $D(x,\cdot)$ coincides with the directional derivative in directions tangent to the manifold containing the point $x$. Condition 5 can be interpreted as a "dual" counterpart of 4. Namely, it stipulates that $D(x,u)$ is a selection of the map $(x,u)\mapsto J(x)u$, where the rows of $J(x)$ consist of the Clarke subdifferentials of the component functions $F_i$ shifted by a normal space $N_M(x)$. The "duality" between conditions 4 and 5 is explored in [2].

The goal of our paper is to prove that if Assumption 1 holds and $F$ and $D$ are semialgebraic, then conditions 1-5 are equivalent. The authors of [14], building on [3], recently proved the equivalence 3 ⇔ 5 in the case $D(x,u)=J(x)u$, where $J(\cdot)$ is locally bounded and outer-semicontinuous. Therefore we claim no originality with respect to the equivalence 3 ⇔ 5.

Lemma 3.1.

Condition 3 implies that for any absolutely continuous curve $\gamma\colon[0,1]\to\mathbb{R}^n$, equality holds:

$$D(\gamma(t),\dot\gamma(t))=-D(\gamma(t),-\dot\gamma(t)) \qquad\text{for a.e. } t\in(0,1).$$

Proof. Fix an absolutely continuous curve $\gamma\colon[0,1]\to\mathbb{R}^n$ and define $\alpha(t)=\gamma(1-t)$. Then $\dot\alpha(t)=-\dot\gamma(1-t)$ and $(F\circ\alpha)'(t)=-(F\circ\gamma)'(1-t)$ for a.e. $t\in(0,1)$. Using $\alpha$ in place of $\gamma$ in (3.1) completes the proof.

Theorem 3.2 (Conservative derivatives and semismoothness). Suppose that $F$ and $D$ are semialgebraic and that Assumption 1 holds. Then condition 3 implies both 1 and 2. Moreover, both implications are true if 3 only holds with respect to semialgebraic curves $\gamma$.

Proof. Suppose condition 3 holds; we aim to verify 1. To this end, fix a point $x$ and assume without loss of generality $x=0$ and $F(x)=0$. We aim to prove

$$\operatorname*{Limsup}_{d\to 0}\frac{F(d)-D(d,d)}{\|d\|}=\{0\}.$$

For any $t\in[0,1]$ define the function

$$\varphi(t):=\sup_{(d,v):\ \|d\|=t,\ v\in D(d,d)}\|F(d)-v\|.$$

Condition 1 will follow immediately once we establish $\varphi'(0)=0$. Clearly, we may assume $\varphi(t)>0$ for all small $t>0$, since otherwise the equality $\varphi'(0)=0$ holds trivially. Lemma 2.2 yields semialgebraic curves $d(\cdot)$ and $v(\cdot)$ satisfying $\|d(t)\|=t$, $v(t)\in D(d(t),d(t))$ and $\|F(d(t))-v(t)\|\ge\tfrac12\varphi(t)$ for all small $t>0$. Observe $d(0)=0$ and $\|d(t)\|/t=1$. Therefore Lemma 2.1 shows that $d'(0)$ exists and $\lim_{t\searrow 0}\dot d(t)=d'(0)$.

Note the inclusion $v(t)/t\in D\big(d(t),\tfrac{d(t)}{t}\big)$, which follows from homogeneity. Local Lipschitz continuity of $D(x,\cdot)$ implies that there exist $L>0$ and $w(t)$ satisfying $w(t)\in D(d(t),\dot d(t))$ with

$$\Big\|w(t)-\frac{v(t)}{t}\Big\|\le L\Big\|\frac{d(t)}{t}-\dot d(t)\Big\|$$

for all small $t>0$. Since the right side tends to zero as $t\searrow 0$, we compute

$$\varphi'(0)=\lim_{t\searrow 0}\frac{\varphi(t)}{t}\le 2\cdot\limsup_{t\searrow 0}\frac{\|F(d(t))-v(t)\|}{t}=2\cdot\limsup_{t\searrow 0}\Big\|\frac{F(d(t))}{t}-w(t)\Big\|.$$

Lemma 2.1 and condition 3 guarantee

$$\lim_{t\searrow 0}w(t)=\lim_{t\searrow 0}(F\circ d)'(t)=\lim_{t\searrow 0}\frac{F(d(t))}{t}.$$

Note that all the limits in the displayed equation exist due to the local monotonicity of semialgebraic functions. We conclude $\varphi'(0)=0$ and therefore condition 1 holds. The proof of condition 2 follows by identical reasoning, while taking into account Lemma 3.1.

Combining Theorem 3.2 with Theorem 2.2 immediately shows that semialgebraic locally Lipschitz maps are semismooth with respect to the Clarke generalized Jacobian, which is the main result of [1]. The proof just presented appears to be simpler and more direct than the original argument in [1].

Corollary 3.3 (Semialgebraic maps are semismooth). Any locally Lipschitz semialgebraic map $F\colon\mathbb{R}^n\to\mathbb{R}^m$ satisfies condition 1 for the map $D(x,u)=\partial_C F(x)u$.

Proof. Note the inclusion $\partial_C F(x)\subset\{A\in\mathbb{R}^{m\times n} : A_i\in\partial_C F_i(x)\}$. Let $\mathcal{A}_i$ be the partition ensured by Theorem 2.2 for each coordinate function $F_i$. Using Theorem 2.4, let $\mathcal{A}$ be a semialgebraic partition that is compatible with each $\mathcal{A}_i$. Fix a semialgebraic curve $\gamma$. Semialgebraicity implies that for any $M\in\mathcal{A}$ and a.e. $t\in(0,1)$ with $\gamma(t)\in M$, the inclusion $\dot\gamma(t)\in T_M(\gamma(t))$ holds. Consequently, condition 3 holds with respect to all semialgebraic curves. An application of Theorem 3.2 completes the proof.

The proof of the full equivalence between conditions 1-5 will make use of the following simple linear algebraic fact.

Lemma 3.4. For any sets $A,B\subset\mathbb{R}^n$ and a subspace $V\subset\mathbb{R}^n$, the equivalence holds:

$$A\subset B+V \iff \operatorname{proj}(A,V^\perp)\subset\operatorname{proj}(B,V^\perp).$$

We now have all the ingredients to prove the main result of the paper.

Theorem 3.5 (Equivalence). Suppose that $F$ and $D$ are semialgebraic and that Assumption 1 holds. Then conditions 1-5 are equivalent.

Proof. We first establish the equivalences 2 ⇔ 3 ⇔ 4 ⇔ 5 by proving the chain of implications 3 ⇒ 2 ⇒ 4 ⇒ 5 ⇒ 3.

Implication 3 ⇒ 2: This was proved in Theorem 3.2.

Implication 2 ⇒ 4: Theorems 2.1 and 2.3 yield a partition $\mathcal{A}$ of $\mathbb{R}^n$ into finitely many semialgebraic $C^1$ manifolds such that when restricted to each manifold $M\in\mathcal{A}$, the function $F$ is $C^1$-smooth and the map $x\mapsto\operatorname{gph} D(x,\cdot)$ is inner semicontinuous. Fix a manifold $M\in\mathcal{A}$, a point $x\in M$, a unit tangent vector $u\in T_M(x)$, and $v\in D(x,u)$. Since $-u$ also lies in $T_M(x)$, we may find sequences $x_i\xrightarrow{M}x$ and $\tau_i\searrow 0$ satisfying $\tau_i^{-1}(x_i-x)\to-u$. By inner-semicontinuity, there exist sequences $(u_i,v_i)$ converging to $(u,v)$ and satisfying $v_i\in D(x_i,u_i)$. The assumed condition 2 therefore guarantees

$$\{0\}=\operatorname*{Limsup}_{i\to\infty}\frac{F(x_i)-F(x)+D(x_i,x-x_i)}{\|x_i-x\|}.$$

Rearranging and using linearity of $F'(x,\cdot)$ on $T_M(x)$ we deduce

$$\{F'(x,u)\}=\{-F'(x,-u)\}=\operatorname*{Limsup}_{i\to\infty}D\Big(x_i,\frac{x-x_i}{\|x-x_i\|}\Big). \tag{3.3}$$

Taking into account local Lipschitz continuity of $D(y,\cdot)$ uniformly for all $y$ near $x$, we deduce that there exist $L>0$ and a sequence $z_i\in D\big(x_i,\tfrac{x-x_i}{\|x-x_i\|}\big)$ satisfying

$$\Big\|z_i-\frac{v_i}{\|u_i\|}\Big\|\le L\Big\|\frac{x-x_i}{\|x-x_i\|}-\frac{u_i}{\|u_i\|}\Big\|$$

for all large indices $i$. Observe that the right side tends to zero. Continuing (3.3), we deduce

$$F'(x,u)=\lim_{i\to\infty}z_i=\lim_{i\to\infty}\frac{v_i}{\|u_i\|}=v.$$

We have thus proved $D(x,u)=\{F'(x,u)\}$ for all $u\in T_M(x)$, and therefore 4 holds.

Implication 4 ⇒ 5: Let $\mathcal{A}$ be the partition of $\mathbb{R}^n$ into $C^1$ semialgebraic manifolds stipulated by condition 4. Using Theorem 2.2 for each coordinate function $F_i$ and refining the partitions using Theorem 2.4, we arrive at a finite partition $\mathcal{A}'$ of $\mathbb{R}^n$ into $C^1$ semialgebraic manifolds that is compatible with $\mathcal{A}$ and satisfies

$$\{F_i'(x,u)\}=\langle\partial_C F_i(x),u\rangle$$

whenever $x$ lies in $M\in\mathcal{A}'$ and $u\in T_M(x)$ is a tangent vector. Since for any such vector $u$, equality $\{F'(x,u)\}=D(x,u)$ holds by 4, we conclude

$$D(x,u)=J(x)u \qquad \forall u\in T_M(x).$$

On the other hand, for any $u\notin T_M(x)$, we trivially have

$$J_i(x)u=\langle\partial_C F_i(x),u\rangle+\langle N_M(x),u\rangle=\langle\partial_C F_i(x),u\rangle+\mathbb{R}=\mathbb{R},$$

where $J_i(x)$ denotes the set of all $i$'th rows of matrices in $J(x)$. Therefore, in this case, the inclusion $D(x,u)\subset J(x)u$ holds trivially.

Implication 5 ⇒ 3: Let $\mathcal{A}$ be the partition of $\mathbb{R}^n$ into $C^1$ semialgebraic manifolds stipulated by condition 5. Refining the partition using Theorems 2.2 and 2.4, we may ensure that the equality holds:

$$\{F'(x,u)\}=J(x)u \qquad \forall u\in T_M(x),$$

whenever $x$ lies in some manifold $M\in\mathcal{A}$. Consider now any absolutely continuous curve $\gamma\colon[0,1]\to\mathbb{R}^n$. It is elementary to verify that for a.e. $t\in(0,1)$ the implication holds (e.g. [10, Lemma 4.13]):

$$\gamma(t)\in M \implies \dot\gamma(t)\in T_M(\gamma(t)). \tag{3.4}$$

We conclude that for a.e. $t\in(0,1)$ and $M\in\mathcal{A}$ satisfying $\gamma(t)\in M$, we have

$$\Big\{\frac{d}{dt}(F\circ\gamma)(t)\Big\}=\{F'(\gamma(t),\dot\gamma(t))\}=J(\gamma(t))\dot\gamma(t)\supset D(\gamma(t),\dot\gamma(t)).$$

Since $D(\gamma(t),\dot\gamma(t))$ is nonempty by Assumption 1 and the left side is a singleton, the inclusion holds as an equality, as claimed.

Summarizing, we have proved the equivalence 2 ⇔ 3 ⇔ 4 ⇔ 5. Next, observe that 5 holds for a map $D(\cdot,\cdot)$ if and only if it holds for the map $\hat D(x,u):=-D(x,-u)$. Noting that condition 2 for $\hat D$ coincides with condition 1 for $D$ completes the proof of the theorem.

References

[1] J. Bolte, A. Daniilidis, and A. Lewis. Tame functions are semismooth. Math. Program., 117(1-2):5–19, 2009.

[2] J. Bolte, A. Daniilidis, A.S. Lewis, and M. Shiota. Clarke subgradients of stratifiable functions. SIAM J. Optim., 18(2):556–572, 2007.

[3] J. Bolte and E. Pauwels. Conservative set valued fields, automatic differentiation, stochastic gradient methods and deep learning. Math. Program., pages 1–33, 2020.

[4] J. Bolte and E. Pauwels. A mathematical model for automatic differentiation in machine learning. arXiv:2006.02080, 2020.

[5] F.H. Clarke, Yu. Ledyaev, R.I. Stern, and P.R. Wolenski. Nonsmooth Analysis and Control Theory. Texts in Math. 178, Springer, New York, 1998.

[6] M. Coste. An Introduction to o-minimal Geometry. RAAG Notes, 81 pages, Institut de Recherche Mathématiques de Rennes, November 1999.

[7] M. Coste. An Introduction to Semialgebraic Geometry. RAAG Notes, 78 pages, Institut de Recherche Mathématiques de Rennes, October 2002.

[8] A. Daniilidis and D. Drusvyatskiy. Pathological subgradient dynamics. SIAM J. Optim., 30(2):1327–1338, 2020.

[9] D. Davis, D. Drusvyatskiy, S. Kakade, and J.D. Lee. Stochastic subgradient method converges on tame functions. Foundations of Computational Mathematics, 20(1):119–154, 2020.

[10] D. Drusvyatskiy, A.D. Ioffe, and A.S. Lewis. Curves of descent. To appear in SIAM J. Control and Optim., 2015.

[11] D. Drusvyatskiy and A.S. Lewis. Semi-algebraic functions have small subdifferentials. Math. Program., 140(1):5–29, 2012.

[12] A.D. Ioffe. Variational Analysis of Regular Mappings: Theory and Applications. Springer Monographs in Mathematics. Springer, Cham, 2017.

[13] D. Klatte and B. Kummer. Nonsmooth Equations in Optimization: Regularity, Calculus, Methods and Applications, volume 60 of Nonconvex Optimization and its Applications. Kluwer Academic Publishers, Dordrecht, 2002.

[14] A. Lewis and T. Tian. The structure of conservative gradient fields. arXiv:2101.00699, 2021.

[15] S. Majewski, B. Miasojedow, and E. Moulines. Analysis of nonsmooth stochastic approximation: the differential inclusion approach. arXiv:1805.01916, 2018.

[16] R. Mifflin. Semismooth and semiconvex functions in constrained optimization. SIAM J. Control and Optim., 15(6):959–972, 1977.

[17] B.S. Mordukhovich. Variational Analysis and Generalized Differentiation I: Basic Theory. Grundlehren der mathematischen Wissenschaften, Vol 330, Springer, Berlin, 2006.

[18] V.I. Norkin. Generalized-differentiable functions. Cybernetics, 16(1):10–12, 1980.

[19] V.I. Norkin. Stochastic generalized-differentiable functions in the problem of nonconvex nonsmooth stochastic optimization. Cybernetics, 22(6):804–809, 1986.

[20] J.-P. Penot. Calculus Without Derivatives, volume 266 of Graduate Texts in Mathematics. Springer, New York, 2013.

[21] L. Qi and J. Sun. A nonsmooth version of Newton's method. Mathematical Programming, 58(1-3):353–367, 1993.

[22] R.T. Rockafellar and R.J-B. Wets. Variational Analysis. Grundlehren der mathematischen Wissenschaften, Vol 317, Springer, Berlin, 1998.

[23] A. Ruszczyński. Convergence of a stochastic subgradient method with averaging for nonsmooth nonconvex constrained optimization. Optimization Letters, pages 1–11, 2020.

[24] A. Shapiro. On concepts of directional differentiability. Journal of Optimization Theory and Applications, 66(3):477–487, 1990.

[25] L. van den Dries and C. Miller. Geometric categories and o-minimal structures. Duke Math. J., 84:497–540, 1996.