A convex function satisfying the Lojasiewicz inequality but failing the gradient conjecture both at zero and infinity
Aris Daniilidis, Mounir Haddou, Olivier Ley
Abstract.
We construct an example of a smooth convex function on the plane with a strict minimum at zero, which is real analytic except at zero, for which Thom's gradient conjecture fails both at zero and infinity. More precisely, the gradient orbits of the function spiral around zero and at infinity. Besides, the function satisfies the Łojasiewicz gradient inequality at zero.
Key words.
Gradient conjecture, gradient conjecture at infinity, Kurdyka-Łojasiewicz inequality, convex function, convergence of secants.
AMS Subject Classification
Primary
Secondary
Answering a question of Whitney, Łojasiewicz [19] showed that every analytic variety $f^{-1}(0)$, where $f : U \subset \mathbb{R}^N \to \mathbb{R}$ is real-analytic ($U \neq \emptyset$, open), is a deformation retract of one of its open neighborhoods. The deformation was given by the flow of the Euclidean gradient $\nabla f$. The main argument of Łojasiewicz was based on a famous lemma, nowadays known as the Łojasiewicz (gradient) inequality, which asserts that for some $\vartheta \in (0,1)$ and $c > 0$ we have
$$\|\nabla f(x)\| \ge c\,|f(x) - f(a)|^{\vartheta} \tag{1.1}$$
for all $x$ sufficiently close to $a \in f^{-1}(0)$. The above inequality ensures that every bounded gradient orbit $t \mapsto \gamma(t)$ (i.e., $\dot\gamma = \nabla f(\gamma)$) has finite length and therefore converges to a singular point $\gamma_\infty$ with $\nabla f(\gamma_\infty) = 0$.

Some years later, Thom conjectured that in this case, up to a change of coordinates that identifies $\gamma_\infty$ to $0$, the spherical part of the orbit also converges. In other words, the limit of secants
$$\lim_{t \to +\infty} \frac{\gamma(t) - \gamma_\infty}{\|\gamma(t) - \gamma_\infty\|} \quad \text{exists.} \tag{1.2}$$
For decades, this has been known as the (Thom) gradient conjecture, see [1, 29]. (For the more general problem of non-oscillation of trajectories, we refer to [4, 12, 24].) The gradient conjecture makes sense for any gradient dynamics for which bounded orbits converge. Partial results revealed that (1.2) should hold in the real-analytic case, see [13, 18, 27], a fact that was eventually established in full generality by Kurdyka, Mostowski and Parusiński [16] in 2000. The proof was based on (1.1) together with concrete analytic estimates.

Łojasiewicz showed that the gradient inequality (1.1) remains valid also for $C^1$ semialgebraic (respectively, globally subanalytic) functions, see [20]. In 1998, Kurdyka generalized (1.1) for $C^1$ functions that are definable in some o-minimal structure, an axiomatic definition due to van den Dries [30, 31] which encompasses semialgebraic and globally subanalytic functions, but also larger classes that include the exponential function [23]. More precisely, Kurdyka showed that for every definable function $f$ and critical value $r_\infty$ (which is necessarily isolated) there exist $\delta > 0$ and a continuous function $\Psi : [r_\infty, r_\infty + \delta) \to \mathbb{R}$ which is $C^1$ on $(r_\infty, r_\infty + \delta)$ with $\Psi' > 0$ and
$$\|\nabla(\Psi \circ f)(x)\| \ge 1 \tag{1.3}$$
for all $x \in \mathbb{R}^N$ such that $r_\infty < f(x) < r_\infty + \delta$. In addition, Kurdyka's proof showed that the function $\Psi$ can be taken in the same o-minimal structure as $f$.
Consequently, if $f$ is semialgebraic or globally subanalytic, then so is $\Psi$ and thanks to Puiseux's theorem we may take $\Psi(r) = r^{1-\vartheta}$, for $\vartheta \in (0,1)$, which yields (1.1) with $c = (1-\vartheta)^{-1}$. We refer to (1.3) as the Kurdyka-Łojasiewicz (in short, KŁ) inequality and we call KŁ-function any function with (upper) isolated critical values that satisfies the KŁ-inequality around any of them. Similarly to the gradient inequality (1.1), bounded gradient orbits of a KŁ-function have finite length. There are well-known examples of $C^\infty$ functions in $\mathbb{R}^2$ with isolated critical values that are not KŁ-functions (they have bounded gradient orbits which fail to converge), see [10, 25]. Bounded gradient orbits of convex functions have finite length [7, 22] and therefore converge, but there are also examples of $C^2$-smooth convex functions failing the KŁ-property, see [3, § …] […] $C^1$ o-minimal functions provided either $N = 2$ (planar case) or the structure is polynomially bounded (in particular if $f$ is semialgebraic or globally subanalytic). On the other hand, mere convexity is not sufficient to guarantee (1.2): there exist examples of convex functions whose orbits either spiral [8, § …] […] $f$ is a $C^2$ semialgebraic function and $t \mapsto \gamma(t)$ is a gradient orbit satisfying $\|\gamma(t)\| \to \infty$ as $t \to +\infty$, then the limit of secants at infinity
$$\lim_{t \to +\infty} \frac{\gamma(t)}{\|\gamma(t)\|} \quad \text{exists (gradient conjecture at infinity).} \tag{1.4}$$
The proof is based on a Łojasiewicz type gradient inequality at infinity previously obtained by the author together with D'Acunto in [6].

The behavior of secants at infinity has recently become relevant in Machine Learning. If a deep network model is unbiased and homogeneous (max-pooling, ReLU, linear and convolutional layers), then minimizing the cross-entropy or other classification losses forces the parameters of the model to diverge in norm to infinity [21]. In this setting, convergence of the secants at infinity is important.
In [14] the authors manage to establish that (1.4) holds for a certain type of prediction functions ($L$-homogeneous and definable in the log-exp structure). For the time being, no further results have been reported.

In a nutshell, proving the gradient conjecture (respectively, the gradient conjecture at infinity) seems to require at least the KŁ-inequality (1.3) together with other properties of o-minimal functions, but it is still unknown if these conjectures are true for general o-minimal functions.

In this work we present an example of a smooth convex function in $\mathbb{R}^2$ which is real-analytic outside zero (its unique critical point), satisfies the Łojasiewicz inequality (1.1) and fails the gradient conjecture both at zero and at infinity. In particular, all gradient orbits spiral both at zero and at infinity, underlining in this way the two failures of o-minimality of the function, despite the fact that the function is convex and satisfies the Łojasiewicz gradient inequality.

Theorem 1.1 (main result). For every $k \in \mathbb{N}$, there exists a $C^k$ convex function $f : \mathbb{R}^2 \to \mathbb{R}$ with a unique minimum at $\mathcal{O} := (0,0)$ such that:
- $f$ is real analytic on $\mathbb{R}^2 \setminus \{\mathcal{O}\}$;
- $f$ satisfies the Łojasiewicz inequality at $\mathcal{O}$ and
- every gradient orbit $\gamma : (-\infty, T) \to \mathbb{R}^2$ of $f$ spirals infinitely many times both when $t \to T$ and $t \to -\infty$.

Throughout the manuscript, by gradient orbits (or gradient trajectories) we refer to maximal solutions of the ordinary differential equation
$$\gamma'(t) = \nabla f(\gamma(t)).$$
In our example, the function $f$ will be convex, with unique critical point (global minimizer) at $\mathcal{O}$, where we tacitly assume that $\gamma(0) \neq \mathcal{O}$ (avoiding stationary orbits).

Let us briefly describe our strategy for the construction of this example: in Section 2 we prescribe a family of convex sets, all being delimited by ellipses, centered at the origin, and obtained via rotations and size adjustments of a basic ellipse $\mathcal{E}(0)$.
This is done in a way that a convex foliation is obtained, which can be represented by some (quasiconvex) function.

In Section 3, we further calibrate the parameters so that we can apply a criterion due to de Finetti [9] and Crouzeix [5] that guarantees that the aforementioned quasiconvex function is in fact convex. The construction yields that the function is real-analytic on $\mathbb{R}^2 \setminus \{\mathcal{O}\}$, which of course cannot be further improved to real analyticity on the whole space, due to the proof of Thom's gradient conjecture [16]. Instead, we are able to show that the function can be taken $C^k$-smooth at $\mathcal{O}$ for arbitrarily large $k \in \mathbb{N}$. Still, our construction fails to ensure $C^\infty$. Finally, applying a result of [3] which gives conditions for a convex function to satisfy (1.3), we show that our function satisfies the KŁ-inequality and in fact even (1.1) (the Łojasiewicz inequality).

Gradient orbits are perpendicular to the foliation and explicit calculations, conducted in Section 4, show that the orbits turn around both at the origin and at infinity, so that the conclusion of the gradient conjecture fails for this function. An additional difficulty in establishing spirality is that the evolution of the spherical part of the orbit (the rotation angle $\alpha(t)$ of $\gamma(t)$ in polar coordinates) is not monotone in time, so that the decrease rate is established in average, see Figure 3 and Figure 4. For a study of monotonic spiraling of orbits of general analytic vector fields in dimensions 2 and 3, we refer to [28].

[…] $\mathbb{R}^2 \setminus \{\mathcal{O}\}$.

Let us first consider two smooth increasing functions $a, b : \mathbb{R} \to (0, +\infty)$ for which we assume:
$$\lim_{t \to +\infty} a(t) = \lim_{t \to +\infty} b(t) = +\infty, \qquad \lim_{t \to -\infty} a(t) = \lim_{t \to -\infty} b(t) = 0 \qquad \text{and} \qquad a(t) \ge b(t) \ \text{ for all } t \in \mathbb{R}. \tag{2.1}$$
The exact definition of the functions $a(t)$ and $b(t)$ will be given in Lemma 3.1 (Section 3).
We also consider the rotation matrix by an angle $t$, denoted by
$$R(t) = \begin{pmatrix} \cos t & -\sin t \\ \sin t & \cos t \end{pmatrix}. \tag{2.2}$$
For $t \in \mathbb{R}$ and $\theta \in \mathbb{T} := \mathbb{R}/2\pi\mathbb{Z}$ we set
$$m(t,\theta) := (x(t,\theta), y(t,\theta)) = (a(t)\cos\theta,\ b(t)\sin\theta) \quad \text{and} \quad M(t,\theta) := R(t)\,m(t,\theta) = (X(t,\theta), Y(t,\theta)). \tag{2.3}$$
Therefore
$$\begin{cases} X(t,\theta) = x(t,\theta)\cos t - y(t,\theta)\sin t = a(t)\cos t\cos\theta - b(t)\sin t\sin\theta, \\ Y(t,\theta) = x(t,\theta)\sin t + y(t,\theta)\cos t = a(t)\sin t\cos\theta + b(t)\cos t\sin\theta. \end{cases} \tag{2.4}$$
The subset
$$\mathcal{E}(t) := \{ M(t,\theta) : \theta \in \mathbb{T} \} \tag{2.5}$$
is an ellipse with semi-major axis $a(t)$ and semi-minor axis $b(t)$ (see Figure 1 for illustration). Notice that $\mathcal{E}(t)$ is the rotation by angle $t$ of the ellipse
$$E(t) := \{ m(t,\theta) : \theta \in \mathbb{T} \} = \Big\{ (x,y) \in \mathbb{R}^2 : \frac{x^2}{a(t)^2} + \frac{y^2}{b(t)^2} = 1 \Big\}.$$
Under an additional condition on the functions $a, b$, the family of ellipses $\{\mathcal{E}(t)\}_{t \in \mathbb{R}}$ defined in (2.5) is disjoint with union equal to $\mathbb{R}^2 \setminus \{\mathcal{O}\}$. More precisely, denoting by $a', b'$ the derivatives of the functions $a, b$ respectively, we have the following result:

Lemma 2.1 (Convex foliation by ellipses). Let $a, b : \mathbb{R} \to (0, +\infty)$ satisfy (2.1) and assume
$$4\,a(t)\,b(t)\,a'(t)\,b'(t) > \big(a(t)^2 - b(t)^2\big)^2, \quad \text{for all } t \in \mathbb{R}. \tag{2.6}$$
Then $(\mathcal{E}(t))_{t \in \mathbb{R}}$ defines an analytic convex foliation of $\mathbb{R}^2 \setminus \{\mathcal{O}\}$.

Proof. The proof is divided in three steps:
Step 1.
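The map $M : \mathbb{R} \times \mathbb{T} \to \mathbb{R}^2 \setminus \{\mathcal{O}\}$ is a local analytic diffeomorphism.

Indeed, let us first notice that the map $M$, defined by (2.3)–(2.4), is real-analytic as a composition of analytic functions. Therefore, if we show that the Jacobian matrix
$$J_M = \begin{pmatrix} \frac{\partial X}{\partial t} & \frac{\partial X}{\partial \theta} \\ \frac{\partial Y}{\partial t} & \frac{\partial Y}{\partial \theta} \end{pmatrix}$$
is invertible at each point $(t,\theta) \in \mathbb{R} \times \mathbb{T}$, the assertion follows from the local analytic inverse function theorem [15, Theorem 2.5.1]. To this end, we shall prove that
$$\det(J_M) = \frac{\partial X}{\partial t}\frac{\partial Y}{\partial \theta} - \frac{\partial Y}{\partial t}\frac{\partial X}{\partial \theta} = \Big\langle \frac{\partial M}{\partial t}, n \Big\rangle > 0, \tag{2.7}$$
where $n(t,\theta) = -R(\tfrac{\pi}{2})\frac{\partial M}{\partial \theta} = \big(\frac{\partial Y}{\partial \theta}, -\frac{\partial X}{\partial \theta}\big)$ is an outer normal vector (not necessarily of unit length) to the convex set $\operatorname{conv}\mathcal{E}(t)$ (convex envelope of $\mathcal{E}(t)$) at $M(t,\theta)$.

Figure 1: The ellipse $\mathcal{E}(t)$ and the map $(t,\theta) \mapsto M(t,\theta)$.

Recalling that $M(t,\theta) = R(t)\,m(t,\theta)$ (see (2.3)) and that the rotation matrix (2.2) satisfies
$$R'(t) = R(t + \tfrac{\pi}{2}), \qquad R(t)^{-1} = R(t)^{T} = R(-t) \qquad \text{and} \qquad R(t)R(s) = R(t+s),$$
we deduce
$$\begin{aligned} \Big\langle \frac{\partial M}{\partial t}, n \Big\rangle &= \Big\langle \frac{\partial}{\partial t}\big(R(t)m\big),\ -R(\tfrac{\pi}{2})\frac{\partial}{\partial \theta}\big(R(t)m\big) \Big\rangle \\ &= \Big\langle R'(t)m + R(t)\frac{\partial m}{\partial t},\ -R(\tfrac{\pi}{2})R(t)\frac{\partial m}{\partial \theta} \Big\rangle \\ &= \Big\langle R(t+\tfrac{\pi}{2})m + R(t)\frac{\partial m}{\partial t},\ R(t-\tfrac{\pi}{2})\frac{\partial m}{\partial \theta} \Big\rangle \\ &= \Big\langle R(t-\tfrac{\pi}{2})^{T}R(t+\tfrac{\pi}{2})m,\ \frac{\partial m}{\partial \theta} \Big\rangle + \Big\langle R(t-\tfrac{\pi}{2})^{T}R(t)\frac{\partial m}{\partial t},\ \frac{\partial m}{\partial \theta} \Big\rangle \\ &= -\Big\langle m, \frac{\partial m}{\partial \theta} \Big\rangle + \Big\langle R(\tfrac{\pi}{2})\frac{\partial m}{\partial t},\ \frac{\partial m}{\partial \theta} \Big\rangle. \end{aligned}$$
Plugging
$$\frac{\partial m}{\partial \theta} = (-a\sin\theta,\ b\cos\theta) \qquad \text{and} \qquad \frac{\partial m}{\partial t} = (a'\cos\theta,\ b'\sin\theta)$$
into the above equality, we end up with the expression:
$$\det(J_M) = \Big\langle \frac{\partial M}{\partial t}, n \Big\rangle = a'b\cos^2\theta + ab'\sin^2\theta + (a^2 - b^2)\cos\theta\sin\theta. \tag{2.8}$$
This is a quadratic expression with respect to $\cos\theta$ and $\sin\theta$, which is positive for all $\theta \in \mathbb{T}$ if and only if the discriminant $(a^2 - b^2)^2 - 4aa'bb'$ is negative. The result follows in view of (2.6).

Step 2.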
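The map $M : \mathbb{R} \times \mathbb{T} \to \mathbb{R}^2 \setminus \{\mathcal{O}\}$ is injective.

Fix $t \in \mathbb{R}$. From (2.7)–(2.8), using compactness of $\mathcal{E}(t)$ and smoothness of $M$, we deduce the existence of $\delta_t, \rho_t > 0$ such that for all $s \in [t, t+\delta_t]$ and $\theta \in \mathbb{T}$,
$$\Big\langle \frac{\partial M}{\partial t}(s,\theta),\ n(t,\theta) \Big\rangle \ge \rho_t > 0,$$
which yields
$$\big\langle M(s,\theta) - M(t,\theta),\ n(t,\theta) \big\rangle \ge \rho_t\,(s-t) > 0, \quad \text{for } t < s \le t+\delta_t \text{ and } \theta \in \mathbb{T}.$$
It follows that $\operatorname{conv}\mathcal{E}(t) \subset \operatorname{int}\operatorname{conv}\mathcal{E}(s)$ for all $s > t$. Therefore, the family $(\operatorname{conv}\mathcal{E}(t))_{t \in \mathbb{R}}$ is nested and the map $M$ is injective.

Step 3.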
The map $M : \mathbb{R} \times \mathbb{T} \to \mathbb{R}^2 \setminus \{\mathcal{O}\}$ is surjective.

Fix $(x,y) \in \mathbb{R}^2 \setminus \{\mathcal{O}\}$ and set, for $t \in \mathbb{R}$ and $D(t) = \begin{pmatrix} a(t) & 0 \\ 0 & b(t) \end{pmatrix}$,
$$\rho(t) := \big\| D(t)^{-1}R(t)^{-1}(x,y) \big\|^2 = \frac{1}{a(t)^2}(x\cos t + y\sin t)^2 + \frac{1}{b(t)^2}(-x\sin t + y\cos t)^2.$$
We claim that $\rho$ is a smooth decreasing function with $\lim_{-\infty}\rho = +\infty$ and $\lim_{+\infty}\rho = 0$.

Indeed, since $(x,y) \neq (0,0)$, we have $R(t)^{-1}(x,y) \neq (0,0)$ and either $x\cos t + y\sin t \neq 0$ or $-x\sin t + y\cos t \neq 0$. More precisely, since $a \ge b$ and rotations preserve the norm, we have $\|(x,y)\|^2/a(t)^2 \le \rho(t) \le \|(x,y)\|^2/b(t)^2$. Recalling that $a(t), b(t) \to 0$ as $t \to -\infty$, we deduce $\lim_{-\infty}\rho = +\infty$. We also observe that $\lim_{+\infty}\rho = 0$ is a direct consequence of the fact that $a(t), b(t) \to +\infty$ as $t \to +\infty$.

It remains to prove that $\rho'$ is negative. To this end, set $q(t) := x\cos t + y\sin t$ and notice that $\rho = a^{-2}q^2 + b^{-2}(q')^2$. Using that $q'' = -q$, we infer
$$\begin{aligned} \rho'(t) &= -2a'a^{-3}q^2 + 2a^{-2}q'q - 2b'b^{-3}(q')^2 + 2b^{-2}q''q' \\ &= -2a^{-2}b^{-2}\Big( a'a^{-1}b^2q^2 + (a^2 - b^2)\,qq' + b'b^{-1}a^2(q')^2 \Big). \end{aligned}$$
The quadratic expression $a'a^{-1}b^2q^2 + (a^2 - b^2)qq' + b'b^{-1}a^2(q')^2$ with respect to $q$ and $q'$ is positive if and only if its discriminant is negative, which is equivalent, once again, to assuming (2.6). Thus $\rho$ is strictly decreasing and the claim follows.

Using the claim, we infer that there exists a unique $t \in \mathbb{R}$ such that
$$\rho(t) = \big\| D(t)^{-1}R(t)^{-1}(x,y) \big\|^2 = 1.$$
Therefore, there exists a unique $\theta \in \mathbb{T}$ such that $D(t)^{-1}R(t)^{-1}(x,y) = (\cos\theta, \sin\theta)$. It follows that $M(t,\theta) = (x,y)$, which proves that $M$ is onto.

A typical instance where Lemma 2.1 applies is to take $a = \mu b$ for some constant $\mu > 1$ and $b(t) = e^{\nu t}$ with $\nu > \frac{\mu^2 - 1}{2\mu}$; it is straightforward to check that $a, b$ satisfy (2.1) and (2.6). Figure 2 represents the explicit choice $\mu = 2$ and $\nu = 1$ leading to $a(t) = 2e^t$ and $b(t) = e^t$.

Figure 2: The convex foliation $(\mathcal{E}(t))_{t \in \mathbb{R}}$ for $a(t) = 2b(t) = 2e^t$.

In this section we shall show that for a more precise choice of the functions $a(t), b(t)$ we can construct a convex function whose level sets are exactly the foliation $\{\mathcal{E}(t)\}_{t \in \mathbb{R}}$.
Moreover, we shall show that this convex function is smooth, real-analytic on $\mathbb{R}^2 \setminus \{\mathcal{O}\}$ and satisfies (1.1).

Concretely, let us denote by $\varphi : \mathbb{R} \to \mathbb{R}$ a smooth strictly increasing function (the concrete definition will be given in (3.2), see Lemma 3.1) and let us set for all $M \in \mathbb{R}^2$
$$f(M) = \begin{cases} 0, & \text{if } M = (0,0), \\ \varphi(t), & \text{if } M \in \mathcal{E}(t), \end{cases} \tag{3.1}$$
where $\mathcal{E}(t)$ is the ellipse given in (2.5). We shall now show that we can adjust the parameters and choose $\varphi$ in a way that (3.1) gives a well-defined convex function.

Lemma 3.1 (Construction of the convex function). Setting for $t \in \mathbb{R}$
$$a(t) = \sqrt{2}\,\exp(t),\ \ b(t) = \exp(t) \ \text{ in (2.4)}, \qquad \varphi(t) = \exp(t/\tau),\ \ \tau \in (0, \tfrac{1}{10}), \ \text{ in (3.1)}, \tag{3.2}$$
the function $f$ defined by (3.1) is convex, with level sets the ellipses $\mathcal{E}(t)$ and $\operatorname{argmin} f = \{\mathcal{O}\}$.

Proof. Since the functions $a, b$ satisfy (2.1) and (2.6), we deduce by Lemma 2.1 that $(\mathcal{E}(t))_{t \in \mathbb{R}}$ is a convex foliation. In particular, the function $f$ is well defined by (3.1), with sublevel sets
$$[f \le \lambda] := \{ M \in \mathbb{R}^2 : f(M) \le \lambda \} = \operatorname{conv}\big[\mathcal{E}(\varphi^{-1}(\lambda))\big], \quad \lambda > 0,$$
compact and convex. Therefore $f$ is a coercive, quasiconvex function.

We shall now use a result due to de Finetti and Crouzeix [5, 9] which asserts that the quasiconvex function $f$ is convex if and only if
$$\lambda \mapsto \sigma_{[f \le \lambda]}(p) \ \text{ is concave for every } p \in \mathbb{R}^2,$$
where $\sigma_A(p) = \max_{M \in A}\langle p, M \rangle$ is the support function of the subset $A$. Without loss of generality, we may restrict to unit vectors $p \in \mathbb{R}^2$, which results in assuming that $p = (\cos\alpha, \sin\alpha)$, for some $\alpha \in \mathbb{T}$. Therefore, we are led to prove that the function
$$\begin{aligned} G_\alpha(\lambda) &:= \sup\big\{ \langle (x,y), (\cos\alpha, \sin\alpha) \rangle : f(x,y) \le \lambda \big\} \\ &= \sup\big\{ \langle M(t,\theta), (\cos\alpha, \sin\alpha) \rangle : f(M(t,\theta)) = \varphi(t) \le \lambda \big\} \\ &= \max\big\{ \langle M(t,\theta), (\cos\alpha, \sin\alpha) \rangle : \theta \in \mathbb{T},\ t = t(\lambda) = \varphi^{-1}(\lambda) \big\} \end{aligned}$$
is concave. To this end, after straightforward calculations we obtain
$$\begin{aligned} \langle M(t,\theta), (\cos\alpha, \sin\alpha) \rangle &= \langle R(t)\,m(t,\theta), (\cos\alpha, \sin\alpha) \rangle \\ &= \langle (a(t)\cos\theta, b(t)\sin\theta),\ R(-t)(\cos\alpha, \sin\alpha) \rangle \\ &= \langle (\cos\theta, \sin\theta),\ (a(t)\cos(\alpha - t),\ b(t)\sin(\alpha - t)) \rangle, \end{aligned}$$
whence we deduce
$$G_\alpha(\lambda) = \big\| \big( a(t(\lambda))\cos(\alpha - t(\lambda)),\ b(t(\lambda))\sin(\alpha - t(\lambda)) \big) \big\| = \sqrt{g_\alpha(\lambda)} \tag{3.3}$$
with
$$g_\alpha(\lambda) = a(t(\lambda))^2\cos^2(t(\lambda) - \alpha) + b(t(\lambda))^2\sin^2(t(\lambda) - \alpha). \tag{3.4}$$
Calculating the second derivative of $G_\alpha$ in (3.3) yields
$$G_\alpha'' = \frac{2g_\alpha''g_\alpha - (g_\alpha')^2}{4\,g_\alpha^{3/2}}.$$
Therefore, the functions $\{G_\alpha\}_{\alpha \in \mathbb{T}}$ are concave provided we establish:
$$2g_\alpha''g_\alpha - (g_\alpha')^2 \le 0, \quad \text{for all } \alpha \in \mathbb{T}. \tag{3.5}$$
At this step, we replace in (3.4) the choice for $a$, $b$ and $\varphi$ given in (3.2):
$$a(t) = \sqrt{2}\,e^t, \quad b(t) = e^t \quad \text{and} \quad \lambda = \varphi(t) = e^{t/\tau}, \quad \text{for all } t \in \mathbb{R},$$
and we seek the values of $\tau > 0$ for which (3.5) holds. We have
$$t := t(\lambda) = \tau\log\lambda, \quad \text{whence} \quad t'(\lambda) = \frac{\tau}{\lambda} \quad \text{and} \quad t''(\lambda) = -\frac{\tau}{\lambda^2} < 0.$$
A direct computation gives
$$g_\alpha = e^{2t}\big(\cos^2(t-\alpha) + 1\big), \qquad g_\alpha' = 2e^{2t}\,t'\,\big(\cos^2(t-\alpha) + 1 - \cos(t-\alpha)\sin(t-\alpha)\big)$$
and
$$g_\alpha'' = 2e^{2t}\Big( (t')^2\big(3 - 4\cos(t-\alpha)\sin(t-\alpha)\big) + t''\big(\cos^2(t-\alpha) + 1 - \cos(t-\alpha)\sin(t-\alpha)\big) \Big).$$
Hence, writing $c = \cos(t-\alpha)$ and $s = \sin(t-\alpha)$,
$$\begin{aligned} 2g_\alpha''g_\alpha - (g_\alpha')^2 &= 4e^{4t}(t')^2\Big\{ (c^2 + 1)(3 - 4cs) - (c^2 + 1 - cs)^2 \Big\} + 4e^{4t}\,t''\,(c^2 + 1)(c^2 + 1 - cs) \\ &\le 4e^{4t}\Big( 5(t')^2 + \tfrac{1}{2}t'' \Big) = 2\tau(10\tau - 1)\frac{e^{4t}}{\lambda^2}, \end{aligned}$$
where we used that the expression in braces is at most $5$, that $(c^2 + 1)(c^2 + 1 - cs) \ge \tfrac{1}{2}$ and that $t'' < 0$. The right-hand side is negative provided we choose $\tau < 1/10$.

Consider now the map $M : \mathbb{R} \times \mathbb{T} \to \mathbb{R}^2 \setminus \{\mathcal{O}\}$ under the choice made in Lemma 3.1, that is,
$$M(t,\theta) = (X(t,\theta), Y(t,\theta)) = e^t\big( \sqrt{2}\cos t\cos\theta - \sin t\sin\theta,\ \sqrt{2}\sin t\cos\theta + \cos t\sin\theta \big). \tag{3.6}$$
Setting
$$\tilde f : \mathbb{R} \times \mathbb{T} \to \mathbb{R}, \qquad \tilde f(t,\theta) = \varphi(t) = \exp(t/\tau), \tag{3.7}$$
we observe that the convex function $f$ defined in (3.1) satisfies:
$$f(x,y) = \begin{cases} (\tilde f \circ M^{-1})(x,y), & \text{if } (x,y) \neq \mathcal{O}, \\ 0, & \text{if } (x,y) = \mathcal{O}. \end{cases} \tag{3.8}$$
With the next couple of lemmas we show that the function $f$, apart from being convex, enjoys several other good properties.

Lemma 3.2 (Properties of the convex function). Let $f : \mathbb{R}^2 \to [0, +\infty)$ be the convex function defined by (3.6)–(3.8) for $0 < \tau < 1/10$. Then
(i). $f$ is strictly positive on $\mathbb{R}^2 \setminus \{\mathcal{O}\}$ with $f(\mathcal{O}) = 0$.
(ii). For all $(x,y) \in \mathbb{R}^2$, it holds
$$\Big( \frac{1}{\sqrt{2}} \Big)^{1/\tau}\|(x,y)\|^{1/\tau} \le f(x,y) \le \|(x,y)\|^{1/\tau}. \tag{3.9}$$
In particular, $f$ is coercive.
(iii). $f$ is real analytic on $\mathbb{R}^2 \setminus \{\mathcal{O}\}$ and $f \in C^1(\mathbb{R}^2)$.
(iv). $f$ satisfies the Łojasiewicz inequality (1.1) with $\vartheta = 1 - \tau$, $c = \tau/\sqrt{2}$, $a \equiv \mathcal{O}$ and $f(\mathcal{O}) = 0$, that is
$$\|\nabla f(x,y)\| \ge \Big( \frac{\tau}{\sqrt{2}} \Big) f(x,y)^{1-\tau}, \quad \text{for all } (x,y) \in \mathbb{R}^2. \tag{3.10}$$

Proof. (i). It is straightforward from the definition of $f$ in (3.1) and the choice of $\varphi$.
(ii).
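From Lemma 2.1, for every $(x,y) \in \mathbb{R}^2 \setminus \{\mathcal{O}\}$, there exists a unique $t \in \mathbb{R}$ such that $(x,y) \in \mathcal{E}(t)$ and we have
$$\frac{x^2 + y^2}{a(t)^2} \le \frac{1}{a(t)^2}(x\cos t + y\sin t)^2 + \frac{1}{b(t)^2}(-x\sin t + y\cos t)^2 = 1 \le \frac{x^2 + y^2}{b(t)^2},$$
whence
$$e^t = b(t) \le \|(x,y)\| \le a(t) = \sqrt{2}\,e^t.$$
We deduce easily that
$$2^{-1/(2\tau)}\|(x,y)\|^{1/\tau} \le f(x,y) = \varphi(t) = e^{t/\tau} \le \|(x,y)\|^{1/\tau}.$$
(iii). It follows from (3.1) that $f = \varphi \circ p \circ M^{-1}$ on $\mathbb{R}^2 \setminus \{\mathcal{O}\}$, where $p : \mathbb{R} \times \mathbb{T} \to \mathbb{R}$ with $p(t,\theta) = t$. By Lemma 2.1, the map $M : \mathbb{R} \times \mathbb{T} \to \mathbb{R}^2 \setminus \{\mathcal{O}\}$ given in (3.6) is a real analytic diffeomorphism. Since $p$ and $\varphi$ are analytic, the first part of the assertion follows. In particular, the function $f$ is $C^\infty$-smooth on $\mathbb{R}^2 \setminus \{\mathcal{O}\}$.
Since $1/\tau > 1$, the function $(x,y) \mapsto \|(x,y)\|^{1/\tau}$ is $C^1$ over $\mathbb{R}^2$ and (3.9) yields that $f$ is differentiable at $\mathcal{O}$ with $\nabla f(\mathcal{O}) = 0$. Therefore $f$ is differentiable everywhere in $\mathbb{R}^2$ and, since it is convex, it is $C^1$ (see for instance, [26, p. 20]).
(iv). Since $S := \operatorname{argmin} f = \{\mathcal{O}\}$, we have $\operatorname{dist}_S(M) = \|M\|$ for all $M = (x,y) \in \mathbb{R}^2$. Therefore, the first inequality in (3.9) can be written
$$f(M) \ge m(\operatorname{dist}_S(M)) \quad \text{for all } M \in \mathbb{R}^2,$$
where $m(r) = 2^{-1/(2\tau)}r^{1/\tau}$. Since
$$\frac{m^{-1}(s)}{s} = \sqrt{2}\,s^{\tau - 1}$$
is locally integrable on $[0, +\infty)$ (because $\tau > 0$), we deduce from [3, Theorem 30] that the KŁ-inequality
$$\|\nabla(\psi \circ f)(M)\| \ge 1$$
holds for all $M \in [f > 0] := \mathbb{R}^2 \setminus \{\mathcal{O}\}$, where
$$\psi(s) = \int_0^s \frac{m^{-1}(\sigma)}{\sigma}\,d\sigma = \frac{\sqrt{2}}{\tau}\,s^{\tau}.$$
A straightforward calculation shows that (3.10) holds.

Lemma 3.3 ($C^k$-smoothness of the convex function). Let $f$ be the convex function defined by (3.7)–(3.8) for $0 < \tau < 1/10$. Let $k \in \mathbb{N}$ be the biggest integer such that $k < \frac{1}{\tau}$. Then $f \in C^k(\mathbb{R}^2)$ and $f \notin C^{k+1}(\mathbb{R}^2)$.

Proof. Recalling that $f$ is real analytic in $\mathbb{R}^2 \setminus \{\mathcal{O}\}$ with $f(\mathcal{O}) = 0$ and $\nabla f(\mathcal{O}) = 0$, in order to prove that $f$ is $C^k$, it is sufficient to show that all the partial derivatives
$$\frac{\partial^{l_1 + l_2} f}{\partial x^{l_1}\partial y^{l_2}}, \quad l_1 + l_2 \le k, \tag{3.11}$$
which exist in $\mathbb{R}^2 \setminus \{\mathcal{O}\}$, converge to $0$ at $\mathcal{O}$. To this end, it is more convenient to start by computing the partial derivatives of $\tilde f$ defined in (3.7). We have
$$\tilde f(t,\theta) := f(M(t,\theta)) = e^{t/\tau} = f(x,y) \quad \text{for } (x,y) = M(t,\theta) = (X(t,\theta), Y(t,\theta)),$$
and by differentiation, we obtain
$$\begin{pmatrix} \frac{\partial \tilde f}{\partial t} \\ \frac{\partial \tilde f}{\partial \theta} \end{pmatrix} = \begin{pmatrix} \frac{1}{\tau}e^{t/\tau} \\ 0 \end{pmatrix} = \begin{pmatrix} \frac{\partial X}{\partial t} & \frac{\partial Y}{\partial t} \\ \frac{\partial X}{\partial \theta} & \frac{\partial Y}{\partial \theta} \end{pmatrix} \begin{pmatrix} \frac{\partial f}{\partial x} \\ \frac{\partial f}{\partial y} \end{pmatrix}. \tag{3.12}$$
We can compute explicitly the partial derivatives of $X$ and $Y$, see (3.6), to obtain
$$\frac{\partial X}{\partial t},\ \frac{\partial Y}{\partial t},\ \frac{\partial X}{\partial \theta},\ \frac{\partial Y}{\partial \theta} = e^t P(t,\theta),$$
where $P(t,\theta)$ denotes generically a smooth periodic (hence bounded) function with respect to $t$ and $\theta$. More generally, in what follows, $P_{n,m}(t,\theta)$ (respectively $B_{n,m}(t,\theta)$) denotes an $n \times m$ matrix, the coefficients of which are smooth and periodic with respect to $t$ and $\theta$ (respectively bounded in $(-\infty, 0] \times \mathbb{R}$). It follows that
$$\begin{pmatrix} \frac{\partial f}{\partial x} \\ \frac{\partial f}{\partial y} \end{pmatrix} = \frac{1}{\frac{\partial X}{\partial t}\frac{\partial Y}{\partial \theta} - \frac{\partial Y}{\partial t}\frac{\partial X}{\partial \theta}} \begin{pmatrix} \frac{\partial Y}{\partial \theta} & -\frac{\partial Y}{\partial t} \\ -\frac{\partial X}{\partial \theta} & \frac{\partial X}{\partial t} \end{pmatrix} \begin{pmatrix} \frac{1}{\tau}e^{t/\tau} \\ 0 \end{pmatrix}.$$
Since
$$0 < e^{2t}\big(\sqrt{2} - \tfrac{1}{2}\big) \le \frac{\partial X}{\partial t}\frac{\partial Y}{\partial \theta} - \frac{\partial Y}{\partial t}\frac{\partial X}{\partial \theta} = e^{2t}\big(\sqrt{2} + \cos\theta\sin\theta\big) \le e^{2t}\big(\sqrt{2} + \tfrac{1}{2}\big),$$
we obtain
$$\begin{pmatrix} \frac{\partial f}{\partial x} \\ \frac{\partial f}{\partial y} \end{pmatrix} = e^{(\frac{1}{\tau} - 1)t}\,P_{2,1}(t,\theta), \tag{3.13}$$
from which we infer that $\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \to 0$ as $(x,y) \to \mathcal{O}$, or equivalently as $t \to -\infty$, since $\frac{1}{\tau} > 1$. We then recover the fact that $f$ is $C^1$, with $\nabla f(\mathcal{O}) = (0,0)$.

To prove that $f$ is $C^2$ (when $\frac{1}{\tau} > 2$), we differentiate (3.12) once more:
$$\begin{pmatrix} \frac{\partial^2 \tilde f}{\partial t^2} \\ \frac{\partial^2 \tilde f}{\partial t\,\partial \theta} \\ \frac{\partial^2 \tilde f}{\partial \theta^2} \end{pmatrix} = \begin{pmatrix} \frac{1}{\tau^2}e^{t/\tau} \\ 0 \\ 0 \end{pmatrix} = e^{2t}P_{3,3}(t,\theta) \begin{pmatrix} \frac{\partial^2 f}{\partial x^2} \\ \frac{\partial^2 f}{\partial x\,\partial y} \\ \frac{\partial^2 f}{\partial y^2} \end{pmatrix} + e^{t}P_{3,2}(t,\theta) \begin{pmatrix} \frac{\partial f}{\partial x} \\ \frac{\partial f}{\partial y} \end{pmatrix}, \tag{3.14}$$
where the coefficients of $e^{2t}P_{3,3}(t,\theta)$ are of the form $Z_1Z_2$, with
$$Z_1, Z_2 \in \mathcal{D} := \Big\{ \frac{\partial X}{\partial t}, \frac{\partial Y}{\partial t}, \frac{\partial X}{\partial \theta}, \frac{\partial Y}{\partial \theta} \Big\},$$
and the coefficients of $e^{t}P_{3,2}(t,\theta)$ are second derivatives of $X$, $Y$. The matrix $P_{3,3}(t,\theta)$ is invertible since $(t,\theta) \in \mathbb{R} \times \mathbb{T} \mapsto M(t,\theta) := (x,y) \in \mathbb{R}^2 \setminus \{\mathcal{O}\}$ is an analytic diffeomorphism. Finally, we get
$$\begin{pmatrix} \frac{\partial^2 f}{\partial x^2} \\ \frac{\partial^2 f}{\partial x\,\partial y} \\ \frac{\partial^2 f}{\partial y^2} \end{pmatrix} = e^{(\frac{1}{\tau} - 2)t}\,P_{3,1}(t,\theta) + e^{(\frac{1}{\tau} - 1)t}\,B_{3,1}(t,\theta),$$
which proves that the second derivatives of $f$ converge to $0$ as $(x,y) \to \mathcal{O}$ if $\frac{1}{\tau} > 2$. Therefore $f$ is $C^2$ with $\nabla^2 f(\mathcal{O}) = 0_{2 \times 2}$.

Continuing along the same lines, when differentiating $l$ times, the invertible matrix in front of the $l$-th order derivatives of $f$ has coefficients of the form $Z_1Z_2\cdots Z_l$ with $Z_1, \dots, Z_l \in \mathcal{D}$ and, after tedious computations, we obtain
$$\begin{pmatrix} \frac{\partial^l f}{\partial x^l} \\ \vdots \\ \frac{\partial^l f}{\partial x^{l-i}\partial y^{i}} \\ \vdots \\ \frac{\partial^l f}{\partial y^l} \end{pmatrix} = e^{(\frac{1}{\tau} - l)t}\,P_{l+1,1}(t,\theta) + e^{(\frac{1}{\tau} - (l-1))t}\,B_{l+1,1}(t,\theta), \tag{3.15}$$
which converges to $0$ as $(x,y) \to \mathcal{O}$ as long as $\frac{1}{\tau} > l$. Therefore $f$ is $C^l$ and all the $l$-th order derivatives of $f$ are zero at $\mathcal{O}$, and we conclude that $f \in C^k(\mathbb{R}^2)$, where $k$ is the biggest integer such that $k < \frac{1}{\tau}$.

Let us now assume, towards a contradiction, that $f$ is $C^{k+1}$. Then we can write a Taylor expansion of $f$ up to the order $k+1$ at $\mathcal{O}$. Since $\nabla^l f(\mathcal{O}) = 0$ for $l \le k$, we obtain that
$$f(x,y) = O\big(\|(x,y)\|^{k+1}\big) \quad \text{in a neighborhood of } \mathcal{O}, \tag{3.16}$$
where $O(r^{k+1})/r^{k+1}$ is bounded near $0$. If $\frac{1}{\tau} \notin \mathbb{N}$, then $k+1 > \frac{1}{\tau}$, and we obtain a straightforward contradiction with the first inequality in (3.9). If now $k+1 = \frac{1}{\tau} \in \mathbb{N}$, then (3.16) no longer contradicts (3.9). But writing (3.15) with $l = k+1$, we get
$$\begin{pmatrix} \frac{\partial^{k+1} f}{\partial x^{k+1}} \\ \vdots \\ \frac{\partial^{k+1} f}{\partial y^{k+1}} \end{pmatrix} = P_{k+2,1}(t,\theta) + e^{t}\,B_{k+2,1}(t,\theta).$$
The second term above converges to zero as $t \to -\infty$, or equivalently as $(x,y) \to \mathcal{O}$, but $P_{k+2,1}(t,\theta)$ is a periodic nonconstant matrix with respect to $t$ and $\theta$, so it cannot converge as $t \to -\infty$, contradicting our assumption. This ends the proof.

4 Oscillating gradient trajectories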
Suppose that $f$ is the convex function defined in the previous section, see (3.1)–(3.2), and consider the ordinary differential equation for the gradient orbits:
$$\begin{cases} \gamma'(t) = \nabla f(\gamma(t)), \quad t \in \mathbb{R}, \\ \gamma(0) = \gamma_0 \in \mathbb{R}^2 \setminus \{\mathcal{O}\}. \end{cases} \tag{4.1}$$
Since $f$ is convex, analytic in $\mathbb{R}^2 \setminus \{\mathcal{O}\}$ and coercive with a unique minimum at $\mathcal{O}$, there exists a unique maximal solution $\gamma$ in $(-\infty, T)$, $T \le +\infty$, with
$$\lim_{t \to T}\|\gamma(t)\| = +\infty \quad \text{and} \quad \lim_{t \to -\infty}\gamma(t) = \mathcal{O}.$$
In fact, finding gradient orbits is a geometric problem. We seek the unique curve $\gamma$ passing through $\gamma_0$ which is orthogonal to the level sets of $f$. It is convenient to parametrize $\gamma$ as
$$\gamma(s) = M(t(s), \theta(s)) = \big( X(t(s), \theta(s)),\ Y(t(s), \theta(s)) \big), \quad s \in \mathbb{R}, \tag{4.2}$$
using the notations (2.3)–(2.4). Under this parametrization $\gamma(s) \in \mathcal{E}(t(s))$ for every $s \in \mathbb{R}$, and $\gamma'(s)$ is a normal vector at $\gamma(s)$ to the (convex) sublevel set $[f \le f(\gamma(s))] = \operatorname{conv}\mathcal{E}(t(s))$. Therefore:
$$\gamma'(s) \perp \partial_\theta M(t(s), \theta(s)), \quad \text{for all } s \in \mathbb{R}. \tag{4.3}$$
We define the rotation angle $s \mapsto \alpha(s)$ as the angle between the $x$-axis and the secant $\frac{\gamma(s)}{\|\gamma(s)\|}$ (spherical part of the orbit), varying in a continuous way. Therefore
$$\cos\alpha(s) = \frac{X(t,\theta)}{\sqrt{X(t,\theta)^2 + Y(t,\theta)^2}}, \qquad \sin\alpha(s) = \frac{Y(t,\theta)}{\sqrt{X(t,\theta)^2 + Y(t,\theta)^2}}.$$
In particular, according to the notation used in (2.3)–(2.5), if $\phi(s)$ is the angle in polar coordinates of the point $m(t,\theta)$, then we have (see Figure 1):
$$\alpha(s) = t(s) + \phi(s), \quad \text{for all } s \in \mathbb{R}.$$

Lemma 4.1 (Spiraling around the origin). Let $f$ be the convex function defined in (3.1) under the assumption (3.2) and let $s \mapsto \gamma(s)$ be a maximal orbit of the convex foliation $(\mathcal{E}(t))_{t \in \mathbb{R}}$. Then the rotation angle $s \mapsto \alpha(s)$ satisfies
$$\lim_{s \to \pm\infty}\alpha(s) = \pm\infty. \tag{4.4}$$
See Figure 3 for a generic numerical simulation of the maximal orbit of the function $f$ associated with the convex foliation of Figure 2.
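A simulation in the spirit of the one mentioned above can be sketched in a few lines. The sketch below is not the authors' code: it assumes the choice $a(t) = \sqrt{2}e^t$, $b(t) = e^t$ of Lemma 3.1 and uses $t$ itself as the orbit parameter. Substituting this choice into the orthogonality condition (4.3) gives, by our own reduction, $d\theta/dt = -(\sqrt{2} - \cos\theta\sin\theta)/(1 + \sin^2\theta)$. The function names, the starting point and the step count are illustrative assumptions.

```python
import math

SQRT2 = math.sqrt(2.0)


def ellipse_point(t, theta):
    """M(t, theta) = R(t) m(t, theta) for a(t) = sqrt(2) e^t, b(t) = e^t."""
    x = SQRT2 * math.exp(t) * math.cos(theta)
    y = math.exp(t) * math.sin(theta)
    return (x * math.cos(t) - y * math.sin(t),
            x * math.sin(t) + y * math.cos(t))


def theta_rate(theta):
    """d(theta)/dt along an orbit, from <gamma', d_theta M> = 0 (assumed reduction)."""
    return -(SQRT2 - math.cos(theta) * math.sin(theta)) / (1.0 + math.sin(theta) ** 2)


def rotation_angle_along_orbit(theta0=0.0, t0=-40.0, t1=40.0, steps=8000):
    """Integrate theta(t) with classical RK4 and return (ts, alphas).

    alphas is the continuously unwrapped polar angle of M(t, theta(t)).
    The starting point (t0, theta0) is an arbitrary illustrative choice.
    """
    dt = (t1 - t0) / steps
    t, theta = t0, theta0
    x, y = ellipse_point(t, theta)
    alpha = math.atan2(y, x)
    ts, alphas = [t], [alpha]
    for i in range(steps):
        k1 = theta_rate(theta)
        k2 = theta_rate(theta + 0.5 * dt * k1)
        k3 = theta_rate(theta + 0.5 * dt * k2)
        k4 = theta_rate(theta + dt * k3)
        theta += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        t = t0 + (i + 1) * dt
        x, y = ellipse_point(t, theta)
        raw = math.atan2(y, x)
        # Add the increment wrapped into [-pi, pi) so that alpha stays continuous.
        alpha += (raw - alpha + math.pi) % (2.0 * math.pi) - math.pi
        ts.append(t)
        alphas.append(alpha)
    return ts, alphas


if __name__ == "__main__":
    ts, alphas = rotation_angle_along_orbit()
    turns = (alphas[-1] - alphas[0]) / (2.0 * math.pi)
    print(f"net turns for t in [{ts[0]}, {ts[-1]}]: {turns:.3f}")
```

Since the radius along the orbit is comparable to $e^t$, a moderate window in $t$ already covers many orders of magnitude in distance to the origin. The unwrapped angle returned here can also be inspected step by step to see that it is not monotone, in line with the remark made in the introduction.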
Figure 3: Gradient orbit $\gamma(s)$ with initial point $\gamma(0) = (2, …)$.

Proof.
We use the parametrization given by (4.2). Since
$$\lim_{s \to +\infty}\|\gamma(s)\| = +\infty \quad \text{and} \quad \lim_{s \to -\infty}\gamma(s) = \mathcal{O},$$
we can assume that the function $s \mapsto t(s)$ satisfies
$$t'(s) > 0 \quad \text{and} \quad \lim_{s \to \pm\infty} t(s) = \pm\infty. \tag{4.5}$$
The goal is to compute $\alpha(s)$ using the orthogonality condition (4.3), which is equivalent to
$$\big\langle \gamma'(s),\ \partial_\theta M(t(s), \theta(s)) \big\rangle = 0, \quad \text{for all } s \in \mathbb{R}. \tag{4.6}$$
Using the notations of Section 2, we have
$$\gamma'(s) = \frac{d}{ds}M(t(s), \theta(s)) = t'\,\partial_t(Rm) + \theta'\,\partial_\theta(Rm) = t'(R'm + R\,\partial_t m) + \theta' R\,\partial_\theta m$$
and $\partial_\theta M = \partial_\theta(Rm) = R\,\partial_\theta m$. It follows that
$$\begin{aligned} \big\langle \gamma'(s), \partial_\theta M \big\rangle &= t'\big\langle R'm, R\,\partial_\theta m \big\rangle + t'\big\langle R\,\partial_t m, R\,\partial_\theta m \big\rangle + \theta'\big\langle R\,\partial_\theta m, R\,\partial_\theta m \big\rangle \\ &= t'\big\langle R(\tfrac{\pi}{2})m, \partial_\theta m \big\rangle + t'\big\langle \partial_t m, \partial_\theta m \big\rangle + \theta'\|\partial_\theta m\|^2 \\ &= t'\big( ab + (bb' - aa')\cos\theta\sin\theta \big) + \theta'\big( a^2\sin^2\theta + b^2\cos^2\theta \big). \end{aligned}$$
By (4.3), we have $\langle \gamma'(s), \partial_\theta M \rangle = 0$ and after the substitution $a(t) = \sqrt{2}e^t$ and $b(t) = e^t$ we get
$$t'e^{2t}\big(\sqrt{2} - \cos\theta\sin\theta\big) + \theta'e^{2t}\big(1 + \sin^2\theta\big) = 0,$$
whence we deduce the following relation between $t(s)$ and $\theta(s)$:
$$t'(s) = -\frac{1 + \sin^2\theta(s)}{\sqrt{2} - \cos\theta(s)\sin\theta(s)}\,\theta'(s). \tag{4.7}$$
Since for every $\theta \in \mathbb{R}$ we have
$$0 < \frac{2}{2\sqrt{2} + 1} \le \frac{1 + \sin^2\theta}{\sqrt{2} - \cos\theta\sin\theta} \le \frac{4}{2\sqrt{2} - 1},$$
and since $t'(s) > 0$ forces $\theta'(s) < 0$ in (4.7), we get
$$-\frac{2}{2\sqrt{2} + 1}\,\theta'(s) \le t'(s) \le -\frac{4}{2\sqrt{2} - 1}\,\theta'(s).$$
Therefore, from (4.5) we deduce
$$\theta'(s) < 0, \qquad \theta(s) \to +\infty \ \text{ as } s \to -\infty, \qquad \theta(s) \to -\infty \ \text{ as } s \to +\infty. \tag{4.8}$$
Next, we establish the relation between $\theta(s)$ and $\phi(s)$, see Figure 1. We have
$$\cos\phi = \frac{a\cos\theta}{\sqrt{a^2\cos^2\theta + b^2\sin^2\theta}} = \frac{\sqrt{2}\cos\theta}{\sqrt{2\cos^2\theta + \sin^2\theta}}, \qquad \sin\phi = \frac{b\sin\theta}{\sqrt{a^2\cos^2\theta + b^2\sin^2\theta}} = \frac{\sin\theta}{\sqrt{2\cos^2\theta + \sin^2\theta}}.$$
Differentiating $\cos\phi$ and plugging the result in the second expression, we end up with
$$\phi' = \frac{\sqrt{2}}{1 + \cos^2\theta}\,\theta'. \tag{4.9}$$
Assembling (4.7) and (4.9), we obtain
$$\alpha' = t' + \phi' = \left( \frac{\sqrt{2}}{1 + \cos^2\theta} - \frac{1 + \sin^2\theta}{\sqrt{2} - \cos\theta\sin\theta} \right)\theta' =: h(\theta)\,\theta'. \tag{4.10}$$
The function $h$ is analytic and $2\pi$-periodic, see Figure 4. We can expand it in Fourier series and integrate (4.10) to obtain
$$\alpha(s) = a_0\,\theta(s) + O(1), \tag{4.11}$$
where $O(1)$ is a bounded function and
$$a_0 = \frac{1}{2\pi}\int_0^{2\pi} h(\theta)\,d\theta \simeq -0.134 < 0.$$
We finally conclude from (4.11) and (4.8) that (4.4) holds.
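The sign of the mean value $a_0$ is the crux of the argument, so it is worth a quick numerical cross-check. The following sketch evaluates the average of $h$ over one period with an equispaced rule, which converges very fast for smooth periodic integrands. The closed form $1 - 3/\sqrt{7}$ used for comparison is our own computation (from the standard integral of $1/(p - \sin u)$ over a period), not a formula stated in the text, and the helper names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def h(theta):
    """h(theta) from (4.10): the coefficient of theta' in alpha' = h(theta) theta'."""
    c, s = math.cos(theta), math.sin(theta)
    return SQRT2 / (1.0 + c * c) - (1.0 + s * s) / (SQRT2 - c * s)


def mean_over_period(func, n=4096):
    """Average of a 2*pi-periodic function using n equispaced samples."""
    return sum(func(2.0 * math.pi * j / n) for j in range(n)) / n


a0 = mean_over_period(h)
# Assumed closed form: the first term of h averages to 1, the second to 3/sqrt(7).
closed_form = 1.0 - 3.0 / math.sqrt(7.0)

if __name__ == "__main__":
    print(f"numerical mean of h: {a0:.6f}")
    print(f"1 - 3/sqrt(7):       {closed_form:.6f}")
```

Both numbers should agree to many digits and be negative, which is all that the conclusion of the proof uses. Note also that $h$ changes sign over a period, which is the source of the non-monotone behavior of the rotation angle.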
Consider the convex foliation by ellipses $\{\mathcal{E}(t)\}_{t \in \mathbb{R}}$ given by Lemma 2.1. Let $k \ge 1$ and let $f$ be the convex function defined by Lemma 3.1 for $0 < \tau < \min\{1/10, 1/k\}$. Then, by Lemma 3.2, the function $f$ is coercive, has its unique minimum at the origin $\mathcal{O}$, is real analytic in $\mathbb{R}^2 \setminus \{\mathcal{O}\}$ and satisfies the Łojasiewicz inequality (1.1). Further, Lemma 3.3 ensures that $f$ is $C^k$-smooth. Finally, Lemma 4.1 asserts that all nontrivial gradient orbits spiral infinitely many times both near the origin (bounded part) and at infinity. $\square$

Acknowledgement.
This work was partially supported by the Centre Henri Lebesgue ANR-11-LABX-0020-01 and the grants CMM AFB170001, ECOS-Sud/ANID C18E04 and FONDECYT 1211217. A major part of this work was done during a research visit of the first author to INSA Rennes. This author is indebted to his hosts for their hospitality.

Figure 4: Plot of the function h(θ) defined in (4.10).

Aris Daniilidis
DIM–CMM, CNRS IRL 2807
Beauchef 851, FCFM, Universidad de Chile
E-mail:
Research supported by the grants:
CMM AFB170001, ECOS-ANID C18E04, Fondecyt 1211217 (Chile),
PGC2018-097960-B-C22 (Spain and EU).

Mounir Haddou, Olivier Ley
Univ Rennes, INSA, CNRS, IRMAR - UMR 6625, F-35000 Rennes, France
E-mail: {mounir.haddou, olivier.ley}@insa-rennes.fr
http://{haddou, ley}.perso.math.cnrs.fr/