A convex function satisfying the Łojasiewicz inequality but failing the gradient conjecture both at zero and infinity

Abstract

We construct an example of a smooth convex function on the plane with a strict minimum at zero, which is real analytic except at zero, for which Thom's gradient conjecture fails both at zero and infinity. More precisely, the gradient orbits of the function spiral around zero and at infinity. Besides, the function satisfies the Łojasiewicz gradient inequality at zero.


Aris Daniilidis, Mounir Haddou, Olivier Ley


Key words.

Gradient conjecture, gradient conjecture at infinity, Kurdyka-Łojasiewicz inequality, convex function, convergence of secants.

AMS Subject Classification

Primary

Secondary

Answering a question of Whitney, Łojasiewicz [19] showed that every analytic variety $f^{-1}(0)$, where $f : U \subset \mathbb{R}^N \to \mathbb{R}$ is real-analytic ($U \neq \emptyset$, open), is a deformation retract of one of its open neighborhoods. The deformation was given by the flow of the Euclidean gradient $\nabla f$. The main argument of Łojasiewicz was based on a famous lemma, nowadays known as the Łojasiewicz (gradient) inequality, which asserts that for some $\vartheta \in (0,1)$ and $c > 0$

$$\|\nabla f(x)\| \ge c\,|f(x) - f(a)|^{\vartheta} \tag{1.1}$$

for all $x$ sufficiently close to $a \in f^{-1}(0)$. The above inequality ensures that every bounded gradient orbit $t \mapsto \gamma(t)$ (i.e., $\dot\gamma = \nabla f(\gamma)$) has finite length and therefore converges to a singular point $\gamma_\infty$ with $\nabla f(\gamma_\infty) = 0$.

Some years later, Thom conjectured that in this case, up to a change of coordinates that identifies $\gamma_\infty$ with $0$, the spherical part of the orbit also converges. In other words, the limit of secants

$$\lim_{t \to +\infty} \frac{\gamma(t) - \gamma_\infty}{\|\gamma(t) - \gamma_\infty\|} \quad \text{exists.} \tag{1.2}$$

For decades, this has been known as the (Thom) gradient conjecture, see [1, 29]. (For the more general problem of non-oscillation of trajectories, we refer to [4, 12, 24].) The gradient conjecture makes sense for any gradient dynamics for which bounded orbits converge. Partial results revealed that (1.2) should hold in the real-analytic case, see [13, 18, 27], a fact that was eventually established in full generality by Kurdyka, Mostowski and Parusiński [16] in 2000. The proof was based on (1.1) together with concrete analytic estimates.

Łojasiewicz showed that the gradient inequality (1.1) remains valid also for $C^1$ semialgebraic (respectively, globally subanalytic) functions, see [20]. In 1998, Kurdyka generalized (1.1) to $C^1$ functions that are definable in some o-minimal structure, an axiomatic definition due to van den Dries [30, 31] which encompasses semialgebraic and globally subanalytic functions, but also larger classes that include the exponential function [23]. More precisely, Kurdyka showed that for every definable function $f$ and critical value $r_\infty$ (which is necessarily isolated) there exist $\delta > 0$ and a continuous function $\Psi : [r_\infty, r_\infty + \delta) \to \mathbb{R}$ which is $C^1$ on $(r_\infty, r_\infty + \delta)$ with $\Psi' > 0$ such that

$$\|\nabla(\Psi \circ f)(x)\| \ge 1 \tag{1.3}$$

for all $x \in \mathbb{R}^N$ such that $r_\infty < f(x) < r_\infty + \delta$. In addition, Kurdyka's proof showed that the function $\Psi$ can be taken in the same o-minimal structure as $f$.
Consequently, if $f$ is semialgebraic or globally subanalytic, then so is $\Psi$, and thanks to Puiseux's theorem we may take $\Psi(r) = r^{1-\vartheta}$ for some $\vartheta \in (0,1)$, which gives back (1.1) with $c = (1-\vartheta)^{-1}$. We refer to (1.3) as the Kurdyka-Łojasiewicz (in short, KŁ) inequality, and we call KŁ-function any function with (upper) isolated critical values that satisfies the KŁ-inequality around each of them. As with the gradient inequality (1.1), bounded gradient orbits of a KŁ-function have finite length. There are well-known examples of $C^\infty$ functions in $\mathbb{R}^2$ with isolated critical values that are not KŁ-functions (they have bounded gradient orbits which fail to converge), see [10, 25]. Bounded gradient orbits of convex functions have finite length [7, 22] and therefore converge, but there are also examples of smooth convex functions failing the KŁ-property, see [3, § […]].

[…] sufficiently smooth o-minimal functions provided either $N = 2$ (planar case) or the structure is polynomially bounded (in particular if $f$ is semialgebraic or globally subanalytic). On the other hand, mere convexity is not sufficient to guarantee (1.2): there exist examples of convex functions whose orbits either spiral [8, § […]] […] $f$ is a sufficiently smooth semialgebraic function and $t \mapsto \gamma(t)$ is a gradient orbit satisfying $\|\gamma(t)\| \to \infty$ as $t \to +\infty$, then the limit of secants at infinity

$$\lim_{t \to +\infty} \frac{\gamma(t)}{\|\gamma(t)\|} \quad \text{exists (gradient conjecture at infinity).} \tag{1.4}$$

The proof is based on a Łojasiewicz-type gradient inequality at infinity previously obtained by the author together with D'Acunto in [6].

The behavior of secants at infinity has recently become relevant in Machine Learning. If a deep network model is unbiased and homogeneous (max-pooling, ReLU, linear and convolutional layers), then minimizing the cross-entropy or other classification losses forces the parameters of the model to diverge in norm to infinity [21]. In this setting, convergence of the secants at infinity is important.
In [14] the authors establish that (1.4) holds for a certain type of prediction functions ($L$-homogeneous and definable in the log-exp structure). For the time being, no further results have been reported.

In a nutshell, proving the gradient conjecture (respectively, the gradient conjecture at infinity) seems to require at least the KŁ-inequality (1.3) together with other properties of o-minimal functions, but it is still unknown whether these conjectures are true for general o-minimal functions.

In this work we present an example of a smooth convex function in $\mathbb{R}^2$ which is real-analytic outside zero (its unique critical point), satisfies the Łojasiewicz inequality (1.1) and fails the gradient conjecture both at zero and at infinity. In particular, all gradient orbits spiral both at zero and at infinity, underlining in this way the two failures of o-minimality of the function, despite the fact that the function is convex and satisfies the Łojasiewicz gradient inequality.

Theorem 1.1 (main result). For every $k \in \mathbb{N}$, there exists a $C^k$ convex function $f : \mathbb{R}^2 \to \mathbb{R}$ with a unique minimum at $\mathcal{O} := (0,0)$ such that:
- $f$ is real analytic on $\mathbb{R}^2 \setminus \{\mathcal{O}\}$;
- $f$ satisfies the Łojasiewicz inequality at $\mathcal{O}$; and
- every gradient orbit $\gamma : (-\infty, T) \to \mathbb{R}^2$ of $f$ spirals infinitely many times both when $t \to T$ and when $t \to -\infty$.

Throughout the manuscript, by gradient orbits (or gradient trajectories) we refer to maximal solutions of the ordinary differential equation

$$\gamma'(t) = \nabla f(\gamma(t)).$$

In our example, the function $f$ will be convex, with unique critical point (global minimizer) at $\mathcal{O}$, and we tacitly assume that $\gamma(0) \neq \mathcal{O}$ (avoiding stationary orbits).

Let us briefly describe our strategy for the construction of this example: in Section 2 we prescribe a family of convex sets, all delimited by ellipses centered at the origin and obtained via rotations and size adjustments of a basic ellipse $\mathcal{E}(0)$.
This is done in such a way that a convex foliation is obtained, which can be represented by some (quasiconvex) function.

In Section 3, we further calibrate the parameters so that we can apply a criterion due to de Finetti [9] and Crouzeix [5] which guarantees that the aforementioned quasiconvex function is in fact convex. The construction yields a function that is real-analytic on $\mathbb{R}^2 \setminus \{\mathcal{O}\}$. This cannot be improved to real analyticity on the whole space, because of the proof of Thom's gradient conjecture [16]. Instead, we are able to show that the function can be taken $C^k$-smooth at $\mathcal{O}$ for arbitrarily large $k \in \mathbb{N}$. Still, our construction fails to ensure $C^\infty$. Finally, applying a result of [3] which gives conditions for a convex function to satisfy (1.3), we show that our function satisfies the KŁ-inequality and in fact even (1.1) (the Łojasiewicz inequality).

Gradient orbits are perpendicular to the foliation, and explicit calculations, conducted in Section 4, show that the orbits turn around both at the origin and at infinity, so the gradient conjecture fails for this function. An additional difficulty in establishing the spiraling is that the evolution of the spherical part of the orbit (the rotation angle $\alpha(t)$ of $\gamma(t)$ in polar coordinates) is not monotone in time, so that the decrease rate is established on average, see Figure 3 and Figure 4. For a study of monotonic spiraling of orbits of general analytic vector fields in dimensions 2 and 3, we refer to [28].

2 […] $\mathbb{R}^2 \setminus \{\mathcal{O}\}$

Let us first consider two smooth increasing functions $a, b : \mathbb{R} \to (0, +\infty)$ for which we assume:

$$\begin{cases} \lim_{t \to +\infty} a(t) = \lim_{t \to +\infty} b(t) = +\infty, \\ \lim_{t \to -\infty} a(t) = \lim_{t \to -\infty} b(t) = 0, \end{cases} \qquad \text{and} \qquad a(t) \ge b(t) \ \text{ for all } t \in \mathbb{R}. \tag{2.1}$$

The exact definition of the functions $a(t)$ and $b(t)$ will be given in Lemma 3.1 (Section 3).
We also consider the rotation matrix by an angle $t$, denoted by

$$R(t) = \begin{pmatrix} \cos t & -\sin t \\ \sin t & \cos t \end{pmatrix}. \tag{2.2}$$

For $t \in \mathbb{R}$ and $\theta \in \mathbb{T} := \mathbb{R}/2\pi\mathbb{Z}$ we set

$$m(t,\theta) := (x(t,\theta), y(t,\theta)) = (a(t)\cos\theta,\ b(t)\sin\theta), \qquad M(t,\theta) := R(t)\,m(t,\theta) = (X(t,\theta), Y(t,\theta)). \tag{2.3}$$

Therefore

$$\begin{cases} X(t,\theta) = x(t,\theta)\cos t - y(t,\theta)\sin t = a(t)\cos t\cos\theta - b(t)\sin t\sin\theta, \\ Y(t,\theta) = x(t,\theta)\sin t + y(t,\theta)\cos t = a(t)\sin t\cos\theta + b(t)\cos t\sin\theta. \end{cases} \tag{2.4}$$

The subset

$$\mathcal{E}(t) := \{ M(t,\theta) : \theta \in \mathbb{T} \} \tag{2.5}$$

is an ellipse with semi-major axis of length $a(t)$ and semi-minor axis of length $b(t)$ (see Figure 1 for illustration). Notice that $\mathcal{E}(t)$ is the rotation by angle $t$ of the ellipse

$$E(t) := \{ m(t,\theta) : \theta \in \mathbb{T} \} = \left\{ (x,y) \in \mathbb{R}^2 : \frac{x^2}{a(t)^2} + \frac{y^2}{b(t)^2} = 1 \right\}.$$

Under an additional condition on the functions $a, b$, the family of ellipses $\{\mathcal{E}(t)\}_{t \in \mathbb{R}}$ defined in (2.5) is disjoint with union equal to $\mathbb{R}^2 \setminus \{\mathcal{O}\}$. More precisely, denoting by $a', b'$ the derivatives of the functions $a, b$ respectively, we have the following result:

Lemma 2.1 (Convex foliation by ellipses). Let $a, b : \mathbb{R} \to (0, +\infty)$ satisfy (2.1) and assume

$$4\,a(t)\,b(t)\,a'(t)\,b'(t) > \big(a(t)^2 - b(t)^2\big)^2 \quad \text{for all } t \in \mathbb{R}. \tag{2.6}$$

Then $(\mathcal{E}(t))_{t \in \mathbb{R}}$ defines an analytic convex foliation of $\mathbb{R}^2 \setminus \{\mathcal{O}\}$.

Proof. The proof is divided in three steps:

Step 1. The map $M : \mathbb{R} \times \mathbb{T} \to \mathbb{R}^2 \setminus \{\mathcal{O}\}$ is a local analytic diffeomorphism.

Indeed, let us first notice that the map $M$, defined by (2.3)–(2.4), is real-analytic as a composition of analytic functions. Therefore, if we show that the Jacobian matrix

$$J_M = \begin{pmatrix} \frac{\partial X}{\partial t} & \frac{\partial X}{\partial \theta} \\ \frac{\partial Y}{\partial t} & \frac{\partial Y}{\partial \theta} \end{pmatrix}$$

is invertible at each point $(t,\theta) \in \mathbb{R} \times \mathbb{T}$, the assertion follows from the local analytic inverse function theorem [15, Theorem 2.5.1]. To this end, we shall prove that

$$\det(J_M) = \frac{\partial X}{\partial t}\frac{\partial Y}{\partial \theta} - \frac{\partial Y}{\partial t}\frac{\partial X}{\partial \theta} = \Big\langle \frac{\partial M}{\partial t}, n \Big\rangle > 0, \tag{2.7}$$

where $n(t,\theta) = -R(\tfrac{\pi}{2})\frac{\partial M}{\partial \theta} = \big(\frac{\partial Y}{\partial \theta}, -\frac{\partial X}{\partial \theta}\big)$ is an outer normal to the convex set $\operatorname{conv}\mathcal{E}(t)$ (convex envelope of $\mathcal{E}(t)$) at $M(t,\theta)$.

Figure 1: The ellipse $\mathcal{E}(t)$ and the map $(t,\theta) \mapsto M(t,\theta)$.

Recalling that $M(t,\theta) = R(t)\,m(t,\theta)$ (see (2.3)) and that the rotation matrix (2.2) satisfies

$$R'(t) = R(t + \tfrac{\pi}{2}), \qquad R(t)^{-1} = R(t)^T = R(-t) \qquad \text{and} \qquad R(t)R(s) = R(t+s),$$

we deduce

$$\begin{aligned} \Big\langle \frac{\partial M}{\partial t}, n \Big\rangle &= \Big\langle \frac{\partial}{\partial t}(R(t)m),\ -R(\tfrac{\pi}{2})\frac{\partial}{\partial \theta}(R(t)m) \Big\rangle = \Big\langle R'(t)m + R(t)\frac{\partial m}{\partial t},\ -R(\tfrac{\pi}{2})R(t)\frac{\partial m}{\partial \theta} \Big\rangle \\ &= \Big\langle R(t+\tfrac{\pi}{2})m + R(t)\frac{\partial m}{\partial t},\ R(t-\tfrac{\pi}{2})\frac{\partial m}{\partial \theta} \Big\rangle \\ &= \Big\langle R(t-\tfrac{\pi}{2})^T R(t+\tfrac{\pi}{2})\,m,\ \frac{\partial m}{\partial \theta} \Big\rangle + \Big\langle R(t-\tfrac{\pi}{2})^T R(t)\frac{\partial m}{\partial t},\ \frac{\partial m}{\partial \theta} \Big\rangle \\ &= -\Big\langle m, \frac{\partial m}{\partial \theta} \Big\rangle + \Big\langle R(\tfrac{\pi}{2})\frac{\partial m}{\partial t},\ \frac{\partial m}{\partial \theta} \Big\rangle. \end{aligned}$$

Plugging $\frac{\partial m}{\partial \theta} = (-a\sin\theta,\ b\cos\theta)$ and $\frac{\partial m}{\partial t} = (a'\cos\theta,\ b'\sin\theta)$ into the above equality, we end up with the expression

$$\det(J_M) = \Big\langle \frac{\partial M}{\partial t}, n \Big\rangle = a'b\cos^2\theta + ab'\sin^2\theta + (a^2 - b^2)\cos\theta\sin\theta. \tag{2.8}$$

This is a quadratic expression with respect to $\cos\theta$ and $\sin\theta$, which is positive for all $\theta \in \mathbb{T}$ if and only if the discriminant $(a^2 - b^2)^2 - 4aa'bb'$ is negative. The result follows in view of (2.6).

Step 2. The map $M : \mathbb{R} \times \mathbb{T} \to \mathbb{R}^2 \setminus \{\mathcal{O}\}$ is injective.

Fix $t \in \mathbb{R}$. From (2.7)–(2.8), using compactness of $\mathcal{E}(t)$ and smoothness of $M$, we deduce the existence of $\delta_t, \rho_t > 0$ such that, for all $s \in [t, t+\delta_t]$ and $\theta \in \mathbb{T}$,

$$\Big\langle \frac{\partial M}{\partial t}(s,\theta),\ n(t,\theta) \Big\rangle \ge \rho_t > 0,$$

which yields

$$\big\langle M(s,\theta) - M(t,\theta),\ n(t,\theta) \big\rangle \ge \rho_t\,(s - t) > 0 \quad \text{for } t < s \le t + \delta_t \text{ and } \theta \in \mathbb{T}.$$

It follows that $\operatorname{conv}\mathcal{E}(t) \subset \operatorname{int}\operatorname{conv}\mathcal{E}(s)$ for all $s > t$. Therefore, the family $(\operatorname{conv}\mathcal{E}(t))_{t \in \mathbb{R}}$ is nested and the map $M$ is injective.

Step 3. The map $M : \mathbb{R} \times \mathbb{T} \to \mathbb{R}^2 \setminus \{\mathcal{O}\}$ is surjective.

Fix $(x,y) \in \mathbb{R}^2 \setminus \{\mathcal{O}\}$ and set, for $t \in \mathbb{R}$ and $D(t) = \begin{pmatrix} a(t) & 0 \\ 0 & b(t) \end{pmatrix}$,

$$\rho(t) := \|D(t)^{-1}R(t)^{-1}(x,y)\|^2 = \frac{1}{a(t)^2}(x\cos t + y\sin t)^2 + \frac{1}{b(t)^2}(-x\sin t + y\cos t)^2.$$

We claim that $\rho$ is a smooth decreasing function with $\lim_{-\infty}\rho = +\infty$ and $\lim_{+\infty}\rho = 0$.

Indeed, since $(x,y) \neq (0,0)$, we have $R(t)^{-1}(x,y) \neq (0,0)$ and either $x\cos t + y\sin t \neq 0$ or $-x\sin t + y\cos t \neq 0$. Recalling that $a(t), b(t) \to 0$ as $t \to -\infty$, we deduce $\lim_{-\infty}\rho = +\infty$. We also observe that $\lim_{+\infty}\rho = 0$ is a direct consequence of the fact that $a(t), b(t) \to +\infty$ as $t \to +\infty$.

It remains to prove that $\rho'$ is negative. To this end, set $q(t) := x\cos t + y\sin t$ and notice that $\rho = a^{-2}q^2 + b^{-2}(q')^2$. Using that $q'' = -q$, we infer

$$\begin{aligned} \rho'(t) &= -2a'a^{-3}q^2 + 2a^{-2}q'q - 2b'b^{-3}(q')^2 + 2b^{-2}q''q' \\ &= -2a^{-2}b^{-2}\big( a'a^{-1}b^2q^2 + (a^2 - b^2)\,qq' + b'b^{-1}a^2(q')^2 \big). \end{aligned}$$

The quadratic expression $a'a^{-1}b^2q^2 + (a^2 - b^2)qq' + b'b^{-1}a^2(q')^2$ with respect to $q$ and $q'$ is positive if and only if its discriminant is negative, which is equivalent, once again, to (2.6). Thus $\rho$ is strictly decreasing and the claim follows.

Using the claim, we infer that there exists a unique $t \in \mathbb{R}$ such that

$$\rho(t) = \|D(t)^{-1}R(t)^{-1}(x,y)\|^2 = 1.$$

Therefore, there exists a unique $\theta \in \mathbb{T}$ such that $D(t)^{-1}R(t)^{-1}(x,y) = (\cos\theta, \sin\theta)$. It follows that $M(t,\theta) = (x,y)$, which proves that $M$ is onto.

A typical instance where Lemma 2.1 applies is $a = \mu b$ for some constant $\mu > 1$ and $b(t) = e^{\nu t}$ with $\nu > \frac{\mu^2 - 1}{2\mu}$: it is straightforward to check that $a, b$ satisfy (2.1) and (2.6). Figure 2 represents the explicit choice $\mu = 2$ and $\nu = 1$, leading to $a(t) = 2e^t$ and $b(t) = e^t$.

Figure 2: The convex foliation $(\mathcal{E}(t))_{t \in \mathbb{R}}$ for $a(t) = 2b(t) = 2e^t$.

3 […]

In this section we shall show that for a more precise choice of the functions $a(t), b(t)$ we can construct a convex function whose level sets are exactly the foliation $\{\mathcal{E}(t)\}_{t \in \mathbb{R}}$.
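To make the foliation of Lemma 2.1 concrete, here is a minimal Python sketch (an illustration, not part of the paper) for the Figure 2 choice $a(t) = 2e^t$, $b(t) = e^t$. It checks condition (2.6) and follows Step 3 of the proof: since $\rho$ is strictly decreasing, the ellipse through a given point can be located by bisection on $\rho(t) = 1$. The function names `gap_26`, `rho`, `locate` and the bracket $[-40, 40]$ are assumptions made for this example.

```python
import math

def a(t):  # Figure 2 choice: a(t) = 2 e^t (so a' = a)
    return 2.0 * math.exp(t)

def b(t):  # Figure 2 choice: b(t) = e^t (so b' = b)
    return math.exp(t)

def gap_26(t):
    """Left side minus right side of (2.6), using a' = a and b' = b."""
    return 4.0 * a(t) * b(t) * a(t) * b(t) - (a(t) ** 2 - b(t) ** 2) ** 2

def rotate_back(x, y, t):
    """R(t)^{-1}(x, y), i.e. the pair (q, q') of Step 3."""
    return (x * math.cos(t) + y * math.sin(t),
            -x * math.sin(t) + y * math.cos(t))

def rho(x, y, t):
    """rho(t) = |D(t)^{-1} R(t)^{-1} (x, y)|^2."""
    q, qp = rotate_back(x, y, t)
    return (q / a(t)) ** 2 + (qp / b(t)) ** 2

def locate(x, y, lo=-40.0, hi=40.0, iters=200):
    """Return (t, theta) with M(t, theta) = (x, y).

    Assumes the point is neither astronomically close to nor far from the
    origin, so that rho(lo) > 1 > rho(hi) on the hypothetical bracket.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if rho(x, y, mid) > 1.0:   # rho decreases: the root lies to the right
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)
    q, qp = rotate_back(x, y, t)
    return t, math.atan2(qp / b(t), q / a(t))

def M(t, theta):
    """The parametrization (2.3)-(2.4)."""
    mx, my = a(t) * math.cos(theta), b(t) * math.sin(theta)
    return (mx * math.cos(t) - my * math.sin(t),
            mx * math.sin(t) + my * math.cos(t))

if __name__ == "__main__":
    t, theta = locate(1.0, -0.5)
    print(t, theta, M(t, theta))
```

Once $t$ is known, the value $\varphi(t)$ of a function constant on the ellipses can be read off directly, which is how the function of this section is defined.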
We shall also show that this convex function is smooth, real-analytic on $\mathbb{R}^2 \setminus \{\mathcal{O}\}$ and satisfies (1.1).

Concretely, let us denote by $\varphi : \mathbb{R} \to \mathbb{R}$ a smooth strictly increasing function (the concrete definition will be given in (3.2), see Lemma 3.1) and let us set, for all $M \in \mathbb{R}^2$,

$$f(M) = \begin{cases} 0, & \text{if } M = (0,0), \\ \varphi(t), & \text{if } M \in \mathcal{E}(t), \end{cases} \tag{3.1}$$

where $\mathcal{E}(t)$ is the ellipse given in (2.5). We shall now show that we can adjust the parameters and choose $\varphi$ in such a way that (3.1) gives a well-defined convex function.

Lemma 3.1 (Construction of the convex function). Setting, for $t \in \mathbb{R}$,

$$a(t) = \sqrt{2}\exp(t),\quad b(t) = \exp(t) \ \text{ in (2.4)}, \qquad \varphi(t) = \exp(t/\tau),\quad \tau \in (0, \tfrac{1}{10}), \ \text{ in (3.1)}, \tag{3.2}$$

the function $f$ defined by (3.1) is convex, with level sets the ellipses $\mathcal{E}(t)$ and $\operatorname{argmin} f = \{\mathcal{O}\}$.

Proof. Since the functions $a, b$ satisfy (2.1) and (2.6), we deduce by Lemma 2.1 that $(\mathcal{E}(t))_{t \in \mathbb{R}}$ is a convex foliation. In particular, the function $f$ is well defined by (3.1), with sublevel sets

$$[f \le \lambda] := \{ M \in \mathbb{R}^2 : f(M) \le \lambda \} = \operatorname{conv}\big[\mathcal{E}(\varphi^{-1}(\lambda))\big]$$

compact and convex. Therefore $f$ is a coercive, quasiconvex function.

We shall now use a result due to de Finetti and Crouzeix [5, 9] which asserts that the quasiconvex function $f$ is convex if and only if

$$\lambda \mapsto \sigma_{[f \le \lambda]}(p) \ \text{ is concave for every } p \in \mathbb{R}^2,$$

where $\sigma_A(p) = \max_{M \in A} \langle p, M \rangle$ is the support function of the subset $A$. Without loss of generality, we may restrict to unit vectors $p \in \mathbb{R}^2$, that is, $p = (\cos\alpha, \sin\alpha)$ for some $\alpha \in \mathbb{T}$. Therefore, we are led to prove that the function

$$\begin{aligned} G_\alpha(\lambda) &:= \sup\big\{ \langle (x,y), (\cos\alpha, \sin\alpha) \rangle : f(x,y) \le \lambda \big\} \\ &= \sup\big\{ \langle M(t,\theta), (\cos\alpha, \sin\alpha) \rangle : f(M(t,\theta)) = \varphi(t) \le \lambda \big\} \\ &= \max\big\{ \langle M(t,\theta), (\cos\alpha, \sin\alpha) \rangle : \theta \in \mathbb{T},\ t = t(\lambda) = \varphi^{-1}(\lambda) \big\} \end{aligned}$$

is concave. To this end, after straightforward calculations we obtain

$$\begin{aligned} \langle M(t,\theta), (\cos\alpha, \sin\alpha) \rangle &= \langle R(t)\,m(t,\theta), (\cos\alpha, \sin\alpha) \rangle = \langle (a(t)\cos\theta, b(t)\sin\theta),\ R(-t)(\cos\alpha, \sin\alpha) \rangle \\ &= \langle (\cos\theta, \sin\theta),\ (a(t)\cos(\alpha - t),\ b(t)\sin(\alpha - t)) \rangle, \end{aligned}$$

whence we deduce

$$G_\alpha(\lambda) = \big\| \big( a(t(\lambda))\cos(\alpha - t(\lambda)),\ b(t(\lambda))\sin(\alpha - t(\lambda)) \big) \big\| = \sqrt{g_\alpha(\lambda)} \tag{3.3}$$

with

$$g_\alpha(\lambda) = a(t(\lambda))^2\cos^2(t(\lambda) - \alpha) + b(t(\lambda))^2\sin^2(t(\lambda) - \alpha). \tag{3.4}$$

Calculating the second derivative of $G_\alpha$ in (3.3) yields

$$G_\alpha'' = \frac{2g_\alpha''g_\alpha - (g_\alpha')^2}{4\,g_\alpha^{3/2}}.$$

Therefore, the functions $\{G_\alpha\}_{\alpha \in \mathbb{T}}$ are concave provided we establish:

$$2g_\alpha''g_\alpha - (g_\alpha')^2 \le 0 \quad \text{for all } \alpha \in \mathbb{T}. \tag{3.5}$$

At this step, we insert in (3.4) the choice of $a$, $b$ and $\varphi$ given in (3.2):

$$a(t) = \sqrt{2}\,e^t, \quad b(t) = e^t \quad \text{and} \quad \lambda = \varphi(t) = e^{t/\tau} \quad \text{for all } t \in \mathbb{R},$$

and we seek the values of $\tau > 0$ for which (3.5) holds. We have

$$t := t(\lambda) = \tau\log\lambda, \quad \text{whence} \quad t'(\lambda) = \frac{\tau}{\lambda} \quad \text{and} \quad t''(\lambda) = -\frac{\tau}{\lambda^2} < 0.$$

A direct computation gives

$$g_\alpha = e^{2t}\big(\cos^2(t-\alpha) + 1\big), \qquad g_\alpha' = 2e^{2t}\,t'\,\big(\cos^2(t-\alpha) + 1 - \cos(t-\alpha)\sin(t-\alpha)\big)$$

and

$$g_\alpha'' = 2e^{2t}\Big( (t')^2\big(3 - 4\cos(t-\alpha)\sin(t-\alpha)\big) + t''\big(\cos^2(t-\alpha) + 1 - \cos(t-\alpha)\sin(t-\alpha)\big) \Big).$$
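Hence, writing $u := t - \alpha$,

$$\begin{aligned} 2g_\alpha''g_\alpha - (g_\alpha')^2 &= 4e^{4t}(t')^2\Big\{ \big(\cos^2 u + 1\big)\big(3 - 4\cos u\sin u\big) - \big(\cos^2 u + 1 - \cos u\sin u\big)^2 \Big\} \\ &\quad + 4e^{4t}\,t''\,\big(\cos^2 u + 1\big)\big(\cos^2 u + 1 - \cos u\sin u\big) \\ &\le 4e^{4t}\Big( 5(t')^2 + \tfrac{1}{2}t'' \Big) \le 2\tau(10\tau - 1)\frac{e^{4t}}{\lambda^2}, \end{aligned}$$

which is negative provided we choose $\tau < 1/10$.

From now on we consider the map $M : \mathbb{R} \times \mathbb{T} \to \mathbb{R}^2 \setminus \{\mathcal{O}\}$ under the choice made in Lemma 3.1, that is,

$$M(t,\theta) = (X(t,\theta), Y(t,\theta)) = e^t\big( \sqrt{2}\cos t\cos\theta - \sin t\sin\theta,\ \sqrt{2}\sin t\cos\theta + \cos t\sin\theta \big). \tag{3.6}$$

Setting

$$\tilde f : \mathbb{R} \times \mathbb{T} \to \mathbb{R}, \qquad \tilde f(t,\theta) = \varphi(t) = \exp(t/\tau), \tag{3.7}$$

we observe that the convex function $f$ defined in (3.1) satisfies:

$$f(x,y) = \begin{cases} (\tilde f \circ M^{-1})(x,y), & \text{if } (x,y) \neq \mathcal{O}, \\ 0, & \text{if } (x,y) = \mathcal{O}. \end{cases} \tag{3.8}$$

With the next couple of lemmas we show that the function $f$, apart from being convex, enjoys several other good properties.

Lemma 3.2 (Properties of the convex function). Let $f : \mathbb{R}^2 \to [0, +\infty)$ be the convex function defined by (3.6)–(3.8) for $0 < \tau < 1/10$. Then:

(i). $f$ is strictly positive on $\mathbb{R}^2 \setminus \{\mathcal{O}\}$ with $f(\mathcal{O}) = 0$.

(ii). For all $(x,y) \in \mathbb{R}^2$, it holds

$$\Big(\frac{1}{\sqrt{2}}\Big)^{1/\tau}\|(x,y)\|^{1/\tau} \le f(x,y) \le \|(x,y)\|^{1/\tau}. \tag{3.9}$$

In particular, $f$ is coercive.

(iii). $f$ is real analytic on $\mathbb{R}^2 \setminus \{\mathcal{O}\}$ and $f \in C^1(\mathbb{R}^2)$.

(iv). $f$ satisfies the Łojasiewicz inequality (1.1) with $\vartheta = 1 - \tau$, $c = \tau/\sqrt{2}$, $a \equiv \mathcal{O}$ and $f(\mathcal{O}) = 0$, that is,

$$\|\nabla f(x,y)\| \ge \Big(\frac{\tau}{\sqrt{2}}\Big) f(x,y)^{1-\tau} \quad \text{for all } (x,y) \in \mathbb{R}^2. \tag{3.10}$$

Proof. (i). It is straightforward from the definition of $f$ in (3.1) and the choice of $\varphi$.

(ii). From Lemma 2.1, for every $(x,y) \in \mathbb{R}^2 \setminus \{\mathcal{O}\}$, there exists a unique $t \in \mathbb{R}$ such that $(x,y) \in \mathcal{E}(t)$, and we have

$$\frac{x^2 + y^2}{a(t)^2} \le \frac{1}{a(t)^2}(x\cos t + y\sin t)^2 + \frac{1}{b(t)^2}(-x\sin t + y\cos t)^2 = 1 \le \frac{x^2 + y^2}{b(t)^2},$$

whence

$$e^t = b(t) \le \|(x,y)\| \le a(t) = \sqrt{2}\,e^t.$$

We deduce easily that

$$2^{-1/(2\tau)}\|(x,y)\|^{1/\tau} \le f(x,y) = \varphi(t) = e^{t/\tau} \le \|(x,y)\|^{1/\tau}.$$

(iii). It follows from (3.1) that $f = \varphi \circ p \circ M^{-1}$ on $\mathbb{R}^2 \setminus \{\mathcal{O}\}$, where $p : \mathbb{R} \times \mathbb{T} \to \mathbb{R}$ with $p(t,\theta) = t$. By Lemma 2.1, the map $M : \mathbb{R} \times \mathbb{T} \to \mathbb{R}^2 \setminus \{\mathcal{O}\}$ given in (3.6) is a real analytic diffeomorphism. Since $p$ and $\varphi$ are analytic, the first part of the assertion follows. In particular, the function $f$ is $C^\infty$-smooth on $\mathbb{R}^2 \setminus \{\mathcal{O}\}$.

Since $1/\tau > 1$, the function $(x,y) \mapsto \|(x,y)\|^{1/\tau}$ is $C^1$ over $\mathbb{R}^2$ and (3.9) yields that $f$ is differentiable at $\mathcal{O}$ with $\nabla f(\mathcal{O}) = 0$. Therefore $f$ is differentiable everywhere in $\mathbb{R}^2$ and, since it is convex, it is $C^1$ (see for instance [26, p. 20]).

(iv). Since $S := \operatorname{argmin} f = \{\mathcal{O}\}$, we have $\operatorname{dist}_S(M) = \|M\|$ for all $M = (x,y) \in \mathbb{R}^2$. Therefore, the first inequality in (3.9) can be written

$$f(M) \ge m(\operatorname{dist}_S(M)) \quad \text{for all } M \in \mathbb{R}^2, \quad \text{where } m(r) = 2^{-1/(2\tau)}\,r^{1/\tau}.$$

Since

$$\frac{m^{-1}(s)}{s} = \sqrt{2}\,s^{\tau - 1}$$

is integrable near $0$, we deduce from [3, Theorem 30] that the KŁ-inequality

$$\|\nabla(\psi \circ f)(M)\| \ge 1$$

holds for all $M \in [f > 0] := \mathbb{R}^2 \setminus \{\mathcal{O}\}$, where

$$\psi(s) = \int_0^s \frac{m^{-1}(\sigma)}{\sigma}\,d\sigma = \frac{\sqrt{2}}{\tau}\,s^{\tau}.$$

A straightforward calculation shows that (3.10) holds.

Lemma 3.3 ($C^k$-smoothness of the convex function). Let $f$ be the convex function defined by (3.7)–(3.8) for $0 < \tau < 1/10$. Let $k \in \mathbb{N}$ be the biggest integer such that $k < \frac{1}{\tau}$. Then $f \in C^k(\mathbb{R}^2)$ and $f \notin C^{k+1}(\mathbb{R}^2)$.

Proof. Recalling that $f$ is real analytic in $\mathbb{R}^2 \setminus \{\mathcal{O}\}$ with $f(\mathcal{O}) = 0$ and $\nabla f(\mathcal{O}) = 0$, in order to prove that $f$ is $C^k$, it is sufficient to show that all the partial derivatives

$$\frac{\partial^{l_1 + l_2} f}{\partial x^{l_1}\partial y^{l_2}}, \qquad l_1 + l_2 \le k, \tag{3.11}$$

which exist in $\mathbb{R}^2 \setminus \{\mathcal{O}\}$, converge to $0$ at $\mathcal{O}$. To this end, it is more convenient to start by computing the partial derivatives of $\tilde f$ defined in (3.7). We have

$$\tilde f(t,\theta) := f(M(t,\theta)) = e^{t/\tau} = f(x,y) \quad \text{for } (x,y) = M(t,\theta) = (X(t,\theta), Y(t,\theta)),$$

and by differentiation we obtain

$$\begin{pmatrix} \frac{\partial \tilde f}{\partial t} \\ \frac{\partial \tilde f}{\partial \theta} \end{pmatrix} = \begin{pmatrix} \frac{1}{\tau}e^{t/\tau} \\ 0 \end{pmatrix} = \begin{pmatrix} \frac{\partial X}{\partial t} & \frac{\partial Y}{\partial t} \\ \frac{\partial X}{\partial \theta} & \frac{\partial Y}{\partial \theta} \end{pmatrix} \begin{pmatrix} \frac{\partial f}{\partial x} \\ \frac{\partial f}{\partial y} \end{pmatrix}. \tag{3.12}$$

We can compute explicitly the partial derivatives of $X$ and $Y$, see (3.6), to obtain

$$\frac{\partial X}{\partial t},\ \frac{\partial Y}{\partial t},\ \frac{\partial X}{\partial \theta},\ \frac{\partial Y}{\partial \theta} = e^t P(t,\theta),$$

where $P(t,\theta)$ denotes generically a smooth periodic (hence bounded) function with respect to $t$ and $\theta$. More generally, in what follows, $P_{n,m}(t,\theta)$ (respectively $B_{n,m}(t,\theta)$) denotes an $n \times m$ matrix whose coefficients are smooth and periodic with respect to $t$ and $\theta$ (respectively bounded in $(-\infty, 0] \times \mathbb{R}$). It follows that

$$\begin{pmatrix} \frac{\partial f}{\partial x} \\ \frac{\partial f}{\partial y} \end{pmatrix} = \frac{1}{\frac{\partial X}{\partial t}\frac{\partial Y}{\partial \theta} - \frac{\partial Y}{\partial t}\frac{\partial X}{\partial \theta}} \begin{pmatrix} \frac{\partial Y}{\partial \theta} & -\frac{\partial Y}{\partial t} \\ -\frac{\partial X}{\partial \theta} & \frac{\partial X}{\partial t} \end{pmatrix} \begin{pmatrix} \frac{1}{\tau}e^{t/\tau} \\ 0 \end{pmatrix}.$$

Since

$$0 < e^{2t}\big(\sqrt{2} - \tfrac{1}{2}\big) \le \frac{\partial X}{\partial t}\frac{\partial Y}{\partial \theta} - \frac{\partial Y}{\partial t}\frac{\partial X}{\partial \theta} = e^{2t}\big(\sqrt{2} + \cos\theta\sin\theta\big) \le e^{2t}\big(\sqrt{2} + \tfrac{1}{2}\big),$$

we obtain

$$\begin{pmatrix} \frac{\partial f}{\partial x} \\ \frac{\partial f}{\partial y} \end{pmatrix} = e^{(\frac{1}{\tau} - 1)t}\,P_{2,1}(t,\theta), \tag{3.13}$$

from which we infer that $\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \to 0$ as $(x,y) \to \mathcal{O}$, or equivalently as $t \to -\infty$, since $\frac{1}{\tau} > 1$. We then recover the fact that $f$ is $C^1$, with $\nabla f(\mathcal{O}) = (0,0)$.

Let us now show that $f$ is $C^2$ (when $\frac{1}{\tau} > 2$). Differentiating (3.12) once more, we obtain

$$\begin{pmatrix} \frac{\partial^2 \tilde f}{\partial t^2} \\ \frac{\partial^2 \tilde f}{\partial t\,\partial \theta} \\ \frac{\partial^2 \tilde f}{\partial \theta^2} \end{pmatrix} = \begin{pmatrix} \frac{1}{\tau^2}e^{t/\tau} \\ 0 \\ 0 \end{pmatrix} = e^{2t}P_{3,3}(t,\theta) \begin{pmatrix} \frac{\partial^2 f}{\partial x^2} \\ \frac{\partial^2 f}{\partial x\,\partial y} \\ \frac{\partial^2 f}{\partial y^2} \end{pmatrix} + e^{t}P_{3,2}(t,\theta) \begin{pmatrix} \frac{\partial f}{\partial x} \\ \frac{\partial f}{\partial y} \end{pmatrix}, \tag{3.14}$$

where the coefficients of $e^{2t}P_{3,3}(t,\theta)$ are of the form $Z_1 Z_2$, with $Z_1, Z_2 \in D := \big\{ \frac{\partial X}{\partial t}, \frac{\partial Y}{\partial t}, \frac{\partial X}{\partial \theta}, \frac{\partial Y}{\partial \theta} \big\}$, and the coefficients of $e^{t}P_{3,2}(t,\theta)$ are second derivatives of $X$, $Y$. The matrix $P_{3,3}(t,\theta)$ is invertible since $(t,\theta) \in \mathbb{R} \times \mathbb{T} \mapsto M(t,\theta) := (x,y) \in \mathbb{R}^2 \setminus \{\mathcal{O}\}$ is an analytic diffeomorphism. Finally, we get

$$\begin{pmatrix} \frac{\partial^2 f}{\partial x^2} \\ \frac{\partial^2 f}{\partial x\,\partial y} \\ \frac{\partial^2 f}{\partial y^2} \end{pmatrix} = e^{(\frac{1}{\tau} - 2)t}\,P_{3,1}(t,\theta) + e^{(\frac{1}{\tau} - 1)t}\,B_{3,1}(t,\theta),$$

which proves that the second derivatives of $f$ converge to $0$ as $(x,y) \to \mathcal{O}$ if $\frac{1}{\tau} > 2$. Therefore $f$ is $C^2$ with $\nabla^2 f(\mathcal{O}) = 0_{2 \times 2}$.

Continuing along the same lines, when differentiating $l$ times, the invertible matrix in front of the $l$-th order derivatives of $f$ has coefficients of the form $Z_1 Z_2 \cdots Z_l$ with $Z_1, \dots, Z_l \in D$ and, after tedious computations, we obtain

$$\begin{pmatrix} \frac{\partial^l f}{\partial x^l} \\ \vdots \\ \frac{\partial^l f}{\partial x^{l-i}\partial y^i} \\ \vdots \\ \frac{\partial^l f}{\partial y^l} \end{pmatrix} = e^{(\frac{1}{\tau} - l)t}\,P_{l+1,1}(t,\theta) + e^{(\frac{1}{\tau} - (l-1))t}\,B_{l+1,1}(t,\theta), \tag{3.15}$$

which converges to $0$ as $(x,y) \to \mathcal{O}$ as long as $\frac{1}{\tau} > l$. Therefore $f$ is $C^l$ and all the $l$-th order derivatives of $f$ are zero at $\mathcal{O}$, and we conclude that $f \in C^k(\mathbb{R}^2)$, where $k$ is the biggest integer such that $k < \frac{1}{\tau}$.

Let us now assume, towards a contradiction, that $f$ is $C^{k+1}$. Then we can write a Taylor expansion of $f$ up to the order $k+1$ at $\mathcal{O}$. Since $\nabla^l f(\mathcal{O}) = 0$ for $l \le k$, we obtain that

$$f(x,y) = O\big(\|(x,y)\|^{k+1}\big) \quad \text{in a neighborhood of } \mathcal{O}, \tag{3.16}$$

where $O(r^{k+1})/r^{k+1}$ is bounded near $0$. If $\frac{1}{\tau} \notin \mathbb{N}$, then $k + 1 > \frac{1}{\tau}$, and we obtain a straightforward contradiction with the first inequality in (3.9). If now $k + 1 = \frac{1}{\tau} \in \mathbb{N}$, then (3.16) no longer contradicts (3.9). But writing (3.15) with $l = k + 1$, we get

$$\begin{pmatrix} \frac{\partial^{k+1} f}{\partial x^{k+1}} \\ \vdots \\ \frac{\partial^{k+1} f}{\partial y^{k+1}} \end{pmatrix} = P_{k+2,1}(t,\theta) + e^{t}\,B_{k+2,1}(t,\theta).$$

The second term above converges to zero as $t \to -\infty$, or equivalently as $(x,y) \to \mathcal{O}$, but $P_{k+2,1}(t,\theta)$ is a periodic nonconstant matrix with respect to $t$ and $\theta$, so it cannot converge as $t \to -\infty$, contradicting our assumption. This ends the proof.

4 Oscillating gradient trajectories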

Suppose that $f$ is the convex function defined in the previous section, see (3.1)–(3.2), and consider the ordinary differential equation for the gradient orbits:

$$\begin{cases} \gamma'(t) = \nabla f(\gamma(t)), & t \in \mathbb{R}, \\ \gamma(0) = \gamma_0 \in \mathbb{R}^2 \setminus \{\mathcal{O}\}. \end{cases} \tag{4.1}$$

Since $f$ is convex, analytic in $\mathbb{R}^2 \setminus \{\mathcal{O}\}$ and coercive with a unique minimum at $\mathcal{O}$, there exists a unique maximal solution $\gamma$ on $(-\infty, T)$, $T \le +\infty$, with

$$\lim_{t \to T}\|\gamma(t)\| = +\infty \quad \text{and} \quad \lim_{t \to -\infty}\gamma(t) = \mathcal{O}.$$

In fact, finding gradient orbits is a geometric problem. We seek the unique curve $\gamma$ passing through $\gamma_0$ which is orthogonal to the level sets of $f$. It is convenient to parametrize $\gamma$ as

$$\gamma(s) = M(t(s), \theta(s)) = \big( X(t(s),\theta(s)),\ Y(t(s),\theta(s)) \big), \quad s \in \mathbb{R}, \tag{4.2}$$

using the notation (2.3)–(2.4). Under this parametrization $\gamma(s) \in \mathcal{E}(t(s))$ for every $s \in \mathbb{R}$, and $\gamma'(s)$ is a normal vector at $\gamma(s)$ to the (convex) sublevel set $[f \le f(\gamma(s))] = \operatorname{conv}\mathcal{E}(t(s))$. Therefore:

$$\gamma'(s) \perp \partial_\theta M(t(s), \theta(s)) \quad \text{for all } s \in \mathbb{R}. \tag{4.3}$$

We define the rotation angle $s \mapsto \alpha(s)$ as the angle between the $x$-axis and the secant $\frac{\gamma(s)}{\|\gamma(s)\|}$ (spherical part of the orbit), varying in a continuous way. Therefore

$$\cos\alpha(s) = \frac{X(t,\theta)}{\sqrt{X(t,\theta)^2 + Y(t,\theta)^2}}, \qquad \sin\alpha(s) = \frac{Y(t,\theta)}{\sqrt{X(t,\theta)^2 + Y(t,\theta)^2}}.$$

In particular, according to the notation used in (2.3)–(2.5), if $\phi(s)$ is the angle in polar coordinates of the point $m(t,\theta)$, then we have (see Figure 1):

$$\alpha(s) = t(s) + \phi(s) \quad \text{for all } s \in \mathbb{R}.$$

Lemma 4.1 (Spiraling around the origin). Let $f$ be the convex function defined in (3.1) under the assumption (3.2) and let $s \mapsto \gamma(s)$ be a maximal orbit of the convex foliation $(\mathcal{E}(t))_{t \in \mathbb{R}}$. Then the rotation angle $s \mapsto \alpha(s)$ satisfies

$$\lim_{s \to \pm\infty}\alpha(s) = \pm\infty. \tag{4.4}$$

See Figure 3 for a generic numerical simulation of the maximal orbit of the function $f$ associated with the convex foliation of Figure 2.
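A simulation in this spirit can be sketched in a few lines of Python (an illustration, not the authors' code). It assumes the choice $a(t) = \sqrt{2}e^t$, $b(t) = e^t$ of Lemma 3.1 and uses the level parameter $t$ itself as the curve parameter. Writing $M = e^t N$, the orthogonality condition (4.3) becomes $d\theta/dt = -\langle N + \partial_t N, \partial_\theta N\rangle / \|\partial_\theta N\|^2$, which is integrated with a classical Runge-Kutta scheme while the polar angle of the orbit is unwrapped. The names `frame`, `dtheta_dt`, `turns` and the step count are assumptions of the sketch.

```python
import math

SQRT2 = math.sqrt(2.0)

def frame(t, th):
    """Return N, dN/dt, dN/dtheta, where M(t, theta) = e^t * N(t, theta)."""
    ct, st, c, s = math.cos(t), math.sin(t), math.cos(th), math.sin(th)
    N = (SQRT2 * ct * c - st * s, SQRT2 * st * c + ct * s)
    Nt = (-SQRT2 * st * c - ct * s, SQRT2 * ct * c - st * s)
    Nth = (-SQRT2 * ct * s - st * c, -SQRT2 * st * s + ct * c)
    return N, Nt, Nth

def dtheta_dt(t, th):
    """Slope of theta along the orbit, from <dM/dt + theta' dM/dtheta, dM/dtheta> = 0."""
    N, Nt, Nth = frame(t, th)
    num = (N[0] + Nt[0]) * Nth[0] + (N[1] + Nt[1]) * Nth[1]
    den = Nth[0] ** 2 + Nth[1] ** 2
    return -num / den

def turns(t_end, theta0=0.0, steps=20000):
    """Signed number of turns around the origin between t = 0 and t = t_end."""
    h = t_end / steps
    t, th = 0.0, theta0
    N, _, _ = frame(t, th)
    prev = math.atan2(N[1], N[0])
    total = 0.0
    for _ in range(steps):
        # Classical fourth-order Runge-Kutta step for theta(t).
        k1 = dtheta_dt(t, th)
        k2 = dtheta_dt(t + h / 2, th + h * k1 / 2)
        k3 = dtheta_dt(t + h / 2, th + h * k2 / 2)
        k4 = dtheta_dt(t + h, th + h * k3)
        th += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
        # The polar angle of M equals that of N; unwrap it step by step.
        N, _, _ = frame(t, th)
        ang = math.atan2(N[1], N[0])
        total += (ang - prev + math.pi) % (2 * math.pi) - math.pi
        prev = ang
    return total / (2 * math.pi)

if __name__ == "__main__":
    print("turns while t goes 0 -> +100:", turns(100.0))
    print("turns while t goes 0 -> -100:", turns(-100.0))
```

Because the winding is slow compared with the growth of $e^t$, the orbit needs a very large change of scale for each turn, which is why the sketch tracks angles only and never forms $e^t$.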
Figure 3: Gradient orbit $\gamma(s)$ with initial point $\gamma(0) = (2, […])$.

Proof. We use the parametrization given by (4.2). Since

$$\lim_{s \to +\infty}\|\gamma(s)\| = +\infty \quad \text{and} \quad \lim_{s \to -\infty}\gamma(s) = \mathcal{O},$$

we can assume that the function $s \mapsto t(s)$ satisfies

$$t'(s) > 0 \quad \text{and} \quad \lim_{s \to \pm\infty} t(s) = \pm\infty. \tag{4.5}$$

The goal is to compute $\alpha(s)$ using the orthogonality condition (4.3), which is equivalent to

$$\big\langle \gamma'(s),\ \partial_\theta M(t(s),\theta(s)) \big\rangle = 0 \quad \text{for all } s \in \mathbb{R}. \tag{4.6}$$

Using the notation of Section 2, we have

$$\gamma'(s) = \frac{d}{ds}M(t(s),\theta(s)) = t'\,\partial_t(Rm) + \theta'\,\partial_\theta(Rm) = t'\,(R'm + R\,\partial_t m) + \theta' R\,\partial_\theta m$$

and $\partial_\theta M = \partial_\theta(Rm) = R\,\partial_\theta m$. It follows that

$$\begin{aligned} \big\langle \gamma'(s), \partial_\theta M \big\rangle &= t'\,\langle R'm, R\,\partial_\theta m \rangle + t'\,\langle R\,\partial_t m, R\,\partial_\theta m \rangle + \theta'\,\langle R\,\partial_\theta m, R\,\partial_\theta m \rangle \\ &= t'\,\langle R(\tfrac{\pi}{2})m, \partial_\theta m \rangle + t'\,\langle \partial_t m, \partial_\theta m \rangle + \theta'\,\|\partial_\theta m\|^2 \\ &= t'\big( ab + (bb' - aa')\cos\theta\sin\theta \big) + \theta'\big( a^2\sin^2\theta + b^2\cos^2\theta \big). \end{aligned}$$

By (4.3), we have $\langle \gamma'(s), \partial_\theta M \rangle = 0$, and after the substitution $a(t) = \sqrt{2}e^t$ and $b(t) = e^t$ we get

$$t'\,e^{2t}\big(\sqrt{2} - \cos\theta\sin\theta\big) + \theta'\,e^{2t}\big(1 + \sin^2\theta\big) = 0,$$

whence we deduce the following relation between $t(s)$ and $\theta(s)$:

$$t'(s) = -\frac{1 + \sin^2\theta(s)}{\sqrt{2} - \cos\theta(s)\sin\theta(s)}\,\theta'(s). \tag{4.7}$$

Since for every $\theta \in \mathbb{R}$ we have

$$0 < \frac{1}{\sqrt{2} + \frac{1}{2}} \le \frac{1 + \sin^2\theta}{\sqrt{2} - \cos\theta\sin\theta} \le \frac{2}{\sqrt{2} - \frac{1}{2}},$$

we get

$$-\frac{1}{\sqrt{2} + \frac{1}{2}}\,\theta'(s) \le t'(s) \le -\frac{2}{\sqrt{2} - \frac{1}{2}}\,\theta'(s).$$

Therefore, from (4.5) we deduce

$$\theta'(s) < 0, \qquad \theta(s) \to +\infty \ \text{ as } s \to -\infty, \qquad \theta(s) \to -\infty \ \text{ as } s \to +\infty. \tag{4.8}$$

Next, we establish the relation between $\theta(s)$ and $\phi(s)$, see Figure 1. We have

$$\cos\phi = \frac{a\cos\theta}{\sqrt{a^2\cos^2\theta + b^2\sin^2\theta}} = \frac{\sqrt{2}\cos\theta}{\sqrt{2\cos^2\theta + \sin^2\theta}}, \qquad \sin\phi = \frac{b\sin\theta}{\sqrt{a^2\cos^2\theta + b^2\sin^2\theta}} = \frac{\sin\theta}{\sqrt{2\cos^2\theta + \sin^2\theta}}.$$

Differentiating $\cos\phi$ and plugging the result into the second expression, we end up with

$$\phi' = \frac{\sqrt{2}}{1 + \cos^2\theta}\,\theta'. \tag{4.9}$$

Assembling (4.7) and (4.9), we obtain

$$\alpha' = t' + \phi' = \left( \frac{\sqrt{2}}{1 + \cos^2\theta} - \frac{1 + \sin^2\theta}{\sqrt{2} - \cos\theta\sin\theta} \right)\theta' =: h(\theta)\,\theta'. \tag{4.10}$$

The function $h$ is analytic and $2\pi$-periodic, see Figure 4. We can expand it in Fourier series and integrate (4.10) to obtain

$$\alpha(s) = a_0\,\theta(s) + O(1), \tag{4.11}$$

where $O(1)$ is a bounded function and

$$a_0 = \frac{1}{2\pi}\int_0^{2\pi} h(\theta)\,d\theta \simeq -0.134 < 0.$$

We finally conclude from (4.11) and (4.8) that (4.4) holds.
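The sign of the mean value $a_0$ is the crux of the argument, so it is worth checking independently. The following Python sketch (an illustration under stated assumptions, not taken from the paper) approximates $a_0$ with an equispaced rule over one period and compares it with the closed form $1 - 3/\sqrt{7}$. That closed form is our own evaluation, obtained from the standard integrals $\int_0^{2\pi} d\theta/(1+\cos^2\theta) = 2\pi/\sqrt{2}$ and $\int_0^{2\pi} du/(2\sqrt{2} - \sin u) = 2\pi/\sqrt{7}$; it does not appear in the source.

```python
import math

SQRT2 = math.sqrt(2.0)

def h(theta):
    """The function h of (4.10)."""
    c, s = math.cos(theta), math.sin(theta)
    return SQRT2 / (1.0 + c * c) - (1.0 + s * s) / (SQRT2 - c * s)

def mean_h(n=100000):
    """Average of h over [0, 2*pi) with n equispaced samples.

    For a smooth periodic integrand this rule converges very quickly.
    """
    step = 2.0 * math.pi / n
    return sum(h(i * step) for i in range(n)) / n

# Closed form derived for this sketch (assumption, not stated in the source).
CLOSED_FORM = 1.0 - 3.0 / math.sqrt(7.0)

if __name__ == "__main__":
    print("numerical a0 :", mean_h())
    print("1 - 3/sqrt(7):", CLOSED_FORM)
```

Since $h$ takes both signs on each period, the rotation angle is not monotone; only its average drift, governed by $a_0$, has a definite sign.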

Proof of Theorem 1.1. Consider the convex foliation by ellipses $\{\mathcal{E}(t)\}_{t \in \mathbb{R}}$ given by Lemma 2.1. Let $k \ge 1$ and let $f$ be the convex function defined by Lemma 3.1 for $0 < \tau < \min\{1/10, 1/k\}$. Then, by Lemma 3.2, the function $f$ is coercive, has its unique minimum at the origin $\mathcal{O}$, is real analytic in $\mathbb{R}^2 \setminus \{\mathcal{O}\}$ and satisfies the Łojasiewicz inequality (1.1). Further, Lemma 3.3 ensures that $f$ is $C^k$-smooth. Finally, Lemma 4.1 asserts that all nontrivial gradient orbits spiral infinitely many times both near the origin (bounded part) and at infinity. $\square$

Acknowledgement.

This work was partially supported by the Centre Henri Lebesgue ANR-11-LABX-0020-01 and the grants CMM AFB170001, ECOS-Sud/ANID C18E04 and FONDECYT 1211217. A major part of this work was done during a research visit of the first author to INSA Rennes. This author is indebted to his hosts for their hospitality.

Figure 4: Plot of $h(\theta) = \dfrac{\sqrt{2}}{1 + \cos^2\theta} - \dfrac{1 + \sin^2\theta}{\sqrt{2} - \cos\theta\sin\theta}$.

References

[1] V. I. Arnold. Some open problems in the theory of singularities. In Singularities, Part 1 (Arcata, Calif., 1981), volume 40 of Proc. Sympos. Pure Math., pages 57–69. Amer. Math. Soc., Providence, R.I., 1983. Translated from the Russian.

[2] J. Bolte and E. Pauwels. Curiosities and counterexamples in smooth convex optimization. TSE Working Paper, n. 20-1080, 2020.

[3] J. Bolte, A. Daniilidis, O. Ley, and L. Mazet. Characterizations of Łojasiewicz inequalities: subgradient flows, talweg, convexity. Trans. Amer. Math. Soc., 362(6):3319–3363, 2010.

[4] F. Cano, R. Moussu, and F. Sanz. Nonoscillating projections for trajectories of vector fields. J. Dyn. Control Syst. […]

[5] […] Math. Oper. Res., 5(1):120–125, 1980.

[6] D. D'Acunto and V. Grandjean. On gradient at infinity of semialgebraic functions. Ann. Polon. Math., 87:39–49, 2005.

[7] A. Daniilidis, G. David, E. Durand-Cartagena, and A. Lemenant. Rectifiability of self-contracted curves in the Euclidean space and applications. J. Geom. Anal., 25(2):1211–1239, 2015.

[8] A. Daniilidis, O. Ley, and S. Sabourau. Asymptotic behaviour of self-contracted planar curves and gradient orbits of convex functions. J. Math. Pures Appl. (9), 94(2):183–199, 2010.

[9] B. de Finetti. Sulle stratificazioni convesse. Ann. Mat. Pura Appl. (4), 30:173–183, 1949.

[10] M. Fokin. Limit sets of trajectories of dynamical systems of gradient type (Russian). Mat. Sb. (N.S.), 606(4):502–514, 1981.

[11] V. Grandjean. On the limit set at infinity of a gradient trajectory of a semialgebraic function. J. Differential Equations, 233(1):22–41, 2007.

[12] V. Grandjean and F. Sanz. On restricted analytic gradients on analytic isolated surface singularities. J. Differential Equations, 255(7):1684–1708, 2013.

[13] F. Ichikawa. Thom's conjecture on singularities of gradient vector fields. Kodai Math. J., 15(1):134–140, 1992.

[14] Z. Ji and M. Telgarsky. Directional convergence and alignment in deep learning. Preprint, 2020.

[15] S. G. Krantz and H. R. Parks. A primer of real analytic functions. Birkhäuser Advanced Texts: Basler Lehrbücher. [Birkhäuser Advanced Texts: Basel Textbooks]. Birkhäuser Boston, Inc., Boston, MA, second edition, 2002.

[16] K. Kurdyka, T. Mostowski, and A. Parusiński. Proof of the gradient conjecture of R. Thom. Ann. of Math. (2), 152(3):763–792, 2000.

[17] K. Kurdyka and A. Parusiński. Quasi-convex decomposition in o-minimal structures. Application to the gradient conjecture. In

Singularity theory and its applications , volume 43of

Adv. Stud. Pure Math. , pages 137–177. Math. Soc. Japan, Tokyo, 2006.[18] H. X. Lin. Sur la structure des champs de gradients de fonctions analytiques r´eelles.

PhDThesis, Universit´e Paris VII , 1992.[19] S. (cid:32)Lojasiewicz. Une propri´et´e topologique des sous-ensembles analytiques r´eels. In

Les´Equations aux D´eriv´ees Partielles (Paris, 1962) , pages 87–89. ´Editions du Centre Nationalde la Recherche Scientiﬁque, Paris, 1963.[20] S. (cid:32)Lojasiewicz. Sur les trajectoires du gradient d’une fonction analytique. In

Geometryseminars, 1982–1983 (Bologna, 1982/1983) , pages 115–117. Univ. Stud. Bologna, Bologna,1984.[21] K. Lyu and J. Li. Gradient descent maximizes the margin of homogeneous neural networks.

Preprint , 2019.[22] P. Manselli and C. Pucci. Maximum length of steepest descent curves for quasi-convexfunctions.

Geom. Dedicata , 38(2):211–227, 1991.[23] C. Miller. Exponentiation is hard to avoid.

Proc. Amer. Math. Soc. , 122(1):257–259, 1994.1724] R. Moussu. Sur la dynamique des gradients. Existence de vari´et´es invariantes (French).

Math. Ann.

Geometric theory of dynamical systems . Springer-Verlag, NewYork, 1982. An introduction, Translated from the Portuguese by A. K. Manning.[26] R. R. Phelps.

Convex functions, monotone operators and diﬀerentiability , volume 1364 of

Lecture Notes in Mathematics . Springer-Verlag, Berlin, 1989.[27] F. Sanz. Non-oscillating solutions of analytic gradient vector ﬁelds.

Ann. Inst. Fourier(Grenoble) , 48(4):1045–1067, 1998.[28] F. Sanz. Balanced coordinates for spiraling dynamics.

Qual. Theory Dyn. Syst. , 3(1):181–226, 2002.[29] R. Thom. Probl`emes rencontr´es dans mon parcours math´ematique: un bilan.

Inst. Hautes´Etudes Sci. Publ. Math. , (70):199–214 (1990), 1989.[30] L. van den Dries.

Tame topology and o-minimal structures , volume 248 of

London Mathe-matical Society Lecture Note Series . Cambridge University Press, Cambridge, 1998.[31] L. van den Dries and Chris Miller. Geometric categories and o-minimal structures.

DukeMath. J. , 84(2):497–540, 1996.Aris DaniilidisDIM–CMM, CNRS IRL 2807Beauchef 851, FCFM, Universidad de ChileE-mail:

Research supported by the grants:CMM AFB170001, ECOS-ANID C18E04, Fondecyt 1211217 (Chile),PGC2018-097960-B-C22 (Spain and EU).Mounir Haddou, Olivier LeyUniv Rennes, INSA, CNRS, IRMAR - UMR 6625, F-35000 Rennes, FranceE-mail: { mounir.haddou, olivier.ley } @insa-rennes.frhttp:// { haddou, ley } .perso.math.cnrs.fr/.perso.math.cnrs.fr/