A regularity method for lower bounds on the Lyapunov exponent for stochastic differential equations

Abstract

We put forward a new method for obtaining quantitative lower bounds on the top Lyapunov exponent of stochastic differential equations (SDEs). Our method combines (i) an (apparently new) identity connecting the top Lyapunov exponent to a Fisher information-like functional of the stationary density of the Markov process tracking tangent directions with (ii) a novel, quantitative version of Hörmander's hypoelliptic regularity theory in an $L^1$ framework which estimates this (degenerate) Fisher information from below by a $W^{s,1}_{\mathrm{loc}}$ Sobolev norm. This method is applicable to a wide range of systems beyond the reach of currently existing mathematically rigorous methods. As an initial application, we prove the positivity of the top Lyapunov exponent for a class of weakly-dissipative, weakly forced SDE; in this paper we prove that this class includes the Lorenz 96 model in any dimension, provided the additive stochastic driving is applied to any consecutive pair of modes.



Jacob Bedrossian ∗ Alex Blumenthal † Sam Punshon-Smith ‡ August 3, 2020


Mathematics subject classification. Primary: 37H15, 35H10. Secondary: 37D25, 58J65, 35B65.

∗ Department of Mathematics, University of Maryland, College Park, MD 20742, USA. J.B. was supported by National Science Foundation CAREER grant DMS-1552826 and National Science Foundation RNMS

† School of Mathematics, Georgia Institute of Technology, Atlanta, GA 30332, USA. A.B. was supported by National Science Foundation grant DMS-2009431.

‡ Division of Applied Mathematics, Brown University, Providence, RI 02906, USA. This material was based upon work supported by the National Science Foundation under Award No. DMS-1803481.

Many nonlinear systems of physical origin exhibit chaotic behavior when subjected to external forcing and weak damping. Here, “chaos” refers to sensitivity with respect to the initial conditions, and is often measured by the Lyapunov exponent, a measure of the asymptotic exponential rate at which nearby trajectories diverge; positivity of the Lyapunov exponent is a well-known hallmark of chaos. Despite the ubiquity of chaos in systems of physical interest, and in contrast with the rather well-developed abstract theory for the description of chaotic states and associated statistical properties, it is notoriously challenging to verify, for a given system, whether or not chaotic behavior is actually present in the above sense.

The purpose of this paper is to put forward a method for providing (at least “preliminary”) quantitative lower bounds for the Lyapunov exponents of weakly-damped, weakly-driven SDE. (“Preliminary” in the sense that the lower bounds we provide are expected to be sub-optimal for most systems of physical interest.) Our method combines two new ingredients: (i) an apparently new identity connecting the largest Lyapunov exponent to a Fisher information-type quantity on the stationary statistics of tangent directions; and (ii) a quantitative hypoellipticity argument for showing that this Fisher information-type quantity uniformly controls Sobolev regularity of the tangent-direction stationary statistics, and hence regularity provides a lower bound on the Lyapunov exponent. (More precisely, we provide a lower bound on $n\lambda - 2\lambda_\Sigma$, where $\lambda_\Sigma$ is the sum of the Lyapunov exponents.) Our methods can potentially be interpreted as the beginning of a quantitative and more robust à la Furstenberg theory for SDEs. See Section 1.3 for a more in-depth discussion of the previously existing work and how ours fits in.

As a first application of our methods, we study a class of high-dimensional SDE commonly used as finite dimensional models in fluid mechanics and other fields. In [45], Lorenz put forward the following model, now referred to as Lorenz-96 (L96), for a system of $J$ periodically coupled oscillators $u = (u_1, \cdots, u_J) \in \mathbb{R}^J$, written here with a small damping parameter $\hat\epsilon > 0$ and subjected to stochastic forcing:

$$du_m = \left( (u_{m+1} - u_{m-2})\, u_{m-1} - \hat\epsilon\, u_m \right) dt + q_m \, d\hat W^m_t, \qquad 1 \leq m \leq J, \tag{1.1}$$

where $\{\hat W^m_t\}$ is a collection of one-dimensional independent standard Brownian motions, $\{q_m\}$ are fixed parameters, and the $u_m$ are $J$-periodic in $m$, i.e., $u_{m+kJ} := u_m$. (Note that L96 is distinct from the “butterfly attractor” model, an ODE on $\mathbb{R}^3$, put forward in Lorenz's seminal 1963 work [44]. For this simple 3D model, positivity of Lyapunov exponents for typical initial conditions in a certain parameter regime follows from the well-known computer-assisted proof carried out in [67].)

Since its introduction, L96 has come to be recognized as a prototypical model of chaotic behavior in spatially extended systems such as those arising in fluid mechanics and similar fields, and serves as a remarkably common benchmark for numerical methods adapted to the analysis of chaotic systems (see, e.g., [31, 46, 48–50] and references therein). As the nonlinearity is bilinear, by rescaling $u$ to $\sqrt{\hat\epsilon}\, u(\sqrt{\hat\epsilon}\, t)$ and re-defining $\epsilon := \hat\epsilon^{3/2}$, (1.1) is equivalent to the following system (see Remark 1.2),

$$du_m = \left( (u_{m+1} - u_{m-2})\, u_{m-1} - \epsilon\, u_m \right) dt + \sqrt{\epsilon}\, q_m \, dW^m_t, \qquad 1 \leq m \leq J, \tag{1.2}$$

where $\{W^m_t\}$ are equal in law to $\{\hat W^m_t\}$. In this form, the damping and driving are balanced in the sense that the stationary measures of (1.2) are tight as $\epsilon \to 0$ (see Appendix A) and these stationary measures converge to (absolutely continuous) invariant measures of the deterministic $\epsilon = 0$ problem. Little is known (with mathematical rigor) regarding the dynamics of the $\epsilon = 0$ problem, nor can one carry out a perturbative treatment using the existing à la Furstenberg methods for random dynamics; see Section 1.3 for more discussion.

For (1.2), our methods yield the following theorem.

Theorem 1.1.

Let $\Phi^t_\omega : \mathbb{R}^J \to \mathbb{R}^J$ denote the stochastic flow of diffeomorphisms solving (1.2) for almost every random sample path $\omega$. Assume $J \geq \ldots$ and that $q_1, q_2 \neq 0$. Then, for every $\epsilon > 0$ sufficiently small, the top Lyapunov exponent

$$\lambda^\epsilon = \lim_{t \to \infty} \frac{1}{t} \log |D\Phi^t_\omega(u)|$$

exists, is constant over Leb.-a.e. $u \in \mathbb{R}^J$ and a.e. sample path $\omega$, and satisfies $\lambda^\epsilon / \epsilon \to \infty$ as $\epsilon \to 0$; in particular, $\lambda^\epsilon > 0$ for all $\epsilon$ sufficiently small.

Remarkably, the problem of proving $\lambda^\epsilon > 0$ was previously open in spite of overwhelming numerical evidence to support this [19, 31, 50, 57, 59]. In fact, our results apply, in principle, to any model in a wide class including not only Lorenz-96, but also Galerkin truncations of the Navier-Stokes equations, subject to a suitable hypoellipticity condition (Theorem 1.12 below) which currently remains open for Galerkin NSE; this will be the subject of future work. On the other hand, we remark that the scaling $\epsilon^{-1} \lambda^\epsilon \to \infty$ is likely to be sub-optimal.

Remark 1.2.

Without the Brownian term and when $\hat\epsilon = 0$, equation (1.1) preserves volume on $\mathbb{R}^J$, and so $0 < \hat\epsilon \ll 1$ can be thought of as a weakly dissipative regime (a property shared, e.g., by Galerkin truncations of NSE). However, as we are taking $t \to \infty$, it is not possible to treat (1.1) directly using a perturbation argument in $\hat\epsilon$ since at $\hat\epsilon = 0$ all trajectories diverge as $t \to \infty$.

On the other hand, we can relate the stochastic flow of diffeomorphisms $\hat\Phi^t_{\hat\omega}$ solving the SDE (1.1) with the stochastic flow $\Phi^t_\omega$ solving (1.2) by

$$\Phi^t_\omega(u) = \sqrt{\hat\epsilon}\, \hat\Phi^{\sqrt{\hat\epsilon}\, t}_{\hat\omega}\!\left( u / \sqrt{\hat\epsilon} \right)$$

(where $\omega_t = \hat\epsilon^{-1/4}\, \hat\omega_{\sqrt{\hat\epsilon}\, t}$ is a Brownian self-similar rescaling of the noise path $\hat\omega$, so equality of the two flows is interpreted as equality in probabilistic law). Thus, the Lyapunov exponent $\hat\lambda^{\hat\epsilon}$ of the stochastic flow $\hat\Phi^t_{\hat\omega}$ satisfies $\hat\epsilon^{-1} \hat\lambda^{\hat\epsilon} = \epsilon^{-1} \lambda^\epsilon$, and in particular $\hat\lambda^{\hat\epsilon} > 0$ if and only if $\lambda^\epsilon > 0$.

We first provide our main results relating regularity to the lower bounds on the top Lyapunov exponents for general SDEs. Let $(M, g)$ be a smooth, connected, $n$-dimensional, orientable Riemannian manifold (not necessarily bounded) with no boundary, and consider the stochastic process $x_t \in M$, $t \geq 0$, defined by the (Stratonovich) SDE

$$dx_t = X_0(x_t)\, dt + \sum_{k=1}^r X_k(x_t) \circ dW^k_t, \tag{1.3}$$

where $\{X_k\}_{k=0}^r$ are a family of smooth vector fields (potentially unbounded) on $M$ and $\{W^k\}_{k=1}^r$ are independent standard Wiener processes with respect to a canonical stochastic basis $(\Omega, \mathcal{F}, (\mathcal{F}_t), \mathbf{P})$. Let us list the first of several (relatively mild) standing assumptions to be imposed throughout the paper.

Assumption 1. (i) For each initial data $x \in M$, equation (1.9) has a unique global solution $(x_t)$ with probability 1. The (random) solution maps $x \mapsto x_t =: \Phi^t_\omega(x)$, $t \geq 0$, comprise a (stochastic) flow of $C^r$ diffeomorphisms $(\Phi^t_\omega)$ on $M$, $r \geq \ldots$. Moreover, (ii) $(x_t)$ admits a unique, absolutely continuous stationary probability measure $\mu$ on $M$ for which (iii) we have the integrability condition

$$\mathbf{E} \int_M \left[ \log^+ |D\Phi^t(x)| + \log^+ |D\Phi^t(x)^{-1}| \right] d\mu(x) < \infty.$$

Assumption 1(i) is well-studied and follows from mild conditions on (1.9); see, e.g., [5, 37]. When $M$ is compact, item (ii) follows from a parabolic Hörmander condition on the vector fields $\{X_0, \cdots, X_r\}$ (see Definition 1.7 below). If $M$ is not compact then some additional constraints are needed to avoid drift to infinity. Given (i) and (ii), item (iii) is standard; see, e.g., [33]. Additional discussion and details are given in Section 2.1.

The following standard result is a corollary of the Kingman subadditive ergodic theorem [35] as well as some basic ergodic theory for random dynamical systems [34]. It provides mathematically rigorous justification for the existence of Lyapunov exponents.

Theorem 1.3.

Assume (1.9) satisfies Assumption 1. Then there exist deterministic constants $\lambda$ and $\lambda_\Sigma$, independent of both the random sample $\omega$ as well as $x \in M$, such that for $\mathbf{P} \otimes \mu$ almost every $(\omega, x) \in \Omega \times M$ the following limits hold:

$$\lambda = \lim_{t \to \infty} \frac{1}{t} \log |D\Phi^t_\omega(x)|, \qquad \lambda_\Sigma = \lim_{t \to \infty} \frac{1}{t} \log |\det D\Phi^t_\omega(x)|.$$

The value $\lambda$ is the top Lyapunov exponent; the condition $\lambda > 0$ implies sensitivity with respect to initial conditions as well as local moving-frame saddle-type behavior for (random) trajectories for $\mu$-typical initial $x$ for a.e. random sample $\omega \in \Omega$ (see [10, 73]; see also [5, 34, 43] for emphasis on random dynamics). This abstract smooth ergodic theory leans on the Multiplicative Ergodic Theorem [56, 62, 69]; in brief, this result provides a decomposition of the tangent bundle $TM$ into (random) sub-bundles along which various exponential growth rates (a.k.a. Lyapunov exponents) are realized. Similarly, the value $\lambda_\Sigma$ is the sum Lyapunov exponent and describes the asymptotic exponential rate at which Lebesgue volume is contracted/expanded by the dynamics. For more information, see, e.g., the expositions [72, 74].

The purpose of this paper is to put forward a new method for obtaining lower bounds on $\lambda$. Results are framed in terms of the augmented Markov process $(x_t, v_t)$ tracking a trajectory in phase space $(x_t)$ and the tangent direction

$$v_t := \frac{D\Phi^t(x) v}{|D\Phi^t(x) v|}. \tag{1.4}$$

It is straightforward to see that the top Lyapunov exponent $\lambda$ is connected to Birkhoff sums of the observable $g_\omega(x, v) := \log \| D\Phi^1_\omega(x) v \|$ on the sphere bundle $SM$ (consisting of fibers $S_x M = S^{n-1}(T_x M)$), and so there is a clear connection between $\lambda$ and the “augmented” tangent-direction process $(x_t, v_t)$. Below, we refer to $(w_t) = (x_t, v_t)$ as the projective process on $SM$.

(In Assumption 1(iii), for $a > 0$ we write $\log^+ a := \max\{\log a, 0\}$ for the positive part of $\log$.)

It is not hard to check that $v_t$ solves the SDE

$$dv_t = V_{\nabla X_0(x_t)}(v_t)\, dt + \sum_{k=1}^r V_{\nabla X_k(x_t)}(v_t) \circ dW^k_t,$$

where $\nabla$ denotes the covariant derivative and, for $x \in M$ and $A : T_x M \to T_x M$ linear, the vector field $V_A$ on $S_x M$ is defined by

$$V_A(v) := Av - \langle v, Av \rangle v =: \Pi_v A v.$$
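To make the definition of $V_A$ concrete, the following is a minimal sketch in plain Python (not taken from the paper). It checks numerically that $V_A(v)$ is tangent to the unit sphere at $v$, and integrates $\dot v = V_A(v)$ for a constant matrix $A$. The function names, the choice $A = \mathrm{diag}(1, -1)$, the initial angle, and the explicit Euler scheme with renormalization are all illustrative assumptions.

```python
import math

def matvec(A, v):
    """Matrix-vector product A v for a list-of-rows matrix A."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def dot(a, b):
    """Euclidean inner product of two lists."""
    return sum(x * y for x, y in zip(a, b))

def projective_field(A, v):
    """V_A(v) = A v - <v, A v> v: the component of A v tangent to the unit sphere at v."""
    Av = matvec(A, v)
    s = dot(v, Av)
    return [y - s * x for y, x in zip(Av, v)]

def flow_direction(A, v0, t_final=5.0, dt=1e-3):
    """Integrate dv/dt = V_A(v) by explicit Euler (an assumed, crude scheme),
    renormalizing after each step to remove the small drift off the sphere."""
    v = list(v0)
    for _ in range(int(round(t_final / dt))):
        V = projective_field(A, v)
        v = [x + dt * y for x, y in zip(v, V)]
        norm = math.sqrt(dot(v, v))
        v = [x / norm for x in v]
    return v

A = [[1.0, 0.0], [0.0, -1.0]]            # hypothetical constant matrix
v0 = [math.cos(1.0), math.sin(1.0)]      # unit vector at angle 1 radian
tangency = dot(v0, projective_field(A, v0))  # <v, V_A(v)>, zero up to rounding
v_end = flow_direction(A, v0)                # direction after time t_final
```

For this particular $A$ the angle $\theta$ of $v$ obeys $\dot\theta = -\sin 2\theta$, so the direction aligns with the expanding axis $e_1$, which is what `v_end` should reflect.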

Here, and everywhere below unless specified otherwise, we use the notation $\langle a, b \rangle = g(a, b)$ for $a, b \in T_x M$. The full projective process $(w_t)$ evolves according to

$$dw_t = \tilde X_0(w_t)\, dt + \sum_{k=1}^r \tilde X_k(w_t) \circ dW^k_t. \tag{1.5}$$

Here, for $w = (x, v)$ we regard $T_w SM = T_x M \oplus T_v(S_x M)$ (see Section 2.2) and define $\{\tilde X_k\}_{k=0}^r$ by

$$\tilde X_k(x, v) := \begin{pmatrix} X_k(x) \\ V_{\nabla X_k(x)}(v) \end{pmatrix}.$$

Throughout, we take on the following assumption regarding $(w_t)$.

Assumption 2.

The SDE (1.5) defining the process $(w_t)$ satisfies Assumptions 1(i) and (ii). That is, the SDE defining $(w_t)$ is globally well-posed for a.e. random sample and every initial data; and the Markov process $(w_t)$ admits a unique, absolutely continuous stationary measure $\nu$ on $SM$.

Let $dq$ denote Lebesgue measure on $SM$, let $\nu$ be the stationary measure for the projective process $(x_t, v_t)$, and let $f = \frac{d\nu}{dq}$ denote the stationary density; similarly, let $\mu$ be the stationary measure for $(x_t)$ with density $\rho = \frac{d\mu}{dx}$. Our first main result is a new formula (to our knowledge) connecting the stationary density $f$ to the exponent $\lambda$ through a partial Fisher information-type quantity $FI(f)$ defined by

$$FI(f) := \frac{1}{2} \sum_{k=1}^r \int_{SM} \frac{|\tilde X_k^* f|^2}{f}\, dq.$$

Here, the vector fields $\tilde X_k$ are regarded as first order differential operators, and $\tilde X_k^*$ denotes the formal dual in $L^2(dq)$.

(The distinction between $v_t$ or $-v_t$ is irrelevant for Lyapunov exponents, and so morally we regard $(w_t)$ as evolving on the projective bundle $PM$ consisting of fibers $P_x M = P^{n-1}(T_x M)$, the projectivization of the tangent space $T_x M$. However, in practice we will regard $(w_t)$ as a process on the sphere bundle $SM$.)

Proposition 1.4 (Fisher Information Identity). Let Assumptions 1 and 2 hold. Moreover, assume that (a) the stationary density $f$ satisfies $f \log f \in L^1(dq)$ and (b) $Q \in L^1(\mu)$ and $\tilde Q \in L^1(\nu)$, where $Q, \tilde Q$ are defined by

$$Q(x) := \operatorname{div} X_0(x) + \frac{1}{2} \sum_{k=1}^r X_k \operatorname{div} X_k(x), \qquad \tilde Q(w) := \operatorname{div} \tilde X_0(w) + \frac{1}{2} \sum_{k=1}^r \tilde X_k \operatorname{div} \tilde X_k(w).$$

Then the following identities hold:

$$FI(\rho) = -\int_M Q\, d\mu = -\lambda_\Sigma, \qquad FI(f) = -\int_{SM} \tilde Q\, d\nu = n\lambda - 2\lambda_\Sigma. \tag{1.6}$$

Equivalently, writing $h_x(v) = f(x, v)/\rho(x)$ for the conditional densities on $S_x M$ of $v$ with respect to $x$, we have

$$FI(f) - FI(\rho) = \frac{1}{2} \sum_{k=1}^r \int_M \left( \int_{S_x M} \frac{|(X_k - V^*_{\nabla X_k(x)})\, h_x(v)|^2}{h_x(v)}\, dv \right) d\mu(x) = n\lambda - \lambda_\Sigma. \tag{1.7}$$

Remark 1.5.

In each line of (1.6), the second equality is a version of the famous Furstenberg-Khasminskii formula (see, e.g., [5]) for the Lyapunov exponents of an SDE satisfying Assumptions 1 and 2 (see Lemma 2.4 for more details). What's new here are the first equalities concerning $FI(\rho)$, $FI(f)$. Equation (1.7) is an equivalent formulation highlighting the relation to the natural quantity $n\lambda - \lambda_\Sigma$ and criteria à la Furstenberg for the Lyapunov exponents of stochastic systems. In particular, note that $FI(f) - FI(\rho) \geq 0$, and $n\lambda > \lambda_\Sigma$ if and only if $FI(f) - FI(\rho) > 0$. See Section 1.3 for more information.

Proposition 1.4 is derived in Section 2. In fact, we give two proofs: the first is a combination of the Furstenberg-Khasminskii formula [5, 32] with the Kolmogorov equation

$$\tilde L^* f = \tilde X_0^* f + \frac{1}{2} \sum_{j=1}^r (\tilde X_j^*)^2 f = 0 \tag{1.8}$$

for $f$; the second proof (which we merely sketch, leaving details to the interested reader) connects $FI(f)$ to a certain relative entropy formula for Lyapunov exponents [11, 26] (see also [41]) intimately connected with the Furstenberg-style approach to Lyapunov exponents of random systems. See Sections 1.3 and 2 for additional discussion.

Remark 1.6.

The Fisher information is a fundamental quantity in the theory of statistical inference and information geometry (see [2]). It is typically used to measure the amount of information a parametrized family of laws (e.g., the law of one variable conditioned on the value of another) carries about the inference parameter. In our case, the family of laws is $(\nu_x)_{x \in M}$, so that the Fisher information in (1.7) signifies, on average, how much information about $x$ can be inferred by making observations only in the projective variable $v$.

The identity (1.6) will be most useful in a quantitative sense when studying the small-noise limit. Hence we define, for $\epsilon \in (0, 1]$,

$$dx^\epsilon_t = X_0^\epsilon(x_t)\, dt + \sqrt{\epsilon} \sum_{k=1}^r X_k^\epsilon(x_t) \circ dW^k_t, \tag{1.9}$$

where we also allow the $X_j^\epsilon$ to be parameterized by $\epsilon$; below we assume natural uniformity properties on this dependence. In this case, (1.6) becomes (now parameterizing everything by $\epsilon$):

$$FI(f^\epsilon) := \frac{1}{2} \sum_{k=1}^r \int_{SM} \frac{|(\tilde X_k^\epsilon)^* f^\epsilon|^2}{f^\epsilon}\, dq = \frac{n\lambda^\epsilon - 2\lambda^\epsilon_\Sigma}{\epsilon}.$$

If the collection $\{\tilde X_1^\epsilon, \ldots, \tilde X_r^\epsilon\}$ spans the tangent space of $SM$ everywhere, then the identity (1.6) would imply that $n\lambda^\epsilon - 2\lambda^\epsilon_\Sigma$ is related in a straightforward manner to the regularity of $f$ in the sense of distributional derivatives, i.e., Sobolev norms.

However, in nearly all cases of interest (and especially in the settings we are interested in, such as (1.2)), the collection $\{\tilde X_1^\epsilon, \ldots, \tilde X_r^\epsilon\}$ fails to span the tangent space of $SM$. Our second main result, Theorem 1.9 below, overcomes this complication by adapting ideas from Hörmander's hypoelliptic regularity theory to show that, in fact, the partial Fisher information $FI(f)$ actually does control at least some Sobolev regularity of $f$.

In [29], Hörmander isolated the general conditions that guarantee the regularity of solutions to Kolmogorov equations, such as the PDE (1.8) satisfied by $f$, when the forcing directions do not span the tangent space. We recall the classical parabolic Hörmander condition, as it is directly important for the next main result. For vector fields $X, Y$, we write $[X, Y]$ for the usual Lie bracket of $X$ and $Y$.

Definition 1.7.

Given a collection of vector fields $Z_0, Z_1, \ldots, Z_r$ on a manifold $M$, we define collections of vector fields $\mathcal{X}_0 \subseteq \mathcal{X}_1 \subseteq \ldots$ recursively by

$$\mathcal{X}_0 = \{Z_j : j \geq 1\}, \qquad \mathcal{X}_{k+1} = \mathcal{X}_k \cup \{[Z_j, Z] : Z \in \mathcal{X}_k,\ j \geq 0\}.$$

We say that $\{Z_i\}_{i=0}^r$ satisfies the parabolic Hörmander condition if there exists $k$ such that for all $w \in M$,

$$\operatorname{span}\{Z(w) : Z \in \mathcal{X}_k\} = T_w M.$$

Assumption 3 (Projective spanning condition). The vector fields $\{\tilde X_0^\epsilon, \tilde X_1^\epsilon, \cdots, \tilde X_r^\epsilon\}$ satisfy the parabolic Hörmander condition on $SM$, uniformly in $\epsilon \in (0, 1]$ on bounded sets (see Definition 3.1 below for the precise statement).

Remark 1.8.

This condition appears routinely in the random dynamics literature: see for example [11, 24]. For SDE systems (1.9) it is the primary sufficient condition used to ensure that $(w^\epsilon_t)$ will have at most one absolutely continuous stationary measure as in Assumption 2; indeed, in most practical examples, one will use Assumption 3 to deduce Assumption 2. We also note that Assumption 3 can be shown to imply that $\{X_0^\epsilon, \ldots, X_r^\epsilon\}$ satisfies the parabolic Hörmander condition on $M$; see Section 4.

We are now positioned to state our second result, which provides a quantitative hypoelliptic regularity estimate turning the partial Fisher information $FI(f^\epsilon)$ of $f^\epsilon$ into a uniform-in-$\epsilon$ estimate of Sobolev regularity in all directions.

Theorem 1.9.

Assume that $\{\tilde X_0^\epsilon, \ldots, \tilde X_r^\epsilon\}$ are uniformly bounded in $C^k_{\mathrm{loc}}$ for all $k$ and that Assumptions 1, 2 and 3 hold. Then, there exists $s_* \in (0, 1)$ such that for any bounded, open set $U \subset SM$, there exists $C = C_U > 0$ such that for all $\epsilon \in (0, 1]$,

$$\| f^\epsilon \|_{W^{s_*, 1}(U)} \leq C \left( 1 + \sqrt{FI(f^\epsilon)} \right).$$

Remark 1.10.

It might be possible that there is a slightly more refined version of Theorem 1.9 which replaces $FI$ with $\epsilon^\delta FI$ for some $\delta \in (0, s_*)$ in the statement, which would lead to a more precise lower bound on the Lyapunov exponents in the example below. Such a scaling would be more consistent with the results of [14, 61].

Remark 1.11.

Above, the value $s_*$ is determined exclusively by the number of ‘generations’ of brackets needed to satisfy Assumption 3 (though note this will generally depend on the dimension of the manifold $M$ itself).

Theorem 1.9 is proved in Section 3. The result is a key aspect of our work and the proof requires some significant effort. It essentially amounts to a quantitative version of Hörmander's a priori estimate for hypoelliptic regularity in an $L^1$ framework; in contrast, Hörmander's original work [29] is based in $L^2$ for fundamental reasons.

One of Hörmander's original insights is that, given regularity in the forcing directions, the PDE (1.8) implies a matching, negative regularity-type estimate defined by duality on $\tilde X_0$ (see also discussions in [4]). Using a delicate regularization procedure, the regularity in $\{\tilde X_j\}_{j=1}^r$ and the negative regularity in $\tilde X_0$ are combined in a suitable manner to obtain regularity in all directions. In order to exploit the negative regularity dual to the regularity provided by $FI$, the regularization procedure we must perform is even more delicate than Hörmander's. Of course, there is a large literature of works extending Hörmander's theory in various ways, e.g., to handle rough coefficients: we refer the reader to, e.g., [1, 3, 16, 28, 36, 38, 53] and the references therein. However, as far as the authors are aware, there are no previous works that fundamentally rework the theory into $L^1$.

As our application, we apply Proposition 1.4 and Theorem 1.9 to a concrete class of dynamical systems posed on $\mathbb{R}^n$ of which L96 in (1.1) and Galerkin truncations of many PDE are special cases. The general class of systems we consider on $\mathbb{R}^n$ are of the following form, modeling a volume-preserving nonlinearity with a weak linear damping and weak noise:

$$dx^\epsilon_t = F(x^\epsilon_t)\, dt + \epsilon A x^\epsilon_t\, dt + \sqrt{\epsilon} \sum_{k=1}^r X_k\, dW^k_t. \tag{1.10}$$

Here $W^k_t$ are independent standard Brownian motions, the forcing directions $\{X_k\}_{k=1}^r$ are assumed for simplicity to be constant vector fields (a.k.a. “additive noise”), while the matrix $A \in \mathbb{R}^{n \times n}$ is negative definite, contributing volume dissipation to the overall system. (Note that since the vector fields $X_k$ are constant, the Itô and Stratonovich formulations are identical, which is why we use the Itô notation above.) We will primarily consider drift terms $F$ of the following form:

$$F(x) = B(x, x) \text{ for } B : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^n \text{ bilinear, and moreover, } \operatorname{div} F \equiv 0 \text{ and } x \cdot F \equiv 0. \tag{1.11}$$

The divergence-free condition implies preservation of Liouville measure (Lebesgue measure on $\mathbb{R}^n$), while the condition $x \cdot F(x) \equiv 0$ ensures that $\dot x = F(x)$ preserves the “energy shells” $S_E = \{x \in \mathbb{R}^n : \|x\| = E\}$. Systems of this form include the Lorenz-96 model (1.1) as well as Galerkin truncations of several well-known PDE of interest such as the Navier-Stokes equations. A more general class of models for which these methods apply is discussed in Remark 1.16 below.

Regarding our standing assumptions, it is straightforward [5, 37] to show that (1.10) with drift term as in (1.11) generates a unique stochastic flow of diffeomorphisms $\Phi^t_\omega : \mathbb{R}^n \to \mathbb{R}^n$ for a.e. Brownian path $\omega$, and so Assumption 1(i) always holds for any $\epsilon > 0$. By standard hypoellipticity theory, Assumption 1(ii) regarding a unique stationary density is valid when $\{F + \epsilon A, X_1, \cdots, X_r\}$ satisfies the parabolic Hörmander condition on $\mathbb{R}^n$. If this holds, then Assumption 1(iii) is essentially automatic and follows from a combination of results in Appendix A and [33]. As a result, the exponents $\lambda^\epsilon, \lambda^\epsilon_\Sigma$ as in Theorem 1.3 exist for (1.10) for any $\epsilon > 0$.

For systems of the form (1.10), it is particularly easy to lift vector fields to $SM \simeq \mathbb{R}^n \times S^{n-1}$ via

$$\tilde X_0^\epsilon := \begin{pmatrix} F(x) + \epsilon A x \\ \Pi_v (\nabla F(x) + \epsilon A) v \end{pmatrix}, \qquad \tilde X_k := \begin{pmatrix} X_k \\ 0 \end{pmatrix}. \tag{1.12}$$
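As a quick sanity check of the structural conditions (1.11) for the L96 nonlinearity $F_m(x) = (x_{m+1} - x_{m-2})\,x_{m-1}$ from (1.1), the following sketch evaluates $x \cdot F(x)$ and a finite-difference approximation of $\operatorname{div} F$ at a random point. It is an illustration only: the helper names, the dimension $J = 8$, the random seed, and the finite-difference step are assumptions, and a numerical check at one point is of course not a proof.

```python
import random

def l96_nonlinearity(x):
    """F_m(x) = (x_{m+1} - x_{m-2}) * x_{m-1}, with indices taken modulo J."""
    J = len(x)
    return [(x[(m + 1) % J] - x[(m - 2) % J]) * x[(m - 1) % J] for m in range(J)]

def energy_rate(x):
    """x . F(x); this vanishes identically when F preserves the energy shells."""
    return sum(a * b for a, b in zip(x, l96_nonlinearity(x)))

def divergence(F, x, h=1e-4):
    """Central-difference approximation of div F = sum_m dF_m/dx_m at x."""
    total = 0.0
    for m in range(len(x)):
        xp = list(x)
        xp[m] += h
        xm = list(x)
        xm[m] -= h
        total += (F(xp)[m] - F(xm)[m]) / (2 * h)
    return total

rng = random.Random(0)                       # fixed seed, for reproducibility
x = [rng.uniform(-1.0, 1.0) for _ in range(8)]   # hypothetical test point, J = 8
rate = energy_rate(x)                        # expected to vanish up to rounding
div = divergence(l96_nonlinearity, x)        # expected to vanish: F_m does not depend on x_m
```

The cancellation in `energy_rate` is the telescoping identity $\sum_m x_{m-1} x_m x_{m+1} - \sum_m x_{m-2} x_{m-1} x_m = 0$ under periodic indexing, while the divergence vanishes because $F_m$ does not involve $x_m$ (for $J \geq 4$).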
For the projective process $(w^\epsilon_t) = (x^\epsilon_t, v^\epsilon_t)$ evolving on $\mathbb{R}^n \times S^{n-1}$, well-posedness as in Assumption 2 is standard, while the existence and uniqueness of a stationary density $f^\epsilon$ follows from the parabolic Hörmander condition for $\{\tilde X_0^\epsilon, \tilde X_1, \cdots, \tilde X_r\}$ in Assumption 3 and the drift condition provided by the damping (see Appendix A). We emphasize that Assumption 3 generally requires work to check: see the discussion below.

For the class of Euler-like SDE above, our main result is as follows, which shows that Assumption 3 is sufficient to deduce $\epsilon^{-1} \lambda^\epsilon \to \infty$.

Theorem 1.12.

Consider the SDE (1.10) where $F(x) = B(x, x)$ is as in (1.11) and $B(x, x)$ is not identically 0. If $\{\tilde X_0^\epsilon, \tilde X_1, \ldots, \tilde X_r\}$ as in (1.12) satisfies the parabolic Hörmander condition as in Assumption 3, then the top Lyapunov exponent $\lambda^\epsilon$ for (1.10) satisfies

$$\frac{\lambda^\epsilon}{\epsilon} \to \infty \quad \text{as } \epsilon \to 0.$$

Before presenting the proof of Theorem 1.12, let us briefly comment on the verification of Assumption 3. For many systems of interest, it can be significantly harder to verify spanning for $\{\tilde X_k\}$ on $SM$ than to verify spanning for the vector fields $\{X_k\}$ on $M$: this is already the case for the L96 model with additive noise. As discussed above, the verification of Assumption 3 is the only remaining task to apply Theorem 1.12 to Galerkin truncations of the Navier-Stokes equations. This is being undertaken in ongoing work.

Nevertheless, we emphasize that for a given model of the form (1.10) with fixed dimension and parameters, Assumption 3 is (at least in principle) checkable using, e.g., computer algebra software. In Section 4 we prove the following, which reduces the question of projective spanning to a combination of (i) the spanning condition for $\{X_0^\epsilon, X_1, \ldots, X_r\}$ on $\mathbb{R}^n$ and (ii) the purely linear condition that $\mathfrak{sl}(\mathbb{R}^n)$, the space of traceless real matrices, is generated by a collection $\{H^i\}$ of constant-valued $n \times n$ real matrices (defined explicitly in terms of $B(x, x)$) under the standard matrix Lie bracket.

Lemma 1.13.

Let $\{X_0^\epsilon, X_1, \ldots, X_r\}$ be defined by the SDE (4.3) and suppose that the constant vector fields $\{\partial_{x_k}\}_{k=1}^n$ belong to the parabolic Lie algebra $\operatorname{Lie}(X_0^\epsilon; X_1, \ldots, X_r)$. Define for each $k = 1, \ldots, n$ the following constant matrices,

$$H^k := \partial_{x_k} \nabla F \in \mathfrak{sl}(\mathbb{R}^n),$$

and let $\operatorname{Lie}(H^1, \ldots, H^n)$ be the matrix Lie sub-algebra of $\mathfrak{sl}(\mathbb{R}^n)$ generated by $H^1, \ldots, H^n$. Then the projective vector fields $\{\tilde X_0^\epsilon, \tilde X_1, \ldots, \tilde X_r\}$ satisfy Assumption 3 if

$$\operatorname{Lie}(H^1, \ldots, H^n) = \mathfrak{sl}(\mathbb{R}^n). \tag{1.13}$$

Lemma 1.13 is used to prove projective spanning for L96 with additive forcing in Section 4.3. Using Theorem 1.12, Theorem 1.1 above follows as a corollary.

We note that the proof presented there for L96 heavily relies on the “local” coupling of unknowns in the nonlinearity, which greatly simplifies the application of Lemma 1.13 in this case. However, for models which are the Galerkin truncations of PDEs, coupling between unknowns has a more ‘global’ character, and verifying the hypotheses of Lemma 1.13 remains open.

Throughout, we assume the setting of Theorem 1.12, and in particular, that the collection of projective vector fields $\{\tilde X_0^\epsilon, \tilde X_1, \cdots, \tilde X_r\}$ as in (1.12) satisfies the parabolic Hörmander condition in Assumption 3.

Let us begin by articulating the Fisher information identity (Proposition 1.4) and hypoelliptic regularity estimate (Theorem 1.9) in the context of Euler-like models. Writing $\lambda^\epsilon, \lambda^\epsilon_\Sigma$ for the top and summed Lyapunov exponents as in Theorem 1.3, the partial Fisher information identity (1.6) reads as follows:

$$\frac{n\lambda^\epsilon}{\epsilon} - 2 \operatorname{tr} A = \frac{1}{2} \sum_k \int_{\mathbb{R}^n \times S^{n-1}} \frac{|\tilde X_k f^\epsilon|^2}{f^\epsilon}\, dx\, dv =: FI(f^\epsilon). \tag{1.14}$$

This is immediate from Proposition 1.4 on noting that (i) $\lambda_\Sigma = \epsilon \operatorname{tr} A$ by Theorem 1.3 and (1.11), while (ii) the condition $f^\epsilon \log f^\epsilon \in L^1(dq)$, with $dq$ the volume element for $S\mathbb{R}^n$, follows from the estimates in Appendix A.
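Returning to the linear condition (1.13) of Lemma 1.13, the remark that it is checkable by machine for a fixed dimension can be illustrated as follows. The sketch below (not the argument of Section 4.3) builds the matrices $H^k = \partial_{x_k} \nabla F$ for the L96 nonlinearity $F_m(x) = (x_{m+1} - x_{m-2})\,x_{m-1}$ and computes the dimension of the matrix Lie algebra they generate, using exact rational arithmetic; the result can then be compared with $\dim \mathfrak{sl}(\mathbb{R}^J) = J^2 - 1$. The function and class names, the zero-based indexing, and the choice $J = 7$ are assumptions made for illustration, and the other hypothesis of the lemma (that the $\partial_{x_k}$ lie in the parabolic Lie algebra) is not checked here.

```python
from fractions import Fraction

def l96_generator(k, J):
    """Integer matrix H^k with entries d^2 F_m / (dx_k dx_j), where
    F_m(x) = (x_{m+1} - x_{m-2}) * x_{m-1} and indices are taken modulo J."""
    H = [[0] * J for _ in range(J)]
    for m in range(J):
        up, down, back = (m + 1) % J, (m - 1) % J, (m - 2) % J
        # F_m = x_up * x_down - x_back * x_down
        for a, b, sign in ((up, down, 1), (back, down, -1)):
            if k == a:
                H[m][b] += sign
            if k == b:
                H[m][a] += sign
    return H

def bracket(A, B):
    """Matrix commutator [A, B] = AB - BA."""
    n = len(A)
    return [[sum(A[i][l] * B[l][j] - B[i][l] * A[l][j] for l in range(n))
             for j in range(n)] for i in range(n)]

class RationalSpan:
    """Echelon basis over the rationals, built incrementally."""
    def __init__(self):
        self.rows = {}  # pivot column -> basis vector normalized to 1 at the pivot

    def add(self, vector):
        """Reduce `vector` against the basis; store it and return True if it is new."""
        vec = [Fraction(x) for x in vector]
        for p in range(len(vec)):
            if vec[p] == 0:
                continue
            if p in self.rows:
                c = vec[p]
                vec = [x - c * y for x, y in zip(vec, self.rows[p])]
            else:
                lead = vec[p]
                self.rows[p] = [x / lead for x in vec]
                return True
        return False

def generated_dimension(generators):
    """Dimension of the Lie algebra generated by the given matrices, obtained by
    repeatedly bracketing generators with newly found independent elements."""
    span, queue = RationalSpan(), []
    for G in generators:
        if span.add([x for row in G for x in row]):
            queue.append(G)
    while queue:
        M = queue.pop()
        for G in generators:
            C = bracket(G, M)
            if span.add([x for row in C for x in row]):
                queue.append(C)
    return len(span.rows)

J = 7  # hypothetical choice of dimension, for illustration only
generators = [l96_generator(k, J) for k in range(J)]
traces = [sum(H[i][i] for i in range(J)) for H in generators]  # all zero: H^k is traceless
dim = generated_dimension(generators)  # compare with J * J - 1
```

Bracketing only against the generators suffices because the resulting span contains the generators and is invariant under each $\operatorname{ad}_{H^k}$, hence contains all iterated brackets. Exact `Fraction` arithmetic is used so that the rank computation is not affected by floating-point round-off, at the cost of speed for larger $J$.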
Turning to the hypoelliptic regularity estimate: Theorem 1.9 implies

$$\| f^\epsilon \|_{W^{s, 1}(U \times S^{n-1})} \leq C \left( 1 + \sqrt{FI(f^\epsilon)} \right) \tag{1.15}$$

for any $U \subset \mathbb{R}^n$ bounded, where $s \in (0, 1)$ and $C = C_U$ are constants independent of $\epsilon$.

In view of the form of (1.14) and (1.15) we see that if $\epsilon^{-1} \lambda^\epsilon$ were to remain bounded, then $f^\epsilon$ would be bounded in $W^{s, 1}$ uniformly in $\epsilon$. This observation leads naturally to the following alternative.

Proposition 1.14.

At least one of the following holds:

(a) $\lim_{\epsilon \to 0} \frac{\lambda^\epsilon}{\epsilon} = \infty$; or

(b) the zero-noise flow $(x_t, v_t)$ admits a stationary density $f \in L^1(\mathbb{R}^n \times S^{n-1})$ (and moreover $f \in W^{s, 1}_{\mathrm{loc}}$ on bounded sets).

Proof. Suppose that (a) fails, i.e., $\liminf_{\epsilon \to 0} \frac{\lambda^\epsilon}{\epsilon} < \infty$. In this case, (1.14) implies that $\liminf_{\epsilon \to 0} FI(f^\epsilon) < \infty$ and the hypoelliptic regularity estimate (1.15) provides regularity in the missing directions, i.e., $\liminf_{\epsilon \to 0} \| f^\epsilon \|_{W^{s, 1}(U)} < \infty$ for all open, bounded sets $U$. Combined with the uniform tightness of $\{f^\epsilon\}_{\epsilon > 0}$ in (A.1) (coming from the energy identity $x \cdot B(x, x) = 0$ and the fact that $A$ is negative definite), this yields compactness in $L^1$ for $\{f^\epsilon\}_{\epsilon \in (0, 1]}$ as $\epsilon \to 0$. Extracting a subsequence $\{\epsilon_j\}$, we see that there exists $f \in L^1$ such that $f^{\epsilon_j} \to f$ in $L^1$. Furthermore, passing to the limit $\epsilon_j \to 0$ pathwise in the SDE and in the Kolmogorov equation (1.8), we see that $f\, dq$ is an invariant measure of the deterministic flow $(x_t, v_t)$ and hence (b) holds.

A crucial feature of our approach is that alternative (b) in Proposition 1.14 is quite rigid and can be ruled out in many cases, even for systems with very complicated deterministic dynamics for which we have access to little information. In our setting, alternative (b) is ruled out by the following proposition, proved in Section 5; this is enough to complete the proof of Theorem 1.12.

Below, we write $\hat\Phi^t : S\mathbb{R}^n \circlearrowleft$ for the (deterministic) flow corresponding to the $\epsilon = 0$ process $(x_t, v_t)$, while $\Phi^t : \mathbb{R}^n \circlearrowleft$ is the flow corresponding to $(x_t)$ on $\mathbb{R}^n$.

Proposition 1.15.

Assume that the bilinear mapping $B$ is not identically 0. Let $\nu$ be any invariant probability measure for $\hat\Phi^t$ with the property that $\nu(A \times S^{n-1}) = \mu(A)$, where $\mu \ll \mathrm{Leb}_{\mathbb{R}^n}$. Then, $\nu$ is singular with respect to Lebesgue measure $\mathrm{Leb}_{S\mathbb{R}^n}$ on $S\mathbb{R}^n$.

Proposition 1.15 is proved in Section 5, using ideas inspired by the classification of invariant projective measures for general linear cocycles [6]. Roughly speaking, this theory implies that if $\nu \ll \mathrm{Leb}_{S\mathbb{R}^n}$ were to hold, then the $\epsilon = 0$ flow must be an isometry with respect to a potentially ‘rough’ (i.e., measurably-varying) Riemannian metric on $\mathbb{R}^n$. For systems of the form (1.10) satisfying (1.11), this can be ruled out relatively easily, using the fact that at $\epsilon = 0$, the dynamics of (1.10) induces shearing between successive “energy shells” $S_E = \{\|x\| = E\}$, $E > 0$. To wit,

$$D\Phi^t(x)\, x = \Phi^t(x) + t B(\Phi^t(x), \Phi^t(x)) \tag{1.16}$$

(see Lemma 5.1). In particular, at a point $x \in \mathbb{R}^n \setminus \{0\}$, an infinitesimal perturbation in the “radial” direction $x$ will grow indefinitely at a linear rate $t$, except at times when $B(\Phi^t(x), \Phi^t(x))$ is very small. Thus, Proposition 1.14(b) can be ruled out by a simple Poincaré recurrence argument, using only the assumption that $B$ is not identically 0 (see Section 5 for details).

Remark 1.16.

The above arguments apply, in principle, to a broader class of drift terms F ( x ) than thosegiven in (1.11). For instance, provided that we start with the weak-damping, constant forcing regime d x ǫt = F ( x ǫt )d t + ǫAx ǫt d t + r X k =1 X k d W kt , our methods easily extend to treat multilinear F ( x ) = P Pj =0 B j ( x, .., x ) for B j multilinear of degree p j with p j ≥ for at least one j . Reordering so that p > p > . . . > p P , a rescaling of u, t and a re-deﬁnitionof ǫ provides the analogue of (1.10): d x ǫt = P X j =0 ǫ pj − p p B j ( x t , ...x t )d t + ǫAx ǫt d t + √ ǫ r X k =1 X k d W kt . Hence, as ǫ → , the leading order nonlinearity dominates and the problem essentially reduces to the homo-geneous case, provided of course that the leading nonzero nonlinearity term B p satisﬁes x · B p ( x, · · · , x ) ≡ and div B p ( x, · · · , x ) ≡ , analogously to (1.11) (and of course, we require Assumption 3). Remark 1.17.

Without much additional work, Proposition 1.14 generalizes to a large class of zero-noise limits of volume-preserving systems on a compact manifold. Of particular interest are parabolic flows, e.g., 'typical' completely integrable flows, for which Proposition 1.14(b) is usually impossible due to shearing between invariant tori (analogous to (1.16)). This suggests that the scaling $\epsilon^{-1}\lambda_1^\epsilon \to \infty$ ought to be fairly common among zero-noise limits, even for a large class of decidedly non-chaotic zero-noise dynamics.

Remarkably, when $\epsilon \ll 1$, many Lyapunov times $O((\lambda_1^\epsilon)^{-1})$ elapse before the $O(\epsilon^{-1})$ timescale when the effects of noise become apparent. On the other hand, how long a Lyapunov time actually takes (that is, how long it typically takes a tangent vector to double in length) depends crucially on the rate at which Lyapunov exponents are realized, itself a large-deviations problem. This will be the subject of future work.

As remarked earlier, for a given system it can be extremely challenging to estimate its Lyapunov exponents and provide a mathematically rigorous account of its time-asymptotic behavior. Indeed, in principle Lyapunov exponents require infinitely precise information on infinite trajectories, and in practice the convergence of Lyapunov exponents to their 'true' values can exhibit long stretches of intermittent behavior. This is especially so for deterministic systems in the absence of stochastic driving, for which one anticipates that "chaotic" and "orderly" regimes coexist in a convoluted way in both phase space as well as 'parameter space', i.e., as the underlying dynamical system is varied: we refer the interested reader to, e.g., work on Newhouse phenomena in dissipative systems [54, 55]; the proliferation of elliptic islands in volume-preserving systems [25]; known coexistence of chaotic and ordered regimes for the quadratic map family [47]; and $C^1$-generic dichotomies [17, 18]. 
For more background on this rich topic, see, e.g., [23, 60, 72, 74].

Although it still presents significant challenges, the situation for Lyapunov exponents of stochastically forced systems is notably more tractable. To start, let us first address the body of work à la Furstenberg which describes necessary conditions for 'degeneracy' of the Lyapunov exponents of a random dynamical system. Consider a stochastic flow of diffeomorphisms $\Phi^t_\omega$ on an $n$-dimensional manifold $M$ arising from an SDE satisfying Assumption 1, and let $\lambda_1, \lambda_\Sigma$ be as in Theorem 1.3. Let $\mu$ be the (unique) stationary measure for $x_t := \Phi^t_\omega(x_0)$. Note that unconditionally we have $n\lambda_1 - \lambda_\Sigma \ge 0$. In this context, and brushing aside technical details, the criterion à la Furstenberg is due to a variety of authors (e.g., [21, 41, 65, 68]), and can be stated as follows: if $\nu \in \mathcal{P}(\mathbb{S}M)$ is a stationary measure for the projective process and $d\nu(x,v) = d\nu_x(v)\, d\mu(x)$ is the disintegration of $\nu$, then for all $t > 0$ there holds

$$ \mathbf{E} \int_M H\big( D\Phi^t(x)_* \nu_x \,\big|\, \nu_{\Phi^t(x)} \big)\, d\mu(x) \le t\,(n\lambda_1 - \lambda_\Sigma), \tag{1.17} $$

where $H$ denotes the relative entropy, defined for two measures $\eta \ll \lambda$ by

$$ H(\eta \,|\, \lambda) := \int \log\left( \frac{d\eta}{d\lambda} \right) d\eta. $$

From this we see that either

$$ n\lambda_1 - \lambda_\Sigma > 0, \tag{1.18} $$

or the probabilistic law governing the stochastic flow admits a strong 'degeneracy' in the sense that

$$ (D_x\Phi^t_\omega)_* \nu_x = \nu_{\Phi^t_\omega(x)} \tag{1.19} $$

with probability 1 for all $t \ge 0$ and $\mu$-typical $x$. That this situation is very 'degenerate' follows from the fact that for fixed $x$ and $t$, the right-hand side of (1.19) depends only on the time-$t$ position $\Phi^t_\omega(x)$, while the left-hand side depends additionally on the entire noise path $\omega|_{[0,t]}$.

Observe that in the weakly-damped, weakly-driven setting of (1.10), $\lambda^\epsilon_\Sigma = \epsilon \operatorname{tr} A < 0$, and so (1.18) is totally agnostic as to whether $\lambda_1^\epsilon > 0$ or not. Indeed, the techniques in the above-mentioned works are "soft", as the identity (1.19) is non-quantitative in the parameters of the underlying system. 
Although (1.17) does at least provide some kind of formula for $n\lambda_1 - \lambda_\Sigma$, it is unclear how to glean useful quantitative information directly from (1.17).

Interestingly, our Fisher-information identity in Proposition 1.4, specifically (1.7), is in fact essentially the time-infinitesimal analogue of (1.17), as we show below in Section 2.4. Hence, like (1.17), our Proposition 1.4 admits an interpretation in terms of the rate at which the degeneracy (1.19) fails to hold for the stochastic flow $\Phi^t_\omega$. However, Proposition 1.4 recasts the information in terms of the generator of $(w_t)$, which is more amenable to the use of hypoelliptic PDE methods such as those employed in Theorem 1.9. This motivates the claim that the methods in this paper constitute a first step towards a quantitative à la Furstenberg theory. We remark that Fisher information-type quantities also commonly appear as the time derivatives of the relative entropy in the study of gradient flows and logarithmic Sobolev inequalities (see e.g. [9, 39, 63, 66]).

Beyond à la Furstenberg, there is by now a large literature on the Lyapunov exponents of particular models to which we cannot do justice in this space. Instead, we will focus on a class of results most closely related to ours (Theorems 1.1 and 1.12): small-noise expansions of Lyapunov exponents for weakly-driven stochastic systems. To frame the discussion, consider the abstract linear SDE

$$ dV_t = A^\epsilon_t V_t\, dt + \sqrt{\epsilon} \sum_{k=1}^r B^k_t V_t \circ dW^k_t, \tag{1.20} $$

where $A^\epsilon_t, B^k_t$ are, in general, time-varying and/or themselves randomly driven, and $A^\epsilon_t$ may or may not exhibit some vanishingly weak damping as $\epsilon \to 0$. There are many works studying the scaling behavior of the Lyapunov exponent $\lambda^\epsilon := \lim_{t \to \infty} \frac{1}{t} \log |V_t|$ of such systems, e.g., [8, 30, 52, 58, 61] in the constant coefficient case, and [7, 13, 14] when $A_t, B^k_t$ are coupled to some other stochastic flow. 
To the authors' best knowledge, however, all of these results are restricted to settings where the $\epsilon = 0$ dynamics are relatively simple and essentially completely known. In comparison, our results are indifferent to any detailed description of the zero-noise dynamics. On the other hand, the sacrifice for our level of generality is that our estimate $\lambda_1^\epsilon/\epsilon \to \infty$ is far weaker than an asymptotic expansion, and is likely to be suboptimal for many models of interest.

Of particular interest is that among models of the form (1.20), scaling laws of the form $\lambda^\epsilon \sim \epsilon^\gamma$, $\gamma \ge 1$, tend to be associated with zero-noise dynamics which are rigid isometries (exhibiting no shearing) [8, 12, 13, 58]: note that such projectivized zero-noise dynamics preserve an invariant density, namely, Lebesgue measure, c.f. alternative (b) in Proposition 1.14. Meanwhile, laws of the form $\lambda^\epsilon \sim \epsilon^\gamma$, $\gamma < 1$, are associated with zero-noise dynamics exhibiting some shearing mechanism (c.f. shearing between energy shells as in (1.16)). By way of example, [8, 61] derive such scaling laws when $A_t, B_t$ as above are given by

$$ A_t \equiv \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \qquad B_t \equiv \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}, $$

corresponding to the constant application of a horizontal shear in conjunction with a small, stochastically driven vertical shear. 
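As a rough numerical illustration of the shear example just described, the following sketch estimates the Lyapunov exponent of (1.20) by an Euler–Maruyama scheme with renormalization at every step. It assumes $r = 1$ and the constant matrices $A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$ (horizontal shear) and $B = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}$ (vertical shear), which is our reading of the description above. The function name `shear_lyapunov`, the step size, the time horizon and the seed are illustrative choices and are not taken from [8, 61]. Since $B^2 = 0$, the Stratonovich and Itô forms of the equation coincide here.

```python
import math
import random


def shear_lyapunov(eps, T=2000.0, dt=0.01, seed=0):
    """Finite-time estimate of lim (1/t) log|V_t| for the assumed system
    dV1 = V2 dt,  dV2 = sqrt(eps) * V1 dW  (Euler-Maruyama, illustrative only)."""
    rng = random.Random(seed)
    v1, v2 = 1.0, 0.0
    log_growth = 0.0
    steps = int(T / dt)
    noise_scale = math.sqrt(eps * dt)
    for _ in range(steps):
        dW = noise_scale * rng.gauss(0.0, 1.0)
        # One Euler-Maruyama step: deterministic horizontal shear plus
        # noise-driven vertical shear (both use the old values of v1, v2).
        v1, v2 = v1 + v2 * dt, v2 + dW * v1
        # Renormalize to unit length and accumulate the logarithmic growth.
        norm = math.hypot(v1, v2)
        log_growth += math.log(norm)
        v1, v2 = v1 / norm, v2 / norm
    return log_growth / (steps * dt)


if __name__ == "__main__":
    for eps in (1e-1, 1e-2, 1e-3):
        print(eps, shear_lyapunov(eps))
```

Running the function for several small values of `eps` gives a crude empirical look at how the estimate varies with $\epsilon$; because of the finite time horizon and the time discretization, the numbers are only indicative.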
The analysis of [8, 61] was extended to the setting of fluctuation-dissipation zero-noise limits of certain 2d completely integrable Hamiltonian systems in the work [14].

Although the Fisher information identity in Proposition 1.4 and our Proposition 1.14 for zero-noise limits make no direct reference to a specific dynamical motif or behavior, it is clear from our application to the class of Euler-like models (1.10) with bilinear nonlinearity as in (1.11) that shearing in the zero-noise, zero-damping dynamics is a very natural way to rule out the rigid isometry alternative in Proposition 1.14(b). Of course, shearing has long been regarded as a potential mechanism for the generation of chaotic behavior. As early as the late 70's it was realized that chaotic attractors could arise from time-periodic driving of a system undergoing a Hopf bifurcation [75], while subsequent mathematically rigorous work has confirmed this mechanism (see, e.g., [71] for an overview of this program). We also point out the work [42], which provides a mix of heuristics, numerics and mathematical analysis demonstrating the shearing mechanism as a source of chaotic behavior.

This section is devoted to a proof of the Fisher information identity Proposition 1.4 for the top Lyapunov exponent in a general setting. We present two proofs: the first, via the Furstenberg-Khasminskii formula [5], is carried out in Section 2.2, while the second, via a relative entropy formula related more directly to Furstenberg's work on Lyapunov exponents, is presented in Section 2.4.
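Before the general proofs, it may help to see the simplest instance of the identity $FI(\rho) = -\lambda_\Sigma$ checked numerically. The sketch below is not from the paper: it assumes the one-dimensional Ornstein-Uhlenbeck equation $dx_t = -a x_t\, dt + \sigma\, dW_t$, i.e. $X_0 = -a x\, \partial_x$ and $X_1 = \sigma\, \partial_x$, and it assumes the normalization $FI(\rho) = \frac{1}{2} \sum_k \int |X_k^* \rho|^2 / \rho \, dx$. For this system the stationary density is Gaussian with variance $\sigma^2/(2a)$ and $\lambda_\Sigma = \operatorname{div} X_0 = -a$. The function name and the quadrature parameters are illustrative.

```python
import numpy as np


def fisher_information_ou(a, sigma, half_width=12.0, num=200001):
    """FI(rho) = (1/2) * int |X_1^* rho|^2 / rho dx for dx = -a x dt + sigma dW.

    Here X_1 = sigma d/dx, so X_1^* rho = -sigma * rho'.
    """
    var = sigma**2 / (2.0 * a)  # stationary variance of the OU process
    std = np.sqrt(var)
    x = np.linspace(-half_width * std, half_width * std, num)
    rho = np.exp(-x**2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)
    drho = -(x / var) * rho  # exact derivative of the Gaussian density
    integrand = 0.5 * sigma**2 * drho**2 / rho
    dx = x[1] - x[0]
    # Composite trapezoidal rule, written out by hand.
    return dx * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))


if __name__ == "__main__":
    a, sigma = 0.7, 1.3
    lambda_sigma = -a  # div X_0 for X_0 = -a x d/dx
    print(fisher_information_ou(a, sigma), -lambda_sigma)
```

The two printed numbers should agree up to quadrature error, independently of $\sigma$: the noise amplitude cancels between the prefactor $\sigma^2/2$ and the Fisher information $2a/\sigma^2$ of the Gaussian.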

Let $(M, g)$ be a smooth connected Riemannian manifold, and as in (1.9), consider the SDE

$$ dx_t = X_0(x_t)\, dt + \sum_{k=1}^r X_k(x_t) \circ dW^k_t, \tag{2.1} $$

where $\{X_k\}_{k=0}^r$ are a family of smooth vector fields (potentially unbounded) on $M$, $\{W^k\}_{k=1}^r$ are independent standard Wiener processes and the product is taken in the Stratonovich sense. Recall that (2.1) is defined so that for each $\varphi \in C_c^\infty(M)$ the following $\mathbb{R}$-valued Stratonovich equation holds for each $t \in \mathbb{R}_+$:

$$ \varphi(x_t) = \varphi(x_0) + \int_0^t X_0\varphi(x_s)\, ds + \sum_{k=1}^r \int_0^t X_k\varphi(x_s) \circ dW^k_s. \tag{2.2} $$

The generator of the Markov semigroup is the following second order differential operator, written in Hörmander form:

$$ \mathcal{L} := X_0 + \frac{1}{2} \sum_{k=1}^r X_k^2, $$

so that, in Itô form,

$$ \varphi(x_t) = \varphi(x_0) + \int_0^t \mathcal{L}\varphi(x_s)\, ds + \sum_{k=1}^r \int_0^t X_k\varphi(x_s)\, dW^k_s. $$

Recall that a stationary probability $\mu \in \mathcal{P}(M)$ for the SDE (1.9) is any probability measure $\mu$ satisfying

$$ \int_M \mathcal{L}\varphi\, d\mu = 0 \quad \text{for all } \varphi \in C_c^\infty(M). $$

We will only be interested in cases when equation (1.9) gives rise to a unique Markov process $(x_t)$ and a global-in-time stochastic flow of diffeomorphisms $\Phi^t$, and has a unique stationary measure $\mu$.

Remark 2.1.

Obtaining the existence and uniqueness of a global stochastic flow of diffeomorphisms and the existence of a stationary probability measure $\mu$ is not automatic when the manifold $M$ is not compact, due to the potential unboundedness of the vector fields $\{X_k\}_{k=0}^r$ and the loss of tightness of $\mathrm{Law}(x_t)$ as $t \to \infty$. In general, one must obtain a suitable Lyapunov function (also called a drift condition) to control the growth of the process $(x_t)$ (see e.g. [51]) to obtain global solutions and existence of a stationary probability measure.

In order to deduce our Fisher information identity, we must first derive a formula for the top Lyapunov exponent, commonly referred to as the Furstenberg-Khasminskii formula (see, e.g., [5, 32]).

As in Section 1.1, we consider the projective process $(x_t, v_t)$ defined on $PM$ as in (1.4) and in particular (1.5). As is commonly done, we will often conflate the projective bundle $PM$ with the unit sphere bundle $\mathbb{S}M$ whose fibers are the spheres $\mathbb{S}_xM = \mathbb{S}^{n-1}(T_xM)$ canonically embedded in $T_xM$, and are universal double covers of $P_xM$.

Using the Riemannian structure on $M$ and the Levi-Civita connection $\nabla$, we equip $\mathbb{S}M$ with a canonical Riemannian metric $\tilde g$ (the Sasaki metric), so that the bundle projection $\pi : \mathbb{S}M \to M$ is a Riemannian submersion. This means that for each $w = (x,v) \in \mathbb{S}M$ we can decompose $T_w\mathbb{S}M$ into a horizontal subspace $H_w\mathbb{S}M$ of directions transverse to the fibers and a vertical subspace $V_w\mathbb{S}M$ of directions along the fibers, which can be identified with the spaces $T_xM$ and $T_v(\mathbb{S}_xM)$ respectively. Moreover, these spaces are orthogonal with respect to the metric $\tilde g$, giving the orthogonal decomposition

$$ T_w\mathbb{S}M = T_xM \oplus T_v(\mathbb{S}_xM), $$

which allows us to write the vector fields $\{\tilde X_k\}_{k=0}^r$ as

$$ \tilde X_k(x,v) := \begin{pmatrix} X_k(x) \\ V_{\nabla X_k(x)}(v) \end{pmatrix}, $$

where $\nabla X_k(x)$ is the covariant derivative of the vector field $X_k$, which for each $x \in M$ we view as a linear endomorphism $\nabla X_k(x) : T_xM \to T_xM$, so that for each $v \in T_xM$, $\nabla X_k(x) v := \nabla_v X_k(x)$. Recall that the divergence $\operatorname{div} X$ of a vector field $X$ on a Riemannian manifold $M$ is given by the trace of its covariant derivative (using the Levi-Civita connection):

$$ \operatorname{div} X := \operatorname{tr}(\nabla X). $$

The following identity will be useful, relating the divergence of $\tilde X_k$ to that of $X_k$.

Lemma 2.2. The following identity holds for each $k = 0, \ldots, r$ and $v \in \mathbb{S}_xM$:

$$ \operatorname{div} \tilde X_k(x,v) = 2 \operatorname{div} X_k(x) - n \langle v, \nabla X_k(x) v \rangle. \tag{2.3} $$

Proof.

First we note that, in light of the orthogonal splitting $T_w\mathbb{S}M = T_xM \oplus T_v(\mathbb{S}_xM)$, we have

$$ \operatorname{div} \tilde X_k(x,v) = \operatorname{div} X_k(x) + \operatorname{div} V_{\nabla X_k(x)}(v), $$

where for a fixed $x \in M$ the divergence $\operatorname{div} V_{\nabla X_k(x)}(v)$ is the divergence of $V_{\nabla X_k(x)}(v)$ treated as a vector field on the sphere $\mathbb{S}^{n-1}(T_xM)$. Since $T_xM$ is isomorphic to $\mathbb{R}^n$, it suffices to show that for any linear operator $A : \mathbb{R}^n \to \mathbb{R}^n$ the following identity holds:

$$ \operatorname{div} V_A(v) = \operatorname{tr}(A) - n \langle v, Av \rangle. \tag{2.4} $$

To show this, we first compute the covariant derivative $\nabla V_A$ using the embedding of $\mathbb{S}^{n-1}$ in $\mathbb{R}^n$. Specifically, we use that $\nabla V_A$ is related to the Euclidean differential $DV_A(v) : T_v\mathbb{S}^{n-1} \to \mathbb{R}^n$ by projecting its range onto $T_v\mathbb{S}^{n-1}$ via $\Pi_v = I - v \otimes v^\sharp$. Recalling that $V_A(v) = \Pi_v A v$, a simple calculation shows that for each $v \in \mathbb{S}^{n-1}$, the Euclidean differential is

$$ DV_A(v) = \Pi_v A - \langle v, Av \rangle I - v \otimes (Av)^\sharp. $$

Projecting onto $T_v\mathbb{S}^{n-1}$ eliminates the normal term $v \otimes (Av)^\sharp$, giving the following formula for the covariant derivative:

$$ \nabla V_A(v) = \Pi_v A - \langle v, Av \rangle I, $$

which implies

$$ \operatorname{div} V_A(v) = \operatorname{tr}(\nabla V_A) = \operatorname{tr}_{T_v\mathbb{S}^{n-1}}(A) - (n-1) \langle v, Av \rangle, \tag{2.5} $$

where $\operatorname{tr}_{T_v\mathbb{S}^{n-1}}(A)$ denotes the trace of $A$ restricted to the $(n-1)$-dimensional subspace $T_v\mathbb{S}^{n-1}$. To compute this, fix $v \in \mathbb{S}^{n-1}$, let $\{e_1, e_2, \ldots, e_{n-1}\}$ be an orthonormal basis for $T_v\mathbb{S}^{n-1} = v^\perp$, and note that $\{e_1, e_2, \ldots, e_{n-1}, v\}$ is an orthonormal basis for $\mathbb{R}^n$. Therefore we have

$$ \operatorname{tr}_{T_v\mathbb{S}^{n-1}}(A) = \sum_{i=1}^{n-1} \langle e_i, A e_i \rangle = \operatorname{tr}(A) - \langle v, Av \rangle. $$

Upon substituting this expression into (2.5), we obtain (2.4).

We will need the following enhancement of the multiplicative ergodic theorem, which says that under some ergodicity assumptions on the projective process $w_t$, one achieves $\lambda_1$ exponential growth in every tangent direction with probability 1. For each $(t,x) \in \mathbb{R}_+ \times M$, let $D\Phi^t(x) : T_xM \to T_{x_t}M$ be the Jacobian of the stochastic flow $\Phi^t$ at $x \in M$. 
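The identity (2.4) from the proof of Lemma 2.2 is easy to test numerically. The sketch below is an illustration only and not part of the paper's argument: it extends $V_A(v) = \Pi_v A v$ off the sphere by the polynomial formula $Au - \langle u, Au \rangle u$, approximates the Euclidean differential along an orthonormal basis of $v^\perp$ by central differences, and sums the tangential diagonal entries to obtain the divergence on $\mathbb{S}^{n-1}$. All function names, the dimension and the step size `h` are hypothetical choices.

```python
import numpy as np


def tangent_field(A, u):
    """Polynomial extension of V_A(v) = A v - <v, A v> v to all of R^n."""
    return A @ u - (u @ A @ u) * u


def tangent_basis(v, rng):
    """Orthonormal basis of v^perp, from a QR factorization of [v | random]."""
    n = v.size
    M = np.column_stack([v, rng.standard_normal((n, n - 1))])
    Q, _ = np.linalg.qr(M)
    return Q[:, 1:]  # first column of Q is +-v, the rest span v^perp


def sphere_divergence(A, v, rng, h=1e-4):
    """Trace of the tangential part of DV_A(v), via central differences."""
    total = 0.0
    for e in tangent_basis(v, rng).T:
        dV = (tangent_field(A, v + h * e) - tangent_field(A, v - h * e)) / (2.0 * h)
        total += e @ dV
    return total


rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))
v = rng.standard_normal(n)
v /= np.linalg.norm(v)

lhs = sphere_divergence(A, v, rng)
rhs = np.trace(A) - n * (v @ A @ v)
print(abs(lhs - rhs))
```

The printed discrepancy should be at the level of the finite-difference error. Any smooth extension of $V_A$ off the sphere would do here, since only derivatives in tangent directions enter the divergence.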
The following is a corollary of, e.g., Theorem III.1.2 in [34]. Theorem 2.3.

Suppose that Assumptions 1 and 2 hold. Let $\nu$ be the unique stationary measure for $(w_t)$. Then, for $\nu$-almost every $w = (x,v) \in \mathbb{S}M$ we have

$$ \lambda_1 = \lim_{t \to \infty} \frac{1}{t} \log |D\Phi^t(x) v| \quad \text{with probability 1}. $$

We are now ready to prove the Furstenberg-Khasminskii formula for (2.1). A sketch of its proof is included for completeness.

Proposition 2.4 (Furstenberg-Khasminskii). Define for each $x \in M$

$$ Q(x) := \operatorname{div} X_0(x) + \frac{1}{2} \sum_{k=1}^r X_k \operatorname{div} X_k(x) $$

and for each $w \in \mathbb{S}M$

$$ \tilde Q(w) := \operatorname{div} \tilde X_0(w) + \frac{1}{2} \sum_{k=1}^r \tilde X_k \operatorname{div} \tilde X_k(w). $$

Suppose that $(w_t)$ has a unique stationary probability measure $\nu$ on $\mathbb{S}M$ that projects to $\mu$ on $M$, and that $Q \in L^1(\mu)$ and $\tilde Q \in L^1(\nu)$. Then the following formulas hold:

$$ \lambda_\Sigma = \int_M Q\, d\mu, \tag{2.6} $$

$$ n\lambda_1 - 2\lambda_\Sigma = -\int_{\mathbb{S}M} \tilde Q\, d\nu. \tag{2.7} $$

Proof.

We begin by proving (2.6). A standard calculation relating determinants to traces gives

$$ d \log |\det D\Phi^t(x)| = \operatorname{tr} \nabla X_0(x_t)\, dt + \sum_{k=1}^r \operatorname{tr} \nabla X_k(x_t) \circ dW^k_t = \operatorname{div} X_0(x_t)\, dt + \sum_{k=1}^r \operatorname{div} X_k(x_t) \circ dW^k_t. $$

Upon converting to Itô and integrating in time, we obtain

$$ \frac{1}{t} \log |\det D\Phi^t(x)| = \frac{1}{t} \int_0^t \operatorname{div} X_0(x_s)\, ds + \frac{1}{2} \sum_{k=1}^r \frac{1}{t} \int_0^t X_k \operatorname{div} X_k(x_s)\, ds + \frac{1}{t} M_t = \frac{1}{t} \int_0^t Q(x_s)\, ds + \frac{1}{t} M_t, $$

where $M_t$ is a mean-zero martingale arising from the Itô integral. We now take $t \to \infty$: the LHS converges to $\lambda_\Sigma$ by Theorem 1.3, while the first term on the RHS converges to $\int Q\, d\mu$ by the ergodic theorem. In particular, $\frac{1}{t} M_t$ must also converge, both pointwise and in $L^1(\mathbf{P} \times \mu)$, hence $\frac{1}{t} M_t \to 0$ and (2.6) follows.

Likewise, to prove (2.7), a straightforward computation and formula (2.3) yield

$$ d \log |D\Phi^t(x) v| = \langle v_t, \nabla X_0(x_t) v_t \rangle\, dt + \sum_{k=1}^r \langle v_t, \nabla X_k(x_t) v_t \rangle \circ dW^k_t $$

$$ = \frac{1}{n} \Big( 2 \operatorname{div} X_0(x_t) - \operatorname{div} \tilde X_0(w_t) \Big)\, dt + \frac{1}{n} \sum_{k=1}^r \Big( 2 \operatorname{div} X_k(x_t) - \operatorname{div} \tilde X_k(w_t) \Big) \circ dW^k_t. $$

Converting to Itô gives

$$ \frac{1}{t} \log |D\Phi^t(x) v| = \frac{1}{nt} \int_0^t \Big( 2 \operatorname{div} X_0(x_s) - \operatorname{div} \tilde X_0(w_s) \Big)\, ds + \frac{1}{nt} \sum_{k=1}^r \int_0^t \Big( X_k \operatorname{div} X_k(x_s) - \frac{1}{2} \tilde X_k \operatorname{div} \tilde X_k(w_s) \Big)\, ds + \frac{1}{t} \tilde M_t $$

$$ = \frac{2}{nt} \int_0^t Q(x_s)\, ds - \frac{1}{nt} \int_0^t \tilde Q(w_s)\, ds + \frac{1}{t} \tilde M_t, $$

with $\tilde M_t$ another mean-zero martingale. The proof is complete on sending $t \to \infty$ and applying the ergodic theorem, this time using Theorem 2.3 to ensure the LHS converges to $\lambda_1$.

2.3 Fisher information identity

In this section, we prove Proposition 1.4. As discussed in Section 1.1, the Markov process $(w_t)$ on the sphere bundle $\mathbb{S}M$ has the following generator in Hörmander form:

$$ \tilde{\mathcal{L}} = \tilde X_0 + \frac{1}{2} \sum_k \tilde X_k^2. $$

As discussed in the introduction, we will be working in the setting where the process $(w_t)$ admits a unique stationary probability measure $\nu$ on $\mathbb{S}M$ with smooth density $f(w)$ with respect to the volume measure $dq$ on $\mathbb{S}M$, satisfying $\int f\, dq = 1$. The stationary density $f$ solves the following PDE:

$$ \tilde{\mathcal{L}}^* f = \tilde X_0^* f + \frac{1}{2} \sum_{k=1}^r (\tilde X_k^*)^2 f = 0, \tag{2.8} $$

where for a given vector field $\tilde X$ on $\mathbb{S}M$, $\tilde X^*$ denotes the formal adjoint operator with respect to $L^2(dq)$. Note that the differential operator $\tilde X^*$ can be related to $\tilde X$ and $\operatorname{div} \tilde X$ through the following relation:

$$ \tilde X^* h = -\tilde X h - (\operatorname{div} \tilde X) h, \quad h \in C_c^\infty(\mathbb{S}M). \tag{2.9} $$

We are now ready to prove Proposition 1.4.

Proof of Proposition 1.4. Formally, the argument is straightforward. Consider first the second equality in (1.6). Pairing (2.8) with $\log f$ and integrating gives

$$ -\frac{1}{2} \sum_{k=1}^r \int_{\mathbb{S}M} (\log f)\, (\tilde X_k^*)^2 f\, dq = \int_{\mathbb{S}M} (\log f)\, \tilde X_0^* f\, dq. $$

Ignoring, for the moment, that $f$ is not compactly supported, integrating by parts a few times and using (2.9) gives for the left-hand side

$$ -\frac{1}{2} \sum_{k=1}^r \int_{\mathbb{S}M} (\log f)\, (\tilde X_k^*)^2 f\, dq = -\frac{1}{2} \sum_{k=1}^r \int_{\mathbb{S}M} \frac{(\tilde X_k f)(\tilde X_k^* f)}{f}\, dq = FI(f) + \frac{1}{2} \sum_{k=1}^r \int_{\mathbb{S}M} (\tilde X_k \operatorname{div} \tilde X_k)\, f\, dq, $$

whereas for the right-hand side we have

$$ \int (\log f)\, \tilde X_0^* f\, dq = \int \tilde X_0 f\, dq = -\int (\operatorname{div} \tilde X_0)\, f\, dq. $$

Putting these two identities together yields

$FI(f) = -\int \tilde Q\, d\nu$ and therefore (1.6). The formula for $FI(\rho)$, with $\rho = \frac{d\mu}{dx}$ the stationary density for $(x_t)$, follows from an identical argument, omitted for brevity, once one observes that $\rho$ solves the Kolmogorov equation

$$ X_0^* \rho + \frac{1}{2} \sum_{j=1}^r (X_j^*)^2 \rho = 0. $$

To rigorously justify the above formal calculation, we need to be a little more careful with the integration by parts and make use of the $f \log f$ integrability assumption. Let $\chi_R \in C_c^\infty(B(0,R))$ satisfy $0 \le \chi_R \le 1$ with $\chi_R(x) = 1$ in $B(0,R/2)$, where $B(0,R)$ is the geodesic ball of radius $R$ on $M$. Multiplying both sides of (2.8) by $(\log f)\chi_R$ and following the above procedure gives

$$ \frac{1}{2} \sum_k \int_{\mathbb{S}M} \frac{|\tilde X_k^* f|^2}{f}\, \chi_R\, dq = -\int_{\mathbb{S}M} \tilde Q f \chi_R\, dq + \int_{\mathbb{S}M} (\mathcal{L}\chi_R)(f \log f - f)\, dq. $$

Using the fact that $\chi_R \to 1$, $|\mathcal{L}\chi_R| \lesssim 1$ uniformly in $R$, and $\mathcal{L}\chi_R \to 0$ pointwise as $R \to \infty$, together with the fact that $f \log f - f \in L^1$, we apply the dominated convergence theorem to pass to the limit as $R \to \infty$.

Turn next to (1.7). We give only the formal proof; the rigorous proof by the dominated convergence theorem is analogous, given the regularity provided by (1.6). For this we observe that (writing $f = h\rho$ and denoting by $dv$ Lebesgue measure on $\mathbb{S}_xM$)

$$ \tilde X_k^*(h\rho) = \big( V^*_{\nabla X_k} h - \tilde X_k h \big)\rho + (X_k^* \rho)\, h, $$

and therefore, since $\int_{\mathbb{S}_xM} h_x(v)\, dv = 1$, we find

$$ FI(f) - FI(\rho) = \frac{1}{2} \sum_{k=1}^r \int_{\mathbb{S}M} \frac{\big| (V^*_{\nabla X_k} h - \tilde X_k h)\rho + (X_k^* \rho) h \big|^2}{h\rho}\, dq - \frac{1}{2} \sum_{k=1}^r \int_{\mathbb{S}M} \frac{|X_k^* \rho|^2}{\rho}\, h\, dq $$

$$ = \frac{1}{2} \sum_{k=1}^r \int_M \left( \int_{\mathbb{S}_xM} \frac{\big| (X_k - V^*_{\nabla X_k(x)}) h_x(v) \big|^2}{h_x(v)}\, dv \right) d\mu(x) + \sum_{k=1}^r \int_{\mathbb{S}M} \big( V^*_{\nabla X_k} h - X_k h \big) (X_k^* \rho)\, dq. $$

However, for each $k$,

$$ \int_{\mathbb{S}M} \big( V^*_{\nabla X_k} h - X_k h \big) (X_k^* \rho)\, dq = -\int_{\mathbb{S}M} X_k h\, (X_k^* \rho)\, dq = -\int_{\mathbb{S}M} h\, (X_k^*)^2 \rho\, dq = 0. $$

In this section we give a formal argument for the Fisher information identity using the proper analogue of the relative entropy formula (1.17), measuring the degree to which the degeneracy (1.19) fails to hold. 
We have already given a complete proof above; this section is simply a way to get some additional intuition regarding the meaning behind Proposition 1.4. Hence, in this section we do not endeavor to give a complete proof. Furthermore, for technical simplicity, in this section we only consider the case in which $M$ is compact (still with no boundary).

In preparation, recall that given two measures $\lambda, \eta$ on $\mathbb{S}M$ with $\eta \ll \lambda$, we define the relative entropy of $\eta$ with respect to $\lambda$ by

$$ H(\eta \,|\, \lambda) := \int_{\mathbb{S}M} \log\left( \frac{d\eta}{d\lambda} \right) d\eta. $$

Since we work frequently with absolutely continuous measures, we abuse notation somewhat and also write $H(f \,|\, g)$ for the relative entropy of $f\, dq$ with respect to $g\, dq$. Recall that $H(f \,|\, g) = 0$ if and only if $f \equiv g$.

In what follows, we let $\hat\Phi^t$ be the stochastic flow of diffeomorphisms on $\mathbb{S}M$ induced by the SDE governing the projective process $(w_t)$. Given a smooth density $f \in L^1(\mathbb{S}M)$, let

$$ f_t := (\hat\Phi^t)_* f = f \circ (\hat\Phi^t)^{-1}\, \big| \det D(\hat\Phi^t)^{-1} \big| $$

be the pushforward of $f$ as a measure on $\mathbb{S}M$. The density $f_t$ can readily be seen to solve the stochastic continuity equation

$$ df_t = \tilde{\mathcal{L}}^* f_t\, dt + \sum_{k=1}^r \tilde X_k^* f_t\, dW^k_t, $$

and satisfies $f_t \to f$ locally uniformly on $\mathbb{S}M$.

In [11], Baxendale derived the following formula (inspired by one of Furstenberg [26] in the context of IID compositions of random matrices):

Theorem 2.5 (Baxendale [11]). Under Assumptions 1 & 2, writing $f = \frac{d\nu}{dq}$ for the stationary density of $(w_t)$ and $\rho = \frac{d\mu}{dx}$ for that of $(x_t)$, and denoting $f_t = (\hat\Phi^t)_* f$, $\rho_t = (\Phi^t)_* \rho$, one has the following:

$$ \mathbf{E} H(\rho_t \,|\, \rho) = -t\lambda_\Sigma, \qquad \mathbf{E}\big( H(f_t \,|\, f) - H(\rho_t \,|\, \rho) \big) = t\,(n\lambda_1 - \lambda_\Sigma). \tag{2.10} $$

The second line can be rewritten (using, e.g., Lemma 3.2 in [11]) in the following highly suggestive form. Letting $(\nu_x)$ denote the disintegration measures of $\nu$ (as integrands, $d\nu(x,v) = d\nu_x(v)\, d\mu(x)$; see Section 5.2), one has

$$ \mathbf{E} \int H\big( D\Phi^t(x)_* \nu_x \,\big|\, \nu_{\Phi^t(x)} \big)\, d\mu(x) = t\,(n\lambda_1 - \lambda_\Sigma). \tag{2.11} $$

This form directly encodes the Furstenberg criterion (1.19) for $n\lambda_1 - \lambda_\Sigma = 0$: naturally, if $n\lambda_1 - \lambda_\Sigma = 0$ then one must have $(D_x\Phi^t)_* \nu_x = \nu_{x_t}$ for all $t$. Indeed, one might hope to extract quantitative information about the gap $n\lambda_1 - \lambda_\Sigma$ using (2.11), although to our best knowledge this has not been done.

On the other hand, our Fisher information identity can be thought of as the time-infinitesimal analogue of (2.10), as the following shows.

Lemma 2.6.

$$ FI(\rho) = \lim_{t \to 0} \frac{1}{t}\, \mathbf{E} H(\rho_t \,|\, \rho), \qquad FI(f) = \lim_{t \to 0} \frac{1}{t}\, \mathbf{E} H(f_t \,|\, f). $$

Consequently, $FI(f) = n\lambda_1 - 2\lambda_\Sigma$. In view of (2.10), we see that $FI(f) - FI(\rho)$ measures the rate at which the projective dynamics $\hat\Phi^t$ distorts the stationary fiber measures $(\nu_x)_{x \in M}$.

Proof.

We include a sketch of the proof, ignoring technical details related to localization and convergence of $\rho_t \to \rho$, $f_t \to f$. We present here the proof for $f_t$; the proof for $\rho_t$ largely follows the same lines and is omitted.

Using the equation for $f_t$, we can apply Itô's lemma to obtain the following stochastic equation that holds pointwise on $\mathbb{S}M$:

$$ d\left[ f_t \log\left( \frac{f_t}{f} \right) \right] = \frac{1}{2} \sum_{k=1}^r \frac{|\tilde X_k^* f_t|^2}{f_t}\, dt + \left[ \log\left( \frac{f_t}{f} \right) + 1 \right] (df_t). $$

Integrating in time (and using $f_0 = f$), we obtain

$$ f_t \log\left( \frac{f_t}{f} \right) = \frac{1}{2} \sum_{k=1}^r \int_0^t \frac{|\tilde X_k^* f_s|^2}{f_s}\, ds + \int_0^t (\tilde{\mathcal{L}}^* f_s) \left[ \log\left( \frac{f_s}{f} \right) + 1 \right] ds + M_t, $$

where $M_t$ is a mean-zero martingale whose exact form is not important. Dividing by $t$, integrating over $\mathbb{S}M$, using Fubini, and averaging with respect to $\mathbf{E}$ gives

$$ \frac{1}{t}\, \mathbf{E} H(f_t \,|\, f) = \frac{1}{t}\, \mathbf{E} \int_0^t FI(f_s)\, ds + \frac{1}{t}\, \mathbf{E} \int_0^t \int_{\mathbb{S}M} (\tilde{\mathcal{L}}^* f_s) \log\left( \frac{f_s}{f} \right) dq\, ds, $$

where we used the fact that $\int_{\mathbb{S}M} \tilde{\mathcal{L}}^* f_s\, dq = 0$. Sending $t \to 0$ and assuming that we can pass to the limit $f_t \to f$ in both terms on the right-hand side gives the result, since $\log\left( \frac{f_t}{f} \right) \to 0$.

This section is dedicated to the proof of Theorem 1.9. We start with some notation and conventions to set up our main result, the statement of the quantitative hypoelliptic regularity estimate in Theorem 3.2.

The proof we present has little directly to do with $\mathbb{S}M$, and so throughout Section 3 we replace $\mathbb{S}M$ with an arbitrary, connected, orientable Riemannian manifold $(\mathcal{M}, g)$ with volume element $dq$. Some notation: in what follows we denote $d = \dim \mathcal{M}$, and write $\mathfrak{X}(\mathcal{M})$ for the set of smooth vector fields on $\mathcal{M}$. Elements $X \in \mathfrak{X}(\mathcal{M})$ are regarded in the usual way as first-order differential operators acting on observables $w : \mathcal{M} \to \mathbb{R}$.

Throughout, $\{X_0^\epsilon, X_1^\epsilon, \ldots, X_r^\epsilon\} \subset \mathfrak{X}(\mathcal{M})$ is a fixed collection of smooth vector fields (note that since this section is not specific to $\mathbb{S}M$, we will drop the tildes for notational simplicity). 
We are interested in studying the regularity of the family $\{f^\epsilon\, dq\} \subset \mathcal{P}(\mathcal{M})$ of smooth, absolutely continuous probability densities solving the stationary forward Kolmogorov equation

$$ (X_0^\epsilon)^* f^\epsilon + \frac{\epsilon}{2} \sum_{j=1}^r \big( (X_j^\epsilon)^* \big)^2 f^\epsilon = 0. \tag{3.1} $$

In what follows we will drop the $\epsilon$ superscript on the vector fields for notational simplicity.

Regularity is estimated using the following 'fractional' norms, which arise naturally in our analysis. To define these, let $\{x_j\}$ be a countable family of smooth injective mappings $x_j : B_{2\delta}(0) \to \mathcal{M}$, $B_{2\delta}(0) \subset \mathbb{R}^d$, such that both $\tilde U_j := x_j(B_{2\delta}(0))$ and $U_j := x_j(B_\delta(0))$ are covers of $\mathcal{M}$ for which $\operatorname{diam} \tilde U_j < \infty$ and every $q \in \mathcal{M}$ is in at most finitely many $\tilde U_j$. Let $\{\chi_j\}$ be a smooth partition of unity on $\mathcal{M}$ subordinate to the cover $\{U_j\}$, i.e., (i) $0 \le \chi_j \le 1$ everywhere, (ii) $\chi_j|_{U_j} \equiv 1$, and (iii) $\chi_j$ is supported in $\tilde U_j$.

Fractional $L^1$ Sobolev spaces $W^{s,1}$ with $s \in (0,1)$ are defined by

$$ \|w\|_{W^{s,1}} = \|w\|_{L^1} + \sum_j \int_{|h| < \delta} \int_{\mathbb{R}^d} \frac{|\tilde w_j(x+h) - \tilde w_j(x)|}{|h|^{d+s}}\, J_j(x)\, dx\, dh. \tag{3.2} $$

In practice, though, it is easier to work with the following $L^1$ Hölder-type regularity class (essentially the Besov space $B^s_{1,\infty}$): for $s \in (0,1)$,

$$ \|w\|_{\Lambda^s} = \|w\|_{L^1} + \sup_{h \in \mathbb{R}^d : |h| < \delta} \sum_j \int_{\mathbb{R}^d} \frac{|\tilde w_j(x+h) - \tilde w_j(x)|}{|h|^s}\, J_j(x)\, dx, \tag{3.3} $$

where $\tilde w_j = (\chi_j w) \circ x_j$ and $J = J_j : B_{2\delta}(0) \to \mathbb{R}_{\ge 0}$ is the coordinate representation of the volume element in the chart $(U_j, x_j)$. The following embedding is clear: for all $0 < s < s' < 1$ and any $w \in C_c^\infty(U)$ with $U \subset \mathcal{M}$ open and bounded, we have $\|w\|_{W^{s,1}} \lesssim \|w\|_{\Lambda^{s'}}$.

Given

$X, Y \in \mathfrak{X}(\mathcal{M})$, the adjoint action of $X$ on $Y$ is defined through the Lie bracket

$$ \operatorname{ad}(X) Y = [X, Y]. $$

For a multi-index $I = (i_1, \ldots, i_k)$, $i_j \in \{0, \ldots, r\}$ for each $1 \le j \le k$, we denote

$$ X_I = \operatorname{ad}(X_{i_1}) \cdots \operatorname{ad}(X_{i_{k-1}}) X_{i_k}. $$

In what follows, set $s_0 = \frac{1}{2}$, $s_j = 1$ for $1 \le j \le r$, and for a multi-index $I = (i_1, \ldots, i_k)$ we write

$$ m(I) := \frac{1}{s(I)} := \sum_{j=1}^k \frac{1}{s_{i_j}}. $$

Note that $m(I)$ provides a measure of how "deep" a bracket is (i.e., the larger $m(I)$, the more brackets were taken), weighted in a way that will be consistent with available regularity.

We denote by $\mathfrak{X}^s(\mathcal{M}) \subset \mathfrak{X}(\mathcal{M})$ the $C^\infty(\mathcal{M})$-submodule of vector fields generated from successive brackets with $s \le s(I)$, that is,

$$ \mathfrak{X}^s(\mathcal{M}) = \left\{ Z \in \mathfrak{X}(\mathcal{M}) : Z = \sum_{j=1}^N h_j X_{I_j}, \quad s(I_j) \ge s, \quad h_j \in C^\infty(\mathcal{M}) \right\}. $$

Recall that $\{X_j\}_{j=0}^r = \{X_j^\epsilon\}_{j=0}^r \subset \mathfrak{X}(\mathcal{M})$ depend in a general manner on a parameter $\epsilon \in (0,1]$, hence $\mathfrak{X}^s(\mathcal{M})$ also depends on $\epsilon$. This dependence is constrained only by the following 'uniform' version of the parabolic Hörmander condition:

Definition 3.1 (Uniform parabolic Hörmander). Let $\{Z_0^\epsilon, Z_1^\epsilon, \ldots, Z_r^\epsilon\} \subset \mathfrak{X}(\mathcal{M})$ be a set of vector fields parameterized by $\epsilon \in (0,1]$. With $\mathcal{X}_k$ defined as in Definition 1.7, we say $\{Z_0^\epsilon, Z_1^\epsilon, \ldots, Z_r^\epsilon\}$ satisfies the uniform parabolic Hörmander condition on $\mathcal{M}$ if there exists $k \in \mathbb{N}$ such that for any open, bounded set $U \subseteq \mathcal{M}$ there exist constants $\{K_n\}_{n=0}^\infty$ such that for all $\epsilon \in (0,1]$ and all $x \in U$, there is a subset $V(x) \subset \mathcal{X}_k$ such that for all $\xi \in \mathbb{R}^d$

$$ |\xi| \le K_0 \sum_{Z \in V(x)} |Z(x) \cdot \xi|, \qquad \sum_{Z \in V(x)} \|Z\|_{C^n} \le K_n. $$

Assuming, as we do, that $\{X_j^\epsilon\}$ satisfies the uniform parabolic Hörmander condition, a simple consequence is that there exists $s > 0$ such that for all $\epsilon \in (0,1]$, $\mathfrak{X}^s(\mathcal{M}) = \mathfrak{X}(\mathcal{M})$. Once and for all, fix $s_* > 0$ so that $\mathfrak{X}^{s_*}(\mathcal{M}) = \mathfrak{X}(\mathcal{M})$.

We now prove the following variant of Theorem 1.9. 
Footnote: Indeed, if $\{Z_1, \ldots, Z_m\}$ span $T_w\mathcal{M}$ then there exists $\delta > 0$ such that for all $v \in B_\delta(w) = \{v \in \mathcal{M} : d(w,v) < \delta\}$ the same vector fields span, and so for $V \in \mathfrak{X}(\mathcal{M})$ there exist $c_j \in C^\infty$ such that $V = \sum c_j Z_j$ on $B_\delta$. The result then follows by a suitable partition of unity.

Theorem 3.2. We assume that for all $\epsilon \in (0,1]$ the PDE (3.1) admits a unique, smooth probability measure solution which satisfies $f^\epsilon \log f^\epsilon \in L^1(dq)$. Assume $\{X_0^\epsilon, X_1^\epsilon, \ldots, X_r^\epsilon\}$ satisfies the uniform parabolic Hörmander condition as in Definition 3.1. Then, for every open, bounded $U \subset \mathcal{M}$ there exists $C > 0$ such that for all $\epsilon \in (0,1]$ there holds

$$ \|f^\epsilon\|_{\Lambda^{s_*}(U)} \le C \Big( 1 + \sqrt{FI(f^\epsilon)} \Big). $$

Moreover, the constant $C$ can be chosen to depend only on $U$, $d$ and the constants $k$ and $\{K_n\}_{n=0}^J$ (for a $J$ depending only on $k$ and $d$) in Definition 3.1.

Remark 3.3.

One can check from the proof that k < s ≤ k where k is as in Deﬁnition 3.1.The remainder of this section is devoted to proving Theorem 3.2. The following is a brief outline ofwhat is to come in the remainder of Section 3. Outline of the proof of Theorem 3.2

Crucial to both Hörmander's original approach and our own is the ability to measure partial regularity of a function along some given set of directions. To make sense of this, for a vector field $Y \in \mathfrak{X}(M)$ and $s > 0$, we define below the seminorm $|\cdot|_{Y,s}$, which measures $L^1$ Hölder regularity along the direction $Y$.

Let us make this more precise. Throughout, we fix an open bounded set $U \subset M$. Given $Y \in \mathfrak{X}(M)$, let $Y^*$ denote its formal adjoint and let $e^{tY^*}$ denote the linear propagator solving the partial differential equation $\partial_t - Y^* = 0$ (this makes sense as long as $t > 0$ is taken sufficiently small depending on $Y$ and $U$). For $s > 0$, we define the family of 'partial' $L^1$ Hölder seminorms
\[ |w|_{Y,s} = \sup_{|t| \le \delta_0} |t|^{-s} \left\| e^{-tY^*} w - w \right\|_{L^1}. \]
[Footnote: Note that these seminorms are slightly different from those used in [29], where the linear propagator $e^{tY}$ solving $\partial_t - Y = 0$ is used directly. Note, though, that the regularity defined is essentially the same in the sense that
\[ \|w\|_{L^1} + \sup_{|t| \le \delta_0} |t|^{-s} \left\| e^{-tY^*} w - w \right\|_{L^1} \approx_{\delta_0, U, H} \|w\|_{L^1} + \sup_{|t| \le \delta_0} |t|^{-s} \left\| e^{tY} w - w \right\|_{L^1}. ] \]
Note the dependence on the parameter $\delta_0 > 0$: in practice, given $U$, this parameter is fixed and depends only on the regularity of $\{X_I\}$ as $I$ ranges over the multi-indices with $s(I) \ge s_*$. We may choose the parameter in this way because the vector fields in the proof vary in a uniformly bounded set in $C^J$, for a $J$ depending only on the constants in Definition 3.1.

Turning back to the proof of Theorem 3.2: ultimately, for $f = f^\epsilon$ solving the Kolmogorov equation (3.1), we seek to control $\|f^\epsilon\|_{\Lambda^{s_*}(U)}$ from above in terms of $FI(f^\epsilon)$. Starting from the latter, it is straightforward (Lemma 3.4) to obtain the general functional inequality
\[ \sum_{j=1}^r |w|_{X_j,1} \lesssim \|w\|_{L^1}^{1/2} \sqrt{FI(w)} \tag{3.4} \]
for any $w \in C^\infty(U)$. Hence, for all intents and purposes it suffices to control the regularity $\|f\|_{\Lambda^{s_*}}$ from above in terms of $\sum_{j \ge 1} |f|_{X_j,1}$.

For this, we turn to the ideas laid out by Hörmander. First, the spanning condition $\mathfrak{X}_{s_*} = \mathfrak{X}$ allows us to "fill in" the missing directions not spanned by the original $\{X_0, \dots, X_r\}$, leading to the general functional inequality
\[ \|w\|_{\Lambda^{s_*}} \lesssim_U \|w\|_{L^1} + \sum_{j=0}^r |w|_{X_j,s_j}, \qquad w \in C^\infty(U). \tag{3.5} \]
This is a straightforward adaptation of [Section 4; [29]]; see Lemma 3.7 below.

While (3.4) controls $|w|_{X_j,1}$, $1 \le j \le r$, in terms of the Fisher information $FI(w)$, it remains (as in [29]) to obtain an upper estimate on $|f^\epsilon|_{X_0,1/2}$. The starting point is the derivation of an a priori estimate on $f^\epsilon$ from (3.1). In [29], Hörmander observed that one naturally obtains an a priori regularity estimate on $X_0 f$ in a negative regularity $L^2$ space in terms of $X_j f \in L^2$ (see also discussions in [4]). In our case, we cannot work in $L^2$, and instead have to work in a negative-type regularity which is essentially the dual to that in (3.4); this is the only a priori estimate available that will be useful. Pairing (3.1) with a test function $v \in C^\infty$ we obtain the following, which is essentially the $W^{-1,\infty}$ norm with respect to the $X_j^*$ directions:
\[ D_\epsilon(f^\epsilon) := \sup_{v \in C^\infty :\ \|v\|_{L^\infty} + \sum_{j=1}^r \|X_j v\|_{L^\infty} \le 1} \left| \int f^\epsilon X_0^\epsilon v \right| \le \epsilon \sum_{j=1}^r \left\| X_j^* f^\epsilon \right\|_{L^1} \lesssim \epsilon \sqrt{FI}. \tag{3.6} \]
Using this, the missing $X_0$ regularity is recovered by the following, which is the main difficulty in the proof: for any $0 < \sigma < s_*$, any bounded, open set $U \subset M$ and $w \in C_c^\infty(U)$, we show that
\[ |w|_{X_0,1/2} \lesssim_U \sum_{j=1}^r |w|_{X_j,1} + D_\epsilon(w) + \|w\|_{\Lambda^\sigma}. \tag{3.7} \]
That is, we recover the $|w|_{X_0,1/2}$ regularity by a combination of the negative $D_\epsilon$ regularity in conjunction with the positive $|w|_{X_j,1}$ regularity, accruing only a remainder term $\|w\|_{\Lambda^\sigma}$. Combining with (3.5) (along with interpolation of $\Lambda^\sigma$ between $\Lambda^{s_*}$ and $L^1$), we obtain the following: $\forall U \subset M$ open, bounded, $\exists C > 0$ such that for all $w \in C_c^\infty(U)$, there holds
\[ \|w\|_{\Lambda^{s_*}(U)} \le C \Big( \|w\|_{L^1} + D_\epsilon(w) + \sum_{j=1}^r |w|_{X_j,1} \Big). \tag{3.8} \]
From here, our estimate on $\|f^\epsilon\|_{\Lambda^{s_*}(U)}$ in Theorem 3.2 is an easy consequence of the functional inequality (3.4) and the a priori estimate in (3.6).

In Section 3.2 we review the available a priori estimates and basic functional inequalities that are used in the proof. In Section 3.3 we briefly recall (3.5) and a closely related inequality, which are straightforward adaptations of estimates in [Section 4; [29]]. In Section 3.4 we give the proof of (3.7), leaving the main lemma to be proved in Section 3.5. As in the corresponding step in [29], (3.7) is based on a careful regularization procedure, though it is more subtle to perform this procedure in the $W^{-1,\infty}$-type framework we work with here. Section 3.5 is dedicated to the details of this regularization.

To start, we record some useful estimates for the $L^1$ Hölder-type seminorms $|\cdot|_{Y,s}$. Let $Y \in \mathfrak{X}(M)$ and let $e^{tY}$ be the linear propagator of the partial differential operator $\partial_t - Y$. By the method of characteristics, the smooth family of diffeomorphisms $h_Y(t) : M \to M$ solving the initial value problem $\dot{x} = Y(x)$ satisfies the identity
\[ e^{tY} w = w \circ h_Y(t). \]
With $Y^*$ the formal adjoint of $Y$, again by the method of characteristics there is a smooth family of strictly positive densities $H_Y(t) : M \to (0, \infty)$ such that
\[ e^{-tY^*} w = H_Y(t)\, w \circ h_Y(t) = H_Y(t)\, e^{tY} w. \tag{3.9} \]
In particular, for $|t| \lesssim 1$,
\[ |H_Y - 1| \lesssim |t|, \tag{3.10} \]
with similar estimates on higher derivatives.

Next, we prove (3.4): that $\|X_j^* w\|_{L^1}$ controls one derivative in the $L^1$-Hölder norms.

Lemma 3.4.

Let $U \subset M$ be a bounded, open set. Then, $\forall w \in C_c^\infty(U)$ there holds
\[ |w|_{X_j,1} \lesssim_U \|X_j^* w\|_{L^1} \lesssim \|w\|_{L^1}^{1/2} \sqrt{FI(w)}. \]

Proof.

Let $v \in L^\infty$; then
\[ \left| \int_M v \left( e^{-tX_j^*} w - w \right) dq \right| = \left| \int_0^t \int_M v\, e^{-sX_j^*} X_j^* w \, dq\, ds \right| \le |t|\, \|v\|_{L^\infty} \|X_j^* w\|_{L^1}. \]
Taking the supremum over $\|v\|_{L^\infty} \le 1$ and dividing by $|t|$ gives the first inequality, whereas the second follows by Cauchy-Schwarz.

Lastly, we record the simple observation that the negative regularity $D_\epsilon$ can be localized.

Lemma 3.5.

Let $U \subseteq M$ be an open, bounded set and $\chi \in C_c^\infty(U)$. Then, for any $h \in L^1(M)$, we have
\[ D_\epsilon(\chi h) \lesssim_U \|h\|_{L^1} + D_\epsilon(h). \]

Proof.

Set $w = \chi h$. For test functions $v \in C^\infty(U)$, we estimate
\[ \left| \int (X_0 v)\, w \, dq \right| \le \left| \int v\, (X_0 \chi)\, h \right| + \left| \int X_0(\chi v)\, h \right| \lesssim \|h\|_{L^1} + D_\epsilon(h). \]
Note that the estimate is uniform in $\|X_0\|_{C^0(U)}$.

$\Lambda^s$ with $|\cdot|_{X_j,s_j}$

The first steps to Theorem 3.2 are several lemmas that are nearly the same as those in [Section 4; [29]], except (A) we need them in $L^1$, (B) we need them uniform in the parameter $\epsilon$ hidden in $X_0$, (C) we need to generalize the proof to compact manifolds $(M, g)$. However, these small changes are straightforward on a careful reading of [29] and are omitted for the sake of brevity; see [16] for more discussion on the uniformity.

Lemma 3.6.

Let $U$ be an open, bounded set and $0 < \sigma < s_*$ with $\mathfrak{X}_{s_*}(M) = \mathfrak{X}(M)$. For all $\delta > 0$, $\exists C_\delta > 0$ such that for all multi-indices $I$ such that at least one index is zero, the following holds $\forall w \in C_c^\infty(U)$:
\[ |w|_{X_I,s(I)} \le \delta\, |w|_{X_0,1/2} + C_\delta \Big( \sum_{j=1}^r |w|_{X_j,1} + \|w\|_{\Lambda^\sigma} \Big). \]
Moreover, $C_\delta$ depends on $\{X_0, X_1, \dots, X_r\}$ only in the manner stated in Theorem 3.2.

The next lemma shows that one can control regularity in $\Lambda^s$ by controlling the original vector fields.

Lemma 3.7.

Let $U$ be an open, bounded set and $s_*$ be such that $\mathfrak{X}_{s_*}(M) = \mathfrak{X}(M)$. Then, for $w \in C_c^\infty(U)$ there holds
\[ \|w\|_{\Lambda^{s_*}} \lesssim_U \|w\|_{L^1} + \sum_{j=0}^r |w|_{X_j,s_j}. \]
Moreover, the implicit constant depends on $\{X_0, X_1, \dots, X_r\}$ only in the manner stated in Theorem 3.2.

3.4 Positive $X_0$ regularity from negative $X_0$ and positive $X_j$ regularity

In this subsection, we prove the a priori estimate (3.7) and then use it to complete the proof of Theorem 3.2. Fix $0 < \sigma < s_*$ arbitrary. Having fixed $U$ we may, by rescaling $\{X_0, X_1, \dots, X_r\}$, assume that $e^{tX_I}$ (and hence $e^{-tX_I^*}$) is well-posed for $w \in C^\infty(U)$ for $t \in [-1, 1]$ for $\sigma \le s(I)$ (and hence we may choose $\delta_0 = 1$).

Analogous to [Section 5; [29]], the primary intermediate step is to first deduce the estimate assuming the natural control on essentially all other vector fields in $\mathfrak{X}_\sigma$.

Definition 3.8.

Denote by $\mathcal{J}$ the set of all multi-indices $I$ with $\sigma \le s(I)$ except for the singleton $\{0\}$. Note this definition is slightly different from that in [29]. Define the following semi-norm
\[ |w|_M := \sum_{I \in \mathcal{J}} |w|_{X_I,s(I)}. \]
The main step in the proof of (3.7) (and hence Theorem 3.2 as a whole) is to prove the following.

Lemma 3.9.

For any bounded, open set $U \subset M$ and $w \in C^\infty(U)$, the following holds uniformly in $\epsilon$:
\[ |w|_{X_0,1/2} \lesssim_U |w|_M + \|w\|_{\Lambda^\sigma} + D_\epsilon(w). \]
As in the corresponding [Section 5; [29]] (and in [4]), we use an approach based on a carefully selected regularization, but our choice is even a little more delicate than that of [29]. As the regularization procedure is quite technically subtle, we first give the proof of Lemma 3.9 assuming the existence of a regularizer satisfying the desired properties.

Lemma 3.10.

There exists a family of uniformly bounded smoothing operators $S_\tau : L^p \to L^p$ for $\tau \in (0,1)$ and $p \in [1, \infty]$ with the following properties: for all $w \in C^\infty(U)$,
\[ \|S_\tau^* w - w\|_{L^1} \lesssim \tau\, |w|_M, \]
\[ \sum_{j=1}^r \|X_j S_\tau w\|_{L^\infty} \lesssim \tau^{-1} \|w\|_{L^\infty}, \]
\[ \|[X_0, S_\tau]^* w\|_{L^1} \lesssim \tau^{-1} \left( |w|_M + \|w\|_{\Lambda^\sigma} \right). \]
Assuming this lemma for now, we proceed.

Proof of Lemma 3.9 assuming Lemma 3.10.

We will first obtain regularity estimates by evaluating the fractional time derivative of $e^{tX_0^*} w$ (omitting the $\epsilon$ for notational simplicity). Observe that for any $t, \tau > 0$,
\[ \left\| e^{-tX_0^*} w - w \right\|_{L^1} \le \left\| e^{-tX_0^*} (S_\tau^* w - w) \right\|_{L^1} + \|S_\tau^* w - w\|_{L^1} + \left\| e^{-tX_0^*} S_\tau^* w - S_\tau^* w \right\|_{L^1}. \tag{3.11} \]
Therefore, by Lemma 3.10 and $L^1$ boundedness of the group $e^{-tX_0^*}$ on $U$,
\[ \sup_{|t| \le 1} \left\| e^{-tX_0^*} (S_\tau^* w - w) \right\|_{L^1} \lesssim \|S_\tau^* w - w\|_{L^1} \lesssim \tau\, |w|_M. \tag{3.12} \]
This will suffice for the first two terms in (3.11). Next, we estimate the last term in (3.11). We will do this using the fact that
\[ \left\| e^{-tX_0^*} S_\tau^* w - S_\tau^* w \right\|_{L^1} \le \sup_{\|v\|_{L^\infty} \le 1} \left| \int_0^t \int_M (e^{sX_0} v)\, X_0^* S_\tau^* w \, dq\, ds \right|. \tag{3.13} \]
For a fixed $v \in L^\infty$, we find that
\[ \left| \int_M (e^{sX_0} v)\, X_0^* S_\tau^* w \, dq \right| \le \left| \int_M (e^{sX_0} v)\, [X_0, S_\tau]^* w \, dq \right| + \left| \int_M (S_\tau e^{sX_0} v)\, X_0^* w \, dq \right| \]
\[ \le \|e^{sX_0} v\|_{L^\infty} \|[X_0, S_\tau]^* w\|_{L^1} + \Big( \|S_\tau e^{sX_0} v\|_{L^\infty} + \sum_{j=1}^r \|X_j S_\tau e^{sX_0} v\|_{L^\infty} \Big) D(w). \]
Using Lemma 3.10 and the boundedness of $e^{tX_0}$ in $L^\infty(U)$, we conclude that
\[ \left| \int_M (e^{sX_0} v)\, X_0^* S_\tau^* w \, dq \right| \lesssim_U \tau^{-1} \|v\|_{L^\infty} \left( |w|_M + \|w\|_{\Lambda^\sigma} + D(w) \right), \]
and from (3.13) we deduce
\[ \left\| e^{-tX_0^*} S_\tau^* w - S_\tau^* w \right\|_{L^1} \lesssim_U |t|\, \tau^{-1} \left( |w|_M + \|w\|_{\Lambda^\sigma} + D(w) \right). \]
Therefore, setting $\tau = \sqrt{|t|}$ and using (3.12) implies
\[ \left\| e^{-tX_0^*} w - w \right\|_{L^1} \lesssim \sqrt{|t|} \left( |w|_M + \|w\|_{\Lambda^\sigma} + D(w) \right). \]
By (3.9), (3.10), and the boundedness of $U$, this implies the desired result.

To complete the section, we explain in more detail how Lemma 3.9 implies Theorem 3.2.

Proof of Theorem 3.2.

Lemma 3.9, followed by Lemma 3.6 to absorb the effect of the higher order brackets by choosing $\delta$ sufficiently small, implies (3.7); that is, for any $w \in C_c^\infty(U)$,
\[ |w|_{X_0,1/2} \lesssim \|w\|_{\Lambda^\sigma} + \sum_{j=1}^r |w|_{X_j,1} + D_\epsilon(w). \]
Applying Lemma 3.7 then implies
\[ \|w\|_{\Lambda^{s_*}} \lesssim \sum_{j=1}^r |w|_{X_j,1} + \|w\|_{\Lambda^\sigma} + D_\epsilon(w). \tag{3.14} \]
Next, note the interpolation (from Hölder's inequality and Definition 3.3): $\forall \sigma \in (0, s_*)$ and all $\delta > 0$, $\exists C_\delta$ such that
\[ \|w\|_{\Lambda^\sigma} \le \delta \|w\|_{\Lambda^{s_*}} + C_\delta \|w\|_{L^1}, \]
which by (3.14) implies the Hörmander inequality (3.8). Let $U \subset\subset U' \subset M$, where $U'$ is another open and bounded set, and let $\chi \in C_c^\infty(U')$ with $\chi(x) = 1$ for all $x \in U$. Then, Lemma 3.5 implies
\[ \|\chi f^\epsilon\|_{\Lambda^{s_*}} \lesssim \sum_{j=1}^r |\chi f^\epsilon|_{X_j,1} + D_\epsilon(f^\epsilon). \tag{3.15} \]
Putting Lemma 3.4 together with (3.15) and (3.6) completes the proof of Theorem 3.2.

3.5 Regularization: Lemma 3.10

In this subsection we prove Lemma 3.10. First, we define a suitable "isotropic" mollifier via the parameterization. Let $\varphi \in C^\infty((-1,1))$ with $\varphi \ge 0$, $\int_{-1}^{1} \varphi(t)\, dt = 1$, and $\varphi(-t) = \varphi(t)$; denote $\tilde{w}_j = \chi_j w \circ x_j$, and for each $x \in \mathbb{R}^d$ let $\phi_\tau(x) = \tau^{-d} \phi(|x|/\tau)$. We define the regularization of $\chi_j w$ as follows for $|\tau| \le \delta_0$:
\[ \Phi_\tau^{(j)} w \circ x_j = \int_{\mathbb{R}^d} \phi_\tau(|x - y|)\, \tilde{w}_j(y)\, J_j(y)\, dy, \]
where as above we write $J_j = (\det \tilde{g})^{1/2}$, the volume element on $M$ in local coordinates. We write
\[ \Phi_\tau w(q) = \sum_{j \,:\, q \in \tilde{U}_j} \Phi_\tau^{(j)} w(q). \tag{3.16} \]
Note that by definition, $\Phi_\tau = \Phi_\tau^*$ for the adjoint in $L^2(dq)$. The basic properties of these kinds of mollifiers are classical; however, we include sketches for completeness. Due to the compatibility between definitions (3.3) and (3.16), and the fact that the properties we are interested in are purely local, the results follow from the corresponding statements on $\mathbb{R}^d$. We sketch the details of this in the first lemma for the readers' convenience.

Lemma 3.11.

For all $\sigma \in [0,1)$ and $U \subset M$ open and bounded, there holds the following uniformly in $\tau \in (0, \delta_0)$ and uniformly in $C^1$-bounded sets of $Y \in \mathfrak{X}(U)$, for all $w \in C^\infty(U)$ (identifying $\Lambda^0 = L^1$):
\[ \|[Y, \Phi_\tau] w\|_{L^1} \lesssim_U \tau^\sigma \|w\|_{\Lambda^\sigma}. \]

Proof.

It suffices to show that the lemma holds for all $\Phi^{(j)}$. By the definition of the parameterization we have, writing $a_k(x) \partial_x^k$ (using Einstein summation notation) as the parameterization of the vector field $Y$,
\[ \left\| [Y, \Phi_\tau^j] w \right\|_{L^1} = \int_{\mathbb{R}^d} \left| \int_{\mathbb{R}^d} J_j(y)\, a_k(x)\, \partial_x^k \phi_\tau(|x - y|)\, \tilde{w}_j(y) - \phi_\tau(|x - y|)\, a_k(y)\, \partial_y^k \tilde{w}_j(y) \, dy \right| J_j(x)\, dx. \]
Integrating by parts and using the average zero property, we obtain
\[ \left\| [Y, \Phi_\tau^j] w \right\|_{L^1} \lesssim \int_{\mathbb{R}^d} \left| \int_{\mathbb{R}^d} \Big( J_j(y)\, a_k(x)\, \partial_x^k \phi_\tau(|x - y|) + \partial_y^k \big( J_j(y)\, \phi_\tau(|x - y|)\, a_k(y) \big) \Big) \big( \tilde{w}_j(y) - \tilde{w}_j(x) \big) \, dy \right| J_j(x)\, dx. \]
Using that $|a_k(x) - a_k(y)| \lesssim |x - y|$ and $\partial_y^k (J_j(y) a_k(y)) \lesssim 1$ gives
\[ \left\| [Y, \Phi_\tau^j] w \right\|_{L^1} \lesssim \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} \left| \frac{1}{\tau^d}\, \phi\!\left( \frac{|x - y|}{\tau} \right) + \frac{|x - y|}{\tau^{d+1}}\, \phi'\!\left( \frac{|x - y|}{\tau} \right) \right| \left| \tilde{w}_j(x) - \tilde{w}_j(y) \right| dy\, J_j(x)\, dx. \]
Then, making the change of variables $y = x + h$, we obtain from (3.3)
\[ \left\| [Y, \Phi_\tau^j] w \right\|_{L^1} \lesssim \tau^\sigma \|w\|_{\Lambda^\sigma}. \]

Next we prove the following regularization estimate.

Lemma 3.12.

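For all $\sigma \in [0,1)$, for all $U \subset M$ open and bounded, there holds uniformly over $\tau \in (0,1)$ and uniformly over $C^1$-bounded sets of $Y \in \mathfrak{X}(K)$, for all $w \in C_c^\infty(U)$ and $p \in [1, \infty]$,
\[ \|\tau Y \Phi_\tau w\|_{L^p} \lesssim_U \|w\|_{L^p}, \tag{3.17} \]
\[ \|\tau Y \Phi_\tau w\|_{L^1} \lesssim_U \tau^\sigma \|w\|_{\Lambda^\sigma}, \tag{3.18} \]
\[ \|\Phi_\tau \tau Y w\|_{L^1} \lesssim_U \tau^\sigma \|w\|_{\Lambda^\sigma}. \tag{3.19} \]

Proof.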

We proceed with a proof similar to that used in Lemma 3.11. We consider only (3.18); the proofs of (3.17) and (3.19) follow from similar arguments. As above, we have
\[ \left\| \tau Y \Phi_\tau^{(j)} w_j \right\|_{L^1} \lesssim \int_{\mathbb{R}^d} \left| \int_{\mathbb{R}^d} \tau\, a_k\, \partial_x^k \phi_\tau(|x - y|) \big( \tilde{w}_j(y) - \tilde{w}_j(x) \big) \, dy \right| dx \lesssim \tau^\sigma \|w\|_{\Lambda^\sigma}. \]

Next, we introduce directional regularizations with respect to a given vector field $Y \in \mathfrak{X}$, as done in [Section 5; [29]]. Accordingly, for each $\varphi \in C_c^\infty([-1,1])$ and $\tau \in (0,1)$ define
\[ \varphi_{\tau Y} w := \int_{\mathbb{R}} (e^{tY} w)\, \varphi_\tau(t)\, dt, \qquad \text{where } \varphi_\tau(t) := \tau^{-1} \varphi(\tau^{-1} t). \]
Note that
\[ (\varphi_{\tau Y})^* w = \varphi_{-\tau Y^*} w = \int_{\mathbb{R}} (e^{-tY^*} w)\, \varphi_\tau(t)\, dt, \]
a property that will be used repeatedly in the sequel.

First we record the basic property that these regularizers are bounded on $L^p$. The proof is straightforward and is omitted for brevity.

Lemma 3.13.

For any $Y \in \mathfrak{X}$, for any open bounded set $U \subset M$, and $\varphi \in C_c^\infty([-1,1])$, there holds for all $p \in [1, \infty]$ and $w \in C_c^\infty(U)$,
\[ \|(\varphi_{\tau Y})^* w\|_{L^p} \lesssim \|w\|_{L^p}, \qquad \|\Phi_\tau w\|_{L^p} \lesssim \|w\|_{L^p}. \]

Next, we note that the regularizations, the adjoint regularizations, and vector field exponentials are bounded in the $\Lambda^s$ space.

Lemma 3.14.

For $|t| \le 1$, $\tau \in (0,1)$ and $\sigma \in [0,1)$, for all open, bounded sets $U \subset M$ and $w \in C_c^\infty(U)$, there holds
\[ \left\| e^{tY} w \right\|_{\Lambda^\sigma} \lesssim \|w\|_{\Lambda^\sigma}, \qquad \left\| e^{tY^*} w \right\|_{\Lambda^\sigma} \lesssim \|w\|_{\Lambda^\sigma}, \]
\[ \|(\varphi_{\tau Y})^* w\|_{\Lambda^\sigma} \lesssim \|w\|_{\Lambda^\sigma}, \qquad \|\Phi_\tau w\|_{\Lambda^\sigma} \lesssim \|w\|_{\Lambda^\sigma}. \]

Proof.

The latter three estimates follow easily from the first estimate. After applying the parameterization to reduce to the case of $\mathbb{R}^d$, the first estimate follows from a straightforward $L^1$ adaptation of [Lemma 4.2; [29]]. The details are omitted for the sake of brevity.

In a similar vein, the chain rule implies the following estimates.

Lemma 3.15.

For all open, bounded U ⊂ M , for all | τ | ≤ and ∀ k ≥ , the following holds ∀ w ∈ C ∞ c ( U ) , sup Z ∈ X : || Z || Ck ≤ (cid:12)(cid:12)(cid:12)(cid:12) Ze τY w (cid:12)(cid:12)(cid:12)(cid:12) L ∞ . sup Z ∈ X : || Z || Ck ≤ || Zw || L ∞ (3.20) sup Z ∈ X : || Z || Ck ≤ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) Z ∗ e τY ∗ w (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) L . sup Z ∈ X : || Z || Ck ≤ || Z ∗ w || L (3.21) sup Z ∈ X : || Z || Ck ≤ || Z ∗ ( ϕ τY ) ∗ w || L . sup Z ∈ X : || Z || Ck ≤ || Z ∗ w || L . (3.22) Proof.

Estimates (3.20) and (3.21) follow from the chain rule, and (3.22) then follows from the definition of the regularizers.

The next lemma characterizes the regularization property of the regularizers.

Lemma 3.16.

For all open, bounded sets $U \subset M$ and $w \in C_c^\infty(U)$,
\[ \|(Y \varphi_{\tau Y})^* w\|_{L^1} \lesssim \tau^{-1} \sup_{|t| \le \tau} \left\| e^{-tY^*} w - w \right\|_{L^1}. \]

Proof.

We have
\[ (\varphi_{\tau Y})^* Y^* w = \int_{\mathbb{R}} \left( e^{-tY^*} Y^* w \right) \varphi_\tau(t)\, dt = - \int_{\mathbb{R}} \frac{d}{dt} \left( e^{-tY^*} w - w \right) \varphi_\tau(t)\, dt = \int_{\mathbb{R}} \left( e^{-tY^*} w - w \right) \varphi_\tau'(t)\, dt. \]
The result then follows by Minkowski's inequality.

We will also need the $L^\infty$ regularization property.

Lemma 3.17.

For all open bounded sets $U \subset M$ and $w \in C_c^\infty(U)$,
\[ \|Y \varphi_{\tau Y} w\|_{L^\infty} \lesssim \tau^{-1} \|w\|_{L^\infty}. \]

Proof.

This follows by a straightforward variant of the proof of Lemma 3.16.

Next, we show that the Hölder-type regularity classes are natural for controlling convergence of the operators. It is natural to specialize to the specific form in which we are using it.

Lemma 3.18.

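For all open bounded sets $U \subset M$ and $w \in C_c^\infty(U)$, there holds for $\varphi \in C_c^\infty([-1,1])$, $\varphi \ge 0$ and $\int_{\mathbb{R}} \varphi(t)\, dt = 1$,
\[ \|(\varphi_{\tau X_I})^* w - w\|_{L^1} \lesssim \sup_{|t| \le \tau} \left\| e^{-tX_I^*} w - w \right\|_{L^1}. \]

Proof.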

By Minkowski’s inequality, (cid:12)(cid:12)(cid:12)(cid:12) ( ϕ ντX I ) ∗ w − w (cid:12)(cid:12)(cid:12)(cid:12) L ≤ ˆ R (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) e − tX ∗ I w − w (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) L ϕ τ ( t )d t ≤ sup | t |≤ τ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) e − tX ∗ I w − w (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) L . ϕ τX I with respect to X J Lemma 3.19.

For all open bounded sets $U \subset M$ and $w \in C_c^\infty(U)$, for $I, J \in \mathcal{J}$, there holds
\[ \sup_{|t| \le \tau^{m(J)}} \left\| e^{-tX_J^*} (\varphi_{\tau^{m(I)} X_I})^* w - (\varphi_{\tau^{m(I)} X_I})^* w \right\|_{L^1} \le \sup_{|t| \le \tau^{m(J)}} \left\| e^{-tX_J^*} w - w \right\|_{L^1} + \sup_{|t| \le \tau^{m(I)}} \left\| e^{-tX_I^*} w - w \right\|_{L^1}. \]

Now, we are ready to define the regularizer $S_\tau$. Let us give $\mathcal{J}$ a total ordering so that $m(I)$ is an increasing function of $I \in \mathcal{J}$, and denote $\mathcal{J}_\infty = \mathcal{J} \cup \{\infty\}$. We define $S_\tau$ in terms of an ascending, ordered composition of regularizing operators:
\[ S_\tau w := \Big( \prod_{I \in \mathcal{J}} \varphi_{\tau^{m(I)} X_I} \Big) \Phi_{\tau^{1/\sigma}} w. \]
This regularizer is similar to, but not quite the same as, that defined in [29], due to the inclusion of more regularization operators. We will ultimately use $S_\tau^*$ as the regularizer, which is a little more subtle to work with. Analogous to [29], we also define the truncated regularizer, for all $J \in \mathcal{J}$,
\[ S_\tau^J := \Big( \prod_{I \in \mathcal{J} \,:\, I \ge J} \varphi_{\tau^{m(I)} X_I} \Big) \Phi_{\tau^{1/\sigma}}. \]
The remainder of the subsection is dedicated to proving Lemma 3.10. The first step is to obtain the $L^1$ convergence.

Lemma 3.20.

For all open bounded sets $U \subset M$ and $w \in C_c^\infty(U)$,
\[ \|S_\tau^* w - w\|_{L^1} \lesssim \tau\, |w|_M. \]

Proof.

For any finite family of $L^1 \to L^1$ bounded linear operators $Z_1, Z_2, \dots, Z_k$ we have
\[ \|Z_1 Z_2 \cdots Z_k w - w\|_{L^1} \le \sum_{j=1}^k \left\| (Z_1 \cdots Z_{j-1})(Z_j w - w) \right\|_{L^1} \lesssim \sum_{j=1}^k \|Z_j w - w\|_{L^1}. \]
The result then follows from Lemma 3.18.

The next lemma is crucial for characterizing the regularization properties of $(S_\tau^J)^*$ in $L^1$. This is the adjoint analogue of [Lemma 5.2; [29]], which is a little more technical.

Lemma 3.21.

For all open bounded sets $U \subset M$ and $w \in C_c^\infty(U)$, there holds for any multi-indices $J \le I$,
\[ \left\| (\tau^{1/\sigma} Y S_\tau^J)^* w \right\|_{L^1} \lesssim \tau \|w\|_{\Lambda^\sigma}, \tag{3.23} \]
\[ \left\| (\tau^{m(I)} X_I S_\tau^J)^* w \right\|_{L^1} \lesssim \sum_{I' \in \mathcal{J} \,:\, I' \ge J} \sup_{|t| \le \tau^{m(I')}} \left\| e^{-tX_{I'}^*} w - w \right\|_{L^1} + \tau \|w\|_{\Lambda^\sigma}. \tag{3.24} \]

Before we continue, we define for two vector fields $X$ and $Y$
\[ e^{t\, \mathrm{ad}(X)} Y := e^{tX} Y e^{-tX}, \]
which is the adjoint representation of $e^{tX}$ on the Lie algebra of vector fields. It will be useful to expand $e^{t\, \mathrm{ad}(X)} Y$ in a Taylor expansion.

Lemma 3.22. For two smooth vector fields

X, Y , t ∈ [ − , and N ∈ N , there exists a smooth boundedvector ﬁeld Y N,t locally uniformly bounded in C k ( ∀ k ) on t ∈ [ − , , ( e t ad ( X ) Y ) = X ≤ k

The adjoint representation gives the following commutator representation for the smoothing operators:
\[ Y \varphi_{\tau X} w = \int_{\mathbb{R}} \Big( e^{tX} \big( e^{-t\, \mathrm{ad}(X)} Y \big) w \Big) \varphi_\tau(t)\, dt. \]
Lemma 3.22 then gives the following formula for $Y \varphi_{\tau X}$ (used also in [29]).

Lemma 3.23.

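For each $\varphi \in C_c^\infty((-1,1))$, $k \in \mathbb{N}$ and vector field $X$ define
\[ (\hat{\varphi}^k_{\tau X})\, g := \int_{\mathbb{R}} (e^{tX} g)\, \hat{\varphi}^k_\tau(t)\, dt, \qquad \text{where } \hat{\varphi}^k(t) := \frac{t^k}{k!} \varphi(t) \in C_c^\infty((-1,1)). \tag{3.25} \]
For two smooth vector fields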

X, Y , τ ∈ (0 , and N ∈ N , the following holds Y ϕ τX = X ≤ k

Proof of (3.23). We proceed by induction. For J = ∞ the result follows from (3.19).Hence, we next assume that the result holds for all J ′ with J ′ > J and prove that it also holds for J . Webegin with the decomposition ( S Jτ ) ∗ = ( S J ′ τ ) ∗ ( ϕ τ m ( J ) X J ) ∗ . By a trivial application of Lemma 3.23 with N = 1 and X I = Y , there exists a smooth bounded vector ﬁeld Y ,t such that (recall Deﬁnition (3.26)), ( τ /σ Y S Jτ ) ∗ = ( τ /σ Y S J ′ τ ) ∗ ( ϕ τ m ( J ) X J ) ∗ + τ m ( J )+1 /σ ( S J ′ τ ) ∗ ( R τ m ( J ) X ) ∗ . By the induction hypothesis and Lemma 3.14 we have for the ﬁrst term above k ( τ /σ Y S J ′ τ ) ∗ ( ϕ τ m ( J ) X J ) ∗ g k L . τ k ( ϕ τ m ( J ) X J ) ∗ g k Λ σ . τ k g k Λ σ . A similar estimate holds for the second term using Minkowski’s inequality k ( τ /σ S J ′ τ ) ∗ ( R τ m ( J ) X J ) ∗ g k L ≤ ˆ R k ( τ /σ Y ,t S J ′ τ ) ∗ e − tX ∗ J g k L ˆ ϕ τ m ( J ) ( t )d t . τ k e − tX ∗ J g k Λ σ . τ k g k Λ σ and the estimate (3.23) now follows. 31 roof of (3.24). First we note that if I = J then we have for J ′ the smallest element such that J ′ > J by Lemma 3.16 and the L boundedness of ( S Jτ ) ∗ k ( X J S Jτ ) ∗ g k L = k ( S J ′ τ ) ∗ ( X J ϕ τ m ( J ) X J ) ∗ g k L . sup | t |≤ τ m ( J ) k e − tX ∗ J g − g k L . When

I > J , we proceed by induction. First of all, the result follows by deﬁnition of (3.16) if J = ∞ .Hence, we next assume that the result holds for all J ′ with J ′ > J and prove that it also holds for J thelargest element less than J ′ . Again writing ( S Jτ ) ∗ = ( S J ′ τ )( ϕ τ m ( J ) X J ) ∗ and using Lemma 3.23 we obtain, ∀ N ≥ , ( τ m ( I ) X I S Jτ ) ∗ = X ≤ k

For all $J \in \mathcal{J}$, $U \subset M$ open and bounded, there holds $\forall w \in C_c^\infty(U)$ and $\tau \in (-1,1)$,
\[ \left\| [\tau^2 X_0, S_\tau^J]^* w \right\|_{L^1} \lesssim \sum_{I \in \mathcal{J} \,:\, I \ge J} \sup_{|t| \le \tau^{m(I)}} \left\| e^{-tX_I^*} w - w \right\|_{L^1} + \tau \|w\|_{\Lambda^\sigma}. \]

Proof.

As in the proof of (3.24) above, we proceed by induction. Firstly, the estimate holds for J = ∞ due to the commutator estimate Lemma 3.11. As above, assume the result holds for all J ′ with J ′ > J andprove that it also holds for J , writing ( S Jt ) ∗ = ( S J ′ t ) ∗ ( ϕ t m ( J ) X J ) ∗ . [ τ X , ( S Jτ )] ∗ = [ τ X , S J ′ τ ] ∗ ( ϕ τ m ( J ) X J ) ∗ + ( S J ′ τ ) ∗ [ τ X , ϕ τ m ( J ) X J ] ∗ . (3.27)By the inductive hypothesis and Lemmas 3.19 and Lemmas 3.13 we have (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) [ τ X , S J ′ τ ] ∗ ( ϕ τ m ( J ) X J ) ∗ w (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) L . X I ′ ≥ J ′ sup | t |≤ τ m ( I ′ ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) e − tX ∗ I ′ ( ϕ τ m ( J ) X J ) ∗ w − ( ϕ τ m ( J ) X J ) ∗ w (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) L + τ (cid:12)(cid:12)(cid:12)(cid:12) ϕ τ m ( J ) X J w (cid:12)(cid:12)(cid:12)(cid:12) Λ σ . X I ′ ≥ J sup | t |≤ τ m ( I ′ ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) e − tX ∗ I ′ w − w (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) L + τ k w k Λ σ . This term in (3.27) is consistent with the desired result.For the second term in (3.27), by Lemma 3.23 we have (note the cancellation which eliminates the k = 0 term) ( S J ′ τ ) ∗ [ τ X , ϕ τ m ( J ) X J ] ∗ = X

N m ( J ) ≥ σ .We omit the repetitive details for the sake of brevity.Finally, we prove the required L ∞ regularization estimate. Lemma 3.25.

Let $U \subset M$ be open and bounded. Let $I$ be any multi-index and $J \le I$. Then, $\forall w \in C_c^\infty(U)$,
\[ \left\| t^{m(I)} X_I S_t^J w \right\|_{L^\infty} \lesssim_U \|w\|_{L^\infty}. \]

Proof.

This is done by induction as in previous lemmas. The case J = ∞ follows from (3.17). Assume thatthe result holds for all J ′ with J ′ > J and write S Jt = ϕ t m ( J ) X J S J ′ t . Case 1:

I > J . We apply the Taylor expansion of Lemma 3.23 (recalling deﬁnitions (3.25) and (3.26)) τ m ( I ) X I S Jτ = X ≤ k

4 Projective spanning

In this section, we discuss tools for verifying the parabolic Hörmander conditions on the projective process $(w_t)$ on the sphere bundle $SM$ in general (Section 4.1) as well as specific criteria for the class of Euler-like models introduced in Section 1.2 (Section 4.2). This projective spanning condition is proved for the Lorenz 96 model in Section 4.3.

We work in this section at the same level of generality as we did in Section 2. As before, let $M$ be a smooth, connected, orientable Riemannian manifold without boundary. Given a vector field $X$ on $M$ we denote the associated "lifted" vector field $\tilde{X}$ on the sphere bundle $SM$ by
\[ \tilde{X}(x, v) := \begin{pmatrix} X(x) \\ V_{\nabla X(x)}(v) \end{pmatrix}, \]
where the components of the block vector above are taken with respect to the orthogonal splitting $T_w SM = T_x M \oplus T_v S_x M$, $V_{\nabla X(x)}(v) = \Pi_v \nabla X(x) v$ is the projective vector field on $S_x M$, and $\nabla X(x)$ denotes the total covariant derivative with respect to the Levi-Civita connection, viewed as a linear endomorphism on $T_x M$.

Here we give general necessary and sufficient conditions on a collection of vector fields $\{X_k\}_{k=0}^r$ so that their lifts $\{\tilde{X}_k\}_{k=0}^r$ satisfy the parabolic Hörmander condition on $SM$.

Definition 4.1.

Let $\{X_k\}_{k=0}^r$ be a collection of smooth vector fields on a connected manifold $M$, and let $\mathfrak{X}_k \subseteq \mathfrak{X}(M)$ be as in Definition 1.7. Define the parabolic Lie algebra generated by $\{X_k\}_{k=0}^r$ to be the Lie algebra of vector fields spanned by these collections:
\[ \mathrm{Lie}(X_0; X_1, \dots, X_r) := \Big\{ Z \in \mathfrak{X}(M) : Z = \sum_{i=1}^N c_i Z_i,\ c_i \in \mathbb{R},\ \{Z_i\} \subset \bigcup_{k \in \mathbb{N}} \mathfrak{X}_k \Big\}. \]
It follows that $\{X_k\}_{k=0}^r$ satisfies the parabolic Hörmander condition if for each $x \in M$
\[ \{ X(x) : X \in \mathrm{Lie}(X_0; X_1, \dots, X_r) \} = T_x M, \]
and similarly for $\{\tilde{X}_k\}_{k=0}^r$.

Since the vector fields $\{X_k\}_{k=0}^r$ may not be volume preserving, it is convenient to define for each $X \in \mathfrak{X}(M)$ the following traceless linear operator on $T_x M$:
\[ M_X(x) := \nabla X(x) - \frac{1}{n} \operatorname{div} X(x)\, I, \]
which we view as an element of the Lie algebra $\mathfrak{sl}(T_x M)$ (the space of traceless linear operators on $T_x M$). Since the projective vector field $V_{\nabla X}(v)$ includes a projection orthogonal to $v$, we always have $V_{\nabla X} \equiv V_{M_X}$.

An important role will be played by the following Lie sub-algebra of $\mathfrak{sl}(T_x M)$:
\[ \mathfrak{m}_x(X_0; X_1, \dots, X_r) := \{ M_X(x) : X \in \mathrm{Lie}(X_0; X_1, \dots, X_r),\ X(x) = 0 \}. \tag{4.1} \]
It is a simple matter to check that $\mathfrak{m}_x(X_0; X_1, \dots, X_r)$ is indeed a Lie sub-algebra of $\mathfrak{sl}(T_x M)$ with respect to the matrix commutator $[A, B] = AB - BA$.
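As a purely illustrative aside (not part of the paper's argument), the objects just defined can be checked numerically in the flat case $M = \mathbb{R}^3$. The sketch below assumes a hypothetical, non volume-preserving quadratic field `example_field`, approximates $\nabla X(x)$ by central differences, forms the traceless part $M_X(x)$, and evaluates the projective field $V_A(v) = A v - \langle v, A v \rangle v$. All function names are invented for this illustration.

```python
# Illustrative sketch on flat R^3; every name here is hypothetical.

def example_field(x):
    """A hypothetical quadratic-plus-linear field with divergence -0.9."""
    return [x[1] * x[2] - 0.5 * x[0],
            x[0] * x[2] - 0.3 * x[1],
            -2.0 * x[0] * x[1] - 0.1 * x[2]]

def jacobian(field, x, h=1e-6):
    """Central-difference approximation of (nabla X)(x); entry [i][j] = dX_i/dx_j."""
    n = len(x)
    cols = []
    for j in range(n):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        fp, fm = field(xp), field(xm)
        cols.append([(fp[i] - fm[i]) / (2.0 * h) for i in range(n)])
    return [[cols[j][i] for j in range(n)] for i in range(n)]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def traceless_part(A):
    """M = A - (tr A / n) I, the analogue of M_X(x)."""
    n = len(A)
    c = trace(A) / n
    return [[A[i][j] - (c if i == j else 0.0) for j in range(n)] for i in range(n)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def projective_field(A, v):
    """V_A(v) = A v - <v, A v> v for a unit vector v."""
    Av = [dot(row, v) for row in A]
    c = dot(v, Av)
    return [Av[i] - c * v[i] for i in range(len(v))]

x = [0.7, -1.2, 0.4]
v = [1.0 / 3.0, 2.0 / 3.0, -2.0 / 3.0]   # unit vector
grad = jacobian(example_field, x)
M = traceless_part(grad)
V_grad = projective_field(grad, v)
V_M = projective_field(M, v)
```

In this sketch `M` has zero trace up to rounding, `V_grad` and `V_M` agree (the multiple of the identity removed from `grad` is annihilated by the projection, which is the identity $V_{\nabla X} \equiv V_{M_X}$), and `V_M` is orthogonal to `v`, so it is tangent to the unit sphere.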

It was observed by Baxendale in [11] that the parabolic Hörmander condition on lifted vector fields $\{\tilde{X}_k\}_{k=0}^r$ on $SM$ can be related to properties of the matrix Lie algebra $\mathfrak{m}_x(X_0; X_1, \dots, X_r)$. However, as there does not seem to be a proof of this fact anywhere in the literature, a self-contained proof is included in Appendix B.

Proposition 4.2. Let $\{X_k\}_{k=0}^r$ be a collection of smooth vector fields on $M$. Their lifts $\{\tilde{X}_k\}_{k=0}^r$ satisfy the parabolic Hörmander condition on $SM$ if and only if $\{X_k\}_{k=0}^r$ satisfy the parabolic Hörmander condition on $M$ and for each $(x, v) \in SM$ we have
\[ \{ V_A(v) : A \in \mathfrak{m}_x(X_0; X_1, \dots, X_r) \} = T_v S_x M. \tag{4.2} \]

Remark 4.3.

In the language of the theory of Lie algebra actions on manifolds, condition (4.2) means that $\mathfrak{m}_x$ acts transitively on $S_x M$ through the Lie algebra action $A \mapsto V_A$. It is straightforward to show that the Lie algebra $\mathfrak{so}(T_x M)$ of skew-symmetric linear operators (depending on the metric) acts transitively on $S_x M$, and therefore a sufficient condition for transitive action of $\mathfrak{m}_x(X_0; X_1, \dots, X_r)$ on $S_x M$ is
\[ \mathfrak{so}(T_x M) \subseteq \mathfrak{m}_x(X_0; X_1, \dots, X_r). \]

$\mathbb{R}^n$

In this section we introduce useful sufficient conditions for verifying the parabolic Hörmander condition for the projective process arising from a certain class of SDE on $M = \mathbb{R}^n$ of the form
\[ dx_t^\epsilon = \left( F(x_t^\epsilon) + \epsilon A x_t^\epsilon \right) dt + \sqrt{\epsilon} \sum_{k=1}^r X_k \, dW_t^k, \tag{4.3} \]
where $\{X_k\}_{k=1}^r$ are assumed for simplicity to be constant vector fields, while the matrix $A \in \mathbb{R}^{n \times n}$ is negative definite, contributing volume dissipation to the overall system, and $F(x) = B(x, x)$ is bilinear as in (1.11).

As previously, $\epsilon > 0$ denotes a parameter that should be thought of as small. Our goal will be to verify that $\tilde{X}_0^\epsilon, \tilde{X}_1, \dots, \tilde{X}_r$ satisfies the parabolic Hörmander condition uniformly in $\epsilon$ on bounded sets in the sense of Definition 3.1.

On $\mathbb{R}^n$, the sphere bundle trivializes to $S\mathbb{R}^n \simeq \mathbb{R}^n \times S^{n-1}$ and the lifts $\{\tilde{X}_0^\epsilon, \tilde{X}_1, \dots, \tilde{X}_r\}$ to $\mathbb{R}^n \times S^{n-1}$ are given by
\[ \tilde{X}_0^\epsilon(x, v) = \begin{pmatrix} X_0^\epsilon \\ V_{\nabla X_0^\epsilon}(v) \end{pmatrix}, \qquad \tilde{X}_k = \begin{pmatrix} X_k \\ 0 \end{pmatrix}, \]
where $k = 1, \dots, r$ and $X_0^\epsilon(x) = F(x) + \epsilon A x$.

By Proposition 4.2, we know that verifying the parabolic Hörmander condition for $\{\tilde{X}_0^\epsilon, \tilde{X}_1, \dots, \tilde{X}_r\}$ on $\mathbb{R}^n \times S^{n-1}$ is equivalent to checking that $\{X_0^\epsilon, X_1, \dots, X_r\}$ satisfies the parabolic Hörmander condition on $\mathbb{R}^n$ and that the Lie algebra $\mathfrak{m}_x(X_0^\epsilon; X_1, \dots, X_r)$ defined by (4.1) satisfies
\[ \{ V_A(v) : A \in \mathfrak{m}_x(X_0^\epsilon; X_1, \dots, X_r) \} = T_v S^{n-1} \]
for each $(x, v) \in \mathbb{R}^n \times S^{n-1}$. In general it is a challenge to work directly with $\mathfrak{m}_x(X_0^\epsilon; X_1, \dots, X_r)$, as it is not a simple task to classify all vector fields in $\mathrm{Lie}(X_0^\epsilon; X_1, \dots, X_r)$ that vanish at each $x \in \mathbb{R}^n$. However, in $\mathbb{R}^n$ it is often the case that the parabolic Lie algebra generated by $\{X_k\}_{k=0}^r$ contains a spanning collection of constant vector fields. In this case $\mathfrak{m}_x$ can be described more explicitly.

Lemma 4.4.

Let $\{X_k\}_{k=0}^r \subseteq \mathfrak{X}(\mathbb{R}^n)$ and suppose that $\mathrm{Lie}(X_0; X_1, \dots, X_r)$ contains the constant vector fields $\{\partial_{x_k}\}_{k=1}^n$. Then
\[ \mathfrak{m}_x(X_0; X_1, \dots, X_r) = \{ M_X(x) : X \in \mathrm{Lie}(X_0; X_1, \dots, X_r) \}. \]

Proof.

Our hypothesis $\{\partial_{x_k}\}_{k=1}^n \subseteq \mathrm{Lie}(X_0; X_1, \dots, X_r)$ implies that for each $X \in \mathrm{Lie}(X_0; X_1, \dots, X_r)$ and $x \in \mathbb{R}^n$, the vector field $\hat X = X - X(x)$ also belongs to $\mathrm{Lie}(X_0; X_1, \dots, X_r)$ and satisfies $\hat X(x) = 0$. Since $\nabla \hat X = \nabla X$, we have $M_{\hat X}(x) = M_X(x)$, hence $M_X(x) \in \mathfrak{m}_x$.

Furthermore, the assumptions that $F(x) = B(x,x)$ is bilinear and $Ax$ is linear allow us to deduce a convenient sufficient condition for verifying (4.2) on the vector fields $X_0^\epsilon, X_1, \dots, X_r$ uniformly in $\epsilon$. To make this precise, we define for each $k = 1, \dots, n$ the linear operator
$$H_k := \nabla[\partial_{x_k}, X_0^\epsilon] = \partial_{x_k} \nabla F \in \mathfrak{sl}(\mathbb{R}^n).$$
Note that $H_k$ is independent of both $x \in M$ and $\epsilon$. Below, $\mathrm{Lie}(H_1, \dots, H_n)$ denotes the matrix Lie subalgebra of $\mathfrak{sl}(\mathbb{R}^n)$ generated by $\{H_1, \dots, H_n\}$.

Lemma 4.5.

Assume (i) $\{\partial_{x_k}\}_{k=1}^n \subseteq \mathrm{Lie}(X_0^\epsilon; X_1, \dots, X_r)$ and (ii) that
$$\mathrm{Lie}(H_1, \dots, H_n) = \mathfrak{sl}(\mathbb{R}^n). \tag{4.4}$$
Then, $\tilde X_0^\epsilon, \tilde X_1, \dots, \tilde X_r$ satisfies the uniform parabolic Hörmander condition in the sense of Definition 3.1 as $\epsilon$ is varied in $(0,1]$.

Remark 4.6.

By Remark 4.3, we can replace (4.4) with the weaker condition $\mathfrak{so}(\mathbb{R}^n) \subseteq \mathrm{Lie}(H_1, \dots, H_n)$.

Remark 4.7.

Let us comment briefly on how one might verify (4.4). Since $\mathfrak{sl}(\mathbb{R}^n)$ is $(n^2-1)$-dimensional, it is clear that one must use commutators that go several generations deep if one has any hope of generating $\mathfrak{sl}(\mathbb{R}^n)$. However, it can simplify things to instead look to build a suitable generating set for $\mathfrak{sl}(\mathbb{R}^n)$ out of brackets of the $H_i$'s. A particularly useful generating set for $\mathfrak{sl}(\mathbb{R}^n)$ is the collection of elementary matrices $E_{1,2}, E_{2,3}, \dots, E_{n,1}$, where $E_{i,j}$ is the matrix with $1$ in the $(i,j)$ entry and $0$ elsewhere. For these, we have the commutation relation
$$[E_{i,j}, E_{k,\ell}] = E_{i,\ell}\,\delta_{j,k} - E_{k,j}\,\delta_{\ell,i},$$
so that, e.g., $[E_{1,2}, E_{2,3}] = E_{1,3}$ and $[E_{1,2}, E_{2,1}] = E_{1,1} - E_{2,2}$. Continuing like this allows one to generate the off-diagonal matrices $\{E_{i,j}\}_{i \neq j}$ as well as the directions $E_{2,2} - E_{1,1}, \dots, E_{n,n} - E_{n-1,n-1}$ needed to complete a basis for $\mathfrak{sl}(\mathbb{R}^n)$. Therefore, $\{E_{1,2}, E_{2,3}, \dots, E_{n,1}\}$ generates $\mathfrak{sl}(\mathbb{R}^n)$.

Now we turn to verifying the uniform projective spanning for stochastically forced Lorenz 96 (1.1). Recall the stochastic Lorenz 96 is an SDE on $\mathbb{R}^J$ defined by
$$du_\ell = (u_{\ell+1} - u_{\ell-2})\,u_{\ell-1}\,dt - \epsilon u_\ell\,dt + \sqrt{\epsilon}\,q_\ell\,dW^\ell_t. \tag{4.5}$$
Here, we assume a periodic ensemble of coupled oscillators, i.e., $u_{i+kJ} := u_i$. Naturally we can write (4.5) in the general form (4.3), by defining
$$F_\ell(u) = u_{\ell+1}u_{\ell-1} - u_{\ell-2}u_{\ell-1}, \qquad (Au)_\ell = -u_\ell, \qquad X_\ell(u) = q_\ell\,\partial_{u_\ell},$$
where $F(u)$ satisfies assumption (1.11).

First, we verify uniform hypoellipticity of $(u_t)$.

Lemma 4.8.

Let $J < \infty$ be arbitrary and suppose that at least $q_1, q_2 \neq 0$. Then $\mathrm{Lie}(F, q_1\partial_{u_1}, q_2\partial_{u_2})$ contains $\{\partial_{u_j}\}_{j=1}^J$ and spans $\mathbb{R}^J$ uniformly in $\epsilon$ on compact sets.

Proof. Since the nonlinearity is bilinear, we readily observe that
$$[\partial_{u_2}, [\partial_{u_1}, F]] = -\partial_{u_3}.$$
Iterating this observation allows one to generate all brackets of the form
$$[\partial_{u_{i+1}}, [\partial_{u_i}, F]] = -\partial_{u_{i+2}}.$$

In order to prove uniform projective spanning we first observe that
$$(\nabla F(u))_{\ell m} = DF_\ell(u)_m = u_{\ell-1}\,\delta_{m=\ell+1} + u_{\ell+1}\,\delta_{m=\ell-1} - u_{\ell-1}\,\delta_{m=\ell-2} - u_{\ell-2}\,\delta_{m=\ell-1},$$
hence it follows that for each $k \in \{1, \dots, J\}$ we have
$$H_k = \partial_{u_k} DF(u) = E_{k+1,k+2} + E_{k-1,k-2} - E_{k+1,k-1} - E_{k+2,k+1}.$$
The following lemma now implies projective spanning for Lorenz-96 when combined with Lemma 4.5.
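The matrix identities used in this subsection lend themselves to a quick numerical sanity check. The following is only a sketch under stated assumptions: it uses NumPy, 0-based indices taken modulo $J$, and helper names (`E`, `H`, `bracket`, `lorenz96_F`, `lie_algebra_dim`) that are introduced here for illustration and do not come from the paper. It builds the matrices $H_k$ from the formula above, checks that $\nabla F(u)\,u = 2F(u)$ (homogeneity of degree two), and estimates the dimension of the Lie algebra generated by the $H_k$ by closing their span under brackets with the generators.

```python
import numpy as np

def E(i, j, J):
    """Elementary J x J matrix: 1 in entry (i, j), 0 elsewhere (indices mod J)."""
    M = np.zeros((J, J))
    M[i % J, j % J] = 1.0
    return M

def bracket(A, B):
    """Matrix commutator [A, B] = AB - BA."""
    return A @ B - B @ A

def H(k, J):
    """H_k = d/du_k of the Jacobian of the Lorenz 96 nonlinearity."""
    return (E(k + 1, k + 2, J) + E(k - 1, k - 2, J)
            - E(k + 1, k - 1, J) - E(k + 2, k + 1, J))

def lorenz96_F(u):
    """F_l(u) = (u_{l+1} - u_{l-2}) * u_{l-1}, periodic in l."""
    return (np.roll(u, -1) - np.roll(u, 2)) * np.roll(u, 1)

def lie_algebra_dim(generators, tol=1e-9):
    """Dimension of the matrix Lie algebra generated by `generators`.

    Keeps a Frobenius-orthonormal basis and brackets every new basis
    element with each generator; nested brackets of the generators span
    the generated Lie algebra, so the loop stops once the span is closed.
    """
    shape = generators[0].shape
    basis = []
    queue = [g.copy() for g in generators]
    while queue:
        x = queue.pop().ravel().copy()
        for _ in range(2):              # two Gram-Schmidt passes for stability
            for b in basis:
                x -= np.dot(b, x) * b
        norm = np.linalg.norm(x)
        if norm > tol:
            x /= norm
            basis.append(x)
            queue.extend(bracket(g, x.reshape(shape)) for g in generators)
    return len(basis)

if __name__ == "__main__":
    J = 9  # illustrative choice of dimension
    gens = [H(k, J) for k in range(J)]
    print("dim Lie(H_1..H_J) =", lie_algebra_dim(gens), "| J^2 - 1 =", J * J - 1)
```

The printed dimension can be compared with $J^2 - 1 = \dim \mathfrak{sl}(\mathbb{R}^J)$; a floating-point rank computation of this kind is of course only a consistency check, not a substitute for the algebraic argument.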

Lemma 4.9.

The following holds:
$$\mathrm{Lie}(H_1, \dots, H_J) = \mathfrak{sl}(\mathbb{R}^J).$$
Proof.

Throughout, we regard the indices in $E_{i,j}$ modulo $J$, so that $E_{i+kJ,\,j+\ell J} = E_{i,j}$ for all $i, j, k, \ell$. Let $\mathfrak{g}$ denote the smallest Lie algebra containing $\{H_k\}$. To start, let $1 \le k \le J$. We compute
$$[H_k, H_{k+4}] = E_{k+3,k+1},$$
hence $E_{k,k-2} \in \mathfrak{g}$ for all $1 \le k \le J$. Continuing,
$$[H_k, E_{k-2,k-4}] = E_{k-1,k-4},$$
hence $E_{k,k-3} \in \mathfrak{g}$ for all $k$. Inductively, assuming $E_{k,k-\ell} \in \mathfrak{g}$, we have that
$$[H_k, E_{k-2,\,k-(\ell+2)}] = E_{k-1,\,k-1-(\ell+1)}, \tag{4.6}$$
hence $E_{k,k-(\ell+1)} \in \mathfrak{g}$ for all $k$. The induction step in (4.6) continues to hold as long as $k - (\ell+2)$ is disjoint from $\{k-1, k+1, k+2\}$ modulo $J$, which is assured so long as $\ell < J - 4$.

Fix $\ell \in \{2, 3, \cdots, J-4\}$ so that $J - \ell$ is co-prime to $J$. In particular, $\{1, 1-\ell, 1-2\ell, \cdots, 1-(J-1)\ell\}$ coincides with the complete set of residue classes $\{0, 1, \cdots, J-1\}$ in $\mathbb{Z}/J\mathbb{Z}$. Since
$$\left\{E_{1,\,1-\ell},\ E_{1-\ell,\,1-2\ell},\ \cdots,\ E_{1-(J-2)\ell,\,1-(J-1)\ell},\ E_{1-(J-1)\ell,\,1}\right\} \subset \mathfrak{g}$$
is really just a re-ordering of the generating set identified in Remark 4.7, we conclude $\mathfrak{g} = \mathfrak{sl}(\mathbb{R}^J)$.

The goal of this section is to complete the proof of Proposition 1.15 described in Section 1. Denote by $\Phi^t : \mathbb{R}^n \to \mathbb{R}^n$, $t \ge 0$, the (deterministic) flow of diffeomorphisms solving the Euler-like initial value problem
$$\dot x_t = B(x_t, x_t), \qquad x_0 = x \in \mathbb{R}^n, \tag{5.1}$$
where $B : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^n$ is a bilinear mapping for which $x \cdot B(x,x) = 0$ and $\operatorname{div} B(x,x) = 0$. As in Section 1 let $\hat\Phi^t : S\mathbb{R}^n \to S\mathbb{R}^n$ be the associated projective flow defined by
$$\hat\Phi^t(x,v) = \left(\Phi^t(x),\ \frac{D_x\Phi^t v}{|D_x\Phi^t v|}\right).$$
To start, in Section 5.1 we collect preliminaries regarding the Euler-like class (5.1). In Section 5.2 we then recall some general linear cocycle theory ruling out the existence of absolutely continuous invariant probabilities for generalized projective actions. Finally, Section 5.3 completes the proof of Proposition 1.15.

5.1 Preliminaries for Euler-like systems

For our use, we record below the following simple consequences of the special Euler-like structure imposed by (5.1). Some notation: for $E > 0$ let us write $S_E := \{x \in \mathbb{R}^d : |x| = E\}$ for the "energy shells" preserved by the flow $\Phi^t$, i.e., $\Phi^t(S_E) = S_E$ for all $t \ge 0$, $E > 0$. Write $E(x) = |x|$.

Lemma 5.1.

Let $x \in \mathbb{R}^d$. Then the following identity holds:
$$D\Phi^t(x)\,x = \Phi^t(x) + t\,B(\Phi^t(x), \Phi^t(x)). \tag{5.2}$$
Moreover, for each $x \in \mathbb{R}^d$ and $t \ge 0$, we have that
$$|D\Phi^t(x)| \ge \frac{t\,|B(\Phi^t(x), \Phi^t(x))|}{|x|}. \tag{5.3}$$
Proof of Lemma.

For a given $\alpha > 0$, note that the rescaled flow $\alpha\Phi^{\alpha t}(x)$ also solves (5.1) with initial data $\alpha x$. Therefore by uniqueness, we have
$$\Phi^t(\alpha x) = \alpha\,\Phi^{\alpha t}(x). \tag{5.4}$$
Taking the derivative with respect to $\alpha$ on both sides of (5.4) yields
$$D\Phi^t(\alpha x)\,x = \Phi^{\alpha t}(x) + \alpha t\,B(\Phi^{\alpha t}(x), \Phi^{\alpha t}(x)).$$
Setting $\alpha = 1$ gives (5.2). Inequality (5.3) follows from (5.2) and the fact that $\Phi^t(x) \cdot B(\Phi^t(x), \Phi^t(x)) = 0$ for all $x$, by assumption.

The following emphasizes the shearing between energy surfaces used in the sequel:

In this section, we will state everything in the following abstract linear cocycle setting. Throughout, $T : (X, \mathcal{B}, m) \circlearrowleft$ is a (discrete-time) continuous transformation of a compact metric space $X$, with $\mathcal{B}$ the Borel $\sigma$-algebra. Let $A : (X, \mathcal{B}) \to SL_d(\mathbb{R})$, $x \mapsto A_x$ be a measurable mapping. This generates the cocycle of linear operators $\mathcal{A} : X \times \mathbb{Z}_{\ge 0} \to SL_d(\mathbb{R})$ defined by
$$\mathcal{A}(n,x) = A^n_x := A_{T^{n-1}x}\,A_{T^{n-2}x} \cdots A_{Tx}\,A_x.$$
Note that $\mathcal{A}$ satisfies the cocycle identity $A^{m+n}_x = A^m_{T^n x}\,A^n_x$ for all $m, n \ge 0$, $x \in X$. Associated to $T, \mathcal{A}$ is the projective action $\hat T : X \times S^{d-1} \circlearrowleft$ defined by
$$(x,v) \mapsto \left(Tx,\ \frac{A_x v}{|A_x v|}\right), \qquad x \in X,\ v \in S^{d-1},$$
which we regard as a dynamical system on $X \times S^{d-1}$ in its own right.

Let $\hat m$ be any $\hat T$-invariant measure on $X \times S^{d-1}$ projecting to $m$ (i.e., $\hat m(K \times S^{d-1}) = m(K)$ for all measurable $K \subset X$), and consider its disintegration
$$d\hat m(x,v) = d\hat m_x(v)\,dm(x).$$

Here, $SL_d(\mathbb{R})$ is the group of $d \times d$ real matrices of determinant 1. When $T$ is a smooth mapping of a manifold and $A^n_x := D_x(T^n)$ is the so-called derivative cocycle, the cocycle identity is merely the chain rule for $T^n$.
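To make the cocycle notation concrete, here is a minimal toy sketch (not taken from the paper). The base map `T` (a circle rotation), the matrix field `A` and all helper names are assumptions chosen purely for illustration; the code builds $A^n_x$ as the ordered product above, and the projective action as the normalized image of a unit vector.

```python
import numpy as np

def T(x):
    """Toy base map: rotation of the circle [0, 1) by sqrt(2) mod 1."""
    return (x + np.sqrt(2.0)) % 1.0

def A(x):
    """Toy map x -> SL_2(R): a shear composed with a rotation (determinant 1)."""
    c, s = np.cos(2 * np.pi * x), np.sin(2 * np.pi * x)
    rot = np.array([[c, -s], [s, c]])
    shear = np.array([[1.0, 1.0 + x], [0.0, 1.0]])
    return shear @ rot

def iterate(n, x):
    """T^n x."""
    for _ in range(n):
        x = T(x)
    return x

def cocycle(n, x):
    """A^n_x = A_{T^{n-1} x} ... A_{T x} A_x (the identity for n = 0)."""
    M = np.eye(2)
    for _ in range(n):
        M = A(x) @ M   # newest factor multiplies on the left
        x = T(x)
    return M

def projective_step(x, v):
    """One step of the projective action (x, v) -> (T x, A_x v / |A_x v|)."""
    w = A(x) @ v
    return T(x), w / np.linalg.norm(w)

if __name__ == "__main__":
    x0, m, n = 0.3, 2, 3
    lhs = cocycle(m + n, x0)
    rhs = cocycle(m, iterate(n, x0)) @ cocycle(n, x0)
    print("cocycle identity holds:", np.allclose(lhs, rhs))
```

Iterating `projective_step` $n$ times from $(x, v)$ gives the same direction as normalizing $A^n_x v$, which is the sense in which $\hat T$ is driven by the cocycle.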

In this context, it is well-known [22, 64] that disintegrations $(\hat m_x)_{x \in X}$ exist and are essentially unique (up to $m$-measure zero modifications) and $x \mapsto \hat m_x$ is weak-* measurably varying. Note that invariance of $\hat m$ implies that
$$(A_x)_* \hat m_x = \hat m_{Tx} \quad \text{for } m\text{-a.e. } x \in X,$$
where for a $d \times d$ matrix $A$ we write $A_*$ for the action of $A$ on probability measures on $S^{d-1}$.

The following result (more-or-less Theorem 3.23 of [6], up to a technical issue; see below) involves the rigidity of absolute continuity of the disintegration measures $\hat m_x$ with respect to $\mathrm{Leb}_{S^{d-1}}$.

Theorem 5.2.

Assume that $\hat m_x \ll \mathrm{Leb}_{S^{d-1}}$ for $m$-almost every $x \in X$. Then, there exists a measurable family of inner products $X \ni x \mapsto g_x(\cdot,\cdot)$ on $\mathbb{R}^d$ and a $T$-invariant set $\Gamma \subset X$ of full $m$-measure such that for all $x \in \Gamma$ and $v, w \in \mathbb{R}^d$, we have that
$$g_{Tx}(A_x v, A_x w) = g_x(v,w).$$
That is, $A_x : (\mathbb{R}^d, g_x) \to (\mathbb{R}^d, g_{Tx})$ is an isometry.

This is slightly different from the form in Theorem 3.23 of [6]: there, it is supposed that $\hat m_x \sim \mathrm{Leb}_{S^{d-1}}$, whereas for our purposes we need the version with "$\ll$". For this reason, as well as for the sake of completeness, we sketch the proof of Theorem 5.2 here.

Proof sketch.

To start, let us assume for now that $T : (X, \mathcal{B}, m) \circlearrowleft$ is ergodic (note that we do not assume $\hat m$ is ergodic). We require the following Lemma:

Lemma 5.3 (Corollary 3.7 in [6]; Lemma 6.2 in [27]). Assume $(X, \mathcal{B}, m, T)$ is ergodic. Then, there is a full $m$-measure set of $x_0 \in X$ with the following property: there exists a measurable mapping $G : X \to SL_d(\mathbb{R})$, depending on the choice of $x_0$, such that $G(x)_* \hat m_{x_0} = \hat m_x$ for $m$-almost every $x \in X$.

This version is slightly different from those appearing in [6, 27], so we briefly recall the proof below.

Proof sketch of Lemma.

Let $\mathcal{P}(S^{d-1})$ denote the space of probability measures on $S^{d-1}$ with the weak* topology. Consider the quotient $\mathcal{P}(S^{d-1})/SL_d(\mathbb{R})$, i.e., for $\xi, \eta \in \mathcal{P}(S^{d-1})$ we set $\xi \sim \eta$ iff there exists $B \in SL_d(\mathbb{R})$ so that $B_*\xi = \eta$. Writing $[\eta]$ for the equivalence class of $\eta \in \mathcal{P}(S^{d-1})$, note that $[\hat m_x] = [\hat m_{T^k x}]$ for all $k$, i.e., $x \mapsto [\hat m_x]$ is constant along orbits. By Corollary 3.2.12 in [76], the Borel $\sigma$-algebra on the quotient space $\mathcal{P}(S^{d-1})/SL_d(\mathbb{R})$ is countably generated. Using this along with the fact that $T : (X, m) \circlearrowleft$ is ergodic, it follows from a standard argument that $[\hat m_x]$ is constant $m$-almost surely. In particular, for $m$-a.e. $x, x' \in X$, the measures $\hat m_x$ and $\hat m_{x'}$ are related by the application of a matrix in $SL_d(\mathbb{R})$. It is now straightforward to construct a measurable selection $G : X \to SL_d(\mathbb{R})$ as above.

Fix $x_0$ so that $\hat m_{x_0} \ll \mathrm{Leb}_{S^{d-1}}$ and let $G$ be as in Lemma 5.3. Observe that for any $n \ge 0$ and $m$-a.e. $x \in X$ we have that
$$G(T^n x)^{-1} A^n_x\, G(x) \in H_{x_0}, \quad \text{where} \quad H_{x_0} := \{B \in SL_d(\mathbb{R}) : B_* \hat m_{x_0} = \hat m_{x_0}\}.$$
Observe that $H_{x_0}$ is a subgroup of $SL_d(\mathbb{R})$, which we claim to be compact. If not, then a lemma of Furstenberg (see, e.g., Claim 4.8 in [15]) would imply the existence of proper subspaces $V_1, V_2 \subset \mathbb{R}^d$ and a sequence $\{B_n\} \subset H_{x_0}$ so that $\operatorname{dist}(B_n v, V_1) \to 0$ for all $v \notin V_2$, which would contradict $\hat m_{x_0} \ll \mathrm{Leb}_{S^{d-1}}$. Since $H_{x_0}$ is compact, there exists an inner product $\langle \cdot, \cdot \rangle$ on $\mathbb{R}^d$ with respect to which all members of $H_{x_0}$ are isometries (Lemma 4.6 in [15]). The proof is complete on defining $g_x$ through
$$g_x(v,w) = \langle G(x)^{-1} v,\ G(x)^{-1} w \rangle. \tag{5.5}$$

To handle the case when $m$ is not ergodic, we use the ergodic decomposition [70]
$$m(\cdot) = \int_{\xi \in \mathcal{E}_T(X)} \xi(\cdot)\, d\tau_m(\xi),$$
where $\mathcal{E}_T(X)$ is the space of $T$-ergodic measures on $X$ and $\tau_m$ a Borel measure (w.r.t. the weak* topology) on $\mathcal{E}_T(X)$.
For each component $\xi$, we define $\hat\xi$ through the formula
$$d\hat\xi(x,v) = d\hat m_x(v)\, d\xi(x),$$
noting that $\hat m_x \ll \mathrm{Leb}_{S^{d-1}}$ for $\xi$-a.e. $x \in X$ and $\tau_m$-a.e. $\xi \in \mathcal{E}_T(X)$. The proof now goes through the same as before, the only difference being that the measurable inner product (5.5) is defined along each $\xi \in \mathcal{E}_T(X)$ one at a time.

To start, let $\nu$ be $\hat\Phi^t$-invariant, projecting to an absolutely continuous measure $\mu$ on $\mathbb{R}^n$, and assume that $\nu = \nu_{ac} + \nu_\perp$ where $\nu_{ac} \ll \mathrm{Leb}_{S\mathbb{R}^n}$ is not identically zero (our contradiction hypothesis), while $\nu_\perp$ is singular. Since $\hat\Phi^t$ sends absolutely continuous measures to absolutely continuous measures and singular to singular, it follows that $\nu_{ac}$ is $\hat\Phi^t$-invariant. Since $\nu_{ac} \le \nu$, the measure $\mu_{ac}(K) := \nu_{ac}(K \times S^{n-1})$ satisfies $\mu_{ac} \ll \mu \ll \mathrm{Leb}_{\mathbb{R}^n}$ and is likewise $\Phi^t$-invariant. On replacing $\nu$ with the normalization of $\nu_{ac}$, going forward we may assume without loss that $\nu \ll \mathrm{Leb}_{S\mathbb{R}^n}$. Finally, since the energy shells $S_E = \{|x| = E\}$ are invariant, we may replace $\nu$ with the normalization of its restriction to $B(0,R) \times S^{d-1}$ for some large, fixed $R > 0$.

Continuing, let $d\nu(x,v) = d\nu_x(v)\, d\mu(x)$ denote the disintegration measures of $\nu$ and note that $\nu_x \ll \mathrm{Leb}_{S^{n-1}}$ for $\mu$-a.e. $x$. By Theorem 5.2, there exists a measurable family of inner products $g_x$, $x \in \mathbb{R}^n$, so that
$$D\Phi^1(x) : (\mathbb{R}^n, g_x) \to (\mathbb{R}^n, g_{\Phi^1 x}) \tag{5.6}$$
is an isometry for $\mu$-a.e. $x$. By a standard procedure, we may assume that (5.6) holds for $x \in \Gamma$, where $\Gamma \subset \mathbb{R}^n$ satisfies $\mu(\Gamma) = 1$ and $\Phi^1(\Gamma) = \Gamma$.

For $L > 1$, define
$$\Gamma_L = \left\{x \in \Gamma : L^{-1} \le \frac{\sqrt{g_x(v,v)}}{|v|} \le L \ \text{for all } v \in S^{n-1}\right\} \cap \left\{x \in \Gamma : |B(x,x)| \ge L^{-1}\right\},$$
and note that if $x, \Phi^n x \in \Gamma_L$ for some $n \ge 1$, then $|D_x\Phi^n| \le L^2$ must hold by (5.6). Moreover, we have that $\mu(\Gamma_L) \nearrow \mu(\Gamma) = 1$ as $L \to \infty$. Observe that this relies on the assumption that $B$ is not identically 0, hence $|B(x,x)| > 0$ Lebesgue-a.e. (here, we use the standard fact that $\{B(x,x) = 0\}$ is a proper variety in $\mathbb{R}^n$, hence must have zero volume).

Fix $L$ large enough that $\mu(\Gamma_L) > 0$. By the Poincaré Recurrence Theorem, $\mu$-a.e. $x \in \Gamma_L$ visits $\Gamma_L$ infinitely many times. Fix such an $x \in \Gamma_L \setminus \{0\}$ and let $n_1 < n_2 < n_3 < \cdots$, $\lim_{\ell \to \infty} n_\ell = \infty$, be such that $\Phi^{n_\ell}(x) \in \Gamma_L$ for all $n_\ell$, hence $|D\Phi^{n_\ell}(x)| \le L^2$ for all such $n_\ell$. On the other hand, (5.3) implies
$$|D\Phi^{n_\ell}(x)| \ge \frac{n_\ell}{L\,|x|} \to \infty \quad \text{as } n_\ell \to \infty,$$
a contradiction. This completes the proof of Proposition 1.15.

A Qualitative properties of the projective stationary measure

In this section we record basic properties of the SDE (1.10).

Theorem A.1.

Suppose that { ˜ X , ˜ X , ..., ˜ X r } satisﬁes the uniform parabolic H¨ormander condition on SR n as in Deﬁnition 3.1. For all ǫ > , the SDE (1.10) satisﬁes Assumptions 1 and 2. Moreover, the stationarymeasure of the ( w t ) process f ǫ has a smooth density with respect to Lebesgue measure f ǫ ∈ L ∩ L ∩ C ∞ with f ǫ log f ǫ ∈ L , the marginal ˆ S n − f ǫ ( x, v )d v = ρ ǫ ( x ) and ∃ C, γ > such that ∀ ǫ ∈ (0 , , ˆ S M f ǫ e γ | x | d q < C. (A.1) Furthermore, the estimate in Assumption 1 (iii) holds for all ǫ > .Proof of Theorem A.1. Claims (i) and (ii) of Assumption 1 are standard or proved in [16]. The proof ofAssumption 1 (iii) follows by providing suitable moment estimates on log (cid:12)(cid:12) det D Φ t (cid:12)(cid:12) and log (cid:12)(cid:12) D Φ t (cid:12)(cid:12) usingthe SDE derived in the proof of Proposition 2.4. The results of Assumption 2 follow from similar methods(including (A.1), though see below)However, we are not aware of any proof of f ǫ ∈ L or f ǫ log f ǫ ∈ L in the literature and we thereforeinclude the proof. For this we use some ideas that appear in [16]. As in [16], a convenient method to justifymany formal calculations begins by ﬁrst regularizing the problem by adding elliptic Brownian motions.Recall that the generator ˜ L for the projective process ( w t ) is given by ˜ L = ˜ X + 12 r X k =1 X k . We then regularize this by the perturbing the generator ˜ L δ = ˜ L + δ x,v , where ∆ x,v = ∆ x + ∆ v with ∆ x the usual Laplacian on R n and ∆ v the Laplace-Beltrami operator on S n − .This corresponds to perturbing the SDE (1.10) by a non-degenerate √ δ Brownian motion on SR n . It is nothard to show that ˜ L δ satisﬁes a drift condition ˜ L e γ | x | ≤ − αe γ | x | + K for some α ∈ (0 , , K ≥ (uniformly in ǫ, δ ). 
This gives rise to a globally deﬁned Markov process ( w δt ) .Moreover for a given initial density f ∈ C ∞ c ( SR d ) with ´ f d q = 1 and f ≥ , such that Law( w δ ) = f wedenote f t = Law( w δt ) , which solves the forward Kolmogorov equation ∂ t f t = ˜ L ∗ f t + 12 δ ∆ x,v f t = 0 . (A.2)From the drift condition we have, ∀ γ sufﬁciently small, ∃ α ∈ (0 , such that (uniformly in ǫ, δ ), ˆ SR d f t e γ | x | d q . e − αt ˆ SR d f e γ | x | d q. (A.3)41et ¯ χ ∈ C ∞ c ( B (0 , with ≤ ¯ χ ≤ , and ¯ χ = 1 for x ≤ / and deﬁne χ ( x ) = χ ( x/ − χ ( x ) . Deﬁne χ j = χ (2 − j x ) , which deﬁnes the partition of unity χ + P ∞ j =0 χ j ( x ) . From energy estimates on (A.2)we have the following, dd t || f t || L + δ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ( − ∆ x,v ) / f t (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) L . || (1 + | x | ) f || L . || ¯ χf || L + ∞ X j =1 j || χ j f || L , (in order to justify such estimates one may apply smooth, v -independent radially symmetric cut-offs to thenonlinearity and pass to the limit). By the Gagliardo-Nirenberg-Sobolev inequality, ∃ θ ∈ (0 , (that suchan inequality holds is veriﬁed through the local coordinates and the fact that the estimates on the metric areuniform over the manifold), dd t || f t || L + δ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ( − ∆ x,v ) / f t (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) L . || ¯ χf t || L + ∞ X j =1 j || χ j f t || L . (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ( − ∆ x,v ) / ¯ χf t (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) θL || ¯ χf t || − θL + ∞ X j =1 j (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ( − ∆ x,v ) / χ j f t (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) θL || χ j f t || − θL . 
(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ( − ∆ x,v ) / f t (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) θL + ||∇ x ¯ χf t || θL + ∞ X j =1 j (cid:18)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ( − ∆ x,v ) / f t (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) θL + ||∇ x χ j f t || θL (cid:19) || χ j f t || − θL , and hence dd t || f t || L + δ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ( − ∆ x,v ) / f t (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) L . δ || f t || − θL (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ( − ∆ x,v ) / f t (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) θL + ˆ SR d f t e γ | x | d q. Hence, from (A.3), there holds for some q > , t ˆ t (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ( − ∆ x,v ) / f τ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) L d τ . δ − q ˆ SR d f e γ | x | d q. Combined with the uniform drift condition, this allows to pass to the limit t → ∞ and conclude that theunique stationary measure, denoted below as f ǫ,δ is in H ( SR d ) ; we note that f ǫ,δ is a smooth solution ofthe Kolmogorov equation (cid:18) ˜ L ∗ + δ x,v (cid:19) f ǫ,δ = 0 . (A.4)Next, we obtain an L estimate that is uniform in δ in order to pass to the δ → limit. For this, we clearlyneed to depend on hypoelliptic regularity. Deﬁne the regularized H ¨ormander norm pair (see discussions in[4, 16, 29] for motivations), || w || H δ := || w || L + r X k =1 || X k w || L + δ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ( − ∆ x,v ) / w (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) L || w || H ∗ δ := sup ϕ : || ϕ || H δ ≤ (cid:12)(cid:12)(cid:12)(cid:12) ˆ SR n ( ˜ X ϕ ) w d q (cid:12)(cid:12)(cid:12)(cid:12) The proof is similar to [Lemma 2.3; [16]] provided we have the following quantiﬁcation of H ¨ormander’sinequality. 42 emma A.2 (Quantitative H ¨ormander inequality for the projective process) . 
Suppose that { ˜ X , ˜ X , ..., ˜ X r } satisﬁes the uniform parabolic H¨ormander condition on B (0 , × S n − as in Deﬁnition 3.1. There exists s > and q > , such that for any R ≥ , w ∈ C ∞ ( B R × S n − ) and δ ∈ [0 , there holds k w k H s . R q ( || w || H δ + || w || H ∗ δ ) , (A.5) where both s > and the implicit constant do not depend on ǫ , δ , or R , where here we denote in analogywith (3.2) (for dim SR d = m ), for ˜ w j = χ j w ◦ x j as deﬁned therein), || w || H s = || w || L + X j ˆ | h | <δ ˆ R n × S n − | ˜ w j ( q + h ) − ˜ w j ( q ) | | h | m +2 s J j ( q )d q d h  / Proof.

The proof begins with a re-scaling as in [Lemma 3.2; [16]]. Deﬁne h ( x, v ) = w ( Rx, v ) which solvesa PDE of the following form for suitable vector ﬁelds N , V , Y , (denoting ∆ x,v as the Laplace-Beltramioperator, which note is invariant under this scaling) ǫδ ∆ x,v h + 12 r X j =1 ǫ ( ˜ X ∗ j ) h − N h + R − V ∗ h − ǫR Y ∗ h = 0 . where N ( x ) = B ( x, x ) , Y ( x ) = Ax and V ( x, v ) = Π v ∇ F ( x ) v , and their action on h is interpreted asa differential operator. We see that the proof here is more subtle than in the corresponding [Lemma 3.2;[16]] as R − V is required to span the directions in projective space. From Proposition 4.2, we see that thespanning in x and v can be considered essentially separately, ﬁrst choosing brackets to span in x and thencorrecting by choosing suitable brackets in m x ( X ; X , . . . , X r ) to span in v . Using this structure we seethat given a vector ﬁeld Z ∈ X ( SR n ) and q ∈ B (0 , × S n − , there exists p j < ... < p < p ≤ k (with k as in Deﬁnition 3.1) such that for q in a neighborhood of q , there are ﬁnitely many smooth coefﬁcients c j and vectors Z j ∈ X k with Z ( q ) = X j R p j c j ( q ) Z j ( q ) , where if Z varies in a bounded set in C m , then { c j } j varies in a similarly bounded set as well. A carefulreading of [29] shows that this introduces powers of R matching the powers of t into all of the estimates in[Sections 4 and 5; [29]], the maximal power arising being R k . In particular, the error estimates come in theform O ( R k/σ ) , provided that R k t < and < σ < s ∗ as in [29]. This restriction on t in the estimatesfurther introduces only polynomial dependence on R , as for any Z ∈ X ( SR n ) , sup | t |≤ | t | − σ (cid:12)(cid:12)(cid:12)(cid:12) e tZ g − g (cid:12)(cid:12)(cid:12)(cid:12) L . R kσ || g || L + sup | t |≤ R − k | t | − σ (cid:12)(cid:12)(cid:12)(cid:12) e tZ g − g (cid:12)(cid:12)(cid:12)(cid:12) L . 
Combining the above observations with those of [29] implies that the constant in (A.5) remains polynomialin R (exponential would also be sufﬁcient for our purposes, as we only use that the constant is boundedabove by e ηR for η < γ ).Once one has Lemma A.2, the proof of Theorem A.1 follows easily, given that we are not seeking ǫ -independent bounds, as these such bounds will be false for all but the most degenerate models (see [Lemma2.4; [16]] for the corresponding argument on ρ ǫ , which does yield ǫ -independent estimates). Let ¯ χ ∈ C ∞ c ( B (0 , with ≤ ¯ χ ≤ , and ¯ χ = 1 for x ≤ / and deﬁne χ ( x ) = χ ( x/ − χ ( x ) . Deﬁne χ j = χ (2 − j x ) , which deﬁnes the partition of unity χ + P ∞ j =0 χ j ( x ) .43or s as in Lemma A.2, let θ ∈ (0 , be such that for any g ∈ C ∞ c (that such an inequality holds on SR d is veriﬁed again through the local coordinates and the fact that the estimates on the metric are uniformover the manifold), || g || L . || g || θL || g || − θH s . We now obtain a uniform-in- δ L estimate. By Lemma A.2, (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) f ǫ,δ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) L . (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ¯ χf ǫ,δ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) − θH Hyp ,δ + ∞ X j =1 jq (1 − θ ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) χ j f ǫ,δ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) − θH Hyp ,δ , where we have denoted k · k H Hyp ,δ = k · k H δ + k · k H ∗ δ . Pairing (A.4) with ¯ χf ǫ,δ and χ j f ǫ,δ followed bystandard manipulations gives (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ¯ χf ǫ,δ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) H Hyp ,δ + sup j (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) χ j f ǫ,δ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) H Hyp ,δ . ǫ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) f ǫ,δ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) L . 
Therefore, we have (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) f ǫ,δ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) θL . ∞ X j =1 jq (1 − θ ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) χ j f ǫ,δ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) θL . (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) e | x | f ǫ,δ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) θL . . Hence, we have a uniform-in- δ estimate on the L norm. Note that the estimate still depends badly on ǫ .Passing to the δ → limit shows that f ǫ ∈ L for each ǫ > . Finally, observe that f ǫ log f ǫ ∈ L , indeed, ˆ SR n f ǫ | log f ǫ | d q . ˆ SR n p f ǫ + ( f ǫ ) d q . (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) f ǫ e γ | x | (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) L + || f ǫ || L . This completes the proof of Theorem A.1.
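For the reader's convenience, the elementary inequalities behind the last estimate can be spelled out. This is only a sketch: the shorthand $W$ for the exponential weight appearing in (A.1) and the explicit constant $2/e$ are introduced here for illustration. For $t > 0$,
$$t\,|\log t| \le \frac{2}{e}\sqrt{t} \quad (0 < t \le 1), \qquad t\,|\log t| \le t^2 \quad (t \ge 1),$$
since $\sqrt{t}\,|\log t|$ attains its maximum $2/e$ on $(0,1]$ at $t = e^{-2}$, and $\log t \le t$ for $t \ge 1$. Hence $f^\epsilon |\log f^\epsilon| \le \sqrt{f^\epsilon} + (f^\epsilon)^2$ pointwise. The square-root term is then controlled by the Cauchy-Schwarz inequality,
$$\int_{S\mathbb{R}^n} \sqrt{f^\epsilon}\, dq = \int_{S\mathbb{R}^n} \left(f^\epsilon W\right)^{1/2} W^{-1/2}\, dq \le \left(\int_{S\mathbb{R}^n} f^\epsilon W\, dq\right)^{1/2} \left(\int_{S\mathbb{R}^n} W^{-1}\, dq\right)^{1/2},$$
where the first factor is finite by the weighted moment bound and the second because $W^{-1}$ decays exponentially in $|x|$ and the fibers $S^{n-1}$ are compact.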

B Proof of Proposition 4.2

Before we prove Proposition 4.2, we will need some preliminary results. As we will be taking commutators of the above vector fields, it is important to record how projective vector fields behave under the Lie bracket.
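The bracket rule recorded in Lemma B.1 below, $[V_A, V_B] = -V_{[A,B]}$, can be sanity-checked numerically before reading the proof. The sketch below makes several assumptions that are not from the paper: it uses NumPy, extends $V_A(v) = Av - \langle v, Av\rangle v$ to all of $\mathbb{R}^n$ by the same formula (which stays tangent to the unit sphere), and computes the Lie bracket in the ambient space as $DV_B\,V_A - DV_A\,V_B$ with central-difference Jacobians. The helper names are illustrative.

```python
import numpy as np

def V(A, v):
    """Projective vector field V_A(v) = A v - <v, A v> v (equals Pi_v A v when |v| = 1)."""
    return A @ v - np.dot(v, A @ v) * v

def jacobian(f, v, h=1e-5):
    """Central-difference Jacobian of f: R^n -> R^n at the point v."""
    n = v.size
    Jm = np.zeros((n, n))
    for i in range(n):
        e = np.zeros(n)
        e[i] = h
        Jm[:, i] = (f(v + e) - f(v - e)) / (2 * h)
    return Jm

def lie_bracket(A, B, v):
    """[V_A, V_B](v) = D V_B(v) V_A(v) - D V_A(v) V_B(v), evaluated in R^n."""
    DVA = jacobian(lambda w: V(A, w), v)
    DVB = jacobian(lambda w: V(B, w), v)
    return DVB @ V(A, v) - DVA @ V(B, v)

def random_traceless(n, rng):
    """Random element of sl(R^n)."""
    M = rng.standard_normal((n, n))
    return M - (np.trace(M) / n) * np.eye(n)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 4
    A, B = random_traceless(n, rng), random_traceless(n, rng)
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)
    gap = np.linalg.norm(lie_bracket(A, B, v) + V(A @ B - B @ A, v))
    print("|[V_A, V_B] + V_[A,B]| =", gap)
```

Because both fields are tangent to the sphere, the ambient bracket restricted to $|v| = 1$ agrees with the intrinsic bracket on $S^{n-1}$, so a small value of `gap` is consistent with the sign convention in the lemma.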

Lemma B.1.

Let $A, B \in \mathfrak{sl}(\mathbb{R}^n)$. Then the following identity holds:
$$[V_A, V_B](v) = -V_{[A,B]}(v),$$
where $[A,B] := AB - BA$ denotes the usual commutator on linear operators.

Proof. Let $\nabla$ denote the Levi-Civita connection on $S^{n-1}$. Since $\nabla$ is torsion-free, we have the following formula for the Lie bracket in terms of the covariant derivative:
$$[V_A, V_B] = \nabla_{V_A} V_B - \nabla_{V_B} V_A.$$
Recall from the proof of Lemma 2.3 that using the embedding of $S^{n-1}$ into $\mathbb{R}^n$, we have the following formula for the total covariant derivative of $V_A$ (viewed as a linear operator on $T_v S^{n-1}$):
$$\nabla V_A(v) = \Pi_v A - \langle v, Av \rangle I.$$
It follows that
$$[V_A, V_B](v) = \nabla V_B(v)\, V_A(v) - \nabla V_A(v)\, V_B(v) = \Pi_v B\, \Pi_v A v - \Pi_v A\, \Pi_v B v - \langle v, Bv \rangle V_A(v) + \langle v, Av \rangle V_B(v).$$
Using $\Pi_v u = u - \langle u, v \rangle v$ for $u \in T_xM$, we find
$$\Pi_v B\, \Pi_v A v + \langle v, Av \rangle V_B(v) = \Pi_v B A v \quad \text{and} \quad \Pi_v A\, \Pi_v B v + \langle v, Bv \rangle V_A(v) = \Pi_v A B v,$$
hence
$$[V_A, V_B] = V_{BA} - V_{AB} = -V_{[A,B]}.$$

Of fundamental importance is the following observation for the lifting operation $X \mapsto \tilde X$.

Lemma B.2.

Any two vector ﬁelds

X, Y ∈ X ( M ) satisfy the identity [ ˜ X, ˜ Y ] = ^ [ X, Y ] . Thus the lifting operation X ˜ X is a Lie algebra isomorphism onto its range.Proof. Denote by ˆ X ( x, v ) = (cid:18) X ( x )0 (cid:19) , ˆ V ∇ X ( x, v ) = (cid:18) V ∇ X ( x ) ( v ) (cid:19) the extensions of the vector ﬁelds X and V ∇ X to vector ﬁelds on the sphere bundle S M and let U ( x, v ) = (cid:18) v (cid:19) be the ‘canonical’ vector ﬁeld on S M . Note that U is parallel to ˆ X in the sense that ˜ ∇ ˆ X U = 0 . Let ˜ ∇ denote the Levi-Civita connection on S M induced by the Sasaki metric ˜ g , and deﬁne a projection ˜Π on T ( x,v ) S M = T x M ⊕ T v S x M by ˜Π ( x,v ) (cid:18) u u (cid:19) = (cid:18) v u (cid:19) Note that for any “horizontal” vector ﬁeld ˆ X , ˜ ∇ ˆ X ˜Π = 0 holds since ∇ preserves the metric g .Using the above notation, we can write ˆ V ∇ X = ˜Π ˜ ∇ U ˆ X . Since we can now split any lifted vector ﬁeld ˜ X as ˜ X = ˆ X + ˆ V ∇ X , the commutator of ˜ X and ˜ Y is [ ˜ X, ˜ Y ] = [ ˆ X, ˆ Y ] + [ ˆ X, ˆ V ∇ Y ] − [ ˆ Y , ˆ V ∇ X ] + [ ˆ V ∇ X , ˆ V ∇ Y ] . (B.1)Naturally we ﬁnd that [ ˆ X, ˆ Y ] = (cid:18) [ X, Y ]0 (cid:19) = \ [ X, Y ] . Likewise, a simple consequence of Lemma B.1 implies [ ˆ V ∇ X , ˆ V ∇ Y ] = − ˜Π[ ˜ ∇ ˆ X, ˜ ∇ ˆ Y ] U = ˜Π (cid:16) ˜ ∇ [ U, ˆ X ] ˆ Y − ˜ ∇ [ U, ˆ Y ] ˆ X (cid:17) , where above [ ˜ ∇ ˆ X, ˜ ∇ ˆ Y ] denotes the commutator of ˜ ∇ ˆ X, ˜ ∇ ˆ Y as linear endomorphisms on a ﬁxed tangentspace T ( x,v ) S M . The remaining terms in (B.1) can be computed as [ ˆ X, ˆ V ∇ Y ] − [ ˆ Y , ˆ V ∇ X ] = ˜ ∇ ˆ X ˆ V ∇ Y − ˜ ∇ ˆ Y ˆ V ∇ X = ˜Π (cid:16) ˜ ∇ ˆ X ˜ ∇ U ˆ Y − ˜ ∇ ˆ Y ˜ ∇ U ˆ X (cid:17) . [ ˜ X, ˜ Y ] = \ [ X, Y ] + ˜Π (cid:16) ˜ ∇ ˆ X ˜ ∇ U ˆ Y − ˜ ∇ ˆ Y ˜ ∇ U ˆ X + ˜ ∇ [ U, ˆ X ] ˆ Y − ˜ ∇ [ U, ˆ Y ] ˆ X (cid:17) The proof will be complete once we show the identity ˜ ∇ ˆ X ˜ ∇ U ˆ Y − ˜ ∇ ˆ Y ˜ ∇ U ˆ X + ˜ ∇ [ U, ˆ X ] ˆ Y − ˜ ∇ [ U, ˆ Y ] ˆ X = ˜ ∇ U \ [ X, Y ] . 
(B.2)For this, we can use the Riemann curvature tensor on S M ˜ R ( X, Y ) Z := ˜ ∇ X ˜ ∇ Y Z − ˜ ∇ Y ˜ ∇ X Z − ˜ ∇ [ X,Y ] Z to change the order of covariant derivatives, giving ˜ ∇ ˆ X ˜ ∇ U ˆ Y − ˜ ∇ ˆ Y ˜ ∇ U ˆ X + ˜ ∇ [ U, ˆ X ] ˆ Y − ˜ ∇ [ U, ˆ Y ] ˆ X = ˜ R ( ˆ X, U ) ˆ Y − ˜ R ( ˆ Y , U ) ˆ X + ˜ ∇ U ˜ ∇ ˆ X ˆ Y − ˜ ∇ U ˜ ∇ ˆ Y ˆ X = ˜ R ( ˆ X, U ) ˆ Y + ˜ R ( U, Y ) ˆ X + ∇ U [ ˆ X, ˆ Y ] . The ﬁrst Bianchi identity implies that ˜ R ( ˆ X, U ) ˆ Y + ˜ R ( U, ˆ Y ) ˆ X = ˜ R ( ˆ X, ˆ Y ) U, and therefore identity (B.2) follows from the fact that R ( ˆ X, ˆ Y ) U = 0 since, for any vector ﬁeld Z ∈ X ( M ) ,we have that ˜ ∇ ˆ Z U = 0 .We are now ready to prove Proposition 4.2. Proof of Proposition 4.2.

A simple consequence of Lemma B.2 is that for any collection of vector fields $\{X_k\}_{k=0}^r$ on $M$ we have the following identification:
$$\mathrm{Lie}(\tilde X_0; \tilde X_1, \dots, \tilde X_r) = \left\{\tilde X : X \in \mathrm{Lie}(X_0; X_1, \dots, X_r)\right\}.$$
Therefore the parabolic Hörmander condition for $\{\tilde X_k\}_{k=0}^r$ is equivalent to
$$\left\{\begin{pmatrix} X(x) \\ V_{M_X(x)}(v) \end{pmatrix} : X \in \mathrm{Lie}(X_0; X_1, \dots, X_r)\right\} = T_xM \oplus T_v S_xM.$$
Clearly if the above condition is satisfied then $\{X_k\}_{k=0}^r$ satisfies the parabolic Hörmander condition and (4.2) holds. The converse follows from the fact that (4.2) implies
$$\left\{\begin{pmatrix} X(x) \\ V_{M_X(x)}(v) \end{pmatrix} : X \in \mathrm{Lie}(X_0; X_1, \dots, X_r),\ X(x) = 0\right\} = \{0\} \oplus T_v S_xM$$
and
$$\{X(x) : X \in \mathrm{Lie}(X_0; X_1, \dots, X_r),\ X(x) \neq 0\} = T_xM \setminus \{0\}.$$

References

[1] F. Abedin and G. Tralli, Harnack inequality for a class of Kolmogorov–Fokker–Planck equations in non-divergence form, Archive for Rational Mechanics and Analysis (2019), no. 2, 867–900.
[2] S. Amari and H. Nagaoka, Methods of information geometry, American Mathematical Society, 2000.
[3] F. Anceschi, S. Polidoro, and M. A. Ragusa, Moser's estimates for degenerate Kolmogorov equations with non-negative divergence lower order coefficients, Nonlinear Analysis (2019), 111568.
[4] S. Armstrong and J.-C. Mourrat, Variational methods for the kinetic Fokker-Planck equation, arXiv preprint arXiv:1902.04037 (2019).
[5] L. Arnold, Random dynamical systems, Dynamical systems, 1995, pp. 1–43.
[6] L. Arnold, D. C. Nguyen, and V. Oseledets, Jordan normal form for linear cocycles, Random Operators and Stochastic Equations (1999), no. 4, 303–358.
[7] L. Arnold, G. Papanicolaou, and V. Wihstutz, Asymptotic analysis of the Lyapunov exponent and rotation number of the random oscillator and applications, SIAM Journal on Applied Mathematics (1986), no. 3, 427–450.
[8] E. I. Auslender and G. N. Milstein, Asymptotic expansion of Lyapunov exponent for linear stochastic systems with small noises, Prikl. Mat. i Mekh. (1982), 358–365 (in Russian).
[9] D. Bakry and M. Émery, Diffusions hypercontractives, Séminaire de probabilités XIX 1983/84, 1985, pp. 177–206.
[10] L. Barreira and Y. Pesin, Smooth ergodic theory and nonuniformly hyperbolic dynamics, Handbook of dynamical systems (2006), 57–263.
[11] P. H. Baxendale, Lyapunov exponents and relative entropy for a stochastic flow of diffeomorphisms, Probability Theory and Related Fields (1989), no. 4, 521–554.
[12] P. H. Baxendale, Lyapunov exponents and stability for the stochastic Duffing-van der Pol oscillator, IUTAM symposium on nonlinear stochastic dynamics, 2003, pp. 125–135.
[13] P. H. Baxendale, Stochastic averaging and asymptotic behavior of the stochastic Duffing–van der Pol equation, Stochastic Processes and their Applications (2004), no. 2, 235–272.
[14] P. H. Baxendale and L. Goukasian, Lyapunov exponents for small random perturbations of Hamiltonian systems, Annals of Probability (2002), 101–134.
[15] J. Bedrossian, A. Blumenthal, and S. Punshon-Smith, Lagrangian chaos and scalar advection in stochastic fluid mechanics, arXiv preprint arXiv:1809.06484 (2018).
[16] J. Bedrossian and K. Liss, Quantitative spectral gaps and uniform lower bounds in the small noise limit for Markov semigroups generated by hypoelliptic stochastic differential equations, arXiv:2007.13297 (2020).
[17] J. Bochi, Genericity of zero Lyapunov exponents, Ergodic Theory and Dynamical Systems (2002), no. 6, 1667–1696.
[18] J. Bochi and M. Viana, The Lyapunov exponents of generic volume-preserving and symplectic maps, Annals of Mathematics (2005), 1423–1485.
[19] G. Boffetta, M. Cencini, M. Falcioni, and A. Vulpiani, Predictability: a way to characterize complexity, Physics Reports (2002), no. 6, 367–474.
[20] A. Carverhill, A nonrandom Lyapunov spectrum for nonlinear stochastic dynamical systems, Stochastics: An International Journal of Probability and Stochastic Processes (1986), no. 4, 253–287.
[21] A. Carverhill, Furstenberg's theorem for nonlinear stochastic systems, Probability Theory and Related Fields (1987), no. 4, 529–534.
[22] J. T. Chang and D. Pollard, Conditioning as disintegration, Statistica Neerlandica (1997), no. 3, 287–317.
[23] S. Crovisier and S. Senti, Un problème pour le xxi(i)ème siècle, La Gazette des mathématiciens (2018).
[24] D. Dolgopyat, V. Kaloshin, L. Koralov, et al., Sample path properties of the stochastic flows, The Annals of Probability (2004), no. 1A, 1–27.
[25] P. Duarte, Abundance of elliptic isles at conservative bifurcations, Dynamics and Stability of Systems (1999), no. 4, 339–356.
[26] H. Furstenberg, Noncommuting random products, Transactions of the American Mathematical Society (1963), no. 3, 377–428.
[27] H. Furstenberg, Rigidity and cocycles for ergodic actions of semi-simple Lie groups, Séminaire Bourbaki vol. 1979/80 exposés 543–560, 1981, pp. 273–292.
[28] F. Golse, C. Imbert, C. Mouhot, and A. Vasseur, Harnack inequality for kinetic Fokker-Planck equations with rough coefficients and application to the Landau equation, to appear in Annali della Scuola Normale Superiore di Pisa (2016).
[29] L. Hörmander, Hypoelliptic second order differential equations, Acta Mathematica (1967), no. 1, 147–171.
[30] P. Imkeller and C. Lederer, An explicit description of the Lyapunov exponents of the noisy damped harmonic oscillator, Dynamics and Stability of Systems (1999), no. 4, 385–405.
[31] A. Karimi and M. R. Paul, Extensive chaos in the Lorenz-96 model, Chaos: An Interdisciplinary Journal of Nonlinear Science (2010), no. 4, 043105.
[32] R. Khasminskii, Stochastic stability of differential equations, Vol. 66, Springer Science & Business Media, 2011.
[33] Y. Kifer,

A note on integrability of C r -norms of stochastic ﬂows and applications , Stochastic mechanics and stochastic pro-cesses, 1988, pp. 125–131.[34] Y. Kifer, Ergodic theory of random transformations , Vol. 10, Springer Science & Business Media, 2012.[35] J. F. C. Kingman,

Subadditive ergodic theory , The annals of Probability (1973), no. 6, 883–899.[36] A. E. Kogoj and S. Polidoro, Harnack inequality for hypoelliptic second order partial differential operators , Potential Anal. (20164), no. 14, 545–555.[37] H. Kunita, Stochastic ﬂows and stochastic differential equations , Vol. 24, Cambridge university press, 1997.[38] A. Lanconelli, A. Pascucci, and S. Polidoro,

Gaussian lower bounds for non-homogeneous kolmogorov equations with mea-surable coefﬁcients , Journal of Evolution Equations (2020), 1–19.[39] M. Ledoux, I. Nourdin, and G. Peccati,

Steins method, logarithmic sobolev and transport inequalities , Geometric and Func-tional Analysis (2015), no. 1, 256–306.[40] F. Ledrappier, Quelques propri´et´es des exposants caract´eristiques , ´Ecole d’´et´e de probabilit´es de saint-ﬂour xii-1982, 1984,pp. 305–396.[41] F. Ledrappier,

Positivity of the exponent for stationary sequences of matrices , Lyapunov exponents, 1986, pp. 56–73.[42] K. K Lin and L.-S. Young,

Shear-induced chaos , Nonlinearity (2008), no. 5, 899.[43] P.-D. Liu and M. Qian, Smooth ergodic theory of random dynamical systems , Springer, 2006.[44] E. N Lorenz,

The nature and theory of the general circulation of the atmosphere , Vol. 218, World Meteorological OrganizationGeneva, 1967.[45] E. N Lorenz,

Predictability: A problem partly solved , Proc. seminar on predictability, 1996.[46] E. N Lorenz and K. A Emanuel,

Optimal sites for supplementary weather observations: Simulation with a small model ,Journal of the Atmospheric Sciences (1998), no. 3, 399–414.[47] M. Lyubich, Almost every real quadratic map is either regular or stochastic , Annals of Mathematics (2002), 1–78.[48] A. Majda, R. V Abramov, and M. J Grote,

Information theory and stochastics for multiscale nonlinear systems , Vol. 25,American Mathematical Soc., 2005.[49] A. Majda and X. Wang,

Nonlinear dynamics and statistical theories for basic geophysical ﬂows , Cambridge University Press,2006.[50] A. J Majda,

Introduction to turbulent dynamical systems in complex systems , Springer, 2016.[51] S. P Meyn and R. L Tweedie,

Markov chains and stochastic stability , Springer Science & Business Media, 2012.[52] N. Moshchuk and R Khasminskii,

Moment Lyapunov exponent and stability index for linear conservative system with smallrandom perturbation , SIAM Journal on Applied Mathematics (1998), no. 1, 245–256.[53] C. Mouhot, De Giorgi–Nash–Moser and H¨ormander theories: new interplays , Proceedings of the international congress ofmathematiciansrio de, 2018, pp. 2467–2493.[54] S. E Newhouse,

Diffeomorphisms with inﬁnitely many sinks , Topology (1974), no. 1, 9–18.[55] S. E Newhouse, The abundance of wild hyperbolic sets and non-smooth stable sets for diffeomorphisms , PublicationsMath´ematiques de l’IH ´ES (1979), 101–151.[56] V. I. Oseledets, A multiplicative ergodic theorem. characteristic Ljapunov exponents of dynamical systems , TrudyMoskovskogo Matematicheskogo Obshchestva (1968), 179–210.[57] E. Ott, B. R Hunt, I. Szunyogh, A. V Zimin, E. J Kostelich, M. Corazza, E. Kalnay, D. Patil, and J. A Yorke, A localensemble Kalman ﬁlter for atmospheric data assimilation , Tellus A: Dynamic Meteorology and Oceanography (2004),no. 5, 415–428.

58] E. Pardoux and V. Wihstutz,

Lyapunov exponent and rotation number of two-dimensional linear stochastic systems with smalldiffusion , SIAM Journal on Applied Mathematics (1988), no. 2, 442–457.[59] D. Paz´o, I. G Szendro, J. M L´opez, and M. A Rodr´ıguez, Structure of characteristic Lyapunov vectors in spatiotemporalchaos , Physical Review E (2008), no. 1, 016209.[60] Y. Pesin and V. Climenhaga, Open problems in the theory of non-uniform hyperbolicity , Discrete Contin. Dyn. Syst (2010),no. 2, 589–607.[61] M. A Pinsky and V. Wihstutz, Lyapunov exponents of nilpotent Itˆo systems , Stochastics: An International Journal of Probabilityand Stochastic Processes (1988), no. 1, 43–57.[62] M. S Raghunathan, A proof of Oseledec’s multiplicative ergodic theorem , Israel Journal of Mathematics (1979), no. 4,356–362.[63] F. Rezakhanlou, C. Villani, and F. Golse, Entropy methods for the Boltzmann equation: lectures from a special semester at theCentre ´Emile Borel, Institut H. Poincar´e, Paris, 2001 , Springer Science & Business Media, 2008.[64] V. A. Rokhlin,

On the fundamental ideas of measure theory , Matematicheskii Sbornik (1949), no. 1, 107–150.[65] G. Royer, Croissance exponentielle de produits Markoviens de matrices al´eatoires , Annales de l’ihp probabilit´es et statis-tiques, 1980, pp. 49–62.[66] G. Toscani,

Entropy production and the rate of convergence to equilibrium for the fokker-planck equation , Quarterly of Ap-plied Mathematics (1999), no. 3, 521–541.[67] W. Tucker, The Lorenz attractor exists , Comptes Rendus de l’Acad´emie des Sciences-Series I-Mathematics (1999), no. 12,1197–1202.[68] A. Virtser,

On products of random matrices and operators , Theory of Probability & Its Applications (1980), no. 2, 367–377.[69] P. Walters, A dynamical proof of the multiplicative ergodic theorem , Transactions of the American Mathematical Society (1993), no. 1, 245–257.[70] P. Walters,

An introduction to ergodic theory , Vol. 79, Springer Science & Business Media, 2000.[71] Q. Wang and L.-S. Young,

Toward a theory of rank one attractors , Annals of Mathematics (2008), no. 2, 349–480.[72] A. Wilkinson,

What are Lyapunov exponents, and why are they interesting? , Bulletin of the American Mathematical Society (2017), no. 1, 79–105.[73] L.-S. Young, Ergodic theory of differentiable dynamical systems , Real and complex dynamical systems, 1995, pp. 293–336.[74] L.-S. Young,

Mathematical theory of Lyapunov exponents , Journal of Physics A: Mathematical and Theoretical (2013),no. 25, 254001.[75] G. Zaslavsky, The simplest case of a strange attractor , Physics Letters A (1978), no. 3, 145–147.[76] R. J Zimmer, Ergodic theory and semisimple groups , Vol. 81, Springer Science & Business Media, 2013., Vol. 81, Springer Science & Business Media, 2013.