Estimates on the Markov Convexity of Carnot Groups and Quantitative Nonembeddability

Abstract

We show that every graded nilpotent Lie group $G$ of step $r$, equipped with a left-invariant metric homogeneous with respect to the dilations induced by the grading (this includes all Carnot groups with the Carnot-Caratheodory metric), is Markov $p$-convex for all $p \in [2r, \infty)$. We also show that this is sharp whenever $G$ is a Carnot group with $r \le 3$, a free Carnot group, or a jet space group; such groups are not Markov $p$-convex for any $p \in (0, 2r)$. This continues a line of research started by Li, who proved this sharp result when $G$ is the Heisenberg group. As corollaries, we obtain new estimates on the non-biLipschitz embeddability of some finitely generated nilpotent groups into nilpotent Lie groups of lower step. Sharp estimates of this type are known when the domain is the Heisenberg group and the target is a uniformly convex Banach space or $L_1$, but not when the target is a nonabelian nilpotent group.



CHRIS GARTLAND

Contents

1. Introduction
1.1. Background
1.2. Summary of Results
2. Discussion of Proof Methods
2.1. Discussion of Proof of Theorem 4.19
2.2. Discussion of Proof of Theorem 5.6
3. Preliminaries
3.1. Graded Nilpotent and Stratified Lie Algebras and their Lie Groups
3.2. Norms and Metrics
3.3. Model Filiform Groups and Jet Spaces over $\mathbb{R}$
5.1. Directed Graphs and Random Walks
5.2. Mapping the Graphs into $J^{r-1}(\mathbb{R})$
References

1. Introduction

1.1. Background.

In [Rib76], Ribe showed that if two Banach spaces $E, F$ are uniformly homeomorphic, then they are mutually finitely representable: there exists a $\lambda < \infty$ such that for any finite dimensional subspace $E_0$ of $E$, there is a subspace $F_0$ of $F$ whose Banach-Mazur distance from $E_0$ is at most $\lambda$. Properties of Banach spaces that are preserved under mutual finite representability are called local, and many classical properties such as type, cotype, superreflexivity, and $p$-convexity are local. Recall that a Banach space is said to be $p$-convex for some $p \ge 2$ if it admits an equivalent norm $\|\cdot\|$ and a constant $K < \infty$ such that for every $\varepsilon \in [0, 2]$,
$$\sup\{\|(x+y)/2\| : \|x\|, \|y\| \le 1,\ \|x-y\| \ge \varepsilon\} \le 1 - \varepsilon^p/K.$$
Ribe's theorem implies that these properties are really metric properties, suggesting that each should have a reformulation that involves only the metric structure of the Banach space and not the linear structure. The research program concerned with finding these reformulations is known as the Ribe program.

Thanks to Jeremy Tyson for helpful comments in the preparation of this article and to Assaf Naor for suggesting the (non)embeddability corollaries of the main theorems.

The program was initiated by Bourgain in [Bou86], in which he made the first substantial contribution by characterizing superreflexive Banach spaces as those which do not admit biLipschitz embeddings of the binary trees of depth $k$ with uniform control on the biLipschitz distortion. We record here that the biLipschitz distortion (or just distortion) of a map $f: X \to Y$ between metric spaces $(X, d_X)$, $(Y, d_Y)$ is the least value of $L$ for which there exists $0 < D < \infty$ so that
$$d_X(x,y) \le D\, d_Y(f(x), f(y)) \le L\, d_X(x,y)$$
for all $x, y \in X$, that $f$ is a biLipschitz embedding if its distortion is finite, and that $f$ is a biLipschitz equivalence if it is a biLipschitz embedding and surjective. The biLipschitz distortion of $X$ into $Y$ is the infimal distortion of all maps from $X$ into $Y$.

Another major contribution to the Ribe program is a purely metric reformulation of $p$-convexity. The metric property Markov $p$-convexity was originally defined by Lee-Naor-Peres in [LNP09] and proved by Mendel-Naor in [MN13] to be a reformulation of $p$-convexity. Here are the specifics:

Definition 1.1 (Definition 1.2, [MN13]). Let $\{X_t\}_{t \in \mathbb{Z}}$ be a Markov chain on a state space $\Omega$. Given an integer $k \ge 0$, we denote by $\{\tilde{X}_t(k)\}_{t \in \mathbb{Z}}$ the process which equals $X_t$ for time $t \le k$ and evolves independently (with respect to the same transition probabilities) for time $t > k$. Fix $p > 0$. A metric space $(M, d)$ is called Markov $p$-convex if there is $\Pi < \infty$ so that for every Markov chain $\{X_t\}_{t \in \mathbb{Z}}$ on a state space $\Omega$, and for every $f: \Omega \to M$,
$$\sum_{k=0}^{\infty} \sum_{t \in \mathbb{Z}} \frac{\mathbb{E}\left[d\left(f(X_t), f(\tilde{X}_t(t - 2^k))\right)^p\right]}{2^{kp}} \le \Pi^p \sum_{t \in \mathbb{Z}} \mathbb{E}\left[d(f(X_{t+1}), f(X_t))^p\right].$$
Set $\Pi_p(M)$ equal to the least value of $\Pi$ so that the above inequality holds (whenever it exists). $\Pi_p(M)$ is called the Markov $p$-convexity constant of $M$.

Theorem 1.2 (Theorem 1.3, [MN13]). A Banach space is $p$-convex if and only if it is Markov $p$-convex.

Observe the following fact: if there is a map $f: X \to Y$ with biLipschitz distortion $L$, then $\Pi_p(X) \le L\, \Pi_p(Y)$. Thus, Markov convexity can be used to answer quantitative questions about metric spaces in the Lipschitz category.

We present two such applications, the first on the impossibility of dimension reduction in the trace class operators, $S_1$. (From page 2 of [NPS18]): A Banach space $(X, \|\cdot\|_X)$ admits metric dimension reduction if there exists $\alpha < \infty$ such that every $n$-point subset of $X$ biLipschitz embeds with distortion $\alpha$ into a linear subspace of $X$ with dimension $n^{o(1)}$. This definition is inspired by the famous Johnson-Lindenstrauss Lemma ([JL84]), which implies Hilbert space admits metric dimension reduction. In [NPS18], Naor, Pisier, and Schechtman showed that there is an infinite sequence of $n$-point subsets of $S_1$ whose Markov 2-convexity constant is bounded below by a universal constant times $\sqrt{\ln(n)}$, and that the Markov 2-convexity constant of any $d$-dimensional linear subspace of $S_1$ is bounded above by a universal constant times $\sqrt{\ln(d)}$. Together these imply their main result (Theorem 1, [NPS18]): $S_1$ does not admit dimension reduction. For more on the Ribe program and dimension reduction, see the surveys [Nao12] and [Nao18].

Here is a second application of Markov convexity.
In the spirit of the Ribe program, Ostrovskii found a purely metric characterization of the Radon-Nikodym property (RNP) of Banach spaces by showing that a Banach space has the RNP if and only if it does not contain a biLipschitz copy of a thick family of geodesics (Corollary 1.5, [Ost14a]). He asked a natural follow-up question: if a geodesic metric space does not biLipschitz embed into any RNP space, must it contain a biLipschitz copy of a thick family of geodesics? The Heisenberg group is a geodesic metric space that does not biLipschitz embed into any RNP space (see Section 1.2 of [LN06] or Theorem 6.1 of [CK06]), and Ostrovskii showed that in fact it does not contain a biLipschitz copy of a thick family of geodesics, thus negatively answering the question. He accomplished this by proving that any metric space containing a biLipschitz copy of a thick family of geodesics cannot be Markov $p$-convex for any $p > 0$, whereas the Heisenberg group is Markov $p$-convex for some $p$ by the following theorem.

Theorem 1.3.

Proposition 7.2 and Theorem 7.4, [Li14]: Every graded nilpotent Lie group of step $r$ is Markov $2(r!)$-convex.
Theorem 1.1 and Corollary 1.3, [Li16]: The set of $p$ for which the Heisenberg group is Markov $p$-convex is exactly $[4, \infty)$.

1.2. Summary of Results.

This article continues the line of research started by Theorem 1.3. Our main results are:

Theorem 4.19. Every graded nilpotent Lie group of step $r$ is Markov $p$-convex for every $p \in [2r, \infty)$.

Theorem 5.6. For every $p > 0$, $r \ge 2$, coarsely dense set $N \subseteq J^{r-1}(\mathbb{R})$, and $R \ge 3$, let $B_N(R) := \{x \in N : d_{CC}(0, x) \le R\}$. Then
$$\Pi_p(B_N(R)) \gtrsim \frac{\ln(R)^{\frac{1}{p} - \frac{1}{2r}}}{\ln(\ln(R))^{\frac{1}{p} + \frac{1}{2r}}},$$
where the implicit constant can depend on $r, p$ but not on $N, R$.

Recall that a subset $N$ of a metric space $(X, d_X)$ is coarsely dense if there exists $C < \infty$ such that $X = \bigcup_{x' \in N} \{x \in X : d_X(x, x') \le C\}$. See Section 3 for the definition of $J^{r-1}(\mathbb{R})$. Theorem 4.19 is restated and proved at the end of Section 4.2, and similarly for Theorem 5.6 at the end of Section 5.2.

We can extend this result to other groups using the notion of subquotients. Recall that a surjective map $f: X \to Y$ between metric spaces $(X, d_X)$, $(Y, d_Y)$ is a Lipschitz quotient map with constant $C < \infty$ if there exists $0 < D < \infty$ such that for all $x \in X$ and $R > 0$,
$$B_R(f(x)) \subseteq f(B_{DR}(x)) \subseteq B_{CR}(f(x)).$$
If such a map $f$ exists we say $Y$ is a Lipschitz quotient of $X$. $X$ is a Lipschitz subquotient of $Y$ with constant $C$ if there is a metric space $Z$ such that $Z$ embeds isometrically into $Y$ and $X$ is a Lipschitz quotient of $Z$ with constant $C$, or, equivalently, there is a metric space $Z$ such that $Z$ is a Lipschitz quotient of $Y$ with constant $C$ and $X$ isometrically embeds into $Z$. It follows from Proposition 4.1 of [MN13] that if $X$ is a Lipschitz subquotient of $Y$ with constant $C$ then $\Pi_p(X) \le C\, \Pi_p(Y)$.

Every free Carnot group of step $r \ge 2$ has $J^{r-1}(\mathbb{R})$ (in fact every graded nilpotent Lie group of step $r$ with 2-dimensional horizontal layer) as a graded quotient group, and the projection map $\mathbb{R}^k \twoheadrightarrow \mathbb{R}$ dualizes to a graded embedding $J^{r-1}(\mathbb{R}) \hookrightarrow J^{r-1}(\mathbb{R}^k)$. See Chapter 14 of [BLU07] for background on free Carnot groups and [War05] for background on the jet space groups $J^{r-1}(\mathbb{R}^k)$.

Corollary 1.4.

Let $G$ be a Carnot group of step $r$ that has $J^{r-1}(\mathbb{R})$ as a graded subquotient group; for example, $G$ may be a free Carnot group, $J^{r-1}(\mathbb{R}^k)$, or any Carnot group if $r \le 3$. The set of $p > 0$ for which $G$ is Markov $p$-convex is exactly $[2r, \infty)$.

Proof. This follows from Theorems 4.19 and 5.6 and the preceding discussion. $\square$
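To get a feel for the quantitative bound in Theorem 5.6, the following minimal Python sketch evaluates the expression $\ln(R)^{1/p - 1/(2r)} / \ln(\ln(R))^{1/p + 1/(2r)}$. The function name `theorem_5_6_lower_bound` is hypothetical, and the implicit constant (which depends on $r, p$) is set to 1 purely for illustration, so only the growth in $R$ is meaningful.

```python
import math


def theorem_5_6_lower_bound(R: float, p: float, r: int) -> float:
    """Evaluate ln(R)^(1/p - 1/(2r)) / ln(ln(R))^(1/p + 1/(2r)).

    Assumption: the implicit constant of the theorem is replaced by 1.
    """
    if R < 3:
        raise ValueError("R >= 3 is required so that ln(ln(R)) is positive")
    if p <= 0 or r < 1:
        raise ValueError("need p > 0 and step r >= 1")
    log_R = math.log(R)
    numerator = log_R ** (1.0 / p - 1.0 / (2 * r))
    denominator = math.log(log_R) ** (1.0 / p + 1.0 / (2 * r))
    return numerator / denominator


if __name__ == "__main__":
    # Step r = 2 (the Heisenberg case): compare p = 2 < 2r with p = 4 = 2r.
    for R in (1e10, 1e100):
        print(R, theorem_5_6_lower_bound(R, p=2, r=2),
              theorem_5_6_lower_bound(R, p=4, r=2))
```

For $r = 2$ the expression grows with $R$ when $p = 2 < 2r$, while at the threshold $p = 4 = 2r$ the power of $\ln(R)$ vanishes and the expression decays, so it gives no obstruction there, consistent with Theorem 4.19.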

Recall that a subgroup $\Gamma \le G$ of a Lie group $G$ is a lattice if the subspace topology on $\Gamma$ is discrete and $G/\Gamma$ carries a $G$-invariant Borel probability measure.

Corollary 1.5.

Let $G$ be a Carnot group of step $r$ that has $J^{r-1}(\mathbb{R})$ as a graded subquotient group; for example, $G$ may be a free Carnot group, $J^{r-1}(\mathbb{R}^k)$, or any Carnot group if $r \le 3$ (by Lemma 3.3). Let $\Gamma \le G$ be a lattice equipped with the word metric with respect to a finite generating set (which exists by Theorem 2.21 of [Rag72]), and let $B_\Gamma(R)$ denote the ball of radius $R$ in $\Gamma$ centered at the identity. Then for any $p > 0$,
$$\Pi_p(B_\Gamma(R)) \gtrsim \frac{\ln(R)^{\frac{1}{p} - \frac{1}{2r}}}{\ln(\ln(R))^{\frac{1}{p} + \frac{1}{2r}}}.$$

Proof.

Let $G, \Gamma, p$ be as above. The inclusion $\Gamma \hookrightarrow G$ is a biLipschitz embedding onto a coarsely dense subset when $\Gamma$ is equipped with the word metric with respect to a finite generating set (this can be proven using Mostow's theorem that lattices in nilpotent Lie groups are cocompact ([Mos62]) and applying the fundamental theorem of geometric group theory). Thus it suffices to prove the conclusion for any coarsely dense $N'' \subseteq G$. Let $N''$ be such a subset. By assumption, there is a Carnot group $G'$ and a graded quotient homomorphism $q: G \to G'$ such that $J^{r-1}(\mathbb{R})$ is a graded subgroup of $G'$. Then $q$ is a Lipschitz quotient map, so there is a constant $C < \infty$ such that for any $R \ge 3$,
$$\Pi_p(B_{N''}(R)) \gtrsim \Pi_p(B_{q(N'')}(R/C)).$$
Thus it suffices to prove the conclusion for any coarsely dense subset $N' \subseteq G'$. Let $N'$ be such a subset. Fix a large constant $B$ and let $N \subseteq J^{r-1}(\mathbb{R})$ be a coarsely dense, $B$-separated subset (each pair of distinct points in $N$ is separated by a distance at least $B$; such sets always exist by Zorn's Lemma). Then since $J^{r-1}(\mathbb{R})$ is a graded subgroup of $G'$, there is a biLipschitz embedding $N \to G'$. If $B$ is chosen large enough, we may postcompose with a nearest neighbor map $G' \to N'$ to obtain another biLipschitz embedding $N \to N'$. Then the conclusion follows from Theorem 5.6. $\square$

The following quantitative nonembeddability estimate follows from the previous corollary and Theorem 4.19.

Corollary 1.6.

Let $G$ be a Carnot group of step $r$ that has $J^{r-1}(\mathbb{R})$ as a graded subquotient group; for example, $G$ may be a free Carnot group, $J^{r-1}(\mathbb{R}^k)$, or any Carnot group if $r \le 3$. Let $\Gamma \le G$ be a lattice equipped with the word metric with respect to a finite generating set, and let $B_\Gamma(R)$ denote the ball of radius $R$ in $\Gamma$ centered at the identity. Let $G'$ be any graded nilpotent Lie group of step $r' < r$. Then we have the following estimate for $c_{G'}(B_\Gamma(R))$, the biLipschitz distortion of $B_\Gamma(R)$ in $G'$:
$$c_{G'}(B_\Gamma(R)) \gtrsim \frac{\ln(R)^{\frac{1}{2r'} - \frac{1}{2r}}}{\ln(\ln(R))^{\frac{1}{2r'} + \frac{1}{2r}}},$$
where the implicit constant depends on $G$ and $G'$ but not on $R$.

Such quantitative nonembeddability estimates have been the subject of much attention for embeddings of Heisenberg groups into certain Banach spaces; see [ANT13] and [LN14] for uniformly convex Banach space targets and [NY18] for $L_1$ targets. In particular, it can be deduced from [ANT13] and [Ass83] that the biLipschitz distortion of the ball of radius $R$ in a lattice in the Heisenberg group into Hilbert space equals, up to universal factors, $\sqrt{\ln(R)}$. Thus, our estimates in the previous corollary cannot be sharp when $r = 2$ and $r' = 1$ (for these values the corollary only gives a lower bound of order $\ln(R)^{1/4}/\ln(\ln(R))^{3/4}$). However, these estimates seem to be the first of their type when the target is allowed to be a nilpotent group of step larger than 1. Other quantitative nonembeddability estimates between Carnot groups were obtained in [Li14], but they are of a different flavor. Since our estimates are not sharp for $r = 2$, $r' = 1$, we speculate that they are not sharp for larger values of $r, r'$ either.

Next, we obtain new results on the nonexistence of Lipschitz subquotient maps.

Corollary 1.7.

Let $G$ be a Carnot group of step $r$ that has $J^{r-1}(\mathbb{R})$ as a graded subquotient group; for example, $G$ may be a free Carnot group, $J^{r-1}(\mathbb{R}^k)$, or any Carnot group if $r \le 3$. Let $G'$ be any graded nilpotent Lie group of step $r'$.
(1) $G$ is not a Lipschitz subquotient of $L_p$ (or any $p$-convex space) for any $p \in (1, 2r)$.
(2) If $r > r'$, $G$ is not a Lipschitz subquotient of $G'$.

Proof. These follow from the previous corollary, the fact that Markov $p$-convexity is preserved under Lipschitz subquotients, Theorem 1.2, and the classical fact that $L_p$ is $\max(2, p)$-convex for $p > 1$. $\square$

Essentially all of the previously known results of this flavor are proved via Pansu differentiation ([Pan89]), which applies when the domain is a (finite dimensional) Carnot group and the target is an RNP Banach space or (finite dimensional) Carnot group (Section 1.2 of [LN06] or Theorem 6.1 of [CK06], which are stated for biLipschitz maps on the Heisenberg group, but also apply to biLipschitz or Lipschitz quotient maps on any Carnot group of step at least 2). There is also a recent differentiation theorem of Le Donne-Li-Moisala ([LDLM18]) which applies when the domain is a "scalable" group filtrated by (finite dimensional) Carnot groups and the target is an RNP space. However, there does not seem to be a clear way to deduce Corollary 1.7 in full generality from any of these methods.

We may use Markov convexity again to prove nonexistence of subquotient maps onto some "infinite step" graded Lie groups. See Section 3.4 for the definitions of inverse limits, $J^\infty(\mathbb{R}^k)$, and the free Carnot group on $k$ generators, $F_k^\infty$.

Corollary 1.8.

Let $G_1 \leftarrow G_2 \leftarrow \dots$ be an inverse system of graded nilpotent Lie groups such that for every $r$, there is an $i$ with $J^{r-1}(\mathbb{R})$ a graded subquotient of $G_i$, and let $G_\infty$ be the inverse limit group. For example, $G_\infty$ may be $J^\infty(\mathbb{R}^k)$ or $F_k^\infty$. Then $G_\infty$ is not a Lipschitz subquotient of any superreflexive space.

Proof. Pisier's renorming theorem, Theorem 11.37 of [Pis16], states that any superreflexive Banach space is $p$-convex for some $p \in [2, \infty)$. Thus it suffices to show that $G_\infty$ is not Markov $p$-convex for any $p \in (0, \infty)$. For every $r \ge 2$, $J^{r-1}(\mathbb{R})$ is a Lipschitz subquotient of $G_\infty$, so since Markov $p$-convexity is preserved under Lipschitz subquotients, the conclusion follows from Corollary 1.4. $\square$

Finally, we provide a positive result on the existence of embeddings using one of the main results of [LNP09]. A metric tree is the vertex set of a weighted graph-theoretical tree equipped with the shortest path metric.
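As a concrete illustration of the definition of a metric tree, here is a small Python sketch that computes the shortest path metric on the vertex set of a weighted tree. The function `tree_metric` and the edge-list input format are illustrative assumptions, not notation from [LNP09].

```python
from collections import defaultdict


def tree_metric(edges):
    """Return the shortest path metric of a weighted tree.

    `edges` is a list of (u, v, weight) triples assumed to form a tree.
    The result maps ordered pairs (u, v) to their distance. Because paths
    in a tree are unique, a single traversal from each root suffices.
    """
    adjacency = defaultdict(list)
    for u, v, weight in edges:
        adjacency[u].append((v, weight))
        adjacency[v].append((u, weight))

    distance = {}
    for root in list(adjacency):
        distance[(root, root)] = 0.0
        stack = [(root, None)]
        while stack:
            node, parent = stack.pop()
            for neighbor, weight in adjacency[node]:
                if neighbor != parent:
                    distance[(root, neighbor)] = distance[(root, node)] + weight
                    stack.append((neighbor, node))
    return distance


if __name__ == "__main__":
    example = [("a", "b", 1.0), ("b", "c", 2.0), ("b", "d", 0.5)]
    d = tree_metric(example)
    print(d[("a", "c")], d[("c", "d")])
```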

Theorem 1.9 (Theorem 4.1, [LNP09]). If $T$ is a metric tree and $T$ is Markov $p$-convex, then $T$ biLipschitz embeds into $L_p$.

Corollary 1.10.

If a metric tree $T$ is a Lipschitz subquotient of a graded nilpotent Lie group $G$ of step $r$, then $T$ biLipschitz embeds into $L_p$ for every $p \ge 2r$.

Proof. This follows from Theorem 4.19, the fact that Markov convexity is inherited by Lipschitz subquotients, and Theorem 1.9. $\square$

We conclude this introduction with the obvious conjecture that Theorems 4.19 and 5.6 lead to, and another somewhat less obvious conjecture.

Conjecture 1.11.

No Carnot group of step $r$ is Markov $p$-convex for any $p \in (0, 2r)$.

Conjecture 1.12.

For each graded nilpotent Lie group $G$, the set of $p$ for which $G$ is Markov $p$-convex is the same as that of the largest Carnot subgroup of $G$.

2. Discussion of Proof Methods

We engage here in an informal discussion of the proofs of Theorems 4.19 and 5.6. This discussion is intended to give a brief overview of the proofs for readers with a sufficient background in the relevant topics. For Theorem 4.19, the relevant topics are graded nilpotent Lie algebras, the group structure they inherit via the Baker-Campbell-Hausdorff formula, and their graded-homogeneous group quasi-norms. For Theorem 5.6, the relevant topics are Markov convexity of diamond-type graphs, jet space Carnot groups, and Khintchine's inequality. Readers unfamiliar with these topics may not find this section useful.

2.1. Discussion of Proof of Theorem 4.19.

The method employed by Mendel-Naor to prove that $p$-convexity of Banach spaces implies Markov $p$-convexity is to:
(1) Invoke the well-known result that $p$-convex Banach spaces have equivalent norms $\|\cdot\|$ satisfying the parallelogram inequality $(\|x\|^p + \|x - y\|^p)/2 - \|y/2\|^p \gtrsim \|x - y/2\|^p$.
(2) Prove the 4-point inequality $(2d(y,x)^p + d(z,y)^p + d(y,w)^p)/2 - (d(x,w)/2)^p - (d(x,z)/2)^p \gtrsim d(z,w)^p$, where $d(x,y) = \|x - y\|$.
(3) Prove the Markov $p$-convexity inequality, Definition 1.1.

We prove the analogous inequalities for graded nilpotent Lie groups:
(1) Lemma 4.17. Construct a group quasi-norm $N$ satisfying $(N(x)^p + N(y^{-1}x)^p)/2 - (N(y)/2)^p \gtrsim N(\delta_{1/2}(y)^{-1}x)^p$.
(2) Lemma 4.18. Prove the 4-point inequality $(2d(y,x)^p + d(z,y)^p + d(y,w)^p)/2 - (d(x,w)/2)^p - (d(x,z)/2)^p \gtrsim d(z,w)^p$, where $d(x,y) = N(y^{-1}x)$.
(3) Theorem 4.19. Prove the Markov $p$-convexity inequality.

The passage from (1) to (2) and from (2) to (3) is exactly the same as in the Banach space case. To prove (1), we recursively construct a sequence of homogeneous quasi-norms on the group, and prove inductively that they satisfy (1). Actually, the following stronger version of (1) (with $p = 2s$; the case $p \ge 2s$ is taken care of later) is needed for the induction to close; this is Lemma 4.16:
$$(N_s(x)^{2s} + N_s(y^{-1}x)^{2s})/2 - (N_s(y)/2)^{2s} \gtrsim SN_s(x,y)^{2s} + D_s(x,y) + N_s(\delta_{1/2}(y)^{-1}x)^{2s}.$$
There are two extra terms that appear in this inequality, $SN_s(x,y)$ and $D_s(x,y)$, defined in Definitions 4.8 and 4.14. $D_s(x,y)$ is designed to bound (up to constants) the square of any BCH polynomial of degree $s$ (see Definition 4.1), so one may guess how it would be useful to prove (1). $SN_s(x,y)$ is nearly a positive definite quasi-norm of $(x_1, \dots, x_s, y_1, \dots, y_s)$ (the name $SN$ is meant to suggest that it is a seminorm instead of a norm, since it is not positive definite), but not quite, as it vanishes when $x_1 = y_1/2$ and $x_i = y_i = 0$ for $i \ge 2$. However, this is not an issue as we will have an extra $\|y_1\|$ term in the induction, so that $\|y_1\| + SN_s(x,y)$ is genuinely a quasi-norm of $(x_1, \dots, x_s, y_1, \dots, y_s)$. Here are $D_s$ and $SN_s$ for some small $s$:
$$D_3(x,y) = \|(x_3,y_3)\|^2 + \|(x_1,y_1)\|^2\|(x_2,y_2)\|^2 + \|(x_1,y_1)\|^2\tau(x,y)$$
$$D_4(x,y) = \|(x_4,y_4)\|^2 + \|(x_1,y_1)\|^2\|(x_3,y_3)\|^2 + \|(x_2,y_2)\|^4 + \|(x_1,y_1)\|^4\|(x_2,y_2)\|^2 + \|(x_2,y_2)\|^2\tau(x,y) + \|(x_1,y_1)\|^4\tau(x,y)$$
$$SN_3(x,y) = \max\left(\|x_1 - y_1/2\|,\ \|(x_2,y_2)\|^{1/2},\ \|(x_3,y_3)\|^{1/3}\right)$$
The polynomial $\tau(x,y)$ is designed to bound the squares of terms coming from the bracket between two vectors from the horizontal layer. For example, in the second Heisenberg group, $\tau(x,y)$ is the sum of the two squares $(x_a y_b - x_b y_a)^2$, taken over the two pairs $(a,b)$ of horizontal coordinates whose basis vectors have nonzero bracket.

We recursively construct the quasi-norms $N_{s+1}$ given all the previous quasi-norms by defining $N_{s+1}(x)$ to be an $\ell_{2(s+1)}$ sum of $\lambda_{s+1}\|x_{s+1}\|^{1/(s+1)}$ and the top half of the previously defined quasi-norms, where $\lambda_{s+1}$ is a positive constant chosen small enough (depending on the product structure of the group in question) to make the inequality of Lemma 4.16(1) hold. Specifically, from (4.1),
$$N_2(x) = \sqrt[4]{\|x_1\|^4 + \lambda_2\|x_2\|^2}, \qquad N_{s+1}(x) = \sqrt[2(s+1)]{\lambda_{s+1}\|x_{s+1}\|^2 + \sum_{s' = \lceil (s+1)/2 \rceil}^{s} N_{s'}^{2(s+1)}(x)}.$$
The reason why we add the top half of the previously defined norms, and the reason for the inclusion of the $SN_s(x,y)$ term in the inequality, is to help pass from $D_s(x,y)$ to $D_{s+1}(x,y)$ during the proof of the inductive step.
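The recursive definition of the quasi-norms can be made concrete with a short Python sketch. It is only an illustration under stated assumptions: the base case is taken to be $N_1(x) = \|x_1\|$ with the Euclidean norm on each layer, the constants in `lambdas` are arbitrary placeholders rather than the values required by Lemma 4.16, and the dilation $\delta_t$ is taken to act by $t^i$ on the $i$-th layer. The names `quasi_norm`, `dilate`, and `euclidean` are hypothetical.

```python
import math


def euclidean(layer):
    """Euclidean norm of the coordinates of a single layer."""
    return math.sqrt(sum(c * c for c in layer))


def dilate(x, t):
    """Dilation delta_t: scales layer i (1-indexed) by t**i."""
    return [[(t ** i) * c for c in layer] for i, layer in enumerate(x, start=1)]


def quasi_norm(x, lambdas, s=None):
    """N_s(x) built recursively from the top half of the earlier quasi-norms.

    x       -- list of layers, x[i-1] holding the coordinates of layer i
    lambdas -- dict mapping a layer index s >= 2 to a placeholder lambda_s
    """
    if s is None:
        s = len(x)
    if s == 1:
        return euclidean(x[0])  # assumed base case N_1(x) = ||x_1||
    power = 2 * s
    total = lambdas[s] * euclidean(x[s - 1]) ** 2
    # "Top half" of the earlier quasi-norms: ceil(s/2) <= s' <= s - 1.
    for s_prime in range(math.ceil(s / 2), s):
        total += quasi_norm(x, lambdas, s_prime) ** power
    return total ** (1.0 / power)


if __name__ == "__main__":
    x = [[1.0, -2.0], [0.5], [3.0]]
    lambdas = {2: 0.1, 3: 0.01}
    # Homogeneity check: N(delta_t(x)) should equal t * N(x).
    print(quasi_norm(dilate(x, 1.7), lambdas), 1.7 * quasi_norm(x, lambdas))
```

The homogeneity $N_s(\delta_t(x)) = t N_s(x)$ follows by induction, since every summand under the root scales by $t^{2s}$.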
When proving the inductive step, we have terms like $(SN_{s'}(x,y)^{2s'} + D_{s'}(x,y))^{(s+1)/s'}$, $s' \le s$, appearing, to which we apply Lemma 3.6 and obtain a term like $SN_{s'}(x,y)^{2(s+1-s')} D_{s'}(x,y)$. This term bounds $\|(x_{s+1-s'}, y_{s+1-s'})\|^2 D_{s'}(x,y)$ exactly when $\lceil (s+1)/2 \rceil \le s' \le s$. Then summing $\|(x_{s+1-s'}, y_{s+1-s'})\|^2 D_{s'}(x,y)$ over this range of $s'$ accounts for all the terms in $D_{s+1}(x,y)$, except for the top-layer term $\|(x_{s+1}, y_{s+1})\|^2$ (since any other term in $D_{s+1}(x,y)$ contains as a factor a variable from one of the lower half layers, see Lemma 4.9 for details), which is accounted for later.

2.2. Discussion of Proof of Theorem 5.6.

We recursively construct a sequence of directed graphs $\Gamma_m$ and maps from them into the jet space of step $r$ ($J^{r-1}(\mathbb{R})$) to show that it is not Markov $p$-convex for any $p < 2r$. The Markov processes we use are standard directed random walks on the graphs. This is very similar to the method used in [Li16], where something akin to the Laakso-Lang-Plaut diamond graphs was used. The main feature of those graphs $G_m$ is that $G_{m+1}$ is obtained from $G_1$ by replacing each edge of $G_1$ with a copy of $G_m$. Roughly speaking, Li recursively maps $G_{m+1}$ into Euclidean space by replacing each edge of a distorted image of $G_1$ by a rotated, distorted copy of the image of $G_m$. The distortion is done in such a way that the coLipschitz constant (the Lipschitz constant of the inverse map) is on the order of $\sqrt[4]{m}\sqrt{\ln(m+1)}$, and the fact that rotations are isometries of the Heisenberg group affords one uniform control on the Lipschitz constants. One can conclude from this that the Heisenberg group is not Markov $p$-convex for $p < 4$.

Our graphs differ from those in [Li16] in that, to obtain $\Gamma_{m+1}$ from $\Gamma_m$, we first glue many copies of $\Gamma_m$ together with a small number of copies of a single edge $I$ in series to get a new graph $\Gamma'_{m+1}$, and then replace each edge of $\Gamma_1$ with a copy of $\Gamma'_{m+1}$ (this isn't exactly how our construction is defined, but is close enough to get the main idea). See Definition 5.1 for the full details. We will explain the reasoning for this after describing our maps of $\Gamma_m$ into $J^{r-1}(\mathbb{R})$.

Our maps differ from those in [Li16] in that we do not rotate the image of $\Gamma_m$ before using it to replace the edges of the image of $\Gamma_1$, as rotations are not Lipschitz maps in higher step groups like they are in the Heisenberg group. Refer to Figure 2 throughout this discussion to get an idea of the construction of these maps. Instead of rotating, we simply add (many copies of) the image of $\Gamma_m$ to a distorted copy of the image of $\Gamma_1$ to obtain the mapping of $\Gamma_{m+1}$ into Euclidean space.
More specifically, we map each directed path $\gamma$ in $\Gamma_{m+1}$ to the jet of a function $\phi_\gamma$, which is a horizontal curve in $J^{r-1}(\mathbb{R})$. The Lipschitz constant of this map is controlled by $\left\|\frac{d^r}{dx^r}\phi_\gamma\right\|_\infty$. We still distort the graphs $\Gamma_m$ with the same asymptotics as in [Li16], so that the coLipschitz constant is on the order of $\sqrt[2r]{m}\,\sqrt[r]{\ln(m+1)}$ (at least on the pairs of random walks $(X^m_t, \tilde{X}^m_t(t - 2^k))$). That we get the $2r$-th root of $m$ instead of the fourth root of $m$ comes from the fact that $J^{r-1}(\mathbb{R})$ is of step $r$ and the Heisenberg group is of step 2. One potential problem is that the absence of isometric rotations and the fact that $(\sqrt{m}\ln(m))^{-1}$ isn't summable mean that $\left\|\frac{d^r}{dx^r}\phi_\gamma\right\|_\infty$ blows up along some paths, and thus we do not have uniform control on the Lipschitz constant of the map, unlike [Li16]. However, $(\sqrt{m}\ln(m))^{-1}$ is square-summable, and together with the nature of the image of the random walk $X^m_t$ in $J^{r-1}(\mathbb{R})$, this allows us to control $\mathbb{E}[d_{CC}(X^m_{t+1}, X^m_t)^p]$ uniformly in $m, t$. Loosely, along the random walk in the horizontal layer (which has $x$- and $u_{r-1}$-coordinates), every time one is confronted with a choice of direction to walk in, the choice is to walk 1 unit in the $x$-direction and $+(\sqrt{i}\ln(i+1))^{-1}$ units in the $u_{r-1}$-direction with probability 1/2, or 1 unit in the $x$-direction and $-(\sqrt{i}\ln(i+1))^{-1}$ units in the $u_{r-1}$-direction with probability 1/2 (for some $i$ depending on how far one has walked). Thus, one might expect $d_{CC}(X^m_{t+1}, X^m_t)$ to be bounded by a random variable distributed like $1 + \left|\sum_{i=1}^{t} \varepsilon_i (\sqrt{i}\ln(i+1))^{-1}\right|$, where $\{\varepsilon_i\}_i$ are iid Rademachers, and then Khintchine's inequality implies we should have a uniform bound on $\mathbb{E}[d_{CC}(X^m_{t+1}, X^m_t)^p]$ (which is the real quantity of interest, recall Definition 1.1).
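The contrast between summability and square-summability that drives this heuristic can be checked numerically. The sketch below is an illustration only (the names `step_size` and `partial_sums` are hypothetical): it computes partial sums of $a_i = (\sqrt{i}\ln(i+1))^{-1}$ and of $a_i^2$. For iid Rademacher signs $\varepsilon_i$, orthogonality gives $\mathbb{E}[(\sum_i \varepsilon_i a_i)^2] = \sum_i a_i^2$, so a bounded sum of squares is exactly what keeps the second moment of the signed sum bounded.

```python
import math


def step_size(i: int) -> float:
    """a_i = 1 / (sqrt(i) * ln(i + 1)), the vertical step at scale i."""
    return 1.0 / (math.sqrt(i) * math.log(i + 1))


def partial_sums(n: int):
    """Return (sum of a_i, sum of a_i**2) over i = 1, ..., n."""
    total = 0.0
    total_of_squares = 0.0
    for i in range(1, n + 1):
        a = step_size(i)
        total += a
        total_of_squares += a * a
    return total, total_of_squares


if __name__ == "__main__":
    for n in (10**3, 10**5):
        plain, squares = partial_sums(n)
        # The plain sums keep growing; the sums of squares level off.
        print(n, plain, squares)
```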
Of course, the random walk is not distributed like this, but it turns out that this intuition is correct nonetheless; see Lemmas 3.9 and 5.5(4) for the specifics.

Finally, the reason we use many copies of $\Gamma_m$ in creating $\Gamma_{m+1}$ is so that, compared to the diameter of $\Gamma_{m+1}$, the diameter of the copies of $\Gamma_m$ is very small, and thus those that replaced opposite edges of $\Gamma_1$ don't get too close together, which would ruin the coLipschitz constant. Morally, this "decouples" any interaction between different scales in $\Gamma_{m+1}$.

3. Preliminaries

The next two subsections don't follow any particular reference, but ones we recommend are [BLU07] for Carnot groups and [LD17] for graded nilpotent groups. We mostly follow [War05] for the subsection on jet spaces.

3.1. Graded Nilpotent and Stratified Lie Algebras and their Lie Groups. A graded nilpotent Lie algebra $(\mathfrak{g}, [\cdot,\cdot])$ of step $r$ is a Lie algebra equipped with a grading $\mathfrak{g} = \oplus_{i=1}^{r} \mathfrak{g}_i$, meaning $\mathfrak{g}_r \ne 0$, $[\mathfrak{g}_i, \mathfrak{g}_j] \subseteq \mathfrak{g}_{i+j}$ if $i + j \le r$, and $[\mathfrak{g}_i, \mathfrak{g}_j] = 0$ if $i + j > r$. A stratified Lie algebra $(\mathfrak{g}, [\cdot,\cdot])$ of step $r$ is a graded nilpotent Lie algebra of step $r$ such that the Lie subalgebra generated by $\mathfrak{g}_1$ is all of $\mathfrak{g}$. The grading is called a stratification, $\mathfrak{g}_1$ is often called the horizontal layer (or stratum), and $\mathfrak{g}$ is said to be horizontally generated. Whenever a Lie algebra $\mathfrak{g}$ (not presumed to be equipped with a grading) admits a stratification, it is unique (Lemma 2.16, [LD17]). A graded nilpotent Lie group of step $r$ is a simply connected Lie group whose Lie algebra is graded nilpotent of step $r$. A graded nilpotent Lie group whose Lie algebra is stratified is a Carnot group. A graded homomorphism or map is a Lie group homomorphism between graded nilpotent Lie groups whose derivative is a graded Lie algebra homomorphism. One graded nilpotent Lie group $G'$ is a graded subgroup of another graded nilpotent Lie group $G$ if there is an injective graded homomorphism from $G'$ into $G$. One graded nilpotent Lie group $G'$ is a graded quotient group of another graded nilpotent Lie group $G$ if there is a surjective graded homomorphism from $G$ onto $G'$. One graded nilpotent Lie group $G'$ is a graded subquotient group of another graded nilpotent Lie group $G$ if there is another graded nilpotent Lie group $G''$ such that $G''$ is a graded subgroup of $G$ and $G'$ is a graded quotient group of $G''$, or, equivalently, there is another graded nilpotent Lie group $G''$ such that $G''$ is a graded quotient group of $G$ and $G'$ is a graded subgroup of $G''$.

Given a graded nilpotent Lie group $G$ and its Lie algebra $\mathfrak{g}$, since $\mathfrak{g}$ is nilpotent and $G$ is simply connected, the exponential map is a diffeomorphism, and thus we can use it to equip $\mathfrak{g}$ with a graded nilpotent Lie group structure such that it becomes graded isomorphic to $G$. The Baker-Campbell-Hausdorff formula provides a formula for the group product on $\mathfrak{g}$ in terms of the Lie algebra structure (Section 2, [War05]):
$$xy = \sum_{n > 0} \frac{(-1)^{n+1}}{n} \sum \cdots$$

-action on $G$. It can be deduced that a Lie group homomorphism $\theta$ between graded nilpotent Lie groups is a graded homomorphism if and only if it is $\delta_t$-equivariant, that is, $\theta(\delta_t(x)) = \delta_t(\theta(x))$, where we've abused notation (and will continue to do so) and written $\delta_t$ for the dilation on both the domain and codomain.

3.2. Norms and Metrics.

Let $G$ be a graded nilpotent Lie group. A homogeneous quasi-norm on $G$ is a continuous function $N: G \to \mathbb{R}$ such that for all $x \in G$ and $t \in \mathbb{R}_{>0}$,
• $N(x) \ge 0$
• $N(x^{-1}) = N(x)$ (symmetry)
• $N(\delta_t(x)) = tN(x)$ (homogeneity)

If additionally $N(x) = 0$ implies $x = 0$, then $N$ is a positive definite homogeneous quasi-norm, and if $N(xy) \le N(x) + N(y)$ for all $x, y \in G$ (triangle inequality), $N$ is a homogeneous norm. For any two positive definite homogeneous quasi-norms $N, N'$ on $G$, the continuity, homogeneity, and positive definiteness of $N, N'$, together with the compactness of the unit sphere in $\oplus_{i=1}^{r} \mathbb{R}^{\dim(\mathfrak{g}_i)}$, imply that $N$ and $N'$ are biLipschitz equivalent, that is, there is a constant $0 < C < \infty$ such that
$$C^{-1}N(x) \le N'(x) \le CN(x)$$
for all $x \in G$.

Positive definite homogeneous norms always exist, most famously those considered in [HS90]. Thus any positive definite homogeneous quasi-norm $N$ satisfies the quasi-triangle inequality: there is a $0 < C < \infty$ such that for all $x, y \in G$,
$$N(xy) \le C(N(x) + N(y)).$$
Typically one requires that every homogeneous quasi-norm $N$ satisfies the quasi-triangle inequality. Although it turns out that the quasi-norms we consider in this article do satisfy the quasi-triangle inequality, we only need to know this for positive definite quasi-norms and thus do not explicitly make this requirement.

There is a bijective correspondence between homogeneous, positive definite quasi-norms $N$ on $G$ and left-invariant, homogeneous quasi-metrics $d_N$ on $G$ via $N \mapsto d_N$ defined by
$$d_N(x, y) := N(y^{-1}x).$$
Positive definiteness of $N$ implies positive definiteness of $d_N$, symmetry of $N$ implies symmetry of $d_N$, homogeneity of $N$ implies the homogeneity of $d_N$ (meaning $d_N(\delta_t(x), \delta_t(y)) = t\, d_N(x, y)$), and the quasi-triangle inequality of $N$ implies the quasi-triangle inequality of $d_N$. The left-invariance of $d_N$ is automatic from the definition.
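The correspondence $N \mapsto d_N$ can be tried out numerically in the simplest nonabelian case. The sketch below is a minimal illustration, assuming the first Heisenberg group in exponential coordinates $(a, b, c)$ with product $x + y + \frac{1}{2}[x, y]$, dilation $\delta_t(a, b, c) = (ta, tb, t^2 c)$, and the gauge $N(a,b,c) = ((a^2 + b^2)^2 + \lambda c^2)^{1/4}$. The function names and the default value of `lam` are hypothetical choices made for this example.

```python
import math


def multiply(x, y):
    """Heisenberg product in exponential coordinates: x + y + [x, y] / 2."""
    a, b, c = x
    a2, b2, c2 = y
    return (a + a2, b + b2, c + c2 + 0.5 * (a * b2 - b * a2))


def inverse(x):
    """In exponential coordinates the inverse is the negative."""
    a, b, c = x
    return (-a, -b, -c)


def dilate(x, t):
    """Dilation delta_t: degree 1 on the horizontal layer, degree 2 on the center."""
    a, b, c = x
    return (t * a, t * b, t * t * c)


def quasi_norm(x, lam=1.0):
    """Homogeneous gauge N(x) = ((a^2 + b^2)^2 + lam * c^2)^(1/4)."""
    a, b, c = x
    return ((a * a + b * b) ** 2 + lam * c * c) ** 0.25


def quasi_metric(x, y, lam=1.0):
    """d_N(x, y) := N(y^{-1} x)."""
    return quasi_norm(multiply(inverse(y), x), lam)


if __name__ == "__main__":
    x, y, g = (1.0, 2.0, 0.5), (-0.3, 0.7, 1.2), (2.0, -1.0, 3.0)
    print(quasi_metric(x, y), quasi_metric(multiply(g, x), multiply(g, y)))
```

Left-invariance holds because $(gy)^{-1}(gx) = y^{-1}x$, and homogeneity holds because $\delta_t$ is a group automorphism.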
$N$ satisfies the triangle inequality if and only if $d_N$ does. The inverse of $N \mapsto d_N$ is $d \mapsto N_d$, where $N_d(x) := d(0, x)$. In addition to those determined by the homogeneous, positive definite norms from [HS90], there are canonical left-invariant, homogeneous metrics on Carnot groups called Carnot-Caratheodory metrics, denoted $d_{CC}$. These metrics are also geodesic. See [BLU07] or [LD17] for further information.

In what follows, whenever dealing with a graded nilpotent Lie group, we will automatically assume it is equipped with a left-invariant, homogeneous quasi-metric. By the preceding discussion, this quasi-metric is well-defined up to biLipschitz equivalence, so we may attribute any biLipschitz-invariant property of metric spaces to a graded nilpotent Lie group $G$ knowing only the algebraic structure of its graded Lie algebra. The $\delta_t$-equivariance of graded group maps implies that any graded map between graded nilpotent Lie groups is Lipschitz, and thus graded group embeddings are biLipschitz embeddings, graded quotient maps are Lipschitz quotient maps, and graded group isomorphisms are biLipschitz equivalences.

3.3. Model Filiform Groups and Jet Spaces over $\mathbb{R}$. We follow [War05] (especially Example 4.3) throughout this subsection. The model filiform group of step $r \ge 2$ is the Carnot group whose stratified Lie algebra is $\mathfrak{g} = (\mathbb{R}X \oplus \mathbb{R}Y_1) \oplus \bigoplus_{i=2}^{r} \mathbb{R}Y_i$, where $X, Y_1$ is a basis for $\mathfrak{g}_1$ and $Y_i$ is a basis for $\mathfrak{g}_i$ for $2 \le i \le r$, and the nontrivial bracket relations are given by $[X, Y_i] = Y_{i+1}$ for $1 \le i \le r-1$. For $s \ge r$, there is a canonical Carnot group quotient map from the model filiform group of step $s$ to that of step $r$. The model filiform group of step 2 is frequently called the Heisenberg group, and the one of step 3 the Engel group. The corresponding Lie algebras are the Heisenberg algebra and Engel algebra.

The jet space over $\mathbb{R}$ of step $r \ge$

0, denoted J r − ( R ), is a certain Carnot group of step r graded isomorphic to the model ﬁliform group of step r . There are also jet space groups J r − ( R k )over higher dimensional Euclidean space, but we will focus on k = 1 in this discussion. As aset, J r − ( R ) consists of equivalence classes of pairs ( x, f ) where x ∈ R and f ∈ C r − ( R ). Twopairs ( x, f ) , ( y, g ) are equivalent if x = y and f ( k ) ( x ) = g ( k ) ( y ) for all 0 ≤ k ≤ r −

1. We deﬁnemaps π x , π i : J r − ( R ) → R , 0 ≤ i ≤ r −

1, by π x ([( y, g )]) = y and π i ([( y, g )]) = g ( i ) ( y ). Thesemaps are obviously well-deﬁned and the direct sum map π x ⊕ r − i =0 π r − − i : J r − ( R ) → R × R r isa bijection. For v ∈ J r − ( R ), the quantity π x ( v ) is referred to as the x -coordinate and π i ( v ) asthe u i -coordinate . We equip J r − ( R ) with a topological vector space structure so that this mapis a linear homeomorphism, and from this point on will represent elements of J r − ( R ) using thesecoordinates. We will especially represent elements as pairs ( y, v ) ∈ J r − ( R ) = R × R r so that y ∈ R , v ∈ R r , and π x (( y, v )) = y . Although we won’t explicitly use it, the group operation on J r − ( R ) isgiven by π x (( x, u r − , . . . u ) ∗ ( y, v r − , . . . v )) = x + yπ i (( x, u r − , . . . u ) ∗ ( y, v r − , . . . v )) = u i + v i + r − (cid:88) j = i +1 u j y j − i ( j − i )!Given y ∈ R and g ∈ C r − ( R ), we get an element [ j r − ( y )]( g ) ∈ J r − ( R ) deﬁned by π x ([ j r − ( y )]( g )) = yπ i ([ j r − ( y )]( g )) = g ( i ) ( y )called the jet of g at y . The following two Lemmas are essentially all we need to know about jetspaces. The ﬁrst is a special case of [RW10]. Although their lemma is stated for C r functions, theproof works the same in the case of C r − , functions. Lemma 3.1 (pages 4-5, [RW10]) . For any [ a, b ] ⊆ R and φ ∈ C r − , ([ a, b ]) , d CC ([ j r − ( b )]( φ ) , [ j r − ( a )]( φ )) ≤ (cid:18) (cid:13)(cid:13)(cid:13) φ ( r ) (cid:13)(cid:13)(cid:13) L ∞ ([ a,b ]) (cid:19) | b − a | emma 3.2. There is a constant c > such that for all ( x, u ) , ( x, v ) ∈ J r − ( R ) , d CC (( x, u ) , ( x, v )) ≥ c | π ( u − v ) | r Proof.

By left invariance of d CC and the ball-box theorem (see Corollary 2.2 of [Jun19], there is aconstant c > x, u ) , ( x, v ) ∈ J r − ( R ), d CC (( x, u ) , ( x, v )) ≥ c | π (( x, v ) − ( x, u )) | r and by Lemma 3.1 from [Jun17], π (( x, v ) − ( x, u )) = π ( u − v ) (cid:3) The following lemma will be used to obtain lower bounds on the Markov convexity of Carnotgroups of step 2 or 3.

Lemma 3.3. Every Carnot group of step 2 or 3 contains the model filiform group of the corresponding step (the Heisenberg or Engel group) as a graded subquotient group.

Proof. Let $G$ be a Carnot group of step 2 with stratified Lie algebra $\mathfrak{g} = \mathfrak{g}_1 \oplus \mathfrak{g}_2$. Since $\mathfrak{g}$ has step 2, there is a nonzero $V_2 \in \mathfrak{g}_2$. Since $\mathfrak{g}$ is horizontally generated, there exist $U, V_1 \in \mathfrak{g}_1$ such that $[U, V_1] = V_2$. Recall that the Heisenberg algebra has first layer generated by linearly independent vectors $X, Y_1$, second layer generated by $Y_2 \ne 0$, and nontrivial bracket relation $[X, Y_1] = Y_2$. Then it easily follows that $X \mapsto U$, $Y_1 \mapsto V_1$, $Y_2 \mapsto V_2$ is a graded algebra embedding into $\mathfrak{g}$. This proves that the Heisenberg group is a graded subgroup of $G$.

Now assume $G$ is of step 3 with stratified Lie algebra $\mathfrak{g} = \mathfrak{g}_1 \oplus \mathfrak{g}_2 \oplus \mathfrak{g}_3$. By the grading property, any subspace of $\mathfrak{g}_3$ is an ideal, and thus there is a graded algebra quotient map onto another step 3 stratified Lie algebra whose third layer is one dimensional. Thus we may assume $\mathfrak{g}_3 = \mathbb{R}W$, $W \ne 0$, and prove that the Engel algebra embeds into $\mathfrak{g}$. Since $\mathfrak{g}$ is horizontally generated, $W = [U_1, [U_2, U_3]]$ for some $U_1, U_2, U_3 \in \mathfrak{g}_1$. First we claim that there is a 2-dimensional subspace of the span of $U_1, U_2, U_3$ that generates a Lie subalgebra of step 3. After proving the claim, we'll show that this subalgebra must be graded algebra-isomorphic to the Engel algebra. To prove the claim, we'll show that at least one of the following is nonzero:
(1) $[U_1, [U_1, U_2]]$
(2) $[U_1, [U_1, U_3]]$
(3) $[U_2, [U_2, U_3]]$
(4) $[U_3, [U_3, U_2]]$
(5) $[U_1 + U_2, [U_1 + U_2, U_3]]$
(6) $[U_1 + U_3, [U_1 + U_3, U_2]]$

Assume that all terms are 0. First let's see that $[U_2, [U_3, U_1]] = W$:
$$0 \overset{(5)}{=} [U_1 + U_2, [U_1 + U_2, U_3]] = [U_1, [U_1, U_3]] + [U_1, [U_2, U_3]] + [U_2, [U_1, U_3]] + [U_2, [U_2, U_3]] \overset{(2),(3)}{=} W + [U_2, [U_1, U_3]] = W - [U_2, [U_3, U_1]].$$
Using (6), (1), (4) in place of (5), (2), (3) shows $[U_3, [U_1, U_2]] = W$. Putting these together yields
$$[U_1, [U_2, U_3]] + [U_2, [U_3, U_1]] + [U_3, [U_1, U_2]] = 3W \ne 0,$$
in violation of the Jacobi identity.
This proves the claim.So now the situation is that there are Z , Z ∈ g with [ Z , [ Z , Z ]] = zW for some z (cid:54) = 0.Recall that the Engel algebra has ﬁrst layer spanned by X, Y , second layer by Y , and third layerby Y with nontrivial bracket relations [ X, Y ] = Y and [ X, Y ] = Y . Let z (cid:48) ∈ R such that Z , [ Z , Z ]] = z (cid:48) W . Then since [ Z , [ Z , Z ]] = zW (cid:54) = 0, the map from the Engel algebra into g deﬁned by X (cid:55)→ Z , Y (cid:55)→ Z − z (cid:48) z Z , Y (cid:55)→ [ Z , Z ] , Y (cid:55)→ zW is a graded algebra embedding. (cid:3) Remark . The analogue of Lemma 3.3 is false for groups of step larger than 3. Let g be thestratiﬁed Lie algebra g = ⊕ i =1 g i with g = R X ⊕ R X , g = R X , g = R X ⊕ R X , g = R X and nontrivial brackets [ X , X ] = X , [ X , X ] = X , [ X , X ] = X , [ X , X ] = X ,[ X , X ] = X . The only graded quotient maps from g onto another step 4 stratiﬁed Lie algebraor graded embeddings into g from another step 4 stratiﬁed Lie algebra are isomorphisms.3.4. Inﬁnite Step Carnot groups.

Given an inverse system of graded nilpotent Lie groups G ρ ← G ρ ← . . . , where each ρ i is a graded quotient map, we deﬁne the inverse limit metric group , G ∞ , to be the subgroup of ( ⊕ ∞ i =1 G i ) ∞ consisting of those sequences ( x i ) ∞ i =1 for which ρ ( x i +1 ) = x i for all i ≥

1, where ( ⊕ ∞ i =1 G i ) ∞ is the (cid:96) ∞ -sum of the pointed metric spaces ( G i , d CC , G ∞ inheritsa left-invariant homogeneous metric from ( ⊕ ∞ i =1 G i ) ∞ (where the dilations δ t are deﬁned on G ∞ inthe obvious way), and each G i is a Lipschitz quotient of G ∞ . Deﬁnition 3.5. J ∞ ( R k ) is the inverse limit metric group, equipped with the induced δ t -action,associated to the natural inverse system formed by the jet space groups, J ( R k ) ρ ← J ( R k ) ρ ← . . . .See [War05] for background on jet space groups. Similarly, F ∞ k is the inverse limit metric group,equipped with the induced δ t -action, associated to the natural inverse system formed by the freeCarnot groups on k generators, F k ρ ← F k ρ ← . . . . See Chapter 14 of [BLU07] for background onfree Carnot groups.3.5. Probabilistic and Convexity Inequalities.

In this article, we will often justify an inequal-ity with the phrase “by convexity” or “by the parallelogram law”. The convexity inequalities werefer to are almost always of the form a p + b p ≥ (cid:18) a + b (cid:19) p or a p + b p ≤ ( a + b ) p for p ≥ a, b ≥

0. The form of the parallelogram law we most often use is (cid:107) u (cid:107) + (cid:107) u − v (cid:107) (cid:107) v/ (cid:107) + (cid:107) u − v/ (cid:107) for u, v in a Hilbert space, which implies the inequality (cid:107) u (cid:107) + (cid:107) u − v (cid:107) ≥ (cid:107) v (cid:107) L p -normsof random variables. Lemma 3.6.

For all $a, b \ge 0$ and $q \ge 1$,
$$(a + b)^q \ge a^q + q a^{q-1} b.$$

Proof. Let $a, b, q$ be as above. The inequality is obviously true if $a = 0$. If $a > 0$, after dividing each side by $a^q$ and replacing $b/a$ with $t$, it suffices to prove $(1 + t)^q \ge 1 + qt$. This inequality is true since the right hand side is the linearization of the left hand side at $t = 0$, and the left hand side is a convex function of $t$. $\square$

Lemma 3.7. For each $p > 0$ and $k \ge 1$,
$$\sum_{t=1}^{k} (2t)^p > k^{p+1}/2.$$

Proof. Let $p > 0$ and $k \ge 1$. Since the function $t \mapsto (2t)^p$ is increasing,
$$\sum_{t=1}^{k} (2t)^p > \int_0^k (2t)^p \, dt = \frac{2^p}{p+1} k^{p+1} \ge k^{p+1}/2. \qquad \square$$

The following two lemmas are frequently used in tandem to prove Khintchine's inequality (for example, Proposition 4.5 of [Wol03]). We will need them for a similar inequality used in Section 5.2.
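Before stating them, the two elementary inequalities just proved can be spot-checked numerically. The sketch below is only an illustration, not part of the argument: it assumes the hypotheses $a, b \ge 0$, $q \ge 1$ for Lemma 3.6 and reads the bound in Lemma 3.7 as $k^{p+1}/2$ with $p > 0$, $k \ge 1$. The function names and sample grids are hypothetical choices made for this example.

```python
# Spot-check of Lemma 3.6 and Lemma 3.7 on a small grid of sample values.
# Assumptions (not taken from the text): a, b >= 0, q >= 1 for Lemma 3.6,
# and the lower bound k**(p + 1) / 2 with p > 0, k >= 1 for Lemma 3.7.

def lemma_3_6_gap(a, b, q):
    """Return (a + b)^q - (a^q + q a^(q-1) b); expected to be >= 0."""
    return (a + b) ** q - (a ** q + q * a ** (q - 1) * b)

def lemma_3_7_gap(p, k):
    """Return sum_{t=1}^k (2t)^p - k^(p+1)/2; expected to be > 0."""
    return sum((2 * t) ** p for t in range(1, k + 1)) - k ** (p + 1) / 2

values = [0.0, 0.3, 1.0, 2.5, 10.0]
exponents = [1.0, 1.5, 2.0, 4.0]

# Smallest gap over the grid for each inequality.
worst_3_6 = min(lemma_3_6_gap(a, b, q)
                for a in values for b in values for q in exponents)
worst_3_7 = min(lemma_3_7_gap(p, k)
                for p in [0.1, 0.5, 1.0, 3.0] for k in [1, 2, 5, 50])

print("smallest gap in Lemma 3.6:", worst_3_6)
print("smallest gap in Lemma 3.7:", worst_3_7)
```

Equality in Lemma 3.6 occurs on this grid (for example when $b = 0$ or $q = 1$), so the first gap is allowed to be zero, while the second is strictly positive.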

Lemma 3.8. For all $y \in \mathbb{R}$, $\cosh(y) \le \exp(y^2/2)$.

Proof. Let $y \in \mathbb{R}$. Then
$$\cosh(y) = \frac{e^{y} + e^{-y}}{2} = \frac{1}{2}\sum_{k=0}^{\infty} \frac{y^k + (-y)^k}{k!} = \sum_{k=0}^{\infty} \frac{y^{2k}}{(2k)!} \le \sum_{k=0}^{\infty} \frac{(y^2/2)^k}{k!} = \exp\left(\frac{y^2}{2}\right). \qquad \square$$

Lemma 3.9. For each $p \ge 1$ and $0 < A, B < \infty$, there is a constant $C = C(p, A, B) < \infty$ such that any real-valued random variable $Y$ satisfying the moment generating function subgaussian bound
$$E[\exp(yY)] \le A e^{By^2}$$
also satisfies the $L_p$-norm bound
$$E[|Y|^p] \le C.$$

Proof. This is a standard result from the theory of subgaussian random variables whose proof appears in any text on measure concentration. For the sake of completeness we'll include the proof, roughly following the proof of Proposition 4.5 from [Wol03]. Let $p$, $A$, $B$, $Y$ be as above. For any $t > 0$, Markov's inequality and our assumption imply
$$P(Y \ge t) = P\left(\exp\left(\frac{t}{2B} Y\right) \ge \exp\left(\frac{t^2}{2B}\right)\right) \le \exp\left(-\frac{t^2}{2B}\right) E\left[\exp\left(\frac{t}{2B} Y\right)\right] \le A \exp\left(-\frac{t^2}{2B} + \frac{t^2}{4B}\right) = A \exp\left(-\frac{t^2}{4B}\right).$$
Likewise,
$$P(Y \le -t) \le A \exp\left(-\frac{t^2}{4B}\right),$$
giving us
$$P(|Y| \ge t) \le 2A \exp\left(-\frac{t^2}{4B}\right).$$
We then use the layer cake principle to calculate $E[|Y|^p]$:
$$E[|Y|^p] = p \int_0^{\infty} t^{p-1} P(|Y| \ge t) \, dt \le p \int_0^{\infty} t^{p-1} \, 2A \exp\left(-\frac{t^2}{4B}\right) dt = C(p, A, B) < \infty. \qquad \square$$

4. Upper Bound on Markov Convexity of Graded Nilpotent Lie Groups

Throughout this section, ﬁx a graded nilpotent Lie algebra ( g , [ · , · ]) of step r ≥ ⊕ ri =1 g i and dim( g i ) = k i . Choose an ordered basis U i, , . . . U i,k i for each g i and equip g with aHilbert norm (cid:107) · (cid:107) such that these vectors form an orthonormal basis. We also use (cid:107) · (cid:107) to denotethe Euclidean norm on any R n . Given x ∈ g , let x i ∈ g i denote its g i -component. Given x i ∈ g i ,let x i,j ∈ R denote its U i,j -component. Thus, (cid:107) x (cid:107) = r (cid:88) i =1 (cid:107) x i (cid:107) and (cid:107) x i (cid:107) = k i (cid:88) j =1 | x i,j | Consider g as a graded nilpotent Lie group as in Section 3. It’s easy to see that 0 is the groupidentity element and x − = − x . Whenever u, v ∈ g or u, v ∈ R n , we use the notation (cid:107) ( u, v ) (cid:107) tomean (cid:107) u (cid:107) + (cid:107) v (cid:107) .4.1. BCH Polynomials.Deﬁnition 4.1.

For s ≥

0, a function P : g × g → R that is a monomial(polynomial) in the variables x n,m , y n,m is a graded-homogeneous monomial(polynomial) of degree s if P ( δ t ( x ) , δ t ( y )) = t s P ( x, y )for all x, y ∈ g and t ∈ R > . Clearly, any graded-homogeneous polynomial of degree s must be asum of graded-homogeneous monomials of degree s .In this section, a multiset is a ﬁnite sequence of positive integers modulo permutations. Disjointunions I (cid:116) I of multisets are deﬁned in the obvious way. Given a multiset I , (cid:107) I (cid:107) denotes the sumof the elements and (cid:107) I (cid:107) ∞ the maximum of the elements. Given a nonzero graded-homogeneousmonomial M of degree s , we associate to it a multiset I ( M ) deﬁned recursively on the numberof variables in the monomial by I ( M ) = { i } (cid:116) I ( M (cid:48) ) if M ( x, y ) = x i,n M (cid:48) ( x, y ) or M ( x, y ) = y i,n M (cid:48) ( x, y ) for some n ≤ k i and graded-homogeneous polynomial M (cid:48) of degree s − i (the basecase is I (1) = ∅ ). By the homogeneity property, it must hold that if M is nonzero and graded-homogeneous of degree s , (cid:107) I ( M ) (cid:107) = s .For s ≥

1, let s denote the unique multiset with (cid:107) s (cid:107) = s and (cid:107) s (cid:107) ∞ = 1 (and = ∅ ). For each n, m ≤ k , let τ n,m ( x, y ) := x ,n y ,m − x ,m y ,n . A graded-homogeneous polynomial P of degree s ≥ τ -type if P ( x, y ) = τ n,m ( x, y ) M (cid:48) ( x, y ) for some n, m ≤ k and graded-homogeneousmonomial M (cid:48) with I ( M (cid:48) ) = s − .A graded homogeneous polynomial of degree s ≥ (cid:80) j Q j (the sum is ﬁnite), whereeach Q j is of τ -type or a graded-homogeneous monomial of degree s with 1 < (cid:107) I ( Q j ) (cid:107) ∞ < s iscalled a BCH polynomial of degree s . Remark . Obviously a sum of BCH polynomials of degree s is another such polynomial. If P is a BCH polynomial of degree s , 1 ≤ i ≤ r , and 1 ≤ j ≤ k i , x i,j P ( x, y ) and y i,j P ( x, y ) are BCHpolynomials of degree s + i . If P ( x, y ) is a BCH polynomial of degree s , then so is P ( x, δ t ( y )) forany t ∈ R > . Example 4.3.

Let M ( x, y ) = 6 x , x , y , , P ( x, y ) = − y , ( x , y , − x , y , ), Q ( x, y ) = x , y , ,and R ( x, y ) = y , . M is a graded-homogeneous monomial of degree 7 with I ( M ) = { , , , } , P is a graded homogeneous polynomial of degree 3 of τ -type, Q is a graded-homogeneous monomial ofdegree 2 with I ( Q ) = { , } , and R is a graded homogeneous monomial of degree 3 with I ( r ) = { } . M and P are BCH polynomials, but Q and R are not because they are monomials with (cid:107) I ( Q ) (cid:107) ∞ = 1and (cid:107) I ( R ) (cid:107) ∞ = (cid:107) I ( R ) (cid:107) .We now arrive at a key structural lemma for the group product on graded nilpotent Lie algebras.The rest of this subsection is dedicated to its proof. emma 4.4. For all x, y ∈ g and ≤ s ≤ r , (1) ( y − x ) = x − y (2) ( y − x ) s = x s − y s + (cid:80) k s j =1 P s,j ( x, y ) U s,j where each P s,j is a BCH polynomial of degree s . A trusting reader familiar with the group structure of graded nilpotent Lie algebras induced bythe Baker-Campbell-Hausdorﬀ formula may safely skip the rest of this subsection. Before provingthe lemma, we need to set some useful notation that allows us to work with nested Lie brackets,and then prove a lemma about these brackets.
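For readers who prefer a concrete instance of the structural lemma before the general proof, the step 2 case can be worked out by hand and checked by machine. The sketch below is an illustration under stated assumptions, not the construction used in the paper: it takes the Heisenberg algebra with a basis $U_{1,1}, U_{1,2}$ of the first layer and $U_{2,1} = [U_{1,1}, U_{1,2}]$ of the second layer (this normalization of the bracket is our assumption), and uses the group law $x * y = x + y + \frac{1}{2}[x, y]$, which is what the Baker-Campbell-Hausdorff formula reduces to in step 2. All function names are hypothetical. In this setting $(y^{-1}x)_2 = x_2 - y_2 + \frac{1}{2}\tau_{1,2}(x, y)$, so the correction term is a polynomial of $\tau$-type of degree 2.

```python
from fractions import Fraction as F

# An element is a triple (x11, x12, x21): coordinates with respect to
# U11, U12 (layer 1) and U21 = [U11, U12] (layer 2).  Assumed normalization.

def bracket(x, y):
    """Lie bracket of the Heisenberg algebra; only [U11, U12] = U21 is nonzero."""
    return (F(0), F(0), x[0] * y[1] - x[1] * y[0])

def mul(x, y):
    """Group law x * y = x + y + [x, y] / 2 (BCH formula truncated at step 2)."""
    b = bracket(x, y)
    return tuple(xi + yi + bi / 2 for xi, yi, bi in zip(x, y, b))

def inv(x):
    """Group inverse, which is -x in exponential coordinates."""
    return tuple(-xi for xi in x)

def dilate(t, x):
    """Dilation delta_t: scales layer 1 by t and layer 2 by t^2."""
    return (t * x[0], t * x[1], t * t * x[2])

def tau(x, y):
    """tau_{1,2}(x, y) = x_{1,1} y_{1,2} - x_{1,2} y_{1,1}."""
    return x[0] * y[1] - x[1] * y[0]

def correction(x, y):
    """The layer 2 correction (y^{-1} x)_2 - (x_2 - y_2)."""
    return mul(inv(y), x)[2] - (x[2] - y[2])

x = (F(1), F(2), F(3))
y = (F(-4), F(5), F(7))
t = F(3)

print("layer 1 of y^{-1} x:", mul(inv(y), x)[:2])
print("correction term:", correction(x, y), "and tau / 2:", tau(x, y) / 2)
print("degree 2 homogeneity holds:",
      correction(dilate(t, x), dilate(t, y)) == t ** 2 * correction(x, y))
```

Exact rational arithmetic is used so that the identities can be compared with `==` rather than up to rounding error.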

Definition 4.5. Given $x, y \in \mathfrak{g}$, $i \ge 1$, and $\epsilon \in \{1, 2\}^i$, we recursively define $(x, y)^{\epsilon}$ as follows: for $i = 1$, $(x, y)^{\epsilon} := x$ if $\epsilon = 1$ and $(x, y)^{\epsilon} := y$ if $\epsilon = 2$. Assume $(x, y)^{\epsilon}$ has been defined for all $\epsilon \in \{1, 2\}^i$ for some $i \ge 1$. Let $\epsilon \in \{1, 2\}^{i+1}$. Then $\epsilon$ equals $(1, \epsilon')$ or $(2, \epsilon')$ for some $\epsilon' \in \{1, 2\}^i$. We define $(x, y)^{\epsilon} := [x, (x, y)^{\epsilon'}]$ if $\epsilon = (1, \epsilon')$ and $(x, y)^{\epsilon} := [y, (x, y)^{\epsilon'}]$ if $\epsilon = (2, \epsilon')$.

Example 4.6. $(x, y)^{(1,2,2,1)} = [x, [y, [y, x]]]$. The 1 or 2 in the superscript should be thought of as indicating the first or second component of $(x, y)$ in the nested Lie bracket.

Lemma 4.7.

For all x, y ∈ g , ≤ i , i ≤ r , and (cid:15) ∈ { , } i , (( x, y ) (cid:15) ) i = k i (cid:88) j =1 Q i ,j ( x, y ) U i ,j where each Q i ,j is a BCH polynomial of degree i if i ≤ i i > i .Proof. Let x, y ∈ g . By the grading property, (( x, y ) (cid:15) ) i = 0 if (cid:15) ∈ { , } i and i > i . We’ll provethe remaining case by induction on i .Proof of base case. The base case is i = 2. Let (cid:15) ∈ { , } . Then (cid:15) equals (1 , , , , x, y ) (1 , = ( x, y ) (2 , = 0 and ( x, y ) (2 , = − ( x, y ) (1 , , it suﬃces to only consider (cid:15) = (1 , x, y ) (cid:15) = [ x, y ]. Let i ≥

2. We treat the two cases i = 2 and i >

2. Firstassume i = 2. Then we have[ x, y ] = [ x , y ] =  k (cid:88) j =1 x ,j U ,j , k (cid:88) j (cid:48) =1 y ,j (cid:48) U ,j (cid:48)  = k (cid:88) j =1 k (cid:88) j (cid:48) =1 x ,j y ,j (cid:48) (cid:2) U ,j , U ,j (cid:48) (cid:3) = 12  k (cid:88) n,m =1 x ,n y ,m [ U ,n , U ,m ] + k (cid:88) n,m =1 x ,m y ,n [ U ,m , U ,n ]  = 12 k (cid:88) n,m =1 ( x ,n y ,m − x ,m y ,n ) [ U ,n , U ,m ] = 12 k (cid:88) n,m =1 τ n,m ( x, y ) [ U ,n , U ,m ]= 12 k (cid:88) n,m =1 τ n,m ( x, y ) k (cid:88) j =1 c j,n,m U ,j = k (cid:88) j =1  k (cid:88) n,m =1 c j,n,m τ n,m ( x, y )  U ,j for some c j,n,m ∈ R . The inner sum is a sum of polynomials of degree 2 of τ -type, and thus a BCHpolynomial of degree i .Now we consider the case i > x, y ] i = i − (cid:88) n =1 [ x n , y i − n ] = i − (cid:88) n =1  k n (cid:88) j =1 x n,j U n,j , k i − n (cid:88) j (cid:48) =1 y i − n,j (cid:48) U i − n,j (cid:48)  i − (cid:88) n =1 k n (cid:88) j =1 k i − n (cid:88) j (cid:48) =1 x n,j y i − n,j (cid:48) (cid:2) U n,j , U i − n,j (cid:48) (cid:3) = i − (cid:88) n =1 k n (cid:88) j =1 k i − n (cid:88) j (cid:48) =1 x n,j y i − n,j (cid:48) k i (cid:88) m =1 c m,n,j,j (cid:48) U i ,m = k i (cid:88) m =1  i − (cid:88) n =1 k n (cid:88) j =1 k i − n (cid:88) j (cid:48) =1 c m,n,j,j (cid:48) x n,j y i − n,j (cid:48)  U i ,m for some c m,n,j,j (cid:48) ∈ R . Notice that, for each n, j, j (cid:48) , I ( x n,j y i − n,j (cid:48) ) = { n, i − n } , and so since i > ≤ n ≤ i −

1, 1 < (cid:107) I ( x n,j y i − n,j (cid:48) ) (cid:107) ∞ < i , and thus x n,j , y i − n,j (cid:48) is a BCH polynomial ofdegree i . This completes the proof of the base case.Proof of inductive step. Now assume the lemma holds for some 2 ≤ i < r . Let (cid:15) ∈ { , } i +1 .Then (cid:15) equals (1 , (cid:15) (cid:48) ) or (2 , (cid:15) (cid:48) ) for some (cid:15) (cid:48) ∈ { , } i . Without loss of generality, assume (cid:15) = (1 , (cid:15) (cid:48) ).Let i ≥ i + 1. Then (( x, y ) (cid:15) ) i = [ x, ( x, y ) (cid:15) (cid:48) ] i = i − (cid:88) n =1 [ x n , (( x, y ) (cid:15) (cid:48) ) i − n ] ind hyp = i − (cid:88) n =1  k n (cid:88) j =1 x n,j U n,j , k i − n (cid:88) j (cid:48) =1 P i − n,j (cid:48) ( x, y ) U i − n,j (cid:48)  = i − (cid:88) n =1 k n (cid:88) j =1 k i − n (cid:88) j (cid:48) =1 x n,j P i − n,j (cid:48) ( x, y ) (cid:2) U n,j , U i − n,j (cid:48) (cid:3) = i − (cid:88) n =1 k n (cid:88) j =1 k i − n (cid:88) j (cid:48) =1 x n,j P i − n,j (cid:48) ( x, y ) k i (cid:88) m =1 c m,n,j,j (cid:48) U i ,m = k i (cid:88) m =1  i (cid:88) n =1 k n (cid:88) j =1 k i − n (cid:88) j (cid:48) =1 c m,n,j,j (cid:48) x n,j P i − n,j (cid:48) ( x, y )  U i ,m for some c m,n,j,j (cid:48) ∈ R and BCH polynomials P i − n,j (cid:48) ,(cid:96) of degree i − n . This implies x n,j P i − n,j (cid:48) ( x, y )is a BCH polynomial of degree i , as desired. (cid:3) Proof of Lemma 4.4.

The Baker-Campbell-Hausdorff formula, (3.1), implies that there are constants (many can be taken to be 0) $\{\alpha_{\epsilon}\}_{\epsilon \in \cup_{i=2}^{r} \{1,2\}^i} \subseteq \mathbb{R}$ such that
$$y^{-1}x = x - y + \sum_{i=2}^{r} \sum_{\epsilon \in \{1,2\}^i} \alpha_{\epsilon} (x, y)^{\epsilon}.$$
Since
$$(y^{-1}x)_s = x_s - y_s + \sum_{i=2}^{r} \sum_{\epsilon \in \{1,2\}^i} \alpha_{\epsilon} \left((x, y)^{\epsilon}\right)_s,$$
the desired conclusion follows by appealing to Lemma 4.7. $\square$

4.2. Convex Metrics.

The goal of this subsection is to prove Theorem 4.19. To do so, we construct a left-invariant homogeneous quasi-metric on $\mathfrak{g}$ that satisfies a certain 4-point inequality. This is the content of Lemma 4.18. All the lemmas and definitions preceding Lemma 4.18 exist to prove it. We next define a graded-homogeneous polynomial of degree $2s$ that dominates the square of any BCH polynomial of degree $s$ (Lemma 4.9). As a consequence, we get two domination inequalities involving norms of group products, Lemmas 4.11 and 4.12. These types of domination are what will ultimately allow us to prove Lemma 4.16, the key lemma used in the proof of Lemma 4.18.

Definition 4.8. Let
$$\tau(x, y) := \sqrt{\sum_{n,m \le k_1} \tau_{n,m}(x, y)^2}$$
so that $\tau(x, y)^2 \ge \tau_{n,m}(x, y)^2$ for every $n$ and $m$. For each $2 \le s \le r$, define $D_s : \mathfrak{g} \times \mathfrak{g} \to \mathbb{R}_{\ge 0}$ recursively by
$$D_2(x, y) := \tau(x, y)^2 + \|(x_2, y_2)\|^2,$$
$$D_{s+1}(x, y) := \|(x_{s+1}, y_{s+1})\|^2 + \sum_{s'=1}^{\lfloor (s+1)/2 \rfloor} \|(x_{s'}, y_{s'})\|^2 D_{s+1-s'}(x, y).$$

Lemma 4.9.

For any ≤ s ≤ r and BCH polynomial P of degree s , there exists < c ≤ suchthat for all x, y ∈ g , D s ( x, y ) − (cid:107) ( x s , y s ) (cid:107) ≥ cP ( x, y ) Proof.

The proof is by induction on s . The base case s = 2 is clear from the deﬁnition of D andBCH polynomial of degree 2. Assume the inequality holds for all s ≤ s for some s < r . Let P be a BCH polynomial of degree s + 1. By deﬁnition of BCH polynomial, it suﬃces to prove theinequality assuming P is a monomial with 1 < (cid:107) I ( P ) (cid:107) ∞ < s + 1 or P is of τ -type. First assume P is a monomial with 1 < (cid:107) I ( P ) (cid:107) ∞ < s + 1. There are two subcases to consider: 1 ∈ I ( P ) and1 / ∈ I ( P ). Assume the ﬁrst subcase holds. Then P = x ,n M ( x, y ) or P = y ,n M ( x, y ) for some n ≤ k and monomial M of degree s with 1 < (cid:107) I ( M ) (cid:107) ∞ < s + 1. Then D s +1 ( x, y ) − (cid:107) ( x s +1 , y s +1 ) (cid:107) = (cid:98) ( s +1) / (cid:99) (cid:88) s (cid:48) =1 (cid:107) ( x s (cid:48) , y s (cid:48) ) (cid:107) D s +1 − s (cid:48) ( x, y ) ≥ (cid:107) ( x , y ) (cid:107) D s ( x, y ) ind hyp ≥ c (cid:107) ( x , y ) (cid:107) M ( x, y ) ≥ cP ( x, y )Now assume the second subcase holds. Then P ( x, y ) = x i,j M ( x, y ) or P ( x, y ) = y i,j M ( x, y ) forsome 1 < i ≤ (cid:98) ( s + 1) / (cid:99) , j ≤ k i , and monomial M of degree s + 1 − i with 1 < (cid:107) I ( M ) (cid:107) ∞ < s + 1.Then D s +1 ( x, y ) − (cid:107) ( x s +1 , y s +1 ) (cid:107) = (cid:98) ( s +1) / (cid:99) (cid:88) s (cid:48) =1 (cid:107) ( x s (cid:48) , y s (cid:48) ) (cid:107) D s +1 − s (cid:48) ( x, y ) ≥ (cid:107) ( x i , y i ) (cid:107) D s +1 − i ( x, y ) ind hyp ≥ c (cid:107) ( x i , y i ) (cid:107) M ( x, y ) ≥ cP ( x, y )Now assume P is of τ -type. By deﬁnition, since P has degree s + 1, this means P ( x, y ) = τ n,m ( x, y ) M (cid:48) ( x, y ) for some n, m ≤ k and graded-homogeneous monomial M (cid:48) with I ( M (cid:48) ) = s − . his implies P ( x, y ) = x ,(cid:96) P (cid:48) ( x, y ) or P ( x, y ) = y ,(cid:96) P (cid:48) ( x, y ) for some (cid:96) ≤ k , and degree s polyno-mial P (cid:48) of τ -type. 
Then D s +1 ( x, y ) − (cid:107) ( x s +1 , y s +1 ) (cid:107) = (cid:98) ( s +1) / (cid:99) (cid:88) s (cid:48) =1 (cid:107) ( x s (cid:48) , y s (cid:48) ) (cid:107) D s +1 − s (cid:48) ( x, y ) ≥ (cid:107) ( x , y ) (cid:107) D s ( x, y ) ind hyp ≥ c (cid:107) ( x , y ) (cid:107) ( P (cid:48) ) ( x, y ) ≥ cP ( x, y ) (cid:3) Lemma 4.10.

Let ≤ s ≤ r . For any t > , there is a constant c > such that for all x, y ∈ g , D s ( x, y ) − (cid:107) ( x s , y s ) (cid:107) ≥ c (cid:107) ( δ t ( y ) − x ) s − ( x s − t s y s ) (cid:107) Proof.

Let t >

0. By Lemma 4.4, (cid:107) ( δ t ( y ) − x ) s − ( x s − t s y s ) (cid:107) . = (cid:88) j | P s,j ( x, δ t ( y )) | = (cid:88) j | P (cid:48) s,j,t ( x, y ) | where each P s,j is a BCH polynomial of degree s , and by Remark 4.2, each P (cid:48) s,j,t is a BCH polynomialof degree s . Then the desired inequality follows from Lemma 4.9. (cid:3) Lemma 4.11.

Let ≤ s ≤ r and c > . For all suﬃciently small λ > (depending on c ), for all x, y ∈ g , c ( D s ( x, y ) − (cid:107) ( x s , y s ) (cid:107) ) + λ (cid:107) ( y − x ) s (cid:107) ≥ λ (cid:107) x s − y s (cid:107) Proof.

Let λ >

0. By Lemma 4.10, there is a constant c (cid:48) > x, y ) such that c ( D s ( x, y ) − (cid:107) ( x s , y s ) (cid:107) ) + λ (cid:107) ( y − x ) s (cid:107) ≥ c (cid:48) (cid:107) ( y − x ) s − ( x s − y s ) (cid:107) + λ (cid:107) ( y − x ) s (cid:107) =: ( ∗ )Thus, if λ ≤ c (cid:48) , ( ∗ ) ≥ λ (cid:107) ( y − x ) s − ( x s − y s ) (cid:107) + λ (cid:107) ( y − x ) s (cid:107) ≥ λ (cid:107) x s − y s (cid:107) where the last inequality follows from the parallelogram law. (cid:3) Lemma 4.12.

Let ≤ s ≤ r . There is a constant c > such that for all x, y ∈ g , D s ( x, y ) ≥ c (cid:107) ( δ / ( y ) − x ) s (cid:107) Proof.

By Lemma 4.10, it suﬃces to show (cid:107) ( δ / ( y ) − x ) s − ( x s − − s y s ) (cid:107) + (cid:107) ( x s , y s ) (cid:107) ≥ (cid:107) ( δ / ( y ) − x ) s (cid:107) Since (cid:107) x s − − s y s (cid:107) ≤ (cid:107) ( x s , y s ) (cid:107) it suﬃces to show (cid:107) ( δ / ( y ) − x ) s − ( x s − − s y s ) (cid:107) + 14 (cid:107) x s − − s y s (cid:107) ≥ (cid:107) ( δ / ( y ) − x ) s (cid:107) This inequality is true by the parallelogram law. (cid:3)
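The proofs of Lemmas 4.11 and 4.12 both close by invoking the parallelogram law from Section 3.5, in the form $\|u\|^2 + \|u - v\|^2 = 2\|v/2\|^2 + 2\|u - v/2\|^2$ and its consequence $\|u\|^2 + \|u - v\|^2 \ge \|v\|^2/2$. As a minimal sketch (our own illustration, with hypothetical helper names and with $\mathbb{R}^5$ standing in for the Hilbert space), both statements can be checked on random vectors:

```python
import random

def norm_sq(u):
    """Squared Euclidean norm of a vector given as a list of floats."""
    return sum(c * c for c in u)

def sub(u, v):
    """Coordinatewise difference u - v."""
    return [a - b for a, b in zip(u, v)]

def scale(t, u):
    """Scalar multiple t * u."""
    return [t * a for a in u]

def parallelogram_sides(u, v):
    """Return both sides of ||u||^2 + ||u - v||^2 = 2||v/2||^2 + 2||u - v/2||^2."""
    half_v = scale(0.5, v)
    lhs = norm_sq(u) + norm_sq(sub(u, v))
    rhs = 2 * norm_sq(half_v) + 2 * norm_sq(sub(u, half_v))
    return lhs, rhs

rng = random.Random(0)  # fixed seed so the run is reproducible
samples = [([rng.uniform(-5, 5) for _ in range(5)],
            [rng.uniform(-5, 5) for _ in range(5)]) for _ in range(200)]

# Largest deviation from the identity, and smallest slack in the inequality.
max_identity_error = max(abs(lhs - rhs)
                         for lhs, rhs in (parallelogram_sides(u, v) for u, v in samples))
min_slack = min(parallelogram_sides(u, v)[0] - norm_sq(v) / 2 for u, v in samples)

print("largest identity error:", max_identity_error)
print("smallest slack in the inequality:", min_slack)
```

In the proof of Lemma 4.11 the identity is applied with $u = (y^{-1}x)_s$ and $v = x_s - y_s$; the slack in the inequality is exactly $2\|u - v/2\|^2$, which is why it is never negative.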

Lemma 4.13. There is a constant $c > 0$ such that for all $x, y \in \mathfrak{g}$,
$$\|y_1\| \, \|x_1 - y_1/2\| \ge c \, \tau(x, y).$$

Proof. It suffices to show, for each fixed $n, m \le k_1$, that $\|y_1\| \, \|x_1 - y_1/2\| \ge |\tau_{n,m}(x, y)|$. By Cauchy-Schwarz,
$$\|y_1\| \, \|x_1 - y_1/2\| \ge \|(y_{1,m}, -y_{1,n})\| \, \|(x_{1,n}, x_{1,m}) - (y_{1,n}, y_{1,m})/2\| \overset{\text{C-S}}{\ge} |y_{1,m}(x_{1,n} - y_{1,n}/2) - y_{1,n}(x_{1,m} - y_{1,m}/2)| = |x_{1,n} y_{1,m} - x_{1,m} y_{1,n}| = |\tau_{n,m}(x, y)|. \qquad \square$$

Definition 4.14.

For 2 ≤ s ≤ r , deﬁne SN s : g × g → R by SN s ( x, y ) := max {(cid:107) x − y / (cid:107) , (cid:107) ( x , y ) (cid:107) / , (cid:107) ( x , y ) (cid:107) / , . . . (cid:107) ( x s , y s ) (cid:107) /s } Remark . Using the maximum of the terms is not important here; it could be replaced by any (cid:96) p -sum or other such norm. If a diﬀerent choice of norm was used, the rest of the section wouldproceed the exact same way except with possibly diﬀerent values of constants (but still independentof x, y ). Lemma 4.16.

For each ≤ s ≤ r , there exists a homogeneous quasi-norm N s and a constant c > such that for all x, y ∈ g ,(1) ( N s ( x ) s + N s ( y − x ) s ) / − ( N s ( y ) / s ≥ cSN ss ( x, y ) + cD s ( x, y ) + cN s ( δ / ( y ) − x ) s (2) N s ( y ) ≥ (cid:107) y (cid:107) for all s ≥ . Consequently, N s ( y ) + SN s ( x, y ) ≥ b (cid:107) ( x s (cid:48) , y s (cid:48) ) (cid:107) /s (cid:48) for some b > and all ≤ s (cid:48) ≤ s .(3) If N s ( y ) = 0 , y i = 0 for all ≤ i ≤ s . In particular, N r is a positive deﬁnite homogeneousquasi-norm.Proof. The proof is by induction on s . The functions N s we construct will clearly be homogeneousquasi-norms and satisfy (2) and (3), so we will only concern ourselves with proving (1).Proof of base case: The base case is s = 2. Throughout the proof of the base case, c (cid:48) , c (cid:48)(cid:48) , c (cid:48)(cid:48)(cid:48) denote (small) positive constants that depend on g but not on x, y . Each of the constants maydepend on the ones previously appearing, but of course this is compatible with the fact that theyare all independent of x, y . Deﬁne N ( x ) := (cid:112) (cid:107) x (cid:107) + λ (cid:107) x (cid:107) where λ > SN ( x, y ) = max( (cid:107) x − y / (cid:107) , (cid:107) ( x , y ) (cid:107) ) ≤(cid:107) x − y / (cid:107) + (cid:107) ( x , y ) (cid:107) and D ( x, y ) = τ ( x, y ) + (cid:107) ( x , y ) (cid:107) , we need to show( N ( x ) + N ( y − x ) ) / ≥ ( N ( y ) / + c (cid:107) x − y / (cid:107) + c τ ( x, y ) + c (cid:107) ( x , y ) (cid:107) + cN ( δ / ( y ) − x ) for some λ, c >

0. First let’s write out the deﬁnitions of some of the terms in the inequality. N ( x ) = (cid:107) x (cid:107) + λ (cid:107) x (cid:107) N ( y − x ) = (cid:107) x − y (cid:107) + λ (cid:107) ( y − x ) (cid:107) N ( y ) = (cid:107) y (cid:107) + λ (cid:107) y (cid:107) By convexity, parallelogram law, and Lemma 4.13,( (cid:107) x (cid:107) + (cid:107) x − y (cid:107) ) / ≥ (( (cid:107) x (cid:107) + (cid:107) x − y (cid:107) ) / = ( (cid:107) y / (cid:107) + (cid:107) x − y / (cid:107) ) = ( (cid:107) y (cid:107) / + 2 (cid:107) y / (cid:107) (cid:107) x − y / (cid:107) + (cid:107) x − y / (cid:107) . ≥ ( (cid:107) y (cid:107) / + c (cid:48) τ ( x, y ) + (cid:107) x − y / (cid:107) or some c (cid:48) >

0. Thus, it suﬃces to show that for suﬃciently small λ, c > c (cid:48) τ ( x, y ) + λ (cid:107) x (cid:107) + λ (cid:107) ( y − x ) (cid:107) + 12 (cid:107) x − y / (cid:107) ≥ − λ (cid:107) y (cid:107) + c τ ( x, y ) + c (cid:107) ( x , y ) (cid:107) + cN ( δ / ( y ) − x ) By Lemma 4.11, the following inequality is true for suﬃciently small λ > c (cid:48) τ ( x, y ) + λ (cid:107) ( y − x ) (cid:107) = c (cid:48) D ( x, y ) − (cid:107) ( x , y ) (cid:107)| ) + λ (cid:107) ( y − x ) (cid:107) . ≥ λ (cid:107) x − y (cid:107) Thus it suﬃces for the following inequality to hold for λ, c > c (cid:48) τ ( x, y ) + λ (cid:107) x (cid:107) + λ (cid:107) x − y (cid:107) + 12 (cid:107) x − y / (cid:107) ≥ − λ (cid:107) y (cid:107) + c (cid:107) ( x , y ) (cid:107) + cN ( δ / ( y ) − x ) We have λ (cid:107) x (cid:107) + λ (cid:107) x − y (cid:107) ≥ λ (cid:107) x (cid:107) + (cid:107) x − y (cid:107) )= λ (cid:107) x (cid:107) + (cid:107) x − y (cid:107) ) + λ (cid:107) x (cid:107) + (cid:107) x − y (cid:107) ) ≥ − λ (cid:107) y (cid:107) + c (cid:48)(cid:48) (cid:107) ( x , y ) (cid:107) Thus it remains to show c (cid:48) τ ( x, y ) + c (cid:48)(cid:48) (cid:107) ( x , y ) (cid:107) + 12 (cid:107) x − y / (cid:107) ≥ cN ( δ / ( y ) − x ) for c > c (cid:48) τ ( x, y ) + c (cid:48)(cid:48) (cid:107) ( x , y ) (cid:107) ≥ c (cid:48)(cid:48)(cid:48) ( τ ( x, y ) + (cid:107) ( x , y ) (cid:107) ) = c (cid:48)(cid:48)(cid:48) D s ( x, y ) Lem 4 . ≥ cλ (cid:107) ( δ / ( y ) − x ) (cid:107) c > cλ (cid:107) ( δ / ( y ) − x ) (cid:107) + 12 (cid:107) x − y / (cid:107) ≥ cN ( δ / ( y ) − x ) This is true by deﬁnition of N . 
This completes the proof of the base case.Proof of inductive step: Now assume the statement holds for all 2 ≤ s (cid:48) ≤ s some 2 ≤ s ≤ r − N s +1 by N s +1 ( x ) := s +1) (cid:118)(cid:117)(cid:117)(cid:116) λ (cid:107) x s +1 (cid:107) + s (cid:88) s (cid:48) = (cid:100) ( s +1) / (cid:101) N s +1) s (cid:48) ( x ) (4.1)where λ is a (small) positive constant (diﬀerent λ than in the base case) to be chosen later (indepen-dent of x, y ). Throughout the remainder of the proof, c − c denote (small) positive constants thatdepend on g but not on x, y . Each of the constants may depend on the ones previously appearing,but of course this is compatible with the fact that they are all independent of x, y . The constant λ will end up depending on c (which in turn depends on c ), and the subsequent constants willdepend on λ .We now prove the inductive step. In what follows, we adopt some conventions to help makethe proof more readable. There are two types of equalities/inequalities we use relating each ofthe expressions below. The ﬁrst type is simply using a lemma, deﬁnition, inductive hypothesis, orconvexity or trivial numerical inequality. Whenever an equality/inequality of this type is used, theparticular terms in the expression that change from one to the next are bolded . No other termschange, except for the bolded ones to which the particular lemma, deﬁnition, inductive hypothesis,or convexity or trivial numerical inequality apply. Apart from the trivial numerical inequalities,the name of the lemma or deﬁnition, “ind hyp”, or “convexity” decorates the equality/inequalitysymbol. The second type of equality/inequality used is always an equality and the equality symbol s decorated with the word “rearrange”. This means we use trivialities like commutivity of addi-tion or multiplication, reindexing of a sum, or no symbolic changes at all. 
Importantly, we alsouse equalities decorated with “rearrange” to change which terms are bolded in the expression, inpreparation for the use of another equality/inequality of the ﬁrst type. ( N s +1 ( x ) s +1) + N s +1 ( y − x ) s +1) ) / (4.1) = (cid:16) λ (cid:107) x s +1 (cid:107) + (cid:80) ss (cid:48) = (cid:100) ( s +1) / (cid:101) N s +1) s (cid:48) ( x ) + (cid:107) ( y − x ) s +1 (cid:107) + (cid:80) ss (cid:48) = (cid:100) ( s +1) / (cid:101) N s +1) s (cid:48) ( y − x ) (cid:17) / rearrange = λ (cid:107) x s +1 (cid:107) + (cid:107) ( y − x ) s +1 (cid:107) ) + (cid:16)(cid:80) ss (cid:48) = (cid:100) ( s +1) / (cid:101) N s +1) s (cid:48) ( x ) + N s +1) s (cid:48) ( y − x ) (cid:17) / convexity ≥ λ (cid:107) x s +1 (cid:107) + (cid:107) ( y − x ) s +1 (cid:107) ) + (cid:80) ss (cid:48) = (cid:100) ( s +1) / (cid:101) (cid:32) N s (cid:48) s (cid:48) ( x ) + N s (cid:48) s (cid:48) ( y − x )2 (cid:33) s +1 s (cid:48) ind hyp (1) ≥ λ (cid:107) x s +1 (cid:107) + (cid:107) ( y − x ) s +1 (cid:107) )+ (cid:80) ss (cid:48) = (cid:100) ( s +1) / (cid:101) (( N s (cid:48) ( y ) / s (cid:48) + c SN s (cid:48) s (cid:48) ( x, y )+ c D s (cid:48) ( x, y ) + c N s (cid:48) ( δ / ( y ) − x ) s (cid:48) ) s +1 s (cid:48) Lem 3 . 
≥ λ (cid:107) x s +1 (cid:107) + (cid:107) ( y − x ) s +1 (cid:107) )+ (cid:80) ss (cid:48) = (cid:100) ( s +1) / (cid:101) (( N s (cid:48) ( y ) / s (cid:48) + c SN s (cid:48) s (cid:48) ( x, y ) + c N s (cid:48) ( δ / ( y ) − x ) s (cid:48) ) s +1 s (cid:48) +(( N s (cid:48) ( y ) / s (cid:48) + c SN s (cid:48) s (cid:48) ( x, y )) s +1 − s (cid:48) s (cid:48) c D s (cid:48) ( x, y ) rearrange = λ (cid:107) x s +1 (cid:107) + (cid:107) ( y − x ) s +1 (cid:107) )+ (cid:80) ss (cid:48) = (cid:100) ( s +1) / (cid:101) (( N s (cid:48) ( y ) / s (cid:48) + c SN s (cid:48) s (cid:48) ( x, y ) + c N s (cid:48) ( δ / ( y ) − x ) s (cid:48) ) s +1 s (cid:48) +(( N s (cid:48) ( y ) / s (cid:48) + c SN s (cid:48) s (cid:48) ( x, y )) s +1 − s (cid:48) s (cid:48) c D s (cid:48) ( x, y ) convexity ≥ λ (cid:107) x s +1 (cid:107) + (cid:107) ( y − x ) s +1 (cid:107) )+ (cid:80) ss (cid:48) = (cid:100) ( s +1) / (cid:101) ( N s (cid:48) ( y ) / s +1) + c SN s +1) s (cid:48) ( x, y ) + c N s (cid:48) ( δ / ( y ) − x ) s +1) +(( N s (cid:48) ( y ) / s (cid:48) + c SN s (cid:48) s (cid:48) ( x, y )) s +1 − s (cid:48) s (cid:48) c D s (cid:48) ( x, y ) rearrange = λ (cid:107) x s +1 (cid:107) + (cid:107) ( y − x ) s +1 (cid:107) )+ (cid:80) ss (cid:48) = (cid:100) ( s +1) / (cid:101) ( N s (cid:48) ( y ) / s +1) + c SN s +1) s (cid:48) ( x, y ) + c N s (cid:48) ( δ / ( y ) − x ) s +1) + (( N s (cid:48) ( y ) / s (cid:48) + c SN s (cid:48) s (cid:48) ( x, y )) s +1 − s (cid:48) s (cid:48) c D s (cid:48) ( x, y ) ≥ λ (cid:107) x s +1 (cid:107) + (cid:107) ( y − x ) s +1 (cid:107) )+ (cid:80) ss (cid:48) = (cid:100) ( s +1) / (cid:101) ( N s (cid:48) ( y ) / s +1) + c SN s +1) s (cid:48) ( x, y ) + c N s (cid:48) ( δ / ( y ) − x ) s +1) + c (cid:107) ( x s +1 − s (cid:48) , y s +1 − s (cid:48) ) (cid:107) D s (cid:48) ( x, y ) rearrange = λ (cid:107) x s +1 (cid:107) + (cid:107) ( y − x ) s +1 (cid:107) )+ (cid:80) ss (cid:48) = (cid:100) ( s +1) / (cid:101) ( N s (cid:48) ( y ) / s 
+1) + c SN s +1) s (cid:48) ( x, y ) + c N s (cid:48) ( δ / ( y ) − x ) s +1) + (cid:80) (cid:98) ( s +1) / (cid:99) s (cid:48) =1 c (cid:107) ( x s (cid:48) , y s (cid:48) ) (cid:107) D s +1 − s (cid:48) ( x, y ) ≥ λ (cid:107) x s +1 (cid:107) + (cid:107) ( y − x ) s +1 (cid:107) )+ c SN s +1) s ( x, y ) + (cid:80) ss (cid:48) = (cid:100) ( s +1) / (cid:101) ( N s (cid:48) ( y ) / s +1) + c N s (cid:48) ( δ / ( y ) − x ) s +1) + (cid:80) (cid:98) ( s +1) / (cid:99) s (cid:48) =1 c (cid:107) ( x s (cid:48) , y s (cid:48) ) (cid:107) D s +1 − s (cid:48) ( x, y ) rearrange = λ (cid:107) x s +1 (cid:107) + (cid:107) ( y − x ) s +1 (cid:107) )+ c SN s +1) s ( x, y ) + (cid:80) ss (cid:48) = (cid:100) ( s +1) / (cid:101) ( N s (cid:48) ( y ) / s +1) + c N s (cid:48) ( δ / ( y ) − x ) s +1) + (cid:80) (cid:98) ( s +1) / (cid:99) s (cid:48) =1 c (cid:107) ( x s (cid:48) , y s (cid:48) ) (cid:107) D s +1 − s (cid:48) ( x, y ) Def 4 . = λ (cid:107) x s +1 (cid:107) + (cid:107) ( y − x ) s +1 (cid:107) )+ c SN s +1) s ( x, y ) + (cid:80) ss (cid:48) = (cid:100) ( s +1) / (cid:101) ( N s (cid:48) ( y ) / s +1) + c N s (cid:48) ( δ / ( y ) − x ) s +1) + c ( D s +1 ( x, y ) − (cid:107) ( x s +1 , y s +1 ) (cid:107) ) rearrange = c SN s +1) s ( x, y ) + (cid:80) ss (cid:48) = (cid:100) ( s +1) / (cid:101) ( N s (cid:48) ( y ) / s +1) + c N s (cid:48) ( δ / ( y ) − x ) s +1) + c D s +1 ( x, y ) − (cid:107) ( x s +1 , y s +1 ) (cid:107) ) c D s +1 ( x, y ) − (cid:107) ( x s +1 , y s +1 ) (cid:107) ) + λ (cid:107) x s +1 (cid:107) + (cid:107) ( y − x ) s +1 (cid:107) ) =: ( ∗ )By Lemma 4.11, we can choose λ > c D s +1 ( x, y ) − (cid:107) ( x s +1 , y s +1 ) (cid:107) ) + λ (cid:107) x s +1 (cid:107) + (cid:107) ( y − x ) s +1 (cid:107) ) Lem 4 . 
≥ λ (cid:107) x s +1 (cid:107) + (cid:107) x s +1 − y s +1 (cid:107) )= λ (cid:107) x s +1 (cid:107) + (cid:107) x s +1 − y s +1 (cid:107) ) + λ (cid:107) x s +1 (cid:107) + (cid:107) x s +1 − y s +1 (cid:107) ) ≥ λ (cid:107) y s +1 (cid:107) + c (cid:107) ( x s +1 , y s +1 ) (cid:107) ≥ − ( s +1) λ (cid:107) y s +1 (cid:107) + c (cid:107) ( x s +1 , y s +1 ) (cid:107) And thus we get( ∗ ) ≥ c SN s +1) s ( x, y ) + (cid:80) ss (cid:48) = (cid:100) ( s +1) / (cid:101) ( N s (cid:48) ( y ) / s +1) + c N s (cid:48) ( δ / ( y ) − x ) s +1) + c D s +1 ( x, y ) − (cid:107) ( x s +1 , y s +1 ) (cid:107) ) − ( s +1) λ (cid:107) y s +1 (cid:107) + c (cid:107) ( x s +1 , y s +1 ) (cid:107) = c SN s +1) s ( x, y ) + − ( s +1) λ (cid:107) y s +1 (cid:107) + (cid:80) ss (cid:48) = (cid:100) ( s +1) / (cid:101) ( N s (cid:48) ( y ) / s +1) + (cid:80) ss (cid:48) = (cid:100) ( s +1) / (cid:101) c N s (cid:48) ( δ / ( y ) − x ) s +1) + c D s +1 ( x, y ) − (cid:107) ( x s +1 , y s +1 ) (cid:107) )+ c (cid:107) ( x s +1 , y s +1 ) (cid:107) = c SN s +1) s ( x, y ) + ( N s +1 ( y ) / s +1) + (cid:80) ss (cid:48) = (cid:100) ( s +1) / (cid:101) c N s (cid:48) ( δ / ( y ) − x ) s +1) + c D s +1 ( x, y ) − (cid:107) ( x s +1 , y s +1 ) (cid:107) )+ c (cid:107) ( x s +1 , y s +1 ) (cid:107) = c SN s +1) s ( x, y ) + c (cid:107) ( x s +1 , y s +1 ) (cid:107) + ( N s +1 ( y ) / s +1) + (cid:80) ss (cid:48) = (cid:100) ( s +1) / (cid:101) c N s (cid:48) ( δ / ( y ) − x ) s +1) + c D s +1 ( x, y ) − (cid:107) ( x s +1 , y s +1 ) (cid:107) ) + c (cid:107) ( x s +1 , y s +1 ) (cid:107) . 
≥ c SN s +1) s +1 ( x, y ) + ( N s +1 ( y ) / s +1) + (cid:80) ss (cid:48) = (cid:100) ( s +1) / (cid:101) c N s (cid:48) ( δ / ( y ) − x ) s +1) + c D s +1 ( x, y ) − (cid:107) ( x s +1 , y s +1 ) (cid:107) ) + c (cid:107) ( x s +1 , y s +1 ) (cid:107) = ( N s +1 ( y ) / s +1) + c SN s +1) s +1 ( x, y ) + (cid:80) ss (cid:48) = (cid:100) ( s +1) / (cid:101) c N s (cid:48) ( δ / ( y ) − x ) s +1) + c D s +1 ( x, y ) − (cid:107) ( x s +1 , y s +1 ) (cid:107) ) + c (cid:107) ( x s +1 , y s +1 ) (cid:107) ≥ ( N s +1 ( y ) / s +1) + c SN s +1) s +1 ( x, y ) + (cid:80) ss (cid:48) = (cid:100) ( s +1) / (cid:101) c N s (cid:48) ( δ / ( y ) − x ) s +1) + c D s +1 ( x, y ) rearrange = ( N s +1 ( y ) / s +1) + c SN s +1) s +1 ( x, y ) + c D s +1 ( x, y )+ (cid:80) ss (cid:48) = (cid:100) ( s +1) / (cid:101) c N s (cid:48) ( δ / ( y ) − x ) s +1) + c D s +1 ( x, y ) Lem 4 . ≥ ( N s +1 ( y ) / s +1) + c SN s +1) s +1 ( x, y ) + c D s +1 ( x, y )+ (cid:80) ss (cid:48) = (cid:100) ( s +1) / (cid:101) c N s (cid:48) ( δ / ( y ) − x ) s +1) + c (cid:107) ( δ / ( y ) − x ) s +1 (cid:107) rearrange = ( N s +1 ( y ) / s +1) + c SN s +1) s +1 ( x, y ) + c D s +1 ( x, y )+ (cid:80) ss (cid:48) = (cid:100) ( s +1) / (cid:101) c N s (cid:48) ( δ / ( y ) − x ) s +1) + c (cid:107) ( δ / ( y ) − x ) s +1 (cid:107) (4.1) ≥ ( N s +1 ( y ) / s +1) + c SN s +1) s +1 ( x, y ) + c D s +1 ( x, y )+ c N s +1 ( δ / ( y ) − x ) s +1) (cid:3) Lemma 4.17.

There exists a positive definite homogeneous quasi-norm $N_r$ on $\mathfrak{g}$ and a constant $c > 0$ (depending on $\mathfrak{g}$ but not on $x, y$) such that for all $p \ge 2r$ and all $x, y \in \mathfrak{g}$,

$$\frac{N_r(x)^p + N_r(y^{-1}x)^p}{2} - \left(\frac{N_r(y)}{2}\right)^p \ge c^{p/(2r)}\, N_r\big(\delta_{1/2}(y)^{-1}x\big)^p.$$

Proof. Let $N_r$, $c$ be as in the conclusion of Lemma 4.16. Let $p \ge 2r$. Then by convexity and that lemma,

$$\frac{N_r(x)^p + N_r(y^{-1}x)^p}{2} \ge \left(\frac{N_r(x)^{2r} + N_r(y^{-1}x)^{2r}}{2}\right)^{p/(2r)} \overset{\text{Lem 4.16}}{\ge} \left(\left(\frac{N_r(y)}{2}\right)^{2r} + c\, N_r\big(\delta_{1/2}(y)^{-1}x\big)^{2r}\right)^{p/(2r)} \ge \left(\frac{N_r(y)}{2}\right)^p + c^{p/(2r)}\, N_r\big(\delta_{1/2}(y)^{-1}x\big)^p.$$
□

Lemma 4.18.

There exists a left invariant, homogeneous, positive definite quasi-metric $d_{N_r}$ on $\mathfrak{g}$ and a constant $c' > 0$ (depending on $\mathfrak{g}$ but not on $w, x, y, z$) such that for all $p \ge 2r$ and $w, x, y, z \in \mathfrak{g}$,

$$\frac{2 d_{N_r}(y,x)^p + d_{N_r}(y,w)^p + d_{N_r}(y,z)^p}{2} - \left(\frac{d_{N_r}(x,w)}{2}\right)^p - \left(\frac{d_{N_r}(x,z)}{2}\right)^p \ge c'\, d_{N_r}(w,z)^p.$$

Proof. Let $N_r$, $c$ be as in the previous lemma. Let $d_{N_r}$ be the metric derived from $N_r$; $d_{N_r}(x,y) := N_r(y^{-1}x)$. By left invariance of the metric, we may assume $x = 0$. Then by applying the previous lemma to each of the pairs $(y, w)$ and $(y, z)$, we obtain

$$\frac{d_{N_r}(y,0)^p + d_{N_r}(y,w)^p}{2} - \left(\frac{d_{N_r}(0,w)}{2}\right)^p \ge c^{p/(2r)}\, d_{N_r}(\delta_{1/2}(w), y)^p,$$

$$\frac{d_{N_r}(y,0)^p + d_{N_r}(y,z)^p}{2} - \left(\frac{d_{N_r}(0,z)}{2}\right)^p \ge c^{p/(2r)}\, d_{N_r}(\delta_{1/2}(z), y)^p.$$

Adding these and then using Hölder, the quasi-triangle inequality, and homogeneity gives

$$\frac{2 d_{N_r}(y,0)^p + d_{N_r}(y,w)^p + d_{N_r}(y,z)^p}{2} - \left(\frac{d_{N_r}(0,w)}{2}\right)^p - \left(\frac{d_{N_r}(0,z)}{2}\right)^p \ge c^{p/(2r)} \big( d_{N_r}(\delta_{1/2}(w), y)^p + d_{N_r}(\delta_{1/2}(z), y)^p \big)$$
$$\ge 2^{-p+1} c^{p/(2r)} \big( d_{N_r}(\delta_{1/2}(w), y) + d_{N_r}(\delta_{1/2}(z), y) \big)^p \ge c'\, d_{N_r}(\delta_{1/2}(w), \delta_{1/2}(z))^p = 2^{-p} c'\, d_{N_r}(w,z)^p$$

for some $c' > 0$. □

Theorem 4.19.

Every graded nilpotent Lie group of step $r$, equipped with a left invariant metric homogeneous with respect to the dilations induced by the grading, is Markov $p$-convex for every $p \in [2r, \infty)$.

Proof. Markov $p$-convexity is invariant under biLipschitz equivalence. Thus, we need only show that $(\mathfrak{g}, d_{N_r})$ is Markov $p$-convex for all $p \ge 2r$, where $d_{N_r}$ is the quasi-metric from Lemma 4.18. The Markov convexity of $d_{N_r}$ follows from the 4-point inequality of Lemma 4.18 and the proof of Proposition 2.1 in [MN13]. □

5. Lower Bound on Markov Convexity of $J^{r-1}(\mathbb{R})$

The goal of this section is to prove Theorem 5.6, which occurs at the conclusion. The strategy is to construct a sequence of directed graphs (see Definition 5.1) with bad Markov convexity properties. These bad properties are manifested by the dispersive nature of random walks on the graphs. This is the content of Lemma 5.3. We then map these graphs into $J^{r-1}(\mathbb{R})$ with sufficient control over the distortion (Lemma 5.5) to prove Theorem 5.6.

5.1. Directed Graphs and Random Walks.

Let $(N_m)_{m=0}^{\infty}$ be any sequence of integers with $N_0 = 0$ and $N_{m+1} \ge \max(1, N_m + \lceil 2\log_2(m+1) \rceil)$. We'll define a sequence of directed graphs $(\Gamma_m)_{m=0}^{\infty}$. The graphs will be directed from a unique source vertex to a unique sink vertex, which we will denote by $0_m$ and $1_m$, respectively. Let $\operatorname{diam}(\Gamma_m)$ be the number of edges in a directed edge path from $0_m$ to $1_m$, which is also equal to the diameter of $\Gamma_m$ with respect to the shortest path metric. The construction will be such that $\operatorname{diam}(\Gamma_m) = 2^{N_m}$.

Definition 5.1. We'll perform the construction and also prove that $\operatorname{diam}(\Gamma_m) = 2^{N_m}$ by induction. Let $\Gamma_0$ be the interval $I$, that is, a graph with two vertices $0, 1$ joined by a single directed edge from $0$ to $1$, so that $\operatorname{diam}(\Gamma_0) = 1 = 2^{N_0}$. Suppose $\Gamma_m$ has been constructed for some $m \ge 0$. We define an intermediate graph $\Gamma'_{m+1}$ by gluing together $a := 2^{N_{m+1} - \lceil 2\log_2(m+1) \rceil - 1}$ copies of $I$, then

$$A := 2^{N_{m+1} - N_m} - 2^{N_{m+1} - N_m - \lceil 2\log_2(m+1) \rceil} = 2^{-N_m}\big(2^{N_{m+1}} - 2a\big) = 2^{N_{m+1} - N_m}\big(1 - 2^{-\lceil 2\log_2(m+1) \rceil}\big)$$

copies of $\Gamma_m$, then $a$ more copies of $I$, all glued together in series. The source vertex of this graph is the source vertex of the first copy of $I$, and the sink vertex is the sink vertex of the last copy of $I$. The diameter of this graph is

$$a \cdot \operatorname{diam}(I) + A \cdot \operatorname{diam}(\Gamma_m) + a \cdot \operatorname{diam}(I) \overset{\text{ind hyp}}{=} 2a + 2^{N_m} A = 2^{N_{m+1}}.$$

We then define $\Gamma_{m+1}$ to be two copies of $\Gamma'_{m+1}$, denoted $+\Gamma'_{m+1}$ and $-\Gamma'_{m+1}$, glued together in parallel. Denote the common source vertex $0_{m+1}$ and sink vertex $1_{m+1}$. The diameter of $\Gamma_{m+1}$ is the same as the diameter of $\Gamma'_{m+1}$. We note that each copy of $\Gamma_m$ in $\Gamma_{m+1}$ is isometrically embedded; any shortest path between two points in a copy of $\Gamma_m \subseteq \Gamma_{m+1}$ completely belongs to $\Gamma_m$.

By swapping $+\Gamma'_{m+1}$ and $-\Gamma'_{m+1}$ in $\Gamma_{m+1}$, we obtain a directed graph involution $\iota : \Gamma_{m+1} \to \Gamma_{m+1}$. For $q_1, q_2 \in \Gamma_m$, $(q_1, q_2)$ is called a vertical pair if $d_m(q_1, 0_m) = d_m(q_2, 0_m)$.

For each $m \ge 0$, let $(X_t^m)_{t=0}^{2^{N_m}}$ be the standard directed random walk on $\Gamma_m$. Let $d_m$ denote the shortest path metric on $\Gamma_m$. With full probability, $d_m(X_t^m, 0_m) = t$ for $0 \le t \le 2^{N_m}$.

See the two right-hand graphs of Figure 2 for what $\Gamma_1$ and $\Gamma_2$ look like when $N_0 = 0$, $N_1 = 2$, and $N_2 = 4$. The graphs are drawn in such a way that the direction is from left to right, $+\Gamma'_m$ lies above the $x$-axis, and $-\Gamma'_m$ lies below the $x$-axis. The source vertices $0_m$ are both drawn at $(0, 0)$ and the sink vertices $1_m$ are both drawn at $(1, 0)$.

Lemma 5.2.

For all p > and m ≥ , N m (cid:88) k =0 2 Nm (cid:88) t =1 E [ d m ( X mt , ˜ X mt ( t − k )) p ]2 kp ≥ m N m Π m − i =1 (1 − ( i + 1) − ) Proof.

Let p ≥

1. The proof is by induction on m . The base case m = 0 is trivially true. Assumethe inequality holds for some m ≥

0. Now we consider the standard random walk X m +1 t on Γ m +1 .Consider k and t in the range a + 1 ≤ t ≤ N m +1 − a , 0 ≤ k ≤ N m , where a = 2 N m +1 −(cid:100) ( m +1) (cid:101)− .Then t − k ≥ N m +1 −(cid:100) ( m +1) (cid:101)− + 1 − N m ≥

1, so X m +11 and ˜ X m +11 ( t − k ) agree. Thenfor all subsequent times, with full probability, X m +1 t and ˜ X m +1 t ( t − k ) belong to the same copyof Γ (cid:48) m +1 in Γ m +1 . Then, after recalling the construction of Γ (cid:48) m +1 as a number of copies of Γ m and I glued together, it can be seen that for the range of t in interest, X m +1 t and ˜ X m +1 t ( t − k )are standard random walks across A = 2 N m +1 − N m (1 − −(cid:100) ( m +1) (cid:101) ) consecutive copies of Γ m ,which we denote as A · X m +1 t and A · ˜ X m +1 t ( t − k ). Thus, under our assumptions on k and t , d m +1 ( X m +1 t , ˜ X m +1 t ( t − k )) has the same distribution as d m ( A · X mt , A · ˜ X mt ( t − k )). Hence weobtain by the inductive hypothesis N m (cid:88) k =0 2 Nm +1 − a (cid:88) t = a +1 E [ d m +1 ( X m +1 t , ˜ X m +1 t ( t − k )) p ]2 kp = N m (cid:88) k =0 2 Nm +1 − a (cid:88) t = a +1 E [ d m ( A · X mt , A · ˜ X mt ( t − k )) p ]2 kp = N m (cid:88) k =0 A (cid:88) T =1  a + T Nm (cid:88) t = a +( T − Nm +1 E [ d m ( A · X mt , A · ˜ X mt ( t − k )) p ]2 kp  N m (cid:88) k =0 A (cid:88) T =1 2 Nm (cid:88) t =1 E [ d m ( X mt , ˜ X mt ( t − k )) p ]2 kp ind hyp ≥ A (cid:88) T =1 m N m Π m − i =1 (1 − ( i + 1) − )= 2 N m A m m − i =1 (1 − ( i + 1) − ) = 2 N m +1 (1 − −(cid:100) ( m +1) (cid:101) ) m m − i =1 (1 − ( i + 1) − ) ≥ N m +1 (1 − ( m + 1) − ) m m − i =1 (1 − ( i + 1) − ) = m N m +1 Π mi =1 (1 − ( i + 1) − )In summary, N m (cid:88) k =0 2 Nm +1 − a (cid:88) t = a +1 E [ d m +1 ( X m +1 t , ˜ X m +1 t ( t − k )) p ]2 kp ≥ m N m +1 Π mi =1 (1 − ( i + 1) − ) (5.1)Now consider k and t in the range 0 ≤ k ≤ N m +1 −

1, 1 ≤ t ≤ k , so that t − k ≤

0. Notethat this means this range is disjoint from the one previously considered. Since t − k ≤

0, therandom walks X m +1 and ˜ X m +1 ( t − k ) evolved independently immediately. Thus, with probability1/2, X m +1 and ˜ X m +1 ( t − k ) belong to diﬀerent copies of Γ (cid:48) m +1 in Γ m +1 . This implies that, withprobability 1/2, d m +1 ( X m +1 t , ˜ X m +1 t ( t − k )) = 2 t . Thus, N m +1 − (cid:88) k =0 2 k (cid:88) t =1 E [ d m +1 ( X m +1 t , ˜ X m +1 t ( t − k )) p ]2 kp ≥ N m +1 − (cid:88) k =0 2 k (cid:88) t =1 (2 t ) p kp +1Lem 3 . > N m +1 − (cid:88) k =0 k ( p +1) kp +2 = N m +1 − (cid:88) k =0 k − = 2 N m +1 − − ≥

18 2 N m +1 In summary, N m +1 − (cid:88) k =0 2 k (cid:88) t =1 E [ d m +1 ( X m +1 t , ˜ X m +1 t ( t − k )) p ]2 kp >

$\frac{1}{8}\, 2^{N_{m+1}}$ (5.2)

Again, notice that in (5.1) and (5.2), the ranges of $t$, $k$ we consider are disjoint from each other and are subsets of the range $0 \le k \le N_{m+1}$, $1 \le t \le 2^{N_{m+1}}$. Thus, by adding (5.1) and (5.2), we obtain

$$\sum_{k=0}^{N_{m+1}} \sum_{t=1}^{2^{N_{m+1}}} \frac{E\big[d_{m+1}(X_t^{m+1}, \tilde{X}_t^{m+1}(t - 2^k))^p\big]}{2^{kp}} > \frac{m}{8}\, 2^{N_{m+1}} \prod_{i=1}^{m} \big(1 - (i+1)^{-2}\big) + \frac{1}{8}\, 2^{N_{m+1}} > \left(\frac{m}{8} + \frac{1}{8}\right) 2^{N_{m+1}} \prod_{i=1}^{m} \big(1 - (i+1)^{-2}\big),$$

completing the inductive step. □

Lemma 5.3.

$$\sum_{k=0}^{\infty} \sum_{t=1}^{2^{N_m}} \frac{E\big[d_m(X_t^m, \tilde{X}_t^m(t - 2^k))^p\big]}{2^{kp}} \gtrsim m\, 2^{N_m}$$

for all $p > 0$.

Proof. This follows from Lemma 5.2 and the fact that $\prod_{i=1}^{m-1} (1 - (i+1)^{-2}) > \prod_{i=1}^{\infty} (1 - (i+1)^{-2}) > 0$ for all $m \ge 1$. □

5.2. Mapping the Graphs into $J^{r-1}(\mathbb{R})$.

Lemma 5.4. There exists $\varphi \in C^{r-1,1}([0,1])$ such that
(1) $\varphi$ is symmetric across the line $x = \frac{1}{2}$, that is, $\varphi(x) = \varphi(1 - x)$ for all $x \in [0, \frac{1}{2}]$.
(2) $\varphi(x) \ge (2x)^r$ for all $x \in [0, \frac{1}{2}]$.
(3) $[j^{r-1}(0)](\varphi) = (0, 0)$, and thus by (1), $[j^{r-1}(1)](\varphi) = (1, 0)$.
(4) For every integer $0 \le i < 2^r$ and every $x \in [i 2^{-r}, (i+1) 2^{-r})$, $\varphi^{(r)}(x) = \varphi^{(r)}(i 2^{-r})$ (so $\varphi^{(r)}$ is constant on intervals of this form).

Since $\varphi \in C^{r-1,1}([0,1])$, $\varphi^{(r)} \in L^{\infty}([0,1])$. We also remark here that whenever dealing with $L^{\infty}$ functions, we choose representatives that are everywhere (not just almost everywhere) bounded by their norm.

Proof. The proof is by induction on $r$. For the base case $r = 1$, define

$$\varphi(x) := \begin{cases} 2x & x \in [0, \frac{1}{2}] \\ 2 - 2x & x \in [\frac{1}{2}, 1] \end{cases}$$

Then $\varphi$ satisfies (1) - (4).

Now suppose such a function $\varphi$ exists for some $r \ge 1$. We'll construct a function $\psi$ that satisfies (1) - (4) for $r + 1$. Define $\hat{\varphi} \in C^{r-1,1}([0,1])$ by

$$\hat{\varphi}(x) := \begin{cases} \varphi(2x) & x \in [0, \frac{1}{2}] \\ -\varphi(2 - 2x) & x \in [\frac{1}{2}, 1] \end{cases}$$

and define $\Phi \in C^{r,1}([0,1])$ by

$$\Phi(x) := \int_0^x \hat{\varphi}(\xi)\, d\xi.$$

$\Phi$ satisfies (1), (3), and (4) by the inductive hypothesis. Note that the inductive hypothesis applied to (2) implies $\hat{\varphi}(x) \ge 2^r (2x)^r$ for every $x \in [0, \frac{1}{4}]$, and hence for such $x$

$$\Phi(x) \ge \frac{2^{r-1}}{r+1} (2x)^{r+1} \ge \frac{1}{2} (2x)^{r+1}.$$

Also, since $\hat{\varphi} \ge 0$ on $[0, \frac{1}{2}]$ (which follows from the inductive hypothesis applied to (1) and (2)),

$$\Phi(x) \ge \Phi\left(\frac{1}{4}\right) \ge \left(\frac{1}{2}\right)^{r+2}$$

for all $x \in [\frac{1}{4}, \frac{1}{2}]$. Together, these two inequalities imply

$$\psi(x) := 2^{r+2}\, \Phi(x) \ge (2x)^{r+1}$$

for all $x \in [0, \frac{1}{2}]$. Thus, $\psi$ satisfies (1)-(4), completing the inductive step. □

See Figure 1 for graphs of $\varphi$ and its first two derivatives when $r = 3$. Note that these graphs are not on the same scale.

Lemma 5.5.

Let φ be the function from Lemma 5.4. Set N = 0 , and for m ≥ , set N m := (cid:100) Cm log ( m + 1) (cid:101) , where C is a suﬃciently large constant to be chosen later, so that N m ≥ r and N m +1 ≥ max(1 , N m + (cid:100) ( m + 1) (cid:101) ) . Then there exists a sequence of maps F m : Γ m → J r − ( R ) such that, for all m ≥ and all directed paths γ from m to m in Γ m , there is a function φ γ ∈ C r − , ([0 , N m ]) such that(1) [ j r − (0)]( φ γ ) = (0 , and [ j r − (2 N m )]( φ γ ) = (2 N m , .(2) After isometrically identifying γ with [0 , N m ] via q (cid:55)→ d m ( q, m ) , F m restricted to γ equalsthe jet of φ γ ; F m ( t ) = [ j r − ( t )]( φ γ ) . ϕ x ϕ ′ x ϕ ′′ Figure 1.

Graphs of the function φ from Lemma 5.4 and its ﬁrst two derivativeswhen r = 3. Note that these are not shown to the same scale. (3) For all vertical pairs ( q , q ) ∈ Γ m × Γ m , √ m ln( m + 1) | π ( F m ( q )) − F m ( q )) | ≥ d m ( q , q ) r (4) Let γ ( X m ) denote the directed path followed by the random walk X m (so γ ( X m ) is itself apath-valued random variable). For all y ∈ R , and ≤ t < N m , E (cid:34) exp (cid:32) y (cid:32) sup [ t,t +1] φ ( r ) γ ( X m ) (cid:33)(cid:33)(cid:35) ≤ exp (cid:32) y (cid:13)(cid:13)(cid:13) φ ( r ) (cid:13)(cid:13)(cid:13) ∞ m (cid:88) n =1 n ln( n + 1) (cid:33) and E (cid:20) exp (cid:18) y (cid:18) inf [ t,t +1] φ ( r ) γ ( X m ) (cid:19)(cid:19)(cid:21) ≤ exp (cid:32) y (cid:13)(cid:13)(cid:13) φ ( r ) (cid:13)(cid:13)(cid:13) ∞ m (cid:88) n =1 n ln( n + 1) (cid:33) and thus there exists a constant B < ∞ (not depending on y , t , or m ) such that E (cid:20) exp (cid:18) y (cid:13)(cid:13)(cid:13) φ ( r ) γ ( X m ) (cid:13)(cid:13)(cid:13) L ∞ [ t,t +1] (cid:19)(cid:21) ≤ e By (5) (cid:107) φ ( r ) γ (cid:107) ∞ ≤ √ m (cid:107) φ ( r ) (cid:107) ∞ .(6) (cid:107) π ◦ F m (cid:107) ∞ ≤ r ( m + 1) Crm +1 (cid:107) φ (cid:107) ∞ .Proof. The proof is by induction on m . The base case m = 0 is easy, we simply deﬁne F to be thejet of the 0 function on Γ = I . Then (1) - (6) hold. Assume such a sequence of maps F , . . . F m exist for some m ≥

0. Set K := (cid:107) π ◦ F m (cid:107) ∞ (5.3)Since N m +1 ≥ C ( m + 1) log ( m + 2), we may (and do) choose C suﬃciently large so that K ind hyp (6) ≤ (cid:107) φ (cid:107) ∞ r ( m + 1) Crm +1 ≤ r ( N m +1 −(cid:100) ( m +1) (cid:101)− − √ m + 1 ln( m + 2) (5.4)Deﬁne ˜ φ ∈ C r − , ([0 , N m +1 ]) by˜ φ ( x ) := 2 rN m +1 √ m + 1 ln( m + 2) φ (2 − N m +1 x )Note that since N m +1 ≥ r , Lemma 5.4(4) tells us:˜ φ ( r ) ( x ) = ˜ φ ( r ) ( i ) (5.5)for every integer 0 ≤ i < N m and every x ∈ [ i, i + 1). We also have by the chain rule (cid:13)(cid:13)(cid:13) ˜ φ ( r ) (cid:13)(cid:13)(cid:13) ∞ = (cid:13)(cid:13) φ ( r ) (cid:13)(cid:13) ∞ √ m + 1 ln( m + 2) (5.6) nd additionally (cid:13)(cid:13)(cid:13) ˜ φ (cid:13)(cid:13)(cid:13) ∞ ≤ rN m +1 (cid:107) φ (cid:107) ∞ ≤ r ( C ( m +1) log ( m +2)+1) (cid:107) φ (cid:107) ∞ = 2 r ( m + 2) Cr ( m +1) (cid:107) φ (cid:107) ∞ (5.7)We will now deﬁne the function F m +1 on Γ m +1 = +Γ (cid:48) m +1 ∪ − Γ (cid:48) m +1 . Let us ﬁrst work with+Γ (cid:48) m +1 . Let γ be a directed path from 0 m to 1 m in +Γ (cid:48) m +1 . Then by deﬁnition of +Γ (cid:48) m +1 , γ consists of a = 2 N m +1 −(cid:100) ( m +1) (cid:101)− copies of I , then A = 2 − N m (2 N m +1 − a ) copies of diﬀerentdirected paths γ i , 1 ≤ i ≤ A , each belonging to Γ m and connecting 0 m to 1 m , then a more copies of I glued together in series. Identify γ isometrically with [0 , N m +1 ] via q (cid:55)→ d m +1 ( q, m +1 ). Underthis identiﬁcation, the ﬁrst set of copies of I gets identiﬁed with the subinterval [0 , a ], each γ i getsidentiﬁed with the subinterval [ a +( i − N m , a + i N m ], and the last set of copies of I gets identiﬁedwith the subinterval [2 N m +1 − a, N m +1 ]. 
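The bookkeeping behind this identification of a directed path with an interval can be checked numerically. The following Python sketch is illustrative only: it assumes that the ceiling term in the definitions of $a$ and $A$ is $\lceil 2\log_2(m+1) \rceil$, and the function names `ceil_log2`, `series_parameters`, and `subintervals` are hypothetical helpers, not notation from the paper. It computes $a$ and $A$ for the step from $\Gamma_m$ to $\Gamma_{m+1}$, confirms that $2a + 2^{N_m} A = 2^{N_{m+1}}$, and lists the subintervals assigned to the first copies of $I$, the copies of $\Gamma_m$, and the last copies of $I$.

```python
from typing import List, Tuple


def ceil_log2(n: int) -> int:
    """Smallest integer c with 2**c >= n, for n >= 1 (exact integer arithmetic)."""
    return (n - 1).bit_length()


def series_parameters(N_m: int, N_next: int, m: int) -> Tuple[int, int]:
    """Return (a, A) for the step from Gamma_m to Gamma_{m+1}.

    Assumption: the ceiling term is ceil(2*log2(m+1)) = ceil(log2((m+1)**2)).
    """
    c = ceil_log2((m + 1) ** 2)
    if N_next < max(1, N_m + c) or N_next - c - 1 < 0:
        raise ValueError("N_next violates the growth condition on (N_m)")
    a = 2 ** (N_next - c - 1)                 # copies of I at each end
    A = (2 ** N_next - 2 * a) // 2 ** N_m     # copies of Gamma_m in the middle
    # The three pieces must add up to the full diameter 2**N_next.
    assert 2 * a + A * 2 ** N_m == 2 ** N_next
    return a, A


def subintervals(N_m: int, N_next: int, m: int):
    """Subintervals of [0, 2**N_next] for: first copies of I, each gamma_i, last copies of I."""
    a, A = series_parameters(N_m, N_next, m)
    head = (0, a)
    blocks: List[Tuple[int, int]] = [
        (a + (i - 1) * 2 ** N_m, a + i * 2 ** N_m) for i in range(1, A + 1)
    ]
    tail = (2 ** N_next - a, 2 ** N_next)
    return head, blocks, tail
```

For the example sequence $N_0 = 0$, $N_1 = 2$, $N_2 = 4$, calling `subintervals(2, 4, 1)` describes a path in $+\Gamma'_2$: two edges of $I$, three blocks of length $2^{N_1} = 4$, and two more edges of $I$, for a total length of 16. Integer bit operations are used instead of floating-point logarithms so the ceiling is exact.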
We then deﬁne φ γ := ˜ φ + f γ (5.8)where f γ is deﬁned as follows: f γ is identically 0 on [0 , a ] ∪ [2 N m +1 − a, N m +1 ], and f γ ( x ) = φ γ i ( x − a − ( i − N m ) on [ a + ( i − N m , a + i N m ] ( φ γ i is given to us by the inductive hypothesis).By the inductive hypothesis applied to (1) and Lemma 5.4(3), φ γ ∈ C r − , ([0 , N m +1 ]) and satisﬁes(1). It is also clear from this deﬁnition, (5.6), and the inductive hypothesis applied to (5) that (cid:13)(cid:13)(cid:13) φ ( r ) γ (cid:13)(cid:13)(cid:13) ∞ (5.8) ≤ (cid:13)(cid:13)(cid:13) ˜ φ ( r ) (cid:13)(cid:13)(cid:13) ∞ + max ≤ i ≤ A (cid:13)(cid:13)(cid:13) φ ( r ) γ i (cid:13)(cid:13)(cid:13) ∞ (5.6) ≤ (cid:13)(cid:13) φ ( r ) (cid:13)(cid:13) ∞ √ m + 1 ln( m + 2) + max ≤ i ≤ A (cid:13)(cid:13)(cid:13) φ ( r ) γ i (cid:13)(cid:13)(cid:13) ∞ ind hyp (5) ≤ (cid:13)(cid:13) φ ( r ) (cid:13)(cid:13) ∞ √ m + 1 ln( m + 2) + 2 √ m (cid:13)(cid:13)(cid:13) φ ( r ) (cid:13)(cid:13)(cid:13) ∞ ≤ √ m + 1 (cid:13)(cid:13)(cid:13) φ ( r ) (cid:13)(cid:13)(cid:13) ∞ verifying (5). We can ﬁnally deﬁne F m +1 on +Γ (cid:48) m +1 by declaring it to be the jet of φ γ on γ .We need to check that F m +1 is well-deﬁned. Since every point of +Γ m +1 is contained in somedirected path from 0 m to 1 m , we only need to check what happens when one point belongs to twodiﬀerent paths. Let q ∈ +Γ (cid:48) m +1 and suppose q ∈ γ ∩ γ (cid:48) for some directed paths γ, γ (cid:48) from 0 m +1 to 1 m +1 in +Γ (cid:48) m +1 . Set t := d ( q, m +1 ). There are two cases: t ∈ [0 , a ] ∪ [2 N m +1 − a, N m +1 ] or t ∈ [ a + ( i − N m , a + i N m ] for some i . Assume the ﬁrst case holds. Then our deﬁnition of F m +1 ( q ) based on either q ∈ γ or q ∈ γ (cid:48) is F m +1 ( q ) = [ j r − ( t )]( ˜ φ )so well-deﬁnedness holds in this case. 
In the other case, our deﬁnition of F m +1 ( q ) based on q ∈ γ is, by the inductive hypothesis applied to (2), F m +1 ( q ) = [ j r − ( t )]( ˜ φ ) + ([ j r − ( t − a − ( i − N m )]( φ γ i ) + ( a + ( i − N m − t, ind hyp = [ j r − ( t )]( ˜ φ ) + F m ( q ) + ( a + ( i − N m − t, q ∈ γ (cid:48) , F m +1 ( q ) = [ j r − ( t )]( ˜ φ ) + ([ j r − ( t − a − ( i − N m )]( φ γ (cid:48) i ) + ( a + ( i − N m − t, ind hyp = [ j r − ( t )]( ˜ φ ) + F m ( q ) + ( a + ( i − N m − t, a + ( i − N m − t,

0) is present so that the x -coordinate of the entire expressionwill be t , and that we identify q as belonging to a copy of Γ m so that F m ( q ) makes sense) so well-deﬁnedness holds in this case as well. Thus F m +1 is well-deﬁned on +Γ (cid:48) m +1 . We deﬁne F m +1 on − Γ m +1 by F m +1 ( q ) = − F m +1 ( ι ( q )), where ι : +Γ (cid:48) m +1 → − Γ (cid:48) m +1 is the involution. It follows fromthis that if γ is a directed 0 m +1 -1 m +1 path in − Γ (cid:48) m +1 , then φ γ = − φ ι ( γ ) . Thus, (1) and (2) are u xu xu xu Figure 2.

Above, the image of Γ , and below, the image of Γ , based on N = 0, N = 2, N = 4, in J ( R ) under the map F . J ( R ) is identiﬁed with R via thecoordinates x, u , u . These are not drawn to the same scale. The two images onthe right are respectively graph isomorphic to Γ and Γ .satisﬁed. It remains to show (3), (4), and (6). Before doing so, let us summarize the discussion on F m +1 of this paragraph: for q ∈ Γ m +1 and t = d m +1 ( q, m +1 ), F m +1 ( q ) =  [ j r − ( t )]( ˜ φ ) t ∈ [0 , a ] ∪ [2 N m +1 − a, N m +1 ] q ∈ +Γ (cid:48) m +1 [ j r − ( t )]( ˜ φ )+ F m ( q ) + ( a + ( i − N m − t, t ∈ [ a + ( i − N m , a + i N m ] q ∈ +Γ (cid:48) m +1 [ j r − ( t )]( − ˜ φ ) t ∈ [0 , a ] ∪ [2 N m +1 − a, N m +1 ] q ∈ − Γ (cid:48) m +1 [ j r − ( t )]( − ˜ φ ) − F m ( q ) − ( a + ( i − N m − t, t ∈ [ a + ( i − N m , a + i N m ] q ∈ − Γ (cid:48) m +1 (5.9) ee Figure 2 for the images of Γ and Γ , based on N = 0, N = 2, N = 4, in J ( R ). Using(5.9), we can quickly verify (6): (cid:107) π ◦ F m +1 (cid:107) ∞ (5.9) ≤ (cid:13)(cid:13)(cid:13) ˜ φ (cid:13)(cid:13)(cid:13) ∞ + (cid:107) π ◦ F m (cid:107) ∞ ind hyp (6) ≤ (cid:13)(cid:13)(cid:13) ˜ φ (cid:13)(cid:13)(cid:13) ∞ + 2 r ( m + 1) Crm +1 (cid:107) φ (cid:107) ∞ (5.7) ≤ r ( m + 2) Cr ( m +1) (cid:107) φ (cid:107) ∞ + 2 r ( m + 1) Crm +1 (cid:107) φ (cid:107) ∞ ≤ r ( m + 2) Cr ( m +1)+1 (cid:107) φ (cid:107) ∞ (3) and (4) require more involved arguments.Proof of (3). Let ( q , q ) ∈ Γ m +1 × Γ m +1 be a vertical pair. By deﬁnition of vertical pair, d m +1 ( q , m +1 ) = d m +1 ( q , m +1 ). Let t denote this common value. There are two cases, q , q belong to the same copy of Γ (cid:48) m +1 , or they belong to diﬀerent copies. First assume they belongto the same copy. Without loss of generality say +Γ (cid:48) m +1 . Then there are two subcases for t : t ∈ [0 , a ] ∪ [2 N m +1 − a, N m +1 ] or t ∈ [ a + ( i − N m , a + i N m ] for some 1 ≤ i ≤ A . 
Assume the ﬁrstsubcase holds. Then by construction of +Γ (cid:48) m +1 , q , q belong to a copy of I , and thus the equality d m +1 ( q , m +1 ) = d m +1 ( q , m +1 ) implies q = q , so (3) trivially holds. Assume the second subcasefor t . Then | π ( F m +1 ( q ) − F m +1 ( q )) | (5.9) = | π ( F m ( q ) − F m ( q )) | and so (3) holds by the inductive hypothesis.Now assume we are in the second case where q , q belong to diﬀerent copies of Γ (cid:48) m +1 . Withoutloss of generality, assume q ∈ +Γ (cid:48) m +1 and q ∈ − Γ (cid:48) m +1 . Observe that under this assumption, d m +1 ( q , q ) = 2 t if t ≤ N m +1 − and d m +1 ( q , q ) = 2(2 N m +1 − t ) if t ≥ N m +1 − . Because of thesymmetry of ˜ φ about the line x = 2 N m +1 − , it suﬃces to only check the case t ≤ N m +1 − . Let usﬁrst record the following inequality: π ([ j r − ( t )]( ˜ φ )) ≥ (2 t ) r √ m + 1 ln( m + 2) (5.10)which can be proven by π ([ j r − ( t )]( ˜ φ )) = ˜ φ ( t ) = 2 rN m +1 √ m + 1 ln( m + 2) φ (2 − N m +1 t ) Lem 5 . ≥ (2 t ) r √ m + 1 ln( m + 2)Again split into two subcases: t ∈ [0 , a ] or t ∈ [ a, N m +1 − ]. In the ﬁrst subcase we have π ( F m +1 ( q )) (5.9) = π ([ j r − ( t )]( ˜ φ )) (5.10) ≥ (2 t ) r √ m + 1 ln( m + 2)and π ( F m +1 ( q )) (5.9) = π ([ j r − ( t )]( − ˜ φ )) (5.10) ≤ − (2 t ) r √ m + 1 ln( m + 2)and thus | π ( F m +1 ( q ) − F m +1 ( q )) | ≥ t ) r √ m + 1 ln( m + 2) = 2 d m +1 ( q , q ) r √ m + 1 ln( m + 2)proving (3) in this subcase.Now assume the second subcase, t ∈ [ a, N m +1 − ]. 
Then π ( F m +1 ( q )) (5.9) = π ([ j r − ( t )]( ˜ φ ) + F m ( q ) + ( a + ( i − N m − t, π ([ j r − ( t )]( ˜ φ )) + π ( F m ( q )) (5.3) ≥ π ([ j r − ( t )]( ˜ φ )) − K (5.4) ≥ π ([ j r − ( t )]( ˜ φ )) − r ( N m +1 −(cid:100) ( m +1) (cid:101) ) − √ m + 1 ln( m + 2) (5.10) ≥ (2 t ) r − r ( N m +1 −(cid:100) ( m +1) (cid:101) ) − √ m + 1 ln( m + 2) = (2 t ) r − (2 a ) r / √ m + 1 ln( m + 2) ≥ (2 t ) r − (2 t ) r / √ m + 1 ln( m + 2) = (2 t ) r √ m + 1 ln( m + 2) imilarly, π ( F m +1 ( q )) ≤ − (2 t ) r √ m + 1 ln( m + 2)and thus | π ( F m +1 ( q ) − F m +1 ( q )) | ≥ (2 t ) r √ m + 1 ln( m + 2) = d m +1 ( q , q ) r √ m + 1 ln( m + 2)proving (3) in this ﬁnal subcase.Proof of (4). Let 0 ≤ t < N m +1 be an arbitrary integer. Again we consider two cases for t : t ∈ [0 , a ) ∪ [2 N m +1 − a, N m +1 ) or t ∈ [ a, N m +1 − a ). Assume the ﬁrst case holds. There are twosubcases to consider for γ ( X m +1 ): γ ( X m +1 ) belongs to +Γ (cid:48) m +1 or γ ( X m +1 ) belongs to − Γ (cid:48) m +1 .These are complementary events each occuring with probability 1/2. Restricted to the ﬁrst event,for every x ∈ [ t, t + 1], φ ( r ) γ ( X m +1 ) ( x ) (5.8) = ˜ φ ( r ) ( x ) + f γ ( X m +1 ) ( x ) = ˜ φ ( r ) ( x ) (5.5) = ˜ φ ( r ) ( t )where the second equality holds by the deﬁnition of f succeeding (5.8). 
Thus,sup [ t,t +1] φ ( r ) γ ( X m +1 ) = inf [ t,t +1] φ ( r ) γ ( X m +1 ) = ˜ φ ( r ) ( t )Likewise, for the second subcase where we restrict to the event that γ ( X m +1 ) belongs to − Γ (cid:48) m +1 ,sup [ t,t +1] φ ( r ) γ ( X m +1 ) = inf [ t,t +1] φ ( r ) γ ( X m +1 ) = − ˜ φ ( r ) ( t )Combining these yields E (cid:34) exp (cid:32) y (cid:32) sup [ t,t +1] φ ( r ) γ ( X m +1 ) (cid:33)(cid:33)(cid:35) = 12 (cid:16) exp (cid:16) y ˜ φ ( r ) ( t ) (cid:17) + exp (cid:16) − y ˜ φ ( r ) ( t ) (cid:17)(cid:17) = cosh (cid:16) y ˜ φ ( r ) ( t ) (cid:17) ≤ cosh (cid:16) y (cid:13)(cid:13)(cid:13) ˜ φ ( r ) (cid:13)(cid:13)(cid:13) ∞ (cid:17) (5.6) = cosh (cid:18) y (cid:13)(cid:13)(cid:13) φ ( r ) (cid:13)(cid:13)(cid:13) ∞ √ m + 1 ln( m + 2) (cid:19) Lem 3 . ≤ exp (cid:18) y (cid:13)(cid:13)(cid:13) φ ( r ) (cid:13)(cid:13)(cid:13) ∞ m + 1) ln( m + 2) (cid:19) and the same estimate holds for the essential inﬁmum, verifying (4) in this case.Now consider the second case, t ∈ [ a + ( i − N m , a + i N m ] for some 1 ≤ i ≤ A . Again, thereare two subcases to consider for γ ( X m +1 ): γ ( X m +1 ) belongs to +Γ (cid:48) m +1 or γ ( X m +1 ) belongs to − Γ (cid:48) m +1 . Restricted to the ﬁrst event, and for the range of t under consideration, X m +1 is equal indistribution to a copy of X m (after an appropriate shift in the time parameter), by deﬁnition of+Γ (cid:48) m +1 . Thus, for every x ∈ [ t, t + 1], φ ( r ) γ ( X m +1 ) ( x ) (5.8) = ˜ φ ( r ) ( x ) + f γ ( X m +1 ) ( x ) = ˜ φ ( r ) ( x ) + φ γ ( X m ) ( x (cid:48) ) (5.5) = ˜ φ ( r ) ( t ) + φ γ ( X m ) ( x (cid:48) )where x (cid:48) = x − a − ( i − N m , and the second equality holds by the deﬁnition of f succeeding (5.8).Thus, sup [ t,t +1] φ ( r ) γ ( X m +1 ) = ˜ φ ( r ) ( t ) + sup [ t (cid:48) ,t (cid:48) +1] φ γ ( X m ) inf [ t,t +1] φ ( r ) γ ( X m +1 ) = ˜ φ ( r ) ( t ) + inf [ t (cid:48) ,t (cid:48) +1] φ γ ( X m )32 here t (cid:48) = t − a − ( i − N m . 
Likewise, for the second subcase where we restrict to the event that $\gamma(X^{m+1})$ belongs to $-\Gamma'_{m+1}$,
\begin{align*}
\sup_{[t,t+1]} \phi^{(r)}_{\gamma(X^{m+1})} &= -\tilde\phi^{(r)}(t) - \inf_{[t',t'+1]} \phi_{\gamma(X^m)}, \\
\inf_{[t,t+1]} \phi^{(r)}_{\gamma(X^{m+1})} &= -\tilde\phi^{(r)}(t) - \sup_{[t',t'+1]} \phi_{\gamma(X^m)}.
\end{align*}
Combining these and using the inductive hypothesis applied to (4) and some basic monotonicity and symmetry properties of cosh yields
\begin{align*}
\mathbb{E}\left[\exp\left(y \sup_{[t,t+1]} \phi^{(r)}_{\gamma(X^{m+1})}\right)\right]
&= \frac12 \exp\left(y\tilde\phi^{(r)}(t)\right) \mathbb{E}\left[\exp\left(y \sup_{[t',t'+1]} \phi^{(r)}_{\gamma(X^{m+1})}\right)\right] \\
&\quad + \frac12 \exp\left(-y\tilde\phi^{(r)}(t)\right) \mathbb{E}\left[\exp\left(-y \inf_{[t',t'+1]} \phi^{(r)}_{\gamma(X^{m+1})}\right)\right] \\
&\overset{\text{ind hyp}}{\le} \frac12 \exp\left(y\tilde\phi^{(r)}(t)\right) \exp\left(y^2\left\|\phi^{(r)}\right\|_\infty^2 \sum_{n=1}^{m} \frac{1}{2n\ln(n+1)^2}\right) \\
&\quad + \frac12 \exp\left(-y\tilde\phi^{(r)}(t)\right) \exp\left((-y)^2\left\|\phi^{(r)}\right\|_\infty^2 \sum_{n=1}^{m} \frac{1}{2n\ln(n+1)^2}\right) \\
&= \cosh\left(y\tilde\phi^{(r)}(t)\right) \exp\left(y^2\left\|\phi^{(r)}\right\|_\infty^2 \sum_{n=1}^{m} \frac{1}{2n\ln(n+1)^2}\right) \\
&\le \cosh\left(y\left\|\tilde\phi^{(r)}\right\|_\infty\right) \exp\left(y^2\left\|\phi^{(r)}\right\|_\infty^2 \sum_{n=1}^{m} \frac{1}{2n\ln(n+1)^2}\right) \\
&\overset{(5.6)}{=} \cosh\left(\frac{y\left\|\phi^{(r)}\right\|_\infty}{\sqrt{m+1}\,\ln(m+2)}\right) \exp\left(y^2\left\|\phi^{(r)}\right\|_\infty^2 \sum_{n=1}^{m} \frac{1}{2n\ln(n+1)^2}\right) \\
&\overset{\text{Lem 3.}}{\le} \exp\left(\frac{y^2\left\|\phi^{(r)}\right\|_\infty^2}{2(m+1)\ln(m+2)^2}\right) \exp\left(y^2\left\|\phi^{(r)}\right\|_\infty^2 \sum_{n=1}^{m} \frac{1}{2n\ln(n+1)^2}\right) \\
&= \exp\left(y^2\left\|\phi^{(r)}\right\|_\infty^2 \sum_{n=1}^{m+1} \frac{1}{2n\ln(n+1)^2}\right)
\end{align*}
and the same estimate holds for the infimum, verifying (4) in this case. This completes the inductive step and the proof of the lemma. □

Theorem 5.6.

For every $p > 0$, $r \ge$ , coarsely dense set $N \subseteq J^{r-1}(\mathbb{R})$, and $R \ge 3$, let $B_N(R) := \{x \in N : d_{CC}(0,x) \le R\}$. Then
\[
\Pi_p(B_N(R)) \gtrsim \frac{\ln(R)^{\frac1p-\frac1{2r}}}{\ln(\ln(R))^{\frac1p+\frac1{2r}}}
\]
where the implicit constant can depend on $r, p$ but not on $N, R$.

Proof. Let $p, r, N$ be as above. Since the Markov convexity constant $\Pi_p$ is scale-invariant, by applying a dilation we may assume without loss of generality that every point of $J^{r-1}(\mathbb{R})$ is at a distance of at most 1 away from a point of $N$. Let $F_m : \Gamma_m \to J^{r-1}(\mathbb{R})$ be the sequence of maps from Lemma 5.5. Extend the domain of $t$ for the random walks on $\Gamma_m$ by $X^m_t := X^m_0$ if $t \le 0$, and $X^m_t := X^m_{2^{N_m}}$ if $t \ge 2^{N_m}$. Each $\{X^m_t\}_{t\in\mathbb{Z}}$ is a Markov process on the state space $\Gamma_m$. With full probability, $d_{CC}(X^m_t, 0_m) = \min(\max(0,t), 2^{N_m})$. Since $\tilde X^m_t(t-2^k)$ equals $X^m_t$ in distribution, $(X^m_t, \tilde X^m_t(t-2^k))$ is a vertical pair with full probability. Then Lemma 5.5(3) applies, and we get the following lower bound for the left hand side of the Markov convexity inequality in Definition 1.1:
\begin{align*}
\sum_{k=0}^{\infty}\sum_{t\in\mathbb{Z}} \frac{\mathbb{E}\left[d_{CC}\left(F_m(X^m_t), F_m(\tilde X^m_t(t-2^k))\right)^p\right]}{2^{kp}}
&\overset{\text{Lem 3.}}{\ge} \sum_{k=0}^{\infty}\sum_{t\in\mathbb{Z}} \frac{\mathbb{E}\left[\left|\pi\left(f_m(X^m_t) - f_m(\tilde X^m_t(t-2^k))\right)\right|^{p/r}\right]}{2^{kp}} \\
&\overset{\text{Lem 5.}}{\ge} m^{-\frac{p}{2r}} \ln(m+1)^{-\frac{p}{r}} \sum_{k=0}^{\infty}\sum_{t\in\mathbb{Z}} \frac{\mathbb{E}\left[d_m\left(X^m_t, \tilde X^m_t(t-2^k)\right)^p\right]}{2^{kp}} \\
&\overset{\text{Lem 5.}}{\gtrsim} m^{-\frac{p}{2r}} \ln(m+1)^{-\frac{p}{r}}\, m\, 2^{N_m} = \frac{m^{1-\frac{p}{2r}}\, 2^{N_m}}{\ln(m+1)^{\frac{p}{r}}}
\end{align*}
In summary,
\[
\sum_{k=0}^{\infty}\sum_{t\in\mathbb{Z}} \frac{\mathbb{E}\left[d_{CC}\left(F_m(X^m_t), F_m(\tilde X^m_t(t-2^k))\right)^p\right]}{2^{kp}} \gtrsim \frac{m^{1-\frac{p}{2r}}\, 2^{N_m}}{\ln(m+1)^{\frac{p}{r}}} \tag{5.11}
\]
Now we upper bound the right hand side of the Markov convexity inequality. Since $d_{CC}(F_m(X^m_{t+1}), F_m(X^m_t)) = 0$ whenever $t \le -1$ or $t \ge 2^{N_m}$,
\[
\sum_{t\in\mathbb{Z}} \mathbb{E}\left[d_{CC}\left(F_m(X^m_{t+1}), F_m(X^m_t)\right)^p\right] = \sum_{t=0}^{2^{N_m}-1} \mathbb{E}\left[d_{CC}\left(F_m(X^m_{t+1}), F_m(X^m_t)\right)^p\right] =: (\ast) \tag{5.12}
\]
Then
\begin{align*}
(\ast) &\overset{\text{Lem 5.}}{=} \sum_{t=0}^{2^{N_m}-1} \mathbb{E}\left[d_{CC}\left([j^{r-1}(t+1)](\phi_{\gamma(X^m)}),\, [j^{r-1}(t)](\phi_{\gamma(X^m)})\right)^p\right] \\
&\overset{\text{Lem 3.}}{\le} \sum_{t=0}^{2^{N_m}-1} \mathbb{E}\left[\left(\left\|\phi^{(r)}_{\gamma(X^m)}\right\|_{L^\infty[t,t+1]}\right)^p\right] \\
&\lesssim \sum_{t=0}^{2^{N_m}-1} \mathbb{E}\left[\left\|\phi^{(r)}_{\gamma(X^m)}\right\|^p_{L^\infty[t,t+1]}\right] \\
&\overset{\text{Lems 3., .}}{\lesssim} \sum_{t=0}^{2^{N_m}-1} 1 = 2^{N_m}
\end{align*}
In summary,
\[
\sum_{t\in\mathbb{Z}} \mathbb{E}\left[d_{CC}\left(F_m(X^m_{t+1}), F_m(X^m_t)\right)^p\right] \lesssim 2^{N_m} \tag{5.13}
\]
Let $\pi_N : J^{r-1}(\mathbb{R}) \to N$ be any map so that
\[
d_{CC}(x, \pi_N(x)) \le 1 \quad \text{for every } x \in J^{r-1}(\mathbb{R}). \tag{5.14}
\]
We use $\pi_N$ to transfer inequalities (5.11) and (5.13) to corresponding inequalities on $N$.
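As a rough consistency check, not part of the original argument, the bounds (5.11) and (5.13) can already be compared before passing to $N$. The sketch below assumes the Markov convexity inequality of Definition 1.1 in its standard form (left hand side at most $\Pi^p$ times the right hand side), reads the exponents in (5.11) as $1-\frac{p}{2r}$ and $\frac{p}{r}$, and writes the common length factor as $2^{N_m}$; the symbol $\Pi$ is introduced here only to denote an admissible constant for the image $F_m(\Gamma_m)$.

```latex
% Hedged sketch: ratio of the lower bound (5.11) to the upper bound (5.13).
% \Pi is a hypothetical name for any constant satisfying Definition 1.1 on F_m(\Gamma_m).
\[
\Pi^p \;\gtrsim\;
\frac{m^{1-\frac{p}{2r}}\, 2^{N_m}\, \ln(m+1)^{-\frac{p}{r}}}{2^{N_m}}
= \frac{m^{1-\frac{p}{2r}}}{\ln(m+1)^{\frac{p}{r}}}
\qquad\Longrightarrow\qquad
\Pi \;\gtrsim\; \frac{m^{\frac1p-\frac1{2r}}}{\ln(m+1)^{\frac1r}} .
\]
```

The factor $2^{N_m}$ cancels, so only the power of $m$ and the logarithm survive. The resulting bound is unbounded in $m$ exactly when $p < 2r$, which agrees with the threshold stated in the abstract.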
Consider the maps $\bar F_m : \Gamma_m \to N$ defined by $\bar F_m := \pi_N \circ \delta_{2m} \circ F_m$. By Lemma 5.5(3), $d_{CC}(\delta_{2m}(F_m(q_1)), \delta_{2m}(F_m(q_2))) \ge$ d m ( q , q ) ≥ q , q ) ∈ Γ m × Γ m. Combining this with (5.14) yields
\begin{align*}
d_{CC}(\bar F_m(q_1), \bar F_m(q_2)) &\overset{(5.14)}{\ge} d_{CC}(\delta_{2m}(F_m(q_1)), \delta_{2m}(F_m(q_2))) - 2 \\
&\ge \frac12\, d_{CC}(\delta_{2m}(F_m(q_1)), \delta_{2m}(F_m(q_2))) = m\, d_{CC}(F_m(q_1), F_m(q_2))
\end{align*}
for any vertical pair $(q_1, q_2)$. Combining this with (5.11) yields
\[
\sum_{k=0}^{\infty}\sum_{t\in\mathbb{Z}} \frac{\mathbb{E}\left[d_{CC}\left(\bar F_m(X^m_t), \bar F_m(\tilde X^m_t(t-2^k))\right)^p\right]}{2^{kp}} \gtrsim \frac{m^{p+1-\frac{p}{2r}}\, 2^{N_m}}{\ln(m+1)^{\frac{p}{r}}} \tag{5.15}
\]
Next,
\begin{align*}
d_{CC}(\bar F_m(X^m_{t+1}), \bar F_m(X^m_t)) &\overset{(5.14)}{\le} d_{CC}(\delta_{2m}(F_m(X^m_{t+1})), \delta_{2m}(F_m(X^m_t))) + 2 \\
&= 2m\, d_{CC}(F_m(X^m_{t+1}), F_m(X^m_t)) + 2
\end{align*}
Combining this with (5.13) and (5.12) yields
\[
\sum_{t\in\mathbb{Z}} \mathbb{E}\left[d_{CC}\left(\bar F_m(X^m_{t+1}), \bar F_m(X^m_t)\right)^p\right] \lesssim m^p\, 2^{N_m} \tag{5.16}
\]
For each $R \ge 1$, let $m(R)$ denote the largest $m$ so that $\bar F_{m(R)}(\Gamma_{m(R)}) \subseteq B_N(R)$. Then (5.15) and (5.16) imply
\[
\Pi_p(B_N(R)) \gtrsim \frac{m(R)^{\frac1p-\frac1{2r}}}{\ln(m(R)+1)^{\frac1r}} \tag{5.17}
\]
Now we wish to estimate the quantity $m(R)$. Fix $m$. Since any two points of $\Gamma_m$ are connected by a geodesic that is a piecewise directed path, the Lipschitz constant of any map on $\Gamma_m$ is the maximum of the Lipschitz constants of the map restricted to directed paths. Thus, by Lemmas 5.5(2), 5.5(5), and 3.1, $\mathrm{Lip}(F_m) \lesssim \sqrt{m}$. Since $\mathrm{diam}(\Gamma_m) = 2^{N_m} \le 2^{Cm\log_2(m+1)+1}$ and $F_m(0_m) = 0$, this implies $F_m(\Gamma_m) \subseteq B_{J^{r-1}(\mathbb{R})}(R')$ with $R' \lesssim (m+1)^{Cm+\frac12}$. Then $\delta_{2m}(F_m(\Gamma_m)) \subseteq B_{J^{r-1}(\mathbb{R})}(R'')$ with $R'' \lesssim (m+1)^{Cm+\frac32}$. Then $\bar F_m(\Gamma_m) = \pi_N(\delta_{2m}(F_m(\Gamma_m))) \subseteq B_{J^{r-1}(\mathbb{R})}(R''+1)$. This implies, for any $R \ge$ , $R \lesssim (m(R)+1)^{Cm(R)+}$, where the implied constant is independent of $R$. This implies $m(R) \gtrsim \frac{\ln(R)}{\ln(\ln(R))}$ for $R \ge 3$. Plugging this into (5.17) yields
\[
\Pi_p(B_N(R)) \gtrsim \frac{\ln(R)^{\frac1p-\frac1{2r}}}{\ln(\ln(R))^{\frac1p+\frac1{2r}}}
\]
□

References

[ANT13] Tim Austin, Assaf Naor, and Romain Tessera,
Sharp quantitative nonembeddability of the Heisenberg group into superreflexive Banach spaces, Groups Geom. Dyn. (2013), no. 3, 497–522. MR 3095705
[Ass83] Patrice Assouad, Plongements lipschitziens dans $\mathbb{R}^n$, Bull. Soc. Math. France (1983), no. 4, 429–448. MR 763553
[BLU07] A. Bonfiglioli, E. Lanconelli, and F. Uguzzoni, Stratified Lie groups and potential theory for their sub-Laplacians, Springer Monographs in Mathematics, Springer, Berlin, 2007. MR 2363343
[Bou86] J. Bourgain, The metrical interpretation of superreflexivity in Banach spaces, Israel J. Math. (1986), no. 2, 222–230. MR 880292
[CK06] Jeff Cheeger and Bruce Kleiner, On the differentiability of Lipschitz maps from metric measure spaces to Banach spaces, Inspired by S. S. Chern, Nankai Tracts Math., vol. 11, World Sci. Publ., Hackensack, NJ, 2006, pp. 129–152. MR 2313333
[HS90] Waldemar Hebisch and Adam Sikora, A smooth subadditive homogeneous norm on a homogeneous group, Studia Math. (1990), no. 3, 231–236. MR 1067309
[JL84] William B. Johnson and Joram Lindenstrauss, Extensions of Lipschitz mappings into a Hilbert space, Conference in modern analysis and probability (New Haven, Conn., 1982), Contemp. Math., vol. 26, Amer. Math. Soc., Providence, RI, 1984, pp. 189–206. MR 737400
[Jun17] Derek Jung, A variant of Gromov’s problem on Hölder equivalence of Carnot groups, J. Math. Anal. Appl. (2017), no. 1, 251–273. MR 3680967
[Jun19] Derek Jung, Bilipschitz embeddings of spheres into jet space Carnot groups not admitting Lipschitz extensions, Ann. Acad. Sci. Fenn. Math. (2019), no. 1, 261–280. MR 3919136
[LD17] Enrico Le Donne, A primer on Carnot groups: homogenous groups, Carnot-Carathéodory spaces, and regularity of their isometries, Anal. Geom. Metr. Spaces (2017), no. 1, 116–137. MR 3742567
[LDLM18] Enrico Le Donne, Sean Li, and Terhi Moisala, Gâteaux differentiability on infinite-dimensional carnot groups, arxiv (2018).
[Li14] Sean Li, Coarse differentiation and quantitative nonembeddability for Carnot groups, J. Funct. Anal. (2014), no. 7, 4616–4704. MR 3170215
[Li16] Sean Li, Markov convexity and nonembeddability of the Heisenberg group, Ann. Inst. Fourier (Grenoble) (2016), no. 4, 1615–1651. MR 3494180
[LN06] J. R. Lee and A. Naor, Lp metrics on the heisenberg group and the goemans-linial conjecture, 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS’06), Oct 2006, pp. 99–108.
[LN14] Vincent Lafforgue and Assaf Naor, Vertical versus horizontal Poincaré inequalities on the Heisenberg group, Israel J. Math. (2014), no. 1, 309–339. MR 3273443
[LNP09] James R. Lee, Assaf Naor, and Yuval Peres, Trees and Markov convexity, Geom. Funct. Anal. (2009), no. 5, 1609–1659. MR 2481738
[MN13] Manor Mendel and Assaf Naor, Markov convexity and local rigidity of distorted metrics, J. Eur. Math. Soc. (JEMS) (2013), no. 1, 287–337. MR 2998836
[Mos62] G. D. Mostow, Homogeneous spaces with finite invariant measure, Ann. of Math. (2) (1962), 17–37. MR 145007
[Nao12] Assaf Naor, An introduction to the Ribe program, Jpn. J. Math. (2012), no. 2, 167–233. MR 2995229
[Nao18] Assaf Naor, Metric dimension reduction: A snapshot of the ribe program, Proc. Int. Cong. of Math., vol. 1, 2018, pp. 759–838.
[NPS18] Assaf Naor, Gilles Pisier, and Gideon Schechtman, Impossibility of dimension reduction in the nuclear norm [extended abstract], Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, SIAM, Philadelphia, PA, 2018, pp. 1345–1352. MR 3775876
[NY18] Assaf Naor and Robert Young, Vertical perimeter versus horizontal perimeter, Ann. of Math. (2) (2018), no. 1, 171–279. MR 3815462
[Ost14a] Mikhail Ostrovskii, Radon-Nikodým property and thick families of geodesics, J. Math. Anal. Appl. (2014), no. 2, 906–910. MR 3103207
[Ost14b] Mikhail I. Ostrovskii, Metric spaces nonembeddable into Banach spaces with the Radon-Nikodým property and thick families of geodesics, Fund. Math. (2014), no. 1, 85–96. MR 3247034
[Pan89] Pierre Pansu, Métriques de Carnot-Carathéodory et quasiisométries des espaces symétriques de rang un, Ann. of Math. (2) (1989), no. 1, 1–60. MR 979599
[Pis16] Gilles Pisier, Martingales in Banach spaces, Cambridge Studies in Advanced Mathematics, vol. 155, Cambridge University Press, Cambridge, 2016. MR 3617459
[Rag72] M. S. Raghunathan, Discrete subgroups of Lie groups, Springer-Verlag, New York-Heidelberg, 1972, Ergebnisse der Mathematik und ihrer Grenzgebiete, Band 68. MR 0507234
[Rib76] M. Ribe, On uniformly homeomorphic normed spaces, Ark. Mat. (1976), no. 2, 237–244. MR 0440340
[RW10] Séverine Rigot and Stefan Wenger, Lipschitz non-extension theorems into jet space Carnot groups, Int. Math. Res. Not. IMRN (2010), no. 18, 3633–3648. MR 2725507
[War05] Ben Warhurst, Jet spaces as nonrigid Carnot groups, J. Lie Theory (2005), no. 1, 341–356. MR 2115247
[Wol03] Thomas H. Wolff, Lectures on harmonic analysis, University Lecture Series, vol. 29, American Mathematical Society, Providence, RI, 2003, With a foreword by Charles Fefferman and a preface by Izabella Łaba, Edited by Łaba and Carol Shubin. MR 2003254