Exponential bounds for random walks on hyperbolic spaces without moment conditions
aa r X i v : . [ m a t h . P R ] F e b EXPONENTIAL BOUNDS FOR RANDOM WALKS ON HYPERBOLICSPACES WITHOUT MOMENT CONDITIONS
SÉBASTIEN GOUËZEL
Abstract.
We consider nonelementary random walks on general hyperbolic spaces. With-out any moment condition on the walk, we show that it escapes linearly to infinity, withexponential error bounds. We even get such exponential bounds up to the rate of escapeof the walk. Our proof relies on an inductive decomposition of the walk, recording timesat which it could go to infinity in several independent directions, and using these times tocontrol further backtracking. Introduction
Let X be a Gromov-hyperbolic space, with a fixed basepoint o . Fix a discrete probabilitymeasure µ on the space of isometries of X . We assume that µ is non-elementary : in thesemigroup generated by the support of µ , there are two loxodromic elements with disjointfixed points. Let g , g , . . . be independent isometries of X distributed according to µ . Onecan then define a random walk on X given by Z n · o , where Z n = g · · · g n − .In general, results in the literature fall into two classes, qualitative and quantitative,where the second class requires more stringent assumptions on the walk.Without any moment assumption, it is known that Z n · o converges almost surely to apoint on the boundary ∂X , thanks to a beautiful non-constructive argument originally dueto Furstenberg [Fur63] in a matrix setting but that works in our setting when X is proper,and extended to the general situation above by Maher and Tiozzo [MT18]. The idea is to usea stationary measure on the boundary of X and the martingale convergence theorem thereto obtain the convergence of the random walk. When X is not proper, the boundary is notcompact, and showing the existence of a stationary measure on the boundary is a difficultpart of [MT18]. In this article, the authors also show linear progress, in the following sense:there exists κ > such that, almost surely, lim inf d ( o, Z n · o ) /n > κ .Assuming additional moments conditions, one gets stronger results. [MT18] shows that, if µ has finite support, then P ( d ( o, Z n · o ) κn ) is exponentially small, for some κ > (we saythat the walk makes linear progress with exponential decay). The finite support assumptionhas been weakened to an exponential moment condition in [Sun20]. More recently, still underan exponential moment condition, [BMSS20] shows (among many other results) that theexponential bound holds for any κ strictly smaller than the escape rate ℓ = lim E ( d ( o, Z n · o )) /n .When X is a hyperbolic group, one has in fact linear progress with exponential decaywithout any moment assumption: this follows from nonamenability of the group, and the Date : February 3, 2021. fact that the cardinality of balls is at most exponential. This arguments breaks down whenthe space is non-proper, though, as in many interesting examples such as the curve complex.Our goal in this paper is to show that, to have linear progress with exponential decay(even in its strongest versions), there is no need for any moment condition. Define theescape rate of the walk ℓ ( µ ) = lim E ( d ( o, Z n · o )) /n if µ has a moment of order , i.e., P µ ( g ) d ( o, g · o ) < ∞ , and ℓ ( µ ) = ∞ otherwise.Our first result is that the escape rate is positive, with an exponential error term. Theorem 1.1.
Consider a discrete non-elementary measure on the space of isometries ofa Gromov-hyperbolic space X with a basepoint o . Then there exists κ > such that, for all n , P ( d ( o, Z n · o ) κn ) e − κn . One recovers in particular that ℓ ( µ ) > , a fact already proved in [MT18]. The controlin the previous theorem can in fact be established up to the escape rate: Theorem 1.2.
Under the assumptions of Theorem 1.1, consider r < ℓ ( µ ) . Then thereexists κ > such that, for all n , P ( d ( o, Z n · o ) rn ) e − κn . In particular, when µ has no moment of order , this implies that d ( o, Z n · o ) /n → + ∞ almost surely.We also get the corresponding statement concerning directional convergence to infinity.For ξ ∈ ∂X and x, y ∈ X , denote the corresponding Gromov product by(1.1) ( x, ξ ) y = inf z n → ξ lim inf n ( x, z n ) y , where ( x, z n ) y = ( d ( y, x ) + d ( y, z n ) − d ( x, z n )) / is the usual Gromov product inside thespace (see Section 3 for more background on Gromov-hyperbolic spaces). The limit onlydepends on the choice of the sequence z n up to δ . Intuitively, ( x, ξ ) y is the distance from y to a geodesic between x and ξ . It is also the amount that x has moved in the direction of ξ compared to y . A sequence x n converges to ξ if and only if ( x n , ξ ) o → ∞ . Theorem 1.3.
Under the assumptions of Theorem 1.2, Z n · o converges almost surely to apoint Z ∞ ∈ ∂X . Moreover, for any r < ℓ ( µ ) , there exists κ > such that, for all n , P (( Z n · o, Z ∞ ) o rn ) e − κn . Theorem 1.3 readily implies Theorem 1.2 as ( Z n · o, Z ∞ ) o d ( o, Z n · o ) , which followsdirectly from the definition.The convergence statement in Theorem 1.3 is due to [MT18]. The novelty is the quanti-tative exponential bound, without any moment assumption. Note that, in both theorems,when µ has no moment of order , one may take any r > , so the conclusion is superlineargrowth with exponential decay.It follows from subadditivity that, for any r ℓ , the sequence − log( P ( d ( o, Z n · o ) rn )) /n converges to a limit I ( r ) . This is a rate function in the classical sense of large deviationsin probability theory. Theorem 1.2 shows that the rate function is strictly positive for r < ℓ , recovering part of [BMSS20, Theorem 1.2] while removing their exponential momentassumption. Note that [BMSS20] also obtains exponential estimates for upper deviation ANDOM WALKS WITHOUT MOMENT CONDITION 3 inequalities P ( d ( o, Z n · o ) > rn ) for r > ℓ . These estimates can not hold without exponentialmoments, since exponential controls for lower and upper deviation probabilities imply anexponential moment for the measure, see [BMSS20, Subsection 3.1]. Remark 1.4.
The fact that we use discrete measures in the above theorems is for con-venience only, to avoid discussing measurability issues and conditioning on zero measuresets. Suitable versions removing discreteness, but adding measurability and separabilityconditions, hold with the same proofs.Our approach is elementary, in the spirit of [MS20] and [BMSS20] (the latter article isa strong inspiration for our work), and does not rely on any boundary theory. The mainintuition is the following. In the hyperbolic plane, we define a path as follows: walk straighton during a distance d , then turn by an angle θ ¯ θ < π , then walk straight on during adistance d , then turn by an angle θ ¯ θ , and so on. If all the lengths d i are larger than aconstant D = D (¯ θ ) , then this path is essentially going straight to infinity, and at time n itis roughly at distance d + · · · + d n of the origin. The problem when doing a random walkis that the analogues of the angles θ i could be equal to π , i.e., the walker could come backexactly along its footsteps. But this should not happen often. Our main input is a technicalway to justify that indeed it does not happen often, in a precise quantitative version: we willkeep track of some times (called pivotal times below) at which the random walk can choosesome direction, with most choices leading to progress towards infinity (this is implementedthrough the notion of Schottky set coming from [BMSS20]), and at which we will keep somedegree of freedom in an inductive construction. Of course, backtracking can happen lateron, and we will spend the degree of freedom we had kept to still control the behavior afterbacktracking.We could give directly the proof of Theorem 1.3, but it would be very hard to follow.Instead, we will start with proofs of easier statements, and add new ingredients in increas-ingly complicated proofs. Section 2 is devoted to the simplest instance of our proof, in thefree group, where everything is as transparent as possible. Then, Section 3 introduces sometools of Gromov-hyperbolic geometry (notably chains, shadows and Schottky sets) that willbe used to extend the previous proof to a non-tree setting. Section 4 uses these tools in acrude way to prove Theorem 1.1, i.e., linear escape with exponential decay, and also con-vergence at infinity with exponential bounds. Section 5 follows the same strategy but in amore refined way, to get Theorems 1.2 and 1.3.2. Linear escape with exponential decay on free groups
The goal of this section is to illustrate the concept of pivotal times in the simplest possiblesetting. We show that, for a class of measures without moments on the free group, there islinear escape with exponential decay. Of course, this follows from non-amenability. Insteadof the result, what matters here is the proof: the rest of the paper is an extension of the sameidea to technically more involved contexts (general measures, Gromov-hyperbolic spaces),but the main insight can be explained much more transparently in a tree setting.
Theorem 2.1.
Let d > . Let µ be a probability measure on F d that can be written as µ S ∗ ν , where µ S is the uniform probability measure on the canonical generators of F d , and ν is a probability measure with ν ( e ) = 0 . Let Z n = g · · · g n , where the g i are independent ANDOM WALKS WITHOUT MOMENT CONDITION 4 and distributed according to µ . There exists κ > (independent of ν and of d ) such that,for all n , P ( | Z n | κn ) e − κn . Remark 2.2.
The fact that κ can be chosen independently of ν and of d does not followfrom non-amenability, and is really a byproduct of our proof technique. Remark 2.3.
The restrictions d > and ν ( e ) = 0 are simplifying assumptions to havea proof that is as streamlined as possible. In the next sections, we will prove analogoustheorems but for general measures, on general hyperbolic spaces.The key point in the proof of Theorem 2.1 is the next lemma. Lemma 2.4.
There exists κ > satisfying the following. Consider d > and n > . Fix w , . . . , w n nontrivial words in F d , and let Z n = s w · · · s n w n , where the s i are generatorsof F d , chosen uniformly and independently. Then P ( | Z n | κn ) e − κn . This lemma directly implies Theorem 2.1, by conditioning with respect to the realizationsof ν and just keeping the randomness coming from the factor µ S in µ = µ S ∗ ν .To prove the lemma, one wants to argue that the walk does not backtrack too much. Ofcourse, the walk can backtrack completely: as the size of the w i is not controlled, it mayhappen that w n is exactly inverse to s w · · · s n and therefore that Z n = e . However, thisis unlikely to happen for most choices of s , . . . , s n .A difficulty is that the distance to the origin is not well-behaved under the walk. Forinstance, assume that Z n − = e , that w n − is very long (of length n , say) and that forsome generators s and t , one has tw n = ( sw n − ) − . Then Z n − is far away from the origin,and in particular it satisfies the inequality | Z n − | > n . However, Z n is equal to the originif s n − = s and s n = t , which happens with probability / (2 d ) . This is not exponentiallysmall, even though the distance control at time n − is good.For this reason, we will not try to control inductively the distribution of the distance tothe origin. Instead, we will control a number of branching points of the random walk upto time n , that we call pivotal points . In the general case of random walks in hyperbolicspaces, the definition will be quite involved, but for trees one can give a direct definition asfollows. Denote by γ n the path in the Cayley graph of F d corresponding to the walk up to Z n , i.e., the concatenation of the geodesics from e to s then to s w then to s w s and soon until s w s w · · · s n w n = Z n . Definition 2.5.
A time k ∈ [1 , n ] is a pivotal time (with respect to n ) if s k is the inverseneither of the last letter of Z k − , nor of the first letter ( w k ) of w k (so that the path γ n islocally geodesic of length around Z k − ) and moreover the path γ n does not come back to Z k − s k afterwards.We will denote by P n the set of pivotal times with respect to n . In other words, k is pivotal if the walk at time k goes away from the origin during twosteps ( s k and then ( w k ) ) and then remains stuck in the subtree based at Z k − s k ( w k ) .The evolution of the set of pivotal times is not monotone: if the walk backtracks a lot,then many times that were pivotal with respect to n will not be any more pivotal withrespect to n + 1 , since the non-backtracking condition is not satisfied any more. On theother hand, the only possible new pivotal point is the last one: P n +1 ⊆ P n ∪ { n + 1 } . ANDOM WALKS WITHOUT MOMENT CONDITION 5
We will say that a sequence ( s ′ , . . . , s ′ n ) is pivoted from ¯ s = ( s , . . . , s n ) if they have thesame pivotal times and, additionally, s ′ k = s k for all k which is not a pivotal time. Thisis an equivalence relation. Moreover, a sequence has many pivoted sequences: if k is apivotal time and one changes s k to s ′ k which still satisfies the local geodesic condition (i.e., s ′ k is different from the last letter of Z k − and from the first letter of w k ), then we claimthat ( s , . . . , s ′ k , . . . , s n ) is pivoted from ( s , . . . , s n ) . Indeed, the part of γ n originating from Z k − s k ( w k ) never comes back on the edge from Z k − to Z k − s k (not even on its endpoints),so changing s k to s ′ k does not change this fact. Thus the behavior of γ ′ n after Z k − is exactlythe same as that of γ n , but in a different subtree – one has pivoted the end of γ n around Z k − s k , hence the name. In particular, subsequent pivotal times are the same. Moreover,since the trajectory never comes back before Z k − s k , pivotal times before k are not affected,and are the same for γ n and γ ′ n .More generally, denoting the pivotal times by p < · · · < p q , then changing the s p i to s ′ p i still satisfying the local geodesic condition gives a pivoted sequence. Let E n (¯ s ) be the set ofsequences which are pivoted from ¯ s . Conditionally on E n (¯ s ) , the previous discussion showsthat the random variables s ′ p i are independent (but not identically distributed as each ofthem is drawn from some subset of the generators depending on i , of cardinality | S | − or | S | − ). Proposition 2.6.
Let A n = | P n | be the number of pivotal times. Then, in distribution, A n +1 > A n + U where U is a random variable independent from A n and distributed asfollows: P ( U = − j ) = 2 d − d (2 d − j for j > , P ( U = 0) = 0 , P ( U = 1) = d − d . In other words, P ( A n +1 > i ) > P ( A n + U > i ) for all i .Proof. Let us fix a sequence ¯ s = ( s , . . . , s n ) , and let q = | P n | be its number of pivotal times.We will prove the estimate by conditioning on E n (¯ s ) . Let ¯ s ′ ∈ E n (¯ s ) .First, assume there are no pivotal points, i.e., q = 0 . Then for each ¯ s ′ there are at least d − generators which are different from the last letter of Z ′ n and from the first letter of w n +1 ,giving rise to one pivotal time in P ′ n +1 with probability at least (2 d − / (2 d ) = P ( U = 1) .Otherwise, | P ′ n +1 | = 0 . Conditionally on E n (¯ s ) , it follows that the conclusion of the lemmaholds.Assume now that there is at least one pivotal point. From the last pivotal time onward,the behavior is the same over all the equivalence class E n (¯ s ) , so the last letter of Z ′ n doesnot depend on ¯ s ′ . There are at least d − generators of F d which are different fromthe last letter of Z ′ n and from the first letter of w n +1 . If s ′ n +1 is such a generator, then P ′ n +1 = P ′ n ∪ { n + 1 } . Therefore, P ( A n +1 > q + 1 | E n (¯ s )) > (2 d − / (2 d ) . We have adjusted the definition of U so that the right hand side is P ( U > . ANDOM WALKS WITHOUT MOMENT CONDITION 6
Fix now s ′ n +1 which is not such a nice generator. Then s ′ n +1 w n +1 may backtrack, possiblyuntil the last pivotal point Z ′ p q , thereby decreasing the number of pivotal points with respectto n + 1 . However, it may only backtrack further if the generator s ′ p q is exactly the inverseof the corresponding letter in w n +1 . This can happen for s ′ , but then it will not happenfor all the pivoted configurations of s ′ obtained by changing s ′ p q to another generator stillsatisfying the local geodesic condition. Therefore, P ( A n +1 q − | E n (¯ s )) d × d − , where the first factor corresponds to the choice of a generator s ′ n +1 which does not satisfythe local geodesic condition, and the second factor corresponds to the choice of the specificgenerator for s ′ p q to make sure that one backtracks further.More generally, to cross j pivotal times, there is one specific choice of generator at eachof these pivotal times, which can only happen with a probability at most / (2 d − at eachof these times. Therefore, for j > , P ( A n +1 q − j | E n (¯ s )) d · d − j − . We have adjusted the distribution of U so that the right hand side is exactly P ( U − j ) .Finally, we obtain the inequalities P ( A n +1 q − j | E n (¯ s )) P ( U − j ) for j > , P ( A n +1 > q + 1 | E n (¯ s )) > P ( U > . Taking the complement in the first inequality yields P ( A n +1 > q + k | E n (¯ s )) > P ( U > k ) for all k ∈ Z . As A n is constant equal to q on E n (¯ g ) , the right hand side is P ( A n + U > q + k | E n (¯ s )) . Writing i = q + k , we have obtained for all i the inequality P ( A n +1 > i | E n (¯ s )) > P ( A n + U > i | E n (¯ s )) . As this inequality is uniform over the conditioning, it gives the conclusion of the lemma. (cid:3)
Proof of Lemma 2.4.
Let U , U , . . . be a sequence of i.i.d. random variables distributed like U in Proposition 2.6. Iterating the proposition, one gets P ( A n > k ) > P ( U + · · · + U n > k ) .The random variables U i have an exponential moment. Moreover, their expectation ispositive when d > , as it is (2 d − · ( d − / ((2 d − · d ) . Large deviations for sums ofi.i.d. real random variables with an exponential moment ensure the existence of κ > suchthat P ( U + · · · + U n κn ) e − κn for all n . Then P ( A n κn ) e − κn . As the distance tothe origin is bounded from below by the number of pivotal points, this proves Lemma 2.4,except that the constant c depends on the number of generators d . However, the randomvariables U = U ( d ) depending on d increase with d (in the sense that when d > d ′ then P ( U ( d ) > k ) > P ( U ( d ′ ) > k ) for all k ). Therefore, one can use the random variables U (3) to obtain a lower bound in all free groups F d with d > . (cid:3) The rest of the paper is devoted to the extension of this argument to general measures andgeneral Gromov-hyperbolic spaces. While the intuition will remain the same, the definitionof pivotal times will need to be adjusted, as there is no well-defined concept of subtree:instead, we will use a suitable notion of shadow, and require that the walk after the pivotaltime remains in the shadow. Also, to separate possible directions, we will rely on the notion
ANDOM WALKS WITHOUT MOMENT CONDITION 7 of Schottky sets introduced by [BMSS20], instead of just using the generators as in the freegroup. These notions are explained in the next section.3.
Prerequisites on Gromov-hyperbolic spaces
Let X be a metric space, and x, y, z ∈ X . Their Gromov product is defined by ( x, z ) y = 12 ( d ( x, y ) + d ( y, z ) − d ( x, z )) . Let δ > . A metric space is δ -Gromov hyperbolic if, for all x, y, z, a ,(3.1) ( x, z ) a > min(( x, y ) a , ( y, z ) a ) − δ. When the space is geodesic, this is equivalent (up to changing δ ) to the fact that geodesictriangles are thin, i.e., each side is contained in the δ -neighborhood of the other two sides.In the rest of the paper, X is a δ -hyperbolic metric space (without any geodesicity orproperness or separability condition). We also fix a basepoint o ∈ X .3.1. Boundary at infinity.
We recall a few basic facts on the boundary at infinity of aGromov-hyperbolic space that we will need later on.A sequence ( x n ) n ∈ N is converging at infinity if ( x n , x m ) o tends to infinity when m, n → ∞ .Two sequences ( x n ) and ( y n ) which are converging at infinity are converging to the samelimit if ( x n , y n ) o → ∞ . This is an equivalence relation, thanks to the hyperbolicity inequality.Quotienting by this equivalence relation, one gets the boundary at infinity of the space X denoted ∂X .The C -shadow of a point x , seen from o , is the set of points y such that ( y, o ) x C . Wedenote it with S o ( y ; C ) . Geometrically, this means that a geodesic from o to y goes withindistance C + O ( δ ) of x . Let us record a few classical properties of shadows. Lemma 3.1.
For y ∈ S o ( x ; C ) , one has d ( y, o ) > d ( x, o ) − C .Proof. We have d ( y, o ) = d ( y, x ) + d ( x, o ) − y, o ) x > d ( x, o ) − C. (cid:3) Lemma 3.2.
Let
C > , and let x n ∈ X be such that d ( o, x n ) → ∞ . Consider anothersequence y p such that, for all n , eventually y p ∈ S o ( x n ; C ) . Then y p converges at infinity.Proof. Fix n large. For large enough p , one has y p ∈ S o ( x n ; C ) , i.e., ( o, y p ) x n C . As ( o, y p ) x n + ( x n , y p ) o = d ( o, x n ) , this gives ( x n , y p ) o > d ( o, x n ) − C .For large enough p, q , we get (using hyperbolicity for the first inequality)(3.2) ( y p , y q ) o > min(( y p , x n ) o , ( y q , x n ) o ) − δ > d ( o, x n ) − C − δ. As d ( o, x n ) → ∞ by assumption, it follows that ( y p , y q ) o → ∞ , as claimed. (cid:3) Lemma 3.3.
Let
C > and x ∈ X . Consider y ∈ S o ( x ; C ) , and a point ξ ∈ ∂X which isa limit of points in S o ( x ; C ) . Then ( y, ξ ) o > d ( o, x ) − C − δ. Proof.
Let z n ∈ S o ( x ; C ) be a sequence converging to ξ . As the Gromov product at infinitydoes not depend on the sequence up to δ , we have ( y, ξ ) o > lim inf( y, z n ) o − δ . Moreover,as both y and z n belong to S o ( x ; C ) , the inequality (3.2) gives ( y, z n ) o > d ( o, x ) − C − δ .The conclusion follows. (cid:3) ANDOM WALKS WITHOUT MOMENT CONDITION 8
Chains and shadows.
In a hyperbolic space, ( x, z ) y is roughly the distance from y to a geodesic between x and z . In particular, if ( x, z ) y C for some constant C , this meansthat the points x, y, z are roughly aligned in this order, up to an error C . We will say thatthe points are C -aligned .In a hyperbolic space, if in a sequence of points all consecutive points are C -aligned, andthe points are separated enough, then the sequence is progressing linearly, and all pointsin the sequence are C + O ( δ ) -aligned (see for instance [GdlH90, Theorem 5.3.16]). We willneed variations around this classical idea.We start with distance estimates for 3 points. Lemma 3.4. . Consider x, y, z with ( x, z ) y C . Then d ( x, z ) > d ( x, y ) − C and d ( x, z ) > d ( y, z ) − C .Proof. By symmetry, it suffices to prove the first inequality. We claim that d ( x, z ) > d ( x, y ) − ( x, z ) y , which implies the result. Expanding the definition of the Gromov product, thisinequality holds if and only if d ( y, x ) + d ( y, z ) − d ( x, z )2 + d ( x, z ) > d ( x, y ) . This reduces to d ( y, z ) + d ( x, z ) > d ( x, y ) , which is just the triangular inequality. (cid:3) The next lemma gives estimates for 4 points, from which results for more points willfollow by induction.
Lemma 3.5.
Consider w, x, y, z ∈ X , and C > . Assume ( w, y ) x C and ( x, z ) y C + δ and d ( x, y ) > C + 2 δ + 1 . Then ( w, z ) x C + δ .Proof. By definition of the Gromov product, ( x, z ) y + ( y, z ) x = d ( x, y ) . As ( x, z ) y C + δ ,we get ( y, z ) x > d ( x, y ) − C − δ . As d ( x, y ) > C + 2 δ + 1 , this gives ( y, z ) x > C + δ + 1 .Writing down the first condition and the hyperbolicity condition, we get C > ( w, y ) x > min(( w, z ) x , ( z, y ) x ) − δ. If the minimum were realized by ( z, y ) x , we would get C > ( C + δ + 1) − δ , a contradiction.Therefore, the minimum is realized by ( w, z ) x , which gives ( w, z ) x C + δ . (cid:3) Definition 3.6.
For
C, D > , a sequence of points x , . . . , x n is a ( C, D ) -chain if one has ( x i − , x i +1 ) x i C for all < i < n , and d ( x i , x i +1 ) > D for all i < n . Lemma 3.7.
Let x , . . . , x n be a ( C, D ) chain with D > C +2 δ +1 . Then ( x , x n ) x C + δ ,and (3.3) d ( x , x n ) > n − X i =0 ( d ( x i , x i +1 ) − (2 C + 2 δ )) > n. Proof.
Let us show by decreasing induction on i that ( x i − , x n ) x i C + δ , the result beingtrue for i = n − by assumption. Assume it holds for i +1 . Then the points x i − , x i , x i +1 , x n satisfy the assumptions of Lemma 3.5, which gives ( x i − , x n ) x i C + δ as desired.Let us now show that d ( x j , x n ) > P n − i = j ( d ( x i , x i +1 ) − (2 C + 2 δ )) by decreasing inductionon j , the case j = n being trivial and the case j = 0 being (3.3). We have d ( x j , x n ) = d ( x j , x j +1 ) + d ( x j +1 , x n ) − x j , x n ) x j +1 > d ( x j , x j +1 ) + d ( x j +1 , x n ) − (2 C + 2 δ ) , ANDOM WALKS WITHOUT MOMENT CONDITION 9 which concludes the induction. (cid:3)
Lemma 3.8.
Let x , . . . , x n be a ( C, D ) chain with D > C + 4 δ + 1 . Then for all i onehas ( x , x n ) x i C + 2 δ .Proof. Lemma 3.7 applied to the ( C, D ) -chain x i , x i +1 , . . . , x n gives ( x i , x n ) x i +1 C + δ .The same lemma applied to the ( C, D ) -chain x i +1 , x i , . . . , x gives ( x i +1 , x ) x i C + δ .Therefore, the points x , x i , x i +1 , x n are ( C + δ ) -aligned. Let us apply Lemma 3.5 to thesepoints, with C + δ instead of C . It gives ( x , x n ) x i C + 2 δ , as claimed. (cid:3) We will need to say that a point z belongs to a half-space based at a point y and directedtowards a point y + . The usual definition for this is the shadow of y + seen from y , definedas the set S y ( y + ; C ) of points z with ( y, z ) y + C for some suitable C . Unfortunately,this definition is not robust enough for our purposes as we will need to say that being ina half-space and walking again from y one stays in the half-space, which is not satisfied bythis definition due to the loss of δ when one applies the hyperbolicity inequality.A more robust definition can be given in terms of chains. If we have a chain (which goesroughly in a straight direction by the previous lemma) and if we prescribe the direction of itsfirst jump, then we are essentially prescribing the direction of the whole chain. This makesit possible to define another notion that we call chain-shadow, as follows. The choice of theminimal distance C + 2 δ + 1 between points in the chain in this definition is somewhatarbitrary, it should just be large enough that lemmas on the linear progress of chains apply. Definition 3.9.
Let C > and y, y + , z ∈ X . We say that z belongs to the C -chain-shadowof y + seen from y if there exists a ( C, C + 2 δ + 1) -chain x = y, x , . . . , x n = z satisfyingadditionally ( x , x ) y + C . We denote the chain-shadow with CS y ( y + ; C ) . The next lemma shows that this definition of shadow is roughly equivalent to the usualdefinition in terms of the Gromov product ( y, z ) y + . Lemma 3.10. If z ∈ CS y ( y + ; C ) , then ( y, z ) y + C + δ and d ( y, z ) > d ( y, y + ) − C − δ .Proof. Let x = y, x , . . . , x n = z be a ( C, C + 2 δ + 1) -chain as in the definition of chain-shadows. We have d ( y, z ) = d ( y, x )+ d ( x , z ) − y, z ) x = d ( y, y + )+ d ( y + , x ) − y, x ) y + + d ( x , z ) − y, z ) x . Let us bound ( y, x ) y + with C (by the definition of chain-shadows) and ( y, z ) x by C + δ (thanks to Lemma 3.7 applied to the chain x , . . . , x n ). Let us also bound from below d ( y + , x ) + d ( x , z ) with d ( y + , z ) . We get d ( y, z ) > d ( y, y + ) + d ( y + , z ) − C − δ. Expanding the definition of the Gromov product, this gives ( y, z ) y + C + δ . Then we get d ( y, z ) > d ( y, y + ) − C − δ by applying Lemma 3.4 to y, y + , z . (cid:3) Schottky sets.
To be able to prescribe enough directions at pivotal points, we willuse a variation around the notion of Schottky set in [BMSS20]. This is essentially a finiteset of isometries such that, for all x and y , most of these isometries put x and sy in generalposition with respect to o , i.e., such that x, o, sy are C -aligned for some given C . Definition 3.11.
Let η, C, D > . A finite set S of isometries of X is ( η, C, D ) -Schottky if ANDOM WALKS WITHOUT MOMENT CONDITION 10 • For all x, y ∈ X , we have |{ s ∈ S, ( x, sy ) o C }| > (1 − η ) | S | . • For all x, y ∈ X , we have |{ s ∈ S, ( x, s − y ) o C }| > (1 − η ) | S | . • For all s ∈ S , we have d ( o, so ) > D . We could define analogously a notion of an ( η, C, D ) -probability measure, where theprevious definition would be this property for the uniform measure on S .The next proposition shows that one can find Schottky sets by using powers of twoloxodromic isometries. Proposition 3.12.
Fix two loxodromic isometries u and v of X , with disjoint sets of fixedpoints at infinity. For all η > , there exists C > such that, for all D > , there exist n ∈ N and an ( η, C, D ) -Schottky set in { w · · · w n : w i ∈ { u, v }} .Proof. This is essentially a classical application of the ping-pong method. [BMSS20, Propo-sition A.2] contains a slightly less precise statement, but their proof also gives our strongerversion, as we explain now. Let S n = { w · · · w n : w i ∈ { u, v }} .The ping-pong argument at infinity shows that one can choose n large enough so that,for all m the elements w · · · w m for w i ∈ { u n , v n } are all different, loxodromic, with disjointsets of fixed points at infinity. Let us fix such an n , and then such an m with − m < η/ ,and denote these m isometries with g , . . . , g m . They all belong to S nm . Let g + i and g − i be their attractive and repulsive fixed points.Let K be large enough. Define a neighborhood V ( g + i ) = { x ∈ X : ( x, g + i ) o > K } anda smaller neighborhood V ′ ( g + i ) = { x ∈ X : ( x, g + i ) o > K + δ } . In the same way, define V ( g − i ) and V ′ ( g − i ) . If K is large enough, then the m +1 sets ( V ( g ± i )) i =1 ,..., m are disjoint asthe fixed points at infinity of the g i are all different. Moreover, for large enough p , then g pi maps the complement of V ( g − i ) to V ′ ( g + i ) , and the complement of V ( g + i ) to V ′ ( g − i ) .We claim that, for all D , if p is large enough, then S = { g p , . . . , g p m } is an ( η, K + δ, D ) -Schottky set. As all these elements belong to S nmp , this will prove the theorem. First, thecondition d ( o, so ) > D for s = g pi is true if p is large enough, as g i is loxodromic. Let usshow that |{ s ∈ S, ( x, sy ) o K + δ }| > (1 − η ) | S | for all x, y (the corresponding inequalitywith s − is similar). There is at most one s = g i for which y ∈ V ( g − i ) , as all these sets aredisjoint. There is also at most one s = g j for which x ∈ V ( g + j ) , again by disjointness. If s = g k is not one of these two, we claim that ( x, sy ) o K + δ . This will prove the result,since this implies |{ s ∈ S, ( x, sy ) o K + δ }| > | S | − m − | S | (1 − · − m ) > (1 − η ) | S | . As x / ∈ V ( g + k ) , we have ( x, g + k ) o < K . As y / ∈ V ( g − k ) , we have sy = g k y ∈ V ′ ( g + k ) , i.e., ( sy, g + k ) o > K + δ . By hyperbolicity, we obtain K > ( x, g + k ) o > min(( x, sy ) o , ( sy, g + k ) o ) − δ. (Note that the hyperbolicity inequality (3.1), initially stated inside the space, remains truefor the Gromov product at infinity as we have used an inf in its definition (1.1)). If theminimum were realized by ( sy, g + k ) o > K + δ , we would get K > ( K + δ ) − δ , a contradiction.Therefore, the minimum is realized by ( x, sy ) o , yielding K > ( x, sy ) o − δ as claimed. (cid:3) ANDOM WALKS WITHOUT MOMENT CONDITION 11
Corollary 3.13.
Let µ be a non-elementary discrete measure on the set of isometries of X . For all η > , there exists C > such that, for all D > , there exist M > and an ( η, C, D ) -Schottky set in the support of µ M .Proof. By definition of a non-elementary measure, one can find loxodromic elements u and v with disjoint fixed points in the support of µ a and µ b for some a, b > . Then u = u b and v = v a belong to the support of µ ab and have disjoint fixed points. ApplyingProposition 3.12, we obtain an ( η, C, D ) -Schottky set in the support of µ abn as desired. (cid:3) Linear escape
In this section, we prove Theorem 1.1, i.e., the random walk on X driven by a non-elementary measure escapes linearly towards infinity, with exponential bounds. We copythe proof of Section 2, replacing subtrees with chain-shadows in the definition of pivotaltimes, and generators with elements of a Schottky set. The reader who would prefer touse shadows instead of chain-shadows may do so for intuition, but should be warned thatthe argument will then barely fail (at a single place, the backtracking step in the proof ofLemma 4.8).Like in Section 2, the main technical part is to understand what happens for walks ofthe form w s w · · · w n − s n w n , where the w i are fixed, while the s i are random, and drawnfrom a Schottky set. This will be done in Subsection 4.1, while the application to proveTheorem 1.1 is done in Subsection 4.24.1. A simple model.
In this section, we fix isometries w , w , · · · of X , a constant C > ,and S a (1 / , C , D ) -Schottky set of isometries of X . We will assume that D is largeenough compared to C (for definiteness D > C + 100 δ + 1 will do). Let µ S be theuniform measure on S . Let s i be i.i.d. random variables distributed like µ S .We form a random process on X by composing the w i and s i and applying them to thebasepoint o . Our goal is to understand the behavior of y − n +1 = w s w · · · s n w n · o when n tends to infinity. The main result of this subsection is the following proposition. Proposition 4.1.
There exists a universal constant κ > (independent of everything) suchthat, for all n , P ( d ( o, y − n +1 ) κn ) e − κn . Write s i = a i b i with a i , b i ∈ S . We define y − i = w s w · · · s i − w i − · o, y i = w s w · · · w i − a i · o, y + i = w s w · · · w i − a i b i · o, the three points visited during the transition around i . We have d ( y − i , y i ) = d ( o, a i · o ) > D as a i belongs to the (1 / , C , D ) -Schottky set S . In the same way, d ( y i , y + i ) > D . A difficultythat we will need to handle is that d ( y + i , y − i +1 ) may be short, as there is no lower bound on w i , while we need long jumps everywhere to apply the results on chains of Subsection 3.2.We will define a sequence of pivotal times P n ⊆ { , . . . , n } , evolving with time: when goingfrom n to n + 1 , we will either add a pivotal time at time n + 1 (so that P n +1 = P n ∪ { n + 1 } ,if the walk is going more towards infinity), or we will remove a few pivotal times at the endbecause the walk has backtracked (in this case, P n +1 = P n ∩ { , . . . , m } for some m ).Let us define inductively the pivotal times, starting from P = ∅ . Assume that P n − is defined, and let us define P n . Let k = k ( n ) be the last pivotal time before n , i.e., ANDOM WALKS WITHOUT MOMENT CONDITION 12 k = max( P n − ) . (If P n − = ∅ , take k = 0 and let y k = o – we will essentially ignore theminor adjustments to be made in this special case in the forthcoming discussion). Let ussay that the local geodesic condition is satisfied at time n if(4.1) ( y k , y n ) y − n C , ( y − n , y + n ) y n C , ( y n , y − n +1 ) y + n C . In other words, the points y k , y − n , y n , y + n , y − n +1 follow each other successively, with a C -alignment condition. As the points are well separated by the definition of Schottky sets,this will guarantee that we have a chain, progressing in a definite direction.If the local geodesic condition is satisfied at time n , then we say that n is a pivotal time,and we set P n = P n − ∪ { n } . Otherwise, we backtrack to the largest pivotal time m ∈ P n − for which y − n +1 belongs to the ( C + δ ) -chain-shadow of y + m seen from y m . In this case, weerase all later pivotal times, i.e., we set P n = P n − ∩ { , . . . , m } . If there is no such pivotaltime m , we set P n = ∅ . Lemma 4.2.
Assume that P n is nonempty. Let m be its maximum. Then y − n +1 belongs tothe ( C + δ ) -chain-shadow of y + m seen from y m .Proof. If P n has been defined from P n − by backtracking, then the conclusion of the lemmais a direct consequence of the definition. Otherwise, the last pivotal time is n . In thiscase, let us show that y − n +1 belongs to the ( C + δ ) -chain-shadow of y + n seen from y n , byconsidering the chain y n , y − n +1 . By definition of the chain-shadow, we should check that ( y n , y − n +1 ) y + n C + δ and d ( y n , y − n +1 ) > C + 4 δ + 1 . The first inequality is obviousas ( y n , y − n +1 ) y + n C C + δ by the local geodesic condition (4.1). Moreover, since ( y n , y − n +1 ) y + n C by (4.1), Lemma 3.4 gives d ( y n , y − n +1 ) > d ( y n , y + n ) − C > D − C , whichis > C + 4 δ + 1 if D is large enough. (cid:3) Lemma 4.3.
Let P n = { k < · · · < k p } . Then the sequence y − k , y k , y − k , y k , . . . , y k p , y − n +1 is a (2 C + 3 δ, D − C − δ ) -chain.Proof. Let us first check the condition on Gromov products. We have to show that ( y k i − , y k i ) y − ki C + 3 δ and ( y − k i , y − k i +1 ) y ki C + 3 δ . The first inequality is obvious, as it follows from thefirst property in the local geodesic condition when introducing the pivotal time k i . Let usshow the second one. Lemma 4.2 applied to the time k i +1 − shows that y − k i +1 belongs to the ( C + δ ) chain-shadow of y + k i seen from y k i . Lemma 3.10 thus yields ( y k i +1 , y k i ) y + ki C + 3 δ .Moreover, ( y + k i , y − k i ) y ki C by the local geodesic condition when introducing the pivotaltime k i . We apply Lemma 3.5 with the points y − k i , y k i , y + k i , y − k i +1 , with C = 2 C + 2 δ . As d ( y k i , y + k i ) > D is large enough, this lemma applies and gives ( y − k i , y − k i +1 ) y ki C + 3 δ . Thisis the desired inequality.Let us check the condition on distances. We have to show that d ( y − k i , y k i ) > D − C − δ and d ( y k i , y − k i +1 ) > D − C − δ . The first condition is obvious as d ( y − k i , y k i ) > D . For thesecond, Lemma 3.10 gives d ( y k i , y − k i +1 ) > d ( y k i , y + k i ) − C − δ > D − C − δ . (cid:3) The first point in the previous chain can be replaced with o : ANDOM WALKS WITHOUT MOMENT CONDITION 13
Lemma 4.4.
Let P n = { k < · · · < k p } . Then the sequence o, y k , y − k , y k , . . . , y k p , y − n +1 isa (2 C + 4 δ, D − C − δ ) -chain.Proof. We have to control d ( o, y k ) and ( o, y − k ) y k as the other quantities are controlledby Lemma 4.3. For this, we will apply Lemma 3.5 to the points y − k , y k , y − k , o with C =2 C + 3 δ . We have ( y − k , y − k ) y k C + 3 δ by Lemma 4.3, and ( y k , o ) y − k C (this is thefirst property in the local geodesic condition when introducing the pivotal time k ), and d ( y k , y − k ) > D > C + δ + 1 . Therefore, Lemma 3.5 gives ( y − k , o ) y k C + 4 δ . Moreover,Lemma 3.4 gives d ( y k , o ) > d ( y k , y − k ) − ( y k , o ) y − k > D − C > D − C − δ. (cid:3) Proposition 4.5.
We have d ( o, y − n +1 ) > | P n | .Proof. This follows from Lemma 4.4, saying that we have a chain of length at least | P n | between o and y − n +1 , and from Lemma 3.7, saying that the distance grows linearly along achain. (cid:3) This proposition shows that, to obtain the linear escape rate with exponential decay, itsuffices to show that there are linearly many pivotal times.
Lemma 4.6.
Fix s , . . . , s n , and draw s n +1 according to µ S . The probability that | P n +1 | = | P n | + 1 (i.e., that n + 1 gets added as a pivotal time) is at least / .Proof. In the local geodesic condition (4.1), the last property reads ( g · o, gb n w n · o ) gb n · o C for g = w s · · · w n − a n . Composing with b − n g − , it becomes ( b − n · o, w n · o ) o C . By thedefinition of a Schottky set, this inequality is satisfied with probability at least − η = 99 / when choosing b n . Once b n is fixed, the other two properties in the geodesic condition onlydepend on a n , and each of them is satisfied with probability at least / , again by theSchottky property. They are satisfied simultaneously with probability at least / . As (99 / · (98 / > / , this concludes the proof. (cid:3) The key point is to control the backtracking length. For this, we will see that for oneconfiguration that backtracks a lot, there are many configurations that do not. Given ¯ s = ( s , . . . , s n ) , let us say that another sequence ¯ s ′ = ( s ′ , . . . , s ′ n ) is pivoted from ¯ s if theyhave the same pivotal times, b ′ k = b k for all k , and a ′ k = a k when k is not a pivotal time. Lemma 4.7.
Let i be a pivotal time of ¯ s = ( s , . . . , s n ) . Replace s i = a i b i with s ′ i = a ′ i b i which still satisfies the local geodesic condition (4.1) (with n replaced by i ). Then ( s , . . . , s ′ i , . . . , s n ) is pivoted from ¯ s .Proof. We should show that the pivotal times of ¯ s ′ are the same as those of ¯ s . Until time i , the sequences are the same, hence they have the same pivotal times: P i − (¯ s ) = P i − (¯ s ′ ) .Then i is added as a pivotal time for both ¯ s and ¯ s ′ by assumption, therefore P i (¯ s ) = P i (¯ s ′ ) .Then the remaining part of the trajectory for ¯ s never backtracks beyond i , as i remains apivotal time. This backtracking property is defined in terms of the relative position of thetrajectory compared to y i and y + i , and therefore it depends on b i but not on the beginningof the trajectory (and in particular it does not depend on a i ). Hence, replacing a i with a ′ i does not change the backtrackings, which are the same for ¯ s and ¯ s ′ until time n . (cid:3) ANDOM WALKS WITHOUT MOMENT CONDITION 14
Lemma 4.7 shows that, if a trajectory has p pivotal times, then it has a lot of pivotedtrajectories (exponentially many in p ) as one can change a i to a ′ i at each pivotal time.Denote by E n (¯ s ) the set of trajectories which are pivoted from ¯ s . Conditionally on E n (¯ s ) ,the random variables a ′ i for i a pivotal time are independent (but not identically distributed,as they are each drawn from a subset of S depending on i , of large cardinality. Lemma 4.8.
Let ¯ s = ( s , . . . , s n ) be a trajectory with q pivotal times. We condition on E n (¯ s ) , and we draw s n +1 according to µ S . Then, for all j > , P ( | P n +1 | < q − j | E n (¯ s )) / j +1 . Proof. If q = 0 , then the result follows readily from Lemma 4.6. Assume q > .First, the probability that s n +1 creates a new pivotal time is at least / , by Lemma 4.6(and the elements s n +1 that create a new pivotal time are the same over the whole equiva-lence class E n (¯ s ) as q > ). Let us now fix a bad s n +1 , giving rise to backtracking.Let us show the lemma for j = 1 . Let m < k be the last two pivotal times. We have toshow that(4.2) P ( | P n +1 | < q − | E n (¯ s ) , s n +1 ) / , i.e., most trajectories do not backtrack beyond k : for many choices of a k , then y − n +1 shouldbelong to the ( C + δ ) -chain-shadow of y + m seen from y m . By Lemma 4.2 applied at time k − , we already know that y − k belongs to this set. Therefore, there exists a chain x = y m , x , . . . , x i = y − k pointing in the chain-shadow. With a good choice of a k , we will increasethe chain by adding y − n +1 at its end.Let us consider a ′ k so that the points x i − , y − k , y k , y − n +1 are C -aligned, i.e., such that ( x i − , y k ) y − k C and ( y − k , y − n +1 ) y k C . By the Schottky property, there are at least (98 / | S | such a ′ k . Let us show that, with this choice, y − n +1 belongs to the chain-shadowof y + m seen from y m (and therefore backtracking stops here). For this, it is enough to see that x , . . . , x i − , y − k , y − n +1 is a ( C + δ, C + 4 δ + 1) -chain. We have to see that d ( y − k , y − n +1 ) > C + 4 δ + 1 and ( x i − , y − n +1 ) y − k C + δ . For this, apply Lemma 3.5 to the points x i − , y − k , y k , y − n +1 , which are C -aligned. As d ( y − k , y k ) > D is large enough, this lemma gives ( x i − , y − n +1 ) y − k C + δ . Moreover, Lemma 3.4 gives d ( y − k , y − n +1 ) > d ( y − k , y k ) − ( y − k , y − n +1 ) y k > D − C > C + 4 δ + 1 , as claimed.In the equivalence class, the number of possible choices for a ′ k when introducing thepivotal time k is at least (98 / | S | , since most choices satisfy the local geodesic condition(see the proof of Lemma 4.6). The number of choices of a ′ k that ensure there is no furtherbacktracking is also bounded below by (98 / | S | , by the previous discussion, so that thenumber of bad choices is at most (1 − (98 / | S | . Finally, the proportion of bad choicesthat lead to further backtracking is at most (1 − (98 / | S | (98 / | S | < . This proves (4.2) for j = 1 .To prove the lemma for j = 2 , let us fix s n +1 as well as a bad choice of a ′ k that gives riseto backtracking beyond k (this happens with probability at most / ). We have to see ANDOM WALKS WITHOUT MOMENT CONDITION 15 that, once these quantities are fixed, the probability to backtrack past the previous pivotaltime is at most / . This is the same argument as above. The case of general j is provedanalogously by induction. (cid:3) Lemma 4.9.
Let A n = | P n | be the number of pivotal times. Then, in distribution, A n +1 > A n + U where U is a random variable independent from A n and distributed as follows: P ( U = − j ) = 910 j +1 for j > , P ( U = 0) = 0 , P ( U = 1) = 910 . In other words, P ( A n +1 > i ) > P ( A n + U > i ) for all i .Proof. Conditionally on E n (¯ s ) , this follows from Lemma 4.8, just like in the proof of Propo-sition 2.6: one shows that P ( A n +1 > i | E n (¯ s )) > P ( A n + U > i | E n (¯ s )) . As the inequality is uniform over the conditioning, the unconditioned version follows. (cid:3)
Proposition 4.10.
There exists a universal constant κ > such that, for all n , P ( | P n | κn ) e − κn . Proof.
Let U , U , . . . be a sequence of independent copies of the variable U from Lemma 4.9.Iterating this lemma gives P ( | P n | > i ) > P ( U + · · · + U n > i ) for all i . In particular, P ( | P n | κn ) P ( U + · · · + U n κn ) . As the U i are real randomvariables with an exponential moment and positive expectation, P ( U + · · · + U n κn ) isexponentially small if κ is small enough. (cid:3) Proof of Proposition 4.1.
The linear escape with exponential error term follows from Propo-sition 4.5 giving d ( o, y − n +1 ) > | P n | , and from Proposition 4.10 ensuring that | P n | growslinearly outside of a set of exponentially small probability. (cid:3) Proof of linear escape and convergence at infinity.
Let µ be a non-elementarymeasure on the set of isometries of the space X . In this subsection, we prove Theorem 1.1:the µ -random walk goes to infinity linearly, with an exponential error term. The techniqueswe develop along the way will also prove convergence of the walk at infinity.We apply Corollary 3.13 with η = 1 / . Let C = C be given by this corollary. Choose D = D ( C , δ ) large enough so that the result of the previous Subsection apply ( D =20 C + 100 δ + 1 suffices). The corollary gives an ( η, C , D ) Schottky set S included inthe support of µ M for some M . For α > small enough and N = 2 M , we may write µ N = αµ S + (1 − α ) ν for some probability measure ν , where µ S is the uniform measure on S . As in [BMSS20, Section 6], let us reconstruct in a slightly indirect way the random walk,as follows, on a space Ω containing Bernoulli random variables ε i (satisfying P ( ε i = 1) = α and P ( ε i = 0) = 1 − α ) and variables h i distributed according to ν and variables s i = a i b i ANDOM WALKS WITHOUT MOMENT CONDITION 16 distributed according to µ S , all independent. Define γ i = s i if ε i = 1 , and γ i = h i if ε i = 0 .Then γ · · · γ n − is distributed like Z Nn . With a standard coupling argument, extending Ω ifnecessary, we can also construct on Ω a sequence of independent random variables g , g , . . . with distribution µ such that γ i = g iN · · · g iN + N − .Let t < t < · · · be the times where ε i = 1 . Fix n ∈ N . We let τ = τ ( n ) be the last index j such that N ( t j + 1) n , so that the interval [ N t j , N ( t j + 1)) is contained in [0 , n ) . We willdecompose the product g · · · g n − as a product of the elements s ′ j = s t j (the product of all g i for i ∈ [ N t j , N ( t j + 1)) ) interspersed with other words that we will consider as fixed, tobe in the framework of Subsection 4.1. Let w j = g N ( t j +1) · · · g Nt j +1 − (where by convention t = 0 ), and let w ′ = w ′ ( n ) = g N ( t τ ( n ) +1) · · · g n − be the last missing word (it really dependson n , contrary to the previous words that just fill the gaps between blocks corresponding to ε j = 1 ). By construction, Z n · o = w s ′ w · · · w τ − s ′ τ w ′ ( n ) · o. We can associate to this decomposition a sequence of pivotal times P ( n )1 , . . . , P ( n ) τ , wherethe exponent ( n ) is here to emphasize that the intermediate words we use depend on n . Infact, the only word that really depends on n is the last word w ′ = w ′ ( n ) , as the other onesare w j = g ( N +1) t j · · · g Nt j +1 − so they only depend on t j . Hence, the sequence of pivotaltimes is rather(4.3) P , P , . . . , P τ − , P ( n ) τ . The main quantity we will control is u n := (cid:12)(cid:12)(cid:12) P ( n ) τ ( n ) (cid:12)(cid:12)(cid:12) , the final number of pivotal times after n steps of the initial random walk. Proposition 4.11.
There exists κ > such that P ( u n κn ) e − κn .Proof. The sequence t j +1 − t j is a sequence of independent random variables with an expo-nential tail. Therefore, there exist C > and κ > such that P ( t j > Cj ) = P j − X i =0 ( t i +1 − t i ) > Cj ! e − κj . Hence, if β > is small enough, we have N ( t ⌊ βn ⌋ + 1) n outside of a set with exponentiallysmall probability. This gives P ( τ ( n ) > βn ) e − κn for some κ > . For any c > , we get P ( u n cn ) e − κn + P ( u n cn, τ > βn ) . Let us concentrate on the second set. We condition with respect to the ε i (which fixes the t i , and τ ) and with respect to the g i outside of the intervals [ N t j , N ( t j + 1)) (which fixesthe w j and w ′ ). Once these are fixed, we are in the framework of Subsection 4.1. We maytherefore apply Proposition 4.10 and deduce that, conditionally on these quantities, we have P ( u n cτ ) e − cτ , for some c > . As τ > βn , this gives conditionally P ( u n cβn ) e − cβn .As this is uniform on the conditioning, this implies the conclusion. (cid:3) ANDOM WALKS WITHOUT MOMENT CONDITION 17
Proof of Theorem 1.1.
Outside of a set with exponentially small probability, the numberof pivotal times at the n -th step of the random walk is at least κn for some κ > , byProposition 4.11. As the distance to the origin is bounded below by the number of pivotaltimes, by Proposition 4.5, this concludes the proof. (cid:3) This argument enables us to recover a theorem of [MT18], the convergence of the walkat infinity. We even get exponential error terms in the speed of convergence. We start witha lemma ensuring that positions of the random walk stay in a shadow.
Lemma 4.12.
Let n ∈ N and C > . Assume that, for all k > n , one has u k > C . Let x be the position of the walk at the C -th pivotal time in P ( n ) τ ( n ) . Then, for all k > n , the point Z k · o belongs to the (2 C + 6 δ ) -shadow of x seen from o .Proof. For k > n , the set P ( k ) τ ( k ) has strictly more than C points by assumption. In particular,the C -th pivotal time is not introduced at the last step, and the last step does not backtrackbeyond this point. The set of pivotal times before the last index does not depend on k , asexplained before (4.3). It follows that the C -th pivotal time in P ( k ) τ ( k ) is independent of k > n .In particular, x is the position of the walk at a pivotal time in P ( k ) τ ( k ) , for any k > n .For k > n , Lemma 4.4 shows that there is a (2 C + 4 δ, D − C − δ ) -chain from o to Z k · o going through x . By Lemma 3.8, we deduce that ( o, Z k · o ) x C + 6 δ . In other words, allthe points Z k · o remain in the (2 C + 6 δ ) -shadow of x seen from o , as claimed. (cid:3) Proposition 4.13.
Almost surely, there is a point Z ∞ ∈ ∂X such that Z n · o converges to Z ∞ . Moreover, there exists κ > such that (4.4) P (( Z n · o, Z ∞ ) o κn ) e − κn . Proof.
Fix c > such that P ( u n cn ) e − cn , by Lemma 4.11. Since P ( u n cn ) isexponentially small, Borel-Cantelli ensures that almost surely one has eventually u n > cn .Lemma 4.12 then applies, with C = ⌊ cn ⌋ − . Let x n denote the position of the walk at the ( ⌊ cn ⌋ − -th pivotal time for large n . By Proposition 4.5, it satisfies(4.5) d ( o, x n ) > ⌊ cn ⌋ − . The sequence Z k · o is eventually trapped in the shadow of x n seen from o by Lemma 4.12.This implies the convergence at infinity of Z k · o , by Lemma 3.2.Finally, let us show the quantitative estimate (4.4). Assume that for all k > n , one has u k > ck (this happens with probability at least − Ce − cn ). In this case, all the points Z k · o for k > n belong to the (2 C + 6 δ ) -shadow of x n . Therefore, Lemma 3.3 applies and gives(4.6) ( Z n · o, Z ∞ ) o > d ( o, x n ) − (2 C + 6 δ ) − δ. Together with (4.5), this gives a linear lower bound for the Gromov product, that holdsoutside of an exponentially small set. (cid:3)
We will also need the following lemma, that follows from the same techniques.
ANDOM WALKS WITHOUT MOMENT CONDITION 18
Lemma 4.14.
Let µ be a non-elementary discrete measure on the set of isometries of aGromov-hyperbolic space X with basepoint o . Let Z n = g · · · g n − where the g i are i.i.d.with distribution µ . Let ε > . There exists C > such that, for any isometry g , P ( ∀ n, d ( o, gZ n · o ) > d ( o, g · o ) − C ) > − ε. The point of the lemma is that the possible loss C is uniform in g . Without momentassumptions on µ , it is not possible to get a better bound, contrary to the case of walkswith an exponential moment (compare [BMSS20, Theorem 2.12]). Proof.
We follow the same construction as at the beginning of this subsection to reconstructthe random walk, but adding the isometry g before the first step of the random walk. Sincethe estimates of Subsection 4.1 are uniform in w , replacing w with gw does not changethem. Therefore, the number u n := (cid:12)(cid:12)(cid:12) P ( n ) τ ( n ) (cid:12)(cid:12)(cid:12) of pivotal times for the random walk at time n still satisfies the estimate of Proposition 4.11: there exists κ > (independent of g ) suchthat P ( u n κn ) e − κn .Let us fix n such that P i > n e − κi < ε/ . On a set A g of probability at least − ε/ (whichmay depend on g ), one has for all i > n the inequality u i > κi > κn . As in the proof ofProposition 4.13, one can then find a point x n such that, for all i > n , the points gZ i · o belong to the (2 C + 6 δ ) -shadow of x n seen from o . In particular, by Lemma 3.1, d ( gZ i · o, o ) > d ( o, x n ) − C − δ. Moreover, x n is of the form gZ k · o for some k n .By measurability, we can find a set A (independent of g ) of measure at least − ε/ anda constant C such that, for all ω ∈ A and all k n , holds d ( o, Z k · o ) C .Consider ω ∈ A g ∩ A (this set has measure at least − ε ). Then d ( o, x n ) = d ( o, gZ k · o ) > d ( o, g · o ) − d ( g · o, gZ k · o ) = d ( o, g · o ) − d ( o, Z k · o ) > d ( o, g · o ) − C. For all i > n , we get d ( gZ i · o, o ) > d ( o, g · o ) − C − C − δ . For i < n , this estimate alsoholds as d ( o, Z i · o ) C . This proves the lemma, for the constant C + 4 C + 12 δ which isindependent of g . (cid:3) Precise estimates
A more complicated model.
To obtain precise estimates on the rate of convergenceto infinity, we will need to compare the distance to the origin with the sum of independentreal valued random variables corresponding to the size of jumps of the random walk. Thisis done in the next proposition.
Proposition 5.1.
For η ∈ (0 , / , there exists κ = κ ( η ) > with the following property.Let S be an ( η, C , D ) -Schottky set of isometries of a δ -hyperbolic space X with basepoint o , where D is large enough compared to C (for definiteness D > C + 100 δ + 1 is enough).Let ρ , ρ , . . . be probability measures on the isometry set of X . Let R be a nonnegative realrandom variable such that for all i and all M > one has P ρ i ( d ( o, g · o ) > M ) > P ( R > M ) , i.e., the distance with respect to the origin for ρ i dominates stochastically R , for all i . ANDOM WALKS WITHOUT MOMENT CONDITION 19
Let w , w , . . . be fixed isometries of X . Let s , s , . . . be independent random variables,where s i is sampled according to µ S ∗ ρ i ∗ µ S . Define y − n +1 = w s w · · · s n w n · o . Then forall M > , P ( d ( o, y − n +1 ) M ) P ( R + · · · + R ⌊ (1 − η ) n ⌋ M ) + e − κn , where R , R , . . . are independent copies of R . When all the ρ i are the Dirac mass at the origin, then the setting of the proposition isessentially the same as the simple model of Subsection 4.1, except that we are sampling the s i according to µ S instead of µ S (which does not really make a difference). The conclusionin the general setting of Proposition 5.1 is that the growth rate of the distance to the originis at least the growth rate of sums of i.i.d. random variables distributed like the ρ i , up to aminor loss (that tends to when the proportion η of bad elements in the Schottky set tendsto ) and an exponentially small error term. This model will be precise enough to capturethe right growth rate of a general random walk, to prove Theorems 1.2 and 1.3 in the nextparagraphs, in the same way that we have deduced linear escape with exponential estimatesfrom the results on the simple model of Subsection 4.1. The possibility to have differentmeasures ρ i at the different jumps will be important in the application of this propositionin Subsection 5.3, but for the proof the reader may pretend for simplicity that they are allequal to a fixed measure ρ (and then one can take R to be the distribution of d ( o, g · o ) withrespect to ρ ).To prove Proposition 5.1, let us introduce a refined notion of pivotal times, in whichwe will keep the randomness coming from the ρ i . Write s i = a i b i r i c i d i , where a i , b i , c i , d i are distributed according to µ S while r i is distributed according to ρ i . This gives rise to successive points at the i -th transition: y − i = y (0) i = w s · · · s i − w i − · o, y (1) i = w s · · · s i − w i − a i · o,y (2) i = w s · · · s i − w i − a i b i · o, y (3) i = w s · · · s i − w i − a i b i r i · o,y i = y (4) i = w s · · · s i − w i − a i b i r i c i · o, y + i = y (5) i = w s · · · s i − w i − a i b i r i c i d i · o. The distances between two successive points in this list is at least D as it comes from theapplication of an element of the Schottky set S , except for the distance between y (2) i and y (3) i for which we have no lower bound as r i is drawn according to ρ i .Let us define inductively a set of refined pivotal times, that we will denote by ¯ P n todifferentiate it from the previous unrefined notion. We copy the definition of Subsection 4.1.We start from ¯ P = ∅ . Assume that ¯ P n − is defined, and let us define ¯ P n . Let k = k ( n ) be the last pivotal time before n , i.e., k = max( ¯ P n − ) . (If ¯ P n − = ∅ , take k = 0 and let y k = o ). Let us say that the local geodesic condition is satisfied at time n if inthe sequence y k , y (0) n , y (1) n , y (2) n , y (3) n , y (4) n , y (5) n , y − n +1 , all successive points are C -aligned, andmoreover y (1) n , y (3) n , y (4) n are C -aligned (the latter condition is useful to compensate the factthat the jump from y (2) n to y (3) n may be small, preventing us to apply the results on chains ofSubsection 3.2). If the local geodesic condition is satisfied at time n , then we say that n isa refined pivotal time, and we set ¯ P n = ¯ P n − ∪ { n } . 
To prove Proposition 5.1, let us introduce a refined notion of pivotal times, in which we will keep the randomness coming from the $\rho_i$. Write $s_i = a_i b_i r_i c_i d_i$, where $a_i, b_i, c_i, d_i$ are distributed according to $\mu_S$ while $r_i$ is distributed according to $\rho_i$. This gives rise to successive points at the $i$-th transition:
\[ y^-_i = y^{(0)}_i = w_0 s_1 \cdots s_{i-1} w_{i-1} \cdot o, \qquad y^{(1)}_i = w_0 s_1 \cdots s_{i-1} w_{i-1} a_i \cdot o, \]
\[ y^{(2)}_i = w_0 s_1 \cdots s_{i-1} w_{i-1} a_i b_i \cdot o, \qquad y^{(3)}_i = w_0 s_1 \cdots s_{i-1} w_{i-1} a_i b_i r_i \cdot o, \]
\[ y_i = y^{(4)}_i = w_0 s_1 \cdots s_{i-1} w_{i-1} a_i b_i r_i c_i \cdot o, \qquad y^+_i = y^{(5)}_i = w_0 s_1 \cdots s_{i-1} w_{i-1} a_i b_i r_i c_i d_i \cdot o. \]
The distance between two successive points in this list is at least $D$, as it comes from the application of an element of the Schottky set $S$, except for the distance between $y^{(2)}_i$ and $y^{(3)}_i$, for which we have no lower bound as $r_i$ is drawn according to $\rho_i$.

Let us define inductively a set of refined pivotal times, that we will denote by $\bar P_n$ to differentiate it from the previous unrefined notion. We copy the definition of Subsection 4.1. We start from $\bar P_0 = \emptyset$. Assume that $\bar P_{n-1}$ is defined, and let us define $\bar P_n$. Let $k = k(n)$ be the last pivotal time before $n$, i.e., $k = \max(\bar P_{n-1})$ (if $\bar P_{n-1} = \emptyset$, take $k = 0$ and let $y_k = o$). Let us say that the local geodesic condition is satisfied at time $n$ if, in the sequence $y_k, y^{(0)}_n, y^{(1)}_n, y^{(2)}_n, y^{(3)}_n, y^{(4)}_n, y^{(5)}_n, y^-_{n+1}$, all successive points are $C$-aligned, and moreover $y^{(1)}_n, y^{(3)}_n, y^{(4)}_n$ are $C$-aligned (the latter condition is useful to compensate for the fact that the jump from $y^{(2)}_n$ to $y^{(3)}_n$ may be small, preventing us from applying the results on chains of Subsection 3.2). If the local geodesic condition is satisfied at time $n$, then we say that $n$ is a refined pivotal time, and we set $\bar P_n = \bar P_{n-1} \cup \{n\}$. Otherwise, we backtrack to the largest refined pivotal time $m \in \bar P_{n-1}$ for which $y^-_{n+1}$ belongs to the $(C + \delta)$ chain-shadow of $y^+_m$ seen from $y_m$. In this case, we erase all later pivotal times, i.e., we set $\bar P_n = \bar P_{n-1} \cap \{1, \ldots, m\}$. If there is no such pivotal time $m$, we set $\bar P_n = \emptyset$.

For the refined notion, we can prove the analogues of the lemmas of Subsection 4.1.
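Since the inductive definition involves some bookkeeping, it may help to see it schematically as pseudocode. The following Python sketch is only an illustration: the geometric content is hidden in two abstract predicates, and all names are ours.

```python
def update_pivotal_times(pivots, n, local_geodesic_ok, in_chain_shadow):
    """One step of the inductive definition of refined pivotal times.

    pivots: increasing list of refined pivotal times (the set P_{n-1}).
    local_geodesic_ok(k, n): abstract predicate, True when the local
        geodesic condition holds at time n, with k the last pivotal time
        (k = 0 when there is none).
    in_chain_shadow(m, n): abstract predicate, True when y^-_{n+1} lies in
        the (C + delta) chain-shadow of y^+_m seen from y_m.
    Returns the new list P_n.
    """
    k = pivots[-1] if pivots else 0
    if local_geodesic_ok(k, n):
        return pivots + [n]          # n becomes a refined pivotal time
    while pivots:
        m = pivots[-1]
        if in_chain_shadow(m, n):
            return pivots            # backtrack: keep the times up to m only
        pivots = pivots[:-1]         # erase the later pivotal time m
    return []                        # no suitable m remains: P_n is empty
```

Lemma 5.6 below shows that the first branch is taken with probability at least $1 - 7\eta$, and Lemma 5.8 that long backtrackings in the second branch are exponentially unlikely.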
Lemma 5.2. Assume that $\bar P_n$ is nonempty. Let $m$ be its maximum. Then $y^-_{n+1}$ belongs to the $(C + \delta)$ chain-shadow of $y^+_m$ seen from $y_m$.

Proof. The proof is exactly the same as for Lemma 4.2: when there is backtracking, this follows from the definition, and when there is no backtracking (i.e., the last pivotal time is $n$), then the chain $y_n, y^-_{n+1}$ satisfies all the properties needed to show that $y^-_{n+1}$ is in the chain-shadow. □

Lemma 5.3.
Let $\bar P_n = \{k_1 < \cdots < k_p\}$. Then the sequence $y^-_{k_1}, y_{k_1}, y^-_{k_2}, y_{k_2}, \ldots, y_{k_p}, y^-_{n+1}$ is a $(2C + 3\delta, D - C - \delta)$-chain. Moreover, $d(y^-_{k_i}, y_{k_i}) \geq d(o, r_{k_i} \cdot o) + D$ for all $i$.

Proof. This differs a little bit from the proof of Lemma 4.3, as there are more points involved at each pivotal time. It is still a matter of basic chain manipulations, the only difficulty being that the jumps corresponding to $r_i$ and $w_i$ may be short (but since they are surrounded by big jumps with controlled alignment conditions, this can be circumvented easily).

By definition, the points $y_{k_{i-1}}, y^-_{k_i}, y^{(1)}_{k_i}, y^{(2)}_{k_i}, y^{(3)}_{k_i}, y^{(4)}_{k_i}, y^{(5)}_{k_i}$ are successively $C$-aligned. However, the distances between $y_{k_{i-1}}$ and $y^-_{k_i}$ on the one hand, and between $y^{(2)}_{k_i}$ and $y^{(3)}_{k_i}$ on the other hand, are not obviously bounded below (contrary to the other distances, which are $\geq D$), so one cannot apply the results on chains to these points directly. However, we can fix this by removing one point: we claim that
(5.1) $y_{k_{i-1}}, y^-_{k_i}, y^{(1)}_{k_i}, y^{(3)}_{k_i}, y^{(4)}_{k_i} (= y_{k_i}), y^{(5)}_{k_i}$ form a $(C + \delta, D - C - \delta)$-chain.

Let us prove this claim. We may apply Lemma 3.5 to the points $y^-_{k_i}, y^{(1)}_{k_i}, y^{(2)}_{k_i}, y^{(3)}_{k_i}$, with $C' = C$, to deduce that $(y^-_{k_i}, y^{(3)}_{k_i})_{y^{(1)}_{k_i}} \leq C + \delta$. Moreover, Lemma 3.4 gives $d(y^{(1)}_{k_i}, y^{(3)}_{k_i}) \geq d(y^{(1)}_{k_i}, y^{(2)}_{k_i}) - (y^{(1)}_{k_i}, y^{(3)}_{k_i})_{y^{(2)}_{k_i}} \geq D - C$. Moreover, $d(y_{k_{i-1}}, y^-_{k_i}) \geq D - C - \delta$ by Lemma 3.10, as $y^-_{k_i}$ is in the $(C + \delta)$ chain-shadow of $y^+_{k_{i-1}}$ seen from $y_{k_{i-1}}$, by Lemma 5.2. Finally, note that $(y^{(1)}_{k_i}, y^{(4)}_{k_i})_{y^{(3)}_{k_i}} \leq C$ by the last assumption in the local geodesic condition. We have checked all the nontrivial properties in (5.1), completing its proof.

We have in particular $d(y_{k_{i-1}}, y^-_{k_i}) \geq D - C - \delta$, and also, by (3.3),
(5.2) $d(y^-_{k_i}, y_{k_i}) = d(y^-_{k_i}, y^{(4)}_{k_i}) \geq d(y^-_{k_i}, y^{(1)}_{k_i}) + d(y^{(1)}_{k_i}, y^{(3)}_{k_i}) + d(y^{(3)}_{k_i}, y^{(4)}_{k_i}) - 2(C + 2\delta)$.

By Lemma 3.4 applied to $y^{(1)}_{k_i}, y^{(2)}_{k_i}, y^{(3)}_{k_i}$,
\[ d(y^{(1)}_{k_i}, y^{(3)}_{k_i}) \geq d(y^{(2)}_{k_i}, y^{(3)}_{k_i}) - (y^{(1)}_{k_i}, y^{(3)}_{k_i})_{y^{(2)}_{k_i}} \geq d(o, r_{k_i} \cdot o) - C. \]
The two other distances in (5.2) are bounded below by $D$. Using $D \geq 2(C + 2\delta) + C$, we obtain $d(y^-_{k_i}, y_{k_i}) \geq D + d(o, r_{k_i} \cdot o)$. This proves all the distance conditions in the claim of the lemma.
Let us now check the Gromov product estimates. Applying Lemma 3.7 to the chain (5.1), we get $(y_{k_{i-1}}, y_{k_i})_{y^-_{k_i}} \leq C + 2\delta \leq 2C + 3\delta$, proving one of the desired estimates. The other one is $(y^-_{k_i}, y^-_{k_{i+1}})_{y_{k_i}} \leq 2C + 3\delta$. To prove it, let us apply Lemma 3.5 to the points $y^-_{k_i}, y_{k_i}, y^+_{k_i}, y^-_{k_{i+1}}$. The Gromov product of the last three is at most $C + 3\delta$ by Lemmas 5.2 and 3.10, and the Gromov product of the first three is at most $C + 2\delta$ by applying Lemma 3.7 to the reverse of the chain (5.1). Moreover, the distance $d(y_{k_i}, y^+_{k_i})$ is at least $D$, large enough. Therefore, Lemma 3.5 indeed applies with $C' = 2C + 2\delta$, and gives $(y^-_{k_i}, y^-_{k_{i+1}})_{y_{k_i}} \leq 2C + 3\delta$, as claimed. □

The first point in the previous chain can be replaced with $o$:

Lemma 5.4.
Let $\bar P_n = \{k_1 < \cdots < k_p\}$. Then the sequence $o, y_{k_1}, y^-_{k_2}, y_{k_2}, \ldots, y_{k_p}, y^-_{n+1}$ is a $(2C + 4\delta, D - C - \delta)$-chain. Moreover, $d(o, y_{k_1}) \geq d(o, r_{k_1} \cdot o) + D - C - 3\delta$.

Proof. The only difference compared to the proof of Lemma 4.4 is that we do not have the inequality $(y_{k_1}, o)_{y^-_{k_1}} \leq C$, due to the more complicated definition of refined pivotal times. If we can prove that $(y_{k_1}, o)_{y^-_{k_1}} \leq C + 3\delta$, the proof of Lemma 4.4 goes through. Let us check this inequality.

As in (5.1), the points $y^-_{k_1}, y^{(1)}_{k_1}, y^{(3)}_{k_1}, y^{(4)}_{k_1} (= y_{k_1}), y^{(5)}_{k_1}$ form a $(C + \delta, D - C - \delta)$-chain. Therefore, $(y^-_{k_1}, y_{k_1})_{y^{(1)}_{k_1}} \leq C + 2\delta$ by Lemma 3.7. Moreover, $(o, y^{(1)}_{k_1})_{y^-_{k_1}} \leq C$ by the definition of pivotal times. As $d(y^{(1)}_{k_1}, y^-_{k_1}) \geq D$ is large, it follows that Lemma 3.5 applies to the points $o, y^-_{k_1}, y^{(1)}_{k_1}, y_{k_1}$ with $C' = C + 2\delta$. It gives $(y_{k_1}, o)_{y^-_{k_1}} \leq C + 3\delta$, concluding the proof that we have a chain.

Moreover, Lemma 3.4 together with Lemma 5.3 give
\[ d(o, y_{k_1}) \geq d(y^-_{k_1}, y_{k_1}) - (o, y_{k_1})_{y^-_{k_1}} \geq (d(o, r_{k_1} \cdot o) + D) - (C + 3\delta), \]
proving the last claim. □
Proposition 5.5. Let $\bar P_n = \{k_1 < \cdots < k_p\}$. We have $d(o, y^-_{n+1}) \geq \sum_i d(o, r_{k_i} \cdot o)$.

Proof. This follows from Lemmas 5.3 and 5.4, saying that we have a chain between $o$ and $y^-_{n+1}$ with jumps of size at least $d(o, r_{k_i} \cdot o) + D - C - 3\delta$, and from Lemma 3.7, saying that the distance grows at least as the size of the jumps along a chain. □
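Schematically, and loosely (the precise statements are those of Subsection 3.2), the mechanism is the one already used in (5.2): along the chain $o = x_0, x_1, \ldots, x_q = y^-_{n+1}$ of Lemma 5.4, each intermediate point costs a bounded amount, giving an estimate of the shape
\[ d(o, y^-_{n+1}) \geq \sum_{j=0}^{q-1} d(x_j, x_{j+1}) - (q - 1)(2C + 5\delta), \]
and since each jump associated to a pivotal time $k_i$ has length at least $d(o, r_{k_i} \cdot o) + D - C - 3\delta$, with $D$ much larger than $C$ and $\delta$, the losses are absorbed and only $\sum_i d(o, r_{k_i} \cdot o)$ remains.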
To prove Proposition 5.1, it follows that we should show that there are many refined pivotal times. For this, we follow the same strategy as in Subsection 4.1.

Lemma 5.6.
Fix $s_1, \ldots, s_n$, and draw $s_{n+1}$ according to $\mu_S^{*2} * \rho_{n+1} * \mu_S^{*2}$. The probability that $|\bar P_{n+1}| = |\bar P_n| + 1$ (i.e., that $n + 1$ gets added as a refined pivotal time) is at least $1 - 7\eta$.

Proof. In the local geodesic condition, there are 7 alignment conditions to be satisfied. When drawing $s_{n+1}$ according to $\mu_S^{*2} * \rho_{n+1} * \mu_S^{*2}$, each of them is satisfied with probability at least $1 - \eta$ (for each of them, this can be seen by fixing all variables but one and using that the last one is picked from a Schottky set). Therefore, they are simultaneously satisfied with probability at least $1 - 7\eta$. □
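Explicitly, the last step is a union bound: if $A_1, \ldots, A_7$ denote the seven alignment events, each of probability at least $1 - \eta$, then
\[ P\Bigl(\bigcap_{i=1}^7 A_i\Bigr) \geq 1 - \sum_{i=1}^7 P(A_i^c) \geq 1 - 7\eta. \]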
To control the backtracking, we use pivoted sequences, as in Subsection 4.1. Given $\bar s = (s_1, \ldots, s_n)$, let us say that another sequence $\bar s' = (s'_1, \ldots, s'_n)$ is pivoted from $\bar s$ if they have the same refined pivotal times, $d'_k = d_k$ at all times, and $a'_k = a_k$, $b'_k = b_k$, $r'_k = r_k$, $c'_k = c_k$ at times which are not refined pivotal times. In other words, we freeze the last jump $d_k$, but we keep the freedom in the other parts of $s_k$, at refined pivotal times only.

The next lemma is proved exactly like Lemma 4.7.

Lemma 5.7.
Let $i$ be a refined pivotal time of $\bar s = (s_1, \ldots, s_n)$. Replace $s_i = a_i b_i r_i c_i d_i$ with $s'_i = a'_i b'_i r'_i c'_i d_i$ which still satisfies the local geodesic condition (with $n$ replaced by $i$). Then $(s_1, \ldots, s'_i, \ldots, s_n)$ is pivoted from $\bar s$.

Denote by $\bar E_n(\bar s)$ the sequences which are pivoted from $\bar s$. Conditionally on $\bar E_n(\bar s)$, the variables $s'_i$ over pivotal times $i$ are independent, but drawn from distributions that depend on $i$.

Lemma 5.8.
Let $\bar s = (s_1, \ldots, s_n)$ be a trajectory with $q$ refined pivotal times. We condition on $\bar E_n(\bar s)$, and we draw $s_{n+1}$ according to $\mu_S^{*2} * \rho_{n+1} * \mu_S^{*2}$. Then, for all $j \geq 0$,
\[ P(|\bar P_{n+1}| < q - j \mid \bar E_n(\bar s)) \leq (7\eta)^{j+1}. \]
Proof. The proof is essentially the same as for Lemma 4.8. Assume that $s_{n+1}$ is fixed and gives rise to some backtracking. Let us show that further backtracking happens with probability at most $7\eta$, from which the estimate follows inductively. Let $m < k$ be the last two refined pivotal times, and let $x_{i-1}$ be the last point in a chain from $y_m$ to $y^-_k$ witnessing that $y^-_k \in CS_{y_m}(y^+_m; C + \delta)$, as guaranteed by Lemma 5.2.

In $s'_k$, let us condition also with respect to $b'_k, r'_k, c'_k$ compatible with the local geodesic condition. Then the total number of possible values for $a'_k$ that give rise to an $s'_k$ satisfying the local geodesic condition is at least $(1 - \eta)|S|$, as one should ensure the condition $((a'_k)^{-1} \cdot o, b'_k \cdot o)_o \leq C$ and $S$ is a Schottky set. Among these, the values of $a'_k$ that may give rise to further backtracking are those for which the points $x_{i-1}, y^-_k, y^{(1)}_k, y^-_{n+1}$ are not $C$-aligned, because this alignment would imply $y^-_{n+1} \in CS_{y_m}(y^+_m; C + \delta)$ (as in the proof of Lemma 4.8) and would block the backtracking. By the Schottky condition applied twice, there are at most $2\eta|S|$ such $a'_k$. Therefore, the probability of further backtracking is at most $2\eta/(1 - \eta) \leq 7\eta$. □

Lemma 5.9.
Let $A_n = |\bar P_n|$ be the number of pivotal times. Then, in distribution, $A_{n+1} \succeq A_n + U$, where $U$ is a random variable independent from $A_n$ and distributed as follows:
\[ P(U = -j) = (1 - 7\eta)(7\eta)^j \text{ for } j \geq 1, \qquad P(U = 0) = 0, \qquad P(U = 1) = 1 - 7\eta. \]
In other words, $P(A_{n+1} \geq i) \geq P(A_n + U \geq i)$ for all $i$.

Proof. This is proved exactly like Lemma 4.9, using Lemma 5.8. □
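Let us record, for convenience, the elementary computations behind this distribution (with the constants as above). Since $\sum_{j \geq 1} (7\eta)^j = 7\eta/(1 - 7\eta)$, the total mass is
\[ (1 - 7\eta) + \sum_{j \geq 1} (1 - 7\eta)(7\eta)^j = (1 - 7\eta) + 7\eta = 1, \]
and since $\sum_{j \geq 1} j (7\eta)^j = 7\eta/(1 - 7\eta)^2$, the expectation is
\[ E(U) = (1 - 7\eta) - (1 - 7\eta) \sum_{j \geq 1} j(7\eta)^j = (1 - 7\eta) - \frac{7\eta}{1 - 7\eta}, \]
which is close to $1$ when $\eta$ is small. This is the value used in the proof of Proposition 5.10 below.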
Proposition 5.10.
There exists $\kappa > 0$, depending only on $\eta$, such that for all $n$,
\[ P(|\bar P_n| \leq (1 - 20\eta) n) \leq e^{-\kappa n}. \]
Proof.
Let $U_1, U_2, \ldots$ be a sequence of independent copies of the variable $U$ from Lemma 5.9. Iterating this lemma gives $P(|\bar P_n| \geq i) \geq P(U_1 + \cdots + U_n \geq i)$ for all $i$. In particular, $P(|\bar P_n| \leq (1 - 20\eta) n) \leq P(U_1 + \cdots + U_n \leq (1 - 20\eta) n)$. The $U_i$ are real random variables with an exponential moment, and expectation $(1 - 7\eta) - 7\eta/(1 - 7\eta) \geq 1 - 15\eta > 1 - 20\eta$. Large deviations for sums of i.i.d. real random variables ensure that $P(U_1 + \cdots + U_n \leq (1 - 20\eta) n)$ is exponentially small. □
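The proposition is easy to observe numerically. The following self-contained Python sketch is an illustration only: it simulates the dominating walk $U_1 + \cdots + U_n$ of Lemma 5.9, not the actual random walk on $X$, and the function names are ours.

```python
import random

def sample_U(eta):
    """Sample U from Lemma 5.9 (with the constants as reconstructed above):
    P(U = 1) = 1 - 7*eta, P(U = -j) = (1 - 7*eta) * (7*eta)**j for j >= 1."""
    p = 7 * eta
    if random.random() < 1 - p:
        return 1
    # Conditionally on U <= -1, the value j is geometric with success
    # probability 1 - p: P(U = -j | U <= -1) = (1 - p) * p**(j - 1).
    j = 1
    while random.random() < p:
        j += 1
    return -j

def bad_event_frequency(eta, n, trials=10_000):
    """Estimate P(U_1 + ... + U_n <= (1 - 20*eta) * n); by Lemma 5.9 this
    dominates P(|P_n| <= (1 - 20*eta) * n) from above."""
    threshold = (1 - 20 * eta) * n
    bad = sum(
        1
        for _ in range(trials)
        if sum(sample_U(eta) for _ in range(n)) <= threshold
    )
    return bad / trials

# The frequency should decay roughly exponentially in n, as in Proposition 5.10.
for n in (50, 100, 200, 400):
    print(n, bad_event_frequency(1 / 100, n))
```

With $\eta = 1/100$, the mean of $U$ is about $0.85$, above the threshold $1 - 20\eta = 0.8$, so the printed frequencies collapse quickly as $n$ grows.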
Proof of Proposition 5.1. We want to bound $P(d(o, y^-_{n+1}) \leq M)$. By Proposition 5.10, we have
(5.3) $P(d(o, y^-_{n+1}) \leq M) \leq P(d(o, y^-_{n+1}) \leq M,\ |\bar P_n| > (1 - 20\eta) n) + e^{-\kappa n}$.
Therefore, we may focus on trajectories with $|\bar P_n| > (1 - 20\eta) n$. Let $\bar s = (s_1, \ldots, s_n)$ be such a trajectory, and $\bar E_n(\bar s)$ its equivalence class under the pivotal relation. We will estimate $P(d(o, y^-_{n+1}) \leq M \mid \bar E_n(\bar s))$.

Along $\bar E_n(\bar s)$, we have $d(o, y^-_{n+1}) \geq \sum_{i=1}^p d(o, r_{k_i} \cdot o)$ by Proposition 5.5, where the pivotal times are $k_1 < \cdots < k_p$ with $p > (1 - 20\eta) n$. Conditionally on $\bar E_n(\bar s)$, the variable $d(o, r_{k_i} \cdot o)$ at each pivotal time stochastically dominates $B_i R_i$, where the $B_i$ are independent Bernoulli random variables with $P(B_i = 1) = 1 - \eta$, and the $R_i$ are independent copies of $R$, independent of the $B_i$.
Since the $B_i$ have expectation $1 - \eta$, the probability $P(\sum_{i=1}^n B_i \leq (1 - 2\eta) n)$ is exponentially small. We get
\[ P(d(o, y^-_{n+1}) \leq M) \leq P\Bigl(\sum_{i=1}^{\lfloor (1 - 20\eta) n\rfloor} B_i R_i \leq M,\ \sum_{i=1}^n B_i > (1 - 2\eta) n\Bigr) + e^{-\kappa' n}. \]
To estimate the probability on the right, let us condition with respect to the $B_i$. There are at most $2\eta n$ of them that vanish. Therefore, $\sum B_i R_i$ is a sum of at least $(1 - 22\eta) n \geq \lfloor (1 - 30\eta) n\rfloor$ independent copies of $R$, and the probability that the sum is at most $M$ is bounded by $P(\sum_{i=1}^{\lfloor (1 - 30\eta) n\rfloor} R_i \leq M)$. As this estimate is uniform over the choice of the $B_i$'s, this concludes the proof. □
5.2. Precise estimates for walks without first moment. In this paragraph, we consider a discrete probability measure $\mu$ on the set of isometries of $X$ which has no first moment: $E(d(o, g \cdot o)) = \infty$ when $g$ is drawn according to $\mu$. We will prove Theorems 1.2 and 1.3 under this assumption. It suffices to prove the latter, as the former follows readily.

Let $r > 0$ be arbitrary. We have to show the existence of $\kappa > 0$ such that $P((Z_n \cdot o, Z_\infty)_o \leq rn) \leq e^{-\kappa n}$. Let $\eta = 1/200$. Let $S$ be an $(\eta, C, D)$-Schottky set in the support of $\mu^{*M}$ for some $M \geq 1$, where $D$ is large enough compared to $C$, as given by Corollary 3.13. We follow the construction in Paragraph 4.2 to reconstruct the $\mu$-random walk, except that instead of sampling the specific jumps from $\mu_S^{*2}$, we will sample them from $\mu_S^{*2} * \mu * \mu_S^{*2}$: for $N = 4M + 1$ and some $\alpha > 0$, we may write $\mu^{*N} = \alpha\, \mu_S^{*2} * \mu * \mu_S^{*2} + (1 - \alpha)\nu$ for some probability measure $\nu$, where $\mu_S$ is the uniform measure on $S$.

The random walk is reconstructed by starting from Bernoulli random variables $\varepsilon_i$ (satisfying $P(\varepsilon_i = 1) = \alpha$ and $P(\varepsilon_i = 0) = 1 - \alpha$), and sampling from $\mu_S^{*2} * \mu * \mu_S^{*2}$ when $\varepsilon_i = 1$ and from $\nu$ when $\varepsilon_i = 0$. Conditioning on $(\varepsilon_i)$ and on the jumps when $\varepsilon_i = 0$, we are left with a walk as in Proposition 5.1. For this walk, we define a sequence of refined pivotal times as in Subsection 5.1. Let $\tau = \tau(n)$ be the last index $j$ such that $N(t_j + 1) \leq n$, so that the interval $[N t_j, N(t_j + 1))$ is contained in $[0, n)$. Then the sequence of refined pivotal times associated to the walk until time $n$ has the form $\bar P_1, \bar P_2, \ldots, \bar P_{\tau - 1}, \bar P^{(n)}_\tau$. Moreover, $u_n := |\bar P^{(n)}_{\tau(n)}|$ satisfies
(5.5) $P(u_n \leq \kappa n) \leq e^{-\kappa n}$
for some $\kappa > 0$: this is proved as Proposition 4.11, just using Proposition 5.10 instead of Proposition 4.10 inside the proof.

Assume now that the walk converges at infinity (this is true almost everywhere) and that $u_k > \kappa k$ for all $k \geq n$ (this is true outside of a set of exponentially small measure, by summing the estimates in (5.5)). Let $x = x_n$ be the position of the walk at the $(\lfloor \kappa n\rfloor - 1)$-th refined pivotal time in $\bar P^{(n)}_{\tau(n)}$. Then for all $k \geq n$, the point $Z_k \cdot o$ belongs to the $(2C + 6\delta)$-shadow of $x$ seen from $o$ (this is proved just like Lemma 4.12, using Lemma 5.4). As in (4.6), this implies the inequality $(Z_n \cdot o, Z_\infty)_o \geq d(o, x_n) - (2C + 9\delta)$.
Finally, we have
\[ P((Z_n \cdot o, Z_\infty)_o \leq rn) \leq e^{-\kappa n} + P(u_n > \kappa n,\ d(o, x_n) \leq rn + (2C + 9\delta)). \]
Let us estimate the rightmost probability. We condition on the $(\varepsilon_i)$ (which fixes $\tau$) and on the jumps when $\varepsilon_i = 0$, to be in the setting of Subsection 5.1. As $x$ is one of the points $y^-_{k+1}$ for $(\kappa/2) n \leq k \leq n$, we can sum the estimates of Proposition 5.1 (applied to $k$ instead of $n$), to get a bound of the form
\[ n\, P(R_1 + \cdots + R_{\lfloor (1 - 30\eta)(\kappa/2) n\rfloor} \leq (r + 1) n), \]
where the $R_i$ are independent random variables distributed like $d(o, g \cdot o)$ where $g$ is drawn according to $\mu$. Letting $\beta = (1 - 30\eta)\kappa/2 > 0$, we get
\[ P((Z_n \cdot o, Z_\infty)_o \leq rn) \leq e^{-\kappa n} + n\, P(R_1 + \cdots + R_{\lfloor \beta n\rfloor} \leq (r + 1) n). \]
Since we are assuming that $\mu$ has no first moment, the nonnegative random variables $R_i$ are not integrable. Applying the usual large deviations estimate to a truncated version of $R$, we deduce that for any $A > 0$ there exists $c(A) > 0$ such that $P(R_1 + \cdots + R_k \leq A k) \leq e^{-c(A) k}$. Together with the previous equation, this gives an exponential bound on $P((Z_n \cdot o, Z_\infty)_o \leq rn)$. This concludes the proof of Theorem 1.3 (and therefore also of Theorem 1.2) when there is no first moment. □
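Let us spell out the truncation step just used, as it is standard but compressed above. For $T > 0$, the variables $R_i \wedge T$ are bounded, and $E(R \wedge T) \to E(R) = \infty$ as $T \to \infty$ by monotone convergence. Given $A > 0$, choose $T$ with $E(R \wedge T) \geq 2A$. Since $R_i \geq R_i \wedge T$,
\[ P(R_1 + \cdots + R_k \leq A k) \leq P\bigl((R_1 \wedge T) + \cdots + (R_k \wedge T) \leq A k\bigr) \leq e^{-c(A) k}, \]
where the last inequality is the usual large deviations bound for i.i.d. bounded variables, whose mean, at least $2A$, exceeds the threshold $A$.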
5.3. Precise estimates for walks with a first moment. Assume now that $\mu$ is a measure with a first moment. Then $E_{\mu^{*n}}(d(o, g \cdot o))/n$ converges by subadditivity to a limit $\ell$, the escape rate of the walk. Let $r < \ell$. Our goal in this paragraph is to prove Theorem 1.3 (and therefore also Theorem 1.2) in this setting: we will show that, for some $\kappa > 0$, we have
\[ P((Z_n \cdot o, Z_\infty)_o \leq rn) \leq e^{-\kappa n}. \]
To prove this estimate, we will again use the refined model of Subsection 5.1, but we will have to do so in a careful enough way.

Fix $\eta > 0$ small enough depending only on $r$ and $\ell$ (how small will be prescribed at the very end of the proof). By Corollary 3.13, there exists an $(\eta, C, D)$-Schottky set $S$ in the support of $\mu^{*M}$ for some $M \geq 1$, where $D$ is large enough compared to $C$. For $N = 2M$, we may write $\mu^{*N} = \alpha \mu_S^{*2} + (1 - \alpha)\nu$ for some probability measure $\nu$. Replacing $\alpha$ with $\alpha/2$ if necessary, we can also assume that $\nu$ is non-elementary.

Let us now fix $A \geq 1$ very large (how large will be prescribed in the course of the proof, depending on $\eta$, $\alpha$ and $\nu$). Let $\varepsilon_i$ be a sequence of Bernoulli random variables, equal to $1$ with probability $\alpha$ and to $0$ with probability $1 - \alpha$. Define inductively a sequence of times $t_1, t'_1, t_2, t'_2, \ldots$ as follows. First, $t_1$ is the first time with $\varepsilon_{t_1} = 1$. Then $t'_1$ is the smallest time $> t_1 + A$ with $\varepsilon_{t'_1} = 1$. Then $t_2$ is the smallest time $> t'_1$ with $\varepsilon_{t_2} = 1$. And so on, picking the first times where $\varepsilon_i = 1$ but keeping a gap at least $A$ between $t_i$ and $t'_i$. Then, pick $\gamma_n$ distributed according to the following measure: if $n$ is of the form $t_i$ or $t'_i$, use $\mu_S^{*2}$. If $n$ is in $[t_i + 1, t_i + A]$, use $\mu^{*N}$. Otherwise, use $\nu$.
Claim 5.11. With this construction, $\gamma_0 \gamma_1 \cdots \gamma_{n-1}$ is distributed like $Z_{Nn}$.

Proof. Conditionally on $\varepsilon_0, \ldots, \varepsilon_{n-1}$ and on $\gamma_0, \ldots, \gamma_{n-1}$, we will show that $\gamma_n$ is distributed according to $\mu^{*N}$, from which the result follows. Consider the maximal $t_j$ or $t'_j$ before $n$. If it is a $t_j$ and $n \leq t_j + A$, then $\gamma_n$ is picked according to $\mu^{*N}$ by definition, and there is nothing left to prove. Otherwise, the choice of the measure for $\gamma_n$ depends on $\varepsilon_n$: we use $\mu_S^{*2}$ if $\varepsilon_n = 1$ (with probability $\alpha$) or $\nu$ if $\varepsilon_n = 0$ (with probability $1 - \alpha$). Altogether, $\gamma_n$ is drawn according to $\alpha \mu_S^{*2} + (1 - \alpha)\nu = \mu^{*N}$, proving the claim. □
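The construction of the times $t_i$, $t'_i$ and of the measure used at each step is algorithmic, and may be easier to parse as pseudocode. The following Python sketch is an illustration of ours: the three samplers are opaque stand-ins for $\mu_S^{*2}$, $\mu^{*N}$ and $\nu$, and all names are ours.

```python
import random

def sample_decomposition(n, alpha, A, sample_mu_S2, sample_mu_N, sample_nu):
    """Sample (gamma_0, ..., gamma_{n-1}) following the construction above.
    Returns the list gamma and the marker times (t, tp) = (t_j, t'_j)."""
    eps = [1 if random.random() < alpha else 0 for _ in range(n)]
    t, tp = [], []
    expecting_t = True  # alternately look for t_j, then t'_j
    floor = 0           # the next marker must occur at a time >= floor
    for i in range(n):
        if eps[i] == 1 and i >= floor:
            if expecting_t:
                t.append(i)
                floor = i + A + 1   # enforce the gap: t'_j > t_j + A
            else:
                tp.append(i)
                floor = i + 1
            expecting_t = not expecting_t
    in_window = set()
    for tj in t:
        in_window.update(range(tj + 1, tj + A + 1))  # the block [t_j+1, t_j+A]
    markers = set(t) | set(tp)
    gamma = []
    for i in range(n):
        if i in markers:
            gamma.append(sample_mu_S2())  # Schottky letters at t_j and t'_j
        elif i in in_window:
            gamma.append(sample_mu_N())   # forced mu^{*N} steps after t_j
        else:
            gamma.append(sample_nu())     # remaining steps use nu
    return gamma, (t, tp)
```

Claim 5.11 asserts that, despite the case analysis, each $\gamma_i$ produced in this way is distributed according to $\mu^{*N}$ conditionally on the past, so that the concatenation reproduces the law of $Z_{Nn}$.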
With a standard coupling argument, extending $\Omega$ if necessary, we can also construct on $\Omega$ a sequence of independent random variables $g_0, g_1, \ldots$ with distribution $\mu$ such that $\gamma_i = g_{iN} \cdots g_{iN + N - 1}$.

The intuition behind the use of this decomposition is the following. Since $\alpha$ is possibly small, the times with $\varepsilon_i = 1$, which have frequency $\alpha$, may be sparse. However, if $A$ is much larger than $1/\alpha$, the waiting time between $t_i + A$ and $t'_i$, or between $t'_i$ and $t_{i+1}$, will be comparatively much shorter. Therefore, the walk will be essentially a concatenation of jumps corresponding to $\mu^{*NA}$. These jumps essentially go in independent directions (this is formalized precisely by Proposition 5.1), so the size of the walk at time $NAk$ will be bounded below by the sum of $(1 - \eta) k$ independent random variables distributed like jumps of $\mu^{*NA}$, which are of order $NA\ell$. Altogether, the probability to have size smaller than $(1 - \eta) NAk\ell$ at time roughly $NAk$ will be exponentially small, proving Theorem 1.2 in this setting.

To make this precise, we will need to control quantitatively the waiting times. Also, the distribution of the jumps between $t_i$ and $t'_i$ is not $\mu^{*NA}$, but $\mu^{*NA} * \nu^{*(t'_i - (t_i + A))}$. We will have to show that the jumps of this family of measures are uniformly controlled from below, to be able to apply Proposition 5.1. Note that this application motivates why we had to formulate this proposition using different measures $\rho_i$ for the different jumps, instead of one single measure $\rho$.

Let us start the proof, adapting the formalism of Subsection 4.2 to our current setting. Fix $n \in \mathbb N$. We let $\tau = \tau(n)$ be the last index $j$ such that $N(t'_j + 1) \leq n$, so that the interval $[N t_j, N(t'_j + 1))$ is contained in $[0, n)$. We will decompose the product $g_0 \cdots g_{n-1}$ as a product of the elements $s'_j$ (the product of all $g_i$ for $i \in [N t_j, N(t'_j + 1))$) interspersed with other words that we will consider as fixed, to be in the framework of Subsection 5.1. Let $w_j = g_{N(t'_j + 1)} \cdots g_{N t_{j+1} - 1}$ (where by convention $t'_0 = -1$), and let $w' = w'(n) = g_{N(t'_{\tau(n)} + 1)} \cdots g_{n-1}$ be the last missing word (it really depends on $n$, contrary to the previous words that just fill the gaps between the blocks $[t_j, t'_j]$). By construction,
\[ Z_n \cdot o = w_0 s'_1 w_1 \cdots w_{\tau - 1} s'_\tau w'(n) \cdot o. \]
We can associate to this decomposition a sequence of refined pivotal times $\bar P^{(n)}_1, \ldots, \bar P^{(n)}_\tau$, where the exponent $(n)$ is here to emphasize that the intermediate words we use depend on $n$. In fact, the only word that really depends on $n$ is the last word $w' = w'(n)$, as the other ones are $w_j = g_{N(t'_j + 1)} \cdots g_{N t_{j+1} - 1}$, so they only depend on the $t_j$ and $t'_j$. Hence, the sequence of refined pivotal times is rather $\bar P_1, \bar P_2, \ldots, \bar P_{\tau - 1}, \bar P^{(n)}_\tau$.

If we condition on the $\varepsilon_i$ (which fixes the $t_i$ and $t'_i$), and on the $g_i$ for $i$ not belonging to $\bigcup [N t_j, N(t'_j + 1))$ (which fixes the $w_i$ and $w'(n)$), then we are in the setting of Proposition 5.1, with $\rho_j = \mu^{*NA} * \nu^{*(t'_j - (t_j + A))}$. To apply this proposition, we need to check that jumps with respect to such a measure are uniformly bounded below.
Lemma 5.12.
Assume that $A$ is large enough. Let $R_{NA}$ be the distribution of the size of jumps for $\mu^{*NA}$. Let $B$ be a Bernoulli random variable, equal to $1$ with probability $1 - \eta$ and to $0$ with probability $\eta$, independent of $R_{NA}$. Then, for any $i \geq 0$, for any $M \geq 0$,
\[ P_{\mu^{*NA} * \nu^{*i}}(d(o, g \cdot o) \geq M) \geq P(B R_{NA} \geq M + \eta N A). \]
In other words, the jumps for $\mu^{*NA} * \nu^{*i}$ dominate stochastically $B R_{NA} - \eta N A$, uniformly in $i$.

Proof. We have
\[ P_{\mu^{*NA} * \nu^{*i}}(d(o, g \cdot o) \geq M) = \sum_h \mu^{*NA}(h)\, P_{\nu^{*i}}(d(o, h g \cdot o) \geq M). \]
By Lemma 4.14 applied to the nonelementary measure $\nu$ and to $\varepsilon = \eta$, there exists $C > 0$ such that, uniformly in $h$, with probability at least $1 - \eta$ with respect to $\nu^{*i}$ for $g$, one has $d(o, h g \cdot o) \geq d(o, h \cdot o) - C$. This gives $P_{\nu^{*i}}(d(o, h g \cdot o) \geq M) \geq (1 - \eta) 1_{d(o, h \cdot o) \geq M + C}$. Therefore,
\[ P_{\mu^{*NA} * \nu^{*i}}(d(o, g \cdot o) \geq M) \geq \sum_{d(o, h \cdot o) \geq M + C} \mu^{*NA}(h)(1 - \eta) = (1 - \eta) P_{\mu^{*NA}}(d(o, h \cdot o) \geq M + C) = (1 - \eta) P(R_{NA} \geq M + C) = P(B R_{NA} \geq M + C). \]
Taking $A$ large enough so that $\eta N A \geq C$, this is bounded from below by $P(B R_{NA} \geq M + \eta N A)$. □

From now on, we will assume that $A$ is large enough so that Lemma 5.12 holds.
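In the proof of Lemma 5.12, the passage from the factor $(1 - \eta)$ to the Bernoulli variable $B$ rests on an elementary identity, recorded here for clarity: for independent $B$ and $R_{NA}$ and any $M' > 0$,
\[ P(B R_{NA} \geq M') = P(B = 1)\, P(R_{NA} \geq M') = (1 - \eta) P(R_{NA} \geq M'). \]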
Lemma 5.13. Assume that $A$ is large enough. The sequence $\tau(n)$ grows like $n/(NA)$ with high probability. More precisely, there exists $c > 0$ such that $P(\tau(n) \leq (1 - \eta) n/(NA)) \leq e^{-cn}$.
Proof. We have
\[ t'_j = Aj + \sum_{i=1}^j (t'_i - (t_i + A)) + \sum_{i=1}^j (t_i - t'_{i-1}). \]
The random variables $t'_i - (t_i + A)$ and $t_i - t'_{i-1}$ are independent and have an exponential tail (just depending on $\alpha$). Therefore, there exist $C > 0$ and $c > 0$ (not depending on $A$) such that
\[ P\Bigl(\sum_{i=1}^j (t'_i - (t_i + A)) + \sum_{i=1}^j (t_i - t'_{i-1}) \geq Cj\Bigr) \leq e^{-cj}. \]
Outside of a set $O_j$ with exponentially small probability, we obtain $t'_j \leq Aj + Cj$. Therefore, $N(t'_j + 1) \leq N(Aj + Cj + 1)$, which is bounded by $NAj/(1 - \eta)$ if $A$ is large enough compared to $C$. Take $j = j(n) = \lfloor (1 - \eta) n/(NA)\rfloor$. It satisfies $NAj/(1 - \eta) \leq n$. On the complement of $O_j$, we have $N(t'_j + 1) \leq n$, and therefore $\tau(n) \geq j$. Hence, the inequality $\tau(n) \leq (1 - \eta) n/(NA)$ can only hold on $O_j$, whose probability is exponentially small in terms of $n$. □
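For instance, $t_i - t'_{i-1}$ is the waiting time for the next index with $\varepsilon = 1$ after $t'_{i-1}$, so it is geometric: $P(t_i - t'_{i-1} > k) = (1 - \alpha)^k$, and in the same way $P(t'_i - (t_i + A) > k) = (1 - \alpha)^k$. This is the source of the exponential tails used in the proof, with constants depending only on $\alpha$.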
Let $u_n := |\bar P^{(n)}_\tau|$ be the number of refined pivotal times up to time $n$.

Lemma 5.14.
There exists $c > 0$ such that $P(u_n \leq (1 - 21\eta) n/(NA)) \leq e^{-cn}$.

Proof. By Lemma 5.13, we have
\[ P(u_n \leq (1 - 21\eta) n/(NA)) \leq e^{-cn} + P(u_n \leq (1 - 21\eta) n/(NA),\ \tau(n) > (1 - \eta) n/(NA)). \]
Let us concentrate on the second set. We condition with respect to the $\varepsilon_i$ (which fixes the $t_i$, the $t'_i$, and $\tau$) and with respect to the $g_i$ outside of the intervals $[N t_j, N(t'_j + 1))$ (which fixes the $w_j$ and $w'$). Once these are fixed, we are in the framework of Subsection 5.1. We may therefore apply Proposition 5.10 and deduce that, conditionally on these quantities, we have $P(u_n \leq (1 - 20\eta)\tau) \leq e^{-c\tau}$ for some $c > 0$. As $\tau > (1 - \eta) n/(NA)$, this gives conditionally $P(u_n \leq (1 - 20\eta)(1 - \eta) n/(NA)) \leq e^{-c(1 - \eta) n/(NA)}$. As $1 - 21\eta \leq (1 - 20\eta)(1 - \eta)$ and the previous bound is uniform over the conditioning, this implies the conclusion. □

Assume now that $Z_k \cdot o$ converges to a point $Z_\infty$ at infinity and that, moreover, $u_k > (1 - 21\eta) k/(NA)$ holds for all $k \geq n$ (this happens outside of a set of exponentially small probability, by Lemma 5.14). Let $\bar t = \bar t(n) = \lfloor (1 - 21\eta) n/(NA)\rfloor < |\bar P^{(n)}_\tau|$, and let $x = x_n$ be the position of the walk at the $\bar t$-th refined pivotal time. An adaptation of Lemma 4.12 to this setting (based on Lemma 5.4) shows that, for all $k \geq n$, the point $Z_k \cdot o$ belongs to the $(2C + 6\delta)$-shadow of $x$ seen from $o$. In turn, as in (4.6), this implies the inequality $(Z_n \cdot o, Z_\infty)_o \geq d(o, x_n) - (2C + 9\delta)$. Finally, we have
\[ P((Z_n \cdot o, Z_\infty)_o \leq rn) \leq e^{-cn} + P(d(o, x_n) \leq rn + (2C + 9\delta)). \]
For large enough $n$, we have $rn + (2C + 9\delta) \leq (r + \eta) n$. Together with Lemma 5.13, we get
\[ P((Z_n \cdot o, Z_\infty)_o \leq rn) \leq e^{-cn} + P(d(o, x_n) \leq (r + \eta) n,\ \tau(n) > (1 - \eta) n/(NA)) \]
for some $c > 0$.

To conclude, it suffices to show that the right-most probability is exponentially small. Let us condition on the $\varepsilon_i$ (which fixes the $t_i$, the $t'_i$ and $\tau$) and on the $g_i$ for $i$ not belonging to $\bigcup [N t_j, N(t'_j + 1))$, to be again in the setting of Subsection 5.1. Note that $\bar t$ is not fixed by this conditioning. However, $x_n$ is one of the points $y^-_{m+1} = w_0 s'_1 w_1 \cdots s'_m w_m \cdot o$, for some $m \geq (1 - 21\eta) n/(NA)$. We claim that it suffices to show that, for such an $m$, we have
(5.6) $P(d(o, y^-_{m+1}) \leq (r + \eta) n) \leq e^{-cm}$.
Indeed, the right hand side is exponentially small in terms of $n$. Summing over $m \in [(1 - 21\eta) n/(NA),\ n/(NA)]$, we get a bound at most $n e^{-c' n}$, which is again exponentially small as desired.

To prove the inequality (5.6), we apply Proposition 5.1, at the time $m$. Lemma 5.12 shows that the stochastic domination assumption of this proposition is satisfied, for $R = B R_{NA} - \eta N A$, where $B$ is a $(1 - \eta)$-Bernoulli random variable. This proposition gives
\[ P(d(o, y^-_{m+1}) \leq (r + \eta) n) \leq P(R_1 + \cdots + R_{\lfloor (1 - 30\eta) m\rfloor} \leq (r + \eta) n) + e^{-cm}, \]
where the $R_i$ are independent copies of $R$. The last term is compatible with (5.6). For the first term, we will apply large deviations for sums of i.i.d. real random variables. We have
\[ E(R_i) = E(R) = (1 - \eta) E(R_{NA}) - \eta N A \geq (1 - \eta) N A \ell - \eta N A, \]
as $E(R_{NA})/(NA)$ is the average drift at time $NA$, which converges to $\ell$ from above by subadditivity. For $z = (1 - \eta) N A \ell - \eta N A < E(R)$, large deviations ensure that $P(R_1 + \cdots + R_k \leq z k)$ is exponentially small in terms of $k$. Therefore, it is enough to show that $(r + \eta) n \leq z (1 - 30\eta) m$ to conclude.
As $m \geq (1 - 21\eta) n/(NA)$, we have
\[ \frac{(r + \eta) n}{z (1 - 30\eta) m} \leq \frac{(r + \eta) n}{((1 - \eta) N A \ell - \eta N A)(1 - 30\eta)(1 - 21\eta) n/(NA)} = \frac{r + \eta}{((1 - \eta)\ell - \eta)(1 - 30\eta)(1 - 21\eta)}. \]
When $\eta$ converges to $0$, this converges to $r/\ell < 1$. Therefore, for small enough $\eta$, it is $\leq 1$, as desired. This concludes the proof of Theorem 1.3 when $\mu$ has a first moment. □
5.4. Continuity of the escape rate. As an illustration of the power of the tools we have introduced above, we can recover the fact that the rate of escape $\ell(\mu)$ depends continuously on the measure $\mu$, a fact that was originally proved in hyperbolic groups by Erschler and Kaimanovich in [EK13] (and which, in the general setting of non-proper hyperbolic spaces, follows from their proof together with the tools of [MT18]).

Proposition 5.15.
Consider a discrete non-elementary measure $\mu$ on the space of isometries of a Gromov-hyperbolic space $X$ with a basepoint $o$. Let $r < \ell(\mu)$. There exist $\varepsilon > 0$ and a finite subset $K$ of the support of $\mu$ with the following property. Let $\mu'$ be a probability measure with $\mu'(g) \geq \mu(g) - \varepsilon$ for all $g \in K$. Then $\ell(\mu') \geq r$.

Even more, there exists $\kappa > 0$ such that, for any $\mu'$ as above, the corresponding random walk $Z'_n$ satisfies for any $n \in \mathbb N$ the inequality
(5.7) $P(d(o, Z'_n \cdot o) \leq rn) \leq e^{-\kappa n}$.

Indeed, all the constants in the proofs in Subsection 5.3 are completely explicit. Once $K$ is chosen large enough and $\varepsilon$ small enough to ensure that $\mu'$ gives a weight bounded from below to all the elements in the Schottky set $S$ chosen at the beginning of this subsection, then all the estimates go through for $\mu'$ just like for $\mu$. In the end, this gives (5.7) with a uniform $\kappa$. This exponential estimate implies $\ell(\mu') \geq r$, as $d(o, Z'_n \cdot o)/n$ converges almost surely to $\ell(\mu')$.

It follows from the proposition that, when $\mu_n$ converges simply to $\mu$, then $\liminf \ell(\mu_n) \geq \ell(\mu)$. This is the nontrivial direction to prove that $\ell(\mu_n) \to \ell(\mu)$, as the other one follows from subadditivity (as $\ell(\mu') = \inf_n (E(d(o, Z'_n \cdot o))/n)$, and each of these quantities for fixed $n$ is continuous in $\mu'$ for the $L^1$ topology). We obtain the following corollary.

Corollary 5.16.
Consider a discrete non-elementary measure $\mu$ on the space of isometries of a Gromov-hyperbolic space $X$ with a basepoint $o$, and a sequence of probability measures $\mu_n$ converging to $\mu$ in the $L^1$ sense, i.e., $\sum_g d(o, g \cdot o)\,|\mu_n(g) - \mu(g)| \to 0$. Then $\ell(\mu_n)$ tends to $\ell(\mu)$.
References

[BMSS20] Adrien Boulanger, Pierre Mathieu, Cagri Sert, and Alessandro Sisto, Large deviations for random walks on hyperbolic spaces, preprint, 2020. Cited pages 1, 2, 3, 7, 9, 10, 15, and 18.
[EK13] Anna Erschler and Vadim Kaimanovich, Continuity of asymptotic characteristics for random walks on hyperbolic groups, Funktsional. Anal. i Prilozhen. 47 (2013), no. 2, 84–89. MR3113872. Cited page 29.
[Fur63] Harry Furstenberg, Noncommuting random products, Trans. Amer. Math. Soc. 108 (1963), 377–428. MR163345. Cited page 1.
[GdlH90] Étienne Ghys and Pierre de la Harpe (eds.), Sur les groupes hyperboliques d’après Mikhael Gromov, Progress in Mathematics, vol. 83, Birkhäuser Boston Inc., Boston, MA, 1990. Papers from the Swiss Seminar on Hyperbolic Groups held in Bern, 1988. MR1086648. Cited page 8.
[MS20] Pierre Mathieu and Alessandro Sisto, Deviation inequalities for random walks, Duke Math. J. 169 (2020), no. 5, 961–1036. Cited page 3.
[MT18] Joseph Maher and Giulio Tiozzo, Random walks on weakly hyperbolic groups, J. Reine Angew. Math. 742 (2018), 187–239. MR3849626. Cited pages 1, 2, 17, and 29.
[Sun20] Matthew Sunderland, Linear progress with exponential decay in weakly hyperbolic groups, Groups Geom. Dyn. 14 (2020), no. 2, 539–566. MR4118628. Cited page 1.

IRMAR, CNRS UMR 6625, Université de Rennes 1, 35042 Rennes, France

Email address: