Exponential bounds for random walks on hyperbolic spaces without moment conditions
aa r X i v : . [ m a t h . P R ] F e b EXPONENTIAL BOUNDS FOR RANDOM WALKS ON HYPERBOLICSPACES WITHOUT MOMENT CONDITIONS
SÉBASTIEN GOUËZEL
Abstract.
We consider nonelementary random walks on general hyperbolic spaces. With-out any moment condition on the walk, we show that it escapes linearly to infinity, withexponential error bounds. We even get such exponential bounds up to the rate of escapeof the walk. Our proof relies on an inductive decomposition of the walk, recording timesat which it could go to infinity in several independent directions, and using these times tocontrol further backtracking. Introduction
Let X be a Gromov-hyperbolic space, with a fixed basepoint o . Fix a discrete probabilitymeasure µ on the space of isometries of X . We assume that µ is non-elementary : in thesemigroup generated by the support of µ , there are two loxodromic elements with disjointfixed points. Let g , g , . . . be independent isometries of X distributed according to µ . Onecan then define a random walk on X given by Z n · o , where Z n = g · · · g n − .In general, results in the literature fall into two classes, qualitative and quantitative,where the second class requires more stringent assumptions on the walk.Without any moment assumption, it is known that Z n · o converges almost surely to apoint on the boundary ∂X , thanks to a beautiful non-constructive argument originally dueto Furstenberg [Fur63] in a matrix setting but that works in our setting when X is proper,and extended to the general situation above by Maher and Tiozzo [MT18]. The idea is to usea stationary measure on the boundary of X and the martingale convergence theorem thereto obtain the convergence of the random walk. When X is not proper, the boundary is notcompact, and showing the existence of a stationary measure on the boundary is a difficultpart of [MT18]. In this article, the authors also show linear progress, in the following sense:there exists κ > such that, almost surely, lim inf d ( o, Z n · o ) /n > κ .Assuming additional moments conditions, one gets stronger results. [MT18] shows that, if µ has finite support, then P ( d ( o, Z n · o ) κn ) is exponentially small, for some κ > (we saythat the walk makes linear progress with exponential decay). The finite support assumptionhas been weakened to an exponential moment condition in [Sun20]. More recently, still underan exponential moment condition, [BMSS20] shows (among many other results) that theexponential bound holds for any κ strictly smaller than the escape rate ℓ = lim E ( d ( o, Z n · o )) /n .When X is a hyperbolic group, one has in fact linear progress with exponential decaywithout any moment assumption: this follows from nonamenability of the group, and the Date : February 3, 2021. fact that the cardinality of balls is at most exponential. This arguments breaks down whenthe space is non-proper, though, as in many interesting examples such as the curve complex.Our goal in this paper is to show that, to have linear progress with exponential decay(even in its strongest versions), there is no need for any moment condition. Define theescape rate of the walk ℓ ( µ ) = lim E ( d ( o, Z n · o )) /n if µ has a moment of order , i.e., P µ ( g ) d ( o, g · o ) < ∞ , and ℓ ( µ ) = ∞ otherwise.Our first result is that the escape rate is positive, with an exponential error term. Theorem 1.1.
Consider a discrete non-elementary measure on the space of isometries ofa Gromov-hyperbolic space X with a basepoint o . Then there exists κ > such that, for all n , P ( d ( o, Z n · o ) κn ) e − κn . One recovers in particular that ℓ ( µ ) > , a fact already proved in [MT18]. The controlin the previous theorem can in fact be established up to the escape rate: Theorem 1.2.
Under the assumptions of Theorem 1.1, consider r < ℓ ( µ ) . Then thereexists κ > such that, for all n , P ( d ( o, Z n · o ) rn ) e − κn . In particular, when µ has no moment of order , this implies that d ( o, Z n · o ) /n → + ∞ almost surely.We also get the corresponding statement concerning directional convergence to infinity.For ξ ∈ ∂X and x, y ∈ X , denote the corresponding Gromov product by(1.1) ( x, ξ ) y = inf z n → ξ lim inf n ( x, z n ) y , where ( x, z n ) y = ( d ( y, x ) + d ( y, z n ) − d ( x, z n )) / is the usual Gromov product inside thespace (see Section 3 for more background on Gromov-hyperbolic spaces). The limit onlydepends on the choice of the sequence z n up to δ . Intuitively, ( x, ξ ) y is the distance from y to a geodesic between x and ξ . It is also the amount that x has moved in the direction of ξ compared to y . A sequence x n converges to ξ if and only if ( x n , ξ ) o → ∞ . Theorem 1.3.
Under the assumptions of Theorem 1.2, Z n · o converges almost surely to apoint Z ∞ ∈ ∂X . Moreover, for any r < ℓ ( µ ) , there exists κ > such that, for all n , P (( Z n · o, Z ∞ ) o rn ) e − κn . Theorem 1.3 readily implies Theorem 1.2 as ( Z n · o, Z ∞ ) o d ( o, Z n · o ) , which followsdirectly from the definition.The convergence statement in Theorem 1.3 is due to [MT18]. The novelty is the quanti-tative exponential bound, without any moment assumption. Note that, in both theorems,when µ has no moment of order , one may take any r > , so the conclusion is superlineargrowth with exponential decay.It follows from subadditivity that, for any r ℓ , the sequence − log( P ( d ( o, Z n · o ) rn )) /n converges to a limit I ( r ) . This is a rate function in the classical sense of large deviationsin probability theory. Theorem 1.2 shows that the rate function is strictly positive for r < ℓ , recovering part of [BMSS20, Theorem 1.2] while removing their exponential momentassumption. Note that [BMSS20] also obtains exponential estimates for upper deviation ANDOM WALKS WITHOUT MOMENT CONDITION 3 inequalities P ( d ( o, Z n · o ) > rn ) for r > ℓ . These estimates can not hold without exponentialmoments, since exponential controls for lower and upper deviation probabilities imply anexponential moment for the measure, see [BMSS20, Subsection 3.1]. Remark 1.4.
The fact that we use discrete measures in the above theorems is for con-venience only, to avoid discussing measurability issues and conditioning on zero measuresets. Suitable versions removing discreteness, but adding measurability and separabilityconditions, hold with the same proofs.Our approach is elementary, in the spirit of [MS20] and [BMSS20] (the latter article isa strong inspiration for our work), and does not rely on any boundary theory. The mainintuition is the following. In the hyperbolic plane, we define a path as follows: walk straighton during a distance d , then turn by an angle θ ¯ θ < π , then walk straight on during adistance d , then turn by an angle θ ¯ θ , and so on. If all the lengths d i are larger than aconstant D = D (¯ θ ) , then this path is essentially going straight to infinity, and at time n itis roughly at distance d + · · · + d n of the origin. The problem when doing a random walkis that the analogues of the angles θ i could be equal to π , i.e., the walker could come backexactly along its footsteps. But this should not happen often. Our main input is a technicalway to justify that indeed it does not happen often, in a precise quantitative version: we willkeep track of some times (called pivotal times below) at which the random walk can choosesome direction, with most choices leading to progress towards infinity (this is implementedthrough the notion of Schottky set coming from [BMSS20]), and at which we will keep somedegree of freedom in an inductive construction. Of course, backtracking can happen lateron, and we will spend the degree of freedom we had kept to still control the behavior afterbacktracking.We could give directly the proof of Theorem 1.3, but it would be very hard to follow.Instead, we will start with proofs of easier statements, and add new ingredients in increas-ingly complicated proofs. Section 2 is devoted to the simplest instance of our proof, in thefree group, where everything is as transparent as possible. Then, Section 3 introduces sometools of Gromov-hyperbolic geometry (notably chains, shadows and Schottky sets) that willbe used to extend the previous proof to a non-tree setting. Section 4 uses these tools in acrude way to prove Theorem 1.1, i.e., linear escape with exponential decay, and also con-vergence at infinity with exponential bounds. Section 5 follows the same strategy but in amore refined way, to get Theorems 1.2 and 1.3.2. Linear escape with exponential decay on free groups
The goal of this section is to illustrate the concept of pivotal times in the simplest possiblesetting. We show that, for a class of measures without moments on the free group, there islinear escape with exponential decay. Of course, this follows from non-amenability. Insteadof the result, what matters here is the proof: the rest of the paper is an extension of the sameidea to technically more involved contexts (general measures, Gromov-hyperbolic spaces),but the main insight can be explained much more transparently in a tree setting.
Theorem 2.1.
Let d > . Let µ be a probability measure on F d that can be written as µ S ∗ ν , where µ S is the uniform probability measure on the canonical generators of F d , and ν is a probability measure with ν ( e ) = 0 . Let Z n = g · · · g n , where the g i are independent ANDOM WALKS WITHOUT MOMENT CONDITION 4 and distributed according to µ . There exists κ > (independent of ν and of d ) such that,for all n , P ( | Z n | κn ) e − κn . Remark 2.2.
The fact that κ can be chosen independently of ν and of d does not followfrom non-amenability, and is really a byproduct of our proof technique. Remark 2.3.
The restrictions d > and ν ( e ) = 0 are simplifying assumptions to havea proof that is as streamlined as possible. In the next sections, we will prove analogoustheorems but for general measures, on general hyperbolic spaces.The key point in the proof of Theorem 2.1 is the next lemma. Lemma 2.4.
There exists κ > satisfying the following. Consider d > and n > . Fix w , . . . , w n nontrivial words in F d , and let Z n = s w · · · s n w n , where the s i are generatorsof F d , chosen uniformly and independently. Then P ( | Z n | κn ) e − κn . This lemma directly implies Theorem 2.1, by conditioning with respect to the realizationsof ν and just keeping the randomness coming from the factor µ S in µ = µ S ∗ ν .To prove the lemma, one wants to argue that the walk does not backtrack too much. Ofcourse, the walk can backtrack completely: as the size of the w i is not controlled, it mayhappen that w n is exactly inverse to s w · · · s n and therefore that Z n = e . However, thisis unlikely to happen for most choices of s , . . . , s n .A difficulty is that the distance to the origin is not well-behaved under the walk. Forinstance, assume that Z n − = e , that w n − is very long (of length n , say) and that forsome generators s and t , one has tw n = ( sw n − ) − . Then Z n − is far away from the origin,and in particular it satisfies the inequality | Z n − | > n . However, Z n is equal to the originif s n − = s and s n = t , which happens with probability / (2 d ) . This is not exponentiallysmall, even though the distance control at time n − is good.For this reason, we will not try to control inductively the distribution of the distance tothe origin. Instead, we will control a number of branching points of the random walk upto time n , that we call pivotal points . In the general case of random walks in hyperbolicspaces, the definition will be quite involved, but for trees one can give a direct definition asfollows. Denote by γ n the path in the Cayley graph of F d corresponding to the walk up to Z n , i.e., the concatenation of the geodesics from e to s then to s w then to s w s and soon until s w s w · · · s n w n = Z n . Definition 2.5.
A time k ∈ [1 , n ] is a pivotal time (with respect to n ) if s k is the inverseneither of the last letter of Z k − , nor of the first letter ( w k ) of w k (so that the path γ n islocally geodesic of length around Z k − ) and moreover the path γ n does not come back to Z k − s k afterwards.We will denote by P n the set of pivotal times with respect to n . In other words, k is pivotal if the walk at time k goes away from the origin during twosteps ( s k and then ( w k ) ) and then remains stuck in the subtree based at Z k − s k ( w k ) .The evolution of the set of pivotal times is not monotone: if the walk backtracks a lot,then many times that were pivotal with respect to n will not be any more pivotal withrespect to n + 1 , since the non-backtracking condition is not satisfied any more. On theother hand, the only possible new pivotal point is the last one: P n +1 ⊆ P n ∪ { n + 1 } . ANDOM WALKS WITHOUT MOMENT CONDITION 5
We will say that a sequence ( s ′ , . . . , s ′ n ) is pivoted from ¯ s = ( s , . . . , s n ) if they have thesame pivotal times and, additionally, s ′ k = s k for all k which is not a pivotal time. Thisis an equivalence relation. Moreover, a sequence has many pivoted sequences: if k is apivotal time and one changes s k to s ′ k which still satisfies the local geodesic condition (i.e., s ′ k is different from the last letter of Z k − and from the first letter of w k ), then we claimthat ( s , . . . , s ′ k , . . . , s n ) is pivoted from ( s , . . . , s n ) . Indeed, the part of γ n originating from Z k − s k ( w k ) never comes back on the edge from Z k − to Z k − s k (not even on its endpoints),so changing s k to s ′ k does not change this fact. Thus the behavior of γ ′ n after Z k − is exactlythe same as that of γ n , but in a different subtree – one has pivoted the end of γ n around Z k − s k , hence the name. In particular, subsequent pivotal times are the same. Moreover,since the trajectory never comes back before Z k − s k , pivotal times before k are not affected,and are the same for γ n and γ ′ n .More generally, denoting the pivotal times by p < · · · < p q , then changing the s p i to s ′ p i still satisfying the local geodesic condition gives a pivoted sequence. Let E n (¯ s ) be the set ofsequences which are pivoted from ¯ s . Conditionally on E n (¯ s ) , the previous discussion showsthat the random variables s ′ p i are independent (but not identically distributed as each ofthem is drawn from some subset of the generators depending on i , of cardinality | S | − or | S | − ). Proposition 2.6.
Let A n = | P n | be the number of pivotal times. Then, in distribution, A n +1 > A n + U where U is a random variable independent from A n and distributed asfollows: P ( U = − j ) = 2 d − d (2 d − j for j > , P ( U = 0) = 0 , P ( U = 1) = d − d . In other words, P ( A n +1 > i ) > P ( A n + U > i ) for all i .Proof. Let us fix a sequence ¯ s = ( s , . . . , s n ) , and let q = | P n | be its number of pivotal times.We will prove the estimate by conditioning on E n (¯ s ) . Let ¯ s ′ ∈ E n (¯ s ) .First, assume there are no pivotal points, i.e., q = 0 . Then for each ¯ s ′ there are at least d − generators which are different from the last letter of Z ′ n and from the first letter of w n +1 ,giving rise to one pivotal time in P ′ n +1 with probability at least (2 d − / (2 d ) = P ( U = 1) .Otherwise, | P ′ n +1 | = 0 . Conditionally on E n (¯ s ) , it follows that the conclusion of the lemmaholds.Assume now that there is at least one pivotal point. From the last pivotal time onward,the behavior is the same over all the equivalence class E n (¯ s ) , so the last letter of Z ′ n doesnot depend on ¯ s ′ . There are at least d − generators of F d which are different fromthe last letter of Z ′ n and from the first letter of w n +1 . If s ′ n +1 is such a generator, then P ′ n +1 = P ′ n ∪ { n + 1 } . Therefore, P ( A n +1 > q + 1 | E n (¯ s )) > (2 d − / (2 d ) . We have adjusted the definition of U so that the right hand side is P ( U > . ANDOM WALKS WITHOUT MOMENT CONDITION 6
Fix now s ′ n +1 which is not such a nice generator. Then s ′ n +1 w n +1 may backtrack, possiblyuntil the last pivotal point Z ′ p q , thereby decreasing the number of pivotal points with respectto n + 1 . However, it may only backtrack further if the generator s ′ p q is exactly the inverseof the corresponding letter in w n +1 . This can happen for s ′ , but then it will not happenfor all the pivoted configurations of s ′ obtained by changing s ′ p q to another generator stillsatisfying the local geodesic condition. Therefore, P ( A n +1 q − | E n (¯ s )) d × d − , where the first factor corresponds to the choice of a generator s ′ n +1 which does not satisfythe local geodesic condition, and the second factor corresponds to the choice of the specificgenerator for s ′ p q to make sure that one backtracks further.More generally, to cross j pivotal times, there is one specific choice of generator at eachof these pivotal times, which can only happen with a probability at most / (2 d − at eachof these times. Therefore, for j > , P ( A n +1 q − j | E n (¯ s )) d · d − j − . We have adjusted the distribution of U so that the right hand side is exactly P ( U − j ) .Finally, we obtain the inequalities P ( A n +1 q − j | E n (¯ s )) P ( U − j ) for j > , P ( A n +1 > q + 1 | E n (¯ s )) > P ( U > . Taking the complement in the first inequality yields P ( A n +1 > q + k | E n (¯ s )) > P ( U > k ) for all k ∈ Z . As A n is constant equal to q on E n (¯ g ) , the right hand side is P ( A n + U > q + k | E n (¯ s )) . Writing i = q + k , we have obtained for all i the inequality P ( A n +1 > i | E n (¯ s )) > P ( A n + U > i | E n (¯ s )) . As this inequality is uniform over the conditioning, it gives the conclusion of the lemma. (cid:3)
Proof of Lemma 2.4.
Let U , U , . . . be a sequence of i.i.d. random variables distributed like U in Proposition 2.6. Iterating the proposition, one gets P ( A n > k ) > P ( U + · · · + U n > k ) .The random variables U i have an exponential moment. Moreover, their expectation ispositive when d > , as it is (2 d − · ( d − / ((2 d − · d ) . Large deviations for sums ofi.i.d. real random variables with an exponential moment ensure the existence of κ > suchthat P ( U + · · · + U n κn ) e − κn for all n . Then P ( A n κn ) e − κn . As the distance tothe origin is bounded from below by the number of pivotal points, this proves Lemma 2.4,except that the constant c depends on the number of generators d . However, the randomvariables U = U ( d ) depending on d increase with d (in the sense that when d > d ′ then P ( U ( d ) > k ) > P ( U ( d ′ ) > k ) for all k ). Therefore, one can use the random variables U (3) to obtain a lower bound in all free groups F d with d > . (cid:3) The rest of the paper is devoted to the extension of this argument to general measures andgeneral Gromov-hyperbolic spaces. While the intuition will remain the same, the definitionof pivotal times will need to be adjusted, as there is no well-defined concept of subtree:instead, we will use a suitable notion of shadow, and require that the walk after the pivotaltime remains in the shadow. Also, to separate possible directions, we will rely on the notion
ANDOM WALKS WITHOUT MOMENT CONDITION 7 of Schottky sets introduced by [BMSS20], instead of just using the generators as in the freegroup. These notions are explained in the next section.3.
Prerequisites on Gromov-hyperbolic spaces
Let X be a metric space, and x, y, z ∈ X . Their Gromov product is defined by ( x, z ) y = 12 ( d ( x, y ) + d ( y, z ) − d ( x, z )) . Let δ > . A metric space is δ -Gromov hyperbolic if, for all x, y, z, a ,(3.1) ( x, z ) a > min(( x, y ) a , ( y, z ) a ) − δ. When the space is geodesic, this is equivalent (up to changing δ ) to the fact that geodesictriangles are thin, i.e., each side is contained in the δ -neighborhood of the other two sides.In the rest of the paper, X is a δ -hyperbolic metric space (without any geodesicity orproperness or separability condition). We also fix a basepoint o ∈ X .3.1. Boundary at infinity.
We recall a few basic facts on the boundary at infinity of aGromov-hyperbolic space that we will need later on.A sequence ( x n ) n ∈ N is converging at infinity if ( x n , x m ) o tends to infinity when m, n → ∞ .Two sequences ( x n ) and ( y n ) which are converging at infinity are converging to the samelimit if ( x n , y n ) o → ∞ . This is an equivalence relation, thanks to the hyperbolicity inequality.Quotienting by this equivalence relation, one gets the boundary at infinity of the space X denoted ∂X .The C -shadow of a point x , seen from o , is the set of points y such that ( y, o ) x C . Wedenote it with S o ( y ; C ) . Geometrically, this means that a geodesic from o to y goes withindistance C + O ( δ ) of x . Let us record a few classical properties of shadows. Lemma 3.1.
For y ∈ S o ( x ; C ) , one has d ( y, o ) > d ( x, o ) − C .Proof. We have d ( y, o ) = d ( y, x ) + d ( x, o ) − y, o ) x > d ( x, o ) − C. (cid:3) Lemma 3.2.
Let
C > , and let x n ∈ X be such that d ( o, x n ) → ∞ . Consider anothersequence y p such that, for all n , eventually y p ∈ S o ( x n ; C ) . Then y p converges at infinity.Proof. Fix n large. For large enough p , one has y p ∈ S o ( x n ; C ) , i.e., ( o, y p ) x n C . As ( o, y p ) x n + ( x n , y p ) o = d ( o, x n ) , this gives ( x n , y p ) o > d ( o, x n ) − C .For large enough p, q , we get (using hyperbolicity for the first inequality)(3.2) ( y p , y q ) o > min(( y p , x n ) o , ( y q , x n ) o ) − δ > d ( o, x n ) − C − δ. As d ( o, x n ) → ∞ by assumption, it follows that ( y p , y q ) o → ∞ , as claimed. (cid:3) Lemma 3.3.
Let
C > and x ∈ X . Consider y ∈ S o ( x ; C ) , and a point ξ ∈ ∂X which isa limit of points in S o ( x ; C ) . Then ( y, ξ ) o > d ( o, x ) − C − δ. Proof.
Let z n ∈ S o ( x ; C ) be a sequence converging to ξ . As the Gromov product at infinitydoes not depend on the sequence up to δ , we have ( y, ξ ) o > lim inf( y, z n ) o − δ . Moreover,as both y and z n belong to S o ( x ; C ) , the inequality (3.2) gives ( y, z n ) o > d ( o, x ) − C − δ .The conclusion follows. (cid:3) ANDOM WALKS WITHOUT MOMENT CONDITION 8
Chains and shadows.
In a hyperbolic space, ( x, z ) y is roughly the distance from y to a geodesic between x and z . In particular, if ( x, z ) y C for some constant C , this meansthat the points x, y, z are roughly aligned in this order, up to an error C . We will say thatthe points are C -aligned .In a hyperbolic space, if in a sequence of points all consecutive points are C -aligned, andthe points are separated enough, then the sequence is progressing linearly, and all pointsin the sequence are C + O ( δ ) -aligned (see for instance [GdlH90, Theorem 5.3.16]). We willneed variations around this classical idea.We start with distance estimates for 3 points. Lemma 3.4. . Consider x, y, z with ( x, z ) y C . Then d ( x, z ) > d ( x, y ) − C and d ( x, z ) > d ( y, z ) − C .Proof. By symmetry, it suffices to prove the first inequality. We claim that d ( x, z ) > d ( x, y ) − ( x, z ) y , which implies the result. Expanding the definition of the Gromov product, thisinequality holds if and only if d ( y, x ) + d ( y, z ) − d ( x, z )2 + d ( x, z ) > d ( x, y ) . This reduces to d ( y, z ) + d ( x, z ) > d ( x, y ) , which is just the triangular inequality. (cid:3) The next lemma gives estimates for 4 points, from which results for more points willfollow by induction.
Lemma 3.5.
Consider w, x, y, z ∈ X , and C > . Assume ( w, y ) x C and ( x, z ) y C + δ and d ( x, y ) > C + 2 δ + 1 . Then ( w, z ) x C + δ .Proof. By definition of the Gromov product, ( x, z ) y + ( y, z ) x = d ( x, y ) . As ( x, z ) y C + δ ,we get ( y, z ) x > d ( x, y ) − C − δ . As d ( x, y ) > C + 2 δ + 1 , this gives ( y, z ) x > C + δ + 1 .Writing down the first condition and the hyperbolicity condition, we get C > ( w, y ) x > min(( w, z ) x , ( z, y ) x ) − δ. If the minimum were realized by ( z, y ) x , we would get C > ( C + δ + 1) − δ , a contradiction.Therefore, the minimum is realized by ( w, z ) x , which gives ( w, z ) x C + δ . (cid:3) Definition 3.6.
For
C, D > , a sequence of points x , . . . , x n is a ( C, D ) -chain if one has ( x i − , x i +1 ) x i C for all < i < n , and d ( x i , x i +1 ) > D for all i < n . Lemma 3.7.
Let x , . . . , x n be a ( C, D ) chain with D > C +2 δ +1 . Then ( x , x n ) x C + δ ,and (3.3) d ( x , x n ) > n − X i =0 ( d ( x i , x i +1 ) − (2 C + 2 δ )) > n. Proof.
Let us show by decreasing induction on i that ( x i − , x n ) x i C + δ , the result beingtrue for i = n − by assumption. Assume it holds for i +1 . Then the points x i − , x i , x i +1 , x n satisfy the assumptions of Lemma 3.5, which gives ( x i − , x n ) x i C + δ as desired.Let us now show that d ( x j , x n ) > P n − i = j ( d ( x i , x i +1 ) − (2 C + 2 δ )) by decreasing inductionon j , the case j = n being trivial and the case j = 0 being (3.3). We have d ( x j , x n ) = d ( x j , x j +1 ) + d ( x j +1 , x n ) − x j , x n ) x j +1 > d ( x j , x j +1 ) + d ( x j +1 , x n ) − (2 C + 2 δ ) , ANDOM WALKS WITHOUT MOMENT CONDITION 9 which concludes the induction. (cid:3)
Lemma 3.8.
Let x , . . . , x n be a ( C, D ) chain with D > C + 4 δ + 1 . Then for all i onehas ( x , x n ) x i C + 2 δ .Proof. Lemma 3.7 applied to the ( C, D ) -chain x i , x i +1 , . . . , x n gives ( x i , x n ) x i +1 C + δ .The same lemma applied to the ( C, D ) -chain x i +1 , x i , . . . , x gives ( x i +1 , x ) x i C + δ .Therefore, the points x , x i , x i +1 , x n are ( C + δ ) -aligned. Let us apply Lemma 3.5 to thesepoints, with C + δ instead of C . It gives ( x , x n ) x i C + 2 δ , as claimed. (cid:3) We will need to say that a point z belongs to a half-space based at a point y and directedtowards a point y + . The usual definition for this is the shadow of y + seen from y , definedas the set S y ( y + ; C ) of points z with ( y, z ) y + C for some suitable C . Unfortunately,this definition is not robust enough for our purposes as we will need to say that being ina half-space and walking again from y one stays in the half-space, which is not satisfied bythis definition due to the loss of δ when one applies the hyperbolicity inequality.A more robust definition can be given in terms of chains. If we have a chain (which goesroughly in a straight direction by the previous lemma) and if we prescribe the direction of itsfirst jump, then we are essentially prescribing the direction of the whole chain. This makesit possible to define another notion that we call chain-shadow, as follows. The choice of theminimal distance C + 2 δ + 1 between points in the chain in this definition is somewhatarbitrary, it should just be large enough that lemmas on the linear progress of chains apply. Definition 3.9.
Let C > and y, y + , z ∈ X . We say that z belongs to the C -chain-shadowof y + seen from y if there exists a ( C, C + 2 δ + 1) -chain x = y, x , . . . , x n = z satisfyingadditionally ( x , x ) y + C . We denote the chain-shadow with CS y ( y + ; C ) . The next lemma shows that this definition of shadow is roughly equivalent to the usualdefinition in terms of the Gromov product ( y, z ) y + . Lemma 3.10. If z ∈ CS y ( y + ; C ) , then ( y, z ) y + C + δ and d ( y, z ) > d ( y, y + ) − C − δ .Proof. Let x = y, x , . . . , x n = z be a ( C, C + 2 δ + 1) -chain as in the definition of chain-shadows. We have d ( y, z ) = d ( y, x )+ d ( x , z ) − y, z ) x = d ( y, y + )+ d ( y + , x ) − y, x ) y + + d ( x , z ) − y, z ) x . Let us bound ( y, x ) y + with C (by the definition of chain-shadows) and ( y, z ) x by C + δ (thanks to Lemma 3.7 applied to the chain x , . . . , x n ). Let us also bound from below d ( y + , x ) + d ( x , z ) with d ( y + , z ) . We get d ( y, z ) > d ( y, y + ) + d ( y + , z ) − C − δ. Expanding the definition of the Gromov product, this gives ( y, z ) y + C + δ . Then we get d ( y, z ) > d ( y, y + ) − C − δ by applying Lemma 3.4 to y, y + , z . (cid:3) Schottky sets.
To be able to prescribe enough directions at pivotal points, we willuse a variation around the notion of Schottky set in [BMSS20]. This is essentially a finiteset of isometries such that, for all x and y , most of these isometries put x and sy in generalposition with respect to o , i.e., such that x, o, sy are C -aligned for some given C . Definition 3.11.
Let η, C, D > . A finite set S of isometries of X is ( η, C, D ) -Schottky if ANDOM WALKS WITHOUT MOMENT CONDITION 10 • For all x, y ∈ X , we have |{ s ∈ S, ( x, sy ) o C }| > (1 − η ) | S | . • For all x, y ∈ X , we have |{ s ∈ S, ( x, s − y ) o C }| > (1 − η ) | S | . • For all s ∈ S , we have d ( o, so ) > D . We could define analogously a notion of an ( η, C, D ) -probability measure, where theprevious definition would be this property for the uniform measure on S .The next proposition shows that one can find Schottky sets by using powers of twoloxodromic isometries. Proposition 3.12.
Fix two loxodromic isometries u and v of X , with disjoint sets of fixedpoints at infinity. For all η > , there exists C > such that, for all D > , there exist n ∈ N and an ( η, C, D ) -Schottky set in { w · · · w n : w i ∈ { u, v }} .Proof. This is essentially a classical application of the ping-pong method. [BMSS20, Propo-sition A.2] contains a slightly less precise statement, but their proof also gives our strongerversion, as we explain now. Let S n = { w · · · w n : w i ∈ { u, v }} .The ping-pong argument at infinity shows that one can choose n large enough so that,for all m the elements w · · · w m for w i ∈ { u n , v n } are all different, loxodromic, with disjointsets of fixed points at infinity. Let us fix such an n , and then such an m with − m < η/ ,and denote these m isometries with g , . . . , g m . They all belong to S nm . Let g + i and g − i be their attractive and repulsive fixed points.Let K be large enough. Define a neighborhood V ( g + i ) = { x ∈ X : ( x, g + i ) o > K } anda smaller neighborhood V ′ ( g + i ) = { x ∈ X : ( x, g + i ) o > K + δ } . In the same way, define V ( g − i ) and V ′ ( g − i ) . If K is large enough, then the m +1 sets ( V ( g ± i )) i =1 ,..., m are disjoint asthe fixed points at infinity of the g i are all different. Moreover, for large enough p , then g pi maps the complement of V ( g − i ) to V ′ ( g + i ) , and the complement of V ( g + i ) to V ′ ( g − i ) .We claim that, for all D , if p is large enough, then S = { g p , . . . , g p m } is an ( η, K + δ, D ) -Schottky set. As all these elements belong to S nmp , this will prove the theorem. First, thecondition d ( o, so ) > D for s = g pi is true if p is large enough, as g i is loxodromic. Let usshow that |{ s ∈ S, ( x, sy ) o K + δ }| > (1 − η ) | S | for all x, y (the corresponding inequalitywith s − is similar). There is at most one s = g i for which y ∈ V ( g − i ) , as all these sets aredisjoint. There is also at most one s = g j for which x ∈ V ( g + j ) , again by disjointness. If s = g k is not one of these two, we claim that ( x, sy ) o K + δ . This will prove the result,since this implies |{ s ∈ S, ( x, sy ) o K + δ }| > | S | − m − | S | (1 − · − m ) > (1 − η ) | S | . As x / ∈ V ( g + k ) , we have ( x, g + k ) o < K . As y / ∈ V ( g − k ) , we have sy = g k y ∈ V ′ ( g + k ) , i.e., ( sy, g + k ) o > K + δ . By hyperbolicity, we obtain K > ( x, g + k ) o > min(( x, sy ) o , ( sy, g + k ) o ) − δ. (Note that the hyperbolicity inequality (3.1), initially stated inside the space, remains truefor the Gromov product at infinity as we have used an inf in its definition (1.1)). If theminimum were realized by ( sy, g + k ) o > K + δ , we would get K > ( K + δ ) − δ , a contradiction.Therefore, the minimum is realized by ( x, sy ) o , yielding K > ( x, sy ) o − δ as claimed. (cid:3) ANDOM WALKS WITHOUT MOMENT CONDITION 11
Corollary 3.13.
Let µ be a non-elementary discrete measure on the set of isometries of X . For all η > , there exists C > such that, for all D > , there exist M > and an ( η, C, D ) -Schottky set in the support of µ M .Proof. By definition of a non-elementary measure, one can find loxodromic elements u and v with disjoint fixed points in the support of µ a and µ b for some a, b > . Then u = u b and v = v a belong to the support of µ ab and have disjoint fixed points. ApplyingProposition 3.12, we obtain an ( η, C, D ) -Schottky set in the support of µ abn as desired. (cid:3) Linear escape
In this section, we prove Theorem 1.1, i.e., the random walk on X driven by a non-elementary measure escapes linearly towards infinity, with exponential bounds. We copythe proof of Section 2, replacing subtrees with chain-shadows in the definition of pivotaltimes, and generators with elements of a Schottky set. The reader who would prefer touse shadows instead of chain-shadows may do so for intuition, but should be warned thatthe argument will then barely fail (at a single place, the backtracking step in the proof ofLemma 4.8).Like in Section 2, the main technical part is to understand what happens for walks ofthe form w s w · · · w n − s n w n , where the w i are fixed, while the s i are random, and drawnfrom a Schottky set. This will be done in Subsection 4.1, while the application to proveTheorem 1.1 is done in Subsection 4.24.1. A simple model.
In this section, we fix isometries w , w , · · · of X , a constant C > ,and S a (1 / , C , D ) -Schottky set of isometries of X . We will assume that D is largeenough compared to C (for definiteness D > C + 100 δ + 1 will do). Let µ S be theuniform measure on S . Let s i be i.i.d. random variables distributed like µ S .We form a random process on X by composing the w i and s i and applying them to thebasepoint o . Our goal is to understand the behavior of y − n +1 = w s w · · · s n w n · o when n tends to infinity. The main result of this subsection is the following proposition. Proposition 4.1.
There exists a universal constant κ > (independent of everything) suchthat, for all n , P ( d ( o, y − n +1 ) κn ) e − κn . Write s i = a i b i with a i , b i ∈ S . We define y − i = w s w · · · s i − w i − · o, y i = w s w · · · w i − a i · o, y + i = w s w · · · w i − a i b i · o, the three points visited during the transition around i . We have d ( y − i , y i ) = d ( o, a i · o ) > D as a i belongs to the (1 / , C , D ) -Schottky set S . In the same way, d ( y i , y + i ) > D . A difficultythat we will need to handle is that d ( y + i , y − i +1 ) may be short, as there is no lower bound on w i , while we need long jumps everywhere to apply the results on chains of Subsection 3.2.We will define a sequence of pivotal times P n ⊆ { , . . . , n } , evolving with time: when goingfrom n to n + 1 , we will either add a pivotal time at time n + 1 (so that P n +1 = P n ∪ { n + 1 } ,if the walk is going more towards infinity), or we will remove a few pivotal times at the endbecause the walk has backtracked (in this case, P n +1 = P n ∩ { , . . . , m } for some m ).Let us define inductively the pivotal times, starting from P = ∅ . Assume that P n − is defined, and let us define P n . Let k = k ( n ) be the last pivotal time before n , i.e., ANDOM WALKS WITHOUT MOMENT CONDITION 12 k = max( P n − ) . (If P n − = ∅ , take k = 0 and let y k = o – we will essentially ignore theminor adjustments to be made in this special case in the forthcoming discussion). Let ussay that the local geodesic condition is satisfied at time n if(4.1) ( y k , y n ) y − n C , ( y − n , y + n ) y n C , ( y n , y − n +1 ) y + n C . In other words, the points y k , y − n , y n , y + n , y − n +1 follow each other successively, with a C -alignment condition. As the points are well separated by the definition of Schottky sets,this will guarantee that we have a chain, progressing in a definite direction.If the local geodesic condition is satisfied at time n , then we say that n is a pivotal time,and we set P n = P n − ∪ { n } . Otherwise, we backtrack to the largest pivotal time m ∈ P n − for which y − n +1 belongs to the ( C + δ ) -chain-shadow of y + m seen from y m . In this case, weerase all later pivotal times, i.e., we set P n = P n − ∩ { , . . . , m } . If there is no such pivotaltime m , we set P n = ∅ . Lemma 4.2.
Assume that P n is nonempty. Let m be its maximum. Then y − n +1 belongs tothe ( C + δ ) -chain-shadow of y + m seen from y m .Proof. If P n has been defined from P n − by backtracking, then the conclusion of the lemmais a direct consequence of the definition. Otherwise, the last pivotal time is n . In thiscase, let us show that y − n +1 belongs to the ( C + δ ) -chain-shadow of y + n seen from y n , byconsidering the chain y n , y − n +1 . By definition of the chain-shadow, we should check that ( y n , y − n +1 ) y + n C + δ and d ( y n , y − n +1 ) > C + 4 δ + 1 . The first inequality is obviousas ( y n , y − n +1 ) y + n C C + δ by the local geodesic condition (4.1). Moreover, since ( y n , y − n +1 ) y + n C by (4.1), Lemma 3.4 gives d ( y n , y − n +1 ) > d ( y n , y + n ) − C > D − C , whichis > C + 4 δ + 1 if D is large enough. (cid:3) Lemma 4.3.
Let P n = { k < · · · < k p } . Then the sequence y − k , y k , y − k , y k , . . . , y k p , y − n +1 is a (2 C + 3 δ, D − C − δ ) -chain.Proof. Let us first check the condition on Gromov products. We have to show that ( y k i − , y k i ) y − ki C + 3 δ and ( y − k i , y − k i +1 ) y ki C + 3 δ . The first inequality is obvious, as it follows from thefirst property in the local geodesic condition when introducing the pivotal time k i . Let usshow the second one. Lemma 4.2 applied to the time k i +1 − shows that y − k i +1 belongs to the ( C + δ ) chain-shadow of y + k i seen from y k i . Lemma 3.10 thus yields ( y k i +1 , y k i ) y + ki C + 3 δ .Moreover, ( y + k i , y − k i ) y ki C by the local geodesic condition when introducing the pivotaltime k i . We apply Lemma 3.5 with the points y − k i , y k i , y + k i , y − k i +1 , with C = 2 C + 2 δ . As d ( y k i , y + k i ) > D is large enough, this lemma applies and gives ( y − k i , y − k i +1 ) y ki C + 3 δ . Thisis the desired inequality.Let us check the condition on distances. We have to show that d ( y − k i , y k i ) > D − C − δ and d ( y k i , y − k i +1 ) > D − C − δ . The first condition is obvious as d ( y − k i , y k i ) > D . For thesecond, Lemma 3.10 gives d ( y k i , y − k i +1 ) > d ( y k i , y + k i ) − C − δ > D − C − δ . (cid:3) The first point in the previous chain can be replaced with o : ANDOM WALKS WITHOUT MOMENT CONDITION 13
Lemma 4.4.
Let P n = { k < · · · < k p } . Then the sequence o, y k , y − k , y k , . . . , y k p , y − n +1 isa (2 C + 4 δ, D − C − δ ) -chain.Proof. We have to control d ( o, y k ) and ( o, y − k ) y k as the other quantities are controlledby Lemma 4.3. For this, we will apply Lemma 3.5 to the points y − k , y k , y − k , o with C =2 C + 3 δ . We have ( y − k , y − k ) y k C + 3 δ by Lemma 4.3, and ( y k , o ) y − k C (this is thefirst property in the local geodesic condition when introducing the pivotal time k ), and d ( y k , y − k ) > D > C + δ + 1 . Therefore, Lemma 3.5 gives ( y − k , o ) y k C + 4 δ . Moreover,Lemma 3.4 gives d ( y k , o ) > d ( y k , y − k ) − ( y k , o ) y − k > D − C > D − C − δ. (cid:3) Proposition 4.5.
We have d ( o, y − n +1 ) > | P n | .Proof. This follows from Lemma 4.4, saying that we have a chain of length at least | P n | between o and y − n +1 , and from Lemma 3.7, saying that the distance grows linearly along achain. (cid:3) This proposition shows that, to obtain the linear escape rate with exponential decay, itsuffices to show that there are linearly many pivotal times.
Lemma 4.6.
Fix s , . . . , s n , and draw s n +1 according to µ S . The probability that | P n +1 | = | P n | + 1 (i.e., that n + 1 gets added as a pivotal time) is at least / .Proof. In the local geodesic condition (4.1), the last property reads ( g · o, gb n w n · o ) gb n · o C for g = w s · · · w n − a n . Composing with b − n g − , it becomes ( b − n · o, w n · o ) o C . By thedefinition of a Schottky set, this inequality is satisfied with probability at least − η = 99 / when choosing b n . Once b n is fixed, the other two properties in the geodesic condition onlydepend on a n , and each of them is satisfied with probability at least / , again by theSchottky property. They are satisfied simultaneously with probability at least / . As (99 / · (98 / > / , this concludes the proof. (cid:3) The key point is to control the backtracking length. For this, we will see that for oneconfiguration that backtracks a lot, there are many configurations that do not. Given ¯ s = ( s , . . . , s n ) , let us say that another sequence ¯ s ′ = ( s ′ , . . . , s ′ n ) is pivoted from ¯ s if theyhave the same pivotal times, b ′ k = b k for all k , and a ′ k = a k when k is not a pivotal time. Lemma 4.7.
Let i be a pivotal time of ¯ s = ( s , . . . , s n ) . Replace s i = a i b i with s ′ i = a ′ i b i which still satisfies the local geodesic condition (4.1) (with n replaced by i ). Then ( s , . . . , s ′ i , . . . , s n ) is pivoted from ¯ s .Proof. We should show that the pivotal times of ¯ s ′ are the same as those of ¯ s . Until time i , the sequences are the same, hence they have the same pivotal times: P i − (¯ s ) = P i − (¯ s ′ ) .Then i is added as a pivotal time for both ¯ s and ¯ s ′ by assumption, therefore P i (¯ s ) = P i (¯ s ′ ) .Then the remaining part of the trajectory for ¯ s never backtracks beyond i , as i remains apivotal time. This backtracking property is defined in terms of the relative position of thetrajectory compared to y i and y + i , and therefore it depends on b i but not on the beginningof the trajectory (and in particular it does not depend on a i ). Hence, replacing a i with a ′ i does not change the backtrackings, which are the same for ¯ s and ¯ s ′ until time n . (cid:3) ANDOM WALKS WITHOUT MOMENT CONDITION 14
Lemma 4.7 shows that, if a trajectory has p pivotal times, then it has a lot of pivotedtrajectories (exponentially many in p ) as one can change a i to a ′ i at each pivotal time.Denote by E n (¯ s ) the set of trajectories which are pivoted from ¯ s . Conditionally on E n (¯ s ) ,the random variables a ′ i for i a pivotal time are independent (but not identically distributed,as they are each drawn from a subset of S depending on i , of large cardinality. Lemma 4.8.
Let ¯ s = ( s , . . . , s n ) be a trajectory with q pivotal times. We condition on E n (¯ s ) , and we draw s n +1 according to µ S . Then, for all j > , P ( | P n +1 | < q − j | E n (¯ s )) / j +1 . Proof. If q = 0 , then the result follows readily from Lemma 4.6. Assume q > .First, the probability that s n +1 creates a new pivotal time is at least / , by Lemma 4.6(and the elements s n +1 that create a new pivotal time are the same over the whole equiva-lence class E n (¯ s ) as q > ). Let us now fix a bad s n +1 , giving rise to backtracking.Let us show the lemma for j = 1 . Let m < k be the last two pivotal times. We have toshow that(4.2) P ( | P n +1 | < q − | E n (¯ s ) , s n +1 ) / , i.e., most trajectories do not backtrack beyond k : for many choices of a k , then y − n +1 shouldbelong to the ( C + δ ) -chain-shadow of y + m seen from y m . By Lemma 4.2 applied at time k − , we already know that y − k belongs to this set. Therefore, there exists a chain x = y m , x , . . . , x i = y − k pointing in the chain-shadow. With a good choice of a k , we will increasethe chain by adding y − n +1 at its end.Let us consider a ′ k so that the points x i − , y − k , y k , y − n +1 are C -aligned, i.e., such that ( x i − , y k ) y − k C and ( y − k , y − n +1 ) y k C . By the Schottky property, there are at least (98 / | S | such a ′ k . Let us show that, with this choice, y − n +1 belongs to the chain-shadowof y + m seen from y m (and therefore backtracking stops here). For this, it is enough to see that x , . . . , x i − , y − k , y − n +1 is a ( C + δ, C + 4 δ + 1) -chain. We have to see that d ( y − k , y − n +1 ) > C + 4 δ + 1 and ( x i − , y − n +1 ) y − k C + δ . For this, apply Lemma 3.5 to the points x i − , y − k , y k , y − n +1 , which are C -aligned. As d ( y − k , y k ) > D is large enough, this lemma gives ( x i − , y − n +1 ) y − k C + δ . Moreover, Lemma 3.4 gives d ( y − k , y − n +1 ) > d ( y − k , y k ) − ( y − k , y − n +1 ) y k > D − C > C + 4 δ + 1 , as claimed.In the equivalence class, the number of possible choices for a ′ k when introducing thepivotal time k is at least (98 / | S | , since most choices satisfy the local geodesic condition(see the proof of Lemma 4.6). The number of choices of a ′ k that ensure there is no furtherbacktracking is also bounded below by (98 / | S | , by the previous discussion, so that thenumber of bad choices is at most (1 − (98 / | S | . Finally, the proportion of bad choicesthat lead to further backtracking is at most (1 − (98 / | S | (98 / | S | < . This proves (4.2) for j = 1 .To prove the lemma for j = 2 , let us fix s n +1 as well as a bad choice of a ′ k that gives riseto backtracking beyond k (this happens with probability at most / ). We have to see ANDOM WALKS WITHOUT MOMENT CONDITION 15 that, once these quantities are fixed, the probability to backtrack past the previous pivotaltime is at most / . This is the same argument as above. The case of general j is provedanalogously by induction. (cid:3) Lemma 4.9.
Let A n = | P n | be the number of pivotal times. Then, in distribution, A n +1 > A n + U where U is a random variable independent from A n and distributed as follows: P ( U = − j ) = 910 j +1 for j > , P ( U = 0) = 0 , P ( U = 1) = 910 . In other words, P ( A n +1 > i ) > P ( A n + U > i ) for all i .Proof. Conditionally on E n (¯ s ) , this follows from Lemma 4.8, just like in the proof of Propo-sition 2.6: one shows that P ( A n +1 > i | E n (¯ s )) > P ( A n + U > i | E n (¯ s )) . As the inequality is uniform over the conditioning, the unconditioned version follows. (cid:3)
Proposition 4.10.
There exists a universal constant κ > such that, for all n , P ( | P n | κn ) e − κn . Proof.
Let U , U , . . . be a sequence of independent copies of the variable U from Lemma 4.9.Iterating this lemma gives P ( | P n | > i ) > P ( U + · · · + U n > i ) for all i . In particular, P ( | P n | κn ) P ( U + · · · + U n κn ) . As the U i are real randomvariables with an exponential moment and positive expectation, P ( U + · · · + U n κn ) isexponentially small if κ is small enough. (cid:3) Proof of Proposition 4.1.
The linear escape with exponential error term follows from Propo-sition 4.5 giving d ( o, y − n +1 ) > | P n | , and from Proposition 4.10 ensuring that | P n | growslinearly outside of a set of exponentially small probability. (cid:3) Proof of linear escape and convergence at infinity.
Let µ be a non-elementarymeasure on the set of isometries of the space X . In this subsection, we prove Theorem 1.1:the µ -random walk goes to infinity linearly, with an exponential error term. The techniqueswe develop along the way will also prove convergence of the walk at infinity.We apply Corollary 3.13 with η = 1 / . Let C = C be given by this corollary. Choose D = D ( C , δ ) large enough so that the result of the previous Subsection apply ( D =20 C + 100 δ + 1 suffices). The corollary gives an ( η, C , D ) Schottky set S included inthe support of µ M for some M . For α > small enough and N = 2 M , we may write µ N = αµ S + (1 − α ) ν for some probability measure ν , where µ S is the uniform measure on S . As in [BMSS20, Section 6], let us reconstruct in a slightly indirect way the random walk,as follows, on a space Ω containing Bernoulli random variables ε i (satisfying P ( ε i = 1) = α and P ( ε i = 0) = 1 − α ) and variables h i distributed according to ν and variables s i = a i b i ANDOM WALKS WITHOUT MOMENT CONDITION 16 distributed according to µ S , all independent. Define γ i = s i if ε i = 1 , and γ i = h i if ε i = 0 .Then γ · · · γ n − is distributed like Z Nn . With a standard coupling argument, extending Ω ifnecessary, we can also construct on Ω a sequence of independent random variables g , g , . . . with distribution µ such that γ i = g iN · · · g iN + N − .Let t < t < · · · be the times where ε i = 1 . Fix n ∈ N . We let τ = τ ( n ) be the last index j such that N ( t j + 1) n , so that the interval [ N t j , N ( t j + 1)) is contained in [0 , n ) . We willdecompose the product g · · · g n − as a product of the elements s ′ j = s t j (the product of all g i for i ∈ [ N t j , N ( t j + 1)) ) interspersed with other words that we will consider as fixed, tobe in the framework of Subsection 4.1. Let w j = g N ( t j +1) · · · g Nt j +1 − (where by convention t = 0 ), and let w ′ = w ′ ( n ) = g N ( t τ ( n ) +1) · · · g n − be the last missing word (it really dependson n , contrary to the previous words that just fill the gaps between blocks corresponding to ε j = 1 ). By construction, Z n · o = w s ′ w · · · w τ − s ′ τ w ′ ( n ) · o. We can associate to this decomposition a sequence of pivotal times P ( n )1 , . . . , P ( n ) τ , wherethe exponent ( n ) is here to emphasize that the intermediate words we use depend on n . Infact, the only word that really depends on n is the last word w ′ = w ′ ( n ) , as the other onesare w j = g ( N +1) t j · · · g Nt j +1 − so they only depend on t j . Hence, the sequence of pivotaltimes is rather(4.3) P , P , . . . , P τ − , P ( n ) τ . The main quantity we will control is u n := (cid:12)(cid:12)(cid:12) P ( n ) τ ( n ) (cid:12)(cid:12)(cid:12) , the final number of pivotal times after n steps of the initial random walk. Proposition 4.11.
There exists κ > such that P ( u n κn ) e − κn .Proof. The sequence t j +1 − t j is a sequence of independent random variables with an expo-nential tail. Therefore, there exist C > and κ > such that P ( t j > Cj ) = P j − X i =0 ( t i +1 − t i ) > Cj ! e − κj . Hence, if β > is small enough, we have N ( t ⌊ βn ⌋ + 1) n outside of a set with exponentiallysmall probability. This gives P ( τ ( n ) > βn ) e − κn for some κ > . For any c > , we get P ( u n cn ) e − κn + P ( u n cn, τ > βn ) . Let us concentrate on the second set. We condition with respect to the ε i (which fixes the t i , and τ ) and with respect to the g i outside of the intervals [ N t j , N ( t j + 1)) (which fixesthe w j and w ′ ). Once these are fixed, we are in the framework of Subsection 4.1. We maytherefore apply Proposition 4.10 and deduce that, conditionally on these quantities, we have P ( u n cτ ) e − cτ , for some c > . As τ > βn , this gives conditionally P ( u n cβn ) e − cβn .As this is uniform on the conditioning, this implies the conclusion. (cid:3) ANDOM WALKS WITHOUT MOMENT CONDITION 17
Proof of Theorem 1.1.
Outside of a set with exponentially small probability, the numberof pivotal times at the n -th step of the random walk is at least κn for some κ > , byProposition 4.11. As the distance to the origin is bounded below by the number of pivotaltimes, by Proposition 4.5, this concludes the proof. (cid:3) This argument enables us to recover a theorem of [MT18], the convergence of the walkat infinity. We even get exponential error terms in the speed of convergence. We start witha lemma ensuring that positions of the random walk stay in a shadow.
Lemma 4.12.
Let n ∈ N and C > . Assume that, for all k > n , one has u k > C . Let x be the position of the walk at the C -th pivotal time in P ( n ) τ ( n ) . Then, for all k > n , the point Z k · o belongs to the (2 C + 6 δ ) -shadow of x seen from o .Proof. For k > n , the set P ( k ) τ ( k ) has strictly more than C points by assumption. In particular,the C -th pivotal time is not introduced at the last step, and the last step does not backtrackbeyond this point. The set of pivotal times before the last index does not depend on k , asexplained before (4.3). It follows that the C -th pivotal time in P ( k ) τ ( k ) is independent of k > n .In particular, x is the position of the walk at a pivotal time in P ( k ) τ ( k ) , for any k > n .For k > n , Lemma 4.4 shows that there is a (2 C + 4 δ, D − C − δ ) -chain from o to Z k · o going through x . By Lemma 3.8, we deduce that ( o, Z k · o ) x C + 6 δ . In other words, allthe points Z k · o remain in the (2 C + 6 δ ) -shadow of x seen from o , as claimed. (cid:3) Proposition 4.13.
Almost surely, there is a point Z ∞ ∈ ∂X such that Z n · o converges to Z ∞ . Moreover, there exists κ > such that (4.4) P (( Z n · o, Z ∞ ) o κn ) e − κn . Proof.
Fix c > such that P ( u n cn ) e − cn , by Lemma 4.11. Since P ( u n cn ) isexponentially small, Borel-Cantelli ensures that almost surely one has eventually u n > cn .Lemma 4.12 then applies, with C = ⌊ cn ⌋ − . Let x n denote the position of the walk at the ( ⌊ cn ⌋ − -th pivotal time for large n . By Proposition 4.5, it satisfies(4.5) d ( o, x n ) > ⌊ cn ⌋ − . The sequence Z k · o is eventually trapped in the shadow of x n seen from o by Lemma 4.12.This implies the convergence at infinity of Z k · o , by Lemma 3.2.Finally, let us show the quantitative estimate (4.4). Assume that for all k > n , one has u k > ck (this happens with probability at least − Ce − cn ). In this case, all the points Z k · o for k > n belong to the (2 C + 6 δ ) -shadow of x n . Therefore, Lemma 3.3 applies and gives(4.6) ( Z n · o, Z ∞ ) o > d ( o, x n ) − (2 C + 6 δ ) − δ. Together with (4.5), this gives a linear lower bound for the Gromov product, that holdsoutside of an exponentially small set. (cid:3)
We will also need the following lemma, that follows from the same techniques.
ANDOM WALKS WITHOUT MOMENT CONDITION 18
Lemma 4.14.
Let µ be a non-elementary discrete measure on the set of isometries of aGromov-hyperbolic space X with basepoint o . Let Z n = g · · · g n − where the g i are i.i.d.with distribution µ . Let ε > . There exists C > such that, for any isometry g , P ( ∀ n, d ( o, gZ n · o ) > d ( o, g · o ) − C ) > − ε. The point of the lemma is that the possible loss C is uniform in g . Without momentassumptions on µ , it is not possible to get a better bound, contrary to the case of walkswith an exponential moment (compare [BMSS20, Theorem 2.12]). Proof.
We follow the same construction as at the beginning of this subsection to reconstructthe random walk, but adding the isometry g before the first step of the random walk. Sincethe estimates of Subsection 4.1 are uniform in w , replacing w with gw does not changethem. Therefore, the number u n := (cid:12)(cid:12)(cid:12) P ( n ) τ ( n ) (cid:12)(cid:12)(cid:12) of pivotal times for the random walk at time n still satisfies the estimate of Proposition 4.11: there exists κ > (independent of g ) suchthat P ( u n κn ) e − κn .Let us fix n such that P i > n e − κi < ε/ . On a set A g of probability at least − ε/ (whichmay depend on g ), one has for all i > n the inequality u i > κi > κn . As in the proof ofProposition 4.13, one can then find a point x n such that, for all i > n , the points gZ i · o belong to the (2 C + 6 δ ) -shadow of x n seen from o . In particular, by Lemma 3.1, d ( gZ i · o, o ) > d ( o, x n ) − C − δ. Moreover, x n is of the form gZ k · o for some k n .By measurability, we can find a set A (independent of g ) of measure at least − ε/ anda constant C such that, for all ω ∈ A and all k n , holds d ( o, Z k · o ) C .Consider ω ∈ A g ∩ A (this set has measure at least − ε ). Then d ( o, x n ) = d ( o, gZ k · o ) > d ( o, g · o ) − d ( g · o, gZ k · o ) = d ( o, g · o ) − d ( o, Z k · o ) > d ( o, g · o ) − C. For all i > n , we get d ( gZ i · o, o ) > d ( o, g · o ) − C − C − δ . For i < n , this estimate alsoholds as d ( o, Z i · o ) C . This proves the lemma, for the constant C + 4 C + 12 δ which isindependent of g . (cid:3) Precise estimates
A more complicated model.
To obtain precise estimates on the rate of convergenceto infinity, we will need to compare the distance to the origin with the sum of independentreal valued random variables corresponding to the size of jumps of the random walk. Thisis done in the next proposition.
Proposition 5.1.
For η ∈ (0 , / , there exists κ = κ ( η ) > with the following property.Let S be an ( η, C , D ) -Schottky set of isometries of a δ -hyperbolic space X with basepoint o , where D is large enough compared to C (for definiteness D > C + 100 δ + 1 is enough).Let ρ , ρ , . . . be probability measures on the isometry set of X . Let R be a nonnegative realrandom variable such that for all i and all M > one has P ρ i ( d ( o, g · o ) > M ) > P ( R > M ) , i.e., the distance with respect to the origin for ρ i dominates stochastically R , for all i . ANDOM WALKS WITHOUT MOMENT CONDITION 19
Let w , w , . . . be fixed isometries of X . Let s , s , . . . be independent random variables,where s i is sampled according to µ S ∗ ρ i ∗ µ S . Define y − n +1 = w s w · · · s n w n · o . Then forall M > , P ( d ( o, y − n +1 ) M ) P ( R + · · · + R ⌊ (1 − η ) n ⌋ M ) + e − κn , where R , R , . . . are independent copies of R . When all the ρ i are the Dirac mass at the origin, then the setting of the proposition isessentially the same as the simple model of Subsection 4.1, except that we are sampling the s i according to µ S instead of µ S (which does not really make a difference). The conclusionin the general setting of Proposition 5.1 is that the growth rate of the distance to the originis at least the growth rate of sums of i.i.d. random variables distributed like the ρ i , up to aminor loss (that tends to when the proportion η of bad elements in the Schottky set tendsto ) and an exponentially small error term. This model will be precise enough to capturethe right growth rate of a general random walk, to prove Theorems 1.2 and 1.3 in the nextparagraphs, in the same way that we have deduced linear escape with exponential estimatesfrom the results on the simple model of Subsection 4.1. The possibility to have differentmeasures ρ i at the different jumps will be important in the application of this propositionin Subsection 5.3, but for the proof the reader may pretend for simplicity that they are allequal to a fixed measure ρ (and then one can take R to be the distribution of d ( o, g · o ) withrespect to ρ ).To prove Proposition 5.1, let us introduce a refined notion of pivotal times, in whichwe will keep the randomness coming from the ρ i . Write s i = a i b i r i c i d i , where a i , b i , c i , d i are distributed according to µ S while r i is distributed according to ρ i . This gives rise to successive points at the i -th transition: y − i = y (0) i = w s · · · s i − w i − · o, y (1) i = w s · · · s i − w i − a i · o,y (2) i = w s · · · s i − w i − a i b i · o, y (3) i = w s · · · s i − w i − a i b i r i · o,y i = y (4) i = w s · · · s i − w i − a i b i r i c i · o, y + i = y (5) i = w s · · · s i − w i − a i b i r i c i d i · o. The distances between two successive points in this list is at least D as it comes from theapplication of an element of the Schottky set S , except for the distance between y (2) i and y (3) i for which we have no lower bound as r i is drawn according to ρ i .Let us define inductively a set of refined pivotal times, that we will denote by ¯ P n todifferentiate it from the previous unrefined notion. We copy the definition of Subsection 4.1.We start from ¯ P = ∅ . Assume that ¯ P n − is defined, and let us define ¯ P n . Let k = k ( n ) be the last pivotal time before n , i.e., k = max( ¯ P n − ) . (If ¯ P n − = ∅ , take k = 0 and let y k = o ). Let us say that the local geodesic condition is satisfied at time n if inthe sequence y k , y (0) n , y (1) n , y (2) n , y (3) n , y (4) n , y (5) n , y − n +1 , all successive points are C -aligned, andmoreover y (1) n , y (3) n , y (4) n are C -aligned (the latter condition is useful to compensate the factthat the jump from y (2) n to y (3) n may be small, preventing us to apply the results on chains ofSubsection 3.2). If the local geodesic condition is satisfied at time n , then we say that n isa refined pivotal time, and we set ¯ P n = ¯ P n − ∪ { n } . 
To prove Proposition 5.1, let us introduce a refined notion of pivotal times, in which we will keep the randomness coming from the $\rho_i$. Write $s_i = a_i b_i r_i c_i d_i$, where $a_i, b_i, c_i, d_i$ are distributed according to $\mu_S$ while $r_i$ is distributed according to $\rho_i$. This gives rise to successive points at the $i$-th transition:
\[ y^-_i = y^{(0)}_i = w_0 s_1 \cdots s_{i-1} w_{i-1} \cdot o, \qquad y^{(1)}_i = w_0 s_1 \cdots s_{i-1} w_{i-1} a_i \cdot o, \]
\[ y^{(2)}_i = w_0 s_1 \cdots s_{i-1} w_{i-1} a_i b_i \cdot o, \qquad y^{(3)}_i = w_0 s_1 \cdots s_{i-1} w_{i-1} a_i b_i r_i \cdot o, \]
\[ y_i = y^{(4)}_i = w_0 s_1 \cdots s_{i-1} w_{i-1} a_i b_i r_i c_i \cdot o, \qquad y^+_i = y^{(5)}_i = w_0 s_1 \cdots s_{i-1} w_{i-1} a_i b_i r_i c_i d_i \cdot o. \]
The distance between two successive points in this list is at least $D$, as it comes from the application of an element of the Schottky set $S$, except for the distance between $y^{(2)}_i$ and $y^{(3)}_i$, for which we have no lower bound as $r_i$ is drawn according to $\rho_i$.

Let us define inductively a set of refined pivotal times, that we will denote by $\bar P_n$ to differentiate it from the previous unrefined notion. We copy the definition of Subsection 4.1. We start from $\bar P_0 = \emptyset$. Assume that $\bar P_{n-1}$ is defined, and let us define $\bar P_n$. Let $k = k(n)$ be the last pivotal time before $n$, i.e., $k = \max(\bar P_{n-1})$ (if $\bar P_{n-1} = \emptyset$, take $k = 0$ and let $y_k = o$). Let us say that the local geodesic condition is satisfied at time $n$ if, in the sequence $y_k, y^{(0)}_n, y^{(1)}_n, y^{(2)}_n, y^{(3)}_n, y^{(4)}_n, y^{(5)}_n, y^-_{n+1}$, all successive points are $C$-aligned, and moreover $y^{(1)}_n, y^{(3)}_n, y^{(4)}_n$ are $C$-aligned (the latter condition is useful to compensate for the fact that the jump from $y^{(2)}_n$ to $y^{(3)}_n$ may be small, preventing us from applying the results on chains of Subsection 3.2). If the local geodesic condition is satisfied at time $n$, then we say that $n$ is a refined pivotal time, and we set $\bar P_n = \bar P_{n-1} \cup \{n\}$. Otherwise, we backtrack to the largest refined pivotal time $m \in \bar P_{n-1}$ for which $y^-_{n+1}$ belongs to the $(C + \delta)$ chain-shadow of $y^+_m$ seen from $y_m$. In this case, we erase all later pivotal times, i.e., we set $\bar P_n = \bar P_{n-1} \cap \{1, \ldots, m\}$. If there is no such pivotal time $m$, we set $\bar P_n = \emptyset$.

For the refined notion, we can prove the analogues of the lemmas of Subsection 4.1.
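Since the inductive definition involves some bookkeeping, it may help to see it schematically as pseudocode. The following Python sketch is only an illustration: the geometric content is hidden in two abstract predicates, and all names are ours.

```python
def update_pivotal_times(pivots, n, local_geodesic_ok, in_chain_shadow):
    """One step of the inductive definition of refined pivotal times.

    pivots: increasing list of refined pivotal times (the set P_{n-1}).
    local_geodesic_ok(k, n): abstract predicate, True when the local
        geodesic condition holds at time n, with k the last pivotal time
        (k = 0 when there is none).
    in_chain_shadow(m, n): abstract predicate, True when y^-_{n+1} lies in
        the (C + delta) chain-shadow of y^+_m seen from y_m.
    Returns the new list P_n.
    """
    k = pivots[-1] if pivots else 0
    if local_geodesic_ok(k, n):
        return pivots + [n]          # n becomes a refined pivotal time
    while pivots:
        m = pivots[-1]
        if in_chain_shadow(m, n):
            return pivots            # backtrack: keep the times up to m only
        pivots = pivots[:-1]         # erase the later pivotal time m
    return []                        # no suitable m remains: P_n is empty
```

Lemma 5.6 below shows that the first branch is taken with probability at least $1 - 7\eta$, and Lemma 5.8 that long backtrackings in the second branch are exponentially unlikely.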
Lemma 5.2. Assume that $\bar P_n$ is nonempty. Let $m$ be its maximum. Then $y^-_{n+1}$ belongs to the $(C + \delta)$ chain-shadow of $y^+_m$ seen from $y_m$.

Proof. The proof is exactly the same as for Lemma 4.2: when there is backtracking, this follows from the definition, and when there is no backtracking (i.e., the last pivotal time is $n$), then the chain $y_n, y^-_{n+1}$ satisfies all the properties needed to show that $y^-_{n+1}$ is in the chain-shadow. □

Lemma 5.3.
Let $\bar P_n = \{k_1 < \cdots < k_p\}$. Then the sequence $y^-_{k_1}, y_{k_1}, y^-_{k_2}, y_{k_2}, \ldots, y_{k_p}, y^-_{n+1}$ is a $(2C + 3\delta, D - C - \delta)$-chain. Moreover, $d(y^-_{k_i}, y_{k_i}) \geq d(o, r_{k_i} \cdot o) + D$ for all $i$.

Proof. This differs a little bit from the proof of Lemma 4.3, as there are more points involved at each pivotal time. It is still a matter of basic chain manipulations, the only difficulty being that the jumps corresponding to $r_i$ and $w_i$ may be short (but since they are surrounded by big jumps with controlled alignment conditions, this can be circumvented easily).

By definition, the points $y_{k_{i-1}}, y^-_{k_i}, y^{(1)}_{k_i}, y^{(2)}_{k_i}, y^{(3)}_{k_i}, y^{(4)}_{k_i}, y^{(5)}_{k_i}$ are successively $C$-aligned. However, the distances between $y_{k_{i-1}}$ and $y^-_{k_i}$ on the one hand, and between $y^{(2)}_{k_i}$ and $y^{(3)}_{k_i}$ on the other hand, are not obviously bounded below (contrary to the other distances, which are $\geq D$), so one cannot apply the results on chains to these points directly. However, we can fix this by removing one point: we claim that
(5.1) $y_{k_{i-1}}, y^-_{k_i}, y^{(1)}_{k_i}, y^{(3)}_{k_i}, y^{(4)}_{k_i} (= y_{k_i}), y^{(5)}_{k_i}$ form a $(C + \delta, D - C - \delta)$-chain.

Let us prove this claim. We may apply Lemma 3.5 to the points $y^-_{k_i}, y^{(1)}_{k_i}, y^{(2)}_{k_i}, y^{(3)}_{k_i}$, with $C' = C$, to deduce that $(y^-_{k_i}, y^{(3)}_{k_i})_{y^{(1)}_{k_i}} \leq C + \delta$. Moreover, Lemma 3.4 gives $d(y^{(1)}_{k_i}, y^{(3)}_{k_i}) \geq d(y^{(1)}_{k_i}, y^{(2)}_{k_i}) - (y^{(1)}_{k_i}, y^{(3)}_{k_i})_{y^{(2)}_{k_i}} \geq D - C$. Moreover, $d(y_{k_{i-1}}, y^-_{k_i}) \geq D - C - \delta$ by Lemma 3.10, as $y^-_{k_i}$ is in the $(C + \delta)$ chain-shadow of $y^+_{k_{i-1}}$ seen from $y_{k_{i-1}}$, by Lemma 5.2. Finally, note that $(y^{(1)}_{k_i}, y^{(4)}_{k_i})_{y^{(3)}_{k_i}} \leq C$ by the last assumption in the local geodesic condition. We have checked all the nontrivial properties in (5.1), completing its proof.

We have in particular $d(y_{k_{i-1}}, y^-_{k_i}) \geq D - C - \delta$, and also, by (3.3),
(5.2) $d(y^-_{k_i}, y_{k_i}) = d(y^-_{k_i}, y^{(4)}_{k_i}) \geq d(y^-_{k_i}, y^{(1)}_{k_i}) + d(y^{(1)}_{k_i}, y^{(3)}_{k_i}) + d(y^{(3)}_{k_i}, y^{(4)}_{k_i}) - 2(C + 2\delta)$.

By Lemma 3.4 applied to $y^{(1)}_{k_i}, y^{(2)}_{k_i}, y^{(3)}_{k_i}$,
\[ d(y^{(1)}_{k_i}, y^{(3)}_{k_i}) \geq d(y^{(2)}_{k_i}, y^{(3)}_{k_i}) - (y^{(1)}_{k_i}, y^{(3)}_{k_i})_{y^{(2)}_{k_i}} \geq d(o, r_{k_i} \cdot o) - C. \]
The two other distances in (5.2) are bounded below by $D$. Using $D \geq 2(C + 2\delta) + C$, we obtain $d(y^-_{k_i}, y_{k_i}) \geq D + d(o, r_{k_i} \cdot o)$. This proves all the distance conditions in the claim of the lemma.
Let us now check the Gromov product estimates. Applying Lemma 3.7 to the chain (5.1), we get $(y_{k_{i-1}}, y_{k_i})_{y^-_{k_i}} \leq C + 2\delta \leq 2C + 3\delta$, proving one of the desired estimates. The other one is $(y^-_{k_i}, y^-_{k_{i+1}})_{y_{k_i}} \leq 2C + 3\delta$. To prove it, let us apply Lemma 3.5 to the points $y^-_{k_i}, y_{k_i}, y^+_{k_i}, y^-_{k_{i+1}}$. The Gromov product of the last three is at most $C + 3\delta$ by Lemmas 5.2 and 3.10, and the Gromov product of the first three is at most $C + 2\delta$ by applying Lemma 3.7 to the reverse of the chain (5.1). Moreover, the distance $d(y_{k_i}, y^+_{k_i})$ is at least $D$, large enough. Therefore, Lemma 3.5 indeed applies with $C' = 2C + 2\delta$, and gives $(y^-_{k_i}, y^-_{k_{i+1}})_{y_{k_i}} \leq 2C + 3\delta$, as claimed. □

The first point in the previous chain can be replaced with $o$:

Lemma 5.4.
Let $\bar P_n = \{k_1 < \cdots < k_p\}$. Then the sequence $o, y_{k_1}, y^-_{k_2}, y_{k_2}, \ldots, y_{k_p}, y^-_{n+1}$ is a $(2C + 4\delta, D - C - \delta)$-chain. Moreover, $d(o, y_{k_1}) \geq d(o, r_{k_1} \cdot o) + D - C - 3\delta$.

Proof. The only difference compared to the proof of Lemma 4.4 is that we do not have the inequality $(y_{k_1}, o)_{y^-_{k_1}} \leq C$, due to the more complicated definition of refined pivotal times. If we can prove that $(y_{k_1}, o)_{y^-_{k_1}} \leq C + 3\delta$, the proof of Lemma 4.4 goes through. Let us check this inequality.

As in (5.1), the points $y^-_{k_1}, y^{(1)}_{k_1}, y^{(3)}_{k_1}, y^{(4)}_{k_1} (= y_{k_1}), y^{(5)}_{k_1}$ form a $(C + \delta, D - C - \delta)$-chain. Therefore, $(y^-_{k_1}, y_{k_1})_{y^{(1)}_{k_1}} \leq C + 2\delta$ by Lemma 3.7. Moreover, $(o, y^{(1)}_{k_1})_{y^-_{k_1}} \leq C$ by the definition of pivotal times. As $d(y^{(1)}_{k_1}, y^-_{k_1}) \geq D$ is large, it follows that Lemma 3.5 applies to the points $o, y^-_{k_1}, y^{(1)}_{k_1}, y_{k_1}$ with $C' = C + 2\delta$. It gives $(y_{k_1}, o)_{y^-_{k_1}} \leq C + 3\delta$, concluding the proof that we have a chain.

Moreover, Lemma 3.4 together with Lemma 5.3 give
\[ d(o, y_{k_1}) \geq d(y^-_{k_1}, y_{k_1}) - (o, y_{k_1})_{y^-_{k_1}} \geq (d(o, r_{k_1} \cdot o) + D) - (C + 3\delta), \]
proving the last claim. □
Proposition 5.5. Let $\bar P_n = \{k_1 < \cdots < k_p\}$. We have $d(o, y^-_{n+1}) \geq \sum_i d(o, r_{k_i} \cdot o)$.

Proof. This follows from Lemmas 5.3 and 5.4, saying that we have a chain between $o$ and $y^-_{n+1}$ with jumps of size at least $d(o, r_{k_i} \cdot o) + D - C - 3\delta$, and from Lemma 3.7, saying that the distance grows at least as the size of the jumps along a chain. □
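Schematically, and loosely (the precise statements are those of Subsection 3.2), the mechanism is the one already used in (5.2): along the chain $o = x_0, x_1, \ldots, x_q = y^-_{n+1}$ of Lemma 5.4, each intermediate point costs a bounded amount, giving an estimate of the shape
\[ d(o, y^-_{n+1}) \geq \sum_{j=0}^{q-1} d(x_j, x_{j+1}) - (q - 1)(2C + 5\delta), \]
and since each jump associated to a pivotal time $k_i$ has length at least $d(o, r_{k_i} \cdot o) + D - C - 3\delta$, with $D$ much larger than $C$ and $\delta$, the losses are absorbed and only $\sum_i d(o, r_{k_i} \cdot o)$ remains.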
To prove Proposition 5.1, it follows that we should show that there are many refined pivotal times. For this, we follow the same strategy as in Subsection 4.1.

Lemma 5.6.
Fix $s_1, \ldots, s_n$, and draw $s_{n+1}$ according to $\mu_S^{*2} * \rho_{n+1} * \mu_S^{*2}$. The probability that $|\bar P_{n+1}| = |\bar P_n| + 1$ (i.e., that $n + 1$ gets added as a refined pivotal time) is at least $1 - 7\eta$.

Proof. In the local geodesic condition, there are 7 alignment conditions to be satisfied. When drawing $s_{n+1}$ according to $\mu_S^{*2} * \rho_{n+1} * \mu_S^{*2}$, each of them is satisfied with probability at least $1 - \eta$ (for each of them, this can be seen by fixing all variables but one and using that the last one is picked from a Schottky set). Therefore, they are simultaneously satisfied with probability at least $1 - 7\eta$. □
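Explicitly, the last step is a union bound: if $A_1, \ldots, A_7$ denote the seven alignment events, each of probability at least $1 - \eta$, then
\[ P\Bigl(\bigcap_{i=1}^7 A_i\Bigr) \geq 1 - \sum_{i=1}^7 P(A_i^c) \geq 1 - 7\eta. \]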
To control the backtracking, we use pivoted sequences, as in Subsection 4.1. Given $\bar s = (s_1, \ldots, s_n)$, let us say that another sequence $\bar s' = (s'_1, \ldots, s'_n)$ is pivoted from $\bar s$ if they have the same refined pivotal times, $d'_k = d_k$ at all times, and $a'_k = a_k$, $b'_k = b_k$, $r'_k = r_k$, $c'_k = c_k$ at times which are not refined pivotal times. In other words, we freeze the last jump $d_k$, but we keep the freedom in the other parts of $s_k$, at refined pivotal times only.

The next lemma is proved exactly like Lemma 4.7.

Lemma 5.7.
Let $i$ be a refined pivotal time of $\bar s = (s_1, \ldots, s_n)$. Replace $s_i = a_i b_i r_i c_i d_i$ with $s'_i = a'_i b'_i r'_i c'_i d_i$ which still satisfies the local geodesic condition (with $n$ replaced by $i$). Then $(s_1, \ldots, s'_i, \ldots, s_n)$ is pivoted from $\bar s$.

Denote by $\bar E_n(\bar s)$ the sequences which are pivoted from $\bar s$. Conditionally on $\bar E_n(\bar s)$, the variables $s'_i$ over pivotal times $i$ are independent, but drawn from distributions that depend on $i$.

Lemma 5.8.
Let $\bar s = (s_1, \ldots, s_n)$ be a trajectory with $q$ refined pivotal times. We condition on $\bar E_n(\bar s)$, and we draw $s_{n+1}$ according to $\mu_S^{*2} * \rho_{n+1} * \mu_S^{*2}$. Then, for all $j \geq 0$,
\[ P(|\bar P_{n+1}| < q - j \mid \bar E_n(\bar s)) \leq (7\eta)^{j+1}. \]
Proof. The proof is essentially the same as for Lemma 4.8. Assume that $s_{n+1}$ is fixed and gives rise to some backtracking. Let us show that further backtracking happens with probability at most $7\eta$, from which the estimate follows inductively. Let $m < k$ be the last two refined pivotal times, and let $x_{i-1}$ be the last point in a chain from $y_m$ to $y^-_k$ witnessing that $y^-_k \in CS_{y_m}(y^+_m; C + \delta)$, as guaranteed by Lemma 5.2.

In $s'_k$, let us condition also with respect to $b'_k, r'_k, c'_k$ compatible with the local geodesic condition. Then the total number of possible values for $a'_k$ that give rise to an $s'_k$ satisfying the local geodesic condition is at least $(1 - \eta)|S|$, as one should ensure the condition $((a'_k)^{-1} \cdot o, b'_k \cdot o)_o \leq C$ and $S$ is a Schottky set. Among these, the values of $a'_k$ that may give rise to further backtracking are those for which the points $x_{i-1}, y^-_k, y^{(1)}_k, y^-_{n+1}$ are not $C$-aligned, because this alignment would imply $y^-_{n+1} \in CS_{y_m}(y^+_m; C + \delta)$ (as in the proof of Lemma 4.8) and would block the backtracking. By the Schottky condition applied twice, there are at most $2\eta|S|$ such $a'_k$. Therefore, the probability of further backtracking is at most $2\eta/(1 - \eta) \leq 7\eta$. □

Lemma 5.9.
Let $A_n = |\bar P_n|$ be the number of pivotal times. Then, in distribution, $A_{n+1} \succeq A_n + U$, where $U$ is a random variable independent from $A_n$ and distributed as follows:
\[ P(U = -j) = (1 - 7\eta)(7\eta)^j \text{ for } j \geq 1, \qquad P(U = 0) = 0, \qquad P(U = 1) = 1 - 7\eta. \]
In other words, $P(A_{n+1} \geq i) \geq P(A_n + U \geq i)$ for all $i$.

Proof. This is proved exactly like Lemma 4.9, using Lemma 5.8. □
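Let us record, for convenience, the elementary computations behind this distribution (with the constants as above). Since $\sum_{j \geq 1} (7\eta)^j = 7\eta/(1 - 7\eta)$, the total mass is
\[ (1 - 7\eta) + \sum_{j \geq 1} (1 - 7\eta)(7\eta)^j = (1 - 7\eta) + 7\eta = 1, \]
and since $\sum_{j \geq 1} j (7\eta)^j = 7\eta/(1 - 7\eta)^2$, the expectation is
\[ E(U) = (1 - 7\eta) - (1 - 7\eta) \sum_{j \geq 1} j(7\eta)^j = (1 - 7\eta) - \frac{7\eta}{1 - 7\eta}, \]
which is close to $1$ when $\eta$ is small. This is the value used in the proof of Proposition 5.10 below.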
Proposition 5.10.
There exists $\kappa > 0$, depending only on $\eta$, such that for all $n$,
\[ P(|\bar P_n| \leq (1 - 20\eta) n) \leq e^{-\kappa n}. \]
Proof.
Let $U_1, U_2, \ldots$ be a sequence of independent copies of the variable $U$ from Lemma 5.9. Iterating this lemma gives $P(|\bar P_n| \geq i) \geq P(U_1 + \cdots + U_n \geq i)$ for all $i$. In particular, $P(|\bar P_n| \leq (1 - 20\eta) n) \leq P(U_1 + \cdots + U_n \leq (1 - 20\eta) n)$. The $U_i$ are real random variables with an exponential moment, and expectation $(1 - 7\eta) - 7\eta/(1 - 7\eta) \geq 1 - 15\eta > 1 - 20\eta$. Large deviations for sums of i.i.d. real random variables ensure that $P(U_1 + \cdots + U_n \leq (1 - 20\eta) n)$ is exponentially small. □
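The proposition is easy to observe numerically. The following self-contained Python sketch is an illustration only: it simulates the dominating walk $U_1 + \cdots + U_n$ of Lemma 5.9, not the actual random walk on $X$, and the function names are ours.

```python
import random

def sample_U(eta):
    """Sample U from Lemma 5.9 (with the constants as reconstructed above):
    P(U = 1) = 1 - 7*eta, P(U = -j) = (1 - 7*eta) * (7*eta)**j for j >= 1."""
    p = 7 * eta
    if random.random() < 1 - p:
        return 1
    # Conditionally on U <= -1, the value j is geometric with success
    # probability 1 - p: P(U = -j | U <= -1) = (1 - p) * p**(j - 1).
    j = 1
    while random.random() < p:
        j += 1
    return -j

def bad_event_frequency(eta, n, trials=10_000):
    """Estimate P(U_1 + ... + U_n <= (1 - 20*eta) * n); by Lemma 5.9 this
    dominates P(|P_n| <= (1 - 20*eta) * n) from above."""
    threshold = (1 - 20 * eta) * n
    bad = sum(
        1
        for _ in range(trials)
        if sum(sample_U(eta) for _ in range(n)) <= threshold
    )
    return bad / trials

# The frequency should decay roughly exponentially in n, as in Proposition 5.10.
for n in (50, 100, 200, 400):
    print(n, bad_event_frequency(1 / 100, n))
```

With $\eta = 1/100$, the mean of $U$ is about $0.85$, above the threshold $1 - 20\eta = 0.8$, so the printed frequencies collapse quickly as $n$ grows.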
Proof of Proposition 5.1. We want to bound $P(d(o, y^-_{n+1}) \leq M)$. By Proposition 5.10, we have
(5.3) $P(d(o, y^-_{n+1}) \leq M) \leq P(d(o, y^-_{n+1}) \leq M,\ |\bar P_n| > (1 - 20\eta) n) + e^{-\kappa n}$.
Therefore, we may focus on trajectories with $|\bar P_n| > (1 - 20\eta) n$. Let $\bar s = (s_1, \ldots, s_n)$ be such a trajectory, and $\bar E_n(\bar s)$ its equivalence class under the pivotal relation. We will estimate $P(d(o, y^-_{n+1}) \leq M \mid \bar E_n(\bar s))$.

Along $\bar E_n(\bar s)$, we have $d(o, y^-_{n+1}) \geq \sum_{i=1}^p d(o, r_{k_i} \cdot o)$ by Proposition 5.5, where the pivotal times are $k_1 < \cdots < k_p$ with $p > (1 - 20\eta) n$. Conditionally on $\bar E_n(\bar s)$, the variable $d(o, r_{k_i} \cdot o)$ at each pivotal time stochastically dominates $B_i R_i$, where the $B_i$ are independent Bernoulli random variables with $P(B_i = 1) = 1 - \eta$, and the $R_i$ are independent copies of $R$, independent of the $B_i$.
Since the $B_i$ have expectation $1 - \eta$, the probability $P(\sum_{i=1}^n B_i \leq (1 - 2\eta) n)$ is exponentially small. We get
\[ P(d(o, y^-_{n+1}) \leq M) \leq P\Bigl(\sum_{i=1}^{\lfloor (1 - 20\eta) n\rfloor} B_i R_i \leq M,\ \sum_{i=1}^n B_i > (1 - 2\eta) n\Bigr) + e^{-\kappa' n}. \]
To estimate the probability on the right, let us condition with respect to the $B_i$. There are at most $2\eta n$ of them that vanish. Therefore, $\sum B_i R_i$ is a sum of at least $(1 - 22\eta) n \geq \lfloor (1 - 30\eta) n\rfloor$ independent copies of $R$, and the probability that the sum is at most $M$ is bounded by $P(\sum_{i=1}^{\lfloor (1 - 30\eta) n\rfloor} R_i \leq M)$. As this estimate is uniform over the choice of the $B_i$'s, this concludes the proof. □
5.2. Precise estimates for walks without first moment. In this paragraph, we consider a discrete probability measure $\mu$ on the set of isometries of $X$ which has no first moment: $E(d(o, g \cdot o)) = \infty$ when $g$ is drawn according to $\mu$. We will prove Theorems 1.2 and 1.3 under this assumption. It suffices to prove the latter, as the former follows readily.

Let $r > 0$ be arbitrary. We have to show the existence of $\kappa > 0$ such that $P((Z_n \cdot o, Z_\infty)_o \leq rn) \leq e^{-\kappa n}$. Let $\eta = 1/200$. Let $S$ be an $(\eta, C, D)$-Schottky set in the support of $\mu^{*M}$ for some $M \geq 1$, where $D$ is large enough compared to $C$, as given by Corollary 3.13. We follow the construction in Paragraph 4.2 to reconstruct the $\mu$-random walk, except that instead of sampling the specific jumps from $\mu_S^{*2}$, we will sample them from $\mu_S^{*2} * \mu * \mu_S^{*2}$: for $N = 4M + 1$ and some $\alpha > 0$, we may write $\mu^{*N} = \alpha\, \mu_S^{*2} * \mu * \mu_S^{*2} + (1 - \alpha)\nu$ for some probability measure $\nu$, where $\mu_S$ is the uniform measure on $S$.

The random walk is reconstructed by starting from Bernoulli random variables $\varepsilon_i$ (satisfying $P(\varepsilon_i = 1) = \alpha$ and $P(\varepsilon_i = 0) = 1 - \alpha$), and sampling from $\mu_S^{*2} * \mu * \mu_S^{*2}$ when $\varepsilon_i = 1$ and from $\nu$ when $\varepsilon_i = 0$. Conditioning on $(\varepsilon_i)$ and on the jumps when $\varepsilon_i = 0$, we are left with a walk as in Proposition 5.1. For this walk, we define a sequence of refined pivotal times as in Subsection 5.1. Let $\tau = \tau(n)$ be the last index $j$ such that $N(t_j + 1) \leq n$, so that the interval $[N t_j, N(t_j + 1))$ is contained in $[0, n)$. Then the sequence of refined pivotal times associated to the walk until time $n$ has the form $\bar P_1, \bar P_2, \ldots, \bar P_{\tau - 1}, \bar P^{(n)}_\tau$. Moreover, $u_n := |\bar P^{(n)}_{\tau(n)}|$ satisfies
(5.5) $P(u_n \leq \kappa n) \leq e^{-\kappa n}$
for some $\kappa > 0$: this is proved as Proposition 4.11, just using Proposition 5.10 instead of Proposition 4.10 inside the proof.

Assume now that the walk converges at infinity (this is true almost everywhere) and that $u_k > \kappa k$ for all $k \geq n$ (this is true outside of a set of exponentially small measure, by summing the estimates in (5.5)). Let $x = x_n$ be the position of the walk at the $(\lfloor \kappa n\rfloor - 1)$-th refined pivotal time in $\bar P^{(n)}_{\tau(n)}$. Then for all $k \geq n$, the point $Z_k \cdot o$ belongs to the $(2C + 6\delta)$-shadow of $x$ seen from $o$ (this is proved just like Lemma 4.12, using Lemma 5.4). As in (4.6), this implies the inequality $(Z_n \cdot o, Z_\infty)_o \geq d(o, x_n) - (2C + 9\delta)$.
Finally, we have
\[ P((Z_n \cdot o, Z_\infty)_o \leq rn) \leq e^{-\kappa n} + P(u_n > \kappa n,\ d(o, x_n) \leq rn + (2C + 9\delta)). \]
Let us estimate the rightmost probability. We condition on the $(\varepsilon_i)$ (which fixes $\tau$) and on the jumps when $\varepsilon_i = 0$, to be in the setting of Subsection 5.1. As $x$ is one of the points $y^-_{k+1}$ for $(\kappa/2) n \leq k \leq n$, we can sum the estimates of Proposition 5.1 (applied to $k$ instead of $n$), to get a bound of the form
\[ n\, P(R_1 + \cdots + R_{\lfloor (1 - 30\eta)(\kappa/2) n\rfloor} \leq (r + 1) n), \]
where the $R_i$ are independent random variables distributed like $d(o, g \cdot o)$ where $g$ is drawn according to $\mu$. Letting $\beta = (1 - 30\eta)\kappa/2 > 0$, we get
\[ P((Z_n \cdot o, Z_\infty)_o \leq rn) \leq e^{-\kappa n} + n\, P(R_1 + \cdots + R_{\lfloor \beta n\rfloor} \leq (r + 1) n). \]
Since we are assuming that $\mu$ has no first moment, the nonnegative random variables $R_i$ are not integrable. Applying the usual large deviations estimate to a truncated version of $R$, we deduce that for any $A > 0$ there exists $c(A) > 0$ such that $P(R_1 + \cdots + R_k \leq A k) \leq e^{-c(A) k}$. Together with the previous equation, this gives an exponential bound on $P((Z_n \cdot o, Z_\infty)_o \leq rn)$. This concludes the proof of Theorem 1.3 (and therefore also of Theorem 1.2) when there is no first moment. □
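Let us spell out the truncation step just used, as it is standard but compressed above. For $T > 0$, the variables $R_i \wedge T$ are bounded, and $E(R \wedge T) \to E(R) = \infty$ as $T \to \infty$ by monotone convergence. Given $A > 0$, choose $T$ with $E(R \wedge T) \geq 2A$. Since $R_i \geq R_i \wedge T$,
\[ P(R_1 + \cdots + R_k \leq A k) \leq P\bigl((R_1 \wedge T) + \cdots + (R_k \wedge T) \leq A k\bigr) \leq e^{-c(A) k}, \]
where the last inequality is the usual large deviations bound for i.i.d. bounded variables, whose mean, at least $2A$, exceeds the threshold $A$.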
5.3. Precise estimates for walks with a first moment. Assume now that $\mu$ is a measure with a first moment. Then $E_{\mu^{*n}}(d(o, g \cdot o))/n$ converges by subadditivity to a limit $\ell$, the escape rate of the walk. Let $r < \ell$. Our goal in this paragraph is to prove Theorem 1.3 (and therefore also Theorem 1.2) in this setting: we will show that, for some $\kappa > 0$, we have
\[ P((Z_n \cdot o, Z_\infty)_o \leq rn) \leq e^{-\kappa n}. \]
To prove this estimate, we will again use the refined model of Subsection 5.1, but we will have to do so in a careful enough way.

Fix $\eta > 0$ small enough depending only on $r$ and $\ell$ (how small will be prescribed at the very end of the proof). By Corollary 3.13, there exists an $(\eta, C, D)$-Schottky set $S$ in the support of $\mu^{*M}$ for some $M \geq 1$, where $D$ is large enough compared to $C$. For $N = 2M$, we may write $\mu^{*N} = \alpha \mu_S^{*2} + (1 - \alpha)\nu$ for some probability measure $\nu$. Replacing $\alpha$ with $\alpha/2$ if necessary, we can also assume that $\nu$ is non-elementary.

Let us now fix $A \geq 1$ very large (how large will be prescribed in the course of the proof, depending on $\eta$, $\alpha$ and $\nu$). Let $\varepsilon_i$ be a sequence of Bernoulli random variables, equal to $1$ with probability $\alpha$ and to $0$ with probability $1 - \alpha$. Define inductively a sequence of times $t_1, t'_1, t_2, t'_2, \ldots$ as follows. First, $t_1$ is the first time with $\varepsilon_{t_1} = 1$. Then $t'_1$ is the smallest time $> t_1 + A$ with $\varepsilon_{t'_1} = 1$. Then $t_2$ is the smallest time $> t'_1$ with $\varepsilon_{t_2} = 1$. And so on, picking the first times where $\varepsilon_i = 1$ but keeping a gap at least $A$ between $t_i$ and $t'_i$. Then, pick $\gamma_n$ distributed according to the following measure: if $n$ is of the form $t_i$ or $t'_i$, use $\mu_S^{*2}$. If $n$ is in $[t_i + 1, t_i + A]$, use $\mu^{*N}$. Otherwise, use $\nu$.
Claim 5.11. With this construction, $\gamma_0 \gamma_1 \cdots \gamma_{n-1}$ is distributed like $Z_{Nn}$.

Proof. Conditionally on $\varepsilon_0, \ldots, \varepsilon_{n-1}$ and on $\gamma_0, \ldots, \gamma_{n-1}$, we will show that $\gamma_n$ is distributed according to $\mu^{*N}$, from which the result follows. Consider the maximal $t_j$ or $t'_j$ before $n$. If it is a $t_j$ and $n \leq t_j + A$, then $\gamma_n$ is picked according to $\mu^{*N}$ by definition, and there is nothing left to prove. Otherwise, the choice of the measure for $\gamma_n$ depends on $\varepsilon_n$: we use $\mu_S^{*2}$ if $\varepsilon_n = 1$ (with probability $\alpha$) or $\nu$ if $\varepsilon_n = 0$ (with probability $1 - \alpha$). Altogether, $\gamma_n$ is drawn according to $\alpha \mu_S^{*2} + (1 - \alpha)\nu = \mu^{*N}$, proving the claim. □
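The construction of the times $t_i$, $t'_i$ and of the measure used at each step is algorithmic, and may be easier to parse as pseudocode. The following Python sketch is an illustration of ours: the three samplers are opaque stand-ins for $\mu_S^{*2}$, $\mu^{*N}$ and $\nu$, and all names are ours.

```python
import random

def sample_decomposition(n, alpha, A, sample_mu_S2, sample_mu_N, sample_nu):
    """Sample (gamma_0, ..., gamma_{n-1}) following the construction above.
    Returns the list gamma and the marker times (t, tp) = (t_j, t'_j)."""
    eps = [1 if random.random() < alpha else 0 for _ in range(n)]
    t, tp = [], []
    expecting_t = True  # alternately look for t_j, then t'_j
    floor = 0           # the next marker must occur at a time >= floor
    for i in range(n):
        if eps[i] == 1 and i >= floor:
            if expecting_t:
                t.append(i)
                floor = i + A + 1   # enforce the gap: t'_j > t_j + A
            else:
                tp.append(i)
                floor = i + 1
            expecting_t = not expecting_t
    in_window = set()
    for tj in t:
        in_window.update(range(tj + 1, tj + A + 1))  # the block [t_j+1, t_j+A]
    markers = set(t) | set(tp)
    gamma = []
    for i in range(n):
        if i in markers:
            gamma.append(sample_mu_S2())  # Schottky letters at t_j and t'_j
        elif i in in_window:
            gamma.append(sample_mu_N())   # forced mu^{*N} steps after t_j
        else:
            gamma.append(sample_nu())     # remaining steps use nu
    return gamma, (t, tp)
```

Claim 5.11 asserts that, despite the case analysis, each $\gamma_i$ produced in this way is distributed according to $\mu^{*N}$ conditionally on the past, so that the concatenation reproduces the law of $Z_{Nn}$.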
With a standard coupling argument, extending $\Omega$ if necessary, we can also construct on $\Omega$ a sequence of independent random variables $g_0, g_1, \ldots$ with distribution $\mu$ such that $\gamma_i = g_{iN} \cdots g_{iN + N - 1}$.

The intuition behind the use of this decomposition is the following. Since $\alpha$ is possibly small, the times with $\varepsilon_i = 1$, which have frequency $\alpha$, may be sparse. However, if $A$ is much larger than $1/\alpha$, the waiting time between $t_i + A$ and $t'_i$, or between $t'_i$ and $t_{i+1}$, will be comparatively much shorter. Therefore, the walk will be essentially a concatenation of jumps corresponding to $\mu^{*NA}$. These jumps essentially go in independent directions (this is formalized precisely by Proposition 5.1), so the size of the walk at time $NAk$ will be bounded below by the sum of $(1 - \eta) k$ independent random variables distributed like jumps of $\mu^{*NA}$, which are of order $NA\ell$. Altogether, the probability to have size smaller than $(1 - \eta) NAk\ell$ at time roughly $NAk$ will be exponentially small, proving Theorem 1.2 in this setting.

To make this precise, we will need to control quantitatively the waiting times. Also, the distribution of the jumps between $t_i$ and $t'_i$ is not $\mu^{*NA}$, but $\mu^{*NA} * \nu^{*(t'_i - (t_i + A))}$. We will have to show that the jumps of this family of measures are uniformly controlled from below, to be able to apply Proposition 5.1. Note that this application motivates why we had to formulate this proposition using different measures $\rho_i$ for the different jumps, instead of one single measure $\rho$.

Let us start the proof, adapting the formalism of Subsection 4.2 to our current setting. Fix $n \in \mathbb N$. We let $\tau = \tau(n)$ be the last index $j$ such that $N(t'_j + 1) \leq n$, so that the interval $[N t_j, N(t'_j + 1))$ is contained in $[0, n)$. We will decompose the product $g_0 \cdots g_{n-1}$ as a product of the elements $s'_j$ (the product of all $g_i$ for $i \in [N t_j, N(t'_j + 1))$) interspersed with other words that we will consider as fixed, to be in the framework of Subsection 5.1. Let $w_j = g_{N(t'_j + 1)} \cdots g_{N t_{j+1} - 1}$ (where by convention $t'_0 = -1$), and let $w' = w'(n) = g_{N(t'_{\tau(n)} + 1)} \cdots g_{n-1}$ be the last missing word (it really depends on $n$, contrary to the previous words that just fill the gaps between the blocks $[t_j, t'_j]$). By construction,
\[ Z_n \cdot o = w_0 s'_1 w_1 \cdots w_{\tau - 1} s'_\tau w'(n) \cdot o. \]
We can associate to this decomposition a sequence of refined pivotal times $\bar P^{(n)}_1, \ldots, \bar P^{(n)}_\tau$, where the exponent $(n)$ is here to emphasize that the intermediate words we use depend on $n$. In fact, the only word that really depends on $n$ is the last word $w' = w'(n)$, as the other ones are $w_j = g_{N(t'_j + 1)} \cdots g_{N t_{j+1} - 1}$, so they only depend on the $t_j$ and $t'_j$. Hence, the sequence of refined pivotal times is rather $\bar P_1, \bar P_2, \ldots, \bar P_{\tau - 1}, \bar P^{(n)}_\tau$.

If we condition on the $\varepsilon_i$ (which fixes the $t_i$ and $t'_i$), and on the $g_i$ for $i$ not belonging to $\bigcup [N t_j, N(t'_j + 1))$ (which fixes the $w_i$ and $w'(n)$), then we are in the setting of Proposition 5.1, with $\rho_j = \mu^{*NA} * \nu^{*(t'_j - (t_j + A))}$. To apply this proposition, we need to check that jumps with respect to such a measure are uniformly bounded below.
Lemma 5.12.
Assume that $A$ is large enough. Let $R_{NA}$ be the distribution of the size of jumps for $\mu^{*NA}$. Let $B$ be a Bernoulli random variable, equal to $1$ with probability $1 - \eta$ and to $0$ with probability $\eta$, independent of $R_{NA}$. Then, for any $i \geq 0$, for any $M \geq 0$,
\[ P_{\mu^{*NA} * \nu^{*i}}(d(o, g \cdot o) \geq M) \geq P(B R_{NA} \geq M + \eta N A). \]
In other words, the jumps for $\mu^{*NA} * \nu^{*i}$ dominate stochastically $B R_{NA} - \eta N A$, uniformly in $i$.

Proof. We have
\[ P_{\mu^{*NA} * \nu^{*i}}(d(o, g \cdot o) \geq M) = \sum_h \mu^{*NA}(h)\, P_{\nu^{*i}}(d(o, h g \cdot o) \geq M). \]
By Lemma 4.14 applied to the nonelementary measure $\nu$ and to $\varepsilon = \eta$, there exists $C > 0$ such that, uniformly in $h$, with probability at least $1 - \eta$ with respect to $\nu^{*i}$ for $g$, one has $d(o, h g \cdot o) \geq d(o, h \cdot o) - C$. This gives $P_{\nu^{*i}}(d(o, h g \cdot o) \geq M) \geq (1 - \eta) 1_{d(o, h \cdot o) \geq M + C}$. Therefore,
\[ P_{\mu^{*NA} * \nu^{*i}}(d(o, g \cdot o) \geq M) \geq \sum_{d(o, h \cdot o) \geq M + C} \mu^{*NA}(h)(1 - \eta) = (1 - \eta) P_{\mu^{*NA}}(d(o, h \cdot o) \geq M + C) = (1 - \eta) P(R_{NA} \geq M + C) = P(B R_{NA} \geq M + C). \]
Taking $A$ large enough so that $\eta N A \geq C$, this is bounded from below by $P(B R_{NA} \geq M + \eta N A)$. □

From now on, we will assume that $A$ is large enough so that Lemma 5.12 holds.
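In the proof of Lemma 5.12, the passage from the factor $(1 - \eta)$ to the Bernoulli variable $B$ rests on an elementary identity, recorded here for clarity: for independent $B$ and $R_{NA}$ and any $M' > 0$,
\[ P(B R_{NA} \geq M') = P(B = 1)\, P(R_{NA} \geq M') = (1 - \eta) P(R_{NA} \geq M'). \]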
Lemma 5.13. Assume that $A$ is large enough. The sequence $\tau(n)$ grows like $n/(NA)$ with high probability. More precisely, there exists $c > 0$ such that $P(\tau(n) \leq (1 - \eta) n/(NA)) \leq e^{-cn}$.
Proof. We have
\[ t'_j = Aj + \sum_{i=1}^j (t'_i - (t_i + A)) + \sum_{i=1}^j (t_i - t'_{i-1}). \]
The random variables $t'_i - (t_i + A)$ and $t_i - t'_{i-1}$ are independent and have an exponential tail (just depending on $\alpha$). Therefore, there exist $C > 0$ and $c > 0$ (not depending on $A$) such that
\[ P\Bigl(\sum_{i=1}^j (t'_i - (t_i + A)) + \sum_{i=1}^j (t_i - t'_{i-1}) \geq Cj\Bigr) \leq e^{-cj}. \]
Outside of a set $O_j$ with exponentially small probability, we obtain $t'_j \leq Aj + Cj$. Therefore, $N(t'_j + 1) \leq N(Aj + Cj + 1)$, which is bounded by $NAj/(1 - \eta)$ if $A$ is large enough compared to $C$. Take $j = j(n) = \lfloor (1 - \eta) n/(NA)\rfloor$. It satisfies $NAj/(1 - \eta) \leq n$. On the complement of $O_j$, we have $N(t'_j + 1) \leq n$, and therefore $\tau(n) \geq j$. Hence, the inequality $\tau(n) \leq (1 - \eta) n/(NA)$ can only hold on $O_j$, whose probability is exponentially small in terms of $n$. □
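For instance, $t_i - t'_{i-1}$ is the waiting time for the next index with $\varepsilon = 1$ after $t'_{i-1}$, so it is geometric: $P(t_i - t'_{i-1} > k) = (1 - \alpha)^k$, and in the same way $P(t'_i - (t_i + A) > k) = (1 - \alpha)^k$. This is the source of the exponential tails used in the proof, with constants depending only on $\alpha$.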
Let $u_n := |\bar P^{(n)}_\tau|$ be the number of refined pivotal times up to time $n$.

Lemma 5.14.
There exists $c > 0$ such that $P(u_n \leq (1 - 21\eta) n/(NA)) \leq e^{-cn}$.

Proof. By Lemma 5.13, we have
\[ P(u_n \leq (1 - 21\eta) n/(NA)) \leq e^{-cn} + P(u_n \leq (1 - 21\eta) n/(NA),\ \tau(n) > (1 - \eta) n/(NA)). \]
Let us concentrate on the second set. We condition with respect to the $\varepsilon_i$ (which fixes the $t_i$, the $t'_i$, and $\tau$) and with respect to the $g_i$ outside of the intervals $[N t_j, N(t'_j + 1))$ (which fixes the $w_j$ and $w'$). Once these are fixed, we are in the framework of Subsection 5.1. We may therefore apply Proposition 5.10 and deduce that, conditionally on these quantities, we have $P(u_n \leq (1 - 20\eta)\tau) \leq e^{-c\tau}$ for some $c > 0$. As $\tau > (1 - \eta) n/(NA)$, this gives conditionally $P(u_n \leq (1 - 20\eta)(1 - \eta) n/(NA)) \leq e^{-c(1 - \eta) n/(NA)}$. As $1 - 21\eta \leq (1 - 20\eta)(1 - \eta)$ and the previous bound is uniform over the conditioning, this implies the conclusion. □

Assume now that $Z_k \cdot o$ converges to a point $Z_\infty$ at infinity and that, moreover, $u_k > (1 - 21\eta) k/(NA)$ holds for all $k \geq n$ (this happens outside of a set of exponentially small probability, by Lemma 5.14). Let $\bar t = \bar t(n) = \lfloor (1 - 21\eta) n/(NA)\rfloor < |\bar P^{(n)}_\tau|$, and let $x = x_n$ be the position of the walk at the $\bar t$-th refined pivotal time. An adaptation of Lemma 4.12 to this setting (based on Lemma 5.4) shows that, for all $k \geq n$, the point $Z_k \cdot o$ belongs to the $(2C + 6\delta)$-shadow of $x$ seen from $o$. In turn, as in (4.6), this implies the inequality $(Z_n \cdot o, Z_\infty)_o \geq d(o, x_n) - (2C + 9\delta)$. Finally, we have
\[ P((Z_n \cdot o, Z_\infty)_o \leq rn) \leq e^{-cn} + P(d(o, x_n) \leq rn + (2C + 9\delta)). \]
For large enough $n$, we have $rn + (2C + 9\delta) \leq (r + \eta) n$. Together with Lemma 5.13, we get
\[ P((Z_n \cdot o, Z_\infty)_o \leq rn) \leq e^{-cn} + P(d(o, x_n) \leq (r + \eta) n,\ \tau(n) > (1 - \eta) n/(NA)) \]
for some $c > 0$.

To conclude, it suffices to show that the right-most probability is exponentially small. Let us condition on the $\varepsilon_i$ (which fixes the $t_i$, the $t'_i$ and $\tau$) and on the $g_i$ for $i$ not belonging to $\bigcup [N t_j, N(t'_j + 1))$, to be again in the setting of Subsection 5.1. Note that $\bar t$ is not fixed by this conditioning. However, $x_n$ is one of the points $y^-_{m+1} = w_0 s'_1 w_1 \cdots s'_m w_m \cdot o$, for some $m \geq (1 - 21\eta) n/(NA)$. We claim that it suffices to show that, for such an $m$, we have
(5.6) $P(d(o, y^-_{m+1}) \leq (r + \eta) n) \leq e^{-cm}$.
Indeed, the right hand side is exponentially small in terms of $n$. Summing over $m \in [(1 - 21\eta) n/(NA),\ n/(NA)]$, we get a bound at most $n e^{-c' n}$, which is again exponentially small as desired.

To prove the inequality (5.6), we apply Proposition 5.1, at the time $m$. Lemma 5.12 shows that the stochastic domination assumption of this proposition is satisfied, for $R = B R_{NA} - \eta N A$, where $B$ is a $(1 - \eta)$-Bernoulli random variable. This proposition gives
\[ P(d(o, y^-_{m+1}) \leq (r + \eta) n) \leq P(R_1 + \cdots + R_{\lfloor (1 - 30\eta) m\rfloor} \leq (r + \eta) n) + e^{-cm}, \]
where the $R_i$ are independent copies of $R$. The last term is compatible with (5.6). For the first term, we will apply large deviations for sums of i.i.d. real random variables. We have
\[ E(R_i) = E(R) = (1 - \eta) E(R_{NA}) - \eta N A \geq (1 - \eta) N A \ell - \eta N A, \]
as $E(R_{NA})/(NA)$ is the average drift at time $NA$, which converges to $\ell$ from above by subadditivity. For $z = (1 - \eta) N A \ell - \eta N A < E(R)$, large deviations ensure that $P(R_1 + \cdots + R_k \leq z k)$ is exponentially small in terms of $k$. Therefore, it is enough to show that $(r + \eta) n \leq z (1 - 30\eta) m$ to conclude.
As $m \geq (1 - 21\eta) n/(NA)$, we have
\[ \frac{(r + \eta) n}{z (1 - 30\eta) m} \leq \frac{(r + \eta) n}{((1 - \eta) N A \ell - \eta N A)(1 - 30\eta)(1 - 21\eta) n/(NA)} = \frac{r + \eta}{((1 - \eta)\ell - \eta)(1 - 30\eta)(1 - 21\eta)}. \]
When $\eta$ converges to $0$, this converges to $r/\ell < 1$. Therefore, for small enough $\eta$, it is $\leq 1$, as desired. This concludes the proof of Theorem 1.3 when $\mu$ has a first moment. □
5.4. Continuity of the escape rate. As an illustration of the power of the tools we have introduced above, we can recover the fact that the rate of escape $\ell(\mu)$ depends continuously on the measure $\mu$, a fact that was originally proved in hyperbolic groups by Erschler and Kaimanovich in [EK13] (and which, in the general setting of non-proper hyperbolic spaces, follows from their proof together with the tools of [MT18]).

Proposition 5.15.
Consider a discrete non-elementary measure $\mu$ on the space of isometries of a Gromov-hyperbolic space $X$ with a basepoint $o$. Let $r < \ell(\mu)$. There exist $\varepsilon > 0$ and a finite subset $K$ of the support of $\mu$ with the following property. Let $\mu'$ be a probability measure with $\mu'(g) \geq \mu(g) - \varepsilon$ for all $g \in K$. Then $\ell(\mu') \geq r$.

Even more, there exists $\kappa > 0$ such that, for any $\mu'$ as above, the corresponding random walk $Z'_n$ satisfies for any $n \in \mathbb N$ the inequality
(5.7) $P(d(o, Z'_n \cdot o) \leq rn) \leq e^{-\kappa n}$.

Indeed, all the constants in the proofs in Subsection 5.3 are completely explicit. Once $K$ is chosen large enough and $\varepsilon$ small enough to ensure that $\mu'$ gives a weight bounded from below to all the elements in the Schottky set $S$ chosen at the beginning of this subsection, then all the estimates go through for $\mu'$ just like for $\mu$. In the end, this gives (5.7) with a uniform $\kappa$. This exponential estimate implies $\ell(\mu') \geq r$, as $d(o, Z'_n \cdot o)/n$ converges almost surely to $\ell(\mu')$.

It follows from the proposition that, when $\mu_n$ converges simply to $\mu$, then $\liminf \ell(\mu_n) \geq \ell(\mu)$. This is the nontrivial direction to prove that $\ell(\mu_n) \to \ell(\mu)$, as the other one follows from subadditivity (as $\ell(\mu') = \inf_n (E(d(o, Z'_n \cdot o))/n)$, and each of these quantities for fixed $n$ is continuous in $\mu'$ for the $L^1$ topology). We obtain the following corollary.

Corollary 5.16.
Consider a discrete non-elementary measure $\mu$ on the space of isometries of a Gromov-hyperbolic space $X$ with a basepoint $o$, and a sequence of probability measures $\mu_n$ converging to $\mu$ in the $L^1$ sense, i.e., $\sum_g d(o, g \cdot o)\,|\mu_n(g) - \mu(g)| \to 0$. Then $\ell(\mu_n)$ tends to $\ell(\mu)$.
References

[BMSS20] Adrien Boulanger, Pierre Mathieu, Cagri Sert, and Alessandro Sisto, Large deviations for random walks on hyperbolic spaces, preprint, 2020. Cited pages 1, 2, 3, 7, 9, 10, 15, and 18.
[EK13] Anna Erschler and Vadim Kaimanovich, Continuity of asymptotic characteristics for random walks on hyperbolic groups, Funktsional. Anal. i Prilozhen. 47 (2013), no. 2, 84–89. MR3113872. Cited page 29.
[Fur63] Harry Furstenberg, Noncommuting random products, Trans. Amer. Math. Soc. 108 (1963), 377–428. MR163345. Cited page 1.
[GdlH90] Étienne Ghys and Pierre de la Harpe (eds.), Sur les groupes hyperboliques d’après Mikhael Gromov, Progress in Mathematics, vol. 83, Birkhäuser Boston Inc., Boston, MA, 1990. Papers from the Swiss Seminar on Hyperbolic Groups held in Bern, 1988. MR1086648. Cited page 8.
[MS20] Pierre Mathieu and Alessandro Sisto, Deviation inequalities for random walks, Duke Math. J. 169 (2020), no. 5, 961–1036. Cited page 3.
[MT18] Joseph Maher and Giulio Tiozzo, Random walks on weakly hyperbolic groups, J. Reine Angew. Math. 742 (2018), 187–239. MR3849626. Cited pages 1, 2, 17, and 29.
[Sun20] Matthew Sunderland, Linear progress with exponential decay in weakly hyperbolic groups, Groups Geom. Dyn. 14 (2020), no. 2, 539–566. MR4118628. Cited page 1.

IRMAR, CNRS UMR 6625, Université de Rennes 1, 35042 Rennes, France

Email address: