Subgaussian concentration inequalities for geometrically ergodic Markov chains
JÉRÔME DEDECKER AND SÉBASTIEN GOUËZEL
Abstract.
We prove that an irreducible aperiodic Markov chain is geometrically ergodic if and only if any separately bounded functional of the stationary chain satisfies an appropriate subgaussian deviation inequality from its mean.
Let $K(x_0, \dots, x_{n-1})$ be a function of $n$ variables, which is separately bounded in the following sense: there exist constants $L_i$ such that for all $x_0, \dots, x_{n-1}, x'_i$,

(1) $|K(x_0, \dots, x_{i-1}, x_i, x_{i+1}, \dots, x_{n-1}) - K(x_0, \dots, x_{i-1}, x'_i, x_{i+1}, \dots, x_{n-1})| \le L_i$.

It is well known that, if the random variables $X_0, X_1, \dots$ are i.i.d., then $K(X_0, \dots, X_{n-1})$ satisfies a subgaussian concentration inequality around its average, of the form

(2) $\mathbb{P}(|K(X_0, \dots, X_{n-1}) - \mathbb{E} K(X_0, \dots, X_{n-1})| > t) \le 2 e^{-2t^2/\sum L_i^2}$,

see for instance [McD89]. Such concentration inequalities have also attracted a lot of interest for dependent random variables, due to the wealth of possible applications. For instance, Markov chains with good mixing properties have been considered, as well as weakly dependent sequences.

A particular instance of function $K$ is a sum $\sum f(x_i)$ (also referred to as an additive functional). In this case, one can hope for better estimates than (2), involving for instance the asymptotic variance instead of only the $L_i$ (Bernstein-like inequalities). However, for the case of a general functional $K$, estimates of the form (2) are rather natural.

Under very strong assumptions ensuring that the dependence is uniformly small (say, uniformly ergodic Markov chains, or $\Phi$-mixing dependent sequences), subgaussian concentration inequalities are well known (see [Rio00] for the extension of (2) and [Sam00] for other concentration inequalities). For additive functionals, Lezaud [Lez98, p. 861] proved a Prokhorov-type inequality under a spectral gap condition in $L^2$, from which a subgaussian bound follows.
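The i.i.d. bound (2) is easy to probe numerically. The sketch below is illustrative only and is not taken from the paper: it takes $K$ to be the sample mean of $n$ variables in $[0,1]$ (so $L_i = 1/n$) and compares the empirical deviation probability with the classical bounded-differences bound $2e^{-2t^2/\sum L_i^2}$ of McDiarmid; all numerical parameters are arbitrary choices.

```python
import math
import random

def mcdiarmid_bound(t, L):
    # McDiarmid's bounded-differences bound: 2 * exp(-2 t^2 / sum(L_i^2))
    return 2.0 * math.exp(-2.0 * t * t / sum(l * l for l in L))

random.seed(0)
n, trials, t = 100, 20000, 0.15
# K(x_0, ..., x_{n-1}) = mean of the x_i, each x_i in {0, 1}, so L_i = 1/n.
L = [1.0 / n] * n
mean = 0.5  # E K for Bernoulli(1/2) inputs

exceed = 0
for _ in range(trials):
    xs = [random.randint(0, 1) for _ in range(n)]
    if abs(sum(xs) / n - mean) > t:
        exceed += 1

empirical = exceed / trials
bound = mcdiarmid_bound(t, L)
print(empirical, bound)
assert empirical <= bound  # the empirical deviation probability respects (2)
```

The empirical frequency is in fact much smaller than the bound here, as expected: (2) is a worst-case estimate over all separately bounded functionals.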
However, there are very few such results under weaker assumptions (say, geometrically ergodic Markov chains, or $\alpha$-mixing dependent sequences), where other types of exponential bounds are more usual (let us cite [MPR11] for $\alpha$-mixing sequences and [AB13] for geometrically ergodic Markov chains; see also the references in those two papers for a quite complete picture of the literature). As an exception, let us mention the result of Adamczak, who proves in [Ada08] subgaussian concentration inequalities for geometrically ergodic Markov chains under the additional assumptions that the chain is strongly aperiodic and that the functional $K$ is invariant under permutation of its variables.

Our goal in this note is to prove subgaussian concentration inequalities for aperiodic geometrically ergodic Markov chains, extending the above result of [Ada08]. Such a setting has a wide range of applications, in particular to MCMC (see for instance Section 3.2 in [AB13]). Our proof is mainly a reformulation in probabilistic terms of the proof given in [CG12] for dynamical systems. It is based on a classical coupling estimate (Lemma 6 below), but used in an unusual way along an unusual filtration (the relationship between coupling and concentration has already been explored in [CR09]). Similar results can also be proved for Markov chains that mix more slowly (for instance, if the return times to a small set have a polynomial tail, then polynomial concentration inequalities hold). The interested reader is referred to the articles [CG12] and [GM14] where such results are proved for dynamical systems: the proofs given there can be readily adapted to Markov chains using the techniques we describe in the current paper (the only difficulty is to prove an appropriate coupling lemma extending Lemma 6).
Since the main case of interest for applications is geometrically ergodic Markov chains, and since the proof is more transparent in this case, we only give details for this situation.

Our results are valid for Markov chains on a general state space $S$, but they are already new and interesting for countable state Markov chains. The reader who is unfamiliar with general state space Markov chains is therefore invited to pretend that $S$ is countable. We chose to present our results for general state spaces firstly because of the wealth of applications, and secondly because of a peculiarity of general state spaces that does not exist for countable state spaces: there is a distinction between strongly aperiodic and aperiodic chains, and several mixing results only apply in the strongly aperiodic case (i.e., $m = 1$ in Definition 1 below), while our argument always applies.

From this point on, we consider an irreducible aperiodic positive Markov chain $(X_n)_{n \ge 0}$ on a general state space $S$, whose $\sigma$-field we assume as usual to be countably generated. We refer to the books [Num84] or [MT93] for the classical background on Markov chains on general state spaces. Let us nevertheless recall the meaning of some of the above terms, since it may vary slightly between sources.

First, we are given a measurable transition kernel $P$ of the chain, that is, for any measurable set $A$ in $S$,
$$P(x, A) = \mathbb{P}(X_1 \in A \mid X_0 = x).$$
Starting from any point $x$, we obtain a chain $X_0 = x, X_1, X_2, \dots$, where $X_i$ is distributed according to the measure $P(X_{i-1}, \cdot)$. This chain is irreducible, aperiodic and positive if there exists a (necessarily unique) stationary probability measure $\pi$ such that, for all $x$, all sets $A$ with $\pi(A) > 0$ and all large enough $n$ (depending on $x$ and $A$), one has $P^n(x, A) > 0$ (where $P^n$ denotes the kernel of the Markov chain at time $n$).
Other definitions of irreducibility only require this property to hold for almost every $x$ (in this case, one can restrict to an absorbing set of full $\pi$-measure to obtain it for all $x$ there); we follow the definition of [MT93].

We will be interested in a specific class of such Markov chains, called geometrically ergodic. There are many equivalent definitions of this class, in terms of the properties of the return time to a nice set, or of mixing properties. Essentially, geometrically ergodic Markov chains are those Markov chains that mix exponentially fast; see [MT93, Chapters 15 and 16] for several equivalent characterizations. For instance, they can be defined as follows [MT93, Theorem 15.0.1(ii)].
Definition 1.
An irreducible aperiodic positive Markov chain is geometrically ergodic if the tails of the return time to some small set are exponential. More precisely, there exist a set $C$, an integer $m \ge 1$, a probability measure $\nu$, and $\delta \in (0,1)$, $\kappa > 1$ such that:

• For all $x \in C$, one has

(3) $P^m(x, \cdot) \ge \delta\nu$.

• The return time $\tau_C$ to $C$ satisfies

(4) $\sup_{x \in C} \mathbb{E}_x(\kappa^{\tau_C}) < \infty$.

A set $C$ satisfying (3) is called a small set (there is a related notion of petite set; these notions coincide in irreducible aperiodic Markov chains, see [MT93, Theorem 5.5.7]). In the case of a countable state space, this property is equivalent to the fact that the return time to some (or equivalently any) point has an exponential moment.

From Theorem 15.0.1 of [MT93], it follows that if a chain is geometrically ergodic in the sense of Definition 1, then

(5) $\|P^n(x, \cdot) - \pi\| \le V(x)\rho^n$,

where $\|\cdot\|$ is the total variation norm, $\rho \in (0,1)$ and $V$ is a positive function such that the set $S_V = \{x : V(x) < \infty\}$ is absorbing and of full measure. The property (5) is in fact another classical definition of geometric ergodicity: from Theorem 15.4.2 in [MT93] (or Theorem 6.14 in [Num84]) it follows that if a chain is irreducible, aperiodic, positively recurrent (so that there exists a unique stationary distribution $\pi$) and satisfies (5), then there exists a small set $C$ for which (4) holds.

We prove the following theorem.

Theorem 2.
Let $(X_n)$ be an irreducible aperiodic Markov chain which is geometrically ergodic on a space $S$. Let $\pi$ be its stationary distribution. Let $C$ be a small set as in Definition 1. There exists a constant $M$ (depending on $C$) with the following property. Let $n \in \mathbb{N}$. Let $K(x_0, \dots, x_{n-1})$ be a function of $n$ variables on $S^n$, which is separately bounded with constants $L_i$, as in (1). Then, for all $t > 0$,

(6) $\mathbb{P}_\pi(|K(X_0, \dots, X_{n-1}) - \mathbb{E}_\pi K(X_0, \dots, X_{n-1})| > t) \le 2 e^{-M^{-1} t^2/\sum L_i^2}$,

and for all $x$ in the small set $C$,

(7) $\mathbb{P}_x(|K(X_0, \dots, X_{n-1}) - \mathbb{E}_x K(X_0, \dots, X_{n-1})| > t) \le 2 e^{-M^{-1} t^2/\sum L_i^2}$.

As will be clear from the proof, the constant $M$ can be written explicitly in terms of simple numerical properties of the Markov chain, more precisely of its coupling time and of the return time to the small set $C$. We shall in fact prove (7), and show how it implies (6) (see the first step of the proof of Theorem 2).

Note that there is no strong aperiodicity assumption in our theorems (i.e., we are not requiring $m = 1$), contrary to several mixing results for Markov chains. The reason for this is that we will use the splitting method of Nummelin (see Definition 5 below) only to control coupling times, but we will not need the independence of the blocks between two successive entrance times to the atom of the split chain as in [Ada08]. Following the classical strategy of McDiarmid, we will rather decompose $K$ as a sum of martingale increments, and estimate each of them. However, if we try to use the natural filtration given by the time, we have no control on what happens away from $C$. The main unusual idea in our argument is to use another filtration indexed by the next return to $C$; the rest is mainly routine.

The following remarks show that the above theorem is sharp: it is not possible to weaken the boundedness assumption (1), nor the assumption of geometric ergodicity.

Remark 3.
It is often desirable to have estimates for functions which are unbounded. A typical example in geometrically ergodic Markov chains is the following. Consider an appropriate drift function, i.e., a function $V \ge 1$ which is bounded on a small set $C$ and satisfies $PV(x) \le \rho V(x) + A 1_C(x)$ for some numbers $\rho < 1$ and $A > 0$ (where $P$ is the Markov operator of the chain). One thinks of $V$ as being “large close to infinity”. A natural candidate for stronger concentration inequalities would be functions $K$ satisfying

(8) $|K(x_0, \dots, x_{i-1}, x_i, x_{i+1}, \dots, x_{n-1}) - K(x_0, \dots, x_{i-1}, x'_i, x_{i+1}, \dots, x_{n-1})| \le L_i f(V(x_i) \vee V(x'_i))$,

for some positive function $f$ going to infinity at infinity, for instance $f(t) = \log(1+t)$. Unfortunately, subgaussian concentration inequalities do not hold for such functionals of geometrically ergodic Markov chains: there exists a geometrically ergodic Markov chain such that, for any $M$, for any function $f$ going to infinity, there exist $n$ and a functional $K$ satisfying (8) for which the inequality (6) is violated. Even more, concentration inequalities fail for additive functionals.

Consider for instance the chain on $\{1, 2, \dots\}$ given by $P(1 \to s) = 2^{-s}$ for $s \ge 1$ and $P(s \to s-1) = 1$ for $s > 1$. The function $V(s) = 2^{s/2}$ satisfies the drift condition, for the small set $C = \{1\}$, since $PV(s) = 2^{-1/2} V(s)$ for $s > 1$ and $PV(1) = 2^{-1/2}/(1 - 2^{-1/2}) < \infty$. The stationary measure $\pi$ is given by $\pi(s) = 2^{-s}$. In particular, $V$ is integrable.

Assume by contradiction that a concentration inequality (6) holds for all functionals satisfying the bound (8), for some function $f$ going to infinity and some $M > 0$. Let $\tilde f$ be a nondecreasing function with $\tilde f(x) \le \min(f(x), x)$, tending to infinity at infinity. Define a function $g(s) = \tilde f(V(s))$, except for $s = 1$ where $g(1)$ is chosen so that $\int g \,\mathrm{d}\pi = 0$. Let $K(x_0, \dots, x_{n-1}) = \sum g(x_i)$; it satisfies (8) with $L_i = L$ constant and $\mathbb{E}_\pi K = 0$.

For any $N > 0$ and $n > 0$, the Markov chain has a probability $2^{-n-N}$ to start from $X_0 = n+N$, and then the next $n$ iterates are $n+N-i \ge N$. In this case, $g(X_0) + \cdots + g(X_{n-1}) \ge n g(N)$. Applying (6), we get
$$2^{-n-N} = \pi(n+N) \le \mathbb{P}_\pi(|K - \mathbb{E}_\pi K| \ge n g(N)) \le 2 e^{-M^{-1}(n g(N))^2/(n L^2)} = 2 e^{-M^{-1} L^{-2} g(N)^2 n}.$$
Letting $n$ tend to infinity, we deduce that $M^{-1} L^{-2} g(N)^2 \le \log 2$. This is a contradiction if $N$ is large enough, since $g$ tends to infinity.

For instance, if one takes $f(t) = \sqrt{\ln(t \vee e)}$, then $g$ satisfies the subgaussian condition $\mathbb{E}_\pi(\exp(g(X_0)^2)) < \infty$, but nevertheless the subgaussian inequality for the additive functional $g(X_0) + \cdots + g(X_{n-1})$ fails.

Remark 4.
One may wonder if the subgaussian concentration inequality (6) can be proved for larger classes of Markov chains. This is not the case: (6) characterizes geometrically ergodic Markov chains, as we now explain.
Consider an irreducible aperiodic Markov chain such that (6) holds for any separately bounded functional. We want to prove that it is geometrically ergodic. By [MT93, Theorem 5.2.2], there exists a small set, i.e., a set $C$ satisfying (3), for some $m \ge 1$. If the original chain satisfies subgaussian concentration inequalities, then the chain at times which are multiples of $m$ (called its $m$-skeleton) also does. Moreover, an irreducible aperiodic Markov chain is geometrically ergodic if and only if its $m$-skeleton is, by [MT93, Theorem 15.3.6]. It follows that it suffices to prove the characterization when $m = 1$, which we assume from now on.

The proof uses the split chain of Nummelin (see [Num78] and [Num84]), which we describe now.

Definition 5.
Let $P$ be a transition kernel satisfying (3) for $\delta \in (0,1)$ and $\nu$ a probability measure. The split chain is a Markov chain $Y_n$ on $\bar S = S \times [0,1]$, whose transition kernel $\bar P$ is as follows: if $x \notin C$, then $\bar P((x,t), \cdot) = P(x, \cdot) \otimes \lambda$, where $\lambda$ is the uniform measure on $[0,1]$. If $x \in C$, then if $t \in [0,\delta]$ one sets $\bar P((x,t), \cdot) = \nu \otimes \lambda$, and if $t \in (\delta, 1]$ then $\bar P((x,t), \cdot) = (1-\delta)^{-1}(P(x, \cdot) - \delta\nu) \otimes \lambda$.

Essentially, the corresponding chain behaves as the chain on $S$, except when it enters $C$, where the part of the transition kernel corresponding to $\delta\nu$ is explicitly separated from the rest.

For $x \in S$, let $\mathbb{P}_{\bar x}$ denote the distribution of the Markov chain $Y_n$ started from $\delta_x \otimes \lambda$. The first component of $Y_n$, living on $S$, is then distributed as the original Markov chain started from $x$. In the same way, the chain $Y_n$ started from $\bar\pi = \pi \otimes \lambda$ has a first projection which is distributed as the original Markov chain started from $\pi$. For obvious reasons, we still denote by $X_n$ the first component of $Y_n$.

Let $\bar C = C \times [0,\delta]$. This is an atom of the chain $Y_n$, i.e., $\bar P(y, \cdot)$ does not depend on $y \in \bar C$. We will show that the return time $\tau_{\bar C}$ to $\bar C$ has an exponential moment. Let $C' = C \times [0,1]$, and let $U_n$ be the second component of $Y_n$. Each time the chain $X_n$ enters $C$, i.e., $Y_n$ enters $C'$, then $Y_n$ enters $\bar C$ if and only if $U_n \le \delta$. Denote by $t_k$ the $k$-th visit to $C'$ of the chain $Y_n$, and note that $(t_k)$ is an increasing sequence of stopping times. By the strong Markov property, it follows that $(U_{t_k})$ is an i.i.d. sequence of random variables with common distribution $\lambda$. Let $K(X_1, \dots, X_n) = \sum_{i=1}^n 1_C(X_i)$ denote the number of visits of $X_i$ to $C$. For any $k \le n$, $\{K(X_1, \dots, X_n) \ge k\} = \{t_k \le n\}$. It follows that, for any $k \le n$,
$$\mathbb{P}_{\bar\pi}(\tau_{\bar C} > n) \le \mathbb{P}_\pi(K(X_1, \dots, X_n) < k) + \mathbb{P}_{\bar\pi}(t_k \le n, \tau_{\bar C} > n) \le \mathbb{P}_\pi(K(X_1, \dots, X_n) < k) + \mathbb{P}_{\bar\pi}(t_k \le n, U_{t_1} > \delta, \dots, U_{t_k} > \delta) \le \mathbb{P}_\pi(K(X_1, \dots, X_n) < k) + (1-\delta)^k.$$
Take $k = \varepsilon n$ for $\varepsilon = \pi(C)/2 < \pi(C)$. The subgaussian concentration inequality (6) applied to $K$ gives, for some $c > 0$, the inequality $\mathbb{P}_\pi(K(X_1, \dots, X_n) \le \varepsilon n) \le e^{-cn}$. We deduce that $\tau_{\bar C}$ has an exponential moment, as desired: first for $\bar\pi$, then for its restriction to $\bar C$ since $\bar\pi(\bar C) > 0$, and then for any point in $\bar C$ since it is an atom (i.e., all starting points in $\bar C$ give rise to a chain with the same distribution after time 0). Hence, for some $\kappa > 1$,
$$\sup_{y \in \bar C} \mathbb{E}_y(\kappa^{\tau_{\bar C}}) < \infty.$$
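The splitting of Definition 5 is mechanical to implement. The following sketch is a toy example, not taken from the paper: the three-state kernel $P$, the set $C$, and the pair $(\delta, \nu)$ are arbitrary illustrative choices. It checks the minorization condition (3) with $m = 1$, and verifies empirically that one step of the split chain from $(x, t)$ with $t$ uniform has first marginal $P(x, \cdot)$:

```python
import random

# Toy kernel on S = {0, 1, 2}; row x is the distribution P(x, .).
P = [[0.5, 0.3, 0.2],
     [0.4, 0.4, 0.2],
     [0.1, 0.6, 0.3]]
C = {0, 1}                      # candidate small set (illustrative)
delta = 0.5
nu = [0.6, 0.4, 0.0]            # chosen so that P(x, .) >= delta * nu on C

# Check the minorization condition (3) with m = 1.
for x in C:
    assert all(P[x][s] >= delta * nu[s] - 1e-12 for s in range(3))

def split_step(x, t):
    """One transition of the split chain on S x [0,1] (Definition 5)."""
    u = random.random()          # fresh second coordinate, uniform on [0,1]
    if x in C and t <= delta:
        probs = nu               # regeneration: sample from nu
    elif x in C:
        # residual kernel (P(x,.) - delta*nu) / (1 - delta)
        probs = [(P[x][s] - delta * nu[s]) / (1 - delta) for s in range(3)]
    else:
        probs = P[x]
    return random.choices(range(3), weights=probs)[0], u

# First marginal of a split step from (x, Uniform) agrees with P(x, .):
random.seed(1)
x, trials = 0, 50000
counts = [0, 0, 0]
for _ in range(trials):
    y, _ = split_step(x, random.random())
    counts[y] += 1
freqs = [c / trials for c in counts]
assert all(abs(freqs[s] - P[0][s]) < 0.02 for s in range(3))
```

The last check is just the identity $\delta\nu + (1-\delta) \cdot (P(x, \cdot) - \delta\nu)/(1-\delta) = P(x, \cdot)$, observed through sampling.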
By definition, this shows that the extended chain $Y_n$ is geometrically ergodic in the sense of Definition 1. It is then easy to deduce that $X_n$ also is, as follows. By (5), there exists a measurable function $\bar V$ which is finite $\bar\pi$-almost everywhere such that $\|\bar P^n(y, \cdot) - \bar\pi\| \le \bar V(y)\rho^n$ for $\rho \in (0,1)$ and all $y$. We may take $\bar V(y) = \sup_{n \ge 0} \rho^{-n} \|\bar P^n(y, \cdot) - \bar\pi\|$. For $x \notin C$, this function $\bar V$ is constant on $\{x\} \times [0,1]$ since the chains $Y_n$ starting from $(x,t)$ or $(x,t')$ have the same distribution after time 1. In the same way, for $x \in C$, the function $\bar V$ is constant on $\{x\} \times [0,\delta]$ and on $\{x\} \times (\delta,1]$. In particular, $\bar V$ is bounded, hence integrable, on $\pi$-almost every fiber $\{x\} \times [0,1]$. Letting $V(x) = \int_0^1 \bar V(x,t) \,\mathrm{d}t$, we get $\|(\delta_x \otimes \lambda)\bar P^n - \bar\pi\| \le V(x)\rho^n$ (we use the standard notation: for any measure $\nu$ on $\bar S$, the measure $\nu \bar P^n$ on $\bar S$ is defined by $\nu \bar P^n(A) = \int \bar P^n(y, A) \,\nu(\mathrm{d}y)$). Since the first marginal of the chain $Y_n$ started from $\delta_x \otimes \lambda$ is $X_n$ started from $x$, this yields $\|P^n(x, \cdot) - \pi\| \le V(x)\rho^n$, where $V$ is finite $\pi$-almost everywhere. As we already mentioned, this implies that the chain is geometrically ergodic in the sense of Definition 1, by Theorem 15.4.2 in [MT93].

For the proof of Theorem 2, we will use the following coupling lemma. It says that the chains starting from any point in $C$ or from the stationary distribution can be coupled in such a way that the coupling time has an exponential moment.

Let us first be more precise about what we call a coupling time. In general, a coupling between two random variables $U$ and $V$ is a way to realize these two random variables on a common probability space, usually to assert some closeness property between them. Formally, it is a probability space $\Omega^*$ together with two random variables $U^*$ and $V^*$ on $\Omega^*$, distributed respectively like $U$ and $V$.
Abusing notation, we will usually implicitly identify $U$ and $U^*$, and $V$ and $V^*$.

Let $\mu$ and $\tilde\mu$ be two initial distributions on $S$. They give rise to two chains $X_n$ and $\tilde X_n$. We will construct couplings $(X^*_n)$ and $(\tilde X^*_n)$ between these two chains with the following additional property: there exists a random variable $\tau : \Omega^* \to \mathbb{N}$, the coupling time, such that $X^*_n = \tilde X^*_n$ for all $n \ge \tau$.
Consider an irreducible aperiodic geometrically ergodic Markov chain and a small set $C$ as in Definition 1. There exist constants $M > 0$ and $\kappa > 1$ with the following property. Fix $x \in C$. Consider the Markov chains $X_n$ and $X'_n$ starting respectively from $x$ and from the stationary measure $\pi$. Then there exists a coupling between them with a coupling time $\tau$ such that $\mathbb{E}(\kappa^\tau) \le M$.

While this lemma has a very classical flavor, we have not been able to locate a precise reference in the literature. We stress that the constants $\kappa$ and $M$ are uniform, i.e., they do not depend on $x \in C$.

Proof.
We will first give the proof when the chain is strongly aperiodic, i.e., $m$ in Definition 1 is equal to 1. Then, we will deduce the general case from the strongly aperiodic one.

Assume $m = 1$. We use the split chain $Y_n$ on $\bar S = S \times [0,1]$ introduced in Definition 5. We will use the notations of Remark 4, in particular $\bar C = C \times [0,\delta]$, and $\bar\pi = \pi \otimes \lambda$ is the stationary distribution of $Y_n$. Every time the Markov chain $X_n$ on $S$ returns to $C$, there is by definition a probability $\delta$ that the lifted chain $Y_n$ enters $\bar C$. Hence, it follows from (4) that, for some $\kappa > 1$,

(9) $\sup_{(x,s) \in C \times [0,1]} \mathbb{E}_{(x,s)}(\kappa^{\tau_{\bar C}}) < \infty$.

In the same way, the entrance time to $C$ starting from $\pi$ has an exponential moment, by Theorem 2.5(i) in [NT82]. It follows that, for some $\kappa > 1$,

(10) $\mathbb{E}_{\bar\pi}(\kappa^{\tau_{\bar C}}) < \infty$.

Define $T_0 = \inf\{n \ge 0 : Y_n \in \bar C\}$ and the return times $T_0 + \cdots + T_{i+1} = \inf\{n > T_0 + \cdots + T_i : Y_n \in \bar C\}$. Then $T_0$ is independent of $(T_i)_{i \ge 1}$, and $T_1, T_2, \dots$ are i.i.d. Denote by $\mathbb{P}_{\bar\pi}$ the probability measure on the underlying space starting from the invariant distribution $\bar\pi$, and by $\mathbb{P}_{\bar x}$ the probability measure starting from $\delta_x \otimes \lambda$ for $x \in S$: the corresponding Markov chains lift the Markov chains on $S$ starting from $\pi$ and $x$ respectively. We infer from (9) and (10) that there exist $\kappa > 1$ and $M < \infty$ such that

(11) $\sup_{x \in C} \mathbb{E}_{\bar x}(\kappa^{T_0}) \le M$, $\mathbb{E}_{\bar\pi}(\kappa^{T_0}) \le M$ and $\mathbb{E}(\kappa^{T_1}) \le M$.

Let now $Y_n$ and $Y'_n$ be the Markov chains on $\bar S$ where $Y_0 \sim \delta_x \otimes \lambda$ with $x \in C$, and $Y'_0 \sim \bar\pi$. It follows from (11) that their respective return times $T_0 + \cdots + T_i$ and $T'_0 + \cdots + T'_i$ to $\bar C$ are such that:

• Both $T_0$ and $T'_0$ have a uniformly bounded exponential moment, i.e., $\mathbb{E}(\kappa^{T_0}) \le M$ and $\mathbb{E}(\kappa^{T'_0}) \le M$.

• The times $T_i$ and $T'_i$ for $i \ge 1$ are all independent, identically distributed, and their common distribution $p$ is aperiodic with an exponential moment.

Define $\tau$ as
$$\tau = \inf\{n \ge 0 : \exists i \text{ with } n = T_0 + \cdots + T_i \text{ and } \exists j \text{ with } n = T'_0 + \cdots + T'_j\} + 1.$$
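The variable $\tau$ just defined is one more than the first common renewal epoch of two independent renewal sequences. A small simulation sketch (illustrative only: the inter-arrival law $p$ on $\{1,2,3\}$ is an arbitrary aperiodic choice with an exponential moment, and for simplicity both initial delays are also drawn from $p$) gives an empirical estimate of $\mathbb{E}(\kappa_0^\tau)$ for a modest $\kappa_0 > 1$:

```python
import random

random.seed(2)

def draw_T():
    # Toy aperiodic inter-arrival distribution p on {1, 2, 3}.
    return random.choices([1, 2, 3], weights=[0.5, 0.3, 0.2])[0]

def coupling_time():
    """First common renewal epoch of two independent renewal sequences
    T0 + T1 + ... and T0' + T1' + ..., plus one."""
    a, b = draw_T(), draw_T()   # play the roles of T0 and T0'
    while a != b:
        if a < b:               # advance the lagging renewal sequence
            a += draw_T()
        else:
            b += draw_T()
    return a + 1

samples = [coupling_time() for _ in range(20000)]
kappa0 = 1.05
moment = sum(kappa0 ** tau for tau in samples) / len(samples)
print(moment)                   # finite empirical estimate of E(kappa0^tau)
assert moment < 10.0
```

This only illustrates the mechanism; the actual content of Lindvall's result cited below is that such a $\kappa_0$ exists with $\mathbb{E}(\kappa_0^\tau)$ bounded in terms of $\kappa$, $M$ and $p$ alone.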
Lindvall [Lin79, Page 66] proves that, under the above two assumptions, $\tau$ has an exponential moment: there exist $\kappa_0 > 1$ and $M_0 < \infty$, depending only on $\kappa$, $M$ and $p$, such that $\mathbb{E}(\kappa_0^\tau) \le M_0$.

Let $Y^*_n = Y_n$ if $n < \tau$ and $Y^*_n = Y'_n$ if $n \ge \tau$. As $Y_{\tau-1}$ and $Y'_{\tau-1}$ both belong to the atom $\bar C$ by definition of $\tau$, the strong Markov property shows that $(Y^*_n)_{n \in \mathbb{N}}$ is distributed as $(Y_n)_{n \in \mathbb{N}}$. Hence, we have constructed a coupling between $Y_n$ and $Y'_n$, with a coupling time $\tau$ which has an exponential moment, uniformly in $x$. Considering their first marginals, this yields the desired coupling between $X_n$ (the Markov chain on $S$ started from $x$) and $X'_n$ (the Markov chain on $S$ started from $\pi$). This concludes the proof when $m = 1$.

Assume now $m > 1$. In this case, one uses the $m$-skeleton of the original Markov chain, i.e., the Markov chain at times in $m\mathbb{N}$. By [MT93, Theorem 15.3.6], this $m$-skeleton is still geometrically ergodic, and the return times to $C$ have a uniformly bounded exponential moment. Hence, the result with $m = 1$ yields a coupling between the chains $(X_{mn})_{n \in \mathbb{N}}$ and $(X'_{mn})_{n \in \mathbb{N}}$ started respectively from $x \in C$ and from $\pi$, with a coupling time $\tau$ having a uniformly bounded exponential moment. Thus, we deduce a coupling between $(X_n)_{n \in \mathbb{N}}$ and $(X'_n)_{n \in \mathbb{N}}$ together with a random variable $\tau$ taking values in $m\mathbb{N}$, such that $X_{nm} = X'_{nm}$ for all $nm \in [\tau, +\infty) \cap m\mathbb{N}$ (from the technical point of view, this follows by seeing the fact that $(X_{nm})$ is a subsequence of $(X_n)$ as a coupling between these two sequences, and then using the transitivity of couplings given by Lemma A.1 of [BP79]). This is not yet the desired coupling since there is no guarantee that $X_i = X'_i$ for $i \ge \tau$, $i \notin m\mathbb{N}$. Let $X^*_i = X_i$ for $i < \tau$, and $X^*_i = X'_i$ for $i \ge \tau$. It is distributed as $(X_n)$ by the strong Markov property since $X_\tau = X'_\tau$, and satisfies $X^*_i = X'_i$ for all $i \ge \tau$, as desired.
□

The following lemma readily follows.
Lemma 7.
Under the assumptions of Lemma 6, let $K(x_0, x_1, \dots)$ be a function of finitely or infinitely many variables, satisfying the boundedness condition (1) for some constants $L_i$. Then, for all $x \in C$,
$$|\mathbb{E}_x(K(X_0, X_1, \dots)) - \mathbb{E}_\pi(K(X_0, X_1, \dots))| \le M \sum_{i \ge 0} L_i \rho^i,$$
where $M > 0$ and $\rho < 1$ do not depend on $x$ or $K$.

Proof. Consider the coupling given by the previous lemma, between the Markov chain $X_n$ started from $x$ and the Markov chain $X'_n$ started from the stationary distribution $\pi$. Replacing successively $X'_i$ with $X_i$ for $i < \tau$, we get
$$|K(X_0, X_1, \dots) - K(X'_0, X'_1, \dots)| \le \sum_{i < \tau} L_i.$$
Taking the expectation, we obtain
$$|\mathbb{E}(K(X_0, X_1, \dots)) - \mathbb{E}(K(X'_0, X'_1, \dots))| \le \mathbb{E}\Big(\sum_{i < \tau} L_i\Big) = \sum_i L_i \mathbb{P}(\tau > i) \le \sum_i L_i \kappa^{-i} \mathbb{E}(\kappa^\tau) \le M \sum_i L_i \kappa^{-i}. \quad \square$$

We start the proof of Theorem 2. To simplify the notations, consider $K$ as a function of infinitely many variables, with $L_i = 0$ for $i \ge n$. We start with several simple reductions in the first steps, before giving the real argument in Step 5.

First step: It suffices to prove (7), i.e., the concentration estimate starting from a point $x \in C$.

Indeed, fix some large
$N > 0$, and consider the function $K_N(x_0, \dots, x_{n+N-1}) = K(x_N, \dots, x_{N+n-1})$. It satisfies $L_i(K_N) = 0$ for $i < N$ and $L_i(K_N) = L_{i-N}(K)$ for $N \le i < n+N$. In particular, $\sum L_i(K_N)^2 = \sum L_i(K)^2$. Applying the inequality (7) to $K_N$, we get

(12) $\mathbb{P}_x(|K(X_N, \dots, X_{N+n-1}) - \mathbb{E}_x K(X_N, \dots, X_{N+n-1})| > t) \le 2 e^{-M^{-1} t^2/\sum_{i \ge 0} L_i^2}$.

Let $g_n(x) = \mathbb{E}(K(X_0, \dots, X_{n-1}) \mid X_0 = x) = \mathbb{E}(K(X_N, \dots, X_{N+n-1}) \mid X_N = x)$. When $N \to \infty$, the distribution of $X_N$ converges towards $\pi$ in total variation, by (5). Since $g_n$ is bounded, it follows that
$$\mathbb{E}_x K(X_N, \dots, X_{N+n-1}) = \mathbb{E}_x g_n(X_N) \to \mathbb{E}_\pi g_n(X_0) = \mathbb{E}_\pi K(X_0, \dots, X_{n-1}) \quad \text{as } N \to \infty.$$
Hence, for any $\varepsilon > 0$, their difference is bounded by $\varepsilon$ if $N$ is large enough. We obtain
$$\mathbb{P}_\pi(|K(X_0, \dots, X_{n-1}) - \mathbb{E}_\pi K(X_0, \dots, X_{n-1})| > t) \le \mathbb{P}_\pi(|K(X_0, \dots, X_{n-1}) - \mathbb{E}_x K(X_N, \dots, X_{N+n-1})| > t - \varepsilon) \le \varepsilon + \mathbb{P}_x(|K(X_N, \dots, X_{N+n-1}) - \mathbb{E}_x K(X_N, \dots, X_{N+n-1})| > t - \varepsilon),$$
using again the fact that the total variation distance between $\pi$ and the distribution of $X_N$ starting from $x$ is bounded by $\varepsilon$. Using (12) and then letting $\varepsilon$ tend to 0, we obtain the desired concentration estimate (6) starting from $\pi$, i.e.,
$$\mathbb{P}_\pi(|K(X_0, \dots, X_{n-1}) - \mathbb{E}_\pi K(X_0, \dots, X_{n-1})| > t) \le 2 e^{-M^{-1} t^2/\sum_{i \ge 0} L_i^2}.$$

Second step: It suffices to prove that, for $x \in C$,

(13) $\mathbb{E}_x(e^{K - \mathbb{E}_x K}) \le e^{M \sum_{i \ge 0} L_i^2}$,

for some constant $M$ independent of $K$.

Indeed, assume that this holds. Then, for any $\lambda > 0$,
$$\mathbb{P}_x(K - \mathbb{E}_x K > t) \le \mathbb{E}_x(e^{\lambda K - \lambda \mathbb{E}_x K - \lambda t}) \le e^{-\lambda t} e^{\lambda^2 M \sum_{i \ge 0} L_i^2},$$
by (13) applied to $\lambda K$. Taking $\lambda = t/(2M \sum L_i^2)$, we get a bound $e^{-t^2/(4M \sum L_i^2)}$. Applying also the same bound to $-K$, we obtain
$$\mathbb{P}_x(|K - \mathbb{E}_x K| > t) \le 2 e^{-t^2/(4M \sum L_i^2)},$$
as desired.

Third step: Fix some $\varepsilon > 0$. It suffices to prove (13) assuming moreover that each $L_i$ satisfies $L_i \le \varepsilon$.

Indeed, assume that (13) is proved whenever $L_i(K) \le \varepsilon$ for all $i$. Consider now a general function $K$. Take an arbitrary point $x^* \in S$. Define a new function $\tilde K$ by $\tilde K(x_0, \dots, x_{n-1}) = K(y_0, \dots, y_{n-1})$, where $y_i = x_i$ if $L_i(K) \le \varepsilon$, and $y_i = x^*$ if $L_i(K) > \varepsilon$. This new function $\tilde K$ satisfies $L_i(\tilde K) = L_i(K) 1_{(L_i(K) \le \varepsilon)} \le \varepsilon$. Therefore, it satisfies (13). Moreover, $|K - \tilde K| \le \sum_{L_i(K) > \varepsilon} L_i(K) \le \sum L_i(K)^2/\varepsilon$. Hence,
$$\mathbb{E}_x(e^{K - \mathbb{E}_x K}) \le e^{2 \sum L_i(K)^2/\varepsilon} \, \mathbb{E}_x(e^{\tilde K - \mathbb{E}_x \tilde K}) \le e^{2 \sum L_i(K)^2/\varepsilon} \, e^{M \sum L_i(\tilde K)^2}.$$
This is the desired inequality.

Let us now start the proof of (13) for a function $K$ with $L_i \le \varepsilon$ for all $i$. We consider the Markov chain $X_0, X_1, \dots$ starting from a fixed point $x \in C$. We define a stopping time $\tau_i = \inf\{n \ge i : X_n \in C\}$.
Let $\mathcal{F}_i$ be the $\sigma$-field corresponding to this stopping time: an event $A$ is $\mathcal{F}_i$-measurable if, for all $n$, $A \cap \{\tau_i = n\}$ is measurable with respect to $\sigma(X_0, \dots, X_n)$. Let
$$D_i = \mathbb{E}(K \mid \mathcal{F}_i) - \mathbb{E}(K \mid \mathcal{F}_{i-1}).$$
It is $\mathcal{F}_i$-measurable. By definition of $D_i$,
$$K(X_0, \dots) - \mathbb{E}_x(K(X_0, \dots)) = \sum_{i=1}^n D_i.$$

Fourth step: It suffices to prove that

(14) $\mathbb{E}(e^{D_i} \mid \mathcal{F}_{i-1}) \le e^{M \sum_{k \ge i} L_k^2 \rho^{k-i}}$,

for some $M > 0$ and some $\rho < 1$, both independent of $K$.

Indeed, assume that this inequality holds. Conditioning successively with respect to $\mathcal{F}_{n-1}$, then $\mathcal{F}_{n-2}$, and so on, we get
$$\mathbb{E}(e^{K - \mathbb{E}K}) = \mathbb{E}(e^{\sum D_i}) \le e^{M \sum_{i=0}^n \sum_{k \ge i} L_k^2 \rho^{k-i}} \le e^{M/(1-\rho) \cdot \sum_i L_i^2}.$$
This is the desired inequality.
Fifth step: Proof of (14).

Note first that on the set $\{\tau_{i-1} > i-1\}$ one has $\tau_{i-1} = \tau_i$, and consequently $D_i = 0$. Hence, the following decomposition holds:

(15) $D_i = \sum_{j=i}^\infty (\mathbb{E}(K \mid \mathcal{F}_i) - \mathbb{E}(K \mid \mathcal{F}_{i-1})) 1_{\tau_i = j, \tau_{i-1} = i-1} = \sum_{j=i}^\infty (g_j(X_0, \dots, X_j) - g_{i-1}(X_0, \dots, X_{i-1})) 1_{\tau_i = j, \tau_{i-1} = i-1}$,

where $g_j(x_0, \dots, x_j) = \mathbb{E}_{X_0 = x_j} K(x_0, \dots, x_j, X_1, \dots, X_{n-j-1})$. Here, we have used the fact that
$$\mathbb{E}(K \mid \mathcal{F}_i) 1_{\tau_i = j} = \mathbb{E}(K 1_{\tau_i = j} \mid \mathcal{F}_i) = \mathbb{E}(K 1_{\tau_i = j} \mid X_0, \dots, X_j) = \mathbb{E}(K \mid X_0, \dots, X_j) 1_{\tau_i = j},$$
which is commonly used in the proof of the strong Markov property for stopping times.

Let now $g_{j,\pi}(x_0, \dots, x_j) = \mathbb{E}_{X_0 \sim \pi} K(x_0, \dots, x_j, X_1, \dots, X_{n-j-1})$. By Lemma 7, for any $x_j \in C$,

(16) $|g_j(x_0, \dots, x_j) - g_{j,\pi}(x_0, \dots, x_j)| \le M \sum_{k \ge j+1} L_k \rho^{k-j}$.

From (15) and (16), we infer that

(17) $D_i = \sum_{j=i}^\infty (g_{j,\pi}(X_0, \dots, X_j) - g_{i-1,\pi}(X_0, \dots, X_{i-1})) 1_{\tau_i = j, \tau_{i-1} = i-1} + O\Big(\sum_{k \ge \tau_i + 1} L_k \rho^{k-\tau_i}\Big) + O\Big(\sum_{k \ge i} L_k \rho^{k-i}\Big)$.

Since $\pi$ is the stationary measure, $g_{j,\pi}$ can also be written as
$$g_{j,\pi}(x_0, \dots, x_j) = \mathbb{E}_{X_0 \sim \pi} K(x_0, \dots, x_j, X_{j-i+2}, \dots, X_{n-i}).$$
It follows that

(18) $|g_{j,\pi}(x_0, \dots, x_j) - g_{i-1,\pi}(x_0, \dots, x_{i-1})| \le \sum_{k=i}^j L_k$.

Write $\tau = \tau_i - (i-1)$ for the return time to $C$ after time $i-1$. From (18), we get that

(19) $\Big|\sum_{j=i}^\infty (g_{j,\pi}(X_0, \dots, X_j) - g_{i-1,\pi}(X_0, \dots, X_{i-1})) 1_{\tau_i = j, \tau_{i-1} = i-1}\Big| \le \Big(\sum_{k=i}^{i+\tau-1} L_k\Big) 1_{\tau_{i-1} = i-1}$.

Since $\sum_{k \ge i} L_k \rho^{k-i} \le \sum_{k=i}^{i+\tau-1} L_k + \sum_{k \ge i+\tau} L_k \rho^{k-i-\tau}$, it follows from (17) and (19) that

(20) $|D_i| \le M \Big(\sum_{k=i}^{i+\tau-1} L_k + \sum_{k \ge i+\tau} L_k \rho^{k-i-\tau}\Big) 1_{\tau_{i-1} = i-1}$.

As all the $L_k$ are bounded by $\varepsilon$, we obtain

(21) $|D_i| \le M \varepsilon (\tau + 1/(1-\rho)) 1_{\tau_{i-1} = i-1} \le M \varepsilon \tau 1_{\tau_{i-1} = i-1}$,

enlarging the constant $M$. Choose $\sigma \in [\rho, 1)$. The equation (20) also gives
$$|D_i| \le M \sum_{k \ge i} L_k \sigma^{k-i} \sigma^{-\tau} 1_{\tau_{i-1} = i-1}.$$
By the Cauchy-Schwarz inequality, this yields

(22) $|D_i|^2 \le M^2 \sigma^{-2\tau} \Big(\sum_{k \ge i} L_k^2 \sigma^{k-i}\Big) \Big(\sum_{k \ge i} \sigma^{k-i}\Big) 1_{\tau_{i-1} = i-1} \le M \sigma^{-2\tau} \Big(\sum_{k \ge i} L_k^2 \sigma^{k-i}\Big) 1_{\tau_{i-1} = i-1}$,

again enlarging $M$. We have $e^t \le 1 + t + t^2 e^{|t|}$ for all real $t$. Applying this inequality to $D_i$, taking the conditional expectation with respect to $\mathcal{F}_{i-1}$ and using that $\mathbb{E}(D_i \mid \mathcal{F}_{i-1}) = 0$, this gives
$$\mathbb{E}(e^{D_i} \mid \mathcal{F}_{i-1}) \le 1 + \mathbb{E}(D_i^2 e^{|D_i|} \mid \mathcal{F}_{i-1}).$$
Combining this estimate with (21) and (22), we get
$$\mathbb{E}(e^{D_i} \mid \mathcal{F}_{i-1}) \le 1 + \mathbb{E}\Big(M e^{M\varepsilon\tau} \sigma^{-2\tau} \sum_{k \ge i} L_k^2 \sigma^{k-i} \,\Big|\, \mathcal{F}_{i-1}\Big) 1_{\tau_{i-1} = i-1} \le 1 + M \sum_{k \ge i} L_k^2 \sigma^{k-i} \, \mathbb{E}\big(e^{M\varepsilon\tau} \sigma^{-2\tau} \mid X_{i-1}\big) 1_{X_{i-1} \in C}.$$
By the definition of geometric ergodicity (see Definition 1), one can choose $\varepsilon$ small enough and $\sigma$ close enough to 1 in such a way that
$$\sup_{x \in C} \mathbb{E}\big(e^{M\varepsilon\tau} \sigma^{-2\tau} \mid X_{i-1} = x\big) < \infty.$$
It follows that
$$\mathbb{E}(e^{D_i} \mid \mathcal{F}_{i-1}) \le 1 + M \sum_{k \ge i} L_k^2 \sigma^{k-i} \le e^{M \sum_{k \ge i} L_k^2 \sigma^{k-i}}.$$
This concludes the proof of (14), and of Theorem 2. □
References

[AB13] Radosław Adamczak and Witold Bednorz, Exponential concentration inequalities for additive functionals of Markov chains, arXiv:1201.3569v2, 2013. Cited page 1.

[Ada08] Radosław Adamczak, A tail inequality for suprema of unbounded empirical processes with applications to Markov chains, Electron. J. Probab. (2008), no. 34, 1000–1034. MR2424985. Cited pages 1 and 3.

[BP79] István Berkes and Walter Philipp, Approximation theorems for independent and weakly dependent random vectors, Ann. Probab. (1979), 29–54. MR515811. Cited page 8.

[CG12] Jean-René Chazottes and Sébastien Gouëzel, Optimal concentration inequalities for dynamical systems, Comm. Math. Phys. (2012), 843–889. MR2993935. Cited page 2.

[CR09] Jean-René Chazottes and Frank Redig, Concentration inequalities for Markov processes via coupling, Electron. J. Probab. (2009), no. 40, 1162–1180. MR2511280. Cited page 2.

[GM14] Sébastien Gouëzel and Ian Melbourne, Moment bounds and concentration inequalities for slowly mixing dynamical systems, Electron. J. Probab. (2014), no. 93, 30p. Cited page 2.

[Lez98] Pascal Lezaud, Chernoff-type bound for finite Markov chains, Ann. Appl. Probab. (1998), no. 3, 849–867. MR1627795. Cited page 1.

[Lin79] Torgny Lindvall, On coupling of discrete renewal processes, Z. Wahrsch. Verw. Gebiete (1979), no. 1, 57–70. MR533006. Cited page 7.

[McD89] Colin McDiarmid, On the method of bounded differences, Surveys in combinatorics, 1989 (Norwich, 1989), London Math. Soc. Lecture Note Ser., vol. 141, Cambridge Univ. Press, Cambridge, 1989, pp. 148–188. MR1036755. Cited page 1.

[MPR11] Florence Merlevède, Magda Peligrad, and Emmanuel Rio, A Bernstein type inequality and moderate deviations for weakly dependent sequences, Probab. Theory Related Fields (2011), 435–474. MR2851689. Cited page 1.

[MT93] Sean P. Meyn and Richard L. Tweedie, Markov chains and stochastic stability, Communications and Control Engineering Series, Springer-Verlag London Ltd., London, 1993. MR1287609. Cited pages 2, 3, 5, 6, and 7.

[NT82] Esa Nummelin and Pekka Tuominen, Geometric ergodicity of Harris recurrent Markov chains with applications to renewal theory, Stochastic Process. Appl. (1982), no. 2, 187–202. MR651903. Cited page 7.

[Num78] Esa Nummelin, A splitting technique for Harris recurrent Markov chains, Z. Wahrsch. Verw. Gebiete (1978), no. 4, 309–318. MR0501353. Cited page 5.

[Num84] Esa Nummelin, General irreducible Markov chains and nonnegative operators, Cambridge Tracts in Mathematics, Cambridge University Press, Cambridge, 1984. MR776608. Cited pages 2, 3, and 5.

[Rio00] Emmanuel Rio, Inégalités de Hoeffding pour les fonctions lipschitziennes de suites dépendantes, C. R. Acad. Sci. Paris Sér. I Math. (2000), 905–908. MR1771956. Cited page 1.

[Sam00] Paul-Marie Samson, Concentration of measure inequalities for Markov chains and Φ-mixing processes, Ann. Probab. (2000), 416–461. MR1756011. Cited page 1.

Laboratoire MAP5 UMR CNRS 8145, Université Paris Descartes, Sorbonne Paris Cité
E-mail address : [email protected] IRMAR, CNRS UMR 6625, Université de Rennes 1, 35042 Rennes, France
E-mail address: