Convergence of Markov chain transition probabilities
arXiv [math.PR]
Michael Scheutzow ∗, Dominik Schindler †

Consider a discrete time Markov chain with rather general state space which has an invariant probability measure $\mu$. There are several sufficient conditions in the literature which guarantee convergence of all or $\mu$-almost all transition probabilities to $\mu$ in the total variation (TV) metric: irreducibility plus aperiodicity, equivalence properties of transition probabilities, or coupling properties. In this work, we review and improve some of these criteria in such a way that they become necessary and sufficient for TV convergence of all, respectively $\mu$-almost all, transition probabilities. In addition, we discuss so-called generalized couplings.

Primary 60J05; Secondary 60G10
Keywords. Markov chain; total variation; convergence of transition probabilities; invariant measure; coupling; generalized coupling; irreducibility; Harris chain
1. Introduction
∗ Institut für Mathematik, MA 7-5, Fakultät II, Technische Universität Berlin, Straße des 17. Juni 136, 10623 Berlin, Germany; ms@math.tu-berlin.de
† Department of Mathematics, Imperial College London, South Kensington Campus, London SW7 2AZ, United Kingdom; dominik.schindler19@imperial.ac.uk

It is a classical result that all transition probabilities of a discrete time Markov chain with invariant probability measure (ipm) $\mu$ on a rather general state space $E$ converge to $\mu$ in the total variation metric provided that the chain is recurrent and aperiodic ([10]). Further, Doob's theorem states that under appropriate additional conditions, ultimate equivalence of every pair of transition probabilities implies the same result (see [3, Theorem 4.2.1] or [8]). Finally, the existence of couplings of chains starting at different initial conditions entails total variation convergence to $\mu$. The goal of this paper is to modify the sufficient conditions in the literature in such a way that they become equivalent. It will turn out, for example, that asymptotic equivalence of transition probabilities (which seems to be a new concept) is equivalent to total variation convergence of all transition probabilities. It is also of interest to find weaker conditions which only imply total variation convergence of the transition probabilities starting from $\mu$-almost every $x \in E$. Again we will provide necessary and sufficient conditions similar to those described above. We will also address a convergence property strictly between these two and again we will provide necessary and sufficient conditions. 
Apart from couplings we will also formulate equivalent conditions in terms of generalized couplings for each of the convergence properties.

Throughout this paper $(E, \mathcal{E})$ denotes a measurable space for which $\mathcal{E}$ is countably generated and the diagonal $\Delta := \{(x,x) : x \in E\}$ is in $\mathcal{E} \otimes \mathcal{E}$ (or, equivalently, $\mathcal{E}$ is countably generated and separates points or, equivalently, $\mathcal{E}$ is countably generated and all singletons are in $\mathcal{E}$; see [4, p. 116]). Let $P$ be a Markov kernel on $E$ and denote the corresponding $n$-step transition probability by $P_n(\cdot,\cdot)$, $n \in \mathbb{N}_0$. $\mathbb{P}_x$ denotes the law of the Markov chain starting at $x \in E$. Note that $\mathbb{P}_x$ is a probability measure on $(E^{\mathbb{N}_0}, \mathcal{E}^{\otimes \mathbb{N}_0})$. We will often identify a Markov chain and its Markov kernel $P$ and denote the corresponding Markov chain by $X$. We denote the total variation metric on the space of probability measures on $(E, \mathcal{E})$ by $d$, i.e. $d(\nu_1, \nu_2) := \sup_{A \in \mathcal{E}} |\nu_1(A) - \nu_2(A)|$. We say that $P_n(x,\cdot)$ converges to a probability measure $\mu$ on $(E, \mathcal{E})$ if $P_n(x,\cdot)$ converges to $\mu$ in the total variation metric as $n \to \infty$. Throughout the paper we will assume that $P$ admits an ipm $\mu$ (but we will not assume uniqueness of $\mu$). From now on, the letter $\mu$ will always denote an invariant probability measure of the Markov chain $X$ associated to $P$.

Let $\nu_1$ and $\nu_2$ be measures on the same measurable space $(\bar E, \bar{\mathcal{E}})$. Then we say (as usual) that $\nu_1$ is absolutely continuous with respect to $\nu_2$ (notation $\nu_1 \ll \nu_2$) if $A \in \bar{\mathcal{E}}$ with $\nu_2(A) = 0$ implies $\nu_1(A) = 0$, and that $\nu_1$ and $\nu_2$ are equivalent (denoted $\nu_1 \sim \nu_2$) if they are mutually absolutely continuous. Further we write $\nu_1 \not\perp \nu_2$ if $\nu_1$ and $\nu_2$ are non-singular, i.e. there does not exist a set $A \in \bar{\mathcal{E}}$ such that $\nu_1(A) = 0$ and $\nu_2(A^c) = 0$. Any measure $\xi$ on $(\bar E \times \bar E, \bar{\mathcal{E}} \otimes \bar{\mathcal{E}})$ with marginals $\nu_1$ and $\nu_2$ is called a coupling of $\nu_1$ and $\nu_2$. We write $\xi \in C(\nu_1, \nu_2)$. Recall the coupling equality: for probability measures $\nu_1$ and $\nu_2$ on $(\bar E, \bar{\mathcal{E}})$, we have $d(\nu_1, \nu_2) = \inf\{\xi(\Delta^c) : \xi \in C(\nu_1, \nu_2)\}$, where $\Delta$ here denotes the diagonal of $\bar E \times \bar E$ ([7, Theorem 2.2.2]). 
We will call a pair $(X, Y)$ of $\bar E$-valued random variables defined on the same probability space a coupling of the probability measures $\nu_1$ and $\nu_2$ on $(\bar E, \bar{\mathcal{E}})$ if their joint law is a coupling of $\nu_1$ and $\nu_2$. Below we will deal with the cases $\bar E := E$ and $\bar E := E^{\mathbb{N}_0}$. We will define the concept of a generalized coupling later. Generalized (asymptotic) couplings are particularly useful to prove weak convergence of transition probabilities (see [9] and [2]) but (non-asymptotic) generalized couplings can also be used to establish upper bounds on the total variation distance of transition probabilities (see [5, Proof of Theorem 1.1]).

We will formulate all results in the discrete-time set-up. This is essentially without loss of generality. Indeed, assume that $\mu$ is an invariant probability measure of an $E$-valued continuous-time Markov process. Then $\mu$ is also an ipm of the associated skeleton chain sampled at times $0, h, 2h, \dots$ and for each $x \in E$ total variation convergence of $P_{nh}(x,\cdot)$ to $\mu$ (as $n \to \infty$) for some $h > 0$ is equivalent to total variation convergence of $P_t(x,\cdot)$ to $\mu$ since $t \mapsto d(P_t(x,\cdot), \mu)$ is non-increasing.

Once one has established convergence of all or almost all transition probabilities, it is natural to ask for the speed of convergence. A large number of papers have been devoted to these questions, for example [6], [12] and [7]. We will, however, not touch these questions here.

At some point we will need a stronger condition on the measurable space $(E, \mathcal{E})$: as usual, we say that $(E, \mathcal{E})$ is a Borel space if it is isomorphic (as a measurable space) to a Borel subset of $[0,1]$. In particular, this holds for a complete, separable metric space $E$ equipped with its Borel $\sigma$-field $\mathcal{E}$.
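For a finite state space the total variation distance and the coupling equality can be checked by hand. The following is a minimal sketch, not taken from the paper: the function names `tv_distance` and `maximal_coupling` and the two example distributions are illustrative assumptions. It builds a coupling that keeps the common mass $\min(p_i, q_i)$ on the diagonal and spreads the remaining mass as a product, so that the mass off the diagonal equals the total variation distance.

```python
from fractions import Fraction as F

def tv_distance(p, q):
    """sup_A |p(A) - q(A)| for two distributions on a finite set."""
    return sum(abs(a - b) for a, b in zip(p, q)) / 2

def maximal_coupling(p, q):
    """Joint law on E x E with marginals p and q (hypothetical helper).

    The common part min(p_i, q_i) sits on the diagonal; the residual
    masses are coupled independently, which never hits the diagonal.
    """
    n = len(p)
    common = [min(a, b) for a, b in zip(p, q)]
    d = 1 - sum(common)                      # equals tv_distance(p, q)
    xi = [[F(0)] * n for _ in range(n)]
    for i in range(n):
        xi[i][i] = common[i]
    if d > 0:
        for i in range(n):
            for j in range(n):
                xi[i][j] += (p[i] - common[i]) * (q[j] - common[j]) / d
    return xi

# Illustrative distributions on a three-point space (assumed, not from the paper).
p = [F(1, 2), F(1, 2), F(0)]
q = [F(1, 4), F(1, 4), F(1, 2)]
xi = maximal_coupling(p, q)
off_diagonal = sum(xi[i][j] for i in range(3) for j in range(3) if i != j)
row_marginal = [sum(xi[i]) for i in range(3)]
col_marginal = [sum(xi[i][j] for i in range(3)) for j in range(3)]
```

Exact rational arithmetic is used so that the marginals and the off-diagonal mass can be compared without rounding error.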
2. Necessary and sufficient conditions for total variation convergence
Let $(X_n)_{n \in \mathbb{N}_0}$ be a Markov chain with transition kernel $P$, ipm $\mu$ and state space $(E, \mathcal{E})$ as in the introduction. We adopt the following notation (cf. [10]).

Notation 2.1. For $x \in E$, $A \in \mathcal{E}$,
$$Q(x, A) := \mathbb{P}_x\big(\{X_n \in A \text{ for infinitely many } n \in \mathbb{N}\}\big), \qquad L(x, A) := \mathbb{P}_x\Big(\bigcup_{n=1}^{\infty} \{X_n \in A\}\Big).$$

We start by defining three properties of increasing generality which we will be interested in.
Properties 2.2. We say that
• Property $\mathbf{P}_1$ holds if $P_n(x,\cdot)$ converges to $\mu$ for every $x \in E$.
• Property $\mathbf{P}_2$ holds if $P_n(x,\cdot)$ converges to $\mu$ for $\mu$-almost all $x \in E$ and $\lim_{n \to \infty} d(P_n(x,\cdot), \mu) < 1$ for all $x \in E$.
• Property $\mathbf{P}_3$ holds if $P_n(x,\cdot)$ converges to $\mu$ for $\mu$-almost all $x \in E$.

Remark 2.3.
Note that Properties $\mathbf{P}_1$ and $\mathbf{P}_2$ both imply uniqueness of $\mu$ (we will show the latter claim in Remark 5.1). Note also that $\lim_{n \to \infty} d(P_n(x,\cdot), \mu)$ always exists since $\mu$ is invariant and the total variation distance can never increase when applying a measurable map. Therefore, we could replace "$\lim_{n \to \infty} d(P_n(x,\cdot), \mu) < 1$ for all $x \in E$" in $\mathbf{P}_2$ by "for each $x$ there exists some $n \in \mathbb{N}$ such that $d(P_n(x,\cdot), \mu) < 1$" without changing the class of chains for which $\mathbf{P}_2$ holds.

One might also be interested in a modification $\tilde{\mathbf{P}}_2$ of Property $\mathbf{P}_2$ in which the last property, $\lim_{n \to \infty} d(P_n(x,\cdot), \mu) < 1$ for all $x \in E$, is replaced by uniqueness of $\mu$. Clearly, $\mathbf{P}_2$ is stronger than $\tilde{\mathbf{P}}_2$ and it is easy to see that it is strictly stronger. Property $\tilde{\mathbf{P}}_2$ was studied in [8], for example, but $\mathbf{P}_2$ is more closely related to conditions studied in the literature. We will see, in particular, that the assumptions of [8, Corollary 1] do not only imply $\tilde{\mathbf{P}}_2$ but even $\mathbf{P}_2$. Example 5.2 shows that one cannot delete the first part of property $\mathbf{P}_2$ without changing the class of chains for which it holds.

We will define four sets of assumptions, one in terms of equivalence or non-singularity of transition probabilities, one in terms of aperiodicity and recurrence or irreducibility properties, one in terms of couplings and one in terms of generalized couplings. It will turn out that all assumptions with index $i$, $i \in \{1, 2, 3\}$, not only imply property $\mathbf{P}_i$ but are also necessary for $\mathbf{P}_i$ to hold. In some cases we formulate conditions with an additional prime (or some other symbol) which will formally be stronger than the same condition without prime but which will in fact turn out to be equivalent (at least when the state space is Borel). Before we state the various assumptions we define the (possibly new) concept of asymptotic equivalence of transition probabilities.

Definition 2.4.
We say that the states $x \in E$ and $y \in E$ are asymptotically equivalent if for each $\varepsilon > 0$ there exist some $n \in \mathbb{N}$ and a set $A \in \mathcal{E}$ such that $P_n(x, A) \ge 1 - \varepsilon$, $P_n(y, A) \ge 1 - \varepsilon$, and the measures $P_n(x,\cdot)$ and $P_n(y,\cdot)$ restricted to the set $A$ are equivalent.

Remark 2.5.
Note that if for given $x, y \in E$, $\varepsilon > 0$ and $n \in \mathbb{N}$ there exists a set $A$ as in the previous definition, then there exists a set $\bar A$ as in the previous definition (with the same $\varepsilon$) if $n$ is replaced by $n + 1$ (and, by iteration, the same holds for all integers larger than $n$). This implies, in particular, that asymptotic equivalence induces an equivalence relation on $E$.

Assumptions 2.6.
We say that
• Assumption $\mathbf{A}_1$ holds if all pairs $(x, y) \in E \times E$ are asymptotically equivalent.
• Assumption $\mathbf{A}_2$ holds if for all $(x, y) \in E \times E$ there exists some $n = n_{x,y} \in \mathbb{N}$ such that $P_n(x,\cdot) \not\perp P_n(y,\cdot)$.
• Assumption $\mathbf{A}_3$ holds if for $\mu \otimes \mu$-almost all $(x, y) \in E \times E$ there exists some $n = n_{x,y} \in \mathbb{N}$ such that $P_n(x,\cdot) \not\perp P_n(y,\cdot)$.
• Assumption $\mathbf{A}_3'$ holds if $\mu \otimes \mu$-almost all $(x, y) \in E \times E$ are asymptotically equivalent.

Lemma A.7 states that the set of all $(x, y) \in E \times E$ which are asymptotically equivalent is a measurable subset of $(E \times E, \mathcal{E} \otimes \mathcal{E})$.

Remark 2.7.
Obviously, Property $\mathbf{P}_1$ implies that any two states $x, y$ are asymptotically equivalent (i.e. $\mathbf{A}_1$ holds) while the simple Example 5.3 shows that it does not imply the stronger property "for all $x, y \in E$ there exists some $n = n_{x,y} \in \mathbb{N}$ such that $P_n(x,\cdot) \sim P_n(y,\cdot)$" under which $\mathbf{P}_1$ was shown in [8, Theorem 1].

Before we state the second set of assumptions, we define the concepts of aperiodicity, irreducibility and the Harris property for a Markov kernel $P$ with invariant measure $\mu$.

Definition 2.8. [12, p. 32] The Markov kernel $P$ (with invariant probability measure $\mu$) is called $d$-periodic if $d \ge 2$ and there are disjoint sets $E_1, E_2, \dots, E_d \in \mathcal{E}$ with $\mu(E_1) > 0$ that fulfill
$$P\big(x, E_{i+1 \ (\mathrm{mod}\ d)}\big) = 1 \quad \forall x \in E_i, \ 1 \le i \le d. \qquad (1)$$
The chain is called aperiodic if no such $d \ge 2$ exists.

Definition 2.9.
The Markov kernel $P$ is called $\varphi$-irreducible if $\varphi$ is a non-trivial $\sigma$-finite measure on $(E, \mathcal{E})$ such that for all $A \in \mathcal{E}$ with $\varphi(A) > 0$ and all $x \in E$ we have $L(x, A) > 0$ (or, equivalently, there exists some $n = n(x, A) \in \mathbb{N}$ such that $P_n(x, A) > 0$). $P$ is called irreducible if $P$ is $\varphi$-irreducible for some non-trivial $\varphi$. We say that $P$ is weakly irreducible (with respect to the given ipm $\mu$) if there exist some non-trivial $\sigma$-finite measure $\varphi$ on $(E, \mathcal{E})$ and a set $E_0 \in \mathcal{E}$ satisfying $\mu(E_0) = 1$ such that for every $x \in E_0$ and every $A \in \mathcal{E}$ with $\varphi(A) > 0$ we have $L(x, A) > 0$.

Remark 2.10.
It is straightforward to check that if $\varphi$ is as in the definition (either part), then $\varphi \ll \mu$. Further, if $P$ is (weakly) $\mu$-irreducible then $P$ is (weakly) $\varphi$-irreducible for every non-trivial $\sigma$-finite measure $\varphi$ on $(E, \mathcal{E})$ satisfying $\varphi \ll \mu$. We will show in Proposition A.1 the less obvious fact that ($\varphi$-)irreducibility implies $\mu$-irreducibility (which, in the terminology of [10, Proposition 4.2.2], means that $\mu$ is the maximal irreducibility measure). We will use Proposition A.1 only in the proof of Theorem 2.17.

Definition 2.11. [10, p. 199] $P$ or the associated Markov chain $X$ are called Harris (or Harris recurrent) if there exists a non-trivial $\sigma$-finite measure $\varphi$ on $(E, \mathcal{E})$ such that for all $A \in \mathcal{E}$ with $\varphi(A) > 0$ and all $x \in E$ we have $Q(x, A) = 1$ (or, equivalently, $L(x, A) = 1$ for all $x \in E$ and $A \in \mathcal{E}$ with $\varphi(A) > 0$).

Assumptions 2.12.
We say that
• Assumption $\mathbf{B}_1$ holds if $P$ is aperiodic and Harris.
• Assumption $\mathbf{B}_2$ holds if $P$ is aperiodic and irreducible.
• Assumption $\mathbf{B}_3$ holds if $P$ is aperiodic and weakly irreducible.

Note that Harris recurrence implies irreducibility, so $\mathbf{B}_1$ implies $\mathbf{B}_2$.

Let $M(\bar E)$ be the set of all probability measures on the measurable space $(\bar E, \bar{\mathcal{E}})$. For $\xi \in M(\bar E \times \bar E)$, we denote the $i$-th marginal by $\xi^i$, $i \in \{1, 2\}$. If $(\bar E, \bar{\mathcal{E}}) = (E^{\mathbb{N}_0}, \mathcal{E}^{\otimes \mathbb{N}_0})$, then we denote the projection of $\xi$ resp. $\xi^i$ onto the $k$-th coordinate by $\xi_k$ resp. $\xi^i_k$, $k \in \mathbb{N}_0$, $i \in \{1, 2\}$.

Assumptions 2.13.
We say that
• Assumption $\mathbf{C}_1$ holds if for each $x, y \in E$ and $m \in \mathbb{N}$ there exist some $k_m \in \mathbb{N}$ and a coupling $\zeta^{[m]} \in C(P_{k_m}(x,\cdot), P_{k_m}(y,\cdot))$ such that $\zeta^{[m]}(\Delta) \ge 1 - \frac{1}{m}$.
• Assumption $\hat{\mathbf{C}}_1$ holds if for each $x, y \in E$ and $m \in \mathbb{N}$ there exists a coupling $\zeta^{[m]} \in C(P_m(x,\cdot), P_m(y,\cdot))$ such that $\lim_{m \to \infty} \zeta^{[m]}(\Delta) = 1$.
• Assumption $\mathring{\mathbf{C}}_1$ holds if for each $x, y \in E$ there exists a coupling $\xi \in C(\mathbb{P}_x, \mathbb{P}_y)$ such that $\lim_{m \to \infty} \xi_m(\Delta) = 1$.
• Assumption $\mathbf{C}_1'$ holds if for each $x, y \in E$ there exists a coupling $(X_k)_{k \in \mathbb{N}_0}, (Y_k)_{k \in \mathbb{N}_0}$ of $\mathbb{P}_x$ and $\mathbb{P}_y$ on some space $(\Omega, \mathcal{F}, \mathbb{P})$ such that $\lim_{n \to \infty} \mathbb{P}(X_k = Y_k \text{ for all } k \ge n) = 1$.
• Assumption $\mathbf{C}_2$ holds if for all $x, y \in E$ there exist some $k \in \mathbb{N}$ and a coupling $\zeta \in C(P_k(x,\cdot), P_k(y,\cdot))$ such that $\zeta(\Delta) > 0$.
• Assumption $\mathbf{C}_2'$ holds if for each $x, y \in E$ there exists a coupling $(X_k)_{k \in \mathbb{N}_0}, (Y_k)_{k \in \mathbb{N}_0}$ of $\mathbb{P}_x$ and $\mathbb{P}_y$ on some space $(\Omega, \mathcal{F}, \mathbb{P})$ such that $\liminf_{n \to \infty} \mathbb{P}(X_k = Y_k \text{ for all } k \ge n) > 0$, and for $\mu \otimes \mu$-almost every $(x, y) \in E \times E$ there exists a coupling $(X_k)_{k \in \mathbb{N}_0}, (Y_k)_{k \in \mathbb{N}_0}$ of $\mathbb{P}_x$ and $\mathbb{P}_y$ on some space $(\Omega, \mathcal{F}, \mathbb{P})$ such that $\lim_{n \to \infty} \mathbb{P}(X_k = Y_k \text{ for all } k \ge n) = 1$.
• Assumption $\mathbf{C}_3$ holds if for $\mu \otimes \mu$-almost every $(x, y) \in E \times E$ there exist some $k \in \mathbb{N}$ and a coupling $\zeta \in C(P_k(x,\cdot), P_k(y,\cdot))$ such that $\zeta(\Delta) > 0$.
• Assumption $\mathbf{C}_3'$ holds if for $\mu \otimes \mu$-almost every $(x, y) \in E \times E$ there exists a coupling $(X_k)_{k \in \mathbb{N}_0}, (Y_k)_{k \in \mathbb{N}_0}$ of $\mathbb{P}_x$ and $\mathbb{P}_y$ on some space $(\Omega, \mathcal{F}, \mathbb{P})$ such that $\lim_{n \to \infty} \mathbb{P}(X_k = Y_k \text{ for all } k \ge n) = 1$.

We chose Condition $\mathbf{C}_i$ such that it is as weak as possible and $\mathbf{C}_i'$ such that it is as strong as possible subject to the requirement that both are equivalent to all other conditions with the same index $i$ (in case the state space is Borel). Note that there are several natural conditions in between $\mathbf{C}_i$ and $\mathbf{C}_i'$ ($i = 1, 2, 3$) for which there is no need to state them, since they will all turn out to be equivalent (at least in the Borel case). 
Finally, we define the concept of a generalized coupling.

Definition 2.14. For probability measures $\nu_1$ and $\nu_2$ on $(\bar E, \bar{\mathcal{E}})$, define
• $\tilde C(\nu_1, \nu_2) := \{\xi \in M(\bar E \times \bar E) : \xi^1 \ll \nu_1, \ \xi^2 \ll \nu_2\}$,
• $\check C(\nu_1, \nu_2) := \{\xi \in M(\bar E \times \bar E) : \xi^1 \ll \nu_1, \ \xi^2 \sim \nu_2\}$.

Assumptions 2.15.
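We say that
• Assumption $\mathbf{G}_1$ holds if for each pair $(x, y) \in E \times E$ there exists some $\xi \in \check C(\mathbb{P}_x, \mathbb{P}_y)$ such that $\lim_{k \to \infty} \xi_k(\Delta) = 1$,
• Assumption $\mathbf{G}_2$ holds if for each pair $(x, y) \in E \times E$ there exist some $k \in \mathbb{N}$ and $\zeta \in \tilde C(P_k(x,\cdot), P_k(y,\cdot))$ such that $\zeta(\Delta) > 0$,
• Assumption $\mathbf{G}_3$ holds if for $\mu \otimes \mu$-almost every $(x, y) \in E \times E$ there exist some $k \in \mathbb{N}$ and $\zeta \in \tilde C(P_k(x,\cdot), P_k(y,\cdot))$ such that $\zeta(\Delta) > 0$.

If we change "$> 0$" in $\mathbf{G}_2$ to "$= 1$", then the resulting condition is not equivalent to $\mathbf{G}_1$ (see Example 5.4).

Theorem 2.16. $\mathbf{A}_1$, $\mathbf{B}_1$, $\mathbf{C}_1$, $\hat{\mathbf{C}}_1$, and $\mathbf{P}_1$ are equivalent and $\mathbf{C}_1' \Rightarrow \mathring{\mathbf{C}}_1 \Rightarrow \mathbf{G}_1 \Rightarrow \mathbf{A}_1$. If $(E, \mathcal{E})$ is Borel, then all these conditions are equivalent.

Theorem 2.17. $\mathbf{A}_2$, $\mathbf{B}_2$, $\mathbf{C}_2$, $\mathbf{G}_2$, and $\mathbf{P}_2$ are equivalent and are implied by $\mathbf{C}_2'$. If $(E, \mathcal{E})$ is Borel, then each of the equivalent conditions implies $\mathbf{C}_2'$.

Theorem 2.18. $\mathbf{A}_3$, $\mathbf{A}_3'$, $\mathbf{B}_3$, $\mathbf{C}_3$, $\mathbf{G}_3$, and $\mathbf{P}_3$ are equivalent and are implied by $\mathbf{C}_3'$. If $(E, \mathcal{E})$ is Borel, then each of the equivalent conditions implies $\mathbf{C}_3'$.

Remark 2.19.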
We do not know if the equivalence of all conditions with the same index holds even under our general conditions on the space $(E, \mathcal{E})$. We will comment on this in Remark 5.8.
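On a finite state space the distance $d(P_n(x,\cdot), \mu)$ appearing in Properties 2.2 can be computed exactly. The following is a minimal sketch under stated assumptions: the helper names `step` and `tv_to_mu` and the "lazy" chain are illustrative choices, not objects from the paper; the flip chain is the two-state chain of Example 5.2. For the flip chain the distance stays at $1/2$, so no transition probability converges, while for the aperiodic lazy chain the distance halves at every step.

```python
from fractions import Fraction as F

def step(row, P):
    """One step of the chain: row vector times transition matrix."""
    n = len(P)
    return [sum(row[i] * P[i][j] for i in range(n)) for j in range(n)]

def tv_to_mu(P, mu, x, n):
    """d(P_n(x, .), mu) for a finite transition matrix P (hypothetical helper)."""
    row = [F(int(i == x)) for i in range(len(P))]
    for _ in range(n):
        row = step(row, P)
    return sum(abs(a - b) for a, b in zip(row, mu)) / 2

mu = [F(1, 2), F(1, 2)]                        # invariant for both chains below
flip = [[F(0), F(1)], [F(1), F(0)]]            # 2-periodic chain of Example 5.2
lazy = [[F(3, 4), F(1, 4)], [F(1, 4), F(3, 4)]]  # aperiodic, assumed example

flip_distances = [tv_to_mu(flip, mu, 0, n) for n in range(6)]
lazy_distances = [tv_to_mu(lazy, mu, 0, n) for n in range(6)]
```

In this sketch `flip_distances` is constant while `lazy_distances` decreases geometrically, which mirrors the role of aperiodicity in Assumptions 2.12.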
3. First results and the proof of Theorem 2.16
Let us first state those implications in the theorems which are obvious from the definitions or are well-known.

Proposition 3.1. We have
a) $\mathbf{B}_1 \Rightarrow \mathbf{P}_1$,
b) $\mathbf{C}_1' \Rightarrow \mathring{\mathbf{C}}_1 \Rightarrow \mathbf{G}_1$, $\mathbf{P}_1 \Rightarrow \hat{\mathbf{C}}_1 \Rightarrow \mathbf{C}_1 \Rightarrow \mathbf{A}_1$,
c) $\mathbf{C}_2' \Rightarrow \mathbf{C}_2 \Rightarrow \mathbf{G}_2 \Rightarrow \mathbf{A}_2$, $\mathbf{P}_2 \Rightarrow \mathbf{A}_2 \Leftrightarrow \mathbf{C}_2$,
d) $\mathbf{P}_3 \Rightarrow \mathbf{A}_3' \Rightarrow \mathbf{A}_3$, $\mathbf{C}_3' \Rightarrow \mathbf{C}_3 \Leftrightarrow \mathbf{A}_3$, and $\mathbf{C}_3 \Rightarrow \mathbf{G}_3 \Rightarrow \mathbf{A}_3$.

Proof. Statement a) is a classical result and a proof can be found for example in [10, p. 328]. The remaining implications are either obvious or easy consequences of the coupling equality stated in the introduction.

We continue by providing a slightly generalized version of the Recurrence Lemma from [8, Lemma 2] that will turn out to be useful later.

Lemma 3.2 (Recurrence Lemma). Assume that $P$ satisfies Assumption $\mathbf{A}_3$. Then for any $B \in \mathcal{E}$ with $\mu(B) > 0$, for $\mu$-almost every $x \in E$
$$Q(x, B) = 1. \qquad (2)$$
If, moreover, $P$ satisfies Assumption $\mathbf{A}_2$, then $Q(x, B) > 0$ holds for every $x \in E$. If, moreover, $P$ satisfies Assumption $\mathbf{A}_1$, then (2) holds for every $x \in E$.

Proof. For $B \in \mathcal{E}$ with $\mu(B) > 0$ define $\psi(x) := Q(x, B) = \mathbb{P}_x(X_k \in B \text{ infinitely often})$, $x \in E$. Starting $X$ with law $\mu$, we see that $\psi(X_n)$, $n \in \mathbb{N}_0$, is both stationary and a (bounded) martingale which converges to $\mathbb{1}_{\{X_k \in B \text{ i.o.}\}}$ almost surely, which implies $\psi(x) \in \{0, 1\}$ for $\mu$-almost all $x \in E$. Let $\Psi_i = \{x : \psi(x) = i\}$, $i \in \{0, 1\}$. Then, by the martingale property, $P_n(x, \Psi_i) = 1$ for all $n \in \mathbb{N}$ and for $\mu$-almost all $x \in \Psi_i$, $i \in \{0, 1\}$. If $\mathbf{A}_3$ holds, then (at least) one of the sets $\Psi_0$, $\Psi_1$ has $\mu$-measure zero. Since $\mu(B) > 0$, Birkhoff's ergodic theorem implies $\mu(\Psi_1) > 0$, so $\mu(\Psi_0) = 0$ and $\mu(\Psi_1) = 1$, finishing the proof of the first statement.

Let Assumption $\mathbf{A}_2$ hold and fix $x \in E$. Since $P_n(y, \Psi_1) = 1$ for $\mu$-almost all $y$ and all $n \in \mathbb{N}$, there exists some $y_0 \in E$ such that $P_n(y_0, \Psi_1) = 1$ for all $n \in \mathbb{N}$. Now $\mathbf{A}_2$ applied to $x$ and $y_0$ shows that there exists some $n \in \mathbb{N}$ such that $P_n(x, \Psi_1) > 0$, finishing the proof of the second claim.

Let Assumption $\mathbf{A}_1$ hold and fix $x \in E$. As above, there exists some $y_0 \in E$ such that $P_n(y_0, \Psi_1) = 1$ for all $n \in \mathbb{N}$. Now $\mathbf{A}_1$ applied to $x$ and $y_0$ shows that $\lim_{n \to \infty} P_n(x, \Psi_1) = 1$, so $x \in \Psi_1$ and therefore (2) holds.

Proposition 3.3.
A Markov kernel $P$ which satisfies Assumption $\mathbf{A}_3$ is aperiodic.

Proof. Suppose $P$ has period $d \ge 2$, and let $E_1, E_2, \dots, E_d$ be as in Definition 2.8. Then $\mu(E_i) > 0$ for $i = 1, 2, \dots, d$. Choose $x \in E_1$, $y \in E_2$, and $n \in \mathbb{N}$ arbitrarily. Then $P_n(x, E_{n+1 \ (\mathrm{mod}\ d)}) = 1$ and $P_n(y, E_{n+2 \ (\mathrm{mod}\ d)}) = 1$ and therefore $P_n(x,\cdot) \perp P_n(y,\cdot)$. This contradicts Assumption $\mathbf{A}_3$ since $(\mu \otimes \mu)(E_1 \times E_2) > 0$.

Corollary 3.4. $\mathbf{A}_1 \Rightarrow \mathbf{B}_1$, $\mathbf{A}_2 \Rightarrow \mathbf{B}_2$, and $\mathbf{A}_3' \Rightarrow \mathbf{B}_3$.

Proof. Lemma 3.2, Proposition 3.3 and Remark 2.10 immediately imply the first two implications (with $\varphi := \mu$) but not the last one, since the conclusion of the Recurrence Lemma under the assumption $\mathbf{A}_3$ (or the stronger assumption $\mathbf{A}_3'$) is weaker than weak irreducibility (the exceptional sets of $\mu$-measure 0 may depend on the set $B$ and there may be uncountably many such sets). Therefore, we argue as follows: for $x \in E$, let $R_x := \{y \in E : x, y \text{ are asymptotically equivalent}\}$. Assumption $\mathbf{A}_3'$ and Lemma A.7 imply that $R_x \in \mathcal{E}$ and $\mu(R_x) = 1$ for $\mu$-almost all $x \in E$. Fix $x \in E$ such that $\mu(R_x) = 1$. Since asymptotic equivalence is an equivalence relation by Remark 2.5, it follows that property $\mathbf{A}_1$ holds on $R_x$. Using Lemma A.4, we see that $\mathbf{B}_1$ holds on $R_x$ and hence $\mathbf{B}_3$ holds on $E$.

Before we step into the proofs of Theorems 2.16, 2.17, and 2.18, we sketch how one can see that $\mathbf{A}_i$ implies $\mathbf{C}_i'$ for $i \in \{1, 2, 3\}$. The proofs are largely identical to those in [8] where the implications $\tilde{\mathbf{A}}_1 \Rightarrow \mathbf{P}_1$, $\mathbf{A}_2 \Rightarrow \tilde{\mathbf{P}}_2$, and $\mathbf{A}_3 \Rightarrow \mathbf{P}_3$ were shown (with $\tilde{\mathbf{A}}_1$ slightly stronger than $\mathbf{A}_1$ and $\tilde{\mathbf{P}}_2$ slightly weaker than $\mathbf{P}_2$ and without the assumption that the state space is Borel). We will need the Borel property only at the end of the proof when we apply the gluing lemma.

Proposition 3.5.
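We have $\mathbf{A}_1 \Rightarrow \mathbf{P}_1$. Further, if $(E, \mathcal{E})$ is Borel, then $\mathbf{A}_1 \Rightarrow \mathbf{C}_1'$, $\mathbf{A}_2 \Rightarrow \mathbf{C}_2'$, and $\mathbf{A}_3 \Rightarrow \mathbf{C}_3'$.

Idea of the proof.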
Under $\mathbf{A}_3$, we define for $N \in \mathbb{N}$ and $p \in (0, 1)$
$$C_{N,p} := \big\{(x, y) \in E \times E : d\big(P_N(x,\cdot), P_N(y,\cdot)\big) \le 1 - p\big\}.$$
$C_{N,p} \in \mathcal{E} \otimes \mathcal{E}$ by Proposition A.6 and Assumption $\mathbf{A}_3$ implies $\mu \otimes \mu(C_{N,p}) > 0$ for some $N$ and $p$. Fix $N$ and $p$ and write $C := C_{N,p}$. Let us first assume that $N = 1$ (this is without loss of generality for proving $\mathbf{A}_1 \Rightarrow \mathbf{P}_1$ but not without loss of generality for proving $\mathbf{A}_1 \Rightarrow \mathbf{C}_1'$). In [8], the authors proceed by constructing a Markov chain $Z_n$, $n \in \mathbb{N}_0$, on the product space $E \times E$, which is a coupling of two chains with Markov kernel $P$, with transition kernel $S$ defined as
$$S\big((x, y), \cdot\big) := \begin{cases} Q\big((x, y), \cdot\big) & \text{if } (x, y) \in C, \\ R\big((x, y), \cdot\big) & \text{otherwise.} \end{cases}$$
Here, $R\big((x, y), \cdot\big)$ is the product of $P(x,\cdot)$ and $P(y,\cdot)$, and the kernel $Q$ satisfies $Q\big((x, y), \Delta\big) = 1 - d\big(P(x,\cdot), P(y,\cdot)\big)$ and $Q\big((x, y), \cdot\big)$ restricted to $(E \times E) \setminus \Delta$ is absolutely continuous with respect to the product of $P(x,\cdot)$ and $P(y,\cdot)$ (the fact that such a kernel $Q$ exists is stated in [8, Lemma 1]). The idea behind the definition of the kernel $S$ is the following: whenever the chain on $E \times E$ is in a state $(x, y) \in C$, we try to couple the two coordinates in the next step by applying $Q$, which maximizes the coupling probability. Otherwise, we let the two coordinates move independently until the pair hits the set $C$. As soon as the chain $Z$ hits the diagonal $\Delta$ it remains there forever. It remains to ensure that the set $C$ is hit infinitely many times and therefore the process $Z_n$ will almost surely eventually hit $\Delta$. The fact that $(Z_n)$ will hit the set $C$ almost surely in finite time can be seen as follows: consider an independent coupling $(W_n)$ of two copies of the chain. Since $\mu \otimes \mu(C) > 0$, the Recurrence Lemma shows that $(W_n)$ will hit the set $C$ almost surely in finite time for almost all initial conditions and even for all initial conditions if we assume $\mathbf{A}_1$. Since, up to the first hitting time of the set $C$, the processes $W$ and $Z$ have the same law, $(Z_n)$ will also hit the set $C$ almost surely in finite time. If the coupling attempt at that time is unsuccessful, then the chain $Z$ again performs an independent coupling up to the next hit of $C$, which, by the same argument (and the strong Markov property and the assumptions on the kernel $Q$), is an almost sure event. The constructed coupling therefore shows that $\mathbf{C}_3'$ holds under $\mathbf{A}_3$ and both $\mathbf{C}_1'$ and $\mathbf{P}_1$ hold under $\mathbf{A}_1$. Further, under $\mathbf{A}_2$, for any pair $x, y \in E$ the probability that the constructed coupling is successful is strictly positive by the second part of the Recurrence Lemma, so $\mathbf{C}_2'$ holds. This proves the claims in case $N$ in the definition of the set $C_{N,p}$ can be chosen to be 1.

Finally, we assume that $N \ge 2$. The first claim follows from the case $N = 1$ since $n \mapsto d(P_n(x,\cdot), \mu)$ is non-increasing. To see the remaining claims, we apply the previous consideration to the skeleton chain evaluated at integer multiples of $N$ and obtain corresponding couplings $Z_{nN} = (X_{nN}, Y_{nN})$, $n \in \mathbb{N}_0$, for the skeleton chains as above. We have to make sure that these can be appropriately interpolated between successive multiples of $N$. This follows from an application of the gluing lemma in the appendix (which requires the state space to be Borel) to each gap between successive multiples of $N$ (with conditionally independent interpolations), see [12, p. 43] for a similar construction (it seems that the authors forgot to mention that this construction requires the space to be Borel, see Remark 5.8).

Proof of Theorem 2.16.
Observing Proposition 3.1, Corollary 3.4 and Proposition 3.5, the claim follows once we prove that $\mathbf{G}_1 \Rightarrow \mathbf{A}_1$.

$\mathbf{G}_1 \Rightarrow \mathbf{A}_1$: Fix a pair $(x, y) \in E \times E$. We show that $x$ and $y$ are asymptotically equivalent. Fix $\varepsilon > 0$. By assumption there exists some $\xi \in \check C(\mathbb{P}_x, \mathbb{P}_y)$ such that $\lim_{k \to \infty} \xi_k(\Delta) = 1$. Since $\xi^2$ and $\mathbb{P}_y$ are equivalent, we can find some $\delta > 0$ such that for every $\Gamma \in \mathcal{E}^{\otimes \mathbb{N}_0}$ satisfying $\xi^2(\Gamma) < \delta$, we have $\mathbb{P}_y(\Gamma) < \varepsilon$. Let $n_0 \in \mathbb{N}$ be such that $\xi_k(\Delta) > 1 - \delta$ for every $k \ge n_0$. Then, for $B \in \mathcal{E}$ and $n \ge n_0$,
$$P_n(x, B) = 0 \ \Rightarrow \ \xi^1_n(B) = 0 \ \Rightarrow \ \xi^2_n(B) < \delta \ \Rightarrow \ P_n(y, B) < \varepsilon,$$
where we used absolute continuity of $\xi^1_n$ with respect to $P_n(x,\cdot)$ in the first step. Reversing the roles of $x$ and $y$ we get $P_n(y, B) = 0 \Rightarrow P_n(x, B) < \varepsilon$ for all $n \ge n_1$. Fix $n \ge n_0 \vee n_1$ and let $B \in \mathcal{E}$ be a set which maximizes $P_n(y, B)$ among all sets $B \in \mathcal{E}$ which satisfy $P_n(x, B) = 0$ and let $C \in \mathcal{E}$ be a set which maximizes $P_n(x, C)$ among all sets $C \in \mathcal{E}$ which satisfy $P_n(y, C) = 0$. Define $A := E \setminus (B \cup C)$. Then $P_n(x, A) \ge 1 - \varepsilon$, $P_n(y, A) \ge 1 - \varepsilon$ and the restrictions of $P_n(x,\cdot)$ and $P_n(y,\cdot)$ to $A$ are equivalent. The claim follows since $\varepsilon > 0$ was arbitrary.
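In the finite case the set $A = E \setminus (B \cup C)$ from the proof above is simply the common support of $P_n(x,\cdot)$ and $P_n(y,\cdot)$, so the $\varepsilon$-condition of Definition 2.4 can be tested directly. The following is a minimal sketch; the helper names and the two-state chain (state 0 absorbing, state 1 moving to 0 with probability $1/2$) are illustrative assumptions, not taken from the paper. In this chain the states 0 and 1 are asymptotically equivalent although $P_n(0,\cdot)$ and $P_n(1,\cdot)$ are never equivalent.

```python
from fractions import Fraction as F

def n_step_row(P, x, n):
    """P_n(x, .) as a list, for a finite transition matrix P."""
    size = len(P)
    row = [F(int(i == x)) for i in range(size)]
    for _ in range(n):
        row = [sum(row[i] * P[i][j] for i in range(size)) for j in range(size)]
    return row

def common_support_mass(p, q):
    """Return (p(A), q(A)) for A = {z : p(z) > 0 and q(z) > 0}.

    Restricted to A, the two measures have the same null sets.
    """
    A = [z for z in range(len(p)) if p[z] > 0 and q[z] > 0]
    return sum(p[z] for z in A), sum(q[z] for z in A)

def witnesses_eps(P, x, y, n, eps):
    """True if step n provides a set A as in Definition 2.4 for this eps."""
    pa, qa = common_support_mass(n_step_row(P, x, n), n_step_row(P, y, n))
    return pa >= 1 - eps and qa >= 1 - eps

# Assumed example: 0 is absorbing, 1 stays put or jumps to 0 with probability 1/2.
P = [[F(1), F(0)], [F(1, 2), F(1, 2)]]
row0 = n_step_row(P, 0, 4)    # delta_0
row1 = n_step_row(P, 1, 4)    # still charges state 1, so not equivalent to delta_0
```

For this chain `witnesses_eps(P, 0, 1, n, eps)` becomes true as soon as $2^{-n} \le \varepsilon$, which is the finite analogue of choosing $n \ge n_0 \vee n_1$ in the proof.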
4. Proofs of Theorems 2.17 and 2.18
Proof of Theorem 2.17. Thanks to Proposition 3.1, Corollary 3.4 and Proposition 3.5, the theorem is proved once we establish $\mathbf{B}_2 \Rightarrow \mathbf{P}_2$. Rather than adapting the proof of $\mathbf{B}_1 \Rightarrow \mathbf{P}_1$ we prefer to argue along the following lines: if $\mathbf{B}_2$ holds, then we show that there exists an invariant set $E_0 \subset E$ (i.e. $E_0 \in \mathcal{E}$ and $P(x, E_0) = 1$ for all $x \in E_0$) of full $\mu$-measure on which $\mathbf{B}_1$ holds and hence, by Theorem 2.16, $\mathbf{P}_1$ holds. Then we show that $\mathbf{P}_2$ holds on the full space $E$.

$\mathbf{B}_2 \Rightarrow \mathbf{P}_2$: We are not aware of a simple direct proof that there exists a subset of full $\mu$-measure on which $\mathbf{B}_1$ holds. Even though ($\mu$-)irreducibility implies that $Q(x, B) = 1$ for every $B \in \mathcal{E}$ for which $\mu(B) > 0$ and $\mu$-almost every $x \in E$, the exceptional sets may depend on $B$ and there are (typically) uncountably many such sets $B$.

Since $P$ is irreducible, Proposition A.2 shows that there exists a small set $C \in \mathcal{E}$ (with $\nu$ and $m$ as stated there). We can and will assume that $\nu(E \setminus C) = 0$. Define $G := \{x \in E : Q(x, C) = 1\}$. Then $G \in \mathcal{E}$, $G$ is invariant, and $\mu(G) = 1$. We claim that property $\mathbf{B}_1$ holds on $G$. All we have to show is that $Q(x, B) = 1$ for all $x \in G \cap C$ and all $B \in \mathcal{E}$ such that $\mu(B) > 0$. Fix such a set $B$ and let $H := \{x \in G \cap C : Q(x, B) = 1\}$. Then $\mu(H) = \mu(C) > 0$ and for $x \in H$ we have $P_m(x, H) = P_m(x, C) \ge \nu(C) > 0$. Assume that $y \in G \cap C$ satisfies $Q(y, B) < 1$ (i.e. $y \notin H$). Then $P_m(y, H) \ge \nu(H) = \nu(C)$ (since $0 = P_m(x, C \setminus H) \ge \nu(C \setminus H)$ for $x \in H$). This means that, whenever the chain is in the set $(C \cap G) \setminus H$, then with probability at least $\nu(C) > 0$ it will hit the set $H$ after $m$ steps. Since the chain starting at $y \in G \cap C$ visits $C \cap G$ infinitely often (almost surely), it follows that $L(y, H) = 1$, contradicting our assumption on $y$. Using Lemma A.4, $G$ equipped with the trace $\sigma$-field satisfies our assumption on the state space and we see that property $\mathbf{B}_1$ holds on $G$.

Theorem 2.16 shows that property $\mathbf{P}_1$ holds on $G$. Then, clearly, property $\mathbf{P}_3$ holds on $E$. Since $P$ is irreducible, we have $L(x, G) > 0$ and hence $\lim_{n \to \infty} d(P_n(x,\cdot), \mu) < 1$ for every $x \in E$ and therefore $\mathbf{P}_2$ holds on $E$.

Proof of Theorem 2.18.
By Proposition 3.1, Corollary 3.4 and Proposition 3.5 it suffices to show that $\mathbf{B}_3 \Rightarrow \mathbf{P}_3$.

$\mathbf{B}_3 \Rightarrow \mathbf{P}_3$: We can argue as in the proof of $\mathbf{B}_2 \Rightarrow \mathbf{P}_2$ (the present argument is even easier). Using the very definition of weak irreducibility, we find an invariant set $E_0$ of full $\mu$-measure on which $\mathbf{B}_2$ and hence, using Theorem 2.17, $\mathbf{P}_2$ hold. Therefore, $\mathbf{P}_3$ holds on $E$.
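The passage to an invariant set of full $\mu$-measure can be made concrete on a finite state space by iterating $E_{i+1} = \{x \in E_i : P(x, E_i) = 1\}$, the construction used in Lemma A.4. The following is a minimal sketch; the four-state chain and the names `invariant_core`, `n_step_row` and `tv` are illustrative assumptions, not objects from the paper. In this chain $\mu$ lives on $\{0, 1\}$, state 2 is absorbing and state 3 is transient, so convergence holds for $\mu$-almost all starting points (Property $\mathbf{P}_3$) while the distance from state 2 stays equal to 1, so the second part of $\mathbf{P}_2$ fails.

```python
from fractions import Fraction as F

def invariant_core(P, start):
    """Iterate E_{i+1} = {x in E_i : P(x, E_i) = 1} until the set is stable."""
    current = set(start)
    while True:
        nxt = {x for x in current if sum(P[x][z] for z in current) == 1}
        if nxt == current:
            return current
        current = nxt

def n_step_row(P, x, n):
    """P_n(x, .) as a list, for a finite transition matrix P."""
    size = len(P)
    row = [F(int(i == x)) for i in range(size)]
    for _ in range(n):
        row = [sum(row[i] * P[i][j] for i in range(size)) for j in range(size)]
    return row

def tv(p, q):
    """Total variation distance between two finite distributions."""
    return sum(abs(a - b) for a, b in zip(p, q)) / 2

h = F(1, 2)
# Assumed example chain on {0, 1, 2, 3}.
P = [
    [h, h, F(0), F(0)],        # 0 and 1 form a closed aperiodic class
    [h, h, F(0), F(0)],
    [F(0), F(0), F(1), F(0)],  # 2 is absorbing and never reaches the class
    [h, F(0), h, F(0)],        # 3 leaks to 0 or to 2
]
mu = [h, h, F(0), F(0)]

core = invariant_core(P, {0, 1, 3})    # start from a set of full mu-measure
dist_from_0 = tv(n_step_row(P, 0, 3), mu)
dist_from_2 = tv(n_step_row(P, 2, 3), mu)
dist_from_3 = tv(n_step_row(P, 3, 3), mu)
```

Note that the measure $\mu$ in this sketch is not the only invariant probability measure of the chain, which is consistent with $\mathbf{P}_3$ being stated without a uniqueness requirement.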
5. Complements, examples, and open problems
Remark 5.1. We show that Property $\mathbf{P}_2$ implies uniqueness of $\mu$ (as claimed in Remark 2.3): assume that $\mu$ and $\tilde\mu$ are different ipm's and let $\hat\mu := \frac{1}{2}(\mu + \tilde\mu)$. Since $\mathbf{P}_2 \Leftrightarrow \mathbf{A}_2$ and property $\mathbf{A}_2$ is independent of the chosen ipm, we see that $\mathbf{P}_2$ holds with respect to both $\mu$ and $\hat\mu$, so $P_n(x,\cdot)$ converges to $\mu$ for $\mu$-almost all $x$ and to $\hat\mu$ for $\hat\mu$-almost all $x$. Since $\mu \ll \hat\mu$ and $\hat\mu \ne \mu$ this is a contradiction (this proof is adapted from [8, Proof of Corollary 1]).

Example 5.2. Let $E := \{0, 1\}$ and $P(0, \{1\}) = P(1, \{0\}) = 1$. Then the unique invariant probability measure $\mu$ is given by $\mu(\{0\}) = \mu(\{1\}) = 1/2$. For this example, the second part of property $\mathbf{P}_2$ holds but the first part does not, so the first part of $\mathbf{P}_2$ cannot be deleted without changing the class of chains for which $\mathbf{P}_2$ holds.

Example 5.3.
Let E : “ N with the discrete σ -field E . Define P p x, t x ´ uq “ for x ě , P p , t uq “ and P p , t x uq “ ´ x for x P N . Clearly all transition probabilities converge to µ “ δ but P n p , . q and P n p , . q are non-equivalent for every n P N (but the states 0 and 1 areasymptotically equivalent). Example 5.4. (cf. [9, Example 5].) Let E : “ N with the discrete σ -field E . Define P p , t uq “ and P p x, t x ´ uq “ { and P p x, t x ` uq “ { for x P N . Clearly, µ “ δ is the unique invariantprobability measure and P n p x, . q does not converge to µ if x ą , so P satisfies P but not P .Note that for each x, y P E and k ě x ^ y , ζ : “ δ b δ satisfies ζ P ˜ C p P k p x, . q , P k p y, . qq and ζ p ∆ q “ , showing that if “ ą ” in Assumption G is replaced by “ “ ”, then the condition does not imply G . Remark 5.5.
Note that Assumption $\mathbf{G}_1$ is formally weaker than requiring that for each pair $(x, y) \in E \times E$ there exists some $\xi \in \tilde C(\mathbb{P}_x, \mathbb{P}_y)$ such that $\xi^1 \sim \mathbb{P}_x$ and $\xi^2 \sim \mathbb{P}_y$, but these two conditions are in fact equivalent: according to $\mathbf{G}_1$ we find, for each pair $(x, y)$, some $\check\xi \in \check C(\mathbb{P}_x, \mathbb{P}_y)$ such that $\lim_{k \to \infty} \check\xi_k(\Delta) = 1$ and some $\hat\xi \in \check C(\mathbb{P}_y, \mathbb{P}_x)$ such that $\lim_{k \to \infty} \hat\xi_k(\Delta) = 1$. Then $\xi := \frac{1}{2}\check\xi + \frac{1}{2}\hat\xi$ satisfies the formally stronger condition.

Remark 5.6.
One may ask whether it is sufficient for $\mathbf{P}_1$ to hold if for each pair $(x, y) \in E \times E$ and each $k \in \mathbb{N}$ there exists some probability measure $\zeta_k$ on $(E \times E, \mathcal{E} \otimes \mathcal{E})$ whose marginals are equivalent to $P_k(x,\cdot)$ and $P_k(y,\cdot)$ respectively, such that $\lim_{k \to \infty} \zeta_k(\Delta) = 1$. Again, Example 5.4 provides a negative answer. Consider $\xi$ as in the previous example. Then $\lim_{k \to \infty} \xi_k(\Delta) \ge \lim_{k \to \infty} \xi_k(\{(0, 0)\}) = 1$. Note that the marginals of the measures $\xi_k$ are equivalent to $P_k(x,\cdot)$ and $P_k(y,\cdot)$ respectively but that $\xi^1$ and $\xi^2$ are not equivalent to $\mathbb{P}_x$ respectively $\mathbb{P}_y$.

Remark 5.7.
From Theorem 2.16 we know that $\mathbf{C}_1 \Rightarrow \mathbf{P}_1$ holds since $\mathbf{C}_1 \Rightarrow \mathbf{A}_1 \Rightarrow \mathbf{B}_1 \Rightarrow \mathbf{P}_1$. Here we present an essentially well-known direct proof. For $x \in E$, $n \in \mathbb{N}$, and $A \in \mathcal{E}$ we have
$$\begin{aligned}
|\mu(A) - P_n(x, A)| &= \Big| \int_E P_n(y, A) \,\mathrm{d}\mu(y) - P_n(x, A) \Big| = \Big| \int_E \big( P_n(y, A) - P_n(x, A) \big) \,\mathrm{d}\mu(y) \Big| \\
&\le \int_E \big| P_n(y, A) - P_n(x, A) \big| \,\mathrm{d}\mu(y) \le \int_E d\big( P_n(y,\cdot), P_n(x,\cdot) \big) \,\mathrm{d}\mu(y),
\end{aligned}$$
which converges to 0 by dominated convergence (note that Proposition A.6 shows that the last integrand is measurable with respect to $y$), so the claim follows.

In fact, a slight modification of the proof shows the result without employing Proposition A.6 (and without assuming that $\mathcal{E}$ is countably generated): fix $x$ and let $R_n(y, A) := \big| P_n(y, A) - P_n(x, A) \big|$, $n \in \mathbb{N}$. There exist sets $A_n \in \mathcal{E}$ such that
$$U_n := \sup_{A \in \mathcal{E}} \Big( \int_E R_n(y, A) \,\mathrm{d}\mu(y) \Big) \le \int_E R_n(y, A_n) \,\mathrm{d}\mu(y) + 2^{-n} \to 0, \quad n \to \infty,$$
by dominated convergence.
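The averaging bound $d(P_n(x,\cdot), \mu) \le \int_E d(P_n(y,\cdot), P_n(x,\cdot)) \,\mathrm{d}\mu(y)$ from the first part of the remark can be checked numerically on a finite chain. The following is a minimal sketch; the three-state birth-death matrix and the names `n_step_rows` and `averaged_bound` are illustrative assumptions, not taken from the paper.

```python
from fractions import Fraction as F

def tv(p, q):
    """Total variation distance between two finite distributions."""
    return sum(abs(a - b) for a, b in zip(p, q)) / 2

def n_step_rows(P, n):
    """All rows P_n(x, .) of the n-th power of the transition matrix P."""
    size = len(P)
    rows = [[F(int(i == j)) for j in range(size)] for i in range(size)]
    for _ in range(n):
        rows = [[sum(rows[i][k] * P[k][j] for k in range(size))
                 for j in range(size)] for i in range(size)]
    return rows

def averaged_bound(P, mu, x, n):
    """Return (d(P_n(x,.), mu), sum_y mu(y) d(P_n(y,.), P_n(x,.)))."""
    rows = n_step_rows(P, n)
    lhs = tv(rows[x], mu)
    rhs = sum(mu[y] * tv(rows[y], rows[x]) for y in range(len(P)))
    return lhs, rhs

# Assumed example: a birth-death chain on {0, 1, 2} with invariant law mu.
P = [
    [F(1, 2), F(1, 2), F(0)],
    [F(1, 4), F(1, 2), F(1, 4)],
    [F(0), F(1, 2), F(1, 2)],
]
mu = [F(1, 4), F(1, 2), F(1, 4)]

pairs = [averaged_bound(P, mu, x, n) for x in range(3) for n in range(6)]
```

Each entry of `pairs` holds the left and right side of the inequality for one starting point and one time, so the bound can be inspected term by term.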
Remark 5.8.
It seems to be an open question whether all properties stated in Theorem 2.16 are equivalent even in the case in which $(E, \mathcal{E})$ is not Borel (and similarly for Theorems 2.17 and 2.18). The present proof, which is based on the gluing lemma A.3, cannot be applied in this case: [1] contains an example of a separable metric space equipped with its Borel $\sigma$-field for which the conclusion of the gluing lemma fails.

A. Auxiliary results and measurability issues
A.1. $\mu$-irreducibility and the existence of small sets

We start with a proposition which was announced in Remark 2.10 and whose proof is inspired by that of [10, Proposition 4.2.2].
Proposition A.1. If $P$ is $\varphi$-irreducible, then $P$ is $\mu$-irreducible.

Proof. Let $P$ be $\varphi$-irreducible. Then $\varphi \ll \mu$ (see Remark 2.10) and, due to Lebesgue's theorem, there exists a set $B \in \mathcal{E}$ such that $\varphi$ and $\mu$ restricted to $B$ are equivalent and $\varphi(B^c) = 0$. Note that $\mu(B) > 0$. If $\mu(B^c) = 0$, then $\varphi \sim \mu$ and we are done, so we assume that $\mu(B^c) > 0$. We have to show that for any measurable set $C \subset B^c$ such that $\mu(C) > 0$ we have $L(x, C) > 0$ for every $x \in E$. Fix such $x$ and $C$ and define the measure
$$\nu(\cdot) := \int_B \sum_{m=0}^{\infty} 2^{-m} P_m(y, \cdot) \,\mathrm{d}\mu(y).$$
Invariance of $\mu$ implies $\nu \ll \mu$ and that the restrictions of both measures to $B$ are equivalent. Let $G \in \mathcal{E}$ be a set such that $\nu \sim \mu$ on $G$, $\nu(G^c) = 0$ and $B \subset G$.

First, we assume that $\mu(G^c) > 0$. Let $m_0 \in \mathbb{N}$ be such that $\int_{G^c} P_{m_0}(y, G) \,\mathrm{d}\mu(y) > 0$ (such an $m_0$ exists since $P$ is $\varphi$-irreducible). Using invariance of $\mu$, we obtain
$$\int_G P_{m_0}(y, G^c) \,\mathrm{d}\mu(y) = \int_{G^c} P_{m_0}(y, G) \,\mathrm{d}\mu(y) > 0.$$
Therefore, there exists some $\varepsilon > 0$ such that for $D := \{y \in G : P_{m_0}(y, G^c) \ge \varepsilon\}$ we have $\mu(D) > 0$ and hence $\nu(D) > 0$, which means that there exists some $m_1 \in \mathbb{N}$ such that $\int_B P_{m_1}(y, D) \,\mathrm{d}\mu(y) > 0$. Therefore,
$$\begin{aligned}
\nu(G^c) &\ge \int_B 2^{-m_0 - m_1} P_{m_0 + m_1}(y, G^c) \,\mathrm{d}\mu(y) \\
&\ge 2^{-m_0 - m_1} \int_B \int_D P_{m_0}(z, G^c) \, P_{m_1}(y, \mathrm{d}z) \,\mathrm{d}\mu(y) \\
&\ge 2^{-m_0 - m_1} \varepsilon \int_B P_{m_1}(y, D) \,\mathrm{d}\mu(y) > 0,
\end{aligned}$$
contradicting the definition of $G$, so $\mu(G^c) = 0$.

In this case $\mu \sim \nu$ and so $\nu(C) > 0$, which implies that there exist some $\varepsilon > 0$ and $m \in \mathbb{N}$ such that $\tilde D := \{y \in B : P_m(y, C) \ge \varepsilon\}$ satisfies $\mu(\tilde D) > 0$. $\varphi$-irreducibility and the definition of the set $B$ imply $L(x, \tilde D) > 0$, which, together with the definition of $\tilde D$, implies $L(x, C) > 0$, so the proof of the proposition is complete.

The following proposition is an easy consequence of the rather deep Theorem 5.2.2 in [10] (which is a key step in the proof of $\mathbf{B}_1 \Rightarrow \mathbf{P}_1$ (in our notation)) and of the (not so deep) previous proposition.

Proposition A.2. ([10, Theorem 5.2.2]) Let $P$ be irreducible. Then there exists a small set $C$, i.e. a set $C \in \mathcal{E}$ such that $\mu(C) > 0$ for which there exist a finite measure $\nu$ and some $m \in \mathbb{N}$ such that $\nu(C) > 0$ and $P_m(x, B) \ge \nu(B)$ for all $x \in C$ and $B \in \mathcal{E}$.

Proof. Theorem 5.2.2 in [10] assumes that $P$ is $\psi$-irreducible where $\psi$ is a maximal irreducibility measure. By the previous proposition we can take $\psi = \mu$ and therefore the conclusions of [10, Theorem 5.2.2] and of Proposition A.2 are the same.

A.2. A gluing lemma
A proof of the following gluing lemma can be found in [1, Lemma 4.] (or in [7, Lemma 4.3.2] under the additional condition that the spaces are standard Borel). The conditions in [1, Lemma 4.] are even slightly weaker than ours.
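Lemma A.3. Let $(E_i, \mathcal{E}_i)$, $i = 1, 2, 3$, be Borel spaces and let $\rho_{12}$ and $\rho_{23}$ be probability measures on $(E_1 \times E_2, \mathcal{E}_1 \otimes \mathcal{E}_2)$ and $(E_2 \times E_3, \mathcal{E}_2 \otimes \mathcal{E}_3)$ respectively such that $\rho_{12}(E_1 \times B) = \rho_{23}(B \times E_3)$ for all $B \in \mathcal{E}_2$. Then there exists a probability measure $\mu$ on $(E_1 \times E_2 \times E_3, \mathcal{E}_1 \otimes \mathcal{E}_2 \otimes \mathcal{E}_3)$ such that $\mu(A \times E_3) = \rho_{12}(A)$ for all $A \in \mathcal{E}_1 \otimes \mathcal{E}_2$ and $\mu(E_1 \times B) = \rho_{23}(B)$ for all $B \in \mathcal{E}_2 \otimes \mathcal{E}_3$.

A.3. Measurability issues

Lemma A.4.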
Let $\tilde E \in \mathcal{E}$ satisfy $\mu(\tilde E) = 1$. Then there exists a set $\hat E \subset \tilde E$ in $\mathcal{E}$ such that $P(x, \hat E) = 1$ for all $x \in \hat E$ and $\mu(\hat E) = 1$. Further, for any $\tilde E \in \mathcal{E}$, $\tilde E$ equipped with the trace $\sigma$-field of $\mathcal{E}$ satisfies our basic assumptions (countably generated $\sigma$-field and measurable diagonal).

Proof. The last statement is clear. To see the first, define $E_0 := \tilde E$ and $E_{i+1} := \{ x \in E_i : P(x, E_i) = 1 \}$, $i \in \mathbb{N}_0$. Then $\hat E := \bigcap_i E_i$ does the job: invariance of $\mu$ yields $\mu(E_i) = 1$ for all $i$ by induction, and every $x \in \hat E$ satisfies $P(x, E_i) = 1$ for all $i$, hence $P(x, \hat E) = 1$.

In the following two statements we assume that $(E, \mathcal{E})$ satisfies our general assumptions spelled out in the introduction and that $Q$ and $\tilde Q$ are Markov kernels on $E$.

Lemma A.5. [7, p. 30f.] Let $\Lambda(x, y; dz) := \frac{1}{2} \big( Q(x, dz) + \tilde Q(y, dz) \big)$. There exist measurable maps $f$ and $\tilde f$ such that for each $A \in \mathcal{E}$, we have
\[
Q(x, A) = \int_A f(x, y; z) \, \Lambda(x, y; dz), \qquad \tilde Q(y, A) = \int_A \tilde f(x, y; z) \, \Lambda(x, y; dz).
\]
This lemma is used in [7] to prove a result which, in particular, implies the following proposition (which is not immediate since the supremum of an uncountable family of real-valued measurable functions need not be measurable).
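To indicate how the densities of Lemma A.5 bypass the uncountable supremum, here is a short sketch. It assumes that $d$ denotes the total variation distance normalized as $d(\mu_1, \mu_2) = \sup_{A \in \mathcal{E}} |\mu_1(A) - \mu_2(A)|$; this normalization is not restated in this appendix, and a different one only changes the constant. With $f$, $\tilde f$ and $\Lambda$ as in Lemma A.5:

```latex
\begin{align*}
\sup_{A \in \mathcal{E}} \big| Q(x, A) - \tilde Q(y, A) \big|
  &= \sup_{A \in \mathcal{E}} \Big| \int_A \big( f(x, y; z) - \tilde f(x, y; z) \big) \, \Lambda(x, y; dz) \Big| \\
  &= \int_E \big( f(x, y; z) - \tilde f(x, y; z) \big)^{+} \, \Lambda(x, y; dz) \\
  &= \frac{1}{2} \int_E \big| f(x, y; z) - \tilde f(x, y; z) \big| \, \Lambda(x, y; dz).
\end{align*}
```

The second equality holds because the supremum is attained at $A = \{ z : f(x, y; z) > \tilde f(x, y; z) \}$, and the third because $Q(x, E) = \tilde Q(y, E) = 1$, so the positive and negative parts of $f - \tilde f$ have the same $\Lambda(x, y; \cdot)$-integral. The right-hand side is a single integral of a jointly measurable integrand against a kernel, which is the form in which measurability in $(x, y)$ becomes accessible; the complete argument is in [7].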
Proposition A.6. [7, Theorem 2.2.4 (i)] The function $(x, y) \mapsto d\big( Q(x, \cdot), \tilde Q(y, \cdot) \big)$ is measurable.

Lemma A.7.
The set of all $(x, y) \in E \times E$ for which $x$ and $y$ are asymptotically equivalent is a measurable subset of $(E \times E, \mathcal{E} \otimes \mathcal{E})$.

Proof. Applying Lemma A.5 with $Q = \tilde Q = P_n$ we see that there exists a jointly measurable function $f_n$ such that
\[
P_n(x, A) = \int_A f_n(x, y; z) \, \Lambda_n(x, y; dz), \qquad P_n(y, A) = \int_A f_n(y, x; z) \, \Lambda_n(x, y; dz),
\]
for all $x, y \in E$ (with $\Lambda_n$ defined as in Lemma A.5). Defining
\[
A_n(x, y) := \{ z \in E : f_n(x, y; z) \, f_n(y, x; z) > 0 \},
\]
we see that $A_n(x, y) \in \mathcal{E}$ and that $P_n(x, \cdot)$ and $P_n(y, \cdot)$ restricted to $A_n(x, y)$ are equivalent. Further, $A_n(x, y)$ is the largest set (up to sets of measure 0 with respect to $\Lambda_n(x, y; \cdot)$) with this property. Observe that the map $(x, y) \mapsto P_n(x, A_n(x, y)) = \int \mathbb{1}_{A_n(x, y)}(z) \, P_n(x, dz)$ is measurable (by a well-known application of the monotone class theorem) since the integrand is jointly measurable. The claim follows since $x$ and $y$ are asymptotically equivalent iff $\lim_{n \to \infty} P_n(x, A_n(x, y)) = \lim_{n \to \infty} P_n(y, A_n(x, y)) = 1$.

References

[1] P. Berti, L. Pratelli, and P. Rigo, Gluing lemmas and Skorohod representations, Electronic Comm. Probab.
20 (2015) 1-11.
[2] O. Butkovsky, A. Kulik, and M. Scheutzow, Generalized couplings and ergodic rates for SPDEs and other Markov models, Ann. Appl. Probab. 30 (2020) 1-39.
[3] G. Da Prato and J. Zabczyk, Ergodicity for Infinite Dimensional Systems, Cambridge Univ. Press, Cambridge, 1996.
[4] J. Elstrodt, Maß- und Integrationstheorie, 7th edition, Springer, Berlin, 2011.
[5] A. Es-Sarhir, M. v. Renesse, and M. Scheutzow, Harnack inequality for functional SDEs with bounded memory, Electronic Comm. Probab. 14 (2009) 560-565.
[6] M. Hairer and J. Mattingly, Yet another look at Harris' ergodic theorem for Markov chains, Seminar on Stochastic Analysis, Random Fields and Applications VI, Progr. Probab. 63, Birkhäuser, Basel, (2011) 109-117.
[7] A. Kulik, Ergodic Behavior of Markov Processes, de Gruyter, Berlin, 2018.
[8] A. Kulik and M. Scheutzow, A coupling approach to Doob's theorem, Atti Accad. Naz. Lincei Rend. Lincei Mat. Appl. 26 (2015) 83-92.
[9] A. Kulik and M. Scheutzow, Generalized couplings and convergence of transition probabilities, Probab. Theory Related Fields 171 (2018) 333-376.
[10] S. Meyn and R. L. Tweedie, Markov Chains and Stochastic Stability, Second edition, Cambridge Univ. Press, Cambridge, 2009.
[11] S. Orey, Lecture Notes on Limit Theorems for Markov Chain Transition Probabilities, Van Nostrand Reinhold, London, 1971.
[12] G. O. Roberts and J. S. Rosenthal, General state space Markov chains and MCMC algorithms, Probability Surveys