Toy examples for effective concentration bounds

Benoît R. Kloeckner
February 19, 2021
In this note we prove a spectral gap for various Markov chains on various functional spaces. While proving that a spectral gap exists is relatively common, explicit estimates seem somewhat rare. These estimates are then used to apply the concentration inequalities of [Klo17] (most of the present material was part of Section 3 of that article, which has been reduced to its core in the published version).

Let us briefly recall the notation and concentration inequalities from [Klo17]. Let $(X_k)_{k\ge 0}$ be a Markov chain taking values in a general state space $\Omega$, with a unique stationary measure $\mu$, and let $\phi : \Omega \to \mathbb{R}$ be a function (the "observable"). We are interested in the speed of convergence of the empirical average
$$\hat\mu_n(\phi) := \frac{1}{n}\sum_{k=1}^{n} \phi(X_k)$$
to $\mu(\phi)$. We denote by $\mu_0$ the law of $X_0$, which can be arbitrary.

Assumption 1.
The observable $\phi$ belongs to a function space $\mathcal{X}$ satisfying:

i. its norm $\|\cdot\|$ dominates the uniform norm: $\|\cdot\| \ge \|\cdot\|_\infty$,
ii. $\mathcal{X}$ is a Banach algebra, i.e. for all $f, g \in \mathcal{X}$ we have $\|fg\| \le \|f\|\,\|g\|$,
iii. $\mathcal{X}$ contains the constant functions and $\|\mathbf{1}\| = 1$ (where $\mathbf{1}$ denotes the constant function with value $1$).

To the transition kernel $\mathsf{M}$ is associated an averaging operator acting on $\mathcal{X}$:
$$\mathsf{L}f(x) = \int_\Omega f(y)\,\mathrm{d}m_x(y).$$
Since each $m_x$ is a probability measure, $\mathsf{L}$ has $1$ as an eigenvalue, with eigenfunction $\mathbf{1}$.

(Université Paris-Est, Laboratoire d'Analyse et de Mathématiques Appliquées (UMR 8050), UPEM, UPEC, CNRS, F-94010, Créteil, France.)

Assumption 2. The Markov chain $\mathsf{M}$ satisfies the following:

i. $\mathsf{L}$ acts as a bounded operator from $\mathcal{X}$ to itself, and its operator norm $\|\mathsf{L}\|$ is equal to $1$.
ii. $\mathsf{L}$ is contracting with gap $\delta > 0$, i.e. there is a closed hyperplane $G \subset \mathcal{X}$ such that $\|\mathsf{L}f\| \le (1-\delta)\|f\|$ for all $f \in G$.

The second hypothesis is a particular case of a spectral gap: it implies in particular that $1$ is a simple isolated eigenvalue. In [Klo17] the following two results were proved (plus a Berry-Esseen bound that we will not use here).
Theorem A. Assuming Assumptions 1 and 2, for all
$$n \ge \frac{\log 100}{-\log(1-\delta/c_0)}$$
it holds:
$$\mathbb{P}_{\mu_0}\Big[\big|\hat\mu_n(\phi)-\mu(\phi)\big| \ge a\Big] \le
\begin{cases}
2.488\,\exp\!\Big(-n\cdot\dfrac{a^2\,\delta}{c_1\,\delta + c_2\,a\,\|\phi\|}\Big) & \text{if } a/\|\phi\| \le \delta,\\[2ex]
1.624\,\exp\!\Big(-n\cdot\dfrac{c_3\,\delta}{12+13\,\delta}\,\Big(\dfrac{a}{\|\phi\|}-c_4\,\delta\Big)\Big) & \text{otherwise,}
\end{cases}$$
where $c_0,\dots,c_4$ are explicit numerical constants whose values are given in [Klo17]. (We will often use the strengthened hypothesis $n \ge c/\delta$ for simplicity.)

Theorem B.
Assuming Assumptions 1 and 2, for all $n \ge c_0/\delta$, all $\sigma^2 \ge \sigma^2(\phi)$ and all $a \le c_1\,\sigma^2/\|\phi\|$ it holds:
$$\mathbb{P}_{\mu_0}\Big[\big|\hat\mu_n(\phi)-\mu(\phi)\big| \ge a\Big] \le c_2\,\exp\!\bigg(-n\cdot\frac{a^2}{2\sigma^2}\,\Big(1-c_3\,\frac{a\,\|\phi\|}{\delta\,\sigma^2}\Big)\bigg),$$
where $c_0,\dots,c_3$ are explicit numerical constants whose values are given in [Klo17].

Above, we use the notation
$$\sigma^2(\phi) = \mu(\phi^2) - (\mu\phi)^2 + 2\sum_{k\ge 1}\mu\big(\phi\,\mathsf{L}^k\bar\phi\big) \qquad\text{where } \bar\phi = \phi - \mu(\phi).$$
This "dynamical variance" is precisely the variance appearing in the CLT.

While it is well known that the presence of a spectral gap ensures classical limit theorems, these results turn explicit contraction estimates into explicit non-asymptotic results. The main goal of this note is to compute lower bounds on $\delta$ for several pairs of Markov chains and functional spaces. We shall apply the above results for illustration, and compare to previous results when available.

In each example below we will use the following lemma which, in the spirit of Doeblin-Fortet and Lasota-Yorke inequalities, enables one to turn an exponential contraction in the "regularity part" of a functional norm into a spectral gap.

Lemma 1.1.
Consider a normed space $\mathcal{X}$ of (Borel measurable, bounded) functions $\Omega \to \mathbb{R}$, with norm $\|\cdot\| = \|\cdot\|_\infty + D(\cdot)$ where $D$ is a semi-norm (usually quantifying some regularity of the argument, such as $\mathrm{Lip}$ or $\mathrm{BV}$). Assume that for some constant $C > 0$, for all probability measures $\nu$ on $\Omega$ and for all $f \in \mathcal{X}$ such that $\nu(f) = 0$,
$$\|f\|_\infty \le C\,D(f).$$
Let $\mathsf{L} \in \mathcal{B}(\mathcal{X})$ and assume that for some $\sigma \in (0,1)$ and all $f \in \mathcal{X}$:
$$\|\mathsf{L}f\|_\infty \le \|f\|_\infty \qquad\text{and}\qquad D(\mathsf{L}f) \le \sigma\,D(f),$$
and that $\mathsf{L}$ has eigenvalue $1$ with an eigenprobability $\mu$, i.e. $\mathsf{L}^*\mu = \mu$. Then $\mathsf{L}$ is contracting with gap at least
$$\delta = \frac{1-\sigma}{1+C\sigma}.$$
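As a quick numerical sanity check (our addition, not part of the original argument), the lemma can be tested on a small finite state space, where $\mathsf{L}$ is a stochastic matrix, $D(f) = \max f - \min f$, and $C = 1$ works since $\nu(f) = 0$ forces $\min f \le 0 \le \max f$. The $3\times 3$ kernel below is an arbitrary, hypothetical choice; the optimal $\sigma$ for this semi-norm is Dobrushin's ergodic coefficient.

```python
import numpy as np

rng = np.random.default_rng(0)

# An arbitrary (hypothetical) 3-state transition matrix; row x is m_x.
L = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])

# Stationary probability mu: left eigenvector of L for eigenvalue 1.
w, V = np.linalg.eig(L.T)
mu = np.real(V[:, np.argmin(np.abs(w - 1))])
mu /= mu.sum()

D = lambda f: f.max() - f.min()            # oscillation semi-norm
norm = lambda f: np.abs(f).max() + D(f)    # the norm of Lemma 1.1

# Best contraction factor of D under L: Dobrushin's coefficient
# max_{x,y} (1/2) sum_z |L[x,z] - L[y,z]|.
sigma = max(0.5 * np.abs(L[i] - L[j]).sum()
            for i in range(3) for j in range(3))
C = 1.0
delta = (1 - sigma) / (1 + C * sigma)      # gap guaranteed by Lemma 1.1

# Empirically check ||L f|| <= (1 - delta) ||f|| on ker(mu).
worst = 0.0
for f in rng.standard_normal((10_000, 3)):
    f = f - mu @ f                         # center so that mu(f) = 0
    worst = max(worst, norm(L @ f) / norm(f))

print(sigma, delta, worst)                 # worst stays below 1 - delta
```

Here the contraction factor on $\ker\mu$ observed by sampling indeed stays below $1-\delta = (C+1)\sigma/(1+C\sigma)$, as the lemma guarantees.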
The condition $\|f\|_\infty \le C\,D(f)$ is often valid in practice (assuming $\Omega$ has finite diameter for spaces such as $\mathrm{Lip}(\Omega)$): the condition $\nu(f) = 0$ implies that $f$ vanishes (if functions in $\mathcal{X}$ are continuous) or at least takes both non-positive and non-negative values, and $D(f)$ usually bounds the variations of $f$, implying a bound on its uniform norm.

Proof.
Let $f \in \ker\mu$; then $\|\mathsf{L}f\|_\infty \le \|f\|_\infty$ and $\mathsf{L}f \in \ker\mu$, so that $\|\mathsf{L}f\|_\infty \le C\,D(\mathsf{L}f) \le C\sigma\,D(f)$. Denote by $t \in [0,1]$ the number such that $\|f\|_\infty = t\,\|f\|$ (and therefore $D(f) = (1-t)\|f\|$). The above two controls on $\|\mathsf{L}f\|_\infty$ can then be written as
$$\|\mathsf{L}f\|_\infty \le \min\big(t,\ C\sigma(1-t)\big)\,\|f\|,$$
and using $D(\mathsf{L}f) \le \sigma D(f)$ again we get
$$\|\mathsf{L}f\| \le \min\big(t+\sigma(1-t),\ (C+1)\sigma(1-t)\big)\,\|f\|,$$
$$\big\|\mathsf{L}_{|\ker\mu}\big\| \le \max_{t\in[0,1]}\ \min\big(t+\sigma(1-t),\ (C+1)\sigma(1-t)\big).$$
The maximum is reached when $t+\sigma(1-t) = (C+1)\sigma(1-t)$, i.e. when $t = C\sigma/(1+C\sigma)$, at which point the value in the minimum is $(C+1)\sigma/(C\sigma+1) \in (0,1)$; hence $\|\mathsf{L}_{|\ker\mu}\| \le (C+1)\sigma/(C\sigma+1) = 1-\delta$, as claimed.

We start with a warm-up in the simplest example of a Banach algebra of functions, the space $L^\infty(\Omega)$ of bounded measurable functions. To fit our framework, we endow $L^\infty(\Omega)$ with the norm $\|f\|_D = \|f\|_\infty + D(f)$ where
$$D(f) := \sup_{x,y\in\Omega}|f(x)-f(y)| = \sup f - \inf f.$$
(We do not have a single reference measure here, which is why we consider genuinely bounded functions rather than essentially bounded functions. The semi-norm $D(f)$ quantifies how spread out $f$ is, which we need to manage separately from the magnitude of $f$.) Of course, this norm is equivalent to the uniform norm, and it is easily checked that we still get a Banach algebra.

Observe that convergence of measures in duality to $L^\infty(\Omega)$ is convergence in total variation, and the most usual normalization is
$$d_{\mathrm{TV}}(\mu,\nu) := \sup_{D(f)=1}\big|\mu(f)-\nu(f)\big|.$$

For a transition kernel $\mathsf{M}$, having an averaging operator $\mathsf{L}$ with a spectral gap on this space is a very strong condition, called uniform ergodicity. Glynn and Ormoneit [GO02] and Kontoyiannis, Lastras-Montaño and Meyn [KLMM05] gave explicit concentration results for such chains, using the characterization of uniform ergodicity by the Doeblin minorization condition: there exist an integer $\ell \ge 1$, a positive number $\beta$ and a probability measure $\nu$ on $\Omega$ such that for all $x \in \Omega$ and all Borel sets $B \subset \Omega$:
$$m^\ell_x(B) \ge \beta\,\nu(B) \tag{1}$$
where $m^\ell_x$ is the law of $X_\ell$ conditionally to $X_0 = x$. We shall look at the case $\ell = 1$, which fits better in our context. For an arbitrary value of $\ell$, one can in practice apply the result to each extracted chain $(X_{k+j\ell})_{j\ge 0}$.

Proposition 2.1. If $\mathsf{M}$ satisfies Doeblin's minorization condition (1) with $\ell = 1$, then its averaging operator $\mathsf{L}$ is contracting on $L^\infty(\Omega)$ with gap $\beta/(2-\beta)$.

Proof. This is simply the classical maximal coupling method in a functional guise. For each $x \in \Omega$, decompose $m_x$ into $\beta\nu$ and $\tilde m_x := m_x - \beta\nu$ (which is a positive measure of mass $1-\beta$). Recall that we denote by $\mu$ the stationary measure of $\mathsf{M}$. For all $f \in L^\infty(\Omega)$ we have:
$$\mathsf{L}f(x) = \beta\,\nu(f) + \tilde m_x(f),$$
$$\mathsf{L}f(x) - \mathsf{L}f(y) = \tilde m_x(f) - \tilde m_y(f),$$
$$\big|\mathsf{L}f(x) - \mathsf{L}f(y)\big| \le (1-\beta)\,D(f),$$
$$D(\mathsf{L}f) \le (1-\beta)\,D(f).$$
We can thus apply Lemma 1.1 with $C = 1$ and $\sigma = 1-\beta$, obtaining a spectral gap of size $\beta/(2-\beta)$.

Corollary 2.2. If $\mathsf{M}$ satisfies Doeblin's minorization condition (1) with $\ell = 1$ and $\phi : \Omega \to [-1,1]$, then for all $n \ge c_0/\beta$ and all $a \le \beta/c_1$ (with $c_0, c_1$ the explicit constants inherited from Theorem A) it holds
$$\mathbb{P}_{\mu_0}\Big[\big|\hat\mu_n(\phi)-\mu(\phi)\big| \ge a\Big] \le 2.5\,\exp\Big(-n a^2\cdot\frac{\beta}{150+47\beta}\Big).$$

Proof.
We have here $\|\phi\|_D \le 3$ and $\delta \ge \beta/(2-\beta) \ge \beta/2$. It then suffices to apply Theorem A and round constants up.

The exponent is proportional to $\beta$, which is the correct rate and improves on [GO02] and [KLMM05], which get a $\beta^2$; but Paulin obtains better constants in this case [Pau15] (Corollary 2.10), and we do not study this example further.

Discrete hypercube
Let us consider the same toy example as Joulin and Ollivier [JO10], the lazy random walk (a.k.a. Gibbs sampler, a.k.a. Glauber dynamics) on the discrete hypercube $\{0,1\}^n$: the transition kernel $\mathsf{M}$ chooses a slot $i \in \{1,\dots,n\}$ uniformly at random and replaces it with the result of a fair coin toss, i.e.
$$m_x = \frac12\,\delta_x + \sum_{y\sim x}\frac1{2n}\,\delta_y.$$
We consider two kinds of observables: the "polarization" $\phi : \{0,1\}^n \to \mathbb{R}$ giving the proportion of $1$'s in its argument, and the characteristic function $\mathbf{1}_A$ of a subset $A \subset \{0,1\}^n$. In this second example, we will in particular consider the simple case
$$A = [0] := \big\{(0,x_2,\dots,x_n) : x_i \in \{0,1\}\big\}.$$
The discrete hypercube $\{0,1\}^n$ is endowed with the Hamming metric: if $x = (x_1,\dots,x_n)$ and $y = (y_1,\dots,y_n)$, then $d(x,y)$ is the number of indexes $i$ such that $x_i \ne y_i$. Two elements at distance $1$ are said to be adjacent, denoted by $x \sim y$.

We denote by $E$ the set of tuples $e = (e_i)_{1\le i\le n}$ such that exactly one of the $e_i$ is $1$. Identifying $\{0,1\}$ with $\mathbb{Z}/2\mathbb{Z}$, an edge thus writes $(x, x+e)$ for some $x \in \{0,1\}^n$ and some $e \in E$.

We shall consider several function spaces to showcase the flexibility of the spectral method; since the space $\{0,1\}^n$ is finite, we always consider the space of all functions $\{0,1\}^n \to \mathbb{R}$, and it is the considered norm which will matter. Let us define:

• $\|f\|_L = \|f\|_\infty + \mathrm{Lip}(f)$: this is the standard Lipschitz norm;
• $\|f\|_{dL} = \|f\|_\infty + n\,\mathrm{Lip}(f)$: this is the Lipschitz norm with a weight on the regularity part equal to the diameter;
• $\|f\|_W = \|f\|_\infty + W(f)$ where
$$W(f) = \sup_{x\in\{0,1\}^n}\ \sum_{e\in E}\big|f(x+e)-f(x)\big|;$$
this norm stays small for functions having large variations only in few directions (small "local total variation").

We shall use later the following non-trivial comparison with $\|\cdot\|_W$.

Lemma 3.1 (Fedor Petrov [Pet17]). For all $f : \{0,1\}^n \to \mathbb{R}$ we have $\max f - \min f \le W(f)$.

Proof. Without loss of generality, we can assume $W(f) \le 1$ and $f(0,0,\dots,0) = 0$, and reduce to proving $f(1,1,\dots,1) \le 1$. Define the cost of a path $x^0, x^1, \dots, x^n$ as the number $\sum_{i=0}^{n-1}|f(x^{i+1})-f(x^i)|$, and let $\Sigma$ be the sum of the costs of all paths of length $n$ from $(0,0,\dots,0)$ to $(1,1,\dots,1)$. We shall prove $\Sigma \le n!$; since there are $n!$ such paths, one of them will have cost at most $1$, proving the lemma.

We call "level" of $x \in \{0,1\}^n$ the number of $1$s among the coordinates of $x$, and denote it by $|x|$. For each $k \in \{0,1,\dots,n\}$, define
$$c_k = \frac{k!\,(n-k)!}{n+1}.$$
Then all $c_k$ are positive and $c_k + c_{k+1} = k!\,(n-k-1)!$, the number of paths passing through a given edge from level $k$ to level $k+1$. The contribution to $\Sigma$ of an edge $(x, x+e)$ from level $k$ to level $k+1$ is thus $k!\,(n-k-1)!\,|f(x+e)-f(x)|$, which we split into two parts, one $c_k|f(x+e)-f(x)|$ attributed to $x$ and the other $c_{k+1}|f(x+e)-f(x)|$ attributed to $x+e$. It follows that
$$\Sigma \le \sum_{x\in\{0,1\}^n} c_{|x|}\,W(f) \le \sum_{k=0}^{n} c_k\binom{n}{k} = \sum_{k=0}^{n}\frac{n!}{n+1} = n!,$$
as desired.

We get the following gap estimates.

Theorem 3.2.
Each of the norms $\|\cdot\|_L$, $\|\cdot\|_{dL}$ and $\|\cdot\|_W$ turns the space of all functions $\{0,1\}^n \to \mathbb{R}$ into a Banach algebra where $\mathbf{1}$ has norm $1$. Moreover the averaging operator $\mathsf{L}$ of the transition kernel $\mathsf{M}$ has operator norm $1$, and is contracting with gap respectively $1/n^2$, $1/(2n-1)$ and $1/(4n-1)$ in the norms $\|\cdot\|_L$, $\|\cdot\|_{dL}$ and $\|\cdot\|_W$.

Proof. Each norm considered here has the form $\|\cdot\| = \|\cdot\|_\infty + D(\cdot)$ for some semi-norm $D$ such that $D(fg) \le \|f\|_\infty D(g) + D(f)\|g\|_\infty$; it follows that the considered spaces are Banach algebras. All the other properties but the contraction are trivial.

To prove the contraction, we simply apply Lemma 1.1. First, it is well known that for all $f : \{0,1\}^n \to \mathbb{R}$,
$$\mathrm{Lip}(\mathsf{L}f) \le \Big(1-\frac1n\Big)\,\mathrm{Lip}(f)$$
(in the parlance of [Oll09], $\mathsf{M}$ is positively curved with $\kappa = 1/n$).

In the case of $\|\cdot\|_L$, we get $\sigma = 1-1/n$ and $C = n$ (since a function of vanishing average must take positive and negative values, and $\operatorname{diam}\{0,1\}^n = n$), hence a contraction with gap $1/n^2$. In the case of $\|\cdot\|_{dL}$, the normalizing factor gives $C = 1$ (and we still have $\sigma = 1-1/n$), hence a spectral gap of size $1/(2n-1)$.

In the case of $\|\cdot\|_W$, we first show that in Lemma 1.1 we can take $\sigma = 1-1/(2n)$:
$$\begin{aligned}
W(\mathsf{L}f) &= \sup_x \sum_{e\in E}\bigg|\frac12 f(x+e) + \frac1{2n}\sum_{e'\in E} f(x+e+e') - \frac12 f(x) - \frac1{2n}\sum_{e'\in E} f(x+e')\bigg|\\
&= \sup_x \sum_{e\in E}\bigg|\Big(\frac12-\frac1{2n}\Big)f(x+e) + \frac1{2n}\sum_{e'\ne e} f(x+e+e') - \Big(\frac12-\frac1{2n}\Big)f(x) - \frac1{2n}\sum_{e'\ne e} f(x+e')\bigg|\\
&\le \sup_x\ \Big(\frac12-\frac1{2n}\Big)\sum_{e\in E}\big|f(x+e)-f(x)\big| + \frac1{2n}\sum_{e'\in E}\sum_{e\ne e'}\big|f(x+e'+e)-f(x+e')\big|\\
&\le \Big(\frac12-\frac1{2n}\Big)W(f) + \frac1{2n}\sup_x\sum_{y\sim x}\sum_{e\in E}\big|f(y+e)-f(y)\big|.
\end{aligned}$$
Hence we obtain
$$W(\mathsf{L}f) \le \Big(\frac12-\frac1{2n}\Big)W(f) + \frac1{2n}\cdot n\,W(f) = \Big(1-\frac1{2n}\Big)W(f).$$
Then Lemma 3.1 shows that we can take $C = 1$, providing a spectral gap of size $1/(4n-1)$.

Let us combine Theorem 3.2 with Theorems A and B to obtain explicit concentration estimates.
We will not compute the explicit constants, and concentrate on the dependency on the parameters $n$ and $a$.

Consider first the "polarization" observable $\phi : \{0,1\}^n \to \mathbb{R}$, where $\phi(x)$ is the proportion of $1$'s in the word $x$. We have
$$\|\phi\|_L = 1+\frac1n, \qquad \|\phi\|_{dL} = 2, \qquad \|\phi\|_W = 2.$$
To use Theorem A with optimal efficiency, assuming $a$ will be small enough, we need to maximize $\delta/\|\phi\|$. Here, we shall thus use the norm $\|\cdot\|_{dL}$. For $a \lesssim 1/n$, Theorem A shows that we need at most $O(n/a^2)$ iterations to have a good convergence to the actual mean; meanwhile Joulin and Ollivier only need $O(1/a^2)$, but for concentration around the expectancy of the empirical process, not around the expectancy with respect to the stationary measure. Without burn-in, one also needs to bound the bias, which approaches zero in time $O(n/a)$ according to the bound of Joulin and Ollivier, for a total run time of $O(n/a + 1/a^2)$. With burn-in, they need a run time of $O(n + 1/a^2)$.

For $1/n \lesssim a \lesssim 1$, we enter our exponential regime while staying inside Joulin-Ollivier's Gaussian window; Theorem A shows we need no more than $O(n/a)$ iterations, while [JO10] still gives a bound of $O(n + 1/a^2)$.

In this example, Joulin and Ollivier get a sharper result; this seems to be explained in one part by the fact that we do not get to decouple the bias from the convergence of expectancies, and in another part by our need to have a Banach algebra, hence to include the uniform norm in our norm.

Consider now the observable $\mathbf{1}_A$, the indicator function of a (non-trivial) set $A$. This function is only $1$-Lipschitz, so that we have $\|\mathbf{1}_A\|_L = 2$ and $\|\mathbf{1}_A\|_{dL} = 1+n$. If we insist on using a Lipschitz norm, the unnormalized one is thus better, and with $\delta = 1/n^2$ Theorem A shows that we need (in the Gaussian regime) $O(n^2/a^2)$ iterations to ensure the error is probably less than $a$, which is the same order of magnitude as given by [JO10], with a worse constant, $\sim 34$ instead of $8$. But here we have two ways to improve on this bound.

The first one is to use Theorem B. When $A = [0] := \{x_1x_2\cdots x_n \in \{0,1\}^n : x_1 = 0\}$, the dynamical variance can be computed explicitly (distinguish the cases when the first digit has been changed an odd or even number of times, and observe that at each step the probability of changing the first digit is $1/(2n)$):
$$\mu(\mathbf{1}_{[0]}^2) - \big(\mu\,\mathbf{1}_{[0]}\big)^2 = \frac14, \qquad \sum_{k\ge1}\mu\big(\mathbf{1}_{[0]}\,\mathsf{L}^k\,\bar{\mathbf{1}}_{[0]}\big) = \frac14\sum_{k\ge1}\Big(1-\frac1n\Big)^k = \frac{n-1}{4}.$$
This gives $\sigma^2(\mathbf{1}_{[0]}) = (2n-1)/4 \approx n/2$.

Switching back to the norm $\|\cdot\|_{dL}$, when $a \lesssim 1/n$ and $n$ is large enough, in Theorem B the correction term in the exponential is negligible compared to the main term, which is $-na^2/(2\sigma^2)$. In particular $O(n/a^2)$ iterations suffice to get a small probability for a deviation at least $a$: compared to Joulin and Ollivier, we gain one power of $n$ in this regime (and the optimal constant $1$ in the leading term of the rate), but only for very small values of $a$. (If we want to consider $a$ of the order of $1/n$, we can take $\sigma^2$ larger than $\sigma^2(\phi)$ to enlarge the window, at the cost of a weaker leading term; we then get a bound similar to the one of Joulin-Ollivier, possibly with a smaller constant, depending on the value of $a$.) This choice of $A$ might seem very specific, but for less regular $A$ the gain should be greater for sufficiently smaller $a$. For example, if $A$ contains half the vertices and every vertex $x \in \{0,1\}^n$ has exactly $2pn$ neighbors with the same $\mathbf{1}_A$ value, the above computation of variance gives
$$\sigma^2(\mathbf{1}_A) = \frac14 + \frac{p}{1-2p}.$$
We shall call a family of sets $A_n \subset \{0,1\}^n$ "scrambled" when the indicator functions $\mathbf{1}_{A_n}$ have bounded dynamical variance (independently of $n$) with respect to the lazy random walk; by abuse, we shall speak of a scrambled set for a member of such a family. For scrambled sets, taking $n = O(1/a^2)$ is sufficient: there is no dependency on the dimension. A further study of scrambled sets seems an interesting direction of work.

The second way to improve our first estimate is to use the norm $\|\cdot\|_W$ in Theorem A. Then $\|\mathbf{1}_{[0]}\|_W = 2$ and $\delta \approx 1/(4n)$. For $a \lesssim 1/n$, Theorem A ensures that we need only $O(n/a^2)$ iterations to have a good convergence to the actual mean, which is again the optimal order of magnitude (since it corresponds to the CLT), but obtained on a much larger window than with Theorem B. This extends to all observables with $W(\phi) \lesssim 1$; observe that this domain of applicability is quite complementary to the domain of applicability of the previous paragraph.
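To make the discussion concrete, here is a minimal simulation of the lazy walk (our own sketch; the dimension, run length and seed are arbitrary choices), checking that the empirical average of $\mathbf{1}_{[0]}$ concentrates around $\mu(\mathbf{1}_{[0]}) = 1/2$ at the scale suggested by the dynamical variance $\sigma^2(\mathbf{1}_{[0]}) = (2n-1)/4$:

```python
import random

random.seed(1)

n = 10                       # dimension of the hypercube {0,1}^n (arbitrary)
steps = 200_000              # chain length (arbitrary)

# Lazy random walk / Gibbs sampler: pick a slot uniformly at random and
# replace it by a fair coin toss.  The uniform law on {0,1}^n is
# stationary, so drawing X_0 uniformly starts the chain at stationarity.
x = [random.randint(0, 1) for _ in range(n)]

hits = 0                     # visits to [0] = {x : x_1 = 0}
for _ in range(steps):
    x[random.randrange(n)] = random.randint(0, 1)
    hits += (x[0] == 0)

emp = hits / steps           # empirical average of 1_[0]
# mu(1_[0]) = 1/2 and sigma^2(1_[0]) = (2n-1)/4, so the typical error
# is about sqrt((2n-1)/(4*steps)), i.e. a few thousandths here.
print(emp)
```

The observed deviation is indeed of order $\sqrt{\sigma^2/\text{steps}}$, much larger than the i.i.d. scale $\sqrt{1/(4\,\text{steps})}$, illustrating why the dynamical variance, not the static one, governs the rate.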
We now consider the "Bernoulli convolution" of parameter $\lambda \in (0,1)$: the law $\beta_\lambda$ of the random variable $\sum_{k\ge0}\varepsilon_k\lambda^k$, where the $\varepsilon_k$ are independent random variables taking the values $1$ and $-1$ each with probability $1/2$.

Figure 1: Histogram of the empirical distribution of the Markov chain associated to $(g_0, g_1)$, with $X_0 = 0$, binned in 500 subintervals (averaged image over 30 independent runs). The parameter $\lambda$ is the inverse of a Pisot number on the left (a: $\lambda = (\sqrt5-1)/2 \approx 0.618$), a very well approximable irrational at the center (b: $\lambda \approx 0.680$), and a rational on the right (c).

One can realize $\beta_\lambda$ naturally as the stationary law of the Markov transition kernel $\mathsf{M} = (m_x)_{x\in\mathbb{R}}$ defined by
$$m_x = \frac12\,\delta_{g_0(x)} + \frac12\,\delta_{g_1(x)}$$
where $g_0(x) = \lambda x - 1$ and $g_1(x) = \lambda x + 1$ (this is a particular case of an Iterated Function System).

In order to evaluate $\beta_\lambda(\phi)$ by an MCMC method, one cannot use the methods developed for ergodic Markov chains since, conditionally to $X_0 = x$, the law $m^k_x$ of $X_k$ is atomic and thus singular with respect to $\beta_\lambda$: $d_{\mathrm{TV}}(m^k_x, \beta_\lambda) = 1$ for all $k$. The convergence only holds for observables satisfying some regularity assumption, and it is natural to ask what regularity is needed.

For a Lipschitz observable $\phi$, one only needs to observe that $\mathsf{M}$ has positive curvature in the sense of Ollivier (this is easy using the coupling $\frac12\delta_{(g_0(x),g_0(y))} + \frac12\delta_{(g_1(x),g_1(y))}$ of $m_x$ and $m_y$) and apply [JO10]. But what if $\phi$ is not Lipschitz (or has a large Lipschitz constant)? We shall consider observables of bounded variation, a regularity which has the great advantage over Lipschitz regularity that it includes the characteristic functions of intervals.

Definition 4.1.
Given an interval $I \subset \mathbb{R}$, we consider the Banach space $\mathrm{BV}(I)$ of bounded variation functions $I \to \mathbb{R}$, defined by the norm $\|\cdot\|_{\mathrm{BV}} = \|\cdot\|_\infty + \operatorname{var}(\cdot, I)$ where
$$\operatorname{var}(f, I) := \sup_{x_0 < x_1 < \cdots < x_k \in I}\ \sum_{i=1}^{k}\big|f(x_i) - f(x_{i-1})\big|$$
(the uniform norm is usually replaced by the $L^1$ norm, but when $I$ is bounded our choice is equivalent up to a constant, it does not single out the Lebesgue measure, and most importantly it ensures that $\mathrm{BV}(I)$ is a Banach algebra).

Important features of the total variation are:

• its extensiveness: $\operatorname{var}(f, I) \ge \operatorname{var}(f, J) + \operatorname{var}(f, K)$ whenever $J, K$ are disjoint subintervals of $I$,
• its invariance under monotonic maps: $\operatorname{var}(f\circ g, I) = \operatorname{var}(f, g(I))$ whenever $g$ is monotonic.

It turns out that the averaging operator $\mathsf{L}$ of the transition kernel $\mathsf{M}$ has a spectral gap for all $\lambda$, but is not a contraction when $\lambda > 1/2$: one only gets $\|\mathsf{L}^k f\| \le C(1-\delta)^k\|f\|$ on a closed hyperplane, but with $C > 1$. Only an iterate of $\mathsf{L}$ is a contraction, and to apply directly Theorems A and B we need to consider an extracted Markov chain $(X_{\ell k})_{k\ge0}$ for some $\ell$.

Let $I_\lambda$ be the attractor of the IFS $(g_0, g_1)$, i.e. the interval whose endpoints are the fixed points of $g_0$ and $g_1$:
$$I_\lambda = \Big[\frac{-1}{1-\lambda},\ \frac{1}{1-\lambda}\Big].$$
Given a word $\omega = \omega_1\dots\omega_\ell$ in the letters $0$ and $1$, we define $g_\omega = g_{\omega_1}\circ g_{\omega_2}\circ\cdots\circ g_{\omega_\ell} : I_\lambda \to I_\lambda$.

Theorem 4.2. If $\lambda^\ell < 1/2$, then $\mathsf{L}^\ell$ has a spectral gap on $\mathrm{BV}(I_\lambda)$ of size $1/(2^{\ell+1}-1)$ and constant $1$.

Proof. Let $I^-_\lambda$, $I^+_\lambda$ be the left and right halves of $I_\lambda$, i.e. $I^-_\lambda = \big[\frac{-1}{1-\lambda}, 0\big)$ and $I^+_\lambda = \big(0, \frac{1}{1-\lambda}\big]$. Let $f \in \mathrm{BV}(I_\lambda)$ and observe that the condition $\lambda^\ell < 1/2$ ensures that $g_{00\dots0}(I_\lambda)$ and $g_{11\dots1}(I_\lambda)$ are disjoint (they have length $< |I_\lambda|/2$ and each contains an endpoint of $I_\lambda$). Then:
$$\begin{aligned}
\operatorname{var}(\mathsf{L}^\ell f, I_\lambda) &\le \frac1{2^\ell}\sum_{\omega\in\{0,1\}^\ell}\operatorname{var}(f\circ g_\omega, I_\lambda) \le \frac1{2^\ell}\sum_{\omega\in\{0,1\}^\ell}\operatorname{var}\big(f, g_\omega(I_\lambda)\big)\\
&\le \frac1{2^\ell}\Big(\operatorname{var}\big(f, g_{00\dots0}(I_\lambda)\big) + \operatorname{var}\big(f, g_{11\dots1}(I_\lambda)\big) + \sum_{\omega \ne 00\dots0,\,11\dots1}\operatorname{var}(f, I_\lambda)\Big)\\
&\le \frac1{2^\ell}\Big(\operatorname{var}(f, I_\lambda) + (2^\ell - 2)\operatorname{var}(f, I_\lambda)\Big),
\end{aligned}$$
$$\operatorname{var}(\mathsf{L}^\ell f, I_\lambda) \le \big(1 - 2^{-\ell}\big)\operatorname{var}(f, I_\lambda).$$
Applying Lemma 1.1 with $C = 1$ and $\sigma = 1 - 2^{-\ell}$ yields the claim.

This enables us to apply our results to estimate $\beta_\lambda(\phi)$ for any $\phi$ of bounded variation. For example, Theorem A yields the following.

Corollary 4.3.
Let $\lambda \in (0,1)$ and let $\ell$ be an integer such that $\lambda^\ell < 1/2$. Consider a Markov chain $(X_k)_{k\ge0}$ with transition probability $2^{-\ell}$ from $x \in I_\lambda$ to $g_\omega(x)$, for each $\omega \in \{0,1\}^\ell$. For any starting distribution $X_0 \sim \mu_0$, any $\phi \in \mathrm{BV}(I_\lambda)$, any positive
$$a < \frac{\|\phi\|_{\mathrm{BV}}}{2^{\ell+1}-1}$$
and any $n \ge c_0\cdot2^\ell$ we have
$$\mathbb{P}_{\mu_0}\Big[\big|\hat\mu_n(\phi)-\mu(\phi)\big| \ge a\Big] \le 2.488\,\exp\!\bigg(\frac{-n\,a^2}{\|\phi\|_{\mathrm{BV}}^2\,\big(c_1\cdot2^\ell + c_2\big)}\bigg),$$
where $c_0$, $c_1$ and $c_2$ are the explicit numerical constants obtained by instantiating Theorem A with $\delta = 1/(2^{\ell+1}-1)$.
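As an illustration of the chain in Corollary 4.3 (our own sketch, not from the original; the parameter, observable, seed and run length are arbitrary choices), with $\lambda = 0.4$ we can take $\ell = 1$ since $\lambda^\ell < 1/2$, and estimate $\beta_\lambda(\phi)$ for the bounded-variation observable $\phi = \mathbf{1}_{(0,\infty)}$, whose exact mean is $1/2$ by the symmetry of $\beta_\lambda$:

```python
import random

random.seed(2)

lam = 0.4                  # lambda^1 < 1/2, so l = 1 works in Theorem 4.2
steps = 200_000
burn_in = 1_000            # the chain forgets X_0 = 0 geometrically fast

# Kernel m_x = (1/2) delta_{lam*x - 1} + (1/2) delta_{lam*x + 1};
# its stationary law is the Bernoulli convolution beta_lam, and the
# chain stays in the attractor I_lam = [-1/(1-lam), 1/(1-lam)].
x = 0.0
acc = 0
for k in range(burn_in + steps):
    x = lam * x + random.choice((-1.0, 1.0))
    if k >= burn_in:
        acc += (x > 0)     # BV observable: indicator of (0, +infinity)

est = acc / steps          # estimate of beta_lam((0, +infinity)) = 1/2
print(est)
```

Note that the same loop works for any $\lambda \in (0,1)$; what Corollary 4.3 adds is a non-asymptotic guarantee on how large `steps` must be for a prescribed accuracy, for every BV observable at once.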
To the best of our knowledge, this example could not be handled effectively by previously known results. For example, [GD12] needs the observable to be at least $C^1$ to have explicit estimates, and they do not give a concentration inequality.

References

[GD12] David M. Gómez and Pablo Dartnell,
Simple Monte Carlo integration with respect to Bernoulli convolutions, Applications of Mathematics (2012), no. 6, 617-626.

[GO02] Peter W. Glynn and Dirk Ormoneit, Hoeffding's inequality for uniformly ergodic Markov chains, Statistics & Probability Letters (2002), no. 2, 143-146.

[JO10] Aldéric Joulin and Yann Ollivier, Curvature, concentration and error estimates for Markov chain Monte Carlo, Ann. Probab. (2010), no. 6, 2418-2442. MR 2683634

[KLMM05] Ioannis Kontoyiannis, Luis A. Lastras-Montaño, and Sean P. Meyn, Relative entropy and exponential deviation bounds for general Markov chains, International Symposium on Information Theory (ISIT 2005), IEEE, 2005, pp. 1563-1567.

[Klo17] Benoît R. Kloeckner, Effective limit theorems for Markov chains with a spectral gap, arXiv:1703.09623, 2017.

[Oll09] Yann Ollivier, Ricci curvature of Markov chains on metric spaces, J. Funct. Anal. (2009), no. 3, 810-864. MR 2484937

[Pau15] Daniel Paulin, Concentration inequalities for Markov chains by Marton couplings and spectral methods, Electronic Journal of Probability (2015).

[Pet17] Fedor Petrov, Answer to "Diameter of a weighted Hamming cube", MathOverflow, 2017, https://mathoverflow.net/a/286346/4961.

[PSS00] Yuval Peres, Wilhelm Schlag, and Boris Solomyak,