Three steps mixing for general random walks on the hypercube at criticality
arXiv preprint (math.PR).
ANDREA COLLEVECCHIO AND ROBERT GRIFFITHS
Abstract.
We introduce a general class of random walks on the N-hypercube, study cut-off for the mixing time, and provide several types of representation for the transition probabilities. We observe that for a sub-class of these processes with long-range (i.e. non-local) updates there exists a critical value of the range that allows an "almost-perfect" mixing in at most three steps. In other words, the total variation distance between the three-step transition and the stationary distribution decreases geometrically in N, the dimension of the hypercube. In some cases, the walk mixes almost-perfectly in exactly two steps. Notice that a well-known result (Theorem 1 in Diaconis and Shahshahani (1986)) shows that there exists no random walk on an Abelian group (such as the hypercube) which mixes perfectly in exactly two steps.

Contents
1. Introduction
2. Literature review and novelty of our results
3. Model and Main results
4. Spectral representation via tensor products
5. Proof of Theorem 3.3
5.1. General construction of the process X
…
… t_mix and t^{(2)}_mix
9.2. Digression on which χ² distance to use
… Upper bound for t^{(2)}_mix
10. Further discussion and a few examples
11. Proof of Theorem 3.15
12. Remarks and Conclusion
13. Appendix
References

1. Introduction
The field of mixing times has attracted the attention of many mathematicians in the past 30 years. It can be described as the study of the rate of convergence of Markov chains to their stationary distribution, and it has an enormous number of applications, for example in physics, economics, biology and combinatorics. This field of study is interesting not only for the mathematical tools developed and their applications to real-life problems, but also for the variety of different behaviours that Markov chains can exhibit. A cut-off phenomenon is observed in certain cases, which highlights a discontinuity in a Markov chain's behaviour: there is a sudden change from being very far away to being very close to stationarity. In some cases, the chain reaches stationarity in a finite number of steps, producing a perfect sample from the stationary distribution. More frequently this happens in a random number of steps; for example, consider the celebrated coupling-from-the-past technique introduced in Propp and Wilson (1996), which had a huge impact on the simulation of models from statistical mechanics (e.g. the Ising model). More rarely, this perfect sampling is achieved in a deterministic number of steps. In this paper, we consider a large class G of reversible Markov chains X, not necessarily time-homogeneous. We aim to study the behaviour of their mixing time as the state space increases, both in terms of the total variation and χ² distances. Our results highlight a certain discontinuity of the mixing time in terms of the size of a single step of the random walk. We characterize the cases when the chain mixes 'almost-perfectly' in at most three steps. This means that the total variation distance between the distribution of the process at time 3 and the stationary distribution decreases fast to zero as the dimension of the hypercube increases (see Definition 3.11). To illustrate this phenomenon, consider the following chain, which is described in detail in Example 1.2 below.
Fix p > 1/2 and α ∈ (0, 1). At each step, ⌊αN⌋ coordinates are picked uniformly at random and their value is changed with the following procedure, which is repeated independently for each coordinate selected. If it is 0 it changes to 1, while if it is 1 an independent randomization is used: the 1 becomes 0 with probability (1 − p)/p, and does not change otherwise. If α = p, we prove that the chain mixes almost-perfectly in 2 steps; α = p is what we call the critical value. Moreover, almost-perfect mixing in at most 3 steps is observed in the window α ∈ [p − v/√N, p + v/√N], where v is any real number, and can even be random. On the other hand, if α ≠ p the chain mixes in the order of ln N steps, and a cut-off is proved in the χ² distance. This unexpected discontinuity is described in Figure 1. Moreover, p is allowed to depend on N, and we find interesting the case where p_N converges to 1/2. We interpret this case as a small perturbation of the case p = 1/2. When we
compare this result with the existing literature on long-range random walks on the hypercube with p = 1/2, we observe a big gap, as the latter process mixes slowly, at least in the χ² distance (see Nestoridi (2017) and the discussion in Section 2 below). The almost-perfect mixing in exactly two steps described above is surprising also because of a well-known result by Diaconis and Shahshahani (1986), which implies that no random walk on an Abelian group reaches perfect stationarity in exactly two steps. Moreover, our results include a computable spectral representation for this class of processes, which enables us to show the so-called cut-off phenomenon for a large class of processes. The class of processes we consider is intimately related to the Ehrenfest urn and to its generalizations (see the discussion in Section 2 below), which in turn have direct applications to chemistry and physics (see e.g. Flegg et al. (2008)). We assume that each process X = (X_t)_t in the class G takes values on the vertices of a hypercube, and that its stationary distribution is the product measure of i.i.d. Bernoulli distributions with parameter p ≥ 1/2. Moreover, X satisfies a 'restriction principle', as stated in Condition 2 in Section 3, which can be roughly described as follows. The probability of any given collection of coordinates (say B ⊆ [N]) being updated at time t + 1 depends on the past of the process only through the coordinates B of X_t, and might depend on some external randomization. We discuss below a few examples of processes satisfying the properties described above. In particular, the simple random walk on the hypercube, Diaconis et al. (1990), and a class of non-local random walks on the hypercube, Nestoridi (2017), belong to G. Moreover, we highlight another phase transition which we find surprising.
The starting position can determine mixing in a bounded time, in this case for the Hamming distance, when the update size is far from criticality. We also provide general representations for the t-step transition matrix of X, one in terms of a system of random walks and the other as a spectral representation. Here are a few examples of processes that lie in the class we describe above.

Example 1.1. Lazy simple random walk on the hypercube (RWH). Let X_0 be a vertex of the N-dimensional hypercube, and define the process (X_t)_{t∈ℕ} recursively as follows. Suppose that at stage t + 1 a fair coin is flipped. If it shows Head, X_{t+1} = X_t. If the coin shows Tail, then a coordinate of X_t is chosen uniformly at random, and it is changed. This process has been studied extensively. In particular, it was shown in Diaconis et al. (1990) that it exhibits a cut-off at (1/2) N log N.

Example 1.2. Non-local random walk on the hypercube (NLRWH).
Consider the following random walk. Fix parameters p_N ≥ 1/2 and z_N ∈ [N]. Pick a set of coordinates with cardinality z_N uniformly at random, i.e. each possible choice is picked with probability \binom{N}{z_N}^{-1}. For each coordinate i selected we perform the following procedure, which we call Acceptance/Rejection with parameter p_N:
a) If X_t[i] = 0 then X_{t+1}[i] = 1.
b) If X_t[i] = 1 then we randomize further, and set X_{t+1}[i] = 0 with probability (1 − p_N)/p_N, and X_{t+1}[i] = 1 otherwise.
The stationary distribution is unique, and is a product measure of i.i.d. Bernoulli's with parameter p_N. The case p_N ≡ 1/2, with an additional assumption of laziness, was studied in Nestoridi (2017).

Example 1.3. Mixture of i.i.d. updates for each coordinate.
Fix p_N ≥ 1/2. Define a process (X_t)_{t∈ℕ} recursively. Let X_0 = 0 ∈ V_N. Suppose that we have defined X_t; then we obtain X_{t+1} as follows. Let I_t^{(N)} be a random variable with distribution ν_{N,t}. We assume that, for any fixed N ∈ ℕ, the random variables (I_t^{(N)})_{t∈ℕ} are independent. Given I_t^{(N)} = α_{N,t}, for each coordinate j ∈ [N] = {1, 2, . . . , N} flip an independent coin that has probability α_{N,t} of showing Head. The coordinate is selected if and only if the corresponding coin shows Head; if the coin shows Tail we set X_{t+1}[j] = X_t[j]. For each selected coordinate we repeat the Acceptance/Rejection procedure described in Example 1.2 with parameter p_N. This process has as stationary distribution the product measure of i.i.d. Bernoulli's with parameter p_N.

Example 1.4. Blocks update.
Let β_N be a sequence such that N/β_N is a positive integer. Partition the space [N] into N/β_N disjoint subsets with cardinality β_N each. Exactly one group is chosen, each with equal probability. For each coordinate j of this group we repeat the Acceptance/Rejection method described in Example 1.2 with parameter p. The Markov chain X is reversible with respect to the product measure of i.i.d. Bernoulli's with parameter p.

2. Literature review and novelty of our results
The main contributions of this paper can be summarized as follows. • Almost-perfect mixing with acceptance/rejection.
Long-range versions of RWHs have been studied in Nestoridi (2017). In this context, the random walks were considered to be 'fair', i.e. p = 1/2, and 'lazy', i.e. at each stage the process does not change with probability 1/2. The latter assumption is convenient to avoid periodicity, and ensures ergodicity of the process. The critical case z_N = N/2 was considered, and a lower bound for the mixing time in the χ² distance was provided, still of the order N (see Remark 2 on page 1297 of Nestoridi (2017)), suggesting that the chain does not mix rapidly. This behaviour seems a bit subtle, as the mixing time for the same chain, when z_N = αN with α < 0.5, is of the order log N. Our contribution, for this particular example, is to show that laziness is the cause of this slowing down in the case of the χ² distance. If we apply the acceptance/rejection method described in the examples above, with p_N ↓ 1/2 and p_N ≠ 1/2, we can choose p_N in such a way that the similarity between the two models is quite evident. To see this, we can identify the limiting distribution of π_N(·, 1/2) with a Uniform over the interval [0, 1], by identifying vertices of the hypercube with a truncated binary expansion; the stationary measure is then a product measure of Bernoulli(1/2) random variables. On the other hand, for an i.i.d. sequence (ξ_n)_{n∈ℕ} of Bernoulli(p) with p ≠ 1/2, the limit of Σ_{n=1}^∞ ξ_n 2^{−n} has a distribution singular with respect to the Lebesgue measure. The latter is a consequence of a beautiful theorem of Kakutani (1948). Hence it makes sense to consider sequences p_N → 1/2. Fix ε > 0 and let p_N = 1/2 + δ_ε/N^a, where a > 1/2 and δ_ε > 0 depends on a and ε. Using again Kakutani's theorem, we have that the limiting distribution of the product Bernoulli(p_N), using the binary expansion trick, is absolutely continuous with respect to the Lebesgue measure. Denote by η_ε(·) this distribution. It is not difficult to prove that we can choose δ_ε such that the total variation distance between η_ε(·) and the uniform measure is less than ε.

• Mixing in finitely many steps at critical (deterministic) initial conditions.
In Example 1.2, if we choose as initial configuration a vector in the N-dimensional hypercube which has exactly ⌊Np⌋ ones, then the mixing time for the Hamming distance of the process is of constant order, provided z_N = αN for some α ∈ (0, 1). Without this choice of the initial configuration, when α ≠ p, the mixing time becomes of the order ln N in the worst case scenario.

• General representations.
Moreover, ours is a unifying approach which allows the study of a general class of processes. Our work is inspired by the papers Karlin et al. (1993) and Diaconis and Griffiths (2012), where the Krawtchouk polynomials are used to study RWH through a spectral analysis. We combine this approach with an acceptance/rejection method. We identify a large class of processes whose transition kernel can be decomposed using these Krawtchouk polynomials. This representation is explicit, in the sense that we can compute the eigenvalues, and it is used to provide sharp bounds for the mixing time in L¹ and L² norms (i.e. with respect to the total variation and χ² distances, respectively).

3. Model and Main results
We define a class G of Markov chains as follows. Let X = (X_t)_{t∈ℕ} be a reversible Markov chain with state space V_N = {0, 1}^N, for some N ∈ ℕ. The process X is in the class G if and only if it satisfies the following two conditions.

Condition 1
There exists a parameter p ≥ 1/2 such that the following is a stationary distribution for X:
(3.1) π_N(y, p) = p^{‖y‖} (1 − p)^{N−‖y‖} =: p^{‖y‖} q^{N−‖y‖},
where y ∈ V_N, and ‖y‖ is the number of ones appearing in y, i.e. the Hamming distance between y and 0 = (0, 0, . . . , 0) ∈ V_N. On many occasions, we drop N from the notation and simply use π(y, p).

Condition 2
For any y ∈ V_N, denote by y = (y[1], y[2], . . . , y[N]) its coordinates. For all B ⊆ [N] = {1, 2, . . . , N}, let y(B) be the projection from V_N on B, defined as the vector y(B) = (y[j], j ∈ B). We assume that
(3.2) P( X_t(B) ∈ C | X_{t−1} ) = P( X_t(B) ∈ C | X_{t−1}(B) ).
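Condition 2 can be checked concretely for the walk of Example 1.2. The sketch below is our own code (the function name and parameter choices are not from the paper): it computes, by exact enumeration with rational arithmetic, the law of X_1(B) given X_0 = x for a small N, and shows that it depends on x only through x(B), as in (3.2).

```python
import itertools
from fractions import Fraction

def transition_on_block(x, B, N, z, p):
    """Exact law of X_1(B) given X_0 = x for Example 1.2: a uniform z-subset S
    of [N] is updated; a selected 0 becomes 1, and a selected 1 becomes 0 with
    probability (1 - p)/p (it stays 1 otherwise)."""
    q_over_p = (1 - p) / p                      # flip probability for a selected 1
    subsets = list(itertools.combinations(range(N), z))
    dist = {}
    for S in subsets:
        # Given S, the selected coordinates are updated independently.
        outcomes = [((), Fraction(1, len(subsets)))]
        for j in B:
            step = []
            for vals, pr in outcomes:
                if j not in S:
                    step.append((vals + (x[j],), pr))
                elif x[j] == 0:
                    step.append((vals + (1,), pr))
                else:
                    step.append((vals + (0,), pr * q_over_p))
                    step.append((vals + (1,), pr * (1 - q_over_p)))
            outcomes = step
        for vals, pr in outcomes:
            dist[vals] = dist.get(vals, Fraction(0)) + pr
    return dist

# Two starting points that agree on B = {0, 1} but differ elsewhere:
p = Fraction(2, 3)
d1 = transition_on_block((1, 0, 1, 0), (0, 1), N=4, z=2, p=p)
d2 = transition_on_block((1, 0, 0, 1), (0, 1), N=4, z=2, p=p)
```

The two laws coincide, illustrating the restriction principle (3.2) for this example.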
For any pair of probability measures µ and ν defined on a countable space Ω, define the total variation distance
‖µ − ν‖_TV = max_{A⊆Ω} |µ(A) − ν(A)|.

Definition 3.1.
Let X ∈ G. Define the sequence (Z_t)_{t∈ℕ} of independent random vectors in V_N with the following distribution:
(3.3) P(Z_t ∈ S) = P(X_t ∈ S | X_{t−1} = 0).
From now on, we denote the coordinates of Z_t by (Z_t[1], Z_t[2], . . . , Z_t[N]).

Remark 3.2.
In what follows, we denote by P_t(·|x) the probability mass function of X_t given X_0 = x. Moreover, when we consider a generic X ∈ G, we denote by N the dimension of the corresponding hypercube. We have the following representation.
Theorem 3.3 ( Spectral Representation).
Let X ∈ G. We have
(3.4) P_t(y|x) = π(y, p) { 1 + Σ_{A⊆[N], A≠∅} ( Π_{m=1}^t ρ_{A,m} ) (p/q)^{|A|} Π_{j∈A} (1 − x[j]/p)(1 − y[j]/p) },
where we can give an explicit representation for the eigenvalues, i.e.
(3.5) ρ_{A,m} = E[ Π_{j∈A} (1 − Z_m[j]/p) ].

Remark 3.4.
The spectral representation in (3.4) simplifies when X is time-homogeneous: instead of a product, we simply have ρ_A^t. Notice that the previous representation holds also in cases when the chain is reducible and/or periodic. If ρ_{A,m} depends only on |A|, then we write ρ_{|A|} = ρ_{A,m}.

Example 3.5.
Let q = p = 1/2 and take N even. The elements of Z_m are taken to be exchangeable and ‖Z_m‖ = N/2 with probability 1. Each term in the product expression for (3.5) is either 1 or −1 according to whether Z_m[j] is 0 or 1. There is a hypergeometric probability of k terms in the product appearing in the right-hand side of (3.5) being minus one. If |A| = n,
ρ_n = Σ_{k=0}^n [ \binom{N/2}{k} \binom{N/2}{n−k} / \binom{N}{n} ] (−1)^k.
Simplification shows that ρ_n = 0 if n is odd, and for 2m ≤ N,
ρ_{2m} = (−1)^m \binom{N/2}{m} / \binom{N}{2m}.
The maximum value of |ρ_n| is 1, attained when 2m = N. These eigenvalues have appeared, e.g., in Nestoridi (2017).
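The closed form for these eigenvalues is easy to verify numerically. A small sketch with exact rational arithmetic (the function name is ours):

```python
from fractions import Fraction
from math import comb

def rho(n, N):
    """Eigenvalue rho_n of Example 3.5 (p = q = 1/2, ||Z_m|| = N/2):
    the alternating hypergeometric sum over the number of selected ones."""
    return sum((-1) ** k
               * Fraction(comb(N // 2, k) * comb(N // 2, n - k), comb(N, n))
               for k in range(n + 1))

N = 8
odd_vanish = all(rho(n, N) == 0 for n in range(1, N + 1, 2))
closed_form = all(
    rho(2 * m, N) == Fraction((-1) ** m * comb(N // 2, m), comb(N, 2 * m))
    for m in range(1, N // 2 + 1))
```

Both checks pass for N = 8, confirming that the odd eigenvalues vanish and that the even ones match the stated closed form, with |ρ_N| = 1.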
Remark 3.6. De Finetti sequences.
In Example 1.3, when X is time-homogeneous and ν_{N,t} is a Dirac mass at r, we have
ρ_{n,t} = ρ_n = Σ_{k=0}^n \binom{n}{k} r^k (1 − r)^{n−k} (−q/p)^k = (1 − r/p)^n.
If r = p then ρ_n = 0 for n = 1, . . . , N; X is then an independence chain which mixes in one step. If r ≠ p then |ρ_1| is the maximum value of |ρ_n|. More generally, when ν_{N,t} ≡ ν_N, we have
(3.6) ρ_{n,t} = ρ_n = ∫_{[0,1]} (1 − r/p)^n ν_N(dr).
For each fixed N, the coordinates of Z can represent N particular coordinates from a countably infinite de Finetti sequence with mixing measure ν_N. If ν_N ≡ Leb(0, 1) then
ρ_{n,t} = ρ_n = ∫_0^1 (1 − r/p)^n dr = [p/(n + 1)] ( 1 − (−q/p)^{n+1} ).

Definition 3.7.
We define the collection C of sequences (X^{(N)})_{N∈ℕ} ∈ G^ℕ with the following property. For each N ∈ ℕ,
• the state space of X^{(N)} is V_N, and
• there exists a sequence (p_N)_N ∈ [1/2, 1]^ℕ such that π_N(·, p_N) is a stationary distribution for X^{(N)}, and lim_N p_N = p for some p ∈ [1/2, 1].
From now on, once an element of C is fixed, we denote by P_t^{(N)} the transition kernel of X^{(N)}. Define
(3.7) t_mix(ε, x) = inf{ t : ‖P_t^{(N)}(·|x) − π_N(·)‖_TV ≤ ε }.
Let t_mix(ε) = sup_{x∈V_N} t_mix(ε, x). Theorem 3.8 below is quite general and simple to prove. This result provides almost the correct order for the mixing time, missing a logarithmic factor; it provides bounds that are sharp up to a logarithmic factor in the case of exchangeability (defined in (3.9) below).

Theorem 3.8 (General lower bound for t_mix). Suppose that (X^{(N)})_N ∈ C and each process in the sequence is time-homogeneous, i.e. (Z_i^{(N)})_{i∈ℕ} are identically distributed for each N. Define
(3.8) θ_N = min_{j∈[N]} P( Z_1^{(N)}[j] = 1 ).
There exists a > 0 such that t_mix(ε) ≥ a θ_N^{−1}, where we set a/0 = ∞. Notice that θ_N > 0 guarantees irreducibility of the Markov chain X^{(N)}.
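The idea behind Theorem 3.8 (proved in Section 7) can be illustrated on a single coordinate of the walk of Example 1.2, where θ_N = z_N/N. The sketch below is our own code and parameter choice: it computes the exact two-state law of one coordinate started at 0 and the bound P_t(A|0) ≤ 1 − (1 − θ_N)^t, which yields a total variation lower bound π(A) − P_t(A|0).

```python
from fractions import Fraction

def p_one_after_t(t, theta, p):
    """P(X_t[l] = 1 | X_0 = 0) for a single coordinate of the Example 1.2 walk:
    each step the coordinate is selected with probability theta = z_N/N;
    a selected 0 becomes 1, a selected 1 becomes 0 with probability (1 - p)/p."""
    prob1 = Fraction(0)
    for _ in range(t):
        prob1 = (1 - prob1) * theta + prob1 * (1 - theta * (1 - p) / p)
    return prob1

theta, p = Fraction(1, 10), Fraction(3, 5)
ts = range(1, 16)
exact = [p_one_after_t(t, theta, p) for t in ts]
# The coordinate can equal 1 only if it was selected at least once, hence
# P_t(A | 0) <= 1 - (1 - theta)^t for A = {y : y[l] = 1}, while pi(A) = p.
bound = [1 - (1 - theta) ** t for t in ts]
tv_lower = [p - e for e in exact]   # lower bound on the TV distance at time t
```

For t of smaller order than 1/θ_N the bound stays bounded away from 0, which is exactly the content of the theorem.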
Definition 3.9.
A random variable Z which takes values in V_N is said to be exchangeable if
(3.9) P(Z = x) = P(Z = y) whenever ‖x‖ = ‖y‖.

Definition 3.10.
Let P be a probability measure and Q a positive measure, both defined on the subsets of V_N. Define
(3.10) χ²(P | Q) = Σ_{y∈V_N} ( P(y) − Q(y) )² / Q(y).
We set χ²(x, t) = χ²( P_t(·|x) | π_N ). Moreover, let t^{(2)}_mix(ε, x) = inf{ t : χ²(x, t) ≤ ε }, and t^{(2)}_mix(ε) = sup_{x∈V_N} t^{(2)}_mix(ε, x).

Definition 3.11.
Consider a sequence (X^{(N)})_N ∈ C. We say that this sequence mixes almost-perfectly in t steps if there exist constants C > 0 and β ∈ (0, 1) such that
(3.11) sup_{x∈V_N} ‖P_t(·|x) − π_N(·, p_N)‖_TV ≤ C β^N.

Theorem 3.12 (Almost-perfect mixing in three steps).
Consider a sequence (X^{(N)})_N ∈ C. We make the following assumptions.
• p_N > 1/2 for all N ∈ ℕ, and each X^{(N)} is time-homogeneous.
• Each of the random variables Z_1^{(N)}, Z_2^{(N)}, Z_3^{(N)} is exchangeable, for each N ∈ ℕ, in the sense of Definition 3.9.
• Let ζ_N = ‖Z_1^{(N)}‖. We assume that there exists a random variable V such that
(3.12) lim_{N→∞} (ζ_N − Np_N)/√(N p_N q_N) = V (in distribution),
(3.13) sup_N E[ |ζ_N − Np_N|^a / (N p_N q_N)^{a/2} ] < ∞ for some a > 2,
(3.14) E[ e^{V²/2} ] < ∞.
Then we have that (X^{(N)})_{N∈ℕ} mixes almost-perfectly in 3 steps. Moreover, if P(V = 0) = 1 then (X^{(N)})_{N∈ℕ} mixes almost-perfectly in 2 steps.
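For the walk of Example 1.2 with ‖Z‖ = z_N fixed, exchangeability gives ρ_n = Q_n(z_N; N, p) (Proposition 6.3), and for the starting point 0 the spectral representation yields χ²(0, t) = Σ_{n≥1} h_n ρ_n^{2t}. The following sketch (code and parameter choices are ours) contrasts the critical choice z_N = pN, where even t = 3 makes the χ² distance negligible, with an off-critical choice:

```python
from fractions import Fraction
from math import comb

def krawtchouk(n, x, N, p):
    """Q_n(x; N, p), normalised so Q_n(0) = 1: the coefficient of s^n in
    (1 - (q/p)s)^x (1 + s)^(N - x), divided by C(N, n)."""
    r = -(1 - p) / p
    return sum(comb(x, k) * r ** k * comb(N - x, n - k)
               for k in range(n + 1)) / comb(N, n)

def chi2_from_zero(t, z, N, p):
    """chi^2 distance from stationarity after t steps started at 0, for the
    walk of Example 1.2: rho_n = Q_n(z; N, p) by exchangeability, and
    chi^2(0, t) = sum_{n >= 1} h_n rho_n^(2t), with h_n = C(N, n) (p/q)^n."""
    p_over_q = p / (1 - p)
    return sum(comb(N, n) * p_over_q ** n * krawtchouk(n, z, N, p) ** (2 * t)
               for n in range(1, N + 1))

N, p = 16, Fraction(3, 4)
critical = chi2_from_zero(3, z=12, N=N, p=p)      # z_N = pN: critical value
off_critical = chi2_from_zero(3, z=4, N=N, p=p)   # z_N far from pN
```

At the critical value the leading eigenvalue ρ_1 = Q_1(pN; N, p) = 1 − z_N/(Np) vanishes, which is the mechanism behind the two/three-step mixing.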
Theorem 3.13 ( Cut-off for NLRWH).
Let (X^{(N)})_N ∈ C and assume that each X^{(N)} is defined as in Example 1.2, with stationary distribution π_N(·, p_N).
(1) If lim z_N/N = 0, then both t_mix(ε) and t^{(2)}_mix(ε) exhibit a sharp cutoff at (N p_N / z_N) log N. In other words, if we set t_C = (N p_N / z_N)(log N + C), we have that, for all C < 0 with |C| large enough, t_mix(ε), t^{(2)}_mix(ε) > t_C, and for all C large enough, t_mix(ε), t^{(2)}_mix(ε) < t_C.
(2) If z_N/N = w ∈ (0, 1) \ {p}, then t^{(2)}_mix(ε) exhibits a sharp cutoff at t_C = (p_N/(2w))(log N + C).
Let \bar P_t(·|x) be the p.m.f. of ‖X_t‖ conditional on X_0 = x. Let Q_N be a Binomial with parameters N and p. Define the χ² distance for the Hamming distance as χ²_H(x, t) = χ²( \bar P_t(·|x) | Q_N ), and t^{(2)}_mix(ε) = inf{ t : max_x χ²_H(x, t) ≤ ε }.

Remark 3.14. De Finetti sequences, continued.
Suppose that in the De Finetti case described in Remark 3.6 we set ν_{N,t} ≡ Leb(0, 1) and p > q. Let t = aN/(ln N), where a > 0 is a small enough constant to be specified below. Using the computation given in Remark 3.6 we have that
(3.15) χ²_H(0, t) = Σ_{n=1}^N \binom{N}{n} (p/q)^n [p/(n + 1)]^{2t} ( 1 − (−q/p)^{n+1} )^{2t}
≥ q^{−N} N^{−2t} \binom{N}{⌊pN⌋} p^{⌊pN⌋} q^{N−⌊pN⌋} [Np/(⌊pN⌋ + 1)]^{2t} ( 1 − (−q/p)^{⌊pN⌋+1} )^{2t}
= q^{−N} N^{−2t} (2πpqN)^{−1/2} (1 + o(1))
= exp{ −N ln q − 2aN } (2πpqN)^{−1/2} (1 + o(1)).
Hence, if a < −(ln q)/2, we have that lim_{N→∞} χ²_H(0, t) = ∞. Hence t^{(2)}_mix(ε) > aN/ln N. This result should be compared with the case where ν_N is a Dirac mass at a point α ∈ (0, 1) \ {p}, i.e. i.i.d. updates. In this context,
(3.16) χ²_H(0, t) = Σ_{n=1}^N \binom{N}{n} (p/q)^n (1 − α/p)^{2tn} = ( 1 + (p/q)(1 − α/p)^{2t} )^N − 1.
The latter equation shows a completely different behaviour. In fact, we show in Section 9.2 that χ²_H(0, t) = sup_{x∈V_N} χ²_H(x, t). Equation (3.16) shows that when ν_{N,t} is a Dirac mass at α ≠ p the chain has a cut-off at b ln N in the χ² distance, with b depending on α only. Hence, it mixes much faster than in the case ν_{N,t} ≡ Leb(0, 1), which requires at least aN/ln N steps to mix. The i.i.d. case that we just discussed was studied in Scoppola (2011).

Theorem 3.15 (Constant order mixing at critical initial conditions).
Fix ε > 0. Let (X^{(N)})_N ∈ C and assume that each X^{(N)} is defined as in Example 1.2, with stationary distribution π_N(·, p_N). If ‖x‖/N = p and z_N = wN, with w > 0, then there exists t_ε, not depending on N, such that t^{(2)}_mix(ε, x) ≤ t_ε.

The following representation characterises the processes in G in terms of N (possibly) dependent random walks.

Theorem 3.16 (Random Walk Representation).
Suppose that X ∈ G. We have
(3.17) P_t(y|x) = π(y, p) E[ Π_{j=1}^N ( 1 − (−q/p)^{S^{(t)}[x,y,j]} ) ],
where S^{(t)}[x, y, j] = x[j] + y[j] − 1 + Σ_{k=1}^t Z_k[j], and the parameters p ≥ q are the same as in (3.1). The vectors (Z_t)_{t∈ℕ} are independent and their distribution is defined in Definition 3.1. Moreover,
‖P_t(·|x) − π(·)‖_TV = (1/2) Σ_{y∈V_N} π(y, p) | E[ Π_{j=1}^N ( 1 − (−q/p)^{S^{(t)}[x,y,j]} ) ] − 1 |.
Vice versa, if the random vectors (Z_t)_{t∈ℕ} are independent, then the process with transition functions defined as in (3.17) belongs to G.

4. Spectral representation via tensor products
Proposition 4.1. If X ∈ G then there exist constants (γ_{A,t})_{A⊆[N], t∈ℕ} such that
(4.1) P_t(y|x) = π(y, p) { 1 + Σ_{A⊆[N], A≠∅} ( Π_{m=1}^t γ_{A,m} ) (p/q)^{|A|} Π_{j∈A} (1 − x[j]/p)(1 − y[j]/p) }.

Proof. The general form of a 1-step transition density expansion for X is
(4.2) P( X_{t+1} = y | X_t = x ) = π(y, p) { Σ_{L,M⊆[N]} γ^{(t)}_{LM} Π_{i∈L} (p − x[i])/√(pq) Π_{j∈M} (p − y[j])/√(pq) },
with γ^{(t)}_{LM} = γ^{(t)}_{ML} and γ^{(t)}_{∅∅} = 1. This is a well-known expansion, named after Lancaster (1969), for P(X_{t+1} = y | X_t = x)/π(y, p) (also known as the Fourier–Walsh basis expansion in part of the literature), using the tensor product sets
(4.3) { ⊗_{i=1}^N { 1, (p − x[i])/√(pq) } } ⊗ { ⊗_{j=1}^N { 1, (p − y[j])/√(pq) } }.
The following steps are well known from basis theory, but we include them for the sake of completeness. We emphasize that we can compute the eigenvalues explicitly. Roughly speaking, the Lancaster expansion applies to the ratio P_t/π in terms of the two tensor product sets, which are complete orthogonal function sets on the Bernoulli product distributions on the sequences. The symmetry γ^{(t)}_{LM} = γ^{(t)}_{ML} is a consequence of the reversibility of X. Moreover, for L ⊄ M, in stationarity (where the p.m.f. of (X_{t+1}, X_t) is P(x_{t+1}|x_t) π(x_t, p)) we have
γ^{(t)}_{LM} = E[ Π_{i∈L} (p − X_t[i])/√(pq) · Π_{j∈M} (p − X_{t+1}[j])/√(pq) ]
= E[ Π_{i∈L} (p − X_t[i])/√(pq) · E[ Π_{j∈M} (p − X_{t+1}[j])/√(pq) | X_t ] ]
= E[ Π_{i∈L\M} (p − X_t[i])/√(pq) · E[ Π_{j∈M} (p − X_{t+1}[j])/√(pq) Π_{k∈L∩M} (p − X_t[k])/√(pq) | X_t(M) ] ]
= 0,
in virtue of Condition 2: the conditional expectation depends on X_t(M) only, and under π the coordinates (X_t[i])_{i∈L\M} are independent of X_t(M) and centred. Iterating (4.2) and using the orthogonality of the functions (4.3), we get the general representation for P_t(y|x) in (4.1). □

Also the reverse is true.
Proposition 4.2. If X is a reversible Markov process which satisfies Condition 1 and whose transition kernel satisfies (4.1), then X ∈ G.

Proof. It is enough to prove that X satisfies Condition 2. The marginal distribution of X_{t+1}(B), conditional on the event X_t = x, is
(4.4) P( X_{t+1}(B) = y(B) | X_t = x ) = π( y(B), p ) { 1 + Σ_{A⊆B, A≠∅} ρ_{A,t} (p/q)^{|A|} Π_{j∈A} (1 − x[j]/p)(1 − y[j]/p) },
where π(y(B), p) is the product Bernoulli(p) distribution on the coordinates B. The right-hand side of (4.4) depends on x(B) only, and this proves our result. □

5. Proof of Theorem 3.3
General construction of the process X.
In this Section we provide a construction for any reversible X on V_N which satisfies Conditions 1 and 2. We explicitly construct a collection of reversible Markov processes G′, using an acceptance/rejection method. Soon after, we prove that G = G′ (Theorem 5.1 below). A process X ∈ G′ if and only if it can be constructed as follows. Let q, p be as in (3.1), and recall q ≤ p and q + p = 1. Consider a sequence of independent random variables (Z_t)_{t∈ℕ} which take values in V_N, and let (ξ_{i,t})_{t∈ℕ, i∈[N]} be a sequence of i.i.d. Bernoulli(q/p), i.e. P(ξ_{i,t} = 1) = q/p = 1 − P(ξ_{i,t} = 0). Consider the following homogeneous Markov process X = (X_t)_{t∈ℕ}, which we define recursively. Suppose (X_i : i ≤ t) is defined; then define X_{t+1} as follows. For all i ∈ [N]:
• If Z_t[i] = 0 then X_{t+1}[i] = X_t[i].
• If X_t[i] = 0 and Z_t[i] = 1 then X_{t+1}[i] = 1.
• If X_t[i] = 1 and Z_t[i] = 1, then X_{t+1}[i] = X_t[i] + ξ_{i,t} mod 2.

Theorem 5.1. G = G′.

Proof.
We first prove that G′ ⊆ G. More specifically, we prove that
(5.1) P( X_{t+1} = y | X_t = x ) = π(y, p) { 1 + Σ_{A⊆[N], A≠∅} ρ_{A,t} (p/q)^{|A|} Π_{j∈A} (1 − x[j]/p)(1 − y[j]/p) },
with
(5.2) ρ_{A,t} = E[ Π_{i∈A} (1 − Z_t[i]/p) ].
Using orthogonality, it is enough to prove the case t = 0. Assume X ∈ G′. A coordinate i is 'picked' if and only if Z_0[i] = 1. Recall that, conditionally on Z_0, the coordinates that are picked behave independently. Hence,
E[ X_1[i] − p | X_0 = x, Z_0 ] = (1 − Z_0[i])(x[i] − p) + Z_0[i] ( (1 − x[i])(1 − p) + x[i]( (0 − p)(q/p) + (1 − p)(1 − q/p) ) )
= ( 1 − Z_0[i]/p )( x[i] − p ).
Therefore, for A ⊆ [N],
E[ Π_{j∈A} (X_1[j] − p) | X_0 = x ] = E[ Π_{i∈A} (1 − Z_0[i]/p) ] Π_{j∈A} (x[j] − p),
which implies (5.1), and in particular identifies
ρ_{A,0} = E[ Π_{i∈A} (1 − Z_0[i]/p) ],
because the coefficient of Π_{j∈A} (1 − y[j]/p) in a tensor product expansion of P(X_1 = y | X_0 = x)/π(y, p) with respect to y equals ρ_{A,0} (p/q)^{|A|} Π_{j∈A} (1 − x[j]/p).
G ⊆ G ′ . Take Z t to have the distribution of X t +1 | X t = . Z t has ap.m.f. π ( z , p ) ( X A ⊆ [ N ] ,A = ∅ ρ A,t (cid:16) pq (cid:17) | A | Y j ∈ A (cid:16) − z [ i ] p (cid:17)) and it follows that for A ⊆ [ N ] E h Y j ∈ A (cid:16) − Z t [ j ] p (cid:17)i = ρ A,t . As ( ρ A,t ) A,t identify the distribution of the reversible markov chain X , the proof follows fromthe following considerations. Construct a process X ′ in G ′ using the same ( Z t ) t ∈ N . Denote STEPS MIXING FOR GENERAL RANDOM WALKS ON HYPERCUBE 13 by ( ρ ′ A,t ) A,t the eigenvalues of X ′ , then by the previous part of this proof ( G ′ ⊆ G ) we obtainthat ( ρ ′ A,t ) A,t = ( ρ A,t ) A,t . (cid:3) Proof of Theorem 3.16: Random Walk representation
Proof of Theorem 3.16.
The transition probability for X_1 given X_0 = x is
(6.1) π(y, p) { 1 + Σ_{A⊆[N], A≠∅} ρ_A (p/q)^{|A|} Π_{j∈A} (1 − y[j]/p)(1 − x[j]/p) }.
Note the identity that, for u ∈ {0, 1}^N and A ⊆ [N],
Π_{j∈A} (1 − u[j]/p) = (−q/p)^{‖u(A)‖}.
The expression in (6.1) can therefore be written as
π(y, p) { Σ_{A⊆[N]} (−1)^{|A|} E[ (−q/p)^{‖Z(A)‖ + ‖y(A)‖ + ‖x(A)‖ − |A|} ] }
(6.2) = π(y, p) E[ Π_{j=1}^N ( 1 − (−q/p)^{Z[j] + x[j] + y[j] − 1} ) ].
Eq. (3.17) follows by replacing ρ_A in (6.1) with
ρ_A^t = E[ (−q/p)^{Σ_{k=1}^t ‖Z_k(A)‖} ],
where we used the i.i.d. assumption on the sequence of vectors (Z_t)_{t∈ℕ}. □
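The representation (3.17) can be sanity-checked by brute force on a tiny instance of Example 1.2. The sketch below is our own code: it compares the formula at t = 1 with a direct enumeration of the update, for the assumed law of Z that puts mass 1/N on each standard basis vector (z_N = 1). Note that the exponent S^{(1)}[x, y, j] may equal −1, in which case the factor 1 − (−q/p)^{−1} = 1 + p/q handles the unselected coordinates exactly.

```python
from fractions import Fraction
from itertools import product
from math import prod

N, z, p = 3, 1, Fraction(2, 3)
q = 1 - p
# Law of Z for Example 1.2 with z_N = 1: one uniformly chosen coordinate is set to 1.
Z_support = [tuple(int(i == j) for i in range(N)) for j in range(N)]

def direct(x):
    """Exact one-step law of the walk from x, enumerating the update directly."""
    law = {}
    for zvec in Z_support:
        j = zvec.index(1)
        moves = [(1, Fraction(1))] if x[j] == 0 else [(0, q / p), (1, 1 - q / p)]
        for val, pr in moves:
            y = list(x)
            y[j] = val
            y = tuple(y)
            law[y] = law.get(y, Fraction(0)) + Fraction(1, N) * pr
    return law

def formula(x, y):
    """P_1(y | x) from the random-walk representation (3.17) with t = 1:
    pi(y, p) * E[prod_j (1 - (-q/p)^(x[j] + y[j] - 1 + Z[j]))]."""
    pi_y = p ** sum(y) * q ** (N - sum(y))
    mean = sum(Fraction(1, N)
               * prod(1 - (-q / p) ** (x[j] + y[j] - 1 + zvec[j]) for j in range(N))
               for zvec in Z_support)
    return pi_y * mean

cube = list(product((0, 1), repeat=N))
agree = all(formula(x, y) == direct(x).get(y, Fraction(0)) for x in cube for y in cube)
```

The two computations agree exactly on all 64 pairs (x, y).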
Definition 6.1 ( Krawtchouk polynomials).
We define a class of polynomials n Q n ( x ; N, p ) : n, N ∈ N , x ∈ { , , . . . , N } o , using the generating function (6.3) N X n =0 (cid:18) Nn (cid:19) Q n ( x ; N, p ) s n = (1 − ( q/p ) s ) x (1 + s ) N − x . Proposition 6.2.
The family of polynomials { Q_n(x; N, p) : n, N ∈ ℕ, x ∈ {0, 1, . . . , N} } satisfies the following properties.
(1) They are orthogonal in the following sense:
E[ Q_n(X; N, p) Q_m(X; N, p) ] = δ_{m,n} h_n^{−1},
where X is Binomial(N, p), h_n = \binom{N}{n} (p/q)^n, and the Kronecker symbol δ_{m,n} ∈ {0, 1} equals 1 if and only if m = n.
(2) If x ∈ V_N, then the family of polynomials satisfies the symmetric function representation
(6.4) Q_n(‖x‖; N, p) = \binom{N}{n}^{-1} Σ_{A⊆[N], |A|=n} Π_{j∈A} (1 − x[j]/p).
The representation (6.4) is seen by noting that the generating function agrees with (6.3), since
1 + Σ_{n=1}^N s^n Σ_{A⊆[N], |A|=n} Π_{j∈A} (1 − x[j]/p) = Π_{j=1}^N ( 1 + s(1 − x[j]/p) ) = (1 − (q/p)s)^{‖x‖} (1 + s)^{N−‖x‖}.
These polynomials are scaled so that, for all n ∈ [N], Q_n(0; N, p) = 1. The relationship with the 'usual' Krawtchouk polynomials K_n is that, for any x ∈ ℕ,
Q_n(x; N, p) = K_n(x; N, p) [n!(N − n)!/N!] (−p)^{−n}.
See, e.g., NIST Handbook (2010), Section 18.9, or Diaconis and Griffiths (2012) for more details about the Krawtchouk polynomials.
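Both properties can be verified exactly for small N from the generating function (6.3), expanding the coefficient of s^n directly. A sketch with exact rational arithmetic (the function names are ours):

```python
from fractions import Fraction
from math import comb

def Q(n, x, N, p):
    """Q_n(x; N, p): the coefficient of s^n in (1 - (q/p)s)^x (1 + s)^(N - x),
    divided by C(N, n), so that Q_n(0; N, p) = 1."""
    r = -(1 - p) / p
    return sum(comb(x, k) * r ** k * comb(N - x, n - k)
               for k in range(n + 1)) / comb(N, n)

def inner(n, m, N, p):
    """E[Q_n(X) Q_m(X)] for X ~ Binomial(N, p), computed exactly."""
    q = 1 - p
    return sum(comb(N, x) * p ** x * q ** (N - x)
               * Q(n, x, N, p) * Q(m, x, N, p)
               for x in range(N + 1))

N, p = 5, Fraction(2, 3)
h = {n: comb(N, n) * (p / (1 - p)) ** n for n in range(N + 1)}
```

With these definitions, `inner(n, m, N, p)` vanishes for n ≠ m and equals 1/h_n on the diagonal, matching (1) exactly.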
Proposition 6.3. If (Z_t)_{t∈ℕ} are exchangeable, we have ρ_A ≡ ρ_{|A|} = E[ Q_{|A|}(‖Z‖; N, p) ]. The transition probabilities are then
(6.5) π(y, p) { 1 + Σ_{n=1}^N ρ_n (p/q)^n Σ_{A⊆[N], |A|=n} Π_{j∈A} (1 − x[j]/p)(1 − y[j]/p) }.

Proof.
This follows from (6.4) since, under exchangeability, for any A ⊆ [N] with |A| = n,
ρ_A = E[ Π_{j∈A} (1 − Z[j]/p) ] = \binom{N}{n}^{-1} Σ_{A′⊆[N], |A′|=n} E[ Π_{j∈A′} (1 − Z[j]/p) ] = E[ Q_n(‖Z‖; N, p) ]. □

Corollary 6.4.
The transition probabilities (6.5) can be written as
(6.6) π(y, p) { 1 + Σ_{n=1}^N ρ_n h_n R_n( ‖x‖, ‖y‖, ⟨x, y⟩ ) },
where ⟨·,·⟩ denotes the inner product and R_n(·, ·, ·) is the coefficient of \binom{N}{n} s^n in the generating function
(6.7) (1 + s)^{N_{00}} (1 − s q/p)^{N_{01}+N_{10}} (1 + s q²/p²)^{N_{11}},
with N_{lm} being the number of pairs x[j] = l, y[j] = m, j ∈ [N], l, m ∈ {0, 1}. The counts appearing in (6.7) satisfy
N_{00} = N − ‖x‖ − ‖y‖ + ⟨x, y⟩, N_{01} + N_{10} = ‖x‖ + ‖y‖ − 2⟨x, y⟩, N_{11} = ⟨x, y⟩.

Proposition 6.5.
Suppose Z is exchangeable. Fix y, x ∈ V_N. We have
(6.8) P( ‖X_1‖ = ‖y‖ | X_0 = x ) = \binom{N}{‖y‖} p^{‖y‖} q^{N−‖y‖} { 1 + Σ_{n=1}^N ρ_n h_n Q_n(‖x‖; N, p) Q_n(‖y‖; N, p) }.
Proof.
The sum under a uniform random permutation σ in the symmetric group S_N of order N! is
(1/N!) Σ_{σ ∈ S_N} Σ_{A ⊆ [N], |A| = n} Π_{j ∈ A} (1 − x[j]/p)(1 − y[σ(j)]/p)
= ((N − n)! n!/N!) Σ_{A ⊆ [N], |A| = n} Π_{j ∈ A} (1 − y[j]/p) × Σ_{A ⊆ [N], |A| = n} Π_{j ∈ A} (1 − x[j]/p)
= C(N,n) Q_n(‖x‖; N, p) Q_n(‖y‖; N, p).
Summing to find the distribution of ‖y‖ under the permutation distribution of y from (6.5) gives (6.8). □

Proof of Theorem 3.8.
Fix N and fix a coordinate ℓ which minimizes P(Z^{(N)}[ℓ] = 1) = θ_N. Let A = {y ∈ V_N : y[ℓ] = 1}. Starting from 0, the event A requires coordinate ℓ to have been flipped to 1 by time t, which has probability at most 1 − (1 − θ_N)^t. Choose t ≤ a/θ_N, where a is chosen as follows. The quantity (1 − θ_N)^{a/θ_N} is bounded away from 0 as long as θ_N is bounded away from 1. Choose a such that (1 − θ_N)^{a/θ_N} > 1 − p_N/2. We have that
(7.1)  P^t(A | 0) ≤ 1 − (1 − θ_N)^{a/θ_N} ≤ p_N/2.
We have that π(A, p_N) = p_N. Hence,
‖P^t(· | 0) − π(·, p_N)‖_TV ≥ π(A, p_N) − P^t(A | 0) > p_N/2. □

Proof of Theorem 3.12.
Definition 8.1.
Let (H_n(v) : n ∈ ℕ, v ∈ ℝ) be the Hermite polynomials, which are defined through the generating function
(8.1)  Σ_{n=0}^∞ H_n(v) ψ^n/n! = e^{ψ v − ψ²/2}.
Notice that the H_n(v) are orthogonal polynomials with respect to the standard normal distribution, i.e. for all n, m we have
∫_{−∞}^{∞} H_n(v) H_m(v) (1/√(2π)) e^{−v²/2} dv = n! δ_{mn},
where δ_{mn} is the Kronecker delta.

Remark 8.2.
In what follows we use the following notation. For two sequences (a_N)_N and (b_N)_N of real numbers, a_N ∼ b_N if and only if lim_{N→∞} a_N/b_N = 1. Recall that q_N = 1 − p_N and that lim_{N→∞} p_N = p = 1 − q.

Proposition 8.3.
Under the assumptions of Theorem 3.12 we have
lim_{N→∞} h_n^{1/2} E[Q_n(ζ_N; N, p_N)] = (−1)^n (n!)^{−1/2} E[H_n(V)].

Proof.
It is enough to prove convergence in distribution, as we can use the moment condition (3.13) to appeal to the dominated convergence theorem. In turn, in order to prove the convergence in distribution, it is enough to prove that for any sequence z_N such that
(8.2)  lim_{N→∞} (z_N − N p_N)/√(N p_N (1 − p_N)) = v
for some number v, we have that
(8.3)  lim_{N→∞} h_n^{1/2} Q_n(z_N; N, p_N) = (−1)^n (n!)^{−1/2} H_n(v).
We prove the convergence in (8.3) using generating functions. Note that for fixed n, as N → ∞ with p_N → p, we have
(n!)^{1/2} h_n^{1/2} = ( n! C(N,n) (p_N/q_N)^n )^{1/2} ∼ ( N(p/q) )^{n/2}.
Hence, we get the following estimate, which holds for all z ∈ {0, ..., N} and p ∈ [1/2, 1):
(8.4)  Σ_{n=0}^N (n!)^{1/2} h_n^{1/2} Q_n(z; N, p_N) s^n/n! ∼ Σ_{n=0}^N C(N,n) Q_n(z; N, p_N) ( √(p/(Nq)) s )^n = ( 1 − (q/p)√(p/(Nq)) s )^z ( 1 + √(p/(Nq)) s )^{N − z}.
Taking the logarithm of both sides of (8.4) and setting
a = (q/p)√(p/q) = √(q/p),  b = √(p/q),
we have
ln Σ_{n=0}^N (n!)^{1/2} h_n^{1/2} Q_n(z; N, p) s^n/n! ∼ z log(1 − a s/√N) + (N − z) log(1 + b s/√N)
= −z ( a s/√N + a² s²/(2N) ) + (N − z) ( b s/√N − b² s²/(2N) ) + O(N^{−1/2})
= −√N p (a + b) s + √N b s − v √(pq) (a + b) s − (1/2)(a² − b²) p s² − (1/2) b² s² − (1/2)(a² − b²) √(pq) v s² N^{−1/2} + O(N^{−1/2}).  (8.5)
We have the following simplifications in (8.5):
−p(a + b) + b = −p a + q b = −√(pq) + √(pq) = 0,
−√(pq)(a + b) = −q − p = −1,
−(1/2)(a² − b²)p − (1/2)b² = −(1/2)(q/p − p/q)p − (1/2)(p/q) = −1/2,
so (8.5) is equal to −v s − s²/2 + v O(N^{−1/2}). That is, the generating function (8.4) is equal to
exp{ −v s − s²/2 + v O(N^{−1/2}) },
which converges to the generating function of ((−1)^n H_n(v))_n. □

Using a Cesàro sum argument, we immediately get from Proposition 8.3 the following result.
Corollary 8.4.
For all p ∈ (0, 1) and t > 1,
(8.6)  lim_{N→∞} [ Σ_{n=1}^N C(N,n) ( p_N/(1 − p_N) )^n ρ_n^{2t} ] / [ Σ_{n=1}^N N^{−n(t−1)} ( (1 − p_N)/p_N )^{n(t−1)} (1/n!) (E[H_n(V)])^{2t} ] = 1.

Proof of Theorem 3.12.
It is well-known (e.g. see Lemma 12.16 in Levin et al. (2009)) that for a reversible Markov chain
(8.7)  χ(x, t) = Σ_{j ≥ 2} λ_j^{2t} f_j(x)²,
where λ_1 = 1, {λ_j}_{j ≥ 2} are the remaining eigenvalues, and the f_j are orthonormal eigenvectors with respect to the stationary distribution.
In our context,
(8.8)  χ(x, t) = Σ_{A ⊆ [N]: A ≠ ∅} ρ_A^{2t} Π_{i ∈ A} ( (p_N − x[i])/√(p_N q_N) )² = Σ_{A ⊆ [N]: A ≠ ∅} ρ_A^{2t} (p_N/q_N)^{|A|} Π_{i ∈ A} (1 − x[i]/p_N)²
≤ Σ_{n=1}^N C(N,n) (p_N/q_N)^n ρ_n^{2t}.
Notice that the bound is sharp, as it is achieved for the initial condition x = 0. Hence,
max_x χ(x, t) = χ(0, t) = Σ_{n=1}^N C(N,n) (p_N/q_N)^n ρ_n^{2t}.
In virtue of Corollary 8.4, in order to have an estimate of the χ distance, i.e. the numerator in the right-hand side of (8.6), we simply need an estimate of the denominator, i.e.
(8.9)  Σ_{n=1}^N N^{−n(t−1)} (q_N/p_N)^{n(t−1)} (1/n!) (E[H_n(V)])^{2t}.
To this end, we use the following well-known formula (see, e.g., NIST Handbook (2010), 18.10.10, p. 448), which holds for any v ∈ ℝ:
H_n(v) = (2^{n/2+1}/√π) e^{v²/2} ∫_0^∞ e^{−τ²} τ^n cos(√2 vτ − nπ/2) dτ.
Therefore
(8.10)  |H_n(v)| ≤ (2^{n/2+1}/√π) e^{v²/2} ∫_0^∞ e^{−τ²} τ^n dτ = (2^{n/2}/√π) e^{v²/2} Γ((n+1)/2) = { e^{v²/2} (2m)!/(2^m m!)  if n = 2m;  (2^{m+1}/√(2π)) e^{v²/2} m!  if n = 2m+1 }.
In particular, if n is even, |H_n(v)| ≤ e^{v²/2} |H_n(0)|. We use these estimates to bound the sum of the even terms in (8.9) as follows:
(8.11)  Σ_{m=1}^{⌊N/2⌋} N^{−2m(t−1)} (q_N/p_N)^{2m(t−1)} (1/(2m)!) (E[H_{2m}(V)])^{2t} ≤ E[e^{V²/2}]^{2t} Σ_{m=1}^{⌊N/2⌋} N^{−2m(t−1)} (q_N/p_N)^{2m(t−1)} (1/(2m)!) ( (2m)!/(2^m m!) )^{2t}.
Denote by b_m the terms in the sum on the right-hand side of (8.11). Then
b_{m+1}/b_m = (q_N/p_N)^{2(t−1)} ( (2m+1)^{2t−1}/(2m+2) ) N^{−2(t−1)} < 1  for 1 ≤ m + 1 ≤ ⌊N/2⌋,
so that max_{m ≤ ⌊N/2⌋} b_m = b_1, i.e. the first term in the sum. Therefore
Σ_{m=1}^{⌊N/2⌋} N^{−2m(t−1)} (q_N/p_N)^{2m(t−1)} (1/(2m)!) (E[H_{2m}(V)])^{2t} ≤ (N/4) E[e^{V²/2}]^{2t} N^{−2(t−1)} (q_N/p_N)^{2(t−1)},
which tends to zero as N → ∞ if t > 3/2. Notice that H_{2m+1}(0) = 0. Hence, in the case P(V = 0) = 1 we do not have to estimate the odd terms, and the chain mixes almost-perfectly in 2 steps. When P(V = 0) < 1, we bound the sum of the odd terms:
Σ_{m=0}^{⌊N/2⌋} N^{−(2m+1)(t−1)} (q_N/p_N)^{(2m+1)(t−1)} (1/(2m+1)!) (E[H_{2m+1}(V)])^{2t} ≤ (2/√(2π))^{2t} E[e^{V²/2}]^{2t} Σ_{m=0}^{⌊N/2⌋} N^{−(2m+1)(t−1)} (q_N/p_N)^{(2m+1)(t−1)} (2^m m!)^{2t}/(2m+1)!.
Writing the terms in the latest sum as c_m, we have
c_{m+1}/c_m = (q_N/p_N)^{2(t−1)} ( (2m+2)^{2t−1}/(2m+3) ) N^{−2(t−1)}.
If t is of constant order this ratio is less than one for N sufficiently large: there exists an m_0 of constant order such that 2^{2t−1}/(2m+3) < 1 for m ≥ m_0, and an N_0 can then be chosen such that for N ≥ N_0 the ratios for m < m_0 are also less than 1. The first term c_0 is then again maximal for N large enough with t of constant order, and the sum of the odd terms is less than
(N/2) (2/√(2π))^{2t} E[e^{V²/2}]^{2t} N^{−(t−1)} (q_N/p_N)^{t−1},
which tends to zero if t > 2. Taking into account both even and odd terms in (8.9), the mixing time is t = 3 if P(V = 0) < 1. □

Proof of Theorems 3.13
Lower bound for t_mix. The following theorem is due to David Wilson (see, e.g., Theorem 13.5, p. 172 in Levin et al. (2009)).
Theorem 9.1. [Wilson bound]
Let X be an irreducible aperiodic Markov chain with state space Ω. Let Φ be an eigenfunction with eigenvalue λ satisfying 1/2 < λ < 1. Fix 0 < ε < 1 and let R > 0 satisfy E_x[ |Φ(X_1) − Φ(x)|² ] ≤ R for all x ∈ Ω. Then for any x ∈ Ω
(9.1)  t_mix(ε) ≥ (1/(2 log(1/λ))) [ log( (1 − λ)Φ(x)²/(2R) ) + log( (1 − ε)/ε ) ].

Next, consider a sequence (X^{(N)})_{N ∈ ℕ} ∈ C. We apply Wilson's bound to each element of the sequence, with the choice of first eigenvalue and eigenfunction pair. Then
Φ_N(x) = ‖x‖ − N p_N,  λ_N = 1 − z_N/(N p_N).
From (13.1) in the Appendix,
E_x[ X_1(X_1 − 1) + X_1 ] = N(N−1)p_N² ( ρ_2 Q_2(‖x‖; N, p_N) − 2 ρ_1 Q_1(‖x‖; N, p_N) + 1 ) + N p_N ( 1 − ρ_1 Q_1(‖x‖; N, p_N) ).
In particular, using ρ_n ∼ (1 − z_N/(N p_N))^n and Q_n(0; N, p_N) = 1,
(9.2)  E_0[X_1²] ∼ N(N−1)p_N² ( (1 − z_N/(N p_N))² − 2(1 − z_N/(N p_N)) + 1 ) + N p_N ( 1 − (1 − z_N/(N p_N)) ) = (1 − 1/N) z_N² + z_N.
Moreover 1/2 < λ_N < 1, since z_N/N < p_N/2. We fix x = (0, 0, ..., 0) ∈ V_N. Notice that we also cover the case lim_{N→∞} z_N/N = 0. There exists a constant c > 0 such that
(9.3)  log(1/λ_N) ≤ c z_N/(N p_N).
Using (9.3) in Wilson's bound (9.1), we get
t_mix(ε) ≥ (N p_N/(2 c z_N)) [ log( (z_N/(N p_N)) (N p_N)²/(2R) ) + log( (1 − ε)/ε ) ]
≥ (N p_N/(2 c z_N)) [ log( N p_N z_N/(2R) ) + log( (1 − ε)/ε ) ]
≥ (N p_N/(2 c z_N)) [ log N + log( p_N z_N/(2R) ) + log( (1 − ε)/ε ) ].  (9.4)
The dominant term in (9.4) gives
(9.5)  t_mix(ε) ≥ c′ (N p_N/z_N) log N + O(N)
for some constant c′ > 0. Notice that the bound in equation (9.3) is required just for all large N. Hence, in the case lim_{N→∞} z_N/N = 0, we can choose any c ∈ (0, 1), and for any ε > 0 we have
(9.6)  t_mix(ε) ≥ (1 − ε) (N p_N/(2 z_N)) log N + O(N).

Comparison between t_mix and t^{(2)}_mix. The following proposition is well-known in the literature. We add a proof here for the sake of clarity and completeness.
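The total-variation/χ comparison of Proposition 9.2 below is a one-line Cauchy-Schwarz bound, and can be sanity-checked on any small reversible chain. The toy kernel here (our illustration, not the model of this paper) resamples each coordinate of {0,1}^N independently with probability θ, so the stationary law is the product Bernoulli(p)^N; the parameters N = 3, p = 0.3, θ = 0.6 are arbitrary.

```python
import itertools, math

# Toy reversible kernel on {0,1}^N: each coordinate is independently
# resampled to Bernoulli(p) with probability theta, otherwise kept.
N, p, theta = 3, 0.3, 0.6
states = list(itertools.product((0, 1), repeat=N))

def step_prob(x, y):
    prob = 1.0
    for xi, yi in zip(x, y):
        keep = (1 - theta) if yi == xi else 0.0
        resample = theta * (p if yi == 1 else 1 - p)
        prob *= keep + resample
    return prob

pi = {y: p ** sum(y) * (1 - p) ** (N - sum(y)) for y in states}  # stationary law

dist = {y: 1.0 if y == (0,) * N else 0.0 for y in states}  # start at x = 0
for t in range(1, 5):
    dist = {y: sum(dist[x] * step_prob(x, y) for x in states) for y in states}
    tv = 0.5 * sum(abs(dist[y] - pi[y]) for y in states)
    chi2 = sum((dist[y] - pi[y]) ** 2 / pi[y] for y in states)
    # Proposition 9.2: total variation <= (1/2) sqrt(chi-square distance)
    assert tv <= 0.5 * math.sqrt(chi2) + 1e-12, (t, tv, chi2)
```

Since the bound is just Cauchy-Schwarz applied to Σ_y |P^t(y|x) − π(y)|, it holds for every chain and every t; the assertion is a consistency check of the computation rather than of the inequality itself.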
Proposition 9.2.
We have the following useful relation between the total variation and the χ distances:
‖P^t(x, ·) − π_N(·)‖_TV ≤ (1/2) √(χ(x, t)),  for all x ∈ Ω.
Corollary 9.3. t mix ( ε ) ≤ t (2) mix ( √ ε/ . Digression on which χ -distance to use. Recall that(9.7) ρ n = E h Q n ( k Z k ; N, p ) i then for sequences;(9.8) χ t ( x ) = X n ≥ h n (cid:18) Nn (cid:19) − X A ⊆ [ N ] , | A | = n E h Y j ∈ A (cid:16) − Z [ j ] p (cid:17)i! t Y j ∈ A (cid:16) − x [ j ] p (cid:17) . From now on, in this section, assume that Z is exchangeable. The χ t distance (9.8) when Z is exchangeable simplifies to(9.9) χ t ( x ) = X n ≥ h n ρ tn (cid:18) Nn (cid:19) − X A ⊆ [ N ] , | A | = n Y j ∈ A (cid:16) − x [ j ] p (cid:17) . On the other hand, recall that P t ( · | x ) the p.m.f. of k X t k conditional to X = x . Let Q N be a Binomial with parameters N and p . Define χ for the Hamming distance as STEPS MIXING FOR GENERAL RANDOM WALKS ON HYPERCUBE 21 χ H ( x , t ) = χ ( P t ( · | x ) | Q N ). We have the following representation(9.10) χ H ( x , t ) = X n ≥ h n ρ tn Q n ( k x k ; N, p ) . In general, we have that χ t ( x ) ≥ χ H ( x , t )which accords with intuition. This is because (cid:18) Nn (cid:19) − X A ⊂ [ N ] , | A | = n Y j ∈ A (cid:16) − x [ j ] p (cid:17) ≥ (cid:18) Nn (cid:19) − X A ⊂ [ N ] , | A | = n Y j ∈ A (cid:16) − x [ j ] p (cid:17)! = Q n ( k x k , N, p ) On the other hand, we already proved that the supremum over x of the two distinct χ ’sdistances (9.9) and (9.10) coincide, and occurs when x = . In other words, sup x χ H ( t, x ) = χ H ( t, ) and(9.11) sup x ∈V N χ ( t, x ) = χ ( t, ) = χ H ( t, ) = X n ≥ h n ρ tn . The reasoning above implies that if we look at the worse-case scenario, in terms of initialconfigurations, χ and χ H behave in the same way. On the other hand, it is possible tochoose initial conditions that make χ H much smaller than χ , resulting in a faster mixingfor the Hamming distance. This gives somehow the intuition behind Theorem 3.15. Upper bound for t (2) mix . We assume that k x k 6 = N p N . 
In virtue of our reasoning in the previous section, we can use the χ distance for the Hamming distance, since when we take the supremum over x it coincides with χ_t. Recall that
χ_H(x, t) = Σ_{n=1}^N ρ_n^{2t} h_n Q_n(‖x‖; N, p_N)².

Proposition 9.4.
Under the assumptions of Theorem 3.13, we have that for any x ∈ [0, N] ∩ ℕ,
(9.12)  Q_n(x; N, p_N) ∼ ( 1 − x/(N p_N) )^n.

Proof.
Replace s by s/N in the generating function (6.3) and take the logarithm of both sides to get
(9.13)  log( Σ_{n=0}^N C(N,n) Q_n(x; N, p_N) s^n/N^n ) = x log( 1 − q_N s/(p_N N) ) + (N − x) log( 1 + s/N ).
Let ζ_N = x/N and α_N = q_N/p_N. The right-hand side of (9.13) becomes
N ζ_N log(1 − s α_N/N) + N(1 − ζ_N) log(1 + s/N)
= ζ_N ( −α_N s − α_N² s²/(2N) ) + (1 − ζ_N) ( s − s²/(2N) ) + O(N^{−2})
= s − s²/(2N) − ζ_N ( s/p_N − s²(p_N − q_N)/(2N p_N²) ) + O(N^{−2}).  (9.14)
If ζ_N ≠ p_N then (9.14) is equal to s(1 − ζ_N/p_N) + O(N^{−1}); however, if ζ_N = p_N then (9.14) is equal to −(q_N/p_N) s²/(2N) + O(N^{−2}).
Therefore, for ζ_N → ζ ≠ p, where recall that p = lim_{N→∞} p_N, the asymptotic values are
Q_n(N ζ_N; N, p_N) ∼ ( 1 − ζ_N/p )^n.
If ζ_N = p then Q_{2n+1}(N p; N, p) vanishes to leading order and
Q_{2n}(N p; N, p) ∼ (−q/p)^n ( (2n)!/n! ) (2N)^{−n}.
If p = q then, from the original generating function (1 − s)^{N/2}(1 + s)^{N/2} = (1 − s²)^{N/2},
Q_{2n}(N/2; N, 1/2) = (−1)^n C(N/2, n)/C(N, 2n),
which agrees with the case above with p → 1/2, N → ∞. □

Combining (9.7) with (9.12), we have that
(9.15)  ρ_n ∼ ( 1 − z_N/(N p) )^n.
Hence, using (9.10), we have
χ_H(x, t) ∼ Σ_{n=1}^N ( 1 − z_N/(N p) )^{2nt} C(N,n) (p/q)^n ( 1 − ‖x‖/(N p) )^{2n}
≤ Σ_{n=1}^N (N^n/n!) ( 1 − z_N/(N p) )^{2nt} (p/q)^n ( 1 − ‖x‖/(N p) )^{2n}
≤ exp{ N ( 1 − z_N/(N p) )^{2t} ( 1 − ‖x‖/(N p) )² (p/q) } − 1
≤ exp{ N e^{−2t z_N/(N p)} ( 1 − ‖x‖/(N p) )² (p/q) } − 1
≤ exp{ N e^{−2t z_N/(N p)} (p/q) } − 1.  (9.16)
Choose
(9.17)  t_N = ( N p/(2 z_N) ) (log N + C);
then the upper bound in (9.16) is equal to
(9.18)  exp{ e^{−C} (p/q) } − 1.
For a lower bound take the first term in the χ_t(0) expression.
(9.19)  χ_{t_N} ≥ χ_{t_N}(0) = χ_H(0, t_N) ≥ ( 1 − z_N/(N p) )^{2 t_N} N (p/q) → e^{−C} (p/q).
This shows that (9.17) is a cutoff time, because if C is large and positive both the upper and lower bounds are small, and if C is large and negative both bounds are large. Calculations here are related to chi-squared cutoff calculations for a multinomial model in Diaconis and Griffiths (2019), Section 4.1.

10. Further discussion and a few examples
Example 10.1. Contingency Table.
In a classical paper, Aitken and Gonin (1935) derive the joint marginal distribution of 1's in rows and columns of a 2×2 contingency table with categories 0 and 1. Let P be the 2×2 bivariate probability matrix describing probabilities in the contingency table. Fix the margins in P to be Bernoulli(p). Take N independent observations in the table to form a table n = (n_{ij}; i, j ∈ {0, 1}). Let X = n_{10} + n_{11} and Y = n_{01} + n_{11}. Then Y | X has a p.m.f. of the form (6.8) (with notation X = ‖x‖, Y = ‖y‖) where ρ_n = ρ^n, such that ρ = (p_{11} − p²)/(pq) is the correlation coefficient in P. The conditional distribution fits into Example 1.3, where each coordinate is updated independently with probability α_{N,t} = p_{01}/q. The χ cutoff time when ρ ≠ 0 is
t_N = ( log N + log(p/q) + C )/( 2 log(1/|ρ|) ).
The proof is sketched as follows. The spectral representation for χ_{t_N} when starting from 0 is
(10.1)  Σ_{n=1}^N h_n ρ^{2n t_N} = Σ_{n=1}^N C(N,n) (p/q)^n ρ^{2n t_N} = ( 1 + (p/q) ρ^{2 t_N} )^N − 1 ∼ ( 1 + e^{log(p/q) + 2 t_N log|ρ|} )^N − 1 → e^{e^{−C}} − 1,
showing that t_N is a cutoff time by taking C > 0 or C < 0 with |C| large.

Example 10.2.
This example illustrates the difference between the chi-squared cutoff for the Hamming distance and the chi-squared cutoff for the sequences, depending on the initial x. Consider a model where ‖Z‖ = N and ‖x‖ = N p. The factor in χ_t(x) of
C(N,n)^{−1} Σ_{A ⊆ [N], |A| = n} Π_{i ∈ A} (1 − x[i]/p)²
is the coefficient of C(N,n) s^n in the generating function
( 1 + (q/p)² s )^{N p} ( 1 + s )^{N q}.
Replacing s by s/N,
( 1 + (q/p)² s/N )^{N p} ( 1 + s/N )^{N q} → exp{ s q/p }.
Then it follows that
χ_t(x) ∼ Σ_{n=1}^N ρ_n^{2t} h_n (q/p)^n.
If ‖Z‖ = N then ρ_n = (−q/p)^n and
χ_t(x) ∼ Σ_{n=1}^N (q/p)^{2nt} C(N,n) (p/q)^n (q/p)^n = ( 1 + (q/p)^{2t} )^N − 1.
A calculation then shows, with
(10.2)  t_N = ( log N + C )/( 2 log(p/q) ),
that e^{−C} < χ_t(x) < exp{e^{−C}} − 1, so the cutoff time calculated from χ_t(x) is given by (10.2), compared to the Hamming distance, which has a finite mixing time when w = p = 1/2.

11. Proof of Theorem 3.15
Recall from the proof of Proposition 9.4 that Q_{2n+1}(N p; N, p) vanishes to leading order and Q_{2n}(N p; N, p) ∼ (−q/p)^n ( (2n)!/n! ) (2N)^{−n}. Then
(11.1)  χ_H(N p, t) = Σ_{n=1}^{⌊N/2⌋} Q_{2n}(z; N, p)^{2t} C(N, 2n) (p/q)^{2n} ( (q/p)^n ( (2n)!/n! ) (2N)^{−n} )² (1 + o(1)).
If z/N = w and w ≠ p, using Proposition 9.4 we have
(11.2)  χ_H(N p, t) ∼ Σ_{n=1}^{⌊N/2⌋} ( 1 − w/p )^{4nt} C(N, 2n) ( (2n)!/n! )² (2N)^{−2n}.
Let b_n denote the nth term in the sum (11.2). The ratio of consecutive terms is
b_{n+1}/b_n = ( 1 − w/p )^{4t} (N − 2n)(N − 2n − 1)(2n + 1)/( (2n + 2) N² ) ≤ ( 1 − w/p )^{4t} (2n + 1)/(2n + 2) < 1/2,
for t > t∘, where t∘ is a finite time not depending on N. The first term is therefore maximal for t > t∘, and since b_1 = ( 1 − w/p )^{4t} (N − 1)/(2N),
(11.3)  χ_H(N p, t) < 2 b_1 ≤ ( 1 − w/p )^{4t}.
Choose t_ε > t∘ such that the right-hand side of (11.3) is less than ε. □

12. Remarks and Conclusion
Remark 12.1.
Let Q be the set of probability measures P such that there exists X ∈ G satisfying
P(·) = P(X_1 ∈ · | X_0 = x), for some x ∈ V_N.
The set Q is convex. The extreme points of Q are the processes in G where each of the (Z_t)_{t ∈ ℕ} takes a single value with probability 1. Let M be the set of 2×2 probability transition matrices with stationary distribution (q, p). M is a convex set with extreme points I, the identity matrix, and the transition matrix for acceptance/rejection with both rows equal to (q, p). It is straightforward to show that a model where changes occur at coordinates according to P_1, ..., P_N, once picked to change with probability Z_t, can be expressed in terms of our model with a modified distribution for Z_t, by using extreme point representations for the transition matrices.

Remark 12.2.
The sufficiency that ρ_A = E[ Π_{i ∈ A} (1 − Z[i]/p) ] could also be proved by using an important hypergroup property, which can be described as follows. There exists a Bernoulli random variable ξ such that for u, v ∈ {0, 1},
(1 − u/p)(1 − v/p) = E_{uv}[1 − ξ/p].
In fact, P(ξ = 1) = δ_{u+v,1} + (1 − q/p) δ_{u+v,2}. A general reference is Bakry and Huet (2008), and a reference more in the context of this paper is Diaconis and Griffiths (2012).

13. Appendix
Proposition 13.1.
For any x ∈ [N] ∪ {0} we have that
(13.1)  x(x − 1) = 2q² h_2 Q_2(x; N, p) − 2pq(N − 1) h_1 Q_1(x; N, p) + N(N − 1)p²
= N(N − 1)p² Q_2(x; N, p) − 2N(N − 1)p² Q_1(x; N, p) + N(N − 1)p².

Proof.
Consider, with expectation in the Binomial(N, p) distribution,
(13.2)  E[ X(X − 1) (1 − (q/p)s)^X (1 + s)^{N − X} ] = N(N − 1)p² (1 − (q/p)s)² ( p(1 − (q/p)s) + q(1 + s) )^{N − 2} = N(N − 1)p² (1 − (q/p)s)².
Looking at coefficients of s and s²,
E[ X(X − 1) Q_1(X; N, p) ] = C(N,1)^{−1} N(N − 1)p² × (−2q/p) = −2pq(N − 1),
E[ X(X − 1) Q_2(X; N, p) ] = C(N,2)^{−1} N(N − 1)p² × (q/p)² = 2q²,
E[ X(X − 1) ] = N(N − 1)p².
Since x(x − 1) is a polynomial of degree two, its expansion in the orthogonal family {Q_n} is x(x − 1) = Σ_{n=0}^2 h_n E[X(X − 1) Q_n(X; N, p)] Q_n(x; N, p), which proves our result. □

Acknowledgements
We thank Persi Diaconis and Tim Garoni for helpful comments that really improved our results.
References

Aitken, A. C. and Gonin, H. T. (1935). On fourfold sampling with and without replacement. Proc. Roy. Soc. Edinb.
Bakry, D. and Huet, N. (2008). The hypergroup property and representation of Markov kernels. Séminaire de Probabilités XLI, Lecture Notes in Mathematics, Vol. 1934, 295-347. Berlin: Springer-Verlag.
Diaconis, P., Graham, R. L. and Morrison, J. A. (1990). Asymptotic analysis of a random walk on a hypercube with many dimensions. Random Structures and Algorithms.
Diaconis, P. and Griffiths, R. C. (2012). Exchangeable pairs of Bernoulli random variables, Krawtchouk polynomials, and Ehrenfest urns. Aust. N. Z. J. Stat.
Diaconis, P. and Griffiths, R. C. (2019). Reproducing kernel polynomials on the multinomial distribution. J. Approx. Theory.
Diaconis, P. and Shahshahani, M. (1986). On square roots of the uniform distribution on compact groups. Proc. Amer. Math. Soc., no. 2, 341-348.
Flegg, M. B., Pollett, P. K. and Gramotnev, D. K. (2008). Ehrenfest model for condensation and evaporation processes in degrading aggregates with multiple bonds. Physical Review E - Statistical, Nonlinear, and Soft Matter Physics.
Kakutani, S. (1948). On equivalence of infinite product measures. Annals of Mathematics.
Karlin, S., Lindqvist, B. and Yao, Y.-C. Markov chains on hypercubes: spectral representations and several majorization relations. Random Structures and Algorithms.
Lancaster, H. O. (1969). The Chi-squared Distribution. John Wiley & Sons.
Letac, G. and Takács, L. (1979). Random walks on an m-dimensional cube. Journal für die Reine und Angewandte Mathematik.
Levin, D. A., Peres, Y. and Wilmer, E. L. (2009). Markov Chains and Mixing Times. American Mathematical Society, Providence, RI.
Nestoridi, E. (2017). A non-local random walk on the hypercube. Adv. Appl. Prob.
Propp, J. and Wilson, D. (1996). Exact sampling with coupled Markov chains and applications to statistical mechanics. Random Structures and Algorithms.
Scoppola, B. (2011). Exact solution for a class of random walks on the hypercube. J. Stat. Phys.
Andrea Collevecchio, School of Mathematical Sciences, Monash University, Melbourne
E-mail address : [email protected]
Robert Griffiths, School of Mathematical Sciences, Monash University, Melbourne
E-mail address: