Cutoff Phenomenon for Random Walks on Kneser Graphs
Ali Pourmiri ∗ Thomas Sauerwald † August 25, 2018
Abstract
The cutoff phenomenon for an ergodic Markov chain describes a sharp transition in the convergence to its stationary distribution, over a negligible period of time known as the cutoff window. We study the cutoff phenomenon for simple random walks on Kneser graphs, which form a family of ergodic Markov chains. Given two integers n and k, the Kneser graph K(2n + k, n) is defined as the graph whose vertices are all subsets of {1, ..., 2n + k} of size n, with two vertices A and B connected by an edge if A ∩ B = ∅. We show that for any k = O(n), the random walk on K(2n + k, n) exhibits a cutoff at (1/2) log_{1+k/n}(2n + k) with a window of size O(n/k).

Keywords:
Markov chain, random walk, cutoff phenomenon, Kneser graph

* Max Planck Institute for Informatics, Saarbrücken, Germany, email: [email protected]
† University of Cambridge, United Kingdom, email: [email protected]

Introduction
A simple random walk on a finite, non-bipartite graph is a discrete-time ergodic Markov chain: in each time step the walk, located at some vertex, chooses one of its neighbors uniformly at random and moves to that neighbor. The cutoff phenomenon for a sequence of chains describes a sharp transition in the convergence of the chain's distribution to its stationary distribution, over a negligible period of time known as the cutoff window. For applications such as MCMC a cutoff is desirable, as running the chain any longer than the mixing time becomes essentially redundant. From a theoretical perspective, establishing a cutoff is often surprisingly challenging, even for simple chains, as it requires very tight bounds on the distribution near the mixing time.

Let P be the transition matrix of an ergodic (i.e., aperiodic and irreducible), discrete-time Markov chain (X_0, X_1, ...) on a finite state space Ω with stationary distribution π. Let P^t(x, ·) be the probability distribution of the chain at time t ∈ ℕ with starting state x ∈ Ω. The total variation distance between two probability distributions µ and ν on Ω is defined by

  ‖µ − ν‖_TV = max_{A ⊆ Ω} |µ(A) − ν(A)| ∈ [0, 1].

Therefore, we can define the worst-case total variation distance to stationarity at time t as

  d(t) = max_{x ∈ Ω} ‖P^t(x, ·) − π‖_TV.

For convenience, we define d(t) for non-integer t as d(t) := d(⌊t⌋). (If the reference is clear from the context, we will also just say total variation distance at time t.) The mixing time is defined by

  t_mix(ε) = min{ t ∈ ℕ : d(t) < ε }.

Suppose now that we have a sequence of ergodic finite Markov chains indexed by n = 1, 2, .... Let d_n(t) be the total variation distance of the n-th chain at time t and t_mix^(n)(ε) be its mixing time. Formally, we say that the sequence of chains exhibits a cutoff (in total variation distance), as defined in [13, Section 18.1], if for any fixed 0 < ε <
1,

  lim_{n→∞} t_mix^(n)(ε) / t_mix^(n)(1 − ε) = 1.

Equivalently, a sequence of Markov chains has a cutoff at time t_n with a window of size w_n = o(t_n) if

  lim_{λ→∞} lim inf_{n→∞} d_n(t_n − λ w_n) = 1,   lim_{λ→∞} lim sup_{n→∞} d_n(t_n + λ w_n) = 0.   (1)

Although it is widely believed that many natural families of Markov chains exhibit a cutoff, there are relatively few examples where a cutoff has actually been shown; it turns out to be quite challenging to prove or disprove the existence of a cutoff even for simple families of chains. The first results exhibiting a cutoff appeared in the studies of card-shuffling processes by Aldous and Diaconis [1], and Diaconis and Shahshahani [6]. Later, the cutoff phenomenon was also shown for random walks on hypercubes [7], for random walks on distance-regular graphs including Johnson and Hamming graphs [2, 8], and for randomized riffle shuffles [4]. For a more general view of Markov chains with and without cutoff we refer the reader to [9] or [13, Chapter 18]. A necessary condition for a family of chains to exhibit a cutoff, known as the product condition, is that t_mix^(n)(1/4) · gap_n tends to infinity as n goes to infinity, where gap_n is the spectral gap of the transition matrix of the n-th chain (see [13, Proposition 18.3]). However, there are chains for which the product condition holds and which do not show any cutoff (see, e.g., [13, Section 18]). Peres [16] conjectured that many natural families of chains satisfying the product condition exhibit cutoffs. For instance, he conjectured that random walks on any family of n-vertex (transitive) expander graphs with gap_n = Θ(1) and mixing time O(log n) exhibit cutoffs. Chen and Saloff-Coste [3] verified the conjecture for other distances such as the ℓ^p-norm for p >
1. Recently, Lubetzky and Sly [14] exhibited cutoff phenomena for random walks on random regular graphs. They also showed that there exist families of explicit expanders with and without cutoff [15]. Diaconis [9] pointed out that if the second largest eigenvalue of the transition matrix of a chain has high multiplicity, then the chain is more likely to show a cutoff.

In this work, we focus on simple random walks on Kneser graphs. The Kneser graph is defined as follows. For any two positive integers n and k, the Kneser graph K(2n + k, n) is the graph with all n-element subsets of [2n + k] = {1, 2, ..., 2n + k} as vertices, and with two vertices adjacent if and only if their corresponding n-element subsets are disjoint. We emphasize that throughout this paper, k and n are arbitrary integers; in particular, k can be a function of n. In the case k = ω(n), the number of vertices, C(2n + k, n), and the degree of each vertex, C(n + k, n), have the same magnitude, so the simple random walk on K(2n + k, n) mixes in just one step. For the special case k = 1, we obtain the so-called odd graph K(2n + 1, n), which contains odd cycles of length 2n + 1 and is an induced subgraph of K(2n + k, n). This shows that K(2n + k, n) is not bipartite for every k ≥
1. The permutation group on [2n + k] induces a subgroup of the automorphism group of K(2n + k, n), and thus the Kneser graph is always transitive. Combining these two observations, we conclude that the simple random walk on K(2n + k, n) is an ergodic and transitive Markov chain. Kneser graphs have been studied frequently in (algebraic) graph theory, in particular due to their connections to chromatic numbers and graph homomorphisms (see [12] for more details and references).

Godsil [11] shows that for most values of n and k, the graph K(2n + k, n) is not a Cayley graph. It is also well known that the transition matrix of the simple random walk on the Kneser graph K(2n + k, n) has spectral gap k/(n + k), and that its second largest eigenvalue in absolute value has multiplicity 2n + k − 1 (cf. Corollary 4). So by varying k = O(n), we obtain various families of chains with different spectral gaps; for instance, setting k = Θ(n) yields a family of transitive expander graphs. In order to show a cutoff for a simple random walk on Kneser graphs it is necessary to have a sufficiently tight estimate of its mixing time. Let P be the transition matrix of the simple random walk on the Kneser graph K(2n + k, n) with spectrum λ_i, 0 ≤ i ≤ C(2n + k, n) − 1, where λ_0 = 1. Then it is shown in [13, Lemma 12.16] that

  d(t) = max_{x ∈ Ω} ‖P^t(x, ·) − π‖_TV ≤ (1/2) sqrt( Σ_{i=1}^{|Ω|−1} λ_i^{2t} ),   (2)

where Ω is the vertex set of the graph. It may be surprising that the upper bound obtained from the spectral properties of the transition matrix is sufficiently tight and matches the lower bound, which enables us to show the existence of a cutoff. Besides Kneser graphs, the bound in (2) has been successfully applied in computing the mixing time of random walks on Cayley graphs (see [5, 10]). This may suggest the following question:

Question.
For which families of transitive ergodic chains is the upper bound in (2) tight up to low-order terms?

Result
In the following we state the main result of the paper.
Theorem 1.
The simple random walk on K(2n + k, n) exhibits a cutoff at (1/2) log_{1+k/n}(2n + k) with a cutoff window of size O(n/k) for k = O(n).

We now give the proof of Theorem 1 using Propositions 5 and 8, whose statements and proofs are deferred to later sections.
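Before turning to the proof, the predicted transition can be observed numerically (an illustration of ours, not part of the argument). The intersection size |X_t ∩ [n]| evolves as a Markov chain on {0, ..., n}, since its transition law depends only on the current value (this is the projection used in the lower-bound argument later), and the total variation distance of this projected chain from its stationary law is a lower bound on d(t). A minimal sketch in exact rational arithmetic, with function names of our choosing:

```python
from fractions import Fraction
from math import comb, log

def projected_kernel(n, k):
    """Transition matrix of s_t = |X_t ∩ [n]| for the walk on K(2n+k, n):
    from state s the next intersection size is n - Y with Y ~ H(n+k, s+k, n)."""
    d = comb(n + k, n)  # vertex degree = number of admissible next sets
    return [[Fraction(comb(s + k, n - r) * comb(n - s, r), d)
             for r in range(n + 1)] for s in range(n + 1)]

def projected_tv_curve(n, k, t_max):
    """Exact TV distance between the projected chain started at s = 0
    (i.e. X_0 = {n+1, ..., 2n}) and the projected stationary law; this is
    a lower bound on the worst-case distance d(t) of the full walk."""
    P = projected_kernel(n, k)
    # projection of the uniform stationary law: hypergeometric H(2n+k, n, n)
    pi = [Fraction(comb(n, s) * comb(n + k, n - s), comb(2 * n + k, n))
          for s in range(n + 1)]
    mu = [Fraction(1 if s == 0 else 0) for s in range(n + 1)]
    curve = []
    for _ in range(t_max):
        mu = [sum(mu[s] * P[s][r] for s in range(n + 1)) for r in range(n + 1)]
        curve.append(float(sum(abs(a - b) for a, b in zip(mu, pi))) / 2)
    return curve

n, k = 30, 3
t_star = 0.5 * log(2 * n + k) / log(1 + k / n)  # predicted cutoff location
curve = projected_tv_curve(n, k, 40)
```

For n = 30, k = 3 the predicted cutoff location is t* ≈ 21.7 with window n/k = 10; the computed curve stays close to 1 several windows before t* and becomes small a few windows after it, consistent with Theorem 1.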
Proof.
For the proof of the upper bound on the mixing time, we use the spectrum of the transition matrix. Applying Proposition 5 implies that

  lim_{c→∞} lim sup_{n→∞} d_n( (1/2) log_{1+k/n}(2n + k) + c·n/k ) = 0.

We establish the lower bound by considering the vertices visited by a random walk starting from {n + 1, ..., 2n} and their intersection with [n] = {1, ..., n}. For any step, we compute the expected size of the intersection and derive an upper bound on its variance (and on the variance under stationarity). Then applying Proposition 8 yields

  lim_{c→∞} lim inf_{n→∞} d_n( (1/2) log_{1+k/n}(2n + k) − c·n/k ) = 1.

Combining these findings establishes a cutoff at (1/2) log_{1+k/n}(2n + k) with a cutoff window of size O(n/k) for k = O(n).

To prove our results we need two lemmas; the lemma below can be found in [13, Lemma 12.16].
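The spectral bound driving the upper half of the argument can also be evaluated directly: with the eigenvalues and multiplicities of K(2n + k, n) (stated below in Corollary 4), the right-hand side of (2) is an explicit finite sum over n terms. A numerical sketch of ours, purely illustrative:

```python
from math import comb, log, sqrt

def spectral_bound(n, k, t):
    """Right-hand side of (2): (1/2) * sqrt(sum_i mult_i * lambda_i^(2t)),
    summed over the nontrivial eigenvalues i = 1..n from Corollary 4."""
    deg = comb(n + k, n)
    total = 0.0
    for i in range(1, n + 1):
        lam = (-1) ** i * comb(n + k - i, n - i) / deg
        mult = comb(2 * n + k, i) - comb(2 * n + k, i - 1)
        total += mult * lam ** (2 * t)
    return 0.5 * sqrt(total)

n, k = 30, 3
t_star = 0.5 * log(2 * n + k) / log(1 + k / n)  # predicted cutoff location
# the bound is vacuous (> 1) before t_star and collapses within a few
# windows of size n/k after it
for c in (1, 2, 4):
    print(c, spectral_bound(n, k, round(t_star + c * n / k)))
```

This makes the sharpness of (2) visible in small cases: before the cutoff time the bound exceeds 1 and carries no information, while a few windows later it is already small.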
Lemma 2 ([13, Lemma 12.16]). Let P be a reversible transition matrix with eigenvalues 1 = λ_0 ≥ λ_1 ≥ ··· ≥ λ_{|Ω|−1}. If the Markov chain is transitive, then for every x ∈ Ω,

  4 ‖P^t(x, ·) − π‖²_TV ≤ Σ_{i=1}^{|Ω|−1} λ_i^{2t}.

To employ Lemma 2, we need to know all eigenvalues and their multiplicities. The spectrum of the adjacency matrix of Kneser graphs was computed in [12, Section 9.4] and [17].
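Before stating the spectrum, we note two exact identities that any claimed full spectrum must satisfy and that make it easy to check: tr(P) = 0, since an n-set is never disjoint from itself and so the graph has no loops, and tr(P²) = |Ω|/deg, since a two-step return to the starting vertex has probability 1/deg. The following sketch (our own sanity check, with function names of our choosing) tests the spectrum of Theorem 3/Corollary 4 below against these identities in exact rational arithmetic:

```python
from fractions import Fraction
from math import comb

def kneser_walk_spectrum(n, k):
    """(eigenvalue, multiplicity) pairs for the transition matrix of the
    simple random walk on K(2n+k, n), as given in Corollary 4 below."""
    deg = comb(n + k, n)
    return [(Fraction((-1) ** i * comb(n + k - i, n - i), deg),
             comb(2 * n + k, i) - (comb(2 * n + k, i - 1) if i > 0 else 0))
            for i in range(n + 1)]

n, k = 5, 2
spec = kneser_walk_spectrum(n, k)
vertices, deg = comb(2 * n + k, n), comb(n + k, n)

assert spec[0][0] == 1                       # lambda_0 = 1
assert sum(m for _, m in spec) == vertices   # multiplicities cover |Omega|
assert sum(l * m for l, m in spec) == 0      # tr(P) = 0: no self-loops
assert sum(l * l * m for l, m in spec) == Fraction(vertices, deg)  # tr(P^2)
assert 1 - abs(spec[1][0]) == Fraction(k, n + k)  # spectral gap k/(n+k)
```

The last assertion confirms the spectral gap k/(n + k) mentioned in the introduction.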
Theorem 3 ([12, Section 9.4] and [17]). The adjacency matrix of the Kneser graph K(2n + k, n) has the following spectrum:

  eigenvalue (−1)^i C(n + k − i, n − i) with multiplicity C(2n + k, i) − C(2n + k, i − 1),   i = 0, ..., n,

where C(2n + k, −1) := 0.

Since K(2n + k, n) is a C(n + k, n)-regular graph, we immediately obtain the following corollary.

Corollary 4.
The transition matrix of the simple random walk on K(2n + k, n) has the following spectrum:

  λ_i = (−1)^i C(n + k − i, n − i) / C(n + k, n) with multiplicity C(2n + k, i) − C(2n + k, i − 1),   i = 0, ..., n.

Proposition 5.
We have the following upper bounds on the total variation distance of the simple random walk on K(2n + k, n).

• If k = o(n), then for every constant c ≥ 3/4,

  d( (1/2) log_{1+k/n}(2n + k) + c·n/k ) ≤ e^{−c/2}.

• If k = Ω(n), then for every constant c with (1 + k/n)^{−2c} ≤ 1/2,

  d( (1/2) log_{1+k/n}(2n + k) + c ) ≤ (1 + k/n)^{−c}.

Proof.
By Corollary 4 we have

  |λ_i| = | (−1)^i · [n(n−1)···(n−i+1)] / [(n+k)(n+k−1)···(n+k−i+1)] | ≤ (n/(n+k))^i = (1 − k/(n+k))^i.

Now define

  g(t) = (1 − k/(n+k))^{2t} (2n + k) = (1 + k/n)^{−2t} (2n + k).

Applying Lemma 2 yields

  4 ‖P^t(x, ·) − π‖²_TV ≤ Σ_{i=1}^n (1 − k/(n+k))^{2it} · [ C(2n+k, i) − C(2n+k, i−1) ]
                        ≤ Σ_{i=1}^n ( (1 − k/(n+k))^{2t} (2n + k) )^i / i!
                        ≤ e^{g(t)} − 1.

Using the fact that e^x − 1 ≤ 2x for every 0 ≤ x ≤ 1/2, we conclude that for any t with 0 ≤ g(t) ≤ 1/2,

  ‖P^t(x, ·) − π‖_TV ≤ sqrt( g(t)/2 ).   (3)

Case 1: k = o(n). We choose t = (1/2) log_{1+k/n}(2n + k) + c·n/k, where c ≥ 3/4. Hence, using 1 + x ≥ e^{x/2} for 0 ≤ x ≤ 1,

  g(t) = (1 + k/n)^{−2t} (2n + k) = (1 + k/n)^{−2c·n/k} ≤ e^{−c} < 1/2,

so inequality (3) yields

  d( (1/2) log_{1+k/n}(2n + k) + c·n/k ) ≤ e^{−c/2}.

Case 2: k = Ω(n). Now we choose t = (1/2) log_{1+k/n}(2n + k) + c. Then

  g(t) = (1 + k/n)^{−2t} (2n + k) = (1 + k/n)^{−2c} ≤ 1/2,

where the last inequality holds by the assumption on c. Hence, inequality (3) yields

  d( (1/2) log_{1+k/n}(2n + k) + c ) ≤ (1 + k/n)^{−c}.

In order to find a lower bound on the total variation distance we use the following lemma, which was applied in [19]; for further discussion of this method we refer the reader to [18]. Let f be a real-valued function on Ω. We write E_µ[f] and Var_µ[f] for the expectation and variance of f under the distribution µ.

Lemma 6 ([13, Proposition 7.8]). Let µ and ν be two probability distributions on Ω and let f : Ω → ℝ be an arbitrary function. Suppose that max{ Var_µ[f], Var_ν[f] } ≤ σ*². If |E_µ[f] − E_ν[f]| ≥ r σ*, then

  ‖µ − ν‖_TV ≥ 1 − 8/r².

Before proceeding, we recall that a random variable Y ∼ H(N, m, n) has a hypergeometric distribution if for every max{0, n + m − N} ≤ i ≤ min{n, m},

  Pr[Y = i] = C(m, i) C(N − m, n − i) / C(N, n).

The expected value and variance of Y are E[Y] = nm/N and Var[Y] = nm(N − m)(N − n) / (N²(N − 1)), respectively.

Lemma 7.
Let X_t be the vertex visited at step t by a simple random walk on K(2n + k, n) which starts at the vertex X_0 = {n + 1, n + 2, ..., 2n}. Let f_t = f(X_t) = |X_t ∩ [n]|, so f_0 = 0. Moreover, define a random variable f_∞ = |X ∩ [n]| with X being a vertex chosen uniformly at random from K(2n + k, n). Then for any t ∈ ℕ,

  Var[f_t] ≤ C(n, k) · Var[f_∞],

where C(n, k) = (1 + o(1))(1 + k/n) for k = O(n).

Proof. Under π, the random variable f_∞ has a hypergeometric distribution H(2n + k, n, n). Hence,

  E[f_∞] = n² / (2n + k),   (4)

and

  Var[f_∞] = n²(n + k)² / ( (2n + k)² (2n + k − 1) ).   (5)

In step t + 1 of the walk, an n-element subset of the complement of X_t is chosen. If f_t = s, i.e., |X_t ∩ [n]| = s, then X_t^c has n − s common elements with [n] and s + k common elements with [n]^c. Therefore f_{t+1} = n − Y, where Y has hypergeometric distribution H(n + k, s + k, n). Hence,

  E[f_{t+1} | f_t = s] = E[n − Y] = n − (s + k)·n/(n + k) = (n − s)(1 − k/(n+k)),

that is,

  E[f_{t+1} | f_t] = n(1 − k/(n+k)) − f_t (1 − k/(n+k)).

Solving this recursion allows us to compute the expectation of f_t:

  E[f_t] = n Σ_{i=1}^t (−1)^{i+1} (1 − k/(n+k))^i + (−1)^t E[f_0] (1 − k/(n+k))^t   [the last term is 0 since f_0 = 0]
         = n² / (2n + k) + (−1)^{t+1} · n(n + k)/(2n + k) · (1 − k/(n+k))^{t+1}.   (6)

We have already shown that E[f_{t+1} | f_t] = n(1 − k/(n+k)) − f_t(1 − k/(n+k)), which immediately implies that

  Var[ E[f_{t+1} | f_t] ] = (1 − k/(n+k))² Var[f_t].

As observed earlier, the random variable f_{t+1} conditioned on f_t has the distribution of n − Y with Y ∼ H(n + k, f_t + k, n), which yields

  Var[f_{t+1} | f_t] = Var[n − Y] = Var[Y] = [ (f_t + k)(n − f_t) / (n + k)² ] · nk/(n + k − 1).

Assume now that A is an upper bound on (f_t + k)(n − f_t)/(n + k)² for every f_t; A will be specified later. In the following, we use the law of total variance to find a recursive bound for Var[f_t]:

  Var[f_{t+1}] = Var[ E[f_{t+1} | f_t] ] + E[ Var[f_{t+1} | f_t] ] ≤ (1 − k/(n+k))² Var[f_t] + A · nk/(n + k − 1).

Using this recursion, we obtain the following upper bound on Var[f_t]:

  Var[f_t] ≤ A · nk/(n + k − 1) · Σ_{i=0}^{t−1} (1 − k/(n+k))^{2i} + (1 − k/(n+k))^{2t} Var[f_0]   [the last term is 0]
           = A · nk/(n + k − 1) · ( 1 − (1 − k/(n+k))^{2t} ) / ( 1 − (1 − k/(n+k))² )
           ≤ A · n(n + k)² / ( (2n + k)(n + k − 1) ).

Since always 0 ≤ f_t ≤ n, we have (f_t + k)(n − f_t)/(n + k)² ≤ 1/4, so we may take A = 1/4. Hence

  Var[f_t] ≤ (1/4) · n(n + k)² / ( (2n + k)(n + k − 1) ) = n(1 + k/n)(1 + o(1)) / ( 4(2 + k/n) ).

On the other hand, by (5),

  Var[f_∞] = n²(n + k)² / ( (2n + k)²(2n + k − 1) ) ≥ n(1 + k/n)²(1 − o(1)) / (2 + k/n)³.

Using the fact that 1/2 ≤ (1 + x)/(2 + x) for every x ≥ 0, we obtain

  Var[f_∞] · (1 + k/n) · (1 + o(1)) ≥ n(1 + k/n)(1 + o(1)) / ( 4(2 + k/n) ) ≥ Var[f_t].

By comparing Var[f_∞] and Var[f_t], the claim follows.

We are now ready to apply Lemma 6 to derive a lower bound on the total variation distance.

Proposition 8.
For every constant c > 0, we have the following lower bounds on the total variation distance for a simple random walk on K(2n + k, n).

• If k = o(n), then

  d( (1/2) log_{1+k/n}(2n + k) − c·n/k ) ≥ 1 − (8 + o(1)) (e − o(1))^{−2c}.

• If k = Θ(n), then

  d( (1/2) log_{1+k/n}(2n + k) − c ) ≥ 1 − (8 + o(1)) (1 + k/n)^{−2c+4}.

Proof.
By Lemma 7 and (5),

  sqrt( max{ Var[f_∞], Var[f_t] } ) ≤ sqrt( C(n, k) Var[f_∞] ) ≤ sqrt(C(n, k)) · n(n + k) / ( (2n + k) sqrt(2n + k − 1) ) =: σ*.

Combining (6) and (4),

  |E[f_t] − E[f_∞]| = n(n + k)/(2n + k) · (1 − k/(n+k))^{t+1} = σ* · sqrt(2n + k − 1)/sqrt(C(n, k)) · (1 + k/n)^{−t−1}.

Define

  g̃(t) = sqrt(2n + k − 1)/sqrt(C(n, k)) · (1 + k/n)^{−t−1},

so that |E[f_t] − E[f_∞]| = g̃(t) · σ*.

Case 1: k = o(n). By Lemma 7 we know that C(n, k) = (1 + k/n)(1 + o(1)) = 1 + o(1). We choose t = (1/2) log_{1+k/n}(2n + k) − c·n/k, so that

  g̃(t) = sqrt(1 − o(1)) / sqrt(1 + o(1)) · (1 + k/n)^{c·n/k − 1} = (1 − o(1)) e_n^c,

where (e_n)_n = ((1 + k/n)^{n/k})_n is an increasing sequence tending to e as n → ∞. Applying Lemma 6 yields

  d( (1/2) log_{1+k/n}(2n + k) − c·n/k ) = ‖P^t(X_0, ·) − π‖_TV ≥ 1 − (8 + o(1)) e_n^{−2c},

where X_0 = {n + 1, ..., 2n} and the equality comes from the fact that the chain is transitive.

Case 2: k = Θ(n). By Lemma 7, C(n, k) = (1 + k/n)(1 + o(1)). Take t = (1/2) log_{1+k/n}(2n + k) − c. Hence,

  g̃(t) = sqrt(1 − o(1)) / sqrt(1 + o(1)) · (1 + k/n)^{c − 3/2} ≥ (1 − o(1)) (1 + k/n)^{c − 2}.

Again, Lemma 6 gives

  d( (1/2) log_{1+k/n}(2n + k) − c ) = ‖P^t(X_0, ·) − π‖_TV ≥ 1 − (8 + o(1)) (1 + k/n)^{−2c+4}.

References

[1] D. Aldous and P. Diaconis,
Shuffling cards and stopping times. Amer. Math. Monthly 93 (1986), 333-348.

[2] E. D. Belsley, Rates of convergence of random walk on distance regular graphs. Probab. Theory Related Fields 112 (1998), no. 4, 493-533.

[3] G.-Y. Chen and L. Saloff-Coste, The cutoff phenomenon for ergodic Markov processes. Electron. J. Probab. 13 (2008), no. 3, 26-78.

[4] G.-Y. Chen and L. Saloff-Coste, The cutoff phenomenon for randomized riffle shuffles. Random Structures Algorithms 32 (2008), no. 3, 346-372.

[5] P. Diaconis, Group Representations in Probability and Statistics. IMS, Hayward, CA, 1988.

[6] P. Diaconis and M. Shahshahani, Generating a random permutation with random transpositions. Z. Wahrsch. Verw. Gebiete 57 (1981), no. 2, 159-179.

[7] P. Diaconis, R. L. Graham and J. A. Morrison, Asymptotic analysis of a random walk on a hypercube with many dimensions. Random Structures Algorithms 1 (1990), no. 1, 51-72.

[8] P. Diaconis and M. Shahshahani, Time to reach stationarity in the Bernoulli-Laplace diffusion model. SIAM J. Math. Anal. 18 (1987), no. 1, 208-218.

[9] P. Diaconis, The cutoff phenomenon in finite Markov chains. Proc. Nat. Acad. Sci. USA 93 (1996), no. 4, 1659-1664.

[10] C. Dou and M. Hildebrand, Enumeration and random random walks on finite groups. Ann. Probab. 24 (1996), no. 2, 987-1000.

[11] C. Godsil, More odd graph theory. Discrete Math. 32 (1980), no. 2, 205-207.

[12] C. Godsil and G. Royle, Algebraic Graph Theory. Graduate Texts in Mathematics 207, Springer-Verlag, New York, 2001.

[13] D. A. Levin, Y. Peres and E. L. Wilmer, Markov Chains and Mixing Times. AMS, Providence, RI, 2009.

[14] E. Lubetzky and A. Sly, Cutoff phenomena for random walks on random regular graphs. Duke Math. J. 153 (2010), no. 3, 475-510.

[15] E. Lubetzky and A. Sly, Explicit expanders with cutoff phenomena. Electron. J. Probab. 16 (2011), no. 15, 419-435.

[16] Y. Peres, Sharp Thresholds for Mixing Times.

[17] Chromatic polynomials and the spectrum of the Kneser graph.

[18] Total variation lower bounds for finite Markov chains: Wilson's lemma. Random walks and geometry, 515-532, Walter de Gruyter GmbH & Co. KG, Berlin, 2004.

[19] D. B. Wilson,