Cutoff Phenomenon for Random Walks on Kneser Graphs
Ali Pourmiri ∗ Thomas Sauerwald † August 25, 2018
Abstract
The cutoff phenomenon for an ergodic Markov chain describes a sharp transition in the convergence to its stationary distribution, over a negligible period of time known as the cutoff window. We study the cutoff phenomenon for simple random walks on Kneser graphs, which form a family of ergodic Markov chains. Given two integers n and k, the Kneser graph K(2n + k, n) is defined as the graph whose vertices are all subsets of {1, ..., 2n + k} of size n, with two vertices A and B connected by an edge if A ∩ B = ∅. We show that for any k = O(n), the random walk on K(2n + k, n) exhibits a cutoff at (1/2) log_{1+k/n}(2n + k) with a window of size O(n/k).

Keywords:
Markov chain, random walk, cutoff phenomenon, Kneser graph

* Max Planck Institute for Informatics, Saarbrücken, Germany, email: [email protected]
† University of Cambridge, United Kingdom, email: [email protected]

Introduction
A simple random walk on a finite, non-bipartite graph is a discrete-time ergodic Markov chain: in each time step the walk, located at some vertex, chooses one of its neighbors uniformly at random and moves to that neighbor. The cutoff phenomenon for a sequence of chains describes a sharp transition in the convergence of the chain's distribution to its stationary distribution, over a negligible period of time known as the cutoff window. For applications such as MCMC a cutoff is desirable, as running the chain any longer than the mixing time becomes essentially redundant. From a theoretical perspective, establishing a cutoff is often surprisingly challenging, even for simple chains, as it requires very tight bounds on the distribution near the mixing time.

Let P be the transition matrix of an ergodic (i.e., aperiodic and irreducible), discrete-time Markov chain (X_0, X_1, ...) on a finite state space Ω with stationary distribution π. Let P^t(x, ·) be the probability distribution of the chain at time t ∈ ℕ with starting state x ∈ Ω. The total variation distance between two probability distributions µ and ν on Ω is defined by

  ‖µ − ν‖_TV = max_{A ⊆ Ω} |µ(A) − ν(A)| ∈ [0, 1].

Therefore, we can define the worst-case total variation distance to stationarity at time t as

  d(t) = max_{x ∈ Ω} ‖P^t(x, ·) − π‖_TV.

For convenience, we define d(t) for non-integer t as d(t) := d(⌊t⌋). (If the reference is clear from the context, we will also just say total variation distance at time t.) The mixing time is defined by

  t_mix(ε) = min{ t ∈ ℕ : d(t) < ε }.

Suppose now that we have a sequence of ergodic finite Markov chains indexed by n = 1, 2, .... Let d_n(t) be the total variation distance of the n-th chain at time t and t_mix^(n)(ε) be its mixing time. Formally, we say that the sequence of chains exhibits a cutoff (in total variation distance), as defined in [13, Section 18.1], if for any fixed 0 < ε <
1,

  lim_{n→∞} t_mix^(n)(ε) / t_mix^(n)(1 − ε) = 1.

Equivalently, a sequence of Markov chains has a cutoff at time t_n with a window of size w_n = o(t_n) if

  lim_{λ→∞} lim inf_{n→∞} d_n(t_n − λ w_n) = 1,   lim_{λ→∞} lim sup_{n→∞} d_n(t_n + λ w_n) = 0.   (1)

Although it is widely believed that many natural families of Markov chains exhibit a cutoff, there are relatively few examples where a cutoff has actually been shown; it turns out to be quite challenging to prove or disprove the existence of a cutoff even for simple families of chains. The first results exhibiting a cutoff appeared in the studies of card-shuffling processes by Aldous and Diaconis [1], and Diaconis and Shahshahani [6]. Later, the cutoff phenomenon was also shown for random walks on hypercubes [7], for random walks on distance-regular graphs including Johnson and Hamming graphs [2, 8], and for randomized riffle shuffles [4]. For a more general view of Markov chains with and without cutoff we refer the reader to [9] or [13, Chapter 18]. A necessary condition for a family of chains to exhibit a cutoff, known as the product condition, is that t_mix^(n)(1/4) · gap_n tends to infinity as n goes to infinity, where gap_n is the spectral gap of the transition matrix of the n-th chain (see [13, Proposition 18.3]). However, there are chains for which the product condition holds and which do not show any cutoff (see, e.g., [13, Section 18]). Peres [16] conjectured that many natural families of chains satisfying the product condition exhibit cutoffs. For instance, he conjectured that random walks on any family of n-vertex (transitive) expander graphs with gap_n = Θ(1) and mixing time O(log n) exhibit cutoffs. Chen and Saloff-Coste [3] verified the conjecture for other distances such as the ℓ^p-norm for p >
1. Recently, Lubetzky and Sly [14] exhibited cutoff phenomena for random walks on random regular graphs. They also showed that there exist families of explicit expanders with and without cutoff [15]. Diaconis [9] pointed out that if the second largest eigenvalue of the transition matrix of a chain has high multiplicity, then the chain is more likely to show a cutoff.

In this work, we focus on simple random walks on Kneser graphs. The Kneser graph is defined as follows. For any two positive integers n and k, the Kneser graph K(2n + k, n) is the graph with all n-element subsets of [2n + k] = {1, 2, ..., 2n + k} as vertices, and with two vertices adjacent if and only if their corresponding n-element subsets are disjoint. We emphasize that throughout this paper, k and n are arbitrary integers; in particular, k can be a function of n. In the case k = ω(n), the number of vertices, C(2n + k, n), and the degree of each vertex, C(n + k, n), have the same magnitude, so the simple random walk on K(2n + k, n) mixes in just one step. For the special case k = 1, we obtain the so-called odd graph K(2n + 1, n), which contains odd cycles of length 2n + 1 and is an induced subgraph of K(2n + k, n). This shows that K(2n + k, n) is not bipartite for every k ≥
1. The permutation group on [2n + k] induces a subgroup of the automorphism group of K(2n + k, n), and thus the Kneser graph is always transitive. Combining these two observations, we conclude that the simple random walk on K(2n + k, n) is an ergodic and transitive Markov chain. Kneser graphs have been studied frequently in (algebraic) graph theory, in particular due to their connections to chromatic numbers and graph homomorphisms (see [12] for more details and references).

Godsil [11] shows that for most values of n and k, the graph K(2n + k, n) is not a Cayley graph. It is also well known that the transition matrix of the simple random walk on the Kneser graph K(2n + k, n) has spectral gap k/(n + k), and that its second largest eigenvalue in absolute value has multiplicity 2n + k − 1 (cf. Corollary 4). So by varying k = O(n), we obtain various families of chains with different spectral gaps; for instance, setting k = Θ(n) yields a family of transitive expander graphs. In order to show a cutoff for a simple random walk on Kneser graphs it is necessary to have a sufficiently tight estimate of its mixing time. Let P be the transition matrix of the simple random walk on the Kneser graph K(2n + k, n) with spectrum λ_i, 0 ≤ i ≤ C(2n + k, n) − 1, where λ_0 = 1. Then it is shown in [13, Lemma 12.16] that

  d(t) = max_{x ∈ Ω} ‖P^t(x, ·) − π‖_TV ≤ (1/2) sqrt( Σ_{i=1}^{|Ω|−1} λ_i^{2t} ),   (2)

where Ω is the vertex set of the graph. It may be surprising that the upper bound obtained from the spectral properties of the transition matrix is sufficiently tight and matches the lower bound, which enables us to show the existence of a cutoff. Besides Kneser graphs, the bound in (2) has been successfully applied in computing the mixing time of random walks on Cayley graphs (see [5, 10]). This may suggest the following question:

Question.
For which families of transitive ergodic chains is the upper bound in (2) tight up to low-order terms?

Result
In the following we state the main result of the paper.
Theorem 1.
The simple random walk on K(2n + k, n) exhibits a cutoff at (1/2) log_{1+k/n}(2n + k) with a cutoff window of size O(n/k) for k = O(n).

We now give the proof of Theorem 1 using Propositions 5 and 8, whose statements and proofs are deferred to later sections.
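Before turning to the proof, the predicted transition can be observed numerically (an illustration of ours, not part of the argument). The intersection size |X_t ∩ [n]| evolves as a Markov chain on {0, ..., n}, since its transition law depends only on the current value (this is the projection used in the lower-bound argument later), and the total variation distance of this projected chain from its stationary law is a lower bound on d(t). A minimal sketch in exact rational arithmetic, with function names of our choosing:

```python
from fractions import Fraction
from math import comb, log

def projected_kernel(n, k):
    """Transition matrix of s_t = |X_t ∩ [n]| for the walk on K(2n+k, n):
    from state s the next intersection size is n - Y with Y ~ H(n+k, s+k, n)."""
    d = comb(n + k, n)  # vertex degree = number of admissible next sets
    return [[Fraction(comb(s + k, n - r) * comb(n - s, r), d)
             for r in range(n + 1)] for s in range(n + 1)]

def projected_tv_curve(n, k, t_max):
    """Exact TV distance between the projected chain started at s = 0
    (i.e. X_0 = {n+1, ..., 2n}) and the projected stationary law; this is
    a lower bound on the worst-case distance d(t) of the full walk."""
    P = projected_kernel(n, k)
    # projection of the uniform stationary law: hypergeometric H(2n+k, n, n)
    pi = [Fraction(comb(n, s) * comb(n + k, n - s), comb(2 * n + k, n))
          for s in range(n + 1)]
    mu = [Fraction(1 if s == 0 else 0) for s in range(n + 1)]
    curve = []
    for _ in range(t_max):
        mu = [sum(mu[s] * P[s][r] for s in range(n + 1)) for r in range(n + 1)]
        curve.append(float(sum(abs(a - b) for a, b in zip(mu, pi))) / 2)
    return curve

n, k = 30, 3
t_star = 0.5 * log(2 * n + k) / log(1 + k / n)  # predicted cutoff location
curve = projected_tv_curve(n, k, 40)
```

For n = 30, k = 3 the predicted cutoff location is t* ≈ 21.7 with window n/k = 10; the computed curve stays close to 1 several windows before t* and becomes small a few windows after it, consistent with Theorem 1.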
Proof.
For the proof of the upper bound on the mixing time, we use the spectrum of the transition matrix. Applying Proposition 5 implies that

  lim_{c→∞} lim sup_{n→∞} d_n( (1/2) log_{1+k/n}(2n + k) + c·n/k ) = 0.

We establish the lower bound by considering the vertices visited by a random walk starting from {n + 1, ..., 2n} and their intersection with [n] = {1, ..., n}. For any step, we compute the expected size of the intersection and derive an upper bound on its variance (and on the variance under stationarity). Then applying Proposition 8 yields

  lim_{c→∞} lim inf_{n→∞} d_n( (1/2) log_{1+k/n}(2n + k) − c·n/k ) = 1.

Combining these findings establishes a cutoff at (1/2) log_{1+k/n}(2n + k) with a cutoff window of size O(n/k) for k = O(n).

To prove our results we need two lemmas; the lemma below can be found in [13, Lemma 12.16].
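The spectral bound driving the upper half of the argument can also be evaluated directly: with the eigenvalues and multiplicities of K(2n + k, n) (stated below in Corollary 4), the right-hand side of (2) is an explicit finite sum over n terms. A numerical sketch of ours, purely illustrative:

```python
from math import comb, log, sqrt

def spectral_bound(n, k, t):
    """Right-hand side of (2): (1/2) * sqrt(sum_i mult_i * lambda_i^(2t)),
    summed over the nontrivial eigenvalues i = 1..n from Corollary 4."""
    deg = comb(n + k, n)
    total = 0.0
    for i in range(1, n + 1):
        lam = (-1) ** i * comb(n + k - i, n - i) / deg
        mult = comb(2 * n + k, i) - comb(2 * n + k, i - 1)
        total += mult * lam ** (2 * t)
    return 0.5 * sqrt(total)

n, k = 30, 3
t_star = 0.5 * log(2 * n + k) / log(1 + k / n)  # predicted cutoff location
# the bound is vacuous (> 1) before t_star and collapses within a few
# windows of size n/k after it
for c in (1, 2, 4):
    print(c, spectral_bound(n, k, round(t_star + c * n / k)))
```

This makes the sharpness of (2) visible in small cases: before the cutoff time the bound exceeds 1 and carries no information, while a few windows later it is already small.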
Lemma 2 ([13, Lemma 12.16]). Let P be a reversible transition matrix with eigenvalues 1 = λ_0 ≥ λ_1 ≥ ··· ≥ λ_{|Ω|−1}. If the Markov chain is transitive, then for every x ∈ Ω,

  4 ‖P^t(x, ·) − π‖²_TV ≤ Σ_{i=1}^{|Ω|−1} λ_i^{2t}.

To employ Lemma 2, we need to know all eigenvalues and their multiplicities. The spectrum of the adjacency matrix of Kneser graphs was computed in [12, Section 9.4] and [17].
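Before stating the spectrum, we note two exact identities that any claimed full spectrum must satisfy and that make it easy to check: tr(P) = 0, since an n-set is never disjoint from itself and so the graph has no loops, and tr(P²) = |Ω|/deg, since a two-step return to the starting vertex has probability 1/deg. The following sketch (our own sanity check, with function names of our choosing) tests the spectrum of Theorem 3/Corollary 4 below against these identities in exact rational arithmetic:

```python
from fractions import Fraction
from math import comb

def kneser_walk_spectrum(n, k):
    """(eigenvalue, multiplicity) pairs for the transition matrix of the
    simple random walk on K(2n+k, n), as given in Corollary 4 below."""
    deg = comb(n + k, n)
    return [(Fraction((-1) ** i * comb(n + k - i, n - i), deg),
             comb(2 * n + k, i) - (comb(2 * n + k, i - 1) if i > 0 else 0))
            for i in range(n + 1)]

n, k = 5, 2
spec = kneser_walk_spectrum(n, k)
vertices, deg = comb(2 * n + k, n), comb(n + k, n)

assert spec[0][0] == 1                       # lambda_0 = 1
assert sum(m for _, m in spec) == vertices   # multiplicities cover |Omega|
assert sum(l * m for l, m in spec) == 0      # tr(P) = 0: no self-loops
assert sum(l * l * m for l, m in spec) == Fraction(vertices, deg)  # tr(P^2)
assert 1 - abs(spec[1][0]) == Fraction(k, n + k)  # spectral gap k/(n+k)
```

The last assertion confirms the spectral gap k/(n + k) mentioned in the introduction.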
Theorem 3 ([12, Section 9.4] and [17]). The adjacency matrix of the Kneser graph K(2n + k, n) has the following spectrum:

  eigenvalue (−1)^i C(n + k − i, n − i) with multiplicity C(2n + k, i) − C(2n + k, i − 1),   i = 0, ..., n,

where C(2n + k, −1) := 0.

Since K(2n + k, n) is a C(n + k, n)-regular graph, we immediately obtain the following corollary.

Corollary 4.
The transition matrix of the simple random walk on K(2n + k, n) has the following spectrum:

  λ_i = (−1)^i C(n + k − i, n − i) / C(n + k, n) with multiplicity C(2n + k, i) − C(2n + k, i − 1),   i = 0, ..., n.

Proposition 5.
We have the following upper bounds on the total variation distance of the simple random walk on K(2n + k, n).

• If k = o(n), then for every constant c ≥ 3/4,

  d( (1/2) log_{1+k/n}(2n + k) + c·n/k ) ≤ e^{−c/2}.

• If k = Ω(n), then for every constant c with (1 + k/n)^{−2c} ≤ 1/2,

  d( (1/2) log_{1+k/n}(2n + k) + c ) ≤ (1 + k/n)^{−c}.

Proof.
By Corollary 4 we have

  |λ_i| = | (−1)^i · [n(n−1)···(n−i+1)] / [(n+k)(n+k−1)···(n+k−i+1)] | ≤ (n/(n+k))^i = (1 − k/(n+k))^i.

Now define

  g(t) = (1 − k/(n+k))^{2t} (2n + k) = (1 + k/n)^{−2t} (2n + k).

Applying Lemma 2 yields

  4 ‖P^t(x, ·) − π‖²_TV ≤ Σ_{i=1}^n (1 − k/(n+k))^{2it} · [ C(2n+k, i) − C(2n+k, i−1) ]
                        ≤ Σ_{i=1}^n ( (1 − k/(n+k))^{2t} (2n + k) )^i / i!
                        ≤ e^{g(t)} − 1.

Using the fact that e^x − 1 ≤ 2x for every 0 ≤ x ≤ 1/2, we conclude that for any t with 0 ≤ g(t) ≤ 1/2,

  ‖P^t(x, ·) − π‖_TV ≤ sqrt( g(t)/2 ).   (3)

Case 1: k = o(n). We choose t = (1/2) log_{1+k/n}(2n + k) + c·n/k, where c ≥ 3/4. Hence, using 1 + x ≥ e^{x/2} for 0 ≤ x ≤ 1,

  g(t) = (1 + k/n)^{−2t} (2n + k) = (1 + k/n)^{−2c·n/k} ≤ e^{−c} < 1/2,

so inequality (3) yields

  d( (1/2) log_{1+k/n}(2n + k) + c·n/k ) ≤ e^{−c/2}.

Case 2: k = Ω(n). Now we choose t = (1/2) log_{1+k/n}(2n + k) + c. Then

  g(t) = (1 + k/n)^{−2t} (2n + k) = (1 + k/n)^{−2c} ≤ 1/2,

where the last inequality holds by the assumption on c. Hence, inequality (3) yields

  d( (1/2) log_{1+k/n}(2n + k) + c ) ≤ (1 + k/n)^{−c}.

In order to find a lower bound on the total variation distance we use the following lemma, which was applied in [19]; for further discussion of this method we refer the reader to [18]. Let f be a real-valued function on Ω. We write E_µ[f] and Var_µ[f] for the expectation and variance of f under the distribution µ.

Lemma 6 ([13, Proposition 7.8]). Let µ and ν be two probability distributions on Ω and let f : Ω → ℝ be an arbitrary function. Suppose that max{ Var_µ[f], Var_ν[f] } ≤ σ*². If |E_µ[f] − E_ν[f]| ≥ r σ*, then

  ‖µ − ν‖_TV ≥ 1 − 8/r².

Before proceeding, we recall that a random variable Y ∼ H(N, m, n) has a hypergeometric distribution if for every max{0, n + m − N} ≤ i ≤ min{n, m},

  Pr[Y = i] = C(m, i) C(N − m, n − i) / C(N, n).

The expected value and variance of Y are E[Y] = nm/N and Var[Y] = nm(N − m)(N − n) / (N²(N − 1)), respectively.

Lemma 7.
Let X_t be the vertex visited at step t by a simple random walk on K(2n + k, n) which starts at the vertex X_0 = {n + 1, n + 2, ..., 2n}. Let f_t = f(X_t) = |X_t ∩ [n]|, so f_0 = 0. Moreover, define a random variable f_∞ = |X ∩ [n]| with X being a vertex chosen uniformly at random from K(2n + k, n). Then for any t ∈ ℕ,

  Var[f_t] ≤ C(n, k) · Var[f_∞],

where C(n, k) = (1 + o(1))(1 + k/n) for k = O(n).

Proof. Under π, the random variable f_∞ has a hypergeometric distribution H(2n + k, n, n). Hence,

  E[f_∞] = n² / (2n + k),   (4)

and

  Var[f_∞] = n²(n + k)² / ( (2n + k)² (2n + k − 1) ).   (5)

In step t + 1 of the walk, an n-element subset of the complement of X_t is chosen. If f_t = s, i.e., |X_t ∩ [n]| = s, then X_t^c has n − s common elements with [n] and s + k common elements with [n]^c. Therefore f_{t+1} = n − Y, where Y has hypergeometric distribution H(n + k, s + k, n). Hence,

  E[f_{t+1} | f_t = s] = E[n − Y] = n − (s + k)·n/(n + k) = (n − s)(1 − k/(n+k)),

that is,

  E[f_{t+1} | f_t] = n(1 − k/(n+k)) − f_t (1 − k/(n+k)).

Solving this recursion allows us to compute the expectation of f_t:

  E[f_t] = n Σ_{i=1}^t (−1)^{i+1} (1 − k/(n+k))^i + (−1)^t E[f_0] (1 − k/(n+k))^t   [the last term is 0 since f_0 = 0]
         = n² / (2n + k) + (−1)^{t+1} · n(n + k)/(2n + k) · (1 − k/(n+k))^{t+1}.   (6)

We have already shown that E[f_{t+1} | f_t] = n(1 − k/(n+k)) − f_t(1 − k/(n+k)), which immediately implies that

  Var[ E[f_{t+1} | f_t] ] = (1 − k/(n+k))² Var[f_t].

As observed earlier, the random variable f_{t+1} conditioned on f_t has the distribution of n − Y with Y ∼ H(n + k, f_t + k, n), which yields

  Var[f_{t+1} | f_t] = Var[n − Y] = Var[Y] = [ (f_t + k)(n − f_t) / (n + k)² ] · nk/(n + k − 1).

Assume now that A is an upper bound on (f_t + k)(n − f_t)/(n + k)² for every f_t; A will be specified later. In the following, we use the law of total variance to find a recursive bound for Var[f_t]:

  Var[f_{t+1}] = Var[ E[f_{t+1} | f_t] ] + E[ Var[f_{t+1} | f_t] ] ≤ (1 − k/(n+k))² Var[f_t] + A · nk/(n + k − 1).

Using this recursion, we obtain the following upper bound on Var[f_t]:

  Var[f_t] ≤ A · nk/(n + k − 1) · Σ_{i=0}^{t−1} (1 − k/(n+k))^{2i} + (1 − k/(n+k))^{2t} Var[f_0]   [the last term is 0]
           = A · nk/(n + k − 1) · ( 1 − (1 − k/(n+k))^{2t} ) / ( 1 − (1 − k/(n+k))² )
           ≤ A · n(n + k)² / ( (2n + k)(n + k − 1) ).

Since always 0 ≤ f_t ≤ n, we have (f_t + k)(n − f_t)/(n + k)² ≤ 1/4, so we may take A = 1/4. Hence

  Var[f_t] ≤ (1/4) · n(n + k)² / ( (2n + k)(n + k − 1) ) = n(1 + k/n)(1 + o(1)) / ( 4(2 + k/n) ).

On the other hand, by (5),

  Var[f_∞] = n²(n + k)² / ( (2n + k)²(2n + k − 1) ) ≥ n(1 + k/n)²(1 − o(1)) / (2 + k/n)³.

Using the fact that 1/2 ≤ (1 + x)/(2 + x) for every x ≥ 0, we obtain

  Var[f_∞] · (1 + k/n) · (1 + o(1)) ≥ n(1 + k/n)(1 + o(1)) / ( 4(2 + k/n) ) ≥ Var[f_t].

By comparing Var[f_∞] and Var[f_t], the claim follows.

We are now ready to apply Lemma 6 to derive a lower bound on the total variation distance.

Proposition 8.
For every constant c > 0, we have the following lower bounds on the total variation distance for a simple random walk on K(2n + k, n).

• If k = o(n), then

  d( (1/2) log_{1+k/n}(2n + k) − c·n/k ) ≥ 1 − (8 + o(1)) (e − o(1))^{−2c}.

• If k = Θ(n), then

  d( (1/2) log_{1+k/n}(2n + k) − c ) ≥ 1 − (8 + o(1)) (1 + k/n)^{−2c+4}.

Proof.
By Lemma 7 and (5),

  sqrt( max{ Var[f_∞], Var[f_t] } ) ≤ sqrt( C(n, k) Var[f_∞] ) ≤ sqrt(C(n, k)) · n(n + k) / ( (2n + k) sqrt(2n + k − 1) ) =: σ*.

Combining (6) and (4),

  |E[f_t] − E[f_∞]| = n(n + k)/(2n + k) · (1 − k/(n+k))^{t+1} = σ* · sqrt(2n + k − 1)/sqrt(C(n, k)) · (1 + k/n)^{−t−1}.

Define

  g̃(t) = sqrt(2n + k − 1)/sqrt(C(n, k)) · (1 + k/n)^{−t−1},

so that |E[f_t] − E[f_∞]| = g̃(t) · σ*.

Case 1: k = o(n). By Lemma 7 we know that C(n, k) = (1 + k/n)(1 + o(1)) = 1 + o(1). We choose t = (1/2) log_{1+k/n}(2n + k) − c·n/k, so that

  g̃(t) = sqrt(1 − o(1)) / sqrt(1 + o(1)) · (1 + k/n)^{c·n/k − 1} = (1 − o(1)) e_n^c,

where (e_n)_n = ((1 + k/n)^{n/k})_n is an increasing sequence tending to e as n → ∞. Applying Lemma 6 yields

  d( (1/2) log_{1+k/n}(2n + k) − c·n/k ) = ‖P^t(X_0, ·) − π‖_TV ≥ 1 − (8 + o(1)) e_n^{−2c},

where X_0 = {n + 1, ..., 2n} and the equality comes from the fact that the chain is transitive.

Case 2: k = Θ(n). By Lemma 7, C(n, k) = (1 + k/n)(1 + o(1)). Take t = (1/2) log_{1+k/n}(2n + k) − c. Hence,

  g̃(t) = sqrt(1 − o(1)) / sqrt(1 + o(1)) · (1 + k/n)^{c − 3/2} ≥ (1 − o(1)) (1 + k/n)^{c − 2}.

Again, Lemma 6 gives

  d( (1/2) log_{1+k/n}(2n + k) − c ) = ‖P^t(X_0, ·) − π‖_TV ≥ 1 − (8 + o(1)) (1 + k/n)^{−2c+4}.

References

[1] D. Aldous and P. Diaconis,
Shuffling cards and stopping times. Amer. Math. Monthly 93 (1986), 333-348.

[2] E. D. Belsley, Rates of convergence of random walk on distance regular graphs. Probab. Theory Related Fields 112 (1998), no. 4, 493-533.

[3] G.-Y. Chen and L. Saloff-Coste, The cutoff phenomenon for ergodic Markov processes. Electron. J. Probab. 13 (2008), no. 3, 26-78.

[4] G.-Y. Chen and L. Saloff-Coste, The cutoff phenomenon for randomized riffle shuffles. Random Structures Algorithms 32 (2008), no. 3, 346-372.

[5] P. Diaconis, Group Representations in Probability and Statistics. IMS, Hayward, CA, 1988.

[6] P. Diaconis and M. Shahshahani, Generating a random permutation with random transpositions. Z. Wahrsch. Verw. Gebiete 57 (1981), no. 2, 159-179.

[7] P. Diaconis, R. L. Graham and J. A. Morrison, Asymptotic analysis of a random walk on a hypercube with many dimensions. Random Structures Algorithms 1 (1990), no. 1, 51-72.

[8] P. Diaconis and M. Shahshahani, Time to reach stationarity in the Bernoulli-Laplace diffusion model. SIAM J. Math. Anal. 18 (1987), no. 1, 208-218.

[9] P. Diaconis, The cutoff phenomenon in finite Markov chains. Proc. Nat. Acad. Sci. USA 93 (1996), no. 4, 1659-1664.

[10] C. Dou and M. Hildebrand, Enumeration and random random walks on finite groups. Ann. Probab. 24 (1996), no. 2, 987-1000.

[11] C. Godsil, More odd graph theory. Discrete Math. 32 (1980), no. 2, 205-207.

[12] C. Godsil and G. Royle, Algebraic Graph Theory. Graduate Texts in Mathematics 207, Springer-Verlag, New York, 2001.

[13] D. A. Levin, Y. Peres and E. L. Wilmer, Markov Chains and Mixing Times. AMS, Providence, RI, 2009.

[14] E. Lubetzky and A. Sly, Cutoff phenomena for random walks on random regular graphs. Duke Math. J. 153 (2010), no. 3, 475-510.

[15] E. Lubetzky and A. Sly, Explicit expanders with cutoff phenomena. Electron. J. Probab. 16 (2011), no. 15, 419-435.

[16] Y. Peres, Sharp Thresholds for Mixing Times.

[17] Chromatic polynomials and the spectrum of the Kneser graph.

[18] Total variation lower bounds for finite Markov chains: Wilson's lemma. Random walks and geometry, 515-532, Walter de Gruyter GmbH & Co. KG, Berlin, 2004.

[19] D. B. Wilson,