Mixing under monotone censoring

Jian Ding∗ (University of Chicago)    Elchanan Mossel† (U.C. Berkeley)

October 6, 2018
Abstract
We initiate the study of mixing times of Markov chains under monotone censoring. Suppose we have a Markov chain M on a state space Ω with stationary distribution π and a monotone set A ⊂ Ω. We consider the chain M′, which is the same as the chain M started at some x ∈ A, except that moves of M of the form x → y with x ∈ A and y ∉ A are censored and replaced by the move x → x. If M is ergodic and A is connected, the new chain converges to π conditioned on A. In this paper we are interested in the mixing time of the chain M′ in terms of properties of M and A. Our results are based on new connections with the field of property testing. A number of open problems are presented.

Introduction

Consider critical percolation on a square in the hexagonal lattice. Formally, this is given by the probability space {0,1}^{H_n} with the uniform distribution, where we denote by H_n the sites of the hexagonal lattice in the square. It is trivial to sample a configuration from this model by sampling each hexagon independently. Let A be the event of a left-to-right crossing (by 1's). It is well known, by duality, that P[A] = 0.5. Suppose we want to sample a configuration from A. One natural way to do so is rejection sampling: sample a random configuration and accept it if and only if it is in A. A different natural way is to start with a particular left-to-right crossing configuration and then repeatedly re-sample a site, keeping the move only if the resulting configuration is in A. It is not hard to see that the second procedure also converges to the uniform distribution on A. However, how long does it take to converge?

We will study a more general question. Consider the partial order on {0,1}^n where x ≤ y if and only if x_i ≤ y_i for all i ∈ [1, n]. We say a set A ⊂ {0,1}^n is monotone if x ∈ A and x ≤ y imply that y ∈ A. For a monotone set A and x ∈ A, let M_A^x denote the following Markov chain started at x.

∗ Partially supported by NSF grant DMS-1313596.
† Supported by NSF grant DMS-1106999, NSF grant CCF 1320105 and DOD ONR grant N000141110140.

• Given the current state x, pick a coordinate i uniformly at random and re-randomize x_i to obtain y.
• Let the next state of the chain be y if y ∈ A; otherwise let it be x.

It is easy to verify that the chain converges to the uniform distribution on A (the chain is irreducible by monotonicity of A). We aim to analyze the mixing time (see, e.g., [1, 10] for a definition) of the chain M_A^x.

To this end, we will use a standard geometric bound on the mixing time, given by the conductance of the underlying graph of a Markov chain. Given a graph G = G(V, E), the conductance φ(G) is defined to be

    φ(G) = min_{S ⊆ V : vol(S) ≤ |E|} Φ(S),  where  Φ(S) := |∂_E S| / vol(S),    (1)

where vol(S) is the sum of degrees over vertices in S and ∂_E S = {(x, y) ∈ E : x ∈ S, y ∉ S} denotes the edge boundary of S. In light of this, we will view A as the underlying graph of the Markov chain M_A.
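The dynamics M_A^x is straightforward to simulate. Below is a minimal Python sketch (the helper names `in_A`, `censored_chain_step`, and `run_chain` are ours, and the majority set at the end is just an illustrative monotone A, not one from the paper):

```python
import random

def censored_chain_step(x, in_A, rng=random):
    # Pick a uniform coordinate and re-randomize it to obtain the proposal y.
    i = rng.randrange(len(x))
    y = list(x)
    y[i] = rng.randint(0, 1)
    y = tuple(y)
    # Censoring: accept y only if it stays inside the monotone set A.
    return y if in_A(y) else tuple(x)

def run_chain(x0, in_A, steps, rng=random):
    # Run the censored chain M_A started from x0 in A.
    assert in_A(tuple(x0)), "the chain must start inside A"
    x = tuple(x0)
    for _ in range(steps):
        x = censored_chain_step(x, in_A, rng)
    return x

# Illustration: A = majority set, a monotone subset of {0,1}^n.
n = 5
in_A = lambda x: sum(x) > n // 2
x = run_chain((1,) * n, in_A, steps=10_000)
assert in_A(x)  # the chain never leaves A
```

Censored moves act as self-loops, which matches the picture below of A as an induced subgraph of the hypercube with self-loops added; the chain is reversible with respect to the uniform distribution on A.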
Alternatively, A can be seen as the induced subgraph of the hypercube {0,1}^n with a suitable number of self-loops added at each vertex so that the degree is n for every x ∈ A. In what follows, we denote by P the uniform probability measure on {0,1}^n.

Theorem 1.1.
For any monotone set A ⊂ {0,1}^n, we have φ(A) ≥ P[A]/(16n).

Combined with standard results in the theory of Markov chains [8, 9] (see also [10, Theorem 13.14]), Theorem 1.1 yields the following corollary on the mixing time of M_A.

Corollary 1.2.
For any monotone set A ⊂ {0,1}^n, the mixing time of the chain M_A satisfies

    τ_mix(M_A) ≤ (512 n²/P[A]²) log(4 · 2^n P[A]).

Note that this implies that the mixing time is polynomial in n as long as A is large (of measure at least inverse polynomial in n). In particular, our motivating example of sampling a critical percolation configuration with a left-to-right crossing has mixing time at most cubic in the number of sites (apply the corollary with P[A] = 0.5). Our result is tight up to polynomial factors in n, as the following example shows:

Example 1.3.
Assume n ≥ 2m and let A = {x : x_1 = … = x_m = 1} ∪ {x : x_{m+1} = … = x_{2m} = 1}. Clearly A is monotone and P[A] = 2^{−m+1} − 2^{−2m}. Considering Φ(B) for B = {x : x_1 = … = x_m = 1} ⊂ A, we see that φ(A) ≤ 2^{−m}. Similarly, starting from the point (1, …, 1, 0, …, 0) with exactly m ones, it is easy to see that the mixing time is lower bounded by the time to hit the set {x : x_{m+1} = … = x_{2m} = 1} with probability at least 1/4, which is lower bounded by 2^{m−2}.

Our proof uses a new ingredient in the context of mixing of Markov chains, namely a result from the theory of property testing. Property testing, explicitly defined in [12], plays a central role in probabilistically checkable proofs. However, it has also been extended and extensively studied in its own right, for checking properties such as graph properties, with fascinating connections to many areas of combinatorics, including in particular regularity lemmas. It turns out that the natural algorithm, which samples a number of random neighboring pairs and rejects monotonicity if a violating pair is seen, works well for monotonicity testing [6]. The key to the success of this natural testing algorithm, which is also the key to our proof of Theorem 1.1, is the following structural theorem on approximately monotone sets.

Theorem 1.4 ([6, Theorem 2]). For any set S ⊂ {0,1}^n, define

    δ(S) = (n 2^n)^{−1} |{(x, y) ∈ {0,1}^n × {0,1}^n : |x − y| = 1, x ≤ y, x ∈ S, y ∉ S}|,
    ε(S) = min{P(S ⊕ A) : A is monotone},

where ⊕ denotes the symmetric difference of two sets. Then we have δ(S) ≥ ε(S)/n.

Proof of Theorem 1.1

Recall that P is the uniform measure on {0,1}^n. In light of definition (1), it suffices to prove that

    Φ(B) ≥ P(A)/(16n)  for all B ⊂ A such that P(B) ≤ P(A)/2.    (2)

It is clear that (2) holds if P(A)P(B) ≤ 8 · 2^{−n}, since in this case connectivity of A gives |∂_E B| ≥ 1 and therefore

    Φ(B) ≥ 1/vol(B) = 1/(P(B) n 2^n) ≥ P(A)/(8n).

It remains to consider the case when P(A)P(B) > 8 · 2^{−n}.
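As a sanity check (not part of the paper's argument), the structural bound of Theorem 1.4 can be verified exhaustively for small n. The sketch below, with all names our own, enumerates every S ⊂ {0,1}³ and every monotone set, and confirms δ(S) ≥ ε(S)/n:

```python
from itertools import product

n = 3
cube = list(product((0, 1), repeat=n))

def leq(x, y):
    # Coordinatewise partial order on {0,1}^n.
    return all(a <= b for a, b in zip(x, y))

def is_monotone(S):
    return all(y in S for x in S for y in cube if leq(x, y))

# Enumerate all 2^(2^n) subsets of the cube; keep the monotone ones.
all_sets = [frozenset(v for i, v in enumerate(cube) if mask >> i & 1)
            for mask in range(1 << len(cube))]
monotone_sets = [S for S in all_sets if is_monotone(S)]

def delta(S):
    # Normalized count of pairs (x, y), x <= y neighbors, violating monotonicity of S.
    viol = sum(1 for x in cube for y in cube
               if sum(abs(a - b) for a, b in zip(x, y)) == 1
               and leq(x, y) and x in S and y not in S)
    return viol / (n * 2 ** n)

def eps(S):
    # Distance from S to the nearest monotone set, under the uniform measure P.
    return min(len(S ^ M) for M in monotone_sets) / 2 ** n

ok = all(delta(S) >= eps(S) / n - 1e-12 for S in all_sets)
```

For n = 3 there are only 256 candidate sets S and 20 monotone sets, so the check is instantaneous; it is a finite confirmation, of course, not a proof.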
Denote by C = A \ B and by Ω the collection of monotone sets in the hypercube {0,1}^n. We claim that either

    P(B ⊕ F) ≥ P(A)P(B)/16 for all F ∈ Ω,  or  P(C ⊕ F) ≥ P(A)P(B)/16 for all F ∈ Ω.    (3)

Otherwise, there exist monotone sets B′ and C′ such that

    P(B ⊕ B′) < P(A)P(B)/16  and  P(C ⊕ C′) < P(A)P(B)/16.    (4)

In particular, we have P(B′) ≥ P(B)/2 and P(C′) ≥ P(A)/4 (using P(C) = P(A) − P(B) ≥ P(A)/2). An application of the FKG inequality [5] gives that

    P(B′ ∩ C′) ≥ P(B′) · P(C′) ≥ P(A)P(B)/8.

Combined with (4), it follows that

    P(B ∩ C) ≥ P(B′ ∩ C′) − P(B ⊕ B′) − P(C ⊕ C′) > P(A)P(B)/8 − P(A)P(B)/8 = 0,

contradicting the fact that B ∩ C = ∅. This completes the verification of (3).

Without loss of generality, we now assume that P(B ⊕ F) ≥ P(A)P(B)/16 for all F ∈ Ω (if the same holds for C instead, we apply the analysis below to C in the same manner, with the observation that ∂_E B = ∂_E C). By Theorem 1.4, we get that |Ψ(B)| ≥ 2^n P(A)P(B)/16, where

    Ψ(B) := {(x, y) ∈ {0,1}^n × {0,1}^n : |x − y| = 1, x ≤ y, x ∈ B, y ∉ B}.

For (x, y) ∈ Ψ(B), we have x ∈ B and x ≤ y, and thus y ∈ A since A is a monotone set. Therefore (x, y) ∈ ∂_E B, yielding Ψ(B) ⊆ ∂_E B. This implies that |∂_E B| ≥ 2^n P(A)P(B)/16. Combined with the fact that vol(B) = n 2^n P(B), this completes the proof of (2) and thus the proof of the theorem.

Discussions and open problems

It seems plausible that the bound on the mixing time obtained in Corollary 1.2 is not sharp. A case of particular interest is when P(A) ≥ 1/2. Indeed, we ask the following open question.
Question 1.1.
Suppose that there exists a constant c > 0 such that a monotone subset A ⊂ {0,1}^n has measure P(A) ≥ c. Is it true that τ_mix(M_A) ≤ C n log n, where C > 0 is a constant depending only on c?

In a different direction, our results suggest testing with respect to non-product measures. For example, suppose we wish to reproduce Theorem 1.1 for the Ising model on some graph G, where we denote by µ the stationary measure. For this to work we would need an analogue of the testing result. In this setup it is natural to define, for a set S ⊂ {0,1}^n (identifying 0 with −1 and 1 with +1),

    δ(S) = Σ_{(x,y) ∈ Ψ(S)} µ(x)/n,  where  Ψ(S) = {(x, y) ∈ {0,1}^n × {0,1}^n : |x − y| = 1, x ≤ y, x ∈ S, y ∉ S},
    ε(S) = min{µ(S ⊕ A) : A is monotone}.

We then ask
Question 1.2.
Consider the ferromagnetic Ising model on a graph G = (V, E). Under what assumptions is it the case that δ(S) ≥ (ε(S)/n)^a for all S ⊂ {0,1}^n and a fixed constant a > 0?

The following example suggests that some assumptions are needed. Consider the Curie–Weiss model (the Ising model on the complete graph) at low temperature (so that the stationary measure admits double wells, see [2, 3]) with n sites. For convenience, suppose that n is even. Let A = {x : Σ_{i=1}^n x_i ≤ n/2}. We claim that ε(A) ≥ 1/6. In order to see this, let A_k = {x : Σ_{i=1}^n x_i = k} and A′_k = {x : Σ_{i=1}^n x_i = n − k} for k ≤ n/2. For x ∈ A_k and y ∈ A^c, define

    a(x, y) = 1_{{y ∈ A′_k, y > x}} µ(y) / |{y ∈ A′_k : y > x}|,

and so, since µ(x) = µ(y) for x ∈ A_k and y ∈ A′_k by the global spin-flip symmetry of the model,

    a(x, y) = 1_{{y ∈ A′_k, y > x}} µ(x) / |{y ∈ A′_k : y > x}|.

Thus Σ_{y ∈ A^c} a(x, y) = µ(x) for all x ∈ A. In addition, by symmetry, for every y ∈ A^c we have

    Σ_{x ≤ y, x ∈ A} a(x, y) = µ(y)

(so a(·, ·) is a mass transport from A to A^c with respect to the measure µ). Therefore, for any monotone set B we have

    µ(B ∩ A^c) ≥ Σ_{x ∈ B ∩ A} Σ_{y ∈ A^c} a(x, y) = Σ_{x ∈ B ∩ A} µ(x) = µ(A ∩ B),

where the inequality holds because a(x, y) > 0 only if y > x, so by monotonicity of B all mass transported out of B ∩ A lands in B ∩ A^c. This implies that µ(B ∩ A^c) ≥ µ(B)/2. Combined with the simple fact that µ(A) ≥ 1/2, it follows that

    µ(A ⊕ B) ≥ max(µ(A) − µ(B), µ(B)/2) ≥ 1/6,

as desired. However, it is clear that δ(A) ≤ µ(A_{n/2}), which is exponentially small in n at low temperature [2, 3]. It would be interesting to further study the testing of other properties for various non-product distributions.

Finally, we note that the influence of censoring on the mixing time was studied in [11], where it was shown that for Glauber dynamics on monotone spin systems, mixing can only be delayed by censoring some of the updates (the censoring being prescribed without information on what the proposed update is). In [7], an example was given demonstrating that censoring can indeed speed up mixing for proper colorings. This question was then studied in [4] in much more general settings; that work introduced a certain partial order on the class of stochastically monotone Markov kernels and proved that this monotonicity of Markov chains implies monotonicity of mixing times. These results differ from ours in at least two ways: (1) they concern Markov chains with the same stationary measure, while our censoring changes even the state space of the Markov chain; (2) they aim at qualitative results ensuring monotonicity of mixing times for the Markov chains under consideration, while we aim for a quantitative bound on the mixing time of the censored Markov chain.

References

[1] D. Aldous and J. Fill.
Reversible Markov Chains and Random Walks on Graphs. In preparation, available at http://www.stat.berkeley.edu/~aldous/RWG/book.html.

[2] R. S. Ellis and C. M. Newman. Limit theorems for sums of dependent random variables occurring in statistical mechanics. Z. Wahrsch. Verw. Gebiete, 44(2):117–139, 1978.

[3] R. S. Ellis, C. M. Newman, and J. S. Rosen. Limit theorems for sums of dependent random variables occurring in statistical mechanics. II. Conditioning, multiple phases, and metastability. Z. Wahrsch. Verw. Gebiete, 51(2):153–169, 1980.

[4] J. Fill and J. Kahn. Comparison inequalities and fastest-mixing Markov chains. Ann. Appl. Probab., 23(5):1778–1816, 2013.

[5] C. M. Fortuin, P. W. Kasteleyn, and J. Ginibre. Correlation inequalities on some partially ordered sets. Comm. Math. Phys., 22:89–103, 1971.

[6] O. Goldreich, S. Goldwasser, E. Lehman, D. Ron, and A. Samorodnitsky. Testing monotonicity. Combinatorica, 20(3):301–337, 2000.

[7] A. E. Holroyd. Some circumstances where extra updates can delay mixing. J. Stat. Phys., 145(6):1649–1652, 2011.

[8] M. Jerrum and A. Sinclair. Approximating the permanent. SIAM J. Comput., 18(6):1149–1178, 1989.

[9] G. F. Lawler and A. D. Sokal. Bounds on the L² spectrum for Markov chains and Markov processes: a generalization of Cheeger's inequality. Trans. Amer. Math. Soc., 309(2):557–580, 1988.

[10] D. A. Levin, Y. Peres, and E. L. Wilmer. Markov chains and mixing times. American Mathematical Society, Providence, RI, 2009. With a chapter by James G. Propp and David B. Wilson.

[11] Y. Peres and P. Winkler. Can extra updates delay mixing? Comm. Math. Phys., 323(3):1007–1016, 2013.

[12] R. Rubinfeld and M. Sudan. Robust characterizations of polynomials with applications to program testing. SIAM J. Comput., 25(2):252–271, 1996.