Spatial Mixing and Non-local Markov chains
Antonio Blanca, Pietro Caputo, Alistair Sinclair, Eric Vigoda
Antonio Blanca∗, Pietro Caputo†, Alistair Sinclair‡, Eric Vigoda§
August 7, 2017
Abstract
We consider spin systems with nearest-neighbor interactions on an $n$-vertex $d$-dimensional cube of the integer lattice graph $\mathbb{Z}^d$. We study the effects that exponential decay with distance of spin correlations, specifically the strong spatial mixing condition (SSM), has on the rate of convergence to the equilibrium distribution of non-local Markov chains. We prove that SSM implies $O(\log n)$ mixing of a block dynamics whose steps can be implemented efficiently. We then develop a methodology, consisting of several new comparison inequalities concerning various block dynamics, that allows us to extend this result to other non-local dynamics. As a first application of our method we prove that, if SSM holds, then the relaxation time (i.e., the inverse spectral gap) of general block dynamics is $O(r)$, where $r$ is the number of blocks. A second application of our technology concerns the Swendsen-Wang dynamics for the ferromagnetic Ising and Potts models. We show that SSM implies an $O(1)$ bound for the relaxation time. As a by-product of this implication we observe that the relaxation time of the Swendsen-Wang dynamics in square boxes of $\mathbb{Z}^2$ is $O(1)$ throughout the subcritical regime of the $q$-state Potts model, for all $q \ge 2$. We also prove that for monotone spin systems SSM implies that the mixing time of systematic scan dynamics is $O(\log n (\log\log n)^2)$. Systematic scan dynamics are widely employed in practice but have proved hard to analyze. Our proofs use a variety of techniques for the analysis of Markov chains including coupling, functional analysis and linear algebra.

∗ School of Computer Science, Georgia Tech, Atlanta, GA 30332. Research supported in part by NSF grants 1420934, 1563838 and 1617306.
† Department of Mathematics, University of Roma Tre, Largo San Murialdo 1, 00146 Roma, Italy.
‡ Computer Science Division, U.C. Berkeley, Berkeley, CA 94720. Research supported in part by NSF grant 1420934.
§ School of Computer Science, Georgia Tech, Atlanta, GA 30332. Research supported in part by NSF grants 1563838 and 1617306.
Part of this work was done at the Simons Institute for the Theory of Computing.

Introduction
Spin systems are a general framework for modeling interacting systems of simple elements, and arise in a wide variety of settings including statistical physics, computer vision and machine learning (where they are often referred to as “graphical models” or “Markov random fields”). A spin system consists of a finite graph $G = (V, E)$ and a set $S$ of spins; a configuration $\sigma \in S^V$ assigns a spin value to each vertex $v \in V$. For definiteness in this version of the paper, we focus on the classical case where $G$ is a cube in the $d$-dimensional lattice $\mathbb{Z}^d$. The probability of finding the system in a given configuration $\sigma$ is given by the Gibbs (or
Boltzmann) distribution
$$\mu(\sigma) = \exp(-H(\sigma))/Z, \tag{1}$$
where $Z$ is the normalizing factor (or “partition function”) and the Hamiltonian $H$ contains terms that depend on the spin values at each vertex (a “vertex potential”) and at each pair of adjacent vertices (an “edge potential”). See Section 2 for a precise definition.

One of the most fundamental properties of spin systems is (strong) spatial mixing (SSM), which captures the fact that the correlation between spins at different vertices decays with the distance between them (uniformly over the size of the underlying graph $G$); again, see Section 2 for a precise definition. SSM is closely related to the classical physical concept of a phase transition, which refers to the sudden disappearance of long-range correlations as some parameter of the system (typically, the edge or vertex potential) is continuously varied. SSM has proved to have a number of powerful algorithmic applications, both in the analysis of spin system dynamics (discussed in detail below) and in the design of efficient approximation algorithms for the partition function (a weighted generalization of approximate counting) using the associated self-avoiding walk trees (see, e.g., [49, 41, 30, 19, 40, 42, 43]).

While SSM is a static property of a spin system, there is equal interest in dynamic properties. By this we mean the behavior of ergodic Markov chains whose states are the configurations of the spin system and whose equilibrium measure is the Gibbs distribution (1). Such dynamics are of interest in their own right: they provide algorithms for sampling from the Gibbs distribution and (in many cases) are a plausible model for the evolution of the underlying system of spins. Of particular interest are
Glauber dynamics, which at each step pick a vertex $v \in V$ uniformly at random and update its spin in a reversible fashion depending on the neighboring spins.

It has been well known since pioneering work in mathematical physics from the late 1980s (see, e.g., [26, 1, 50, 44, 33, 34, 8]) that SSM implies that the mixing time (i.e., rate of convergence) of the Glauber dynamics is $O(|V| \log |V|)$, and hence optimal [25]; indeed, the reverse implication is also true, so the phase transition is manifested in the mixing time of the dynamics (see, e.g., [44, 33, 15]). The above implication was established using sophisticated functional analytic techniques, though more recently a simple combinatorial proof was given in [15] for the special case of monotone systems (where the edge potential favors pairs of equal spins; see Section 6 for a precise definition).

The intuition for these mixing time bounds comes from the fact that in the absence of long-range correlations (i.e., SSM), the system mimics the behavior of one with no interactions, where the Gibbs distribution (1) is simply a product measure. Consequently, local Markov chains like the Glauber dynamics require $\Theta(|V| \log |V|)$ steps to mix, just as in the non-interacting case, where mixing essentially requires every vertex to be updated at least once, which by a coupon-collector argument takes $\Theta(|V| \log |V|)$ steps. On the other hand, non-local dynamics, where a large fraction of the configuration may be updated in a single step, could potentially converge to the Gibbs distribution much faster. These dynamics have to contend with the possibly high computational cost of implementing a single step. However, in some cases, non-local steps can be efficiently implemented by taking advantage of specific features of the models.

Footnote: Actually, phase transitions are usually related to a weaker notion called “weak spatial mixing” (WSM); in two-dimensional spin systems WSM and SSM are known to be equivalent [35].

Non-local dynamics

Our first contribution consists of tight bounds for the mixing time and the spectral gap of a block dynamics.
The spectral gap is the inverse of the relaxation time, which measures the speed of convergence to the stationary distribution when the initial configuration is reasonably close to this distribution (a “warm start”), whereas the mixing time assumes a worst possible starting configuration. The relaxation time is another well-studied notion of rate of convergence (see, e.g., [27, 28]).

Let $\{A_1, \dots, A_r\}$ be a collection of sets (or blocks) such that $V = \bigcup_i A_i$. A (heat-bath) block dynamics with blocks $\{A_1, \dots, A_r\}$ is a Markov chain that in each step picks a block $A_i$ uniformly at random and updates the configuration in $A_i$ with a new configuration distributed according to the conditional measure in $A_i$ given the configuration in $V \setminus A_i$. We first consider the following choice of blocks. Start with a regular pattern of non-overlapping $d$-dimensional lattice cubes of side $L \ll |V|^{1/d}$, with a fixed minimal distance between cubes, and let $A$ denote the union of all cubes in this pattern. By considering all possible lattice translations of the set $A \cap V$ we obtain the blocks $\{A_1, \dots, A_r\}$, where $r = O(L^d)$; see Figure 1. Each such block $A_i$ is called a tiling of $V$ and the associated block dynamics is called the tiled block dynamics. We refer to Section 3 for a precise definition.

Theorem 1.1.
When $L$ is a sufficiently large constant (independent of $|V|$), SSM implies that the mixing time of the tiled block dynamics is $O(\log n)$ and that its relaxation time is $O(1)$.

In practice, the steps of the tiled block dynamics can be implemented efficiently in parallel. However, the main significance of this result is that, in conjunction with a comparison methodology we develop, it allows us to establish several new results for standard non-local dynamics. The first consequence of this technology is a tight bound for the relaxation time of general block dynamics.
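To make the tilings concrete, here is a small sketch that classifies a vertex $v$ as inside, on the boundary of, or outside a tiling, matching the three cases of Figure 1. It assumes the geometry fixed in Section 3: $L$ odd, cubes with $L$ vertices per side (side length $L-1$), cube centers at $x + \vec{h}(L+3)$ for an offset $x \in \{0, \dots, L+2\}^d$, and so 3 empty sites between consecutive cubes along each axis. It works on all of $\mathbb{Z}^d$ and ignores the intersection with the finite box $V$. The helper names (`axis_residue`, `classify`, `count_cases`) are hypothetical. Counting over all $(L+3)^d$ offsets recovers the counts $L^d$ (inside) and $2dL^{d-1}$ (boundary) that determine the case probabilities in the proof of Lemma 3.1.

```python
from itertools import product


def axis_residue(vi, xi, L):
    """Coordinate vi reduced modulo the period L + 3 of the tiling with offset xi,
    shifted so that residues 0..L-1 are exactly the L sites spanned by one cube."""
    half = (L - 1) // 2
    return (vi - xi + half) % (L + 3)


def classify(v, x, L):
    """Return 'in', 'boundary' or 'out' for the vertex v of Z^d relative to C(x)."""
    res = [axis_residue(vi, xi, L) for vi, xi in zip(v, x)]
    inside = [r < L for r in res]                  # within a cube along this axis
    one_step_out = [r in (L, L + 2) for r in res]  # adjacent to a cube along this axis
    if all(inside):
        return "in"
    # Lattice neighbours differ in a single coordinate by +-1, so v is adjacent to
    # C(x) iff exactly one coordinate lies outside a cube, and only by one step.
    outside_axes = [i for i, ok in enumerate(inside) if not ok]
    if len(outside_axes) == 1 and one_step_out[outside_axes[0]]:
        return "boundary"
    return "out"


def count_cases(v, L):
    """Tally the three cases of Figure 1 for v over all (L + 3)^d offsets x."""
    counts = {"in": 0, "boundary": 0, "out": 0}
    for x in product(range(L + 3), repeat=len(v)):
        counts[classify(v, x, L)] += 1
    return counts


if __name__ == "__main__":
    L, d = 5, 2
    counts = count_cases((0,) * d, L)
    print(counts)  # {'in': 25, 'boundary': 20, 'out': 19}
    # Compare with L^d and 2 d L^(d-1).
    print(counts["in"] == L**d, counts["boundary"] == 2 * d * L ** (d - 1))  # True True
```

Because every offset is equally likely when a tiling is chosen uniformly at random, dividing these counts by $(L+3)^d$ gives the probabilities of cases (a) and (c).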
Theorem 1.2.
SSM implies that the spectral gap of any heat-bath block dynamics with $r$ blocks is $\Omega(1/r)$, and hence its relaxation time is $O(r)$.

We observe that there are no restrictions on the geometry of the blocks $A_i$ in this theorem, other than $V = \bigcup_i A_i$. This optimal bound for the spectral gap was known before only for certain specific collections of blocks (see, e.g., [32, 15]), and previous analytic methods apparently do not apply to the general setting.

A second application of our techniques concerns the so-called Swendsen-Wang (SW) dynamics [45]. The SW dynamics is a widely studied reversible dynamics for the ferromagnetic Ising and Potts models, which are among the most important and classical of all spin systems. In the ferromagnetic $q$-state Potts model, there are $q$ spin values and the edge potential favors equal spins on neighbors. More precisely, $\mu(\sigma) \propto \exp(\beta a(\sigma))$, where $a(\sigma)$ is the number of edges connecting vertices with the same spin values in $\sigma$, and $\beta > 0$ is a parameter of the model. The Ising model is just the special case $q = 2$.

The SW dynamics is non-local, and updates the entire configuration in a single step, according to a scheme inspired by the related random-cluster model. (The exact definition of this dynamics is given in Section 4.) We prove that the relaxation time of the SW dynamics is $O(1)$, provided SSM holds. More formally, let $\mathrm{SW}$ be the transition matrix of the Swendsen-Wang dynamics for the Potts model on an $n$-vertex cube in $\mathbb{Z}^d$, and let $\lambda(\mathrm{SW})$ denote its spectral gap.

Theorem 1.3.
For all $q \ge 2$, SSM implies that $\lambda(\mathrm{SW}) = \Omega(1)$; hence the relaxation time of the SW dynamics is $O(1)$.

This optimal bound for the spectral gap is a substantial improvement over the best previous result due to Ullrich [47], where, in $\mathbb{Z}^d$, SSM was shown to imply that $\lambda(\mathrm{SW}) = \Omega(n^{-1})$. For earlier related work in $\mathbb{Z}^d$ see [36, 9]. Tight spectral gap bounds such as ours for the SW dynamics were known previously only in the mean-field setting, where the graph $G$ is the complete graph [31, 18, 4]. For other relevant work see [22], where Guo and Jerrum proved that when $q = 2$ the SW dynamics mixes in polynomial time on any graph. We note that our spectral gap result does not immediately imply a $\mathrm{polylog}(n)$ bound on the mixing time, as one might hope; this is because there is an inherent penalty of $O(n)$ in relating spectral gap to mixing time (the standard bound $\tau_{\mathrm{mix}} \le \lambda^{-1} \log(4/\mu_{\min})$ loses a factor $\log(1/\mu_{\min})$, which is of order $n$ here), so the mixing time bound implied by Theorem 1.3 is $O(n)$.

In two dimensions SSM is known to hold for all $q \ge 2$ and all $\beta < \beta_c(q)$, where $\beta_c(q) = \log(1 + \sqrt{q})$ is the uniqueness threshold; this is a consequence of the results in [3, 2, 35]. Therefore, we have the following interesting corollary of Theorem 1.3.

Corollary 1.4.
In an $n$-vertex square box of $\mathbb{Z}^2$, for all $q \ge 2$ and all $\beta < \beta_c(q)$ we have $\lambda(\mathrm{SW}) = \Omega(1)$; hence the relaxation time of the SW dynamics is $O(1)$.

In $\mathbb{Z}^2$, Ullrich's result [47] implies that the relaxation time of the SW dynamics is $O(n)$ for $\beta < \beta_c(q)$, $O(n \log n)$ for $\beta > \beta_c(q)$, and at most polynomial in $n$ for $\beta = \beta_c(q)$ and $q = 2$. Recently, Gheissari and Lubetzky [20, 21], using the results of Duminil-Copin et al. [11, 10] settling the continuity of the phase transition, analyzed the dynamics at the critical point $\beta_c(q)$ for all $q$. They showed that the mixing time is at most polynomial in $n$ for $q = 3$, at most quasi-polynomial for $q = 4$, and $\exp(\Omega(n))$ for $q > 4$. Previously, Borgs et al. [6, 5] proved an exponential lower bound for the mixing time on the $d$-dimensional torus when $\beta = \beta_c(q)$, but only for sufficiently large $q$.

Our last contribution concerns the systematic scan dynamics, which is a version of Glauber dynamics in which the vertex $v$ to be updated is chosen not uniformly at random but according to a fixed ordering of the vertex set $V$; one step of systematic scan consists of updating each vertex $v \in V$ once according to this ordering. Systematic scan is widely employed in practice, and there is a folklore belief that its mixing time should be closely related to that of standard (random update) Glauber dynamics; however, it has proved much harder to analyze, and indeed a number of works have been devoted to this topic (see, e.g., [12, 13, 14, 24]).
The best general condition under which systematic scan dynamics is known to be rapidly mixing is due to Dyer, Goldberg and Jerrum [14], and is closely related to the Dobrushin condition for uniqueness of the Gibbs measure; this condition in turn is known to be stronger (and in some cases significantly stronger) than SSM [44, 33].

For the special case of monotone spin systems we can show that the systematic scan dynamics mixes in $O(\log n (\log\log n)^2)$ steps for any ordering of the vertices, whenever SSM holds. Additionally, for a wide class of orderings we can show that the mixing time is $O(\log n)$, provided again that SSM holds. For a vertex ordering $\mathcal{O}$, let $L(\mathcal{O})$ denote the length of the longest subsequence of $\mathcal{O}$ that is a path in $G$.

Theorem 1.5.
In a monotone spin system on $\mathbb{Z}^d$, SSM implies that the mixing time for the systematic scan dynamics on an $n$-vertex cube in $\mathbb{Z}^d$ is $O(\log n (\log\log n)^2)$ for any ordering $\mathcal{O}$. Moreover, if $L(\mathcal{O}) = O(1)$ then SSM implies that the mixing time is $O(\log n)$.

Note that the condition $L(\mathcal{O}) = O(1)$ is usually easy to check in practice. Moreover, it is easy to choose orderings $\mathcal{O}$ for which $L(\mathcal{O})$ is bounded; for example, in $\mathbb{Z}^d$, $G$ is always bipartite, so the ordering $EO$ that updates first all the even vertices, then all the odd ones, has $L(EO) = 2$ (every edge joins an even vertex to an odd one, so a path appearing as a subsequence of $EO$ can contain at most one even vertex followed by one odd vertex). This particular systematic scan dynamics, called the alternating scan dynamics, is used in practice to sample from the Gibbs distribution and thus has received some attention [38, 23]. Using our comparison technology we prove that, for general spin systems, the relaxation time of the alternating scan dynamics is $O(1)$, provided SSM holds.

Theorem 1.6.
SSM implies that the relaxation time of the alternating scan dynamics on an $n$-vertex cube in $\mathbb{Z}^d$ is $O(1)$.

We emphasize that Theorem 1.6 applies to general (not necessarily monotone) spin systems. In spin systems with the SSM property, the best previously known bound for the relaxation time of the alternating scan dynamics was $O(n)$; this bound follows from a recent result of Guo et al. [23]. We observe that since the alternating scan dynamics is non-reversible, its relaxation time is defined in terms of the spectral gap of its multiplicative reversiblization; see, e.g., [17, 37].

The rest of the paper is organized as follows. We conclude this introduction with a brief discussion of our techniques. Section 2 contains some basic terminology, definitions and facts used throughout the paper. In Section 3 we derive our results for the tiled block dynamics (Theorem 1.1) and introduce our comparison technology in Section 3.1. In Sections 4 and 5 we provide two applications of this technology: bounds for the spectral gaps of the SW dynamics (Theorem 1.3) and of the general block dynamics (Theorem 1.2), respectively. Finally, in Section 6 we provide our proofs for Theorems 1.5 and 1.6 concerning systematic scan dynamics.

We conclude this introduction by briefly indicating some of our techniques. We use the path coupling method of Bubley and Dyer [7] to establish our results for the tiled (heat-bath) block dynamics in Theorem 1.1. Our proof of this theorem is a generalization of the methods in [15]. We then develop a novel comparison methodology, consisting of several new comparison inequalities concerning various block dynamics, that together with this result allows us to establish Theorems 1.2 and 1.3. We next provide a high-level overview of this technology.

We consider a more general class of tiled block dynamics. Suppose that for each $i = 1, \dots, r$ and each configuration $\tau$ in $V \setminus A_i$, we are given an ergodic Markov chain $S^\tau_i$ that acts only on the tiling $A_i$, has $\tau$ as the fixed configuration in $V \setminus A_i$ and is reversible with respect to $\mu(\cdot \mid \tau)$. Given this family of Markov chains, we consider the tiled block dynamics that chooses a tiling $A_i$ uniformly at random from $\{A_1, \dots, A_r\}$ and updates the configuration in $A_i$ with a step of $S^\tau_i$, provided $\tau$ is the configuration in $V \setminus A_i$. We are able to show that the spectral gap of any such tiled block dynamics is determined by the spectral gap of the tiled heat-bath block dynamics (which is considered in Theorem 1.1) and the spectral gaps of the $S^\tau_i$'s. To bound the spectral gaps of the $S^\tau_i$'s we crucially use the fact that, by design, the $A_i$'s consist of non-interacting $d$-dimensional cubes of constant volume.

We use this methodology in the proof of Theorem 1.2 to show that the heat-bath block dynamics with exactly two blocks, one “even” block containing all the even vertices and an “odd” one with all the odd vertices, has a constant spectral gap provided SSM holds. For this, we consider the tiled block dynamics that picks a tiling $A_i$ uniformly at random and with probability $1/2$ performs a heat-bath update in all the even vertices in $A_i$, and otherwise in all the odd ones. The other part of the proof consists of establishing a comparison inequality between the spectral gaps of the even/odd heat-bath block dynamics (i.e., the block dynamics with exactly two blocks: the even and odd ones) and general heat-bath block dynamics (i.e., where the collection of blocks $\{A_1, \dots, A_r\}$ is arbitrary). For this, we use two key properties of the variance functional: monotonicity and tensorization.

To derive our results for the SW dynamics in Theorem 1.3 we introduce an auxiliary variant of the SW dynamics that only updates isolated vertices (instead of connected components of any size).
This isolated vertices variant can be compared to a tiled block dynamics that in a step updates all the isolated vertices in a single block $A_i$ chosen uniformly at random from $\{A_1, \dots, A_r\}$. Our comparison methodology above is then used to show that the spectral gap of this tiled block dynamics is $\Omega(1)$. To establish comparison inequalities between the spectral gaps of the SW dynamics, the isolated vertices variant of the SW dynamics and the tiled block dynamics that updates isolated vertices in a tiling, we use elementary functional analysis and the comparison framework of Ullrich [47, 48, 46].

The proof of our later theorem on systematic scan for monotone systems (Theorem 1.5) is loosely based on ideas from [15]. Finally, to establish our result for the alternating scan dynamics (Theorem 1.6), we relate the spectral gap of this dynamics to that of the even/odd heat-bath block dynamics, which we analyze in the proof of Theorem 1.2.
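The quantity $L(\mathcal{O})$ and the alternating scan can be illustrated with a short sketch. It assumes the box $\{0, \dots, k-1\}^d$ with free boundary and, for the update itself, the Ising model with $U(s_1, s_2) = \beta\,\mathbb{1}(s_1 = s_2)$, spins labelled $\{0, 1\}$ and no external field; all function names are hypothetical. `longest_path_subsequence` computes $L(\mathcal{O})$ by dynamic programming over the ordering: the longest path subsequence ending at $v$ extends the longest one ending at a neighbor of $v$ that appears earlier. `alternating_scan_step` performs one step of the alternating scan. Since even vertices only have odd neighbors, updating them one after another is the same as a single heat-bath update of the whole even block, which is one way to see the connection between the alternating scan and the even/odd block dynamics mentioned above.

```python
import math
import random
from itertools import product


def neighbors(v, k):
    """Nearest neighbours of v in the box {0, ..., k-1}^d (free boundary)."""
    for i in range(len(v)):
        for step in (-1, 1):
            w = list(v)
            w[i] += step
            if 0 <= w[i] < k:
                yield tuple(w)


def even_odd_ordering(k, d):
    """The ordering EO: every even vertex first, then every odd vertex."""
    box = list(product(range(k), repeat=d))
    return [v for v in box if sum(v) % 2 == 0] + [v for v in box if sum(v) % 2 == 1]


def longest_path_subsequence(order, k):
    """L(O): number of vertices in the longest subsequence of `order` that is a
    path in the grid. best[v] is the longest such path ending at v."""
    position = {v: i for i, v in enumerate(order)}
    best = {}
    for v in order:
        earlier = [best[u] for u in neighbors(v, k) if position[u] < position[v]]
        best[v] = 1 + max(earlier, default=0)
    return max(best.values())


def heat_bath_update(sigma, v, k, beta, rng):
    """Resample sigma[v] in {0, 1} from the Ising conditional law given its
    neighbours (U(s1, s2) = beta * 1(s1 == s2), no field, free boundary)."""
    agree = [sum(1 for u in neighbors(v, k) if sigma[u] == s) for s in (0, 1)]
    w0, w1 = (math.exp(beta * a) for a in agree)
    sigma[v] = 0 if rng.random() < w0 / (w0 + w1) else 1


def alternating_scan_step(sigma, k, d, beta, rng):
    """One step of the alternating scan: update each vertex once, in EO order."""
    for v in even_odd_ordering(k, d):
        heat_bath_update(sigma, v, k, beta, rng)
    return sigma


if __name__ == "__main__":
    k, d = 6, 2
    print(longest_path_subsequence(even_odd_ordering(k, d), k))  # 2
    row_major = sorted(product(range(k), repeat=d))
    print(longest_path_subsequence(row_major, k))  # 11, i.e. 2k - 1
    init_rng = random.Random(1)
    sigma = {v: init_rng.randint(0, 1) for v in row_major}
    alternating_scan_step(sigma, k, d, beta=0.3, rng=random.Random(0))
```

The row-major comparison shows why the ordering matters: there, monotone lattice paths are subsequences of the ordering, so $L(\mathcal{O})$ grows with the side length of the box, whereas $L(EO) = 2$ regardless of its size.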
Background
Let $\mathbb{L} = (\mathbb{Z}^d, \mathbb{E})$ be the infinite $d$-dimensional lattice graph, where for $u, v \in \mathbb{Z}^d$, $(u, v) \in \mathbb{E}$ iff $\|u - v\|_1 = 1$. Let $V$ be a finite subset of $\mathbb{Z}^d$ and let $G = (V, E)$ be the induced subgraph. We use $\partial V$ to denote the boundary of $G$, i.e., the set of vertices in $\mathbb{Z}^d \setminus V$ connected by an edge in $\mathbb{E}$ to $V$.

A spin system on $G$ consists of a set of spins $S = \{1, \dots, q\}$, a symmetric edge potential $U : S \times S \to \mathbb{R}$ and a vertex potential $W : S \to \mathbb{R}$. A configuration $\sigma : V \to S$ of the system is an assignment of spins to the vertices of $G$; we denote by $\Omega$ the set of all configurations. A boundary condition $\psi$ for $G$ is an assignment of spins to some (or all) vertices in $\partial V$; i.e., $\psi : A_\psi \to S$ with $A_\psi \subset \partial V$. The boundary condition where $A_\psi = \emptyset$ is called the free boundary condition.

Given a boundary condition $\psi$, each configuration $\sigma \in \Omega$ is assigned probability
$$\mu^\psi(\sigma) = \frac{1}{Z} \cdot e^{-H^\psi_G(\sigma)},$$
where $Z$ is the normalizing constant and
$$H^\psi_G(\sigma) = -\sum_{(u,v) \in E} U(\sigma(u), \sigma(v)) - \sum_{(u,v) \in \mathbb{E}:\, u \in A_\psi,\, v \in V} U(\psi(u), \sigma(v)) - \sum_{u \in V} W(\sigma(u)).$$
In the statistical physics literature, $Z$ is called the partition function and $H^\psi_G$ the Hamiltonian of the system.

A particularly well known and widely studied spin system is the
Ising/Potts model, where $S = \{1, \dots, q\}$, $U(s_1, s_2) = \beta \cdot \mathbb{1}(s_1 = s_2)$ and $W(s) = \beta h_s$. The parameter $\beta \in \mathbb{R}$ is related to the inverse temperature of the system and $(h_1, \dots, h_q) \in \mathbb{R}^q$ to an external magnetic field. In Section 4 we analyze dynamics for the Ising/Potts model with ferromagnetic interactions ($\beta > 0$) and no external field ($h_i = 0$ for all $i$).

Remark. There are important spin systems, such as the hard-core model and the antiferromagnetic Potts model at zero temperature (proper $q$-colorings), that require the edge potential $U$ to be infinite for certain configurations; namely, there are hard constraints in the system that make certain configurations invalid. Our results in Sections 3, 5 and 6 hold in this more general setting provided the system is permissive. A spin system is permissive if for any $V \subset \mathbb{Z}^d$ and any configuration $\tau$ on $\mathbb{Z}^d \setminus V$, there is at least one configuration $\sigma$ on $V$ such that $\mu(\sigma \mid \tau) > 0$. This ensures that the measure $\mu(\cdot \mid \tau)$ is well-defined. It is easy to verify that, in addition to systems without hard constraints, the hard-core model for all $\lambda > 0$ and proper $q$-colorings when $q \ge 2d + 1$ are all permissive systems.

Consider the spin system $(S = \{1, \dots, q\}, U, W)$ on $G = (V, E)$ with a fixed boundary condition $\psi$. Let $M$ be a Markov chain that, given a configuration $\sigma$ on $V$, performs the following update:
1. Pick $v \in V$ uniformly at random (u.a.r.);
2. Replace $\sigma(v)$ with a spin from $S = \{1, \dots, q\}$ sampled according to the distribution $\mu^\psi(\cdot \mid \sigma(V \setminus v))$.
This Markov chain is called the (heat-bath) Glauber dynamics. $M$ is clearly reversible with respect to (w.r.t.) $\mu^\psi$ and, to avoid complications, we assume that it is irreducible. (This is always the case in systems without hard constraints, but $M$ could be reducible for some permissive systems; e.g., proper $q$-colorings when $q = 2d + 1$.)
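The Gibbs measure $\mu^\psi$ and the heat-bath Glauber update can be made concrete on a tiny example. The sketch below assumes the Potts potentials $U(s_1, s_2) = \beta\,\mathbb{1}(s_1 = s_2)$ and $W(s) = \beta h_s$ defined above, spins labelled $0, \dots, q-1$, and a boundary condition given as a dictionary `psi` of fixed boundary spins together with the list of boundary edges; the function names are hypothetical. It enumerates all $q^{|V|}$ configurations, so it is only meant for a handful of vertices, and it checks numerically that the Glauber transition probabilities satisfy detailed balance with respect to $\mu^\psi$.

```python
import math
from itertools import product


def hamiltonian(sigma, edges, boundary_edges, psi, beta, h):
    """H^psi_G(sigma) for the Potts model, with U(s1, s2) = beta * 1(s1 == s2) and
    W(s) = beta * h[s]. `boundary_edges` holds pairs (u, v) with u a boundary vertex
    in the domain of psi and v in V."""
    energy = -sum(beta * (sigma[u] == sigma[v]) for u, v in edges)
    energy -= sum(beta * (psi[u] == sigma[v]) for u, v in boundary_edges)
    energy -= sum(beta * h[s] for s in sigma.values())
    return energy


def gibbs(vertices, edges, boundary_edges, psi, q, beta, h):
    """Exact mu^psi by enumerating all q^|V| configurations (tiny V only).
    Configurations are tuples listing the spins of `vertices` in order."""
    weights = {}
    for spins in product(range(q), repeat=len(vertices)):
        sigma = dict(zip(vertices, spins))
        weights[spins] = math.exp(-hamiltonian(sigma, edges, boundary_edges, psi, beta, h))
    Z = sum(weights.values())  # the partition function
    return {c: w / Z for c, w in weights.items()}


def glauber_kernel(mu, n, q):
    """Heat-bath Glauber transition probabilities P[(x, y)]: pick one of the n
    vertices u.a.r. and resample its spin from mu conditioned on all other spins."""
    P = {}
    for x in mu:
        for i in range(n):
            fiber = [x[:i] + (s,) + x[i + 1:] for s in range(q)]  # agree with x off vertex i
            mass = sum(mu[y] for y in fiber)
            for y in fiber:
                P[x, y] = P.get((x, y), 0.0) + mu[y] / (mass * n)
    return P


if __name__ == "__main__":
    # Path a - b - c in V, plus a boundary vertex "z" attached to a with psi(z) = 0.
    vertices, edges = ["a", "b", "c"], [("a", "b"), ("b", "c")]
    boundary_edges, psi = [("z", "a")], {"z": 0}
    q, beta, h = 3, 0.7, [0.1, 0.0, -0.2]
    mu = gibbs(vertices, edges, boundary_edges, psi, q, beta, h)
    P = glauber_kernel(mu, len(vertices), q)
    # Detailed balance mu(x) P(x, y) = mu(y) P(y, x), i.e. reversibility w.r.t. mu^psi.
    worst = max(abs(mu[x] * p - mu[y] * P[y, x]) for (x, y), p in P.items())
    print(worst < 1e-12)  # True
```

Detailed balance holds because two configurations that differ only at vertex $i$ share the same normalizing mass in the conditional law, so $\mu(x)P(x,y)$ and $\mu(y)P(y,x)$ both equal $\mu(x)\mu(y)/(n \cdot \text{mass})$.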
2.3 Strong spatial mixing (SSM)

Several notions of decay of correlations in spin systems have been useful in the analysis of local algorithms. A particularly important one is SSM, which says that the influence of a set on another decays exponentially with the distance between these sets.

For a fixed finite $V \subset \mathbb{Z}^d$ and $a, b > 0$, let $C(V, a, b)$ be the condition that for all $B \subset V$, all $u \in \partial V$, and any pair of boundary conditions $\psi, \psi^u$ on $\partial V$ that differ only at $u$, we have
$$\|\mu^\psi_B - \mu^{\psi^u}_B\|_{\mathrm{tv}} \le b \exp(-a \cdot \mathrm{dist}(u, B)), \tag{2}$$
where $\mu^\psi_B$ and $\mu^{\psi^u}_B$ are the probability measures induced in $B$ by $\mu^\psi$ and $\mu^{\psi^u}$, respectively, $\|\cdot\|_{\mathrm{tv}}$ denotes total variation distance and $\mathrm{dist}(u, B) = \min_{v \in B} \|u - v\|$.

Definition 2.1.
A spin system on $\mathbb{Z}^d$ has SSM if there exist $a, b > 0$ such that $C(\Lambda, a, b)$ holds for every $d$-dimensional cube $\Lambda \subset \mathbb{Z}^d$.

Remark. The definition of SSM varies in the literature. The main difference lies in the class of subsets $V \subset \mathbb{Z}^d$ for which $C(V, a, b)$ is required to hold. The two boundary conditions may also differ on a larger subset of $\partial V$. We work here with one of the weakest versions of SSM. In particular, this notion is known to hold for the Ising/Potts model on $\mathbb{Z}^2$ for all $q \ge 2$ and $\beta < \beta_c(q)$, where $\beta_c(q)$ is the uniqueness threshold.

Let $M$ be an ergodic Markov chain over $\Omega$ with stationary distribution $\mu^\psi$. Let $M^t(X_0, \cdot)$ denote the distribution of $M$ after $t$ steps starting from $X_0 \in \Omega$, and let
$$\tau_{\mathrm{mix}}(M, \varepsilon) = \max_{X_0 \in \Omega} \min\left\{ t \ge 0 : \|M^t(X_0, \cdot) - \mu^\psi\|_{\mathrm{tv}} \le \varepsilon \right\}.$$
The mixing time of $M$ is defined as $\tau_{\mathrm{mix}}(M) = \tau_{\mathrm{mix}}(M, 1/4)$.

A (one step) coupling of the Markov chain $M$ specifies, for every pair of states $(X_t, Y_t) \in \Omega \times \Omega$, a probability distribution over $(X_{t+1}, Y_{t+1})$ such that the processes $\{X_t\}$ and $\{Y_t\}$, viewed in isolation, are faithful copies of $M$, and if $X_t = Y_t$ then $X_{t+1} = Y_{t+1}$. Let $T_{\mathrm{coup}}(\varepsilon)$ be the minimum $T$ such that $\Pr[X_T \ne Y_T] \le \varepsilon$, maximized over pairs of initial configurations $X_0, Y_0$. The following inequality is standard: $\tau_{\mathrm{mix}}(M, \varepsilon) \le T_{\mathrm{coup}}(\varepsilon)$ (see, e.g., [29]). The coupling time is $T_{\mathrm{coup}} = T_{\mathrm{coup}}(1/4)$ and thus $\tau_{\mathrm{mix}}(M) \le T_{\mathrm{coup}}$. Moreover, if $T = k \cdot T_{\mathrm{coup}}$ for any positive integer $k$, then
$$\Pr[X_T \ne Y_T] \le 1/4^k. \tag{3}$$

Our proofs use elementary notions from functional analysis, which we briefly review here. For extensive background on the application of such ideas to the analysis of finite Markov chains, see [39, 37].

Let $P$ be the transition matrix of a finite irreducible Markov chain with state space $\Omega$ and stationary distribution $\mu$. For any $f \in \mathbb{R}^{|\Omega|}$, we let $Pf(x) = \sum_{y \in \Omega} P(x, y) f(y)$.
If we endow $\mathbb{R}^{|\Omega|}$ with the inner product $\langle f, g \rangle_\mu = \sum_{x \in \Omega} f(x) g(x) \mu(x)$, we obtain a Hilbert space denoted $L_2(\mu) = (\mathbb{R}^{|\Omega|}, \langle \cdot, \cdot \rangle_\mu)$ and $P$ defines an operator from $L_2(\mu)$ to $L_2(\mu)$. The Cauchy-Schwarz inequality implies
$$\langle f, Pf \rangle_\mu \le \langle f, f \rangle_\mu. \tag{4}$$
Consider two Hilbert spaces $S_1$ and $S_2$ with inner products $\langle \cdot, \cdot \rangle_{S_1}$ and $\langle \cdot, \cdot \rangle_{S_2}$ respectively, and let $K : S_2 \to S_1$ be a bounded linear operator. The adjoint of $K$ is the unique operator $K^* : S_1 \to S_2$ satisfying $\langle f, Kg \rangle_{S_1} = \langle K^* f, g \rangle_{S_2}$ for all $f \in S_1$ and $g \in S_2$. If $S_1 = S_2$, $K$ is self-adjoint when $K = K^*$.

In our setting, the adjoint of $P$ in $L_2(\mu)$ is given by the transition matrix $P^*(x, y) = \mu(y) P(y, x) / \mu(x)$, and therefore $P$ is self-adjoint iff $P$ is reversible w.r.t. $\mu$. In this case the spectrum of $P$ is real and we let $1 = \lambda_1 > \lambda_2 \ge \dots \ge \lambda_{|\Omega|} \ge -1$ denote its eigenvalues ($1 > \lambda_2$ because $P$ is irreducible). The absolute spectral gap of $P$ is defined by $\lambda(P) = 1 - \lambda^*$, where $\lambda^* = \max\{|\lambda_2|, |\lambda_{|\Omega|}|\}$. If $P$ is ergodic (i.e., irreducible and aperiodic), then $\lambda(P) > 0$, and it is a standard fact that for all $\varepsilon > 0$ all reversible Markov chains satisfy
$$\tau_{\mathrm{mix}}(P, \varepsilon) \ge \left(\lambda(P)^{-1} - 1\right) \log\left(\frac{1}{2\varepsilon}\right) \tag{5}$$
(see Theorem 12.4 in [29]). $\lambda(P)^{-1}$ is called the relaxation time.

$P$ is positive semidefinite if $P = P^*$ and $\langle f, Pf \rangle_\mu \ge 0$ for all $f \in \mathbb{R}^{|\Omega|}$. In this case $P$ has only nonnegative eigenvalues. The Dirichlet form of a reversible Markov chain is defined as
$$\mathcal{E}_P(f, f) = \langle f, (I - P) f \rangle_\mu = \frac{1}{2} \sum_{x, y \in \Omega} \mu(x) P(x, y) (f(x) - f(y))^2,$$
for any $f \in \mathbb{R}^{|\Omega|}$.
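These definitions can be checked numerically on a toy chain. The sketch below assumes a two-state chain $P = \begin{pmatrix} 1-p & p \\ s & 1-s \end{pmatrix}$ with $p + s \le 1$, which is reversible and positive semidefinite with eigenvalues $1$ and $1 - p - s$, so its absolute spectral gap is $p + s$; the helper names are hypothetical. It evaluates the Dirichlet form and the variance directly from their definitions, computes $\tau_{\mathrm{mix}}(P, \varepsilon)$ by iterating the distribution from each starting state, and compares the result with the lower bound (5).

```python
import math


def stationary(P):
    """Stationary law of the two-state chain P = [[1 - p, p], [s, 1 - s]]."""
    p, s = P[0][1], P[1][0]
    return [s / (p + s), p / (p + s)]


def dirichlet_form(P, mu, f):
    """E_P(f, f) = (1/2) * sum_{x,y} mu(x) P(x, y) (f(x) - f(y))^2."""
    n = len(P)
    return 0.5 * sum(mu[x] * P[x][y] * (f[x] - f[y]) ** 2 for x in range(n) for y in range(n))


def variance(mu, f):
    """Var_mu(f) = sum_x (f(x) - mu(f))^2 mu(x)."""
    mean = sum(m * fx for m, fx in zip(mu, f))
    return sum(m * (fx - mean) ** 2 for m, fx in zip(mu, f))


def tv_distance(a, b):
    return 0.5 * sum(abs(x - y) for x, y in zip(a, b))


def mixing_time(P, mu, eps, t_max=100_000):
    """tau_mix(P, eps): maximum over starting states of the first t with
    ||P^t(x0, .) - mu||_tv <= eps, found by iterating the distribution."""
    n = len(P)
    worst = 0
    for x0 in range(n):
        dist = [1.0 if x == x0 else 0.0 for x in range(n)]
        t = 0
        while tv_distance(dist, mu) > eps:
            dist = [sum(dist[x] * P[x][y] for x in range(n)) for y in range(n)]
            t += 1
            if t > t_max:
                raise RuntimeError("no mixing within t_max steps")
        worst = max(worst, t)
    return worst


if __name__ == "__main__":
    p, s, eps = 0.1, 0.2, 0.25
    P = [[1 - p, p], [s, 1 - s]]  # eigenvalues 1 and 1 - p - s = 0.7 >= 0
    mu = stationary(P)
    gap = p + s                   # absolute spectral gap
    f = [3.0, -1.0]
    print(dirichlet_form(P, mu, f) / variance(mu, f))  # approximately 0.3, the gap
    tau = mixing_time(P, mu, eps)
    print(tau, (1 / gap - 1) * math.log(1 / (2 * eps)))  # 3 and about 1.62
```

On two states every non-constant $f$ attains the ratio $\mathcal{E}_P(f,f)/\mathrm{Var}_\mu(f) = p + s$, and the computed mixing time of 3 steps indeed exceeds the lower bound of about 1.62 given by (5).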
If $P$ is positive semidefinite, then the absolute spectral gap of $P$ satisfies
$$\lambda(P) = 1 - \lambda_2 = \min_{f \in \mathbb{R}^{|\Omega|},\ \mathrm{Var}_\mu(f) \ne 0} \frac{\mathcal{E}_P(f, f)}{\mathrm{Var}_\mu(f)}, \tag{6}$$
where $\mathrm{Var}_\mu(f) = \sum_{x \in \Omega} (f(x) - \mu(f))^2 \mu(x)$ and $\mu(f) = \sum_{x \in \Omega} f(x) \mu(x)$.

Let $V \subset \mathbb{Z}^d$ be a $d$-dimensional cube of volume $n$. Let $G = (V, E)$ be the induced subgraph and let $\psi$ be a fixed boundary condition on $\partial V$. For ease of notation we set $\mu = \mu^\psi$.

Let $\{A_1, \dots, A_r\}$ be a collection of sets (or blocks) such that $V = \bigcup_i A_i$. A block dynamics w.r.t. this collection of sets is a Markov chain that in each step picks a set $A_i$ uniformly at random from $\{A_1, \dots, A_r\}$ and updates the configuration in $A_i$. The heat-bath block dynamics corresponds to the case where the configuration in $A_i$ is replaced by a new configuration distributed according to the conditional measure in $A_i$ given the configuration in $V \setminus A_i$.

In this section we consider two different versions of the block dynamics for a particular collection of sets, which, with slight abuse of terminology, we call tilings. The steps of this dynamics can be efficiently implemented in parallel, so we believe it is interesting in its own right. Moreover, the mixing time and spectral gap bounds we derive here will be crucially used later in our proofs in Sections 4 and 5, where we consider the SW dynamics and general block dynamics, respectively.

We define the collection of blocks first, which we denote $\mathcal{D}$. Let $L \ll n^{1/d}$ be an odd integer. For each $x_i \in \{0, \dots, L+2\}^d \subset \mathbb{Z}^d$, let $C(x_i)$ be the union of all $d$-dimensional cubes of side length $L - 1$ with centers at $x_i + \vec{h}(L + 3)$ for some $\vec{h} \in \mathbb{Z}^d$. The cubes in $C(x_i)$ have volume $L^d$ and are at distance $4$ from each other (see Figure 1). For each $x_i \in \{0, \dots, L+2\}^d$, let $B_i = C(x_i) \cap V$ and let $\mathcal{D} = \{B_1, B_2, \dots, B_m\}$; then $m = (L + 3)^d$.
We call each $B_i$ a tiling of $V$ since it corresponds to a tiling of $\mathbb{Z}^d$ with cubes of side length $L + 3$. Any block dynamics w.r.t. $\mathcal{D}$ is called a tiled block dynamics.

Footnote: For $A \subset \mathbb{Z}^d$, the volume of $A$ is $|A|$.

[Figure 1: Three distinct tilings of $V$, each made of cubes of side length $L - 1$ at distance $4$ from each other. In (a) vertex $v$ is in the interior of the tiling; in (b) vertex $v$ is in the exterior; and in (c) vertex $v$ is right on the boundary of the tiling.]

Remark. In our proofs we will choose $L$ to be a sufficiently large constant independent of $n$. The distance between the $d$-dimensional cubes is chosen so that neighboring cubes do not interact. This distance is sufficient because we are considering spin systems with only nearest-neighbor interactions. To extend our proofs to arbitrary finite range spin systems on $\mathbb{Z}^d$ it suffices to choose a larger distance between these cubes.

Let $\mathcal{B}_\mathcal{D}$ be the transition matrix of the heat-bath tiled block dynamics. That is, given a configuration $\sigma_t \in \Omega$ at time $t$, the chain proceeds as follows:
1. Pick $k \in \{1, \dots, m\}$ u.a.r.;
2. Update the configuration in $B_k$ with a sample from $\mu(\cdot \mid \sigma_t(V \setminus B_k))$.
This chain is clearly ergodic and reversible w.r.t. $\mu$. We prove the following lemma, which corresponds to Theorem 1.1 from the introduction.

Lemma 3.1.
When $L$ is a sufficiently large constant (independent of $n$), SSM implies that $\tau_{\mathrm{mix}}(\mathcal{B}_\mathcal{D}) = O(\log n)$ and $\lambda(\mathcal{B}_\mathcal{D}) \ge 1/8$.

Proof. The proof is a generalization of the path coupling argument in [15]. Let $X_t$ and $Y_t$ be two copies of the tiled heat-bath block dynamics $\mathcal{B}_\mathcal{D}$ that differ at a single vertex $v \in V$. We construct a coupling of the steps of $\mathcal{B}_\mathcal{D}$ such that the expected number of disagreements between $X_{t+1}$ and $Y_{t+1}$ is strictly less than one.

The region chosen in step 1 of the chain is the same in both copies. For every tiling $B_k$ there are three possibilities (see Figure 1):
(a) $v \in B_k$, in which case we use the same configuration for $B_k$ in both copies and so $X_{t+1} = Y_{t+1}$ with probability 1;
(b) $v \in V \setminus (B_k \cup \partial B_k)$, and again we use the same configuration to update $B_k$ in both copies. Then, $X_{t+1}$ and $Y_{t+1}$ differ only at $v$ with probability 1; or
(c) $v \in \partial B_k$. In this case disagreements could propagate from $v$ to the interior of $B_k$, but we describe next a coupling that limits the extent of such propagation.

Case (a) occurs with probability $L^d/(L+3)^d \ge 1/2$, for large enough $L$. Let us consider case (c); i.e., $v \in \partial B_k$. This case occurs with probability at most $2dL^{d-1}/(L+3)^d \le 2d/L$. Moreover, $v$ is in the boundary of exactly one of the smaller cubes (of side length at most $L - 1$) in $B_k$, which we denote $\Lambda$. The cube $\Lambda$ can be partitioned into the sets of vertices that are close to and far from $v$. More precisely, let $R = \frac{1}{2}\left(\frac{L}{8d}\right)^{1/d}$ (so that $(2R)^d = L/(8d)$), $C = \{u \in \Lambda : \mathrm{dist}(u, v) \le R\}$ and $F = \Lambda \setminus C$. SSM implies
$$\|\mu^\psi_F - \mu^{\psi_v}_F\|_{\mathrm{tv}} \le b \exp\{-a\, \mathrm{dist}(v, F)\},$$
where $\psi$ and $\psi_v$ are the two boundary conditions induced in $\Lambda$ by $X_t$ and $Y_t$, respectively, and thus differ only at $v$.
This implies that there is a coupling of the distributions $\mu^\psi_F$ and $\mu^{\psi_v}_F$ such that if $(Z_1, Z_2)$ is a sample from this coupling (so $Z_1$ and $Z_2$ are configurations on $F$), then
$$\Pr[Z_1 \ne Z_2] \le b \exp\{-a\, \mathrm{dist}(v, F)\} \le b \exp\{-aR\} \le \frac{1}{L^d},$$
where the last inequality holds for large enough $L$. Hence, we can couple the update on $\Lambda$ such that $X_{t+1}$ and $Y_{t+1}$ disagree on $F$ with probability at most $L^{-d}$. Then, the expected number of disagreements in $\Lambda$ is crudely bounded by
$$|C| + \frac{|F|}{L^d} \le (2R)^d + 1 \le \frac{L}{8d} + 1.$$
The same configuration is used to update both copies in $B_k \setminus \Lambda$ and so $X_{t+1}(B_k \setminus \Lambda) = Y_{t+1}(B_k \setminus \Lambda)$ with probability one. This is possible because the configuration in the boundary of $B_k \setminus \Lambda$ is the same in both $X_t$ and $Y_t$.

Combining all these facts, we get that there is a coupling such that the expected number of disagreements at time $t + 1$ is at most
$$1 - \frac{1}{2} + \frac{2d}{L}\left(\frac{L}{8d} + 1\right) = \frac{3}{4} + \frac{2d}{L} \le \frac{7}{8},$$
provided that $L$ is large enough. The path coupling method [7] then implies that
$$\max_{\sigma \in \Omega} \|\mathcal{B}_\mathcal{D}^t(\sigma, \cdot) - \mu(\cdot)\|_{\mathrm{tv}} \le n \left(\frac{7}{8}\right)^t.$$
This implies that the mixing time of $\mathcal{B}_\mathcal{D}$ is $O(\log n)$ and that $\lambda^*(\mathcal{B}_\mathcal{D}) \le 7/8$ (see, e.g., Corollary 12.6 in [29]); hence, $\lambda(\mathcal{B}_\mathcal{D}) = 1 - \lambda^*(\mathcal{B}_\mathcal{D}) \ge 1/8$ as claimed.

In this subsection we introduce a more general class of tiled block dynamics and relate the spectral gaps of the dynamics in this class to that of the heat-bath tiled block dynamics. This will allow us to deduce bounds for the spectral gaps of various tiled block dynamics, a key step in our comparison methodology.

Each dynamics in this class chooses a tiling $B_k$ uniformly at random from $\mathcal{D}$ and updates the configuration in $B_k$ in a reversible fashion. Formally, for each $1 \le k \le m$ and each valid configuration $\tau$ in $B_k^c = V \setminus B_k$, let $S^\tau_k$ be the transition matrix of an ergodic Markov chain whose state space is the set of valid configurations in $B_k$ given that $\tau$ is the configuration in $B_k^c$. That is, $S^\tau_k$ is a Markov chain acting on the specific tiling $B_k$ with $\tau$ as the fixed configuration in the exterior of $B_k$. We assume that, for each $k$ and $\tau$, $S^\tau_k$ is reversible w.r.t. $\mu(\cdot \mid \tau)$ and positive semidefinite. Using the $S^\tau_k$'s we define a tiled block dynamics as follows. Given a spin configuration $\sigma_t \in \Omega$, consider the chain that performs the following update to obtain $\sigma_{t+1} \in \Omega$:
1. Pick $k \in \{1, \dots, m\}$ u.a.r.;
2. If $\tau = \sigma_t(B_k^c)$, let $\sigma_{t+1}(B_k^c) = \tau$ and perform a step of $S^\tau_k$ to obtain $\sigma_{t+1}(B_k)$.
Let $\mathcal{S}_\mathcal{D}$ denote the transition matrix of this chain. The ergodicity and reversibility of $\mathcal{S}_\mathcal{D}$ w.r.t. $\mu$ follow from the ergodicity and reversibility of the $S^\tau_k$'s w.r.t. $\mu(\cdot \mid \tau)$. We establish the following inequality between the spectral gaps of $\mathcal{B}_\mathcal{D}$ and $\mathcal{S}_\mathcal{D}$. For $A \subset V$, let $\Omega(A)$ be the set of the valid configurations of $A$. Then,

Lemma 3.2.
$$\lambda(\mathcal{S}_\mathcal{D}) \ge \lambda(\mathcal{B}_\mathcal{D})\min_{k=1,\dots,m}\ \min_{\tau\in\Omega(B_k^c)}\lambda(S^\tau_k).$$

In words, this inequality states that the spectral gap of a generic tiled block dynamics $\mathcal{S}_\mathcal{D}$ is bounded from below by the spectral gap of the tiled heat-bath block dynamics times the smallest spectral gap of any of the $S^\tau_k$'s. This is indeed a natural inequality, since roughly $\lambda^{-1}(S^\tau_k)$ steps of $S^\tau_k$ should be enough to simulate one step of $\mathcal{B}_\mathcal{D}$ in $B_k$ when $\tau$ is the configuration in $B_k^c$. Lemmas 3.1 and 3.2 put together allow us to bound the spectral gap of a general class of tiled block dynamics, provided that SSM holds and that we know the spectral gaps of the $S^\tau_k$'s. As we shall see in our later applications of these results, the geometry of the tilings in $\mathcal{D}$ was chosen in a way that facilitates the analysis of many natural choices of the $S^\tau_k$'s.

Before proving Lemma 3.2 we state the two standard properties of heat-bath updates which will be used in the proof. For $A \subset V$ let $K_A$ be the transition matrix that corresponds to a heat-bath update in the set $A$. That is, for $\sigma,\sigma' \in \Omega$,
$$K_A(\sigma,\sigma') = \mathbb{1}(\sigma(A^c) = \sigma'(A^c))\,\mu(\sigma'(A)\mid\sigma(A^c)).$$
For ease of notation let $\mathcal{E}_A$ denote the Dirichlet form of $K_A$; i.e., $\mathcal{E}_A = \mathcal{E}_{K_A}$.

Fact 3.3. $K_A$ is positive semidefinite. Moreover, for any $f \in \mathbb{R}^{|\Omega|}$,
$$\mathcal{E}_A(f,f) = \sum_{\tau\in\Omega(A^c)}\mathrm{Var}^\tau_A(f)\,\mu(\tau),$$
where $\mathrm{Var}^\tau_A(f) = \mathbb{E}^\tau_A[(f - \mathbb{E}^\tau_A[f])^2]$ and $\mathbb{E}^\tau_A[f] = \sum_{\sigma\in\Omega(A)} f(\sigma\cup\tau)\,\mu(\sigma\mid\tau)$.

We proceed with the proof of Lemma 3.2.
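Fact 3.3 can be sanity-checked by brute force on a toy example. The Python sketch below is only an illustration under assumptions not taken from the paper: a $\pm1$ Ising model on a path of four vertices with free boundary and $\beta = 0.7$ stands in for the general spin system, and all function names are hypothetical. It builds $K_A$ entrywise from the definition above, compares $\mathcal{E}_A(f,f)$ with $\sum_\tau \mathrm{Var}^\tau_A(f)\,\mu(\tau)$, and checks that $K_A^2 = K_A$ and $\langle f, K_A f\rangle_\mu \ge 0$.

```python
import itertools
import math
import random

# Illustrative sketch; the toy Ising measure and all names are assumptions.


def ising_path(n, beta):
    """Toy stand-in for mu: Ising model on the path 0-1-...-(n-1),
    spins in {-1, +1}, free boundary, no external field."""
    states = list(itertools.product((-1, 1), repeat=n))
    w = [math.exp(beta * sum(s[i] * s[i + 1] for i in range(n - 1))) for s in states]
    z = sum(w)
    return states, [x / z for x in w]


def outside(s, A):
    """Restriction of the configuration s to the complement of A."""
    return tuple(x for i, x in enumerate(s) if i not in A)


def heat_bath_kernel(states, mu, A):
    """K_A(s, t) = 1(s, t agree off A) * mu(t(A) | s(A^c))."""
    K = [[0.0] * len(states) for _ in states]
    for i, s in enumerate(states):
        block = [j for j, t in enumerate(states) if outside(t, A) == outside(s, A)]
        mass = sum(mu[j] for j in block)  # mu(tau) with tau = s(A^c)
        for j in block:
            K[i][j] = mu[j] / mass
    return K


def dirichlet_form(K, mu, f):
    """E_K(f, f) = 1/2 sum_{s,t} mu(s) K(s,t) (f(s) - f(t))^2."""
    n = len(mu)
    return 0.5 * sum(mu[i] * K[i][j] * (f[i] - f[j]) ** 2
                     for i in range(n) for j in range(n))


def expected_conditional_variance(states, mu, A, f):
    """Sum over tau in Omega(A^c) of Var^tau_A(f) * mu(tau)."""
    groups = {}
    for i, s in enumerate(states):
        groups.setdefault(outside(s, A), []).append(i)
    total = 0.0
    for idx in groups.values():
        m = sum(mu[i] for i in idx)                            # mu(tau)
        mean = sum(mu[i] * f[i] for i in idx) / m              # E^tau_A[f]
        total += sum(mu[i] * (f[i] - mean) ** 2 for i in idx)  # mu(tau) Var^tau_A(f)
    return total


def quadratic_form(K, mu, f):
    """<f, K f>_mu."""
    n = len(f)
    return sum(mu[i] * f[i] * sum(K[i][j] * f[j] for j in range(n)) for i in range(n))


def idempotency_error(K):
    """max |(K^2 - K)(s, t)|."""
    n = len(K)
    return max(abs(sum(K[i][k] * K[k][j] for k in range(n)) - K[i][j])
               for i in range(n) for j in range(n))


if __name__ == "__main__":
    states, mu = ising_path(4, beta=0.7)
    rng = random.Random(1)
    f = [rng.uniform(-1.0, 1.0) for _ in states]
    K = heat_bath_kernel(states, mu, {1, 2})
    print(dirichlet_form(K, mu, f), expected_conditional_variance(states, mu, {1, 2}, f))
```

The two printed numbers should agree up to rounding; the same check works for any block $A$ of the path.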
Proof of Lemma 3.2.
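Let $f \in \mathbb{R}^{|\Omega|}$. Since $\mathcal{B}_\mathcal{D} = \frac{1}{m}\sum_{k=1}^m K_{B_k}$,
$$\mathcal{E}_{\mathcal{B}_\mathcal{D}}(f,f) = \frac{1}{m}\sum_{k=1}^m\mathcal{E}_{B_k}(f,f) = \frac{1}{m}\sum_{k=1}^m\sum_{\tau\in\Omega(B_k^c)}\mathrm{Var}^\tau_{B_k}(f)\,\mu(\tau) \tag{7}$$
by Fact 3.3. For $\tau\in\Omega(B_k^c)$, let $\Omega^\tau(B_k)$ be the set of valid configurations on $B_k$ given that $\tau$ is the configuration on $V\setminus B_k$. For $f\in\mathbb{R}^{|\Omega|}$, let $f_\tau\in\mathbb{R}^{|\Omega^\tau(B_k)|}$ be such that $f_\tau(\sigma) = f(\sigma\cup\tau)$ for any $\sigma\in\Omega^\tau(B_k)$. By assumption, $S^\tau_k$ is positive semidefinite, ergodic and reversible w.r.t. $\mu(\cdot\mid\tau)$. Since also $\mathrm{Var}_{\mu(\cdot\mid\tau)}(f_\tau) = \mathrm{Var}^\tau_{B_k}(f)$, from (6) we get, whenever $\mathrm{Var}^\tau_{B_k}(f) > 0$,
$$0 < \lambda(S^\tau_k) \le \frac{\mathcal{E}_{S^\tau_k}(f_\tau,f_\tau)}{\mathrm{Var}_{\mu(\cdot\mid\tau)}(f_\tau)} = \frac{\mathcal{E}_{S^\tau_k}(f_\tau,f_\tau)}{\mathrm{Var}^\tau_{B_k}(f)}. \tag{8}$$
Let $\lambda_{\min} = \min_{k=1,\dots,m}\min_{\tau\in\Omega(B_k^c)}\lambda(S^\tau_k)$. By the definition of $\mathcal{S}_\mathcal{D}$,
$$\mathcal{E}_{\mathcal{S}_\mathcal{D}}(f,f) = \frac{1}{m}\sum_{k=1}^m\sum_{\tau\in\Omega(B_k^c)}\mu(\tau)\,\mathcal{E}_{S^\tau_k}(f_\tau,f_\tau), \tag{9}$$
and therefore, by (8) and (7),
$$\mathcal{E}_{\mathcal{S}_\mathcal{D}}(f,f) \ge \frac{1}{m}\sum_{k=1}^m\sum_{\tau\in\Omega(B_k^c)}\mu(\tau)\,\lambda(S^\tau_k)\,\mathrm{Var}^\tau_{B_k}(f) \ge \lambda_{\min}\,\mathcal{E}_{\mathcal{B}_\mathcal{D}}(f,f).$$
Finally, we claim that both $\mathcal{B}_\mathcal{D}$ and $\mathcal{S}_\mathcal{D}$ are positive semidefinite. $\mathcal{B}_\mathcal{D}$ is an average of heat-bath updates, each of which is positive semidefinite by Fact 3.3. Hence, $\mathcal{B}_\mathcal{D}$ is positive semidefinite. Similarly, the positivity of $\mathcal{S}_\mathcal{D}$ follows from the fact that by assumption the $S^\tau_k$'s are positive semidefinite. Indeed, from (9) and the definition of the Dirichlet form, we get
$$\langle f,\mathcal{S}_\mathcal{D}f\rangle_\mu = \frac{1}{m}\sum_{k=1}^m\sum_{\tau\in\Omega(B_k^c)}\mu(\tau)\,\langle f_\tau, S^\tau_kf_\tau\rangle_{\mu(\cdot\mid\tau)} \ge 0.$$
Therefore, by (6), $\lambda(\mathcal{S}_\mathcal{D}) \ge \lambda(\mathcal{B}_\mathcal{D})\,\lambda_{\min}$, as claimed.

We conclude this section with the proof of Fact 3.3.

Proof of Fact 3.3.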
Since $K_A^2 = K_A^* = K_A$, we have $\langle f, K_Af\rangle_\mu = \langle K_Af, K_Af\rangle_\mu \ge 0$, so $K_A$ is positive semidefinite. For $\tau\in\Omega(A^c)$, let $\Omega^\tau(A)$ be the set of valid configurations on $A$ when the configuration on $V\setminus A$ is $\tau$. Then, by the definition of the Dirichlet form,
$$\begin{aligned}\mathcal{E}_A(f,f) &= \frac12\sum_{\tau\in\Omega(A^c)}\sum_{\sigma,\sigma'\in\Omega^\tau(A)}\mu(\sigma\cup\tau)\,\mu(\sigma'\mid\tau)\,\big(f(\sigma\cup\tau)-f(\sigma'\cup\tau)\big)^2\\ &= \frac12\sum_{\tau\in\Omega(A^c)}\mu(\tau)\sum_{\sigma,\sigma'\in\Omega^\tau(A)}\mu(\sigma\mid\tau)\,\mu(\sigma'\mid\tau)\,\big(f(\sigma\cup\tau)-f(\sigma'\cup\tau)\big)^2\\ &= \sum_{\tau\in\Omega(A^c)}\mathrm{Var}^\tau_A(f)\,\mu(\tau).\end{aligned}$$

In this section we show that SSM implies fast mixing of the
Swendsen-Wang (SW) dynamics. In particular, we prove that when $V\subset\mathbb{Z}^d$ is a finite $d$-dimensional cube, the relaxation time (i.e., the inverse spectral gap) of the SW dynamics on the graph induced by $V$ is $O(1)$, provided the system has SSM.

The SW dynamics is a non-local Markov chain for the ferromagnetic Potts model ($\beta > 0$) with no external field ($h_i = 0$ for all $i$); see Section 2.1 for the definition of this model. The state space of the SW dynamics is the set of Potts configurations $\Omega_P$, and it is straightforward to verify the reversibility of this chain w.r.t. the Potts measure, which, for distinctness, we will denote $\pi$ (see, e.g., [16]). We focus here on the free boundary condition case for clarity, but our results hold without significant modifications for the SW dynamics with arbitrary boundary conditions.

Let $V\subset\mathbb{Z}^d$ be a $d$-dimensional cube of volume $n$ and let $G = (V,E)$ be the induced subgraph. Given a Potts configuration $\sigma_t$, a step of the SW dynamics results in a new configuration $\sigma_{t+1}$ as follows:

1. Add each monochromatic edge independently with probability $p = 1 - e^{-\beta}$ to obtain a joint configuration $(A_t,\sigma_t)$, where $A_t\subseteq E$ and an edge $(u,v)$ is monochromatic if $\sigma_t(u) = \sigma_t(v)$;
2. Assign to each connected component of $(V,A_t)$ independently a new spin from $\{1,\dots,q\}$ u.a.r.;
3. Remove all edges to obtain the new Potts configuration $\sigma_{t+1}$.

Let $\mathcal{SW}$ be the transition matrix of the SW dynamics on $G$. In this section we prove Theorem 1.3 from the introduction. Corollary 1.4 follows directly from Theorem 1.3 and the fact that, in $\mathbb{Z}^2$, SSM holds for all $\beta < \beta_c(q)$ and $q \ge 2$ (see [3, 2, 35]). In the proof of Theorem 1.3 we use several auxiliary Markov chains that we define and briefly motivate in Section 4.1. The proof of Theorem 1.3 is then provided in Section 4.2.

In Section 3 we established that the spectral gap of the heat-bath tiled block dynamics is at least $1/8$, provided SSM holds (see Lemma 3.1).
To prove Theorem 1.3 we show that the spectral gap of the SW dynamics is at least the spectral gap of the heat-bath tiled block dynamics times a constant that depends only on $\beta$, $L$ and $d$. Establishing such an inequality directly seems difficult, because the SW dynamics could change the spins in a large component intersecting many of the $d$-dimensional cubes in a tiling. To work around this issue we introduce the following Markov chain.

Isolated vertices (SW) dynamics $\mathcal{I}_{sw}$. Consider the Markov chain that, given a Potts configuration $\sigma_t$ at time $t$, performs the following update to obtain $\sigma_{t+1}$:

1. Add each monochromatic edge independently with probability $p$ to obtain $(A_t\subseteq E,\sigma_t)$;
2. Assign to each isolated vertex of $(V,A_t)$ independently a new spin from $\{1,\dots,q\}$ u.a.r.;
3. Remove all edges to obtain $\sigma_{t+1}$.

We call this chain the isolated vertices dynamics and, with a slight abuse of notation, we let $\mathcal{I}_{sw}$ also denote its transition matrix. Intuitively, the SW dynamics ought to be faster than the isolated vertices dynamics, since it updates all the components of any size simultaneously instead of just the isolated vertices. We show that this is indeed the case.

Lemma 4.1. $\lambda(\mathcal{SW}) \ge \lambda(\mathcal{I}_{sw})$.

The proof of this lemma is given in Section 4.2.2. The motivation for introducing $\mathcal{I}_{sw}$ is that we can now easily define a tiled variant of this chain as follows.

Isolated vertices tiled dynamics $\mathcal{I}_\mathcal{D}$. Recall that $\mathcal{D} = \{B_1,\dots,B_m\}$ is the collection of tilings; see Section 3 for the precise definition. Given a Potts configuration $\sigma_t$, one step of the isolated vertices tiled dynamics is given by:

1. Add each monochromatic edge independently with probability $p$ to obtain $(A_t\subseteq E,\sigma_t)$;
2. Pick $k\in\{1,\dots,m\}$ u.a.r.;
3. Assign to each isolated vertex in $B_k$ independently a new spin from $\{1,\dots,q\}$ u.a.r.;
4. Remove all edges to obtain $\sigma_{t+1}$.

We use $\mathcal{I}_\mathcal{D}$ to denote the transition matrix of this chain.
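The chains $\mathcal{SW}$, $\mathcal{I}_{sw}$ and $\mathcal{I}_\mathcal{D}$ share step 1 and differ only in which vertices receive fresh uniform spins. The Python sketch below makes this explicit; it is an illustration, not the authors' code, and it assumes a box with free boundary, spins labelled $0,\dots,q-1$ instead of $1,\dots,q$, and hypothetical function names. For $\mathcal{I}_\mathcal{D}$, the random choice of $k$ (step 2) is left to the caller, who passes the tiling $B_k$ as a set of vertex indices.

```python
import itertools
import math
import random

# Illustrative sketch; graph layout, spin labels 0..q-1 and names are assumptions.


def box_graph(L, d=2):
    """Vertex count and nearest-neighbour edges of the box {0,...,L-1}^d."""
    verts = list(itertools.product(range(L), repeat=d))
    index = {v: i for i, v in enumerate(verts)}
    edges = []
    for v in verts:
        for k in range(d):
            w = v[:k] + (v[k] + 1,) + v[k + 1:]
            if w in index:
                edges.append((index[v], index[w]))
    return len(verts), edges


def percolate(sigma, edges, beta, rng):
    """Step 1: keep each monochromatic edge independently w.p. p = 1 - e^{-beta}."""
    p = 1.0 - math.exp(-beta)
    return [(u, v) for (u, v) in edges if sigma[u] == sigma[v] and rng.random() < p]


def component_labels(n, A):
    """Union-find: label[x] identifies the connected component of x in (V, A)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in A:
        parent[find(u)] = find(v)
    return [find(x) for x in range(n)]


def sw_step(sigma, n, edges, beta, q, rng):
    """One SW step: every component of (V, A_t) gets a fresh uniform spin."""
    label = component_labels(n, percolate(sigma, edges, beta, rng))
    colour = {}
    for x in range(n):
        if label[x] not in colour:
            colour[label[x]] = rng.randrange(q)
    return [colour[label[x]] for x in range(n)]


def isolated_vertices_step(sigma, n, edges, beta, q, rng, block=None):
    """One step of I_sw (block=None), or steps 1, 3, 4 of I_D for the tiling
    `block`: only isolated vertices (inside `block`) get a fresh uniform spin."""
    degree = [0] * n
    for u, v in percolate(sigma, edges, beta, rng):
        degree[u] += 1
        degree[v] += 1
    new = list(sigma)
    for x in range(n):
        if degree[x] == 0 and (block is None or x in block):
            new[x] = rng.randrange(q)
    return new


if __name__ == "__main__":
    n, edges = box_graph(4)
    rng = random.Random(0)
    sigma = [rng.randrange(3) for _ in range(n)]
    sigma = sw_step(sigma, n, edges, beta=0.5, q=3, rng=rng)
    sigma = isolated_vertices_step(sigma, n, edges, beta=0.5, q=3, rng=rng)
    print(sigma)
```

Only the recolouring rule changes between the functions, which is what the common representation $TRT^*$, $TQT^*$ and $TQ_kT^*$ in Section 4.2.1 formalizes.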
Intuitively, $\mathcal{I}_{sw}$ should reach equilibrium faster than $\mathcal{I}_\mathcal{D}$, since in each step it updates the spins of all isolated vertices instead of just those in a single tiling. This intuition is made rigorous in the following lemma, which is proved in Section 4.2.2.

Lemma 4.2. $\lambda(\mathcal{I}_{sw}) \ge \lambda(\mathcal{I}_\mathcal{D})$.

We next define the analogues of the $S^\tau_k$'s from Section 3 for the tiled dynamics $\mathcal{I}_\mathcal{D}$.

Conditional isolated vertices tiled dynamics $\mathcal{I}^\tau_k$. For each $k = 1,\dots,m$ and each fixed configuration $\tau$ in $B_k^c$, we consider the Markov chain with transition matrix $\mathcal{I}^\tau_k$ and state space $\Omega_P(B_k)$ that, given $\sigma_t\in\Omega_P(B_k)$, obtains $\sigma_{t+1}\in\Omega_P(B_k)$ as follows:

1. Add each monochromatic edge in $E$ (according to $\sigma_t\cup\tau$) independently with probability $p$;
2. Assign to each isolated vertex in $B_k$ independently a new spin from $\{1,\dots,q\}$ u.a.r.;
3. Remove all edges to obtain $\sigma_{t+1}$.

Let
$$\lambda_{\min} = \min_{k=1,\dots,m}\ \min_{\tau\in\Omega_P(B_k^c)}\lambda(\mathcal{I}^\tau_k).$$
(Recall that $\Omega_P(B_k^c)$ is the set of valid configurations of $B_k^c$ and $\mathcal{I}^\tau_k$ is the conditional isolated vertices tiled dynamics on $B_k$ with $\tau$ as the fixed configuration in the exterior of $B_k$.) We prove the following two lemmas which, together with Lemmas 4.1 and 4.2 and the results in Section 3, imply Theorem 1.3.

Lemma 4.3. (i) $\mathcal{I}_{sw}$ and $\mathcal{I}_\mathcal{D}$ are reversible w.r.t. $\pi$ and positive semidefinite.
(ii) For all $k = 1,\dots,m$ and $\tau\in\Omega_P(B_k^c)$, $\mathcal{I}^\tau_k$ is reversible w.r.t. $\pi(\cdot\mid\tau)$ and positive semidefinite.

Lemma 4.4. $\lambda_{\min} \ge \frac{1}{7}e^{-\beta dL^d}$.

Proof of Theorem 1.3. By Lemmas 4.1 and 4.2, $\lambda(\mathcal{SW}) \ge \lambda(\mathcal{I}_{sw}) \ge \lambda(\mathcal{I}_\mathcal{D})$. $\mathcal{I}_\mathcal{D}$ is a tiled block dynamics. Indeed, if $\tau$ is the configuration in $B_k^c$, then the configuration in $B_k$ is updated with a step of the ergodic Markov chain $\mathcal{I}^\tau_k$. By Lemma 4.3, $\mathcal{I}_\mathcal{D}$ is reversible w.r.t. $\pi$ and positive semidefinite. Lemma 4.3 also implies that $\mathcal{I}^\tau_k$ is reversible w.r.t. $\pi(\cdot\mid\tau)$ and positive semidefinite, for all $k = 1,\dots,m$ and $\tau\in\Omega_P(B_k^c)$. Hence, by Lemma 3.2, $\lambda(\mathcal{I}_\mathcal{D}) \ge \lambda_{\min}\,\lambda(\mathcal{B}_\mathcal{D})$.
By Lemma 3.1, when $L$ is a sufficiently large constant (independent of $n$), SSM implies that $\lambda(\mathcal{B}_\mathcal{D}) \ge 1/8$. Moreover, by Lemma 4.4, $\lambda_{\min} \ge \frac{1}{7}e^{-\beta dL^d}$. Then
$$\lambda(\mathcal{SW}) \ge \frac{1}{56}\,e^{-\beta dL^d},$$
and the result follows from the fact that $L = O(1)$.

The rest of this section is organized as follows. The proofs of Lemmas 4.1, 4.2 and 4.3 use a common representation of the Markov chains $\mathcal{SW}$, $\mathcal{I}_{sw}$ and $\mathcal{I}_\mathcal{D}$, which we introduce in Section 4.2.1. The actual proofs of these lemmas are provided in Section 4.2.2. The proof of Lemma 4.4 is provided in Section 4.2.3 and crucially uses the fact that, by design, the $d$-dimensional cubes of side length $L-1$ in each tiling do not interact with each other.

4.2.1 Common representation

We provide here a decomposition of the transition matrices $\mathcal{SW}$, $\mathcal{I}_{sw}$ and $\mathcal{I}_\mathcal{D}$ as products of simpler matrices, which will be used in our proofs of Lemmas 4.1, 4.2 and 4.3. We are able to do this because the steps of these chains all include a "lifting" substep to a joint configuration space $\Omega_J \subset \Omega_P \times 2^E$, where configurations consist of a spin assignment to the vertices together with a subset of the edges of $G$. The joint Edwards-Sokal measure $\nu$ on $\Omega_J$ is given by
$$\nu(A,\sigma) \propto p^{|A|}(1-p)^{|E\setminus A|}\,\mathbb{1}(A\subseteq E(\sigma)),$$
where $p = 1 - e^{-\beta}$, $A\subseteq E$, $\sigma\in\Omega_P$ and $E(\sigma)$ denotes the set of monochromatic edges of $E$ in $\sigma$ [16]. Let $T$ be the $|\Omega_P|\times|\Omega_J|$ matrix indexed by Potts and joint configurations given by
$$T(\sigma,(A,\tau)) = \mathbb{1}(\sigma = \tau)\,\mathbb{1}(A\subseteq E(\sigma))\,p^{|A|}(1-p)^{|E(\sigma)\setminus A|},$$
where $\sigma\in\Omega_P$ and $(A,\tau)\in\Omega_J$. The matrix $T$ corresponds to adding each monochromatic edge of $E$ in $\sigma$ independently with probability $p$, as in step 1 of the SW dynamics, and defines an operator from $L^2(\Omega_J,\nu)$ to $L^2(\Omega_P,\pi)$. It is straightforward to check that its adjoint operator $T^*: L^2(\Omega_P,\pi)\to L^2(\Omega_J,\nu)$ is given by the $|\Omega_J|\times|\Omega_P|$ matrix
$$T^*((A,\tau),\sigma) = \mathbb{1}(\tau = \sigma),$$
with $(A,\tau)\in\Omega_J$ and $\sigma\in\Omega_P$. $T^*$ corresponds to step 3 of the SW dynamics.
Finally, let $R$ be a $|\Omega_J|\times|\Omega_J|$ matrix indexed by joint configurations such that
$$R((A,\sigma),(B,\tau)) = \mathbb{1}(A = B)\,\mathbb{1}(A\subseteq E(\sigma)\cap E(\tau))\cdot q^{-c(A)},$$
where $c(A)$ is the number of connected components of $(V,A)$ and $(A,\sigma),(B,\tau)\in\Omega_J$. The matrix $R$ corresponds to assigning a new spin from $\{1,\dots,q\}$ u.a.r. to each connected component of $(V,A)$ independently, as in step 2 of the SW dynamics. Hence, we get $\mathcal{SW} = TRT^*$. This useful decomposition of the SW dynamics was first discovered in [47, 48, 46] and has already been used in other comparison arguments involving the SW dynamics (see, e.g., [4, 20]).

The following $|\Omega_J|\times|\Omega_J|$ matrices allow us to obtain similar decompositions for $\mathcal{I}_{sw}$ and $\mathcal{I}_\mathcal{D}$. For $(A,\sigma),(B,\tau)\in\Omega_J$, let
$$\begin{aligned}Q((A,\sigma),(B,\tau)) &= \mathbb{1}(A = B)\,\mathbb{1}(A\subseteq E(\sigma)\cap E(\tau))\,\mathbb{1}\big(\sigma(V\setminus I(A)) = \tau(V\setminus I(A))\big)\cdot q^{-|I(A)|},\\ Q_k((A,\sigma),(B,\tau)) &= \mathbb{1}(A = B)\,\mathbb{1}(A\subseteq E(\sigma)\cap E(\tau))\,\mathbb{1}\big(\sigma(V\setminus I_k(A)) = \tau(V\setminus I_k(A))\big)\cdot q^{-|I_k(A)|},\end{aligned}$$
where $I(A)$ and $I_k(A)$ denote the sets of isolated vertices of $(V,A)$ in $V$ and in $B_k$, respectively. Then, the following facts follow straightforwardly from the definitions of these matrices:

Fact 4.5. (i) $\mathcal{I}_{sw} = TQT^*$;
(ii) $\mathcal{I}_\mathcal{D} = \frac{1}{m}\sum_{k=1}^m TQ_kT^*$.

In this subsection we provide our proofs of Lemmas 4.1, 4.2 and 4.3, all of which use the common representation of the transition matrices $\mathcal{SW}$, $\mathcal{I}_{sw}$ and $\mathcal{I}_\mathcal{D}$ introduced in Section 4.2.1, as well as the analytic tools briefly reviewed in Section 2.5.

Proofs of Lemmas 4.1 and 4.2. The matrix $R$ is symmetric and $\nu(A,\sigma) = \nu(A,\tau)$ for all $A\subseteq E$ and $\sigma,\tau\in\Omega_P$ compatible with $A$; hence $R$ is reversible w.r.t. the joint measure $\nu$ and $R = R^*$. The same holds for $Q$ and $Q_k$ for all $k = 1,\dots,m$. Moreover, since the matrices $R$, $Q$ and $Q_k$ assign spins u.a.r. to components of a joint configuration, we deduce the following.

Fact 4.6.
(i) $R$, $Q$ and $Q_k$ define self-adjoint idempotent operators from $L^2(\Omega_J,\nu)$ to $L^2(\Omega_J,\nu)$.
(ii) $R = QRQ$ and $Q = Q_kQQ_k$.

Using this fact and the definition of the adjoint operator, we get that for any $f\in\mathbb{R}^{|\Omega_P|}$
$$\langle f,\mathcal{SW}f\rangle_\pi = \langle f,TRT^*f\rangle_\pi = \langle f,TQRQT^*f\rangle_\pi = \langle QT^*f,RQT^*f\rangle_\nu \le \langle QT^*f,QT^*f\rangle_\nu = \langle f,TQ^2T^*f\rangle_\pi = \langle f,\mathcal{I}_{sw}f\rangle_\pi, \tag{10}$$
where the inequality follows from (4). Similarly, for any $f\in\mathbb{R}^{|\Omega_P|}$
$$\langle f,\mathcal{I}_{sw}f\rangle_\pi = \langle f,TQT^*f\rangle_\pi = \langle f,TQ_kQQ_kT^*f\rangle_\pi = \langle Q_kT^*f,QQ_kT^*f\rangle_\nu \le \langle Q_kT^*f,Q_kT^*f\rangle_\nu = \langle f,TQ_k^2T^*f\rangle_\pi = \langle f,TQ_kT^*f\rangle_\pi.$$
Since this holds for every $k$, we get
$$\langle f,\mathcal{I}_{sw}f\rangle_\pi \le \frac{1}{m}\sum_{k=1}^m\langle f,TQ_kT^*f\rangle_\pi = \langle f,\mathcal{I}_\mathcal{D}f\rangle_\pi. \tag{11}$$
Putting (10) and (11) together we get
$$\langle f,\mathcal{SW}f\rangle_\pi \le \langle f,\mathcal{I}_{sw}f\rangle_\pi \le \langle f,\mathcal{I}_\mathcal{D}f\rangle_\pi.$$
By Fact 4.6, $R = R^2 = R^*$, and so $\langle f,\mathcal{SW}f\rangle_\pi = \langle RT^*f,RT^*f\rangle_\nu \ge 0$. Hence, the matrices $\mathcal{SW}$, $\mathcal{I}_{sw}$ and $\mathcal{I}_\mathcal{D}$ are all positive semidefinite. Then, from the definition of the Dirichlet form and (6), we get $\lambda(\mathcal{SW}) \ge \lambda(\mathcal{I}_{sw}) \ge \lambda(\mathcal{I}_\mathcal{D})$, as claimed.

Proof of Lemma 4.3.
Fact 4.6 implies that $\mathcal{I}_{sw}^* = (TQT^*)^* = \mathcal{I}_{sw}$ and $\mathcal{I}_\mathcal{D}^* = \frac{1}{m}\sum_{k=1}^m(TQ_kT^*)^* = \mathcal{I}_\mathcal{D}$. Hence $\mathcal{I}_{sw}$ and $\mathcal{I}_\mathcal{D}$ define self-adjoint operators from $L^2(\Omega_P,\pi)$ to $L^2(\Omega_P,\pi)$, and so $\mathcal{I}_{sw}$ and $\mathcal{I}_\mathcal{D}$ are reversible w.r.t. $\pi$. Moreover, $Q = Q^2 = Q^*$ by Fact 4.6, and thus $\langle f,\mathcal{I}_{sw}f\rangle_\pi = \langle QT^*f,QT^*f\rangle_\nu \ge 0$. Therefore, $\mathcal{I}_{sw}$ is positive semidefinite. Similarly, we obtain that $\mathcal{I}_\mathcal{D}$ is positive semidefinite, which concludes the proof of part (i) of the lemma.

For part (ii), observe that by definition $\mathcal{I}^\tau_k(\sigma,\sigma') = TQ_kT^*(\sigma\cup\tau,\sigma'\cup\tau)$ for all $\sigma,\sigma'\in\Omega_P(B_k)$ and $\tau\in\Omega_P(B_k^c)$. Since $TQ_kT^* = (TQ_kT^*)^*$ by Fact 4.6, $TQ_kT^*$ is reversible w.r.t. $\pi$. Hence,
$$\pi(\sigma\cup\tau)\,TQ_kT^*(\sigma\cup\tau,\sigma'\cup\tau) = \pi(\sigma'\cup\tau)\,TQ_kT^*(\sigma'\cup\tau,\sigma\cup\tau),$$
and dividing both sides by $\pi(\tau)$ gives
$$\pi(\sigma\mid\tau)\,\mathcal{I}^\tau_k(\sigma,\sigma') = \pi(\sigma'\mid\tau)\,\mathcal{I}^\tau_k(\sigma',\sigma),$$
so $\mathcal{I}^\tau_k$ is reversible w.r.t. $\pi(\cdot\mid\tau)$. Finally, for $f\in\mathbb{R}^{|\Omega_P(B_k)|}$ let $\hat f\in\mathbb{R}^{|\Omega_P|}$ be given by $\hat f(\sigma\cup\tau') = f(\sigma)\,\mathbb{1}(\tau' = \tau)$ for all $\sigma\in\Omega_P(B_k)$ and $\tau'\in\Omega_P(B_k^c)$. Since $TQ_kT^*$ does not change the configuration outside $B_k$,
$$\begin{aligned}\pi(\tau)\,\langle f,\mathcal{I}^\tau_kf\rangle_{\pi(\cdot\mid\tau)} &= \sum_{\sigma,\sigma'\in\Omega_P(B_k)}f(\sigma)f(\sigma')\,\mathcal{I}^\tau_k(\sigma,\sigma')\,\pi(\sigma\cup\tau)\\ &= \sum_{\tau'\in\Omega_P(B_k^c)}\sum_{\sigma,\sigma'\in\Omega_P(B_k)}\hat f(\sigma\cup\tau')\,\hat f(\sigma'\cup\tau')\,TQ_kT^*(\sigma\cup\tau',\sigma'\cup\tau')\,\pi(\sigma\cup\tau')\\ &= \langle\hat f,TQ_kT^*\hat f\rangle_\pi = \langle Q_kT^*\hat f,Q_kT^*\hat f\rangle_\nu \ge 0,\end{aligned}$$
where in the last equality we used that $Q_k = Q_k^2 = Q_k^*$, which follows from Fact 4.6. Thus, $\mathcal{I}^\tau_k$ is positive semidefinite for all $1\le k\le m$ and $\tau\in\Omega_P(B_k^c)$.

In this subsection we prove Lemma 4.4 by showing that $\lambda(\mathcal{I}^\tau_k) \ge \frac{1}{7}e^{-\beta dL^d}$ for all $k = 1,\dots,m$ and $\tau\in\Omega_P(B_k^c)$. As mentioned earlier, our proof uses the fact that in each tiling the small $d$-dimensional cubes do not interact with each other.
Hence, $\mathcal{I}^\tau_k$ is a product Markov chain where each component acts on exactly one of the $d$-dimensional cubes of the tiling $B_k$. The spectral gap of $\mathcal{I}^\tau_k$ is then given by the smallest spectral gap of any component. The spectral gap of any component can be bounded using a crude coupling argument, since each component acts on a set of constant volume. We proceed to formalize these ideas.

The following linear algebra fact about the spectrum of a product Markov chain will be used in the proof of Lemma 4.4.

Lemma 4.7.
Let $S_1,\dots,S_t$ be finite spaces, and call $\mathcal{C} = S_1\times\cdots\times S_t$ their Cartesian product. For $i = 1,\dots,t$ let $P_i$ be the transition matrix of an ergodic Markov chain acting on $S_i$, reversible w.r.t. a probability measure $\varphi_i$ on $S_i$. Let $P = \prod_{i=1}^t P_i$ be the matrix given by
$$P(x,y) = \prod_{i=1}^t P_i(x_i,y_i),$$
where $x = (x_1,\dots,x_t)\in\mathcal{C}$ and $y = (y_1,\dots,y_t)\in\mathcal{C}$, with $x_i, y_i\in S_i$. Then, $\lambda(P) = \min_{i=1,\dots,t}\lambda(P_i)$.

We provide next the proof of Lemma 4.4.
Proof of Lemma 4.4.
Recall that $\lambda_{\min} = \min_{k=1,\dots,m}\min_{\tau\in\Omega_P(B_k^c)}\lambda(\mathcal{I}^\tau_k)$. We claim that $\mathcal{I}^\tau_k$ is a product chain. Indeed, if $B_k^{(1)},\dots,B_k^{(l_k)}$ are the $d$-dimensional cubes that form the tiling $B_k$ and $\mathcal{I}^\tau_{kj}$ is the isolated vertices dynamics acting on $B_k^{(j)}$ (with the boundary condition induced by $\tau$), then for $\sigma,\sigma'\in\Omega_P(B_k)$,
$$\mathcal{I}^\tau_k(\sigma,\sigma') = \prod_{j=1}^{l_k}\mathcal{I}^\tau_{kj}\big(\sigma(B_k^{(j)}),\sigma'(B_k^{(j)})\big).$$
Hence, by Lemma 4.7, $\lambda(\mathcal{I}^\tau_k) = \min_{j=1,\dots,l_k}\lambda(\mathcal{I}^\tau_{kj})$.
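Lemma 4.7, just used, can be illustrated with two-state factors, whose spectra are explicit: the matrix with rows $(1-a,\,a)$ and $(b,\,1-b)$ has eigenvalues $1$ and $1-a-b$, with right eigenvectors $(1,1)$ and $(a,-b)$, so its spectral gap is $a+b$. The Python sketch below (an illustration with hypothetical names, not part of the paper) builds $P(x,y) = \prod_i P_i(x_i,y_i)$ as a Kronecker product, checks $PF_k = l_kF_k$ for every product eigenfunction as in the proof of Lemma 4.7, and compares $1$ minus the second-largest eigenvalue of $P$ with $\min_i\lambda(P_i)$. The factors are chosen positive semidefinite ($a+b\le1$), so all eigenvalues lie in $[0,1]$.

```python
import itertools

# Illustrative sketch; two-state factors and names are assumptions.


def two_state(a, b):
    """P = [[1-a, a], [b, 1-b]] with its right eigenpairs (eigenvalue, eigenvector)."""
    P = [[1 - a, a], [b, 1 - b]]
    eigenpairs = [(1.0, [1.0, 1.0]), (1 - a - b, [a, -b])]
    return P, eigenpairs


def kron(P, Q):
    """Product chain: (P x Q)((x1, x2), (y1, y2)) = P(x1, y1) Q(x2, y2)."""
    m = len(Q)
    size = len(P) * m
    return [[P[i // m][j // m] * Q[i % m][j % m] for j in range(size)] for i in range(size)]


def kron_vec(u, v):
    """F(x1, x2) = u(x1) v(x2), indexed consistently with kron."""
    return [x * y for x in u for y in v]


def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


def product_chain_gap(params, tol=1e-12):
    """Return (1 - second largest eigenvalue of P, min_i lambda(P_i)) for
    two-state factors with parameters params = [(a_1, b_1), ..., (a_t, b_t)]."""
    factors = [two_state(a, b) for a, b in params]
    P = factors[0][0]
    for Pi, _ in factors[1:]:
        P = kron(P, Pi)
    eigenvalues = []
    for choice in itertools.product(*(pairs for _, pairs in factors)):
        lam, F = 1.0, [1.0]
        for l_i, f_i in choice:
            lam *= l_i
            F = kron_vec(F, f_i)
        # eigen-equation P F_k = l_k F_k from the proof of Lemma 4.7
        assert all(abs(x - lam * y) < tol for x, y in zip(matvec(P, F), F))
        eigenvalues.append(lam)
    eigenvalues.sort(reverse=True)
    # the gap of the two-state factor (a, b) is 1 - (1 - a - b) = a + b
    return 1.0 - eigenvalues[1], min(a + b for a, b in params)


if __name__ == "__main__":
    print(product_chain_gap([(0.2, 0.3), (0.1, 0.15), (0.4, 0.4)]))
```

In the example the factor gaps are $0.5$, $0.25$ and $0.8$, and the product chain inherits the smallest one.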
We bound $\lambda(\mathcal{I}^\tau_{kj})$ via a crude coupling argument. Since $|B_k^{(j)}| \le L^d$, the probability that in the first step of $\mathcal{I}^\tau_{kj}$ every vertex is isolated is at least $(1-p)^K$, where $K \le dL^d$ is the number of edges incident to $B_k^{(j)}$. Starting from two arbitrary configurations in $B_k^{(j)}$, if all vertices become isolated in both configurations, then we can couple them with probability 1. Hence, we can couple two arbitrary configurations in one step with probability at least $(1-p)^{dL^d}$. Therefore, the probability that the two copies have not coupled after $4(1-p)^{-dL^d}$ steps is at most $1/4$ by Markov's inequality. Then, the mixing time of $\mathcal{I}^\tau_{kj}$ is at most $4(1-p)^{-dL^d} = 4e^{\beta dL^d}$ for each $k = 1,\dots,m$, $\tau\in\Omega_P(B_k^c)$ and $j = 1,\dots,l_k$. Consequently, $\lambda(\mathcal{I}^\tau_k) \ge \frac{1}{7}e^{-\beta dL^d}$ by (5).

For completeness, we also provide here a proof of Lemma 4.7.

Proof of Lemma 4.7. $P$ is reversible w.r.t. $\varphi = \bigotimes_{i=1}^t\varphi_i$. Moreover, if $\{f^{(i)}_j, l^{(i)}_j : j = 1,\dots,|S_i|\}$ denote the eigenfunctions and eigenvalues of $P_i$, respectively, then
$$F_k(x) = \prod_{i=1}^t f^{(i)}_{k_i}(x_i), \qquad l_k = \prod_{i=1}^t l^{(i)}_{k_i},$$
are the eigenfunctions and eigenvalues of $P$, where $k = (k_1,\dots,k_t)$ and $k_i = 1,\dots,|S_i|$ for all $i = 1,\dots,t$. To see this, note that the functions $F_k$, $k = (k_1,\dots,k_t)$, form an orthogonal basis of $L^2(\mathcal{C},\varphi)$ such that $PF_k = l_kF_k$. This implies that the $F_k$ and $l_k$ are the eigenfunctions and eigenvalues of $P$.

Now, suppose that $l^{(i)}_2$ is the eigenvalue $l^{(i)}_j \ne 1$ with maximal absolute value for all $i$, so that $\lambda(P_i) = 1 - l^{(i)}_2$. Then, by taking all $l^{(i)}_{k_i} = 1$ except for one index $i$ and setting $l_k = l^{(i)}_2$, one has $\lambda(P) = 1 - \max_i l^{(i)}_2$.

In this section we use our results for the tiled block dynamics in Section 3 to deduce a tight spectral gap bound for general heat-bath block dynamics.
Let $V\subset\mathbb{Z}^d$ be a $d$-dimensional cube of volume $n$, $G = (V,E)$ the induced subgraph and $\psi$ a fixed boundary condition on $\partial V$. Let $\mathcal{A} = \{A_1,\dots,A_r\}$ be a collection of blocks such that $A_i\subset V$ and $V = \bigcup_iA_i$. Let $\mathcal{B}_\mathcal{A}$ be the transition matrix of the heat-bath block dynamics w.r.t. $\mathcal{A}$. Recall that, given a configuration $\sigma_t\in\Omega$ at time $t$, a step of the heat-bath block dynamics picks a block $A_i$ u.a.r. and updates the configuration in $A_i$ with a sample from $\mu^\psi(\cdot\mid\sigma_t(V\setminus A_i))$. We prove here that $\lambda(\mathcal{B}_\mathcal{A}) = \Omega(r^{-1})$ whenever SSM holds. That is, we establish Theorem 1.2 from the introduction.

In the proof of this theorem we relate the spectral gap of $\mathcal{B}_\mathcal{A}$ to that of the following block dynamics. Let $V_e$ and $V_o$ be the sets of all even and all odd vertices of $V$, respectively. A vertex is even (resp., odd) if its coordinate sum in $\mathbb{Z}^d$ is even (resp., odd). Let $\mathcal{B}_{eo}$ be the heat-bath block dynamics w.r.t. $\{V_e,V_o\}$. A crucial part of the proof of Theorem 1.2 is the following.

Lemma 5.1.
SSM implies that $\lambda(\mathcal{B}_{eo}) = \Omega(1)$.

The other key ingredients in the proof of Theorem 1.2 are two properties of the variance functional: monotonicity and tensorization. (Recall that for $A\subseteq V$, $K_A$ denotes the matrix that corresponds to the heat-bath update in $A$ and that we use $\mathcal{E}_A$ for the Dirichlet form of $K_A$.)

Fact 5.2.
Let $A\subseteq B\subseteq V$. Then, for any $f\in\mathbb{R}^{|\Omega|}$, $\mathcal{E}_A(f,f) \le \mathcal{E}_B(f,f)$.

Fact 5.3. Let $U = \bigcup_iU_i\subseteq V$ be such that $K_{U_i}K_{U_j} = K_{U_j}K_{U_i}$ for all $i\ne j$. Then, for any $f\in\mathbb{R}^{|\Omega|}$,
$$\mathcal{E}_U(f,f) \le \sum_i\mathcal{E}_{U_i}(f,f).$$

We are now ready to prove Theorem 1.2.
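Before the proof, Facts 5.2 and 5.3 can be checked by brute force on a toy model. The Python sketch below is illustrative only: it assumes a $\pm1$ Ising model on a path of five vertices with free boundary ($\beta = 0.9$) and hypothetical function names. It computes $\mathcal{E}_A(f,f)$ through the variance formula of Fact 3.3, checks monotonicity for nested sets, and, for the even sites $U_1 = \{0\}$ and $U_2 = \{2\}$ (at distance 2, as in the proof below), checks that $K_{U_1}$ and $K_{U_2}$ commute and that $\mathcal{E}_{U_1\cup U_2} \le \mathcal{E}_{U_1} + \mathcal{E}_{U_2}$.

```python
import itertools
import math
import random

# Illustrative sketch; the toy Ising measure and all names are assumptions.


def ising_path(n, beta):
    """Toy measure mu: Ising model on the path 0-...-(n-1), free boundary."""
    states = list(itertools.product((-1, 1), repeat=n))
    w = [math.exp(beta * sum(s[i] * s[i + 1] for i in range(n - 1))) for s in states]
    z = sum(w)
    return states, [x / z for x in w]


def outside(s, A):
    """Restriction of the configuration s to the complement of A."""
    return tuple(x for i, x in enumerate(s) if i not in A)


def dirichlet(states, mu, A, f):
    """E_A(f, f) = sum_tau mu(tau) Var^tau_A(f), the formula of Fact 3.3."""
    groups = {}
    for i, s in enumerate(states):
        groups.setdefault(outside(s, A), []).append(i)
    total = 0.0
    for idx in groups.values():
        m = sum(mu[i] for i in idx)
        mean = sum(mu[i] * f[i] for i in idx) / m
        total += sum(mu[i] * (f[i] - mean) ** 2 for i in idx)
    return total


def heat_bath_kernel(states, mu, A):
    """K_A(s, t) = 1(s, t agree off A) * mu(t(A) | s(A^c))."""
    K = [[0.0] * len(states) for _ in states]
    for i, s in enumerate(states):
        block = [j for j, t in enumerate(states) if outside(t, A) == outside(s, A)]
        mass = sum(mu[j] for j in block)
        for j in block:
            K[i][j] = mu[j] / mass
    return K


def commutator_norm(X, Y):
    """max |(XY - YX)(s, t)|."""
    n = len(X)
    xy = [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    yx = [[sum(Y[i][k] * X[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    return max(abs(xy[i][j] - yx[i][j]) for i in range(n) for j in range(n))


if __name__ == "__main__":
    states, mu = ising_path(5, beta=0.9)
    rng = random.Random(2)
    f = [rng.uniform(-1.0, 1.0) for _ in states]
    print("Fact 5.2:", dirichlet(states, mu, {1}, f), "<=", dirichlet(states, mu, {1, 2}, f))
    K0, K2 = heat_bath_kernel(states, mu, {0}), heat_bath_kernel(states, mu, {2})
    print("commutator:", commutator_norm(K0, K2))
    print("Fact 5.3:", dirichlet(states, mu, {0, 2}, f), "<=",
          dirichlet(states, mu, {0}, f) + dirichlet(states, mu, {2}, f))
```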
Proof of Theorem 1.2.
For any $f\in\mathbb{R}^{|\Omega|}$, we have $\mathcal{E}_{\mathcal{B}_\mathcal{A}}(f,f) = \frac{1}{r}\sum_{i=1}^r\mathcal{E}_{A_i}(f,f)$. By Fact 5.2, if $A_i'\subset A_i$, then $\mathcal{E}_{A_i}(f,f) \ge \mathcal{E}_{A_i'}(f,f)$. Thus, we may assume without loss of generality that $\mathcal{A}$ is a partition of $V$. Fact 5.2 also implies
$$\mathcal{E}_{A_i}(f,f) \ge \frac{\mathcal{E}_{A_i\cap V_e}(f,f) + \mathcal{E}_{A_i\cap V_o}(f,f)}{2}.$$
Hence,
$$\mathcal{E}_{\mathcal{B}_\mathcal{A}}(f,f) \ge \frac{1}{r}\sum_{i=1}^r\frac{\mathcal{E}_{A_i\cap V_e}(f,f) + \mathcal{E}_{A_i\cap V_o}(f,f)}{2}.$$
For $i\ne j$, $\mathrm{dist}(A_i\cap V_e, A_j\cap V_e) \ge 2$, since by assumption $A_i\cap A_j = \emptyset$ and distinct even vertices are at distance at least 2. Then, $K_{A_i\cap V_e}K_{A_j\cap V_e} = K_{A_j\cap V_e}K_{A_i\cap V_e}$ and
$$\sum_{i=1}^r\mathcal{E}_{A_i\cap V_e}(f,f) \ge \mathcal{E}_{V_e}(f,f)$$
by Fact 5.3. Similarly, we get $\sum_{i=1}^r\mathcal{E}_{A_i\cap V_o}(f,f) \ge \mathcal{E}_{V_o}(f,f)$. Hence,
$$\mathcal{E}_{\mathcal{B}_\mathcal{A}}(f,f) \ge \frac{\mathcal{E}_{V_e}(f,f) + \mathcal{E}_{V_o}(f,f)}{2r} = \frac{1}{r}\,\mathcal{E}_{\mathcal{B}_{eo}}(f,f).$$
Since $\mathcal{B}_\mathcal{A}$ and $\mathcal{B}_{eo}$ are both positive semidefinite, we get $\lambda(\mathcal{B}_\mathcal{A}) \ge \frac{1}{r}\lambda(\mathcal{B}_{eo})$ by (6). The result follows from Lemma 5.1.

To prove Lemma 5.1 we use our results for tiled block dynamics from Section 3. In particular, we consider the tiled block dynamics that picks one tiling $B_i$ from $\mathcal{D} = \{B_1,\dots,B_m\}$ u.a.r. and, with probability $1/2$, performs a heat-bath update in $B_i\cap V_e$, and otherwise updates $B_i\cap V_o$. Unlike in our previous application of this technology to the SW dynamics in Section 4, the restriction of this tiled block dynamics to each $B_i$ is not a product Markov chain. Hence, we cannot hope to use Lemma 4.7 for product Markov chains directly. To work around this difficulty we consider systematic scan variants of the restricted chains.

Proof of Lemma 5.1.
For ease of notation let $\mu = \mu^\psi$. Let $P$ be the transition matrix of the tiled variant of $\mathcal{B}_{eo}$ that, given a configuration $\sigma_t$, proceeds as follows:

1. Pick $j\in\{1,\dots,m\}$ u.a.r.;
2. With probability $1/2$, update the spins of $V_e\cap B_j$ with a sample from $\mu(\cdot\mid\sigma_t(V\setminus(V_e\cap B_j)))$;
3. Otherwise, update the configuration in $V_o\cap B_j$ with a sample from $\mu(\cdot\mid\sigma_t(V\setminus(V_o\cap B_j)))$.

This chain is reversible w.r.t. $\mu$ and ergodic; the latter follows directly from the assumption that the heat-bath Glauber dynamics is ergodic (see Section 4.1).

By Fact 5.2, $\mathcal{E}_{V_e}(f,f) \ge \mathcal{E}_{V_e\cap B_j}(f,f)$ and $\mathcal{E}_{V_o}(f,f) \ge \mathcal{E}_{V_o\cap B_j}(f,f)$ for any $f\in\mathbb{R}^{|\Omega|}$ and every $j$. Averaging over $j$,
$$\mathcal{E}_{\mathcal{B}_{eo}}(f,f) = \frac{\mathcal{E}_{V_e}(f,f) + \mathcal{E}_{V_o}(f,f)}{2} \ge \frac{1}{m}\sum_{j=1}^m\frac{\mathcal{E}_{V_e\cap B_j}(f,f) + \mathcal{E}_{V_o\cap B_j}(f,f)}{2} = \mathcal{E}_P(f,f). \tag{12}$$
Since both $P$ and $\mathcal{B}_{eo}$ are averages of positive semidefinite matrices (see Fact 3.3), they are also positive semidefinite, and so $\lambda(\mathcal{B}_{eo}) \ge \lambda(P)$.

We bound next $\lambda(P)$. For each $j = 1,\dots,m$ and each configuration $\tau\in\Omega(B_j^c)$, we consider the Markov chain with transition matrix $P^\tau_j$ whose state space is the set $\Omega^\tau(B_j)$ of valid configurations in $B_j$ given that $\tau$ is the configuration in $B_j^c$. Given a configuration $\sigma_t$, this chain obtains $\sigma_{t+1}$ as follows:

1. With probability $1/2$, update the spins of $V_e\cap B_j$ with a sample from $\mu(\cdot\mid\sigma_t(B_j\setminus(V_e\cap B_j)),\tau)$;
2. Otherwise, update the configuration in $V_o\cap B_j$ with a sample from $\mu(\cdot\mid\sigma_t(B_j\setminus(V_o\cap B_j)),\tau)$.

It is straightforward to check that this chain is ergodic and reversible w.r.t. $\varphi = \mu(\cdot\mid\tau)$. Moreover, $P^\tau_j$ is positive semidefinite, since it is an average of heat-bath updates (see Fact 3.3). (Observe that the Markov chains $P^\tau_j$ correspond to the $S^\tau_j$'s from Section 3.)

Let $\lambda_{\min} = \min_{j=1,\dots,m}\min_{\tau\in\Omega(B_j^c)}\lambda(P^\tau_j)$.
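On a small segment, the chain $P^\tau_j$ just defined, and the comparison with the lazy scan $P^l_{eoe} = \frac18(P_eP_oP_e + 7I)$ that the argument relies on, can be checked explicitly. In the Python sketch below, the $\pm1$ Ising model on a path of four sites with boundary spins $\tau = (+1,+1)$ at both ends, $\beta = 0.6$, and all function names are illustrative assumptions. $P_e$ and $P_o$ are heat-bath updates on the even and odd sites and $P^\tau_j = \frac12(P_e+P_o)$; the sketch verifies the expansion of $(P^\tau_j)^3$ obtained from $P_e^2 = P_e$ and $P_o^2 = P_o$, and compares $\langle f,(P^\tau_j)^3f\rangle_\varphi$ with $\langle f,P^l_{eoe}f\rangle_\varphi$.

```python
import itertools
import math
import random

# Illustrative sketch; the toy conditional measure and names are assumptions.


def conditional_ising(n, beta, tau=(1, 1)):
    """phi = mu(. | tau): Ising model on sites 0..n-1 of a path whose outer
    neighbours (sites -1 and n) carry the fixed spins tau."""
    states = list(itertools.product((-1, 1), repeat=n))

    def energy(s):
        inner = sum(s[i] * s[i + 1] for i in range(n - 1))
        return inner + tau[0] * s[0] + tau[1] * s[-1]

    w = [math.exp(beta * energy(s)) for s in states]
    z = sum(w)
    return states, [x / z for x in w]


def heat_bath_kernel(states, phi, A):
    """Heat-bath update of the sites in A given the other sites (and tau)."""
    def off(s):
        return tuple(x for i, x in enumerate(s) if i not in A)

    K = [[0.0] * len(states) for _ in states]
    for i, s in enumerate(states):
        block = [j for j, t in enumerate(states) if off(t) == off(s)]
        mass = sum(phi[j] for j in block)
        for j in block:
            K[i][j] = phi[j] / mass
    return K


def mul(*Ms):
    """Product M_1 M_2 ... of square matrices given as lists of lists."""
    R = Ms[0]
    for M in Ms[1:]:
        n = len(R)
        R = [[sum(R[i][k] * M[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    return R


def lin(*terms):
    """Linear combination of the matrices M with coefficients c, for pairs (c, M)."""
    n = len(terms[0][1])
    return [[sum(c * M[i][j] for c, M in terms) for j in range(n)] for i in range(n)]


def quad(M, phi, f):
    """<f, M f>_phi."""
    n = len(f)
    return sum(phi[i] * f[i] * sum(M[i][j] * f[j] for j in range(n)) for i in range(n))


def check(n=4, beta=0.6, seed=0):
    """Return (error in the expansion of (P^tau_j)^3,
    <f, (P^tau_j)^3 f>, <f, P^l_eoe f>) for a random test function f."""
    states, phi = conditional_ising(n, beta)
    Pe = heat_bath_kernel(states, phi, set(range(0, n, 2)))  # even sites
    Po = heat_bath_kernel(states, phi, set(range(1, n, 2)))  # odd sites
    size = len(states)
    eye = [[float(i == j) for j in range(size)] for i in range(size)]
    P = lin((0.5, Pe), (0.5, Po))
    cube = mul(P, P, P)
    # expansion of the cube after collecting the repeated terms P_e P_o and P_o P_e
    expansion = lin((1 / 8, mul(Pe, Po, Pe)), (1 / 8, Pe), (1 / 8, Po),
                    (2 / 8, mul(Pe, Po)), (2 / 8, mul(Po, Pe)), (1 / 8, mul(Po, Pe, Po)))
    err = max(abs(cube[i][j] - expansion[i][j]) for i in range(size) for j in range(size))
    lazy = lin((1 / 8, mul(Pe, Po, Pe)), (7 / 8, eye))
    rng = random.Random(seed)
    f = [rng.uniform(-1.0, 1.0) for _ in states]
    return err, quad(cube, phi, f), quad(lazy, phi, f)


if __name__ == "__main__":
    print(check())
```

The first returned value should be at rounding level, and the second should not exceed the third.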
By Lemma 3.2, $\lambda(P) \ge \lambda_{\min}\,\lambda(\mathcal{B}_\mathcal{D})$ and, by Lemma 3.1, $\lambda(\mathcal{B}_\mathcal{D}) \ge 1/8$, provided $L$ is a large enough constant independent of $n$ and SSM holds. Hence,
$$\lambda(\mathcal{B}_{eo}) \ge \frac{\lambda_{\min}}{8}. \tag{13}$$
We show next that $\lambda_{\min} = \Omega(1)$ by bounding $\lambda(P^\tau_j)$ for each $j$ and $\tau$. Fix $j$ and $\tau$, and let $P_e$ (resp., $P_o$) be the transition matrix that corresponds to updating the configuration in $V_e\cap B_j$ (resp., $V_o\cap B_j$) with a new configuration distributed according to the conditional measure given the configuration in $B_j\setminus(V_e\cap B_j)$ (resp., $B_j\setminus(V_o\cap B_j)$) and $\tau$. Then $P_e$ and $P_o$ are reversible w.r.t. $\varphi$ and $P^\tau_j = \frac{1}{2}(P_e + P_o)$.

Let $P_{eoe} = P_eP_oP_e$ be a systematic scan variant of $P^\tau_j$, and let $P^l_{eoe}$ be the "lazy" version of $P_{eoe}$ that with probability $7/8$ stays put and with probability $1/8$ proceeds like $P_{eoe}$; that is, $P^l_{eoe} = \frac{P_{eoe} + 7I}{8}$. We show that three steps of the chain $P^\tau_j$ are as fast as one step of $P^l_{eoe}$. For this, note that, since $P_e^2 = P_e$ and $P_o^2 = P_o$,
$$(P^\tau_j)^3 = \frac{1}{8}\big(P_eP_oP_e + P_e + P_o + P_eP_o + P_oP_e + P_oP_e + P_eP_o + P_oP_eP_o\big). \tag{14}$$
Each of the last seven terms on the right-hand side of (14) contributes at most $\langle f,f\rangle_\varphi$ to $8\langle f,(P^\tau_j)^3f\rangle_\varphi$, by (4). Thus,
$$\langle f,(P^\tau_j)^3f\rangle_\varphi \le \frac{1}{8}\langle f,P_eP_oP_ef\rangle_\varphi + \frac{7}{8}\langle f,f\rangle_\varphi = \langle f,P^l_{eoe}f\rangle_\varphi.$$
By Fact 3.3 the matrices $P_e$ and $P_o$ are positive semidefinite, and thus $P^\tau_j$, $(P^\tau_j)^3$, $P_{eoe}$ and $P^l_{eoe}$ are also positive semidefinite. Then, $\lambda((P^\tau_j)^3) \ge \lambda(P^l_{eoe})$. Since $x^3 - 3x + 2 = (x-1)^2(x+2) \ge 0$ for $|x| \le 1$, we have $\lambda(P^\tau_j) \ge \frac{1}{3}\lambda((P^\tau_j)^3)$. Moreover,
$$\mathcal{E}_{P^l_{eoe}}(f,f) = \langle f,(I - P^l_{eoe})f\rangle_\varphi = \frac{1}{8}\mathcal{E}_{P_{eoe}}(f,f),$$
and so $\lambda(P^l_{eoe}) = \frac{1}{8}\lambda(P_{eoe})$. Hence,
$$\lambda(P^\tau_j) \ge \frac{1}{24}\lambda(P_{eoe}). \tag{15}$$
We bound next $\lambda(P_{eoe})$. Let $B_j^{(1)},B_j^{(2)},\dots,B_j^{(l)}$ be the $d$-dimensional cubes of volume at most $L^d$ that form the tiling $B_j$. For $k = 1,\dots,l$ let $P^{(k)}_e$ and $P^{(k)}_o$ be the $|\Omega^\tau(B_j^{(k)})|\times|\Omega^\tau(B_j^{(k)})|$ transition matrices that correspond to a heat-bath update on $V_e\cap B_j^{(k)}$ and $V_o\cap B_j^{(k)}$, respectively. Let $P^{(k)}_{eoe} = P^{(k)}_eP^{(k)}_oP^{(k)}_e$. For $\sigma,\sigma'\in\Omega^\tau(B_j)$, we have
$$P_{eoe}(\sigma,\sigma') = \prod_{k=1}^lP^{(k)}_{eoe}\big(\sigma(B_j^{(k)}),\sigma'(B_j^{(k)})\big).$$
Moreover, $P^{(k)}_{eoe}$ is ergodic and reversible w.r.t. the probability measure induced in $B_j^{(k)}$ by $\varphi$. Ergodicity follows from the fact that, by assumption, the heat-bath dynamics on $B_j^{(k)}$ is ergodic; see Section 2.2. Thus, Lemma 4.7 implies
$$\lambda(P_{eoe}) = \min_{k=1,\dots,l}\lambda(P^{(k)}_{eoe}). \tag{16}$$
We bound $\lambda(P^{(k)}_{eoe})$ for each $k$ with a crude coupling argument. This is sufficient because each $B_j^{(k)}$ has volume at most $L^d = O(1)$. For any $U\subseteq B_j^{(k)}$ and any spin configuration $\eta$ on $B_j^{(k)}\setminus U$, the probability of each valid configuration on $U$ given $\eta$ and $\tau$ can be crudely bounded from below by $(qe)^{-\Omega(L^d)}$. Since $P^{(k)}_{eoe}$ is irreducible, for any pair of configurations $\sigma,\sigma'$ of $B_j^{(k)}$ we can go from $\sigma$ to $\sigma'$ in at most $T = q^{L^d}$ steps. Therefore, the probability that a realization of $P^{(k)}_{eoe}$ follows this sequence of updates is at least $(qe)^{-\Omega(TL^d)}$. Moreover, the probability that an instance of $P^{(k)}_{eoe}$ that starts in $\sigma'$ remains at $\sigma'$ after $T$ steps is also at least $(qe)^{-\Omega(TL^d)}$. Thus, there exists a coupling for the steps of $P^{(k)}_{eoe}$ that, starting from an arbitrary pair of configurations, couples in $O(1)$ steps with probability $\Omega(1)$. Consequently, $\lambda(P^{(k)}_{eoe}) = \Omega(1)$ for all $k$. This bound, together with (16) and (15), implies that $\lambda(P^\tau_j) = \Omega(1)$, and so $\lambda_{\min} = \Omega(1)$. The result follows from (13).

We conclude this section with the proofs of Facts 5.2 and 5.3.

Proof of Fact 5.2.
Since $A\subseteq B$, $K_B = K_AK_BK_A$. Then, for any $f\in\mathbb{R}^{|\Omega|}$,
$$\langle f,K_Bf\rangle_\mu = \langle f,K_AK_BK_Af\rangle_\mu = \langle K_Af,K_BK_Af\rangle_\mu \le \langle K_Af,K_Af\rangle_\mu = \langle f,K_Af\rangle_\mu,$$
where the inequality follows from (4). Then, we get $\mathcal{E}_B(f,f) \ge \mathcal{E}_A(f,f)$.

Proof of Fact 5.3.
To simplify the notation, let $K_i = K_{U_i}$ and $K_{ij} = K_{U_i\cup U_j}$. By assumption $K_{ij} = K_iK_j = K_jK_i$; also, $K_i^2 = K_i$. Then, $I - K_i = (I - K_i)^2$, $(I - K_i)(I - K_j) = (I - K_j)(I - K_i)$, and
$$\langle f,(I - K_i - K_j + K_{ij})f\rangle_\mu = \langle f,(I - K_i)(I - K_j)f\rangle_\mu = \langle(I - K_i)(I - K_j)f,(I - K_i)(I - K_j)f\rangle_\mu \ge 0.$$
Hence, $\mathcal{E}_{K_i}(f,f) + \mathcal{E}_{K_j}(f,f) \ge \mathcal{E}_{K_{ij}}(f,f)$. Applying this first to $U_1$ and $U_2$, and then iterating, we get the result.

6 SSM and the systematic scan dynamics
Let $V\subset\mathbb{Z}^d$ be a finite $d$-dimensional cube of volume $n$. Let $G = (V,E)$ be the induced subgraph and let $\psi$ be a fixed boundary condition on $\partial V$. For ease of notation we use $\mu$ for $\mu^\psi$.

We consider in this section the class of systematic scan Markov chains on $G$. In a systematic scan chain there is a fixed ordering $O$ of the vertices of $G$, and one step of the chain consists of updating every $v\in V$ according to the conditional distribution at $v$ given the configuration of its neighbors and the boundary condition $\psi$, in the order specified by $O$. We use $\mathcal{M}(O)$ to denote the systematic scan dynamics w.r.t. the ordering $O$ and $\mathcal{S}(O)$ to denote its transition matrix. Hence, if $O = \{v_1,\dots,v_n\}$, then
$$\mathcal{S}(O) = K_{v_1}\cdots K_{v_n}.$$
(Recall that $K_{v_i}$ is the transition matrix corresponding to a heat-bath update at $v_i$.) Since each $K_{v_i}$ leaves $\mu$ invariant, $\mu$ is the equilibrium distribution of $\mathcal{S}(O)$. In general $\mathcal{S}(O)$ is non-reversible, but one can obtain a reversible matrix by multiplicative symmetrization (see, e.g., [17, 37]); using $K_{v_n}^2 = K_{v_n}$,
$$\mathcal{S}(O)\mathcal{S}(O)^* = K_{v_1}\cdots K_{v_{n-1}}K_{v_n}K_{v_{n-1}}\cdots K_{v_1},$$
which corresponds to the systematic scan dynamics $\mathcal{M}(O')$ with $O' = \{v_1,\dots,v_n,\dots,v_1\}$.

In this section we prove three results related to the speed of convergence to equilibrium of systematic scan dynamics. These results correspond to Theorems 1.5 and 1.6 from the introduction.

The first of our results concerns the alternating scan dynamics, which corresponds to the systematic scan dynamics whose ordering consists of first all the even vertices and then all the odd ones. In fact, we consider the multiplicative reversiblization of this dynamics as above. More formally, let $EO$ be an ordering of the vertices of $V$ that first contains all even vertices and then all the odd ones.
Similarly, define the ordering $EOE$ that contains all even vertices, then all the odd ones, and finally all the even ones again. The alternating scan dynamics on $G$ corresponds to the systematic scan dynamics $M(EO)$. The relaxation time of the non-reversible chain $M(EO)$ is given by
$$\tau_{\rm rel}(M(EO)) = \frac{1}{1 - \sqrt{1 - \lambda(S(EOE))}}; \tag{17}$$
see, e.g., [17, 37]. Thus, we may restrict our attention to estimating the spectral gap of the reversible Markov chain $M(EOE)$. Let $V_e$ (resp., $V_o$) be the set of even (resp., odd) vertices of $G$. Then $S(EOE) = K_{V_e} K_{V_o} K_{V_e}$. We prove the following.

Theorem 6.1.
SSM implies that $\lambda(S(EOE)) = \Omega(1)$.

We observe that Theorem 6.1 and (17) imply Theorem 1.6 from the introduction.

For the special case of monotone spin systems we show that SSM implies rapid mixing of any systematic scan dynamics. In a monotone system, for each vertex $v \in V$ there is a linear ordering $\succeq_v$ of the spins. These linear orderings induce a partial order $\succeq$ over the state space. The spin system is monotone w.r.t. this partial order if for every $B \subset V$ and every pair of boundary conditions $\xi_1 \succeq \xi_2$ on $\partial B$, $\mu_B^{\xi_1}$ stochastically dominates $\mu_B^{\xi_2}$. From this definition it follows that a monotone system has unique maximal and minimal configurations in the partial order $\succeq$, a fact that will be crucially used in our proofs. Several well-known spin systems, including the Ising model and the hard-core model, are monotone systems.

For monotone systems we establish the following two theorems, which together imply Theorem 1.5 from the introduction.

Theorem 6.2.
Let $O$ be an ordering of the vertices in $V$. In a monotone system, SSM implies that the mixing time of $M(O)$ is $O(\log n\,(\log\log n)^2)$.
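As a concrete illustration of $M(O)$ and of the kind of monotone coupling used in the proofs below, here is a minimal Python sketch (ours, not the authors') for the ferromagnetic Ising model, a monotone system, on an $L \times L$ box. Assumptions for illustration only: a free boundary in place of a general $\psi$, inverse temperature `beta`, and the hypothetical helpers `coupled_sweep` and `coupling_time`. The coupling shown is the standard one for heat-bath updates: at each vertex a single uniform random number is shared by all copies, and since the conditional probability of a $+$ spin is increasing in the neighboring spins, the order $X_t \succeq Y_t$ is preserved. Running two copies from the all-plus (maximal) and all-minus (minimal) configurations until they agree gives one sample of the coupling time.

```python
import math
import random


def neighbors(v, L):
    """Nearest neighbors of v = (i, j) inside the L x L box (free boundary)."""
    i, j = v
    for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
        if 0 <= a < L and 0 <= b < L:
            yield (a, b)


def coupled_sweep(copies, order, L, beta, rng):
    """One step of M(O), applied in place to every configuration in `copies`.

    Vertices are heat-bath updated one at a time in the order O. Each update
    uses a single uniform u shared by all copies, which preserves the
    pointwise order between copies (a monotone coupling for Ising).
    """
    for v in order:
        u = rng.random()
        for sigma in copies:
            field = sum(sigma[w] for w in neighbors(v, L))
            p_plus = 1.0 / (1.0 + math.exp(-2.0 * beta * field))  # mu(sigma_v = +1 | rest)
            sigma[v] = 1 if u < p_plus else -1


def coupling_time(L, beta, order, rng, max_sweeps=10_000):
    """Number of sweeps until the copies started at the extremes coincide."""
    top = {(i, j): 1 for i in range(L) for j in range(L)}      # maximal configuration
    bottom = {(i, j): -1 for i in range(L) for j in range(L)}  # minimal configuration
    for t in range(1, max_sweeps + 1):
        coupled_sweep([top, bottom], order, L, beta, rng)
        if top == bottom:
            return t
    return None  # no coalescence within max_sweeps


if __name__ == "__main__":
    L = 8
    row_major = [(i, j) for i in range(L) for j in range(L)]
    even_odd = sorted(row_major, key=lambda v: (v[0] + v[1]) % 2)  # the ordering EO
    rng = random.Random(0)
    for name, order in (("row-major", row_major), ("EO", even_odd)):
        print(name, coupling_time(L, 0.3, order, rng))
```

Comparing the row-major ordering with $EO$ gives a rough feel for how the ordering affects coupling; a simulation on one small box says nothing about the asymptotic bound of Theorem 6.2, which holds for every ordering.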
We emphasize that Theorem 6.2 holds for any ordering $O$ and any boundary condition $\psi$ on $\partial V$.

Let $L(O)$ be the length of the longest subsequence of $O$ that is a path in $G$. With the additional assumption that $L(O) = O(1)$ we can prove a slightly better bound for the mixing time of the systematic scan dynamics.

Theorem 6.3.
Let $O$ be an ordering of the vertices in $V$ such that $L(O) = O(1)$. In a monotone system, SSM implies that the mixing time of $M(O)$ is $O(\log n)$ and that the spectral gap of $M(O)$ is $\Omega(1)$.

We proceed to give proofs of these three theorems. We start with the proof of Theorem 6.1, which is deduced straightforwardly from the following more general fact.
Lemma 6.4.
Let $S, T$ be positive semidefinite stochastic matrices, reversible w.r.t. $\mu$. Assume that $S$ is also idempotent. Then, for all $a \in [0,1]$,
$$\lambda(STS) \ge \lambda(aS + (1-a)T). \tag{18}$$

Proof of Theorem 6.1.
Since, by Fact 3.3, $K_{V_e}$ and $K_{V_o}$ are positive semidefinite matrices, and $K_{V_e}$ is idempotent, it follows from Lemma 6.4 with $a = 1/2$ that
$$\lambda(S(EOE)) \ge \lambda\Big(\frac{K_{V_e} + K_{V_o}}{2}\Big) = \lambda(B_{eo}),$$
where $B_{eo}$ is the block dynamics considered in Section 5. From Lemma 5.1 we know that $\lambda(B_{eo}) = \Omega(1)$ whenever SSM holds, and thus the result follows.

Proof of Lemma 6.4.
Let $P = STS$. For any $f \in \mathbb{R}^{|\Omega|}$,
$$\langle f, Pf \rangle_\mu = \langle f, STSf \rangle_\mu = \langle Sf, TSf \rangle_\mu \ge 0,$$
since by assumption $T$ is positive semidefinite. Hence, $P$ is positive semidefinite and $\lambda(P) = 1 - \lambda_2(P)$, where $\lambda_2(P)$ is the maximal eigenvalue of $P$ different from 1. By the variational principle (see (6)),
$$\lambda_2(P) = \max_{f:\, \mu(f) = 0,\ \|f\| \le 1} \langle f, Pf \rangle_\mu,$$
where $\mu(f) = \sum_{\sigma \in \Omega} f(\sigma)\mu(\sigma)$ and $\|f\|^2 = \langle f, f \rangle_\mu$.

Let $Q = aS + (1-a)T$ and let $g \in \mathbb{R}^{|\Omega|}$ be such that $\mu(g) = 0$ and $\|g\| = 1$. Then
$$\lambda_2(Q) = \max_{f:\, \mu(f) = 0,\ \|f\| \le 1} \langle f, Qf \rangle_\mu \ge \langle Sg, QSg \rangle_\mu = \langle g, SQSg \rangle_\mu, \tag{19}$$
where the inequality follows from the fact that any $g$ with $\mu(g) = 0$ and $\|g\| = 1$ satisfies $\mu(Sg) = \langle \vec{1}, Sg \rangle_\mu = \langle S\vec{1}, g \rangle_\mu = \mu(g) = 0$ and $\|Sg\|^2 = \langle g, S^2 g \rangle_\mu \le 1$ by (4). On the other hand,
$$\langle f, Pf \rangle_\mu = \langle Sf, TSf \rangle_\mu \le \langle Sf, Sf \rangle_\mu = \langle f, Sf \rangle_\mu,$$
where the inequality follows from (4) and the last equality uses $S^2 = S$. Hence, since $SQS = aS + (1-a)P$,
$$\langle g, SQSg \rangle_\mu = a\langle g, Sg \rangle_\mu + (1-a)\langle g, Pg \rangle_\mu \ge \langle g, Pg \rangle_\mu.$$
Therefore, taking $g$ to be a normalized eigenfunction corresponding to $\lambda_2(P)$, we get $\lambda_2(Q) \ge \lambda_2(P)$. This proves $\lambda(STS) \ge \lambda(aS + (1-a)T)$, as desired.

Remark. A reverse inequality for (18) also holds. Indeed, using the same argument as in the proof of the estimate (15), one has, for all $a \in (0,1)$,
$$\lambda(STS) \le \frac{1}{a(1-a)}\,\lambda(aS + (1-a)T).$$

We provide next the proofs of Theorems 6.2 and 6.3.
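Lemma 6.4, the Remark and formula (17) can be sanity-checked numerically on a toy instance. The Python sketch below is an illustration under our own assumptions, not part of the paper: a free-boundary Ising model on a path with four vertices stands in for $G$, $S = K_{V_e}$ and $T = K_{V_o}$ are the block heat-bath kernels on the even sites $\{0, 2\}$ and the odd sites $\{1, 3\}$, and the helper names are hypothetical. Spectral gaps are computed through the similarity $D^{1/2} P D^{-1/2}$ with $D = \mathrm{diag}(\mu)$, which turns a $\mu$-reversible matrix into a symmetric one with the same eigenvalues.

```python
import itertools

import numpy as np


def ising_path(n, beta):
    """All states and the Gibbs distribution of a free-boundary Ising model on the path 0-1-...-(n-1)."""
    states = list(itertools.product([-1, 1], repeat=n))
    w = np.array([np.exp(beta * sum(s[i] * s[i + 1] for i in range(n - 1))) for s in states])
    return states, w / w.sum()


def block_heat_bath(states, mu, block):
    """K_U: resample the spins in `block` from mu conditioned on all other spins."""
    outside = [i for i in range(len(states[0])) if i not in block]
    keys = [tuple(s[i] for i in outside) for s in states]
    K = np.zeros((len(states), len(states)))
    for x, kx in enumerate(keys):
        same = [y for y, ky in enumerate(keys) if ky == kx]  # states agreeing off the block
        K[x, same] = mu[same] / mu[same].sum()
    return K


def symmetrize(P, mu):
    """D^{1/2} P D^{-1/2} with D = diag(mu); symmetric when P is mu-reversible."""
    d = np.sqrt(mu)
    return d[:, None] * P / d[None, :]


def spectral_gap(P, mu):
    """1 minus the second-largest eigenvalue (with multiplicity) of a mu-reversible P."""
    A = symmetrize(P, mu)
    eig = np.sort(np.linalg.eigvalsh((A + A.T) / 2))[::-1]
    return 1.0 - eig[1]


states, mu = ising_path(4, beta=0.8)
S = block_heat_bath(states, mu, block=[0, 2])  # K_{V_e}: even sites of the path
T = block_heat_bath(states, mu, block=[1, 3])  # K_{V_o}: odd sites of the path
gap_sts = spectral_gap(S @ T @ S, mu)          # lambda(S(EOE))

for a in np.linspace(0.0, 1.0, 11):
    gap_q = spectral_gap(a * S + (1 - a) * T, mu)
    # Remark (left inequality) and Lemma 6.4 (right inequality), up to rounding.
    assert a * (1 - a) * gap_sts - 1e-10 <= gap_q <= gap_sts + 1e-10

# (17): relaxation time of M(EO) vs. 1/(1 - second singular value of S(EO)).
tau_rel = 1.0 / (1.0 - np.sqrt(1.0 - gap_sts))
sigma = np.linalg.svd(symmetrize(S @ T, mu), compute_uv=False)  # descending order
print(gap_sts, tau_rel, 1.0 / (1.0 - sigma[1]))
```

Because $K_{V_o}$ is idempotent and self-adjoint in $L^2(\mu)$, the second singular value of $S(EO) = K_{V_e}K_{V_o}$ equals $\sqrt{1 - \lambda(S(EOE))}$, so the last two printed numbers should coincide, matching (17) when the relaxation time of the non-reversible chain is read as $1/(1 - \sigma_2)$. A check like this only confirms the inequalities on one instance; it does not bear on the uniform bound of Theorem 6.1.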
Proof of Theorem 6.2.
Let $\sigma^{(1)}, \dots, \sigma^{(k)}$ be spin configurations such that $\sigma^{(1)} \succeq \cdots \succeq \sigma^{(k)}$. For any $v \in V$, the monotonicity of the system implies that there exists a monotone coupling for updating $v$ simultaneously in $\sigma^{(1)}, \dots, \sigma^{(k)}$ such that the resulting configurations, denoted $\sigma^{(1)}_v, \dots, \sigma^{(k)}_v$, satisfy $\sigma^{(1)}_v \succeq \cdots \succeq \sigma^{(k)}_v$. These local couplings extend straightforwardly to a monotone coupling for the steps of any number of copies of $M(O)$. Indeed, let $\{X^{(1)}_t\}, \dots, \{X^{(k)}_t\}$ be $k$ copies of $M(O)$ and suppose $X^{(1)}_t \succeq \cdots \succeq X^{(k)}_t$. If the local monotone couplings are used to update each vertex $v \in V$ in $X^{(1)}_t, \dots, X^{(k)}_t$ sequentially, in the order specified by $O$, then $X^{(1)}_{t+1} \succeq \cdots \succeq X^{(k)}_{t+1}$.

We bound the coupling time of the monotone coupling for two instances $\{X_t\}$ and $\{Y_t\}$ of $M(O)$. Since in monotone systems there are unique maximal and minimal configurations in the partial order, it is sufficient to analyze the coupling time starting from these extremal configurations. Thus, suppose that $X_0$ and $Y_0$ are the maximal and minimal configurations, respectively, and let $T_{\rm coup}$ be the coupling time of the monotone coupling starting from these two configurations.

We show that $T_{\rm coup} \le T = c\log n\,(\log\log n)^2$ for a suitable constant $c > 0$. This implies that the mixing time of $M(O)$ is $O(\log n\,(\log\log n)^2)$, as claimed. The proof is inductive. For the base case of the induction, observe that if $|V| \le n_0$, where $n_0$ is a large constant to be chosen later, then we can choose $c = c(n_0)$ large enough that the coupling time bound holds for any boundary condition on $\partial V$.
This is a consequence of the irreducibility of $M(O)$, which follows from the assumption that the Glauber dynamics is irreducible; see Section 2.2.

Let us now assume inductively that for all $d$-dimensional cubes $V' \subset \mathbb{Z}^d$ such that $|V'| \le (4a^{-1}\log n)^d$ (where $a$ is the constant in the definition of SSM), any boundary condition on $\partial V'$ and any ordering $O'$ of the vertices of $V'$, the coupling time of the monotone coupling in the subgraph induced by $V'$ (w.r.t. the ordering $O'$) is at most $c\log|V'|\,(\log\log|V'|)^2$.

We show that, for all $v \in V$, after $T = c\log n\,(\log\log n)^2$ steps of the monotone coupling we have
$$\Pr[X_T(v) \ne Y_T(v)] \le \frac{1}{4n}. \tag{20}$$
A union bound over the vertices then implies that $T_{\rm coup} \le T$. We introduce some notation first.

For $v \in V$ and $\ell > 0$, let $B_v(\ell) \subset V$ be the intersection of $V$ with the $d$-dimensional cube of $\mathbb{Z}^d$ of side length $2\ell + 1$ centered at $v$. Let $r = 2a^{-1}\log n$, $B_v = B_v(r)$ and $B_v^c = V \setminus B_v$.

For each $v \in V$ we consider four additional copies of $M(O)$: $\{W_t\}$, $\{W^\mu_t\}$, $\{Z_t\}$ and $\{Z^\mu_t\}$. These four chains ignore all the updates outside of $B_v$, and their steps are coupled with those of $\{X_t\}$ and $\{Y_t\}$. More precisely, for each $u \in V$ (in the order specified by $O$), if $u \in B_v$ then the local monotone coupling is used to update $W_t(u)$, $W^\mu_t(u)$, $Z_t(u)$, $Z^\mu_t(u)$, $X_t(u)$ and $Y_t(u)$. Otherwise, if $u \notin B_v$, the local monotone coupling is used only to update $X_t(u)$ and $Y_t(u)$, and $W_t(u)$, $W^\mu_t(u)$, $Z_t(u)$ and $Z^\mu_t(u)$ are not updated.

We specify next the initial configurations of these chains. We set $W_0 = X_0$, $Z_0 = Y_0$, $W^\mu_0(B_v^c) = X_0(B_v^c)$ and $Z^\mu_0(B_v^c) = Y_0(B_v^c)$. To define the configurations of $W^\mu_0$ and $Z^\mu_0$ in $B_v$, let $\phi_w$ and $\phi_z$ be the stationary measures of $\{W_t\}$ and $\{Z_t\}$, respectively.
These are the distributions induced in $B_v$ by the configurations $W^\mu_0(B_v^c)$ and $Z^\mu_0(B_v^c)$, respectively, and possibly the boundary condition $\psi$ on $\partial V$. The configurations $W^\mu_0(B_v)$ and $Z^\mu_0(B_v)$ are sampled independently from $\phi_w$ and $\phi_z$, respectively.

Our choice of initial configurations and the monotonicity of the coupling imply that $W_t \succeq X_t \succeq Y_t \succeq Z_t$ for all $t \ge 0$. Hence,
$$\Pr[X_T(v) \ne Y_T(v)] \le \Pr[W_T(v) \ne Z_T(v)] \le \Pr[W_T(v) \ne W^\mu_T(v)] + \Pr[W^\mu_T(v) \ne Z^\mu_T(v)] + \Pr[Z^\mu_T(v) \ne Z_T(v)], \tag{21}$$
where the first inequality holds because $X_T(v) \ne Y_T(v)$ together with $W_T \succeq X_T \succeq Y_T \succeq Z_T$ forces $W_T(v) \ne Z_T(v)$, and the second follows from a union bound. We bound the first and third terms on the right-hand side of (21) using the inductive hypothesis; the bound for the middle term follows from SSM.

The chains $\{W_t\}$, $\{Z_t\}$, $\{W^\mu_t\}$ and $\{Z^\mu_t\}$ are systematic scan dynamics on $B_v$ w.r.t. the ordering $O(B_v)$ that $O$ induces on the vertices of $B_v$. Since $|B_v| \le (2r)^d = (4a^{-1}\log n)^d$, for $n_0$ sufficiently large and $n \ge n_0$,
$$c\log n\,(\log\log n)^2 \ge c\log|B_v|\,(\log\log|B_v|)^2\,\log(12n).$$
So, the inductive hypothesis and (3) imply that
$$\Pr[W_T(v) \ne W^\mu_T(v)] \le \frac{1}{12n}.$$
The same bound for $\Pr[Z^\mu_T(v) \ne Z_T(v)]$ can be deduced analogously.

To bound the probability that $W^\mu_T(v) \ne Z^\mu_T(v)$, i.e., the middle term of (21), let us assume without loss of generality that the linear ordering on the spins is $q \succeq q-1 \succeq \cdots \succeq 1$. Since $W^\mu_t(v) \succeq Z^\mu_t(v)$ for all $t \ge 0$, we have $W^\mu_t(v) \ge Z^\mu_t(v)$ as integers. Moreover, the configurations $W^\mu_t(B_v)$ and $Z^\mu_t(B_v)$ are distributed according to $\phi_w$ and $\phi_z$, respectively, for all $t \ge 0$. Therefore,
$$\Pr[W^\mu_T(v) \ne Z^\mu_T(v)] \le \mathbb{E}[W^\mu_T(v) - Z^\mu_T(v)] \le (q-1)\,\|\phi_{w,v} - \phi_{z,v}\|_{\rm tv},$$
where $\phi_{w,v}$ and $\phi_{z,v}$ are the distributions induced on $\{v\}$ by $\phi_w$ and $\phi_z$, respectively. Hence, SSM and a union bound over the boundary of $B_v$ imply that
$$\Pr[W^\mu_T(v) \ne Z^\mu_T(v)] \le (q-1)\,b\,|\partial B_v|\exp(-ar) \le \frac{2(q-1)\,b\,d\,(4a^{-1}\log n)^{d-1}}{n^2} \le \frac{1}{12n},$$
where in the second inequality we used that $|\partial B_v| \le 2d(2r)^{d-1}$ and $r = 2a^{-1}\log n$, and the last one holds for all $n \ge n_0$ with $n_0$ large enough. Putting all these bounds together we get (20). A union bound over the vertices implies that $\Pr[X_T \ne Y_T] \le 1/4$. Consequently, $T_{\rm coup} \le T = c\log n\,(\log\log n)^2$ and the mixing time of $M(O)$ is $O(\log n\,(\log\log n)^2)$.

Proof of Theorem 6.3.
Let $\{X_t\}$, $\{Y_t\}$ be two copies of $M(O)$ such that $X_0$ and $Y_0$ are the unique maximal and minimal configurations of the partial order, respectively. We couple these two realizations of $M(O)$ with the monotone coupling described at the beginning of the proof of Theorem 6.2, where we established that the coupling time of this monotone coupling in a $d$-dimensional cube $V$ with an arbitrary boundary condition $\psi$ is at most $c\log|V|\,(\log\log|V|)^2$.

Let $\rho(t) = \max_{v \in V}\Pr[X_t(v) \ne Y_t(v)]$. We show that $\rho(T) \le 1/n^2$ for some $T = O(\log n)$. A union bound over the vertices then implies that $\Pr[X_T \ne Y_T] \le 1/n$, and thus $\tau_{\rm mix}(M(O), 1/n) = O(\log n)$. Consequently, the mixing time of $M(O)$ is at most $T = O(\log n)$ and its relaxation time is $O(1)$ by (5).

To bound $\rho$ we establish a recurrence relation. Prior to this, we show that after $t_0 = \lceil (\log n_0)^2 \log n_0 \rceil$ steps, $\rho(t_0) \le 1/n_0$, where $n_0$ is a sufficiently large constant. This will provide a stopping point for our recurrence for $\rho$.

As before, for $v \in V$ and $\ell > 0$, let $B_v(\ell) \subset V$ be the intersection of the $d$-dimensional cube of side length $2\ell + 1$ centered at $v$ with $V$. Let $r = \lfloor n_0^{1/d}/2 \rfloor$ and let $B_v = B_v(r)$. Let $\{W_t\}$ and $\{Z_t\}$ be two auxiliary copies of $M(O)$ such that $X_0(B_v) = W_0(B_v)$ and $Y_0(B_v) = Z_0(B_v)$. In $B_v^c = V \setminus B_v$, $W_0$ and $Z_0$ have the same fixed configuration; this configuration can be any valid configuration provided $W_0(B_v^c) = Z_0(B_v^c)$.

These four copies of the chain are coupled with the monotone coupling, but $\{W_t\}$ and $\{Z_t\}$ ignore all the updates outside of $B_v$. That is, for each $u \in V$ (in the order specified by $O$), if $u \in B_v$ then the local monotone coupling is used to update the spins $W_t(u)$, $Z_t(u)$, $X_t(u)$ and $Y_t(u)$. Otherwise, if $u \notin B_v$, the local monotone coupling is used only to update $X_t(u)$ and $Y_t(u)$, and $W_t(u)$, $Z_t(u)$ are not updated.
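The propagation bound used in the rest of this proof (in $t$ sweeps a disagreement travels a distance of at most $tl$, with $l = L(O)$) rests on the observation that the successive vertices a disagreement reaches within one sweep must be updated in increasing order of position in $O$, and so form a subsequence of $O$ that is a path in $G$. For a given ordering, $L(O)$ can be computed by a simple dynamic program over $O$. The Python sketch below assumes the caller supplies an adjacency predicate; the names `longest_path_subsequence` and `grid_adjacent` are hypothetical.

```python
def longest_path_subsequence(order, adjacent):
    """Compute L(O): the largest k such that some subsequence
    v_{i_1}, ..., v_{i_k} (i_1 < ... < i_k) of the ordering O is a path in G,
    i.e. consecutive entries are adjacent. Dynamic programming over O."""
    best = {}  # best[v] = length of the longest such subsequence ending at v
    for v in order:
        # Only vertices already seen (earlier in O) can precede v on the path.
        best[v] = 1 + max((best[u] for u in best if adjacent(u, v)), default=0)
    return max(best.values(), default=0)


def grid_adjacent(u, v):
    """Nearest-neighbor adjacency in Z^2."""
    return abs(u[0] - v[0]) + abs(u[1] - v[1]) == 1


if __name__ == "__main__":
    L = 6
    row_major = [(i, j) for i in range(L) for j in range(L)]
    even_odd = sorted(row_major, key=lambda v: (v[0] + v[1]) % 2)  # the ordering EO
    print(longest_path_subsequence(row_major, grid_adjacent))  # 11, i.e. 2L - 1
    print(longest_path_subsequence(even_odd, grid_adjacent))   # 2
```

On an $L \times L$ box the row-major ordering has $L(O) = 2L - 1$, which grows with $n$, so the hypothesis of Theorem 6.3 fails for it (Theorem 6.2 still applies), whereas $L(EO) = 2$ because no even vertex follows an odd one in $EO$. The quadratic scan over earlier vertices keeps the sketch short; for large graphs one would iterate over the neighbors of $v$ instead.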
A union bound implies
$$\Pr[X_{t_0}(v) \ne Y_{t_0}(v)] \le \Pr[X_{t_0}(v) \ne W_{t_0}(v)] + \Pr[W_{t_0}(v) \ne Z_{t_0}(v)] + \Pr[Z_{t_0}(v) \ne Y_{t_0}(v)].$$
Let $l = L(O)$ and observe that $r = \lfloor n_0^{1/d}/2 \rfloor > t_0 l$ for $n_0$ sufficiently large. Thus, $\Pr[X_{t_0}(v) \ne W_{t_0}(v)] = 0$ and $\Pr[Z_{t_0}(v) \ne Y_{t_0}(v)] = 0$, since in $t_0$ steps disagreements cannot propagate from $\partial B_v$ to $v$. Hence, $\Pr[X_{t_0}(v) \ne Y_{t_0}(v)] \le \Pr[W_{t_0}(v) \ne Z_{t_0}(v)]$.

Now, let $O(B_v)$ be the ordering induced on $B_v$ by $O$. Since $|B_v| \le (2r)^d \le n_0$, the coupling time of the monotone coupling for the systematic scan chain on $B_v$ (w.r.t. $O(B_v)$) is at most $(\log n_0)^2$, provided $n_0$ is sufficiently large (see the proof of Theorem 6.2). Hence, since $t_0 = \lceil (\log n_0)^2 \log n_0 \rceil$, (3) implies that $\Pr[W_{t_0}(v) \ne Z_{t_0}(v)] \le 1/n_0$, and so
$$\rho(t_0) \le \frac{1}{n_0}. \tag{22}$$

We establish next our recurrence for $\rho$. We prove that
$$\rho(2t) \le (4tl)^d \rho(t)^2 \tag{23}$$
for all $t = o((\log n)^2)$. Let $\mathcal{A}$ be the event that $X_t(B_v(2tl)) \ne Y_t(B_v(2tl))$. (The restriction $t = o((\log n)^2)$ ensures that $tl \ll n$ and avoids unnecessary complications.) Then,
$$\Pr[X_{2t}(v) \ne Y_{2t}(v)] \le \Pr[X_{2t}(v) \ne Y_{2t}(v) \mid \mathcal{A}]\,\Pr[\mathcal{A}] + \Pr[X_{2t}(v) \ne Y_{2t}(v) \mid \neg\mathcal{A}].$$
Observe that
$\Pr[X_{2t}(v) \ne Y_{2t}(v) \mid \mathcal{A}] \le \rho(t)$, since $\rho(t)$ is the maximum probability of disagreement at any vertex assuming the worst possible pair of starting configurations. Moreover, $\Pr[\mathcal{A}] \le |B_v(2tl)|\,\rho(t)$ by a union bound, and $\Pr[X_{2t}(v) \ne Y_{2t}(v) \mid \neg\mathcal{A}] = 0$ since disagreements can only propagate a distance of at most $tl$ in $t$ steps. Hence, for all $v \in V$, $\Pr[X_{2t}(v) \ne Y_{2t}(v)] \le (4tl)^d \rho(t)^2$, and (23) follows.

Finally, we use this recurrence together with the stopping point (22) to show that $\rho(T) \le 1/n^2$ for some $T = O(\log n)$. Let $\phi(t) = (8tl)^d \rho(t)$. Then, by (23), $\phi(2t) \le (16tl)^d (4tl)^d \rho(t)^2 = \phi(t)^2$, and so for $T = 2^\alpha t_0$ we get $\rho(T) \le \phi(T) \le \phi(t_0)^{T/t_0}$. Since $\phi(t_0) = (8t_0 l)^d \rho(t_0) \le (8t_0 l)^d/n_0$ and $t_0 = \lceil (\log n_0)^2 \log n_0 \rceil$, for $n_0$ large enough we have $\phi(t_0) \le 1/e$, and thus $\rho(T) \le e^{-T/t_0}$. Taking $T = O(\log n)$ (i.e., $\alpha = O(\log\log n)$), we get $\rho(T) \le 1/n^2$, as desired.

References

[1] M. Aizenman and R. Holley. Rapid convergence to equilibrium of stochastic Ising models in the Dobrushin Shlosman regime. In Percolation theory and ergodic theory of infinite particle systems, pages 1–11. Springer, 1987.
[2] K.S. Alexander. On weak mixing in lattice models.
Probability Theory and Related Fields, 110(4):441–471, 1998.
[3] V. Beffara and H. Duminil-Copin. The self-dual point of the two-dimensional random-cluster model is critical for $q \ge 1$. Probability Theory and Related Fields, 153:511–542, 2012.
[4] A. Blanca and A. Sinclair. Dynamics for the mean-field random-cluster model. Proceedings of the 19th International Workshop on Randomization and Computation, pages 528–543, 2015.
[5] C. Borgs, J. Chayes, and P. Tetali. Swendsen-Wang algorithm at the Potts transition point. Probability Theory and Related Fields, 152:509–557, 2012.
[6] C. Borgs, A.M. Frieze, J.H. Kim, P. Tetali, E. Vigoda, and V. Vu. Torpid mixing of some Monte Carlo Markov chain algorithms in statistical physics. Proceedings of the 40th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 218–229, 1999.
[7] R. Bubley and M. Dyer. Path coupling: A technique for proving rapid mixing in Markov chains. In , pages 223–231. IEEE, 1997.
[8] F. Cesi. Quasi-factorization of the entropy and logarithmic Sobolev inequalities for Gibbs random fields. Probability Theory and Related Fields, 120(4):569–584, 2001.
[9] C. Cooper and A.M. Frieze. Mixing properties of the Swendsen-Wang process on classes of graphs. Random Structures and Algorithms, 15(3-4):242–261, 1999.
[10] H. Duminil-Copin, M. Gagnebin, M. Harel, I. Manolescu, and V. Tassion. Discontinuity of the phase transition for the planar random-cluster and Potts models with $q > 4$. arXiv preprint arXiv:1611.09877, 2016.
[11] H. Duminil-Copin, V. Sidoravicius, and V. Tassion. Continuity of the phase transition for planar random-cluster and Potts models with $1 \le q \le 4$. Communications in Mathematical Physics, 349(1):47–107, 2017.
[12] M. Dyer, L.A. Goldberg, and M. Jerrum. Systematic scan for sampling colorings. The Annals of Applied Probability, 16(1):185–230, 2006.
[13] M. Dyer, L.A. Goldberg, and M. Jerrum. Dobrushin conditions and systematic scan. Combinatorics, Probability and Computing, 17(6):761–779, 2008.
[14] M. Dyer, L.A. Goldberg, and M. Jerrum. Matrix norms and rapid mixing for spin systems. The Annals of Applied Probability, 19(1):71–107, 2009.
[15] M. Dyer, A. Sinclair, E. Vigoda, and D. Weitz. Mixing in time and space for lattice spin systems: A combinatorial view. Random Structures & Algorithms, 24:461–479, 2004.
[16] R.G. Edwards and A.D. Sokal. Generalization of the Fortuin-Kasteleyn-Swendsen-Wang representation and Monte Carlo algorithm. Physical Review D, 38(6):2009–2012, 1988.
[17] J. A. Fill. Eigenvalue bounds on convergence to stationarity for nonreversible Markov chains, with an application to the exclusion process. The Annals of Applied Probability, pages 62–87, 1991.
[18] A. Galanis, D. Štefankovič, and E. Vigoda. Swendsen-Wang algorithm on the mean-field Potts model. Proceedings of the 19th International Workshop on Randomization and Computation, pages 815–828, 2015.
[19] D. Gamarnik and D. Katz. Correlation decay and deterministic FPTAS for counting list-colorings of a graph. In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1245–1254. SIAM, 2007.
[20] R. Gheissari and E. Lubetzky. Mixing times of critical 2D Potts models. arXiv preprint arXiv:1607.02182, 2016.
[21] R. Gheissari and E. Lubetzky. The effect of boundary conditions on mixing of 2D Potts models at discontinuous phase transitions. arXiv preprint arXiv:1701.00181, 2017.
[22] H. Guo and M. Jerrum. Random cluster dynamics for the Ising model is rapidly mixing. In Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1818–1827. SIAM, 2017.
[23] H. Guo, K. Kara, and C. Zhang. Layerwise Systematic Scan: Deep Boltzmann Machines and Beyond. arXiv preprint arXiv:1705.05154, 2017.
[24] T.P. Hayes. A simple condition implying rapid mixing of single-site dynamics on spin systems. In , pages 39–46. IEEE, 2006.
[25] T.P. Hayes and A. Sinclair. A general lower bound for mixing of single-site dynamics on graphs. Annals of Applied Probability, 17(3):931–952, 2007.
[26] R. Holley. Possible rates of convergence in finite range, attractive spin systems. In Particle systems, random media and large deviations, volume 41 of Contemp. Math., pages 215–234. Amer. Math. Soc., Providence, RI, 1985.
[27] M. Jerrum, A. Sinclair, and E. Vigoda. A polynomial-time approximation algorithm for the permanent of a matrix with nonnegative entries. Journal of the ACM, 51(4):671–697, 2004.
[28] R. Kannan, L. Lovász, and M. Simonovits. Random walks and an $O^*(n^5)$ volume algorithm for convex bodies. Random Structures and Algorithms, 11(1):1–50, 1997.
[29] D.A. Levin, Y. Peres, and E.L. Wilmer. Markov Chains and Mixing Times. American Mathematical Society, 2008.
[30] L. Li, P. Lu, and Y. Yin. Correlation decay up to uniqueness in spin systems. In Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 67–84. SIAM, 2013.
[31] Y. Long, A. Nachmias, W. Ning, and Y. Peres. A power law of order 1/4 for critical mean-field Swendsen-Wang dynamics. Memoirs of the American Mathematical Society, 232(1092), 2011.
[32] F. Martinelli. Lectures on Glauber dynamics for discrete spin models, volume 1717 of Springer Lecture Notes in Mathematics. Springer Verlag, 1999.
[33] F. Martinelli and E. Olivieri. Approach to equilibrium of Glauber dynamics in the one phase region. I. The attractive case. Communications in Mathematical Physics, 161(3):447–486, 1994.
[34] F. Martinelli and E. Olivieri. Approach to equilibrium of Glauber dynamics in the one phase region. II. The general case. Communications in Mathematical Physics, 161(3):458–514, 1994.
[35] F. Martinelli, E. Olivieri, and R.H. Schonmann. For 2-d lattice spin systems weak mixing implies strong mixing. Communications in Mathematical Physics, 165(1):33–47, 1994.
[36] F. Martinelli, E. Olivieri, and E. Scoppola. On the Swendsen-Wang dynamics. I. Exponential convergence to equilibrium. Journal of Statistical Physics, 62(1-2):117–133, 1991.
[37] R. Montenegro and P. Tetali. Mathematical aspects of mixing times in Markov chains. Now Publishers Inc, 2006.
[38] J. Propp and D. Wilson. Exact sampling with coupled Markov chains and applications to statistical mechanics. Random Structures & Algorithms, 9:223–252, 1996.
[39] L. Saloff-Coste. Lectures on finite Markov chains, volume 1665 of Lecture Notes in Mathematics. Springer Berlin Heidelberg, 1997.
[40] A. Sinclair, P. Srivastava, D. Štefankovič, and Y. Yin. Spatial mixing and the connective constant: Optimal bounds. In Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1549–1563. SIAM, 2015.
[41] A. Sinclair, P. Srivastava, and M. Thurley. Approximation algorithms for two-state anti-ferromagnetic spin systems on bounded degree graphs. Journal of Statistical Physics, 155(4):666–686, 2014.
[42] A. Sly. Computational transition at the uniqueness threshold. In , pages 287–296. IEEE, 2010.
[43] A. Sly and N. Sun. The computational hardness of counting in two-spin models on d-regular graphs. In , pages 361–369. IEEE, 2012.
[44] D.W. Stroock and B. Zegarlinski. The logarithmic Sobolev inequality for discrete spin systems on a lattice. Communications in Mathematical Physics, 149(1):175–193, 1992.
[45] R.H. Swendsen and J.S. Wang. Nonuniversal critical dynamics in Monte Carlo simulations. Physical Review Letters, 58:86–88, 1987.
[46] M. Ullrich. Comparison of Swendsen-Wang and heat-bath dynamics. Random Structures and Algorithms, 42(4):520–535, 2013.
[47] M. Ullrich. Rapid mixing of Swendsen-Wang and single-bond dynamics in two dimensions. Dissertationes Mathematicae, 502:64, 2014.
[48] M. Ullrich. Swendsen-Wang is faster than single-bond dynamics. SIAM Journal on Discrete Mathematics, 28(1):37–48, 2014.
[49] D. Weitz. Counting independent sets up to the tree threshold. In Proceedings of the Thirty-Eighth Annual ACM Symposium on Theory of Computing, pages 140–149. ACM, 2006.
[50] B. Zegarlinski. On log-Sobolev inequalities for infinite lattice systems.