The Critical Mean-field Chayes-Machta Dynamics
aa r X i v : . [ m a t h . P R ] F e b The Critical Mean-field Chayes-Machta Dynamics
Antonio Blanca ∗ Alistair Sinclair † Xusheng Zhang ‡ February 8, 2021
Abstract
The random-cluster model is a unifying framework for studying random graphs, spin systems andrandom networks. The model is closely related to the classical ferromagnetic Ising and Potts modelsand is often viewed as a generalization of these models. In this paper, we study a natural non-localMarkov chain known as the
Chayes-Machta dynamics for the mean-field case of the random-clustermodel, where the underlying graph is the complete graph on 𝑛 vertices.The random-cluster model is parametrized by an edge probability 𝑝 and a cluster weight 𝑞 . Ourfocus is on the critical regime: 𝑝 = 𝑝 𝑐 ( 𝑞 ) and 𝑞 ∈ ( , ) , where 𝑝 𝑐 ( 𝑞 ) is the threshold correspondingto the order-disorder phase transition of the model. We show that the mixing time of the Chayes-Machta dynamics is 𝑂 ( log 𝑛 · log log 𝑛 ) in this parameter regime, which reveals that the dynamics doesnot undergo an exponential slowdown at criticality, a surprising fact that had been predicted (but notproved) by statistical physicists. This provides a nearly optimal bound (up to the log log 𝑛 factor) forthe mixing time of the mean-field Chayes-Machta dynamics in the only regime of parameters whereno previous bound was known. Our proof consists of a multi-phased coupling argument that combinesseveral key ingredients, including a new local limit theorem, a precise bound on the maximum ofsymmetric random walks with varying step sizes, and tailored estimates for critical random graphs. ∗ Pennsylvania State University. Email: [email protected]. Research supported in part by NSF grant CCF-1850443. † UC Berkeley. Email: [email protected]. Research supported in part by NSF grant CCF-1815328. ‡ Pennsylvania State University. Email: [email protected]. Research supported in part by NSF grant CCF-1850443.
Introduction
The random-cluster model was introduced in the 1960s by Fortuin and Kasteleyn [15] as a unifying frame-work for the study of random graphs, spin systems and electrical networks. It is defined on a finitegraph 𝐺 = ( 𝑉 , 𝐸 ) by an edge probability parameter 𝑝 ∈ ( , ) and a cluster weight 𝑞 >
0. The set of configurations of the model is the set of all subsets of edges 𝐴 ⊆ 𝐸 . The probability of each configuration 𝐴 is given by the Gibbs distribution: 𝜇 𝐺,𝑝,𝑞 ( 𝐴 ) = 𝑍 · 𝑝 | 𝐴 | ( − 𝑝 ) | 𝐸 |−| 𝐴 | 𝑞 𝑐 ( 𝐴 ) , (1)where 𝑐 ( 𝐴 ) is the number of connected components in ( 𝑉 , 𝐴 ) and 𝑍 is the normalizing constant called the partition function. When 𝑞 = 𝐺 where eachedge appears independently with probability 𝑝 . When 𝑞 > 𝑞 <
1) the resulting distribution favorssubgraphs with more (resp., fewer) connected components; hence, the random-cluster model is a strictgeneralization of the Erdős-Rényi model.When 𝑞 ≥ 𝑞 -state Ising/Potts model, the most classical of all spin systems. In this model, configurationsare assignments of spin values { , . . . , 𝑞 } to the vertices of 𝐺 ; the duality between the models is realized viaa coupling of their Gibbs distributions (see, e.g., [20]). The random-cluster model has become a key toolin the analysis of the Ising/Potts model phase transition [2, 13, 12], and also plays an indispensable role inthe design of efficient Markov Chain Monte Carlo (MCMC) sampling algorithms. In particular, random-cluster dynamics , i.e., Markov chains on random-cluster configurations that converge to the random-clustermeasure, are of major interest, as they provide efficient samplers for the ferromagnetic Ising/Potts modelseven in cases where standard MCMC methods for those models are known to require exponential time [31,8, 21].In this paper we investigate the Chayes-Machta (CM) dynamics [10], a natural generalization to non-integer values of 𝑞 of the widely studied Swendsen-Wang dynamics [30]. As with all applications of theMCMC method, the primary object of study is the mixing time , i.e., the number of steps until the dynam-ics is close to its stationary distribution, starting from the worst possible initial configuration. We areinterested in understanding how the mixing time of the CM dynamics grows as the size of the graph 𝐺 increases, and in particular how it relates to the phase transition of the model.Given a random-cluster configuration ( 𝑉 , 𝐴 ) , one step of the CM dynamics is defined as follows:(i) activate each connected component of ( 𝑉 , 𝐴 ) independently with probability 1 / 𝑞 ;(ii) remove all edges connecting active vertices;(iii) add each edge between active vertices independently with probability 𝑝 , leaving the rest of the con-figuration unchanged.We call step (i) the activation sub-step, and (ii) and (iii) combined the percolation sub-step. It is easy tocheck that this dynamics is reversible with respect to the Gibbs distribution (1) and thus converges toit [10]. For integer 𝑞 , the CM dynamics may be viewed as a variant of the Swendsen-Wang dynamics. In theSwendsen-Wang dynamics, each connected component of ( 𝑉 , 𝐴 ) receives a random color from { , . . . , 𝑞 } ,and the edges are updated within each color class as in sub-steps (ii) and (iii) above; in contrast, the CMdynamics updates the edges of exactly one color class. However, note that the Swendsen-Wang dynamicsis only well-defined for integer 𝑞 , while the CM dynamics is feasible for any real 𝑞 >
1. The CM dynamicswas introduced precisely to allow this generalization.1he study of the interplay between phase transitions and the efficiency of MCMC methods goes backto pioneering work in mathematical physics in the late 1980s. This connection for the specific case of theCM dynamics on the complete 𝑛 -vertex graph, known as the mean-field model , has received some attentionin recent years (see [7, 16, 19]) and is the focus of this paper. As we shall see, the mean-field case is alreadyquite non-trivial, and has historically proven to be a useful starting point in understanding various typesof dynamics on more general graphs. We note that, so far, the mean-field is the only setting in which thereare tight mixing time bounds for the CM dynamics; all other known bounds are deduced indirectly viacomparison with other dynamics, thus incurring significant overhead [8, 6, 18, 5, 31, 7].The phase transition for the mean-field random-cluster model is fairly well-understood [9, 25]. In thissetting, it is natural to re-parameterize by setting 𝑝 = ℭ / 𝑛 ; the phase transition then occurs at the criticalvalue ℭ = ℭ cr ( 𝑞 ) given by ℭ cr ( 𝑞 ) = 𝑞 for 0 < 𝑞 ≤ (cid:16) 𝑞 − 𝑞 − (cid:17) log ( 𝑞 − ) for 𝑞 > . For ℭ < ℭ cr ( 𝑞 ) all components are of size 𝑂 ( log 𝑛 ) with high probability (w.h.p.); that is, with probabil-ity going to 1 as 𝑛 → ∞ . This regime is known as the disordered phase . On the other hand, for ℭ > ℭ cr ( 𝑞 ) there is a unique giant component of size 𝜃𝑛 , where 𝜃 = 𝜃 ( ℭ , 𝑞 ) is the largest 𝑥 > 𝑒 − ℭ 𝑥 = − 𝑞𝑥 + ( 𝑞 − ) 𝑥 ; (2)this regime of parameters is known as the ordered phase . The phase transition is thus often called an order - disorder phase transition and is analogous to that in 𝐺 ( 𝑛, 𝑝 ) corresponding to the appearance of a giantcomponent of linear size.The phase structure of the mean-field random-cluster model, however, is much more subtle and de-pends crucially on the second parameter 𝑞 . In particular, when 𝑞 > discontinuous ,which implies that at criticality ( ℭ = ℭ cr ( 𝑞 ) ) the ordered and disordered phases coexist [25]; i.e., eachcontributes a constant fraction of the probability mass. For 𝑞 ≤ continuous phase transition and there is no phase coexistence.The phase coexistence phenomenon at ℭ = ℭ cr ( 𝑞 ) when 𝑞 > ℭ = ℭ cr ( 𝑞 ) but has also beenshown to persist in a constant-width interval ( ℭ l , ℭ r ) around ℭ cr . This is the so-called metastability win-dow in which several standard Markov chains, including the CM dynamics, are known to be slow (see,e.g., [11, 16, 7]). Indeed, for the CM dynamics the following detailed connection between the phase struc-ture and the mixing time 𝜏 mix ( CM ) has been recently established in [4, 7, 19]. For any 𝑞 ∈ ( , ] the mixingtime of the mean-field CM dynamics is Θ ( log 𝑛 ) for all ℭ ≠ ℭ cr ( 𝑞 ) . When 𝑞 >
2, we have: 𝜏 mix ( CM ) = Θ ( log 𝑛 ) if ℭ ∉ [ ℭ l , ℭ r ) ; Θ ( 𝑛 / ) if ℭ = ℭ l ; 𝑒 Ω ( 𝑛 ) if ℭ ∈ ( ℭ l , ℭ r ) , where ( ℭ l , ℭ r ) is the metastability window. It is known that ℭ r = 𝑞 , but ℭ l does not have a closed form;see [7, 25]. (For 𝑞 ∈ ( , ] , we have ℭ l = ℭ cr ( 𝑞 ) = ℭ r = 𝑞 .)In view of these results, we note that the only case remaining open is when 𝑞 ∈ ( , ] and ℭ = ℭ cr ( 𝑞 ) .Our main result in this paper concerns precisely this regime, which is particularly delicate for reasons weexplain in our proof overview below. 2 heorem 1.1. The mixing time of the CM dynamics on the complete 𝑛 -vertex graph when ℭ = ℭ cr ( 𝑞 ) = 𝑞 and 𝑞 ∈ ( , ) is 𝑂 ( log 𝑛 · log log 𝑛 ) . A Ω ( log 𝑛 ) lower bound for the mixing time is known for the CM mean-field dynamics [7], so ourresult is tight up to the lower order 𝑂 ( log log 𝑛 ) factor, and in fact even better as we explain in Remark 1.The conjectured upper bound is 𝑂 ( log 𝑛 ) . We mention that the ℭ = 𝑞 , 𝑞 = Θ ( 𝑛 / ) bound was established for itsmixing time.Our result establishes a striking behavior for random-cluster dynamics when 𝑞 ∈ ( , ) . Namely, thereis no critical slowdown (exponential or power law) in this regime. Note that for 𝑞 ≥
2, the mixing timeof the dynamics undergoes an exponential slowdown, transitioning from Θ ( log 𝑛 ) when ℭ < ℭ l , to apower law at ℭ = ℭ l , and to exponential in 𝑛 when ℭ ∈ ( ℭ l , ℭ r ) . The absence of a critical slowdownfor 𝑞 ∈ ( , ) was in fact predicted by the statistical physics community [17], and our result provides atheoretical confirmation of this interesting phenomenon.We provide next some brief remarks about our analysis techniques, which we believe are novel andcombine several key ingredients in a non-trivial way. Our bound on the mixing time uses the well-knowntechnique of coupling : in order to show that the mixing time is 𝑂 ( log 𝑛 · log log 𝑛 ) , it suffices to couple theevolutions of two copies of the dynamics, starting from two arbitrary configurations, in such a way thatthey arrive at the same configuration after 𝑂 ( log 𝑛 · log log 𝑛 ) steps w.h.p. (The moves of the two copiescan be correlated any way we choose, provided that each copy, viewed in isolation, is a valid realizationof the dynamics.) Because of the delicate nature of the phase transition in the random-cluster model,combined with the fact that the percolation sub-step of the CM dynamics is critical when ℭ = 𝑞 , ourcoupling is somewhat elaborate and proceeds in multiple phases. The first phase of the coupling consistsof a burn-in period , where the two copies of the chain are run independently and the evolution of theirlargest components is observed until they have shrunk to their “typical” sizes. This part of the analysis isinspired by similar arguments in earlier work [7, 24, 16].In the second phase, we design a coupling of the activation of the connected components of the twocopies which uses: (i) a local limit theorem, which can be thought of as a stronger version of a central limittheorem; (ii) a precise understanding of the distribution of the maximum of symmetric random walks on Z with varying step sizes; and (iii) precise estimates for the component structure of random graphs. Wedevelop tailored versions of these probabilistic tools for our setting and combine them to guarantee thatthe same number of vertices from each copy are activated in each step w.h.p. for sufficiently many steps.This phase of the coupling is the main novelty in our analysis, and allows us to quickly converge to twoconfigurations with the same component structure. Such pairs of configurations can then be easily coupledin a third and final phase. We give a more detailed overview of our proof in the following subsection. We now give a more detailed sketch of the multi-phased coupling argument mentioned above; see Figure 1for a high-level schematic of our coupling.The initial phase of our coupling is a burn-in period , in which the two copies of the dynamics evolveindependently until certain structural properties of both configurations can be guaranteed. The burn-inperiod itself consists of three sub-phases which we describe next. For this, we need the following keynotation. For a graph 𝐺 , let 𝐿 𝑖 ( 𝐺 ) denote the size of the 𝑖 -th largest component of 𝐺 , and let R 𝑖 ( 𝐺 ) : = Í 𝑗 ≥ 𝑖 𝐿 𝑗 ( 𝐺 ) ; in particular, note that R ( 𝐺 ) is the sum of the squares of the sizes of all the componentsof 𝐺 . 
Controlling these sums of squares is crucial for us because they largely determine the concentrationof the number of active vertices around its mean.In the first sub-phase of the burn-in period, we reach a configuration 𝑋 such that R ( 𝑋 ) = 𝑂 ( 𝑛 / ) ;this takes 𝑂 ( log 𝑛 ) steps, and allows us to ensure that the number of active vertices in each step is well3 hase 1– Burn-in period: run two copies { 𝑋 𝑡 } , { 𝑌 𝑡 } independently, starting from a pair ofarbitrary initial configurations until R ( 𝑋 𝑇 ) = 𝑂 ( 𝑛 / ) and R ( 𝑋 𝑇 ) = 𝑂 ( 𝑛 / ) ; Phase 2–
Coupling to the same component structure: starting from 𝑋 𝑇 and 𝑌 𝑇 such that R ( 𝑋 𝑇 ) = 𝑂 ( 𝑛 / ) and R ( 𝑋 𝑇 ) = 𝑂 ( 𝑛 / ) , we design a two-phased couplingthat reaches two configurations with the same component structure as follows: Phase 2a:
A two-step coupling after which the two configurations agreein all “large components”;
Phase 2b:
A coupling that ensures the two configurations will also havethe same small component structure after 𝑂 ( log 𝑛 ) steps; Phase 3–
Coupling to the same configuration: starting from two configurations with thesame component structure, there is a straightforward coupling that couples thetwo configurations in 𝑂 ( log 𝑛 ) steps w.h.p.Figure 1: Main phases of coupling to establish Theorem 1.1.concentrated around its mean. In the second and third sub-phases, we use this concentration propertyto show that the size of the largest component contracts at a constant rate for 𝑇 = 𝑂 ( log 𝑛 ) steps untilwe reach a configuration 𝑋 𝑇 such that R ( 𝑋 𝑇 ) = 𝑂 ( 𝑛 / ) . This part of the analysis is split into twosub-phases because the contraction for 𝐿 ( 𝑋 𝑡 ) occurs differently according to whether 𝐿 ( 𝑋 𝑡 ) = Ω ( 𝑛 ) or 𝐿 ( 𝑋 𝑡 ) = 𝑜 ( 𝑛 ) . Combining these ideas we prove the following lemma. (Its proof and further insights canbe found in Section 3.) Lemma 1.2.
Let
ℭ = 𝑞 , 𝑞 ∈ ( , ) and let 𝑋 be an arbitrary random-cluster configuration of the complete 𝑛 -vertex graph. Then, with probability Ω ( ) , after 𝑇 = 𝑂 ( log 𝑛 ) steps R ( 𝑋 𝑇 ) = 𝑂 ( 𝑛 / ) . We note that the contraction of 𝐿 ( 𝑋 𝑡 ) mentioned above only happens when 𝑞 ∈ ( , ) ; when 𝑞 > 𝐿 ( 𝑋 𝑡 ) may increase in expectation, whereas when 𝑞 = [ 𝐿 ( 𝑋 𝑡 + ) | 𝑋 𝑡 ] ≈ 𝐿 ( 𝑋 𝑡 ) and the contraction of the size of the largest component is due instead to fluctuations caused by a largesecond moment. (This is what causes the power law slowdown when ℭ = 𝑞 = 𝐺 ( 𝑚, 𝑞 / 𝑛 ) random graph, where 𝑚 is the number of active vertices. SinceE [ 𝑚 ] = 𝑛 / 𝑞 , one key challenge in the proof of Lemma 1.2, and in fact in the entirety of our analysis, isthat the random graph 𝐺 ( 𝑚, 𝑞 / 𝑛 ) is critical (or nearly critical) w.h.p. since 𝑚 · 𝑞 / 𝑛 ≈
1; consequently itsstructural properties are not well concentrated and cannot be maintained for the required 𝑂 ( log 𝑛 ) stepsof the coupling. This is one key reason why the ℭ = ℭ cr ( 𝑞 ) = 𝑞 regime (even when 𝑞 >
2) is quite delicate.For the second phase of the coupling we assume that we start from a pair of configurations 𝑋 , 𝑌 such that R ( 𝑋 ) = 𝑂 ( 𝑛 / ) , R ( 𝑌 ) = 𝑂 ( 𝑛 / ) . We show that after 𝑇 = 𝑂 ( log 𝑛 ) steps, with probability Ω ( log log 𝑛 ) , we reach two configurations 𝑋 𝑇 and 𝑌 𝑇 with the same component structure; i.e., 𝐿 𝑗 ( 𝑋 𝑇 ) = 𝐿 𝑗 ( 𝑌 𝑇 ) for all 𝑗 ≥
1. This coupling has two main sub-phases; see Figure 1.The first is a two-step coupling after which the two configurations agree in all the components of sizeabove a certain threshold 𝑛 / / 𝜔 ( 𝑛 ) , where 𝜔 ( 𝑛 ) is a slowly increasing function. By choosing 𝜔 ( 𝑛 ) smallenough, we can ensure that there are not too many components with sizes above this threshold, and we can4ondition on the event that all of them are activated simultaneously. Then the difference in the number ofactive vertices from these large components can be “corrected” by a coupling of the activation of the smallercomponents. This coupling is designed using a local limit theorem for the random variables correspondingto the number of active vertices from the small components of each copy (see Theorem 2.2). We prove avariant of the local limit theorem that applies to our setting using a result of Mukhin [28]. It is also crucialfor us to guarantee that, among the small components, there are many components of many different sizes.We ensure this by refining some classical results in the random graph literature (see Lemma 2.16).We remark that when ℭ = 𝑞 and 𝑞 > 𝑂 ( log 𝑛 ) stepsthe component structure can be shown to be simpler, with much better concentration. On the other hand,when ℭ = 𝑞 and 𝑞 ∈ ( , ] the model is critical which, combined with the fact mentioned above that thepercolation step of the dynamics is also critical, is the reason why this regime had largely resisted analysisuntil now.In the second sub-phase, after the large components are matched, any difference in the number ofactivated vertices that is due to the activation of a set of small components can be fixed by coupling theactivation of the remaining small components. To design such a coupling we use a precise estimate on thedistribution of the maximum of symmetric random walks over integers (with steps of different sizes). Fora more detailed overview of this key step, see the sketch of the main ideas in Section 4.3. We summarizethe outcome of the second phase of the coupling in the following result. Lemma 1.3.
Let
ℭ = 𝑞 , 𝑞 ∈ ( , ] and suppose 𝑋 , 𝑌 are random-cluster configurations such that R ( 𝑋 ) = 𝑂 ( 𝑛 / ) and R ( 𝑌 ) = 𝑂 ( 𝑛 / ) . Then, there exists a coupling of the CM steps such that after 𝑇 = 𝑂 ( log 𝑛 ) steps 𝑋 𝑇 and 𝑌 𝑇 have the same component structure with probability Ω (cid:0) ( log log 𝑛 ) − (cid:1) . The last ingredient of our proof is a straightforward coupling argument that starts from two config-urations with the same component structure and ensures that after 𝑇 = 𝑂 ( log 𝑛 ) steps we have 𝑋 𝑇 = 𝑌 𝑇 w.h.p. This coupling was already analyzed in [7]. Lemma 1.4 ([7], Lemma 24) . Let 𝑞 > and ℭ > . Let 𝑋 and 𝑌 be two random-cluster configurationswith the same component structure. Then, there exists a coupling of the CM steps such that after 𝑇 = 𝑂 ( log 𝑛 ) steps, 𝑋 𝑇 = 𝑌 𝑇 w.h.p.Proof of Theorem 1.1. By Lemma 1.2, after 𝑡 = 𝑂 ( log 𝑛 ) steps, with probability Ω ( ) , we have R ( 𝑋 𝑡 ) = 𝑂 ( 𝑛 / ) and R ( 𝑌 𝑡 ) = 𝑂 ( 𝑛 / ) . If this is the case, Lemmas 1.3 and 1.4 imply that there exists a coupling ofthe CM steps such that with probability Ω (cid:0) ( log log 𝑛 ) − (cid:1) after an additional 𝑡 = 𝑂 ( log 𝑛 ) steps, 𝑋 𝑡 + 𝑡 = 𝑌 𝑡 + 𝑡 . The final result follows from a standard boosting argument that relates coupling time to mixingtime (see Section 2.1). (cid:3) Remark . The probability of success in Lemma 1.3, which governs the lower order term 𝑂 ( log log 𝑛 ) in our mixing time bound, is controlled by our choice of the function 𝜔 ( 𝑛 ) for the definition of “largecomponents”. By choosing 𝜔 ( 𝑛 ) that goes to ∞ more slowly, we could improve our mixing time boundto 𝑂 ( log 𝑛 · 𝑔 ( 𝑛 )) where 𝑔 ( 𝑛 ) is any function that tends to infinity arbitrarily slowly. However, it seemsthat new ideas are required to obtain a bound of 𝑂 ( log 𝑛 ) which matches the known lower bound. Ourparticular choice of 𝜔 ( 𝑛 ) yields the 𝑂 ( log 𝑛 · log log 𝑛 ) bound and makes our analysis cleaner. Remark . We use the fact that 𝜔 ( 𝑛 ) → ∞ throughout our proofs; in most cases we could set 𝜔 ( 𝑛 ) to be asufficiently large constant without much additional work. However, in the proof of Lemma 1.3, specificallyin the second sub-step where we use random walks in the analysis, we crucially use that 𝜔 ( 𝑛 ) → ∞ ; thereader is referred to Section 4.3.1 for additional details. Remark . We note that Theorem 1.1, combined with a comparison result from [7], implies that the single-edge random-cluster Glauber dynamics mixes in 𝑂 ( 𝑛 × polylog ( 𝑛 )) steps when ℭ = 𝑞 and 𝑞 ∈ ( , ) .Previously, no polynomial bound on the mixing time was known for the Glauber dynamics in this regime.5 Preliminaries
In this section we provide a number of standard definitions and results that we will refer to in our proofs.
Let Ω RC be the set of random-cluster configurations of a graph 𝐺 ; let M 𝑡 ( 𝑋 , ·) be the distribution of M after 𝑡 steps starting from 𝑋 ∈ Ω RC and 𝜏 mix ( 𝜀 ) : = max 𝑋 ∈ Ω RC min 𝑡 (cid:8) ||M 𝑡 ( 𝑋 , ·) − 𝜇 𝐺 (·) || TV ≤ 𝜀 (cid:9) , where || · || TV denotes total variation distance. The mixing time of M is 𝜏 mix : = 𝜏 mix ( / ) .A (one step) coupling of the Markov chain M specifies, for every pair of states ( 𝑋 𝑡 , 𝑌 𝑡 ) ∈ Ω RC × Ω RC ,a probability distribution over ( 𝑋 𝑡 + , 𝑌 𝑡 + ) such that the processes { 𝑋 𝑡 } and { 𝑌 𝑡 } are valid realizationsof M , and if 𝑋 𝑡 = 𝑌 𝑡 then 𝑋 𝑡 + = 𝑌 𝑡 + . The coupling time , denoted 𝑇 coup , is the minimum 𝑇 such thatPr [ 𝑋 𝑇 ≠ 𝑌 𝑇 ] ≤ /
4, starting from the worst possible pair of configurations in Ω RC . It is a standard fact that 𝜏 mix ≤ 𝑇 coup ; moreover, when Pr [ 𝑋 𝑇 = 𝑌 𝑇 ] ≥ 𝛿 for some coupling, then 𝜏 mix = 𝑂 ( 𝑇 𝛿 − ) (see, e.g., [23]). Let 𝑐 ≤ · · · ≤ 𝑐 𝑚 be integers and for 𝑖 = , . . . , 𝑚 let 𝑋 𝑖 be the random variable that it is equal to 𝑐 𝑖 withprobability 𝑟 ∈ ( , ) and it is zero otherwise. Let 𝑋 , . . . , 𝑋 𝑚 be independent. Let 𝑆 𝑚 = Í 𝑚𝑖 = 𝑋 𝑖 , 𝜇 𝑚 = E [ 𝑆 𝑚 ] and 𝜎 𝑚 = Var ( 𝑆 𝑚 ) . We say that the local limit theorem holds ifPr [ 𝑆 𝑚 = 𝑎 ] = √ 𝜋𝜎 𝑚 exp (cid:18) − ( 𝑎 − 𝜇 𝑚 ) 𝜎 𝑚 (cid:19) + 𝑜 (cid:18) 𝜎 𝑚 (cid:19) . (3)We provide next sufficient conditions for the local limit theorem. First, we introduce some notation.For a random variable 𝑋 and 𝑑 ∈ R , let 𝐻 ( 𝑋, 𝑑 ) = E [h 𝑋 ∗ 𝑑 i ] , where h·i denotes distance to the closestinteger and 𝑋 ∗ is a symmetrized version of 𝑋 ; i.e., 𝑋 ∗ = 𝑋 − 𝑋 ′ where 𝑋 ′ is an i.i.d. copy of 𝑋 . Let 𝐻 𝑚 = inf 𝑑 ∈[ , ] Í 𝑚𝑖 = 𝐻 ( 𝑋 𝑖 , 𝑑 ) . The following local limit theorem is due to Mukhin [28]. Throughout thissection, all limits are taken as 𝑚 → ∞ . Theorem 2.1 ([28], Theorem 1) . Suppose that the sequence 𝑆 𝑚 − 𝜇 𝑚 𝜎 𝑚 converges in distribution to a standardnormal random variable and that 𝜎 𝑚 → ∞ . If 𝐻 𝑚 → ∞ and there exists 𝛼 > such that ∀ 𝑢 ∈ [ 𝐻 / 𝑚 , 𝜎 𝑚 ] wehave Í 𝑖 : 𝑐 𝑖 ≤ 𝑢 𝑐 𝑖 ≥ 𝛼𝑢𝜎 𝑚 , then the local limit theorem holds. We provide next a corollary of Theorem 2.1 which will be used in our proofs. We introduce firstsome additional notation. Let 𝜔 : N → R be an increasing positive function such that 𝜔 ( 𝑚 ) → ∞ and 𝜔 ( 𝑚 ) = 𝑜 ( log 𝑚 ) . Given 𝑘 ∈ N and a fixed constant 𝜗 >
0, let I 𝑘 = (cid:20) 𝜗𝑚 / 𝜔 ( 𝑚 ) 𝑘 , 𝜗𝑚 / 𝜔 ( 𝑚 ) 𝑘 (cid:21) . Theorem 2.2.
Suppose that 𝑋 , ..., 𝑋 𝑚 are independent random variables described above. Suppose 𝑐 𝑚 = 𝑂 (cid:0) 𝑚 / 𝜔 ( 𝑚 ) − (cid:1) , Í 𝑚𝑖 = 𝑐 𝑖 = 𝑂 (cid:0) 𝑚 / 𝜔 ( 𝑚 ) − / (cid:1) and 𝑐 𝑖 = for all 𝑖 ≤ 𝜌𝑚 , where 𝜌 ∈ ( , ) is independentof 𝑚 . Let ℓ > be the smallest integer such that 𝜗𝑚 / 𝜔 ( 𝑚 ) − ℓ = 𝑜 ( 𝑚 / ) . If for all ≤ 𝑘 ≤ ℓ , we have |{ 𝑖 : 𝑐 𝑖 ∈ I 𝑘 ( 𝜔 )}| = Ω ( 𝜔 ( 𝑚 ) · 𝑘 − ) , then the local limit theorem holds. We show how derive Theorem 2.2 from Theorem 2.1. For the sake of completeness we also provide inAppendix A an alternative proof of Theorem 2.2 from first principles.6 roof of Theorem 2.2.
We check that the 𝑋 𝑖 ’s satisfy the conditions from Theorem 2.1. Lemmas A.1 andA.2 imply 𝜎 𝑚 → ∞ , 𝑆 𝑚 − 𝜇 𝑚 𝜎 𝑚 → 𝑁 ( , ) and for any 𝑢 satisfying 𝜎 𝑚 ≥ 𝑢 ≥ Í 𝑗 : 𝑐 𝑗 ≤ 𝑢 𝑐 𝑗 ≥ 𝑢𝜎 𝑚 / 𝑟 ( − 𝑟 ) . Itremains to show 𝐻 𝑚 → ∞ .Now, for 𝑖 ≤ 𝜌𝑚 and 1 / ≤ 𝑑 ≤ /
2, we have that 𝐸 [h 𝑋 ∗ 𝑖 𝑑 i ] = 𝑟 ( − 𝑟 ) 𝑑 . Thus, 𝐻 𝑚 ≥ inf ≤ 𝑑 ≤ ⌈ 𝜌𝑚 ⌉ Õ 𝑖 = 𝐻 ( 𝑋 𝑖 , 𝑑 ) = ⌈ 𝜌𝑚 ⌉ Õ 𝑖 = 𝑟 ( − 𝑟 ) 𝑑 = Ω ( 𝑚 ) → ∞ . Since we have shown that the 𝑋 𝑖 ’s satisfy all the conditions from Theorem 2.1, the result follows. (cid:3) Another important tool in our proofs are couplings based on the evolutions of certain random walks. Inthis section we consider a (lazy) symmetric random walk ( 𝑆 𝑘 ) on Z with bounded step size, and the firstresult we present is an estimate on 𝑀 𝑘 = max { 𝑆 , . . . , 𝑆 𝑘 } which is based on the well-known reflectionprinciple (see, e.g., Chapter 2.7 in [23]). Lemma 2.3.
Let 𝐴 > and let 𝐴 ≤ 𝑐 , 𝑐 , . . . , 𝑐 𝑛 ≤ 𝐴 be positive integers. Let 𝑟 ∈ ( , / ] and consider thesequence of random variables 𝑋 , . . . , 𝑋 𝑛 where for each 𝑖 = , . . . , 𝑛 : 𝑋 𝑖 = 𝑐 𝑖 with probability 𝑟 ; 𝑋 𝑖 = − 𝑐 𝑖 withprobability 𝑟 ; and 𝑋 𝑖 = otherwise. Let 𝑆 𝑘 = Í 𝑘𝑖 = 𝑋 𝑖 and 𝑀 𝑘 = max { 𝑆 , . . . , 𝑆 𝑘 } . Then, for any 𝑦 ≥ [ 𝑀 𝑛 ≥ 𝑦 ] ≥ [ 𝑆 𝑛 ≥ 𝑦 + 𝐴 + ] . Proof.
First, note thatPr [ 𝑀 𝑛 ≥ 𝑦 ] = 𝐴𝑛 Õ 𝑘 = 𝑦 Pr [ 𝑀 𝑛 ≥ 𝑦, 𝑆 𝑛 = 𝑘 ] + 𝑦 − Õ 𝑘 = − 𝐴𝑛 Pr [ 𝑀 𝑛 ≥ 𝑦, 𝑆 𝑛 = 𝑘 ] = Pr [ 𝑆 𝑛 ≥ 𝑦 ] + 𝑦 − Õ 𝑘 = − 𝐴𝑛 Pr [ 𝑀 𝑛 ≥ 𝑦, 𝑆 𝑛 = 𝑘 ] . (4)If 𝑀 𝑛 ≥ 𝑦 , let 𝑊 𝑛 be the value of the random walk { 𝑆 𝑖 } the first time its value was at least 𝑦 . Then,Pr [ 𝑀 𝑛 ≥ 𝑦, 𝑆 𝑛 = 𝑘 ] = 𝑦 + 𝐴 − Õ 𝑏 = 𝑦 Pr [ 𝑀 𝑛 ≥ 𝑦, 𝑆 𝑛 = 𝑘, 𝑊 𝑛 = 𝑏 ] = 𝑦 + 𝐴 − Õ 𝑏 = 𝑦 Pr [ 𝑀 𝑛 ≥ 𝑦, 𝑆 𝑛 = 𝑏 − 𝑘, 𝑊 𝑛 = 𝑏 ] = 𝑦 + 𝐴 − Õ 𝑏 = 𝑦 Pr [ 𝑆 𝑛 = 𝑏 − 𝑘, 𝑊 𝑛 = 𝑏 ] , where in the second equality we used the fact that the random walk is symmetric and the last one follows7rom the fact that 2 𝑏 − 𝑘 ≥ 𝑦 . Plugging this into (4), we getPr [ 𝑀 𝑛 ≥ 𝑦 ] = Pr [ 𝑆 𝑛 ≥ 𝑦 ] + 𝑦 + 𝐴 − Õ 𝑏 = 𝑦 𝑦 − Õ 𝑘 = − 𝐴𝑛 Pr [ 𝑆 𝑛 = 𝑏 − 𝑘, 𝑊 𝑛 = 𝑏 ] = Pr [ 𝑆 𝑛 ≥ 𝑦 ] + 𝑦 + 𝐴 − Õ 𝑏 = 𝑦 𝐴𝑛 Õ 𝑘 = 𝑏 − 𝑦 + Pr [ 𝑆 𝑛 = 𝑘, 𝑊 𝑛 = 𝑏 ] = Pr [ 𝑆 𝑛 ≥ 𝑦 ] + 𝑦 + 𝐴 − Õ 𝑏 = 𝑦 Pr [ 𝑆 𝑛 ≥ 𝑏 − 𝑦 + , 𝑊 𝑛 = 𝑏 ]≥ Pr [ 𝑆 𝑛 ≥ 𝑦 ] + 𝑦 + 𝐴 − Õ 𝑏 = 𝑦 Pr [ 𝑆 𝑛 ≥ 𝑦 + 𝐴 + , 𝑊 𝑛 = 𝑏 ] , since 𝑏 < 𝑦 + 𝐴 . Finally, observe that 𝑦 + 𝐴 − Õ 𝑏 = 𝑦 Pr [ 𝑆 𝑛 ≥ 𝑦 + 𝐴 + , 𝑊 𝑛 = 𝑏 ] = Pr [ 𝑆 𝑛 ≥ 𝑦 + 𝐴 + ] and so Pr [ 𝑀 𝑛 ≥ 𝑦 ] ≥ Pr [ 𝑆 𝑛 ≥ 𝑦 ] + Pr [ 𝑆 𝑛 ≥ 𝑦 + 𝐴 + ] ≥ [ 𝑆 𝑛 ≥ 𝑦 + 𝐴 + ] , as desired. (cid:3) The following lemma will be crucial in our proofs.
Theorem 2.4.
Let 𝐴 > and let 𝐴 ≤ 𝑐 , 𝑐 , . . . , 𝑐 𝑚 ≤ 𝐴 be positive integers. Let 𝑟 ∈ ( , / ] and consider thesequences of random variables 𝑋 , . . . , 𝑋 𝑚 and 𝑌 , . . . , 𝑌 𝑚 where for each 𝑖 = , . . . , 𝑚 : 𝑋 𝑖 = 𝑐 𝑖 with probability 𝑟 ; 𝑋 𝑖 = − 𝑐 𝑖 with probability 𝑟 ; 𝑋 𝑖 = otherwise and 𝑌 𝑖 has the same distribution as 𝑋 𝑖 . Let 𝑋 = Í 𝑚𝑖 = 𝑋 𝑖 and 𝑌 = Í 𝑚𝑖 = 𝑌 𝑖 . Then for any 𝑑 > , there exist a constant 𝛿 : = 𝛿 ( 𝑟 ) > and a coupling of 𝑋 and 𝑌 such that Pr [ 𝑑 + 𝐴 ≥ 𝑋 − 𝑌 ≥ 𝑑 ] ≥ − 𝛿 ( 𝑑 + 𝐴 ) 𝐴 √ 𝑚 . Proof.
Set 𝛿 = √ 𝑟 . Let 𝐷 𝑘 = Í 𝑘𝑖 = ( 𝑋 𝑖 − 𝑌 𝑖 ) for each 𝑘 ∈ { , . . . , 𝑚 } . We construct a coupling for ( 𝑋, 𝑌 ) bycoupling each ( 𝑋 𝑘 , 𝑌 𝑘 ) as follows:1. If 𝐷 𝑘 < 𝑑 , sample 𝑋 𝑘 + and 𝑌 𝑘 + independently.2. If 𝐷 𝑘 ≥ 𝑑 , set 𝑋 𝑘 + = 𝑌 𝑘 + .Observe that if 𝐷 𝑘 ≥ 𝑑 for any 𝑘 ≤ 𝑚 , then 𝑑 + 𝐴 ≥ 𝑋 − 𝑌 ≥ 𝑑 . Therefore,Pr [ 𝑑 + 𝐴 ≥ 𝑋 − 𝑌 ≥ 𝑑 ] ≥ Pr [ 𝑀 𝑚 ≥ 𝑑 ] where 𝑀 𝑚 = max { 𝐷 , ..., 𝐷 𝑚 } . Note that { 𝐷 𝑘 } behaves like a (lazy) symmetric random walk until the firsttime 𝜏 it is at least 𝑑 ; after that { 𝐷 𝑘 } stays put.Let { 𝐷 ′ 𝑘 } denote such random walk which does not stop after 𝜏 , and 𝑀 ′ 𝑚 : = max { 𝐷 ′ , ..., 𝐷 ′ 𝑚 } . NoticePr [ 𝑀 𝑚 ≥ 𝑑 ] = Pr [ 𝑀 ′ 𝑚 ≥ 𝑑 ] . { 𝐷 ′ 𝑘 } is at least 𝐴 and at most 4 𝐴 , by Lemma 2.3 for any 𝑑 ≥ [ 𝑀 ′ 𝑚 ≥ 𝑑 ] ≥ [ 𝐷 ′ 𝑚 ≥ 𝑑 + 𝐴 + ] . Let 𝜎 = Í 𝑚𝑖 = E [( 𝑋 𝑖 − 𝑌 𝑖 ) ] = 𝑟 Í 𝑚𝑖 = 𝑐 𝑖 and 𝜌 = Í 𝑚𝑖 = E [| 𝑋 𝑖 − 𝑌 𝑖 | ] = 𝑟 ( + 𝑟 ) Í 𝑚𝑖 = 𝑐 𝑖 . By the Berry-Esséentheorem for independent (but not necessarily identical) random variables (see, e.g. [3]), we get that for any 𝑦 ∈ R (cid:12)(cid:12) Pr [ 𝐷 ′ 𝑚 > 𝑦𝜎 ] − Pr [ 𝑁 > 𝑦 ] (cid:12)(cid:12) ≤ 𝑐𝜌𝜎 ≤ 𝑐𝐴𝜎 . where 𝑁 is a standard normal random variable, and 𝑐 ∈ [ . , . ] is an absolute constant. Then,Pr [ 𝐷 ′ 𝑚 > 𝑦𝜎 ] ≥ Pr [ 𝑁 > 𝑦 ] − 𝑐𝐴𝜎 . (5)Notice 𝜎 ≥ 𝐴 √ 𝑟𝑚 . If 𝑑 + 𝐴 ≥ 𝜎 , the theorem holds vacuously since1 − 𝛿 ( 𝑑 + 𝐴 ) 𝐴 √ 𝑚 = − ( 𝑑 + 𝐴 ) 𝐴 √ 𝑟𝑚 < − 𝑑 + 𝐴𝐴 √ 𝑟𝑚 ≤ − 𝜎𝐴 √ 𝑟𝑚 ≤ − < . If 𝑑 + 𝐴 < 𝜎 , since it can be checked via a Taylor’s expansion that 2 Pr [ 𝑁 > 𝑦 ] ≥ − q 𝜋 𝑦 for 𝑦 < [ 𝑀 𝑚 ≥ 𝑑 ] ≥ [ 𝐷 ′ 𝑚 > 𝑑 + 𝐴 ] ≥ (cid:20) 𝑁 > 𝑑 + 𝐴𝜎 (cid:21) − 𝑐𝐴𝜎 ≥ − p / 𝜋 ( 𝑑 + 𝐴 ) 𝜎 − 𝑐𝐴𝜎 ≥ − ( 𝑑 + 𝐴 ) 𝜎 ≥ − 𝛿 ( 𝑑 + 𝐴 ) 𝐴 √ 𝑚 , as claimed. (cid:3) We note that Theorem 2.4 is a generalization of the following more standard fact which will also beuseful to us.
Lemma 2.5 ([4], Lemma 2.18) . Let 𝑋 and 𝑌 be binomial random variables with parameters 𝑚 and 𝑟 , where 𝑟 ∈ ( , ) is a constant. Then, for any integer 𝑦 > , there exists a coupling ( 𝑋, 𝑌 ) such that for a suitableconstant 𝛾 = 𝛾 ( 𝑟 ) > , Pr [ 𝑋 − 𝑌 = 𝑦 ] ≥ − 𝛾𝑦 √ 𝑚 . In this section, we compile a number of standard facts about the 𝐺 ( 𝑛, 𝑝 ) random graph model which willbe useful in our proofs. Recall that in this model the state of every possible edge is independent andcorresponds to a Bernoulli 0/1 random variable that equals to 1 with probability 𝑝 . We use 𝐺 ∼ 𝐺 ( 𝑛, 𝑝 ) todenote that the graph 𝐺 is sampled from 𝐺 ( 𝑛, 𝑝 ) . A 𝐺 ( 𝑛, 𝑝 ) random graph is said to be sub-critical when 𝑛𝑝 <
1. It is called super-critical when 𝑛𝑝 > 𝑛𝑝 = Fact 2.6.
Given < 𝑁 < 𝑁 and 𝑝 ∈ [ , ] . Let 𝐺 ∼ 𝐺 ( 𝑁 , 𝑝 ) and 𝐺 ∼ 𝐺 ( 𝑁 , 𝑝 ) . For any 𝐾 > , Pr [ 𝐿 ( 𝐺 ) > 𝐾 ] ≤ Pr [ 𝐿 ( 𝐺 ) > 𝐾 ] . roof. Consider the coupling of ( 𝐺 , 𝐺 ) such that 𝐺 is a subgraph of 𝐺 . 𝐿 ( 𝐺 ) ≤ 𝐿 ( 𝐺 ) with probability1. Proposition just follows from Strassen’s theorem (see, e.g., Theorem 22.6 in [23]). (cid:3) Lemma 2.7 ([24], Lemma 5.7) . Let 𝐼 ( 𝐺 ) denote the number of isolated vertices in 𝐺 . If 𝑛𝑝 = 𝑂 ( ) , then thereexists a constant 𝐶 > such that Pr [ 𝐼 ( 𝐺 ) > 𝐶𝑛 ] = − 𝑂 ( 𝑛 − ) . Consider the equation 𝑒 − 𝑑𝑥 = − 𝑥 (6)and let 𝛽 ( 𝑑 ) be defined as its unique positive root. Observe that 𝛽 is well-defined for 𝑑 > Lemma 2.8 ([4], Lemma 2.7) . Let 𝐺 ∼ 𝐺 ( 𝑛 + 𝑚, 𝑑 𝑛 / 𝑛 ) random graph where | 𝑚 | = 𝑜 ( 𝑛 ) and lim 𝑛 →∞ 𝑑 𝑛 = 𝑑 .Assume < 𝑑 𝑛 = 𝑂 ( ) and 𝑑 𝑛 is bounded away from for all 𝑛 ∈ N . Then, For 𝐴 = 𝑜 ( log 𝑛 ) and sufficientlylarge 𝑛 , there exists a constant 𝑐 > such that Pr (cid:2) | 𝐿 ( 𝐺 ) − 𝛽 ( 𝑑 ) 𝑛 | > | 𝑚 | + 𝐴 √ 𝑛 (cid:3) ≤ 𝑒 − 𝑐𝐴 . Lemma 2.9 ([4], Lemma 2.16) . For 𝑛𝑝 > , we have E [R ( 𝐺 )] = 𝑂 (cid:0) 𝑛 / (cid:1) . Consider the near-critical random graph 𝐺 (cid:0) 𝑛, + 𝜀𝑛 (cid:1) with 𝜀 = 𝜀 ( 𝑛 ) = 𝑜 ( ) . Lemma 2.10 ([24], Theorem 5.9) . Assume 𝜀 𝑛 ≥ , then for any 𝐴 satisfying ≤ 𝐴 ≤ √ 𝜀 𝑛 / , there existssome constant 𝑐 > such that Pr (cid:20) | 𝐿 ( 𝐺 ) − 𝜀𝑛 | > 𝐴 r 𝑛𝜀 (cid:21) = 𝑂 (cid:16) 𝑒 − 𝑐𝐴 (cid:17) . Corollary 2.11.
Let 𝐺 ∼ 𝐺 (cid:0) 𝑛, + 𝜀𝑛 (cid:1) with 𝜀 = 𝑜 ( ) . For any positive constant 𝜌 ≤ / , there exist constants 𝐶 ≥ and 𝑐 > such that if 𝜀 𝑛 ≥ 𝐶 , then Pr [| 𝐿 ( 𝐺 ) − 𝜀𝑛 | > 𝜌𝜀𝑛 ] = 𝑂 ( 𝑒 − 𝑐𝜀 𝑛 ) . Lemma 2.12 ([24], Theorem 5.12) . Let 𝜀 < , then E [R ( 𝐺 )] = 𝑂 ( 𝑛 /| 𝜀 |) . Lemma 2.13 ([24], Theorem 5.13) . Let 𝜀 > and 𝜀 𝑛 ≥ for large 𝑛 , then E [R ( 𝐺 )] = 𝑂 ( 𝑛 / 𝜀 ) . For the next results, suppose that 𝐺 ∼ 𝐺 ( 𝑛, + 𝜆𝑛 − / 𝑛 ) , where 𝜆 = 𝜆 ( 𝑛 ) may depend on 𝑛 . Lemma 2.14. If | 𝜆 | = 𝑂 ( ) , then E [R ( 𝐺 )] = 𝑂 (cid:0) 𝑛 / (cid:1) .Proof. Follows from Lemmas 2.13, 2.15 and 2.16 in [4]. (cid:3)
Lemma 2.15.
Suppose | 𝜆 | = 𝑂 ( 𝜔 ( 𝑛 )) and let 𝐵 𝜔 = 𝑛 / 𝜔 ( 𝑛 ) , where 𝜔 : N → R is a positive increasing functionsuch that 𝜔 ( 𝑛 ) = 𝑜 ( log 𝑛 ) . Then with Ω ( ) probability, Õ 𝑗 : 𝐿 𝑗 ( 𝐺 ) ≤ 𝐵 𝜔 𝐿 𝑗 ( 𝐺 ) = 𝑂 (cid:16) 𝑛 / 𝜔 ( 𝑛 ) − / (cid:17) . Lemma 2.16.
Let 𝑆 𝐵 = { 𝑗 : 𝐵 ≤ 𝐿 𝑗 ( 𝐺 ) ≤ 𝐵 } and suppose there exists a positive increasing function 𝑔 suchthat 𝑔 ( 𝑛 ) → ∞ , 𝑔 ( 𝑛 ) = 𝑜 ( 𝑛 / ) , | 𝜆 | ≤ 𝑔 ( 𝑛 ) and 𝐵 ≤ 𝑛 / 𝑔 ( 𝑛 ) . If 𝐵 → ∞ , then there exists constants 𝛿 , 𝛿 > independent of 𝑛 such that Pr h | 𝑆 𝐵 | ≤ 𝛿 𝑛𝐵 / i ≤ 𝛿 𝐵 / 𝑛 . The proofs of Lemmas 2.15 and 2.16 are provided in Appendix C.10
Initial Burn-in Phase: Proof of Lemma 1.2
The initial burn-in phase will consist of three sub-phases. In the first one, the goal is to reach a configura-tion 𝑋 such that R ( 𝑋 ) = 𝑂 ( 𝑛 / ) and 𝐿 ( 𝑋 ) > 𝐶𝑛 / ; this analysis was already carried out in [4] for all 𝑞 . Lemma 3.1 ([4], Lemma 3.42) . Let
ℭ = 𝑞 , and let 𝑋 be an arbitrary random-cluster configuration. Then, forany constant 𝐶 ≥ , after 𝑇 = 𝑂 ( log 𝑛 ) steps R ( 𝑋 𝑇 ) = 𝑂 ( 𝑛 / ) and 𝐿 ( 𝑋 𝑇 ) > 𝐶𝑛 / with probability Ω ( ) . In the second sub-phase, the goal is to reach a configuration 𝑋 such that R ( 𝑋 ) = 𝑂 ( 𝑛 / ) and 𝐿 ( 𝑋 ) ≤ 𝛿𝑛 for a sufficiently small constant 𝛿 > Lemma 3.2.
For any constant 𝛿 > , suppose R ( 𝑋 ) = 𝑂 ( 𝑛 / ) and 𝐿 ( 𝑋 ) ≥ 𝛿𝑛 , then there exists 𝑇 = 𝑂 ( ) such that R ( 𝑋 𝑇 ) = 𝑂 ( 𝑛 / ) and 𝐿 ( 𝑋 𝑇 ) ≤ 𝛿𝑛 with probability Ω ( ) . In the third sub-phase of the burn-in period, we show that 𝐿 ( 𝑋 𝑡 ) contracts at a constant rate, while R ( 𝑋 𝑡 ) can be shown to be 𝑂 ( 𝑛 / ) with reasonably high probability throughout the execution of thesub-phase. The precise description of this phenomenon is captured in the following lemma. To simplifythe notation, we let ˆ 𝐿 ( 𝑋 ) : = 𝐿 ( 𝑋 )/ 𝑛 / and let Λ 𝑡 denote the event that the largest component of theconfiguration is activated in step 𝑡 . Lemma 3.3.
Suppose R ( 𝑋 𝑡 ) = 𝑂 ( 𝑛 / ) , 𝛿𝑛 / ≥ ˆ 𝐿 ( 𝑋 𝑡 ) ≥ 𝐵 for a large constant 𝐵 : = 𝐵 ( 𝑞 ) , and a smallconstant 𝛿 ( 𝑞, 𝐵 ) . Then:1. There exists a constant 𝛼 : = 𝛼 ( 𝐵, 𝑞, 𝛿 ) < such that Pr [ 𝐿 ( 𝑋 𝑡 + ) ≤ max { 𝛼𝐿 ( 𝑋 𝑡 ) , 𝐿 ( 𝑋 𝑡 )} | 𝑋 𝑡 , Λ 𝑡 ] ≥ − exp (cid:16) − Ω (cid:16) ˆ 𝐿 ( 𝑋 𝑡 ) (cid:17) (cid:17) ; Pr [ 𝐿 ( 𝑋 𝑡 + ) = 𝐿 ( 𝑋 𝑡 ) | 𝑋 𝑡 , ¬ Λ 𝑡 ] ≥ − 𝑂 (cid:16) ˆ 𝐿 ( 𝑋 𝑡 ) − (cid:17) . Lemma 3.4 ([4], Claim 3.45) . Suppose R ( 𝑋 𝑡 ) = 𝑂 ( 𝑛 / ) and ˆ 𝐿 ( 𝑋 𝑡 ) ≥ 𝐵 for a large constant 𝐵 , and let 𝐶 bea fixed large constant. Then1. Pr (cid:20) R ( 𝑋 𝑡 + ) < R ( 𝑋 𝑡 ) + 𝐶𝑛 / √ ˆ 𝐿 ( 𝑋 𝑡 ) (cid:12)(cid:12)(cid:12)(cid:12) 𝑋 𝑡 , Λ 𝑡 (cid:21) = − 𝑂 (cid:16) ˆ 𝐿 ( 𝑋 𝑡 ) − / (cid:17) .2. Pr (cid:20) R ( 𝑋 𝑡 + ) < R ( 𝑋 𝑡 ) + 𝐶𝑛 / √ ˆ 𝐿 ( 𝑋 𝑡 ) (cid:12)(cid:12)(cid:12)(cid:12) 𝑋 𝑡 , ¬ Λ 𝑡 (cid:21) = − 𝑂 (cid:16) ˆ 𝐿 ( 𝑋 𝑡 ) − / (cid:17) . Lemmas 3.3 and 3.4 can be combined to derive the following more accurate contraction estimate whichwill be crucial in the proof Lemma 1.2.
Lemma 3.5.
Suppose 𝑔 ( 𝑛 ) is an arbitrary function with range in the interval (cid:2) 𝐵 , 𝛿𝑛 / (cid:3) where 𝐵 is a largeenough constant such that for 𝑥 ≥ 𝐵 we have 𝑥 ≥ 𝐵 ( log 𝑥 ) , and 𝛿 : = 𝛿 ( 𝑞, 𝐵 ) is a small constant.Suppose 𝑋 is such that 𝑔 ( 𝑛 ) ≥ ˆ 𝐿 ( 𝑋 ) ≥ 𝐵 ( log 𝑔 ( 𝑛 )) and R ( 𝑋 ) = 𝑂 ( 𝑛 / ) , then there exists a constant 𝐷 and 𝑇 = 𝑂 ( log 𝑔 ( 𝑛 )) such that at time 𝑇 , ˆ 𝐿 ( 𝑋 𝑇 ) ≤ max { 𝐵 ( log 𝑔 ( 𝑛 )) , 𝐷 } and R ( 𝑋 𝑇 ) ≤ R ( 𝑋 ) + 𝑂 (cid:16) 𝑛 / log 𝑔 ( 𝑛 ) (cid:17) with probability at least − 𝑂 (cid:0) log − 𝑔 ( 𝑛 ) (cid:1) . We are now ready to prove Lemma 1.2. 11 roof of Lemma 1.2.
Let 𝐵 be a constant large enough so that ∀ 𝑥 ≥ 𝐵 , we have 𝑥 ≥ ( log 𝑥 ) . By Lem-mas 3.1 and 3.2, starting from an arbitrary random-cluster configuration, there exists 𝑡 = 𝑂 ( log 𝑛 ) suchthat with at least constant probability, the configuration 𝑋 𝑡 is such that ˆ 𝐿 ( 𝑋 𝑡 ) ≤ 𝛿𝑛 / and R ( 𝑋 𝑡 ) = 𝑂 ( 𝑛 / ) for the constant 𝛿 = 𝛿 ( 𝑞, 𝐵 ) from Lemma 3.5. Suppose also ˆ 𝐿 ( 𝑋 𝑡 ) ≥ 𝐵 ; otherwise there is noth-ing to prove.Let 𝑔 ( 𝑛 ) : = 𝛿𝑛 / and 𝑔 𝑖 + ( 𝑛 ) : = 𝐵 ( log 𝑔 𝑖 ( 𝑛 )) for all 𝑖 ≥
0. Let 𝐾 be defined as the minimum naturalnumber such that 𝑔 𝐾 ( 𝑛 ) ≤ 𝐵 . Note that 𝐾 = 𝑂 ( log ∗ 𝑛 ) . Assume at time 𝑡 ≥ 𝑡 , there exists an integer 𝑗 ≥ 𝑋 𝑡 satisfies:1. 𝑔 𝑗 + ( 𝑛 ) ≤ ˆ 𝐿 ( 𝑋 𝑡 ) ≤ 𝑔 𝑗 ( 𝑛 ) , and2. R ( 𝑋 𝑡 ) = 𝑂 ( 𝑛 / ) + 𝑂 (cid:16)Í 𝑗 − 𝑘 = 𝑛 / log 𝑔 𝑘 ( 𝑛 ) (cid:17) .We show there exists time 𝑡 ′ > 𝑡 such that properties 1 and 2 hold for 𝑋 𝑡 ′ for a different index 𝑗 ′ > 𝑗 .The following bounds on sums and products involving the 𝑔 𝑖 ’s will be useful; the proof is provided inAppendix B. Claim 3.6.
Let 𝐾 be defined as above. ∀ 𝑗 < 𝐾 .(i) For any positive constant 𝑐 , we have Î 𝑗𝑖 = (cid:16) − 𝑐 log 𝑔 𝑖 ( 𝑛 ) (cid:17) ≥ − . 𝑐 log 𝑔 𝑗 ( 𝑛 ) (ii) Í 𝑗𝑖 = 𝑔 𝑖 ( 𝑛 ) ≤ . 𝑔 𝑗 ( 𝑛 ) By part (ii) of this claim, note that 𝑂 𝑗 − Õ 𝑘 = 𝑛 / log 𝑔 𝑘 ( 𝑛 ) ! = 𝑂 (cid:18) 𝑛 / log 𝑔 𝑗 − ( 𝑛 ) (cid:19) = 𝑂 ( 𝑛 / ) . Hence, Lemma 3.5 implies that with probability 1 − 𝑂 (cid:0) ( log 𝑔 𝑗 ( 𝑛 )) − (cid:1) there exist a time 𝑡 ′ ≤ 𝑡 + 𝑂 ( log 𝑔 𝑗 ( 𝑛 )) and a large constant 𝐷 such that ˆ 𝐿 ( 𝑋 𝑡 ′ ) ≤ max { 𝐵 ( log 𝑔 𝑗 ( 𝑛 )) , 𝐷 } and R ( 𝑋 𝑡 ′ ) ≤ R ( 𝑋 𝑡 ) + 𝑂 (cid:16) 𝑛 / log 𝑔 𝑗 ( 𝑛 ) (cid:17) . If ˆ 𝐿 ( 𝑋 𝑡 ′ ) ≤ max { 𝐷, 𝐵 } we are done. Hence, suppose otherwise that ˆ 𝐿 ( 𝑋 𝑡 ′ )∈ ( 𝐵 , log 𝑔 𝑗 + ( 𝑛 )] .Since the interval ( 𝐵 , log 𝑔 𝑗 + ( 𝑛 )] is completely covered by the union of the intervals [ 𝑔 𝑗 + , 𝑔 𝑗 + ] , ..., [ 𝑔 𝐾 , 𝑔 𝐾 − ] , there must be an integer 𝑗 ′ ≥ 𝑗 + 𝑔 𝑗 ′ + ( 𝑛 ) ≤ ˆ 𝐿 ( 𝑋 𝑡 ′ ) ≤ 𝑔 𝑗 ′ ( 𝑛 ) . Also, notice R ( 𝑋 𝑡 ′ ) ≤ R ( 𝑋 𝑡 ) + 𝑂 (cid:18) 𝑛 / log 𝑔 𝑗 ( 𝑛 ) (cid:19) = 𝑂 ( 𝑛 / ) + 𝑂 𝑗 − Õ 𝑘 = 𝑛 / log 𝑔 𝑘 ( 𝑛 ) ! + 𝑂 (cid:18) 𝑛 / log 𝑔 𝑗 ( 𝑛 ) (cid:19) = 𝑂 ( 𝑛 / ) + 𝑂 𝑗 Õ 𝑘 = 𝑛 / log 𝑔 𝑘 ( 𝑛 ) ! = 𝑂 ( 𝑛 / ) + 𝑂 𝑗 ′ − Õ 𝑘 = 𝑛 / log 𝑔 𝑘 ( 𝑛 ) ! . By taking at most 𝐾 steps of induction, we obtain that there exist constants 𝐶 and 𝑐 such that with proba-bility at least 𝜌 : = Î 𝐾 − 𝑖 = (cid:16) − 𝑐 log 𝑔 𝑖 ( 𝑛 ) (cid:17) , there exists a time 𝑡 𝐾 ≤ 𝑡 + 𝐾 − Õ 𝑖 = 𝐶 log 𝑔 𝑖 ( 𝑛 ) that satisfies ˆ 𝐿 ( 𝑋 𝑡 𝐾 ) ≤ 𝑔 𝐾 ( 𝑛 ) ≤ 𝐵 and R ( 𝑋 𝑡 𝐾 ) = 𝑂 ( 𝑛 / ) . Observe that 𝑡 𝐾 is a time when our goal hasbeen achieved, so it only remains to show that 𝜌 = Ω ( ) and 𝑡 𝐾 = 𝑂 ( log 𝑛 ) . The lower bound on 𝜌 follows12rom part (i) of Claim 3.6: 𝐾 − Ö 𝑖 = − 𝑐 log 𝑔 𝑖 ( 𝑛 ) ≥ − . 𝑐 log 𝑔 𝐾 − ( 𝑛 ) > − . 𝑐 log 𝐵 = Ω ( ) . By noting that 𝐾 = 𝑂 ( log ∗ 𝑛 ) , we can also bound 𝑡 𝐾 ′ − 𝑡 since Í 𝐾 − 𝑖 = 𝐶 log 𝑔 𝑖 ( 𝑛 ) is at most log 𝑔 ( 𝑛 ) + ( 𝐾 − ) log 𝑔 ( 𝑛 ) = 𝑂 ( log 𝑛 ) . (cid:3) In this section, we introduce a tool that would be helpful for proving Lemma 3.2.Consider the equation 𝑒 − ℭ 𝑥 = − 𝑞𝑥 + ( 𝑞 − ) 𝜃 (7)and let 𝜙 ( 𝜃, ℭ , 𝑞 ) be defined as the largest positive root of (7). We shall see that 𝜙 is not defined for all 𝑞 and ℭ since there may not be a positive root. When ℭ and 𝑞 are clear from the context we use 𝜙 ( 𝜃 ) = 𝜙 ( 𝜃, ℭ , 𝑞 ) .Note that 𝛽 ( ℭ ) defined by equation (6) is the special case of (7) when 𝑞 =
1; observe that 𝛽 is only well-defined when ℭ > 𝑘 ( 𝜃, 𝑞 ) : = ( + ( 𝑞 − ) 𝜃 )/ 𝑞 so that 𝜙 ( 𝜃, ℭ , 𝑞 ) = 𝛽 ( ℭ · 𝑘 ( 𝜃, 𝑞 )) · 𝑘 ( 𝜃, 𝑞 ) . Hence, 𝜙 ( 𝜃, ℭ , 𝑞 ) is onlydefined when ℭ · 𝑘 ( 𝜃, 𝑞 ) >
1; that is, 𝜃 ∈ ( 𝜃 𝑚𝑖𝑛 , ] , where 𝜃 𝑚𝑖𝑛 = 𝑞 − ℭℭ ( 𝑞 − ) . Note that when ℭ = 𝑞 , 𝜙 ( 𝜃 ) isdefined for every 𝜃 ∈ ( , ] .For fixed ℭ and 𝑞 , we call 𝑓 ( 𝜃 ) : = 𝜃 − 𝜙 ( 𝜃 ) the drift function . The function 𝑓 is defined on ( max { 𝜃 𝑚𝑖𝑛 , } , ] . Lemma 3.7.
When 𝑞 = ℭ < , the drift function 𝑓 is non-negative for any 𝜃 ∈ [ 𝜉, ] , where 𝜉 is an arbitrarilysmall positive constant.Proof. When
ℭ = 𝑞 <
2, the drift function 𝑓 does not have a positive root, it is continuous in (0,1], and 𝑓 ( ) >
0; see Lemma 2.5 in [9] and Fact 3.5 in [4]. Since lim 𝜃 → 𝑓 ( 𝜃 ) =
0, the result follows. (cid:3)
We use 𝐴 ( 𝑋 ) to denote the number of vertices activated by the step CM dynamics from configuration 𝑋 . Proof of Lemma 3.2.
Let ˆ 𝑇 be the first time 𝑡 when ˆ 𝐿 ( 𝑋 𝑡 ) ≤ 𝛿𝑛 / , let 𝑇 ′ be a large constant we choose later;we set 𝑇 = ˆ 𝑇 ∧ 𝑇 ′ . Observe that with constant probability the largest component in the configuration isactivated by the CM dynamics for every 𝑡 ≤ 𝑇 ′ ; i.e., the event Λ 𝑡 occurs for every 𝑡 ≤ 𝑇 ′ . Let us assumethis is the case and fix 𝑡 < 𝑇 . Suppose R ( 𝑋 𝑡 ) ≤ R ( 𝑋 ) + 𝑡 · 𝐶 √ 𝛿 𝑛 where 𝐶 is the positive constant fromLemma 3.4. We show that with high probability:(i) R ( 𝑋 𝑡 + ) ≤ R ( 𝑋 ) + 𝑡 · 𝐶 √ 𝛿 𝑛 ; and(ii) 𝐿 ( 𝑋 𝑡 + ) ≤ 𝐿 ( 𝑋 𝑡 ) − 𝜉𝑛 where 𝜉 is a positive constant independent of 𝑡 and 𝑛 .In particular, it suffices to set 𝑇 ′ = ( − 𝛿 )/ 𝜉 for the lemma to hold.First, we show that 𝐴 ( 𝑋 𝑡 ) is concentrated around its mean. Let 𝐿 ( 𝑋 𝑡 ) : = 𝜃 𝑡 𝑛 and 𝐿 ( 𝑋 𝑡 + ) : = 𝜃 𝑡 + 𝑛 .Let E [ 𝐴 𝑡 | Λ 𝑡 ] = 𝜇 𝑡 = 𝑛𝑞 + ( − 𝑞 ) · 𝜃 𝑡 𝑛 , 𝛾 : = 𝑛 / , and 𝐽 𝑡 : = [ 𝜇 𝑡 − 𝛾, 𝜇 𝑡 + 𝛾 ] . Hoeffding’s inequality impliesPr [ 𝐴 ( 𝑋 𝑡 ) ∈ 𝐽 𝑡 | Λ 𝑡 ] ≥ − (cid:18) − 𝛾 R ( 𝑋 𝑡 ) (cid:19) = − 𝑒 − Ω ( 𝑛 / ) . 𝐴 ( 𝑋 𝑡 ) ∈ 𝐽 𝑡 , then the random graph 𝐺 ( 𝐴 ( 𝑋 𝑡 ) , 𝑝 ) is super-critical since 𝐴 ( 𝑋 𝑡 ) · 𝑝 ≥ ( 𝜇 𝑡 − 𝛾 𝑡 ) · 𝑞𝑛 = (cid:20) 𝑛𝑞 + (cid:18) − 𝑞 (cid:19) · 𝜃 𝑡 𝑛 − 𝑛 / (cid:21) · 𝑞𝑛 = + ( 𝑞 − ) 𝜃 𝑡 − 𝑜 ( ) > . Next, we give a bound for the size of largest new component, provided 𝐴 ( 𝑋 𝑡 ) ∈ 𝐽 𝑡 . We can write 𝐺 ( 𝐴 ( 𝑋 𝑡 ) , ℭ / 𝑛 ) as 𝐺 ( 𝜇 𝑡 + 𝑚, 𝑘 ( 𝜃 𝑡 , 𝑞 ) · 𝑞 / 𝜇 𝑡 ) where 𝑚 : = 𝐴 ( 𝑋 𝑡 ) − 𝜇 𝑡 ; notice that | 𝑚 | ≤ 𝛾 = 𝑜 ( 𝑛 ) . Let 𝐻 ∼ 𝐺 ( 𝜇 𝑡 + 𝑚, 𝑘 ( 𝜃 𝑡 , 𝑞 ) · 𝑞 / 𝜇 𝑡 ) . Since 𝑘 ( 𝜃 𝑡 , 𝑞 ) · 𝑞 = + ( 𝑞 − ) 𝜃 𝑡 > + 𝛿 ( 𝑞 − ) > 𝑛 , Lemma 2.8 implies that for 𝜙 ( 𝜃 𝑡 ) > 𝐿 ( 𝐻 ) ∈ h 𝜙 ( 𝜃 𝑡 ) 𝑛 − p 𝑛 log 𝑛, 𝜙 ( 𝜃 𝑡 ) 𝑛 + p 𝑛 log 𝑛 i . Note that 𝐿 ( 𝐻 ) = Ω ( 𝑛 ) w.h.p.; hence, since 𝐿 ( 𝑋 𝑡 ) = 𝑂 ( 𝑛 / ) we have 𝐿 ( 𝑋 𝑡 + ) = 𝐿 ( 𝐻 ) w.h.p. Wehave shown that w.h.p. 𝜃 𝑡 + − 𝜃 𝑡 ≤ 𝜙 ( 𝜃 𝑡 ) + p 𝑛 log 𝑛𝑛 − 𝜃 𝑡 = − 𝑓 ( 𝜃 𝑡 ) + r log 𝑛𝑛 , where 𝑓 is the drift function defined in Section 3.1. By Lemma 3.7, we know 𝑓 ( 𝜃 𝑡 ) > 𝜉 > 𝜉 (independent of 𝑛 and 𝑡 ). Hence, w.h.p. for sufficiently large 𝑛𝐿 ( 𝑋 𝑡 + ) − 𝐿 ( 𝑋 𝑡 ) ≤ − 𝜉 𝑛 + 𝑜 ( 𝑛 ) ≤ − 𝜉 𝑛 𝑡 < 𝑇 we have ˆ 𝐿 ( 𝑋 𝑡 ) > 𝛿𝑛 / , so Lemma 3.4 implies,Pr (cid:20) R ( 𝑋 𝑡 + ) < R ( 𝑋 ) + 𝑡 · 𝐶 √ 𝛿 𝑛 + 𝐶𝑛 / √ 𝛿𝑛 / (cid:21) = − 𝑜 ( ) . A union bound implies that these two events occur simultaneously w.h.p. and the result follows. (cid:3)
Before proving Lemma 3.5 we provide the proof of Lemma 3.3.
Proof of Lemma 3.3.
We start with part 1. Let 𝜇 𝑡 : = E [ 𝐴 ( 𝑋 𝑡 ) | Λ 𝑡 , 𝑋 𝑡 ] , 𝛾 𝑡 : = q ˆ 𝐿 ( 𝑋 𝑡 ) · 𝑛 / , and 𝐽 𝑡 : = [ 𝜇 𝑡 − 𝛾 𝑡 , 𝜇 𝑡 + 𝛾 𝑡 ] . Hoeffding’s inequality implies thatPr [ 𝐴 ( 𝑋 𝑡 ) ∈ 𝐽 𝑡 | Λ 𝑡 , 𝑋 𝑡 ] ≥ − (cid:18) − 𝛾 𝑡 R ( 𝑋 𝑡 ) (cid:19) = − exp (cid:16) − Ω ( ˆ 𝐿 ( 𝑋 𝑡 )) (cid:17) . Let 𝑚 : = 𝜇 𝑡 + 𝛾 𝑡 , 𝐺 ∼ 𝐺 ( 𝑚, 𝑞𝑛 ) and ˆ 𝐺 ∼ 𝐺 ( 𝐴 ( 𝑋 𝑡 ) , 𝑝 ) . Then, the monotonicity of the largest componentin a random graph implies that for any ℓ > (cid:2) 𝐿 ( ˆ 𝐺 ) > ℓ | 𝐴 ( 𝑋 𝑡 ) ∈ 𝐽 𝑡 (cid:3) = Õ 𝑎 ∈ 𝐽 𝑡 Pr (cid:2) 𝐿 ( ˆ 𝐺 ) > ℓ | 𝐴 ( 𝑋 𝑡 ) = 𝑎 (cid:3) Pr [ 𝐴 ( 𝑋 𝑡 ) = 𝑎 | 𝐴 ( 𝑋 𝑡 ) ∈ 𝐽 𝑡 ]≤ Õ 𝑎 ∈ 𝐽 𝑡 Pr (cid:2) 𝐿 ( ˆ 𝐺 ) > ℓ | 𝐴 ( 𝑋 𝑡 ) = 𝑚 (cid:3) Pr [ 𝐴 ( 𝑋 𝑡 ) = 𝑎 | 𝐴 ( 𝑋 𝑡 ) ∈ 𝐽 𝑡 ] = Pr [ 𝐿 ( 𝐺 ) > ℓ ] Õ 𝑎 ∈ 𝐽 𝑡 Pr [ 𝐴 ( 𝑋 𝑡 ) = 𝑎 | 𝐴 ( 𝑋 𝑡 ) ∈ 𝐽 𝑡 ] = Pr [ 𝐿 ( 𝐺 ) > ℓ ] .
14e bound next Pr [ 𝐿 ( 𝐺 ) > ℓ ] . For this, we rewrite 𝐺 ( 𝑚, 𝑞𝑛 ) as 𝐺 (cid:0) 𝑚, + 𝜀𝑚 (cid:1) ; since 𝜇 𝑡 = ˆ 𝐿 ( 𝑋 𝑡 ) · 𝑛 / + (cid:16) 𝑛 − ˆ 𝐿 ( 𝑋 𝑡 ) 𝑛 / (cid:17) 𝑞 − we have 𝜀 = 𝑚 · 𝑞𝑛 − = © « 𝑞 − + 𝑞 q ˆ 𝐿 ( 𝑋 𝑡 ) ª®®¬ ˆ 𝐿 ( 𝑋 𝑡 ) 𝑛 / . Thus, 𝜀 · 𝑚 = © « 𝑞 − + 𝑞 q ˆ 𝐿 ( 𝑋 𝑡 ) ª®®¬ ˆ 𝐿 ( 𝑋 𝑡 ) 𝑛 (cid:18) ˆ 𝐿 ( 𝑋 𝑡 ) 𝑛 / + 𝑛 − ˆ 𝐿 ( 𝑋 𝑡 ) 𝑛 / 𝑞 + q ˆ 𝐿 ( 𝑋 𝑡 ) 𝑛 / (cid:19) ≥ © « 𝑞 − + 𝑞 q ˆ 𝐿 ( 𝑋 𝑡 ) ª®®¬ · ˆ 𝐿 ( 𝑋 𝑡 ) 𝑛 · 𝑛𝑞 ≥ 𝑞 · (cid:18) ( 𝑞 − ) ˆ 𝐿 ( 𝑋 𝑡 ) + 𝑞 q ˆ 𝐿 ( 𝑋 𝑡 ) (cid:19) ≥ 𝑞 ˆ 𝐿 ( 𝑋 𝑡 ) / ≥
100 ˆ 𝐿 ( 𝑋 𝑡 ) , where the last inequality follows from the fact that ˆ 𝐿 ( 𝑋 𝑡 ) > 𝐵 , where 𝐵 = 𝐵 ( 𝑞 ) is a sufficiently largeconstant.Since 𝜀 · 𝑚 ≥
1, Theorem 2.10 impliesPr (cid:20) | 𝐿 ( 𝐺 ) − 𝜀𝑚 | > q ˆ 𝐿 ( 𝑋 𝑡 ) r 𝑚𝜀 (cid:21) = 𝑒 − Ω ( ˆ 𝐿 ( 𝑋 𝑡 ) ) . Let 𝑐 = q +( 𝑞 − ) 𝛿𝑞 ( 𝑞 − ) . The upper tail bound impliesPr h 𝐿 ( 𝐺 ) ≤ 𝜀𝑚 + 𝑐 𝑛 / i ≥ − 𝑒 − Ω ( ˆ 𝐿 ( 𝑋 𝑡 ) ) . We show next that 2 𝜀𝑚 + 𝑐 𝑛 / ≤ 𝛼𝐿 ( 𝑋 𝑡 ) for some 𝛼 ∈ ( , ) .2 𝜀𝑚 + 𝑐 𝑛 / = © « 𝑞 − + 𝑞 q ˆ 𝐿 ( 𝑋 𝑡 ) ª®®¬ ˆ 𝐿 ( 𝑋 𝑡 ) 𝑛 / (cid:18) ˆ 𝐿 ( 𝑋 𝑡 ) 𝑛 / + 𝑛 − ˆ 𝐿 ( 𝑋 𝑡 ) 𝑛 / 𝑞 + q ˆ 𝐿 ( 𝑋 𝑡 ) 𝑛 / (cid:19) + 𝑐 𝑛 / = 𝑞 © « 𝑞 − + 𝑞 q ˆ 𝐿 ( 𝑋 𝑡 ) ª®®¬ ˆ 𝐿 ( 𝑋 𝑡 ) 𝑛 / 𝑛 + © « 𝑞 − + 𝑞 q ˆ 𝐿 ( 𝑋 𝑡 ) ª®®¬ ˆ 𝐿 ( 𝑋 𝑡 ) 𝑛 / + 𝑐 𝑛 / = 𝑞 © « 𝑞 − + 𝑞 q ˆ 𝐿 ( 𝑋 𝑡 ) + 𝑐 𝑞 ˆ 𝐿 ( 𝑋 𝑡 ) ª®®¬ ˆ 𝐿 ( 𝑋 𝑡 ) 𝑛 / + 𝑞 © « 𝑞 − + 𝑞 q ˆ 𝐿 ( 𝑋 𝑡 ) ª®®¬ ˆ 𝐿 ( 𝑋 𝑡 ) 𝑛 / ≤ 𝑞 (cid:20) 𝛿 (cid:16) 𝑞 − + 𝑂 (cid:16) ˆ 𝐿 ( 𝑋 𝑡 ) − / (cid:17) (cid:17) + (cid:16) 𝑞 − + 𝑂 (cid:16) ˆ 𝐿 ( 𝑋 𝑡 ) − / (cid:17) (cid:17) (cid:21) ˆ 𝐿 ( 𝑋 𝑡 ) 𝑛 / , 𝛿𝑛 / ≥ ˆ 𝐿 ( 𝑋 𝑡 ) . For sufficiently small 𝛿 and suffi-ciently large 𝐵 , ∃ 𝛼 < 𝛼 > 𝑞 " 𝛿 (cid:18) 𝑞 − + 𝑞𝐵 / (cid:19) + (cid:18) 𝑞 − + 𝑞𝐵 / (cid:19) . Consequently, 𝐿 ( 𝐺 ) ≤ 𝜀𝑚 + 𝑐 𝑛 / ≤ 𝛼𝐿 ( 𝑋 𝑡 ) with probability 1 − exp (cid:16) − Ω ( ˆ 𝐿 ( 𝑋 𝑡 ) (cid:17) . If that is the case, 𝐿 ( 𝑋 𝑡 + ) ≤ max { 𝛼𝐿 ( 𝑋 𝑡 ) , 𝐿 ( 𝑋 𝑡 )} = : 𝐿 + . Therefore,Pr (cid:2) 𝐿 ( 𝑋 𝑡 + ) ≤ 𝐿 + | 𝑋 𝑡 , Λ 𝑡 (cid:3) ≥ Pr (cid:2) 𝐿 ( 𝑋 𝑡 + ) ≤ 𝐿 + | 𝑋 𝑡 , Λ 𝑡 , 𝐴 ( 𝑋 𝑡 ) ∈ 𝐽 𝑡 (cid:3) · Pr [ 𝐴 ( 𝑋 𝑡 ) ∈ 𝐽 𝑡 | 𝑋 𝑡 , Λ 𝑡 ]≥ − exp (cid:16) − Ω ( ˆ 𝐿 ( 𝑋 𝑡 )) (cid:17) , which concludes the proof of part 1.For part 2, note first that when the largest component is inactive, we have 𝐿 ( 𝑋 𝑡 + ) ≥ 𝐿 ( 𝑋 𝑡 ) ; hence,it is sufficient to show that 𝐿 ( 𝑋 𝑡 + ) ≤ 𝐿 ( 𝑋 𝑡 ) with the desired probability.Let 𝜇 ′ 𝑡 : = E [ 𝐴 ( 𝑋 𝑡 ) | ¬ Λ 𝑡 , 𝑋 𝑡 ] = (cid:16) 𝑛 − ˆ 𝐿 ( 𝑋 𝑡 ) 𝑛 / (cid:17) 𝑞 − , 𝛾 ′ 𝑡 : = q ˆ 𝐿 ( 𝑋 𝑡 ) · 𝑛 / , and 𝐽 ′ 𝑡 : = [ 𝜇 ′ 𝑡 − 𝛾 ′ 𝑡 , 𝜇 ′ 𝑡 + 𝛾 ′ 𝑡 ] . ByHoeffding’s inequality, Pr (cid:2) 𝐴 ( 𝑋 𝑡 ) ∈ 𝐽 ′ 𝑡 | ¬ Λ 𝑡 , 𝑋 𝑡 (cid:3) ≥ − exp (cid:16) − Ω (cid:16) ˆ 𝐿 ( 𝑋 𝑡 ) (cid:17) (cid:17) . Let 𝐺 ∼ 𝐺 ( 𝐴 ( 𝑋 𝑡 ) , 𝑝 ) , 𝑚 = 𝜇 ′ 𝑡 + 𝛾 ′ 𝑡 and let 𝐺 + ∼ 𝐺 (cid:0) 𝜇 ′ 𝑡 + 𝛾 ′ 𝑡 , 𝑝 (cid:1) , By monotonicity of the largest componentin a random graph, Pr (cid:2) 𝐿 ( 𝐺 ) > 𝐿 ( 𝑋 𝑡 ) | 𝐴 ( 𝑋 𝑡 ) ∈ 𝐽 ′ 𝑡 (cid:3) ≤ Pr (cid:2) 𝐿 ( 𝐺 + ) > 𝐿 ( 𝑋 𝑡 ) (cid:3) . Rewrite 𝐺 (cid:0) 𝜇 ′ 𝑡 + 𝛾 ′ 𝑡 , 𝑝 (cid:1) as 𝐺 (cid:0) 𝑚, + 𝜀𝑚 (cid:1) , where 𝜀 = (cid:18) 𝑛 − ˆ 𝐿 ( 𝑋 𝑡 ) 𝑛 / 𝑞 + q ˆ 𝐿 ( 𝑋 𝑡 ) 𝑛 / (cid:19) · 𝑞𝑛 − = (cid:18)q ˆ 𝐿 ( 𝑋 𝑡 ) 𝑞 − ˆ 𝐿 ( 𝑋 𝑡 ) (cid:19) 𝑛 − / . From this bound, applying Lemma 2.12 to 𝐺 + , we obtainE (cid:2) R ( 𝐺 + ) (cid:3) = 𝑂 (cid:16) 𝑚𝜀 (cid:17) = 𝑂 (cid:18) 𝑛 / ˆ 𝐿 ( 𝑋 𝑡 ) (cid:19) . Hence, E (cid:2) 𝐿 ( 𝐺 + ) (cid:3) = 𝑂 (cid:16) 𝑛 / / ˆ 𝐿 ( 𝑋 𝑡 ) (cid:17) and by Markov’s inequalityPr h 𝐿 ( 𝐺 + ) > ˆ 𝐿 ( 𝑋 𝑡 ) 𝑛 / i = Pr h 𝐿 ( 𝐺 + ) > ˆ 𝐿 ( 𝑋 𝑡 ) 𝑛 / i ≤ 𝐸 [ 𝐿 ( 𝐺 + ) ] ˆ 𝐿 ( 𝑋 𝑡 ) 𝑛 / = 𝑂 (cid:18) 𝐿 ( 𝑋 𝑡 ) (cid:19) . To conclude, we observe thatPr [ 𝐿 ( 𝑋 𝑡 + ) ≤ 𝐿 ( 𝑋 𝑡 ) | 𝑋 𝑡 , ¬ Λ 𝑡 ]≥ Pr (cid:2) 𝐿 ( 𝐺 ) ≤ 𝐿 ( 𝑋 𝑡 ) | 𝑋 𝑡 , ¬ Λ 𝑡 , 𝐴 ( 𝑋 𝑡 ) ∈ 𝐽 ′ 𝑡 (cid:3) Pr (cid:2) 𝐴 ( 𝑋 𝑡 ) ∈ 𝐽 ′ 𝑡 | 𝑋 𝑡 , ¬ Λ 𝑡 (cid:3) ≥ (cid:16) − 𝑒 − Ω ( ˆ 𝐿 ( 𝑋 𝑡 ) ) (cid:17) (cid:18) − 𝑂 (cid:18) 𝐿 ( 𝑋 𝑡 ) (cid:19) (cid:19) = − 𝑂 (cid:18) 𝐿 ( 𝑋 𝑡 ) (cid:19) , as desired. (cid:3)
16e are now ready to prove Lemma 3.5.
Proof of Lemma 3.5.
Suppose R ( 𝑋 ) ≤ 𝐷 𝑛 / for a constant 𝐷 . Let 𝑇 ′ : = 𝐵 ′ log 𝑔 ( 𝑛 ) , where 𝐵 ′ is aconstant such that 𝐵 ′ log 𝑔 ( 𝑛 ) = 𝑞 log / 𝛼 (cid:16) 𝑔 ( 𝑛 ) 𝐵 ( log 𝑔 ( 𝑛 )) (cid:17) and 𝛼 : = 𝛼 ( 𝐵, 𝑞, 𝛿 ) is the constant from Lemma 3.3.Let ˆ 𝑇 be the first time ˆ 𝐿 ( 𝑋 𝑡 ) ≤ max { 𝐵 ( log 𝑔 ( 𝑛 )) , 𝐷 } , where 𝐷 is a large constant we choose later. Let 𝑇 : = min { 𝑇 ′ , ˆ 𝑇 } . Define 𝑒 ( 𝑡 ) as the number of steps up totime 𝑡 in which the largest component of the configuration is activated.To facilitate the notation, we define the following events. (The constants 𝐶 and 𝛼 are those fromLemmas 3.3 and 3.4, respectively).1. Let 𝐻 𝑖 denote ˆ 𝐿 ( 𝑋 𝑖 ) > max { 𝐵 ( log 𝑔 ( 𝑛 )) , 𝐷 } ;2. Let 𝐹 𝑖 denote R ( 𝑋 𝑖 ) ≤ R ( 𝑋 𝑖 − ) + 𝐶𝑛 / ˆ 𝐿 ( 𝑋 𝑖 − ) − / ; let us assume 𝐹 occurs;3. Let 𝐹 ′ 𝑖 denote R ( 𝑋 𝑖 ) ≤ R ( 𝑋 𝑖 − ) + 𝐶𝑛 / ( log 𝑔 ( 𝑛 )) − 𝐵 − / ; again, we assume 𝐹 ′ occurs;4. Let 𝑄 𝑖 denote ˆ 𝐿 ( 𝑋 𝑖 ) ≤ max { 𝛼 𝑒 ( 𝑖 ) ˆ 𝐿 ( 𝑋 ) , 𝐷 } ;5. Let 𝐵𝑎𝑠𝑒 𝑖 be the intersection of { 𝐹 ′ , 𝑄 , 𝐻 } , ..., { 𝐹 ′ 𝑖 − , 𝑄 𝑖 − , 𝐻 𝑖 − } , and { 𝐹 ′ 𝑖 , 𝑄 𝑖 }.By induction, we find a lower bound for the probability of 𝐵𝑎𝑠𝑒 𝑇 . For the base case, note that Pr [ 𝐵𝑎𝑠𝑒 ] = [ 𝐵𝑎𝑠𝑒 𝑖 + ∧ 𝑇 | 𝐵𝑎𝑠𝑒 𝑖 ∧ 𝑇 ] = − 𝑂 ( ( log 𝑔 ( 𝑛 )) − ) . If 𝑇 ≤ 𝑖 , then 𝐵𝑎𝑠𝑒 𝑖 ∧ 𝑇 = 𝐵𝑎𝑠𝑒 𝑇 = 𝐵𝑎𝑠𝑒 𝑖 + ∧ 𝑇 , so the induction holds. If 𝑇 > 𝑖 , then we have 𝐻 𝑖 . By theinduction hypothesis 𝐹 ′ , 𝐹 ′ , ..., 𝐹 ′ 𝑖 − , R ( 𝑋 𝑖 ) ≤ R ( 𝑋 ) + 𝑖 · 𝐶𝑛 / ( log 𝑔 ( 𝑛 )) − 𝐵 − / . Moreover, since 𝑖 < 𝑇 ≤ 𝑇 ′ = 𝐵 ′ log 𝑔 ( 𝑛 ) and R ( 𝑋 ) ≤ 𝐷 𝑛 / , we have R ( 𝑋 𝑖 ) ≤ 𝐷 𝑛 / + 𝐶𝐵 ′ 𝑛 / ( log 𝑔 ( 𝑛 )) − 𝐵 − / . Given R ( 𝑋 𝑖 ) = 𝑂 ( 𝑛 / ) and 𝐻 𝑖 , Lemma 3.4 implies that 𝐹 𝑖 + occurs with probability1 − 𝑂 ( ˆ 𝐿 ( 𝑋 𝑖 ) − / ) = − 𝑂 ( ( log 𝑔 ( 𝑛 )) − ) . In addition, note that 𝐹 𝑖 + ∪ 𝐻 𝑖 to 𝐹 ′ 𝑖 + . Let ( Λ 𝑡 ) be the indicator function for the event Λ 𝑡 . Given 𝐻 𝑖 , 𝑄 𝑖 and R ( 𝑋 𝑖 ) = 𝑂 ( 𝑛 / ) , Lemma 3.3 implies 𝐿 ( 𝑋 𝑖 + ) ≤ max { 𝛼 ( Λ 𝑡 ) 𝐿 ( 𝑋 𝑖 ) , 𝐿 ( 𝑋 𝑖 )} (8)with probability at least 1 − 𝑂 (cid:16) ˆ 𝐿 ( 𝑋 𝑡 ) − (cid:17) = − 𝑂 (cid:0) ( log 𝑔 ( 𝑛 )) − (cid:1) .Dividing equation (8) by 𝑛 / , we obtain 𝑄 𝑖 + for large enough 𝐷 . In particular, we can choose 𝐷 to be 𝐷 +
2. A union bound then impliesPr [ 𝐵𝑎𝑠𝑒 𝑖 + ∧ 𝑇 | 𝐵𝑎𝑠𝑒 𝑖 ∧ 𝑇 ] ≥ Pr [ 𝐵𝑎𝑠𝑒 𝑖 + ∧ 𝑇 | 𝐵𝑎𝑠𝑒 𝑖 , 𝐻 𝑖 ] = − 𝑂 ( ( log 𝑔 ( 𝑛 )) − ) . 𝐵𝑎𝑠𝑒 𝑇 can then be bounded as follows:Pr [ 𝐵𝑎𝑠𝑒 𝑇 ] ≥ 𝑇 − Ö 𝑖 = Pr [ 𝐵𝑎𝑠𝑒 𝑖 + ∧ 𝑇 | 𝐵𝑎𝑠𝑒 𝑖 ∧ 𝑇 ] = 𝑇 − Ö 𝑖 = − 𝑂 ( ( log 𝑔 ( 𝑛 )) − ) = − 𝑂 ( ( log 𝑔 ( 𝑛 )) − ) . Next, let us assume
𝐵𝑎𝑠𝑒 𝑇 . Then we have R ( 𝑋 𝑇 ) ≤ R ( 𝑋 ) + 𝑇 ′ · 𝐶𝑛 / ( log 𝑔 ( 𝑛 )) − 𝐵 − / = R ( 𝑋 ) + 𝑂 (cid:16) 𝑛 / ( log 𝑔 ( 𝑛 )) − (cid:17) . Notice that if 𝑇 = ˆ 𝑇 then the proof is complete. Consequently, it suffices to show ˆ 𝑇 ≤ 𝑇 ′ with probabilityat least 1 − 𝑔 ( 𝑛 ) − Ω ( ) .Observe that 𝐾 : = 𝑒 ( 𝑇 ′ ) is a binomial random variable 𝐵𝑖𝑛 ( 𝑇 ′ , / 𝑞 ) , whose expectation is 𝑇 ′ 𝑞 = 𝐵 ′ 𝑞 log 𝑔 ( 𝑛 ) . By Chernoff boundPr (cid:20) 𝐾 < 𝐵 ′ 𝑞 log 𝑔 ( 𝑛 ) (cid:21) ≤ exp (cid:18) − 𝐵 ′ 𝑞 log 𝑔 ( 𝑛 ) (cid:19) = 𝑔 ( 𝑛 ) − Ω ( ) . If indeed 𝑇 = 𝑇 ′ and 𝐾 ≥ 𝐵 ′ 𝑞 log 𝑔 ( 𝑛 ) , then the event 𝑄 𝑇 impliesˆ 𝐿 ( 𝑋 𝑇 ) < 𝛼 𝑒 ( 𝑇 ) ˆ 𝐿 ( 𝑋 ) ≤ 𝛼 log 𝛼 (cid:18) 𝐵 ( log 𝑔 ( 𝑛 )) 𝑔 ( 𝑛 ) (cid:19) ˆ 𝐿 ( 𝑋 ) = 𝐵 ( log 𝑔 ( 𝑛 )) 𝑔 ( 𝑛 ) ˆ 𝐿 ( 𝑋 ) ≤ 𝐵 ( log 𝑔 ( 𝑛 )) , which leads to ˆ 𝑇 ≤ 𝑇 . Therefore,Pr (cid:2) ˆ 𝑇 > 𝑇 ′ | 𝐵𝑎𝑠𝑒 𝑇 (cid:3) ≤ Pr (cid:20) 𝐾 < 𝐵 ′ 𝑞 log 𝑔 ( 𝑛 ) (cid:21) = 𝑔 ( 𝑛 ) − Ω ( ) , as desired. (cid:3) In this section, we prove Lemma 1.3; that is, we design a coupling for the steps of the CM dynamics thatstarting from configurations 𝑋 , 𝑌 such that R ( 𝑋 ) = 𝑂 ( 𝑛 / ) , R ( 𝑌 ) = 𝑂 ( 𝑛 / ) , after 𝑇 = 𝑂 ( log 𝑛 ) steps 𝑋 𝑇 and 𝑌 𝑇 have the same component structure. We introduce some notation first. We say a variable isa constant when it does not depend on 𝑛 . Let 𝜔 ( 𝑛 ) = log log log log 𝑛 and let 𝐵 𝜔 = 𝑛 / 𝜔 ( 𝑛 ) − . For arandom-cluster configuration 𝑋 , let e R 𝜔 ( 𝑋 ) = Í 𝑗 : 𝐿 𝑗 ( 𝑋 ) ≤ 𝐵 𝜔 𝐿 𝑗 ( 𝑋 ) . We use 𝐴 ( 𝑋 ) to denote the number ofvertices activated by the CM dynamics from configuration 𝑋 , and 𝐼 ( 𝑋 ) for the number of isolated verticesof 𝑋 .The proof of Lemma 1.3 consists of a multi-phased coupling argument. The first step is a continuationof the initial burn-in phase until certain (additional) key properties of the configurations arise. Lemma 4.1.
Let
ℭ = 𝑞 , 𝑞 ∈ ( , ) and suppose 𝑋 is such that R ( 𝑋 ) = 𝑂 ( 𝑛 / ) . Then, there exists 𝑇 = 𝑂 ( log 𝜔 ( 𝑛 )) and a constant 𝛽 > such that e R 𝜔 ( 𝑋 𝑇 ) = 𝑂 ( 𝑛 / 𝜔 ( 𝑛 ) − / ) , R ( 𝑋 𝑇 ) = 𝑂 ( 𝑛 / ) and 𝐼 ( 𝑋 𝑇 ) = Ω ( 𝑛 ) with probability Ω ( 𝜔 ( 𝑛 ) − 𝛽 ) . Once we have obtained suitable bounds on e R 𝜔 ( 𝑋 𝑇 ) , e R 𝜔 ( 𝑌 𝑇 ) , 𝐼 ( 𝑋 𝑇 ) and 𝐼 ( 𝑌 𝑇 ) , we construct a two-stepcoupling which relies on our local limit theorem (Theorem 2.2) that ensures that both configurations havethe same large component structure. Let S 𝜔 ( 𝑋 ) be the set of connected components of 𝑋 with sizes greaterthan 𝐵 𝜔 . 18or an increasing positive function 𝑔 and each integer 𝑘 ≥ I 𝑘 ( 𝑔 ) = (cid:20) 𝜗𝑛 / 𝑔 ( 𝑛 ) 𝑘 , 𝜗𝑛 / 𝑔 ( 𝑛 ) 𝑘 (cid:21) , where 𝜗 > 𝑁 𝑘 ( 𝑋, 𝑔 ) be the number of components of 𝑋 whose sizes are inthe interval I 𝑘 ( 𝑔 ) .To couple the component activation of two random-cluster configurations 𝑋 𝑡 and 𝑌 𝑡 , we consider amaximal matching 𝑊 𝑡 between the components of 𝑋 𝑡 and 𝑌 𝑡 with the restriction that only components ofequal size are matched to each other. Define ˆ 𝑁 𝑘 ( 𝑡, 𝑔 ) : = ˆ 𝑁 𝑘 ( 𝑋 𝑡 , 𝑌 𝑡 , 𝑔 ) as the number of matched pairs in 𝑊 𝑡 whose component sizes are in the interval I 𝑘 ( 𝑔 ) . Lemma 4.2.
Let
ℭ = 𝑞 , 𝑞 ∈ ( , ) and suppose 𝑋 , 𝑌 are random-cluster configurations such that R ( 𝑋 ) = 𝑂 ( 𝑛 / ) , e R 𝜔 ( 𝑋 ) = 𝑂 ( 𝑛 / 𝜔 ( 𝑛 ) − / ) , 𝐼 ( 𝑋 ) = Ω ( 𝑛 ) and similarly for 𝑌 . Then, there exists a two-step cou-pling the CM dynamics such that, with probability exp (cid:0) − 𝑂 ( 𝜔 ( 𝑛 ) ) (cid:1) , S 𝜔 ( 𝑋 ) = S 𝜔 ( 𝑌 ) , ˆ 𝑁 𝑘 ( 𝑋 , 𝑌 , 𝜔 ( 𝑛 )) = Ω ( 𝜔 ( 𝑛 ) · 𝑘 − ) for all 𝑘 ≥ such that 𝑛 / 𝜔 ( 𝑛 ) − 𝑘 − → ∞ , 𝐿 ( 𝑋 ) = 𝑂 ( 𝑛 / 𝜔 ( 𝑛 )) , R ( 𝑋 ) = 𝑂 ( 𝑛 / ) , e R 𝜔 ( 𝑋 ) = 𝑂 ( 𝑛 / 𝜔 ( 𝑛 ) − / ) , 𝐼 ( 𝑋 ) = Ω ( 𝑛 ) , and similarly for 𝑌 . The proof of Lemma 4.2 is provided in Section 4.1. After matching the component structure of the largecomponents, we run the chains coupled to ensure that some additional stationarity properties, required tofinalize the coupling of the smaller components, arise.Let 𝑀 ( 𝑋 𝑡 ) and 𝑀 ( 𝑌 𝑡 ) be the components in 𝑊 𝑡 from 𝑋 𝑡 and 𝑌 𝑡 , respectively, and let 𝐷 ( 𝑋 𝑡 ) and 𝐷 ( 𝑌 𝑡 ) be the complements of 𝑀 ( 𝑋 𝑡 ) and 𝑀 ( 𝑌 𝑡 ) . Let 𝑍 𝑡 = Õ C∈ 𝐷 ( 𝑋 𝑡 )∪ 𝐷 ( 𝑌 𝑡 ) |C| . Lemma 4.3.
Let
ℭ = 𝑞 , 𝑞 ∈ ( , ) . Suppose 𝑋 and 𝑌 are random-cluster configurations such that S 𝜔 ( 𝑋 ) = S 𝜔 ( 𝑌 ) , and ˆ 𝑁 𝑘 ( 𝑋 , 𝑌 , 𝜔 ( 𝑛 )) = Ω ( 𝜔 ( 𝑛 ) · 𝑘 − ) for all 𝑘 ≥ such that 𝑛 / 𝜔 ( 𝑛 ) − 𝑘 − → ∞ . Suppose also that 𝐿 ( 𝑋 ) = 𝑂 ( 𝑛 / 𝜔 ( 𝑛 )) , R ( 𝑋 ) = 𝑂 ( 𝑛 / ) , e R 𝜔 ( 𝑋 ) = 𝑂 ( 𝑛 / 𝜔 ( 𝑛 ) − / ) , 𝐼 ( 𝑋 ) = Ω ( 𝑛 ) , and similarly for 𝑌 .Then, there exists a coupling of the CM steps such that with probability exp (− 𝑂 ( ( log 𝜔 ( 𝑛 )) )) after 𝑇 = 𝑂 ( log 𝜔 ( 𝑛 )) steps: S 𝜔 ( 𝑋 𝑇 ) = S 𝜔 ( 𝑌 𝑇 ) , 𝑍 𝑇 = 𝑂 ( 𝑛 / 𝜔 ( 𝑛 ) − / ) , ˆ 𝑁 𝑘 ( 𝑋 𝑇 , 𝑌 𝑇 , 𝜔 ( 𝑛 ) / ) = Ω ( 𝜔 ( 𝑛 ) · 𝑘 − ) for all 𝑘 ≥ such that 𝑛 / 𝜔 ( 𝑛 ) − 𝑘 − → ∞ , R ( 𝑋 𝑇 ) = 𝑂 ( 𝑛 / ) , 𝐼 ( 𝑋 𝑇 ) = Ω ( 𝑛 ) , and similarly for 𝑌 𝑇 . The proof of Lemma 4.3 also uses our local limit theorem (Theorem 2.2) and is provided in Section 4.2.The final key step of our proof is a coupling of the activation of the small components of size less than 𝐴 that ensures that exactly the same number of vertices is activated from each copy in each step w.h.p. Thiscoupling uses our estimates on the maximum of symmetric random walks (Theorem 2.4), and its proof isprovided in Section 4.3. Lemma 4.4.
Let
ℭ = 𝑞 , 𝑞 ∈ ( , ) and suppose 𝑋 and 𝑌 are random-cluster configurations such that S 𝜔 ( 𝑋 ) = S 𝜔 ( 𝑌 ) , 𝑍 = 𝑂 ( 𝑛 / 𝜔 ( 𝑛 ) − / ) , and ˆ 𝑁 𝑘 (cid:0) 𝑋 , 𝑌 , 𝜔 ( 𝑛 ) / (cid:1) = Ω ( 𝜔 ( 𝑛 ) · 𝑘 − ) for all 𝑘 ≥ such that 𝑛 / 𝜔 ( 𝑛 ) − 𝑘 − → ∞ . Suppose also that R ( 𝑋 ) = 𝑂 ( 𝑛 / ) and, 𝐼 ( 𝑋 ) = Ω ( 𝑛 ) and similarly for 𝑌 . Then, thereexist a coupling of the CM steps and a constant 𝛽 > such that after 𝑇 = 𝑂 ( log 𝑛 ) steps 𝑋 𝑇 and 𝑌 𝑇 have thesame component structure with probability Ω (cid:0) ( log log log 𝑛 ) − 𝛽 (cid:1) . The proof of Lemma 1.3 follows immediately.
Proof of Lemma 1.3.
Suppose R ( 𝑋 ) = 𝑂 ( 𝑛 / ) and R ( 𝑌 ) = 𝑂 ( 𝑛 / ) . It follows immediately fromLemma 4.1 4.2, 4.3 and 4.4 that there exists a coupling of the CM steps such that after 𝑇 = 𝑂 ( log 𝑛 ) 𝑋 𝑇 and 𝑌 𝑇 could have the same component structure. This coupling succeeds with probability atleast 𝜌 = Ω ( 𝜔 ( 𝑛 ) − 𝛽 ) · exp (cid:0) − 𝑂 ( 𝜔 ( 𝑛 ) ) (cid:1) · exp (cid:0) − 𝑂 (cid:0) ( log 𝜔 ( 𝑛 )) (cid:1) (cid:1) · Ω (cid:16) ( log log log 𝑛 ) − 𝛽 (cid:17) , where 𝛽 and 𝛽 are positive constants independent of 𝑛 . Thus, 𝜌 = Ω (cid:0) ( log log 𝑛 ) − (cid:1) , since 𝜔 ( 𝑛 ) = log log log log 𝑛 . (cid:3) Proof of Lemma 4.1.
We show that there exist suitable constants 𝐶 , 𝐷 > 𝛼 ∈ ( , ) such that if R ( 𝑋 𝑡 ) ≤ 𝐶𝑛 / and e R 𝜔 ( 𝑋 𝑡 ) > 𝐷𝑛 / 𝜔 ( 𝑛 ) − / , then R ( 𝑋 𝑡 + ) ≤ 𝐶𝑛 / , and e R 𝜔 ( 𝑋 𝑡 + ) ≤ ( − 𝛼 ) e R 𝜔 ( 𝑋 𝑡 ) (9)with probability 𝜌 = Ω ( ) . This implies that we can maintain (9) for 𝑇 steps with probability 𝜌 𝑇 . Precisely,if we let 𝜏 = min { 𝑡 > R ( 𝑋 𝑡 ) > 𝐶𝑛 / } ,𝜏 = min { 𝑡 > e R 𝜔 ( 𝑋 𝑡 ) > ( − 𝛼 ) e R 𝜔 ( 𝑋 𝑡 − )} ,𝑇 = min { 𝜏 , 𝜏 , 𝑐 log 𝜔 ( 𝑛 )} , where the constant 𝑐 > ( − 𝛼 ) 𝑐 log 𝜔 ( 𝑛 ) = 𝑂 ( 𝜔 ( 𝑛 ) − / ) , then 𝑇 = 𝑐 log 𝜔 ( 𝑛 ) withprobability 𝜌 𝑐 log 𝜔 ( 𝑛 ) , (Note that 𝜌 𝑐 log 𝜔 ( 𝑛 ) = 𝜔 ( 𝑛 ) − 𝛽 for a suitable constant 𝛽 >
0) and so R ( 𝑋 𝑇 ) = 𝑂 ( 𝑛 / ) and e R 𝜔 ( 𝑋 𝑇 ) ≤ e R 𝜔 ( 𝑋 ) · 𝑂 ( 𝜔 ( 𝑛 ) − / ) ≤ R ( 𝑋 ) · 𝑂 ( 𝜔 ( 𝑛 ) − / ) = 𝑂 ( 𝑛 / 𝜔 ( 𝑛 ) − / ) . The lemma then follows from the fact that 𝐼 ( 𝑋 𝑇 ) = Ω ( 𝑛 ) with probability 1 − 𝑜 ( ) by Lemma 2.7.To establish (9), let H 𝑡 be the event that 𝐴 ( 𝑋 𝑡 ) ∈ (cid:2) 𝑛 / 𝑞 − 𝛿𝑛 / , 𝑛 / 𝑞 + 𝛿𝑛 / (cid:3) , where 𝛿 > H 𝑡 , Lemma 2.15 implies that with Ω ( ) probability e R 𝜔 ( 𝑋 𝑡 + ) ≤ (cid:18) − 𝑞 (cid:19) e R 𝜔 ( 𝑋 𝑡 ) + e R 𝜔 (cid:16) 𝐺 (cid:16) 𝐴 ( 𝑋 𝑡 ) , 𝑞𝑛 (cid:17) (cid:17) = (cid:18) − 𝑞 (cid:19) e R 𝜔 ( 𝑋 𝑡 ) + 𝑂 (cid:18) 𝑛 / 𝜔 ( 𝑛 ) / (cid:19) ≤ (cid:18) − 𝑞 (cid:19) e R 𝜔 ( 𝑋 𝑡 ) , where the last inequality holds for sufficiently large 𝐷 since e R 𝜔 ( 𝑋 𝑡 ) > 𝐷𝑛 / 𝜔 ( 𝑛 ) − / . For suitable constant 𝛼 ∈ ( , ) we have obtained: Pr [ e R 𝜔 ( 𝑋 𝑡 + ) ≤ ( − 𝛼 ) e R 𝜔 ( 𝑋 𝑡 ) | H 𝑡 ] = Ω ( ) . By Hoeffding’s inequality note that Pr [H 𝑡 ] = Ω ( ) since R ( 𝑋 𝑡 ) = 𝑂 ( 𝑛 / ) . Thus, with probability Ω ( ) , e R 𝜔 ( 𝑋 𝑡 + ) ≤ ( − 𝛼 ) e R 𝜔 ( 𝑋 𝑡 ) .Similarly, for large enough 𝐶 , Lemma 2.14 impliesE [R ( 𝑋 𝑡 + ) | H 𝑡 ] ≤ (cid:18) − 𝑞 (cid:19) R ( 𝑋 𝑡 ) + 𝐸 h R (cid:16) 𝐺 (cid:16) 𝐴 ( 𝑋 𝑡 ) , 𝑞𝑛 (cid:17) (cid:17) i = (cid:18) − 𝑞 (cid:19) R ( 𝑋 𝑡 ) + 𝑂 (cid:16) 𝑛 / (cid:17) ≤ (cid:18) − 𝑞 (cid:19) 𝐶𝑛 / , and so by Markov’s inequality R ( 𝑋 𝑡 + ) ≤ 𝐶𝑛 / with probability Ω ( ) . (cid:3) .1 Using the Local Limit Theorem: Proof of Lemma 4.2 The following corollary of Lemma 2.16 is useful for proofs of Lemma 4.2, 4.3 and 4.4.
Fact 4.5.
Let 𝑚 ∈ (𝑛/𝑞, 𝑛]. Let 𝑔 be an increasing positive function such that 𝑔(𝑛) = 𝑜(𝑚/), 𝑔(𝑛) → ∞ and |𝜆| ≤ 𝑔(𝑚). Let 𝐻 be distributed according to a 𝐺(𝑚, (1 + 𝜆𝑚−/)/𝑚) random graph. There exists a constant 𝑏 > 0 such that, with probability at least 1 − 𝑂(𝑔(𝑛)−), 𝑁_𝑘(𝐻, 𝑔) ≥ 𝑏𝑔(𝑛) · 𝑘− for all 𝑘 ≥ 1 such that 𝑛/𝑔(𝑛)−𝑘 → ∞.
Proof of Fact 4.5. Lemma 2.16 implies that for a suitable constant 𝑏 > 0,
Pr[𝑁_𝑘(𝑋_{𝑡+1}, 𝑔) < 𝑏𝑔(𝑛) · 𝑘−] = 𝑂(𝑔(𝑛)− · 𝑘−),
for any 𝑘 ≥ 1 such that 𝑔(𝑛)𝑘 = 𝑜(𝑚/). Observe that Σ_{𝑘≥1} 𝑔(𝑛)− · 𝑘− ≤ Σ_{𝑖≥1} 𝑔(𝑛)−𝑖− = 𝑂(𝑔(𝑛)−). Hence, a union bound over 𝑘, i.e., over the intervals I_𝑘(𝑔), implies that, with probability at least 1 − 𝑂(𝑔(𝑛)−), 𝑁_𝑘(𝑋_{𝑡+1}, 𝑔) ≥ 𝑏𝑔(𝑛) · 𝑘− for all 𝑘 ≥ 1 such that 𝑛/𝑔(𝑛)−𝑘 → ∞, as claimed. □
The two-step coupling in Lemma 4.2 ensures that the two configurations agree in all the large components. The first step of the coupling provides 𝑋₁ and 𝑌₁ with additional structural properties. In the second step, we are able to use the local limit theorem to define a coupling of the activation step such that 𝐴(𝑋₁) = 𝐴(𝑌₁). When 𝐴(𝑋_𝑡) = 𝐴(𝑌_𝑡), we create an arbitrary bijective map 𝜑 between the activated vertices of 𝑋_𝑡 and the activated vertices of 𝑌_𝑡. Then, the new state of any edge between two activated vertices 𝑢 and 𝑣 in 𝑋_𝑡 is set to the state of the edge between 𝜑(𝑢) and 𝜑(𝑣). This yields a coupling of the percolation sub-step which results in two identical subgraphs in the activated portions of the configurations. We can now provide the proof of Lemma 4.2. Proof of Lemma 4.2.
First, both {𝑋_𝑡}, {𝑌_𝑡} perform an independent CM step from the initial configurations 𝑋₀, 𝑌₀. We show that 𝑋₁ and 𝑌₁, in addition to preserving the structural properties assumed for 𝑋₀ and 𝑌₀ with probability Ω(1), also have many connected components of different sizes. This fact will be crucial in the design of our coupling.
By assumption R(𝑋₀) = 𝑂(𝑛/), so Hoeffding’s inequality implies that with probability Ω(1), 𝐴(𝑋₀) ∈ [𝑛/𝑞 − 𝑂(𝑛/), 𝑛/𝑞 + 𝑂(𝑛/)]. Letting ℓ = 𝐴(𝑋₀), with at least constant probability the percolation step is distributed as a 𝐺(ℓ, (1 + 𝜆ℓ−/)/ℓ) random graph, where |𝜆| = 𝑂(1) is a constant. Thus, Lemmas 2.14, 2.15, 2.7, Markov’s inequality and a union bound imply that the events 𝐼(𝑋₁) = Ω(𝑛), R(𝑋₁) = 𝑂(𝑛/) and ẽR_𝜔(𝑋₁) = 𝑂(𝑛/𝜔(𝑛)−/) all occur with probability Ω(1). The same holds for 𝑌₁.
Let 𝑊_𝑋 (resp., 𝑊_𝑌) be the set of components of 𝑋₁ (resp., 𝑌₁) with sizes in the interval [𝑐𝑛/𝜔(𝑛), 𝑐𝑛/𝜔(𝑛)], where 𝑐 > 0 is a constant. Then, for constants 𝛿 independent of 𝑛,
Pr[|𝑊_𝑋|, |𝑊_𝑌| ≥ 𝛿𝑛/(𝑐𝑛/𝜔(𝑛))/ = Ω(𝜔(𝑛))] ≥ 1 − 𝛿(𝑐𝑛/𝜔(𝑛))//𝑛 = 1 − 𝑂(1/𝜔(𝑛)).
Let us assume that 𝑋₁ and 𝑌₁ do have all these properties. Let 𝐶_𝑋 and 𝐶_𝑌 be the sets of components in 𝑋₁ and 𝑌₁, respectively, with sizes larger than 𝐵_𝜔. Since R(𝑋₁) = 𝑂(𝑛/), the total number of components in 𝐶_𝑋 is 𝑁 = 𝑂(𝜔(𝑛)); moreover, the Cauchy–Schwarz inequality implies that the number of vertices in 𝐶_𝑋 is at most
Σ_{𝑗 : 𝐿_𝑗(𝑋₁) ∈ 𝐶_𝑋} 𝐿_𝑗(𝑋₁) ≤ √𝑁 · √(Σ_{𝑗 : 𝐿_𝑗(𝑋₁) ∈ 𝐶_𝑋} 𝐿_𝑗(𝑋₁)²) ≤ √𝑁 √𝑅(𝑋₁) = 𝑂(𝑛/𝜔(𝑛));
the same holds for 𝐶_𝑌. Without loss of generality, let us assume that |𝐶_𝑋| ≥ |𝐶_𝑌|, where |·| denotes the number of vertices in the respective components. Let 𝛤 = {𝐶 ⊂ 𝑊_𝑌 : |𝐶_𝑌 ∪ 𝐶| ≥ |𝐶_𝑋|}, and let 𝐶_min = arg min_{𝐶 ∈ 𝛤} |𝐶_𝑌 ∪ 𝐶|. In words, 𝐶_min is the smallest subset 𝐶 of 𝑊_𝑌 that ensures that |𝐶_𝑌 ∪ 𝐶| ≥ |𝐶_𝑋|. Since every component in 𝑊_𝑌 has size at least 𝑐𝑛/𝜔(𝑛)− and |𝑊_𝑌| = Ω(𝜔(𝑛)), the number of vertices in 𝑊_𝑌 is Ω(𝑛/𝜔(𝑛)) and so 𝛤 ≠ ∅. Moreover, the number of components in 𝐶_min is 𝑂(𝜔(𝑛)). Let 𝐶′_𝑌 = 𝐶_𝑌 ∪ 𝐶_min. Observe that the number of components in 𝐶′_𝑌 is 𝑂(𝜔(𝑛)) and that 0 ≤ |𝐶′_𝑌| − |𝐶_𝑋| ≤ 𝑐𝑛/𝜔(𝑛)−.
Suppose now that in the second CM step all the components in 𝐶_𝑋 and 𝐶′_𝑌 are activated simultaneously. If this is the case, then the difference in the number of active vertices is 𝑑 ≤ 𝑐𝑛/𝜔(𝑛)−, and we will use a local limit theorem (i.e., Theorem 2.2) to argue that there is a coupling of the activation of the remaining components in 𝑋₁ and 𝑌₁ such that the total number of active vertices in both copies is the same with probability Ω(1). Since all the components in 𝐶_𝑋 and 𝐶′_𝑌 are activated with probability exp(−𝑂(𝜔(𝑛))), the overall success probability of the coupling will be exp(−𝑂(𝜔(𝑛))).
Lemma 2.16 with 𝐵 = 𝜗𝑛/𝜔(𝑛)− implies that for a constant 𝑏 > 0, 𝑁(𝑋₁, 𝜔) ≥ 𝑏𝜔(𝑛)/ with probability at least 1 − 𝑂(𝜔(𝑛)−/). Moreover, by Fact 4.5, for all positive integers 𝑘 such that 𝑐𝑛/𝜔(𝑛)−𝑘 → ∞, there exists a constant 𝑏 > 0 such that with probability 1 − 𝑂(𝜔(𝑛)−), 𝑁_𝑘(𝑋₁, 𝜔) ≥ 𝑏𝜔(𝑛) · 𝑘−. Let H be the event that 𝑁_𝑘(𝑋₁, 𝜔) satisfies these bounds for all non-negative integers 𝑘; H occurs w.h.p. Now, let 𝑥₁, 𝑥₂, . . . , 𝑥_𝑚 be the sizes of the components of 𝑋₁ that are not in 𝐶_𝑋, in increasing order.
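The set 𝐶_min defined above can be computed greedily: accumulate components of 𝑊_𝑌 until the deficit |𝐶_𝑋| − |𝐶_𝑌| is covered, at which point the overshoot is at most the size of a single component of 𝑊_𝑌 — exactly the bound 0 ≤ |𝐶′_𝑌| − |𝐶_𝑋| ≤ 𝑐𝑛/𝜔(𝑛)− used in the proof. The following minimal Python sketch illustrates this; all sizes and numbers are hypothetical placeholders, not values from the paper.

import random

def pick_c_min(w_y_sizes, deficit):
    # Greedily add components of W_Y until their total volume covers the
    # deficit |C_X| - |C_Y|; the overshoot is then at most the size of
    # the last component added.
    chosen, total = [], 0
    for s in sorted(w_y_sizes, reverse=True):
        if total >= deficit:
            break
        chosen.append(s)
        total += s
    return chosen, total

random.seed(0)
w_y_sizes = [random.randint(900, 1100) for _ in range(200)]  # hypothetical sizes
deficit = 17_350                                             # hypothetical |C_X| - |C_Y|
chosen, total = pick_c_min(w_y_sizes, deficit)
assert 0 <= total - deficit <= max(w_y_sizes)
print(len(chosen), total - deficit)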
Letˆ 𝐴 ( 𝑋 ) be the random variable corresponding to the number of active vertices from these components.Since, the number of isolated vertices in 𝑋 is Ω ( 𝑛 ) , 𝑚 = Θ ( 𝑛 ) and so 𝑥 𝑚 = 𝑂 ( 𝑚 / 𝜔 ( 𝑚 ) − ) , Í 𝑚𝑖 = 𝑥 𝑖 = 𝑂 ( 𝑚 / 𝜔 ( 𝑚 ) − / ) and 𝑥 𝑖 = 𝑖 ≤ 𝛼𝑚 , where 𝛼 ∈ ( , ) is a positive constant.Let 𝜇 𝑋 = E [ ˆ 𝐴 ( 𝑋 )] = 𝑞 − Í 𝑚𝑖 = 𝑥 𝑖 and let 𝜎 𝑋 = Var ( ˆ 𝐴 ( 𝑋 )) = 𝑞 − ( − 𝑞 − ) Í 𝑚𝑖 = 𝑥 𝑖 . Thus, 𝜎 𝑋 = Θ ( 𝑚 / 𝜔 ( 𝑚 ) − / ) since 𝑂 𝑚 / p 𝜔 ( 𝑚 ) ! = 𝑚 Õ 𝑖 = 𝑥 𝑖 ≥ Õ 𝑥 𝑖 ∈ 𝐼 𝜗 𝑛 / 𝜔 ( 𝑛 ) = 𝑏 𝜗 𝑛 / 𝜔 ( 𝑛 ) 𝜔 ( 𝑛 ) / = Ω 𝑚 / p 𝜔 ( 𝑚 ) ! . Then, assuming the event H occurs, Theorem 2.2 implies that for any 𝑎 ∈ [ 𝜇 𝑋 − 𝜎 𝑋 , 𝜇 𝑋 + 𝜎 𝑋 ] Pr [ ˆ 𝐴 ( 𝑋 ) = 𝑎 ] = Pr hÕ 𝑚𝑖 = 𝑥 𝑖 = 𝑎 i = Ω (cid:0) 𝜎 − 𝑋 (cid:1) . Similarly, we get Pr [ ˆ 𝐴 ( 𝑌 ) = 𝑎 ] = Ω ( 𝜎 − 𝑌 ) for any 𝑎 ∈ [ 𝜇 𝑌 − 𝜎 𝑌 , 𝜇 𝑌 + 𝜎 𝑌 ] , with ˆ 𝐴 ( 𝑌 ) , 𝜇 𝑌 and 𝜎 𝑌 definedanalogously. Note that 𝜇 𝑋 − 𝜇 𝑌 = 𝑂 ( 𝑛 / 𝜔 ( 𝑛 ) − ) and 𝜎 𝑋 , 𝜎 𝑌 = Θ ( 𝑛 / 𝜔 ( 𝑛 ) − / ) . Without loss of generality,suppose 𝜎 𝑋 < 𝜎 𝑌 . Then for any 𝑎 ∈ [ 𝜇 𝑋 − 𝜎 𝑋 / , 𝜇 𝑌 + 𝜎 𝑋 / ] and 𝑑 = 𝑂 ( 𝑛 / 𝜔 ( 𝑛 ) − ) , we havemin (cid:8) Pr [ ˆ 𝐴 ( 𝑋 ) = 𝑎 ] , Pr [ ˆ 𝐴 ( 𝑌 ) = 𝑎 − 𝑑 ] (cid:9) = min (cid:8) Ω ( 𝜎 − 𝑋 ) , Ω ( 𝜎 − 𝑌 ) (cid:9) = Ω ( 𝜎 − 𝑌 ) . Hence, we can couple ( ˆ 𝐴 ( 𝑋 ) , ˆ 𝐴 ( 𝑌 )) so thatPr [ ˆ 𝐴 ( 𝑋 ) = 𝑎, ˆ 𝐴 ( 𝑌 ) = 𝑎 − 𝑑 ] = Ω ( 𝜎 − 𝑌 ) , ∀ 𝑎 ∈ [ 𝜇 𝑋 − 𝜎 𝑋 / , 𝜇 𝑌 + 𝜎 𝑋 / ] . 𝐴 ( 𝑋 ) and ˆ 𝐴 ( 𝑌 ) such thatPr [ ˆ 𝐴 ( 𝑋 ) − ˆ 𝐴 ( 𝑌 ) = 𝑑 ] = Ω ( 𝜎 𝑋 / 𝜎 𝑌 ) = Ω ( ) . Putting all the above probability bounds together, we deduce that 𝐴 ( 𝑋 ) = 𝐴 ( 𝑌 ) with probability atleast exp (− 𝑂 ( 𝜔 ( 𝑛 ) )) . If this is the case, we can couple the edge re-sampling step bijectively such that S 𝜔 ( 𝑋 ) = S 𝜔 ( 𝑌 ) .It remains for us to guarantee the other desired structural properties of 𝑋 and 𝑌 . By Hoeffding’sinequality, 𝐴 ( 𝑋 ) ≤ 𝑛𝑞 + ( 𝑞 − ) | 𝐶 𝑋 | 𝑞 + 𝑂 ( 𝑛 / ) with probability Ω ( ) . Letting ℓ ′ = 𝐴 ( 𝑋 ) , with probability Ω ( ) the percolation step is distributed asa 𝐹 ∼ 𝐺 (cid:16) ℓ ′ , + 𝜆 ′ ℓ ′− / ℓ ′ (cid:17) random graph, where 0 < 𝜆 ′ = 𝑂 ( 𝜔 ( 𝑛 )) since | 𝐶 𝑋 | = 𝑂 ( 𝑛 / 𝜔 ( 𝑛 )) . Since thecomponents of 𝐹 contribute to both 𝑋 and 𝑌 , Fact 4.5 implies that with probability at least 1 − 𝜔 ( 𝑛 ) wehave ˆ 𝑁 𝑘 ( 𝑋 , 𝑌 , 𝜔 ( 𝑛 )) = Ω ( 𝜔 ( 𝑛 ) · 𝑘 − ) for all 𝑘 ≥ 𝑛 / 𝜔 ( 𝑛 ) − 𝑘 → ∞ . Moreover, Lemmas 2.9,2.15 and Markov’s inequality imply that R ( 𝑋 ) = 𝑂 ( 𝑛 / ) and e R 𝜔 ( 𝑋 ) = 𝑂 ( 𝑛 / 𝜔 ( 𝑛 ) − / ) with probability Ω ( ) , and by Lemma 2.7, 𝐼 ( 𝑋 ) = Ω ( 𝑛 ) with probability Ω ( ) . The same holds for 𝑌 .Finally, we derive a bound for 𝐿 ( 𝑋 ) . Let 𝜀 : = 𝜆 ′ ℓ ′− / . If 𝜀 ℓ ′ >
1, then Corollary 2.11 implies that for a constant 𝑐 > 0, 𝐿(𝐹) = 𝑂(𝜀ℓ′) = 𝑂(𝑛/𝜔(𝑛)) with probability 1 − 𝑂(exp(−𝑐𝜀ℓ′)) = Ω(1). Otherwise, 𝜆′ < 1, and by Lemma 2.14 and Markov’s inequality, 𝐿(𝐹) = 𝑂(𝑛/) with at least constant probability. We also know that the largest inactivated component in 𝑋 has size less than 𝑛/𝜔(𝑛)−, so 𝐿(𝑋) = 𝑂(𝑛/𝜔(𝑛)) with probability Ω(1). A similar bound holds for 𝐿(𝑌), and the result follows. □
In the previous section, we designed a coupling argument to ensure that the largest components of both configurations have the same size. For this, we needed to relax our constraint on the size of the largest component of the configurations. In this section we prove Lemma 4.3, which ensures that after 𝑂(log 𝜔(𝑛)) steps the largest components of each configuration again have size 𝑂(𝑛/). The following lemma is the core of the proof of Lemma 4.3; it may be viewed as a generalization of the coupling from the proof of Lemma 4.2 using the local limit theorem from Section 2.2. We recall some notation from the beginning of the section. Given two random-cluster configurations 𝑋_𝑡 and 𝑌_𝑡, 𝑊_𝑡 is a maximal matching between the components of 𝑋_𝑡 and 𝑌_𝑡 that only matches components of equal size to each other. We use 𝑀(𝑋_𝑡), 𝑀(𝑌_𝑡) for the components in 𝑊_𝑡 from 𝑋_𝑡, 𝑌_𝑡, respectively, 𝐷(𝑋_𝑡), 𝐷(𝑌_𝑡) for the complements of 𝑀(𝑋_𝑡), 𝑀(𝑌_𝑡), and 𝑍_𝑡 = Σ_{C ∈ 𝐷(𝑋_𝑡) ∪ 𝐷(𝑌_𝑡)} |C|. Lemma 4.6.
There exists a coupling of the activation step of the CM dynamics such that 𝐴 ( 𝑋 𝑡 ) = 𝐴 ( 𝑌 𝑡 ) withat least Ω (cid:16) 𝜔 ( 𝑛 ) (cid:17) probability, provided 𝑋 𝑡 and 𝑌 𝑡 are random-cluster configurations satisfying1. S 𝜔 ( 𝑋 𝑡 ) = S 𝜔 ( 𝑌 𝑡 ) ;2. 𝑍 𝑡 = 𝑂 (cid:16) 𝑛 / 𝜔 ( 𝑛 ) / (cid:17) ;3. ˆ 𝑁 𝑘 ( 𝑋 𝑡 , 𝑌 𝑡 , 𝜔 ( 𝑛 )) = Ω (cid:16) 𝜔 ( 𝑛 ) · 𝑘 − (cid:17) for all 𝑘 ≥ such that 𝑛 / 𝜔 ( 𝑛 ) − 𝑘 → ∞ ;4. 𝐼 ( 𝑋 𝑡 ) , 𝐼 ( 𝑌 𝑡 ) = Ω ( 𝑛 ) . roof. The activation coupling has two parts. First we use the maximal matching 𝑊 𝑡 to couple the activa-tion of a subset of the components in 𝑀 ( 𝑋 𝑡 ) and 𝑀 ( 𝑌 𝑡 ) . Specifically, let ℓ be defined as in Theorem 2.2; forall 𝑘 ∈ [ , ℓ ] , we exclude Θ ( 𝜔 ( 𝑛 ) · 𝑘 − ) pairs of components of size in the interval I 𝑘 ( 𝜔 ) and we exclude Θ ( 𝑛 ) pairs of matched isolated vertices. (These components exist by assumptions 3 and 4.) All other pairsof components matched by 𝑊 𝑡 are jointly activated (or not). Hence, the number of vertices activated from 𝑋 𝑡 in this first part of the coupling is the same as that from 𝑌 𝑡 .Let C ( 𝑋 𝑡 ) and C ( 𝑌 𝑡 ) denote the sets containing the components in 𝑋 𝑡 and components in 𝑌 𝑡 not con-sidered to be activated in the first step of the coupling. This includes all the components from 𝐷 ( 𝑋 𝑡 ) and 𝐷 ( 𝑌 𝑡 ) , and all the components from 𝑀 ( 𝑋 𝑡 ) and 𝑀 ( 𝑌 𝑡 ) excluded in the first part of the coupling. Let 𝐴 ′ ( 𝑋 𝑡 ) and 𝐴 ′ ( 𝑌 𝑡 ) denote the number of activated vertices from C ( 𝑋 𝑡 ) and C ( 𝑌 𝑡 ) respectively. The second part isa coupling of the activation step in a way such thatPr [ 𝐴 ′ ( 𝑋 𝑡 ) = 𝐴 ′ ( 𝑌 𝑡 )] = Ω ( 𝜔 ( 𝑛 ) − ) . Let 𝑚 𝑥 : = |C ( 𝑋 𝑡 ) | = Θ ( 𝑛 ) , and similarly for 𝑚 𝑦 : = |C ( 𝑌 𝑡 ) | . Let C ≤ · · · ≤ C 𝑚 𝑥 (resp., C ′ ≤ · · · ≤C ′ 𝑚 𝑦 ) be sizes of components in C ( 𝑋 𝑡 ) (resp., C ( 𝑌 𝑡 ) ) in ascending order. For all 𝑖 ≤ 𝑚 𝑥 , let X 𝑖 be arandom variable that equals to C 𝑖 with probability 1 / 𝑞 and 0 otherwise, which corresponds to the numberof activated vertices from 𝑖 th component in C ( 𝑋 𝑡 ) . Note that X , . . . , X 𝑚 𝑥 are independent. We check that X , . . . , X 𝑚 𝑥 satisfy all other conditions of Theorem 2.2.Assumption S 𝜔 ( 𝑋 𝑡 ) = S 𝜔 ( 𝑌 𝑡 ) and the first part of the activation ensure that C 𝑚 𝑥 ≤ 𝐵 𝜔 = 𝑂 (cid:16) 𝑛 / 𝜔 ( 𝑛 ) − (cid:17) = 𝑂 (cid:16) 𝑚 / 𝑥 𝜔 ( 𝑚 𝑥 ) − (cid:17) . Observe also that there exists a constant 𝜌 such that C 𝑖 = 𝑖 ≤ 𝜌𝑚 𝑥 and |{ 𝑖 : C 𝑖 ∈ I 𝑘 ( 𝜔 )}| = Θ (cid:16) 𝜔 ( 𝑛 ) · 𝑘 − (cid:17) for 1 ≤ 𝑘 ≤ ℓ ; lastly, from assumption 𝑍 𝑡 = 𝑂 (cid:16) 𝑛 / 𝜔 ( 𝑛 ) / (cid:17) , we obtain 𝑚 𝑥 Õ 𝑖 = C 𝑖 ≤ 𝑍 𝑡 + 𝑂 ( 𝜌𝑚 𝑥 ) + ℓ Õ 𝑘 = 𝜗𝑛 / 𝜔 ( 𝑛 ) 𝑘 + · 𝑂 (cid:16) 𝜔 ( 𝑛 ) · 𝑘 − (cid:17) = 𝑂 𝑚 / 𝑥 p 𝜔 ( 𝑚 𝑥 ) ! + 𝑂 ℓ Õ 𝑘 = 𝑚 / 𝑥 𝜔 ( 𝑚 𝑥 ) 𝑘 − ! = 𝑂 𝑚 / 𝑥 p 𝜔 ( 𝑚 𝑥 ) ! + 𝑂 ℓ Õ 𝑘 = 𝑚 / 𝑥 𝜔 ( 𝑚 𝑥 ) 𝑘 ! = 𝑂 𝑚 / 𝑥 p 𝜔 ( 𝑚 𝑥 ) ! + 𝑂 𝑚 / 𝑥 𝜔 ( 𝑚 𝑥 ) ! = 𝑂 𝑚 / 𝑥 p 𝜔 ( 𝑚 𝑥 ) ! . (10)Therefore, if 𝜇 𝑥 = E (cid:2)Í 𝑚 𝑥 𝑖 = X 𝑖 (cid:3) and 𝜎 𝑥 = 𝑉 𝑎𝑟 (cid:0)Í 𝑚 𝑥 𝑖 = X 𝑖 (cid:1) , Theorem 2.2 implies that for any 𝑥 ∈ [ 𝜇 𝑥 − 𝜎 𝑥 , 𝜇 𝑥 + 𝜎 𝑥 ] , Pr [ 𝐴 ′ ( 𝑋 𝑡 ) = 𝑥 ] = Pr " 𝑚 𝑥 Õ 𝑖 = X 𝑖 = 𝑥 = √ 𝜋𝜎 𝑥 exp (cid:18) − ( 𝑥 − 𝜇 𝑥 ) 𝜎 𝑥 (cid:19) + 𝑜 (cid:18) 𝜎 𝑥 (cid:19) = Ω (cid:18) 𝜎 𝑥 (cid:19) . Similarly, we get that Pr [ 𝐴 ′ ( 𝑌 𝑡 ) = 𝑦 ] = Ω ( 𝜎 − 𝑦 ) for any 𝑦 ∈ [ 𝜇 𝑦 − 𝜎 𝑦 , 𝜇 𝑦 + 𝜎 𝑦 ] , with 𝜇 𝑦 and 𝜎 𝑦 definedanalogously. Without loss of generality, suppose 𝜎 𝑦 ≤ 𝜎 𝑥 . Since 𝜇 𝑥 = 𝜇 𝑦 , for 𝑥 ∈ (cid:2) 𝜇 𝑥 − 𝜎 𝑦 , 𝜇 𝑥 + 𝜎 𝑦 (cid:3) , weobtain min { Pr [ 𝐴 ′ ( 𝑋 𝑡 ) = 𝑥 ] , Pr [ 𝐴 ′ ( 𝑌 𝑡 ) = 𝑥 ]} = Ω (cid:18) 𝜎 𝑥 (cid:19) . 
( 𝐴 ′ ( 𝑋 𝑡 ) , 𝐴 ′ ( 𝑌 𝑡 )) so that Pr [ 𝐴 ′ ( 𝑋 𝑡 ) = 𝐴 ′ ( 𝑌 𝑡 ) = 𝑥 ] = Ω ( 𝜎 − 𝑥 ) for all 𝑥 ∈ [ 𝜇 𝑥 − 𝜎 𝑦 , 𝜇 𝑥 + 𝜎 𝑦 ] . Consequently, under this coupling,Pr [ 𝐴 ′ ( 𝑋 𝑡 ) = 𝐴 ′ ( 𝑌 𝑡 )] = Ω (cid:18) 𝜎 𝑦 𝜎 𝑥 (cid:19) . Since X , . . . , X 𝑚 𝑥 are independent, 𝜎 𝑥 = Θ (cid:0)Í 𝑚 𝑥 𝑖 = C 𝑖 (cid:1) , and similarly 𝜎 𝑦 = Θ (cid:16)Í 𝑚 𝑦 𝑖 = C ′ 𝑖 (cid:17) . Hence, in-equality (10) gives an upper bound for 𝜎 𝑥 ; meanwhile, a lower bound for 𝜎 𝑦 can be obtained by countingcomponents in the largest interval: 𝑚 𝑦 Õ 𝑖 = C ′ 𝑖 ≥ Õ 𝑖 : C ′ 𝑖 ∈I ( 𝜔 ) C ′ 𝑖 ≥ 𝐵𝑛 / 𝜔 ( 𝑛 ) · Θ (cid:0) 𝜔 ( 𝑛 ) (cid:1) = Ω (cid:18) 𝑛 / 𝜔 ( 𝑛 ) (cid:19) . Therefore, Pr [ 𝐴 ′ ( 𝑋 𝑡 ) = 𝐴 ′ ( 𝑌 𝑡 )] = Ω 𝑛 / 𝜔 ( 𝑛 ) / · 𝜔 ( 𝑚 𝑥 ) / 𝑚 / 𝑥 ! = Ω (cid:18) 𝜔 ( 𝑛 ) (cid:19) , as desired. (cid:3) We are now ready to prove Lemma 4.3.
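Before giving that proof, here is a small numerical illustration of the coupling just constructed in Lemma 4.6. The sketch computes the exact distributions of the activation totals 𝐴′(𝑋_𝑡) and 𝐴′(𝑌_𝑡) by dynamic programming and evaluates Σ_𝑥 min{Pr[𝐴′(𝑋_𝑡) = 𝑥], Pr[𝐴′(𝑌_𝑡) = 𝑥]}, the success probability of a maximal coupling of the two totals. The component profiles and the value of 𝑞 are hypothetical placeholders chosen only for illustration.

import numpy as np

def activation_pmf(sizes, q):
    # Exact pmf of the number of activated vertices when each component
    # is activated independently with probability 1/q.
    p = 1.0 / q
    pmf = np.zeros(sum(sizes) + 1)
    pmf[0] = 1.0
    for s in sizes:
        nxt = (1 - p) * pmf
        nxt[s:] += p * pmf[:-s]
        pmf = nxt
    return pmf

q = 1.5
sizes_x = [1] * 300 + [7] * 40 + [19] * 12   # hypothetical profile for X_t
sizes_y = [1] * 300 + [5] * 50 + [23] * 10   # hypothetical profile for Y_t
px, py = activation_pmf(sizes_x, q), activation_pmf(sizes_y, q)
n = max(len(px), len(py))
px = np.pad(px, (0, n - len(px)))
py = np.pad(py, (0, n - len(py)))
# A maximal coupling equates the two totals with probability sum(min(px, py)).
print("Pr[A'(X_t) = A'(Y_t)] =", np.minimum(px, py).sum())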
Proof of Lemma 4.3.
Let 𝐶 be a suitable constant that we choose later. We wish to maintain the followingproperties for all 𝑡 ≤ 𝑇 : = 𝐶 log 𝜔 ( 𝑛 ) :1. S 𝜔 ( 𝑋 𝑡 ) = S 𝜔 ( 𝑌 𝑡 ) ;2. 𝑍 𝑡 = 𝑂 (cid:16) 𝑛 / 𝜔 ( 𝑛 ) / (cid:17) ;3. ˆ 𝑁 𝑘 ( 𝑋 𝑡 , 𝑌 𝑡 , 𝜔 ( 𝑛 )) = Ω (cid:16) 𝜔 ( 𝑛 ) · 𝑘 − (cid:17) for all 𝑘 ≥ 𝑛 / 𝜔 ( 𝑛 ) − 𝑘 → ∞ ;4. 𝐼 ( 𝑋 𝑡 ) , 𝐼 ( 𝑌 𝑡 ) = Ω ( 𝑛 ) ;5. R ( 𝑋 𝑡 ) , R ( 𝑌 𝑡 ) = 𝑂 ( 𝑛 / ) ;6. 𝐿 ( 𝑋 𝑡 ) ≤ 𝛼 𝑡 𝐿 ( 𝑋 ) , 𝐿 ( 𝑌 𝑡 ) ≤ 𝛼 𝑡 𝐿 ( 𝑌 ) for some constant 𝛼 independent of 𝑡 .By assumption, 𝑋 and 𝑌 satisfy these properties. Suppose that 𝑋 𝑡 and 𝑌 𝑡 satisfy these properties atstep 𝑡 ≤ 𝑇 . We show that there exists a one-step coupling of the CM dynamics such that 𝑋 𝑡 + and 𝑌 𝑡 + preserve all six properties with probability Ω (cid:0) 𝜔 ( 𝑛 ) − (cid:1) .We provide the high-level ideas of the proof first. We will crucially exploit the coupling from Lemma4.6. Assuming 𝐴 ( 𝑋 𝑡 ) = 𝐴 ( 𝑌 𝑡 ) , properties 1 and 2 hold immediately at 𝑡 +
1, and properties 3 and 4 can beshown by a “standard” approach used throughout the paper. In addition, we reuse simple arguments fromprevious stages to guarantee properties 5 and 6.Consider first the activation step. By Lemma 4.6, 𝐴 ( 𝑋 𝑡 ) = 𝐴 ( 𝑌 𝑡 ) with probability at least Ω ( 𝜔 ( 𝑛 ) − ) . Ifthe number of vertices in the percolation is the same in both copies, we can couple the edge re-sampling sothat the updated part of the configuration is identical in both copies. In other words, all new componentscreated in this step are automatically contained in the component matching 𝑊 𝑡 + ; this includes all newcomponents whose sizes are greater than 𝐵 𝜔 . Since none of the new components contributes to 𝑍 𝑡 + , weobtain 𝑍 𝑡 + ≤ 𝑍 𝑡 = 𝑂 (cid:16) 𝑛 / 𝜔 ( 𝑛 ) / (cid:17) . Therefore, 𝐴 ( 𝑋 𝑡 ) = 𝐴 ( 𝑌 𝑡 ) immediately implies properties 1 and 2 at time 𝑡 +
1. With probability 1/𝑞, the largest components of 𝑋_𝑡 and 𝑌_𝑡 are activated simultaneously. Suppose that this is the case. By Hoeffding’s inequality, for constant 𝐾 >
0, we havePr h | 𝐴 ( 𝑋 𝑡 ) − E [ 𝐴 ( 𝑋 𝑡 )] | ≥ 𝐾𝑛 / i ≤ exp (cid:18) − 𝐾 𝑛 / R ( 𝑋 𝑡 ) (cid:19) . Property 5 and the observation that E [ 𝐴 ( 𝑋 𝑡 )] = 𝐿 ( 𝑋 𝑡 ) + 𝑛 − 𝐿 ( 𝑋 𝑡 ) 𝑞 imply thatPr (cid:20)(cid:12)(cid:12)(cid:12)(cid:12) 𝐴 ( 𝑋 𝑡 ) − 𝐿 ( 𝑋 𝑡 ) − 𝑛 − 𝐿 ( 𝑋 𝑡 ) 𝑞 (cid:12)(cid:12)(cid:12)(cid:12) ≥ 𝐾𝑛 / (cid:21) = 𝑂 ( ) . By noting that 𝐿 ( 𝑌 ) , 𝐿 ( 𝑋 ) ≤ 𝑛 / 𝜔 ( 𝑛 ) , property 6 implies thatPr (cid:20) 𝐴 ( 𝑋 𝑡 ) · 𝑞𝑛 ≤ + ( 𝑞 − ) 𝜔 ( 𝑛 ) + 𝐾𝑞𝑛 / (cid:21) = Ω ( ) . (11)We denote 𝐴 ( 𝑋 𝑡 ) = 𝐴 ( 𝑌 𝑡 ) by 𝑚 . By inequality (11), with at least constant probability, the random graphfor both chains is 𝐻 ∼ 𝐺 (cid:16) 𝑚, + 𝜆𝑚 − / 𝑚 (cid:17) , where 𝜆 ≤ 𝜔 ( 𝑚 ) . Let us assume that is the case. Fact 4.5 ensures thatthere exists a constant 𝑏 > − 𝑂 ( 𝜔 ( 𝑛 ) − ) , 𝑁 𝑘 ( 𝐻, 𝜔 ( 𝑛 )) ≥ 𝑏𝜔 ( 𝑛 ) · 𝑘 − for all 𝑘 ≥ 𝑛 / 𝜔 ( 𝑛 ) − 𝑘 → ∞ . Since components in 𝐻 are simultaneously added to both 𝑋 𝑡 + and 𝑌 𝑡 + , property 3 is satisfied. Moreover, Lemma 2.7 implies that with high probability Ω ( 𝑛 ) isolatedvertices are added to 𝑋 𝑡 + and 𝑌 𝑡 + , and thus property 4 is satisfied at time 𝑡 + h R ( 𝑋 𝑡 + ) = 𝑂 ( 𝑛 / ) i = Ω ( ) ;By Lemma 3.3, there exists 𝛼 < 𝐿 ( 𝑋 𝑡 + ) ≤ max { 𝛼𝐿 ( 𝑋 𝑡 ) , 𝐿 ( 𝑋 𝑡 )} , where 𝛼 is independent of 𝑡 and 𝑛 . Potentially, property 6 may not hold when 𝛼𝐿 ( 𝑋 𝑡 ) < 𝐿 ( 𝑋 𝑡 + ) ≤ 𝐿 ( 𝑋 𝑡 ) = 𝑂 ( 𝑛 / ) , but then we stop at this point. (We will argue that in this case all the desired propertiesare also established shortly.) Hence, we suppose otherwise and establish properties 5 and 6 for 𝑋 𝑡 + . Similarbounds hold for 𝑌 𝑡 + .Hence, 𝑋 𝑡 + and 𝑌 𝑡 + have all six properties with probability Ω (cid:0) 𝜔 ( 𝑛 ) − (cid:1) . Inductively, the probabilitythat 𝑋 𝑇 and 𝑌 𝑇 satisfy the six properties is 𝑂 ( 𝜔 ( 𝑛 )) − 𝐶 log 𝜔 ( 𝑛 ) = exp (cid:16) log 𝑂 ( 𝜔 ( 𝑛 )) − 𝐶 log 𝜔 ( 𝑛 ) (cid:17) = exp (cid:0) − 𝑂 (cid:0) ( log 𝜔 ( 𝑛 )) (cid:1) (cid:1) . Suppose 𝑋 𝑇 and 𝑌 𝑇 have the six properties. By choosing 𝐶 > / log 𝛼 , properties 5 and 6 imply R ( 𝑋 𝑇 ) = 𝐿 ( 𝑋 𝑇 ) + R ( 𝑋 𝑇 ) ≤ (cid:16) 𝛼 𝐶 log 𝜔 ( 𝑛 ) 𝑛 / 𝜔 ( 𝑛 ) (cid:17) + 𝑂 ( 𝑛 / ) = 𝑂 ( 𝑛 / ) , and R ( 𝑌 𝑇 ) = 𝑂 ( 𝑛 / ) . While the lemma almost follows from these properties, notice that property 3 doesnot match the desired bounds on the components in the lemma statement. To fix this issue, we performone additional step of the coupling.Consider the activation step at 𝑇 . Assume again 𝐴 ( 𝑋 𝑇 ) = 𝐴 ( 𝑌 𝑇 ) = : 𝑚 ′ . By Hoeffding’s inequality, forsome constant 𝐾 ′ , we obtainPr (cid:20)(cid:12)(cid:12)(cid:12) 𝑚 ′ · 𝑞𝑛 − (cid:12)(cid:12)(cid:12) > 𝐾 ′ 𝑛 / (cid:21) = Pr (cid:20)(cid:12)(cid:12)(cid:12)(cid:12) 𝑚 ′ − 𝑛𝑞 (cid:12)(cid:12)(cid:12)(cid:12) > 𝐾 ′ 𝑛 / (cid:21) ≤ exp (cid:18) − 𝐾 ′ 𝑛 / R ( 𝑋 𝑇 ) (cid:19) = 𝑂 ( ) . (12)26et 𝜆 ′ : = ( 𝑚 ′ 𝑞𝑛 − − ) · 𝑚 ′ / . Inequality (12) implies with at least constant probability the random graphin the percolation step is 𝐻 ′ ∼ 𝐺 (cid:16) 𝑚 ′ , + 𝜆 ′ 𝑚 ′− / 𝑚 ′ (cid:17) , where 𝜆 ′ ≤ 𝐾 ′ and 𝑚 ′ ∈ ( 𝑛 / 𝑞, 𝑛 ) . If so, Fact 4.5 ensureswith high probability ˆ 𝑁 𝑘 ( 𝑋 𝑇 + , 𝑌 𝑇 + , 𝜔 ( 𝑛 ) / ) = Ω (cid:16) 𝜔 ( 𝑛 ) · 𝑘 − (cid:17) for all 𝑘 ≥ 𝑛 / 𝜔 ( 𝑛 ) − 𝑘 − → ∞ .By the preceding argument, with 𝜔 ( 𝑛 ) − probability, the six properties are still valid at step 𝑇 + 𝑁 𝑘 ( 𝑋 𝑇 + , 𝑌 𝑇 + , 𝜔 ( 𝑛 ) / ) = Ω ( 𝜔 ( 𝑛 ) · 𝑘 − ) for all 𝑘 ≥ 𝑛 / 𝜔 ( 𝑛 ) − 𝑘 − → ∞ . (cid:3) Our starting point are two configurations with the same “large” component structure. That is, S 𝜔 ( 𝑋 ) = S 𝜔 ( 𝑌 ) . 
We use the maximal matching 𝑊₀ to couple the activation of the large components in 𝑋₀ and 𝑌₀. The small components not matched by 𝑊₀, i.e., those counted in 𝑍₀, are then matched independently. This creates a discrepancy D₀ between the numbers of active vertices from the two copies. We note that Var(D₀) = Θ(𝑍₀) = Θ(𝑛/𝜔(𝑛)−/); hence D₀ ≤ 𝑛/𝜔(𝑛)−/ w.h.p. To fix this discrepancy, we use the small components matched by 𝑊₀. Specifically, the assumptions in Lemma 4.4 ensure that the conditions of Theorem 2.4 are satisfied, and consequently there is a coupling of the activation of the small components so that the difference in the number of activated vertices from the small components of each copy is exactly D₀ with probability Ω(1). We would need to repeat this process until 𝑍_𝑡 =
0; this takes 𝑂(log 𝑛) steps since 𝑍_𝑡 ≈ (1 − 1/𝑞)^𝑡 𝑍₀. However, there are a few complications. First, the initial assumptions on the component structure of the configurations are not preserved for this many steps w.h.p., so we need to relax the requirements as the process evolves. This is in turn possible because the discrepancy D_𝑡 decreases with each step, which also implies that the probability of success of the coupling increases in each step because of the diminishing D_𝑡. We first introduce some notation that will be useful in the proof of Lemma 4.4. Let 𝑆(𝑋₀) = ∅, and given 𝑆(𝑋_𝑡), 𝑆(𝑋_{𝑡+1}) is obtained as follows:
(i) initially, 𝑆(𝑋_{𝑡+1}) = 𝑆(𝑋_𝑡);
(ii) every component in 𝑆(𝑋_𝑡) activated by the CM dynamics at time 𝑡 is removed from 𝑆(𝑋_{𝑡+1}); and
(iii) the largest new component (breaking ties arbitrarily) is added to 𝑆(𝑋_{𝑡+1}).
Let C(𝑋_𝑡) denote the set of connected components of 𝑋_𝑡 and note that 𝑆(𝑋_𝑡) is a subset of C(𝑋_𝑡); we use |𝑆(𝑋_𝑡)| to denote the total number of vertices of the components in 𝑆(𝑋_𝑡). Finally, let 𝑄(𝑋_𝑡) = Σ_{C ∈ C(𝑋_𝑡) \ 𝑆(𝑋_𝑡)} |C|. In the proof of Lemma 4.4, we use the following lemmas.
Lemma 4.7.
Let 𝑟 be an increasing positive function such that 𝑟(𝑛) = 𝑜(𝑛/) and let 𝑐 > 0 be a sufficiently large constant. Suppose |𝑆(𝑋_𝑡)| ≤ 𝑐𝑡𝑛/𝑟(𝑛), 𝑄(𝑋_𝑡) ≤ 𝑡𝑛/𝑟(𝑛) + 𝑂(𝑛/) and 𝑡 ≤ 𝑟(𝑛)/log 𝑟(𝑛). Then, with probability at least 1 − 𝑂(𝑟(𝑛)−), |𝑆(𝑋_{𝑡+1})| ≤ 𝑐(𝑡 + 1)𝑛/𝑟(𝑛) and 𝑄(𝑋_{𝑡+1}) ≤ (𝑡 + 1)𝑛/𝑟(𝑛) + 𝑂(𝑛/).
Lemma 4.8. Let 𝑓 be a positive function such that 𝑓(𝑛) = 𝑜(𝑛/) and 𝑓(𝑛) → ∞. Suppose that R(𝑋_𝑡) = 𝑂(𝑛/𝑓(𝑛)(log 𝑓(𝑛))−). Let 𝑚 denote the number of activated vertices in this step, and 𝜆 := (𝑚𝑞/𝑛 − 1) · 𝑚/. With probability 1 − 𝑂(𝑓(𝑛)−), 𝑚 ∈ (𝑛/𝑞, 𝑛) and |𝜆| ≤ 𝑓(𝑛). Lemma 4.9.
Let 𝑔 and ℎ be two increasing positive functions of 𝑛. Assume 𝑔(𝑛) = 𝑜(𝑛/). Let 𝑋_𝑡 and 𝑌_𝑡 be two random-cluster configurations such that N̂_𝑘(𝑋_𝑡, 𝑌_𝑡, 𝑔) ≥ 𝑏𝑔(𝑛) · 𝑘− for some fixed constant 𝑏 > 0 independent of 𝑛 and for all 𝑘 ≥ 1 such that 𝑛/𝑔(𝑛)−𝑘 → ∞. Assume also that 𝑍_𝑡 ≤ 𝐶𝑛/ℎ(𝑛)− for some constant 𝐶 > 0. Lastly, assume 𝐼(𝑋_𝑡), 𝐼(𝑌_𝑡) = Ω(𝑛). Then for every positive function 𝜂 there exists a coupling for the activation step of the components of 𝑋_𝑡 and 𝑌_𝑡 such that
Pr[𝐴(𝑋_𝑡) = 𝐴(𝑌_𝑡)] ≥ 1 − 𝑒^{−𝜂(𝑛)} − √(𝑔(𝑛)𝜂(𝑛)/ℎ(𝑛)) − 𝛿/𝑔(𝑛),
for some constant 𝛿 > 0 independent of 𝑛.
Proof of Lemma 4.4. The coupling has four phases: in phase 1 we will consider 𝑂(log log log log 𝑛) steps of the coupling, 𝑂(log log log 𝑛) steps in phase 2, 𝑂(log log 𝑛) steps in phase 3, and phase 4 consists of 𝑂(log 𝑛) steps. We will keep track of the random variables R(𝑋_𝑡), R(𝑌_𝑡), 𝐼(𝑋_𝑡), 𝐼(𝑌_𝑡), 𝑍_𝑡 and N̂_𝑘(𝑡, 𝑔) for a function 𝑔 we shall carefully choose for each phase, and use these random variables to bound the probabilities of various events. Phase 1.
We set 𝑔(𝑛) = 𝜔(𝑛)/ and ℎ(𝑛) = 𝐾𝜔(𝑛)/, where 𝐾 > 0 is a constant. Let 𝑎 := 1 − 1/𝑞 and let 𝑇₁ :=
12 log 𝑎 ( log log log 𝑛 ) , and we fix 𝑡 < 𝑇 . Suppose we have R ( 𝑋 𝑡 ) , R ( 𝑌 𝑡 ) ≤ 𝐶 𝑛 / , 𝐼 ( 𝑋 𝑡 ) , 𝐼 ( 𝑌 𝑡 ) = Ω ( 𝑛 ) , and ˆ 𝑁 𝑘 ( 𝑡, 𝑔 ) = Ω ( 𝑔 ( 𝑛 ) · 𝑘 − ) for all 𝑘 ≥ 𝑛 / 𝑔 ( 𝑛 ) − 𝑘 → ∞ , where 𝐶 > 𝐾 >
0, we obtain a coupling for the activation of 𝑋 𝑡 and 𝑌 𝑡 such that the same number of vertices are activated in 𝑋 𝑡 and 𝑌 𝑡 , with probability at least1 − 𝑒 − 𝐾 − s 𝐾𝜔 ( 𝑛 ) / 𝐾 𝜔 ( 𝑛 ) / − 𝛿𝜔 ( 𝑛 ) / = Ω ( ) . Suppose 𝐴 ( 𝑋 𝑡 ) = 𝐴 ( 𝑌 𝑡 ) ; then we couple the percolation step so that the components newly generatedin both copies are exactly identical, and we claim the following holds with constant probability:1. R ( 𝑋 𝑡 + ) , R ( 𝑌 𝑡 + ) ≤ 𝐶 𝑛 / ;2. 𝑍 𝑡 + ≤ 𝑎 𝑍 𝑡 ;3. 𝐼 ( 𝑋 𝑡 + ) , 𝐼 ( 𝑌 𝑡 + ) = Ω ( 𝑛 ) ;4. ˆ 𝑁 𝑘 ( 𝑡 + , 𝑔 ) = Ω ( 𝑔 ( 𝑛 ) · 𝑘 − ) for all 𝑘 ≥ 𝑛 / 𝑔 ( 𝑛 ) − 𝑘 → ∞ .First, note that 𝑍 𝑡 + does not possibly increase because the maximal matching 𝑊 𝑡 + can only growunder the coupling if indeed 𝐴 ( 𝑋 𝑡 ) = 𝐴 ( 𝑌 𝑡 ) . Observe that only the inactivated components in 𝑋 𝑡 and 𝑌 𝑡 would contribute to 𝑍 𝑡 + , so on expectation only (cid:16) − 𝑞 (cid:17) fraction of 𝑍 𝑡 survive the activation step eachtime; that is, E [ 𝑍 𝑡 + | 𝑋 𝑡 ] = ( − / 𝑞 ) 𝑍 𝑡 . Markov’s inequality thus impliesPr [ 𝑍 𝑡 + ≥ 𝑎 𝑍 𝑡 | 𝑋 𝑡 ] ≤ − 𝑞 − 𝑞 < . 𝑚 : = 𝐴 ( 𝑋 𝑡 ) , and 𝜆 : = ( 𝑚𝑞 / 𝑛 − ) · 𝑚 / . By Lemma 4.8, 𝑚 ∈ ( 𝑛 / 𝑞, 𝑛 ) and 𝜆 = 𝑂 ( ) with at least constant probability. Let 𝐻 ∼ 𝐺 (cid:16) 𝑚, + 𝜆𝑚 − / 𝑚 (cid:17) . Fact 4.5 im-plies 𝑁 𝑘 ( 𝐻, 𝑔 ) = Ω ( 𝑔 ( 𝑛 ) · 𝑘 − ) for all 𝑘 ≥ 𝑛 / 𝑔 ( 𝑛 ) − 𝑘 → ∞ , with probability at least1 − 𝑂 (cid:0) 𝑔 ( 𝑛 ) − (cid:1) . Moreover, Lemma 2.7 implies that with high probability 𝐼 ( 𝐻 ) = Ω ( 𝑛 ) . Since the perco-lation step is coupled, both 𝑋 𝑡 + and 𝑌 𝑡 + will have all the components in 𝐻 , so we have ˆ 𝑁 𝑘 ( 𝑡 + , 𝑔 ) = Ω ( 𝑔 ( 𝑛 ) · 𝑘 − ) for all 𝑘 ≥ 𝑛 / 𝑔 ( 𝑛 ) − 𝑘 → ∞ , and 𝐼 ( 𝑋 𝑡 + ) , 𝐼 ( 𝑌 𝑡 + ) = Ω ( 𝑛 ) .Finally, assuming that | 𝜆 | = 𝑂 ( ) , we may use a crude method to bound R ( 𝑋 𝑡 + ) and R ( 𝑌 𝑡 + ) . ByLemma 2.14, E [R ( 𝑋 𝑡 + ) | 𝑋 𝑡 ] ≤ (cid:18) − 𝑞 (cid:19) R ( 𝑋 𝑡 ) + E [R ( 𝐻 )] = (cid:18) − 𝑞 (cid:19) R ( 𝑋 𝑡 ) + 𝑂 ( 𝑛 / ) ≤ (cid:18) − 𝑞 (cid:19) 𝐶 𝑛 / , for large enough 𝐶 . It follows from Markov’s inequality that R ( 𝑋 𝑡 + ) ≤ 𝐶 𝑛 / with at least constantprobability. Similarly, R ( 𝑌 𝑡 + ) ≤ 𝐶 𝑛 / with at least constant probability.By assumption, 𝑋 and 𝑌 satisfy these four properties. A union bound implies that at each step ofupdate all four properties are maintained with at least constant probability 𝜚 >
0. Thus, the probability that they can be maintained throughout Phase 1 is at least 𝜚^{𝑇₁} = 𝜚
12 log 𝑎 ( log log log 𝑛 ) = ( log log log 𝑛 ) − log 𝜚 ( / 𝑎 ) . If property 2 holds at the end of Phase 1, we have 𝑍 𝑇 = 𝑂 (cid:18) 𝑛 / ℎ ( 𝑛 ) · 𝑎 − 𝑇 (cid:19) = 𝑂 (cid:18) 𝑛 / ℎ ( 𝑛 ) · 𝑎 −
12 log 𝑎 ( log log log 𝑛 ) (cid:19) = 𝑂 (cid:18) 𝑛 / ( log log log 𝑛 ) (cid:19) . To facilitate discussions in Phase 2, we show that the two copies of chains satisfy one additionalproperty at the end of Phase 1. In particular, there exist a lower bound for the number of componentsin a different set of intervals. We consider the last percolation step in Phase 1. Then, Fact 4.5 with 𝑔 ( 𝑛 ) : = ( log log log 𝑛 · log log log log 𝑛 ) implies ˆ 𝑁 𝑘 ( 𝑇 , 𝑔 ) = Ω ( 𝑔 ( 𝑛 ) · 𝑘 − ) for all 𝑘 ≥ 𝑛 / 𝑔 ( 𝑛 ) − 𝑘 → ∞ , with high probability.Recall 𝑆 ( 𝑋 ) and 𝑄 ( 𝑋 ) defined at the beginning of Section 4.3.1. In Phase 2, 3 and 4, a new elementof the argument is to also control the behavior 𝑆 ( 𝑋 𝑡 ) and 𝑄 ( 𝑋 𝑡 ) . We provide a general result that will beused in the analysis of the three phases: Claim 4.10.
Given positive increasing functions
𝑇 , 𝑔, ℎ and 𝑟 that tend to infinity and satisfy1. 𝑔 ( 𝑛 ) = 𝑜 ( 𝑛 / ) ;2. 𝑇 = 𝑜 ( 𝑔 ) ;3. 𝑟 ( 𝑛 ) = 𝑜 ( 𝑛 / ) ;4. 𝑇 ( 𝑛 ) · 𝑟 ( 𝑛 ) ≤ 𝑔 ( 𝑛 ) / log 𝑔 ( 𝑛 ) ;5. 𝑇 ( 𝑛 ) ≤ 𝑟 ( 𝑛 )/ log 𝑟 ( 𝑛 ) ;6. 𝑔 ( 𝑛 ) log 𝑔 ( 𝑛 ) ≤ ℎ ( 𝑛 ) / .and random-cluster configurations 𝑋 , 𝑌 satisfying . 𝑍 = 𝑂 (cid:16) 𝑛 / ℎ ( 𝑛 ) (cid:17) ;2. | 𝑆 ( 𝑋 ) | , | 𝑆 ( 𝑌 ) | ≤ 𝑛 / 𝑟 ( 𝑛 ) ;3. 𝑄 ( 𝑋 ) , 𝑄 ( 𝑌 ) = 𝑂 ( 𝑛 / 𝑟 ( 𝑛 )) ;4. 𝐼 ( 𝑋 ) , 𝐼 ( 𝑌 ) = Ω ( 𝑛 ) ;5. ˆ 𝑁 𝑘 ( , 𝑔 ) = Ω (cid:16) 𝑔 ( 𝑛 ) · 𝑘 − (cid:17) for all 𝑘 ≥ such that 𝑛 / 𝑔 ( 𝑛 ) − 𝑘 → ∞ .There exists a coupling of CM steps such that after 𝑇 = 𝑇 ( 𝑛 ) steps, with Ω ( ) probability,1. 𝑍 𝑇 = 𝑂 (cid:16) 𝑛 / 𝑎 𝑇 ( 𝑛 ) (cid:17) ;2. | 𝑆 ( 𝑋 𝑇 ) | , | 𝑆 ( 𝑌 𝑇 ) | = 𝑂 ( 𝑛 / 𝑟 ( 𝑛 ) 𝑇 ( 𝑛 )) ;3. 𝑄 ( 𝑋 𝑇 ) , 𝑄 ( 𝑌 𝑇 ) = 𝑂 ( 𝑛 / 𝑟 ( 𝑛 ) 𝑇 ( 𝑛 )) ;4. 𝐼 ( 𝑋 𝑇 ) , 𝐼 ( 𝑌 𝑇 ) = Ω ( 𝑛 ) ;5. If a function 𝑔 ′ satisfies 𝑔 ′ ≥ 𝑔 and 𝑔 ′ ( 𝑛 ) = 𝑜 ( 𝑛 / ) , then ˆ 𝑁 𝑘 ( 𝑇 , 𝑔 ′ ) = Ω (cid:16) 𝑔 ′ ( 𝑛 ) · 𝑘 − (cid:17) for all 𝑘 ≥ suchthat 𝑛 / 𝑔 ′ ( 𝑛 ) − 𝑘 → ∞ . Proof of this claim is provided later.
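Before verifying Claim 4.10 phase by phase, it helps to see the contraction of 𝑍_𝑡 that drives conclusion 1: under the coupling, matched components never re-enter 𝑍, and each unmatched component leaves 𝑍 when it is activated (probability 1/𝑞), so E[𝑍_{𝑡+1} | 𝑍_𝑡] = (1 − 1/𝑞)𝑍_𝑡, as noted in Phase 1. A toy Python simulation, with purely hypothetical component sizes and 𝑞:

import math
import random

def one_step(sizes, q, rng):
    # Components counted by Z_t survive a coupled step iff they are not
    # activated, which happens with probability 1 - 1/q for each.
    return [s for s in sizes if rng.random() > 1.0 / q]

rng = random.Random(1)
q = 1.5
sizes = [rng.randint(1, 50) for _ in range(4000)]  # hypothetical Z_0
z0, t = sum(sizes), 0
while sizes:
    sizes = one_step(sizes, q, rng)
    t += 1
# Geometric decay predicts Z_t < 1 after about log(Z_0) / log(1/(1 - 1/q)) steps.
print(t, math.log(z0) / math.log(1.0 / (1.0 - 1.0 / q)))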
Phase 2.
Let 𝑎 = 𝑞 /( 𝑞 − ) . For Phase 2, we set 𝑔 ( 𝑛 ) = ( log log log 𝑛 · log log log log 𝑛 ) , 𝑔 ( 𝑛 ) = ( log log 𝑛 · log log log 𝑛 ) , ℎ ( 𝑛 ) = ( log log log 𝑛 ) , 𝑟 ( 𝑛 ) =
13 log 𝑎 log log 𝑛 · log log 𝑎 log log 𝑛 and 𝑇 = 𝑇 +
12 log_𝑎 log log 𝑛. Notice these functions satisfy the conditions of Claim 4.10:
1. 𝑔₁(𝑛) = 𝑜(𝑛/);
2. 𝑇₂ − 𝑇₁ = 𝑜(𝑔₁(𝑛));
3. 𝑟(𝑛) = 𝑜(𝑛/);
4. (𝑇₂ − 𝑇₁)𝑟(𝑛) ≤ (log_𝑎 log log 𝑛)(log log_𝑎 log log 𝑛) ≤ 𝑔₁(𝑛)/log 𝑔₁(𝑛);
5. 𝑇₂ − 𝑇₁ = 12 log_𝑎 log log 𝑛 ≤ 𝑟(𝑛)/log 𝑟(𝑛);
6. 𝑔₁(𝑛) log 𝑔₁(𝑛) ≤ (log log log 𝑛) = ℎ(𝑛)/.
Suppose that we have all the desired properties from Phase 1, so at the beginning of Phase 2 we have:
1. 𝑍_{𝑇₁} = 𝑂(𝑛/(log log log 𝑛)) = 𝑂(𝑛/ℎ(𝑛));
2. |𝑆(𝑋_{𝑇₁})| ≤ √𝑅(𝑋_{𝑇₁}) ≤ 𝑛/𝑟(𝑛), |𝑆(𝑌_{𝑇₁})| ≤ √𝑅(𝑌_{𝑇₁}) ≤ 𝑛/𝑟(𝑛);
3. 𝐼(𝑋_{𝑇₁}) = Ω(𝑛), 𝐼(𝑌_{𝑇₁}) = Ω(𝑛);
4. 𝑄(𝑋_{𝑇₁}) ≤ 𝑅(𝑋_{𝑇₁}) = 𝑂(𝑛/), 𝑄(𝑌_{𝑇₁}) ≤ 𝑅(𝑌_{𝑇₁}) = 𝑂(𝑛/);
5. N̂_𝑘(𝑇₁, 𝑔₁) = Ω(𝑔₁(𝑛) · 𝑘−) for all 𝑘 ≥ 1 such that 𝑛/𝑔₁(𝑛)−𝑘 → ∞.
Claim 4.10 implies there exists a coupling such that with Ω(1) probability:
1. 𝑍_{𝑇₂} = 𝑂(𝑛/(log log 𝑛));
2. |𝑆(𝑋_{𝑇₂})| ≤ 𝑛/𝑟(𝑛) log_𝑎 log log 𝑛, |𝑆(𝑌_{𝑇₂})| ≤ 𝑛/𝑟(𝑛) log_𝑎 log log 𝑛;
3. 𝑄(𝑋_{𝑇₂}) = 𝑂(𝑛/𝑟(𝑛) log_𝑎 log log 𝑛), 𝑄(𝑌_{𝑇₂}) = 𝑂(𝑛/𝑟(𝑛) log_𝑎 log log 𝑛);
4. 𝐼(𝑋_{𝑇₂}) = Ω(𝑛), 𝐼(𝑌_{𝑇₂}) = Ω(𝑛);
5. N̂_𝑘(𝑇₂, 𝑔₂) = Ω(𝑔₂(𝑛) · 𝑘−) for all 𝑘 ≥ 1 such that 𝑛/𝑔₂(𝑛)−𝑘 → ∞. Phase 3.
Suppose the coupling in Phase 2 succeeds. For Phase 3, we set the functions as 𝑔₁(𝑛) = (log log 𝑛 · log log log 𝑛), 𝑔₂(𝑛) = (log 𝑛 · log log 𝑛), ℎ(𝑛) = (log log 𝑛), 𝑟(𝑛) =
20 log 𝑎 log 𝑛 · log log 𝑎 log 𝑛 and 𝑇 = 𝑇 +
10 log_𝑎 log 𝑛. Claim 4.10 implies there exists a coupling such that with Ω(1) probability:
1. 𝑍_{𝑇₃} = 𝑂(𝑛/(log 𝑛));
2. |𝑆(𝑋_{𝑇₃})| = 𝑂(𝑛/𝑟(𝑛) log_𝑎 log 𝑛), |𝑆(𝑌_{𝑇₃})| = 𝑂(𝑛/𝑟(𝑛) log_𝑎 log 𝑛);
3. 𝑄(𝑋_{𝑇₃}) = 𝑂(𝑛/𝑟(𝑛) log_𝑎 log 𝑛), 𝑄(𝑌_{𝑇₃}) = 𝑂(𝑛/𝑟(𝑛) log_𝑎 log 𝑛);
4. 𝐼(𝑋_{𝑇₃}) = Ω(𝑛), 𝐼(𝑌_{𝑇₃}) = Ω(𝑛);
5. N̂_𝑘(𝑇₃, 𝑔₂) = Ω(𝑔₂(𝑛) · 𝑘−) for all 𝑘 ≥ 1 such that 𝑛/𝑔₂(𝑛)−𝑘 → ∞. Phase 4.
Suppose the coupling in Phase 3 succeeds. Let 𝐶 be a constant greater than 4 /
3. We set 𝑔 ( 𝑛 ) = ( log 𝑛 · log log 𝑛 ) , ℎ ( 𝑛 ) = ( log 𝑛 ) , 𝑟 ( 𝑛 ) = 𝐶 log 𝑎 𝑛 · log log 𝑎 𝑛 and 𝑇 = 𝑇 + 𝐶 log 𝑎 𝑛 .Claim 4.10 implies there exists a coupling such that with Ω ( ) probability 𝑍 𝑇 <
1. Since 𝑍_𝑇 is a non-negative integer-valued random variable, Pr[𝑍_𝑇 < 1] = Pr[𝑍_𝑇 = 0]. When 𝑍_𝑇 = 0, 𝑋_𝑇 and 𝑌_𝑇 have the same component structure. Therefore, if the coupling in every phase succeeds, 𝑋_𝑇 and 𝑌_𝑇 have the same component structure. The probability that the coupling in Phase 1 succeeds is (log log log 𝑛)^{−𝑂(log_𝜚(1/𝑎))}. Conditioned on the success of the previous phases, the couplings in Phases 2, 3 and 4 each succeed with at least constant probability. Thus, the entire coupling succeeds with probability
(log log log 𝑛)^{−𝑂(log_𝜚(1/𝑎))} · Ω(1) · Ω(1) · Ω(1) = Ω((log log log 𝑛)^{−𝛽}),
where 𝛽 is a positive constant. □ Proof of Claim 4.10.
We will show that, given the following properties at any time 𝑡 ≤ 𝑇(𝑛), we can maintain them at time 𝑡 + 1 with probability 1 − 𝑂(𝑔(𝑛)−) − 𝑂(𝑟(𝑛)−):
1. 𝑍_𝑡 = 𝑂(𝑛/ℎ(𝑛));
2. |𝑆(𝑋_𝑡)|, |𝑆(𝑌_𝑡)| ≤ 𝐶₁𝑡𝑛/𝑟(𝑛) for a constant 𝐶₁ > 0;
3. 𝑄(𝑋_𝑡), 𝑄(𝑌_𝑡) ≤ 𝑡𝑛/𝑟(𝑛) + 𝑂(𝑛/);
4. 𝐼(𝑋_𝑡), 𝐼(𝑌_𝑡) = Ω(𝑛);
5. N̂_𝑘(𝑡, 𝑔) = Ω(𝑔(𝑛) · 𝑘−) for all 𝑘 ≥ 1 such that 𝑛/𝑔(𝑛)−𝑘 → ∞.
By assumption, 𝑡 ≤ 𝑇(𝑛) ≤ 𝑟(𝑛)/log 𝑟(𝑛). According to Lemma 4.7, 𝑋_{𝑡+1} and 𝑌_{𝑡+1} retain properties 2 and 3 with probability at least 1 − 𝑂(𝑟(𝑛)−). Given properties 1, 4 and 5, Lemma 4.9 (with 𝜂 = log 𝑔(𝑛)/
2) implies that there exist a constant 𝛿 > 𝑋 𝑡 and 𝑌 𝑡 Pr [ 𝐴 ( 𝑋 𝑡 ) = 𝐴 ( 𝑌 𝑡 )] ≥ − 𝑒 − log 𝑔 ( 𝑛 ) − s 𝑔 ( 𝑛 ) log 𝑔 ( 𝑛 ) ℎ ( 𝑛 ) − 𝛿𝑔 ( 𝑛 ) = − 𝑂 (cid:18) ℎ ( 𝑛 ) / (cid:19) − 𝑂 (cid:18) 𝑔 ( 𝑛 ) (cid:19) = − 𝑂 (cid:18) 𝑔 ( 𝑛 ) (cid:19) . Note that condition 𝑔 ( 𝑛 ) log 𝑔 ( 𝑛 ) ≤ ℎ ( 𝑛 ) / is used to deduce the inequality above. Suppose 𝐴 ( 𝑋 𝑡 ) = 𝐴 ( 𝑋 𝑡 ) ; we couple components generated in the percolation step and preclude the growth of 𝑍 𝑡 . Hence, 𝑍 𝑡 + ≤ 𝑍 𝑡 = 𝑂 (cid:16) 𝑛 / ℎ ( 𝑛 ) (cid:17) , and property 1 holds immediately.Recall that R ( 𝑋 ) = 𝑄 ( 𝑋 ) + | 𝑆 ( 𝑋 ) | . Properties 2 and 3 imply that R ( 𝑋 𝑡 ) = 𝑂 ( 𝑡 𝑛 / 𝑟 ( 𝑛 ) ) and R ( 𝑌 𝑡 ) = 𝑂 ( 𝑡 𝑛 / 𝑟 ( 𝑛 ) ) . Since 𝑡 < 𝑇 ( 𝑛 ) and 𝑇 ( 𝑛 ) · 𝑟 ( 𝑛 ) ≤ 𝑔 ( 𝑛 ) / log 𝑔 ( 𝑛 ) , we can upper bound R ( 𝑋 𝑡 ) and R ( 𝑌 𝑡 ) by 𝑂 (cid:16) 𝑛 / ( 𝑇 ( 𝑛 ) · 𝑟 ( 𝑛 )) (cid:17) = 𝑂 (cid:18) 𝑛 / 𝑔 ( 𝑛 ) log 𝑔 ( 𝑛 ) (cid:19) . We establish properties 4 and 5 with a similar argument as the one used in Phase 1.Let 𝐻 𝑡 ∼ 𝐺 ( 𝐴 ( 𝑋 𝑡 ) , 𝑛 / 𝑞 ) . Due to Lemma 4.8 (with 𝑓 = 𝑔 ) and Fact 4.5 with probability at least 1 − 𝑂 ( 𝑔 ( 𝑛 ) − ) , 𝑁 𝑘 ( 𝐻 𝑡 , 𝑔 ) = Ω ( 𝑔 ( 𝑛 ) · 𝑘 − ) for all 𝑘 ≥ 𝑛 / 𝑔 ( 𝑛 ) − 𝑘 → ∞ . In addition, 𝐼 ( 𝐻 𝑡 ) = Ω ( 𝑛 ) with probability 1 − 𝑂 ( 𝑛 − ) by Lemma 2.7. Since the coupling adds components in 𝐻 𝑡 to both 𝑋 𝑡 + and 𝑌 𝑡 + , properties 4 and 5 are maintained at time 𝑡 +
1, with probability at least 1 − 𝑂 ( 𝑔 ( 𝑛 ) − ) .A union bound concludes that at time 𝑡 + − 𝑂 ( 𝑔 ( 𝑛 ) − ) − 𝑂 ( 𝑟 ( 𝑛 ) − ) . Hence, the probability that 𝑋 𝑇 ( 𝑛 ) and 𝑌 𝑇 ( 𝑛 ) still satisfy the listed 5 propertiesabove is (cid:20) − 𝑂 (cid:18) 𝑔 ( 𝑛 ) (cid:19) − 𝑂 (cid:18) 𝑟 ( 𝑛 ) (cid:19) (cid:21) 𝑇 ( 𝑛 ) = − 𝑜 ( ) . It remains for us to show the bound for 𝑍 𝑇 and that for a given function 𝑔 ′ satisfying 𝑔 ′ ≥ 𝑔 and 𝑔 ′ ( 𝑛 ) = 𝑜 ( 𝑛 / ) , then ˆ 𝑁 𝑘 ( 𝑇 , 𝑔 ′ ) = Ω (cid:16) 𝑔 ′ ( 𝑛 ) · 𝑘 − (cid:17) for all 𝑘 ≥ 𝑛 / 𝑔 ′ ( 𝑛 ) − 𝑘 → ∞ .Conditioned on 𝐴 ( 𝑋 𝑡 ) = 𝐴 ( 𝑌 𝑡 ) for every activation step in this phase, a bound for 𝑍 𝑇 can be obtainedthrough a first moment method. On expectation 𝑍 𝑡 contract by a factor of 𝑎 = − 𝑞 each step. Thus, wecan recursively compute the expectation of E [ 𝑍 𝑇 ] :E [ 𝑍 𝑇 ] = E [ E [ 𝑍 𝑇 | 𝑍 𝑇 − ]] = 𝑎 · E [ 𝑍 𝑇 − ] = ... = (cid:18) 𝑎 (cid:19) 𝑇 E [ 𝑍 ] = 𝑂 (cid:18) 𝑎 (cid:19) 𝑇 · 𝑛 / ℎ ( 𝑛 ) ! . (13)It follows from Markov’s inequality that with at least constant probability 𝑍 𝑇 = 𝑂 (cid:18) 𝑛 / 𝑎 𝑇 ( 𝑛 ) (cid:19) . Finally, in the last percolation step in this phase, Fact 4.5 guarantees that with high probability ˆ 𝑁 𝑘 ( 𝑇 , 𝑔 ′ ) = Ω ( 𝑔 ′ ( 𝑛 ) · 𝑘 − ) for all 𝑘 ≥ 𝑛 / 𝑔 ′ ( 𝑛 ) − 𝑘 → ∞ . The claim follows from a union bound. (cid:3) roof of Lemma 4.7. We establish first the bound for | 𝑆 ( 𝑋 𝑡 + ) | . Suppose 𝑠 vertices are activated from 𝑆 ( 𝑋 𝑡 ) .By assumption 𝑄 ( 𝑋 𝑡 ) ≤ 𝑡𝑛 / 𝑟 ( 𝑛 ) + 𝑂 ( 𝑛 / ) ≤ 𝑛 / 𝑟 ( 𝑛 ) log 𝑟 ( 𝑛 ) , for sufficiently large 𝑛 . Hence, Hoeffding’s inequality implies that 𝐴 ( 𝑋 𝑡 ) ≤ 𝑠 + 𝑛 − | 𝑆 ( 𝑋 𝑡 ) | 𝑞 + 𝑛 / 𝑟 ( 𝑛 ) ≤ 𝑛𝑞 + ( 𝑞 − ) 𝑠𝑞 + 𝑛 / 𝑟 ( 𝑛 ) , with probability at least 1 − 𝑂 ( 𝑟 ( 𝑛 ) − ) .We consider two cases. First suppose that 𝛿 ( 𝑞 − ) 𝑠 / 𝑞 ≥ 𝑛 / 𝑟 ( 𝑛 ) , where 𝛿 > 𝐴 ( 𝑋 𝑡 ) ≤ 𝑛𝑞 + ( + 𝛿 ) ( 𝑞 − ) 𝑠𝑞 = : 𝑀. The largest new component corresponds to the largest component of a 𝐺 ( 𝐴 ( 𝑋 𝑡 ) , 𝑞 / 𝑛 ) random graph. Let 𝑁 be the size of that component, and let 𝑁 𝑀 be the size of the largest component of a 𝐺 (cid:0) 𝑀, + 𝜀𝑀 (cid:1) randomgraph, where 𝜀 = 𝑞𝑀 / 𝑛 −
1. By Fact 2.6, 𝑁 is stochastically dominated by 𝑁 𝑀 . Then by Corollary 2.11there exists a constant 𝑐 > [ 𝑁 > ( + 𝜌 ) 𝜀𝑀 ] ≤ Pr [ 𝑁 𝑀 > ( + 𝜌 ) 𝜀𝑀 ] = 𝑂 ( e − 𝑐𝜀 𝑀 ) , (14)for any 𝜌 < /
10. Now, 𝜀𝑀 = ( + 𝛿 ) ( 𝑞 − ) 𝑠𝑛 (cid:18) 𝑛𝑞 + ( + 𝛿 ) ( 𝑞 − ) 𝑠𝑞 (cid:19) = ( + 𝛿 ) ( 𝑞 − ) 𝑠𝑞 + 𝑂 (cid:18) 𝑠 𝑛 (cid:19) ≤ ( + 𝛿 ) ( 𝑞 − ) 𝑠𝑞 + 𝑂 ( 𝑛 / 𝑟 ( 𝑛 ) ) , ≤ ( + 𝛿 ) ( 𝑞 − ) 𝑠𝑞 , (15)where for the second to last inequality we use that 𝑠 ≤ | 𝑆 ( 𝑋 𝑡 ) | = 𝑂 ( 𝑛 / 𝑟 ( 𝑛 ) ) , and the last inequalityfollows from the assumptions 𝛿 ( 𝑞 − ) 𝑠𝑞 ≥ 𝑛 / 𝑟 ( 𝑛 ) and 𝑟 ( 𝑛 ) = 𝑜 (cid:0) 𝑛 / (cid:1) . Also, since 𝑠 = 𝑂 ( 𝑛 / 𝑟 ( 𝑛 ) ) and 𝑟 ( 𝑛 ) = 𝑜 (cid:0) 𝑛 / (cid:1) , 𝜀 𝑀 = (cid:20) ( + 𝛿 ) ( 𝑞 − ) 𝑠𝑛 (cid:21) (cid:18) 𝑛𝑞 + ( + 𝛿 ) ( 𝑞 − ) 𝑠𝑞 (cid:19) = Ω (cid:18) 𝑠 𝑛 + 𝑠 𝑛 (cid:19) = Ω ( 𝑟 ( 𝑛 ) ) . (16)Hence, (14), (15) and (16) implyPr (cid:20) 𝑁 ≥ ( + 𝜌 ) ( + 𝛿 ) ( 𝑞 − ) 𝑠𝑞 (cid:21) = e − Ω ( 𝑟 ( 𝑛 ) ) . Since 𝑞 <
2, for sufficiently small 𝜌 and 𝛿 ( + 𝜌 ) ( + 𝛿 ) ( 𝑞 − ) 𝑞 < . 𝑁 ≤ 𝑠 with probability 1 − exp (− Ω ( 𝑟 ( 𝑛 ) )) . If this is the case, then | 𝑆 ( 𝑋 𝑡 + ) | ≤ | 𝑆 ( 𝑋 𝑡 ) | and soby a union bound | 𝑆 ( 𝑋 𝑡 + ) | ≤ 𝑐 ( 𝑡 + ) 𝑛 / 𝑟 ( 𝑛 ) with probability at least 1 − 𝑂 ( 𝑟 ( 𝑛 ) − ) .For the second case we assume 𝛿 ( 𝑞 − ) 𝑠𝑞 < 𝑛 / 𝑟 ( 𝑛 ) and proceed in similar fashion. In this case, Hoeffd-ing’s inequality implies with probability at least 1 − 𝑂 ( 𝑟 ( 𝑛 ) − ) , 𝐴 ( 𝑋 𝑡 ) ≤ 𝑛𝑞 + ( + / 𝛿 ) 𝑛 / 𝑟 ( 𝑛 ) = : 𝑀 ′ . The size of the largest new component, denoted 𝑁 ′ , is stochastically dominated by the size of the largestcomponent of a 𝐺 ( 𝑀 ′ , + 𝜀 ′ 𝑀 ′ ) random graph, with 𝜀 ′ = 𝑞𝑀 ′ / 𝑛 −
1. Now, since we assume 𝑟 ( 𝑛 ) = 𝑜 (cid:0) 𝑛 / (cid:1) , 𝜀 ′ 𝑀 ′ ≤ 𝑞 ( + / 𝛿 ) 𝑟 ( 𝑛 ) 𝑛 / (cid:20) 𝑛𝑞 + ( + / 𝛿 ) 𝑛 / 𝑟 ( 𝑛 ) (cid:21) = ( + / 𝛿 ) 𝑛 / 𝑟 ( 𝑛 ) + 𝑂 ( 𝑛 / 𝑟 ( 𝑛 ) ) ≤ 𝑐 𝑛 / 𝑟 ( 𝑛 ) , where the last inequality holds for large 𝑛 and a sufficiently large constant 𝑐 . Moreover, ( 𝜀 ′ ) 𝑀 ′ = Ω (cid:18) 𝑟 ( 𝑛 ) 𝑛 (cid:20) 𝑛𝑞 + 𝑛 / 𝑟 ( 𝑛 ) (cid:21) (cid:19) = Ω ( 𝑟 ( 𝑛 ) ) . Hence, Pr h 𝑁 ′ ≥ 𝑐𝑛 / 𝑟 ( 𝑛 ) i ≤ Pr (cid:20) 𝑁 ′ ≥ ( + 𝜌 ) 𝑐𝑛 / 𝑟 ( 𝑛 ) (cid:21) ≤ Pr [ 𝑁 ′ ≥ ( + 𝜌 ) 𝜀 ′ 𝑀 ′ ] , where 𝜌 < /
10, and by Corollary 2.11Pr [ 𝑁 ′ ≥ ( + 𝜌 ) 𝜀 ′ 𝑀 ′ ] = e − Ω (( 𝜀 ′ ) 𝑀 ′ ) = e − Ω ( 𝑟 ( 𝑛 ) ) . Since, | 𝑆 ( 𝑋 𝑡 + ) | ≤ | 𝑆 ( 𝑋 𝑡 ) | + 𝑁 ′ , a union bound implies that | 𝑆 ( 𝑋 𝑡 + ) | ≤ 𝑐 ( 𝑡 + ) 𝑛 / 𝑟 ( 𝑛 ) with probability atleast 1 − 𝑂 ( 𝑟 ( 𝑛 ) − ) as desired.Finally, to bound 𝑄 ( 𝑋 𝑡 + ) we observe that if 𝐶 , . . . , 𝐶 𝑘 are all the new components in order of theirsizes, then by Lemma 2.9 and Markov’s inequality:Pr "Õ 𝑗 ≥ | 𝐶 𝑗 | ≥ 𝑛 / 𝑟 ( 𝑛 ) = 𝑂 ( 𝑟 ( 𝑛 ) − ) . Thus, 𝑄 ( 𝑋 𝑡 + ) ≤ 𝑄 ( 𝑋 𝑡 ) + 𝑛 / 𝑟 ( 𝑛 ) ≤ ( 𝑡 + ) 𝑛 / 𝑟 ( 𝑛 ) with probability at least 1 − 𝑂 ( 𝑟 ( 𝑛 ) − ) as claimed. Thelemma follows from a union bound. (cid:3) Proof of Lemma 4.8.
Since R ( 𝑋 𝑡 ) = 𝑂 (cid:0) 𝑛 / 𝑓 ( 𝑛 ) ( log 𝑓 ( 𝑛 )) − (cid:1) , by Hoeffding’s inequality 𝐴 ( 𝑋 𝑡 ) ∈ (cid:20) 𝑛 − 𝑛 / 𝑓 ( 𝑛 ) 𝑞 , 𝑛 + 𝑛 / 𝑓 ( 𝑛 ) 𝑞 (cid:21) = : 𝐽 , with probability at least 1 − 𝑂 ( 𝑓 ( 𝑛 ) − ) . The new connected components in 𝑋 𝑡 + correspond to those of a 𝐺 ( 𝐴 ( 𝑋 𝑡 ) , + 𝜀𝐴 ( 𝑋 𝑡 ) ) random graph, where 𝜀 = 𝐴 ( 𝑋 𝑡 ) 𝑞 / 𝑛 −
1. If 𝐴(𝑋_𝑡) ∈ 𝐽, then
−𝑛−/𝑓(𝑛) ≤ 𝜀 ≤ 𝑛−/𝑓(𝑛). (17)
Since 𝐴(𝑋_𝑡) ∈ 𝐽 we can also define 𝑚 := 𝐴(𝑋_𝑡) = 𝜃𝑛 for 𝜃 ∈ (1/𝑞, 1), and 𝜆 := 𝜀𝑚/, so we may rewrite (17) as −𝑓(𝑛) ≤ −𝜃/𝑓(𝑛) ≤ 𝜆 ≤ 𝜃/𝑓(𝑛) ≤ 𝑓(𝑛), and the lemma follows. □
Proof of Lemma 4.9. For ease of notation let I_𝑘 = I_𝑘(𝑔) and N̂_𝑘 = N̂_𝑘(𝑡, 𝑔) for each 𝑘 ≥
1. Also recall thenotations 𝑊 𝑡 , 𝑀 ( 𝑋 ) and 𝐷 ( 𝑋 ) defined in Section 4. Let ˆ 𝐼 ( 𝑋 𝑡 ) and ˆ 𝐼 ( 𝑌 𝑡 ) be the isolated vertices in 𝑊 𝑡 from 𝑋 𝑡 and 𝑌 𝑡 , respectively.Let 𝑘 ∗ : = min 𝑘 { 𝑘 ∈ Z : 𝑔 ( 𝑛 ) 𝑘 ≥ 𝜗𝑛 / } . The activation of the non-trivial components in 𝑀 ( 𝑋 𝑡 ) and 𝑀 ( 𝑌 𝑡 ) whose sizes are not in { } ∪ I ∪ · · · ∪ I 𝑘 ∗ is coupled using the matching 𝑊 𝑡 . That is, 𝑐 ∈ 𝑀 ( 𝑋 𝑡 ) and 𝑊 𝑡 ( 𝑐 ) ∈ 𝑀 ( 𝑌 𝑡 ) are activated simultaneously with probability 1 / 𝑞 . The components in 𝐷 ( 𝑋 𝑡 ) and 𝐷 ( 𝑌 𝑡 ) are activated independently. After independently activating these components, the number of activevertices from each copy is not necessarily the same. The idea is to couple the activation of the remainingcomponents in 𝑀 ( 𝑋 𝑡 ) and 𝑀 ( 𝑌 𝑡 ) in way that corrects this difference.Let 𝐴 ( 𝑋 𝑡 ) and 𝐴 ( 𝑌 𝑡 ) be number of active vertices from 𝑋 𝑡 and 𝑌 𝑡 , respectively, after the activation ofthe components from 𝐷 ( 𝑋 𝑡 ) and 𝐷 ( 𝑌 𝑡 ) . Observe that E [ 𝐴 ( 𝑋 𝑡 )] = E [ 𝐴 ( 𝑌 𝑡 )] = : 𝜇 and that by Hoeffding’sinequality, for any 𝜂 ( 𝑛 ) > h | 𝐴 ( 𝑋 𝑡 ) − 𝜇 | ≥ p 𝜂 ( 𝑛 ) 𝑍 𝑡 i ≤ 𝑒 − 𝜂 ( 𝑛 ) . Recall 𝑍 𝑡 ≤ 𝐶𝑛 / ℎ ( 𝑛 ) . Hence, with probability at least 1 − (− 𝜂 ( 𝑛 )) , 𝑑 : = | 𝐴 ( 𝑋 𝑡 ) − 𝐴 ( 𝑌 𝑡 ) | ≤ p 𝜂 ( 𝑛 ) 𝑍 𝑡 ≤ p 𝐶𝜂 ( 𝑛 ) 𝑛 / p ℎ ( 𝑛 ) . We first couple the activation of the components in I , then in I and so on up to I 𝑘 ∗ . Without loss ofgenerality, suppose that 𝑑 = 𝐴 ( 𝑌 𝑡 ) − 𝐴 ( 𝑋 𝑡 ) . If 𝑑 ≤ 𝜗𝑛 / 𝑔 ( 𝑛 ) , we simply couple the components with sizes in I using the matching 𝑊 𝑡 . Suppose otherwise that 𝑑 > 𝜗𝑛 / 𝑔 ( 𝑛 ) . Let 𝐴 ( 𝑋 𝑡 ) and 𝐴 ( 𝑌 𝑡 ) be random variablescorresponding to the numbers of active vertices from 𝑀 ( 𝑋 𝑡 ) and 𝑀 ( 𝑌 𝑡 ) with sizes in I respectively. Byassumption ˆ 𝑁 ≥ 𝑏𝑔 ( 𝑛 ) . Hence, Theorem 2.4 implies that for 𝛿 = 𝛿 ( 𝑞 ) >
0, there exists a coupling for theactivation of the components in 𝑀 ( 𝑋 𝑡 ) and 𝑀 ( 𝑌 𝑡 ) with sizes in I such that 𝑑 ≥ 𝐴 ( 𝑋 𝑡 ) − 𝐴 ( 𝑌 𝑡 ) ≥ 𝑑 − 𝜗𝑛 / 𝑔 ( 𝑛 ) with probability at least1 − 𝛿 (cid:16) 𝑑 − 𝜗𝑛 / 𝑔 ( 𝑛 ) (cid:17) 𝜗𝑛 / 𝑔 ( 𝑛 ) p 𝑏𝑔 ( 𝑛 ) ≥ − 𝛿𝑑 𝜗𝑛 / 𝑔 ( 𝑛 ) p 𝑏𝑔 ( 𝑛 ) ≥ − 𝛿 p 𝐶𝜂 ( 𝑛 ) 𝑔 ( 𝑛 ) 𝜗 p 𝑏ℎ ( 𝑛 ) ≥ − s 𝜂 ( 𝑛 ) 𝑔 ( 𝑛 ) ℎ ( 𝑛 ) , where the last inequality holds for 𝜗 large enough. Let 𝑑 : = ( 𝐴 ( 𝑌 𝑡 ) − 𝐴 ( 𝑋 𝑡 )) + ( 𝐴 ( 𝑌 𝑡 ) − 𝐴 ( 𝑋 𝑡 )) . If thecoupling succeeds, we have 0 ≤ 𝑑 ≤ 𝜗𝑛 / 𝑔 ( 𝑛 ) . Thus, we have shown that 𝑑 ≤ 𝜗𝑛 / 𝑔 ( 𝑛 ) with probability at least (cid:16) − 𝑒 − 𝜂 ( 𝑛 ) (cid:17) © « − s 𝜂 ( 𝑛 ) 𝑔 ( 𝑛 ) ℎ ( 𝑛 ) ª®¬ ≥ − 𝑒 − 𝜂 ( 𝑛 ) − s 𝜂 ( 𝑛 ) 𝑔 ( 𝑛 ) ℎ ( 𝑛 ) . Now, let 𝑑 𝑘 be the difference in the number of active vertices after activating the components in I 𝑘 .Suppose that 𝑑 𝑘 ≤ 𝜗𝑛 / 𝑔 ( 𝑛 ) 𝑘 , for 𝑘 ≤ 𝑘 ∗ . By assumption, ˆ 𝑁 𝑘 + ≥ 𝑏𝑔 ( 𝑛 ) · 𝑘 . Thus, using Theorem 2.4 again weget that there exists a coupling for the activation of the components in I 𝑘 + such thatPr (cid:20) 𝑑 𝑘 + ≤ 𝜗𝑛 / 𝑔 ( 𝑛 ) 𝑘 + (cid:12)(cid:12)(cid:12)(cid:12) 𝑑 𝑘 ≤ 𝜗𝑛 / 𝑔 ( 𝑛 ) 𝑘 (cid:21) ≥ − 𝛿𝑑 𝑘𝜗𝑛 / 𝑔 ( 𝑛 ) 𝑘 + q 𝑏𝑔 ( 𝑛 ) · 𝑘 ≥ − 𝛿 √ 𝑏𝑔 ( 𝑛 ) 𝑘 − . I , I , . . . , I 𝑘 ∗ such thatPr (cid:20) 𝑑 𝑘 ∗ ≤ 𝑛 / (cid:12)(cid:12)(cid:12)(cid:12) 𝑑 ≤ 𝜗𝑛 / 𝑔 ( 𝑛 ) (cid:21) ≥ 𝑘 ∗ Ö 𝑘 = (cid:18) − 𝛿 ′ 𝑔 ( 𝑛 ) 𝑘 − (cid:19) , where 𝛿 ′ = 𝛿 /√ 𝑏 . Note that for a suitable constant 𝛿 ′′ >
0, we have 𝑘 ∗ Ö 𝑘 ≥ (cid:18) − 𝛿 ′ 𝑔 ( 𝑛 ) 𝑘 − (cid:19) = exp 𝑘 ∗ Õ 𝑘 ≥ ln (cid:18) − 𝛿 ′ 𝑔 ( 𝑛 ) 𝑘 (cid:19) ! ≥ exp − 𝛿 ′′ 𝑘 ∗ Õ 𝑘 ≥ 𝑔 ( 𝑛 ) 𝑘 ! , and since 𝑘 ∗ Õ 𝑘 ≥ 𝑔 ( 𝑛 ) 𝑘 ≤ ∞ Õ 𝑘 ≥ 𝑔 ( 𝑛 ) 𝑘 ≤ ∞ Õ 𝑘 ≥ 𝑔 ( 𝑛 ) 𝑘 ≤ 𝑔 ( 𝑛 ) − 𝑔 ( 𝑛 ) , we get 𝑘 ∗ Ö 𝑘 = (cid:18) − 𝛿 ′ 𝑔 ( 𝑛 ) 𝑘 − (cid:19) ≥ exp (cid:18) − 𝛿 ′′ 𝑔 ( 𝑛 ) − 𝑔 ( 𝑛 ) (cid:19) ≥ − 𝛿 ′′ 𝑔 ( 𝑛 ) − 𝑔 ( 𝑛 ) . Finally, we couple ˆ 𝐼 ( 𝑋 𝑡 ) and ˆ 𝐼 ( 𝑌 𝑡 ) to fix 𝑑 𝑘 ∗ . By assumption 𝐼 ( 𝑋 𝑡 ) , 𝐼 ( 𝑌 𝑡 ) = Ω ( 𝑛 ) , so 𝑚 : = | ˆ 𝐼 ( 𝑋 𝑡 ) | = | ˆ 𝐼 ( 𝑌 𝑡 ) | = Ω ( 𝑛 ) . Let 𝐴 𝐼 ( 𝑋 𝑡 ) and 𝐴 𝐼 ( 𝑌 𝑡 ) denote the total number of activated isolated vertices from ˆ 𝐼 ( 𝑋 𝑡 ) and ˆ 𝐼 ( 𝑌 𝑡 ) respectively. We activate all isolated vertices independently, so 𝐴 𝐼 ( 𝑋 𝑡 ) and 𝐴 𝐼 ( 𝑌 𝑡 ) can be seenas two binomial random variables with the same parameters 𝑚 and 1 / 𝑞 . Lemma 2.5 gives a coupling forbinomial random variables such that for 𝑟 ≤ 𝑛 / ,Pr [ 𝐴 𝐼 ( 𝑋 𝑡 ) − 𝐴 𝐼 ( 𝑌 𝑡 ) = 𝑟 ] ≥ − 𝑂 (cid:18) 𝑛 / (cid:19) = − 𝑜 (cid:18) 𝑔 ( 𝑛 ) (cid:19) . Therefore, Pr [ 𝐴 ( 𝑋 𝑡 ) = 𝐴 ( 𝑌 𝑡 )] ≥ − 𝑒 − 𝜂 ( 𝑛 ) − s 𝜂 ( 𝑛 ) 𝑔 ( 𝑛 ) ℎ ( 𝑛 ) − 𝑂 (cid:18) 𝑔 ( 𝑛 ) (cid:19) , as claimed. (cid:3) References [1] N. Alon and J.H. Spencer.
The Probabilistic Method. John Wiley & Sons, 2000.
[2] V. Beffara and H. Duminil-Copin. The self-dual point of the two-dimensional random-cluster model is critical for 𝑞 ≥ 1. Probability Theory and Related Fields, 153:511–542, 2012.
[3] A.C. Berry. The accuracy of the Gaussian approximation to the sum of independent variates. 1941.
[4] A. Blanca. Random-cluster dynamics. PhD thesis, UC Berkeley, 2016.
[5] A. Blanca and R. Gheissari. Random-cluster dynamics on random graphs in tree uniqueness. ArXiv preprint arXiv:2008.02264, 2020.
[6] A. Blanca, R. Gheissari, and E. Vigoda. Random-cluster dynamics in Z²: rapid mixing with general boundary conditions. Annals of Applied Probability, 30(1):418–459, 2020.
[7] A. Blanca and A. Sinclair. Dynamics for the mean-field random-cluster model. Proceedings of the 19th International Workshop on Randomization and Computation (RANDOM), pages 528–543, 2015.
[8] A. Blanca and A. Sinclair. Random-cluster dynamics in Z². Probability Theory and Related Fields, 168:821–847, 2017.
[9] B. Bollobás, G.R. Grimmett, and S. Janson. The random-cluster model on the complete graph. Probability Theory and Related Fields, 104(3):283–317, 1996.
[10] L. Chayes and J. Machta. Graphical representations and cluster algorithms II. Physica A, 254:477–516, 1998.
[11] P. Cuff, J. Ding, O. Louidor, E. Lubetzky, Y. Peres, and A. Sly. Glauber dynamics for the mean-field Potts model. Journal of Statistical Physics, 149(3):432–477, 2012.
[12] H. Duminil-Copin, M. Gagnebin, M. Harel, I. Manolescu, and V. Tassion. Discontinuity of the phase transition for the planar random-cluster and Potts models with 𝑞 > 4. Annales de l’ENS, 2016. To appear.
[13] H. Duminil-Copin, V. Sidoravicius, and V. Tassion. Continuity of the phase transition for planar random-cluster and Potts models with 1 ≤ 𝑞 ≤ 4. Communications in Mathematical Physics, 349(1):47–107, 2017.
[14] R. Durrett. Probability: Theory and Examples. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 2010.
[15] C.M. Fortuin and P.W. Kasteleyn. On the random-cluster model I. Introduction and relation to other models. Physica, 57(4):536–564, 1972.
[16] A. Galanis, D. Štefankovič, and E. Vigoda. Swendsen-Wang algorithm on the mean-field Potts model. Proceedings of the 19th International Workshop on Randomization and Computation (RANDOM), pages 815–828, 2015.
[17] T. Garoni. Personal communication, 2015.
[18] R. Gheissari and E. Lubetzky. Quasi-polynomial mixing of critical two-dimensional random cluster models. Random Structures & Algorithms, 56(2):517–556, 2020.
[19] R. Gheissari, E. Lubetzky, and Y. Peres. Exponentially slow mixing in the mean-field Swendsen-Wang dynamics. In Proceedings of the 29th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1981–1988. SIAM, 2018.
[20] G.R. Grimmett. The Random-Cluster Model, volume 333 of Grundlehren der mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer-Verlag, Berlin, 2006.
[21] H. Guo and M. Jerrum. Random cluster dynamics for the Ising model is rapidly mixing. In Proceedings of the 28th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1818–1827. SIAM, 2017.
[22] S. Janson, T. Łuczak, and A. Ruciński. Random Graphs. Wiley Series in Discrete Mathematics and Optimization. Wiley, 2011.
[23] D.A. Levin and Y. Peres. Markov Chains and Mixing Times. American Mathematical Society, 2017.
[24] Y. Long, A. Nachmias, W. Ning, and Y. Peres. A power law of order 1/4 for critical mean-field Swendsen-Wang dynamics. Memoirs of the American Mathematical Society, 232(1092), 2011.
[25] M. Luczak and T. Łuczak. The phase transition in the cluster-scaled model of a random graph. Random Structures & Algorithms, 28(2):215–246, 2006.
[26] T. Łuczak. Cycles in a random graph near the critical point. Random Structures & Algorithms, 2(4):421–439, 1991.
[27] T. Łuczak, B. Pittel, and J.C. Wierman. The structure of a random graph at the point of the phase transition. Transactions of the American Mathematical Society, 341(2):721–748, 1994.
[28] A.B. Mukhin. Local limit theorems for lattice random variables. Theory of Probability & Its Applications, 36(4):698–713, 1992.
[29] B. Pittel. On tree census and the giant component in sparse random graphs. Random Structures & Algorithms, 1(3):311–342, 1990.
[30] R.H. Swendsen and J.S. Wang. Nonuniversal critical dynamics in Monte Carlo simulations. Physical Review Letters, 58:86–88, 1987.
[31] M. Ullrich. Swendsen-Wang is faster than single-bond dynamics. SIAM Journal on Discrete Mathematics, 28(1):37–48, 2014.
[32] R. Vershynin. High-Dimensional Probability. 2019.
A Proof of the local limit theorem
In this appendix, we provide an alternative proof of Theorem 2.2 that does not use Theorem 2.1.
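As a numerical companion to this appendix, the content of Theorem 2.2 can be checked directly: the exact point probabilities of 𝑆_𝑚 = Σ 𝑐_𝑖 · Bernoulli(𝑟), computed by convolution, should match the normal density with the same mean and variance at points 𝜇_𝑚 + 𝑎𝜎_𝑚. A small Python sketch, where the step sizes 𝑐_𝑖 and the value of 𝑟 are hypothetical choices made only for illustration:

import math
import numpy as np

r = 0.5
c = [1] * 500 + [9] * 60 + [27] * 20   # hypothetical step sizes c_1, ..., c_m
pmf = np.zeros(sum(c) + 1)
pmf[0] = 1.0
for s in c:                             # exact distribution of S_m by convolution
    nxt = (1 - r) * pmf
    nxt[s:] += r * pmf[:-s]
    pmf = nxt
mu = r * sum(c)
sigma = math.sqrt(r * (1 - r) * sum(x * x for x in c))
for a in (-1.0, 0.0, 1.0):              # evaluate at mu + a * sigma
    x = int(round(mu + a * sigma))
    gauss = math.exp(-a * a / 2) / (math.sqrt(2 * math.pi) * sigma)
    print(x, pmf[x], gauss)             # the two values should nearly agree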
Lemma A.1.
For the random variables satisfying the conditions from Theorem 2.2, 𝜎_𝑚 → ∞ and (𝑆_𝑚 − 𝜇_𝑚)/𝜎_𝑚 converges in distribution to a standard normal random variable.
Proof. Observe that
𝜎_𝑚² = 𝑟(1 − 𝑟) Σ_{𝑖=1}^{𝑚} 𝑐_𝑖² ≥ 𝑟(1 − 𝑟) Σ_{𝑖 : 𝑐_𝑖 ∈ 𝐼} 𝑐_𝑖² = Ω(𝑚/𝜔(𝑚) · 𝜔(𝑚)) = Ω(𝑚/𝜔(𝑚)) → ∞,
and also
(1/𝜎_𝑚³) Σ_{𝑖=1}^{𝑚} E[|𝑋_𝑖 − E[𝑋_𝑖]|³] = (1/𝜎_𝑚³) Σ_{𝑖=1}^{𝑚} 𝑟(1 − 𝑟)𝑐_𝑖³ ≤ 𝑐_𝑚 𝜎_𝑚²/𝜎_𝑚³ = 𝑂(𝑐_𝑚/𝜎_𝑚) = 𝑂(𝑚/𝜔(𝑚)− / 𝑚/𝜔(𝑚)−/) = 𝑂(𝜔(𝑚)−/) → 0.
Hence, the random variables {𝑋_𝑖} satisfy the conditions of Lyapunov’s central limit theorem (see, e.g., [14]), and so (𝑆_𝑚 − 𝜇_𝑚)/𝜎_𝑚 converges in distribution to a standard normal random variable. □ Lemma A.2.
Suppose 𝑐₁, . . . , 𝑐_𝑚 satisfy the conditions from Theorem 2.2. For any 𝑢 satisfying 𝜎_𝑚 ≥ 𝑢 ≥ 1, Σ_{𝑗 : 𝑐_𝑗 ≤ 𝑢} 𝑐_𝑗² ≥ 𝑢𝜎_𝑚/𝑟(1 − 𝑟).
Proof. We have 𝜎_𝑚² = 𝑟(1 − 𝑟) Σ_{𝑖=1}^{𝑚} 𝑐_𝑖² = 𝑂(𝑚/√𝜔(𝑚)). We consider three cases. First, if 𝑚/ ≤ 𝑢 ≤ 𝑐_𝑚 = 𝑂(𝑚/𝜔(𝑚)−), there exists a largest integer 𝑘 ∈ [1, ℓ] such that 𝑢 = 𝑂(𝜗𝑚/𝜔(𝑚)𝑘). Then,
Σ_{𝑖 : 𝑐_𝑖 ≤ 𝑢} 𝑐_𝑖² ≥ Σ_{𝑖 : 𝑐_𝑖 ∈ 𝐼_{𝑘+1}} 𝑐_𝑖² ≥ (𝜗𝑚/𝜔(𝑚)𝑘+)² · 𝜔(𝑚) · 𝑘 = 𝜗²𝑚/𝜔(𝑚)𝑘 ≫ 𝑢𝜎_𝑚;
by ≫ we mean that 𝑢𝜎_𝑚 is of lower order with respect to 𝜗²𝑚/𝜔(𝑚)𝑘. Now, when 𝜎_𝑚 ≥ 𝑢 ≥ 𝑐_𝑚, we have
Σ_{𝑖 : 𝑐_𝑖 ≤ 𝑢} 𝑐_𝑖² = Σ_{𝑖=1}^{𝑚} 𝑐_𝑖² = 𝜎_𝑚²/𝑟(1 − 𝑟) ≥ 𝑢𝜎_𝑚/𝑟(1 − 𝑟).
Finally, if 1 ≤ 𝑢 ≤ 𝑚/, then 𝑢𝜎_𝑚 is sublinear and so
Σ_{𝑖 : 𝑐_𝑖 ≤ 𝑢} 𝑐_𝑖² ≥ Σ_{𝑖=1}^{𝜌𝑚} 𝑐_𝑖² = 𝜌𝑚 ≫ 𝑚/𝜎_𝑚 ≥ 𝑢𝜎_𝑚. □ Proof of Theorem 2.2.
Let Φ (·) denote the probability density function of a standard normal distribution.We will show for any fixed 𝑎 ∈ R , (cid:12)(cid:12)(cid:12)(cid:12) Pr (cid:20) 𝑆 𝑚 − 𝜇 𝑚 𝜎 𝑚 = 𝑎 (cid:21) − Φ ( 𝑎 ) 𝜎 𝑚 (cid:12)(cid:12)(cid:12)(cid:12) = 𝑜 (cid:18) 𝜎 𝑚 (cid:19) , (18)which is equivalent to (3).Let 𝜙 ( 𝑡 ) denote the characteristic function for the random variable ( 𝑆 𝑚 − 𝜇 𝑚 )/ 𝜎 𝑚 . By applying theinversion formula (see Theorem 3.3.14 and Exercise 3.3.2 in [14]), Φ ( 𝑎 ) = 𝜋 ∫ ∞−∞ 𝑒 − 𝑖𝑡𝑎 𝑒 − 𝑡 / 𝑑𝑡, and Pr (cid:20) 𝑆 𝑚 − 𝜇 𝑚 𝜎 𝑚 = 𝑎 (cid:21) = 𝜋𝜎 𝑚 ∫ 𝜋𝜎 𝑚 − 𝜋𝜎 𝑚 𝑒 − 𝑖𝑡𝑎 𝜙 ( 𝑡 ) 𝑑𝑡 . Hence, the left hand side of (18) can be bounded from above by12 𝜋𝜎 𝑚 (cid:20)∫ 𝜋𝜎 𝑚 − 𝜋𝜎 𝑚 (cid:12)(cid:12)(cid:12) 𝑒 − 𝑖𝑡𝑎 (cid:16) 𝜙 ( 𝑡 ) − 𝑒 − 𝑡 (cid:17)(cid:12)(cid:12)(cid:12) 𝑑𝑡 + ∫ ∞ 𝜋𝜎 𝑚 𝑒 − 𝑡 𝑑𝑡 (cid:21) . Since | 𝑒 − 𝑖𝑡𝑎 | ≤
1, it suffices to show that for all 𝜀 > 𝑀 > 𝑚 > 𝑀 then ∫ 𝜋𝜎 𝑚 − 𝜋𝜎 𝑚 (cid:12)(cid:12)(cid:12) 𝜙 ( 𝑡 ) − 𝑒 − 𝑡 (cid:12)(cid:12)(cid:12) 𝑑𝑡 + ∫ ∞ 𝜋𝜎 𝑚 𝑒 − 𝑡 𝑑𝑡 ≤ 𝜀. (19)We can bound from above the left hand side of (19) by: ∫ 𝐴 − 𝐴 (cid:12)(cid:12)(cid:12) 𝜙 ( 𝑡 ) − 𝑒 − 𝑡 (cid:12)(cid:12)(cid:12) 𝑑𝑡 + ∫ 𝜎 𝑚 / 𝐴 | 𝜙 ( 𝑡 ) | 𝑑𝑡 + ∫ 𝜋𝜎 𝑚 𝜎 𝑚 / | 𝜙 ( 𝑡 ) | 𝑑𝑡 + ∫ ∞ 𝐴 𝑒 − 𝑡 𝑑𝑡 . (20)39he division depends on some constant 𝐴 that we will choose soon. We proceed to bound integral termsin (20) independently.Lemma A.1 implies that 𝑆 𝑚 − 𝜇 𝑚 𝜎 𝑚 converges in distribution to a standard normal. Combined with thecontinuity theorem (see Theorem 3.3.17 in [14]), we get that 𝜙 ( 𝑡 ) → 𝑒 − 𝑡 as 𝑚 → ∞ . The dominatedconvergence theorem (see Theorem 1.5.8 in [14]) hence implies that for any 𝐴 < ∞ the first integral of(20) converges to 0. We select 𝑀 large enough so that the integral is less than 𝜀 / 𝐴 increases(see e.g. Proposition 2.1.2 in [32]). Therefore, we are able to select 𝐴 large enough so that each tail hasprobability mass less than 𝜀 / 𝜙 ( 𝑡 ) . By defi-nition and the independence between 𝑋 𝑖 ’s, 𝜙 ( 𝑡 ) = E (cid:20) exp (cid:18) 𝑖𝑡 · 𝑆 𝑚 − 𝜇 𝑚 𝜎 𝑚 (cid:19) (cid:21) = exp (cid:18) − 𝑖𝑡 𝜇 𝑚 𝜎 𝑚 (cid:19) 𝑚 Ö 𝑗 = 𝜙 𝑗 ( 𝑡 ) , where 𝜙 𝑗 ( 𝑡 ) denotes the characteristic function of 𝑋 𝑗 / 𝜎 𝑚 . Since exp (− 𝑖𝑡𝜇 𝑚 𝜎 𝑚 ) always has modulo 1, | 𝜙 ( 𝑡 ) | ≤ Î 𝑚𝑗 = | 𝜙 𝑗 ( 𝑡 ) | .We proceed to bound the third integral of (20). Note that | 𝜙 𝑗 ( 𝑡 ) | ≤ 𝑗 and 𝑡 . Therefore, | 𝜙 ( 𝑡 ) | ≤ 𝑚 Ö 𝑗 = | 𝜙 𝑗 ( 𝑡 ) | ≤ Ö 𝑗 ≤ 𝜌𝑚 | 𝜙 𝑗 ( 𝑡 ) | . Notice that the 𝑋 𝑗 ’s for 𝑗 ≤ 𝜌𝑚 are Bernoulli random variables. By periodicity (see Theorem 3.5.2 in [14]), | 𝜙 𝑗 ( 𝑡 ) | equals to 1 only when 𝑡 equals to the multiples of 2 𝜋𝜎 𝑚 . For 𝑡 ∈ [ 𝜎 𝑚 / , 𝜎 𝑚 𝜋 ] , | 𝜙 𝑗 ( 𝑡 ) | is boundedaway from 1, and there exists a constant 𝜂 < | 𝜙 𝑗 ( 𝑡 ) | ≤ 𝜂 . Hence, | 𝜙 ( 𝑡 ) | ≤ 𝜂 𝜌𝑚 . By choosing 𝑀 to be sufficiently large, we may bound the integral for 𝑚 > 𝑀 : ∫ 𝜋𝜎 𝑚 𝜎 𝑚 / | 𝜙 ( 𝑡 ) | 𝑑𝑡 ≤ ∫ 𝜋𝜎 𝑚 𝜎 𝑚 / 𝜂 𝜌𝑚 𝑑𝑡 ≤ 𝜋𝜎 𝑚 𝜂 𝜌𝑚 ≤ 𝑚𝜂 𝜌𝑚 ≤ 𝜀 . Finally, we bound the second integral of (20). By the definition of 𝑋 𝑗 , we have 𝜙 𝑗 ( 𝑡 ) = 𝑟𝑒 𝑖𝑡 · 𝑐𝑗𝜎𝑚 + ( − 𝑟 ) = 𝑟 · (cid:18) cos 𝑐 𝑗 𝑡𝜎 𝑚 + 𝑖 · sin 𝑐 𝑗 𝑡𝜎 𝑚 (cid:19) + − 𝑟, where the last identity uses Euler’ formula. Take the modulo of both sides, | 𝜙 𝑗 ( 𝑡 ) | = s 𝑟 sin 𝑐 𝑗 𝑡𝜎 𝑚 + (cid:18) 𝑟 𝑐 𝑗 𝑡𝜎 𝑚 + − 𝑟 (cid:19) = r 𝑟 sin 𝑐 𝑗 𝑡𝜎 𝑚 + 𝑟 cos 𝑐 𝑗 𝑡𝜎 𝑚 + ( − 𝑟 ) + 𝑟 ( − 𝑟 ) cos 𝑐 𝑗 𝑡𝜎 𝑚 = r 𝑟 + ( − 𝑟 ) + 𝑟 ( − 𝑟 ) cos 𝑐 𝑗 𝑡𝜎 𝑚 = s − 𝑟 ( − 𝑟 ) (cid:18) − cos 𝑐 𝑗 𝑡𝜎 𝑚 (cid:19) = − 𝑟 ( − 𝑟 ) (cid:18) − cos 𝑐 𝑗 𝑡𝜎 𝑚 (cid:19) − 𝑟 ( − 𝑟 ) (cid:18) − cos 𝑐 𝑗 𝑡𝜎 𝑚 (cid:19) − . . . , √ + 𝑦 when | 𝑦 | ≤
1. We can also Taylorexpand cos 𝑐 𝑗 𝑡𝜎 𝑚 as 1 − 𝑐 𝑗 𝑡 𝜎 𝑚 + 𝑐 𝑗 𝑡 𝜎 𝑚 − 𝑐 𝑗 𝑡 𝜎 𝑚 + . . . Observe that if 𝑐 𝑗 𝑡𝜎 𝑚 <
1, then we can bound cos 𝑐 𝑗 𝑡𝜎 𝑚 from above by 1 − 𝑐 𝑗 𝑡 𝜎 𝑚 . Furthermore, if we keep onlythe first order term from the expansion for | 𝜙 𝑗 ( 𝑡 ) | , we have | 𝜙 𝑗 ( 𝑡 ) | ≤ − 𝑟 ( − 𝑟 ) 𝑐 𝑗 𝑡 𝜎 𝑚 ≤ exp − 𝑟 ( − 𝑟 ) 𝑐 𝑗 𝑡 𝜎 𝑚 ! . (21)Note that (21) only holds if 𝑐 𝑗 𝑡 < 𝜎 𝑚 . However, if we fix 𝜎 𝑚 then for every 𝑡 ∈ [ 𝐴, 𝜎 𝑚 / ] , there alwaysexists a real number 𝑢 ( 𝑡 ) ≥ 𝑢 ( 𝑡 ) 𝑡 < 𝜎 𝑚 ≤ 𝑢 ( 𝑡 ) 𝑡 , which implies for all 𝑐 𝑗 ≤ 𝑢 ( 𝑡 ) , (21) can beestablished. Now we aggregate all | 𝜙 𝑗 ( 𝑡 ) | for which we could claim (21). | 𝜙 ( 𝑡 ) | ≤ Ö 𝑗 : 𝑐 𝑗 ≤ 𝑢 ( 𝑡 ) | 𝜙 𝑗 ( 𝑡 ) | ≤ exp © « − 𝑟 ( − 𝑟 ) Õ 𝑗 : 𝑐 𝑗 ≤ 𝑢 ( 𝑡 ) 𝑐 𝑗 𝑡 𝜎 𝑚 ª®¬ . (22)Without loss of generality, we assume 𝐴 >
1. Consequently, 𝑢 ( 𝑡 ) ≤ 𝜎 𝑚 . Lemma A.2 implies for 𝜎 𝑚 ≥ 𝑢 ( 𝑡 ) ≥ Í 𝑗 : 𝑐 𝑗 ≤ 𝑢 ( 𝑡 ) 𝑐 𝑗 ≥ 𝑢 ( 𝑡 ) 𝜎 𝑚 / 𝑟 ( − 𝑟 ) . Plugging this inequality into (22), we obtain | 𝜙 ( 𝑡 ) | ≤ exp (cid:18) − 𝑢 ( 𝑡 ) 𝜎 𝑚 𝑡 𝜎 𝑚 (cid:19) = exp (cid:18) − 𝑡 · 𝑢 ( 𝑡 ) 𝑡𝜎 𝑚 (cid:19) ≤ exp (cid:16) − 𝑡 (cid:17) . Therefore, for sufficiently large 𝐴 , ∫ 𝜎 𝑚 / 𝐴 | 𝜙 ( 𝑡 ) | 𝑑𝑡 ≤ ∫ ∞ 𝐴 exp (cid:16) − 𝑡 (cid:17) 𝑑𝑡 ≤ 𝑒 − 𝐴 / ≤ 𝜀 . Thus we established (19) and the proof is complete. (cid:3)
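As a sanity check on the characteristic-function estimates above, the modulus |𝜙_𝑗(𝑡)| = √(1 − 2𝑟(1 − 𝑟)(1 − cos(𝑐_𝑗𝑡/𝜎_𝑚))) can be compared numerically with a Gaussian-type upper bound of the form exp(−𝑟(1 − 𝑟)𝑐_𝑗²𝑡²/(4𝜎_𝑚²)) in the regime 𝑐_𝑗𝑡 < 𝜎_𝑚. The constant 4 in the exponent is an assumption standing in for the elided constant in (21), and all parameter values below are hypothetical:

import math

r, sigma_m = 0.5, 200.0
for c_j in (1.0, 5.0, 20.0):
    for t in (0.5, 2.0, 8.0):
        if c_j * t >= sigma_m:          # the bound is only claimed in this regime
            continue
        theta = c_j * t / sigma_m
        modulus = math.sqrt(1 - 2 * r * (1 - r) * (1 - math.cos(theta)))
        bound = math.exp(-r * (1 - r) * c_j ** 2 * t ** 2 / (4 * sigma_m ** 2))
        assert modulus <= bound + 1e-12
        print(c_j, t, round(modulus, 6), round(bound, 6))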
B A proof missing from Section 3
Proof of Claim 3.6.
We first show the following inequality:1 . 𝑔 𝑗 ( 𝑛 ) + 𝑔 𝑗 + ( 𝑛 ) ≤ . 𝑔 𝑗 + ( 𝑛 ) . (23)Note that by direction computation1 . 𝑔 𝑗 ( 𝑛 ) + 𝑔 𝑗 + ( 𝑛 ) = . ( log 𝐵 + log ( log 𝑔 𝑗 ( 𝑛 )) ) + log 𝑔 𝑗 ( 𝑛 ) log 𝑔 𝑗 ( 𝑛 ) log 𝑔 𝑗 + ( 𝑛 ) . From the definition of 𝐾 , we know that 𝑔 𝑗 ( 𝑛 ) > 𝐵 for all 𝑗 < 𝐾 . Hence, log 𝐵 < log 𝑔 𝑗 ( 𝑛 ) / . In addi-tion, recall that 𝐵 is such that ∀ 𝑥 ≥ 𝐵 , we have 𝑥 ≥ ( log 𝑥 ) ; therefore, 𝑔 𝑗 ( 𝑛 ) ≥ ( log 𝑔 𝑗 ( 𝑛 )) . Then,log ( log 𝑔 𝑗 ( 𝑛 )) ≤ log 𝑔 𝑗 ( 𝑛 ) / . Putting all these together,1 . ( log 𝐵 + log ( log 𝑔 𝑗 ( 𝑛 )) ) + log 𝑔 𝑗 ( 𝑛 ) log 𝑔 𝑗 ( 𝑛 ) log 𝑔 𝑗 + ( 𝑛 ) ≤ . ( log 𝑔 𝑗 ( 𝑛 ) + log 𝑔 𝑗 ( 𝑛 )) + log 𝑔 𝑗 ( 𝑛 ) log 𝑔 𝑗 ( 𝑛 ) log 𝑔 𝑗 + ( 𝑛 ) = . 𝑔 𝑗 ( 𝑛 ) log 𝑔 𝑗 ( 𝑛 ) log 𝑔 𝑗 + ( 𝑛 ) = . 𝑔 𝑗 + ( 𝑛 ) 𝑖 =
0) holds trivially. For the inductive step note that 𝑗 + Ö 𝑖 = (cid:18) − 𝑐 log 𝑔 𝑖 ( 𝑛 ) (cid:19) = (cid:18) − 𝑐 log 𝑔 𝑗 + ( 𝑛 ) (cid:19) 𝑗 Ö 𝑖 = (cid:18) − 𝑐 log 𝑔 𝑖 ( 𝑛 ) (cid:19) ≥ (cid:18) − 𝑐 log 𝑔 𝑗 + ( 𝑛 ) (cid:19) (cid:18) − . 𝑐 log 𝑔 𝑗 ( 𝑛 ) (cid:19) ≥ − 𝑐 (cid:18) . 𝑔 𝑗 ( 𝑛 ) + 𝑔 𝑗 + ( 𝑛 ) (cid:19) ≥ − . 𝑐 log 𝑔 𝑗 + ( 𝑛 ) , where the last inequality follows from (23).For part (ii) we also use induction. The base case ( 𝑖 =
0) can be checked straightforwardly. For theinductive step, 𝑗 + Õ 𝑖 = 𝑔 𝑖 ( 𝑛 ) ≤ 𝑔 𝑗 + ( 𝑛 ) + 𝑗 Õ 𝑖 = 𝑔 𝑖 ( 𝑛 ) ≤ 𝑔 𝑗 + ( 𝑛 ) + . 𝑔 𝑗 ( 𝑛 ) ≤ . 𝑔 𝑖 ( 𝑛 ) , where the last inequality follows from (23). (cid:3) C Random Graphs Estimates
In this section, we provide additional random graph facts that are useful for our proofs, together withproofs of lemmas which do not appear in the literature.Recall 𝐺 ∼ 𝐺 ( 𝑛, + 𝜆𝑛 − / 𝑛 ) , where 𝜆 = 𝜆 ( 𝑛 ) may depend on 𝑛 . Both of Lemmas 2.15 and 2.16 are provedusing the following precise estimates on the moments of the number of trees of a given size in 𝐺 . We notethat similar estimates can be found in the literature (see, e.g., [29, 1]); a proof is included for completeness. Claim C.1.
Let 𝑡 𝑘 be the number of trees of size 𝑘 in 𝐺 . Suppose there exists a positive increasing function 𝑔 such that 𝑔 ( 𝑛 ) → ∞ , | 𝜆 | ≤ 𝑔 ( 𝑛 ) and 𝑖, 𝑗, 𝑘 ≤ 𝑛 / 𝑔 ( 𝑛 ) . If 𝑖, 𝑗, 𝑘 → ∞ as 𝑛 → ∞ , then:(i) E [ 𝑡 𝑘 ] = Θ (cid:16) 𝑛𝑘 / (cid:17) ;(ii) Var ( 𝑡 𝑘 ) ≤ E [ 𝑡 𝑘 ] + ( + 𝑜 ( )) 𝜆𝑛 / 𝜋𝑘 ;(iii) For 𝑖 ≠ 𝑗 , Cov ( 𝑡 𝑖 , 𝑡 𝑗 ) ≤ ( + 𝑜 ( )) 𝜆𝑛 / 𝜋𝑖 / 𝑗 / . To prove Lemma 2.15, we also use the following result.
Lemma C.2.
Suppose 𝜀 𝑛 → ∞ and 𝜀 = 𝑜 ( ) . Then w.h.p. the largest component of 𝐺 ∼ 𝐺 (cid:0) 𝑛, + 𝜀𝑛 (cid:1) is theonly component of 𝐺 which contains more than one cycle. Also, w.h.p. the number of vertices contained in theunicyclic components of 𝐺 is less than 𝜔 ( 𝑛 ) 𝜀 − for any function 𝜔 ( 𝑛 ) → ∞ .Proof. An equivalent result was established in [26] for the 𝐺 ( 𝑛, 𝑀 ) model, in which exactly 𝑀 edges arechosen independently at random from the set of all (cid:0) 𝑛 (cid:1) possible edges (see Theorem 7 in [26]). The resultfollows from the asymptotic equivalence between the 𝐺 ( 𝑛, 𝑝 ) and 𝐺 ( 𝑛, 𝑀 ) models when 𝑀 = (cid:0) 𝑛 (cid:1) 𝑝 (see,e.g., Proposition 1.12 in [22]). (cid:3) roof of Lemma 2.15. First, we consider the case when | 𝜆 | is large. If 𝜆 < | 𝜆 | = Ω ( 𝜔 ( 𝑛 ) / ) , thenLemma 2.12 implies thatE (cid:20)Õ 𝑗 : 𝐿 𝑗 ( 𝐺 ) ≤ 𝐵 𝜔 𝐿 𝑗 ( 𝐺 ) (cid:21) ≤ E [R ( 𝐺 )] = 𝑂 (cid:18) 𝑛𝜆𝑛 − / (cid:19) = 𝑂 (cid:18) 𝑛 / 𝜔 ( 𝑛 ) / (cid:19) . Similarly, if 𝜆 > 𝜆 = Ω ( 𝜔 ( 𝑛 ) / ) , then Lemma 2.13 implies that E [R ( 𝐺 )] = 𝑂 ( 𝑛 / 𝜔 ( 𝑛 ) − / ) . Wemay assume 𝐿 ( 𝐺 ) ≤ 𝐵 𝜔 since otherwise the size of the largest component does not contribute to the sum.Then, E (cid:20)Õ 𝑗 : 𝐿 𝑗 ( 𝐺 ) ≤ 𝐵 𝜔 𝐿 𝑗 ( 𝐺 ) (cid:21) ≤ E [R ( 𝐺 )] + 𝐵 𝜔 = 𝑂 (cid:18) 𝑛 / 𝜔 ( 𝑛 ) / (cid:19) . Hence, if | 𝜆 | = Ω ( 𝜔 ( 𝑛 ) − / ) , the result simply follows from Markov’s inequality.Suppose next | 𝜆 | ≤ p 𝜔 ( 𝑛 ) . Let 𝑡 𝑘 be the number of trees of size 𝑘 in 𝐺 and let T 𝐵 𝜔 be the set of treesof size at most 𝐵 𝜔 in 𝐺 . By Claim C.1.i,E hÕ 𝜏 ∈T 𝐵𝜔 | 𝜏 | i = 𝐵 𝜔 Õ 𝑘 = 𝑘 E [ 𝑡 𝑘 ] = 𝑂 ( 𝑛𝜔 ( 𝑛 ) ) + 𝐵 𝜔 Õ 𝑘 = ⌊ 𝜔 ( 𝑛 ) ⌋ 𝑘 E [ 𝑡 𝑘 ] = 𝑂 ( 𝑛𝜔 ( 𝑛 ) ) + 𝑂 ( 𝑛 ) 𝐵 𝜔 Õ 𝑘 = ⌊ 𝜔 ( 𝑛 ) ⌋ 𝑘 / = 𝑂 (cid:18) 𝑛 / 𝜔 ( 𝑛 ) / (cid:19) . By Markov’s inequality, we get that Í 𝜏 ∈T 𝐵𝜔 | 𝜏 | = 𝑂 ( 𝑛 / 𝜔 ( 𝑛 ) − / ) with at least constant probability.All that is left to prove is that the contribution from complex (non-tree) components is small. When | 𝜆 | = 𝑂 ( ) , this follows immediately from the fact that the expected number of complex components is 𝑂 ( ) (see, e.g., Lemma 2.1 in [27]). Then, if C 𝐵 𝜔 is the set of complex components in 𝐺 of size at most 𝐵 𝜔 ,we have E hÕ 𝐶 ∈C 𝐵𝜔 | 𝐶 | i = 𝑂 (cid:18) 𝑛 / 𝜔 ( 𝑛 ) (cid:19) E (cid:2)(cid:12)(cid:12) C 𝐵 𝜔 (cid:12)(cid:12)(cid:3) = 𝑂 (cid:18) 𝑛 / 𝜔 ( 𝑛 ) (cid:19) . When | 𝜆 | → ∞ , Lemma C.2 implies that w.h.p. there is no multicyclic component except the largestcomponent and that the number of vertices in unicyclic components is bounded by 𝑛 / 𝑔 ( 𝑛 )/ 𝜆 , for anyfunction 𝑔 ( 𝑛 ) → ∞ . Hence, Õ 𝐶 ∈C 𝐵𝜔 | 𝐶 | ≤ 𝑛 / 𝑔 ( 𝑛 ) 𝜆 + 𝐵 𝜔 . Setting 𝑔 ( 𝑛 ) = 𝜆 , it follows that w.h.p. Õ 𝐶 ∈C 𝐵𝜔 | 𝐶 | ≤ 𝐵 𝜔 (cid:18) 𝑛 / 𝑔 ( 𝑛 ) 𝜆 + 𝐵 𝜔 (cid:19) ≤ 𝑛 / 𝜔 ( 𝑛 ) . (cid:3) Proof of Lemma 2.16.
Proof of Lemma 2.16. Let $T_B$ be the number of trees in $G$ with size in the interval $[B, 2B]$; then $|S_B| \ge T_B$. By Chebyshev's inequality, for $a > 0$:
\[ \Pr\big[ T_B \le \mathrm{E}[T_B] - a\sigma \big] \le \frac{1}{a^2}, \]
where $\sigma^2 = \mathrm{Var}(T_B)$. By Claim C.1(i),
\[ \mathrm{E}[T_B] = \sum_{k=B}^{2B} \mathrm{E}[t_k] \ge \frac{c_1 n}{B^{3/2}} \]
for some constant $c_1 > 0$. Now,
\[ \mathrm{Var}(T_B) = \sum_{k=B}^{2B} \mathrm{Var}(t_k) + \sum_{j \neq i :\, i, j \in [B, 2B]} \mathrm{Cov}(t_i, t_j). \]
By Claims C.1(i) and C.1(ii),
\[ \sum_{k=B}^{2B} \mathrm{Var}(t_k) \le \sum_{k=B}^{2B} \mathrm{E}[t_k] + \sum_{k=B}^{2B} (1+o(1))\, \frac{\lambda n^{2/3}}{2\pi k^3} = O\Big(\frac{n}{B^{3/2}}\Big) + O\Big(\frac{|\lambda| n^{2/3}}{B^2}\Big) = O\Big(\frac{n}{B^{3/2}}\Big), \]
where in the last equality we used the assumption that $\lambda = o(n^{1/3})$. Similarly, by Claim C.1(iii),
\[ \sum_{j \neq i :\, i, j \in [B, 2B]} \mathrm{Cov}(t_i, t_j) \le \sum_{j \neq i :\, i, j \in [B, 2B]} (1+o(1))\, \frac{\lambda n^{2/3}}{2\pi i^{3/2} j^{3/2}} \le (1+o(1))\, \frac{|\lambda| n^{2/3}}{2\pi B} = O\Big(\frac{n}{B^{3/2}}\Big), \]
where the last inequality follows from the assumption that $B \le n^{2/3}/g(n)^2$ (together with $|\lambda| \le g(n)$). Hence, for a suitable constant $c_2 > 0$, $\mathrm{Var}(T_B) \le c_2 n/B^{3/2}$, and taking $a = \frac{c_1 n}{2 B^{3/2} \sigma}$ we get
\[ \Pr\Big[ |S_B| \le \frac{c_1 n}{2 B^{3/2}} \Big] \le \Pr\Big[ T_B \le \frac{c_1 n}{2 B^{3/2}} \Big] \le \Big( \frac{2 B^{3/2} \sigma}{c_1 n} \Big)^2 \le \frac{4 c_2 B^{3/2}}{c_1^2 n}, \]
as desired. □
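The concentration phenomenon behind this second-moment argument can be probed numerically. The sketch below is only an illustration under our own parameter choices (it again assumes networkx); it estimates $T_B$, the number of tree components with size in $[B, 2B]$, over independent near-critical samples and compares it with the scale $n/B^{3/2}$. By the argument above, $T_B$ should be at least a constant fraction of that scale with probability $1 - O(B^{3/2}/n)$.

```python
import networkx as nx

def tree_count_in_window(n, lam, B, seed):
    """Count tree components of G(n, (1 + lam*n^{-1/3})/n) with size in [B, 2B]."""
    p = (1 + lam * n ** (-1 / 3)) / n
    G = nx.fast_gnp_random_graph(n, p, seed=seed)
    return sum(
        1
        for comp in nx.connected_components(G)
        if B <= len(comp) <= 2 * B
        and G.subgraph(comp).number_of_edges() == len(comp) - 1
    )

n, lam, B = 100_000, 0.5, 40          # B well below n^{2/3} (about 2154 here)
samples = [tree_count_in_window(n, lam, B, seed=s) for s in range(20)]
scale = n / B ** 1.5                  # the n/B^{3/2} scale from Lemma 2.16
print(min(samples), sum(samples) / len(samples), scale)
```

Only tree components are counted, matching the definition of $T_B$ in the proof; near criticality the components in this size window are overwhelmingly trees, so the restriction should change the counts very little.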
Proof of Claim C.1. Let $c = 1 + \lambda n^{-1/3}$. The following combinatorial identity follows immediately from the fact that there are exactly $k^{k-2}$ trees of size $k$ (Cayley's formula):
\[ \mathrm{E}[t_k] = \binom{n}{k} k^{k-2} \Big(\frac{c}{n}\Big)^{k-1} \Big(1 - \frac{c}{n}\Big)^{k(n-k) + \binom{k}{2} - k + 1}. \]
Using the Taylor expansion for $\ln(1-x)$ and the fact that $k = o(n^{3/4})$, we get
\[ \frac{n!}{(n-k)!} = n^k \prod_{i=1}^{k-1} \Big(1 - \frac{i}{n}\Big) = n^k \exp\Big( -\frac{k^2}{2n} - \frac{k^3}{6n^2} + o(1) \Big). \quad (24) \]
Similarly,
\[ \Big(\frac{c}{n}\Big)^{k-1} = n^{1-k} \exp\Big( \frac{\lambda k}{n^{1/3}} - \frac{\lambda^2 k}{2 n^{2/3}} + o(1) \Big), \]
\[ \Big(1 - \frac{c}{n}\Big)^{k(n-k) + \binom{k}{2} - k + 1} = \exp\Big( -k - \frac{\lambda k}{n^{1/3}} + \frac{k^2}{2n} + \frac{\lambda k^2}{2 n^{4/3}} + o(1) \Big). \]
Since $k \to \infty$, Stirling's approximation gives
\[ \frac{k^{k-2}}{k!} = (1+o(1))\, \frac{e^k}{\sqrt{2\pi}\, k^{5/2}}. \quad (25) \]
Putting all these bounds together, we get
\[ \mathrm{E}[t_k] = (1+o(1))\, \frac{n}{\sqrt{2\pi}\, k^{5/2}} \exp\Big( -\frac{\lambda^2 k}{2 n^{2/3}} + \frac{\lambda k^2}{2 n^{4/3}} - \frac{k^3}{6 n^2} \Big) = \Theta\Big( \frac{n}{k^{5/2}} \Big), \quad (26) \]
where in the last equality we used the assumptions that $|\lambda| \le g(n)$ and $k \le n^{2/3}/g(n)^2$. This establishes part (i).

For part (ii) we proceed in similar fashion, starting instead from the following combinatorial identity:
\[ \mathrm{E}[t_k(t_k - 1)] = \frac{n!}{k!\, k!\, (n-2k)!} \big(k^{k-2}\big)^2 \Big(\frac{c}{n}\Big)^{2k-2} \Big(1 - \frac{c}{n}\Big)^{m}, \]
where $m = 2\binom{k}{2} - 2(k-1) + k^2 + 2k(n-2k)$ (see, e.g., [29]). Using the Taylor expansion for $\ln(1-x)$, we get
\[ \frac{n!}{(n-2k)!} = n^{2k} \exp\Big( -\frac{2k^2}{n} - \frac{4k^3}{3n^2} + o(1) \Big), \]
\[ \Big(\frac{c}{n}\Big)^{2k-2} = n^{2-2k} \exp\Big( \frac{2\lambda k}{n^{1/3}} - \frac{\lambda^2 k}{n^{2/3}} + o(1) \Big), \]
\[ \Big(1 - \frac{c}{n}\Big)^{m} = \exp\Big( -2k + \frac{2k^2}{n} - \frac{2\lambda k}{n^{1/3}} + \frac{2\lambda k^2}{n^{4/3}} + o(1) \Big). \]
These three bounds together with (25) imply
\[ \mathrm{E}[t_k(t_k-1)] = (1+o(1))\, \frac{n^2}{2\pi k^5} \exp\Big( -\frac{4k^3}{3n^2} - \frac{\lambda^2 k}{n^{2/3}} + \frac{2\lambda k^2}{n^{4/3}} \Big). \]
From (26), we get
\[ \mathrm{E}[t_k]^2 = (1+o(1))\, \frac{n^2}{2\pi k^5} \exp\Big( -\frac{\lambda^2 k}{n^{2/3}} + \frac{\lambda k^2}{n^{4/3}} - \frac{k^3}{3n^2} \Big). \]
Hence,
\[ \mathrm{Var}(t_k) = \mathrm{E}[t_k] + (1+o(1))\, \frac{n^2}{2\pi k^5} \exp\Big( -\frac{\lambda^2 k}{n^{2/3}} + \frac{\lambda k^2}{n^{4/3}} - \frac{k^3}{3n^2} \Big) \Big[ \exp\Big( \frac{\lambda k^2}{n^{4/3}} - \frac{k^3}{n^2} \Big) - 1 \Big] \]
\[ = \mathrm{E}[t_k] + (1+o(1))\, \frac{n^2}{2\pi k^5} \Big[ \exp\Big( \frac{\lambda k^2}{n^{4/3}} - \frac{k^3}{n^2} \Big) - 1 \Big] \le \mathrm{E}[t_k] + (1+o(1))\, \frac{n^2}{2\pi k^5} \Big[ \exp\Big( \frac{\lambda k^2}{n^{4/3}} \Big) - 1 \Big] \le \mathrm{E}[t_k] + (1+o(1))\, \frac{\lambda n^{2/3}}{2\pi k^3}, \]
where in the second equality we used the assumptions that $|\lambda| \le g(n)$ and $k \le n^{2/3}/g(n)^2$, and for the last inequality we used the Taylor expansion for $e^x$. This completes the proof of part (ii).

For part (iii), let $\ell = i + j$. When $i \neq j$ we have the following combinatorial identity (see, e.g., [29]):
\[ \mathrm{E}[t_i t_j] = \frac{n!}{i!\, j!\, (n-\ell)!}\, i^{i-2} j^{j-2} \Big(\frac{c}{n}\Big)^{\ell - 2} \Big(1 - \frac{c}{n}\Big)^{m'}, \]
where $m' = \binom{i}{2} - (i-1) + \binom{j}{2} - (j-1) + ij + \ell(n - \ell)$. Using Taylor expansions and Stirling's approximation as in the previous two parts, we get
\[ \mathrm{E}[t_i t_j] = (1+o(1))\, \frac{n^2}{2\pi i^{5/2} j^{5/2}} \exp\Big( -\frac{\ell^3}{6n^2} - \frac{\lambda^2 \ell}{2 n^{2/3}} + \frac{\lambda \ell^2}{2 n^{4/3}} \Big). \]
Moreover, from (26) we have
\[ \mathrm{E}[t_i]\, \mathrm{E}[t_j] = (1+o(1))\, \frac{n^2}{2\pi i^{5/2} j^{5/2}} \exp\Big( -\frac{\lambda^2 \ell}{2 n^{2/3}} + \frac{\lambda (i^2 + j^2)}{2 n^{4/3}} - \frac{i^3 + j^3}{6 n^2} \Big), \]
and so
\[ \mathrm{Cov}(t_i, t_j) = \mathrm{E}[t_i t_j] - \mathrm{E}[t_i]\, \mathrm{E}[t_j] = (1+o(1))\, \frac{n^2}{2\pi i^{5/2} j^{5/2}} \exp\Big( -\frac{\ell^3}{6n^2} - \frac{\lambda^2 \ell}{2 n^{2/3}} + \frac{\lambda \ell^2}{2 n^{4/3}} \Big) \Big[ 1 - \exp\Big( -\frac{\lambda i j}{n^{4/3}} + \frac{i j \ell}{2 n^2} \Big) \Big] \]
\[ = (1+o(1))\, \frac{n^2}{2\pi i^{5/2} j^{5/2}} \Big[ 1 - \exp\Big( -\frac{\lambda i j}{n^{4/3}} + \frac{i j (i+j)}{2 n^2} \Big) \Big] \le (1+o(1))\, \frac{\lambda n^{2/3}}{2\pi i^{3/2} j^{3/2}}, \]
where in the third equality we used the assumptions that $|\lambda| \le g(n)$ and $i, j \le n^{2/3}/g(n)^2$, and the last inequality follows from the Taylor expansion for $e^x$. □
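The exact identity at the start of this proof also lends itself to a direct numerical check of part (i). The following self-contained Python sketch (our own; the function name is arbitrary) evaluates $\log \mathrm{E}[t_k]$ from the identity in log-space, to avoid overflow, and compares it with the asymptotic value $n/(\sqrt{2\pi}\, k^{5/2})$; the ratio should remain bounded as long as $k$ stays well below $n^{2/3}$.

```python
from math import exp, lgamma, log, log1p, pi, sqrt

def log_mean_tk(n, k, lam):
    """log E[t_k] for G(n, c/n) with c = 1 + lam * n^{-1/3}, from the identity
    E[t_k] = binom(n,k) * k^(k-2) * (c/n)^(k-1) * (1-c/n)^(k(n-k)+C(k,2)-k+1)."""
    c = 1.0 + lam * n ** (-1 / 3)
    log_binom = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    non_edges = k * (n - k) + k * (k - 1) // 2 - k + 1  # required absent edges
    return (log_binom + (k - 2) * log(k)
            + (k - 1) * log(c / n) + non_edges * log1p(-c / n))

n, lam = 10**9, 1.0
for k in (10**3, 10**4, 10**5):       # all well below n^{2/3} = 10^6
    ratio = exp(log_mean_tk(n, k, lam)) / (n / (sqrt(2 * pi) * k**2.5))
    print(k, round(ratio, 4))         # stays Theta(1), as Claim C.1(i) asserts
```

As $k$ approaches $n^{2/3}$ the exponential correction in (26) becomes significant and the ratio drifts away from $1$, which is exactly the regime excluded by the hypothesis $k \le n^{2/3}/g(n)^2$.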