A new notion of commutativity for the algorithmic Lovász Local Lemma
David G. Harris, University of Maryland, College Park. [email protected]
Fotis Iliopoulos*, Institute for Advanced Study. [email protected]
Vladimir Kolmogorov†, Institute of Science and Technology Austria. [email protected]

* This material is based upon work directly supported by the IAS Fund for Math and indirectly supported by the National Science Foundation Grant No. CCF-1900460. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. This work is also supported by the National Science Foundation Grant No. CCF-1815328.
† Supported by the European Research Council under the European Union's Seventh Framework Programme (FP7/2007-2013) / ERC grant agreement no. 616160.
August 14, 2020
Abstract
The Lovász Local Lemma (LLL) is a powerful tool in probabilistic combinatorics which can be used to establish the existence of objects that satisfy certain properties. The breakthrough paper of Moser and Tardos and follow-up works revealed that the LLL has intimate connections with a class of stochastic local search algorithms for finding such desirable objects. In particular, it can be seen as a sufficient condition for this type of algorithm to converge fast.

Besides conditions for existence of and fast convergence to desirable objects, one may naturally ask further questions regarding properties of these algorithms. For instance, "are they parallelizable?", "how many solutions can they output?", "what is the expected 'weight' of a solution?", etc. These questions and more have been answered for a class of LLL-inspired algorithms called commutative. In this paper we introduce a new, very natural and more general notion of commutativity (essentially matrix commutativity) which allows us to show a number of new refined properties of LLL-inspired local search algorithms with significantly simpler proofs.
Introduction

The Lovász Local Lemma (LLL) is a powerful tool in probabilistic combinatorics which can be used to establish the existence of objects that satisfy certain properties [7]. At a high level, it states that given a collection of bad events in a probability space µ, if each bad-event is not too likely and, further, is independent of most other bad events, then the probability of avoiding all of them is strictly positive.

In its simplest, "symmetric" form, it states that if each bad-event has probability at most p and is dependent with at most d others, where epd ≤ 1, then with positive probability no bad-events become true. In particular, a configuration avoiding all the bad-events exists. Although the LLL applies to general probability spaces, most constructions in combinatorics use a simpler setting we refer to as the variable version LLL. Here, the probability space µ is a cartesian product with n independent variables, and each bad-event is determined by a subset of the variables. Two bad-events conflict if they depend on a common variable.

For example, consider a SAT formula φ on n variables with clauses c_1, c_2, ..., c_m, such that each clause has k literals and each variable appears in at most L clauses. For each clause c_i we can define the bad event B_i that c_i is violated in a uniformly-at-random chosen assignment of the variables of φ. For such a uniform assignment, each bad-event has probability p = 2^{−k} and affects at most d ≤ kL others. So when L ≤ 2^k/(ek), the formula φ is satisfiable.

A generalization known as the Lopsided LLL (LLLL) allows bad-events to be positively correlated with others; this is as good as independence for the purposes of the LLL. Some notable probability spaces satisfying the LLLL include the uniform distribution on permutations and the variable setting, where two bad-events B, B′ are dependent only if they disagree on the value of a common variable.
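To make the k-SAT example concrete, here is a minimal Python check of the symmetric condition epd ≤ 1 (the specific values of k and L below are hypothetical, chosen only for illustration):

```python
import math

def symmetric_lll_holds(p: float, d: float) -> bool:
    """Symmetric LLL: bad-event probability p, dependency degree d, e*p*d <= 1."""
    return math.e * p * d <= 1

k, L = 10, 25                     # toy parameters: clause width and variable degree
p, d = 2.0 ** -k, k * L           # p = 2^-k and d <= k*L, as in the text
print(symmetric_lll_holds(p, d))  # True, since L = 25 <= 2^k/(e*k) ~ 37.7
```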
In a seminal work, Moser and Tardos [22] presented a simple local search algorithm to make the variable-version LLLL constructive. This algorithm can be described as follows:

Algorithm 1 The Moser-Tardos resampling algorithm
  Draw the state σ from distribution µ
  while some bad-event B is true on σ do
    Select, arbitrarily, a bad-event B true on σ
    Resample, according to distribution µ, all variables in σ affecting B

Moser & Tardos showed that if the symmetric LLL criterion (or the more general asymmetric criterion) is satisfied, then this algorithm quickly converges. Following this work, a large amount of effort has been devoted to making different variants of the LLL constructive. This research has taken many directions, including further analysis of Algorithm 1 and its connection to different LLL criteria [4, 19, 20, 23].

One particular line of research has been to view this type of local search algorithm as an important random process in its own right. With some variants, it can be applied to general probability spaces beyond the variable LLL, such as random permutations or random matchings of the complete graph [1, 2, 14, 18, 16]. It can also be applied in settings which are not directly connected to the LLL itself [3, 5, 15].
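The following is a minimal runnable sketch of Algorithm 1 specialized to CNF formulas; the tiny instance at the bottom is a hypothetical example, not taken from the paper:

```python
import random

def moser_tardos(n, clauses, rng=random.Random(0)):
    """Algorithm 1 for CNF: each clause is a list of signed ints, where a
    positive literal i means x_i = True. Returns a satisfying assignment."""
    x = [rng.random() < 0.5 for _ in range(n)]            # draw sigma from mu
    def violated(c):
        return all(x[abs(l) - 1] != (l > 0) for l in c)   # every literal false
    while True:
        bad = [c for c in clauses if violated(c)]
        if not bad:
            return x
        for l in bad[0]:                                  # arbitrary selection rule
            x[abs(l) - 1] = rng.random() < 0.5            # resample its variables

print(moser_tardos(3, [[1, 2], [-1, 3], [-2, -3]]))
```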
At a high level of generality, we can summarize these local search algorithms in the following framework. We have a discrete state space Ω, with a collection F of subsets (which we sometimes call flaws) of Ω. We also have some problem-specific resampling oracle R_f for each flaw; it takes some action to potentially "address" that flaw, resulting in a new state σ′ = R_f(σ, r) (here, r denotes a random seed passed to R_f). With these ingredients, we define the general local search algorithm as follows:

Algorithm 2 The general local search algorithm
  Draw the state σ from some distribution θ
  while some flaw f holds on σ do
    Select a flaw f of σ, according to some rule S
    Draw r from some probability distribution R_f
    Update σ ← R_f(σ, r)

Besides conditions for existence of and fast convergence to perfect objects, one may naturally ask further questions regarding properties of Algorithm 2. For instance, "is it parallelizable?", "how many solutions can it output?", "what is the expected 'weight' of a solution?", etc. These questions and more have been answered for the Moser-Tardos (MT) algorithm in a long series of papers [4, 6, 8, 9, 11, 13, 19, 22]. As a prominent example, the result of Haeupler, Saha and Srinivasan [9], as well as follow-up work of Harris and Srinivasan [12, 13], allows one to argue about the dynamics of the MT process, resulting in several new applications such as estimating the entropy of the output distribution, partially avoiding bad events, and dealing with super-polynomially many bad events.

We note one important difference between Algorithm 1 and Algorithm 2: the choice of which flaw to resample, if multiple flaws are simultaneously true. For the original MT algorithm, there is nearly complete freedom for this. For general local search algorithms, this is much more constrained. Only a few relatively rigid rules are allowed, such as selecting the flaw of least index [16].

In [21], Kolmogorov identified a general property of resampling oracles, called commutativity, that allows a free choice of resampling rule. The commutativity property and the free choice for S, while seemingly a minor detail, turn out to play a key role in extending the additional algorithmic properties of the MT algorithm to the general setting of Algorithm 2. For instance, it leads to parallel algorithms [21] and to bounds on the output distribution [17]. The main contribution of this paper is to introduce a more natural notion of commutativity, essentially matrix commutativity, that is also more general than the notion of [21]. Using this new definition, we recover all the results of [21, 17, 12] with much simpler proofs.

We will formally describe our new definition in Section 2.1, but let us provide some intuition here. Imagine that for each flaw f we define an |Ω| × |Ω| transition matrix A_f. Each row of A_f describes the probability distribution obtained by resampling the flaw f at a given state σ. We call the algorithm commutative if the transition matrices commute for any pair of flaws which are "independent" (in the LLL sense). Some main consequences of this new definition are the following:

1. We show a number of results on the distribution of the state at the termination of Algorithm 2. In many cases, the output of Algorithm 2 approximates the LLL-distribution, i.e., the distribution induced by conditioning on avoiding all bad events. These bounds are similar to results of [12] for the variable-version LLL and [17] for general commutative resampling oracles. However, they are more general and avoid a number of technical conditions. Furthermore, the proofs are much simpler.

2. We show that there are generic parallel implementations of Algorithm 2. This extends results on parallel algorithms of [21, 11]. In addition to being more general, we get much simpler proofs.

3. It has been known that some probability spaces have specialized distributional and convergence bounds which go beyond the "generic" LLL bounds. Some examples of such spaces include permutations [12] and the variable-version LLLL [10, 24]. Previously, these types of bounds had been shown with ad-hoc arguments specialized to each probability space. Our construction recovers most of these results automatically.

4. Many resampling oracles are formed from smaller "atomic" events [11]. For example, in the permutation setting, these would be events of the form πx = y. We show here that, if the atomic events satisfy the generalized commutativity definition, then so does the larger set of "composed" events. Due to some technical restrictions, this natural property did not seem to hold for the original commutativity definition of [21].

Although it will require significant definitions and technical development to state our results formally, let us try to provide a high-level summary here. As a starting point, consider Algorithm 1. One of the main technical ideas introduced by Moser & Tardos [22] to analyze this algorithm was an object referred to as a witness tree. For each resampling of a bad-event B at a given time, one can generate a corresponding witness tree which records an "explanation" of why B was true at that time. More properly, this witness tree provides a history of all the prior resamplings which affected the variables involved in B.

The main technical lemma governing the behavior of the MT algorithm is the "Witness Tree Lemma," which states that the probability that a given witness tree is produced is at most the product of the probabilities of the corresponding events. The bound on the algorithm runtime, as well as parallel algorithms and distributional properties, then follows by taking a union bound over witness trees. This infinite sum can be bounded using techniques similar to the analysis of Galton-Watson processes.

Versions of this Witness Tree Lemma have been shown for some variants of the MT algorithm [10, 15]. Iliopoulos [17] further showed that it holds for general spaces which satisfy the commutativity property; this, in turn, leads to the nice algorithmic properties such as parallel algorithms.

The main technical innovation in this paper is to generalize this Witness Tree Lemma, in two distinct ways. First, instead of keeping track of a scalar product of probabilities in a witness tree, we instead consider a certain matrix product. We bound the probability of a given witness tree (or, more properly, a slight generalization known as the witness dag) in terms of the products of the transition matrices of the corresponding flaws. Commutativity can thus be rephrased and simplified in terms of matrix commutativity for the transition matrices.

Second, we change the criterion for when to add nodes to the witness tree. In the usual LLL construction, including in the analysis of Moser & Tardos, the rule was to add a node for flaw f if the tree already included a node corresponding to some later-resampled flaw g which is dependent with f. In our construction, we only add the new node f if it can possibly increase transition probabilities corresponding to the tree. This is a strictly more restrictive criterion, and leads to more "compressed" or concise explanations of the resamplings. This in turn leads to improved convergence bounds as well as simpler and unified proofs.

We obtain the scalar form of the Witness Tree Lemma by projecting everything to a one-dimensional space. For this, we take advantage of some methods of [3] for viewing the evolution of Algorithm 2 in terms of spectral bounds.
In Section 2, we provide basic definitions for resampling oracles and for analyzing the trajectories in them. In particular, in Section 2.1, we provide our new matrix-based definition for commutativity. In addition to being more general than the previous definitions, it also is easier to work with algebraically. We discuss some examples of how this new definition simplifies and works with a number of prior frameworks for resampling oracles. For example, in Section 7, we discuss how it can be combined with a method of building resampling oracles from simpler "atomic" flaws discussed in [11].

In Section 3, we define the witness dag, which is a generalization of a witness tree developed by [8]. We discuss how to associate a certain transition matrix to each witness dag. This allows us to show a kind of matrix version of the witness tree lemma, wherein the probability that a given witness dag is produced by the algorithm is bounded in terms of the products of the transition matrices of the corresponding flaws in it. This bound is useful for analyzing the algorithm in general terms, and lends itself to clean and simple induction proofs.

In Section 4, we show how to project down this matrix bound to get useful probabilistic bounds on the behavior of Algorithm 2. We also relate it to more standard criteria such as the symmetric or asymmetric LLL criterion.

In Section 5, we show that the new commutativity definition leads to bounds on the distribution at the termination of Algorithm 2.

In Section 6, we show that, if there is slack in the LLL conditions, then the resampling process is likely to have low depth. As a consequence, parallel algorithms can be used to implement Algorithm 2.

In Section 7, we consider a construction of Harris [11] for building resampling oracles out of smaller "atomic" events. We show that if the underlying atomic events are commutative as in our new definition, then the resulting resampling oracle is as well.
Background and Basic Definitions
Throughout the paper we consider algorithms that start from a state σ ∈ Ω picked from an initial distribution θ, and then repeatedly pick a flaw that is present in the current state and address it. The algorithm always terminates when it encounters a flawless state. Once the algorithm begins, the evolution is determined by the flaw selection rule S; this is some rule to select a flaw f ∋ σ for a state σ_t at time t. The rule S may depend on the prior values σ_0, ..., σ_{t−1} and may be randomized.

Once we choose such a flaw f, we will apply a randomization procedure which we refer to as resampling flaw f. Specifically, we will draw some random source r from a distribution R_f, and then update the state to σ′ ← R_f(σ, r). We also write this more compactly as σ′ = rσ. For each flaw f, state σ ∈ f, and state σ′ ∈ Ω, let us define A_f[σ, σ′] to be the probability that applying R_f to σ yields state σ′, i.e.

  A_f[σ, σ′] = Pr_{r ∼ R_f}(R_f(σ, r) = σ′) = Pr_{r ∼ R_f}(rσ = σ′)

For a state σ ∉ f, we define A_f[σ, σ′] = 0. We sometimes write σ →_f σ′ to denote that the algorithm addresses flaw f at σ and moves to σ′. We define a trajectory to be a sequence of the form (f_1, ..., f_t). If we run Algorithm 2 for some finite time t, possibly to completion, then we define the actual trajectory to be the resulting sequence T̂ = (f_1, f_2, ..., f_t).

The key to analyzing Algorithm 2 is to keep track of the possible ways in which addressing certain flaws f can cause other flaws g. For our purposes, we define this in terms of an undirected notion of dependence. Formally, we suppose that we have some symmetric relation ∼ on F, with the property that for every distinct pair of flaws f ≁ g, we are guaranteed that resampling flaw f cannot introduce g or vice-versa, i.e. R_f never maps a state from Ω − g into g and likewise R_g never maps a state from Ω − f into f.

For a flaw f, we define Γ(f) to be the set of flaws g with f ∼ g. We also define f ≈ g if f ∼ g or f = g, and we define Γ̄(f) = Γ(f) ∪ {f} = {g : g ≈ f}. We say that a set I ⊆ F is stable if f ≁ g for all distinct pairs f, g ∈ I.

For an arbitrary event E ⊆ Ω, we define e_E to be the indicator vector for E, i.e. e_E[σ] = 1 if σ ∈ E and e_E[σ] = 0 otherwise. For a state σ ∈ Ω, we write e_σ as shorthand for e_{{σ}}, i.e. the basis vector which has a 1 in position σ and zero elsewhere. For vectors u, v we write u ≼ v if u[i] ≤ v[i] for all entries i.

We remark that some previous analyses of local search algorithms have used a directed notion of causality [2]. This can sometimes give more precise bounds, but it is not directly compatible with our definitions and framework. Also, some previous analyses [21] have further conditions on whether f ∼ f for a flaw f. These would lead to slightly finer, but much more complex, conditions for our algorithms; for simplicity, we do not use these here.

The original definition of commutativity given by Kolmogorov [21] required that for every pair f ≁ g ∈ F, there is an injective function mapping state transitions of the form σ_1 →_f σ_2 →_g σ_3 to state transitions of the form σ_1 →_g σ′_2 →_f σ_3, so that

  A_f(σ_1, σ_2) A_g(σ_2, σ_3) = A_g(σ_1, σ′_2) A_f(σ′_2, σ_3).

This definition is cumbersome to use, and also lacks important symmetry and invariance properties. As one of the major contributions of this paper, we introduce a more natural notion of algorithmic commutativity that is also more general than the notion of [21].
Definition 2.1 (Transition matrix commutativity). We say that the resampling oracle is transition matrix commutative (abbreviated T-commutative) with respect to the dependence relation ∼ on F if A_f A_g = A_g A_f for every f, g ∈ F such that f ≁ g.

Observation 2.2. If the resampling oracle is commutative in the sense of [21], then it is T-commutative.

Proof. Consider f ≁ g and states σ, σ′. By symmetry, we need to show that A_f A_g[σ, σ′] ≤ A_g A_f[σ, σ′]. Since f ≁ g, we can see that both the LHS and RHS are zero unless σ ∈ f ∩ g.

Let V denote the set of states σ″ with A_f[σ, σ″] A_g[σ″, σ′] > 0. By the definition of [21], there is an injective function F : V → Ω such that A_f[σ, σ″] A_g[σ″, σ′] = A_g[σ, F(σ″)] A_f[F(σ″), σ′]. Therefore, we have

  (A_f A_g)[σ, σ′] = ∑_{σ″ ∈ V} A_f[σ, σ″] A_g[σ″, σ′] = ∑_{σ″ ∈ V} A_g[σ, F(σ″)] A_f[F(σ″), σ′]

Since the function F is injective, each term of the form A_g[σ, τ] A_f[τ, σ′] is counted at most once in this sum with τ = F(σ″). So (A_f A_g)[σ, σ′] ≤ ∑_τ A_g[σ, τ] A_f[τ, σ′] = (A_g A_f)[σ, σ′]. ∎

We remark that dependency information can in fact be recovered from knowledge of the transition matrices themselves:

Observation 2.3. The relation defined by f ∼ g iff A_f A_g ≠ A_g A_f gives a T-commutative dependency relation for the resampling oracle.

Proof. T-commutativity with respect to this relation holds by construction, so we only need to check that it is a valid dependency relation. Consider a pair of distinct flaws f, g such that flaw f causes flaw g. There must be a transition σ →_f τ such that σ ∉ g and τ ∈ g. Since σ ∉ g, the row A_g[σ, ·] is identically zero, and hence so is the row (A_g A_f)[σ, ·]. On the other hand, since A_f[σ, τ] > 0 and the row A_g[τ, ·] sums to one, the row (A_f A_g)[σ, ·] is not identically zero. Thus, A_f A_g ≠ A_g A_f, i.e. f ∼ g. ∎

When this definition applies, we define A_I to be the matrix ∏_{f ∈ I} A_f for a stable set I; note that this product is well-defined (without specifying an ordering of I) since the matrices A_f all commute.
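As a concrete illustration (a self-contained toy of our own, not an example from the paper), consider the state space Ω = {0,1}² with one flaw per bit, where each resampling oracle redraws only its own bit; the two transition matrices then commute, as Definition 2.1 requires for independent flaws:

```python
import numpy as np

def A_redraw_bit(bit):
    """Transition matrix on Omega = {0,1}^2 (state s = b0 + 2*b1) for the
    flaw 'bit is 1', whose oracle redraws that single bit uniformly."""
    A = np.zeros((4, 4))
    for s in range(4):
        b = [s & 1, (s >> 1) & 1]
        if b[bit] == 1:                       # rows are zero outside the flaw
            for v in (0, 1):
                b2 = list(b); b2[bit] = v
                A[s, b2[0] + 2 * b2[1]] += 0.5
    return A

Af, Ag = A_redraw_bit(0), A_redraw_bit(1)
print(np.allclose(Af @ Ag, Ag @ Af))          # True: the pair is T-commutative
```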
For the remainder of this paper, we assume that the resampling oracle R is T-commutative unless explicitly stated otherwise.

Witness dags

The key object for analyzing the evolution of commutative resampling oracles is a graph structure called a witness DAG. Formally, consider a directed acyclic graph G, where each vertex v ∈ G has a label L(v) from the set F. We say that G is a witness DAG (abbreviated wdag) if it satisfies the property that for all pairs of vertices v, w ∈ G, there is an edge between v and w (in either direction) if and only if L(v) ≈ L(w).

For a wdag G with sink nodes v_1, ..., v_k, note that L(v_1), ..., L(v_k) are all distinct and {L(v_1), ..., L(v_k)} is a stable set; we denote this set by sink(G) = {L(v_1), ..., L(v_k)} ⊆ F.

The primary motivation for analyzing wdags comes from the following construction. For any wdag H, we define a matrix A_H inductively according to the following rules. If H = ∅, then A_H is the identity. Otherwise, choose an arbitrary source node v of H and set A_H = A_{L(v)} A_{H−v}.

Proposition 3.1.
The definition of A_H does not depend on the chosen source node v. Furthermore, there is an enumeration of the nodes of H as v_1, ..., v_t such that A_H = ∏_{i=1}^t A_{L(v_i)}.

Proof. We show this by induction on |H|. When |H| = 0 this is vacuously true.

For the second property, we have A_H = A_{L(v)} A_{H−v} for a source node v. By the induction hypothesis, there is an ordering v_2, ..., v_t of H − v such that A_{H−v} = A_{L(v_2)} ⋯ A_{L(v_t)}. Setting v_1 = v, we have A_H = A_{L(v_1)} ⋯ A_{L(v_t)}.

For the first property, suppose H has two source nodes v_1, v_2. We need to show that we get the same value by recursing on v_1 or v_2, i.e.

  A_{L(v_1)} A_{H−v_1} = A_{L(v_2)} A_{H−v_2}   (1)

Now apply the induction hypothesis to H − v_1 and H − v_2, noting that v_2 is a source node of H − v_1 and v_1 is a source node of H − v_2. We get

  A_{H−v_1} = A_{L(v_2)} A_{H−v_1−v_2},   A_{H−v_2} = A_{L(v_1)} A_{H−v_1−v_2}

Thus, in order to show Eq. (1), it suffices to show that A_{L(v_1)} A_{L(v_2)} = A_{L(v_2)} A_{L(v_1)}. Since v_1, v_2 are both source nodes, we have L(v_1) ≁ L(v_2). Thus, this follows from T-commutativity. ∎

We say that a flaw f is unrelated to a wdag H if there is no node v ∈ H with L(v) ≈ f. We say that f is dominated by H if A_f A_H 𝟙 ≼ A_H 𝟙, where 𝟙 denotes the all-ones vector; recall that this means that e_σ^⊤ A_f A_H 𝟙 ≤ e_σ^⊤ A_H 𝟙 for all states σ.

To get tight bounds for distributional properties or parallel algorithms, we need to "locally" explain the history of a given resampling. Consider a trajectory T = (f_1, ..., f_r) of length r. For each t ≤ r, we can generate a corresponding wdag G^T_t = GenWitness(Q, T, t) which provides the history of a resampling at time t. The main idea is to build the tree backward in time for s = t, ..., 1; if the current wdag G^T_t does not dominate flaw f_s, we should add a vertex labeled f_s.

For a variety of technical reasons in our analysis, we may also want to add a node labeled f_s even if f_s is dominated; this explains the role of the rule Q (more details will be provided later). Note that the choice of Q does not affect Algorithm 2 itself, only the analysis.

Algorithm 3
Forming witness G^T_t = GenWitness(Q, T, t)
  Initialize G^T_t to contain a single vertex labeled f_t
  for s = t − 1, ..., 1 do
    if f_s is not dominated by G^T_t OR f_s ∈ Q(G^T_t) then
      Add to G^T_t a node v_s labeled f_s, with an edge from v_s to each v_j such that L(v_j) ≈ f_s

We write G^T_{t,s} for the wdag G^T_t just before iteration s, so that G^T_{t,s−1} is derived from G^T_{t,s} by adding (or not) a vertex labeled f_s. In this case we have G^T_t = G^T_{t,0}. We also write for convenience that G^T_t is the empty graph if t = 0 or t > length(T).

There are two possible choices for Q that we will use. The default rule Q_0 is defined by setting f ∈ Q_0(G) if and only if G has a source node labeled f. This rule covers most of our analysis, including all our results for distributional properties and for convergence of Algorithm 2. When analyzing parallel algorithms, we will use an alternative rule Q_1 defined by setting f ∈ Q_1(H) if and only if H contains a source node labeled g ≈ f. For the remainder of this paper, we will always assume that Q = Q_0 or Q = Q_1 unless specifically stated otherwise.

We say that a wdag H appears if H ≅ GenWitness(Q, T̂, t) for some value t. If we have fixed some rule Q, then we denote by H_Q the set of all non-empty wdags G which can be produced as G = GenWitness(Q, T, t) for any search strategy S and corresponding trajectory T during the evolution of Algorithm 2.
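For concreteness, here is a self-contained sketch of Algorithm 3 with the default rule Q_0, assuming the transition matrices are available as dense numpy arrays (the helper names gen_witness and related are ours, not from the paper):

```python
import numpy as np

def gen_witness(A, related, traj, t):
    """Sketch of GenWitness(Q0, traj, t). A maps flaw names to transition
    matrices, related(f, g) implements f ~~ g, traj lists flaws (time 1 first).
    Returns (labels, edges); edges point from a new vertex to older vertices."""
    dim = next(iter(A.values())).shape[0]
    labels, edges = {0: traj[t - 1]}, []           # vertex 0 is the sink f_t
    for s in range(t - 1, 0, -1):                  # s = t-1, ..., 1
        f = traj[s - 1]
        one = np.ones(dim)                         # compute A_G @ 1, with the
        for v in sorted(labels):                   # later-added (source) nodes
            one = A[labels[v]] @ one               # multiplied on the left
        dominated = bool(np.all(A[f] @ one <= one + 1e-12))
        has_incoming = {w for (_, w) in edges}
        q0 = any(labels[v] == f and v not in has_incoming for v in labels)
        if not dominated or q0:                    # condition of Algorithm 3
            v_new = len(labels)
            edges += [(v_new, v) for v in labels if related(f, labels[v])]
            labels[v_new] = f
    return labels, edges
```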
We have the following important characterization of wdags in H_Q.

Proposition 3.2. Any wdag H in H_{Q_0} or H_{Q_1} has a single sink node.

Proof. Consider forming G^T_t = GenWitness(Q, T, t). Suppose that at step s of Algorithm 3, the flaw f_s is unrelated to G^T_{t,s}. Then A_{f_s} commutes with A_{L(v)} for every v ∈ G^T_{t,s}; by Proposition 3.1, this implies that A_{f_s} commutes with A_{G^T_{t,s}}, and so we have

  e_σ^⊤ A_{f_s} A_{G^T_{t,s}} 𝟙 = e_σ^⊤ A_{G^T_{t,s}} A_{f_s} 𝟙

Since the matrix A_{f_s} is substochastic, this is at most e_σ^⊤ A_{G^T_{t,s}} 𝟙. So f_s is dominated by G^T_{t,s}. Also, since f_s is unrelated to G^T_{t,s}, the latter does not have a source node labeled f_s. Hence, for either rule Q_0 or rule Q_1, we would not add a new vertex v_s to form G^T_{t,s−1}.

Thus, we see that whenever we add a vertex v_s to G^T_t, it has an edge to an already-existing node of G^T_t. In particular, G^T_t never gets an additional sink node (aside from the one sink node corresponding to f_t). ∎

In light of Proposition 3.2, we define H_Q(f) to be the set of wdags H ∈ H_Q with sink(H) = {f}, noting that we have H_Q = ⋃_f H_Q(f). If Q is understood, we often just write H and H(f).

The following is the key result which ties together wdags with the algorithm dynamics (recalling that θ denotes the initial state distribution):

Lemma 3.3.
For a given wdag H, the probability that H appears is at most θ^⊤ A_H 𝟙.

We need a number of preliminary results to prove Lemma 3.3.
Proposition 3.4.
Consider some wdag G and flaw f_1. For any trajectory T = (f_1, f_2, ..., f_r) with GenWitness(Q, T, t) = G, the wdag G′ = GenWitness(Q, T′, t − 1) for the shifted trajectory T′ = (f_2, ..., f_r) is uniquely determined according to the following rule:

1. If G contains a unique source node v labeled f_1, then G′ = G − v.

2. Otherwise, G′ = G and f_1 is dominated by G.

Proof. First, by the definition of a wdag, there must be a directed edge between any two nodes labeled f_1. Thus, G has either zero or one source nodes labeled f_1.

If t = 1, this is clear since in this case G would consist of a single node labeled f_1 and G′ is the empty wdag. So let us suppose that t > 1. Note that Algorithm 3 obtains G = G^T_t by possibly adding a node v labeled f_1 to G^{T′}_{t−1}, which by hypothesis is equal to G′.

If Algorithm 3 adds node v, then f_1 is the label of a source node v of G, and G′ = G − v. If Algorithm 3 does not add such a node, then G = G′. Since Q = Q_0 or Q = Q_1, we know that G′ does not have a source node labeled f_1, and also f_1 must be dominated by G′. Since G′ = G, these imply that f_1 is dominated by G as well. ∎

Proposition 3.5.
Let H be a wdag, and t a non-negative integer. If we run Algorithm 2 starting with state τ, then Pr(⋃_{s=1}^t {G^T̂_s ≅ H}) ≤ e_τ^⊤ A_H 𝟙.

Proof.
Let us define E_{T,H} to be the event ⋃_{s=1}^t {G^T_s ≅ H}. By conditioning on the random seed used by the flaw choice strategy S (if any), we may assume that the search strategy S is deterministic. We prove the claim by induction on t.

If H is the empty wdag, the RHS is one and the statement is vacuous. So, suppose that H is non-empty. Now if t = 0 or τ is flawless, then E_{T̂,H} is impossible and again this is vacuous. So let us suppose that t ≥ 1, and that S selects a flaw g to resample in τ. We can now view the evolution of Algorithm 2 as a two-part process: we first resample g, getting a new state τ′ with probability A_g[τ, τ′]. We then execute a new search algorithm starting at state τ′, wherein the flaw selection rule S′ on history (τ′, σ_1, ..., σ_t) is the same as the choice of S on history (τ, τ′, σ_1, ..., σ_t). We also define a corresponding trajectory T̂′ which is T̂ shifted down by one, i.e. if T̂ = (f_1, f_2, ..., f_r) then T̂′ = (f_2, ..., f_r).

Suppose that G^T̂_s ≅ H for some value s. In this case, by Proposition 3.4, one of two conditions must hold: either H has a unique source node v labeled g and G^T̂′_{s−1} ≅ H − v; or H has no such node, G^T̂′_{s−1} ≅ H, and g is dominated by H.

So, let us first suppose that H has a unique source node v labeled g. In this case, in order for event E_{T̂,H} to occur, we must also have E_{T̂′,H−v} starting at state τ′. By the induction hypothesis, this has probability at most e_{τ′}^⊤ A_{H−v} 𝟙 conditional on a fixed τ′. Summing over τ′, we get a total probability of

  ∑_{τ′} A_g[τ, τ′] e_{τ′}^⊤ A_{H−v} 𝟙 = e_τ^⊤ A_g A_{H−v} 𝟙 = e_τ^⊤ A_H 𝟙

as required.

Next, let us suppose that H has no such source node. In this case, in order for event E_{T̂,H} to occur, we must also have E_{T̂′,H} starting at state τ′. By the induction hypothesis, this has probability at most e_{τ′}^⊤ A_H 𝟙 conditional on a fixed τ′. Summing over τ′, we get a total probability of

  ∑_{τ′} A_g[τ, τ′] e_{τ′}^⊤ A_H 𝟙 = e_τ^⊤ A_g A_H 𝟙

Since g is dominated by H, this is at most e_τ^⊤ A_H 𝟙, again completing the induction. ∎

Proof of Lemma 3.3.
If we start the search with state τ, then the probability that H appears in T̂_t is at most e_τ^⊤ A_H 𝟙. Integrating over τ, we see that this probability is at most ∑_τ θ[τ] e_τ^⊤ A_H 𝟙 = θ^⊤ A_H 𝟙.

In order for H to appear, there must be some integer t ≥ 0 such that H appears in T̂_t. By countable additivity of the probability measure, we have

  Pr(H appears) = Pr(⋃_{t=0}^∞ {H appears in T̂_t}) = lim_{t→∞} Pr(H appears in T̂_t)

By Proposition 3.5, each term in this limit is at most θ^⊤ A_H 𝟙, so the limit is also at most θ^⊤ A_H 𝟙. ∎

Let us next summarize how Lemma 3.3 governs the behavior of Algorithm 2.

Proposition 3.6.
For a trajectory T and values 0 ≤ t′ < t ≤ length(T), we have G^T_t ≇ G^T_{t′}.

Proof. We show this by induction on t′ < t. When t′ = 0, this is clear since G^T_{t′} is empty and G^T_t is not.

For the induction step, suppose t′ > 0 and G^T_{t′} ≅ G^T_t. Let T′ be the trajectory T shifted down by one. By Proposition 3.4, both G^{T′}_{t−1} and G^{T′}_{t′−1} are updated in the same manner depending on the flaw f_1. Thus, G^{T′}_{t−1} ≅ G^{T′}_{t′−1}. But by the induction hypothesis, this is impossible. ∎

Proposition 3.7.
For any choice of Q, the expected number of steps taken by Algorithm 2 is at most ∑_{H ∈ H_Q} θ^⊤ A_H 𝟙. In particular, if this sum converges, then Algorithm 2 terminates with probability one.

Proof. Suppose we run Algorithm 2, resulting in a trajectory T̂ of length t (possibly infinite). For each finite value s ≤ t, consider forming the wdag H_s = G^T̂_s. Each such wdag H_s clearly appears. By Proposition 3.6, all such wdags H_s are distinct. As a result, we have length(T̂) ≤ ∑_{H ∈ H_Q} [[H appears]].

Taking the expectation of both sides and applying Lemma 3.3, we have

  E[length(T̂)] ≤ ∑_{H ∈ H_Q} Pr(H appears) ≤ ∑_{H ∈ H_Q} θ^⊤ A_H 𝟙  ∎

As we show in Appendix A, under some natural conditions the T-commutativity property is necessary in order to obtain Lemma 3.3.
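As a sanity check on the definitions so far (again using the hypothetical two-bit oracle from Section 2.1's example), we can compute A_H for a two-node wdag in either order, confirming Proposition 3.1, and evaluate the bound of Lemma 3.3 for a uniform initial distribution:

```python
import numpy as np

def A_redraw_bit(bit):                       # same toy oracle as before
    A = np.zeros((4, 4))
    for s in range(4):
        b = [s & 1, (s >> 1) & 1]
        if b[bit] == 1:
            for v in (0, 1):
                b2 = list(b); b2[bit] = v
                A[s, b2[0] + 2 * b2[1]] += 0.5
    return A

Af, Ag = A_redraw_bit(0), A_redraw_bit(1)    # wdag H: two nodes, labels f !~ g
theta, one = np.full(4, 0.25), np.ones(4)
print(np.allclose(Af @ Ag, Ag @ Af))         # A_H is order-independent (Prop 3.1)
print(theta @ Af @ Ag @ one)                 # 0.25: Lemma 3.3 bound on Pr(H appears)
```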
The statement of Lemma 3.3 in terms of matrix products is very general and powerful theoretically, but difficult for calculations. In order to use Lemma 3.3 to control the algorithm runtime, among other properties, we need to bound sums of the form

  ∑_{H ∈ H_Q} θ^⊤ A_H 𝟙

There are two distinct issues here. First, for an individual wdag H with nodes v_1, ..., v_t, we need to estimate θ^⊤ A_H 𝟙 = θ^⊤ (∏_{i=1}^t A_{L(v_i)}) 𝟙. Second, we need to bound the sum of these quantities over the sets H_Q.

These two issues are quite distinct. The second issue is at the heart of the probabilistic and algorithmic forms of the LLL. As discussed by Moser & Tardos [22], it can be viewed in terms of the evolution of certain Galton-Watson branching processes. For the LLL, a number of local criteria are available, such as the symmetric or asymmetric criterion. These methods can often be adapted to non-LLL settings as well.

The first issue is not as familiar, since most previous analyses of local search algorithms have focused on scalar-valued weights for each wdag. In particular, for the original Moser-Tardos algorithm, the weight of a wdag is simply the product of the probabilities of the corresponding bad-events.

As discussed in [3], many of these prior estimates can be viewed in terms of spectral bounds on the matrices A_f. One effective method is to find a vector µ which is a common (approximate) eigenvector of the matrices A_f. In the LLL setting, this vector µ plays the same role as the ambient probability space µ on Ω. We emphasize that this vector µ is only an expedient for upper-bounding the matrix products θ^⊤ A_H 𝟙 in terms of scalar products; for some applications [3], it is possible to take advantage of higher-dimensional information to get more detailed bounds.

Let us suppose now that we have fixed some probability vector µ over Ω. We define quantities called the flaw charge γ_f for each flaw f, and the initial charge λ_init, as follows:

  γ_f = max_{τ ∈ Ω} (1/µ(τ)) ∑_{σ ∈ f} µ(σ) A_f[σ, τ],   λ_init = max_{σ ∈ Ω} θ[σ]/µ(σ).   (2)

Note that since θ and µ are probability distributions we must have λ_init ≥ 1, with equality iff µ = θ.

The following result of [18] illustrates the connection between this measure and the Lopsided Lovász Local Lemma (LLLL):

Theorem 4.1 ([18]). Given a family of flaws F and a measure µ over Ω, then for each flaw f and each set S ⊆ F − Γ(f) we have µ(f | ⋂_{g ∈ S} g̅) ≤ γ_f, where the γ_f are the charges of the algorithm as defined in (2).

Thus, in a certain sense, the flaws F satisfy the LLLL with probabilities γ_f. Moreover, as shown in [2], the charge γ_f captures the compatibility between the actions of the algorithm for addressing flaw f and the measure µ. To see this, consider the probability of ending up in state τ after (i) sampling a state σ ∈ f according to µ; and then (ii) resampling flaw f at σ. Define the distortion associated with f as

  d_f := max_{τ ∈ Ω} (∑_{σ ∈ f} µ(σ) A_f[σ, τ] / µ(f)) / µ(τ) = max_{τ ∈ Ω} (µ^⊤ A_f e_τ) / (µ(τ) µ(f)) ≥ 1,   (3)

i.e., the maximum possible inflation of a state probability incurred by addressing f (relative to its probability under µ, and averaged over the initiating state σ ∈ f according to µ). Now observe from (2) that

  γ_f = max_{τ ∈ Ω} (1/µ(τ)) ∑_{σ ∈ f} µ(σ) A_f[σ, τ] = d_f · µ(f).   (4)

A resampling oracle R with λ_init = d_f = 1 for all f is called a regenerating oracle [16]; notice that it perfectly removes the conditioning on the addressed flaw.
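Computationally, the charges in (2) are straightforward to evaluate when the transition matrices are explicit; here is a minimal sketch (a helper of our own, with mu and theta as numpy probability vectors):

```python
import numpy as np

def charges(A, mu, theta):
    """Flaw charges per Eq. (2): gamma_f = max_tau (1/mu(tau)) * sum_{sigma in f}
    mu(sigma) * A_f[sigma, tau], and lambda_init = max_sigma theta(sigma)/mu(sigma).
    A maps flaw names to |Omega| x |Omega| numpy transition matrices."""
    gamma = {f: float(((mu @ Af) / mu).max()) for f, Af in A.items()}
    lam_init = float((theta / mu).max())
    return gamma, lam_init

# For the two-bit oracle used earlier, with mu = theta uniform, this yields
# gamma_f = 0.5 = mu(f) and lambda_init = 1: that oracle is regenerating.
```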
Such regenerating oracles can be used to capture applications of the more standard versions of the LLLL.

For a wdag H, let us define the scalar value

  w(H) = ∏_{v ∈ H} γ_{L(v)}

We get the following estimate for θ^⊤ A_H 𝟙 in terms of w(H):

Theorem 4.2. For any event E ⊆ Ω we have θ^⊤ A_H e_E ≤ λ_init µ(E) w(H). In particular, with E = Ω, we have θ^⊤ A_H 𝟙 ≤ λ_init w(H).

Proof. Let M be the diagonal matrix on Ω with entries M[σ, σ] = µ(σ). As observed in [3], the charge of each flaw f can be written as γ_f = ‖M A_f M^{−1}‖_1, since the ‖·‖_1-norm of a matrix with non-negative entries equals its maximum column sum. By Proposition 3.1, we can write A_H = A_{f_1} ⋯ A_{f_t} where v_1, ..., v_t are the nodes of H and f_i = L(v_i). We now calculate:

  θ^⊤ A_H e_E = θ^⊤ M^{−1} (∏_{i=1}^t M A_{f_i} M^{−1}) M e_E
    ≤ ‖θ^⊤ M^{−1}‖_∞ ‖∏_{i=1}^t M A_{f_i} M^{−1}‖_1 ‖M e_E‖_1
    ≤ ‖θ^⊤ M^{−1}‖_∞ (∏_{i=1}^t ‖M A_{f_i} M^{−1}‖_1) µ(E)
    = λ_init µ(E) ∏_{i=1}^t γ_{f_i} = λ_init µ(E) ∏_{v ∈ H} γ_{L(v)},

where here we use the fact that the dual of the ‖·‖_1-norm is the ‖·‖_∞-norm. ∎

The following corollary makes even clearer the connection between the matrix products A_H and the underlying LLL space µ. In past works such as [17], this corollary has been called the Witness Tree Lemma:

Corollary 4.3.
For a regenerating oracle, any wdag H appears with probability at most ∏_{v ∈ H} µ(L(v)).

In light of Theorem 4.2, we define for any flaw f the key quantity

  Φ_Q(f) = ∑_{H ∈ H_Q(f)} w(H)

We write Φ(f) alone if Q is clear from context. With these notations, we have the following crisp corollaries of our previous estimates:

Corollary 4.4.
1. Any given wdag H appears with probability at most λ_init w(H).

2. The expected number of resamplings of any flaw f is at most λ_init Φ_Q(f).

3. The expected runtime of Algorithm 2 is at most λ_init ∑_f Φ_Q(f).

4. If Φ_Q(f) < ∞ for all f, then Algorithm 2 terminates with probability one.

The main way to bound Φ_Q(f) is to inductively bound sums ∑_{H ∈ W} w(H), where W is defined as the collection of all possible wdags, not just those which could be produced as some G^T̂_t. We define W(I) to be the collection of all wdags H with sink(H) = I. The sum over W is tractable because of the fundamental observation that if G ∈ W(I) has sink nodes v_1, ..., v_t, then G′ = G − v_1 − ⋯ − v_t is a smaller wdag in W(J) for some J ⊆ ⋃_{f ∈ I} Γ̄(f). Shearer's criterion for the LLL [24] essentially boils down to using this recursion to show that ∑_{H ∈ W} w(H) < ∞. For some probability spaces, such as the variable LLLL, we may have additional structural restrictions on the wdags.

Some related useful quantities are Ψ(I) = ∑_{H ∈ W(I)} w(H) and Ψ̄(I) = ∑_{J ⊆ I} Ψ(J). For a flaw f, we write Ψ(f) as shorthand for Ψ({f}). Note that Φ_Q(f) ≤ Ψ(f) for any Q. A useful and standard formula (see e.g., [16, Claim 59]) is that for any stable set I we have Ψ(I) ≤ ∏_{f ∈ I} Ψ(f) and Ψ̄(I) ≤ ∏_{f ∈ I} (1 + Ψ(f)).

We summarize a few bounds on these quantities, based on versions of LLL criteria, as follows:

Proposition 4.5.
1. (Symmetric criterion) Suppose that γ_f ≤ p and |Γ(f)| ≤ d for parameters p, d with epd ≤ 1. Then Ψ(f) ≤ eγ_f ≤ ep for all f.

2. (Neighborhood bound) Suppose that every f has ∑_{g ∈ Γ̄(f)} γ_g ≤ 1/4. Then Ψ(f) ≤ 4γ_f for all f.

3. (Asymmetric criterion) Suppose there is some function x : F → [0, 1) with the property that

  ∀f   γ_f ≤ x(f) ∏_{g ∈ Γ(f)} (1 − x(g))

Then Ψ(f) ≤ x(f)/(1 − x(f)) for all f.

4. (Cluster-expansion criterion) Suppose there is some function η : F → [0, ∞) with the property that

  ∀f   η(f) ≥ γ_f × ∑_{stable I ⊆ Γ̄(f)} ∏_{g ∈ I} η(g)

Then Ψ(f) ≤ η(f) for all f.

5. (Clique-bound criterion) Suppose that the dependency graph is covered by a collection V of cliques, i.e. f ∼ g iff there exists v ∈ V with f, g ∈ v. Suppose there is some function ζ : V → [0, ∞) with the property that

  ∀v ∈ V   ζ(v) ≥ 1 + ∑_{f ∈ v} γ_f ∏_{u ∈ V: f ∈ u} ζ(u)

Then Ψ(f) ≤ γ_f ∏_{u ∈ V: f ∈ u} ζ(u) for all f.

Proof. For completeness, we briefly sketch the proofs. For the cluster-expansion criterion, we use an induction on wdag depth to show that the total weight of all wdags H ∈ W(I) is at most ∏_{f ∈ I} η(f).

For the clique-bound criterion, apply the cluster-expansion criterion using the function η(f) = γ_f ∏_{v ∈ V: f ∈ v} ζ(v).

For the asymmetric criterion, apply the cluster-expansion criterion using the function η(f) = x(f)/(1 − x(f)).

For the neighborhood bound criterion, apply the asymmetric criterion using x(f) = 2γ_f for all f.

For the symmetric criterion, apply the cluster-expansion criterion using the function η(f) = eγ_f. ∎

To emphasize the connection between various LLL-type bounds, our analysis of wdags, and the behavior of Algorithm 2, we record the following results:

Proposition 4.6.
Let R denote the expected runtime of Algorithm 2. Under the conditions of Proposition 4.5, we have the following bounds on R:

1. If the symmetric criterion holds, then R ≤ eλ_init ∑_f γ_f ≤ O(λ_init |F|/d).

2. If the neighborhood-bound criterion holds, then R ≤ 4λ_init ∑_f γ_f ≤ O(λ_init |F|).

3. If the asymmetric criterion holds, then R ≤ λ_init ∑_f x(f)/(1 − x(f)).

4. If the cluster-expansion criterion holds, then R ≤ λ_init ∑_f η(f).

5. If the clique-bound criterion holds, then R ≤ λ_init ∑_{v ∈ V} ζ(v).

In [10], Harris described an alternate convergence criterion for the variable LLLL. This analysis used ad-hoc techniques. In Appendix B, we show that this convergence criterion is also a consequence of Lemma 3.3; it can be viewed as a different method of bounding the sums ∑_{H ∈ H_Q} θ^⊤ A_H 𝟙. While the overall result here is the same as [10], we find it intriguing that this specialized result can be obtained as a "merely combinatorial" consequence of the main Lemma 3.3.
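As a small worked example of Propositions 4.5 and 4.6 (the instance parameters are hypothetical), the symmetric criterion gives an explicit expected-runtime bound:

```python
import math

def symmetric_runtime_bound(p, d, n_flaws, lam_init=1.0):
    """Prop. 4.5(1) / 4.6(1): if e*p*d <= 1, then Psi(f) <= e*p for every flaw,
    and the expected runtime is at most e * lam_init * sum_f gamma_f."""
    if math.e * p * d > 1:
        return None                          # criterion fails: no bound
    return math.e * lam_init * n_flaws * p

k, L, m = 10, 25, 1000                       # toy k-SAT numbers as in Section 1
print(symmetric_runtime_bound(2.0 ** -k, k * L, m))  # ~2.65 expected resamplings
```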
Distributional properties

The most important consequence of commutativity is that it leads to good bounds on the distribution of the output of Algorithm 2 as well as of its intermediate states. Heuristically, these states should be similar in distribution to the "benchmark" distribution µ.

Let us consider an event E, which is an arbitrary subset of Ω, and let us define the random variable ∆ to be the terminal state of Algorithm 2, if any. Note that ∆ is necessarily flawless. Our goal is to upper-bound the probability that E holds on state ∆. In fact, we will show a stronger bound: let us define P(E) to be the probability that E occurs at any time during the evolution of Algorithm 2. We will show an upper bound on P(E); this immediately also bounds the probability that E holds on ∆.

To analyze this, consider adding a new flaw f_E = E, with an arbitrary choice of resampling rule (e.g. to do nothing). We also modify the flaw-selection strategy S to always select f_E to resample, if available. For this expanded set of flaws F_E, we define f_E ∼ g for all existing g ∈ F. Note that since f_E ∼ g for all existing flaws, the new resampling oracle we obtain is also T-commutative, if the one for F is.

Let us define H_E to be the set of wdags which are produced as GenWitness(Q_0, T, t) where event E holds at time t but not at times 1, ..., t − 1. To avoid confusion, all other quantities H, w(H), γ_f, Ψ(I), W(I) etc. should be interpreted in terms of the original flaw set F.

The following is the fundamental observation we have for distributional bounds:

Proposition 5.1. P(E) ≤ ∑_{H ∈ H_E} θ^⊤ A_H 𝟙.

Proof. Suppose Algorithm 2 (with respect to the original search strategy S) reaches a state in E; for the trajectory T̂ = (f_1, ..., f_t), let t be the first such time it did so. So E is false at all the times 1, ..., t − 1. Note that Algorithm 2 with search strategy S agrees with Algorithm 2 with the new search strategy S′ at the previous times 1, ..., t. Thus, the trajectory T̂′ = (f_1, ..., f_t, f_E) appears for Algorithm 2 with respect to search strategy S′. The resulting wdag H = GenWitness(Q_0, T̂′, t + 1) is in H_E.

Thus, whenever E is true in the execution of Algorithm 2 with search strategy S, there is some H ∈ H_E which appears for search strategy S′. By Lemma 3.3, for any fixed such H this has probability at most θ^⊤ A_H 𝟙. ∎

We say that a stable set I ⊆ F of flaws is orderable for E if there is an enumeration I = {f_1, ..., f_t} such that

  ∀i = 1, ..., t   A_{f_i} A_{f_{i+1}} ⋯ A_{f_t} e_E ⋠ A_{f_{i+1}} ⋯ A_{f_t} e_E

We denote by I(E) the collection of sets I ⊆ F which are orderable for E. With this notation, we get the following more legible bound:

Theorem 5.2. P(E) ≤ ∑_{I ∈ I(E)} ∑_{H ∈ W(I)} θ^⊤ A_H e_E.

Proof. Consider H ∈ H_E with sink node v labeled f_E, which is generated as H = GenWitness(Q_0, T, t) for some trajectory T of the search strategy S′. For i = 0, ..., t − 1 let H_i = G^T_{t,i} − v, so that H_0 = H − v.

We claim that sink(H_0) is orderable for E. For, suppose not; then let s be minimal such that sink(H_s) is orderable for E. (This is well-defined since sink(H_{t−1}) = ∅ is orderable for E.) Note that H_{s−1} must have a new sink node labeled f_s, as otherwise we would have sink(H_s) = sink(H_{s−1}). In particular, f_s is unrelated to H_s.

Now let I = sink(H_s); since I is orderable for E, it has an associated ordering I = {g_1, ..., g_r}. Also, by minimality of s, I ∪ {f_s} is not orderable for E. By considering the ordering {f_s, g_1, ..., g_r} for I ∪ {f_s}, we see that for all states σ it holds that

  e_σ^⊤ A_{f_s} A_{g_1} ⋯ A_{g_r} e_E ≤ e_σ^⊤ A_{g_1} ⋯ A_{g_r} e_E   (5)

Let V denote the sink nodes of H_s, so we can write A_{G^T_{t,s}} = A_{H_s − V} A_V A_{f_E}. We claim now that f_s is dominated by G^T_{t,s}. For, consider any state σ; since f_s is unrelated to H_s, the matrix A_{f_s} commutes with A_{H_s − V}, and A_{f_E} 𝟙 = e_E, so we have

  e_σ^⊤ A_{f_s} A_{G^T_{t,s}} 𝟙 = e_σ^⊤ A_{f_s} A_{H_s − V} A_V A_{f_E} 𝟙 = e_σ^⊤ A_{H_s − V} A_{f_s} A_{g_1} ⋯ A_{g_r} e_E

By Eq. (5), this is at most e_σ^⊤ A_{H_s − V} A_{g_1} ⋯ A_{g_r} e_E = e_σ^⊤ A_{G^T_{t,s}} 𝟙.

So f_s is dominated by G^T_{t,s}. Also, f_s ≠ L(u) for any source node u of G^T_{t,s}. So, by rule Q_0, we would not add a node labeled f_s to form G^T_{t,s−1}. Hence we would have G^T_{t,s−1} = G^T_{t,s}, contradicting the fact that H_{s−1} has a new sink node labeled f_s.

Thus, the wdag H′ = H − v is in W(I) where I = sink(H − v) is orderable for E. To get the upper bound on P(E), we take a union bound over the possible choices for such H′; by Proposition 5.1, we have

  P(E) ≤ ∑_{I ∈ I(E)} ∑_{H′ ∈ W(I)} θ^⊤ A_{H′} A_{f_E} 𝟙

Since f_E only maps states in E, this is at most ∑_{I ∈ I(E)} ∑_{H′ ∈ W(I)} θ^⊤ A_{H′} e_E. ∎

For an event E and state σ ∈ E, let us define Γ̃(E, σ) to be the set of flaws f ∈ F which can cause E to occur via state σ, i.e. f maps some state σ′ ∉ E to σ ∈ E. We also define Γ̃(E) = ⋃_{σ ∈ E} Γ̃(E, σ), i.e. the set of flaws which can cause E. To explain and motivate Proposition 5.1, we get a constructive version of the LLL distribution bound shown in [9].

Corollary 5.3. P(E) ≤ λ_init µ(E) ∑_{I ∈ I(E)} Ψ(I) ≤ λ_init µ(E) Ψ̄(Γ̃(E)).

Proof. The first inequality follows directly from Theorem 5.2, Theorem 4.2 and the definition of Ψ. For the second inequality, we claim that if I ∈ I(E) then I ⊆ Γ̃(E). For, let I be ordered as I = {g_1, ..., g_r} and suppose for contradiction that g_i ∉ Γ̃(E). Since I is stable, it holds for all states σ that

  e_σ^⊤ A_{g_i} A_{g_{i+1}} ⋯ A_{g_r} e_E = e_σ^⊤ A_{g_{i+1}} ⋯ A_{g_r} A_{g_i} e_E

Note that g_i only maps states already in E to E, and so e_τ^⊤ A_{g_i} e_E ≤ e_τ^⊤ e_E for any state τ. Thus, for all states σ, we have e_σ^⊤ A_{g_i} A_{g_{i+1}} ⋯ A_{g_r} e_E ≤ e_σ^⊤ A_{g_{i+1}} ⋯ A_{g_r} e_E, contradicting the definition of an orderable set. ∎

We note that Iliopoulos [17] had previously shown a bound similar to Corollary 5.3, but it had three additional technical restrictions: (i) it only worked for commutative resampling oracles, not T-commutative resampling oracles; (ii) it additionally required the construction of a commutative resampling oracle for the event E itself; and (iii) if the resampling oracle for E is not regenerating, it gives a strictly worse bound.

Theorem 5.2 can be used with the common LLL criteria, as follows; the proofs are immediate from the bounds on Ψ shown in Proposition 4.5.

Proposition 5.4.
Under the first four criteria of Proposition 4.5, we have the following estimates for P:

1. If the symmetric criterion holds, then P(E) ≤ λ_init µ(E) · e^{e|Γ̃(E)|p} ≤ λ_init µ(E) · 16^{|Γ̃(E)|p}.

2. If the neighborhood-bound criterion holds, then P(E) ≤ λ_init µ(E) · e^{4 ∑_{f ∈ Γ̃(E)} γ_f} ≤ λ_init µ(E) · 55^{∑_{f ∈ Γ̃(E)} γ_f}.

3. If the function x satisfies the asymmetric criterion, then P(E) ≤ λ_init µ(E) · ∏_{f ∈ Γ̃(E)} 1/(1 − x(f)).

4. If the function η satisfies the cluster-expansion criterion, then P(E) ≤ λ_init µ(E) · ∑_{I ∈ I(E)} ∏_{g ∈ I} η(g).

As an example, consider the permutation LLL setting: here µ is the uniform distribution on the set of permutations on n letters. Each flaw should have the form f ≡ (πx_1 = y_1 ∧ ⋯ ∧ πx_r = y_r), with the dependency graph given by f ∼ g if f and g overlap, i.e. share a domain element x or a range element y. We then get the following distributional result:

Theorem 5.5.
In the permutation LLL setting, consider an event E = g_1 ∩ ⋯ ∩ g_r where each g_i is an atomic event πx_i = y_i. We have

  P(E) ≤ ((n − r)!/n!) ∏_{i=1}^r ( ∑_{f ∈ F: f ∼ g_i} Ψ(f) )

The work [12] used an ad-hoc analysis based on a variant of witness trees to show a bound on the terminal distribution ∆; this result can be recovered automatically as well from Theorem 5.2. (The proof is technical and is deferred to Appendix C.)

A number of resampling oracles have an additional useful property that we refer to as injectivity. We say that R is injective if for every flaw f and state σ there is at most one state σ′ with A_f[σ′, σ] > 0. When this property holds, we define the pull-back Pull(f, σ) to be σ′; if no such state exists we define Pull(f, σ) = ∅. For the remainder of this section, we assume R is injective.

When the resampling oracle is injective, there is a different type of distributional bound available, which can be stronger than Corollary 5.3 for "complex" events (i.e., events which are composed from simpler events). Most known resampling oracles, including virtually all of the commutative ones, are injective. For example, injectivity holds for the variable LLLL, for the uniform distribution on permutations, the uniform distribution on matchings of K_n, and the uniform distribution on Hamiltonian cycles of K_n [11].

It is useful to extend the definition of pull-back to wdags. A simple induction shows that for any state σ and wdag H there is at most one state τ such that e_τ^⊤ A_H e_σ > 0. We thus define Pull(H, σ) to be this state τ (or ∅ if no such state exists).

For a wdag H, we say that a wdag G is a prefix of H if G is a subgraph of H and for each directed edge (u, v) ∈ H with v ∈ G, we also have u ∈ G. We denote this by G ⊴ H. If G ≠ H we say it is a strict prefix and write G ⊲ H.

Now let us fix some state subset E ⊆ Ω. We define A(E) to be the collection of all pairs (H, σ) such that σ ∈ E and there is no strict prefix H′ ⊲ H with Pull(H − H′, σ) ∈ E. We note the following characterization of the set A(E):

Proposition 5.6. If (H, σ) ∈ A(E) then sink(H) ⊆ Γ̃(E, σ).

Proof. Let v be a sink node of H with label f = L(v). Consider the prefix H′ = H − v and note that A_{H−H′} = A_f. By injectivity, there is at most one state τ = Pull(f, σ) that f can map to σ; by the definition of A(E), any such τ is not in E. Hence f maps a state outside of E to σ ∈ E, i.e., f ∈ Γ̃(E, σ). ∎

With this definition, we will show the following bound:

Theorem 5.7.
If the resampling oracle is injective, then P(E) ≤ ∑_{(H,σ) ∈ A(E)} θ^⊤ A_H e_σ.

(In previous papers [1, 3, 21], the injectivity property was referred to as atomicity. We use the terminology "injectivity" to distinguish it from our later discussion of "atomic" events.)

To prove this, consider a trajectory T ending in a state ρ (here G^T denotes the wdag with one node for each resampling in T). For a wdag H and state σ, we say that (H, σ) is a subwitness of T if H ⊴ G^T and Pull(G^T − H, ρ) = σ. Note that if (H, σ) is a subwitness of the truncated trajectory T_s for any s, then necessarily (H, σ) is a subwitness of T. We also define E_{H,σ} to be the event that (H, σ) is a subwitness of the trajectory T̂. We have the following bound:

Lemma 5.8.
If Algorithm 2 starts at state τ, then the event E_{H,σ} occurs with probability at most e_τ^⊤ A_H e_σ.

Proof. We will show that the probability that E_{H,σ} occurs within t_max steps is at most e_τ^⊤ A_H e_σ. Taking the limit as t_max → ∞ then gives the claimed result. We show this by induction on t_max.

The base cases where either t_max = 0 or τ is flawless are clear. For the induction step, suppose that we have fixed a deterministic resampling rule S which selects flaw g in τ. Suppose that we resample state τ to τ′, and let T̂′ = (f_2, ..., f_t) be the subsequent trajectory, where T̂ = (f_1, ..., f_t) with f_1 = g. Define G = G^T̂ and G′ = G^T̂′.

Suppose that E_{H,σ} occurs for trajectory T̂, so that H ⊴ G. By Proposition 3.4, either G contains a unique source node v labeled g, or g is unrelated to G. Since H ⊴ G, these in turn imply that H contains a unique source node v labeled g, or g is unrelated to H.

In the first case, the wdag H′ = H − v must be a prefix of G′ = G − v, and we have Pull(G′ − H′, ρ) = Pull(G − H, ρ) = σ. Thus, in trajectory T̂′, the event E_{H−v,σ} occurs. Summing over τ′, we thus get the bound

  Pr(E_{H,σ}) ≤ ∑_{τ′} A_g[τ, τ′] Pr(E_{H−v,σ} holds for T̂′ starting at τ′)

By the induction hypothesis, this is at most

  ∑_{τ′} A_g[τ, τ′] e_{τ′}^⊤ A_{H−v} e_σ = e_τ^⊤ A_g A_{H−v} e_σ = e_τ^⊤ A_H e_σ

as desired.

In the second case, we have H ⊴ G′. Letting σ′ = Pull(G′ − H, ρ), we see that the event E_{H,σ′} holds for trajectory T̂′. Also observe that

  Pull(g, σ′) = Pull(g, Pull(G′ − H, ρ)) = Pull(G − H, ρ) = σ.

Now let us define V to be the set of all states ν with Pull(g, ν) = σ, so that σ′ ∈ V. To bound the probability of event E_{H,σ}, we integrate over τ′ and take a union bound over all states σ′ ∈ V. This gives:

  Pr(E_{H,σ} holds on T̂) ≤ ∑_{σ′ ∈ V, τ′} A_g[τ, τ′] Pr(E_{H,σ′} holds on T̂′ starting at state τ′)

By using the induction hypothesis, and the fact that the matrices A_g and A_H commute, this implies that

  Pr(E_{H,σ} holds on T̂) ≤ ∑_{σ′ ∈ V, τ′} A_g[τ, τ′] e_{τ′}^⊤ A_H e_{σ′} = ∑_{σ′ ∈ V} e_τ^⊤ A_g A_H e_{σ′} = e_τ^⊤ A_H ∑_{σ′ ∈ V} A_g e_{σ′}

Let us define the vector x^⊤ = e_τ^⊤ A_H, so that we can write this as

  ∑_{σ′ ∈ V} x^⊤ A_g e_{σ′} = ∑_i x[i] ∑_{σ′: Pull(g,σ′)=σ} A_g[i, σ′].

A term i only contributes here if A_g[i, σ′] > 0 and A_g[σ, σ′] > 0. Since R is injective, this occurs only for i = σ. So we have:

  ∑_i x[i] ∑_{σ′: Pull(g,σ′)=σ} A_g[i, σ′] = ∑_{σ′} x[σ] A_g[σ, σ′]

which, by stochasticity, is precisely x[σ]. So overall, we have shown e_τ^⊤ A_H ∑_{σ′ ∈ V} A_g e_{σ′} = x[σ] = e_τ^⊤ A_H e_σ, which implies that Pr(E_{H,σ} holds on T̂) ≤ e_τ^⊤ A_H e_σ. This completes the induction. ∎

We now prove Theorem 5.7:
Proof of Theorem 5.7.
Consider running Algorithm 2 until it reaches a state in E or a flawless state. Suppose the resulting trajectory T̂ reaches a state ρ ∈ E at some time t. Observe then that the event E_{G,ρ} has occurred for the trajectory T̂, where we write G = G^{T̂_t}. Accordingly, we may select H to be of minimal size such that event E_{H,σ} occurs for T̂ (for some σ ∈ E). It must be the case that (H, σ) ∈ A(E), as otherwise there would be some H′ ⊲ H with Pull(H − H′, σ) = σ′ ∈ E. In this case, we would have

  Pull(G − H′, ρ) = Pull(H − H′, Pull(G − H, ρ)) = Pull(H − H′, σ) = σ′

and H′ ⊲ H ⊴ G. So the event E_{H′,σ′} also occurs, contradicting the minimality of H.

Thus, a necessary condition for E to occur in T̂ is that E_{H,σ} holds for some (H, σ) ∈ A(E). Taking a union bound over A(E) then gives:

  P(E) ≤ ∑_{(H,σ) ∈ A(E)} Pr((H, σ) is a subwitness of T̂)

By Lemma 5.8, each term in the sum is at most θ^⊤ A_H e_σ. ∎

As usual, we can use our scalar weights and Theorem 4.2 to get a simplified bound:

Corollary 5.9.
We have the bounds

  P(E) ≤ λ_init ∑_{(H,σ) ∈ A(E)} µ(σ) w(H) ≤ λ_init ∑_{σ ∈ E} µ(σ) Ψ̄(Γ̃(E, σ)) ≤ µ(E) · λ_init · max_{σ ∈ E} Ψ̄(Γ̃(E, σ))
Let C be a collection of events in Ω and let E = S C ∈C C . Then P ( E ) ≤ λ init · µ ( E ) · max C ∈C Ψ(˜Γ( C )) Proof.
For a state σ ∈ E , there must be event C σ ∈ C holding on σ . Consider some g ∈ ˜Γ( E, σ ) ; there mustbe a state τ / ∈ E which gets mapped via g to σ . In particular, event C σ is false on τ . So g ∈ ˜Γ( C σ ) . Thisimplies that Ψ(˜Γ(
E, σ )) ≤ Ψ(˜Γ( C σ )) and so Ψ(˜Γ(
E, σ )) ≤ max C ∈C Ψ(˜Γ( C )) . The result now followsfrom Corollary 5.9.We remark that a slightly weaker version of Corollary 5.10 had been shown in [12] for the variable LLL,based on arguments specifically tailored to that space. Moser & Tardos [22] described a simple parallel version of their resampling algorithm, which can be sum-marized as follows:
Algorithm 4
Parallel Moser-Tardos algorithm
  Draw state X from distribution µ
  for k = 1, 2, ... while some bad-event is true on X do
    Select some arbitrary MIS I of bad-events true on X
    Resample, in parallel, all variables involved in events in I

A variety of problem-specific parallel resampling algorithms have been developed for other probability spaces [14, 10]. One main benefit of the commutativity property is that it enables much more general parallel implementations of Algorithm 2. As a starting point, [21] discussed a generic framework for parallelization, summarized as follows:
Algorithm 5
Generic parallel resampling framework
  Initialize the state σ
  for k = 1, 2, ... while some flaw holds on σ do
    Set V to be the set of flaws currently holding on σ
    while V ≠ ∅ do
      Select, arbitrarily, a flaw f ∈ V
      Update σ ← R_f(σ, r) for a random seed r ∼ R_f
      Remove from V all flaws g such that (i) σ ∉ g; or (ii) f ≈ g

Each iteration of the loop over k is called a round of this algorithm. We emphasize this is a sequential algorithm, which can be viewed as a version of Algorithm 2 with an unusual flaw-selection choice. If a round can be simulated in polylogarithmic time, we get an RNC algorithm to find a flawless object. Almost all known parallel local search algorithms, including Algorithm 4, fall into this framework. Harris [11] further showed a general simulation method for resampling oracles which are oblivious, as well as satisfying a few additional computational properties.

We define V_k to be the set of flaws V in round k, and we define I_k to be the set of flaws which are actually resampled at round k (i.e. the flaws f selected at some iteration of the inner loop). Note that I_k is a stable set. Let b_k = |I_1| + ⋯ + |I_k| denote the total number of resamplings during the first k rounds.

Proposition 6.1. For all f ∈ V_k with k > 1, there exists g ∈ I_{k−1} with f ≈ g. Consequently, for any f ∈ V_k, some wdag with a sink node labeled f and depth k appears.

Proof. First, suppose that f ∉ V_{k−1}. In this case, the only way f could become true at round k would be that some g ≈ f was resampled at round k − 1, i.e. g ∈ I_{k−1}. Otherwise, suppose that f ∈ V_{k−1}. Then either it was removed from V_{k−1} due to the resampling of some g ≈ f, or f became false during round k − 1. In the latter case, note that in order to later become true at the beginning of round k, there must be some other g′ ∈ I_{k−1} resampled after g with g′ ≈ f.

We now show the second claim by induction on k. For the base case k = 1, we can easily see that if f is resampled at round 1 then the wdag with a singleton node labeled f appears.

For the induction step, suppose that V_k ≠ ∅. So there is some f ∈ V_k. By our above claim, there must be some g ∈ I_{k−1} with g ≈ f. Now by the induction hypothesis there is some wdag H with sink node labeled g and depth k − 1 which appears. If we form a new dag H′ by adding a sink node labeled f, we get a wdag with depth k and sink node labeled f which appears. ∎

Proposition 6.2. Consider running Algorithm 5, obtaining trajectory T. Then, for each t in the range b_{k−1} + 1, ..., b_k, the wdag G^T_t = GenWitness(Q_1, T, t) has depth precisely k.

Proof. For each j = 1, ..., k, let us define the corresponding wdag H_j = G^T_{t,b_{j−1}}. We show by backwards induction on j that the following properties hold:

1. The depth of H_j is precisely k − j + 1.

2. The nodes v ∈ H_j with depth k − j + 1 correspond to resamplings in round j.

The base case j = k is clear, since then H_j consists of a singleton node corresponding to the resampling at time t in round k.

For the induction step, observe that we form H_j from H_{j+1} by adding nodes corresponding to resamplings in I_j. By the induction hypothesis, H_{j+1} has depth k − j. Since I_j is a stable set, we have depth(H_j) ≤ 1 + depth(H_{j+1}) = k − j + 1, and furthermore the nodes at maximal depth correspond to resamplings in I_j. By the induction hypothesis, this implies that depth(H_j) ≤ k − j + 1 and that nodes v ∈ H_j with depth k − j + 1 correspond to resamplings in round j. So we just need to show that there is at least one such node.

Consider any node v of H_{j+1} with depth k − (j + 1) + 1 and L(v) = g; by the induction hypothesis this corresponds to a resampling in round j + 1. By Proposition 6.1, we have g ≈ f_s for some time s in round j. Consider the wdag G^T_{t,s}. If v is no longer a source node in G^T_{t,s}, then the node w with an edge to v would be a node of H_j of depth k − j + 1, as desired. Otherwise, suppose that v is still a source node. Since g = L(v) ≈ f_s, our definition of Q_1 ensures that Algorithm 3 will add a node labeled f_s, which will have an edge to v. So f_s has depth k − j + 1 in H_j.

This completes the induction. The stated bound then holds since G^T_t = G^T_{t,b_{j−1}} for j = 1. ∎

Proposition 6.3. For any f ∈ F and index k ≥ 1, we have

  Pr(f ∈ V_k) ≤ ∑_{H ∈ H_{Q_1}(f): depth(H) = k} θ^⊤ A_H 𝟙

Proof. As we have discussed, Algorithm 5 can be viewed as an instantiation of Algorithm 2 with a flaw selection rule S. For a fixed f, let us define a new flaw selection rule S_f as follows: it agrees with S up to round k; it then selects f to resample at round k if it is true.

Now, let us notice that the behavior of Algorithm 2 for S and S_f is identical up through the first b_{k−1} resamplings. Furthermore, we have f ∈ V_k for Algorithm 5 if and only if Algorithm 2 with rule S_f selects f for resampling at iteration b_{k−1} + 1. Consider the resulting wdag G = GenWitness(Q_1, T, b_{k−1} + 1); by Proposition 6.2 it has depth k. Furthermore, it has a sink node labeled f. Finally, since it is produced from a trajectory corresponding to a flaw selection rule, we have G ∈ H_{Q_1}.

Thus, we see that if f ∈ V_k, then there is some H ∈ H_{Q_1}(f) with depth(H) = k which appears. To bound the probability of f ∈ V_k, we take a union bound over all such H and apply Lemma 3.3. ∎

Corollary 6.4. 1. ∑_k E[|V_k|] ≤ λ_init ∑_f Φ_{Q_1}(f).

2. For any integer t ≥ 1, the probability that Algorithm 5 runs for more than 2t rounds is at most λ_init ∑_{H ∈ H_{Q_1}: depth(H) ≥ t} w(H) / t.

Proof. First, by Theorem 4.2 and Proposition 6.3, we have for any k:

  E[|V_k|] = ∑_f Pr(f ∈ V_k) ≤ ∑_{H ∈ H_{Q_1}: depth(H) = k} λ_init w(H)

Using this bound, we compute

  ∑_k E[|V_k|] ≤ ∑_k ∑_{H ∈ H_{Q_1}: depth(H) = k} λ_init w(H) = ∑_f ∑_{H ∈ H_{Q_1}(f)} λ_init w(H) = λ_init ∑_f Φ_{Q_1}(f)

For the second claim, let Y = ∑_{k ≥ t} |V_k|. We then have:

  E[Y] = ∑_{k ≥ t} E[|V_k|] ≤ ∑_{H ∈ H_{Q_1}: depth(H) ≥ t} λ_init w(H)

Now, if Algorithm 5 reaches round 2t, then necessarily V_k ≠ ∅ for k = t, ..., 2t, and so Y ≥ t. By Markov's inequality applied to Y, we thus get

  Pr(Algorithm 5 reaches round 2t) ≤ Pr(Y ≥ t) ≤ E[Y]/t ≤ λ_init ∑_{H ∈ H_{Q_1}: depth(H) ≥ t} w(H) / t  ∎

The usual strategy to bound the sum over wdags H with depth(H) ≥ t in Corollary 6.4 is to use an "inflated" weight function defined as

  w_ε(H) = w(H) (1 + ε)^{|H|} = ∏_{v ∈ H} ((1 + ε) γ_{L(v)})

and the corresponding sum

  W_ε = λ_init ∑_{H ∈ H_{Q_1}} w_ε(H),

for some ε > 0. This gives the following results:

Proposition 6.5. With probability at least 1 − δ, Algorithm 5 terminates in O(log(1/δ + εW_ε)/ε) rounds and has ∑_k |V_k| ≤ O(W_ε/δ). Furthermore, if the resampling oracle is regenerating and satisfies the computational requirements given in [11] for input length n, then with probability 1 − 1/poly(n) the algorithm of [11] terminates in O(log²(n + εW_ε)/ε) time on an EREW PRAM.

Proof. We show only the first result; the second depends on numerous definitions and results of [11]. For the bound on ∑_k |V_k|, simply use Corollary 6.4(1) and Markov's inequality.
For the bound on thenumber of rounds, we calculate X H ∈ H Q depth( H ) ≥ t w ( H ) = λ init X H ∈ H Q depth( H ) ≥ t w ǫ ( H )(1 + ǫ ) −| H | ≤ λ init (1 + ǫ ) − t X H ∈ H Q w ǫ ( H ) = λ init (1 + ǫ ) − t W ǫ By Corollary 6.4(2), we thus need λ init (1 + ǫ ) − t W ǫ /t ≤ δ to ensure termination by round t withprobability at least δ . Straightforward analysis shows that this holds for t = O ( log(1 /δ + ǫW ǫ ) ǫ ) .Bounding W ǫ is very similar to bounding P H w ( H ) = W , except with a small “slack” in the charges.More specifically, we need to satisfy Proposition 4.5 except with the charges γ f replaced with inflated values (1 + ǫ ) γ f . For example, using standard estimates (see [8, 21, 3]) we can get the following simplified bounds: Proposition 6.6. 1. Suppose that the resampling oracle is regenerating and that the vector of probabil-ities p (1 + ǫ ) still satisfies the LLLL criterion. Then W ǫ/ ≤ O ( m/ǫ ) . In particular, Algorithm 5terminates after O ( log( m/δ ) ǫ ) rounds with probability − δ .2. Suppose that γ f ≤ p and | Γ( f ) | ≤ d such that epd (1 + ǫ ) ≤ . Then W ǫ/ ≤ O ( λ init m/ǫ ) .Algorithm 5 terminates after O ( log( λ init m/δ ) ǫ ) rounds with probability at least − δ . Compositional properties for resampling oracles In many applications, the flaws and their resampling oracles are built out of a collection of simpler, “atomic”events. For example, in the permutation LLL setting, these would be events of the form πx = y . In [11],Harris described a generic construction for this type of composition when the atomic events satisfy anadditional property referred to as obliviousness . Let us now review this construction, and how it works withT-commutativity.Consider a set A of events, along with a resampling oracle R and a dependency relation ∼ . The set A should be thought of as “pre-flaws”, that is, it has all the structural algebraic properties of a resamplingoracle, but does not necessarily satisfy any convergence condition such as the LLLL. If we run the localsearch algorithm with this set of flaws A , the algorithm will likely not converge. Definition 7.1 (Oblivious resampling oracle) . The resampling oracle is oblivious if for each pair f, g ∈ A with f g and each r ∈ R f one of the following two conditions holds: • For all σ ∈ f ∩ g we have R f ( σ, r ) ∈ g • For all σ ∈ f ∩ g we have R f ( σ, r ) g Let us now suppose that this condition holds. For each f ∈ A and g , . . . , g s ∈ A with g i f , wedefine R f ; g ,...,g s to be the set of values r ∈ R f such that R f ( σ, r ) ∈ g ∩ · · · ∩ g t . With some abuse ofnotation, we also use R f ; g ,...,g s to refer to the probability distribution of drawing r from R f , conditionedon having r in the set R f ; g ,...,g s . Note that in light of Definition 7.1 this is well-defined irrespective of σ .For each stable subset E , we define h E i to be the intersection of the events in E i.e. h E i = T f ∈ E f .Given the set A , one can construct a new set of events A as A = {h E i | E a stable subset of A} The intent is to choose the flaw set F to be some arbitrary subset of A . To that end, we will showthat A has a resampling oracle which satisfies all its required structural properties. Again, the local searchalgorithm will not necessarily converge using the full flaw set A ; we must use some more problem-specificarguments to show that our chosen subset F satisfies the required convergence properties.We first define the resampling oracle R on A . 
Consider some g = h E i for a stable set E , and arbitrarilyenumerate E as E = { f , . . . , f t } . We define R g to the probability distribution on tuples ( r , . . . , r t ) wherein r i is drawn independently from R f i ; f i +1 ,...,f s . For r = ( r , . . . , r t ) , we then set rσ = R g ( σ, ( r , . . . , r t )) = r t . . . r σ Finally, we define the relation ∼ on A by setting h E i ∼ h E ′ i iff there exist f ∈ E, f ′ ∈ E ′ with f ∼ f ′ . Theorem 7.2 ([11]) . Suppose that R is an oblivious resampling oracle for A , which is not necessarilyT-commutative. Then:1. R is an oblivious resampling oracle for A .2. The relation ∼ is a dependency relation for A .3. If the resampling oracle R on A is regenerating, then the resampling oracle on A is also regenerating.4. If the resampling oracle R on A is injective, then the resampling oracle on A is also injective. 21t would seem reasonable that if A is commutative in the sense of Kolmogorov, then A would be aswell. Unfortunately, we do not know how to show such a result. We can show, however, that if A is T-commutative, then A is as well, plus inheriting further nice properties. This is a good illustration of how thenew definition of commutativity is easier to work with, beyond its advantage of greater generality. Proposition 7.3. Suppose that A is not necessarily T-commutative. For a given flaw g = h E i , let ussuppose that, in order to define R g , we have enumerated the stable-set E as E = { f , . . . , f t } . Then A g = cA f . . . A f t where scalar c is given by c = t Y i =1 r i ∼ R fi ( r i ∈ R f i ; f i +1 , . . . , f t ) Proof. By definition of R g , we have A g [ σ, σ ′ ] = Pr ( r ,...,r t ) ∼ R g ( r t . . . r σ = σ ′ ) . Let us define S i = R f i ; f i +1 ,...,f t . Note that each r i is drawn independently from S i . So can further decompose this sum in termsof the intermediate values σ i = r i . . . r σ for i = 0 , . . . , t (where σ = σ ) as follows: A g [ σ, σ ′ ] = X σ ,...,σ t σ t = σ ′ t Y i =1 Pr r i ∼ S i ( r i σ i − = σ i ) (6)Now, suppose that we have σ i / ∈ f j for some j > i . In this case, the term Pr r i ∼ S i ( r i σ i − = σ i ) inEq. (6) must be zero, since r i ∈ S i = R f i ; f i +1 ,...,f t . Also, since ∼ is a dependency relation and we have f k f i for k = i + 1 , . . . , j , it must be that σ k / ∈ f j for all k = i + 1 , . . . , j as well. Consequently, σ j − / ∈ f j and so the term A f j [ σ j − , σ j ] in Eq. (6) is equal to zero.So we may assume that σ i ∈ f i +1 ∩ · · · ∩ f t for all i = 0 , . . . , t . In this case, we have Pr r i ∼ R i ( r i σ i − = σ i ) = A f i [ σ i − , σ i ] for all i . Furthermore, since R is oblivious, any r i ∈ R i which satisfies r i σ i − = σ i must also lie in the set R f i ; f i +1 ,...,f t = S i . Therefore, we have Pr r i ∼ S i ( r i σ i − = σ i ) = Pr r i ∼ R i ( r i σ i − = σ i ∧ r i ∈ S i )Pr r i ∼ R i ( r i ∈ S i ) = Pr r i ∼ R i ( r i σ i − = σ i )Pr r i ∼ R i ( r i ∈ S i ) = A f i [ σ i − , σ ]Pr r i ∼ R i ( r i ∈ S i ) Substituting into Eq. (6), we get: A g [ σ, σ ′ ] = X σ ,...,σ t σ t = σ ′ A f [ σ , σ ] . . . , A f t [ σ t − , σ t ] Q ti =1 Pr r i ∼ R i ( r i ∈ S i ) = P σ ,...,σ t σ t = σ ′ A f [ σ , σ ] . . . , A f t [ σ t − , σ t ] Q ti =1 Pr r i ∼ R i ( r i ∈ S i )= c ( A f . . . A f t )[ σ, σ ′ ] Theorem 7.4. If the resampling oracle is T-commutative on A , then it is also T-commutative on A .Proof. Let g = h E i and g ′ = h E ′ i for stable sets E, E ′ such that g g ′ . So f f ′ for all f ∈ E and f ′ ∈ E ′ . 
By Proposition 7.3 we have A g A g ′ = c g c g ′ (cid:16) Y f ∈ E A f Y f ′ ∈ E ′ A f ′ (cid:17) , A g ′ A g = c g ′ c g (cid:16) Y f ′ ∈ E ′ A f ′ Y f ∈ E A f (cid:17) All these matrices A f , A f ′ commute with each other, so both quantities are equal.22 Necessity of T-commutativity for Lemma 3.3 Below we assume that B ∗ is a set of events that comes with a dependency relation ∼ and resampling oracles R f for each f ∈ B ∗ . Definition A.1. B ∗ is called complete if for each σ ∈ Ω there exists a flaw h σ = { σ } ∈ B ∗ , and with h σ ∼ g for all g ∈ B ∗ . Note that this definition is satisfied if B ∗ is generated by atomic events corresponding to permutations,perfect matchings of hypergraphs, or spanning trees. We show now that if T-commutativity fails in suchspaces, even for a single pair of flaws, then some wdags may appear with probability arbitrarily higher thantheir weight. Theorem A.2. Let B ∗ be a complete set of regenerating oracles, and suppose that there exists f, g ∈ B ∗ with f ≁ g and A f A g = A g A f . Then for any C > there exists a set of flaws B ⊆ B ∗ with |B| = 3 ,wdag H with a single sink and a flaw resampling strategy S such that the probability that H appears in theexecution of the algorithm at least C · w ( H ) = C · Q v ∈ H µ ( L ( v )) .Proof. Consider states σ, τ with A f A g [ σ, τ ] = A g A f [ σ, τ ] . Denote x = A f A g e τ and y = A g A f e τ , then x [ σ ] = y [ σ ] . Assume w.l.o.g. that x [ σ ] < y [ σ ] . Note that µ ⊤ A f A g = µ ⊤ A g A f = γ f γ g · µ ⊤ since theoracles are regenerating, and therefore µ ⊤ x = µ ⊤ y = γ f γ g · µ [ τ ] = γ f γ g γ h .Consider the following strategy S given a current state σ : (i) if σ = σ then prioritize flaws f, g, h atsteps 1,2,3 respectively; (ii) if σ = σ then prioritize flaws g, f, h at steps 1,2,3 respectively. We say that therun succeeds if the sequence of addessed flaws is ( f, g, h ) in the first case and ( g, f, h ) in the second case.Clearly, the probability of success equals e ⊤ σ A f A g e τ = e ⊤ σ x in the first case and e ⊤ σ A g A f e τ = e ⊤ σ y in thesecond case. If σ is distributed according to µ then the probability of success is p = µ [ σ ] · e ⊤ σ y + X σ ∈ Ω −{ σ } µ [ σ ] · e ⊤ σ x = µ [ σ ] · ( e ⊤ σ y − e ⊤ σ x ) + X σ ∈ Ω µ [ σ ] · e ⊤ σ x = µ ⊤ x + µ [ σ ] · ( y [ σ ] − x [ σ ]) > γ f γ g γ h Furthermore, if the run succeeds then the last state is distributed according to µ (since step 3 resamples h atstate τ , and the oracles are regenerating).Now consider the trajectory T which repeats the sequence f, g, h for n times, and the correspondingwdag H = G T n which has a single sink node labeled h . Let S n be the strategy S repeated cyclically. Fromthe previous paragraph, the probabality that the run starting with some distribution θ produces H is givenby c θ · p n − , where c θ depends only on the initial distribution. Note that w ( H ) = ( γ f γ g γ h ) n . Choosing asufficiently large n now gives the claim. B The variable LLLL Consider the probability space Ω consisting of n independent variables X , . . . , X n , and where every flaw f ∈ F is a monomial term of the form X i = j ∧ · · · ∧ X i k = j k We define var( f ) = { i , . . . , i k } , and we also say that f demands X i ℓ = j ℓ for each ℓ = 1 , . . . , k .When discussing wdags, etc. we often write var( v ) as shorthand for var( L ( v )) .This gives a regenerating oracle, and the LLLL applies to this space. 
However, indeed, [10] showed astronger convergence could be used for this space, based on a condition called “orderability.” This couldlead to tighter bounds for applications such as k -SAT with bounded variable occurences.23he analysis given in [10] was somewhat ad hoc; in addition, the condition given was not clearly com-parable in strength to other existing LLL conditions. In this section, we show this orderability conditionfollows automatically from the generic construction of wdags in Section 3. Definition B.1 ([10]) . Consider a flaw f and set I ⊆ Γ( f ) . We say that I is v-orderable to f if either I = { f } or there is an ordering I = { g , . . . , g ℓ } such that, for p = 1 , . . . , ℓ , the flaw g p disagrees with f on some variable i p but the flaws g , . . . , g p − do not disagree with f on variable i p .We define Ord ( f ) to be the collection of v-orderable sets for f . (This is closely related, but not exactlyequivalent, to the definition of orderability given in Section 5) Theorem B.2 ([10]) . 1. Suppose that there is some function η : F → [0 , ∞ ) satisfying the condition: ∀ f η ( f ) ≥ γ f X I ∈ Ord ( f ) Y g ∈ I η ( g ) Then Algorithm 2 converges, and the expected number of resamplings of any flaw f is at most η ( f ) .2. Suppose that there is some function η : F → [0 , ∞ ) satisfying the condition: ∀ f η ( f ) ≥ (1 + ǫ ) γ f X I ∈ Ord ( f ) Y g ∈ I η ( g ) Then there is a parallel implementation of Algorithm 2 converging in O ( log( m P f η ( f )) ǫ ) rounds w.h.p. To show this, we will show that a wdag in H Q corresponds to a type of “modified witness tree”. Todescribe this, consider a tree K whose nodes are labeled by flaws f . We define L ( v ) to be the label of anode v and C ( v ) = { L ( w ) : w a child of v } . The key invariant we maintain is that C ( v ) is v-orderable to L ( v ) for all nodes v ; we define T to be the collection of all labeled trees which satisfy this property. We saythat a flaw f is eligible for v if f / ∈ C ( v ) and C ( v ) ∪ { f } remains v-orderable to L ( v ) .In parallel to Algorithm 3 to construct wdag G Tt,s , consider the following procedure to generate a tree K Tt,s ∈ T . Initially, K Tt,t consists of a singleton node labeled f , where f is the corresponding label of theroot of G Tt,t − . At step s < t , we obtain K Tt,s − from K Tt,s as follows. If G Tt,s − = G Tt,s , we do nothing.Otherwise, if G Tt,s − is obtained by adding a source node labeled f , we select the node w ∈ K Tt,s of greatestdepth such that f is eligible to w , and add a new child of w labeled f . (If there is no such node, we seed K Tt,s = ⊥ ; as we will show, this never occurs).To analyze the evolution of this process, we define the Active Condition imposed by a wdag or a modifiedwitness tree. Specifically, for a variable i and value j , we say that ( i, j ) ∈ Active( G ) if there is a node v ∈ G such that L ( v ) demands X i = j , and there is no node w ∈ G with depth( w ) > depth( v ) and i ∈ var( w ) .(This definition applies both if G is a wdag or witness tree). If K = ⊥ , we say formally that Active( K ) = ∅ .The main motivation for this construction is the following result, which is easily shown by induction H : Proposition B.3. For a state σ and wdag H , we have e ⊤ σ A H ~ if there is some i with σ [ i ] = j but ( i, j ′ ) ∈ Active( H ) for j ′ = j ; otherwise, we have e ⊤ σ A H ~ w ( H ) Q ( i,j ) ∈ Active( H ) p ij . Proposition B.4. 
A flaw f is dominated by a wdag H if and only if the following condition holds: there isno pair ( i, j ) such that f demands X i = j and ( i, j ′ ) ∈ Active( H ) for j ′ = j .Proof. Define H ′ to be the wdag obtained by adding a source node labeled f . So f is dominated by H iff A H ′ ~ (cid:22) A H ~ .For the forward direction, suppose that f demands X i = j and ( i, j ′ ) ∈ Active( H ) for j ′ = j .Consider a state σ chosen to agree with all the active conditions in H ′ ; in particular, σ [ i ] = j . (It can24ake arbitrary value on variables not covered by Active( H ′ ) ). By Proposition B.3, we have e ⊤ σ A H ′ ~ w ( H ′ ) Q ( i,j ) ∈ Active( H ′ ) p ij > . On the other hand, σ disagrees with Active( H ) on variable i , so by Proposition B.3we have e ⊤ σ A H ~ .For the reverse direction, suppose there is no such pair ( i, j ) . Let f ≡ X i = j ∧ · · · ∧ X i k = j k where ( i , j ) , . . . , ( i r , j r ) ∈ Active( H ) and variables i r +1 , . . . , i k are not constrained by Active( H ) . So then Active( H ′ ) = Active( H ) ∪ { ( i r +1 , j r +1 ) , . . . , ( i k , j k ) } . Consider an arbitrary state σ . If σ disagrees with Active( H ′ ) , then e ⊤ σ A H ~ and so the required inequality holds. Otherwise, by Proposition B.3 we have e ⊤ σ A H ′ ~ w ( H ′ ) Q ( i,j ) ∈ Active( H ′ ) p ij = w ( H ) Q kℓ =1 p i ℓ j ℓ Q kℓ = r +1 p i ℓ j ℓ × Q ( i,j ) ∈ Active( H ) p ij ≤ w ( H ) Q ( i,j ) ∈ Active( H ) p ij Since Active( H ) ⊆ Active( H ′ ) , the state σ also agrees with the active conditions of H . So by Propo-sition B.3, we also have e ⊤ σ A H ~ w ( H ) Q ( i,j ) ∈ Active( H ) p ij . This shows that e ⊤ σ A H ′ ~ ≤ e ⊤ σ A H ~ as desired. Proposition B.5. For any variable i , there is at most one value j such that ( i, j ) ∈ Active( G Tt,s ) or ( i, j ) ∈ Active( K Tt,s ) .Proof. The result for Active( G Tt,s ) follows immediately from the fact that the vertices at the same depth of G cannot have labels which disagree on a variable i .For the second result, suppose that ( i, j ) , ( i, j ′ ) ∈ Active( K Tt,s ) for some variable i and j = j ′ . For, ifso, there must be two nodes w, w ′ at the same depth such that L ( w ) demands X i = j and L ( w ′ ) demands X i = j ′ . Suppose that w was added at time s and w ′ was added at time s ′ < s . Furthermore, there cannotbe any nodes at greater depth constraining i .Now note that, when adding node w ′ , we would have L ( w ′ ) eligible for w . The reason is that none ofthe children of w constrain variable i , and yet L ( w ) , L ( w ′ ) disagree on i . Thus, w ′ should be placed eitheras a child of w , or at another location of greater depth. In either case, the depth of w ′ should be greater thanthat of w , a contradiction. Proposition B.6. If K Tt,s = ⊥ , then we have Active( G Tt,s ) = Active( K Tt,s ) Proof. Suppose we add a node v with label f ≡ X i = j ∧ · · · ∧ X i k = j k to G Tt,s . This will cause ( i , j ) , . . . , ( i k , j k ) to be added to Active( G Tt,s ) compared to Active( G Tt,s +1 ) ; it may also remove somevalues ( i ℓ , j ′ ) from Active( G Tt,s +1 ) . In light of Proposition B.5, it suffices to show that we also have ( i ℓ , j ℓ ) ∈ Active( K Tt,s ) for each ℓ = 1 , . . . , k .We add to K Tt,s a new node v labeled f . This node v demands X i ℓ = j ℓ . Let w denote the node of K Tt,s +1 of greatest depth such that i ℓ ∈ var( w ) ; if there is no such node, then clearly v already satisfiesthe criteria for active conditions in K Tt,s and so ( i ℓ , j ℓ ) ∈ Active( K Tt,s ) . 
If L ( w ) demands X i ℓ = j ℓ , thenalready ( i ℓ , j ℓ ) ∈ Active( K Tt,s ) and we are done. Otherwise, L ( w ) disagrees with f on i ℓ . But then notethat f would be eligible to be a child of w , since none of the children of w constraint i ℓ . Since v is chosento have maximum depth, we must have depth( v ) > depth( w ) .We see that L ( v ) demands X i ℓ = j ℓ and there is no node w of greater depth with i ℓ ∈ var( w ) . So ( i ℓ , j ℓ ) ∈ Active( K Tt,s ) .We are now ready to show, as promised, that there is always at least one node of K Tt,s which is eligiblefor a given flaw f to be placed. Proposition B.7. Suppose that G Tt,s is obtained from G Tt,s +1 by adding a node labeled f . Then there is somenode v ∈ K Tt,s +1 eligible for f . roof. First, suppose that f ∈ Q ( G Ts +1 ) , and so G Tt,s +1 contains a source node w labeled g ≈ f . This nodenode w must correspond to some node w ′ ∈ K Tt,s +1 . Furthermore, w ′ must be a leaf node, as any child z ′ of w ′ would correspond to a node z ∈ G Tt,s +1 such that L ( z ) ≈ L ( w ) and z comes earlier in time than w .Thus, C ( w ′ ) = ∅ in K Tt,s +1 and f ≈ L ( w ′ ) , so w ′ is eligible for f and we are done.Next, suppose f / ∈ Q ( G Ts +1 ) and that f is not dominated by G Tt,s +1 . By Proposition B.4, for this tooccur, there must be some variable i such that f demands X i = j but ( i, j ′ ) ∈ Active( G Tt,s +1 ) where j ′ = j . By Proposition B.6, we also have ( i, j ′ ) ∈ Active( K Tt,s +1 ) . So, consider some node v in K Tt,s +1 such that i ∈ var( v ) , and v has the greatest depth among all such nodes, and L ( v ) demands X i = j ′ . Now f is eligible for v .This shows that K Tt,s is never equal to ⊥ . Also, it immediately implies that, for any flaw f , the numberof nodes of G Tt,s labeled f is precisely the same as the number of nodes of K Tt,s labeled f and in particular K T t, s and G Tt,s have the same number of nodes. Next, we need to show that it is possible to reconstruct G Tt given K Tt . Proposition B.8. Consider a trajectory T = ( f , . . . , f t ) such that G Tt contains t nodes, and let v be a nodeof K Tt of maximum depth with g = L ( v ) . Let s be minimum such that f s = g .For the trajectory T ′ = ( f , . . . , f s − , f s +1 , . . . , f t ) we have K T ′ t − = K Tt − v . Also, G Tt is obtainedfrom G T ′ t − by adding a new source node labeled g .Proof. Clearly, v is a leaf node of K Tt . Suppose that it corresponds to node w ∈ G Tt . Then w must be asource node of G Tt ; for, if there were some w ′ with an edge to w in G Tt , then L ( w ′ ) would be eligible for v ,and so K Tt would have an additional node v ′ of greater depth.Let us observe that f r g for r = 1 , . . . , s − ; for, suppose that r is maximum such that f r ≈ g and r < s . Considering forming G Tt,r and corresponding K Tt,r . Since f r ≈ L ( w ) for a source node w of G Tt,r ,we will add a new node labeled f r into G Tt,r , contradicting that w is a source node of G Tt,r .We will show by induction on r that K T ′ t − ,r = K Tr − v and G T ′ t − ,r = G Tt,r − w hold for r ≤ s . The case r = s holds since K T ′ t − ,s = K Tt,s +1 = K Tt,s − v as well as G T ′ t − ,s = G Tt,s +1 = G Tt,s − w , since the trajectory T and T ′ agree on all positions beyond s (with a shift). For brevity we write H r = G Tt,r and H ′ r = G T ′ t − ,r .We now turn to the induction step. Since G Tt has size t , it must be that H r − gains a node at iteration r when procesing f r . We claim that H ′ r − also does so. 
First, suppose that H r has a source node z with f r ≈ L ( z ) . By our discussion above, we cannot have z = w . Since H ′ r = H r − w , this node z remains asource node in G T ′ t,r ; so f r ≈ L ( z ) and so f r will be added to H ′ r to get H ′ r − .Next, suppose that f r is not dominated by H r . By Proposition B.4, there must be a node z ∈ H r suchthat L ( z ) disagrees with f r on the value of some variable i , and z has greatest depth in H r among all nodes z ′ such that i ∈ var( z ′ ) . Now if z = w , we would have g = L ( w ) = L ( z ) ≈ f r , which cannot occur.Thus, z remains in H r − w , which by induction hypothesis is H ′ r . Furthermore, z still has maximum depthin H r − w among all nodes z ′ with i ∈ var( z ′ ) . Thus, by Proposition B.4, f r is not dominated by H ′ r , andso again f r is added to H ′ r to get H ′ r − .Thus H ′ r − is obtained from H ′ r by adding a new source node labeled f r . Likewise H r − is obtained by H r by adding a new source node labeled f r . Since f r L ( w ) , we thus preserve that H ′ r − = H r − − v .Next, when forming K Tt,r − from K Tt,r , we are going to add a new node labeled f r as a child of somenode z for which f r is eligible. Since f r g , the node v is not such an eligible node. Thus, node z is alsoa node of K Tt,r − − v , which by induction hypothesis is K T ′ t − ,r − . So, in forming K T ′ t − ,r − , we will add anode in the same position as when forming K Tt,r . Thus, we preserve that K T ′ t − ,r − = K Tt,r − − v . Proposition B.9. Consider a pair of trajectories T , T of length t such that K T t = K T t = K and the tree K has t nodes. Then G T t = G T t . roof. We show this by induction on t . When t = 0 this holds vacuously. Otherwise, choose a node v of K of maximum depth, and let g = L ( v ) . Let T ′ , T ′ be obtained by deleting the earliest position in T ′ , T ′ labeled g . By Proposition B.8, we have K T ′ = K T − v = K − v = K T − v = K T ′ . Furthermore,trajectories T ′ , T ′ have length t − . So, by induction hypothesis, we have G T ′ t − = G T ′ t − .By Proposition B.8, G T t is obtained from G T ′ t − by adding a source node labeled g , and the same alsoholds for G T t . So G T t = G T t . Proposition B.10. For any flaw f , there is an injective mapping F from wdags H ∈ H Q ( f ) to trees F ( H ) ∈ T with root node labled f , and such that w ( H ) = w ( F ( H )) .Proof. Consider some wdag H ∈ H Q ( f ) and some arbitrary trajectory T such that H = G Tt . Let r = | H | .There must be a subsequence T ′ of length r such that G T ′ r ; namely, T ′ corresponds to all the times i when anode was actually added to G Tt,i . Choose an arbitrary such sequence T ′ , and define now F ( H ) = K T ′ r .It is clear that w ( F ( H )) = w ( K T ′ r ) = w ( G T ′ r ) = w ( G Tt ) and that the image of G lies in T , and that theroot node of F ( H ) has the same label as the sink node f of H . Note that F ( H ) has r nodes.We need to show that F is injective. Suppose that F ( H ) = F ( H ) = K . So F ( G T t ) = F ( G T t ) = K for some trajectories T , T . Let T ′ = T ′ be the corresponding minimal subsequences of length r , so that F ( G T ′ r ) = F ( G T ′ r ) = K . By Proposition B.9, we have G T ′ r = G T ′ r . So G T t = G T t and H = H .This means that Theorem B.2 is an immediate consequence of our existing framework. Theorem B.11. Under the condition of Theorem B.2, we have Φ Q ( f ) ≤ η ( f ) for any flaw f . In particular,Algorithm 2 converges and the expected number of resamplings of f is at most η ( f ) .Proof. 
By Proposition B.10, we have P H ∈ H Q ( f ) w ( H ) ≤ P H ∈ H Q ( f ) w ( F ( H )) , and this is at most P K w ( K ) where K ranges over witness trees with root node labeled f . Using standard Galton-Watson-type estimates,[10] shows that this sum is at most η ( f ) .We can now use these results to show Theorem B.2. For the first result, using Q = Q , we have P H ∈ H Q ( f ) w ( H ) ≤ η ( f ) for any flaw f . In particular, Algorithm 2 converges and the expected number ofsamplings of f is at most η ( f ) . For the second result, using Q = Q , we have P H ∈ H Q ( f ) w ǫ ( H ) ≤ η ( f ) ,and so W ǫ ≤ P f η ( f ) . Thus, Algorithm 5 converges in O ( log( m P f η ( f )) ǫ ) rounds w.h.p.This also gives a related distributional result: Theorem B.12. Under the condition of Theorem B.2, for any monomial event E we have P ( E ) ≤ µ ( E ) X I ∈ Ord ( E ) Y g ∈ I η ( g ) Proof. By Proposition 5.1, we have P ( E ) ≤ P H ∈ H E θ ⊤ A H ~ . Consider now some H ∈ H E . This has asingle sink node v labeled f E . Note that the expanded set of flaws F E also consists solely of monomialevents, and so all the bounds here apply to it as well. Thus, the mapping F of Proposition B.10 maps it to atree T with root node r labeled f E , and which has the same node labels. Let v , . . . , v s denote the childrenof r with labels g , . . . , g s respectively. By definition, these have distinct labels and set I = { g , . . . , g s } isin Ord ( E ) . Since the definition of T is a tree property, the subtrees T , . . . , T k of v , . . . , v k are themselvesin T . Also, we have θ ⊤ A H ~ ≤ w ( H − v ) µ ( E ) = w ( T ) · · · w ( T k ) µ ( E ) .By injectivity of F , therefore, the sum P H ∈ H E θ ⊤ A H ~ is at most the sum of w ( T ) · · · w ( T k ) µ ( E ) overall possible choices for T , . . . , T k . For a fixed choice of g i , arguments of [10] shows that the sum of w ( T i ) is at most η ( g i ) . So, the overall sum here is at most P I ∈ Ord ( E ) Q g ∈ I η ( g ) .We remark that it would be possible to show Theorem B.12 directly using arguments of [10], but it isnice to know that it follows from the general framework of Section 5 as well.27 Distributional bound for permutation LLLL: Proof of Theorem 5.5 Consider the setting where Ω is the set of permutations on n letters and A is the set of atomic events πx = y ,which we also write h x, y i . The resampling oracle here, for such an event, is to update the state π ← ( y z ) π ,where z is uniformly drawn [ n ] .Throughout this section, let us fix event E = h C i , for a stable set C . Consider a stable set I ⊆ A . Wecan form a bipartite graph G I,E as follows: the left nodes correspond to C , and the right nodes correspondto I . There is an edge from ( x, y ) to ( x ′ , y ′ ) if x = x ′ or y = y ′ . (In this case, for brevity, we write ( x, y ) ∼ ( x ′ , y ′ ) .) This is slightly denser than the standard dependency graph here (which would have ( x, y ) ( x, y ) ), but it simplifies a number of formulas and has little impact in applications.We make a few simple observations about this graph. Since I and C are stable, the graph G I,E hasdegree at most two. (Each ( x ′ , y ′ ) on the left may have an edge to some node ( x ′ , y ) and to an a node of theform ( x, y ′ ) , but no other nodes). So, G I,E decomposes into paths and cycles.We define the active conditions Active( I ) as follows. First, for each left-node ( x ′ , y ′ ) , we have ( x ′ , y ′ ) ∈ Active( I ) . 
Next, consider some maximal path starting and ending at right nodes (which we call a right path ).The path can be written (in one of its two orientations) as ( x , y ) , ( x , y ) , ( x , y ) , . . . , ( x k , y k − ) , ( x k , y k ) .In this case, we also have an active condition ( x k , y ) in Active( I ) . (It is possible that k = 1 here, in whichcase ( x , y ) is an isolated right node of G I,E .) We say that a permutation π satisfies I if πx = y for all ( x, y ) ∈ Active( I ) . We also write a ( I ) = | Active( I ) | .The explanation for active conditions comes from the following observation: Proposition C.1. Consider a state π ∈ Ω . If π does not satisfy I , then e ⊤ π A I e E = 0 . Otherwise, we have e ⊤ π A I e E = ( n − | C | )! n | I | ( n − a ( I ))! Proof. We show this by induction on I . In the base case I = ∅ , A I is the identity, and note that Active( I ) issimply the set C . In this case, e ⊤ π A I e E is simply the indicator function that π ∈ E , which holds iff πx = y for all ( x, y ) ∈ Active( I ) iff πx = y for all ( x, y ) ∈ C .For the induction step, let us first show that e ⊤ π A I e E = 0 if π does not satisfy I . First, suppose that πx = y for some f = h x, y i ∈ I . In this case, we can write A I = A f A I − f . Since e ⊤ π A f = 0 , we clearlyhave e ⊤ π A I e E = 0 .Next, suppose that πx = y k where ( x , y ) , ( x , y ) , ( x , y ) , . . . , ( x k , y k − ) , ( x k , y k ) is a path startingat right-node ( x , y ) and ending at ( x k , y k ) . If k = 1 , then none of the flaws f ∈ I are neighbors of event ( x , y ) , and in particular if ( x , y ) is false on π then it is also false after resampling them all. So in thiscase, again if πx = y k then e ⊤ π A I e E = 0 .So let us assume that k > , and hence we know ( x , y ) is a left-node. Thus, as discussed above, wemust have πx = y . We can write A I = A f A I − f e E where f = h x , y i , and so e ⊤ π A I e E = X π ′ A f [ π, π ′ ] e ⊤ π ′ A I − f e E . Consider some possible state π ′ here which can be obtained from π by resampling f . By inductionhypothesis, the summand is zero unless π ′ satisfes I − f . Removing f from I changes the active conditions:now ( x , y ) becomes an isolated node in G I − f,E , and there is a new maximal path starting at ( x , y ) whichgives rise to an active condition ( x k , y ) .We know that π ′ = ( y z ) π for some value z . We also know that πx = y , π ′ x = y . This meansthat we must have z = y and hence π ′ = ( y y ) π . Also, π ′ must satisfy the active conditions π ′ x k = y .Hence πx k = ( y y ) π ′ x k = ( y y ) y = y as desired.So we have shown that the desired bound holds if π does not satisfy I . Suppose now that π satisfies I .There are a number of possible cases here: 28. Suppose as before there is some maximal-length path ( x , y ) , ( x , y ) , ( x , y ) , . . . , ( x k , y k − ) , ( x k , y k ) starting and ending at right-nodes and k > . In this case, by the above argument, letting f = h x , y i ,we have again: e ⊤ π A I e E = X π ′ A f [ π, π ′ ] e ⊤ π ′ A I − f e E . and, as we have discussed, there is only possible non-zero summand, corresponding to π ′ = ( y y ) π .By similar reasoning, we can see that π ′ satisfies I − f , and hence by induction hypothesis we have e ⊤ π A I e E = A f [ π, π ′ ] ( n − | C | )! n | I − f | ( n − a ( I − f ))! Here, a ( I − f ) = a ( I ) and A f [ π, π ′ ] = 1 /n . So this is ( n −| C | )! n | I | ( n − a ( I ))! , as claimed.2. Suppose there is some maximal path in G I,E of the form ( x , y ) , ( x , y ) , . . . 
, ( x k − , y k ) , ( x k , y k ) where ( x , y ) is a left node and ( x k , y k ) is a right node (possibly k = 2 ). Letting f = h x , y i ∈ I ,we have again: e ⊤ π A I e E = X π ′ A f [ π, π ′ ] e ⊤ π ′ A I − f e E . Consider π ′ = ( y z ) π . By induction hypothesis, π ′ must satisfy Active( I − f ) . Removing f destroys the active condition ( x , y ) but adds a new active condition ( x k , y ) corresponding to themaximal path ( x , y ) , . . . , ( x k , y k ) . This is the only condition that could possibly be affected in π ′ .There is exactly one value z which satisfies this condition. as π ′ x k = y iff ( y z ) πx k = y iff πx k = z . As a ( I − f ) = a ( I ) again, the calculation is precisely analogous to the previous case.3. Suppose that G I,E has a maximal path ( x , y ) , ( x , y ) , . . . , ( x k − , y k ) , ( x k , y k ) , ( x k , y k +1 ) where ( x , y ) is a right node and ( x k , y k +1 ) is a left node. Letting f = h x k , y k +1 i ∈ I , we have again: e ⊤ π A I e E = X π ′ A f [ π, π ′ ] e ⊤ π ′ A I − f e E . Consider π ′ = ( y k +1 z k +1 ) π . By induction hypothesis, π ′ must satisfy Active( I − f ) . Removing f destroys the active condition ( x k , y k +1 ) but adds a new active condition ( x k , y ) corresponding tothe maximal path ( x , y ) , . . . , ( x k , y k ) . This is the only condition that could possibly be affected in π ′ . Again, there is exactly one value z k +1 which satisfies this condition and a ( I − f ) = a ( I ) ; thecalculation is analogous to the previous case.4. Suppose that G I,E has a maximal path ( x , y ) , ( x , y ) , . . . , ( x k − , y k ) , ( x k , y k ) , ( x k , y k ) where ( x , y ) and ( x k , y k ) are right nodes. Letting f = h x , y i ∈ I , we have again: e ⊤ π A I e E = X π ′ A f [ π, π ′ ] e ⊤ π ′ A I − f e E . Removing this f destroys an active condition ( x , y ) and adds no new ones. Furthermore, in orderfor π ′ = ( y z ) π to satisfy an active condition ( x, y ) ∈ Active( I − f ) , we must have z = y (ascurrently πx = y . Thus, there are precisely n − a ( I − f ) choices for z .By induction hypothesis, for each such choice of π ′ , the value of e ⊤ π ′ A I − f e E is ( n −| C | )! n | I − f | ( n − a ( I − f ))! = ( n −| C | )! n | I |− ( n − a ( I )+1)! . Summing over the n − a ( I ) + 1 choices for z , we get a total probability of ( n − a ( I ) + 1) × n × ( n − | C | )! n | I |− ( n − a ( I ) + 1)! = ( n − | C | )! n | I | ( n − a ( I ))! roposition C.2. If f ∈ I , then e ⊤ π A f A I e E = n e ⊤ π A I e E for any state π .Proof. Let f = h x, y i , and consider state π . If πx = y , then both LHS and RHS are equal to zero (since ( x, y ) is an active condition of I ). Suppose that π satisfies πx = y , and we resample π to π ′ = ( y z ) π . Wecan write e ⊤ π A f A I e E = X π ′ A f [ π, π ′ ] e ⊤ π ′ A I e E . By Proposition C.1, the summand non-zero only if π ′ satisfies Active( I ) , and in particular satisfies π ′ x = y . This occurs only if z = y in which case π ′ = π with summand n e ⊤ π A I e E . Proposition C.3. Consider a stable set I = {h F i , . . . , h F k i} of A and let J = F ∪ · · · ∪ F k . For any state π , we have e ⊤ π A I e E = n | J | k Y i =1 ( n − | F i | )! n ! e ⊤ π A J e E Proof. We show this by induction on k . Let us define f i = h F i i for each i . Case k = 0 holds vacuously.For the induction step, let I ′ = {h F i , . . . , h F k − i} and J ′ = F ∪ · · · ∪ F k − . By induction hypothesis, wecan write e ⊤ π A I e E = e ⊤ π A f k A I ′ e E = e ⊤ π n | J ′ | A J ′ k − Y i =1 ( n − | F i | )! n ! 
e ⊤ π A f k nA J ′ e E Let us write F k = { g , . . . , g t } . By Proposition 7.3, we have A f k = cA g . . . A g t = cA F k , where thescalar constant c is given by c = t Y i =1 r ∈ R gi ( r ∈ R g i ; g i +1 , . . . , g t ) = n t ( n − t )! n ! Thus, we have e ⊤ π A I e E = e ⊤ π A f k A I ′ e E = e ⊤ π n | J ′ | A J ′ k − Y i =1 ( n − | F i | )! n ! e ⊤ π × n t ( n − t )! n ! A F k e E We have J = J ′ ∪ F k , and A J ′ A F k = A J Q g ∈ F k ∩ J ′ A g and t = | F k +1 | , so we can write this as: e ⊤ π A I e E = e ⊤ π A f k A I ′ e E = n t + | J ′ | k Y i =1 ( n − | F i | )! n ! e ⊤ π Y g ∈ F k ∩ J A g A J e E (7)By Proposition C.2, we have e ⊤ π A g A J e E = n A J e E for each g ∈ F k ∩ J ′ . Hence, we have e ⊤ π A J Y g ∈ F k ∩ J A g e E = n −| F k ∩ J ′ | e ⊤ π A J e E , and substituting back into Eq. (7), we have: e ⊤ π A I e E = n t + | J ′ |−| F k ∩ J ′ | k Y i =1 ( n − | F i | )! n ! e ⊤ π A J To finish the induction, observe that t + | J ′ | − | F k ∩ J ′ | = | F k | + | J ′ | − | F k ∩ J ′ | = | F k ∪ J ′ | = | J | .30 roposition C.4. Consider a stable set I ⊆ A and some f ∈ A with f I . Let I ′ = I ∪ { f } . Then either a ( I ′ ) = a ( I ) or a ( I ′ ) = a ( I ) + 1 . The former case holds if and only if there is some right path with anendpoint g ∼ f . In the latter case, we have Active( I ) ⊆ Active( I ′ ) .Proof. Suppose that f = h x, y i and let I ′ = I ∪ { f } . Every active condition corresponding to a left node in G I,E is preserved in G I ′ ,E , plus there will be one new active condition corresponding to ( x, y ) .The only way that G I ′ ,E could gain an additional active condition, beyond this one, is if ( x, y ) par-ticipates in a maximal path beginning and ending at right nodes. In this case, there would be right pathsin G I,E with endpoints ( x ′′ , y ′′ ) , ( x, y ′ ) and ( x ′ , y ) , ( x ′′′ , y ′′′ ) respectively. Then the two active conditions ( x, y ′′ ) and ( x ′′′ , y ) are removed in Active( I ) , replaced by two new active conditions ( x, y ) and ( x ′′′ , y ′′ ) in Active( I ′ ) . So a ( I ′ ) = a ( I ) .The only way that G I ′ ,E could lose an active condition would be if G I,E has a right path with an endpoint g ∼ f . We have already discussed what occurs if there are two such paths. If there is just one, then this isthe only additional active condition lost in I ′ , and so a ( I ′ ) = a ( I ) .In other cases, we have Active( I ′ ) = Active( I ) ∪ { ( x, y ) } and the result holds. Proposition C.5. Let I = {h F i , . . . , h F k i} be a stable set in A and let f = h F k +1 i ∈ A where f I .Consider the stable sets J = F ∪ · · · ∪ F k and J ′ = J ∪ F k +1 of A . If a ( J ′ ) = a ( J ) + | J ′ − J | , then f isdominated by I in A .Proof. Consider stable set I ′ = I ∪ { f } = {h F i , . . . , h F k +1 i} . We want to show that e ⊤ π A I ′ e E ≤ e ⊤ π A I e E for any state π . By Proposition C.3, this is equivalent to showing that n | J ′ | k +1 Y i =1 ( n − | F i | )! n ! e ⊤ π A J ′ e E ≤ n | J | k Y i =1 ( n − | F i | )! n ! e ⊤ π A J e E If we define t = | J ′ − J | , and divide common terms, this is equivalent to showing that: e ⊤ π A J ′ e E e ⊤ π A J e E ≤ n ! n t ( n − | F k +1 | )! (8)By Proposition C.4, the only way to have a ( J ′ ) = a ( J ) + t is to have Active( J ) ⊆ Active( J ′ ) . In thiscase, by Proposition C.1, we have e ⊤ π A J ′ e E = 0 = e ⊤ π A J e E if π does not satisfy J . If π does satisfy J ,then by Proposition C.4 we have e ⊤ π A J e E = ( n −| C | )! n | J | ( n − a ( J ))! and also e ⊤ π A J ′ e E ≤ ( n − | C | )! 
n | J ′ | ( n − a ( J ′ ))! = ( n − | C | )! n | J | + t ( n − a ( J ) − t )! Overall, we have e ⊤ π A J ′ e E e ⊤ π A J e E ≤ ( n −| C | )! n | J | + t ( n − a ( J ) − t )!( n −| C | )! n | J | ( n − a ( J ))! = ( n − a ( J ))! n t ( n − a ( J ) − t )! ≤ n ! n t ( n − t )! ≤ n ! n t ( n − | F k +1 | )! which satisfies Eq. (8). Proposition C.6. For a stable set I ∈ I ( E ) , there is an injective function φ I : I → C such that g ∼ φ I ( g ) holds for all g ∈ I . roof. By definition, I can be ordered as I = {h F i , . . . , h F k i} such that each h F i i is not dominatedby { F , . . . , F i − } . Let us define J i = F ∪ · · · ∪ F i for each i . By Proposition C.5, we must have a ( J i ) = a ( J i − ) + | F i | for each value i , each F i would be dominated by {h F i , . . . , h F i − i} .By Proposition C.4, there must be some g i ∈ F i − J i − with a ( J i ∪ { g i } ) = a ( J i ) , i.e. the graph G J i ,E has a right path ending at a node f ∼ g i . This right path remains (possibly shortened) in the graph G { g ,...,g i − } ,E , and so must also have a ( { g , . . . , g i − } ) = a ( { g , . . . , g i } ) . Since a ( ∅ ) = | C | , this impliesthat a ( { g , . . . , g i } ) = | C | for all i .Let H = { g , . . . , g k } . Now, all the paths in graph G H,E must be right paths. For any such right path ofthe form h , g i , h , g i , . . . , g i s , h s +1 , we can select φ I ( g i ) = h , . . . , φ I ( g i s ) = h h +1 .We can now show Theorem 5.5. Let C = { g , . . . , g t } . First, we have µ ( E ) = ( n − t )! n ! . Next, considerenumerating a set I ∈ I ( E ) . By Proposition C.6, we can choose, for each g ∈ C , to either include somecorresponding preimage f = φ − I ( g ) in I , or not include it. Let us write I g = ∅ if there is no such f , or I g = { f } for such f . Thus I = S g ∈ C I g . Overall, this shows that X I ∈ I ( E ) Ψ( I ) ≤ X I g ,...,I gt Ψ( I g ∪ · · · ∪ I g t ) ≤ X I g ,...,I gt Ψ( I g ) · · · Ψ( I g t ) where the last inequality follows from log-subadditivity of Ψ . This can be written as Q ti =1 P I gi Ψ( I g i ) .The case of I g i = ∅ contributes , and the case of I g i = { f } contributes Ψ( f ) . References [1] Dimitris Achlioptas and Fotis Iliopoulos. Random walks that find perfect objects and the Lov´asz locallemma. Journal of the ACM , 63(3):Article SIAM Journal on Computing , 48(5):1583–1602, 2019.[3] Dimitris Achlioptas, Fotis Iliopoulos, and Alistair Sinclair. Beyond the Lov´asz local lemma: Point toset correlations and their algorithmic applications. In Proc. 60th Annual IEEE Symposium on Founda-tions of Computer Science (FOCS) , pages 725–744, 2019.[4] Karthekeyan Chandrasekaran, Navin Goyal, and Bernhard Haeupler. Deterministic algorithms for theLov´asz local lemma. SIAM Journal on Computing , 42(6):2132–2155, 2013.[5] Antares Chen, David G. Harris, and Aravind Srinivasan. Partial resampling to approximate coveringinteger programs. In Proc. 27th annual ACM-SIAM Symposium on Discrete Algorithms (SODA) , pages1984–2003, 2016.[6] Kai-Min Chung, Seth Pettie, and Hsin-Hao Su. Distributed algorithms for the Lov´asz local lemma andgraph coloring. Distributed Computing , 30(4):261–280, 2017.[7] Paul Erd˝os and L´aszl´o Lov´asz. Problems and results on -chromatic hypergraphs and some relatedquestions. In Infinite and finite sets (Colloq., Keszthely, 1973; dedicated to P. Erd˝os on his 60thbirthday), Vol. II , pages 609–627. Colloq. Math. Soc. J´anos Bolyai, Vol. 10. 1975.[8] Bernhard Haeupler and David G. Harris. 
Parallel algorithms and concentration bounds for the Lov´aszlocal lemma via witness DAGs. ACM Transactions on Algorithms (TALG) , 13(4):Article Journal of the ACM , 58(6):Article ACM Transactions on Algorithms , 13(1):Article Proc. 30th annual ACM-SIAM Symposium on Discrete Algorithm (SODA) , pages 841–860,2019.[12] David G. Harris. New bounds for the Moser-Tardos distribution. Random Structures & Algorithms ,57(1):97–131, 2020.[13] David G. Harris and Aravind Srinivasan. Algorithmic and enumerative aspects of the Moser-Tardosdistribution. ACM Transactions on Algorithms , 13(3):Article The-ory of Computing , 13(1):Article Jour-nal of the ACM , 66(5):Article SIAM Journal on Computing , 49(2):394–428, 2020.[17] Fotis Iliopoulos. Commutative algorithms approximate the LLL-distribution. Approximation, Ran-domization, and Combinatorial Optimization. Algorithms and Techniques , 2018.[18] Fotis Iliopoulos and Alistair Sinclair. Efficiently list-edge coloring multigraphs asymptotically opti-mally. In Proc. 14th annual ACM-SIAM Symposium on Discrete Algorithms (SODA) , pages 2319–2336, 2020.[19] Kashyap Kolipaka and Mario Szegedy. Moser and Tardos meet Lov´asz. In Proc. 43rd annual ACMSymposium on Theory of Computing (STOC) , pages 235–244, 2011.[20] Kashyap Kolipaka, Mario Szegedy, and Yixin Xu. A sharper local lemma with improved applica-tions. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Tech-niques , pages 603–614. 2012.[21] Vladimir Kolmogorov. Commutativity in the algorithmic Lov´asz local lemma. SIAM Journal onComputing , 47(6):2029–2056, 2018.[22] Robin A. Moser and G´abor Tardos. A constructive proof of the general Lov´asz local lemma. Journalof the ACM , 57(2):Article SIAM Journal on DiscreteMathematics , 28(2):911–917, 2014.[24] J.B. Shearer. On a problem of Spencer.