New Constructive Aspects of the Lovász Local Lemma
Bernhard Haeupler†  Barna Saha‡  Aravind Srinivasan§

October 4, 2011
Abstract
The Lovász Local Lemma (LLL) is a powerful tool that gives sufficient conditions for avoiding all of a given set of "bad" events, with positive probability. A series of results have provided algorithms to efficiently construct structures whose existence is non-constructively guaranteed by the LLL, culminating in the recent breakthrough of Moser & Tardos for the full asymmetric LLL. We show that the output distribution of the Moser-Tardos algorithm well-approximates the conditional LLL-distribution, the distribution obtained by conditioning on all bad events being avoided. We show how a known bound on the probabilities of events in this distribution can be used for further probabilistic analysis and give new constructive and non-constructive results.

We also show that when an LLL application provides a small amount of slack, the number of resamplings of the Moser-Tardos algorithm is nearly linear in the number of underlying independent variables (not events!), and can thus be used to give efficient constructions in cases where the underlying proof applies the LLL to super-polynomially many events. Even in cases where finding a bad event that holds is computationally hard, we show that applying the algorithm to avoid a polynomial-sized "core" subset of bad events leads to a desired outcome with high probability. This is shown via a simple union bound over the probabilities of non-core events in the conditional LLL-distribution, and automatically leads to simple and efficient Monte-Carlo (and in most cases RNC) algorithms. We demonstrate this idea on several applications. We give the first constant-factor approximation algorithm for the Santa Claus problem by making an LLL-based proof of Feige constructive. We provide Monte Carlo algorithms for acyclic edge coloring, non-repetitive graph colorings, and Ramsey-type graphs. In all these applications, the algorithm falls directly out of the non-constructive LLL-based proof. Our algorithms are very simple, often provide better bounds than previous algorithms, and are in several cases the first efficient algorithms known.

As a second type of application we show that the properties of the conditional LLL-distribution can be used in cases beyond the critical dependency threshold of the LLL: avoiding all bad events is impossible in these cases. As the first (even non-constructive) result of this kind, we show that by sampling a selected smaller core from the LLL-distribution, we can avoid a fraction of bad events that is higher than the expectation. MAX k-SAT is an illustrative example of this.

∗ A preliminary version of this paper appeared in the 51st Annual IEEE Symposium on Foundations of Computer Science (FOCS), 2010, Las Vegas.
† [email protected]; CSAIL, Dept. of Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139. Part of this work was done while visiting the University of Maryland.
‡ [email protected]; Dept. of Computer Science, University of Maryland, College Park, MD 20742. Supported in part by NSF Award CCF-0728839, NSF Award CCF-0937865, and a Google Research Award.
§ [email protected]; Dept. of Computer Science and Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742. Supported in part by NSF ITR Award CNS-0426683, NSF Award CNS-0626636, and NSF Award CNS-1010789.

1 Introduction
The well-known Lovász Local Lemma (LLL) [24] is a powerful probabilistic approach to prove the existence of certain combinatorial structures. Its diverse range of applications includes breakthroughs in packet routing [33], a variety of theorems in graph coloring including list coloring, frugal coloring, total coloring, and coloring graphs with lower-bounded girth [38], as well as a host of other applications where probability appears at first sight to have no role [10]. Furthermore, almost all known applications of the LLL have no alternative proofs known. While the original LLL was non-constructive, in that it was unclear how the existence proofs could be turned into polynomial-time algorithms, a series of works [17, 1, 23, 37, 38, 46, 41, 39, 40] beginning with Beck [17] and culminating with the breakthrough of Moser & Tardos (MT) [40] have led to efficient algorithmic versions for most such proofs. However, there are several LLL applications to which these approaches inherently cannot apply; our work makes progress toward bridging this gap, by uncovering and exploiting new properties of [40]. We also obtain what are, to our knowledge, the first algorithmic applications of the LLL where a few of the bad events have to happen, and where we aim to keep the number of these small.

We will use standard notation: e denotes the base of the natural logarithm, and ln and log denote the logarithm to the base e and 2, respectively.

Essentially all known applications of the LLL use the following framework. Let P be a collection of n mutually independent random variables {P_1, P_2, ..., P_n}, and let A = {A_1, A_2, ..., A_m} be a collection of m ("bad") events, each determined by some subset of P. The LLL (Theorem 1.1) shows sufficient conditions under which, with positive probability, none of the events A_i holds: i.e., that there is a choice of values for the variables in P (corresponding to a discrete structure such as a suitable coloring of a given graph) that avoids all the A_i.
Under these same sufficient conditions, MT show that the following very simple algorithm makes such a choice: (i) initially, choose the P_i independently from their given distributions; (ii) while the current assignment to P does not avoid all the A_i, repeat: arbitrarily choose a currently-true A_i, and resample, from their product distribution, the variables in P on which A_i depends. The amazing aspect of MT is that the expected number of resamplings is small [40]: at most poly(n, m) in all known cases of interest. However, there are two problems with implementing MT that come up in some applications of the LLL:

(a) the number of events m can be superpolynomial in the number of variables n; this can result in a superpolynomial running time in the "natural" parameter n; and, even more seriously,

(b) given an assignment to P, it can be computationally hard (e.g., NP-hard or not yet known to be in polynomial time) to either certify that no A_i holds, or to output an index i such that A_i holds.

Since detection and resampling of a currently-bad event is the seemingly unavoidable basic step in the MT algorithm, these applications seemed far out of reach. We deal with a variety of applications wherein (a) and/or (b) hold, and develop Monte Carlo (and in many cases, RNC) algorithms whose running time is polynomial in n; some of these applications involve a small loss in the quality of the solution. (We loosely let "RNC algorithms" denote randomized parallel algorithms that use poly(n) processors and run in polylog(n) time, to output a correct solution with high probability.) First, we show that the MT algorithm needs only O(n² log n) many resampling steps in all applications that are known (and in most cases O(n · polylog(n))), even when m is superpolynomial in n.
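To make the resampling loop concrete, here is a minimal sketch in Python for the special case where the bad events are the unsatisfied clauses of a k-SAT formula; the toy instance and all identifiers are our own illustration, not code from [40]. The near-linear resampling bounds discussed in this paper apply to exactly this loop, even when the number of events is huge.

```python
import random

def moser_tardos(num_vars, clauses, rng):
    """Moser-Tardos resampling sketch for k-SAT. Each clause is a list of
    signed literals (+i / -i for variable i, 1-indexed); the bad event for a
    clause is 'every literal in it is false'. Returns a good assignment."""
    # (i) sample every variable independently from its distribution
    assign = [rng.random() < 0.5 for _ in range(num_vars)]

    def violated(clause):
        # the bad event holds iff every literal evaluates to false
        return all(assign[abs(l) - 1] != (l > 0) for l in clause)

    # (ii) while some bad event holds, resample only its own variables
    while True:
        bad = next((c for c in clauses if violated(c)), None)
        if bad is None:
            return assign
        for l in bad:
            assign[abs(l) - 1] = rng.random() < 0.5

# toy satisfiable 3-SAT instance (hypothetical, for illustration only)
clauses = [[1, 2, 3], [-1, 2, -3], [1, -2, 3], [-1, -2, -3]]
result = moser_tardos(3, clauses, random.Random(0))
assert all(any(result[abs(l) - 1] == (l > 0) for l in c) for c in clauses)
```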
This makes those applications constructive that allow an efficient implicit representation of the bad events (in very rough analogy with the usage of the ellipsoid algorithm for convex programs with exponentially many constraints but with good separation oracles). Still, most of our applications have problem (b). For these cases, we introduce a new proof-concept based on the (conditional) LLL-distribution: the distribution D on P that one obtains when conditioning on no A_i happening. Some very useful properties are known for D [10]: informally, if an event B depends "not too heavily" on the events in A, then the probability placed on B by D is "not much more than" the unconditional probability Pr[B]: at most f_A(B) · Pr[B] (see (3)). (Throughout, n is the parameter of interest, since the output we seek is one value for each of P_1, P_2, ..., P_n.) Such bounds, in combination with further probabilistic analysis, can be used to give interesting (non-constructive) results. Our next main contribution is that the MT algorithm has an output distribution (say D′) that "approximates" the LLL-distribution D: for every B, the same upper bound f_A(B) · Pr[B] as above holds in D′ as well. This can be used to make probabilistic proofs that use the LLL-condition constructive.

Problem (b), in all cases known to us, comes from problem (a): it is easy to test if any given A_i holds currently (e.g., if a given subset of vertices in a graph is a clique), with the superpolynomiality of m being the apparent bottleneck. To circumvent this, we develop our third main contribution: the very general Theorem 3.4, which is simple and directly applicable in all LLL instances that allow a small slack in the LLL's sufficient conditions. This theorem proves that a small poly(n)-sized core subset of the events in A can be selected and avoided efficiently using the MT algorithm.
Using the LLL-distribution and a simple union bound over the non-core events, we get efficient (Monte Carlo and/or RNC) algorithms for these problems. We develop two types of applications, as sketched next.
A summary of four applications follows; all of these have problem (a), and all but the acyclic-coloring application have problem (b). Most such results have RNC versions as well.
The Santa Claus Problem:
The Santa Claus problem is the restricted-assignment version of the max-min allocation problem of indivisible goods. Santa Claus has n items that need to be distributed among m children. Each child has a utility for each item, which is either 0 or some given p_j for item j. The objective is to assign each item to some child so that the minimum total utility received by any child is maximized. This problem has received much attention recently [15, 14, 26, 13, 16, 20]. The problem is NP-hard, and the best-known approximation algorithm, due to Bansal and Sviridenko [15], achieves an approximation factor of O(log log m / log log log m) by rounding a certain configuration LP. Later, Feige [26] and subsequently Asadpour, Feige and Saberi [13] showed that the integrality gap of the configuration LP is a constant. Surprisingly, both results were obtained using two different non-constructive approaches and left open the question of a constant-factor approximation algorithm. This made the Santa Claus problem one of the rare instances [27] in which the proof of an integrality gap did not result in an approximation algorithm with the same ratio. In this paper we resolve this by making the non-constructive LLL-based proof of Feige [26] constructive (Section 4), giving the first constant-factor approximation algorithm for the Santa Claus problem.

Non-repetitive Coloring of Graphs:
Given a graph H = (V, E), a k-coloring (not necessarily proper) of the edges of H is called non-repetitive if the sequence of colors along any simple path is not the same in the first and the second half. The smallest k such that H has a non-repetitive k-coloring is called the Thue number π(H) of H [47]. Alon, Grytczuk, Hałuszczak and Riordan showed via the LLL that π(H) ≤ O(∆(H)²) [5], where ∆(H) is the maximum degree of any vertex in H. This was followed by much additional work [22, 45, 29, 32, 19, 4]. However, no efficient construction is known to date, except for special classes of graphs such as complete graphs, cycles and trees. We present a randomized algorithm for non-repetitive coloring of H using at most O(∆(H)^{2+ǫ}) colors, for every constant ǫ > 0 (Section 5).

General Ramsey-Type Graphs:
The Ramsey number R(U_s, V_t) refers to the smallest n such that any graph on n vertices either contains a copy of U_s within some subgraph on s vertices, or contains t vertices that do not contain a copy of V_t. Obtaining lower bounds for various special cases of R(U_s, V_t) and constructing Ramsey-type graphs have been studied in much detail [2, 9, 31, 7]. A predominant case for such problems is when s is held fixed. We consider the general setting of R(U_s, V_t) with fixed s, and provide efficient randomized algorithms for constructing Ramsey-type graphs (Section 6).

Acyclic Edge-Coloring:
A proper edge-coloring of a graph is acyclic iff each cycle in it receives more than 2 colors. The acyclic edge chromatic number a′(G), introduced in [28], is the minimum number of colors in a proper acyclic edge coloring of G [8, 37, 11, 28, 42]. Alon, McDiarmid and Reed [8] showed that a′(G) ≤ 64∆, where ∆ is the maximum degree of G. It is conjectured that a′(G) ≤ ∆ + 2; Alon, Sudakov and Zaks showed that the conjecture is indeed true for graphs having girth Ω(∆ log ∆) [11]. Their argument can be made constructive using Beck's technique [17] to obtain an acyclic edge coloring using ∆ + 2 colors, albeit for graphs with girth significantly larger than Θ(∆ log ∆) [11]. We bridge this gap by providing constructions that achieve the same girth bound as in [11], yet obtain an acyclic edge coloring with only ∆ + 2 colors. For graphs with no girth bound, 16∆ colors suffice to efficiently construct an acyclic edge coloring, in contrast to the 20∆ algorithmic bound of [37] (Section 7).

The recent result of Matthew Andrews on approximating the edge-disjoint paths problem on undirected graphs is another example where problems (a) and (b) occur, and where our LLL-techniques are applied to avoid super-polynomially many bad events [12].

Many settings require "almost all" bad events to be avoided, and not necessarily all; e.g., consider MAX-SAT as opposed to SAT. However, in the LLL context, essentially the only known general applications were "all or nothing": either the LLL's sufficient conditions hold, and we are able to avoid all bad events, or the LLL's sufficient conditions are violated, and the only known bound on the number of bad events is the trivial one given by the linearity of expectation (which does not exploit any "almost-independence" of the bad events, as does the LLL). This situation is even more pronounced in the algorithmic setting.
We take what are, to our knowledge, the first steps in this direction, interpolating between these two extremes. While our discussion here holds for all applications of the symmetric LLL, let us take MAX-k-SAT as an illustrative example. (The LLL is stated in Section 1.3, but let us recall its well-known "symmetric" special case: in the setting of MT with P and A as defined near the beginning of Section 1, if Pr[A_i] ≤ p for all i, each A_i depends on at most d other A_j, and e · p · (d + 1) ≤ 1, then with positive probability none of the A_i holds.) Recall that in MAX-k-SAT, we have a CNF formula on n variables, with m clauses each containing exactly k literals; as opposed to SAT, where we have to satisfy all clauses, we aim to maximize the number of satisfied clauses here. The best general upper bounds on the number of "violated events" (unsatisfied clauses) follow from the probabilistic method, where each variable is set to True or False uniformly at random and independently. On the one hand, the linearity of expectation yields that the expected number of unsatisfied clauses is m · 2^{−k} (with a derandomization using the method of conditional probabilities). On the other hand, if each clause shares a variable with at most 2^k/(αe) − 1 ∼ 2^k/(αe) other clauses for 1 ≤ α ≤ e, then we can efficiently construct an assignment to the variables that violates at most (e ln(α)/α + o(1)) · m · 2^{−k} clauses for large k. (This is better than the linearity of expectation iff α < e; it is easy to construct examples with α = e where one cannot do better than the linearity of expectation. See [6] for the fixed-parameter tractability of MAX-k-SAT above (1 − 2^{−k})m satisfied clauses.)

The above and related results for applications of the symmetric LLL follow from the connection to the "further probabilistic analysis using the remaining randomness of LLL-distributions" that we alluded to above; see Section 8. We believe this connection to be the main conceptual message of this paper, and expect further applications in the future.
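The two baseline facts just used, the exact expectation m · 2^{−k} from linearity, and the symmetric LLL condition e · p · (d + 1) ≤ 1, are easy to check numerically. The sketch below uses a hypothetical toy 3-SAT instance and helper names of our own, purely as an illustration:

```python
import math
from itertools import product

def unsatisfied_count(clauses, assign):
    """Number of clauses whose literals are all false under assign."""
    return sum(all(assign[abs(l) - 1] != (l > 0) for l in c) for c in clauses)

def expected_unsatisfied(clauses, n):
    """Exact expectation over all 2^n uniform assignments (brute force);
    by linearity this equals m * 2^-k for m clauses of exactly k literals."""
    total = sum(unsatisfied_count(clauses, a)
                for a in product([False, True], repeat=n))
    return total / 2 ** n

def symmetric_lll_holds(p, d):
    """Symmetric LLL sufficient condition: e * p * (d + 1) <= 1."""
    return math.e * p * (d + 1) <= 1

# hypothetical instance: m = 4 clauses, k = 3, so m * 2^-k = 0.5
clauses = [[1, 2, 3], [-1, 2, -3], [1, -2, 3], [-1, -2, -3]]
assert expected_unsatisfied(clauses, 3) == 4 * 2 ** -3
# with p = 2^-3 the condition tolerates d = 1 (e * p * 2 ~ 0.68) ...
assert symmetric_lll_holds(2 ** -3, 1)
# ... but not d = 3, the dependency degree of the dense toy instance above
assert not symmetric_lll_holds(2 ** -3, 3)
```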
1.3 The Local Lemma

We follow the general algorithmic framework of the Local Lemma due to MT. As in our description at the beginning of Section 1, let P be a finite collection of mutually independent random variables {P_1, P_2, ..., P_n}, and let A = {A_1, A_2, ..., A_m} be a collection of events, each determined by some subset of P. For any event B that is determined by a subset of P we denote the smallest such subset by vbl(B). For any event B that is determined by the variables in P, we furthermore write Γ(B) = Γ_A(B) for the set of all events A ≠ B in A with vbl(A) ∩ vbl(B) ≠ ∅. This neighborhood relation induces the following standard dependency graph or variable-sharing graph on A: on the vertex set A, let G = G_A be the undirected graph with an edge between events A, B ∈ A iff A ∈ Γ(B). We often refer to events in A as bad events and want to find a point in the probability space, or equivalently an assignment to the variables P, wherein none of the bad events happens. We call such an assignment a good assignment.

With these definitions, the general ("asymmetric") version of the LLL simply states:

Theorem 1.1 (Asymmetric Lovász Local Lemma). With A, P and Γ defined as above, if there exists an assignment of reals x : A → (0, 1) such that

∀A ∈ A : Pr[A] ≤ x(A) ∏_{B ∈ Γ(A)} (1 − x(B)); (1)

then the probability of avoiding all bad events is at least ∏_{A ∈ A} (1 − x(A)) > 0, and thus there exists a good assignment to the variables in P.

We study several LLL instances where the number of events to be avoided, m, is super-polynomial in n; our goal is to develop algorithms whose running time is polynomial in n, which is also the size of the output, namely a good assignment of values to the n variables. We introduce a key parameter:

δ := min_{A ∈ A} x(A) ∏_{B ∈ Γ(A)} (1 − x(B)). (2)

Note that without loss of generality δ ≤ 1/4, because otherwise all A ∈ A are independent, i.e., defined on disjoint sets of variables.
Indeed, if δ > 1/4 and there is an edge in G between A ∈ A and B ∈ A, then we have 1/4 < x(A)(1 − x(B)) and 1/4 < x(B)(1 − x(A)), i.e., 1/16 < x(A)(1 − x(A)) · x(B)(1 − x(B)), which is a contradiction because x(1 − x) ≤ 1/4 for all x (the maximum is attained at x = 1/2).

We allow our algorithms to have a running time that is polynomial in log(1/δ); in all applications known to us, δ ≥ exp(−O(n log n)), and hence log(1/δ) = O(n log n). In fact, because δ is an upper bound for min_{A ∈ A} Pr[A], in any typical encoding of the domains and the probabilities of the variables, log(1/δ) will be at most linear in the size of the input or the output.

The following subsection 1.4 reviews the MT algorithm and its analysis, which will be helpful in understanding some of our proofs and technical contributions; the reader familiar with the MT algorithm may skip it.

1.4 The Moser-Tardos Algorithm

Recall the resampling-based MT algorithm; let us now review some of the technical elements in the analysis of this algorithm that will help in understanding our technical contributions better.

A witness tree τ = (T, σ_T) is a finite rooted tree T together with a labeling σ_T : V(T) → A of its vertices by events, such that the children of a vertex u ∈ V(T) receive labels from Γ(σ_T(u)) ∪ {σ_T(u)}. In a proper witness tree, distinct children of the same vertex always receive distinct labels. The "log" C of an execution of MT lists the events as they have been selected for resampling in each step. Given C, we can associate a witness tree τ_C(t) with each resampling step t that can serve as a justification for the necessity of that correction step; τ_C(t) is rooted at C(t). A witness tree τ is said to occur in C if there exists t ∈ N such that τ_C(t) = τ. It has been shown in [40] that if τ occurs in C, then it is proper, and it occurs in C with probability at most ∏_{v ∈ V(τ)} Pr[σ_T(v)].
To bound the running time of the MT algorithm, one needs to bound the number of times an event A ∈ A is resampled. If N_A denotes the random variable for the number of resampling steps of A and C is the execution log, then N_A is the number of occurrences of A in this log, and also the number of distinct proper witness trees occurring in C that have their root labeled A. As a result, one can bound the expected value of N_A simply by summing the probabilities of appearances of distinct witness trees rooted at A. These probabilities can be related to a Galton-Watson branching process to obtain the desired bound on the running time.

A Galton-Watson branching process can be used to generate a proper witness tree as follows. In the first round the root of the witness tree is produced, say corresponding to event A. Then in each subsequent round, for each vertex v of the previous round independently, and again independently for each event B ∈ Γ(σ_T(v)) ∪ {σ_T(v)}, the event B is selected as a child of v with probability x(B) and skipped with probability 1 − x(B). We will use the concepts of proper witness trees and the Galton-Watson process in several of our proofs.

When trying to turn the non-constructive Lovász Local Lemma into an algorithm that finds a good assignment, the following straightforward approach comes to mind: draw a random sample for the variables in P until one is found that avoids all bad events.
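This naive rejection-sampling approach can be sketched as follows; the toy instance and all names are our own illustration, under the assumption of uniform binary variables:

```python
import random

def rejection_sample(num_vars, bad_events, rng, max_tries=10**6):
    """Resample ALL variables until no bad event holds. The accepted sample
    is distributed exactly as the conditional LLL-distribution, but the
    expected number of tries is 1/Pr[no bad event], typically exponential."""
    for _ in range(max_tries):
        assign = [rng.random() < 0.5 for _ in range(num_vars)]
        if not any(bad(assign) for bad in bad_events):
            return assign
    raise RuntimeError("no good assignment found within max_tries")

# toy bad events (hypothetical): three fixed pairs of bits must not be equal
pairs = [(0, 1), (1, 2), (2, 3)]
bad_events = [lambda a, p=p: a[p[0]] == a[p[1]] for p in pairs]
good = rejection_sample(4, bad_events, random.Random(1))
assert all(good[i] != good[j] for i, j in pairs)
```

Only 2 of the 16 assignments are good here, so each try succeeds with probability 1/8; on larger instances this success probability shrinks exponentially, which is exactly the inefficiency discussed next.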
If the LLL-conditions are met, this rejection-sampling algorithm certainly always terminates, but because the probability of obtaining a good assignment is typically exponentially small, it takes an expected exponential number of resamplings and is therefore inefficient. While the celebrated algorithm of Moser (and Tardos) is much more efficient, the above rejection-sampling method has a major advantage: it does not just produce an arbitrary good assignment but provides an assignment chosen at random from the distribution that is obtained when one conditions on no bad event happening. In the following, we call this distribution the LLL-distribution or conditional LLL-distribution.

The LLL-conditions and further probabilistic analysis can be a powerful tool to obtain new results (constructive or otherwise), like the constructive one in Section 8. The following is a well-known bound on the probability Pr_D[B] that the LLL-distribution D places on any event B that is determined by variables in P (its proof is an easy extension of the standard non-constructive LLL-proof [10]):

Theorem 2.1.
If the LLL-conditions from Theorem 1.1 are met, then the LLL-distribution D is well-defined. For any event B that is determined by P, the probability Pr_D[B] of B under D satisfies:

Pr_D[B] := Pr[B | ⋀_{A ∈ A} ¬A] ≤ Pr[B] · ∏_{C ∈ Γ(B)} (1 − x(C))^{−1}; (3)

here, Pr[B] is the probability of B holding under a random choice of P_1, P_2, ..., P_n.

The fact that the probability of an event B does not increase by much in the conditional LLL-distribution when B does not depend on "too many" C ∈ A is used critically in the rest of the paper. More importantly, the following theorem states that the output distribution D′ of the MT algorithm approximates the LLL-distribution D and has the very nice property that it essentially also satisfies (3):

Theorem 2.2.
Suppose there is an assignment of reals x : A → (0, 1) such that (1) holds. Let B be any event that is determined by P. Then the probability that B was true at least once during the execution of the MT algorithm on the events in A is at most Pr[B] · (∏_{C ∈ Γ(B)} (1 − x(C)))^{−1}. In particular, the probability of B being true in the output distribution of MT obeys this upper bound.

Proof. The bound on the probability of B ever happening is a simple extension of the MT proof [40]. Note that we want to prove the theorem irrespective of whether B is in A or not. In either case we are interested in the probability that the event was true at least once during the execution, i.e., if B is in A, whether it would have been resampled at least once. The witness trees that certify the first time B becomes true are the ones that have B as a root and all non-root nodes from A \ {B}. Similarly as in [40], we calculate the expected number of these witness trees via a union bound. Let τ be a fixed proper witness tree with its root vertex labeled B. Following the proof of Lemma 3.1 and using the fact that B cannot be a child of itself, it can be shown that the probability p_τ with which the Galton-Watson process that starts with B yields exactly the tree τ is p_τ = ∏_{A ∈ Γ(B)} (1 − x(A)) · ∏_{v ∈ V(τ)} x′(σ_v). Here V(τ) denotes the non-root vertices of τ, and x′(σ_v) = x(σ_v) ∏_{C ∈ Γ(σ_v)} (1 − x(C)). Plugging this into the arguments following the proof of Lemma 3.1 of [40], it is easy to see that the union bound over all these trees, and therefore also the desired probability, is at most Pr[B] · (∏_{C ∈ Γ(B)} (1 − x(C)))^{−1}, where the term "Pr[B]" accounts for the fact that the root-event B has to be true as well.

Using this theorem we can view the MT algorithm as an efficient way to obtain a sample that comes approximately from the conditional LLL-distribution.
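The common upper bound of (3) and Theorem 2.2 is straightforward to evaluate numerically; the following sketch, with made-up numbers and names of our own, simply instantiates the formula:

```python
def lll_distribution_bound(pr_b, x_neighbors):
    """Upper bound Pr[B] * prod_{C in Gamma(B)} (1 - x(C))^-1 from (3):
    how much the conditional LLL-distribution (or, by Theorem 2.2, the MT
    output distribution) can inflate the unconditional probability Pr[B]."""
    bound = pr_b
    for x_c in x_neighbors:
        bound /= 1.0 - x_c
    return bound

# an event with Pr[B] = 0.01 and three neighbors of x-value 0.1 each is
# inflated by at most (1 - 0.1)^-3, i.e., by roughly 37%
b = lll_distribution_bound(0.01, [0.1, 0.1, 0.1])
assert abs(b - 0.01 / 0.9 ** 3) < 1e-12
assert b < 0.014
```

Bounds of exactly this shape drive the union bound over non-core events in Theorem 3.3 below.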
This efficient sampling procedure makes it possible to make proofs that use the conditional LLL-distribution constructive, and to directly convert them into algorithms. All constructive results of this paper are based on Theorem 2.2 and demonstrate this idea.

In several applications of the LLL, the number of bad events is super-polynomially larger than the number of underlying variables. In these cases we aim for an algorithm that still runs in time polynomial in the number of variables, and it is not efficient to maintain an explicit representation of all bad events. Surprisingly, Theorem 3.1 shows that the number of resamplings done by the MT algorithm remains quadratic, and in most cases even near-linear, in the number of variables n.

Theorem 3.1.
Suppose there is an ǫ ∈ [0, 1) and an assignment of reals x : A → (0, 1) such that:

∀A ∈ A : Pr[A] ≤ (1 − ǫ) x(A) ∏_{B ∈ Γ(A)} (1 − x(B)).

With δ denoting min_{A ∈ A} x(A) ∏_{B ∈ Γ(A)} (1 − x(B)), we have

T := Σ_{A ∈ A} x(A) ≤ n log(1/δ). (4)

Furthermore:

1. if ǫ = 0, then the expected number of resamplings done by the MT algorithm is at most v = T · max_{A ∈ A} (1 − x(A))^{−1}, and for any parameter λ ≥ 1, the MT algorithm terminates within λv resamplings with probability at least 1 − 1/λ;

2. if ǫ > 0, then the expected number of resamplings done by the MT algorithm is at most v = O((n/ǫ) log(T/ǫ)), and for any parameter λ ≥ 1, the MT algorithm terminates within λv resamplings with probability 1 − exp(−λ).

Proof. The main idea in relating the quantity T to n and δ is to use: (i) the fact that the variable-sharing graph G is very dense, and (ii) the nature of the LLL-conditions, which force highly connected events to have small probabilities and x-values. To see that G is dense, consider for any variable P ∈ P the set of events A_P = {A ∈ A | P ∈ vbl(A)}, which forms a clique in G. Indeed, the m vertices of G can be partitioned into n such cliques with potentially further edges between them; G therefore has at least n · (m/n choose 2) = m²/(2n) − m/2 edges, which is ≫ n when m ≫ n.

Let us first prove the bound on T. To do so, we fix any P ∈ P and show that Σ_{B ∈ A_P} x(B) ≤ log(1/δ), which will clearly suffice. Recall from the discussion following (2) that we can assume w.l.o.g. that δ ≤ 1/4. If |A_P| = 1, then of course Σ_{B ∈ A_P} x(B) ≤ 1 ≤ log(1/δ). If |A_P| > 1, let A ∈ A_P have the smallest x-value. Note that by definition

δ ≤ x(A) ∏_{B ∈ A_P \ {A}} (1 − x(B)) = (x(A)/(1 − x(A))) ∏_{B ∈ A_P} (1 − x(B)).

If x(A) ≤ 1/2, then x(A)/(1 − x(A)) ≤ 1, so δ ≤ ∏_{B ∈ A_P} (1 − x(B)) ≤ e^{−Σ_{B ∈ A_P} x(B)}, and we get Σ_{B ∈ A_P} x(B) ≤ ln(1/δ) < log(1/δ) as required. Otherwise, if x(A) > 1/2, let B ∈ A_P \ {A}. Then

δ ≤ x(A) · ∏_{B′ ∈ A_P \ {A}} (1 − x(B′)) = x(A)(1 − x(B)) ∏_{B′ ∈ A_P \ {A,B}} (1 − x(B′)) ≤ x(A)(1 − x(B)) e^{−Σ_{B′ ∈ A_P \ {A,B}} x(B′)}. (5)

Let us now show that for 1/2 ≤ x(A) ≤ x(B) ≤ 1:

x(A)(1 − x(B)) ≤ e^{−(x(A)+x(B))}. (6)

Fix x(A). We thus need to show e^{x(B)}(1 − x(B)) ≤ e^{−x(A)}/x(A). The derivative of e^{x(B)}(1 − x(B)) with respect to x(B) is negative for x(B) > 0, showing that it is a decreasing function in the range x(B) ∈ [x(A), 1]; hence the maximum of e^{x(B)}(1 − x(B)) is attained at x(B) = x(A), and for (6) to hold it is enough to show that x(A)(1 − x(A)) ≤ e^{−2x(A)} holds. The second derivative of e^{−2x(A)} − x(A)(1 − x(A)) is positive. Differentiating e^{−2x(A)} − x(A)(1 − x(A)) and equating the derivative to 0 shows that the minimum over [1/2, 1] is attained at x(A) ≈ 0.73, where the value is ≈ 0.035 > 0. Thus we have (6), and so we get x(A)(1 − x(B)) e^{−Σ_{B′ ∈ A_P \ {A,B}} x(B′)} ≤ e^{−Σ_{B ∈ A_P} x(B)}; using this with (5), we obtain Σ_{B ∈ A_P} x(B) ≤ ln(1/δ) < log(1/δ) as desired.

Given the bound on T, part (1) follows directly from the main theorem of [40] and a simple application of Markov's inequality. Part (2) now also follows from [40]: in Section 5 of [40] it is shown that saving a (1 − ǫ) factor in the probability of every resampling step implies that, with high probability, no witness tree of size Ω((1/ǫ) log Σ_{A ∈ A} x(A)/(1 − x(A))) occurs. This easily implies that none of the n variables can be resampled more often. It is furthermore shown that without loss of generality all x-values can be assumed to be bounded away from 1 by at least Ω(ǫ). This simplifies the upper bound on the expected running time to n · O((1/ǫ) log(T/ǫ)).

As mentioned following the introduction of δ in (2), log(1/δ) ≤ O(n log n) in all applications known to us, and is often even smaller.

Remarks:

• The max_{A ∈ A} (1 − x(A))^{−1} factor in the running time in part (1) of Theorem 3.1 corresponds to the expected number of times the event A gets resampled until one satisfying assignment to its variables is found. It is obviously unavoidable for an algorithm that has only black-box resampling and evaluation access to the events. If one alters the algorithm to pick a random assignment that satisfies A (which can, for example, be computed using rejection sampling, taking an expected Θ((1 − x(A))^{−1}) trials each time), this factor can be avoided.

• The estimate T = Σ_{A ∈ A} x(A) = O(n log(1/δ)) is tight and can be achieved, e.g., by having an isolated event with constant probability for each variable.
In many cases with log(1/δ) = ω(log n) it is nevertheless an overestimate, and in most cases the running time is O(n log n) even for ǫ = 0.

While Theorem 3.1 gives very good bounds on the running time of MT even for applications with Ω(n) ≤ m ≤ poly(n) many events, it unfortunately often fails to be directly applicable when m becomes super-polynomial in n. The reason is that maintaining bad events implicitly and running the resampling process requires an efficient way to find violated events. In many examples, like those of Sections 4, 5 and 6 with super-polynomially many events, finding violated events, or even just verifying a good assignment, is not known to be in polynomial time (often it is even provably NP-hard). To capture the sets of events for which we can run the MT algorithm efficiently we use the following definition:

Definition 3.2 (Efficient verifiability).
A set A of events that are determined by variables in P is efficiently verifiable if, given an arbitrary assignment to P, we can efficiently find an event A ∈ A that holds, or detect that there is no such event.

Because many large A of interest are not efficiently verifiable, a direct application of the MT algorithm is not efficient. Nevertheless, we show in the rest of this section that, using the randomness in the output distribution of the MT algorithm characterized by Theorem 2.2, it is still practically always possible to obtain efficient Monte Carlo algorithms that produce a good assignment with high probability.

The main idea is to judiciously select an efficiently verifiable core subset A′ ⊆ A of bad events and apply the MT algorithm to it. Essentially, instead of looking for violated events in A, we only resample events from A′ and terminate when we cannot find any such violated event. The non-core events will have small probabilities and will be sparsely connected to core events; as such, their probabilities in the LLL-distribution, and therefore also in the output distribution of the algorithm, do not blow up by much. There is thus hope that the non-core events remain unlikely to happen even though they were not explicitly fixed by the algorithm. Theorem 3.3 shows that if the LLL-conditions are fulfilled for A, then a non-core event A ∈ A \ A′ is violated in the produced output with probability at most x(A). This makes the success probability of such an approach at least 1 − Σ_{A ∈ A\A′} x(A).

Theorem 3.3.
Let A′ ⊆ A be an efficiently verifiable core subset of A. If there is an ǫ ∈ [0, 1) and an assignment of reals x : A → (0, 1) such that:

∀A ∈ A : Pr[A] ≤ (1 − ǫ) x(A) ∏_{B ∈ Γ(A) ∩ A′} (1 − x(B)),

then the modified MT-algorithm can be efficiently implemented with an expected number of resamplings according to Theorem 3.1. The algorithm furthermore outputs a good assignment with probability at least 1 − ∑_{A ∈ A\A′} x_A.

Proof. Note that the set A′ on which the actual MT-algorithm is run fulfills the LLL-conditions. This makes Theorem 3.1 applicable. To argue about the success probability of the modified algorithm, note that x(A) ≥ Pr[A] / ∏_{B ∈ Γ′(A)} (1 − x(B)), where Γ′(A) are the neighbors of A in the variable-sharing graph defined on A′. Using Theorem 2.2 we get that the probability that a non-core bad event A ∈ A \ A′ holds in the assignment produced by the modified algorithm is at most x_A. Since core events are avoided completely by the MT-algorithm, a simple union bound over all conditional non-core event probabilities results in a failure probability of at most ∑_{A ∈ A\A′} x_A.

Here is furthermore a direct proof of the theorem, incorporating the argument from Theorem 2.2: Redefine the witness trees of [40] to have only events from A′ in non-root nodes, thus getting a modification of the Galton-Watson process from Section 3 of [40]. As in [40], we grow witness trees from an execution log, starting with a root event that holds at a certain point in time. This guarantees that we capture events A ∈ A \ A′ happening even though they are never resampled (since we never check whether such events A hold or not). Note that if some A ∈ A \ A′ holds after termination, then there is a witness tree with A as root and with all non-root nodes belonging to A′. Following the proof of Lemma 3.1 from [40], the probability for this to happen is at most ∑_{A ∈ A\A′} x_A.
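The core-restricted resampling scheme of Theorem 3.3 can be sketched as follows; the event encoding and all names here are our own illustration, not the paper's. Events outside the chosen core are never examined or resampled; Theorem 3.3 bounds the chance that one of them holds in the final assignment.

```python
import random

def core_resample(num_vars, core_events, sample_var, rng=random.Random(0)):
    """Moser-Tardos restricted to a core: repeatedly find a violated core
    event and resample only the variables it depends on; events outside
    the core are never even examined."""
    a = [sample_var(rng) for _ in range(num_vars)]
    while True:
        bad = next((e for e in core_events if e["holds"](a)), None)
        if bad is None:
            return a  # no core event holds; non-core events are left to chance
        for i in bad["vars"]:
            a[i] = sample_var(rng)

# Toy core: variables take values 0..3, and bad event A_j says positions
# j and j+1 are both 0 (probability 1/16, dependency degree 2, so the
# symmetric LLL condition e*p*(d+1) <= 1 holds and termination is expected).
core = [{"vars": (j, j + 1),
         "holds": (lambda a, j=j: a[j] == 0 and a[j + 1] == 0)}
        for j in range(9)]
out = core_resample(10, core, lambda rng: rng.randint(0, 3))
assert all(not (out[j] == 0 and out[j + 1] == 0) for j in range(9))
```

In a real application the core would be the polynomially many high-probability events, and the `holds` checks would be the efficient verifier of Definition 3.2.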
(In this witness-tree argument we do not get ∑_{A ∈ A\A′} x_A/(1 − x_A), since A cannot be a child of itself in the witness trees that we construct.)

While the concept of an efficiently verifiable core is easy to understand, it is not clear how often such a core exists and how it can be found. Furthermore, having such a core is only useful if the probabilities of the non-core events are small enough to make the failure probability, which is based on the union bound over those probabilities, meaningful. The following main theorem shows that in all applications that can tolerate a small “exponential” ǫ-slack as introduced by [21], finding such a good core is straightforward:

Theorem 3.4.
Suppose there is a fixed constant ǫ ∈ (0, 1) and an assignment of reals x : A → (0, 1 − ǫ) such that:

∀A ∈ A : Pr[A]^{1−ǫ} ≤ x(A) ∏_{B ∈ Γ(A)} (1 − x(B)).

Suppose further that log 1/δ ≤ poly(n), where δ = min_{A ∈ A} x(A) ∏_{B ∈ Γ(A)} (1 − x(B)). Then for every p ≥ 1/poly(n) the set {A_i ∈ A : Pr[A_i] ≥ p} has size at most poly(n), and is thus essentially always an efficiently verifiable core subset of A. If this is the case, then there is a Monte Carlo algorithm that terminates after O((n/ǫ) log(n/ǫ)) resamplings and returns a good assignment with probability at least 1 − n^{−c}, where c > 0 is any desired constant.

Proof. For a probability p = 1/poly(n) to be fixed later we define A′ as the set of events with probability at least p. Recall from Theorem 3.1 that ∑_{A ∈ A} x_A ≤ O(n log(1/δ)). Since x_A ≥ p for A ∈ A′, we get that |A′| ≤ O(n log(1/δ)/p) = poly(n). By assumption A′ is efficiently verifiable and we can run the modified resampling algorithm with it.

For every event we have Pr[A] ≤ x_A < 1 − ǫ and thus get a (1 − ǫ)^ǫ = (1 − Θ(ǫ))-slack; therefore Theorem 3.1 applies and guarantees that the algorithm terminates with high probability after O((n/ǫ) log(n/ǫ)) resamplings. To prove the failure probability, note that for every non-core event A ∈ A \ A′, the LLL-conditions with the “exponential ǫ-slack” provide an extra multiplicative p^{−ǫ} factor over the LLL-conditions in Theorem 3.1. We have x(A) Pr[A]^ǫ ≥ Pr[A] / ∏_{B ∈ Γ′(A)} (1 − x(B)), where Γ′(A) are the neighbors of A in the variable-sharing graph defined on A′. Using Theorem 2.2 and setting p = n^{−Θ(1/ǫ)}, we get that the probability that a non-core bad event A ∈ A \ A′ holds in the assignment produced by the modified algorithm is at most x_A Pr[A]^ǫ ≤ x_A n^{−Θ(1)}.
Since core events are avoided completely by the MT-algorithm, a simple union bound over all conditional non-core event probabilities results in a failure probability of at most n^{−Θ(1)} ∑_{A ∈ A\A′} x_A. Now since ∑_{A ∈ A\A′} x_A ≤ ∑_{A ∈ A} x_A = T = poly(n) holds, we get that we fail with probability at most n^{−c} on non-core events while safely avoiding the core. This completes the proof of the theorem.

The last theorem nicely completes this section; it shows that in practically all applications of the general LLL it is possible to obtain a fast Monte Carlo algorithm with arbitrarily high success probability. The conditions of Theorem 3.4 are very easy to check and are usually directly fulfilled. That is, in all LLL-based proofs (with a large number of events A_i) known to us, the set of high-probability events forms a polynomial-sized core that is trivially efficiently verifiable, e.g., by exhaustive enumeration. Theorem 3.4 makes these proofs constructive without further complicated analysis. In most cases, only some adjustments in the bounds are needed to respect the ǫ-slack in the LLL-condition.

Remarks

• Note that the failure probability can be made an arbitrarily small inverse polynomial. This is important since for problems with non-efficiently-verifiable solutions the success probability of Monte Carlo algorithms cannot be boosted using standard probability-amplification techniques.

• In all applications known to us, the core above has further nice structure: usually the probability of an event A_i is exponentially small in the number of variables it depends on. Thus, each event in the core only depends on O(log n) many variables, and hence is usually trivial to enumerate. This makes the core efficiently verifiable, even when finding a general violated event in A is NP-hard.
• The fact that the core consists of polynomially many events, usually with logarithmically many variables each, often makes it possible to enumerate the core in parallel and to evaluate each event in parallel. If this is the case, one can get an RNC algorithm by first building the dependency graph on the core and then computing an MIS of violated events in each round (using MIS algorithms such as [3, 35]). Using the proof of Theorem 3.1, which is based on some ideas from the parallel LLL algorithm of MT, it is easy to see that only logarithmically many rounds of resampling these events are needed.

• Even though the derandomization of [21] also only requires an “exponential ǫ-slack” in the LLL-conditions, applying the techniques used there, and in general obtaining efficient deterministic algorithms when m is superpolynomial, seems hard. The derandomization in [21] either explicitly works on all m events when applying the method of conditional probabilities, or uses approximately O(log m)-wise independent probability spaces, which have an inherently poly(m)-size domain.

The
Santa Claus problem is the restricted assignment version of the max-min allocation problem of indivisible items. In this section, we present the first efficient randomized constant-factor approximation algorithm for this problem.

In the max-min allocation problem, there is a set C of n items, and m children. The value (utility) of item j to child i is p_{i,j} ≥
0. An item can be assigned to only one child. If a child i receives a subset of the items S_i ⊆ C, then the total valuation of the items received by i is ∑_{j ∈ S_i} p_{i,j}. The goal is to maximize the minimum total valuation of the items received by any child, that is, to maximize min_i ∑_{j ∈ S_i} p_{i,j}. (The “minmax” version of this “maxmin” problem is the classical problem of makespan minimization in unrelated parallel machine scheduling [34].) This problem has received much attention recently [15, 14, 26, 13, 16, 20, 44].

A restricted version of max-min allocation is where each item has an intrinsic value, and where for every child i, p_{i,j} is either p_j or 0. This is known as the Santa Claus problem. The Santa Claus problem is NP-hard, and no polynomial-time approximation algorithm with a factor better than 1/2 is possible unless P = NP [18]. Bansal and Sviridenko [15] considered a linear-programming (LP) relaxation of the problem known as the configuration LP, and showed how to round this LP to obtain an O(log log log m / log log m)-approximation algorithm for the Santa Claus problem. They also showed a reduction to a crisp combinatorial problem, a feasible solution to which implies a constant-factor integrality gap for the configuration LP.

Subsequently, Feige [26] showed that the configuration LP has a constant integrality gap. Normally such a proof immediately gives a constant-factor approximation algorithm that rounds an LP solution along the lines of the integrality-gap proof. In this case Feige’s proof could not be made constructive, because it was heavily based on repeated reductions that apply the asymmetric version of the LLL to exponentially many events. Due to this unsatisfactory situation, the Santa Claus problem was the first on a list of problems reported in the survey “Estimation Algorithms versus Approximation Algorithms” [27] for which a constructive proof would be desirable.
Using a completely different approach, Asadpour, Feige and Saberi [13] could show that the configuration LP has an integrality gap of at most 4. Their proof uses local search and hypergraph matching theorems of Haxell [30]. Haxell’s theorems are again highly non-constructive, and the stated local-search problem is not known to be efficiently solvable. Thus this second non-constructive proof still left the question of a constant-factor approximation algorithm open.

In this section we show how our Theorem 3.4 can be used to easily and directly constructivize the LLL-based proof of Feige [26], giving the first constant-factor approximation algorithm for the Santa Claus problem. It is to be noted that the more general max-min fair allocation problem appears significantly harder. It is known that for general max-min fair allocation, the configuration LP has a gap of Ω(√m). Asadpour and Saberi [14] gave an O(√m ln³ m) approximation factor for this problem using the configuration LP. Recently, Saha and Srinivasan [44] have improved this to O(√m ln m / ln ln m). So far the best approximation ratio known for this problem, due to Chakraborty, Chuzhoy and Khanna, is O(n^ǫ) [20], for any constant ǫ > 0, in n^{O(1/ǫ)} time. We focus on the Santa Claus problem here.

We start by describing the configuration LP and the reduction of it to a combinatorial problem over a set system, albeit with a constant-factor loss in approximation. Next we give a constructive solution for the set system problem, thus providing a constant-factor approximation algorithm for the Santa Claus problem.

We guess the optimal solution value T using binary search. An item j is said to be small if p_j < αT; otherwise it is said to be big. Here α < 1 is a constant to be fixed later. The total valuation of a configuration (a set of items) C to child i is denoted by p_{i,C} = ∑_{j ∈ C} p_{i,j}.
A configuration C is called valid for child i if:

• p_{i,C} ≥ T and all the items in C are small; or

• C contains only one item j and p_{i,j} = p_j ≥ αT, that is, j is a big item for child i.

Let C(i, T) denote the set of all valid configurations corresponding to child i with respect to T. We define an indicator variable y_{i,C} for each child i and each valid configuration C ∈ C(i, T) such that it is 1 if child i receives configuration C and 0 otherwise. These variables are relaxed to take any fractional value in [0,
1] to obtain the configuration LP relaxation:

∀j : ∑_{C ∋ j} ∑_i y_{i,C} ≤ 1;   ∀i : ∑_{C ∈ C(i,T)} y_{i,C} = 1;   ∀i, C : y_{i,C} ≥ 0.   (7)

Although this LP can have exponentially many variables, it can be solved to any desired accuracy, yielding a fractional solution that assigns configurations of value (1 − ǫ)T to each child in polynomial time. The algorithm of Bansal and Sviridenko starts by solving the configuration LP (7). Then by various steps of simplification, they reduce the problem to the following instance:

There are p groups, each group containing l children. Each child is associated with a collection of k items with a total valuation of T/c, for some constant c > 0. Each item appears in at most βl sets for some β ≤ 3. Such an instance is referred to as a (k, l, β)-system. The goal is to efficiently select one child from each group and assign at least ⌊γk⌋ items to each of the chosen children, such that each item is assigned only once. If such an assignment exists, then the corresponding (k, l, β)-system is said to be a γ-good (k, l, β)-system.

Feige showed that the (k, l, β)-system that results from the configuration LP is indeed γ-good, where γ = Ω(1/max(1, β)) [26]. This established a constant-factor integrality gap for the configuration LP. However, the proof being non-constructive, no algorithm was known to efficiently find such an assignment. In the remainder of this section, we make Feige’s argument constructive, thus giving a constant-factor approximation algorithm for the Santa Claus problem. But before that, for the sake of completeness, we briefly describe the procedure that obtains a (k, l, β)-system from an optimal solution of the configuration LP [15].

Obtaining a (k, l, β)-system: The algorithm starts by simplifying the assignment of big items in an optimal solution (say) y* of the configuration LP. Let J_B denote the set of big items. Consider a bipartite graph G with children M on the right side and big items J_B on the left side. An edge (i, j), i ∈ M, j ∈ J_B, of weight w_{i,j} = ∑_{C ∈ C(i,T): C ∋ j} y*_{i,C} is inserted in G if w_{i,j} >
0. These w_{i,j} values are then modified such that after the modification the edges of G with weight in (0, 1) form a forest.
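The modification behind the next lemma repeatedly breaks cycles of fractional edge weights; a minimal sketch of one such step, under our own encoding (the cycle's weights listed in cycle order), can look as follows:

```python
def break_cycle_step(weights):
    """One cycle-breaking step: the alternate edges of an even cycle form
    two matchings; raise every weight in one matching and lower every
    weight in the other by the same delta, chosen so that some weight hits
    0 or 1.  Every vertex touches one edge of each matching, so vertex
    totals are preserved."""
    grow = min(1 - w for w in weights[0::2])   # room before a weight reaches 1
    shrink = min(w for w in weights[1::2])     # room before a weight reaches 0
    delta = min(grow, shrink)
    return [w + delta if i % 2 == 0 else w - delta
            for i, w in enumerate(weights)]

# Weights around a 4-cycle; after one step an edge is fixed at 1 (item
# permanently assigned) while vertex sums such as w0 + w1 are unchanged.
assert break_cycle_step([0.25, 0.5, 0.75, 0.5]) == [0.5, 0.25, 1.0, 0.25]
```

Repeating this step, and deleting edges that reach weight 0 or 1, eliminates all cycles among the fractional edges.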
Lemma 5 [15].
The solution y* can be transformed into another feasible solution of the configuration LP in which the graph G is a forest.

The transformation is performed using the simple cycle-breaking trick. Each cycle is broken into two matchings; the weights on the edges of one matching are increased gradually while the weights on the other are decreased, until some weight hits 0 or 1. If a w_{i,j} becomes 0 in this procedure, the edge (i, j) is removed from G. Else if it becomes 1, then item j is permanently assigned to child i and the edge (i, j) is removed. Suppose G′ is the forest obtained after this transformation. The forest structure is then further exploited to form groups of children and big items.

Lemma 6 [15].
The solution y* can be transformed into another solution y′ such that the children M and the big items J_B can be clustered into p groups M_1, M_2, ..., M_p and J_{B,1}, J_{B,2}, ..., J_{B,p} respectively, with the following properties.

1. For each i = 1, 2, ..., p, the number of jobs in J_{B,i} is exactly |M_i| − 1. The group J_{B,i} could possibly be empty.

2. Within each group the assignment of big jobs is entirely flexible, in the sense that they can be placed feasibly on any |M_i| − 1 children out of the |M_i| children.

3. For each group M_i, the solution y′ assigns exactly one unit of small configurations to children in M_i, and all the |M_i| − 1 remaining units of configurations correspond to big jobs in J_{B,i}. Also, for each small job j, ∑_{C ∋ j} ∑_i y′_{i,C} ≤ 1.

Lemma 6 implies that the assignment of big items to children in a group is completely flexible and can be ignored. We only need to choose one child from each group, who will be satisfied by a configuration of small items. If y′ assigns a small configuration C to an extent of y′_{c,C} to some child c ∈ M_i, i ∈ [1, p], then we say that M_i contains the small configuration C for child c ∈ M_i. Without loss of generality, it can be assumed that each child in the groups is fractionally assigned to exactly one small configuration. Bansal and Sviridenko further showed that y′ can again be simplified such that each small configuration is assigned to an extent of at least 1/l, where l = n + m, to each child, and such that for each small job j, ∑_{C ∋ j} ∑_i y′_{i,C} ≤
3. This implies that if we consider all the small configurations across the p groups, then each small job appears in at most βl configurations, where β = 3.

Finally, the following lemma shows that by losing a constant factor in the approximation, one can assume that all the small jobs have the same size.

Lemma 8 [15].
Given the algorithmic framework above, by losing a constant factor in the approximation, each small job can be assumed to have size ǫT/n.

As a consequence of the above lemma, we now have the following scenario.
There are p groups M_1, M_2, ..., M_p, each containing at most l children. Each child is associated with a set that contains k = Θ(n/ǫ) items. Each item belongs to at most βl sets. The goal is to pick one child from each group and assign at least a constant fraction of the items in its set, such that each item is assigned only once. Therefore, we arrive at what is referred to as a (k, l, β)-system.

γ-good solution for a (k, l, β)-system: We now point out the main steps in Feige’s algorithm and describe in detail the modifications required to make Feige’s algorithm constructive.
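The defining properties of the (k, l, β)-system just described are directly checkable by counting; a minimal sketch with an illustrative data layout of our own (a group is a list of item-sets):

```python
from collections import Counter

def is_klb_system(groups, k, l, beta):
    """Check the defining properties of a (k, l, beta)-system: at most l
    children per group, exactly k items in each child's set, and no item
    appearing in more than beta*l sets overall."""
    if any(len(g) > l for g in groups):
        return False
    if any(len(s) != k for g in groups for s in g):
        return False
    occurrences = Counter(x for g in groups for s in g for x in s)
    return all(c <= beta * l for c in occurrences.values())

groups = [[{1, 2}, {2, 3}], [{3, 4}, {4, 1}]]
assert is_klb_system(groups, k=2, l=2, beta=1)
```

The hard part, of course, is not verifying the system but selecting one child per group with a large conflict-free item assignment, which is what the rest of this section addresses.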
Feige’s Nonconstructive Proof for a γ-good (k, l, β)-system: Feige’s approach is based on a systematic reduction of k and l in iterations, finally arriving at a system where k or l is a constant. For constant k or l, the following lemma asserts a constant γ.

Lemma 4.1 (Lemmas 2.1 and 2.2 of [26]). For every (k, l, β)-system a γ-good solution with γ satisfying γ = 1/k or γk = ⌊k/⌈βl⌉⌋ can be found efficiently.

The reduction of a (k, l, β)-system to constant k and l involves two main lemmas, which we refer to as the Reduce-l lemma and the
Reduce-k lemma respectively.
Lemma 4.2 (Lemma 2.3 of [26], Reduce-l). For l > c (c a sufficiently large constant), every γ-good (k, l, β)-system with k ≤ l can be transformed into a γ-good (k, l′, β′)-system with l′ ≤ log⁵ l and β′ ≤ β(1 + 1/log l).

Lemma 4.3 (Lemma 2.4 of [26], Reduce-k). Every (k, l, β)-system with k ≥ l ≥ c can be transformed into a (k′, l, β)-system with k′ ≤ k/2 and with the following additional property: if the original system is not γ-good, then the new system is not γ′-good for γ′ = γ(1 + O(log k/√(γk))). Conversely, if the new system is γ′-good, then the original system was γ-good.

If β is not a constant to start with, then by applying the following lemma repeatedly, β can be reduced below 1.

Lemma 4.4 (Lemma 2.5 of [26]). For l > c, every γ-good (k, l, β)-system can be transformed into a γ-good (k′, l, β′)-system with k′ = ⌊k/2⌋ and β′ ≤ β(1 + O(log(βl)/√(βl))).

However, in our context β ≤
3, so we ignore Lemma 2.5 of [26] in the further discussion. Starting from the original system, as long as l > c, Lemma Reduce-l is applied when l > k, and Lemma Reduce-k is applied when k ≥ l. In this process β grows by at most a factor of 2. Thus at the end, l is a constant and so is β. By then applying Lemma 4.1, the constant integrality gap for the configuration LP is established.

Randomized Algorithm for a γ-good (k, l, β)-system: There are two main steps in the algorithm.

1. Show a constructive procedure to obtain the reduced system through Lemma Reduce-l and Lemma Reduce-k.

2. Map the solution of the final reduced system back to the original system.

We now elaborate upon each of these.
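The first reduction of step 1 (the constructive Reduce-l, detailed next) is plain independent subsampling of children within each group, followed by a counting pass over item multiplicities; a sketch under an assumed data layout of our own:

```python
import math
import random
from collections import Counter

def reduce_l(groups, rng=random.Random(0)):
    """Reduce-l step: keep floor(log^5 l) randomly chosen children in each
    group of size l (groups already that small are kept whole)."""
    out = []
    for g in groups:
        keep = min(len(g), max(1, math.floor(math.log(len(g)) ** 5)))
        out.append(rng.sample(g, keep))
    return out

def max_multiplicity(groups):
    """Bad-event check A_j: the largest number of sets any single item
    appears in; verifying it is a simple counting pass."""
    counts = Counter(x for g in groups for s in g for x in s)
    return max(counts.values())

groups = [[{i, i + 1} for i in range(20)] for _ in range(3)]
sub = reduce_l(groups)
assert max_multiplicity(sub) <= max_multiplicity(groups)
```

Subsampling can only remove sets, so no item's multiplicity increases in absolute terms; the LLL argument below controls the multiplicity relative to the smaller group size.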
Constructive version of Lemma Reduce-l: This follows quite directly from [40]. The algorithm picks ⌊log⁵ l⌋ sets uniformly at random and independently from each group. Thus, while the value of k remains fixed, l is reduced to l′ = ⌊log⁵ l⌋. In expectation the value of β does not change, and the probability that β′ > β(1 + 1/log l), and hence β′l′ > βl′(1 + 1/log l), is at most e^{−β′l′/(3 log² l)} ≤ e^{−log² l} = l^{−log l}. We define a bad event corresponding to each element:

• A_j: Element j has more than β′l′ copies.

Noting that the dependency graph has degree at most klβl ≤ l⁴, the uniform (symmetric) version of the LLL applies. It is also easy to check whether there exists a violated event: we simply count the number of times an element appears across all the chosen sets. Thus we directly follow [40]; setting x_{A_j} = e · l^{−log l}, we get the expected number of resamplings needed to avoid all the bad events to be O(plk/l^{log l}) = O(p) = O(m).

Constructive version of Lemma Reduce-k: This is the main challenging part. The random experiment involves selecting each item independently at random with probability 1/2. To characterize the bad events, we need a structural lemma from [26]. Construct a graph on the sets, where there is an edge between two sets if they share an element. A collection of sets is said to be connected if and only if the subgraph induced by this collection is connected. We consider two types of bad events:

1. B_1: some set has less than k′ = (1 − log k/√k) · k/2 items surviving, and

2. B_i for i ≥
2: there is a connected collection of i sets from distinct groups whose union originally contained at most iγk items, of which more than iδ′k items survive, where δ′ = (γ/2)(1 + log k/√(γk)).

If none of the above bad events happens, then we can consider the first k′ surviving items from each set, and the second type of bad events still does not happen. These events are chosen such that γ′-goodness (with γ′ = δ′k/k′ ≤ γ(1 + O(log k/√(γk)))) of the new system certifies that the original system was γ-good. That this is indeed the case follows directly from Hall’s theorem, as proven by Feige:

Lemma 4.5 (Lemma 2.7 of [26]). Consider a collection of n sets and a positive integer q.

1. If for some 1 ≤ i ≤ n there is a connected subcollection of i sets whose union contains less than iq items, then there is no choice of q items per set such that all items are distinct.

2. If for every i, 1 ≤ i ≤ n, the union of every connected subcollection of i sets contains at least iq (distinct) items, then there is a choice of q items per set such that all items are distinct.

For the bad events B_i, i ≥
1, taking x_i = 2^{−i log k} is sufficient to satisfy condition (1) of the asymmetric LLL. More precisely, suppose we define, for any bad event B ∈ ⋃_{i≥1} B_i, Γ(B) as in Section 1.3: i.e., Γ(B) is the set of all bad events A ≠ B such that A and B both depend on at least one common random variable in our “randomly and independently selecting items” experiment. Then, it is shown in [26] that with the choice x_i = 2^{−i log k} for all events in B_i, we have

∀(i ≥ 1) ∀(B ∈ B_i) : Pr[B] ≤ 2^{−2i log k} ≤ x_i ∏_{j≥1} ∏_{A ∈ (B_j ∩ Γ(B))} (1 − x_j).   (8)

Thus by the LLL, there exists an assignment that avoids all the bad events. However, no efficient construction was known here, and as Feige points out, “the main source of difficulty in this respect is Lemma 2.4, because there the number of bad events is exponential in the problem size, and moreover, there are bad events that involve a constant fraction of the random variables.” Our Theorem 3.4 again directly makes this proof constructive and gives an efficient Monte Carlo algorithm for producing a Reduce-k system with high probability.

Lemma 4.6.
There is a Monte Carlo algorithm that produces a valid Reduce-k system with probability at least 1 − 1/m.

Proof. Note from (8) that we can take δ = 2^{−Θ(m log k)}. So we get that log 1/δ = O(m log k) = O(n log n), where n is the number of items and m ≤ n is the number of children. We furthermore get that all events with probability larger than a fixed inverse polynomial involve only connected subcollections of size O(log m / log k), and Theorem 3.4 implies that there are only polynomially many such “high”-probability events. (This can also be seen directly, since the degree of a set is bounded by βk ≤ 6k, and the number of connected subcollections is therefore at most (6k)^{O(log m / log k)} = m^{O(1)} = n^{O(1)}.) The connected collections of sets are easy to enumerate using, e.g., breadth-first search, and are therefore efficiently verifiable (in fact, even in parallel). Theorem 3.4 thus applies and directly proves the lemma.

By repeatedly applying the algorithms that produce a Reduce-l or Reduce-k system, we can completely reduce the original system down to a system with a constant number of children per group, where β can increase from 3 to at most 6 due to Lemma Reduce-l. This involves at most log m Reduce-l reductions and at most log n Reduce-k reductions. We can furthermore assume that n < m, since otherwise simply all combinations of one child per group could be tried in time polynomial in n. Since each Reduce-l or Reduce-k operation produces a desired solution with probability at least 1 − 1/m, by a union bound, with probability at least 1 − O(log n log m/m) = 1 − O(log² m/m) a final (k, l, β)-system is produced that is γ-good for some constant γ by Lemma 4.1. Using Lemma 4.1, we can also find a γ-good selection of children. Now, once one child from each group is selected, we can construct a standard network-flow instance to assign items to these chosen children (Lemma 4.8).
This finishes the process of mapping a solution of the reduced system back to the original (k, l, β)-system. While checking whether an individual reduction failed seems to be an NP-hard task, it is easy to see in the end whether a good enough assignment has been produced. This enables us to rerun the algorithm in the unlikely event of a failure. Thus, the Monte Carlo algorithm can be strengthened to an algorithm that always produces a good solution and has an expected polynomial running time.

The details of the above are given in two lemmas, Lemma 4.7 and Lemma 4.8. Theorem 4.9 follows from the two lemmas. Suppose we start with a (k_1, l_1, β_1)-system and after repeated application of either Lemma Reduce-l or Lemma Reduce-k arrive at a (k_s, l_s, β_s)-system, where l_s < c, a constant. We then employ Lemma 4.1 to obtain a γ_s-good (k_s, l_s, β_s)-system, where γ_s satisfies γ_s k_s = ⌊k_s/⌈β_s l_s⌉⌋. Since l_s is a constant and β_s ≤ 6, γ_s is also a constant. Lemma 4.1 also gives a choice of a child from each group, denoted by a function f : {1, ..., p} → {1, ..., l_s}, that serves as a witness for the γ_s-goodness of the (k_s, l_s, β_s)-system. We use this same mapping for the original system. The following lemma establishes the goodness of the (k_1, l_1, β_1)-system.

Lemma 4.7.
Given a sequence of reductions of k, (k_1, l_1, β_1) → ... → (k_s, l_s, β_s), interleaved with reductions of l, let, for all s ≥ 2, γ_s = γ_{s−1}(1 + O(log k_{s−1}/√(γ_{s−1} k_{s−1}))). Then if the final reduced system is γ_s-good and the function f : {1, ..., p} → {1, ..., l_s} serves as a witness for its γ_s-goodness, then f also serves as a witness of γ_1-goodness of the (k_1, l_1, β_1)-system with high probability. In other words, we can simply use the assignment given by f to select one child from each group, and that assignment serves as a witness of γ_1-goodness of the original system with high probability.

Proof. Suppose there exists a function f that serves as a witness for γ_s-goodness of the (k_s, l_s, β_s)-system but does not serve as a witness that the (k_{s−1}, l_{s−1}, β_{s−1})-system is γ_{s−1}-good. Then there must exist a connected collection of i groups, 1 ≤ i ≤ p, chosen according to f, such that their union contains less than γ_{s−1} k_{s−1} i items, while in the reduced system their union has at least γ_s k_s i elements. Call such a function f bad. Thus every bad function is characterized by a violation of an event of type B_i, i ≥
1, described in Section 4.3.2. However, by Lemma 4.6 we have

Pr[∃ a bad function f] ≤ Pr[some event of type B_i, i ≥ 1, holds] ≤ 1/m.

Now, the maximum number of times the Reduce-k step is applied is at most log k ≤ log n. Thus, if the Reduce-l step is not applied at all, then by a union bound the function f witnesses γ_1-goodness of the (k_1, l_1, β_1)-system with probability at least 1 − log n/m. We can assume without loss of generality that n ≤ m. (Otherwise, in polynomial time we can guess the children who receive small items and thus know f. Once f is known, an assignment of small items to the children chosen by f can be done in polynomial time through Lemma 4.8.) Since n ≤ m, the function f witnesses γ_1-goodness of the (k_1, l_1, β_1)-system with probability at least 1 − log m/m. Finally, since the Reduce-l step only reduces l and keeps k intact, it does not affect the goodness of the set system.

Once we know the function f, using Lemma 4.8 we can get a valid assignment of ⌊γk⌋ items to each chosen child:

Lemma 4.8.
Given a function f : {1, ..., p} → {1, ..., l} and a parameter γ, there is a polynomial-time algorithm to determine whether f is γ-good, and we can determine the subset of ⌊γk⌋ items received by each child f(i), i ∈ [1, p].

Proof. We construct a bipartite graph with a set of vertices U = {u_1, ..., u_p} corresponding to the chosen children from the p groups, a set of vertices V corresponding to the small items in the sets of the chosen children, a source s and a sink t. Next we add a directed edge of capacity ⌊γk⌋ from the source s to each vertex in U. We also add directed edges (u, v), u ∈ U, v ∈ V, if the item v belongs to the set of child u. These edges have capacity 1. Finally we add a directed edge from each vertex in V to the sink t with capacity 1. We claim that this flow network has a maximum flow of ⌊γk⌋p iff f is γ-good.

For one direction, let f be γ-good. Then there exists a set of ⌊γk⌋ items that can be assigned to each child u ∈ U. Send one unit of flow from each child to each of the items it receives. The outgoing flow from each u ∈ U is exactly ⌊γk⌋. Since each item is assigned to at most one child, the flow on each edge (v, t), v ∈ V, is at most 1. Thus all the capacity constraints are maintained and the flow value is ⌊γk⌋p.

For the other direction, consider an integral maximum flow of value ⌊γk⌋p. Since the total capacity of all the edges emanating from the source is ⌊γk⌋p, they must all be saturated by the max flow. Since the flow is integral, for each child u there are exactly ⌊γk⌋ edges with flow 1, corresponding to the items that it receives. Also, since no edge capacity is violated, each item is assigned to at most one child. Therefore f is γ-good.

To check a function f for γ-goodness and obtain the good assignment, we construct the flow graph and run a max-flow algorithm that outputs an integral flow.
As proven above, a max-flow value of ⌊γk⌋p indicates γ-goodness, and for a γ-good f the assignment can be directly constructed from the flow by considering only the flow-carrying edges.

Theorem 4.9. There exists a constant α > 0 and a randomized algorithm for the Santa Claus problem that runs in expected polynomial time and always assigns items of total valuation at least α · OPT to each child.
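The check of Lemma 4.8 can be sketched with bipartite matching in place of the explicit flow network; the two are equivalent here because every capacity other than those at the source is 1. The names and data layout below are ours:

```python
def gamma_good_assignment(chosen_sets, gamma, k):
    """Decide gamma-goodness of a choice of one child per group: child u
    must receive q = floor(gamma*k) distinct items from its set
    chosen_sets[u].  Child u is split into q 'slots' and Kuhn's
    augmenting-path algorithm matches every slot to a distinct item,
    which mirrors a max flow of value q*p in the network of Lemma 4.8."""
    q = int(gamma * k)
    slots = [u for u in range(len(chosen_sets)) for _ in range(q)]
    owner = {}  # item -> slot index currently holding it

    def augment(si, seen):
        for item in sorted(chosen_sets[slots[si]]):
            if item not in seen:
                seen.add(item)
                if item not in owner or augment(owner[item], seen):
                    owner[item] = si
                    return True
        return False

    if not all(augment(si, set()) for si in range(len(slots))):
        return None  # f is not gamma-good
    received = {u: set() for u in range(len(chosen_sets))}
    for item, si in owner.items():
        received[slots[si]].add(item)
    return received

# One child chosen per group; each must receive floor(0.5 * 4) = 2 items.
assignment = gamma_good_assignment([{1, 2, 3}, {3, 4, 5}], gamma=0.5, k=4)
assert assignment is not None and all(len(v) == 2 for v in assignment.values())
```

A production implementation would instead run an integral max-flow algorithm as in the proof, but the matching view makes the witness structure explicit.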
In this section, we give an efficient Monte Carlo construction for non-repetitive colorings of graphs. Call a word (string) w “squarefree” or “non-repetitive” if it contains no substring of the form xx with x ≠ ∅. Let us refer to graphs using the symbol H instead of G, to avoid confusion with our dependency graphs G. Recall from Section 1 that a k-coloring of the edges of H is called non-repetitive if the sequence of colors along any path in H is squarefree: i.e., we want a coloring in which no path has a color sequence of the form xx. (All paths here refer to simple paths.) The smallest k such that H has a non-repetitive coloring using k colors is called the Thue number of H and is denoted by π(H). The Thue number was first defined by Alon, Grytczuk, Hałuszczak and Riordan in [5]: it is named after Thue, who proved in 1906 that if H is a simple path, then π(H) = 3 [47]. While the method of Thue is constructive, no efficient construction is known for general graphs. Alon et al. showed through an application of the asymmetric LLL that π(H) ≤ c∆(H)² for some absolute constant c. Their proof was non-constructive. The number of bad events is exponential. Not only that, checking whether a given coloring is non-repetitive is coNP-hard, even when the number of colors is restricted to 4 [36]. Thus checking whether some “bad event” holds in a given coloring is coNP-hard. Since the work of Alon et al., the non-repetitive coloring of graphs has received a good deal of attention in the last few years [22, 45, 29, 32, 19, 4]. Yet no efficient construction is known to date, except for some special classes of graphs such as complete graphs, cycles and trees.

Suppose we are given a graph H with maximum degree ∆. We first give the proof of Alon et al., which shows that π(H) ≤ c∆², and then show how to convert this proof directly into a constructive algorithm (with the loss of a ∆^ǫ factor in the number of colors used):

Theorem 5.1 (Theorem 1 of [5]).
There exists an absolute constant c such that π(H) ≤ c∆^2 for all graphs H with maximum degree at most ∆.

Proof. Let C = (2e^{16} + 1)∆^2. Randomly color each edge of H with one of C colors. Consider the following types of bad events B_i, for i ≥ 1: "there exists a path P of length 2i such that the second half of P is colored identically to its first half". For a path P of length 2i, i ≥ 1, we have Pr[P has a coloring of the form xx] = C^{−i}. Also, a path of length 2i intersects at most 4ij∆^{2j} paths of length 2j. Thus, for any bad event A of type i, we have Pr[A] = C^{−i}, and each bad event of type i shares variables with at most 4ij∆^{2j} bad events of type B_j. Set x_i = 2^{−i}∆^{−2i}. We have (1 − x_j) ≥ e^{−2x_j}; this, along with the fact that Σ_{j≥1} j/2^j = 2, shows that

x_i ∏_j (1 − x_j)^{4ij∆^{2j}} ≥ x_i e^{−8i Σ_j x_j j∆^{2j}} = 2^{−i}∆^{−2i} e^{−8i Σ_j j/2^j} = (2e^{16}∆^2)^{−i}.

Since C = (2e^{16} + 1)∆^2, the condition of the LLL is satisfied and we are guaranteed the existence of such a non-repetitive coloring.

Now we see that using just a slightly higher number of colors suffices to make Theorem 3.4 apply.

Theorem 5.2.
There exists an absolute constant c such that for every constant ǫ > 0 there exists a Monte Carlo algorithm that, given a graph H with maximum degree ∆, produces a non-repetitive coloring using at most c∆^{2+ǫ} colors. The failure probability of the algorithm is an arbitrarily small inverse polynomial in the size of H.

Proof. We apply the LLL using the same random experiment and bad events as in Theorem 5.1, but with C′ = C^{1+ǫ′} colors, chosen such that C′ ≤ c∆^{2+ǫ}. Using the same settings for x_A gives an exponential ǫ′-slack in the LLL-conditions, since the probability of a bad event of type i is now at most C′^{−i} = (C^{−i})^{1+ǫ′}. Recall Theorem 3.4. Clearly, log 1/δ = O(n), and so the last thing to check in order to apply Theorem 3.4 is that, for any inverse polynomial p, the bad events with probability at least p are efficiently verifiable. Here these events consist of paths shorter than a certain length (of the form O((1/ǫ) log n/log ∆), where n is the number of vertices), and Theorem 3.4 guarantees that there are only polynomially many of these. Using breadth-first search to go through these paths and checking each of them for non-repetitiveness is efficient, and thus Theorem 3.4 directly applies.

In this section, we briefly sketch another application of our method, namely the construction of Ramsey-type graphs. The Ramsey number R(K_s, K_t) is the smallest number ℓ such that for any n ≥ ℓ and any red-blue coloring of the edges of K_n, there either exists a K_s with all red edges or a K_t with all blue edges. Here, K_a for any integer a denotes a clique of size a, as usual. The fact that these numbers are finite for all s, t is a special case of Ramsey's well-known theorem (see e.g. [43]). In one of the first applications of probabilistic methods in combinatorics, Erdős showed the lower bound R(K_m, K_m) = Ω(m2^{m/2}) [25].
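Erdős's bound rests on a union-bound computation that is easy to reproduce: in a uniformly random red-blue coloring of K_n, the expected number of monochromatic copies of K_m is C(n, m) · 2^{1−C(m,2)}, and whenever this is below 1 some coloring avoids them all, so R(K_m, K_m) > n. A quick numeric check of the (slightly weaker) threshold n = 2^{m/2}; the function name is ours:

```python
from math import comb

def expected_mono_cliques(n, m):
    # E[# monochromatic K_m] in a uniform 2-coloring of the edges of K_n:
    # each of the C(n, m) vertex sets is monochromatic with prob. 2^(1 - C(m, 2)).
    return comb(n, m) * 2 ** (1 - comb(m, 2))

# For m = 20 and n = 2^(m/2) = 1024 the expectation is tiny, so some
# coloring of K_1024 has no monochromatic K_20, i.e. R(K_20, K_20) > 1024.
assert expected_mono_cliques(2 ** 10, 20) < 1
```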
Since then, obtaining lower bounds on R(K_s, K_t) and constructing Ramsey graphs that simultaneously avoid "large" cliques as well as "large" independent sets has attracted much attention [2, 9, 31, 7]. The case of fixed s is the main example for off-diagonal Ramsey numbers. Alon and Pudlák gave an explicit deterministic construction for off-diagonal Ramsey graphs in [9]; they showed constructively that, for some ǫ > 0, R(K_s, K_t) ≥ t^{ǫ√(log s/log log s)}. The best known bound for R(K_s, K_t) can be obtained using the LLL: R(K_s, K_t) = Ω((t/log t)^{(s+1)/2}) [7]. Krivelevich gave a Monte Carlo algorithm matching this bound through large deviation inequalities [31]. In addition, Krivelevich considered related Ramsey-type problems; for example, he showed that there exists a K_4-free graph on n vertices in which every set of vertices containing no K_3 has size o(n^{2/3} log^{1/3} n). The problem of finding constructions of Ramsey-type graphs matching the best known bounds is of great interest and may have algorithmic applications as well.

Using our method, we can achieve the best known bound for off-diagonal Ramsey numbers, that is, for fixed s, by directly making the LLL-based proof of [7] constructive. More importantly, we can provide randomized (Monte Carlo) constructions of graphs on n vertices of the form: "there is no subgraph U in any set of s vertices and no subgraph W in any set of t vertices", where t can be large, typically n^{Θ(1)}; the existence of such graphs can be proved using the LLL (often using appropriate random graphs G(n, p)). When U = K_s, W = K_t and s is fixed, we get the off-diagonal Ramsey number. We refer to these as general Ramsey-type graphs. This is a direct generalization of related Ramsey-type problems considered by Krivelevich [31]. When U and W are some special subgraphs, few results are known. As mentioned earlier, Krivelevich considered the case where U = K_4, W = K_3, s = 4, and t = o(n^{2/3} log^{1/3} n).
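Because s is fixed, the bad events living on s-sized vertex sets can be enumerated and verified by brute force in O(n^s) time, which is what makes such verification steps efficient. A minimal sketch for the case U = K_s; the adjacency-set representation is our assumption:

```python
from itertools import combinations

def has_clique_on_s_vertices(adj, s):
    # adj: dict vertex -> set of neighbors (undirected graph).
    # Enumerate all s-sets and test whether one induces a complete graph K_s.
    return any(all(v in adj[u] for u, v in combinations(S, 2))
               for S in combinations(sorted(adj), s))

# A 4-cycle contains no triangle; adding the chord 0-2 creates one.
c4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
assert not has_clique_on_s_vertices(c4, 3)
c4[0].add(2); c4[2].add(0)
assert has_clique_on_s_vertices(c4, 3)
```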
In addition, he also showed constructions for arbitrary U, but with W = K_t [31]. Alon and Krivelevich in [7] and Krivelevich in [31] considered a Ramsey-type bound R′(K_s, K_r(t)): the smallest number ℓ such that for any n ≥ ℓ, any graph on n vertices either contains a K_s or there exists a set of t vertices containing no K_r. When r = 2, R(K_s, K_t) = R′(K_s, K_2(t)). However, to the best of our knowledge, no general algorithmic result avoiding any subgraphs U and W on any sets of s and t vertices, respectively, is known to date.

Briefly, the idea is as follows. Suppose, as in the typical existence proofs for such graphs, we are able to show using the (asymmetric) LLL that, for a suitable p = p(n), the random graph with n vertices and independent edge-probability p satisfies all the required properties with positive probability. Theorem 3.4 will typically immediately apply if we allow an exponential ǫ-slack. When s is fixed, another related approach is to apply Theorem 3.3 and to verify only the events that correspond to s-sized subgraphs; since s is fixed, these can be enumerated and verified efficiently. Note that, as pointed out in [10], the LLL may be much more significant in improving the bounds with fixed s, and this is generally the case of interest when applying LLL-based arguments.

Given a graph G, an acyclic edge coloring is a proper edge coloring of G in which each cycle receives more than 2 colors. In a proper edge coloring, no two incident edges receive the same color; in addition, here we require that no cycle receives only 2 colors. The goal is to use a minimum number of colors (known as the acyclic chromatic number a(G)) and obtain an acyclic edge coloring. The concept of a(G) was introduced back in 1973 [28] and has been studied by a series of researchers over the years [8, 37, 11, 28, 42]. In all these works, the asymmetric LLL is applied to achieve the best non-constructive bounds.
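The definition just given translates directly into a polynomial-time checker: verify that the coloring is proper, and then verify that for every pair of colors the subgraph formed by edges of those two colors is a forest (a bichromatic cycle is exactly a cycle inside such a subgraph). A sketch with our own edge-list representation:

```python
from itertools import combinations

def is_acyclic_edge_coloring(edges, color):
    # edges: list of (u, v) pairs; color: dict mapping each edge to a color id.
    # 1) Properness: no vertex sees the same color on two incident edges.
    seen = set()
    for e in edges:
        for w in e:
            if (w, color[e]) in seen:
                return False
            seen.add((w, color[e]))
    # 2) Acyclicity: every two-color subgraph must be a forest (union-find test).
    for c1, c2 in combinations(set(color.values()), 2):
        parent = {}
        def find(x):
            parent.setdefault(x, x)
            while parent[x] != x:
                parent[x] = parent[parent[x]]  # path halving
                x = parent[x]
            return x
        for (u, v) in edges:
            if color[(u, v)] in (c1, c2):
                ru, rv = find(u), find(v)
                if ru == rv:
                    return False  # bichromatic cycle found
                parent[ru] = rv
    return True

# A triangle with 3 distinct colors is acyclic; a 4-cycle colored 0,1,0,1 is not.
assert is_acyclic_edge_coloring([(0, 1), (1, 2), (0, 2)],
                                {(0, 1): 0, (1, 2): 1, (0, 2): 2})
assert not is_acyclic_edge_coloring([(0, 1), (1, 2), (2, 3), (3, 0)],
                                    {(0, 1): 0, (1, 2): 1, (2, 3): 0, (3, 0): 1})
```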
Thus an algorithmic version of the Local Lemma suggests itself as the first choice for obtaining an acyclic edge coloring. Alon, McDiarmid and Reed [8] showed that a(G) ≤ 64∆; Molloy and Reed [37] improved this bound to 16∆. The proof proceeds as follows: randomly color each edge from a pool of colors {1, 2, . . . , C}. They define a series of bad events, where a Type 1 bad event corresponds to two incident edges e, f receiving the same color and
a Type k bad event (k ≥ 2) corresponds to a cycle of length 2k receiving only 2 colors. (A cycle of odd length automatically receives at least 3 colors if the coloring is proper.) Note that the number of Type k events for non-constant k is super-polynomial in the number of edges of G. The probability of a Type 1 event is 1/C and the probability of a Type k event is 1/C^{2k−2}. Let ∆ be the maximum degree of G. It is now an easy exercise to verify that each Type k event depends on at most 4k∆ Type 1 events and at most 2k∆^{2l−2} Type l events. With these dependencies, setting C = 16∆, with x_e = 2/C for each Type 1 event and x_k = (2/C)^{2k−2} for each cycle of length 2k, satisfies the asymmetric LLL condition 1.1. We can turn this proof into an algorithm using 16∆ colors as a direct corollary of Theorem 3.1.

Theorem 7.1.
There is a randomized algorithm that produces a valid acyclic edge coloring of any graph with n edges and maximum degree ∆ in expected polynomial time using 16∆ colors.

Proof. Recall from Section 1.3 that δ needs to be a lower bound on the probability of any bad event; the smallest such probability here is 1/C^{2k−2} for an event of Type k, with 2k at most n. This gives that log 1/δ is at most O(n log ∆). Further, it is easy to check for a violated bad event in O(∆^2 n) time: consider the subgraph on every pair of colors and check whether it contains a cycle. Therefore Theorem 3.1 directly applies (we can set ǫ = 0 in it), and we get that the expected running time is O(∆^2 n) · n · O(n log ∆) = O(n^3 ∆^2 log ∆). Note that this is far from tight. We can, for example, exploit the fact that there is already an ǫ-slack in the analysis to get a smaller number of resamplings from Theorem 3.1.

Whereas this gives an efficient way to obtain an acyclic edge coloring using 16∆ colors, thus matching the bound known non-constructively so far, the conjectured bound for a(G) is ∆ + 2. Alon, Sudakov and Zaks showed that the conjecture is indeed true for graphs having girth Ω(∆ log ∆) [11]. Their proof can be made constructive using Beck's technique [17] to obtain an acyclic edge coloring using ∆ + 2 colors, albeit for graphs with girth significantly larger than Θ(∆ log ∆). We bridge this gap by providing constructions that achieve the same girth bound as in [11], yet obtain an acyclic edge coloring with only ∆ + 2 colors.

The proof of Alon, Sudakov and Zaks again relies on the asymmetric LLL, but their procedure for random coloring is different from [8, 37]. They first perform a proper coloring of the edges of G using ∆ + 1 colors [48]. Next, each edge is independently switched to the color ∆ + 2 with a small constant probability. Type 1 events correspond to the case where two incident edges e, f are both colored with the (∆ + 2)th color. Type 2 events correspond to the case where no edge of a previously bichromatic cycle switches its color to the (∆ + 2)th color.
Type 3 events correspond to the case where a cycle, half of whose edges (every other edge) received the same color after the first step, receives the (∆ + 2)th color on the remaining half of its edges, resulting in a bichromatic cycle. It suffices to avoid these three types of events to ensure an acyclic edge coloring. It has been shown in [11] that by setting the values of the x variables appropriately for events of Types 1, 2, and 3 (with cycles of length 2k), the conditions of the asymmetric LLL (Theorem 1.1) are satisfied. Converting this non-constructive proof into an algorithm using our method is an easy exercise. We state the theorem below; its proof is similar to that of Theorem 7.1 above and is left as an exercise.

Theorem 7.2.
There is a randomized algorithm that produces a valid acyclic edge coloring in expected polynomial time using ∆ + 2 colors for graphs having girth
Ω(∆ log ∆).

Further non-asymptotic results are known for graphs with sufficiently large girth. Muthu, Narayanan and Subramanian showed a(G) < 6∆ for graphs with girth at least 9 and a(G) < 4.52∆ for graphs with girth at least 220 [42]. Their proofs can also be made constructive using essentially the same argument as in the proof of Theorem 7.1.
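Making such proofs constructive follows the Moser-Tardos pattern used throughout this section: sample all variables independently, and while some bad event holds, resample exactly the variables that event depends on. The following self-contained toy sketch runs the loop for the Type 1 (properness) events only; the bichromatic-cycle events would be handled by the same loop with an additional violation check. The helper names and the seeded generator are our choices:

```python
import random
from itertools import combinations

def moser_tardos_proper(edges, num_colors, seed=0):
    # Variables: one color per edge. Bad events: two incident edges share a color.
    rng = random.Random(seed)
    color = {e: rng.randrange(num_colors) for e in edges}
    def violated():
        # Return some violated bad event (a pair of incident, same-colored
        # edges), or None if the coloring is proper.
        for e, f in combinations(edges, 2):
            if set(e) & set(f) and color[e] == color[f]:
                return e, f
        return None
    while (bad := violated()) is not None:
        for e in bad:  # resample only the variables the event depends on
            color[e] = rng.randrange(num_colors)
    return color

# K_4 has maximum degree 3, so 16*3 = 48 colors are far more than enough.
edges = list(combinations(range(4), 2))
col = moser_tardos_proper(edges, 48)
assert all(col[e] != col[f]
           for e, f in combinations(edges, 2) if set(e) & set(f))
```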
This section sketches another application of the properties of the conditional LLL-distribution introduced in Section 2, used in a slightly different way. While all results presented so far rely on a union bound over events in the LLL-distribution, here we use linearity of expectation for further probabilistic analysis of events in the LLL-distribution. This already leads to new non-constructive results. As in the other proofs involving the LLL-distribution in this paper, the resulting upper bound can be made constructive using Theorem 2.2. Considering that the LLL-distribution approximately preserves other quantities, such as higher moments, we expect that there is much more room to use more sophisticated probabilistic tools, like concentration bounds, to give both new non-constructive and constructive existence proofs of discrete structures with additional strong properties.

The setting we wish to concentrate on here is that of a given set of bad events from which not necessarily all, but as many as possible, events are to be avoided. The exemplifying application is the well-known MAX-k-SAT problem, which, in contrast to k-SAT, asks not for a satisfying assignment of a k-CNF formula but for an assignment that violates as few clauses as possible. Given a k-CNF formula with m clauses, a random assignment to its variables violates each clause with probability 2^{−k}, and thus, using linearity of expectation, it is easy to find an assignment that violates at most m2^{−k} clauses. If, on the other hand, each clause shares variables with at most 2^k/e − 1 other clauses, then the LLL (made constructive by the Moser-Tardos algorithm) guarantees an assignment that satisfies all clauses. What can be done when the number of clauses sharing variables with a clause is larger than 2^k/e?

Lemma 8.1 shows that a better assignment can be constructed if it is possible to find a sparsely connected sub-formula that satisfies the LLL-condition.

Lemma 8.1.
Suppose F is a k-CNF formula in which there exists a set C of core clauses with the property that: (i) every clause in C shares variables with at most d ≤ 2^k/e − 1 clauses in C, and (ii) every clause outside C shares variables with at most γ(2^k/e − 1) many clauses in C, for some γ ≥ 1. Let n and m denote the total number of variables and clauses in F, respectively. Then, for any θ ≥ 1/poly(n, m), there is a randomized poly(n, m)-time algorithm that produces, with high probability, an assignment in which all clauses in C are satisfied and at most a (1 + θ)2^{−k}e^γ fraction of the clauses outside C are violated. (If we are content with success probability 1 − ρ − n^{−c} for some constant c, then there is also a randomized algorithm that runs in time poly(n, |C|), satisfies all clauses in C, and violates at most a (1/ρ) · 2^{−k}e^γ fraction of the clauses outside C. This can be useful if |C| ≪ m.)

Proof. Briefly, the idea is as follows. Suppose we make the obvious random assignment to the variables: each is set to "True" or "False" uniformly at random and independently. For any clause C_i, let A_i be the bad event that it is violated in such an assignment. It is well known that we can take x(A_i) = e/2^k for all C_i ∈ C and avoid all of these events with positive probability; this can be made constructive using MT. Suppose we run the MT algorithm for up to n^c times its expected number of resamplings. By Markov's inequality, the probability of MT not terminating by then is at most n^{−c}. Furthermore, the probability that, at the end of this process, some clause C_i outside C is violated can be bounded (using part (ii) of Theorem 2.2) by the following:

2^{−k} · (1 − e/2^k)^{−γ(2^k/e − 1)} ≤ e^γ · 2^{−k}.

Thus, the expected fraction of clauses outside C that are violated in the end is at most 2^{−k}e^γ.
Markov's inequality and a union bound (for a sufficiently large choice of c) complete the proof.

Along these lines, we aim to develop a general result that can be applied in cases where the number of dependencies is (slightly) beyond the LLL threshold. For this, suppose we have a system of independent random variables P = {P_1, P_2, . . . , P_n} and bad events A = {A_1, A_2, . . . , A_m}, with dependency graph G = G_A as in the introduction. Let us consider the symmetric case, in which Pr[A_i] ≤ p for each i. Again, only two types of constructive results are known in general in terms of allowing only a "small" number of the bad events to happen. It is easy to have only about mp of the A_i happen, without any assumptions about G, just by using linearity of expectation. On the other hand, if the maximum degree of G is at most 1/(ep) − 1, the conditions of the symmetric LLL and the algorithm of [40] guarantee that we can efficiently ensure that none of the A_i happen. Interpolating between these two extremes, Theorem 8.3 characterizes the fraction λ of events that can be avoided if the maximum degree of G exceeds the LLL threshold 1/(ep) − 1 by a factor of α > 1. To the best of our knowledge, virtually nothing was known (even non-constructively) in this setting. Theorem 8.3 is obtained by using the probabilistic method to construct a sparsely connected core that satisfies the LLL-conditions with a sufficiently large gap. Using linearity of expectation in the analysis of the LLL-distribution with respect to this core, the existence of a good assignment can be proven:
Definition 8.2.
For any α > 0, let λ(α) be the smallest number satisfying the following: For any setting of the standard form "variables P and bad events A" in which (a) the probability of any event A ∈ A is at most p = o(1), and (b) the maximum degree in the variable-sharing (dependency) graph is at most d = α(1/(ep) − 1), there exists an assignment to the variables in P that violates at most (1 + o(1)) λ(α) · mp events.

Theorem 8.3.
The fraction λ(α) is bounded as follows:
• ∀ α ≤ 1: λ(α) = 0;
• ∀ 1 < α < e: λ(α) ≤ e ln(α)/α < 1;
• ∀ α > e: λ(α) = 1.

Proof. If α ≤ 1, the LLL applies and all events can be avoided. If α > e, consider the m = 1/p events given by a single random variable X which is uniformly distributed in {1, 2, . . . , 1/p}; the i-th event holds iff X = i. We have d = 1/p − 1 ≤ α(1/(ep) − 1), exactly one of the events always holds, and mp = 1, so λ(α) = 1 if α > e.

For our main case, where the constant α satisfies 1 < α < e, we employ the probabilistic method to first determine a core subset of the bad events, and then apply Theorem 2.2. We give a proof sketch here. Since p = o(1), we have d = ω(1). We will pick a suitable ǫ = o(1) and an appropriate constant β ≥ 1; d can be assumed sufficiently larger than β since d = ω(1). Choose a random subset A′ of the events A_i by choosing each event independently with probability (1 − ǫ)/(αβ), and then eliminate from A all events that have more than d/(αβ) neighbors in A′. The Chernoff bound shows that with probability 1 − exp(−Ω(dǫ^2/β)) (which is 1 − o(1) for a suitable choice of ǫ and β), the core A′ contains at least a (1 − ǫ)/(αβ) fraction of the events, and at most an exp(−Ω(dǫ^2/β)) = o(p) fraction of the events get eliminated from A. Therefore, there is a core A′ of size at least (1 − o(1))m/(αβ) such that all events that are not eliminated (a (1 − o(p)) fraction of the events) have at most d/(αβ) neighbors in A′. If we take x_A ∼ γα/d for all A ∈ A′ for a suitable γ ∈ [0, 1], then the core A′ satisfies the LLL-conditions. This is the case if γ satisfies α/(ed) < (γα/d) · (1 − γα/d)^{d/(αβ)}; i.e.,

1/e < γ e^{−γ/β}     (9)

suffices for d large enough.

Now we can apply Theorem 2.1 to obtain bounds on the probabilities of events in the conditional LLL-distribution D that avoids all events in A′. It implies that a random assignment that avoids all core events in A′ makes a non-eliminated event A ∈ A \ A′ true with probability at most

Pr_D[A] ≤ Pr[A] · (1 − γα/d)^{−d/(αβ)} ∼ e^{γ/β} · p.     (10)

By linearity of expectation applied to all "non-core" events A ∈ A \ A′ and using (10), the expected total number of events A_i that happen in such an assignment is at most

(1 + o(1)) · mp · (1 − 1/(αβ)) · e^{γ/β},     (11)

assuming that (9) holds and that ǫ = o(1) is chosen suitably. From (9), we can take 1/β = (1 − o(1)) · (1 + ln(γ))/γ. Plugging this into (11), we see that the optimal choice of γ is 1/α. (Any choice of ǫ = o(1) that satisfies exp(−Ω(dǫ^2/β)) = o(p) suffices for this argument.) Substituting these choices into (11) yields the theorem.

Theorem 8.4.
Theorem 8.3 can be made constructive for any α > 0 and any efficiently verifiable A (where the verification is allowed to take poly(n, m) time) that satisfies the conditions from Definition 8.2. That is, there is a poly(n, m)-time randomized algorithm to set the values of the variables in P such that the expected number of events A_i that hold is at most (1 + o(1)) λ(α) · mp.

Proof. If α ≤ 1, the MT algorithm avoids all events; for α > e, a random assignment suffices. For 1 < α < e, we make the proof of Theorem 8.3 constructive. For this, it suffices to observe that the success probability of the random experiment that creates the core can be made arbitrarily high by choosing ǫ and β accordingly. This makes the probabilistic method used there directly constructive. Finally, we again use our main Theorem 2.2 from Section 2, which states that the MT algorithm can be used to efficiently sample from the LLL-distribution used in the proof of Theorem 8.3. We simply output the assignment produced by the MT algorithm in time poly(n, m), and the proof of Theorem 8.3 guarantees that the expected number of violated bad events in this assignment is at most (1 + o(1)) λ(α) · mp, as desired.

Remark.
Interestingly, we can use our LLL framework instead to construct the core in the proof of Theorem 8.3. This gives a larger core than what is obtained by uniform random core selection, and thus slightly sharper results. Briefly, the idea is as follows. For parameters α, β, and γ similar to those in that proof, we start with essentially the same random process for constructing the core: for a sufficiently large constant c (even something slightly smaller than 3 will suffice), we set ǫ = c√((ln d)/d), and place each event A_i in the core independently with probability θ = (1 − ǫ)/(αβ). Letting X_i denote the (random) number of neighbors that A_i has in the core, note that the expected value of X_i is at most dθ = (1 − ǫ)d/(αβ). Now consider the system of bad events C_i (one corresponding to each original event A_i): C_i is the event that X_i > d/(αβ). Note that each C_i depends on at most d^2 others; a Chernoff bound shows that, for c large enough, Pr[C_i] ≤ 1/(ed^3). Thus, the LLL shows that there exists a choice of the core that avoids all the C_i, with a common value x = x(C_i) for all i such that x = Θ(1/d^3); run the Moser-Tardos algorithm on the system of bad events C_i to efficiently get a core that avoids all of the C_i, and let N_i be the indicator random variable for A_i not belonging to the core at the end of this run. We now apply Theorem 2.1 to upper-bound the expected size Σ_i Pr[N_i = 1] of the non-core. Since N_i depends on at most 1 + d + d(d − 1) = d^2 + 1 events C_i, Theorem 2.1 gives

Pr[N_i = 1] ≤ 1 − θ(1 − x)^{d^2+1} = (1 + O(1/d)) · (1 − θ).

Thus the expected size of the non-core is at most (1 + O(1/d)) · m · (1 − θ) after the above run of Moser-Tardos, similar to what the alteration argument in the proof of Theorem 8.3 gives. We can now proceed (with one more run of Moser-Tardos) as in the proof of Theorem 8.3.

Acknowledgments:
We thank Nikhil Bansal for helpful clarifications about the Santa Claus problem. Thefirst author thanks Michel Goemans and Ankur Moitra for discussing the Santa Claus problem with himin an early stage of this work. Our thanks are due to Bill Gasarch and Mohammad Salavatipour for theirhelpful comments. We also acknowledge the referees for their valuable suggestions.
References [1] N. Alon. A parallel algorithmic version of the Local Lemma.
Random Structures & Algorithms , 2:367–378, 1991.[2] N. Alon. Explicit Ramsey graphs and orthonormal labelings.
The Electronic Journal of Combinatorics ,1:12–8, 1994.[3] N. Alon, L. Babai, and A. Itai. A fast and simple randomized parallel algorithm for the maximalindependent set problem.
Journal of Algorithms , 7:567–583, 1986.[4] N. Alon and J. Grytczuk. Breaking the rhythm on graphs.
Discrete Mathematics , 308(8):1375–1380,2008.[5] N. Alon, J. Grytczuk, M. Haluszczak, and O. Riordan. Nonrepetitive colorings of graphs.
RandomStructures & Algorithms , 21(3-4):336–346, 2002.[6] N. Alon, G. Gutin, E. J. Kim, S. Szeider, and A. Yeo. Solving MAX-r-SAT above a tight lower bound.In
SODA '10: Proceedings of the 21st Annual ACM-SIAM Symposium on Discrete Algorithms, 2010.[7] N. Alon and M. Krivelevich. Constructive bounds for a Ramsey-type problem. In
Graphs and Combi-natorics 13 , pages 217–225, 1997.[8] N. Alon, C. McDiarmid, and B. Reed. Acyclic coloring of graphs.
Random Structures & Algorithms, 2(3):277–288, 1991.[9] N. Alon and P. Pudlak. Constructive lower bounds for off-diagonal Ramsey numbers.
Israel Journal ofMathematics , 122:243–251, 1999.[10] N. Alon and J. H. Spencer.
The Probabilistic Method, Third Edition. John Wiley & Sons, Inc., 2008.[11] N. Alon, B. Sudakov, and A. Zaks. Acyclic edge colorings of graphs.
Journal of Graph Theory , 37(3):157–167, 2001.[12] M. Andrews. Approximation algorithms for the edge-disjoint paths problem via Raecke decompositions.In
FOCS , pages 277–286, 2010.[13] A. Asadpour, U. Feige, and A. Saberi. Santa claus meets hypergraph matchings. In
APPROX ’08 /RANDOM ’08: Proceedings of the 11th international workshop, APPROX 2008, and 12th internationalworkshop, RANDOM 2008 on Approximation, Randomization and Combinatorial Optimization , pages10–20, 2008.[14] A. Asadpour and A. Saberi. An approximation algorithm for max-min fair allocation of indivisiblegoods. In
STOC ’07: Proceedings of the 39th annual ACM Symposium on Theory of Computing , pages114–121, 2007.[15] N. Bansal and M. Sviridenko. The Santa Claus problem. In
STOC ’06: Proceedings of the 38th annualACM Symposium on Theory of Computing , pages 31–40, 2006.[16] M. Bateni, M. Charikar, and V. Guruswami. Maxmin allocation via degree lower-bounded arborescences.In
STOC ’09: Proceedings of the 41st annual ACM Symposium on Theory of Computing , pages 543–552,2009.[17] J. Beck. An algorithmic approach to the Lov´asz Local Lemma.
Random Structures & Algorithms ,2(4):343–365, 1991.[18] I. Bez´akov´a and V. Dani. Allocating indivisible goods.
SIGecom Exch. , 5(3):11–18, 2005.[19] B. Bresar, J. Grytczuk, S. Klavzar, S. Niwczyk, and I. Peterin. Nonrepetitive colorings of trees.
DiscreteMathematics , 307(2):163–172, 2007.[20] D. Chakrabarty, J. Chuzhoy, and S. Khanna. On allocating goods to maximize fairness. In
FOCS ’09:50th Annual IEEE Symposium on Foundations of Computer Science , 2009.[21] K. Chandrasekaran, N. Goyal, and B. Haeupler. Deterministic Algorithms for the Lov´asz Local Lemma.In
Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms , SODA ’10,pages 992–1004, 2010.[22] J. D. Currie. Pattern avoidance: themes and variations.
Theor. Comput. Sci. , 339(1):7–18, 2005.[23] A. Czumaj and C. Scheideler. Coloring non-uniform hypergraphs: A new algorithmic approach to thegeneral Lov´asz local lemma. In
SODA ’00: Proceedings of the 11th Annual ACM-SIAM Symposium onDiscrete Algorithms , pages 30–39, 2000.[24] P. Erd˝os and L. Lov´asz. Problems and results on 3-chromatic hypergraphs and some related questions.In
Infinite and Finite Sets , volume 11 of
Colloq. Math. Soc. J. Bolyai , pages 609–627. North-Holland,1975.[25] P. Erd˝os. Some remarks on the theory of graphs.
Bulletin of the American Mathematical Society ,53:292–294, 1947.[26] U. Feige. On allocations that maximize fairness. In
SODA ’08: Proceedings of the 19th annual ACM-SIAM Symposium on Discrete Algorithms , pages 287–293, 2008.[27] U. Feige. On estimation algorithms vs approximation algorithms. In R. Hariharan, M. Mukund, andV. Vinay, editors,
FSTTCS , volume 2 of
LIPIcs , pages 357–363. Schloss Dagstuhl - Leibniz-Zentrumfuer Informatik, 2008.[28] B. Gr¨unbaum. Acyclic colorings of planar graphs.
Israel Journal of Mathematics, 14:390–408, 1973.[29] J. Grytczuk. Thue type problems for graphs, points, and numbers.
Discrete Mathematics , 308(19):4419–4429, 2008.[30] P. Haxell. A condition for matchability in hypergraphs.
Graphs and Combinatorics , 11(3):245–248,1995.[31] M. Krivelevich. Bounding Ramsey numbers through large deviation inequalities.
Random Structures &Algorithms , 7(2):145–155, 1995.[32] A. K¨undgen and M. J. Pelsmajer. Nonrepetitive colorings of graphs of bounded tree-width.
DiscreteMathematics , 308(19):4473–4478, 2008.[33] F. T. Leighton, B. M. Maggs, and S. B. Rao. Packet routing and jobshop scheduling in O (congestion+ dilation) steps. Combinatorica , 14:167–186, 1994.[34] J. K. Lenstra, D. B. Shmoys, and E. Tardos. Approximation algorithms for scheduling unrelated parallelmachines.
Mathematical Programming , 46:259–271, 1990.[35] M. Luby. A simple parallel algorithm for the maximal independent set problem.
SIAM Journal ofComputing , 15(4):1036–1053, 1986.[36] D. Marx and M. Schaefer. The complexity of nonrepetitive coloring.
Discrete Appl. Math. , 157(1):13–18,2009.[37] M. Molloy and B. Reed. Further algorithmic aspects of the Local Lemma. In
STOC ’98: Proceedingsof the 30th annual ACM Symposium on Theory of Computing , pages 524–529, 1998.[38] M. Molloy and B. Reed.
Graph Colouring and the Probabilistic Method . Springer-Verlag, 2001.[39] R. Moser. A constructive proof of the Lov´asz Local Lemma. In
STOC ’09: Proceedings of the 41stannual ACM Symposium on Theory of Computing , pages 343–350, 2009.[40] R. Moser and G. Tardos. A constructive proof of the general Lov´asz Local Lemma.
Journal of theACM , 57(2):1–15, 2010.[41] R. A. Moser. Derandomizing the Lov´asz Local Lemma more effectively.
CoRR , abs/0807.2120, 2008.[42] R. Muthu, N. Narayanan, and C. R. Subramanian. Improved bounds on acyclic edge colouring.
Discrete Mathematics, 307(23):3063–3069, 2007.[43] R. L. Graham, B. L. Rothschild, and J. H. Spencer.
Ramsey Theory, Second Edition . Wiley, 1990.[44] B. Saha and A. Srinivasan. A new approximation technique for resource-allocation problems. In
ICS’10: Proceedings of the first annual Symposium on Innovations in Computer Science , pages 342–357,2010.[45] M. Schaefer and C. Umans. Completeness in the polynomial-time hierarchy: A compendium.
SIGACTNews , 33(3):32–49, Sept. 2002.[46] A. Srinivasan. Improved algorithmic versions of the Lov´asz Local Lemma. In
SODA ’08: Proceedingsof the 19th annual ACM-SIAM Symposium on Discrete algorithms , pages 611–620, 2008.[47] A. Thue. ¨Uber unendliche Zeichenreihen.
Norske Vid. Selsk. Skr. I. Mat. Nat. Kl. Christiania, 7:1–22, 1906.[48] V. G. Vizing. On an estimate of the chromatic class of a p-graph (in Russian). Diskret. Analiz, 3:25–30, 1964.