PPZ For More Than Two Truth Values - An Algorithm for Constraint Satisfaction Problems
Dominik Scheder
Theoretical Computer Science, ETH Zürich, CH-8092 Zürich, Switzerland. [email protected]
November 5, 2018
Abstract.
We analyze the so-called ppz algorithm for (d, k)-CSP problems for general values of d (the number of values a variable can take) and k (the number of literals per constraint). To analyze its success probability, we prove a correlation inequality for submodular functions.

Consider the following extremely simple randomized algorithm for k-SAT: Pick a variable uniformly at random and call it x. If the formula F contains the unit clause (x), set x to 1. If it contains (¬x), set it to 0. If it contains neither, set x uniformly at random (and if it contains both unit clauses, give up). This algorithm has been proposed and analyzed by Paturi, Pudlák, and Zane [4] and is called ppz.

The idea behind analyzing its success probability can be illustrated nicely if we assume, for the moment, that F has a unique satisfying assignment α setting all variables to 1. Switching a variable x from 1 to 0 makes the formula unsatisfied. Therefore, there is a clause C_x = (x ∨ ¬y_1 ∨ · · · ∨ ¬y_{k−1}). With probability 1/k, the algorithm picks and sets y_1, . . . , y_{k−1} before picking x. Suppose the y_j have been set correctly (i.e., to 1); then the clause C_x has been reduced to (x), and therefore x is also set correctly. Intuitively, this shows that on average, the algorithm has to guess (1 − 1/k)n variables correctly and can infer the correct values of the remaining n/k variables. This increases the success probability of the algorithm from 2^{−n} (blind guessing) to 2^{−n(1−1/k)}.
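For concreteness, here is a minimal Python sketch of one run of this boolean algorithm. The clause encoding (a clause as a list of signed integers, with +v standing for x_v and −v for ¬x_v) and all identifier names are our own choices for illustration; they are not taken from [4].

    import random

    def ppz_sat(clauses, n, rng=random):
        """One run of boolean ppz. Returns a satisfying assignment
        {var: bool} or None on failure."""
        assignment = {}
        for x in rng.sample(range(1, n + 1), n):      # random variable order
            forced = set()
            for clause in clauses:
                if any(assignment.get(abs(l)) == (l > 0) for l in clause):
                    continue                          # clause already satisfied
                open_lits = [l for l in clause if abs(l) not in assignment]
                if len(open_lits) == 1 and abs(open_lits[0]) == x:
                    forced.add(open_lits[0] > 0)      # unit clause on x
            if len(forced) == 2:
                return None                           # both (x) and (not x): give up
            assignment[x] = forced.pop() if forced else rng.random() < 0.5
        if all(any(assignment[abs(l)] == (l > 0) for l in c) for c in clauses):
            return assignment
        return None

Repeating this roughly 2^{n(1−1/k)} times and returning the first satisfying assignment found turns the success probability into a constant.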
In this paper we generalize the sketched algorithm to general constraint satisfaction problems, short CSPs. These are a generalization of boolean satisfiability to problems involving more than two truth values. A set of n variables x_1, . . . , x_n is given, each of which can take a value from [d] := {1, . . . , d}. Each assignment to the n variables can be represented as an element of [d]^n. A literal is an expression of the form (x_i ≠ c) for some c ∈ [d]. A CSP formula consists of a conjunction (AND) of constraints, where a constraint is a disjunction (OR) of literals. We speak of a (d, k)-CSP formula if each constraint consists of at most k literals. Finally, (d, k)-CSP is the problem of deciding whether a given (d, k)-CSP formula has a satisfying assignment. Note that (2, k)-CSP is the same as k-SAT. Also, (d, k)-CSP is well known to be NP-complete, unless d = 1, k = 1, or d = k = 2. We can manipulate a CSP formula F by permanently substituting a value c for a variable x. This means we remove all satisfied constraints, i.e., those containing a literal (x ≠ c′) for some c′ ≠ c, and from the remaining constraints remove the literal (x ≠ c), if present. We denote the resulting formula by F^{[x↦c]}.

It is obvious how to generalize the algorithm to (d, k)-CSP problems. Again we process the variables in a random order. When picking x, we collect all unit constraints of the form (x ≠ c) and call the value c forbidden. Values in [d] which are not forbidden are called allowed, and we set x to a value that we choose uniformly at random from all allowed values. How can one analyze the success probability? Let us demonstrate this for d = k = 3. Suppose F has exactly one satisfying assignment α = (1, . . . , 1). Since switching x from 1 to 2 or to 3 makes F unsatisfied, we find critical constraints

    (x ≠ 2 ∨ y ≠ 1 ∨ z ≠ 1)
    (x ≠ 3 ∨ u ≠ 1 ∨ v ≠ 1).

If all variables y, z, u, v are picked before x, then there is only one allowed value for x left, namely 1, and with probability 1, the algorithm picks the correct value. If y, z come before x, but at least one of u or v comes after x, then it is possible that the values 1 and 3 are allowed, and the algorithm picks the correct value with probability 1/2. In theory, we could list all possible cases and compute their probability. But here comes the difficulty: The probability of all variables y, z, u, v being picked before x depends on whether these variables are distinct! Maybe y = u, or z = v... For general d and k, we get d − 1 critical constraints:

    C_2 := (x ≠ 2 ∨ y^{(2)}_1 ≠ 1 ∨ · · · ∨ y^{(2)}_{k−1} ≠ 1)
    C_3 := (x ≠ 3 ∨ y^{(3)}_1 ≠ 1 ∨ · · · ∨ y^{(3)}_{k−1} ≠ 1)
    . . .                                                          (1)
    C_d := (x ≠ d ∨ y^{(d)}_1 ≠ 1 ∨ · · · ∨ y^{(d)}_{k−1} ≠ 1).

We are interested in the distribution of the number of allowed values for x. However, the above constraints can intersect in complicated ways, since we have no guarantee that the variables y^{(c)}_j are distinct. Our main technical contribution is a sort of correlation lemma showing that in the worst case, the y^{(c)}_j are indeed distinct, and therefore we can focus on that case, which we are able to analyze.
Previous Work

Feder and Motwani [1] were the first to generalize the ppz algorithm to CSP problems. In their paper, they consider (d, 2)-CSP, i.e., each variable can take one of d values, and every constraint has at most two literals. In this case, the constraints C_2, . . . , C_d cannot form complex patterns. Feder and Motwani show that the worst case happens if (i) the variables y^{(2)}_1, . . . , y^{(d)}_1 are pairwise distinct and (ii) the CSP formula has a unique satisfying assignment. However, their proofs do not directly generalize to higher values of k.

Recently, Li, Li, Liu, and Xu [2] analyzed ppz for general CSP problems (i.e., d, k ≥ 2). Their analysis distinguishes two cases for each variable x: When ppz processes x, then either (i) all d values are allowed, or (ii) at least one value is forbidden. In case (ii), ppz chooses one value uniformly at random from at most d − 1 allowed values. However, the authors ignore the case that two, three, or more values are forbidden and lump it together with case (ii). Therefore, their analysis does not capture the full power of ppz.
Our Contribution

Our contribution is to show that "everything works as expected", i.e., that in the worst case all variables y^{(c)}_j in (1) are distinct and the formula has a unique satisfying assignment. For this case, we can compute (or at least, bound from below) the success probability of the algorithm.

Theorem 1.1. For d, k ≥ 2, define

    G(d, k) := Σ_{j=0}^{d−1} log₂(1 + j) · C(d−1, j) · ∫_0^1 (1 − r^{k−1})^j (r^{k−1})^{d−1−j} dr,

where C(d−1, j) denotes the binomial coefficient. Then there is a randomized algorithm running in polynomial time which, given a satisfiable (d, k)-CSP formula over n variables, returns a satisfying assignment with probability at least 2^{−nG(d,k)}.

The algorithm we analyze in this paper is not novel. It is a straightforward generalization of the ppz algorithm to CSP problems with more than two truth values. However, its analysis is significantly more difficult than for d = 2 (and also more difficult than for large d and k = 2, the case Feder and Motwani [1] investigated).
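As a quick sanity check, for d = 2 and k = 3 the only nonzero term in the sum is j = 1 (since log₂(1 + 0) = 0), and

    G(2, 3) = log₂(2) · C(1, 1) · ∫_0^1 (1 − r^2) dr = 1 − 1/3 = 2/3,

so Theorem 1.1 promises success probability at least 2^{−2n/3}, which matches the boolean ppz bound 2^{−n(1−1/k)} for 3-SAT sketched above.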
Comparison

We compare the success probability of Schöning's random walk algorithm with that of ppz. For ppz, we state the bound given by Li, Li, Liu, and Xu [2] and the bound proved in this paper. All bounds are approximate and ignore polynomial factors.

    (d, k)    Schöning [5]    Li, Li, Liu, and Xu [2]    this paper
    (2, 3)    1.33^−n         1.59^−n                    1.59^−n
    (3, 3)    2^−n            2.62^−n                    2.16^−n
    (5, 4)    3.75^−n         4.73^−n                    3.67^−n
    (6, 4)    4.5^−n          5.73^−n                    4.33^−n
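The last column can be reproduced from Theorem 1.1. The Python sketch below evaluates G(d, k) exactly by expanding (1 − r^{k−1})^j binomially, which turns the integral into a rational sum; this implementation detail and the function names are ours.

    from fractions import Fraction
    from math import comb, log2

    def G(d, k):
        """G(d, k) from Theorem 1.1; the inner integral over r is
        evaluated exactly via the binomial expansion of (1 - r^(k-1))^j."""
        m = k - 1
        total = 0.0
        for j in range(d):
            integral = sum(Fraction((-1) ** i * comb(j, i),
                                    m * (i + d - 1 - j) + 1)
                           for i in range(j + 1))
            total += log2(1 + j) * comb(d - 1, j) * float(integral)
        return total

    for d, k in [(2, 3), (3, 3), (5, 4), (6, 4)]:
        print((d, k), round(2 ** G(d, k), 2))   # base c of the c^(-n) bound

Up to rounding, this reproduces the last column of the table above.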
For small values of d, in particular for the boolean case d = 2, Schöning's random walk algorithm is much faster than ppz, but ppz overtakes Schöning already for moderately large values of d and thus is, to our knowledge, the currently fastest algorithm for (d, k)-CSP.

The Algorithm

The algorithm itself is simple. It processes the variables x_1, . . . , x_n according to some random permutation π. When the algorithm processes the variable x, it collects all unit constraints of the form (x ≠ c) and calls c forbidden. A truth value c that is not forbidden is called allowed. If the formula is satisfiable when the algorithm processes x, there is obviously at least one allowed value. The algorithm chooses uniformly at random an allowed value c and sets x to c, reducing the formula. Then it proceeds to the next variable. For technical reasons, we think of the permutation π as part of the input to the algorithm, and sample π uniformly at random from all n! permutations before calling the algorithm. The algorithm is described formally in Algorithm 1.

Algorithm 1 ppz(F: a (d, k)-CSP formula over variables V := {x_1, . . . , x_n}; π: a permutation of V)
     1: α := the empty assignment
     2: for i = 1, . . . , n do
     3:     x := x_{π(i)}
     4:     S(x, π) := {c ∈ [d] | the unit constraint (x ≠ c) does not occur in F}
     5:     if S(x, π) = ∅ then
     6:         return failure
     7:     end if
     8:     b ← u.a.r. from S(x, π)
     9:     α := α ∪ [x ↦ b]
    10:     F := F^{[x↦b]}
    11: end for
    12: if α satisfies F then return α else return failure end if
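The following Python sketch renders Algorithm 1 in runnable form. The encoding of a constraint as a list of (variable, value) pairs, read as the disjunction of the literals (x_variable ≠ value), and all names are assumptions of ours for illustration.

    import random

    def ppz_csp(constraints, n, d, rng=random):
        """One run of Algorithm 1 on a (d, k)-CSP formula. A constraint is
        a list of (v, c) pairs meaning "x_v != c or ...", with v in
        range(n) and c in 1..d. Returns an assignment (list) or None."""
        alpha = [None] * n
        for x in rng.sample(range(n), n):                 # random permutation pi
            forbidden = set()
            for cons in constraints:
                if any(alpha[v] is not None and alpha[v] != c for v, c in cons):
                    continue                              # constraint satisfied
                open_lits = [(v, c) for v, c in cons if alpha[v] is None]
                if len(open_lits) == 1 and open_lits[0][0] == x:
                    forbidden.add(open_lits[0][1])        # unit constraint on x
            allowed = [c for c in range(1, d + 1) if c not in forbidden]
            if not allowed:                               # S(x, pi) is empty
                return None
            alpha[x] = rng.choice(allowed)                # Line 8: b u.a.r.
        if all(any(alpha[v] != c for v, c in cons) for cons in constraints):
            return alpha
        return None

Repeating this about 2^{nG(d,k)} times amplifies the success probability of Theorem 1.1 to a constant.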
To analyze the success probability of the algorithm, we can assume that F is satisfiable, i.e., the set sat(F) of satisfying assignments is nonempty. This is because if F is unsatisfiable, the algorithm always correctly returns failure. For a fixed satisfying assignment α, we will bound the probability

    Pr[ppz(F, π) returns α],   (2)

where the probability is over the choice of π and over the randomness used by ppz. The overall success probability is given by

    Pr[ppz(F, π) is successful] = Σ_{α ∈ sat(F)} Pr[ppz(F, π) returns α].   (3)

In the next section, we will bound (2) from below. The bound depends on the level of isolatedness of α: If α has many satisfying neighbors, its probability of being returned by ppz decreases. However, the existence of many satisfying assignments will in turn increase the sum in (3). In the end, it turns out that the worst case happens if F has a unique satisfying assignment. Observe that for the ppz algorithm in the boolean case [4], the uniquely satisfiable case is also the worst case, whereas for the improved version ppsz [3], it is not, or at least not known to be.

The Analysis

In this section, fix a satisfying assignment α. For simplicity, assume that α = (1, . . . , 1). What has to happen in order for ppz to return α? For a permutation π and a variable x, let β be the partial assignment obtained by restricting α to the variables that come before x in π, and define

    S(x, π, α) := {c ∈ [d] | (x ≠ c) does not occur as a unit constraint in F^{[β]}}.

In words, we process the variables according to π and set them according to α, but stop before processing x. We check which truth values are not forbidden for x by a unit constraint, and collect these truth values in the set S(x, π, α). Let us give an example.

Example. Let d = 3, k = 2, α = (1, . . . , 1), and

    F = (x ≠ 2 ∨ y ≠ 1) ∧ (x ≠ 3 ∨ z ≠ 1).

For π = (x, y, z), no value is forbidden when processing x, thus S(x, π, α) = {1, 2, 3}. For π′ = (y, x, z), we consider the partial assignment that sets y to 1, obtaining F^{[y↦1]} = (x ≠ 2) ∧ (x ≠ 3 ∨ z ≠ 1), and S(x, π′, α) = {1, 3}. Last, for π′′ = (y, z, x), we set y and z to 1, obtaining F^{[y↦1, z↦1]} = (x ≠ 2) ∧ (x ≠ 3), thus S(x, π′′, α) = {1}. □

Observe that S(x, π, α) is non-empty, since α(x) ∈ S(x, π, α), i.e., the value α assigns to x is always allowed. What has to happen in order for the algorithm to return α? In every step of ppz, the value b selected in Line 8 for the current variable x must be α(x). Assume now that this was the case in each of the first i steps of the algorithm, i.e., the variables x_{π(1)}, . . . , x_{π(i)} have been set to their respective values under α. Let x = x_{π(i+1)} be the variable processed in step i + 1. The set S(x, π, α) coincides with the set S(x, π) of the algorithm, and therefore x is set to α(x) with probability 1/|S(x, π, α)|. Since this holds in every step of the algorithm, we conclude that for a fixed permutation π,

    Pr[ppz(F, π) returns α] = Π_{x ∈ V} 1/|S(x, π, α)|.

For π being chosen uniformly at random, we obtain

    Pr[ppz(F, π) returns α] = E_π[Π_{x ∈ V} 1/|S(x, π, α)|].

The expectation of a product is an uncomfortable term if the factors are not independent. The usual trick in this context is to apply Jensen's inequality, hoping that we do not lose too much.
Lemma 3.1 (Jensen’s Inequality).
Let X be a random variable and f : R → R a convex function. Then E[f(X)] ≥ f(E[X]), provided both expectations exist.

We apply Jensen's inequality with the convex function being f : t ↦ 2^{−t} and the random variable being X = Σ_{x ∈ V} log₂ |S(x, π, α)|. With this notation, f(X) = Π_{x ∈ V} 1/|S(x, π, α)|, the expectation of which we want to bound from below:

    E[Π_{x ∈ V} 1/|S(x, π, α)|] = E[2^{−Σ_{x ∈ V} log₂ |S(x, π, α)|}]
        ≥ 2^{−E[Σ_{x ∈ V} log₂ |S(x, π, α)|]}   (4)
        = 2^{−Σ_{x ∈ V} E[log₂ |S(x, π, α)|]}.

Proposition 3.2.
Pr[ppz(F, π) returns α] ≥ 2^{−Σ_{x ∈ V} E[log₂ |S(x, π, α)|]}.

Example: The boolean case. In the boolean case, the set S(x, π, α) is either {1} or {1, 2}, and thus the logarithm is either 0 or 1. Therefore, the term E[log₂ |S(x, π, α)|] is the probability that the value of x is not determined by a unit clause, and thus has to be guessed.

So far the calculations are exactly as in the boolean ppz. This will not stay that way for long. In the boolean case, there are only two cases: Either the value of x is determined by a unit clause (in which case we call x forced), or it is not. For d ≥ 3, there are more cases: The set of potential values for x can be the full range [d], it can be just the singleton {1}, but it can also be anything in between, and even if the algorithm cannot determine the value of x by looking at unit constraints, it will still be happy if at least, say, d/2 values are forbidden.

Bounding E[log₂ |S(x, π, α)|]

In this section we prove an upper bound on E[log₂ |S(x, π, α)|]. We assume without loss of generality that α = (1, . . . , 1). Consider the d truth assignments α_1, . . . , α_d agreeing with α on the variables V \ {x}: For a value c ∈ [d] we define α_c := α^{[x↦c]}, i.e., we change the value assigned to x to c, but keep all other variables fixed. Clearly, α = α_1. The number of assignments among α_1, . . . , α_d that satisfy F is called the looseness of α at x, denoted by ℓ(α, x). Since α = α_1 satisfies F, the looseness of α at x is at least 1, and since there are d possible values for x, the looseness is at most d. Thus 1 ≤ ℓ(α, x) ≤ d. If α is the unique satisfying assignment, then ℓ(α, x) = 1 for every x. Note that α being unique is sufficient, but not necessary: Suppose α = (1, . . . , 1) and α′ = (2, 2, 1, 1, . . . , 1) are the only two satisfying assignments. Then ℓ(α, x) = ℓ(α′, x) = 1 for every variable x, since α and α′ differ in two coordinates, and changing a single coordinate of one never yields the other.

Why are we considering the looseness ℓ of α at x? Suppose without loss of generality that the assignments α_1, . . . , α_ℓ satisfy F, whereas α_{ℓ+1}, . . . , α_d do not. The set S(x, π, α) is a random object depending on π, but one thing is sure:

    for all c = 1, . . . , ℓ(α, x): c ∈ S(x, π, α).

For ℓ(α, x) < c ≤ d, what is the probability that c ∈ S(x, π, α)? Since α_c does not satisfy F, there must be a constraint in F that is satisfied by α but not by α_c. Since α and α_c disagree on x only, that constraint must be of the following form:

    (x ≠ c ∨ y_1 ≠ 1 ∨ y_2 ≠ 1 ∨ · · · ∨ y_{k−1} ≠ 1)   (5)

for some k − 1 variables y_1, . . . , y_{k−1}. We do not rule out constraints with fewer than k − 1 such variables, nor do we insist on the y_j in (5) being distinct. In any case, if the variables y_1, . . . , y_{k−1} come before x in the permutation π, then c ∉ S(x, π, α): This is because after setting to 1 the variables that come before x, the constraint in (5) has been reduced to (x ≠ c). Note that y_1, . . . , y_{k−1} coming before x is sufficient for c ∉ S(x, π, α), but not necessary, since there could be multiple constraints of the form (5). With probability at least 1/k, all variables y_1, . . . , y_{k−1} come before x, and we conclude:

Proposition 3.3. If α_c does not satisfy F, then Pr[c ∈ S(x, π, α)] ≤ 1 − 1/k.

This proposition is nice, but not yet useful on its own. We can use it to finish the analysis of the running time; however, we will end up with a suboptimal estimate.
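The quantity E[log₂ |S(x, π, α)|] is easy to estimate empirically, which is a useful sanity check on the bounds below. A Python sketch, using the d = 3, k = 2 formula from the example above (encoding and names as in our earlier sketches):

    import random
    from math import log2

    def allowed_set(constraints, alpha, pi, x, d):
        """S(x, pi, alpha): values for x not forbidden by a unit constraint
        once the variables preceding x in pi are set according to alpha."""
        before = set(pi[:pi.index(x)])
        forbidden = set()
        for cons in constraints:
            if any(v in before and alpha[v] != c for v, c in cons):
                continue                       # constraint satisfied by beta
            open_lits = [(v, c) for v, c in cons if v not in before]
            if len(open_lits) == 1 and open_lits[0][0] == x:
                forbidden.add(open_lits[0][1])
        return [c for c in range(1, d + 1) if c not in forbidden]

    # F = (x != 2 or y != 1) and (x != 3 or z != 1), alpha = (1, 1, 1):
    x, y, z = 0, 1, 2
    constraints = [[(x, 2), (y, 1)], [(x, 3), (z, 1)]]
    runs = 100_000
    est = sum(log2(len(allowed_set(constraints, [1, 1, 1],
                                   random.sample(range(3), 3), x, 3)))
              for _ in range(runs)) / runs
    print(est)   # ~ (2*log2(3) + 2*1 + 2*0)/6 = 0.862 over the 6 permutations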
The function t ↦ log₂(t) is concave. We apply Jensen's inequality to conclude that

    E[log₂ |S(x, π, α)|] ≤ log₂(E[|S(x, π, α)|]) = log₂(Σ_{c=1}^{d} Pr[c ∈ S(x, π, α)]).   (6)

We apply what we have learned above: For c = 1, . . . , ℓ(α, x), it always holds that c ∈ S(x, π, α), and for c = ℓ(α, x) + 1, . . . , d, we have computed that Pr[c ∈ S(x, π, α)] ≤ 1 − 1/k. Therefore

    E[log₂ |S(x, π, α)|] ≤ log₂(ℓ(α, x) + (d − ℓ(α, x))(1 − 1/k)).

The unique case. If α is the unique satisfying assignment, then ℓ(α, x) = 1 for every variable x in our CSP formula F, and the above term becomes

    log₂(1 + (d − 1)(1 − 1/k)) = log₂((d(k − 1) + 1)/k).

We plug this into the bound of Proposition 3.2:

    Pr[ppz returns α] ≥ 2^{−Σ_{i=1}^{n} E[log₂ |S(x_i, π, α)|]} ≥ 2^{−n log₂((d(k−1)+1)/k)} = ((d(k − 1) + 1)/k)^{−n}.

The success probability of Schöning's algorithm for (d, k)-CSP problems is (k/(d(k − 1)))^n, i.e., (d(k − 1)/k)^{−n}, and we see that even for the unique case, our analysis of ppz does not yield anything better than Schöning: for d = k = 3, for example, the above bound gives (7/3)^{−n} ≈ 2.33^{−n}, compared to Schöning's 2^{−n}. Discouraged by this failure, we do not continue this suboptimal analysis for the non-unique case.

The main culprit behind the poor performance of our analysis is Jensen's inequality in (6). To improve our analysis, we refrain from applying Jensen's inequality there and instead try to analyze the term E[log₂ |S(x, π, α)|] directly. However, recall that we have used Jensen's inequality before, in (4). Is it safe to apply it there? How can we tell when applying it makes sense and when it definitely does not? To discuss this issue, we restate the two applications of Jensen's inequality:

    E[2^{−Σ_{x ∈ V} log₂ |S(x, π, α)|}] ≥ 2^{−E[Σ_{x ∈ V} log₂ |S(x, π, α)|]}   (7)
    E[log₂ |S(x, π, α)|] ≤ log₂(E[|S(x, π, α)|])   (8)

Formally, Jensen's inequality states that for a random variable X and a convex function f, it holds that

    E[f(X)] ≥ f(E[X]),   (9)

and by multiplying (9) by −1, one obtains the reverse inequality for concave functions. Jensen's inequality is tight if X is very concentrated around its expectation: In the most extreme case, X is a constant, and (9) holds with equality. On the other extreme, suppose X is a random variable taking on the values −m and m, each with probability 1/2, and let f : t ↦ t², which is a convex function.
The left-hand side of (9) evaluates to E[f(X)] = E[X²] = m², whereas the right-hand side evaluates to f(E[X]) = f(0) = 0, and Jensen's inequality is very loose indeed. What random variables are we dealing with in (7) and (8)? These are

    X := Σ_{x ∈ V} log₂ |S(x, π, α)|  and  Y := |S(x, π, α)|,

and the corresponding functions are f : t ↦ 2^{−t}, which is convex, and g : t ↦ log₂ t, which is concave. In both cases, the underlying probability space is the set of all permutations of V, endowed with the uniform distribution. We see that Y is not concentrated at all: Suppose x comes first in π: If our CSP formula F contains no unit constraints, then |S(x, π, α)| = d, i.e., no truth value is forbidden by a unit constraint. On the other hand, if x comes last in π, then |S(x, π, α)| = ℓ(α, x). Either case happens with probability 1/n, which is not very small. Thus, the random variable |S(x, π, α)| does not seem to be very concentrated.

Contrary to Y, the random variable X can be very concentrated; in fact, for certain CSP formulas it is a constant: Suppose d = 2, i.e., the boolean case. Here X simply counts the number of non-forced variables. Consider the 2-CNF formula

    ∧_{i=1}^{n/2} (x_i ∨ y_i) ∧ (x_i ∨ ¬y_i) ∧ (¬x_i ∨ y_i).   (10)

This formula has n variables, and α = (1, . . . , 1) is the unique satisfying assignment. Observe that if x_i comes before y_i in π, then S(x_i, π, α) = {1, 2} and S(y_i, π, α) = {1}. If y_i comes before x_i, then S(x_i, π, α) = {1} and S(y_i, π, α) = {1, 2}. Hence X ≡ n/2.

After this interlude on Jensen's inequality, let us try to bound E[log₂ |S(x, π, α)|] directly. In this context, x is some variable, α is a satisfying assignment, for simplicity α = (1, . . . , 1), and π is a permutation of the variables sampled uniformly at random. Again think of the d truth assignments α_1, . . . , α_d obtained by setting α_c := α^{[x↦c]} for c = 1, . . . , d. Among them, ℓ := ℓ(α, x) satisfy the formula F. We assume without loss of generality that those are α_1, . . . , α_ℓ. Thus, for each ℓ < c ≤ d, there is a constraint C_c satisfied by α but not by α_c. Let us write down these constraints:

    C_{ℓ+1} := (x ≠ ℓ+1 ∨ y^{(ℓ+1)}_1 ≠ 1 ∨ · · · ∨ y^{(ℓ+1)}_{k−1} ≠ 1)
    C_{ℓ+2} := (x ≠ ℓ+2 ∨ y^{(ℓ+2)}_1 ≠ 1 ∨ · · · ∨ y^{(ℓ+2)}_{k−1} ≠ 1)
    . . .                                                          (11)
    C_d := (x ≠ d ∨ y^{(d)}_1 ≠ 1 ∨ · · · ∨ y^{(d)}_{k−1} ≠ 1)

We define binary random variables Y^{(c)}_j for 1 ≤ j ≤ k − 1 and ℓ + 1 ≤ c ≤ d as follows:

    Y^{(c)}_j := 1 if y^{(c)}_j comes after x in the permutation π, and 0 otherwise.

We define Y^{(c)} := Y^{(c)}_1 ∨ · · · ∨ Y^{(c)}_{k−1}. For convenience we also introduce random variables Y^{(1)}, . . . , Y^{(ℓ)} that are constant 1. Finally, we define Y := Σ_{c=1}^{d} Y^{(c)}. Observe that Y^{(c)} = 0 if and only if all variables y^{(c)}_1, . . . , y^{(c)}_{k−1} come before x in the permutation, in which case c ∉ S(x, π, α). Therefore,

    |S(x, π, α)| ≤ Y.   (12)

The variables Y^{(1)}, . . . , Y^{(ℓ)} are constant 1, whereas each of Y^{(ℓ+1)}, . . . , Y^{(d)} is 0 with probability at least 1/k. Since 1 ≤ ℓ ≤ d, the random variable Y can take values from 1 to d. We want to bound

    E[log₂ |S(x, π, α)|] ≤ E[log₂(Y)] = E[log₂(ℓ + Σ_{c=ℓ+1}^{d} Y^{(c)})].   (13)

For this, we would have to bound the probability Pr[Y = j] for j = 1, . . . , d. This is difficult, since the Y^{(c)} are not independent: For example, conditioning on x coming very early in π increases the expectation of each Y^{(c)}, and conditioning on x coming late decreases it. We use a standard trick, also used by Paturi, Pudlák, Saks, and Zane [3], to overcome these dependencies: Instead of viewing π as a permutation of V, we think of it as a function π : V → [0, 1],
where for each x ∈ V, its value π(x) is chosen uniformly at random from [0, 1]. Almost surely, the values π(x) are distinct and therefore give rise to a permutation. The trick is that for x, y, and z being three distinct variables, the events "y comes before x" and "z comes before x" are independent when conditioning on π(x) = r:

    Pr[π(y) < π(x) | π(x) = r] = r
    Pr[π(z) < π(x) | π(x) = r] = r
    Pr[π(y) < π(x) and π(z) < π(x) | π(x) = r] = r².

Compare this to the unconditional probabilities:

    Pr[π(y) < π(x)] = 1/2
    Pr[π(z) < π(x)] = 1/2
    Pr[π(y) < π(x) and π(z) < π(x)] = 1/3.

We want to compute E[Y^{(c)} | π(x) = r]. We know that E[Y^{(c)}_j | π(x) = r] = 1 − r, since Y^{(c)}_j is 1 if and only if the variable y^{(c)}_j comes after x. Since we are dealing with constraints of size at most k, there are, for each ℓ + 1 ≤ c ≤ d, at most k − 1 variables y^{(c)}_1, . . . , y^{(c)}_{k−1}, and the probability that all of them come before x, conditioned on π(x) = r, is at least r^{k−1}. Therefore

    E[Y^{(c)} | π(x) = r] ≤ 1 − r^{k−1}.

Still, a variable y^{(c)}_j might occur in several constraints among C_{ℓ+1}, . . . , C_d, and therefore the Y^{(c)} are not independent. The main technical tool of our analysis is a lemma stating that the worst case is achieved exactly if they in fact are independent, i.e., if all variables y^{(c)}_j for c = ℓ + 1, . . . , d and j = 1, . . . , k − 1 are distinct.
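The conditional independence provided by this reparametrization is easy to see in simulation; the parameters below (r = 0.7, three variables) are arbitrary choices of ours.

    import random

    runs, r = 200_000, 0.7

    # Conditioned on pi(x) = r, "y before x" and "z before x" are
    # independent, each of probability r, so both hold with prob. r*r:
    both = sum(random.random() < r and random.random() < r
               for _ in range(runs))
    print(both / runs)                   # ~ 0.49 = r**2

    # Unconditionally the two events are positively correlated: both hold
    # exactly when pi(x) is the largest of three uniforms, probability 1/3:
    cnt = 0
    for _ in range(runs):
        px, py, pz = random.random(), random.random(), random.random()
        cnt += (py < px) and (pz < px)
    print(cnt / runs)                    # ~ 1/3, not (1/2)*(1/2)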
Lemma 3.4 (Independence is Worst Case). Let r, k, ℓ, and the Y^{(c)} be defined as above. Let Z^{(ℓ+1)}, . . . , Z^{(d)} be independent binary random variables with E[Z^{(c)}] = 1 − r^{k−1}. Then

    E[log₂(ℓ + Σ_{c=ℓ+1}^{d} Y^{(c)}) | π(x) = r] ≤ E[log₂(ℓ + Σ_{c=ℓ+1}^{d} Z^{(c)})].

Before we prove the lemma in the next section, we first finish the analysis of the algorithm. We apply a somewhat peculiar estimate: Let a ≥ 1 and b ≥ 0. Then log₂(a + b) ≤ log₂(a · (b + 1)) = log₂(a) + log₂(b + 1). Applying this with a := ℓ and b := Σ_{c=ℓ+1}^{d} Z^{(c)} and combining it with the lemma and with (13), we obtain

    E[log₂ |S(x, π, α)| | π(x) = r] ≤ log₂(ℓ) + E[log₂(1 + Σ_{c=ℓ+1}^{d} Z^{(c)})].   (14)

This estimate looks wasteful, but consider the case where F has a unique satisfying assignment α: There, ℓ(α, x) = 1 for every variable x, and (14) holds with equality. In addition to Z^{(ℓ+1)}, . . . , Z^{(d)}, we introduce ℓ − 1 further independent binary random variables Z^{(2)}, . . . , Z^{(ℓ)}, each with expectation 1 − r^{k−1}, and define

    g(d, k, r) := E[log₂(1 + Σ_{c=2}^{d} Z^{(c)})].

The only difference between the expectation in (14) and here is that here, we sum over c = 2, . . . , d, whereas in (14) we sum only over c = ℓ + 1, . . . , d; since the Z^{(c)} are nonnegative, this only increases the expectation. We get the following version of (14):

    E[log₂ |S(x, π, α)| | π(x) = r] ≤ log₂(ℓ) + g(d, k, r).   (15)

We want to get rid of the condition π(x) = r. This is done by integrating (15) over r from 0 to 1:

    E[log₂ |S(x, π, α)|] ≤ log₂(ℓ) + ∫_0^1 g(d, k, r) dr =: log₂(ℓ) + G(d, k).   (16)

This G(d, k) is indeed the same G(d, k) as in Theorem 1.1, and below we will do a detailed calculation showing this.
Lemma 3.5 (Lemma 1 in Feder, Motwani [1]). Let F be a satisfiable CSP formula over variable set V. Then

    Σ_{α ∈ sat(F)} Π_{x ∈ V} 1/ℓ(α, x) ≥ 1.   (17)

This lemma is a quantitative version of the intuitive statement that if a set S ⊆ [d]^n is small, then there must be rather isolated points in S. We now put everything together:

    Pr[ppz(F, π) is successful] = Σ_{α ∈ sat(F)} Pr[ppz(F, π) returns α] ≥ Σ_{α ∈ sat(F)} 2^{−Σ_{x ∈ V} E[log₂ |S(x, π, α)|]},

where the inequality follows from (4). Together with (16), we see that

    Σ_{α ∈ sat(F)} 2^{−Σ_{x ∈ V} E[log₂ |S(x, π, α)|]} ≥ Σ_{α ∈ sat(F)} 2^{−Σ_{x ∈ V} (log₂(ℓ(α, x)) + G(d, k))}
        = 2^{−nG(d, k)} Σ_{α ∈ sat(F)} 2^{−Σ_{x ∈ V} log₂(ℓ(α, x))}
        = 2^{−nG(d, k)} Σ_{α ∈ sat(F)} Π_{x ∈ V} 1/ℓ(α, x)
        ≥ 2^{−nG(d, k)},

where the last inequality follows from Lemma 3.5. To prove Theorem 1.1, we evaluate the term G(d, k). Recall that G(d, k) = ∫_0^1 g(d, k, r) dr, where g(d, k, r) = E[log₂(1 + Σ_{c=2}^{d} Z^{(c)})], and Z^{(2)}, . . . , Z^{(d)} are independent binary random variables with expectation 1 − r^{k−1} each. For 0 ≤ j ≤ d − 1, it holds that

    Pr[Σ_{c=2}^{d} Z^{(c)} = j] = C(d−1, j) · (1 − r^{k−1})^j (r^{k−1})^{d−1−j}.   (18)

By the definition of expectation, it holds that

    g(d, k, r) = Σ_{j=0}^{d−1} log₂(1 + j) · Pr[Σ_{c=2}^{d} Z^{(c)} = j].

Combining this with (18) and integrating over r from 0 to 1 yields the expression in Theorem 1.1. This finishes the proof.

A Correlation Inequality for Submodular Functions

The goal of this section is to prove Lemma 3.4. We will prove a more general statement.
Definition 4.1.
A function f : {0, 1}^n → R is called monotonically increasing, or simply monotone, if for all x, y ∈ {0, 1}^n it holds that

    x ≤ y ⇒ f(x) ≤ f(y),   (19)

where x ≤ y is understood pointwise, i.e., x_i ≤ y_i for all 1 ≤ i ≤ n. For example, the functions ∧ and ∨, seen as functions from {0, 1}^n to R, are monotone, whereas the parity function ⊕ is not.

Definition 4.2.
A function f : {0, 1}^n → R is called submodular if for all x, y ∈ {0, 1}^n, it holds that

    f(x) + f(y) ≥ f(x ∧ y) + f(x ∨ y),   (20)

where ∨ and ∧ are understood pointwise, i.e., (x_1, . . . , x_n) ∨ (y_1, . . . , y_n) = (x_1 ∨ y_1, . . . , x_n ∨ y_n).

Example. The OR-function f : (x_1, . . . , x_n) ↦ x_1 ∨ · · · ∨ x_n is monotone and submodular: It is pretty clear that it is monotone, so let us try to show submodularity. There are two cases: First, suppose at least one of x and y is the all-zeros vector 0, say y = 0. Then the left-hand side of (20) evaluates to f(x), and the right-hand side to f(0) + f(x) = f(x). If neither x = 0 nor y = 0, then the left-hand side is 2, and the right-hand side is obviously at most 2.

Example.
The AND-function g : (x_1, . . . , x_n) ↦ x_1 ∧ · · · ∧ x_n is monotone, but not submodular. It is clearly monotone, so let us show that it is not submodular. Consider n = 2. Set x = (0, 1) and y = (1, 0). Then g(x) + g(y) = 0, but g(x ∧ y) + g(x ∨ y) = g(0, 0) + g(1, 1) = 1.
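Both examples can be confirmed by brute force. A small Python checker (ours, for illustration) that tests (20) over all pairs of points:

    from itertools import product

    def is_submodular(f, n):
        """Check f(x) + f(y) >= f(x AND y) + f(x OR y), i.e. (20),
        for all pairs x, y in {0,1}^n."""
        pts = list(product((0, 1), repeat=n))
        return all(
            f(x) + f(y)
            >= f(tuple(a & b for a, b in zip(x, y)))
             + f(tuple(a | b for a, b in zip(x, y)))
            for x in pts for y in pts
        )

    OR = lambda x: int(any(x))
    AND = lambda x: int(all(x))
    print(is_submodular(OR, 3))    # True
    print(is_submodular(AND, 3))   # False, e.g. x = (0,1,1), y = (1,0,1)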
We define the notion of glued restrictions of functions. Let A and B be two arbitrary sets, and let f : A^n → B be a function. We define a new function f′ by "gluing together" two input coordinates of f.

[Fig. 1. A 7-ary function f and a glued restriction g.]
Formally, for 1 ≤ i < j ≤ n, we define the function

    f′ : (a_1, . . . , a_n) ↦ f(a_1, . . . , a_{j−1}, a_i, a_{j+1}, . . . , a_n).

The function f′ can be viewed as a restriction of f to inputs (a_1, . . . , a_n) for which a_i = a_j. Thus, f′ can be seen as a function A^{n−1} → B. We prefer, however, to define it as a function A^n → B that simply ignores the j-th coordinate of its input. We say f′ is obtained from f by a gluing step. A function g : A^n → B is a glued restriction of f if it can be obtained from f by a sequence of gluing steps. See Figure 1 for an intuition.

Consider a function f : {0, 1}^n → R and think of feeding f with random input bits. Formally, let X_1, . . . , X_n be n independent binary random variables, each with expectation p. We are interested in the term E[f(X_1, . . . , X_n)]. In a second scenario, we introduce dependencies between the X_i by gluing some of them together: For example, instead of choosing X_1, . . . , X_n independently, we use the same bit for X_1, X_2, and X_n, thus computing E[f(X_1, X_1, X_3, X_4, . . . , X_{n−1}, X_1)] instead of E[f(X_1, . . . , X_n)]. With the terminology introduced above, we want to compare E[f(X_1, . . . , X_n)] to E[g(X_1, . . . , X_n)], where g is a glued restriction of f. For general functions f, we cannot say anything about how E[f(X_1, . . . , X_n)] compares to E[g(X_1, . . . , X_n)]. However, if f is submodular, we can.

To get an intuition, consider the boolean lattice {0, 1}^n with 0 at the bottom and 1 at the top. In that lattice, x ∧ y is below x and y, and x ∨ y is above them. Thus, in some sense, the points x and y lie between x ∧ y and x ∨ y. See Figure 2 for an illustration. On the left-hand side of (20), we evaluate f at points that lie more to the middle of the lattice, whereas on the right-hand side we evaluate f at points that lie more to the bottom or top of it. The random vector (X_1, . . . , X_n) tends to lie around the pn-th level of the lattice, whereas (X_1, X_1, X_3, . . . , X_{n−1}, X_1) is less concentrated and more often visits the extremes of the lattice.
[Fig. 2. The boolean lattice with four points x, y, x ∧ y, and x ∨ y.]

In the light of (20), we expect that biasing points towards the extremes will decrease E[f]. The following lemma formalizes this intuition.

Lemma 4.3.
Let f : {0, 1}^n → R be a submodular function and g be a glued restriction of it. Let X_1, . . . , X_n be independent binary random variables, each with expectation p. Then E[f(X_1, . . . , X_n)] ≥ E[g(X_1, . . . , X_n)].

Proof. It is easy to see that applying a gluing step to a submodular function results in a submodular function: After all, a gluing step simply means restricting the function to a subset of its domain. Therefore, it suffices to prove the lemma for a function g that has been obtained from f by a single gluing step. Without loss of generality, we can assume that X_{n−1} and X_n have been glued together. We have to show that

    E[f(X_1, . . . , X_n)] ≥ E[f(X_1, . . . , X_{n−1}, X_{n−1})].

It suffices to show this inequality for every fixed (n − 2)-tuple (X_1, . . . , X_{n−2}). Formally, for b_1, . . . , b_{n−2} ∈ {0, 1}, let

    g : (x, y) ↦ f(b_1, . . . , b_{n−2}, x, y).

The function g is also submodular. Let X, Y be two independent binary random variables, each with expectation p. We have to show that E[g(X, Y)] ≥ E[g(X, X)]. This is not difficult:

    E[g(X, Y)] = (1 − p)² · g(0, 0) + p(1 − p) · g(1, 0) + (1 − p)p · g(0, 1) + p² · g(1, 1)
        = (1 − p)² · g(0, 0) + p(1 − p) · (g(1, 0) + g(0, 1)) + p² · g(1, 1)
        ≥ (1 − p)² · g(0, 0) + p(1 − p) · (g(0, 0) + g(1, 1)) + p² · g(1, 1)
        = ((1 − p)² + p(1 − p)) · g(0, 0) + (p(1 − p) + p²) · g(1, 1)
        = (1 − p) · g(0, 0) + p · g(1, 1) = E[g(X, X)],

where the inequality comes from the submodularity of g. ⊓⊔
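Lemma 4.3 is easy to check numerically for small n by computing both expectations exactly. In the Python sketch below (ours), `glue` maps a coordinate to the earlier coordinate whose bit it reuses:

    from itertools import product
    from math import log2

    def expect(f, n, p, glue=None):
        """E[f(X_1,...,X_n)] with independent Bernoulli(p) bits; a
        coordinate j with glue[j] = i reuses the bit of coordinate i."""
        glue = glue or {}
        free = [i for i in range(n) if i not in glue]
        total = 0.0
        for bits in product((0, 1), repeat=len(free)):
            val = dict(zip(free, bits))
            x = tuple(val[glue.get(i, i)] for i in range(n))
            weight = 1.0
            for b in bits:
                weight *= p if b else 1.0 - p
            total += weight * f(x)
        return total

    # A monotone submodular f in the spirit of (21): log2(1 + OR + OR).
    f = lambda x: log2(1 + (x[0] | x[1]) + (x[2] | x[3]))
    print(expect(f, 4, 0.6))               # independent coordinates: ~1.387
    print(expect(f, 4, 0.6, glue={2: 0}))  # x_2 glued to x_0: ~1.371, smaller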
Lemma 4.4. Let I ⊆ R be an interval, let f : {0, 1}^n → I be monotone and submodular, and let h : I → R be non-decreasing and concave. Then h ∘ f : {0, 1}^n → R is also monotone and submodular.

Proof. It is clear that h ∘ f, being the composition of two monotone functions, is again monotone. To show submodularity, consider x, y ∈ {0, 1}^n. Without loss of generality, f(x) ≤ f(y). Using monotonicity, we see that

    f(x ∧ y) ≤ f(x) ≤ f(y) ≤ f(x ∨ y).

Claim. If s ≤ t are in I, and a ≥ b ≥ 0 are such that s − a ∈ I and t + b ∈ I, then h(s) + h(t) ≥ h(s − a) + h(t + b).

See Figure 3 for an illustration.

[Fig. 3. A monotone concave function h and two line segments.]

To prove the claim, compare the line from (s, h(s)) to (t, h(t)) to the line from (s − a, h(s − a)) to (t + b, h(t + b)). The midpoints of those lines have the coordinates

    ((s + t)/2, (h(s) + h(t))/2)  and  ((s − a + t + b)/2, (h(s − a) + h(t + b))/2),

respectively. Since a ≥ b, the first midpoint lies to the right of the second midpoint. Since both lines have positive slope (by monotonicity of h) and the first line lies above the second, we conclude that the first midpoint also lies above the second. Therefore (h(s − a) + h(t + b))/2 ≤ (h(s) + h(t))/2, as claimed.

We apply the above claim with s = f(x), t = f(y), a = f(x) − f(x ∧ y), and b = f(x ∨ y) − f(y). Note that s, t, s − a, t + b ∈ I and a, b ≥ 0. To apply the claim we need that a ≥ b, i.e.,

    f(x) − f(x ∧ y) ≥ f(x ∨ y) − f(y),

which follows from submodularity. The claim implies that h(s) + h(t) ≥ h(s − a) + h(t + b), which with these particular values of s, t, a, and b yields h(f(x)) + h(f(y)) ≥ h(f(x ∧ y)) + h(f(x ∨ y)). ⊓⊔
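The lemma can be spot-checked with the brute-force checker from the earlier sketch; the composed function below is the one that will matter in the proof of Lemma 3.4 (with ℓ = 1, k − 1 = 2, and d − ℓ = 2, our toy parameters).

    from itertools import product
    from math import log2

    def is_submodular(f, n):   # as in the earlier sketch
        pts = list(product((0, 1), repeat=n))
        return all(f(x) + f(y) >= f(tuple(a & b for a, b in zip(x, y)))
                                + f(tuple(a | b for a, b in zip(x, y)))
                   for x in pts for y in pts)

    # h(t) = log2(1 + t) is non-decreasing and concave, and
    # f = OR(x0, x1) + OR(x2, x3) is monotone and submodular,
    # so by Lemma 4.4 the composition should pass the check:
    g = lambda x: log2(1 + (x[0] | x[1]) + (x[2] | x[3]))
    print(is_submodular(g, 4))   # True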
We define ( d − ℓ )( k −
1) random variables Z ( c ) j for1 ≤ j ≤ k − ℓ < c ≤ d . These random variables are all independent andeach has expectation 1 − r . We define the function f : { , } ( d − ℓ )( k − by f ( x ( ℓ +1)1 , . . . , x ( d ) k − ) = log ℓ + d X c = ℓ +1 OR( x ( c )1 ∨ · · · ∨ x ( c ) k − ) ! . (21)This function is clearly monotone. We claim that it is submodular: The OR-function is submodular, and it is easy to check that a sum of submodular func-tions is again submodular. Finally, the function t log ( ℓ + t ) is concave.We apply Lemma 4.4 with the interval I = [0 , ∞ ), the submodular function P dc = ℓ +1 OR( x ( c )1 ∨ · · · ∨ x ( c ) k − ), which has domain I , and the concave function t log ( ℓ + t ). Thus f is submodular and monotone. To prove Lemma 3.4, wehave to show that E " log ℓ + d X c = ℓ +1 Y ( c ) ! | π ( x ) = r ≤ E " log ℓ + d X c = ℓ +1 Z ( c ) ! , (22)where the Z ( c ) are independent binary random variables with expectation 1 − r k − and Y ( c ) := OR( Y ( c )1 , . . . , Y ( c ) k − ), with Y ( c ) j := (cid:26) y ( c ) j comes after x in the permutation π , . The left-hand side of (22) thus reads as E [ f ( Y ( ℓ +1)1 , . . . , Y ( d ) k − | π ( x ) = r ]for f as defined in (21). Since the Z ( c ) are independent binary random variableswith expectation 1 − r k − , their distribution is identical to the distribution ofOR( Z ( c )1 , . . . , Z ( c ) k − ), and the right-hand side of (22) is equal to E [ f ( Z ( ℓ +1)1 , . . . , Z ( d ) k − ] . We have to show that E [ f ( Y ( ℓ +1)1 , . . . , Y ( d ) k − | π ( x ) = r ] ≤ E [ f ( Z ( ℓ +1)1 , . . . , Z ( d ) k − ] (23)Conditioned on π ( x ) = r , the distribution of each Y ( c ) j is identical to that of Z ( c ) j ,but some Y ( c ) j are “glued together”, since the underlying variables y ( c ) j of our CSPformula need not be distinct. We can, however, assemble the Y ( c ) j into groups according to their underlying variables y ( c ) j such that (i) random variables fromthe same group have the same underlying y ( c ) j and thus are identical, (ii) randomvariables from different groups are independent. Thus, f ( Y ( ℓ +1)1 , . . . , Y ( d ) k − is aglued restriction of f ( Z ( ℓ +1)1 , . . . , Z ( d ) k − or rather can be coupled with a gluedrestriction thereof, and thus by Lemma 4.3, the expectation of the former is atmost the expectation of the latter. Therefore (23) holds. ⊓⊔ References
References

1. T. Feder and R. Motwani. Worst-case time bounds for coloring and satisfiability problems. J. Algorithms, 45(2):192–201, 2002.
2. L. Li, X. Li, T. Liu, and K. Xu. From k-SAT to k-CSP: Two generalized algorithms. CoRR, abs/0801.3147, 2008.
3. R. Paturi, P. Pudlák, M. E. Saks, and F. Zane. An improved exponential-time algorithm for k-SAT. J. ACM, 52(3):337–364, 2005.
4. R. Paturi, P. Pudlák, and F. Zane. Satisfiability coding lemma. Chicago J. Theor. Comput. Sci., 1999.
5. U. Schöning. A probabilistic algorithm for k-SAT and constraint satisfaction problems. In Proc. 40th Annual Symposium on Foundations of Computer Science (FOCS), pages 410–414, 1999.