A review of the Statistical Mechanics approach to Random Optimization Problems
Fabrizio Altarelli, Remi Monasson, Guilhem Semerjian, Francesco Zamponi
Dipartimento di Fisica and CNR, Università di Roma La Sapienza, P. A. Moro 2, 00185 Roma, Italy; LPTENS, Unité Mixte de Recherche (UMR 8549) du CNRS et de l'ENS, associée à l'UPMC Univ Paris 06, 24 Rue Lhomond, 75231 Paris Cedex 05, France.
We review the connection between statistical mechanics and the analysis of random optimization problems, with particular emphasis on the random k-SAT problem. We discuss and characterize the different phase transitions that are met in these problems, starting from basic concepts. We also discuss how statistical mechanics methods can be used to investigate the behavior of local search and decimation based algorithms.

This paper has been written as a contribution to the "Handbook of Satisfiability", to be published in 2008 by IOS press.
I. INTRODUCTION
The connection between the statistical physics of disordered systems and optimization problems in computer science dates back at least twenty years [1]. In combinatorial optimization one is given a cost function (the length of a tour in the traveling salesman problem (TSP), the number of violated constraints in constraint satisfaction problems, ...) over a set of variables and looks for the minimal cost over an allowed range for those variables. Finding the true minimum may be complicated, and requires bigger and bigger computational efforts as the number of variables to be minimized over increases [2]. Statistical physics is at first sight very different. Its scope is to deduce the macroscopic, that is, global properties of a physical system, for instance a gas, a liquid or a solid, from the knowledge of the energetic interactions of its elementary components (molecules, atoms or ions). However, at very low temperature, these elementary components are essentially forced to occupy the spatial conformation minimizing the global energy of the system. Hence low temperature statistical physics can be seen as the search for the minimum of a cost function whose expression reflects the laws of Nature or, more humbly, the degree of accuracy retained in its description. This problem is generally not difficult to solve for non-disordered systems, where the lowest energy conformations are crystals in which components are regularly spaced from each other. Yet the presence of disorder, e.g. impurities, makes the problem very difficult, and finding the conformation with minimal energy is a true optimization problem.

At the beginning of the eighties, following the works of G. Parisi and others on systems called spin glasses [1], important progress was made in the statistical physics of disordered systems.
This progress made possible the quantitative study of the properties of systems given some distribution of the disorder (for instance the location of impurities), such as the average minimal energy and its fluctuations. The application to optimization problems was natural, and led to beautiful studies on (among others) the average properties of the minimal tour length in the TSP and of the minimal cost in Bipartite Matching, for some specific instance distributions [1]. Unfortunately statistical physicists and computer scientists did not establish close ties on a large scale at that time. The reason could have been of methodological nature [3]. While physicists were making statistical statements, true for a given distribution of inputs, computer scientists were rather interested in solving one (or several) particular instances of a problem. The focus was thus on efficient ways to do so, that is, requiring a computational effort growing not too quickly with the number of data defining the instance. Knowing precisely the typical properties for a given, academic distribution of instances did not help much to solve practical cases.

At the beginning of the nineties, practitioners in artificial intelligence realized that classes of random constraint satisfaction problems used as artificial benchmarks for search algorithms exhibited abrupt changes of behaviour when some control parameter was finely tuned [4]. The most celebrated example was random k-Satisfiability, where one looks for a solution to a set of random logical constraints over a set of Boolean variables. It appeared that, for large sets of variables, there was a critical value of the number of constraints per variable below which there almost surely existed solutions, and above which solutions were absent. An important feature was that the performance of known search algorithms drastically worsened in the vicinity of this critical ratio.
In addition to its intrinsic mathematical interest, the random k-SAT problem was therefore worth studying for 'practical' reasons. This critical phenomenon, strongly reminiscent of phase transitions in condensed matter physics, led to a revival of the research at the interface between statistical physics and computer science, which is still very active. The purpose of the present review is to introduce the non-physicist reader to some concepts required to understand the literature in the field, and to present some major results. We shall in particular discuss the refined picture of the satisfiable phase put forward in statistical mechanics studies, and the algorithmic approach (Survey Propagation, an extension of Belief Propagation used in communication theory and statistical inference) this picture suggested.

While the presentation will mostly focus on the k-Satisfiability problem (with random constraints), we will occasionally discuss another computational problem, namely, linear systems of Boolean equations. A good reason to do so is that this problem exhibits some essential features encountered in random k-Satisfiability, while being technically simpler to study. In addition it is closely related to error-correcting codes in communication theory.

The chapter is divided into four main parts. In Section II we present the basic statistical physics concepts necessary to understand the onset of phase transitions, and to characterize the nature of the phases. Those are illustrated on a simple example of decision problem, the so-called perceptron problem. In Section III we review the scenario of the various phase transitions taking place in random k-SAT. Sections IV and V present the techniques used to study various types of algorithms in optimization (local search, backtracking procedures, message passing algorithms). We end with some concluding remarks in Sec. VI.

II. PHASE TRANSITIONS: BASIC CONCEPTS AND ILLUSTRATION
A. A simple decision problem with a phase transition: the continuous perceptron
For pedagogical reasons we first discuss a simple example exhibiting several important features we shall define more formally in the next subsection. Consider M points T^1, ..., T^M of the N-dimensional space R^N, their coordinates being denoted T^a = (T^a_1, ..., T^a_N). The continuous perceptron problem consists in deciding the existence of a vector σ ∈ R^N which has a positive scalar product with all vectors linking the origin of R^N to the T's,

σ · T^a ≡ \sum_{i=1}^{N} σ_i T^a_i > 0 , ∀ a = 1, ..., M , (1)

or in other words determining whether the M points belong to the same half-space. The term continuous in the name of the problem emphasizes the domain R^N of the variable σ. This makes the problem polynomial from the worst-case complexity point of view [5].

Suppose now that the points are chosen independently, identically, uniformly on the unit hypersphere, and call P(N, M) the probability that a set of M randomly chosen points belongs to the same half-space. This quantity can be computed exactly [6] (see also Chapter 5.7 of [5]) and is plotted in Fig. 1 as a function of the ratio α = M/N for increasing sizes N = 5, 20, 100. As expected, P is a decreasing function of the number M of points for a given size N: increasing the number of constraints can only make their simultaneous satisfaction more difficult. More surprisingly, the figure suggests that, in the large size limit N → ∞, the probability P reaches a limiting value 0 or 1 depending on whether the ratio α lies, respectively, above or below some 'critical' value α_s = 2. This is confirmed by the analytical expression of P obtained in [6],

P(N, M) = \frac{1}{2^{M-1}} \sum_{i=0}^{N-1} \binom{M-1}{i} , (2)

from which one can easily show that, indeed,

\lim_{N → ∞} P(N, M = Nα) = 1 if α < α_s , 0 if α > α_s , with α_s = 2 .
(3)

Actually the analytical expression of P allows one to describe more accurately the drop of the probability as α increases. To this aim we zoom in on the transition region M ≈ Nα_s and find from (2) that

\lim_{N → ∞} P(N, M = Nα_s (1 + λ N^{-1/2})) = \int_{λ\sqrt{2}}^{∞} \frac{dx}{\sqrt{2π}} e^{-x^2/2} . (4)

As they should, the limits λ → ±∞ give back the coarse description of Eq. (3).
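Cover's exact formula (2) lends itself to a direct numerical check. The following sketch (exact integer arithmetic; the sizes and the helper name `cover_probability` are chosen here for illustration) evaluates P(N, M) on both sides of α_s = 2:

```python
from math import comb

def cover_probability(N, M):
    """Cover's exact result: P(N, M) = 2^{-(M-1)} * sum_{i=0}^{N-1} C(M-1, i)."""
    return sum(comb(M - 1, i) for i in range(N)) / 2 ** (M - 1)

# Below the threshold (alpha = 1.5) the probability is already close to 1,
# above it (alpha = 2.5) close to 0, even at the moderate size N = 100.
N = 100
p_below = cover_probability(N, int(1.5 * N))
p_above = cover_probability(N, int(2.5 * N))
print(p_below, p_above)
```

For M ≤ N the sum exhausts all binomial terms and P = 1 exactly (any M ≤ N points in general position lie in a common half-space), which is a convenient sanity check on the implementation.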
FIG. 1: Probability P(N, M) that M random points on the N-dimensional unit hypersphere are located in the same half-space. Symbols correspond to Cover's exact result [6], see Eq. (2); lines serve as guides to the eye.

B. Generic definitions
We now put this simple example in a broader perspective and introduce some generic concepts that it illustrates, along with the definitions of the problems studied in the following.

• Constraint Satisfaction Problem (CSP)
A CSP is a decision problem where an assignment (or configuration) of N variables σ = (σ_1, ..., σ_N) ∈ X^N is required to simultaneously satisfy M constraints. In the continuous perceptron the domain of σ is R^N and the constraints impose the positivity of the scalar products (1). The instance of the CSP, also called formula in the following, is said satisfiable if there exists a solution (an assignment of σ fulfilling all the constraints). The k-SAT problem is a boolean CSP (X = {True, False}) where each constraint (clause) is the disjunction (logical OR) of k literals (a variable or its negation). Similarly, in k-XORSAT the literals are combined by an eXclusive OR operation, or equivalently an addition modulo 2 of their 0/1 representations. k-XORSAT is in the P complexity class for any k, while k-SAT is NP-complete for any k ≥ 3. In the following we shall use the spin representation X = {−1, +1}. A k-SAT clause is defined by k indices i_1, ..., i_k ∈ [1, N] and k values J_{i_1}, ..., J_{i_k} = ±1, such that the clause is unsatisfied by the assignment σ if and only if σ_{i_j} = J_{i_j} for all j ∈ [1, k]. A k-XORSAT clause is satisfied if the product of the spins is equal to a fixed value, σ_{i_1} · · · σ_{i_k} = J.

• random Constraint Satisfaction Problem (rCSP)
The set of instances of most CSPs can be turned into a probabilistic space by defining a distribution over the constraints, as was done in the perceptron case by drawing the vertices T^a uniformly on the hypersphere. The random k-SAT formulas considered in the following are obtained by choosing, for each clause a independently, a k-uplet of distinct indices i^a_1, ..., i^a_k uniformly over the \binom{N}{k} possible ones, and negating or not the corresponding literals (J^a_i = ±1) with equal probability one-half. The indices of random XORSAT formulas are chosen similarly, with the constants J^a = ±1 equiprobable.

• thermodynamic limit and phase transitions
These two terms are the physics jargon for, respectively, the large size limit (N → ∞) and for threshold phenomena as stated for instance in (3). In the thermodynamic limit the typical behavior of physical systems is controlled by a small number of parameters, for instance the temperature and pressure of a gas. At a phase transition these systems are drastically altered by a tiny change of a control parameter; think for instance of what happens to water when its temperature crosses 100 °C. This critical value of the temperature separates two qualitatively distinct phases, liquid and gaseous. For random CSPs the role of control parameter is usually played by the ratio of constraints per variable, α = M/N, kept constant in the thermodynamic limit. Eq. (3) describes a satisfiability transition for the continuous perceptron, the critical value α_s = 2 separating a satisfiable phase at low α, where instances typically have solutions, from a phase where they typically do not. Typically is used here as a synonym for with high probability, i.e. with a probability which goes to one in the thermodynamic limit.

• Finite Size Scaling (FSS)
The refined description of the neighborhood of the critical value of α provided by (4) is known as a finite size scaling relation. More generally, the finite size scaling hypothesis for a threshold phenomenon takes the form

\lim_{N → ∞} P(N, M = Nα_s (1 + λ N^{-1/ν})) = F(λ) , (5)

where ν is called the FSS exponent (ν = 2 for the continuous perceptron) and the scaling function F(λ) has limits 1 and 0 at, respectively, −∞ and +∞.
This means that, for a large but finite size N, the transition window for the values of M/N where the probability drops from 1 − ε down to ε is, for arbitrarily small ε, of width N^{-1/ν}. Results of this flavour are familiar in the study of random graphs [7]; for instance the appearance of a giant component containing a finite fraction of the vertices of an Erdős–Rényi random graph happens on a window of width N^{-1/3} in the average connectivity. FSS relations are important not only from the theoretical point of view, but also for practical applications. Indeed, numerical experiments are always performed on finite-size instances, while theoretical predictions on phase transitions are usually true in the N → ∞ limit. Finite-size scaling relations help to bridge the gap between the two. We shall review some FSS results in Sec. III E.

Let us emphasize that random k-SAT, and other random CSPs, are expected to share some features of the continuous perceptron model, for instance the existence of a satisfiability threshold, but of course not its extreme analytical simplicity. In fact, despite an intensive research activity, the mere existence of a satisfiability threshold for random SAT formulas remains a (widely accepted) conjecture. A significant achievement towards the resolution of the conjecture was the proof by Friedgut of the existence of a non-uniform sharp threshold [8]. There exist also upper [9] and lower [10] bounds on the possible location of this putative threshold, which become almost tight for large values of k [11]. We refer the reader to the chapter [12] of this volume for more details on these issues. This difficulty to obtain tight results with the currently available rigorous techniques is a motivation for the use of heuristic statistical mechanics methods, which provide intuitions on why the standard mathematical ones run into trouble and how to amend them.
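The FSS form (4)-(5) can be probed with the same exact formula. In the sketch below (sizes chosen for illustration) the exact P at M = Nα_s(1 + λ N^{-1/2}) is compared with the Gaussian scaling function F(λ) = erfc(λ)/2; at finite N only approximate agreement should be expected:

```python
from math import comb, erfc

def cover_probability(N, M):
    """Cover's exact result: P(N, M) = 2^{-(M-1)} * sum_{i=0}^{N-1} C(M-1, i)."""
    return sum(comb(M - 1, i) for i in range(N)) / 2 ** (M - 1)

def scaling_function(lam):
    # F(lam) = int_{lam*sqrt(2)}^inf dx exp(-x^2/2)/sqrt(2 pi) = erfc(lam)/2
    return 0.5 * erfc(lam)

# Probe the transition window M = N*alpha_s*(1 + lam/sqrt(N)) at N = 400.
N = 400
results = {}
for lam in (0.0, 1.0):
    M = round(2 * N * (1 + lam / N ** 0.5))
    results[lam] = (cover_probability(N, M), scaling_function(lam))
    print(lam, results[lam])
```

At λ = 0 the exact probability is 1/2 by a symmetry of the binomial sum, matching F(0) = 1/2 exactly; at λ = 1 the finite-N value differs from F(1) ≈ 0.079 by a small correction.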
In recent years, important results first conjectured by physicists were indeed rigorously proven. Before describing in some generality the statistical mechanics approach, it is instructive to study a simple variation of the perceptron model for which the basic probabilistic techniques become inefficient.

C. The perceptron problem continued: binary variables
The binary perceptron problem consists in looking for solutions of (1) on the hypercube, i.e. the domain of the variable σ is X^N = {−1, +1}^N instead of R^N. This decision problem is NP-complete. Unfortunately Cover's calculation [6] cannot be extended to this case, though it is natural to expect a similar satisfiability threshold phenomenon at an a priori distinct value α_s. Let us first try to study this point with basic probabilistic tools, namely the first and second moment methods [13]. The former is an application of the Markov inequality,

Prob[Z > 0] ≤ E[Z] , (6)

valid for positive integer valued random variables Z. We shall use it taking for Z the number of solutions of (1),

Z = \sum_{σ ∈ X^N} \prod_{a=1}^{M} θ(σ · T^a) , (7)

where θ(x) = 1 if x > 0, and 0 if x ≤ 0. The expectation value of the number of solutions is easily computed,

E[Z] = 2^N × 2^{-M} = e^{N G} with G = (1 − α) \ln 2 , (8)

and vanishes when N → ∞ if α > 1. Hence, from Markov's inequality (6), with high probability the constraints (1) have no solution on the hypercube when the ratio α exceeds unity: if the threshold α_s exists, it must satisfy the bound α_s ≤ 1. One can look for a lower bound on α_s using the second moment method, relying on the inequality [13]

\frac{E[Z]^2}{E[Z^2]} ≤ Prob[Z > 0] . (9)

The expectation value of the squared number of solutions reads

E[Z^2] = \sum_{σ, σ'} ( E[θ(σ · T) θ(σ' · T)] )^M , (10)

since the vertices T^a are chosen independently of each other. The expectation value on the right hand side of the above expression is simply the probability that the vector pointing to a randomly chosen vertex, T, has a positive scalar product with both vectors σ, σ'. Elementary geometrical considerations reveal that

E[θ(σ · T) θ(σ' · T)] = \frac{1}{2π} (π − φ(σ, σ')) , (11)

where φ is the relative angle between the two vectors. This angle can alternatively be parametrized by the overlap between σ and σ', i.e. their normalized scalar product,

q = \frac{1}{N} \sum_{i=1}^{N} σ_i σ'_i = 1 − \frac{2}{N} \sum_{i=1}^{N} I(σ_i ≠ σ'_i) . (12)

The last expression, in which I(E) denotes the indicator function of the event E, reveals the translation between the concept of overlap and the more traditional Hamming distance. The sum over pairs of vectors in (10) can then be replaced by a sum over overlap values with appropriate combinatorial coefficients counting the number of pairs of vectors at a given overlap. The outcome is

E[Z^2] = 2^N \sum_{q = -1, -1+2/N, ..., 1} \binom{N}{N\frac{1+q}{2}} \left( \frac{1}{2} − \frac{\arccos q}{2π} \right)^M . (13)

In the large N limit we can estimate this sum with the Laplace method,

\lim_{N → ∞} \frac{1}{N} \ln E[Z^2] = \max_{-1 ≤ q ≤ 1} A(q, α) , (14)

where

A(q, α) = \ln 2 − \frac{1+q}{2} \ln \frac{1+q}{2} − \frac{1-q}{2} \ln \frac{1-q}{2} + α \ln\left( \frac{1}{2} − \frac{\arccos q}{2π} \right) . (15)

One finds that the maximum of A(q, α) is strictly larger than 2G for all α > 0, and in consequence the left hand side of (9) vanishes. A possible scenario which explains this absence of concentration of the number of solutions is the following. As shown by the moment calculation, the natural scaling of Z is exponentially large in N (as is the total configuration space X^N). We shall thus denote s = (\ln Z)/N the random variable of order one counting the log-degeneracy of the solutions. Suppose s follows a large deviation principle [14], that we state in a very rough way as Prob[s] ≈ exp[N L(s)], with L(s) a negative rate function, assumed for simplicity to be concave. Then the moments of Z are given, at the leading exponential order, by

\lim_{N → ∞} \frac{1}{N} \ln E[Z^n] = \max_s [L(s) + ns] , (16)

and are controlled by the values of s such that L'(s) = −n. The moments of larger and larger order n are thus dominated by the contribution of rarer and rarer instances with larger and larger numbers of solutions. On the contrary, the typical value of the number of solutions is given by the maximum of L, reached at a value we denote s_g(α): with high probability when N → ∞, Z is comprised between e^{N(s_g(α) − ε)} and e^{N(s_g(α) + ε)}, for any ε >
0. From this reasoning it appears that the relevant quantity to be computed is

s_g(α) = \lim_{N → ∞} \frac{1}{N} E[\ln Z] = \lim_{N → ∞} \lim_{n → 0} \frac{1}{nN} \ln E[Z^n] . (17)

This idea of computing moments of vanishing order is known in statistical mechanics as the replica method [1]. Its non-rigorous implementation consists in determining the moments of integer order n, which are then continued towards n = 0. (The vocable replicas comes from the presence of n copies of the vector σ in the calculation of Z^n; see the n = 2 case in formula (10).) The outcome of such a computation for the binary perceptron problem reads [15]

s_g(α) = \max_{q, \hat{q}} \left\{ −\frac{\hat{q}}{2}(1 − q) + \int_{-∞}^{+∞} Dz \ln\left( 2 \cosh(z \sqrt{\hat{q}}) \right) + α \int_{-∞}^{+∞} Dz \ln \left[ \int_{z \sqrt{q/(1-q)}}^{+∞} Dy \right] \right\} , (18)

where Dz ≡ dz e^{-z^2/2}/\sqrt{2π}. The entropy s_g(α) is a decreasing function of α, which vanishes at α_s ≈ 0.833.

• the calculation of the second moment is naturally related to the determination of the value of the overlap q between pairs of solutions (or equivalently their Hamming distance, recall Eq. (12)). This conclusion extends to the calculation of the n-th moment for any integer n, and to the n → 0 limit: the value of q maximizing the r.h.s. of (18), q*(α), represents the average overlap between two solutions of the same set of constraints (1). Actually the distribution of overlaps is highly concentrated in the large N limit around q*(α); in other words the (reduced) Hamming distance between two solutions is, with high probability, equal to d*(α) = (1 − q*(α))/2. The distance d*(α) equals 1/2 at α = 0 and remains bounded away from zero up to α = α_s: slightly below the critical ratio, solutions are still far away from each other on the hypercube.

Note that the perceptron problem is not as far as it could seem from the main subject of this review. There exists indeed a natural mapping between the binary perceptron problem and k-SAT. Assume the vertices T of the perceptron problem, instead of being drawn on the hypersphere, have coordinates that can take three values: T_i = −1, 0, +
1. Consider now a k-SAT formula F. To each clause a of F we associate the vertex T^a with coordinates T^a_i = −J^a_i if variable i appears in clause a, and 0 otherwise. Of course \sum_i |T^a_i| = k: exactly k coordinates take non-zero values for each vertex. Then replace condition (1) with

\sum_{i=1}^{N} σ_i T^a_i > −(k − 1) , ∀ a = 1, ..., M . (19)

The scalar product is not required to be positive any longer, but to be larger than −(k − 1). It is easy to check that the system (19) admits a solution on the hypercube (σ_i = ±
1) if and only if F is satisfiable. While in the binary perceptron model all coordinates are non-vanishing, only a finite number of them take non-zero values in k-SAT. For this reason k-SAT is called a diluted model in statistical physics. The direct application of the second moment method also fails for the random k-SAT problem; yet a refined version of it was used in [11], which leads to asymptotically (at large k) tight bounds on the location of the satisfiability threshold.

D. From random CSP to statistical mechanics of disordered systems
The binary perceptron example taught us that the number of solutions Z of a satisfiable random CSP usually scales exponentially with the size of the problem, with large fluctuations that prevent the direct use of standard moment methods. This led us to the introduction of the quenched entropy, as defined in (17). The computation techniques used to obtain (18) were in fact developed in an apparently different field, the statistical mechanics of disordered systems [1].

Let us review some basic concepts of statistical mechanics (for introductory books see for example [16, 17]). A physical system can be modeled by a space of configurations σ ∈ X^N, on which is defined an energy function E(σ). For instance, usual magnets are described by Ising spins σ_i = ±
1, the energy being minimized when adjacent spins take the same value. The equilibrium properties of a physical system at temperature T are given by the Gibbs-Boltzmann probability measure on X^N,

µ(σ) = \frac{1}{Z} \exp[−β E(σ)] , (20)

where the inverse temperature β equals 1/T and Z is a normalization called the partition function. The energy function E has a natural scaling, linear in the number N of variables (such a quantity is said to be extensive). In consequence, in the thermodynamic limit the Gibbs-Boltzmann measure concentrates on configurations with a given energy density e = E/N, which depends on the conjugate parameter β. The number of such configurations is usually exponentially large, ≈ exp[N s], with s called the entropy density. The partition function is thus dominated by the contribution of these configurations, hence \lim (\ln Z)/N = s − βe.

In the above presentation we supposed the energy to be a simple, known function of the configurations. In fact some magnetic compounds, called spin glasses, are intrinsically disordered on a microscopic scale. This means that there is no hope of describing their microscopic details exactly, but that one should rather assume their energy to be itself a random function with a known distribution. Fortunately, in the thermodynamic limit the fluctuations of thermodynamic observables such as the energy and entropy density vanish, hence the properties of a typical sample will be closely described by the average (over the distribution of the energy function) of the entropy and energy density. The random CSPs fit naturally in this line of research. (Footnote to the perceptron discussion of Sec. II C: this situation is very different from the continuous perceptron case, where the typical overlap q*(α) reaches one when α tends to 2: a single solution is left right at the critical ratio.)
The energy function E(σ) of a CSP is defined as the number of constraints violated by the assignment σ; in other words this is the cost function to be minimized in the associated optimization problem (MAX-SAT for instance). Moreover, the distribution of random instances of a CSP is the counterpart of the distribution over the microscopic description of a disordered solid. The study of the optimal configurations of a CSP, and in particular the characterization of a satisfiability phase transition, is achieved by taking the β → ∞ limit. Indeed, when this parameter increases (or equivalently the temperature goes to 0), the law (20) favors the lowest energy configurations. In particular, if the formula is satisfiable, µ becomes the uniform measure over the solutions. Two important features of the formula can be deduced from the behavior of Z at large β: the ground-state energy E_g = \min_σ E(σ), which indicates how good the optimal configurations are, and the ground-state entropy S_g = \ln |{σ : E(σ) = E_g}|, which counts the degeneracy of these optimal configurations. The satisfiability of a formula is equivalent to its ground-state energy being equal to 0. In the large N limit these two thermodynamic quantities are supposed to concentrate around their mean values (this is proven for E_g in [18]); we thus introduce the associated typical densities,

e_g(α) = \lim_{N → ∞} \frac{1}{N} E[E_g] , s_g(α) = \lim_{N → ∞} \frac{1}{N} E[S_g] . (21)

Notice that formula (21) coincides with (17) in the satisfiable phase (where the ground-state energy vanishes). Some criteria are needed to relate these thermodynamic quantities to the (presumed to exist) satisfiability threshold α_s. A first approach, used for instance in [19], consists in locating it as the point where the ground-state energy density e_g becomes positive. The assumption underlying this reasoning is the absence of an intermediate, typically UNSAT regime with a sub-extensive positive E_g.
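For intuition, the zero-temperature quantities E_g and S_g can be computed by brute force on toy instances. The sketch below uses the spin convention of Sec. II B on a hand-made 3-SAT formula (the instance itself is made up purely for illustration):

```python
from itertools import product
from math import log

# A clause is a list of (index, J) pairs; it is UNSATISFIED by sigma
# iff sigma[i] == J for every literal, as in the convention of Sec. II B.
# Each of these three clauses excludes exactly one of the 2^3 assignments.
formula = [
    [(0, +1), (1, +1), (2, +1)],
    [(0, -1), (1, -1), (2, -1)],
    [(0, +1), (1, -1), (2, +1)],
]
N = 3

def energy(sigma, formula):
    """Number of violated clauses: the cost function of the associated MAX-SAT."""
    return sum(all(sigma[i] == J for i, J in clause) for clause in formula)

energies = [energy(s, formula) for s in product((-1, +1), repeat=N)]
E_g = min(energies)              # ground-state energy (0 iff satisfiable)
n_ground = energies.count(E_g)   # degeneracy of the optimal configurations
S_g = log(n_ground)              # ground-state entropy
print(E_g, n_ground)
```

Here the three excluded assignments are distinct, so E_g = 0 (the formula is satisfiable) and S_g = ln 5; at large N such exhaustive enumeration is of course hopeless, which is precisely why the statistical methods of this section are needed.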
In the discussion of the binary perceptron we used another criterion, namely we recognized α_s by the cancellation of the ground-state entropy density. This argument would be correct if the typical number of solutions vanished continuously at α_s. It is easy to realize that this is not the case for random k-SAT: at any finite value of α a finite fraction exp[−αk] of the variables do not appear in any clause, which leads to a trivial lower bound (\ln 2) exp[−αk] on s_g. This quantity is thus finite at the transition: a large number of solutions disappear suddenly at α_s. Even though it is wrong, the criterion s_g(α) = 0 for the determination of the satisfiability transition is instructive for two reasons. First, it becomes asymptotically correct at large k (free variables are very rare in this limit); this is why it works for the binary perceptron of Section II C (which is, as we have seen, close to k-SAT with k of order N). Second, it will reappear below in a refined version: we shall indeed decompose the entropy into two qualitatively distinct contributions, one of which does vanish at the satisfiability transition.

III. PHASE TRANSITIONS IN RANDOM CSPS

A. The clustering phenomenon
We have seen that the statistical physics approach to the perceptron problem naturally provided us with information about the geometry of the space of its solutions. Maybe one of the most important contributions of physicists to the field of random CSPs was to suggest the presence of further phase transitions in the satisfiable regime α < α_s, affecting qualitatively the geometry (structure) of the set of solutions [20–22].

This subset of the configuration space is indeed thought to break down into "clusters" in a part of the satisfiable phase, α ∈ [α_d, α_s], α_d being the threshold value for the clustering transition. Clusters are meant as a partition of the set of solutions having certain properties listed below. Each cluster contains an exponential number of solutions, exp[N s_int], and the clusters are themselves exponentially numerous, exp[N Σ]. The total entropy density thus decomposes into the sum of s_int, the internal entropy of the clusters, and Σ, encoding the degeneracy of these clusters, usually termed complexity in this context. Furthermore, solutions inside a given cluster should be well-connected, while two solutions of distinct clusters are well-separated. A possible definition of these notions is the following. Suppose σ and τ are two solutions of a given cluster. Then one can construct a path (σ = σ^0, σ^1, ..., σ^{n-1}, σ^n = τ) where any two successive σ^i are separated by a sub-extensive Hamming distance. On the contrary, such a path does not exist if σ and τ belong to two distinct clusters. Clustered configuration spaces as described above have often been encountered in various contexts, e.g. neural networks [23] and mean-field spin glasses [24]. A vast body of involved, yet non-rigorous, analytical techniques [1] has been developed in the field of statistical mechanics of disordered systems to tackle such situations, some of them having been justified rigorously [25–27].
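The notion of clusters is an asymptotic (large N) one, but a crude finite-N analogue can be played with directly: enumerate the solutions of a tiny XORSAT instance and group them into connected components under single-spin-flip adjacency (a stand-in, at fixed small N, for the sub-extensive-distance paths of the text; the instance is made up for illustration):

```python
from itertools import product

# Toy 4-variable XORSAT instance: sigma_1*sigma_2 = +1 and sigma_3*sigma_4 = +1.
# An equation is (indices, J), satisfied iff the product of the spins equals J.
equations = [((0, 1), +1), ((2, 3), +1)]
N = 4

def satisfies(sigma, equations):
    for idx, J in equations:
        prod = 1
        for i in idx:
            prod *= sigma[i]
        if prod != J:
            return False
    return True

solutions = [s for s in product((-1, +1), repeat=N) if satisfies(s, equations)]

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

# Connected components of the Hamming-distance-1 adjacency graph on solutions.
clusters = []
for s in solutions:
    merged = [c for c in clusters if any(hamming(s, t) == 1 for t in c)]
    new = {s}.union(*merged) if merged else {s}
    clusters = [c for c in clusters if c not in merged] + [new]
print(len(solutions), len(clusters))
```

Flipping any single spin violates the equation it enters, so no two of the four solutions are adjacent: the solution set splits into four isolated components, a caricature of a shattered solution space.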
In this literature clusters appear under the name of "pure states", or "lumps" (see for instance chapter 6 of [25] for a rigorous definition and proof of existence in a related model). As we shall explain in a few lines, this clustering phenomenon has been demonstrated rigorously in the case of random XORSAT instances [28, 29]. For random SAT instances, where in fact the detailed picture of the satisfiable phase is thought to be richer [22], there are some rigorous results [30–32] on the existence of clusters for large enough k.

B. Phase transitions in random XORSAT
Consider an instance F of the XORSAT problem [33], i.e. a list of M linear equations, each involving k out of N boolean variables, where the additions are computed modulo 2. The study performed in [28, 29] provides a detailed picture of the clustering and satisfiability transitions sketched above. A crucial point is the construction of a core subformula according to the following algorithm. Let us denote F_0 = F the initial set of equations, and V_0 the set of variables which appear in at least one equation of F_0. A sequence F_T, V_T is constructed recursively: if there are no variables in V_T which appear in exactly one equation of F_T, the algorithm stops. Otherwise one of these "leaf variables" σ_i is chosen arbitrarily, F_{T+1} is constructed from F_T by removing the unique equation in which σ_i appeared, and V_{T+1} is defined as the set of variables which appear at least once in F_{T+1}. Let us call T* the number of steps performed before the algorithm stops, and F' = F_{T*}, V' = V_{T*} the remaining clauses and variables. Note first that despite the arbitrariness in the choice of the removed leaves, the output subformula F' is unambiguously determined by F. Indeed, F' can be defined as the maximal (in the inclusion sense) subformula in which all present variables have a minimal occurrence number of 2, and is thus unique. In graph theoretic terminology F' is the 2-core of F, the q-core of hypergraphs being a generalization of the more familiar notion on graphs, thoroughly studied in random graph ensembles in [34]. Extending this study, relying on the approximability of this leaf removal process by differential equations [35], it was shown in [28, 29] that there is a threshold phenomenon at α_d(k). For α < α_d the 2-core F' is, with high probability, empty, whereas it contains a finite fraction of the variables and equations for α > α_d.
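The leaf removal construction of the 2-core can be written down directly. A minimal sketch operating on the hypergraph structure alone (equations as sets of variable indices; the right-hand sides J are irrelevant for the core construction, and the helper name `two_core` is ours):

```python
from collections import Counter

def two_core(equations):
    """Repeatedly delete an equation containing a 'leaf' variable, i.e. one
    appearing in exactly one remaining equation. As argued in the text, the
    result is independent of the removal order."""
    eqs = list(equations)
    while True:
        degree = Counter(v for e in eqs for v in e)
        leaf_eqs = [e for e in eqs if any(degree[v] == 1 for v in e)]
        if not leaf_eqs:
            return eqs
        eqs.remove(leaf_eqs[0])

# A small formula whose core is empty: leaves keep appearing until nothing is left.
print(two_core([{1, 2, 3}, {3, 4, 5}, {5, 6, 1}]))  # -> []
# A formula that is its own 2-core: every variable occurs at least twice.
print(two_core([{1, 2, 3}, {1, 2, 4}, {3, 4, 1}]))
```

Removing one equation at a time (rather than all current leaves at once) mirrors the sequential process F_T → F_{T+1} of the text, which is the form amenable to the differential-equation analysis of [35].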
α_d is easily determined numerically: it is the smallest value of α such that the equation x = 1 − exp[−αk x^{k−1}] has a non-trivial solution in (0, 1]. One can show that F is satisfiable if and only if F' is, and that the numbers of solutions of these two formulas are related in an enlightening way. It is clear that if the 2-core has no solution, there is no way to find one for the full formula. Suppose on the contrary that an assignment of the variables in V' that satisfies the equations of F' has been found, and let us show how to construct a solution of F (and count in how many possible ways we can do this). Set N_0 = 1, and reintroduce step by step the removed equations, starting from the last: in the n'th step of this new procedure we reintroduce the clause which was removed at step T* − n of the leaf removal. This reintroduced clause involves d_n = |V_{T*−n−1}| − |V_{T*−n}| ≥ 1 variables which appear nowhere else at this stage; there are thus 2^{d_n − 1} ways to satisfy the reintroduced clause, irrespectively of the previous choices, and we bookkeep this number of possible extensions by setting N_{n+1} = N_n 2^{d_n − 1}. Finally the total number of solutions of F compatible with the choice of the solution of F' is obtained by adding the freedom of the variables which appeared in no equations of F, N_int = N_{T*} 2^{N − |V_0|}. Let us underline that N_int is independent of the initial satisfying assignment of the variables in V', as appears clearly from the description of the reconstruction algorithm; this property can be traced back to the linear algebra structure of the problem. This suggests naturally the decomposition of the total number of solutions of F as the product of the number of satisfying assignments of V', call it N_core, by the number of compatible full solutions N_int. In terms of the associated entropy densities this decomposition is additive,

s = Σ + s_int ,   Σ ≡ (1/N) ln N_core ,   s_int ≡ (1/N) ln N_int ,   (22)

where the quantity Σ is the entropy density associated to the core of the formula.
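As an illustration, α_d(k) can be located by bisection, testing for each α whether the iteration of x → 1 − exp[−αk x^{k−1}] from x = 1 reaches a non-trivial fixed point. This is a minimal sketch; the bracket, iteration count and tolerances are arbitrary choices of ours:

```python
import math

def has_nontrivial_fixed_point(alpha, k, iters=2000, tol=1e-10):
    """Iterate x -> 1 - exp(-alpha*k*x^(k-1)) starting from x = 1.

    The iteration converges to the largest fixed point; if that is
    (numerically) zero, only the trivial solution exists."""
    x = 1.0
    for _ in range(iters):
        x = 1.0 - math.exp(-alpha * k * x ** (k - 1))
    return x > tol

def alpha_d(k, precision=1e-6):
    lo, hi = 0.0, 2.0          # bracket: no core at lo, core present at hi
    while hi - lo > precision:
        mid = 0.5 * (lo + hi)
        if has_nontrivial_fixed_point(mid, k):
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

print(alpha_d(3))              # ~ 0.818 for 3-XORSAT
```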
It is in fact much easier technically to compute the statistical (with respect to the choice of the random formula F) properties of Σ and s_int once this decomposition has been done (the fluctuations in the number of solutions are much smaller once the non-core part of the formula has been removed). The outcome of the computations [28, 29] is the determination of the threshold value α_s for the appearance of a solution of the 2-core F' (and thus of the complete formula), along with explicit formulas for the typical values of Σ and s. These two quantities are plotted on Fig. 2. The satisfiability threshold corresponds to the cancellation of Σ: the number of solutions of the core vanishes continuously at α_s, while the total entropy remains finite because of the freedom of choice for the variables in the non-core part of the formula.

FIG. 2: Complexity and total entropy for 3-XORSAT, in units of ln 2. The inset presents an enlargement of the regime α ∈ [α_d, α_s].

TABLE I: Critical connectivities for the dynamical, condensation and satisfiability transitions for k-SAT random formulas.

        α_d [22]   α_c [22]   α_s [38]
k = 3     3.86       3.86      4.267
k = 4     9.38       9.547     9.931
k = 5    19.16      20.80     21.117
k = 6    36.53      43.08     43.37

On top of the simplification in the analytical determination of the satisfiability threshold, this core decomposition of a formula unveils the change in the structure of the set of solutions that occurs at α_d. Indeed, let us call cluster the set of all solutions of F reconstructed from a common solution of F'. Then one can show that this partition of the solution set of F exhibits the properties exposed in Sec. III A, namely that solutions are well-connected inside a cluster and separated from one cluster to another. The number of clusters is precisely equal to the number of solutions of the core subformula; it thus undergoes a drastic modification at α_d. For smaller ratios of constraints the core is typically empty and there is one single cluster containing all solutions; when the threshold α_d is reached there appears an exponential number of clusters, the rate of growth of this exponential being given by the complexity Σ. Before considering the extension of this picture to random SAT problems, let us mention that further studies of the geometry of the space of solutions of random XORSAT instances can be found in [36, 37].

C. Phase transitions in random SAT
The possibility of a clustering transition in random SAT problems was first studied in [20] by means of variational approximations. Later developments allowed the computation of the complexity and, from the condition of its cancellation, the estimation of the satisfiability threshold α_s. This was first done for k = 3 in [21] and generalized for k ≥ 4 in [38]; the values of α_s thus computed are reported in Tab. I. A systematic expansion of α_s at large k was also performed in [38]. SAT formulas do not share the linear algebra structure of XORSAT, which makes the analysis of the clustering transition much more difficult, and leads to a richer structure of the satisfiable phase α ≤ α_s. The simple graph theoretic arguments are not valid anymore: one cannot extract a core subformula from which the partition of the solutions into clusters follows directly. It is thus necessary to define the clusters as a partition of the solutions such that each cluster is well-connected and well-separated from the other ones. A second complication arises: there is no reason for all the clusters to contain the same number of solutions, as was ensured by the linear structure of XORSAT. On the contrary, as was observed in [20] and in [39] for the similar random COL problem, one faces a variety of clusters with various internal entropies s_int. The complexity Σ becomes a function of s_int; in other words the number of clusters of internal entropy density s_int is typically exponential, growing at the leading order like exp[N Σ(s_int)]. Drawing the consequences of these observations, a refined picture of the satisfiable phase, and in particular the existence of a new (so-called condensation) threshold α_c ∈ [α_d, α_s], was advocated in [22]. Let us briefly sketch some of these new features and their relationship with the previous results of [21, 38].
Assuming the existence of a positive, concave, complexity function Σ(s_int), continuously vanishing outside an interval of internal entropy densities [s_−, s_+], the total entropy density is given by

s = lim_{N→∞} (1/N) ln ∫_{s_−}^{s_+} ds_int e^{N[Σ(s_int) + s_int]} .   (23)

In the thermodynamic limit the integral can be evaluated with the Laplace method. Two qualitatively distinct situations can arise, according to whether the integral is dominated by a critical point in the interior of the interval [s_−, s_+], or by the neighborhood of the upper limit s_+. In the former case an overwhelming majority of the solutions are contained in an exponential number of clusters, while in the latter the dominant contribution comes from a sub-exponential number of clusters of internal entropy s_+, as Σ(s_+) = 0. The threshold α_c separates the first regime [α_d, α_c], where the relevant clusters are exponentially numerous, from the second, condensed situation for α ∈ [α_c, α_s] with a sub-exponential number of dominant clusters. The computations of [21, 38] did not take into account the distribution of the various internal entropies of the clusters, which explains the discrepancy in the estimation of the clustering threshold α_d between [21, 38] and [22]. Let us however emphasize that this refinement of the picture does not contradict the estimation of the satisfiability threshold of [21, 38]: the complexity computed in these works is Σ_max, the maximal value of Σ(s_int), reached at a local maximum with Σ'(s) = 0, which indeed vanishes when the whole complexity function disappears. It is fair to say that the details of the picture proposed by statistical mechanics studies have rapidly evolved in the last years, and might still be improved. They rely indeed on self-consistent assumptions which are rather tedious to check [40]. Some elements of the clustering scenario have however been established rigorously in [30-32], at least for large enough k.
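The Laplace evaluation of Eq. (23) can be mimicked numerically on a toy complexity curve (the quadratic Σ below is an arbitrary illustration of ours, not the actual k-SAT complexity): the dominant internal entropy is the maximizer of Σ(s) + s over the interval where Σ ≥ 0, and condensation corresponds to this maximizer sitting at the upper edge s_+:

```python
import math

def dominant_s_int(sigma, s_minus, s_plus, grid=100000):
    """Maximize Sigma(s) + s on [s_minus, s_plus] by crude grid search."""
    best_s, best_val = s_minus, sigma(s_minus) + s_minus
    for n in range(grid + 1):
        s = s_minus + (s_plus - s_minus) * n / grid
        val = sigma(s) + s
        if val > best_val:
            best_s, best_val = s, val
    return best_s

# Wide toy complexity band: interior maximum, Sigma'(s*) = -1 at s* = 0.35,
# i.e. exponentially many clusters dominate (regime alpha_d < alpha < alpha_c).
sigma = lambda s: 0.05 - 10 * (s - 0.3) ** 2
s_minus, s_plus = 0.3 - math.sqrt(0.005), 0.3 + math.sqrt(0.005)
s_star = dominant_s_int(sigma, s_minus, s_plus)

# Narrow band: Sigma(s) + s is increasing on the whole interval, so the
# maximizer sticks to s_plus: condensation on sub-exponentially many clusters.
sigma_c = lambda s: 0.0001 - 10 * (s - 0.3) ** 2
s_plus_c = 0.3 + math.sqrt(0.00001)
s_star_c = dominant_s_int(sigma_c, 0.3 - math.sqrt(0.00001), s_plus_c)
```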
In particular these works demonstrated, for some values of k and α in the satisfiable regime, the existence of forbidden intermediate Hamming distances between pairs of configurations, which are either close (in the same cluster) or far apart (in two distinct clusters). Note finally that the consequences of such distributions of the clusters' internal entropies were investigated on a toy model in [41], and that yet another threshold α_f > α_d for the appearance of frozen variables, constrained to take the same value in all solutions of a given cluster, was investigated in [42].

D. A glimpse at the computations
The statistical mechanics of disordered systems [1] was first developed on so-called fully-connected models, where each variable appears in a number of constraints which diverges in the thermodynamic limit. This is for instance the case of the perceptron problem discussed in Sec. II. On the contrary, in a random k-SAT instance a variable is typically involved in a finite number of clauses; one speaks in this case of a diluted model. This finite connectivity is a source of major technical complications. In particular the replica method, alluded to in Sec. II C and applied to random k-SAT in [19, 20], turns out to be rather cumbersome for diluted models in the presence of clustering [43]. The cavity formalism [21, 44, 45], formally equivalent to the replica one, is better adapted to diluted models. In the following paragraphs we shall try to give a few hints at the strategy underlying the cavity computations, which might hopefully ease the reading of the original literature.

The description of the random formula ensemble has two complementary aspects: a global (thermodynamic) one, which amounts to the computation of the typical energy and number of optimal configurations. A more ambitious description will also provide geometrical information on the organization of this set of optimal configurations inside the N-dimensional hypercube. As discussed above these two aspects are in fact interleaved, the clustering affecting both the thermodynamics (by the decomposition of the entropy into the complexity and the internal entropy) and the geometry of the configuration space. Let us for simplicity concentrate on the α < α_s regime and consider a satisfiable formula F. Both thermodynamic and geometric aspects can be studied in terms of the uniform probability law over the solutions of F:

µ(σ) = (1/Z) ∏_{a=1}^{M} w_a(σ_a) ,   (24)

[Footnote: This picture is expected to hold for k ≥ 4; for k = 3, the dominant clusters are expected to be of sub-exponential degeneracy in the whole clustered phase, hence α_c = α_d in this case.]
FIG. 3: The factor graph representation of a small 3-SAT formula of three clauses on seven variables.

where Z is the number of solutions of F, the product runs over its clauses, and w_a is the indicator function of the event "clause a is satisfied by the assignment σ" (in fact this depends only on the configuration of the k variables involved in the clause a, that we denote σ_a). For instance the (information theoretic) entropy of µ is equal to ln Z, the log degeneracy of solutions, and geometric properties can be studied by computing averages with respect to µ of well-chosen functions of σ. A convenient representation of such a law is provided by factor graphs [46]. These are bipartite graphs with two types of vertices (see Fig. 3 for an illustration): one variable node (filled circle) is associated to each of the N Boolean variables, while the clauses are represented by M constraint nodes (empty squares). By convention we use the indices a, b, ... for the constraint nodes, i, j, ... for the variables. An edge is drawn between variable node i and constraint node a if and only if a depends on i. To specify further by which value of σ_i the clause a gets satisfied one can use two types of line styles, solid and dashed on the figure. A notation repeatedly used in the following is ∂a (resp. ∂i) for the neighborhood of a constraint (resp. variable) node, i.e. the set of adjacent variable (resp. constraint) nodes. In this context \ denotes the subtraction from a set. We shall more precisely denote ∂_+ i(a) (resp. ∂_− i(a)) the set of clauses in ∂i \ a agreeing (resp. disagreeing) with a on the satisfying value of σ_i, and ∂_σ i the set of clauses in ∂i which are satisfied by σ_i = σ. This graphical representation naturally suggests a notion of distance between variable nodes i and j, defined as the minimal number of constraint nodes crossed on a path of the factor graph linking nodes i and j. Suppose now that F is drawn from the random ensemble.
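For concreteness, the factor graph of a CNF formula can be stored as the two neighborhood maps ∂a and ∂i, with the sign J_{ai} (the value of σ_i that unsatisfies clause a) attached to each edge. A minimal sketch in Python, with the formula encoded DIMACS-style (the encoding and names are our choices, not imposed by the text):

```python
from collections import defaultdict

def factor_graph(clauses):
    """Build the neighborhoods of a CNF formula.

    `clauses` is a list of lists of non-zero signed integers: literal
    +i stands for x_i (satisfied by sigma_i = +1, hence J = -1) and
    -i for NOT x_i (J = +1).  Returns (d_a, d_i) where d_a[a] is the
    list of (variable, J) pairs of clause a and d_i[i] the list of
    (clause, J) pairs of variable i.
    """
    d_a = []
    d_i = defaultdict(list)
    for a, clause in enumerate(clauses):
        edges = []
        for lit in clause:
            i, J = abs(lit), (-1 if lit > 0 else +1)
            edges.append((i, J))
            d_i[i].append((a, J))
        d_a.append(edges)
    return d_a, dict(d_i)

d_a, d_i = factor_graph([[1, -2, 3], [2, 3, -4]])
# variable 3 has degree 2, and both clauses agree on its satisfying value,
# so each of them belongs to the set "d_+ 3(other)" of the text
```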
The corresponding random factor graph enjoys several interesting properties [7]. The degree |∂i| of a randomly chosen variable i is, in the thermodynamic limit, a Poisson random variable of average αk. If instead of a node one chooses randomly an edge a−i, the outdegree |∂i \ a| of i has again a Poisson distribution with the same parameter. Moreover, the signs of the literals being chosen uniformly, independently of the topology of the factor graph, the degrees |∂_+ i|, |∂_− i|, |∂_+ i(a)| and |∂_− i(a)| are Poisson random variables of parameter αk/2. Another important feature of these random factor graphs is their local tree-like character: if the portion of the formula at graph distance smaller than L of a randomly chosen variable is exposed, the probability that this subgraph is a tree goes to 1 if L is kept fixed while the size N goes to infinity.

Let us for a second forget about the rest of the graph and consider a finite formula whose factor graph is a tree, as is the case for the example of Fig. 3. The probability law µ of Eq. (24) becomes in this case a rather simple object. Tree structures are indeed naturally amenable to a recursive (dynamic programming) treatment, operating first on sub-trees which are then glued together. More precisely, for each edge between a variable node i and a constraint node a one defines the amputated tree F_{a→i} (resp. F_{i→a}) by removing all clauses in ∂i apart from a (resp. removing only a). These subtrees are associated to probability laws µ_{a→i} (resp. µ_{i→a}), defined as in Eq. (24) but with a product running only on the clauses present in F_{a→i} (resp. F_{i→a}). The marginal law of the root variable i in these amputated probability measures can be parametrized by a single real number, as σ_i can take only two values (±1 in the Ising spin convention). We thus define the fields h_{i→a} and u_{a→i} by

µ_{i→a}(σ_i) = (1 − J_{ai} σ_i tanh h_{i→a})/2 ,   µ_{a→i}(σ_i) = (1 − J_{ai} σ_i tanh u_{a→i})/2 ,   (25)

where we recall that σ_i = J_{ai} is the value of the literal i unsatisfying clause a. A standard reasoning (see for instance [47]) allows to derive recursive equations (illustrated in Fig. 4) on these messages,

h_{i→a} = ∑_{b ∈ ∂_+ i(a)} u_{b→i} − ∑_{b ∈ ∂_− i(a)} u_{b→i} ,
u_{a→i} = −(1/2) ln[ 1 − ∏_{j ∈ ∂a \ i} (1 − tanh h_{j→a})/2 ] .   (26)

On a tree these equations are easily solved: the recursion starts from the leaf variables i, for which h_{i→a} = 0, and progresses inwards the graph. The law µ can be completely described from the values of the h's and u's solutions of these equations for all edges of the graph. For instance the marginal probability of σ_i can be written as

µ(σ_i) = (1 + σ_i tanh h_i)/2 ,   h_i = ∑_{a ∈ ∂_+ i} u_{a→i} − ∑_{a ∈ ∂_− i} u_{a→i} .   (27)

In addition the entropy s of solutions of such a tree formula can be computed from the values of the messages h and u [47]. We shall come back to the equations (26), and justify the denomination messages, in Sec. V C; these can be interpreted as the Belief Propagation [46, 48, 49] heuristic equations for loopy factor graphs.

The factor graph of random formulas is only locally tree-like; the simple computation sketched above has thus to be amended in order to take into account the effect of the distant, loopy part of the formula. Let us call F_L the factor graph made of variable nodes at graph distance smaller than or equal to L from an arbitrarily chosen variable node i in a large random formula F, and B_L the variable nodes at distance exactly L from i. Without loss of generality in the thermodynamic limit, we can assume that F_L is a tree. The cavity method amounts to a hypothesis on the effect of the distant part of the factor graph, F \ F_L, i.e. on the boundary condition it induces on F_L. In its simplest (so-called replica symmetric) version, believed to correctly describe the unclustered situation for α ≤ α_d, F \ F_L is replaced, for each variable node j in the boundary B_L, by a fictitious constraint node which sends a bias u_{ext→j}. In other words the boundary condition is factorized on the various nodes of B_L; such a simple description is expected to be correct for α ≤ α_d because, in the amputated factor graph F \ F_L, the distance between the variables of B_L is typically large (of order ln N), and these variables should thus be weakly correlated.
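On a small tree formula, Eqs. (26, 27) can be iterated directly and checked against exhaustive enumeration. In the sketch below (the encoding is ours: each clause is a list of (variable, J) pairs, J being the unsatisfying value of the variable) the messages converge exactly after a few sweeps and reproduce the marginals of the uniform measure over solutions:

```python
import math
from itertools import product

# Two clauses sharing variable 2: (x0 v x1 v x2) and (NOT x2 v x3 v x4).
# J = -1 encodes a plain literal x_i, J = +1 a negated one.
clauses = [[(0, -1), (1, -1), (2, -1)], [(2, +1), (3, -1), (4, -1)]]
N = 5

h = {(i, a): 0.0 for a, cl in enumerate(clauses) for (i, J) in cl}
u = {(a, i): 0.0 for a, cl in enumerate(clauses) for (i, J) in cl}

for _ in range(20):   # more sweeps than the tree diameter: exact convergence
    for a, cl in enumerate(clauses):            # Eq. (26), first line
        for (i, Jai) in cl:
            h[(i, a)] = sum((+1 if Jbi == Jai else -1) * u[(b, i)]
                            for b, cl2 in enumerate(clauses) if b != a
                            for (j, Jbi) in cl2 if j == i)
    for a, cl in enumerate(clauses):            # Eq. (26), second line
        for (i, Jai) in cl:
            prod = 1.0
            for (j, Jaj) in cl:
                if j != i:
                    prod *= (1.0 - math.tanh(h[(j, a)])) / 2.0
            u[(a, i)] = -0.5 * math.log(1.0 - prod)

# Eq. (27): marginal P(sigma_i = +1) = (1 + tanh h_i)/2
bp = []
for i in range(N):
    h_i = sum(-Jai * u[(a, i)] for a, cl in enumerate(clauses)
              for (j, Jai) in cl if j == i)
    bp.append((1.0 + math.tanh(h_i)) / 2.0)

# brute-force check over the 2^N assignments (a clause is unsatisfied
# iff all its variables take their unsatisfying value J)
counts, n_sol = [0] * N, 0
for sigma in product([-1, +1], repeat=N):
    if all(any(sigma[i] != J for (i, J) in cl) for cl in clauses):
        n_sol += 1
        for i in range(N):
            counts[i] += (sigma[i] == +1)
exact = [c / n_sol for c in counts]
```

On a tree BP is exact, so `bp` and `exact` agree to machine precision; on loopy factor graphs the same iteration only gives the heuristic approximation mentioned in the text.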
These external biases are then turned into random variables to take into account the randomness in the construction of the factor graphs, and Eq. (26) acquires a distributional meaning. The messages h (resp. u) are supposed to be i.i.d. random variables drawn from a common distribution, the degrees |∂_± i(a)| being two independent Poisson random variables of parameter αk/2. These distributional equations can be numerically solved by a population dynamics algorithm [44], also known as a particle representation in the statistics literature. The typical entropy density is then computed by averaging s over these distributions of h and u.

This description fails in the presence of clustering, which induces correlations between the variable nodes of B_L in the amputated factor graph F \ F_L. To take these correlations into account a refined version of the cavity method (termed one step of replica symmetry breaking, in short 1RSB) has been developed. It relies on the hypothesis that the partition of the solution space into clusters γ has nice decorrelation properties: once decomposed onto this partition, µ restricted to a cluster γ behaves essentially as in the unclustered phase (it is a pure state in statistical mechanics jargon). Each directed edge a→i should thus bear a family of messages u_{a→i}^γ, one for each cluster, or alternatively a distribution Q_{a→i}(u) of the messages with respect to the choice of γ. The equations (26) are thus promoted to recursions between distributions P_{i→a}(h), Q_{a→i}(u), which depend on a real number m known as the Parisi breaking parameter. Its role is to select the size of the investigated clusters, i.e. the number of solutions they contain. The computation of the typical entropy density is indeed replaced by that of a more detailed thermodynamic potential,

Φ(m) = (1/N) ln ∑_γ Z_γ^m = (1/N) ln ∫_{s_−}^{s_+} ds_int e^{N[Σ(s_int) + m s_int]} .   (28)

In this formula Z_γ denotes the number of solutions inside a cluster γ, and we used the hypothesis that at the leading order the number of clusters with internal entropy density s_int is given by exp[N Σ(s_int)]. The complexity function Σ(s_int) can thus be obtained from Φ(m) by an inverse Legendre transform.
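A minimal population dynamics solver for the replica symmetric version of Eq. (26) might look as follows. The population size, sweep count, seed and the elementary Poisson sampler are our own arbitrary choices; real implementations use much larger populations:

```python
import math
import random

def poisson(lam, rng):
    """Knuth's elementary Poisson sampler (fine for small lam)."""
    L, k, p = math.exp(-lam), 0, 1.0
    while p > L:
        k += 1
        p *= rng.random()
    return k - 1

def rs_population(alpha, k, pop_size=2000, sweeps=30, seed=1):
    rng = random.Random(seed)
    pop = [0.0] * pop_size            # population representing P(h)

    def new_u():
        # clause recursion: u built from k-1 fields drawn from the population
        prod = 1.0
        for _ in range(k - 1):
            prod *= (1.0 - math.tanh(rng.choice(pop))) / 2.0
        return -0.5 * math.log(1.0 - prod)

    for _ in range(sweeps * pop_size):
        # variable recursion: h = (sum of "+" u's) - (sum of "-" u's),
        # with Poisson(alpha*k/2) clauses of each sign
        n_plus = poisson(alpha * k / 2, rng)
        n_minus = poisson(alpha * k / 2, rng)
        h = sum(new_u() for _ in range(n_plus)) \
            - sum(new_u() for _ in range(n_minus))
        pop[rng.randrange(pop_size)] = h
    return pop

pop = rs_population(alpha=2.0, k=3)
# by the +/- symmetry of the random ensemble, P(h) is an even distribution
mean_m = sum(math.tanh(h) for h in pop) / len(pop)
```

Averages of interest (for instance the entropy density) are then estimated as empirical means over such populations.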
For generic values of m this approach is computationally very demanding; following the same steps as in the replica symmetric version of the cavity method one faces a distribution (with respect to the topology of the factor graph) of distributions (with respect to the choice of the clusters) of messages. Simplifications however arise for m = 1 and m = 0 [22]; the latter case corresponds in fact to the original Survey Propagation approach of [21]. As appears clearly in Eq. (28), for this value of m all clusters are treated on an equal footing and the dominant contribution comes from the most numerous clusters, independently of their sizes. Moreover, as we further explain in Sec. V C, the structure of the equations can be greatly simplified in this case, the distribution over the clusters of the fields being parametrized by a single number.

E. Finite Size Scaling results
As we explained in Sec. II B the threshold phenomenon can be more precisely described by finite size scaling relations. Let us mention some FSS results about the transitions we just discussed.

FIG. 4: A schematic representation of Eq. (26).
For random 2-SAT, where the satisfiability property is known [50] to exhibit a sharp threshold at α_s = 1, the width of the transition window has been determined in [51]. The range of α where the probability of satisfaction drops significantly is of order N^{−1/3}, i.e. the exponent ν is equal to 3, as for random graph percolation. This similarity is not surprising: the proof of [51] relies indeed on a mapping of 2-SAT formulas onto random (directed) graphs. The clustering transition for XORSAT was first conjectured in [52] (in the related context of error-correcting codes) then proved in [53] to be described by

P(N, M = N(α_d + N^{−1/2} λ + N^{−2/3} δ)) = F(λ) + o(1) ,   (29)

where δ is a subleading shift correction that has been explicitly computed, and the scaling function F is, up to a multiplicative factor on λ, the same error function as in Eq. (4). A general result has been proved in [54] on the width of transition windows. Under rather unrestrictive conditions one can show that ν ≥
2: the transitions cannot be arbitrarily sharp. Roughly speaking the bound is valid when a finite fraction of the clauses are not decisive for the property of the formulas studied; for instance clauses containing a leaf variable are not relevant for the satisfiability of a formula. The number of these irrelevant clauses is of order N and has thus natural fluctuations of order √N; these fluctuations blur the transition window, which cannot be sharper than N^{−1/2}. Several studies (see for instance [33, 55, 56]) have attempted to determine the transition window from numerical evaluations of the probability P(N, α), for instance for the satisfiability threshold of random 3-SAT [55, 56] and XORSAT [33]. These studies are necessarily confined to small formula sizes, as the typical computation cost of complete algorithms grows exponentially around the transition. In consequence the asymptotic regime of the transition window, N^{−1/ν}, is often hidden by subleading corrections which are difficult to evaluate, and in [55, 56] the reported values of ν were found to be in contradiction with the rigorous bound quoted above. This is not an isolated case: numerical studies are often plagued by uncontrolled finite-size effects, as for instance in bootstrap percolation [57], a variation of the classical percolation problem.

IV. LOCAL SEARCH ALGORITHMS
The remainder of this review will be devoted to the study of various solving algorithms for SAT formulas. Algorithms are, to some extent, similar to dynamical processes studied in statistical physics. In this context the focus is however mainly on stochastic processes that respect detailed balance with respect to the Gibbs-Boltzmann measure [58], a condition which is rarely respected by solving algorithms. Physics-inspired techniques can yet be useful, and will emerge in three different ways. The random walk algorithms considered in this Section are stochastic processes in the space of configurations (not fulfilling the detailed balance condition), moving by small steps where one or a few variables are modified. Out-of-equilibrium physics (and in particular growth processes) provides an interesting view on classical complete algorithms (DPLL), as shown in Sec. V B. Finally, the picture of the satisfiable phase put forward in Sec. III underlies the message-passing procedures discussed in Sec. V C.
A. Pure Random Walk SAT: definition and results valid for all instances
Papadimitriou [59] proposed the following algorithm, called Pure Random Walk Sat (PRWSAT) in the following, to solve k-SAT formulas:

1. Choose an initial assignment σ(0) uniformly at random and set T = 0.

2. If σ(T) is a solution of the formula (i.e. E(σ(T)) = 0), output solution and stop. If T = T_max, a threshold fixed beforehand, output undetermined and stop.

3. Otherwise, pick uniformly at random a clause among those that are UNSAT in σ(T); pick uniformly at random one of the k variables of this clause and flip it (reverse its status from True to False and vice-versa) to define the next assignment σ(T+1); set T → T+1 and go back to step 2.

This defines a stochastic process σ(T), a biased random walk in the space of configurations. The modification σ(T) → σ(T+1) in step 3 makes the selected clause satisfied; however the flip of a variable i can turn previously satisfied clauses into unsatisfied ones (those which were satisfied solely by i in σ(T)). This algorithm is not complete: if it outputs a solution one is certain that the formula was satisfiable (and the current configuration provides a certificate of it), but if no solution has been found within the T_max allowed steps one cannot be sure that the formula was unsatisfiable. There are however two rigorous results which make it a probabilistically almost complete algorithm [60]. For k = 2, it was shown in [59] that PRWSAT finds a solution in a time of order O(N^2) with high probability for all satisfiable instances. Hence, one is almost certain that the formula was unsatisfiable if the output of the algorithm is undetermined after T_max = O(N^2) steps. Schöning [61] proposed the following variation for k = 3. If the algorithm fails to find a solution before T_max = 3N steps, instead of stopping and printing undetermined, it restarts from step 1, with a new random initial condition σ(0).
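The three steps above translate directly into code. A straightforward Python sketch (deliberately unoptimized: the set of unsatisfied clauses is recomputed at every step, and the DIMACS-style encoding is our choice):

```python
import random

def prwsat(clauses, n_vars, t_max, seed=0):
    """Pure Random Walk SAT.  `clauses` is a list of lists of signed
    integers, DIMACS style: +i means x_i, -i means NOT x_i (i >= 1).
    Returns a satisfying assignment (booleans indexed from 1) or None
    if none was found within t_max flips."""
    rng = random.Random(seed)
    sigma = [rng.choice([False, True]) for _ in range(n_vars + 1)]   # step 1

    def satisfied(clause):
        return any(sigma[abs(l)] == (l > 0) for l in clause)

    for _ in range(t_max):
        unsat = [c for c in clauses if not satisfied(c)]
        if not unsat:
            return sigma                        # step 2: solution found
        clause = rng.choice(unsat)              # step 3: random UNSAT clause
        v = abs(rng.choice(clause))             # random variable of it
        sigma[v] = not sigma[v]                 # flip
    return None                                 # undetermined

formula = [[1, 2, 3], [-1, 2, -3], [1, -2, 3], [-1, -2, -3]]
assignment = prwsat(formula, n_vars=3, t_max=10000)
```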
Schöning proved that if after R restarts no solution has been found, then the probability that the instance is satisfiable is upper-bounded by exp[−R × (3/4)^N] (asymptotically in N). This means that a computational cost of order (4/3)^N allows one to reduce the probability of error of the algorithm to arbitrarily small values. Note that if the time scaling of this bound is exponential, it is also exponentially smaller than the 2^N cost of an exhaustive enumeration. Improvements on the factor 4/3 have been obtained in [62].

B. Typical behavior on random k-SAT instances

The results quoted above are true for any k-SAT instance. An interesting phenomenology arises when one applies the PRWSAT algorithm to instances drawn from the random k-SAT ensemble [63, 64]. Figure 5 displays the temporal evolution of the number of unsatisfied clauses during the execution of the algorithm, for two random 3-SAT instances of constraint ratio α = 2 and 3. The two curves are very different: at low values of α the energy decays rather fast towards 0, until a point where the algorithm finds a solution and stops. On the other hand, for larger values of α, the energy first decays towards a strictly positive value, around which it fluctuates for a long time, until a large fluctuation reaches 0, signaling the discovery of a solution. A more detailed study with formulas of increasing sizes reveals that a threshold value α_rw (≈ 2.7 for k = 3) sharply separates these two dynamical regimes. In fact the fraction of unsatisfied clauses ϕ = E/M, expressed in terms of the reduced time t = T/M, concentrates in the thermodynamic limit around a deterministic function ϕ(t). For α < α_rw the function ϕ(t) reaches 0 at a finite value t_sol(α, k), which means that the algorithm finds a solution in a linear number of steps, typically close to N t_sol(α, k).
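Schöning's restart bound quoted above is easy to evaluate numerically. The toy computation below (N and the restart counts are arbitrary illustration values of ours) shows how R ~ (4/3)^N restarts already push the error bound exp[−R (3/4)^N] below e^{−1}, and a constant factor more restarts drives it down exponentially:

```python
import math

def schoening_error_bound(N, R):
    """Upper bound on P(instance satisfiable | R failed restarts of 3N steps)."""
    return math.exp(-R * (3 / 4) ** N)

N = 50
R = math.ceil((4 / 3) ** N)        # ~ 1.8e6 restarts for N = 50
bound = schoening_error_bound(N, R)      # <= exp(-1), since R*(3/4)^N >= 1
tighter = schoening_error_bound(N, 10 * R)   # <= exp(-10)
```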
On the contrary, for α > α_rw the reduced energy ϕ(t) reaches a positive value ϕ_as(α, k) as t → ∞; a solution, if any, can be found only through large fluctuations of the energy, which occur on a time scale exponentially large in N. This is an example of a metastability phenomenon, found in several other stochastic processes, for instance the contact process [65]. When the threshold α_rw is reached from below the solving time t_sol(α, k) diverges, while the height of the plateau ϕ_as(α, k) vanishes when α_rw is approached from above. In [63, 64] various statistical mechanics inspired techniques have been applied to study this phenomenology analytically; some results are presented in Figure 6. The low α regime can be tackled by a systematic expansion of t_sol(α, k) in powers of α. The first three terms of these series have been computed, and are shown on the left panel to be in good agreement with the numerical simulations. Another approach was followed to characterize the transition α_rw, and to compute (approximations of) the asymptotic fraction of unsatisfied clauses ϕ_as and the intensity of the fluctuations around it. The idea is to project the Markovian evolution of the configuration σ(T) onto a simpler observable, the energy E(T). Obviously the Markovian property is lost in this transformation, and the dynamics of E(T) is much more complex. One can however approximate it by assuming that all configurations of the same energy E(T) are equiprobable at a given step of execution of the algorithm. This rough approximation of the evolution of E(T) is found to concentrate around its mean value in the thermodynamic limit, as was observed numerically for the original process. Standard techniques allow to compute this average approximated evolution, which exhibits the threshold behavior explained above at a value α = (2^k − 1)/k which is, for k = 3, slightly lower than the threshold α_rw. The right panel of Fig.
6 confronts the results of this approximation with the numerical simulations; given the roughness of the hypothesis the agreement is rather satisfying, and is expected to improve for larger values of k.

FIG. 5: Fraction of unsatisfied constraints ϕ = E/M as a function of reduced time t = T/M during the execution of PRWSAT on random 3-SAT formulas with N = 500 variables. Top: α = 2. Bottom: α = 3.

Rigorous results on the behavior of PRWSAT on random instances are very few. Let us mention in particular [66], which proved that the solving time for random 3-SAT formulas is typically polynomial up to α = 1.63, a result in agreement with, yet weaker than, the numerical results presented here.
C. More efficient variants of the algorithm
The threshold α_rw for linear time solving of random instances by PRWSAT was found above to be much smaller than the satisfiability threshold α_s. It must however be emphasized that PRWSAT is only the simplest example of a large family of local search algorithms, see for instance [67-71]. They all share the same structure: a solution is searched through a random walk in the space of configurations, one variable being modified at each step. The choice of the flipped variable is made according to various heuristics; the goal is to find a compromise between the greediness of the walk, which seeks to minimize locally the energy of the current assignment, and the necessity to allow for moves increasing the energy in order to avoid trapping in local minima of the energy function. A frequently encountered ingredient of the heuristics, of a greedy nature, is focusing: the flipped variable necessarily belongs to at least one unsatisfied clause before the flip, which thus becomes satisfied after the move. Moreover, instead of choosing randomly one of the k variables of the unsatisfied clause, one can consider for each of them the effect of the flip, and avoid variables which, once flipped, will turn satisfied clauses into unsatisfied ones [67, 68]. Another way to implement the greediness [69] consists in bookkeeping the lowest energy found so far during the walk, and forbidding flips which would raise the energy of the current assignment above the registered record plus a tolerance threshold. These demanding requirements have to be balanced with noisy, random steps, allowing to escape traps which are only local minima of the objective function. These more elaborate heuristics are very numerous, and depend on parameters that are finely tuned to achieve the best performances, hence an exhaustive comparison is out of the scope of this review.
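As an illustration of focusing combined with greediness, here is a sketch of a WalkSAT-style step rule. The noise parameter p and the "break count" criterion are standard ingredients of this family of heuristics, but the precise rule below is our own simplified variant, not the exact heuristic of any of [67-71]:

```python
import random

def walksat_step(clauses, sigma, p, rng):
    """One focused move: pick an unsatisfied clause; with probability p
    flip a random variable of it (noise), otherwise flip the variable
    whose flip breaks the fewest currently satisfied clauses (greed).
    Returns False when sigma already satisfies all clauses."""
    def satisfied(c):
        return any(sigma[abs(l)] == (l > 0) for l in c)
    sat_now = [satisfied(c) for c in clauses]
    if all(sat_now):
        return False
    clause = rng.choice([c for c, s in zip(clauses, sat_now) if not s])
    if rng.random() < p:
        v = abs(rng.choice(clause))            # noisy step
    else:
        def breaks(v):                         # clauses broken by flipping v
            sigma[v] = not sigma[v]
            b = sum(1 for c, was in zip(clauses, sat_now)
                    if was and not satisfied(c))
            sigma[v] = not sigma[v]
            return b
        v = min((abs(l) for l in clause), key=breaks)
    sigma[v] = not sigma[v]
    return True

rng = random.Random(0)
clauses = [[1, 2], [-1, 2], [1, -2]]           # unique solution: both True
sigma = [None, False, False]                   # 1-indexed assignment
for _ in range(1000):
    if not walksat_step(clauses, sigma, p=0.5, rng=rng):
        break
```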
Let us only mention that some of these heuristics are reported in [69, 70] to efficiently find solutions of very large random 3-SAT formulas at ratios α very close to the satisfiability threshold.

FIG. 6: Top: linear solving time t_sol(α, k = 3) for random 3-SAT formulas as a function of α; symbols correspond to numerical simulations, the solid line to the second order expansion in α obtained in [63]. Bottom: fraction of unsatisfied constraints reached at large time for α > α_rw for random 3-SAT formulas; symbols correspond to numerical simulations, solid lines to the approximate analytical computations of [63, 64].

V. DECIMATION BASED ALGORITHMS
The algorithms studied in the remainder of the review are of a very different nature compared to the local search procedures described above. Given an initial formula F whose satisfiability has to be decided, they proceed by assigning sequentially the values of some of the variables. The formula can be simplified under such a partial assignment: clauses which are satisfied by at least one of their literals can be removed, while literals unsatisfying a clause are discarded from the clause. It is instructive to consider the following thought experiment: suppose one can consult an oracle who, given a formula, is able to compute the marginal probability of each variable in the uniform probability measure over the optimal assignments of the formula. With the help of such an oracle it would be possible to sample uniformly the optimal assignments of F, by computing these marginals, setting one unassigned variable according to its marginal, and then proceeding in the same way with the simplified formula. A slightly less ambitious, yet still unrealistic, task is to find one optimal configuration (not necessarily uniformly distributed) of F; this can be performed if the oracle is able to reveal, for each formula it is questioned about, which of the unassigned variables take the same value in all optimal assignments, and what this value is. Then it is enough to avoid setting such a constrained variable incorrectly to obtain, at the end, an optimal assignment. Of course such procedures are not meant as practical algorithms; instead of these fictitious oracles one has to resort to simplified evidence gathered from the current formula to guide the choice of the variable to assign. In Sec. V A we consider algorithms exploiting basic information on the number of occurrences of each variable, and their behavior in the satisfiable regime of random SAT formulas. They are turned into complete algorithms by allowing for backtracking of the heuristic choices, as explained in Sec. V B. Finally in Sec.
V C we shall use more refined message-passing sub-procedures to provide the information used in the assignment steps.

A. Heuristic search: the success-to-failure transition
The first algorithm we consider was introduced and analyzed by Franco and his collaborators [72, 73].

1. If the formula contains a unit clause, i.e. a clause with a single variable, this clause is satisfied through the appropriate assignment of its unique variable (propagation). If the formula contains no unit clause, a variable and its truth value are chosen according to some heuristic rule (free choice). Note that unit clause propagation corresponds to the obvious answer an oracle would provide on such a formula.
2. The clauses in which the assigned variable appears are then simplified: satisfied clauses are removed, the other ones are reduced.
3. Resume from step 1.

The procedure ends when one of two conditions is met:

1. The formula is completely empty (all clauses have been removed), and a solution has been found (success).
2. A contradiction is generated from the presence of two opposite unit clauses. The algorithm then halts; we do not know whether a solution exists and has not been found, or whether there is no solution (failure).

The simplest example of a heuristic is called Unit Clause (UC): it consists in choosing a variable uniformly at random among those not yet set, and assigning it to true or false uniformly at random. More sophisticated heuristics can take into account the number of occurrences of each variable and of its negation, or the length of the clauses in which each variable appears, or they can set more than one variable at a time. For example, in the Generalized Unit Clause (GUC) heuristic, the variable is always chosen among those appearing in the shortest clauses.

Numerical experiments and theory show that the results of this procedure applied to random k-SAT formulas with ratio α and size N fall in two regimes:

• At low ratio α < α_H the search procedure finds a solution with positive probability (over the formulas and the random choices of the algorithm) when N → ∞.
• At high ratio α > α_H the probability of finding a solution vanishes when N → ∞.
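The propagation / free-choice loop described above can be sketched as follows. This is a minimal illustrative implementation of the UC heuristic without backtracking; the function name and the signed-integer encoding of literals are our own conventions.

```python
import random

def uc_search(clauses, n_vars, seed=0):
    """Unit-Clause heuristic without backtracking (sketch).
    Returns (status, assignment): status is "success", or "failure" when a
    contradiction (an empty clause) is generated."""
    rng = random.Random(seed)
    clauses = [list(c) for c in clauses]
    assign = {}
    free = set(range(1, n_vars + 1))

    def set_var(var, value):
        assign[var] = value
        free.discard(var)
        new = []
        for c in clauses:
            if any(abs(l) == var and (l > 0) == value for l in c):
                continue                         # satisfied clause: removed
            c = [l for l in c if abs(l) != var]  # unsatisfying literal discarded
            if not c:
                return False                     # empty clause: contradiction
            new.append(c)
        clauses[:] = new
        return True

    while clauses:
        units = [c for c in clauses if len(c) == 1]
        if units:                                # propagation
            lit = units[0][0]
            ok = set_var(abs(lit), lit > 0)
        else:                                    # free choice (UC rule)
            var = rng.choice(sorted(free))
            ok = set_var(var, rng.choice([True, False]))
        if not ok:
            return "failure", None
    for v in free:                               # unconstrained leftovers
        assign[v] = True
    return "success", assign
```

Replacing the free-choice rule (e.g. by picking a variable of a shortest clause) turns this sketch into a GUC-style variant.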
Notice that α_H < α_s: solutions do exist in the range [α_H, α_s] but are not found by this heuristic.

The above algorithm modifies the formula as it proceeds; during the execution of the algorithm the current formula will contain clauses of length 2 and 3 (we specialize here to k = 3 for the sake of simplicity, but higher values of k can be considered). The sub-formulas generated by the search procedure maintain their statistical uniformity (conditioned on the number of clauses of length 2 and 3). Franco and collaborators used this fact to write down differential equations for the evolution of the densities of 2- and 3-clauses as a function of the fraction t of eliminated variables. We do not reproduce those equations here, see [74] for a pedagogical review. Based on this analysis Frieze and Suen [75] were able to calculate, in the limit of infinite size, the probability of successful search. The outcome for the UC heuristic is an explicit expression

P^(UC)_success(α) = exp(−c(α))   (30)

when α < 8/3, and P_success = 0 for larger ratios; the function c(α), whose explicit form can be found in [75], is finite below 8/3 and diverges as α → 8/3. The probability P_success is, as expected, a decreasing function of α; it vanishes at α_H = 8/3 ≈ 2.67. A similar calculation shows that α_H ≃ 3.003 for the GUC heuristic [75].

Franco et al.'s analysis can be recast in the following terms. Under the operation of the algorithm the original 3-SAT formula is turned into a mixed 2+p-SAT formula, where p denotes the fraction of clauses with 3 variables: there are Nα(1−p) 2-clauses and Nαp 3-clauses. This observation motivated the study of the random 2+p-SAT ensemble by statistical mechanics methods [20, 56], some of the results being later confirmed by the rigorous analysis of [76]. At the heuristic level one expects the existence of a p-dependent satisfiability threshold α_s(p), interpolating between the known 2-SAT threshold, α_s(p = 0) = 1, and the conjectured 3-SAT one, α_s(p = 1) ≈ 4.267. An upper bound α_s(p) ≤ 1/(1−p) is easily obtained: for the mixed formula to be satisfiable, the sub-formula obtained by retaining only the clauses of length 2 must necessarily be satisfiable as well. In fact this bound is tight for all values of p ∈ [0, 2/5]. During the execution of the algorithm both the ratio α and the fraction p are 'dynamical' parameters, changing with the fraction t = T/N of variables assigned by the algorithm. They define the coordinates
FIG. 7: Trajectories generated by heuristic search acting on 3-SAT for α = 2 and α = 3.5. For all heuristics, the starting point is on the p = 1 axis, with the initial value of α as ordinate. The curves that end at the origin correspond to UC, those ending on the p = 1 axis correspond to GUC. The thick line represents the satisfiability threshold: the part on the left of the critical point (2/5, 5/3) is exact and coincides with the contradiction line, of equation α = 1/(1−p), where contradictions are generated with high probability, and which is plotted for larger values of p as well; the part on the right of the critical point is only a sketch. When the trajectories hit the satisfiability threshold, at point G for UC and G' for GUC, they enter a region in which massive backtracking takes place, and the trajectory represents the evolution prior to backtracking. The dashed part of the curves is "unphysical", i.e. the trajectories stop when the contradiction curve is reached.

of the representative point of the instance at 'time' t in the (p, α) plane of Figure 7. The motion of the representative point defines the search trajectory of the algorithm. Trajectories start from the point of coordinates p(0) = 1, α(0) = α and end up on the α = 0 axis when a solution is found. The probability of success is positive as long as the 2-SAT subformula is satisfiable, that is, as long as α · (1 − p) <
1. In other words success is possible provided the trajectory does not cross the contradiction line α = 1/(1−p) (Figure 7). The largest initial ratio α such that no crossing occurs defines α_H. Notice that the search trajectory is a stochastic object. However Franco has shown that the deviations from its average locus in the plane vanish in the N → ∞ limit (a concentration phenomenon). Large deviations from the typical behavior can be calculated, e.g. to estimate the probability of success above α_H [77].

The precise form of P_success and the value α_H of the ratio where it vanishes are specific to the heuristic considered (UC in (30)). However the behavior of the probability close to α_H is largely independent of the heuristic (provided it preserves the uniformity of the subformulas generated):

ln P_success(α = α_H(1 − λ)) ∼ −λ^{−1/2} .   (31)

This universality can loosely be interpreted by observing that for α close to α_H the trajectory will pass very close to the contradiction curve α · (1 − p) = 1, which characterizes the locus of the points where the probability that a variable is assigned by the heuristic H vanishes (all the variables being assigned by Unit Propagation). The value of α_H depends on the "shape" of the trajectory far from this curve, and will therefore depend on the heuristic, but the probability of success (i.e. of avoiding the contradiction curve) for values of α close to α_H depends only on the local behavior of the trajectory close to the contradiction curve, a region where most variables are assigned through Unit Propagation, insensitive to the heuristic.

The finite-size corrections to equation (30) are also universal (i.e. independent of the heuristic):

ln P_success(α = α_H(1 − λ), N) ∼ −N^{1/6} F(λ N^{1/3}) ,   (32)

where F is a universal scaling function which can be exactly expressed in terms of the Airy function [78].
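The trajectory analysis underlying these results can be reproduced numerically. The sketch below integrates the clause-flow equations of the UC heuristic, a standard result which we quote without derivation (an assumption of this sketch, not taken verbatim from the text), and recovers the location of the contradiction line.

```python
def uc_trajectory(alpha, steps=100_000):
    """Euler integration of the clause-density flow for the UC heuristic
    (densities per initial variable):
        dc3/dt = -3 c3 / (1 - t)
        dc2/dt = (3/2) c3 / (1 - t) - 2 c2 / (1 - t)
    The trajectory of Fig. 7 is then alpha(t) = (c2 + c3)/(1 - t) and
    p(t) = c3/(c2 + c3)."""
    dt = 1.0 / steps
    c3, c2 = alpha, 0.0
    traj = []
    for k in range(steps - 1):
        t = k * dt
        traj.append((t, c3, c2))
        d3 = -3.0 * c3 / (1.0 - t)
        d2 = (1.5 * c3 - 2.0 * c2) / (1.0 - t)
        c3 += dt * d3
        c2 += dt * d2
    return traj

# The 2-clause density per unassigned variable, rho(t) = c2/(1-t), peaks
# at 3*alpha/8 (reached at t = 1/2); requiring rho_max < 1 recovers the
# success-to-failure threshold alpha_H = 8/3 of the UC heuristic.
rho_max = max(c2 / (1 - t) for t, c3, c2 in uc_trajectory(2.0))
```

For α = 2 the maximum of ρ(t) is 3α/8 = 0.75, safely below the contradiction line ρ = 1, consistent with the success regime α < 8/3.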
This result indicates that right at α_H the probability of success decreases as a stretched exponential ∼ exp(−cst N^{1/6}). The exponent 1/6 suggests that the critical scaling of P_success is related to random graphs. After T = tN steps of the procedure, the sub-formula will consist of C₃, C₂ and C₁ clauses of length 3, 2 and 1 respectively (notice that these are extensive, i.e. O(N), quantities). We can represent the clauses of length 1 and 2 (which are the relevant ones to understand the generation of contradictions) as an oriented graph G in the following way. We have a vertex for each literal, and represent 1-clauses by "marking" the literal appearing in each; a 2-clause is represented by two directed edges, corresponding to the two implications equivalent to the clause (for example, x₁ ∨ x̄₂ is represented by the directed edges x̄₁ → x̄₂ and x₂ → x₁). The average out-degree of the vertices in the graph is γ = C₂/(N − T) = α(t)(1 − p(t)).

What is the effect of the algorithm on G? The algorithm proceeds in "rounds": a variable is set by the heuristic, and a series of Unit Propagations is performed until no more unit clauses are left, at which point a new round starts. Notice that during a round, extensive quantities such as C₁, C₂, C₃ are likely to vary by bounded amounts and γ by O(1/N) (this is the very reason that guarantees that these quantities are concentrated around their mean). At each step of Unit Propagation, a marked literal (say x) is assigned and removed from G, together with all the edges connected to it, and the "descendants" of x (i.e. the literals at the end of outgoing edges) are marked. Also x̄ is removed together with its edges, but its descendants are not marked. Therefore, the marked vertices "diffuse" in a connected component of G following directed edges.
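The oriented graph G and the contradiction mechanism it encodes can be sketched with a toy reachability check; the signed-integer encoding of literals and the function names are our own conventions.

```python
from collections import defaultdict

def implication_graph(two_clauses):
    """Oriented graph on literals built from 2-clauses, as in the text:
    the clause (l1 v l2) yields the directed edges -l1 -> l2 and -l2 -> l1
    (a falsified literal forces the other one)."""
    g = defaultdict(set)
    for l1, l2 in two_clauses:
        g[-l1].add(l2)
        g[-l2].add(l1)
    return g

def reachable(g, src):
    """Set of literals reachable from `src` along directed edges."""
    seen, stack = set(), [src]
    while stack:
        u = stack.pop()
        if u not in seen:
            seen.add(u)
            stack.extend(g[u])
    return seen

def forces_contradiction(g, lit):
    """True if propagating `lit` reaches both x and -x for some variable:
    the mechanism generating failures once the graph percolates."""
    r = reachable(g, lit)
    return any(-u in r for u in r)
```

For instance, the 2-clauses (x̄₁ ∨ x₂) and (x̄₁ ∨ x̄₂) force a contradiction as soon as x₁ is propagated, while propagating x̄₁ is harmless.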
Moreover, at each step new edges, corresponding to clauses of length 3 that get simplified into clauses of length 2, are added to the graph. When γ > 1, G undergoes a directed percolation transition, and a giant component of size O(N) appears, in which it is possible to go from any vertex to any other vertex by following a directed path. When this happens, there is a finite probability that two opposite literals x and x̄ can be reached from some other literal y following directed paths. If ȳ is selected by Unit Propagation, at some time both x and x̄ will be marked, and this corresponds to a contradiction. This simple argument explains more than just the condition γ = α · (1 − p) = 1 for the failure of the heuristic search. It can also be used to explain the exponents in the scaling (32), see [78, 79] for more details.

B. Backtrack-based search: the Davis-Putnam-Logemann-Loveland procedure
The heuristic search procedure of the previous Section can easily be turned into a complete procedure for finding solutions or proving that formulas are not satisfiable. When a contradiction is found the algorithm now backtracks to the last assigned variable (assigned by the heuristic, that is; unit clause propagations are merely consequences of previous assignments), inverts it, and the search resumes. If another contradiction is found the algorithm backtracks to the last-but-one assigned variable, and so on. The algorithm stops either when a solution is found or when all possible backtracks have been unsuccessful, in which case a proof of unsatisfiability is obtained. This algorithm was proposed by Davis, Putnam, Logemann and Loveland and is referred to as DPLL in the following.

The history of the search process can be represented by a search tree, where the nodes represent the variables assigned and the descending edges their values (Figure 8). The leaves of the tree correspond to solutions (S) or to contradictions (C). The analysis of the α < α_H regime in the previous Section leads us to the conclusion that search trees look like Figure 8A at small ratios.

For ratios α > α_H DPLL is very likely to find a contradiction. Backtracking enters into play, and is responsible for a drastic slowing down of the algorithm. The success-to-failure transition of the non-backtracking algorithm turns into a polynomial-to-exponential transition in DPLL. The question is to compute the growth exponent of the average tree size, T ∼ e^{Nτ(α)}, as a function of the ratio α.
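A minimal recursive implementation of DPLL along these lines might look as follows. This is an illustrative sketch only (real solvers use far more elaborate data structures and branching rules); the helper names and the trivial "first variable" free choice are our own.

```python
def simplify(clauses, var, value):
    """Remove satisfied clauses, shorten the others.
    Returns (new_clauses, ok); ok=False signals an empty clause."""
    out = []
    for c in clauses:
        if any(abs(l) == var and (l > 0) == value for l in c):
            continue                         # clause satisfied: removed
        c = [l for l in c if abs(l) != var]  # unsatisfying literal discarded
        if not c:
            return [], False                 # contradiction
        out.append(c)
    return out, True

def dpll(clauses, assign=None):
    """Complete DPLL search: unit propagation, branching, backtracking.
    Returns a satisfying assignment (dict) or None if unsatisfiable."""
    assign = dict(assign or {})
    while True:                              # unit propagation
        unit = next((c[0] for c in clauses if len(c) == 1), None)
        if unit is None:
            break
        clauses, ok = simplify(clauses, abs(unit), unit > 0)
        if not ok:
            return None
        assign[abs(unit)] = unit > 0
    if not clauses:
        return assign                        # empty formula: success
    var = abs(clauses[0][0])                 # free choice (trivial heuristic)
    for value in (True, False):              # second value tried on backtrack
        sub, ok = simplify(clauses, var, value)
        if ok:
            res = dpll(sub, {**assign, var: value})
            if res is not None:
                return res
    return None                              # both branches failed
```

The two iterations of the `for value` loop are exactly the two descending edges of a splitting node in the search tree of Figure 8.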
1. Exponential regime: Unsatisfiable formulas
Consider first the case of unsatisfiable formulas (α > α_s) where all leaves carry contradictions after DPLL halts (Figure 8B). DPLL builds the tree in a sequential manner, adding nodes and edges one after the other, and completing branches through backtracking steps. We can think of the same search tree built in a parallel way [80]. At depth T our tree is composed of L(T) ≤ 2^T branches, each carrying a partial assignment over T variables. Step T consists in assigning one more variable to each branch, according to DPLL rules, that is, through unit propagation or the heuristic rule. In the latter case we speak of a splitting event, as two branches emerge from this node, corresponding to the two possible values of the variable assigned. The possible consequences of this assignment are the emergence of a contradiction (which puts an end to the branch), or the simplification of the attached formula (the branch keeps growing).

(A small amount of backtracking may be necessary to find the solution, since P_success < 1.)

FIG. 8: Search trees generated by DPLL: A. linear, satisfiable (α < α_H); B. exponential, unsatisfiable (α > α_c); C. exponential, satisfiable (α_H < α < α_c). Leaves are marked with S (solutions) or C (contradictions). G is the highest node to which DPLL backtracks, see Figure 7.

The number of branches L(T) is a stochastic variable. Its average value can be calculated as follows [81]. Let us define the average number L(C⃗; T) of branches of depth T which bear a formula containing C₁ (resp. C₂, C₃) clauses of length 1 (resp. 2, 3), with C⃗ = (C₁, C₂, C₃). Initially L(C⃗; 0) = 1 for C⃗ = (0, 0, αN), and 0 otherwise. We shall call M(C⃗′, C⃗; T) the average number of branches described by C⃗′ generated from a C⃗ branch once the T-th variable is assigned [79, 80]. We have 0 ≤ M ≤
2, the extreme values corresponding to a contradiction and to a split respectively. We claim that

L(C⃗′; T + 1) = Σ_{C⃗} M(C⃗′, C⃗; T) L(C⃗; T) .   (33)

Evolution equation (33) could look somewhat suspicious at first sight, due to its similarity with the approximation we sketched in Sec. IV B for the analysis of PRWSAT. Yet, thanks to the linearity of expectation, the correlations between the branches (or better, between the instances carried by the branches) do not matter as far as the average number of branches is concerned.

For large N we expect the number of alive branches (those not hit by a contradiction) to grow exponentially with the depth, or, equivalently,

Σ_{C₁,C₂,C₃} L(C₁, C₂, C₃; T) ∼ e^{Nλ(t) + o(N)} ,  with t = T/N .   (34)

The argument of the exponential, λ(t), can be found using partial differential equation techniques generalizing the ordinary differential equation techniques used for a single branch in the absence of backtracking (Section V A). Details can be found in [81]. The outcome is that λ(t) grows from λ = 0 at t = 0, reaches a maximum value λ_M at some depth t_M, and decreases at larger depths. t_M is the depth in the tree of Figure 8B where most contradictions are found; the number of contradiction leaves is, to exponential order, e^{Nλ_M}. We conclude that the logarithm of the average size of the tree we were looking for is

τ = λ_M .   (35)

For large α ≫ α_s one finds τ = O(1/α), in agreement with the asymptotic scaling of [82]. The calculation can be extended to higher values of k.
2. Exponential regime: Satisfiable formulas
The above calculation holds in the unsatisfiable, exponential phase. How can we understand the satisfiable but exponential regime α_H < α < α_s? The resolution trajectory crosses the SAT/UNSAT critical line α_s(p) at some point G shown in Figure 7. Immediately after G the instance left by DPLL is unsatisfiable. A subtree with all its leaves carrying contradictions will develop below G (Figure 8C). The size τ_G of this subtree can be easily calculated from the above theory, given the knowledge of the coordinates (p_G, α_G) of G. Once this subtree has been built DPLL backtracks to G, flips the attached variable and finally ends up with a solution. Hence the (logarithm of the) number of splits necessary is equal to τ = (1 − t_G) τ_G [80]. Remark that our calculation gives the logarithm of the average subtree size starting from the typical value of G. Numerical experiments show that the resulting value of τ coincides very accurately with the most likely tree size for finding a solution. The reason is that fluctuations in the sizes are mostly due to fluctuations of the highest backtracking point G, that is, of the first part of the search trajectory [77].

C. Message passing algorithms
According to the thought experiment proposed at the beginning of this Section, valuable information could be obtained from the knowledge of the marginal probabilities of the variables in the uniform measure over optimal configurations. This is an inference problem in the graphical model associated to the formula. In this field message passing techniques (for instance Belief Propagation, or the min-sum algorithm) are widely used to compute such marginals approximately [46, 48]. These numerical procedures introduce messages on the directed edges of the factor graph representation of the problem (recall the definitions given in Sec. III D), which are iteratively updated, the new value of a message being computed from the old values of the incoming messages (see Fig. 4). When the underlying graph is a tree, the message updates are guaranteed to converge in a finite number of steps, and provide exact results. In the presence of cycles the convergence of these recurrence equations is not guaranteed; they can however be used heuristically, the iterations being repeated until a fixed point has been reached (within a tolerance threshold). Though very few general results on the convergence in the presence of loops are known [83] (see also [84] for low α random SAT formulas), these heuristic procedures are often found to yield good approximations of the marginals on generic factor graph problems. The interest in this approach for solving random SAT instances was triggered in the statistical mechanics community by the introduction of the Survey Propagation algorithm [21]. Since then several generalizations and reinterpretations of SP have been put forward, see for instance [85–90]. In the following paragraph we present three different message passing procedures, which differ in the nature of the messages passed between nodes, following rather closely the presentation of [47], to which we refer the reader for further details.
We then discuss how these procedures have tobe interleaved with assignment (decimation) steps in order to constitute a solver algorithm. Finally we shall reviewresults obtained in a particular limit case (large α satisfiable formulas).
1. Definition of the message-passing algorithms

• Belief Propagation (BP)
For the sake of readability we recall here the recursive equations (26) stated in Sec. III D for the uniform probability measure over the solutions of a tree formula,

h_{i→a} = Σ_{b∈∂+i(a)} u_{b→i} − Σ_{b∈∂−i(a)} u_{b→i} ,   u_{a→i} = −(1/2) ln[ 1 − Π_{j∈∂a∖i} (1 − tanh h_{j→a})/2 ] ,   (36)

where the h and u messages are reals (positive for u), parametrizing the marginal probabilities (beliefs) for the value of a variable in the absence of some constraint nodes around it (cf. Eq. (25)). These equations can be used in the heuristic way explained above for any formula, and constitute the BP message-passing equations. Note that in the course of the simplification process the degrees of the clauses change; we thus adopt here and in the following the natural convention that sums (resp. products) over empty sets of indices are equal to 0 (resp. 1).

• Warning Propagation (WP)
The above version of the BP equations becomes ill-defined for an unsatisfiable formula, whether this was the case of the original formula or because of some wrong assignment steps; in particular the normalization constant of Eq. (24) vanishes. A way to cure this problem consists in introducing a fictitious inverse temperature β and deriving the BP equations corresponding to the regularized Gibbs-Boltzmann probability law (20), taking as the energy function the number of unsatisfied constraints. In the limit β → ∞, in which the Gibbs-Boltzmann measure concentrates on the optimal assignments, one can single out a part of the information conveyed by the BP equations to obtain the simpler Warning Propagation rules. Indeed the messages h, u are at leading order proportional to β, with proportionality coefficients we shall denote ĥ and û. These messages are less informative than the ones of BP, yet simpler to handle. One finds indeed that instead of reals the WP messages are integers, more precisely ĥ ∈ Z and û ∈ {0, 1}. They obey the following recursive equations (with a structure similar to the ones of BP),

ĥ_{i→a} = Σ_{b∈∂+i(a)} û_{b→i} − Σ_{b∈∂−i(a)} û_{b→i} ,   û_{a→i} = Π_{j∈∂a∖i} I(ĥ_{j→a} < 0) ,   (37)

where I(E) is the indicator function of the event E. The interpretation of these equations goes as follows.
û_{a→i} is equal to 1 if, in all optimal assignments of the amputated formula in which i is constrained only by a, i takes the value satisfying a. This happens when all the other variables of clause a (i.e. ∂a∖i) are required to take values unsatisfying a, hence the form of the second half of (37). In such a case we say that a sends a warning to variable i. In the first half of (37), the message ĥ_{i→a} sent by a variable to a clause is computed by weighing the number of warnings sent by all other clauses; it is in particular negative if a majority of clauses requires i to take the value unsatisfying a.

• Survey Propagation (SP)
The convergence of the BP and WP iterations is not ensured on loopy graphs. In particular the clustering phenomenon described in Sec. III A is likely to spoil the efficiency of these procedures. The Survey Propagation (SP) algorithm introduced in [21] has been designed to deal with such clustered spaces of configurations. The underlying idea is that the simple iterations (of BP or WP type) remain valid inside each cluster of optimal assignments; for each of these clusters γ and each directed edge of the factor graph one has a message h^γ_{i→a} (and u^γ_{a→i}). One then introduces on each edge a survey of these messages, defined as their probability distribution with respect to the choice of the cluster. Some hypotheses are then made on the structure of the cluster decomposition in order to write closed equations on the surveys. We now make this approach explicit in a version adapted to satisfiable instances [47], taking as the basic building block the WP equations. This leads to a rather simple form of the surveys. Indeed û_{a→i} can only take two values, so its probability distribution can be parametrized by a single real δ_{a→i} ∈ [0, 1], the probability that û_{a→i} = 1. Similarly the survey γ_{i→a} is the probability that ĥ_{i→a} < 0. The second half of (37) is readily translated in probabilistic terms,

δ_{a→i} = Π_{j∈∂a∖i} γ_{j→a} .   (38)

The other half of the recursion takes a slightly more complicated form,

γ_{i→a} = (1 − π⁻_{i→a}) π⁺_{i→a} / (π⁺_{i→a} + π⁻_{i→a} − π⁺_{i→a} π⁻_{i→a}) ,  with  π⁺_{i→a} = Π_{b∈∂+i(a)} (1 − δ_{b→i}) ,  π⁻_{i→a} = Π_{b∈∂−i(a)} (1 − δ_{b→i}) .   (39)

In this equation π⁺_{i→a} (resp. π⁻_{i→a}) corresponds to the probability that none of the clauses agreeing (resp. disagreeing) with a on the value of the literal of i sends a warning. For i to be constrained to the value unsatisfying a, at least one of the clauses of ∂−i(a) should send a warning, and none of ∂+i(a), which explains the form of the numerator of γ_{i→a}. The denominator arises from the exclusion of the event that clauses both in ∂+i(a) and ∂−i(a) send warnings, a contradictory event in this version of SP, which is devised for satisfiable formulas.

From the statistical mechanics point of view the SP equations arise from a 1RSB cavity calculation, as sketched in Sec. III D, in the zero temperature limit (β → ∞) and at vanishing Parisi parameter m, these two limits being taken either simultaneously as in [21, 89] or successively [22]. One can thus compute, from the solution of the recursive equations on a single formula, an estimate of its complexity, i.e. the number of its clusters (irrespective of their sizes). The message passing procedure can also be adapted, at the price of technical complications, to unsatisfiable clustered formulas [89]. Note also that the above SP equations have been shown to correspond to the BP ones in an extended configuration space where variables can take a "joker" value [85, 86], mimicking the variables which are not frozen to a single value in all the assignments of a given cluster. Heuristic interpolations between the BP and SP equations have been studied in [86, 87].
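One sweep of the SP updates (38)-(39) can be sketched as follows. This is a naive dense implementation for illustration; the edge bookkeeping, the optional damping parameter, and the assumption that each variable appears at most once per clause are our own choices.

```python
def sp_update(clauses, deltas, damping=0.0):
    """One sweep of Survey-Propagation-style updates (cf. Eqs. (38)-(39)).
    deltas[(a, i)]: probability that clause a sends a warning to variable i.
    Returns the updated dict (no convergence control here)."""
    new = {}
    for a, c in enumerate(clauses):
        for l in c:
            i = abs(l)
            prod = 1.0
            for jl in c:
                if abs(jl) == i:
                    continue
                j = abs(jl)
                # pi+ : no warning from clauses agreeing with a on x_j
                # pi- : no warning from clauses disagreeing with a on x_j
                pi_p = pi_m = 1.0
                for b, cb in enumerate(clauses):
                    if b == a:
                        continue
                    for lb in cb:
                        if abs(lb) == j:
                            if (lb > 0) == (jl > 0):
                                pi_p *= 1.0 - deltas[(b, j)]
                            else:
                                pi_m *= 1.0 - deltas[(b, j)]
                denom = pi_p + pi_m - pi_p * pi_m
                gamma = (1.0 - pi_m) * pi_p / denom if denom > 0 else 0.0
                prod *= gamma            # Eq. (38): delta is a product of gammas
            new[(a, i)] = (1 - damping) * prod + damping * deltas[(a, i)]
    return new
```

In practice one iterates `sp_update` until the messages stop changing (within a tolerance), exactly as described for the heuristic use of these equations on loopy graphs.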
2. Exploiting the information
The information provided by these message passing procedures can be exploited to solve satisfiability formulas: in the algorithm sketched at the beginning of Sec. V A, the heuristic choice of the variable to assign, and of its truth value, can be made according to the results of the message passing on the current formula. If BP were an exact inference algorithm, one could choose any unassigned variable, compute its marginal according to Eq. (27), and draw it according to this probability. Of course BP is only an approximate procedure, hence a practical implementation of this idea should privilege the variables with marginal probabilities closest to a deterministic law (i.e. with the largest |h_i|), motivated by the intuition that these are the least subject to the approximation errors of BP. Similarly, if the message passing procedure used at each assignment step is WP, one can fix the variable with the largest |ĥ_i| to the value corresponding to the sign of ĥ_i. In the case of SP, the solution of the message passing equations is used to compute, for each unassigned variable i, a triplet of numbers (γ⁺_i, γ⁻_i, γ_i) according to

γ⁺_i = (1 − π⁺_i) π⁻_i / (π⁺_i + π⁻_i − π⁺_i π⁻_i) ,  γ⁻_i = (1 − π⁻_i) π⁺_i / (π⁺_i + π⁻_i − π⁺_i π⁻_i) ,  γ_i = 1 − γ⁺_i − γ⁻_i ,  with  π⁺_i = Π_{a∈∂+i} (1 − δ_{a→i}) ,  π⁻_i = Π_{a∈∂−i} (1 − δ_{a→i}) .   (40)

γ⁺_i (resp. γ⁻_i) is interpreted as the fraction of clusters in which σ_i = +1 (resp. σ_i = −
1) in all solutions of the cluster; hence γ_i corresponds to the clusters in which σ_i can take both values. In the version of [47], one then chooses the variable with the largest |γ⁺_i − γ⁻_i|, and fixes it to σ_i = +1 (resp. σ_i = −1) if γ⁺_i > γ⁻_i (resp. γ⁺_i < γ⁻_i). In this way one tries to select an assignment preserving the maximal number of clusters.

Of course many variants of these heuristic rules can be devised; for instance after each message passing computation one can fix a finite fraction of the variables (instead of a single one), allow for some amount of backtracking [91], or increase a soft bias instead of assigning a variable completely [90]. Moreover the tolerance on the level of convergence of the message passing itself can also be adjusted. All these implementation choices affect the performance of the solver, in particular the maximal value of α up to which random SAT instances are solved efficiently, and thus make a precise statement about the limits of these algorithms difficult. In consequence we shall only report the impressive result of [47], which presents an implementation [92] working for random 3-SAT instances up to α = 4.
24 (very close to the conjectured satisfiability threshold α_s ≈ 4.267), for very large instances.

The theoretical understanding of these message-passing-inspired solvers is still poor compared to that of the algorithms studied in Sec. V A, which use much simpler heuristics in their assignment steps. One difficulty is the description of the residual formula after an extensive number of variables have been assigned; because of the correlations between successive steps of the algorithm this residual formula is not uniformly distributed conditioned on a few dynamical parameters, as was the case with (α(t), p(t)) for the simpler heuristics of Sec. V A. One version of BP-guided decimation could however be studied analytically in [93], by means of an analysis of the thought experiment discussed at the beginning of Sec. V. The study of another simple message passing algorithm is presented in the next paragraph.
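The interleaving of message passing and decimation described above can be sketched generically. Here `bias` stands for any of the BP/WP/SP-based polarization rules, and both callback signatures are assumptions of this sketch, not a specific published interface.

```python
def mp_decimation(clauses, n_vars, bias, simplify):
    """Generic message-passing-guided decimation loop (sketch).

    `bias(clauses, unassigned)` runs some message passing on the current
    formula and returns {var: (preferred_value, confidence)};
    `simplify(clauses, var, value)` removes satisfied clauses and shortens
    the rest, returning (new_clauses, ok)."""
    assign, unassigned = {}, set(range(1, n_vars + 1))
    while unassigned and clauses:
        fields = bias(clauses, unassigned)
        # fix the most strongly polarized variable, as in the text
        var = max(fields, key=lambda v: fields[v][1])
        value = fields[var][0]
        clauses, ok = simplify(clauses, var, value)
        if not ok:
            return None          # contradiction: this simple version gives up
        assign[var] = value
        unassigned.discard(var)
    for v in unassigned:
        assign[v] = True         # unconstrained leftovers
    return assign
```

Variants (fixing a finite fraction of variables per sweep, backtracking, soft biases) all fit in this loop by changing the selection and simplification steps.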
3. Warning Propagation on dense random formulas
Feige proved in [94] a remarkable connection between the worst-case complexity of approximation problems and the structure of random 3-SAT formulas at large (but independent of N) values of the ratio α. He introduced the following hardness hypothesis for random 3-SAT formulas:

Hypothesis 1: Even if α is arbitrarily large (but independent of N), there is no polynomial time algorithm that on most 3-SAT formulas outputs UNSAT, and always outputs SAT on a 3-SAT formula that is satisfiable.

He used it to derive hardness of approximation results for various computational problems. As we have seen, these instances are typically unsatisfiable; the problem of interest is thus to recognize efficiently the rare satisfiable instances of the distribution.

A variant of this problem was studied in [95], where WP was proven to be effective in finding solutions of dense planted random formulas (the planted distribution is the uniform distribution conditioned on being satisfied by a given assignment). More precisely, [95] proves that for α large enough (but independent of N), the following holds with probability 1 − e^{−O(α)}:

1. WP converges after at most O(ln N) iterations.
2. If a variable i has ĥ_i ≠ 0, then the sign of ĥ_i is equal to the value of σ_i in the planted assignment. The number of such variables is bigger than N(1 − e^{−O(α)}) (i.e. almost all variables can be reconstructed from the values of ĥ_i).
3. Once these variables are fixed to their correct assignments, the remaining formula can be satisfied in time O(N) (in fact, it is a tree formula).

On the basis of non-rigorous statistical mechanics methods, these results were argued in [96] to remain true when the planted distribution is replaced by the uniform distribution conditioned on being satisfiable. In other words, by iterating WP for a number of iterations bigger than O(ln N) one is able to detect the rare satisfiable instances at large α.
The argument is based on the similarity of structure between the two distributions at large α, namely the existence of a single, small cluster of solutions where almost all variables are frozen to a given value. This correspondence between the two distributions of instances was proven rigorously in [97], where it was also shown that a related polynomial algorithm succeeds with high probability in finding solutions of the satisfiable distribution at large enough density α. These results indicate that a stronger form of Hypothesis 1, obtained by replacing "always" with "with probability p" (with respect to the uniform distribution over the formulas, and possibly to some randomness built in the algorithm), is wrong for any p < 1. However, the validity of Hypothesis 1 is still unknown for random 3-SAT instances. Nevertheless, this result is interesting because it is one of the rare cases in which the performance of a message-passing algorithm could be analyzed in full detail.
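For completeness, the WP update rules underlying these results can be stated compactly: clause a sends a "warning" to variable i when every other variable of a is forced, by the warnings it receives from its other clauses, to the value violating a; the local field ĥ_i then sums the warnings received by i, weighted by the signs of the corresponding literals. A minimal Python sketch (our own naming conventions, for illustration only; not the implementation analyzed in [95]):

```python
def warning_propagation(clauses, n_vars, max_iters=100):
    """Warning Propagation for k-SAT.  clauses: list of tuples of signed
    integers, e.g. (1, -2, 3) means (x1 OR NOT x2 OR x3).  Returns the local
    fields h[1..n_vars]: a positive (negative) h[i] proposes x_i = True
    (False), h[i] == 0 leaves the variable unconstrained."""
    # adjacency: variable -> list of (clause index, sign of its literal)
    adj = {i: [] for i in range(1, n_vars + 1)}
    for a, cl in enumerate(clauses):
        for l in cl:
            adj[abs(l)].append((a, 1 if l > 0 else -1))
    # u[(a, i)] = 1 iff clause a warns variable i: "every other variable
    # of mine is forced to the value that violates me, I rely on you"
    u = {(a, abs(l)): 0 for a, cl in enumerate(clauses) for l in cl}
    for _ in range(max_iters):
        changed = False
        for a, cl in enumerate(clauses):
            for l in cl:
                i = abs(l)
                w = 1
                for m in cl:
                    j, s_aj = abs(m), (1 if m > 0 else -1)
                    if j == i:
                        continue
                    # cavity field of j in the absence of clause a
                    h_cav = sum(s_bj * u[(b, j)] for b, s_bj in adj[j] if b != a)
                    if s_aj * h_cav >= 0:   # j is not forced to violate a
                        w = 0
                        break
                if u[(a, i)] != w:
                    u[(a, i)], changed = w, True
        if not changed:
            break
    # local fields: warnings received, weighted by the literal signs
    h = [0] * (n_vars + 1)
    for a, cl in enumerate(clauses):
        for l in cl:
            h[abs(l)] += (1 if l > 0 else -1) * u[(a, abs(l))]
    return h
```

On the dense planted formulas discussed above, the result of [95] is that after O(ln N) such sweeps the signs of almost all nonzero fields ĥ_i coincide with the planted assignment.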
VI. CONCLUSION
This review was mainly dedicated to the random k-Satisfiability and k-XOR-Satisfiability problems; the approach and results we presented however extend to other random decision problems, in particular random graph q-coloring. This problem consists in deciding whether each vertex of a graph can be assigned one out of q possible colors, without giving the same color to the two extremities of any edge. When the input graphs are randomly drawn from the Erdős–Rényi (ER) ensemble G(N, p = c/N), a phase diagram similar to the one of k-SAT (Section III) is obtained. There exists a colorable/uncolorable phase transition at some critical average degree c_s(q), with for instance c_s(3) ≃ 4.69 [98]. The colorable phase also exhibits the clustering and condensation transitions [99] that we explained on the example of k-Satisfiability. What seems to matter here is the structure of the inputs and the symmetry properties of the decision problem, rather than its specific details. All the input models considered above share a common, underlying ER random graph structure. From this point of view it would be interesting to 'escape' from the ER ensemble and consider more structured graphs, e.g. embedded in a low-dimensional space.

To what extent the similarity between phase diagrams corresponds to similar behaviour in terms of hardness of resolution is an open question. Consider the case of rare satisfiable instances of random k-SAT and k-XORSAT well above their sat/unsat thresholds (Section V). Both problems share very similar statistical features. However, while a simple message-passing algorithm allows one to easily find a (the) solution of the k-SAT problem, this algorithm is inefficient for random k-XORSAT. Actually, the local or decimation-based algorithms of Sections IV and V are efficient at finding solutions to rare satisfiable instances of random k-SAT [100], but none of them works for random k-XORSAT (while the problem is in P!). This example raises the important question of the relationship between the statistical properties of solutions (or quasi-solutions) encoded in the phase diagram and the (average) computational hardness. Very little is known about this crucial point; on intuitive grounds one could expect the clustering phenomenon to prevent an efficient solving of formulas by local search algorithms of the random walk type. This is indeed true for a particular class of stochastic processes [101], those which respect the so-called detailed balance conditions.
This connection between clustering and hardness of resolution for local search algorithms is much less obvious when the detailed balance conditions are not respected, which is the case for most of the efficient variants of PRWSAT.

[1] M. Mézard, G. Parisi, and M. Virasoro, Spin glass theory and beyond (World Scientific, Singapore, 1987).
[2] C. Papadimitriou and K. Steiglitz, Combinatorial Optimization: Algorithms and Complexity (Dover, New York, 1998).
[3] Y. Fu and P. W. Anderson, Journal of Physics A: Mathematical and General, 1605 (1986).
[4] D. Mitchell, B. Selman, and H. Levesque (1992), no. 459 in Proceedings of the Tenth National Conference on Artificial Intelligence.
[5] J. Hertz, A. Krogh, and R. Palmer, Introduction to the theory of neural computation, Santa Fe Institute Studies in the Science of Complexity (Addison-Wesley, Redwood City (CA), 1991).
[6] T. Cover, IEEE Transactions on Electronic Computers, 326 (1965).
[7] S. Janson, T. Łuczak, and A. Ruciński, Random graphs (John Wiley and Sons, New York, 2000).
[8] E. Friedgut, Journal of the American Mathematical Society, 1017 (1999).
[9] O. Dubois, Theoret. Comput. Sci., 187 (2001).
[10] J. Franco, Theoret. Comput. Sci., 147 (2001).
[11] D. Achlioptas and Y. Peres, Journal of the American Mathematical Society, 947 (2004).
[12] Chapter random sat, this volume.
[13] N. Alon and J. Spencer, The probabilistic method (John Wiley and Sons, New York, 2000).
[14] A. Dembo and O. Zeitouni, Large deviations. Theory and applications (Springer, Berlin, 1998).
[15] W. Krauth and M. Mézard, J. Physique, 3057 (1989).
[16] S. K. Ma, Statistical Mechanics (World Scientific, Singapore, 1985).
[17] K. Huang, Statistical Mechanics (John Wiley and Sons, New York, 1990).
[18] A. Broder, A. Frieze, and E. Upfal (1993), no. 322 in Proceedings of the Fourth Annual ACM-SIAM Symposium on Discrete Algorithms.
[19] R. Monasson and R. Zecchina, Phys. Rev. E, 1357 (1997).
[20] G. Biroli, R. Monasson, and M. Weigt, Eur. Phys. J. B, 551 (2000).
[21] M. Mézard and R. Zecchina, Phys. Rev. E, 056126 (2002).
[22] F. Krzakala, A. Montanari, F. Ricci-Tersenghi, G. Semerjian, and L. Zdeborová, Proceedings of the National Academy of Sciences, 85 (1994).
[24] T. R. Kirkpatrick and D. Thirumalai, Phys. Rev. B, 5388 (1987).
[25] M. Talagrand, Spin glasses: a challenge for mathematicians (Springer, Berlin, 2003).
[26] D. Panchenko and M. Talagrand, Probab. Theory Relat. Fields, 319 (2004).
[27] S. Franz and M. Leone, J. Stat. Phys., 535 (2003).
[28] M. Mézard, F. Ricci-Tersenghi, and R. Zecchina, J. Stat. Phys., 505 (2003).
[29] S. Cocco, O. Dubois, J. Mandler, and R. Monasson, Phys. Rev. Lett., 047205 (2003).
[30] M. Mézard, T. Mora, and R. Zecchina, Physical Review Letters, 197205 (2005).
[31] H. Daudé, M. Mézard, T. Mora, and R. Zecchina (2005), arXiv:cond-mat/0506053.
[32] D. Achlioptas and F. Ricci-Tersenghi, Proceedings of the Thirty-Eighth Annual ACM Symposium on Theory of Computing (2006), arXiv:cs.CC/0611052.
[33] F. Ricci-Tersenghi, M. Weigt, and R. Zecchina, Phys. Rev. E, 026702 (2001).
[34] B. Pittel, J. Spencer, and N. Wormald, J. Comb. Theory, Ser. B, 111 (1996).
[35] T. Kurtz, J. Appl. Probab., 49 (1970).
[36] A. Montanari and G. Semerjian, J. Stat. Phys., 103 (2006).
[37] T. Mora and M. Mézard, Journal of Statistical Mechanics: Theory and Experiment, P10007 (2006).
[38] S. Mertens, M. Mézard, and R. Zecchina, Random Struct. Algorithms, 340 (2006).
[39] M. Mézard, M. Palassini, and O. Rivoire, Physical Review Letters, 200202 (2005).
[40] A. Montanari, G. Parisi, and F. Ricci-Tersenghi, Journal of Physics A: Mathematical and General, 2073 (2004).
[41] T. Mora and L. Zdeborová (2007), arXiv:0710.3804.
[42] G. Semerjian, J. Stat. Phys., 251 (2008).
[43] R. Monasson, Journal of Physics A: Mathematical and General, 513 (1998).
[44] M. Mézard and G. Parisi, Eur. Phys. J. B, 217 (2001).
[45] M. Mézard and G. Parisi, J. Stat. Phys., 1 (2003).
[46] F. R. Kschischang, B. J. Frey, and H.-A. Loeliger, IEEE Trans. Inf. Theory, 498 (2001).
[47] A. Braunstein, M. Mézard, and R. Zecchina, Random Struct. Algorithms, 201 (2005).
[48] J. S. Yedidia, W. T. Freeman, and Y. Weiss, Advances in Neural Information Processing Systems, 689 (2001).
[49] J. S. Yedidia, W. T. Freeman, and Y. Weiss, in Exploring Artificial Intelligence in the New Millennium (2003), p. 239.
[50] W. Fernandez de la Vega, Theor. Comput. Sci., 131 (2001).
[51] B. Bollobás, C. Borgs, J. T. Chayes, J. H. Kim, and D. B. Wilson, Random Struct. Algorithms, 201 (2001).
[52] A. Amraoui, A. Montanari, T. Richardson, and R. Urbanke, arXiv:cs.IT/0406050 (2004).
[53] A. Dembo and A. Montanari, arXiv:math.PR/0702007 (2007).
[54] D. B. Wilson, Random Struct. Algorithms, 182 (2002).
[55] S. Kirkpatrick and B. Selman, Science, 1297 (1994).
[56] R. Monasson, R. Zecchina, S. Kirkpatrick, B. Selman, and L. Troyansky, Random Struct. Algorithms, 414 (1999).
[57] P. De Gregorio, A. Lawlor, P. Bradley, and K. Dawson, PNAS, 5669 (2005).
[58] L. Cugliandolo, in Slow relaxations and nonequilibrium dynamics in condensed matter, edited by J. L. Barrat, M. Feigelman, J. Kurchan, and J. Dalibard (Springer-Verlag, Les Houches, France, 2003).
[59] C. Papadimitriou, in Proceedings of the 32nd Annual Symposium on Foundations of Computer Science (1991), pp. 163–169.
[60] R. Motwani and P. Raghavan, Randomized algorithms (Cambridge University Press, Cambridge, 1995).
[61] U. Schöning, Algorithmica, 615 (2002).
[62] S. Baumer and R. Schuler, Lecture Notes in Computer Science, 150 (2004).
[63] G. Semerjian and R. Monasson, Phys. Rev. E, 066103 (2003).
[64] W. Barthel, A. K. Hartmann, and M. Weigt, Phys. Rev. E, 066104 (2003).
[65] T. M. Liggett, Interacting particle systems (Springer, Berlin, 1985).
[66] M. Alekhnovich and E. Ben-Sasson, SIAM Journal on Computing, 1248 (2006).
[67] B. Selman, H. A. Kautz, and B. Cohen, in Proceedings of the Twelfth National Conference on Artificial Intelligence (AAAI'94) (Seattle, 1994), pp. 337–343.
[68] D. McAllester, B. Selman, and H. Kautz, in Proceedings of the Fourteenth National Conference on Artificial Intelligence (AAAI'97) (Providence, Rhode Island, 1997), pp. 321–326.
[69] S. Seitz, M. Alava, and P. Orponen, Journal of Statistical Mechanics: Theory and Experiment, P06006 (2005).
[70] J. Ardelius and E. Aurell, Physical Review E, 037702 (2006).
[71] M. Alava, J. Ardelius, E. Aurell, P. Kaski, S. Krishnamurthy, P. Orponen, and S. Seitz (2007), arXiv:0711.4902.
[72] M.-T. Chao and J. Franco, SIAM J. Comput., 1106 (1986).
[73] M.-T. Chao and J. Franco, Inf. Sci., 289 (1990).
[74] D. Achlioptas, Theor. Comput. Sci., 159 (2001).
[75] A. Frieze and S. Suen, J. Algorithms, 312 (1996).
[76] D. Achlioptas, L. Kirousis, E. Kranakis, and D. Krizanc, Theor. Comput. Sci., 109 (2001).
[77] S. Cocco and R. Monasson, Ann. Math. Artif. Intell., 153 (2005).
[78] C. Deroulers and R. Monasson, Europhysics Letters, 153 (2004).
[79] R. Monasson, in Complex Systems, edited by J. P. Bouchaud, M. Mézard, and J. Dalibard (Elsevier, Les Houches, France, 2007).
[80] S. Cocco and R. Monasson, Phys. Rev. Lett., 1654 (2001).
[81] R. Monasson, A generating function method for the average-case analysis of DPLL, Lecture Notes in Computer Science 3624, 402–413 (2005).
[82] P. Beame, R. Karp, T. Pitassi, and M. Saks, SIAM Journal of Computing, 1048 (2002).
[83] S. Tatikonda and M. Jordan, in Proc. Uncertainty in Artificial Intelligence (2002), vol. 18, pp. 493–500.
[84] A. Montanari and D. Shah, in SODA (2007), pp. 1255–1264.
[85] A. Braunstein and R. Zecchina, Journal of Statistical Mechanics: Theory and Experiment, P06007 (2004).
[86] E. Maneva, E. Mossel, and M. J. Wainwright, in SODA '05: Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms (Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2005), pp. 1089–1098.
[87] E. Aurell, U. Gordon, and S. Kirkpatrick, in NIPS (2004).
[88] G. Parisi (2003), arXiv:cs.CC/0301015.
[89] D. Battaglia, M. Kolář, and R. Zecchina, Phys. Rev. E, 036107 (2004).
[90] J. Chavas, C. Furtlehner, M. Mézard, and R. Zecchina, Journal of Statistical Mechanics: Theory and Experiment, P11016 (2005).
[91] G. Parisi (2003), arXiv:cond-mat/0308510.
[92] URL.
[93] A. Montanari, F. Ricci-Tersenghi, and G. Semerjian (2007), arXiv:0709.1667, to be published in the Proceedings of the 45th Allerton Conference (2007).
[94] U. Feige, in STOC (2002), pp. 534–543.
[95] U. Feige, E. Mossel, and D. Vilenchik, Complete convergence of message passing algorithms for some satisfiability problems, Lecture Notes in Computer Science 4110, 339–350 (2006).
[96] F. Altarelli, R. Monasson, and F. Zamponi, Journal of Physics A: Mathematical and Theoretical, 867 (2007).
[97] A. Coja-Oghlan, M. Krivelevich, and D. Vilenchik, Why almost all k-CNF formulas are easy, to appear (2007).
[98] F. Krzakala, A. Pagnani, and M. Weigt, Phys. Rev. E, 046705 (2004).
[99] L. Zdeborová and F. Krzakala, Physical Review E, 031131 (2007).
[100] W. Barthel, A. K. Hartmann, M. Leone, F. Ricci-Tersenghi, M. Weigt, and R. Zecchina, Phys. Rev. Lett., 188701 (2002).
[101] A. Montanari and G. Semerjian, J. Stat. Phys. 125.