Palette-colouring: a belief-propagation approach

Abstract

We consider a variation of the prototype combinatorial-optimisation problem known as graph-colouring. Our optimisation goal is to colour the vertices of a graph with a fixed number of colours, so as to maximise the number of different colours present in the set of nearest neighbours of each given vertex. This problem, which we pictorially call "palette-colouring", has recently been addressed as a basic example of a problem arising in the context of distributed data storage. Even though it has not been proved to be NP-complete, random search algorithms find the problem hard to solve. Heuristics based on a naive belief propagation algorithm are observed to work quite well in certain conditions. In this paper, we build upon the mentioned result, working out the correct belief propagation algorithm, which needs to take into account the many-body nature of the constraints present in this problem. This method improves on the naive belief propagation approach, at the cost of increased computational effort. We also investigate the emergence of a satisfiable to unsatisfiable "phase transition" as a function of the vertex mean degree, for different ensembles of sparse random graphs in the large size ("thermodynamic") limit.


arXiv: [cond-mat.stat-mech], Apr

Alessandro Pelizzola, Marco Pretti ¶, and Jort van Mourik

Dipartimento di Fisica, CNISM and Center for Computational Studies, Politecnico di Torino – Corso Duca degli Abruzzi 24, I-10129 Torino, Italy
INFN, Sezione di Torino, Torino, Italy
HuGeF Torino – Via Nizza 52, I-10126 Torino, Italy
CNR – Consiglio Nazionale delle Ricerche, Istituto dei Sistemi Complessi, and CNISM, Dipartimento di Fisica, Politecnico di Torino – Corso Duca degli Abruzzi 24, I-10129 Torino, Italy
Non-Linearity and Complexity Research Group, Aston University, Birmingham B4 7ET, UK

¶ To whom correspondence should be addressed ([email protected])

1. Introduction

Graph colouring is a prototype of combinatorial optimisation and constraint satisfaction problems [1]. It is NP-complete, so that it can be taken as a benchmark for optimisation algorithms. Moreover, it is at the core of a large number of technologically relevant combinatorial problems, such as scheduling. The goal is to assign a colour to each vertex of a given graph (with a fixed number of available colours), in such a way that no pair of vertices connected by an edge have the same colour. Alternatively, one may be satisfied with a suboptimal solution, i.e., minimising the number of connected vertex pairs with the same colour.

A nice variant of the above problem has been recently proposed and investigated by Bounkong and coworkers [2, 3]. The variation consists in requiring that the set of colours assigned to each given vertex and its neighbours includes all available colours. The latter problem, which we pictorially call palette-colouring, has been suggested as a basic example of a constraint satisfaction problem arising in the context of distributed data storage [2]. The basic idea is as follows. On a computer network with limited storage resources at each node, it may be convenient to divide a file into a number of segments (colours), which are then distributed over different nodes. Each given node should be able to retrieve the different segments by accessing only its own and nearest neighbour storage devices, whence the above described constraints. Even in this case, one might be satisfied with a suboptimal solution, i.e., maximising the number of colours present in each node neighbourhood. We note that palette-colouring has not been proved to be NP-complete, but there is numerical evidence that it becomes intractable for large system size [2]. With respect to ordinary colouring, the most relevant difference is that the modified problem becomes easier to solve for graphs with higher, rather than lower, vertex degrees.

In the last few years, different types of constraint satisfaction problems have been tackled by message-passing techniques, including Belief Propagation (BP) [4, 5]. BP was originally conceived as a dynamic programming algorithm to perform exact statistical inference for Markov random field models defined on graphs without loops (trees) [6, 7]. Subsequently, it was shown to perform relatively well even on loopy graphs. This success seems to be related to the fact that BP is equivalent to determining a minimum of an approximate free energy function (the Bethe free energy) for a corresponding thermodynamic system. The Bethe approximation has long been well known to physicists [8, 9], but the connection with BP is a relatively recent result [4].

In [2], Bounkong, van Mourik, and Saad analyse an algorithm based on BP, comparing its performance with a variant of Walksat [10]. In particular, the BP-based algorithm makes use of beliefs averaged over several iterations, together with a common decimation strategy. It is observed that, while Walksat works definitely better for small graphs (100 vertices), the opposite occurs for larger (1000 vertices) random graphs. This result is somehow related to the nature of BP itself, since large [...] not a generalised BP [13]. Indeed, in the literature, the latter term usually denotes a class of algorithms computing the minima of more refined free energy approximations (Kikuchi [14], rather than Bethe, free energies) [15]. Here, however, we derive an algorithm computing the correct Bethe approximation, which is exact for loopless graphs. We then compare the performance of the new BP algorithm (which we shall simply call BP from now on) to the naive one, showing that further improvements can be obtained.

Let us note that the correct Bethe approximation has already been considered for this problem by Wong and Saad [3], in order to investigate the emergence and nature of the satisfiable to unsatisfiable transition, observed upon decreasing the mean vertex degree of different sparse random graphs. In the replica-symmetry assumption, the authors of the cited paper study average macroscopic properties of a given random graph ensemble, making use of a numerical method of the population dynamics type. In the current work, we mainly focus on the algorithmic properties of the message-passing procedure, and related decimation strategies. In particular, we discuss both analytical and numerical strategies for limiting the increase of computational cost arising from the pairwise messages. Also, in the last part of the paper, we develop the distributional version of the message-passing scheme, which in the literature is usually denoted as the cavity method [16] and used to study random (glass-like) systems [17]. We limit this analysis to the replica-symmetry assumption and to the simple colour-symmetric (paramagnetic) solution. Within these simplifying hypotheses, we compute the quenched entropy for given random graph ensembles, and estimate the corresponding satisfiability threshold, partially recovering a result of [3].

2. Statement of the problem and Belief Propagation

We consider an undirected simple graph, whose vertices are denoted by $i = 1, \ldots, N$. Our goal is to assign to each vertex $i$ a colour $x_i$ from a given colour set $C \equiv \{1, 2, \ldots, q\}$, so as to minimise the cost function (energy)

$$E(x_1, \ldots, x_N) = \sum_{i=1}^{N} \eta(x_i, x_{\partial i}), \tag{1}$$

where $\partial i$ denotes the neighbourhood of $i$ (i.e., the set of vertices directly connected to $i$ by an edge), and $x_{\partial i} \equiv \{x_j\}_{j \in \partial i}$ the array of colour variables in $\partial i$. The elementary energy term $\eta(x_i, x_{\partial i})$ counts the number of missing colours in the neighbourhood of $i$, including $i$ itself. A suitable expression for the $\eta$ function is therefore

$$\eta(x_1, \ldots, x_n) = \sum_{x \in C} \prod_{i=1}^{n} \left[ 1 - \delta(x_i, x) \right], \tag{2}$$

where $\delta(x, y)$ is a Kronecker delta, and $n$ is the number of entries of the $\eta$ function (not fixed a priori). With the above definitions, the cost function value is $E(x_1, \ldots, x_N) = 0$ if and only if the colour assignments $x_1, \ldots, x_N$ satisfy all constraints.

In the current work, we deal with this problem by studying an equivalent "thermodynamic" system, whose potential energy is defined by the cost function $E(x_1, \ldots, x_N)$. For energy minimisation, we consider the zero temperature limit. The BP approach allows us to determine approximate marginals of the equilibrium (Boltzmann) probability distribution for the colour variables. As mentioned in the Introduction, our approximation becomes exact when the graph is a tree. From the treatment described in Appendix A, it turns out that we can write two different marginals, namely, the joint distribution of two colour variables on a graph edge $p_{i,j}(x_i, x_j)$, and the joint distribution of a given colour variable together with its neighbours $p_{i,\partial i}(x_i, x_{\partial i})$ ("cluster" distribution), as a function of pairwise messages $m_{j \to i}(x_j, x_i)$.
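As an illustration of the cost function (1) with the elementary term (2), the following Python sketch evaluates the number of missing colours for a given assignment. The adjacency-dictionary representation, the function names, and the labelling of colours as 0, ..., q - 1 (rather than 1, ..., q) are assumptions made for this example only.

```python
from typing import Dict, List, Sequence


def eta(cluster: Sequence[int], q: int) -> int:
    """Eq. (2): sum over colours x of prod_i [1 - delta(x_i, x)].

    Each colour absent from the cluster contributes 1, so the result
    is the number of missing colours (colours labelled 0..q-1 here).
    """
    return sum(all(xi != x for xi in cluster) for x in range(q))


def palette_energy(adj: Dict[int, List[int]], x: Dict[int, int], q: int) -> int:
    """Eq. (1): sum over vertices i of eta(x_i, x_{neighbours of i})."""
    return sum(eta([x[i]] + [x[j] for j in adj[i]], q) for i in adj)


if __name__ == "__main__":
    triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
    print(palette_energy(triangle, {0: 0, 1: 1, 2: 2}, q=3))
    print(palette_energy(triangle, {0: 0, 1: 0, 2: 0}, q=3))
```

On a triangle with three colours, a proper 3-colouring has zero energy, whereas a monochromatic assignment misses two colours at each of the three vertices.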
Each given term $m_{j \to i}(x_j, x_i)$ may be viewed as a message sent from the cluster $\{j, \partial j\}$ to the edge $\{i, j\}$, representing the influence of the constraint associated to the vertex $j$ onto the colour variables of the edge $\{i, j\}$ (some details about this interpretation are elucidated in Appendix B). In formulae, we have

$$p_{i,j}(x_i, x_j) = e^{f_{ij}} \, m_{i \to j}(x_i, x_j) \, m_{j \to i}(x_j, x_i), \tag{3}$$

$$p_{i,\partial i}(x_i, x_{\partial i}) = e^{f_i - \beta \eta(x_i, x_{\partial i})} \prod_{j \in \partial i} m_{j \to i}(x_j, x_i), \tag{4}$$

where $\beta$ is the inverse temperature, and $f_{ij}$ and $f_i$, usually called free energy shifts (see Appendix A), can be determined by normalisation as

$$e^{-f_{ij}} = \sum_{x_i, x_j} m_{i \to j}(x_i, x_j) \, m_{j \to i}(x_j, x_i), \tag{5}$$

$$e^{-f_i} = \sum_{x_i, x_{\partial i}} e^{-\beta \eta(x_i, x_{\partial i})} \prod_{j \in \partial i} m_{j \to i}(x_j, x_i). \tag{6}$$

The messages have to satisfy a set of self-consistency equations, which basically account for compatibility between "overlapping" distributions. For instance, the $\{i, j\}$ edge distribution must be a marginal of the cluster distributions associated to both vertices $i$ and $j$. Considering the former case, we can write

$$p_{i,j}(x_i, x_j) = \sum_{x_{\partial i \setminus j}} p_{i,\partial i}(x_i, x_{\partial i}), \tag{7}$$

where the sum runs over the values of the array of colour variables $x_{\partial i \setminus j} \equiv \{x_k\}_{k \in \partial i \setminus j}$, i.e., the colour variables in the neighbourhood of $i$ except $x_j$. In fact, we can obtain the self-consistency equation by substituting (3) and (4) into the compatibility equation (7), yielding

$$m_{i \to j}(x_i, x_j) \propto \sum_{x_{\partial i \setminus j}} e^{-\beta \eta(x_i, x_{\partial i})} \prod_{k \in \partial i \setminus j} m_{k \to i}(x_k, x_i), \tag{8}$$

where a normalisation factor has been replaced by the proportionality symbol. In order to satisfy all the necessary compatibilities, one equation of the above form must hold for each directed edge $i \to j$.
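A single update according to the self-consistency equation (8) can be sketched by brute-force enumeration, as below. This is only an illustrative Python sketch, not the implementation used in the paper: the storage convention `msgs[(k, i)][x_k][x_i]` for $m_{k \to i}(x_k, x_i)$, the colour labels 0, ..., q - 1, and the function names are all assumptions.

```python
import itertools
import math


def eta(cluster, q):
    """Eq. (2): number of colours in 0..q-1 missing from the cluster."""
    return sum(all(xi != x for xi in cluster) for x in range(q))


def bp_update_generic(i, j, adj, msgs, q, beta):
    """Normalised right-hand side of eq. (8): the message m_{i->j}(x_i, x_j).

    adj[i] lists the neighbours of i; msgs[(k, i)][x_k][x_i] holds the
    incoming message m_{k->i}(x_k, x_i) (hypothetical storage layout).
    """
    others = [k for k in adj[i] if k != j]          # neighbours of i except j
    new = [[0.0] * q for _ in range(q)]
    for xi in range(q):
        for xj in range(q):
            total = 0.0
            # Sum over all colourings of the other neighbours of i.
            for xs in itertools.product(range(q), repeat=len(others)):
                weight = math.exp(-beta * eta((xi, xj) + xs, q))
                for k, xk in zip(others, xs):
                    weight *= msgs[(k, i)][xk][xi]
                total += weight
            new[xi][xj] = total
    norm = sum(map(sum, new))
    return [[v / norm for v in row] for row in new]
```

For a vertex of degree $q - 1$ at low temperature, the resulting message is strongly suppressed on the diagonal $x_i = x_j$, since the cluster can then contain at most $q - 1$ different colours.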
The BP algorithm solves the set of self-consistency equations iteratively, starting from suitable (usually random or uniform) initial conditions for the messages, until the distance between messages at subsequent updates goes below a given threshold. From a heuristic point of view, each message update according to (8) is usually interpreted as a propagation process, so that in the following we shall also refer to (8) as the propagation equation. For completeness, in Appendix B we also report the propagation equations of the naive BP algorithm, which are numerically simpler.

We note that, by employing the explicit expression (2) of the elementary energy term (cluster energy), we can significantly reduce the computational cost of the propagation equation (8) as well. Indeed, it turns out that the latter can be rewritten as

$$m_{i \to j}(x_i, x_j) \propto \sum_{B \subseteq C \setminus x_i \setminus x_j} \left( e^{-\beta} - 1 \right)^{|B|} \prod_{k \in \partial i \setminus j} \; \sum_{x_k \in C \setminus B} m_{k \to i}(x_k, x_i), \tag{9}$$

where the outer sum runs over all the possible subsets $B$ of the colour set $C$ without the colours $x_i$, $x_j$. The derivation can be found in Appendix C. We now compare the computational cost of the generic equation with that of the simplified form. Assuming that $d$ is the degree of vertex $i$, the generic equation (8) requires $(d-1)\,q^{d-1}$ multiplications, which can be reduced to $2q^{d-1} + \sum_{n=2}^{d-2} q^{d-n}$ by suitable (straightforward) programming tricks. Taking into account that a trivial necessary condition for an elementary constraint to be satisfiable is $d \geq q - 1$, the leading term of the computational cost turns out to be at least $q^{q-2}$. The simplified equation (9), however, requires $(d-1)\,2^{q-2}$ multiplications, which is clearly much more convenient for any $q > 2$. The cluster normalisation (6) can be simplified in the same way, giving

$$e^{-f_i} = \sum_{x_i \in C} \; \sum_{B \subseteq C \setminus x_i} \left( e^{-\beta} - 1 \right)^{|B|} \prod_{j \in \partial i} \; \sum_{x_j \in C \setminus B} m_{j \to i}(x_j, x_i), \tag{10}$$

which can be obtained by an analogous derivation.
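The subset expansion of equation (9) can be sketched in Python as follows. The sketch assumes the same hypothetical conventions as before (messages stored as `msgs[(k, i)][x_k][x_i]`, colours labelled 0, ..., q - 1); it is meant to show the structure of the sum over subsets $B$, not to reproduce the authors' code.

```python
import itertools
import math


def bp_update_subsets(i, j, adj, msgs, q, beta):
    """Normalised right-hand side of eq. (9): the message m_{i->j}(x_i, x_j)."""
    others = [k for k in adj[i] if k != j]           # neighbours of i except j
    factor = math.exp(-beta) - 1.0                   # (e^{-beta} - 1)
    new = [[0.0] * q for _ in range(q)]
    for xi in range(q):
        for xj in range(q):
            # Colours that may still be missing: C without x_i and x_j.
            free = [x for x in range(q) if x not in (xi, xj)]
            total = 0.0
            for size in range(len(free) + 1):
                for B in itertools.combinations(free, size):
                    allowed = [x for x in range(q) if x not in B]   # C \ B
                    term = factor ** size
                    for k in others:
                        # Inner sum of eq. (9) over x_k in C \ B.
                        term *= sum(msgs[(k, i)][xk][xi] for xk in allowed)
                    total += term
            new[xi][xj] = total
    norm = sum(map(sum, new))
    return [[v / norm for v in row] for row in new]
```

The number of subsets visited for each message entry is at most $2^{q-1}$, independently of the vertex degree, whereas brute-force enumeration of (8) grows exponentially with the degree. Note that the expansion is an alternating sum, so at large $\beta$ and large degree some loss of floating-point precision should be expected.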

3. Optimisation strategy and numerical results

In this section, we define the optimisation strategy, and test its performance on single instances of random graphs drawn from a suitable ensemble. Our strategy involves a decimation procedure, which is analogous to that of [2], but is carried out on the basis of nearest-neighbour pair distributions $p_{i,j}(x_i, x_j)$, rather than single-variable distributions. Given a graph and a number $q$ of available colours, we first fix the colour of a randomly chosen vertex, in order to break the colour permutation symmetry, and proceed as follows. We perform the first BP run (starting from uniform messages) and determine the pair distributions according to (3). For each edge $\{i, j\}$, we fix the colour variables $x_i, x_j$ at the values $\bar{x}_i, \bar{x}_j$ having the largest joint probability, provided the latter is larger than a certain threshold. If no probability satisfies such a condition, we only fix the pair of variables with the largest joint probability over the whole graph. Then, we rerun BP (starting from the previously computed messages) and iterate the above procedure until all variables are fixed, or all constraints are satisfied (in the latter case, non-fixed variables can be assigned a random colour). We always set the threshold probability at 0.9, as done in [2]. We observe that, in most cases, one of the two variables chosen to be fixed has already been fixed at a previous stage of the decimation procedure, so that we actually fix just one variable for each given pair. Therefore, even though we are working with pair, rather than single-variable distributions, we observe that choosing the same threshold probability results in a similar decimation rate.

We now clarify the precise meaning of "fixing a variable", as introduced above, from the point of view of the message-passing procedure. In the thermodynamic language, colouring a vertex is tantamount to imposing an infinite energy penalty on all other possible colours. Thus, if we want to fix a single variable $x_i$ to a given colour $\bar{x}_i$, we may add to the corresponding cluster energy $\eta(x_i, x_{\partial i})$ a term $\gamma [1 - \delta(x_i, \bar{x}_i)]$, and then take the limit $\gamma \to \infty$. By the propagation equation (8), it is easy to see that such operations imply that all the messages $m_{i \to j}(x_i, x_j)$, sent from the vertex $i$ (more precisely, from the cluster associated to the vertex $i$), must be multiplied by a prefactor $\delta(x_i, \bar{x}_i)$, which basically preserves only messages of the type $m_{i \to j}(\bar{x}_i, x_j)$. As a consequence, when we fix the colours of two nearby vertices, it turns out that the latter no longer need to exchange messages or, in other words, the messages remain fixed at

$$m_{i \to j}(x_i, x_j) = m_{j \to i}(x_j, x_i) = \delta(x_i, \bar{x}_i) \, \delta(x_j, \bar{x}_j). \tag{11}$$

Although such messages have no effect on the vertices $i$ and $j$ themselves, due to the form of the propagation equation, they may still influence their neighbourhoods $\partial i \setminus j$ and $\partial j \setminus i$.

Before presenting the results, we note that in [2] the authors observe that the naive BP hardly ever converges. This problem is circumvented by computing probability distributions as "time-averages" over a number of iterations, which turns out to provide sufficient information for guiding the decimation procedure.
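The selection rule of the decimation step described in this section (fix every edge whose most probable joint colour exceeds the threshold; otherwise fix only the single most probable pair over the whole graph) can be sketched as below. This is a minimal Python sketch under stated assumptions: the pair distributions are taken as already computed from (3) and stored as nested lists `p[x_i][x_j]`, and the function name is hypothetical. Possible conflicts between edges sharing a vertex, and the bookkeeping of previously fixed variables, are not handled here.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]
PairDist = List[List[float]]        # p[x_i][x_j], assumed normalised


def select_pairs_to_fix(pair_dists: Dict[Edge, PairDist],
                        threshold: float = 0.9) -> Dict[Edge, Tuple[int, int]]:
    """Return {edge: (x_i, x_j)} for the pairs to be fixed at this step."""
    best = {}
    for edge, p in pair_dists.items():
        # Most probable joint colour assignment on this edge.
        prob, xi, xj = max((p[a][b], a, b)
                           for a in range(len(p)) for b in range(len(p[a])))
        best[edge] = (prob, xi, xj)
    chosen = {e: (xi, xj) for e, (prob, xi, xj) in best.items() if prob > threshold}
    if not chosen and best:
        # No pair exceeds the threshold: fix only the globally most probable one.
        e = max(best, key=lambda edge: best[edge][0])
        chosen = {e: best[e][1:]}
    return chosen
```

After each call, the selected messages would be clamped according to (11) and BP rerun from the previously computed messages.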
In our scheme, the BP algorithm turns out to converge more frequently, except in the vicinity [...] the previous messages (with coefficient $\alpha$) and the updates obtained from the propagation equation (with coefficient $1 - \alpha$). The adjustable parameter $\alpha$ plays the role of a damping in the propagation dynamics, and we refer to it as the damping parameter. Nevertheless, we generally find that reaching convergence is not really necessary. Indeed, a very small number $\nu$ of sequential updates$^{+}$ of all messages is sufficient to provide the relevant information about pair probabilities, and a larger number of iterations does not significantly improve the overall algorithm performance. This fact allows us to drastically reduce the computational cost of the full procedure, although it does not affect the complexity of a single iteration.

$^{+}$ With reference to the propagation equation (8), by sequential update we mean that, in generating a given "output" (left-hand side) message, one makes use of updated "input" (right-hand side) messages, if already available.

We are now in a position to perform a quantitative comparison with the naive BP approach [2]. As in the cited work, we consider a number of available colours $q = 4$ and random graphs with $N = 1000$ vertices. Graphs are generated in such a way as to have vertices with two different degrees $d = \lfloor c \rfloor$ and $d = \lceil c \rceil$, where $c$ is the mean degree. The degree distribution, i.e., the probability of a vertex having degree $d$, is therefore

$$\rho_d = \begin{cases} \lceil c \rceil - c & \text{if } d = \lfloor c \rfloor \\ c - \lfloor c \rfloor & \text{if } d = \lceil c \rceil \end{cases} \tag{12}$$

which we shall refer to as the linear distribution. We always assume $c \geq q - 1$, in order to avoid the appearance of vertices with degree less than $q - 1$, for which the local constraints are necessarily unsatisfiable. We do not report results about graphs with cut-Poissonian degree distribution [2], which exhibit analogous behaviour.

In figure 1 we report both perfect colouring and unsatisfaction measures, over 1000 random graph samples, as a function of the mean degree. The perfect colouring measure is simply defined as the fraction of samples for which the algorithm has been able to find a colour assignment satisfying all constraints. The unsatisfaction measure counts the fraction of missing colours per vertex, i.e., the energy per vertex divided by the total number of colours, $E(x_1, \ldots, x_N)/Nq$ ($x_1, \ldots, x_N$ being the colour assignments found by the algorithm), averaged over all samples. We can see that the BP approach improves on the naive one in both respects. The perfect colouring measure turns out to be consistently increased in the vicinity of the critical mean degree values, below which it rapidly vanishes. In this region, naive BP itself was already found to work better than the Walksat-like algorithm analysed in [2].

Figure 1. Perfect colouring (left) and unsatisfaction (right) measures (%) versus mean degree, over 1000 graphs, for naive BP [2] (open squares) and BP with $\nu = 3$, $\alpha = 0.1$, and $\beta = 10$.

In analogy with the ordinary colouring problem [18] (though with reversed role for the mean degree $c$), we expect that, for even lower $c$ values, our problem becomes unsatisfiable with high probability (i.e., with probability tending to 1 in the "thermodynamic" $N \to \infty$ limit). We also expect the presence of an intermediate hard-satisfiable phase in which the problem is satisfiable with high probability but BP fails, because of a clustered structure of the solution space (replica-symmetry breaking) [16, 17, 18, 19, 20]. Accordingly, the perfect colouring probability falling down to zero is likely to indicate the onset of such a hard-satisfiable phase rather than the truly unsatisfiable phase. We shall return to this point later. For the moment, we observe that the BP approach definitely works better than the naive one, even for very low $c$ values, in the (expected) unsatisfiable phase. In this region we observe a reduction both of the unsatisfaction measure itself and of its growth rate with decreasing $c$.

Concerning the percentage of perfect colouring, we have noticed that the performance of the algorithm is significantly affected by the number $\nu$ of iterations per decimation step only in a narrow region close to the critical $c$ value. This suggests that in this region the problem is actually more difficult to solve. Some results about the influence of the $\nu$ parameter are reported in figure 2. Upon increasing $\nu$, some improvement can also be observed in the unsatisfaction measure. However, as previously mentioned, increasing $\nu$ beyond 2 or 3 does not yield any further significant improvement. We also note that a quantitatively comparable improvement of the unsatisfaction measure is obtained by choosing a small but nonzero value of the damping parameter $\alpha$. All the results reported in the current paper have been obtained with $\alpha = 0.1$, but it turns out that in a rather large range ($0.[...] \lesssim \alpha \lesssim 0.3$) the average algorithm performance is practically independent of the precise value of the damping parameter. Finally, we note that (for $\nu \geq 2$) the perfect colouring measure exhibits a slight kink at $c = 4.0$. This can be ascribed to an abrupt change in the structure of the graph ensemble. In fact, according to the linear degree distribution (12), for $c = 4$ all vertices have exactly degree 4, whereas, for $c > 4$ ($c < 4$), [...]

Figure 2. Perfect colouring (left) and unsatisfaction (right) measures (%) versus mean degree, over 1000 graphs, for BP with $\alpha = 0.1$ and $\beta = 10$. Squares, circles, triangles denote $\nu = 1, 2, 3$, respectively. In the main figures, interpolation between data-points in the transition region has been performed by taking into account the extra data-points reported in the insets.

Figure 3. Perfect colouring (left) and unsatisfaction (right) measures (%) versus mean degree, over 1000 graphs, for BP with $\nu = 2$, $\alpha = 0.1$, and $\beta = 10$. Squares, circles, triangles denote number of vertices $N = 1000$, [...], [...].

We have also analysed the algorithm behaviour as a function of the number of vertices $N$. The results are reported in figure 3. We can see that the transition in the perfect colouring probability becomes more and more abrupt upon increasing $N$, and a cross-over point appears at a mean degree value $c \approx$ [...] $N \to \infty$ limit. The latter conjecture is consistent with the fact that random graphs of increasing size become more and more tree-like, such that the BP approach is able to provide better and better approximations. In principle, the cross-over point might be the signature of the satisfiable to unsatisfiable transition, but, as previously mentioned, we are rather led to identify it with the onset of the hard-satisfiable phase [...]
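The linear degree distribution (12) used for the simulations of this section can be written down and sampled as in the following Python sketch. The function names are hypothetical, and the sketch only draws a list of vertex degrees; how a graph is then built from these degrees (e.g., by random matching of half-edges) is not specified in the text and is left out. The case of integer $c$, for which (12) assigns all vertices the same degree $c$, is handled explicitly.

```python
import math
import random
from typing import Dict, List


def linear_degree_distribution(c: float) -> Dict[int, float]:
    """Eq. (12): weights on floor(c) and ceil(c) giving mean degree c."""
    lo, hi = math.floor(c), math.ceil(c)
    if lo == hi:                    # integer c: every vertex has degree c
        return {lo: 1.0}
    return {lo: hi - c, hi: c - lo}


def sample_degrees(c: float, n: int, rng: random.Random) -> List[int]:
    """Draw n independent vertex degrees from the linear distribution."""
    rho = linear_degree_distribution(c)
    return rng.choices(list(rho.keys()), weights=list(rho.values()), k=n)


if __name__ == "__main__":
    rho = linear_degree_distribution(3.25)
    print(rho, sum(d * p for d, p in rho.items()))   # the mean is c
```

For non-integer $c$ the mean is $\lfloor c \rfloor (\lceil c \rceil - c) + \lceil c \rceil (c - \lfloor c \rfloor) = c$, as required.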

4. Entropy and satisfiability threshold

In this section, we study average macroscopic properties of the BP solution over random graph ensembles, with particular attention to the average entropy. The latter is usually denoted as quenched entropy in statistical mechanics language. Taking the limit $\beta \to \infty$, this quantity provides an average measure of (the logarithm of) the number of zero energy configurations, i.e., perfect colourings, for a given ensemble, which also allows us to estimate the satisfiability threshold. In this context, the main source of approximation will be the replica-symmetry assumption, since the approximation due to BP itself is expected to be negligible in the infinite size limit. Furthermore, we limit the analysis to BP solutions that do not break the colour permutation symmetry ("paramagnetic" solutions), because we have numerical evidence that, when BP converges, no spontaneous symmetry breaking of the solution is ever observed. Average properties of non-paramagnetic (glass-like) solutions have been investigated in [3], but they only appear at very low $c$ values, where the replica-symmetry assumption is expected to break down anyway.

According to the paramagnetic ansatz, the messages are always such that $m_{i \to j}(x, x)$ does not depend on $x$, and $m_{i \to j}(x, y)$ does not depend on $x, y$, if $x \neq y$. This means that the only important quantity is $u_{i \to j} \equiv m_{i \to j}(x, x)/m_{i \to j}(x, y)$, i.e., the ratio between the "equal colours" message and the "different colours" message. Taking into account that the message normalisation is irrelevant to all observable quantities, we can write the full message as

$$m_{i \to j}(x, y) = 1 - (1 - u_{i \to j}) \, \delta(x, y) = \begin{cases} u_{i \to j} & \text{if } x = y \\ 1 & \text{if } x \neq y \end{cases}. \tag{13}$$

We note that in principle one could also think about the inverse ratio $m_{i \to j}(x, y)/m_{i \to j}(x, x)$ as the relevant message, but this choice turns out to be unfeasible, due to the nature of the constraints, favouring the presence of different neighbouring colours.
Indeed, at zero temperature, it is easy to foresee the emergence of "hard" messages such that $m_{i \to j}(x, x) = 0$, stemming from vertices with degree $q - 1$, whereas we do not expect $m_{i \to j}(x, y) = 0$ for $x \neq y$ in a paramagnetic state.

Substituting (13) into the inner sum appearing in the simplified propagation equation (9), we can write

$$\sum_{x_k \in C \setminus B} m_{k \to i}(x_k, x_i) = q - |B| - 1 + u_{k \to i}, \tag{14}$$

where the term $-1 + u_{k \to i}$ appears because $x_i \notin B$. Since the sum above only depends on $B$ via its cardinality $|B|$, in (9) we can replace the sum over $B$ by a sum over $n = |B|$ with a binomial weight, obtaining

$$u_{i \to j} = \frac{\displaystyle \sum_{n=0}^{q-1} \binom{q-1}{n} (-1)^n \prod_{k \in \partial i \setminus j} (q - n - 1 + u_{k \to i})}{\displaystyle \sum_{n=0}^{q-2} \binom{q-2}{n} (-1)^n \prod_{k \in \partial i \setminus j} (q - n - 1 + u_{k \to i})}, \tag{15}$$

in which we have also taken the zero temperature ($\beta \to \infty$) limit. The cluster free energy shift can be similarly derived by substituting (13) into (10). In the zero temperature limit, we obtain

$$e^{-f_i} = q \sum_{n=0}^{q-1} \binom{q-1}{n} (-1)^n \prod_{j \in \partial i} (q - n - 1 + u_{j \to i}). \tag{16}$$

The edge free energy shifts can be directly obtained by inserting (13) into (5):

$$e^{-f_{ij}} = q \, (q - 1 + u_{i \to j} u_{j \to i}). \tag{17}$$

We can characterise a random graph ensemble by a probability distribution of messages $P(u)$. Such a distribution has to obey a functional equation (usually known as cavity equation [16]) of the following form

$$P(u) = \sum_d \tilde{\rho}_d \int \mathrm{d}u_1 \, P(u_1) \ldots \int \mathrm{d}u_{d-1} \, P(u_{d-1}) \; \delta\big(u - \hat{u}(u_1, \ldots, u_{d-1})\big), \tag{18}$$

where $\hat{u}(u_1, \ldots, u_{d-1})$ is the "propagation function" defined by (15), and where $\tilde{\rho}_d$ is the probability of finding a vertex of degree $d$ by choosing a random direction in a randomly selected edge. It is easy to see that $\tilde{\rho}_d$ is related to the degree distribution $\rho_d$ as

$$\tilde{\rho}_d = \frac{d \rho_d}{c}. \tag{19}$$

In the context of the cavity method, the replica-symmetry assumption consists in the fact that we consider a single distribution of messages.
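The zero-temperature propagation function $\hat{u}$ of equation (15) is simple enough to be evaluated directly. The following Python sketch (function names are hypothetical) takes the $d - 1$ incoming ratios $u_{k \to i}$ and returns the outgoing ratio $u_{i \to j}$; it assumes $d \geq q - 1$, since otherwise the denominator vanishes.

```python
from math import comb, prod
from typing import Sequence


def alternating_sum(incoming: Sequence[float], q: int, m: int) -> float:
    """sum_{n=0}^{m} C(m, n) (-1)^n prod_k (q - n - 1 + u_k)."""
    return sum(comb(m, n) * (-1) ** n * prod(q - n - 1 + u for u in incoming)
               for n in range(m + 1))


def u_hat(incoming: Sequence[float], q: int) -> float:
    """Eq. (15): numerator uses m = q - 1, denominator uses m = q - 2.

    Requires len(incoming) >= q - 2, i.e. vertex degree d >= q - 1.
    """
    return alternating_sum(incoming, q, q - 1) / alternating_sum(incoming, q, q - 2)
```

Two simple checks for $q = 4$: a vertex of degree $q - 1 = 3$ (two incoming messages) always emits a hard message $u = 0$, whatever the incoming values, because the numerator is a third-order finite difference of a quadratic polynomial in $n$; and a vertex of degree 4 receiving three hard messages emits $u = 6/12 = 0.5$, which matches a direct count of the allowed colourings of its neighbourhood.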
In a replica-symmetry breaking scenario, each propagated quantity $u_{i \to j}$ (message) would be replaced by a probability distribution defined over different ergodic components (states) [16].

We solve the functional equation (18) numerically by a population dynamics approach [16]. In a nutshell, we represent the distribution $P(u)$ by an evolving population of messages. An elementary evolution step consists in generating a new message according to the propagation equation (15), making use of $d - 1$ messages randomly drawn from the population, where $d$ is randomly generated according to the $\tilde{\rho}_d$ distribution. The newly generated message replaces a randomly selected message of the population. Due to the presence of hard messages $u = 0$ generated by degree $q - 1$ vertices, the distribution $P(u)$ contains a Dirac delta peak centred in zero with weight $\tilde{\rho}_{q-1}$.

From the message distribution, we can evaluate the average cluster and edge free energy shifts as:

$$f_c = \sum_d \rho_d \int \mathrm{d}u_1 \, P(u_1) \ldots \int \mathrm{d}u_d \, P(u_d) \; f_c(u_1, \ldots, u_d), \tag{20}$$

$$f_e = \int \mathrm{d}u_1 \, P(u_1) \int \mathrm{d}u_2 \, P(u_2) \; f_e(u_1, u_2), \tag{21}$$

where the functions $f_c(u_1, \ldots, u_d)$ and $f_e(u_1, u_2)$ are defined by (16) and (17). Thus we obtain the average free energy per vertex as

$$f = f_c - \frac{c}{2} f_e, \tag{22}$$

where $c/2$ is the average number of edges per vertex. Since we have included a $\beta$ factor in our free energy definition, and since the limit $\beta \to \infty$ fixes the energy at zero, the entropy per vertex is simply $s = -f$.

For actual calculations, we have considered random graph ensembles with the linear degree distribution (as defined in the previous section), and with the cut-Poissonian distribution (also considered in [2]), defined as

$$\rho_d = \begin{cases} e^{-(c-q+1)} \, \dfrac{(c-q+1)^{d-q+1}}{(d-q+1)!} & \text{if } d \geq q - 1 \\ 0 & \text{otherwise} \end{cases}, \tag{23}$$

where $c$ is still the mean degree. This distribution also excludes vertices with degree smaller than $q - 1$. [...] $\tilde{\rho}_{q-1}$ for the linear degree distribution turns out to be nonzero only for $c < q$, which explains the kink observed in the entropy function.

Figure 4. Entropy per vertex (left) and fraction of hard messages (%, right) for random graphs with linear degree distribution ($q = 4$), as a function of the mean degree $c$.

Figure 5. The same as figure 4 for random graphs with cut-Poissonian degree distribution.

Negative entropy identifies the unsatisfiable region (perfect colourings are exponentially rare), whereas the zero entropy point identifies the satisfiability threshold $c_{\rm th}$. For the two ensembles, we respectively find $c_{\rm th} \approx$ [...].825 and $c_{\rm th} \approx$ [...] $c_{\rm th} \approx$ [...] $c_{\rm th} \approx$ [...].1. As far as the linear ensemble is concerned, we expect that our result is also analytically equivalent to the (replica-symmetric) one by Wong and Saad [3], and in fact we obtain a very good numerical agreement for the threshold value.
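The population dynamics procedure and the entropy estimate $s = -f$ of equations (15)-(22) can be sketched in Python as follows. This is a minimal sketch, not the authors' code: the population size, number of sweeps, number of Monte Carlo samples, the random initialisation, and all function names are assumptions chosen for illustration, and degrees are assumed to satisfy $d \geq q - 1$.

```python
import math
import random
from math import comb, prod


def alt_sum(us, q, m):
    """sum_{n=0}^{m} C(m, n) (-1)^n prod_k (q - n - 1 + u_k)."""
    return sum(comb(m, n) * (-1) ** n * prod(q - n - 1 + u for u in us)
               for n in range(m + 1))


def u_hat(us, q):
    """Eq. (15); clamped at zero to absorb rounding errors on hard messages."""
    return max(0.0, alt_sum(us, q, q - 1) / alt_sum(us, q, q - 2))


def f_cluster(us, q):
    """Cluster free energy shift from eq. (16)."""
    return -math.log(q * alt_sum(us, q, q - 1))


def f_edge(u1, u2, q):
    """Edge free energy shift from eq. (17)."""
    return -math.log(q * (q - 1 + u1 * u2))


def rs_entropy(rho, q, pop_size=2000, sweeps=50, samples=20000, seed=0):
    """Replica-symmetric, paramagnetic entropy per vertex, s = -f (eq. 22).

    rho maps each degree d to its probability rho_d.
    """
    rng = random.Random(seed)
    c = sum(d * p for d, p in rho.items())           # mean degree
    degs = list(rho)
    edge_w = [d * rho[d] / c for d in degs]          # eq. (19)
    vert_w = [rho[d] for d in degs]
    pop = [rng.random() for _ in range(pop_size)]    # arbitrary initial population
    for _ in range(sweeps * pop_size):
        d = rng.choices(degs, weights=edge_w)[0]
        new_u = u_hat(rng.choices(pop, k=d - 1), q)  # d - 1 random messages
        pop[rng.randrange(pop_size)] = new_u
    fc = fe = 0.0
    for _ in range(samples):                         # Monte Carlo for (20), (21)
        d = rng.choices(degs, weights=vert_w)[0]
        fc += f_cluster(rng.choices(pop, k=d), q)
        fe += f_edge(rng.choice(pop), rng.choice(pop), q)
    f = fc / samples - 0.5 * c * fe / samples        # eq. (22)
    return -f
```

As a consistency check, for $q = 4$ and a graph ensemble in which every vertex has degree 3, all messages are hard ($u = 0$), so that (16) and (17) give $e^{-f_i} = 24$ and $e^{-f_{ij}} = 12$, and the entropy reduces to $s = \ln 24 - \tfrac{3}{2} \ln 12 \approx -0.549$; the negative value places this ensemble in the unsatisfiable region. Scanning `rs_entropy` over the mean degree and locating the zero of the entropy would give an estimate of the threshold within this sketch.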

5. Summary and conclusions

In this paper, we have considered a variation of the well-known graph colouring problem, which may be viewed as the prototype of a combinatorial optimisation problem emerging in the context of distributed data storage. We have worked out the BP equations for this problem, which provide the exact solution on a tree. Due to the many-body nature of the problem, such equations turn out to be different from the naive BP message-passing scheme, as the latter involves messages sent to single variables, whereas the former involve messages sent to pairs of nearest neighbour variables. Our simulations, performed on random graphs drawn from a suitable ensemble, suggest that the new algorithm, associated with a decimation procedure, is much more effective than the naive BP-based algorithm. In particular, the probability of finding a perfect colouring is significantly enhanced, especially in the vicinity of the satisfiable-to-unsatisfiable transition. Furthermore, both the unsatisfaction measure and its growth rate upon decreasing the average graph connectivity are significantly reduced. This improved performance is, however, obtained at the cost of increased [...]

Appendix A. Belief Propagation equations

The BP equations can in general be derived from a very simple recipe. One first assumes ("fakes") that the graph is a tree and then formally applies the equations obtained for that case to a generic graph. This derivation also provides a heuristic argument explaining why the method generally works better for graphs with a tree-like structure.

According to the Boltzmann law, the joint probability distribution of all the colour variables can be written as

$$ p(x_1, \dots, x_N) = e^{F - \beta E(x_1, \dots, x_N)} , \tag{A.1} $$

where $E(x_1, \dots, x_N)$ is the energy function (1), $\beta$ is the inverse temperature, and $F$ is the free energy (times $\beta$), which can be determined by normalisation. Following our "fake assumption", we can consider, for each edge $i \to j$ (defined with a direction), the branch growing from the root vertex $j$ towards $i$, disconnected from the remainder of the system (see figure A1). We can thus define a partial energy function $E_{i \to j}(x_{i \to j})$, obtained by summing the elementary interaction energies only for vertices in the branch, except the root vertex. Since our elementary interaction energies couple together clusters of variables including each vertex and all its neighbours, each partial energy function depends on the array of all colour variables in the branch, including the root vertex. We denote this array by $x_{i \to j}$.

Figure A1. Tree graph (left), disconnected branch $i \to j$ (centre), and decomposition of the latter into subbranches $k \to i$, for $k \in \partial i \setminus j$, plus the elementary cluster associated to $i$ (right).

Now, each disconnected branch can be ideally studied as an independent subsystem, whose Boltzmann probability distribution turns out to be

$$ p_{i \to j}(x_{i \to j}) = e^{F_{i \to j} - \beta E_{i \to j}(x_{i \to j})} , \tag{A.2} $$

where $F_{i \to j}$ denotes the corresponding free energy. Note that it is possible to decompose the partial energy of the given branch $i \to j$ into a sum of the partial energies of its subbranches $k \to i$, for all $k \in \partial i \setminus j$, plus the elementary interaction energy associated to $i$ (see figure A1):

$$ E_{i \to j}(x_{i \to j}) = \eta(x_i, x_{\partial i}) + \sum_{k \in \partial i \setminus j} E_{k \to i}(x_{k \to i}) . \tag{A.3} $$

We also define a free energy shift $f_{i \to j}$ as the difference between the free energy of the $i \to j$ disconnected branch and the sum of free energies of its (disconnected) subbranches, i.e.,

$$ F_{i \to j} = f_{i \to j} + \sum_{k \in \partial i \setminus j} F_{k \to i} . \tag{A.4} $$

From (A.2), (A.3), and (A.4), we can write

$$ p_{i \to j}(x_{i \to j}) = e^{f_{i \to j} - \beta \eta(x_i, x_{\partial i})} \prod_{k \in \partial i \setminus j} p_{k \to i}(x_{k \to i}) , \tag{A.5} $$

which provides a relationship between the Boltzmann distribution of the $i \to j$ disconnected branch and those of its (disconnected) subbranches. Defining the messages $m_{i \to j}(x_i, x_j)$ as marginals of the corresponding branch distribution $p_{i \to j}(x_{i \to j})$ over the variables $x_j$ and $x_i$ (respectively, the root vertex and its first neighbour in the branch), we finally obtain the self-consistency equation (8).

We still have to show how messages can determine cluster and edge marginals of the full Boltzmann distribution (A.1). As in our previous manipulations, we observe that, for each given vertex $i$, it is possible to write the total energy function (1) as a sum of the partial energies of the branches $j \to i$, for all $j \in \partial i$, plus the elementary interaction energy associated to $i$:

$$ E(x_1, \dots, x_N) = \eta(x_i, x_{\partial i}) + \sum_{j \in \partial i} E_{j \to i}(x_{j \to i}) . \tag{A.6} $$

Defining also the free energy shift $f_i$ as the difference between the total free energy $F$ and the sum of the disconnected branch free energies, for all the possible branches growing from vertex $i$, i.e.,

$$ F = f_i + \sum_{j \in \partial i} F_{j \to i} , \tag{A.7} $$

from (A.1), (A.2), (A.6), and (A.7), we easily obtain

$$ p(x_1, \dots, x_N) = e^{f_i - \beta \eta(x_i, x_{\partial i})} \prod_{j \in \partial i} p_{j \to i}(x_{j \to i}) . \tag{A.8} $$

Now, the cluster distribution $p_{i,\partial i}(x_i, x_{\partial i})$ for each vertex $i$ can be derived as a suitable marginal of $p(x_1, \dots, x_N)$. By this marginalisation, we obtain (4). As far as edge marginals are concerned, we have to consider a different decomposition of the total energy function. Namely, for each edge $\{i, j\}$, the total energy can be written as a sum of two contributions, one from the branch starting from $j$ towards $i$ and one from the branch starting from $i$ towards $j$:

$$ E(x_1, \dots, x_N) = E_{i \to j}(x_{i \to j}) + E_{j \to i}(x_{j \to i}) . \tag{A.9} $$

We define the free energy shift $f_{ij}$ as the difference between the total free energy $F$ and the sum of the free energies of the disconnected branches mentioned above, i.e.,

$$ F = f_{ij} + F_{i \to j} + F_{j \to i} . \tag{A.10} $$

From (A.1), (A.2), (A.9), and (A.10), we obtain

$$ p(x_1, \dots, x_N) = e^{f_{ij}} \, p_{i \to j}(x_{i \to j}) \, p_{j \to i}(x_{j \to i}) . \tag{A.11} $$

Evaluating the edge distribution $p_{i,j}(x_i, x_j)$ as a marginal of $p(x_1, \dots, x_N)$, we obtain (3).

Finally, we determine the total free energy as a function of the free energy shifts. First we sum both sides of (A.7) over all vertices $i$, and both sides of (A.10) over all edges $\{i, j\}$. Then we subtract the latter equation from the former. On a tree, the number of vertices equals the number of edges plus one, so that the left-hand side of the resulting equation is exactly $F$. Furthermore, on the right-hand side all the branch free energies cancel out, and we obtain

$$ F = \sum_{i=1}^{N} f_i - \sum_{\{i,j\}} f_{ij} , \tag{A.12} $$

where $\sum_{\{i,j\}}$ denotes the sum over all edges.

Figure B1. A simple undirected graph (left), and the related factor graphs giving rise to naive BP (centre) and BP (right). Open circles and squares denote variable and function nodes, respectively. The labels are explained in the text.
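The tree argument of Appendix A can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: it assumes that the energy (1) is the sum over vertices of the elementary term $\eta$ as written in (C.6), and that marginalising (A.5) gives the update $m_{i \to j}(x_i, x_j) \propto \sum_{x_{\partial i \setminus j}} e^{-\beta \eta(x_i, x_{\partial i})} \prod_{k \in \partial i \setminus j} m_{k \to i}(x_k, x_i)$. The function names, the adjacency-list representation and the recursive evaluation are illustrative choices; the recursion terminates only on a tree.

```python
import itertools
import math

def eta(xi, neighbour_colours, q):
    """Elementary energy as in (C.6): colours other than xi missing from the neighbours."""
    return len(set(range(q)) - {xi} - set(neighbour_colours))

def message(adj, i, j, q, beta):
    """Pair message m_{i->j}(x_i, x_j) for the branch rooted at j (tree graphs only)."""
    others = [k for k in adj[i] if k != j]                 # the set di \ j
    incoming = [message(adj, k, i, q, beta) for k in others]
    m = {}
    for xi in range(q):
        for xj in range(q):
            total = 0.0
            # sum over the colours of the neighbours of i other than j
            for xs in itertools.product(range(q), repeat=len(others)):
                weight = math.exp(-beta * eta(xi, (xj,) + xs, q))
                for mk, xk in zip(incoming, xs):
                    weight *= mk[(xk, xi)]
                total += weight
            m[(xi, xj)] = total
    z = sum(m.values())
    return {key: value / z for key, value in m.items()}

def edge_marginal_bp(adj, i, j, q, beta):
    """Edge marginal from (A.11): product of the two opposite messages, normalised."""
    mij = message(adj, i, j, q, beta)
    mji = message(adj, j, i, q, beta)
    p = {(a, b): mij[(a, b)] * mji[(b, a)] for a in range(q) for b in range(q)}
    z = sum(p.values())
    return {key: value / z for key, value in p.items()}

def edge_marginal_exact(adj, i, j, q, beta):
    """Same marginal by exhaustive enumeration of the Boltzmann distribution (A.1)."""
    n = len(adj)
    p = {(a, b): 0.0 for a in range(q) for b in range(q)}
    for x in itertools.product(range(q), repeat=n):
        energy = sum(eta(x[v], [x[u] for u in adj[v]], q) for v in range(n))
        p[(x[i], x[j])] += math.exp(-beta * energy)
    z = sum(p.values())
    return {key: value / z for key, value in p.items()}

# Hypothetical 5-vertex tree, given as adjacency lists.
tree = [[1], [0, 2, 3], [1], [1, 4], [3]]
bp = edge_marginal_bp(tree, 1, 3, q=3, beta=1.5)
exact = edge_marginal_exact(tree, 1, 3, q=3, beta=1.5)
print(max(abs(bp[key] - exact[key]) for key in exact))
```

On a tree the two marginals should agree up to rounding, which is the sense in which the BP equations are exact there. The exhaustive enumeration grows as $q^N$ and is meant only as a check on very small graphs.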

Appendix B. Factor graph formalism

In this appendix, we first introduce a more general form of BP equations, defined on a factor graph [22]. Then, we show that from this form one can derive both the naive BP equations of [2] and the BP equations of the current paper by two different factor graphs associated to the same problem.

A factor graph is a bipartite graph, whose left- and right-side vertices are usually referred to as variable nodes and function nodes. The notion of factor graph is meant to describe the structure of the energy function, whose independent variables (i.e., the configuration variables of the corresponding thermodynamic system) are associated to the variable nodes. A function node connected to a number of variable nodes represents an elementary interaction among the corresponding variables. Let $V$ denote the set of all the variable nodes, such that each node $v \in V$ is associated with a configuration variable $x_v$. Let also $A \subseteq V$ denote any subset (cluster) of variable nodes, and let $x_A \equiv \{x_v\}_{v \in A}$ denote the array of the associated configuration variables. We can thus write the energy function as

$$ E(x_V) = \sum_{A \in \mathcal{F}} \epsilon_A(x_A) , \tag{B.1} $$

where $\epsilon_A(x_A)$ denotes the elementary interaction energy among the variables in the cluster $A$ (cluster energy), whereas the sum runs over the set $\mathcal{F}$ of all the interacting clusters. In what follows, the same label $A$ denotes both a function node and the cluster of variable nodes connected to it. An example of factor graphs describing the energy function of a palette-colouring problem is sketched in figure B1.

When the factor graph is a tree, an argument similar to that in Appendix A allows one to write marginals of the Boltzmann distribution as follows:

– For each variable node $v \in V$ we have the marginal:

$$ p_v(x_v) = e^{f_v} \prod_{\substack{A \in \mathcal{F} \\ A \ni v}} m_{A \to v}(x_v) , \tag{B.2} $$

where the product runs over all the clusters $A$ to which $v$ belongs (i.e. all the function nodes connected to $v$), $m_{A \to v}(x_v)$ is a function-to-variable message, and $f_v$ is a free energy shift (ensuring normalisation).

– For each cluster $A \in \mathcal{F}$, we have the marginal:

$$ p_A(x_A) = e^{f_A - \beta \epsilon_A(x_A)} \prod_{v \in A} w_{v \to A}(x_v) , \tag{B.3} $$

where $f_A$ is a free energy shift, and where $w_{v \to A}(x_v)$ is a variable-to-function message:

$$ w_{v \to A}(x_v) = \prod_{\substack{A' \in \mathcal{F} \setminus A \\ A' \ni v}} m_{A' \to v}(x_v) , \tag{B.4} $$

a product of the messages sent to $v$ from all connected function nodes except $A$.

As shown in Section 2, one can derive the propagation equations by imposing compatibility between overlapping distributions. In this case, for all $A \in \mathcal{F}$ and for all $v \in A$, we can write

$$ p_v(x_v) = \sum_{x_{A \setminus v}} p_A(x_A) , \tag{B.5} $$

where the sum runs over all possible values of the variables in the cluster $A$ except $x_v$. Inserting (B.2), (B.3) into (B.5), we obtain the propagation equation

$$ m_{A \to v}(x_v) \propto \sum_{x_{A \setminus v}} e^{-\beta \epsilon_A(x_A)} \prod_{v' \in A \setminus v} w_{v' \to A}(x_{v'}) , \tag{B.6} $$

with the $w_{v' \to A}(x_{v'})$ defined by (B.4). Note that, as in (8), we have replaced the normalisation factor with a proportionality symbol. Finally, following the argument of Appendix A, we write the total free energy as a function of the free energy shifts as

$$ F = \sum_{A \in \mathcal{F}} f_A - \sum_{v \in V} (d_v - 1) f_v , \tag{B.7} $$

where $d_v$ is the degree of the variable node $v$ in the factor graph.

Naive BP
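Both special cases discussed in this appendix are instances of the general equations (B.2), (B.4) and (B.6). As a point of reference, the sketch below evaluates those three equations recursively on a small tree-shaped factor graph. It is an illustration under stated assumptions rather than code from the paper: the dictionary-based representation of the factor graph, the function names, and the toy cluster energies in the usage example are all hypothetical, and the recursion terminates only when the factor graph has no loops.

```python
import itertools
import math

def var_to_fun(factors, v, A, q, beta):
    """w_{v->A}(x_v), eq. (B.4): product of messages into v from every function node but A."""
    w = [1.0] * q
    for B, (scope, _) in factors.items():
        if B != A and v in scope:
            m = fun_to_var(factors, B, v, q, beta)
            w = [a * b for a, b in zip(w, m)]
    return w

def fun_to_var(factors, A, v, q, beta):
    """m_{A->v}(x_v), eq. (B.6), normalised so that it sums to one."""
    scope, energy = factors[A]
    others = [u for u in scope if u != v]          # the cluster A without v
    ws = [var_to_fun(factors, u, A, q, beta) for u in others]
    m = []
    for xv in range(q):
        total = 0.0
        for xs in itertools.product(range(q), repeat=len(others)):
            x = dict(zip(others, xs))
            x[v] = xv
            weight = math.exp(-beta * energy(x))   # Boltzmann factor of the cluster energy
            for w, xu in zip(ws, xs):
                weight *= w[xu]
            total += weight
        m.append(total)
    z = sum(m)
    return [t / z for t in m]

def marginal(factors, v, q, beta):
    """p_v(x_v), eq. (B.2): normalised product of all messages sent to v."""
    p = [1.0] * q
    for A, (scope, _) in factors.items():
        if v in scope:
            m = fun_to_var(factors, A, v, q, beta)
            p = [a * b for a, b in zip(p, m)]
    z = sum(p)
    return [t / z for t in p]

# Hypothetical tree factor graph: name -> (cluster of variables, cluster energy).
toy = {
    "a": ((0, 1), lambda x: float(x[0] == x[1])),
    "b": ((1, 2, 3), lambda x: float(len({x[1], x[2], x[3]}) < 3)),
    "c": ((0,), lambda x: float(x[0] == 0)),
}
print(marginal(toy, 1, q=3, beta=1.0))
```

Each cluster energy receives a dictionary mapping variable labels to colours, so clusters of any size are handled by the same update. On a loopy factor graph the recursion above would not terminate, and one would instead iterate the same update from some initial messages.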

We first consider the energy function (1), where the configuration (colour) variables $x_i$ are associated with the vertices $i = 1, \dots, N$ of an ordinary graph, and the elementary interaction energy involves a cluster made up of a vertex $i$ and all its neighbours $\partial i$. This structure is described by a factor graph in which the variable nodes are associated with the vertices of the original graph and the function nodes with the clusters. We can use the same index for both the variable node $i$ and the function node with $i$ at its centre (the cluster $A_i \equiv \{i, \partial i\}$). Hence, each variable node $i$ receives messages $m_{A_j \to i}(x_i)$ from all the function nodes $A_j$ with $j \in \partial i$, and from $A_i$ itself. With the short-hand $m_{j \to i}$ for $m_{A_j \to i}$, omitting the normalisation factor, (B.2) becomes

$$ p_i(x_i) \propto m_{i \to i}(x_i) \prod_{j \in \partial i} m_{j \to i}(x_i) . \tag{B.8} $$

Similarly, each function node $A_i$ receives variable-to-function messages from $i$ and all $j \in \partial i$, and the cluster distribution for $A_i$ (B.3) becomes

$$ p_{i,\partial i}(x_i, x_{\partial i}) \propto e^{-\beta \eta(x_i, x_{\partial i})} \, w_{i \to i}(x_i) \prod_{j \in \partial i} w_{j \to i}(x_j) . \tag{B.9} $$

We have identified $\epsilon_{A_i}(x_{A_i})$ with $\eta(x_i, x_{\partial i})$, and $w_{j \to i}$ is short-hand for $w_{j \to A_i}$. From (B.4), one can see that the variable-to-function messages take two slightly different forms, depending on whether they travel (to the cluster $A_i$) from the "central" node $i$ or from a "peripheral" node $j \in \partial i$. In the simplified notation, we have respectively

$$ w_{i \to i}(x_i) = \prod_{j \in \partial i} m_{j \to i}(x_i) , \tag{B.10} $$

$$ w_{j \to i}(x_j) = m_{j \to j}(x_j) \prod_{k \in \partial j \setminus i} m_{k \to j}(x_j) . \tag{B.11} $$

The compatibility condition (B.5) can also be written in two different forms. For all $i = 1, \dots, N$ and $j \in \partial i$, we have respectively:

$$ p_i(x_i) = \sum_{x_{\partial i}} p_{i,\partial i}(x_i, x_{\partial i}) , \tag{B.12} $$

$$ p_i(x_i) = \sum_{x_j, x_{\partial j \setminus i}} p_{j,\partial j}(x_j, x_{\partial j}) . \tag{B.13} $$

Using (B.8) and (B.9), this in turn gives rise to two different propagation equations:

$$ m_{i \to i}(x_i) \propto \sum_{x_{\partial i}} e^{-\beta \eta(x_i, x_{\partial i})} \prod_{j \in \partial i} w_{j \to i}(x_j) , \tag{B.14} $$

$$ m_{j \to i}(x_i) \propto \sum_{x_j, x_{\partial j \setminus i}} e^{-\beta \eta(x_j, x_{\partial j})} \, w_{j \to j}(x_j) \prod_{k \in \partial j \setminus i} w_{k \to j}(x_k) . \tag{B.15} $$

These equations, together with (B.10) and (B.11), are identical (apart from the notation) to the naive BP equations presented in [2]. From figure B1 one sees that even when the original graph is a tree, the corresponding factor graph contains short loops, and the naive BP equations are not exact.

Current BP
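The structural difference between the two factor graphs of figure B1 can be checked by counting independent loops. The sketch below is an illustration with hypothetical function names and a hypothetical example tree. It builds the vertex-variable factor graph of the naive scheme and the edge-variable factor graph used in this subsection (one variable node per edge $\{i, j\}$, one function node $A_i$ per vertex), and computes for each the cyclomatic number, i.e. links minus nodes plus connected components, which is zero exactly when the factor graph has no loops.

```python
def naive_factor_graph(adj):
    """Links (variable i, function A_c) for every i in the cluster {c} plus neighbours of c."""
    return [(i, ("A", c)) for c in range(len(adj)) for i in [c] + list(adj[c])]

def edge_factor_graph(adj):
    """Variable nodes are edges {i, j}; function node A_i touches every edge incident to i."""
    return [(frozenset((i, j)), ("A", i)) for i in range(len(adj)) for j in adj[i]]

def cyclomatic_number(links):
    """Number of independent loops: links - nodes + connected components."""
    parent = {}

    def find(a):
        parent.setdefault(a, a)
        while parent[a] != a:
            parent[a] = parent[parent[a]]   # path halving
            a = parent[a]
        return a

    for u, v in links:
        parent[find(u)] = find(v)           # union of the two endpoints
    components = len({find(a) for a in list(parent)})
    return len(links) - len(parent) + components

# Hypothetical 5-vertex tree, given as adjacency lists.
tree = [[1], [0, 2, 3], [1], [1, 4], [3]]
print(cyclomatic_number(naive_factor_graph(tree)))  # 4 independent loops
print(cyclomatic_number(edge_factor_graph(tree)))   # 0, i.e. a tree
```

For a tree with $N$ vertices the naive factor graph has $2N$ nodes and $3N - 2$ links, hence $N - 1$ independent loops, whereas the edge-variable factor graph has $2N - 1$ nodes and $2N - 2$ links and is itself a tree.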

We now consider an alternative form of the energy function (1) by introducing:

(i) a variable $x_i^j$ for each vertex-neighbour pair $(i, j \in \partial i)$ (a kind of "replica" of $x_i$);

(ii) a constraint imposing that all replicas of $x_i$ are equal for each vertex $i$.

The constraints can be realised by assigning infinite energy penalties to the configurations we want to forbid. Assuming $\gamma \to \infty$, we define

$$ E(\{x_1^j\}_{j \in \partial 1}, \dots, \{x_N^j\}_{j \in \partial N}) = \sum_{i=1}^{N} \left[ \eta(x_i^*, \{x_j^i\}_{j \in \partial i}) + \gamma \, \chi(\{x_i^j\}_{j \in \partial i}) \right] , \tag{B.16} $$

where the function $\chi(\cdot)$ returns 1 when its entries are not all equal, and 0 otherwise, whereas $x_i^*$ means that the replica index is irrelevant. Note that the allowed (finite […] $\eta(\cdot)$, but only on the fact that each vertex of the original graph interacts (at most) with all its neighbours. With these definitions, each edge $\{i, j\}$ of the original graph can be naturally associated with the pair of variables $\{x_i^j, x_j^i\}$ (the $j$-replica of $x_i$ and the $i$-replica of $x_j$). Moreover, the structure of the modified energy function (B.16) is described by a factor graph in which the variable nodes $v$ correspond to the edges $\{i, j\}$ of the original graph, while the function nodes $A$ now correspond to the clusters of interacting edges $A_i \equiv \{\{i, j\} \,|\, j \in \partial i\}$. Figure B1 shows that, when the original graph is a tree, this factor graph is also one, and every variable node $\{i, j\}$ has degree 2, so that it only receives messages from the function nodes $A_i$ and $A_j$. Using $m_{i \to j}$ as short-hand for $m_{A_i \to \{i,j\}}$, (B.2) becomes

$$ p_{\{i,j\}}(x_i^j, x_j^i) = e^{f_{\{i,j\}}} \, m_{i \to j}(x_i^j, x_j^i) \, m_{j \to i}(x_j^i, x_i^j) . \tag{B.17} $$

The variable-to-function messages (B.4) are simply

$$ w_{i \to j}(x_i^j, x_j^i) = m_{i \to j}(x_i^j, x_j^i) , \tag{B.18} $$

where $w_{i \to j}$ is short-hand for $w_{\{i,j\} \to A_j}$. Finally, the cluster distribution (B.3) is

$$ p_{A_i}(\{x_i^j, x_j^i\}_{j \in \partial i}) = e^{f_{A_i} - \beta \eta(x_i^*, \{x_j^i\}_{j \in \partial i}) - \beta \gamma \chi(\{x_i^j\}_{j \in \partial i})} \prod_{j \in \partial i} m_{j \to i}(x_j^i, x_i^j) , \tag{B.19} $$

where the cluster energy $\epsilon_A(x_A)$ has been replaced with the elementary term of (B.16). Discarding forbidden configurations (dropping replica indices), (B.17) is equivalent to (3), and, since all the $\chi$-terms vanish, (B.19) is equivalent to (4). This is sufficient to derive the propagation equation (8), as shown in Section 2. Finally, the free energy (B.7) is equivalent to (A.12), as all variable nodes of the factor graph have degree 2.

Appendix C. Simplified equations
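The simplified forms obtained in this appendix rest on rewriting the Boltzmann factor $e^{-\beta \eta(x_i, x_{\partial i})}$ as a sum over subsets $B$ of the colour set, as in (C.7). A brute-force check of that identity can make the algebra easier to follow. The sketch below is illustrative only: the function names are hypothetical, colours are labelled $0, \dots, q-1$, and $\eta$ is taken in the form (C.6), i.e. the number of colours other than $x_i$ that do not appear among the neighbours.

```python
import itertools
import math

def eta(xi, neighbour_colours, q):
    """Eq. (C.6): number of colours other than xi that no neighbour carries."""
    return len(set(range(q)) - {xi} - set(neighbour_colours))

def boltzmann_expansion(xi, neighbour_colours, q, beta):
    """Right-hand side of (C.7): sum over subsets B of the colour set without xi."""
    palette = [x for x in range(q) if x != xi]
    total = 0.0
    for size in range(len(palette) + 1):
        for B in itertools.combinations(palette, size):
            term = (math.exp(-beta) - 1.0) ** size
            for xj in neighbour_colours:
                # sum over x in C \ B of delta(x_j, x): 1 if x_j is outside B, else 0
                term *= 0.0 if xj in B else 1.0
            total += term
    return total

# Compare both sides for every colour assignment of a vertex with three neighbours.
q, beta, degree = 4, 0.7, 3
worst = 0.0
for xi in range(q):
    for neigh in itertools.product(range(q), repeat=degree):
        lhs = math.exp(-beta * eta(xi, neigh, q))
        worst = max(worst, abs(lhs - boltzmann_expansion(xi, neigh, q, beta)))
print(worst)
```

The point of the expansion is visible in the inner loop: for each subset $B$ the dependence on the neighbours factorises into one term per neighbour, which is what allows the sum over neighbour colours in the propagation equation to be carried out neighbour by neighbour.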

In this appendix, we derive the simplified forms (9) and (10) of the propagation equation (8) and the free energy shift (6), respectively. Both derivations are based on similar manipulations. We consider the elementary energy term (2) associated to vertex $i$, and note that it can be written in an alternative form for each given choice of a neighbour vertex $j \in \partial i$:

$$ \eta(x_i, x_{\partial i}) = \sum_{x \in C \setminus x_i \setminus x_j} \prod_{k \in \partial i \setminus j} [1 - \delta(x_k, x)] , \tag{C.1} $$

where the sum runs over the colour set $C$, excluding the colours $x_i$ and $x_j$ (if $x_i = x_j$, just one colour is excluded). Since the product in the equation above can only take the values 0 and 1, we can write the corresponding Boltzmann factor as

$$ e^{-\beta \eta(x_i, x_{\partial i})} = \prod_{x \in C \setminus x_i \setminus x_j} \Big\{ 1 - (1 - e^{-\beta}) \prod_{k \in \partial i \setminus j} [1 - \delta(x_k, x)] \Big\} . \tag{C.2} $$

Expanding the outer product into a sum over subsets, we get

$$ e^{-\beta \eta(x_i, x_{\partial i})} = \sum_{B \subseteq C \setminus x_i \setminus x_j} (e^{-\beta} - 1)^{|B|} \prod_{x \in B} \prod_{k \in \partial i \setminus j} [1 - \delta(x_k, x)] , \tag{C.3} $$

where the sum runs over all the possible subsets $B$ of the colour set $C \setminus x_i \setminus x_j$. Then, we exchange the two products, expand the product over $x$ (taking into account that every product of two or more deltas vanishes), and use the fact that

$$ \sum_{x \in C} \delta(x_k, x) = 1 . \tag{C.4} $$

We finally obtain

$$ e^{-\beta \eta(x_i, x_{\partial i})} = \sum_{B \subseteq C \setminus x_i \setminus x_j} (e^{-\beta} - 1)^{|B|} \prod_{k \in \partial i \setminus j} \sum_{x \in C \setminus B} \delta(x_k, x) . \tag{C.5} $$

The propagation equation (8) for a given vertex $i$ generates an outgoing message $m_{i \to j}(x_i, x_j)$ as a function of the set of incoming messages $m_{k \to i}(x_k, x_i)$ (where $k \in \partial i \setminus j$). Inserting the final expression for the Boltzmann factor (C.5) into this equation, we readily obtain the simplified propagation equation (9).

As far as the free energy shift (6) is concerned, we rewrite the elementary energy term (2) in yet another form, namely,

$$ \eta(x_i, x_{\partial i}) = \sum_{x \in C \setminus x_i} \prod_{j \in \partial i} [1 - \delta(x_j, x)] . \tag{C.6} $$

In this case the sum runs over the colour set $C$, excluding only the colour $x_i$. A totally analogous derivation allows us to write

$$ e^{-\beta \eta(x_i, x_{\partial i})} = \sum_{B \subseteq C \setminus x_i} (e^{-\beta} - 1)^{|B|} \prod_{j \in \partial i} \sum_{x \in C \setminus B} \delta(x_j, x) , \tag{C.7} $$

which, plugged into (6), yields (10).

References

[1] Garey M and Johnson D S, 1979 Computers and Intractability: A Guide to the Theory of NP-Completeness (San Francisco, CA: Freeman)
[2] Bounkong S, van Mourik J, and Saad D, 2006 Phys. Rev. E […]
[3] […] J. Phys. A: Math. Theor. […]
[4] […] Advanced Mean Field Methods: Theory and Practice ed M Opper and D Saad (Cambridge, MA: MIT Press) p 21 (Yedidia J S, 2000 Mitsubishi Electric Technical Report […])
[5] […] Exploring Artificial Intelligence in the New Millennium (San Francisco, CA: Morgan Kaufmann) p 239 (Yedidia J S, Freeman W T, and Weiss Y, 2001 Mitsubishi Electric Technical Report […])
[6] […] Artificial Intelligence […]
[7] […] Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference (San Francisco, CA: Morgan Kaufmann)
[8] Bethe H A, 1935 Proc. R. Soc. A […]
[9] […] Phase Transitions and Critical Phenomena vol 2 ed C Domb and M S Green (New York: Academic) ch 9
[10] Selman B, Kautz H A, and Cohen B, 1994 Proceedings of the 12th National Conference on Artificial Intelligence (AAAI-94) (Seattle, WA: MIT Press) p 337
[11] Biroli G and Mézard M, 2002 Phys. Rev. Lett. […]
[12] […] Phys. Rev. E […]
[13] […] Advances in Neural Information Processing Systems (NIPS) vol 13 p 689 (Yedidia J S, Freeman W T, and Weiss Y, 2000 Mitsubishi Electric Technical Report […])
[14] […] Phys. Rev. […]
[15] […] J. Phys. A: Math. Gen. […] R309
[16] Mézard M and Parisi G, 2001 Eur. Phys. J. B […]
[17] […] Spin Glass Theory and Beyond (Singapore: World Scientific)
[18] Mulet R, Pagnani A, Weigt M, and Zecchina R, 2002 Phys. Rev. Lett. […]
[19] […] Phys. Rev. E […]
[20] […] Phys. Rev. E […]
[21] […] Phys. Rev. Lett. […]
[22] […] IEEE Trans. Inform. Theory 47 […]