Palette-colouring: a belief-propagation approach
arXiv [cond-mat.stat-mech]
Alessandro Pelizzola, Marco Pretti ¶, and Jort van Mourik

Dipartimento di Fisica, CNISM and Center for Computational Studies, Politecnico di Torino, Corso Duca degli Abruzzi 24, I-10129 Torino, Italy
INFN, Sezione di Torino, Torino, Italy
HuGeF Torino, Via Nizza 52, I-10126 Torino, Italy
CNR, Consiglio Nazionale delle Ricerche, Istituto dei Sistemi Complessi, and CNISM, Dipartimento di Fisica, Politecnico di Torino, Corso Duca degli Abruzzi 24, I-10129 Torino, Italy
Non-Linearity and Complexity Research Group, Aston University, Birmingham B4 7ET, UK
Abstract.
We consider a variation of the prototype combinatorial-optimisation problem known as graph-colouring. Our optimisation goal is to colour the vertices of a graph with a fixed number of colours, so as to maximise the number of different colours present in the set of nearest neighbours of each given vertex. This problem, which we pictorially call palette-colouring, has recently been addressed as a basic example of a problem arising in the context of distributed data storage. Even though it has not been proved to be NP-complete, random search algorithms find the problem hard to solve. Heuristics based on a naive belief propagation algorithm are observed to work quite well in certain conditions. In this paper, we build upon this result, working out the correct belief propagation algorithm, which needs to take into account the many-body nature of the constraints present in this problem. This method improves on the naive belief propagation approach, at the cost of increased computational effort. We also investigate the emergence of a satisfiable-to-unsatisfiable “phase transition” as a function of the vertex mean degree, for different ensembles of sparse random graphs in the large size (“thermodynamic”) limit.

¶ To whom correspondence should be addressed ([email protected])
1. Introduction
Graph colouring is a prototype of combinatorial optimisation or constraint satisfaction problems [1]. It is NP-complete, so that it can be taken as a benchmark for optimisation algorithms. Moreover, it is at the core of a large number of technologically relevant combinatorial problems, such as scheduling. The goal is to assign a colour to each vertex of a given graph (with a fixed number of available colours), in such a way that no pair of vertices connected by an edge have the same colour. Alternatively, one may be satisfied with a suboptimal solution, i.e., minimising the number of connected vertex pairs with the same colour.

A nice variant of the above problem has recently been proposed and investigated by Bounkong and coworkers [2, 3]. The variation consists in requiring that the set of colours assigned to each given vertex and its neighbours includes all available colours. The latter problem, which we pictorially call palette-colouring, has been suggested as a basic example of a constraint satisfaction problem arising in the context of distributed data storage [2]. The basic idea is as follows. On a computer network with limited storage resources at each node, it may be convenient to divide a file into a number of segments (colours), which are then distributed over different nodes. Each given node should be able to retrieve the different segments by accessing only its own and nearest-neighbour storage devices, whence the above described constraints. Even in this case, one might be satisfied with a suboptimal solution, i.e., maximising the number of colours present in each node neighbourhood. We note that palette-colouring has not been proved to be NP-complete, but there is numerical evidence that it becomes intractable for large system size [2].
With respect to ordinary colouring, the most relevant difference is that the modified problem becomes easier to solve for graphs with higher, rather than lower, vertex degrees.

In the last few years, different types of constraint satisfaction problems have been tackled by message passing techniques, among which Belief Propagation (BP) [4, 5]. BP was originally conceived as a dynamic programming algorithm to perform exact statistical inference for Markov random field models defined on graphs without loops (trees) [6, 7]. Subsequently, it has been demonstrated to be relatively good even for loopy graphs. Such successful behaviour seems to be related to the fact that running BP is equivalent to determining a minimum of an approximate free energy function (Bethe free energy) for a corresponding thermodynamic system. The Bethe approximation was indeed very well known to physicists [8, 9], but the connection with BP is a relatively recent result [4].

In [2], Bounkong, van Mourik, and Saad analyse an algorithm based on BP, comparing its performance with a variant of Walksat [10]. In particular, the BP-based algorithm makes use of beliefs averaged over several iterations, together with a common decimation strategy. It is observed that, while Walksat works definitely better for small graphs (100 vertices), the opposite occurs for larger (1000 vertices) random graphs. This result is somehow related to the nature of BP itself, since large [...] not a generalised BP [13]. Indeed, in the literature, the latter term usually denotes a class of algorithms computing the minima of more refined free energy approximations (Kikuchi [14], rather than Bethe, free energies) [15]. Here, however, we derive an algorithm computing the correct Bethe approximation, which is the exact solution for loopless graphs.
We then compare the performance of the new BP algorithm (which we shall simply call BP from now on) to the naive one, showing that further improvements can be obtained.

Let us note that the correct Bethe approximation has already been considered for this problem by Wong and Saad [3], in order to investigate the emergence and nature of the satisfiable-to-unsatisfiable transition, observed upon decreasing the mean vertex degree of different sparse random graphs. In the replica-symmetry assumption, the authors of the cited paper study average macroscopic properties of a given random graph ensemble, making use of a numerical method of the population dynamics type. In the current work, we mainly focus on the algorithmic properties of the message passing procedure, and related decimation strategies. In particular, we discuss both analytical and numerical strategies for limiting the increase of computational cost arising from the pairwise messages. Also, in the last part of the paper, we develop the distributional version of the message-passing scheme, which in the literature is usually denoted as the cavity method [16] and used to study random (glass-like) systems [17]. We limit this analysis to the replica-symmetry assumption and to the simple colour-symmetric (paramagnetic) solution. Within these simplifying hypotheses, we compute the quenched entropy for given random graph ensembles, and estimate the corresponding satisfiability threshold, partially recovering a result of [3].
2. Statement of the problem and Belief Propagation
We consider an undirected simple graph, whose vertices are denoted by $i = 1, \dots, N$. Our goal is to assign to each vertex $i$ a colour $x_i$ from a given colour set $C \equiv \{1, 2, \dots, q\}$, [...]
$$E(x_1, \dots, x_N) = \sum_{i=1}^{N} \eta(x_i, x_{\partial i}), \tag{1}$$
where $\partial i$ denotes the neighbourhood of $i$ (i.e., the set of vertices directly connected to $i$ by an edge), and $x_{\partial i} \equiv \{x_j\}_{j \in \partial i}$ the array of colour variables in $\partial i$. The elementary energy term $\eta(x_i, x_{\partial i})$ counts the number of missing colours in the neighbourhood of $i$, including $i$ itself. A suitable expression for the $\eta$ function is therefore
$$\eta(x_1, \dots, x_n) = \sum_{x \in C} \prod_{i=1}^{n} \left[1 - \delta(x_i, x)\right], \tag{2}$$
where $\delta(x, y)$ is a Kronecker delta, and $n$ is the number of entries of the $\eta$ function (not a priori fixed). With the above definitions, the cost function value is $E(x_1, \dots, x_N) = 0$ if and only if the colour assignments $x_1, \dots, x_N$ satisfy all constraints.

In the current work, we deal with this problem by studying an equivalent “thermodynamic” system, whose potential energy is defined by the cost function $E(x_1, \dots, x_N)$. For energy minimisation, we consider the zero temperature limit. The BP approach allows us to determine approximate marginals of the equilibrium (Boltzmann) probability distribution for the colour variables. As mentioned in the Introduction, our approximation becomes exact when the graph is a tree. From the treatment described in Appendix A, it turns out that we can write two different marginals, namely, the joint distribution of two colour variables on a graph edge, $p_{i,j}(x_i, x_j)$, and the joint distribution of a given colour variable together with its neighbours, $p_{i,\partial i}(x_i, x_{\partial i})$ (“cluster” distribution), as a function of pairwise messages $m_{j \to i}(x_j, x_i)$.
Each given term $m_{j \to i}(x_j, x_i)$ may be viewed as a message sent from the cluster $\{j, \partial j\}$ to the edge $\{i, j\}$, representing the influence of the constraint associated with the vertex $j$ on the colour variables of the edge $\{i, j\}$ (some details about this interpretation are elucidated in Appendix B). In formulae, we have
$$p_{i,j}(x_i, x_j) = \mathrm{e}^{f_{ij}}\, m_{i \to j}(x_i, x_j)\, m_{j \to i}(x_j, x_i), \tag{3}$$
$$p_{i,\partial i}(x_i, x_{\partial i}) = \mathrm{e}^{f_i - \beta \eta(x_i, x_{\partial i})} \prod_{j \in \partial i} m_{j \to i}(x_j, x_i), \tag{4}$$
where $\beta$ is the inverse temperature, and $f_{ij}$ and $f_i$, usually called free energy shifts (see Appendix A), can be determined by normalisation as
$$\mathrm{e}^{-f_{ij}} = \sum_{x_i, x_j} m_{i \to j}(x_i, x_j)\, m_{j \to i}(x_j, x_i), \tag{5}$$
$$\mathrm{e}^{-f_i} = \sum_{x_i, x_{\partial i}} \mathrm{e}^{-\beta \eta(x_i, x_{\partial i})} \prod_{j \in \partial i} m_{j \to i}(x_j, x_i). \tag{6}$$
The messages have to satisfy a set of self-consistency equations, which basically account for compatibility between “overlapping” distributions. For instance, the $\{i, j\}$ edge distribution must be a marginal of the cluster distributions associated with both vertices $i$ and $j$. Considering the former case, we can write
$$p_{i,j}(x_i, x_j) = \sum_{x_{\partial i \setminus j}} p_{i,\partial i}(x_i, x_{\partial i}), \tag{7}$$
where the sum runs over the values of the array of colour variables $x_{\partial i \setminus j} \equiv \{x_k\}_{k \in \partial i \setminus j}$, i.e., the colour variables in the neighbourhood of $i$ except $x_j$. In fact, we can obtain the self-consistency equation by substituting (3) and (4) into the compatibility equation (7), yielding
$$m_{i \to j}(x_i, x_j) \propto \sum_{x_{\partial i \setminus j}} \mathrm{e}^{-\beta \eta(x_i, x_{\partial i})} \prod_{k \in \partial i \setminus j} m_{k \to i}(x_k, x_i), \tag{8}$$
where a normalisation factor has been replaced by the proportionality symbol. In order to satisfy all the necessary compatibilities, one equation of the above form must hold for each directed edge $i \to j$.
The BP algorithm solves the set of self-consistency equations iteratively, starting from suitable (usually random or uniform) initial conditions for the messages, until the distance between messages at subsequent updates goes below a given threshold. From a heuristic point of view, each message update according to (8) is usually interpreted as a propagation process, so that in the following we shall also denote (8) as the propagation equation. For completeness, in Appendix B we also report the propagation equations of the naive BP algorithm, which are numerically simpler.

We note that, by employing the explicit expression (2) of the elementary energy term (cluster energy), we can significantly reduce the computational cost of the propagation equation (8) as well. Indeed, it turns out that the latter can be rewritten as
$$m_{i \to j}(x_i, x_j) \propto \sum_{B \subseteq C \setminus x_i \setminus x_j} \left(\mathrm{e}^{-\beta} - 1\right)^{|B|} \prod_{k \in \partial i \setminus j} \sum_{x_k \in C \setminus B} m_{k \to i}(x_k, x_i), \tag{9}$$
where the outer sum runs over all the possible subsets $B$ of the colour set $C$ without the colours $x_i, x_j$. The derivation can be found in Appendix C. Now, we compare the computational cost of the generic equation with that of the simplified form. Assuming that $d$ is the degree of vertex $i$, the generic equation (8) requires $(d-1)\,q^{d-1}$ multiplications, which can be reduced to $2q^{d-1} + \sum_{n=2}^{d-1} q^{d-n}$ by suitable (straightforward) programming tricks. Taking into account that a trivial necessary condition for an elementary constraint to be satisfiable is $d \geq q - 1$, the leading term of the computational cost turns out to be at least $2q^{q-2}$. The simplified equation (9), however, requires $(d-1)\,2^{q-2}$ multiplications, which is clearly much more convenient for any $q >$ [...]
$$\mathrm{e}^{-f_i} = \sum_{x_i \in C} \sum_{B \subseteq C \setminus x_i} \left(\mathrm{e}^{-\beta} - 1\right)^{|B|} \prod_{j \in \partial i} \sum_{x_j \in C \setminus B} m_{j \to i}(x_j, x_i), \tag{10}$$
which can be obtained by an analogous derivation.
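As a sanity check on the simplification, the sketch below evaluates a single message update in two ways: by the generic sum of the propagation equation (8) and by the expansion over colour subsets $B$ of equation (9). It relies on the identity $\mathrm{e}^{-\beta\eta} = \prod_{x \in C}[1 + (\mathrm{e}^{-\beta}-1) I_x]$, where $I_x = 1$ if colour $x$ is missing from the cluster, which follows from (2). This is only an illustration under stated assumptions: colours are labelled 0, ..., q-1, messages are stored as nested lists `m[x_k][x_i]`, results are left unnormalised, and the function names are hypothetical rather than taken from the paper.

```python
import itertools
import math


def eta(colours, q):
    """Number of colours in {0, ..., q-1} missing from `colours`, as in eq. (2)."""
    return q - len(set(colours))


def propagate_generic(incoming, q, beta):
    """Unnormalised m_{i->j}[x_i][x_j] from the generic sum, eq. (8).

    incoming[k][x_k][x_i] holds the message m_{k->i}(x_k, x_i) (assumed layout).
    """
    out = [[0.0] * q for _ in range(q)]
    for xi in range(q):
        for xj in range(q):
            # Sum over all colour assignments of the neighbours of i other than j.
            for xs in itertools.product(range(q), repeat=len(incoming)):
                weight = math.exp(-beta * eta((xi, xj) + xs, q))
                for m, xk in zip(incoming, xs):
                    weight *= m[xk][xi]
                out[xi][xj] += weight
    return out


def propagate_simplified(incoming, q, beta):
    """Same quantity from the subset expansion, eq. (9)."""
    out = [[0.0] * q for _ in range(q)]
    for xi in range(q):
        for xj in range(q):
            # B ranges over subsets of the colour set without x_i and x_j.
            free = [x for x in range(q) if x not in (xi, xj)]
            for size in range(len(free) + 1):
                for B in itertools.combinations(free, size):
                    term = (math.exp(-beta) - 1.0) ** size
                    for m in incoming:
                        term *= sum(m[xk][xi] for xk in range(q) if xk not in B)
                    out[xi][xj] += term
    return out
```

The generic routine loops over $q^{d-1}$ neighbour configurations per message entry, whereas the simplified one loops over subsets of at most $q-1$ colours, which is the source of the saving discussed above.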
3. Optimisation strategy and numerical results
In this section, we define the optimisation strategy, and test its performance on single instances of random graphs drawn from a suitable ensemble. Our strategy involves a decimation procedure, which is analogous to that of [2], but is carried out on the basis of nearest-neighbour pair distributions $p_{i,j}(x_i, x_j)$, rather than single-variable distributions. Given a graph and a number $q$ of available colours, we first fix the colour of a randomly chosen vertex, in order to break the colour permutation symmetry, and proceed as follows. We perform the first BP run (starting from uniform messages) and determine the pair distributions according to (3). For each edge $\{i, j\}$, we fix the colour variables $x_i, x_j$ at the values $\bar{x}_i, \bar{x}_j$ having the largest joint probability, provided the latter is larger than a certain threshold. If no probability satisfies such a condition, we only fix the pair of variables with the largest joint probability over the whole graph. Then, we rerun BP (starting from the previously computed messages) and iterate the above procedure until all variables are fixed, or all constraints are satisfied (in the latter case, non-fixed variables can be assigned a random colour). We always set the threshold probability at 0.9, as done in [2]. We observe that, in most cases, one of the two variables chosen to be fixed has already been fixed at a previous stage of the decimation procedure, so that we actually fix just one variable for each given pair. Therefore, even though we are working with pair, rather than single-variable, distributions, choosing the same threshold probability results in a similar decimation rate.

We now spend a few words on the precise meaning of “fixing a variable”, as introduced above, from the point of view of the message-passing procedure. In the thermodynamic language, colouring a vertex is tantamount to imposing an infinite energy penalty on all other possible colours. Thus, if we want to fix a single variable $x_i$ to a given colour $\bar{x}_i$, we may add to the corresponding cluster energy $\eta(x_i, x_{\partial i})$ a term $\gamma\,[1 - \delta(x_i, \bar{x}_i)]$, and then take the limit $\gamma \to \infty$. By the propagation equation (8), it is easy to see that such operations imply that all the messages $m_{i \to j}(x_i, x_j)$ sent from the vertex $i$ (more precisely, from the cluster associated with the vertex $i$) must be multiplied by a prefactor $\delta(x_i, \bar{x}_i)$, which basically preserves only messages of the type $m_{i \to j}(\bar{x}_i, x_j)$. As a consequence, when we fix the colours of two nearby vertices, it turns out that the latter no longer need to exchange messages or, in other words, the messages remain fixed at
$$m_{i \to j}(x_i, x_j) = m_{j \to i}(x_j, x_i) = \delta(x_i, \bar{x}_i)\, \delta(x_j, \bar{x}_j). \tag{11}$$
Although such messages have no effect on the vertices $i$ and $j$ themselves, due to the form of the propagation equation, they may still influence their neighbourhoods $\partial i \setminus j$ and $\partial j \setminus i$.

Before presenting the results, we note that in [2] the authors observe that the naive BP hardly ever converges.
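The selection rule of a single decimation step, and the clamping of messages sent by a fixed vertex, can be sketched as follows. This is a minimal illustration, not the authors' code: the dictionary layout `pair_marginals[(i, j)][x_i][x_j]`, the message layout `message[x_i][x_j]`, and the names `select_pairs_to_fix` and `clamp_message` are assumptions made here. The sketch also ignores the bookkeeping of variables that were already fixed at earlier stages.

```python
def select_pairs_to_fix(pair_marginals, threshold=0.9):
    """Choose which edge variables to fix in one decimation step.

    pair_marginals maps an edge (i, j) to a table p[x_i][x_j] (assumed layout).
    Returns a list of (i, j, x_i, x_j) tuples.
    """
    best = []  # most probable joint colouring of each edge
    for (i, j), table in pair_marginals.items():
        p, xi, xj = max(
            (table[a][b], a, b)
            for a in range(len(table))
            for b in range(len(table[a]))
        )
        best.append((p, i, j, xi, xj))
    # Fix every edge whose best joint probability exceeds the threshold.
    chosen = [(i, j, xi, xj) for p, i, j, xi, xj in best if p > threshold]
    # Otherwise fix only the single most probable pair over the whole graph.
    if not chosen and best:
        p, i, j, xi, xj = max(best)
        chosen = [(i, j, xi, xj)]
    return chosen


def clamp_message(message, fixed_colour):
    """Multiply m_{i->j}(x_i, x_j) by delta(x_i, fixed_colour).

    message[x_i][x_j] is the assumed layout; rows with x_i != fixed_colour vanish.
    """
    return [
        [value if xi == fixed_colour else 0.0 for value in row]
        for xi, row in enumerate(message)
    ]
```

After each call, BP would be rerun from the current messages with the clamped ones held fixed, as described above.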
This problem is circumvented by computing probability distributions as “time-averages” over a number of iterations, which turns out to provide sufficient information for guiding the decimation procedure. In our scheme, the BP algorithm turns out to converge more frequently, except in the vicinity [...] $\alpha$) and the updates obtained from the propagation equation (with coefficient $1 - \alpha$). The adjustable parameter $\alpha$ plays the role of a damping in the propagation dynamics, and we refer to it as the damping parameter. Nevertheless, we generally find that reaching convergence is not really necessary. Indeed, a very small number $\nu$ of sequential updates (+) of all messages is sufficient to provide the relevant information about pair probabilities, and a larger number of iterations does not significantly improve the overall algorithm performance. This fact allows us to drastically reduce the computational cost of the full procedure, although it does not affect the complexity of a single iteration.

(+) With reference to the propagation equation (8), by sequential update we mean that, in generating a given “output” (left-hand side) message, one makes use of updated “input” (right-hand side) messages, if already available.

We are now in a position to perform a quantitative comparison with the naive BP approach [2]. As in the cited work, we consider a number of available colours $q = 4$ and random graphs with $N = 1000$ vertices. Graphs are generated in such a way as to have vertices with two different degrees, $d = \lfloor c \rfloor$ and $d = \lceil c \rceil$, where $c$ is the mean degree. The degree distribution, i.e., the probability of a vertex having degree $d$, is therefore
$$\rho_d = \begin{cases} \lceil c \rceil - c & \text{if } d = \lfloor c \rfloor, \\ c - \lfloor c \rfloor & \text{if } d = \lceil c \rceil, \end{cases} \tag{12}$$
[...] linear distribution. We always assume $c \geq q - 1$, in order to avoid the appearance of vertices with degree less than $q - 1$, for which the local constraints are necessarily unsatisfiable. We do not report results about graphs with cut-Poissonian degree distribution [2], which exhibit analogous behaviour.

In figure 1 we report both perfect colouring and unsatisfaction measures, over 1000 random graph samples, as a function of the mean degree. The perfect colouring measure is simply defined as the fraction of samples for which the algorithm has been able to find a colour assignment satisfying all constraints. The unsatisfaction measure counts the fraction of missing colours per vertex, i.e., the energy per vertex divided by the total number of colours, $E(x_1, \dots, x_N)/Nq$ ($x_1, \dots, x_N$ being the colour assignments found by the algorithm), averaged over all samples. We can see that the BP approach improves on the naive one in both respects. The perfect colouring measure turns out to be consistently increased in the vicinity of the critical mean degree values, below which it rapidly vanishes. In this region, naive BP itself was already found to work better than the Walksat-like algorithm analysed in [2].

[Figure 1 panels: perfect colouring (%) versus mean degree; unsatisfaction (%) versus mean degree.]
Figure 1. Perfect colouring (left) and unsatisfaction (right) measures over 1000 graphs for naive BP [2] (open squares) and BP with $\nu = 3$ and $\alpha = 0.1$ [...] $\beta = 10$.

In analogy with the ordinary colouring problem [18] (though with reversed role for the mean degree $c$), we expect that, for even lower $c$ values, our problem becomes unsatisfiable with high probability (i.e., with probability tending to 1 in the “thermodynamic” $N \to \infty$ limit). We also expect the presence of an intermediate hard-satisfiable phase in which the problem is satisfiable with high probability but BP fails, because of a clustered structure of the solution space (replica-symmetry breaking) [16, 17, 18, 19, 20]. Accordingly, the perfect colouring probability falling to zero is likely to indicate the onset of such a hard-satisfiable phase rather than the truly unsatisfiable phase. We shall return to this point later. For the moment, we observe that the BP approach definitely works better than the naive one, even for very low $c$ values, in the (expected) unsatisfiable phase. In this region we observe a reduction both of the unsatisfaction measure itself and of its growth rate with decreasing $c$.

Concerning the percentage of perfect colouring, we have noticed that the performance of the algorithm is significantly affected by the number $\nu$ of iterations per decimation step only in a narrow region close to the critical $c$ value. This suggests that in this region the problem is actually more difficult to solve. Some results about the influence of the $\nu$ parameter are reported in figure 2. Upon increasing $\nu$, some improvement can also be observed in the unsatisfaction measure. However, as previously mentioned, increasing $\nu$ beyond 2 or 3 does not yield any further significant improvement. We also note that a quantitatively comparable improvement of the unsatisfaction measure is obtained by choosing a small but nonzero value of the damping parameter $\alpha$. All the results reported in the current paper have been obtained with $\alpha = 0.1$, but it turns out that in a rather large range ($0.[\dots] \lesssim \alpha \lesssim 0.3$) the average algorithm performance is practically independent of the precise value of the damping parameter. Finally, we note that (for $\nu \geq 2$) the perfect colouring measure exhibits a slight kink at $c = 4.0$. This can be ascribed to an abrupt change in the structure of the graph ensemble. In fact, according to the linear degree distribution (12), for $c = 4$ all vertices have exactly degree 4, whereas, for $c > 4$ or $c < 4$, [...]

[Figure 2 panels: perfect colouring (%) versus mean degree; unsatisfaction (%) versus mean degree.]
Figure 2.
Perfect colouring (left) and unsatisfaction (right) measures over 1000 graphs for BP with $\alpha = 0.1$ and $\beta = 10$, as a function of the mean degree. Squares, circles, triangles denote $\nu = 1, 2, 3$, respectively. In the main figures, interpolation between data-points in the transition region has been performed by taking into account the extra data-points reported in the insets.

[Figure 3 panels: perfect colouring (%) versus mean degree; unsatisfaction (%) versus mean degree.]
Figure 3.
Perfect colouring (left) and unsatisfaction (right) measures over 1000 graphs for BP with $\nu = 2$, $\alpha = 0.1$, and $\beta = 10$, as a function of the mean degree. Squares, circles, triangles denote number of vertices $N = 1000$, [...], [...]

We have also analysed the algorithm behaviour as a function of the number of vertices $N$. The results are reported in figure 3. We can see that the transition in the perfect colouring probability becomes more and more abrupt upon increasing $N$, and a cross-over point appears at a mean degree value $c \approx$ [...] $N \to \infty$ limit. The latter conjecture is consistent with the fact that random graphs of increasing size become more and more tree-like, such that the BP approach is able to provide better and better approximations. In principle, the cross-over point might be the signature of the satisfiable-to-unsatisfiable transition, but, as previously mentioned, we are rather led to identify it with the onset [...]
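For reference, two ingredients of the numerical comparison in this section, the two-valued (“linear”) degree distribution and the unsatisfaction measure $E/(Nq)$, can be sketched as below. The function names and the adjacency-dictionary layout are assumptions made for illustration. For integer $c$ the sketch returns a single degree, in line with the remark that for $c = 4$ all vertices have exactly degree 4.

```python
import math


def linear_degree_distribution(c):
    """Two-valued degree distribution with mean degree c.

    Returns a dict {degree: probability}; a single degree for integer c.
    """
    lo, hi = math.floor(c), math.ceil(c)
    if lo == hi:
        return {lo: 1.0}
    return {lo: hi - c, hi: c - lo}


def unsatisfaction(adjacency, colours, q):
    """Fraction of missing colours per vertex, E / (N q).

    adjacency maps each vertex to an iterable of its neighbours, and
    colours maps each vertex to a colour in {0, ..., q-1} (assumed layout).
    """
    missing = 0
    for i, neighbours in adjacency.items():
        present = {colours[i]} | {colours[j] for j in neighbours}
        missing += q - len(present)  # elementary energy of the cluster of i
    return missing / (len(adjacency) * q)
```

A colouring is perfect exactly when `unsatisfaction` returns zero, so the perfect colouring measure is the fraction of samples for which that happens.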
4. Entropy and satisfiability threshold
In this section, we study average macroscopic properties of the BP solution over random graph ensembles, with particular attention to the average entropy. The latter is usually denoted as quenched entropy in statistical mechanics language. Taking the limit $\beta \to \infty$, this quantity provides an average measure of (the logarithm of) the number of zero energy configurations, i.e., perfect colourings, for a given ensemble, which also allows us to estimate the satisfiability threshold. In this context, the main source of approximation will be the replica-symmetry assumption, since the approximation due to BP itself is expected to be negligible in the infinite size limit. Furthermore, we limit the analysis to BP solutions that do not break the colour permutation symmetry (“paramagnetic” solutions), because we have numerical evidence that, when BP converges, no spontaneous symmetry breaking of the solution is ever observed. Average properties of non-paramagnetic (glass-like) solutions have been investigated in [3], but they only appear at very low $c$ values, where the replica-symmetry assumption is expected to break down anyway.

According to the paramagnetic ansatz, the messages are always such that $m_{i \to j}(x, x)$ does not depend on $x$, and $m_{i \to j}(x, y)$ does not depend on $x, y$, if $x \neq y$. This means that the only important quantity is $u_{i \to j} \equiv m_{i \to j}(x, x)/m_{i \to j}(x, y)$ with $x \neq y$, i.e., the ratio between the “equal colours” message and the “different colours” message. Taking into account that the message normalisation is irrelevant to all observable quantities, we can write the full message as
$$m_{i \to j}(x, y) = 1 - (1 - u_{i \to j})\, \delta(x, y) = \begin{cases} u_{i \to j} & \text{if } x = y, \\ 1 & \text{if } x \neq y. \end{cases} \tag{13}$$
We note that in principle one could also think about the inverse ratio $m_{i \to j}(x, y)/m_{i \to j}(x, x)$ as the relevant message, but this choice turns out to be unfeasible, due to the nature of the constraints, which favour the presence of different neighbouring colours.
Indeed, at zero temperature, it is easy to foresee the emergence of “hard” messages such that $m_{i \to j}(x, x) = 0$, stemming from vertices with degree $q - 1$, [...] $m_{i \to j}(x, y) = 0$ for $x \neq y$ in a paramagnetic state.

Substituting (13) into the inner sum appearing in the simplified propagation equation (9), we can write
$$\sum_{x_k \in C \setminus B} m_{k \to i}(x_k, x_i) = q - |B| - 1 + u_{k \to i}, \tag{14}$$
where the term $-1 + u_{k \to i}$ appears because $x_i \notin B$. Since the sum above only depends on $B$ via its cardinality $|B|$, in (9) we can replace the sum over $B$ by a sum over [...]
$$u_{i \to j} = \frac{\displaystyle\sum_{n=0}^{q-1} \binom{q-1}{n} (-1)^n \prod_{k \in \partial i \setminus j} (q - n - 1 + u_{k \to i})}{\displaystyle\sum_{n=0}^{q-2} \binom{q-2}{n} (-1)^n \prod_{k \in \partial i \setminus j} (q - n - 1 + u_{k \to i})}, \tag{15}$$
in which we have also taken the zero temperature ($\beta \to \infty$) limit. The cluster free energy shift can be similarly derived by substituting (13) into (10). In the zero temperature limit, we obtain
$$\mathrm{e}^{-f_i} = q \sum_{n=0}^{q-1} \binom{q-1}{n} (-1)^n \prod_{j \in \partial i} (q - n - 1 + u_{j \to i}). \tag{16}$$
The edge free energy shifts can be directly obtained by inserting (13) into (5):
$$\mathrm{e}^{-f_{ij}} = q\,(q - 1 + u_{i \to j}\, u_{j \to i}). \tag{17}$$

We can characterise a random graph ensemble by a probability distribution of messages $P(u)$. Such a distribution has to obey a functional equation (usually known as the cavity equation [16]) of the following form
$$P(u) = \sum_{d} \tilde{\rho}_d \int \mathrm{d}u_1\, P(u_1) \cdots \int \mathrm{d}u_{d-1}\, P(u_{d-1})\; \delta\big(u - \hat{u}(u_1, \dots, u_{d-1})\big), \tag{18}$$
where $\hat{u}(u_1, \dots, u_{d-1})$ is the “propagation function” defined by (15), and where $\tilde{\rho}_d$ is the probability of finding a vertex of degree $d$ by choosing a random direction in a randomly selected edge. It is easy to see that $\tilde{\rho}_d$ is related to the degree distribution $\rho_d$ as
$$\tilde{\rho}_d = \frac{d\, \rho_d}{c}. \tag{19}$$
In the context of the cavity method, the replica-symmetry assumption consists in the fact that we consider a single distribution of messages.
In a replica-symmetry breaking scenario, each propagated quantity $u_{i \to j}$ (message) would be replaced by a probability distribution defined over different ergodic components (states) [16].

We solve the functional equation (18) numerically by a population dynamics approach [16]. In a nutshell, we represent the distribution $P(u)$ by an evolving population of messages. An elementary evolution step consists in generating a new message according to the propagation equation (15), making use of $d - 1$ [...] $d$ is randomly generated according to the $\tilde{\rho}_d$ distribution. The newly generated message replaces a randomly selected message of the population. Due to the presence of hard messages $u = 0$ generated by degree $q - 1$ [...] $P(u)$ contains a Dirac delta peak centred at zero with weight $\tilde{\rho}_{q-1}$.

[Figure 4 panels: entropy per vertex versus mean degree; hard messages (%) versus mean degree.]
Figure 4.
Entropy per vertex (left) and fraction of hard messages (right) for random graphs with linear degree distribution ($q = 4$), as a function of the mean degree $c$.

From the message distribution, we can evaluate the average cluster and edge free energy shifts as
$$f_c = \sum_{d} \rho_d \int \mathrm{d}u_1\, P(u_1) \cdots \int \mathrm{d}u_d\, P(u_d)\; f_c(u_1, \dots, u_d), \tag{20}$$
$$f_e = \int \mathrm{d}u_1\, P(u_1) \int \mathrm{d}u_2\, P(u_2)\; f_e(u_1, u_2), \tag{21}$$
where the functions $f_c(u_1, \dots, u_d)$ and $f_e(u_1, u_2)$ are defined by (16) and (17). Thus we obtain the average free energy per vertex as
$$f = f_c - \frac{c}{2}\, f_e, \tag{22}$$
where $c/2$ [...] $\beta$ factor in our free energy definition, and since the limit $\beta \to \infty$ fixes the energy at zero, the entropy per vertex is simply $s = -f$.

For actual calculations, we have considered random graph ensembles with the linear degree distribution (as defined in the previous section), and with the cut-Poissonian distribution (also considered in [2]), defined as
$$\rho_d = \begin{cases} \mathrm{e}^{-(c - q + 1)}\, \dfrac{(c - q + 1)^{d - q + 1}}{(d - q + 1)!} & \text{if } d \geq q - 1, \\ 0 & \text{otherwise,} \end{cases} \tag{23}$$
where $c$ is still the mean degree. This distribution also excludes vertices with degree smaller than $q - 1$ [...] $\tilde{\rho}_{q-1}$ for the linear degree distribution turns out to be nonzero only for $c < q$, which explains the kink observed in the entropy function. Negative entropy identifies the unsatisfiable region (perfect colourings are exponentially rare), whereas the zero entropy point identifies the satisfiability threshold $c_{\mathrm{th}}$. For the two ensembles, we respectively find $c_{\mathrm{th}} \approx$ [...]825 and $c_{\mathrm{th}} \approx$ [...] $c_{\mathrm{th}} \approx$ [...] $c_{\mathrm{th}} \approx$ [...]1. As far as the linear ensemble is concerned, we expect that our result is also analytically equivalent to the (replica-symmetric) one by Wong and Saad [3], and in fact we obtain a very good numerical agreement for the threshold value.

[Figure 5 panels: entropy per vertex versus mean degree; hard messages (%) versus mean degree.]
Figure 5. The same as figure 4 for random graphs with cut-Poissonian degree distribution.
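The population dynamics procedure of this section can be sketched in a few lines. The propagation function below is the zero-temperature paramagnetic update, written as the ratio of the “equal colours” sum (subsets of $q-1$ colours) to the “different colours” sum (subsets of $q-2$ colours), each inner factor being $q - n - 1 + u$. Everything else is an assumption made for illustration: the function names, the initial population of uniform messages $u = 1$, the population size, the number of steps, and the clipping of tiny negative rounding errors in the numerator.

```python
import math
import random


def u_hat(us, q):
    """Zero-temperature paramagnetic propagation function, as in eq. (15)."""
    def alternating_sum(top):
        total = 0.0
        for n in range(top + 1):
            product = 1.0
            for u in us:
                product *= q - n - 1 + u
            total += math.comb(top, n) * (-1) ** n * product
        return total
    # Clip rounding noise: the numerator vanishes for degree q-1 vertices.
    return max(alternating_sum(q - 1), 0.0) / alternating_sum(q - 2)


def edge_degree_distribution(rho):
    """Degree distribution seen from a random edge end: d * rho_d / c."""
    c = sum(d * p for d, p in rho.items())
    return {d: d * p / c for d, p in rho.items()}


def population_dynamics(rho, q, pop_size=2000, steps=20000, seed=0):
    """Evolve a population of messages representing P(u) (hypothetical settings)."""
    rng = random.Random(seed)
    rho_tilde = edge_degree_distribution(rho)
    degrees = list(rho_tilde)
    weights = [rho_tilde[d] for d in degrees]
    population = [1.0] * pop_size  # assumed start: uniform messages
    for _ in range(steps):
        d = rng.choices(degrees, weights)[0]
        inputs = [rng.choice(population) for _ in range(d - 1)]
        population[rng.randrange(pop_size)] = u_hat(inputs, q)
    return population
```

With $q = 4$ and degrees 3 and 4 in equal proportion, the fraction of (numerically) vanishing messages in the returned population should settle close to the edge-perspective weight of degree-3 vertices, which gives a quick consistency check on the hard-message peak.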
5. Summary and conclusions
In this paper, we have considered a variation of the well-known graph colouring problem, which may be viewed as the prototype of a combinatorial optimisation problem emerging in the context of distributed data storage. We have worked out the BP equations for this problem, which provide the exact solution on a tree. Due to the many-body nature of the problem, such equations turn out to be different from the naive BP message-passing scheme, as the latter involves messages sent to single variables, whereas the former involve messages sent to pairs of nearest-neighbour variables. Our simulations, performed on random graphs drawn from a suitable ensemble, suggest that the new algorithm, associated with a decimation procedure, is much more effective than the naive BP-based algorithm. In particular, the probability of finding a perfect colouring is significantly enhanced, especially in the vicinity of the satisfiable-to-unsatisfiable transition. Furthermore, both the unsatisfaction measure and its growth rate upon decreasing the average graph connectivity are significantly reduced. This improved performance is, however, obtained at the cost of increased [...]
Appendix A. Belief Propagation equations
The BP equations can in general be derived from a very simple recipe. One first "fakes" that the graph is a tree and then formally applies the equations obtained for such a case to a generic graph. This derivation also provides a heuristic argument explaining why the method generally works better for graphs with a tree-like structure.

According to the Boltzmann law, the joint probability distribution of all the colour variables can be written as
$$ p(x_1,\dots,x_N) = e^{F - \beta E(x_1,\dots,x_N)} , \tag{A.1} $$
where $E(x_1,\dots,x_N)$ is the energy function (1), $\beta$ is the inverse temperature, and $F$ is the free energy (times $\beta$), which can be determined by normalisation. Following our "fake assumption", we can consider, for each edge $i \to j$ (defined with a direction), the branch growing from the root vertex $j$ towards $i$, disconnected from the remainder of the system (see figure A1). We can thus define a partial energy function $E_{i\to j}(x_{i\to j})$, obtained by summing the elementary interaction energies only for vertices in the branch, except the root vertex. Since our elementary interaction energies couple together clusters of variables including each vertex and all its neighbours, each partial energy function depends on the array of all colour variables in the branch including the root vertex. We denote this array by $x_{i\to j}$.

Figure A1. Tree graph (left), disconnected branch $i \to j$ (centre), and decomposition of the latter into subbranches $k \to i$, for $k \in \partial i \setminus j$, plus the elementary cluster associated to $i$ (right).

Now, each disconnected branch can be ideally studied as an independent subsystem, whose Boltzmann probability distribution turns out to be
$$ p_{i\to j}(x_{i\to j}) = e^{F_{i\to j} - \beta E_{i\to j}(x_{i\to j})} , \tag{A.2} $$
where $F_{i\to j}$ denotes the corresponding free energy. Note that it is possible to decompose the partial energy of the given branch $i \to j$ into a sum of the partial energies of its subbranches $k \to i$, for all $k \in \partial i \setminus j$, plus the elementary interaction energy associated to $i$ (see figure A1):
$$ E_{i\to j}(x_{i\to j}) = \eta(x_i, x_{\partial i}) + \sum_{k \in \partial i \setminus j} E_{k\to i}(x_{k\to i}) . \tag{A.3} $$
We also define a free energy shift $f_{i\to j}$ as the difference between the free energy of the $i \to j$ disconnected branch and the sum of free energies of its (disconnected) subbranches, i.e.,
$$ F_{i\to j} = f_{i\to j} + \sum_{k \in \partial i \setminus j} F_{k\to i} . \tag{A.4} $$
From (A.2), (A.3), and (A.4), we can write
$$ p_{i\to j}(x_{i\to j}) = e^{f_{i\to j} - \beta \eta(x_i, x_{\partial i})} \prod_{k \in \partial i \setminus j} p_{k\to i}(x_{k\to i}) , \tag{A.5} $$
which provides a relationship between the Boltzmann distribution of the $i \to j$ disconnected branch and those of its (disconnected) subbranches. Defining the messages $m_{i\to j}(x_i, x_j)$ as marginals of the corresponding branch distribution $p_{i\to j}(x_{i\to j})$ over the variables $x_j$ and $x_i$ (respectively, the root vertex and its first neighbour in the branch), we finally obtain the self-consistency equation (8).

We still have to show how messages can determine cluster and edge marginals of the full Boltzmann distribution (A.1). As in our previous manipulations, we observe that, for each given vertex $i$, it is possible to write the total energy function (1) as a sum of the partial energies of the branches $j \to i$, for all $j \in \partial i$, plus the elementary interaction energy associated to $i$:
$$ E(x_1,\dots,x_N) = \eta(x_i, x_{\partial i}) + \sum_{j \in \partial i} E_{j\to i}(x_{j\to i}) . \tag{A.6} $$
Defining also the free energy shift $f_i$ as the difference between the total free energy $F$ and the sum of the disconnected branch free energies, for all the possible branches growing from vertex $i$, i.e.,
$$ F = f_i + \sum_{j \in \partial i} F_{j\to i} , \tag{A.7} $$
from (A.1), (A.2), (A.6), and (A.7), we easily obtain
$$ p(x_1,\dots,x_N) = e^{f_i - \beta \eta(x_i, x_{\partial i})} \prod_{j \in \partial i} p_{j\to i}(x_{j\to i}) . \tag{A.8} $$
Now, the cluster distribution $p_{i,\partial i}(x_i, x_{\partial i})$ for each vertex $i$ can be derived as a suitable marginal of $p(x_1,\dots,x_N)$. By this marginalisation, we obtain (4).

As far as edge marginals are concerned, we have to consider a different decomposition of the total energy function. Namely, for each edge $\{i,j\}$, the total energy can be written as a sum of two contributions, one from the branch starting from $j$ towards $i$ and one from the branch starting from $i$ towards $j$:
$$ E(x_1,\dots,x_N) = E_{i\to j}(x_{i\to j}) + E_{j\to i}(x_{j\to i}) . \tag{A.9} $$
We define the free energy shift $f_{ij}$ as the difference between the total free energy $F$ and the sum of the free energies of the disconnected branches mentioned above, i.e.,
$$ F = f_{ij} + F_{i\to j} + F_{j\to i} . \tag{A.10} $$
From (A.1), (A.2), (A.9), and (A.10), we obtain
$$ p(x_1,\dots,x_N) = e^{f_{ij}} \, p_{i\to j}(x_{i\to j}) \, p_{j\to i}(x_{j\to i}) . \tag{A.11} $$
Evaluating the edge distribution $p_{i,j}(x_i, x_j)$ as a marginal of $p(x_1,\dots,x_N)$, we obtain (3).

Finally, we determine the total free energy as a function of the free energy shifts. First we sum both sides of (A.7) over all vertices $i$, and both sides of (A.10) over all edges $\{i,j\}$. Then we subtract the latter equation from the former. It is easy to see that, on a tree, the number of vertices equals the number of edges plus one, so that the left-hand side of the resulting equation turns out to be exactly $F$.
Furthermore, in the right-hand side all the branch free energies cancel out, and we obtain
$$ F = \sum_{i=1}^{N} f_i - \sum_{\{i,j\}} f_{ij} , \tag{A.12} $$
where $\sum_{\{i,j\}}$ denotes the sum over all edges.

Figure B1. A simple undirected graph (left), and the related factor graphs giving rise to naive BP (centre) and BP (right). Open circles and squares denote variable and function nodes, respectively. The labels are explained in the text.
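The tree argument of Appendix A can be checked numerically on a small example. The following Python sketch iterates the pair-message recursion obtained by marginalising (A.5) over all variables except $x_i$ and $x_j$, builds edge marginals from the product of the two opposite messages, as suggested by (A.11), and compares them with a brute-force enumeration of (A.1). The 5-vertex tree, the values $q = 3$ and $\beta = 1$, and all function names are illustrative assumptions rather than material from the paper; the elementary energy is taken in the form (C.6).

```python
import itertools
import math

# Toy instance (an assumption for illustration): a 5-vertex tree, Q = 3 colours.
EDGES = [(0, 1), (1, 2), (1, 3), (3, 4)]
N, Q, BETA = 5, 3, 1.0
NBRS = {i: [] for i in range(N)}
for a, b in EDGES:
    NBRS[a].append(b)
    NBRS[b].append(a)


def eta(xi, x_nbrs):
    """Elementary energy as in (C.6): colours other than xi absent from the neighbours."""
    return sum(1 for c in range(Q) if c != xi and c not in x_nbrs)


def energy(x):
    """Total energy: sum of the elementary terms over all vertices."""
    return sum(eta(x[i], [x[k] for k in NBRS[i]]) for i in range(N))


def normalise(table):
    z = sum(map(sum, table))
    return [[v / z for v in row] for row in table]


def exact_edge_marginal(i, j):
    """Edge marginal of the Boltzmann distribution (A.1) by full enumeration."""
    p = [[0.0] * Q for _ in range(Q)]
    for x in itertools.product(range(Q), repeat=N):
        p[x[i]][x[j]] += math.exp(-BETA * energy(x))
    return normalise(p)


def update(i, j, msgs):
    """m_{i->j}(x_i, x_j) from the incoming m_{k->i}(x_k, x_i), k a neighbour of i other than j."""
    others = [k for k in NBRS[i] if k != j]
    new = [[0.0] * Q for _ in range(Q)]
    for xi, xj in itertools.product(range(Q), repeat=2):
        for xo in itertools.product(range(Q), repeat=len(others)):
            w = math.exp(-BETA * eta(xi, [xj, *xo]))
            for k, xk in zip(others, xo):
                w *= msgs[(k, i)][xk][xi]
            new[xi][xj] += w
    return normalise(new)


def run_bp(sweeps=10):
    """Parallel updates; 10 sweeps are more than enough to reach the fixed point on this tree."""
    directed = EDGES + [(b, a) for a, b in EDGES]
    msgs = {e: [[1.0 / Q**2] * Q for _ in range(Q)] for e in directed}
    for _ in range(sweeps):
        msgs = {(i, j): update(i, j, msgs) for (i, j) in directed}
    return msgs


def bp_edge_marginal(i, j, msgs):
    """Edge marginal proportional to m_{i->j}(x_i, x_j) * m_{j->i}(x_j, x_i)."""
    return normalise([[msgs[(i, j)][a][b] * msgs[(j, i)][b][a] for b in range(Q)]
                      for a in range(Q)])


msgs = run_bp()
max_err = 0.0
for i, j in EDGES:
    exact, bp = exact_edge_marginal(i, j), bp_edge_marginal(i, j, msgs)
    for a, b in itertools.product(range(Q), repeat=2):
        max_err = max(max_err, abs(exact[a][b] - bp[a][b]))
print(f"max |BP - exact| over all edge marginals: {max_err:.1e}")
```

On a tree the two sets of marginals should agree up to floating-point rounding; on a graph with loops the same recursion would only give an approximation.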
Appendix B. Factor graph formalism
In this appendix, we first introduce a more general form of BP equations, defined on a factor graph [22]. Then, we show that from this form one can derive both the naive BP equations of [2] and the BP equations of the current paper by two different factor graphs associated to the same problem.

A factor graph is a bipartite graph, whose left- and right-side vertices are usually referred to as variable nodes and function nodes. The notion of factor graph is meant to describe the structure of the energy function, whose independent variables (i.e., the configuration variables of the corresponding thermodynamic system) are associated to the variable nodes. A function node connected to a number of variable nodes represents an elementary interaction among the corresponding variables. Let $V$ denote the set of all the variable nodes, such that each node $v \in V$ is associated with a configuration variable $x_v$. Let also $A \subseteq V$ denote any subset (cluster) of variable nodes, and let $x_A \equiv \{x_v\}_{v \in A}$ denote the array of the associated configuration variables. We can thus write the energy function as
$$ E(x_V) = \sum_{A \in \mathcal{F}} \epsilon_A(x_A) , \tag{B.1} $$
where $\epsilon_A(x_A)$ denotes the elementary interaction energy among the variables in the cluster $A$ (cluster energy), whereas the sum runs over the set $\mathcal{F}$ of all the interacting clusters. In what follows, the same label $A$ denotes both a function node and the cluster of variable nodes connected to it. An example of factor graphs describing the energy function of a palette-colouring problem is sketched in figure B1.

When the factor graph is a tree, an argument similar to that in Appendix A allows one to write marginals of the Boltzmann distribution as follows:
– For each variable node $v \in V$ we have the marginal
$$ p_v(x_v) = e^{f_v} \prod_{\substack{A \in \mathcal{F} \\ A \ni v}} m_{A\to v}(x_v) , \tag{B.2} $$
where the product runs over all the clusters $A$ to which $v$ belongs (i.e. all the function nodes connected to $v$), $m_{A\to v}(x_v)$ is a function-to-variable message, and $f_v$ is a free energy shift (ensuring normalisation).
– For each cluster $A \in \mathcal{F}$, we have the marginal
$$ p_A(x_A) = e^{f_A - \beta \epsilon_A(x_A)} \prod_{v \in A} w_{v\to A}(x_v) , \tag{B.3} $$
where $f_A$ is a free energy shift, and where $w_{v\to A}(x_v)$ is a variable-to-function message,
$$ w_{v\to A}(x_v) = \prod_{\substack{A' \in \mathcal{F} \setminus A \\ A' \ni v}} m_{A'\to v}(x_v) , \tag{B.4} $$
a product of the messages sent to $v$ from all connected function nodes except $A$.

As shown in Section 2, one can derive the propagation equations by imposing compatibility between overlapping distributions. In this case, for all $A \in \mathcal{F}$ and for all $v \in A$, we can write
$$ p_v(x_v) = \sum_{x_{A \setminus v}} p_A(x_A) , \tag{B.5} $$
where the sum runs over all possible values of the variables in the cluster $A$ except $x_v$. Inserting (B.2) and (B.3) into (B.5), we obtain the propagation equation
$$ m_{A\to v}(x_v) \propto \sum_{x_{A \setminus v}} e^{-\beta \epsilon_A(x_A)} \prod_{v' \in A \setminus v} w_{v'\to A}(x_{v'}) , \tag{B.6} $$
with the $w_{v'\to A}(x_{v'})$ defined by (B.4). Note that, as in (8), we have replaced the normalisation factor with a proportionality symbol. Finally, following the argument of Appendix A, we write the total free energy as a function of the free energy shifts as
$$ F = \sum_{A \in \mathcal{F}} f_A - \sum_{v \in V} (d_v - 1) f_v , \tag{B.7} $$
where $d_v$ is the degree of the variable node $v$ in the factor graph.

Naive BP
We first consider the energy function (1), where the configuration (colour) variables $x_i$ are associated with the vertices $i = 1, \dots, N$ of an ordinary graph, and the elementary interaction energy involves a cluster made up of a vertex $i$ and all its neighbours $\partial i$. This structure is described by a factor graph in which the variable nodes are associated with the vertices of the original graph and the function nodes with the clusters. We can use the same index for both the variable node $i$ and the function node with $i$ at its centre (the cluster $A_i \equiv \{i, \partial i\}$). Hence, each variable node $i$ receives messages $m_{A_j\to i}(x_i)$ from all the function nodes $A_j$ with $j \in \partial i$, and from $A_i$ itself. With the short-hand $m_{j\to i}$ for $m_{A_j\to i}$, omitting the normalisation factor, (B.2) becomes
$$ p_i(x_i) \propto m_{i\to i}(x_i) \prod_{j \in \partial i} m_{j\to i}(x_i) . \tag{B.8} $$
Each function node $A_i$ receives variable-to-function messages from $i$ and all $j \in \partial i$, and the cluster distribution for $A_i$ (B.3) becomes
$$ p_{i,\partial i}(x_i, x_{\partial i}) \propto e^{-\beta \eta(x_i, x_{\partial i})} \, w_{i\to i}(x_i) \prod_{j \in \partial i} w_{j\to i}(x_j) . \tag{B.9} $$
We have identified $\epsilon_{A_i}(x_{A_i})$ with $\eta(x_i, x_{\partial i})$, and $w_{j\to i}$ is short-hand for $w_{j\to A_i}$. From (B.4), one can see that the variable-to-function messages take two slightly different forms, depending on whether they travel (to the cluster $A_i$) from the "central" node $i$ or from a "peripheral" node $j \in \partial i$. In the simplified notation, we have respectively
$$ w_{i\to i}(x_i) = \prod_{j \in \partial i} m_{j\to i}(x_i) , \tag{B.10} $$
$$ w_{j\to i}(x_j) = m_{j\to j}(x_j) \prod_{k \in \partial j \setminus i} m_{k\to j}(x_j) . \tag{B.11} $$
The compatibility condition (B.5) can also be written in two different forms. For all $i = 1, \dots, N$ and $j \in \partial i$, we have respectively
$$ p_i(x_i) = \sum_{x_{\partial i}} p_{i,\partial i}(x_i, x_{\partial i}) , \tag{B.12} $$
$$ p_i(x_i) = \sum_{x_j, \, x_{\partial j \setminus i}} p_{j,\partial j}(x_j, x_{\partial j}) . \tag{B.13} $$
Using (B.8) and (B.9), this in turn gives rise to two different propagation equations:
$$ m_{i\to i}(x_i) \propto \sum_{x_{\partial i}} e^{-\beta \eta(x_i, x_{\partial i})} \prod_{j \in \partial i} w_{j\to i}(x_j) , \tag{B.14} $$
$$ m_{j\to i}(x_i) \propto \sum_{x_j, \, x_{\partial j \setminus i}} e^{-\beta \eta(x_j, x_{\partial j})} \, w_{j\to j}(x_j) \prod_{k \in \partial j \setminus i} w_{k\to j}(x_k) . \tag{B.15} $$
These equations, together with (B.10) and (B.11), are identical (apart from the notation) to the naive BP equations presented in [2]. From figure B1 one sees that even when the original graph is a tree, the corresponding factor graph contains short loops, and the naive BP equations are not exact.

Current BP
We now consider an alternative form of the energy function (1) by introducing:
(i) a variable $x_i^j$ for each vertex-neighbour pair $(i, j \in \partial i)$ (a kind of "replica" of $x_i$);
(ii) a constraint imposing that all replicas of $x_i$ are equal for each vertex $i$.

The constraints can be realised by assigning infinite energy penalties to the configurations that we want to forbid. Assuming $\gamma \to \infty$, we define
$$ E\bigl(\{x_1^j\}_{j \in \partial 1}, \dots, \{x_N^j\}_{j \in \partial N}\bigr) = \sum_{i=1}^{N} \Bigl[ \eta\bigl(x_i^*, \{x_j^i\}_{j \in \partial i}\bigr) + \gamma \, \chi\bigl(\{x_i^j\}_{j \in \partial i}\bigr) \Bigr] , \tag{B.16} $$
where the function $\chi(\cdot)$ returns 1 when its entries are not all equal, and 0 otherwise, whereas $x_i^*$ means that the replica index is irrelevant. Note that the allowed (finite [...] $\eta(\cdot)$, but only on the fact that each vertex of the original graph interacts (at most) with all its neighbours. With these definitions, each edge $\{i,j\}$ of the original graph can be naturally associated with the pair of variables $\{x_i^j, x_j^i\}$ (the $j$-replica of $x_i$ and the $i$-replica of $x_j$). Moreover, the structure of the modified energy function (B.16) is described by a factor graph in which the variable nodes $v$ correspond to the edges $\{i,j\}$ of the original graph, while the function nodes $A$ now correspond to the clusters of interacting edges $A_i \equiv \{\{i,j\} \,|\, j \in \partial i\}$. Figure B1 shows that, when the original graph is a tree, this factor graph is also one, and every variable node $\{i,j\}$ has degree 2, so that it only receives messages from the function nodes $A_i$ and $A_j$. Using $m_{i\to j}$ as short-hand for $m_{A_i\to\{i,j\}}$, (B.2) becomes
$$ p_{\{i,j\}}(x_i^j, x_j^i) = e^{f_{\{i,j\}}} \, m_{i\to j}(x_i^j, x_j^i) \, m_{j\to i}(x_j^i, x_i^j) . \tag{B.17} $$
The variable-to-function messages (B.4) are simply
$$ w_{i\to j}(x_i^j, x_j^i) = m_{i\to j}(x_i^j, x_j^i) , \tag{B.18} $$
where $w_{i\to j}$ is short-hand for $w_{\{i,j\}\to A_j}$.
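The structural difference between the two factor graphs can be made concrete with a small check. The Python sketch below builds, for a toy tree, both the naive factor graph (variable nodes are vertices, function node $A_i$ is $i$ together with $\partial i$) and the edge-based one (variable nodes are edges, function node $A_i$ collects the edges incident to $i$), and counts independent cycles in each. The tree, the node labels and the helper `independent_cycles` are assumptions introduced only for this illustration.

```python
# Toy tree (an assumption for illustration, not from the paper).
EDGES = [(0, 1), (1, 2), (1, 3), (3, 4)]
VERTICES = sorted({v for e in EDGES for v in e})
NBRS = {i: [k for e in EDGES if i in e for k in e if k != i] for i in VERTICES}


def independent_cycles(nodes, links):
    """Cyclomatic number |links| - |nodes| + #components (0 iff the graph is a forest)."""
    parent = {n: n for n in nodes}

    def find(n):
        # Union-find root lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    components = len(nodes)
    for a, b in links:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            components -= 1
    return len(links) - len(nodes) + components


# Naive construction: variable nodes = vertices, function node A_i = {i} plus neighbours of i.
naive_nodes = [("var", i) for i in VERTICES] + [("fun", i) for i in VERTICES]
naive_links = [(("fun", i), ("var", v)) for i in VERTICES for v in [i, *NBRS[i]]]

# Edge-based construction: variable nodes = edges {i, j}, function node A_i = edges incident to i.
pair_nodes = [("var", e) for e in EDGES] + [("fun", i) for i in VERTICES]
pair_links = [(("fun", i), ("var", e)) for e in EDGES for i in e]

naive_cycles = independent_cycles(naive_nodes, naive_links)
pair_cycles = independent_cycles(pair_nodes, pair_links)
var_degrees = {e: sum(1 for link in pair_links if link[1] == ("var", e)) for e in EDGES}
print(naive_cycles, pair_cycles, var_degrees)
```

For this tree the naive factor graph has one independent cycle per original edge (4 in total), whereas the edge-based factor graph has none and every variable node has degree 2, in line with the statements in the text.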
Finally, the cluster distribution (B.3) is
$$ p_{A_i}\bigl(\{x_i^j, x_j^i\}_{j \in \partial i}\bigr) = e^{f_{A_i} - \beta \eta(x_i^*, \{x_j^i\}_{j \in \partial i}) - \beta \gamma \chi(\{x_i^j\}_{j \in \partial i})} \prod_{j \in \partial i} m_{j\to i}(x_j^i, x_i^j) , \tag{B.19} $$
where the cluster energy $\epsilon_A(x_A)$ has been replaced with the elementary term of (B.16). Discarding forbidden configurations (dropping replica indices), (B.17) is equivalent to (3), and, since all the $\chi$-terms vanish, (B.19) is equivalent to (4). This is sufficient to derive the propagation equation (8), as shown in Section 2. Finally, the free energy (B.7) is equivalent to (A.12), as all variable nodes of the factor graph have degree 2.

Appendix C. Simplified equations
In this appendix, we derive the simplified forms (9) and (10) of the propagation equation (8) and the free energy shift (6), respectively. Both derivations are based on similar manipulations. We consider the elementary energy term (2) associated to vertex $i$, and note that it can be written in an alternative form for each given choice of a neighbour vertex $j \in \partial i$:
$$ \eta(x_i, x_{\partial i}) = \sum_{x \in C \setminus x_i \setminus x_j} \; \prod_{k \in \partial i \setminus j} [1 - \delta(x_k, x)] , \tag{C.1} $$
where the sum runs over the colour set $C$, excluding the colours $x_i$ and $x_j$ (if $x_i = x_j$, just one colour is excluded). Since the product in the equation above can only take the values 0 and 1, we can write the corresponding Boltzmann factor as
$$ e^{-\beta \eta(x_i, x_{\partial i})} = \prod_{x \in C \setminus x_i \setminus x_j} \Bigl\{ 1 - (1 - e^{-\beta}) \prod_{k \in \partial i \setminus j} [1 - \delta(x_k, x)] \Bigr\} . \tag{C.2} $$
Expanding the product over $x$, we obtain
$$ e^{-\beta \eta(x_i, x_{\partial i})} = \sum_{B \subseteq C \setminus x_i \setminus x_j} (e^{-\beta} - 1)^{|B|} \prod_{x \in B} \; \prod_{k \in \partial i \setminus j} [1 - \delta(x_k, x)] , \tag{C.3} $$
where the sum runs over all the possible subsets $B$ of the colour set $C \setminus x_i \setminus x_j$. Then, we exchange the two products, expand the product over $x$ (taking into account that every product of two or more deltas vanishes), and use the fact that
$$ \sum_{x \in C} \delta(x_k, x) = 1 . \tag{C.4} $$
We finally obtain
$$ e^{-\beta \eta(x_i, x_{\partial i})} = \sum_{B \subseteq C \setminus x_i \setminus x_j} (e^{-\beta} - 1)^{|B|} \prod_{k \in \partial i \setminus j} \; \sum_{x \in C \setminus B} \delta(x_k, x) . \tag{C.5} $$
The propagation equation (8) for a given vertex $i$ generates an outgoing message $m_{i\to j}(x_i, x_j)$ as a function of the set of incoming messages $m_{k\to i}(x_k, x_i)$ (where $k \in \partial i \setminus j$). Substituting the final expression (C.5) for the Boltzmann factor into this equation, we readily obtain the simplified propagation equation (9).

As far as the free energy shift (6) is concerned, we rewrite the elementary energy term (2) in yet another form, namely,
$$ \eta(x_i, x_{\partial i}) = \sum_{x \in C \setminus x_i} \; \prod_{j \in \partial i} [1 - \delta(x_j, x)] . \tag{C.6} $$
In this case the sum runs over the colour set $C$, excluding only the colour $x_i$.
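The subset expansion (C.5) is easy to verify by brute force. The Python sketch below takes (C.6) as the definition of $\eta$, checks that the form (C.1) gives the same value, and compares $e^{-\beta\eta}$ with the right-hand side of (C.5) for every colour assignment around a vertex with three neighbours. The values $q = 4$ and $\beta = 0.7$, the vertex degree and the function names are illustrative assumptions.

```python
import itertools
import math

Q, BETA = 4, 0.7  # illustrative values (assumptions)
COLOURS = range(Q)


def eta(xi, x_nbrs):
    """Elementary energy as in (C.6): colours other than xi absent from all neighbours."""
    return sum(1 for c in COLOURS if c != xi and c not in x_nbrs)


def eta_c1(xi, xj, x_rest):
    """Form (C.1): neighbour j singled out, sum over colours different from xi and xj."""
    return sum(1 for c in COLOURS if c not in (xi, xj) and c not in x_rest)


def boltzmann_c5(xi, xj, x_rest):
    """Right-hand side of (C.5): signed sum over subsets B of C minus {xi, xj}."""
    free = [c for c in COLOURS if c not in (xi, xj)]
    total = 0.0
    for size in range(len(free) + 1):
        for subset in itertools.combinations(free, size):
            # prod_k sum_{x in C\B} delta(x_k, x) equals 1 iff no x_k lies in B
            if all(xk not in subset for xk in x_rest):
                total += (math.exp(-BETA) - 1.0) ** size
    return total


max_err = 0.0
for xi, xj, *x_rest in itertools.product(COLOURS, repeat=4):  # vertex i with 3 neighbours
    direct = eta(xi, [xj, *x_rest])
    assert direct == eta_c1(xi, xj, x_rest)
    max_err = max(max_err, abs(math.exp(-BETA * direct) - boltzmann_c5(xi, xj, x_rest)))
print(f"max deviation between exp(-beta*eta) and (C.5): {max_err:.1e}")
```

The practical point of (C.5) is that, for each subset $B$, the dependence on the neighbours' colours factorises over $k$, so the sum over neighbour configurations in the propagation equation can be carried out one neighbour at a time.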
A totally analogous derivation allows us to write
$$ e^{-\beta \eta(x_i, x_{\partial i})} = \sum_{B \subseteq C \setminus x_i} (e^{-\beta} - 1)^{|B|} \prod_{j \in \partial i} \; \sum_{x \in C \setminus B} \delta(x_j, x) , \tag{C.7} $$
which, plugged into (6), yields (10).

References
[1] Garey M and Johnson D S, 1979
Computers and Intractability; A guide to the Theory of NP-completeness (San Francisco, CA: Freeman)
[2] Bounkong S, van Mourik J, and Saad D, 2006
Phys. Rev. E J. Phys. A: Math. Theor. Advanced Mean Field Methods: Theory and Practice ed M Opper and D Saad (Cambridge, MA: MIT Press) p 21 (Yedidia J S, 2000
Mitsubishi Electric Technical Report
Exploring Artificial Intelligence in the New Millennium (San Francisco, CA: Morgan Kaufmann) p 239 (Yedidia J S, Freeman W T, and Weiss Y, 2001
Mitsubishi Electric Technical Report
Artificial Intelligence Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference (San Francisco, CA: Morgan Kaufmann)
[8] Bethe H A, 1935
Proc. R. Soc. A
Phase Transitions and Critical Phenomena vol 2 ed C Domb and M S Green (New York: Academic) ch 9
[10] Selman B, Kautz H A, and Cohen B, 1994 Proceedings of the 12th National Conference on Artificial Intelligence (AAAI-94) (Seattle, WA: MIT Press) p 337
[11] Biroli G and Mézard M, 2002
Phys. Rev. Lett. Phys. Rev. E Advances in Neural Information Processing Systems (NIPS) vol 13 p 689 (Yedidia J S, Freeman W T, and Weiss Y, 2000
Mitsubishi Electric Technical Report
Phys. Rev. J. Phys. A: Math. Gen. R309
[16] Mézard M and Parisi G, 2001
Eur. Phys. J. B Spin Glass Theory and Beyond (Singapore, World Scientific)
[18] Mulet R, Pagnani A, Weigt M, and Zecchina R, 2002
Phys. Rev. Lett. Phys. Rev. E Phys. Rev. E Phys. Rev. Lett. IEEE Trans. Inform. Theory 47