Approximate inference on planar graphs using Loop Calculus and Belief Propagation


arXiv [cs.AI], May


Vicenç Gómez

Hilbert J. Kappen

Department of Biophysics, Radboud University Nijmegen, 6525 EZ Nijmegen, The Netherlands

Michael Chertkov

Theoretical Division and Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, NM 87545


Abstract

We introduce novel results for approximate inference on planar graphical models using the loop calculus framework. The loop calculus (Chertkov and Chernyak, 2006a) allows one to express the exact partition function of a graphical model as a finite sum of terms that can be evaluated once the belief propagation (BP) solution is known. In general, full summation over all correction terms is intractable. We develop an algorithm for the approach presented in Chertkov et al. (2008), which represents an efficient truncation scheme on planar graphs and a new representation of the series in terms of Pfaffians of matrices. We analyze the performance of the algorithm for the partition function approximation for models with binary variables and pairwise interactions on grids and other planar graphs. We study in detail both the loop series and the equivalent Pfaffian series and show that the first term of the Pfaffian series for the general, intractable planar model can provide very accurate approximations. The algorithm outperforms previous truncation schemes of the loop series and is competitive with other state-of-the-art methods for approximate inference.

Keywords: belief propagation, loop calculus, approximate inference, partition function, planar graphs.

1. Introduction

Graphical models are popular tools, widely used in many areas that require modeling of uncertainty. They provide an effective approach through a compact representation of the joint probability distribution. The two most common types of graphical models are Bayesian Networks (BNs) and Markov Random Fields (MRFs).

The partition function of a graphical model, which plays the role of the normalization constant in an MRF or of the probability of evidence (likelihood) in a BN, is a fundamental quantity that arises in many contexts such as hypothesis testing or parameter estimation. Exact computation of this quantity is only feasible when the graph is not too complex or, equivalently, when its tree-width is small. Many current methods are therefore devoted to approximating this quantity.

The belief propagation (BP) algorithm (Pearl, 1988) is at the core of many of these approximate inference methods. Initially conceived as an exact algorithm for tree graphs, it is widely used as an approximation method for loopy graphs (Murphy et al., 1999; Frey and MacKay, 1998). The exact partition function is explicitly related to the BP approximation through the loop calculus framework introduced by Chertkov and Chernyak (2006a). Loop calculus allows one to express the exact partition function as a finite sum of terms (the loop series) that can be evaluated once the BP solution is known. Each term maps uniquely to a subgraph, also called a generalized loop, in which every node has degree at least two within the subgraph. Summation of the entire loop series is a hard combinatorial task, since the number of generalized loops is typically exponential in the size of the graph.
However, different approximations can be obtained by considering different subsets of generalized loops in the graph.

It has been shown empirically (Gómez et al., 2007; Chertkov and Chernyak, 2006b) that truncating this series may provide efficient corrections to the initial BP approximation. More precisely, whenever BP performs satisfactorily, which occurs in the case of sufficiently weak interactions between variables or short-range influence of loops, accounting for only a small number of terms is sufficient to recover the exact result (Gómez et al., 2007). On the other hand, in cases where BP requires many iterations to converge, many terms of the series are required to improve the approximation substantially. A formal characterization of the classes of tractable problems via loop calculus remains an open question.

A step toward this goal was made in Chertkov et al. (2008), where it was shown that for any graphical model, summation of a certain subset of terms can be mapped to a summation of weighted perfect matchings on an extended graph. For planar graphs (graphs that can be embedded in a plane without crossing edges), summation of this subset can be performed in polynomial time by evaluating the Pfaffian of a skew-symmetric matrix associated with the extended graph. Furthermore, the full loop series can be expressed as a sum over certain Pfaffian terms, where each Pfaffian term accounts for a large number of loops and is also solvable in polynomial time.

The approach of Chertkov et al. (2008) builds on classical results from the 1960s by Kasteleyn (1963), Fisher (1966) and other physicists who addressed the question of counting the number of perfect matchings on a planar grid, also known as the dimer problem in the statistical physics literature (a dimer corresponds to a colored edge of the graph, and a valid dimer configuration has exactly one dimer incident to every node of the graph).
The key result of Kasteleyn (1963) and Fisher (1966) can be summarized as follows: the partition function of a planar graphical model defined in terms of binary variables can be mapped to a weighted perfect matching problem and calculated in polynomial time, under the restriction that interactions only depend on agreement or disagreement between the signs of their variables. Such a model is known in statistical physics as the Ising model without external field. Notice that exact inference for a general binary graphical model on a planar graph (that is, the Ising model with external field) is intractable (Barahona, 1982).

Recently, some methods for inference over graphical models based on the works of Kasteleyn and Fisher have been introduced. Globerson and Jaakkola (2007) obtained upper bounds on the partition function for non-planar graphs with binary variables by decomposition of the partition function into a weighted sum over partition functions of spanning tractable (zero field) planar models. The resulting problem is a convex optimization problem and, since exact inference can be done in each planar submodel, the bound can be calculated in polynomial time.

Another example is the work of Schraudolph and Kamenetsky (2008), which provides a framework for exact inference on a restricted class of planar graphs using the approach of Kasteleyn and Fisher. More precisely, they showed that any joint probability function defined on binary variables can be expressed in a functional form without external fields by adding a new auxiliary node linked to all the existing nodes. Under this transformation, single-variable external fields can be allowed for a subset B of variables.
If the graphical model is B-outerplanar, which means that there exists a planar embedding in which the subset B of the nodes lie on the same face, the techniques of Kasteleyn and Fisher can still be applied.

In contrast to the two aforementioned approaches, which rely on exact inference on a tractable planar model, the loop calculus directly leads to a framework for approximate inference on general planar graphs. Truncating the loop series according to Chertkov et al. (2008) already gives the exact result in the zero external field case. In the general planar case, however, this truncation may yield an accurate approximation that can be incrementally corrected by considering subsequent terms in the series.

In the next Section we review the main theoretical results of the loop calculus approach for planar graphs and introduce the proposed algorithm. In Section 3 we provide experimental results on approximation of the partition function for regular grids and other types of planar graphs. We focus on a planar-intractable binary model with symmetric pairwise interactions but nonzero single variable potentials. The source code used to derive these results is freely available at . We end this manuscript with conclusions and future work in Section 4.

2. Belief Propagation and Loop Series for Planar Graphs

We consider the Forney graph representation, also called general vertex model (Forney, 2001; Loeliger, 2004), of a probability distribution $p(\boldsymbol\sigma)$ defined over a vector $\boldsymbol\sigma$ of binary variables (vectors are denoted using bold symbols). Forney graphs are associated with general graphical models which subsume other factor graphs, e.g. those corresponding to BNs and MRFs. In Appendix A we show how to convert a factor graph model to its equivalent Forney graph representation.

A binary Forney graph $G := (\mathcal V, \mathcal E)$ consists of a set of nodes $\mathcal V$, where each node $a \in \mathcal V$ represents an interaction and each edge $(a,b) \in \mathcal E$ represents a binary variable $ab$ which takes values $\sigma_{ab} \in \{\pm 1\}$. We denote by $\bar a$ the set of neighbors of node $a$. Interactions $f_a(\boldsymbol\sigma_a)$ are arbitrary functions defined over typically small subsets of variables, where $\boldsymbol\sigma_a$ is the vector of variables associated with node $a$, i.e. $\boldsymbol\sigma_a := (\sigma_{ab_1}, \sigma_{ab_2}, \dots)$ where $b_i \in \bar a$.

The joint probability distribution of such a model factorizes as:

$$p(\boldsymbol\sigma) = Z^{-1} \prod_{a \in \mathcal V} f_a(\boldsymbol\sigma_a), \qquad Z = \sum_{\boldsymbol\sigma} \prod_{a \in \mathcal V} f_a(\boldsymbol\sigma_a), \tag{1}$$

where $Z$ is the normalization factor, also called the partition function.

From a variational perspective, a fixed point of the BP algorithm represents a stationary point of the Bethe "free energy" approximation under proper constraints (Yedidia et al., 2000). In the Forney style notation:

$$Z_{BP} = \exp\left(-F_{BP}\right), \qquad F_{BP} = \sum_{a \in \mathcal V} \sum_{\boldsymbol\sigma_a} b_a(\boldsymbol\sigma_a) \ln \frac{b_a(\boldsymbol\sigma_a)}{f_a(\boldsymbol\sigma_a)} - \sum_{(a,b) \in \mathcal E} \sum_{\sigma_{ab}} b_{ab}(\sigma_{ab}) \ln b_{ab}(\sigma_{ab}), \tag{2}$$

where $b_a(\boldsymbol\sigma_a)$ and $b_{ab}(\sigma_{ab})$ are the beliefs (pseudo-marginals) associated with each node $a \in \mathcal V$ and each variable $ab$; each edge-variable enters the entropy term once. For graphs without loops, Equation (2) coincides with the Gibbs "free energy" and therefore $Z_{BP}$ coincides with the exact partition function $Z$.
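The tree case can be checked numerically. The sketch below (Python, standard library only) builds a small, hypothetical tree-shaped Forney graph with random positive factors, computes $Z$ of Equation (1) by enumeration, takes the exact node and edge marginals as beliefs (on a tree these are what BP converges to), and evaluates $F_{BP}$ of Equation (2). The graph, the factor values and all names are illustrative assumptions, not the authors' code.

```python
import itertools
import math
import random

rng = random.Random(0)

# Hypothetical tree-shaped binary Forney graph: node "b" touches three
# edge-variables; nodes "a", "c", "d" are leaves. Each variable is an edge.
node_vars = {"a": ["ab"], "b": ["ab", "bc", "bd"], "c": ["bc"], "d": ["bd"]}
variables = ["ab", "bc", "bd"]
factors = {
    a: {s: math.exp(rng.gauss(0.0, 1.0)) for s in itertools.product((1, -1), repeat=len(vs))}
    for a, vs in node_vars.items()
}


def weight(x):
    """Unnormalized probability prod_a f_a(sigma_a) of a full assignment x."""
    return math.prod(factors[a][tuple(x[v] for v in vs)] for a, vs in node_vars.items())


configs = [dict(zip(variables, s)) for s in itertools.product((1, -1), repeat=len(variables))]
Z = sum(weight(x) for x in configs)  # Equation (1), by enumeration

# Exact node and edge marginals; on a tree they coincide with the BP beliefs.
b_node = {a: {} for a in node_vars}
b_edge = {v: {1: 0.0, -1: 0.0} for v in variables}
for x in configs:
    p = weight(x) / Z
    for a, vs in node_vars.items():
        key = tuple(x[v] for v in vs)
        b_node[a][key] = b_node[a].get(key, 0.0) + p
    for v in variables:
        b_edge[v][x[v]] += p

# Bethe free energy of Equation (2): node terms minus one entropy term per edge-variable.
F_bp = sum(p * math.log(p / factors[a][s]) for a in node_vars for s, p in b_node[a].items())
F_bp -= sum(p * math.log(p) for v in variables for p in b_edge[v].values())
print(math.exp(-F_bp), Z)  # the two numbers agree up to rounding
```

On a graph with loops the same expression, evaluated at a BP fixed point, generally differs from $-\ln Z$; the loop series quantifies that gap.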
If the graph contains loops, $Z_{BP}$ is just an approximation, critically dependent on how strong the influence of the loops is.

We now introduce some convenient definitions related to the loop calculus framework.

Definition 1 A generalized loop in a graph $G = \langle \mathcal V, \mathcal E \rangle$ is any subgraph $C = \langle \mathcal V', \mathcal E' \rangle$, $\mathcal V' \subseteq \mathcal V$, $\mathcal E' \subseteq (\mathcal V' \times \mathcal V') \cap \mathcal E$, such that each node in $\mathcal V'$ has degree two or larger.

For simplicity, we will use the term "loop" instead of "generalized loop" in the rest of this manuscript. Loop calculus allows one to represent $Z$ explicitly in terms of the BP approximation via the loop series expansion:

$$Z = Z_{BP} \cdot z, \qquad z = \left(1 + \sum_{C \in \mathcal C} r_C\right), \qquad r_C = \prod_{a \in C} \mu_{a;\bar a_C}, \tag{3}$$

where $\mathcal C$ is the set of all the loops within the graph. Each loop term $r_C$ is a product of terms $\mu_{a;\bar a_C}$ associated with every node $a$ of the loop, where $\bar a_C$ denotes the set of neighbors of $a$ within the loop $C$:

$$\mu_{a;\bar a_C} = \frac{\sum_{\boldsymbol\sigma_a} b_a(\boldsymbol\sigma_a) \prod_{b \in \bar a_C} \left(\sigma_{ab} - m_{ab}\right)}{\prod_{b \in \bar a_C} \sqrt{1 - m_{ab}^2}}, \qquad m_{ab} = \sum_{\sigma_{ab}} \sigma_{ab}\, b_{ab}(\sigma_{ab}). \tag{4}$$

In this work we consider planar graphs where all nodes are of degree not larger than three, that is $|\bar a_C| \le 3$. We call a node with degree three in the graph a triplet. In Appendix A we show that a graphical model can be converted to this representation at the cost of introducing auxiliary nodes.
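Equations (3) and (4) can be verified by brute force on a graph small enough to enumerate. The following is a minimal sketch, assuming a hypothetical Forney graph shaped like $K_4$ (four triplets, six binary edge-variables) with weak random positive factors so that plain parallel BP converges: it runs BP, forms $Z_{BP}$ from Equation (2), enumerates every generalized loop of Definition 1, and compares $Z_{BP} \cdot z$ with the exact $Z$. Factor values, iteration count and names are illustrative choices, not taken from the paper.

```python
import itertools
import math
import random

rng = random.Random(1)
nodes = range(4)
edges = list(itertools.combinations(nodes, 2))  # K4: six binary edge-variables
nbrs = {a: [e for e in edges if a in e] for a in nodes}  # variables of node a, in order
spins = (1, -1)
# Hypothetical weak random factors f_a(sigma_a), so that plain BP converges.
f = {a: {s: math.exp(rng.uniform(-0.5, 0.5)) for s in itertools.product(spins, repeat=3)}
     for a in nodes}

def other(e, a):
    return e[1] if e[0] == a else e[0]

# Parallel BP; msg[(a, e)] is the message node a sends along variable e.
msg = {(a, e): {1: 0.5, -1: 0.5} for a in nodes for e in nbrs[a]}
for _ in range(500):
    new = {}
    for a in nodes:
        for i, e in enumerate(nbrs[a]):
            out = {1: 0.0, -1: 0.0}
            for s, val in f[a].items():
                for j, e2 in enumerate(nbrs[a]):
                    if j != i:
                        val *= msg[(other(e2, a), e2)][s[j]]
                out[s[i]] += val
            new[(a, e)] = {x: v / (out[1] + out[-1]) for x, v in out.items()}
    msg = new

# Beliefs at the fixed point, Z_BP from Equation (2), and m_ab.
b_node, b_edge = {}, {}
for a in nodes:
    raw = {s: val * math.prod(msg[(other(e, a), e)][s[j]] for j, e in enumerate(nbrs[a]))
           for s, val in f[a].items()}
    b_node[a] = {s: v / sum(raw.values()) for s, v in raw.items()}
for e in edges:
    raw = {x: msg[(e[0], e)][x] * msg[(e[1], e)][x] for x in spins}
    b_edge[e] = {x: v / sum(raw.values()) for x, v in raw.items()}
F_bp = sum(p * math.log(p / f[a][s]) for a in nodes for s, p in b_node[a].items())
F_bp -= sum(p * math.log(p) for e in edges for p in b_edge[e].values())
Z_bp = math.exp(-F_bp)
m = {e: b_edge[e][1] - b_edge[e][-1] for e in edges}

def mu(a, loop_vars):
    """Equation (4) for node a, given its variables that belong to the loop."""
    idx = [nbrs[a].index(e) for e in loop_vars]
    num = sum(p * math.prod(s[j] - m[nbrs[a][j]] for j in idx) for s, p in b_node[a].items())
    return num / math.prod(math.sqrt(1.0 - m[e] ** 2) for e in loop_vars)

# Loop series, Equation (3): every nonempty edge subset in which no node has degree one.
z, n_loops = 1.0, 0
for k in range(1, len(edges) + 1):
    for C in itertools.combinations(edges, k):
        touched = {a: [e for e in C if a in e] for a in nodes}
        if any(len(t) == 1 for t in touched.values()):
            continue
        n_loops += 1
        z += math.prod(mu(a, t) for a, t in touched.items() if t)

Z = sum(math.prod(f[a][tuple(x[edges.index(e)] for e in nbrs[a])] for a in nodes)
        for x in itertools.product(spins, repeat=len(edges)))
print(n_loops, Z, Z_bp * z)  # 14 loops; the last two numbers agree
```

For this graph there are 14 generalized loops: four triangles, three 4-cycles, six copies of $K_4$ minus one edge, and $K_4$ itself. Only the first seven are 2-regular.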

Definition 2 A 2-regular loop is a loop in which all nodes have degree exactly two.

Definition 3 The 2-regular partition function $Z_\emptyset$ is the truncated form of (3) which sums all 2-regular loops only:

$$Z_\emptyset = Z_{BP} \cdot z_\emptyset, \qquad z_\emptyset = 1 + \sum_{\substack{C \in \mathcal C \text{ s.t.} \\ |\bar a_C| = 2,\ \forall a \in C}} r_C. \tag{5}$$

1. Notice that this part of the series was called single-connected partition function in Chertkov et al. (2008). Here we prefer to call it 2-regular partition function because loops with more than one connected component are also included in this part of the series.

Figure 1: Example. (a) A Forney graph $G$. (b) Corresponding extended graph $G_{ext}$. (c) Loops (in bold) included in the 2-regular partition function. (d) Loops (in bold and red) not included in the 2-regular partition function. Marked in red, the triplets associated with each loop. Grouped in gray squares, the loops considered in different subsets $\Psi$ of triplets: (d.1) $\Psi = \{c, h\}$, (d.2) $\Psi = \{e, l\}$, (d.3) $\Psi = \{h, l\}$, (d.4) $\Psi = \{c, e\}$ and (d.5) $\Psi = \{c, e, h, l\}$ (see Section 2.2).

As an example, Figure 1a shows a small Forney graph and Figure 1c shows seven loops found in the corresponding 2-regular partition function.

2.1 2-regular Partition Function Using Perfect Matching

In Chertkov et al. (2008) it has been shown that computation of $Z_\emptyset$ can be mapped to a dimer/matching problem or, equivalently, to the computation of the sum of all weighted perfect matchings within another graph. A perfect matching is a subset of edges such that each node is incident to exactly one edge from the subset. The weight of a matching is the product of the weights of the edges in the matching. The key idea of this mapping is to extend the original Forney graph $G$ into a new graph $G_{ext} := (\mathcal V_{G_{ext}}, \mathcal E_{G_{ext}})$ in such a way that each perfect matching in $G_{ext}$ corresponds to a 2-regular loop in $G$ (see Figures 1b and 1c for an illustration). Under the condition of planarity, the sum of all weighted perfect matchings can be calculated in polynomial time following Kasteleyn's arguments. Here we reproduce these results with small variations and more emphasis on the algorithmic aspects.

Figure 2: Fisher's rules. (Top) A node $a$ of degree two in $G$ is split into two nodes in $G_{ext}$. (Bottom) A node $a$ of degree three in $G$ is split into three nodes in $G_{ext}$. The squares on the right indicate all possible matchings in $G_{ext}$ related to node $a$. Note that the rules preserve planarity.

Given a Forney graph $G$ and the BP approximation, we simplify $G$ and obtain its 2-core by removing nodes of degree one recursively. After this step, $G$ is either the null graph (and then BP is exact) or it is only composed of vertices of degree two or three.

To construct the extended graph $G_{ext}$ we split each node in $G$ according to the rules introduced by Fisher (1966) and illustrated in Figure 2. The procedure results in an extended graph with $|\mathcal V_{G_{ext}}| \le 3|\mathcal V|$ nodes and $|\mathcal E_{G_{ext}}| \le 3|\mathcal E|$ edges. It is easy to verify that each 2-regular loop in $G$ is associated with a perfect matching in $G_{ext}$ and, furthermore, that this correspondence is unique. Consider, for instance, the vertex of degree three at the bottom of Figure 2. Given a 2-regular loop $C$, vertex $a$ can appear in four different configurations: either node $a$ does not appear in $C$, or $C$ contains one of the following three paths: b-a-c, b-a-d or c-a-d.
These four cases correspond to node terms in a loop with values 1, $\mu_{a;\{b,c\}}$, $\mu_{a;\{b,d\}}$ and $\mu_{a;\{c,d\}}$, respectively, and coincide with the matchings in $G_{ext}$ shown within the box on the bottom-right. A simpler argument applies to the vertex of degree two in the top portion of Figure 2.

Therefore, if we associate with each internal edge (new edge in $G_{ext}$ not in $G$) of each split node $a$ the corresponding term $\mu_{a;\bar a_C}$ of Equation (4), and with the external edges (edges already in $G$) weight 1, then the sum over all weighted perfect matchings defined on $G_{ext}$ is precisely $z_\emptyset$. The 2-regular partition function $Z_\emptyset$ is then obtained using Equation (5). Equivalently:

$$z_\emptyset = \sum_{\text{perfect matchings } M \text{ of } G_{ext}} \ \prod_{e \in M} w_e,$$

where $w_e$ is the weight just assigned to edge $e$.

Kasteleyn (1963) provided a method to compute this sum in polynomial time for planar graphs. We follow his approach. First, we create a planar embedding of $G_{ext}$. A planar embedding of a graph divides the plane into disjoint regions that are bounded by sequences of edges in the graph. These regions are called faces. Second, we orient the edges of the planar embedding in such a way that for every face (except possibly the unbounded or external face) the number of clockwise oriented edges is odd. Algorithm 1 produces such an orientation (Karpinski and Rytter, 1998). It receives an undirected graph $G_{ext}$ and constructs a copy $G'_{ext} := (\mathcal V_{G'_{ext}}, \mathcal E_{G'_{ext}})$ with properly oriented edges $\mathcal E_{G'_{ext}}$.

It is convenient that $G_{ext}$ is bi-connected, i.e. that it has no articulation points. If needed, we add dummy edges with zero weight, which do not alter the partition function or the original model.

Algorithm 1 Pfaffian orientation
Arguments: undirected bi-connected extended graph $G_{ext}$.
  Construct a planar embedding $\bar G_{ext}$ of $G_{ext}$.
  Construct a spanning tree $T$ of $\bar G_{ext}$.
  Construct a graph $H$ having vertices corresponding to the faces of $\bar G_{ext}$: connect two vertices in $H$ if the respective face boundaries share an edge not in $T$. $H$ is a tree.
  Root $H$ at the external face.
  $G'_{ext} := T$.
  Orient all edges in $G'_{ext}$ arbitrarily.
  for all faces (vertices in $H$) other than the root, traversed in post-order do
    Add to $G'_{ext}$ the unique edge of the face boundary not yet in $G'_{ext}$.
    Orient it such that the number of clockwise oriented edges on the face is odd.
  end for
  RETURN directed bi-connected extended graph $G'_{ext}$.

Finally, denote by $\mu_{ij}$ the weight of the edge between nodes $i$ and $j$ in $G'_{ext}$. We create the following skew-symmetric matrix $\hat A = -\hat A^{\mathsf T}$:

$$\hat A_{ij} = \begin{cases} +\mu_{ij} & \text{if } (i,j) \in \mathcal E_{G'_{ext}}, \\ -\mu_{ij} & \text{if } (j,i) \in \mathcal E_{G'_{ext}}, \\ 0 & \text{otherwise.} \end{cases}$$

This matrix is known as the Tutte matrix of $G'_{ext}$, and the Pfaffian of $\hat A$ gives the desired sum up to the overall sign: $\mathrm{Pf}(\hat A) = \pm\sqrt{\det(\hat A)}$. However, $z_\emptyset$ can be either positive or negative, and computing the value of the Pfaffian with its sign still uncertain is not sufficient. Furthermore, since each element $\hat A_{ij}$ can be negative not only due to the Pfaffian orientation but also if $\mu_{ij}$ is negative, the sign of the Pfaffian needs to be corrected. This problem is fixed with the help of Kasteleyn's original binary matrix:

$$\hat B_{ij} = \begin{cases} +1 & \text{if } (i,j) \in \mathcal E_{G'_{ext}}, \\ -1 & \text{if } (j,i) \in \mathcal E_{G'_{ext}}, \\ 0 & \text{otherwise.} \end{cases}$$

If the sign of $\mathrm{Pf}(\hat B)$ is negative, then the sign of $\mathrm{Pf}(\hat A)$ is changed. Notice that the absolute value of $\mathrm{Pf}(\hat B)$ coincides with the number of perfect matchings, or with the number of loops included in the sum if no additional edges have been added. The sign of $\mathrm{Pf}(\hat B)$ represents the correction. Therefore, the corrected value of $z_\emptyset$ is:

$$z_\emptyset = \mathrm{sign}\left(\mathrm{Pf}(\hat B)\right) \cdot \mathrm{Pf}(\hat A).$$
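A compact sketch of the whole Kasteleyn step follows (Python, standard library only); it is an illustration, not the authors' implementation. It assumes the planar embedding is already available as a list of inner faces, each listed clockwise (a hypothetical input format; Algorithm 1 leaves the embedding routine open). The orientation uses a BFS spanning tree and visits the faces of $H$ in reversed BFS order, which, like the post-order of Algorithm 1, processes every face after all of its children. Pfaffians are computed by a pivoted $O(n^3)$ elimination, and the sign correction $z_\emptyset = \mathrm{sign}(\mathrm{Pf}(\hat B)) \cdot \mathrm{Pf}(\hat A)$ is then compared with brute force. A square grid stands in for $G_{ext}$, random signed weights stand in for the $\mu$ terms, and all function names are hypothetical.

```python
import random
from collections import deque

def grid_graph(rows, cols):
    """Hypothetical input: grid vertices, edges, and the planar embedding given
    as inner faces listed clockwise (row index grows downward)."""
    vid = lambda r, c: r * cols + c
    faces = [[vid(r, c), vid(r, c + 1), vid(r + 1, c + 1), vid(r + 1, c)]
             for r in range(rows - 1) for c in range(cols - 1)]
    edges = {frozenset((vid(r, c), vid(r, c + 1))) for r in range(rows) for c in range(cols - 1)}
    edges |= {frozenset((vid(r, c), vid(r + 1, c))) for r in range(rows - 1) for c in range(cols)}
    return rows * cols, edges, faces

def cw(face):
    """Directed boundary edges of a face, following its clockwise listing."""
    return [(face[i], face[(i + 1) % len(face)]) for i in range(len(face))]

def pfaffian_orientation(n, edges, faces):
    """Sketch of Algorithm 1; returns {undirected edge: (tail, head)}."""
    adj = {v: [] for v in range(n)}
    for u, v in map(tuple, edges):
        adj[u].append(v)
        adj[v].append(u)
    orient, seen, queue = {}, {0}, deque([0])  # BFS spanning tree T, arbitrary directions
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                orient[frozenset((u, v))] = (u, v)
                queue.append(v)
    outer = len(faces)  # vertex of H standing for the external face
    boundary = [set(map(frozenset, cw(face))) for face in faces]
    h_adj = {k: [] for k in range(outer + 1)}
    for e in edges:
        if e not in orient:  # a non-tree edge joins the two faces it separates
            f, g = ([k for k, b in enumerate(boundary) if e in b] + [outer])[:2]
            h_adj[f].append((g, e))
            h_adj[g].append((f, e))
    parent, order, queue = {outer: None}, [], deque([outer])  # BFS of H from the root
    while queue:
        f = queue.popleft()
        order.append(f)
        for g, e in h_adj[f]:
            if g not in parent:
                parent[g] = e
                queue.append(g)
    for f in reversed(order[1:]):  # every face after its children; root skipped
        e = parent[f]
        odd = sum(orient[frozenset(d)] == d for d in cw(faces[f]) if frozenset(d) != e) % 2
        d = next(d for d in cw(faces[f]) if frozenset(d) == e)
        orient[e] = d[::-1] if odd else d  # makes the clockwise count on face f odd
    return orient

def pfaffian(M):
    """Pfaffian of a real skew-symmetric matrix by pivoted elimination, O(n^3)."""
    A = [list(map(float, row)) for row in M]
    n = len(A)
    if n % 2:
        return 0.0
    pf = 1.0
    for k in range(0, n - 1, 2):
        kp = max(range(k + 1, n), key=lambda i: abs(A[i][k]))
        if kp != k + 1:  # swapping rows and columns k+1, kp flips the sign
            A[k + 1], A[kp] = A[kp], A[k + 1]
            for row in A:
                row[k + 1], row[kp] = row[kp], row[k + 1]
            pf = -pf
        if A[k][k + 1] == 0.0:
            return 0.0
        pf *= A[k][k + 1]
        tau = {i: A[k][i] / A[k][k + 1] for i in range(k + 2, n)}
        for i in range(k + 2, n):
            for j in range(k + 2, n):
                A[i][j] += tau[i] * A[j][k + 1] - A[i][k + 1] * tau[j]
    return pf

def skew_matrix(n, orient, w):
    """A_ij = +w_e if edge e is oriented i -> j, -w_e if j -> i, 0 otherwise."""
    A = [[0.0] * n for _ in range(n)]
    for e, (i, j) in orient.items():
        A[i][j], A[j][i] = w[e], -w[e]
    return A

def matching_sum(free, w):
    """Brute-force reference: sum over perfect matchings of products of weights."""
    if not free:
        return 1.0
    u = min(free)
    return sum(x * matching_sum(free - e, w) for e, x in w.items() if u in e and e <= free)

n, edges, faces = grid_graph(4, 4)
orient = pfaffian_orientation(n, edges, faces)
rng = random.Random(0)
w = {e: rng.uniform(-1.0, 1.0) for e in edges}  # signed weights, like the mu terms
pf_b = pfaffian(skew_matrix(n, orient, dict.fromkeys(edges, 1.0)))  # Kasteleyn's B
z = (1.0 if pf_b > 0 else -1.0) * pfaffian(skew_matrix(n, orient, w))  # corrected sum
print(round(abs(pf_b)))  # 36 perfect matchings (domino tilings of the 4x4 grid)
print(z, matching_sum(frozenset(range(n)), w))  # the two values agree
```

The brute-force `matching_sum` only serves as a reference on tiny graphs; on real instances the two Pfaffians are all that is needed.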
Calculation of $z_\emptyset$ can therefore be performed in time $O(N^3)$, where $N$ is the number of nodes of $G_{ext}$ (Galbiati and Maffioli, 1994). For the special case of binary planar graphs with zero local fields, the 2-regular partition function coincides with the exact partition function, $Z = Z_\emptyset = Z_{BP} \cdot z_\emptyset$, since the other terms in the loop series vanish.

Chertkov et al. (2008) established that $z_\emptyset$ is just the first term of a finite sum involving Pfaffians. We briefly reproduce their results here and provide an algorithm for computing the full loop series as a Pfaffian series.

Consider $\mathcal T$ defined as the set of all possible triplets (vertices with degree three in the original graph $G$). For each possible subset $\Psi \subseteq \mathcal T$ including an even number of triplets, there exists a unique correspondence between loops in $G$ in which exactly the triplets in $\Psi$ have degree three and perfect matchings in another extended graph $G_{ext}^{\Psi}$, constructed after removal of the triplets $\Psi$ from $G$. Using this representation, the full loop series can be represented as a Pfaffian series, where each term $Z_\Psi$ is tractable and is a product of the respective Pfaffian and the $\mu_{a;\bar a}$ terms associated with each triplet of $\Psi$:

$$z = \sum_{\Psi} Z_\Psi, \qquad Z_\Psi = z_\Psi \prod_{a \in \Psi} \mu_{a;\bar a}, \qquad z_\Psi = \mathrm{sign}\left(\mathrm{Pf}(\hat B_\Psi)\right) \cdot \mathrm{Pf}(\hat A_\Psi). \tag{6}$$

The 2-regular partition function thus corresponds to $\Psi = \emptyset$. We refer to the remaining terms of the series as higher order Pfaffian terms. Notice that the matrices $\hat A_\Psi$ and $\hat B_\Psi$ depend on the removed triplets, and therefore each $z_\Psi$ requires different matrices and different edge orientations. In addition, after removal of vertices in $G$ the resulting graph may be disconnected. As before, in these cases we add dummy edges with zero weight to $G_{ext}$ to make the graph bi-connected again.
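Why all terms beyond $z_\emptyset$ drop out when local fields are zero can be seen on a single triplet. The sketch below assumes a hypothetical zero-field node whose factor depends only on pairwise products of its three edge-variables, and further assumes the symmetric BP fixed point, where all messages are uniform and the belief is simply $f_a$ normalized. Every $m_{ab}$ then vanishes, the pair terms $\mu_{a;\{b,c\}}$ survive, and the triplet term $\mu_{a;\bar a}$, an odd moment of a flip-symmetric distribution, is zero, so any loop with a degree-three node contributes nothing. The coupling values are arbitrary.

```python
import itertools
import math

# Hypothetical zero-field triplet: f_a depends only on pairwise products of its
# three edge-variables, so it is unchanged when all three signs are flipped.
J = {(0, 1): 0.8, (0, 2): -0.4, (1, 2): 0.3}
configs = list(itertools.product((1, -1), repeat=3))
f = {s: math.exp(sum(Jij * s[i] * s[j] for (i, j), Jij in J.items())) for s in configs}

# Assumption: symmetric BP fixed point, i.e. uniform messages, so b_a is f_a normalized.
norm = sum(f.values())
b = {s: v / norm for s, v in f.items()}
m = [sum(p * s[i] for s, p in b.items()) for i in range(3)]  # m_ab of Equation (4)


def mu(subset):
    """Node term mu_{a;subset} of Equation (4)."""
    num = sum(p * math.prod(s[i] - m[i] for i in subset) for s, p in b.items())
    return num / math.prod(math.sqrt(1.0 - m[i] ** 2) for i in subset)


print(mu((0, 1)))     # pair term: nonzero
print(mu((0, 1, 2)))  # triplet term: zero up to rounding
```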

2. We omit the loop index in the triplet term $\mu_{a;\bar a}$ because nodes have at most degree three, and therefore the set $\bar a$ always coincides in all loops in which that triplet has degree three.

Figure 1d shows loops corresponding to the higher order Pfaffian terms in our illustrative example. The first and second subsets of triplets (d.1 and d.2) include summation over two loops, whereas each of the remaining Pfaffian terms includes exactly one loop.

Exhaustive enumeration of all the subsets of triplets leads to an algorithm running in time $2^{|\mathcal T|}$, which is prohibitive. However, many triplet combinations may lead to forbidden configurations. Experimentally, we found that a principled way to look for higher order Pfaffian terms with large contribution is to search first for pairs of triplets, then groups of four, and so on. For large graphs, this also becomes intractable. In fact, the problem is very similar to the problem of selecting the loop terms $r_C$ with the largest contribution. The advantage of the Pfaffian representation, however, is that $Z_\emptyset$ is always the Pfaffian term that accounts for the largest number of loop terms and is the most contributing term in the series. In this work we do not derive any heuristic for searching Pfaffian terms with larger contributions. Instead, in Section 3.1 we study the full Pfaffian series, and subsequently we restrict ourselves to the accuracy of $Z_\emptyset$.

Algorithm 2 describes the full procedure to compute all terms using the representation of expression (6). The main loop of the algorithm can be interrupted at any time, thus leading to a sequence of algorithms producing corrections incrementally.

Algorithm 2 Pfaffian series
Arguments: Forney graph $G$.
  $z := 0$.
  for all $\Psi \subseteq \mathcal T$ with $|\Psi|$ even do
    Build the extended graph $G_{ext}^{\Psi}$ of $G$ with the triplets $\Psi$ removed, applying the rules of Figure 2.
    Set a Pfaffian orientation in $G_{ext}^{\Psi}$ according to Algorithm 1.
    Build the matrices $\hat A_\Psi$ and $\hat B_\Psi$.
    Compute the Pfaffian with sign correction $z_\Psi$ according to Equation (6).
    $z := z + z_\Psi \prod_{a \in \Psi} \mu_{a;\bar a}$.
  end for
  RETURN $Z_{BP} \cdot z$
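The combinatorics behind Algorithm 2 can be checked on a toy graph by replacing each sign-corrected Pfaffian with brute-force enumeration of perfect matchings, which is fine for a dozen nodes and hopeless beyond that. The sketch assumes a hypothetical planar Forney graph ($K_4$ with one edge subdivided, so it contains both triplets and a degree-two node) and random stand-ins for the node terms $\mu$; the identities being checked are purely combinatorial and hold for any values. It builds $G_{ext}^{\Psi}$ with Fisher's rules of Figure 2, then compares $z_\emptyset$ with the 2-regular sum of Equation (5), and the full series of Equation (6) with the full loop series of Equation (3). Names are illustrative.

```python
import itertools
import math
import random

rng = random.Random(3)
# Hypothetical planar Forney graph: K4 with edge (2, 3) subdivided by node 4,
# so nodes 0-3 are triplets and node 4 has degree two.
G = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 4], 3: [0, 1, 4], 4: [2, 3]}
E = sorted({tuple(sorted((a, b))) for a in G for b in G[a]})
# Random stand-ins for the node terms mu_{a;S} of Equation (4), |S| = 2 or 3.
mu = {(a, frozenset(S)): rng.uniform(-1.0, 1.0)
      for a in G for k in (2, 3) for S in itertools.combinations(G[a], k)}


def extended_graph(removed):
    """Fisher's rules on G minus `removed`: node a becomes copies (a, b), one per
    neighbour b; copies of a are joined pairwise by internal edges of weight
    mu_{a;{b,c}}; copy (a, b) is joined to (b, a) by an external edge of weight 1."""
    V, W = set(), {}
    for a in G:
        if a in removed:
            continue
        V |= {(a, b) for b in G[a]}
        for b, c in itertools.combinations(G[a], 2):
            W[frozenset({(a, b), (a, c)})] = mu[(a, frozenset((b, c)))]
        for b in G[a]:
            if b not in removed:
                W[frozenset({(a, b), (b, a)})] = 1.0
    return V, W


def matching_sum(V, W):
    """Brute-force stand-in for sign(Pf(B)) * Pf(A): weighted perfect matchings."""
    if not V:
        return 1.0
    u = min(V)
    return sum(w * matching_sum(V - e, W) for e, w in W.items() if u in e and e <= V)


def loop_sum(two_regular_only=False):
    """1 + sum of r_C over all generalized loops C (or only the 2-regular ones)."""
    total = 0.0
    for k in range(len(E) + 1):  # k = 0 gives the empty subgraph, i.e. the leading 1
        for C in itertools.combinations(E, k):
            nb = {a: frozenset(b for e in C if a in e for b in e if b != a) for a in G}
            if any(len(S) == 1 or (two_regular_only and len(S) == 3) for S in nb.values()):
                continue
            total += math.prod(mu[(a, S)] for a, S in nb.items() if S)
    return total


triplets = [a for a in G if len(G[a]) == 3]
z_series = sum(  # Equation (6): sum over even-sized subsets Psi of triplets
    math.prod(mu[(a, frozenset(G[a]))] for a in Psi) * matching_sum(*extended_graph(set(Psi)))
    for k in range(0, len(triplets) + 1, 2)
    for Psi in itertools.combinations(triplets, k))
V0, W0 = extended_graph(set())
print(matching_sum(V0, W0), loop_sum(two_regular_only=True))  # z_empty, two ways
print(z_series, loop_sum())  # Pfaffian series vs loop series
print(len(V0) <= 3 * len(G), len(W0) <= 3 * len(E))  # True True
```

Odd-sized $\Psi$ are skipped as in the text; including them would only add zeros, because no subgraph can have an odd number of odd-degree nodes.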

3. Experiments

In this Section we study the proposed algorithm numerically. To facilitate the evaluation and the comparison with other algorithms, we focus on the binary Ising model, a particular case of the model (1) where factors only depend on the agreement or disagreement between two variables and take the form $f_a(\sigma_{ab}, \sigma_{ac}) = \exp\left(J_{a;\{ab,ac\}} \sigma_{ab} \sigma_{ac}\right)$. We also consider nonzero local potentials, parametrized by $f_a(\sigma_{ab}) = \exp\left(J_{a;\{ab\}} \sigma_{ab}\right)$, in all variables, so that the model becomes planar-intractable.

We create different inference problems by choosing different interactions $\{J_{a;\{ab,ac\}}\}$ and local field parameters $\{J_{a;\{ab\}}\}$. To generate them we draw independent samples from a Normal distribution, $J_{a;\{ab,ac\}} \sim \mathcal N(0, \beta/2)$ and $J_{a;\{ab\}} \sim \mathcal N(0, \beta\Theta)$, where $\Theta$ and $\beta$ determine how difficult the inference problem is. Generally, for $\Theta = 0$ the planar problem is tractable. For $\Theta > 0$, small values of $\beta$ result in weakly coupled variables (easy problems) and large values of $\beta$ in strongly coupled variables (hard problems). Larger values of $\Theta$ result in easier inference problems.

In the next Subsection we analyze the full Pfaffian series using a small example and compare it with the original representation based on the loop series. Next, we compare our algorithm with the following methods:

Truncated Loop Series for BP (TLSBP) (Gómez et al., 2007), which accounts for a certain number of loops by performing depth-first search on the factor graph and then merging the found loops iteratively. We adapted TLSBP as an any-time algorithm (anyTLSBP) such that the length of the loop is used as the only parameter instead of the two parameters S and M (see Gómez et al. (2007) for details). This is equivalent to setting M = 0 and discarding S. In this way, anyTLSBP does not compute all possible loops of a certain length (in particular, complex loops are not included), but it is more efficient than TLSBP.

Cluster Variation Method (CVM-Loopk): a double-loop implementation of CVM (Heskes et al., 2003). This algorithm is a special case of generalized belief propagation (Yedidia et al., 2005) with convergence guarantees. We use as outer clusters all (maximal) factors together with loops of four (k=4) or six (k=6) variables in the factor graph.

Tree-Structured Expectation Propagation (TreeEP) (Minka and Qi, 2004). This method performs exact inference on a base tree of the graphical model and approximates the other interactions. The method yields good results if the graphical model is very sparse.

When possible, we also compare with the following two variational methods, which provide upper bounds on the partition function:

Tree Reweighting (TRW) (Wainwright et al., 2005), which decomposes the parametrization of a probabilistic graphical model as a mixture of spanning trees of the model, and then uses the convexity of the log partition function to get an upper bound.

Planar graph decomposition (PDC) (Globerson and Jaakkola, 2007), which decomposes the parametrization of a probabilistic graphical model as a mixture of tractable planar graphs (with zero local field).

To evaluate the accuracy of the approximations we consider errors in $Z$ and, when possible, computational cost as well. As shown in Gómez et al. (2007), errors in $Z$ obtained from a truncated form of the loop series are very similar to errors in single variable marginal probabilities, which can be obtained by conditioning on the variables of interest. We only consider tractable instances for which $Z$ can be computed via the junction tree algorithm (Lauritzen and Spiegelhalter, 1988) using 8GB of memory. When studying the scalability of the approaches, we … Given an approximation $Z'$ of $Z$, the error measure used in this manuscript is:

$$\mathrm{error}_{Z'} = \frac{|\log Z - \log Z'|}{\log Z}.$$

3. We use the libDAI library (Mooij, 2008) for the algorithms CVM-Loopk, TreeEP and TRW.

4. A complex loop is defined as a loop which cannot be expressed as the union of two or more circuits or simple loops.
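On tiny grids the experimental protocol can be mimicked end to end. The sketch below is written in the ordinary spins-on-sites form of the Ising model rather than the Forney form; it draws couplings and fields as described above, computes $\log Z$ exactly by enumeration (standing in for the junction tree algorithm on very small instances), and evaluates the error measure. It assumes that the second argument of $\mathcal N(\cdot,\cdot)$ is a standard deviation, which the text does not state, and all function names are hypothetical.

```python
import itertools
import math
import random


def random_grid_ising(n, beta, theta, rng):
    """Hypothetical generator for an n x n grid (spins on sites): couplings
    J ~ N(0, beta/2) and fields h ~ N(0, beta*theta).
    ASSUMPTION: the second argument of N(., .) is read as a standard deviation."""
    sites = [(r, c) for r in range(n) for c in range(n)]
    bonds = [((r, c), (r, c + 1)) for r in range(n) for c in range(n - 1)]
    bonds += [((r, c), (r + 1, c)) for r in range(n - 1) for c in range(n)]
    J = {e: rng.gauss(0.0, beta / 2) for e in bonds}
    h = {s: rng.gauss(0.0, beta * theta) for s in sites}
    return J, h


def exact_log_z(J, h):
    """log Z by enumerating all 2^(n*n) states (tiny grids only), via log-sum-exp."""
    sites = sorted(h)
    log_w = []
    for spins in itertools.product((1, -1), repeat=len(sites)):
        s = dict(zip(sites, spins))
        log_w.append(sum(Jij * s[i] * s[j] for (i, j), Jij in J.items())
                     + sum(hi * s[i] for i, hi in h.items()))
    top = max(log_w)
    return top + math.log(sum(math.exp(x - top) for x in log_w))


def error_z(log_z, log_z_approx):
    """Error measure of the text: |log Z - log Z'| / log Z."""
    return abs(log_z - log_z_approx) / log_z


J, h = random_grid_ising(3, beta=0.5, theta=0.1, rng=random.Random(0))
log_z = exact_log_z(J, h)
print(log_z, error_z(log_z, log_z + 0.01))
```

Enumeration is exponential in the number of sites, so this is only useful for sanity checks on grids of about 4x4 or smaller.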

As in Gómez et al. (2007), we use four different message updates for BP: fixed and random sequential updates, parallel (or synchronous) updates, and residual belief propagation (RBP), a method proposed by Elidan et al. (2006) which selects as the next message to be updated the one with maximum residual, a quantity defined as an upper bound on the distance of the current messages from the fixed point. We report non-convergence when none of the previous methods converged. We report convergence at iteration $t$ when the maximum absolute value of the updates of all the messages from iteration $t-1$ to $t$ is smaller than a threshold $\vartheta = 10^{-\dots}$.

In the previous Section we described two equivalent representations of the exact partition function, in terms of the loop series and of the Pfaffian series. Here we analyze numerically how these two representations differ using an example, shown in Figure 3 as a factor graph, for which all terms of both series can be computed. We analyze a single instance, parametrized using $\Theta = 0.\dots$ and $\beta \in \{0.1, 0.5, 1.5\}$.

Figure 3: Planar bipartite factor graph used to compare the full Pfaffian series with the loop series. Circles and black squares denote variables and factors, respectively.

We use TLSBP to retrieve all loops, 8123 for this example, and Algorithm 2 to compute all Pfaffian terms. To compare the two approximations we sort all contributions, either loops or Pfaffians, by their absolute values in descending order, and then analyze how the errors are corrected as more terms are included in the approximation. We define partition functions for the truncated series in the following way:

$$Z_{TLSBP}(l) = Z_{BP}\left(1 + \sum_{i=1}^{l} r_{C_i}\right), \qquad Z_{Pf}(p) = Z_{BP}\left(\sum_{i=1}^{p} Z_{\Psi_i}\right).$$
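The bookkeeping behind $Z_{TLSBP}(l)$ and $Z_{Pf}(p)$ is simple once the individual terms are known. The helper below (a hypothetical name, Python) sorts terms by absolute value and returns the running estimates. The only difference between the two series is whether the leading 1 of the empty loop is added separately; in the Pfaffian series it is already part of $z_\emptyset$. The input numbers in the usage line are made up.

```python
def truncation_curve(z_bp, terms, offset):
    """Partition-function estimates after adding the 1, 2, ... largest terms,
    sorted by absolute value in descending order.

    offset=1.0 for loop terms r_C (the 1 of the empty loop, as in Z_TLSBP(l));
    offset=0.0 for Pfaffian terms Z_Psi (z_empty already contains that 1)."""
    estimates, partial = [], offset
    for t in sorted(terms, key=abs, reverse=True):
        partial += t
        estimates.append(z_bp * partial)
    return estimates


# Made-up numbers, only to show the bookkeeping:
print(truncation_curve(2.0, [0.1, -0.3, 0.05], offset=1.0))  # approximately [1.4, 1.6, 1.7]
```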
´omez, Kappen and Chertkov e rr o r Z −10 −5 β = 0.1 BP error Z ∅ error Z Ψ −8−6−4−202 x 10 −12 e rr o r Z −10 −5 β = 0.5 Z Ψ −15−10−505 x 10 −8 l (loop terms) e rr o r Z −10 −5 p (pfaffian terms) β = 1.510 Z Ψ p (pfaffian terms)10 −505 x 10 −4 Figure 4: Comparison between the full loop series and the full Pfaﬃan series. Each rowcorresponds to a diﬀerent value of the interaction strength β . Left column showsthe error, considering loop terms Z T LSBP ( l ) in log-log scale. Shaded regionsinclude all loop terms (not necessarily 2-regular loops) required to reach thesame (or better) accuracy than the accuracy of the 2-regular partition function Z ∅ . Middle column shows the error considering Pfaﬃan terms Z P f ( p ) also inlog-log scale. The ﬁrst Pfaﬃan term corresponds to Z ∅ , marked by a circle. Rightcolumn shows the values of the ﬁrst 100 Pfaﬃan terms sorted in descending orderin | Z Ψ | and excluding z ∅ .Then Z T LSBP ( l ) accounts for the l most contributing loops and Z P f ( p ) accounts for the p most contributing Pfaﬃan terms. In all cases, the Pfaﬃan term with largest absolute value Z Ψ corresponds to z ∅ .Figure 4 shows the error Z T LSBP (ﬁrst column) and Z P f (second column) for bothrepresentations. For weak interactions ( β = 0 .

1) BP converges fast and provides an accurate approximation with an error of order 10^−…. Summation of fewer than 50 loop terms (top-left panel) is enough to obtain machine-precision accuracy. Notice that the error is almost completely removed by the z∅ correction (top-middle panel). In this scenario, higher-order terms of the Pfaffian series are negligible (top-right panel).

As we increase β, the quality of the BP approximation decreases, and the number of loop corrections required to correct the BP error increases. In this example, for intermediate interactions (β = 0.5), the first Pfaffian term z∅ improves considerably on the BP error, by more than five orders of magnitude, whereas approximately 100 loop terms are required to achieve a similar correction (gray region of middle-left panel).

For strong interactions (β = 1.5), BP converges after many iterations and gives a poor approximation. In this scenario a larger proportion of loop terms (bottom-left panel) is also necessary to correct the BP result up to machine precision. Looking at the bottom-left panel, we find that approximately 200 loop terms are required to achieve the same correction as the one obtained by z∅. The z∅ term is quite accurate (bottom-middle panel).

As the right panels show, higher-order Pfaffian contributions change progressively from a flat sequence of small terms to an alternating sequence of positive and negative terms which grow in absolute value as β increases. These oscillations are also present in the loop series expansion.

In general, we conclude that the z∅ correction to the BP approximation can give a significant improvement even in hard problems for which BP converges after many iterations. Notice again that calculating z∅, the most contributing term in the Pfaffian series, does not require an explicit search of loop or Pfaffian terms.

After analyzing the full Pfaffian series on a small random example, we now restrict our attention to the Z∅ approximation using Ising grids (nearest-neighbor connectivity). First, we compare that approximation with other inference methods for different types of interactions (attractive or mixed), and then we study the scalability of the method in the size of the graphs.

We first focus on binary models with interactions that tend to align the neighboring variables to the same value, J_{a;{ab,ac}} > 0. If local fields are also positive, J_{a;{ab}} > 0, ∀ a ∈ V, Sudderth et al. (2008) showed that, under some additional condition, the BP approximation is a lower bound of the exact partition function and all loops (and therefore Pfaffian terms too) have the same sign.⁵ Although this is not formally proved for general models with attractive interactions regardless of the sign of the local fields, numerical results suggest that this property holds for this type of model as well.

We generate grids with positive interactions and local fields, that is |J_{a;bc}| ∼ N(0, β/…) and |J_{a;b}| ∼ N(0, βΘ), and study the performance for various values of β, as well as for strong (Θ = 1) and weak (Θ = 0.1) local fields (… β ≈ …). Note that Z∅ = Z in the limit case Θ = 0.

We observe that in all instances Z∅ improves over the BP approximation. Corrections are most significant for weak interactions, β < …, … β > …
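As a concrete illustration of this experimental setup, here is a minimal sketch (hypothetical helper names, not the code used for the experiments) of how a small random Ising grid could be generated and its exact partition function obtained by enumeration. It assumes the spin parameterization p(σ) ∝ exp(Σ_{(i,j)} J_ij σ_i σ_j + Σ_i θ_i σ_i) with σ_i ∈ {±1}, and draws couplings with standard deviation β and fields with standard deviation βΘ; the exact coupling scale used in the paper is not reproduced here, so this choice is an assumption. Taking absolute values gives the attractive case; keeping the signs gives the mixed case. Enumeration is feasible only for tiny grids, where it provides a ground truth against which Z_BP or Z∅ could be compared.

```python
import itertools
import math
import random


def grid_edges(rows, cols):
    """Nearest-neighbour edges of a rows x cols grid; node (r, c) has index r * cols + c."""
    edges = []
    for r in range(rows):
        for c in range(cols):
            v = r * cols + c
            if c + 1 < cols:
                edges.append((v, v + 1))
            if r + 1 < rows:
                edges.append((v, v + cols))
    return edges


def random_ising_grid(rows, cols, beta, theta, attractive, seed=0):
    """Hypothetical generator: Gaussian couplings (std beta) and local fields
    (std beta * theta); absolute values give the attractive case."""
    rng = random.Random(seed)
    edges = grid_edges(rows, cols)
    J = {e: rng.gauss(0.0, beta) for e in edges}
    h = [rng.gauss(0.0, beta * theta) for _ in range(rows * cols)]  # h[i] plays the role of theta_i
    if attractive:
        J = {e: abs(w) for e, w in J.items()}
        h = [abs(x) for x in h]
    return J, h


def exact_log_z(J, h):
    """Exact log Z by summing over all 2^n spin configurations (tiny n only)."""
    n = len(h)
    exponents = []
    for s in itertools.product((-1, 1), repeat=n):
        e = sum(w * s[i] * s[j] for (i, j), w in J.items())
        e += sum(h[i] * s[i] for i in range(n))
        exponents.append(e)
    m = max(exponents)  # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(e - m) for e in exponents))


if __name__ == "__main__":
    J, h = random_ising_grid(3, 3, beta=0.5, theta=0.1, attractive=False, seed=1)
    print(len(J), exact_log_z(J, h))
```

For a 3x3 grid this evaluates 2^9 = 512 configurations over 12 couplings; the cost doubles with every added variable, which is why exact references for larger grids need the junction tree method instead.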

5. The condition is that all single-variable beliefs at the BP fixed point must satisfy m_ab = b_ab(+1) − b_ab(−1) > 0, ∀ (a, b) ∈ E.

[Figure 5 panels: (a) attractive interactions, strong local fields, Θ = 1; (b) attractive interactions, weak local fields, Θ = 0.1. Axes: error Z versus β. Methods: TRW, PDC, BP, TreeEP, CVM-Loop4, CVM-Loop6, Z∅.]
Figure 5: 7x7 grid with attractive interactions and positive local fields. Average error over 50 random instances as a function of the difficulty of the problem. (a) Strong local fields. (b) Weak local fields.

It appears that the Z∅ approximation performs better than TreeEP in all cases except for very strong couplings, where both show very similar results. Interestingly, Z∅ performs very similarly to CVM-Loop4, which is known to be a very accurate approximation for this type of model (see, for instance, Yedidia et al., 2000). We observe that, to obtain better average results than Z∅ using CVM, we need to select larger outer clusters such as loops of length 6, which dramatically increases the computational cost.

The methods which provide upper bounds on Z (PDC and TRW) report the largest average error. PDC performs slightly better than TRW, as was shown in Globerson and Jaakkola (2007) for the case of mixed interactions. We remark that the worse performance of PDC for stronger couplings and weak local fields might be attributed to implementation artifacts, since for β > … Z∅.

We now analyze a more general Ising grid model where interactions and local fields can have mixed signs. In that case, Z_BP and Z∅ are no longer lower bounds on Z, and loop terms can be positive or negative. Figure 6 shows results using this setup. Top panels show average errors and bottom panels show the percentage of instances in which BP converged using at least one of the methods described above.

[Figure 6 panels: (a) mixed interactions, strong local fields, Θ = 1; (b) mixed interactions, weak local fields, Θ = 0.1; axes: error Z versus β. (c), (d) BP convergence versus β. Methods: TRW, PDC, BP, TreeEP, CVM-Loop4, CVM-Loop6, Z∅.]
Figure 6: 7x7 grid with mixed interactions. Error averaged over 50 random instances as a function of the problem difficulty for (a) strong local fields and (b) weak local fields. Bottom panels show the percentage of cases in which BP converges for (c) strong local fields and (d) weak local fields.

For strong local fields (subplots a,c), we observe that the improvements of Z∅ over BP become less significant as β increases.
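All comparisons in this section are made relative to the BP estimate Z_BP. The sketch below is a generic damped loopy BP for a binary pairwise model p(σ) ∝ exp(Σ J_ij σ_i σ_j + Σ θ_i σ_i) that returns log Z_BP as minus the Bethe free energy evaluated at the fixed point. It is written in plain Python for readability and is not the libDAI implementation used in the experiments; the function names, damping value, update schedule and tolerance are assumptions. On a tree the Bethe estimate is exact because there are no loops left to correct; on a grid it generally is not, which is the error that Z∅ and the loop series address.

```python
import itertools
import math

SPINS = (-1, 1)


def exact_log_z(n, J, h):
    """Brute-force log Z of p(s) ∝ exp(sum J_ij s_i s_j + sum h_i s_i), s_i in {-1, +1}."""
    return math.log(sum(
        math.exp(sum(w * s[i] * s[j] for (i, j), w in J.items())
                 + sum(h[i] * s[i] for i in range(n)))
        for s in itertools.product(SPINS, repeat=n)))


def bethe_log_z(n, J, h, iters=1000, damping=0.5, tol=1e-12):
    """Damped sequential loopy BP, then log Z_BP = -F_Bethe at the reached fixed point."""
    nbrs = {i: [] for i in range(n)}
    for i, j in J:
        nbrs[i].append(j)
        nbrs[j].append(i)
    w = {**J, **{(j, i): v for (i, j), v in J.items()}}  # symmetric coupling lookup
    msg = {(i, j): {s: 0.5 for s in SPINS} for i in range(n) for j in nbrs[i]}

    def incoming(i, si, skip=None):
        # product of messages into node i, optionally excluding the one from `skip`
        return math.prod(msg[(k, i)][si] for k in nbrs[i] if k != skip)

    for _ in range(iters):
        change = 0.0
        for i, j in msg:
            new = {sj: sum(math.exp(w[(i, j)] * si * sj + h[i] * si) * incoming(i, si, j)
                           for si in SPINS) for sj in SPINS}
            norm = sum(new.values())
            for sj in SPINS:
                upd = damping * msg[(i, j)][sj] + (1 - damping) * new[sj] / norm
                change = max(change, abs(upd - msg[(i, j)][sj]))
                msg[(i, j)][sj] = upd
        if change < tol:
            break

    free_energy = 0.0
    for i, j in J:  # pair terms: sum b_ij log(b_ij / (psi_ij phi_i phi_j))
        log_pot = {(a, b): w[(i, j)] * a * b + h[i] * a + h[j] * b for a in SPINS for b in SPINS}
        un = {ab: math.exp(log_pot[ab]) * incoming(i, ab[0], j) * incoming(j, ab[1], i)
              for ab in log_pot}
        z = sum(un.values())
        free_energy += sum(p / z * (math.log(p / z) - log_pot[ab]) for ab, p in un.items())
    for i in range(n):  # node terms: -(d_i - 1) sum b_i log(b_i / phi_i)
        un = {s: math.exp(h[i] * s) * incoming(i, s) for s in SPINS}
        z = sum(un.values())
        free_energy -= (len(nbrs[i]) - 1) * sum(
            p / z * (math.log(p / z) - h[i] * s) for s, p in un.items())
    return -free_energy


if __name__ == "__main__":
    chain = {(0, 1): 0.8, (1, 2): -0.5, (2, 3): 1.2}
    square = {(0, 1): 1.0, (1, 3): 1.0, (3, 2): 1.0, (2, 0): 1.0}
    fields = [0.3, -0.2, 0.1, 0.5]
    print(bethe_log_z(4, chain, fields), exact_log_z(4, chain, fields))
    print(bethe_log_z(4, square, fields), exact_log_z(4, square, fields))
```

On the chain the two printed values should coincide up to the BP tolerance, while on the single square loop they generally differ; that gap is what the loop corrections would account for.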
It is important to note that Z∅ always improves on the BP result, even when the couplings are very strong (β = 10) and BP fails to converge for a small percentage of instances. Z∅ performs slightly better than CVM-Loop4 and substantially better than TreeEP for small and intermediate β. All three methods show similar results for strong couplings, β > 2. As in the case of attractive interactions, the best results are attained using CVM-Loop6.

For the case of weak local fields (subplots b,d), BP fails to converge near the transition to the spin-glass phase. For β = 10, BP converges in less than 25% of the instances. In the most difficult domain, β > 22, all methods under consideration give similar results (all comparable to BP). Moreover, it may happen that Z∅ degrades the Z_BP approximation because loops of alternating signs have a major influence on the series. This result was also reported in Gómez et al. (2007) when loop terms, instead of Pfaffian terms, were considered.

We now study how the accuracy of the Z∅ approximation changes as we increase the size of the grid. We generate random grids with mixed couplings for √N = {…, ..., …} and focus on a regime of very weak local fields, Θ = 0.01, and strong couplings, β = 1, a difficult configuration according to the previous results. We also compare Z∅ with anyTLSBP, a variant of our previous algorithm for truncating the loop series. We run anyTLSBP by selecting loops shorter than a given length, and the length is increased progressively. To provide a fair comparison between both methods, we run anyTLSBP for the same amount of cpu time as the one required to obtain Z∅.

[Figure 7 panels, Ising grids with β = 1, Θ = 0.01: (a) error Z versus N; (b) cpu-time versus N. Methods: BP, TreeEP, CVM-Loop4, CVM-Loop6, Z∅, anyTLSBP, JuncTree.]
Figure 7: Results on regular grids: scaling with grid size for strong interactions β = 1 and very weak local fields Θ = 0.01. BP converged in all cases. (a) Error medians over 50 instances. (b) Cpu time (log scale).

Figure 7a shows the errors of the different methods. Since the variability in the errors is larger than before, we take the median for comparison. In order of increasing accuracy we get BP, TreeEP, anyTLSBP, CVM-Loop6, CVM-Loop4 and Z∅. We note that larger clusters in CVM do not necessarily result in better performance.

Overall, results are roughly independent of the network size N for almost all methods that we compare. The error of anyTLSBP is initially the smallest but soon increases, because the proportion of loops captured decreases very fast. For N > …. The Z∅ correction, on the other hand, stays flat, and we can conclude that it scales reasonably well. Interestingly, although Z∅ and TLSBP use different ways to truncate the loop series, both methods show similar scaling behavior for large graphs.

Figure 7b shows the cpu time for all the tested approaches averaged over all cases. Concerning the approximate inference methods, in order of increasing cost we have BP, TreeEP, CVM-Loop4, Z∅ together with anyTLSBP (which is run for the same cpu time), and CVM-Loop6. Although the cpu time required to compute Z∅ scales as O(N_{G_ext}^3), its curve shows the steepest growth. We discuss how to correct this caveat in Section 4. The cpu time of the junction tree method quickly increases with the tree-width of the graphs. For large enough N, exact solution via the junction tree method is no longer feasible because of its memory requirements. In contrast, for all approximate inference methods, memory demands do not represent a limitation.

Figure 8: Two examples of planar graphs used for comparison between methods. We fix the number of concentric polygons to 9 and change the degree d of the central node within the range [3, ...]. (left) Graph for d = 3. (right) Graph for d = 25. Here nodes represent variables and edges pairwise interactions. We also add external fields which depend on the state of each node (not drawn).

In the previous subsection we analyzed the quality of the Z∅ correction for graphs with a regular grid structure. Here, we carry the analysis of the Z∅ correction over to planar graphs which consist of concentric polygons with a variable number of sides. Figure 8 illustrates these spider-web like graphs. We generate them as factor graphs with pairwise interactions, which we subsequently convert to an equivalent Forney graph (see Appendix A for details). Again, local field potentials are parametrized using Θ = 0.01 and interactions using β = 1. We analyze the error in Z as a function of the degree d of the central node.

Figure 9a shows the median of errors in Z over 50 random instances. First, we see that the variability of all methods, in particular anyTLSBP, is larger than in the regular grid scenario. Also, the improvement of CVM-Loop4 over BP is slightly less significant, possibly caused by the presence of the central node with a large degree. CVM-Loop6 does not converge for instances with d > … (… seconds) and is not included in the analysis. All approaches present results that are independent of the degree d.

The Z∅ approximation is the best of the tested approaches. The improvements of Z∅ over CVM-Loop4 (the second-best method) can exceed two orders of magnitude, and three orders of magnitude compared to BP.

[Figure 9 panels, spider-web graphs with β = 1, Θ = 0.01: (a) error Z versus d; (b) cpu-time versus d. Methods: BP, TreeEP, CVM-Loop4, Z∅, anyTLSBP, JuncTree.]
Figure 9: Results on spider-web like graphs: scaling with the degree of the central node for β = 1 and Θ = 0.01. BP converged in all cases. (a) Error medians over 50 instances. (b) Cpu time (log scale).

Computational costs are shown in Figure 9b. The best performance of Z∅ comes at the cost of being the most expensive approximate inference approach for which we obtain results. Again, for larger graphs, exact solution via the junction tree is not feasible due to the large tree-width.
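The loop series runs over generalized loops, i.e. edge subsets in which no node has degree one. Z∅ re-sums only the 2-regular ones (every node of degree 0 or 2), whereas TLSBP also includes loops whose nodes have degree three or more. The brute-force count below is an illustrative sketch with hypothetical function names; for simplicity it counts loops directly on the nearest-neighbor grid rather than on the extended Forney graph used by the algorithm, and it is only feasible for very small grids. It makes both classes concrete and illustrates how quickly the number of terms grows with the grid size, which is why a truncation by loop length captures a rapidly shrinking proportion of the series.

```python
def grid_edges(rows, cols):
    """Nearest-neighbour edges of a rows x cols grid; node (r, c) has index r * cols + c."""
    edges = []
    for r in range(rows):
        for c in range(cols):
            v = r * cols + c
            if c + 1 < cols:
                edges.append((v, v + 1))
            if r + 1 < rows:
                edges.append((v, v + cols))
    return edges


def count_loops(rows, cols):
    """Count non-empty generalized loops (no node of degree 1) and, among them,
    the 2-regular ones (every node of degree 0 or 2), by brute force over edge subsets."""
    edges = grid_edges(rows, cols)
    n = rows * cols
    generalized = two_regular = 0
    for mask in range(1, 1 << len(edges)):
        degree = [0] * n
        for k, (u, v) in enumerate(edges):
            if mask >> k & 1:
                degree[u] += 1
                degree[v] += 1
        if 1 in degree:
            continue  # a dangling edge: not a generalized loop
        generalized += 1
        if all(d in (0, 2) for d in degree):
            two_regular += 1
    return generalized, two_regular


if __name__ == "__main__":
    for shape in [(2, 2), (2, 3), (3, 3)]:
        print(shape, count_loops(*shape))
```

On a 2x3 grid, for instance, the three 2-regular loops are the two unit squares and the outer rectangle, and the only additional generalized loop is the whole graph, whose two middle nodes have degree three.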

4. Discussion

We have presented an approximate algorithm based on the exact loop calculus framework for inference on planar graphical models defined in terms of binary variables. The proposed approach improves the estimate of the partition function provided by BP without an explicit search of loops.

The algorithm is illustrated on the example of ordered and disordered Ising models on planar graphs. Performance of the method is analyzed in terms of its dependence on the system size. The complexity of the partition function computation is exponential in the general case, unless the local fields are zero, when it becomes polynomial. We tested our algorithm on regular grids and planar graphs with different structures. Our experiments on regular grids show that significant improvements over BP are always obtained if single-variable potentials (local magnetic fields) are sufficiently large. The quality of this correction degrades as the amplitude of the external field decreases, and it becomes exact at zero external fields. This suggests that the difficulty of the inference task changes abruptly from very easy, with no local fields, to very hard, with small local fields, and then decays again as external fields become larger.

The Z∅ correction turns out to be competitive with other state-of-the-art methods for approximate inference of the partition function. First of all, we showed that Z∅ is much more accurate than upper-bound based methods such as TRW or PDC. This illustrates that such methods come at the cost of less accurate approximations. We have also shown that, for the case of grids with attractive interactions, the lower bound provided by Z∅ is the most accurate.

Secondly, we found that Z∅ performs much better than TreeEP for weak and intermediate couplings and shows competitive results for strong interactions.
Concerning CVM, we showed that using larger outer clusters does not necessarily lead to better approximations. In general, the Z∅ correction presented better results than CVM for our choice of regions.

Finally, we have presented a comparison of Z∅ with TLSBP, which is another algorithm for the BP-based loop series that uses the loop length as truncation parameter. On the one hand, the calculation of Z∅ involves a re-summation of many loop terms which, in the case of TLSBP, are summed individually. This consideration favors the Z∅ approach. On the other hand, Z∅ is restricted to the class of 2-regular loops, whereas TLSBP also accounts for terms corresponding to more complex loop structures in which nodes can have degree larger than two. Overall, for planar graphs, we have shown evidence that the Z∅ approach is better than TLSBP when the size of the graphs is not very small. We emphasize, however, that TLSBP can be applied to non-planar binary graphical models too.

Currently, the shortcoming of the presented approach is its relatively costly implementation. However, since the bottleneck of the algorithm is the Pfaffian calculation and not the algorithm itself (used to obtain the extended graphs and the associated matrices), it is easy to devise more efficient methods than the one used here. Thus, one may replace brute-force evaluation of the Pfaffians by a smarter one available for planar graphs. This reduces the cost from O(N^3) to O(N^{3/2}) (Galluccio et al., 2000; Loh and Carlson, 2006). Besides, the Pfaffian of B̂ is binary, see Eq. (6), which makes it possible to improve efficiency using a bit-matrix representation (Schraudolph and Kamenetsky, 2008). Alternatively, one could think of a strategy which does not require the Pfaffian of B̂. All these technical issues are the focus of our continuing investigation.

In this manuscript we have focused on inference problems defined on planar graphs with symmetric pairwise interactions and, to make the problems difficult, we have introduced local field potentials.
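Since dense Pfaffian evaluation is identified above as the bottleneck, a minimal sketch of a generic O(N^3) Pfaffian for a real skew-symmetric matrix may help fix ideas. It applies Gaussian elimination through congruence transformations with row/column pivoting (a standard textbook scheme); it does not exploit planarity as the planar-specific methods cited above do, and it is not the authors' implementation. The function names are hypothetical. The identity Pf(A)^2 = det(A), valid for any skew-symmetric A, provides a simple check, so a small determinant routine is included for that purpose only.

```python
import random


def pfaffian(a):
    """Pfaffian of a real skew-symmetric matrix (list of lists) in O(n^3) operations."""
    A = [list(map(float, row)) for row in a]
    n = len(A)
    if n % 2:
        return 0.0  # odd-dimensional skew-symmetric matrices have Pf = 0
    pf = 1.0
    for k in range(0, n - 1, 2):
        # Pivot: move the largest |A[k][j]|, j > k, into position (k, k + 1).
        p = max(range(k + 1, n), key=lambda j: abs(A[k][j]))
        if p != k + 1:
            A[k + 1], A[p] = A[p], A[k + 1]
            for row in A:
                row[k + 1], row[p] = row[p], row[k + 1]
            pf = -pf  # a simultaneous row/column swap flips the sign of the Pfaffian
        pivot = A[k][k + 1]
        if pivot == 0.0:
            return 0.0  # row k is zero, so the Pfaffian vanishes
        pf *= pivot
        # Congruence A <- M A M^T with det M = 1 that zeroes A[k][i] for i > k + 1;
        # only the trailing block (indices >= k + 2) is needed afterwards.
        tau = {i: A[k][i] / pivot for i in range(k + 2, n)}
        old = [row[:] for row in A]
        for i in range(k + 2, n):
            for j in range(k + 2, n):
                A[i][j] = old[i][j] - tau[i] * old[k + 1][j] - tau[j] * old[i][k + 1]
    return pf


def det(a):
    """Determinant by Gaussian elimination with partial pivoting (used only as a check)."""
    A = [list(map(float, row)) for row in a]
    n = len(A)
    d = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        if A[p][c] == 0.0:
            return 0.0
        if p != c:
            A[c], A[p] = A[p], A[c]
            d = -d
        d *= A[c][c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for j in range(c, n):
                A[r][j] -= f * A[c][j]
    return d


if __name__ == "__main__":
    rng = random.Random(0)
    n = 6
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            S[i][j] = rng.uniform(-1.0, 1.0)
            S[j][i] = -S[i][j]
    print(pfaffian(S) ** 2, det(S))  # should agree up to rounding
```

For the 4x4 case the result reduces to the closed form Pf = a₁₂a₃₄ − a₁₃a₂₄ + a₁₄a₂₃, which is a convenient sanity check when modifying the elimination.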
Notice, however, that the algorithm can also be used to solve models with more complex interactions, i.e. interactions involving more than two variables, unlike the pairwise Ising model (see Chertkov et al., 2008, for a discussion of possible generalizations). This makes our approach more powerful than other approaches designed specifically for the pairwise interaction case, namely Globerson and Jaakkola (2007) and Schraudolph and Kamenetsky (2008).

Although planarity is a severe restriction, we emphasize that planar graphs appear in many contexts such as computer vision and image processing, magnetic and optical recording, or network routing and logistics. It would also be interesting (and possible) to consider extensions of the algorithm developed in this manuscript for approximate inference on some classes of non-planar graphs. Thus, following the approach of Globerson and Jaakkola (2007), one can think of other types of spanning subgraphs, more general than "easy" planar graphs, for which exact computation can be performed using perfect matching. The correction Z∅ can be an accurate approximation for these spanning subgraphs, and the resulting approximation method would also provide bounds on the exact result.

Acknowledgments

We acknowledge J. M. Mooij for providing the libDAI framework and A. Windsor for the planar graph functions of the boost graph library. We also thank V. Y. Chernyak, J. K. Johnson and N. Schraudolph for interesting discussions and A. Globerson for providing the Matlab sources of PDC. This research is part of the Interactive Collaborative Information Systems (ICIS) project, supported by the Dutch Ministry of Economic Affairs, grant BSIK03024. The work at LANL was carried out under the auspices of the National Nuclear Security Administration of the U.S. Department of Energy at Los Alamos National Laboratory under Contract No. DE-AC52-06NA25396.

Appendix A: Converting a factor graph to a Forney Graph.

A probabilistic model is usually represented as a Bayesian network or a Markov random field. Since bipartite factor graphs subsume both models, we show here how to convert a factor graph model defined in terms of binary variables to the more general Forney graph representation, to which the presented algorithm can be applied directly.

On a bipartite factor graph G_F = (V_F, E_F), the set V_F is composed of a set of variable nodes I and a set of factor nodes J. Each variable node i ∈ I, i := {1, 2, ...}, represents a variable which takes values σ_i = {±1}. We label factor nodes using capital letters, so that a = {A, B, ...}, a ∈ J, denotes a factor node which has an associated function f_a(σ_a) defined on a subset of variables ā ⊆ I. An (undirected) edge (a, i) ∈ E_F exists between two nodes if i ∈ ā.

Given G_F, a direct way to obtain an equivalent Forney graph G is, first, to create a node δ_i ∈ V for each variable node i ∈ V_F and, second, to associate a new binary variable δ_i a with values σ_{δ_i a} = {±1} to each edge (δ_i, a) ∈ E. Nodes δ_i ∈ V are equivalent factor nodes denoting the characteristic function δ_i(σ_a) = 1 if σ_{δ_i a} = σ_{δ_i b}, ∀ a, b ∈ δ̄_i, and zero otherwise. Finally, factor nodes c ∈ V_F correspond to the same factor nodes c in V, but defined in terms of the new variables δ_i c, ∀ i ∈ c̄.

Figure 10 shows an example of this transformation. Notice that, although we impose a direction in the edge labels, edges remain undirected: (δ_i, a) = (a, δ_i), ∀ δ_i, a ∈ V. For variables i ∈ V_F which only appear in two factors, such as variable 3, the corresponding δ node is redundant and can be removed. The joint distribution of G_F is related to the joint distribution of G by:

$$
\frac{1}{Z} f_A(\sigma_1) f_B(\sigma_2) f_C(\sigma_1,\sigma_2) f_D(\sigma_1,\sigma_3) f_E(\sigma_2,\sigma_3)
\equiv \frac{1}{Z} f_A(\sigma_{\delta_1 A}) f_B(\sigma_{\delta_2 B}) f_C(\sigma_{\delta_1 C},\sigma_{\delta_2 C}) f_D(\sigma_{\delta_1 D},\sigma_{\delta_3 D}) f_E(\sigma_{\delta_2 E},\sigma_{\delta_3 E})\,
f_{\delta_1}(\sigma_{\delta_1 A},\sigma_{\delta_1 C},\sigma_{\delta_1 D})\,
f_{\delta_2}(\sigma_{\delta_2 B},\sigma_{\delta_2 C},\sigma_{\delta_2 E})\,
f_{\delta_3}(\sigma_{\delta_3 D},\sigma_{\delta_3 E}) \tag{7}
$$
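The conversion described above can be sketched in a few lines. This is an illustrative sketch with hypothetical names, not the code used in the paper: each factor is given by its variable scope and a function of ±1 spins; every variable that appears in a number of factors other than two gets an equality node δ_i whose incident edge variables must agree, and a variable shared by exactly two factors becomes a single edge between them, since its δ node is redundant. Both partition functions are then computed by brute force on the example of Eq. (7), with arbitrary positive factor values chosen for illustration, to confirm that the two representations agree.

```python
import itertools
import math


def to_forney(factors):
    """factors: {name: (scope, fn)}, scope a tuple of variable ids, fn a function of +/-1 spins.
    Returns (edges, factor_edges, delta_nodes): the edge variables as pairs of node labels,
    the edge index used for each scope entry of every factor, and the equality nodes."""
    occurrences = {}
    for name, (scope, _) in factors.items():
        for i in scope:
            occurrences.setdefault(i, []).append(name)
    edges, delta_nodes, edge_of = [], {}, {}
    for i, names in occurrences.items():
        if len(names) == 2:
            # redundant delta node: a single edge variable joins the two factors
            edges.append(tuple(names))
            for a in names:
                edge_of[(i, a)] = len(edges) - 1
        else:
            node = ("delta", i)
            delta_nodes[node] = []
            for a in names:
                edges.append((node, a))
                edge_of[(i, a)] = len(edges) - 1
                delta_nodes[node].append(len(edges) - 1)
    factor_edges = {name: [edge_of[(i, name)] for i in scope]
                    for name, (scope, _) in factors.items()}
    return edges, factor_edges, delta_nodes


def z_factor_graph(factors):
    """Brute-force Z over the original variables sigma_i."""
    variables = sorted({i for scope, _ in factors.values() for i in scope})
    total = 0.0
    for spins in itertools.product((-1, 1), repeat=len(variables)):
        s = dict(zip(variables, spins))
        total += math.prod(fn(*(s[i] for i in scope)) for scope, fn in factors.values())
    return total


def z_forney(factors, edges, factor_edges, delta_nodes):
    """Brute-force Z over the edge variables; delta nodes act as characteristic functions."""
    total = 0.0
    for spins in itertools.product((-1, 1), repeat=len(edges)):
        if any(len({spins[e] for e in inc}) > 1 for inc in delta_nodes.values()):
            continue  # some equality constraint is violated: contributes zero
        total += math.prod(fn(*(spins[e] for e in factor_edges[name]))
                           for name, (_, fn) in factors.items())
    return total


# Structure of Eq. (7); the numerical factor values are arbitrary.
EXAMPLE = {
    "A": ((1,), lambda s1: math.exp(0.3 * s1)),
    "B": ((2,), lambda s2: math.exp(-0.2 * s2)),
    "C": ((1, 2), lambda s1, s2: math.exp(0.5 * s1 * s2)),
    "D": ((1, 3), lambda s1, s3: math.exp(-0.4 * s1 * s3)),
    "E": ((2, 3), lambda s2, s3: math.exp(0.7 * s2 * s3)),
}

if __name__ == "__main__":
    edges, factor_edges, delta_nodes = to_forney(EXAMPLE)
    print(len(edges), sorted(delta_nodes))
    print(z_factor_graph(EXAMPLE), z_forney(EXAMPLE, edges, factor_edges, delta_nodes))
```

Here variables 1 and 2 each receive a degree-three δ node, while variable 3 becomes the single edge D–E, so the Forney graph has seven edge variables of which only 2^3 = 8 of the 2^7 joint assignments satisfy all equality constraints.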
Once G has been generated following the previous procedure, it may be the case that the nodes δ_i ∈ V have degree larger than three. This happens if a variable i appears in more than 3 factor nodes of G_F.

[Figure 10 panels: (a) factor graph with factor nodes A, B, C, D, E; (b) the equivalent Forney graph with δ nodes.]
Figure 10: (a) A factor graph G_F and (b) an equivalent Forney graph G.

It is easy to convert G to a graph where all δ_i nodes have maximum degree three by introducing new auxiliary variables δ_i^1, δ_i^2, ... and new equivalent nodes. For instance, if variable i ∈ V_F appears in 4 factors A, B, C, D:

$$
f_{\delta_i}(\sigma_{\delta_i A},\sigma_{\delta_i B},\sigma_{\delta_i C},\sigma_{\delta_i D})
\equiv f_{\delta_i}(\sigma_{\delta_i A},\sigma_{\delta_i B},\sigma_{\delta_i^1})\,
f_{\delta_i}(\sigma_{\delta_i^1},\sigma_{\delta_i C},\sigma_{\delta_i D})
$$

Notice that, although the models are equivalent, the number of loops in G may be larger than in G_F.

In the case that a factor in G_F involves more than three variables, as sketched in Chertkov et al. (2008), one could split the node of degree N into auxiliary nodes of degree N − …, … Z∅ on the transformed model. Alternatively, one can reduce the number of variables that enter a factor by clamping.

References

F. Barahona. On the computational complexity of Ising spin glass models. Journal of Physics A: Mathematical and General, 15(10):3241–3253, 1982. URL http://stacks.iop.org/0305-4470/15/3241.

M. Chertkov and V. Y. Chernyak. Loop series for discrete statistical models on graphs. Journal of Statistical Mechanics: Theory and Experiment, 2006(06):P06009, 2006a.

M. Chertkov and V. Y. Chernyak. Loop calculus helps to improve Belief Propagation and linear programming decodings of LDPC codes. Invited talk at the 44th Allerton Conference, September 2006b.

M. Chertkov, V. Y. Chernyak, and R. Teodorescu. Belief propagation and loop series on planar graphs. Journal of Statistical Mechanics: Theory and Experiment, 2008(05):P05003 (19pp), 2008. URL http://stacks.iop.org/1742-5468/2008/P05003.

G. Elidan, I. McGraw, and D. Koller. Residual belief propagation: Informed scheduling for asynchronous message passing. In Proceedings of the 22nd Annual Conference on Uncertainty in Artificial Intelligence (UAI-06), Boston, Massachusetts, July 2006. AUAI Press.

M. Fisher. On the dimer solution of the planar Ising model. Journal of Mathematical Physics, 7(10):1776–1781, 1966.

G. D. Forney, Jr. Codes on graphs: normal realizations. IEEE Transactions on Information Theory, 47(2):520–548, Feb 2001. ISSN 0018-9448. doi: 10.1109/18.910573.

B. J. Frey and D. J. C. MacKay. A revolution: belief propagation in graphs with cycles. In Advances in Neural Information Processing Systems 10, pages 479–486, Cambridge, MA, 1998. MIT Press.

G. Galbiati and F. Maffioli. On the computation of Pfaffians. Discrete Applied Mathematics, 51(3):269–275, 1994. ISSN 0166-218X. doi: 10.1016/0166-218X(92)00034-J.

A. Galluccio, M. Loebl, and J. Vondrák. New algorithm for the Ising problem: partition function for finite lattice graphs. Physical Review Letters, 84(26):5924–5927, Jun 2000. doi: 10.1103/PhysRevLett.84.5924.

A. Globerson and T. S. Jaakkola. Approximate inference using planar graph decomposition. In B. Schölkopf, J. Platt, and T. Hoffman, editors, Advances in Neural Information Processing Systems 19, pages 473–480. MIT Press, Cambridge, MA, 2007.

V. Gómez, J. M. Mooij, and H. J. Kappen. Truncating the loop series expansion for belief propagation. Journal of Machine Learning Research, 8:1987–2016, 2007. ISSN 1533-7928.

T. Heskes, K. Albers, and H. J. Kappen. Approximate inference and constrained optimization. In Proceedings of the 19th Annual Conference on Uncertainty in Artificial Intelligence (UAI-03), pages 313–320, San Francisco, CA, 2003. Morgan Kaufmann Publishers.

M. Karpinski and W. Rytter. Fast parallel algorithms for graph matching problems. Pages 164–170. Oxford University Press, USA, 1998.

P. W. Kasteleyn. Dimer statistics and phase transitions. Journal of Mathematical Physics, 4(2):287–293, 1963. doi: 10.1063/1.1703953. URL http://link.aip.org/link/?JMP/4/287/1.

S. L. Lauritzen and D. J. Spiegelhalter. Local computations with probabilities on graphical structures and their application to expert systems. Journal of the Royal Statistical Society, Series B (Methodological), 50(2):154–227, 1988.

H.-A. Loeliger. An introduction to factor graphs. IEEE Signal Processing Magazine, 21(1):28–41, Jan. 2004. ISSN 1053-5888. doi: 10.1109/MSP.2004.1267047.

Y. L. Loh and E. W. Carlson. Efficient algorithm for random-bond Ising models in 2D. Physical Review Letters, 97(22):227205, 2006. doi: 10.1103/PhysRevLett.97.227205. URL http://link.aps.org/abstract/PRL/v97/e227205.

T. Minka and Y. Qi. Tree-structured approximations by expectation propagation. In Sebastian Thrun, Lawrence Saul, and Bernhard Schölkopf, editors, Advances in Neural Information Processing Systems 16. MIT Press, Cambridge, MA, 2004.

J. M. Mooij. libDAI: A free/open source C++ library for discrete approximate inference methods, 2008. http://mloss.org/software/view/77/.

J. M. Mooij and H. J. Kappen. On the properties of the Bethe approximation and loopy belief propagation on binary networks. Journal of Statistical Mechanics: Theory and Experiment, 2005(11):P11012, 2005.

K. P. Murphy, Y. Weiss, and M. I. Jordan. Loopy Belief Propagation for approximate inference: An empirical study. In Proceedings of the 15th Annual Conference on Uncertainty in Artificial Intelligence (UAI-99), pages 467–475, San Francisco, CA, 1999. Morgan Kaufmann Publishers.

J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann Publishers, San Francisco, CA, 1988. ISBN 1558604790.

N. Schraudolph and D. Kamenetsky. Efficient exact inference in planar Ising models. In Advances in Neural Information Processing Systems 22. MIT Press, Cambridge, MA, 2008.

E. Sudderth, M. Wainwright, and A. Willsky. Loop series and Bethe variational bounds in attractive graphical models. In J. C. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems 20, pages 1425–1432. MIT Press, Cambridge, MA, 2008.

M. Wainwright, T. Jaakkola, and A. Willsky. A new class of upper bounds on the log partition function. IEEE Transactions on Information Theory, 51(7):2313–2335, July 2005.

J. S. Yedidia, W. T. Freeman, and Y. Weiss. Generalized belief propagation. In T. K. Leen, T. G. Dietterich, and V. Tresp, editors, Advances in Neural Information Processing Systems 13, pages 689–695, December 2000.

J. S. Yedidia, W. T. Freeman, Y. Weiss, and A. L. Yuille. Constructing free-energy approximations and generalized belief propagation algorithms. IEEE Transactions on Information Theory, 51(7):2282–2312, July 2005.