Marginal Densities, Factor Graph Duality, and High-Temperature Series Expansions
Mehdi Molkaraie
Department of Statistical Sciences, University of Toronto
Abstract
We prove that the marginal densities of a global probability mass function in a primal normal factor graph and the corresponding marginal densities in the dual normal factor graph are related via local mappings. The mapping depends on the Fourier transform of the local factors of the models. Details of the mapping, including its fixed points, are derived for the Ising model, and then extended to the Potts model. By employing the mapping, we can transform simultaneously all the estimated marginal densities from one domain to the other, which is advantageous if estimating the marginals can be carried out more efficiently in the dual domain. An example of particular significance is the ferromagnetic Ising model in a positive external field, for which there is a rapidly mixing Markov chain (called the subgraphs-world process) to generate configurations in the dual normal factor graph of the model. Our numerical experiments illustrate that the proposed procedure can provide more accurate estimates of marginal densities in various settings.
Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS) 2020, Palermo, Italy. PMLR: Volume 108. Copyright 2020 by the author(s).

In any probabilistic inference problem, one of the main objectives is to compute the local marginal densities of a global probability mass function (PMF). Such a computation in general requires a summation with an exponential number of terms, which makes its exact computation intractable [Dagum and Luby, 1993].

Our approach for estimating marginal densities hinges on the notions of the normal realization (in which there is an edge for every variable) [Forney, 2001], the normal factor graph (NFG), and the dual NFG. The NFG duality theorem states that the partition function of a primal NFG and the partition function of its dual are equal up to some known scale factor [Al-Bashabsheh and Mao, 2011, Forney, 2011]. It has been demonstrated that, in the low-temperature regime, Monte Carlo methods for estimating the partition function converge faster in the dual NFG than in the primal NFG of the two-dimensional (2D) Ising model [Molkaraie and Loeliger, 2013] and of the $q$-state Potts model [Al-Bashabsheh and Mao, 2014, Molkaraie and Gómez, 2018].

In this paper, we prove that marginal densities of a global PMF of a primal NFG and the corresponding marginals of the dual NFG are related via local mappings. Remarkably, the mapping is independent of the size of the model, of the topology of the graph, and of any assumptions on the parameters of the model.

Each marginal density can of course be expressed as a ratio of two partition functions. In non-homogeneous models, each ratio needs to be estimated separately via variational inference algorithms or via Monte Carlo methods. However, our proposed mapping allows a simultaneous transformation of estimated marginal densities from one domain to the other.

The mapping is practically advantageous if computing such estimates can be done more efficiently in the dual NFG than in the primal NFG. Indeed, for the ferromagnetic Ising model in a positive external field there is a rapidly mixing Markov chain (called the subgraphs-world process) to generate configurations in the dual NFG of the Ising model. As models, we mainly focus on binary models with symmetric pairwise interactions (e.g., the Ising model).
However, we will briefly discuss extensions of the proposed mappings to non-binary models (e.g., the $q$-state Potts model).

Next, we will describe our models in the primal and in the dual domains.

Suppose variables $X_1, X_2, \ldots, X_N$ are associated with the vertices (sites) of a connected graph $G = (\mathcal{V}, \mathcal{E})$ with $|\mathcal{V}| = N$ vertices and $|\mathcal{E}|$ edges (bonds). Two variables $(X_i, X_j)$ interact if their corresponding vertices are connected by an edge in $G$. Each variable takes values in $\mathcal{A} = \mathbb{Z}/2\mathbb{Z}$, i.e., the set of integers modulo two. We will mainly view $\mathcal{A}$ as a group with respect to addition.

In the primal domain, the probability of a configuration $x \in \mathcal{A}^N$ is given by

$$\pi(x) \propto \prod_{(i,j) \in \mathcal{E}} \psi_{i,j}(x_i, x_j) \prod_{v \in \mathcal{V}} \phi_v(x_v). \tag{1}$$

Furthermore, we assume that each pairwise potential factor $\psi_{i,j}(\cdot)$ is only a function of $y_{i,j} = x_i - x_j$. To lighten notation, we denote the index pair $(i, j) \in \mathcal{E}$ by a single index $e \in \mathcal{E}$. In the primal domain, we express the global probability mass function (PMF) as

$$\pi_p(x) = \frac{1}{Z_p} \prod_{e \in \mathcal{E}} \psi_e(y_e) \prod_{v \in \mathcal{V}} \phi_v(x_v). \tag{2}$$

Here, the normalization constant $Z_p$ is the partition function, $\{\psi_e : \mathcal{A} \to \mathbb{R}_{\ge 0},\ e \in \mathcal{E}\}$ are the edge-weighing factors, and $\{\phi_v : \mathcal{A} \to \mathbb{R}_{\ge 0},\ v \in \mathcal{V}\}$ are the vertex-weighing factors [Molkaraie, 2017, Forney, 2018].

The factorization in (2) can be represented by an NFG $G = (\mathcal{V}, \mathcal{E})$, where vertices represent the factors and edges represent the variables. The edge that represents some variable $y_e$ is connected to the vertex representing the factor $\psi_e(\cdot)$ if and only if $y_e$ is an argument of $\psi_e(\cdot)$. If a variable appears in more than two factors, it is replicated using an equality indicator factor [Forney, 2001].

For a 2D lattice, the NFG of (2) is depicted in Fig. 1, in which the unlabeled boxes represent $\psi_e(\cdot)$ and the small unlabeled boxes represent $\phi_v(\cdot)$. In Fig. 1, boxes labeled "+" are instances of zero-sum indicator factors $\mathrm{I}_{+}(\cdot)$, which impose the constraint that all their incident variables sum to zero, and boxes labeled "=" are instances of equality indicator factors $\mathrm{I}_{=}(\cdot)$, which impose the constraint that all their incident variables are equal.

E.g., the equality indicator factor involving $x_1$, $x_1'$, and $x_1''$ is given by

$$\mathrm{I}_{=}(x_1, x_1', x_1'') = \delta(x_1 - x_1') \cdot \delta(x_1 - x_1'') \tag{3}$$

and the zero-sum indicator factor involving $x_1$, $x_2$, and $y_1$ is as in

$$\mathrm{I}_{+}(y_1, x_1, x_2) = \delta(y_1 + x_1 + x_2), \tag{4}$$

where $\delta(\cdot)$ is the Kronecker delta function. (Note that all arithmetic manipulations are modulo two.)

Figure 1: Primal NFG of the factorization (2).

In the primal NFG, variables include $X = \{X_v : v \in \mathcal{V}\}$ and $Y = \{Y_e : e \in \mathcal{E}\}$. However, these variables are not independent. Indeed, we can freely choose $X$ and therefrom fully determine $Y$ [Molkaraie and Gómez, 2018, Forney, 2018]. E.g., if we take $G$ to be a $d$-dimensional lattice, we can compute each component $Y_e$ of $Y$ by adding two components of $X$ that are incident to the corresponding zero-sum indicator factor (see Fig. 1).

The number of configurations in the primal domain is thus $|\mathcal{A}|^N$, and

$$Z_p = \sum_{x \in \mathcal{A}^N} \prod_{e \in \mathcal{E}} \psi_e(y_e) \prod_{v \in \mathcal{V}} \phi_v(x_v). \tag{5}$$

The Ising model can be easily formulated via (2). In an Ising model the energy of a configuration $x$ is given by the Hamiltonian

$$\mathcal{H}(x) = -\sum_{(i,j) \in \mathcal{E}} J_{i,j} \cdot \big(2\delta(x_i - x_j) - 1\big) - \sum_{v \in \mathcal{V}} H_v \cdot \big(2\delta(x_v) - 1\big), \tag{6}$$

which can be expressed as

$$\mathcal{H}(x) = -\sum_{e \in \mathcal{E}} J_e \cdot \big(2\delta(y_e) - 1\big) - \sum_{v \in \mathcal{V}} H_v \cdot \big(2\delta(x_v) - 1\big). \tag{7}$$

Here $J_e$ is the coupling parameter associated with the bond $e \in \mathcal{E}$ and $H_v$ is the external field at site $v \in \mathcal{V}$. The model is called homogeneous if couplings are constant and ferromagnetic if $J_e \ge 0$ for all $e \in \mathcal{E}$.
In the bipolar case (i.e., when $\mathcal{X} = \{-1, +1\}$), the Hamiltonian is $\mathcal{H}(x) = -\sum_{(i,j) \in \mathcal{E}} J_{i,j}\, x_i x_j - \sum_{1 \le i \le N} H_i x_i$.

The probability of $x$ is given by the Gibbs-Boltzmann distribution [Yeomans, 1992]

$$\pi_B(x) \propto e^{-\beta \mathcal{H}(x)}, \tag{8}$$

where $\beta \in \mathbb{R}_{\ge 0}$ denotes the inverse temperature.

From (7) and (8), it is straightforward to obtain the edge-weighing factors of the Ising model as

$$\psi_e(y_e) = \begin{cases} e^{\beta J_e}, & \text{if } y_e = 0 \\ e^{-\beta J_e}, & \text{if } y_e = 1 \end{cases} \tag{9}$$

and the vertex-weighing factors as

$$\phi_v(x_v) = \begin{cases} e^{\beta H_v}, & \text{if } x_v = 0 \\ e^{-\beta H_v}, & \text{if } x_v = 1. \end{cases} \tag{10}$$

The Gibbs-Boltzmann distribution in (8) can therefore be expressed via the factorization (2).

The dual NFG has the same topology as the primal NFG, but with factors replaced by the discrete Fourier transform (DFT) or the inverse DFT of the corresponding factors in the primal NFG.

We can obtain the dual NFG of our binary models by replacing factors by their one-dimensional (1D) DFT, equality indicator factors by zero-sum indicator factors, and zero-sum indicator factors by equality indicator factors [Al-Bashabsheh and Mao, 2011, Molkaraie and Loeliger, 2013, Molkaraie, 2016].

We will use the tilde symbol to denote variables in the dual NFG, which also take values in $\mathcal{A}$.

The dual NFG of Fig. 1 is illustrated in Fig. 2, in which the unlabeled boxes represent $\tilde\psi_e : \mathcal{A} \to \mathbb{R}$, the 1D DFT of $\psi_e(\cdot)$, given by

$$\tilde\psi_e(\tilde y_e) = \begin{cases} \psi_e(0) + \psi_e(1), & \text{if } \tilde y_e = 0 \\ \psi_e(0) - \psi_e(1), & \text{if } \tilde y_e = 1 \end{cases} \tag{11}$$

and for $v \in \mathcal{V}$ the small unlabeled boxes are $\tilde\phi_v : \mathcal{A} \to \mathbb{R}$, the 1D DFT of $\phi_v(\cdot)$, as in

$$\tilde\phi_v(\tilde x_v) = \begin{cases} \phi_v(0) + \phi_v(1), & \text{if } \tilde x_v = 0 \\ \phi_v(0) - \phi_v(1), & \text{if } \tilde x_v = 1. \end{cases} \tag{12}$$

The set of variables in the dual domain consists of $\tilde Y = \{\tilde Y_e : e \in \mathcal{E}\}$ and $\tilde X = \{\tilde X_v : v \in \mathcal{V}\}$. Again, these variables are not independent as we can freely choose $\tilde Y$ and therefrom fully determine $\tilde X$.
E.g., if we take $G$ to be a $d$-dimensional lattice and assume periodic boundaries, each component $\tilde X_v$ of $\tilde X$ can be computed by adding $2d$ components of $\tilde Y$ that are incident to the corresponding zero-sum indicator factor (see Fig. 2).

Figure 2: The dual of the NFG in Fig. 1.

In the dual NFG, the number of configurations is $|\mathcal{A}|^{|\mathcal{E}|}$, and its partition function $Z_d$ is given by

$$Z_d = \sum_{\tilde y \in \mathcal{A}^{|\mathcal{E}|}} \prod_{e \in \mathcal{E}} \tilde\psi_e(\tilde y_e) \prod_{v \in \mathcal{V}} \tilde\phi_v(\tilde x_v). \tag{13}$$

On condition that factors (11) and (12) are nonnegative, we can define the global PMF in the dual NFG as

$$\pi_d(\tilde y) = \frac{1}{Z_d} \prod_{e \in \mathcal{E}} \tilde\psi_e(\tilde y_e) \prod_{v \in \mathcal{V}} \tilde\phi_v(\tilde x_v). \tag{14}$$

The dual Ising model can be expressed via (14). Indeed

$$\tilde\psi_e(\tilde y_e) = \begin{cases} 2\cosh(\beta J_e), & \text{if } \tilde y_e = 0 \\ 2\sinh(\beta J_e), & \text{if } \tilde y_e = 1, \end{cases} \tag{15}$$

in agreement with (9) and (11), and

$$\tilde\phi_v(\tilde x_v) = \begin{cases} 2\cosh(\beta H_v), & \text{if } \tilde x_v = 0 \\ 2\sinh(\beta H_v), & \text{if } \tilde x_v = 1, \end{cases} \tag{16}$$

in agreement with (10) and (12).

If the model is ferromagnetic (i.e., $J_e \ge 0$ for all $e \in \mathcal{E}$) and the external field is nonnegative (i.e., $H_v \ge 0$ for all $v \in \mathcal{V}$), the factors (15) and (16) are nonnegative, and the PMF (14) is well defined.

According to the NFG duality theorem, the partition functions $Z_p$ and $Z_d$ are equal up to some scale factor $\alpha(G)$. Indeed

$$Z_d = \alpha(G) \cdot Z_p, \tag{17}$$

where $\alpha(G)$ only depends on the topology of $G$. For more details, see [Al-Bashabsheh and Mao, 2011], [Molkaraie, 2017, Appendix], [Forney, 2018, Thm 8].

In [Jerrum and Sinclair, 1993], the authors proposed a rapidly mixing Markov chain (called the subgraphs-world process) which evaluates the partition function of an arbitrary ferromagnetic Ising model in a positive external field to any specified degree of accuracy.

The mixing time of the process is polynomial in the size of the model at all temperatures. Indeed, the expected running time of the generator of the subgraphs-world process is $O\big(|\mathcal{E}|^2 N^8 (\log \delta^{-1} + |\mathcal{E}|)\big)$, where $\delta$ is the confidence parameter.
For more details, see [Jerrum and Sinclair, 1993, Section 4].

The subgraphs-world process employs the following expansion of $Z$, defined on the set of edges $\mathcal{W} \subseteq \mathcal{E}$, in powers of $\tanh(H)$ and $\tanh(J_e)$:

$$Z \propto \sum_{\mathcal{W} \subseteq \mathcal{E}} \tanh(H)^{|\mathrm{odd}(\mathcal{W})|} \prod_{e \in \mathcal{W}} \tanh(J_e), \tag{18}$$

where $\mathrm{odd}(\mathcal{W})$ denotes the set of all odd-degree vertices in the subgraph induced by $\mathcal{W}$. The expansion (18) is known as the high-temperature series expansion of the partition function [Newell and Montroll, 1953, Yeomans, 1992, Grimmett and Janson, 2009].

Proposition 1.
The configurations that arise in the high-temperature series expansion of the partition function (which are the configurations of the subgraphs-world process) coincide with the valid configurations in the dual NFG of the Ising model.

See [Molkaraie and Gómez, 2018, Section VIII] and [Forney, 2018, Section III-E] for the proof.

Following Proposition 1, we can employ the subgraphs-world process (as a generator for the subgraphs-world configurations) to generate configurations in the dual NFG of the Ising model. The process is rapidly mixing and therefore converges in polynomial time. However, under reasonable complexity assumptions, there is no generalization of this approximation scheme to the (nonbinary) Potts model or to spin glasses. For more details, see [Goldberg and Jerrum, 2012, Galanis et al., 2016].

Next, we will present local (edge-based) mappings that transform marginal densities from the dual NFG to the primal NFG, or vice versa. The mappings depend on the DFT of the local factors of the models.

Figure 3: The edge $e \in \mathcal{E}$ in the intermediate primal NFG (left) and in the intermediate dual NFG (right). The unlabeled box (left) represents (22) and the unlabeled box (right) represents (23).

The edge marginal PMF of $e \in \mathcal{E}$ in the primal NFG can be computed as

$$\pi_{p,e}(a) = \frac{Z_{p,e}(a)}{Z_p}, \quad a \in \mathcal{A}, \tag{19}$$

where

$$Z_{p,e}(a) = \sum_{x \in \mathcal{A}^N} \delta(y_e - a) \prod_{e' \in \mathcal{E}} \psi_{e'}(y_{e'}) \prod_{v \in \mathcal{V}} \phi_v(x_v).$$

Hence

$$Z_{p,e}(a) = \psi_e(a)\, S_e(a), \tag{20}$$

with

$$S_e(a) = \sum_{x \in \mathcal{A}^N} \delta(y_e - a) \prod_{e' \in \mathcal{E} \setminus e} \psi_{e'}(y_{e'}) \prod_{v \in \mathcal{V}} \phi_v(x_v). \tag{21}$$

Here, $Z_{p,e}(a) \ge 0$ and $Z_p = \sum_{a \in \mathcal{A}} Z_{p,e}(a) = \sum_{a \in \mathcal{A}} \psi_e(a) S_e(a)$, hence (19) is a valid PMF over $\mathcal{A}$.

In coding theory terminology, $\{\psi_e(a), a \in \mathcal{A}\}$ is called the intrinsic message vector and $\{S_e(a), a \in \mathcal{A}\}$ is called the extrinsic message vector at edge $e \in \mathcal{E}$. According to the sum-product message passing update rule, the edge marginal PMF vector is computed as the componentwise product of the intrinsic and extrinsic message vectors up to scale. The scale factor is equal to the partition function $Z_p$ [Forney, 2001, Kschischang et al., 2001].

In our setup, $S_e(a)$ is the partition function of an intermediate primal NFG with all factors as in the primal NFG, excluding the factor $\psi_e(y_e)$, which is replaced by

$$\xi_e(y_e; a) = \delta(y_e - a). \tag{22}$$

Fig. 3 (left) shows the corresponding edge in the intermediate primal NFG. The intermediate dual NFG is shown in Fig. 3 (right), in which the factor $\tilde\psi_e(\tilde y_e)$ is replaced by

$$\tilde\xi_e(\tilde y_e; a) = \begin{cases} \delta(a) + \delta(1 - a), & \text{if } \tilde y_e = 0 \\ \delta(a) - \delta(1 - a), & \text{if } \tilde y_e = 1, \end{cases} \tag{23}$$

which is the 1D DFT of (22). According to the NFG duality theorem (17), the partition function of the intermediate dual NFG is $\alpha(G) \cdot S_e(a)$.

Similarly, in the dual NFG the edge marginal PMF of $e \in \mathcal{E}$ is

$$\pi_{d,e}(a') = \frac{Z_{d,e}(a')}{Z_d}, \quad a' \in \mathcal{A}. \tag{24}$$

Hence

$$Z_{d,e}(a') = \tilde\psi_e(a') \cdot \Big( \sum_{\tilde y \in \mathcal{A}^{|\mathcal{E}|}} \delta(\tilde y_e - a') \prod_{e' \in \mathcal{E} \setminus e} \tilde\psi_{e'}(\tilde y_{e'}) \prod_{v \in \mathcal{V}} \tilde\phi_v(\tilde x_v) \Big) = \tilde\psi_e(a')\, \tilde S_e(a'). \tag{25}$$

Proposition 2.
The vectors $\{S_e(a), a \in \mathcal{A}\}$ and $\{\tilde S_e(a'), a' \in \mathcal{A}\}$ are DFT pairs.

Proof.
For $a \in \mathcal{A}$, the partition function of the intermediate dual NFG is the dot product of message vectors $\{\tilde\xi_e(a'; a), a' \in \mathcal{A}\}$ and $\{\tilde S_e(a'), a' \in \mathcal{A}\}$. Thus

$$\alpha(G) \cdot S_e(a) = \sum_{a' \in \mathcal{A}} \tilde\xi_e(a'; a)\, \tilde S_e(a'), \tag{26}$$

which gives

$$\alpha(G) \cdot S_e(a) = \big(\tilde S_e(0) + \tilde S_e(1)\big) \cdot \delta(a) + \big(\tilde S_e(0) - \tilde S_e(1)\big) \cdot \delta(1 - a). \tag{27}$$

After setting $a = 0$ and $a = 1$ in (27), we obtain

$$\begin{bmatrix} S_e(0) \\ S_e(1) \end{bmatrix} = \frac{1}{\alpha(G)} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \cdot \begin{bmatrix} \tilde S_e(0) \\ \tilde S_e(1) \end{bmatrix}, \tag{28}$$

which is an instance of the two-point DFT. ■

Proposition 3.
The vectors $\{\pi_{p,e}(a)/\psi_e(a), a \in \mathcal{A}\}$ and $\{\pi_{d,e}(a')/\tilde\psi_e(a'), a' \in \mathcal{A}\}$ are DFT pairs.

Proof.
From (19) and (20) we have

$$S_e(a) = Z_p \cdot \frac{\pi_{p,e}(a)}{\psi_e(a)}, \quad a \in \mathcal{A}. \tag{29}$$

But (17), (24), and (25) yield

$$\tilde S_e(a') = Z_d \cdot \frac{\pi_{d,e}(a')}{\tilde\psi_e(a')} \tag{30}$$

$$\phantom{\tilde S_e(a')} = \alpha(G) \cdot Z_p \cdot \frac{\pi_{d,e}(a')}{\tilde\psi_e(a')}, \quad a' \in \mathcal{A}. \tag{31}$$

Putting (31) and (29) in (28), and after a little rearranging, we obtain the following mapping

$$\begin{bmatrix} \pi_{p,e}(0)/\psi_e(0) \\ \pi_{p,e}(1)/\psi_e(1) \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \cdot \begin{bmatrix} \pi_{d,e}(0)/\tilde\psi_e(0) \\ \pi_{d,e}(1)/\tilde\psi_e(1) \end{bmatrix} \tag{32}$$

in matrix-vector format via the two-point DFT. ■

By virtue of Proposition 3, it is possible to estimate edge marginal densities in one domain, and then transform them to the other domain all together. The mapping is fully local, and is independent of the size of the graph $N$ and of the topology of $G$. (Indeed, the relevant information regarding the rest of the graph is incorporated in the estimated edge marginal densities.)

We state without proof that

Proposition 4.
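The vectors $\{\pi_{p,v}(a)/\phi_v(a), a \in \mathcal{A}\}$ and $\{\pi_{d,v}(a')/\tilde\phi_v(a'), a' \in \mathcal{A}\}$ are DFT pairs.

For the general Ising model, substituting factors (9) and (15) in (32) yields

$$\begin{bmatrix} \pi_{p,e}(0) \\ \pi_{p,e}(1) \end{bmatrix} = \begin{bmatrix} \dfrac{e^{\beta J_e}}{2\cosh(\beta J_e)} & \dfrac{e^{\beta J_e}}{2\sinh(\beta J_e)} \\[2ex] \dfrac{e^{-\beta J_e}}{2\cosh(\beta J_e)} & -\dfrac{e^{-\beta J_e}}{2\sinh(\beta J_e)} \end{bmatrix} \cdot \begin{bmatrix} \pi_{d,e}(0) \\ \pi_{d,e}(1) \end{bmatrix} \tag{33}$$

for $\beta J_e \ne 0$.

Let us consider a homogeneous and ferromagnetic Ising model. A straightforward calculation shows that the fixed points of the mapping (33) are given by

$$\begin{bmatrix} \pi^*_{p,e}(0) & \pi^*_{p,e}(1) \end{bmatrix} = \begin{bmatrix} \dfrac{e^{\beta J}\cosh(\beta J)}{1 + \sinh(2\beta J)} & \dfrac{e^{-\beta J}\sinh(\beta J)}{1 + \sinh(2\beta J)} \end{bmatrix}. \tag{34}$$

Fig. 4 shows the fixed points (34) as a function of $\beta J$.

Proposition 5.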
The min of $\pi^*_{p,e}(0)$ and the max of $\pi^*_{p,e}(1)$ are attained at the criticality of the 2D homogeneous Ising model without an external field.

Proof. In the thermodynamic limit (i.e., as $N \to \infty$) the 2D Ising model undergoes a phase transition at $\beta J_c = \ln(1 + \sqrt{2})/2 \approx 0.44$ [Onsager, 1944].

In the absence of an external field and for $\beta J = 1$, the Hamiltonian (7) can be expressed as

$$\mathcal{H}(y) = -\sum_{e \in \mathcal{E}} \big(2\delta(y_e) - 1\big) \tag{35}$$

$$\phantom{\mathcal{H}(y)} = -\sum_{e \in \mathcal{E}} (1 - 2y_e), \tag{36}$$

where $y_e = x_i - x_j$ for $e = (i, j) \in \mathcal{E}$.

Figure 4: The fixed points (34) as a function of $\beta J$. The filled circles show the fixed points at criticality of the 2D Ising model given by (46). Curves: $\pi^*_{p,e}(0)$ in (34), with its min attained at $\beta J_c$, and $\pi^*_{p,e}(1)$ in (34), with its max attained at $\beta J_c$.

The average energy is equal to

$$\overline{\mathcal{H}(y)} = \sum_{y} \pi_p(y)\, \mathcal{H}(y) \tag{37}$$

$$\phantom{\overline{\mathcal{H}(y)}} = -|\mathcal{E}| \cdot \big(1 - 2\,\mathrm{E}[Y_e]\big) \tag{38}$$

$$\phantom{\overline{\mathcal{H}(y)}} = -|\mathcal{E}| \cdot \big(1 - 2\pi_{p,e}(1)\big). \tag{39}$$

In the 2D Ising model with periodic boundaries $|\mathcal{E}| = 2N$, thus the average energy per site is given by

$$\overline{\mathcal{H}(y)}/N = -2\big(1 - 2\pi_{p,e}(1)\big). \tag{40}$$

Onsager's closed-form solution gives

$$\lim_{N \to \infty} \frac{\ln Z}{N} = \frac{1}{2}\ln\big(2\cosh^2 2\beta J\big) + \frac{1}{\pi}\int_0^{\pi/2} \ln\Big(1 + \sqrt{1 - \kappa^2 \sin^2\theta}\Big)\, d\theta, \tag{41}$$

and the average (internal) energy per site is given by

$$U(\beta J) = \lim_{N \to \infty} -\frac{1}{N} \cdot \frac{\partial \ln Z}{\partial \beta J} \tag{42}$$

$$\phantom{U(\beta J)} = -\coth(2\beta J) \cdot \Big(1 - \frac{2}{\pi}\big(1 - \kappa \sinh 2\beta J\big) \int_0^{\pi/2} \frac{d\theta}{\sqrt{1 - \kappa^2 \sin^2\theta}}\Big) \tag{43}$$

with

$$\kappa(\beta J) = \frac{2\sinh 2\beta J}{\cosh^2 2\beta J}. \tag{44}$$

See [Onsager, 1944], [Baxter, 2007, Chapter 7] for more details.

Figure 5: For a ferromagnetic Ising model in a nonnegative external field, the solid black plot and the dashed blue plot show the lower bound on $\pi_{p,e}(0)$, given by (47), and the lower bound on $\pi_{d,e}(0)$, given by (48), as a function of $\beta J_e$, respectively. The lower bounds intersect at the criticality of the 2D homogeneous Ising model in zero field, denoted by $\beta J_c$.

A routine calculation shows that $\kappa(\beta J_c) = 1$, thus

$$U(\beta J_c) = -\sqrt{2}. \tag{45}$$

From (40) and (45), we obtain $\pi^*_{p,e}(1) = (2 - \sqrt{2})/4$, and hence

$$\begin{bmatrix} \pi^*_{p,e}(0) & \pi^*_{p,e}(1) \end{bmatrix} = \begin{bmatrix} (2 + \sqrt{2})/4 & (2 - \sqrt{2})/4 \end{bmatrix}, \tag{46}$$

which coincides with the min of $\pi^*_{p,e}(0)$ and the max of $\pi^*_{p,e}(1)$.
We emphasize that in the 2D homogeneous Ising model in zero field and in the thermodynamic limit (i.e., as $N \to \infty$), edge marginal densities in the primal and dual domains are equal at criticality. ■

The fixed points $\pi^*_{p,e}(0)$ and $\pi^*_{p,e}(1)$ at criticality in (46) are illustrated by filled circles in Fig. 4.

Proposition 6.
In an arbitrary ferromagnetic Ising model in a nonnegative external field, it holds that

$$\pi_{p,e}(0) \ge \frac{1}{1 + e^{-2\beta J_e}} \tag{47}$$

and

$$\pi_{d,e}(0) \ge \frac{1 + e^{-2\beta J_e}}{2}. \tag{48}$$

Proof.
Since the Ising model is ferromagnetic and in a nonnegative external field, we can define the global PMF $\pi_d(\cdot)$ in the dual domain as in (14). From (33), we have

$$\frac{\pi_{p,e}(0)}{e^{\beta J_e}} = \frac{\pi_{d,e}(0)}{2\cosh(\beta J_e)} + \frac{\pi_{d,e}(1)}{2\sinh(\beta J_e)} \tag{49}$$

$$\phantom{\frac{\pi_{p,e}(0)}{e^{\beta J_e}}} = \frac{1}{2\sinh(\beta J_e)} - \frac{e^{-\beta J_e}}{\sinh(2\beta J_e)}\, \pi_{d,e}(0). \tag{50}$$

We conclude from (50) that $\pi_{p,e}(0)$ achieves its minimum when $\pi_{d,e}(0) = 1$. After substituting $\pi_{d,e}(0) = 1$ in (50), and after a little rearranging, we obtain

$$\pi_{p,e}(0) \ge \frac{e^{\beta J_e}}{2\cosh(\beta J_e)} \tag{51}$$

$$\phantom{\pi_{p,e}(0)} = \frac{1}{1 + e^{-2\beta J_e}}. \tag{52}$$

The proof of (48) follows along the same lines. ■

Figure 6: The fixed points (60) as a function of $\beta J$ for different values of $q$ ($q = 3, 4, 5, 10, 100$). The filled circles show the fixed points at criticality of the 2D Potts model located at $\beta J_c = \ln(1 + \sqrt{q})$.

Proposition 6 is valid for arbitrary ferromagnetic Ising models in a nonnegative external magnetic field, i.e., the bounds do not depend on $N$ (the size of the graphical model $G$) or on the topology of $G$.

Fig. 5 shows the lower bounds in (47) and (48) as a function of $\beta J_e$. The lower bounds intersect at $\beta J_c$, i.e., at the criticality of the 2D homogeneous Ising model in the absence of an external field.

Remark 1.
From (47) and (48) we conclude that in an arbitrary ferromagnetic Ising model in a nonnegative external field

$$\pi_{p,e}(0)\, \pi_{d,e}(0) \ge \frac{1}{2}, \tag{53}$$

which is in the form of an uncertainty principle.

We briefly discuss extensions of our mapping to non-binary models, in particular to the $q$-state Potts model [Wu, 1982, Baxter, 2007]. Accordingly, we let $\mathcal{A} = \mathbb{Z}/q\mathbb{Z}$ for some integer $q \ge 2$. (The binary Ising model is recovered as the special case $q = 2$.)

In the absence of an external field, the Hamiltonian of the model is given by

$$\mathcal{H}(x) = -\sum_{e \in \mathcal{E}} J_e \cdot \delta(y_e). \tag{54}$$

From (8) and (54), we obtain that in the primal NFG

$$\psi_e(y_e) = \begin{cases} e^{\beta J_e}, & \text{if } y_e = 0 \\ 1, & \text{otherwise,} \end{cases} \tag{55}$$

and in the dual NFG, factors are equal to the 1D DFT of (55) given by

$$\tilde\psi_e(\tilde y_e) = \begin{cases} e^{\beta J_e} - 1 + q, & \text{if } \tilde y_e = 0 \\ e^{\beta J_e} - 1, & \text{otherwise,} \end{cases} \tag{56}$$

which is nonnegative if the model is ferromagnetic (i.e., $J_e \ge 0$).

As in Proposition 3, the mapping relates $\{\pi_{p,e}(a)/\psi_e(a), a \in \mathcal{A}\}$ and $\{\pi_{d,e}(a')/\tilde\psi_e(a'), a' \in \mathcal{A}\}$ via $W_q = \{w_{k,\ell},\ k, \ell \in \mathcal{A}\}$ with $w_{k,\ell} = e^{-\frac{2\pi \mathrm{i}}{|\mathcal{A}|} k\ell}$, where $W_q$ is the $q$-point DFT matrix (i.e., the Vandermonde matrix for the roots of unity) and $\mathrm{i} = \sqrt{-1}$. For the Potts model, it holds that

$$\pi_{p,e}(1)/\psi_e(1) = \cdots = \pi_{p,e}(q-1)/\psi_e(q-1) \tag{57}$$

and

$$\pi_{d,e}(1)/\tilde\psi_e(1) = \cdots = \pi_{d,e}(q-1)/\tilde\psi_e(q-1). \tag{58}$$

Thus, e.g., for $q = 3$, the mapping yields

$$\frac{\pi_{p,e}(1)}{\psi_e(1)} = \frac{\pi_{d,e}(0)}{\tilde\psi_e(0)} + \frac{\pi_{d,e}(1)}{\tilde\psi_e(1)}\, e^{-\frac{2\pi \mathrm{i}}{3}} + \frac{\pi_{d,e}(2)}{\tilde\psi_e(2)}\, e^{-\frac{4\pi \mathrm{i}}{3}} = \frac{\pi_{d,e}(0)}{\tilde\psi_e(0)} - \frac{\pi_{d,e}(1)}{\tilde\psi_e(1)}, \tag{59}$$

which is real-valued.

Figure 7: Everything as in Fig. 6 but for $\pi^*_{p,e}(1)$ in (61).

For a homogeneous and ferromagnetic $q$-state Potts model, the fixed points are given by

$$\pi^*_{p,e}(0) = \frac{e^{\beta J}\,(e^{\beta J} - 1 + q)}{e^{2\beta J} - 2(1 - q)\, e^{\beta J} + 1 - q} \tag{60}$$

and

$$\pi^*_{p,e}(t) = \frac{e^{\beta J} - 1}{e^{2\beta J} - 2(1 - q)\, e^{\beta J} + 1 - q} \tag{61}$$

for $t \in \{1, 2, \ldots, q - 1\}$.

Figs. 6 and 7 show the fixed points (60) and (61) as a function of $\beta J$. As in the Ising model, the minimum of $\pi^*_{p,e}(0)$ is attained at the criticality of the 2D Potts model located at $\beta J_c = \ln(1 + \sqrt{q})$ [Wu, 1982].

It is easy to show that at criticality and in the many-component limit (i.e., as $q \to \infty$), we have

$$\lim_{q \to \infty} \pi^*_{p,e}(0) = \frac{1}{2}. \tag{62}$$

Remark 2.
Transforming marginals from one domain to the other requires a matrix-vector multiplication with computational complexity $O(|\mathcal{A}|^2)$. However, when there is symmetry in the factors, as in (9) and (55), the complexity can be reduced to $O(|\mathcal{A}|)$.

Remark 3.
In binary models, factors in the dual NFG can in general take negative values, and in nonbinary models, the factors can be complex-valued. In such cases a valid PMF can no longer be defined in the dual domain. The mappings nevertheless remain valid, but for marginal functions (instead of marginal densities) of a global function with a factorization given by (14).
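As a sanity check on Proposition 3 and on the duality relation (17), both can be verified by exhaustive enumeration on a small binary model. The sketch below is illustrative only: the four-vertex graph, the coupling and field values, and all function names are assumptions of this example. The scale factor $2^{|\mathcal{E}|}$ that the code compares against is the value of $\alpha(G)$ that follows from the unnormalized two-point DFT in (11) and (12) under the conventions of this sketch; it is not a constant quoted from the text.

```python
import itertools
import math

# Hypothetical example: 4 sites, 5 bonds, ferromagnetic couplings, positive field.
N = 4
EDGES = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
BETA_J = [0.3, 0.7, 0.5, 0.9, 0.4]   # beta * J_e per bond (assumed values)
BETA_H = [0.2, 0.1, 0.3, 0.25]       # beta * H_v per site (assumed values)

# Primal factors (9), (10) and their two-point DFTs (11), (12).
PSI = [(math.exp(j), math.exp(-j)) for j in BETA_J]
PHI = [(math.exp(h), math.exp(-h)) for h in BETA_H]
PSI_T = [(p[0] + p[1], p[0] - p[1]) for p in PSI]
PHI_T = [(p[0] + p[1], p[0] - p[1]) for p in PHI]


def primal():
    """Return Z_p and the edge marginals pi_{p,e} by summing over x in A^N."""
    z = 0.0
    z_e = [[0.0, 0.0] for _ in EDGES]
    for x in itertools.product((0, 1), repeat=N):
        y = [(x[i] - x[j]) % 2 for i, j in EDGES]      # y_e = x_i - x_j (mod 2)
        w = math.prod(PHI[v][x[v]] for v in range(N))
        w *= math.prod(PSI[k][y[k]] for k in range(len(EDGES)))
        z += w
        for k, a in enumerate(y):
            z_e[k][a] += w
    return z, [[za / z for za in ze] for ze in z_e]


def dual():
    """Return Z_d and the edge marginals pi_{d,e} by summing over y~ in A^|E|."""
    z = 0.0
    z_e = [[0.0, 0.0] for _ in EDGES]
    for y in itertools.product((0, 1), repeat=len(EDGES)):
        x = [0] * N
        for k, (i, j) in enumerate(EDGES):             # x~_v = sum of incident y~_e
            x[i] ^= y[k]
            x[j] ^= y[k]
        w = math.prod(PHI_T[v][x[v]] for v in range(N))
        w *= math.prod(PSI_T[k][y[k]] for k in range(len(EDGES)))
        z += w
        for k, a in enumerate(y):
            z_e[k][a] += w
    return z, [[za / z for za in ze] for ze in z_e]


def map_dual_to_primal(pi_d, k):
    """Apply the two-point DFT mapping (32) to the dual marginal of bond k."""
    r0 = pi_d[0] / PSI_T[k][0]
    r1 = pi_d[1] / PSI_T[k][1]
    return [PSI[k][0] * (r0 + r1), PSI[k][1] * (r0 - r1)]


Z_P, PI_P = primal()
Z_D, PI_D = dual()
print("Z_d / Z_p =", Z_D / Z_P)                 # compare with 2 ** len(EDGES)
for k, e in enumerate(EDGES):
    print(e, PI_P[k], map_dual_to_primal(PI_D[k], k))
```

Each printed line pairs the exact primal edge marginal with the value obtained by mapping the exact dual edge marginal, so the two lists can be compared bond by bond. The enumeration is exponential in $N$ and $|\mathcal{E}|$ and is meant only for checking the identities on toy graphs.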
In both domains, estimates of marginal densities can be obtained via Monte Carlo methods or via variational algorithms [Robert and Casella, 2004, Murphy, 2012]. We only consider the subgraphs-world process (SWP) and two variational algorithms, the belief propagation (BP) and the tree expectation propagation (TEP), for the Ising model. Estimated marginals in the dual domain are then transformed all together to the primal domain via (32) and (59). In all experiments, the exact marginal densities are computed via the junction tree algorithm implemented in [Mooij, 2010].

The choice of methods and the models is far from exhaustive; our goal is to show the advantage of using the mappings in approximating marginal densities in similar settings.

In our first experiment, we consider a 2D homogeneous Ising model in a constant external field $\beta H = 0.15$, with periodic boundaries, and with size $N = 6 \times 6$. [...] samples. Fig. 8 shows the relative error in estimating $\pi_{p,e}(0)$ as a function of $\beta J$, where SWP (which operates in the dual NFG) gives good estimates in the whole range. Compared to variational algorithms, convergence of the SWP is slow; moreover, SWP is only applicable to ferromagnetic Ising models in a positive external field.

Figure 8: Relative error as a function of $\beta J$ in estimating $\pi_{p,e}(0)$ of a homogeneous Ising model in a constant external field $\beta H = 0.15$, with periodic boundaries, and with size $N = 6 \times 6$. Curves: BP primal/dual, TEP primal/dual, SWP.

In our second experiment, we consider a 2D Ising model with periodic boundaries and with size $N = 6 \times 6$. Couplings are chosen randomly according to a half-normal distribution, i.e., $\beta J_e = |\beta J'_e|$ with $\beta J'_e \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, \sigma^2)$. Fig. 9 shows the average relative error in estimating the marginal density $\pi_{p,e}(0)$ as a function of $\sigma^2$, where the results are averaged over 200 independent realizations. We consider a fully-connected Ising model with $N = 10$ in our last experiment. Couplings are chosen randomly according to $\beta J_e \overset{\text{i.i.d.}}{\sim} \mathcal{U}[0.05, \beta J_x]$, i.e., uniformly between 0.05 and $\beta J_x$ denoted by the $x$-axis. The average relative error over 50 independent realizations is illustrated in Fig. 10.

In both experiments, BP and TEP provide close approximations in the dual domain, therefore only BP results are reported. Figs. 9 and 10 show that for $\sigma^2 > .25$ and $\beta J_x > .20$, BP in the dual NFG can significantly improve the quality of estimates, even by more than two orders of magnitude in terms of relative error.
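The transformation step used in these experiments amounts to applying (33) independently to every bond. A minimal sketch follows, assuming the dual-domain estimates of $\pi_{d,e}(0)$ are available as a dictionary keyed by bond; the function names, the dictionary layout, and the numerical values are hypothetical and are not taken from the experiments above.

```python
import math


def dual_to_primal(pi_d0, beta_j):
    """Map the dual edge marginal pi_{d,e}(0) to (pi_{p,e}(0), pi_{p,e}(1)) via (33).

    Requires beta_j != 0, since (33) divides by sinh(beta_j).
    """
    pi_d1 = 1.0 - pi_d0
    c, s = math.cosh(beta_j), math.sinh(beta_j)
    p0 = math.exp(beta_j) * (pi_d0 / (2 * c) + pi_d1 / (2 * s))
    p1 = math.exp(-beta_j) * (pi_d0 / (2 * c) - pi_d1 / (2 * s))
    return p0, p1


def transform_all(dual_estimates, couplings):
    """Transform every estimated dual edge marginal to the primal domain."""
    return {e: dual_to_primal(dual_estimates[e], couplings[e]) for e in dual_estimates}


def fixed_point(beta_j):
    """Fixed point pi*_{p,e}(0) of the mapping, as in (34)."""
    return math.exp(beta_j) * math.cosh(beta_j) / (1.0 + math.sinh(2 * beta_j))


# Hypothetical dual-domain estimates of pi_{d,e}(0) for three bonds.
estimates = {(0, 1): 0.93, (1, 2): 0.88, (2, 3): 0.97}
couplings = {(0, 1): 0.45, (1, 2): 0.60, (2, 3): 0.30}
primal = transform_all(estimates, couplings)
for bond, (p0, p1) in primal.items():
    print(bond, round(p0, 4), round(p1, 4))
```

Because each bond is handled with a constant number of arithmetic operations, the whole set of marginals is transformed in time linear in $|\mathcal{E}|$, whatever method produced the dual estimates. The helper `fixed_point` is included only so that (34) and the bound (47) can be checked against the mapping numerically.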
Figure 9: Average relative error, as a function of $\sigma^2$, in estimating $\pi_{p,e}(0)$ of an Ising model with periodic boundaries and with size $N = 6 \times 6$. Couplings are chosen randomly according to a half-normal distribution with variance $\sigma^2$. Curves: BP primal, TEP primal, BP/TreeEP dual.

Figure 10: Average relative error in estimating $\pi_{p,e}(0)$ in a fully-connected Ising model with $N = 10$. Coupling parameters are chosen uniformly and independently between 0.05 and $\beta J_x$ denoted by the $x$-axis. Curves: BP primal, TEP primal, BP/TreeEP dual.

We proved that marginal densities of a primal NFG and the corresponding marginal densities of its dual NFG are related via local mappings. The mapping provides a simple procedure to transform simultaneously the estimated marginals from one domain to the other. Furthermore, the mapping relies on no assumptions on the size or on the topology of the graphical model. Our numerical experiments show that estimating the marginals in the dual NFG can sometimes significantly improve the quality of approximations in terms of relative error. In the special case of the ferromagnetic Ising model in a positive external field, there is indeed a rapidly mixing Markov chain (the subgraphs-world process) to generate configurations in the dual domain.

Acknowledgements
The author is extremely grateful to G. D. Forney, Jr., for his comments and for his continued support. The author also wishes to thank J. Dauwels, P. Vontobel, V. Gómez, and B. Ghojogh for their comments.
References

[Al-Bashabsheh and Mao, 2011] Al-Bashabsheh, A. and Mao, Y. (2011). Normal factor graphs and holographic transformations. IEEE Transactions on Information Theory, 57:752–763.

[Al-Bashabsheh and Mao, 2014] Al-Bashabsheh, A. and Mao, Y. (2014). On stochastic estimation of the partition function. Proc. IEEE International Symposium on Information Theory, pages 1504–1508.

[Baxter, 2007] Baxter, R. J. (2007). Exactly Solved Models in Statistical Mechanics. Dover Publications.

[Bracewell, 1999] Bracewell, R. N. (1999). The Fourier Transform and Its Applications. McGraw-Hill.

[Dagum and Luby, 1993] Dagum, P. and Luby, M. (1993). Approximating probabilistic inference in Bayesian belief networks is NP-hard. Artificial Intelligence, 60:141–153.

[Forney, 2001] Forney, G. D. (2001). Codes on graphs: Normal realization. IEEE Transactions on Information Theory, 47:520–548.

[Forney, 2011] Forney, G. D. (2011). Codes on graphs: Duality and MacWilliams identities. IEEE Transactions on Information Theory, 57:1382–1397.

[Forney, 2018] Forney, G. D. (2018). Codes on graphs: Models for elementary algebraic topology and statistical physics. IEEE Transactions on Information Theory, 64:7465–7487.

[Galanis et al., 2016] Galanis, A., Stefankovic, D., Vigoda, E., and Yang, L. (2016). Ferromagnetic Potts model: Refined [...] SIAM Journal on Computing, 45:2004–2065.

[Goldberg and Jerrum, 2012] Goldberg, L. A. and Jerrum, M. (2012). Approximating the partition function of the ferromagnetic Potts model. Journal of the ACM, 59:1222–1239.

[Grimmett and Janson, 2009] Grimmett, G. and Janson, S. (2009). Random even graphs. The Electronic Journal of Combinatorics, 16:R46.

[Jerrum and Sinclair, 1993] Jerrum, M. and Sinclair, A. (1993). Polynomial-time approximation algorithms for the Ising model. SIAM Journal on Computing, 22:1087–1116.

[Kschischang et al., 2001] Kschischang, F. R., Frey, B. J., and Loeliger, H.-A. (2001). Factor graphs and the sum-product algorithm. IEEE Transactions on Information Theory, 47:498–519.

[Molkaraie, 2016] Molkaraie, M. (2016). An importance sampling algorithm for the Ising model with strong couplings. Proc. International Zurich Seminar on Communications, pages 180–184.

[Molkaraie, 2017] Molkaraie, M. (2017). The primal versus the dual Ising model. Proc. 55th Annual Allerton Conference on Communication, Control, and Computing, pages 53–60.

[Molkaraie and Gómez, 2018] Molkaraie, M. and Gómez, V. (2018). Monte Carlo methods for the ferromagnetic Potts model using factor graph duality. IEEE Transactions on Information Theory, 59:7449–7464.

[Molkaraie and Loeliger, 2013] Molkaraie, M. and Loeliger, H.-A. (2013). Partition function of the Ising model via factor graph duality. Proc. IEEE International Symposium on Information Theory, pages 2304–2308.

[Mooij, 2010] Mooij, J. M. (2010). libdai: A free and open source C++ library for discrete approximate inference in graphical models. The Journal of Machine Learning Research, 11:2169–2173.

[Murphy, 2012] Murphy, K. P. (2012). Machine Learning: A Probabilistic Perspective. MIT Press.

[Newell and Montroll, 1953] Newell, G. F. and Montroll, E. W. (1953). On the theory of the Ising model of ferromagnetism. Reviews of Modern Physics, 25:353–389.

[Onsager, 1944] Onsager, L. (1944). Crystal statistics. I. A two-dimensional model with an order-disorder transition. Physical Review, 65:117–149.

[Robert and Casella, 2004] Robert, C. P. and Casella, G. (2004). Monte Carlo Statistical Methods. Springer.

[Welsh, 1993] Welsh, D. J. A. (1993). Complexity: Knots, Colourings and Countings. Cambridge University Press.

[Wu, 1982] Wu, F.-Y. (1982). The Potts model. Reviews of Modern Physics, 54:235–268.

[Yeomans, 1992] Yeomans, J. M. (1992).