Dynamic mean-field and cavity methods for diluted Ising systems
Erik Aurell
Department of Computational Biology, AlbaNova University Centre, 106 91 Stockholm, Sweden∗
Hamed Mahmoudi
Department of Information and Computer Science, Aalto University, Finland
(Dated: October 15, 2018)

We compare dynamic mean-field and dynamic cavity as methods to describe the stationary states of dilute kinetic Ising models. We compute dynamic mean-field theory by expanding in interaction strength to third order, and compare to the exact dynamic mean-field theory for fully asymmetric networks. We show that in diluted networks the dynamic cavity method generally predicts magnetizations of individual spins better than both first-order ("naive") and second-order ("TAP") dynamic mean-field theory.
PACS numbers: 64.60.-i, 68.43.De, 75.10.Nr, 24.10.Ht
I. INTRODUCTION

(Classical) statistical mechanical systems in equilibrium are described by the Gibbs measure, which connects the propensity of a system to move between two states taken in isolation (the energy differences between these two states) to the probability of finding the system in one of the states, when all states are available. This relation is normally used to find equilibrium statistics of a system (magnetizations, correlation functions, etc.) by sampling a dynamics for which the Gibbs measure is a stationary state. A Markov Chain Monte Carlo (MCMC) method, such as Glauber dynamics for Ising systems, which we will review briefly below in Sec. II, can work if the sample average converges quickly enough to the Gibbs measure, and if the quantity to be measured has wide support in phase space. Well-known scenarios where this is the case are spin systems in the high-temperature phase and when measuring e.g. the total magnetization. In the low-temperature phase the relaxation time to the Gibbs distribution can be very long. On the other hand, if the quantity to be measured is e.g. the magnetization of a single spin, then MCMC in a large system is slow for the trivial reason that one needs to sweep through all the spins while being interested only in the changes in, and average over, one of them. If the interactions are weak, marginal probability distributions can be computed perturbatively in mean-field theory [1–3], which gives closed equations for e.g. single-spin magnetizations. For dilute systems, where every spin is not connected to most other spins, very powerful message-passing methods have been developed by physicists, information theorists and computer scientists over the last two decades to compute marginals of Gibbs distributions quickly and accurately [4, 5].
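As a concrete illustration of the sampling procedure just described, a sequential Glauber MCMC estimating a magnetization might look as follows (a minimal sketch; the function name, the single-spin test instance and all parameter values are our own, chosen only for illustration):

```python
import numpy as np

def glauber_mcmc(J, theta, beta, n_sweeps, rng):
    """Sequential Glauber (heat-bath) MCMC for an equilibrium Ising model.
    One spin at a time is redrawn from its conditional Gibbs distribution,
    P(sigma_i = +1 | rest) = 1/(1 + exp(-2*beta*h_i)), which leaves the
    Gibbs measure stationary.  J is assumed symmetric with zero diagonal;
    returns the time series of the total magnetization per spin."""
    N = len(theta)
    sigma = rng.choice([-1.0, 1.0], size=N)
    traj = []
    for _ in range(n_sweeps):
        for i in range(N):
            h = J[:, i] @ sigma + theta[i]   # local field on spin i
            p_up = 1.0 / (1.0 + np.exp(-2.0 * beta * h))
            sigma[i] = 1.0 if rng.random() < p_up else -1.0
        traj.append(sigma.mean())
    return np.array(traj)

# Single free spin in a field: the magnetization should approach tanh(beta*theta).
rng = np.random.default_rng(0)
traj = glauber_mcmc(np.zeros((1, 1)), np.array([1.0]), 1.0, 4000, rng)
print(traj.mean())   # close to tanh(1) ~ 0.76
```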
While these cavity equations cannot (in their simplest form) deal with the complex phases of random spin systems at low temperature, in suitable scenarios they are much more accurate than mean-field theory, and they greatly improve on MCMC for single-site magnetizations and other local properties by substituting a direct deterministic computation for a cumbersome sampling. The cavity method has been found to have many technological as well as fundamental applications [5–9]. The situation is very different for out-of-equilibrium systems, a term which in itself is extremely broad, covering everything from macroscopic hydrodynamics (turbulence) [10] and physical and chemical kinetics [11] to interdisciplinary applications of statistical physics to neuroscience, population biology, and other fields [12, 13]. We here consider the model systems obtained when generalizing the MCMC rules of Ising spin systems (Glauber dynamics) from the equilibrium case (symmetric interactions) to a non-equilibrium case (non-symmetric interactions). Such "kinetic Ising" models are only conceptual – but tractable – models of real spin systems driven out of equilibrium, and have mainly been studied with applications to neuroscience in mind [14–17]. From the mathematical point of view, they are specific examples of Markov chains which do not obey detailed balance conditions. In contrast to equilibrium systems, there is hence no simple expression for a stationary state akin to a Gibbs measure; such a state, when it exists, is a (complicated) function of all the details of the model. On the other hand, MCMC works as well in such systems as for standard equilibrium Ising models, and mean-field theory has been developed up to second order in the interaction strength [18].
∗ Department of Information and Computer Science, Aalto University, Finland; ACCESS Linnaeus Centre, KTH - Royal Institute of Technology, Stockholm, Sweden

This leaves open the case of dilute kinetic Ising models, where in the equilibrium case cavity methods would be preferable. A dynamic cavity method has only very recently been developed, for majority dynamics [19] and Glauber dynamics [20], and was investigated by us for parallel and sequential update schemes in [21]. The dynamic cavity method as outlined in [21] comprises first an ansatz on probability distributions, similar to standard cavity, then the Belief Propagation ansatz that cavity distributions factorize, and then also a further assumption of Markovianity. As discussed in [22], the second assumption is exact for fully asymmetric models with the parallel update rule. In a more general case, where either the interaction matrix has both a symmetric and an asymmetric component, or where the update rule is different, it is however only an approximation. The numerical results of [21], which showed that for some such mixed instances the dynamic cavity is quite accurate, were somewhat unexpected. The main motivation for the present paper is therefore to show more systematically in what parameter ranges dynamic cavity converges (for these models), where it is accurate compared to MCMC, and to compare its predictions to mean-field theory. We will show that for dilute kinetic Ising models, dynamic cavity works also for the magnetizations of individual spins, and is considerably more accurate than mean-field calculations of the same quantities. Kinetic Ising models have been studied by other approaches, and we outline them briefly here. When the discreteness of states is relaxed to a spherical Ising model,
Sompolinsky and Zippelius developed a Langevin equation formalism [23], later extended by Crisanti and Sompolinsky to the non-equilibrium case [24], where several phases of these (dynamical) models are outlined. Although pioneering, predicting magnetizations of individual spins is out of scope for such methods, as the sphericity approximation has been made. The dynamic replica theory (DRT) has been applied to kinetic Ising models [25], which, by the nature of replica theory, however only applies to averages over ensembles of models. Sommers developed a path integral formulation for the Glauber dynamics [26], which was at that time only investigated approximately. As an alternative to the path integral formulation, generating functional analysis was developed to study the non-equilibrium statistical mechanics of disordered systems [27]. It was shown by Neri and Bollé in [20] that at least in some cases, a dynamic cavity analysis explicitly averaged over a random ensemble recovers the results of generating functional analysis. Recently, Hertz and Roudi [28, 29] used generating functional analysis to derive mean-field theories for infinite-range spin glass models. To compare the accuracy of the predictions of single-site magnetizations by the dynamic mean-field formula of [28] to the dynamic cavity for dilute mixed models was one further motivation for this work. The paper is organized as follows. In Section II we describe the Glauber dynamics for spin glasses, the model which we will study. In Section III we discuss two approaches to a dynamic version of the TAP corrections to first-order mean-field theory [18, 28–30], while in Section IV we derive dynamic cavity equations for diluted networks under parallel update. This derivation should be seen as an alternative, and (we hope) clearer, route compared to [20] and our earlier contribution [21].
The main new results of this paper, on the convergence phase of dynamic cavity and on a comparison of the predictions of dynamic cavity and mean-field theory against MCMC, are presented in Section V. In Section VI we conclude and discuss possible application areas of dynamic cavity.

II. THE PARALLEL SPIN UPDATE SCHEME IN DILUTE KINETIC ISING MODELS
The asymmetric dilute Ising model is defined over a set of $N$ binary variables $\vec{\sigma} = \{\sigma_1,\ldots,\sigma_N\}$, and an asymmetric graph $G = (V,E)$ where $V$ is a set of $N$ vertices, and $E$ is a set of directed edges. To each vertex $v_i$ is associated a binary variable $\sigma_i$. The graphs $G$ are taken from random graph ensembles with bounded average connectivity. Following the parametrization of [27] we introduce a connectivity matrix $c_{ij}$, where $c_{ij}=1$ if there is a link from vertex $i$ to vertex $j$, $c_{ij}=0$ otherwise, and matrix elements $c_{ij}$ and $c_{kl}$ are independent unless $\{kl\}=\{ji\}$. The random graph is then specified by marginal (one-link) distributions

$$p(c_{ij}) = \frac{c}{N}\,\delta_{1,c_{ij}} + \Big(1-\frac{c}{N}\Big)\,\delta_{0,c_{ij}} \qquad (1)$$

and conditional distributions

$$p(c_{ij}\,|\,c_{ji}) = \epsilon\,\delta_{c_{ij},c_{ji}} + (1-\epsilon)\,p(c_{ij}) \qquad (2)$$

where $i,j \in \{1,\ldots,N\}$ and $i<j$. In this model the average degree is given by $c$, and the asymmetry is controlled by $\epsilon \in [0,1]$. The extreme values of $\epsilon$ give respectively a fully asymmetric network ($\epsilon=0$), where the probabilities of having two directed links between pairs of variables are uncorrelated, and the symmetric network ($\epsilon=1$), where the two links $i \to j$ and $j \to i$ are present or absent together. The parameter set is completed by a (real-valued) interaction matrix $J_{ij}$. Additional assumptions on the $J_{ij}$, i.e. smallness or that they are random with a suitable distribution, are stated when used. However, for concreteness the reader may in much of this paper think of the $J_{ij}$ as independent identically distributed random variables with zero mean and variance $1/c$ (Gaussian or binary), such that for fully connected networks ($c=N$) the interactions scale as in the Sherrington-Kirkpatrick model [31]. The interactions among spins determine the dynamics of the system.
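A minimal sketch of how a graph from the ensemble defined by Eqs. (1)–(2) can be sampled (the function name and layout are our own; for each unordered pair the reverse link is copied with probability $\epsilon$ and drawn independently otherwise, which reproduces both the marginal and the conditional above):

```python
import numpy as np

def sample_graph(N, c, eps, rng):
    """Sample a directed connectivity matrix C with C[i, j] = 1 iff there is
    a link i -> j.  Marginal link probability is c/N (Eq. 1); for i < j the
    reverse link j -> i equals i -> j with probability eps, and is otherwise
    an independent draw (Eq. 2)."""
    C = np.zeros((N, N), dtype=int)
    p = c / N
    for i in range(N):
        for j in range(i + 1, N):
            C[i, j] = rng.random() < p      # marginal, Eq. (1)
            if rng.random() < eps:          # correlated branch of Eq. (2)
                C[j, i] = C[i, j]
            else:                           # independent branch
                C[j, i] = rng.random() < p
    return C

rng = np.random.default_rng(0)
C = sample_graph(1000, 3.0, 0.5, rng)
print(C.sum(axis=0).mean())   # average in-degree, close to c = 3
```

Note that the marginal of the reverse link is still $c/N$: $\epsilon \cdot c/N + (1-\epsilon)\cdot c/N = c/N$, as Eq. (2) requires.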
In the parallel update scheme, which will be considered here, at each (discrete) time all spins are updated according to the Glauber rule

$$\sigma_i(t+1) = \begin{cases} +1 & \text{with probability } \{1+\exp(-2\beta h_i(t+1))\}^{-1} \\ -1 & \text{with probability } \{1+\exp(2\beta h_i(t+1))\}^{-1} \end{cases} \qquad (3)$$

where $h_i(t)$ is the effective field acting on spin $i$ at time step $t$,

$$h_i(t) = \sum_{j\in\partial i} J_{ji}\,\sigma_j(t-1) + \theta_i(t), \qquad (4)$$

and the parameter $\beta$, analogous to inverse temperature, is a measure of the overall strength of the interactions. The notation $j\in\partial i$ in (3) and (4) indicates all vertices having a directed link to node $i$, and $\theta_i$ is the (possibly time-dependent) external field acting on spin $i$. In this paper we will adhere to the convention that the interaction indices are written in the same order as the temporal order of the interacting spins. Hence we have $J_{ij}\sigma_i(s)\sigma_j(s+1)$ and $J_{ji}\sigma_j(s)\sigma_i(s+1)$. The joint probability distribution over all the spin histories $p(\vec\sigma(0),\ldots,\vec\sigma(t))$ has in principle the following simple Markov form

$$P(\vec\sigma(0),\ldots,\vec\sigma(t)) = \prod_{s=1}^{t} W[\vec\sigma(s)\,|\,\vec h(s)]\; p(\vec\sigma(0)) \qquad (5)$$

where $W$ is the appropriate transition matrix describing the dynamics and updates. If we had a full understanding of the joint probability distribution defined in (5) we could compute time-dependent macroscopic quantities such as magnetizations and correlations. The evolution of a single spin is (trivially) obtained by summing over the histories of all spins except one,

$$P_i(\sigma_i(0),\ldots,\sigma_i(t)) = \sum_{\vec\sigma_{\setminus i}(0),\ldots,\vec\sigma_{\setminus i}(t)} P(\vec\sigma(0),\ldots,\vec\sigma(t)) \qquad (6)$$

which can be further marginalized to the probability of one spin at one time,

$$p_i(\sigma_i(s)) = \sum_{\sigma_i(0),\ldots,\sigma_i(s-1),\,\sigma_i(s+1),\ldots,\sigma_i(t)} P_i(\sigma_i(0),\ldots,\sigma_i(t)) \qquad (7)$$

and similarly for the pairwise joint probability of the histories of two spins, $P_{ij}(\sigma_i(0),\ldots,\sigma_i(t),\sigma_j(0),\ldots,\sigma_j(t))$ and $p_{ij}(\sigma_i(s),\sigma_j(s'))$.
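For concreteness, the parallel update rule (3)–(4) can be simulated directly, estimating magnetizations by averaging over independent histories (a sketch only: the array convention `J[j, i]` for the coupling $j \to i$, the constant-in-time field and all parameter values are our own illustrative choices):

```python
import numpy as np

def simulate(J, theta, beta, T, rng, n_runs=2000):
    """Parallel Glauber dynamics, Eq. (3): all spins updated simultaneously,
    sigma_i(t+1) = +1 with probability 1/(1 + exp(-2*beta*h_i(t+1))), where
    h_i(t+1) = sum_j J[j, i]*sigma_j(t) + theta[i] (Eq. 4, theta constant in
    time here).  Returns MCMC estimates of m_i(T) over n_runs histories."""
    N = J.shape[0]
    m = np.zeros(N)
    for _ in range(n_runs):
        sigma = rng.choice([-1.0, 1.0], size=N)   # random initial condition
        for t in range(T):
            h = sigma @ J + theta                 # h_i = sum_j J_ji sigma_j + theta_i
            p_up = 1.0 / (1.0 + np.exp(-2.0 * beta * h))
            sigma = np.where(rng.random(N) < p_up, 1.0, -1.0)
        m += sigma
    return m / n_runs

# Usage on a small random diluted asymmetric instance (our own choice):
rng = np.random.default_rng(0)
N = 20
J = rng.normal(0.0, 1.0 / np.sqrt(3.0), size=(N, N)) * (rng.random((N, N)) < 3.0 / N)
np.fill_diagonal(J, 0.0)
m = simulate(J, 0.1 * np.ones(N), beta=0.7, T=50, rng=rng, n_runs=500)
print(m[:3])   # MCMC estimates of three single-site magnetizations
```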
Consequently, the time evolution of the single-site magnetization can be obtained from Eq. (7) as

$$m_i(t) = \sum_{\sigma_i(t)} \sigma_i(t)\, p_i(\sigma_i(t)) \qquad (8)$$

and similarly the correlation functions

$$c_{ij}(s,t) = \sum_{\sigma_i(s),\,\sigma_j(t)} \sigma_i(s)\,\sigma_j(t)\, p_{ij}(\sigma_i(s),\sigma_j(t)). \qquad (9)$$

Substituting Eq. (8) into the dynamics defined in (5) we have

$$m_i(t) = \Big\langle \tanh\beta\Big(\sum_{j\in\partial i} J_{ji}\,\sigma_j(t-1) + \theta_i(t)\Big) \Big\rangle \qquad (10)$$

where the brackets denote an average with respect to the trajectory history. Equation (10) is exact for the time-dependent magnetization. It is not directly practical, since the marginal over one spin at one time (the magnetization) depends on the joint distribution of all the spins influencing it at the previous time, but as we will see in Section III B it can be used as a starting point of a perturbative calculation.

III. MEAN-FIELD THEORIES, TAP, AND THE EXPANSION IN SMALL INTERACTIONS
The mean-field theory of spin glass systems started with the Sherrington-Kirkpatrick (SK) model [31]. In this model all spins interact with all other spins (infinite-range couplings), which motivates the simplest mean-field or "naive mean-field" approximation $m_i = \tanh\beta\big(\sum_j J_{ji} m_j + \theta_i\big)$. Shortly afterwards a more accurate mean-field theory (TAP) was introduced by adding the Onsager reaction term for the SK model. This corrects $m_j$ inside the tanh to $m_j - \beta J_{ij} m_i (1-m_j^2)$, where $J_{ij} m_i$ is the field from spin $i$ on spin $j$ and where $\chi_{jj} = \beta(1-m_j^2)$ can be interpreted as a local susceptibility at spin $j$ [1]. Since in equilibrium Ising systems $J_{ij} = J_{ji}$, the TAP equilibrium mean-field theory is hence $m_i = \tanh\big(\beta\sum_j J_{ji} m_j + \beta\theta_i - \beta^2 m_i \sum_j J_{ji}^2 (1-m_j^2)\big)$. As stated in [1] these results can be derived from the cavity approach. They can also be derived by observing that in equilibrium a susceptibility is related to a correlation by fluctuation-dissipation, and the appropriate correlation was computed by a perturbative argument [1]. For a later approach by field-theoretical methods, expanding a functional determinant describing the fluctuations around a mean-field stationary point of an action, see e.g. [32] and references therein. In equilibrium Ising systems the naive mean-field and the TAP approximations can further be computed by expanding the Boltzmann distribution in the interaction strength [33]. To first and second order in the interactions this result agrees with naive mean-field and TAP, respectively. Recently, a dynamic version of TAP has been derived by Hertz and Roudi [28, 29] by a field-theoretical argument, and we show in Section III B below that this also follows from Information Geometry, essentially a systematic expansion in interaction strength.
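A sketch of the equilibrium naive mean-field and TAP equations just quoted, solved by damped fixed-point iteration (the damping and the iteration count are numerical devices of ours, not part of the theory):

```python
import numpy as np

def mean_field(J, theta, beta, tap=True, n_iter=500, damp=0.5):
    """Damped fixed-point iteration of the equilibrium mean-field equations:
    naive MF: m_i = tanh(beta*(sum_j J_ji m_j + theta_i));
    with tap=True, the Onsager-corrected (TAP) version
    m_i = tanh(beta*sum_j J_ji m_j + beta*theta_i
               - beta**2 * m_i * sum_j J_ji**2 * (1 - m_j**2))."""
    N = len(theta)
    m = np.zeros(N)
    for _ in range(n_iter):
        field = m @ J + theta                       # sum_j J_ji m_j + theta_i
        if tap:
            field = field - beta * m * ((1.0 - m**2) @ (J**2))   # Onsager term
        m_new = np.tanh(beta * field)
        m = damp * m + (1.0 - damp) * m_new         # damped update
    return m

# A non-interacting spin reduces to m = tanh(beta*theta):
print(mean_field(np.zeros((1, 1)), np.array([0.3]), 2.0))   # ~ tanh(0.6)
```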
For completeness, we will also show that the same dynamic version of TAP follows from the "exact mean-field theory" of Mézard and Sakellariou [30], as already pointed out in [34]. Outside equilibrium fluctuation-dissipation does not hold. Conceptually one could therefore say that "dynamic TAP" as such is undefined, or, alternatively, that a proper generalization of TAP to a non-equilibrium system should be based on fluctuation relations generalizing fluctuation-dissipation theorems [35] (a task we have not attempted to carry out). In this paper we however take a more pragmatic approach, and understand "dynamic TAP" to be the formulae derived in [28, 29]. Before turning to the technical discussion, let us note that since mean-field and TAP have obvious computational advantages, these theories have been applied in much wider settings than those in which they have been derived, particularly in neuroscience. For a recent review, see [36] and references therein.

A. Fully asymmetric networks: a reduced theory
In this section we recall the theory in [30], with a view to computing the expansion in small interactions to third order. We start by rewriting the exact equation (10) in the following explicit form:

$$m_i(t) = \sum_{\sigma_i(t),\,\sigma_{\partial i}(t-1)} p(\sigma_{\partial i}(t-1))\;\sigma_i(t)\;\frac{e^{\beta\sigma_i(t)\left(\sum_{j\in\partial i} J_{ji}\sigma_j(t-1)+\theta_i(t)\right)}}{2\cosh\left(\beta\left(\sum_{j\in\partial i} J_{ji}\sigma_j(t-1)+\theta_i(t)\right)\right)} \qquad (11)$$

where $\sigma_{\partial i}$ is the collection of spins neighboring $i$ with $c_{ji}\neq 0$, and $p(\sigma_{\partial i}(t-1))$ is their joint distribution. In a fully asymmetric network, if a coupling $J_{ji}$ in the above is non-zero, then the opposite $J_{ij}$ is zero. Each of the spins $\sigma_j(t-1)$ on the right-hand side therefore does not depend directly on spin $i$ one time step before, i.e. on $\sigma_i(t-2)$. The distribution of each $\sigma_j(t-1)$ will in turn depend on distributions of other spins $\sigma_k(t-2)$. If the $\sigma_k(t-2)$ do not depend on the $\sigma_j(t-1)$, i.e. if there are no other paths connecting the $\sigma_j$ on the right-hand side of (11) except through the cavity spin $\sigma_i(t)$, or if such paths are unimportant, then the spins $\sigma_j(t-1)$ will be independent in an asymmetric network, and the effective field $h_i(t) = \theta_i(t) + \sum_{j\in\partial i} J_{ji}\sigma_j(t-1)$ acting on $\sigma_i(t)$ will be a sum of independent random variables.

When the number of interacting spins is large, the distribution of $h_i(t)$ follows from the central limit theorem:

$$p(h_i(t)) = \frac{1}{\sqrt{2\pi V_i(t)}}\,\exp\left[-\frac{\left(h_i(t)-\langle h_i(t)\rangle\right)^2}{2 V_i(t)}\right] \qquad (12)$$

where $\langle h_i(t)\rangle = \theta_i(t) + \sum_{j\in\partial i} J_{ji}\, m_j(t-1)$ and $V_i(t) = \langle h_i^2(t)\rangle - \langle h_i(t)\rangle^2$. We note that to arrive at this result, first the thermodynamic limit ($N\to\infty$) is taken at given connectivity $c$ (so that the terms $J_{ji} m_j(t-1)$ are independent), and then $c$ is taken large (so that there are many of them). In general $V_i(t)$ is defined as

$$V_i(t) = \sum_{j\in\partial i,\,k\in\partial i} J_{ji}\, J_{ki}\left[\langle\sigma_j(t-1)\sigma_k(t-1)\rangle - m_j(t-1)\, m_k(t-1)\right]. \qquad (13)$$

When the $J_{ji}$ are random, independent, evenly distributed and small, the sum is dominated by the diagonal terms, i.e.

$$V_i(t) = \sum_{j\in\partial i} J_{ji}^2\left(1 - m_j^2(t-1)\right). \qquad (14)$$

This gives the "exact mean-field" theory of [30]:

$$m_i(t) = \int \mathcal{D}x\; \tanh\beta\left(\theta_i(t) + \sum_j J_{ji}\, m_j(t-1) + x\sqrt{\sum_j J_{ji}^2\left(1-m_j^2(t-1)\right)}\right) \qquad (15)$$

with the Gaussian measure $\mathcal{D}x = \frac{dx}{\sqrt{2\pi}}\, e^{-x^2/2}$. Equation (15) can be iterated starting from some initial condition to get all magnetizations at any time, and is exact when the assumptions hold, i.e. when the network is fully asymmetric, when the cavity assumptions hold, when any spin is influenced by a large number of other spins, and when the interactions are random, independent, evenly distributed and small.

To expand (15) in small interactions we introduce $c_i(t) \equiv \sqrt{\sum_j J_{ji}^2\left(1-m_j^2(t-1)\right)}$ and take all these quantities to be of order $\epsilon$. We have

$$\tanh\left[\beta\left(\langle h_i(t)\rangle + c_i(t)\,x\right)\right] = \tanh\left[\beta\langle h_i(t)\rangle\right] + x\,c_i(t)\,\beta\left(1-\tanh^2\left[\beta\langle h_i(t)\rangle\right]\right) - x^2 c_i^2(t)\,\beta^2\tanh\left[\beta\langle h_i(t)\rangle\right]\left(1-\tanh^2\left[\beta\langle h_i(t)\rangle\right]\right) + O(\epsilon^3) \qquad (16)$$

where every odd term in this expansion will give zero when integrated against the Gaussian measure. Therefore we have

$$m_i(t) = \tanh\left[\beta\langle h_i(t)\rangle\right] - \beta^2\tanh\left[\beta\langle h_i(t)\rangle\right]\left(1-\tanh^2\left[\beta\langle h_i(t)\rangle\right]\right)c_i^2(t) + O(\epsilon^4). \qquad (17)$$

We would like to write the right-hand side of (17) as $\tanh\left[\beta\left(\langle h_i(t)\rangle + \Delta_i(t)\right)\right] + O(\epsilon^4)$. A comparison shows that this is possible setting $\Delta_i(t) = -\beta\tanh\left[\beta\langle h_i(t)\rangle\right]c_i^2(t)$. We therefore have, to fourth order, the following functional expression:

$$m_i(t) = \tanh\left[\beta\left(\langle h_i(t)\rangle - \beta\tanh\left[\beta\langle h_i(t)\rangle\right]c_i^2(t)\right)\right] + O(\epsilon^4). \qquad (18)$$

To first order in $\epsilon$ the solution is

$$m_i(t) = \tanh\beta\left(\sum_{j\in\partial i} J_{ji}\, m_j(t-1) + \theta_i(t)\right) + O(\epsilon^2) \qquad (19)$$

which is the "dynamic naive mean-field". Inserting this back in (18) we have the "dynamic TAP" of [28, 29]:

$$m_i(t) = \tanh\beta\left(\sum_{j\in\partial i} J_{ji}\, m_j(t-1) + \theta_i(t) - \beta\, m_i(t)\sum_{j\in\partial i} J_{ji}^2\left(1-m_j^2(t-1)\right)\right) + O(\epsilon^4). \qquad (20)$$

The last term inside the tanh is of order $\epsilon^2$ and of a form analogous to the Onsager back-reaction term; there is no third-order correction in $\epsilon$ in this theory.

B. The Information Geometry viewpoint
We will now derive the analogues of (19), (20) and a third-order term by following the approach of Information Geometry [3, 18, 37]. Let $\vec\sigma(0),\ldots,\vec\sigma(T)$ be the time history of a collection of spins. We assume that these spins have been generated by a kinetic Ising model with parallel updates and (possibly) time-dependent external fields. The joint probability of the history of all the spins is then

$$P(\vec\sigma(0),\ldots,\vec\sigma(T)\,|\,\vec\theta(0),\ldots,\vec\theta(T),\{J_{ij}\}) = \prod_{t=1}^{T}\prod_i \frac{\exp(\sigma_i(t)\, h_i(t))}{2\cosh(h_i(t))}, \qquad h_i(t) = \theta_i(t) + \sum_j J_{ji}\sigma_j(t-1). \qquad (21)$$

In Information Geometry the space of these models is considered as a manifold with coordinates being the (many) parameters $\vec\theta(0),\ldots,\vec\theta(T),\{J_{ij}\}$. A sub-manifold is the family of independent models

$$P^{\mathrm{ind}}(\vec\sigma(0),\ldots,\vec\sigma(T)\,|\,\vec\theta^{\mathrm{ind}}(0),\ldots,\vec\theta^{\mathrm{ind}}(T)) = \prod_{t=1}^{T}\prod_i \frac{\exp(\sigma_i(t)\, h_i(t))}{2\cosh(h_i(t))}, \qquad h_i(t) = \theta_i^{\mathrm{ind}}(t). \qquad (22)$$

A mean-field approximation is defined as the independent model with the same magnetizations as the full model [37]. For our case it is easily seen that $m_i(t) = \tanh(\theta_i^{\mathrm{ind}}(t))$ is the variational equation with respect to the parameter $\theta_i^{\mathrm{ind}}(t)$ of the Kullback-Leibler divergence $D\left[p\,\|\,p^{\mathrm{ind}}\right] = \sum p\ln\frac{p}{p^{\mathrm{ind}}}$. Therefore, the mean-field approximation in Information Geometry can also be seen as the independent model with the least Kullback-Leibler divergence from the full model [3, 18, 37]. Following the approach of [18] we take the interaction parameters ($J_{ij}$) as small parameters (of order $\epsilon$), and assume that the differences $\Delta\theta_i(t) = \theta_i(t) - \theta_i^{\mathrm{ind}}(t)$ can be expanded in $\epsilon$:

$$\Delta\theta_i(t) = \epsilon\,\Delta^{(1)}_i(t) + \epsilon^2\,\Delta^{(2)}_i(t) + \ldots \qquad (23)$$

We can then write, in analogy with Eq. 3.2 of [18],

$$0 = m_i(t) - m_i^{\mathrm{ind}}(t) = \epsilon\sum_{i,s}\frac{\partial m_i(t)}{\partial\theta_i(s)}\Big|_{\mathrm{ind}}\Delta^{(1)}_i(s) + \epsilon\sum_{k,l}\frac{\partial m_i(t)}{\partial J_{kl}}\Big|_{\mathrm{ind}} J_{kl} + \epsilon^2\sum_{i,s}\frac{\partial m_i(t)}{\partial\theta_i(s)}\Big|_{\mathrm{ind}}\Delta^{(2)}_i(s) + \epsilon^2\sum_{JK}\frac{\partial^2 m_i(t)}{\partial\Theta_J\,\partial\Theta_K}\Big|_{\mathrm{ind}}\Delta^{(1)}_{\Theta_J}\Delta^{(1)}_{\Theta_K} + O(\epsilon^3). \qquad (24)$$

Here $\Theta_J$ stands for the set of all interaction couplings and external fields, and $J$ runs over the relevant indices. The subscript indicates that all derivatives are evaluated at the independent model, and the left-hand side is zero because this is the variational equation. In the last term the sum goes over all the parameters labeled $J, K$ and the parameter increments are the first-order terms $\Delta^{(1)}_i(s)$ and $J_{kl}$; at third and higher orders mixed terms of $\Delta^{(1)}_i(s)$ and $\Delta^{(2)}_i(s)$ will appear. A calculation presented in the Appendix gives the results

$$\Delta^{(1)}_i(t) = -\sum_j J_{ji}\, m_j(t-1) \qquad (25)$$

$$\Delta^{(2)}_i(t) = m_i(t)\sum_k J_{ki}^2\left(1-m_k^2(t-1)\right) \qquad (26)$$

$$\Delta^{(3)}_i(t) = -\sum_k \left(1-m_k^2(t-1)\right) J_{ki}\,\Delta^{(2)}_k(t-1). \qquad (27)$$

Equation (23) together with the variational equation can be re-written

$$\tanh^{-1} m_i(t) = \theta_i(t) - \epsilon\,\Delta^{(1)}_i(t) - \epsilon^2\,\Delta^{(2)}_i(t) - \epsilon^3\,\Delta^{(3)}_i(t) + O(\epsilon^4). \qquad (28)$$

It is seen that to order $\epsilon$ this is "dynamic naive mean-field", compare (19); to order $\epsilon^2$ this is "dynamic TAP", compare (20); and at order $\epsilon^3$ there is typically a non-zero term absent in (20). Such a higher-order difference between the exact mean-field theory for the asymmetric model and the field-theoretical approach of [28, 29] was also pointed out in [30] (page 4, in the text below Eq. 7).

IV. DYNAMIC CAVITY METHOD
The cavity method for equilibrium systems was introduced in [38, 39], while the dynamic version has been studied only recently [19–21]. In contrast to the equilibrium case, using only the cavity assumption does not in general provide us with an efficient algorithm in the dynamic case; further assumptions are necessary. In this section we derive the dynamic cavity method for the kinetic Ising problem, taking a more explicit route than in [20] and [21].
A. Cavity and BP on spin histories
We consider a number of spins evolving according to a dynamics such as (5), and we let $X_i$ denote the whole history of spin $i$, $X_i = (\sigma_i(0),\sigma_i(1),\ldots,\sigma_i(T))$. The probability in (5) can then alternatively be interpreted as a joint probability of spin histories, $P(X_1,X_2,\ldots,X_N)$, and this probability can be represented by a graph where nodes $i$ and $j$ are connected if either $J_{ij}$ or $J_{ji}$ (or both) are non-zero. The corresponding product form is

$$P(X_1,X_2,\ldots,X_N) = \prod_i e^{\sum_s \theta_i(s)\sigma_i(s)}\;\prod_{ij} e^{\sum_s J_{ij}\sigma_i(s)\sigma_j(s+1)}\;\prod_i e^{-\sum_s \log 2\cosh\left(\theta_i(s)+\sum_j J_{ji}\sigma_j(s-1)\right)} \qquad (29)$$

which is already normalized. Belief Propagation is expected to work well if this graph is locally tree-like, i.e. if all loops are long and can be ignored [5]. In (29) this is never the case, even if the couplings are fully asymmetric, for the simple reason that if spins $i$ and $j$ drive spin $k$, then they are coupled both by the terms $\sigma_i(t)\sigma_k(t+1)$ and $\sigma_j(t)\sigma_k(t+1)$, and by the normalization $\log 2\cosh h_k(t+1)$. However, these couplings are of a rather peculiar type. To proceed we introduce four different marginal probabilities. The first, $P_i$, is the marginal probability of the spin history $X_i$. The second, $P_{i+\partial i}$, is the marginal probability on the set of histories $\{X_i \cup X_{\partial i}\}$. The third, $P_{\partial i}$, is the marginal on the set of histories $X_{\partial i}$. The fourth and last is $P^{(i)}$, a cavity distribution on $X_{\partial i}$. This we take as the marginal on $X_{\partial i}$ in a revised model where both the spin history $X_i$ and the normalization $\log 2\cosh h_i(t)$ have been eliminated. All four probabilities depend on external field parameters which are not necessarily the same. In particular, we will express $P_{\partial i}$ with one set of external fields as $P^{(i)}$ with another set of external fields. By definition $P_{i+\partial i} = W(X_i\,|\,X_{\partial i})\,P_{\partial i}$.
The peculiarity of the model is that the (normalized) conditional probability $W(X_i\,|\,X_{\partial i})$ is already explicitly included in (29). We can therefore compare

$$P_{\partial i}(X_{\partial i}) = \frac{P_{i+\partial i}(X_i \cup X_{\partial i})}{W(X_i\,|\,X_{\partial i})} \propto \prod_{j\in\partial i} e^{\sum_s \theta_j(s)\sigma_j(s)+\sum_{k\in\partial j}\sum_s J_{kj}\sigma_k(s)\sigma_j(s+1)-\sum_s \log 2\cosh\left(\theta_j(s)+\sum_k J_{kj}\sigma_k(s-1)\right)}\;\sum_{\vec X\setminus\{X_i\cup X_{\partial i}\}}\;\prod_{k\neq i,\partial i}\left(e^{\sum_s \theta_k(s)\sigma_k(s)}\prod_{l\in\partial k} e^{\sum_s J_{kl}\sigma_k(s)\sigma_l(s+1)}\; e^{-\sum_s \log 2\cosh\left(\theta_k(s)+\sum_l J_{lk}\sigma_l(s-1)\right)}\right) \qquad (30)$$

to

$$P^{(i)}(X_{\partial i}) \propto \prod_{j\in\partial i} e^{\sum_s \theta_j(s)\sigma_j(s)+\sum_{k\in\partial j,\,k\neq i}\sum_s J_{kj}\sigma_k(s)\sigma_j(s+1)-\sum_s \log 2\cosh\left(\theta_j(s)+\sum_{k\neq i} J_{kj}\sigma_k(s-1)\right)}\;\sum_{\vec X\setminus\{X_i\cup X_{\partial i}\}}\;\prod_{k\neq i,\partial i}\left(e^{\sum_s \theta_k(s)\sigma_k(s)}\prod_{l\in\partial k} e^{\sum_s J_{kl}\sigma_k(s)\sigma_l(s+1)}\; e^{-\sum_s \log 2\cosh\left(\theta_k(s)+\sum_l J_{lk}\sigma_l(s-1)\right)}\right). \qquad (31)$$

This comparison shows that $P_{\partial i}$ with external fields $\theta_j(t)$ is the same as $P^{(i)}$ with modified external fields $\theta_j(t)+J_{ij}\sigma_i(t-1)$. Using $P_i(X_i) = \sum_{X_{\partial i}} W(X_i\,|\,X_{\partial i})\,P_{\partial i}(X_{\partial i})$ we can therefore write the marginal probability $P_i$ as

$$P_i(X_i\,|\,\theta_i(0),\ldots,\theta_i(t),\cdot) = \sum_{\sigma_{\partial i}(0)\ldots\sigma_{\partial i}(t-1)} P^{(i)}(X_{\partial i}\,|\,\theta^{(i)}_{\partial i}(0),\ldots,\theta^{(i)}_{\partial i}(t),\cdot)\;\prod_{s=1}^{t} W_i(\sigma_i(s)\,|\,h_i(s))\; p_i(\sigma_i(0)) \qquad (32)$$

where $\cdot$ indicates all the parameters (external fields and interactions) which are the same on the two sides of the equation, and

$$\theta^{(i)}_j(s) = \theta_j(s) + J_{ij}\sigma_i(s-1), \qquad s = 0,\ldots,t \qquad (33)$$

is the set of external fields that are modified. The next step is to make the Belief Propagation assumption that the spin histories are taken to be independent in the cavity graph:

$$P^{(i)}(X_{\partial i}\,|\,\theta^{(i)}_{\partial i}(0),\ldots,\theta^{(i)}_{\partial i}(t),\cdot) = \prod_{j\in\partial i}\mu_{j\to i}(X_j\,|\,\theta^{(i)}_j(0),\ldots,\theta^{(i)}_j(t),\cdot). \qquad (34)$$

We here use $\mu_{i\to j}$ to represent marginal probabilities of neighboring spins, as is standard in the BP literature. We now consider the subgraph $T^{(i)}_j$ connected to one spin $j$ in the cavity of $i$. In analogy with the above, we want to compare the marginal on the set of neighbours of spin $j$ in $T^{(i)}_j$ to the cavity distribution on the same set of variables. As above, the first with one set of external fields is the same as the second with another set of external fields, and we therefore find the following recursion equations ("BP update equations"):

$$\mu_{j\to i}(X_j\,|\,\theta^{(i)}_j(0),\ldots,\theta^{(i)}_j(t),\cdot_{T^{(i)}_j}) = \sum_{X_{\partial j\setminus i}}\;\prod_{k\in\partial j\setminus i}\mu_{k\to j}\left(X_k\,|\,\theta^{(i),(j)}_k(0),\ldots,\theta^{(i),(j)}_k(t-1),\cdot_{T^{(i)}_j}\right)\prod_{s=1}^{t} w_j(\sigma_j(s)\,|\,h^{(i)}_j(s))\;\mu_{j\to i}(\sigma_j(0)) \qquad (35)$$

where $\cdot_{T^{(i)}_j}$ indicates all the parameters (external fields and interactions) which are the same on the two sides of the equation, and

$$\theta^{(i),(j)}_k(s) = \theta^{(i)}_k(s) + J_{jk}\sigma_j(s-1), \qquad s = 0,\ldots,t \qquad (36)$$

is the set of external fields that are modified. Since in fact $\theta^{(i)}_k(s) = \theta_k(s)$ (spin $k$ is not directly connected to $i$) we note that in (36) the upper index $(i)$ can be dropped on both sides. The effective field on spin $j$ in $T^{(i)}_j$ is

$$h^{(i)}_j(s) = \sum_{k\in\partial j\setminus i} J_{kj}\sigma_k(s-1) + \theta_j(s) \qquad (37)$$

and $w_j(\sigma_j\,|\,h^{(i)}_j(s))$ is the transition probability for the single spin $j$ in the model on $T^{(i)}_j$. The marginal probability over the history of one spin ("BP output equation") follows from (32) and (34) and is

$$P_i(X_i\,|\,\theta_i(0),\ldots,\theta_i(t),\cdot) = \sum_{X_{\partial i}}\prod_{k\in\partial i}\mu_{k\to i}(\sigma_k(0),\ldots,\sigma_k(t-1)\,|\,\theta^{(i)}_k(0),\ldots,\theta^{(i)}_k(t-1),\cdot)\prod_{s=1}^{t} W_i(\sigma_i(s)\,|\,h_i(s))\; p_i(\sigma_i(0)). \qquad (38)$$

Equations (35) and (38) are the dynamic cavity equations for our system. Both are large sets of equations connecting marginal distributions and cavity distributions between two probabilistic models with different parameters. In general these equations are (as far as we know) only of conceptual value since, on top of connecting different models, the right-hand side also involves on the order of $2^{T|\partial i|}$ operations. In BP such an operation would have to be iterated (for all variables) a number of times to reach convergence: as $T$ grows large this becomes unfeasible, for the same reason that ordinary BP does not work well if the state space of each variable is large. We can define (and will later use) marginalizations of the messages down to one time (it is no restriction to take this time to be the last):

$$\mu^t_{j\to i}(\sigma_j(t)\,|\,\theta^{(i)}_j(t)) = \sum_{\sigma_j(0)\ldots\sigma_j(t-1)}\mu_{j\to i}(\sigma_j(0),\ldots,\sigma_j(t)\,|\,\theta^{(i)}_j(0),\ldots,\theta^{(i)}_j(t)) \qquad (39)$$

but in general these quantities do not obey closed equations among themselves. An important exception is fully asymmetric networks, since there at most one of $J_{ij}$ and $J_{ji}$ is non-zero. We note that in (35) and (38) the probability distribution of spin $i$ depends on the neighbors $\partial i$ through the effective fields $h^{(i)}_j(s)$ and $h_i(s)$, but the messages sent from the neighbors to $i$ also depend parametrically on the history of $i$ through the modified external fields $\theta^{(i)}_k$.
This back-action is absent in the fully asymmetric case, where $\theta^{(i)}_k = \theta_k$ independently of spin $i$.

B. The projected dynamic BP
As discussed, the marginalization of dynamic BP over one time is not in general a Markov process. However, in many cases it is the long-time behavior of the dynamics (the stationary state) that is of interest. In this section we explain an approximation scheme for computing marginal probabilities of one spin at one time in the stationary state, a procedure called the one-time approximation in [20] and time factorization in [21, 22]. We start with the dynamic BP for the time histories of messages, i.e.
Eq. (35), where on the right-hand side the time trajectories of messages sent from neighboring spins carry the information from the whole time history of those spins. We note that the full dynamics Eq. (5) is in fact Markov. This, and the need to introduce some approximation, motivates the time-factorization ansatz, which we write for the terms in the right-hand side of Eq. (35):

μ_{k→j}(σ_k(0), …, σ_k(t) | ·_{T^{(j)}_k}) = Π_{s=0}^{t} μ^s_{k→j}(σ_k(s) | ·_{T^{(j)}_k})   (40)

where ·_{T^{(j)}_k} indicates the parameters of the model, the same on both sides. Obviously, when inserted into the right-hand side of (35), such a factorization is not preserved on the left-hand side. Since we deal with binary variables we can introduce time-factorized cavity biases u^{T^{(j)}_k}_{k→j}(s), again written for the right-hand side of (35), which are defined by

μ^s_{k→j}(σ_k(s) | ·_{T^{(j)}_k}) = e^{β u^{T^{(j)}_k}_{k→j}(s) σ_k(s)} / (2 cosh[β u^{T^{(j)}_k}_{k→j}(s)])   (41)

A crucial observation is now that once the time-factorization ansatz has been made, the cavity biases at different external fields are simply related. We will need

u^{T^{(j)}_k}_{k→j}(s) = u^{T^{(i)}_j}_{k→j}(s) + J_{jk} σ_j(s−1),   s = 0, …, t   (42)

which follows from the relation (36).
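For binary variables the parametrization (41) is invertible, so one can pass freely between a one-time message and its cavity bias. A minimal sketch (function names are our own):

```python
import math

def bias_to_prob(u, beta):
    """Probability that sigma = +1 for a binary spin with cavity bias u,
    mu(sigma) = exp(beta*u*sigma) / (2*cosh(beta*u)), as in Eq. (41)."""
    return math.exp(beta * u) / (2.0 * math.cosh(beta * u))

def prob_to_bias(p_up, beta):
    """Inverse map: recover the bias from the probability of sigma = +1."""
    return math.log(p_up / (1.0 - p_up)) / (2.0 * beta)
```

Since μ(+1)/μ(−1) = e^{2βu}, the inverse is just a scaled log-odds.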
Inserting (41) and (42) into (35) gives

μ_{j→i}(σ_j(0), …, σ_j(t) | ·_{T^{(i)}_j}) = Σ_{σ_{∂j\i}(0),…,σ_{∂j\i}(t−1)} Π_{s=0}^{t−1} Π_{k∈∂j\i} [ e^{β σ_k(s) (u^{T^{(i)}_j}_{k→j}(s) + J_{jk} σ_j(s−1))} / (2 cosh[β (u^{T^{(i)}_j}_{k→j}(s) + J_{jk} σ_j(s−1))]) ] Π_{s=1}^{t} w_j(σ_j(s) | h^{(i)}_j(s)) μ_{j→i}(σ_j(0))   (43)

This equation can be marginalized explicitly over the last time to give

μ^t_{j→i}(σ_j(t) | ·_{T^{(i)}_j}) = Σ_{σ_j(t−1), σ_{∂j\i}(t−1)} Π_{k∈∂j\i} [ e^{β σ_k(t−1) (u^{T^{(i)}_j}_{k→j}(t−1) + J_{jk} σ_j(t−1))} / (2 cosh[β (u^{T^{(i)}_j}_{k→j}(t−1) + J_{jk} σ_j(t−1))]) ] w_j(σ_j(t) | h^{(i)}_j(t)) μ^{t−1}_{j→i}(σ_j(t−1) | ·_{T^{(i)}_j})   (44)

The projected dynamic cavity is then to use (44) to compute the terms in a time factorization of the left-hand side of Eq. (35). Except for fully asymmetric models (with parallel updates), this approach is not appropriate for transients [22]. However, when the external fields θ_i are constant in time and a stationary state has been reached, it may be acceptable to also take the messages independent of time. For one and the same set of parameter values the fixed-point equations for the time-independent time-factorized cavity biases are then

u*_{j→i} = (1/2β) Σ_{σ_j} σ_j log Σ_{σ_{∂j\i}, σ'_j} [ e^{β Σ_{k∈∂j\i} σ_k (u*_{k→j} + J_{jk} σ'_j)} / Π_{k∈∂j\i} 2 cosh[β (u*_{k→j} + J_{jk} σ'_j)] ] · e^{β h^{(i)}_j σ_j} / (2 cosh[β h^{(i)}_j]) · e^{β σ'_j u*_{j→i}} / (2 cosh[β u*_{j→i}]),   h^{(i)}_j = Σ_{k∈∂j\i} J_{kj} σ_k + θ_j   (45)

Eq. (45) is, like ordinary BP, solved by iteration, where the right-hand side is computed from u^{(t−1)}_{k→j} at iteration time t−1, giving on the left-hand side u^{(t)}_{j→i} at iteration time t. The spin σ_j summed over is then conceptually at time t, the spins σ_k at time t−1, and σ'_j at time t−2, all in the iteration time.
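A single right-hand-side evaluation of the fixed-point equation (45) can be done by brute-force enumeration over the spins in ∂j\i and the previous value σ'_j. The following is a sketch; the argument layout and function name are our assumptions, and w_j(σ|h) = e^{βσh}/(2cosh βh) is the parallel-update transition probability used throughout:

```python
import itertools
import math

def update_bias(u_in, J_in, J_out, theta_j, u_prev, beta):
    """
    One evaluation of the right-hand side of Eq. (45) for the
    time-factorized cavity bias u*_{j->i}.  Assumed layout:
      u_in[k]  : incoming biases u*_{k->j}, k in dj \ i
      J_in[k]  : couplings J_{kj} entering the field on spin j
      J_out[k] : couplings J_{jk}, by which j acts back on k
      theta_j  : external field on spin j
      u_prev   : current value of u_{j->i} (distribution of sigma'_j)
      beta     : inverse temperature
    """
    n = len(u_in)
    log_z = {}
    for sigma_j in (+1, -1):
        z = 0.0
        # sum over sigma'_j (previous value of spin j) and neighbours sigma_k
        for spins in itertools.product((+1, -1), repeat=n + 1):
            sigma_prev, sigma_nb = spins[0], spins[1:]
            w = math.exp(beta * u_prev * sigma_prev) / (2 * math.cosh(beta * u_prev))
            for k in range(n):
                x = beta * (u_in[k] + J_out[k] * sigma_prev)
                w *= math.exp(x * sigma_nb[k]) / (2 * math.cosh(x))
            h_j = sum(J_in[k] * sigma_nb[k] for k in range(n)) + theta_j
            w *= math.exp(beta * h_j * sigma_j) / (2 * math.cosh(beta * h_j))
            z += w
        log_z[sigma_j] = math.log(z)
    return (log_z[+1] - log_z[-1]) / (2 * beta)
```

In use, one would sweep this update over all directed edges (possibly with damping) until the biases stop changing; the cost per message is exponential only in the degree, not in the time horizon.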
Using the iteration time as a proxy for real time, we note that in a transient we can compute the time evolution of the magnetization which would follow from (41) and (37):

m_i(t) = Σ_{σ_{∂i}(t−1), σ_i(t−1)} tanh[β (Σ_{j∈∂i} J_{ji} σ_j(t−1) + θ_i)] Π_{k∈∂i} [ e^{β (u_{k→i}(t−1) + J_{ik} σ_i(t−1)) σ_k(t−1)} / (2 cosh[β (u_{k→i}(t−1) + J_{ik} σ_i(t−1))]) ] · e^{β u_i(t−1) σ_i(t−1)} / (2 cosh[β u_i(t−1)])   (46)

V. RESULTS
In this section we investigate the performance of the dynamic cavity method in computing stationary states of diluted spin glasses under parallel update, and compare to MCMC (Glauber dynamics) and to dynamic mean-field and dynamic TAP as defined in Section III. The convergence of projected dynamic cavity (dynamic cavity in the time-factorized approximation) is monitored by comparing magnetizations computed from (46) at successive times for different parameter values of the model; these predictions are then compared to dynamic mean-field, dynamic TAP and MCMC.
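The empirical magnetizations against which the approximations are compared come from Glauber dynamics with parallel update. A minimal sketch, where the coupling convention J[i, j] = J_{ji} (the field on spin i receives J_{ji} σ_j) and all names are our assumptions:

```python
import numpy as np

def parallel_glauber(J, theta, beta, T, burn_in, rng):
    """
    Parallel-update Glauber dynamics on a (possibly asymmetric) diluted
    Ising network.  Each spin is resampled simultaneously from
    w(sigma_i | h_i) = exp(beta*sigma_i*h_i) / (2*cosh(beta*h_i));
    returns the empirical magnetizations m_i.
    """
    N = len(theta)
    sigma = rng.choice([-1, 1], size=N)
    m = np.zeros(N)
    for t in range(T):
        h = J @ sigma + theta                      # local fields h_i(t)
        p_up = 1.0 / (1.0 + np.exp(-2.0 * beta * h))
        sigma = np.where(rng.random(N) < p_up, 1, -1)
        if t >= burn_in:
            m += sigma
    return m / (T - burn_in)
```

For non-interacting spins (J = 0) the stationary magnetizations are tanh(βθ_i), which gives a quick sanity check of the sampler.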
A. Convergence of dynamic BP
In order to detect when dynamic BP reaches a stationary state we compare single-spin magnetizations in two successive time steps:

∆(t) = (1/N) Σ_{i=1}^{N} (m_i(t) − m_i(t−1))²   (47)

Whenever this deviation vanishes, dynamic BP must have converged to a stationary state. Fig. 1 shows the results for various connectivity parameters in symmetric and partially symmetric networks. At high temperature we observe convergence towards a fixed point, whereas at low temperature BP does not reach a stationary state. Roughly speaking, dynamic BP stops converging at a value β_cr(c) which depends on the average connectivity.

FIG. 1: (Color online) Squared deviation of spin averages between successive updates, ∆(t) = (1/N) Σ_{i=1}^{N} (m_i(t) − m_i(t−1))², in the stationary limit for different values of the average connectivity c. Mean magnetizations are calculated by the projected dynamic cavity method, i.e. Eq. (46). Left panel: partially asymmetric networks with ε = 0.5. Right panel: symmetric networks, ε = 1. The results are averaged over BP initial conditions (10 experiments). System size is 1000 and external fields are set to zero.

In Fig. 2 the convergence of dynamic BP is plotted to show the effect of asymmetry. In this case it is simply so that for very asymmetric graphs BP converges in a very wide region, presumably for arbitrarily large values of β if the network grows large enough, and, in general, the more asymmetric the network, the better the convergence.

FIG. 2: (Color online) Effect of asymmetry (ε = 0, 0.5, 1) on the squared deviation of spin averages between successive updates, ∆(t) = (1/N) Σ_{i=1}^{N} (m_i(t) − m_i(t−1))², obtained by projected dynamic BP, Eq. (46), in the stationary limit, for average connectivity c = 2 (left panel), c = 3 (middle panel) and c = 4 (right panel). The results are averaged over BP initial conditions (10 experiments). System size is 1000 and external fields are set to zero.

B. Performance of dynamic BP
Fig. 3 shows a comparison between the dynamic cavity method and dynamic mean field for the magnetizations in spin-glass systems with different values of the asymmetry parameter. The results are obtained in the presence of small external fields; the error of dynamic mean field grows with β, indicating that it is less accurate than the dynamic cavity method.

FIG. 3: (Color online) Mean square error δ(t) = (1/N) Σ_{i=1}^{N} (m_i^{predicted}(t) − m_i^{empirical}(t))² of the two approximation methods (dynamic mean field and dynamic cavity) with respect to the empirical data (Glauber dynamics). Left panel: dynamic mean field (Eq. 19) for networks with different values of the asymmetry parameter (ε = 0, 0.5, 1) and fixed average connectivity c = 3. Right panel: the corresponding results obtained by the projected dynamic cavity method (Eq. 46). For small β, i.e. high temperature, both methods agree with the numerical simulations; at large β, however, dynamic BP outperforms dynamic mean field.

To compare in more detail, we also show scatter plots of spin-by-spin magnetizations in Fig. 4. The dynamic cavity method predicts local magnetizations perfectly for fully asymmetric networks and agrees quite well with the numerical simulations at high temperature for fully symmetric networks, whereas naive mean field and TAP start to deviate already at moderate temperatures.
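For reference, the dynamic naive mean-field prediction compared against in this section iterates a closed equation of the form m_i(t) = tanh β(θ_i + Σ_k J_ki m_k(t−1)) (cf. Eq. (A8)). A minimal sketch, with the coupling convention J[i, k] = J_{ki} as our assumption:

```python
import numpy as np

def dynamic_naive_mf(J, theta, beta, T):
    """
    Dynamic naive mean-field iteration: each magnetization responds to
    the mean field generated by the magnetizations one step earlier.
    """
    N = len(theta)
    m = np.zeros(N)
    for _ in range(T):
        m = np.tanh(beta * (theta + J @ m))
    return m
```

The dynamic TAP scheme adds the Onsager-type correction −m_i(t) Σ_k J²_ki (1 − m_k²(t−1)) inside the tanh, as derived in Appendix A.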
VI. CONCLUSION
Message-passing methods have become an important topic on the borderline between equilibrium statistical physics and information theory. In the present paper we have studied an extension of message passing to non-equilibrium Ising spin systems. In contrast to the equilibrium case, the cavity method is not immediately useful to describe the dynamics, even if the topology is suitable, because the messages depend on whole spin time histories. The time-factorization assumption, as discussed here and in [20–22] (or some other simplifying assumption), is necessary to reduce the complexity, but in so doing one is generally restricted to stationary states.

FIG. 4: (Color online) Scatter plots of local magnetizations in dilute networks at four different temperatures, with c = 3. Local magnetizations are obtained by dynamic mean field, Eq. 19 (green), dynamic TAP, Eq. 20 (red), and projected dynamic BP, Eq. 46 (blue). Left panels: fully asymmetric networks (ε = 0), where projected dynamic BP provides exact results; right panels: fully symmetric networks (ε = 1). At high temperature all three methods agree with the numerical simulations. At low temperature BP starts to outperform naive mean field and TAP.

We have studied dynamic cavity in the time-factorized assumption for stationary states and outlined its convergence region in interaction strength (β), connectivity (c) and asymmetry (ε). By analogy with generally known facts about BP it can be argued that when dynamic cavity converges it should typically be a good approximation; the region of convergence is therefore a useful proxy for the accuracy. Expanding on first results presented in [21] we show that the convergence region in β increases with the connectivity. We also find that the convergence region increases with asymmetry for several values of connectivity, and that the method converges for any interaction strength on fully asymmetric networks (as expected). For networks of moderate size we have directly compared dynamic cavity and dynamic mean-field to direct simulation.
For several values of asymmetry and connectivity we find that their convergence regions are very similar, if not identical, but when both methods converge, dynamic cavity is considerably more accurate, except in the low-β limit where their performance is about the same. We have hence shown that dynamic cavity can be a useful new approximation to the dynamics of non-equilibrium spin systems – and of any system which can be fruitfully modeled by such methods.

On the analytical side we have discussed the special status of fully asymmetric models, for which the cavity approach is in some sense exact. We have also re-derived the "dynamic TAP" equations of Hertz and Roudi [28, 29] using a straightforward approach borrowed from Kappen and Spanjers' treatment of the stationary state [18], clarifying that this approach is based on minimizing the distance, in the sense of Information Geometry, to the sub-family of independent (but time-changing) models. Whether such a perturbative argument can be extended to small deviations from e.g. fully asymmetric models remains to be seen.

Acknowledgment
We thank Marc Mézard for important remarks, and Mikko Alava, Yoshiyuki Kabashima, Pekka Orponen and Toshiyuki Tanaka for discussions. The work was supported by the Academy of Finland as part of its Finland Distinguished Professor program, project 129024/Aurell.
Appendix A: The Information Geometry calculation to second order
The following calculations are completely parallel to those in Appendix 1 of [18] and start from

∂m_i(t)/∂θ_j(s)|_ind = δ_{s,t} δ_{ij} (1 − m_i²(t))   (A1)

∂m_i(t)/∂J_{jk}|_ind = δ_{ik} (1 − m_i²(t)) m_j(t−1)   (A2)

∂²m_i(t)/∂θ_j(s)∂θ_k(s')|_ind = −2 m_i(t)(1 − m_i²(t)) δ_{ij} δ_{ik} δ_{s,t} δ_{s',t}   (A3)

∂²m_i(t)/∂J_{jk}∂θ_l(s)|_ind = −2 m_i(t)(1 − m_i²(t)) m_j(t−1) δ_{ik} δ_{il} δ_{s,t} + (1 − m_i²(t))(1 − m_j²(t−1)) δ_{ik} δ_{jl} δ_{s,t−1}   (A4)

∂²m_i(t)/∂J_{jk}∂J_{lm}|_ind = δ_{ik} (1 − m_i²(t))(1 − m_j²(t−1)) δ_{jm} m_l(t−2) + (jk) ↔ (lm) − 2 m_i(t) δ_{ik} (1 − m_i²(t)) δ_{im} (m_j(t−1) m_l(t−1) + δ_{jl} (1 − m_j²(t−1)))   (A5)

To first order in ε, (24) hence gives

Σ_{s,j} δ_{s,t} δ_{ij} (1 − m_i²(t)) ∆^{(1)}_j(s) + Σ_{jk} δ_{ik} (1 − m_i²(t)) m_j(t−1) J_{jk} = 0   (A6)

which is simply

A^{(1)}_i(t) ≡ ∆^{(1)}_i(t) + Σ_j J_{ji} m_j(t−1) = 0   (A7)

This is the same as "dynamic naive mean field",

tanh⁻¹(m_i(t)) = θ_i(t) + Σ_k J_{ki} m_k(t−1) + O(ε²)   (A8)

The terms arising from second-order derivatives and first-order increments can be grouped together as

(1 − m_i²(t)) [ − m_i(t) (A^{(1)}_i(t))² − Σ_j (1 − m_j²(t−1)) J_{ji} A^{(1)}_j(t−1) − m_i(t) Σ_k J²_{ki} (1 − m_k²(t−1)) ]   (A9)

which together with the first-order conditions (A7) and the term from the first-order derivative and second-order increment, (1 − m_i²(t)) ∆^{(2)}_i(t), gives

∆^{(2)}_i(t) = m_i(t) Σ_k J²_{ki} (1 − m_k²(t−1))   (A10)

tanh⁻¹(m_i(t)) = θ_i(t) + Σ_k J_{ki} m_k(t−1) − m_i(t) Σ_k J²_{ki} (1 − m_k²(t−1)) + O(ε³)   (A11)

Appendix B: The Information Geometry calculation to third order
Third-order contributions consist partly of terms involving lower than third-order derivatives and higher than first-order increments. The calculation of these uses the same elements as above:

Σ_{j,s} ∂m_i(t)/∂θ_j(s)|_ind ∆^{(3)}_j(s) = (1 − m_i²(t)) ∆^{(3)}_i(t)   (B1)

Σ_{j,s,k,s'} ∂²m_i(t)/∂θ_j(s)∂θ_k(s')|_ind ∆^{(2)}_j(s) ∆^{(1)}_k(s') = −2 m_i(t)(1 − m_i²(t)) ∆^{(2)}_i(t) ∆^{(1)}_i(t)   (B2)

Σ_{j,k,l,s} ∂²m_i(t)/∂J_{jk}∂θ_l(s)|_ind J_{jk} ∆^{(2)}_l(s) = −2 m_i(t)(1 − m_i²(t)) Σ_j m_j(t−1) J_{ji} ∆^{(2)}_i(t) + (1 − m_i²(t)) Σ_k (1 − m_k²(t−1)) J_{ki} ∆^{(2)}_k(t−1)   (B3)

where two terms can be combined to

−2 m_i(t)(1 − m_i²(t)) ∆^{(2)}_i(t) (∆^{(1)}_i(t) + Σ_k J_{ki} m_k(t−1)) = 0.   (B4)

The remainder is

(1 − m_i²(t)) (∆^{(3)}_i(t) + Σ_k (1 − m_k²(t−1)) J_{ki} ∆^{(2)}_k(t−1)) + (lower order terms)   (B5)

To proceed with the terms from third-order derivatives and first-order increments it is useful to introduce the streamlined notation

m_i = m_i(t),   m'_i = m_i(t−1),   m''_i = m_i(t−2),   etc.   (B6)

and similarly for all other quantities. It is also useful to note that although the derivatives act on the complete expression, involving both the probability density P and the tanh, they partially obey a chain rule when taken to act on the magnetizations alone:

• a derivative with respect to an external field θ_j(s) functions as an ordinary derivative and obeys the chain rule;

• a derivative with respect to an interaction coefficient J_{kl} acting on a once or more than once primed quantity, such as m'_i and m''_i, functions as an ordinary derivative and obeys the chain rule;

• a derivative with respect to an interaction coefficient J_{kl} acting on an unprimed quantity such as m_i must be treated apart, since this derivative will include a term acting on the tanh, which in turn will give a higher-order correlation.

These rules allow us to continue from what has already been computed and write

∂m_i/∂θ_j(s)|_ind = (1 − m_i²) δ_{ij} δ_{s,t}
∂²m_i/∂θ_j(s)∂θ_k(s')|_ind = −2 m_i (1 − m_i²) δ_{ij} δ_{s,t} δ_{ik} δ_{s',t}
∂³m_i/∂θ_j(s)∂θ_k(s')∂θ_l(s'')|_ind = 2 (1 − m_i²)(3 m_i² − 1) δ_{ij} δ_{s,t} δ_{ik} δ_{s',t} δ_{il} δ_{s'',t}
…

For the mixed terms we have similarly

∂m_i/∂J_{jk}|_ind = (1 − m_i²) δ_{ik} m'_j
∂²m_i/∂J_{jk}∂θ_l(s)|_ind = −2 m_i (1 − m_i²) δ_{il} δ_{s,t} δ_{ik} m'_j + (1 − m_i²) δ_{ik} (1 − (m'_j)²) δ_{jl} δ_{s,t−1}
∂³m_i/∂J_{jk}∂θ_l(s)∂θ_{l'}(s')|_ind = 2 (1 − m_i²)(3 m_i² − 1) δ_{il'} δ_{s',t} δ_{il} δ_{s,t} δ_{ik} m'_j − 2 m_i (1 − m_i²) δ_{il} δ_{s,t} δ_{ik} (1 − (m'_j)²) δ_{l'j} δ_{s',t−1} − 2 m_i (1 − m_i²) δ_{il'} δ_{s',t} δ_{ik} (1 − (m'_j)²) δ_{jl} δ_{s,t−1} + (1 − m_i²) δ_{ik} (−2 m'_j)(1 − (m'_j)²) δ_{jl} δ_{s,t−1} δ_{jl'} δ_{s',t−1}
…

and

∂²m_i/∂J_{jk}∂J_{lm}|_ind = δ_{ik} (1 − m_i²)(1 − (m'_j)²) δ_{jm} m''_l + (jk) ↔ (lm) − 2 m_i (1 − m_i²) δ_{ik} δ_{im} (m'_j m'_l + χ'_{jl})

∂³m_i/∂J_{jk}∂J_{lm}∂θ_n(s)|_ind = −2 m_i (1 − m_i²) δ_{in} δ_{s,t} δ_{ik} (1 − (m'_j)²) δ_{jm} m''_l + δ_{ik} (1 − m_i²)(−2 m'_j)(1 − (m'_j)²) δ_{jn} δ_{s,t−1} δ_{jm} m''_l + δ_{ik} (1 − m_i²) δ_{jm} (1 − (m'_j)²)(1 − (m''_l)²) δ_{ln} δ_{s,t−2} + (jk) ↔ (lm) + 2 (1 − m_i²)(3 m_i² − 1) δ_{in} δ_{s,t} δ_{ik} δ_{im} ⟨σ_j(t−1) σ_l(t−1)⟩ − 2 m_i (1 − m_i²) δ_{ik} δ_{im} (1 − (m'_j)²) δ_{jn} δ_{s,t−1} m'_l − 2 m_i (1 − m_i²) δ_{ik} δ_{im} m'_j (1 − (m'_l)²) δ_{ln} δ_{s,t−1} − 2 m_i (1 − m_i²) δ_{ik} δ_{im} ∂χ'_{jl}/∂θ_n(s)
…

where we use the correlation function χ_{jl} = ⟨σ_j(t) σ_l(t)⟩ − m_j m_l. Its partial derivative with respect to an external field is always zero, and the last term above therefore vanishes. The more cumbersome term is three derivatives with respect to interaction coefficients, which we can start from

∂³m_i/∂J_{pq}∂J_{jk}∂J_{lm}|_ind = ∂/∂J_{pq} [ Σ_σ (∂²P(σ)/∂J_{jk}∂J_{lm}) tanh(·) + Σ_σ (∂P(σ)/∂J_{jk}) (1 − tanh²(·)) δ_{im} σ_l(t−1) + (jk) ↔ (lm) + Σ_σ P(σ) (−2 tanh(·))(1 − tanh²(·)) δ_{im} σ_l(t−1) δ_{ik} σ_j(t−1) ]   (B7)

Applying ∂/∂J_{pq} gives (at least conceptually) eight terms. The term from acting on ∂²P(σ)/∂J_{jk}∂J_{lm} vanishes. The term from acting on tanh(·) in the first line gives a second derivative with respect to interaction coefficients of a magnetization. The terms from the second and the third line give combinations involving either second derivatives of a magnetization, or first derivatives of a correlation function.
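The tanh-derivative identities underlying the tables above can be checked symbolically; a quick verification, assuming sympy is available:

```python
import sympy as sp

x = sp.symbols('x')
t = sp.tanh(x)
# d tanh/dx = 1 - tanh^2
assert sp.simplify(sp.diff(t, x) - (1 - t**2)) == 0
# d^2 tanh/dx^2 = -2 tanh (1 - tanh^2)
assert sp.simplify(sp.diff(t, x, 2) - (-2 * t * (1 - t**2))) == 0
# d^3 tanh/dx^3 = 2 (1 - tanh^2)(3 tanh^2 - 1)
assert sp.simplify(sp.diff(t, x, 3) - (2 * (1 - t**2) * (3 * t**2 - 1))) == 0
```

These are exactly the prefactors (1 − m²), −2m(1 − m²) and 2(1 − m²)(3m² − 1) appearing in the derivative tables.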
The terms from the last line are a third-order correlation function and further first derivatives of second-order correlation functions.

Taking everything together we can sum the contributions to

Third order = (1/6) · 2 (1 − m_i²)(3 m_i² − 1) (A^{(1)}_i(t))³ + 2 m_i (1 − m_i²) ∆^{(1)}_i(t) Σ_l (1 − (m'_l)²) J_{li} A^{(1)}_l(t−1) − (1 − m_i²) Σ_l J²_{li} (A^{(1)}_l(t))² + (1/2) · 2 (1 − m_i²)(3 m_i² − 1) A^{(1)}_i(t) Σ_{lm} J_{il} J_{im} χ'_{lm} − 2 m_i (1 − m_i²) Σ_{ln} J_{li} (1 − (m'_l)²) A^{(1)}_l(t−1) J_{ni} m'_n + (1 − m_i²) Σ_{ml} J_{mi} J_{lm} (1 − (m'_m)²)(1 − (m''_l)²) A^{(1)}_l(t−2) + (m) ↔ (l) − 2 m_i (1 − m_i²) Σ_{ln,js} J_{li} J_{ni} (∂χ'_{ln}/∂θ_j(s)) ∆^{(1)}_j(s) − 2 m_i (1 − m_i²) Σ_{ln,pq} J_{li} J_{ni} (∂χ'_{ln}/∂J_{pq}) J_{pq} + circ. perm. + (1/3) (1 − m_i²)(3 m_i² − 1) Σ_{lnq} J_{li} J_{ni} J_{qi} χ'_{lnq}   (B8)

where in the last line we have used χ_{lnq} = ⟨(σ_l(t) − m_l)(σ_n(t) − m_n)(σ_q(t) − m_q)⟩. All the terms above containing the first-order terms A^{(1)} vanish, the partial derivatives of the second-order correlation function with respect to external fields vanish, and the last line is of higher order than ε³. The sole remaining terms hence come from the partial derivatives of second-order correlation functions with respect to interaction parameters. These are model dependent, and evaluate to non-zero values for the sequential update rule in [18]. For the parallel update rule which we look at here they are, however, zero. The collection of terms (B8) therefore evaluates to zero.

[1] D. Thouless, P. Anderson, and R. Palmer, Phil. Mag., 593 (1977).
[2] G. Parisi, Statistical Field Theory (Addison-Wesley, 1988).
[3] T. Tanaka, Neural Computation, 1951 (2000).
[4] J. S. Yedidia, W. T. Freeman, and Y. Weiss, Understanding Belief Propagation and its Generalizations (Science & Technology Books, 2003), pp. 239–269.
[5] M. Mézard and A. Montanari,
Information, Physics and Computation (Oxford University Press, 2009).
[6] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference (Morgan Kaufmann, San Francisco, 1988).
[7] J. Pearl, Causality: Models, Reasoning, and Inference (Cambridge University Press, 2000).
[8] M. Opper and O. Winther, Physical Review E, 056131 (2001).
[9] F. R. Kschischang, B. J. Frey, and H.-A. Loeliger, IEEE Transactions on Information Theory, 498 (1998).
[10] U. Frisch, Turbulence (Cambridge University Press, 1995), ISBN 0-521-45713-0.
[11] N. van Kampen, Stochastic Processes in Physics and Chemistry (Elsevier, 2007).
[12] J. J. Hopfield, Proc. Natl. Acad. Sci., 2554 (1982).
[13] R. A. Blythe and A. J. McKane, Journal of Statistical Mechanics: Theory and Experiment, P07018 (2007), URL http://stacks.iop.org/1742-5468/2007/i=07/a=P07018.
[14] B. Derrida, E. Gardner, and A. Zippelius, EPL (Europhysics Letters), 167 (1987).
[15] J. Hertz, G. Grinstein, and S. A. Solla, in Neural Networks for Computing, AIP Conf. Proc. 151, edited by J. Denker (1986).
[16] J. Hertz, G. Grinstein, and S. A. Solla, in Heidelberg Conference on Glassy Dynamics and Optimization, edited by I. Morgenstern and J. L. van Hemmen (Springer Verlag, 1987).
[17] A. Crisanti and H. Sompolinsky, Phys. Rev. A (1988).
[18] H. Kappen and J. J. Spanjers, Physical Review E, 5658 (2000).
[19] Y. Kanoria and A. Montanari, Annals of Applied Probability (2011), in press, [arXiv:0907.0449].
[20] I. Neri and D. Bollé, J. Stat. Mech. P08009 (2009).
[21] E. Aurell and H. Mahmoudi, J. Stat. Mech. P04014 (2011).
[22] E. Aurell and H. Mahmoudi, Communications in Theoretical Physics, 157 (2011).
[23] H. Sompolinsky and A. Zippelius, Phys. Rev. B, 6860 (1982).
[24] A. Crisanti and H. Sompolinsky, Phys. Rev. A, 4922–4939 (1987).
[25] A. C. C. Coolen, S. N. Laughton, and D. Sherrington, Phys. Rev. B, 8184 (1996).
[26] H. J. Sommers, Phys. Rev. Lett., 1268 (1987).
[27] J. P. L. Hatchett, B. Wemmenhove, I. P. Castillo, T. Nikoletopoulos, N. S. Skantzos, and A. C. C. Coolen, J. Phys. A: Math. Gen. (2004).
[28] J. Hertz and Y. Roudi, Phys. Rev. Lett., 048702 (2011).
[29] Y. Roudi and J. Hertz, J. Stat. Mech. P03031 (2011).
[30] M. Mézard and J. Sakellariou, Journal of Statistical Mechanics: Theory and Experiment, L07001 (2011).
[31] D. Sherrington and S. Kirkpatrick, Phys. Rev. Lett., 1792 (1975).
[32] A. Kholodenko, Journal of Statistical Physics, 355 (1990).
[33] T. Plefka, Journal of Physics A: Mathematical and General, 1971 (1982).
[34] J. Sakellariou, Y. Roudi, M. Mezard, and J. Hertz, arXiv preprint arXiv:1106.0452 (2011).
[35] U. Marconi, A. Puglisi, L. Rondoni, and A. Vulpiani, Physics Reports, 111 (2008).
[36] J. Hertz, Y. Roudi, and J. Tyrcha, arXiv preprint arXiv:1106.1752 (2011).
[37] S. Amari, S. Ikeda, and H. Shimokawa, Information Geometry and Mean Field Approximation: the Alpha-Projection Approach (MIT Press, 2001), pp. 241–257, ISBN 0-262-15054-9.
[38] M. Mezard, G. Parisi, and M. Virasoro, Spin Glass Theory and Beyond (World Scientific, Singapore, 1987).
[39] M. Mézard and G. Parisi, Eur. Phys. Journ. B20