A unifying framework for mean-field theories of asymmetric kinetic Ising systems
Miguel Aguilera,1,2,3,∗ S. Amin Moosavi,4,5 and Hideaki Shimazaki4,6

1 IAS-Research Center for Life, Mind, and Society, Department of Logic and Philosophy of Science, University of the Basque Country, Donostia, Spain.
2 Department of Informatics & Sussex Neuroscience, University of Sussex, Brighton, UK.
3 ISAAC Lab, Aragón Institute of Engineering Research, University of Zaragoza, Zaragoza, Spain.
4 Graduate School of Informatics, Kyoto University, Kyoto, Japan.
5 Present address: Department of Neuroscience, Brown University, USA.
6 Present address: Center for Human Nature, Artificial Intelligence, and Neuroscience (CHAIN), Hokkaido University, Japan.
∗ [email protected]

(Dated: January 27, 2021)

Kinetic Ising models are powerful tools for studying the non-equilibrium dynamics of complex systems. As their behavior is not tractable for large networks, many mean-field methods have been proposed for their analysis, each based on unique assumptions about the system's temporal evolution. This disparity of approaches makes it challenging to systematically advance mean-field methods beyond previous contributions. Here, we propose a unifying framework for mean-field theories of asymmetric kinetic Ising systems from an information geometry perspective. The framework is built on Plefka expansions of a system around a simplified model obtained by an orthogonal projection to a sub-manifold of tractable probability distributions. This view not only unifies previous methods but also allows us to develop novel methods that, in contrast with traditional approaches, preserve the system's correlations. We show that these new methods can outperform previous ones in predicting and assessing network properties near maximally fluctuating regimes.
INTRODUCTION
Advances in high-throughput data acquisition technologies for very large biological and social systems are providing unprecedented possibilities to investigate their complex, non-equilibrium dynamics. For example, optical recordings from genetically modified neural populations make it possible to simultaneously monitor activities of the whole neural network of behaving C. elegans [1] and zebrafish [2], as well as thousands of neurons in the mouse visual cortex [3]. Such networks generally exhibit out-of-equilibrium dynamics [4], and are often found to self-organize near critical regimes at which their fluctuations are maximized [5, 6]. Evolution of such systems cannot be faithfully captured by methods assuming an asymptotic equilibrium state. Therefore, in general, there is a pressing demand for mathematical tools to study the dynamics of large-scale, non-equilibrium complex systems and to analyze high-dimensional data sets recorded from them.

The kinetic Ising model with asymmetric couplings is a prototypical model for studying such non-equilibrium dynamics in biological [7, 8] and social systems [9]. It is described as a discrete-time Markov chain of interacting binary units, resembling the nonlinear dynamics of recurrently connected neurons. The model exhibits non-equilibrium behavior when couplings are asymmetric or when model parameters are subject to rapid changes, ruling out quasi-static processes. These conditions induce a time-reversal asymmetry in dynamical trajectories, leading to positive entropy production (the second law of thermodynamics) as revealed by the fluctuation theorem [10–15] (see [16, 17] for reviews). This time-asymmetry is characteristic of non-equilibrium systems as it can only be displayed by systems in which energy dissipation takes place [18]. In the case of symmetric connections and static parameters, the model converges to an equilibrium stationary state.
Therefore, it is a generalization of its equilibrium counterpart known as the (equilibrium) Ising model [19].

The forward Ising problem refers to calculating statistical properties of the model, such as mean activation rates (mean magnetizations of spins) and correlations, given the parameters of the model. In contrast, inference of the model parameters from data is called the inverse Ising problem [20]. In this regard, kinetic Ising models [21, 22] and their equilibrium counterparts [23–25] have become popular tools for modeling and analyzing biological and social systems. In addition, they capture memory retrieval dynamics in classical associative networks. Namely, they are equivalent to the Boltzmann machine, extensively used in machine learning applications [20]. Unfortunately, exact solutions of the forward and inverse problems often become computationally too expensive due to the combinatorial explosion of possible patterns in large networks or the high volume of data, and applications of exact or sampling-based methods are limited in practice to around a hundred neurons [5, 25, 26]. Therefore, analytical approximation methods are necessary for analysing large systems. In this endeavour, mean-field methods have emerged as powerful tools to track down otherwise intractable statistical quantities.

The standard mean-field approximations to study equilibrium Ising models are the classical naive mean-field (nMF) and the more accurate Thouless-Anderson-Palmer (TAP) approximations [27]. These methods have also been employed to solve the inverse Ising problem [28–31]. Plefka demonstrated that the nMF and TAP approximations for the equilibrium model can be derived using the power series expansion of the free energy around a model of independent spins, a method which is now referred to as the Plefka expansion [32]. This expansion up to the first and second orders leads to the nMF and TAP mean-field approximations, respectively.
The Plefka expansion was later formalized by Tanaka and others in the framework of information geometry [33–37].

In non-equilibrium networks, however, the free energy is not directly defined, and therefore it is not obvious how to apply the Plefka expansions. Kappen and Spanjers [38] proposed an information geometric approach to mean-field solutions of the asymmetric Ising model with asynchronous dynamics. They showed that their second-order approximation for an asymmetric model in the stationary state is equivalent to the TAP approximation for equilibrium models. Later, Roudi and Hertz derived TAP equations for non-stationary states using a Legendre transformation of the generating functional of the set of trajectories of the model [39]. Another study by Roudi and Hertz extended mean-field equations to provide expressions for the non-stationary delayed correlations assuming the presence of equal-time correlations at the previous step [40]. Yet another interesting method proposed by Mézard and Sakellariou approximates the local fields by a Gaussian distribution according to the central limit theorem, yielding more accurate results for fully asymmetric networks [41]. This method was later extended to include correlations at the previous time-step, improving the results for symmetric couplings [42]. More recently, Bachschmid-Romano et al. extended the path-integral methods in [39] with Gaussian effective fields [43], not only recovering [41] for fully asymmetric networks but also proposing a method that better approximates mean rate dynamics by conserving autocorrelations of units. Although many choices of mean-field methods are available, the diversity of methods and assumptions makes it challenging to advance systematically over previous contributions.

Here, we propose a unified approach for mean-field approximations of the Ising model.
While our method is applicable to symmetric and equilibrium models, we focus for generality on asymmetric kinetic Ising models. Our approach is defined as a family of Plefka expansions in an information geometric space. This approach allows us to unify and relate existing mean-field methods and provide expressions for other statistics of the systems such as pairwise correlations. Furthermore, our approach can be extended beyond classical mean-field assumptions to propose novel approximations. Here, we introduce an approximation based on a pairwise model that better captures network correlations, and we show that it outperforms existing approximations of kinetic Ising models near a point of maximum fluctuations. We also provide a data-driven method to reconstruct and test whether a system is near a phase transition by combining the forward and inverse Ising problems, and demonstrate that the proposed pairwise model more accurately estimates the system's fluctuations and its sensitivity to parameter changes. These results confirm that our unified framework is a useful tool to develop methods to analyze large-scale, non-equilibrium biological and social dynamics operating near critical regimes. In addition, since the methods are directly applicable to Boltzmann machine learning, the geometrical framework introduced here is relevant in machine learning applications.

The paper is organized as follows. First, we introduce the kinetic Ising model and its statistical properties of interest. Second, we introduce our framework for the Plefka approximation methods from a geometric perspective. To explain how it works, we derive the classical naive and TAP mean-field approximations under the proposed framework. Third, we show that our approach can unify other known mean-field approximation methods. We then propose a novel pairwise approximation under this framework.
Finally, we compare different mean-field approximations in solving the forward and inverse Ising problems, as well as in performing the data-driven assessment of the system's sensitivity. The last section is devoted to discussion.
RESULTS

The kinetic Ising model
The kinetic Ising model is the least structured statistical model containing delayed pairwise interactions between its binary components (i.e., a maximum caliber model [44]). The system consists of N interacting binary variables (down or up of Ising spins, or inactive or active of neural units) s_{i,t} ∈ {−1, +1}, i = 1, 2, ..., N, evolving in discrete time steps t with parallel dynamics. Given the configuration of spins at t−1, s_{t−1} = {s_{1,t−1}, s_{2,t−1}, ..., s_{N,t−1}}, spins s_t at time t are conditionally independent random variables, updated as a discrete-time Markov chain, following

  P(s_t | s_{t−1}) = ∏_i e^{s_{i,t} h_{i,t}} / (2 cosh h_{i,t}),    (1)
  h_{i,t} = H_i + ∑_j J_{ij} s_{j,t−1}.    (2)

The parameters H = {H_i} and J = {J_{ij}} represent local external fields at each spin and couplings between pairs of spins, respectively. When the couplings are asymmetric (i.e., J_{ij} ≠ J_{ji}), the system is away from equilibrium because the process is irreversible with respect to time. Given the probability mass function of the previous state P(s_{t−1}), the distribution of the current state is:

  P(s_t) = ∑_{s_{t−1}} P(s_t | s_{t−1}) P(s_{t−1}).    (3)

This marginal distribution P(s_t) is not factorized (except at J = 0), but it rather exhibits a complex statistical structure, generally containing higher-order spin interactions. We can apply this equation recursively, e.g. decomposing P(s_{t−1}) in terms of the distribution P(s_{t−2}), and trace the evolution of the system from the initial distribution P(s_0).

In this article, we use variants of the Plefka expansion to calculate some statistical properties of the system. Namely, we investigate the average activation rates m_t, correlations between pairs of units (covariance function) C_t, and delayed correlations D_t given by

  m_{i,t} = ∑_{s_t} s_{i,t} P(s_t),    (4)
  C_{ik,t} = ∑_{s_t} s_{i,t} s_{k,t} P(s_t) − m_{i,t} m_{k,t},    (5)
  D_{il,t} = ∑_{s_t, s_{t−1}} s_{i,t} s_{l,t−1} P(s_t, s_{t−1}) − m_{i,t} m_{l,t−1}.    (6)
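As an illustration, the parallel dynamics of Eqs. (1)-(3) can be simulated directly, and the statistics of Eqs. (4)-(6) estimated by Monte Carlo sampling over many independent trajectories; a minimal sketch in Python (the network size, number of trajectories, and the scales of H and J are hypothetical choices, not values from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, R = 10, 50, 20000                     # units, time steps, trajectories
H = rng.normal(0, 0.1, N)                   # local fields (hypothetical values)
J = rng.normal(0, 1 / np.sqrt(N), (N, N))   # asymmetric couplings, J_ij != J_ji

s = np.ones((R, N), dtype=int)              # initial state s_0 = +1 for all spins
for t in range(T):
    h = H + s @ J.T                         # effective fields h_{i,t} (Eq. 2)
    p_up = 1 / (1 + np.exp(-2 * h))         # P(s_{i,t} = +1 | s_{t-1}) from Eq. 1
    s_new = np.where(rng.random((R, N)) < p_up, 1, -1)   # parallel update
    s_prev, s = s, s_new

# Monte Carlo estimates of Eqs. (4)-(6) at the final step
m = s.mean(axis=0)                          # m_{i,t}
C = s.T @ s / R - np.outer(m, m)            # equal-time covariances C_{ik,t}
mp = s_prev.mean(axis=0)
D = s.T @ s_prev / R - np.outer(m, mp)      # delayed covariances D_{il,t}
```

Note that the sampling cost grows with the number of trajectories R, which motivates the analytical approximations developed below.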
Note that m_t and D_t are sufficient statistics of the kinetic Ising model. Therefore, we will use them in solving the inverse Ising problem (see Methods). We additionally consider the equal-time correlations C_t as they are commonly used to describe the systems, and are investigated by some of the mean-field approximations in the literature [40]. Calculation of these expectation values is analytically intractable and computationally very expensive for large networks, due to the combinatorial explosion of the number of possible states. To reduce this computational cost, we approximate the marginal probability distributions (Eq. 3) by the Plefka expansion method that utilizes an alternative, tractable distribution.

Geometrical approach to mean-field approximation
Information geometry [37, 45, 46] provides clear geometrical understanding of information-theoretic measures and probabilistic models [15, 47, 48]. Using the language of information geometry, we introduce our method for mean-field approximations of kinetic Ising systems. Let P_t be the manifold of probability distributions at time t obtained from Eq. 3. Each point on the manifold corresponds with a set of parameter values. The manifold P_t contains sub-manifolds Q_t of probability distributions with analytically tractable statistical properties (see Fig. 1). We use this tractable manifold, i.e., a reference model, to approximate a target point P(s_t | H, J) in the manifold P_t and its statistical properties m_t, C_t, D_t.

The simplest sub-manifold Q_t is the manifold of independent models, used in classical mean-field approximations to compute average activation rates. Each point on this sub-manifold corresponds to a distribution

  Q(s_t | Θ_t) = ∏_i e^{s_{i,t} Θ_{i,t}} / (2 cosh Θ_{i,t}),    (7)

where Θ_t = {Θ_{i,t}} is the vector of parameters that represents a point in Q_t. This distribution does not include couplings between units, and its average activation rate is immediately given as m_{i,t} = tanh Θ_{i,t}.

Our first goal is to find the average activation rates of the target distribution P(s_t | H, J). It turns out that they can be obtained from the independent model Q(s_t | Θ_t) that minimizes the following Kullback-Leibler (KL) divergence from P(s_t):

  D(P‖Q) = ∑_{s_t} P(s_t) log [P(s_t) / Q(s_t)].    (8)

The independent model Q(s_t | Θ*_t) (≡ Q*(s_t)) that minimizes the KL divergence has activation rates m_t identical to those of the target distribution P(s_t) [38] because the minimizing points Θ*_{i,t} satisfy (for i = 1, ..., N)

  ∂D(P‖Q)/∂Θ_{i,t} |_{Θ_t = Θ*_t} = −∑_{s_t} s_{i,t} P(s_t) + tanh Θ*_{i,t} = m^{Q*}_{i,t} − m^P_{i,t} = 0,    (9)

where m^P_{i,t} and m^{Q*}_{i,t} are respectively expectation values of s_{i,t} by P(s_t) and Q(s_t | Θ*_t). As these values are equal, for the rest of the paper we will drop their superscripts and just write m_{i,t} for simplicity. Note that the result of this approximation is indifferent to the system's correlations. Later in the paper we will consider approximations that take into account pairwise correlations.

From an information geometric point of view, given m_t (or Θ*_t), we may consider a family of points defined as a linear mixture of P(s_t) and Q(s_t | Θ*_t) for which m_t is kept constant (the dashed line A in Fig. 1). This is known as an m-geodesic, and it is orthogonal to the e-flat manifold Q_t, constituting an m-projection to this manifold [37, 47]. Thus, the previous search of Q(s_t | Θ*_t) given P(s_t | H, J) is equivalent to finding the orthogonal projection point from P(s_t | H, J) to the manifold Q_t of independent models [36, 37].

The Plefka expansion
Although the m-projection provides the exact and unique average activation rates, its calculation in practice requires the complete distribution P(s_t). In the Plefka expansion, we relax the constraints of the m-projection, and introduce another set of more tractable distributions that passes only through P(s_t | H, J) and Q(s_t | Θ*_t) (the solid line in Fig. 1). This distribution is defined using a new conditional distribution introducing a parameter α that connects a distribution on the manifold Q_t with the original distribution P(s_t):

  P_α(s_{i,t} | s_{t−1}) = e^{s_{i,t} h_{i,t}(α)} / (2 cosh h_{i,t}(α)),    (10)
  h_{i,t}(α) = (1 − α) Θ_{i,t} + α (H_i + ∑_j J_{ij} s_{j,t−1}).    (11)

FIG. 1. A geometric view of the approximations based on Plefka expansions. The point P(s_t) is the marginal distribution of a kinetic Ising model at time t. The submanifold Q_t is a set of tractable distributions, for example a manifold of independent models. The points in A correspond to an m-geodesic, that is, a linear mixture of P(s_t) and Q*(s_t) on Q_t, where for independent Q_t all points on A share the same mean values m_t. Geometrically, A constitutes the m-projection from P(s_t) to Q_t, defining Q*(s_t) as the closest point in the submanifold Q_t to the point P(s_t) [47]. The Plefka expansion is defined by expanding an α-dependent distribution P_α(s_t) that satisfies P_{α=0}(s_t) = Q*(s_t) and P_{α=1}(s_t) = P(s_t).

At α = 0, P_{α=0}(s_t | s_{t−1}) = Q(s_t | Θ_t), and α = 1 leads to P_{α=1}(s_t | s_{t−1}) = P(s_t | s_{t−1}). Using this alternative conditional distribution P_α(s_{i,t} | s_{t−1}), we construct an approximate marginal distribution P_α(s_t). Consequently, expectation values with respect to P_α(s_t) are functions of α. We thus write the statistics of the approximate system as m_t(α), C_t(α), and D_t(α).

The Plefka expansions of these statistics are defined as the Taylor series expansions of these functions around α = 0. In the case of the mean activation rate, the expansion up to the n-th order leads to:

  m_t(α) = m_t(α=0) + ∑_{k=1}^{n} (α^k / k!) ∂^k m_t(α=0)/∂α^k + O(α^{n+1}),    (12)

where O(α^{n+1}) stands for the residual error of the approximation of order n+1 and higher. For the n-th order approximation, we neglect the residual terms as O(α^{n+1})|_{α=1} ≈ 0. Note that all coefficients of the expansion are functions of Θ_t. The mean-field approximation is computed by setting α = 1 and finding the value of Θ*_t that satisfies Eq. 12. Note that since the original marginal distribution is recovered at α = 1, the equality of Eq. 9 holds: m_t(α=1) = m_t(α=0). Then, we have

  ∑_{k=1}^{n} (1/k!) ∂^k m_t(α=0)/∂α^k = 0,    (13)

which should be solved with respect to the parameters Θ_t. Since we neglected the terms higher than the n-th order, the solution may not lead to the exact projection, Q(s_t | Θ*_t). In this study, we investigate the first (n = 1) and second (n = 2) order approximations. Moreover, we can apply the same expansion to approximate the correlations C_t and D_t, using Eq. 10.

What is the difference between this approach and other mean-field methods? Conventionally, naive mean-field approximations are obtained by minimizing D(Q‖P) as opposed to D(P‖Q) (Eq. 8) [36, 49]. This approach is typically used in variational inference to construct a tractable approximate posterior in machine learning problems. Following the Bogolyubov inequality, minimizing this divergence is equivalent to minimizing the variational free energy. Geometrically, it comprises an e-projection of P(s_t | H, J) to the sub-manifold Q_t, which does not result in Q(s_t | Θ*_t). Namely, minimizing D(Q‖P), as well as minimization of other α-divergences except for D(P‖Q), introduces a bias in the estimation of the mean-field approximation [36, 37]. In contrast, if we consider the m-projection point that minimizes D(P‖Q), we can approximate the exact value of m_t using Eq. 12 up to an arbitrary order.

In the subsequent sections we show that different approximations of the marginal distribution P(s_t) in Eq. 3 can be constructed by replacing P(s_{i,τ} | s_{τ−1}) with P_α(s_{i,τ} | s_{τ−1}) for different pairs i, τ (here we will explore the cases of τ = t and τ = t − k, for marginal distributions P(s_{t−k+1}, ..., s_t)).
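To make the construction concrete, the α-dependent marginal defined by Eqs. (10)-(11) can be built exactly by enumeration for a small system and differentiated numerically around α = 0; a sketch assuming N = 3, hypothetical parameter values, and a uniform P(s_{t−1}) for simplicity:

```python
import itertools
import numpy as np

N = 3
rng = np.random.default_rng(1)
H = rng.normal(0, 0.2, N)
J = rng.normal(0, 0.5, (N, N))
theta = rng.normal(0, 0.3, N)          # reference parameters Theta_t (hypothetical)
states = np.array(list(itertools.product([-1, 1], repeat=N)))
P_prev = np.full(len(states), 1 / len(states))   # uniform P(s_{t-1}) for simplicity

def m_alpha(alpha):
    """m_t(alpha): mean of s_t under P_alpha(s_t), i.e. Eqs. (10)-(11)
    marginalized over s_{t-1} (spins are conditionally independent)."""
    m = np.zeros(N)
    for s_prev, w in zip(states, P_prev):
        h = (1 - alpha) * theta + alpha * (H + J @ s_prev)
        m += w * np.tanh(h)            # E[s_{i,t} | s_{t-1}] = tanh h_{i,t}(alpha)
    return m

m0 = m_alpha(0.0)                      # independent model: m_t(0) = tanh(Theta_t)
eps = 1e-6
dm = (m_alpha(eps) - m0) / eps         # numerical first derivative at alpha = 0
m_first_order = m0 + dm                # truncated series of Eq. (12) at alpha = 1, n = 1
```

Here m_alpha(1.0) is the exact mean of the target marginal, so the mean-field condition (Eq. 13) amounts to choosing theta such that the truncated correction dm vanishes.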
In addition, we are not restricted to manifolds of independent models. The independent model is adopted as a reference model to approximate the average activation rate, but one can also approximate correlations using this method. In this vein, we can extend our framework to use reference manifolds Q_{t−k+1:t} (of models Q(s_{t−k+1}, ..., s_t)) that include interactions, e.g., pairwise couplings between elements at two different time points, to more accurately approximate the delayed correlations (see Supplementary Note 1). By systematically defining these reference distributions, we will provide a unified approach to derive Plefka approximations of m_t, C_t, and D_t, including the one that utilizes a pairwise structure.

Plefka[t−1, t]: Expansion around independent models at times t−1 and t

Before elaborating different mean-field approximations, we demonstrate our method by deriving the known results of the classical nMF and TAP approximations for the kinetic Ising model [38, 39]. In order to derive these classical mean-field equations, we make a Plefka expansion around the points Θ*_t and Θ*_{t−1} that are respectively obtained by orthogonal projection to the independent manifolds Q_t and Q_{t−1}, computed as in Eq. 9. Note that assuming an approximation where previous distributions (e.g., at t−2, t−3, ...) are also independent yields exactly the same result. In this way, we derive the nMF and TAP equations of a model defined by a marginal probability distribution P^{[t−1:t]}_α. Using Eqs. (3) and (10), we write

  P^{[t−1:t]}_α(s_t) = ∑_{s_{t−1}, s_{t−2}} P_α(s_t | s_{t−1}) P_α(s_{t−1} | s_{t−2}) P(s_{t−2}),    (14)

where P^{[t−1:t]}_{α=0}(s_t) = Q(s_t) and the original distribution is recovered for P^{[t−1:t]}_{α=1}(s_t) = P(s_t).

Following Eq. 13, for the first order approximation we have ∂m_{i,t}(α=0)/∂α = 0, where the derivative of the first order moment is

  ∂m_{i,t}(α=0)/∂α = (1 − m_{i,t}²)(−Θ_{i,t} + H_i + ∑_j J_{ij} m_{j,t−1}).    (15)

By solving the equation, we find Θ*_{i,t} ≈ H_i + ∑_j J_{ij} m_{j,t−1}, which leads to the naive mean-field approximation:

  m_{i,t} ≈ tanh[H_i + ∑_j J_{ij} m_{j,t−1}].    (16)

We apply the same expansion to approximate the correlations, expanding C_{ik,t}(α) and D_{il,t}(α) around α = 0 up to the first order using Θ_{i,t} = Θ*_{i,t}. Then we obtain

  C_{ik,t} ≈ 0,  i ≠ k,    (17)
  D_{il,t} ≈ J_{il} (1 − m_{i,t}²)(1 − m_{l,t−1}²).    (18)

Detailed calculations are presented in Supplementary Note 2.

To obtain the second order approximation, we need to solve ∂m_{i,t}(α=0)/∂α + (1/2) ∂²m_{i,t}(α=0)/∂α² = 0 from Eq. 13. Here the second order derivative is given as

  ∂²m_{i,t}(α=0)/∂α² ≈ −2 m_{i,t} (1 − m_{i,t}²) ∑_j J_{ij}² (1 − m_{j,t−1}²),    (19)

where terms of order higher than quadratic were neglected (see Supplementary Note 2 for further details). From these equations, we find Θ*_{i,t} ≈ H_i + ∑_j J_{ij} m_{j,t−1} − m_{i,t} ∑_j J_{ij}² (1 − m_{j,t−1}²), leading to the TAP equation:

  m_{i,t} ≈ tanh[H_i + ∑_j J_{ij} m_{j,t−1} − m_{i,t} ∑_j J_{ij}² (1 − m_{j,t−1}²)].    (20)

Having Θ*_{i,t}, we can incorporate TAP approximations of the correlations by expanding C_{ik,t}(α) and D_{il,t}(α) (see Supplementary Note 2 for details) as:

  C_{ik,t} ≈ (1 − m_{i,t}²)(1 − m_{k,t}²) ∑_j J_{ij} J_{kj} (1 − m_{j,t−1}²),  i ≠ k,    (21)
  D_{il,t} ≈ J_{il} (1 − m_{i,t}²)(1 − m_{l,t−1}²)(1 + 2 J_{il} m_{i,t} m_{l,t−1}).    (22)

In these approximations, Eqs. 16 and 20 of activation rates m_t correspond to the classical nMF and TAP equations of the kinetic Ising model [38, 39]. The mean-field equations for the equal-time and delayed correlations (Eqs. 17, 18 and 21, 22) are novel contributions from applying the Plefka expansion to correlations.

We note that, using the equations above, we can compute the approximate statistical properties of the system at t (m_t, C_t, D_t) from m_{t−1}.
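As a sketch, the nMF update (Eq. 16) and the TAP update (Eq. 20) can be iterated forward in time from an initial m_0; since m_{i,t} appears on both sides of Eq. 20, the TAP step is solved here by damped fixed-point iteration (network size, coupling scale, damping, and iteration counts are hypothetical choices):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 50
H = rng.normal(0, 0.1, N)
J = rng.normal(0, 1 / np.sqrt(N), (N, N))    # asymmetric couplings (hypothetical)

def nmf_step(m_prev):
    """Naive mean-field update, Eq. (16)."""
    return np.tanh(H + J @ m_prev)

def tap_step(m_prev, n_iter=200, damp=0.5):
    """TAP update, Eq. (20), solved self-consistently for m_t."""
    g = H + J @ m_prev                       # nMF effective field
    V = (J ** 2) @ (1 - m_prev ** 2)         # Onsager-correction term
    m = np.tanh(g)                           # start from the nMF solution
    for _ in range(n_iter):
        m = (1 - damp) * m + damp * np.tanh(g - m * V)
    return m

m = np.zeros(N)                              # initial state m_0
for t in range(20):
    m_prev, m = m, tap_step(m)
```

Replacing tap_step by nmf_step gives the first-order recursion; the damping simply stabilizes the inner fixed-point loop and does not change its fixed points.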
Therefore, the system evolution is described by recursively computing m_t from an initial state m_0 (for both transient and stationary dynamics), although approximation errors accumulate over the iterations. After we introduce a unified view of mean-field approximations in the subsequent sections, we will numerically examine approximation errors of these various methods in predicting statistical structure of the system.

Generalization of mean-field approximations
In the previous section, we described a Plefka approximation that uses a model containing independent units at times t−1 and t to construct a marginal probability distribution P^{[t−1:t]}_α(s_t). This is, however, not the only possible choice of approximation. As we mentioned above, other approximations have been introduced in the literature. In [40], expressions are provided for the non-stationary delayed correlations D_t as a function of C_{t−1}. In [41], an approximation is derived by assuming that units at state s_{t−1} are independent while correlations of s_t are preserved.

In the following sections, we show that various approximation methods, including those mentioned above, can be unified as Plefka expansions. Each method of the approximation corresponds to a specific choice of the sub-manifold Q_t at each time step. Fig. 2 shows the corresponding sub-manifolds Q_{t−1:t} of possible approximations, where gray lines represent interactions that are affected by α in the Plefka expansion.

FIG. 2. Unified mean-field framework. Original model (A) and family of generalized Plefka expansions (B-E). Gray lines represent connections that are proportional to α and thus removed in the approximated model to perform the Plefka expansions, while solid black lines are conserved and dashed lines are free parameters. Plefka[t−1, t] (B) retrieves the classical mean-field equations [38, 39]. Plefka[t] (C) results in a novel method which preserves correlations of the system at t−1, incorporating equations similar to [40]. Plefka[t−1] (D) assumes independent activity at t−1, and in its first order approximation reproduces the results in [41]. Plefka2[t] (E) represents a novel pairwise approximation which performs better in estimating correlations.

The mean-field approximations in the previous section were obtained by using the model represented in Fig. 2B, where the couplings at times t−1 and t are affected by α. Below, we present systematic applications of the Plefka expansions around other reference models in order to approximate the original distribution (Fig. 2C-E). By doing so, we not only unify the previously reported mean-field approximations but also provide novel solutions that can provide more precise approximations than known methods.

Plefka[t]: Expansion around an independent model at time t

For the Plefka[t−1, t] approximation, explained above, the system becomes independent for α = 0 at t as well as t−1. This leads to approximations of m_t, C_t, D_t being specified by m_{t−1}, while being independent of C_{t−1} and D_{t−1}. In [40], the authors describe a mean-field approximation by performing a new expansion over the classical nMF and TAP equations that takes into account previous correlations C_{t−1}. Here, our framework allows us to obtain similar results by considering only a Plefka expansion over manifold Q_t while assuming that we know the properties of P(s_{t−1}) (Fig. 2C). Therefore, we denote this approximation as P^{[t]}_α and consider

  P^{[t]}_α(s_t) = ∑_{s_{t−1}} P_α(s_t | s_{t−1}) P(s_{t−1}).    (23)

In Supplementary Note 3 we derive the equations for this approximation. For the first order, we obtain

  m_{i,t} ≈ tanh[H_i + ∑_j J_{ij} m_{j,t−1}],    (24)
  C_{ik,t} ≈ 0,  i ≠ k,    (25)
  D_{il,t} ≈ (1 − m_{i,t}²) ∑_j J_{ij} C_{jl,t−1}.    (26)

Note that Eqs. 24 and 25 are the same as the nMF Plefka[t−1, t] equations. Eq. 26 includes C_{t−1}, being exactly the same result obtained in [40, Eq. 4]. The second order approximation leads to:

  m_{i,t} ≈ tanh[H_i + ∑_j J_{ij} m_{j,t−1} − m_{i,t} ∑_{jl} J_{ij} J_{il} C_{jl,t−1}],    (27)
  C_{ik,t} ≈ (1 − m_{i,t}²)(1 − m_{k,t}²) ∑_{jl} J_{ij} J_{kl} C_{jl,t−1},  i ≠ k,    (28)
  D_{il,t} ≈ (1 − m_{i,t}²) ∑_j J_{ij} C_{jl,t−1} (1 + 2 J_{il} m_{i,t} m_{l,t−1}).    (29)

All update rules include the effect of C_{t−1}. We can see that if we use the covariance matrix of the independent model at t−1, we recover the results of P^{[t−1:t]}_α in the previous section. In contrast with [40], we provide a novel approximation method that depends on previous correlations using a single expansion (instead of two subsequent expansions), and additionally present approximated equal-time correlations.

Plefka[t−1]: Expansion around an independent model at time t−1

In [41], a mean-field method is proposed by approximating the effective field h_t as the sum of a large number of independent spins, approximated by a Gaussian distribution, yielding exact results for fully asymmetric networks in the thermodynamic limit. In our framework, we describe this approximation as an expansion around the projection point from P(s_{t−1}) to the submanifold Q_{t−1}, using a model where only s_{t−1} are independent (Fig. 2D). In this case (see Supplementary Note 4), the effective field h_t at the submanifold is a sum of independent terms, which for large N yields a Gaussian distribution. By defining

  P^{[t−1]}_α(s_t, s_{t−1}) = ∑_{s_{t−2}} P(s_t | s_{t−1}) P_α(s_{t−1} | s_{t−2}) P(s_{t−2}),    (30)

we see that now the expansion is defined for the marginal distribution of the path {s_{t−1}, s_t} (see Supplementary Note 1). The first order equations for this method are

  m_{i,t} ≈ ∫ Dx tanh[H_i + ∑_j J_{ij} m_{j,t−1} + x √Δ_{i,t}],    (31)
  C_{ik,t} ≈ ∫ D^{ρ_ik}_{xy} tanh[H_i + ∑_j J_{ij} m_{j,t−1} + x √Δ_{i,t}] · tanh[H_k + ∑_l J_{kl} m_{l,t−1} + y √Δ_{k,t}] − m_{i,t} m_{k,t},  i ≠ k,    (32)
  D_{il,t} ≈ ∑_j J_{ij} C_{jl,t−1} ∫ Dx (1 − tanh²[H_i + ∑_j J_{ij} m_{j,t−1} + x √Δ_{i,t}]).    (33)

Here we use Dx = dx/√(2π) exp(−x²/2), D^{ρ_ik}_{xy} = dx dy / (2π √(1 − ρ_ik²)) exp(−[(x² + y²)/2 − ρ_ik x y] / (1 − ρ_ik²)), Δ_{i,t} = ∑_j J_{ij}² (1 − m_{j,t−1}²), and ρ_ik = ∑_j J_{ij} J_{kj} (1 − m_{j,t−1}²) / √(Δ_{i,t} Δ_{k,t}). Derivations are described in Supplementary Note 4. These results are exactly the same as those presented for m_t, D_t in [41], adding an additional expression for C_t. For this approximation, we do not consider the second order equations since they are computationally much more expensive than the other approximations.

Plefka2[t]: Expansion around a pairwise model

The proposed framework is also a powerful tool to develop novel Plefka expansions. To make the expansions more accurately approximate target statistics, we can consider a reference manifold composed of multiple time steps while maintaining some of the parameters in the system (see Supplementary Note 1). Motivated by this idea, here we propose new methods that directly approximate pairwise activities of the units by choosing a reference manifold that preserves a coupling term.

Let us first consider the joint probability of any arbitrary pair of units at times t−1 and t to compute the delayed correlations (Fig. 2E, left). Namely, we consider the joint probability of spins s_{i,t} and s_{l,t−1}:

  P(s_{i,t}, s_{l,t−1}) = ∑_{s_{\l,t−1}, s_{t−2}} P(s_{i,t} | s_{t−1}) P(s_{t−1} | s_{t−2}) P(s_{t−2}),    (34)

with s_{\l,t−1} containing all elements of s_{t−1} except s_{l,t−1}. As a reference manifold Q_{t−1:t}, we consider the dependency among only the units i and l:

  Q(s_{i,t}, s_{l,t−1}) = Q(s_{i,t} | s_{l,t−1}) Q(s_{l,t−1}) = [e^{s_{i,t} θ_{i,t}(s_{l,t−1})} / (2 cosh θ_{i,t}(s_{l,t−1}))] · [e^{s_{l,t−1} Θ_{l,t−1}} / (2 cosh Θ_{l,t−1})],    (35)

where θ_{i,t}(s_{l,t−1}) = Θ_{i,t} + Δ_{il,t} s_{l,t−1}.
Orthogonal projection to Q_{t−1:t} is equivalent to minimizing the KL divergence D(P‖Q) with respect to the parameters:

  ∂D(P‖Q)/∂Θ_{i,t} |_{θ_t = θ*_t, Θ_{t−1} = Θ*_{t−1}} = m^{Q*}_{i,t} − m^P_{i,t} = 0,    (36)
  ∂D(P‖Q)/∂Θ_{l,t−1} |_{θ_t = θ*_t, Θ_{t−1} = Θ*_{t−1}} = m^{Q*}_{l,t−1} − m^P_{l,t−1} = 0,    (37)
  ∂D(P‖Q)/∂Δ_{il,t} |_{θ_t = θ*_t, Θ_{t−1} = Θ*_{t−1}} = ⟨s_{i,t} s_{l,t−1}⟩_{(Q*·P)} − ⟨s_{i,t} s_{l,t−1}⟩_P = 0,    (38)

with

  ⟨x⟩_{(Q*·P)} = ∑_{s_{i,t}, s_{l,t−1}} x Q(s_{i,t} | s_{l,t−1}, θ*_t, Θ*_{t−1}) P(s_{l,t−1}).    (39)

As in the previous approximations, P(s_{i,t}, s_{l,t−1}) is connected to Q(s_{i,t}, s_{l,t−1} | θ*_t, Θ*_{t−1}) through an α-dependent probability

  P^{2[t]}_α(s_{i,t}, s_{l,t−1}) = ∑_{s_{\l,t−1}, s_{t−2}} P_α(s_{i,t} | s_{t−1}) P_α(s_{l,t−1} | s_{t−2}) P(s_{\l,t−1} | s_{t−2}) P(s_{t−2}),    (40)

with conditional probabilities given by

  P_α(s_{i,t} | s_{t−1}) = e^{s_{i,t} h_{i,t}(α)} / (2 cosh h_{i,t}(α)),  h_{i,t}(α) = (1 − α) θ_{i,t}(s_{l,t−1}) + α (H_i + ∑_j J_{ij} s_{j,t−1}),    (41)
  P_α(s_{l,t−1} | s_{t−2}) = e^{s_{l,t−1} h_{l,t−1}(α)} / (2 cosh h_{l,t−1}(α)),  h_{l,t−1}(α) = (1 − α) Θ_{l,t−1} + α (H_l + ∑_n J_{ln} s_{n,t−2}).    (42)

As in the cases above, we can calculate the equations for the first and second order approximations (see Supplementary Note 5).
Here, for the second order approximation (which is more accurate than the first order) we have:

\theta^*_{i,t}(s_{l,t-1}) \approx H_i + \sum_j J_{ij} m_{j,t-1} + J_{il} (s_{l,t-1} - m_{l,t-1}) + \Big( \sum_{j \neq l, n} J_{ij} J_{ln} D_{jn,t-1} \Big) (s_{l,t-1} - m_{l,t-1}) - \tanh \theta^*_{i,t}(s_{l,t-1}) \sum_{jn \neq l} J_{ij} J_{in} C_{jn,t-1},   (43)

\Theta^*_{l,t-1} \approx H_l + \sum_n J_{ln} m_{n,t-2} - m_{l,t-1} \sum_{mn} J_{lm} J_{ln} C_{mn,t-2},   (44)

which directly leads to calculation of means and delayed correlations as:

m_{i,t} \approx \sum_{s_{l,t-1}} \tanh \theta^*_{i,t}(s_{l,t-1}) \, Q^*(s_{l,t-1}),   (45)

m_{l,t-1} \approx \tanh \Theta^*_{l,t-1},   (46)

D_{il,t} \approx \sum_{s_{l,t-1}} \tanh \theta^*_{i,t}(s_{l,t-1}) \, s_{l,t-1} Q^*(s_{l,t-1}) - m_{i,t} m_{l,t-1}.   (47)

These results are related to previous work [43] that included autocorrelations as one of the constraints to derive the Plefka approximation. Instead, here we provide a Plefka approximation that includes delayed correlations between any pair of units.

To compute the above approximations, we need to know C_{t-1} and C_{t-2}. Here, we provide similar pairwise Plefka approximations for the pairwise distribution at time t, P(s_{i,t}, s_{k,t}). Since s_{i,t}, s_{k,t} are conditionally independent given s_{t-1}, we can construct a model in which first s_{k,t} is computed from s_{t-1}, and then s_{i,t} is computed conditioned on s_{k,t}, s_{t-1} (Fig. 2E, right):

Q(s_{i,t}, s_{k,t}) = Q(s_{i,t} | s_{k,t}) Q(s_{k,t}),   (48)

P^{[t]}_\alpha(s_{i,t}, s_{k,t}) = \sum_{s_{t-1}} P_\alpha(s_{i,t} | s_{k,t}, s_{t-1})   (49)
\cdot P_\alpha(s_{k,t} | s_{t-1}) P(s_{t-1}),   (50)

with conditional probabilities given by

P_\alpha(s_{i,t} | s_{k,t}, s_{t-1}) = \frac{e^{s_{i,t} h_{i,t}(\alpha)}}{2 \cosh h_{i,t}(\alpha)},   (51)
h_{i,t}(\alpha) = (1 - \alpha)\, \theta_{i,t}(s_{k,t}) + \alpha \big( H_i + \sum_j J_{ij} s_{j,t-1} \big),

P_\alpha(s_{k,t} | s_{t-1}) = \frac{e^{s_{k,t} h_{k,t}(\alpha)}}{2 \cosh h_{k,t}(\alpha)},   (52)
h_{k,t}(\alpha) = (1 - \alpha)\, \Theta_{k,t} + \alpha \big( H_k + \sum_l J_{kl} s_{l,t-1} \big).

Note that \theta_{i,t} is a function of s_{k,t} that accounts for equal-time correlations between s_{i,t} and s_{k,t}.
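As a rough numerical sketch (not the paper's reference implementation), note that Eq. 43 is implicit in θ*, so for each value s_{l,t-1} = ±1 it can be solved by a plain fixed-point iteration before evaluating Eqs. 45 and 47. All names are illustrative, plain fixed-point iteration is one of several possible solvers, and the previous-step statistics are taken as given; here Q*(s_{l,t-1}) is parameterized directly by the known mean m_{l,t-1}:

```python
import numpy as np

def plefka2_delayed(i, l, H, J, m_prev, C_prev, D_prev, n_iter=50):
    """Sketch of the second-order Plefka2[t] update (Eqs. 43, 45, 47)
    for one pair (i, l), assuming the previous-step means m_prev,
    equal-time correlations C_prev and delayed correlations D_prev
    are known."""
    N = len(m_prev)
    theta = {}
    for s_l in (-1, 1):
        # terms of Eq. 43 that do not involve theta* itself
        a = H[i] + J[i] @ m_prev + J[i, l] * (s_l - m_prev[l])
        a += sum(J[i, j] * J[l, n] * D_prev[j, n]
                 for j in range(N) if j != l
                 for n in range(N)) * (s_l - m_prev[l])
        # coefficient of the tanh(theta*) correction term
        b = sum(J[i, j] * J[i, n] * C_prev[j, n]
                for j in range(N) for n in range(N)
                if j != l and n != l)
        x = a
        for _ in range(n_iter):       # fixed-point iteration of Eq. 43
            x = a - np.tanh(x) * b
        theta[s_l] = x
    Q = {s_l: (1 + s_l * m_prev[l]) / 2 for s_l in (-1, 1)}  # Q*(s_l)
    m_i = sum(np.tanh(theta[s]) * Q[s] for s in (-1, 1))      # Eq. 45
    D_il = (sum(np.tanh(theta[s]) * s * Q[s] for s in (-1, 1))
            - m_i * m_prev[l])                                # Eq. 47
    return m_i, D_il

# illustrative small example with weak random couplings
rng = np.random.default_rng(1)
N = 4
H = 0.1 * rng.normal(size=N)
J = 0.3 * rng.normal(size=(N, N)) / np.sqrt(N)
m_prev = np.tanh(rng.normal(size=N))
C_prev = 0.01 * rng.normal(size=(N, N))
D_prev = 0.01 * rng.normal(size=(N, N))
m_i, D_il = plefka2_delayed(0, 1, H, J, m_prev, C_prev, D_prev)
```

Because m_{i,t} is a convex combination of tanh values, the sketch automatically keeps the approximated mean within [-1, 1], whatever the couplings.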
Computed similarly to the delayed correlations, the second order approximation yields (see Supplementary Note 5):

\theta^*_{i,t}(s_{k,t}) \approx H_i + \sum_j J_{ij} m_{j,t-1} + \Big( \sum_{jl} J_{ij} J_{kl} C_{jl,t-1} \Big) (s_{k,t} - m_{k,t}) - \tanh \theta^*_{i,t}(s_{k,t}) \sum_{jl} J_{ij} J_{il} C_{jl,t-1},   (53)

\Theta^*_{k,t} \approx H_k + \sum_l J_{kl} m_{l,t-1} - m_{k,t} \sum_{jl} J_{kj} J_{kl} C_{jl,t-1}.   (54)

Using these equations, approximate equal-time correlations are given as

C_{ik,t} \approx \sum_{s_{k,t}} \tanh \theta^*_{i,t}(s_{k,t}) \, s_{k,t} Q^*(s_{k,t}) - m_{i,t} m_{k,t}.   (55)

Note that the approximation of equal-time correlations may not be symmetric for C_{ik,t} and C_{ki,t}. In the results of this paper we use the average of the two.

Comparison of the different approximations
In the subsequent sections, we compare the family of Plefka approximation methods described above by testing their performance in the forward and inverse Ising problems. More specifically, we compare the second order approximations of Plefka[t-1,t] and Plefka[t], the first order approximation of Plefka[t-1], and the second order approximation of the proposed Plefka2[t]. We define an Ising model as an asymmetric version of the kinetic Sherrington-Kirkpatrick (SK) model, setting its parameters around the equivalent of a ferromagnetic phase transition in the equilibrium SK model. External fields H_i are sampled from independent uniform distributions U(-\beta H_0, \beta H_0), H_0 = 0.5, whereas coupling terms J_{ij} are sampled from independent Gaussian distributions N(\beta J_0 / N, \beta^2 J_\sigma^2 / N), J_0 = 1, J_\sigma = 0.1, where \beta is a scaling parameter (i.e., an inverse temperature).

FIG. 3. Forward Ising problem. Top: Evolution of average activation rates (magnetizations) (A), equal-time correlations (B) and delayed correlations (C) found by different mean-field methods for \beta = \beta_c. Middle: Comparison of the activation rates (D), equal-time correlations (E) and delayed correlations (F) found by the different Plefka approximations (ordinate, p superscript) with the original values (abscissa, o superscript) for \beta = \beta_c and t = 128. Black lines represent the identity line. Bottom: Average squared error of the magnetizations \epsilon_m = \langle\langle (m^o_{i,t} - m^p_{i,t})^2 \rangle_i \rangle_t (G), equal-time correlations \epsilon_C = \langle\langle (C^o_{ik,t} - C^p_{ik,t})^2 \rangle_{ik} \rangle_t (H), and delayed correlations \epsilon_D = \langle\langle (D^o_{il,t} - D^p_{il,t})^2 \rangle_{il} \rangle_t (I) for 21 values of \beta in a range around \beta_c.

Generally, mean-field methods are suitable for approximating properties of systems with small fluctuations. However, there is evidence that many biological systems operate in critical, highly fluctuating regimes [5, 6]. In order to examine the different approximations in such a biologically plausible yet challenging situation, we select the model parameters around a phase transition point displaying large fluctuations.

To find such conditions, we employed path integral methods to solve the asymmetric SK model (Supplementary Note 6). We find that, for our choice of parameters, the stationary solution of the asymmetric model displays a non-equilibrium analogue of a critical point for a ferromagnetic phase transition at an inverse temperature \beta_c; the nonzero field H shifts the phase transition point away from the value \beta = 1 obtained at H = 0. By simulation of the finite size system, we confirmed that the maximum fluctuations in the model are found near the theoretical \beta_c, which shows maximal covariance values (see Supplementary Note 6, Supplementary Fig. 2).

Fluctuations of a system are generally expected to be maximized at a critical phase transition [19]. In addition, entropy production (a signature of time irreversibility) has been suggested as an indicator of phase transitions. For example, it presents a peak at the transition point of a continuous phase transition in a non-equilibrium Curie-Weiss Ising model with an oscillatory field [50] and in some instances of mean-field majority vote models [51, 52]. We found that the entropy production of the kinetic Ising system is also maximized around \beta_c (discussed later, see also Methods for its derivation).

Forward Ising problem
We examine the performance of the different Plefka expansions in predicting the evolution of an asymmetric SK model of size N = 512 with random H and J. To study the non-stationary transient dynamics of the model, we start from s_0 = 1 (all elements set to 1 at t = 0) and recursively update the state for T = 128 steps. We repeated this stochastic simulation for R trials for 21 values of \beta in a range around \beta_c, except for the reconstruction of the phase transition, where we used 201 values of \beta in the same range. Using the R samples, we computed the statistical moments and cumulants of the system, m_t, C_t, and D_t, at each time step. We then computed their averages over the system units, i.e., \langle m_{i,t} \rangle_i, \langle C_{ik,t} \rangle_{ik} and \langle D_{il,t} \rangle_{il}, where the angle bracket denotes an average over the indices in its subscript.

The black solid lines in Fig. 3A,B,C display the non-stationary dynamics of these averaged statistics from t = 0 to t = 128 for \beta = \beta_c. In comparison, the colored lines display these statistics predicted by the family of Plefka approximations, recursively computed using the obtained equations and starting from the initial state m_0 = 1, C_0 = 0 and D_0 = 0. We observe that although the recursive application of all the approximation methods provides good predictions for the transient dynamics of the mean activation rates m_t until its convergence (Fig. 3A), the predictions using Plefka[t] and especially the proposed Plefka2[t] approximations are closer to the true dynamics than the others. The evolution of the mean equal-time and time-delayed correlations C_t, D_t is precisely captured only by our new method Plefka2[t]. In contrast, Plefka[t] overestimates correlations while Plefka[t-1] and Plefka[t-1,t] underestimate them.

The performance of the methods in predicting individual activation rates and correlations is displayed in Fig. 3D,E,F by comparing the vectors m_t, C_t and D_t at the last time step (t = 128) of the original model (o superscript) and those of the Plefka approximations (p superscript). For the activation rates m_t, the proposed Plefka2[t] and Plefka[t] perform slightly better than the others (see also Fig. 3A). While being overestimated by Plefka[t], underestimated moderately by Plefka[t-1] and significantly by Plefka[t-1,t], equal-time and time-delayed correlations C_t, D_t are best predicted by Plefka2[t] (Fig. 3E,F).

The above results are obtained at the critical point \beta = \beta_c, intuitively the most challenging regime for mean-field approximations. In order to further show that our novel approximation Plefka2[t] systematically outperforms the others in a wider parameter range, we repeated the analysis for different inverse temperatures \beta (the same random parameters are applied for all \beta). Fig. 3G,H,I respectively show the average squared errors (averaged over time and units) of the activation rates \epsilon_m, equal-time correlations \epsilon_C and delayed correlations \epsilon_D between the original model and the approximations. Figs. 3G,H,I show that Plefka2[t] outperforms the other methods in computing m_t, C_t, D_t (with the exception of a certain region of \beta > \beta_c in which Plefka[t] is slightly better), consistently yielding a low error for all values of \beta. Errors of these approximations are smaller when the system is away from \beta_c.

Inverse Ising problem
We apply the approximation methods to the inverse Ising problem by using the data generated above for the trajectory of T = 128 steps and R trials to infer the parameters of the model, H and J. The model parameters are estimated by the Boltzmann learning method under the maximum likelihood principle: H and J are updated to minimize the differences between the average rates m_t or delayed correlations D_t of the original data and the model approximations (see Methods). While Boltzmann learning requires computing the likelihood of every point in a trajectory and every trial (RT calculations) at each iteration, we can instead estimate the gradient at each iteration in a one-shot computation by applying the Plefka approximations, which can significantly reduce computational time (Methods). At \beta = \beta_c (Fig. 4A,B), we observe that the classical Plefka[t-1,t] approximation adds significant offset values to the fields H and couplings J. In contrast, Plefka[t], Plefka[t-1] and Plefka2[t] are all precise in estimating the values of H and J.

Figure 4C and D show the mean squared errors \epsilon_H, \epsilon_J for bias terms and couplings between the original model and the inferred values for different \beta. In this case, errors are large in the estimation of J for Plefka[t-1,t]. In comparison, Plefka[t], Plefka[t-1] and Plefka2[t] work equally well even in the high fluctuation regime (\beta \approx \beta_c). Since the inverse Ising problem is solved by applying the approximation a single time step at a time (per iteration), it is not as challenging as the forward problem, which can accumulate errors by recursively applying the approximations. Therefore, the approximations other than the classical mean-field Plefka[t-1,t] perform equally well in this case.

Phase transition reconstruction
We have shown how the different methods perform in computing the behavior of the system (forward problem) and in inferring the parameters of a given network from its activation data (inverse problem). Combining the two, we can ask how well the methods explored here can reconstruct the behavior of a system from data, potentially exploring behaviors under different conditions than the recorded data.

FIG. 4. Inverse Ising problem. Top: Inferred external fields (A) and couplings (B) found by different mean-field models plotted versus the real ones for \beta = \beta_c. Black lines represent the identity lines. Bottom: Average squared error of inferred external fields \epsilon_H = \langle (H^o_i - H^p_i)^2 \rangle_i (C) and couplings \epsilon_J = \langle (J^o_{ij} - J^p_{ij})^2 \rangle_{ij} (D) for 21 values of \beta in a range around \beta_c.

First, in Fig. 5A-C we examine how the different approximation methods approximate fluctuations (equal-time and time-delayed covariances) and the entropy production (see Methods) at t = 128, after solving the forward problem by recursively applying the approximations for the 128 steps. As we mentioned above, the asymmetric SK model explored here presents maximum fluctuations and maximum entropy production around \beta = \beta_c (Supplementary Note 6, Supplementary Fig. 2). However, we see that Plefka[t-1,t] and Plefka[t-1] underestimate C_t and D_t of the original SK model around the transition point. Plefka[t] and Plefka2[t] show much better performance in capturing the behavior of C_t and D_t at the phase transition, although Plefka[t] overestimates both correlations. Additionally, all the methods capture the phase transition in entropy production, though Plefka[t] overestimates its value around \beta_c and Plefka2[t] is more precise than the other methods.

Next, we combine the forward and inverse Ising problems and try to reproduce the transition of the asymmetric SK model in the models inferred from the data. We first obtain the values of H, J by solving the inverse problem from the data sampled at \beta = \beta_c, and next we solve again the forward problem with those estimated parameters re-scaled by a new inverse temperature \tilde{\beta}. The results for the correlations (Fig. 5D,E) show that in this case Plefka[t-1,t] works badly, not being able to capture the transition. Plefka[t-1] shows a similar performance as in the forward problem, and Plefka[t] and Plefka2[t] behave similarly, slightly underestimating fluctuations. When we analyze the entropy production of the system (Fig. 5F), we find that Plefka2[t] exhibits better performance with a high precision, with Plefka[t] and Plefka[t-1] underestimating it, and Plefka[t-1,t] not capturing the phase transition. Overall, the results above suggest that Plefka2[t] is better suited to identify non-equilibrium phase transitions in models reconstructed from experimental data.

DISCUSSION
We have proposed a framework that unifies different mean-field approximations of the evolving statistical properties of non-equilibrium Ising models. This allows us to derive approximations premised on specific assumptions about the correlation structure of the system previously proposed in the literature. Furthermore, using our framework we derive a new approximation (Plefka2[t]) using atypical assumptions for mean-field methods, i.e., the maintenance of pairwise correlations in the system. This new pairwise approximation outperforms existing ones for approximating the behavior of an asymmetric SK model near the non-equilibrium equivalent of a ferromagnetic phase transition (see Supplementary Note 6), where classical mean-field approximations face problems. This shows that the proposed methods are useful tools to analyze large-scale, non-equilibrium dynamics near the critical regimes expected for biological and social systems. However, we note that low-temperature spin phases (e.g. the spin-glass phase in symmetric models) also impose limitations on mean-field approximations [32, 41], which could be further explored with methods like the ones presented here.

FIG. 5. Reconstructing phase transitions of kinetic Ising systems. Top: Average of the Ising model's equal-time correlations (A), delayed correlations (B), and entropy production (shown as an exponential for better presentation of its maximum) (C), at the last step t = 128, found by different mean-field methods. Bottom (D, E, F): The same as above using the network H, J reconstructed by solving the inverse Ising problem at \beta = \beta_c and multiplying the estimated parameters by a fictitious inverse temperature \tilde{\beta}. The stars mark the values of \tilde{\beta} that yield maximum fluctuations or maximum entropy production.

The generality of this framework allows us to picture other approximations with atypical assumptions. For example, the Sessak-Monasson expansion [53] for an equilibrium Ising model assumes a linear relation between \alpha and spin correlations. An equivalent expansion could use an effective field h(\alpha) nonlinearly dependent on \alpha, satisfying the linear relations C_t(\alpha) = \alpha C_t or D_t(\alpha) = \alpha D_t. As another extension, Plefka2[t] could incorporate higher-order interactions. As Eqs. 43 and 53 are each equivalent to two mean-field approximations (one for each value s_{l,t-1} = \pm 1), an extension of Plefka2[t] to marginals over M units would involve 2^{M-1} equations, increasing accuracy but also computational costs. In general, reference models Q(s_t) set coupling parameters of the model to zero at some steps of its dynamics. Other parameters (e.g. fields) are either free parameters fitted by an m-projection from P(s_t), or preserved at their original value (see Supplementary Note 7 for a comparison of the free and fixed parameters of each model). Augmenting accuracy by increasing the number of parameters often involves a computational cost. As a practical guideline for using each method, Supplementary Note 7 compares their precision and computation time in the forward and inverse problems (see also Supplementary Figs. 3, 4).

Aside from its theoretical implications, our unified framework offers analysis tools for diverse data-driven research fields. In neuroscience, it has been popular to study the activity of ensembles of neurons by inferring an equilibrium Ising model with homogeneous (fixed) parameters [23] or inhomogeneous (time-dependent) parameters [25, 54] from empirical data. Extended analyses based on the equilibrium model have reported that neurons operate near a critical regime [5, 6].
However, studies of non-equilibrium dynamics in neural spike trains are scarce [7, 26, 55], partly due to the lack of systematic methods for analysing large-scale non-equilibrium data from neurons exhibiting large fluctuations. The proposed pairwise model Plefka2[t] is suitable for simulating such network activities, being more accurate than previous methods in predicting the network evolution at criticality (Fig. 3) and in testing whether the system is near the maximally fluctuating regime (Fig. 5). In particular, application of our methods for computing entropy production in non-equilibrium systems could provide tools for characterizing the non-equilibrium dynamics of neural systems [56].

In summary, a unified framework of mean-field theories offers a systematic way to construct suitable mean-field methods in accordance with the statistical properties of the systems researchers wish to uncover. This is expected to foster a variety of tools to analyze large-scale non-equilibrium systems in physical, biological, and social domains.

METHODS

Boltzmann learning in the inverse Ising problem
Let S^r_t = \{S^r_{1,t}, S^r_{2,t}, \ldots, S^r_{N,t}\} for t = 1, \ldots, T be observed states of a process described by Eq. 1 at the r-th trial (r = 1, \ldots, R). We also define S_T to represent the processes from all trials. The inverse Ising problem consists in inferring the external fields H and couplings J of the system. These parameters can be estimated by maximizing the log-likelihood l(S_T) of the observed states under the model:

l(S_T) = \log \prod_{t=1}^T \prod_{r=1}^R P(S^r_t | S^r_{t-1}) = \sum_t \sum_r \sum_i \big( S^r_{i,t} h^r_{i,t} - \log 2 \cosh h^r_{i,t} \big),   (56)

with h^r_{i,t} = H_i + \sum_j J_{ij} S^r_{j,t-1}. The learning steps are obtained as:

\frac{\partial l(S_T)}{\partial H_i} = RT \big( \langle S^r_{i,t} \rangle_{r,t} - \langle \tanh h^r_{i,t} \rangle_{r,t} \big),   (57)

\frac{\partial l(S_T)}{\partial J_{il}} = RT \big( \langle S^r_{i,t} S^r_{l,t-1} \rangle_{r,t} - \langle \tanh h^r_{i,t} \, S^r_{l,t-1} \rangle_{r,t} \big),   (58)

where \langle \cdot \rangle_{r,t} denotes an average over trials and time steps. We solve the inverse Ising problem by applying these equations as a gradient ascent rule adjusting H and J. The second terms of Eqs. 57, 58 need to be computed at every iteration, thus the computational cost grows linearly with R \times T. However, the use of mean-field approximations can significantly reduce the cost when a large number of samples R and time bins T are used to correctly estimate activation rates and correlations in large networks. Here the second terms can be written as

\langle \tanh h^r_{i,t} \rangle_{r,t} = \sum_{s, \tilde{s}} s_i P(s | \tilde{s}) P(\tilde{s}) = m_i,   (59)

\langle \tanh h^r_{i,t} \, S^r_{l,t-1} \rangle_{r,t} = \sum_{s, \tilde{s}} s_i \tilde{s}_l P(s | \tilde{s}) P(\tilde{s}) = D_{il} + m_i \tilde{m}_l,   (60)

where P(\tilde{s}) = \frac{1}{RT} \sum_{r,t} \delta(\tilde{s}, S^r_t) is the empirical distribution averaged over trials and trajectories (with \delta being a Kronecker delta) and \tilde{m}_l is the average activation rate computed from the empirical distribution. P(s | \tilde{s}) is defined as in Eq. 1.
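Before applying the mean-field shortcut, the exact gradients of Eqs. 57-58 can be evaluated directly from data. The sketch below uses illustrative names, assumes trial data of shape (R, T+1, N), and returns the gradients normalized by RT (i.e., Eqs. 57-58 without the leading factor):

```python
import numpy as np

def boltzmann_gradients(S, H, J):
    """Per-sample log-likelihood gradients of Eqs. 57-58 for data S
    with shape (R, T+1, N): R trials, each with T transitions."""
    R, Tp1, N = S.shape
    T = Tp1 - 1
    S_prev, S_next = S[:, :-1, :], S[:, 1:, :]
    h = H + S_prev @ J.T            # h[r, t, i] = H_i + sum_j J_ij S_{j,t-1}
    th = np.tanh(h)
    gH = S_next.mean(axis=(0, 1)) - th.mean(axis=(0, 1))
    gJ = (np.einsum('rti,rtl->il', S_next, S_prev)
          - np.einsum('rti,rtl->il', th, S_prev)) / (R * T)
    return gH, gJ

# illustrative example on small random data
rng = np.random.default_rng(0)
R, T, N = 3, 5, 4
S = rng.choice([-1, 1], size=(R, T + 1, N))
gH, gJ = boltzmann_gradients(S, np.zeros(N), np.zeros((N, N)))
```

The mean-field acceleration described in the text replaces the tanh averages in this computation with m_i and D_il obtained from a single application of the Plefka equations to the empirical distribution.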
We then approximate m_i and D_{il} using the mean-field equations. Note that when we apply the mean-field equations, we replace all statistics related to the previous step with those computed from the empirical distribution. By applying the mean-field methods, we reduce the computation over R trials of trajectories of length T to a single computation (instead of RT calculations). In our numerical tests, gradient ascent was executed using learning coefficients \eta_H \propto 1/(RT) and \eta_J = 1/(RT \sqrt{N}), starting from H = 0, J = 0.

Entropy production of the kinetic Ising model
The entropy production is defined as the KL divergence between the forward and backward paths, quantifying the irreversibility of the system [17, 55, 57]:

\langle \sigma_t \rangle = \frac{1}{2} \sum_{s_t, s_{t-1}} \big( P(s_{t-1}) P(s_t | s_{t-1}) - P(s_t) P_B(s_{t-1} | s_t) \big) \log \frac{P(s_t | s_{t-1}) P(s_{t-1})}{P_B(s_{t-1} | s_t) P(s_t)},   (61)

where P_B(s_{t-1} | s_t) is the probability of the backward trajectory, defined as in Eq. 1 but switching s_t and s_{t-1}. Assuming a non-equilibrium steady state, where P(s_t) = P(s_{t-1}), the entropy production of the kinetic Ising system is computed as:

\langle \sigma_t \rangle = \sum_{s_t, s_{t-1}} P(s_t, s_{t-1}) \Big( \sum_i H_i (s_{i,t} - s_{i,t-1}) + \sum_{ij} J_{ij} (s_{i,t} s_{j,t-1} - s_{i,t-1} s_{j,t}) + \log P(s_{t-1}) - \log P(s_t) - \sum_i \log\big(2 \cosh(H_i + \sum_j J_{ij} s_{j,t-1})\big) + \sum_i \log\big(2 \cosh(H_i + \sum_j J_{ij} s_{j,t})\big) \Big) = \sum_{ij} J_{ij} (D_{ij,t} - D_{ji,t}) = \sum_{ij} (J_{ij} - J_{ji}) D_{ij,t}.   (62)

ACKNOWLEDGMENTS
The authors thank Yasser Roudi and Masanao Igarashi for valuable comments and discussions on this manuscript. This work was supported in part by the Cooperative Intelligence Joint Research between Kyoto University and Honda Research Institute Japan, MEXT/JSPS KAKENHI Grant Number JP20K11709, and the grant of Joint Research by the National Institutes of Natural Sciences (NINS Program No. 01112005). M.A. was funded by the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 892715 and the University of the Basque Country UPV/EHU post-doctoral training program grant ESPDOC17/17, and supported in part by the Basque Government project IT 1228-19 and project Outonomy (PID2019-104576GB-I00) by the Spanish Ministry of Science and Innovation.
CODE AVAILABILITY
The source code for implementing the results in this work is available under a GPL license at GitHub https://github.com/MiguelAguilera/kinetic-Plefka-expansions [58] (DOI: 10.5281/zenodo.4357634).
DATA AVAILABILITY
The datasets generated and analysed during the current study are available under a CC BY license at Zenodo https://zenodo.org/record/4318983 [59] (DOI: 10.5281/zenodo.4318983).
COMPETING INTERESTS
The authors declare no competing interests.
AUTHOR CONTRIBUTIONS
M.A., S.A.M. and H.S. designed and reviewed the research; M.A. contributed analytical and numerical results; M.A., S.A.M. and H.S. wrote the paper.
ADDITIONAL INFORMATION

Supplementary information
The online version contains supplementary material available at https://doi.org/10.1038/s41467-021-20890-5.

[1] J. P. Nguyen, F. B. Shipley, A. N. Linder, G. S. Plummer, M. Liu, S. U. Setru, J. W. Shaevitz, and A. M. Leifer, Whole-brain calcium imaging with cellular resolution in freely behaving Caenorhabditis elegans, Proceedings of the National Academy of Sciences, E1074 (2016).
[2] M. B. Ahrens, M. B. Orger, D. N. Robson, J. M. Li, and P. J. Keller, Whole-brain functional imaging at cellular resolution using light-sheet microscopy, Nature Methods, 413 (2013).
[3] C. Stringer, M. Pachitariu, N. Steinmetz, M. Carandini, and K. D. Harris, High-dimensional geometry of population responses in visual cortex, Nature, 1 (2019).
[4] G. Nicolis and I. Prigogine, Self-Organization in Nonequilibrium Systems: From Dissipative Structures to Order through Fluctuations, 1st ed. (Wiley, New York, 1977).
[5] G. Tkačik, T. Mora, O. Marre, D. Amodei, S. E. Palmer, M. J. Berry, and W. Bialek, Thermodynamics and signatures of criticality in a network of neurons, Proceedings of the National Academy of Sciences, 11508 (2015).
[6] T. Mora, S. Deny, and O. Marre, Dynamical Criticality in the Collective Activity of a Population of Retinal Neurons, Physical Review Letters, 078105 (2015).
[7] J. Hertz, Y. Roudi, and J. Tyrcha, Ising model for inferring network structure from spike data, in
Principles of Neural Coding (CRC Press, 2013) pp. 527–546.
[8] Y. Roudi, B. Dunn, and J. Hertz, Multi-neuronal activity and functional connectivity in cell assemblies, Current Opinion in Neurobiology, 38 (2015).
[9] J.-P. Bouchaud, Crises and collective socio-economic phenomena: simple models and challenges, Journal of Statistical Physics, 567 (2013).
[10] D. J. Evans, E. G. D. Cohen, and G. P. Morriss, Probability of second law violations in shearing steady states, Physical Review Letters, 2401 (1993).
[11] C. Jarzynski, Nonequilibrium equality for free energy differences, Physical Review Letters, 2690 (1997).
[12] G. E. Crooks, Nonequilibrium measurements of free energy differences for microscopically reversible Markovian systems, Journal of Statistical Physics, 1481 (1998).
[13] G. E. Crooks, Entropy production fluctuation theorem and the nonequilibrium work relation for free energy differences, Physical Review E, 2721 (1999).
[14] J. L. Lebowitz and H. Spohn, A Gallavotti–Cohen-type symmetry in the large deviation functional for stochastic dynamics, Journal of Statistical Physics, 333 (1999).
[15] S. Ito, M. Oizumi, and S.-i. Amari, Unified framework for the entropy production and the stochastic interaction based on information geometry, Physical Review Research, 033048 (2020).
[16] D. J. Evans and D. J. Searles, The fluctuation theorem, Advances in Physics, 1529 (2002).
[17] U. Seifert, Stochastic thermodynamics, fluctuation theorems and molecular machines, Reports on Progress in Physics, 126001 (2012).
[18] P. Gaspard, Time Asymmetry in Nonequilibrium Statistical Mechanics, in Special Volume in Memory of Ilya Prigogine (John Wiley & Sons, Ltd, 2007) pp. 83–133.
[19] S. R. A. Salinas, The Ising Model, in
Introduction to Statistical Physics, Graduate Texts in Contemporary Physics, edited by S. R. A. Salinas (Springer New York, New York, NY, 2001) pp. 257–276.
[20] D. H. Ackley, G. E. Hinton, and T. J. Sejnowski, A learning algorithm for Boltzmann machines, Cognitive Science, 147 (1985).
[21] A. Witoelar and Y. Roudi, Neural network reconstruction using kinetic Ising models with memory, BMC Neuroscience, P274 (2011).
[22] C. Donner and M. Opper, Inverse Ising problem in continuous time: A latent variable approach, Physical Review E, 062104 (2017).
[23] E. Schneidman, M. J. Berry, R. Segev, and W. Bialek, Weak pairwise correlations imply strongly correlated network states in a neural population, Nature, 1007 (2006).
[24] S. Cocco, S. Leibler, and R. Monasson, Neuronal couplings between retinal ganglion cells inferred by efficient inverse statistical physics methods, Proceedings of the National Academy of Sciences, 14058 (2009).
[25] H. Shimazaki, S.-i. Amari, E. N. Brown, and S. Grün, State-Space Analysis of Time-Varying Higher-Order Spike Correlation for Multiple Neural Spike Train Data, PLOS Computational Biology, e1002385 (2012).
[26] J. Tyrcha, Y. Roudi, M. Marsili, and J. Hertz, The effect of nonstationarity on models inferred from neural data, Journal of Statistical Mechanics: Theory and Experiment, P03005 (2013).
[27] D. J. Thouless, P. W. Anderson, and R. G. Palmer, Solution of 'Solvable model of a spin glass', The Philosophical Magazine: A Journal of Theoretical Experimental and Applied Physics, 593 (1977).
[28] H. J. Kappen and F. B. Rodríguez, Efficient Learning in Boltzmann Machines Using Linear Response Theory, Neural Computation, 1137 (1998).
[29] Y. Roudi, E. Aurell, and J. A. Hertz, Statistical Physics of Pairwise Probability Models, Frontiers in Computational Neuroscience, 10.3389/neuro.10.022.2009 (2009).
[30] Y. Roudi, J. Tyrcha, and J.
Hertz, Ising model for neural data: Model quality and approximate methods for extracting functional connectivity, Physical Review E, 051915 (2009).
[31] C. Donner, K. Obermayer, and H. Shimazaki, Approximate Inference for Time-Varying Interactions and Macroscopic Dynamics of Neural Populations, PLOS Computational Biology, e1005309 (2017).
[32] T. Plefka, Convergence condition of the TAP equation for the infinite-ranged Ising spin glass model, Journal of Physics A: Mathematical and General, 1971 (1982).
[33] T. Tanaka, Mean-field theory of Boltzmann machine learning, Physical Review E, 2302 (1998).
[34] T. Tanaka, A theory of mean field approximation, in Advances in Neural Information Processing Systems (1999) pp. 351–360.
[35] C. Bhattacharyya and S. S. Keerthi, Information geometry and Plefka's mean-field theory, Journal of Physics A: Mathematical and General, 1307 (2000).
[36] T. Tanaka, Information Geometry of Mean-Field Approximation, in Advanced Mean Field Methods: Theory and Practice (MIT Press, 2001).
[37] S. Amari, S. Ikeda, and H. Shimokawa, Information geometry of α-projection in mean-field approximation, in Advanced Mean Field Methods: Theory and Practice (MIT Press, 2001).
[38] H. J. Kappen and J. J. Spanjers, Mean field theory for asymmetric neural networks, Physical Review E, 5658 (2000).
[39] Y. Roudi and J. Hertz, Dynamical TAP equations for non-equilibrium Ising spin glasses, Journal of Statistical Mechanics: Theory and Experiment, P03031 (2011).
[40] Y. Roudi and J. Hertz, Mean Field Theory for Nonequilibrium Network Reconstruction, Physical Review Letters, 048702 (2011).
[41] M. Mézard and J. Sakellariou, Exact mean-field inference in asymmetric kinetic Ising systems, Journal of Statistical Mechanics: Theory and Experiment, L07001 (2011).
[42] H. Mahmoudi and D. Saad, Generalized mean field approximation for parallel dynamics of the Ising model, Journal of Statistical Mechanics: Theory and Experiment, P07001 (2014).
[43] L. Bachschmid-Romano, C.
Battistin, M. Opper, and Y. Roudi, Variational perturbation and extended Plefka approaches to dynamics on random networks: the case of the kinetic Ising model, Journal of Physics A: Mathematical and Theoretical, 434003 (2016).
[44] S. Pressé, K. Ghosh, J. Lee, and K. A. Dill, Principles of maximum entropy and maximum caliber in statistical physics, Reviews of Modern Physics, 1115 (2013).
[45] S. Amari and H. Nagaoka, Methods of Information Geometry, Vol. 191 (American Mathematical Soc., 2007).
[46] S. Amari,
Information Geometry and Its Applications, Vol. 194 (Springer, 2016).
[47] S. Amari, K. Kurata, and H. Nagaoka, Information geometry of Boltzmann machines, IEEE Transactions on Neural Networks, 260 (1992).
[48] M. Oizumi, N. Tsuchiya, and S.-i. Amari, Unified framework for information integration based on information geometry, Proceedings of the National Academy of Sciences, 14817 (2016).
[49] L. K. Saul and M. I. Jordan, Exploiting tractable substructures in intractable networks, in Advances in Neural Information Processing Systems (1996) pp. 486–492.
[50] Y. Zhang and A. C. Barato, Critical behavior of entropy production and learning rate: Ising model with an oscillating field, Journal of Statistical Mechanics: Theory and Experiment, 113207 (2016).
[51] L. Crochik and T. Tomé, Entropy production in the majority-vote model, Physical Review E, 057103 (2005).
[52] C. F. Noa, P. E. Harunari, M. de Oliveira, and C. Fiore, Entropy production as a tool for characterizing nonequilibrium phase transitions, Physical Review E, 012104 (2019).
[53] V. Sessak and R. Monasson, Small-correlation expansions for the inverse Ising problem, Journal of Physics A: Mathematical and Theoretical, 055001 (2009).
[54] E. Granot-Atedgi, G. Tkačik, R. Segev, and E. Schneidman, Stimulus-dependent Maximum Entropy Models of Neural Population Codes, PLOS Computational Biology, e1002922 (2013).
[55] R. Cofré, L. Videla, and F. Rosas, An Introduction to the Non-Equilibrium Steady States of Maximum Entropy Spike Trains, Entropy, 884 (2019).
[56] C. W. Lynn, E. J. Cornblath, L. Papadopoulos, M. A. Bertolero, and D. S. Bassett, Non-equilibrium dynamics and entropy production in the human brain, arXiv preprint arXiv:2005.02526 (2020).
[57] J. Schnakenberg, Network theory of microscopic and macroscopic behavior of master equation systems, Reviews of Modern Physics 48