A unifying framework for mean-field theories of asymmetric kinetic Ising systems
Miguel Aguilera,1,2,3,∗ S. Amin Moosavi,4,5 and Hideaki Shimazaki4,6

1 IAS-Research Center for Life, Mind, and Society, Department of Logic and Philosophy of Science, University of the Basque Country, Donostia, Spain.
2 Department of Informatics & Sussex Neuroscience, University of Sussex, Brighton, UK.
3 ISAAC Lab, Aragón Institute of Engineering Research, University of Zaragoza, Zaragoza, Spain.
4 Graduate School of Informatics, Kyoto University, Kyoto, Japan.
5 Present address: Department of Neuroscience, Brown University, USA.
6 Present address: Center for Human Nature, Artificial Intelligence, and Neuroscience (CHAIN), Hokkaido University, Japan.
∗ [email protected]

(Dated: January 27, 2021)

Kinetic Ising models are powerful tools for studying the non-equilibrium dynamics of complex systems. As their behavior is not tractable for large networks, many mean-field methods have been proposed for their analysis, each based on unique assumptions about the system's temporal evolution. This disparity of approaches makes it challenging to systematically advance mean-field methods beyond previous contributions. Here, we propose a unifying framework for mean-field theories of asymmetric kinetic Ising systems from an information geometry perspective. The framework is built on Plefka expansions of a system around a simplified model obtained by an orthogonal projection to a sub-manifold of tractable probability distributions. This view not only unifies previous methods but also allows us to develop novel methods that, in contrast with traditional approaches, preserve the system's correlations. We show that these new methods can outperform previous ones in predicting and assessing network properties near maximally fluctuating regimes.
INTRODUCTION
Advances in high-throughput data acquisition technologies for very large biological and social systems are providing unprecedented possibilities to investigate their complex, non-equilibrium dynamics. For example, optical recordings from genetically modified neural populations make it possible to simultaneously monitor activities of the whole neural network of behaving C. elegans [1] and zebrafish [2], as well as thousands of neurons in the mouse visual cortex [3]. Such networks generally exhibit out-of-equilibrium dynamics [4], and are often found to self-organize near critical regimes at which their fluctuations are maximized [5, 6]. Evolution of such systems cannot be faithfully captured by methods assuming an asymptotic equilibrium state. Therefore, in general, there is a pressing demand for mathematical tools to study the dynamics of large-scale, non-equilibrium complex systems and to analyze high-dimensional data sets recorded from them.

The kinetic Ising model with asymmetric couplings is a prototypical model for studying such non-equilibrium dynamics in biological [7, 8] and social systems [9]. It is described as a discrete-time Markov chain of interacting binary units, resembling the nonlinear dynamics of recurrently connected neurons. The model exhibits non-equilibrium behavior when couplings are asymmetric or when model parameters are subject to rapid changes, ruling out quasi-static processes. These conditions induce a time-reversal asymmetry in dynamical trajectories, leading to positive entropy production (the second law of thermodynamics) as revealed by the fluctuation theorem [10–15] (see [16, 17] for reviews). This time-asymmetry is characteristic of non-equilibrium systems as it can only be displayed by systems in which energy dissipation takes place [18]. In the case of symmetric connections and static parameters, the model converges to an equilibrium stationary state.
Therefore, it is a generalization of its equilibrium counterpart known as the (equilibrium) Ising model [19].

The forward Ising problem refers to calculating statistical properties of the model, such as mean activation rates (mean magnetizations of spins) and correlations, given the parameters of the model. In contrast, inference of the model parameters from data is called the inverse Ising problem [20]. In this regard, kinetic Ising models [21, 22] and their equilibrium counterparts [23–25] have become popular tools for modeling and analyzing biological and social systems. In addition, they capture memory retrieval dynamics in classical associative networks. Namely, they are equivalent to the Boltzmann machine, extensively used in machine learning applications [20]. Unfortunately, exact solutions of the forward and inverse problems often become computationally too expensive due to the combinatorial explosion of possible patterns in large networks or the high volume of data, and applications of exact or sampling-based methods are limited in practice to around a hundred neurons [5, 25, 26]. Therefore, analytical approximation methods are necessary for analysing large systems. In this endeavour, mean-field methods have emerged as powerful tools to track down otherwise intractable statistical quantities.

The standard mean-field approximations to study equilibrium Ising models are the classical naive mean-field (nMF) and the more accurate Thouless-Anderson-Palmer (TAP) approximations [27]. These methods have also been employed to solve the inverse Ising problem [28–31]. Plefka demonstrated that the nMF and TAP approximations for the equilibrium model can be derived using the power series expansion of the free energy around a model of independent spins, a method which is now referred to as the Plefka expansion [32]. This expansion up to the first and second orders leads to the nMF and TAP mean-field approximations, respectively.
The Plefka expansion was later formalized by Tanaka and others in the framework of information geometry [33–37].

In non-equilibrium networks, however, the free energy is not directly defined, and therefore it is not obvious how to apply the Plefka expansions. Kappen and Spanjers [38] proposed an information geometric approach to mean-field solutions of the asymmetric Ising model with asynchronous dynamics. They showed that their second-order approximation for an asymmetric model in the stationary state is equivalent to the TAP approximation for equilibrium models. Later, Roudi and Hertz derived TAP equations for non-stationary states using a Legendre transformation of the generating functional of the set of trajectories of the model [39]. Another study by Roudi and Hertz extended mean-field equations to provide expressions for the non-stationary delayed correlations assuming the presence of equal-time correlations at the previous step [40]. Yet another interesting method proposed by Mézard and Sakellariou approximates the local fields by a Gaussian distribution according to the central limit theorem, yielding more accurate results for fully asymmetric networks [41]. This method was later extended to include correlations at the previous time-step, improving the results for symmetric couplings [42]. More recently, Bachschmid-Romano et al. extended the path-integral methods in [39] with Gaussian effective fields [43], not only recovering [41] for fully asymmetric networks but also proposing a method that better approximates mean rate dynamics by conserving autocorrelations of units. Although many choices of mean-field methods are available, the diversity of methods and assumptions makes it challenging to advance systematically over previous contributions.

Here, we propose a unified approach for mean-field approximations of the Ising model.
While our method is applicable to symmetric and equilibrium models, we focus for generality on asymmetric kinetic Ising models. Our approach is defined as a family of Plefka expansions in an information geometric space. This approach allows us to unify and relate existing mean-field methods and provide expressions for other statistics of the systems such as pairwise correlations. Furthermore, our approach can be extended beyond classical mean-field assumptions to propose novel approximations. Here, we introduce an approximation based on a pairwise model that better captures network correlations, and we show that it outperforms existing approximations of kinetic Ising models near a point of maximum fluctuations. We also provide a data-driven method to reconstruct and test whether a system is near a phase transition by combining the forward and inverse Ising problems, and demonstrate that the proposed pairwise model more accurately estimates the system's fluctuations and its sensitivity to parameter changes. These results confirm that our unified framework is a useful tool to develop methods to analyze large-scale, non-equilibrium biological and social dynamics operating near critical regimes. In addition, since the methods are directly applicable to Boltzmann machine learning, the geometrical framework introduced here is relevant in machine learning applications.

The paper is organized as follows. First, we introduce the kinetic Ising model and its statistical properties of interest. Second, we introduce our framework for the Plefka approximation methods from a geometric perspective. To explain how it works, we derive the classical naive and TAP mean-field approximations under the proposed framework. Third, we show that our approach can unify other known mean-field approximation methods. We then propose a novel pairwise approximation under this framework.
Finally, we compare different mean-field approximations in solving the forward and inverse Ising problems, as well as in performing the data-driven assessment of the system's sensitivity. The last section is devoted to discussion.
RESULTS

The kinetic Ising model
The kinetic Ising model is the least structured statistical model containing delayed pairwise interactions between its binary components (i.e., a maximum caliber model [44]). The system consists of N interacting binary variables (down or up of Ising spins, or inactive or active of neural units) s_{i,t} ∈ {−1, +1}, i = 1, 2, ..., N, evolving in discrete time steps t with parallel dynamics. Given the configuration of spins at t−1, s_{t−1} = {s_{1,t−1}, s_{2,t−1}, ..., s_{N,t−1}}, spins s_t at time t are conditionally independent random variables, updated as a discrete-time Markov chain, following

  P(s_t | s_{t−1}) = ∏_i e^{s_{i,t} h_{i,t}} / (2 cosh h_{i,t}),    (1)
  h_{i,t} = H_i + ∑_j J_{ij} s_{j,t−1}.    (2)

The parameters H = {H_i} and J = {J_{ij}} represent local external fields at each spin and couplings between pairs of spins, respectively. When the couplings are asymmetric (i.e., J_{ij} ≠ J_{ji}), the system is away from equilibrium because the process is irreversible with respect to time. Given the probability mass function of the previous state P(s_{t−1}), the distribution of the current state is:

  P(s_t) = ∑_{s_{t−1}} P(s_t | s_{t−1}) P(s_{t−1}).    (3)

This marginal distribution P(s_t) is not factorized (except at J = 0), but it rather exhibits a complex statistical structure, generally containing higher-order spin interactions. We can apply this equation recursively, e.g. decomposing P(s_{t−1}) in terms of the distribution P(s_{t−2}), and trace the evolution of the system from the initial distribution P(s_0).

In this article, we use variants of the Plefka expansion to calculate some statistical properties of the system. Namely, we investigate the average activation rates m_t, correlations between pairs of units (covariance function) C_t, and delayed correlations D_t given by

  m_{i,t} = ∑_{s_t} s_{i,t} P(s_t),    (4)
  C_{ik,t} = ∑_{s_t} s_{i,t} s_{k,t} P(s_t) − m_{i,t} m_{k,t},    (5)
  D_{il,t} = ∑_{s_t, s_{t−1}} s_{i,t} s_{l,t−1} P(s_t, s_{t−1}) − m_{i,t} m_{l,t−1}.    (6)
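As an illustration, the parallel dynamics of Eqs. (1)-(3) can be simulated directly, and the statistics of Eqs. (4)-(6) estimated by Monte Carlo sampling over many independent trajectories; a minimal sketch in Python (the network size, number of trajectories, and the scales of H and J are hypothetical choices, not values from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, R = 10, 50, 20000                     # units, time steps, trajectories
H = rng.normal(0, 0.1, N)                   # local fields (hypothetical values)
J = rng.normal(0, 1 / np.sqrt(N), (N, N))   # asymmetric couplings, J_ij != J_ji

s = np.ones((R, N), dtype=int)              # initial state s_0 = +1 for all spins
for t in range(T):
    h = H + s @ J.T                         # effective fields h_{i,t} (Eq. 2)
    p_up = 1 / (1 + np.exp(-2 * h))         # P(s_{i,t} = +1 | s_{t-1}) from Eq. 1
    s_new = np.where(rng.random((R, N)) < p_up, 1, -1)   # parallel update
    s_prev, s = s, s_new

# Monte Carlo estimates of Eqs. (4)-(6) at the final step
m = s.mean(axis=0)                          # m_{i,t}
C = s.T @ s / R - np.outer(m, m)            # equal-time covariances C_{ik,t}
mp = s_prev.mean(axis=0)
D = s.T @ s_prev / R - np.outer(m, mp)      # delayed covariances D_{il,t}
```

Note that the sampling cost grows with the number of trajectories R, which motivates the analytical approximations developed below.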
Note that m_t and D_t are sufficient statistics of the kinetic Ising model. Therefore, we will use them in solving the inverse Ising problem (see Methods). We additionally consider the equal-time correlations C_t as they are commonly used to describe the systems, and are investigated by some of the mean-field approximations in the literature [40]. Calculation of these expectation values is analytically intractable and computationally very expensive for large networks, due to the combinatorial explosion of the number of possible states. To reduce this computational cost, we approximate the marginal probability distributions (Eq. 3) by the Plefka expansion method that utilizes an alternative, tractable distribution.

Geometrical approach to mean-field approximation
Information geometry [37, 45, 46] provides clear geometrical understanding of information-theoretic measures and probabilistic models [15, 47, 48]. Using the language of information geometry, we introduce our method for mean-field approximations of kinetic Ising systems. Let P_t be the manifold of probability distributions at time t obtained from Eq. 3. Each point on the manifold corresponds with a set of parameter values. The manifold P_t contains sub-manifolds Q_t of probability distributions with analytically tractable statistical properties (see Fig. 1). We use this tractable manifold, i.e., a reference model, to approximate a target point P(s_t | H, J) in the manifold P_t and its statistical properties m_t, C_t, D_t.

The simplest sub-manifold Q_t is the manifold of independent models, used in classical mean-field approximations to compute average activation rates. Each point on this sub-manifold corresponds to a distribution

  Q(s_t | Θ_t) = ∏_i e^{s_{i,t} Θ_{i,t}} / (2 cosh Θ_{i,t}),    (7)

where Θ_t = {Θ_{i,t}} is the vector of parameters that represents a point in Q_t. This distribution does not include couplings between units, and its average activation rate is immediately given as m_{i,t} = tanh Θ_{i,t}.

Our first goal is to find the average activation rates of the target distribution P(s_t | H, J). It turns out that they can be obtained from the independent model Q(s_t | Θ_t) that minimizes the following Kullback-Leibler (KL) divergence from P(s_t):

  D(P‖Q) = ∑_{s_t} P(s_t) log [P(s_t) / Q(s_t)].    (8)

The independent model Q(s_t | Θ*_t) (≡ Q*(s_t)) that minimizes the KL divergence has activation rates m_t identical to those of the target distribution P(s_t) [38] because the minimizing points Θ*_{i,t} satisfy (for i = 1, ..., N)

  ∂D(P‖Q)/∂Θ_{i,t} |_{Θ_t = Θ*_t} = −∑_{s_t} s_{i,t} P(s_t) + tanh Θ*_{i,t} = m^{Q*}_{i,t} − m^P_{i,t} = 0,    (9)

where m^P_{i,t} and m^{Q*}_{i,t} are respectively expectation values of s_{i,t} by P(s_t) and Q(s_t | Θ*_t). As these values are equal, for the rest of the paper we will drop their superscripts and just write m_{i,t} for simplicity. Note that the result of this approximation is indifferent to the system's correlations. Later in the paper we will consider approximations that take into account pairwise correlations.

From an information geometric point of view, given m_t (or Θ*_t), we may consider a family of points defined as a linear mixture of P(s_t) and Q(s_t | Θ*_t) for which m_t is kept constant (the dashed line A in Fig. 1). This is known as an m-geodesic, and it is orthogonal to the e-flat manifold Q_t, constituting an m-projection to this manifold [37, 47]. Thus, the previous search of Q(s_t | Θ*_t) given P(s_t | H, J) is equivalent to finding the orthogonal projection point from P(s_t | H, J) to the manifold Q_t of independent models [36, 37].

The Plefka expansion
Although the m-projection provides the exact and unique average activation rates, its calculation in practice requires the complete distribution P(s_t). In the Plefka expansion, we relax the constraints of the m-projection, and introduce another set of more tractable distributions that passes only through P(s_t | H, J) and Q(s_t | Θ*_t) (the solid line in Fig. 1). This distribution is defined using a new conditional distribution introducing a parameter α that connects a distribution on the manifold Q_t with the original distribution P(s_t):

  P_α(s_{i,t} | s_{t−1}) = e^{s_{i,t} h_{i,t}(α)} / (2 cosh h_{i,t}(α)),    (10)
  h_{i,t}(α) = (1 − α) Θ_{i,t} + α (H_i + ∑_j J_{ij} s_{j,t−1}).    (11)

FIG. 1. A geometric view of the approximations based on Plefka expansions. The point P(s_t) is the marginal distribution of a kinetic Ising model at time t. The submanifold Q_t is a set of tractable distributions, for example a manifold of independent models. The points in A correspond to an m-geodesic, that is, a linear mixture of P(s_t) and Q*(s_t) on Q_t, where for independent Q_t all points on A share the same mean values m_t. Geometrically, A constitutes the m-projection from P(s_t) to Q_t, defining Q*(s_t) as the closest point in the submanifold Q_t to the point P(s_t) [47]. The Plefka expansion is defined by expanding an α-dependent distribution P_α(s_t) that satisfies P_{α=0}(s_t) = Q*(s_t) and P_{α=1}(s_t) = P(s_t).

At α = 0, P_{α=0}(s_t | s_{t−1}) = Q(s_t | Θ_t), and α = 1 leads to P_{α=1}(s_t | s_{t−1}) = P(s_t | s_{t−1}). Using this alternative conditional distribution P_α(s_{i,t} | s_{t−1}), we construct an approximate marginal distribution P_α(s_t). Consequently, expectation values with respect to P_α(s_t) are functions of α. We thus write the statistics of the approximate system as m_t(α), C_t(α), and D_t(α).

The Plefka expansions of these statistics are defined as the Taylor series expansions of these functions around α = 0. In the case of the mean activation rate, the expansion up to the n-th order leads to:

  m_t(α) = m_t(α=0) + ∑_{k=1}^{n} (α^k / k!) ∂^k m_t(α=0)/∂α^k + O(α^{n+1}),    (12)

where O(α^{n+1}) stands for the residual error of the approximation of order n+1 and higher. For the n-th order approximation, we neglect the residual terms as O(α^{n+1})|_{α=1} ≈ 0. Note that all coefficients of the expansion are functions of Θ_t. The mean-field approximation is computed by setting α = 1 and finding the value of Θ*_t that satisfies Eq. 12. Note that since the original marginal distribution is recovered at α = 1, the equality of Eq. 9 holds: m_t(α=1) = m_t(α=0). Then, we have

  ∑_{k=1}^{n} (1/k!) ∂^k m_t(α=0)/∂α^k = 0,    (13)

which should be solved with respect to the parameters Θ_t. Since we neglected the terms higher than the n-th order, the solution may not lead to the exact projection, Q(s_t | Θ*_t). In this study, we investigate the first (n = 1) and second (n = 2) order approximations. Moreover, we can apply the same expansion to approximate the correlations C_t and D_t, using Eq. 10.

What is the difference between this approach and other mean-field methods? Conventionally, naive mean-field approximations are obtained by minimizing D(Q‖P) as opposed to D(P‖Q) (Eq. 8) [36, 49]. This approach is typically used in variational inference to construct a tractable approximate posterior in machine learning problems. Following the Bogolyubov inequality, minimizing this divergence is equivalent to minimizing the variational free energy. Geometrically, it comprises an e-projection of P(s_t | H, J) to the sub-manifold Q_t, which does not result in Q(s_t | Θ*_t). Namely, minimizing D(Q‖P), as well as minimization of other α-divergences except for D(P‖Q), introduces a bias in the estimation of the mean-field approximation [36, 37]. In contrast, if we consider the m-projection point that minimizes D(P‖Q), we can approximate the exact value of m_t using Eq. 12 up to an arbitrary order.

In the subsequent sections we show that different approximations of the marginal distribution P(s_t) in Eq. 3 can be constructed by replacing P(s_{i,τ} | s_{τ−1}) with P_α(s_{i,τ} | s_{τ−1}) for different pairs i, τ (here we will explore the cases of τ = t and τ = t − k, for marginal distributions P(s_{t−k+1}, ..., s_t)).
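To make the construction concrete, the α-dependent marginal defined by Eqs. (10)-(11) can be built exactly by enumeration for a small system and differentiated numerically around α = 0; a sketch assuming N = 3, hypothetical parameter values, and a uniform P(s_{t−1}) for simplicity:

```python
import itertools
import numpy as np

N = 3
rng = np.random.default_rng(1)
H = rng.normal(0, 0.2, N)
J = rng.normal(0, 0.5, (N, N))
theta = rng.normal(0, 0.3, N)          # reference parameters Theta_t (hypothetical)
states = np.array(list(itertools.product([-1, 1], repeat=N)))
P_prev = np.full(len(states), 1 / len(states))   # uniform P(s_{t-1}) for simplicity

def m_alpha(alpha):
    """m_t(alpha): mean of s_t under P_alpha(s_t), i.e. Eqs. (10)-(11)
    marginalized over s_{t-1} (spins are conditionally independent)."""
    m = np.zeros(N)
    for s_prev, w in zip(states, P_prev):
        h = (1 - alpha) * theta + alpha * (H + J @ s_prev)
        m += w * np.tanh(h)            # E[s_{i,t} | s_{t-1}] = tanh h_{i,t}(alpha)
    return m

m0 = m_alpha(0.0)                      # independent model: m_t(0) = tanh(Theta_t)
eps = 1e-6
dm = (m_alpha(eps) - m0) / eps         # numerical first derivative at alpha = 0
m_first_order = m0 + dm                # truncated series of Eq. (12) at alpha = 1, n = 1
```

Here m_alpha(1.0) is the exact mean of the target marginal, so the mean-field condition (Eq. 13) amounts to choosing theta such that the truncated correction dm vanishes.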
In addition, we are not restricted to manifolds of independent models. The independent model is adopted as a reference model to approximate the average activation rate, but one can also approximate correlations using this method. In this vein, we can extend our framework to use reference manifolds Q_{t−k+1:t} (of models Q(s_{t−k+1}, ..., s_t)) that include interactions, e.g., pairwise couplings between elements at two different time points, to more accurately approximate the delayed correlations (see Supplementary Note 1). By systematically defining these reference distributions, we will provide a unified approach to derive Plefka approximations of m_t, C_t, and D_t, including the one that utilizes a pairwise structure.

Plefka[t−1, t]: Expansion around independent models at times t−1 and t

Before elaborating different mean-field approximations, we demonstrate our method by deriving the known results of the classical nMF and TAP approximations for the kinetic Ising model [38, 39]. In order to derive these classical mean-field equations, we make a Plefka expansion around the points Θ*_t and Θ*_{t−1} that are respectively obtained by orthogonal projection to the independent manifolds Q_t and Q_{t−1}, computed as in Eq. 9. Note that assuming an approximation where previous distributions (e.g., at t−2, t−3, ...) are also independent yields exactly the same result. In this way, we derive the nMF and TAP equations of a model defined by a marginal probability distribution P^{[t−1:t]}_α. Using Eqs. (3) and (10), we write

  P^{[t−1:t]}_α(s_t) = ∑_{s_{t−1}, s_{t−2}} P_α(s_t | s_{t−1}) P_α(s_{t−1} | s_{t−2}) P(s_{t−2}),    (14)

where P^{[t−1:t]}_{α=0}(s_t) = Q(s_t) and the original distribution is recovered for P^{[t−1:t]}_{α=1}(s_t) = P(s_t).

Following Eq. 13, for the first order approximation we have ∂m_{i,t}(α=0)/∂α = 0, where the derivative of the first order moment is

  ∂m_{i,t}(α=0)/∂α = (1 − m_{i,t}²)(−Θ_{i,t} + H_i + ∑_j J_{ij} m_{j,t−1}).    (15)

By solving the equation, we find Θ*_{i,t} ≈ H_i + ∑_j J_{ij} m_{j,t−1}, which leads to the naive mean-field approximation:

  m_{i,t} ≈ tanh[H_i + ∑_j J_{ij} m_{j,t−1}].    (16)

We apply the same expansion to approximate the correlations, expanding C_{ik,t}(α) and D_{il,t}(α) around α = 0 up to the first order using Θ_{i,t} = Θ*_{i,t}. Then we obtain

  C_{ik,t} ≈ 0,  i ≠ k,    (17)
  D_{il,t} ≈ J_{il} (1 − m_{i,t}²)(1 − m_{l,t−1}²).    (18)

Detailed calculations are presented in Supplementary Note 2.

To obtain the second order approximation, we need to solve ∂m_{i,t}(α=0)/∂α + (1/2) ∂²m_{i,t}(α=0)/∂α² = 0 from Eq. 13. Here the second order derivative is given as

  ∂²m_{i,t}(α=0)/∂α² ≈ −2 m_{i,t} (1 − m_{i,t}²) ∑_j J_{ij}² (1 − m_{j,t−1}²),    (19)

where terms of order higher than quadratic were neglected (see Supplementary Note 2 for further details). From these equations, we find Θ*_{i,t} ≈ H_i + ∑_j J_{ij} m_{j,t−1} − m_{i,t} ∑_j J_{ij}² (1 − m_{j,t−1}²), leading to the TAP equation:

  m_{i,t} ≈ tanh[H_i + ∑_j J_{ij} m_{j,t−1} − m_{i,t} ∑_j J_{ij}² (1 − m_{j,t−1}²)].    (20)

Having Θ*_{i,t}, we can incorporate TAP approximations of the correlations by expanding C_{ik,t}(α) and D_{il,t}(α) (see Supplementary Note 2 for details) as:

  C_{ik,t} ≈ (1 − m_{i,t}²)(1 − m_{k,t}²) ∑_j J_{ij} J_{kj} (1 − m_{j,t−1}²),  i ≠ k,    (21)
  D_{il,t} ≈ J_{il} (1 − m_{i,t}²)(1 − m_{l,t−1}²)(1 + 2 J_{il} m_{i,t} m_{l,t−1}).    (22)

In these approximations, Eqs. 16 and 20 of activation rates m_t correspond to the classical nMF and TAP equations of the kinetic Ising model [38, 39]. The mean-field equations for the equal-time and delayed correlations (Eqs. 17, 18 and 21, 22) are novel contributions from applying the Plefka expansion to correlations.

We note that, using the equations above, we can compute the approximate statistical properties of the system at t (m_t, C_t, D_t) from m_{t−1}.
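As a sketch, the nMF update (Eq. 16) and the TAP update (Eq. 20) can be iterated forward in time from an initial m_0; since m_{i,t} appears on both sides of Eq. 20, the TAP step is solved here by damped fixed-point iteration (network size, coupling scale, damping, and iteration counts are hypothetical choices):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 50
H = rng.normal(0, 0.1, N)
J = rng.normal(0, 1 / np.sqrt(N), (N, N))    # asymmetric couplings (hypothetical)

def nmf_step(m_prev):
    """Naive mean-field update, Eq. (16)."""
    return np.tanh(H + J @ m_prev)

def tap_step(m_prev, n_iter=200, damp=0.5):
    """TAP update, Eq. (20), solved self-consistently for m_t."""
    g = H + J @ m_prev                       # nMF effective field
    V = (J ** 2) @ (1 - m_prev ** 2)         # Onsager-correction term
    m = np.tanh(g)                           # start from the nMF solution
    for _ in range(n_iter):
        m = (1 - damp) * m + damp * np.tanh(g - m * V)
    return m

m = np.zeros(N)                              # initial state m_0
for t in range(20):
    m_prev, m = m, tap_step(m)
```

Replacing tap_step by nmf_step gives the first-order recursion; the damping simply stabilizes the inner fixed-point loop and does not change its fixed points.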
Therefore, the system evolution is described by recursively computing m_t from an initial state m_0 (for both transient and stationary dynamics), although approximation errors accumulate over the iterations. After we introduce a unified view of mean-field approximations in the subsequent sections, we will numerically examine approximation errors of these various methods in predicting statistical structure of the system.

Generalization of mean-field approximations
In the previous section, we described a Plefka approximation that uses a model containing independent units at times t−1 and t to construct a marginal probability distribution P^{[t−1:t]}_α(s_t). This is, however, not the only possible choice of approximation. As we mentioned above, other approximations have been introduced in the literature. In [40], expressions are provided for the non-stationary delayed correlations D_t as a function of C_{t−1}. In [41], an approximation is derived by assuming that units at state s_{t−1} are independent while correlations of s_t are preserved.

In the following sections, we show that various approximation methods, including those mentioned above, can be unified as Plefka expansions. Each method of the approximation corresponds to a specific choice of the sub-manifold Q_t at each time step. Fig. 2 shows the corresponding sub-manifolds Q_{t−1:t} of possible approximations, where gray lines represent interactions that are affected by α in the Plefka expansion.

FIG. 2. Unified mean-field framework. Original model (A) and family of generalized Plefka expansions (B-E). Gray lines represent connections that are proportional to α and thus removed in the approximated model to perform the Plefka expansions, while solid black lines are conserved and dashed lines are free parameters. Plefka[t−1, t] (B) retrieves the classical mean-field equations [38, 39]. Plefka[t] (C) results in a novel method which preserves correlations of the system at t−1, incorporating equations similar to [40]. Plefka[t−1] (D) assumes independent activity at t−1, and in its first order approximation reproduces the results in [41]. Plefka2[t] (E) represents a novel pairwise approximation which performs better in estimating correlations.

The mean-field approximations in the previous section were obtained by using the model represented in Fig. 2B, where the couplings at times t−1 and t are affected by α. Below, we present systematic applications of the Plefka expansions around other reference models in order to approximate the original distribution (Fig. 2C-E). By doing so, we not only unify the previously reported mean-field approximations but also provide novel solutions that can provide more precise approximations than known methods.

Plefka[t]: Expansion around an independent model at time t

For the Plefka[t−1, t] approximation, explained above, the system becomes independent for α = 0 at t as well as t−1. This leads to approximations of m_t, C_t, D_t being specified by m_{t−1}, while being independent of C_{t−1} and D_{t−1}. In [40], the authors describe a mean-field approximation by performing a new expansion over the classical nMF and TAP equations that takes into account previous correlations C_{t−1}. Here, our framework allows us to obtain similar results by considering only a Plefka expansion over manifold Q_t while assuming that we know the properties of P(s_{t−1}) (Fig. 2C). Therefore, we denote this approximation as P^{[t]}_α and consider

  P^{[t]}_α(s_t) = ∑_{s_{t−1}} P_α(s_t | s_{t−1}) P(s_{t−1}).    (23)

In Supplementary Note 3 we derive the equations for this approximation. For the first order, we obtain

  m_{i,t} ≈ tanh[H_i + ∑_j J_{ij} m_{j,t−1}],    (24)
  C_{ik,t} ≈ 0,  i ≠ k,    (25)
  D_{il,t} ≈ (1 − m_{i,t}²) ∑_j J_{ij} C_{jl,t−1}.    (26)

Note that Eqs. 24 and 25 are the same as the nMF Plefka[t−1, t] equations. Eq. 26 includes C_{t−1}, being exactly the same result obtained in [40, Eq. 4]. The second order approximation leads to:

  m_{i,t} ≈ tanh[H_i + ∑_j J_{ij} m_{j,t−1} − m_{i,t} ∑_{jl} J_{ij} J_{il} C_{jl,t−1}],    (27)
  C_{ik,t} ≈ (1 − m_{i,t}²)(1 − m_{k,t}²) ∑_{jl} J_{ij} J_{kl} C_{jl,t−1},  i ≠ k,    (28)
  D_{il,t} ≈ (1 − m_{i,t}²) ∑_j J_{ij} C_{jl,t−1} (1 + 2 J_{il} m_{i,t} m_{l,t−1}).    (29)

All update rules include the effect of C_{t−1}. We can see that if we use the covariance matrix of the independent model at t−1, we recover the results of P^{[t−1:t]}_α in the previous section. In contrast with [40], we provide a novel approximation method that depends on previous correlations using a single expansion (instead of two subsequent expansions), and additionally present approximated equal-time correlations.

Plefka[t−1]: Expansion around an independent model at time t−1

In [41], a mean-field method is proposed by approximating the effective field h_t as the sum of a large number of independent spins, approximated by a Gaussian distribution, yielding exact results for fully asymmetric networks in the thermodynamic limit. In our framework, we describe this approximation as an expansion around the projection point from P(s_{t−1}) to the submanifold Q_{t−1}, using a model where only s_{t−1} are independent (Fig. 2D). In this case (see Supplementary Note 4), the effective field h_t at the submanifold is a sum of independent terms, which for large N yields a Gaussian distribution. By defining

  P^{[t−1]}_α(s_t, s_{t−1}) = ∑_{s_{t−2}} P(s_t | s_{t−1}) P_α(s_{t−1} | s_{t−2}) P(s_{t−2}),    (30)

we see that now the expansion is defined for the marginal distribution of the path {s_{t−1}, s_t} (see Supplementary Note 1). The first order equations for this method are

  m_{i,t} ≈ ∫ Dx tanh[H_i + ∑_j J_{ij} m_{j,t−1} + x √Δ_{i,t}],    (31)
  C_{ik,t} ≈ ∫ D^{ρ_ik}_{xy} tanh[H_i + ∑_j J_{ij} m_{j,t−1} + x √Δ_{i,t}] · tanh[H_k + ∑_l J_{kl} m_{l,t−1} + y √Δ_{k,t}] − m_{i,t} m_{k,t},  i ≠ k,    (32)
  D_{il,t} ≈ ∑_j J_{ij} C_{jl,t−1} ∫ Dx (1 − tanh²[H_i + ∑_j J_{ij} m_{j,t−1} + x √Δ_{i,t}]).    (33)

Here we use Dx = dx/√(2π) exp(−x²/2), D^{ρ_ik}_{xy} = dx dy / (2π √(1 − ρ_ik²)) exp(−[(x² + y²)/2 − ρ_ik x y] / (1 − ρ_ik²)), Δ_{i,t} = ∑_j J_{ij}² (1 − m_{j,t−1}²), and ρ_ik = ∑_j J_{ij} J_{kj} (1 − m_{j,t−1}²) / √(Δ_{i,t} Δ_{k,t}). Derivations are described in Supplementary Note 4. These results are exactly the same as those presented for m_t, D_t in [41], adding an additional expression for C_t. For this approximation, we do not consider the second order equations since they are computationally much more expensive than the other approximations.

Plefka2[t]: Expansion around a pairwise model

The proposed framework is also a powerful tool to develop novel Plefka expansions. To make the expansions more accurately approximate target statistics, we can consider a reference manifold composed of multiple time steps while maintaining some of the parameters in the system (see Supplementary Note 1). Motivated by this idea, here we propose new methods that directly approximate pairwise activities of the units by choosing a reference manifold that preserves a coupling term.

Let us first consider the joint probability of any arbitrary pair of units at times t−1 and t to compute the delayed correlations (Fig. 2E, left). Namely, we consider the joint probability of spins s_{i,t} and s_{l,t−1}:

  P(s_{i,t}, s_{l,t−1}) = ∑_{s_{\l,t−1}, s_{t−2}} P(s_{i,t} | s_{t−1}) P(s_{t−1} | s_{t−2}) P(s_{t−2}),    (34)

with s_{\l,t−1} containing all elements of s_{t−1} except s_{l,t−1}. As a reference manifold Q_{t−1:t}, we consider the dependency among only the units i and l:

  Q(s_{i,t}, s_{l,t−1}) = Q(s_{i,t} | s_{l,t−1}) Q(s_{l,t−1}) = [e^{s_{i,t} θ_{i,t}(s_{l,t−1})} / (2 cosh θ_{i,t}(s_{l,t−1}))] · [e^{s_{l,t−1} Θ_{l,t−1}} / (2 cosh Θ_{l,t−1})],    (35)

where θ_{i,t}(s_{l,t−1}) = Θ_{i,t} + Δ_{il,t} s_{l,t−1}.
Orthogonal projection to Q_{t−1:t} is equivalent to minimizing the KL divergence D(P‖Q) with respect to the parameters:

  ∂D(P‖Q)/∂Θ_{i,t} |_{θ_t = θ*_t, Θ_{t−1} = Θ*_{t−1}} = m^{Q*}_{i,t} − m^P_{i,t} = 0,    (36)
  ∂D(P‖Q)/∂Θ_{l,t−1} |_{θ_t = θ*_t, Θ_{t−1} = Θ*_{t−1}} = m^{Q*}_{l,t−1} − m^P_{l,t−1} = 0,    (37)
  ∂D(P‖Q)/∂Δ_{il,t} |_{θ_t = θ*_t, Θ_{t−1} = Θ*_{t−1}} = ⟨s_{i,t} s_{l,t−1}⟩_{(Q*·P)} − ⟨s_{i,t} s_{l,t−1}⟩_P = 0,    (38)

with

  ⟨x⟩_{(Q*·P)} = ∑_{s_{i,t}, s_{l,t−1}} x Q(s_{i,t} | s_{l,t−1}, θ*_t, Θ*_{t−1}) P(s_{l,t−1}).    (39)

As in the previous approximations, P(s_{i,t}, s_{l,t−1}) is connected to Q(s_{i,t}, s_{l,t−1} | θ*_t, Θ*_{t−1}) through an α-dependent probability

  P^{2[t]}_α(s_{i,t}, s_{l,t−1}) = ∑_{s_{\l,t−1}, s_{t−2}} P_α(s_{i,t} | s_{t−1}) P_α(s_{l,t−1} | s_{t−2}) P(s_{\l,t−1} | s_{t−2}) P(s_{t−2}),    (40)

with conditional probabilities given by

  P_α(s_{i,t} | s_{t−1}) = e^{s_{i,t} h_{i,t}(α)} / (2 cosh h_{i,t}(α)),  h_{i,t}(α) = (1 − α) θ_{i,t}(s_{l,t−1}) + α (H_i + ∑_j J_{ij} s_{j,t−1}),    (41)
  P_α(s_{l,t−1} | s_{t−2}) = e^{s_{l,t−1} h_{l,t−1}(α)} / (2 cosh h_{l,t−1}(α)),  h_{l,t−1}(α) = (1 − α) Θ_{l,t−1} + α (H_l + ∑_n J_{ln} s_{n,t−2}).    (42)

As in the cases above, we can calculate the equations for the first and second order approximations (see Supplementary Note 5).
Here, for the second order approximation (which is more accurate than the first order) we have:

\theta^*_{i,t}(s_{l,t-1}) \approx H_i + \sum_j J_{ij} m_{j,t-1} + J_{il} (s_{l,t-1} - m_{l,t-1}) + \Big( \sum_{j \neq l, n} J_{ij} J_{ln} D_{jn,t-1} \Big) (s_{l,t-1} - m_{l,t-1}) - \tanh \theta^*_{i,t}(s_{l,t-1}) \sum_{jn \neq l} J_{ij} J_{in} C_{jn,t-1},   (43)

\Theta^*_{l,t-1} \approx H_l + \sum_n J_{ln} m_{n,t-2} - m_{l,t-1} \sum_{mn} J_{lm} J_{ln} C_{mn,t-2},   (44)

which directly leads to calculation of means and delayed correlations as:

m_{i,t} \approx \sum_{s_{l,t-1}} \tanh \theta^*_{i,t}(s_{l,t-1}) \, Q^*(s_{l,t-1}),   (45)

m_{l,t-1} \approx \tanh \Theta^*_{l,t-1},   (46)

D_{il,t} \approx \sum_{s_{l,t-1}} \tanh \theta^*_{i,t}(s_{l,t-1}) \, s_{l,t-1} Q^*(s_{l,t-1}) - m_{i,t} m_{l,t-1}.   (47)

These results are related to previous work [43] that included autocorrelations as one of the constraints to derive the Plefka approximation. Instead, here we provide a Plefka approximation that includes delayed correlations between any pair of units.

To compute the above approximations, we need to know C_{t-1} and C_{t-2}. Here, we provide similar pairwise Plefka approximations for the pairwise distribution at time t, P(s_{i,t}, s_{k,t}). Since s_{i,t}, s_{k,t} are conditionally independent given s_{t-1}, we can construct a model in which first s_{k,t} is computed from s_{t-1}, and then s_{i,t} is computed conditioned on s_{k,t}, s_{t-1} (Fig. 2E, right):

Q(s_{i,t}, s_{k,t}) = Q(s_{i,t} | s_{k,t}) Q(s_{k,t}),   (48)

P^{[t]}_\alpha(s_{i,t}, s_{k,t}) = \sum_{s_{t-1}} P_\alpha(s_{i,t} | s_{k,t}, s_{t-1})   (49)
\cdot P_\alpha(s_{k,t} | s_{t-1}) P(s_{t-1}),   (50)

with conditional probabilities given by

P_\alpha(s_{i,t} | s_{k,t}, s_{t-1}) = \frac{e^{s_{i,t} h_{i,t}(\alpha)}}{2 \cosh h_{i,t}(\alpha)},   (51)
h_{i,t}(\alpha) = (1 - \alpha)\, \theta_{i,t}(s_{k,t}) + \alpha \big( H_i + \sum_j J_{ij} s_{j,t-1} \big),

P_\alpha(s_{k,t} | s_{t-1}) = \frac{e^{s_{k,t} h_{k,t}(\alpha)}}{2 \cosh h_{k,t}(\alpha)},   (52)
h_{k,t}(\alpha) = (1 - \alpha)\, \Theta_{k,t} + \alpha \big( H_k + \sum_l J_{kl} s_{l,t-1} \big).

Note that \theta_{i,t} is a function of s_{k,t} that accounts for equal-time correlations between s_{i,t} and s_{k,t}.
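As a rough numerical sketch (not the paper's reference implementation), note that Eq. 43 is implicit in θ*, so for each value s_{l,t-1} = ±1 it can be solved by a plain fixed-point iteration before evaluating Eqs. 45 and 47. All names are illustrative, plain fixed-point iteration is one of several possible solvers, and the previous-step statistics are taken as given; here Q*(s_{l,t-1}) is parameterized directly by the known mean m_{l,t-1}:

```python
import numpy as np

def plefka2_delayed(i, l, H, J, m_prev, C_prev, D_prev, n_iter=50):
    """Sketch of the second-order Plefka2[t] update (Eqs. 43, 45, 47)
    for one pair (i, l), assuming the previous-step means m_prev,
    equal-time correlations C_prev and delayed correlations D_prev
    are known."""
    N = len(m_prev)
    theta = {}
    for s_l in (-1, 1):
        # terms of Eq. 43 that do not involve theta* itself
        a = H[i] + J[i] @ m_prev + J[i, l] * (s_l - m_prev[l])
        a += sum(J[i, j] * J[l, n] * D_prev[j, n]
                 for j in range(N) if j != l
                 for n in range(N)) * (s_l - m_prev[l])
        # coefficient of the tanh(theta*) correction term
        b = sum(J[i, j] * J[i, n] * C_prev[j, n]
                for j in range(N) for n in range(N)
                if j != l and n != l)
        x = a
        for _ in range(n_iter):       # fixed-point iteration of Eq. 43
            x = a - np.tanh(x) * b
        theta[s_l] = x
    Q = {s_l: (1 + s_l * m_prev[l]) / 2 for s_l in (-1, 1)}  # Q*(s_l)
    m_i = sum(np.tanh(theta[s]) * Q[s] for s in (-1, 1))      # Eq. 45
    D_il = (sum(np.tanh(theta[s]) * s * Q[s] for s in (-1, 1))
            - m_i * m_prev[l])                                # Eq. 47
    return m_i, D_il

# illustrative small example with weak random couplings
rng = np.random.default_rng(1)
N = 4
H = 0.1 * rng.normal(size=N)
J = 0.3 * rng.normal(size=(N, N)) / np.sqrt(N)
m_prev = np.tanh(rng.normal(size=N))
C_prev = 0.01 * rng.normal(size=(N, N))
D_prev = 0.01 * rng.normal(size=(N, N))
m_i, D_il = plefka2_delayed(0, 1, H, J, m_prev, C_prev, D_prev)
```

Because m_{i,t} is a convex combination of tanh values, the sketch automatically keeps the approximated mean within [-1, 1], whatever the couplings.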
Computed similarly to the delayed correlations, the second order approximation yields (see Supplementary Note 5):

\theta^*_{i,t}(s_{k,t}) \approx H_i + \sum_j J_{ij} m_{j,t-1} + \Big( \sum_{jl} J_{ij} J_{kl} C_{jl,t-1} \Big) (s_{k,t} - m_{k,t}) - \tanh \theta^*_{i,t}(s_{k,t}) \sum_{jl} J_{ij} J_{il} C_{jl,t-1},   (53)

\Theta^*_{k,t} \approx H_k + \sum_l J_{kl} m_{l,t-1} - m_{k,t} \sum_{jl} J_{kj} J_{kl} C_{jl,t-1}.   (54)

Using these equations, approximate equal-time correlations are given as

C_{ik,t} \approx \sum_{s_{k,t}} \tanh \theta^*_{i,t}(s_{k,t}) \, s_{k,t} Q^*(s_{k,t}) - m_{i,t} m_{k,t}.   (55)

Note that the approximation of equal-time correlations may not be symmetric for C_{ik,t} and C_{ki,t}. In the results of this paper we use the average of the two.

Comparison of the different approximations
In the subsequent sections, we compare the family of Plefka approximation methods described above by testing their performance in the forward and inverse Ising problems. More specifically, we compare the second order approximations of Plefka[t-1,t] and Plefka[t], the first order approximation of Plefka[t-1], and the second order approximation of the proposed Plefka2[t]. We define an Ising model as an asymmetric version of the kinetic Sherrington-Kirkpatrick (SK) model, setting its parameters around the equivalent of a ferromagnetic phase transition in the equilibrium SK model. External fields H_i are sampled from independent uniform distributions U(-\beta H_0, \beta H_0), H_0 = 0.5, whereas coupling terms J_{ij} are sampled from independent Gaussian distributions N(\beta J_0 / N, \beta^2 J_\sigma^2 / N), J_0 = 1, J_\sigma = 0.1, where \beta is a scaling parameter (i.e., an inverse temperature).

FIG. 3. Forward Ising problem. Top: Evolution of average activation rates (magnetizations) (A), equal-time correlations (B) and delayed correlations (C) found by different mean-field methods for \beta = \beta_c. Middle: Comparison of the activation rates (D), equal-time correlations (E) and delayed correlations (F) found by the different Plefka approximations (ordinate, p superscript) with the original values (abscissa, o superscript) for \beta = \beta_c and t = 128. Black lines represent the identity line. Bottom: Average squared error of the magnetizations \epsilon_m = \langle\langle (m^o_{i,t} - m^p_{i,t})^2 \rangle_i \rangle_t (G), equal-time correlations \epsilon_C = \langle\langle (C^o_{ik,t} - C^p_{ik,t})^2 \rangle_{ik} \rangle_t (H), and delayed correlations \epsilon_D = \langle\langle (D^o_{il,t} - D^p_{il,t})^2 \rangle_{il} \rangle_t (I) for 21 values of \beta in a range around \beta_c.

Generally, mean-field methods are suitable for approximating properties of systems with small fluctuations. However, there is evidence that many biological systems operate in critical, highly fluctuating regimes [5, 6]. In order to examine the different approximations in such a biologically plausible yet challenging situation, we select the model parameters around a phase transition point displaying large fluctuations.

To find such conditions, we employed path integral methods to solve the asymmetric SK model (Supplementary Note 6). We find that, for our choice of parameters, the stationary solution of the asymmetric model displays a non-equilibrium analogue of a critical point for a ferromagnetic phase transition at an inverse temperature \beta_c; the nonzero field H shifts the phase transition point away from the value \beta = 1 obtained at H = 0. By simulation of the finite size system, we confirmed that the maximum fluctuations in the model are found near the theoretical \beta_c, which shows maximal covariance values (see Supplementary Note 6, Supplementary Fig. 2).

Fluctuations of a system are generally expected to be maximized at a critical phase transition [19]. In addition, entropy production (a signature of time irreversibility) has been suggested as an indicator of phase transitions. For example, it presents a peak at the transition point of a continuous phase transition in a non-equilibrium Curie-Weiss Ising model with an oscillatory field [50] and in some instances of mean-field majority vote models [51, 52]. We found that the entropy production of the kinetic Ising system is also maximized around \beta_c (discussed later, see also Methods for its derivation).

Forward Ising problem
We examine the performance of the different Plefka expansions in predicting the evolution of an asymmetric SK model of size N = 512 with random H and J. To study the non-stationary transient dynamics of the model, we start from s_0 = 1 (all elements set to 1 at t = 0) and recursively update the state for T = 128 steps. We repeated this stochastic simulation for R trials for 21 values of \beta in a range around \beta_c, except for the reconstruction of the phase transition, where we used 201 values of \beta in the same range. Using the R samples, we computed the statistical moments and cumulants of the system, m_t, C_t, and D_t, at each time step. We then computed their averages over the system units, i.e., \langle m_{i,t} \rangle_i, \langle C_{ik,t} \rangle_{ik} and \langle D_{il,t} \rangle_{il}, where the angle bracket denotes an average over the indices in its subscript.

The black solid lines in Fig. 3A,B,C display the non-stationary dynamics of these averaged statistics from t = 0 to t = 128 for \beta = \beta_c. In comparison, the colored lines display these statistics predicted by the family of Plefka approximations, recursively computed using the obtained equations and starting from the initial state m_0 = 1, C_0 = 0 and D_0 = 0. We observe that although the recursive application of all the approximation methods provides good predictions for the transient dynamics of the mean activation rates m_t until its convergence (Fig. 3A), the predictions using Plefka[t] and especially the proposed Plefka2[t] approximations are closer to the true dynamics than the others. The evolution of the mean equal-time and time-delayed correlations C_t, D_t is precisely captured only by our new method Plefka2[t]. In contrast, Plefka[t] overestimates correlations while Plefka[t-1] and Plefka[t-1,t] underestimate them.

The performance of the methods in predicting individual activation rates and correlations is displayed in Fig. 3D,E,F by comparing the vectors m_t, C_t and D_t at the last time step (t = 128) of the original model (o superscript) and those of the Plefka approximations (p superscript). For the activation rates m_t, the proposed Plefka2[t] and Plefka[t] perform slightly better than the others (see also Fig. 3A). While being overestimated by Plefka[t], underestimated moderately by Plefka[t-1] and significantly by Plefka[t-1,t], equal-time and time-delayed correlations C_t, D_t are best predicted by Plefka2[t] (Fig. 3E,F).

The above results are obtained at the critical point \beta = \beta_c, intuitively the most challenging regime for mean-field approximations. In order to further show that our novel approximation Plefka2[t] systematically outperforms the others in a wider parameter range, we repeated the analysis for different inverse temperatures \beta (the same random parameters are applied for all \beta). Fig. 3G,H,I respectively show the average squared errors (averaged over time and units) of the activation rates \epsilon_m, equal-time correlations \epsilon_C and delayed correlations \epsilon_D between the original model and the approximations. Figs. 3G,H,I show that Plefka2[t] outperforms the other methods in computing m_t, C_t, D_t (with the exception of a certain region of \beta > \beta_c in which Plefka[t] is slightly better), consistently yielding a low error for all values of \beta. Errors of these approximations are smaller when the system is away from \beta_c.

Inverse Ising problem
We apply the approximation methods to the inverse Ising problem by using the data generated above for the trajectory of T = 128 steps and R trials to infer the parameters of the model, H and J. The model parameters are estimated by the Boltzmann learning method under the maximum likelihood principle: H and J are updated to minimize the differences between the average rates m_t or delayed correlations D_t of the original data and the model approximations (see Methods). While Boltzmann learning requires computing the likelihood of every point in a trajectory and every trial (RT calculations) at each iteration, we can instead estimate the gradient at each iteration in a one-shot computation by applying the Plefka approximations, which can significantly reduce computational time (Methods). At \beta = \beta_c (Fig. 4A,B), we observe that the classical Plefka[t-1,t] approximation adds significant offset values to the fields H and couplings J. In contrast, Plefka[t], Plefka[t-1] and Plefka2[t] are all precise in estimating the values of H and J.

Figure 4C and D show the mean squared errors \epsilon_H, \epsilon_J for bias terms and couplings between the original model and the inferred values for different \beta. In this case, errors are large in the estimation of J for Plefka[t-1,t]. In comparison, Plefka[t], Plefka[t-1] and Plefka2[t] work equally well even in the high fluctuation regime (\beta \approx \beta_c). Since the inverse Ising problem is solved by applying the approximation a single time step at a time (per iteration), it is not as challenging as the forward problem, which can accumulate errors by recursively applying the approximations. Therefore, the approximations other than the classical mean-field Plefka[t-1,t] perform equally well in this case.

Phase transition reconstruction
We have shown how the different methods perform in computing the behavior of the system (forward problem) and in inferring the parameters of a given network from its activation data (inverse problem). Combining the two, we can ask how well the methods explored here can reconstruct the behavior of a system from data, potentially exploring behaviors under different conditions than the recorded data.

FIG. 4. Inverse Ising problem. Top: Inferred external fields (A) and couplings (B) found by different mean-field models plotted versus the real ones for \beta = \beta_c. Black lines represent the identity lines. Bottom: Average squared error of inferred external fields \epsilon_H = \langle (H^o_i - H^p_i)^2 \rangle_i (C) and couplings \epsilon_J = \langle (J^o_{ij} - J^p_{ij})^2 \rangle_{ij} (D) for 21 values of \beta in a range around \beta_c.

First, in Fig. 5A-C we examine how the different approximation methods approximate fluctuations (equal-time and time-delayed covariances) and the entropy production (see Methods) at t = 128, after solving the forward problem by recursively applying the approximations for the 128 steps. As we mentioned above, the asymmetric SK model explored here presents maximum fluctuations and maximum entropy production around \beta = \beta_c (Supplementary Note 6, Supplementary Fig. 2). However, we see that Plefka[t-1,t] and Plefka[t-1] underestimate C_t and D_t of the original SK model around the transition point. Plefka[t] and Plefka2[t] show much better performance in capturing the behavior of C_t and D_t at the phase transition, although Plefka[t] overestimates both correlations. Additionally, all the methods capture the phase transition in entropy production, though Plefka[t] overestimates its value around \beta_c and Plefka2[t] is more precise than the other methods.

Next, we combine the forward and inverse Ising problems and try to reproduce the transition of the asymmetric SK model in the models inferred from the data. We first obtain the values of H, J by solving the inverse problem from the data sampled at \beta = \beta_c, and next we solve again the forward problem with those estimated parameters re-scaled by a new inverse temperature \tilde{\beta}. The results for the correlations (Fig. 5D,E) show that in this case Plefka[t-1,t] works badly, not being able to capture the transition. Plefka[t-1] shows a similar performance as in the forward problem, and Plefka[t] and Plefka2[t] behave similarly, slightly underestimating fluctuations. When we analyze the entropy production of the system (Fig. 5F), we find that Plefka2[t] exhibits better performance with a high precision, with Plefka[t] and Plefka[t-1] underestimating it, and Plefka[t-1,t] not capturing the phase transition. Overall, the results above suggest that Plefka2[t] is better suited to identify non-equilibrium phase transitions in models reconstructed from experimental data.

DISCUSSION
We have proposed a framework that unifies different mean-field approximations of the evolving statistical properties of non-equilibrium Ising models. This allows us to derive approximations premised on specific assumptions about the correlation structure of the system previously proposed in the literature. Furthermore, using our framework we derive a new approximation (Plefka2[t]) using atypical assumptions for mean-field methods, i.e., the maintenance of pairwise correlations in the system. This new pairwise approximation outperforms existing ones for approximating the behavior of an asymmetric SK model near the non-equilibrium equivalent of a ferromagnetic phase transition (see Supplementary Note 6), where classical mean-field approximations face problems. This shows that the proposed methods are useful tools to analyze large-scale, non-equilibrium dynamics near the critical regimes expected for biological and social systems. However, we note that low-temperature spin phases (e.g. the spin-glass phase in symmetric models) also impose limitations on mean-field approximations [32, 41], which could be further explored with methods like the ones presented here.

FIG. 5. Reconstructing phase transitions of kinetic Ising systems. Top: Average of the Ising model's equal-time correlations (A), delayed correlations (B), and entropy production (shown as an exponential for better presentation of its maximum) (C), at the last step t = 128, found by different mean-field methods. Bottom (D, E, F): The same as above using the network H, J reconstructed by solving the inverse Ising problem at \beta = \beta_c and multiplying the estimated parameters by a fictitious inverse temperature \tilde{\beta}. The stars mark the values of \tilde{\beta} that yield maximum fluctuations or maximum entropy production.

The generality of this framework allows us to picture other approximations with atypical assumptions. For example, the Sessak-Monasson expansion [53] for an equilibrium Ising model assumes a linear relation between \alpha and spin correlations. An equivalent expansion could use an effective field h(\alpha) nonlinearly dependent on \alpha, satisfying the linear relations C_t(\alpha) = \alpha C_t or D_t(\alpha) = \alpha D_t. As another extension, Plefka2[t] could incorporate higher-order interactions. As Eqs. 43 and 53 are each equivalent to two mean-field approximations (one for each value s_{l,t-1} = \pm 1), an extension of Plefka2[t] to marginals over M units would involve 2^{M-1} equations, increasing accuracy but also computational costs. In general, reference models Q(s_t) set coupling parameters of the model to zero at some steps of its dynamics. Other parameters (e.g. fields) are either free parameters fitted by an m-projection from P(s_t), or preserved at their original value (see Supplementary Note 7 for a comparison of the free and fixed parameters of each model). Augmenting accuracy by increasing the number of parameters often involves a computational cost. As a practical guideline for using each method, Supplementary Note 7 compares their precision and computation time in the forward and inverse problems (see also Supplementary Figs. 3, 4).

Aside from its theoretical implications, our unified framework offers analysis tools for diverse data-driven research fields. In neuroscience, it has been popular to study the activity of ensembles of neurons by inferring an equilibrium Ising model with homogeneous (fixed) parameters [23] or inhomogeneous (time-dependent) parameters [25, 54] from empirical data. Extended analyses based on the equilibrium model have reported that neurons operate near a critical regime [5, 6].
However, studies of non-equilibrium dynamics in neural spike trains are scarce [7, 26, 55], partly due to the lack of systematic methods for analysing large-scale non-equilibrium data from neurons exhibiting large fluctuations. The proposed pairwise model Plefka2[t] is suitable for simulating such network activities, being more accurate than previous methods in predicting the network evolution at criticality (Fig. 3) and in testing whether the system is near the maximally fluctuating regime (Fig. 5). In particular, application of our methods for computing entropy production in non-equilibrium systems could provide tools for characterizing the non-equilibrium dynamics of neural systems [56].

In summary, a unified framework of mean-field theories offers a systematic way to construct suitable mean-field methods in accordance with the statistical properties of the systems researchers wish to uncover. This is expected to foster a variety of tools to analyze large-scale non-equilibrium systems in physical, biological, and social domains.

METHODS

Boltzmann learning in the inverse Ising problem
Let S^r_t = \{S^r_{1,t}, S^r_{2,t}, \ldots, S^r_{N,t}\} for t = 1, \ldots, T be observed states of a process described by Eq. 1 at the r-th trial (r = 1, \ldots, R). We also define S_T to represent the processes from all trials. The inverse Ising problem consists in inferring the external fields H and couplings J of the system. These parameters can be estimated by maximizing the log-likelihood l(S_T) of the observed states under the model:

l(S_T) = \log \prod_{t=1}^T \prod_{r=1}^R P(S^r_t | S^r_{t-1}) = \sum_t \sum_r \sum_i \big( S^r_{i,t} h^r_{i,t} - \log 2 \cosh h^r_{i,t} \big),   (56)

with h^r_{i,t} = H_i + \sum_j J_{ij} S^r_{j,t-1}. The learning steps are obtained as:

\frac{\partial l(S_T)}{\partial H_i} = RT \big( \langle S^r_{i,t} \rangle_{r,t} - \langle \tanh h^r_{i,t} \rangle_{r,t} \big),   (57)

\frac{\partial l(S_T)}{\partial J_{il}} = RT \big( \langle S^r_{i,t} S^r_{l,t-1} \rangle_{r,t} - \langle \tanh h^r_{i,t} \, S^r_{l,t-1} \rangle_{r,t} \big),   (58)

where \langle \cdot \rangle_{r,t} denotes an average over trials and time steps. We solve the inverse Ising problem by applying these equations as a gradient ascent rule adjusting H and J. The second terms of Eqs. 57, 58 need to be computed at every iteration, thus the computational cost grows linearly with R \times T. However, the use of mean-field approximations can significantly reduce the cost when a large number of samples R and time bins T are used to correctly estimate activation rates and correlations in large networks. Here the second terms can be written as

\langle \tanh h^r_{i,t} \rangle_{r,t} = \sum_{s, \tilde{s}} s_i P(s | \tilde{s}) P(\tilde{s}) = m_i,   (59)

\langle \tanh h^r_{i,t} \, S^r_{l,t-1} \rangle_{r,t} = \sum_{s, \tilde{s}} s_i \tilde{s}_l P(s | \tilde{s}) P(\tilde{s}) = D_{il} + m_i \tilde{m}_l,   (60)

where P(\tilde{s}) = \frac{1}{RT} \sum_{r,t} \delta(\tilde{s}, S^r_t) is the empirical distribution averaged over trials and trajectories (with \delta being a Kronecker delta) and \tilde{m}_l is the average activation rate computed from the empirical distribution. P(s | \tilde{s}) is defined as in Eq. 1.
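Before applying the mean-field shortcut, the exact gradients of Eqs. 57-58 can be evaluated directly from data. The sketch below uses illustrative names, assumes trial data of shape (R, T+1, N), and returns the gradients normalized by RT (i.e., Eqs. 57-58 without the leading factor):

```python
import numpy as np

def boltzmann_gradients(S, H, J):
    """Per-sample log-likelihood gradients of Eqs. 57-58 for data S
    with shape (R, T+1, N): R trials, each with T transitions."""
    R, Tp1, N = S.shape
    T = Tp1 - 1
    S_prev, S_next = S[:, :-1, :], S[:, 1:, :]
    h = H + S_prev @ J.T            # h[r, t, i] = H_i + sum_j J_ij S_{j,t-1}
    th = np.tanh(h)
    gH = S_next.mean(axis=(0, 1)) - th.mean(axis=(0, 1))
    gJ = (np.einsum('rti,rtl->il', S_next, S_prev)
          - np.einsum('rti,rtl->il', th, S_prev)) / (R * T)
    return gH, gJ

# illustrative example on small random data
rng = np.random.default_rng(0)
R, T, N = 3, 5, 4
S = rng.choice([-1, 1], size=(R, T + 1, N))
gH, gJ = boltzmann_gradients(S, np.zeros(N), np.zeros((N, N)))
```

The mean-field acceleration described in the text replaces the tanh averages in this computation with m_i and D_il obtained from a single application of the Plefka equations to the empirical distribution.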
We then approximate m_i and D_{il} using the mean-field equations. Note that when we apply the mean-field equations, we replace all statistics related to the previous step with those computed from the empirical distribution. By applying the mean-field methods, we reduce the computation over R trials of trajectories of length T to a single computation (instead of RT calculations). In our numerical tests, gradient ascent was executed using learning coefficients \eta_H \propto 1/(RT) and \eta_J = 1/(RT \sqrt{N}), starting from H = 0, J = 0.

Entropy production of the kinetic Ising model
The entropy production is defined as the KL divergence between the forward and backward paths, quantifying the irreversibility of the system [17, 55, 57]:

\langle \sigma_t \rangle = \frac{1}{2} \sum_{s_t, s_{t-1}} \big( P(s_{t-1}) P(s_t | s_{t-1}) - P(s_t) P_B(s_{t-1} | s_t) \big) \log \frac{P(s_t | s_{t-1}) P(s_{t-1})}{P_B(s_{t-1} | s_t) P(s_t)},   (61)

where P_B(s_{t-1} | s_t) is the probability of the backward trajectory, defined as in Eq. 1 but switching s_t and s_{t-1}. Assuming a non-equilibrium steady state, where P(s_t) = P(s_{t-1}), the entropy production of the kinetic Ising system is computed as:

\langle \sigma_t \rangle = \sum_{s_t, s_{t-1}} P(s_t, s_{t-1}) \Big( \sum_i H_i (s_{i,t} - s_{i,t-1}) + \sum_{ij} J_{ij} (s_{i,t} s_{j,t-1} - s_{i,t-1} s_{j,t}) + \log P(s_{t-1}) - \log P(s_t) - \sum_i \log\big(2 \cosh(H_i + \sum_j J_{ij} s_{j,t-1})\big) + \sum_i \log\big(2 \cosh(H_i + \sum_j J_{ij} s_{j,t})\big) \Big) = \sum_{ij} J_{ij} (D_{ij,t} - D_{ji,t}) = \sum_{ij} (J_{ij} - J_{ji}) D_{ij,t}.   (62)

ACKNOWLEDGMENTS
The authors thank Yasser Roudi and Masanao Igarashi for valuable comments and discussions on this manuscript. This work was supported in part by the Cooperative Intelligence Joint Research between Kyoto University and Honda Research Institute Japan, MEXT/JSPS KAKENHI Grant Number JP20K11709, and the grant of Joint Research by the National Institutes of Natural Sciences (NINS Program No. 01112005). M.A. was funded by the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 892715 and the University of the Basque Country UPV/EHU post-doctoral training program grant ESPDOC17/17, and supported in part by the Basque Government project IT 1228-19 and project Outonomy (PID2019-104576GB-I00) by the Spanish Ministry of Science and Innovation.
CODE AVAILABILITY
The source code for implementing the results in this work is available under a GPL license at GitHub https://github.com/MiguelAguilera/kinetic-Plefka-expansions [58] (DOI: 10.5281/zenodo.4357634).
DATA AVAILABILITY
The datasets generated and analysed during the current study are available under a CC BY license at Zenodo https://zenodo.org/record/4318983 [59] (DOI: 10.5281/zenodo.4318983).
COMPETING INTERESTS
The authors declare no competing interests.
AUTHOR CONTRIBUTIONS
M.A., S.A.M. and H.S. designed and reviewed the research; M.A. contributed analytical and numerical results; M.A., S.A.M. and H.S. wrote the paper.
ADDITIONAL INFORMATION

Supplementary information
The online version contains supplementary material available at https://doi.org/10.1038/s41467-021-20890-5.

[1] J. P. Nguyen, F. B. Shipley, A. N. Linder, G. S. Plummer, M. Liu, S. U. Setru, J. W. Shaevitz, and A. M. Leifer, Whole-brain calcium imaging with cellular resolution in freely behaving Caenorhabditis elegans, Proceedings of the National Academy of Sciences, E1074 (2016).
[2] M. B. Ahrens, M. B. Orger, D. N. Robson, J. M. Li, and P. J. Keller, Whole-brain functional imaging at cellular resolution using light-sheet microscopy, Nature Methods, 413 (2013).
[3] C. Stringer, M. Pachitariu, N. Steinmetz, M. Carandini, and K. D. Harris, High-dimensional geometry of population responses in visual cortex, Nature, 1 (2019).
[4] G. Nicolis and I. Prigogine, Self-Organization in Nonequilibrium Systems: From Dissipative Structures to Order through Fluctuations, 1st ed. (Wiley, New York, 1977).
[5] G. Tkačik, T. Mora, O. Marre, D. Amodei, S. E. Palmer, M. J. Berry, and W. Bialek, Thermodynamics and signatures of criticality in a network of neurons, Proceedings of the National Academy of Sciences, 11508 (2015).
[6] T. Mora, S. Deny, and O. Marre, Dynamical Criticality in the Collective Activity of a Population of Retinal Neurons, Physical Review Letters, 078105 (2015).
[7] J. Hertz, Y. Roudi, and J. Tyrcha, Ising model for inferring network structure from spike data, in
Principles of Neural Coding (CRC Press, 2013) pp. 527–546.
[8] Y. Roudi, B. Dunn, and J. Hertz, Multi-neuronal activity and functional connectivity in cell assemblies, Current Opinion in Neurobiology, 38 (2015).
[9] J.-P. Bouchaud, Crises and collective socio-economic phenomena: simple models and challenges, Journal of Statistical Physics, 567 (2013).
[10] D. J. Evans, E. G. D. Cohen, and G. P. Morriss, Probability of second law violations in shearing steady states, Physical Review Letters, 2401 (1993).
[11] C. Jarzynski, Nonequilibrium equality for free energy differences, Physical Review Letters, 2690 (1997).
[12] G. E. Crooks, Nonequilibrium measurements of free energy differences for microscopically reversible Markovian systems, Journal of Statistical Physics, 1481 (1998).
[13] G. E. Crooks, Entropy production fluctuation theorem and the nonequilibrium work relation for free energy differences, Physical Review E, 2721 (1999).
[14] J. L. Lebowitz and H. Spohn, A Gallavotti–Cohen-type symmetry in the large deviation functional for stochastic dynamics, Journal of Statistical Physics, 333 (1999).
[15] S. Ito, M. Oizumi, and S.-i. Amari, Unified framework for the entropy production and the stochastic interaction based on information geometry, Physical Review Research, 033048 (2020).
[16] D. J. Evans and D. J. Searles, The fluctuation theorem, Advances in Physics, 1529 (2002).
[17] U. Seifert, Stochastic thermodynamics, fluctuation theorems and molecular machines, Reports on Progress in Physics, 126001 (2012).
[18] P. Gaspard, Time Asymmetry in Nonequilibrium Statistical Mechanics, in Special Volume in Memory of Ilya Prigogine (John Wiley & Sons, Ltd, 2007) pp. 83–133.
[19] S. R. A. Salinas, The Ising Model, in
Introduction to Statistical Physics, Graduate Texts in Contemporary Physics, edited by S. R. A. Salinas (Springer New York, New York, NY, 2001) pp. 257–276.
[20] D. H. Ackley, G. E. Hinton, and T. J. Sejnowski, A learning algorithm for Boltzmann machines, Cognitive Science, 147 (1985).
[21] A. Witoelar and Y. Roudi, Neural network reconstruction using kinetic Ising models with memory, BMC Neuroscience, P274 (2011).
[22] C. Donner and M. Opper, Inverse Ising problem in continuous time: A latent variable approach, Physical Review E, 062104 (2017).
[23] E. Schneidman, M. J. Berry, R. Segev, and W. Bialek, Weak pairwise correlations imply strongly correlated network states in a neural population, Nature, 1007 (2006).
[24] S. Cocco, S. Leibler, and R. Monasson, Neuronal couplings between retinal ganglion cells inferred by efficient inverse statistical physics methods, Proceedings of the National Academy of Sciences, 14058 (2009).
[25] H. Shimazaki, S.-i. Amari, E. N. Brown, and S. Grün, State-Space Analysis of Time-Varying Higher-Order Spike Correlation for Multiple Neural Spike Train Data, PLOS Computational Biology, e1002385 (2012).
[26] J. Tyrcha, Y. Roudi, M. Marsili, and J. Hertz, The effect of nonstationarity on models inferred from neural data, Journal of Statistical Mechanics: Theory and Experiment, P03005 (2013).
[27] D. J. Thouless, P. W. Anderson, and R. G. Palmer, Solution of 'Solvable model of a spin glass', The Philosophical Magazine: A Journal of Theoretical Experimental and Applied Physics, 593 (1977).
[28] H. J. Kappen and F. B. Rodríguez, Efficient Learning in Boltzmann Machines Using Linear Response Theory, Neural Computation, 1137 (1998).
[29] Y. Roudi, E. Aurell, and J. A. Hertz, Statistical Physics of Pairwise Probability Models, Frontiers in Computational Neuroscience, 10.3389/neuro.10.022.2009 (2009).
[30] Y. Roudi, J. Tyrcha, and J.
Hertz, Ising model for neural data: Model quality and approximate methods for extracting functional connectivity, Physical Review E, 051915 (2009).
[31] C. Donner, K. Obermayer, and H. Shimazaki, Approximate Inference for Time-Varying Interactions and Macroscopic Dynamics of Neural Populations, PLOS Computational Biology, e1005309 (2017).
[32] T. Plefka, Convergence condition of the TAP equation for the infinite-ranged Ising spin glass model, Journal of Physics A: Mathematical and General, 1971 (1982).
[33] T. Tanaka, Mean-field theory of Boltzmann machine learning, Physical Review E, 2302 (1998).
[34] T. Tanaka, A theory of mean field approximation, in Advances in Neural Information Processing Systems (1999) pp. 351–360.
[35] C. Bhattacharyya and S. S. Keerthi, Information geometry and Plefka's mean-field theory, Journal of Physics A: Mathematical and General, 1307 (2000).
[36] T. Tanaka, Information Geometry of Mean-Field Approximation, in Advanced Mean Field Methods: Theory and Practice (MIT Press, 2001).
[37] S. Amari, S. Ikeda, and H. Shimokawa, Information geometry of α-projection in mean-field approximation, in Advanced Mean Field Methods: Theory and Practice (MIT Press, 2001).
[38] H. J. Kappen and J. J. Spanjers, Mean field theory for asymmetric neural networks, Physical Review E, 5658 (2000).
[39] Y. Roudi and J. Hertz, Dynamical TAP equations for non-equilibrium Ising spin glasses, Journal of Statistical Mechanics: Theory and Experiment, P03031 (2011).
[40] Y. Roudi and J. Hertz, Mean Field Theory for Nonequilibrium Network Reconstruction, Physical Review Letters, 048702 (2011).
[41] M. Mézard and J. Sakellariou, Exact mean-field inference in asymmetric kinetic Ising systems, Journal of Statistical Mechanics: Theory and Experiment, L07001 (2011).
[42] H. Mahmoudi and D. Saad, Generalized mean field approximation for parallel dynamics of the Ising model, Journal of Statistical Mechanics: Theory and Experiment, P07001 (2014).
[43] L. Bachschmid-Romano, C.
Battistin, M. Opper, and Y. Roudi, Variational perturbation and extended Plefka approaches to dynamics on random networks: the case of the kinetic Ising model, Journal of Physics A: Mathematical and Theoretical, 434003 (2016).
[44] S. Pressé, K. Ghosh, J. Lee, and K. A. Dill, Principles of maximum entropy and maximum caliber in statistical physics, Reviews of Modern Physics, 1115 (2013).
[45] S. Amari and H. Nagaoka, Methods of Information Geometry, Vol. 191 (American Mathematical Soc., 2007).
[46] S. Amari,
Information Geometry and Its Applications, Vol. 194 (Springer, 2016).
[47] S. Amari, K. Kurata, and H. Nagaoka, Information geometry of Boltzmann machines, IEEE Transactions on Neural Networks, 260 (1992).
[48] M. Oizumi, N. Tsuchiya, and S.-i. Amari, Unified framework for information integration based on information geometry, Proceedings of the National Academy of Sciences, 14817 (2016).
[49] L. K. Saul and M. I. Jordan, Exploiting tractable substructures in intractable networks, in Advances in Neural Information Processing Systems (1996) pp. 486–492.
[50] Y. Zhang and A. C. Barato, Critical behavior of entropy production and learning rate: Ising model with an oscillating field, Journal of Statistical Mechanics: Theory and Experiment, 113207 (2016).
[51] L. Crochik and T. Tomé, Entropy production in the majority-vote model, Physical Review E, 057103 (2005).
[52] C. F. Noa, P. E. Harunari, M. de Oliveira, and C. Fiore, Entropy production as a tool for characterizing nonequilibrium phase transitions, Physical Review E, 012104 (2019).
[53] V. Sessak and R. Monasson, Small-correlation expansions for the inverse Ising problem, Journal of Physics A: Mathematical and Theoretical, 055001 (2009).
[54] E. Granot-Atedgi, G. Tkačik, R. Segev, and E. Schneidman, Stimulus-dependent Maximum Entropy Models of Neural Population Codes, PLOS Computational Biology, e1002922 (2013).
[55] R. Cofré, L. Videla, and F. Rosas, An Introduction to the Non-Equilibrium Steady States of Maximum Entropy Spike Trains, Entropy, 884 (2019).
[56] C. W. Lynn, E. J. Cornblath, L. Papadopoulos, M. A. Bertolero, and D. S. Bassett, Non-equilibrium dynamics and entropy production in the human brain, arXiv preprint arXiv:2005.02526 (2020).
[57] J. Schnakenberg, Network theory of microscopic and macroscopic behavior of master equation systems, Reviews of Modern Physics 48