Modelling time-varying interactions in complex systems: the Score Driven Kinetic Ising Model
Carlo Campajola, Domenico Di Gangi, Fabrizio Lillo, Daniele Tantari
Scuola Normale Superiore, Pisa, Italy; Department of Mathematics, University of Bologna, Italy

July 31, 2020
Abstract
We introduce a generalization of the Kinetic Ising Model using the score-driven approach, which allows the efficient estimation and filtering of time-varying parameters from time series data. We show that this approach allows one to overcome systematic errors in the parameter estimation, and is useful to study complex systems of interacting variables where the strength of the interactions is not constant in time: in particular we propose to quantify the amount of noise in the data and the reliability of forecasts, as well as to discriminate between periods of higher or lower endogeneity in the observed dynamics, namely when interactions are more or less relevant in determining the realization of the observations. We apply our methodology to three different financial settings to showcase some realistic applications, focusing on forecasting high-frequency volatility of stocks, measuring its endogenous component during extreme events in the market, and analysing the strategic behaviour of traders around news releases. We find interesting results on financial systems and, given the widespread use of Ising models in multiple fields, we believe our approach can be efficiently adapted to a variety of settings, ranging from neuroscience to social sciences and machine learning.
Complex systems, characterized by a large number of simple components that interact with each other in a non-linear way, have been an increasingly important field of study over the last decades. Ever since the milestone paper by Anderson [1] the notion that “more is different” has been absorbed in a variety of disciplines, and we now understand that accounting for the complexity of ecologies, societies, economies, physical and biological systems is necessary to obtain insight on how these systems work and evolve. Interactions make the whole more than the sum of its parts [2]: for this reason the effort when modelling complex systems is ultimately directed to understanding how they arise, how to parametrize them into quantitative models and how to estimate them from empirical measurements.

Reality, however, is usually not only complex but also complicated: the way in which events happen is typically the result of a huge number of conditions and parameters, each holding its own significance and making it very hard to compare empirical data with models. One complication that is ubiquitous in real complex systems, and particularly in the ones that cannot be reproduced in laboratory experiments, is non-stationarity. It is often the case in fact that systems change over time, possibly even in response to their own dynamics: traders in financial markets continuously adapt their strategic decision-making to each other’s actions [3] and to new information [4]; prey change their behavior to avoid predators [5]; neurons reinforce (or inhibit) connections in response to stimuli [6].
Making accurate descriptions assuming that all parameters are constant is then frequently very hard if not impossible, resulting either in very strong limitations to sample selection and experimental design or in the necessity to develop models that are able to capture this non-stationarity with reasonable effort and accuracy.

There are examples of successful attempts to overcome this issue: for instance the introduction of temporal networks [7] as the space in which these interactions are embedded has provided suitable methods to account for relations that are confined in time. More generally these network models refer to the broader literature on Hidden Markov Models [8, 9, 10], where the basic assumption is that the observations come from a model whose parameters are dependent on an underlying, hidden Markovian dynamics that makes the system’s state evolve in time. While these approaches shine when the interaction network structure is known, as is the case for instance in transportation networks [11] or interbank networks [12], when the network structure is unknown its inference can be cumbersome and dictates important model selection decisions on how to characterize the hidden Markov dynamics.

In this article we introduce a time-varying parameters generalization of the Kinetic Ising Model (KIM) [13, 14] through the use of the score-driven methodology [15, 16]. Ising models in general are known to be some of the simplest models of complex systems that have been developed in the field of statistical physics, and are at the roots of the theory of collective behavior and phase transitions.
The KIM in particular has been adopted in its standard formulation in a variety of fields, such as neuroscience [17, 18, 19], computational biology [20, 21, 22], economics and finance [23, 24, 25, 26], and has been studied in the machine learning literature [27, 28, 29] to understand recurrent neural network models.

The model describes the time evolution of a set of binary variables s(t) ∈ {−1, +1}^N for t = 1, ..., T, typically called “spins” in the statistical physics literature where it originated, which can influence each other through a time-lagged interaction. We focus on its applications to time series analysis and extend it to allow the presence of time-varying parameters with score-driven dynamics [15, 16], which is a relatively recent and extremely effective method to describe non-stationary time series.

In its standard form the Kinetic Ising Model for time series [30] involves three main sets of parameters: an N × N interaction or coupling matrix J, an N-dimensional vector h, and an N × K matrix b characterizing the interaction with external covariates x(t) ∈ R^K. The model is Markovian with synchronous dynamics, characterized by the transition probability

p(s(t+1) | s(t), x(t); β, J, h, b) = Z^{−1}(t) exp[ β Σ_i s_i(t+1) ( Σ_j J_ij s_j(t) + h_i + Σ_k b_ik x_k(t) ) ]    (1)

where Z(t) is a normalizing constant commonly known as the partition function in statistical mechanics, and β is a parameter that determines the amount of noise in the dynamics, known as the inverse temperature; the smaller β is, the more randomly the dynamics of s(t) evolves, to the point that in the limit β → 0 s(t) becomes a vector of independent Bernoulli random variables with parameter 0.5, while if β → +∞ the dynamics becomes fully deterministic. Typically the quantity inside the inner brackets of Eq. 1 is called the effective field perceived by spin i at time t, and in the following we will refer to it as g_i(t) = Σ_j J_ij s_j(t) + h_i + Σ_k b_ik x_k(t). For ease of notation we can also define the set of static parameters of the KIM, Θ = (J, h, b).

There are three main reasons that motivate our interest in developing a time-varying parameters version of this model. The first is that, as we will argue in the following paragraphs, the introduction of a dynamic noise parameter β(t) allows one to better understand the randomness in the dynamics, quantifying the level of noise at any point in time and thus leading to more informed forecasts. The second reason is that the standard Maximum Likelihood Estimators of the KIM parameters Θ turn out to make systematic errors when the data generating process has time-varying parameters; in particular, the estimated values are different from the time-averaged values that generated the sample. Lastly, our third reason is that by introducing a convenient factorization for the model parameters it is possible to discriminate whether an observation is better explained by endogenous interactions with other variables or by exogenous effects, offering an improved insight on the dynamics that generated the data even when these effects are not constant over time.
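As an illustration of the dynamics in Eq. 1, the following sketch samples one synchronous KIM step (a minimal illustration, not the authors' code; all parameter values are arbitrary). For ±1 spins the conditional probability of s_i(t+1) = +1 reduces to a logistic function of the effective field, p = 1/(1 + exp(−2βg_i)).

```python
import numpy as np

def effective_fields(s, x, J, h, b):
    """g_i(t) = sum_j J_ij s_j(t) + h_i + sum_k b_ik x_k(t) (Eq. 1)."""
    return J @ s + h + b @ x

def kim_step(s, x, J, h, b, beta, rng):
    """Sample s(t+1) given s(t): conditionally independent spins with
    p(s_i(t+1) = +1) = 1 / (1 + exp(-2 * beta * g_i(t)))."""
    g = effective_fields(s, x, J, h, b)
    p_up = 1.0 / (1.0 + np.exp(-2.0 * beta * g))
    return np.where(rng.random(len(s)) < p_up, 1, -1)

rng = np.random.default_rng(0)
N, K = 10, 2
J = rng.normal(0.0, 1.0 / np.sqrt(N), size=(N, N))  # arbitrary couplings
h = np.zeros(N)
b = np.zeros((N, K))
s = rng.choice([-1, 1], size=N)
x = rng.normal(size=K)
s_next = kim_step(s, x, J, h, b, beta=1.0, rng=rng)
```

Iterating `kim_step` for t = 1, ..., T generates a sample path of the model; sending `beta` to 0 makes each spin a fair coin flip, while a very large `beta` makes the update deterministic.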
As mentioned, the fact that parameters are not constant in time is a common problem in complex systems such as financial markets, where for instance it is widely accepted that the volatility of returns is time-dependent, but also in brain networks, where the processing of time-varying stimuli [31, 19, 32] or the spontaneous emergence of thought [33] have been investigated in recent years with more quantitative methods.

To expand on the first point made above, a more practical representation of the effect of having different noise levels is obtained by deriving the theoretical Area Under the ROC Curve (AUC) for the KIM and observing how it varies as a function of β. The AUC is a standard metric to evaluate the performance of binary classifiers [34, 35], which the Kinetic Ising Model de facto is, and relies on the generation of the Receiver Operating Characteristic (ROC) curve based on the predictions ŝ_i(t+1) provided by the model.

A ROC curve is a set of points (FPR(α), TPR(α)), with α ∈ [0, 1] being a free parameter determining the minimum value of p(s_i(t+1) = +1 | s(t), x(t); β, Θ) which is considered to predict ŝ_i(t+1) = 1. If the prediction ŝ_i(t+1) matches the realization s_i(t+1) then the classification is identified as a True Positive (or Negative, if p < α), otherwise it is identified as a False Positive (Negative). The True Positive Rate (TPR) is the ratio of True Positives to the total number of realized Positives, that is True Positives plus False Negatives. Similarly, the False Positive Rate (FPR) is the ratio of False Positives to the total number of realized Negatives:

TPR = TP / (TP + FN);    FPR = FP / (FP + TN)

Figure 1: Theoretical AUC as a function of β assuming g_i is Gaussian distributed with mean ⟨g⟩ and standard deviation σ_g. We see that increasing β has the effect of reducing the uncertainty on the random variable s_i(t+1), keeping g_i unchanged. Grey dashed lines at AUC = 0.5.
We can explicitly derive the analytical form of the theoretical Area Under the Curve, that is the area that lies below the set of points (FPR(α), TPR(α)), assuming the data generating process is well specified and making some assumptions on the distribution of the model parameters. As a reminder, a classifier with AUC = 0.5 is called an uninformed classifier, meaning it makes predictions statistically indistinguishable from random guessing, while values of AUC greater than 0.5 indicate better-than-random performance. The theoretical AUC depends on β and on the unconditional distribution of the effective fields, φ(g), that is the distribution from which any value g_i(t) is sampled.

In Figure 1 we show the result assuming that φ(g) is a Gaussian distribution with mean ⟨g⟩ and standard deviation σ_g. This is the case for instance if the J_ij entries are Gaussian distributed with zero mean, as we show in Appendix A, since g would become a sum of Gaussian variables with random signs given by the values of s(t). We see that the AUC is monotonically increasing with β, but also that the distribution of the static parameters affects the slope with which the curve converges towards 1. Indeed the smaller the mean and variance of the effective fields g_i, the slower the growth of AUC(β).

This result would prove extremely useful if it wasn't for the fact that, in the standard form of the KIM with static parameters, β is not identifiable [36]: indeed it is a common multiplying factor to all the other parameters, meaning that for any two values β_1 and β_2 there are also two sets of parameters Θ_1 and Θ_2 such that p(s(t+1) | s(t); β_1, Θ_1) = p(s(t+1) | s(t); β_2, Θ_2) for all s(t). For this reason in inference problems it is typically assumed that β = 1, incorporating its effect in the size of the other parameters.

As we will see in more detail in Section 2, there is a way in which β can be identified, and it relies on relaxing the assumption that β is constant throughout the whole sample. If the β of Eq.
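The closed-form derivation is left to the paper's Appendix; as a rough numerical cross-check of the behaviour shown in Figure 1, one can estimate the AUC by Monte Carlo under the same Gaussian assumption on the effective fields (a sketch; sample size and parameter values are our own arbitrary choices).

```python
import numpy as np

def kim_auc(beta, g_mean, g_std, n=200_000, seed=0):
    """Monte Carlo AUC of the one-step KIM predictor when the effective
    fields g_i are i.i.d. N(g_mean, g_std^2) and the model is well specified."""
    rng = np.random.default_rng(seed)
    g = rng.normal(g_mean, g_std, size=n)
    p_up = 1.0 / (1.0 + np.exp(-2.0 * beta * g))   # model score for s = +1
    s = rng.random(n) < p_up                        # realized outcomes
    # AUC via the rank-sum (Mann-Whitney) identity
    order = np.argsort(p_up)
    ranks = np.empty(n)
    ranks[order] = np.arange(1, n + 1)
    n_pos = s.sum()
    n_neg = n - n_pos
    return (ranks[s].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

aucs = [kim_auc(beta, g_mean=0.0, g_std=1.0) for beta in (0.0, 0.5, 1.0, 2.0)]
```

Consistently with Figure 1, the estimate sits at 0.5 for β = 0 and grows monotonically with β.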
1 is allowed to be time-varying, the identification problem is limited to its average value ⟨β⟩ (which still needs to be assumed equal to 1), while its local value can be inferred from the data using suitable methods. It is clear that the presence of a time-varying parameter implies the necessity to complicate the model to describe the dynamical laws of the parameter, but thanks to the score-driven methodology we propose it is actually both very easy and very efficient to do so.

This result has implications particularly for forecasting applications: a forecast should be considered more or less reliable by looking at the value of β(t) at the previous instant in time and considering how far above 0.5 the corresponding theoretical AUC lies. In Section 2 we introduce a specification of the KIM which is designed to capture this effect, which we then apply in Section 4.1 to a financial setting.

Having stated some of the motivations that move us towards the development of a KIM with time-varying parameters, let us set the stage to introduce score-driven models by briefly reviewing the theory of time-varying parameters models in discrete time. There is a rich literature on the topic, which has been summarized in the review by Tucci [37] and more recently by Koopman et al. [38]. In general, a time-varying parameters model can be written as

y(t) ∼ p(y(t) | f(t), Y(t−1), Φ_1)    (2a)
f(t) = ψ(f(t−1), f(t−2), ..., Y(t−1), ε(t), Φ_2)    (2b)

where y(t) is a vector of observations sampled from the probability distribution function p, Y(t−
1) is the set of all observations up to time t−1, and f(t) are the parameters which are assumed to be time-varying. The dynamics of those parameters can depend on past observations, on past values of the same parameters, on some external noise ε(t), and on two sets of static parameters Φ_1 and Φ_2.

If the function ψ only contains past values of the time-varying parameters, a noise term and the static parameters, then the model is called a parameter-driven model, whereas if the function ψ can be written as a deterministic function of past observations only, it is called an observation-driven model [39]. Examples of parameter-driven models can be found in the financial econometrics literature looking at the Stochastic Volatility models [40, 41], as well as other examples such as Bauwens and Veredas [42] or Hafner and Manner [43].

The other family is the one of observation-driven models, whose probably most celebrated example is the Generalized AutoRegressive Conditional Heteroscedasticity (GARCH) model [44], where a time series of log-returns is modelled using a time-varying volatility parameter depending deterministically on squared observations up to that time and on past values of volatilities.

The main advantage of adopting an observation-driven model rather than a parameter-driven one lies in its estimation: having time-varying parameters that only depend on observations through a set of static parameters results in a strong reduction of complexity in writing the likelihood of the model, whereas the calculations for most non-trivial parameter-driven models are typically extremely convoluted and computationally intensive.

In this work we focus on one specific class of observation-driven models, that of score-driven or Generalized Autoregressive Score (GAS) models, and their implementation in the case of the Kinetic Ising Model. Originally introduced by Creal et al.
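To make the observation-driven idea concrete, a minimal GARCH(1,1) filter can be written in a few lines (an illustrative sketch with hypothetical parameter values, not tied to any dataset in the paper): the conditional variance σ²(t+1) = ω + α y²(t) + β σ²(t) is a deterministic function of past observations, so the likelihood can be evaluated by a single forward pass.

```python
import numpy as np

def garch11_filter(y, omega, alpha, beta):
    """Filter conditional variances: sigma2(t+1) = omega + alpha*y(t)^2 + beta*sigma2(t).
    Deterministic in past observations, hence observation-driven."""
    sigma2 = np.empty(len(y) + 1)
    sigma2[0] = omega / (1.0 - alpha - beta)   # start at the unconditional variance
    for t, yt in enumerate(y):
        sigma2[t + 1] = omega + alpha * yt**2 + beta * sigma2[t]
    return sigma2

rng = np.random.default_rng(0)
y = rng.normal(0.0, 1.0, size=500)             # placeholder return series
sig2 = garch11_filter(y, omega=0.05, alpha=0.10, beta=0.85)
```

A parameter-driven alternative (e.g. stochastic volatility) would instead add an unobserved noise term to the variance recursion, requiring integration over the latent path to evaluate the likelihood.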
[15] and Harvey [16], they postulate that time-varying parameters depend on observations through the score of the conditional likelihood, that is its gradient.

To better introduce the score-driven methodology, let us consider a sequence of observations {y(t)}, t = 1, ..., T, where each y(t) ∈ R^N, and let us define a model with conditional probability density p(y(t) | f(t)) depending on a vector of time-varying parameters f(t) ∈ R^M. Defining the score as ∇_t = ∂ log p(y(t) | f(t)) / ∂f(t), a score-driven model assumes that the time evolution of f(t) is ruled by the recursive relation

f(t+1) = w + B f(t) + A I^{−1/2}(t) ∇_t    (3)

where w, B and A are a set of static parameters. In this generic form, w is an M-dimensional vector, while A and B are M × M matrices. I^{−1/2}(t) is also an M × M matrix, which we choose to be the inverse of the square root of the Fisher information matrix associated with p(y(t) | f(t)). This is not the only possible choice for this rescaling matrix [15], but we will keep it this way throughout this article as it is the most intuitive way of rescaling the score.

As is clear from Eq. 3, the score drives the time evolution of f(t). This means that, given a form of p(y(t) | f(t)), the sampling of the observations from this distribution results in a deterministic update of the time-varying parameters. The update can remind the reader of a Newton-like method for optimization, in that the parameters are moved towards the maximum of the likelihood at each realization of the observations, while keeping track of the time evolution through the static parameter B.

Another reason to implement a score-driven model is provided by results [45, 46] from information theory about the optimality of this approach compared to any other observation-driven method.

Finally, the score-driven modelling approach provides access to a simple statistical test, developed by Calvori et al.
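A minimal instance of the update in Eq. 3 can be written for a toy model where f(t) is the mean of a unit-variance Gaussian: there the score is the prediction error y(t) − f(t) and the Fisher information equals 1, so the scaled score is the error itself (an illustrative sketch with hypothetical parameter values, not one of the specifications used in the paper).

```python
import numpy as np

def score_driven_mean_filter(y, w, B, A):
    """Score-driven filter (Eq. 3) for f(t), the mean of y(t) ~ N(f(t), 1).
    Score: d log p / d f = y - f; Fisher information: I = 1, so the
    scaled score is simply the prediction error."""
    f = np.empty(len(y) + 1)
    f[0] = w / (1.0 - B)                  # start at the unconditional level
    for t, yt in enumerate(y):
        f[t + 1] = w + B * f[t] + A * (yt - f[t])
    return f

rng = np.random.default_rng(0)
true_mean = np.where(np.arange(400) < 200, -1.0, 1.0)   # a level shift at t = 200
y = true_mean + rng.normal(size=400)
f = score_driven_mean_filter(y, w=0.0, B=0.95, A=0.3)
```

Even though the filter makes no assumption about the level shift, it tracks it: the filtered f(t) is negative in the first regime and positive in the second, which is exactly the filtering behaviour exploited later for β(t).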
[47], which tests whether it is reasonable to assume that a given parameter is time-varying. This is of crucial importance when estimating a model's parameters from data, as knowing whether a parameter can be considered static or should be assumed to be time-varying helps in the definition of models that extract more relevant information from the data and are less prone to overfitting or underfitting problems.

The paper is structured as follows: in Section 2 we formalize two implementations of score-driven Kinetic Ising Models, the Dynamical Noise KIM (DyNoKIM) and the Dynamic Endogeneity KIM (DyEKIM); then in Section 3 we provide a number of tests on simulated data to assess the consistency of the estimation and to showcase the utility of score-driven modelling; in Section 4 we offer three example applications of the two models to financial data; Section 5 concludes the paper.

The Score-Driven KIM
In this section we define the Dynamical Noise Kinetic Ising Model (DyNoKIM), where, as anticipated, the noise parameter β of Eq. 1 is considered to be time-varying and is assumed to follow a score-driven dynamics. To keep the formulas concise, we impose that h_i = b_ik = 0 for all i, k, as it is straightforward to extend the results to any value of h and b. This leads to writing the transition probability as

p(s(t+1) | s(t); J, β(t)) = Z^{−1}(t) Π_i exp[ β(t) Σ_j s_i(t+1) J_ij s_j(t) ]    (4)

with Z(t) = Π_i 2 cosh[ β(t) Σ_j J_ij s_j(t) ].

The interpretation of this model is simple yet extremely useful: the higher the value of β, the smaller the uncertainty over the realization of s(t+1) or, in other words, the more accurate a prediction of the value of s(t+1), as we have shown in Fig. 1.

We still have not explicitly introduced the dynamic rule of motion for the time-varying parameter β(t), which, as was stated above, we choose to be score-driven. We give score-driven dynamics to f(t) = log β(t), as β is positive and inversely related to the noise:

log β(t+1) = w + B log β(t) + A I^{−1/2}(t) ∇_t    (5)

where w, B and A are parameters to be inferred by Maximum Likelihood Estimation (MLE) and I is the Fisher Information matrix. The last term in Eq. 5 includes the score, which is the derivative of the log-likelihood L at a given time t with respect to the time-varying parameter log β(t), reading

∇_t = β(t) Σ_i [ s_i(t+1) − tanh( β(t) Σ_j J_ij s_j(t) ) ] Σ_j J_ij s_j(t)    (6)

The score is rescaled by the inverse of the square root of the Fisher Information, which is used to regularize its impact at different times by considering the convexity of the log-likelihood.
The Fisher Information corresponds to the expectation of the Hessian of the log-likelihood, changed in sign and evaluated at time t:

I(t) = −E[ ∂²L(t) / ∂(log β)² ]|_{β(t)} = −β²(t) ∂²L(t)/∂β² |_{β(t)}

where ∂²L(t)/∂β² |_{β(t)} = −Σ_i [ 1 − tanh²( β(t) Σ_j J_ij s_j(t) ) ] ( Σ_j J_ij s_j(t) )², which does not depend on s(t+1).

In the statistical physics literature there have been several attempts to study similar models: some examples are Penney et al. [48], where a model very similar to the one of Eq. 1 is considered, or the literature on superstatistics of Beck and Cohen [49] and Beck et al. [50], which provides a general theory for physical systems with non-static parameters and in particular studies models where a time-varying noise parameter takes the role of β(t) in Eq. 4. There is however one important difference, which is related to the assumption of local equilibrium and time scale separation that is common to all the cited works. The authors assume that the sampling of the observations and of the time-varying parameters take place on two separated time scales, meaning that the time-varying parameters are locally constant when the observations are sampled. This is not true for score-driven models, which are in fact designed not to require this assumption, as intuitively formalized by the values of the parameters B and A: if B ≫ A then the evolution of f is indeed slower than that of the observations, while if B ≪ A they evolve on the same time scale.

The model estimation procedure we choose to implement is done in steps: in a first stage we only fit the static parameters Θ following the algorithm of Mézard and Sakellariou [36]; we then proceed to a targeted estimation of the score-driven dynamics parameters w, B and A.
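Eqs. 5 and 6 can be combined into a one-step filtering update for log β(t); the sketch below assumes J known and h = b = 0 as in Eq. 4, with hypothetical values for w, B and A (a minimal illustration, not the estimation code of the paper).

```python
import numpy as np

def dynokim_update(logbeta, s_t, s_next, J, w, B, A):
    """One DyNoKIM filtering step: score (Eq. 6) and Fisher information
    of the log-likelihood w.r.t. log beta, plugged into Eq. 5."""
    beta = np.exp(logbeta)
    g = J @ s_t                                   # effective fields (h = b = 0)
    th = np.tanh(beta * g)
    score = beta * np.sum((s_next - th) * g)      # Eq. 6
    fisher = beta**2 * np.sum((1.0 - th**2) * g**2)
    return w + B * logbeta + A * score / np.sqrt(fisher)

rng = np.random.default_rng(0)
N = 20
J = rng.normal(0.0, 1.0 / np.sqrt(N), size=(N, N))  # arbitrary couplings
s = rng.choice([-1, 1], size=N)
g = J @ s
# simulate one step at beta = 1, then update the filtered log beta
s_next = np.where(rng.random(N) < 1.0 / (1.0 + np.exp(-2.0 * g)), 1, -1)
new_logbeta = dynokim_update(0.0, s, s_next, J, w=0.0, B=0.97, A=0.05)
```

The sign of the update matches the interpretation of β as inverse noise: outcomes aligned with the effective fields push the filtered β up, outcomes opposing them push it down.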
We still need to address the point of the identifiability of ⟨β⟩: once the estimation is completed, we rescale the obtained β(t) dividing them by their sample mean and multiply the parameters Θ by the same factor. This leaves the model likelihood unchanged, setting a reference value for β(t) and solving the indetermination. We discuss this and other technical aspects of the estimation in Appendix B.

In Section 3 we provide simulation results to validate this estimation procedure, while later in Section 4.1 we show an empirical application of the DyNoKIM to the forecasting of stock price changes at high frequency.

The second specification of the score-driven Kinetic Ising Model we explore in this article is the Dynamic Endogeneity Kinetic Ising Model (DyEKIM). In the DyEKIM we let the number of time-varying parameters be a bit larger, assuming that the J, h and b parameters each have their own specific time-varying factorization. In principle these choices are up to the modeller, depending on the specific application and data: here we present one factorization we believe is a reasonable choice for the financial applications we propose in Sections 4.2 and 4.3, albeit other implementations could be possible too. Going back to Eq. 1, we now impose the following structure on each of the time-varying parameters:

J_ij(t) = β_diag(t) J_ii δ_ij + β_off(t) J_ij (1 − δ_ij)
h_i(t) = β_h(t) ( h_i + h(t) )
b_ik(t) = β_k(t) b_ik

where δ_ij represents the Kronecker symbol, which is 1 if i = j and 0 otherwise.
The conditional probability density for this model, calling β(t) = (β_diag, β_off, β_h, {β_k}), reads

p(s(t+1) | s(t), x(t); J, h, b, h(t), β(t)) = Z^{−1}(t) exp[ Σ_i s_i(t+1) ( β_diag(t) J_ii s_i(t) + β_off(t) Σ_{j≠i} J_ij s_j(t) + β_h(t)( h_i + h(t) ) + Σ_k β_k(t) b_ik x_k(t) ) ]    (7)

This change in the form of the model radically modifies the interpretation one gives to the values of β(t). While it still has the role of modulating the relevance of parameters, the main effect is establishing how important autocorrelations and lagged cross-correlations among spins are, compared to idiosyncratic or external effects, at any point in time. This model can then be used to describe data where the dynamics of the variables depends on others at intermittent times, disentangling network effects from idiosyncratic dynamics or exogenous effects in a time-varying fashion.

We will discuss in more detail the specific interpretation of each of the time-varying parameters in the empirical applications of Sections 4.2 and 4.3. The intuition behind this choice, however, is that we want to be able to discriminate between different components of the dynamics observed in a set of variables: one associated to external inputs (β_k), one to the idiosyncratic properties of variable i (β_h), as well as general trends (h), one for autocorrelations (β_diag) and finally one for lagged cross-correlations among variables (β_off). In this formulation each of these time-varying parameters provides insight on the relative importance of one term over the others in the generation of the data, highlighting periods of higher or lower endogeneity of the dynamics (when correlations have higher importance) rather than periods where the dynamics is more idiosyncratic or exogenously driven.

Regarding the estimation, the procedure is largely the same as the one for the previous model.
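The factorized effective field entering Eq. 7 can be sketched as follows (an illustrative decomposition with arbitrary values; variable names are ours). Setting all β components to 1 and h(t) = 0 recovers the plain KIM field of Eq. 1, mirroring the equivalence between the two specifications noted at the end of this section.

```python
import numpy as np

def dyekim_fields(s, x, J, h, h_t, b_mat, betas):
    """Factorized effective fields of Eq. 7, with
    betas = (beta_diag, beta_off, beta_h, beta_k vector)."""
    beta_diag, beta_off, beta_h, beta_k = betas
    J_d = np.diag(np.diag(J))       # self-couplings: autocorrelation term
    J_o = J - J_d                   # off-diagonal: lagged cross-correlations
    return (beta_diag * (J_d @ s) + beta_off * (J_o @ s)
            + beta_h * (h + h_t) + b_mat @ (beta_k * x))

rng = np.random.default_rng(0)
N, K = 8, 2
J = rng.normal(0.0, 1.0 / np.sqrt(N), size=(N, N))
h = rng.normal(0.0, 0.1, size=N)
b_mat = rng.normal(0.0, 0.1, size=(N, K))
s = rng.choice([-1, 1], size=N)
x = rng.normal(size=K)
g = dyekim_fields(s, x, J, h, 0.0, b_mat, (1.2, 0.8, 1.0, np.array([1.0, 0.5])))
```

Each β component scales one term of the field, so comparing their filtered paths indicates which component (autocorrelation, cross-correlation, idiosyncratic, exogenous) dominates at each time.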
There are however a couple of subtleties that need to be pointed out, regarding the structure of the B and A parameters and of the Fisher Information I, which are now matrices. In order to make the estimation less computationally demanding in our example applications we assume A, B and I diagonal, disregarding the dependencies between time-varying parameters: this will likely make our estimates less precise, but it also reduces the number of static parameters to be inferred, letting us bypass model selection decisions which are outside the scope of this article.

There is of course also in this case the problem of identification for the averages of the components of β, which we solve in the exact same way as we did for the DyNoKIM, by dividing the values of each component by their sample mean while multiplying the associated static parameter by the same factor, again leaving the model unchanged likelihood-wise but setting a reference level for β.

As a last remark, notice that the DyNoKIM and the DyEKIM are equivalent when h(t) = 0 ∀t and β_diag = β_off = β_h = β_k = β. In the next section we mainly present simulation results for the DyNoKIM alone to keep the manuscript concise, as we found no significant differences between the two models when it comes to the reliability of the estimation process, and later apply them to real-world scenarios where their interpretation is much more meaningful.
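The rescaling that fixes the identification constraint ⟨β⟩ = 1 amounts to a few lines; the point is that the product β(t)Θ, and hence the likelihood, is unchanged (a sketch with made-up numbers, shown here for a single coupling matrix).

```python
import numpy as np

def normalize_beta(beta_path, J):
    """Divide the filtered beta(t) by its sample mean and absorb the factor
    into the static couplings, so that beta(t) * J is unchanged for all t."""
    m = beta_path.mean()
    return beta_path / m, J * m

beta_path = np.array([0.5, 1.0, 1.5, 2.0])   # made-up filtered path
J = np.full((3, 3), 0.2)                     # made-up couplings
beta_n, J_n = normalize_beta(beta_path, J)
```

After the rescaling the sample mean of β(t) is exactly 1, which sets the reference level discussed above without altering any transition probability.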
Figure 2: Consistency of the J matrix estimation. (a) Histogram of linear regression coefficients b between inferred and true values of J_ij over 250 samples for N = 50, T = 750 and T = 1500; (b) Histogram of coefficients of determination (R²) for the same set of models. The convergence of both values towards 1 when increasing T is a sign of consistency of the estimation.

We start our analysis from a consistency test on simulated data, aimed at understanding whether the two-step estimation procedure we outlined above is able to recover the values of the parameters of the model when the model itself generated the data.

Here we report results for simulations run with parameters N = 50, T = 750 or T = 1500, J_ij ∼ N(0, 1/√N), h_i = 0 ∀i, B = 0.95 and A = 0.01. We see from Fig. 2 that the estimation of the elements of J is indeed consistent: we estimate a linear regression model between the estimated and the true values of J_ij, namely J^est_ij = a + b J^true_ij, and plot the histograms of the values of b and of the coefficient of determination R² of the resulting model from 250 simulations and estimations (a is consistently found to be very close to 0 in all our simulations and for this reason we omit it). In the ideal case where J^est_ij = J^true_ij for any i, j, one would have b = R² = 1, which is what we aim for in the limit T → ∞. We see from our results that there is indeed a convergence of both values towards 1 when increasing the sample size, reducing both the bias and the variance of the regression parameters.

Turning to the score-driven dynamics parameters A and B, the situation does not change significantly. In Fig. 3 we show the histograms of estimated values of B and A over 250 simulations of N = 50 variables for both T = 750 and T = 1500. It again appears clearly that when increasing the sample size the bias and variance of the estimators converge towards 0, with the estimated parameters converging towards their simulated values. Thanks to these results we are able to confidently apply the two-step estimation method without needing to estimate all the parameters at once.
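The regression diagnostic used above is easy to reproduce; the sketch below applies it to a synthetic "estimate" obtained by perturbing a true coupling matrix (the noise level is an arbitrary stand-in for estimation error, not the output of an actual fit).

```python
import numpy as np

def regression_check(J_true, J_est):
    """Regress the estimated couplings on the true ones, J_est = a + b*J_true,
    and return (a, b, R^2) as a consistency diagnostic."""
    xs = J_true.ravel()
    ys = J_est.ravel()
    b, a = np.polyfit(xs, ys, 1)          # slope, intercept
    resid = ys - (a + b * xs)
    r2 = 1.0 - resid.var() / ys.var()
    return a, b, r2

rng = np.random.default_rng(0)
N = 50
J_true = rng.normal(0.0, 1.0 / np.sqrt(N), size=(N, N))
J_est = J_true + rng.normal(0.0, 0.02, size=(N, N))   # stand-in for an MLE fit
a, b, r2 = regression_check(J_true, J_est)
```

For a consistent estimator, shrinking the perturbation (i.e. growing T) drives a to 0 and both b and R² to 1, which is the pattern reported in Fig. 2.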
Figure 3: Consistency of the score-driven dynamics parameters. (a) Histogram of estimated values of B over 250 samples for N = 50, T = 750 and T = 1500; (b) Histogram of estimated values of A over 250 samples for the same set of models. The convergence towards the true value when increasing T is a sign of consistency of the estimation.

Having shown that the model can be estimated consistently and efficiently, we want to test its performance when the β dynamics is not produced by the score-driven data generating process. Indeed there is little reason to believe that this sort of dynamics is significant for real-world applications, where the dynamics of β might follow exogenous and unknown rules. The power of score-driven models lies also in this feature, in that they are able to estimate time-varying parameters such as β(t) without actually needing any assumption on their true dynamical laws. In this sense they behave as filters for the underlying, unknown dynamics of the parameter.

In all our simulations β(t) is defined to have unit sample mean, to solve the indetermination discussed in Section 2 for the data generating process too, as well as to have a clear comparison between the simulated and inferred values. The first question we ask is whether the estimation of the static KIM parameters Θ is consistent without accounting for the time variation of β(t): in Fig. 4 we show the results for a set of simulations where β(t) follows a deterministic sinusoidal dynamics, β(t) = 1 + K sin(ωt), varying the amplitude K, while the other parameters are ω = 2π/T, T = 3000, N = 30, J_ij ∼ N(0, 1/√N), h_i = 0 ∀i, and the time evolution of s(t) is given by Eq. 4 with the enforced β(t).
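The sinusoidal driver used in this test can be generated directly; note that over a full period its sample mean is 1, consistent with the identification constraint (T from the text, amplitude K chosen arbitrarily for illustration).

```python
import numpy as np

T, K_amp = 3000, 0.5                     # T from the text; amplitude arbitrary
omega = 2.0 * np.pi / T                  # one full period over the sample
beta_path = 1.0 + K_amp * np.sin(omega * np.arange(T))
```

Feeding this path into the transition probability of Eq. 4 (e.g. via a one-step sampler) produces the misspecified-dynamics samples on which the DyNoKIM filter is evaluated.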
For each value of K we simulate 60 time series of T observations and fit both the constant-parameter KIM and the score-driven DyNoKIM, then compare the resulting J^est with the one that was used to generate the data, J^true, by means of the usual linear regression model J^est_ij = a + b J^true_ij. We see from Figure 4 that when β is not constant the KIM underestimates the absolute value of the parameters, as highlighted by the fact that b < 1 (while a ≈ 0, not shown); this bias is greatly reduced in the DyNoKIM thanks to the way in which we solve the indetermination of ⟨β⟩. This happens despite the fact that, over the whole sample, the average value of the simulated β(t) is 1 and its distribution

[Figure 4: J regression coefficient as a function of the β sinusoid amplitude, for the KIM and the DyNoKIM.]
ll lll ll ll lll l lll lll ll ll lllll lll lll l ll l l lll lll ll llllll ll lll ll llllll lll ll lll l llll ll l ll lll lllllll ll lll llll l ll l ll llll lll lll l llll l llll lllllllll lll lllll lllll ll lll llll l ll ll ll ll lllll l ll ll l llll llll lll ll lll ll l ll lll ll ll llll lll lll llll ll llll llll lll lll ll llll llllll lll lllll ll ll lllllll ll ll lll llll ll ll ll llll llll l lll ll ll ll lll ll lllll l llll ll ll ll ll llll ll lll lll l l lll lllllll lll lll ll llllll l lll ll lllll ll lll ll ll ll ll llll lllll l llll ll ll ll lllllllll l lllllll l lll llll l ll llll lllll ll lll llllllll ll ll ll ll ll lll l ll lll llll ll lll llll llllllllll l lllll llll llll l lll lll ll lll l ll lllll lllllll llll llll ll llllll lll lll ll ll lllll llll l ll llll lll l ll ll llll ll l ll ll ll lll l llll ll ll lll lllllll l llllllllll llll ll ll ll ll ll lll lll lll lll llll ll llll lll ll lll llll llll ll l ll lll l ll ll lll l ll ll lll llll lllll lllllll llll lllll l l ll llll l lll lll llll lll lll ll llll llll lllll llllll l llll ll l ll llll lll lll lll ll llllllll lllll lll llll ll lll lll llll ll llll ll ll ll lll llll lll lllll ll lllll lll ll ll l ll l lll l ll lllll ll ll lll ll ll lll l lll lll ll ll lllll lll lll l ll l l lll lll ll llllll ll lll ll llllll lll ll lll l llll ll l ll lll lllllll ll lll llll l ll l ll llll lll lll l llll l llll lllllllll lll lllll lllll ll lll llll l ll ll ll ll lllll l ll ll l llll llll lll ll lll ll l ll lll ll ll llll ll l lll llll ll llll llll lll lll ll llll llllll lll lllll ll ll lllllll ll ll lll llll ll ll ll llll llll l lll ll ll ll lll ll lllll l llll ll ll ll ll llll ll lll lll l l lll lllllll lll lll ll llllll l lll ll lllll ll lll ll ll ll ll llll lllll l llll ll ll ll lllllllll l lllllll l lll llll l ll llll lllll ll lll llllllll ll ll ll ll ll lll l ll lll llll ll lll llll llllllllll l lllll llll llll l lll lll ll lll l ll lllll lllllll llll llll ll llllll lll lll ll ll lllll llll l ll llll lll l 
ll ll llll ll l ll ll ll lll l llll ll ll lll lllllll l llllllllll llll ll ll ll ll ll lll lll lll lll llll ll llll lll ll lll llll llll ll l ll lll l ll ll lll l ll ll lll llll lllll lllllll llll lllll l l ll llll l lll lll llll lll lll ll llll llll lllll llllll l llll ll l ll llll lll lll lll ll llllllll lllll lll llll ll lll lll llll ll llll ll ll ll lll llll lll lllll ll lllll lll ll ll l ll l lll l ll lllll ll ll lll ll ll lll l lll lll ll ll lllll lll lll l ll l l lll lll ll llllll ll lll ll llllll lll ll lll l llll ll l ll lll lllllll ll lll llll l ll l ll llll lll lll l llll l llll lllllllll lll lllll lllll ll lll llll l ll ll ll ll lllll l ll ll l llll llll lll ll lll ll l ll lll ll ll llll ll l lll llll ll llll llll lll lll ll llll llllll lll lllll ll ll lllllll ll ll lll llll ll ll ll llll llll l lll ll ll ll lll ll lllll l llll ll ll ll ll llll ll lll lll l l lll lllllll lll lll ll llllll l lll ll lllll ll lll ll ll ll ll llll lllll l llll ll ll ll lllllllll l lllllll l lll llll l ll llll lllll ll lll llllllll ll ll ll ll ll lll l ll lll llll ll lll llll llllllllll l lllll llll llll l lll lll ll lll l ll lllll lllllll llll llll ll llllll lll lll ll ll lllll llll l ll llll lll l ll ll llll ll l ll ll ll lll l llll ll ll lll lllllll l llllllllll llll ll ll ll ll ll lll lll lll lll llll ll llll lll ll lll llll llll ll l ll lll l ll ll lll l ll ll lll llll lllll lllllll llll lllll l l ll llll l lll lll llll lll lll ll llll llll lllll llllll l llll ll l ll llll lll lll lll ll llllllll lllll lll llll ll lll lll llll ll llll ll ll ll lll llll lll lllll ll lllll lll ll ll l ll l lll l ll lllll ll ll lll ll ll lll l lll lll ll ll lllll lll lll l ll l l lll lll ll llllll ll lll ll llllll lll ll lll l llll ll l ll lll lllllll ll lll llll l ll l ll llll lll lll l llll l llll lllllllll lll lllll lllll ll lll llll l ll ll ll ll lllll l ll ll l llll llll lll ll lll ll l ll lll ll ll llll ll l lll llll ll llll llll lll lll ll llll llllll lll lllll ll ll lllllll ll ll 
lll llll ll ll ll llll llll l lll ll ll ll lll ll lllll l llll ll ll ll ll llll ll lll lll l l lll lllllll lll lll ll llllll l lll ll lllll ll lll ll ll ll ll llll lllll l llll ll ll ll lllllllll l lllllll l lll llll l ll llll lllll ll lll llllllll ll ll ll ll ll lll l ll lll llll ll lll llll llllllllll l lllll llll llll l lll lll ll lll l ll lllll lllllll llll llll ll llllll lll lll ll ll lllll llll l ll llll lll l ll ll llll ll l ll ll ll lll l llll ll ll lll lllllll l llllllllll llll ll ll ll ll ll lll lll lll lll llll ll llll lll ll lll llll llll ll l ll lll l ll ll lll l ll ll lll llll lllll lllllll llll lllll l l ll llll l lll lll llll lll lll ll llll llll lllll llllll l llll ll l ll llll lll lll lll ll llllllll lllll lll llll ll lll lll llll ll llll ll ll ll lll llll lll lllll ll lllll lll ll ll l ll l lll l ll lllll ll ll lll ll ll lll l lll lll ll ll lllll lll lll l ll l l lll lll ll llllll ll lll ll llllll lll ll lll l llll ll l ll lll lllllll ll lll llll l ll l ll llll lll lll l llll l llll lllllllll lll lllll lllll ll lll llll l ll ll ll ll lllll l ll ll l llll llll lll ll lll ll l ll lll ll ll llll ll l lll llll ll llll llll lll lll ll llll llllll lll lllll ll ll lllllll ll ll lll llll ll ll ll llll llll l lll ll ll ll lll ll lllll l llll ll ll ll ll llll ll lll lll l l lll lllllll lll lll ll llllll l lll ll lllll ll lll ll ll ll ll llll lllll l llll ll ll ll lllllllll l lllllll l lll llll l ll llll lllll ll lll llllllll ll ll ll ll ll lll l ll lll llll ll lll llll llllllllll l lllll llll llll l lll lll ll lll l ll lllll lllllll llll llll ll llllll lll J true J e s t Figure 4: Estimation of J under model misspecification with a time-varying β ( t ) = 1 + K sin( ωt ), comparing the constant parameters KIM and the score-driven DyNoKIM. On the x axis we plot the amplitude of the sine function K , on the y axis the distribution of the coefficient b of the linear regression J estij = a + bJ trueij over 60 simulations. 
The insets show example scatter plots ofthe true J values ( x axis) and the estimated values ( y axis) using the standardKIM (yellow points) or the DyNoKIM (purple crosses). Each inset correspondsto one value of K and one of the 60 simulations.is not particularly pathological, thus one could expect that the estimation ofthe KIM parameters should not be affected. This result supports our argumentthat fitting a KIM on data where stationarity is not assured or parametersof the DGP are time varying can be misleading and lead to significant errors,something that can be overcome by adopting the models proposed here.The second question to be asked is then how accurate is the filter in retrievingthe simulated values of β ( t ), despite its misspecification. In Fig. 5 we show twoexamples of misspecified β ( t ) dynamics that are correctly recovered by the score-driven approach: the first is a deterministic double step function and the secondis an AutoRegressive model of order 1 (AR(1)) which follows the equation β AR ( t + 1) = a + a β AR ( t ) + (cid:15) ( t )where (cid:15) ( t ) ∼ N (0 , Σ ) with parameters a = 0 . a = 0 . .
01 soto have (cid:104) β AR (cid:105) = 1. In both cases we simulate 30 time series of length T usingthe given values of β ( t ) to generate the s ( t ); given only the simulated s ( t ) timeseries, the inference algorithm determines the optimal static parameters A , B and J and filters the optimal value of β ( t ) at each time. We see that regardlessof whether the underlying true dynamics is deterministic, stochastic, or moreor less smooth the filter is rather accurate in retrieving the simulated values. In this section we briefly show some simulations results from tests on theDyEKIM, where we want to show that different effects are correctly separatedand identified when estimating the model on a misspecified data generatingprocess. In fact while the consistency analysis largely resembles the one we re-ported for the DyNoKIM in Figures 2 and 3 and for this reason we omit it, the12igure 5: Simulation and estimation of a misspecified score-driven model over 30simulations, with sample trajectories highlighted. (a) Deterministic β followinga double step function; (b) Stochastic β ( t ) following an AutoRegressive modelof order 1.effect of filtering multiple time-varying parameters is something that cannot bepredicted by the simulations on the DyNoKIM alone.In Figure 6 we show the results when estimating the DyEKIM on a datasetgenerated by a Kinetic Ising Model with time-varying β diag ( t ), β off ( t ) and β h ( t ) as in Eq. 7 but where the dynamics of the parameters is predeterminedinstead of following the score-driven update rule. We arbitrarily choose to takea constant β diag ( t ) = 1, a piecewise constant β off ( t ) and an exponentiatedsinusoidal β h ( t ) = exp[sin( ωt )], with ω = 5 πT , T = 1500 and N = 30. 
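The misspecified parameter paths used in these tests can be generated in a few lines. Below is a minimal numpy sketch: the AR(1) coefficients a0, a1 and the noise scale are illustrative values only (chosen so that the stationary mean ⟨β_AR⟩ = a0/(1 − a1) equals 1), not the ones used in the paper.

```python
import numpy as np

def beta_ar1(T, a0, a1, sigma, seed=0):
    """Simulate beta_AR(t+1) = a0 + a1 * beta_AR(t) + eps(t), eps ~ N(0, sigma^2)."""
    rng = np.random.default_rng(seed)
    beta = np.empty(T)
    beta[0] = a0 / (1.0 - a1)  # start at the stationary mean
    for t in range(T - 1):
        beta[t + 1] = a0 + a1 * beta[t] + rng.normal(0.0, sigma)
    return beta

T = 1500
# AR(1) path as in Fig. 5b; a0, a1, sigma are illustrative values with
# stationary mean a0 / (1 - a1) = 1.
beta_ar = beta_ar1(T, a0=0.05, a1=0.95, sigma=0.1)

# Exponentiated sinusoid as in the Figure 6 setup, with omega = 5*pi/T.
omega = 5 * np.pi / T
beta_h = np.exp(np.sin(omega * np.arange(T)))
```

Feeding paths like these to the Kinetic Ising sampler, and then estimating only from the resulting s(t), reproduces the misspecification experiments described above.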
The results show that the filter works correctly and that the different time-varying parameters are consistently estimated, regardless of the kind of dynamics given to each of them.

Having provided evidence that both the DyNoKIM and the DyEKIM can be consistently estimated and have a specific interpretation, in the following sections we propose three simple real-world applications of our modelling approach, which we apply to high-frequency trading data from the US stock market and from the Foreign Exchange (FX) market.

The first dataset we use is a selection of 11 trading days in the 100 largest capitalization stocks in the NASDAQ and NYSE (data provided by LOBSTER academic data - powered by NASDAQ OMX), for which we track the events of mid-price change in the Limit Order Book (LOB) at a frequency of 5 seconds.

Figure 6: Filtered β_diag(t), β_off(t) and β_h(t) under model misspecification. The model was simulated with a constant β_diag(t) = 1, a piecewise constant β_off(t) and an exponentiated sinusoidal β_h(t) = J_0(1) exp[sin(ωt)], with ω = 5π/T and J_0 the Bessel function of the first kind of order 0, to normalize the mean. The points are the result of 30 different simulations and estimations, the lines show the values of β_off and β_h used to generate the data.

The mid-price is the average of the best bid and best ask occupied price levels in the LOB of a stock, defined for stock i at time t as

M_i(t) = (P_i^b(t) + P_i^a(t)) / 2

where P_i^b(t) and P_i^a(t) are the best bid and ask prices available in the LOB of stock i at time t. We discretize time in slices of 5 seconds and define for each stock a binary time series s_i(t), taking value +1 if the mid-price has changed in the previous 5 seconds and −1 otherwise; these variables are collected in the vector s(t).

We test whether there is reason to assume a time-varying β by performing the Lagrange Multiplier (LM) test proposed by Calvori et al. [47] as a generalization of the method by White [58].
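The discretization into binary activity variables described above can be sketched as follows. The sketch assumes a regularly sampled mid-price array and a hypothetical `bin_size` (samples per 5-second bin), whereas the paper works directly on LOB events.

```python
import numpy as np

def mid_price(best_bid, best_ask):
    """Mid-price M_i(t) = (P_b(t) + P_a(t)) / 2."""
    return 0.5 * (best_bid + best_ask)

def activity_series(mid, bin_size):
    """Binary series s_i(t): +1 if the mid-price changed during a bin,
    -1 otherwise. `bin_size` is the number of raw samples per 5-second
    bin -- an assumption of this sketch."""
    n_bins = len(mid) // bin_size
    s = np.empty(n_bins)
    for k in range(n_bins):
        # include the bin's right endpoint so changes across the boundary count once
        window = mid[k * bin_size : (k + 1) * bin_size + 1]
        s[k] = 1.0 if np.any(np.diff(window) != 0) else -1.0
    return s

bid = np.array([10.00, 10.00, 10.05, 10.05, 10.05, 10.05, 10.05, 10.05, 10.05, 10.05])
ask = np.array([10.02, 10.02, 10.07, 10.07, 10.07, 10.07, 10.07, 10.07, 10.07, 10.07])
s = activity_series(mid_price(bid, ask), bin_size=5)
print(s)  # bin 0 saw a mid-price change, bin 1 did not
```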
In short, the LM test consists in testing the null hypothesis that log β = f is constant in time, that is f = w and A = B = 0, against the alternative hypothesis of a time-varying parameter. Calvori et al. [47] show that the test statistic of the LM test can be written as the Explained Sum of Squares (ESS) of the auxiliary linear regression

1 = c_w ∇_t + c_A S_{t−1} ∇_t + residual    (8)

where ∇_t is the time-t element of the score under the null hypothesis that f(t) = w ∀t, S_t is the time-t element of the rescaled score (i.e. I^{−1/2}(t) ∇_t) under the null, the constants c_w and c_A are estimated by standard linear regression methods, and the resulting LM test statistic is distributed as a χ² random variable with one degree of freedom. If the null is rejected, the hypothesis that β is time varying is a valid alternative and we can proceed to estimate the score-driven dynamics parameters.

All our empirical results are validated by this preliminary test, for which we have strong rejections of the null on all samples, as reported in the first column of Table 1.

Significance of LM tests

            SP100 - DyNoKIM   FC - DyEKIM   FOMC   FX
p < .001    100%              100%          100%   88%
p < .01     -                 -             -      -
p < .05     -                 -             -      4%
p > .05     -                 -             -      8%

Table 1: Percentage of p-values of Lagrange Multiplier tests below significance thresholds, divided by dataset. The first column refers to the application of Section 4.1, the second and third to the one of Section 4.2 and the last to the one of Section 4.3. The only non-rejected nulls regard two β_b parameters in the FX dataset, meaning traders don't show significant changes in strategy when it comes to their reactions to prices in those months.

Our theoretical and simulation results from Figures 1 and A.1 suggest using our estimates of β to quantify the reliability of forecasts made with this model: we thus estimate the model parameters once per day and use them to filter β(t) on the next day, while checking the accuracy with which the model predicts mid-price movements out of sample using the AUC metric. The forecasts ŝ_i(t+1) are produced according to

ŝ_i(t+1) = sign[ p(s_i(t+1) = 1 | s(t), J, h, β(t−1)) − α ]    (9)

and the ROC curves are obtained by varying the value of α between 0 and 1. Notice that we take β(t−1) instead of β(t) as in the original Eq. 4: the reason is that in order to estimate β(t) we need the observation of s(t+1), which is what we are trying to predict, thus we take the last available estimate of β as a proxy for the current value. In this way our prediction is fully causal.

We show results of this analysis in Fig. 7. When looking at a single day we see an upward trend in the AUC score as a function of the value of β(t−1): the higher β, the more reliable the forecast can be considered. Comparing our results to the theoretical value that the AUC should take if the distribution of effective fields g_i were Gaussian, we see that the empirical results are in good agreement with the theoretical prediction; however, since the actual fields we measure are non-Gaussian, the match is not perfect. A further aggregated measure is shown in Fig. 7b, where we look at the correlation coefficient between AUC and β(t−1).

Figure 7: AUC statistics compared to β(t−1) when forecasting US stocks mid-price change events at the 5 seconds time scale. (a) AUC values for November 19, 2019, aggregated for different values of β(t−1), compared to the theoretical AUC under the hypothesis of Gaussian effective fields g_i and to the average performance with a constant β; (b) Pearson's correlation coefficient between β(t−1) and AUC(t), estimated daily with 95% confidence intervals.

In another application to a stock prices dataset, we analyze two events that caused turmoil in the stock markets at the intraday level, as an example application of our second kind of score-driven Kinetic Ising Model, the Dynamic Endogeneity KIM (DyEKIM). The two events we choose to analyze are the Flash Crash of May 6, 2010 and the Federal Open Market Committee announcement of July 31, 2019.

Figure 8: Values of β(t), ⟨g_β⟩(t) and h(t) on a regular trading day, November 12, 2019. We see that the endogenous components of g have larger values at the beginning and the end of the day, while the exogenous g_{β_h} only grows towards closing. The most varying β parameter is the one related to cross-correlations, β_off, which has a very significant increase towards market closure.

The Flash Crash marked a historic event for electronic markets, when a seemingly unjustifiable sudden drop in the price of E-mini S&P 500 futures contracts caused all major stock indices to plummet in a matter of a few minutes, including the biggest one-day point decline to date for the Dow Jones Industrial Average and an overall loss of over 5% value across markets. The markets then stabilized and recovered most of the losses when circuit breakers came into place in the original venue (the Chicago Mercantile Exchange) [59]. Multiple explanations of what happened have been offered by a large number of academics, regulators and practitioners: CFTC-SEC officials initially attributed responsibility to a "fat-finger trade" by a mutual fund unloading its inventory through an unsophisticated sell algorithm, triggering a liquidity crisis in the futures and stock markets. Following the official report, alternative explanations challenging this view were presented, as in Easley et al.
[60], where they argue that the state of liquidity had deteriorated prior to the start of the crash and that liquidity providers, in the form of High-Frequency Traders (HFTs) and market makers, turned their backs on the market as soon as the distress rose, becoming liquidity consumers. Madhavan [61] does not take a position on the cause but argues that market fragmentation, that is the fact that the same financial instrument can be traded on multiple markets, causes liquidity provision to be more susceptible to transitory order imbalances, a view that is confirmed by Menkveld and Yueshen [62]. Finally, Kirilenko et al. [63] analyze trading records by market participants and find that in terms of executed orders the behavior of HFTs had not changed during the Flash Crash, while traditional intermediaries acted according to their limited risk-bearing capacity and did not absorb the shock in full. This difference, however, does not mean that HFTs did not contribute to the amplification of the liquidity crisis, as the authors argue that they operate significantly different strategies from traditional market makers, including quote sniping (or latency arbitrage), which is harmful to liquidity provision [64].

The other event we analyze is the announcement following the Federal Open Market Committee (FOMC) meeting of July 31, 2019. In this meeting the Federal Reserve operated its first interest rate cut in over a decade, the last one dating back to the 2008 financial crisis, encountering mixed reactions in both the news and the markets. In particular, an answer by the Fed Chairman Powell during the Q&A press conference has been highlighted by news agencies: when asked whether further cuts in future meetings were an option, he answered "we're thinking of it essentially as a midcycle adjustment to policy" [65].
This answer triggered turmoil in the equity markets, with all major indices dropping around 2% in a few minutes.

For both events our analysis focuses again on midprice movements for the then S&P100-indexed stocks at the 5 seconds time scale. Differently from the previous example, here we apply the DyEKIM methodology to study variations in the relative importance of different sets of parameters as events unfold, as defined in Eq. 7. In this setting we include no covariates x_k(t), so there is no b parameter matrix and consequently no β_k(t). As usual we begin by running the Lagrange Multiplier test on each of the hypothesized time-varying parameters, obtaining that all the nulls are rejected on both datasets, as summarized in the second and third columns of Table 1. To exclude dependencies between the tests, we take as null models both the completely static model (i.e. where all the time-varying parameters are constant) and the model where all the parameters are time-varying except the one being tested, obtaining similar results regardless of the choice.

To better understand the results, we introduce another quantity to the analysis, given by the value of the components of the effective fields g_i(t), each related to one of our time-varying parameters.
In particular, we define

g_i(t) = g_{i,β_diag}(t) + g_{i,β_off}(t) + g_{i,β_h}(t)

g_{i,β_diag}(t) = β_diag(t) J_ii s_i(t)

g_{i,β_off}(t) = β_off(t) Σ_{j≠i} J_ij s_j(t)

g_{i,β_h}(t) = β_h(t) (h_i + h(t))

which we then average at each time across all indices i, obtaining the quantities ⟨g_{β_diag}⟩(t) and so on.

The way to interpret these quantities follows from the interpretation the various time-varying parameters have: as mentioned in the definition of the model, the β_diag parameter captures the level of endogeneity in the dynamics related to auto-correlation in the time series; β_off is related to endogeneity in the form of lagged cross-correlations; β_h instead models the level to which the observations are close to realizations of independent Bernoulli random variables, unconditional on previously observed values - that is, not dependent on any other modelled variable, thus linking to exogenous effects - and h shifts up or down the mean of these independent Bernoulli variables, thus capturing purely exogenous effects on the dynamics. What the ⟨g_β⟩(t) quantities show is intuitively related to what the explained sum of squares means for linear regression models, in the sense that the further a ⟨g_β⟩(t) is from 0 relative to the others, the more the data reflect a dynamics that is modelled by that subset of parameters.

Figure 9: Values of β(t), ⟨g_β⟩(t) and h(t) on May 6, 2010, along with the average midprice across the S&P100 stocks. The red shade highlights the time window (14:32:00 to 15:08:00 EST) where the Flash Crash takes place. We see that the average activity parameter h starts increasing in the 45 minutes preceding the crash, while during the crash a bigger role is played by the correlation parameters β_diag and β_off.

We choose to show these quantities as a simple way of assessing the relevance of the components, a problem that is not easily solved in these kinds of models. One potential candidate to better quantify these effects is provided by dominance analysis [66, 67], which to the best of our knowledge has only been applied in the framework of multiple logistic regressions, never to autoregressive models, and whose generalization goes beyond the scope of this article.

Since the baseline model is applied to stock midprice changes at high frequency, typically called the activity of a stock, which is taken as a proxy of high-frequency volatility [53, 54], the interpretation of these time-varying parameters relates to volatility clustering in the case of β_diag, to volatility spillovers for β_off, to higher or lower market-wide volatility for h, while the relevance of exogenous effects is given by β_h.

In Figure 8 we show results for these quantities on a regular trading day, November 12, 2019. We see that the J-related parameters, β_diag and β_off, as well as the corresponding g components, show a U-shaped pattern throughout the trading day, having higher values at the opening and closing, while the h-related parameter β_h only shows an increase towards the end of the day.
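The decomposition of the effective fields into ⟨g_β⟩ components can be computed directly from its definition. A minimal numpy sketch follows, with purely illustrative inputs rather than the paper's estimated values.

```python
import numpy as np

def avg_field_components(s, J, h_static, h_t, beta_diag, beta_off, beta_h):
    """Cross-sectional averages <g_beta>(t) of the three components of the
    effective field: autocorrelation, lagged cross-correlation and
    exogenous parts."""
    g_diag = beta_diag * np.diag(J) * s          # beta_diag * J_ii * s_i
    J_off = J - np.diag(np.diag(J))              # zero the diagonal
    g_off = beta_off * (J_off @ s)               # beta_off * sum_{j != i} J_ij s_j
    g_h = beta_h * (h_static + h_t)              # beta_h * (h_i + h(t))
    return g_diag.mean(), g_off.mean(), g_h.mean()

# Toy example with N = 2 spins; all values are illustrative.
J = np.array([[1.0, 2.0], [3.0, 4.0]])
s = np.array([1.0, -1.0])
means = avg_field_components(s, J, h_static=np.zeros(2), h_t=0.5,
                             beta_diag=1.0, beta_off=1.0, beta_h=1.0)
print(means)  # averages of the diagonal, off-diagonal and exogenous parts
```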
The h parameter, which captures the average exogenous price activity across all stocks, itself shows a U-shaped pattern which is more pronounced at closing, consistent with the intraday pattern typical of traded volume.

Figure 9 shows the same quantities during the Flash Crash of May 6, 2010. Here the situation appears radically different from that of Figure 8: the parameters show a very significant variation around the crash, with an abnormal increase in the quantities related to β_h in the 45 minutes preceding the crash, followed by a similar increase of the endogeneity parameters β_diag and β_off during the event, which then stay relevant until market close. The intraday pattern is overshadowed by the effect of the crash, but the picture at the beginning of the day is similar to normal trading days. These measurements are consistent with the reconstruction of how events unfolded, with an abnormal exogenous increase in activity starting the crash, which is then amplified by endogenous mechanisms of volatility spillovers. Of note, the endogeneity parameters persist at relatively high values in the aftermath of the crash, indicating that the turmoil induced by the Flash Crash reverberated for the remainder of the trading hours, even after prices had recovered to pre-crash levels.

Figure 10: Values of β(t), ⟨g_β⟩(t) and h(t) on July 31, 2019, along with the average midprice across the S&P100 stocks. We highlight the time at which the announcement becomes public (14:00:00 EST) and the time at which the press Q&A with Chairman Powell begins (14:36:00 EST).

Moving on to the FOMC announcement of July 31, 2019, in Figure 10 we show the values of β, ⟨g_β⟩ and the average stock price of the S&P100-listed stocks we consider in the analysis (which are a different set from the one in the Flash Crash example). The announcement went public at 14:00:00 EST and was followed by a press conference at 14:30:00 EST, with a Q&A starting at around 14:36:00 EST. Again we see that the usual intraday pattern shown in Figure 8 is interrupted by the news, which however, differently from the Flash Crash, was a scheduled event.
This difference leads to the complete absence of any sort of "unusual" effect in the earlier hours of the day, as analysts typically provide forecasts regarding these announcements in the previous days and this information is already incorporated in the prices. What then happens is that, if the news does not meet market expectations, a correction in prices occurs as soon as the information is made public, leading to higher market volatility in the minutes and hours following the announcement [68, 69]. In this specific case, forecasts were mixed between a 25 and a 50 basis points interest rate cut scenario (this information can be found on any finance-focused media outlet such as finance.yahoo.com, bloomberg.com or zacks.com).

The published announcement at 14:00 EST mostly matched these forecasts, with the FOMC lowering the target interest rate by 25 basis points, and we indeed see that the price levels are not particularly affected by the news. An increase in volatility, and in particular in its endogenous components, can still be observed in the few minutes following the announcement, though it quickly returns to average levels. What is actually interesting is the reaction to the press conference held 30 minutes after the release, and in particular to the answers the Chairman of the Fed, Jerome H. Powell, gives to journalists in the Q&A. We see in fact that as soon as the Q&A starts, around 14:36 EST, prices begin to plummet in response to the Chairman's answers, possibly reacting to the statement that this interest rate cut was only intended as a "midcycle adjustment to policy" rather than as the first of a series. Expectations of further rate cuts in the later months of the year could be a reason for this adjustment in prices when these forecasts are not met, as lower interest rates usually push stock prices up.
We see, however, that this unexpected event causes a behavior in the time-varying parameter estimates much more similar to what we have seen in the Flash Crash, albeit with even more significant endogenous components.

Overall, these two examples show that our model captures different reactions to events in stock volatilities, depending on whether at least part of the new information is already incorporated in the price, as is the case for the FOMC decision release, or whether the event is unpredictable in nature and triggered by external causes, as in the Flash Crash or the press conference of July 31, 2019.

Another example application we propose is an extension of a previous work by some of the authors [26], where they utilized the Kinetic Ising Model with static parameters to infer a network of lead-lag relationships among traders and to estimate the opinions held by traders on the underlying asset price, in this case the spot exchange rate between Euro and US Dollar. Here the time series represent buy (+1) or sell (−1) trades performed by individual traders in the period May - September 2013, on a time scale of 5 minutes, on the electronic Foreign Exchange platform of a major dealer in the market, which provided the data. The time series is produced as follows: starting from the trading records, containing information about the time of trade with millisecond precision, the anonymized identity of the trader, the volume of EUR purchased in exchange for USD (negative in case the trade goes in the other direction) and the price paid, we aggregate the traded volume for each trader in 5 minute time windows, and take the sign of the total volume as the binary variable to feed the Kinetic Ising Model. In the model we also include the log-returns on the exchange rate as an external covariate, thus letting x(t) = r(t) and introducing the covariate coupling parameters b_i.
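The construction of the binary trader series described above can be sketched with pandas. The column names ('time', 'trader', 'volume') are hypothetical placeholders for the actual trading records.

```python
import numpy as np
import pandas as pd

def trader_signs(trades, bin_minutes=5):
    """Aggregate signed EUR volumes per trader in 5-minute bins and take the
    sign of the net volume, as described in the text. `trades` has columns
    'time' (datetime), 'trader' (id) and 'volume' (EUR bought, negative for
    sales). Bins where a trader is inactive come out as NaN."""
    binned = trades.set_index('time').groupby(
        [pd.Grouper(freq=f'{bin_minutes}min'), 'trader'])['volume'].sum()
    return np.sign(binned).unstack('trader')

# Toy records: trader A nets -3 EUR in the 09:00 bin, B nets +1 there
# and +3 in the 09:05 bin.
trades = pd.DataFrame({
    'time': pd.to_datetime(['2013-05-01 09:01', '2013-05-01 09:03',
                            '2013-05-01 09:02', '2013-05-01 09:07']),
    'trader': ['A', 'A', 'B', 'B'],
    'volume': [2.0, -5.0, 1.0, 3.0],
})
print(trader_signs(trades))
```

The resulting ±1 matrix (one column per trader, one row per 5-minute bin) is what would be fed to the Kinetic Ising Model.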
We also split the data monthly, in order to account for possible non-stationarities at longer time scales. For further methodological details we refer the interested reader to the original papers [30, 26], as explaining the method from scratch would require a lengthy diversion from the discussion of the results. Once this imputation is done, we add the score-driven dynamics to the model as in Eq. 7 and, after performing the usual LM test reported in the fourth column of Table 1, we infer the score-driven dynamics parameters. We then obtain a set of time series for β, now also including a time-varying β_b(t) parameter for the log-returns couplings, and for h. As mentioned in the original paper [26], this model should be interpreted as a way to relate the strategic decisions made by traders, highlighting which market participants can carry information about short-term trends in demand and supply, as well as identifying the relations between traders adopting different strategies. In this extended score-driven version, the time-varying parameters allow a more refined interpretation of the model results by making explicit when the considered traders are more "coupled" to others or to price variations in their strategic behavior.

We compare our results to a dataset of macroeconomic announcement times from the website, which provides a calendar of scheduled announcements (e.g. interest rate decisions by central banks, quarterly unemployment rate reports, ...) with labels characterizing which currencies are mostly affected by the announcement and the level of importance (low, medium or high) of the news. We restrict our analysis to the news that are labeled as highly important and involving either EUR or USD, the pair traded by the traders in our dataset.
We obtain a total of 474 non-overlapping announcement events, of which 283 refer to the US Dollar and the remaining 201 to the Euro.

As we have different models for different months, which we need to compare, we first need to standardize the time series of our time-varying parameters. To do so, we perform a multiplicative trend-seasonal decomposition on the βs and an additive decomposition on h:

β_k(t) = β_k^seas(t) β_k^trend(t) β_k^rand(t)

h(t) = h^seas(t) + h^trend(t) + h^rand(t)

where β(t) = (β_diag(t), β_off(t), β_h(t), β_b(t)). The seasonal component is assumed to have daily periodicity, thus capturing any intraday pattern the parameters may display. (The imputation of these missing trades has been shown to be financially significant to estimate causal effects between herding and liquidity in fragmented markets [26].) The residual components β_k^rand and h^rand are what we are actually interested in, as they are the part of our parameters that is not explained by either the intraday pattern or the local average value. To check that we were not neglecting other possible choices, we measured seasonality in the data by computing the Fourier transform of the time series to extract the principal spectral component (not shown here for the sake of space) and found it to be typically around the daily frequency: we decided to enforce daily seasonality in order to make the decomposition homogeneous across months.

Having done this decomposition, we focus on the behavior of the residual parts β_k^rand and h^rand in the vicinity of news announcements. Defining the news timestamp t*, we select the values of β_k^rand and h^rand in the interval [t* − 60m, t* + 60m], that is one hour before and after the event.
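The multiplicative decomposition can be implemented with a moving-average trend and period-averaged seasonal factors. Below is a minimal sketch under the assumption of 288 five-minute bins per day; the paper does not specify its exact implementation, which may differ (standard statistical packages offer equivalent routines).

```python
import numpy as np

def decompose_multiplicative(x, period):
    """Classical multiplicative decomposition x = seas * trend * rand, with a
    moving-average trend and period-averaged seasonal factors. Edge bins
    without a full moving-average window are discarded."""
    kernel = np.ones(period) / period
    trend = np.convolve(x, kernel, mode='valid')   # length len(x) - period + 1
    start = (period - 1) // 2                      # align trend with x
    xi = x[start : start + len(trend)]
    detrended = xi / trend
    # seasonal factor: mean of the detrended series at each phase, normalised to mean 1
    phases = (np.arange(len(xi)) + start) % period
    seas = np.array([detrended[phases == p].mean() for p in range(period)])
    seas /= seas.mean()
    rand = detrended / seas[phases]
    return seas, trend, rand

# Synthetic example: five "days" of 288 five-minute bins with a daily
# sinusoidal pattern on a slow linear trend (illustrative values).
t = np.arange(5 * 288)
x = (1 + 0.1 * np.sin(2 * np.pi * t / 288)) * (1 + 1e-4 * t)
seas, trend, rand = decompose_multiplicative(x, period=288)
```

The `rand` output plays the role of β_k^rand in the text; the additive decomposition of h works the same way with subtraction in place of division.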
In order to be able to compare the residuals coming from different months, since we find that they are distributed similarly to a Gaussian by inspecting the quantile-quantile plots, we normalize them by subtracting the mean and dividing by the standard deviation, obtaining

β̂_k^rand(t) = (β_k^rand(t) − E[β_k^rand]) / Stdev[β_k^rand].   (10)

We then take, for each lag l in the time window around the events, the average value of the normalized β̂_k^rand and ĥ^rand across all events, that is

⟨β̂_k^rand(l)⟩ = (1/N_e) Σ_{e=1}^{N_e} β̂_k^rand(t*_e + l),

where N_e is the number of macroeconomic news events, l ∈ [−60m, 60m] and t*_e is the timestamp within which event e takes place, that is the announcement happens in the time window (t*_e − m, t*_e] identified by the bin labeled 0 in the figures.
In Figure 11 we show these average values for all the different β components, along with 95% confidence intervals obtained considering the probability that a sample of N_e Gaussian random variables with zero population mean and unit variance has an empirical mean different from zero. We clearly see that there are significant patterns in the proximity of the news announcement: both the β_diag and the β_off parameters show a reduction in the importance of autocorrelation and cross-correlation effects in the trading behavior of traders, while the exogenous component β_h is mostly unchanged. The β_b parameter is also marginally smaller in the immediate vicinity of the announcement, possibly meaning that in that time frame the traders are less focused on following the price dynamics and more on reacting to the news.
As further evidence that this modelling approach captures meaningful effects, in Figure 12 we show the pattern of h around the news events.
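The normalization of Eq. 10 and the averaging across events can be sketched as below. The helper, its array names and the one-minute grid are illustrative assumptions, not code from the paper; the 95% band follows the Gaussian null described above.

```python
import numpy as np

def event_average(resid, event_idx, window=60):
    """Average a residual series around event timestamps.

    resid: 1-d array of residuals (one value per minute, illustrative);
    event_idx: integer positions of the news announcements.
    Returns the lags in [-window, window], the cross-event average, and a
    95% band for the null hypothesis of zero mean."""
    z = (resid - np.nanmean(resid)) / np.nanstd(resid)   # Eq. (10)
    lags = np.arange(-window, window + 1)
    stacked = np.array([z[e + lags] for e in event_idx
                        if e - window >= 0 and e + window < len(z)])
    avg = stacked.mean(axis=0)                            # average over events
    n_e = stacked.shape[0]
    band = 1.96 / np.sqrt(n_e)   # mean of n_e iid N(0,1) draws, 95% level
    return lags, avg, band
```

Values of the average falling outside ±band at some lag indicate a significant deviation from the null of no variation around the announcement.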
Figure 11: Patterns of the normalized residual components of β around macroeconomic news announcements, with purple dashed lines marking 95% confidence intervals for the null hypothesis of no variation with respect to the sample average.

Indeed, as we mentioned, our dataset contains news affecting either USD or EUR, so we can condition our averaging procedure on this information. In the middle and bottom panels of Figure 12 we show the average pattern of ĥ^rand around USD-affecting and EUR-affecting news, respectively. What we find is that traders are more likely to drop the affected currency from their inventory when they know a news release is coming, and that the effect is stronger for EUR than for USD. We believe there are at least two plausible explanations for this: one is that the market we analyze is a European market in London opening times, thus the majority of the traders are likely to be European; another is that the year is 2013 and the Euro debt crisis is affecting EUR-related news, causing more risk-aversion in traders. While this effect is not particularly surprising and could be shown by simply looking at the net order flow, which is what h is designed to capture, it proves that our modelling approach can cleanly filter these effects when describing these systems.
We also argue that this particular behavior is consistent with the way we select traders: as we only consider traders that are active in more than 30% of the timestamps, we are very likely excluding the traders that follow a news-trading strategy, as they are unlikely to trade that often outside of these time intervals. In this sense it is not surprising to see other traders be risk-averse with respect to the news, dropping the affected currency before the news comes to avoid adverse selection and higher volatility.

Figure 12: Patterns of the normalized residual component of h around macroeconomic news, with purple dashed lines marking 95% confidence intervals for the null hypothesis of no variation with respect to the sample average. (top) all events; (mid) only USD-related events; (bottom) only EUR-related events.

Conclusions

We have applied the score-driven methodology to extend the Kinetic Ising Model to a time-varying parameters formulation, introducing two new models for non-stationary time series, the Dynamical Noise Kinetic Ising Model (DyNoKIM) and the Dynamic Endogeneity Kinetic Ising Model (DyEKIM). We showed that the DyNoKIM, characterized by a time-varying noise level parameter β(t), has a clear utility in forecasting applications, as the Area Under the ROC Curve can be shown to be an increasing function of β(t), while the DyEKIM can be used to discriminate between endogenous and exogenous effects in the evolution of a time series. We then provided three example applications of the two models: in the first, the DyNoKIM is successfully used to quantify the forecasting accuracy of stock activities in the US stock market; in the second, we applied the DyEKIM to describe the high-frequency volatilities of US stocks in proximity of extreme events such as the Flash Crash of May 6, 2010 or around scheduled announcements such as the FOMC report of July 31, 2019; in the last empirical application we built upon a previous work [26] on traders' lead-lag networks to describe how trading strategies affect one another around macroeconomic announcements, showing that the DyEKIM effectively captures some interesting features of the data. While our empirical applications have been focused on financial systems, mainly due to our expertise and data availability, we envision that our approach can be useful also in other fields of application, such as neuroscience and machine learning, where the static version of the Kinetic Ising Model has been in use for a long time.
Acknowledgements
FL acknowledges partial support from the European Integrated Infrastructure for Social Mining and Big Data Analytics (SoBigData++ Grant Agreement).

References

[1] Philip W Anderson. More is different. Science, 177(4047):393–396, 1972.
[2] Yaneer Bar-Yam. General features of complex systems. Encyclopedia of Life Support Systems (EOLSS), UNESCO, EOLSS Publishers, Oxford, UK, 1, 2002.
[3] Damien Challet, Rémy Chicheportiche, Mehdi Lallouache, and Serge Kassibrakis. Trader lead-lag networks and order flow prediction. Available at SSRN 2839312, 2016.
[4] Fabrizio Lillo, Salvatore Miccichè, Michele Tumminello, Jyrki Piilo, and Rosario N Mantegna. How news affects the trading behaviour of different categories of investors in a financial market. Quantitative Finance, 15(2):213–229, 2015.
[5] Oswald Schmitz. Predator and prey functional traits: understanding the adaptive machinery driving predator–prey interactions. F1000Research, 6, 2017.
[6] Gaia Tavoni, Ulisse Ferrari, Francesco P Battaglia, Simona Cocco, and Rémi Monasson. Functional coupling networks inferred from prefrontal cortex activity show experience-related effective plasticity. Network Neuroscience, 1(3):275–301, 2017.
[7] Petter Holme and Jari Saramäki. Temporal networks. Physics Reports, 519(3):97–125, 2012.
[8] Zoubin Ghahramani. An introduction to hidden Markov models and Bayesian networks. In Hidden Markov Models: Applications in Computer Vision, pages 9–41. World Scientific, 2001.
[9] Daniel K Sewell and Yuguo Chen. Latent space models for dynamic networks with weighted edges. Social Networks, 44:105–116, 2016.
[10] Peter Nystrup, Henrik Madsen, and Erik Lindström. Long memory of financial time series and hidden Markov models with time-varying parameters. Journal of Forecasting, 36(8):989–1002, 2017.
[11] Riccardo Gallotti and Marc Barthelemy. The multilayer temporal network of public transport in Great Britain. Scientific Data, 2(1):1–8, 2015.
[12] Piero Mazzarisi, Paolo Barucca, Fabrizio Lillo, and Daniele Tantari.
A dynamic network model with persistent links and node-specific latent variables, with an application to the interbank market. European Journal of Operational Research, 281(1):50–65, 2020.
[13] Bernard Derrida, Elizabeth Gardner, and Anne Zippelius. An exactly solvable asymmetric neural network model. EPL (Europhysics Letters), 4(2):167, 1987.
[14] A Crisanti and Haim Sompolinsky. Dynamics of spin systems with randomly asymmetric bonds: Ising spins and Glauber dynamics. Physical Review A, 37(12):4865, 1988.
[15] Drew Creal, Siem Jan Koopman, and André Lucas. Generalized autoregressive score models with applications. Journal of Applied Econometrics, 28(5):777–795, 2013.
[16] Andrew C. Harvey. Dynamic Models for Volatility and Heavy Tails: With Applications to Financial and Economic Time Series. Econometric Society Monographs. Cambridge University Press, 2013.
[17] Simona Cocco, Rémi Monasson, Lorenzo Posani, and Gaia Tavoni. Functional networks from inverse modeling of neural population activity. Current Opinion in Systems Biology, 3:103–110, 2017.
[18] Trang-Anh Nghiem, Bartosz Telenczuk, Olivier Marre, Alain Destexhe, and Ulisse Ferrari. Maximum-entropy models reveal the excitatory and inhibitory correlation structures in cortical neuronal activity. Physical Review E, 98(1):012402, 2018.
[19] Ulisse Ferrari, Stéphane Deny, Matthew Chalk, Gašper Tkačik, Olivier Marre, and Thierry Mora. Separating intrinsic interactions from extrinsic correlations in a network of sensory neurons. Physical Review E, 98(4):042410, 2018.
[20] Seiji Tanaka and Harold A Scheraga. Model of protein folding: incorporation of a one-dimensional short-range (Ising) model into a three-dimensional model. Proceedings of the National Academy of Sciences, 74(4):1320–1323, 1977.
[21] A Imparato, A Pelizzola, and M Zamparo. Ising-like model for protein mechanical unfolding. Physical Review Letters, 98(14):148102, 2007.
[22] Elena Agliari, Adriano Barra, Francesco Guerra, and Francesco Moauro.
A thermodynamic perspective of immune capabilities. Journal of Theoretical Biology, 287:48–63, 2011.
[23] Stefan Bornholdt. Expectation bubbles in a spin model of markets: Intermittency from frustration across scales. International Journal of Modern Physics C, 12(05):667–674, 2001.
[24] Jean-Philippe Bouchaud. Crises and collective socio-economic phenomena: Simple models and challenges. Journal of Statistical Physics, 151(3):567–606, May 2013.
[25] Didier Sornette. Physics and financial economics (1776–2014): puzzles, Ising and agent-based models. Reports on Progress in Physics, 77(6):062001, 2014.
[26] Carlo Campajola, Fabrizio Lillo, and Daniele Tantari. Unveiling the relation between herding and liquidity with trader lead-lag networks. Quantitative Finance, in press, 2020.
[27] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436, 2015.
[28] Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989.
[29] Aurélien Decelle and Pan Zhang. Inference of the sparse kinetic Ising model using the decimation method. Physical Review E, 91(5):052136, 2015.
[30] Carlo Campajola, Fabrizio Lillo, and Daniele Tantari. Inference of the kinetic Ising model with heterogeneous missing data. Physical Review E, 99(6):062138, 2019.
[31] Trang-Anh Nghiem, Olivier Marre, Alain Destexhe, and Ulisse Ferrari. Pairwise Ising model analysis of human cortical neuron recordings. In International Conference on Geometric Science of Information, pages 257–264. Springer, 2017.
[32] Trang-Anh E Nghiem, Núria Tort-Colet, Tomasz Górski, Ulisse Ferrari, Shayan Moghimyfiroozabad, Jennifer S Goldman, Bartosz Teleńczuk, Cristiano Capone, Thierry Bal, Matteo Di Volo, et al. Cholinergic switch between two types of slow waves in cerebral cortex. Cerebral Cortex, 30(6):3451–3466, 2020.
[33] Benjamin W Mooneyham, Michael D Mrazek, Alissa J Mrazek, Kaita L Mrazek, Dawa T Phillips, and Jonathan W Schooler.
States of mind: characterizing the neural bases of focus and mind-wandering through dynamic functional connectivity. Journal of Cognitive Neuroscience, 29(3):495–506, 2017.
[34] James A Hanley and Barbara J McNeil. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 143(1):29–36, 1982.
[35] Andrew P Bradley. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognition, 30(7):1145–1159, 1997.
[36] Marc Mézard and J Sakellariou. Exact mean-field inference in asymmetric kinetic Ising systems. Journal of Statistical Mechanics: Theory and Experiment, 2011(07):L07001, 2011.
[37] Marco P Tucci. Time-varying parameters: a critical introduction. Structural Change and Economic Dynamics, 6(2):237–260, 1995.
[38] Siem Jan Koopman, Andre Lucas, and Marcel Scharth. Predicting time-varying parameters with parameter-driven and observation-driven models. Review of Economics and Statistics, 98(1):97–110, 2016.
[39] David R Cox, Gudmundur Gudmundsson, Georg Lindgren, Lennart Bondesson, Erik Harsaae, Petter Laake, Katarina Juselius, and Steffen L Lauritzen. Statistical analysis of time series: Some recent developments [with discussion and reply]. Scandinavian Journal of Statistics, pages 93–115, 1981.
[40] George E Tauchen and Mark Pitts. The price variability-volume relationship on speculative markets. Econometrica: Journal of the Econometric Society, pages 485–505, 1983.
[41] Neil Shephard. Stochastic Volatility: Selected Readings. Oxford University Press on Demand, 2005.
[42] Luc Bauwens and David Veredas. The stochastic conditional duration model: a latent variable model for the analysis of financial durations. Journal of Econometrics, 119(2):381–412, 2004.
[43] Christian M Hafner and Hans Manner. Dynamic stochastic copula models: Estimation, inference and applications. Journal of Applied Econometrics, 27(2):269–295, 2012.
[44] Tim Bollerslev.
Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, 31(3):307–327, 1986.
[45] Francisco Blasques, Siem Jan Koopman, and Andre Lucas. Information-theoretic optimality of observation-driven time series models for continuous responses. Biometrika, 102(2):325–343, 2015.
[46] Francisco Blasques, Andre Lucas, and Andries van Vlodrop. Finite sample optimality of score-driven volatility models. Tinbergen Institute Discussion Paper, 17-111/III, 2017.
[47] Francesco Calvori, Drew Creal, Siem Jan Koopman, and André Lucas. Testing for parameter instability across different modeling frameworks. Journal of Financial Econometrics, 15(2):223–246, 2017.
[48] RW Penney, ACC Coolen, and D Sherrington. Coupled dynamics of fast spins and slow interactions in neural networks and spin systems. Journal of Physics A: Mathematical and General, 26(15):3681, 1993.
[49] Christian Beck and Ezechiel GD Cohen. Superstatistics. Physica A: Statistical Mechanics and its Applications, 322:267–275, 2003.
[50] Christian Beck, Ezechiel GD Cohen, and Harry L Swinney. From time series to superstatistics. Physical Review E, 72(5):056133, 2005.
[51] Marcello Rambaldi, Paris Pennesi, and Fabrizio Lillo. Modeling foreign exchange market activity around macroeconomic news: Hawkes-process approach. Physical Review E, 91(1):012819, 2015.
[52] Marcello Rambaldi, Vladimir Filimonov, and Fabrizio Lillo. Detection of intensity bursts using Hawkes processes: An application to high-frequency financial data. Physical Review E, 97(3):032318, 2018.
[53] Vladimir Filimonov and Didier Sornette. Quantifying reflexivity in financial markets: Toward a prediction of flash crashes. Physical Review E, 85(5):056108, 2012.
[54] Stephen J Hardiman, Nicolas Bercot, and Jean-Philippe Bouchaud. Critical reflexivity in financial markets: a Hawkes process analysis. The European Physical Journal B, 86(10):442, 2013.
[55] Vladimir Filimonov and Didier Sornette.
Apparent criticality and calibration issues in the Hawkes self-excited point process model: application to high-frequency financial data. Quantitative Finance, 15(8):1293–1314, 2015.
[56] Stephen J Hardiman and Jean-Philippe Bouchaud. Branching-ratio approximation for the self-exciting Hawkes process. Physical Review E, 90(6):062807, 2014.
[57] Spencer Wheatley, Alexander Wehrli, and Didier Sornette. The endo–exo problem in high frequency financial price fluctuations and rejecting criticality. Quantitative Finance, 19(7):1165–1178, 2019.
[58] Halbert White. Specification testing in dynamic models. In Advances in Econometrics - Fifth World Congress, volume 1, pages 1–58, 1987.
[59] US Securities & Exchange Commission and Commodity Futures Trading Commission. Findings regarding the market events of May 6, 2010. Washington DC, 2010.
[60] David Easley, Marcos M Lopez De Prado, and Maureen O'Hara. The microstructure of the "flash crash": flow toxicity, liquidity crashes, and the probability of informed trading. The Journal of Portfolio Management, 37(2):118–128, 2011.
[61] Ananth Madhavan. Exchange-traded funds, market structure, and the flash crash. Financial Analysts Journal, 68(4):20–35, 2012.
[62] Albert J Menkveld and Bart Zhou Yueshen. The flash crash: A cautionary tale about highly fragmented markets. Management Science, 65(10):4470–4488, 2019.
[63] Andrei Kirilenko, Albert S Kyle, Mehrdad Samadi, and Tugkan Tuzun. The flash crash: High-frequency trading in an electronic market. The Journal of Finance, 72(3):967–998, 2017.
[64] Matteo Aquilina, Eric Budish, and Peter O'Neill. Quantifying the high-frequency trading "arms race": A simple new methodology and estimates. United Kingdom Financial Conduct Authority Occasional Paper, 2020.
[65] Jerome Powell. Transcript of Chair Powell's press conference. Federal Open Market Committee, July 31, 2019.
[66] David V Budescu. Dominance analysis: a new approach to the problem of relative importance of predictors in multiple regression.
Psychological Bulletin, 114(3):542, 1993.
[67] Razia Azen and David V Budescu. The dominance analysis approach for comparing predictors in multiple regression. Psychological Methods, 8(2):129, 2003.
[68] Helena Chuliá, Martin Martens, and Dick van Dijk. Asymmetric effects of federal funds target rate changes on S&P100 stock returns, volatilities and correlations. Journal of Banking & Finance, 34(4):834–839, 2010.
[69] Nikolaus Hautsch, Dieter Hess, and David Veredas. The impact of macroeconomic news on quote adjustments, noise, and informational volatility. Journal of Banking & Finance, 35(10):2733–2746, 2011.
[70] Yasser Roudi and John Hertz. Mean field theory for nonequilibrium network reconstruction. Physical Review Letters, 106(4):048702, 2011.
[71] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
[72] Christian Francq, Lajos Horvath, and Jean-Michel Zakoïan. Merits and drawbacks of variance targeting in GARCH models. Journal of Financial Econometrics, 9(4):619–656, 2011.
Appendix A Derivation of the theoretical AUC for the KIM
In this Appendix we provide the detailed derivation of the theoretical AUC shown in Fig. 1, as well as further comments on the choice of φ(g) to be Gaussian.
Following the definition of TPR and FPR, one can compute their expected values

TPR_φ(α, β) = (1/Z⁺_φ(β)) ∫_{g_i : p⁺ > α} dg_i φ(g_i) p⁺(β, g_i),   (A.1a)
FPR_φ(α, β) = (1/Z⁻_φ(β)) ∫_{g_i : p⁺ > α} dg_i φ(g_i) p⁻(β, g_i),   (A.1b)

where Z^±_φ(β) = p(s_i = ±1) is a normalization function, φ(g) is the unconditional distribution of the effective fields g_i, and we have abbreviated the probability of sampling a positive or negative value as

p^±(β, g_i) = e^{±β g_i} / (2 cosh(β g_i)).

The definition of the theoretical AUC then reads

AUC_φ(β) = ∫ TPR_φ(α, β) (∂FPR_φ(α, β)/∂α) dα,

that is, the area below the set of points (FPR(α), TPR(α)). The lower limit of the integration in Eqs. A.1 is g_min : p⁺(g_min) = α, which is found to be

g_min(α, β) = (1/2β) log(α/(1 − α)).

Then, applying the partial derivative to the definition of FPR, it follows that

∂FPR/∂α = −(1/Z⁻_φ(β)) (∂g_min/∂α) φ(g_min) (1 − α),

where we have substituted p⁻(β, g_min) = 1 − α. Plugging all the above results in the definition of AUC_φ we then find

AUC_φ(β) = (1/(Z⁺_φ(β) Z⁻_φ(β))) ∫ dα [∫_{g_min(α,β)}^{+∞} dg φ(g) e^{βg}/(2 cosh(βg))] × [φ(g_min(α, β)) / (2αβ)].   (A.2)

From an operational perspective φ(g) is the distribution that the effective fields show cross-sectionally across the whole sample, that is g_i(t) ∼ φ(g) ∀ i, t, but it can also be calculated by giving a prior distribution to the static parameters of the model, Θ = (J, h, b). Finding this distribution can be useful to provide an easier and more accurate evaluation of the expected AUC of a forecast at a given β value, as it provides a bridge from the model parameters to the AUC(β) we derived in Eq. A.2 and shown in Fig. 1 in the main text.
Let us assume, as is standard in the literature [14, 70, 36], that the parameters Θ are structured in such a way that

J_ij ∼ iid N(J₀/N, J²/N − J₀²/N²),
h_i ∼ iid N(h₀, h²),

while b_ik = 0 for simplicity. If that is the case, then the distribution of g_i(t) is itself Gaussian, as g_i(t) is now a sum of independent Gaussian random variables J_ij and h_i with random coefficients s_j(t).
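For a Gaussian φ(g), Eq. A.2 can be evaluated numerically. The sketch below is our own illustrative discretization of the α and g integrals, not the paper's code; the grid sizes and the truncation of α away from the endpoints (where g_min diverges) are arbitrary choices.

```python
import numpy as np

def theoretical_auc(beta, g_mean=0.0, g_std=1.0, n_alpha=2000, n_g=4000):
    """Numerically evaluate the theoretical AUC of Eq. (A.2) for a Gaussian
    phi(g) with mean g_mean and standard deviation g_std."""
    # grid over effective fields covering the bulk of phi(g)
    g = np.linspace(g_mean - 8 * g_std, g_mean + 8 * g_std, n_g)
    dg = g[1] - g[0]
    phi = np.exp(-0.5 * ((g - g_mean) / g_std) ** 2) / (g_std * np.sqrt(2 * np.pi))
    p_plus = np.exp(beta * g) / (2 * np.cosh(beta * g))   # p^+(beta, g)
    z_plus = np.sum(phi * p_plus) * dg                    # Z^+_phi(beta)
    z_minus = 1.0 - z_plus                                # Z^-_phi(beta)
    alphas = np.linspace(1e-4, 1 - 1e-4, n_alpha)         # avoid diverging g_min
    dalpha = alphas[1] - alphas[0]
    integrand = np.empty(n_alpha)
    for k, a in enumerate(alphas):
        g_min = np.log(a / (1 - a)) / (2 * beta)
        inner = np.sum(phi[g >= g_min] * p_plus[g >= g_min]) * dg
        phi_min = np.exp(-0.5 * ((g_min - g_mean) / g_std) ** 2) / (g_std * np.sqrt(2 * np.pi))
        integrand[k] = inner * phi_min / (2 * a * beta)
    return np.sum(integrand) * dalpha / (z_plus * z_minus)
```

As expected from the discussion in the main text, the value returned approaches 0.5 as β → 0 and increases monotonically with β.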
Let us also define two averaging operators: the average ⟨·⟩ over the distribution p of Eq. 1, also called the thermal average (which, the system being ergodic, coincides with a time average for T → ∞), and the average over the distribution of parameters, denoted with an overbar and also known as the disorder average. Following Mézard and Sakellariou [36] we can then find the unconditional mean of s_i, which reads

m_i = ⟨s_i(t)⟩ = ⟨tanh[β g_i(t)]⟩,   (A.3)

where we have substituted the conditional mean value of s_i(t) inside the brackets. This depends on the distribution of g_i(t): assuming stationarity and calling ⟨g_i⟩ = ⟨g_i(t)⟩ and Δ_i = ⟨g_i(t)²⟩ − ⟨g_i(t)⟩², we find that they are

⟨g_i⟩ = ⟨Σ_j J_ij s_j(t) + h_i⟩ = Σ_j J_ij m_j + h_i,   (A.4a)
Δ_i = ⟨(Σ_j J_ij s_j(t) + h_i)²⟩ − ⟨Σ_j J_ij s_j(t) + h_i⟩² = Σ_{j,k} J_ij J_ik [⟨s_j(t) s_k(t)⟩ − m_j m_k].   (A.4b)

In Eq. A.4b spins s_j(t) and s_k(t) are mutually conditionally independent under the distribution p: this means that the only surviving terms are those for j = k, and thus we find

Δ_i = Σ_j J_ij² (1 − m_j²).   (A.5)

Having determined the mean and variance of the effective field of spin i, we can now proceed to average over the disorder and find the unconditional distribution of effective fields at any time and for any spin, φ(g). First we realize that the average of Eq.
A.3 can now be substituted by a Gaussian integral

m_i = ∫ Dx tanh[β (⟨g_i⟩ + x √Δ_i)],   (A.6)

where Dx is the Gaussian measure of a standard variable x ∼ N(0, 1), and the mean of φ(g) is obtained by averaging

⟨g_i(t)⟩ = Σ_j J_ij m_j + h_i   (A.7)

over the disorder. Given the above results and the definition of J, the dependency between J_ij and m_j vanishes like O(1/N), which means that the two can be averaged over the disorder separately in the limit N → ∞. This results in the following expression for the unconditional mean of g_i(t):

ḡ = J₀ m̄ + h₀,   (A.8)

where m̄ is the disorder average of m_i, i.e. of the Gaussian integral in Eq. A.6. Both the integral and the average here are of difficult solution, and results have been provided by Crisanti and Sompolinsky [14]: they show that in the limit N → ∞ and with h_i = 0 ∀ i the system can be in one of two phases, a paramagnetic phase where m̄ = 0 if β is smaller than a critical threshold β_c(J) and J₀ <
1, and a ferromagnetic phase where m̄ ≠ 0 otherwise. In the following we report results for simulations in the paramagnetic phase, as the inference is not possible in the ferromagnetic phase. To give better intuition, let us consider the integral above in the limit β →
0: then we can expand the hyperbolic tangent around 0 to find (since x has zero mean)

m̄ ≈ β ḡ = β (J₀ m̄ + h₀),   (A.9)

which in turn leads to an approximated solution for ḡ in the limit β → 0:

ḡ ≈ h₀ (β J₀/(1 − β J₀) + 1) = h₀/(1 − β J₀).

Figure A.1: Comparison between the AUC estimated on data simulated from a Kinetic Ising Model and the theoretically derived AUC with Gaussian distribution of the J and h parameters, varying β and the hyperparameters J₀, J, h₀ and h. Parameter sets shown: (J₀, J, h₀, h) = (0, 1.5, 0, 1.5), (0, 0.5, 0, 0.5), (−1, 0.5, −1, 0.5), (1, 0.5, −1, 0.5). Plot points report average simulated values for a given β, with error bars.

Moving on to the variance of g, the calculation is straightforward. Adding the mean over the disorder to Eq. A.4b we find

σ_g² = ⟨g_i(t)²⟩ − ⟨g_i(t)⟩² (averaged over the disorder) = J² + h² − J₀² m̄².   (A.10)

Equations A.8 and A.10 can then be used to calculate, given the parameters of the distribution generating Θ, the mean and variance that are to be plugged in the distribution φ(g) of Eq. A.2.
We simulated a Kinetic Ising Model with N = 100 spins for T = 2000 time steps at different constant values of β and then measured the AUC of predictions assuming the parameters are known. In Fig. A.1 we report a comparison between these simulated values and the theoretical ones provided by Eq. A.2, varying β and the hyperparameters J₀, J, h₀ and h in the Gaussian setting we just discussed and adopting the expansion for β →
0. We see that the approximation for small β of Eq. A.9 does not affect the accuracy of the theoretical prediction for larger values of β, and that the mean is correctly captured by Eq. A.2. The only exception is found for large β when J₀ = 1, which according to the literature is close to the line of the ferromagnetic transition: in this case the small-β approximation fails to predict the simulated values. Larger values of N and T (not shown here) produce narrower error bars.
The general effect we see from Fig. A.1 is that a higher variance of the J and h parameters leads to higher AUC values leaving all else unchanged (orange squares and yellow circles), while moving the means has little effect as long as the system is in its paramagnetic phase.
These results are easy to obtain thanks to the assumption that the model parameters J and h have Gaussian distributed entries, but in principle the distribution φ(g) can be derived also for other distributions, albeit probably requiring numerical solutions rather than the analytical ones we presented here.

Appendix B Maximum Likelihood Estimation
As mentioned in the main text, our estimation procedure is done in steps, starting by estimating the parameters Θ = (J, h, b) of the standard KIM and then running a targeted estimation for the w, B and A parameters. In this Appendix we provide some further details about this procedure.
The whole process can be summarized as the maximization of the log-likelihood L(Θ, β(t), w, B, A) of the model in question, which in the case of the DyNoKIM reads (setting as usual h_i = b_i = 0 ∀ i)

L(Θ, β(t), w, B, A) = Σ_{t=1}^{T} [ Σ_i β(t) Σ_j s_i(t+1) J_ij s_j(t) − log Z(t) ],   (B.1a)

with

log β(t+1) = w + B log β(t) + A I^{−1/2}(t) ∇_t,   (B.1b)

and the definitions of the various quantities are given in Section 2 in the main text. The log-likelihood shown above has a recursive form, as each term in the sum of Eq. B.1a depends on β(t), which is determined recursively through Eq. B.1b from a starting condition β(1). This means that, if one were to maximize L with respect to all the parameters by applying a standard Gradient Descent method, at each computation of L and its gradient it would be necessary to compute the recursion, resulting in a slow and computationally cumbersome process. In order to make the estimation quicker, we implement our multi-step procedure, relying on existing methods for the estimation of the standard KIM and of observation-driven models.
Our first step consists of maximizing L with respect to the standard KIM parameters Θ. This is done adopting the Mean Field approach of Mézard and Sakellariou [36], which is both fast and accurate in the estimation of fully connected models. We refer the interested readers to the original publication for further details on the method itself.
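Once Θ is fixed, the recursion of Eq. B.1b can be run as a filter. The sketch below is a minimal illustration for the DyNoKIM special case with h = b = 0: the score ∇_t and its Fisher scaling I(t) are written out explicitly from the likelihood of Eq. B.1a, while the function name and array conventions are our own.

```python
import numpy as np

def dynokim_filter(S, Jmat, w, B, A, f0=0.0):
    """Filter the time-varying noise level beta(t) via Eq. (B.1b),
    with f(t) = log beta(t).

    S: (T, N) array of spins in {-1, +1}; Jmat: (N, N) estimated couplings."""
    T, N = S.shape
    f = np.empty(T)
    f[0] = f0
    for t in range(T - 1):
        beta = np.exp(f[t])
        g = Jmat @ S[t]                          # effective fields g_i(t)
        resid = S[t + 1] - np.tanh(beta * g)     # prediction errors
        score = beta * np.sum(g * resid)         # d log-lik / d f(t)
        fisher = beta**2 * np.sum(g**2 * (1.0 - np.tanh(beta * g) ** 2))
        scaled = score / np.sqrt(max(fisher, 1e-12))   # I^{-1/2}(t) * score
        f[t + 1] = w + B * f[t] + A * scaled     # Eq. (B.1b)
    return np.exp(f)                             # filtered beta(t)
```

With w, B and A estimated as described below, a single pass of this filter produces the whole path of β(t) at negligible cost.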
The main reason to detach this step from the optimization of the complete log-likelihood is that Θ contains a large number of parameters: if one can get an estimate for those without resorting to slow and hard-to-tune Gradient Descent methods, the computational cost of the inference reduces significantly.
Given the values of Θ obtained in the first step, we then move to the targeted estimation of w, B and A. This consists in first estimating a target value f̄ for the unconditional mean of f(t) = log β(t), and then optimizing w, B and A while keeping the ratio w/(1 − B) = f̄ fixed. To estimate f̄ we maximize the log-likelihood of Eq. B.1a temporarily imposing A = B = 0, hence Eq. B.1b becomes log β(t) = f̄ = const. Finally, given this target value, we optimize L with respect to w, B and A maintaining the ratio w/(1 − B) = f̄ fixed and setting f(1) = f̄ to start the recursion of Eq. B.1b. During these last two steps we use the ADAM (Adaptive Moment estimation) [71] Stochastic Gradient Descent method as optimization algorithm, as we found that in our case it had better performance with respect to other available methods.
This targeted estimation is not strictly necessary (one could directly estimate w, B and A together), but it is a standard procedure in the estimation of observation-driven models like the GARCH [72], as it typically reduces the total number of iterations of gradient descent.
We point out one last remark concerning the indetermination of ⟨β⟩ in the model of Eq. 4 and of ⟨β⟩ in Eq. 7, which is crucial to understand the results of Section 3. The fact that these values cannot be identified is not problematic per se, but requires caution when comparing models and filtered parameters across different samples, or when comparing estimates with simulations.
To avoid misleading results, one needs to enforce the sample mean of the filtered β(t) (or of each of the elements of β(t) in the DyEKIM) to be equal to a reference value, which without loss of generality we pick to be ⟨β⟩ = 1. This is easily done by running the estimation and filtering, then measuring ⟨β⟩ and rescaling β′(t) = β(t)/⟨β⟩. To leave the model unchanged, an opposite rescaling is needed for the parameters J, h and b, each having to be multiplied by ⟨β⟩. This transformation does not change the log-likelihood, thus the model parameters are still MLE, but crucially it allows to set a reference value for β that solves the indetermination.
Given this remark, in all the simulations shown in Section 3 where the data generating process of β(t) is misspecified, we generate its values making sure that their sample mean is 1. By doing so we do not lose any generality in our results, as the indetermination needs to be solved for the data generating process too if one wants to obtain meaningful results, and we are able to correctly compare the simulated values of J and β(t).
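The rescaling that fixes the indetermination takes only a few lines. This is an illustrative sketch assuming plain NumPy arrays; the function name is ours.

```python
import numpy as np

def normalize_beta(beta_t, Jmat, h, b):
    """Fix the scale indeterminacy of the model: rescale the filtered
    beta(t) to have sample mean one, and compensate in (J, h, b) so that
    every product beta(t) * J_ij (and likewise for h and b), and hence the
    log-likelihood, is unchanged."""
    scale = beta_t.mean()                 # the sample mean <beta>
    return beta_t / scale, Jmat * scale, h * scale, b * scale
```

Applying this transformation after estimation makes filtered paths of β(t) comparable across samples and against simulations, exactly as required in Section 3.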