Tail Granger causalities and where to find them: extreme risk spillovers vs. spurious linkages
Piero Mazzarisi, Silvia Zaoli, Carlo Campajola, Fabrizio Lillo
Abstract.
Identifying risk spillovers in financial markets is of great importance for assessing systemic risk and for portfolio management. Granger causality in tail (or in risk) tests whether past extreme events of a time series help predicting future extreme events of another time series. The topology and connectedness of networks built with Granger causality in tail can be used to measure systemic risk and to identify risk transmitters. Here we introduce a novel test of Granger causality in tail which adopts the likelihood ratio statistic and is based on the multivariate generalization of a discrete autoregressive process for the binary time series describing the sequence of extreme events of the underlying price dynamics. The proposed test has very good size and power in finite samples, especially for large sample size, allows inferring the correct time scale at which the causal interaction takes place, and is flexible enough for multivariate extension when more than two time series are considered, in order to decrease false detections as spurious effects of neglected variables. An extensive simulation study shows the performance of the proposed method with a large variety of data generating processes, and includes a comparison with the test of Granger causality in tail by [Hong et al., 2009]. We report both advantages and drawbacks of the different approaches, pointing out some crucial aspects related to false detections of Granger causality for tail events. An empirical application to high frequency data of a portfolio of US stocks highlights the merits of our novel approach.
Introduction
The problem of inferring causal interactions from data has a remarkable history in scientific research, and the milestone work of Granger [Granger, 1969] represented the turning point in this study. According to Granger causality, given a couple of time series x and y, it is said that y 'Granger-causes' x if the past information on y helps in forecasting x better than using only the past information on x. Granger causality overcomes the philosophical question of properly defining what 'true causality' means, by limiting the study to systems whose state can be assessed quantitatively by means of time series and by relying on the concept of 'predictive causality' [Granger, 1980]. Within this framework, Granger causality is pragmatic, well defined, and has exhibited many successful applications in a variety of fields, from quantitative finance [Billio et al., 2012, Corsi et al., 2018] to transportation [Zanin et al., 2017] and neuroscience [Seth et al., 2015].

In time series econometrics, the most commonly used test of Granger causality for bivariate systems is the F-test originally proposed in [Granger, 1969]. This test is sometimes referred to as causality in mean, since the statistical testing procedure is based on the evaluation of the forecasting performance associated with the first moment of a time series. The most stringent assumption consists in considering the information on the two time series as all the significant information when testing for Granger causality. This is a strong assumption because, in the case of a high dimensional system, a low dimensional subprocess contains little information about the true structure of interactions, and some causal relations might be falsely detected as a spurious effect of neglected variables [Lütkepohl, 1982]. Starting with the seminal work of [Geweke, 1982], multivariate approaches have been proposed to correct these spurious effects by taking into account network effects in the statistical testing procedure.
(1) University of Bologna, Department of Mathematics
(2) Scuola Normale Superiore, Pisa
(3) Quantitative Life Science Section, The Abdus Salam International Centre for Theoretical Physics (ICTP), Trieste
∗ Corresponding author at: Scuola Normale Superiore, Piazza dei Cavalieri, 7, 56126 Pisa (PI), Italy
E-mail addresses: [email protected], [email protected], [email protected], [email protected]
Date: May 5, 2020.
Key words and phrases: Granger causality in risk, Spillover effects, Financial contagion, Autoregressive processes, Likelihood-Ratio test. JEL: C12, C30, G10.

Finally, Granger causality has been investigated from a theoretical point of view, moving from econometrics to information theory: in particular, the equivalence with transfer entropy was proved in the Gaussian case [Barnett et al., 2009], which is in turn equivalent to the log-likelihood ratio statistic [Barnett and Bossomaier, 2012]. In practice, this implies, among other things, that the Likelihood-Ratio test is supported as a statistical method for inferring Granger causality. The advantage relies on less stringent hypotheses about the statistical distributions with respect to the F-test.

Granger causality proved useful in monitoring systemic risk in financial markets. In fact, recent applications showed that it captures the network of risk propagation between market participants, and the degree of interconnectedness of this network can be used to define indicators of systemic risk. For instance, [Billio et al., 2012] adopted Granger causality in mean as a proxy for return-spillover effects among hedge funds, banks, broker/dealers, and insurance companies (e.g., see [Danielsson et al., 2012, Battiston et al., 2012, Duarte and Eisenbach, 2014] for theoretical studies on spillover effects during financial crises), showing that the financial system had become considerably more interconnected before the financial crisis of 2007-2008, because financial innovations and deregulation had increased the interdependence of business between such investors.

The original Granger causality test evaluates the forecasting performance giving equal importance to the ability to forecast average or extreme values, negative or positive ones. However, when monitoring financial risk, extreme downside market movements are much more important for spillover effects than small fluctuations. A method specific for risk measures, in particular volatility, was introduced by [Cheung and Ng, 1996] with the concept of Granger causality in variance, which extends the concept of causation to the second moment. Nevertheless, variance is a two-sided risk measure and is not able to capture heavy tails, and thus the causal relations between extreme events of two time series. To this end, Granger causality in tail can be defined. The concept was first introduced by [Hong et al., 2009].
In this work, the authors proposed a kernel-based test to detect extreme downside risk spillovers with a statistical procedure in two steps: (i) measuring risk by the left (or right, depending on the application) tail of the distribution, i.e. the Value at Risk, and (ii) testing for non-zero lagged cross-correlations between the two binary time series representing the occurrences of extreme left (right) tail events, with a method based on spectral analysis. Based on this test, e.g., [Corsi et al., 2018] studied the network of causal relations detected by Granger causality in tail for a bipartite financial system of banks and sovereign bonds and, combining measures of network connectedness with the ratings of the sovereign bonds, proposed a flight-to-quality indicator to identify periods of turbulence in the market. However, as we show below, the statistical test of Granger causality in tail by [Hong et al., 2009] displays some sensitivity to both non-zero auto-correlation and instantaneous cross-correlation in the binary time series representing extreme events, resulting in an increased rate of false causality detections. Additionally, the test by [Hong et al., 2009] is by construction a pairwise causality analysis, thus sensitive to false detections when variables of some importance, different from the two under investigation, are not considered.

In this paper we propose a different approach to identify Granger causality in tail, which overcomes some of the issues of the Hong et al. method. Differently from the latter, our approach is parametric and models explicitly the causal interactions between time series. Thus, it is less sensitive to other possible effects, such as autocorrelation, which may result in spurious detections as mentioned above, and it can be generalized to multivariate settings. Moreover, having an explicit model of the causation process allows us to devise a method based on the Likelihood-Ratio to test for Granger causality in tail.
Specifically, the causation mechanism is captured by a process in which the extreme events are modeled according to a discrete autoregressive process of order p, namely DAR(p). The DAR(p) process, first introduced by [Jacobs and Lewis, 1978], is the natural extension of the standard autoregressive process to binary time series. We consider a multivariate generalization of the DAR(p) process, where the binary variable X describing the occurrence of an extreme event for the underlying time series x can be copied from its own past values or from the past values of the extreme events of another time series y. We first propose a statistical test based on the bivariate version of the model, for any p. Then, to overcome the limit of pairwise analysis highlighted above, we also propose a statistical method for the multivariate case, with p = 1 (Markovian dynamics).

Our findings show that the detection of causality between extreme events is far from a trivial task, with a significant dependence on the adopted statistical procedure. In fact, we show that the proposed method and the current standard in the literature, represented by the test of [Hong et al., 2009], differ under some circumstances. First, as mentioned above, the test by Hong et al. displays some sensitivity to auto-correlation and cross-correlation of the time series of extreme events, which are, on the contrary, naturally accounted for by our framework. We show numerically concrete examples of such behavior and of the spurious effect of two-way causality detection in the presence of unidirectional relations, a drawback of Hong et al.'s test which is solved by our method. As a consequence, we claim that our method should be preferable in such cases. At the same time, we also present cases where the approach by Hong et al. outperforms ours, for instance when the underlying dynamics of the time series is autoregressive and heteroskedastic, even if the discrepancy is typically quite small.
Second, we point out the importance of a multivariate approach, numerically showing the consequences of network effects in the spurious detection of Granger causality in tail relations when adopting a pairwise analysis. We then propose a possible solution within our framework. Finally, in an empirical application to high-frequency price returns of a portfolio of stocks traded in the US stock markets, we highlight that different methods yield different networks of causal interactions between stocks. First, while Hong et al.'s approach results in an almost complete graph, both the pairwise and multivariate versions of the statistical method we propose give much sparser networks. Hence, the measured level of causality in the system dynamics depends on the chosen approach. Additionally, the captured dynamics of the network evolution itself is quite different, in particular in relation to the financial crisis of 2007-2008: no patterns are recognized by using the test of Hong et al., while, with our method, a sharp transition characterizes the number of causal interactions between the stocks composing the financial sector and the others. In particular, we find that the financial sector started to be less 'Granger-caused' by the other stocks before the financial crisis, while it 'Granger-causes' the others more than the average over the whole considered period. Thus, our findings open a discussion about the correct evaluation of a Granger causality relation between tail events and, in this paper, we highlight both advantages and drawbacks of the different approaches, together with some signals associated with false detections.

The paper is organized as follows. Section 1 presents the general definition of Granger causality in tail and explains how to obtain the sequence of extreme events from data. We also discuss the importance of the multivariate approach to avoid false detections due to network effects.
Section 2 introduces the novel methodology and describes how to construct the test statistics. Section 3 presents some Monte Carlo exercises to validate numerically the novel approach and to compare it with the test by [Hong et al., 2009]. Section 4 shows a financial application of the method. Finally, Section 5 concludes. Technical details are reported in Appendix A.

1. Granger causality in tail
As introduced for the first time by [Granger, 1969], the concept of Granger causality relies on testing whether the past information on a time series y is statistically useful in predicting the future of another time series x, better than using only the past information on x. In the original version of the test, the information on past realizations of the two time series defines the information set, which is also called the Universe. The information set may also include the information on other variables.

Here, we study Granger Causality (GC) in tail, that is a generalization of the standard Granger causality focusing on the prediction of extreme events represented by binary time series. With an aim similar to that of [Granger, 1969], we define:
Definition 1.1 (Granger Causality in tail). A time series y 'Granger-causes in tail' another time series x if, given some information set, we are able to predict the occurrence of an extreme event for x using the past information on extreme events of y better than if the information on extreme events of y is not considered.

To characterize whether the occurrence of an extreme event for y can help predicting the occurrence of an extreme event for x in the spirit of Granger causality, we need to specify how an extreme event is defined, giving an operational meaning to Definition 1.1.

1.1. Extreme events and Value at Risk.
Assume to observe a realization at time t of a random variable x, i.e. x_t ∈ R. Given the distribution of the random variable conditional on some suitably defined information set I_{t−1} up to time t−1, the realization x_t is defined as extreme if it is in the left (or, equivalently, right) tail of the distribution.

A natural way to characterize the tail of a distribution is the Value at Risk (VaR), see, e.g., [Roy, 1952]. In the case of the left tail, for a given probability level ρ ∈ (0, 1), the Value at Risk VaR_{ρ,t} at time t is the quantile of the distribution of x conditional on the information set I_{t−1} and associated with the probability ρ. It is implicitly defined by

P(x_t < VaR_{ρ,t} | I_{t−1}) = ρ almost surely (a.s.).

Given the time series {x_t}_{t=1,...,T} which describes the stochastic process for x at discrete time, there exist several methods to estimate the Value at Risk of the distribution of x at time t: Monte Carlo simulation methods, Hansen's autoregressive conditional density estimation [Hansen, 1994], Morgan's RiskMetrics [Morgan, 1996], historical estimation of the realized variance [Barndorff-Nielsen and Shephard, 2002], and Engle and Manganelli's conditional autoregressive VaR (CAViaR) model [Engle and Manganelli, 2004]. Here, we consider the stochastic process in discrete time, with the unit value representing the time scale of observations.

In the case of financial time series, we may be interested, e.g., in extreme variations of prices; thus we can compare the price return x_t with an estimate of the instantaneous or spot volatility σ_t, i.e.

x_t / σ_t < θ,

where the value of θ is either chosen as a free parameter or determined by the desired probability level ρ of the Value at Risk, given some assumption on the probability distribution of returns.
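As an illustration, the thresholding rule above can be sketched in a few lines (a minimal sketch with our own illustrative choices: the function name, the rolling-window standard deviation used as a stand-in for a proper spot-volatility estimator, and the value of θ are not from the paper):

```python
import numpy as np

def hit_series(returns, theta=-1.5, window=50):
    """Map returns x_t to binary extreme events X_t = 1{ x_t / sigma_t < theta }.

    sigma_t is a naive rolling standard deviation of the past `window`
    returns, standing in for any spot-volatility estimator; theta < 0
    selects left-tail (downside) events.
    """
    x = np.asarray(returns, dtype=float)
    X = np.zeros(len(x), dtype=int)
    for t in range(window, len(x)):
        sigma_t = x[t - window:t].std()   # uses only past information
        if sigma_t > 0 and x[t] / sigma_t < theta:
            X[t] = 1
    return X
```

For a target probability level ρ, one would instead set θ to the ρ-quantile of the assumed conditional distribution of the standardized returns.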
The crucial aspects of this approach are the proper estimation of the spot volatility, see [Barndorff-Nielsen and Shephard, 2004, Mancini, 2009, Corsi et al., 2010], and the choice of the conditional density (when one is interested in mapping θ to ρ).

Sometimes, the tail probabilities depend not only on the mean and the variance of the distribution, but also on other moments such as skewness and kurtosis, e.g. see the empirical financial studies by [Harvey and Siddique, 1999, Harvey and Siddique, 2000, Jondeau and Rockinger, 2003]. Some econometric models can capture time-varying higher order conditional moments, such as [Gallant and Tauchen, 1996, Hansen, 1994].

Finally, given some estimate of the Value at Risk, we can map the time series of realizations {x_t}_{t=1,...,T} of x to the binary time series {X_t}_{t=1,...,T} of extreme events of x, by defining X_t = 1 if x_t < VaR_{ρ,t}, and X_t = 0 otherwise.

1.2. Null vs Alternative Hypotheses for GC in tail.
The statistical test of Granger causality in tail as defined in Definition 1.1 can be formalized as follows.

Given the time series of extreme events (or hits) associated with x and y, i.e. X and Y, and all other information about the universe included in the information set I, the null hypothesis that y does not 'Granger-cause in tail' x can be stated as

H_0: P(X_t = 1 | I_{t−1}) = P(X_t = 1 | I_{t−1} \ {Y_s}_{s=1,...,t−1}) a.s. ∀t,   (1.1)

where I_{t−1} ≡ {{X_s}_{s=1,...,t−1}, {Y_s}_{s=1,...,t−1}, U_{t−1}}, with U_{t−1} the set of all other available information (on other time series, possibly) up to time t−1.

On the other hand, the alternative hypothesis is

H_A: P(X_t = 1 | I_{t−1}) ≠ P(X_t = 1 | I_{t−1} \ {Y_s}_{s=1,...,t−1}) a.s. ∀t.   (1.2)

We say that y 'Granger-causes in tail' x if the null hypothesis H_0 is rejected. Thus, under the alternative hypothesis H_A, the information on the past extreme events of y, i.e. {Y_s}_{s=1,...,t−1}, can be used to obtain a better prediction of the occurrence of an extreme event of x, i.e. X_t = 1, with respect to the prediction obtained not accounting for it.

1.3. Hong et al.'s test.
The method proposed by [Hong et al., 2009] is a kernel-based non-parametric test for Granger causality in tail, which is built on the null hypothesis (1.1) and has a test statistic based on the normalized cross-spectral density between the two time series X and Y,

f(ω) = (1/2π) Σ_{τ=−∞}^{+∞} ρ(τ) e^{−iωτ},   (1.3)

with ρ(τ) ≡ Corr(X_t, Y_{t−τ}). In particular, under the null hypothesis (1.1), it is ρ(τ) = 0 ∀τ > 0. Thus, using spectral methods similar to [Hong, 2001], a test statistic can be defined to control for non-zero lagged cross-correlations between the two binary time series {X_t}_{t=1,...,T} and {Y_t}_{t=1,...,T}. The statistical rejection of zero cross-correlation coefficients is thus a signal of a causality relation. This approach considers the possibility of a causal interaction at any possible time lag. However, because of the finite sample size, the largest time scale is imposed in practice by the kernel estimator of the cross-spectral density (1.3), in particular by the kernel non-uniform weighting, whose effective window is fixed by an integer parameter M. In the following, we adopt the Daniell kernel, which displayed the best performance in the numerical analysis presented in [Hong et al., 2009], with M = 5 when not specified differently. (In the case of the right tail, it is x_t > VaR_{1−ρ,t}.)

1.4. Network effects and multivariate analysis.
As pointed out in the original paper on Granger causality [Granger, 1969], the definition of the Universe is crucial. In the analysis of a causality relation between x and y, the time series {x_t, y_t}_{t=1,...,T} are considered as all the available information. This assumption is strong and unrealistic. In fact, consider the simplest case in which all relevant information is numerical in nature and refers to three stochastic processes x, y, and z. Now suppose we test the presence of a causal relation between x and z, whereas the true causality chain is from x to y, and from y to z. Then, neglecting the information on y when testing causality between x and z can induce a spurious causality detection. This is similar to the spurious correlation between sets of data that arises when some other statistical variable of importance has not been included.

False detection of causal relations becomes relevant when we analyze a system displaying by nature a complex network of interactions between its many subparts. In this scenario, a pairwise causality analysis applied to all possible pairs of elements will capture the network of causal relations, but also spurious effects.
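The chain mechanism just described is easy to reproduce numerically (a toy sketch; the copy probabilities and marginals are our own illustrative choices): z is driven only by y, and y only by x, yet the lag-2 cross-correlation between z and x is clearly non-zero, which is exactly what fools a pairwise test.

```python
import numpy as np

rng = np.random.default_rng(42)
T = 20_000

# True causal chain x -> y -> z, with NO direct x -> z link:
# each series copies its driver's previous value with probability 0.5,
# otherwise draws a fresh Bernoulli(0.1) event.
x = rng.binomial(1, 0.1, size=T)
y = np.zeros(T, dtype=int)
z = np.zeros(T, dtype=int)
for t in range(1, T):
    y[t] = x[t - 1] if rng.random() < 0.5 else rng.binomial(1, 0.1)
    z[t] = y[t - 1] if rng.random() < 0.5 else rng.binomial(1, 0.1)

def lagged_corr(a, b, lag):
    """Sample Corr(a_t, b_{t-lag})."""
    return np.corrcoef(a[lag:], b[:-lag])[0, 1]

# z reflects x at lag 2 through y (about 0.25 here), so a pairwise
# analysis of (x, z) alone would flag a spurious direct causality.
print(lagged_corr(z, x, 2))   # clearly positive
print(lagged_corr(z, x, 1))   # approximately zero
```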
This could be the case, for example, of the financial networks of investors [Billio et al., 2012] or of banks and bond investments [Corsi et al., 2018], where the degree of connectedness in the Granger causality network (both in the in-mean and in-tail cases) is used to construct indicators of financial distress. Thus, reducing false detections is of paramount importance for the correct evaluation of such indicators. To this end, the causality analysis needs to be generalized to the multivariate case, thus accounting for the information on all the subparts composing the whole system under investigation (while assuming that no important variables have been excluded).

Granger causality in mean has been extended to the multivariate case using the vector autoregressive process VAR(p) for more than two variables [Geweke, 1982], and several procedures have been introduced to validate the statistical significance of the off-diagonal couplings, i.e. non-zero parameters capturing the presence of causal interactions, such as the LASSO penalization [Tibshirani, 1996] or methods based on transfer entropy [Barrett et al., 2010]. In the next Section, after introducing a pairwise test of Granger causality in tail based on Discrete AutoRegressive DAR(p) processes [Jacobs and Lewis, 1978] for binary time series, which we generalize to the bivariate case, we also present a Markovian (p = 1) extension to the multivariate case with more than two variables, in the same spirit of the milestone works of [Granger, 1969, Geweke, 1982].

2. Methods and test statistics
In this section we propose the multivariate generalization of the DAR(p) process to describe jointly the dependence structure of N binary random variables which represent the extreme events of some underlying time series. The introduction of this modeling framework permits testing for the presence of non-zero (lagged) interaction terms between the binary variables, which signal causal relations.

The discrete autoregressive model DAR(p) was introduced for the first time by [Jacobs and Lewis, 1978] and recently applied to the modeling of temporal networks [Williams et al., 2019a, Mazzarisi et al., 2020, Williams et al., 2019b]. In the most general setting, it describes the time evolution of a categorical variable which has p-th order Markov dependence and a multinomial marginal distribution. Here we are interested in the case of a binary random variable X, thus the marginal distribution is Bernoulli. If X_t is the realization of X at time t, we have

X_t = V_t X_{t−τ_t} + (1 − V_t) Z_t,   (2.1)

where X_t ∈ {0, 1} ∀t, V_t ∼ B(ν̃) is a Bernoulli random variable with ν̃ ∈ [0, 1], τ_t ∼ M(γ̃_1, ..., γ̃_p) is a multinomial random variable with γ̃ ≡ {γ̃_j}_{j=1,...,p} such that Σ_{i=1}^p γ̃_i = 1, and Z_t ∼ B(χ̃) is a Bernoulli random variable with χ̃ ∈ [0, 1]. In other words, at each time, V_t determines whether X_t is copied from the past or sampled from the marginal. When X_t is copied from the past, the multinomial random variable τ_t selects the time lag and, accordingly, which past realization of X we copy.

The process defined by (2.1) has the property that the autocorrelation at any lag is larger than or equal to zero. This is by construction, since ν̃, γ̃_1, ..., γ̃_p are non-negative.
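For concreteness, the copy mechanism of (2.1) can be simulated directly (a sketch; the function name and parameter values are our own illustrative choices):

```python
import numpy as np

def simulate_dar(T, nu, gamma, chi, seed=0):
    """Simulate a DAR(p) series: X_t = V_t * X_{t - tau_t} + (1 - V_t) * Z_t.

    nu    : copy probability, V_t ~ Bernoulli(nu)
    gamma : length-p lag probabilities, tau_t ~ Multinomial(gamma)
    chi   : marginal event probability, Z_t ~ Bernoulli(chi)
    """
    rng = np.random.default_rng(seed)
    p = len(gamma)
    X = rng.binomial(1, chi, size=p).tolist()   # initial conditions
    for t in range(p, T):
        if rng.random() < nu:                    # copy from the past ...
            tau = 1 + rng.choice(p, p=gamma)     # ... at lag tau in {1, ..., p}
            X.append(X[t - tau])
        else:                                    # ... or sample the marginal
            X.append(int(rng.random() < chi))
    return np.array(X)
```

By construction the marginal of X_t stays Bernoulli(chi), while nu and gamma only shape the (non-negative) autocorrelation.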
Moreover, the Yule-Walker equations associated with (2.1) are formally equivalent to the ones of the standard AR(p) process, see [Jacobs and Lewis, 1978].

We consider the generalization of the process (2.1) to the case of a multivariate (binary) time series X ≡ {X^i_t}_{i=1,...,N; t=0,1,...,T}. With respect to the univariate DAR(p), the difference is that when X^i_t is copied from the past, it can be copied either from its own past values or from the past values of one of the other N − 1 variables. Specifically, the binary random variable X^i_t is copied from the past with probability ν_i and, in this case, it selects which variable j to copy according to the probabilities λ_{ij}, and the time lag according to the multinomial τ^{ij}_t. Otherwise, it is sampled from its marginal Bernoulli distribution with parameter χ_i. We can write the evolution of X^i as

X^i_t = V^i_t X^{J^i_t}_{t − τ^{i J^i_t}_t} + (1 − V^i_t) Z^i_t,   (2.2)

where X^i_t ∈ {0, 1} ∀i, t, V^i_t ∼ B(ν_i) with ν_i ∈ [0, 1] ∀i, τ^{ij}_t ∼ M(γ_{ij,1}, ..., γ_{ij,p}) ∀i, j with γ_{ij} ≡ {γ_{ij,s}}_{s=1,...,p} such that Σ_{s=1}^p γ_{ij,s} = 1, J^i_t ∼ M(λ_{i1}, ..., λ_{iN}) is a multinomial random variable with λ_i ≡ {λ_{ij}}_{j=1,...,N} such that Σ_{j=1}^N λ_{ij} = 1 ∀i, and Z^i_t ∼ B(χ_i) with χ_i ∈ [0, 1] ∀i. For notational simplicity, let us define γ ≡ {γ_{ij}}_{i,j=1,...,N} and λ ≡ {λ_i}_{i=1,...,N}. We refer to (2.2) as the Vector Discrete AutoRegressive VDAR(p) process.

If the off-diagonal interaction term λ_{ij} with i ≠ j is non-zero, a causal relation in the sense of Granger is present between X^i and X^j, that is, the occurrence of X^j_t = 1 (an extreme event for the underlying time series x^j) increases the probability of observing a hit X^i_{t+t′} = 1 at some future time t + t′ (depending on τ^{ij}). Therefore, to test for Granger causality in tail between X^i and X^j, we must assess the statistical significance of the off-diagonal term λ_{ij} of the process (2.2).

In principle, this approach allows analyzing the time scale at which a causal relation occurs, by properly selecting the time lag associated with the mechanism of copying from the past. However, the number of parameters of the model (2.2) grows as O(N^2 p), thus some restrictions of the parameter space are needed for computational reasons. For this reason, in Sections 2.1 and 2.2 we present two possible such restrictions to construct a statistical test: the bivariate case N = 2 with generic p, and the multivariate Markovian case p = 1 with N > 2.

2.1. Pairwise causality analysis.
Pairwise analysis aims at detecting Granger causality in tail between two time series, and it excludes the information on any other variables. This approach is sufficient if, at least approximately, no other variable affects the dynamics of the two considered ones. In this situation, in fact, we can ignore the risk of spurious detections discussed in Section 1.4. Let {X_t}_{t=1,...,T} and {Y_t}_{t=1,...,T} be the binary time series representing the occurrences of extreme events. We describe their dependence structure with the bivariate VDAR(p) process (2.2).

2.1.1. Bivariate VDAR(p).
The bivariate version of the model (2.2) can be specified more explicitly as

X_t = V^1_t ((1 − J^1_t) X_{t−τ^{11}_t} + J^1_t Y_{t−τ^{12}_t}) + (1 − V^1_t) Z^1_t,
Y_t = V^2_t (J^2_t X_{t−τ^{21}_t} + (1 − J^2_t) Y_{t−τ^{22}_t}) + (1 − V^2_t) Z^2_t,   (2.3)

where X_t, Y_t ∈ {0, 1} ∀t, V^i_t ∼ B(ν_i) with ν_i ∈ [0, 1] ∀i = 1, 2, J^i_t ∼ B(λ_i) with λ_i ∈ [0, 1] ∀i = 1, 2, and τ^{ij}_t ∼ M(γ_{ij,1}, ..., γ_{ij,p}) with Σ_{s=1}^p γ_{ij,s} = 1. The marginals Z^1_t and Z^2_t are also Bernoulli random variables, with distributions B(χ_1) and B(χ_2), respectively, with χ_1, χ_2 ∈ [0, 1].

The process (2.3) describes the evolution of the binary time series X as follows. At time t, (i) V^1_t determines whether X_t is copied from the past (with probability ν_1) or not; (ii) if it is, J^1_t determines whether it is copied from a past value of X, namely X_{t−τ^{11}_t} (with probability 1 − λ_1), or from a past value of Y, namely Y_{t−τ^{12}_t} (with probability λ_1); (iii) the time lag is determined by the multinomial random variable τ^{11}_t (for X_{t−τ^{11}_t}) or τ^{12}_t (for Y_{t−τ^{12}_t}); (iv) if X_t is not copied from the past, its value is 1 with probability χ_1 and 0 otherwise. The evolution of the binary time series Y is equivalent. Hence, the parameter λ_1 controls the level of dependence of X on Y (and vice versa when considering λ_2): conditional on the probability that a past event affects the current realization (i.e. ν_1 > 0), the larger λ_1, the larger the probability that the occurrence of an extreme event Y_{t−τ^{12}_t} = 1 in the past triggers an extreme event X_t = 1. If this is the case, taking into account the past information on Y helps in forecasting X; thus there exists a causal relation from Y to X.

We can test for Granger causality in tail by constructing a test statistic based on the Likelihood-Ratio as follows.

2.1.2. Likelihood-Ratio (LR) statistic.
In order to construct the statistical test for Granger causality intail between two time series, we need to assess the statistical significance of the off-diagonal autoregressivecoefficients of the bivariate VDAR(p) process (2.3). We propose to adopt the Likelihood-Ratio (LR) test[Hansen, 1992] by stating the null hypothesis (1.1) together with the alternative hypothesis (1.2) interms of the likelihood of two competing models, namely VDAR(p) and DAR(p). This is further in linewith [Barnett et al., 2009, Barnett and Bossomaier, 2012], where authors notice that the GC (in mean)statistic may be formalized as a likelihood-ratio test. he null hypothesis H (1.1) that the time series Y does not Granger-cause in tail the time series X can be stated as H : P DAR ( p ) ( X t = 1 |{ X t − s } s =1 ,..,p , ˜ ν, ˜ γ , ˜ χ )= P V DAR ( p ) ( X t = 1 |{ X t − s } s =1 ,..,p , { Y t − s } s =1 ,..,p , ν , λ , γ , χ ) a.s. (2.4)versus the alternative hypothesis H A (1.2) formulated as H A : P DAR ( p ) ( X t = 1 |{ X t − s } s =1 ,..,p , ˜ ν, ˜ γ , ˜ χ ) = P V DAR ( p ) ( X t = 1 |{ X t − s } s =1 ,..,p , { Y t − s } s =1 ,..,p , ν , λ , γ , χ ) a.s. 
(2.5)where P DAR ( p ) ( X t = 1 |{ X t − s } s =1 ,..,p , ˜ ν, ˜ γ , ˜ χ ) is the probability of an extreme event for the DAR(p)model (2.1), whereas P V DAR ( p ) ( X t = 1 |{ X t − s } s =1 ,..,p , { Y t − s } s =1 ,..,p , ν , λ , γ , χ ) is for the VDAR(p)model (2.3).Notice that the two considered models are nested, since the ‘full’ VDAR(p) model (2.3) contains all theterms of the ‘restricted’ DAR(p) model (2.1), but including also the ‘off-diagonal’ terms of interaction.Thus, to make testable the null hypothesis (2.4), we can apply the likelihood-ratio test [Hansen, 1992]to assess the goodness of fit of the two competing nested models by evaluating how much better thefull model works with respect to the restricted one: if the null H is supported by the observed data,the likelihoods of the two competing models should not differ by more than sampling error. Thus, wetest whether the likelihood-ratio is significantly different from one, or equivalently whether its naturallogarithm is significantly different from zero.For notation simplicity, let us define θ ≡ { ˜ ν, ˜ γ , ˜ χ } and θ ≡ { ν , λ , γ , χ } , and indicate the likelihoodsof the DAR(p) and VDAR(p) models as ( L ( θ ) = Q Tt = p +1 P DAR ( p ) ( X t |{ X t − s } s =1 ,..,p , ˜ ν, ˜ γ , ˜ χ ) , L ( θ ) = Q Tt = p +1 P V DAR ( p ) ( X t |{ X t − s } s =1 ,..,p , { Y t − s } s =1 ,..,p , ν , λ , γ , χ ) . (2.6)The likelihood-ratio is defined as LR ≡ sup θ ∈ Θ L ( θ )sup θ ∈ Θ L ( θ ) = L (ˆ θ ) L (ˆ θ ) (2.7)where Θ and Θ represent the domains of the parameters θ and θ , and ˆ θ and ˆ θ are the MaximumLikelihood Estimators (MLE) of the DAR(p) and VDAR(p) models, respectively. Therefore, in orderto apply the test, we need to know the quantiles of the distribution of LR corresponding to the desiredsignificance of the test and the MLE of the two models.In most cases, the exact distribution of LR is difficult to determine. However, assuming H is true, thereis a fundamental result by Samuel S. 
Wilks [Wilks, 1938, Casella and Berger, 2002]: as the sample size $T \to \infty$, the test statistic
$$\Lambda \equiv -2 \left( \log L_0(\hat\theta_0) - \log L_1(\hat\theta_1) \right) \qquad (2.8)$$
is asymptotically chi-squared distributed, with a number of degrees of freedom equal to the difference in dimensionality of $\Theta_1$ and $\Theta_0$ (in our case, this difference is equal to $p$, i.e. the order of the discrete autoregressive process (2.3)). Hence, we can compute the test statistic $\Lambda$ given the data $\{X_t, Y_t\}_{t=0,1,\dots,T}$ for finite $T$, and then compare $\Lambda$ with the quantile of the chi-squared distribution corresponding to the desired statistical significance. We clearly expect that the larger $T$ is, the better the test works.
The MLE of the DAR(p) and VDAR(p) models is relatively simple to derive, see Appendix A. To improve the accuracy of the maximum likelihood estimation, it is convenient to initialize the algorithm with the parameters obtained by the method of moments, namely the solution of the Yule-Walker equations for the DAR(p) and VDAR(p) models. In fact, the Yule-Walker equations for DAR(p) and VDAR(p) are entirely equivalent to those of the autoregressive AR(p) and VAR(p) processes, respectively, and their solution is a standard result of time series analysis, see [Tsay, 2005]. See Appendix A for the technical details about the inference procedure.

2.2. Multivariate causality analysis.
The aim of the multivariate causality analysis is to detect a Granger causality in tail relation between a pair of time series, while extending the information set to all the relevant variables of the system under investigation. This framework accounts for the situation in which a variable is affected by several other variables (network effect). Let the $N$ binary time series $\{X^i_t\}_{i=1,\dots,N}^{t=1,\dots,T}$ describe the sequences of extreme events. We describe their dependence structure at unit time lag with the Markovian VDAR(1) model (2.2), i.e. with $p = 1$. The Markovian restriction, as mentioned before, is necessary for computational reasons.
The VDAR(1) model is specified as
$$X^i_t = V^i_t X^{J^i_t}_{t-1} + (1 - V^i_t) Z^i_t, \qquad (2.9)$$
where $X^i_t \in \{0,1\}$ $\forall i, t$; $V^i_t \sim B(\nu_i)$ with $\nu_i \in [0,1]$ $\forall i = 1,\dots,N$; $J^i_t \sim M(\lambda_{i1}, \dots, \lambda_{iN})$ is a multinomial random variable with $\lambda_i \equiv \{\lambda_{ij}\}_{j=1,\dots,N}$ such that $\sum_{j=1}^{N} \lambda_{ij} = 1$ $\forall i = 1,\dots,N$; and $Z^i_t \sim B(\chi_i)$ with $\chi_i \in [0,1]$ $\forall i = 1,\dots,N$.
The generic entry $\lambda_{ij}$ of the matrix $\lambda \equiv \{\lambda_{ij}\}_{i,j=1,\dots,N}$ determines the level of interaction between $X^i$ and $X^j$ if $i \neq j$, or of self-interaction if $i = j$ (diagonal elements). Hence, the presence of a Granger causality relation from $X^j$ to $X^i$ in the multivariate case can be tested by validating a non-zero $\lambda_{ij}$.
In statistical inference, there exist many regularization methods that force the estimation algorithm to infer a less complex model by setting parameters to zero when they are not statistically significant. The two most widely used types of regularization are the so-called L1 (i.e. LASSO) and L2 regularization [Hastie et al., 2005]. Recently, Decimation [Decelle and Ricci-Tersenghi, 2014] has been proposed to infer the topology of the interaction network in models with pairwise interactions between binary random variables.
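Before turning to the statistical validation of $\lambda$, note that the dynamics (2.9) is straightforward to simulate, which is how the Monte Carlo experiments of Section 3 are set up. A minimal sketch in Python follows; the function and argument names are ours, not the authors', and `lam` plays the role of the row-stochastic matrix $\lambda$:

```python
import numpy as np

def simulate_vdar1(lam, nu, chi, T, seed=0):
    """Simulate the N-variate VDAR(1) of Eq. (2.9).

    lam : (N, N) row-stochastic matrix; lam[i, j] is the probability that
          series i copies the past value of series j (J^i_t ~ M(lam[i]))
    nu  : (N,) copy probabilities (V^i_t ~ Bernoulli(nu[i]))
    chi : (N,) marginal tail-event probabilities (Z^i_t ~ Bernoulli(chi[i]))
    """
    rng = np.random.default_rng(seed)
    lam, nu, chi = map(np.asarray, (lam, nu, chi))
    N = len(nu)
    assert np.allclose(lam.sum(axis=1), 1.0), "rows of lambda must sum to one"
    X = np.zeros((T, N), dtype=int)
    X[0] = rng.random(N) < chi                      # start from the marginals
    for t in range(1, T):
        copy = rng.random(N) < nu                   # V^i_t: copy the past or draw anew
        src = np.array([rng.choice(N, p=lam[i]) for i in range(N)])  # J^i_t
        X[t] = np.where(copy, X[t - 1, src], rng.random(N) < chi)    # Eq. (2.9)
    return X
```

With `lam` equal to the identity, the $N$ series decouple into independent DAR(1) processes; placing off-diagonal mass in row $i$ introduces exactly the Granger-causal links that the inference step is then asked to recover.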
The Decimation method has proved to be very efficient in recovering the network of interactions for a specific logistic regression model, namely the Ising model, in both the static [Decelle and Ricci-Tersenghi, 2014] and kinetic [Decelle and Zhang, 2015, Campajola et al., 2019, Campajola et al., 2020B] cases, by iteratively setting the weakest interaction couplings to zero. Here, we adopt the Decimation method to validate the entries of the matrix $\lambda$ estimated by maximum likelihood (see Appendix A for the technical details). The validated off-diagonal couplings constitute the detected causality relations.

3. Monte Carlo simulations
In this section we use Monte Carlo simulations to analyze the performance of the proposed methods and to compare them with [Hong et al., 2009]. A first analysis (Section 3.1) is performed by simulating two time series using the bivariate VDAR(p) (2.3) as data generating process and then testing for a causality relation with the two methods. The two methods' performances are compared on the basis of their False Positive Rate (FPR) and True Positive Rate (TPR). Then, in Section 3.2, we perform the same analysis using alternative data generating processes, specifically the GARCH-type models (adopted in [Hong et al., 2009]) and the bivariate VDAR(1) with non-independent marginals, to verify the numerical consistency of the proposed parametric
Likelihood-Ratio test of Granger causality in tail even when the data generating process is not correctly specified. In these pairwise settings, we point out the drawbacks of the method of [Hong et al., 2009], which are naturally solved by the proposed approach. Then, in Section 3.3 we move to a setting in which the pairwise approximation is not correct, by simulating time series with a multivariate VDAR(1) process. We show that the multivariate generalization of our method does not give rise to the spurious effects that are instead found by the [Hong et al., 2009] method in this case.

3.1. Bivariate VDAR(p) as data generating process.
In the bivariate VDAR(p) process (2.3), a causal interaction from $Y$ to $X$ is present when $\lambda > 0$. Therefore, the null hypothesis $H_0$ in (2.4) is true when $\lambda = 0$, while the alternative hypothesis $H_A$ in (2.5) is true when $\lambda > 0$. We generated time series of $X$ and $Y$ of length $T$, for different values of $T$, with the bivariate VDAR(p) model for $p = 1, 2$ and different values of $\lambda$. Table 1 reports the rejection rate of $\Lambda$ in (2.8) for the proposed test of Granger causality in tail, in particular the False Positive Rate (FPR), i.e. the percentage of rejections of the null hypothesis when $\lambda = 0$, and the True Positive Rate (TPR), i.e. the percentage of rejections of the null hypothesis when $\lambda > 0$. The Likelihood-Ratio (LR) test has appropriate size for all $T$, i.e. the FPR is below the significance level. It also has good power under the alternative hypothesis $H_A$ in (2.5), i.e. it is successful at recognizing true causality when $\lambda > 0$, and it becomes more powerful as $T$ increases. The TPR, as expected, increases with the degree of causal interaction (i.e. $\lambda$): interactions are increasingly detected when they are stronger.
We repeated the same analysis using the test of Granger causality in tail by [Hong et al., 2009], see Table 2. The test of Hong et al. tends to significantly over-reject the null hypothesis when data are generated with zero causal interaction, i.e. the FPR is much larger than the nominal significance level. This effect appears to be independent of the sample size $T$. At the same time, it displays good power under the alternative hypothesis for $\lambda > 0$, and the performance improves as $T$ increases.

Table 1. [size and power of the proposed test of GC in tail: FPR at $\lambda = 0$ and TPR for increasing $\lambda$, for DGP VDAR(1) (top) and VDAR(2) (bottom), at $T = 500, 1000, 2000, 5000, 10000$]
False Positive Rate (FPR) and True Positive Rate (TPR) of the test statistic $\Lambda$ (2.8) under the null $H_0$ that $Y$ 'does not Granger-cause' $X$ in (2.4), with data generated according to the VDAR(p) model (2.3) for different values of $\lambda$ and different sample sizes $T$. The parameters of the VDAR(p) model are: $\nu_1 = \nu_2$, $\chi_1 = \chi_2$, $\lambda_{21} = 0$ and, in the case $p = 2$, equal coefficients $\gamma_{ij}$ $\forall i, j = 1, 2$. Notice that data are generated with $p = 1$ (top) or $p = 2$ (bottom), but in obtaining the test statistic $\Lambda$ (2.8) the order $p$ is inferred according to the Bayesian Information Criterion (see Appendix A); thus no prior information on the time lag order is used when applying the proposed test. Each rejection rate is computed over a sample of random seeds.

Table 2. [size and power of the test of GC in tail by [Hong et al., 2009]: FPR at $\lambda = 0$ and TPR for increasing $\lambda$, for DGP VDAR(1) (top) and VDAR(2) (bottom), at $T = 500, 1000, 2000, 5000, 10000$]
False Positive Rate (FPR) and True Positive Rate (TPR) associated with the test by [Hong et al., 2009] under the null $H_0$ that $Y$ 'does not Granger-cause' $X$, with data generated according to the VDAR(p) model (2.3) for different values of $\lambda$ and different sample sizes $T$. The test statistic is computed by using the Daniell kernel with $M = 5$. The parameters of the VDAR(p) model are as in Table 1. Each rejection rate is computed over a sample of random seeds.
We remark that, for a given value of the significance level, the method by Hong et al. shows a higher TPR than our method (for fixed $\lambda$), but at the expense of a higher FPR. A fairer comparison between the two methods can be performed by building the Receiver Operating Characteristic (ROC) curves associated with the two statistical tests, which allow comparing the TPRs of the two methods for a given FPR (Fig. 1). The curve corresponding to one method is built by plotting the points (FPR, TPR) obtained with that method for varying threshold values (i.e. significance levels). We therefore simulated data according to the VDAR(1) process (details in the caption of Fig. 1), both with
Figure 1.
Receiver Operating Characteristic (ROC) curves built with the p-values associated with the test statistics of either the Likelihood-Ratio (LR) method (2.8) or the kernel-based non-parametric approach of [Hong et al., 2009], used as binary classifiers depending on some threshold value, i.e. the test significance level. Data are generated by the bivariate VDAR(1) model with $T = 10000$, under the null $H_0$ in (2.4) or under the alternative hypothesis $H_A$ in (2.5), each with one half probability, over a sample of simulations. The adopted model parameters are: $\nu_1$, $\nu_2$, $\chi_1$, $\chi_2$ fixed, $\lambda_{21} = 0$, and $\lambda_{12} = 0$ under the null $H_0$, while $\lambda_{12}$ is drawn uniformly from a low-causality range (left) or a high-causality range (right) under the alternative hypothesis $H_A$. In implementing the LR test, the blue curve is obtained by imposing the order $p = 1$ of the autoregressive process, whereas the black dotted curve (hidden under the blue curve) is obtained by optimally selecting the order $p$ with the Bayesian Information Criterion (see Appendix A). The method by Hong et al. is implemented with the Daniell kernel with $M = 5$ (red solid line) or $M = 10$ (red dotted line).

Table 3. [size of the test of GC in tail by [Hong et al., 2009]: FPR for increasing values of $\nu_2$, with Daniell kernel bandwidths $M = 5, 10, 15, 20$, at $T = 10000$]
False Positive Rate (FPR) obtained for the test by [Hong et al., 2009] under the null $H_0$ that $X$ 'does not Granger-cause' $Y$, with data generated according to the VDAR(1) model (2.3) for different values of $\nu_2$ and sample size $T = 10000$. The test statistic is computed by using the Daniell kernel with different values of $M$. The parameters of the VDAR(1) model are: $\nu_1 = \nu_2$, $\chi_1$, $\chi_2$ fixed, $\lambda_{12} > 0$ fixed, and $\lambda_{21} = 0$. Each rejection rate is computed over a sample of random seeds. Values in brackets represent instead the FPR of the LR test based on the test statistic (2.8).
and without causality, and computed the FPR and TPR obtained by the two methods, resulting in the curves in Figure 1. For a given FPR, the best method displays the largest TPR. The results are shown in Figure 1 for the two cases of low and high causality: in the low causality case, time series with causality are generated by the VDAR(1) model with $\lambda$ uniformly distributed over a low range, while in the high causality case over a higher range. In both cases, we can notice that the novel approach outperforms the non-parametric method by [Hong et al., 2009]. It is worth noting that we are able to correctly identify the time scale of the causal interaction, as evident from the superposition of the ROC curves built for the VDAR(1) and VDAR(p) models: in the first case, the order of the autoregressive process is imposed by hand, whereas in the second case the order $p$ is optimally selected (i.e. $p = 1$) by the Bayesian Information Criterion during the maximum likelihood estimation, as explained in Appendix A.
To conclude this section, we show that when there is a unidirectional causal relation between two time series (e.g., $Y$ causes $X$ but not the converse), in the presence of non-zero autocorrelation the test by Hong et al. shows a very high FPR due to the mistaken detection of the inverse causal relation (from $X$ to $Y$), while our LR test has a small FPR. To show this, we consider data generated by the VDAR(1) model (2.3) with $\lambda_{12} > 0$
($Y$ Granger-causes $X$) but $\lambda_{21} = 0$ ($X$ does not Granger-cause $Y$), varying the parameter $\nu_2$, which determines how strongly the binary time series of $Y$ is autocorrelated. We then test for a Granger causality from $X$ to $Y$ with the method by Hong et al. Table 3 shows the results of the numerical exercise. The rate of false rejections for the test of Hong et al. increases as the autocorrelation of the binary time series increases, eventually converging to one. For the novel LR test based on the VDAR(p) model, instead, the rate is below the significance level of the test (see the values in brackets in Table 3). The poor performance of the non-parametric method by [Hong et al., 2009] can be understood by the following argument. The method tests for non-zero lagged cross-correlations between the two binary time series. Now, assume that the series are generated according to the VDAR(1) model with $\lambda_{12} > 0$, $\lambda_{21} = 0$, $\nu_2 > 0$. The fact that $\lambda_{12} > 0$ clearly implies a non-zero covariance $E(\tilde X_t \tilde Y_{t-1}) \neq 0$, where the tilde indicates the mean-subtracted variable. However, since $Y$ is autocorrelated ($\nu_2 > 0$), this also implies a non-zero covariance
$$\begin{aligned} E(\tilde Y_t \tilde X_{t-1}) &= E\Big( V_t \big( J_t \tilde X_{t-1} \tilde X_{t-1} + (1 - J_t)\, \tilde Y_{t-1} \tilde X_{t-1} \big) + (1 - V_t)\, \tilde Z_t \tilde X_{t-1} \Big) \\ &= \nu_2\, E(\tilde X_{t-1} \tilde Y_{t-1}) \\ &= \nu_2\, E\Big( V_{t-1} \big( J_{t-1} \tilde X_{t-2} \tilde X_{t-1} + (1 - J_{t-1})\, \tilde X_{t-1} \tilde Y_{t-2} \big) + (1 - V_{t-1})\, \tilde X_{t-1} \tilde Z_{t-1} \Big) \\ &= (\nu_2)^2\, E(\tilde X_{t-1} \tilde Y_{t-2}) \neq 0, \end{aligned} \qquad (3.1)$$
where we used $E(\tilde X_t) = 0$, $E(\tilde Z_t) = 0$, $E(V_t) = \nu_2$, and $E(J_t) = 0$. Therefore, the method by Hong et al. tends to falsely detect causation from $X$ to $Y$ because of this non-zero lagged cross-correlation, and this spurious effect increases with the autocorrelation itself. Our novel approach, on the contrary, is able to capture both cross- and self-interactions, thus validating the correct direction of the causality relation (within the significance level).

3.2. Alternative data generating process.
We repeat the Monte Carlo exercise for pairwise causality analysis with the data generating process adopted in [Hong et al., 2009]. They consider a GARCH-type model for the underlying time series $x_1$ and $x_2$, from which the binary time series of extreme events are extracted. The model is the following:
$$\begin{aligned} x_{i,t} &= \beta_{i1}\, x_{1,t-1} + \beta_{i2}\, x_{2,t-1} + u_{i,t}, \quad i = 1, 2 \\ u_{i,t} &= \sigma_{i,t}\, \epsilon_{i,t} \\ \sigma^2_{i,t} &= \gamma_{i,0} + \gamma_{i,1}\, \sigma^2_{i,t-1} + \gamma_{i,2}\, u^2_{1,t-1} + \gamma_{i,3}\, u^2_{2,t-1} \\ \epsilon_{i,t} &\sim N(0,1), \ \text{i.i.d. r.v.} \ \forall i = 1, 2, \end{aligned} \qquad (3.2)$$
where the variance of $x_1$ and $x_2$ is described by a GARCH(1,1) process, but including also off-diagonal dependencies through the mixed terms $u_2$ and $u_1$, respectively for $x_1$ and $x_2$. Then, the realization at time $t$ of the process is the sum of the innovation term (as in the standard GARCH(1,1)) plus a linear combination of the past realizations at time $t - 1$ of the two processes (similarly to a standard vector autoregressive VAR(1) model). The parameterization (3.3) of [Hong et al., 2009] is adopted, in which all coefficients are fixed numerical constants except $\beta_{12} = b$ and $\gamma_{1,3} = c$, the two parameters controlling the causal influence of $x_2$ on $x_1$ through the mean and through the variance, respectively; moreover, $\beta_{21} = 0$ and $\gamma_{2,2} = 0$, so that $x_1$ never causes $x_2$. In particular, the authors consider the following cases: NULL (no Granger causality in tail) when $b = c = 0$; ALTER1 (Granger causality in tail from the mean) when $b = 2$, $c = 0$; ALTER2 (Granger causality in tail from the variance) when $b = 0$ and $c > 0$.
For each individual time series, the authors propose to estimate by means of the Quasi-MLE method the following GARCH-type model (without off-diagonal interaction terms):
$$\begin{aligned} x_{i,t} &= \beta_i\, x_{i,t-1} + u_{i,t}, \quad i = 1, 2 \\ u_{i,t} &= \sigma_{i,t}\, \epsilon_{i,t} \\ \sigma^2_{i,t} &= \gamma_{i,0} + \gamma_{i,1}\, \sigma^2_{i,t-1} + \gamma_{i,2}\, u^2_{i,t-1} \\ \epsilon_{i,t} &\sim N(0,1), \ \text{i.i.d. r.v.} \ \forall i = 1, 2, \end{aligned} \qquad (3.4)$$
then using both the estimated parameters and the filtered series to find at each time the Value at Risk $VaR$, i.e. the left quantile corresponding to $-1.64\, \hat\sigma_{i,t}$ under the hypothesis of Gaussianity. Hence,

Table 4. [size and power of the Hong et al. test (Daniell kernel, $M = 5, 10, 15, 20$) and of the LR test: FPR under NULL and TPR under ALTER1 and ALTER2, at $T = 500, 1000, 2000, 5000, 10000$]
FPR and TPR of the two statistical tests (the Hong et al. test for different values of $M$, and the LR test with the statistic $\Lambda$ (2.8) based on the VDAR(p) model) with data generated by the GARCH-type model as in [Hong et al., 2009] under the three different cases explained in the text (top: NULL, middle: ALTER1, bottom: ALTER2). $T$ indicates the sample size of each time series. The Daniell kernel is used for the Hong et al. test, with $M$ as indicated in the table. Each value is the average over a sample of random seeds.
we can obtain the binary time series of extreme events according to the condition $x_{i,t} < -1.64\, \hat\sigma_{i,t}$, which identifies the (left) tail events for the time series $x_i$. (Here, we consider the left tail of the distribution, i.e. the Value at Risk at the corresponding probability level; the right tail can be treated equivalently.)
We apply the two tests for Granger causality in tail to data generated in the three different cases, namely NULL, ALTER1, and ALTER2. Under NULL, there is no Granger causality in tail from $x_2$ to $x_1$, thus we can study the size of each method, i.e. the False Positive Rate (FPR), by testing the null hypothesis that $x_2$ 'does not Granger-cause in tail' $x_1$. On the other hand, there exists Granger causality in tail from $x_2$ to $x_1$ under both ALTER1 and ALTER2, but through different underlying effects. Under ALTER1, $x_2$ triggers an extreme event for $x_1$ by 'moving' the conditional mean of the distribution of $x_1$. Under ALTER2, $x_2$ modifies the conditional variance of the distribution of $x_1$, thus increasing, e.g., the probability of a tail event. In both cases, we can study the power of each test of Granger causality in tail as the rate of rejections of the null hypothesis that $x_2$ 'does not Granger-cause in tail' $x_1$.
The results for both tests are shown in Table 4. In comparing the finite sample performances, the method by [Hong et al., 2009] slightly outperforms ours.
As in the previous section, for a fair comparison we build the ROC curves associated with the two methods, see Figure 2. We simulated data according to the GARCH-type model, both without (NULL case) and with (ALTER1 case in the left panel, ALTER2 case in the right panel) causality, and computed the FPR and TPR obtained by the two methods, using a large sample size $T = 10000$. For both mechanisms of causality, ALTER1 and ALTER2, our method displays a very good performance, as testified
Figure 2.
Receiver Operating Characteristic (ROC) curves built with the p-values associated with the test statistics of either the Likelihood-Ratio (LR) method (2.8) or the kernel-based non-parametric approach by [Hong et al., 2009], used as binary classifiers depending on some threshold value, i.e. the test significance level. We use the GARCH-type model with $T = 10000$ as Data Generating Process (DGP), under the null hypothesis NULL or under the alternative hypothesis ALTER1 (left) or ALTER2 (right), each with one half probability, over a sample of simulations. The model parameters are as in the main text. In implementing the LR test, the blue curve is obtained by imposing the order $p = 1$ of the autoregressive process, whereas the black dotted curve is obtained by optimally selecting the order with the Bayesian Information Criterion. The method by Hong et al. is implemented with the Daniell kernel with $M = 5$ (red solid line) or $M = 10$ (red dotted line).
by the area under the ROC curve being very close to one, a signal of high predictive power. Nevertheless, the test by Hong et al. is more powerful in detecting causality, especially under ALTER2. Therefore, when data are generated by the considered GARCH-type models in the pairwise scenario, the non-parametric approach by [Hong et al., 2009] performs slightly better than the proposed LR test.
However, we should not conclude that, in general, the non-parametric approach performs better whenever the data generating process is not correctly specified. In fact, consider the bivariate VDAR(1) model (2.3) with $p = 1$ as data generating process, but with a general dependence structure between the Bernoulli marginal distributions, i.e. $Z^1_t$ and $Z^2_t$ in (2.3), described by a Gaussian copula. A positive (instantaneous) correlation structure between extreme events of prices (usually referred to as jumps in this context) is quite common in financial markets, see e.g.
an empirical study on the synchronization of large price movements in the US stock market [Calcagnile et al., 2018]. Again, we compare the ROC curves built for the two methods of Granger causality in tail, see Figure 3. In this case, the test by Hong et al. is quite sensitive to the presence of instantaneous "co-jumps", resulting in an increased rate of false rejections (left panel), whereas our method ensures a high performance in assessing causality relations at the unit time lag. For completeness, we also report the result for negative correlation between extreme events (i.e. very low probability of co-occurrence) in the right panel. Also in this case, our method outperforms the Hong et al. test. Therefore, the bivariate VDAR(1) with non-independent marginals is an example in which our parametric method works better than the non-parametric one, even though the data generating process does not coincide with the one assumed by the method.

3.3. VDAR(1) as data generating process.
Let us now consider the VDAR(1) model (2.9) as data generating process. We consider a system of $N = 40$ variables with a star-shaped network of causal interactions, meaning that the off-diagonal interaction terms $\lambda_{ij}$ are present only between a central node (representing one of the variables) and each of the other $N - 1$. The extreme states of these variables are described by binary time series evolving according to (2.9), with causal interactions captured by the matrix $\lambda$ parameterized as
$$\lambda = \begin{pmatrix} \lambda^1_{self} & \lambda_{in}\, u' \\ \lambda_{out}\, v & \mathrm{diag}(\lambda^i_{self}) \end{pmatrix}, \qquad (3.5)$$
where $\mathrm{diag}(\lambda^i_{self})$ is an $(N-1) \times (N-1)$ diagonal matrix having entries $\lambda^i_{self}$, sorted according to the node index $i = 2, \dots, N$. The first node is the center of the star. It interacts with itself (at the past time
Figure 3.
Receiver Operating Characteristic (ROC) curves built for both the Likelihood-Ratio (LR) test (2.8) and the kernel-based non-parametric approach by [Hong et al., 2009]. Data are generated by the VDAR(1) model (2.3) with $p = 1$ and sample size $T = 10000$, but with Bernoulli marginals whose dependence structure is described by a Gaussian copula with zero mean and positive (left) or negative (right) correlation parameter. The model parameters are: $\nu_1$, $\nu_2$, $\chi_1$, $\chi_2$ fixed, $\lambda_{21} = 0$, and $\lambda_{12} = 0$ under the null $H_0$, or $\lambda_{12}$ uniformly distributed under the alternative hypothesis $H_A$. In implementing the LR test, the blue curve is obtained by imposing the order $p = 1$ of the autoregressive process, whereas the black dotted curve is obtained by optimally selecting the order with the Bayesian Information Criterion. The method by Hong et al. is implemented with the Daniell kernel with $M = 5$ (red solid line) or $M = 10$ (red dotted line).
lag) with probability $\lambda^1_{self}$, and with each of the other nodes with probability $\lambda_{out}$ or $\lambda_{in}$, respectively for outward or inward interactions. Any other node $i$ interacts with itself (at the past time lag) with probability $\lambda^i_{self}$, $i = 2, \dots, N$, and with the central node. In particular, we consider two cases: (i) the out star, where the $N - 1$ outer nodes are 'Granger-caused' by the first node, obtained by setting $\lambda_{in} = 0$, $\lambda_{out} = 1/2$, and $v \equiv \mathbf{1}_{N-1}$ (the vector of $N - 1$ ones); in this case, $\lambda^1_{self} = 1$ and $\lambda^i_{self} = 1/2$ $\forall i = 2, \dots, N$ (as required by normalization); (ii) the mixed star, where a subset of the nodes is 'Granger-caused' by the first node, which, in turn, is 'Granger-caused' by the complementary set, obtained by setting $\lambda_{out} = 1/2$, $u$ and $v$ random vectors of ones and zeros such that $u + v = \mathbf{1}_{N-1}$, and $\lambda_{in} = 1/\big(2 \sum_{j=1}^{N-1} u_j\big)$; in this case, $\lambda^1_{self} = 1/2$, while $\lambda^i_{self}$ with $i \neq 1$ is $1/2$ if $i$ is 'Granger-caused', and $1$ otherwise (again, as required by normalization).
In both cases, we set $\nu_i = 1/2$ and $\chi_i = \chi$ $\forall i = 1, \dots, N$, with varying $\chi \in (0, 1/2]$. A pictorial representation of the two cases is shown in panels a and d of Figure 4. Hence, a node 'Granger-causes' another node if a directed link exists in the star. Thus, once the data are generated by the VDAR(1) model, we can obtain both the power and the size of the statistical tests of Granger causality in tail by considering how well each method reconstructs, on average, the network of interactions. Notice that both the statistical method by [Hong et al., 2009] and the LR test based on the statistic $\Lambda$ (2.8), being pairwise approaches, detect causal interactions in the network by sequentially considering pairs of nodes, whereas the statistical validation of the interactions of the VDAR(1) process by means of Decimation is an effectively multivariate approach.
The TPR and FPR for the three methods in each of the two cases, namely the out and the mixed star, are shown in Figure 5. The rejection rates are plotted for varying frequency of tail events, controlled by the parameter $\chi$. In the left panels, we notice that all three methods are powerful in rejecting, in both cases, the null hypothesis when a causal link is present in the star. However, the right panels show that the rate of false rejections is quite high when we adopt the method by Hong et al. or the LR test. When considering the out star, the FPR associated with the method by Hong et al. converges quickly to one as the frequency of tail events increases, i.e. the null hypothesis is always rejected for any possible pair of non-interacting nodes. The approach based on the LR test slightly outperforms the method by Hong et al., but nevertheless displays a very high FPR. When considering instead the mixed star, the rate of
(Here, $u$ is a random vector having each entry equal to either one or zero, depending on the realization of a Bernoulli random variable with one-half success probability.
The vector $v$ is complementary to $u$, such that $u + v = \mathbf{1}_{N-1}$.)
Figure 4.
Pictorial representation of both the out and mixed stars, as explained in the main text (a and d), together with the networks of interactions validated by means of the two methods of Granger causality in tail, namely Hong et al. (b and e) and the LR test (c and f).
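For concreteness, the two coupling matrices of Eq. (3.5) depicted in panels a and d of Figure 4 can be written down explicitly. The following is an illustrative construction (function names are ours); the $1/2$ entries and the choice $\lambda_{in} = 1/(2\sum_j u_j)$ implement the row normalization $\sum_j \lambda_{ij} = 1$ required by (2.9):

```python
import numpy as np

def out_star_lambda(N):
    """'Out star' coupling matrix: node 0 (the hub) Granger-causes all leaves."""
    lam = np.zeros((N, N))
    lam[0, 0] = 1.0                       # lambda_self^1 = 1: the hub copies only itself
    for i in range(1, N):
        lam[i, 0] = 0.5                   # lambda_out: each leaf copies the hub...
        lam[i, i] = 0.5                   # ...or itself (lambda_self^i = 1/2)
    return lam

def mixed_star_lambda(N, u):
    """'Mixed star': leaves with u[j] = 1 cause the hub; the others are caused by it.

    u is the 0/1 vector over the N - 1 leaves (i.i.d. Bernoulli(1/2) in the text);
    v = 1 - u marks the leaves the hub points to.
    """
    u = np.asarray(u, dtype=float)
    v = 1.0 - u
    lam = np.zeros((N, N))
    lam[0, 0] = 0.5                            # lambda_self^1
    lam[0, 1:] = 0.5 * u / u.sum()             # lambda_in, spread over the causing leaves
    for i in range(1, N):
        lam[i, 0] = 0.5 * v[i - 1]             # lambda_out, only toward caused leaves
        lam[i, i] = 0.5 if v[i - 1] else 1.0   # normalization fixes the diagonal
    return lam
```

Both constructions are row-stochastic by design, so they can be fed directly to a VDAR(1) simulator as the multinomial weights of $J^i_t$.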
Figure 5.
True Positive Rate (TPR) and False Positive Rate (FPR) as a function of the frequency of tail events (or, equivalently, of $\chi_i = \chi \in (0, 1/2]$ $\forall i = 1, \dots, N$ in (2.9)) for the three methods: the test by Hong et al., the LR test with statistic $\Lambda$ (2.8), and the statistical validation of the interaction terms of the VDAR(1) process by means of Decimation. Data are generated by the VDAR(1) model as described in the main text. Top panels refer to the case of the out star, bottom panels to the mixed star. The significance level is corrected with the False Discovery Rate (FDR) method to take into account the effect of multiple testing. Each value and the corresponding error bar are the mean and the standard deviation, respectively, over a sample of 100 simulations.
false rejections of both tests is significantly larger than the significance level (with the exception of the LR test in the case of very low frequency of tail events) and, again, the approach based on the test statistic $\Lambda$ (2.8) outperforms the method by Hong et al. Both pairwise tests, therefore, have an unsatisfactory performance in the presence of network effects. On the contrary, the multivariate approach based on the statistical validation of the interaction terms of the VDAR(1) process by means of Decimation shows a False Positive Rate below the threshold of the significance level, very close to zero, for both the out and the mixed stars.
The very high rate of false rejections associated with the pairwise causality analysis is the combination of two effects: first, a misspecification of the information set and, second, non-zero autocorrelation. When considering the out star, the central node 'Granger-causes' all the others, but false positives are detected in both directions between non-interacting nodes when the information on the central node is not considered (compare panels a and c of Figure 4). This happens both for the LR test based on the statistic $\Lambda$ (2.8) and for the Hong et al. test.
Moreover, when the binary time series have non-zero autocorrelation, the test by Hong et al. mistakenly detects an additional causal interaction from the outer nodes to the central node (panel b of Figure 4). In this case, the validated network of interactions becomes, on average, the complete graph, i.e. a graph with all possible links, since both the TPR and the FPR converge to one. Similar spurious causality arises for the mixed star: when $A$ causes $B$, and $B$ causes $C$, a spurious causality can be detected from $A$ to $C$ if $B$ is not considered. This produces spurious links in the network reconstructed by the LR test (compare panels d and f of Figure 4) and also by the Hong et al. test. The latter also detects spurious links in the presence of non-zero autocorrelation, see panel e. The multivariate approach is not prone to any of these mistakes, since it correctly reconstructs the networks of interactions shown in panels a and d.

4. Empirical application on the US Stock Exchange market
To illustrate our novel method for testing Granger causality in tail, we now consider an application to high frequency data of a portfolio of financial stocks belonging to the Russell 3000 Index, traded in the US equity markets (mostly NYSE and NASDAQ). An empirical characterization of such high frequency data can be found in [Calcagnile et al., 2018]. Here, we adopt the same filtering procedure introduced in [Bormetti et al., 2015] to obtain the time series of the extreme events of stock returns; then we apply our method to detect the causal relations between stocks.

4.1. Data filtering and extreme events of stock returns.
We consider 39 highly liquid stocks traded in the US markets from 2000 to 2012. We use one-minute closing price data during the regular US trading session, i.e. from 9:30 am to 4:00 pm. Hence, we define the return at the one-minute time scale as $\tilde r^i_t \equiv \log P^i_t / P^i_{t-1}$, where $P^i_t$ is the price of stock $i$ at time $t$, with $i = 1, ..., 39$ and $t = 1, ..., T$, where $T$ depends on the effective number of trading days within the year.

Intraday returns are first filtered because of the presence of the well-known U-shaped pattern of volatility within a trading day, i.e. prices typically exhibit larger movements at the beginning and at the end of the day. The raw return $\tilde r^i_{d,t}$ of a generic stock $i$ at day $d$ and intraday time $t$ is rescaled by a factor $u^i_t$,
$$ r^i_{d,t} = \frac{\tilde r^i_{d,t}}{u^i_t}, \qquad \text{where} \qquad u^i_t = \frac{1}{N_{days}} \sum_{d'} \frac{|\tilde r^i_{d',t}|}{s^i_{d'}}, $$
with $N_{days}$ indicating the number of days in the sample and $s^i_{d'}$ the standard deviation of absolute intraday returns of day $d'$. Rescaled returns $\{r^i_t\}_{i=1,...,39}^{t=1,...,T}$ no longer possess any daily regularities. To estimate the spot (i.e. instantaneous) volatility $\sigma^i_t$, we use the method of realized bipower variation by [Barndorff-Nielsen and Shephard, 2004], with the threshold correction for the presence of jumps by [Corsi et al., 2010] and using exponentially weighted moving averages of returns; see [Bormetti et al., 2015] for further details on the filtering procedure.

Finally, we say that an extreme return occurs for stock $i$ at time $t$ when
$$ \frac{r^i_t}{\sigma^i_t} < -\theta \qquad (4.1) $$
for a given threshold $\theta$. Specifically, we use $\theta = 4$ in the present analysis. By using condition (4.1), we build the binary time series $\{X^i_t\}_{i=1,...,39}^{t=1,...,T}$ of extreme events for the US stocks.

4.2. Causal relations in the US stock exchange market.
We now consider the directed network of causal interactions between extreme events of the underlying stock return dynamics. Again, we stress that the output of the causality analysis depends largely on the adopted method, hence the importance of understanding the strengths and weaknesses of each statistical test of Granger causality in tail.
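The filtering pipeline of Section 4.1 (removal of the intraday U-shape and the threshold condition (4.1)) can be sketched as follows. The toy data and the constant-volatility stand-in for the bipower-variation spot estimate are our simplifications for illustration only:

```python
import numpy as np

def deintraday(raw, day_std):
    """Remove the intraday U-shape. raw has shape (n_days, n_intraday) and
    day_std[d] is the std of absolute returns of day d. The rescaling factor is
    u[t] = (1/N_days) * sum_d |raw[d, t]| / day_std[d], and r[d, t] = raw[d, t] / u[t]."""
    u = (np.abs(raw) / day_std[:, None]).mean(axis=0)
    return raw / u

def extreme_events(r, sigma, theta=4.0):
    """Binary series of (negative) extreme events: 1 when r_t / sigma_t < -theta."""
    return (r / sigma < -theta).astype(int)

# toy data with a U-shaped intraday volatility pattern
rng = np.random.default_rng(1)
n_days, n_intraday = 50, 390
shape = 1 + np.abs(np.linspace(-1, 1, n_intraday))
raw = rng.standard_normal((n_days, n_intraday)) * shape
day_std = np.abs(raw).std(axis=1)
r = deintraday(raw, day_std).ravel()
sigma = np.full(r.size, r.std())   # crude stand-in for the spot volatility estimate
X = extreme_events(r, sigma)
print(X.size, X.mean())
```

With $\theta = 4$ and (toy) Gaussian returns, the resulting binary series is very sparse, as expected for tail events.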
Figure 6.
Link density (left), reciprocity coefficient (middle), and normalized number of closed triangles (right) of the Granger causality networks obtained with the three different methods, i.e. the LR test with test statistic (2.8) (black), the test by [Hong et al., 2009] (red), and Decimation applied to the VDAR(1) model (blue).
Figure 7.
Fraction of causal links validated by the LR test and described by the VDAR(p) model with $p$ (as in the legend) optimally selected by the Bayesian Information Criterion (left). Mean node degree for both outgoing (middle) and incoming (right) causal links of the US stocks belonging to the Financial Sector, namely BAC, C, AXP, and WFC, i.e. $\bar k^{fin}_{out}$ and $\bar k^{fin}_{in}$ respectively, rescaled by the mean overall degree $\bar k$ of the corresponding Granger causality network. Dotted lines represent the mean over the period for the Granger causality networks obtained according to the LR test (black) or Hong et al. (red). In the right panel, the mean value of the rescaled in-degree for the LR test is computed over two disjoint time windows, before and after the transition discussed in the text.

In building the Granger causality network, we can adopt either a multiple hypothesis testing approach based on pairwise causality analysis or the multivariate approach based on the statistical validation of off-diagonal couplings. In the first case, given $N$ time series of extreme events and considering either the novel LR test or the one by Hong et al., we construct the network by applying the Granger causality test to all the possible $N(N-1)$ pairs. Then, assume we aim to obtain some overall significance level for the multiple testing. Because of the $N(N-1)$ simultaneous tests, a correction to the significance level of each single test needs to be considered to take into account the increased chance of rare events and, therefore, the increased probability of false positives (i.e. false rejections) [Tumminello et al., 2011]. In the present analysis, the overall significance level is fixed and the False Discovery Rate (FDR) method [Benjamini and Hochberg, 1995] is applied to correct the p-value of each single test.
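The FDR correction step can be sketched with the Benjamini-Hochberg step-up procedure applied to the $N(N-1)$ pairwise p-values; the level 0.05 and the p-values below are illustrative:

```python
import numpy as np

def fdr_reject(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up: with p-values sorted increasingly, find the
    largest k with p_(k) <= alpha * k / m and reject the k smallest hypotheses."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresh = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresh
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])
        reject[order[: k + 1]] = True
    return reject

# e.g. p-values from all pairwise tail-causality tests (hypothetical values)
pvals = [0.001, 0.008, 0.039, 0.041, 0.27, 0.6]
rej = fdr_reject(pvals, alpha=0.05)
print(rej.tolist())  # [True, True, False, False, False, False]
```

Note that 0.039 is rejected by an uncorrected 5% test but survives the FDR correction, which is exactly the multiple-testing protection discussed above.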
In the second case, adopting the multivariate approach based on the VDAR(1) model, the network of causal interactions is validated by means of Decimation, thus the overall significance level is implicitly defined by the validation procedure. Finally, we use the three different methods to build the corresponding Granger causality networks with the data of extreme events of US stocks filtered as above, considering one year at a time over the sample period.

Figure 6 shows some characteristics of the different causality networks. First, the number of validated causal interactions differs significantly if we use the methods based on discrete autoregressive processes or the one proposed by Hong et al., the latter describing an almost complete graph, see the left panel of Figure 6. By using the Jaccard index to compare two causality networks, we measure a relatively large value for the Granger networks obtained with the pairwise VDAR and the multivariate approach for almost all periods, signalling many common causal links between the two networks. On the contrary, we measure a much smaller value when comparing the Granger network obtained with the test of Hong et al. with the others. In the previous section, we noticed how spurious effects may be present in the multivariate case if we adopt a pairwise approach. In particular, the detection of causality in both directions or in triangular loops may be due to non-zero autocorrelation and misspecification of the information set. Possible metrics capturing such network effects are the reciprocity coefficient, measuring the likelihood of nodes to be mutually linked, and the number of closed triangles.
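The Jaccard comparison of two validated networks amounts to the size of the intersection of their link sets over the size of their union; the two link sets below are hypothetical:

```python
def jaccard(edges_a, edges_b):
    """Jaccard similarity between two link sets: |A ∩ B| / |A ∪ B|."""
    a, b = set(edges_a), set(edges_b)
    return len(a & b) / len(a | b) if (a | b) else 1.0

net_lr = {(1, 2), (2, 3), (1, 3)}            # hypothetical validated links
net_hong = {(1, 2), (2, 3), (3, 1), (2, 1)}  # hypothetical, with reciprocated links
j = jaccard(net_lr, net_hong)
print(j)  # 2 common links out of 5 distinct ones -> 0.4
```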
The middle and right panels of Figure 6 show the two network metrics for the three Granger networks and, again, we can notice how reciprocated causal links and triangular interactions are largely over-expressed in the causality network obtained with the method of Hong et al., thus suggesting the presence of spurious detections. It is interesting to notice that, during the first year of the US financial crisis, the reciprocity of causal links in the network obtained with the LR test displays a significant increase with respect to the mean level of reciprocity during the whole considered period.

An interesting empirical observation concerns how the memory in causal links depends on the financial cycle. We notice that during turmoil periods for the US stock exchange markets, i.e. the dot-com bubble and the subprime mortgage crisis, non-Markovian effects become more important, as testified by the fraction of causal links better described by a VDAR(p) model with $p > 1$ (optimally selected by the Bayesian Information Criterion), see the left panel of Figure 7.

Finally, we consider the stocks belonging to the financial sector, i.e. BAC, C, AXP, and WFC. The middle and right panels of Figure 7 show the average value over this set of stocks of both the out-degree and the in-degree, i.e. the mean number of outgoing and incoming causal links, respectively. Each value is rescaled by the average degree of the causality network. We compare the Granger causality networks obtained by either our method or the one of Hong et al. We find that the two methods describe the subsystem of financial stocks quite differently: while, according to Hong et al., the financial sector is on average equivalent to any other one, the VDAR method suggests that the financial sector 'Granger-causes' more than the others over all the considered period, since the ratio in the middle panel of Figure 7 is above one.
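The reciprocity coefficient and the normalized number of closed triangles can be computed from a directed adjacency matrix as follows (the adjacency matrix is a toy example of ours):

```python
import numpy as np
from itertools import combinations

def reciprocity(A):
    """Fraction of directed links whose reverse link is also present."""
    A = np.asarray(A, dtype=bool)
    np.fill_diagonal(A, False)
    links = A.sum()
    return (A & A.T).sum() / links if links else 0.0

def closed_triangle_density(A):
    """Closed triplets (all three pairs linked, ignoring direction)
    over the number of all possible triplets C(N, 3)."""
    U = np.asarray(A, dtype=bool)
    U = U | U.T
    np.fill_diagonal(U, False)
    N = U.shape[0]
    closed = sum(U[i, j] and U[j, k] and U[i, k]
                 for i, j, k in combinations(range(N), 3))
    return closed / (N * (N - 1) * (N - 2) // 6)

A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 0, 0]])
rec = reciprocity(A)
tri = closed_triangle_density(A)
print(rec, tri)  # 0.5 0.0
```

Over-expression of both quantities, relative to what the multivariate validation produces, is the signature of spurious pairwise detections discussed in the text.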
Moreover, there is a transition, around the onset of the financial crisis, in the average number of causal links pointing to the financial stocks, thus highlighting a scenario where the other sectors began to 'Granger-cause' the financial sector less before the crisis, while, at the same time, financial stocks started to 'Granger-cause' the other sectors more. This simple example shows how different conclusions on the role of a stock or sector in the causal networks can be drawn depending on the adopted method.

5. Conclusions
Based on the definition of Granger causality in tail, we propose a novel statistical approach, in the original spirit of Granger, to test whether the information on extreme events of one time series is statistically significant in forecasting extreme events of a second time series, by introducing the multivariate generalization of the discrete autoregressive process, namely VDAR(p). We devise a method based on the Likelihood-Ratio test statistic which is able to detect causal interactions between two time series, while also correctly identifying the time scale of the causal interaction. Then, to overcome the limit of the pairwise causality analysis, which results in spurious detections because of neglected variables, we also propose a statistical method for the multivariate case with Markovian dynamics. Finally, we highlight the importance of disentangling true causalities from false detections, and we show the dependence of the results on the adopted statistical method. To this end, we also present a comparative study with the current standard in the literature, represented by the test of [Hong et al., 2009].

Simulation studies show that the proposed test of Granger causality in tail has both good power and good size in finite samples, against a large variety of data generating processes. We show numerically that the proposed method and the test of Hong et al. differ under some circumstances. In particular, the test by Hong et al. displays some sensitivity to autocorrelation of the time series of extreme events, resulting in spurious two-way causality detections in the presence of unidirectional relations, a drawback which is solved by our method. Then, we show numerically how network effects may result in spurious detections of triangular interactions due to the misspecification of the information set in the pairwise causality analysis, a second drawback which is solved by our multivariate approach. Furthermore, we highlight some signals which may be associated with false detections in networked systems, namely a high rate of either reciprocated causal links or triangular loops.

The empirical application to high frequency data of the US stock exchange points out how the output of the causality analysis depends significantly on the adopted statistical procedure. In particular, we find that the network of causal interactions is sparse and the memory of the causation process depends on the financial cycle, with significant non-Markovian effects during financial crises. A focus on the financial sector reveals that financial stocks 'Granger-cause' more than the others, especially during turmoil periods, and started to be 'Granger-caused' less before the financial crisis. On the contrary, the method by Hong et al. describes a network which is dense, without displaying any specific pattern for the financial sector, whose stocks are on average equivalent to the others.

Footnotes. The Jaccard similarity coefficient between two sample sets is defined as the size of their intersection divided by the size of their union (here, of the links in the two networks). The reciprocity coefficient is defined as the ratio of the number of links pointing in both directions to the total number of links. A normalized measure for the number of closed triangles in a network can be defined as the ratio of the number of subgraphs of three nodes (i.e. triplets) which are all connected to each other, independently of link directions, to the number of all possible triplets, both open and closed.

Acknowledgements
This project has received funding from the SESAR Joint Undertaking under the European Union's Horizon 2020 research and innovation programme under grant agreement No 783206. The opinions expressed herein reflect the authors' views only. Under no circumstances shall the SESAR Joint Undertaking be responsible for any use that may be made of the information contained herein.
References

[Barndorff-Nielsen and Shephard, 2002] Barndorff-Nielsen, O. E., & Shephard, N. (2002). Econometric analysis of realized volatility and its use in estimating stochastic volatility models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64(2), 253-280.
[Barndorff-Nielsen and Shephard, 2004] Barndorff-Nielsen, O. E., & Shephard, N. (2004). Power and bipower variation with stochastic volatility and jumps. Journal of Financial Econometrics, 2(1), 1-37.
[Barnett et al., 2009] Barnett, L., Barrett, A. B., & Seth, A. K. (2009). Granger causality and transfer entropy are equivalent for Gaussian variables. Physical Review Letters, 103(23), 238701.
[Barnett and Bossomaier, 2012] Barnett, L., & Bossomaier, T. (2012). Transfer entropy as a log-likelihood ratio. Physical Review Letters, 109(13), 138105.
[Barrett et al., 2010] Barrett, A. B., Barnett, L., & Seth, A. K. (2010). Multivariate Granger causality and generalized variance. Physical Review E, 81(4), 041907.
[Battiston et al., 2012] Battiston, S., Gatti, D. D., Gallegati, M., Greenwald, B., & Stiglitz, J. E. (2012). Liaisons dangereuses: Increasing connectivity, risk sharing, and systemic risk. Journal of Economic Dynamics and Control, 36(8), 1121-1141.
[Benjamini and Hochberg, 1995] Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B (Methodological), 57(1), 289-300.
[Billio et al., 2012] Billio, M., Getmansky, M., Lo, A. W., & Pelizzon, L. (2012). Econometric measures of connectedness and systemic risk in the finance and insurance sectors. Journal of Financial Economics, 104(3), 535-559.
[Bormetti et al., 2015] Bormetti, G., Calcagnile, L. M., Treccani, M., Corsi, F., Marmi, S., & Lillo, F. (2015). Modelling systemic price cojumps with Hawkes factor models. Quantitative Finance, 15(7), 1137-1156.
[Calgagnile et al., 2018] Calcagnile, L. M., Bormetti, G., Treccani, M., Marmi, S., & Lillo, F. (2018). Collective synchronization and high frequency systemic instabilities in financial markets. Quantitative Finance, 18(2), 237-247.
[Campajola et al., 2019] Campajola, C., Lillo, F., & Tantari, D. (2019). Inference of the Kinetic Ising Model with heterogeneous missing data. Physical Review E.
Quantitative Finance, in press.
[Casella and Berger, 2002] Casella, G., & Berger, R. L. (2002). Statistical Inference (Vol. 2). Pacific Grove, CA: Duxbury.
[Cheung and Ng, 1996] Cheung, Y. W., & Ng, L. K. (1996). A causality-in-variance test and its application to financial market prices. Journal of Econometrics, 72(1-2), 33-48.
[Corsi et al., 2010] Corsi, F., Pirino, D., & Reno, R. (2010). Threshold bipower variation and the impact of jumps on volatility forecasting. Journal of Econometrics, 159(2), 276-288.
[Corsi et al., 2018] Corsi, F., Lillo, F., Pirino, D., & Trapin, L. (2018). Measuring the propagation of financial distress with Granger-causality tail risk networks. Journal of Financial Stability, 38, 18-36.
[Danielsson et al., 2012] Danielsson, J., Shin, H. S., & Zigrand, J. P. (2012). Endogenous and systemic risk. In Quantifying Systemic Risk (pp. 73-94). University of Chicago Press.
[Davis, 2016] Davis, M. H. (2016). Verification of internal risk measure estimates. Statistics & Risk Modeling, 33(3-4), 67-93.
[Duarte and Eisenbach, 2014] Duarte, F., & Eisenbach, T. M. (2014). Fire-sale spillovers and systemic risk. FRB of New York Staff Report, (645).
[Decelle and Ricci-Tersenghi, 2014] Decelle, A., & Ricci-Tersenghi, F. (2014). Pseudolikelihood decimation algorithm improving the inference of the interaction network in a general class of Ising models. Physical Review Letters, 112(7), 070603.
[Decelle and Zhang, 2015] Decelle, A., & Zhang, P. (2015). Inference of the sparse kinetic Ising model using the decimation method. Physical Review E, 91, 052136.
[Engle and Manganelli, 2004] Engle, R. F., & Manganelli, S. (2004). CAViaR: Conditional autoregressive value at risk by regression quantiles. Journal of Business & Economic Statistics, 22(4), 367-381.
[Fredrickson and Andersen, 1984] Fredrickson, G. H., & Andersen, H. C. (1984). Kinetic Ising model of the glass transition. Physical Review Letters, 53(13), 1244.
[Gallant and Tauchen, 1996] Gallant, A. R., & Tauchen, G. (1996). Which moments to match? Econometric Theory, 12(4), 657-681.
[Geweke, 1982] Geweke, J. (1982). Measurement of linear dependence and feedback between multiple time series. Journal of the American Statistical Association, 77(378), 304-313.
[Granger, 1969] Granger, C. W. (1969). Investigating causal relations by econometric models and cross-spectral methods. Econometrica: Journal of the Econometric Society, 424-438.
[Granger, 1980] Granger, C. W. (1980). Testing for causality: a personal viewpoint. Journal of Economic Dynamics and Control, 2, 329-352.
[Hansen, 1992] Hansen, B. E. (1992). The likelihood ratio test under nonstandard conditions: testing the Markov switching model of GNP. Journal of Applied Econometrics, 7(S1), S61-S82.
[Hansen, 1994] Hansen, B. E. (1994). Autoregressive conditional density estimation. International Economic Review, 705-730.
[Harvey and Siddique, 1999] Harvey, C. R., & Siddique, A. (1999). Autoregressive conditional skewness. Journal of Financial and Quantitative Analysis, 34(4), 465-487.
[Harvey and Siddique, 2000] Harvey, C. R., & Siddique, A. (2000). Conditional skewness in asset pricing tests. The Journal of Finance, 55(3), 1263-1295.
[Hastie et al., 2005] Hastie, T., Tibshirani, R., Friedman, J., & Franklin, J. (2005). The elements of statistical learning: data mining, inference and prediction. The Mathematical Intelligencer, 27(2), 83-85.
[Hong, 2001] Hong, Y. (2001). A test for volatility spillover with application to exchange rates. Journal of Econometrics, 103(1-2), 183-224.
[Hong et al., 2009] Hong, Y., Liu, Y., & Wang, S. (2009). Granger causality in risk and detection of extreme risk spillover between financial markets. Journal of Econometrics, 150(2), 271-287.
[Jacobs and Lewis, 1978] Jacobs, P. A., & Lewis, P. A. (1978). Discrete Time Series Generated by Mixtures. III. Autoregressive Processes (DAR(p)) (No. NPS55-78-022). Naval Postgraduate School, Monterey, CA.
[Jondeau and Rockinger, 2003] Jondeau, E., & Rockinger, M. (2003). Conditional volatility, skewness, and kurtosis: existence, persistence, and comovements. Journal of Economic Dynamics and Control, 27(10), 1699-1737.
[Lütkepohl, 1982] Lütkepohl, H. (1982). Non-causality due to omitted variables. Journal of Econometrics, 19(2-3), 367-378.
[Mancini, 2009] Mancini, C. (2009). Non-parametric threshold estimation for models with stochastic diffusion coefficient and jumps. Scandinavian Journal of Statistics, 36(2), 270-296.
[Mazzarisi et al., 2020] Mazzarisi, P., Barucca, P., Lillo, F., & Tantari, D. (2020). A dynamic network model with persistent links and node-specific latent variables, with an application to the interbank market. European Journal of Operational Research.
Morgan Guaranty Trust Company of New York, New York, 35-65.
[Roy, 1952] Roy, A. D. (1952). Safety first and the holding of assets. Econometrica: Journal of the Econometric Society, 431-449.
[Seth et al., 2015] Seth, A. K., Barrett, A. B., & Barnett, L. (2015). Granger causality analysis in neuroscience and neuroimaging. Journal of Neuroscience, 35(8), 3293-3297.
[Taylor, 2007] Taylor, J. W. (2007). Forecasting daily supermarket sales using exponentially weighted quantile regression. European Journal of Operational Research, 178(1), 154-167.
[Tibshirani, 1996] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1), 267-288.
[Troster, 2018] Troster, V. (2018). Testing for Granger-causality in quantiles. Econometric Reviews, 37(8), 850-866.
[Tsay, 2005] Tsay, R. S. (2005). Analysis of Financial Time Series (Vol. 543). John Wiley & Sons.
[Tumminello et al., 2011] Tumminello, M., Micciche, S., Lillo, F., Piilo, J., & Mantegna, R. N. (2011). Statistically validated networks in bipartite complex systems. PLoS ONE, 6(3).
[Wilks, 1938] Wilks, S. S. (1938). The large-sample distribution of the likelihood ratio for testing composite hypotheses. The Annals of Mathematical Statistics, 9(1), 60-62.
[Williams et al., 2019a] Williams, O. E., Lillo, F., & Latora, V. (2019). Effects of memory on spreading processes in non-Markovian temporal networks. New Journal of Physics, 21, 043028.
[Williams et al., 2019b] Williams, O. E., Lillo, F., & Latora, V. (2019). How auto- and cross-correlations in link dynamics influence diffusion in non-Markovian temporal networks. arxiv.org/abs/1909.08134
[Zanin et al., 2017] Zanin, M., Belkoura, S., & Zhu, Y. (2017). Network analysis of Chinese air transport delay propagation. Chinese Journal of Aeronautics, 30(2), 491-499.
Appendix A. Inference of Discrete Autoregressive Models
The Maximum Likelihood Estimators (MLE) of the VDAR(p) models (2.3) and (2.9) are obtained as follows.

A.1. MLE of bivariate VDAR(p).
Assume to observe the binary time series $\{\mathbf X, \mathbf Y\} \equiv \{X_t, Y_t\}_{t=1,...,T}$, and to aim at the maximum likelihood estimator of the parameters $\nu$, $\lambda$, $\chi$, and $\gamma_1 \equiv \{\gamma_{1,k}\}_{k=1,...,p}$, $\gamma_2 \equiv \{\gamma_{2,k}\}_{k=1,...,p}$ (or, equivalently, of the analogous parameters entering the equation for $Y$). The log-likelihood of observing the time series $\mathbf X$, given the information on $\mathbf Y$, according to the bivariate DAR(p) model is
$$
\log P(\mathbf X | \mathbf Y, \nu, \lambda, \chi, \gamma_1, \gamma_2) = \log \prod_{t=p+1}^{T} P(X_t | X_{t-1}, ..., X_{t-p}, Y_{t-1}, ..., Y_{t-p}, \nu, \lambda, \chi, \gamma_1, \gamma_2) =
$$
$$
= \sum_{t=p+1}^{T} \log \left[ \nu \left( (1-\lambda) \sum_{k=1}^{p} \gamma_{1,k}\, \delta_{X_t, X_{t-k}} + \lambda \sum_{k=1}^{p} \gamma_{2,k}\, \delta_{X_t, Y_{t-k}} \right) + (1-\nu)\, \chi^{X_t} (1-\chi)^{1-X_t} \right], \qquad (A.1)
$$
by using the chain rule and conditioning on $\mathbf Y$ and the first $p$ observations $X_1, ..., X_p$, where $\delta_{A,B}$ is the Kronecker delta, taking value one if $A = B$ and zero otherwise.

The MLE of the parameters is obtained by maximizing the log-likelihood via gradient descent methods [Hastie et al., 2005], or by finding the point estimate by solving iteratively the following system of equations
$$
\begin{cases}
\partial_{\phi} \log P(\mathbf X | \mathbf Y, \nu, \lambda, \chi, \gamma_1, \gamma_2) = 0, & \forall\, \phi = \nu, \lambda, \chi,\\
\partial_{\gamma_{1,k}} \log P(\mathbf X | \mathbf Y, \nu, \lambda, \chi, \gamma_1, \gamma_2) = 0, & \forall\, k = 1, ..., p-1,\\
\partial_{\gamma_{2,k}} \log P(\mathbf X | \mathbf Y, \nu, \lambda, \chi, \gamma_1, \gamma_2) = 0, & \forall\, k = 1, ..., p-1,
\end{cases} \qquad (A.2)
$$
together with the conditions $\gamma_{1,p} = 1 - \gamma_{1,1} - ... - \gamma_{1,p-1}$ and $\gamma_{2,p} = 1 - \gamma_{2,1} - ... - \gamma_{2,p-1}$.

In both cases, the estimation algorithm needs a starting point. A random point in the parameter space can be used; however, we suggest adopting the solution of the Yule-Walker equations for the VDAR(p) model.

A.1.1. Yule-Walker equations for VDAR(p).
By taking expectations of both sides of Equation (2.3), we obtain
$$
\begin{cases}
E(X_t) = \nu \left( (1-\lambda) \sum_{k=1}^{p} \gamma_{1,k} E(X_{t-k}) + \lambda \sum_{k=1}^{p} \gamma_{2,k} E(Y_{t-k}) \right) + (1-\nu)\chi,\\
E(Y_t) = \nu' \left( \lambda' \sum_{k=1}^{p} \gamma'_{1,k} E(X_{t-k}) + (1-\lambda') \sum_{k=1}^{p} \gamma'_{2,k} E(Y_{t-k}) \right) + (1-\nu')\chi',
\end{cases} \qquad (A.3)
$$
where primed symbols denote the analogous parameters of the equation for $Y$. This is formally equivalent to the expectation of a bivariate VAR(p) model [Tsay, 2005]
$$
E(Z_t) = \phi_0 + \sum_{k=1}^{p} \Phi_k E(Z_{t-k}) \qquad (A.4)
$$
with $\phi_0 \equiv (\phi_{0,1}, \phi_{0,2})'$ and $\Phi_k$ a $2 \times 2$ matrix $\forall k = 1, ..., p$, by using a suitable one-to-one mapping of parameters and with $Z_t \equiv (X_t, Y_t)'$.

It is also straightforward to show the following identity
$$
E(\tilde Z_t \tilde Z'_{t-k}) = \Phi_k E(\tilde Z_{t-1} \tilde Z'_{t-k}) \quad \forall k = 1, ..., p, \qquad (A.5)
$$
where the tilde indicates the mean-subtracted variable.

The Yule-Walker estimator of the parameters $\phi_0$, $\Phi_k$, $\forall k = 1, ..., p$, is a standard result of time series analysis, see [Tsay, 2005] for a reference, and can be obtained by solving the linear system (A.5) for the entries of $\Phi_k$, $\forall k = 1, ..., p$, then solving (A.4) for $\phi_0$. Finally, the estimated parameters of the bivariate VDAR(p) model are obtained by inverting the one-to-one mapping between (A.3) and (A.4).

A.1.2. Optimal selection of p. Assume to estimate the bivariate VDAR(p) model (2.3) for different orders $p$, asking which value of $p$ describes the data better. The optimal $p$ can be selected by using the Bayesian Information Criterion (BIC) [Hastie et al., 2005], as follows. The BIC statistic is defined as
$$
\mathrm{BIC}(p) = 2(2p+1)\log T - 2 \log P(\mathbf X, \mathbf Y | \hat\Theta^{(p)}), \qquad (A.6)
$$
where $\hat\Theta^{(p)}$ is the MLE of the $2(2p+1)$ VDAR(p) parameters and $T$ is the sample size. Notice that
$$
\log P(\mathbf X, \mathbf Y | \hat\Theta^{(p)}) = \log P(\mathbf X | \mathbf Y, \hat\Theta^{(p)}) + \log P(\mathbf Y | \mathbf X, \hat\Theta^{(p)}), \qquad (A.7)
$$
where $\log P(\mathbf X | \mathbf Y, \hat\Theta^{(p)})$ is as in (A.1) (and a similar relation holds for $\log P(\mathbf Y | \mathbf X, \hat\Theta^{(p)})$), by using the chain rule. Finally, the optimal value of $p$ is the one associated with the lowest BIC.

A.2. MLE of DAR(p).
In implementing the Likelihood-Ratio test, the MLE of the DAR(p) model is required to obtain the test statistic (2.8). The maximum likelihood inference of the DAR(p) model (2.1) is similar to the case of VDAR(p), but without the dependence on $\mathbf Y$, thus resulting in imposing $\lambda = 0$ and $\gamma_{2,k} = 0$ $\forall k = 1, ..., p$ in (A.1). Then, the inference proceeds as described before. The Yule-Walker estimator of the DAR(p) model is described in [Jacobs and Lewis, 1978]. The optimal selection of the order $p$ is as before.

A.3. MLE of multivariate VDAR(1) and Decimation of coupling parameters.
Assume to observe the binary time series $\mathbf X \equiv \{X^i_t\}_{i=1,...,N}^{t=1,...,T}$ following the VDAR(1) process (2.9), and to look for the maximum likelihood estimator of the model parameters $\nu \equiv \{\nu_i\}_{i=1,...,N}$, $\lambda \equiv \{\lambda_{ij}\}_{i,j=1,...,N}$ with conditions $\sum_{j=1}^{N} \lambda_{ij} = 1$ $\forall i = 1, ..., N$, and $\chi \equiv \{\chi_i\}_{i=1,...,N}$. The log-likelihood of observing the time series $\mathbf X$ is
$$
\ell(\mathbf X, \nu, \lambda, \chi) \equiv \log P(\mathbf X_2, ..., \mathbf X_T | \mathbf X_1, \nu, \lambda, \chi) = \log \prod_{t=2}^{T} P(\mathbf X_t | \mathbf X_{t-1}, \nu, \lambda, \chi) =
$$
$$
= \sum_{t=2}^{T} \sum_{i=1}^{N} \log \left[ \nu_i \sum_{j=1}^{N} \lambda_{ij}\, \delta_{X^i_t, X^j_{t-1}} + (1-\nu_i)\, \chi_i^{X^i_t} (1-\chi_i)^{1-X^i_t} \right], \qquad (A.8)
$$
by using the chain rule and conditioning on the first observation $\mathbf X_1$, where $\delta_{A,B}$ is the Kronecker delta, taking value one if $A = B$ and zero otherwise.

The MLE of the parameters is obtained by maximizing the log-likelihood via gradient descent methods [Hastie et al., 2005]. Since the number of parameters is of order $O(N^2)$, thus sharply increasing with the number of variables $N$, it is preferable to adopt the solution of the Yule-Walker equations associated with the VDAR(1) model (2.9), instead of a random input, as the starting point of the gradient descent method.

A.3.1. Yule-Walker equations for VDAR(1).
Similarly to before, the Yule-Walker equations of the VDAR(1) model are formally equivalent to the ones of the VAR(1) model [Tsay, 2005]
$$
E(\mathbf X_t) = \phi_0 + \Phi_1 E(\mathbf X_{t-1}) \qquad (A.9)
$$
with $\phi_0 \equiv \{\phi_{0,i}\}_{i=1,...,N}$ and $\Phi_1 \equiv \{\Phi_{1,ij}\}_{i,j=1,...,N}$, by using a suitable one-to-one mapping between the two sets of parameters $\{\nu, \lambda, \chi\}$ and $\{\phi_0, \Phi_1\}$. In particular, it is
$$
\begin{cases}
\nu_i = \sum_{j=1}^{N} \Phi_{1,ij}, & \forall i = 1, ..., N,\\[4pt]
\lambda_{ij} = \dfrac{\Phi_{1,ij}}{\sum_{j=1}^{N} \Phi_{1,ij}}, & \forall i, j = 1, ..., N,\\[4pt]
\chi_i = \dfrac{\phi_{0,i}}{1 - \sum_{j=1}^{N} \Phi_{1,ij}}, & \forall i = 1, ..., N,
\end{cases} \qquad (A.10)
$$
where $\sum_{j=1}^{N} \lambda_{ij} = 1$ by construction.

It is also straightforward to show the following identity
$$
E(\tilde{\mathbf X}_t \tilde{\mathbf X}'_{t-1}) = \Phi_1 E(\tilde{\mathbf X}_{t-1} \tilde{\mathbf X}'_{t-1}), \qquad (A.11)
$$
where the tilde indicates the mean-subtracted variable. Thus, the Yule-Walker estimator of the parameters $\phi_0$, $\Phi_1$ is a standard result of time series analysis, see [Tsay, 2005] for a reference, and can be obtained by solving the linear system (A.11) for $\Phi_1$, then solving (A.9) for $\phi_0$. Finally, the Yule-Walker estimator of the parameters of the VDAR(1) model is obtained by inverting the one-to-one mapping (A.10).

A.3.2. Decimation of VDAR(1).
The MLE $\hat\lambda$ of the parameters $\{\lambda_{ij}\}_{i,j=1,...,N}$ needs to be validated to reveal the causal interactions between the random variables $\{X^i\}_{i=1,...,N}$, and a possible procedure is the so-called Decimation [Decelle and Ricci-Tersenghi, 2014, Decelle and Zhang, 2015]. It was shown to work best for the kinetic Ising model [Campajola et al., 2019], which is a logistic regression model for binary random variables. Decimation aims at pruning any parameter which is considered unnecessary for increasing the likelihood. The process sets parameters to zero based on their absolute size, starting with the smallest one, and stops when a transformed likelihood function $\tilde\ell$ is maximized,
$$
\tilde\ell(q) = \ell(q) - \left[(1-q)\,\ell_{max} + q\,\ell_0\right], \qquad (A.12)
$$
where $q$ is the fraction of pruned parameters, $\ell(q)$ is the log-likelihood (A.8) of the model (2.9) with a fraction $q$ of pruned parameters, $\ell_{max}$ is the log-likelihood of the full model obtained with the MLE $\hat\lambda$, and $\ell_0$ is the log-likelihood (A.8) of the VDAR(1) model with all coupling parameters $\lambda_{ij}$ pruned (i.e. considering data as generated by Bernoulli marginal distributions). The output of the regularization method is a set of validated parameters, in particular a matrix $\tilde\lambda$ of couplings which is sparser than the initial MLE $\hat\lambda$. Thus, a non-zero entry $\tilde\lambda_{ij}$ of the matrix $\tilde\lambda$ describes a Granger causality in tail relation from $j$ to $i$.
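A minimal sketch of the greedy decimation loop behind (A.12) is given below. For illustration, the log-likelihood is replaced by a toy quadratic score, and all entries of the coupling matrix are pruned in order of absolute size; this is our simplification, not the authors' exact implementation:

```python
import numpy as np

def decimate(lam, loglik):
    """Greedy decimation sketch: prune couplings from smallest |lam_ij| upward
    and keep the pruning level q* that maximizes the tilted likelihood (A.12):
    l~(q) = l(q) - [(1 - q) * l_max + q * l_0]."""
    lam = lam.copy()
    l_max = loglik(lam)                      # full model
    l_0 = loglik(np.zeros_like(lam))         # all couplings pruned
    order = np.argsort(np.abs(lam), axis=None)
    m = lam.size
    best, best_lam = -np.inf, lam.copy()
    for step, flat in enumerate(order, start=1):
        lam.flat[flat] = 0.0
        q = step / m
        tilted = loglik(lam) - ((1 - q) * l_max + q * l_0)
        if tilted > best:
            best, best_lam = tilted, lam.copy()
    return best_lam

# toy score rewarding closeness to a sparse "true" coupling matrix
target = np.array([[0.0, 0.8],
                   [0.0, 0.0]])
noisy = target + np.array([[0.0, 0.0],
                           [0.05, 0.0]])     # one small spurious coupling
toy_loglik = lambda L: -((L - target) ** 2).sum()
out = decimate(noisy, toy_loglik)
print(out)
```

In this toy run the small spurious coupling is pruned while the genuine one survives, which mirrors how Decimation keeps only the couplings that genuinely increase the likelihood.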