A Dynamic Bayesian Model for Interpretable Decompositions of Market Behaviour
Théophile Griveau-Billion (Department of Statistics, University of Imperial College London) and Ben Calderhead (Department of Statistics, University of Imperial College London)

January 22, 2020
Abstract
We propose a heterogeneous simultaneous graphical dynamic linear model (H-SGDLM), which extends the standard SGDLM framework to incorporate a heterogeneous autoregressive realised volatility (HAR-RV) model. This novel approach creates a GPU-scalable multivariate volatility estimator, which decomposes multiple time series into economically-meaningful variables to explain the endogenous and exogenous factors driving the underlying variability. This unique decomposition goes beyond the classic one-step-ahead prediction; indeed, we investigate inferences up to one month into the future using stocks, FX futures and ETF futures, demonstrating its superior performance according to accuracy of large moves, longer-term prediction and consistency over time.
Keywords:
Dynamic Bayesian model, dynamic graphical model, GPU computation, market-stress forecasting, sparse multivariate model, volatility forecasting.
Introduction
The behaviour of each asset in the market is driven by both endogenous factors representing the information specific to that asset and exogenous factors representing the impact of the market. The heterogeneous market hypothesis considers that agents in the market trade with different objectives. While these objectives can be related to many characteristics, Müller et al. (1997) argue that most of them are reflected in the time horizon, and highlight this fact by studying the impact of heterogeneous investment horizons with a volatility model using the returns computed at different frequencies. Following this reasoning, the study of a time series at different frequencies should reflect the impact of endogenous factors on that asset. On the other hand, for each asset the exogenous variables correspond to the time series that have the greatest impact on the behaviour of that asset. The classic approach involves creating a graph of cross-series relationships between the time series in a market based on the covariance matrix. However, the covariance is a symmetric object, while in reality some assets might influence many and others none, in a non-symmetric manner. Hence, a model that selects the exogenous factors without assuming such symmetry in the relationship may be more appropriate. Combining these two sources of driving factors, our aim is to construct a model that decomposes each time series into its endogenous and exogenous parts. Having such a decomposition gives us a better understanding of what is driving each time series' behaviour and thus the market as a whole. A better structural understanding, both for a single asset and the overall market, allows us to produce more accurate inferences and stress indicators.

We propose a model that extends the Simultaneous Graphical Dynamic Linear Model (SGDLM) of Gruber and West (2016a,b); Zhao et al. (2016); McAlinn and West (2016) to incorporate the heterogeneous autoregressive realised volatility (HAR-RV) model of Corsi (2004); Corsi et al. (2008); Corsi and Reno (2009). Combining the HAR-RV model with the SGDLM framework creates an easy-to-scale multivariate volatility estimator. Each time series of daily log-volatility is a DLM with idiosyncratic factors from the HAR-RV model, cross-series relationship factors from the multivariate Wishart, and any additional variables specific to that time series. The variables can be clustered into two groups: endogenous and exogenous. The endogenous group represents the information specific to the stock, while the exogenous one represents the influence of the environment. As a result of the flexibility of the SGDLM framework, as long as the normality condition of each DLM is respected they can easily be extended to include additional variables, and these do not have to be the same for every series. Hence, we will present different extensions that move further away from the standard HAR-RV formulation.

The decomposition performed by our proposed model can explain at any time which economic variables are likely to be driving the variability; for example, it may be due to the sector, the market or internal information, or at a particular frequency. This decomposition of the move into economically-meaningful variables, and the capacity of the algorithm to follow their evolution, creates new signals to study.
In order to get the most out of these different signals we used a simple scale-space change point algorithm, such that the correlations between these signals and the underlying time series allow us to perform more reliable inference for days and weeks ahead for each stock, and indeed the whole market. In addition, this model has proven to be an efficient market stress indicator and forecaster. While Gruber and West (2016a) observed striking similarities between a metric of the SGDLM and the St. Louis Fed Financial Stress Index, our model appears to offer insight into the likely moves of this index weeks ahead.

In order to assess the performance of the H-SGDLM model we look at the percentage of measured points that lie within the inferred confidence interval. We are especially interested in the figures obtained for large moves in the variance, in particular positive moves, since more than 68% of them are negative. We compare the percentage of correct predictions between algorithms for a confidence interval smaller than the move. If our novel approach for decomposing the variance into different groups of variables, representing complementary information at different scales, works, it should give us new insights on what is driving the market and thus be much better at predicting large moves. Indeed, when considering the realized variance of stocks in a group of 487 European stocks over 18 years from 2001 to 2019, the HAR-RV and SGDLM models correctly predicted only 53.24% and 34.69% respectively of the changes in variance bigger than 9.28%, whereas our model predicted 63.89% of them correctly; in a different environment with stocks from the S&P500 over the same period, with the same model and parameters, that figure is around 65%.

Literature review
Müller et al. (1997) proposed the HARCH variance model to build on the idea of heterogeneous investment horizons. Following their approach, we start by working with variances before extending the model to other metrics. The variance of a time series of asset prices has many interesting properties that many models have previously tried to capture. Some stylized facts of variance processes are of particular interest to us. The first is the time asymmetry observed in financial time series, i.e. the importance of distinguishing past from future. From this we can conclude that a model of variance should not be time reversible. This assumption then leads to a leverage effect, which is the response asymmetry between the magnitude of previous stock returns and future variance. Furthermore, we may consider multifractality, which defines how the distribution of returns should change when studied at different time scales. These stylized facts are not reproduced by the widely used stochastic volatility or GARCH models. Further complicating matters, the distribution of each returns time series may have different characteristics, and these can evolve through time. Papers such as Müller et al. (1997), Muzy et al. (2000) and Lynch and Zumbach (2003) studied the multifractal behaviour and time asymmetry of financial returns. In Calvet and Fisher (2001, 2004) the authors extended the multifractal model of asset returns introduced by Mandelbrot to propose a stationary volatility model that can be estimated with maximum likelihood. Their Markov switching volatility model is based on a multiplicative cascade of volatility components, each representing a different frequency. While this model has a closed form likelihood, it still requires an optimization step and the parameters lack clear interpretation. In contrast, we seek a model that helps us understand the time series, such that all parameters and variables have a clear interpretation.
As explained by the heterogeneous market hypothesis in Müller et al. (1997), different market participants trade with different objectives. One of them is the time scale, and it impacts participants asymmetrically; the proposed HARCH model is a modified GARCH model using returns at different frequencies. They used this decomposition to show that lower time scales influence higher scales more than the other way around. In other words, short term traders are more greatly impacted by trades from long term traders than the other way around. This information asymmetry between scales has been further detailed in many papers by studying the volatility cascade, i.e. the flow of information between scales; see Zumbach and Lynch (2001) for a study of this cascade and its parallel with ideas from physics. Indeed, the name came from a similar concept in physics, where the fluid vorticity cascade observed in turbulent states has a comparable behaviour to the distribution of financial returns at different scales, thus allowing the same mathematics to be used for volatility modelling. See also for example Muzy et al. (2000) for a multifractal random walk model that produces a scale-invariant stochastic volatility model. This form of statistical feedback model led Borland (1998, 2002, 2004); Borland et al. (2005); Borland and Bouchaud (2005) to propose a process with a noise component following the non-linear Fokker-Planck equation, the solution of which results in a q-Gaussian or Tsallis distribution. With this approach the model could theoretically give a distribution that models the behaviour of the price process for different frequencies, and thus characterize each stock into different categories depending on the value of q. However, although this complicated distribution has interpretable parameters, it is hard to fit with few data points.

The models described previously are able to reproduce most stylized facts, especially multifractality; however they are hard to estimate and can lack clear economic interpretation. While many papers make use of multiplicative models, such as Calvet and Fisher (2004), for reproducing multifractality, LeBaron (2001) proved for the first time that a simple three-factor additive model could also display this behaviour. Based on these results, Corsi (2004); Corsi et al. (2008); Corsi and Reno (2009) built the HAR-RV model, which uses the heterogeneous market idea to create a simple additive volatility model combining AR(1) processes at different scales. This model is able to reproduce multifractal scaling while conserving simple economic interpretations and being easily extendible to incorporate additional stylized facts such as leverage and jumps (Corsi and Reno, 2009). They used this decomposition into different time scales to learn which economic factor is influencing the time series. But this was done for each stock individually and lacks multivariate connections.

In order to obtain a model of the market as a whole it is important to take into account the cross-series relationships between volatilities. Many studies have shown the clear performance increase of integrating the multivariate effect in their model, such as multivariate-GARCH and -HEAVY models, see e.g. Noureldin et al. (2011); however such models use heavy MCMC parametrization techniques which make them expensive to scale.
Other papers proposed methods to minimize this cost; for example Nakajima (2014) proposed a multivariate SV model using a covariance matrix Cholesky decomposition to allow for parallel evaluation of independent univariate SV models, but still fit the model with an expensive MCMC procedure. Another limitation of most multivariate models is their dependence on a historical window to perform the parametrization, compared to employing a sequential learning approach. One reason for this is the complexity of online learning algorithms for non-Gaussian models, see e.g. Johannes et al. (2005), which presents an auxiliary particle filter to sequentially update a stochastic volatility model with jumps. In contrast, our proposed model is multivariate and, due to its Gaussian setup, it can adapt its parameters sequentially to the evolving market.
The novel model we present here aims at explaining what drives the time series. This approach will give both new structural information about the market, as well as better longer-term forecasts. We build each module of the model to capture the different observations described above, and then combine each time series into a multivariate model of the market under study. This is possible using the SGDLM framework developed by Gruber and West (2016a,b); Zhao et al. (2016); McAlinn and West (2016), which considers an environment of independent time series modelled by a simple Normal DLM, using a sparse covariance matrix representation to incorporate cross-series relationships. Also, the SGDLM does not follow the classic multivariate variance approach of assuming symmetric connections between assets by using a covariance matrix to model the cross-series relationship. Instead, the SGDLM has a multivariate Wishart model running in parallel to sequentially select, for each time series, which other assets have influence on it. After this selection phase, the candidates are tested as regression variables. This technique creates non-symmetric cross-series relationships which produce interesting information on the underlying market risks.

In addition, compared to the previously described multivariate approaches, the combination of de-coupling and re-coupling steps allows for parallel updates of each DLM with Kalman filter equations. This highly parallel architecture to sequentially update the series in a large scale environment is perfectly suited for a GPU. Since the DLMs are independent they do not need to follow the same model; their only requirement is for the distributions of the states and observations to be Normally distributed. Since this sequential approach continuously updates the coefficients of the DLMs, their distribution will adapt to changes in the environment.
The motivation for this paper started with a simple observation. In an article written for Risk, Anderson et al. (2000) used a volatility signature plot to determine the appropriate frequency to use for computing the realized variance (RV) over high frequency returns. This plot corresponded to the realized variance averaged at different time scales, which they then used to determine the time scale at which the microstructure effects stop interfering with the realized variance. On this graphic they also observed distinct patterns for liquid and illiquid stocks. In the first case the realized variance was increasing with the scale, while it was decreasing for the illiquid stocks. A similar approach was used in Borland (2004) and Borland and Bouchaud (2005), where they used the variogram computed with a stock's returns to exhibit the properties of multifractality and correlation of volatilities across time scales. Their variogram corresponded to the averaged squared difference between squared returns at different scales. Both of these approaches used high frequency data, but what would we see on lower frequency scales, such as daily returns? Following this logic we computed the averaged realized variance for different frequencies of returns ranging from days to weeks, rescaled by frequency. The common stochastic diffusion process of log-prices assumes a property of self-similarity. If that were correct, as explained in Borland (2004), the variance should be scale-invariant and thus exhibit a straight line on the variogram. In practice we observe three different clusters of stocks on this plot: stocks with variances that increase, stay constant or decrease w.r.t. the time scale, i.e. going from daily to weekly returns. Can we explain this behaviour? As detailed in the following section, we can indeed formulate different economic interpretations, which lead us to follow this logic further.
We expect to detect with a variogram what practitioners often describe as the gamma effect. Gamma corresponds here to the Greek letter Γ, which represents for option hedgers the derivative of the delta. The delta Δ corresponds to the first derivative of the price of the option with respect to the price of the underlying asset, and aims to measure the sensitivity of the option to the price of the asset. Hence the gamma represents the second order sensitivity to variations in the price of the asset.

Let us consider stocks; traders have to delta-hedge their positions, which means buying or selling a certain quantity of stocks in order for their Δ to be zero, and to do so they follow the results given by the Black-Scholes equations. However, as they are not alone in the market, a stock's price might not move in their direction. When Γ is positive, if the price increases then the Δ will increase too, while for a negative Γ the Δ will decrease. Thus, depending on the Γ, traders will not react symmetrically to the move of the stock. Furthermore, when a trader wants to delta-hedge his position, this second order effect might have an unexpected impact if the absolute value of Γ is high, since it means a high sensitivity to variations of the stock. This is a simple description; for more details on delta and gamma hedging refer to Hull (2017).

If we consider a stock on which a trader has a big option position, its hedging might have a detectable impact on the time series of prices. Indeed, in the case where the trader tries to keep the price in a specific range, we might not see any impact on the daily scale, but on a lower one, such as weekly returns, we might observe a variance lower than usual due to this bounding effect.
The proof of concept for our intuition uses the following computation. For each stock we select the last L = 180 close prices. Then for each specific time frequency we compute the realized variance, RV, of the exponentially weighted returns. Let l be the time lag, p the stock price, w the exponential weights and ρ the selected exponential decay speed; then for each lag l we define
\[
\rho_l = \rho^{(L-1)/(L-l)}, \qquad w_i = (1-\rho_l)(\rho_l)^i, \qquad RV_l = \frac{1}{l}\sum_{i=l}^{L} w_i \left(\log\frac{p_i}{p_{i-l}}\right)^2,
\]
where ρ_l = ρ^{(L−1)/(L−l)} ensures that the sum of the exponential average weights is the same at each frequency l. For each stock we compute RV_l for l ∈ [1, 5] to compare the daily variance with the weekly one, where l corresponds to the returns' frequency used for the computation. By re-weighting the variance by the time scale we want to observe the spread with the commonly used theoretical variance rescaling. For example, to obtain a weekly variance from a daily one it suffices to multiply it by 5. This comes from the assumption that prices are i.i.d.; see Diebold et al. (1997) for a critique of this practice. Hence the expected behaviour would be for the function RV_l to be constant with respect to the time scale l.
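To make this construction concrete, the following Python sketch (not the authors' implementation) computes RV_l for lags 1 to 5 from a window of close prices; the ordering of the weights and the ρ = 0.98 decay are assumptions consistent with the parameters reported later in the paper.

```python
import numpy as np

def variogram_rv(close, rho=0.98, max_lag=5):
    """Exponentially weighted realised variance RV_l for lags l = 1..max_lag."""
    log_p = np.log(np.asarray(close, dtype=float))
    L = len(log_p)
    rv = {}
    for l in range(1, max_lag + 1):
        rho_l = rho ** ((L - 1) / (L - l))        # equalises the total weight across lags
        rets = log_p[l:] - log_p[:-l]             # l-period log-returns
        idx = np.arange(len(rets))
        w = (1.0 - rho_l) * rho_l ** idx[::-1]    # exponential weights, most recent heaviest
        rv[l] = float((w * rets ** 2).sum() / l)  # rescale by the lag l
    return rv
```

Plotting rv[l] against l, rescaled so that rv[1] = 1, reproduces the variogram of Figure 1; a flat profile is what the i.i.d. rescaling argument would predict.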
Results for different stocks

In Figure 1 we show the result of the previously described computation for some selected stocks in the European market. This graphic exhibits three clusters: stocks with an increasing, constant, or decreasing variogram. Naturally the group of constant RV_l is not strictly constant, and would be better described as oscillating. The other two groups increase or decrease at different speeds. Indeed, while some seem to have an exponential variation, others are more linear. While the graphic shows only a few sampled stocks, we observed this behaviour across many different markets and stocks. Another interesting aspect of this phenomenon is that stocks with an increasing or decreasing variogram tend to keep this behaviour for some period of time.

Figure 1: Variogram for 6 different European stocks. The X-axis represents the time lag used to compute the realised variance, while the Y-axis corresponds to the obtained RV values, re-scaled to obtain a lag-one value of 1. The curves in black represent the expected behaviour. In blue are two selected stocks with increasing RV, while those in green are selected examples with decreasing RV.

Following the description above, a stock with a bounding effect on its time series would have a decreasing variogram. On the other hand, stocks which tend to diverge will have a weekly variance higher than the daily one due to these sparse explosions, which creates an increasing variogram. While this explanation makes sense, it is hard to prove whether this is the dominating effect. The interesting point here is that by simply observing the time series of prices at different frequencies we could potentially detect such behaviour. That leads us to go further and try to decompose the moves into different and interpretable economic factors that will explain the stocks' behaviour.

The SGDLM model developed by Gruber and West (2016a,b); Zhao et al. (2016); McAlinn and West (2016) brings together many different statistical techniques. Before it, the classic approach to working with DLMs in a multivariate setting was the inverse Wishart model; see Prado and West (2010) for a detailed description of this model and most tools used in the SGDLM. The motivation for this set-up was to study each time series of a multivariate model independently in parallel while still modelling the combined multivariate distribution. They used the variational Bayes approximation to approximate the multivariate model into independent individual distributions and then recouple them with importance sampling. This allows each time series' DLM to be different from the others. Let us denote by y_{j,t} the observations, θ_{j,t} = (φ_{j,t}, γ_{j,t}) the state vector, G_{j,t} the state evolution matrix, F_{j,t} the vector of variables, and ν_{j,t} the observations' noise, independent of the states' noise ω_{j,t}. Furthermore, λ_{j,t} corresponds to the observations' precision and W_{j,t} to the states' covariance matrix.
Hence the DLM of series j at time t reads:
\[
y_{j,t} = F_{j,t}' \theta_{j,t} + \nu_{j,t} = x_{j,t}' \phi_{j,t} + y_{sp_t(j),t}' \gamma_{j,t} + \nu_{j,t}, \qquad \theta_{j,t} = G_{j,t}\,\theta_{j,t-1} + \omega_{j,t},
\]
with
\[
\nu_{j,t} \sim N(0, \lambda_{j,t}^{-1}), \qquad \omega_{j,t} \sim N(0, W_{j,t}).
\]
x_{j,t} and φ_{j,t} represent the variables and coefficients of the internal predictors, while y_{sp_t(j),t} and γ_{j,t} correspond to the variables and coefficients of the cross-series time-varying conditional dependencies. In other words, (x_{j,t}, φ_{j,t}) represents the endogenous information and (y_{sp_t(j),t}, γ_{j,t}) the exogenous one. sp_t(j) denotes the group of selected simultaneous parents, which is adaptively revised over time to capture the changes in the market. Let us define μ_{j,t} = x_{j,t}' φ_{j,t} as the mean, Λ_t as the precision matrix with only the λ_{j,t} on the diagonal, and Γ_t as the matrix with the coefficients over the related parents; then we can write the multivariate model in matrix form:
\[
y_t \sim N(A_t \mu_t, \Sigma_t), \qquad \text{with } A_t = (I - \Gamma_t)^{-1}, \quad \Omega_t = \Sigma_t^{-1} = (I - \Gamma_t)^T \Lambda_t (I - \Gamma_t).
\]
Once this decomposition between endogenous and exogenous variables has been defined, the parents need to be chosen. In the first version of the model a fixed matrix defined the relationships between the different time series to select the exogenous variables. This proved particularly limiting, as the performance of the whole model heavily depends on the choice of this matrix. In the latest evolution the authors added the two letters SG, for Simultaneous Graphical. This means that a multivariate Wishart DLM runs in parallel to the main algorithm to build a relational graph and select which exogenous variables to include in the DLM. The selection of this group of parents will be detailed in the following section.

It is interesting to emphasize how this set-up differs from the classic multivariate modelling approaches. Usually the multivariate distribution of the market will represent the cross-series relationships by non-zero elements in the covariance matrix, denoted Σ here. This implicitly assumes the relationships to be symmetric. In the SGDLM model, by contrast, the cross-series coefficients are present in each individual DLM as variables of the regression, (y_{sp_t(j),t}, γ_{j,t}), hence in the mean part of the individual DLM. Furthermore, in this model the step selecting the parent variables is separated from the one computing their values. It first fits a multivariate inverse-Wishart distribution to the environment under study and uses it to create a short-list of candidates. Members of this list are then included in the DLM formulation, and the model computes their coefficients through its sequential update. Therefore, it creates an asymmetrical relationship between the stocks in the environment; for each individual DLM the parents are selected because they impact the behaviour of the stock.

By restricting each DLM to follow a log-normal-gamma distribution we can sequentially update the parameters of the DLM. This uses a Kalman filter with a Gamma distribution for the noise. Using this sequential update is possible thanks to the decomposition of the multivariate distribution into simple DLMs for which an extensive theory exists. Having independent equations for each time series will later allow us to include as endogenous factors an entire economic model. Another quality of this model is its capability to learn sequentially. Thanks to the Bayesian formulation and Kalman filter, the model will continuously update the parameters of the distributions without depending on a specific look-back window. Indeed, in the Kalman filter the past information is only present through the current value of the mean and variance of the distribution. That will allow us to follow the evolution of the coefficients through time and for the model to quickly adapt in case of rapid changes in the market.
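As a minimal illustration of the matrix form above (and not of the full SGDLM machinery), the sketch below assembles the joint mean and covariance implied by per-series means, precisions and the sparse matrix of simultaneous-parent coefficients; the variable names are hypothetical.

```python
import numpy as np

def joint_moments(mu_t, lambda_t, Gamma_t):
    """Joint moments y_t ~ N(A_t mu_t, Sigma_t) implied by the per-series DLMs."""
    m = len(mu_t)
    I = np.eye(m)
    A_t = np.linalg.inv(I - Gamma_t)                                 # A_t = (I - Gamma_t)^{-1}
    Omega_t = (I - Gamma_t).T @ np.diag(lambda_t) @ (I - Gamma_t)    # joint precision Omega_t
    Sigma_t = np.linalg.inv(Omega_t)                                 # joint covariance Sigma_t
    return A_t @ mu_t, Sigma_t
```

Because Γ_t is not symmetric, the implied dependencies between series need not be symmetric either, which is exactly the asymmetry discussed above.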
The group of parents for stock j, sp_t(j), includes three subsets: the core sp_{core,t}(j), up sp_{up,t}(j) and down sp_{down,t}(j) sets. The core parental set sp_{core,t}(j) corresponds to the selected group of series used as predictors for j. In the up set are core-group candidates for entry into the parental set. In the down set are all the series previously in the core or up group.

A multivariate Wishart DLM runs in parallel to the main algorithm to update the parental sets. With this model we obtain a dense precision matrix, which is used to select the candidates for the up set. For each series j, the n_max series with the highest precision element become candidates for inclusion in the up set. As long as the size limit of the up set is not reached, the n_max series not currently in the up or core group are added to the up set. The candidates stay in this set during ΔT time steps before being considered for promotion to the core set. During that time they will be part of the parental set y_{sp_t(j),t} used in the SGDLM. When the core group is full, all the members of the up and core sets are ranked according to their signal-to-noise ratio, a_t/R_t, and those with the smallest values are retired to the down set. Down-set members have their coefficients gradually put to zero over ΔT time periods, as sketched after this paragraph.

Issues arise in the choice of n_max, ΔT and the number of parents allowed. For example, if n_max equals the number of parents allowed in the core and up sets, then the parental set can only be updated every ΔT steps since it is full in between. If the up set does not have any size limit and the number of series considered for the up set, n_max, is high, then the number of parents could reach the total number of series m. These parameters will also influence the frequency at which the core set can change all its members. A large n_max will result in big and noisy up sets, and the bigger the up set the higher the potential turnover rate of the core set. A smaller n_max will limit the maximum number of changes in the core set, thus reducing the noise in those sets. If the number of allowed parents in the up set is small, ΔT determines the turnover frequency of the core set. In addition, a short ΔT will put more importance on the choice of prior for the new members of the up set, while with a longer ΔT the prior choice becomes irrelevant in the signal-to-noise ratio used in the acceptance decision.

These choices of parameters will influence the size of the matrices used in the following steps, since they increase as the square of the number of parents. Hence, while the choice of parameters does not influence the computation of the parent update, it does have a significant impact on the memory requirement of the other steps. The standard multivariate Wishart DLM and parent updates do not have any computationally intensive steps.
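The following sketch condenses one revision of the parental sets into a simplified rule; the exact promotion and retirement schedule of the paper (with its ΔT delays) is more involved, and the inputs `precision_row` and `snr` are hypothetical names for series j's row of the Wishart precision estimate and the current signal-to-noise ratios a_t/R_t.

```python
import numpy as np

def revise_parents(j, precision_row, core, up, down, snr, n_max=5, n_core=5):
    """One simplified revision of the core/up/down parental sets for series j."""
    # 1) candidate selection: strongest precision links not already tracked
    order = np.argsort(-np.abs(precision_row))
    candidates = [k for k in order if k != j and k not in core and k not in up][:n_max]
    # 2) promotion: rank current core and up members by signal-to-noise, keep the best
    pool = sorted(core + up, key=lambda k: snr.get(k, 0.0), reverse=True)
    new_core = pool[:n_core]
    demoted = [k for k in pool[n_core:] if k in core]   # coefficients decay to zero over dT steps
    new_up = ([k for k in pool[n_core:] if k in up] + candidates)[:n_max]
    return new_core, new_up, down + demoted
```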
Posterior update

We update the posterior with the data measured at D_t and the parameters computed at t − 1. Since the time series are independent we can update them separately. Each independent DLM posterior distribution follows a Normal-Gamma distribution p(θ_{j,t}, λ_{j,t} | D_t) ∼ NG(m_{j,t}, C_{j,t}, n_{j,t}, s_{j,t}):
\[
\theta_{j,t} \mid \lambda_{j,t}, D_t \sim N\!\left(m_{j,t},\; C_{j,t}/(s_{j,t-1}\lambda_{j,t})\right), \qquad \lambda_{j,t} \mid D_t \sim G\!\left(n_{j,t}/2,\; n_{j,t}\, s_{j,t-1}/2\right),
\]
where m_{j,t} corresponds to the mean, n_{j,t} to the shape of the Gamma distribution and n_{j,t} s_{j,t−1} to its scale. The notation is motivated by the observation that, following these equations, the distribution of θ_{j,t} is a multivariate Student-t distribution with n_{j,t} degrees of freedom and scale matrix C_{j,t}. As a result of this simple formulation, the posterior update follows the Kalman filter's equations. The multivariate posterior p(Θ_t, Λ_t | D_t) is obtained by re-coupling those independent posterior distributions with importance sampling.

The multivariate normal distribution sampling for (θ_{j,t} | λ_{j,t}, D_t) requires a Cholesky decomposition of the covariance matrix. But since in practice the positive definiteness of the covariance matrix C_{j,t} is not guaranteed, we can run into numerical issues when sampling this variable. We therefore added in our implementation a shrinkage method to guarantee the positive definiteness of this covariance matrix estimate.

This step quickly becomes computationally expensive with a large number of series, samples and allowed parents. Let N_MC represent the number of Monte-Carlo samples, m the number of series and K the number of parents. Then the posterior sampling requires m × N_MC K-dimensional multivariate normal-gamma draws.

The independent DLMs are recoupled to form the multivariate posterior distribution with importance sampling:
\[
p(\Theta_t, \Lambda_t \mid D_t) \propto |I - \Gamma_t| \prod_{j=1}^{m} p(\theta_{j,t}, \lambda_{j,t} \mid D_t),
\]
where I is the identity matrix and |·| denotes the determinant. Hence the importance sampling weights are defined by α ∝ |I − Γ_t|. The weights are computed with Monte Carlo samples from the independent Normal-Gamma posterior distributions. Since the number of parents will stay low compared to m, the coefficient matrix Γ_t will be sparse. Following the reasoning of the previous section, the exact posterior requires the computation of N_MC determinants of m × m sparse matrices to obtain the importance sampling weights.

In Gruber and West (2016b) they used the importance sampling entropy as a proxy for the Kullback-Leibler distance and thus for the quality of the model fit. They define the entropy H by:
\[
H_N = \sum_{n=1}^{N} \alpha_n \log(N \alpha_n).
\]
We will study this variable as a proxy of the environment stress. The higher the entropy, the lower the approximation quality, which may be interpreted as a sign of increased environment variance as a result of divergence between the model and the observations.
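A small sketch of this recoupling step: each joint Monte-Carlo draw of the simultaneous-coefficient matrix Γ_t is weighted by |det(I − Γ_t)|, and the entropy H_N of the normalised weights serves as the stress proxy. This is an illustration of the formulas above rather than the paper's code.

```python
import numpy as np

def importance_weights_and_entropy(Gamma_samples):
    """Normalised importance weights alpha_n and entropy H_N for a list of Gamma_t draws."""
    m = Gamma_samples[0].shape[0]
    I = np.eye(m)
    w = np.array([abs(np.linalg.det(I - G)) for G in Gamma_samples])
    alpha = w / w.sum()
    N = len(alpha)
    H = float(np.sum(alpha * np.log(N * alpha)))   # higher H => poorer variational approximation
    return alpha, H
```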
The one-step-ahead inference corresponds to the prior update p(Θ_{t+1}, Λ_{t+1} | I_{t+1}, D_t). The authors used Variational Bayes to decouple the multivariate distribution into independent series and thus process the time series in parallel. The Variational Bayes decomposition approximates the multivariate posterior p(Θ_t, Λ_t | D_t) by a product of Normal-Gammas p(θ_{j,t}, λ_{j,t} | D_t) ∼ NG(m_{j,t}, C_{j,t}, n_{j,t}, s_{j,t}) by minimizing the Kullback-Leibler (KL) distance in the usual manner.

The resulting KL distance at each time t gives information on the quality of the fit. Significant changes in the environment under study would decrease the approximation quality and thus increase the KL distance. Gruber and West (2016a) applied this model to the S&P500 time series and observed similarities between a rescaled KL measure and the St. Louis Federal Reserve Bank Financial Stress Index. That observation led them to study further the strength of this relation; see Section 7.6 for a more detailed discussion.

Once the multivariate distribution is decomposed into independent DLMs, the parameters are updated independently in parallel to t + 1. The updated parental set updates the filter's parameters (a_{j,t+1}, R_{j,t+1}, r_{j,t+1}, W_{j,t+1}). The innovations follow a Gamma-Beta stochastic volatility update λ_{j,t+1} = λ_{j,t} b_{j,t}/β_λ, where b_{j,t} ∼ Beta(β_λ n_{j,t}/2, (1 − β_λ) n_{j,t}/2). β_λ influences the smoothness of the predicted λ. The Gamma-Beta stochastic volatility appears through the gamma parameter update r_{j,t+1} = β_λ n_{j,t}. According to the DLM equations, the states follow the update θ_{j,t+1} = G_{j,t+1} θ_{j,t} + ω_{j,t+1} with ω_{j,t+1} ∼ N(0, W_{j,t+1}/(s_{j,t} λ_{j,t+1})). Each DLM therefore has an updated prior distribution p(θ_{j,t+1}, λ_{j,t+1} | D_t) ∼ NG(a_{j,t+1}, R_{j,t+1}, r_{j,t+1}, s_{j,t}).

Samples from these independent priors are combined to obtain the multivariate distribution y_{t+1} ∼ N(A_{t+1} μ_{t+1}, Σ_{t+1}), with first moments A_{t+1} μ_{t+1} and second moments Σ_{t+1}. This step is computationally expensive: it requires N_MC × m multivariate normal-gamma draws to obtain p(θ_{j,t+1}, λ_{j,t+1} | D_t), and N_MC inverses and matrix-matrix products of (m, m) matrices to compute the first two moments
\[
A_{t+1}\mu_{t+1} = (I - \Gamma_{t+1})^{-1}\mu_{t+1} \qquad \text{and} \qquad \Sigma_{t+1} = \left((I - \Gamma_{t+1})^T \Lambda_{t+1} (I - \Gamma_{t+1})\right)^{-1}.
\]
Inference more than one day ahead can be computed recursively with the previous one-step-ahead prediction. Before the next-step prediction, the filter's parameters a_{j,t+1} and R_{j,t+1} are updated with the previously inferred value. As before, we produce Monte-Carlo samples of the first two moments to infer time t + k with the parameters computed at t and the values inferred at t + k − 1. The difficulty comes from the internal predictor variables x_{j,t}, which require values for the next-step update. Hence it is only possible to recursively produce inference more than one step ahead if the predictor variables can be updated with the previously inferred values.
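To fix ideas, here is a sketch of the evolution (prior) step for one series, following the update rules quoted above; the block discounting of W_{j,t} is collapsed into a single discount factor `delta`, and the numerical defaults are placeholders rather than the paper's settings.

```python
import numpy as np

def evolve_prior(m, C, n, s, lam, G, delta=0.98, beta_lam=0.95, rng=np.random.default_rng(0)):
    """Evolve the Normal-Gamma posterior NG(m, C, n, s) of one DLM to its prior at t+1."""
    a = G @ m                                    # prior state mean a_{t+1}
    R = G @ C @ G.T / delta                      # discounted prior state covariance R_{t+1}
    r = beta_lam * n                             # discounted degrees of freedom r_{t+1}
    b = rng.beta(beta_lam * n / 2.0, (1.0 - beta_lam) * n / 2.0)
    lam_next = lam * b / beta_lam                # Gamma-Beta stochastic-volatility step
    return a, R, r, lam_next                     # prior is NG(a, R, r, s), plus a precision draw
```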
Parameters sensitivity

Due to its Bayesian structure the model has a few hyper-parameters. For the parent step, the size of the different groups is fixed and must be decided together with the update frequency. Due to the impact of these variables on the memory requirement, the hardware limits the possible range. Regarding the parental set, the newly selected variables need a mean and variance to initialise their distribution. We choose a zero mean and a small variance so that they do not impact the rest of the regression, knowing that these values will quickly be updated with the Kalman filter in the subsequent steps.

The states θ_{j,t} follow a random walk update θ_{j,t+1} = G_{j,t+1} θ_{j,t} + ω_{j,t+1}, where the covariance matrix for the noise terms W_{j,t} follows a block discounting update; we refer to the appendix for the detailed equations. This step uses discounting factors δ_{j,φ} for the endogenous variables and δ_{j,γ} for the parental ones. The multivariate Wishart also uses this discounted covariance update technique, with parameter δ_w. These parameters influence the variance-of-variance of the state coefficients. Similarly, the precision λ_j follows a Gamma-Beta stochastic volatility update which needs a parameter β_λ that influences the variance of this update. The same parameter, β_w, is also present with the same role in the multivariate Wishart. These are the five hyper-parameters of the SGDLM framework. Gruber and West (2016b) studied the influence of the discounting parameters and gave theoretical bounds for the betas.

Corsi (2004); Corsi and Reno (2009); Corsi et al. (2008) developed the HAR-RV, Heterogeneous Autoregressive Realised Volatility, model for two main reasons: first, the model must reproduce the fat-tails, long-memory, scaling and multifractal volatility behaviours; second, it must be easy to estimate and keep clear economic interpretations. They used the easy-to-compute realised volatility as a proxy of the unobservable latent volatility. Additive models were known for their clear economic interpretations and easy calibration, but not known to reproduce multifractality. That was before LeBaron (2001) reproduced multifractality and long memory with a simple three-factor model where the variables represented three different and specifically chosen frequencies. Hence, a simple additive model with correctly chosen frequencies can reproduce the expected stylised facts. This led to the HAR model with the three frequencies, short, medium and long, defined as day, week and month.

Different papers studied the asymmetric propagation of the volatilities between different time scales. The HARCH model introduced by Müller et al. (1997); Dacorogna et al. (1997) followed the heterogeneous market hypothesis to decompose the volatility into different time-scale dependent components to prove this asymmetry. An interpretation behind this effect is the heterogeneous objectives of market participants, one of them being the investment horizon. In other words, the daily volatilities impact short-term traders more than long-term ones, while low frequency movements impact both.
A mathematical representation of this idea creates an additive cascade of the different market components. Corsi (2004) built the cascade with an AR(1) model of the RV at each frequency, with a coefficient relating it to the closest lower-frequency component. This model reproduces the long-memory and multifractality behaviours without being in the long-memory model class, and keeps an economic interpretation. In their paper, they studied the coefficients to learn which frequency is driving the volatility move. They use the observation that the low frequency coefficients are non-negligible in explaining the daily volatility moves as an argument for the volatility cascade.

The model recreates the volatility cascade by incorporating dependencies on lower-scale volatility components at each frequency. Thus the return process is a function of the cascade's highest frequency component, which here is the daily volatility σ^d_t. Let us consider the returns r_t to follow:
\[
r_t = \sigma^d_t \epsilon_t, \qquad \epsilon_t \sim N(0, 1).
\]
RV^s_t represents the observed realised volatility at t for time scale s. They define the daily RV by:
\[
RV^d_t = \sqrt{\sum_{j=0}^{M-1} r^2_{t-j\Delta}},
\]
with Δ the sampling frequency and M the resulting number of points. By doing so they obtain a daily volatility computed from high frequency returns. The lower-frequency RVs are built by averaging the daily ones; e.g. the weekly RV is:
\[
RV^w_t = \left(RV^d_{t-1d} + RV^d_{t-2d} + \cdots + RV^d_{t-5d}\right)/5.
\]
Following the volatility cascade idea, at each time scale the unobserved partial volatility process σ̃^s_t has an AR(1) structure with a coefficient for the expected volatility at the next lower scale. If we consider three frequencies, d daily, w weekly and m monthly, the cascade model reads:
\[
\tilde{\sigma}^m_{t+1m} = c^m + \phi^m RV^m_t + \tilde{\omega}^m_{t+1m},
\]
\[
\tilde{\sigma}^w_{t+1w} = c^w + \phi^w RV^w_t + \gamma^w E_t[\tilde{\sigma}^m_{t+1m}] + \tilde{\omega}^w_{t+1w},
\]
\[
\tilde{\sigma}^d_{t+1d} = c^d + \phi^d RV^d_t + \gamma^d E_t[\tilde{\sigma}^w_{t+1w}] + \tilde{\omega}^d_{t+1d},
\]
where c^s corresponds to a constant, φ^s to the coefficient of the RV variable and γ^s to the coefficient representing the dependence on the closest lower scale. The noises ω̃^s_t are independent in time and of each other. By recursion we obtain:
\[
\tilde{\sigma}^d_{t+1d} = c + \beta^d RV^d_t + \beta^w RV^w_t + \beta^m RV^m_t + \tilde{\omega}^d_{t+1d},
\]
where the β^s represent the different coefficients. With the relation between the daily latent volatility measure σ̃^d_{t+1d} and its estimate RV^d_{t+1d} written as σ̃^d_{t+1d} = RV^d_{t+1d} + ω^d_{t+1d}, we obtain the following time series representation of the cascade model:
\[
RV^d_{t+1d} = c + \beta^d RV^d_t + \beta^w RV^w_t + \beta^m RV^m_t + \omega_{t+1d},
\]
where ω_{t+1d} = ω̃^d_{t+1d} − ω^d_{t+1d}. Hence the variance is modelled as a combination of different frequency components. Due to its simple additive form, the authors suggested the use of classic least squares regression to compute the parameters. In their paper they used all the available history as an increasing window on which to fit the model; hence the first point uses only 30 days while the last one uses the whole dataset. In their original paper they also discussed using a moving window regression to obtain a time series evolution of the weights, but did not show any results.

In order to avoid negativity issues they used the logarithm of the realized variance, log(RV^l_t). They also extended the original HAR-RV to model the leverage effect by adding variables at each frequency representing past positive r^+_s and negative r^−_s returns, with s the time scale.
The cascade model with leverage coefficients γ for each time scale becomes:
\[
\log\!\left(RV^d_{t+1d}\right) = c + \beta^d \log\!\left(RV^d_t\right) + \gamma^{(d)+} r^{(d)+}_t + \gamma^{(d)-} r^{(d)-}_t + \beta^w \log\!\left(RV^w_t\right) + \gamma^{(w)+} r^{(w)+}_t + \gamma^{(w)-} r^{(w)-}_t + \beta^m \log\!\left(RV^m_t\right) + \gamma^{(m)+} r^{(m)+}_t + \gamma^{(m)-} r^{(m)-}_t + \omega_{t+1d}.
\]
In Corsi et al. (2008) they presented a HAR-RV model with Normal Inverse Gamma variance to model the variance-of-variance. While this model did not outperform alternative models for all of the tests, it did prove to be the best regarding forecasts of the distribution of the realized variance. Moreover, they showed that incorporating time variation of the variance of RV, without necessarily using an inverse gamma distribution, produced better results for inference and for the fit of the RV distribution.
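For reference, the basic (log) HAR-RV regression discussed above reduces to an ordinary least squares fit on three lagged components; the sketch below is a simplified illustration (daily data, 5- and 20-day averages for the weekly and monthly terms) rather than the original estimation code.

```python
import numpy as np

def har_rv_fit(rv):
    """OLS fit of log RV_{t+1} on log daily, weekly and monthly RV components."""
    rv = np.asarray(rv, dtype=float)
    T = len(rv)
    rv_w = np.array([rv[max(0, t - 4): t + 1].mean() for t in range(T)])    # 5-day average
    rv_m = np.array([rv[max(0, t - 19): t + 1].mean() for t in range(T)])   # 20-day average
    X = np.column_stack([np.ones(T - 1), np.log(rv[:-1]), np.log(rv_w[:-1]), np.log(rv_m[:-1])])
    y = np.log(rv[1:])                                                      # next-day log RV
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef                                                             # [c, beta_d, beta_w, beta_m]
```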
The HAR-RV model of Corsi (2004); Corsi and Reno (2009); Corsi et al. (2008) allowed them to identify which frequency, and thus which traders, were influencing the stock. On the other hand, the SGDLM framework of Gruber and West (2016a,b); Zhao et al. (2016); McAlinn and West (2016) allowed them to dynamically create a graph of connections between stocks in a market and incorporate this information into a multivariate distribution of the environment. Combining those two models creates, for each asset, a time varying DLM with factors representing the influence of the different frequencies, to identify which trader is driving the moves, and others representing the influence of selected stocks in the market. Furthermore, Corsi et al. (2008) studied the importance of including a time varying noise component, and more specifically a Normal inverse gamma distribution, to efficiently model the variance-of-variance distribution. The SGDLM model uses exactly that distribution for the noise terms, since it is the conjugate distribution of a Normal with unknown variance. In addition, both models follow a simple additive set-up, hence the extension of the classic HAR-RV to incorporate leverage and jumps is straightforward to include in the DLM.

Let us recall the initial variogram study. The study of the evolution of the RV while changing the returns' scale from daily to weekly allowed us to detect abnormal patterns in the time series, which were potentially related to option hedging. The combination of the HAR-RV and SGDLM models allows us to go further into this decomposition. Each time series is decomposed into two groups of factors. The exogenous factors consist of the influence of the environment on the series, and this group combines the variables selected during the parent selection phase. Secondly, the endogenous factors represent the information internal to the time series. This is composed of the previous RV and leverage variables for different frequencies.

With this decomposition we aim to understand whether the moves are being driven by external or internal factors, and also at which frequency. In addition, it is interesting to study the evolution of the spread between the endogenous and exogenous groups. The intuition here is that a company with a variance being pushed mainly by external factors should behave differently from one which mainly follows endogenous ones. Looking at a different scale, can we relate different market situations to phases where the stocks are mainly driven by exogenous factors, by endogenous ones, or where the spread between these two is increasing? The results presented in sections 7 & 8 provide an answer to these questions.

Extending the standard DLM
While Corsi (2004); Corsi and Reno (2009); Corsi et al. (2008) used high frequency intra-day data to obtain a daily variance, in this study we only make use of end-of-day prices. At this low frequency we define the s-scale RV as the sum of squared s-scale log-returns, \(r^s_{j,t-i} = \sum_{k=t-i-s}^{t-i} r_{j,k}\), over a time window L. With a weighting kernel w, the s-scale RV is defined by:
\[
RV^s_{j,t} = \frac{1}{s} \sum_{i=0}^{L-1} w_i \left(\sum_{k=t-i-s}^{t-i} r_{j,k}\right)^2.
\]
We will use an exponential moving average weighting kernel for w. To reproduce the HAR-RV three-scales framework we will use daily (s = 1), weekly (s = 5) and monthly (s = 20) frequencies. If we suppose an unknown RV precision η_{j,t}, as in the original SGDLM model, the resulting RV^d_{j,t} is Normal-inverse-Gamma. Following Corsi et al. (2008) we use the logarithm of the RV to avoid negativity issues, hence obtaining a log-Normal-inverse-Gamma distribution for each time series.

A DLM is defined by a distribution on the output variable and another on the state variables. Following the notation introduced for the SGDLM, we can write the HAR-RV cascade model of one stock j as:
\[
RV^d_{j,t} = F_{j,t}' \theta_{j,t} + \nu_{j,t}, \qquad \theta_{j,t} = G_{j,t}\,\theta_{j,t-1} + \omega_{j,t},
\]
with
\[
\nu_{j,t} \sim N(0, \lambda_{j,t}^{-1}), \qquad \omega_{j,t} \sim N(0, W_{j,t}),
\]
where θ_{j,t} is the state vector and F_{j,t} the vector of variables. The multivariate state noise is W_t ∼ N(0, V_t), with V_t the covariance matrix. The evolution matrix G_{j,t} is diagonal with non-zero elements for the selected parents. Following the SGDLM approach we can extend this DLM to include K parents from the m stocks in the environment under study. The state vector becomes:
\[
\theta_{j,t} = \left(c_{j,t}, \beta^d_{j,t}, \beta^w_{j,t}, \beta^m_{j,t}, \gamma_{1,t}, \ldots, \gamma_{K,t}\right)^T,
\]
where the β represent the coefficients of the endogenous variables, here the offset and the three frequencies from the HAR-RV model, and the γ the coefficients of the exogenous ones. Hence the corresponding vector of variables is:
\[
F_{j,t} = \left(1, RV^d_{j,t}, RV^w_{j,t}, RV^m_{j,t}, RV^d_{sp_t(1),t}, \ldots, RV^d_{sp_t(K),t}\right).
\]
Extending this set-up to include the leverage effect is straightforward. The complete model will therefore be composed of internal coefficients, for the realised variance and leverage effect at different frequencies, and external ones, for the group of daily realised variances of the selected parents. While for the HAR-RV model they considered the variance at different scales to be the average of the daily one over different time windows, we consider it to be the RV over the same time window but with different frequencies of returns. We will refer to this combined model as H-SGDLM.
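The layout of the regression and state vectors can be summarised in a few lines; the sketch below simply concatenates the endogenous HAR-RV terms with the parents' daily RVs, matching the definitions of F_{j,t} and θ_{j,t} above (variable names are illustrative).

```python
import numpy as np

def build_F(rv_d, rv_w, rv_m, parent_rv):
    """Regression vector F_{j,t} = (1, RV^d, RV^w, RV^m, RV^d of parents 1..K)."""
    return np.concatenate(([1.0, rv_d, rv_w, rv_m], np.asarray(parent_rv, dtype=float)))

# The state vector theta_{j,t} = (c, beta_d, beta_w, beta_m, gamma_1, ..., gamma_K) shares this
# layout, so the observation equation is simply  y_{j,t} = build_F(...) @ theta_{j,t} + nu_{j,t}.
```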
In this section we present an example of a possible extension of the H-SGDLM to incorporate new endogenous variables. As a result of the simple additive structure of the H-SGDLM, it is easy to extend as long as the constraint of having Normal distributions is respected. A day of trading is often summarized with the Open, High, Low, Close (OHLC) data. Open and Close correspond to the first and last traded price of the day, while High and Low are the highest and lowest traded prices of the day. We studied the correlation of different metrics combining those variables and the underlying time series of close prices. Let us denote a stock price by S; then the variables with the highest correlation are:
\[
r^{low}_t = \log(S^{low}_t) - \log(S^{low}_{t-1}), \qquad CH_t = \frac{S^{high}_t - S^{close}_t}{S^{high}_t - S^{low}_{t-1}}, \qquad COHL_t = \frac{S^{close}_t - S^{open}_t}{S^{high}_t - S^{low}_t}.
\]
Following the idea of incorporating the OHLC data to make the model more responsive to recent moves, we can add them to the vector of endogenous variables. Instead of increasing its size we deleted the less informative variables from the previous formulation, which corresponded to the RV variables computed over a time window of a week. Thus we obtain:
\[
H_{j,t} = \left(1, RV^d_{j,t}, RV^w_{j,t}, RV^m_{j,t}, (r^d_{j,t})^2, (r^w_{j,t})^2, (r^m_{j,t})^2, r^{low}_{j,t}, CH_{j,t}, COHL_{j,t}, r^{(d)+}_{j,t}, r^{(d)-}_{j,t}, r^{(w)+}_{j,t}, r^{(w)-}_{j,t}, r^{(m)+}_{j,t}, r^{(m)-}_{j,t}\right)^T.
\]
In our simulations, incorporating the OHLC variables did not improve the performance of the H-SGDLM when applied to model the log-RV. But when, in section 8.2, we adapted the H-SGDLM to predict the logarithm of the prices on different environments, the OHLC data improved the results.
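The OHLC-derived predictors written above are simple ratios of the daily price summary; the following sketch computes them as defined in the text (note that CH_t uses the previous day's low in its denominator, as written).

```python
import numpy as np

def ohlc_features(s_open, s_high, s_low, s_close, s_low_prev):
    """OHLC predictors r_low, CH and COHL for one stock and one day."""
    r_low = np.log(s_low) - np.log(s_low_prev)              # low-to-low log return
    ch = (s_high - s_close) / (s_high - s_low_prev)         # close relative to the high
    cohl = (s_close - s_open) / (s_high - s_low)            # open-to-close over the day's range
    return r_low, ch, cohl
```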
Since the SGDLM framework decomposes the multivariate distribution into independent DLMs, they can be updated in parallel. Gruber and West (2016b) used a GPU implementation developed in C++ with the CUDA library from Nvidia and ran it on a cluster of GPUs. We used a simpler approach in the form of the TensorFlow library. This library developed by Google is primarily made for neural networks and deep learning architectures. Nevertheless, the wide array of functions available are all optimised to process large tensors on GPU; hence, we used TensorFlow as a library for tensor computation on GPU and implemented the H-SGDLM with it.

The two main implementation issues are, firstly, the matrix inversion for the Kalman update and, secondly, the size of the matrix of the full multivariate distribution. For the matrix inverse we added a regularization step to guarantee its positive definiteness before inversion, and we monitor its values to avoid any divergence. In addition, the scale of the variables used in the model also has an impact, which is why we rescale them. For the memory issue we used the fact that the coefficient matrix Θ_t, the update matrix G_t and the covariance matrices are all sparse. Unfortunately, at the time of writing the TensorFlow library did not have the necessary functions to perform the sparse matrix manipulations we needed. Hence, we stored a dense version of those matrices and a matrix of indices to rebuild them when needed for computation. While this is not optimal, it suffices for the purpose of testing our model.

Due to the memory consumption of these matrices and the limited memory available on a single GPU, some parameters had to be bounded: more specifically, the number of allowed parents in the core, up and down sets, and the number of time steps between the updates of those groups. The GPU memory places limits on the feasible number of Monte-Carlo samples too. One option would be to store a set of samples in permanent memory and continue sampling on the GPU; however the added memory transfer counterbalances the gain in GPU processing. Since we only had one GPU at our disposal, we could not reproduce the approach taken in Gruber and West (2016b) of distributing the sampling on multiple GPUs to allow for larger sets.

Before studying real data we tested the algorithm in a simulated environment. Each stock price is assumed to follow a random walk with a Brownian motion with different means and variances, which are distributed randomly but constant through time. With this process we created a dataset of 300 time series. The parent selection using the multivariate Wishart does not show any clear pattern; the connections are noisy and seem random. This behaviour is also reflected in the exogenous coefficients: their absolute value stays low throughout, with no clear patterns. All of these observations meet the key objective; namely, that given a random environment with no connections between the time series, the model indeed does not find any. For the endogenous variables, the autoregressive coefficient of the RV at the previous time point stands out from the rest. Again, this was expected since the simulated time series follow a random walk.
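For completeness, the null environment used for this sanity check is easy to reproduce; the sketch below generates independent random-walk price paths with per-series drift and volatility drawn once and held fixed, where the numerical scales are hypothetical.

```python
import numpy as np

def simulate_environment(n_series=300, n_days=2000, seed=0):
    """Independent random-walk price paths with fixed, randomly drawn drift and volatility."""
    rng = np.random.default_rng(seed)
    mu = rng.normal(0.0, 1e-4, size=n_series)         # per-series drift (illustrative scale)
    sigma = rng.uniform(0.005, 0.03, size=n_series)   # per-series volatility (illustrative scale)
    log_returns = mu + sigma * rng.standard_normal((n_days, n_series))
    return 100.0 * np.exp(np.cumsum(log_returns, axis=0))   # price paths started at 100
```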
Description of the dataset

In this section we use data from European stocks. The data corresponds to the 487 most liquid stocks selected from thousands of stocks in different European markets. We start the backtest from 2000, but since not all stocks have such a long trading history we assume log(RV) to be zero when the price is missing. With that number of time series the available GPU memory only allows us to work with 500 Monte Carlo samples. For the exponential averages we used a parameter ρ = 0.98 and a time window of rv_L = 40 days. Since we initiate the variables randomly the model needs some time to converge, so all the graphics and data shown below exclude this burn-in period. For the core set we allowed nCore = 5 parents, and for the up and down sets nUp = nDown = 5. With a time interval of ΔT = 10 the core set will need a minimum of 10 days to be completely updated. These numbers were also chosen to obtain a number of parent variables less than 15. The objective is to have a balance between the number of endogenous and exogenous variables, as well as a low percentage of parents relative to the environment. In this section we compare the performance of our proposed algorithm with the previously published HAR-RV and SGDLM models. Since they did not use OHLC variables, we won't either. In the following section 8 we apply the model to different investment environments.

This section compares the one step ahead inference computed from the proposed model to those from the HAR-RV and SGDLM models. Our model was developed to study the evolution of the coefficients to learn which one is driving the market; hence the quality of the one step ahead inference is used to assess the accuracy of this decomposition. We ran the algorithm on daily log(RV), which produced one day ahead predictions. To put the metrics in perspective it is interesting to realize that the average daily return of log(RV) over the whole dataset is 0.0, while its standard deviation is 0.28. Therefore, simply predicting a constant value for the next day already produces a good mean absolute deviation (MAD); hence we consider this case as the model 't − 1'. We report the R² of the Mincer-Zarnowitz regression, the Root Mean Squared Error (RMSE) and the Mean Absolute Deviation (MAD). West et al. used the MAD to assess the one point ahead forecast, and the performance of a portfolio optimisation to assess the quality of the multivariate model and especially the resulting sparse covariance matrix. We ran on our dataset the HAR-RV model with leverage effect and the SGDLM model with only a constant and the previous value as endogenous variables. All the metrics in Table 1a correspond to averages over the individual ones obtained for each time series. The similarity between the HAR-RV model and the 't − 1' model arises because the log(RV_{t−1}) variable has a weight close to 1.

When assessing inference of a Bayesian model, the median absolute deviation between the predicted mean of the distribution and the target time series does not take into account the quality of the inferred distribution. A more interesting metric can be computed by using the predicted confidence interval and the percentage of measured points which occurred inside it. The better the forecast, the smaller the confidence interval and the higher the percentage of points in the predicted interval should be. Since the HAR-RV employed a classic least-squares parametrization, we used a multiple of the squared standard deviation of the predictions over the past 30 points to compute its confidence interval. In order to compare the performance of the three models, HAR-RV, SGDLM and ours, we used a multiple of the standard deviations to obtain similar sizes of confidence interval and thus compare the percentage of correct predictions for that interval. Regarding the choice of the interval, we selected a size smaller than twice the average move size, i.e. for an average move size of 5% we select a confidence interval lower than but close to 10%. To compare the three models we select an average confidence interval of similar size, so we can compare the predicted move direction and potential confidence interval increase at the jumps. The perfect model should have a very low confidence interval on average, with a large increase when a jump is predicted.
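The evaluation used for the large-move comparisons reported below (Tables 2a and 2b) can be summarised as a coverage rate restricted to large moves; the following sketch is a simplified version of that computation (the paper additionally calibrates the interval sizes so that they are comparable across models).

```python
import numpy as np

def coverage_on_large_moves(y_true, y_pred_mean, half_width, threshold, positive_only=False):
    """Share of observed log(RV) inside the predicted interval, restricted to large moves."""
    y_true = np.asarray(y_true, dtype=float)
    moves = np.diff(y_true, prepend=y_true[0])                     # daily change of log(RV)
    mask = (moves > threshold) if positive_only else (np.abs(moves) > threshold)
    inside = np.abs(y_true - y_pred_mean) <= half_width
    return float(inside[mask].mean()), float(np.abs(moves[mask]).mean())
```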
7% of the moves of log ( RV ) in our dataset are negativethe quality of the model is assessed by the prediction of increases in variance. Thereforewe also show in Table 2b the percentage of correct predictions when focusing on volatilityincreases above a certain threshold. In every scenarios our model correctly predicted morethan 62 .
0% of the moves, positive or negative. While it correctly predicted 63 .
89% of moveslarger than 9 .
28% and 63 .
95% of the increases higher than 10 . .
14% and 40 .
41% respectively. Our combined model behaves as expected and outper-forms both previous approaches for positive, negative, small and large moves. Interestingly,the bigger the moves the better the performance of our model and the bigger the gap withthe other two models.Table 1: Performance metrics of the one-day-ahead inference obtained for 487 Europeanstocks. Model t - 1 HAR-RV SGDLM FullMedian ADV 0 .
010 0 .
012 0 .
015 0 . .
038 0 .
040 0 .
050 0 . R .
990 0 .
989 0 .
985 0 . .
977 1 .
006 0 . (a) One day ahead inference comparison between different models. Median ADV corresponds tothe Median Absolute deviation. Median RMSE is the median of the Root Mean Squared Errors.M-Z coefficient corresponds to the Minor-Zarnowitz regression coefficient, i.e. the regression ofthe observed values on the predicted ones. .
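To make the evaluation concrete, here is a minimal sketch of how the point-forecast metrics and the interval hit rate described above can be computed for a single series; the function and argument names are illustrative, and the multiplier k used to match interval sizes across models is an assumption rather than a value taken from the paper.

```python
import numpy as np

def forecast_metrics(y_obs, y_pred, pred_sd, k=2.0):
    """Point-forecast metrics and interval hit rate for one log(RV) series.

    y_obs, y_pred : arrays of observed and predicted daily log(RV).
    pred_sd       : per-day predictive standard deviation of the model.
    k             : interval multiplier, chosen so that all models end up
                    with a comparable average interval size (illustrative).
    """
    err = y_obs - y_pred
    mad = np.mean(np.abs(err))                  # mean absolute deviation
    rmse = np.sqrt(np.mean(err ** 2))           # root mean squared error

    # Mincer-Zarnowitz regression: observed = a + b * predicted + noise
    X = np.column_stack([np.ones_like(y_pred), y_pred])
    (a, b), *_ = np.linalg.lstsq(X, y_obs, rcond=None)
    resid = y_obs - X @ np.array([a, b])
    r2 = 1.0 - resid.var() / y_obs.var()

    # Interval hit rate: fraction of observations inside the +/- k*sd band
    half_width = k * pred_sd
    inside = np.abs(err) <= half_width
    return {"MAD": mad, "RMSE": rmse, "MZ_coef": b, "MZ_R2": r2,
            "interval_size": 2.0 * np.mean(half_width),
            "pct_in_interval": inside.mean()}
```

Restricting the same computation to the days on which the absolute move of log(RV) exceeds a given threshold reproduces the large-move breakdown reported in Table 2.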
Table 2: Performance metrics with confidence interval of the one-day-ahead inference obtained for 487 European stocks.

    Model                                    HAR-RV    SGDLM     Full
    without threshold         Interval size   3.96%     3.97%     3.…%
                              in               ….55%   49.43%    62.…%
    with mean(|r|) = 5.73%    Interval size  10.75%    10.74%    10.…%
                              in               ….41%   37.43%    62.…%
    with mean(|r|) = 7.55%    Interval size  14.22%    14.20%    14.…%
                              in               ….43%   34.5%     62.…%
    with mean(|r|) = 9.28%    Interval size  18.02%    18.04%    18.06%
                              in              53.24%   34.69%    63.89%

(a) In order to compare the different models, we fixed a confidence interval size but allowed for different means in the predicted distribution. This table shows the percentage of measured log(RV) which were in the predicted confidence interval for the selected absolute returns above a certain threshold. The first column shows the average absolute return, mean(|r|), of those selected points. The confidence intervals are selected to be smaller than twice the average return of the selected points, here mean(|r|). This table shows the ability of our model to predict large moves, positive and negative, of any size.
    Model                                    HAR-RV    SGDLM     Full
    with mean(r) = 6.92%      Interval size  13.06%    13.11%    13.…%
                              in               ….30%   43.75%    62.…%
    with mean(r) = 8.48%      Interval size  16.63%    16.71%    16.…%
                              in               ….80%   43.30%    64.…%
    with mean(r) = 10.04%     Interval size  19.71%    19.71%    19.…%
                              in               ….14%   40.41%    63.95%

(b) In order to compare the different models, we fixed a confidence interval size but allowed for different means in the predicted distribution. This table shows the percentage of measured log(RV) which were in the predicted confidence interval for selected returns above a threshold. The selection corresponds to positive moves only. The first column shows the average return, mean(r), of those selected points. The confidence intervals are selected to be smaller than twice the average return of the selected points, here mean(r). This table highlights the ability of our model to predict large upward moves of any size.

7.4 Evolution of the coefficients

The algorithm was built to explain what is driving the time series by decomposing it into different economic factors. We assessed the accuracy of this decomposition in the previous section by measuring the quality of the direct one-step-ahead inference. Going into more detail, Figure 2 shows the evolution of the endogenous variables through time for the company Actividades de Construccion y Servicios SA (ACS), which we chose randomly out of the 487 stocks. At any time t this graphic shows which variable and frequency is mainly influencing the stock's behaviour. We did not include the corresponding graphic obtained with the HAR-RV model since, in that case, only the coefficient of the previous RV value was meaningful throughout the dataset.

Figure 2: This graphic shows the evolution of the endogenous coefficients for the company ACS. The solid lines correspond to the RV at different frequencies while the dashed ones are for the leverage-effect coefficients. At each time t this figure shows which variable is mainly driving the volatility. In addition, the sequential update of these coefficients allows us to observe on this figure how they evolve through time.

Figure 3 shows the evolution of the exogenous coefficients through time for the company ACS. The variables correspond to the parents, which are selected dynamically with the two-step process described in Section 4.2. Hence, as the coefficients evolve with the selected parents, the new ones start close to zero; this effect is responsible for the mean reversion observable on the graphic. As with the endogenous variables, at any time t this graphic shows which stock in the market is mainly influencing the move. An interesting aspect visible in Figure 3 is the decrease in activity of these coefficients between 2010 and 2016, after the financial crisis. This observation motivated us to study how the mean absolute values of these coefficients evolve through time.

Figure 3: This figure shows the evolution of the exogenous coefficients for the company ACS. At each time t the lines correspond to the values of the coefficients of the selected core-parent group variables. It shows at each time t which parent is mainly influencing the volatility of ACS. It also shows how the coefficients of the different exogenous variables evolve through time.

To understand how these patterns are related to the underlying time series we clustered the coefficients into three groups: a first group including all the RV variables, rv; a leverage group for the leverage-effect coefficients, r; and an exogenous group, core. These groups are constructed by summing the absolute values of the individual coefficients. We then sum the values of the individual groups to obtain a market view. The log-realised-variance, log(RV), of the market is modelled by an average over the individual ones.
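A minimal sketch of this grouping step, assuming the per-stock coefficient paths are available as arrays (the variable names and column layout below are illustrative, not the paper's actual data structures):

```python
import numpy as np

def coefficient_groups(coefs, rv_idx, lev_idx, core_idx):
    """Cluster one stock's DLM coefficients into the rv, leverage and core groups.

    coefs : array of shape (T, p) holding the coefficient paths over T days.
    *_idx : column indices of the RV, leverage-effect and core-parent variables.
    Returns three (T,) series of summed absolute coefficient values.
    """
    rv_group = np.abs(coefs[:, rv_idx]).sum(axis=1)      # endogenous cascade
    lev_group = np.abs(coefs[:, lev_idx]).sum(axis=1)    # leverage effect
    core_group = np.abs(coefs[:, core_idx]).sum(axis=1)  # exogenous parents
    return rv_group, lev_group, core_group

def market_view(per_stock_series):
    """Sum a list of per-stock (T,) group series into the market-level view
    plotted in Figure 4 (the market log(RV) itself is an equal-weighted average)."""
    return np.sum(per_stock_series, axis=0)
```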
Figure 4 shows the evolution of the different groups and the market on the same graphic. Even from this low-scale view we can observe correlations between the behaviour of these different variables; not only between the market and each group separately, but also between the market and the spread between the different groups.

Before commenting on Figure 4, it is interesting to think about what we should observe. We expect a stock to behave differently when its move is driven more by exogenous factors than endogenous ones, and vice versa. More interestingly, when the behaviour of a stock is changing, e.g. going from endogenously to exogenously driven, that should impact its variance. One interpretation is that when the variance of the stock is driven by endogenous factors, its behaviour is mainly influenced by internal information and thus less sensitive to the market. Hence, it should be more predictable, which should result in a lower variance. On the other hand, when the contribution of the exogenous factors increases, that may be correlated with an increasing influence of the market, and thus uncertainty about that company, which should result in a higher variance.

In Figure 3 we could observe a recent increase in activity of the exogenous variables from the end of 2016. In the market view of Figure 4 we can clearly see the spike of the exogenous group at the end of 2016, followed by a spike in the variance of the market. While Figure 4 is interesting for having a global view and defining the relations between the different groups and the market, the approach described in the next section will give a clearer view of the strength of this relation.

We want to leverage these correlations between the different groups and the underlying time series to make predictions. To do so we focus on predicting either an increase, a decrease or a neutral move. Since we previously observed that each group seems to be correlated to the time series at different frequencies, we used a simple discrete derivative approximation to detect a change of trend for each of those frequencies. The derivative is approximated by the difference between the value at t and the value at a lag, t − l. We only considered multiples of 5 days for the lag l. For example, for daily moves we used the data up to the year 2010 to find the time lag l for which the endogenous group was most correlated to log(RV); this turned out to be one month, l = 20. At each time t, we look at the sign of the difference between the value of the endogenous group at time t and t − 20 to infer whether the log(RV) will increase, decrease or stay neutral for the next day.

With this set-up we looked at the correlation between the moves of the groups and log(RV) at different frequencies: daily, two days ahead, weekly, bi-weekly and monthly, i.e. lags = {1, 2, 5, 10, 20}. We used this change-point approach to produce inferences on the variance at those different frequencies. Each signal produces a value in the set {−1, 0, 1}, corresponding respectively to a predicted decrease, constant level or increase in variance at the frequency under study. Hence, when we use a monthly lag, i.e. 20, we take one point of the dataset every 20 days and use the difference of the signals to predict the move for the next time point, i.e. 20 days later. In other words, we make a one-month-ahead prediction. Interestingly, each group produces its most accurate prediction at a different frequency. Since each signal evolves at a different frequency and is related to different economic information, they provide different insights into the behaviour of the underlying time series. We sum all of them to obtain the final inference. To reduce the signals' volatility we can add a threshold representing the confidence in the predicted value. For example, with a threshold of 2, the absolute sum of the signals must be higher than 2 for a position to be entered, i.e. more than 2 different signals need to agree on the same direction.

Using the previously described logic, we performed a backtest on the underlying log(RV) to demonstrate the quality of the prediction. In order to assess the performance for the whole environment we constructed an equally weighted portfolio and compared it to the market. The linearity observable in Figure 5 shows that the quality of the prediction is constant through time. Also, if we allow for only two states, be it short-only, {−1, 0}, or long-only, {0, 1}, they both perform as well; hence the prediction does not show any bias toward a certain move direction. Another interesting aspect resides in the prediction for different time scales. Running the change point for different time scales on the group of RV coefficients and computing the EW portfolio for each of them produces Figure 6. It represents the performance of the EW portfolio using signals created with the coefficients studied at the different lags (in days) {1, 2, 5, 10, 20}; e.g. a lag s = 20 corresponds to an update of the prediction every 20 days. The usual issue with long-term predictions is the potential size of the drawdowns due to the re-sampling frequency. While more volatile than the lower frequencies, the portfolio with monthly updates has small drawdowns.

Let us recall the question raised in the previous section: is the variance increasing when the spread between the exogenous and endogenous variables increases or decreases? Following this change-point set-up, Figure 7 shows the difference between the exogenous group and the endogenous one for the European stocks we are studying. For the different frequencies we looked at, {1, 2, 5, 10, 20}, the variance increases when the exogenous group becomes more important than the endogenous one, and vice versa. In other words, the behaviour of the variances is positively correlated with the spread of the exogenous minus endogenous groups. This corresponds to the results we expected, i.e. the more a variance is driven by external factors, the more uncertain its behaviour is and thus its variance increases, while if the endogenous coefficients are dominant, the stock moves due to information proper to itself, and thus in a more predictable manner, which decreases the variance. However, as we can see in Figure 7, the quality of that connection is not constant over time and depends on the market conditions.
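The change-point signal described above is simple enough to sketch directly; the snippet below builds the {−1, 0, +1} signals from one group series at several lags and combines them with a confidence threshold (function names and defaults are illustrative):

```python
import numpy as np

def change_point_signal(group, lag):
    """Sign of the discrete derivative of a coefficient-group series.

    group : (T,) array, e.g. the summed endogenous (rv) coefficients.
    lag   : look-back in days; the resulting call applies `lag` days ahead.
    Returns a (T,) array with values in {-1, 0, +1} (0 during the warm-up).
    """
    sig = np.zeros(len(group))
    sig[lag:] = np.sign(group[lag:] - group[:-lag])
    return sig

def combined_prediction(group, lags=(1, 2, 5, 10, 20), threshold=2):
    """Sum the per-frequency signals and keep only confident calls:
    +1 (predicted variance increase) or -1 (decrease) when more than
    `threshold` signals agree on the direction, otherwise 0."""
    total = sum(change_point_signal(group, lag) for lag in lags)
    return (np.sign(total) * (np.abs(total) > threshold)).astype(int)
```

The equally weighted portfolio used in the backtests is then obtained by applying each stock's resulting position to its log(RV) series and averaging across stocks.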
While the previous section focused on the direct inference of log(RV), this section discusses the application to market stress. The SGDLM model uses a multivariate Wishart distribution to compute a graph of the market and select the parents for each DLM. With this selection each DLM only has a few parents representing the stocks with the highest influence. We expect to see stocks that are driving a sector and others that are followers. In Figure 8 we plot the evolution of the matrix of parent relations through time. At each time t we counted the number of core-parent sets each stock is part of: the higher the number of sets, the darker the point, and vice versa. By doing so, we want to identify the stocks that are in many core sets and are thus influencing the environment. Indeed, by looking at this matrix we can identify a few stocks that have a clear influence on the market. Another way to see this would be to consider the market risk to be concentrated on a few stocks whose moves could have systemic impacts. We can also observe a clear change occurring after the financial crisis and after 2016.

Figure 8 is an interesting argument in favour of an asymmetric sparse covariance matrix. Indeed, with the classic multivariate approach modelling the cross-relationships with a symmetric covariance, this matrix would not exist. This asymmetry exists because the SGDLM decomposes the parent update into two phases: the first uses the multivariate Wishart to select the potential candidates from the precision matrix, and the second computes the variables' coefficients as a regression in each individual DLM to predict the move of the stock, selecting the parents according to their signal-to-noise ratio. For a stock to stay in the core group, which Figure 8 represents, it needs to pass both phases, i.e. have a high precision element in the multivariate Wishart, and a high signal-to-noise ratio once included in the up-set of the SGDLM and made part of the DLM's variables. Thus, the coefficients of the core-group variables represent the strength of the relationship between the stock and its selected parents, and the parents selected in the core group are the stocks with the highest signal-to-noise ratio.
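A sketch of how the influence matrix behind Figure 8 can be assembled, assuming the core-parent sets are stored per day as lists of stock indices (a hypothetical data layout, not the paper's implementation):

```python
import numpy as np

def parent_influence_matrix(core_sets_by_day, n_stocks):
    """Count, for each day, how many core-parent sets each stock belongs to.

    core_sets_by_day : list over days; each element is a list of core-parent
                       sets (one per DLM), each holding parent stock indices.
    Returns an (n_days, n_stocks) integer array; darker cells in Figure 8
    correspond to higher counts, i.e. stocks that influence many others.
    """
    counts = np.zeros((len(core_sets_by_day), n_stocks), dtype=int)
    for t, sets_t in enumerate(core_sets_by_day):
        for parent_set in sets_t:
            for stock in parent_set:
                counts[t, stock] += 1
    return counts
```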
In Corsi et al. (2008) the original HAR-RV was modified to model the vol-of-vol, or variance of the variance. This vol-of-vol represents the variance of the DLM and hence the confidence in its prediction and decomposition. If that variance increases, either it is due to a bad fit of the DLM or it represents an increase in uncertainty about the variance of that stock. The inferred variance of variance is on average much lower than the computed one, although it does spike during jumps of variance. In addition, the percentage of correct predictions shown in Table 2b when restricting the analysis to large moves is another measure of the quality of the inferred variance of variance.

Gruber and West (2016a) used the KL divergence as a market stress indicator and compared it to the St. Louis Fed Financial Stress Index (STLFSI). They observed that both metrics increased at the same time, to the point of considering their metric better than the STLFSI index since it reacted faster. It turns out this index is quite similar to the computed estimate of the market log(RV). Thus, we used the change-point algorithm on the different group variables described previously to produce an inference of this index. As explained previously, we can add a threshold on the value of the sum of the signals to show how that impacts the EW portfolio; Figure 9 uses a threshold of 4, which means the absolute sum of the signals must be higher than 4 for a position to be taken. Enforcing such a high threshold diminishes the number of trades, but Figure 9 shows the quality of the prediction after 2007. It is not clear why the prediction stagnates between 2004 and 2007; it could be due to the low variability of the STLFSI itself.

With exactly the same set-up as in Section 7 for EU stocks, i.e. without OHLC data, we ran the algorithm on US stocks from the S&P500. While we used different stock exchanges and countries to select the most liquid European stocks, for the US market we use a different methodology: we select the stocks currently present in the S&P500 index, but only the 376 stocks that have a history since 2000.

Because all the stocks in this environment belong to the same stock index, we expect to see more connections between the time series in the index, and thus the SGDLM model to perform much better than in the European environment, which mixed many stock exchanges and countries. In terms of metrics, the direct one-step-ahead prediction does not perform as well as for the European environment, which is probably due to the lower amount of information contained in the HAR variables. We followed the same methodology as described in Section 7.3 for European stocks. As for the European stocks, in the case of large moves our model correctly predicted more than 60.0% of the moves, positive or negative. It correctly predicted 65.93% of moves bigger than 11.25% and 63.83% of the increases in variance larger than 20.27%, compared with only ….86% accuracy previously; while the SGDLM performance is close to our model for small moves, the gap increases proportionally with the size of the moves considered. Table 4a shows the percentage metric for moves with an absolute return larger than a certain threshold, and Table 4b the percentage of correct predictions when focusing on volatility increases alone.

Table 3: Performance metrics of the one-day-ahead inference obtained for stocks from the S&P500.

                        Model t−1   HAR-RV   SGDLM   Full
    Median ADV            0.011      0.014    0.018   0.0…
    Median RMSE           0.040      0.041    0.047   0.0…
    M-Z R²                0.991      0.991    0.988   0.9…
    M-Z coefficient       0.995      1.007    0.…      …

(a) One-day-ahead inference comparison between the different models. Median ADV corresponds to the median absolute deviation. Median RMSE is the median of the Root Mean Squared Errors. The M-Z coefficient corresponds to the Mincer-Zarnowitz regression coefficient, i.e. the regression of the observed values on the predicted ones.

Figure 10 shows the evolution of the different groups rescaled to be on the same graph and compared to the averaged log(RV) of the S&P500. Recall that on EU data, as shown in Figure 4, the evolution of the different groups seemed to be positively correlated to the variance of the environment. On the US data this correlation seems just as strong, as we can see in Figure 10. We also perform the same backtesting strategy using the signals from the change-point algorithm and show in Figure 11 the EW portfolio versus the log(RV) of the market. As previously, the performance is constant through time, even during the financial crisis, highlighting further the robustness of our model.

Table 4: Performance metrics with confidence interval of the one-day-ahead inference obtained for 390 stocks from the S&P500.
    Model                                     HAR-RV    SGDLM     Full
    without threshold          Interval size    4.37%     4.37%     4.…%
                               in                ….02%   62.47%    62.…%
    with mean(|r|) = 5.74%     Interval size   10.18%    10.16%    10.…%
                               in                ….72%   60.32%    61.…%
    with mean(|r|) = 11.25%    Interval size   22.07%    22.00%    22.…%
                               in                ….21%   64.56%    65.93%
    with mean(|r|) = 19.96%    Interval size   39.35%    39.34%    39.40%
                               in               50.23%   61.00%    63.…%

(a) In order to compare the different models, we fixed a confidence interval size but allowed for different means in the predicted distribution. This table shows the percentage of measured log(RV) which were in the predicted confidence interval for the selected absolute returns above a certain threshold. The first column shows the average absolute return, mean(|r|), of those selected points. The confidence intervals are selected to be smaller than twice the average return of the selected points, here mean(|r|). This table shows the ability of our model to predict large moves, positive and negative, of any size.
    Model                                    HAR-RV    SGDLM     Full
    with mean(r) = 6.97%      Interval size  12.18%    12.…%     12.…%
                              in               ….86%   59.80%    60.…%
    with mean(r) = 11.92%     Interval size  22.02%    22.05%    22.…%
                              in               ….07%   61.08%    62.…%
    with mean(r) = 20.27%     Interval size  40.02%    40.08%    40.…%
                              in               ….13%   61.96%    63.83%

(b) In order to compare the different models, we fixed a confidence interval size but allowed for different means in the predicted distribution. This table shows the percentage of measured log(RV) which were in the predicted confidence interval for selected returns above a threshold. The selection corresponds to positive moves only. The first column shows the average return, mean(r), of those selected points. The confidence intervals are selected to be smaller than twice the average return of the selected points, here mean(r). This table highlights the ability of our model to predict large upward moves of any size.
As previously, we wanted to answer the question: is the variance increasing when the spread between the exogenous and endogenous variables increases or decreases? To do so we computed the EW portfolio from the signal built for the different frequencies {1, 2, 5, 10, 20} (in days) using the difference between the exogenous and endogenous groups. Figure 12 shows the results for those different frequencies. As in the EU market, the variance is positively correlated to the difference between these two groups, and the link seems even stronger than in the EU market since the backtest is more linear and smoother than in Figure 7. Naturally, at the lower monthly frequency the connection is not as robust as for the higher ones.

In the previous sections we focused on predicting the log-realised-variance of US and EU stocks without using OHLC data; in this section we show the results on different environments of futures, with OHLC data. It is important to study the scale of the different input variables when applying this algorithm to different datasets: to guarantee the numerical stability of the computation, the variables should have similar scales. It is also important to modify the number of allowed parents to retain sparsity relative to the size of the new environment.

While we previously worked with the variance, we can adapt the framework to work directly with prices. Following the logic of the information cascade, we replace the different realised-variance variables by the mean of the prices at the same frequencies: daily, weekly and monthly. In addition, instead of incorporating the previous squared returns, we use an exponentially weighted average of the returns over the same frequencies. On the other hand, the leverage-effect coefficients are not modified. Thus we keep the decomposition into groups with different and complementary economic information.

In order to demonstrate the robustness of our approach we now apply the model to ETF and FX futures. We dispose of 75 futures on ETFs and 22 on FX rates. Compared to the previous examples, we only had to modify the number of allowed parents to adapt to the smaller dataset: we chose a core-set size of 5 for both the ETF and FX environments, but an up-set size of 5 for the ETFs and 3 for the FXs.

ETF and FX futures behave differently from stocks but still follow this interplay of endogenous and exogenous information; hence the model is still able to capture those different connections and create meaningful signals. To show the flexibility and robustness of our proposed model we ran the algorithm on weekly data instead of daily data, i.e. we selected one point every five days and considered this new dataset as the input for the algorithm. Thus the one-step-ahead inference now corresponds to one week ahead. We used the EW portfolio to assess the performance on these datasets. The combination of signals built from endogenous and exogenous groups at different frequencies produces a portfolio with low volatility, especially during large downward moves, as can be seen in Figure 13 for the ETFs and Figure 14 for the FX futures. As we used weekly data for this example, the median length of trades was 2 weeks for both environments.
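As an illustration of the price-based variant just described, the following sketch builds the endogenous regressors from a daily price series: rolling means of the price over daily, weekly and monthly horizons, and exponentially weighted averages of the returns over the same horizons. The column names, the EWMA span and the daily leverage term are assumptions made for the example, not specifications from the paper.

```python
import pandas as pd

def price_based_features(prices: pd.Series) -> pd.DataFrame:
    """Endogenous regressors for the price-based variant of the model.

    prices : daily close (or settlement) prices indexed by date.
    Horizons follow the information cascade: 1, 5 and 20 trading days.
    """
    returns = prices.pct_change()
    feats = {}
    for label, horizon in {"daily": 1, "weekly": 5, "monthly": 20}.items():
        # mean price over the horizon replaces the realised-variance term
        feats[f"price_{label}"] = prices.rolling(horizon).mean()
        # EWMA of returns over the horizon replaces the squared returns
        feats[f"ret_ewma_{label}"] = returns.ewm(span=horizon).mean()
    # leverage-effect term kept as in the variance model (negative returns)
    feats["leverage_daily"] = returns.clip(upper=0.0)
    return pd.DataFrame(feats)

def weekly_subsample(df: pd.DataFrame) -> pd.DataFrame:
    """Keep one observation every five trading days, as used for the ETF and
    FX futures experiments, so that one step ahead means one week ahead."""
    return df.iloc[::5]
```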
9 Conclusion

In this paper we proposed a new model that gives a better understanding of the underlying causes of price fluctuations and of what they reveal about a specific asset and the market as a whole. In particular, we proposed a heterogeneous autoregressive, time-varying, simultaneous graphical multivariate dynamic linear model, H-SGDLM. This model can efficiently handle large-scale environments, including the cross-series relationships. It is easy to fit, does not assume stationarity of the underlying time series, and updates itself sequentially. The proposed approach decomposes the price movement into different known, economically-meaningful underlying variables to produce a better understanding of the behaviour of the collection of time series; it can explain at any time t which ones are likely to be driving the moves, for example at which trading frequency and whether changes are due to endogenous or exogenous factors.

We assessed the quality of our decomposition by looking at the forecasting performance of our model. We showed that it outperformed previous models in predicting large moves, positive and negative alike. In addition, the quality of these predictions is relatively constant through time, across different markets and for multiple asset types. This decomposition creates new insights into the behaviour of a specific asset and the rest of the market. In order to leverage those new signals, we clustered them into endogenous and exogenous groups. By studying the link between those groups and the underlying time series we could obtain accurate long-term predictions that worked well both for individual assets and for the market as a whole. This allowed us to produce more than one-month-ahead inferences regarding the evolution of the variance of the asset and the market. Furthermore, these signals were shown to be reliable enough to predict the STLFSI stress index while having learned only on a few hundred stocks.

In this paper we focused on using the signals from the endogenous and exogenous groups, although our multivariate model can also produce many other interesting outputs, such as a simultaneous directed cross-series graph illustrating relationships between assets. This provides many interesting directions for future research using our proposed model.

SUPPLEMENTARY MATERIAL
10 SGDLM equations
Each posterior follows a Normal-Gamma distribution, $p(\theta_{j,t}, \lambda_{j,t} \mid \mathcal{D}_t) \sim NG(m_{j,t}, C_{j,t}, n_{j,t}, s_{j,t})$, i.e.
$$\theta_{j,t} \mid \lambda_{j,t}, \mathcal{D}_t \sim N\big(m_{j,t},\, C_{j,t}/(s_{j,t}\lambda_{j,t})\big), \qquad \lambda_{j,t} \mid \mathcal{D}_t \sim G\big(n_{j,t}/2,\, n_{j,t}s_{j,t}/2\big).$$

Given the observed data $y_{j,t}$ at time $t$, the parameters follow the classic Kalman filter update:
$$\begin{aligned}
e_{j,t} &= y_{j,t} - F_{j,t}^T a_{j,t}, & q_{j,t} &= s_{j,t-1} + F_{j,t}^T R_{j,t} F_{j,t},\\
A_{j,t} &= R_{j,t} F_{j,t} / q_{j,t}, & z_{j,t} &= \big(r_{j,t} + e_{j,t}^2/q_{j,t}\big)/(r_{j,t}+1),\\
m_{j,t} &= a_{j,t} + A_{j,t} e_{j,t}, & C_{j,t} &= \big(R_{j,t} - A_{j,t}A_{j,t}^T q_{j,t}\big)\, z_{j,t},\\
n_{j,t} &= r_{j,t} + 1, & s_{j,t} &= z_{j,t}\, s_{j,t-1}.
\end{aligned}$$

The decoupling step approximates the multivariate posterior distribution by a product of Normal-Gamma distributions $p(\theta_{j,t}, \lambda_{j,t}\mid\mathcal{D}_t) \sim NG(m_{j,t}, C_{j,t}, n_{j,t}, s_{j,t})$ whose parameters are obtained by minimising the Kullback-Leibler divergence:
$$\begin{aligned}
m_{j,t} &= E[\lambda_{j,t}\theta_{j,t}]/E[\lambda_{j,t}],\\
V_{j,t} &= E\big[\lambda_{j,t}(\theta_{j,t}-m_{j,t})(\theta_{j,t}-m_{j,t})^T\big],\\
d_{j,t} &= E\big[\lambda_{j,t}(\theta_{j,t}-m_{j,t})^T V_{j,t}^{-1}(\theta_{j,t}-m_{j,t})\big],\\
s_{j,t} &= (n_{j,t}+p_{j,t}-d_{j,t})/(n_{j,t}E[\lambda_{j,t}]),\\
C_{j,t} &= s_{j,t} V_{j,t}.
\end{aligned}$$
$n_{j,t}$ is updated through an optimisation step solving
$$\log(n_{j,t}+p_{j,t}-d_{j,t}) - \psi(n_{j,t}/2) - (p_{j,t}-d_{j,t})/n_{j,t} - \log(2E[\lambda_{j,t}]) + E[\log(\lambda_{j,t})] = 0.$$

For the evolution from $t$ to $t+1$ the states $\theta_{j,t}$ follow a random walk, $\theta_{j,t+1} = G_{j,t+1}\theta_{j,t} + \omega_{j,t+1}$. The coefficient matrix $G_{j,t+1}$ is updated with the parents' down-set values following the weights $(\Delta T + 1 - l)^{-1}$ for $l = 1{:}\Delta T$, and $m_{j,t}$ is updated to include a value for the new members of the up-set. With $B_{j,t+1} = G_{j,t+1} C_{j,t} G_{j,t+1}^T$, the covariance matrix follows the block discounting update
$$W_{j,t+1} = \begin{pmatrix} \big(\delta_{j,\phi}^{-1}-1\big)\, B_{j,t+1}[{:}n_E,\, {:}n_E] & \big((\delta_{j,\phi}\delta_{j,\gamma})^{-1/2}-1\big)\, B_{j,t+1}[{:}n_E,\, n_E{:}]\\ 0 & \big(\delta_{j,\gamma}^{-1}-1\big)\, B_{j,t+1}[n_E{:},\, n_E{:}] \end{pmatrix},$$
where $n_E$ corresponds to the number of external variables (as opposed to parent ones), $\delta_\phi$ is the discount factor of the external variables and $\delta_\gamma$ the parental one. The filter parameters follow
$$a_{j,t+1} = G_{j,t+1} m_{j,t}, \qquad R_{j,t+1} = B_{j,t+1} + W_{j,t+1}, \qquad r_{j,t+1} = \beta_j n_{j,t},$$
where the last equation represents the beta-stochastic-volatility model. The prior for time $t+1$ follows $p(\theta_{j,t+1}, \lambda_{j,t+1}\mid\mathcal{D}_t) \sim NG(a_{j,t+1}, R_{j,t+1}, r_{j,t+1}, s_{j,t})$, from which samples are used to compute $\Gamma_{t+1} = [\gamma_{j,t+1}]_{\forall j}$, $\Lambda_{t+1} = \mathrm{diag}(\lambda_{1,t+1}, \ldots, \lambda_{N,t+1})$ and $\mu_{t+1} = [x_{j,t}^T\phi_{j,t+1}]_{\forall j}$; recall $\theta_{j,t} = (\phi_{j,t}, \gamma_{j,t})^T$. Hence we obtain the mean and covariance matrix used to sample $y_{t+1}$:
$$A_{t+1} = (I - \Gamma_{t+1})^{-1}, \qquad \Sigma_{t+1} = \big((I-\Gamma_{t+1})^T \Lambda_{t+1} (I-\Gamma_{t+1})\big)^{-1}, \qquad y_{t+1} \sim N(A_{t+1}\mu_{t+1}, \Sigma_{t+1}).$$
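For completeness, here is a small numerical sketch of the univariate filtering step above, written in plain numpy; array shapes are simplified and the decouple/recouple and discounting machinery surrounding this step is omitted.

```python
import numpy as np

def dlm_filter_step(y, F, a, R, r, s_prev):
    """One conjugate Normal-Gamma update for a single DLM j at time t.

    y      : scalar observation (daily log(RV) of series j).
    F      : (p,) regression vector (HAR terms, leverage terms, parent values).
    a, R   : prior mean (p,) and covariance (p, p) of the state.
    r      : prior degrees of freedom; s_prev : previous variance estimate.
    Returns the posterior quadruple (m, C, n, s).
    """
    e = y - F @ a                        # one-step forecast error
    q = s_prev + F @ R @ F               # one-step forecast variance
    A = R @ F / q                        # adaptive (Kalman gain) vector
    z = (r + e ** 2 / q) / (r + 1.0)     # variance-learning correction
    m = a + A * e                        # posterior state mean
    C = (R - np.outer(A, A) * q) * z     # posterior state covariance
    n = r + 1.0                          # updated degrees of freedom
    s = z * s_prev                       # updated observation-variance estimate
    return m, C, n, s
```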
References
Anderson, T., Bollerslev, T., Diebold, F., and Labys, P. (2000), "Great realisations," Risk.

Borland, L. (1998), "Microscopic dynamics of the nonlinear Fokker-Planck equation: A phenomenological model," Physical Review, 57.

— (2002), "A theory of non-Gaussian option pricing," Quantitative Finance.

— (2004), "A multi-time scale non-Gaussian model of stock returns," arXiv.org, URL http://arxiv.org/abs/cond-mat/0412526v3.

Borland, L. and Bouchaud, J.-P. (2005), "On a multi-timescale statistical feedback model for volatility fluctuations," arXiv.org, URL http://arxiv.org/abs/physics/0507073v1.

Borland, L., Bouchaud, J.-P., Muzy, J.-F., and Zumbach, G. (2005), "The Dynamics of Financial Markets – Mandelbrot's multifractal cascades, and beyond," arXiv.org, URL https://arxiv.org/abs/cond-mat/0501292.

Calvet, L. E. and Fisher, A. J. (2001), "Forecasting multifractal volatility," Journal of Econometrics, 105, 27–58, URL http://linkinghub.elsevier.com/retrieve/pii/S0304407601000690.

— (2004), "How to Forecast Long-Run Volatility: Regime Switching and the Estimation of Multifractal Processes," Journal of Financial Econometrics, 2, 49–83, URL http://jfec.oxfordjournals.org/content/2/1/49.short.

Corsi, F. (2004), "A Simple Long Memory Model of Realized Volatility," SSRN Electronic Journal.

Corsi, F., Mittnik, S., Pigorsch, C., and Pigorsch, U. (2008), "The Volatility of Realized Volatility," Econometric Reviews, 27, 46–78.

Corsi, F. and Reno, R. (2009), "HAR volatility modelling with heterogeneous leverage and jumps," Available at SSRN 1316953, URL http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.435.4217&rep=rep1&type=pdf.

Dacorogna, M. M., Müller, U. A., Pictet, O. V., and Olsen, R. B. (1997), "Modelling Short-Term Volatility with GARCH and HARCH Models," SSRN Electronic Journal.

Diebold, F. X., Hickman, A., Inoue, A., and Schuermann, T. (1997), Converting 1-day volatility to h-day volatility: Scaling by sqrt(h) is worse than you think.

Gruber, L. F. and West, M. (2016a), "Bayesian forecasting and scalable multivariate volatility analysis using simultaneous graphical dynamic models," arXiv.org, URL http://arxiv.org/abs/1606.08291v1.

— (2016b), "GPU-Accelerated Bayesian Learning and Forecasting in Simultaneous Graphical Dynamic Linear Models," Bayesian Analysis, 11, 125–149, URL http://projecteuclid.org/euclid.ba/1425304898.

Hull, J. C. (2017), "The Greek Letters," in Options, Futures, and Other Derivatives, Pearson Education Limited, pp. 421–452, URL https://ebookcentral.proquest.com/lib/imperial/detail.action?docID=5186416.

Johannes, M., Polson, N., and Stroud, J. (2005), "Sequential parameter estimation in stochastic volatility models with jumps," columbia.edu.

LeBaron, B. (2001), "Stochastic volatility as a simple generator of apparent financial power laws and long memory," Quantitative Finance, 1, 621–631.

Lynch, P. E. and Zumbach, G. (2003), "Market heterogeneities and the causal structure of volatility," Quantitative Finance, 3, 320–331.

McAlinn, K. and West, M. (2016), "Dynamic Bayesian Predictive Synthesis in Time Series Forecasting," arXiv.org, URL http://arxiv.org/abs/1601.07463v3.

Müller, U. A., Dacorogna, M. M., Davé, R. D., Olsen, R. B., Pictet, O. V., and von Weizsäcker, J. E. (1997), "Volatilities of different time resolutions — Analyzing the dynamics of market components," Journal of Empirical Finance, 4, 213–239, URL http://linkinghub.elsevier.com/retrieve/pii/S0927539897000078.

Muzy, J.-F., Delour, J., and Bacry, E. (2000), "Modelling fluctuations of financial time series: from cascade process to stochastic volatility model," arXiv.org, 537–548, URL http://link.springer.com/10.1007/s100510070131.

Nakajima, J. (2014), "Bayesian Analysis of Multivariate Stochastic Volatility with Skew Return Distribution," Econometric Reviews, 8, 1–23.

Noureldin, D., Shephard, N., and Sheppard, K. (2011), "Multivariate high-frequency-based volatility (HEAVY) models," Journal of Applied Econometrics, 27, 907–933, URL http://doi.wiley.com/10.1002/jae.1260.

Prado, R. and West, M. (2010), Time Series: Modeling, Computation, and Inference, Chapman & Hall/CRC Press.

Zhao, Z. Y., Xie, M., and West, M. (2016), "Dynamic dependence networks: Financial time series forecasting and portfolio decisions," Applied Stochastic Models in Business and Industry, 32, 311–332, URL http://doi.wiley.com/10.1002/asmb.2161.

Zumbach, G. and Lynch, P. (2001), "Heterogeneous volatility cascade in financial markets,"
Physica A: Statistical Mechanics and its Applications, 298, 521–529, URL http://linkinghub.elsevier.com/retrieve/pii/S0378437101002497.

Figure 4: Plots computed with data from 500 European stocks. The graphic shows the evolution of the different groups (endogenous, with RV and R, and exogenous, denoted Core) rescaled to share the same y-axis, shown on the left of the figure. The right axis corresponds to the approximated log(RV) of the market, i.e. an equal-weighted average of the individual log(RV). This graphic highlights the connections between the behaviour of the market, represented here by log(RV), and the different groups: RV, R and core. For example, looking at the volatility jump of the 2008 financial crisis, the RV and core groups seem positively correlated with the market while the leverage group seems negatively correlated. What this graphic cannot show is whether these connections have any predictive power over the volatility of the market or whether they simply react to it with a time lag; Figure 5 shows this predictive relation.

Figure 5: Plots computed with data from 500 European stocks. The orange line shows the performance of an EW portfolio built from the one-day-ahead inference. The green line corresponds to the performance of the EW portfolio using the signals from the change-point algorithm applied to the different groups. The blue line shows the performance of the EW portfolio built by combining the one-day-ahead direct inference with the signals from the change points at different frequencies. The simulated portfolios correspond to equal-weighted portfolios over the individual time series. In order to highlight the constancy through time of the quality of the inference, we added, in red and on the right y-axis, the evolution of the underlying log(RV) of the market.

Figure 6: Plots computed with data from 500 European stocks. This figure plots the performance of the EW portfolio computed with the signals from the group of external coefficients (core group) created with the change points at different frequencies. This graphic shows that, while the portfolios using signals updated at lower frequencies than daily are more volatile, they still have a linear performance through time and small drawdowns.

Figure 7: Plots computed with data from 500 European stocks. The figure shows the performance of the EW portfolio computed with the signals from the difference between the external and the RV groups using the change-point algorithm at different frequencies. Hence those plots show the positive correlation between the difference (exogenous coefficients − endogenous coefficients) and the evolution of the volatility. In other words, when the volatility becomes more driven by exogenous factors than endogenous ones, the volatility increases, and vice versa.

Figure 8: Plots computed with data from 500 European stocks. The y-axis corresponds to the index of each of the 487 stocks while time is on the x-axis. This graphic is a two-dimensional representation of the evolution of the matrix of core-parent sets through time. For each stock, at each time t, the darker the point, the more core-parent sets it is part of. Hence, with this graphic we can observe at each time t which stocks are influencing many others (the ones with dark points), while the stocks in white are followers, i.e. they are not influencing the move of any other stock in this environment.

Figure 9: Plots computed with 390 stocks from the S&P500.
The left axis is linked to the blue line, which represents the performance of the portfolio computed with the signals from the change-point algorithm using the evolution of the different groups to predict the evolution of the STLFSI index. In red, and linked to the right axis, is the STLFSI stress index re-based to 100 at the start of the backtest. This graphic shows the quality of the prediction of the stress index using the signals obtained from the different groups of 390 stocks from the S&P500.

Figure 10: Plots computed with stocks from the S&P500. The graphic shows the evolution of the different groups (endogenous, with RV and R, and exogenous, denoted Core) rescaled to share the same y-axis, shown on the left of the figure. The right axis corresponds to the approximated log(RV) of the market, i.e. an equal-weighted average of the individual log(RV). This graphic highlights the connections between the behaviour of the market, represented here by log(RV), and the different groups: RV, R and core. For example, looking at the volatility jump of the 2008 financial crisis, the RV and core groups seem positively correlated with the market while the leverage group seems negatively correlated. What this graphic cannot show is whether these connections have any predictive power over the volatility of the market or whether they simply react to it with a time lag; Figure 11 shows this predictive capacity.

Figure 11: Plots computed with stocks from the S&P500. The orange line shows the performance of an EW portfolio built from the one-day-ahead inference. The green line corresponds to the performance of the EW portfolio using the signals from the change-point algorithm applied to the different groups. The blue line shows the performance of the EW portfolio built by combining the one-day-ahead direct inference with the signals from the change points at different frequencies. The simulated portfolios correspond to equal-weighted portfolios over the individual time series. To highlight the consistency through time of the quality of the inference, we added, in red and on the right y-axis, the evolution of the underlying log(RV) of the market.

Figure 12: Plots computed with stocks from the S&P500. The figure shows the performance of the EW portfolio computed with the signals from the difference between the external and the RV groups using the change-point algorithm at different frequencies. Hence those plots show the positive correlation between the difference (exogenous coefficients − endogenous coefficients) and the evolution of the volatility.