Dynamic Ordering Learning in Multivariate Forecasting*

Bruno P. C. Levy†
Insper

Hedibert F. Lopes‡
Insper
Working Paper - January, 2021
Abstract
In many fields where the main goal is to produce sequential forecasts for decision-making problems, a good understanding of the contemporaneous relations among different series is crucial for the estimation of the covariance matrix. In recent years, the modified Cholesky decomposition appeared as a popular approach to covariance matrix estimation. However, its main drawback lies in the imposition of the series ordering structure. In this work, we propose a highly flexible and fast method to deal with the problem of ordering uncertainty in a dynamic fashion, with the use of Dynamic Order Probabilities. We apply the proposed method in two different forecasting contexts. The first is a dynamic portfolio allocation problem, where the investor is able to learn the contemporaneous relationships among different currencies, improving final decisions and economic performance. The second is a macroeconomic application, where the econometrician can adapt sequentially to new economic environments, switching the contemporaneous relations among macroeconomic variables over time.
Keywords: Bayesian learning; Ordering Uncertainty; Dynamic Portfolio Allocation; Exchange Rate Predictability; Macroeconomic Forecasting.
J.E.L. codes: C11, C52, G11, G17, F31.

* We thank all participants at Insper Seminars for useful comments. All remaining errors are our own responsibility.
[email protected]
[email protected]
Introduction
Model uncertainty is a well-known challenge among applied researchers and industry practitioners interested in producing forecasts for decision-making problems. Such applications range from forecasting macroeconomic series (GDP, inflation, unemployment, interest rates, etc.) to commercial sales, financial series for portfolio decisions and many others. Regardless of the field of interest, the decision maker is often uncertain about which model specifications will produce higher forecasting performance, refining final decisions. Common uncertainties are not just about the best predictors to choose; they are also related to the dynamics of the coefficients, especially in economics and finance, where parameters can change in different manners over time, with a higher or lower degree depending on the environment of the economy (or even be virtually constant).

In the last decades, the Bayesian literature has addressed the question of model uncertainty with great success. Bayesian Model Averaging (BMA) and Bayesian Model Selection (BMS) are well-known methodologies for static models when there is uncertainty about the predictors to include (Madigan and Raftery, 1994 and Hoeting et al., 1999). More recently, Raftery, Kárný, and Ettler (2010) proposed the dynamic versions of BMA and BMS, called Dynamic Model Averaging (DMA) and Dynamic Model Selection (DMS). This framework entertains the idea that the relevant model can change over time. For instance, a subset of the relevant predictors during the 1980s might become irrelevant during the 1990s, during the Great Recession or even during the current Covid-19 crisis. This is a challenging task, and even a simple forecasting problem can become computationally infeasible; for instance, a researcher choosing among p predictors would have to consider $2^p$ different models at each time period. Raftery et al.
(2010) suggest the use of reasonable approximations, borrowing ideas from discount (forgetting) methods, avoiding the simulation of transition probability matrices and maintaining the conjugate form of posterior distributions. This allows analytical solutions for forward filtering and forecasting, which significantly reduces the computational burden of the process.

After Raftery et al. (2010), several papers applied DMA and DMS methods in many areas. To name a few, Koop and Korobilis (2012) were among the first to use these ideas in a macroeconomic context, forecasting US inflation. Dangl and Halling (2012) forecast the S&P
500 index using traditional financial predictors, finding that time-varying parameter models are preferable to regressions with constant coefficients. More recently, Catania, Grassi, and Ravazzolo (2019) applied those methods to forecast different cryptocurrencies, and Levy and Lopes (2020) show statistical and economic improvements for investors that learn the best look-back period in the time-series momentum context to forecast the S&P
500 index.

The literature has also produced examples in the multivariate context using dynamic model probabilities. Such examples appeared in Koop and Korobilis (2013), where the authors propose a method to sequentially learn the best dimension of time-varying parameter VARs. Koop and Korobilis (2014) use dynamic model probabilities in the construction of a financial conditions index, with time-varying weights for each financial variable in the index. Recently, Beckmann et al. (2020) show statistical and economic improvements for a portfolio allocation problem using TVP-VARs. What is common to the papers on the multivariate case is that they rely on the inverse Wishart distribution for the volatility matrix. This Wishart Dynamic Linear Model (W-DLM) approach is well documented in the Bayesian literature (see West and Harrison, 1997). However, the W-DLM comes with some constraints. It not only imposes that all equations in the system share the same predictors, but also ties together the behavior of volatilities and co-volatilities, since they are jointly modeled, preventing specific customizations.

In the last couple of years, what became known as the "Decouple/Recouple" concept, which is built on recursive systems, began to flourish in the statistics and econometrics literature. The basic idea is to decouple the multivariate dynamic model into several univariate customized DLMs and then recouple them for forecasting and decisions. It is strictly related to the popular Cholesky-style multivariate stochastic volatility of Lopes et al. (2018), Shirota, Omori, Lopes, and Piao (2017) and Primiceri (2005). Using similar ideas and extending the multi-regression dynamic models applied in Queen, Wright, and Albers (2008) and Costa et al. (2015), the work of Zhao, Xie, and West (2016) introduced the concept of Dynamic Dependency Network Models (DDNM), allowing the use of customized additional predictors and enabling sequential analysis of decoupled univariate DLMs, which are then recoupled for forecasting.
This type of approach has been discussed in West (2020) and applied in a similar fashion in Lavine et al. (2020) and Fisher et al. (2020). The huge advantage of the DDNM compared to the W-DLM approach is its flexibility to feature its own set of predictors for each equation, enabling volatilities and co-volatilities to vary with different degrees over time. Also, as in the W-DLM, the DDNM yields closed-form solutions for posterior distributions and predictive densities. Hence, there is no need for expensive simulation methods, which makes the process much faster.

Even though the DDNM introduces flexibility in the process, it imposes a specific order of the series in the model (more details will be discussed in Section 2). In fact, all models within the Cholesky-style framework depend on the order structure selected by the researcher, leading to different contemporaneous relations among series. In our work, we argue that this imposition can change the covariance matrix estimation and substantially modify final decisions. Therefore, the researcher will in general face the problem of order uncertainty, where the "correct" order structure and contemporaneous relations among variables are unknown.

The problem of order uncertainty was recognized by the literature (Primiceri, 2005, Zhao et al., 2016 and Lopes, McCulloch, and Tsay, 2018), but was not fully solved. Possible paths have been applied, such as proposing a few different orders and comparing the results among them, or not imposing any Cholesky-style structure and using more approximations and expensive simulations to obtain predictive densities (Gruber and West, 2016).
Recent studies also consider the issue in the static formulation (Zheng et al., 2017 or Kang, Xie, and Wang, 2020); however, they do not take into account the fact that the contemporaneous relations among time series may evolve over time, i.e., there is a single order structure for all periods of time and parameters are considered constant throughout.

We agree that, for some specific cases, the researcher may know in advance the contemporaneous relations among time series. In macroeconomics, for example, the researcher is able to build the order structure borrowing ideas from theory well developed in the literature. However, in many other cases, the econometrician faces a problem in which it is not totally clear how the data are organized. Moreover, we argue here that this structure is unstable over time. Hence, a common structure in the 1980s can differ from the structure during the Great Recession, and the latter can differ from the structure nowadays. Therefore, the contemporaneous relations among different variables may change over time, being stronger, weaker or nonexistent depending on the economic environment. Considering the absence of studies dealing with the problem of ordering uncertainty in a dynamic fashion, we propose the
Dynamic Order Learning (DOL) approach. Our DOL scheme is a fast and flexible method to deal with the uncertainty about the contemporaneous relations among dependent variables and predictors in an online learning environment. By using the same structure as the DDNM and following a path similar to Raftery et al. (2010) and Koop and Korobilis (2013) to deal with model uncertainty, predictive densities have closed-form solutions, therefore avoiding the use of MCMC schemes and substantially reducing the computational burden of the estimation and forecasting processes.

We propose a dynamic method to deal with the uncertainty around series ordering and different contemporaneous dependencies across series. We show in a dynamic portfolio study with exchange rates that our DOL approach generates superior statistical and economic performance for a mean-variance investor who uses the predictive information to dynamically rebalance her portfolio. That is, with DOL the investor is able to sequentially learn the dynamic contemporaneous relations among different currencies and improve the predicted means and covariances of returns over time. We show that the investor will be willing to pay a considerable fee to switch from the traditional Wishart random-walk method and fixed orders over time to our DOL approach.

Finally, we highlight that DOL can be applied in any field where the main goal is to produce sequential forecasts in decision-making problems, such as in central banks, financial institutions, commercial industries and others. This motivates our second application.
We consider the problem of forecasting a set of important macroeconomic series commonly used in the literature: inflation, unemployment and interest rates. We show that the econometrician who learns the changes in the contemporaneous relations among different economic variables is able to improve point and density forecasting compared to the econometrician who considers a single order structure for all periods of time. This provides evidence that the environment of the economy is continuously changing, raising the importance for forecasters and decision makers of incorporating this dynamic behavior in their econometric models.

The remainder of the paper is organized as follows. The general econometric framework is introduced in Section 2, together with a brief discussion of the Cholesky-style approach. Section 2.2 details the DOL approach and how it can be applied to decision-making. In Section 4 we describe statistical and economic forecasting evaluation metrics. In Section 5 we show the results for both econometric applications and Section 6 concludes.
As previously mentioned, the DDNM framework of Zhao et al. (2016) is able to model cross-sectional contemporaneous relationships and customize univariate DLMs. To make it clear, consider $y_t$ an $m$-dimensional vector of (financial) time series $y_{jt}$ and consider the following dynamic system:

$$(I_m - \Gamma_t)\, y_t = \begin{pmatrix} x_{1,t-1}' \beta_{1t} \\ \vdots \\ x_{m,t-1}' \beta_{mt} \end{pmatrix} + \nu_t, \qquad \nu_t \mid \Omega_t \sim N(0, \Omega_t), \qquad (1)$$

where $x_{j,t-1}$ is a $p$-dimensional vector of time series $j$'s lagged predictors, $\beta_{jt}$ are time-varying coefficients and $\Omega_t = \mathrm{diag}(\sigma^2_{1t}, \ldots, \sigma^2_{mt})$. Therefore, all the contemporaneous relations among time series come from the $m \times m$ matrix $\Gamma_t$, whose off-diagonal elements $\gamma_{jit}$ (for $j \neq i$) capture the dynamic contemporaneous relationships between series $j$ and $i$ at time $t$. $\Gamma_t$ has zeroes on the main diagonal.

Throughout our work, and following Zhao et al. (2016), we will focus on the particular but important case where $\Gamma_t$ is lower triangular with zeroes in and above the main diagonal:

$$\Gamma_t = \begin{pmatrix} 0 & & & \\ \gamma_{21,t} & 0 & & \\ \vdots & & \ddots & \\ \gamma_{m1,t} & \cdots & \gamma_{m,m-1,t} & 0 \end{pmatrix}. \qquad (2)$$

This particular case has already appeared in the econometric literature. Lopes et al. (2018), for example, deal with time-varying learning of covariance matrices with no predictors and handle hundreds of time series simultaneously via parsimonious priors (see also Shirota, Omori, Lopes, and Piao (2017)). Also, Primiceri (2005) uses lagged values of $y_t$ in a VAR with stochastic volatility context, with random walk dynamics for the $\beta_{jt}$s, $\gamma_{jit}$s and $\sigma_{jt}$s.

Since the error terms in $\nu_t$ are contemporaneously uncorrelated, the triangular contemporaneous dependencies among time series in Equation (2) generate a fully recursive system, known as a Cholesky-style framework (West, 2020). Hence, each equation $j$ of the system has its own set of parents ($y_{<j,t}$), that is, it depends contemporaneously on all other time series above equation $j$, following the triangular format in Equation (2).
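Since $\nu_t$ has a diagonal covariance matrix, the triangular structure above lets the full covariance matrix of the system be recovered by pre-multiplication with $A_t = (I_m - \Gamma_t)^{-1}$. A minimal numpy sketch of this recoupling step (function and variable names are ours, not the paper's; the numbers are toy values):

```python
import numpy as np

def recouple_covariance(Gamma, omega_diag):
    """Recover the full covariance Sigma_t = A_t Omega_t A_t' implied by the
    triangular contemporaneous matrix Gamma_t and the diagonal innovation
    variances Omega_t = diag(sigma^2_{1t}, ..., sigma^2_{mt})."""
    m = Gamma.shape[0]
    A = np.linalg.inv(np.eye(m) - Gamma)   # A_t = (I_m - Gamma_t)^{-1}
    return A @ np.diag(omega_diag) @ A.T

# Toy example with m = 3 series: series 2 loads on series 1,
# series 3 loads on both of its "parents"
Gamma = np.array([[0.0, 0.0, 0.0],
                  [0.4, 0.0, 0.0],
                  [0.1, 0.3, 0.0]])
Sigma = recouple_covariance(Gamma, np.array([1.0, 0.5, 0.25]))
```

Because $\Gamma_t$ is strictly lower triangular, $I_m - \Gamma_t$ is unit lower triangular and always invertible, so the decoupled univariate outputs can be recoupled into a valid full covariance matrix at every time point.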
In words, the top time series in the system has no parents, the second-from-the-top time series has the first time series as a parent, the third time series has the first two time series as parents, and so on all the way to the last time series, which depends on all other $m - 1$ series. Pre-multiplying the system in Equation (1) by $A_t = (I_m - \Gamma_t)^{-1}$ yields

$$y_t = A_t \begin{pmatrix} x_{1,t-1}' \beta_{1t} \\ \vdots \\ x_{m,t-1}' \beta_{mt} \end{pmatrix} + u_t, \qquad u_t \mid \Sigma_t \sim N(0, \Sigma_t), \qquad (3)$$

where $u_t = A_t \nu_t$. The modified Cholesky decomposition clearly appears in $\Sigma_t = A_t \Omega_t A_t'$, which is now a full variance-covariance matrix capturing the contemporaneous relations among the $m$ time series. Given the parental triangular structure of $\Gamma_t$ in (2), the equations are conditionally independent, bringing the "decoupled" aspect of the multivariate model. In other words, the DDNM can be viewed as a set of $m$ conditionally independent univariate DLMs that can be dealt with in a parallelizable fashion. The outputs of each equation are then used to compute $\Gamma_t$ and $\Omega_t$, hence recovering the full time-varying covariance matrix $\Sigma_t$.

m univariate dynamic linear models

The set of $m$ univariate models can be represented as $m$ univariate recursive dynamic regressions, for $j =$
$1, \ldots, m$:

$$y_{jt} = x_{j,t-1}' \beta_{jt} + y_{<j,t}' \gamma_{<j,t} + \nu_{jt}, \qquad \nu_{jt} \sim N(0, \sigma^2_{jt}), \qquad (4)$$

with dynamic coefficients evolving according to random walks:

$$\begin{pmatrix} \beta_{jt} \\ \gamma_{<j,t} \end{pmatrix} = \begin{pmatrix} \beta_{j,t-1} \\ \gamma_{<j,t-1} \end{pmatrix} + \omega_{jt}, \qquad \omega_{jt} \sim N(0, W_{jt}). \qquad (5)$$

By defining the full dynamic state and regression vectors as

$$\theta_{jt} = \begin{pmatrix} \beta_{jt} \\ \gamma_{<j,t} \end{pmatrix} \quad \text{and} \quad F_{jt} = \begin{pmatrix} x_{j,t-1} \\ y_{<j,t} \end{pmatrix},$$

we recover the traditional univariate DLM formulation as in West and Harrison (1997), namely

$$y_{jt} = F_{jt}' \theta_{jt} + \nu_{jt}, \qquad \nu_{jt} \sim N(0, \sigma^2_{jt}),$$
$$\theta_{jt} = \theta_{j,t-1} + \omega_{jt}, \qquad \omega_{jt} \sim N(0, W_{jt}),$$

for $j =$
$1, \ldots, m$, where again $\beta_{jt}$ and $\gamma_{<j,t}$ evolve over time as simple random walks.

Posterior at $t-1$. Following the algorithmic structure of sequential learning in DLMs (West and Harrison, 1997, Chapter 4), at time $t-1$ and for each equation $j$, the joint posterior distribution of $\theta_{j,t-1}$ and $\sigma^2_{j,t-1}$ is

$$\theta_{j,t-1}, \sigma^{-2}_{j,t-1} \mid D_{t-1} \sim NG\left(m_{j,t-1}, C_{j,t-1}, n_{j,t-1}, n_{j,t-1} s_{j,t-1}\right). \qquad (6)$$

Through the random walk evolution and conjugacy, we can derive the joint prior distribution of $\theta_{jt}$ and $\sigma^2_{jt}$ for time $t$ as

$$\theta_{jt}, \sigma^{-2}_{jt} \mid D_{t-1} \sim NG\left(a_{jt}, R_{jt}, r_{jt}, r_{jt} s_{j,t-1}\right), \qquad (7)$$

where $r_{jt} = \kappa_j n_{j,t-1}$, $a_{jt} = m_{j,t-1}$ and $R_{jt} = C_{j,t-1}/\delta_j$. The quantities $0 < \delta_j \le 1$ and $0 < \kappa_j \le 1$ are discount factors controlling the degree of time variation in $\theta_{jt}$ and $\sigma^2_{jt}$, respectively. Discount methods are used to induce time variation in the evolution of parameters, have been extensively used in many applications (Raftery et al., 2010, Dangl and Halling, 2012, Koop and Korobilis, 2013, McAlinn, Aastveit, Nakajima, and West, 2020, amongst others) and are well documented in West and Harrison (1997), Gamerman and Lopes (2006) and Prado and West (2010).

Forecasting at $t-1$. The (prior) predictive distribution of $y_{jt}$ is a Student's $t$ distribution with $r_{jt}$ degrees of freedom,

$$y_{jt} \mid y_{<j,t}, D_{t-1} \sim T_{r_{jt}}\left(f_{jt}, q_{jt}\right),$$

with $f_{jt} = F_{jt}' a_{jt}$ and $q_{jt} = s_{j,t-1} + F_{jt}' R_{jt} F_{jt}$. It is important to notice that in this framework we have a conjugate analysis for forward filtering and one-step-ahead forecasting. Therefore, we are able to compute closed-form predictive densities for each equation $j$. Hence, conditional on parents, it is easy to compute the joint predictive density for $y_t$:

$$p(y_t \mid D_{t-1}) = \prod_{j=1}^{m} p\left(y_{jt} \mid y_{<j,t}, D_{t-1}\right), \qquad (8)$$

which is simply the product of the already computed $m$ different univariate Student's $t$ densities. After the time series are decoupled for sequential analysis, they are then recoupled for multivariate forecasting.
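One forward-filtering step of a single equation can be sketched in a few lines. The prior and forecast quantities follow Equations (6) and (7); the posterior-update formulas for $s$ and $C$ are the standard conjugate variance-discount recursions of West and Harrison (1997), which this framework builds on, and all numeric inputs below are toy values:

```python
import numpy as np

def dlm_step(y, F, m, C, n, s, delta=0.95, kappa=0.97):
    """One conjugate forward-filtering step for a univariate DLM with
    discount factors delta (states) and kappa (volatility).
    Returns the one-step forecast (f, q, dof) and the updated posterior."""
    # Prior at t (Equation 7): a = m_{t-1}, R = C_{t-1}/delta, r = kappa*n_{t-1}
    a, R, r = m, C / delta, kappa * n
    # One-step-ahead Student-t forecast: f = F'a, q = s_{t-1} + F'RF
    f = F @ a
    q = s + F @ R @ F
    # Posterior update after observing y_t (standard discount-DLM recursions)
    e = y - f                      # forecast error
    A = R @ F / q                  # adaptive gain vector
    n_new = r + 1.0
    s_new = s + (s / n_new) * (e ** 2 / q - 1.0)
    m_new = a + A * e
    C_new = (s_new / s) * (R - np.outer(A, A) * q)
    return (f, q, r), (m_new, C_new, n_new, s_new)

# One illustrative step with a two-element state (toy numbers)
m0, C0 = np.zeros(2), np.eye(2)
(f, q, dof), (m1, C1, n1, s1) = dlm_step(y=0.5, F=np.array([1.0, 0.2]),
                                         m=m0, C=C0, n=10.0, s=0.1)
```

Because every step is in closed form, running $m$ such filters (one per equation, each with its own parents appended to $F_{jt}$) and multiplying their Student-$t$ predictive densities gives the joint density in Equation (8) with no simulation at all.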
In our decision analysis in Section 4, we will be concerned with the mean and variance of this distribution for the portfolio allocation study:

$$f_t = E(y_t \mid D_{t-1}) \quad \text{and} \quad Q_t = V(y_t \mid D_{t-1}). \qquad (9)$$

Further details about the derivations of the evolution, forecasting and updating distributions can be found in Appendices A and B.

2.2 Dynamic Order Probabilities

In the previous section, we discussed the general format of DDNMs using a specific given order structure. As we have argued, time series ordering has the potential to be more flexible and more dynamic than simply considering a single structure for all periods of time. The fact is that, for a multivariate model with $k$ dependent series, it is possible to have $k!$ different order permutations. Raftery et al. (2010) and Koop and Korobilis (2013) have proposed the use of dynamic model probabilities, where the model space was defined by different models with specific predictors and discount factors. Following a similar idea, we propose what we call Dynamic Order Probabilities (DOP), where the model space is set to contain models with different order structures. Therefore, our model space will contain $k!$ possible orders. For each order, we dynamically compute probabilities and, conditionally on those and for each period of time, we can select the order that received the highest predicted probability or average the predicted outputs of all orders, weighting by each order's probability. We name these two approaches Dynamic Order Selection (DOS) and
Dynamic Order Averaging (DOA), respectively. In this way, the econometrician is able to sequentially learn the orders that have performed well in the recent past and, consequently, learn from past mistakes. We highlight that the notion of model probabilities has already been applied in the DDNM literature before. For example, Zhao et al. (2016), Fisher et al. (2020) and Lavine et al. (2020) have considered the uncertainty around predictors and discount factors for specific equations in the multivariate system, conditioning on a predetermined order. Our
Dynamic Order Learning (DOL) approach works as follows. Suppose the researcher faces the problem of forecasting a multivariate model in which she is uncertain about the order structure and does not know exactly the contemporaneous relations between the variables of interest. Additionally, as a refinement, she can also incorporate uncertainty around the specification choices for predictors and degrees of variation in parameters for each equation of the system. We will explain how to incorporate uncertainty around predictors and discount factors in Section 5. For each order $i$, $i =$
$1, \ldots, k!$, we have a set of conditionally independent univariate DLMs with a given parental set structure. Similarly to Equation (8), we can compute the predictive density for each equation $j$ under order $i$ and then simply form the joint predictive density for order $i$ as

$$p(y_t \mid D_{t-1}, O_i) = \prod_{j=1}^{m} p\left(y_{jt} \mid y_{<j,t}, D_{t-1}, O_i\right). \qquad (10)$$

After computing the joint predictive density for all $k!$ orders, we just follow the laws of probability and compute the DOP. First, denote by

$$\pi_{t-1 \mid t-1, i} = \Pr(O_i \mid D_{t-1})$$

the posterior probability of order $i$ at time $t-1$, given data until time $t-1$. The predicted probability of order $i$ at time $t$ is

$$\pi_{t \mid t-1, i} = \frac{\pi_{t-1 \mid t-1, i}^{\alpha}}{\sum_{l=1}^{k!} \pi_{t-1 \mid t-1, l}^{\alpha}}, \qquad (11)$$

where $0 < \alpha \le 1$ is a forgetting factor. The use of the forgetting factor $\alpha$ avoids the computational burden associated with expensive MCMC schemes to simulate transitions between orders over time. This approach has also been extensively used in the Bayesian econometric literature with great success (Koop and Korobilis, 2013, Zhao et al., 2016, Lavine et al., 2020 and Beckmann et al., 2020). After observing new data at time $t$, we can use the joint predictive density in Equation (10) to update our order probabilities through a simple Bayes update:

$$\pi_{t \mid t, i} = \frac{\pi_{t \mid t-1, i} \; p(y_t \mid D_{t-1}, O_i)}{\sum_{l=1}^{k!} \pi_{t \mid t-1, l} \; p(y_t \mid D_{t-1}, O_l)}, \qquad (12)$$

the posterior probability of order $i$ at time $t$. Hence, upon the arrival of a new data point, the researcher is able to measure the performance of each order $i$ and to assign higher probability to those orders that generate better performance. One possible interpretation of the forgetting factor $\alpha$ is through its role in discounting past performance. Combining the predicted and posterior probabilities, we can show that

$$\pi_{t \mid t-1, i} \propto \prod_{l=1}^{t-1} \left[ p(y_{t-l} \mid D_{t-l-1}, O_i) \right]^{\alpha^{l}}. \qquad (13)$$

Since $0 < \alpha \le$
1, Equation (13) can be viewed as a discounted predictive likelihood, where past performances are discounted more heavily than recent ones. It implies that orders with higher performance in the recent past will produce higher predicted order probabilities. How far back the relevant past extends is controlled by $\alpha$, since a lower $\alpha$ discounts past data more heavily and generates a faster switching behavior between orders over time. Following Beckmann et al. (2020), we induce time variation in $\alpha$ by considering a grid of values, selecting the best value for each period of time. In this way, we can allow for periods of faster or slower order switching. (At each time period $t$, we select the orders with the highest probabilities for each $\alpha$ in the grid. Given those orders, we select the $\alpha$ that generated the order with the highest sum of log predictive likelihoods in the past until time $t-1$. After that, we compute model probabilities based on the best $\alpha_t$.)

After computing order probabilities, the researcher is able to deal with the problem of order uncertainty by sequentially learning about the importance of each order over time. As mentioned before, with predicted order probabilities at hand, the researcher can select the order that received the highest probability (DOS) or average the predicted outputs of all orders, weighting by each order's probability (DOA).

2.3 Predictors and discount factors learning

Since the environment of the economy is continuously changing, we apply our DOL approach to sequentially learn the contemporaneous relations among different economic variables over time, improving the covariance matrix estimation. In our application, the decisions about specification choices are quite flexible, allowing the investor to learn not just about contemporaneous dependencies, but also about the best predictors and degrees of variation in coefficients and volatilities over time.

In order to sequentially select the best specification choices for predictors and degrees of variation in coefficients, we apply for each equation $j$ (for a given order $i$) the Dynamic Model Selection (DMS) approach, similar to what has been done in Raftery et al. (2010), Koop and Korobilis (2012) and Levy and Lopes (2020). The procedure simply selects at each period $t$ the model specification that received the highest predictive model probability. Therefore, an order structure will be defined as the joint model with the best univariate model for each equation $j$, for a given order structure $i$, following the whole triangular format.

Given the selection of the best univariate model for all equations $j$ at order $i$, we can recover the best multivariate model for that order structure. To make it clear, consider $M^*_j$ the univariate model for equation $j$ at order $i$ selected by the DMS procedure at period $t$ with the highest model probability, and let $P(M^*_j \mid D_{t-1}, O_i)$ be its model probability. Since each equation is conditionally independent of the others in the same order structure, it implies that $P(M^*_{1:m} \mid D_{t-1}, O_i) = \prod_{j=1}^{m} P(M^*_j \mid D_{t-1}, O_i)$, where $M^*_{1:m}$ represents the multivariate model with the highest probability. Hence, as soon as we find the best $m$ univariate models within an order structure, we can easily recover the best joint model and compute its joint predictive density and Dynamic Order Probabilities, following Equations (11) and (12) in Section 2.2. Therefore, our DOL method offers considerable flexibility in the specification choices, allowing the econometrician to adapt to new forecasting environments and learn from past mistakes, switching to new predictors, new degrees of variation in coefficients and volatilities and different contemporaneous dependencies over time.

After introducing the DDNM framework and the DOL procedure, we perform two different studies where we explore the main advantages of our econometric approach. The first is a portfolio allocation application and the second is a macroeconomic forecasting exercise.
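The order-probability recursions of Section 2.2 (the forgetting step in Equation (11) and the Bayes update in Equation (12)) reduce to two short operations. A pure-Python sketch, where the predictive likelihoods fed to the update are hypothetical numbers rather than outputs of a fitted model:

```python
def predict_order_probs(post, alpha=0.99):
    """Forgetting step, Equation (11): raise the posterior order
    probabilities to the power alpha and renormalize."""
    raised = [p ** alpha for p in post]
    total = sum(raised)
    return [p / total for p in raised]

def update_order_probs(pred, likelihoods):
    """Bayes update, Equation (12): weight each order's predicted
    probability by its one-step predictive likelihood p(y_t | D_{t-1}, O_i)."""
    weighted = [p * lik for p, lik in zip(pred, likelihoods)]
    total = sum(weighted)
    return [w / total for w in weighted]

# k! = 6 orders for k = 3 series, starting from equal probabilities
post = [1.0 / 6] * 6
pred = predict_order_probs(post, alpha=0.95)
# hypothetical one-step predictive likelihoods of y_t under each order
post = update_order_probs(pred, [0.8, 1.2, 0.9, 1.1, 1.0, 0.7])
best_order = max(range(6), key=lambda i: post[i])   # DOS keeps the argmax
```

DOS keeps only the order with the highest predicted probability, while DOA would instead average each order's forecast outputs using `pred` as weights; both reuse exactly the quantities computed above.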
We perform dynamic asset allocation by combining the DDNM and DOL methods. The exercise is based on forecasting a set of exchange rates and then using the predictive information to sequentially rebalance the investor's portfolio. The motivation to propose different methods to predict exchange rates is not new in the literature. The seminal work of Meese and Rogoff (1983) brought evidence that structural models struggle to outperform simple random walk models. Since their work, the literature has been moving towards producing models that can generate better results in terms of out-of-sample accuracy or trading strategies. As summarized by Rossi (2013), different parts of the literature use different predictors, models and approaches. They can differ in terms of the selected predictors, time variation in parameters and the use of multivariate or univariate models. To name a few interesting studies on applied econometrics for exchange rate predictability, we refer to Della Corte, Sarno, and Tsiakas (2009), Della Corte and Tsiakas (2012) and Byrne, Korobilis, and Ribeiro (2016, 2018) in the univariate context and Beckmann, Koop, Korobilis, and Schüssler (2020) in a multivariate application.

Similar to Della Corte et al. (2009), Byrne et al. (2018) and Beckmann et al. (2020), the dynamic portfolio allocation takes the perspective of a US investor who allocates her wealth between six foreign bonds and one domestic bond (US). At each period, each foreign bond yields a riskless return in the local currency and a risky return from the currency fluctuations in US dollars.

The investor takes two steps sequentially over time. The first is to use the econometric method to generate one-month-ahead forecasts. In the second step, the investor dynamically rebalances the portfolio by finding new optimal portfolio weights. To perform the portfolio optimization, the investor uses the vector of predicted mean exchange rate returns and the predicted covariance matrix.
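The rebalancing step can be sketched as follows. For a maximum-expected-return strategy subject to a conditional volatility target $\sigma^*_p$, the standard closed-form solution is $w_t = (\sigma^*_p/\sqrt{C_t})\, \Sigma^{-1}_{t+1|t} (\mu_{t+1|t} - \iota r_f)$ with $C_t = (\mu_{t+1|t} - \iota r_f)' \Sigma^{-1}_{t+1|t} (\mu_{t+1|t} - \iota r_f)$. The forecast numbers and the 10% annualized target below are hypothetical illustrations, not the paper's values:

```python
import numpy as np

def target_vol_weights(mu, Sigma, rf, sigma_target):
    """Maximum-expected-return weights subject to a per-period conditional
    volatility target sigma_target, for predicted means mu and predicted
    covariance Sigma of risky returns, with riskless rate rf."""
    iota = np.ones(len(mu))
    excess = mu - rf * iota                      # predicted excess returns
    Sigma_inv = np.linalg.inv(Sigma)
    C = excess @ Sigma_inv @ excess              # squared "appraisal" term
    return (sigma_target / np.sqrt(C)) * (Sigma_inv @ excess)

# Hypothetical one-month-ahead forecasts for two currencies
mu = np.array([0.004, 0.002])
Sigma = np.array([[0.0009, 0.0002],
                  [0.0002, 0.0016]])
w = target_vol_weights(mu, Sigma, rf=0.001,
                       sigma_target=0.10 / np.sqrt(12))  # 10% p.a., monthly
# By construction, sqrt(w' Sigma w) equals sigma_target exactly.
```

The remaining wealth share, $1 - w'\iota$, is held in the domestic riskless bond, so the predicted portfolio volatility always sits exactly on the target while the expected return is maximized.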
Using this setup we are able to assess the economic value of exchange rate predictability from different methods within a dynamic mean-variance framework, implementing a maximum expected return strategy subject to a conditional volatility target.

Following Della Corte et al. (2009) and Byrne et al. (2018), let $r_{t+1}$ be the $m \times 1$ vector of exchange rate returns, $\mu_{t+1|t} = E_t[r_{t+1}]$ its conditional mean and $\Sigma_{t+1|t}$ the $m \times m$ conditional covariance matrix of $r_{t+1}$. At each period of time, the investor solves the following problem:

$$\max_{w_t} \left\{ \mu_{p,t+1} = w_t' \mu_{t+1|t} + \left(1 - w_t' \iota\right) r_f \right\} \quad \text{such that} \quad \sigma_p^{*2} = w_t' \Sigma_{t+1|t} w_t, \qquad (14)$$

where $\iota$ is a vector of ones, $\mu_{p,t+1}$ is the conditional expected portfolio return, $\sigma_p^*$ is the volatility target, $w_t$ is the $m \times 1$ vector of portfolio weights and $r_f$ is the return of the riskless asset. In our study we consider a fixed annualized volatility target $\sigma_p^*$. The solution to this optimization problem is

$$w_t = \frac{\sigma_p^*}{\sqrt{C_t}} \Sigma_{t+1|t}^{-1} \left( \mu_{t+1|t} - \iota r_f \right),$$

where $C_t = \left( \mu_{t+1|t} - \iota r_f \right)' \Sigma_{t+1|t}^{-1} \left( \mu_{t+1|t} - \iota r_f \right)$. The gross return of the investor's portfolio is computed as

$$R_{p,t+1} = 1 + r_{p,t+1} = 1 + \left(1 - w_t' \iota\right) r_f + w_t' r_{t+1}.$$

Differently from Della Corte and Tsiakas (2012) and Byrne et al. (2018), where the authors do not model the conditional covariance matrix of exchange rate returns and simply replace $\Sigma_{t+1|t}$ by the unconditional covariance matrix, we use the predicted covariance matrix estimated from our econometric model.

Vector Autoregressive (VAR) models are commonly applied in the macroeconomic literature and used in central banks and financial institutions in many different contexts.
VARs are known to be a powerful tool to predict future movements of the economy and for monetary policy evaluation (Sims, 1980, Litterman, 1986, Primiceri, 2005, Clark and McCracken, 2010, Koop and Korobilis, 2013 and Kastner and Huber, 2020).

The recent VAR literature has recognized the advantages of considering time-varying parameters and volatilities in model building. Inspired by the Cholesky-style structure behind the work of Primiceri (2005) and Del Negro and Primiceri (2015), we are motivated to explore the ability of our approach to deal with the problem of order uncertainty in a macroeconomic context. Since the macroeconomy is continuously adapting to new environments and different sources of breaks, such as wars, global crises and pandemics, VAR models are strongly susceptible to instabilities, as highlighted, for instance, by Cogley and Sargent (2005) and Clark and McCracken (2010). We argue that, when the main goal of the econometrician is to produce sequential forecasts, those instabilities can induce different sources of dependencies among economic variables. However, since the Cholesky-style framework is tied to the order structure, the out-of-sample forecasting results can be seriously harmed by a static treatment of economic series dependencies, which can change rapidly from year to year or in just a few months.

It is important to highlight that, when we allow our model to learn and explore different series dependencies over time, our interest is not in identification assumptions or in challenging the economic theories behind those dependencies; we instead focus on improving out-of-sample forecasting accuracy.

Therefore, we follow a DDNM structure similar to that of Zhao et al.
(2016), where the predictors are now composed of the lagged values of the time series, building on the format of VARs with time-varying parameters and stochastic volatilities (TVP-VAR-SV). Similar to Primiceri (2005) and Del Negro and Primiceri (2015), we focus on a VAR model with three important US macroeconomic variables: inflation, unemployment and interest rates.
In this Section we briefly explain the main criteria used to compare the different approaches in terms of out-of-sample forecasting accuracy and economic performance. When forecasting macroeconomic variables, it is common in the econometric literature to consider point and density forecasting metrics. For the portfolio allocation application, however, the investor is not just concerned with forecasting accuracy, but also with how this accuracy translates into better portfolio performance and utility improvements for a mean-variance investor. Therefore, after introducing the main statistical evaluation measures, it is crucial to explain the main economic criteria used to evaluate the outputs of our econometric method.
We make point and density forecast evaluations, where the point forecast accuracy metric is the Mean Square Forecast Error (MSFE). First, we compute the MSFE for each currency and then we take the ratio between the sums of the individual MSFEs from each econometric model, i.e.

$$\text{MSFE}^l = \frac{\sum_{i=1}^{k} \text{MSFE}_i^l}{\sum_{i=1}^{k} \text{MSFE}_i^{Bmk}} \qquad (15)$$

where $l$ is the specific order to be evaluated and $Bmk$ is the specific benchmark. In terms of density forecasts, we decided to use the Log Predictive Density Ratio (
LPDR), following the recent Bayesian econometric literature (McAlinn and West, 2019, McAlinn et al., 2020, Nakajima and West, 2013 and Koop and Korobilis, 2013). In this metric, we use the predictive density for $y_t$ given all data available until $t-1$, $p(y_t \mid D_{t-1})$. We opted for this criterion because our interest here is not just in point forecasting, but also in generating better predictions for the whole predictive distribution. In our first application, the mean and the variance of the predictive density are essential to build portfolios, so a density forecast criterion suits our interest much better. Additionally, it aligns with the work of Cenesizoglu and Timmermann (2012), who show evidence of agreement between density forecasts and economic performance. The density forecast criterion (LPDR) is defined as the ratio between the sum of the log-predictive density of model $l$ and the sum of the log-predictive density of the benchmark model:

$$LPDR^l = \sum_{t=1}^{T} \log \left\{ \frac{p_l\left(y_{t+1} \mid y_t\right)}{p_{Bmk}\left(y_{t+1} \mid y_t\right)} \right\}. \qquad (16)$$

The LPDR provides a statistical assessment of relative accuracy that extends traditional Bayes factors. Therefore, whenever $LPDR^l >$
0, model $l$ is statistically outperforming the benchmark.

In order to evaluate the economic performance of our DOL method for exchange rate predictability and portfolio allocation, we use a standard mean-variance measure. Following Fleming, Kirby, and Ostdiek (2001), we compute the ex-post average utility for a mean-variance investor with quadratic utility. As in Fleming et al. (2001), Della Corte et al. (2009) and Beckmann et al. (2020), we can calculate the performance fee that an investor would be willing to pay to switch from the traditional Wishart-Random Walk (W-RW) model to the DOL approach. The performance fee is computed by equating the average utility of the W-RW portfolio with the average utility of the DOL portfolio (or any alternative portfolio), considering the latter net of a fee $\Phi$:

$$\frac{1}{T}\sum_{t} \left\{ \left(R^{DOL}_{p,t+1} - \Phi\right) - \frac{\theta}{2(1+\theta)} \left(R^{DOL}_{p,t+1} - \Phi\right)^2 \right\} = \frac{1}{T}\sum_{t} \left\{ R^{RW}_{p,t+1} - \frac{\theta}{2(1+\theta)} \left(R^{RW}_{p,t+1}\right)^2 \right\}$$

where $\theta$ is the investor's degree of relative risk aversion (RRA), $R^{DOL}_{p,t+1}$ is the gross return from the DOL portfolio and $R^{RW}_{p,t+1}$ is the gross return from the W-RW portfolio. As in Beckmann et al. (2020), we fix the value of $\theta$ and report $\Phi$ as the maximum annualized performance fee an investor is willing to pay to switch from a Wishart multivariate Random Walk model to the Dynamic Ordering Learning (DOL) approach.

In our results in the next Section we also show Sharpe Ratios (SR) as an additional economic performance measure. It is the most commonly used measure in the financial literature and among practitioners, computed as the average excess return of the portfolio divided by the standard deviation of the portfolio returns. All economic measures displayed in Section 5 are already net of transaction costs (TC). Following Marquering and Verbeek (2004), we deduct the transaction cost from the portfolio return ex-post. As argued by Della Corte and Tsiakas (2012), this is a reasonable simplification that maintains the tractability of the analysis.
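The performance fee $\Phi$ defined above can be found numerically. The sketch below solves the utility-equating condition by bisection; $\theta = 6$ and the search bracket are illustrative assumptions, and returns are assumed to be monthly gross returns close to one:

```python
import numpy as np

def performance_fee(r_new, r_bench, theta=6.0, lo=-0.1, hi=0.5):
    """Per-period fee Phi equating average quadratic utility
    U(R) = R - theta/(2(1+theta)) R^2 of the candidate (net of Phi)
    and benchmark gross-return series, as in Fleming et al. (2001).
    Bisection assumes the root lies in [lo, hi] and that utility is
    decreasing in Phi there (true for gross returns near one)."""
    c = theta / (2.0 * (1.0 + theta))
    R1 = np.asarray(r_new, dtype=float)
    R0 = np.asarray(r_bench, dtype=float)
    util = lambda R: float(np.mean(R - c * R**2))
    target = util(R0)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if util(R1 - mid) > target:   # candidate still better: fee can rise
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical series: candidate earns 50bps more per month than benchmark
fee = performance_fee([1.009] * 100, [1.004] * 100)
```

An annualized fee in basis points would then be obtained by scaling the per-period value (e.g. multiplying a monthly fee by 12 and by 10,000).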
Differently from Della Corte and Tsiakas (2012) and Beckmann et al. (2020), who set the transaction cost at 8bps, we decided to use a slightly more conservative transaction cost of 10bps.

As predictors, we propose to sequentially select one of 12 different measures of time series momentum, each one with a specific look-back period, ranging from 1 to 12 months. The time series momentum predictor is a measure of continuation (or trend) in returns. Its ability to predict returns is well documented in the financial literature (Jegadeesh and Titman, 1993, Moskowitz et al., 2012 and Levy and Lopes, 2020) and it can be defined as the accumulated return over the previous $l$ months, where $l$ is the size of the look-back period. Additionally, each univariate model will have a different pair of discount factors $\delta$ and $\kappa$, with the following possible choices for each one: $\delta \in \{0.99, 1\}$ and $\kappa \in \{0.96, 1\}$. Therefore, the investor is able to sequentially learn not just how far she needs to look into the past to infer the best trend to predict returns, but also whether coefficients and volatilities are constant or time-varying, since discount factors lower than one induce variation in parameters while discount factors equal to one induce constant parameters.

The dataset consists of some of the most traded currencies: the Australian dollar (AUD), the Canadian dollar (CAD), the Euro (EUR), the Japanese yen (JPY), the Swiss franc (SWF), the Great Britain pound (GBP) and the US dollar (USD), together with the one-month LIBORs for the respective countries. All currencies are expressed in terms of the US dollar and are end-of-month exchange rates, computed as discrete returns. All the data in our application were taken from the work of Beckmann et al. (2020). The sample runs from 1986:01 until 2016:12 and we use the first ten years of data as burn-in period and the last twenty years as statistical and economic evaluation period.
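The momentum predictor described above, $MOM^l_t = P_t / P_{t-l} - 1$, can be computed as follows; the prices are hypothetical month-end quotes:

```python
def momentum(prices, l):
    """Time series momentum with look-back l: MOM_t = P_t / P_{t-l} - 1.
    Returns a list aligned with `prices`; the first l entries are None
    because the look-back window is not yet available."""
    return [None] * l + [prices[t] / prices[t - l] - 1.0
                         for t in range(l, len(prices))]

prices = [100, 102, 101, 105, 108]   # hypothetical month-end prices
mom3 = momentum(prices, 3)           # one of the 12 look-backs (l = 1..12)
```

In the application, twelve such series (one per look-back $l$) are computed for each currency, and the sequential learning machinery chooses among them over time.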
Specifically, we can define the momentum measure with look-back period $l$ as $MOM^l_t = P_t / P_{t-l} - 1$. See Appendix A for more details. The data are available at https://sites.google.com/site/dimitriskorobilis/matlab/fx_tvp

Figure 1: Time-Varying Forgetting Factor $\alpha_t$ (Left panel) and Order Selection (Right panel)

We use our DOL approaches to forecast the mean and covariance matrix of exchange rate returns. This study compares, statistically and economically, the use of DOA and DOS with a mean-variance investor that uses a simple multivariate Random Walk, where the error covariance matrix is assumed to follow an inverted Wishart distribution (W-RW). We have considered a driftless Random Walk model with time-varying covariance matrix. Since our multivariate model contains six currencies, we have a total of 6! = 720 possible order permutations. Hence, we also show results compared to the case where the investor believes in models that use fixed orders for all periods of time. We show evidence that the importance of orders is in fact dynamic, meaning that learning the changes in the contemporaneous relations among currencies improves statistical measures and portfolio performance.

Within each order structure, the investor can also learn about the look-back period for a time-series momentum strategy and different degrees of variability in parameters over time. Since the environment of the economy induces changes in the behavior of returns, with periods of faster or slower switching among orders, we allow the model to select among eleven values for the forgetting factor $\alpha$ over time. The left panel of Figure 1 reports the selected $\alpha$ across time.

We present point and density forecast evaluations for the period of 1996:1 through 2016:12. Both measures assess how our DOL approach performs in terms of one-month-ahead out-of-sample forecasting.
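The forgetting-based updating of order probabilities can be sketched in the spirit of dynamic model averaging: yesterday's probabilities are flattened with exponent $\alpha$ and then reweighted by each order's one-step-ahead predictive density. This is an illustrative recursion under those assumptions, not the paper's exact code:

```python
import numpy as np

def update_order_probs(probs, log_pred_dens, alpha):
    """One step of forgetting-based order-probability updating.
    probs: current probabilities over candidate orderings.
    log_pred_dens: each ordering's one-step-ahead log predictive density.
    alpha < 1 discounts past performance, favoring recent accuracy."""
    p = np.asarray(probs, dtype=float) ** alpha
    p /= p.sum()                            # forgetting (flattening) step
    logw = np.log(p) + np.asarray(log_pred_dens, dtype=float)
    w = np.exp(logw - logw.max())           # stabilize before normalizing
    return w / w.sum()

# Three hypothetical orderings; the first predicts best today
p1 = update_order_probs([1/3, 1/3, 1/3], [-1.0, -1.5, -3.0], alpha=0.95)
```

Lower $\alpha$ makes the flattening step stronger, so probability mass shifts more quickly toward orderings that predicted well in the very recent past.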
As a measure of point forecast accuracy we use the mean square forecast error (MSFE) and, for density forecasts, the predictive likelihood. The latter is popular in the Bayesian literature and captures how the whole predictive distribution performs in forecasting, as opposed to the single value assessed by the MSFE. The right panel of Figure 2 shows the MSFE performance of both of our DOL approaches, the DOA (green line) and the DOS (red line). The gray points are the point forecast performance of all 720 possible fixed order structures that the econometrician/investor has available. The MSFE is relative to the Wishart-Random Walk (W-RW) model, and numbers lower than one mean that the specific model is outperforming the W-RW model. As is clear, all different specifications outperform the Random Walk model. Both the DOA and DOS approaches perform better than the vast majority of the fixed orders. Also, there are great differences in performance among the fixed order possibilities, meaning that relying on a randomly chosen fixed order for all periods of time can make a huge difference in final outcomes.

Figure 2: Statistical performance relative to the Wishart-Random-Walk model. i) Left panel: Log Predictive Density Ratio (LPDR); ii) Right panel: Mean Square Forecast Error (MSFE)

In relation to density forecast performance, the left panel of Figure 2 shows how the
DOA outperforms not just the Random Walk model, but all 720 fixed orders over time. Hence, when the investor sequentially learns about the importance of each order and considers the fact that an order can improve or deteriorate in a matter of quarters or even months, statistical performance increases substantially. Also note that the DOS performs better than the vast majority of the fixed orders.

In order to understand how the density performance evolves over time, we plot in Figure 3 the accumulated predictive likelihood. This metric is useful to visualize how different models accumulate statistical gains or losses compared to the benchmark over time. We can note that the DOA approach is the best model among all fixed orders for the whole out-of-sample evaluation period and accumulates significant gains relative to the benchmark, especially after the Great Recession.
The previous subsection provided evidence that the DOL approach improves statistical performance and captures order change.

Figure 3: Accumulated Log Predictive Likelihood relative to the Wishart-Random Walk Model. Numbers above zero represent that the specific model is performing better than the benchmark (W-RW) in terms of density forecasts.

But nothing has been said yet about portfolio and utility improvements. As we highlighted in Section 4, we design a portfolio allocation study in which a US investor optimally allocates her wealth among six foreign bonds and the US bond, receiving not just the riskless return from those bonds but also the risky currency fluctuations. We compute annualized Sharpe Ratios (SR) for the investor that uses the different econometric models to generate predicted means and covariances among currencies to rebalance her portfolio each time period. We also show the annualized management fee ($\Phi$) that the investor would be willing to pay to switch from the W-RW model to each one of the methods (DOA, DOS or fixed orders).

The right panel of Figure 4 shows the SR for the different strategies. The results are already net of transaction costs. The Random Walk strategy generates an annualized Sharpe Ratio of 0.71 for the out-of-sample evaluation period. It is not a bad performance but, as the figure makes clear, any fixed order generates a much better risk-adjusted average return for the investor. The green line shows how our DOA strategy dramatically improves the SR for the investor. An investor using our DOA approach would obtain a portfolio with an SR of 1.30 for the period, a number greater than around 92% of all fixed orders over time. The DOS strategy also performed well, with an SR equal to 1.21, also much higher than the W-RW approach.

Figure 4: Economic performance relative to the Wishart-Random-Walk model. i) Left panel: Annualized Management Fees ($\Phi$); ii) Right panel: Sharpe Ratios. All results are already net of transaction costs (TC = 10bps).
Again, although all fixed orders produced good portfolio results, there are considerable differences in final performance among them, meaning that order uncertainty plays an important role in final outcomes.

It is important to highlight that, although the Sharpe Ratio is a popular measure among practitioners, it tends to overestimate risk for dynamic portfolios (Marquering and Verbeek, 2004 and Beckmann et al., 2020). This motivates the use of a more robust measure of economic performance, considering explicitly the risk aversion and a utility function for the investor (as explained in Section 4). In terms of economic utility, the left panel of Figure 4 shows the annualized management fee (net of transaction costs) that a mean-variance investor would pay to switch from the W-RW to the proposed methods. This figure makes clear the strong performance of the DOA strategy. In fact, a mean-variance investor would be willing to pay the considerable fee of 638.5 basis points to migrate to the DOA strategy. The DOA requires a fee that is higher than around 98% of the fees for all fixed orders. As we noted for the statistical measures and SR, the investor that considers a fixed order over time gives up the opportunity to learn the time-varying contemporaneous dependencies among currencies and is subject to a large variance in possible final outcomes.

Although one may argue that the DOA and DOS strategies generate strong portfolio performances, a few fixed orders remain that have shown even higher performance, and one could consider this small set of fixed orders as candidates to forecast and use as inputs in portfolio allocation. However, we highlight here that the investor was not aware of the performance of those fixed orders in advance. In order to perform an asset allocation with the best order structures, the investor should consider those orders that had performed better at the time of the decision.
Hence, we investigate what an investor would have done in terms of portfolio allocation if she had considered the top fixed orders at the end of 2006 and then allocated her wealth using them, compared to our DOL approach. We consider an investor that observes all the data available until December of 2006 and computes the management fee that all fixed orders have generated compared to the W-RW. Figure 5 shows the out-of-sample performance between 2007 and 2016 that the investor would obtain if she simply believed in the superiority of the top 10 fixed orders in terms of the management fee at that time and used those 10 models to allocate her wealth. What we can confirm from this figure is that when the investor gives up the opportunity to learn about changes in the importance of different orders and ignores the dynamic uncertainty in the dependencies of each currency over time, the final economic performance is harmed. The investor that instead considers the fact that there is strong uncertainty about the correct order structure, and that it is continuously changing depending on the environment of the economy, finishes the out-of-sample period not just with a higher Sharpe Ratio, but with a much more significant management fee.

Therefore, we argue that it is quite difficult to anticipate what a good order is in advance, and even a good guess can lead to suboptimal future performance. Our approach is able to recognize which orders are starting to perform better or worse in a dynamic fashion and then attribute higher or lower probabilities to them.

Finally, in Appendix C we also show the results for different model settings, using the DOA approach as a benchmark. Table 1 and Figure 9 make clear the great statistical superiority of DOA when we allow the model to learn about the degree of variation in coefficients (TVP) and volatilities (SV). The statistical gains are even higher when compared to models with constant volatilities (CV).
Table 2 shows the strong economic performance of the different model settings, especially when compared to the Wishart Random Walk with constant volatilities (W-RW-CV).

Figure 5: Economic Performance 2007-2016: DOA and DOS against top 10 orders at the end of 2006.
As described before, we use the DDNM framework of Zhao et al. (2016) combined with our DOL approach to build TVP-VAR-SV models that are able to sequentially learn the contemporaneous relations among the inflation rate, the unemployment rate and interest rates via dynamic ordering probabilities. These macroeconomic series were also used in the small-scale VAR of Primiceri (2005). We use quarterly data for the US economy from 1953Q1 to 2015Q2. We leave the first 150 quarters (until 1990Q2) as a training period and perform an out-of-sample evaluation over the next 100 quarters (from 1990Q3 to 2015Q2). Inflation is measured as the year-over-year log growth rate of the GDP price index. The unemployment rate refers to all workers over 16 and the interest rate is the yield on 3-month Treasury bills. The R package bvarsv (Krueger, 2015) provides the macroeconomic data set, where the GDP price index was collected from the Federal Reserve Bank of Philadelphia; unemployment and interest rates were downloaded from the Federal Reserve Bank of St. Louis (https://fred.stlouisfed.org/).

Figure 6: Time-Varying Forgetting Factor $\alpha_t$ (Left panel) and Order Selection (Right panel)

We set our TVP-VAR-SV model to sequentially learn the use of two lags of all dependent variables. Each equation can adjust whether each lagged predictor enters the model or not for each period of time. Since we use quarterly data, it is common practice to induce a higher discount of information, since few observations span long periods of time and in just a few quarters the environment of the economy can dramatically change. Therefore, we consider a wider range of values for the discount factors $\delta$ and $\kappa$. As we did in the portfolio allocation problem, we still let our approach learn the degree of variation in coefficients, switching between higher and lower degrees of variation, or to constant coefficients if that is empirically supported.
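To make the triangular structure concrete, the regression vector for one equation can be assembled from an intercept, the two lags of all series, and the contemporaneous "parents" implied by the current ordering. This is an illustrative sketch; the numbers and the indexing convention (j equals the number of parents, i.e. the series' zero-based position in the ordering) are assumptions for the example:

```python
import numpy as np

def design_vector(j, y_lags, y_now):
    """Regression vector F_jt for equation j of the triangular TVP-VAR:
    [1, y_{t-1}', y_{t-2}', y_{<j,t}'], where y_{<j,t} are the j series
    that precede position j in the current ordering."""
    return np.concatenate(([1.0], y_lags[0], y_lags[1], y_now[:j]))

# Hypothetical quarterly observations: [inflation, unemployment, rate]
y_tm1 = np.array([2.1, 5.0, 1.5])   # lag 1
y_tm2 = np.array([2.0, 5.1, 1.4])   # lag 2
y_t   = np.array([2.2, 4.9, 1.6])   # current quarter
F_2t = design_vector(2, [y_tm1, y_tm2], y_t)   # last equation: two parents
```

Reordering the series changes only which contemporaneous values enter each equation's design vector, which is exactly the source of the ordering uncertainty that DOL learns about.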
This small-scale VAR has just three dependent economic variables, implying the existence of 3! = 6 possible orderings, where $y_t = [\text{Inflation}_t, \text{Unemployment}_t, \text{InterestRate}_t]'$ is one of the six possible fixed orders over time. Therefore, in our plots we show how DOL and the other fixed orders perform relative to this benchmark. Again, we argue that the main goal here is out-of-sample forecasting performance rather than structural identification. In terms of the forgetting factor $\alpha$, we also allow for a higher decay in model probabilities. The left panel of Figure 6 reports the selected $\alpha$. For the whole evaluation period it remains lower than one, meaning that a higher discount on predictive densities is induced. After the Great Financial Crisis, the model selected an even lower $\alpha$, which means that orders that performed well in the very recent past are preferred to orders that performed well further in the past. This behavior can be strongly related to changes in economic behavior.

The right panel of Figure 6 shows the orders that received the highest probability in each period of time. Note that there is no single order that dominates the others over the evaluation period. Indeed, over the whole out-of-sample period, the standard order used in Primiceri (2005) (the first one) was never selected in preference to the others.

Unlike the portfolio allocation problem, in the context of macroeconomic forecasts we focus only on measures of statistical accuracy: MSFE and LPDR. As a benchmark (in black), we select the standard fixed order where both inflation and unemployment are affected by monetary policy only after at least one lag. Hence, the benchmark does not consider order switching and learning. We also present the performance of the remaining 5 fixed orders (gray dots).
Again, the green line represents the statistical performance of the DOA approach and the red line that of the DOS.

In terms of point forecasts, the right panel of Figure 7 shows the clear superiority of the DOA and DOS approaches relative to the benchmark, representing an important out-of-sample accuracy improvement for the econometrician who learns sequentially from the data the differences in the dynamic contemporaneous dependencies among macroeconomic variables over time. There are two fixed orders that performed quite similarly to DOS and DOA. Interestingly, those orders place inflation at the end of the order structure instead of first. Hence, for these two orders, inflation is affected by monetary policy and unemployment contemporaneously, while monetary policy responds to inflation with at least a one-period lag.

In relation to density forecasting, the results are similar. DOA performed better than all fixed orders and the benchmark. Also note that every fixed order and the DOS approach outperform the standard benchmark order structure. Again, those orders with inflation in the last position of the $y_t$ vector showed great improvements compared to the benchmark.

Figure 7: Statistical performance relative to the benchmark. i) Left panel: Log Predictive Density Ratio (LPDR); ii) Right panel: Mean Square Forecast Error (MSFE)
The relevance of considering inflation at the end of the order structure is highlighted in Figure 8. It presents the accumulated log predictive density, a measure of how an order structure accumulates density forecast gains over time compared to the benchmark. In blue we emphasize those orders that place inflation at the end of the $y_t$ vector. Before 2007-2008, none of the orders was easily seen as superior. However, at the end of 2007 and the beginning of 2008, the statistical performance of the orders in blue started to grow abruptly. This growth in improvement lasted until around 2011 and, since then, those orders have maintained the accuracy gains obtained before. It seems that, for those stressed periods, an order structure that considers monetary policy contemporaneously independent of inflation predicted the future movements of the economy much better.

The great advantage of our DOL approach is that, as soon as an order structure captures some new kind of information in the economic environment, it starts to attribute higher probability to that order. This framework allows the DOA to accumulate higher density forecast improvements than all other orders.

Finally, we also show in Appendix C some additional results for the macroeconomic application. In Table 3 and Figure 10, we can note the great statistical improvement of the DOA approach compared to models that consider constant parameters (CP). The statistical gains are much higher when compared to models with constant volatilities (CV). These results are in line with the evidence of time-varying volatility patterns in economic series.

Figure 8: Accumulated Log Predictive Likelihood relative to the benchmark.
Despite the recent and growing literature on multivariate forecasting, where the popular Cholesky-style decomposition has been adopted as a flexible method to decouple a multivariate model into a set of univariate DLMs, little has been said about the differences in final decisions when considering different order structures. The main goal of our work is to solve this order uncertainty in an online fashion. We extend the class of Dynamic Dependency Network Models of Zhao et al. (2016) and propose the Dynamic Ordering Learning approach, a very fast and flexible method to deal with the uncertainty around the contemporaneous relations among dependent variables. We perform a dynamic asset allocation study where the investor is uncertain about the contemporaneous relations among different currencies, and we show that the Dynamic Ordering Learning approach generates not just significant statistical improvements, but also great economic gains for the investor. The results show that a mean-variance investor would be willing to pay a considerable annualized management fee to switch from the traditional Wishart Random Walk model to the DOL approach. Additionally, the DOA approach performs much better than the vast majority of models with fixed orders.

As a second application, we use a VAR structure within our DOL approach to forecast inflation, unemployment and interest rates. We show evidence that DOL is able to adapt to changes in the environment of the economy, giving higher probabilities to those orders that performed better in the recent past.
We give evidence that, during the Great Financial Crisis, our approach detected great improvements when changing the dynamic dependencies among economic variables, incorporating this new information. We show that the DOL was able to substantially increase both point and density forecast accuracy compared to a standard order structure commonly used in the macroeconomic literature.

In summary, we found evidence that taking into account different contemporaneous relations among variables over time improves statistical models and final decisions, since the environment of the economy is continuously changing and the dependencies among variables switch over time. We highlight that our framework can be extended to a broader perspective, being applied not just to portfolio allocation or macroeconomic forecasting, but in any field where the researcher is faced with the problem of multivariate sequential forecasts.

Appendix A: Filtering and Forecasting
We give details about the evolution and updating steps for the set of $m$ univariate DLMs, following Zhao et al. (2016) and similar to Fisher et al. (2020).
Posterior at $t-1$: At time $t-1$, we summarize the posterior for the coefficients $\theta_{j,t-1}$ and volatility $\sigma_{j,t-1}$ as:

$$\theta_{j,t-1},\, \sigma_{j,t-1}^{-2} \mid D_{t-1} \sim NG\left(m_{j,t-1},\, C_{j,t-1},\, n_{j,t-1},\, n_{j,t-1} s_{j,t-1}\right) \qquad (17)$$

Equation (17) is the joint posterior distribution of the model parameters at time $t-1$, known as a Normal-Gamma distribution. Hence, given these states, posteriors at $t-1$ evolve to priors at time $t$ via the evolution equations:

$$\theta_{j,t} = \theta_{j,t-1} + \omega_{j,t}, \quad \text{where} \quad \omega_{j,t} \sim N\left(0,\; W_{j,t}\,\sigma_{j,t}^{2}/s_{j,t-1}\right)$$
$$\sigma_{j,t}^{-2} = \sigma_{j,t-1}^{-2}\, \eta_{j,t} / \kappa_j, \quad \text{where} \quad \eta_{j,t} \sim Be\left(\kappa_j n_{j,t-1}/2,\; \left(1-\kappa_j\right) n_{j,t-1}/2\right)$$

where we can write $W_{j,t}$ as a discounted function of $C_{j,t-1}$, $W_{j,t} = C_{j,t-1}\left(1-\delta_j\right)/\delta_j$ for $0 < \delta_j \leq 1$, and the beta shock $\eta_{j,t}$ is governed by the discount factor $0 < \kappa_j \leq 1$. Discount factors $\delta$ and $\kappa$ lower than one induce higher degrees of variation in parameters; when the discount factors are equal to one, both coefficients and volatilities are constant. Hence, the prior for time $t$ is given by

$$\theta_{j,t},\, \sigma_{j,t}^{-2} \mid D_{t-1} \sim NG\left(a_{j,t},\, R_{j,t},\, r_{j,t},\, r_{j,t} s_{j,t-1}\right) \qquad (18)$$

where $r_{j,t} = \kappa_j n_{j,t-1}$, $a_{j,t} = m_{j,t-1}$ and $R_{j,t} = C_{j,t-1}/\delta_j$.

Forecasting $t$ at time $t-$
1: The predictive distribution for time $t$, given information at $t-1$, is a Student-t with $r_{j,t}$ degrees of freedom:

$$y_{j,t} \mid y_{<j,t}, D_{t-1} \sim T_{r_{j,t}}\left(f_{j,t},\, q_{j,t}\right)$$

with $f_{j,t} = F_{j,t}' a_{j,t}$ and $q_{j,t} = s_{j,t-1} + F_{j,t}' R_{j,t} F_{j,t}$. To make this explicit, we can partition

$$a_{j,t} = \begin{pmatrix} a_{j\beta,t} \\ a_{j\gamma,t} \end{pmatrix} \quad \text{and} \quad R_{j,t} = \begin{pmatrix} R_{j\beta,t} & R_{j\beta\gamma,t} \\ R_{j\beta\gamma,t}' & R_{j\gamma,t} \end{pmatrix}$$

so that we have

$$f_{j,t} = x_{j,t-1}' a_{j\beta,t} + y_{<j,t}' a_{j\gamma,t}$$
$$q_{j,t} = s_{j,t-1} + y_{<j,t}' R_{j\gamma,t}\, y_{<j,t} + 2\, y_{<j,t}' R_{j\beta\gamma,t}'\, x_{j,t-1} + x_{j,t-1}' R_{j\beta,t}\, x_{j,t-1}$$

Updating at time $t$: with the previous prior, the Normal-Gamma posterior is

$$\theta_{j,t},\, \sigma_{j,t}^{-2} \mid D_t \sim NG\left(m_{j,t},\, C_{j,t},\, n_{j,t},\, n_{j,t} s_{j,t}\right) \qquad (19)$$

with parameters following the standard updating equations:

Posterior mean vector: $m_{j,t} = a_{j,t} + A_{j,t} e_{j,t}$
Posterior covariance matrix factor: $C_{j,t} = \left(R_{j,t} - A_{j,t} A_{j,t}'\, q_{j,t}\right) z_{j,t}$
Posterior degrees of freedom: $n_{j,t} = r_{j,t} + 1$, with $s_{j,t} = s_{j,t-1} z_{j,t}$

where

One-step-ahead forecast error: $e_{j,t} = y_{j,t} - F_{j,t}' a_{j,t}$, with $q_{j,t} = s_{j,t-1} + F_{j,t}' R_{j,t} F_{j,t}$
Adaptive coefficient vector: $A_{j,t} = R_{j,t} F_{j,t} / q_{j,t}$
Volatility update factor: $z_{j,t} = \left(r_{j,t} + e_{j,t}^2 / q_{j,t}\right) / \left(r_{j,t} + 1\right)$

Appendix B: Joint Predictive Moments

After computing the predictive density for each equation $j$, we are able to compute the joint predictive density for $y_t$ conditional on the parents:

$$p\left(y_t \mid D_{t-1}\right) = \prod_{j=1}^{m} p\left(y_{j,t} \mid y_{<j,t}, D_{t-1}\right) \qquad (20)$$

which is simply the product of the $m$ already-computed univariate Student-t densities. Hence, after the series are decoupled for sequential analysis, they are recoupled for multivariate forecasting.
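The evolve-forecast-update cycle above can be sketched for a single equation as follows. Notation mirrors the text (equation index $j$ dropped), the evolution is applied through the discounted prior moments, and the demo numbers are hypothetical:

```python
import numpy as np

def dlm_step(m, C, n, s, F, y, delta=0.99, kappa=0.96):
    """One evolve-forecast-update cycle of the discounted univariate DLM.
    (m, C, n, s): Normal-Gamma posterior summaries from time t-1.
    F: regression vector; y: new observation.
    Returns updated (m, C, n, s) plus the forecast mean f and scale q."""
    # evolution via discounting: prior at t
    a, R = m, C / delta                 # a_t = m_{t-1}, R_t = C_{t-1}/delta
    r = kappa * n                       # r_t = kappa * n_{t-1}
    # one-step forecast: y_t | D_{t-1} ~ T_r(f, q)
    f = F @ a
    q = s + F @ R @ F
    # updating equations
    e = y - f                           # one-step-ahead forecast error
    A = R @ F / q                       # adaptive coefficient vector
    z = (r + e**2 / q) / (r + 1.0)      # volatility update factor
    m_new = a + A * e
    C_new = (R - np.outer(A, A) * q) * z
    return m_new, C_new, r + 1.0, s * z, f, q

# Hypothetical two-coefficient equation with constant parameters
m1, C1, n1, s1, f, q = dlm_step(np.zeros(2), np.eye(2), 5.0, 1.0,
                                np.array([1.0, 0.5]), 0.3,
                                delta=1.0, kappa=1.0)
```

With $\delta = \kappa = 1$ the recursion collapses to standard conjugate Bayesian regression updating; values below one inject parameter and volatility variation, as described in the text.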
In our decision analysis in Section 4, we are concerned with the mean and variance of this distribution for the portfolio allocation study:

$$f_t = E\left(y_t \mid D_{t-1}\right), \qquad Q_t = V\left(y_t \mid D_{t-1}\right) \qquad (21)$$

The triangular form in Equation (2) allows for a recursive computation of moments according to the order dependence. Since the first dependent variable has an empty parental set, the forecast mean and variance for $j = 1$ are given by

$$f_{1,t} = x_{1,t-1}' a_{1\beta,t}$$
$$q_{1,t} = \frac{r_{1,t}}{r_{1,t} - 2} \left( x_{1,t-1}' R_{1\beta,t}\, x_{1,t-1} + s_{1,t-1} \right)$$

inserting $f_{1,t}$ as the first element of $f_t$ and $q_{1,t}$ as the $(1,1)$ element of $Q_t$. For $j = 2, \ldots, m$, we can sequentially find the subsequent predicted moments. Their conditional distributions also follow Student's t-distribution, with predictive moments given by

$$f_{j,t} = x_{j,t-1}' a_{j\beta,t} + f_{<j,t}' a_{j\gamma,t}$$
$$q_{j,t} = \frac{r_{j,t}}{r_{j,t} - 2} \left( s_{j,t-1} + u_{j,t} \right) + a_{j\gamma,t}' Q_{<j,t}\, a_{j\gamma,t}$$

with $u_{j,t} = f_{<j,t}' R_{j\gamma,t}\, f_{<j,t} + \operatorname{tr}\left(R_{j\gamma,t} Q_{<j,t}\right) + 2\, x_{j,t-1}' R_{j\beta\gamma,t}\, f_{<j,t} + x_{j,t-1}' R_{j\beta,t}\, x_{j,t-1}$. Now we just need to plug $f_{j,t}$ in as the $j$-th element of $f_t$ and $q_{j,t}$ as the $(j,j)$ element of $Q_t$. Finally, the covariance vector between $y_{j,t}$ and its parents $y_{<j,t}$ is computed as $C\left(y_{j,t}, y_{<j,t} \mid D_{t-1}\right) = Q_{<j,t}\, a_{j\gamma,t}$. Hence, after reaching $j = m$, we have filled all elements of the $m$-vector $f_t$ and the $m \times m$ covariance matrix $Q_t$.

Appendix C: Additional Results

In this Section we make several comparisons of the DOA approach relative to different model settings. We show statistical and economic performance for models with constant parameters (CP), time-varying parameters (TVP), constant volatilities (CV) and time-varying volatilities (SV).
1. Portfolio Allocation
In the portfolio allocation problem, we use fixed values of the discount factors $\delta$ and $\kappa$ for the constant-parameter settings.

Table 1: Statistical Performance Relative to DOA
Model       MSFE   LPDR
DOA-CP-CV   1.01   −

Figure 9: Accumulated Log Predictive Likelihood relative to the DOA approach.

Table 2: Portfolio Performance
Model        SR     Fee
DOA-CP-CV    1.32   724.44
DOA-TVP-CV   1.18   616.13
DOA-CP-SV    1.33   625.94
DOS-CP-CV    1.26   615.16
DOS-TVP-CV   1.16   663.34
DOS-CP-SV    1.25   600.59
W-RW-CV      0.56   −

2. Macroeconomic Forecasting

In the macroeconomic forecasting problem, we use fixed values of the discount factors $\delta$ and $\kappa$ for the constant-parameter settings.

Table 3: Statistical Performance Relative to DOA
Model       MSFE   LPDR
DOA-CP-CV   1.16   −

Figure 10: Accumulated Log Predictive Likelihood relative to the DOA approach.

References

Beckmann, J., G. Koop, D. Korobilis, and R. A. Schüssler (2020): "Exchange rate predictability and dynamic Bayesian learning," Journal of Applied Econometrics, 35, 410–421.

Byrne, J. P., D. Korobilis, and P. J. Ribeiro (2016): "Exchange rate predictability in a changing world," Journal of International Money and Finance, 62, 1–24.

——— (2018): "On the sources of uncertainty in exchange rate predictability," International Economic Review, 59, 329–357.

Catania, L., S. Grassi, and F. Ravazzolo (2019): "Forecasting cryptocurrencies under model and parameter instability," International Journal of Forecasting, 35, 485–501.

Cenesizoglu, T. and A. Timmermann (2012): "Do return prediction models add economic value?" Journal of Banking & Finance, 36, 2974–2987.

Clark, T. E. and M. W. McCracken (2010): "Averaging forecasts from VARs with uncertain instabilities," Journal of Applied Econometrics, 25, 5–29.

Cogley, T. and T. J. Sargent (2005): "Drifts and volatilities: monetary policies and outcomes in the post WWII US," Review of Economic Dynamics, 8, 262–302.

Costa, L., J. Smith, T. Nichols, J. Cussens, E. P. Duff, T. R. Makin, et al. (2015): "Searching multiregression dynamic models of resting-state fMRI networks using integer programming," Bayesian Analysis, 10, 441–478.

Dangl, T. and M. Halling (2012): "Predictive regressions with time-varying coefficients," Journal of Financial Economics, 106, 157–181.

Del Negro, M. and G. E. Primiceri (2015): "Time varying structural vector autoregressions and monetary policy: a corrigendum," The Review of Economic Studies, 82, 1342–1345.

Della Corte, P., L. Sarno, and I. Tsiakas (2009): "An economic evaluation of empirical exchange rate models," The Review of Financial Studies, 22, 3491–3530.

Della Corte, P. and I. Tsiakas (2012): "Statistical and economic methods for evaluating exchange rate predictability," Handbook of Exchange Rates, 221–263.

Fisher, J. D., D. Pettenuzzo, C. M. Carvalho, et al. (2020): "Optimal asset allocation with multivariate Bayesian dynamic linear models," Annals of Applied Statistics, 14, 299–338.

Fleming, J., C. Kirby, and B. Ostdiek (2001): "The economic value of volatility timing," The Journal of Finance, 56, 329–352.

Gamerman, D. and H. F. Lopes (2006): "MCMC-Stochastic Simulation for Bayesian Inference,"
Chapman Hill . 9G
RUBER , L.
AND
M. W
EST (2016): “GPU-accelerated Bayesian learning and forecasting insimultaneous graphical dynamic linear models,”
Bayesian Analysis , 11, 125–149. 5H
OETING , J. A., D. M
ADIGAN , A. E. R
AFTERY , AND
C. T. V
OLINSKY (1999): “Bayesianmodel averaging: a tutorial,”
Statistical science , 382–401. 3J
EGADEESH , N.
AND
S. T
ITMAN (1993): “Returns to buying winners and selling losers:Implications for stock market efficiency,”
The Journal of finance , 48, 65–91. 18K
ANG , X., C. X IE , AND
M. W
ANG (2020): “A Cholesky-based estimation for large-dimensional covariance matrices,”
Journal of Applied Statistics , 47, 1017–1030. 5K
ASTNER , G.
AND
F. H
UBER (2020): “Sparse Bayesian vector autoregressions in hugedimensions,”
Journal of Forecasting . 15K
OOP , G.
AND
D. K
OROBILIS (2012): “Forecasting inflation using dynamic model averag-ing,”
International Economic Review , 53, 867–886. 3, 12——— (2013): “Large time-varying parameter VARs,”
Journal of Econometrics , 177, 185–198.4, 5, 9, 10, 11, 15, 16, 31——— (2014): “A new index of financial conditions,”
European Economic Review , 71, 101–116. 4K
RUEGER , F. (2015): “bvarsv: Bayesian Analysis of a Vector Autoregressive Modelwith Stochastic Volatility and Time-Varying Parameters,”
R package: cran. r-project.org/package= bvarsv . 25L
AVINE , I., M. L
INDON , M. W
EST , ET AL . (2020): “Adaptive variable selection for sequen-tial prediction in multivariate dynamic models,”
Bayesian Analysis . 4, 10, 1139
EVY , B. P. C.
AND
H. F. L
OPES (2020): “Time-Series Momentum Predictability via Dy-namic Bayesian Learning,”
PhD Thesis . 4, 12, 18L
ITTERMAN , R. B. (1986): “Forecasting with Bayesian vector autoregressions—five yearsof experience,”
Journal of Business & Economic Statistics , 4, 25–38. 15L
OPES , H. F., R. E. M C C ULLOCH , AND
R. S. T
SAY (2018): “Parsimony inducing priors forlarge scale state-space models,”
Technical Report 2018-08 . 4, 5, 7M
ADIGAN , D.
AND
A. E. R
AFTERY (1994): “Model selection and accounting for model un-certainty in graphical models using Occam’s window,”
Journal of the American StatisticalAssociation , 89, 1535–1546. 3M
ARQUERING , W.
AND
M. V
ERBEEK (2004): “The economic value of predicting stockindex returns and volatility,”
Journal of Financial and Quantitative Analysis , 39, 407–429.18, 23M C A LINN , K., K. A. A
ASTVEIT , J. N
AKAJIMA , AND
M. W
EST (2020): “MultivariateBayesian predictive synthesis in macroeconomic forecasting,”
Journal of the AmericanStatistical Association , 115, 1092–1110. 9, 16, 31M C A LINN , K.
AND
M. W
EST (2019): “Dynamic Bayesian predictive synthesis in timeseries forecasting,”
Journal of econometrics , 210, 155–169. 16M
EESE , R. A.
AND
K. R
OGOFF (1983): “Empirical exchange rate models of the seventies:Do they fit out of sample?”
Journal of international economics , 14, 3–24. 13M
OSKOWITZ , T. J., Y. H. O OI , AND
L. H. P
EDERSEN (2012): “Time series momentum,”
Journal of financial economics , 104, 228–250. 18N
AKAJIMA , J.
AND
M. W
EST (2013): “Bayesian analysis of latent threshold dynamic mod-els,”
Journal of Business & Economic Statistics , 31, 151–164. 16P
RADO , R.
AND
M. W
EST (2010):
Time series: modeling, computation, and inference , CRCPress. 9, 31P
RIMICERI , G. E. (2005): “Time varying structural vector autoregressions and monetarypolicy,”
The Review of Economic Studies , 72, 821–852. 4, 5, 7, 15, 16, 25, 26, 27Q
UEEN , C. M., B. J. W
RIGHT , AND
C. J. A
LBERS (2008): “Forecast covariances in the linearmultiregression dynamic model,”
Journal of Forecasting , 27, 175–191. 440
AFTERY , A. E., M. K
ÁRN `Y , AND
P. E
TTLER (2010): “Online prediction under modeluncertainty via dynamic model averaging: Application to a cold rolling mill,”
Techno-metrics , 52, 52–66. 3, 5, 9, 10, 11, 12, 31R
OSSI , B. (2013): “Exchange rate predictability,”
Journal of economic literature , 51, 1063–1119. 13S
HIROTA , S., Y. O
MORI , H. F. L
OPES , AND
H. P
IAO (2017): “Cholesky realized stochasticvolatility model,”
Econometrics and Statistics , 3, 34–59. 4, 7S
IMS , C. A. (1980): “Macroeconomics and reality,”
Econometrica: journal of the EconometricSociety , 1–48. 15W
EST , M. (2020): “Bayesian forecasting of multivariate time series: scalability, structureuncertainty and decisions,”
Annals of the Institute of Statistical Mathematics , 72, 1–31. 4, 7W
EST , M.
AND
J. H
ARRISON (1997):
Bayesian forecasting and dynamic models , Springer Sci-ence & Business Media. 4, 8, 9Z
HAO , Z. Y., M. X IE , AND
M. W
EST (2016): “Dynamic dependence networks: Financialtime series forecasting and portfolio decisions,”
Applied Stochastic Models in Business andIndustry , 32, 311–332. 4, 5, 6, 7, 10, 11, 15, 25, 29, 31Z
HENG , H., K.-W. T
SUI , X. K
ANG , AND
X. D
ENG (2017): “Cholesky-based model av-eraging for covariance matrix estimation,”