General Bayesian time-varying parameter VARs for predicting government bond yields
Manfred M. Fischer, Niko Hauzenberger, Florian Huber, Michael Pfarrhofer
March 1, 2021
Time-varying parameter (TVP) regressions commonly assume that time-variation in the coefficients is determined by a simple stochastic process such as a random walk. While such models are capable of capturing a wide range of dynamic patterns, the true nature of time variation might stem from other sources, or arise from different laws of motion. In this paper, we propose a flexible TVP VAR that assumes the TVPs to depend on a panel of partially latent covariates. The latent parts of these covariates differ in their state dynamics and thus capture smoothly evolving or abruptly changing coefficients. To determine which of these covariates are important, and thus to decide on the appropriate state evolution, we introduce Bayesian shrinkage priors to perform model selection. As an empirical application, we forecast the US term structure of interest rates and show that our approach performs well relative to a set of competing models. We then show how the model can be used to explain structural breaks in coefficients related to the US yield curve.
JEL: C11, C30, E37, E43
Keywords: Bayesian shrinkage, interest rate forecasting, latent effect modifiers, MCMC sampling, time-varying parameter regression

*Corresponding author: Manfred M. Fischer, Vienna University of Economics and Business, Welthandelsplatz 1, A-1020 Vienna, Austria. E-mail: manfred.fi[email protected]
Vienna University of Economics and Business, University of Salzburg

1 Introduction
Time-varying parameter vector autoregressive (TVP-VAR) models are commonly used in finance and macroeconomics to capture dynamic relations across variables, regime shifts and/or structural changes in economic processes (see Primiceri, 2005; Cogley and Sargent, 2005; Dangl and Halling, 2012). These models typically assume that the parameters evolve over time according to a simple stochastic process such as a random walk. While rather flexible and parsimonious, this assumption does not allow us to examine the extent to which covariates cause changes in the time-varying parameters (TVPs) over time. Moreover, wrongly assuming a random walk state equation could reduce predictive accuracy because it essentially implies a smoothness prior on the coefficients. This might be at odds with the rapid shifts we have observed in financial time series such as bond yields.

The literature has dealt with the issue of selecting the appropriate law of motion for the TVPs by estimating different models separately and then using model selection criteria to discriminate between competing specifications (see, e.g., Sims and Zha, 2006; Koop et al., 2009; Hauzenberger, 2020). But, to the best of our knowledge, no attempt has been made to develop models that rely on a large set of competing laws of motion for the coefficients and decide which one describes the data best.

This paper proposes a flexible approach to TVP-VARs that efficiently integrates out uncertainty with respect to the state evolution equation. The approach assumes that the TVPs depend on a potentially large panel of covariates. These covariates are commonly labeled effect modifiers, which can be partially latent and might feature their own state equations. In case they are observed, we obtain a model that is closely related to the varying coefficient model originally proposed in Hastie and Tibshirani (1993).
The main advantage of using observed, as opposed to latent, effect modifiers is that we can investigate the driving sources of parameter change. This feature is important if the researcher is interested in investigating why relations between variables in a VAR change over time, and to what extent these changes are explained by the observed effect modifiers.

Careful selection of these effect modifiers is crucial. On the one hand, deciding on the appropriate set of observed quantities is difficult, and a large set of candidates could arise. One key objective of this paper is to provide techniques to select promising subsets. On the other hand, appropriately selecting the latent effect modifiers allows us to capture situations where parts of the coefficients evolve smoothly whereas others move more abruptly. The latter behavior of the TVPs is often found for US macroeconomic data (see Sims and Zha, 2006), whereas the former is consistent with financial time series such as bond yields or stock returns (see Dangl and Halling, 2012; Huber et al., 2019).

Since large VARs often include both macroeconomic and financial quantities, a successful model should be able to accommodate both types of structural change or even rely on linear combinations of them. This is precisely what we aim to achieve in this paper. Our approach is capable not only of answering the question why coefficients change, but also of inferring an appropriate law of motion using a broad set of latent quantities, each equipped with its own state evolution equation. These unobserved quantities range from latent factors that follow a random walk to Markov switching indicators that allow a subset of parameters to switch between a small number of regimes. Our model approach nests several alternatives proposed in the literature such as the TVP-VAR of Primiceri (2005) and Cogley and Sargent (2005) or the reduced-rank model of Chan et al. (2020).

This large degree of flexibility, however, comes with two concerns.
The first is that overfitting problems can easily arise. We overcome these by using Bayesian shrinkage priors. Our prior is a variant of the well-known Horseshoe prior (see Carvalho et al., 2010) that allows us to shrink coefficients associated with irrelevant effect modifiers towards zero. The second concern relates to computation. Since inclusion of a large number of endogenous variables in a VAR quickly leads to a high-dimensional parameter space, we propose a computationally efficient Markov chain Monte Carlo (MCMC) algorithm. To circumvent mixing issues, we rely on different parameterizations of the model during MCMC sampling. The corresponding novel algorithm thus provides a second important contribution of the paper.

In our empirical work, we use the model approach to forecast the US term structure of interest rates. We investigate the empirical properties of our approach using two information sets in the underlying VAR. First, we include several interest rates at different maturities directly as endogenous variables. Second, we consider a three-factor Nelson-Siegel model for the term structure of interest rates as in Diebold and Li (2006) and Diebold et al. (2008). Adopting a long hold-out period that includes several recessionary episodes, our approach improves upon a wide range of competing models. While improvements for point forecasts are often muted, we find that our proposed model yields favorable density predictions. The predictive exercise is complemented by a comprehensive discussion of patterns in time-variation and their sources. Moreover, we show how our approach can be used to analyze low frequency relations between the observed quantities in the model over time.

The rest of the paper is structured as follows.
Section 2 introduces the econometric framework, which includes the general form of the TVP-VAR, a flexible law of motion for the latent states as well as the effect modifiers which crucially impact the state dynamics. This section, moreover, introduces the Bayesian prior setup and techniques for posterior and predictive inference. Section 3 applies the model approach to the term structure of US interest rates. It also serves to illustrate key model features and to highlight the predictive capabilities of the approach in an out-of-sample forecasting exercise. The last section summarizes and concludes the paper. Additional technical details and further empirical results are provided in the Appendix.
Let $y_t$ denote an $M \times 1$ vector of endogenous variables observed at time $t = 1, \dots, T$. We assume that $y_t$ depends on its $P$ lags, which we store in a $K = MP$-dimensional vector $x_t = (y'_{t-1}, \dots, y'_{t-P})'$. Then the basic TVP-VAR can be written as a linear multivariate regression model:

$$y_t = (I_M \otimes x'_t)\, \beta_t + \epsilon_t,$$

where $\beta_t$ represents a set of $k = MK$ dynamic regression coefficients and $\epsilon_t \sim \mathcal{N}(0_M, \Sigma_t)$ is a vector Gaussian shock process with time-varying $M \times M$-dimensional variance-covariance matrix $\Sigma_t$. We assume that $\Sigma_t$ can be decomposed as follows:

$$\Sigma_t = Q_t H_t Q'_t.$$

Here, $Q_t$ denotes an $M \times M$ lower triangular matrix with unit diagonal whose $v\,(= M(M-1)/2)$ free elements are collected in the vector $q_t$. $H_t = \text{diag}(e^{h_{1t}}, \dots, e^{h_{Mt}})$ is a diagonal matrix with $h_{jt}$ $(j = 1, \dots, M)$ representing time-varying log-volatilities. These are assumed to evolve according to an AR(1) process:

$$h_{jt} = \mu_j + \psi_j (h_{jt-1} - \mu_j) + \nu_{jt}, \quad \nu_{jt} \sim \mathcal{N}(0, \varsigma_j^2).$$

$\mu_j$ is the unconditional mean, $\psi_j$ the persistence parameter and $\varsigma_j^2$ the variance of the log-volatility process for equation $j$.

In what follows, we rewrite the TVP-VAR using its non-centered parameterization (Frühwirth-Schnatter and Wagner, 2010):

$$y_t = (I_M \otimes x'_t)(\beta + \tilde{\beta}_t) + (Q + \tilde{Q}_t)\, \varepsilon_t, \quad \varepsilon_t \sim \mathcal{N}(0_M, H_t).$$

$\beta$ denotes a $k$-dimensional vector of constant coefficients, and $\tilde{\beta}_t = \beta_t - \beta$. This parameterization allows us to disentangle time-invariant effects (encoded by $\beta$) from time-varying effects (encoded by $\tilde{\beta}_t$) for the regressors. For the decomposed variance-covariance matrix, we have $Q$, a lower triangular matrix with ones on the diagonal capturing the constant part of the covariances. $\tilde{Q}_t = Q_t - Q$ is the corresponding lower triangular matrix with zero diagonal elements containing the time-varying part.
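The decomposition and the log-volatility process above can be illustrated with a minimal numerical sketch. The function names, the row-by-row ordering of the free elements in $q_t$, the starting value of the AR(1) recursion and all parameter values below are illustrative assumptions, not choices taken from the paper.

```python
import numpy as np

def build_sigma(q_t, h_t):
    """Assemble Sigma_t = Q_t H_t Q_t' from the free elements q_t and log-volatilities h_t."""
    M = h_t.shape[0]
    Q = np.eye(M)                            # unit diagonal
    Q[np.tril_indices(M, k=-1)] = q_t        # assumed ordering: strictly lower triangle, row by row
    H = np.diag(np.exp(h_t))                 # H_t = diag(exp(h_1t), ..., exp(h_Mt))
    return Q @ H @ Q.T

def simulate_log_vol(T, mu, psi, varsigma2, rng):
    """Simulate h_t = mu + psi * (h_{t-1} - mu) + nu_t with nu_t ~ N(0, varsigma2)."""
    h = np.empty(T)
    prev = mu                                # assumption: start at the unconditional mean
    for t in range(T):
        prev = mu + psi * (prev - mu) + rng.normal(0.0, np.sqrt(varsigma2))
        h[t] = prev
    return h

rng = np.random.default_rng(0)
M = 3
# One log-volatility path per equation (hypothetical parameter values).
h_paths = np.column_stack(
    [simulate_log_vol(200, -1.0, 0.95, 0.05, rng) for _ in range(M)]
)
q = rng.normal(size=M * (M - 1) // 2)        # v = M(M-1)/2 free elements
Sigma = build_sigma(q, h_paths[-1])          # covariance matrix in the last period
```

Because $Q_t$ has a unit diagonal, the determinant of $\Sigma_t$ equals $\exp(\sum_j h_{jt})$, which gives a quick sanity check on the construction.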
The free elements of $Q$ and $\tilde{Q}_t$ are collected in the $v \times 1$ vectors $q$ and $\tilde{q}_t$, respectively.

In this paper, the focus is on modeling the $N\,(= k + v)$-dimensional vector $\tilde{\gamma}_t = (\tilde{\beta}'_t, \tilde{q}'_t)'$ and the constant part $\gamma = (\beta', q')'$. The literature typically assumes that the transition distribution $p(\tilde{\gamma}_t \mid \tilde{\gamma}_{t-1})$ is given by:

$$\tilde{\gamma}_t \mid \tilde{\gamma}_{t-1} \sim \mathcal{N}(\tilde{\gamma}_{t-1}, V) \quad \text{with} \quad \tilde{\gamma}_0 = 0_N.$$

This law of motion implies that the expected value of $\tilde{\gamma}_t$, conditional on $\tilde{\gamma}_{t-1}$, equals $\tilde{\gamma}_{t-1}$, and that the amount of time-variation is determined by the $N \times N$-dimensional process innovation variance-covariance matrix $V$. This matrix is often assumed to be diagonal. Notice that if selected diagonal elements in $V$ are equal to zero, the corresponding regression coefficients are constant.

Estimation and inference is typically carried out via Bayesian methods. The recent literature proposes using shrinkage priors to allow for data-based selection of those coefficients which should be time-varying or constant. This already leads to substantial improvements in predictive accuracy but does not tackle the fundamental question whether the coefficients are better characterized by a random walk, a change-point process, or by mixtures of these.

As discussed in the previous sub-section, the typical assumption is that $\tilde{\gamma}_t$ follows a random walk process. In addition, the shocks to the random walk state equation are often assumed to feature a positive error variance. We relax both assumptions to allow for more flexibility.

The random walk assumption is relaxed by assuming that the time-varying part stored in $\tilde{\gamma}_t$ depends on a set of $R$ additional factors $z_t$. These $z_t$ are the effect modifiers mentioned in the introduction that can be observed or latent. The relationship between the TVPs and $z_t$ is given by:

$$\tilde{\gamma}_t = \Lambda z_t + \eta_t. \quad (1)$$

$\Lambda$ denotes an $N \times R$ matrix of regression coefficients, and $\eta_t \sim \mathcal{N}(0_N, \Omega)$ is a Gaussian error term with diagonal error variance-covariance matrix $\Omega = \text{diag}(\omega_1, \dots, \omega_N)$.
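To make Eq. (1) concrete, the following sketch simulates $\tilde{\gamma}_t$ for a given panel of effect modifiers. The function name, the dimensions and the use of random-walk modifiers are assumptions made purely for illustration; any $T \times R$ array of observed or simulated modifiers could be passed instead.

```python
import numpy as np

def simulate_tvps(Z, Lam, omega, rng):
    """Draw gamma_tilde_t = Lam z_t + eta_t with eta_t ~ N(0, diag(omega)).

    Z is T x R (effect modifiers), Lam is N x R (loadings), omega has length N.
    Returns a T x N array whose t-th row is gamma_tilde_t.
    """
    T = Z.shape[0]
    N = Lam.shape[0]
    eta = rng.normal(size=(T, N)) * np.sqrt(omega)   # idiosyncratic shocks
    return Z @ Lam.T + eta

rng = np.random.default_rng(1)
T, N, R = 150, 8, 2
Z = np.cumsum(rng.normal(size=(T, R)), axis=0)       # assumed: random-walk modifiers
Lam = rng.normal(size=(N, R))
Lam[:, 1] = 0.0                                      # a zero column removes the second modifier

# With Omega = 0 the TVPs are an exact linear function of the remaining modifier.
gamma_factor = simulate_tvps(Z, Lam, np.zeros(N), rng)
# With Lam = 0 and Omega = 0 the time-varying part vanishes altogether.
gamma_const = simulate_tvps(Z, np.zeros((N, R)), np.zeros(N), rng)
```

In the first case all $N$ coefficient paths are scalar multiples of a single modifier, so the $T \times N$ array has rank one, which is the kind of reduced-rank co-movement the loadings matrix is meant to capture.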
If $R \ll N$, the coefficients feature a factor structure and co-move according to the effect modifiers in $z_t$. The relationship between $\tilde{\gamma}_t$ and $z_t$ is determined by the factor loadings in $\Lambda$. For instance, if the $j$th column of $\Lambda$, $\lambda_j$, is equal to zero, the corresponding $j$th factor in $z_t$ does not enter the model and thus has no influence on $\tilde{\gamma}_t$.

The specific selection of $z_t$ is crucial for determining the dynamics of $\tilde{\gamma}_t$. Appropriate choice of $z_t$ yields a variety of important special cases that depend on the specific values of $\Lambda$ and $\Omega$ as well as on the composition of $z_t$. In this sub-section, we briefly focus on two special cases that arise independently of the choice of $z_t$. The next sub-section deals with cases that arise if $z_t$ is suitably chosen.

These two cases are the following. If $\Lambda = 0_{N \times R}$, with $0_{N \times R}$ being an $N \times R$ matrix of zeros, we obtain a random coefficients model that assumes that the regression coefficients follow a white noise process (for some recent papers that follow this approach, see Korobilis, 2019; Hauzenberger et al., 2019). The second special case arises if both $\Lambda = 0_{N \times R}$ and $\Omega = 0_{N \times N}$. In this case, we obtain a standard constant parameter regression model.

Before we discuss the choice of $z_t$, it is worth noting that if $z_t$ is (partially) latent, the model in Eq. (1) is not identified. Since our object of interest is $\gamma_t$, this poses no major issue. If we wish to structurally interpret elements in $z_t$, standard identification strategies from the literature on dynamic factor models can be used (see, e.g., Geweke and Zhou, 1996; Aguilar and West, 2000; Stock and Watson, 2011).

2.3 Possible Choices for the Effect Modifiers

The specific choice of $z_t$ is crucial in determining how $\tilde{\gamma}_t$ behaves over time. Hence, by suitably choosing the elements in $z_t$, our model approach is related to the following specifications:

• Chan et al.
(2020): We assume that $z_t$ consists exclusively of a sequence of $R = R_\tau$ latent factors $\tau_t$, which follow a multivariate random walk: $\tau_t = \tau_{t-1} + \nu_t$, $\nu_t \sim \mathcal{N}(0, V_\tau)$. $V_\tau = \text{diag}(v_1, \dots, v_{R_\tau})$ denotes a diagonal variance-covariance matrix with $v_j$ being process innovation variances that determine the smoothness of the elements in $\tau_t$. Note that setting $v_j$ close to zero effectively implies that $\tau_{jt}$, the $j$th element in $\tau_t$, is constant. This model implies a factor structure in $\tilde{\gamma}_t$ if $R_\tau \ll N$.

• Primiceri (2005): If $R = R_\tau = N$, the elements in $z_t$ are random walks, and $\Lambda = I_N$, we obtain a standard time-varying parameter model. Assuming that the covariances are constant we obtain the model put forth in Cogley and Sargent (2005).

• Sims and Zha (2006): A Markov switching model can be obtained by setting $z_t = S_t$, with $S_t \in \{0, 1\}$ denoting a binary indicator with transition probabilities given by $p(S_t = i \mid S_{t-1} = j) = p_{ij}$ for $i, j = 0, 1$, with $p_{ij}$ denoting the $(i, j)$th element of a $2 \times 2$ transition probability matrix $P$. Inclusion of this random quantity allows us to capture structural breaks in $\tilde{\gamma}_t$ that are common to all coefficients.

• Caggiano et al. (2017): Assuming that $z_t$ is exclusively composed of observed quantities we obtain a regression model with interaction terms.

These examples show that our model, conditional on choosing a suitable set of effect modifiers, is capable of mimicking several prominent specifications in the literature. Since the question of the appropriate state evolution equation is essentially a model selection issue, we simply specify $z_t$ to include most (with the exception of the $R = N$ setup) of the modifiers discussed above.

More precisely, we set $z_t$ as follows:

$$z_t = (r'_t, S'_t, \tau'_t)'.$$

Here we let $r_t$ denote a set of $R_r$ observed factors, and the dimension of $z_t$ is thus $R = R_r + R_S + R_\tau$ with $R_S = M$. To allow for additional flexibility we assume that $\tau_t = (\tau'_{1t}, \dots, \tau'_{Mt})'$ with $\tau_{jt}$ being equation-specific factors of dimension $R_{\tau j}$ (and thus $R_\tau = \sum_j R_{\tau j}$). Assuming that $R_{\tau i} = R_{\tau j} = \delta$ for all $i, j$, we have $R_\tau = \delta M$ latent random walk factors. Likewise, we estimate a separate Markov switching indicator $S_{jt}$ per equation (and thus $S_t = (S_{1t}, \dots, S_{Mt})'$, with the corresponding transition probability matrix denoted by $P_j$).

The corresponding loadings matrix $\Lambda$ is structured such that the loadings in equation $j$ associated with the factors $\tau_{it}$ and $S_{it}$ for $i \neq j$ equal zero. This assumption strikes a balance between assuming a large number of latent factors to achieve maximum flexibility (and thus risk overfitting) and using a rather parsimonious model (with the risk of being too simplistic). Recent contributions use similar assumptions on the state evolutions (see Koop et al., 2009; Maheu and Song, 2018). As opposed to these papers, our approach offers more flexibility since, if necessary, the presence of the idiosyncratic shocks to the TVPs allows for deviations if the factor structure does not represent the data well.

Our specification implies that, depending on the factor loadings $\Lambda$, the evolution of $\tilde{\gamma}_t$ might be a combination of a set of random walk factors, a Markov switching process and some observed quantities. To single out irrelevant elements in $z_t$, one could simply set the corresponding columns in $\Lambda$ equal to zero. In this paper, we achieve this through a Bayesian shrinkage prior. The next sub-section discusses our priors in more detail.

2.4 The Prior Setup

The discussion in Sub-section 2.1 shows that our model approach nests a variety of competing models. To select the appropriate model variant and alleviate over-parameterization concerns, we opt for a Bayesian approach to introduce shrinkage.
Here, we summarize the priors we impose on key parameters.

In light of the specific choice of $z_t$, we introduce some additional notation to clarify details on our prior implementation. Let us assume that $\Lambda$ is composed of the following matrices:

$$\Lambda = [\Lambda_r \;\; \Lambda_S \;\; \Lambda_\tau],$$

where $\Lambda_r$ is an $N \times R_r$ matrix of loadings related to the observed quantities, $\Lambda_S$ denotes an $N \times R_S$ matrix of loadings related to $S_t$, and $\Lambda_\tau$ represents an $N \times R_\tau$ factor loadings matrix associated with $\tau_t$.

For imposing shrinkage we rely on variants of the horseshoe prior (Carvalho et al., 2010). While in principle any global-local shrinkage prior may be used, we choose the horseshoe prior due to its excellent shrinkage properties and the lack of tuning parameters. In particular, we specify a column-wise horseshoe prior on the loadings matrix. Let $\Lambda_j$ denote a sub-matrix of the free elements in $\Lambda$ corresponding to the $j$th equation, and $\lambda_{ji}$ mark the $i$th column of this matrix. $\lambda_{ji,\ell}$ refers to the $\ell$th element of this vector. The prior is given by:

$$\lambda_{ji,\ell} \mid \kappa_{ji,\ell}, \delta_{ji} \sim \mathcal{N}(0, \kappa_{ji,\ell}^2 \delta_{ji}^2), \quad \kappa_{ji,\ell} \sim \mathcal{C}^+(0, 1), \quad \delta_{ji} \sim \mathcal{C}^+(0, 1).$$

Here, $\mathcal{C}^+(0, 1)$ denotes the half Cauchy distribution, and $\delta_{ji}$ is an equation- and column-specific global shrinkage factor, while $\kappa_{ji,\ell}$ is a local scaling parameter.

To further regularize our potentially huge-dimensional parameter space, we impose an equation-wise horseshoe prior on the constant part of the regression coefficients and covariances in $\gamma_j$ corresponding to the $j$th equation. Let $\gamma_{ji}$ denote the $i$th element of the vector. The setup is similar to the one of the loadings matrix, and given by the hierarchical structure:

$$\gamma_{ji} \mid \xi_{ji}, \zeta_j \sim \mathcal{N}(0, \xi_{ji}^2 \zeta_j^2), \quad \xi_{ji} \sim \mathcal{C}^+(0, 1), \quad \zeta_j \sim \mathcal{C}^+(0, 1).$$

The hyperparameter $\zeta_j$ is an equation-specific global shrinkage factor, while the $\xi_{ji}$'s are local scalings.

We mentioned earlier that it is often assumed that shocks to the states feature positive error variances. To introduce shrinkage of the diagonal elements of $\Omega$ towards zero, we impose a horseshoe prior also on the square root of the innovation variances of the measurement errors in Eq. (1). This prior is specified in an equation-specific manner. For equation $j$, let $\omega_j$ denote a $v_j\,(= j - 1 + k)$-dimensional vector which stores the diagonal elements in $\Omega$ associated with the $j$th equation. This includes the process innovation variances on the $k$ regression coefficients and the $j - 1$ free elements in the $j$th row of $Q_t$. The square root of the $i$th element of $\omega_j$, $\sqrt{\omega_{ji}}$, features the following prior hierarchy:

$$\sqrt{\omega_{ji}} \mid \varpi_{ji}, \vartheta_j \sim \mathcal{N}(0, \varpi_{ji}^2 \vartheta_j^2), \quad \varpi_{ji} \sim \mathcal{C}^+(0, 1), \quad \vartheta_j \sim \mathcal{C}^+(0, 1).$$

Choosing a Gaussian prior on the square root of the variance in the first level of the hierarchy implies a Gamma prior on $\omega_{ji}$, with $\omega_{ji} \mid \varpi_{ji}, \vartheta_j \sim \mathcal{G}\left(1/2, \varpi_{ji}^{-2} \vartheta_j^{-2} / 2\right)$, see also Frühwirth-Schnatter and Wagner (2010). The hyperparameters $\vartheta_j$ and $\varpi_{ji}$ are again equation-specific global and local shrinkage parameters.
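A short simulation can help build intuition for these hierarchies. The sketch below draws one column of loadings from a horseshoe prior with squared local and global scales entering the variance, and checks by Monte Carlo that a zero-mean Gaussian on $\sqrt{\omega}$ with variance $s^2$ corresponds to a Gamma distribution on $\omega$ with shape $1/2$ and rate $1/(2s^2)$. All function names and numerical values are illustrative assumptions rather than settings from the paper.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

def half_cauchy(size, rng):
    """Draw from C+(0, 1) as the absolute value of a standard Cauchy variate."""
    return np.abs(rng.standard_cauchy(size))

def draw_horseshoe_column(n, rng):
    """Draw n loadings lambda_l ~ N(0, kappa_l^2 * delta^2) with half-Cauchy scales."""
    delta = half_cauchy(1, rng)[0]           # global (column-specific) scale
    kappa = half_cauchy(n, rng)              # local scales, one per loading
    return rng.normal(0.0, kappa * delta)    # second argument is a standard deviation

lam = draw_horseshoe_column(1000, rng)

# Monte Carlo check of the implied Gamma prior on omega (s2 is a hypothetical value
# standing in for the product of squared local and global scales).
s2 = 0.3
omega = rng.normal(0.0, np.sqrt(s2), size=200_000) ** 2
gamma_implied = stats.gamma(a=0.5, scale=2.0 * s2)   # scale = 1 / rate = 2 * s2
ks = stats.kstest(omega, gamma_implied.cdf)          # small statistic: distributions agree
```

The Gamma density with shape $1/2$ is unbounded at zero, which is why this formulation can shrink an innovation variance all the way to zero, unlike an inverse Gamma prior.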
Furthermore, we set $V_\tau = I_{R_\tau}$ and thus impose shrinkage through the factor loadings in $\Lambda_\tau$ (see Chan et al., 2020).

On the parameters of the state equation of the log-volatility processes $\mu_j$, $\psi_j$ and $\varsigma_j^2$, we use the setup proposed in Kastner and Frühwirth-Schnatter (2014). That is, we assume a Gaussian prior on the unconditional mean, $\mu_j \sim \mathcal{N}(0, \cdot)$, a Beta prior on the transformed autoregressive parameter, $(\psi_j + 1)/2 \sim \mathcal{B}(5, \cdot)$, and a Gamma prior, $\varsigma_j^2 \sim \mathcal{G}(\cdot, \cdot)$, for $j = 1, \dots, M$.

For the equation-specific transition probabilities $P_j$ of the Markov switching indicators, we assume that the $(i, i)$th element $p_{j,ii}$ arises from a Beta distribution with state-specific hyperparameters $e$, for $i = 0, 1$ and $j = 1, \dots, M$; the remaining transition probabilities follow from the restriction that the probabilities of leaving and remaining in a state sum to one. In the empirical application, the hyperparameters $e$ are set to either 10 or 1, in order to weakly push each $S_{jt}$ towards a single state a priori.

To simulate from the full posterior distribution we develop an efficient MCMC algorithm. Since full-system estimation of the VAR quickly becomes computationally cumbersome, we rely on the equation-by-equation algorithm suggested in Carriero et al. (2019).

Conditional on $Q_t$, one can state the VAR as a system of (conditionally) independent equations. The first equation of this system is given by:

$$y_{1t} = x'_t (\beta_1 + \tilde{\beta}_{1t}) + \varepsilon_{1t},$$

and the $j$th equation ($j > 1$) by:

$$y_{jt} = x'_t (\beta_j + \tilde{\beta}_{jt}) + u'_{jt} (q_j + \tilde{q}_{jt}) + \varepsilon_{jt}. \quad (2)$$

$\beta_j$ and $\tilde{\beta}_{jt}$ denote the $j$th subvectors of the constant and time-varying parts in $\beta_t$ with $\beta_t = (\beta'_{1t}, \dots, \beta'_{Mt})'$ and $u_{jt} = (\varepsilon_{1t}, \dots, \varepsilon_{j-1,t})'$.
The $(j - 1)$-dimensional vectors $q_j$ and $\tilde{q}_{jt}$ store the constant and time-varying parts of the free elements in the $j$th row of $Q_t$. This approach allows us to estimate the different elements of $\gamma_t$ that relate to the $M$ equations independently from each other, conditional on the shocks to the preceding $j - 1$ equations. The $j$th equation can be written compactly as:

$$y_{jt} = m'_{jt} (\gamma_j + \tilde{\gamma}_{jt}) + \varepsilon_{jt}, \quad (3)$$

where $m_{jt} = (x'_t, u'_{jt})'$, $\tilde{\gamma}_{jt} = \gamma_{jt} - \gamma_j$, $\gamma_{jt}$ refers to the TVPs associated with the $j$th equation in $\gamma_t$, and $\gamma_j$ denotes the corresponding constant part. All the following steps are carried out on an equation-by-equation basis, making use of the regression form in Eq. (3). For notational simplicity, we assume that all elements in $z_t$ are latent. In light of the discussion in Sub-section 2.3, this implies that $z_{jt} = (\tau'_{jt}, S_{jt})'$; the extension to include observed factors is trivial.

Sampling $z_{jt}$. Conditional on the remaining quantities of the model, we simulate the latent (random walk and Markov switching) components in $z_{jt}$ by integrating out $\tilde{\gamma}_{jt}$. This is achieved by rewriting Eq. (3) as:

$$\tilde{y}_{jt} = m'_{jt} \Lambda_j z_{jt} + m'_{jt} \eta_{jt} + \varepsilon_{jt}, \quad (4)$$

with $\tilde{y}_{jt} = y_{jt} - m'_{jt} \gamma_j$. Defining $\tilde{m}'_{jt} = m'_{jt} \Lambda_j$ and $\hat{\varepsilon}_{jt} = m'_{jt} \eta_{jt} + \varepsilon_{jt}$ allows us to cast Eq. (4) as a simple linear regression model:

$$\tilde{y}_{jt} = \tilde{m}'_{jt} z_{jt} + \hat{\varepsilon}_{jt}, \quad \hat{\varepsilon}_{jt} \sim \mathcal{N}(0, m'_{jt}\, \text{diag}(\omega_j)\, m_{jt} + e^{h_{jt}}). \quad (5)$$

This parameterization has the advantage that it does not depend on $\tilde{\gamma}_{jt}$, and $z_{jt}$ can thus be sampled marginally of $\tilde{\gamma}_{jt}$. This improves mixing substantially since $\tilde{\gamma}_{jt}$ and $z_{jt}$ will often be highly correlated (for a detailed discussion of this issue, see Gerlach et al., 2000; Giordani and Kohn, 2008).

Depending on the precise law of motion for the elements in $z_{jt}$, standard algorithms can now be used. In this paper, we use two different laws of motion.
For the latent random walk factors in $\tau_{jt}$, we use the forward filtering backward sampling algorithm outlined in Carter and Kohn (1994) and Frühwirth-Schnatter (1994). In case of the latent Markov switching factors in $S_{jt}$, we use the algorithm outlined in Kim and Nelson (1999). Both algorithms are well known and relevant details may be found in the original papers. Here, it suffices to note that in both cases, sampling the latent states is computationally easy since the state space is low dimensional with $R \ll N$. In this setting, sampling the factors equation-wise can be carried out in $O(R)$ steps, a substantial computational improvement relative to the $O(N)$ steps necessary for estimating an unrestricted TVP regression (see also the discussion in Chan et al., 2020).

Sampling the state innovation variances. To obtain draws for the state innovation variances, reconsider Eq. (1) and draw them conditional on the observed/latent states and the factor loadings using a generalized inverse Gaussian distribution. For further details and the moments of this distribution, see Appendix A.
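The random walk part of the state-sampling step can be sketched as follows. The code first forms the regressors and error variances of Eq. (5) and then runs a textbook forward filtering backward sampling pass for random-walk states with a scalar observation. It is a minimal illustration rather than the authors' implementation: the function names, the Gaussian prior on the initial state (`a0`, `P0`) and the simulated inputs are assumptions, and the Markov switching indicators are not covered.

```python
import numpy as np

def marginal_obs(m, Lam_j, omega_j, h_j):
    """Eq. (5): rows of m_tilde are m_t' Lam_j; s2_t = m_t' diag(omega_j) m_t + exp(h_t)."""
    m_tilde = m @ Lam_j                          # T x R
    s2 = (m ** 2) @ omega_j + np.exp(h_j)        # length T
    return m_tilde, s2

def _sym(A):
    """Symmetrize a covariance matrix to guard against rounding error."""
    return 0.5 * (A + A.T)

def ffbs_random_walk(y_tilde, m_tilde, s2, V, a0, P0, rng):
    """One joint draw of tau_1..tau_T for tau_t = tau_{t-1} + nu_t, nu_t ~ N(0, V)."""
    T, R = m_tilde.shape
    a = np.empty((T, R))                         # filtered means
    P = np.empty((T, R, R))                      # filtered covariances
    a_prev, P_prev = a0, P0
    for t in range(T):                           # forward (Kalman) filter
        P_pred = P_prev + V
        x = m_tilde[t]
        f = x @ P_pred @ x + s2[t]               # one-step-ahead forecast variance
        K = P_pred @ x / f                       # Kalman gain
        a[t] = a_prev + K * (y_tilde[t] - x @ a_prev)
        P[t] = P_pred - np.outer(K, K) * f
        a_prev, P_prev = a[t], P[t]
    draws = np.empty((T, R))
    draws[-1] = rng.multivariate_normal(a[-1], _sym(P[-1]))
    for t in range(T - 2, -1, -1):               # backward sampling
        G = P[t] @ np.linalg.inv(P[t] + V)
        mean = a[t] + G @ (draws[t + 1] - a[t])
        cov = P[t] - G @ P[t]
        draws[t] = rng.multivariate_normal(mean, _sym(cov))
    return draws

# Hypothetical inputs for a single equation.
rng = np.random.default_rng(3)
T, v_j, R = 100, 4, 2
m = rng.normal(size=(T, v_j))
Lam_j = rng.normal(size=(v_j, R))
m_tilde, s2 = marginal_obs(m, Lam_j, np.full(v_j, 0.01), np.zeros(T))
tau_true = np.cumsum(rng.normal(size=(T, R)), axis=0)
y_tilde = np.sum(m_tilde * tau_true, axis=1) + rng.normal(size=T) * np.sqrt(s2)
tau_draw = ffbs_random_walk(y_tilde, m_tilde, s2, np.eye(R), np.zeros(R), 10.0 * np.eye(R), rng)
```

Each filtering step only involves $R \times R$ matrices, which illustrates why working with a small number of factors is cheaper than filtering all $N$ coefficients directly.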
Sampling $\Lambda_j$ and $\gamma_j$ jointly. Similarly to $z_{jt}$, we sample the non-zero loadings in $\Lambda$ and the time-invariant coefficients marginally of $\gamma_{jt}$ by using equation-by-equation estimation. The observation equation for equation $j$ (conditional on $z_{jt}$) can be written as a standard regression model:

$$y_{jt} = \hat{m}'_{jt} \hat{\gamma}_j + \hat{\varepsilon}_{jt}, \quad (6)$$

where $\hat{m}_{jt} = (m'_{jt}, (z_{jt} \otimes m_{jt})')'$ is an $Rv_j$-dimensional vector of covariates, and $\hat{\gamma}_j = (\gamma'_j, \text{vec}(\Lambda_j)')'$ denotes an $Rv_j$-dimensional coefficient vector. The posterior of $\hat{\gamma}_j$ is Gaussian with well-known moments.

Sampling the stochastic volatilities. The latent log-volatility processes can again be sampled on an equation-by-equation basis. This step is implemented using the R-package stochvol.

Sampling the horseshoe prior hyperparameters. Our assumptions imply analogous

It is worth mentioning that the $O(N)$ statement is true for the precision sampler and differs for forward-filtering backward-sampling algorithms.

This section applies the model to predict the term structure of US interest rates. These time series are characterized by substantial non-linearities (e.g. during the period of the zero lower bound) and feature substantial co-movement both in the levels of the time series and in the parameters describing their evolution. Our proposed model framework might thus be well suited to capture such features. We investigate this claim in a thorough forecasting exercise using several established benchmarks. After showing that our approach yields favorable forecasts, we discuss the driving forces behind parameter changes and how key quantities that shape yield curve dynamics co-move over time at low frequencies.
Our aim is to predict monthly zero-coupon yields of US treasuries at different yearly maturities. The data are described in detail in Gürkaynak et al. (2007). The target variables are yields at 1, 3, 5, 7, 10 and 15 years maturity. All variables enter our model in first differences.

Estimation and forecasting is carried out recursively. Using data from 1973:01 to 1999:12, we produce one-month-ahead and three-months-ahead forecasts for 2000:01. After obtaining the predictive distributions we expand the sample and repeat this procedure until we reach 2019:12. Point forecast performance is measured using Root Mean Squared Errors (RMSEs), while density forecasts are assessed in terms of Log Predictive Bayes Factors (LPBFs), averaged over the out-of-sample observations (Geweke and Amisano, 2010).

Available online at federalreserve.gov/data/nominal-yield-curve.htm.

Note that for our set of financial indicators, data revisions and ragged edges arising from delays in the publication of the series do not matter. This is due to financial market data being available almost instantaneously, and the published quotes are not subject to revisions at later dates.

Thus, $r_t = (r_{\text{NFCI},t-1}, r_{\text{REC},t-1}, r_{\text{RF},t-1})'$ and $R_r = 3$. Exogenous variables enter the model as first order lags. Higher-order forecasts involving exogenous variables are based on random walk predictions of these quantities.

We use these three effect modifiers for simple reasons. First, there is strong evidence that yield curve dynamics differ across business cycle phases (see, e.g., Hevia et al., 2015). Second, the RF interest rate serves as an early warning indicator which possesses predictive power for changes in the shape of the yield curve. Finally, the inclusion of the NFCI is motivated by the recent literature on forecasting tail risks in macroeconomic and financial time series (see, e.g., Adrian et al., 2019; Carriero et al., 2020; Adams et al., 2021).

The forecast exercise distinguishes between two model classes, with 15 distinct model specifications in each class. The first model class involves TVP-VARs that incorporate the six target variables as endogenous variables, that is $M = 6$.
This model class is labeled as VAR.

Available online at mba.tuck.dartmouth.edu/pages/faculty/ken.french.

The second model class includes specifications based on the three-factor Nelson-Siegel model:

$$i_t(\theta) = L_t + \left( \frac{1 - \exp(-\theta\alpha)}{\theta\alpha} \right) \check{S}_t + \left( \frac{1 - \exp(-\theta\alpha)}{\theta\alpha} - \exp(-\theta\alpha) \right) \text{Ç}_t,$$

where $i_t(\theta)$ denotes the yield at maturity $\theta$ at time $t$, $L_t$ is a factor that controls the level, $\check{S}_t$ determines the slope, and $\text{Ç}_t$ represents the curvature factor of the yields. The parameter $\alpha$ governs the exponential decay rate. To maximize the loading on $\text{Ç}_t$ we set $\alpha = 0.\cdots \times \cdots$. In what follows, we use the latent factors $L_t$, $\check{S}_t$ and $\text{Ç}_t$ as endogenous variables in the VAR specifications by defining $y_t = (L_t, \check{S}_t, \text{Ç}_t)'$, resulting in $M = 3$. These latent factors are obtained by running OLS on a $t$-by-$t$ basis. This model class is subsequently labeled NS-VAR.

Model specifications are differentiated over a grid of effect modifier combinations. In particular, specifications within a model class differ in terms of three aspects (see Table 1): first, in terms of whether the three exogenous variables (collected in $r_t$) are included in $z_t$ or not ("x" marks inclusion, "–" indicates no observed factors); second, in terms of the number of latent random walk factors $R_{\tau j}$ included in $z_t$ (which we assume to be equal across equations); and third, in terms of the presence of Markov switching indicators in $S_t$, again with "x" marking their inclusion and "–" their absence. This setup implies that we have 15 time-varying parameter NS-VAR and 15 time-varying parameter VAR model specifications. For comparative purposes we also consider the two class-specific constant parameter model variants, labeled "Constant," and a conventional independent random walk specification of the TVPs (i.e.
we set the number of random walk factors equal to $K$ and exclude $r_t$ and $S_t$). Notice that we also have a specification which includes only $S_t$ and another one which uses only observed factors. The former is closely related to a Markov switching model whereas the latter closely resembles a VAR with interaction terms.

See Diebold and Li (2006) for a discussion of this specific choice.

3.3 Results

Table 1 shows the one-month and one-quarter-ahead out-of-sample forecasting results for US treasury yields at different maturities, using the 30 TVP model specifications and the two constant parameter model variants as described in the previous sub-section. Recall that $M = 3$ (and $K = 9$) in case of the NS-VAR and $M = 6$ (and $K = 18$) in case of the VAR model variants. $z_t = (r'_t, S'_t, \tau'_t)'$ where $r_t$ denotes an $R_r\,(= 3)$-dimensional vector, $S_t$ an $R_S$-dimensional vector, and $\tau_t$ an $R_\tau$-dimensional vector. All specifications feature $P = 3$ lags of the endogenous variables.

The performance of point forecasts is measured in terms of RMSEs and that of density forecasts in terms of LPBFs, relative to the constant parameter VAR model (shaded in yellow) that serves as benchmark. RMSEs are presented as ratios, and LPBFs as differences, given below the RMSEs in parentheses. RMSE ratios below one indicate superior performance relative to the benchmark, as do LPBF figures greater than zero. The best performing model specification by column is given in bold, highlighting the specification with the smallest RMSE ratio and the largest positive LPBF difference, respectively.

The vast range of competing specifications, loss functions used to evaluate forecasts and maturities makes it hard to identify a single best performing model.
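The two evaluation measures can be written down in a few lines. The sketch below assumes that point forecasts and log predictive likelihoods are already available as arrays over the hold-out period; the function names and the toy numbers are hypothetical and only serve to show how the ratios and differences are read.

```python
import numpy as np

def rmse_ratio(y, f_model, f_bench):
    """RMSE of a model divided by the RMSE of the benchmark (below one favors the model)."""
    rmse_model = np.sqrt(np.mean((y - f_model) ** 2))
    rmse_bench = np.sqrt(np.mean((y - f_bench) ** 2))
    return rmse_model / rmse_bench

def avg_lpbf(logpl_model, logpl_bench):
    """Average difference in log predictive likelihoods (above zero favors the model)."""
    return np.mean(logpl_model - logpl_bench)

# Toy hold-out sample (hypothetical numbers, not taken from Table 1).
y = np.array([0.10, -0.05, 0.20, 0.00])
f_model = y + 0.1                                   # model misses by 0.1 in every period
f_bench = y + 0.2                                   # benchmark misses by 0.2 in every period
ratio = rmse_ratio(y, f_model, f_bench)
lpbf = avg_lpbf(np.array([-1.0, -2.0]), np.array([-1.5, -2.5]))
```

In this toy example the model halves the benchmark's forecast error and gains half a log point per observation in predictive likelihood.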
We first provide a general overview on model performance and then zoom into differences in predictive accuracy for point and density forecasts.

At a very general level, Table 1 suggests a pronounced degree of heterogeneity in forecast accuracy across models and for both the NS-VAR and VAR specifications. While differences at some maturities are substantial, they are muted or non-existent for others. It is also worth mentioning that the TVP variants of the NS-VAR and VAR models outperform the constant parameter specifications in most cases (apart from one-quarter-ahead point forecasts of treasuries with a maturity of five years). This general observation holds true irrespective of whether only point forecasts or the full predictive distribution are considered. These accuracy premia point towards the necessity of addressing structural breaks in the dynamic evolution

Table 1: Out-of-sample forecasting results for US treasury yields at different maturities using TVP-NS-VAR and TVP-VAR model specifications.
Model Specification One-month-ahead One-quarter-ahead r t δ S t Joint 1 year 3 year 5 year 7 year 10 year 15 year Joint 1 year 3 year 5 year 7 year 10 year 15 yearTVP-NS-VAR x 6 x 0.78 0.91 0.95 0.94 0.89 0.79 0.61 0.95 0.99 1.04 1.01 0.98 0.94 0.85(-0.79) (-0.28) (-0.11) (0.03) (0.14) (0.34) (0.72) (-1.42) (-0.35) (-0.20) (-0.06) (-0.01) (0.01) (0.10)x 3 x 0.79 0.96 1.00 0.96 0.90 0.78 0.60 0.95 1.03 1.06 1.02 0.98 0.94 0.84(-0.66) (-0.21) (-0.09) (0.04) (0.14) (0.35) (0.73) (-1.35) (-0.27) (-0.14) (-0.03) (0.00) (0.02) (0.11)x 1 x ) (0.74) (-1.26) (-0.24) (-0.13) (-0.02) (0.02) (0.04) (0.13)– 6 x 0.79 0.96 0.97 0.95 0.90 0.79 0.60 0.94 1.04 1.03 1.00 0.97 0.93 0.84(-0.86) (-0.30) (-0.11) (0.04) (0.15) (0.35) (0.72) (-1.49) (-0.37) (-0.19) (-0.05) (0.00) (0.01) (0.09)– 3 x 0.78 0.95 ) (0.36) (0.73) (-1.16) (-0.22) (-0.09) (0.00) (0.02) (0.03) (0.12)– 6 – 0.79 0.94 0.97 0.95 0.90 0.79 0.61 0.93 0.98 0.99 0.97 0.96 0.93 0.84(-0.91) (-0.33) (-0.15) (0.00) (0.12) (0.33) (0.70) (-1.43) (-0.40) (-0.23) (-0.07) (-0.01) (0.02) (0.10)– 3 – 0.79 0.94 0.96 0.95 0.90 0.79 0.60 0.92 0.98 0.98 0.96 0.95 0.92 0.83(-0.86) (-0.23) (-0.10) (0.04) (0.15) (0.34) (0.72) (-1.37) (-0.29) (-0.15) (-0.02) (0.02) (0.03) (0.12)– 1 – 0.78 0.95 0.97 0.94 0.89 0.78 0.60 (-0.72) (-0.20) (-0.06) (0.05) (0.15) (0.34) (0.72) (-1.30) (-0.22) (-0.11) (0.00) (0.03) (0.04) (0.11)x – – 0.78 0.93 0.99 0.95 0.89 0.77 0.59 0.91 ) (0.06) ( )– – x 0.78 0.95 0.98 0.95 0.89 0.78 0.60 0.94 1.05 1.03 0.99 0.96 0.93 0.83(-0.76) (-0.24) (-0.07) (0.04) (0.14) (0.33) (0.71) (-1.26) (-0.29) (-0.12) (-0.03) (0.00) (0.01) (0.11)– K – 0.79 0.96 0.98 0.95 0.90 0.78 0.60 0.92 0.99 1.00 0.97 0.95 0.92 0.83(-0.42) (0.11) (0.01) (0.07) (0.15) (0.34) (0.71) (-1.02) (0.07) (-0.01) (0.02) (0.03) (0.03) (0.11) Constant able 1 continued Model Specification One-month-ahead One-quarter-ahead r t δ S t Joint 1 year 3 year 5 year 7 year 10 year 15 year Joint 1 year 3 year 5 year 7 year 10 year 15 yearTVP-VAR x 6 x 0.79 0.92 0.96 
0.94 0.89 0.79 0.60 0.94 0.96 0.99 0.98 0.97 0.94 0.84(1.13) (0.06) (0.03) (0.07) (0.14) (0.33) (0.68) (0.63) (-0.01) (-0.01) (0.00) (0.00) (0.01) (0.08)x 3 x 0.78 0.92 0.96 0.94 0.89 0.78 0.60 0.94 0.98 1.00 0.99 0.98 0.94 0.84(1.20) (0.09) (0.03) (0.06) (0.14) (0.33) (0.70) (0.55) (0.01) (-0.02) (-0.01) (0.00) (0.02) (0.10)x 1 x 0.79 0.93 0.97 0.95 0.89 0.78 0.60 0.93 0.96 0.99 0.98 0.97 0.94 0.84(1.52) (0.10) (0.03) (0.06) (0.14) (0.33) (0.70) ( ) (0.02) (0.00) (0.01) (0.01) (0.02) (0.12)x 6 – 0.79 0.93 0.98 0.95 0.90 0.79 0.60 0.94 0.99 1.00 1.00 0.98 0.95 0.84(1.26) (0.06) (0.04) (0.07) (0.15) (0.34) (0.69) (0.69) (0.01) (0.01) (0.02) (0.02) (0.03) (0.10)x 3 – 0.78 0.91 0.96 0.94 0.89 0.79 0.60 0.94 0.98 1.00 1.00 0.98 0.95 0.85(1.20) (0.11) ( ) ( ) (0.16) (0.34) (0.71) (0.66) (0.03) (0.02) (0.02) (0.03) (0.04) (0.12)x 1 – 0.78 ) (0.04) (0.04) (0.05) (0.14)– 6 x 0.81 0.95 0.97 0.95 0.91 0.81 0.63 0.97 1.00 1.00 1.00 1.00 0.98 0.90(0.95) (0.02) (0.03) (0.07) (0.13) (0.32) (0.67) (0.45) (-0.04) (-0.03) (-0.01) (-0.01) (0.00) (0.06)– 3 x 0.79 0.95 0.98 0.95 0.90 0.79 0.60 0.93 0.99 0.99 0.97 0.96 0.93 0.83(1.08) (0.06) (0.02) (0.06) (0.12) (0.31) (0.68) (0.28) (-0.01) (-0.02) (-0.01) (-0.01) (0.01) (0.08)– 1 x 0.78 0.95 0.96 0.94 0.89 0.79 0.60 0.95 0.97 0.98 0.98 0.98 0.96 0.87(1.29) (0.08) (0.04) (0.07) (0.14) (0.32) (0.69) (0.44) (0.04) (0.00) (-0.01) (-0.01) (0.00) (0.09)– 6 – 0.79 0.95 0.96 0.95 0.90 0.79 0.60 0.92 0.99 0.98 0.97 0.96 0.93 0.83(1.04) (0.04) (0.05) (0.08) (0.15) (0.34) (0.69) (0.32) (-0.03) (0.01) (0.02) (0.03) (0.04) (0.11)– 3 – 0.79 0.94 0.97 0.94 0.89 0.79 0.61 0.93 0.98 0.98 0.97 0.96 0.93 0.83(1.27) (0.08) (0.05) (0.08) (0.15) (0.34) (0.70) (0.61) (0.00) (0.03) (0.04) (0.05) (0.06) (0.13)– 1 – 0.79 0.95 0.96 0.95 0.90 0.79 0.61 0.92 0.99 0.99 0.97 0.95 0.92 0.82( ) (0.09) (0.06) (0.08) (0.15) (0.35) (0.72) (0.78) (0.04) (0.04) ( ) (0.05) ( ) (0.15)x – – 0.79 0.93 0.97 0.95 0.90 0.79 0.60 0.93 0.98 0.99 0.98 0.97 0.93 0.83(1.26) (0.12) 
(0.04) (0.07) (0.14) (0.35) ( ) (0.75) (0.07) (0.03) (0.02) (0.01) (0.03) (0.15)– – x 0.79 0.94 0.96 0.93 0.89 0.79 0.61 0.93 0.97 0.97 0.97 0.96 0.94 0.86(0.83) (0.07) (0.02) (0.07) (0.14) (0.33) (0.70) (0.49) (0.02) (0.00) (0.01) (0.01) (0.02) (0.11)– K – 0.79 0.94 0.96 0.95 0.90 0.80 0.61 0.93 0.98 0.98 0.97 0.96 0.93 0.83(0.28) ( ) (0.03) (0.04) (0.10) (0.27) (0.67) (-0.15) ( ) (0.01) (-0.02) (-0.03) (-0.04) (0.07) Constant
Notes: We present the results of out-of-sample one-month-ahead and one-quarter-ahead forecasting from the 15 TVP-NS-VAR and 15 TVP-VAR model variants for maturities of 1, 3, 5, 7, 10 and 15 years and the corresponding joint measure across all maturities. Specifications of the TVP model variants are differentiated over a grid of effect modifier combinations, in terms of whether the exogenous variables in $r_t$ are included or not (“x” marks inclusion, “–” indicates absence), the number $\delta = R_\tau / M$ of latent factors per equation (with $\delta = K$ yielding conventional independent random walk specifications for the TVPs) and the presence of Markov switching factors in $S_t$, again with “x” marking inclusion and “–” absence. “Constant” labels a conventional constant parameter VAR. $M = 3$ and $K = 9$ in the case of NS-VAR, and $M = 6$ and $K = 18$ in the case of VAR. We estimate and forecast recursively, using data from 1973:01 to the time that the forecast is made, beginning in 2000:01 through 2019:12. Root Mean Squared Errors (RMSEs) and Log Predictive Bayes Factors (LPBFs), averaged over the out-of-sample observations, are given relative to the constant parameter VAR model (shaded in yellow). The best performing model specification by column is given in bold, highlighting the specification with the smallest RMSE ratio and the largest positive LPBF difference, respectively.

of the yield curve.

Comparing the NS-VAR and VAR specifications indicates that the latter usually perform better for density forecasts, while the former are often superior in terms of point forecasts. The better point forecasting performance of the NS-VAR suggests that the three factors contain relevant information for the first moment of the predictive distribution. When we also consider higher order moments, this story changes. The better density performance of the VAR models is most likely driven by two sources.
The first is that, as opposed to the conditional mean, the strong implicit assumptions of the NS-VAR on the prediction error variance-covariance matrix seem overly restrictive. Allowing for richer dynamics in the covariances by explicitly modeling the shocks to a panel of yields thus yields better density forecasts. The second is that the small-scale NS-VARs might feature an overly tight predictive variance since relevant information is ignored. This could harm density forecasting accuracy during turbulent times.

Next, we zoom into specific model classes. Within these, differences in performance along sub-divisions (provided by the inclusion/exclusion of the exogenous variables or the Markov switching processes and the number of latent random walk factors) are often negligible. This finding indicates that our flexible approach to probabilistically selecting the most adequate state evolution via Bayesian shrinkage is successful, and thus not susceptible to overfitting concerns.

We now turn to considering model-specific forecasting accuracy with the aim of finding the best performing models for point and density forecasts and for the two different forecast horizons. The overall winner for point forecasts at the one-month horizon, on average, is the flexible NS-VAR model specification featuring the exogenous variables, one latent factor per equation and the Markov switching processes. While this specification yields RMSEs that are 23 percent lower than those of the benchmark specification, it must be acknowledged that most competing specifications exhibit values that are similar in magnitude. The same is true for the best-performing model specification at the one-quarter-ahead horizon, the one-factor NS-VAR without exogenous variables and Markov switching processes, albeit at much smaller margins versus the benchmark. Here, improvements are about nine percent in terms of relative RMSEs.
Assessing predictive performance for individual maturities allows us to identify which segment of the yield curve drives the overall results. While gains for shorter maturities in the case of one-month-ahead forecasts are muted, we observe large gains at the long end of the yield curve. Relative to the benchmark, improvements are about 40 percent in RMSE terms for the best performing specification. The same is true for one-quarter-ahead forecasts, although at smaller margins of about 20 percent.

We proceed with our findings for density forecasts. As mentioned above, the VAR specifications overall exhibit more favorable relative LPBFs compared to the NS-VAR. This is easily observable by noting that most bold values (in parentheses) are located in the lower panel of Table 1. In terms of average performance at the one-month horizon, we find that the TVP-VAR specification with one factor but no exogenous variables or Markov switching performs best, closely followed by the most flexible specification with one unobserved factor including Markov switching. This again serves as an example that even though it is extremely flexible, our approach to shrinking the parameter space avoids overfitting and does not harm predictive performance. In fact, for one-quarter-ahead forecasts, the TVP-VAR with exogenous variables, one latent factor and Markov switching shows the largest gains in terms of density forecasts. Again, it must be acknowledged that margins within this model class are rather small. As is the case for point forecasts, these gains are mostly obtained at the long end of the yield curve, while gains in forecast accuracy at shorter maturities are negligible.

Summing up, while improvements relative to established models are often small, our proposed framework is competitive for most maturities at both the one-month and one-quarter-ahead horizons.
Main differences arise from the considered model class, with the NS-VAR exhibiting promising results in terms of point forecasts, and the VAR yielding superior density forecasts. Moreover, we detect the largest improvements at the long end of the yield curve. The additional flexibility of our proposed approach does not lead to severe overfitting since we use shrinkage priors to regularize several parts of the parameter space. It almost never harms forecast accuracy while improving performance in some cases.

The previous sub-section established that the proposed approach yields more favorable predictions than conventional TVP-VARs. Including observed and latent effect modifiers allows for investigating the sources of time-variation in coefficients, and thus the driving factors of improvements in predictive accuracy. We carry out a detailed analysis of these determinants in this sub-section.

Because of its favorable forecasting properties, we choose the NS-VAR with one latent factor per equation to investigate the driving forces of parameter variation over the full estimation sample. Recall that the choice of $z_t$ for this specification translates into a single equation-specific latent factor ($\tau_{jt}$), an equation-specific Markov switching indicator $S_{jt}$ and the three observed early warning indicators.

To illustrate the observed and latent factors, we transform each column vector in $z_t$ ex-post such that it is bounded between zero and one. This allows for comparisons even though some blocks of the respective matrices are not econometrically identified. We therefore exploit the fact that introducing any invertible $R \times R$-dimensional matrix $U$ does not alter the likelihood of the model, since $\Lambda U^{-1} U z_t = \Lambda z_t$. Define $U$ as a diagonal matrix such that the maximum of each $z_j$ (for $j = 1, \ldots, R$) corresponds to one and the minimum to zero, and set $\tilde{\Lambda} = \Lambda U^{-1}$ and $\tilde{z}_t = U z_t$.
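The invariance argument can be checked numerically. The sketch below uses simulated stand-ins for $z_t$ and $\Lambda$ (all sizes and draws are hypothetical) and a diagonal $U$ that divides each column of $z_t$ by its range. A purely diagonal rescaling fixes the spread of each column at one; pinning the minimum at exactly zero would additionally require a shift, which this sketch does not model.

```python
import numpy as np

rng = np.random.default_rng(0)
T, R, K = 200, 3, 4                           # hypothetical sizes
z = rng.normal(size=(T, R)).cumsum(axis=0)    # stand-in effect modifiers, one row per t
Lam = rng.normal(size=(K, R))                 # stand-in loadings

# Diagonal U: rescale each column of z by its range (max minus min).
col_range = z.max(axis=0) - z.min(axis=0)
U = np.diag(1.0 / col_range)
U_inv = np.diag(col_range)

z_tilde = z @ U.T           # rows hold (U z_t)'
Lam_tilde = Lam @ U_inv     # Lambda U^{-1}

# The product entering the model is unchanged by the rescaling.
fit_original = z @ Lam.T
fit_rescaled = z_tilde @ Lam_tilde.T
max_gap = float(np.max(np.abs(fit_original - fit_rescaled)))

# Each rescaled column now spans a range of exactly one.
range_tilde = z_tilde.max(axis=0) - z_tilde.min(axis=0)
```

Because only the product of loadings and effect modifiers enters the coefficients, the rescaled pair can be plotted and compared across columns without changing the implied TVPs.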
This simple linear transformation allows for assessing the relative movement of the indicators in $\tilde{z}_t$ without affecting overall dynamics.

(In particular, this is the case for the product $\Lambda_\tau \tau_t$. Note that we do not face this issue for $\Lambda_r r_t$ and $\Lambda_S S_t$ because both are either observed or already bounded between zero and one, and their scale and sign are identified.)

Figure 1 displays the evolution of the normalized indicators in $\tilde{z}_t$ over time. First, we focus on the features of the observed effect modifiers depicted in the upper panel (a). The [...] $r_{NFCI,t}$ tend to coincide with recessionary episodes, indicated by the binary recession indicator $r_{REC,t}$. The risk-free interest rate $r_{RF,t}$ can be related to the monetary policy stance. Early in the sample we observe substantial increases, peaking during the Volcker disinflation in the early 1980s. Subsequently, large and abrupt decreases are notable during recessions, while an overall decreasing trend is observable.

Figure 1: Evolution of the normalized effect modifiers $\tilde{z}_t = U z_t$ over time. Panel (a): observed factors $r_t$ ($r_{NFCI,t}$, $r_{REC,t}$, $r_{RF,t}$). Panel (b): equation-specific latent Markov switching factors $S_t$ ($S_{L,t}$, $S_{Š,t}$, $S_{Ç,t}$). Panel (c): equation-specific latent random walk factors $\tau_t$ ($\tau_{L,t}$, $\tau_{Š,t}$, $\tau_{Ç,t}$).

Notes: Results are based on a TVP-NS-VAR model specification with $z_t = (r_t', S_t', \tau_t')'$, $\delta = R_\tau / M = 1$ and $P = 3$, using 15,000 MCMC draws. Panel (a) shows the normalized observed factors $r_{NFCI,t}$, $r_{REC,t}$ and $r_{RF,t}$ (collected in $r_t$), panel (b) the posterior mean of the latent Markov switching factors $S_{L,t}$, $S_{Š,t}$ and $S_{Ç,t}$ (collected in $S_t$), and panel (c) the posterior median of the latent random walk factors $\tau_{L,t}$, $\tau_{Š,t}$ and $\tau_{Ç,t}$ (collected in $\tau_t$). Note that $S_{jt}$ and $\tau_{jt}$ are equation-specific latent quantities with $M$ (= 3) endogenous variables, $j \in \{L, Š, Ç\}$. $S_{L,t}$ and $\tau_{L,t}$ correspond to the first equation, $S_{Š,t}$ and $\tau_{Š,t}$ to the second, and $S_{Ç,t}$ and $\tau_{Ç,t}$ to the third. The gray shaded vertical bars represent recessions dated by the NBER Business Cycle Dating Committee. Sample period: 1973:01 to 2019:12. Vertical axis: normalized values. Horizontal axis: months.

Before turning to the latent indicators in $S_t$ and $\tau_t$, note that the respective elements $S_{jt}$ and $\tau_{jt}$ are equation-specific, with $j \in \{L, Š, Ç\}$ referring to the level, slope and curvature of the yield curve. The middle panel (b) shows the posterior mean of the three Markov switching factors collected in $S_t$. The lower panel (c) shows the posterior medians of the three (transformed) gradually changing latent random walk factors in $\tau_t$.

Several features of the latent indicators are worth highlighting.
Each of the latent quantities exhibits distinct dynamics and thus carries information in addition to the observed indicators. Examining the posterior means, which are essentially the unconditional posterior probabilities that a given Markov indicator equals one, we observe that both $S_{L,t}$ and $S_{Š,t}$ evolve comparatively gradually over time. Before the Volcker disinflation, both tend to increase with posterior means above 0.5, indicating that regime 1 is more likely. After 1985, we observe a major shift towards regime 0. Interestingly, for $S_{Š,t}$ this transition appears immediately after 1985, while we detect a notable delay in $S_{L,t}$.

During the Great Moderation both indicators tend to remain associated with regime 0. The indicator associated with the middle segment of the yield curve, $S_{Ç,t}$, by contrast, transitions between regimes at a higher frequency. Particularly during the Volcker disinflation we observe mixed patterns and no clear or steady tendency towards a single regime. This changes between 1990 and 1997, where the posterior mean of $S_{Ç,t}$ is consistently above 0.5. [...] $S_{Ç,t}$ switches abruptly into regime 0. Conditional on the respective loadings in $\Lambda_S$ being non-zero, this feature would directly relate to the observed narrowing spread of the yield curve and accompanying structural breaks in coefficients of the equation related to the curvature of the yield curve.

We observe several interesting features of the unobserved factors in $\tau_t$. While $\tau_{L,t}$ is noisy and indicates substantial high-frequency movements, $\tau_{Š,t}$ and $\tau_{Ç,t}$ are much smoother. The factor governing coefficients in the level-equation of the yield curve peaks early in the sample, followed by a decline between 1980 and 1990. After a brief increase and stabilization between 1990 and 2000, we see gradual declines until the global financial crisis starting in 2007. Since then, the factor shows upward trending movement, with several high-frequency troughs. By contrast, the unobserved factor related to $Š_t$ exhibits approximately linear trending behavior from the beginning of the sample until the early 2000s, when it plateaus. After 2010, a gradual but moderate decrease is visible. $\tau_{Ç,t}$ is comparable to $\tau_{L,t}$, albeit with several differences. While several peaks coincide, we also find opposing movements, for instance in the brief early 1980s recession and after 2000.
Interestingly, high-frequency movements are muted when compared to $\tau_{L,t}$.

The preceding discussion of $\tilde{z}_t$ must be considered in light of the rescaled loadings in $\tilde{\Lambda} = \Lambda U^{-1}$. $\tilde{\Lambda}$ translates the law of motion captured in $\tilde{z}_t$ to the coefficients in $\beta_t$ by acting either as amplifier or attenuator. Figure 2 shows the posterior mean of the rescaled loadings $\Lambda U^{-1}$ and allows us to assess which elements in $\tilde{z}_t$ determine the time variation in the TVPs. We differentiate the loadings along two dimensions. Panel (a) shows the effect modifiers related to the VAR coefficients, while panel (b) depicts the block of $\tilde{\Lambda}$ related to the covariances (stored in $q_t$).

Assessing the loadings in $\Lambda_r$ for the $L_t$-equation reveals that a major part of the coefficients loads strongly positive on $r_{NFCI,t}$. In the case of $r_{RF,t}$ and $r_{REC,t}$ the patterns are more mixed. For $r_{NFCI,t}$ we find a maximum loading of around 0.25 for one lag of $Š_t$, one lag of $Ç_t$ and two lags of $L_t$, while for $r_{RF,t}$ and $r_{REC,t}$ some coefficients load moderately positive (e.g., the first own lag $L_{t-1}$) and others moderately negative (e.g., loadings related to lags of the curvature factor).

Figure 2: Heat maps for rescaled loadings in $\tilde{\Lambda} = \Lambda U^{-1}$. Panel (a): coefficients of the $L_t$-, $Š_t$- and $Ç_t$-equations, with axis labels $\iota_t$ and the three lags each of $Ç_t$, $Š_t$ and $L_t$ against $\lambda_{NFCI}$, $\lambda_{RF}$, $\lambda_{RC}$, $\lambda_S$ and $\lambda_\tau$; color scale: posterior mean, with ticks at −0.50, −0.25, 0.00 and 0.25. Panel (b): covariances $w(L_t, Š_t)$, $w(L_t, Ç_t)$ and $w(Š_t, Ç_t)$ against the same loadings; color scale: posterior mean.

Notes: $\tilde{\Lambda}$ translates the law of motion captured in $\tilde{z}_t$ to (a) the VAR coefficients in $\beta_t$, and (b) the covariances stored in $q_t$, based on a TVP-NS-VAR model specification with $z_t = (r_t', S_t', \tau_t')'$, $\delta = R_\tau / M = 1$ and $P = 3$. $\iota_t$ denotes the time-varying intercept, and $L_{t-p}$, $Š_{t-p}$, $Ç_{t-p}$ for $p = 1, 2, 3$ denote the $p$th lag of $L_t$, $Š_t$ and $Ç_t$, respectively. $\lambda_r$ relates to elements of $\tilde{\Lambda}$ associated with $r_{NFCI,t}$, $r_{REC,t}$ and $r_{RF,t}$ and the respective equations, $\lambda_S$ denotes loadings related to a single Markov switching factor collected in $S_t$, and $\lambda_\tau$ loadings corresponding to latent random walk factors in $\tau_t$. Note that $S_{jt}$ and $\tau_{jt}$ for $j \in \{L, Š, Ç\}$ are equation-specific quantities, while $r_t$ stays fixed across equations. $w(Š_t, Ç_t)$ denotes the contemporaneous relation between the $Š_t$- and $Ç_t$-equations; $w(L_t, Ç_t)$ and $w(L_t, Š_t)$ are defined analogously. Sample period: 1973:01 to 2019:12.

The loadings $\Lambda_S$ and $\Lambda_\tau$ related to estimated latent factors exhibit only modest relevance for defining the law of motion in coefficients related to $L_t$. In particular, the amplifiers for $\tau_t$ are shrunk heavily towards zero, implying that low frequency movements in the respective coefficients are either already captured by other measures, or irrelevant for coefficients in the $L_t$-equation. The same is true for the indicators in $S_t$ and $\tau_t$ for the case of the $Š_t$-equation. Turning to observed factors, the corresponding factor loadings are positive for lower-order lags while strongly negative for higher-order lags. By contrast, we find mostly negative loadings for observed factors for first lags in the $Ç_t$-equation. Loadings related to higher-order lags are mixed, and no clear patterns are observable.
Interestingly, the curvature equation is the only one where we detect substantial loadings on the latent factors, with most loadings showing a positive sign.

While unobserved factors appear to be less important for coefficients in the conditional mean of the model, they play an important role for the covariances, as indicated in panel (b). The covariance between $Š_t$ and $Ç_t$, $w(Š_t, Ç_t)$, shows pronounced loadings on all effect modifiers other than NFCI, where we observe a modest negative loading. A mixed pattern emerges for the $L_t$ and $Ç_t$ covariance, $w(L_t, Ç_t)$, with some positive and negative measures. The contemporaneous relationship $w(L_t, Š_t)$ between $L_t$ and $Š_t$ marks a particularly interesting case, with modestly negative loadings on NFCI and RF, while REC and the unobserved factor loadings are close to zero.

Summarizing, we find that observed factors often load strongly on the coefficients of all equations. While the Markov switching indicator appears to be important particularly for the $L_t$- and $Ç_t$-equations, loadings are muted for the coefficients of $Š_t$. Another interesting aspect is that, conditional on observed effect modifiers, the gradually evolving coefficients captured by $\tau_t$ are mostly irrelevant for the $L_t$- and $Š_t$-equations, in contrast to the strong positive loadings in the case of the $Ç_t$-equation and the covariances. Additional results showing the actual regression coefficients are provided in Appendix B.

To assess what our framework implies for the relations between the factors that determine the yield curve, we compute the low-frequency relationship between $L_t$, $Š_t$ and $Ç_t$. We choose this long-run correlation measure for two reasons. First, it allows us to illustrate movements in time-varying coefficients (i.e., transmission channels) and changes in the error variances in a single indicator over time.
Second, this measure compresses information of all coefficients in a structurally meaningful way since it isolates long-run trends and correlations from short-run fluctuations (Sargent and Surico, 2011; Kliem et al., 2016). Additional results for the reduced form coefficients are provided in Appendix B.

To construct the measure, we transform the TVP-VAR($P$) in Eq. (1) to its state-space TVP-VAR(1) form. In what follows, the observation equation is given by $y_t = J Y_t$, while the state equation is defined as $Y_t = B_t Y_{t-1} + \omega_t$ with $\omega_t \sim N(0, \Omega_t)$. Here, $J$ maps a $K$-dimensional vector $Y_t = (y_t', \ldots, y_{t-P+1}')'$ to $y_t$, and $B_t$ collects the elements in $\beta_t$ in the upper $M \times K$ block and defines identities otherwise. Similarly, the $K \times K$-dimensional variance-covariance matrix $\Omega_t$ collects elements in $\Sigma_t$ in the upper-left $M \times M$ block and is zero otherwise. We follow Sargent and Surico (2011) and first calculate the spectral density $\Phi_t(0)$ of $y_t$ at a zero frequency which coincides with the unconditional variance-covariance matrix of $y_t$:

$$\Phi_t(0) = J (I - B_t)^{-1} \Omega_t (I - B_t')^{-1} J', \quad \text{for } t = 1, \ldots, T.$$

Next, we transform the covariances of $\Phi_t$ into a correlation measure for each period $t$ and each variable combination $i, j$ ($i \neq j$ and $i, j = 1, \ldots, M$):

$$\phi_{ij,t} = \frac{\Phi_{ij,t}(0)}{\Phi_{jj,t}(0)}.$$

The measure $\phi_{ij,t}$ describes the long-run relations between variables $i$ and $j$ at each point in time, and is displayed in Figure 3 for the model specification TVP-NS-VAR with $z_t = (r_t', S_t', \tau_t')'$, $R_{\tau j} = 1$ and $P = 3$. Note that the variables enter our model in differences. Hence, Figure 3 depicts the low frequency relations of changes in the level, slope and curvature of the yield curve.

We observe several interesting periods characterized by structural breaks.
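Before turning to those periods, the construction of $\phi_{ij,t}$ for a single period can be sketched as follows. This is a minimal illustration under stated assumptions: the coefficient matrix holds only the lag coefficients $[A_1, \ldots, A_P]$ (no intercept), the system is stable so that $I - B_t$ is invertible, and the ratio follows the formula as read from the text. The function names and the numerical inputs are hypothetical.

```python
import numpy as np

def long_run_matrix(A, Sigma):
    """Zero-frequency matrix J (I - B)^{-1} Omega (I - B')^{-1} J' for one period.

    A     : (M, M*P) array [A_1, ..., A_P] of VAR lag coefficients (no intercept).
    Sigma : (M, M) error covariance matrix.
    """
    A = np.asarray(A, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    M, K = A.shape
    B = np.zeros((K, K))
    B[:M, :] = A                          # coefficients in the upper M x K block
    B[M:, :K - M] = np.eye(K - M)         # identities shift the lags down
    Omega = np.zeros((K, K))
    Omega[:M, :M] = Sigma                 # Sigma in the upper-left M x M block
    J = np.hstack([np.eye(M), np.zeros((M, K - M))])
    inv = np.linalg.inv(np.eye(K) - B)
    return J @ inv @ Omega @ inv.T @ J.T

def long_run_measure(Phi):
    """phi[i, j] = Phi[i, j] / Phi[j, j] (ratio as read from the text)."""
    return Phi / np.diag(Phi)[None, :]

# Univariate AR(2) check: the long-run variance is sigma^2 / (1 - a1 - a2)^2.
Phi_ar2 = long_run_matrix([[0.5, 0.2]], [[1.0]])

# Bivariate VAR(1) with hypothetical coefficients.
A1 = np.array([[0.5, 0.1], [0.0, 0.3]])
Phi_var1 = long_run_matrix(A1, np.eye(2))
phi_var1 = long_run_measure(Phi_var1)
```

In a TVP setting the two functions would be applied draw by draw and period by period to the sampled $\beta_t$ and $\Sigma_t$, with posterior quantiles taken afterwards.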
First, the relationship between the level and slope of the yield curve was close to zero until the Volcker [...]

Figure 3: Posterior median and the 68 percent credible set for pairwise long-run correlations. Panels: (a) $\phi_{LŠ,t}$, (b) $\phi_{LÇ,t}$, (c) $\phi_{ŠÇ,t}$; horizontal axis running from 1973:03 to 2019:12.

Notes: Panel (a) $\phi_{LŠ,t}$, (b) $\phi_{LÇ,t}$ and (c) $\phi_{ŠÇ,t}$, based on a TVP-NS-VAR with $z_t = (r_t', S_t', \tau_t')'$, $\delta = R_\tau / M = 1$ and $P = 3$. The colored solid lines denote the posterior medians, the black dashed line the zero line, and the colored shaded band the 68 percent posterior coverage interval, while the gray vertical bars represent recessions dated by the NBER Business Cycle Dating Committee. $\phi_{ij,t}$ is the long-run correlation for the variables $i$ and $j$ at time $t$, for $i, j \in \{L, Š, Ç\}$. Results are based on 15,000 MCMC draws. Sample period: 1973:01 to 2019:12. Vertical axis: correlation measurements. Horizontal axis: months.

This paper proposes methods for automatically selecting adequate state equations in TVP-VAR models in a data-driven fashion. The TVPs are assumed to depend on a set of observed and unobserved covariates, also known as effect modifiers. As unobserved covariates, we consider a set of low dimensional latent factors that follow a random walk, alongside Markov switching indicators that allow for abrupt structural breaks. Our model nests several alternatives commonly used in the literature on modeling macroeconomic and financial time series. To choose between state equations, we use a hierarchical Bayesian global-local shrinkage prior on the most flexible specification.

We apply our econometric framework to US yield curve data.
Carrying out a thorough predictive exercise, we show that our techniques produce favorable point and density forecasts vis-à-vis a set of established benchmark models (which are nested variants of our proposed modeling approach). The performance is specific to the information set used in the underlying TVP-VAR and appears to be more pronounced for density forecasts. This exercise illustrates that our approach produces very competitive forecasts without increasing the risk of overfitting, while providing a framework to trace the sources of time-variation, a key advantage compared to conventional TVP-VARs. This predictive exercise is complemented by a full-sample analysis of structural breaks in the relationship between the level, slope and curvature of the US yield curve. We detect several interesting patterns of abrupt and gradual time variation in long-run cross-variable relations. These changes appear to be specific to the monetary regime and the state of the business cycle.
Acknowledgments: The authors gratefully acknowledge financial support by the Jubiläumsfonds of the Oesterreichische Nationalbank (OeNB, project 18127) and by the Austrian Science Fund (FWF, project ZK 35).
References
Patrick A Adams, Tobias Adrian, Nina Boyarchenko, and Domenico Giannone. Forecasting macroeconomic risks. International Journal of Forecasting, in press, published online 6 Feb, 2021.
Tobias Adrian, Nina Boyarchenko, and Domenico Giannone. Vulnerable growth. American Economic Review, 109(4):1263–89, 2019.
Omar Aguilar and Mike West. Bayesian dynamic factor models and portfolio allocation. Journal of Business & Economic Statistics, 18(3):338–357, 2000.
Giovanni Caggiano, Efrem Castelnuovo, and Giovanni Pellegrino. Estimating the real effects of uncertainty shocks at the zero lower bound. European Economic Review, 100:257–272, 2017.
Andrea Carriero, Todd Clark, and Massimiliano Marcellino. Large vector autoregressions with stochastic volatility and flexible priors. Journal of Econometrics, 212(1):137–154, 2019.
Andrea Carriero, Todd E Clark, and Massimiliano Giuseppe Marcellino. Capturing macroeconomic tail risks with Bayesian vector autoregressions. Working Paper 202002R, Federal Reserve Bank of Cleveland, 2020.
Chris K Carter and Robert Kohn. On Gibbs sampling for state space models. Biometrika, 81(3):541–553, 1994.
Carlos M Carvalho, Nicholas G Polson, and James G Scott. The horseshoe estimator for sparse signals. Biometrika, 97(2):465–480, 2010.
Joshua CC Chan, Eric Eisenstat, and Rodney Strachan. Reducing the state space dimension in a large TVP-VAR. Journal of Econometrics, 218(1):105–118, 2020.
Timothy Cogley and Thomas J. Sargent. Drifts and volatilities: Monetary policies and outcomes in the post WWII US. Review of Economic Dynamics, 8(2):262–302, 2005.
Thomas Dangl and Michael Halling. Predictive regressions with time-varying coefficients. Journal of Financial Economics, 106:157–181, 2012.
Francis X Diebold and Canlin Li. Forecasting the term structure of government bond yields. Journal of Econometrics, 130(2):337–364, 2006.
Francis X Diebold, Canlin Li, and Vivian Z Yue. Global yield curve dynamics and interactions: A dynamic Nelson–Siegel approach. Journal of Econometrics, 146(2):351–363, 2008.
Sylvia Frühwirth-Schnatter. Data augmentation and dynamic linear models. Journal of Time Series Analysis, 15(2):183–202, 1994.
Sylvia Frühwirth-Schnatter and Helga Wagner. Stochastic model specification search for Gaussian and partial non-Gaussian state space models. Journal of Econometrics, 154(1):85–100, 2010.
Richard Gerlach, Chris Carter, and Robert Kohn. Efficient Bayesian inference for dynamic mixture models. Journal of the American Statistical Association, 95(451):819–828, 2000.
John Geweke and Gianni Amisano. Comparing and evaluating Bayesian predictive distributions of asset returns. International Journal of Forecasting, 26(2):216–230, 2010.
John Geweke and Guofu Zhou. Measuring the pricing error of the arbitrage pricing theory. Review of Financial Studies, 9(2):557–587, 1996.
Paolo Giordani and Robert Kohn. Efficient Bayesian inference for multiple change-point and mixture innovation models. Journal of Business & Economic Statistics, 26(1):66–77, 2008.
Refet S Gürkaynak, Brian Sack, and Jonathan H Wright. The US treasury yield curve: 1961 to the present. Journal of Monetary Economics, 54(8):2291–2304, 2007.
Trevor Hastie and Robert Tibshirani. Varying-coefficient models. Journal of the Royal Statistical Society: Series B (Methodological), 55(4):757–779, 1993.
Niko Hauzenberger. Flexible mixture priors for time-varying parameter models. arXiv preprint, 2006.10088, 2020.
Niko Hauzenberger, Florian Huber, Gary Koop, and Luca Onorante. Fast and flexible Bayesian inference in time-varying parameter regression models. arXiv preprint, 1910.10779, 2019.
Constantino Hevia, Martin Gonzalez-Rozada, Martin Sola, and Fabio Spagnolo. Estimating and forecasting the yield curve using a Markov switching dynamic Nelson and Siegel model. Journal of Applied Econometrics, 30(6):987–1009, 2015.
Florian Huber, Gregor Kastner, and Martin Feldkircher. Should I stay or should I go? A latent threshold approach to large-scale mixture innovation models. Journal of Applied Econometrics, 34(5):621–640, 2019.
Gregor Kastner and Sylvia Frühwirth-Schnatter. Ancillarity-sufficiency interweaving strategy (ASIS) for boosting MCMC estimation of stochastic volatility models. Computational Statistics & Data Analysis, 76:408–423, 2014.
Chang-Jin Kim and Charles R. Nelson. State-space models with regime switching: Classical and Gibbs-sampling approaches with applications. MIT Press, Cambridge and London, 1999.
Martin Kliem, Alexander Kriwoluzky, and Samad Sarferaz. On the low-frequency relationship between public deficits and inflation. Journal of Applied Econometrics, 31(3):566–583, 2016.
Gary Koop, Roberto Leon-Gonzalez, and Rodney W Strachan. On the evolution of the monetary policy transmission mechanism. Journal of Economic Dynamics and Control, 33(4):997–1017, 2009.
Dimitris Korobilis. High-dimensional macroeconomic forecasting using message passing algorithms. Journal of Business & Economic Statistics, in press, published online 22 Nov, 2019.
John M Maheu and Yong Song. An efficient Bayesian approach to multiple structural change in multivariate time series. Journal of Applied Econometrics, 33(2):251–270, 2018.
Enes Makalic and Daniel F Schmidt. A simple sampler for the horseshoe estimator. IEEE Signal Processing Letters, 23(1):179–182, 2015.
Giorgio Primiceri. Time varying structural autoregressions and monetary policy. Review of Economic Studies, 72(3):821–852, 2005.
Thomas J Sargent and Paolo Surico. Two illustrations of the quantity theory of money: Breakdowns and revivals. American Economic Review, 101(1):109–28, 2011.
Christopher A Sims and Tao Zha. Were there regime switches in US monetary policy? American Economic Review, 96(1):54–81, 2006.
James H Stock and Mark Watson. Dynamic factor models. In Michael P. Clements and David F. Hendry, editors,
The Oxford Handbook of Economic Forecasting , pages 35–60. Oxford University ress, 2011. ppendices A Technical appendix
A.1 Sampling the state innovation variances
For sampling the state innovation variances based on Eq. (1), we let $\eta_{ji,t}$ denote the shock to the $i$th coefficient in $\tilde{\gamma}_t$ with respect to the $j$th equation. The posterior of the state innovation variances is a generalized inverse Gaussian (GIG) distribution:

$$\omega_{ji} \mid \varpi_{ji}, \vartheta_j \sim \text{GIG}\left(\frac{1}{2} - \frac{T}{2},\ \sum_{t=1}^{T} \eta_{ji,t}^2,\ \frac{1}{\varpi_{ji}^2 \vartheta_j^2}\right).$$

The generalized inverse Gaussian distribution is specified such that its density function is proportional to $f(x) = x^{\lambda - 1} \exp\left(-(\chi/x + \psi x)/2\right)$ for a random variable $x \sim \text{GIG}(\lambda, \chi, \psi)$.

A.2 Posterior for the horseshoe prior
Our specification of the horseshoe prior on $\Lambda_j$, $\gamma_j$ and the square root of $\omega_j$ in Section 2.4 for a generic parameter $b_i$ for $i = 1, \dots, K$ is:

$$b_i \mid c_i, d \sim \mathcal{N}(0, c_i^2 d^2), \quad c_i \sim \mathcal{C}^{+}(0, 1), \quad d \sim \mathcal{C}^{+}(0, 1).$$

We rely on this prior in its auxiliary representation as in Makalic and Schmidt (2015) for efficient sampling of the local ($c_i$) and global ($d$) shrinkage parameters:

$$c_i^2 \mid e_i \sim \mathcal{G}^{-1}(1/2, 1/e_i), \quad d^2 \mid f \sim \mathcal{G}^{-1}(1/2, 1/f), \quad e_i \sim \mathcal{G}^{-1}(1/2, 1), \quad f \sim \mathcal{G}^{-1}(1/2, 1).$$

$\mathcal{G}^{-1}$ denotes the inverse Gamma distribution. This setup yields the following conditional posterior distributions:

$$c_i^2 \mid b_i, d, e_i \sim \mathcal{G}^{-1}\left(1,\ \frac{1}{e_i} + \frac{b_i^2}{2 d^2}\right), \quad d^2 \mid b_i, c_i, f \sim \mathcal{G}^{-1}\left(\frac{K + 1}{2},\ \frac{1}{f} + \sum_{i=1}^{K} \frac{b_i^2}{2 c_i^2}\right),$$

$$e_i \mid c_i \sim \mathcal{G}^{-1}\left(1,\ 1 + c_i^{-2}\right), \quad f \mid d \sim \mathcal{G}^{-1}\left(1,\ 1 + d^{-2}\right).$$

B Further empirical results
This Appendix contains additional results for the reduced-form coefficients. While Subsection 3.4 provides posterior estimates of the lower-dimensional effect modifiers, Figs. B.1–B.3 display the coefficients obtained by multiplying the factor loadings with the observed/latent factors based on the relationship established in Eq. (1).

Figure B.1: Posterior median of the coefficients associated with the own lags of L_t, Š_t and Ç_t.
[Figure omitted. Panels: (a) Coefficients of first own lag; (b) Coefficients of second own lag; (c) Coefficients of third own lag.]
Notes: Panels (a), (b) and (c) show the dynamic evolution of the coefficients related to the variables' own lags $p \in \{1, 2, 3\}$ of the respective equation for L_t, Š_t and Ç_t. Results are based on the TVP-NS-VAR model variant with $\delta = R_\tau / M = 1$ and using 15,000 MCMC draws. The black dashed line denotes the zero line, while the gray shaded vertical bars represent recessions dated by the NBER Business Cycle Dating Committee. Sample period 1973:01 to 2019:12. Vertical axis: posterior median estimate. Front axis: months.

Figure B.2: Posterior median of the coefficients associated with cross-variable lags of L_t, Š_t and Ç_t.
[Figure omitted. Panels: (a) Coefficients of first other lags; (b) Coefficients of second other lags; (c) Coefficients of third other lags.]
Notes: Panels (a), (b) and (c) show the dynamic evolution of the coefficients related to cross-variable lags $p \in \{1, 2, 3\}$ of the respective equation for L_t, Š_t and Ç_t. Results are based on the TVP-NS-VAR model variant with $\delta = R_\tau / M = 1$ and using 15,000 MCMC draws. The black dashed line denotes the zero line, while the gray shaded vertical bars represent recessions dated by the NBER Business Cycle Dating Committee. Sample period 1973:01 to 2019:12. Vertical axis: posterior median estimate. Front axis: months.

Figure B.3: Posterior median of the contemporaneous relationships between L_t, Š_t and Ç_t.
[Figure omitted. Series shown: w(L_t, Š_t), w(L_t, Ç_t) and w(Š_t, Ç_t).]
Notes: w(Š_t, Ç_t) denotes the contemporaneous relation between the Š_t- and Ç_t-equations; w(L_t, Ç_t) and w(L_t, Š_t) are defined analogously. Results are based on the TVP-NS-VAR model variant with $\delta = R_\tau / M = 1$ and using 15,000 MCMC draws. The black dashed line denotes the zero line, while the gray shaded vertical bars represent recessions dated by the NBER Business Cycle Dating Committee. Sample period 1973:01 to 2019:12. Vertical axis: posterior median estimate. Front axis: months.
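
As a supplement to Appendix A.2, the conditional posteriors of the horseshoe hyperparameters can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: it assumes the squared-scale reading of the prior, $b_i \mid c_i, d \sim \mathcal{N}(0, c_i^2 d^2)$, it holds the coefficients $b_i$ fixed (in the full sampler they are updated in other steps), and the function names `draw_inverse_gamma` and `horseshoe_update` are hypothetical. An inverse Gamma draw is obtained as the reciprocal of a Gamma draw.

```python
import random


def draw_inverse_gamma(shape, scale, rng):
    """Draw from an inverse Gamma G^{-1}(shape, scale).

    If X ~ Gamma(shape, rate=scale), then 1/X ~ G^{-1}(shape, scale).
    random.gammavariate(alpha, beta) uses beta as a *scale* parameter,
    so the rate `scale` is passed as 1/scale.
    """
    return 1.0 / rng.gammavariate(shape, 1.0 / scale)


def horseshoe_update(b, d2, e, f, rng):
    """One sweep over the auxiliary-variable conditionals (hypothetical helper).

    b    : list of coefficients b_i (held fixed in this sketch)
    d2   : current global variance d^2
    e, f : current auxiliary variables (list e_i and scalar f)
    Returns updated (c2, d2, e, f), where c2 holds the local variances c_i^2.
    """
    K = len(b)
    # Local scales: c_i^2 | b_i, d, e_i ~ G^{-1}(1, 1/e_i + b_i^2 / (2 d^2))
    c2 = [draw_inverse_gamma(1.0, 1.0 / e[i] + b[i] ** 2 / (2.0 * d2), rng)
          for i in range(K)]
    # Global scale: d^2 | b, c, f ~ G^{-1}((K + 1)/2, 1/f + sum_i b_i^2 / (2 c_i^2))
    rate_d = 1.0 / f + sum(b[i] ** 2 / (2.0 * c2[i]) for i in range(K))
    d2 = draw_inverse_gamma((K + 1) / 2.0, rate_d, rng)
    # Auxiliary variables: e_i | c_i ~ G^{-1}(1, 1 + 1/c_i^2), f | d ~ G^{-1}(1, 1 + 1/d^2)
    e = [draw_inverse_gamma(1.0, 1.0 + 1.0 / c2[i], rng) for i in range(K)]
    f = draw_inverse_gamma(1.0, 1.0 + 1.0 / d2, rng)
    return c2, d2, e, f


if __name__ == "__main__":
    rng = random.Random(0)            # seeded for reproducibility
    b = [0.0, 0.01, -0.02, 1.5]       # made-up coefficients, one clearly non-zero
    d2, e, f = 1.0, [1.0] * len(b), 1.0
    for _ in range(100):
        c2, d2, e, f = horseshoe_update(b, d2, e, f, rng)
    print(c2, d2)
```

The sketch uses only the standard library so that it runs without dependencies. The GIG draw of Appendix A.1 is not covered here because it needs a dedicated sampler.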