General Bayesian time-varying parameter VARs for predicting government bond yields

Abstract

Time-varying parameter (TVP) regressions commonly assume that time-variation in the coefficients is determined by a simple stochastic process such as a random walk. While such models are capable of capturing a wide range of dynamic patterns, the true nature of time variation might stem from other sources, or arise from different laws of motion. In this paper, we propose a flexible TVP VAR that assumes the TVPs to depend on a panel of partially latent covariates. The latent part of these covariates differs in its state dynamics and thus captures smoothly evolving or abruptly changing coefficients. To determine which of these covariates are important, and thus to decide on the appropriate state evolution, we introduce Bayesian shrinkage priors to perform model selection. As an empirical application, we forecast the US term structure of interest rates and show that our approach performs well relative to a set of competing models. We then show how the model can be used to explain structural breaks in coefficients related to the US yield curve.



Manfred M. Fischer*, Niko Hauzenberger (1, 2), Florian Huber, and Michael Pfarrhofer

March 1, 2021


JEL: C11, C30, E37, E43

Keywords: Bayesian shrinkage, interest rate forecasting, latent effect modifiers, MCMC sampling, time-varying parameter regression

*Corresponding author: Manfred M. Fischer, Vienna University of Economics and Business, Welthandelsplatz 1, A-1020 Vienna, Austria.
(1) Vienna University of Economics and Business, (2) University of Salzburg

Introduction

Time-varying parameter vector autoregressive (TVP-VAR) models are commonly used in finance and macroeconomics to capture dynamic relations across variables, regime shifts and/or structural changes in economic processes (see Primiceri, 2005; Cogley and Sargent, 2005; Dangl and Halling, 2012). These models typically assume that the parameters evolve over time according to a simple stochastic process such as a random walk. While rather flexible and parsimonious, this assumption does not allow one to examine the extent to which covariates cause changes in the time-varying parameters (TVPs). Moreover, wrongly assuming a random walk state equation essentially imposes a smoothness prior on the coefficients. This might be at odds with the rapid shifts observed in financial time series such as bond yields and could thus impair predictive accuracy.

The literature has dealt with the issue of selecting the appropriate law of motion for the TVPs by estimating different models separately and then using model selection criteria to discriminate between competing specifications (see, e.g., Sims and Zha, 2006; Koop et al., 2009; Hauzenberger, 2020). To the best of our knowledge, however, no attempt has been made to develop models that rely on a large set of competing laws of motion for the coefficients and decide which one describes the data best.

This paper proposes a flexible approach to TVP-VARs that efficiently integrates out uncertainty with respect to the state evolution equation. The approach assumes that the TVPs depend on a potentially large panel of covariates. These covariates are commonly labeled effect modifiers; they can be partially latent and may feature their own state equations. If they are observed, we obtain a model that is closely related to the varying coefficient model originally proposed in Hastie and Tibshirani (1993).
The main advantage of using observed, as opposed to latent, effect modifiers is that we can investigate the driving sources of parameter change. This feature is important if the researcher is interested in why relations between variables in a VAR change over time, and to what extent these changes are explained by the observed effect modifiers.

Careful selection of these effect modifiers is crucial. On the one hand, deciding on the appropriate set of observed quantities is difficult, and a large set of candidates could arise. One key objective of this paper is to provide techniques to select promising subsets. On the other hand, appropriately selecting the latent effect modifiers allows us to capture situations where some coefficients evolve smoothly whereas others move more abruptly. The latter behavior of the TVPs is often found for US macroeconomic data (see Sims and Zha, 2006), whereas the former is consistent with financial time series such as bond yields or stock returns (see Dangl and Halling, 2012; Huber et al., 2019).

Since large VARs often include both macroeconomic and financial quantities, a successful model should be able to accommodate both types of structural change, or even rely on linear combinations of them. This is precisely what we aim to achieve in this paper. Our approach is capable not only of answering the question why coefficients change, but also of inferring an appropriate law of motion from a broad set of latent quantities, each equipped with its own state evolution equation. These unobserved quantities range from latent factors that follow a random walk to Markov switching indicators that allow a subset of parameters to switch between a small number of regimes. Our approach nests several alternatives proposed in the literature, such as the TVP-VAR of Primiceri (2005) and Cogley and Sargent (2005) or the reduced-rank model of Chan et al. (2020).

This large degree of flexibility, however, raises two concerns.
The first is that overfitting problems can easily arise. We overcome these by using Bayesian shrinkage priors. Our prior is a variant of the well-known horseshoe prior (see Carvalho et al., 2010) that allows us to shrink coefficients associated with irrelevant effect modifiers towards zero. The second concern relates to computation. Since including a large number of endogenous variables in a VAR quickly leads to a high-dimensional parameter space, we propose a computationally efficient Markov chain Monte Carlo (MCMC) algorithm. To circumvent mixing issues, we rely on different parameterizations of the model during MCMC sampling. This novel algorithm constitutes a second important contribution of the paper.

In our empirical work, we use the model to forecast the US term structure of interest rates. We investigate the empirical properties of our approach using two information sets in the underlying VAR. First, we include several interest rates at different maturities directly as endogenous variables. Second, we consider a three-factor Nelson-Siegel model for the term structure of interest rates as in Diebold and Li (2006) and Diebold et al. (2008). Over a long hold-out period that includes several recessionary episodes, our approach improves upon a wide range of competing models. While improvements for point forecasts are often muted, we find that our proposed model yields favorable density predictions. The predictive exercise is complemented by a comprehensive discussion of patterns in time variation and their sources. Moreover, we show how our approach can be used to analyze low-frequency relations between the observed quantities in the model over time.

The rest of the paper is structured as follows.
Section 2 introduces the econometric framework, which includes the general form of the TVP-VAR, a flexible law of motion for the latent states, and the effect modifiers that crucially shape the state dynamics. This section also introduces the Bayesian prior setup and techniques for posterior and predictive inference. Section 3 applies the approach to the term structure of US interest rates. It also serves to illustrate key model features and to highlight the predictive capabilities of the approach in an out-of-sample forecasting exercise. The last section summarizes and concludes. Additional technical details and further empirical results are provided in the Appendix.

Let $y_t$ denote an $M \times 1$ vector of endogenous variables for $t = 1, \dots, T$. We assume that $y_t$ depends on its $P$ lags, which we store in a $K (= MP)$-dimensional vector $x_t = (y_{t-1}', \dots, y_{t-P}')'$. Then the basic TVP-VAR can be written as a linear multivariate regression model:

$$y_t = (I_M \otimes x_t')\, \beta_t + \epsilon_t,$$

where $\beta_t$ represents a set of $k = MK$ dynamic regression coefficients and $\epsilon_t \sim \mathcal{N}(\mathbf{0}_M, \Sigma_t)$ is a vector Gaussian shock process with time-varying $M \times M$ variance-covariance matrix $\Sigma_t$. We assume that $\Sigma_t$ can be decomposed as

$$\Sigma_t = Q_t H_t Q_t'.$$

Here, $Q_t$ denotes an $M \times M$ lower triangular matrix with unit diagonal whose $v (= M(M-1)/2)$ free elements are collected in the vector $q_t$. $H_t = \mathrm{diag}(e^{h_{1t}}, \dots, e^{h_{Mt}})$ is a diagonal matrix with $h_{jt}$ $(j = 1, \dots, M)$ denoting time-varying log-volatilities. These are assumed to evolve according to an AR(1) process:

$$h_{jt} = \mu_j + \psi_j (h_{j,t-1} - \mu_j) + \nu_{jt}, \quad \nu_{jt} \sim \mathcal{N}(0, \varsigma_j).$$

Here, $\mu_j$ is the unconditional mean, $\psi_j$ the persistence parameter and $\varsigma_j$ the variance of the log-volatility process for equation $j$.

In what follows, we rewrite the TVP-VAR using its non-centered parameterization (Frühwirth-Schnatter and Wagner, 2010):

$$y_t = (I_M \otimes x_t')(\beta + \tilde{\beta}_t) + (Q + \tilde{Q}_t)\, \varepsilon_t, \quad \varepsilon_t \sim \mathcal{N}(\mathbf{0}_M, H_t).$$

$\beta$ denotes a $k$-dimensional vector of constant coefficients, and $\tilde{\beta}_t = \beta_t - \beta$. This parameterization allows us to disentangle time-invariant effects (encoded by $\beta$) from time-varying effects (encoded by $\tilde{\beta}_t$) of the regressors. For the decomposed variance-covariance matrix, $Q$ is a lower triangular matrix with ones on the diagonal capturing the constant part of the covariances, and $\tilde{Q}_t = Q_t - Q$ is the corresponding lower triangular matrix with zero diagonal elements containing the time-varying part.
Their free elements are collected in the $v \times 1$ vectors $q$ and $\tilde{q}_t$, respectively.

In this paper, the focus is on modeling the $N (= k + v)$-dimensional vector $\tilde{\gamma}_t = (\tilde{\beta}_t', \tilde{q}_t')'$ and the constant part $\gamma = (\beta', q')'$. The literature typically assumes that the transition distribution $p(\tilde{\gamma}_t \mid \tilde{\gamma}_{t-1})$ is given by

$$\tilde{\gamma}_t \mid \tilde{\gamma}_{t-1} \sim \mathcal{N}(\tilde{\gamma}_{t-1}, V), \quad \text{with } \tilde{\gamma}_0 = \mathbf{0}_N.$$

This law of motion implies that the expected value of $\tilde{\gamma}_t$ equals $\tilde{\gamma}_{t-1}$, and that the amount of time variation is determined by the $N \times N$ process innovation variance-covariance matrix $V$. This matrix is often assumed to be diagonal. If selected diagonal elements of $V$ are equal to zero, the corresponding regression coefficients are constant.

Estimation and inference are typically carried out via Bayesian methods. The recent literature proposes shrinkage priors that allow for data-based selection of those coefficients which should be time-varying or constant. This already leads to substantial improvements in predictive accuracy, but does not tackle the fundamental question of whether the coefficients are better characterized by a random walk, a change-point process, or mixtures of these.

As discussed in the previous sub-section, the typical assumption is that $\tilde{\gamma}_t$ follows a random walk process. In addition, the shocks to the random walk state equation are often assumed to feature a positive error variance. We relax both assumptions to allow for more flexibility. The random walk assumption is relaxed by assuming that the time-varying part stored in $\tilde{\gamma}_t$ depends on a set of $R$ additional factors $z_t$. These $z_t$ are the effect modifiers mentioned in the introduction, which can be observed or latent. The relationship between the TVPs and $z_t$ is given by

$$\tilde{\gamma}_t = \Lambda z_t + \eta_t. \tag{1}$$

$\Lambda$ denotes an $N \times R$ matrix of regression coefficients, and $\eta_t \sim \mathcal{N}(\mathbf{0}_N, \Omega)$ is a Gaussian error term with diagonal variance-covariance matrix $\Omega = \mathrm{diag}(\omega_1, \dots, \omega_N)$.
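To see how Eq. (1) turns a mix of effect modifiers into TVPs, here is a minimal NumPy sketch. It simulates one observed modifier, one Markov switching indicator and one random walk factor, stacks them into $z_t$, and draws $\tilde{\gamma}_t = \Lambda z_t + \eta_t$. All numerical values (transition probabilities, variances, zero starting values) and the simulated stand-in for an observed modifier are illustrative assumptions, and the function names are hypothetical rather than part of the paper.

```python
import numpy as np


def random_walk_factor(T, v, rng):
    """tau_t = tau_{t-1} + nu_t, nu_t ~ N(0, v); starting from zero is an assumption."""
    return np.cumsum(rng.normal(0.0, np.sqrt(v), size=T))


def markov_switching_indicator(T, P, rng):
    """Binary S_t with P[i, j] = p(S_t = i | S_{t-1} = j), so each column of P sums to one."""
    S = np.zeros(T, dtype=int)  # starting in state 0 is an assumption of this sketch
    for t in range(1, T):
        S[t] = rng.choice(2, p=P[:, S[t - 1]])
    return S


def tvps_from_modifiers(Lambda, omega, z, rng):
    """Eq. (1): gamma_tilde_t = Lambda z_t + eta_t with eta_t ~ N(0, diag(omega)).

    Lambda: (N, R) loadings, omega: (N,) idiosyncratic variances, z: (T, R) modifiers.
    Returns a (T, N) array whose row t is gamma_tilde_t.
    """
    T, N = z.shape[0], Lambda.shape[0]
    eta = rng.standard_normal((T, N)) * np.sqrt(omega)
    return z @ Lambda.T + eta


rng = np.random.default_rng(42)
T, N = 200, 4
r = rng.standard_normal(T)            # hypothetical stand-in for one observed modifier
P = np.array([[0.95, 0.10],
              [0.05, 0.90]])          # illustrative transition matrix (columns sum to one)
S = markov_switching_indicator(T, P, rng)
tau = random_walk_factor(T, 0.01, rng)
z = np.column_stack([r, S, tau])      # z_t = (r_t, S_t, tau_t)', so R = 3

Lambda = rng.normal(0.0, 0.5, size=(N, 3))
omega = np.full(N, 1e-3)
gamma_tilde = tvps_from_modifiers(Lambda, omega, z, rng)

# Special cases discussed in the text:
white_noise = tvps_from_modifiers(np.zeros((N, 3)), omega, z, rng)     # Lambda = 0: random coefficients
constant = tvps_from_modifiers(np.zeros((N, 3)), np.zeros(N), z, rng)  # Lambda = 0, Omega = 0: constant
```

Setting a column of `Lambda` to zero removes the corresponding modifier from every coefficient path, which is the mechanism the shrinkage prior exploits.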
If $R \ll N$, the coefficients feature a factor structure and co-move according to the effect modifiers in $z_t$. The relationship between $\tilde{\gamma}_t$ and $z_t$ is determined by the factor loadings in $\Lambda$. For instance, if the $j$th column of $\Lambda$, $\lambda_j$, is equal to zero, the corresponding $j$th factor in $z_t$ does not enter the model and thus has no influence on $\tilde{\gamma}_t$.

The specific selection of $z_t$ is crucial for determining the dynamics of $\tilde{\gamma}_t$. An appropriate choice of $z_t$ yields a variety of important special cases that depend on the specific values of $\Lambda$ and $\Omega$ as well as on the composition of $z_t$. In this sub-section, we briefly focus on special cases that arise independently of the choice of $z_t$. The next sub-section deals with cases that arise if $z_t$ is suitably chosen.

There are two such cases. If $\Lambda = \mathbf{0}_{N \times R}$, with $\mathbf{0}_{N \times R}$ being an $N \times R$ matrix of zeros, we obtain a random coefficients model which assumes that the regression coefficients follow a white noise process (for some recent papers that follow this approach, see Korobilis, 2019; Hauzenberger et al., 2019). The second special case arises if both $\Lambda = \mathbf{0}_{N \times R}$ and $\Omega = \mathbf{0}_{N \times N}$. In this case, we obtain a standard constant parameter regression model.

Before we discuss the choice of $z_t$, it is worth noting that if $z_t$ is (partially) latent, the model in Eq. (1) is not identified. Since our object of interest is $\gamma_t$, this poses no major issues. If we wish to interpret elements in $z_t$ structurally, standard identification strategies from the literature on dynamic factor models can be used (see, e.g., Geweke and Zhou, 1996; Aguilar and West, 2000; Stock and Watson, 2011).

2.3 Possible Choices for the Effect Modifiers

The specific choice of $z_t$ is crucial in determining how $\tilde{\gamma}_t$ behaves over time. By suitably choosing the elements in $z_t$, our approach is related to the following specifications:

• Chan et al. (2020): We assume that $z_t$ consists exclusively of a sequence of $R = R_\tau$ latent factors $\tau_t$, which follow a multivariate random walk:
$$\tau_t = \tau_{t-1} + \nu_t, \quad \nu_t \sim \mathcal{N}(\mathbf{0}, V_\tau).$$
$V_\tau = \mathrm{diag}(v_1, \dots, v_{R_\tau})$ denotes a diagonal variance-covariance matrix, with $v_j$ being process innovation variances that determine the smoothness of the elements in $\tau_t$. Note that setting $v_j$ close to zero effectively implies that $\tau_{jt}$, the $j$th element in $\tau_t$, is constant. This model implies a factor structure in $\tilde{\gamma}_t$ if $R_\tau \ll N$.
• Primiceri (2005): If $R = R_\tau = N$, the elements in $z_t$ are random walks, and $\Lambda = I_N$, we obtain a standard time-varying parameter model. Assuming additionally that the covariances are constant, we obtain the model put forth in Cogley and Sargent (2005).
• Sims and Zha (2006): A Markov switching model can be obtained by setting $z_t = S_t$, with $S_t \in \{0, 1\}$ denoting a binary indicator with transition probabilities given by
$$p(S_t = i \mid S_{t-1} = j) = p_{ij} \quad \text{for } i, j = 0, 1,$$
where $p_{ij}$ denotes the $(i, j)$th element of a $2 \times 2$ transition probability matrix $P$. Including this random quantity allows us to capture structural breaks in $\tilde{\gamma}_t$ that are common to all coefficients.
• Caggiano et al. (2017): Assuming that $z_t$ is exclusively composed of observed quantities, we obtain a regression model with interaction terms.

These examples show that our model, conditional on choosing a suitable set of effect modifiers, is capable of mimicking several prominent specifications in the literature. Since the question of the appropriate state evolution equation is essentially a model selection issue, we simply specify $z_t$ to include most of the modifiers discussed above (with the exception of the $R = N$ setup).

More precisely, we set $z_t$ as follows:
$$z_t = (r_t', S_t', \tau_t')'.$$
Here, $r_t$ denotes a set of $R_r$ observed factors, and the dimension of $z_t$ is thus $R = R_r + R_S + R_\tau$ with $R_S = M$. To allow for additional flexibility, we assume that $\tau_t = (\tau_{1t}', \dots, \tau_{Mt}')'$, with $\tau_{jt}$ being equation-specific factors of dimension $R_{\tau j}$ (and thus $R_\tau = \sum_j R_{\tau j}$). Assuming that $R_{\tau i} = R_{\tau j} = \delta$ for all $i, j$, we have $R_\tau = \delta M$ latent random walk factors. Likewise, we estimate a separate Markov switching indicator $S_{jt}$ per equation (and thus $S_t = (S_{1t}, \dots, S_{Mt})'$), with the corresponding transition probability matrix denoted by $P_j$.

The corresponding loadings matrix $\Lambda$ is structured such that the loadings in equation $j$ associated with the factors $\tau_{it}$ and $S_{it}$ for $i \neq j$ equal zero. This assumption strikes a balance between assuming a large number of latent factors to achieve maximum flexibility (at the risk of overfitting) and using a rather parsimonious model (at the risk of being too simplistic). Recent contributions use similar assumptions on the state evolutions (see Koop et al., 2009; Maheu and Song, 2018). As opposed to these papers, our approach offers more flexibility since, if necessary, the idiosyncratic shocks to the TVPs allow for deviations if the factor structure does not represent the data well.

Our specification implies that, depending on the factor loadings $\Lambda$, the evolution of $\tilde{\gamma}_t$ might be a combination of a set of random walk factors, a Markov switching process and some observed quantities. To single out irrelevant elements in $z_t$, one could simply set the corresponding columns in $\Lambda$ equal to zero. In this paper, we achieve this through a Bayesian shrinkage prior. The next sub-section discusses our priors in more detail.

2.4 The Prior Setup

The discussion in Sub-section 2.1 shows that our approach nests a variety of competing models. To select the appropriate model variant and alleviate over-parameterization concerns, we opt for a Bayesian approach to introduce shrinkage. Here, we summarize the priors we impose on key parameters.

In light of the specific choice of $z_t$, we introduce some additional notation to clarify details on our prior implementation.
Let us assume that $\Lambda$ is composed of the following matrices:
$$\Lambda = [\Lambda_r \;\; \Lambda_S \;\; \Lambda_\tau],$$
where $\Lambda_r$ is an $N \times R_r$ matrix of loadings related to the observed quantities, $\Lambda_S$ denotes an $N \times R_S$ matrix of loadings related to $S_t$, and $\Lambda_\tau$ represents an $N \times R_\tau$ factor loadings matrix associated with $\tau_t$.

For imposing shrinkage, we rely on variants of the horseshoe prior (Carvalho et al., 2010). While in principle any global-local shrinkage prior may be used, we choose the horseshoe prior due to its excellent shrinkage properties and the absence of tuning parameters. In particular, we specify a column-wise horseshoe prior on the loadings matrix. Let $\Lambda_j$ denote a sub-matrix of the free elements in $\Lambda$ corresponding to the $j$th equation, and let $\lambda_{ji}$ mark the $i$th column of this matrix. $\lambda_{ji,\ell}$ refers to the $\ell$th element of this vector. The prior is given by
$$\lambda_{ji,\ell} \mid \kappa_{ji,\ell}, \delta_{ji} \sim \mathcal{N}(0, \kappa_{ji,\ell}^2 \delta_{ji}^2), \quad \kappa_{ji,\ell} \sim \mathcal{C}^+(0, 1), \quad \delta_{ji} \sim \mathcal{C}^+(0, 1).$$
Here, $\mathcal{C}^+(0, 1)$ denotes the half-Cauchy distribution, $\delta_{ji}$ is an equation- and column-specific global shrinkage factor, and $\kappa_{ji,\ell}$ is a local scaling parameter.

To further regularize our potentially huge-dimensional parameter space, we impose an equation-wise horseshoe prior on the constant part of the regression coefficients and covariances in $\gamma_j$ corresponding to the $j$th equation. Let $\gamma_{ji}$ denote the $i$th element of this vector. The setup is similar to the one for the loadings matrix and given by the hierarchical structure
$$\gamma_{ji} \mid \xi_{ji}, \zeta_j \sim \mathcal{N}(0, \xi_{ji}^2 \zeta_j^2), \quad \xi_{ji} \sim \mathcal{C}^+(0, 1), \quad \zeta_j \sim \mathcal{C}^+(0, 1).$$
The hyperparameter $\zeta_j$ is an equation-specific global shrinkage factor, while the $\xi_{ji}$'s are local scalings.

We mentioned earlier that shocks to the states are often assumed to feature positive error variances. To introduce shrinkage of the $\omega_i$'s towards zero, we also impose a horseshoe prior on the square root of the innovation variances of the measurement errors in Eq. (1). This prior is specified in an equation-specific manner. For equation $j$, let $\omega_j$ denote a $v_j (= K + j - 1)$-dimensional vector which stores the diagonal elements in $\Omega$ associated with the $j$th equation. This includes the process innovation variances on the $K$ regression coefficients and on the $j - 1$ free covariance elements in $Q_t$. The square root of the $i$th element of $\omega_j$, $\sqrt{\omega_{ji}}$, features the following prior hierarchy:
$$\sqrt{\omega_{ji}} \mid \varpi_{ji}, \vartheta_j \sim \mathcal{N}(0, \varpi_{ji}^2 \vartheta_j^2), \quad \varpi_{ji} \sim \mathcal{C}^+(0, 1), \quad \vartheta_j \sim \mathcal{C}^+(0, 1).$$
Choosing a Gaussian prior on the square root of the variance at the first level of the hierarchy implies a Gamma prior (shape-rate parameterization) on $\omega_{ji}$, with
$$\omega_{ji} \mid \varpi_{ji}, \vartheta_j \sim \mathcal{G}\left(1/2, \; \varpi_{ji}^{-2} \vartheta_j^{-2}/2\right),$$
see also Frühwirth-Schnatter and Wagner (2010). The hyperparameters $\vartheta_j$ and $\varpi_{ji}$ are again equation-specific global and local shrinkage parameters.
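The equivalence between a Gaussian prior on $\sqrt{\omega_{ji}}$ and a Gamma prior on $\omega_{ji}$ can be checked numerically. The NumPy sketch below holds the product $\varpi_{ji}^2 \vartheta_j^2$ fixed at an illustrative value `s2` (an assumption, not a value from the paper) and compares squared Gaussian draws with direct draws from $\mathcal{G}(1/2, 1/(2 s_2))$. It also shows one draw from the horseshoe hierarchy used for the loadings; the helper name `draw_horseshoe` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)


def draw_horseshoe(n, rng):
    """One draw of n coefficients from the horseshoe hierarchy:
    coef_l | kappa_l, delta ~ N(0, kappa_l^2 delta^2), kappa_l ~ C+(0, 1), delta ~ C+(0, 1)."""
    delta = abs(rng.standard_cauchy())        # global scale (half-Cauchy)
    kappa = np.abs(rng.standard_cauchy(n))    # local scales (half-Cauchy)
    return rng.normal(0.0, kappa * delta), kappa, delta


coefs, kappa, delta = draw_horseshoe(1_000, rng)

# Gaussian prior on sqrt(omega) vs. the implied Gamma prior on omega,
# holding varpi^2 * vartheta^2 fixed at s2 (illustrative value).
s2 = 0.3 ** 2
n = 200_000
omega_from_sqrt = rng.normal(0.0, np.sqrt(s2), size=n) ** 2
# G(1/2, rate = 1 / (2 s2)); NumPy parameterizes the Gamma by scale = 1 / rate.
omega_from_gamma = rng.gamma(shape=0.5, scale=2.0 * s2, size=n)

print(omega_from_sqrt.mean(), omega_from_gamma.mean())  # both close to s2 = 0.09
print(omega_from_sqrt.var(), omega_from_gamma.var())    # both close to 2 * s2**2 = 0.0162
```

Both samples match because $\omega = s_2 \chi^2_1$, whose mean is $s_2$ and whose variance is $2 s_2^2$, exactly the first two moments of $\mathcal{G}(1/2, 1/(2 s_2))$.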
Furthermore, we set $V_\tau = I_{R_\tau}$ and thus impose shrinkage through the factor loadings in $\Lambda_\tau$ (see Chan et al., 2020).

On the parameters of the state equation of the log-volatility processes, $\mu_j$, $\psi_j$ and $\varsigma_j$, we use the setup proposed in Kastner and Frühwirth-Schnatter (2014). That is, for $j = 1, \dots, M$, we assume a zero-mean Gaussian prior on the unconditional mean $\mu_j$, a Beta prior on the transformed autoregressive parameter $(\psi_j + 1)/2$, and a Gamma prior on the process innovation variance $\varsigma_j$.

For the equation-specific transition probabilities $P_j$ of the Markov switching indicators, we assume that the $(i, i)$th element $p_{j,ii}$, for $i = 0, 1$ and $j = 1, \dots, M$, arises from a Beta distribution with state-specific shape parameters, and hence $p_{j,i\ell} = 1 - p_{j,ii}$ for $i \neq \ell$. In the empirical application, two of these four shape parameters are set to 10 and the remaining two to 1, in order to weakly push each $S_{jt}$ towards a single state a priori.

To simulate from the full posterior distribution, we develop an efficient MCMC algorithm. Since full-system estimation of the VAR quickly becomes computationally cumbersome, we rely on the equation-by-equation algorithm suggested in Carriero et al. (2019).

Conditional on $Q_t$, one can state the VAR as a system of (conditionally) independent equations. The first equation of this system is given by
$$y_{1t} = x_t'(\beta_1 + \tilde{\beta}_{1t}) + \varepsilon_{1t},$$
and the $j$th equation $(j > 1)$ by
$$y_{jt} = x_t'(\beta_j + \tilde{\beta}_{jt}) + u_{jt}'(q_j + \tilde{q}_{jt}) + \varepsilon_{jt}. \tag{2}$$
$\beta_j$ and $\tilde{\beta}_{jt}$ denote the $j$th subvectors of the constant and time-varying parts in $\beta_t$, with $\beta_t = (\beta_{1t}', \dots, \beta_{Mt}')'$ and $u_{jt} = (\varepsilon_{1t}, \dots, \varepsilon_{j-1,t})'$.
The $(j-1)$-dimensional vectors $q_j$ and $\tilde{q}_{jt}$ store the constant and time-varying parts of the free elements in the $j$th row of $Q_t$. This approach allows us to estimate the elements of $\gamma_t$ that relate to the $M$ equations independently of each other, conditional on the shocks to the preceding $j - 1$ equations. Equation (2) can then be written compactly as
$$y_{jt} = m_{jt}'(\gamma_j + \tilde{\gamma}_{jt}) + \varepsilon_{jt}, \tag{3}$$
where $m_{jt} = (x_t', u_{jt}')'$, $\tilde{\gamma}_{jt} = \gamma_{jt} - \gamma_j$, $\gamma_{jt}$ refers to the TVPs associated with the $j$th equation in $\gamma_t$, and $\gamma_j$ denotes the corresponding constant part. All following steps are carried out on an equation-by-equation basis and make use of the regression form in Eq. (3). For notational simplicity, we assume that all elements in $z_t$ are latent. In light of the discussion in Sub-section 2.3, this implies that $z_{jt} = (\tau_{jt}', S_{jt})'$; the extension to include observed factors is trivial.

Sampling $z_{jt}$. Conditional on the remaining quantities of the model, we simulate the latent (random walk and Markov switching) components in $z_{jt}$ by integrating out $\tilde{\gamma}_{jt}$. This is achieved by rewriting Eq. (3) as
$$\tilde{y}_{jt} = m_{jt}' \Lambda_j z_{jt} + m_{jt}' \eta_{jt} + \varepsilon_{jt}, \tag{4}$$
with $\tilde{y}_{jt} = y_{jt} - m_{jt}' \gamma_j$. Defining $\tilde{m}_{jt}' = m_{jt}' \Lambda_j$ and $\hat{\varepsilon}_{jt} = m_{jt}' \eta_{jt} + \varepsilon_{jt}$ allows us to cast Eq. (4) as a simple linear regression model:
$$\tilde{y}_{jt} = \tilde{m}_{jt}' z_{jt} + \hat{\varepsilon}_{jt}, \quad \hat{\varepsilon}_{jt} \sim \mathcal{N}\left(0, \; m_{jt}' \mathrm{diag}(\omega_j)\, m_{jt} + e^{h_{jt}}\right). \tag{5}$$
This parameterization has the advantage that it does not depend on $\tilde{\gamma}_{jt}$, and $z_{jt}$ can thus be sampled marginally of $\tilde{\gamma}_{jt}$. This improves mixing substantially, since $\tilde{\gamma}_{jt}$ and $z_{jt}$ will often be highly correlated (for a detailed discussion of this issue, see Gerlach et al., 2000; Giordani and Kohn, 2008).

Depending on the precise law of motion for the elements in $z_{jt}$, standard algorithms can now be used. In this paper, we use two different laws of motion.
For the latent random walk factors in $\tau_{jt}$, we use the forward filtering backward sampling algorithm outlined in Carter and Kohn (1994) and Frühwirth-Schnatter (1994). For the latent Markov switching factors in $S_{jt}$, we use the algorithm outlined in Kim and Nelson (1999). Both algorithms are well known, and relevant details may be found in the original papers. Here, it suffices to note that in both cases sampling the latent states is computationally easy, since the state space is low-dimensional with $R \ll N$. In this setting, sampling the factors equation-wise can be carried out in $O(R)$ steps, a substantial computational improvement relative to the $O(N)$ steps necessary to estimate an unrestricted TVP regression (see also the discussion in Chan et al., 2020).

Sampling the state innovation variances. To obtain draws for the state innovation variances, we reconsider Eq. (1) and draw them from a generalized inverse Gaussian distribution, conditional on the observed/latent states and the factor loadings. For further details and the moments of this distribution, see Appendix A.
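For the random walk factors, Eq. (5) is a linear Gaussian state space model with a scalar observation and a time-varying, known observation variance, so forward filtering backward sampling (FFBS) applies directly. The NumPy sketch below is a generic FFBS for that model, written for illustration rather than taken from the authors' code. The initial condition $z_0 \sim \mathcal{N}(\mathbf{0}, p_0 I)$, the simulated data and the state variance `V = 0.01 * I` are assumptions (the paper itself fixes $V_\tau = I$ and shrinks through the loadings); `X` plays the role of $\tilde{m}_{jt}$ and `obs_var` that of $m_{jt}'\mathrm{diag}(\omega_j) m_{jt} + e^{h_{jt}}$.

```python
import numpy as np


def ffbs_random_walk(y, X, obs_var, V, rng, p0=10.0):
    """One draw of z_{1:T} from p(z | y) in the state space model
        y_t = X_t' z_t + e_t,   e_t ~ N(0, obs_var_t)   (cf. Eq. (5))
        z_t = z_{t-1} + nu_t,   nu_t ~ N(0, V)          (random walk factors)
    y: (T,), X: (T, R), obs_var: (T,), V: (R, R).
    The initial condition z_0 ~ N(0, p0 * I) is an assumption of this sketch.
    """
    T, R = X.shape
    m = np.zeros((T, R))
    C = np.zeros((T, R, R))
    m_prev, C_prev = np.zeros(R), p0 * np.eye(R)

    # Forward pass: Kalman filter, storing the filtered moments m_t and C_t.
    for t in range(T):
        a, Pp = m_prev, C_prev + V          # predictive moments of z_t
        x = X[t]
        f = x @ Pp @ x + obs_var[t]         # predictive variance of y_t (scalar)
        k = Pp @ x / f                      # Kalman gain
        m[t] = a + k * (y[t] - x @ a)
        C[t] = Pp - np.outer(k, x @ Pp)
        m_prev, C_prev = m[t], C[t]

    # Backward pass: draw z_T, then z_t | z_{t+1}, y_{1:t} for t = T-1, ..., 1.
    z = np.zeros((T, R))
    z[-1] = rng.multivariate_normal(m[-1], C[-1])
    for t in range(T - 2, -1, -1):
        Pp = C[t] + V                           # Var(z_{t+1} | y_{1:t})
        G = np.linalg.solve(Pp, C[t]).T         # C_t (C_t + V)^{-1}
        mean = m[t] + G @ (z[t + 1] - m[t])
        cov = C[t] - G @ C[t]
        z[t] = rng.multivariate_normal(mean, 0.5 * (cov + cov.T))
    return z


# Usage on simulated data with a single random walk factor (R = 1).
rng = np.random.default_rng(7)
T, R = 300, 1
V = 0.01 * np.eye(R)
z_true = np.cumsum(rng.normal(0.0, 0.1, size=(T, R)), axis=0)
X = rng.standard_normal((T, R))
obs_var = np.full(T, 0.1)
y = np.sum(X * z_true, axis=1) + rng.normal(0.0, np.sqrt(obs_var))
draw = ffbs_random_walk(y, X, obs_var, V, rng)
```

Each filtering and smoothing step only involves $R \times R$ matrices, which is why working with a few factors instead of all $N$ TVPs keeps this step cheap.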

Sampling $\Lambda_j$ and $\gamma_j$ jointly. Similarly to $z_{jt}$, we sample the non-zero loadings in $\Lambda$ and the time-invariant coefficients marginally of $\tilde{\gamma}_{jt}$ using equation-by-equation estimation. Conditional on $z_{jt}$, the observation equation for equation $j$ can be written as a standard regression model:
$$y_{jt} = \hat{m}_{jt}' \hat{\gamma}_j + \hat{\varepsilon}_{jt}, \tag{6}$$
where $\hat{m}_{jt} = (m_{jt}', (z_{jt} \otimes m_{jt})')'$ is an $Rv_j$-dimensional vector of covariates, and $\hat{\gamma}_j = (\gamma_j', \mathrm{vec}(\Lambda_j)')'$ denotes an $Rv_j$-dimensional coefficient vector. The posterior of $\hat{\gamma}_j$ is Gaussian with well-known moments.

Sampling the stochastic volatilities. The latent log-volatility processes can again be sampled on an equation-by-equation basis. This step is implemented using the R package stochvol.

Sampling the horseshoe prior hyperparameters. Our assumptions imply analogous

It is worth mentioning that the O(N) statement is true for the precision sampler and differs for forward-filtering backward-sampling algorithms.

This section applies the model to predict the term structure of US interest rates. These time series are characterized by substantial non-linearities (e.g., during the zero lower bound period) and feature substantial co-movement, both in the levels of the series and in the parameters describing their evolution. Our proposed framework might thus be well suited to capture such features. We investigate this claim in a thorough forecasting exercise using several established benchmarks. After showing that our approach yields favorable forecasts, we discuss the driving forces behind parameter changes as well as how key quantities that shape yield curve dynamics co-move over time at low frequencies.

Our aim is to predict monthly zero-coupon yields of US treasuries at different yearly maturities. The data are described in detail in Gürkaynak et al. (2007). The six target variables are yields with maturities between 1 and 15 years, including the 1-, 5-, 10- and 15-year maturities. All variables enter our model in first differences.

Estimation and forecasting are carried out recursively. Using data from 1973:01 to 1999:12, we produce one-month-ahead and three-months-ahead forecasts for 2000:01. After obtaining the predictive distributions, we expand the sample and repeat this procedure until we reach 2019:12. Point forecast performance is measured using root mean squared errors (RMSEs), while density forecasts are assessed in terms of log predictive Bayes factors (LPBFs), averaged over the out-of-sample observations (Geweke and Amisano, 2010).

Available online at federalreserve.gov/data/nominal-yield-curve.htm.

Note that for our set of financial indicators, data revisions and ragged edges arising from delays in the publication of the series do not matter. This is because financial market data are available almost instantaneously, and the published quotes are not subject to later revisions.

Thus, $r_t = (r_{\mathrm{NFCI},t-1}, r_{\mathrm{REC},t-1}, r_{\mathrm{RF},t-1})'$ and $R_r = 3$. Exogenous variables enter the model as first-order lags. Multi-step-ahead forecasts involving exogenous variables are based on random walk predictions of these quantities.

We use these three effect modifiers for three reasons. First, there is strong evidence that yield curve dynamics differ across business cycle phases (see, e.g., Hevia et al., 2015). Second, the RF interest rate serves as an early warning indicator which possesses predictive power for changes in the shape of the yield curve. Finally, the inclusion of the NFCI is motivated by the recent literature on forecasting tail risks in macroeconomic and financial time series (see, e.g., Adrian et al., 2019; Carriero et al., 2020; Adams et al., 2021).
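The two evaluation measures used in this section, RMSE ratios and average LPBFs relative to a benchmark, can be sketched as follows. The paper evaluates the full predictive distributions produced by MCMC; here each predictive density is approximated by a Gaussian with a given mean and standard deviation, which is an assumption made only to keep the example short. The toy data and the function names are hypothetical.

```python
import numpy as np


def rmse(y, point):
    """Root mean squared error of point forecasts over the hold-out sample."""
    return np.sqrt(np.mean((y - point) ** 2))


def gaussian_log_score(y, mean, sd):
    """Log predictive density log N(y | mean, sd^2), one value per hold-out observation."""
    return -0.5 * np.log(2.0 * np.pi * sd ** 2) - 0.5 * ((y - mean) / sd) ** 2


def relative_accuracy(y, model, bench):
    """RMSE ratio and average log predictive Bayes factor of `model` relative to `bench`.

    `model` and `bench` are dicts with arrays 'mean' and 'sd' (a Gaussian approximation
    of each predictive distribution, an assumption of this sketch).
    """
    ratio = rmse(y, model["mean"]) / rmse(y, bench["mean"])
    lpbf = np.mean(gaussian_log_score(y, model["mean"], model["sd"])
                   - gaussian_log_score(y, bench["mean"], bench["sd"]))
    return ratio, lpbf


y = np.array([0.0, 0.0])
model = {"mean": np.array([0.1, -0.1]), "sd": np.array([1.0, 1.0])}
bench = {"mean": np.array([0.2, -0.2]), "sd": np.array([1.0, 1.0])}
ratio, lpbf = relative_accuracy(y, model, bench)
print(round(ratio, 3), round(lpbf, 3))  # 0.5 0.015
```

An RMSE ratio below one and a positive LPBF both favor the model over the benchmark, which is how the entries of Table 1 are read.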
Available online at mba.tuck.dartmouth.edu/pages/faculty/ken.french.

The forecast exercise distinguishes between two model classes, with 15 distinct model specifications in each class. The first model class involves TVP-VARs that incorporate the six target variables as endogenous variables, that is, $M = 6$. This model class is labeled VAR. The second model class includes specifications based on the three-factor Nelson-Siegel model:

$$i_t(\theta) = L_t + \left(\frac{1 - \exp(-\theta\alpha)}{\theta\alpha}\right) \check{S}_t + \left(\frac{1 - \exp(-\theta\alpha)}{\theta\alpha} - \exp(-\theta\alpha)\right) \text{\c{C}}_t,$$

where $i_t(\theta)$ denotes the yield at maturity $\theta$ at time $t$, $L_t$ is a factor that controls the level, $\check{S}_t$ determines the slope, and $\text{\c{C}}_t$ represents the curvature factor of the yields. The parameter $\alpha$ governs the exponential decay rate; we fix it such that the loading on $\text{\c{C}}_t$ is maximized. In what follows, we use the latent factors $L_t$, $\check{S}_t$ and $\text{\c{C}}_t$ as endogenous variables in the VAR specifications by defining $y_t = (L_t, \check{S}_t, \text{\c{C}}_t)'$, resulting in $M = 3$. These latent factors are obtained by running OLS on a period-by-period basis. This model class is subsequently labeled NS-VAR.

Model specifications are differentiated over a grid of effect modifier combinations. In particular, specifications within a model class differ in three respects (see Table 1): first, whether the three exogenous variables (collected in $r_t$) are included in $z_t$ or not ("x" marks inclusion, "–" indicates no observed factors); second, the number of latent random walk factors $R_{\tau j}$ included in $z_t$ (which we assume to be equal across equations); and third, the presence of Markov switching indicators in $S_t$, again with "x" marking their inclusion and "–" their absence. This setup implies that we have 15 time-varying parameter NS-VAR and 15 time-varying parameter VAR model specifications. For comparative purposes, we also consider the two class-specific constant parameter model variants, labeled "Constant," and a conventional independent random walk specification of the TVPs (i.e., we set the number of random walk factors equal to $K$ and exclude $r_t$ and $S_t$). Notice that we also have a specification which includes only $S_t$ and another one which uses only observed factors.
The specification with only $S_t$ is closely related to a Markov switching model, whereas the one with only observed factors closely resembles a VAR with interaction terms.

Footnote: See Diebold and Li (2006) for a discussion of this specific choice.

Results

Table 1 shows the one-month-ahead and one-quarter-ahead out-of-sample forecasting results for US treasury yields at different maturities, using the 30 TVP model specifications and the two constant parameter variants described in the previous sub-section. Recall that $M = 3$ (and $K = 9$) for the NS-VAR and $M = 6$ (and $K = 18$) for the VAR variants. All specifications feature $P = 3$ lags of the endogenous variables.

Footnote: $z_t = (r_t', S_t', \tau_t')'$, where $r_t$ denotes an $R_r$ (= 3)-dimensional vector, $S_t$ an $R_S$-dimensional vector, and $\tau_t$ an $R_\tau$-dimensional vector.

The performance of point forecasts is measured in terms of RMSEs and that of density forecasts in terms of LPBFs, relative to the constant parameter VAR model (shaded in yellow), which serves as the benchmark. RMSEs are presented as ratios, and LPBFs as differences, given below the RMSEs in parentheses. RMSE ratios below one indicate superior performance relative to the benchmark, as do LPBF figures greater than zero. The best-performing specification in each column is given in bold, highlighting the specification with the smallest RMSE ratio and the largest positive LPBF difference, respectively.

The vast range of competing specifications, loss functions used to evaluate forecasts and maturities makes it hard to identify a single best-performing model. We first provide a general overview of model performance and then zoom into differences in predictive accuracy for point and density forecasts.

At a very general level, Table 1 suggests a pronounced degree of heterogeneity in forecast accuracy across models, for both the NS-VAR and the VAR specifications. While differences at some maturities are substantial, they are muted or non-existent at others.
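The evaluation scheme (recursive, expanding-window estimation; RMSE ratios and average LPBF differences against a benchmark) can be sketched as below. This is a hedged illustration rather than the authors' implementation: the zero-based month indexing (in levels, ignoring the observation lost to first-differencing), the helper names `expanding_windows`, `rmse` and `relative_scores`, and the toy numbers are assumptions, and computing log predictive densities from the MCMC output is outside this sketch.

```python
import numpy as np


def expanding_windows(n_obs, first_holdout, horizon):
    """Yield (train_end, target) index pairs for recursive forecasting.

    The model is estimated on observations [0, train_end) and forecasts the
    observation at index train_end + horizon - 1; the window then grows by one.
    """
    for train_end in range(first_holdout, n_obs - horizon + 1):
        yield train_end, train_end + horizon - 1


def rmse(errors):
    """Root mean squared error of a sequence of forecast errors."""
    e = np.asarray(errors, dtype=float)
    return float(np.sqrt(np.mean(e ** 2)))


def relative_scores(err_model, err_bench, lps_model, lps_bench):
    """RMSE ratio and average log predictive Bayes factor versus a benchmark.

    err_*: realized value minus point forecast, one entry per holdout period.
    lps_*: log predictive density evaluated at the realized value, per period.
    A ratio below one and an LPBF above zero both favour the model.
    """
    ratio = rmse(err_model) / rmse(err_bench)
    diff = np.asarray(lps_model, dtype=float) - np.asarray(lps_bench, dtype=float)
    return ratio, float(np.mean(diff))


# 1973:01-2019:12 is 564 months; with zero-based indexing 2000:01 sits at index 324.
pairs = list(expanding_windows(564, 324, horizon=1))
ratio, lpbf = relative_scores([1.0, -1.0], [2.0, -2.0], [0.0, -1.0], [-1.0, -2.0])
```

In the toy call, the model halves the benchmark's RMSE (ratio 0.5) and gains one log point per period on average (LPBF 1.0).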
It is also worth mentioning that the TVP variants of the NS-VAR and VAR models outperform the constant parameter specifications in most cases (apart from one-quarter-ahead point forecasts of treasuries with a maturity of five years). This general observation holds irrespective of whether only point forecasts or the full predictive distribution are considered. These accuracy premia point towards the necessity of addressing structural breaks in the dynamic evolution of the yield curve.

Table 1: Out-of-sample forecasting results for US treasury yields at different maturities using TVP-NS-VAR and TVP-VAR model specifications.

r_t δ S_t | One-month-ahead: Joint, 1 year, 3 year, 5 year, 7 year, 10 year, 15 year | One-quarter-ahead: Joint, 1 year, 3 year, 5 year, 7 year, 10 year, 15 year

TVP-NS-VAR
x 6 x | 0.78 0.91 0.95 0.94 0.89 0.79 0.61 | 0.95 0.99 1.04 1.01 0.98 0.94 0.85
      | (-0.79) (-0.28) (-0.11) (0.03) (0.14) (0.34) (0.72) | (-1.42) (-0.35) (-0.20) (-0.06) (-0.01) (0.01) (0.10)
x 3 x | 0.79 0.96 1.00 0.96 0.90 0.78 0.60 | 0.95 1.03 1.06 1.02 0.98 0.94 0.84
      | (-0.66) (-0.21) (-0.09) (0.04) (0.14) (0.35) (0.73) | (-1.35) (-0.27) (-0.14) (-0.03) (0.00) (0.02) (0.11)
x 1 x ) (0.74) (-1.26) (-0.24) (-0.13) (-0.02) (0.02) (0.04) (0.13)
– 6 x | 0.79 0.96 0.97 0.95 0.90 0.79 0.60 | 0.94 1.04 1.03 1.00 0.97 0.93 0.84
      | (-0.86) (-0.30) (-0.11) (0.04) (0.15) (0.35) (0.72) | (-1.49) (-0.37) (-0.19) (-0.05) (0.00) (0.01) (0.09)
– 3 x 0.78 0.95 ) (0.36) (0.73) (-1.16) (-0.22) (-0.09) (0.00) (0.02) (0.03) (0.12)
– 6 – | 0.79 0.94 0.97 0.95 0.90 0.79 0.61 | 0.93 0.98 0.99 0.97 0.96 0.93 0.84
      | (-0.91) (-0.33) (-0.15) (0.00) (0.12) (0.33) (0.70) | (-1.43) (-0.40) (-0.23) (-0.07) (-0.01) (0.02) (0.10)
– 3 – | 0.79 0.94 0.96 0.95 0.90 0.79 0.60 | 0.92 0.98 0.98 0.96 0.95 0.92 0.83
      | (-0.86) (-0.23) (-0.10) (0.04) (0.15) (0.34) (0.72) | (-1.37) (-0.29) (-0.15) (-0.02) (0.02) (0.03) (0.12)
– 1 – | 0.78 0.95 0.97 0.94 0.89 0.78 0.60 |
      | (-0.72) (-0.20) (-0.06) (0.05) (0.15) (0.34) (0.72) | (-1.30) (-0.22) (-0.11) (0.00) (0.03) (0.04) (0.11)
x – – | 0.78 0.93 0.99 0.95 0.89 0.77 0.59 | 0.91 ) (0.06) ( )
– – x | 0.78 0.95 0.98 0.95 0.89 0.78 0.60 | 0.94 1.05 1.03 0.99 0.96 0.93 0.83
      | (-0.76) (-0.24) (-0.07) (0.04) (0.14) (0.33) (0.71) | (-1.26) (-0.29) (-0.12) (-0.03) (0.00) (0.01) (0.11)
– K – | 0.79 0.96 0.98 0.95 0.90 0.78 0.60 | 0.92 0.99 1.00 0.97 0.95 0.92 0.83
      | (-0.42) (0.11) (0.01) (0.07) (0.15) (0.34) (0.71) | (-1.02) (0.07) (-0.01) (0.02) (0.03) (0.03) (0.11)
Constant

TVP-VAR
x 6 x | 0.79 0.92 0.96 0.94 0.89 0.79 0.60 | 0.94 0.96 0.99 0.98 0.97 0.94 0.84
      | (1.13) (0.06) (0.03) (0.07) (0.14) (0.33) (0.68) | (0.63) (-0.01) (-0.01) (0.00) (0.00) (0.01) (0.08)
x 3 x | 0.78 0.92 0.96 0.94 0.89 0.78 0.60 | 0.94 0.98 1.00 0.99 0.98 0.94 0.84
      | (1.20) (0.09) (0.03) (0.06) (0.14) (0.33) (0.70) | (0.55) (0.01) (-0.02) (-0.01) (0.00) (0.02) (0.10)
x 1 x | 0.79 0.93 0.97 0.95 0.89 0.78 0.60 | 0.93 0.96 0.99 0.98 0.97 0.94 0.84
      | (1.52) (0.10) (0.03) (0.06) (0.14) (0.33) (0.70) | ( ) (0.02) (0.00) (0.01) (0.01) (0.02) (0.12)
x 6 – | 0.79 0.93 0.98 0.95 0.90 0.79 0.60 | 0.94 0.99 1.00 1.00 0.98 0.95 0.84
      | (1.26) (0.06) (0.04) (0.07) (0.15) (0.34) (0.69) | (0.69) (0.01) (0.01) (0.02) (0.02) (0.03) (0.10)
x 3 – | 0.78 0.91 0.96 0.94 0.89 0.79 0.60 | 0.94 0.98 1.00 1.00 0.98 0.95 0.85
      | (1.20) (0.11) ( ) ( ) (0.16) (0.34) (0.71) | (0.66) (0.03) (0.02) (0.02) (0.03) (0.04) (0.12)
x 1 – 0.78 ) (0.04) (0.04) (0.05) (0.14)
– 6 x | 0.81 0.95 0.97 0.95 0.91 0.81 0.63 | 0.97 1.00 1.00 1.00 1.00 0.98 0.90
      | (0.95) (0.02) (0.03) (0.07) (0.13) (0.32) (0.67) | (0.45) (-0.04) (-0.03) (-0.01) (-0.01) (0.00) (0.06)
– 3 x | 0.79 0.95 0.98 0.95 0.90 0.79 0.60 | 0.93 0.99 0.99 0.97 0.96 0.93 0.83
      | (1.08) (0.06) (0.02) (0.06) (0.12) (0.31) (0.68) | (0.28) (-0.01) (-0.02) (-0.01) (-0.01) (0.01) (0.08)
– 1 x | 0.78 0.95 0.96 0.94 0.89 0.79 0.60 | 0.95 0.97 0.98 0.98 0.98 0.96 0.87
      | (1.29) (0.08) (0.04) (0.07) (0.14) (0.32) (0.69) | (0.44) (0.04) (0.00) (-0.01) (-0.01) (0.00) (0.09)
– 6 – | 0.79 0.95 0.96 0.95 0.90 0.79 0.60 | 0.92 0.99 0.98 0.97 0.96 0.93 0.83
      | (1.04) (0.04) (0.05) (0.08) (0.15) (0.34) (0.69) | (0.32) (-0.03) (0.01) (0.02) (0.03) (0.04) (0.11)
– 3 – | 0.79 0.94 0.97 0.94 0.89 0.79 0.61 | 0.93 0.98 0.98 0.97 0.96 0.93 0.83
      | (1.27) (0.08) (0.05) (0.08) (0.15) (0.34) (0.70) | (0.61) (0.00) (0.03) (0.04) (0.05) (0.06) (0.13)
– 1 – | 0.79 0.95 0.96 0.95 0.90 0.79 0.61 | 0.92 0.99 0.99 0.97 0.95 0.92 0.82
      | ( ) (0.09) (0.06) (0.08) (0.15) (0.35) (0.72) | (0.78) (0.04) (0.04) ( ) (0.05) ( ) (0.15)
x – – | 0.79 0.93 0.97 0.95 0.90 0.79 0.60 | 0.93 0.98 0.99 0.98 0.97 0.93 0.83
      | (1.26) (0.12) (0.04) (0.07) (0.14) (0.35) ( ) | (0.75) (0.07) (0.03) (0.02) (0.01) (0.03) (0.15)
– – x | 0.79 0.94 0.96 0.93 0.89 0.79 0.61 | 0.93 0.97 0.97 0.97 0.96 0.94 0.86
      | (0.83) (0.07) (0.02) (0.07) (0.14) (0.33) (0.70) | (0.49) (0.02) (0.00) (0.01) (0.01) (0.02) (0.11)
– K – | 0.79 0.94 0.96 0.95 0.90 0.80 0.61 | 0.93 0.98 0.98 0.97 0.96 0.93 0.83
      | (0.28) ( ) (0.03) (0.04) (0.10) (0.27) (0.67) | (-0.15) ( ) (0.01) (-0.02) (-0.03) (-0.04) (0.07)
Constant

Notes: We present the results of out-of-sample one-month-ahead and one-quarter-ahead forecasting from the 15 TVP-NS-VAR and 15 TVP-VAR model variants for maturities of 1, 3, 5, 7, 10 and 15 years and the corresponding joint measure across all maturities. Specifications of the TVP model variants are differentiated over a grid of effect modifier combinations, in terms of whether the exogenous variables in $r_t$ are included or not ("x" marks inclusion, "–" indicates absence), the number $\delta = R_\tau/M$ of latent factors per equation (with $\delta = K$ yielding conventional independent random walk specifications for the TVPs) and the presence of Markov switching factors in $S_t$, again with "x" marking inclusion and "–" absence. "Constant" labels a conventional constant parameter VAR. $M = 3$ and $K = 9$ in the case of NS-VAR, and $M = 6$ and $K = 18$ in the case of VAR. We estimate and forecast recursively, using data from 1973:01 to the time that the forecast is made, beginning in 2000:01 through 2019:12. Root mean squared errors (RMSEs) and log predictive Bayes factors (LPBFs), averaged over the out-of-sample observations, are given relative to the constant parameter VAR model (shaded in yellow). The best-performing model specification in each column is given in bold, highlighting the specification with the smallest RMSE ratio and the largest positive LPBF difference, respectively.

Comparing the NS-VAR and VAR specifications indicates that the latter usually perform better for density forecasts, while the former are often superior in terms of point forecasts. The better point forecasting performance of the NS-VAR suggests that the three factors contain relevant information for the first moment of the predictive distribution. When we also consider higher-order moments, this story changes. The better density performance of the VAR models is most likely driven by two sources.
The first is that, as opposed to the conditional mean, the strong implicit assumptions that the NS-VAR imposes on the prediction error variance-covariance matrix seem overly restrictive. Allowing for richer dynamics in the covariances by explicitly modeling the shocks to a panel of yields thus produces better density forecasts. The second is that the small-scale NS-VARs might feature too tight a predictive variance, since relevant information is ignored, which could harm density forecasting accuracy during turbulent times.

Next, we zoom into specific model classes. Within these, differences in performance along sub-divisions (defined by the inclusion or exclusion of the exogenous variables or the Markov switching processes and by the number of latent random walk factors) are often negligible. This finding indicates that our flexible approach to probabilistically selecting the most adequate state evolution via Bayesian shrinkage is successful and not prone to overfitting.

We now turn to model-specific forecasting accuracy, with the aim of finding the best-performing models for point and density forecasts at the two forecast horizons. The overall winner for point forecasts at the one-month horizon, on average, is the flexible NS-VAR specification featuring the exogenous variables, one latent factor per equation and the Markov switching processes. While this specification yields RMSEs that are 23 percent lower than those of the benchmark, it must be acknowledged that most competing specifications exhibit values of similar magnitude. The same is true for the best-performing specification at the one-quarter-ahead horizon, the one-factor NS-VAR without exogenous variables and Markov switching processes, albeit at much smaller margins versus the benchmark. Here, improvements are about nine percent in terms of relative RMSEs.
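One way to see why the covariance structure implied by the NS-VAR can be restrictive (the first source discussed above) is the following: if yield forecasts are obtained by mapping the three factor forecasts through the Nelson–Siegel loadings without an additional measurement error, the implied 6 × 6 covariance of the yields has rank at most three. The sketch below checks this numerically. The mapping without measurement error, $\alpha = 0.0609 \times 12$ and the factor covariance are illustrative assumptions, not the paper's estimates, and the helper names are hypothetical.

```python
import numpy as np


def ns_loadings(theta, alpha):
    """Nelson-Siegel loadings (level, slope, curvature) for maturities theta in years."""
    x = alpha * np.asarray(theta, dtype=float)
    slope = (1.0 - np.exp(-x)) / x
    return np.column_stack([np.ones_like(x), slope, slope - np.exp(-x)])


def implied_yield_cov(sigma_f, theta, alpha=0.0609 * 12):
    """Yield covariance implied by a factor covariance when i_t = X f_t exactly."""
    X = ns_loadings(theta, alpha)
    return X @ sigma_f @ X.T


theta = np.array([1.0, 3.0, 5.0, 7.0, 10.0, 15.0])
sigma_f = np.diag([0.10, 0.20, 0.30])  # illustrative factor covariance, not estimated
cov_y = implied_yield_cov(sigma_f, theta)
print(cov_y.shape, np.linalg.matrix_rank(cov_y))  # (6, 6) 3
```

An unrestricted VAR on the six yields, by contrast, can estimate a full-rank 6 × 6 error covariance.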
Assessing predictive performance for individual maturities allows us to identify which segment of the yield curve drives the overall results. While gains at shorter maturities are muted for one-month-ahead forecasts, we observe large gains at the long end of the yield curve. Relative to the benchmark, the best-performing specification improves by about 40 percent in RMSE terms. The same is true for one-quarter-ahead forecasts, albeit at smaller margins of about 20 percent.

We proceed with our findings for density forecasts. As mentioned above, the VAR specifications overall exhibit more favorable relative LPBFs than the NS-VAR. This is easily observable by noting that most bold values (in parentheses) are located in the lower panel of Table 1. In terms of average performance at the one-month horizon, we find that the TVP-VAR specification with one factor but no exogenous variables or Markov switching performs best, closely followed by the most flexible specification with one unobserved factor including Markov switching. This again illustrates that, even though our approach is extremely flexible, shrinking the parameter space avoids overfitting and does not harm predictive performance. In fact, for one-quarter-ahead forecasts, the TVP-VAR with exogenous variables, one latent factor and Markov switching shows the largest gains in terms of density forecasts. Again, it must be acknowledged that margins within this model class are rather small. As is the case for point forecasts, these gains are mostly obtained at the long end of the yield curve, while gains in forecast accuracy at shorter maturities are negligible.

Summing up, while improvements relative to established models are often small, our proposed framework is competitive for most maturities at both the one-month-ahead and one-quarter-ahead horizons.
The main differences arise from the considered model class, with the NS-VAR exhibiting promising results in terms of point forecasts and the VAR producing superior density forecasts. Moreover, we detect the largest improvements at the long end of the yield curve. The additional flexibility of our proposed approach does not lead to severe overfitting, since we use shrinkage priors to regularize several parts of the parameter space. It almost never harms forecast accuracy, while improving performance in some cases.

The previous sub-section established that the proposed approach yields more favorable predictions than conventional TVP-VARs. Including observed and latent effect modifiers allows us to investigate the sources of time variation in the coefficients, and thus the driving factors of the improvements in predictive accuracy. We carry out a detailed analysis of these determinants in this sub-section.

Because of its favorable forecasting properties, we choose the NS-VAR with one latent factor per equation to investigate the driving forces of parameter variation over the full estimation sample. Recall that the choice of $z_t$ for this specification translates into a single equation-specific latent factor ($\tau_{jt}$), an equation-specific Markov switching indicator $S_{jt}$ and the three observed early warning indicators.

To illustrate the observed and latent factors, we transform each element of $z_t$ (i.e., each factor's time series) ex post such that it is bounded between zero and one. This allows for comparisons even though some blocks of the respective matrices are not econometrically identified. We therefore exploit the fact that introducing any invertible $R \times R$-dimensional matrix $U$ does not alter the likelihood of the model, since $\Lambda U^{-1} U z_t = \Lambda z_t$. Define $U$ as a diagonal matrix such that the maximum of each $z_j$ (for $j = 1, \dots, R$) corresponds to one and the minimum to zero, and let $\tilde{\Lambda} = \Lambda U^{-1}$ and $\tilde{z}_t = U z_t$.
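The invariance $\Lambda U^{-1} U z_t = \Lambda z_t$ used above can be verified numerically. The sketch is hedged: dimensions and data are simulated, and $U$ is taken as a diagonal matrix that divides each factor by its sample range. Additionally subtracting the sample minimum, so that each factor lies exactly between zero and one, is an affine rather than a linear map: it would shift $\Lambda z_t$ by the constant vector $\Lambda z_{\min}$ (with $z_{\min}$ collecting the sample minima), which this sketch leaves out.

```python
import numpy as np

rng = np.random.default_rng(0)
K, R, T = 4, 3, 50  # illustrative dimensions, not the paper's
Lam = rng.normal(size=(K, R))               # loadings Lambda (K x R)
Z = rng.normal(size=(R, T)).cumsum(axis=1)  # effect modifiers z_t stacked as columns

# Diagonal U dividing each factor by its sample range (a pure rescaling, no shift).
U = np.diag(1.0 / (Z.max(axis=1) - Z.min(axis=1)))
Lam_tilde = Lam @ np.linalg.inv(U)  # rescaled loadings Lambda U^{-1}
Z_tilde = U @ Z                     # rescaled factors U z_t

# The product that drives the coefficients is unchanged ...
assert np.allclose(Lam_tilde @ Z_tilde, Lam @ Z)
# ... and every rescaled factor now spans a range of exactly one.
assert np.allclose(Z_tilde.max(axis=1) - Z_tilde.min(axis=1), 1.0)
```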
This simple linear transformation allows for assessing the relative movement of the indicators in $\tilde{z}_t$ without affecting the overall dynamics.

Figure 1 displays the evolution of the normalized indicators in $\tilde{z}_t$ over time. First, we focus on the features of the observed effect modifiers depicted in the upper panel (a).

Footnote: In particular, this is the case for the product $\Lambda_\tau \tau_t$. Note that we do not face this issue for $\Lambda_r r_t$ and $\Lambda_S S_t$, because $r_t$ and $S_t$ are either observed or already bounded between zero and one, so their scale and sign are identified.

Figure 1: Evolution of the normalized effect modifiers $\tilde{z}_t = U z_t$ over time. Panel (a): observed factors $r_t$ ($r_{NFCI,t}$, $r_{REC,t}$, $r_{RF,t}$). Panel (b): equation-specific latent Markov switching factors $S_t$ ($S_{L,t}$, $S_{\check{S},t}$, $S_{\text{\c{C}},t}$). Panel (c): equation-specific latent random walk factors $\tau_t$ ($\tau_{L,t}$, $\tau_{\check{S},t}$, $\tau_{\text{\c{C}},t}$).

Notes: Results are based on a TVP-NS-VAR model specification with $z_t = (r_t', S_t', \tau_t')'$, $\delta = R_\tau/M = 1$ and $P = 3$, using 15,000 MCMC draws. Panel (a) shows the normalized observed factors $r_{NFCI,t}$, $r_{REC,t}$ and $r_{RF,t}$ (collected in $r_t$), panel (b) the posterior mean of the latent Markov switching factors $S_{L,t}$, $S_{\check{S},t}$ and $S_{\text{\c{C}},t}$ (collected in $S_t$), and panel (c) the posterior median of the latent random walk factors $\tau_{L,t}$, $\tau_{\check{S},t}$ and $\tau_{\text{\c{C}},t}$ (collected in $\tau_t$). Note that $S_{jt}$ and $\tau_{jt}$ are equation-specific latent quantities with $M$ (= 3) endogenous variables, $j \in \{L, \check{S}, \text{\c{C}}\}$. $S_{L,t}$ and $\tau_{L,t}$ correspond to the first equation, $S_{\check{S},t}$ and $\tau_{\check{S},t}$ to the second, and $S_{\text{\c{C}},t}$ and $\tau_{\text{\c{C}},t}$ to the third. The gray shaded vertical bars represent recessions dated by the NBER Business Cycle Dating Committee. Sample period: 1973:01 to 2019:12. Vertical axis: normalized values. Front axis: months.

Peaks in $r_{NFCI,t}$ tend to coincide with recessionary episodes, indicated by the binary recession indicator $r_{REC,t}$. The risk-free interest rate $r_{RF,t}$ can be related to the monetary policy stance. Early in the sample, we observe substantial increases, peaking during the Volcker disinflation in the early 1980s. Subsequently, large and abrupt decreases are notable during recessions, alongside an overall downward trend.

Before turning to the latent indicators in $S_t$ and $\tau_t$, note that the respective elements $S_{jt}$ and $\tau_{jt}$ are equation-specific, with $j \in \{L, \check{S}, \text{\c{C}}\}$ referring to the level, slope and curvature of the yield curve. The middle panel (b) shows the posterior mean of the three Markov switching factors collected in $S_t$. The lower panel (c) shows the posterior medians of the three (transformed) gradually changing latent random walk factors in $\tau_t$.

Several features of the latent indicators are worth highlighting.
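Panel (b) reports posterior means of the Markov switching indicators, and the discussion reads them as regime probabilities. For a binary indicator, the mean of the 0/1 MCMC draws estimates the posterior probability of regime 1, whereas the posterior median is itself 0 or 1 (or 0.5 on an exact tie), so the two summaries are not interchangeable. The hypothetical helper and toy draws below are made up for illustration.

```python
import numpy as np


def summarize_indicator_draws(draws):
    """Posterior summaries of a binary Markov switching indicator S_jt.

    draws: (n_draws x T) array of 0/1 MCMC draws.
    Returns (mean, median) per period; the mean estimates Pr(S_jt = 1 | data).
    """
    d = np.asarray(draws, dtype=float)
    return d.mean(axis=0), np.median(d, axis=0)


# Four hypothetical draws for two periods.
draws = np.array([[1, 0], [1, 0], [0, 1], [1, 0]])
mean, median = summarize_indicator_draws(draws)
print(mean, median)  # [0.75 0.25] [1. 0.]
```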
Each of the latent quantities exhibits distinct dynamics and thus carries information in addition to the observed indicators. Examining the posterior means, which are essentially the marginal posterior probabilities that a given Markov indicator equals one, we observe that both $S_{L,t}$ and $S_{\check{S},t}$ evolve comparatively gradually over time. Before the Volcker disinflation, both tend to increase, with posterior means above 0.5, indicating that regime 1 is more likely. After 1985, we observe a major shift towards regime 0. Interestingly, for $S_{\check{S},t}$ this transition appears immediately after 1985, while we detect a notable delay in $S_{L,t}$.

During the Great Moderation, both indicators tend to remain associated with regime 0. The indicator associated with the middle segment of the yield curve, $S_{\text{\c{C}},t}$, by contrast, switches between regimes at a higher frequency. Particularly during the Volcker disinflation, we observe mixed patterns and no clear or steady tendency towards a single regime. This changes between 1990 and 1997, when the posterior mean of $S_{\text{\c{C}},t}$ is consistently above 0.5, before $S_{\text{\c{C}},t}$ switches abruptly into regime 0. Conditional on the respective loadings in $\Lambda_S$ being non-zero, this feature would directly relate to the observed narrowing spread of the yield curve and the accompanying structural breaks in the coefficients of the equation related to the curvature of the yield curve.

We observe several interesting features of the unobserved factors in $\tau_t$. While $\tau_{L,t}$ is noisy and exhibits substantial high-frequency movements, $\tau_{\check{S},t}$ and $\tau_{\text{\c{C}},t}$ are much smoother. The factor governing the coefficients in the level equation of the yield curve peaks early in the sample, followed by a decline between 1980 and 1990. After a brief increase and stabilization between 1990 and 2000, we see gradual declines until the global financial crisis starting in 2007. Since then, the factor has trended upwards, with several high-frequency troughs. By contrast, the unobserved factor related to the slope, $\tau_{\check{S},t}$, exhibits approximately linear trending behavior from the beginning of the sample until the early 2000s, when it plateaus. After 2010, a gradual but moderate decrease is visible. $\tau_{\text{\c{C}},t}$ is comparable to $\tau_{L,t}$, albeit with several differences. While several peaks coincide, we also find opposing movements, for instance in the brief early 1980s recession and after 2000.
Interestingly, its high-frequency movements are muted when compared to $\tau_{L,t}$.

The preceding discussion of $\tilde{z}_t$ must be considered in light of the rescaled loadings in $\tilde{\Lambda} = \Lambda U^{-1}$. $\tilde{\Lambda}$ translates the law of motion captured in $\tilde{z}_t$ to the coefficients in $\beta_t$ by acting either as an amplifier or as an attenuator. Figure 2 shows the posterior mean of the rescaled loadings $\Lambda U^{-1}$ and allows us to assess which elements in $\tilde{z}_t$ determine the time variation in the TVPs. We differentiate the loadings along two dimensions. Panel (a) shows the effect modifiers related to the VAR coefficients, while panel (b) depicts the block of $\tilde{\Lambda}$ related to the covariances (stored in $q_t$).

Assessing the loadings in $\Lambda_r$ for the $L_t$-equation reveals that a major part of the coefficients loads strongly positively on $r_{NFCI,t}$. For $r_{RF,t}$ and $r_{REC,t}$, the patterns are more mixed. For $r_{NFCI,t}$, we find a maximum loading of around 0.25 for coefficients on lags of $\check{S}_t$, $L_t$ and $\text{\c{C}}_t$, while for $r_{RF,t}$ and $r_{REC,t}$ some coefficients load moderately positively (e.g., the first own lag $L_{t-1}$) and others moderately negatively (e.g., loadings related to lags of the curvature factor).

Figure 2: Heat maps for rescaled loadings in $\tilde{\Lambda} = \Lambda U^{-1}$ (posterior means, color scale from about −0.50 to 0.25). Panel (a): coefficients in the $L_t$-, $\check{S}_t$- and $\text{\c{C}}_t$-equations (intercept $\iota_t$ and lags of $L_t$, $\check{S}_t$ and $\text{\c{C}}_t$), against the loadings $\lambda_{NFCI}$, $\lambda_{RF}$, $\lambda_{REC}$, $\lambda_S$ and $\lambda_\tau$. Panel (b): covariances $w(L_t, \check{S}_t)$, $w(L_t, \text{\c{C}}_t)$ and $w(\check{S}_t, \text{\c{C}}_t)$, against the same loadings.

Notes: $\tilde{\Lambda}$ translates the law of motion captured in $\tilde{z}_t$ to (a) the VAR coefficients in $\beta_t$ and (b) the covariances stored in $q_t$, based on a TVP-NS-VAR model specification with $z_t = (r_t', S_t', \tau_t')'$, $\delta = R_\tau/M = 1$ and $P = 3$. $\iota_t$ denotes the time-varying intercept, and $L_{t-p}$, $\check{S}_{t-p}$, $\text{\c{C}}_{t-p}$ for $p = 1, 2, 3$ denote the lags of $L_t$, $\check{S}_t$ and $\text{\c{C}}_t$, respectively. $\lambda_r$ relates to elements of $\tilde{\Lambda}$ associated with $r_{NFCI,t}$, $r_{REC,t}$ and $r_{RF,t}$ and the respective equations, $\lambda_S$ denotes loadings related to the single Markov switching factor collected in $S_t$, and $\lambda_\tau$ the loadings corresponding to the latent random walk factors in $\tau_t$. Note that $S_{jt}$ and $\tau_{jt}$ for $j \in \{L, \check{S}, \text{\c{C}}\}$ are equation-specific quantities, while $r_t$ stays fixed across equations. $w(\check{S}_t, \text{\c{C}}_t)$ denotes the contemporaneous relation between the $\check{S}_t$- and $\text{\c{C}}_t$-equations; $w(L_t, \text{\c{C}}_t)$ and $w(L_t, \check{S}_t)$ are defined analogously. Sample period: 1973:01 to 2019:12.

The loadings $\Lambda_S$ and $\Lambda_\tau$ related to the estimated latent factors exhibit only modest relevance for defining the law of motion of the coefficients related to $L_t$. In particular, the amplifiers for $\tau_t$ are shrunk heavily towards zero, implying that low-frequency movements in the respective coefficients are either already captured by other measures or irrelevant for the coefficients in the $L_t$-equation. The same is true for the indicators in $S_t$ and $\tau_t$ in the case of the $\check{S}_t$-equation. Turning to observed factors, the corresponding factor loadings are positive for lower-order lags but strongly negative for higher-order lags. By contrast, we find mostly negative loadings of the observed factors on the first lags in the $\text{\c{C}}_t$-equation. Loadings related to higher-order lags are mixed, and no clear patterns are observable.
Interestingly, the curvature equation is the only one in which we detect substantial loadings on the latent factors, with most loadings showing a positive sign.

While unobserved factors appear to be less important for the coefficients in the conditional mean of the model, they play an important role for the covariances, as indicated in panel (b). The covariance between $\check{S}_t$ and $\text{\c{C}}_t$, $w(\check{S}_t, \text{\c{C}}_t)$, shows pronounced loadings on all effect modifiers except the NFCI, on which we observe a modest negative loading. A mixed pattern emerges for the covariance between $L_t$ and $\text{\c{C}}_t$, $w(L_t, \text{\c{C}}_t)$, with both positive and negative loadings. The contemporaneous relationship between $L_t$ and $\check{S}_t$, $w(L_t, \check{S}_t)$, marks a particularly interesting case, with modestly negative loadings on NFCI and RF, while the REC and unobserved factor loadings are close to zero.

Summarizing, we find that observed factors often load strongly on the coefficients of all equations. While the Markov switching indicator appears to be important particularly for the $L_t$- and $\text{\c{C}}_t$-equations, its loadings are muted for the coefficients of $\check{S}_t$. Another interesting aspect is that, conditional on the observed effect modifiers, the gradually evolving coefficients captured by $\tau_t$ are mostly irrelevant for the $L_t$- and $\check{S}_t$-equations, in contrast to the strong positive loadings in the case of the $\text{\c{C}}_t$-equation and the covariances. Additional results showing the actual regression coefficients are provided in Appendix B.

To assess what our framework implies for the relations between the factors that determine the yield curve, we compute the low-frequency relationship between $L_t$, $\check{S}_t$ and $\text{\c{C}}_t$. We choose this long-run correlation measure for two reasons. First, it allows us to summarize movements in the time-varying coefficients (i.e., transmission channels) and changes in the error variances in a single indicator over time.
Second, this measure compresses the information in all coefficients in a structurally meaningful way, since it isolates long-run trends and correlations from short-run fluctuations (Sargent and Surico, 2011; Kliem et al., 2016). Additional results for the reduced-form coefficients are provided in Appendix B.

To construct the measure, we transform the TVP-VAR($P$) in Eq. (1) into its state-space TVP-VAR(1) form. In what follows, the observation equation is given by $y_t = J Y_t$, while the state equation is defined as $Y_t = B_t Y_{t-1} + \omega_t$ with $\omega_t \sim \mathcal{N}(\mathbf{0}, \Omega_t)$. Here, $J$ maps the $K$-dimensional vector $Y_t = (y_t', \dots, y_{t-P+1}')'$ to $y_t$, and $B_t$ collects the elements of $\beta_t$ in the upper $M \times K$ block and contains identities otherwise. Similarly, the $K \times K$-dimensional variance-covariance matrix $\Omega_t$ collects the elements of $\Sigma_t$ in the upper-left $M \times M$ block and is zero otherwise. We follow Sargent and Surico (2011) and first calculate the spectral density $\Phi_t(0)$ of $y_t$ at frequency zero, which, as defined here, coincides with the long-run variance-covariance matrix of $y_t$ (the sum of all its autocovariances):

$$\Phi_t(0) = J (I - B_t)^{-1} \Omega_t (I - B_t')^{-1} J', \quad \text{for } t = 1, \dots, T.$$

Next, we transform the covariances in $\Phi_t(0)$ into a correlation measure for each period $t$ and each variable combination $i, j$ ($i \neq j$ and $i, j = 1, \dots, M$):

$$\phi_{ij,t} = \frac{\Phi_{ij,t}(0)}{\sqrt{\Phi_{ii,t}(0)\,\Phi_{jj,t}(0)}}.$$

The measure $\phi_{ij,t}$ describes the long-run relation between variables $i$ and $j$ at each point in time and is displayed in Figure 3 for the TVP-NS-VAR specification with $z_t = (r_t', S_t', \tau_t')'$, $R_{\tau j} = 1$ and $P = 3$. Note that the variables enter our model in differences. Hence, Figure 3 depicts the low-frequency relations of changes in the level, slope and curvature of the yield curve.

We observe several interesting periods characterized by structural breaks.
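The two formulas above can be sketched for a single period as follows. This is a minimal illustration under stated assumptions: the helper names `companion` and `long_run_corr` are hypothetical, the time-varying intercept is dropped because it does not enter $\Phi_t(0)$, the inputs are one period's VAR lag matrices and error covariance (one would repeat this for every $t$ and every MCMC draw to obtain posterior medians and credible sets), and the numbers are made up.

```python
import numpy as np


def companion(A_list):
    """Stack VAR lag matrices A_1..A_P (each M x M) into the K x K matrix B_t (K = M P)."""
    M, P = A_list[0].shape[0], len(A_list)
    K = M * P
    B = np.zeros((K, K))
    B[:M, :] = np.hstack(A_list)   # upper M x K block: VAR coefficients
    B[M:, :K - M] = np.eye(K - M)  # identities shifting the lags down
    return B


def long_run_corr(A_list, Sigma):
    """phi_ij = Phi_ij(0) / sqrt(Phi_ii(0) Phi_jj(0)) for one period's parameters."""
    M, P = Sigma.shape[0], len(A_list)
    K = M * P
    B = companion(A_list)
    Omega = np.zeros((K, K))
    Omega[:M, :M] = Sigma                             # shocks only hit y_t
    J = np.hstack([np.eye(M), np.zeros((M, K - M))])  # selects y_t from Y_t
    inv = np.linalg.inv(np.eye(K) - B)                # (I - B)^{-1}
    Phi0 = J @ inv @ Omega @ inv.T @ J.T              # (I - B')^{-1} equals inv.T
    d = np.sqrt(np.diag(Phi0))
    return Phi0 / np.outer(d, d)


# Illustrative stationary VAR(2) with M = 2 (numbers made up).
A1 = np.array([[0.5, 0.1], [0.0, 0.3]])
A2 = np.array([[0.1, 0.0], [0.05, 0.1]])
Sigma = np.array([[1.0, 0.2], [0.2, 2.0]])
phi = long_run_corr([A1, A2], Sigma)
```

A useful check: for the companion form, $J(I - B_t)^{-1}J'$ equals $(I - A_1 - \dots - A_P)^{-1}$, so $\Phi_t(0)$ reduces to the familiar long-run covariance $(I - A_1 - \dots - A_P)^{-1} \Sigma_t \big((I - A_1 - \dots - A_P)^{-1}\big)'$.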
First, the relationship between the level and slope of the yield curve was close to zero until the Volcker

Figure 3: Posterior median and the 68 percent credible set for pairwise long-run correlations. Panels: (a) $\phi_{L\check{S},t}$, (b) $\phi_{L\text{\c{C}},t}$, (c) $\phi_{\check{S}\text{\c{C}},t}$, each plotted from 1973:03 to 2019:12.

Notes: Panel (a) $\phi_{L\check{S},t}$, (b) $\phi_{L\text{\c{C}},t}$ and (c) $\phi_{\check{S}\text{\c{C}},t}$, based on a TVP-NS-VAR with $z_t = (r_t', S_t', \tau_t')'$, $\delta = R_\tau/M = 1$ and $P = 3$. The colored solid lines denote the posterior medians, the black dashed line the zero line, and the colored shaded band the 68 percent posterior coverage interval, while the gray vertical bars represent recessions dated by the NBER Business Cycle Dating Committee. $\phi_{ij,t}$ is the long-run correlation between variables $i$ and $j$ at time $t$, for $i, j \in \{L, \check{S}, \text{\c{C}}\}$. Results are based on 15,000 MCMC draws. Sample period: 1973:01 to 2019:12. Vertical axis: correlation measurements. Front axis: months.

This paper proposes methods for automatically selecting adequate state equations in TVP-VAR models in a data-driven fashion. The TVPs are assumed to depend on a set of observed and unobserved covariates, also known as effect modifiers. As unobserved covariates, we consider a set of low-dimensional latent factors that follow a random walk, alongside Markov switching indicators that allow for abrupt structural breaks. Our model nests several alternatives commonly used in the literature on modeling macroeconomic and financial time series. To choose between state equations, we use a hierarchical Bayesian global-local shrinkage prior on the most flexible specification.

We apply our econometric framework to US yield curve data.
Carrying out a thorough predictive exercise, we show that our techniques produce favorable point and density forecasts vis-à-vis a set of established benchmark models (which are nested variants of our proposed modeling approach). The performance is specific to the information set used in the underlying TVP-VAR and appears to be more pronounced for density forecasts. This exercise illustrates that our approach produces very competitive forecasts without increasing the risk of overfitting, while providing a framework to trace the sources of time variation, a key advantage compared to conventional TVP-VARs. This predictive exercise is complemented by a full-sample analysis of structural breaks in the relationship between the level, slope and curvature of the US yield curve. We detect several interesting patterns of abrupt and gradual time variation in long-run cross-variable relations. These changes appear to be specific to the monetary regime and the state of the business cycle.

Acknowledgments: The authors gratefully acknowledge financial support by the Jubiläumsfonds of the Oesterreichische Nationalbank (OeNB, project 18127) and by the Austrian Science Fund (FWF, project ZK 35).

References

Patrick A Adams, Tobias Adrian, Nina Boyarchenko, and Domenico Giannone. Forecasting macroeconomic risks. International Journal of Forecasting, in press, published online 6 Feb, 2021.

Tobias Adrian, Nina Boyarchenko, and Domenico Giannone. Vulnerable growth. American Economic Review, 109(4):1263–89, 2019.

Omar Aguilar and Mike West. Bayesian dynamic factor models and portfolio allocation. Journal of Business & Economic Statistics, 18(3):338–357, 2000.

Giovanni Caggiano, Efrem Castelnuovo, and Giovanni Pellegrino. Estimating the real effects of uncertainty shocks at the zero lower bound. European Economic Review, 100:257–272, 2017.

Andrea Carriero, Todd Clark, and Massimiliano Marcellino. Large vector autoregressions with stochastic volatility and flexible priors. Journal of Econometrics, 212(1):137–154, 2019.

Andrea Carriero, Todd E Clark, and Massimiliano Giuseppe Marcellino. Capturing macroeconomic tail risks with Bayesian vector autoregressions. Working Paper 202002R, Federal Reserve Bank of Cleveland, 2020.

Chris K Carter and Robert Kohn. On Gibbs sampling for state space models. Biometrika, 81(3):541–553, 1994.

Carlos M Carvalho, Nicholas G Polson, and James G Scott. The horseshoe estimator for sparse signals. Biometrika, 97(2):465–480, 2010.

Joshua CC Chan, Eric Eisenstat, and Rodney Strachan. Reducing the state space dimension in a large TVP-VAR. Journal of Econometrics, 218(1):105–118, 2020.

Timothy Cogley and Thomas J. Sargent. Drifts and volatilities: Monetary policies and outcomes in the post WWII US. Review of Economic Dynamics, 8(2):262–302, 2005.

Thomas Dangl and Michael Halling. Predictive regressions with time-varying coefficients. Journal of Financial Economics, 106:157–181, 2012.

Francis X Diebold and Canlin Li. Forecasting the term structure of government bond yields. Journal of Econometrics, 130(2):337–364, 2006.

Francis X Diebold, Canlin Li, and Vivian Z Yue. Global yield curve dynamics and interactions: A dynamic Nelson–Siegel approach. Journal of Econometrics, 146(2):351–363, 2008.

Sylvia Frühwirth-Schnatter. Data augmentation and dynamic linear models. Journal of Time Series Analysis, 15(2):183–202, 1994.

Sylvia Frühwirth-Schnatter and Helga Wagner. Stochastic model specification search for Gaussian and partial non-Gaussian state space models. Journal of Econometrics, 154(1):85–100, 2010.

Richard Gerlach, Chris Carter, and Robert Kohn. Efficient Bayesian inference for dynamic mixture models. Journal of the American Statistical Association, 95(451):819–828, 2000.

John Geweke and Gianni Amisano. Comparing and evaluating Bayesian predictive distributions of asset returns. International Journal of Forecasting, 26(2):216–230, 2010.

John Geweke and Guofu Zhou. Measuring the pricing error of the arbitrage pricing theory. Review of Financial Studies, 9(2):557–587, 1996.

Paolo Giordani and Robert Kohn. Efficient Bayesian inference for multiple change-point and mixture innovation models. Journal of Business & Economic Statistics, 26(1):66–77, 2008.

Refet S Gürkaynak, Brian Sack, and Jonathan H Wright. The US treasury yield curve: 1961 to the present. Journal of Monetary Economics, 54(8):2291–2304, 2007.

Trevor Hastie and Robert Tibshirani. Varying-coefficient models. Journal of the Royal Statistical Society: Series B (Methodological), 55(4):757–779, 1993.

Niko Hauzenberger. Flexible mixture priors for time-varying parameter models. arXiv preprint, 2006.10088, 2020.

Niko Hauzenberger, Florian Huber, Gary Koop, and Luca Onorante. Fast and flexible Bayesian inference in time-varying parameter regression models. arXiv preprint, 1910.10779, 2019.

Constantino Hevia, Martin Gonzalez-Rozada, Martin Sola, and Fabio Spagnolo. Estimating and forecasting the yield curve using a Markov switching dynamic Nelson and Siegel model. Journal of Applied Econometrics, 30(6):987–1009, 2015.

Florian Huber, Gregor Kastner, and Martin Feldkircher. Should I stay or should I go? A latent threshold approach to large-scale mixture innovation models. Journal of Applied Econometrics, 34(5):621–640, 2019.

Gregor Kastner and Sylvia Frühwirth-Schnatter. Ancillarity-sufficiency interweaving strategy (ASIS) for boosting MCMC estimation of stochastic volatility models. Computational Statistics & Data Analysis, 76:408–423, 2014.

Chang-Jin Kim and Charles R. Nelson. State-space models with regime switching: Classical and Gibbs-sampling approaches with applications. MIT Press, Cambridge and London, 1999.

Martin Kliem, Alexander Kriwoluzky, and Samad Sarferaz. On the low-frequency relationship between public deficits and inflation.

Journal of Applied Econometrics , 31(3):566–583, 2016.Gary Koop, Roberto Leon-Gonzalez, and Rodney W Strachan. On the evolution of the monetarypolicy transmission mechanism.

Journal of Economic Dynamics and Control , 33(4):997–1017,2009.Dimitris Korobilis. High-dimensional macroeconomic forecasting using message passing algorithms.

Journal of Business & Economic Statistics , in press, published online 22 Nov, 2019.John M Maheu and Yong Song. An eﬃcient Bayesian approach to multiple structural change inmultivariate time series.

Journal of Applied Econometrics , 33(2):251–270, 2018.Enes Makalic and Daniel F Schmidt. A simple sampler for the horseshoe estimator.

IEEE SignalProcessing Letters , 23(1):179–182, 2015.Giorgio Primiceri. Time varying structural autoregressions and monetary policy.

Review of Eco-nomic Studies , 72(3):821–852, 2005.Thomas J Sargent and Paolo Surico. Two illustrations of the quantity theory of money: Breakdownsand revivals.

American Economic Review , 101(1):109–28, 2011.Christopher A Sims and Tao Zha. Were there regime switches in US monetary policy?

AmericanEconomic Review , 96(1):54–81, 2006.James H Stock and Mark Watson. Dynamic factor models. In Michael P. Clements and David F.Hendry, editors,

The Oxford Handbook of Economic Forecasting , pages 35–60. Oxford University ress, 2011. ppendices A Technical appendix

A.1 Sampling the state innovation variances

For sampling the state innovation variances based on Eq. (1), we let $\eta_{ji,t}$ denote the shock to the $i$th coefficient in $\tilde{\gamma}_t$ with respect to the $j$th equation. The posterior of the state innovation variances is a generalized inverse Gaussian (GIG) distribution:

$$\omega_{ji} \mid \varpi_{ji}, \vartheta_j \sim \mathcal{GIG}\left(\frac{1}{2} - \frac{T}{2},\ \sum_{t=1}^{T} \eta_{ji,t}^{2},\ \frac{1}{\varpi_{ji}\,\vartheta_j}\right).$$

A.2 Posterior for the horseshoe prior
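A minimal sketch of the two updates in this appendix, in plain Python (standard library only). It assumes the GIG posterior in A.1 has parameters (1/2 − T/2, Σ_t η²_{ji,t}, 1/(ϖ_{ji}ϑ_j)), which is what a N(0, ϖ_{ji}ϑ_j) horseshoe prior on √ω_{ji} combined with Gaussian state shocks implies, and it uses the inverse-Gamma conditionals of the Makalic and Schmidt (2015) auxiliary representation set out below, taking G⁻¹(a, b) to have shape a and scale b. All function names are hypothetical. Because the standard library has no GIG sampler, the sketch returns the GIG parameters and the GIG mode rather than a draw; a full sampler would hand these parameters to a dedicated GIG routine.

```python
import math
import random


def draw_inv_gamma(shape, scale, rng):
    """Draw x ~ G^{-1}(shape, scale), whose density is proportional to x^(-shape-1) exp(-scale/x)."""
    # If y ~ Gamma(shape, rate=scale), then 1/y ~ G^{-1}(shape, scale).
    # random.gammavariate expects a *scale* parameter, so the rate enters as 1/scale.
    return 1.0 / rng.gammavariate(shape, 1.0 / scale)


def horseshoe_step(b, d, e, f, rng):
    """One Gibbs sweep over the horseshoe scales for coefficients b_1..b_K.

    d: global variance, e: list of local auxiliaries e_i, f: global auxiliary.
    Returns updated (c, d, e, f), where c holds the local variances c_i.
    """
    K = len(b)
    # c_i | b_i, d, e_i ~ G^{-1}(1, 1/e_i + b_i^2 / (2d))
    c = [draw_inv_gamma(1.0, 1.0 / e_i + b_i ** 2 / (2.0 * d), rng) for b_i, e_i in zip(b, e)]
    # d | b, c, f ~ G^{-1}((K+1)/2, 1/f + sum_i b_i^2 / (2 c_i)), using the fresh c
    d = draw_inv_gamma((K + 1) / 2.0, 1.0 / f + sum(b_i ** 2 / (2.0 * c_i) for b_i, c_i in zip(b, c)), rng)
    # e_i | c_i ~ G^{-1}(1, 1 + 1/c_i) and f | d ~ G^{-1}(1, 1 + 1/d)
    e = [draw_inv_gamma(1.0, 1.0 + 1.0 / c_i, rng) for c_i in c]
    f = draw_inv_gamma(1.0, 1.0 + 1.0 / d, rng)
    return c, d, e, f


def gig_params_state_variance(eta, varpi, vartheta):
    """Assumed GIG(lam, chi, psi) parameters of omega_ji given its state shocks eta_{ji,1..T}."""
    T = len(eta)
    lam = 0.5 - T / 2.0             # 1/2 from the Gamma(1/2, .) prior on omega, -T/2 from the likelihood
    chi = sum(x * x for x in eta)   # sum of squared state shocks
    psi = 1.0 / (varpi * vartheta)  # inverse of the horseshoe variance of sqrt(omega_ji)
    return lam, chi, psi


def gig_log_kernel(x, lam, chi, psi):
    """Log of the unnormalized GIG density x^(lam-1) exp(-(chi/x + psi*x)/2)."""
    return (lam - 1.0) * math.log(x) - 0.5 * (chi / x + psi * x)


def gig_mode(lam, chi, psi):
    """Mode of GIG(lam, chi, psi): the positive root of psi*x^2 - 2(lam-1)*x - chi = 0."""
    return ((lam - 1.0) + math.sqrt((lam - 1.0) ** 2 + chi * psi)) / psi


if __name__ == "__main__":
    rng = random.Random(1)
    sqrt_omega = [0.3, 0.0, 0.05]  # toy values of sqrt(omega_ji) for three coefficients
    d, e, f = 1.0, [1.0] * len(sqrt_omega), 1.0
    for _ in range(50):
        c, d, e, f = horseshoe_step(sqrt_omega, d, e, f, rng)
    # Feed the local scale of the first coefficient and the global scale into the A.1 step.
    lam, chi, psi = gig_params_state_variance([0.2, -0.1, 0.3, 0.05], c[0], d)
    print("GIG parameters:", lam, chi, psi, "mode:", gig_mode(lam, chi, psi))
```

Within a sweep, the local scales are drawn before the global one and each auxiliary variable right after its scale, so every draw conditions on the latest values, as a systematic-scan Gibbs sampler requires.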

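Our specification of the horseshoe prior on $\Lambda_j$, $\gamma_j$ and the square root of $\omega_j$ in Section 2.4 for a generic parameter $b_i$, $i = 1, \dots, K$, is:

$$b_i \mid c_i, d \sim \mathcal{N}(0, c_i d), \qquad \sqrt{c_i} \sim \mathcal{C}^{+}(0, 1), \qquad \sqrt{d} \sim \mathcal{C}^{+}(0, 1).$$

We rely on this prior in its auxiliary representation as in Makalic and Schmidt (2015) for efficient sampling of the local ($c_i$) and global ($d$) shrinkage parameters:

$$c_i \mid e_i \sim \mathcal{G}^{-1}\left(\tfrac{1}{2}, \tfrac{1}{e_i}\right), \qquad d \mid f \sim \mathcal{G}^{-1}\left(\tfrac{1}{2}, \tfrac{1}{f}\right), \qquad e_i \sim \mathcal{G}^{-1}\left(\tfrac{1}{2}, 1\right), \qquad f \sim \mathcal{G}^{-1}\left(\tfrac{1}{2}, 1\right).$$

The generalized inverse Gaussian distribution is specified such that, for a random variable $x \sim \mathcal{GIG}(\lambda, \chi, \psi)$, its density function is proportional to $x^{\lambda - 1} \exp\{-(\chi/x + \psi x)/2\}$. $\mathcal{G}^{-1}(a, b)$ denotes the inverse Gamma distribution with shape $a$ and scale $b$. This setup yields the following conditional posterior distributions:

$$c_i \mid b_i, d, e_i \sim \mathcal{G}^{-1}\left(1,\ \frac{1}{e_i} + \frac{b_i^2}{2d}\right), \qquad d \mid \{b_i, c_i\}_{i=1}^{K}, f \sim \mathcal{G}^{-1}\left(\frac{K+1}{2},\ \frac{1}{f} + \sum_{i=1}^{K} \frac{b_i^2}{2 c_i}\right),$$

$$e_i \mid c_i \sim \mathcal{G}^{-1}\left(1,\ 1 + c_i^{-1}\right), \qquad f \mid d \sim \mathcal{G}^{-1}\left(1,\ 1 + d^{-1}\right).$$

B Further empirical results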

This appendix contains additional results for the reduced-form coefficients. While Subsection 3.4 provides posterior estimates of the lower-dimensional effect modifiers, Figs. B.1–B.3 display the coefficients obtained by multiplying the factor loadings with the observed/latent factors, based on the relationship established in Eq. (1).

Figure B.1: Posterior median of the coefficients associated with the own lags of L_t, Š_t and Ç_t.
Panels: (a) Coefficients of first own lag; (b) Coefficients of second own lag; (c) Coefficients of third own lag.
Notes: Panels (a), (b) and (c) show the dynamic evolution of the coefficients related to the variables' own lags p ∈ {1, 2, 3} of the respective equations for L_t, Š_t and Ç_t. Results are based on the TVP-NS-VAR model variant with δ = R_τ/M = 1, using 15,000 MCMC draws. The black dashed line denotes the zero line, while the gray shaded vertical bars represent recessions dated by the NBER Business Cycle Dating Committee. Sample period 1973:01 to 2019:12. Vertical axis: posterior median estimate. Front axis: months.

Figure B.2: Posterior median of the coefficients associated with cross-variable lags of L_t, Š_t and Ç_t.
Panels: (a) Coefficients of first other lags; (b) Coefficients of second other lags; (c) Coefficients of third other lags. Each panel shows one line per equation (L_t, Š_t, Ç_t).
Notes: Panels (a), (b) and (c) show the dynamic evolution of the coefficients related to cross-variable lags p ∈ {1, 2, 3} of the respective equations for L_t, Š_t and Ç_t. Results are based on the TVP-NS-VAR model variant with δ = R_τ/M = 1, using 15,000 MCMC draws. The black dashed line denotes the zero line, while the gray shaded vertical bars represent recessions dated by the NBER Business Cycle Dating Committee. Sample period 1973:01 to 2019:12. Vertical axis: posterior median estimate. Front axis: months.

Figure B.3: Posterior median of the contemporaneous relationships between L_t, Š_t and Ç_t.
Plotted series: w(L_t, Š_t), w(L_t, Ç_t) and w(Š_t, Ç_t).
Notes: w(Š_t, Ç_t) denotes the contemporaneous relation between the Š_t- and Ç_t-equations; w(L_t, Ç_t) and w(L_t, Š_t) are defined analogously. Results are based on the TVP-NS-VAR model variant with δ = R_τ/M = 1, using 15,000 MCMC draws. The black dashed line denotes the zero line, while the gray shaded vertical bars represent recessions dated by the NBER Business Cycle Dating Committee. Sample period 1973:01 to 2019:12. Vertical axis: posterior median estimate. Front axis: months.
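The construction behind Figs. B.1–B.3 can be sketched as follows. Since Eq. (1) is not reproduced in this appendix, the sketch assumes, hypothetically, that each coefficient path is the inner product of a loading vector with the vector of observed/latent factors at each period; the function names are illustrative only. One natural reading of "posterior median of the coefficients" is that the product is formed within each MCMC draw and the median is taken afterward, which is what the code does. This generally differs from multiplying the posterior medians of loadings and factors.

```python
import statistics


def coefficient_path(loadings, factors):
    """Coefficient path beta_t = sum_k loadings[k] * factors[t][k] for each period t.

    Hypothetical stand-in for the Eq. (1) relationship: `loadings` is one row
    of factor loadings, `factors` a list of per-period factor vectors.
    """
    return [sum(l * z for l, z in zip(loadings, z_t)) for z_t in factors]


def posterior_median_path(draws):
    """Pointwise posterior median of the coefficient path over MCMC draws.

    `draws` is a list of (loadings, factors) pairs, one per retained draw.
    The product is formed within each draw before taking the median.
    """
    paths = [coefficient_path(loadings, factors) for loadings, factors in draws]
    n_periods = len(paths[0])
    return [statistics.median([path[t] for path in paths]) for t in range(n_periods)]


if __name__ == "__main__":
    # Three toy draws with two factors and two periods (illustrative numbers only).
    toy_factors = [[1.0, 2.0], [3.0, 4.0]]
    toy_draws = [([1.0, 0.0], toy_factors), ([2.0, 0.0], toy_factors), ([3.0, 0.0], toy_factors)]
    print(posterior_median_path(toy_draws))  # prints [2.0, 6.0]
```

With draws paired as (1, 10), (2, 1) and (10, 2), for instance, the median of the products is 10, whereas the product of the separate medians is 2 × 2 = 4.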