Flexible Mixture Priors for Large Time-varying Parameter Models
NIKO HAUZENBERGER ∗ University of Salzburg
Time-varying parameter (TVP) models often assume that the TVPs evolve according to a random walk. This assumption, however, might be questionable since it implies that coefficients change smoothly and in an unbounded manner. In this paper, we relax this assumption by proposing a flexible law of motion for the TVPs in large-scale vector autoregressions (VARs). Instead of imposing a restrictive random walk evolution of the latent states, we carefully design hierarchical mixture priors on the coefficients in the state equation. These priors effectively allow for discriminating between periods where coefficients evolve according to a random walk and times where the TVPs are better characterized by a stationary stochastic process. Moreover, this approach is capable of introducing dynamic sparsity by pushing small parameter changes towards zero if necessary. The merits of the model are illustrated by means of two applications. Using synthetic data, we show that our approach yields precise parameter estimates. When applied to US data, the model reveals interesting patterns of low-frequency dynamics in coefficients and forecasts well relative to a wide range of competing models.
JEL : C11, C30, C53, E44, E47
KEYWORDS: Time-varying parameter vector autoregressions, mixture priors, hierarchical modeling, clustering

∗ Salzburg Centre of European Union Studies, University of Salzburg. Address: Mönchsberg 2a, 5020 Salzburg, Austria. Email: [email protected]. I thank Paul Hofmarcher, Florian Huber, Karin Klieber, Gary Koop, Luca Onorante, Michael Pfarrhofer and Anna Stelzer for valuable comments and suggestions. The author gratefully acknowledges financial support from the Austrian Science Fund (FWF, grant no. ZK 35) and the Oesterreichische Nationalbank (OeNB, Anniversary Fund, project no. 18127).

1. INTRODUCTION

A growing number of papers introduces time-varying parameters (TVPs) in econometric models to capture structural breaks in relations across macroeconomic fundamentals (see, for example, Cogley and Sargent, 2005; Primiceri, 2005; Sims and Zha, 2006; Korobilis, 2013; Eickmeier et al., 2015; Mumtaz and Theodoridis, 2018; Paul, 2019) and to achieve more accurate macroeconomic forecasts (see, for instance, Koop and Korobilis, 2012; 2013; D'Agostino et al., 2013; Groen et al., 2013; Bauwens et al., 2015; Hauzenberger et al., 2019; Huber et al., 2020a;b).

In this paper, we focus on estimating TVP vector autoregressive (VAR) models with a large number of endogenous variables. Due to severe overfitting issues in large TVP-VARs, special emphasis is paid to important modeling decisions, such as whether coefficients evolve gradually, change abruptly or remain constant for a subset of periods. In macroeconomic applications, it is common to assume that coefficients evolve according to a random walk, implying that parameters change smoothly over time. As noted by the recent literature (see, for example, Lopes et al., 2016; Hauzenberger et al.,
2019), however, this assumption might be overly simplistic and lead to model misspecification.

In large TVP-VARs it is often reasonable to assume that most parameters remain constant over time, while only a few vary. To capture this behaviour, the Bayesian literature frequently uses shrinkage priors on the state innovation variances to push them sufficiently towards zero (Frühwirth-Schnatter and Wagner, 2010; Belmonte et al., 2014). A severe drawback of this strategy is that it only accounts for the case that a given coefficient is constant for all points in time (labeled static sparsity).

Another common situation faced by researchers is that coefficients change only at certain points in time (referred to as dynamic sparsity). Using a mixture distribution on the innovation variances, for example, allows one to push small parameter changes towards zero (see, inter alia, Gerlach et al., 2000; Giordani and Kohn, 2008; Koop et al., 2009; Huber et al., 2019). Other dynamic sparsification techniques include different forms of dynamic shrinkage processes (see, inter alia, Kalli and Griffin, 2014; Uribe and Lopes, 2017; Rockova and McAlinn, 2018; Kowal et al., 2019; Hauzenberger et al., 2020), latent threshold models (Nakajima and West, 2013) or dynamic model selection techniques (Chan et al., 2012; Koop and Korobilis, 2013). Alternatively, Hauzenberger et al. (2019) introduce a more flexible law of motion by assuming a conjugate hierarchical location mixture prior directly on the time-varying part of the coefficients. This location mixture allows for dynamically adjusting the prior mean of the TVPs to capture situations with a low, moderate or even large number of structural breaks in the coefficients. However, both techniques come with drawbacks. For instance, the mixture innovation model of Huber et al. (2019), equipped with a latent threshold mechanism, discriminates between a high and a low innovation variance state. However, the authors do not discard the random walk law of motion, which might be too restrictive. Hauzenberger et al. (2019) use either a conjugate g-prior (Zellner, 1986) or a conjugate Minnesota prior (Doan et al., 1984; Litterman, 1986), potentially lacking the flexibility to disentangle abrupt from gradual changes.

In this paper, we carefully design suitable mixture priors for the state equation. In a first variant, a mixture prior is introduced not only on the state innovations, but also on the autoregressive coefficients in the state equation to obtain sufficient flexibility. To achieve parsimony in large models, a latent binary indicator determines the law of motion for the TVPs and detects periods where coefficients evolve according to a random walk and times where the TVPs are better characterized by a stationary stochastic process. Combined with a mixture on the innovation volatilities and suitable shrinkage priors, this approach is capable of automatically capturing a wide range of typical parameter changes. In a second variant, the sparse finite location mixture model of Hauzenberger et al. (2019) is extended by considering non-conjugate shrinkage priors and by replacing the location mixture with a location-scale mixture. Here, an additional mixture on the state variances captures the notion that structural breaks in coefficients happen infrequently (with potentially large TVP innovations), while most of the time coefficients are constant (with TVP innovations pushed towards zero), similar to mixture innovation models.

In the previous paragraphs we repeatedly stated that our techniques are well suited to handle overfitting issues in large TVP-VARs. But large TVP models also raise the question of computational feasibility. In this contribution, computational complexity is reduced by using recent advances in estimating large-scale TVP regressions (see Chan and Jeliazkov, 2009; McCausland et al., 2011; Hauzenberger et al., 2020). These are based on rewriting the TVP model in its static regression form. In this representation, the TVP model is treated as a very big regression model and the techniques proposed in Bhattacharya et al. (2016) can be used. Since these algorithms are designed for single-equation models, we estimate the VAR model using its structural representation and thus estimate a set of unrelated TVP regressions (see Carriero et al., 2019).

Based on two applications we investigate the merits of the techniques developed in the paper. First, in an application using synthetic data we illustrate that the proposed methods work well in detecting small and large structural breaks in coefficients. Second, we employ a large US macroeconomic dataset for an empirical application. Our proposed methods reveal interesting patterns in the low-frequency relationship between unemployment and inflation. Moreover, to evaluate the predictive performance of our approach, we perform a comprehensive forecasting exercise. This forecasting horse race shows that the proposed framework works well relative to a wide range of competing models.

The remainder of the paper is structured as follows. Section 2 introduces a TVP regression model with flexible mixture priors and sketches the main contributions of the paper. Section 3 outlines inference in these models, while Section 4 discusses the posterior sampling algorithm of Bhattacharya et al. (2016) applied to non-centered TVP regressions. Sections 5 and 6 show the results for artificial data and US data, respectively. Finally, Section 7 summarizes and concludes.
2. ECONOMETRIC FRAMEWORK

2.1. A TVP Regression
Let y_t denote a scalar time series and x_t refer to a K-dimensional vector of predictors. The observation equation of a TVP regression can then be written as:

y_t = x_t' α_t + ε_t,  ε_t ~ N(0, σ_t²).  (1)

Here, α_t is a K-dimensional vector of TVPs that relates x_t to the quantity of interest and ε_t denotes the measurement error with mean zero and time-varying variance σ_t². For the evolution of σ_t², we assume a stochastic volatility (SV) specification and refer to Appendix A.1 and Kastner and Frühwirth-Schnatter (2014) for details.

Typically, α_t is assumed to evolve according to a random walk (RW). In this paper, interest centers on relaxing this assumption. In the following, to achieve both sufficient flexibility and model parsimony, we use two different mixture specifications for α_t. In the first variant, we assume that coefficients evolve according to a mixture of a random walk and a white noise process. In the second variant, interest centers on further relaxing the law of motion proposed in Hauzenberger et al. (2019).

2.2. A Flexible State Equation
For a mixture between a random walk and a white noise process, we assume that the evolution of α_t is given by:

α_t = α_0 + φ_t (α_{t−1} − α_0) + ς_t,  ς_t ~ N(0, Ψ_t),  (2)

with α_0 denoting a K-dimensional intercept vector, φ_t being a K × K diagonal autoregressive coefficient matrix and ς_t denoting a K-dimensional vector of state innovations, centered on zero with K × K variance-covariance matrix Ψ_t. Moreover, we assume φ_t and Ψ_t to evolve according to a regime-switching process:

φ_t = S_t  (3)

and

Ψ_t = S_t Ψ̄_1 + (I_K − S_t) Ψ̄_0.  (4)

Here, S_t = diag(s_{1t}, ..., s_{Kt}) denotes a binary indicator matrix with each s_{it} being either zero or one, while Ψ̄_1 = diag(ψ̄_{1,1}, ..., ψ̄_{1,K}) and Ψ̄_0 = diag(ψ̄_{0,1}, ..., ψ̄_{0,K}) refer to K-dimensional diagonal matrices. Equation (3) implies that coefficients evolve according to a mixture of a random walk and a white noise process, while Equation (4) ensures sufficient flexibility of the state innovations. For example, if the covariate-specific indicator s_{it} = 1 in the t-th period, the i-th coefficient follows a random walk with state innovation variance ψ_{it} = ψ̄_{1,i}, while if s_{it} = 0 it follows a white noise process with variance ψ_{it} = ψ̄_{0,i}.

This specification (henceforth labeled TVP-MIX) nests a wide variety of popular TVP models, such as standard RW state equations and mixture innovation models. A standard random walk evolution is trivially obtained by setting S_t = I_K for all t. A so-called mixture innovation model assumes φ_t = I_K and specifies Ψ_t similar to Equation (4) (Gerlach et al., 2000; Giordani and Kohn, 2008; Koop et al., 2009; Huber et al., 2019). Additionally, mixture innovation specifications restrict Ψ̄_0 = κ Ψ̂, with κ being a small value close to zero and Ψ̂ a diagonal matrix collecting variable-specific scaling parameters.
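To make the switching law of motion concrete, the state equation (2)–(4) can be simulated as follows (a minimal sketch; function and variable names are ours, not from the paper):

```python
import numpy as np

def simulate_tvp_mix(alpha0, psi1, psi0, s, rng=None):
    """Simulate the state equation (2)-(4): for each coefficient i,
    s[t, i] = 1 takes a random walk step with innovation variance psi1[i],
    while s[t, i] = 0 draws white noise around alpha0[i] with variance psi0[i]."""
    rng = np.random.default_rng(rng)
    T, K = s.shape
    alpha = np.empty((T, K))
    prev = alpha0.astype(float).copy()
    for t in range(T):
        sd = np.sqrt(np.where(s[t] == 1, psi1, psi0))
        # phi_t = S_t: the autoregressive part is active only in the RW regime
        alpha[t] = alpha0 + s[t] * (prev - alpha0) + rng.normal(scale=sd)
        prev = alpha[t]
    return alpha
```

With s set to all ones this reduces to a standard random walk; with s set to all zeros the coefficients fluctuate around α_0 as white noise.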
In the empirical application, these nested specifications are considered as important benchmarks. Related to the literature on variable selection (George and McCulloch, 1993; 1997), Ψ̄_1 is commonly referred to as the slab component and Ψ̄_0 as the spike component (see, for example, Huber et al., 2019).

Apart from discussing the relation to other popular TVP models, it is also worth highlighting additional features of the model proposed in (2) to (4). If a parameter is almost constant but also features larger, abrupt changes in some periods, we would expect ψ̄_{0,i} > ψ̄_{1,i}. This case is of particular interest when compared to a standard mixture innovation model with random walk state equation. Conversely, if a coefficient features large, more persistent swings but also some periods of parameter stability, we would expect ψ̄_{0,i} < ψ̄_{1,i}. Intuitively, the relative proportions of ψ̄_{0,i} and ψ̄_{1,i} depend mainly on the nature of the coefficient changes. Alternatively, if the i-th coefficient is constant or negligible (static sparsity), this can be achieved with ψ̄_{0,i} and/or ψ̄_{1,i} close to zero (Lopes et al., 2016). Note that in the special case of constant coefficients the proposed specification is not identified; we address this issue in the context of interpreting the state indicators S_t.

2.3. A Hierarchical Pooling Specification
For a hierarchical pooling specification, we follow Hauzenberger et al. (2019) and assume that the time-varying part of α_t follows a sparse finite mixture in the spirit of Malsiner-Walli et al. (2016). The state equation (labeled TVP-POOL) reads:

α_t = α_0 + γ_t.  (5)

Here, α_0 denotes a K-dimensional constant coefficient vector and γ_t is a K-dimensional vector of random coefficients featuring a specific structure. That is, conditional on a latent group indicator θ_t that takes a value n ∈ {1, ..., N}, γ_t follows a multivariate Gaussian distribution:

γ_t = μ_n + ς_t,  ς_t ~ N(0, Ψ_t),  if θ_t = n,  (6)

where μ_n refers to the group-specific mean and Ψ_t denotes the variance-covariance matrix. The probability that γ_t is assigned to cluster n is given by P(θ_t = n) = ω_n.

This structure is closely related to the setup of Hauzenberger et al. (2019). In the following, we extend their location mixture prior to a location-scale mixture prior by introducing a regime-switching specification on Ψ_t similar to Equation (4). That is,

Ψ_t = S_t Ψ̄_1 + (I_K − S_t) Ψ̄_0,  (7)

with both Ψ̄_1 and Ψ̄_0 being diagonal matrices and S_t denoting a binary indicator matrix. Similar to standard mixture innovation models, one component serves to detect larger breaks, while the second component handles dynamic sparsity. We therefore discard the conjugate prior assumption of Hauzenberger et al. (2019) and instead assume non-conjugate shrinkage priors on both state variances (described in more detail in Subsection 3.1).

Before proceeding, it is also worth sketching the general idea of this random coefficient specification. The model can be seen as a stochastic variant of multiple break point specifications (Koop and Potter, 2007), which is capable of capturing situations with a low, moderate or even large number of structural breaks.
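Conditional on the component means μ_n and weights ω, drawing the group indicators θ_t is the standard finite-mixture assignment step. A minimal sketch (names are ours; for illustration we take the component covariance to be the identity):

```python
import numpy as np

def sample_group_indicators(g, mu, omega, rng=None):
    """Draw theta_t from P(theta_t = n | .) proportional to
    omega_n * N(g_t | mu_n, I_K), for t = 1, ..., T."""
    rng = np.random.default_rng(rng)
    T = g.shape[0]
    diff = g[:, None, :] - mu[None, :, :]                    # (T, N, K)
    logp = np.log(omega)[None, :] - 0.5 * (diff ** 2).sum(axis=-1)
    # normalize in a numerically stable way before sampling
    p = np.exp(logp - logp.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    return np.array([rng.choice(len(omega), p=p[t]) for t in range(T)])
```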
To estimate the number of regimes, we follow Malsiner-Walli et al. (2016) and Hauzenberger et al. (2019) and specify an "overfitting" model by setting N to a large integer (i.e., we consider many regimes a priori). To achieve parsimony, we obtain an estimate of the number of clusters N̂ (usually N̂ < N) by specifying shrinkage priors on both the mixture weights and the component means. Overall shrinkage is thus determined by two interacting objectives: we aim at eliminating irrelevant clusters, while at the same time attempting to avoid highly overlapping component means.

At this stage one might ask why we do not assume N different state innovation variances (i.e., use the group indicators θ_t for both γ_t and Ψ_t). Two considerations are important here. First, N denotes a large integer, and such a choice might lead to overfitting issues without additional hierarchical shrinkage/pooling priors on the state innovation variances. Second, covariate-specific binary indicators S_t for the scales already render the model highly flexible and allow us to introduce shrinkage on the state innovation variances in a simpler way. Moreover, the two-state mixture on the state variances (see Equation 7) is designed to support inference on the locations μ_n, for n ∈ {1, ..., N}. We expect that many elements in {γ_t}_{t=1}^T cluster around zero (i.e., coefficients are constant with Ψ_t close to zero), while occasionally there are structural breaks in some coefficients (requiring relatively large values in Ψ_t). It is especially these two extremes (changes/no changes in α_t) that we aim to detect with γ_t.

2.4. The Latent State Indicator Matrix
So far we have remained silent on the evolution of S_t. There are many possibilities for how the binary indicators s_{it}, for i ∈ {1, ..., K}, evolve over time. In the following, we assume two laws of motion:

1. Pooled Markov-switching process:
When assuming a first-order Markov process for each s_{it} independently, sampling the state indicators can be computationally cumbersome, especially if K is large. Since one has to rely on forward filtering backward sampling algorithms, computation time quickly adds up. Therefore, we replace S_t with s_t I_K. In the following, s_t is assumed to be common to all K covariates in period t and governed by a joint Markov process. This process is driven by a transition probability matrix given by:

P = ( p_00      1 − p_00 )
    ( 1 − p_11  p_11     ),

with the transition probability from state k to state l denoted by p_kl and the diagonal elements following a Beta distribution, p_kk ~ B(c_{k,0}, c_{k,1}), for k ∈ {0, 1} (see Uribe and Lopes, 2017).

2. Independent over time and covariate-specific indicators:
The assumption that a joint indicator governs the evolution of a large number of coefficients might be too inflexible in certain cases. For this reason, we also specify covariate-specific indicators, coupled with independent mixture priors (see Lopes et al., 2016); alternatively, Koop et al. (2009) group coefficients and assume class-specific indicators. In contrast to covariate-specific Markov processes, these mixture priors are independent over time and thus do not involve computationally demanding forward filtering backward sampling algorithms. In the following, s_{it} is assumed to follow an independent Bernoulli distribution with P(s_{it} = 1) = p_i and p_i being Beta distributed, i.e., p_i ~ B(c_{i,0}, c_{i,1}).

Moreover, it should be noted that the prior choice on the binary indicators is quite influential. For the random walk/white noise mixture (TVP-MIX), the hyperparameters are chosen in such a way that gradual changes (with s_t = 1) have a higher (unconditional) expected duration than abrupt changes (with s_t = 0); in the empirical application we therefore combine Beta hyperparameters equal to 30 with values close to zero, both for the Markov-switching process and for the independent mixture distribution. For the location-scale mixture (TVP-POOL), with S_t solely governing the state innovation variances, we take a more agnostic approach by setting the Beta hyperparameters to small, more balanced values (at most 3).
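A minimal sketch of the pooled Markov-switching indicator described under point 1 (names are ours; the Beta updating of p_00 and p_11 is omitted):

```python
import numpy as np

def simulate_joint_indicator(T, p00, p11, rng=None):
    """Simulate the common indicator {s_t} from a two-state Markov chain with
    P(s_t = 0 | s_{t-1} = 0) = p00 and P(s_t = 1 | s_{t-1} = 1) = p11."""
    rng = np.random.default_rng(rng)
    s = np.empty(T, dtype=int)
    # initialize from the stationary distribution of the chain
    pi1 = (1 - p00) / ((1 - p00) + (1 - p11))
    s[0] = rng.random() < pi1
    for t in range(1, T):
        prob_one = p11 if s[t - 1] == 1 else 1 - p00
        s[t] = rng.random() < prob_one
    return s
```

For p00 < p11, the chain spends a larger unconditional share of periods in the random walk regime (s_t = 1), mirroring the asymmetric prior choice above.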
3. BAYESIAN INFERENCE
To discuss inference for both variants outlined in Section 2, we introduce a very general state equation for α_t:

α_t = α_0 + γ_t + φ_t (α_{t−1} − α_0) + ς_t,  ς_t ~ N(0, Ψ_t).  (8)

Equation (8) nests both approaches: the first variant (TVP-MIX) is obtained by setting γ_t = 0_{K×1}, while the second (TVP-POOL) results from defining φ_t = 0_{K×K} and γ_t = μ_n if θ_t = n.

3.1. The Non-Centered Parameterization
In this subsection we exploit the non-centered parameterization to write Ψ̄_1 and Ψ̄_0 as part of the observation equation, enabling shrinkage on the regime-switching state innovation volatilities (Frühwirth-Schnatter and Wagner, 2010). We therefore recast the model as follows:

y_t = x_t'(α_0 + Ψ̃_t α̃_t) + σ_t ε_t,  ε_t ~ N(0, 1),
α̃_t = γ̃_t + φ_t α̃_{t−1} + η_t,  η_t ~ N(0, I_K),  α̃_0 = 0,  φ_0 = I_K,  (9)

where the term in parentheses in the first line equals α_t. Here, α̃_t is a K-dimensional vector of normalized states, defined as α̃_t = Ψ̃_t^{−1}(α_t − α_0), and γ̃_t = Ψ̃_t^{−1} γ_t, with Ψ̃_t = diag(√ψ_{1t}, ..., √ψ_{Kt}) denoting the (matrix) square root of Ψ_t. Using the definition of Ψ_t in Equation (4) (or Equation 7), the observation equation in (9) can be rewritten as:

y_t = x_t'(α_0 + S_t Ψ̄_1^{1/2} α̃_t + (I_K − S_t) Ψ̄_0^{1/2} α̃_t) + σ_t ε_t,

and, more compactly, as a standard regression model:

y_t = x̂_t' α̂ + σ_t ε_t,

with x̂_t = (x_t', (S_t x_t ⊙ α̃_t)', ((I_K − S_t) x_t ⊙ α̃_t)')' denoting a 3K-dimensional covariate vector and α̂ = (α_0', √ψ̄_{1,1}, ..., √ψ̄_{1,K}, √ψ̄_{0,1}, ..., √ψ̄_{0,K})' the corresponding 3K-dimensional coefficient vector.

On the time-invariant α̂ we use a hierarchical global-local shrinkage prior (see Polson and Scott, 2010):

α̂_j ~ N(0, τ_j²),  τ_j² | λ ~ f,  λ ~ g,  for j = 1, ..., 3K,

where α̂_j refers to the j-th element of α̂, λ denotes a global shrinkage parameter and τ_j² induces local shrinkage. In the empirical application, we focus on the Normal-Gamma shrinkage prior (Griffin and Brown, 2010). This prior has proven successful in macroeconomic and financial applications (see, for example, Huber and Feldkircher, 2019) and is quite common in the literature. The exact prior specification is outlined in Appendix A.2.
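The construction of the 3K-dimensional covariate vector x̂_t can be sketched as follows (names are ours):

```python
import numpy as np

def build_xhat(x_t, alpha_tilde_t, s_t):
    """Stack x_t, S_t x_t * alpha~_t and (I_K - S_t) x_t * alpha~_t into
    the 3K-dimensional vector xhat_t, so that the regime-specific square-root
    state variances can be estimated as constant regression coefficients."""
    return np.concatenate([
        x_t,
        s_t * x_t * alpha_tilde_t,          # slab part, active where s_it = 1
        (1 - s_t) * x_t * alpha_tilde_t,    # spike part, active where s_it = 0
    ])
```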
3.2. The Static Representation
If interest centers on estimating the latent states {α̃_t}_{t=1}^T, we can straightforwardly recast Equation (9) in a static regression form by conditioning on α_0, the state innovation volatilities {Ψ̃_t}_{t=1}^T and the stochastic volatilities in Σ = diag(σ_1, ..., σ_T). We define y as a T-dimensional vector, X as a T × K matrix and ε as a T-dimensional vector with y_t, x_t' and ε_t in the t-th position, respectively. Then, the static form of Equation (9) is:

y = X α_0 + W α̃ + Σ ε,  ε ~ N(0, I_T),
Φ α̃ = γ̃ + η,  η ~ N(0, I_ν).

Here, α̃ = (α̃_1', ..., α̃_T')' is a ν(= TK)-dimensional latent state vector, γ̃ = (γ̃_1', ..., γ̃_T')' is a ν-dimensional intercept vector and η is a ν-dimensional shock vector. After defining x̃_t' = x_t' Ψ̃_t, the precise structure of W and Φ is given by: W is the T × ν block-diagonal matrix with x̃_t' in the t-th row,

W = diag(x̃_1', x̃_2', ..., x̃_T'),

and Φ is the ν × ν block lower-bidiagonal matrix with I_K on the main block diagonal and −φ_t on the first lower block diagonal,

Φ = ( I_K    0_{K×K} ...  0_{K×K} )
    ( −φ_2   I_K     ...  0_{K×K} )
    ( ...    ...     ...  ...     )
    ( 0_{K×K} ...    −φ_T I_K     ).

Solving for α̃ yields:

α̃ = Φ^{−1}(γ̃ + η),

implying that α̃ ~ N(a_0, Ω_0) with prior mean a_0 = Φ^{−1} γ̃ and prior variance-covariance matrix Ω_0 = (Φ'Φ)^{−1} (see, for instance, Chan and Jeliazkov, 2009; Chan and Strachan, 2020). It is worth noting that any global-local shrinkage prior might be used for α̂. Other popular choices are the SSVS prior (George and McCulloch, 1993; 1997), the Horseshoe prior (Carvalho et al., 2010), the Bayesian Lasso (Park and Casella, 2008) or the Triple-Gamma prior (Cadonna et al., 2020); see also Huber et al. (2020a), Kastner and Huber (2020) and Cross et al. (2020) for thorough studies of global-local shrinkage priors in macroeconomic applications.
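Because Φ is block lower-bidiagonal, the prior mean a_0 = Φ^{−1} γ̃ can be obtained by forward substitution rather than an explicit matrix inverse. A minimal sketch (names are ours):

```python
import numpy as np

def prior_mean_states(gamma_tilde, phi):
    """Solve Phi a0 = gamma~ by forward substitution: the t-th block row of
    Phi reads -phi_t a0_{t-1} + a0_t = gamma~_t, hence
    a0_1 = gamma~_1 and a0_t = gamma~_t + phi_t * a0_{t-1}.
    Both inputs are (T, K) arrays; phi holds the diagonals of phi_t."""
    T, K = gamma_tilde.shape
    a0 = np.empty((T, K))
    a0[0] = gamma_tilde[0]
    for t in range(1, T):
        a0[t] = gamma_tilde[t] + phi[t] * a0[t - 1]
    return a0
```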
In the special case of φ_t = 0_{K×K} for all t, Φ (and thus Ω_0) reduces to an identity matrix, while φ_t ≠ 0_{K×K} for any t induces a (specific) banded lower-triangular (block-diagonal) structure of Φ (Ω_0). It is worth emphasizing that the prior variance-covariance matrix Ω_0 solely depends on the state indicators S_t. Moreover, the prior mean a_0 also depends on the structure of Φ and γ̃. The simplest choice is to set a_0 to a zero vector, which we implicitly assume for the TVP-MIX variants. For the
TVP-POOL approach we use a hierarchical mixture prior on a_0, described in detail next.

3.3. A Hierarchical Prior Mean
The model outlined in Equations (5) to (7) is a sparse finite location-scale mixture. After recasting the model in the non-centered parameterization, we can replace the location-scale mixture prior on α_t (outlined in Equation 6) with a location mixture prior on the normalized latent states α̃_t, since the scales in Equation (7) (Ψ̄_1 and Ψ̄_0) are now part of the observation equation. That is:

α̃_t | θ_t = n ~ N(μ̃_n, I_K),  (10)

with group-specific mean μ̃_n, for n ∈ {1, ..., N}, and variance-covariance matrix I_K. In the following, the prior mean is defined as a_0 (= γ̃) = (a_1', ..., a_T')', with a_t = μ̃_n if θ_t = n. Note that φ_t = 0_{K×K}, for t ∈ {1, ..., T}, always holds for the TVP-POOL model, but is not ruled out for the
TVP-MIX specification. Moreover, if Ω_0 = I_ν, then a_0 = γ̃.

The sparse finite location mixture in Equation (10) allows us to use a similar prior setup as proposed in Malsiner-Walli et al. (2016) and Hauzenberger et al. (2019). To ensure model parsimony we use a Dirichlet prior on ω = (ω_1, ..., ω_N)':

ω | ξ ~ Dir(ξ, ..., ξ),

with ξ referring to an intensity parameter. The prior on the intensity parameter is specified as:

ξ ~ G(d, dN),

with d = 10 in the empirical application. Here, we closely follow Malsiner-Walli et al. (2016), who show that this prior choice is successful in detecting superfluous components and obtaining a parsimonious mixture representation.

Moreover, on the group means we specify the following shrinkage prior:

μ̃_n | α̃ ~ N(0_{K×1}, Λ),

with μ̃_n being centered on zero and prior variance-covariance matrix Λ = LRL. Here, L = diag(√l_1, ..., √l_K) and R = diag(r_1, ..., r_K), with r_i denoting the range of α̃_i = (α̃_{i1}, ..., α̃_{iT})'. Moreover, we specify a Gamma prior on the elements of L:

l_i ~ G(e_0, e_1),

with e_0 and e_1 set to small values close to zero (see Malsiner-Walli et al., 2016).
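Conditional on the group indicators, the symmetric Dirichlet prior on ω is conjugate, so the weights are updated from the cluster counts in one step (a sketch using the textbook Dirichlet-multinomial update; names are ours):

```python
import numpy as np

def sample_weights(theta, N, xi, rng=None):
    """Draw omega | theta, xi ~ Dir(xi + n_1, ..., xi + n_N),
    where n_k = #{t : theta_t = k} are the cluster counts under
    the symmetric Dirichlet prior Dir(xi, ..., xi)."""
    rng = np.random.default_rng(rng)
    counts = np.bincount(theta, minlength=N)
    return rng.dirichlet(xi + counts)
```

With a small intensity ξ, components that attract no observations receive weights pushed towards zero, which is what drives the "overfitting mixture" towards a parsimonious representation.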
4. POSTERIOR COMPUTATION
In this section, we outline the MCMC sampling step for α̃. We stress that drawing α̂ is computationally fast even for relatively large K, but sampling the ν-dimensional vector α̃ is computationally demanding (Hauzenberger et al., 2020). Thus, for α̂ (and the remaining parameters) we use standard MCMC techniques, with sampling steps and conditional posteriors outlined in Appendix A.3.

For α̃, irrespective of the structure of a_0 and Ω_0, we obtain standard conditional Gaussian posterior quantities. With ỹ = Σ^{−1}(y − X α_0) and W̃ = Σ^{−1} W:

α̃ | ỹ, W̃, a_0, Ω_0 ~ N(ā, Ω̄), with Ω̄^{−1} = W̃'W̃ + Ω_0^{−1} (a ν × ν matrix) and ā = Ω̄(W̃'ỹ + Ω_0^{−1} a_0).

The main issue, however, is that the inversion of Ω̄^{−1} is computationally costly, since it is a ν × ν matrix with ν = TK and both T and K potentially large integers. Thus, to avoid high-dimensional full matrix inversions and Cholesky decompositions when drawing the normalized latent states α̃, we rely on the algorithm proposed in Bhattacharya et al. (2016), applied to TVP models in Hauzenberger et al. (2020). This method involves the following steps:

1. Draw a ν-dimensional vector u ~ N(0, I_ν).
2. Sample a T-dimensional vector v ~ N(0, I_T).
3. Define q = a_0 + Φ^{−1} u, with Φ^{−1} denoting the lower Cholesky factor of Ω_0, and set r = W̃ q + v.
4. Compute Ω̃ = (I_T + W̃ Ω_0 W̃')^{−1}.
5. Set f = Ω̃(ỹ − r).
6. Obtain a draw of α̃ as α̃ = Ω_0 W̃' f + q.

Moreover, in the static representation of a TVP regression the involved matrices are sparse, which can be exploited to achieve additional computational gains (see Chan and Jeliazkov, 2009; Hauzenberger et al., 2019; 2020). Depending on the structure of Φ, there are two extreme cases, as briefly discussed in Subsection 3.2. Computationally, the most expensive case is a random walk state equation (φ_t = I_K for all t), while having no autoregressive structure in the state equation (φ_t = 0_{K×K} for all t) is computationally less demanding. Recall that in the former case Φ has a specific lower-triangular structure (rendering Ω_0 block diagonal), while in the latter case both Φ and Ω_0 are diagonal. Thus, even for a random walk state equation (the most dense case), using sparse algorithms pays off in terms of computation time. Moreover, if φ_t = S_t for some t, the computational burden lies between these two extremes and eventually depends on the exact structure of Φ (and Ω_0). See Hauzenberger et al. (2020) for a comparison between the two extremes.

4.1. Equation-wise Estimation for a TVP-VAR
The methods outlined in the previous subsection are designed for single-equation models. To use these algorithms for posterior inference in TVP-VARs, we rewrite the multivariate model as a set of unrelated TVP regressions (see Carriero et al., 2019). This can be done by using the structural form of the TVP-VAR:

Y_t = B_{0t} Y_t + Σ_{i=1}^p B_{it} Y_{t−i} + C_t + ε_t,  ε_t ~ N(0, Σ_t).  (11)

Here, Y_t = (Y_{1t}, ..., Y_{mt})' denotes an m-dimensional vector of endogenous variables, with B_{0t} being an m × m strictly lower-triangular matrix (with zero main diagonal) defining contemporaneous relationships between the elements of Y_t. Moreover, B_{it}, for i = 1, ..., p, denotes an m × m time-varying coefficient matrix, C_t is an m-dimensional intercept vector and ε_t refers to an m-dimensional Gaussian distributed error vector, centered on zero and with time-varying m-dimensional diagonal variance-covariance matrix Σ_t = diag(σ_{1t}², ..., σ_{mt}²). Before proceeding, it is convenient to define B_t = (B_{1t}, ..., B_{pt}).

In the following, for i = 2, ..., m, the i-th equation of Y_t is given by:

y_{it} = x_{it}'(α_i + α̃_{it}) + ε_{it},  ε_{it} ~ N(0, σ_{it}²),

with x_{it} denoting a K_i(= mp + i)-dimensional covariate vector, x_{it} = ({y_{jt}}_{j=1}^{i−1}, Y_{t−1}', ..., Y_{t−p}', 1)', and α_{it} = ({b_{ij,t}}_{j=1}^{i−1}, B_{i•,t}, c_{it})' a K_i-dimensional vector of time-varying coefficients. Here, b_{ij,t} refers to the (i, j)-th element of B_{0t}, B_{i•,t} denotes the i-th row of B_t and c_{it} the i-th element of C_t. Moreover, for the first equation (i = 1) we have x_{1t} = (Y_{t−1}', ..., Y_{t−p}', 1)' and α_{1t} = (B_{1•,t}, c_{1t})'.
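For completeness, steps 1–6 of the sampling algorithm above can be sketched with dense matrices (an actual implementation would exploit the sparsity of W̃ and Ω_0; names are ours):

```python
import numpy as np

def draw_states(y_tilde, W_tilde, a0, Omega0):
    """One draw from N(a_bar, Omega_bar), where
    Omega_bar^{-1} = W~'W~ + Omega0^{-1} and
    a_bar = Omega_bar (W~'y~ + Omega0^{-1} a0),
    at the cost of a T x T (instead of nu x nu) system solve."""
    T, nu = W_tilde.shape
    L = np.linalg.cholesky(Omega0)                 # lower Cholesky factor of Omega0
    u = np.random.standard_normal(nu)              # step 1
    v = np.random.standard_normal(T)               # step 2
    q = a0 + L @ u                                 # step 3: a draw from the prior
    r = W_tilde @ q + v
    M = np.eye(T) + W_tilde @ Omega0 @ W_tilde.T   # step 4 (solved, not inverted)
    f = np.linalg.solve(M, y_tilde - r)            # step 5
    return Omega0 @ W_tilde.T @ f + q              # step 6
```

The key saving is that only the T × T matrix M is factorized, while the ν × ν posterior precision is never formed.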
5. SIMULATION STUDY
In this section we use synthetic data to illustrate the features of the proposed mixture variants. For the data generating process (DGP) we set the number of observations to T = 100 and the number of covariates to K = 5. The covariates are simulated as X_j ~ N(0, I_T) for j = 1, ..., (K −
1) and X_K = ι_T, with ι_T being a T-dimensional vector of ones. For the error variance σ_t², we assume an SV specification with log(σ_t²) = h_t following a random walk process. That is, h_t = h_{t−1} + ϑ_t with ϑ_t ~ N(0, 0.
1) and an initial value h_0 set to the log of a small variance. For α_t we assume quite specific laws of motion. We define a fixed K-dimensional initial level α_0 and assume that both the regime-switching autoregressive parameters and the regime-switching variances in the state equation are governed by a joint Markov process s_t. Here, we let S_t = s_t I_K, φ_t = S_t and Ψ_t = S_t Ψ̄_1 + (
1) and ¯ Ψ = diag (1 , . , . , − , − ). Thejoint indicator s t is simulated with transition probabilities p = 0 . p = 0 .
1, and the random walk variances collected in Ψ̄_1 = diag(ψ̄_{1,1}, ..., ψ̄_{1,5}), whose first element equals one, whose second and third elements are small positive values, and whose fourth and fifth elements are negligibly small. The joint indicator s_t is simulated with transition probabilities p_00 and p_11 = 0.95, effectively leading to a higher unconditional probability that α_t follows a random walk evolution.

In particular, the first coefficient features larger, more persistent parameter changes (with ψ̄_{1,1} = 1), while for a small number of periods it is basically constant (achieved through the white noise state equation with ψ̄_{0,1} close to zero). The second parameter features small gradual changes over time, and the third parameter behaves similarly, with small state innovation variances in both regimes. The fourth coefficient is assumed to be constant over time (ψ̄_{1,4} and ψ̄_{0,4} are both close to zero) and, finally, the fifth parameter features some extremely large breaks (with ψ̄_{0,5} = 1), while it is otherwise constant (here achieved through the random walk state equation with ψ̄_{1,5} close to zero).

To assess the flexibility of our approaches we compare them to models assuming a standard random walk evolution of the coefficients and to models assuming constant coefficients. Moreover, we consider a typical mixture innovation model as an important benchmark. For each model we use a Normal-Gamma prior (Griffin and Brown, 2010) on the constant part and allow for SV in the measurement error variance. Furthermore, each TVP model features a Normal-Gamma prior on the square roots of the state innovation variances, which are potentially regime-switching (see Equation 9).

Panels (a) to (c) in Figure 1 depict the evolution of the regression parameters estimated with our proposed methods, while panel (d) shows estimates from a typical mixture innovation model. The red solid lines denote the true coefficients, the blue shaded areas represent the 68% posterior credible intervals (with the blue solid lines referring to the posterior median), the gray shaded areas represent the respective credible sets of a standard TVP model with random walk state equation, and the black dotted lines indicate the 16th/50th/84th percentiles of a constant coefficient model.

The results with artificial data reveal at least three important features.
First, all TVP models yield reasonable estimates for constant coefficients, which is most important for forecasting applications. Focussing on the fourth parameter, considering a more flexible model pays off by producing less biased and more precise estimates, especially when compared to the constant coefficient model. Second, the TVP-MIX specifications are capable of capturing both rapid shifts and smooth adjustments in the regression coefficients. Our methods in panels (a) and (b) tend to adjust quickly when facing high-frequency changes, rendering the methods even more flexible than a typical mixture innovation specification in panel (d). Third, the TVP-POOL model in panel (c) tends to detect sudden changes in the parameters quite well, but is less capable of capturing low-frequency movements. This feature differs from a standard random walk evolution assumption on the TVPs: assuming a standard random walk implies smoothly evolving coefficients, which makes capturing high-frequency changes difficult. Interestingly, the time-varying intercept (the fifth coefficient) tends to soak up movements of other parameters. Models that do not truly detect the large breaks of the third coefficient are particularly prone to this issue (TVP-POOL but also
TVP-RW specifications).

[Figure 1 panels: (a) TVP-MIX with flexible state variances (FLEX) and S_t = s_t I_K (MS); (b) TVP-MIX with flexible state variances (FLEX) and covariate-specific indicators (MIX); (c) TVP-POOL with flexible state variances (FLEX) and covariate-specific indicators (MIX); (d) TVP-RW with SSVS-type state variances (SSVS) and covariate-specific indicators (MIX). Each panel shows the five coefficients together with their true parameters.]

Figure 1:
The blue shaded areas denote the 68% posterior credible intervals of the proposed methods, with the blue solid lines denoting the posterior medians. The gray shaded areas refer to the 68% credible sets of a standard TVP regression with a random walk state equation. The black dotted lines indicate the 16th/50th/84th percentiles of a constant coefficient model. Moreover, the red lines denote the true coefficients α_t.

6. EMPIRICAL APPLICATION

Structural analysis and forecasting key macroeconomic indicators is of great relevance for policy makers. In the empirical work, we focus on output growth, inflation, unemployment, and/or the interest rate. Focussing on these variables, we investigate the merits of our approach using the popular quarterly US data set described in McCracken and Ng (2016). The data set includes 165 macroeconomic and financial variables and ranges from 1959:Q1 to 2019:Q4. In Subsection 6.1 we show some stylized in-sample features of our methods for a small-scale model. By including the four target variables in a small-scale VAR (henceforth
S-VAR) we present posterior probabilities of the state indicator matrix S_t and estimate the low-frequency relationship between unemployment and inflation. Moreover, in Subsection 6.2, this variable set forms the basis for evaluating the predictive performance of our methods in a comprehensive forecast exercise. For the forecasting exercise, we consider two additional information sets. In our largest specification (L-VAR) we pick 20 macroeconomic indicators, which are commonly considered by the recent literature for forecasting (see, for example, Huber et al., 2020a; Pfarrhofer, 2020). In particular, we include financial market indicators that carry important information about the future stance of the economy (see Bańbura et al., 2010). Moreover, we consider a factor-augmented VAR (FA-VAR). Here, we augment the target variables with six principal components comprising the information of the remaining variables in the data set, effectively leading to a VAR with ten endogenous variables. In such larger-scale models our methods are capable of handling less frequent (but important) parameter instabilities in a genuine way.

Especially forecasting these important macroeconomic aggregates remains a challenging task, since (at least) two issues arise. First, we have to decide on the set of variables we want to include in our econometric model. The recent literature on constant parameter VARs highlights that exploiting large information sets yields forecast gains (see, for example, Bańbura et al., 2010; Koop, 2013). Second, it is well documented that important economic indicators feature instabilities in structural parameters and innovation volatilities. In the literature there is strong agreement that SV is important in macroeconomic applications (see Clark, 2011). There is also strong empirical support for shifting parameters in small-scale models (see D'Agostino et al., 2013). However, there is less consensus on time-varying parameters in larger-scale models: as the amount of information increases, overall time variation in parameters tends to diminish. Recent contributions dealing with large-scale TVP-VARs argue that in smaller models the TVP part controls for an omitted variable bias (see Feldkircher et al., 2017; Huber et al., 2020b).

In the empirical application we start with 1962:Q1 and use the first observations for transformations. In Appendix C we provide further details on the specific variable set included in the largest specification and the transformations applied. The number of principal components is motivated by the specification in Stock and Watson (2012), who also consider six factors. See, for example, Stock and Watson (2012), Ng and Wright (2013) and Aastveit et al. (2017), which put special emphasis on the recent financial crisis.
In the following empirical application, note that we consider two lags for every model and allow for SV.
6.1. In-sample evidence
Before proceeding, we briefly elaborate on a potential identification problem when interpreting the state indicators S_t (see Frühwirth-Schnatter, 2001). For the TVP-MIX models, identification is ensured by construction (if coefficients indeed feature time variation). Assuming φ_t = S_t (see Equation 3) automatically imposes inequality constraints on the autoregressive coefficients in the state equation. However, non-identifiability can occur when coefficients are constant. In such a case, elements in S_t are hard to interpret, since a no-change evolution is supported by both a random walk and a white noise process. Interpreting S_t for the TVP-POOL specification is an even more challenging task, since in these models S_t solely controls the evolution of the state innovations. Here, inference about the state indicator matrix is only useful in combination with inference about the size of the state innovation variances Ψ̄₀ and Ψ̄₁ and with imposing an inequality restriction ex post (for example, ψ̄₀ᵢ < ψ̄₁ᵢ).

Therefore, we solely focus on two variants of a TVP-MIX model to illustrate the switching behaviour. Figure 2 depicts the posterior median of the diagonal elements in S_t. Panel (a) shows a TVP-MIX model with S_t = s_t I_K and s_t following a first-order Markov process (MS). Panel (b) depicts a specification with the elements in S_t following an independent mixture distribution (MIX). A comparison between both approaches highlights that a joint indicator evidently leads to a different posterior median of S_t than covariate-specific indicators. By restricting S_t = s_t I_K, all covariates are driven by a single indicator that pushes them towards either a random walk or a white noise state equation in period t. Conversely, with covariate-specific indicators, we see more dispersion across covariates. However, both approaches agree on a white noise state equation in times of turmoil, suggesting a need for abruptly adjusting parameters in these periods.
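The difference between a joint (MS) and covariate-specific (MIX) indicator and the resulting regime-dependent state variances can be illustrated with a small numpy sketch; the regime labels (subscript 1 for the random walk regime, 0 for the white noise regime) and all numeric values are illustrative assumptions:

```python
import numpy as np

K = 4
psi1 = np.full(K, 1.0)   # random walk regime variances (illustrative)
psi0 = np.full(K, 1e-4)  # white noise regime variances (illustrative)

# Joint indicator (MS): S_t = s_t * I_K -- a single switch moves all covariates.
s_t = 1
S_joint = s_t * np.eye(K)

# Covariate-specific indicators (MIX): each diagonal element switches on its own.
s_it = np.array([1, 0, 1, 1])
S_mix = np.diag(s_it)

# Regime-dependent state innovation variances:
# Psi_t = S_t Psi_bar_1 + (I_K - S_t) Psi_bar_0.
Psi_t = S_mix @ np.diag(psi1) + (np.eye(K) - S_mix) @ np.diag(psi0)
```

Under MIX, the second covariate here sits in the low-variance (white noise) regime while the others follow the random walk regime, mirroring the dispersion across covariates visible in panel (b) of Figure 2.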
This model feature is in line with the discussion in Primiceri (2005), who suggests that an economically stable period favours more gradual changes in the coefficients (which are more consistent with a random walk state equation), while shifts in policy rules require quickly adjusting coefficients (which is better captured by a white noise state equation). Since estimating TVP models with typical MCMC methods remains computationally demanding, several studies take this argument as a reason to opt for approximating the TVP part or to rely on dimension reduction techniques, yielding fast inference while accepting a certain risk of misspecification (see, inter alia, Eisenstat et al., 2019; Korobilis, 2019; Hauzenberger et al., 2020; Huber et al., 2020b; Korobilis and Koop, 2020).

To further illustrate the proposed methods, we estimate the low-frequency relationship between unemployment and inflation. This low-frequency measure corresponds to a long-run coefficient of distributed-lag regression models (Whiteman, 1984) and disentangles systematic co-movements from short-run fluctuations. Panels (a) to (c) in Figure 3 depict the low-frequency component obtained with our proposed approaches, while panel (d) shows estimates from a standard TVP model with a random walk evolution assumption. Starting with a comparison between the random walk/white noise mixture (
TVP-MIX) and a classic random walk TVP model, we observe a similar pattern for both approaches during tranquil periods. During recessions, however, the approaches differ significantly. Both TVP-MIX models are capable of detecting a major structural break in the low-frequency relationship after the oil crisis in the 1970s and strongly support a long-lasting stagflation period (i.e., a positive relationship between unemployment and inflation). While TVP-MIX methods are designed to quickly capture these large, abrupt breaks in parameters, a standard random walk state equation translates into a low-frequency component that only gradually adapts over time. However, the TVP-MIX model with covariate-specific indicators (MIX) is slightly more sensitive with respect to abrupt changes in parameters than the TVP-MIX MS model. Panel (c) shows the sparse scale-location mixture (TVP-POOL) approach with covariate-specific indicators (MIX). We observe that this method almost resembles a constant coefficient specification with SV. In the mid-1980s and during the financial crisis, movement in the low-frequency relationship is slightly more erratic compared to other periods, but it stays mostly constant and significant. Overall, considering TVP-MIX methods seems to improve the economic interpretability of the low-frequency component, while a TVP-POOL model aggressively pushes coefficients towards a constant evolution, which could pay off for forecasting.
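The low-frequency measure can be illustrated as the standard long-run multiplier of a distributed-lag regression; the helper below and its coefficient values are a hypothetical sketch, not the paper's exact construction (which follows Whiteman, 1984):

```python
import numpy as np

def long_run_coefficient(b_own, b_cross):
    # Long-run effect from one VAR equation: the sum of the cross-variable
    # lag coefficients divided by one minus the sum of the own-lag
    # coefficients, i.e. the level effect once short-run dynamics die out.
    return np.sum(b_cross) / (1.0 - np.sum(b_own))

b_own = np.array([0.4, 0.2])      # two own lags (p = 2, as in the application)
b_cross = np.array([-0.3, -0.1])  # two lags of the other variable (illustrative)
lr = long_run_coefficient(b_own, b_cross)  # approximately -1.0
```

With time-varying coefficients, evaluating this ratio at each t traces out the low-frequency relationship plotted in Figure 3.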
6.2. Forecasting evidence
In the forecast exercise we consider a wide range of models varying along the evolution assumption of the parameters and the information set considered. With respect to the evolution of the parameters, it proves convenient to summarize the different specifications (see Table 1). The models differ along three dimensions: the autoregressive parameters φ_t, the innovation variances Ψ_t and the state indicator matrix S_t. First, our main specifications vary between a model that assumes a binary indicator matrix on the autoregressive parameters with φ_t = S_t (labeled as TVP-MIX) and a model that introduces a hierarchical prior on the TVP part (TVP-POOL). For the latter we implicitly assume that φ_t = 0_K×K, for all t, in Equation 5. Regarding the autoregressive parameters, a natural competing model is a standard random walk assumption with φ_t = I_K, for all t (TVP-RW). Second, the models differ in the treatment of the state innovation variances. The most flexible innovation variance specification does not restrict Ψ̄₀ and Ψ̄₁ (labeled as FLEX), a second specification assumes Ψ̄₀ = κ Ψ̂ (SSVS), while the most restrictive specification fixes Ψ_t = Ψ̄, for all t, to a single variance state (SINGLE). In the empirical exercise, we fix κ at a value between zero and one and set Ψ̂ = diag(ψ̂₁, …, ψ̂_K) with ψ̂ᵢ, for i = {1, …, K}, denoting ordinary least squares (OLS) variances obtained from an AR(p) model (see Huber et al., 2019). Third, with regards to the state indicator matrix S_t, we discriminate between a joint Markov-switching indicator (labeled as MS) and covariate-specific indicators following an independent mixture distribution (MIX). Sargent and Surico (2011) and Kliem et al. (2016) suggest that a TVP-VAR framework additionally allows to account for changes in the transmission channels (time-varying coefficients) and changes in the error volatilities (SV). For further details see Appendix A.4.

[Figure 2 panels: (a) S_t = s_t I_K with s_t following an MS process; (b) elements in S_t following an independent mixture specification. Each panel shows, per equation, the lagged covariates G_t−1, C_t−1, U_t−1, F_t−1, G_t−2, C_t−2, U_t−2, F_t−2 and the intercept (ic).]

Figure 2: Posterior distribution of s_it, for i = {1, …, K}, for two small-scale TVP-MIX models. Here, G denotes output growth (GDPC1), C inflation (CPIAUCSL), U the unemployment rate (UNRATE), F refers to the interest rate (FEDFUNDS) and ic to an intercept. Moreover, the structural form in Equation 11 implies that some parameters are not part of the i-th equation (denoted by grey shaded areas), due to the strictly lower triangular structure of B_t.

Recall that for the
TVP-MIX models S_t adjusts both the autoregressive parameters and the state innovation variances, while for the TVP-POOL and TVP-RW models S_t only controls the state innovations. In the following, we define TVP-MIX, TVP-POOL and TVP-RW as the Class of the TVP model and the combination of the acronyms for the innovation variances and the indicator matrix as the Subclass of the specification. A single model is identified by a combination of all three acronyms. For example, a TVP-MIX FLEX MIX specification denotes a model with a random walk/white noise mixture for the state equation, with unrestricted two-state variances and with the elements in S_t following an independent mixture distribution.

[Figure 3 panels: (a) TVP-MIX with flexible state variances (FLEX) and S_t = s_t I_K (MS); (b) TVP-MIX with flexible state variances (FLEX) and covariate-specific indicators (MIX); (c) TVP-POOL with flexible state variances (FLEX) and covariate-specific indicators (MIX); (d) standard TVP-VAR with random walk state equation.]

Figure 3: Low-frequency relationship between the unemployment rate and inflation. The blue line refers to the posterior median, while the blue shaded area indicates the 68% posterior credible set. The red line indicates zero.

Table 1: Overview of specifications.
Class (φ_t)              Subclass    Ψ_t                              S_t                    Related to
TVP-MIX (φ_t = S_t)      FLEX MS     S_t Ψ̄₁ + (I_K − S_t) Ψ̄₀        s_t I_K
                         FLEX MIX    S_t Ψ̄₁ + (I_K − S_t) Ψ̄₀        diag(s_1t, …, s_Kt)
                         SINGLE      Ψ̄
                         SSVS MIX    S_t Ψ̄₁ + κ (I_K − S_t) Ψ̂       diag(s_1t, …, s_Kt)    Chan et al. (2012)
TVP-POOL (φ_t = 0_K×K)   FLEX MS     S_t Ψ̄₁ + (I_K − S_t) Ψ̄₀        s_t I_K
                         FLEX MIX    S_t Ψ̄₁ + (I_K − S_t) Ψ̄₀        diag(s_1t, …, s_Kt)
                         SINGLE      Ψ̄                                                      Hauzenberger et al. (2019)
                         SSVS MIX    S_t Ψ̄₁ + κ (I_K − S_t) Ψ̂       diag(s_1t, …, s_Kt)
TVP-RW (φ_t = I_K)       FLEX MS     S_t Ψ̄₁ + (I_K − S_t) Ψ̄₀        s_t I_K
                         FLEX MIX    S_t Ψ̄₁ + (I_K − S_t) Ψ̄₀        diag(s_1t, …, s_Kt)
                         SINGLE      Ψ̄                                                      Standard TVP-RW
                         SSVS MIX    S_t Ψ̄₁ + κ (I_K − S_t) Ψ̂       diag(s_1t, …, s_Kt)    e.g. Huber et al. (2019)

All these TVP models feature a Normal-Gamma prior (Griffin and Brown, 2010) on α̂. We compare our methods to two constant parameter models. One variant features a Normal-Gamma prior (const. (NG)), while the second variant assumes a Minnesota prior (const. (MIN)). We consider a non-conjugate Minnesota prior, capturing the notion that own lags are more important than lags from other variables (Doan et al., 1984; Litterman, 1986). We estimate this set of models for three information sets (
FA-VAR, L-VAR and S-VAR), each featuring a different number of endogenous variables. Every considered specification features two lags and SV. To assess one-quarter-, one-year- and two-year-ahead predictions, we treat observations ranging from 1962:Q1 to 1999:Q4 as an initial sample and the periods from 2000:Q1 to 2019:Q4 as a hold-out sample. The initial sample is then recursively expanded until the penultimate quarter (2019:Q3) is reached. For each forecast comparison, a small-scale Minnesota VAR with constant parameters (S-VAR const. (MIN)) serves as our benchmark. In the following, Table 2 shows the best performing models for point and density forecasts, providing a tractable summary of Table 3 and Table 4. Table 3 depicts root mean squared error (RMSE) ratios as point forecast measures and Table 4 the log predictive Bayes factors (LPBFs) as density forecast metrics. The best performing models within each column are indicated by bold numbers. In Table B.1 we provide additional results on continuous ranked probability score (CRPS) ratios. This alternative density forecast measure is more robust to outliers than log predictive scores (Gneiting and Raftery, 2007). With three different measures at three different horizons we obtain a comprehensive picture to evaluate our methods jointly and marginally along the four target variables. Note that with a single-state variance (Ψ_t = Ψ̂), α̂ collapses to a 2K-dimensional vector (see Bitto and Frühwirth-Schnatter, 2019). A constant coefficient model can be obtained by either setting α̃ = 0 or setting {Ψ_t}_{t=1}^T ≈ 0_K×K.

Table 2:
Overview of the best performing models, indicated by bold numbers in Table 3 and Table 4.

                1-quarter-ahead                 1-year-ahead                    2-years-ahead
Variable        Size    Class     Subclass      Size    Class     Subclass      Size    Class     Subclass

Point forecasts (RMSE ratios)
TOT             L-VAR   TVP-POOL  SINGLE        L-VAR   TVP-POOL  SINGLE        FA-VAR  TVP-POOL  SINGLE
GDPC1           L-VAR   TVP-POOL  FLEX MIX      FA-VAR  TVP-POOL  FLEX MS       FA-VAR  TVP-MIX   FLEX MIX
CPIAUCSL        S-VAR   TVP-MIX   FLEX MIX      S-VAR   TVP-RW    FLEX MIX      L-VAR   TVP-RW    SINGLE
UNRATE          L-VAR   TVP-POOL  SSVS MIX      L-VAR   TVP-POOL  SINGLE        FA-VAR  TVP-POOL  SSVS MIX
FEDFUNDS        FA-VAR  TVP-RW    SSVS MIX      FA-VAR  TVP-RW    SSVS MIX      FA-VAR  TVP-POOL  SINGLE

Density forecasts (LPBFs)
TOT             L-VAR   TVP-POOL  FLEX MIX      L-VAR   TVP-POOL  SSVS MIX      L-VAR   const. (NG)
GDPC1           L-VAR   TVP-POOL  FLEX MS       FA-VAR  TVP-POOL  SSVS MIX      FA-VAR  TVP-POOL  SINGLE
CPIAUCSL        S-VAR   TVP-MIX   SSVS MIX      L-VAR   const. (NG)             L-VAR   const. (NG)
UNRATE          L-VAR   TVP-POOL  SSVS MIX      L-VAR   TVP-POOL  SSVS MIX      L-VAR   TVP-MIX   SSVS MIX
FEDFUNDS        L-VAR   TVP-POOL  SSVS MIX      FA-VAR  TVP-RW    FLEX MIX      FA-VAR  const. (Min)
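The point and density metrics reported in Tables 3 and 4 can be sketched as follows; the error and log-score arrays are hypothetical placeholders for the quantities produced by the recursive forecast exercise:

```python
import numpy as np

def rmse_ratio(err_model, err_bench):
    # Ratio of root mean squared errors; values below one mean the model
    # beats the benchmark in point-forecast terms (as in Table 3).
    return np.sqrt(np.mean(err_model ** 2)) / np.sqrt(np.mean(err_bench ** 2))

def cumulative_lpbf(logscore_model, logscore_bench):
    # Running sum of log predictive score differences; values above zero
    # favour the model over the benchmark (as plotted in Figure 4).
    return np.cumsum(logscore_model - logscore_bench)

# Hypothetical hold-out forecast errors and log predictive scores.
err_m = np.array([0.5, -0.2, 0.1])
err_b = np.array([0.6, -0.4, 0.2])
ratio = rmse_ratio(err_m, err_b)
lpbf = cumulative_lpbf(np.array([-1.0, -0.8, -1.1]),
                       np.array([-1.2, -0.9, -1.3]))
```

The final element of the cumulative LPBF series corresponds to the full-sample log predictive Bayes factor against the benchmark.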
Table 2 summarizes the main findings of our forecast exercise.
First, larger-scale models (FA-VAR, L-VAR) generally outperform the small-scale specifications across horizon-variable combinations, indicating that an increasing amount of information pays off for forecasting (see Bańbura et al., 2010). One exception is inflation. For inflation, flexible S-VARs yield more accurate forecasts than FA-VARs and L-VARs for one-quarter- and one-year-ahead point forecasts and one-quarter-ahead density forecasts. Comparing FA-VARs with L-VARs, the results are mixed. One pattern worth noting is that L-VARs tend to outperform FA-VARs for the one-quarter-ahead horizon, while the picture reverses for higher-order forecasts. Second, with respect to parameter changes we see that the TVP-POOL specifications forecast particularly well across all horizons and target variables. These models substantially improve upon a wide range of benchmarks. Overall, Table 2 shows that all TVP classes that provide accurate point predictions generally also perform well in terms of density forecasts.

Table 3: Point forecast performance (RMSE ratios) relative to the benchmark (const. (Min.)). The red shaded row denotes the benchmark (and its RMSE values). Asterisks indicate statistical significance for each model relative to const. (Min.) at the 1 (***), 5 (**) and 10 (*) percent significance levels.

Specification 1-quarter-ahead 1-year-ahead 2-years-ahead
Class Subclass TOT GDPC1 CPIAUCSL UNRATE FEDFUNDS TOT GDPC1 CPIAUCSL UNRATE FEDFUNDS TOT GDPC1 CPIAUCSL UNRATE FEDFUNDS
FA-VAR const. (Min.) 0.91* 0.87 0.94 0.79 1.22 0.92** 0.84* 1.03 0.80 1.11 0.93* 1.00 1.04 0.86 0.78***
const. (NG) 0.88** 0.81* 0.95 0.80 1.14 0.90** 0.84** 0.99 0.81 1.02 0.92 1.00 1.01 0.87 0.75**
TVP-MIX FLEX MIX 0.89** 0.80* 0.97 0.81 0.91 0.94* 0.92 0.98 0.88 0.98 1.01
SSVS MIX 0.89** 0.81* 0.96 0.76 1.06 0.88** 0.81** 1.00 0.77 0.98 0.89 0.98 1.01
L-VAR const. (Min.) 1.02 0.99 1.05 0.72 1.52 0.92** 0.93 0.95* 0.79 1.00 0.97** 1.02 0.98 0.95 0.88**
const. (NG) 0.91** 0.85 0.96 0.71 1.10 0.88*** 0.89* 0.92** 0.74 0.98 0.92** 1.00 0.97 0.89 0.79**
TVP-MIX FLEX MIX 1.04 0.90 1.14 0.71 2.19 0.91* 0.82 1.02 0.81 1.09 0.92 0.93 0.99 0.89 0.90
FLEX MS 1.06 0.92 1.17 0.72 1.51 0.90* 0.86 0.95 0.79 1.08 1.22 1.56 1.30 0.90 1.13
SINGLE 0.92* 0.86 0.98 0.74 0.90** 0.87* 0.82* 0.94 0.83 0.87* 0.92 0.94 0.97 0.91 0.86
SSVS MIX 0.96 0.98 0.94 0.72 1.20 0.90 0.86 0.98 0.78 1.00 0.95 0.96 1.06 0.91 0.89
TVP-POOL FLEX MIX 0.88***
S-VAR const. (Min.) 0.60 0.83 0.85 0.15 0.11 0.76 1.00 0.87 0.62 0.41 0.91 0.93 0.85 1.10 0.73
const. (NG) 0.99** 0.99* 0.99 1.00 1.01 0.96** 0.94* 0.98 0.98* 0.98 0.97 0.99 0.98 0.97** 0.95
TVP-MIX FLEX MIX 0.93** 0.94*

Table 4: Density forecast performance (LPBFs) relative to the benchmark (const. (Min.)). The red shaded row denotes the benchmark (and its LPS values). Asterisks indicate statistical significance for each model relative to const. (Min.) at the 1 (***), 5 (**) and 10 (*) percent significance levels.

Specification 1-quarter-ahead 1-year-ahead 2-years-ahead
Class Subclass TOT GDPC1 CPIAUCSL UNRATE FEDFUNDS TOT GDPC1 CPIAUCSL UNRATE FEDFUNDS TOT GDPC1 CPIAUCSL UNRATE FEDFUNDS
FA-VAR const. (Min.) 11.35 12.01** 2.23 8.56 -4.19 14.59 8.30** 2.02 14.03 8.69 37.98** 2.12 5.63 12.49
const. (NG) 18.88 13.48*** 4.19 8.66 0.46 25.90*** 9.01*** 2.34 14.21 10.93 40.82*** 3.08 7.01 12.22 28.66***
TVP-MIX FLEX MIX 23.23 12.71** 2.78 11.15 12.95*** 39.12** 7.60 2.33 26.49 16.00 50.99 3.80 3.97 23.44 13.09
FLEX MS 29.37 11.74** 5.32 13.84 9.59* 49.54** 7.49 2.32 27.51 17.64 41.96 0.10 3.65 22.67 13.22
SINGLE 24.87 11.07** 2.24 10.12 17.01*** 52.21** 6.25 0.25 25.16 24.39 55.19 0.70 3.17 22.70 18.79
SSVS MIX 25.93 13.23** 2.59 10.60 6.38* 40.29* 7.92* 1.90 23.53 10.64 40.57 3.38 1.82 25.83 7.45
TVP-POOL FLEX MIX 26.23 13.49*** 3.44 10.51 7.28*** 39.69*** 10.63** 2.06 19.71 15.86 53.01** 3.61 6.18 22.98 28.67*
FLEX MS 28.06 14.49*** 3.98* 10.24 7.62*** 34.75** 10.85*** 2.71 17.29 14.71 52.55** 4.07 6.27 21.13 28.34*
SINGLE 25.63 14.42*** 2.98 10.61 6.20*** 34.67** 10.54*** 2.21 18.32 16.06 46.04**
L-VAR const. (Min.) 15.05 13.32* -0.29 12.15 -3.68*** 56.73** 2.78 3.22*** 32.41 11.78*** 61.59*** -3.86 3.18 15.52 18.28**
const. (NG) 28.70 15.97*** 1.06 14.87 2.34 70.27*** 7.40**
S-VAR const. (Min.) -22.11 -82.64 -80.74 45.02 86.64 -256.48 -97.26 -83.70 -69.04 -29.94 -383.84 -92.27 -89.84 -125.53 -87.85
const. (NG) 4.55 1.97* 0.71 0.62* 1.39*** 4.97 2.64* 1.95* 0.03 2.30 9.15 0.58 3.07 2.99 1.37
TVP-MIX FLEX MIX 24.41 3.97* 5.52 2.90 12.18*** 15.60 9.64** 3.98 8.50 5.03 -3.76 3.00 3.74 -0.02 -7.71
FLEX MS 18.00 2.66 3.57 6.42 5.86** 9.48 5.25* -0.44 12.27 -0.22 0.77 -1.60 0.83 10.35 -11.26
SINGLE 7.18** 2.45 -1.98 0.85 13.26*** 12.51 5.42 3.33 6.38 7.12* -0.80 -2.74 1.19 2.14 -5.03
SSVS MIX 23.21 3.44*

When examining Table 3 and Table 4 in greater detail, note that a large number of models shown in the tables outperform the Minnesota benchmark in terms of RMSEs (indicated by ratios below one) and in terms of LPBFs (indicated by values above zero). However, the benchmark is a tough competitor when predicting inflation and for higher-order point forecasts. When focussing on the differences arising through the varying treatment of parameter evolutions, our proposed methods, the TVP-POOL and
TVP-MIX specifications, show that their good performance is mainly driven by improved forecast accuracy for output growth and unemployment. In terms of the innovation variance assumption for these specifications, we observe that additional flexibility tends to improve density forecast performance and yields accurate point forecasts. For the TVP-POOL models this higher degree of flexibility generally pays off across variables and model sizes. For flexible TVP-MIX specifications, forecast ability tends to improve for S-VARs and FA-VARs and is competitive for the L-VARs. Especially the TVP-MIX SSVS MIX and TVP-MIX FLEX MIX models using a small information set yield quite accurate inflation forecasts, being the best performing models for the one-quarter-ahead horizon. Across variables, a notable exception is the interest rate for
TVP-MIX models. Here, a TVP-MIX SINGLE specification is superior to models assuming a mixture on the innovation volatilities. When assessing random walk state equation (TVP-RW) specifications across the information sets, two things are worth noting. First, a standard TVP model with a random walk assumption (TVP-RW SINGLE) is only competitive for one-year- and two-year-ahead forecasts and otherwise forecasts poorly. Second, more flexible TVP-RW variants produce quite accurate forecasts for FA-VARs. Constant parameter models with a Normal-Gamma prior show reasonable forecasts for L-VARs (especially for inflation), but lack flexibility in smaller-scale models. This observation is in line with the fact that in larger-scale models, time variation in coefficients vanishes (see Huber et al., 2020b). However, a few parameter instabilities might still be present since, apart from some exceptions, our methods provide improvements when compared to constant coefficient models. To illustrate the forecast performance over time, Figure 4 depicts the evolution of cumulated joint LPBFs relative to our benchmark. Overall we find that, for all four target variables jointly, our proposed methods never forecast poorly. Both
TVP-MIX (black lines) and TVP-POOL (green lines) methods outperform a standard TVP model with a random walk state equation (TVP-RW SINGLE, denoted by the red solid line) across information sets (one exception is the TVP-MIX SINGLE model for the S-VAR). Moreover, allowing for occasional parameter changes during and in the aftermath of the financial crisis tends to increase predictive ability.

[Figure 4 panels: (a) FA-VAR; (b) L-VAR; (c) S-VAR. Legend: const. (Min), const. (NG), TVP-MIX, TVP-POOL, TVP-RW, FLEX MIX, FLEX MS, SINGLE, SSVS MIX.]

Figure 4: Evolution of one-quarter-ahead total cumulative LPBFs relative to the benchmark. The gray dashed lines refer to the maximum/minimum Bayes factor over the full hold-out sample. The light gray shaded areas indicate the NBER recessions in the US.

A few points are worth discussing in greater detail. First, the beginning of the hold-out sample is characterized by the early 2000s recession. Although this was a quite short crisis, it already leads to quite diverse model performance across information sets. During this episode and its consecutive three years, L-VARs strictly dominate the other two information sets (
FA-VARs and S-VARs). This implies that, for any TVP evolution assumption, the large-scale model outperforms its smaller-scale counterparts. Moreover, during the financial crisis, we observe a substantial increase in LPBFs for a wide range of FA-VARs and L-VARs, while for S-VARs we see similar improvements solely for some TVP-VARs. This feature might indicate that TVPs are capable of mitigating a potential omitted variable bias (Huber et al., 2020b). Second, within each information set, performance across parameter evolution assumptions is mixed. Evidently, the performance of the four L-VARs featuring a TVP-POOL specification stands out (depicted by green lines). In more tranquil periods they show constant improvements and yield substantial predictive gains during the financial crisis. Especially in the aftermath of the Great Recession, the LPBFs steadily increase compared to other large TVP-VARs. This episode also includes a time characterized by a (sluggish) recovery of the US economy after the financial crisis and the interest rate hitting the zero lower bound. With monetary policy shifting towards unconventional measures, it might not only pay off to include financial market variables, such as longer-term yields, but also to allow for occasional changes in the transmission channels of these variables. Moreover, it is worth noting that the four TVP-POOL variants tend to perform almost identically until 2010, while afterwards slight performance differences become evident. Hauzenberger et al. (2019) have made a similar observation when varying the hyperparameters of their conjugate prior on the state equation. Third,
TVP-MIX methods generally forecast well for S-VARs and FA-VARs, while for L-VARs only the TVP-MIX SINGLE specification yields substantial gains. In particular for FA-VARs and S-VARs, flexible variance modelling (TVP-MIX FLEX MIX and TVP-MIX SSVS MIX) generally pays off. For L-VARs these two models also yield reasonable forecast accuracy, while the TVP-MIX MS forfeits forecast accuracy. Thus, for a large-scale model, the assumption that a joint indicator governs the evolution of a large number of coefficients might be less appropriate. Moreover, comparing TVP-MIX with TVP-RW specifications reveals that the random walk/white noise mixture yields gains in tranquil periods for larger-scale models (FA-VARs and L-VARs) and does particularly well in recessions for the small information set (S-VAR). Especially for small-scale VARs, the TVP-MIX variants, featuring a mixture distribution on the state innovation volatilities, greatly improve predictive performance relative to TVP-RW models during the financial crisis. Moreover, it is worth noting that the TVP-RW SINGLE model forecasts poorly for larger-scale models (FA-VARs and L-VARs) during the tranquil periods previous to the financial crisis, while its performance slightly recovers in the middle of the Great Recession. A plausible explanation for this pattern might be that spurious movements in coefficients lead to overfitting, widening the predictive density of the TVP-RW SINGLE model. This harms predictive accuracy in stable times, while it is to some extent helpful in times of turmoil (periods characterized by large outliers). In contrast to the TVP-RW SINGLE, the other three flexible TVP-RW variants forecast particularly well, suggesting that flexible state innovation volatilities greatly increase the precision of the TVP estimates. In particular for FA-VARs, these models show improved forecast accuracy after the Great Recession.
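The expanding-window design underlying the evaluation above (an initial sample through 1999:Q4, recursively expanded by one quarter, with horizons of one, four and eight quarters) can be sketched as follows; `fit` and `forecast` are hypothetical placeholders for model estimation and prediction:

```python
import numpy as np

def fit(train):
    # Placeholder "model": the column means of the estimation sample.
    return train.mean(axis=0)

def forecast(model, h):
    # Placeholder h-step prediction: a constant-mean forecast.
    return model

def recursive_forecasts(data, first_holdout, horizons=(1, 4, 8)):
    # Expanding-window design: estimate on data[:t], predict h steps
    # ahead, then grow the estimation sample by one quarter.
    results = []
    for t in range(first_holdout, data.shape[0]):
        model = fit(data[:t])
        results.append({h: forecast(model, h) for h in horizons})
    return results

data = np.random.default_rng(1).normal(size=(40, 3))  # toy quarterly panel
res = recursive_forecasts(data, first_holdout=30)
```

Each element of `res` corresponds to one forecast origin in the hold-out sample; replacing the placeholders with actual TVP-VAR estimation and predictive simulation yields the forecasts scored in Tables 3 and 4.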
7. CLOSING REMARKS
It is empirically well documented that macroeconomic time series feature instabilities in the parameters and innovation volatilities. In the literature there is strong agreement that stochastic volatility is important, while, especially in larger-scale models, there is less consensus on time-varying coefficients. As the amount of information increases, overall time variation in parameters tends to diminish, but it might still be present at a few points in time for some parameters. Detecting such occasional changes is challenging and requires highly flexible modeling techniques. To achieve such flexibility we introduce mixture priors on the time-varying part of the parameters. By additionally using hierarchical shrinkage priors on the dynamic state variances, these methods are capable of imposing dynamic sparsity, as well as capturing a wide range of parameter changes. In a simulation study we show that our methods detect both sudden and gradual changes in parameters. In an empirical exercise we find that some coefficients tend to change abruptly in times of turmoil. Moreover, all proposed approaches forecast well. Even for large VARs, flexible mixture priors improve forecast accuracy upon a wide range of benchmarks, suggesting that capturing these infrequent instabilities pays off.

REFERENCES
Aastveit KA, Carriero A, Clark TE, and Marcellino M (2017), "Have standard VARs remained stable since the crisis?" Journal of Applied Econometrics (5), 931–951.
Bańbura M, Giannone D, and Reichlin L (2010), "Large Bayesian vector auto regressions," Journal of Applied Econometrics (1), 71–92.
Bauwens L, Koop G, Korobilis D, and Rombouts JV (2015), "The contribution of structural break models to forecasting macroeconomic series," Journal of Applied Econometrics (4), 596–620.
Belmonte MA, Koop G, and Korobilis D (2014), "Hierarchical shrinkage in time-varying parameter models," Journal of Forecasting (1), 80–94.
Bhattacharya A, Chakraborty A, and Mallick BK (2016), "Fast sampling with Gaussian scale mixture priors in high-dimensional regression," Biometrika, asw042.
Bitto A, and Frühwirth-Schnatter S (2019), "Achieving shrinkage in a time-varying parameter model framework," Journal of Econometrics (1), 75–97.
Cadonna A, Frühwirth-Schnatter S, and Knaus P (2020), "Triple the gamma – A unifying shrinkage prior for variance and variable selection in sparse state space and TVP models," Econometrics (2), 20.
Carriero A, Clark TE, and Marcellino M (2019), "Large Bayesian vector autoregressions with stochastic volatility and non-conjugate priors," Journal of Econometrics (1), 137–154.
Carvalho CM, Polson NG, and Scott JG (2010), "The horseshoe estimator for sparse signals," Biometrika (2), 465–480.
Chan JC, and Jeliazkov I (2009), "Efficient simulation and integrated likelihood estimation in state space models," International Journal of Mathematical Modelling and Numerical Optimisation (1-2), 101–120.
Chan JC, Koop G, Leon-Gonzalez R, and Strachan RW (2012), "Time varying dimension models," Journal of Business & Economic Statistics (3), 358–367.
Chan JC, and Strachan RW (2020), "Bayesian State Space Models in Macroeconometrics."
Clark T (2011), "Real-time density forecasts from BVARs with stochastic volatility," Journal of Business & Economic Statistics, 327–341.
Cogley T, and Sargent TJ (2005), "Drifts and volatilities: monetary policies and outcomes in the post WWII US," Review of Economic Dynamics (2), 262–302.
Cross JL, Hou C, and Poon A (2020), "Macroeconomic forecasting with large Bayesian VARs: Global-local priors and the illusion of sparsity," International Journal of Forecasting.
D'Agostino A, Gambetti L, and Giannone D (2013), "Macroeconomic forecasting and structural change," Journal of Applied Econometrics (1), 82–101.
Doan T, Litterman R, and Sims C (1984), "Forecasting and conditional projection using realistic prior distributions," Econometric Reviews (1), 1–100.
Eickmeier S, Lemke W, and Marcellino M (2015), "Classical time varying factor-augmented vector auto-regressive models – estimation, forecasting and structural analysis," Journal of the Royal Statistical Society: Series A (Statistics in Society) (3), 493–533.
Eisenstat E, Chan J, and Strachan R (2019), "Reducing Dimensions in a Large TVP-VAR," Technical report.
Feldkircher M, Huber F, and Kastner G (2017), "Sophisticated and small versus simple and sizeable: When does it pay off to introduce drifting coefficients in Bayesian VARs?" arXiv preprint arXiv:1711.00564.
Frühwirth-Schnatter S (2001), "Markov chain Monte Carlo estimation of classical and dynamic switching and mixture models," Journal of the American Statistical Association (453), 194–209.
Frühwirth-Schnatter S, and Wagner H (2010), "Stochastic model specification search for Gaussian and partial non-Gaussian state space models," Journal of Econometrics (1), 85–100.
George EI, and McCulloch RE (1993), "Variable selection via Gibbs sampling," Journal of the American Statistical Association (423), 881–889.
——— (1997), "Approaches for Bayesian variable selection," Statistica Sinica.
Gerlach R, Carter C, and Kohn R (2000), "Efficient Bayesian inference for dynamic mixture models," Journal of the American Statistical Association (451), 819–828.
Giordani P, and Kohn R (2008), "Efficient Bayesian inference for multiple change-point and mixture innovation models," Journal of Business & Economic Statistics (1), 66–77.
Gneiting T, and Raftery AE (2007), "Strictly proper scoring rules, prediction, and estimation," Journal of the American Statistical Association (477), 359–378.
Griffin J, and Brown P (2010), "Inference with normal-gamma prior distributions in regression problems," Bayesian Analysis (1), 171–188.
Groen JJ, Paap R, and Ravazzolo F (2013), "Real-time inflation forecasting in a changing world," Journal of Business & Economic Statistics (1), 29–44.
Hauzenberger N, Huber F, and Koop G (2020), "Dynamic Shrinkage Priors for Large Time-varying Parameter Regressions using Scalable Markov Chain Monte Carlo Methods," arXiv preprint arXiv:2005.03906.
Hauzenberger N, Huber F, Koop G, and Onorante L (2019), "Fast and Flexible Bayesian Inference in Time-varying Parameter Regression Models," arXiv preprint arXiv:1910.10779.
Huber F, and Feldkircher M (2019), "Adaptive shrinkage in Bayesian vector autoregressive models," Journal of Business & Economic Statistics (1), 27–39.
Huber F, Kastner G, and Feldkircher M (2019), "Should I stay or should I go? A latent threshold approach to large-scale mixture innovation models," Journal of Applied Econometrics (5), 621–640.
Huber F, Koop G, and Onorante L (2020a), "Inducing sparsity and shrinkage in time-varying parameter models," Journal of Business & Economic Statistics (just-accepted), 1–48.
Huber F, Koop G, and Pfarrhofer M (2020b), "Bayesian Inference in High-Dimensional Time-varying Parameter Models using Integrated Rotated Gaussian Approximations," arXiv preprint arXiv:2002.10274.
Kalli M, and Griffin J (2014), "Time-varying sparsity in dynamic regression models," Journal of Econometrics (2), 779–793.
Kastner G (2016), "Dealing with stochastic volatility in time series using the R package stochvol," Journal of Statistical Software (5), 1–30.
Kastner G, and Frühwirth-Schnatter S (2014), "Ancillarity-sufficiency interweaving strategy (ASIS) for boosting MCMC estimation of stochastic volatility models," Computational Statistics & Data Analysis, 408–423.
Kastner G, and Huber F (2020), "Sparse Bayesian Vector Autoregressions in Huge Dimensions," Journal of Forecasting, forthcoming.
Kim CJ, and Nelson CR (1999), "Has the US economy become more stable? A Bayesian approach based on a Markov-switching model of the business cycle," Review of Economics and Statistics (4), 608–616.
Kliem M, Kriwoluzky A, and Sarferaz S (2016), "On the Low-Frequency Relationship Between Public Deficits and Inflation," Journal of Applied Econometrics (3), 566–583.
Koop G, and Korobilis D (2012), "Forecasting inflation using dynamic model averaging," International Economic Review (3), 867–886.
——— (2013), "Large time-varying parameter VARs," Journal of Econometrics (2), 185–198.
Koop G, Leon-Gonzalez R, and Strachan RW (2009), "On the evolution of the monetary policy transmission mechanism," Journal of Economic Dynamics and Control (4), 997–1017.
Koop G, and Potter SM (2007), "Estimation and forecasting in models with multiple breaks," The Review of Economic Studies (3), 763–789.
Koop GM (2013), "Forecasting with medium and large Bayesian VARs," Journal of Applied Econometrics (2), 177–203.
Korobilis D (2013), "Assessing the transmission of monetary policy using time-varying parameter dynamic factor models," Oxford Bulletin of Economics and Statistics (2), 157–179.
——— (2019), "High-dimensional macroeconomic forecasting using message passing algorithms," Journal of Business & Economic Statistics.
Korobilis D, and Koop G (2020), "Bayesian dynamic variable selection in high dimensions."
Kowal DR, Matteson DS, and Ruppert D (2019), "Dynamic shrinkage processes," Journal of the Royal Statistical Society: Series B (Statistical Methodology).
Litterman RB (1986), "Forecasting with Bayesian vector autoregressions – five years of experience," Journal of Business & Economic Statistics (1), 25–38.
Lopes HF, McCulloch RE, and Tsay RS (2016), "Parsimony inducing priors for large scale state-space models," Bayesian Analysis.
Malsiner-Walli G, Frühwirth-Schnatter S, and Grün B (2016), "Model-based clustering based on sparse finite Gaussian mixtures," Statistics and Computing (1-2), 303–324.
McCausland WJ, Miller S, and Pelletier D (2011), "Simulation smoothing for state-space models: A computational efficiency analysis," Computational Statistics & Data Analysis (1), 199–212.
McCracken MW, and Ng S (2016), "FRED-MD: A monthly database for macroeconomic research," Journal of Business & Economic Statistics (4), 574–589.
Mumtaz H, and Theodoridis K (2018), "The changing transmission of uncertainty shocks in the US," Journal of Business & Economic Statistics (2), 239–252.
Nakajima J, and West M (2013), "Bayesian analysis of latent threshold dynamic models," Journal of Business & Economic Statistics (2), 151–164.
Ng S, and Wright JH (2013), "Facts and challenges from the great recession for forecasting and macroeconomic modeling," Journal of Economic Literature (4), 1120–54.
Park T, and Casella G (2008), "The Bayesian Lasso," Journal of the American Statistical Association (482), 681–686.
Paul P (2019), "The time-varying effect of monetary policy on asset prices," Review of Economics and Statistics.
Pfarrhofer M (2020), "Forecasts with Bayesian vector autoregressions under real time conditions," arXiv preprint arXiv:2004.04984.
Polson NG, and Scott JG (2010), "Shrink globally, act locally: Sparse Bayesian regularization and prediction," Bayesian Statistics, 501–538.
Primiceri G (2005), "Time varying structural autoregressions and monetary policy," The Review of Economic Studies (3), 821–852.
Rockova V, and McAlinn K (2018), "Dynamic variable selection with spike-and-slab process priors," Bayesian Analysis.
Sargent TJ, and Surico P (2011), "Two illustrations of the quantity theory of money: Breakdowns and revivals," American Economic Review (1), 109–28.
Sims CA, and Zha T (2006), "Were there regime switches in US monetary policy?" American Economic Review (1), 54–81.
Stock JH, and Watson MW (2012), "Disentangling the Channels of the 2007-2009 Recession," Technical report, National Bureau of Economic Research.
Uribe PV, and Lopes HF (2017), "Dynamic sparsity on dynamic regression models," Manuscript, available at http://hedibert.org/wp-content/uploads/2018/06/uribe-lopes-Sep2017.pdf.
Whiteman CH (1984), "Lucas on the quantity theory: Hypothesis testing without theory," The American Economic Review (4), 742–749.
Yau C, and Holmes C (2011), "Hierarchical Bayesian nonparametric mixture models for clustering with variable relevance determination," Bayesian Analysis (Online) (2), 329.
Zellner A (1986), "On Assessing Prior Distributions and Bayesian Regression Analysis with g Prior Distributions," in Goel, P.; Zellner, A. (eds.), Bayesian Inference and Decision Techniques: Essays in Honor of Bruno de Finetti. Studies in Bayesian Econometrics and Statistics 6.

TECHNICAL APPENDIX

A.1. Stochastic volatility specification
A stochastic volatility specification assumes that $h_t = \log(\sigma_t^2)$ follows an AR(1) process:
$$h_t = \mu_h + \phi_h (h_{t-1} - \mu_h) + \vartheta_t, \qquad \vartheta_t \sim \mathcal{N}(0, \psi_h). \tag{A.1}$$
Following Kastner and Frühwirth-Schnatter (2014), we assume a Gaussian prior on the initial state, $h_0 \sim \mathcal{N}\big(\mu_h, \tfrac{\psi_h}{1-\phi_h^2}\big)$, a Gaussian prior on the unconditional mean $\mu_h$, a Beta prior on the transformed persistence parameter, $\tfrac{\phi_h + 1}{2} \sim \mathcal{B}(25, 0.5)$, and a Gamma prior on the state variance, $\psi_h \sim \mathcal{G}(1/2, 1/2)$. This prior setup places considerable mass on values of $\phi_h$ close to one and thus pushes the specification towards a random walk.
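To build intuition for the law of motion in Equation (A.1), the latent log-volatility process can be simulated directly. A small illustrative sketch (the parameter values are ours, chosen for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_sv(T, mu_h=-1.0, phi_h=0.95, psi_h=0.04):
    """Simulate h_t = mu_h + phi_h * (h_{t-1} - mu_h) + theta_t,
    theta_t ~ N(0, psi_h), with h_0 drawn from the stationary distribution.
    Returns the log-variances h and the implied variances exp(h)."""
    h = np.empty(T)
    h_prev = mu_h + np.sqrt(psi_h / (1.0 - phi_h**2)) * rng.standard_normal()
    for t in range(T):
        h[t] = mu_h + phi_h * (h_prev - mu_h) + np.sqrt(psi_h) * rng.standard_normal()
        h_prev = h[t]
    return h, np.exp(h)
```

As $\phi_h \to 1$ the simulated paths become increasingly persistent, illustrating the random-walk limit mentioned above.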
A.2. The Normal-Gamma prior (Griffin and Brown, 2010)
Similar to Bitto and Frühwirth-Schnatter (2019), we introduce class-specific global shrinkage parameters, differentiating between the constant part of the coefficients (labeled $\lambda_a$) and the regime-switching variances (labeled $\lambda_{\psi_0}$ and $\lambda_{\psi_1}$, respectively). In the following, we specify $\tau_j \mid \lambda_j \sim \mathcal{G}(\varrho_j, \varrho_j \lambda_j / 2)$ and $\lambda_j \sim \mathcal{G}(\zeta, \zeta)$, with $\lambda_j = \lambda_k$ and $\varrho_j = \varrho_k$ if $j \in \mathcal{P}_k$ for $k \in \{a, \psi_0, \psi_1\}$. $\mathcal{P}_k$ denotes a classifier (i.e., it defines the set of coefficients belonging to the $k$th group). In the following, $\mathcal{P}_a = \{j : \hat{\alpha}_j \in \alpha\}$, $\mathcal{P}_{\psi_0} = \big\{j : \hat{\alpha}_j \in \{\sqrt{\bar{\psi}_{0i}}\}_{i=1}^{K}\big\}$, and $\mathcal{P}_{\psi_1} = \big\{j : \hat{\alpha}_j \in \{\sqrt{\bar{\psi}_{1i}}\}_{i=1}^{K}\big\}$. Moreover, we learn the hyperparameter $\varrho_k$ in a fully Bayesian fashion and fix $\zeta$ at a small value.
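The hierarchy above can be illustrated by simulating from it. A brief sketch under illustrative hyperparameter values (the values of `rho` and `zeta` below are ours, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(1)

def draw_ng_scalings(n, rho=0.5, zeta=0.5):
    """Draw from the Normal-Gamma hierarchy:
    lambda ~ G(zeta, zeta) and tau_j | lambda ~ G(rho, rho * lambda / 2).
    numpy's gamma uses a scale parameterization, so rate r maps to scale 1/r."""
    lam = rng.gamma(shape=zeta, scale=1.0 / zeta)
    tau = rng.gamma(shape=rho, scale=2.0 / (rho * lam), size=n)
    return lam, tau
```

Small values of `rho` concentrate the local scalings near zero while keeping heavy tails, which is the mechanism that shrinks small parameter changes towards zero.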
A.3. Detailed MCMC algorithm
In this section, we provide details on each sampling step of the MCMC algorithm and on the full conditional posterior distributions. After defining appropriate starting values, we iterate through the following steps 20,000 times and discard the first 10,000 draws as burn-in:

1. The sampling steps (and conditional posteriors) for $\hat{\alpha}$, $\lambda_k$, $\tau_j$, for $k \in \{a, \psi_0, \psi_1\}$ and $j = 1, \dots, K$, and $\varrho_k$ are of standard form (Griffin and Brown, 2010):
   (a) Draw $\hat{\alpha}$ from a multivariate Gaussian distribution:
   $$\hat{\alpha} \mid y, \hat{X}, \Sigma, \{\tau_j\}_{j=1}^{K} \sim \mathcal{N}\big(\hat{a}, \hat{V}\big).$$
   Here, $\hat{X}$ is a $T \times K$-dimensional matrix with $\hat{x}_t'$ in the $t$th row and
   $$\hat{V}^{-1} = (\Sigma^{-1/2}\hat{X})'(\Sigma^{-1/2}\hat{X}) + \mathrm{diag}(\tau_1^{-1}, \dots, \tau_K^{-1}), \qquad \hat{a} = \hat{V}\big((\Sigma^{-1/2}\hat{X})'(\Sigma^{-1/2}y)\big).$$
   (b) Sample the local shrinkage scalings $\{\tau_j\}_{j=1}^{K}$ from a generalized inverse Gaussian (GIG) distribution (Griffin and Brown, 2010):
   $$\tau_j \mid \hat{\alpha}_j, \lambda_j, \varrho_j \sim \mathcal{GIG}\Big(\varrho_j - \frac{1}{2},\ \varrho_j \lambda_j,\ \hat{\alpha}_j^2\Big), \quad j = 1, \dots, K.$$
   The GIG$(a, b, c)$ distribution is parameterized as $p(x) \propto x^{a-1}\exp\{-(bx + c/x)/2\}$. Here, $\lambda_j = \lambda_k$ and $\varrho_j = \varrho_k$ if $j \in \mathcal{P}_k$ with $k \in \{a, \psi_0, \psi_1\}$.
   (c) Sample the associated global shrinkage parameter $\lambda_k$, for $k \in \{a, \psi_0, \psi_1\}$, from a Gamma distribution:
   $$\lambda_k \mid \{\tau_j\}_{j \in \mathcal{P}_k}, \varrho_k \sim \mathcal{G}\Big(\zeta + \varrho_k p_k,\ \zeta + \frac{\varrho_k}{2}\sum_{j \in \mathcal{P}_k} \tau_j\Big),$$
   with $p_k$ denoting the cardinality of the set $\mathcal{P}_k$ (see Appendix A.2).
   (d) The hyperparameters $\varrho_k$, for $k \in \{a, \psi_0, \psi_1\}$, are updated with a random walk Metropolis-Hastings (MH) step. We refer to Bitto and Frühwirth-Schnatter (2019) for details.

2. Draw the normalized latent states $\tilde{\alpha}$ from a $\nu$-dimensional Gaussian distribution by exploiting the static representation (see Section 4).

3. Draw the time-varying volatilities in $\Sigma$ using the R package stochvol (Kastner, 2016).

4. Update the binary indicators in $S_t$, depending on their law of motion. We recast the state equation in the centered parameterization and evaluate the following regime-switching specification:
$$\alpha_t = \begin{cases} \alpha + \gamma_t + \bar{\Phi}_0(\alpha_{t-1} - \alpha) + \varsigma_t, & \varsigma_t \sim \mathcal{N}(0, \bar{\Psi}_0) \ \text{if } s_t = 0, \\ \alpha + \gamma_t + \bar{\Phi}_1(\alpha_{t-1} - \alpha) + \varsigma_t, & \varsigma_t \sim \mathcal{N}(0, \bar{\Psi}_1) \ \text{if } s_t = 1, \end{cases} \tag{A.2}$$
with $\bar{\Phi}_0 = \mathbf{0}_{K \times K}$, $\bar{\Phi}_1 = I_K$ and $\gamma_t = \mathbf{0}_{K \times 1}$ for the TVP-MIX model. For the TVP-POOL model we set $\bar{\Phi}_0 = \bar{\Phi}_1 = \mathbf{0}_{K \times K}$.
   • $s_t$ follows a first-order MS process (MS):
     (a) Conditional on the other parameters in Equation (A.2), we follow Kim and Nelson (1999) and sample $\{s_t\}_{t=1}^{T}$ using standard algorithms.
     (b) Conditional on $\{s_t\}_{t=1}^{T}$, we update the transition probabilities by sampling $p_{00} \sim \mathcal{B}(T_{00} + c_{00}, T_{01} + c_{01})$ and $p_{11} \sim \mathcal{B}(T_{11} + c_{11}, T_{10} + c_{10})$, both from a Beta distribution, with $T_{kl}$ denoting the number of transitions from the $k$th to the $l$th regime.
   • Covariate-specific indicators with $\{s_{it}\}_{i=1}^{K}$ independent over time (MIX):
     (a) Conditional on the other parameters in Equation (A.2), we evaluate both regimes and sample $s_{it}$ for each period and covariate independently from a Bernoulli distribution.
     (b) Conditional on $\{s_{it}\}_{t=1}^{T}$, we update the success probability for each covariate by sampling from a Beta distribution, $p_i \sim \mathcal{B}(T_{i,1} + c_{i,1}, T_{i,0} + c_{i,0})$, for $i = 1, \dots, K$, with $T_{i,k}$ denoting the number of periods in the $k$th regime.

5. For the specification with a hierarchical prior on $\tilde{\gamma}_t$ and $\bar{\Phi}_0 = \bar{\Phi}_1 = \mathbf{0}_{K \times K}$, we need five additional sampling steps (details can be found in Malsiner-Walli et al. (2016) and Hauzenberger et al. (2019)):
   (a) Draw the mixture weights $\omega$ from a Dirichlet distribution:
   $$\omega \mid \theta, \xi \sim \mathcal{D}\text{ir}(\xi_1, \dots, \xi_N),$$
   with $\theta = (\theta_1, \dots, \theta_T)'$ and $\xi_n = \xi + T_n$, where $T_n$ denotes the number of periods assigned to group $n$.
   (b) Update the hyperparameter $\xi$ with a random walk Metropolis-Hastings step (for details, see Malsiner-Walli et al., 2016).
   (c) Sample the group indicators $\theta_t \in \{1, \dots, N\}$ for each $\tilde{\alpha}_t$ from a multinomial distribution:
   $$\mathrm{P}(\theta_t = n \mid \omega_n, \tilde{\mu}_n) \propto \omega_n f_{\mathcal{N}}(\tilde{\alpha}_t \mid \tilde{\mu}_n, I_K), \quad n = 1, \dots, N.$$
   (d) The full conditional posterior of $\tilde{\mu} = \mathrm{vec}(\tilde{\mu}_1, \dots, \tilde{\mu}_N)$ follows a multivariate Gaussian distribution:
   $$\tilde{\mu} \mid \Lambda, \theta \sim \mathcal{N}(c, \Lambda),$$
   with $\theta = (\theta_1, \dots, \theta_T)'$ and
   $$\Lambda = \big(I_K \otimes \Xi'\Xi + I_N \otimes \Lambda_0^{-1}\big)^{-1}, \qquad c = \Lambda\, \mathrm{vec}(\Xi'\tilde{A}).$$
   Here, $\Xi$ denotes a $T \times N$ matrix with $(t, n)$th element given by $\mathbb{I}(\theta_t = n)$, where $\mathbb{I}(\bullet)$ refers to the indicator function, and $\tilde{A}$ collects $\tilde{\alpha}$ in a $T \times K$ matrix.
   (e) Sample the shrinkage parameters $\{l_j\}_{j=1}^{K}$ from a GIG distribution:
   $$l_j \mid R, \{\tilde{\mu}_n\}_{n=1}^{N} \sim \mathcal{GIG}\Big(e_1 - \frac{N}{2},\ e_2,\ \sum_{n=1}^{N} \frac{\tilde{\mu}_{jn}^2}{r_j}\Big),$$
   with $\tilde{\mu}_{jn}$, for $n = 1, \dots, N$, denoting the $j$th element of $\tilde{\mu}_n$.
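The MIX case of step 4 reduces to per-element Bernoulli draws that weigh the two regime densities of Equation (A.2) against each other. A schematic sketch for a single scalar coefficient (all function and variable names are ours; scalar analogues of $\bar{\Phi}_0$, $\bar{\Phi}_1$ are passed as `phi0`, `phi1`):

```python
import numpy as np

rng = np.random.default_rng(2)

def norm_logpdf(x, mean, var):
    """Log-density of a univariate Gaussian."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

def draw_indicator(alpha_t, alpha_prev, alpha_bar, p_i, psi0, psi1,
                   phi0=0.0, phi1=1.0):
    """Draw s_it ~ Bernoulli by comparing the two regimes of the
    state equation (A.2): regime 0 with variance psi0 and persistence phi0,
    regime 1 (random-walk-type) with variance psi1 and persistence phi1."""
    l0 = np.log(1.0 - p_i) + norm_logpdf(alpha_t, alpha_bar + phi0 * (alpha_prev - alpha_bar), psi0)
    l1 = np.log(p_i) + norm_logpdf(alpha_t, alpha_bar + phi1 * (alpha_prev - alpha_bar), psi1)
    # normalize on the log scale to avoid overflow
    prob1 = np.exp(l1 - np.logaddexp(l0, l1))
    return int(rng.random() < prob1), prob1
```

When the current state sits on top of the random-walk prediction and far from the mean-reverting regime, the posterior probability of regime 1 approaches one, and vice versa.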
A.4. The spectral decomposition
To obtain a time-varying low-frequency measure between two endogenous variables, we follow Sargent and Surico (2011) and Kliem et al. (2016). We therefore recast the TVP-VAR model in its companion form:
$$Y_t = J Z_t, \qquad Z_t = F_t Z_{t-1} + E_t, \qquad E_t \sim \mathcal{N}(0, \Upsilon_t).$$
In the following, the spectral density of $Y_t$ at the very low frequency $\rho = 0$ is given by:
$$\Pi_t(\rho = 0) = J\big((I_{mp+1} - F_t)^{-1}\, \Upsilon_t\, (I_{mp+1} - F_t')^{-1}\big) J'.$$
For $\rho = 0$, the low-frequency relationship $\pi_{ij,t}$ between two variables $(Y_{it}, Y_{jt}) \in Y_t$ can be derived with:
$$\pi_{ij,t} = \frac{\Pi_{ij,t}(\rho = 0)}{\Pi_{jj,t}(\rho = 0)},$$
with $\Pi_{ij,t}$ denoting the $(i, j)$th element of $\Pi_t$.

B. ADDITIONAL FORECASTING RESULTS
(Figure B.1 panels: (a) FA-VAR, (b) L-VAR, (c) S-VAR; horizons: (i) one-year-ahead, (ii) two-years-ahead. Legend: const. (Min), const. (NG), TVP-MIX, TVP-POOL, TVP-RW, FLEX MIX-SPEC, FLEX MS-POOL, SINGLE (MS-POOL), SSVS MIX-SPEC.)
Figure B.1:
Evolution of one- and two-year-ahead total cumulative LPBFs relative to the benchmark. The gray dashed lines refer to the maximum/minimum Bayes factor over the full hold-out sample. The light gray shaded areas indicate the NBER recessions in the US.

Table B.1: Density forecast performance (CRPS ratios) relative to the benchmark (const. (Min.)). The red shaded row denotes the benchmark (and its CRPS values). Asterisks indicate statistical significance for each model relative to const. (Min.) at the 1 (***), 5 (**) and 10 (*) percent significance levels. Entries are grouped by horizon (1-quarter-ahead | 1-year-ahead | 2-years-ahead), each reporting TOT, GDPC1, CPIAUCSL, UNRATE and FEDFUNDS.

FA-VAR const. (Min.): 0.92 0.85 0.97 0.89 1.10 | 0.90** 0.85** 1.02 0.81 0.94 | 0.88** 0.98 0.97 0.87 0.71***
FA-VAR const. (NG): 0.90** 0.82** 0.97 0.89 1.04 | 0.89** 0.84** 1.00 0.82 0.89 | 0.88** 0.97 0.97 0.87 0.72**
FA-VAR TVP-MIX FLEX MIX: 0.89*** 0.82** 0.96 0.89 0.85* | 0.91** 0.89 0.98 0.86 0.88 | 0.97 0.95 1.03 0.99 0.89
FA-VAR TVP-MIX FLEX MS: 0.90** 0.84* 0.97 0.85 0.88 | 0.91* 0.89* 1.02 0.82 0.85 | 0.95 1.00 1.00 0.94 0.87
FA-VAR TVP-MIX SINGLE: 0.90** 0.84* 0.98 0.88 0.80** | 0.90** 0.88** 1.01 0.83 0.79** | 0.96 0.98 1.04 0.96 0.82
FA-VAR TVP-MIX SSVS MIX: 0.89** 0.83* 0.97 0.88 0.92 | 0.91** 0.88** 1.01 0.83 0.88 | 0.95 0.97 1.05 0.90 0.89
FA-VAR TVP-POOL FLEX MIX: 0.88** 0.80** 0.97 0.86 0.95 | 0.87** 0.81** 1.00 0.79 0.85 | 0.86* 0.94* 0.97 0.82 0.70**
FA-VAR TVP-POOL FLEX MS: 0.88*** 0.80** 0.96 0.86 0.95 | 0.87**
FA-VAR TVP-RW FLEX MIX: 0.89** 0.85* 0.95 0.85 0.82** | 0.87* 0.87** 0.99 0.78
L-VAR const. (Min.): 0.94 0.87 1.03 0.83 1.10 | 0.90** 0.95* 0.95*** 0.79 0.87 | 0.94** 1.03 0.95 0.94 0.83***
L-VAR const. (NG): 0.89** 0.82*** 0.97 0.81 0.99 | 0.85*** 0.89**
S-VAR const. (Min.): 0.24 0.44 0.41 0.07 0.05 | 0.38 0.54 0.44 0.31 0.22 | 0.49 0.48 0.46 0.58 0.44
S-VAR const. (NG): 0.98** 0.98* 0.99 1.00 1.00 | 0.96** 0.94** 0.97** 0.96** 0.96 | 0.97 0.99 0.96 0.96** 0.95
S-VAR TVP-MIX FLEX MIX: 0.92*** 0.92**

C. DATA

In this section we provide further details on the variables used for the large-scale VAR (L-VAR). Table C.1 lists the exact descriptions and provides further information on the transformation of the indicators. The gray shaded rows denote our target variables.
Table C.1:
Data for the US is obtained from the FRED database of the Federal Reserve Bank of St. Louis. The column Transformation shows the transformation applied to each variable. Following McCracken and Ng (2016), (1) implies no transformation, (5) denotes growth rates, defined as log first differences $\ln\big(\tfrac{x_t}{x_{t-1}}\big)$, and (7) denotes differences in percentage changes, $\Delta\big(\tfrac{x_t - x_{t-1}}{x_{t-1}}\big)$. All variables are standardized by subtracting the mean and dividing by the standard deviation.

FRED Mnemonic | Description | Transformation
GDPC1 | Real Gross Domestic Product | 5
PCECC96 | Real Personal Consumption Expenditures | 5
FPIx | Real private fixed investment | 5
GCEC1 | Real Government Consumption Expenditures and Gross Investment | 5
INDPRO | IP: Total index, Industrial Production Index (Index 2012=100) | 5
CE16OV | Civilian Employment (Thousands of Persons) | 5
UNRATE | Civilian Unemployment Rate (Percent) | 1
CES0600000007 | Average Weekly Hours of Production and Nonsupervisory Employees: Goods-Producing | 1
HOUST | Housing Starts: Total: New Privately Owned Housing Units Started | 5
PERMIT | New Private Housing Units Authorized by Building Permits | 5
PCECTPI | Personal Consumption Expenditures: Chain-type Price Index | 5
GDPCTPI | Gross Domestic Product: Chain-type Price Index | 5
CPIAUCSL | Consumer Price Index for All Urban Consumers: All Items | 5
CES0600000008 | Average Hourly Earnings of Production and Nonsupervisory Employees | 5
FEDFUNDS | Effective Federal Funds Rate (Percent) | 1
GS1 | 1-Year Treasury Constant Maturity Rate (Percent) | 1
GS10 | 10-Year Treasury Constant Maturity Rate (Percent) | 1
TOTRESNS | Total Reserves of Depository Institutions | 5
NONBORRES | Reserves of Depository Institutions, Nonborrowed | 7
S.P.500 | S&P's Common Stock Price Index: Composite | 5
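The transformation codes listed in Table C.1 can be applied mechanically. A minimal sketch (the helper names are ours):

```python
import numpy as np

def transform(x, code):
    """Apply the McCracken-Ng style transformation codes of Table C.1:
    (1) level, (5) log first difference, (7) difference of percentage changes."""
    x = np.asarray(x, dtype=float)
    if code == 1:
        return x
    if code == 5:
        return np.diff(np.log(x))
    if code == 7:
        pct = x[1:] / x[:-1] - 1.0  # period-on-period percentage change
        return np.diff(pct)
    raise ValueError(f"unsupported transformation code: {code}")

def standardize(x):
    """Subtract the mean and divide by the standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()
```

Note that codes (5) and (7) shorten the series by one and two observations, respectively, so the transformed panel must be aligned before estimation.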
For the factor-augmented VAR (FA-VAR) we consider the full data set, comprising 165 variables. For brevity, we refer to McCracken and Ng (2016) for a detailed description and the transformation codes. All variables, serving as a basis for the principal components, are transformed to stationarity as suggested in McCracken and Ng (2016). Finally, we standardise the data by demeaning each variable and dividing by the standard deviation. Standardising is especially important for principal components, since the components are not scale invariant.