Flexible Mixture Priors for Large Time-varying Parameter Models
NIKO HAUZENBERGER ∗ University of Salzburg
Time-varying parameter (TVP) models often assume that the TVPs evolve according to a random walk. This assumption, however, might be questionable since it implies that coefficients change smoothly and in an unbounded manner. In this paper, we relax this assumption by proposing a flexible law of motion for the TVPs in large-scale vector autoregressions (VARs). Instead of imposing a restrictive random walk evolution of the latent states, we carefully design hierarchical mixture priors on the coefficients in the state equation. These priors effectively allow for discriminating between periods where coefficients evolve according to a random walk and times where the TVPs are better characterized by a stationary stochastic process. Moreover, this approach is capable of introducing dynamic sparsity by pushing small parameter changes towards zero if necessary. The merits of the model are illustrated by means of two applications. Using synthetic data, we show that our approach yields precise parameter estimates. When applied to US data, the model reveals interesting patterns of low-frequency dynamics in coefficients and forecasts well relative to a wide range of competing models.
JEL : C11, C30, C53, E44, E47
KEYWORDS: Time-varying parameter vector autoregressions, mixture priors, hierarchical modeling, clustering

∗ Salzburg Centre of European Union Studies, University of Salzburg. Address: Mönchsberg 2a, 5020 Salzburg, Austria. Email: [email protected]. I thank Paul Hofmarcher, Florian Huber, Karin Klieber, Gary Koop, Luca Onorante, Michael Pfarrhofer and Anna Stelzer for valuable comments and suggestions. The author gratefully acknowledges financial support from the Austrian Science Fund (FWF, grant no. ZK 35) and the Oesterreichische Nationalbank (OeNB, Anniversary Fund, project no. 18127).

1. INTRODUCTION

A growing number of papers introduces time-varying parameters (TVPs) in econometric models to capture structural breaks in relations across macroeconomic fundamentals (see, for example, Cogley and Sargent, 2005; Primiceri, 2005; Sims and Zha, 2006; Korobilis, 2013; Eickmeier et al., 2015; Mumtaz and Theodoridis, 2018; Paul, 2019) and to achieve more accurate macroeconomic forecasts (see, for instance, Koop and Korobilis, 2012; 2013; D'Agostino et al., 2013; Groen et al., 2013; Bauwens et al., 2015; Hauzenberger et al., 2019; Huber et al., 2020a;b).

In this paper, we focus on estimating TVP vector autoregressive (VAR) models with a large number of endogenous variables. Due to severe overfitting issues in large TVP-VARs, special emphasis is paid to important modeling decisions, such as whether coefficients evolve gradually, change abruptly or remain constant for a subset of periods. In macroeconomic applications, it is common to assume that coefficients evolve according to a random walk, implying that parameters change smoothly over time. As noted by the recent literature (see, for example, Lopes et al., 2016; Hauzenberger et al.,
2019), however, this assumption might be overly simplistic and lead to model misspecification.

In large TVP-VARs it is often reasonable to assume that most parameters remain constant over time, while only a few vary. To capture this behaviour, the Bayesian literature frequently uses shrinkage priors on the state innovation variances to push them sufficiently towards zero (Frühwirth-Schnatter and Wagner, 2010; Belmonte et al., 2014). A severe drawback of this strategy is that it only accounts for the case that a given coefficient is constant for all points in time (labeled static sparsity).

Another common situation faced by researchers is that coefficients change only at certain points in time (referred to as dynamic sparsity). Using a mixture distribution on the innovation variances, for example, allows one to push small parameter changes towards zero (see, inter alia, Gerlach et al., 2000; Giordani and Kohn, 2008; Koop et al., 2009; Huber et al., 2019). Other dynamic sparsification techniques include different forms of dynamic shrinkage processes (see, inter alia, Kalli and Griffin, 2014; Uribe and Lopes, 2017; Rockova and McAlinn, 2018; Kowal et al., 2019; Hauzenberger et al., 2020), latent threshold models (Nakajima and West, 2013) or dynamic model selection techniques (Chan et al., 2012; Koop and Korobilis, 2013). Alternatively, Hauzenberger et al. (2019) introduce a more flexible law of motion by assuming a conjugate hierarchical location mixture prior directly on the time-varying part of the coefficients. This location mixture allows for dynamically adjusting the prior mean of the TVPs to capture situations with a low, moderate or even large number of structural breaks in the coefficients. However, both techniques come with drawbacks. For instance, the mixture innovation model of Huber et al. (2019), equipped with a latent threshold mechanism, discriminates between a high and a low innovation variance state. However, the authors do not discard the random walk law of motion, which might be too restrictive. Hauzenberger et al. (2019) use either a conjugate g-prior (Zellner, 1986) or a conjugate Minnesota prior (Doan et al., 1984; Litterman, 1986), potentially lacking the flexibility to disentangle abrupt from gradual changes.

In this paper, we carefully design suitable mixture priors for the state equation. In a first variant, a mixture prior is introduced not only on the state innovations, but also on the autoregressive coefficients in the state equation to obtain sufficient flexibility. To achieve parsimony in large models, a latent binary indicator determines the law of motion for the TVPs and detects periods where coefficients evolve according to a random walk and times where the TVPs are better characterized by a stationary stochastic process. Combined with a mixture on the innovation volatilities and suitable shrinkage priors, this approach is capable of automatically capturing a wide range of typical parameter changes. In a second variant, the sparse finite location mixture model of Hauzenberger et al. (2019) is extended by considering non-conjugate shrinkage priors and by replacing the location mixture with a location-scale mixture. Here, an additional mixture on the state variances captures the notion that structural breaks in coefficients happen infrequently (with potentially large TVP innovations), while most of the time coefficients are constant (with TVP innovations pushed towards zero), similar to mixture innovation models.

In the previous paragraphs we repeatedly stated that our techniques are well suited to handle overfitting issues in large TVP-VARs. But large TVP models also raise the question of computational feasibility. In this contribution, computational complexity is reduced by using recent advances in estimating large-scale TVP regressions (see Chan and Jeliazkov, 2009; McCausland et al., 2011; Hauzenberger et al., 2020). These are based on rewriting the TVP model in its static regression form. In this representation, the TVP model is treated as a very big regression model and the techniques proposed in Bhattacharya et al. (2016) can be used. Since these algorithms are designed for single-equation models, we estimate the VAR model using its structural representation and thus estimate a set of unrelated TVP regressions (see Carriero et al., 2019).

Based on two applications we investigate the merits of the techniques developed in the paper. First, in an application using synthetic data we illustrate that the proposed methods work well in detecting small and large structural breaks in coefficients. Second, we employ a large US macroeconomic dataset for an empirical application. Our proposed methods reveal interesting patterns in the low-frequency relationship between unemployment and inflation. Moreover, to evaluate the predictive performance of our approach, we perform a comprehensive forecasting exercise. This forecasting horse race shows that the proposed framework works well relative to a wide range of competing models.

The remainder of the paper is structured as follows. Section 2 introduces a TVP regression model with flexible mixture priors and sketches the main contributions of the paper. Section 3 outlines inference in these models, while Section 4 discusses the posterior sampling algorithm of Bhattacharya et al. (2016) applied to non-centered TVP regressions. Sections 5 and 6 show the results for artificial data and US data, respectively. Finally, Section 7 summarizes and concludes.
2. ECONOMETRIC FRAMEWORK

2.1. A TVP Regression
Let y_t denote a scalar time series and x_t refer to a K-dimensional vector of predictors. The observation equation of a TVP regression can then be written as:

y_t = x_t' α_t + ε_t,  ε_t ~ N(0, σ_t²).  (1)

Here, α_t is a K-dimensional vector of TVPs that relates x_t to the quantity of interest and ε_t denotes the measurement error with mean zero and time-varying variance σ_t². For the evolution of σ_t², we assume a stochastic volatility (SV) specification and refer to Appendix A.1 and Kastner and Frühwirth-Schnatter (2014) for details.

Typically, α_t is assumed to evolve according to a random walk (RW). In this paper, interest centers on relaxing this assumption. In the following, to achieve both sufficient flexibility and model parsimony, we use two different mixture specifications for α_t. In the first variant, we assume that coefficients evolve according to a mixture of a random walk and a white noise process. In the second variant, interest centers on further relaxing the law of motion proposed in Hauzenberger et al. (2019).

2.2. A Flexible State Equation
For a mixture between a random walk and a white noise process, we assume that the evolution of α_t is given by:

α_t = α_0 + φ_t (α_{t−1} − α_0) + ς_t,  ς_t ~ N(0, Ψ_t),  (2)

with α_0 denoting a K-dimensional intercept vector, φ_t being a K × K diagonal autoregressive coefficient matrix and ς_t denoting a K-dimensional vector of state innovations, centered on zero with K × K variance-covariance matrix Ψ_t. Moreover, we assume φ_t and Ψ_t to evolve according to a regime-switching process:

φ_t = S_t  (3)

and

Ψ_t = S_t Ψ̄_1 + (I_K − S_t) Ψ̄_0.  (4)

Here, S_t = diag(s_{1t}, ..., s_{Kt}) denotes a binary indicator matrix with each s_{it} being either zero or one, while Ψ̄_1 = diag(ψ̄_{1,1}, ..., ψ̄_{1,K}) and Ψ̄_0 = diag(ψ̄_{0,1}, ..., ψ̄_{0,K}) refer to K-dimensional diagonal matrices. Equation (3) implies that coefficients evolve according to a mixture of a random walk and a white noise process, while Equation (4) ensures sufficient flexibility of the state innovations. For example, if the covariate-specific indicator s_{it} = 1 in the t-th period, the i-th coefficient follows a random walk with state innovation variance ψ_{it} = ψ̄_{1,i}, while if s_{it} = 0 it follows a white noise process with variance ψ_{it} = ψ̄_{0,i}.

This specification (henceforth labeled TVP-MIX) nests a wide variety of popular TVP models, such as standard RW state equations and mixture innovation models. A standard random walk evolution is trivially obtained by setting S_t = I_K for all t. A so-called mixture innovation model assumes φ_t = I_K and specifies Ψ_t similar to Equation (4) (Gerlach et al., 2000; Giordani and Kohn, 2008; Koop et al., 2009; Huber et al., 2019). Additionally, mixture innovation specifications restrict Ψ̄_0 = κ Ψ̂, with κ being a small value close to zero and Ψ̂ a diagonal matrix collecting variable-specific scaling parameters.
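To make the switching law of motion concrete, the state equation (2)–(4) can be simulated as follows (a minimal sketch; function and variable names are ours, not from the paper):

```python
import numpy as np

def simulate_tvp_mix(alpha0, psi1, psi0, s, rng=None):
    """Simulate the state equation (2)-(4): for each coefficient i,
    s[t, i] = 1 takes a random walk step with innovation variance psi1[i],
    while s[t, i] = 0 draws white noise around alpha0[i] with variance psi0[i]."""
    rng = np.random.default_rng(rng)
    T, K = s.shape
    alpha = np.empty((T, K))
    prev = alpha0.astype(float).copy()
    for t in range(T):
        sd = np.sqrt(np.where(s[t] == 1, psi1, psi0))
        # phi_t = S_t: the autoregressive part is active only in the RW regime
        alpha[t] = alpha0 + s[t] * (prev - alpha0) + rng.normal(scale=sd)
        prev = alpha[t]
    return alpha
```

With s set to all ones this reduces to a standard random walk; with s set to all zeros the coefficients fluctuate around α_0 as white noise.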
In the empirical application, these nested specifications are considered as important benchmarks. Related to the literature on variable selection (George and McCulloch, 1993; 1997), Ψ̄_1 is commonly referred to as the slab component and Ψ̄_0 as the spike component (see, for example, Huber et al., 2019).

Apart from discussing the relation to other popular TVP models, it is also worth highlighting additional features of the model proposed in (2) to (4). If a parameter is almost constant but also features larger, abrupt changes in some periods, we would expect ψ̄_{0,i} > ψ̄_{1,i}. This case is of particular interest when compared to a standard mixture innovation model with random walk state equation. Conversely, if a coefficient features large, more persistent swings but also some periods of parameter stability, we would expect ψ̄_{0,i} < ψ̄_{1,i}. Intuitively, the relative proportions of ψ̄_{0,i} and ψ̄_{1,i} depend mainly on the nature of the coefficient changes. Alternatively, if the i-th coefficient is constant or negligible (static sparsity), this can be achieved with ψ̄_{0,i} and/or ψ̄_{1,i} close to zero (Lopes et al., 2016). Note that in the special case of constant coefficients the proposed specification is not identified; we address this issue in the context of interpreting the state indicators S_t.

2.3. A Hierarchical Pooling Specification
For a hierarchical pooling specification, we follow Hauzenberger et al. (2019) and assume that the time-varying part of α_t follows a sparse finite mixture in the spirit of Malsiner-Walli et al. (2016). The state equation (labeled TVP-POOL) reads:

α_t = α_0 + γ_t.  (5)

Here, α_0 denotes a K-dimensional constant coefficient vector and γ_t is a K-dimensional vector of random coefficients featuring a specific structure. That is, conditional on a latent group indicator θ_t that takes a value n ∈ {1, ..., N}, γ_t follows a multivariate Gaussian distribution:

γ_t = μ_n + ς_t,  ς_t ~ N(0, Ψ_t),  if θ_t = n,  (6)

where μ_n refers to the group-specific mean and Ψ_t denotes the variance-covariance matrix. The probability that γ_t is assigned to cluster n is given by P(θ_t = n) = ω_n.

This structure is closely related to the setup of Hauzenberger et al. (2019). In the following, we extend their location mixture prior to a location-scale mixture prior by introducing a regime-switching specification on Ψ_t similar to Equation (4). That is,

Ψ_t = S_t Ψ̄_1 + (I_K − S_t) Ψ̄_0,  (7)

with both Ψ̄_1 and Ψ̄_0 being diagonal matrices and S_t denoting a binary indicator matrix. Similar to standard mixture innovation models, one component serves to detect larger breaks, while the second component handles dynamic sparsity. We therefore discard the conjugate prior assumption of Hauzenberger et al. (2019) and instead assume non-conjugate shrinkage priors on both state variances (described in more detail in Subsection 3.1).

Before proceeding, it is also worth sketching the general idea of this random coefficient specification. The model can be seen as a stochastic variant of multiple break point specifications (Koop and Potter, 2007), which is capable of capturing situations with a low, moderate or even large number of structural breaks.
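Conditional on the component means μ_n and weights ω, drawing the group indicators θ_t is the standard finite-mixture assignment step. A minimal sketch (names are ours; for illustration we take the component covariance to be the identity):

```python
import numpy as np

def sample_group_indicators(g, mu, omega, rng=None):
    """Draw theta_t from P(theta_t = n | .) proportional to
    omega_n * N(g_t | mu_n, I_K), for t = 1, ..., T."""
    rng = np.random.default_rng(rng)
    T = g.shape[0]
    diff = g[:, None, :] - mu[None, :, :]                    # (T, N, K)
    logp = np.log(omega)[None, :] - 0.5 * (diff ** 2).sum(axis=-1)
    # normalize in a numerically stable way before sampling
    p = np.exp(logp - logp.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    return np.array([rng.choice(len(omega), p=p[t]) for t in range(T)])
```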
To estimate the number of regimes, we follow Malsiner-Walli et al. (2016) and Hauzenberger et al. (2019) and specify an "overfitting" model by setting N to a large integer (i.e., we consider many regimes a priori). To achieve parsimony, we obtain an estimate of the number of clusters N̂ (usually N̂ < N) by specifying shrinkage priors on both the mixture weights and the component means. Overall shrinkage is thus determined by two interacting objectives: we aim at eliminating irrelevant clusters, while at the same time attempting to avoid highly overlapping component means.

At this stage one might ask why we do not assume N different state innovation variances (i.e., use the group indicators θ_t for both γ_t and Ψ_t). Two considerations are important here. First, N denotes a large integer, and such a choice might lead to overfitting issues without additional hierarchical shrinkage/pooling priors on the state innovation variances. Second, covariate-specific binary indicators S_t for the scales already render the model highly flexible and allow us to introduce shrinkage on the state innovation variances in a simpler way. Moreover, the two-state mixture on the state variances (see Equation 7) is designed to support inference on the locations μ_n, for n ∈ {1, ..., N}. We expect that many elements in {γ_t}_{t=1}^T cluster around zero (i.e., coefficients are constant with Ψ_t close to zero), while occasionally there are structural breaks in some coefficients (requiring relatively large values in Ψ_t). It is especially these two extremes (changes/no changes in α_t) that we aim to detect with γ_t.

2.4. The Latent State Indicator Matrix
So far we have remained silent on the evolution of S_t. There are many possibilities for how the binary indicators s_{it}, for i ∈ {1, ..., K}, evolve over time. In the following, we assume two laws of motion:

1. Pooled Markov-switching process:
When assuming a first-order Markov process for each s_{it} independently, sampling the state indicators can be computationally cumbersome, especially if K is large. Since one has to rely on forward filtering backward sampling algorithms, computation time quickly adds up. Therefore, we replace S_t with s_t I_K. In the following, s_t is assumed to be common to all K covariates in period t and governed by a joint Markov process. This process is driven by a transition probability matrix given by:

P = ( p_00      1 − p_00 )
    ( 1 − p_11  p_11     ),

with the transition probability from state k to state l denoted by p_kl and the diagonal elements following a Beta distribution, p_kk ~ B(c_{k,0}, c_{k,1}), for k ∈ {0, 1} (see Uribe and Lopes, 2017).

2. Independent over time and covariate-specific indicators:
The assumption that a joint indicator governs the evolution of a large number of coefficients might be too inflexible in certain cases. For this reason, we also specify covariate-specific indicators, coupled with independent mixture priors (see Lopes et al., 2016); alternatively, Koop et al. (2009) group coefficients and assume class-specific indicators. In contrast to covariate-specific Markov processes, these mixture priors are independent over time and thus do not involve computationally demanding forward filtering backward sampling algorithms. In the following, s_{it} is assumed to follow an independent Bernoulli distribution with P(s_{it} = 1) = p_i and p_i being Beta distributed, i.e., p_i ~ B(c_{i,0}, c_{i,1}).

Moreover, it should be noted that the prior choice on the binary indicators is quite influential. For the random walk/white noise mixture (TVP-MIX), the hyperparameters are chosen in such a way that gradual changes (with s_t = 1) have a higher (unconditional) expected duration than abrupt changes (with s_t = 0); in the empirical application we therefore combine Beta hyperparameters equal to 30 with values close to zero, both for the Markov-switching process and for the independent mixture distribution. For the location-scale mixture (TVP-POOL), with S_t solely governing the state innovation variances, we take a more agnostic approach by setting the Beta hyperparameters to small, more balanced values (at most 3).
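A minimal sketch of the pooled Markov-switching indicator described under point 1 (names are ours; the Beta updating of p_00 and p_11 is omitted):

```python
import numpy as np

def simulate_joint_indicator(T, p00, p11, rng=None):
    """Simulate the common indicator {s_t} from a two-state Markov chain with
    P(s_t = 0 | s_{t-1} = 0) = p00 and P(s_t = 1 | s_{t-1} = 1) = p11."""
    rng = np.random.default_rng(rng)
    s = np.empty(T, dtype=int)
    # initialize from the stationary distribution of the chain
    pi1 = (1 - p00) / ((1 - p00) + (1 - p11))
    s[0] = rng.random() < pi1
    for t in range(1, T):
        prob_one = p11 if s[t - 1] == 1 else 1 - p00
        s[t] = rng.random() < prob_one
    return s
```

For p00 < p11, the chain spends a larger unconditional share of periods in the random walk regime (s_t = 1), mirroring the asymmetric prior choice above.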
3. BAYESIAN INFERENCE
To discuss inference for both variants outlined in Section 2, we introduce a very general state equation for α_t:

α_t = α_0 + γ_t + φ_t (α_{t−1} − α_0) + ς_t,  ς_t ~ N(0, Ψ_t).  (8)

Equation (8) nests both approaches: the first variant (TVP-MIX) is obtained by setting γ_t = 0_{K×1}, while the second (TVP-POOL) results from defining φ_t = 0_{K×K} and γ_t = μ_n if θ_t = n.

3.1. The Non-Centered Parameterization
In this subsection we exploit the non-centered parameterization to write Ψ̄_1 and Ψ̄_0 as part of the observation equation, enabling shrinkage on the regime-switching state innovation volatilities (Frühwirth-Schnatter and Wagner, 2010). We therefore recast the model as follows:

y_t = x_t'(α_0 + Ψ̃_t α̃_t) + σ_t ε_t,  ε_t ~ N(0, 1),
α̃_t = γ̃_t + φ_t α̃_{t−1} + η_t,  η_t ~ N(0, I_K),  α̃_0 = 0,  φ_0 = I_K,  (9)

where the term in parentheses in the first line equals α_t. Here, α̃_t is a K-dimensional vector of normalized states, defined as α̃_t = Ψ̃_t^{−1}(α_t − α_0), and γ̃_t = Ψ̃_t^{−1} γ_t, with Ψ̃_t = diag(√ψ_{1t}, ..., √ψ_{Kt}) denoting the (matrix) square root of Ψ_t. Using the definition of Ψ_t in Equation (4) (or Equation 7), the observation equation in (9) can be rewritten as:

y_t = x_t'(α_0 + S_t Ψ̄_1^{1/2} α̃_t + (I_K − S_t) Ψ̄_0^{1/2} α̃_t) + σ_t ε_t,

and, more compactly, as a standard regression model:

y_t = x̂_t' α̂ + σ_t ε_t,

with x̂_t = (x_t', (S_t x_t ⊙ α̃_t)', ((I_K − S_t) x_t ⊙ α̃_t)')' denoting a 3K-dimensional covariate vector and α̂ = (α_0', √ψ̄_{1,1}, ..., √ψ̄_{1,K}, √ψ̄_{0,1}, ..., √ψ̄_{0,K})' the corresponding 3K-dimensional coefficient vector.

On the time-invariant α̂ we use a hierarchical global-local shrinkage prior (see Polson and Scott, 2010):

α̂_j ~ N(0, τ_j²),  τ_j² | λ ~ f,  λ ~ g,  for j = 1, ..., 3K,

where α̂_j refers to the j-th element of α̂, λ denotes a global shrinkage parameter and τ_j² induces local shrinkage. In the empirical application, we focus on the Normal-Gamma shrinkage prior (Griffin and Brown, 2010). This prior has proven successful in macroeconomic and financial applications (see, for example, Huber and Feldkircher, 2019) and is quite common in the literature. The exact prior specification is outlined in Appendix A.2.
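The construction of the 3K-dimensional covariate vector x̂_t can be sketched as follows (names are ours):

```python
import numpy as np

def build_xhat(x_t, alpha_tilde_t, s_t):
    """Stack x_t, S_t x_t * alpha~_t and (I_K - S_t) x_t * alpha~_t into
    the 3K-dimensional vector xhat_t, so that the regime-specific square-root
    state variances can be estimated as constant regression coefficients."""
    return np.concatenate([
        x_t,
        s_t * x_t * alpha_tilde_t,          # slab part, active where s_it = 1
        (1 - s_t) * x_t * alpha_tilde_t,    # spike part, active where s_it = 0
    ])
```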
3.2. The Static Representation
If interest centers on estimating the latent states {α̃_t}_{t=1}^T, we can straightforwardly recast Equation (9) in a static regression form by conditioning on α_0, the state innovation volatilities {Ψ̃_t}_{t=1}^T and the stochastic volatilities in Σ = diag(σ_1, ..., σ_T). We define y as a T-dimensional vector, X as a T × K matrix and ε as a T-dimensional vector with y_t, x_t' and ε_t in the t-th position, respectively. Then, the static form of Equation (9) is:

y = X α_0 + W α̃ + Σ ε,  ε ~ N(0, I_T),
Φ α̃ = γ̃ + η,  η ~ N(0, I_ν).

Here, α̃ = (α̃_1', ..., α̃_T')' is a ν(= TK)-dimensional latent state vector, γ̃ = (γ̃_1', ..., γ̃_T')' is a ν-dimensional intercept vector and η is a ν-dimensional shock vector. After defining x̃_t' = x_t' Ψ̃_t, the precise structure of W and Φ is given by: W is the T × ν block-diagonal matrix with x̃_t' in the t-th row,

W = diag(x̃_1', x̃_2', ..., x̃_T'),

and Φ is the ν × ν block lower-bidiagonal matrix with I_K on the main block diagonal and −φ_t on the first lower block diagonal,

Φ = ( I_K    0_{K×K} ...  0_{K×K} )
    ( −φ_2   I_K     ...  0_{K×K} )
    ( ...    ...     ...  ...     )
    ( 0_{K×K} ...    −φ_T I_K     ).

Solving for α̃ yields:

α̃ = Φ^{−1}(γ̃ + η),

implying that α̃ ~ N(a_0, Ω_0) with prior mean a_0 = Φ^{−1} γ̃ and prior variance-covariance matrix Ω_0 = (Φ'Φ)^{−1} (see, for instance, Chan and Jeliazkov, 2009; Chan and Strachan, 2020). It is worth noting that any global-local shrinkage prior might be used for α̂. Other popular choices are the SSVS prior (George and McCulloch, 1993; 1997), the Horseshoe prior (Carvalho et al., 2010), the Bayesian Lasso (Park and Casella, 2008) or the Triple-Gamma prior (Cadonna et al., 2020); see also Huber et al. (2020a), Kastner and Huber (2020) and Cross et al. (2020) for thorough studies of global-local shrinkage priors in macroeconomic applications.
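Because Φ is block lower-bidiagonal, the prior mean a_0 = Φ^{−1} γ̃ can be obtained by forward substitution rather than an explicit matrix inverse. A minimal sketch (names are ours):

```python
import numpy as np

def prior_mean_states(gamma_tilde, phi):
    """Solve Phi a0 = gamma~ by forward substitution: the t-th block row of
    Phi reads -phi_t a0_{t-1} + a0_t = gamma~_t, hence
    a0_1 = gamma~_1 and a0_t = gamma~_t + phi_t * a0_{t-1}.
    Both inputs are (T, K) arrays; phi holds the diagonals of phi_t."""
    T, K = gamma_tilde.shape
    a0 = np.empty((T, K))
    a0[0] = gamma_tilde[0]
    for t in range(1, T):
        a0[t] = gamma_tilde[t] + phi[t] * a0[t - 1]
    return a0
```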
In the special case of φ_t = 0_{K×K} for all t, Φ (and thus Ω_0) reduces to an identity matrix, while φ_t ≠ 0_{K×K} for any t induces a (specific) banded lower-triangular (block-diagonal) structure of Φ (Ω_0). It is worth emphasizing that the prior variance-covariance matrix Ω_0 solely depends on the state indicators S_t. Moreover, the prior mean a_0 also depends on the structure of Φ and γ̃. The simplest choice is to set a_0 to a zero vector, which we implicitly assume for the TVP-MIX variants. For the
TVP-POOL approach we use a hierarchical mixture prior on a_0, described in detail next.

3.3. A Hierarchical Prior Mean
The model outlined in Equations (5) to (7) is a sparse finite location-scale mixture. After recasting the model in the non-centered parameterization, we can replace the location-scale mixture prior on α_t (outlined in Equation 6) with a location mixture prior on the normalized latent states α̃_t, since the scales in Equation (7) (Ψ̄_1 and Ψ̄_0) are now part of the observation equation. That is:

α̃_t | θ_t = n ~ N(μ̃_n, I_K),  (10)

with group-specific mean μ̃_n, for n ∈ {1, ..., N}, and variance-covariance matrix I_K. In the following, the prior mean is defined as a_0 (= γ̃) = (a_1', ..., a_T')', with a_t = μ̃_n if θ_t = n. Note that φ_t = 0_{K×K}, for t ∈ {1, ..., T}, always holds for the TVP-POOL model, but is not ruled out for the
TVP-MIX specification. Moreover, if Ω_0 = I_ν, then a_0 = γ̃.

The sparse finite location mixture in Equation (10) allows us to use a similar prior setup as proposed in Malsiner-Walli et al. (2016) and Hauzenberger et al. (2019). To ensure model parsimony we use a Dirichlet prior on ω = (ω_1, ..., ω_N)':

ω | ξ ~ Dir(ξ, ..., ξ),

with ξ referring to an intensity parameter. The prior on the intensity parameter is specified as:

ξ ~ G(d, dN),

with d = 10 in the empirical application. Here, we closely follow Malsiner-Walli et al. (2016), who show that this prior choice is successful in detecting superfluous components and obtaining a parsimonious mixture representation.

Moreover, on the group means we specify the following shrinkage prior:

μ̃_n | α̃ ~ N(0_{K×1}, Λ),

with μ̃_n being centered on zero and prior variance-covariance matrix Λ = LRL. Here, L = diag(√l_1, ..., √l_K) and R = diag(r_1, ..., r_K), with r_i denoting the range of α̃_i = (α̃_{i1}, ..., α̃_{iT})'. Moreover, we specify a Gamma prior on the elements of L:

l_i ~ G(e_0, e_1),

with e_0 and e_1 set to small values close to zero (see Malsiner-Walli et al., 2016).
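Conditional on the group indicators, the symmetric Dirichlet prior on ω is conjugate, so the weights are updated from the cluster counts in one step (a sketch using the textbook Dirichlet-multinomial update; names are ours):

```python
import numpy as np

def sample_weights(theta, N, xi, rng=None):
    """Draw omega | theta, xi ~ Dir(xi + n_1, ..., xi + n_N),
    where n_k = #{t : theta_t = k} are the cluster counts under
    the symmetric Dirichlet prior Dir(xi, ..., xi)."""
    rng = np.random.default_rng(rng)
    counts = np.bincount(theta, minlength=N)
    return rng.dirichlet(xi + counts)
```

With a small intensity ξ, components that attract no observations receive weights pushed towards zero, which is what drives the "overfitting mixture" towards a parsimonious representation.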
4. POSTERIOR COMPUTATION
In this section, we outline the MCMC sampling step for α̃. We stress that drawing α̂ is computationally fast even for relatively large K, but sampling the ν-dimensional vector α̃ is computationally demanding (Hauzenberger et al., 2020). Thus, for α̂ (and the remaining parameters) we use standard MCMC techniques, with sampling steps and conditional posteriors outlined in Appendix A.3.

For α̃, irrespective of the structure of a_0 and Ω_0, we obtain standard conditional Gaussian posterior quantities. With ỹ = Σ^{−1}(y − X α_0) and W̃ = Σ^{−1} W:

α̃ | ỹ, W̃, a_0, Ω_0 ~ N(ā, Ω̄), with Ω̄^{−1} = W̃'W̃ + Ω_0^{−1} (a ν × ν matrix) and ā = Ω̄(W̃'ỹ + Ω_0^{−1} a_0).

The main issue, however, is that the inversion of Ω̄^{−1} is computationally costly, since it is a ν × ν matrix with ν = TK and both T and K potentially large integers. Thus, to avoid high-dimensional full matrix inversions and Cholesky decompositions when drawing the normalized latent states α̃, we rely on the algorithm proposed in Bhattacharya et al. (2016), applied to TVP models in Hauzenberger et al. (2020). This method involves the following steps:

1. Draw a ν-dimensional vector u ~ N(0, I_ν).
2. Sample a T-dimensional vector v ~ N(0, I_T).
3. Define q = a_0 + Φ^{−1} u, with Φ^{−1} denoting the lower Cholesky factor of Ω_0, and set r = W̃ q + v.
4. Compute Ω̃ = (I_T + W̃ Ω_0 W̃')^{−1}.
5. Set f = Ω̃(ỹ − r).
6. Obtain a draw of α̃ as α̃ = Ω_0 W̃' f + q.

Moreover, in the static representation of a TVP regression the involved matrices are sparse, which can be exploited to achieve additional computational gains (see Chan and Jeliazkov, 2009; Hauzenberger et al., 2019; 2020). Depending on the structure of Φ, there are two extreme cases, as briefly discussed in Subsection 3.2. Computationally, the most expensive case is a random walk state equation (φ_t = I_K for all t), while having no autoregressive structure in the state equation (φ_t = 0_{K×K} for all t) is computationally less demanding. Recall that in the former case Φ has a specific lower-triangular structure (rendering Ω_0 block diagonal), while in the latter case both Φ and Ω_0 are diagonal. Thus, even for a random walk state equation (the most dense case), using sparse algorithms pays off in terms of computation time. Moreover, if φ_t = S_t for some t, the computational burden lies between these two extremes and eventually depends on the exact structure of Φ (and Ω_0). See Hauzenberger et al. (2020) for a comparison between the two extremes.

4.1. Equation-wise Estimation for a TVP-VAR
The methods outlined in the previous subsection are designed for single-equation models. To use these algorithms for posterior inference in TVP-VARs, we rewrite the multivariate model as a set of unrelated TVP regressions (see Carriero et al., 2019). This can be done by using the structural form of the TVP-VAR:

Y_t = B_{0t} Y_t + Σ_{i=1}^p B_{it} Y_{t−i} + C_t + ε_t,  ε_t ~ N(0, Σ_t).  (11)

Here, Y_t = (Y_{1t}, ..., Y_{mt})' denotes an m-dimensional vector of endogenous variables, with B_{0t} being an m × m strictly lower-triangular matrix (with zero main diagonal) defining contemporaneous relationships between the elements of Y_t. Moreover, B_{it}, for i = 1, ..., p, denotes an m × m time-varying coefficient matrix, C_t is an m-dimensional intercept vector and ε_t refers to an m-dimensional Gaussian distributed error vector, centered on zero and with time-varying m-dimensional diagonal variance-covariance matrix Σ_t = diag(σ_{1t}², ..., σ_{mt}²). Before proceeding, it is convenient to define B_t = (B_{1t}, ..., B_{pt}).

In the following, for i = 2, ..., m, the i-th equation of Y_t is given by:

y_{it} = x_{it}'(α_i + α̃_{it}) + ε_{it},  ε_{it} ~ N(0, σ_{it}²),

with x_{it} denoting a K_i(= mp + i)-dimensional covariate vector, x_{it} = ({y_{jt}}_{j=1}^{i−1}, Y_{t−1}', ..., Y_{t−p}', 1)', and α_{it} = ({b_{ij,t}}_{j=1}^{i−1}, B_{i•,t}, c_{it})' a K_i-dimensional vector of time-varying coefficients. Here, b_{ij,t} refers to the (i, j)-th element of B_{0t}, B_{i•,t} denotes the i-th row of B_t and c_{it} the i-th element of C_t. Moreover, for the first equation (i = 1) we have x_{1t} = (Y_{t−1}', ..., Y_{t−p}', 1)' and α_{1t} = (B_{1•,t}, c_{1t})'.
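For completeness, steps 1–6 of the sampling algorithm above can be sketched with dense matrices (an actual implementation would exploit the sparsity of W̃ and Ω_0; names are ours):

```python
import numpy as np

def draw_states(y_tilde, W_tilde, a0, Omega0):
    """One draw from N(a_bar, Omega_bar), where
    Omega_bar^{-1} = W~'W~ + Omega0^{-1} and
    a_bar = Omega_bar (W~'y~ + Omega0^{-1} a0),
    at the cost of a T x T (instead of nu x nu) system solve."""
    T, nu = W_tilde.shape
    L = np.linalg.cholesky(Omega0)                 # lower Cholesky factor of Omega0
    u = np.random.standard_normal(nu)              # step 1
    v = np.random.standard_normal(T)               # step 2
    q = a0 + L @ u                                 # step 3: a draw from the prior
    r = W_tilde @ q + v
    M = np.eye(T) + W_tilde @ Omega0 @ W_tilde.T   # step 4 (solved, not inverted)
    f = np.linalg.solve(M, y_tilde - r)            # step 5
    return Omega0 @ W_tilde.T @ f + q              # step 6
```

The key saving is that only the T × T matrix M is factorized, while the ν × ν posterior precision is never formed.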
5. SIMULATION STUDY
In this section we use synthetic data to illustrate the features of the proposed mixture variants. For the data generating process (DGP) we set the number of observations to T = 100 and the number of covariates to K = 5. The covariates are simulated as X_j ~ N(0, I_T) for j = 1, ..., (K −
1) and X_K = ι_T, with ι_T being a T-dimensional vector of ones. For the error variance σ_t², we assume an SV specification with log(σ_t²) = h_t following a random walk process. That is, h_t = h_{t−1} + ϑ_t with ϑ_t ~ N(0, 0.
1) and an initial value h_0 set to the log of a small variance. For α_t we assume quite specific laws of motion. We define a fixed K-dimensional initial level α_0 and assume that both the regime-switching autoregressive parameters and the regime-switching variances in the state equation are governed by a joint Markov process s_t. Here, we let S_t = s_t I_K, φ_t = S_t and Ψ_t = S_t Ψ̄_1 + (
1) and ¯ Ψ = diag (1 , . , . , − , − ). Thejoint indicator s t is simulated with transition probabilities p = 0 . p = 0 .
1, and the random walk variances collected in Ψ̄_1 = diag(ψ̄_{1,1}, ..., ψ̄_{1,5}), whose first element equals one, whose second and third elements are small positive values, and whose fourth and fifth elements are negligibly small. The joint indicator s_t is simulated with transition probabilities p_00 and p_11 = 0.95, effectively leading to a higher unconditional probability that α_t follows a random walk evolution.

In particular, the first coefficient features larger, more persistent parameter changes (with ψ̄_{1,1} = 1), while for a small number of periods it is basically constant (achieved through the white noise state equation with ψ̄_{0,1} close to zero). The second parameter features small gradual changes over time, and the third parameter behaves similarly, with small state innovation variances in both regimes. The fourth coefficient is assumed to be constant over time (ψ̄_{1,4} and ψ̄_{0,4} are both close to zero) and, finally, the fifth parameter features some extremely large breaks (with ψ̄_{0,5} = 1), while it is otherwise constant (here achieved through the random walk state equation with ψ̄_{1,5} close to zero).

To assess the flexibility of our approaches we compare them to models assuming a standard random walk evolution of the coefficients and to models assuming constant coefficients. Moreover, we consider a typical mixture innovation model as an important benchmark. For each model we use a Normal-Gamma prior (Griffin and Brown, 2010) on the constant part and allow for SV in the measurement error variance. Furthermore, each TVP model features a Normal-Gamma prior on the square roots of the state innovation variances, which are potentially regime-switching (see Equation 9).

Panels (a) to (c) in Figure 1 depict the evolution of the regression parameters estimated with our proposed methods, while panel (d) shows estimates from a typical mixture innovation model. The red solid lines denote the true coefficients, the blue shaded areas represent the 68% posterior credible intervals (with the blue solid lines referring to the posterior median), the gray shaded areas represent the respective credible sets of a standard TVP model with random walk state equation, and the black dotted lines indicate the 16th/50th/84th percentiles of a constant coefficient model.

The results with artificial data reveal at least three important features.
First, all TVP models yield reasonable estimates for constant coefficients, which is most important for forecasting applications. Focussing on the fourth parameter, considering a more flexible model pays off by producing less biased and more precise estimates, especially when compared to the constant coefficient model. Second, the TVP-MIX specifications are capable of capturing both rapid shifts and smooth adjustments in the regression coefficients. Our methods in panels (a) and (b) tend to adjust quickly when facing high-frequency changes, rendering the methods even more flexible than a typical mixture innovation specification in panel (d). Third, the TVP-POOL model in panel (c) tends to detect sudden changes in the parameters quite well, but is less capable of capturing low-frequency movements. This feature differs from a standard random walk evolution assumption on the TVPs: assuming a standard random walk implies smoothly evolving coefficients, which makes capturing high-frequency changes difficult. Interestingly, the time-varying intercept (the fifth coefficient) tends to soak up movements of other parameters. Models that do not truly detect the large breaks of the third coefficient are particularly prone to this issue (TVP-POOL but also
TVP-RW specifications).

[Figure 1 panels: (a) TVP-MIX with flexible state variances (FLEX) and S_t = s_t I_K (MS); (b) TVP-MIX with flexible state variances (FLEX) and covariate-specific indicators (MIX); (c) TVP-POOL with flexible state variances (FLEX) and covariate-specific indicators (MIX); (d) TVP-RW with SSVS-type state variances (SSVS) and covariate-specific indicators (MIX). Each panel shows the five coefficients together with their true parameters.]

Figure 1:
The blue shaded areas denote the 68% posterior credible intervals of the proposed methods, with the blue solid lines denoting the posterior medians. The gray shaded areas refer to the 68% credible sets of a standard TVP regression with a random walk state equation. The black dotted lines indicate the 16th/50th/84th percentiles of a constant coefficient model. Moreover, the red lines denote the true coefficients α_t.

6. EMPIRICAL APPLICATION

Structural analysis and forecasting key macroeconomic indicators is of great relevance for policy makers. In the empirical work, we focus on output growth, inflation, unemployment, and/or the interest rate. Focussing on these variables, we investigate the merits of our approach using the popular quarterly US data set described in McCracken and Ng (2016). The data set includes 165 macroeconomic and financial variables and ranges from 1959:Q1 to 2019:Q4. In Subsection 6.1 we show some stylized in-sample features of our methods for a small-scale model. By including the four target variables in a small-scale VAR (henceforth
S-VAR) we present posterior probabilities of the state indicator matrix S_t and estimate the low-frequency relationship between unemployment and inflation. Moreover, in Subsection 6.2, this variable set forms the basis for evaluating the predictive performance of our methods in a comprehensive forecast exercise. For the forecasting exercise, we consider two additional information sets. In our largest specification (L-VAR) we pick 20 macroeconomic indicators, which are commonly considered by the recent literature for forecasting (see, for example, Huber et al., 2020a; Pfarrhofer, 2020). In particular, we include financial market indicators that carry important information about the future stance of the economy (see Bańbura et al., 2010). Moreover, we consider a factor-augmented VAR (FA-VAR). Here, we augment the target variables with six principal components comprising the information of the remaining variables in the data set, effectively leading to a VAR with ten endogenous variables. In such larger-scale models our methods are capable of handling less frequent (but important) parameter instabilities in a genuine way.

Especially forecasting these important macroeconomic aggregates remains a challenging task, since (at least) two issues arise. First, we have to decide on the set of variables we want to include in our econometric model. The recent literature on constant parameter VARs highlights that exploiting large information sets yields forecast gains (see, for example, Bańbura et al., 2010; Koop, 2013). Second, it is well documented that important economic indicators feature instabilities in structural parameters and innovation volatilities. In the literature there is strong agreement that SV is important in macroeconomic applications (see Clark, 2011). There is also strong empirical support for shifting parameters in small-scale models (see D'Agostino et al., 2013). However, there is less consensus on time-varying parameters in larger-scale models: as the amount of information increases, overall time variation in parameters tends to diminish. Recent contributions dealing with large-scale TVP-VARs argue that in smaller models the TVP part controls for an omitted variable bias (see Feldkircher et al., 2017; Huber et al., 2020b).

In the empirical application we start with 1962:Q1 and use the first observations for transformations. In Appendix C we provide further details on the specific variable set included in the largest specification and the transformations applied. The number of principal components is motivated by the specification in Stock and Watson (2012), who also consider six factors. See, for example, Stock and Watson (2012), Ng and Wright (2013) and Aastveit et al. (2017), which put special emphasis on the recent financial crisis.
In the following empirical application, note that we consider two lags for every model and allow for SV.
6.1. In-sample evidence
Before proceeding, we briefly elaborate on a potential identification problem when interpreting the state indicators S_t (see Frühwirth-Schnatter, 2001). For the TVP-MIX models, identification is ensured by construction (if coefficients indeed feature time variation). Assuming φ_t = S_t (see Equation 3) automatically imposes inequality constraints on the autoregressive coefficients in the state equation. However, non-identifiability can occur when coefficients are constant. In such a case, elements in S_t are hard to interpret, since a no-change evolution is supported by both a random walk and a white noise process. Interpreting S_t for the TVP-POOL specification is an even more challenging task, since in these models S_t solely controls the evolution of the state innovations. Here, inference about the state indicator matrix is only useful in combination with inference about the size of the state innovation variances Ψ̄₀ and Ψ̄₁ and with imposing an inequality restriction ex post (for example, ψ̄₀ᵢ < ψ̄₁ᵢ).

Therefore, we solely focus on two variants of a TVP-MIX model to illustrate the switching behaviour. Figure 2 depicts the posterior median of the diagonal elements in S_t. Panel (a) shows a TVP-MIX model with S_t = s_t I_K and s_t following a first-order Markov process (MS). Panel (b) depicts a specification with the elements in S_t following an independent mixture distribution (MIX). A comparison between both approaches highlights that a joint indicator evidently leads to a different posterior median of S_t than covariate-specific indicators. By restricting S_t = s_t I_K, all covariates are driven by a single indicator that pushes them towards either a random walk or a white noise state equation in period t. Conversely, with covariate-specific indicators, we see more dispersion across covariates. However, both approaches agree on a white noise state equation in times of turmoil, suggesting a need for abruptly adjusting parameters in these periods.
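The difference between a joint (MS) and covariate-specific (MIX) indicator and the resulting regime-dependent state variances can be illustrated with a small numpy sketch; the regime labels (subscript 1 for the random walk regime, 0 for the white noise regime) and all numeric values are illustrative assumptions:

```python
import numpy as np

K = 4
psi1 = np.full(K, 1.0)   # random walk regime variances (illustrative)
psi0 = np.full(K, 1e-4)  # white noise regime variances (illustrative)

# Joint indicator (MS): S_t = s_t * I_K -- a single switch moves all covariates.
s_t = 1
S_joint = s_t * np.eye(K)

# Covariate-specific indicators (MIX): each diagonal element switches on its own.
s_it = np.array([1, 0, 1, 1])
S_mix = np.diag(s_it)

# Regime-dependent state innovation variances:
# Psi_t = S_t Psi_bar_1 + (I_K - S_t) Psi_bar_0.
Psi_t = S_mix @ np.diag(psi1) + (np.eye(K) - S_mix) @ np.diag(psi0)
```

Under MIX, the second covariate here sits in the low-variance (white noise) regime while the others follow the random walk regime, mirroring the dispersion across covariates visible in panel (b) of Figure 2.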
This model feature is in line with the discussion in Primiceri (2005), who suggests that an economically stable period favours more gradual changes in the coefficients (which are more consistent with a random walk state equation), while shifts in policy rules require quickly adjusting coefficients (which is better captured by a white noise state equation). Since estimating TVP models with typical MCMC methods remains computationally demanding, several studies take this argument as a reason to opt for approximating the TVP part or to rely on dimension reduction techniques, yielding fast inference while accepting a certain risk of misspecification (see, inter alia, Eisenstat et al., 2019; Korobilis, 2019; Hauzenberger et al., 2020; Huber et al., 2020b; Korobilis and Koop, 2020).

To further illustrate the proposed methods, we estimate the low-frequency relationship between unemployment and inflation. This low-frequency measure corresponds to a long-run coefficient of distributed-lag regression models (Whiteman, 1984) and disentangles systematic co-movements from short-run fluctuations. Panels (a) to (c) in Figure 3 depict the low-frequency component obtained with our proposed approaches, while panel (d) shows estimates from a standard TVP model with a random walk evolution assumption. Starting with a comparison between the random walk/white noise mixture (
TVP-MIX) and a classic random walk TVP model, we observe a similar pattern for both approaches during tranquil periods. During recessions, however, the approaches differ significantly. Both TVP-MIX models are capable of detecting a major structural break in the low-frequency relationship after the oil crisis in the 1970s and strongly support a long-lasting stagflation period (i.e., a positive relationship between unemployment and inflation). While TVP-MIX methods are designed to quickly capture these large, abrupt breaks in parameters, a standard random walk state equation translates into a low-frequency component that only gradually adapts over time. However, the TVP-MIX model with covariate-specific indicators (MIX) is slightly more sensitive with respect to abrupt changes in parameters than the TVP-MIX MS model. Panel (c) shows the sparse scale-location mixture (TVP-POOL) approach with covariate-specific indicators (MIX). We observe that this method almost resembles a constant coefficient specification with SV. In the mid-1980s and during the financial crisis, movement in the low-frequency relationship is slightly more erratic compared to other periods, but it stays mostly constant and significant. Overall, considering TVP-MIX methods seems to improve the economic interpretability of the low-frequency component, while a TVP-POOL model aggressively pushes coefficients towards a constant evolution, which could pay off for forecasting.
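The low-frequency measure can be illustrated as the standard long-run multiplier of a distributed-lag regression; the helper below and its coefficient values are a hypothetical sketch, not the paper's exact construction (which follows Whiteman, 1984):

```python
import numpy as np

def long_run_coefficient(b_own, b_cross):
    # Long-run effect from one VAR equation: the sum of the cross-variable
    # lag coefficients divided by one minus the sum of the own-lag
    # coefficients, i.e. the level effect once short-run dynamics die out.
    return np.sum(b_cross) / (1.0 - np.sum(b_own))

b_own = np.array([0.4, 0.2])      # two own lags (p = 2, as in the application)
b_cross = np.array([-0.3, -0.1])  # two lags of the other variable (illustrative)
lr = long_run_coefficient(b_own, b_cross)  # approximately -1.0
```

With time-varying coefficients, evaluating this ratio at each t traces out the low-frequency relationship plotted in Figure 3.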
6.2. Forecasting evidence
In the forecast exercise we consider a wide range of models varying along the evolution assumption of the parameters and the information set considered. With respect to the evolution of the parameters, it proves convenient to summarize the different specifications (see Table 1). The models differ along three dimensions: the autoregressive parameters φ_t, the innovation variances Ψ_t and the state indicator matrix S_t. First, our main specifications vary between a model that assumes a binary indicator matrix on the autoregressive parameters with φ_t = S_t (labeled as TVP-MIX) and a model that introduces a hierarchical prior on the TVP part (TVP-POOL). For the latter we implicitly assume that φ_t = 0_K×K, for all t, in Equation 5. Regarding the autoregressive parameters, a natural competing model is a standard random walk assumption with φ_t = I_K, for all t (TVP-RW). Second, the models differ in the treatment of the state innovation variances. The most flexible innovation variance specification does not restrict Ψ̄₀ and Ψ̄₁ (labeled as FLEX), a second specification assumes Ψ̄₀ = κ Ψ̂ (SSVS), while the most restrictive specification fixes Ψ_t = Ψ̄, for all t, to a single variance state (SINGLE). In the empirical exercise, we fix κ at a value between zero and one and set Ψ̂ = diag(ψ̂₁, …, ψ̂_K) with ψ̂ᵢ, for i = {1, …, K}, denoting ordinary least squares (OLS) variances obtained from an AR(p) model (see Huber et al., 2019). Third, with regards to the state indicator matrix S_t, we discriminate between a joint Markov-switching indicator (labeled as MS) and covariate-specific indicators following an independent mixture distribution (MIX). Sargent and Surico (2011) and Kliem et al. (2016) suggest that a TVP-VAR framework additionally allows to account for changes in the transmission channels (time-varying coefficients) and changes in the error volatilities (SV). For further details see Appendix A.4.

[Figure 2 panels: (a) S_t = s_t I_K with s_t following an MS process; (b) elements in S_t following an independent mixture specification. Each panel shows, per equation, the lagged covariates G_t−1, C_t−1, U_t−1, F_t−1, G_t−2, C_t−2, U_t−2, F_t−2 and the intercept (ic).]

Figure 2: Posterior distribution of s_it, for i = {1, …, K}, for two small-scale TVP-MIX models. Here, G denotes output growth (GDPC1), C inflation (CPIAUCSL), U the unemployment rate (UNRATE), F refers to the interest rate (FEDFUNDS) and ic to an intercept. Moreover, the structural form in Equation 11 implies that some parameters are not part of the i-th equation (denoted by grey shaded areas), due to the strictly lower triangular structure of B_t.

Recall that for the
TVP-MIX models S_t adjusts both the autoregressive parameters and the state innovation variances, while for the TVP-POOL and TVP-RW models S_t only controls the state innovations. In the following, we define TVP-MIX, TVP-POOL and TVP-RW as the Class of the TVP model and the combination of the acronyms for the innovation variances and the indicator matrix as the Subclass of the specification. A single model is identified by a combination of all three acronyms. For example, a TVP-MIX FLEX MIX specification denotes a model with a random walk/white noise mixture for the state equation, with unrestricted two-state variances and with the elements in S_t following an independent mixture distribution.

[Figure 3 panels: (a) TVP-MIX with flexible state variances (FLEX) and S_t = s_t I_K (MS); (b) TVP-MIX with flexible state variances (FLEX) and covariate-specific indicators (MIX); (c) TVP-POOL with flexible state variances (FLEX) and covariate-specific indicators (MIX); (d) standard TVP-VAR with random walk state equation.]

Figure 3: Low-frequency relationship between the unemployment rate and inflation. The blue line refers to the posterior median, while the blue shaded area indicates the 68% posterior credible set. The red line indicates zero.

Table 1: Overview of specifications.
Class (φ_t)              Subclass    Ψ_t                              S_t                    Related to
TVP-MIX (φ_t = S_t)      FLEX MS     S_t Ψ̄₁ + (I_K − S_t) Ψ̄₀        s_t I_K
                         FLEX MIX    S_t Ψ̄₁ + (I_K − S_t) Ψ̄₀        diag(s_1t, …, s_Kt)
                         SINGLE      Ψ̄
                         SSVS MIX    S_t Ψ̄₁ + κ (I_K − S_t) Ψ̂       diag(s_1t, …, s_Kt)    Chan et al. (2012)
TVP-POOL (φ_t = 0_K×K)   FLEX MS     S_t Ψ̄₁ + (I_K − S_t) Ψ̄₀        s_t I_K
                         FLEX MIX    S_t Ψ̄₁ + (I_K − S_t) Ψ̄₀        diag(s_1t, …, s_Kt)
                         SINGLE      Ψ̄                                                      Hauzenberger et al. (2019)
                         SSVS MIX    S_t Ψ̄₁ + κ (I_K − S_t) Ψ̂       diag(s_1t, …, s_Kt)
TVP-RW (φ_t = I_K)       FLEX MS     S_t Ψ̄₁ + (I_K − S_t) Ψ̄₀        s_t I_K
                         FLEX MIX    S_t Ψ̄₁ + (I_K − S_t) Ψ̄₀        diag(s_1t, …, s_Kt)
                         SINGLE      Ψ̄                                                      Standard TVP-RW
                         SSVS MIX    S_t Ψ̄₁ + κ (I_K − S_t) Ψ̂       diag(s_1t, …, s_Kt)    e.g. Huber et al. (2019)

All these TVP models feature a Normal-Gamma prior (Griffin and Brown, 2010) on α̂. We compare our methods to two constant parameter models. One variant features a Normal-Gamma prior (const. (NG)), while the second variant assumes a Minnesota prior (const. (MIN)). We consider a non-conjugate Minnesota prior, capturing the notion that own lags are more important than lags from other variables (Doan et al., 1984; Litterman, 1986). We estimate this set of models for three information sets (
FA-VAR, L-VAR and S-VAR), each featuring a different number of endogenous variables. Every considered specification features two lags and SV. To assess one-quarter-, one-year- and two-year-ahead predictions, we treat observations ranging from 1962:Q1 to 1999:Q4 as an initial sample and the periods from 2000:Q1 to 2019:Q4 as a hold-out sample. The initial sample is then recursively expanded until the penultimate quarter (2019:Q3) is reached. For each forecast comparison, a small-scale Minnesota VAR with constant parameters (S-VAR const. (MIN)) serves as our benchmark. In the following, Table 2 shows the best performing models for point and density forecasts, providing a tractable summary of Table 3 and Table 4. Table 3 depicts root mean squared error (RMSE) ratios as point forecast measures and Table 4 the log predictive Bayes factors (LPBFs) as density forecast metrics. The best performing models within each column are indicated by bold numbers. In Table B.1 we provide additional results on continuous ranked probability score (CRPS) ratios. This alternative density forecast measure is more robust to outliers than log predictive scores (Gneiting and Raftery, 2007). With three different measures at three different horizons we obtain a comprehensive picture to evaluate our methods jointly and marginally along the four target variables. Note that with a single-state variance (Ψ_t = Ψ̂), α̂ collapses to a 2K-dimensional vector (see Bitto and Frühwirth-Schnatter, 2019). A constant coefficient model can be obtained by either setting α̃ = 0 or setting {Ψ_t}_{t=1}^T ≈ 0_K×K.

Table 2:
Overview of the best performing models, indicated by bold numbers in Table 3 and Table 4.

                1-quarter-ahead                 1-year-ahead                    2-years-ahead
Variable        Size    Class     Subclass      Size    Class     Subclass      Size    Class     Subclass

Point forecasts (RMSE ratios)
TOT             L-VAR   TVP-POOL  SINGLE        L-VAR   TVP-POOL  SINGLE        FA-VAR  TVP-POOL  SINGLE
GDPC1           L-VAR   TVP-POOL  FLEX MIX      FA-VAR  TVP-POOL  FLEX MS       FA-VAR  TVP-MIX   FLEX MIX
CPIAUCSL        S-VAR   TVP-MIX   FLEX MIX      S-VAR   TVP-RW    FLEX MIX      L-VAR   TVP-RW    SINGLE
UNRATE          L-VAR   TVP-POOL  SSVS MIX      L-VAR   TVP-POOL  SINGLE        FA-VAR  TVP-POOL  SSVS MIX
FEDFUNDS        FA-VAR  TVP-RW    SSVS MIX      FA-VAR  TVP-RW    SSVS MIX      FA-VAR  TVP-POOL  SINGLE

Density forecasts (LPBFs)
TOT             L-VAR   TVP-POOL  FLEX MIX      L-VAR   TVP-POOL  SSVS MIX      L-VAR   const. (NG)
GDPC1           L-VAR   TVP-POOL  FLEX MS       FA-VAR  TVP-POOL  SSVS MIX      FA-VAR  TVP-POOL  SINGLE
CPIAUCSL        S-VAR   TVP-MIX   SSVS MIX      L-VAR   const. (NG)             L-VAR   const. (NG)
UNRATE          L-VAR   TVP-POOL  SSVS MIX      L-VAR   TVP-POOL  SSVS MIX      L-VAR   TVP-MIX   SSVS MIX
FEDFUNDS        L-VAR   TVP-POOL  SSVS MIX      FA-VAR  TVP-RW    FLEX MIX      FA-VAR  const. (Min)
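The point and density metrics reported in Tables 3 and 4 can be sketched as follows; the error and log-score arrays are hypothetical placeholders for the quantities produced by the recursive forecast exercise:

```python
import numpy as np

def rmse_ratio(err_model, err_bench):
    # Ratio of root mean squared errors; values below one mean the model
    # beats the benchmark in point-forecast terms (as in Table 3).
    return np.sqrt(np.mean(err_model ** 2)) / np.sqrt(np.mean(err_bench ** 2))

def cumulative_lpbf(logscore_model, logscore_bench):
    # Running sum of log predictive score differences; values above zero
    # favour the model over the benchmark (as plotted in Figure 4).
    return np.cumsum(logscore_model - logscore_bench)

# Hypothetical hold-out forecast errors and log predictive scores.
err_m = np.array([0.5, -0.2, 0.1])
err_b = np.array([0.6, -0.4, 0.2])
ratio = rmse_ratio(err_m, err_b)
lpbf = cumulative_lpbf(np.array([-1.0, -0.8, -1.1]),
                       np.array([-1.2, -0.9, -1.3]))
```

The final element of the cumulative LPBF series corresponds to the full-sample log predictive Bayes factor against the benchmark.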
Table 2 summarizes the main findings of our forecast exercise.
First, larger-scale models (FA-VAR, L-VAR) generally outperform the small-scale specifications across horizon-variable combinations, indicating that an increasing amount of information pays off for forecasting (see Bańbura et al., 2010). One exception is inflation. For inflation, flexible S-VARs yield more accurate forecasts than FA-VARs and L-VARs for one-quarter- and one-year-ahead point forecasts and one-quarter-ahead density forecasts. Comparing FA-VARs with L-VARs, the results are mixed. One pattern worth noting is that L-VARs tend to outperform FA-VARs for the one-quarter-ahead horizon, while the picture reverses for higher-order forecasts. Second, with respect to parameter changes we see that the TVP-POOL specifications forecast particularly well across all horizons and target variables. These models substantially improve upon a wide range of benchmarks. Overall, Table 2 shows that all TVP classes that provide accurate point predictions generally also perform well in terms of density forecasts.

Table 3: Point forecast performance (RMSE ratios) relative to the benchmark (const. (Min.)). The red shaded row denotes the benchmark (and its RMSE values). Asterisks indicate statistical significance for each model relative to const. (Min.) at the 1 (***), 5 (**) and 10 (*) percent significance levels.

Specification 1-quarter-ahead 1-year-ahead 2-years-ahead
Class Subclass TOT GDPC1 CPIAUCSL UNRATE FEDFUNDS TOT GDPC1 CPIAUCSL UNRATE FEDFUNDS TOT GDPC1 CPIAUCSL UNRATE FEDFUNDS
FA-VAR const. (Min.) 0.91* 0.87 0.94 0.79 1.22 0.92** 0.84* 1.03 0.80 1.11 0.93* 1.00 1.04 0.86 0.78***
const. (NG) 0.88** 0.81* 0.95 0.80 1.14 0.90** 0.84** 0.99 0.81 1.02 0.92 1.00 1.01 0.87 0.75**
TVP-MIX FLEX MIX 0.89** 0.80* 0.97 0.81 0.91 0.94* 0.92 0.98 0.88 0.98 1.01
SSVS MIX 0.89** 0.81* 0.96 0.76 1.06 0.88** 0.81** 1.00 0.77 0.98 0.89 0.98 1.01
L-VAR const. (Min.) 1.02 0.99 1.05 0.72 1.52 0.92** 0.93 0.95* 0.79 1.00 0.97** 1.02 0.98 0.95 0.88**
const. (NG) 0.91** 0.85 0.96 0.71 1.10 0.88*** 0.89* 0.92** 0.74 0.98 0.92** 1.00 0.97 0.89 0.79**
TVP-MIX FLEX MIX 1.04 0.90 1.14 0.71 2.19 0.91* 0.82 1.02 0.81 1.09 0.92 0.93 0.99 0.89 0.90
FLEX MS 1.06 0.92 1.17 0.72 1.51 0.90* 0.86 0.95 0.79 1.08 1.22 1.56 1.30 0.90 1.13
SINGLE 0.92* 0.86 0.98 0.74 0.90** 0.87* 0.82* 0.94 0.83 0.87* 0.92 0.94 0.97 0.91 0.86
SSVS MIX 0.96 0.98 0.94 0.72 1.20 0.90 0.86 0.98 0.78 1.00 0.95 0.96 1.06 0.91 0.89
TVP-POOL FLEX MIX 0.88***
S-VAR const. (Min.) 0.60 0.83 0.85 0.15 0.11 0.76 1.00 0.87 0.62 0.41 0.91 0.93 0.85 1.10 0.73
const. (NG) 0.99** 0.99* 0.99 1.00 1.01 0.96** 0.94* 0.98 0.98* 0.98 0.97 0.99 0.98 0.97** 0.95
TVP-MIX FLEX MIX 0.93** 0.94*

Table 4: Density forecast performance (LPBFs) relative to the benchmark (const. (Min.)). The red shaded row denotes the benchmark (and its LPS values). Asterisks indicate statistical significance for each model relative to const. (Min.) at the 1 (***), 5 (**) and 10 (*) percent significance levels.

Specification 1-quarter-ahead 1-year-ahead 2-years-ahead
Class Subclass TOT GDPC1 CPIAUCSL UNRATE FEDFUNDS TOT GDPC1 CPIAUCSL UNRATE FEDFUNDS TOT GDPC1 CPIAUCSL UNRATE FEDFUNDS
FA-VAR const. (Min.) 11.35 12.01** 2.23 8.56 -4.19 14.59 8.30** 2.02 14.03 8.69 37.98** 2.12 5.63 12.49
const. (NG) 18.88 13.48*** 4.19 8.66 0.46 25.90*** 9.01*** 2.34 14.21 10.93 40.82*** 3.08 7.01 12.22 28.66***
TVP-MIX FLEX MIX 23.23 12.71** 2.78 11.15 12.95*** 39.12** 7.60 2.33 26.49 16.00 50.99 3.80 3.97 23.44 13.09
FLEX MS 29.37 11.74** 5.32 13.84 9.59* 49.54** 7.49 2.32 27.51 17.64 41.96 0.10 3.65 22.67 13.22
SINGLE 24.87 11.07** 2.24 10.12 17.01*** 52.21** 6.25 0.25 25.16 24.39 55.19 0.70 3.17 22.70 18.79
SSVS MIX 25.93 13.23** 2.59 10.60 6.38* 40.29* 7.92* 1.90 23.53 10.64 40.57 3.38 1.82 25.83 7.45
TVP-POOL FLEX MIX 26.23 13.49*** 3.44 10.51 7.28*** 39.69*** 10.63** 2.06 19.71 15.86 53.01** 3.61 6.18 22.98 28.67*
FLEX MS 28.06 14.49*** 3.98* 10.24 7.62*** 34.75** 10.85*** 2.71 17.29 14.71 52.55** 4.07 6.27 21.13 28.34*
SINGLE 25.63 14.42*** 2.98 10.61 6.20*** 34.67** 10.54*** 2.21 18.32 16.06 46.04**
L-VAR const. (Min.) 15.05 13.32* -0.29 12.15 -3.68*** 56.73** 2.78 3.22*** 32.41 11.78*** 61.59*** -3.86 3.18 15.52 18.28**
const. (NG) 28.70 15.97*** 1.06 14.87 2.34 70.27*** 7.40**
S-VAR const. (Min.) -22.11 -82.64 -80.74 45.02 86.64 -256.48 -97.26 -83.70 -69.04 -29.94 -383.84 -92.27 -89.84 -125.53 -87.85
const. (NG) 4.55 1.97* 0.71 0.62* 1.39*** 4.97 2.64* 1.95* 0.03 2.30 9.15 0.58 3.07 2.99 1.37
TVP-MIX FLEX MIX 24.41 3.97* 5.52 2.90 12.18*** 15.60 9.64** 3.98 8.50 5.03 -3.76 3.00 3.74 -0.02 -7.71
FLEX MS 18.00 2.66 3.57 6.42 5.86** 9.48 5.25* -0.44 12.27 -0.22 0.77 -1.60 0.83 10.35 -11.26
SINGLE 7.18** 2.45 -1.98 0.85 13.26*** 12.51 5.42 3.33 6.38 7.12* -0.80 -2.74 1.19 2.14 -5.03
SSVS MIX 23.21 3.44*

When examining Table 3 and Table 4 in greater detail, note that a large number of models shown in the tables outperform the Minnesota benchmark in terms of RMSEs (indicated by ratios below one) and in terms of LPBFs (indicated by values above zero). However, the benchmark is a tough competitor when predicting inflation and for higher-order point forecasts. When focussing on the differences arising through the varying treatment of parameter evolutions, our proposed methods, the TVP-POOL and
TVP-MIX specifications, show that their good performance is mainly driven by improved forecast accuracy for output growth and unemployment. In terms of the innovation variance assumption for these specifications, we observe that additional flexibility tends to improve density forecast performance and yields accurate point forecasts. For the TVP-POOL models this higher degree of flexibility generally pays off across variables and model sizes. For flexible TVP-MIX specifications, forecast ability tends to improve for S-VARs and FA-VARs and is competitive for the L-VARs. Especially the TVP-MIX SSVS MIX and TVP-MIX FLEX MIX models using a small information set yield quite accurate inflation forecasts, being the best performing models for the one-quarter-ahead horizon. Across variables, a notable exception is the interest rate for
TVP-MIX models. Here, a TVP-MIX SINGLE specification is superior to models assuming a mixture on the innovation volatilities. When assessing random walk state equation (TVP-RW) specifications across the information sets, two things are worth noting. First, a standard TVP model with a random walk assumption (TVP-RW SINGLE) is only competitive for one-year- and two-year-ahead forecasts and otherwise forecasts poorly. Second, more flexible TVP-RW variants produce quite accurate forecasts for FA-VARs. Constant parameter models with a Normal-Gamma prior show reasonable forecasts for L-VARs (especially for inflation), but lack flexibility in smaller-scale models. This observation is in line with the fact that in larger-scale models, time variation in coefficients vanishes (see Huber et al., 2020b). However, a few parameter instabilities might still be present since, apart from some exceptions, our methods provide improvements when compared to constant coefficient models. To illustrate the forecast performance over time, Figure 4 depicts the evolution of cumulated joint LPBFs relative to our benchmark. Overall we find that, for all four target variables jointly, our proposed methods never forecast poorly. Both
TVP-MIX (black lines) and TVP-POOL (green lines) methods outperform a standard TVP model with a random walk state equation (TVP-RW SINGLE, denoted by the red solid line) across information sets (one exception is the TVP-MIX SINGLE model for the S-VAR). Moreover, allowing for occasional parameter changes during and in the aftermath of the financial crisis tends to increase predictive ability.

[Figure 4 panels: (a) FA-VAR; (b) L-VAR; (c) S-VAR. Legend: const. (Min), const. (NG), TVP-MIX, TVP-POOL, TVP-RW, FLEX MIX, FLEX MS, SINGLE, SSVS MIX.]

Figure 4: Evolution of one-quarter-ahead total cumulative LPBFs relative to the benchmark. The gray dashed lines refer to the maximum/minimum Bayes factor over the full hold-out sample. The light gray shaded areas indicate the NBER recessions in the US.

A few points are worth discussing in greater detail. First, the beginning of the hold-out sample is characterized by the early 2000s recession. Although this was a quite short crisis, it already leads to quite diverse model performance across information sets. During this episode and its consecutive three years, L-VARs strictly dominate the other two information sets (
FA-VARs and S-VARs). This implies that, for any TVP evolution assumption, the large-scale model outperforms its smaller-scale counterparts. Moreover, during the financial crisis, we observe a substantial increase in LPBFs for a wide range of FA-VARs and L-VARs, while for S-VARs we see similar improvements solely for some TVP-VARs. This feature might indicate that TVPs are capable of mitigating a potential omitted variable bias (Huber et al., 2020b). Second, within each information set, performance across parameter evolution assumptions is mixed. Evidently, the performance of the four L-VARs featuring a TVP-POOL specification stands out (depicted by green lines). In more tranquil periods they show constant improvements and yield substantial predictive gains during the financial crisis. Especially in the aftermath of the Great Recession, the LPBFs steadily increase compared to other large TVP-VARs. This episode also includes a time characterized by a (sluggish) recovery of the US economy after the financial crisis and the interest rate hitting the zero lower bound. With monetary policy shifting towards unconventional measures, it might not only pay off to include financial market variables, such as longer-term yields, but also to allow for occasional changes in the transmission channels of these variables. Moreover, it is worth noting that the four TVP-POOL variants tend to perform almost identically until 2010, while afterwards slight performance differences become evident. Hauzenberger et al. (2019) have made a similar observation when varying the hyperparameters of their conjugate prior on the state equation. Third,
TVP-MIX methods generally forecast well for S-VARs and FA-VARs, while for L-VARs only the TVP-MIX SINGLE specification yields substantial gains. In particular for FA-VARs and S-VARs, flexible variance modelling (TVP-MIX FLEX MIX and TVP-MIX SSVS MIX) generally pays off. For L-VARs these two models also yield reasonable forecast accuracy, while the TVP-MIX MS forfeits forecast accuracy. Thus, for a large-scale model, the assumption that a joint indicator governs the evolution of a large number of coefficients might be less appropriate. Moreover, comparing TVP-MIX with TVP-RW specifications reveals that the random walk/white noise mixture yields gains in tranquil periods for larger-scale models (FA-VARs and L-VARs) and does particularly well in recessions for the small information set (S-VAR). Especially for small-scale VARs, the TVP-MIX variants, featuring a mixture distribution on the state innovation volatilities, greatly improve predictive performance relative to TVP-RW models during the financial crisis. Moreover, it is worth noting that the TVP-RW SINGLE model forecasts poorly for larger-scale models (FA-VARs and L-VARs) during the tranquil periods previous to the financial crisis, while its performance slightly recovers in the middle of the Great Recession. A plausible explanation for this pattern might be that spurious movements in coefficients lead to overfitting, widening the predictive density of the TVP-RW SINGLE model. This harms predictive accuracy in stable times, while it is to some extent helpful in times of turmoil (periods characterized by large outliers). In contrast to the TVP-RW SINGLE, the other three flexible TVP-RW variants forecast particularly well, suggesting that flexible state innovation volatilities greatly increase the precision of the TVP estimates. In particular for FA-VARs, these models show improved forecast accuracy after the Great Recession.
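The expanding-window design underlying the evaluation above (an initial sample through 1999:Q4, recursively expanded by one quarter, with horizons of one, four and eight quarters) can be sketched as follows; `fit` and `forecast` are hypothetical placeholders for model estimation and prediction:

```python
import numpy as np

def fit(train):
    # Placeholder "model": the column means of the estimation sample.
    return train.mean(axis=0)

def forecast(model, h):
    # Placeholder h-step prediction: a constant-mean forecast.
    return model

def recursive_forecasts(data, first_holdout, horizons=(1, 4, 8)):
    # Expanding-window design: estimate on data[:t], predict h steps
    # ahead, then grow the estimation sample by one quarter.
    results = []
    for t in range(first_holdout, data.shape[0]):
        model = fit(data[:t])
        results.append({h: forecast(model, h) for h in horizons})
    return results

data = np.random.default_rng(1).normal(size=(40, 3))  # toy quarterly panel
res = recursive_forecasts(data, first_holdout=30)
```

Each element of `res` corresponds to one forecast origin in the hold-out sample; replacing the placeholders with actual TVP-VAR estimation and predictive simulation yields the forecasts scored in Tables 3 and 4.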
7. CLOSING REMARKS
It is empirically well documented that macroeconomic time series feature instabilities in the parameters and innovation volatilities. In the literature there is strong agreement that stochastic volatility is important, while, especially in larger-scale models, there is less consensus on time-varying coefficients. As the amount of information increases, overall time variation in parameters tends to diminish, but it might still be present at a few points in time for some parameters. Detecting such occasional changes is challenging and requires highly flexible modeling techniques. To achieve such flexibility we introduce mixture priors on the time-varying part of the parameters. By additionally using hierarchical shrinkage priors on the dynamic state variances, these methods are capable of imposing dynamic sparsity, as well as capturing a wide range of parameter changes. In a simulation study we show that our methods detect both sudden and gradual changes in parameters. In an empirical exercise we find that some coefficients tend to change abruptly in times of turmoil. Moreover, all proposed approaches forecast well. Even for large VARs, flexible mixture priors improve forecast accuracy upon a wide range of benchmarks, suggesting that capturing these infrequent instabilities pays off.

REFERENCES
Aastveit KA, Carriero A, Clark TE, and Marcellino M (2017), "Have standard VARs remained stable since the crisis?" Journal of Applied Econometrics (5), 931–951.
Bańbura M, Giannone D, and Reichlin L (2010), "Large Bayesian vector auto regressions," Journal of Applied Econometrics (1), 71–92.
Bauwens L, Koop G, Korobilis D, and Rombouts JV (2015), "The contribution of structural break models to forecasting macroeconomic series," Journal of Applied Econometrics (4), 596–620.
Belmonte MA, Koop G, and Korobilis D (2014), "Hierarchical shrinkage in time-varying parameter models," Journal of Forecasting (1), 80–94.
Bhattacharya A, Chakraborty A, and Mallick BK (2016), "Fast sampling with Gaussian scale mixture priors in high-dimensional regression," Biometrika, asw042.
Bitto A, and Frühwirth-Schnatter S (2019), "Achieving shrinkage in a time-varying parameter model framework," Journal of Econometrics (1), 75–97.
Cadonna A, Frühwirth-Schnatter S, and Knaus P (2020), "Triple the gamma – A unifying shrinkage prior for variance and variable selection in sparse state space and TVP models," Econometrics (2), 20.
Carriero A, Clark TE, and Marcellino M (2019), "Large Bayesian vector autoregressions with stochastic volatility and non-conjugate priors," Journal of Econometrics (1), 137–154.
Carvalho CM, Polson NG, and Scott JG (2010), "The horseshoe estimator for sparse signals," Biometrika (2), 465–480.
Chan JC, and Jeliazkov I (2009), "Efficient simulation and integrated likelihood estimation in state space models," International Journal of Mathematical Modelling and Numerical Optimisation (1-2), 101–120.
Chan JC, Koop G, Leon-Gonzalez R, and Strachan RW (2012), "Time varying dimension models," Journal of Business & Economic Statistics (3), 358–367.
Chan JC, and Strachan RW (2020), "Bayesian State Space Models in Macroeconometrics."
Clark T (2011), "Real-time density forecasts from BVARs with stochastic volatility," Journal of Business & Economic Statistics, 327–341.
Cogley T, and Sargent TJ (2005), "Drifts and volatilities: monetary policies and outcomes in the post WWII US," Review of Economic Dynamics (2), 262–302.
Cross JL, Hou C, and Poon A (2020), "Macroeconomic forecasting with large Bayesian VARs: Global-local priors and the illusion of sparsity," International Journal of Forecasting.
D'Agostino A, Gambetti L, and Giannone D (2013), "Macroeconomic forecasting and structural change," Journal of Applied Econometrics (1), 82–101.
Doan T, Litterman R, and Sims C (1984), "Forecasting and conditional projection using realistic prior distributions," Econometric Reviews (1), 1–100.
Eickmeier S, Lemke W, and Marcellino M (2015), "Classical time varying factor-augmented vector auto-regressive models – estimation, forecasting and structural analysis," Journal of the Royal Statistical Society: Series A (Statistics in Society) (3), 493–533.
Eisenstat E, Chan J, and Strachan R (2019), "Reducing Dimensions in a Large TVP-VAR," Technical report.
Feldkircher M, Huber F, and Kastner G (2017), "Sophisticated and small versus simple and sizeable: When does it pay off to introduce drifting coefficients in Bayesian VARs?" arXiv preprint arXiv:1711.00564.
Frühwirth-Schnatter S (2001), "Markov chain Monte Carlo estimation of classical and dynamic switching and mixture models," Journal of the American Statistical Association (453), 194–209.
Frühwirth-Schnatter S, and Wagner H (2010), "Stochastic model specification search for Gaussian and partial non-Gaussian state space models," Journal of Econometrics (1), 85–100.
George EI, and McCulloch RE (1993), "Variable selection via Gibbs sampling," Journal of the American Statistical Association (423), 881–889.
——— (1997), "Approaches for Bayesian variable selection," Statistica Sinica.
Gerlach R, Carter C, and Kohn R (2000), "Efficient Bayesian inference for dynamic mixture models," Journal of the American Statistical Association (451), 819–828.
Giordani P, and Kohn R (2008), "Efficient Bayesian inference for multiple change-point and mixture innovation models," Journal of Business & Economic Statistics (1), 66–77.
Gneiting T, and Raftery AE (2007), "Strictly proper scoring rules, prediction, and estimation," Journal of the American Statistical Association (477), 359–378.
Griffin J, and Brown P (2010), "Inference with normal-gamma prior distributions in regression problems," Bayesian Analysis (1), 171–188.
Groen JJ, Paap R, and Ravazzolo F (2013), "Real-time inflation forecasting in a changing world," Journal of Business & Economic Statistics (1), 29–44.
Hauzenberger N, Huber F, and Koop G (2020), "Dynamic Shrinkage Priors for Large Time-varying Parameter Regressions using Scalable Markov Chain Monte Carlo Methods," arXiv preprint arXiv:2005.03906.
Hauzenberger N, Huber F, Koop G, and Onorante L (2019), "Fast and Flexible Bayesian Inference in Time-varying Parameter Regression Models," arXiv preprint arXiv:1910.10779.
Huber F, and Feldkircher M (2019), "Adaptive shrinkage in Bayesian vector autoregressive models," Journal of Business & Economic Statistics (1), 27–39.
Huber F, Kastner G, and Feldkircher M (2019), "Should I stay or should I go? A latent threshold approach to large-scale mixture innovation models," Journal of Applied Econometrics (5), 621–640.
Huber F, Koop G, and Onorante L (2020a), "Inducing sparsity and shrinkage in time-varying parameter models," Journal of Business & Economic Statistics (just-accepted), 1–48.
Huber F, Koop G, and Pfarrhofer M (2020b), "Bayesian Inference in High-Dimensional Time-varying Parameter Models using Integrated Rotated Gaussian Approximations," arXiv preprint arXiv:2002.10274.
Kalli M, and Griffin J (2014), "Time-varying sparsity in dynamic regression models," Journal of Econometrics (2), 779–793.
Kastner G (2016), "Dealing with stochastic volatility in time series using the R package stochvol," Journal of Statistical Software (5), 1–30.
Kastner G, and Frühwirth-Schnatter S (2014), "Ancillarity-sufficiency interweaving strategy (ASIS) for boosting MCMC estimation of stochastic volatility models," Computational Statistics & Data Analysis, 408–423.
Kastner G, and Huber F (2020), "Sparse Bayesian Vector Autoregressions in Huge Dimensions," Journal of Forecasting, forthcoming.
Kim CJ, and Nelson CR (1999), "Has the US economy become more stable? A Bayesian approach based on a Markov-switching model of the business cycle," Review of Economics and Statistics (4), 608–616.
Kliem M, Kriwoluzky A, and Sarferaz S (2016), "On the Low-Frequency Relationship Between Public Deficits and Inflation," Journal of Applied Econometrics (3), 566–583.
Koop G, and Korobilis D (2012), "Forecasting inflation using dynamic model averaging," International Economic Review (3), 867–886.
——— (2013), "Large time-varying parameter VARs," Journal of Econometrics (2), 185–198.
Koop G, Leon-Gonzalez R, and Strachan RW (2009), "On the evolution of the monetary policy transmission mechanism," Journal of Economic Dynamics and Control (4), 997–1017.
Koop G, and Potter SM (2007), "Estimation and forecasting in models with multiple breaks," The Review of Economic Studies (3), 763–789.
Koop GM (2013), "Forecasting with medium and large Bayesian VARs," Journal of Applied Econometrics (2), 177–203.
Korobilis D (2013), "Assessing the transmission of monetary policy using time-varying parameter dynamic factor models," Oxford Bulletin of Economics and Statistics (2), 157–179.
——— (2019), "High-dimensional macroeconomic forecasting using message passing algorithms," Journal of Business & Economic Statistics.
Korobilis D, and Koop G (2020), "Bayesian dynamic variable selection in high dimensions."
Kowal DR, Matteson DS, and Ruppert D (2019), "Dynamic shrinkage processes," Journal of the Royal Statistical Society: Series B (Statistical Methodology).
Litterman RB (1986), "Forecasting with Bayesian vector autoregressions – five years of experience," Journal of Business & Economic Statistics (1), 25–38.
Lopes HF, McCulloch RE, and Tsay RS (2016), "Parsimony inducing priors for large scale state-space models," Bayesian Analysis.
Malsiner-Walli G, Frühwirth-Schnatter S, and Grün B (2016), "Model-based clustering based on sparse finite Gaussian mixtures," Statistics and Computing (1-2), 303–324.
McCausland WJ, Miller S, and Pelletier D (2011), "Simulation smoothing for state-space models: A computational efficiency analysis," Computational Statistics & Data Analysis (1), 199–212.
McCracken MW, and Ng S (2016), "FRED-MD: A monthly database for macroeconomic research," Journal of Business & Economic Statistics (4), 574–589.
Mumtaz H, and Theodoridis K (2018), "The changing transmission of uncertainty shocks in the US," Journal of Business & Economic Statistics (2), 239–252.
Nakajima J, and West M (2013), "Bayesian analysis of latent threshold dynamic models," Journal of Business & Economic Statistics (2), 151–164.
Ng S, and Wright JH (2013), "Facts and challenges from the great recession for forecasting and macroeconomic modeling," Journal of Economic Literature (4), 1120–54.
Park T, and Casella G (2008), "The Bayesian Lasso," Journal of the American Statistical Association (482), 681–686.
Paul P (2019), "The time-varying effect of monetary policy on asset prices," Review of Economics and Statistics.
Pfarrhofer M (2020), "Forecasts with Bayesian vector autoregressions under real time conditions," arXiv preprint arXiv:2004.04984.
Polson NG, and Scott JG (2010), "Shrink globally, act locally: Sparse Bayesian regularization and prediction," Bayesian Statistics, 501–538.
Primiceri G (2005), "Time varying structural autoregressions and monetary policy," The Review of Economic Studies (3), 821–852.
Rockova V, and McAlinn K (2018), "Dynamic variable selection with spike-and-slab process priors," Bayesian Analysis.
Sargent TJ, and Surico P (2011), "Two illustrations of the quantity theory of money: Breakdowns and revivals," American Economic Review (1), 109–28.
Sims CA, and Zha T (2006), "Were there regime switches in US monetary policy?" American Economic Review (1), 54–81.
Stock JH, and Watson MW (2012), "Disentangling the Channels of the 2007-2009 Recession," Technical report, National Bureau of Economic Research.
Uribe PV, and Lopes HF (2017), "Dynamic sparsity on dynamic regression models," Manuscript, available at http://hedibert.org/wp-content/uploads/2018/06/uribe-lopes-Sep2017.pdf.
Whiteman CH (1984), "Lucas on the quantity theory: Hypothesis testing without theory," The American Economic Review (4), 742–749.
Yau C, and Holmes C (2011), "Hierarchical Bayesian nonparametric mixture models for clustering with variable relevance determination," Bayesian Analysis (Online) (2), 329.
Zellner A (1986), "On Assessing Prior Distributions and Bayesian Regression Analysis with g Prior Distributions," in Goel, P.; Zellner, A. (eds.), Bayesian Inference and Decision Techniques: Essays in Honor of Bruno de Finetti. Studies in Bayesian Econometrics and Statistics 6.

TECHNICAL APPENDIX

A.1. Stochastic volatility specification
A stochastic volatility specification assumes that $h_t = \log(\sigma_t^2)$ follows an AR(1) process:
$$h_t = \mu_h + \phi_h (h_{t-1} - \mu_h) + \vartheta_t, \qquad \vartheta_t \sim \mathcal{N}(0, \psi_h). \tag{A.1}$$
Following Kastner and Frühwirth-Schnatter (2014), we assume a Gaussian prior on the initial state, $h_0 \sim \mathcal{N}\big(\mu_h, \tfrac{\psi_h}{1-\phi_h^2}\big)$, a Gaussian prior on the unconditional mean $\mu_h$, a Beta prior on the transformed persistence parameter, $\tfrac{\phi_h + 1}{2} \sim \mathcal{B}(25, 0.5)$, and a Gamma prior on the state variance, $\psi_h \sim \mathcal{G}(1/2, 1/2)$. This prior setup places considerable mass on values of $\phi_h$ close to one and thus pushes the specification towards a random walk.
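To build intuition for the law of motion in Equation (A.1), the latent log-volatility process can be simulated directly. A small illustrative sketch (the parameter values are ours, chosen for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_sv(T, mu_h=-1.0, phi_h=0.95, psi_h=0.04):
    """Simulate h_t = mu_h + phi_h * (h_{t-1} - mu_h) + theta_t,
    theta_t ~ N(0, psi_h), with h_0 drawn from the stationary distribution.
    Returns the log-variances h and the implied variances exp(h)."""
    h = np.empty(T)
    h_prev = mu_h + np.sqrt(psi_h / (1.0 - phi_h**2)) * rng.standard_normal()
    for t in range(T):
        h[t] = mu_h + phi_h * (h_prev - mu_h) + np.sqrt(psi_h) * rng.standard_normal()
        h_prev = h[t]
    return h, np.exp(h)
```

As $\phi_h \to 1$ the simulated paths become increasingly persistent, illustrating the random-walk limit mentioned above.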
A.2. The Normal-Gamma prior (Griffin and Brown, 2010)
Similar to Bitto and Frühwirth-Schnatter (2019), we introduce class-specific global shrinkage parameters, differentiating between the constant part of the coefficients (labeled $\lambda_a$) and the regime-switching variances (labeled $\lambda_{\psi_0}$ and $\lambda_{\psi_1}$, respectively). In the following, we specify $\tau_j \mid \lambda_j \sim \mathcal{G}(\varrho_j, \varrho_j \lambda_j / 2)$ and $\lambda_j \sim \mathcal{G}(\zeta, \zeta)$, with $\lambda_j = \lambda_k$ and $\varrho_j = \varrho_k$ if $j \in \mathcal{P}_k$ for $k \in \{a, \psi_0, \psi_1\}$. $\mathcal{P}_k$ denotes a classifier (i.e., it defines the set of coefficients belonging to the $k$th group). In the following, $\mathcal{P}_a = \{j : \hat{\alpha}_j \in \alpha\}$, $\mathcal{P}_{\psi_0} = \big\{j : \hat{\alpha}_j \in \{\sqrt{\bar{\psi}_{0i}}\}_{i=1}^{K}\big\}$, and $\mathcal{P}_{\psi_1} = \big\{j : \hat{\alpha}_j \in \{\sqrt{\bar{\psi}_{1i}}\}_{i=1}^{K}\big\}$. Moreover, we learn the hyperparameter $\varrho_k$ in a fully Bayesian fashion and fix $\zeta$ at a small value.
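The hierarchy above can be illustrated by simulating from it. A brief sketch under illustrative hyperparameter values (the values of `rho` and `zeta` below are ours, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(1)

def draw_ng_scalings(n, rho=0.5, zeta=0.5):
    """Draw from the Normal-Gamma hierarchy:
    lambda ~ G(zeta, zeta) and tau_j | lambda ~ G(rho, rho * lambda / 2).
    numpy's gamma uses a scale parameterization, so rate r maps to scale 1/r."""
    lam = rng.gamma(shape=zeta, scale=1.0 / zeta)
    tau = rng.gamma(shape=rho, scale=2.0 / (rho * lam), size=n)
    return lam, tau
```

Small values of `rho` concentrate the local scalings near zero while keeping heavy tails, which is the mechanism that shrinks small parameter changes towards zero.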
A.3. Detailed MCMC algorithm
In this section, we provide details on each sampling step of the MCMC algorithm and on the full conditional posterior distributions. After defining appropriate starting values, we iterate through the following steps 20,000 times and discard the first 10,000 draws as burn-in:

1. The sampling steps (and conditional posteriors) for $\hat{\alpha}$, $\lambda_k$, $\tau_j$, for $k \in \{a, \psi_0, \psi_1\}$ and $j = 1, \dots, K$, and $\varrho_k$ are of standard form (Griffin and Brown, 2010):
   (a) Draw $\hat{\alpha}$ from a multivariate Gaussian distribution:
   $$\hat{\alpha} \mid y, \hat{X}, \Sigma, \{\tau_j\}_{j=1}^{K} \sim \mathcal{N}\big(\hat{a}, \hat{V}\big).$$
   Here, $\hat{X}$ is a $T \times K$-dimensional matrix with $\hat{x}_t'$ in the $t$th row and
   $$\hat{V}^{-1} = (\Sigma^{-1/2}\hat{X})'(\Sigma^{-1/2}\hat{X}) + \mathrm{diag}(\tau_1^{-1}, \dots, \tau_K^{-1}), \qquad \hat{a} = \hat{V}\big((\Sigma^{-1/2}\hat{X})'(\Sigma^{-1/2}y)\big).$$
   (b) Sample the local shrinkage scalings $\{\tau_j\}_{j=1}^{K}$ from a generalized inverse Gaussian (GIG) distribution (Griffin and Brown, 2010):
   $$\tau_j \mid \hat{\alpha}_j, \lambda_j, \varrho_j \sim \mathcal{GIG}\Big(\varrho_j - \frac{1}{2},\ \varrho_j \lambda_j,\ \hat{\alpha}_j^2\Big), \quad j = 1, \dots, K.$$
   The GIG$(a, b, c)$ distribution is parameterized as $p(x) \propto x^{a-1}\exp\{-(bx + c/x)/2\}$. Here, $\lambda_j = \lambda_k$ and $\varrho_j = \varrho_k$ if $j \in \mathcal{P}_k$ with $k \in \{a, \psi_0, \psi_1\}$.
   (c) Sample the associated global shrinkage parameter $\lambda_k$, for $k \in \{a, \psi_0, \psi_1\}$, from a Gamma distribution:
   $$\lambda_k \mid \{\tau_j\}_{j \in \mathcal{P}_k}, \varrho_k \sim \mathcal{G}\Big(\zeta + \varrho_k p_k,\ \zeta + \frac{\varrho_k}{2}\sum_{j \in \mathcal{P}_k} \tau_j\Big),$$
   with $p_k$ denoting the cardinality of the set $\mathcal{P}_k$ (see Appendix A.2).
   (d) The hyperparameters $\varrho_k$, for $k \in \{a, \psi_0, \psi_1\}$, are updated with a random walk Metropolis-Hastings (MH) step. We refer to Bitto and Frühwirth-Schnatter (2019) for details.

2. Draw the normalized latent states $\tilde{\alpha}$ from a $\nu$-dimensional Gaussian distribution by exploiting the static representation (see Section 4).

3. Draw the time-varying volatilities in $\Sigma$ using the R package stochvol (Kastner, 2016).

4. Update the binary indicators in $S_t$, depending on their law of motion. We recast the state equation in the centered parameterization and evaluate the following regime-switching specification:
$$\alpha_t = \begin{cases} \alpha + \gamma_t + \bar{\Phi}_0(\alpha_{t-1} - \alpha) + \varsigma_t, & \varsigma_t \sim \mathcal{N}(0, \bar{\Psi}_0) \ \text{if } s_t = 0, \\ \alpha + \gamma_t + \bar{\Phi}_1(\alpha_{t-1} - \alpha) + \varsigma_t, & \varsigma_t \sim \mathcal{N}(0, \bar{\Psi}_1) \ \text{if } s_t = 1, \end{cases} \tag{A.2}$$
with $\bar{\Phi}_0 = \mathbf{0}_{K \times K}$, $\bar{\Phi}_1 = I_K$ and $\gamma_t = \mathbf{0}_{K \times 1}$ for the TVP-MIX model. For the TVP-POOL model we set $\bar{\Phi}_0 = \bar{\Phi}_1 = \mathbf{0}_{K \times K}$.
   • $s_t$ follows a first-order MS process (MS):
     (a) Conditional on the other parameters in Equation (A.2), we follow Kim and Nelson (1999) and sample $\{s_t\}_{t=1}^{T}$ using standard algorithms.
     (b) Conditional on $\{s_t\}_{t=1}^{T}$, we update the transition probabilities by sampling $p_{00} \sim \mathcal{B}(T_{00} + c_{00}, T_{01} + c_{01})$ and $p_{11} \sim \mathcal{B}(T_{11} + c_{11}, T_{10} + c_{10})$, both from a Beta distribution, with $T_{kl}$ denoting the number of transitions from the $k$th to the $l$th regime.
   • Covariate-specific indicators with $\{s_{it}\}_{i=1}^{K}$ independent over time (MIX):
     (a) Conditional on the other parameters in Equation (A.2), we evaluate both regimes and sample $s_{it}$ for each period and covariate independently from a Bernoulli distribution.
     (b) Conditional on $\{s_{it}\}_{t=1}^{T}$, we update the success probability for each covariate by sampling from a Beta distribution, $p_i \sim \mathcal{B}(T_{i,1} + c_{i,1}, T_{i,0} + c_{i,0})$, for $i = 1, \dots, K$, with $T_{i,k}$ denoting the number of periods in the $k$th regime.

5. For the specification with a hierarchical prior on $\tilde{\gamma}_t$ and $\bar{\Phi}_0 = \bar{\Phi}_1 = \mathbf{0}_{K \times K}$, we need five additional sampling steps (details can be found in Malsiner-Walli et al. (2016) and Hauzenberger et al. (2019)):
   (a) Draw the mixture weights $\omega$ from a Dirichlet distribution:
   $$\omega \mid \theta, \xi \sim \mathcal{D}\text{ir}(\xi_1, \dots, \xi_N),$$
   with $\theta = (\theta_1, \dots, \theta_T)'$ and $\xi_n = \xi + T_n$, where $T_n$ denotes the number of periods assigned to group $n$.
   (b) Update the hyperparameter $\xi$ with a random walk Metropolis-Hastings step (for details, see Malsiner-Walli et al., 2016).
   (c) Sample the group indicators $\theta_t \in \{1, \dots, N\}$ for each $\tilde{\alpha}_t$ from a multinomial distribution:
   $$\mathrm{P}(\theta_t = n \mid \omega_n, \tilde{\mu}_n) \propto \omega_n f_{\mathcal{N}}(\tilde{\alpha}_t \mid \tilde{\mu}_n, I_K), \quad n = 1, \dots, N.$$
   (d) The full conditional posterior of $\tilde{\mu} = \mathrm{vec}(\tilde{\mu}_1, \dots, \tilde{\mu}_N)$ follows a multivariate Gaussian distribution:
   $$\tilde{\mu} \mid \Lambda, \theta \sim \mathcal{N}(c, \Lambda),$$
   with $\theta = (\theta_1, \dots, \theta_T)'$ and
   $$\Lambda = \big(I_K \otimes \Xi'\Xi + I_N \otimes \Lambda_0^{-1}\big)^{-1}, \qquad c = \Lambda\, \mathrm{vec}(\Xi'\tilde{A}).$$
   Here, $\Xi$ denotes a $T \times N$ matrix with $(t, n)$th element given by $\mathbb{I}(\theta_t = n)$, where $\mathbb{I}(\bullet)$ refers to the indicator function, and $\tilde{A}$ collects $\tilde{\alpha}$ in a $T \times K$ matrix.
   (e) Sample the shrinkage parameters $\{l_j\}_{j=1}^{K}$ from a GIG distribution:
   $$l_j \mid R, \{\tilde{\mu}_n\}_{n=1}^{N} \sim \mathcal{GIG}\Big(e_1 - \frac{N}{2},\ e_2,\ \sum_{n=1}^{N} \frac{\tilde{\mu}_{jn}^2}{r_j}\Big),$$
   with $\tilde{\mu}_{jn}$, for $n = 1, \dots, N$, denoting the $j$th element of $\tilde{\mu}_n$.
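The MIX case of step 4 reduces to per-element Bernoulli draws that weigh the two regime densities of Equation (A.2) against each other. A schematic sketch for a single scalar coefficient (all function and variable names are ours; scalar analogues of $\bar{\Phi}_0$, $\bar{\Phi}_1$ are passed as `phi0`, `phi1`):

```python
import numpy as np

rng = np.random.default_rng(2)

def norm_logpdf(x, mean, var):
    """Log-density of a univariate Gaussian."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

def draw_indicator(alpha_t, alpha_prev, alpha_bar, p_i, psi0, psi1,
                   phi0=0.0, phi1=1.0):
    """Draw s_it ~ Bernoulli by comparing the two regimes of the
    state equation (A.2): regime 0 with variance psi0 and persistence phi0,
    regime 1 (random-walk-type) with variance psi1 and persistence phi1."""
    l0 = np.log(1.0 - p_i) + norm_logpdf(alpha_t, alpha_bar + phi0 * (alpha_prev - alpha_bar), psi0)
    l1 = np.log(p_i) + norm_logpdf(alpha_t, alpha_bar + phi1 * (alpha_prev - alpha_bar), psi1)
    # normalize on the log scale to avoid overflow
    prob1 = np.exp(l1 - np.logaddexp(l0, l1))
    return int(rng.random() < prob1), prob1
```

When the current state sits on top of the random-walk prediction and far from the mean-reverting regime, the posterior probability of regime 1 approaches one, and vice versa.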
A.4. The spectral decomposition
To obtain a time-varying low-frequency measure between two endogenous variables, we follow Sargent and Surico (2011) and Kliem et al. (2016). We therefore recast the TVP-VAR model in its companion form:
$$Y_t = J Z_t, \qquad Z_t = F_t Z_{t-1} + E_t, \qquad E_t \sim \mathcal{N}(0, \Upsilon_t).$$
In the following, the spectral density of $Y_t$ at the very low frequency $\rho = 0$ is given by:
$$\Pi_t(\rho = 0) = J\big((I_{mp+1} - F_t)^{-1}\, \Upsilon_t\, (I_{mp+1} - F_t')^{-1}\big) J'.$$
For $\rho = 0$, the low-frequency relationship $\pi_{ij,t}$ between two variables $(Y_{it}, Y_{jt}) \in Y_t$ can be derived with:
$$\pi_{ij,t} = \frac{\Pi_{ij,t}(\rho = 0)}{\Pi_{jj,t}(\rho = 0)},$$
with $\Pi_{ij,t}$ denoting the $(i, j)$th element of $\Pi_t$.

B. ADDITIONAL FORECASTING RESULTS
(Figure B.1 panels: (a) FA-VAR, (b) L-VAR, (c) S-VAR; horizons: (i) one-year-ahead, (ii) two-years-ahead. Legend: const. (Min), const. (NG), TVP-MIX, TVP-POOL, TVP-RW, FLEX MIX-SPEC, FLEX MS-POOL, SINGLE (MS-POOL), SSVS MIX-SPEC.)
Figure B.1:
Evolution of one- and two-year-ahead total cumulative LPBFs relative to the benchmark. The gray dashed lines refer to the maximum/minimum Bayes factor over the full hold-out sample. The light gray shaded areas indicate the NBER recessions in the US.

Table B.1: Density forecast performance (CRPS ratios) relative to the benchmark (const. (Min.)). The red shaded row denotes the benchmark (and its CRPS values). Asterisks indicate statistical significance for each model relative to const. (Min.) at the 1 (***), 5 (**) and 10 (*) percent significance levels. Entries are grouped by horizon (1-quarter-ahead | 1-year-ahead | 2-years-ahead), each reporting TOT, GDPC1, CPIAUCSL, UNRATE and FEDFUNDS.

FA-VAR const. (Min.): 0.92 0.85 0.97 0.89 1.10 | 0.90** 0.85** 1.02 0.81 0.94 | 0.88** 0.98 0.97 0.87 0.71***
FA-VAR const. (NG): 0.90** 0.82** 0.97 0.89 1.04 | 0.89** 0.84** 1.00 0.82 0.89 | 0.88** 0.97 0.97 0.87 0.72**
FA-VAR TVP-MIX FLEX MIX: 0.89*** 0.82** 0.96 0.89 0.85* | 0.91** 0.89 0.98 0.86 0.88 | 0.97 0.95 1.03 0.99 0.89
FA-VAR TVP-MIX FLEX MS: 0.90** 0.84* 0.97 0.85 0.88 | 0.91* 0.89* 1.02 0.82 0.85 | 0.95 1.00 1.00 0.94 0.87
FA-VAR TVP-MIX SINGLE: 0.90** 0.84* 0.98 0.88 0.80** | 0.90** 0.88** 1.01 0.83 0.79** | 0.96 0.98 1.04 0.96 0.82
FA-VAR TVP-MIX SSVS MIX: 0.89** 0.83* 0.97 0.88 0.92 | 0.91** 0.88** 1.01 0.83 0.88 | 0.95 0.97 1.05 0.90 0.89
FA-VAR TVP-POOL FLEX MIX: 0.88** 0.80** 0.97 0.86 0.95 | 0.87** 0.81** 1.00 0.79 0.85 | 0.86* 0.94* 0.97 0.82 0.70**
FA-VAR TVP-POOL FLEX MS: 0.88*** 0.80** 0.96 0.86 0.95 | 0.87**
FA-VAR TVP-RW FLEX MIX: 0.89** 0.85* 0.95 0.85 0.82** | 0.87* 0.87** 0.99 0.78
L-VAR const. (Min.): 0.94 0.87 1.03 0.83 1.10 | 0.90** 0.95* 0.95*** 0.79 0.87 | 0.94** 1.03 0.95 0.94 0.83***
L-VAR const. (NG): 0.89** 0.82*** 0.97 0.81 0.99 | 0.85*** 0.89**
S-VAR const. (Min.): 0.24 0.44 0.41 0.07 0.05 | 0.38 0.54 0.44 0.31 0.22 | 0.49 0.48 0.46 0.58 0.44
S-VAR const. (NG): 0.98** 0.98* 0.99 1.00 1.00 | 0.96** 0.94** 0.97** 0.96** 0.96 | 0.97 0.99 0.96 0.96** 0.95
S-VAR TVP-MIX FLEX MIX: 0.92*** 0.92**

C. DATA

In this section we provide further details on the variables used for the large-scale VAR (L-VAR). Table C.1 lists the exact descriptions and provides further information on the transformation of the indicators. The gray shaded rows denote our target variables.
Table C.1:
Data for the US is obtained from the FRED database of the Federal Reserve Bank of St. Louis. The column Transformation shows the transformation applied to each variable. Following McCracken and Ng (2016), (1) implies no transformation, (5) denotes growth rates, defined as log first differences $\ln\big(\tfrac{x_t}{x_{t-1}}\big)$, and (7) denotes differences in percentage changes, $\Delta\big(\tfrac{x_t - x_{t-1}}{x_{t-1}}\big)$. All variables are standardized by subtracting the mean and dividing by the standard deviation.

FRED Mnemonic | Description | Transformation
GDPC1 | Real Gross Domestic Product | 5
PCECC96 | Real Personal Consumption Expenditures | 5
FPIx | Real private fixed investment | 5
GCEC1 | Real Government Consumption Expenditures and Gross Investment | 5
INDPRO | IP: Total index, Industrial Production Index (Index 2012=100) | 5
CE16OV | Civilian Employment (Thousands of Persons) | 5
UNRATE | Civilian Unemployment Rate (Percent) | 1
CES0600000007 | Average Weekly Hours of Production and Nonsupervisory Employees: Goods-Producing | 1
HOUST | Housing Starts: Total: New Privately Owned Housing Units Started | 5
PERMIT | New Private Housing Units Authorized by Building Permits | 5
PCECTPI | Personal Consumption Expenditures: Chain-type Price Index | 5
GDPCTPI | Gross Domestic Product: Chain-type Price Index | 5
CPIAUCSL | Consumer Price Index for All Urban Consumers: All Items | 5
CES0600000008 | Average Hourly Earnings of Production and Nonsupervisory Employees | 5
FEDFUNDS | Effective Federal Funds Rate (Percent) | 1
GS1 | 1-Year Treasury Constant Maturity Rate (Percent) | 1
GS10 | 10-Year Treasury Constant Maturity Rate (Percent) | 1
TOTRESNS | Total Reserves of Depository Institutions | 5
NONBORRES | Reserves of Depository Institutions, Nonborrowed | 7
S.P.500 | S&P's Common Stock Price Index: Composite | 5
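The transformation codes listed in Table C.1 can be applied mechanically. A minimal sketch (the helper names are ours):

```python
import numpy as np

def transform(x, code):
    """Apply the McCracken-Ng style transformation codes of Table C.1:
    (1) level, (5) log first difference, (7) difference of percentage changes."""
    x = np.asarray(x, dtype=float)
    if code == 1:
        return x
    if code == 5:
        return np.diff(np.log(x))
    if code == 7:
        pct = x[1:] / x[:-1] - 1.0  # period-on-period percentage change
        return np.diff(pct)
    raise ValueError(f"unsupported transformation code: {code}")

def standardize(x):
    """Subtract the mean and divide by the standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()
```

Note that codes (5) and (7) shorten the series by one and two observations, respectively, so the transformed panel must be aligned before estimation.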
For the factor-augmented VAR (FA-VAR) we consider the full data set, comprising 165 variables. For brevity, we refer to McCracken and Ng (2016) for a detailed description and the transformation codes. All variables, serving as a basis for the principal components, are transformed to stationarity as suggested in McCracken and Ng (2016). Finally, we standardise the data by demeaning each variable and dividing by the standard deviation. Standardising is especially important for principal components, since the components are not scale invariant.