A Factor-Augmented Markov Switching (FAMS) Model*

GREGOR ZENS and MAXIMILIAN BÖCK
Vienna University of Economics and Business
Abstract
This paper investigates the role of high-dimensional information sets in the context of Markov switching models with time-varying transition probabilities. Markov switching models are commonly employed in empirical macroeconomic research and policy work. However, the information used to model the switching process is usually limited drastically to ensure stability of the model. Increasing the number of included variables to enlarge the information set might even result in decreasing precision of the model. Moreover, it is often not clear a priori which variables are actually relevant when it comes to informing the switching behavior. Building on recent contributions in the field of factor analysis, we introduce a general type of Markov switching autoregressive model for non-linear time series analysis. Large numbers of time series are allowed to inform the switching process through a factor structure. This factor-augmented Markov switching (FAMS) model overcomes estimation issues that are likely to arise in previous assessments of the modeling framework. The result is more accurate estimates of the switching behavior as well as improved model fit. The performance of the FAMS model is illustrated in a simulated data example as well as in a US business cycle application.
Keywords:
Bayesian analysis, factor models, Markov switching, business cycles.
JEL Codes:
C11, C24, C38, E32, E37

*Address: Department of Economics, Vienna University of Economics and Business, Welthandelsplatz 1, 1020 Vienna, Austria. E-mail: [email protected] and [email protected]. Date: 7th May 2019.

Since the late 1980s, Markov switching (MS) models have been widely employed by economic researchers to analyze various quantities of interest. Early studies analyze for instance stock market returns (Pagan and Schwert, 1990) or asymmetries over the business cycle (Hamilton, 1989). It has been widely acknowledged that MS models are rather useful in capturing nonlinear time series characteristics as compared to more classical approaches such as standard ARMA models (Hamilton, 1994). As a result, MS models have gained in popularity and are commonly found in empirical macroeconomic research (Hamilton, 2010).

Despite the popularity of the model family, the standard MS framework does not come without criticism. As pointed out by Kaufmann (2015), a main critique is the assumption of exogenous transition probabilities. This assumption implies that there is no explicit interpretation of the process driving the switching behavior of the model. Moreover, the most commonly applied prior framework a priori favors the model switching to the states that are visited most often. This approach leads to a model that neglects further information, such as macroeconomic conditions that are readily available to the researcher.

This criticism has led to a rather popular extension of the standard MS model that uses time-varying transition probabilities arising from a prior structure taking the form of a (multinomial) logit or probit specification (Filardo, 1994; Meligkotsidou and Dellaportas, 2011; Kaufmann, 2015). This modified prior setup effectively enables relevant exogenous variables to inform the switching behavior of the model.
Kaufmann (2015) discusses Bayesian estimation and models a Phillips curve in an MS framework. While this method is in general a viable and useful extension of the standard MS framework, it comes with some unpleasant downsides, especially in data-rich environments like macroeconomics or finance. The (multinomial) probit or logit regression that forms the prior distribution of the transition probabilities is known to perform rather poorly when the set of predictors becomes too large, as discussed in Zahid and Tutz (2013), Ranganathan et al. (2017) and de Jong et al. (2019). This often destabilizes the modeling framework and makes the effort to model regime switching using exogenous regressors cumbersome. In addition, it is often not clear a priori which variables are relevant to include when modeling switching behavior in an MS setting. In principle, this problem can be overcome by shrinkage priors on the coefficients of the logit/probit regression, as demonstrated in Zens (2019) for the related family of mixture-of-experts models. However, although variable selection might work well for a medium-sized set of predictors, it is likely to become problematic when the number of regressors becomes too large compared to the available information in the data. Even more importantly, a variable selection approach does not resolve multicollinearity issues that commonly arise in macroeconomics and finance, where various time series describe very similar underlying processes of an economy. For instance, quarterly gross domestic product and quarterly industrial production are likely to show a large amount of comovement. Variable selection is prone to be not well behaved in such cases (George, 2010). Thus, the researcher is required to come up with a pre-selection of relevant variables.
This process is usually rather arbitrary and likely to severely reduce the information set available to inform the switching behavior of the model.

As a result, most articles dealing with MS models in economics and finance restrict the number of variables that inform the switching process to a relatively small number. As pointed out by Bernanke et al. (2005), a small number of variables will most probably not be able to span the information set consisting of hundreds of time series available to the researcher, to central banks and to financial market participants. This leads to two potential problems when employing these "restricted" data sets. The first problem relates to policy analysis and forecasting exercises. It is well known that despite good in-sample predictions, out-of-sample forecasts of MS models regularly fail to consistently beat simple benchmark processes (Engel, 1994; Boot, 2017). In these cases, the models might be missing important information due to restrictions in the estimation process. However, as shown by Stock and Watson (2002), the forecasting ability of models using factor augmentation increases significantly as compared to models using "restricted" information sets. Second, the effect of certain variables of interest to researchers and policy makers cannot be evaluated in the "restricted" framework, as they are not explicitly included in the model.

To overcome the above mentioned issues, we offer a modeling framework that combines standard MS models with recent developments in factor modeling. In spirit, this article is closely related to the FAVAR model developed in Bernanke et al. (2005) and builds on the sparse factor model outlined in Kastner (2019). The factor-augmented Markov switching (FAMS) model that we derive in this article effectively summarizes large amounts of information about the economy and financial markets in a small number of estimated factors.
Augmenting the model to inform the switching process in an MS framework provides a rather appealing solution to computational and statistical problems arising when using a large number of variables in this modeling framework (for instance, Kaufmann (2015) uses one exogenous variable and Meligkotsidou and Dellaportas (2011) use six variables to inform the switching process). This article summarizes the key points of the statistical framework necessary to estimate FAMS models in a fully Bayesian setup through Gibbs sampling. We demonstrate the ability of this model using autoregressive processes with possibly switching means and switching variances. Applying the model to an artificial data set, we find that the large information set included in FAMS models results in improved in-sample fit and more precise state allocations as compared to standard MS models making use of the full data set or random subsets of the data. The real-world abilities of FAMS models are
demonstrated through an application, where we aim to identify US business cycles by informing the switching process with a vast amount of macroeconomic time series.

Our contribution is thus twofold. The FAMS model makes it possible to conveniently estimate MS models where the switching process is influenced by a lower-dimensional representation of a large number of possibly collinear candidate variables. Moreover, a shrinkage prior is imposed to conduct variable selection in cases where some factors might be irrelevant for the regime switching behavior of the model.

The remainder of this article is organized as follows. In Section 2 we formulate the general modeling framework for the FAMS model. Section 3 discusses Bayesian estimation of the model using MCMC methods. Section 4 presents the benefits of the FAMS model via a simulation study. In Section 5, a high-dimensional macroeconomic data set is used in an application to estimate business cycles in the United States. A brief conclusion is provided in Section 6.
We consider a general Markov switching autoregressive model (MS-AR) of order p with time-varying transition probabilities. This model has been thoroughly discussed in Kim and Nelson (1999, Ch. 4; Ch. 9) and Frühwirth-Schnatter (2006, Ch. 12). Let $\{y_t\}_{t=1}^{T}$ be an observed time series arising from the data-generating process (DGP)

$$y_t = \mu_{S_t} + \sum_{j=1}^{p} \phi_{j,S_t}\, y_{t-j} + \epsilon_t, \qquad \epsilon_t \sim \mathcal{N}(0, \sigma^2_{S_t}), \tag{2.1}$$

where $\{S_t\}_{t=1}^{T}$ is assumed to be a hidden discrete-time state process with finite state space $\mathcal{H} = \{1, \ldots, H\}$. In the most general setup, the intercept parameter $\mu_{S_t}$, the persistence parameters $\phi_{j,S_t}$ and the error variance parameter $\sigma^2_{S_t}$ are state-dependent. First, we collect all parameters as $\theta = \{\theta_h \mid \theta_h = (\mu_h, \phi_{1,h}, \ldots, \phi_{p,h}, \sigma^2_h)^\top,\ h = 1, \ldots, H\}$. It is then straightforward to see that if the process is in state $k \in \mathcal{H}$ in time period $t$, the parameter values change accordingly:

$$\theta_k = (\mu_k, \phi_{1,k}, \ldots, \phi_{p,k}, \sigma^2_k)^\top \quad \text{if } S_t = k. \tag{2.2}$$

The peculiarity of Markov switching models is that their switching is determined completely by a stochastic process. The state indicator vector $S$ can thus be seen as an irreducible, aperiodic Markov chain describing a sequence of events in which the probability of each state depends only on the state attained in previous periods. We assume a first-order Markov process, where the process has a memory of only one period:

$$\Pr[S_t = j \mid S_{t-1} = k, S_{t-2} = h_{t-2}, \ldots, S_1 = h_1] = \Pr[S_t = j \mid S_{t-1} = k]. \tag{2.3}$$

Therefore, the transitions between two states are described by probabilities that are summarized in a transition matrix $\Xi_t$ of dimension $H \times H$. In this transition matrix, each element $\xi_{jk,t}$ describes the transition from state $k$ to state $j$. Note that the matrix $\Xi_t$ is row-standardized.
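As a concrete illustration, the DGP in Eq. (2.1) can be simulated in a few lines. The following Python sketch assumes a fixed (time-invariant) transition matrix for simplicity; the function name and all parameter values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def simulate_ms_ar(T, mu, phi, sigma2, Xi, p=1, seed=0):
    """Simulate an MS-AR(p) process as in Eq. (2.1) with a fixed transition matrix.

    mu, sigma2 : length-H arrays of state-specific intercepts and variances.
    phi        : (H, p) array of state-specific AR coefficients.
    Xi         : (H, H) row-stochastic matrix, Xi[k, j] = Pr[S_t = j | S_{t-1} = k].
    """
    rng = np.random.default_rng(seed)
    H = len(mu)
    S = np.zeros(T, dtype=int)
    y = np.zeros(T)
    S[0] = rng.integers(H)
    y[0] = mu[S[0]] + np.sqrt(sigma2[S[0]]) * rng.standard_normal()
    for t in range(1, T):
        S[t] = rng.choice(H, p=Xi[S[t - 1]])          # draw next state from row S_{t-1}
        ar = sum(phi[S[t], j] * y[t - 1 - j] for j in range(min(p, t)))
        y[t] = mu[S[t]] + ar + np.sqrt(sigma2[S[t]]) * rng.standard_normal()
    return y, S

# illustrative two-state example
y, S = simulate_ms_ar(
    T=500,
    mu=np.array([-0.5, 1.0]),
    phi=np.array([[0.5], [0.5]]),
    sigma2=np.array([1.0, 0.5]),
    Xi=np.array([[0.95, 0.05], [0.10, 0.90]]),
)
```

With persistent transition probabilities as above, the simulated path exhibits the prolonged regime episodes that the MS-AR framework is designed to capture.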
Furthermore, the transition matrix $\Xi_t$ features a time index $t$ since we assume that the switching process is not constant over time. It is modeled as dependent on a set of $r$ exogenous factors captured in $\{Z_t\}_{t=1}^{T}$. These variables do not enter the autoregressive process directly, but are rather expected to drive the transition dynamics indirectly. Therefore, they are responsible for altering the state allocation and thus the estimation of the state-dependent parameters in $\theta$. Moreover, a delay parameter $d$ is introduced for convenience in cases where the exogenous factors $Z_t$ are used as leading indicators. The estimation of $Z_t$ is covered in Sec. 2.2.

The multinomial logit link is used to model the influence of the factors $Z_t$ on the transition probabilities $\xi_{jk,t}$ such that

$$\Pr[S_t = j \mid S_{t-1} = k, Z_{t-d}, \gamma, \beta] = \xi_{jk,t} = \frac{\exp(\gamma_{jk} + \beta_{jk}' Z_{t-d})}{\sum_{l=1}^{H} \exp(\gamma_{lk} + \beta_{lk}' Z_{t-d})}, \tag{2.4}$$

where $j, k = 1, \ldots, H$ and $\gamma_{jk}$ and $\beta_{jk}$ are the coefficients of the intercepts and the factors, respectively. Moreover, we follow Kaufmann (2015) and specify $Z_t = \tilde{Z}_t - \bar{Z}$ in a centered way for two reasons. Besides the fact that it defines the mean $\bar{Z}$ as an arbitrary threshold level, more importantly, the time-invariant part of the transition probabilities $\gamma_{jk}$ then does not depend on the scale of $\tilde{Z}_t$. Otherwise, this would have to be taken into account when choosing a prior distribution for the parameter. For identification purposes, it is necessary to define a baseline state $h \in \mathcal{H}$ for which the parameters are assumed to be zero, i.e. $(\gamma_{hk}, \beta_{hk}) =$
0. This yields

$$\Pr[S_t = h \mid S_{t-1} = k, Z_{t-d}, \gamma, \beta] = \frac{1}{1 + \sum_{l \in \mathcal{H} \setminus \{h\}} \exp(\gamma_{lk} + \beta_{lk}' Z_{t-d})},$$

where $\mathcal{H} \setminus \{h\}$ denotes all states but the reference state $h$. In order to improve the efficiency of the estimates, we follow Amisano and Fagan (2013) in imposing the restriction of common slope coefficients across states, $\beta_{jk} = \beta_k$. This translates into the assumption that the effect of $Z_t$ differs only by the state attained in the previous time period. For notational convenience, we collect the multinomial logit coefficients in the vectors $\gamma = \{\gamma_h \mid \gamma_h = (\gamma_{1h}, \ldots, \gamma_{Hh})^\top,\ h = 1, \ldots, H\}$ and $\beta = \{\beta_h,\ h = 1, \ldots, H\}$ and gather all parameters in $\vartheta = \{\theta, \gamma, \beta\}$.
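As a concrete sketch of the logit link in Eq. (2.4) under the common-slope restriction and the zero reference state, consider the following Python function. The array layout (rows indexed by the previous state) and all names are assumptions made for illustration.

```python
import numpy as np

def transition_matrix(z, gamma, beta, ref=0):
    """Transition matrix Xi_t implied by the multinomial logit link.

    z     : (r,) centered factor values Z_{t-d}.
    gamma : (H, H) intercepts; gamma[k] holds the intercepts for all moves
            out of previous state k.
    beta  : (H, r) slopes under the common-slope restriction, one vector
            per previous state k; the reference state `ref` is fixed at zero.
    Returns a row-standardized H x H matrix with rows indexed by S_{t-1}.
    """
    H = gamma.shape[0]
    Xi = np.zeros((H, H))
    for k in range(H):
        eta = gamma[k] + beta[k] @ z      # same slope term for all destinations
        eta = eta.copy()
        eta[ref] = 0.0                    # reference-state coefficients are zero
        w = np.exp(eta - eta.max())       # numerically stable softmax
        Xi[k] = w / w.sum()
    return Xi
```

Because the factors enter through a softmax, each row of the returned matrix is a proper probability distribution over the next state, and shifts in $Z_{t-d}$ tilt the odds of staying in or leaving the current regime.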
In the modeling framework outlined above, it is possible to proceed by estimating the model using a given subset of the full set of exogenous covariates $X_t$ to inform the switching process through the prior specification. However, there is a variety of scenarios where additional relevant information may be available that is not incorporated in this subset. Suppose we can find a set of latent factors $Z_t$ that condenses a large part of the available information into a small number of time series. These factors might represent more diffuse concepts such as "political instability" or "credit market sentiment", which are generally difficult to capture in one or two time series. The task of finding $Z_t$ given $X_t$ can be dealt with using factor analysis frameworks (see Lopes and West, 2004 or Lopes, 2014 for a comprehensive overview). The setup used in this article closely follows Kastner (2019), who proposes a time-varying sparse factor model. It can be summarized as

$$X_t \mid \Lambda, Z_t, U_t \sim \mathcal{N}(\Lambda Z_t, U_t), \qquad Z_t \mid V_t \sim \mathcal{N}(0, V_t), \tag{2.5}$$

where $Z_t$ is the set of latent factors we want to recover for use in the MS model, $\Lambda$ is an $m \times r$ factor loadings matrix and $U_t$ and $V_t$ are diagonal matrices discussed in more detail below. The $m$ scaled and centered time series $X_t = (X_{1t}, \ldots, X_{mt})$ describing the economy are assumed to follow a conditional Gaussian distribution, i.e.

$$X_t \mid \Sigma_t \sim \mathcal{N}_m(0, \Sigma_t), \tag{2.6}$$

where $\mathcal{N}_m(\cdot)$ denotes an $m$-dimensional normal distribution. To achieve dimensionality reduction, the $m \times m$ covariance matrix $\Sigma_t$ is assumed to decompose into the $m \times r$ factor loadings matrix $\Lambda$ as well as two diagonal matrices $V_t$ ($r \times r$) and $U_t$ ($m \times m$) as follows:

$$\Sigma_t = \Lambda V_t \Lambda' + U_t. \tag{2.7}$$

Following Kastner (2019), $\Lambda$ is assumed to be constant over time. The factor variances in $V_t$ and the idiosyncratic variances in $U_t$ evolve over time through stochastic volatility models (Kastner and Frühwirth-Schnatter, 2014).
Specifically, let $U_t = \mathrm{diag}(\exp(g_{1t}), \ldots, \exp(g_{mt}))$ and $V_t = \mathrm{diag}(\exp(h_{1t}), \ldots, \exp(h_{rt}))$. The log variances are then modeled through autoregressive processes of the form

$$g_{it} \sim \mathcal{N}\big(\mu_{g,i} + \phi_{g,i}(g_{i,t-1} - \mu_{g,i}),\ \sigma^2_{g,i}\big), \qquad i = 1, \ldots, m, \tag{2.8}$$

and

$$h_{jt} \sim \mathcal{N}\big(\phi_{h,j}\, h_{j,t-1},\ \sigma^2_{h,j}\big), \qquad j = 1, \ldots, r. \tag{2.9}$$

That is, the idiosyncratic log variances follow a centered AR(1) process and the factor log variances follow an autoregressive process with zero mean to identify the scaling of the factors. For further details on the factor model described here, refer to Kastner (2019). For further topics in factor modeling, including estimation and identification issues, see for instance Lopes (2014) or Kaufmann and Schumacher (2019).

Bayesian estimation requires the specification of adequate prior distributions. This section gives an overview of the prior distributions both in the Markov switching model (Sec. 2.1) and the factor model (Sec. 2.2) that form the proposed model setup. For the Markov switching AR(p) model, independent prior distributions for all parameters, $\pi(\vartheta) = \pi(\mu)\pi(\phi)\pi(\sigma^2)\pi(\gamma)\pi(\beta)$, are assumed. Since the specification in Eq. (2.1) is piece-wise linear, we specify a normal prior distribution for the mean and persistence parameters and an inverse Gamma distribution for the variance parameters:

$$\mu \sim \prod_{h=1}^{H} \pi(\mu_h) = \prod_{h=1}^{H} \mathcal{N}(m_0, M_0), \qquad \phi_j \sim \prod_{h=1}^{H} \pi(\phi_{j,h}) = \prod_{h=1}^{H} \mathcal{N}(r_0, R_0) \quad \forall j = 1, \ldots, p, \qquad \sigma^2 \sim \prod_{h=1}^{H} \pi(\sigma^2_h) = \prod_{h=1}^{H} \mathcal{IG}(c_0, d_0). \tag{3.1}$$

Regarding the hyperparameters, we choose $m_0 = r_0 = M_0 =$
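To make the factor stochastic volatility structure of Eqs. (2.5)-(2.9) concrete, the following Python sketch simulates from it. All parameter values, as well as the lower-triangular loadings matrix, are illustrative assumptions (the triangular restriction anticipates the identification scheme used later in the paper).

```python
import numpy as np

def simulate_fsv(T, m, r, seed=1):
    """Simulate m series driven by r latent factors with stochastic volatility,
    following Eqs. (2.5)-(2.9); parameter values are assumed for illustration."""
    rng = np.random.default_rng(seed)
    Lam = np.tril(rng.normal(0.0, 0.5, size=(m, r)))   # loadings, zeros above diagonal
    mu_g = rng.normal(0.0, 0.2, m)                     # means of idiosyncratic log variances
    phi_g = rng.uniform(-0.8, 0.8, m)                  # AR coefficients, Eq. (2.8)
    phi_h = rng.uniform(-0.8, 0.8, r)                  # AR coefficients, Eq. (2.9)
    sig_g = sig_h = 0.2                                # innovation std. devs (assumed)
    g = np.zeros((T, m)); h = np.zeros((T, r))
    g[0] = mu_g
    for t in range(1, T):
        g[t] = mu_g + phi_g * (g[t-1] - mu_g) + sig_g * rng.standard_normal(m)  # centered AR(1)
        h[t] = phi_h * h[t-1] + sig_h * rng.standard_normal(r)                  # zero-mean AR(1)
    Z = np.exp(h / 2) * rng.standard_normal((T, r))    # Z_t ~ N(0, V_t)
    X = Z @ Lam.T + np.exp(g / 2) * rng.standard_normal((T, m))  # X_t ~ N(Lam Z_t, U_t)
    return X, Z, Lam
```

Note that the factor log variances are mean zero, so the overall scale of each factor is absorbed by the loadings, mirroring the identification argument in the text.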
10 and $R_0 = c_0 = d_0 =$
1. We thus use an informative prior distribution on the variance parameters in order to regularize the variances sufficiently away from zero and to circumvent singularities. In the multinomial logit, separate prior distributions for the intercepts $\gamma$ and the coefficients $\beta$ are specified. For the intercepts, we assume

$$\gamma \sim \prod_{l \in \mathcal{H} \setminus \{h\}} \pi(\gamma_l) = \prod_{l \in \mathcal{H} \setminus \{h\}} \mathcal{N}(g_{0,l}, G_0), \tag{3.2}$$

where $g_{0,l}$ and $G_0$ denote the prior mean and variance.
Generally speaking, some factors might be more prone to capture information relevant to the switching process, whereas other factors might mainly introduce noise into the model. To address this, a normal gamma shrinkage prior (Polson and Scott, 2010) is chosen for the coefficients $\beta$ of the exogenous factors. The prior on each element of the coefficient vector for each state can be written as

$$\beta_{i,h} \mid \psi_{i,h} \sim \mathcal{N}(0, \lambda_{\psi,h}\, \psi_{i,h}), \qquad \psi_{i,h} \sim \mathcal{G}(\omega_\psi, \omega_\psi), \tag{3.3}$$

for all non-reference states $h$ and all $i = 1, \ldots, r$. Now we turn to the specification of prior distributions for the factor model. For the elements of the factor loadings matrix, again a normal gamma prior is employed to achieve sparsity:

$$\Lambda_{i,j} \mid \tau_{i,j} \sim \mathcal{N}(0, \lambda_{\tau,i}\, \tau_{i,j}), \qquad \tau_{i,j} \sim \mathcal{G}(\omega_\tau, \omega_\tau), \qquad \forall i = 1, \ldots, m,\ \forall j = 1, \ldots, r, \tag{3.4}$$

where the global shrinkage parameter $\lambda_{\tau,i}$ is defined per row of the factor loadings matrix. Thus, this setup implies that each time series has a high a priori probability of not loading on any factor. Similarly, the global shrinkage parameter $\lambda_{\psi,h}$ is defined per equation of the multinomial logistic prior to allow varying levels of shrinkage across states in the MS model. In choosing prior distributions for the stochastic volatility parameters, we closely follow Kastner (2019). Since both the stochastic volatility processes of the factor variances and of the idiosyncratic variances have their own persistence and variance parameters, a subscript $j \in \{g, h\}$ indicates to which group of processes they belong. The priors can then be summarized as

$$\mu_g \sim \mathcal{N}(b_\mu, B_\mu), \qquad \frac{\phi_j + 1}{2} \sim \mathcal{B}(b_0, b_1), \qquad \sigma^2_j \sim \mathcal{G}\big(1/2,\ 1/(2 B_\sigma)\big), \tag{3.5}$$

where the transformed persistence parameters $\phi_j$ follow a Beta distribution to ensure stationarity. Finally, it is useful to impose hyperpriors on the hyperparameters in Eq. (3.3) and Eq. (3.4).
In general, normal gamma type shrinkage priors (Polson and Scott, 2010; Griffin and Brown, 2010) enable implicit variable selection in a sophisticated way by specifying a continuous prior distribution with a rather large probability mass on zero and fat tails. The degree of shrinkage is influenced by two parameters: a global shrinkage parameter $\lambda_j$ ($j \in \{\psi, \tau\}$) and a local shrinkage parameter $\psi$ or $\tau$. As seen in Eq. (3.3) and Eq. (3.4), an additional layer of prior distributions on the local shrinkage parameters is introduced. Both $\psi$ and $\tau$ are assumed to follow a Gamma distribution with shape and scale hyperparameters $\omega_j$ ($j \in \{\psi, \tau\}$). Note that when deviating from specifying the shape and the scale parameter to be equal, it is not possible to separately identify the global shrinkage parameter and the parameters of the prior for the local shrinkage component. A Gamma hyperprior is therefore imposed on the global shrinkage parameters,

$$\lambda_j \sim \mathcal{G}(c_{j,0}, c_{j,1}), \qquad j \in \{\psi, \tau\}, \tag{3.6}$$

where the hyperparameters for the prior on $\beta$ are set to $c_{\psi,0} = c_{\psi,1} = 0.$
01. This implies heavy shrinkage on $\beta$. For reference, see for instance Bitto and Frühwirth-Schnatter (2019). The hyperparameter values in the factor model are chosen in accordance with the standard values proposed by Kastner (2016). They imply less heavy shrinkage on the elements of the factor loadings matrix $\Lambda$.

Estimation of the model is carried out via MCMC sampling using a variety of data augmentation techniques. Bayesian estimation and inference with a Gibbs sampling algorithm requires the combination of the likelihood and the proposed priors to produce conditional posterior distributions for all parameters. The employed algorithms closely follow those described in Frühwirth-Schnatter (2006, Ch. 11) and Kim and Nelson (1999, Ch. 9). Since the FAMS model is a variant of a state space model, the data augmentation techniques introduced by Carter and Kohn (1994) and Frühwirth-Schnatter (1994) are viable candidates for performing estimation. Sequentially sampling from the conditional posterior distributions after convergence of the algorithm then allows posterior parameter inference to take place. Generally speaking, the sampling scheme employed in this paper iterates over the following three steps:

(i)
Classification. Conditional on the estimated parameters, the filtering approach proposed by Hamilton (1989), a forward-filtering backward-sampling scheme, is employed to classify each observation $t$ into one of the $H$ states by sampling the state indicator from the posterior distribution $p(S \mid y, Z, \vartheta)$.

(ii) Estimation. Conditional on the state indicator vector $S$, the regime-specific parameters $(\theta_1, \ldots, \theta_H)$ and $(\gamma, \beta)$ are conditionally independent. Therefore, we rely on standard Bayesian techniques for linear models to draw from the posterior $p(\theta \mid S, y)$ and use the partial dRUM approach described in Frühwirth-Schnatter and Frühwirth (2010) to simulate the multinomial logit coefficients from $p(\gamma, \beta \mid S, Z)$.

(iii) Identification. Label switching is a widely known issue one has to consider when working with mixture and Markov switching models. Since this is a non-trivial issue, a discussion is provided in Sec. 3.3.
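The classification step (i) can be sketched as a generic forward-filtering backward-sampling routine for a discrete hidden state. This is a minimal Python illustration, not the authors' implementation; array layouts and names are assumptions.

```python
import numpy as np

def forward_filter(lik, Xi_seq, p0):
    """Hamilton filter: filtered probabilities p(S_t | y_{1:t}).

    lik    : (T, H) state-conditional likelihoods p(y_t | S_t = h, theta).
    Xi_seq : (T, H, H) transition matrices, Xi_seq[t, k, j] = Pr[S_t=j | S_{t-1}=k].
    p0     : (H,) initial state distribution.
    """
    T, H = lik.shape
    filt = np.zeros((T, H))
    pred = p0
    for t in range(T):
        w = pred * lik[t]              # prediction step times likelihood
        filt[t] = w / w.sum()          # normalize to filtered probabilities
        if t < T - 1:
            pred = filt[t] @ Xi_seq[t + 1]   # one-step-ahead state prediction
    return filt

def backward_sample(filt, Xi_seq, rng):
    """Backward-sampling step of FFBS: draw S_{1:T} from p(S | y, Z, theta)."""
    T, H = filt.shape
    S = np.zeros(T, dtype=int)
    S[-1] = rng.choice(H, p=filt[-1])
    for t in range(T - 2, -1, -1):
        w = filt[t] * Xi_seq[t + 1][:, S[t + 1]]   # condition on the sampled S_{t+1}
        S[t] = rng.choice(H, p=w / w.sum())
    return S
```

One pass of `forward_filter` followed by `backward_sample` produces a joint draw of the full state path, which is then conditioned on in the estimation step (ii).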
For estimating the factor stochastic volatility related quantities, we use the R package factorstochvol provided by Kastner (2016). This allows us to make use of the efficient implementation using interweaving techniques. For ease of reference, estimation of the multinomial logistic prior specification using the partial dRUM sampler is discussed in detail in App. A. Posterior simulation resulting from a normal gamma prior setup is briefly discussed in App. B. The remaining simulation steps are standard and thus not discussed in detail.
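To illustrate the behaviour of the normal gamma prior that the sampler of App. B targets, the hierarchy of Eq. (3.3) can be simulated directly. This is an illustrative sketch only: it uses the shape-equals-rate parameterization so that the local scales have mean one, matching $\mathcal{G}(\omega_\psi, \omega_\psi)$; the function name and default values are assumptions.

```python
import numpy as np

def normal_gamma_draws(n, omega, lam=1.0, seed=0):
    """Draws from the normal gamma hierarchy of Eq. (3.3):
    psi ~ G(omega, omega) (shape = rate, so E[psi] = 1),
    beta | psi ~ N(0, lam * psi).
    Small omega concentrates mass near zero while keeping fat tails."""
    rng = np.random.default_rng(seed)
    psi = rng.gamma(shape=omega, scale=1.0 / omega, size=n)  # rate omega => scale 1/omega
    return rng.normal(0.0, np.sqrt(lam * psi))
```

Comparing the draws for a small and a large value of $\omega$ makes the shrinkage mechanism visible: the smaller $\omega$, the more draws collapse toward zero while occasional large draws remain possible.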
Parameter estimation in the family of mixture and Markov switching models can suffer from various difficulties, especially in a Bayesian framework. Label switching is a common issue in mixture modeling (Hurn et al., 2003; Jasra et al., 2005). It results from the likelihood function being invariant to relabeling the components, leading to multimodality, as discussed in Redner and Walker (1984). This can cause problems, as label switching during MCMC sampling might result in possibly distorted, multimodal posterior distributions that are in general difficult to summarize. Deriving posterior means or other point estimates based on these posteriors then becomes inappropriate (Stephens, 2000b). The same rationale holds true for Markov switching models.

Early references for relabeling algorithms include Celeux et al. (1996). However, this algorithm requires known true parameter values, which renders it not very useful in real data applications. Stephens (2000a) suggests an algorithm that relabels the draws in a way such that the (marginal) parameter posterior distributions are as unimodal as possible. Stephens (2000b) provides a literature review as well as a decision theoretic framework to deal with label switching.

In the simulation studies and the application presented below, we make use of identifying restrictions to identify the sampler (see for instance Lenk and DeSarbo, 2000). This completes the simulation setup and the description of the employed estimation techniques of the FAMS model. The following sections discuss the model in the context of artificial data as well as an application to US business cycle estimation.

In this preliminary version of the paper, we choose to implement the FAMS model in a two-step procedure. In a first step, we simulate from the posterior distributions of the factors. The resulting posterior means are then used as explanatory variables that inform the switching process in the MS framework.
A fully Bayesian approach that includes the uncertainty around the factor posterior means will thus result in slightly higher uncertainty around other model estimates and will be included in future versions of the paper. Knowing that identifying restrictions are not ideal (Frühwirth-Schnatter, 2004), a random permutation sampler in combination with a post-processing procedure to relabel the draws will also be implemented in future versions of this paper.
To test the performance of the proposed modeling framework, synthetic data sets are generated using a recursive procedure. The factor model outlined in Sec. 2.2 is used to simulate data under the assumption that the true DGP is driven by the factors. Hence, we proceed in the following manner. In a first step, we generate $r = 3$ factor processes

$$f_{i,t} = \phi_f\, f_{i,t-1} + \eta_{i,t}, \qquad \eta_{i,t} \sim \mathcal{N}(0, 1), \qquad i = 1, \ldots, r, \tag{4.1}$$

centered around zero. From these, 200 time series are generated using the DGP outlined in Sec. 2.2. The AR coefficients $\phi_j$ ($j \in \{g, h\}$) of the log variances are simulated from $\mathcal{U}(-0.8, 0.8)$ and the means $\mu_g$ are assumed to follow a normal distribution $\mathcal{N}(0.2, 0.2)$. The variances $\sigma^2_j$ ($j \in \{g, h\}$) are simulated from $|\mathcal{N}(0.2, 0.2)|$.

After that, three factors are estimated from the generated time series. Estimation is based on 80,000 draws, where the first 40,000 draws are discarded as burn-in. The true factors $F_t = (f_{1,t}, f_{2,t}, f_{3,t})^\top$ are used to generate transition probabilities

$$\xi_{jk,t} = \frac{\exp(\gamma_{jk} + \beta_k' F_t)}{\sum_{l=1}^{H} \exp(\gamma_{lk} + \beta_l' F_t)}, \tag{4.2}$$

which are then employed to draw the states $S_t$ of the process. Conditional on knowing the states, the final time series $y_t$ can be generated using the process

$$y_t = \mu_{S_t} + \phi\, y_{t-1} + \epsilon_t, \qquad \epsilon_t \sim \mathcal{N}(0, \sigma^2_{S_t}). \tag{4.3}$$

In this setup, $H = 2$ and $\mathcal{H} = \{1, 2\}$. We choose $\gamma_1 = (\,\cdot\,, -\,\cdot\,)^\top$ and set $\gamma_2 = (0, 0)^\top$ for identification. Furthermore, $\beta_1 = (-\,\cdot\,, \,\cdot\,, \,\cdot\,)^\top$. Again, the second column of $\beta$ is set to zero for identification reasons. The parameters in the AR model are set to $\mu = (-\,\cdot\,, \,\cdot\,)^\top$, $\phi = 0.$
55 and $\sigma^2 = (\,\cdot\,, \,\cdot\,)^\top$. Note that this setup allows us to identify the sampler using identifying restrictions. Furthermore, the parameters are carefully chosen in a way such that a lot of time variation in the transition probabilities results and the data are comparable to real-world examples in economics and finance. This setting is used to generate $N =$
100 different datasets.

We proceed like this for two reasons. The first argument is a theoretical one: we argue that it is the factors which convey the information on whether transition probabilities should change or not. Second, from an econometric point of view, generating probabilities from multinomial logit models using about 200 time series is neither advisable nor practicable. The stability of the exercise depends strongly on the amplitude of the effects, and it is extremely tedious to generate a setup where a simulation study can be conducted in a controlled manner.
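The recursive design above can be condensed into a short sketch. Where the original parameter values are not recoverable from the text (the factor AR coefficient, the intercepts $\gamma$ and the slopes $\beta$), the values below are assumptions for illustration only; the second state's coefficients are fixed at zero for identification, as in the paper.

```python
import numpy as np

def generate_dataset(T=300, seed=2):
    """One synthetic dataset following the recursive design of Sec. 4
    (two states, three factors); parameter values are assumed."""
    rng = np.random.default_rng(seed)
    r = 3
    F = np.zeros((T, r))
    for t in range(1, T):
        F[t] = 0.9 * F[t - 1] + rng.standard_normal(r)   # AR(1) factors, coefficient assumed
    gamma = np.array([1.5, -1.5])      # intercepts per previous state (assumed values)
    beta = np.array([-0.5, 0.0, 0.5])  # second element zero for identification (assumed values)
    mu = np.array([-0.5, 0.5])         # assumed state-specific intercepts
    phi, sig2 = 0.55, np.array([1.0, 1.0])
    S = np.zeros(T, dtype=int)
    y = np.zeros(T)
    for t in range(1, T):
        # binary logit sketch of Eq. (4.2) for H = 2: probability of state 0 next
        eta = gamma[S[t - 1]] + beta @ F[t]
        p0 = np.exp(eta) / (1.0 + np.exp(eta))
        S[t] = rng.choice(2, p=[p0, 1.0 - p0])
        y[t] = mu[S[t]] + phi * y[t - 1] + np.sqrt(sig2[S[t]]) * rng.standard_normal()
    return y, S, F
```

Each call produces one dataset of the kind used in the Monte Carlo exercise; repeating it with different seeds corresponds to the 100 replications described in the text.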
Five competing models are then estimated on each dataset. The "intercept only" model serves as the baseline model in what follows. In the second model, the full information set of 200 time series is used to model the transition probabilities. The third model uses all 200 time series as well, but has a shrinkage prior placed on $\beta$ to induce sparsity. The fourth model corresponds to the FAMS model without a normal gamma prior. Finally, the fifth model is the FAMS model including a normal gamma prior. To gain even more insight, all models are estimated with varying degrees of shrinkage induced by the normal gamma prior. For the sake of brevity, we fix the number of factors in the estimation to the true number of factors.
For each run of the MCMC sampler, $M$ draws are kept after a burn-in phase of 50,000 draws. Two evaluation criteria are considered. The first is the root mean squared error of the fitted values,

$$\mathrm{RMSE}_{\hat{y}} = \frac{1}{N} \sum_{n=1}^{N} q\!\left( \sqrt{ \frac{1}{T} \sum_{t=1}^{T} \big( y_t - \hat{y}_t^{(m)} \big)^2 } \right). \tag{4.4}$$

The second criterion is a measure of the misclassification rate of the states,

$$\mathrm{MCR}_{\hat{S}} = \frac{1}{N} \sum_{n=1}^{N} q\!\left( 1 - \frac{1}{T} \sum_{t=1}^{T} \mathbb{I}(\hat{S}_t = S_t) \right), \tag{4.5}$$

where $\mathbb{I}(\cdot)$ denotes the indicator function. In both cases, $q$ denotes the posterior median.

Tab. 1 reports the average in-sample median RMSE and MCR across 100 runs relative to the results of the baseline model. Results are presented for both modeling approaches and varying degrees of shrinkage, as indicated by various values of $\omega_\psi$. It is noticeable that the FAMS model performs considerably better than the baseline model. Moreover, the FAMS model performs better than the full data set in most cases. A few interesting patterns are worth mentioning. Unsurprisingly, the full information set without a shrinkage prior performs even worse than the baseline model (bear in mind that the full information set consists of 200 time series). Applying shrinkage therefore definitely makes sense and improves in-sample fit and classification. Furthermore, the degree of shrinkage matters: while moderate values of $\omega_\psi$ perform well, very small values of $\omega_\psi$ shrink $\beta$ too close to 0. In our experience, the coefficients in a multinomial logistic prior in an MS context are rather sensitive and in general difficult to estimate precisely. Often, this leads to the normal gamma setup applying too much shrinkage. This further underlines the benefits of using estimated factors in an MS context: doing so drastically reduces multicollinearity issues and maximizes the amount of information. A Gaussian prior with zero mean and a variance of 4 is assumed on $\beta$ for the models with no shrinkage prior.

Table 1:
Simulation Study Results
                                Average RMSE    Average MCR
Full information set, NG off        1.043           1.0…
ω_ψ = 0.…                           0.960           0.…
ω_ψ = 0.…                           0.933           0.…
ω_ψ = 0.…                           0.934           0.…
ω_ψ = 0.…                           0.917           0.…
ω_ψ = 0.…                           0.912           0.…
ω_ψ = 0.…                           0.911           0.…

Since the seminal paper of Hamilton (1989), Markov switching models have been used to detect and explore the nature of business cycles. An excellent overview is provided by Hamilton (2016). It is nowadays widely acknowledged that the nature of macroeconomic behaviour is usually not well captured by linear models.
From a theoretical point of view, there is a lively debate on why and how regime changes take place. One school of thought posits that the alternation of expansions and recessions results from different paces of technological innovation, which are assumed to be exogenous to the economy. Another paradigm starts by assuming that the market economy is inherently unstable and produces business cycles endogenously. A recent theoretical contribution by Matsuyama (2013) thus endogenizes credit markets and shows how an economy itself can drive recurrent booms and busts, with each bust sowing the seeds of the next boom.
For the basic model of the US macroeconomy, we obtain publicly available data from the Federal Reserve Bank of St. Louis (see McCracken and Ng, 2016). This well known data set has been used in a variety of macroeconomic studies, see for instance Ludvigson et al. (2015) or Stock and Watson (2016). It covers a variety of macroeconomic series capturing eight core areas of the US economy, such as real activity and financial markets. The more than 200 time series in this data set span an information set that is too large to be employed in standard MS models. Hence, the FRED database is a proper candidate to demonstrate the benefits of the FAMS model.

As the goal is to model business cycles, the quarterly log-differenced time series of industrial production is modeled as an uncentered AR(4) process with switching intercept:

$$y_t = \mu_{S_t} + \sum_{j=1}^{4} \phi_j\, y_{t-j} + \epsilon_t, \qquad \epsilon_t \sim \mathcal{N}(0, \sigma^2). \tag{5.1}$$

This approach closely follows the seminal paper by Hamilton (1989), who however uses a centered MS-AR model. We deviate from the centered parameterization because, in a Markov switching context, the uncentered parametrization does not induce immediate mean level shifts. Instead, the mean level approaches the new value smoothly over several time periods. In our eyes, this behavior is a better description of large, complex and dynamic systems such as an economy.

After transforming all remaining variables to ensure stationarity and discarding time series with too many missing values, the full data set employed contains 209 time series and covers 234 time periods from 1959Q3 to 2017Q4. This data set is assumed to contain all necessary information to inform the switching mean process of industrial production growth. The factor model outlined in Sec. 2.2 is then applied to this data set. Comparing BICs for models with 1-25 factors points in the direction of a model with seven factors. Thus, the 7-factor model is described in more detail below.
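The data preparation implied by Eq. (5.1), i.e. quarterly log differences plus a lag matrix for the AR(4) part, can be sketched as follows; the function names are illustrative assumptions.

```python
import numpy as np

def log_diff(x):
    """Quarterly log differences (growth rates), a common stationarity transform."""
    x = np.asarray(x, dtype=float)
    return np.diff(np.log(x))

def ar_design(y, p=4):
    """Response vector and lag matrix for the AR(p) regression in Eq. (5.1).

    Row t of X contains (y_{t-1}, ..., y_{t-p}) aligned with Y[t] = y_t.
    """
    T = len(y)
    Y = y[p:]
    X = np.column_stack([y[p - j:T - j] for j in range(1, p + 1)])
    return Y, X
```

Given the transformed growth series, `ar_design` yields the regressors that are combined with the sampled state indicators to estimate the switching intercept.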
Figure 1: Input and output from factor model. (a) Full information set. (b) Posterior mean of extracted factors.

Guided by economic theory, an MS model with two states – interpreted as recessionary and expansionary periods – is implemented as in previous literature. A more sophisticated approach would implement a marginal-likelihood-based grid search over combinations of the number of factors, the number of states and different prior values. The estimated factors are then used as exogenous predictors in the MS framework to model the regime switching behavior, interpreted as US business cycles.

Identification of the MS model is achieved by imposing the identifying restriction µ_Expansion > µ_Recession. Identification of the factor model is achieved by restricting the factor loadings matrix Λ to have zeros above the diagonal. In theory, this might lead to problems if the first r variables gain too much influence and weight when employed in estimating the factors. However, a large enough number of time series should prevent this scenario, and estimating 7 factors on more than 200 time series minimizes the risk to a certain degree. In addition, the estimated factor loadings matrix is not extremely sparse, again counteracting these possible threats to identification (see Kaufmann and Schumacher, 2019).
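The identification argument can be illustrated numerically. The following Python sketch (our own illustration, not part of the original estimation code) shows that any unrestricted loadings matrix can be rotated into the zeros-above-the-diagonal form without changing the implied covariance of the common component, which is exactly the rotational indeterminacy that the restriction pins down:

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 20, 3
Lam = rng.normal(size=(n, r))    # unrestricted loadings matrix

# Rotate the loadings into the identified representation with zeros
# above the diagonal: the LQ decomposition Lam = L @ Q (Q orthogonal)
# is obtained from the QR decomposition of Lam.T.
Q, R = np.linalg.qr(Lam.T)       # Lam.T = Q @ R, with Q an r x r orthogonal matrix
L = R.T                          # n x r loadings with zeros above the diagonal

# The restriction only fixes the rotation: since Q is orthogonal,
# the common-component covariance L @ L.T equals Lam @ Lam.T.
upper_part = np.triu(L[:r, :r], k=1)
```

The factors are rotated correspondingly (by Q), so the likelihood is unaffected; only the labeling of the loadings/factors pair changes.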
Figure 2: Input and output from Markov switching model with NBER recession dates. (a) Industrial production growth. (b) Smoothed state probabilities.
The sampling algorithm is iterated 50,000 times after discarding the first 50,000 draws as burn-in. Convergence of the MCMC sampler is usually very good. Fig. 1a and Fig. 1b show the full information set used to estimate the factors as well as the factors' posterior means used to inform the switching process. To be clear, in this exercise the factor means in period t inform the switching process in period t, which corresponds to a delay parameter d = 0. It is possible to look at leading indicators by setting d > 0; however, this leaves the results qualitatively unchanged in this setup, so we do not show these additional results for brevity.

The estimated factors explain a variance share of 34% when averaging over time and series. This is comparable to related literature. For instance, Kaufmann and Schumacher (2019) explain around 38% of the variance of 282 time series using a model with 20 factors. However, the explained variance share varies significantly over time and between series. The interquartile range of the explained variance share over time lies between 23% and 45%. The factors explain over 90% of the variance of 6% of the time series, over 80% of 10% of the series, and more than 50% of the variance of 36% of the series.
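The explained variance shares discussed above can be computed directly from the factor output. The following Python sketch (using synthetic data of the same dimensions as the application; not the authors' original computation) illustrates the per-series shares and their spread:

```python
import numpy as np

rng = np.random.default_rng(7)
T, n, r = 234, 209, 7                      # periods, series, factors (dimensions as in the text)
F = rng.normal(size=(T, r))                # stand-in for posterior factor means
Lam = rng.normal(scale=0.5, size=(n, r))   # stand-in for estimated loadings
X = F @ Lam.T + rng.normal(size=(T, n))    # synthetic standardized panel

common = F @ Lam.T                          # common component of each series
share = common.var(axis=0) / X.var(axis=0)  # per-series explained variance share
avg_share = share.mean()                    # average over series
q25, q75 = np.quantile(share, [0.25, 0.75]) # interquartile range across series
```

Replacing the synthetic F and Lam with posterior means from the estimated factor model yields the shares reported in the text.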
Table 2: AR Process Parameter Estimates (posterior medians with 90% HPD regions for µ_Recession, µ_Expansion, σ², and the autoregressive coefficients φ₁-φ₄).
Interpreting the factors is more of an art than a science. However, to give a brief intuition of how the factors could be interpreted, a table with the time series with the highest absolute factor loadings is provided in App. C. This table is used to name the factors in the results presented below. However, when interpreting factors, there is usually no clear-cut and distinct interpretation possible, as we do not restrict time series to load on only one factor. All in all, naming the factors is mainly done for ease of presentation and should be taken with a grain of salt.

Fig. 2a provides the industrial production growth rate time series used in the analysis. Fig. 2b plots the estimated posterior median of the transition probabilities. Both plots show recession dates as published by the NBER as shaded areas. The FAMS model is able to capture the recessions rather well. Interestingly, the model detects a recession in 2014 that is not classified as a recession by the NBER. The corresponding estimates of the parameters of the AR process can be found in Tab. 2. The intercept estimates point towards one state with rather volatile growth rates of industrial production. A large part of the intercept posterior density of this state also lies in the negative spectrum. Thus, we proceed to label this state Recession. The second state, on the other hand, can be characterized by consistently positive growth rates and is thus labeled Expansion.

The estimated uncentered AR(4) process seems to be sufficiently stationary. This can easily be inspected visually by rewriting the AR(4) process as an AR(1) process, which results in the so-called state space or companion form of the process. If the eigenvalues of the resulting companion matrix lie well inside the unit circle, the process is considered stationary. This exercise is depicted in Fig. 3, where the abscissa denotes the real part and the ordinate the imaginary part of the eigenvalues. Bold black dots denote the median eigenvalues.
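The companion-form check described above can be sketched as follows (Python, with hypothetical AR(4) coefficients rather than the posterior draws underlying Fig. 3):

```python
import numpy as np

def companion_eigenvalues(phi):
    """Eigenvalues of the companion matrix of an AR(p) process.

    Writing y_t = mu + phi_1 y_{t-1} + ... + phi_p y_{t-p} + eps_t in
    AR(1) companion form, the process is stationary when all eigenvalues
    of the companion matrix lie strictly inside the unit circle."""
    p = len(phi)
    C = np.zeros((p, p))
    C[0, :] = phi               # first row holds the AR coefficients
    C[1:, :-1] = np.eye(p - 1)  # shifted identity block below
    return np.linalg.eigvals(C)

# Hypothetical AR(4) coefficients, for illustration only
eig = companion_eigenvalues([0.5, -0.1, 0.05, -0.05])
is_stationary = bool(np.all(np.abs(eig) < 1.0))
```

In the paper, this check is applied draw by draw to the posterior samples of φ₁-φ₄, and the resulting eigenvalues are plotted in the complex plane as in Fig. 3.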
Figure 3: Stationarity plot for AR(4) process.
Finally, the results of the multinomial logit part of the FAMS model are provided graphically in Fig. 4. The states are labeled "Expansion" and "Recession", corresponding to the estimated posterior mean growth rates. "Expansion" is the baseline state in this setup; thus, all of its coefficients are set to 0 for identification reasons, as discussed earlier. The black dots indicate posterior means and the grey lines show the 90% HPD region of the posterior density of the MCMC draws. For this estimation run, we set ω_ψ = […] (Business & Employment and Real Activity). Third, Interest Rate Spreads seem to have quite some predictive power when it comes to explaining recessions in this data set. This is in line with a large strand of literature connecting real economic recessions and risk premiums on financial markets (Gilchrist and Zakrajšek, 2012; López-Salido et al., 2017).

For future applications, it will be interesting to estimate the FAMS model with switching variances in stock market or interest rate applications, as well as to experiment with various amounts of shrinkage. It is also possible to estimate ω_ψ from the data, as demonstrated for instance in Malsiner-Walli et al. (2016). Numerical results are available from the authors upon request.
Figure 4: Estimated posterior medians of γ and β. Log odds are shown separately for the states "Expansion" and "Recession"; covariates: Real Activity, Business & Employment, Interest Rates, Interest Rate Spreads, Consumption, Assets & Stocks, Credit, and the state-specific intercepts.

In this article, we develop a factor-augmented Markov switching model with time-varying transition probabilities. We assume that a set of latent factors informs the switching process through a multinomial logistic prior setup. This alleviates problems arising from a high-dimensional data set where a large number of variables might potentially be relevant for modeling the Markov switching process. Estimation of the factors and the MS model is carried out in a Bayesian framework through Gibbs sampling. In a first step, the FAMS model is applied to artificial data sets. These simulation studies underline the benefits of the FAMS model when compared to similar regime switching frameworks. In addition, a real data application highlights the potential of the FAMS model in business cycle research.

Future avenues for research include, for instance, a thorough cross-model comparison of forecasting abilities in different environments and additional applications that test the usefulness of the FAMS framework in the context of switching variances. It will be interesting to see how the model performs in other real-world applications where the goal is, for instance, to model interest rates (Ang and Bekaert, 2002) or stock market returns (Schaller and Norden, 1997).

To summarize, we hope the FAMS model will provide a proper modeling framework for fields where Markov switching models are commonly employed in data-rich environments. Rather commonly, applications will thus be found in economics and finance. However, the modeling framework might also be useful in fields like medicine, where Markov switching models are used as an early detection system for influenza (Martínez-Beneito et al., 2008), or in accident analysis, where MS models are used to model vehicle accident frequencies (Malyshkina et al., 2009).
References

Amisano G, and Fagan G (2013), "Money growth and inflation: A regime switching approach," Journal of International Money and Finance, 118-145.
Ang A, and Bekaert G (2002), "Regime switches in interest rates," Journal of Business & Economic Statistics (2), 163-182.
Bernanke BS, Boivin J, and Eliasz P (2005), "Measuring the Effects of Monetary Policy: A Factor-Augmented Vector Autoregressive (FAVAR) Approach," Quarterly Journal of Economics (1), 387-422.
Bitto A, and Frühwirth-Schnatter S (2019), "Achieving shrinkage in a time-varying parameter model framework," Journal of Econometrics (1), 75-97.
Boot T (2017), Macroeconomic Forecasting under Regime Switching, Structural Breaks and High-dimensional Data, Ph.D. thesis, Department of Econometrics, Erasmus University of Rotterdam.
Carter CK, and Kohn R (1994), "On Gibbs sampling for state space models," Biometrika (3), 541-553.
Celeux G, Chauveau D, and Diebolt J (1996), "Stochastic versions of the EM algorithm: an experimental study in the mixture case," Journal of Statistical Computation and Simulation (4), 287-314.
de Jong VMT, Eijkemans MJC, van Calster B, Timmerman D, Moons KGM, Steyerberg EW, and van Smeden M (2019), "Sample size considerations and predictive performance of multinomial logistic prediction models," Statistics in Medicine (9), 1601-1619.
Engel C (1994), "Can the Markov switching model forecast exchange rates?" Journal of International Economics (1-2), 151-165.
Filardo AJ (1994), "Business-cycle phases and their transitional dynamics," Journal of Business & Economic Statistics (3), 299-308.
Frühwirth-Schnatter S (1994), "Data augmentation and dynamic linear models," Journal of Time Series Analysis (2), 183-202.
——— (2004), "Estimating marginal likelihoods for mixture and Markov switching models using bridge sampling techniques," The Econometrics Journal (1), 143-167.
——— (2006), Finite Mixture and Markov Switching Models, Berlin/Heidelberg: Springer Science+Business Media.
Frühwirth-Schnatter S, and Frühwirth R (2010), "Data augmentation and MCMC for binary and multinomial logit models," in "Statistical Modelling and Regression Structures," 111-132, Springer.
George EI (2010), "Dilution Priors: Compensating for Model Space Redundancy," IMS Collections: Borrowing Strength: Theory Powering Applications - A Festschrift for Lawrence D. Brown, 158.
Gilchrist S, and Zakrajšek E (2012), "Credit Spreads and Business Cycle Fluctuations," American Economic Review (4), 1692-1720.
Griffin JE, and Brown PJ (2010), "Inference with normal-gamma prior distributions in regression problems," Bayesian Analysis (1), 171-188.
Hamilton JD (1989), "A New Approach to the Economic Analysis of Nonstationary Time Series and the Business Cycle," Econometrica (2), 357-384.
——— (1994), Time Series Analysis, Princeton University Press.
——— (2010), Regime Switching Models, 202-209, London.
——— (2016), "Macroeconomic regimes and regime shifts," in "Handbook of Macroeconomics," volume 2, 163-201, Elsevier.
Holmes CC, and Held L (2006), "Bayesian auxiliary variable models for binary and multinomial regression," Bayesian Analysis (1), 145-168.
Hörmann W, and Leydold J (2014), "Generating generalized inverse Gaussian random variates," Statistics and Computing (4), 547-557.
Hurn M, Justel A, and Robert CP (2003), "Estimating mixtures of regressions," Journal of Computational and Graphical Statistics (1), 55-79.
Jasra A, Holmes CC, and Stephens DA (2005), "Markov chain Monte Carlo methods and the label switching problem in Bayesian mixture modeling," Statistical Science.
Kastner G (2016), factorstochvol: Bayesian Estimation of (Sparse) Latent Factor Stochastic Volatility Models, R package version 0.8.3.
——— (2019), "Sparse Bayesian time-varying covariance estimation in many dimensions," Journal of Econometrics (1), 98-115.
Kastner G, and Frühwirth-Schnatter S (2014), "Ancillarity-sufficiency interweaving strategy (ASIS) for boosting MCMC estimation of stochastic volatility models," Computational Statistics & Data Analysis, 408-423.
Kaufmann S (2015), "K-state switching models with time-varying transition distributions—Does loan growth signal stronger effects of variables on inflation?" Journal of Econometrics (1), 82-94.
Kaufmann S, and Schumacher C (2019), "Bayesian estimation of sparse dynamic factor models with order-independent and ex-post mode identification," Journal of Econometrics (1), 116-134.
Kim CJ, and Nelson CR (1999), State-Space Models with Regime Switching: Classical and Gibbs-Sampling Approaches with Applications, Cambridge and London: MIT Press.
Lenk PJ, and DeSarbo WS (2000), "Bayesian inference for finite mixtures of generalized linear models with random effects," Psychometrika (1), 93-119.
Leydold J, and Hörmann W (2015), "GIGrvg: Random variate generator for the GIG distribution," R package version 0.4.
Lopes HF (2014), "Modern Bayesian Factor Analysis," Bayesian Inference in the Social Sciences.
Lopes HF, and West M (2004), "Bayesian model assessment in factor analysis," Statistica Sinica (1), 41-68.
López-Salido D, Stein JC, and Zakrajšek E (2017), "Credit-Market Sentiment and the Business Cycle," The Quarterly Journal of Economics (3), 1373-1426.
Ludvigson SC, Ma S, and Ng S (2015), "Uncertainty and business cycles: exogenous impulse or endogenous response?" Technical report, National Bureau of Economic Research.
Malsiner-Walli G, Frühwirth-Schnatter S, and Grün B (2016), "Model-based clustering based on sparse finite Gaussian mixtures," Statistics and Computing (1-2), 303-324.
Malyshkina NV, Mannering FL, and Tarko AP (2009), "Markov switching negative binomial models: an application to vehicle accident frequencies," Accident Analysis & Prevention (2), 217-226.
Martínez-Beneito MA, Conesa D, López-Quílez A, and López-Maside A (2008), "Bayesian Markov switching models for the early detection of influenza epidemics," Statistics in Medicine (22), 4455-4468.
Matsuyama K (2013), "The Good, the Bad, and the Ugly: An inquiry into the causes and nature of credit cycles," Theoretical Economics (3), 623-651.
McCracken MW, and Ng S (2016), "FRED-MD: A Monthly Database for Macroeconomic Research," Journal of Business & Economic Statistics (4), 574-589.
McFadden D (1974), "Conditional logit analysis of qualitative choice behavior," Frontiers in Econometrics.
Meligkotsidou L, and Dellaportas P (2011), "Forecasting with non-homogeneous hidden Markov models," Statistics and Computing (3), 439-449.
Pagan AR, and Schwert GW (1990), "Alternative models for conditional stock volatility," Journal of Econometrics (1-2), 267-290.
Park T, and Casella G (2008), "The Bayesian lasso," Journal of the American Statistical Association (482), 681-686.
Polson NG, and Scott JG (2010), "Shrink globally, act locally: sparse Bayesian regularization and prediction," Bayesian Statistics, 501-538.
Ranganathan P, Pramesh C, and Aggarwal R (2017), "Common pitfalls in statistical analysis: Logistic regression," Perspectives in Clinical Research (3), 148.
Redner RA, and Walker HF (1984), "Mixture densities, maximum likelihood and the EM algorithm," SIAM Review (2), 195-239.
Schaller H, and Norden SV (1997), "Regime switching in stock market returns," Applied Financial Economics (2), 177-191.
Scott SL (2011), "Data augmentation, frequentist estimation, and the Bayesian analysis of multinomial logit models," Statistical Papers (1), 87-109.
Stephens M (2000a), "Bayesian analysis of mixture models with an unknown number of components—an alternative to reversible jump methods," Annals of Statistics.
——— (2000b), Journal of the Royal Statistical Society: Series B (Statistical Methodology) (4), 795-809.
Stock JH, and Watson MW (2002), "Forecasting using principal components from a large number of predictors," Journal of the American Statistical Association (460), 1167-1179.
——— (2016), "Dynamic factor models, factor-augmented vector autoregressions, and structural vector autoregressions in macroeconomics," in "Handbook of Macroeconomics," volume 2, 415-525, Elsevier.
Zahid FM, and Tutz G (2013), "Multinomial logit models with implicit variable selection," Advances in Data Analysis and Classification (4), 393-416.
Zens G (2019), "Bayesian shrinkage in mixture-of-experts models: identifying robust determinants of class membership," Advances in Data Analysis and Classification.
A Auxiliary Mixture Sampling of the MNL coefficients
This section provides a short overview of the sampling technique employed to simulate from the posterior distribution of the multinomial logit coefficients β and γ. We implement an auxiliary mixture sampler resulting from the partial dRUM representation of the multinomial logistic regression in the style of Frühwirth-Schnatter and Frühwirth (2010) as follows.

Let y_i (i = 1, ..., N) be an independent sequence of categorical data, where y_i can take one of m + 1 values. Denote the set of categories by L = {0, ..., m} and, for any k, the set of all categories except k by L_{-k} = L \ {k}. Assume that the observed categorical outcomes y_i result from an underlying latent utility process governed by (continuous) latent utilities y^u_{ki}. The standard latent variable representation of the multinomial logistic model following McFadden (1974) can be written as

y^u_{ki} = x_i \beta_k + \delta_{ki}, \qquad k = 0, \ldots, m, (A.1)

where

y_i = k \iff y^u_{ki} = \max_{l \in L} y^u_{li}. (A.2)

Thus, the observed category corresponds to the category with the highest latent utility. If the error terms δ_{ki} follow an extreme value type I distribution, the multinomial logistic regression model results as the marginal distribution of y_i. As shown in Frühwirth-Schnatter and Frühwirth (2010), the latent utilities y^u_{ki} can be sampled simultaneously from

y^u_{ki} = -\log\left( \frac{-\log(U_i)}{1 + \sum_{l=1}^{m} \lambda_{li}} - \frac{\log(V_{ki})}{\lambda_{ki}} I\{y_i \neq k\} \right), (A.3)

where U_i and V_{1i}, ..., V_{mi} are m + 1 independent uniform random numbers and λ_{li} = exp(x_i β_l) for l = 0, ..., m. This corresponds to the standard RUM representation of the multinomial logistic regression.

Note that Eq. (A.2) can be rewritten as

y_i = k \iff y^u_{ki} > y^u_{-k,i}, \qquad y^u_{-k,i} = \max_{l \in L_{-k}} y^u_{li}. (A.4)

Hence, we observe category k if and only if y^u_{ki} is larger than the maximum of all other utilities. This makes it possible to construct another set of latent variables w_{ki}, defined as the difference between y^u_{ki} and y^u_{-k,i}. Note that it directly follows that y_i = k if and only if w_{ki} > 0. The latent variables w_{ki} thus make it possible to construct binary observations d_{ki} = I\{y_i = k\} whenever w_{ki} > 0:

w_{ki} = y^u_{ki} - y^u_{-k,i}, \qquad d_{ki} = I\{w_{ki} > 0\}. (A.5)

Frühwirth-Schnatter and Frühwirth (2010) show that the distribution of w_{ki} has an explicit form for the multinomial logistic model and derive the partial dRUM representation of the multinomial logit as

w_{ki} = x_i \beta_k - \log(\lambda_{-k,i}) + \epsilon_{ki}, (A.6)

where ε_{ki} follows a logistic distribution and

\lambda_{-k,i} = \sum_{l \in L_{-k}} \lambda_{li}. (A.7)

Thus, the problem of sampling β_k reduces to sampling regression coefficients from a linear regression with logistic errors. Various sampling methods have been proposed to accomplish this. For instance, Holmes and Held (2006) represent the logistic distribution of ε_{ki} as an infinite scale mixture of normals, resulting in a computationally rather demanding sampler. Scott (2011) applies independence MH steps to sample β_k using a normal proposal with variance π²/3. In the present paper, we use the finite scale mixture approximation of the logistic distribution proposed in Frühwirth-Schnatter and Frühwirth (2010). With this sampling method, the problem collapses to sampling coefficients from a normal linear regression model with heteroskedastic errors. Hence, the resulting Gibbs sampler can be implemented in a computationally very efficient way. Prior distributions and posterior simulation are discussed in Sec. 3.1 and Sec. 3.2.

B Posterior Simulation of β

The posterior distributions of λ_{ψ,h} and ψ_{i,h} are of well-known form and can be derived as

\pi(\lambda_{\psi,h} \mid \cdot) \sim \mathcal{G}(g, d_h), \qquad \pi(\psi_{i,h} \mid \cdot) \sim \mathcal{GIG}\left(\omega_\psi - 0.5,\; \beta_{i,h}^2,\; \lambda_{\psi,h}\,\omega_\psi\right),
g = \omega_\psi r + c, \qquad d_h = c + \omega_\psi \sum_{j=1}^{m} \psi_{j,h}, (B.1)

where r is the number of factors entering the model and GIG denotes the generalized inverse Gaussian distribution. The posteriors of τ_{i,j} and λ_{τ,i} can be derived in a similar fashion. Hörmann and Leydold (2014) provide an efficient adaptive rejection sampling algorithm that makes it possible to easily draw from the GIG. We use the R package
GIGrvg (Leydold and Hörmann, 2015) in our computations to employ this algorithm.
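For readers working in Python rather than R, a rough analogue of GIGrvg is available through scipy.stats.geninvgauss. Note that scipy uses a one-parameter (plus scale) form of the GIG, so a common two-parameter GIG(p, a, b) with density proportional to x^{p-1} exp(-(ax + b/x)/2) has to be mapped via rescaling. The following sketch (our own illustration, not the authors' code) performs this mapping and checks the draws against the Bessel-function formula for the GIG mean:

```python
import numpy as np
from scipy.special import kv
from scipy.stats import geninvgauss

def rgig(p, a, b, size, seed=None):
    """Draw from GIG(p, a, b) with density proportional to
    x**(p - 1) * exp(-(a * x + b / x) / 2) for x > 0.

    scipy's geninvgauss(p, c) has density proportional to
    x**(p - 1) * exp(-c * (x + 1/x) / 2); rescaling a draw by
    sqrt(b / a) maps between the two forms with c = sqrt(a * b)."""
    return np.sqrt(b / a) * geninvgauss.rvs(p, np.sqrt(a * b),
                                            size=size, random_state=seed)

p, a, b = 0.5, 2.0, 3.0
draws = rgig(p, a, b, size=200_000, seed=0)

# Sanity check against the closed-form GIG mean,
# E[X] = sqrt(b/a) * K_{p+1}(sqrt(ab)) / K_p(sqrt(ab)),
# where K is the modified Bessel function of the second kind.
eta = np.sqrt(a * b)
mean_theory = np.sqrt(b / a) * kv(p + 1, eta) / kv(p, eta)
```

The same mapping applies to the conditional GIG posteriors above, with (p, a, b) replaced by the corresponding posterior arguments.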
C Highest loading time series per factor
Table 3: Ten Time Series with Highest Factor Loadings, Factors 1-4

- Industrial Production: Manufacturing (SIC) (Index 2012=100)
- Nonfarm Business Sector: Real Output
- Business Sector: Real Output
- Real Gross Domestic Product
- Real private fixed investment
- IP: Final products — Industrial Production: Final Products (Market Group) (Index 2012=100)
- IP: Consumer goods — Industrial Production: Consumer Goods (Index 2012=100)
- Durable Consumer Goods (Index 2012=100)
- Unemployment Rate less than 27 weeks (Percent)
- Total Business Inventories (Millions of Dollars)
- All Employees: Wholesale Trade (Thousands of Persons)
- Capacity Utilization: Manufacturing (SIC) (Percent of Capacity)
- All Employees: Service-Providing Industries (Thousands of Persons)
- Shares of gross domestic product: Gross private domestic investment: Change in private inventories
- All Employees: Trade, Transportation & Utilities (Thousands of Persons)
- Emp: Nonfarm — All Employees: Total nonfarm (Thousands of Persons)
- All Employees: Total Private Industries (Thousands of Persons)
- Moody's Seasoned Aaa Corporate Bond Minus Federal Funds Rate
- All Employees: Durable goods (Thousands of Persons)
- 6-Month Treasury Bill: Secondary Market Rate (Percent)
- 1-Year Treasury Constant Maturity Rate (Percent)
- 3-Month Treasury Bill: Secondary Market Rate (Percent)
- 3-Month AA Financial Commercial Paper Rate
- 5-Year Treasury Constant Maturity Rate
- Effective Federal Funds Rate (Percent)
- 10-Year Treasury Constant Maturity Rate (Percent)
- Moody's Seasoned Aaa Corporate Bond Yield (Percent)
- Moody's Seasoned Baa Corporate Bond Yield (Percent)
- 6-Month Treasury Bill Minus 3-Month Treasury Bill, secondary market (Percent)
- Moody's Seasoned Aaa Corporate Bond Minus Federal Funds Rate
- All Employees: Education & Health Services (Thousands of Persons)
- All Employees: Government: State Government (Thousands of Persons)
- All Employees: Government: Local Government (Thousands of Persons)
- 3-Month Commercial Paper Minus 3-Month Treasury Bill, secondary market
- All Employees: Other Services (Thousands of Persons)
- 1-Year Treasury Constant Maturity Minus 3-Month Treasury Bill, secondary market
- All Employees: Government (Thousands of Persons)
- Average Weekly Hours of Production and Nonsupervisory Employees: Manufacturing
- Capacity Utilization: Manufacturing (SIC) (Percent of Capacity)
Table 4: Ten Time Series with Highest Factor Loadings, Factors 5-7